Sources

mix

This is a helper to mix data objects from two or more sources into one stream. When mixed, dataobjects are interleaved. For example:

>>> from processor import sources
>>> source1 = [1,2,3]
>>> source2 = [5,6,7,8]
>>> print(list(sources.mix(source1, source2)))

[1, 5, 2, 6, 3, 7, 8]

Mix source iterates through each given source until it raises StopIteration. That means, if you'll give it an infinite sources like a web.hook, then resulting source also will be infinite.

imap

Imap source is able to read new emails from specified folder on IMAP server. All you need is to specify server's address, optional port and user credentials:

Example:

from processor import run_pipeline, source, outputs
run_pipeline(
    sources.imap("imap.gmail.com",
                          "username",
                          "****word",
                          "Inbox"),
    outputs.debug())

This script will read Inbox folder at server imap.gmail.com and print resulting dicts to the terminal's screen.

github

Access to private repositories

To have access to private repositories, you need to generate a "personal access token" at the GitHub.

All you need to do this, is to click on the image below and it will open a page with only scopes needed for the Processor:

Then copy this token into the clipboard and pass it as a access_token parameter to each github.**** source.

Note

Access token not only let the processor read from private repositories, but also makes rate limits higher, so you could poll GitHub's API more frequently.

Without token you can make only 60 request per hour, but with token – 5000 requests per hour.

github.releases

Outputs new releases of the given repository. On first call, it will output all the most recent releases, then remeber position on next calls will return only new releases if any were found.

Example:

from processor import run_pipeline, source, outputs

github_creds = dict(access_token='keep-it-in-secret')
run_pipeline(
    sources.github.releases('https://github.com/mozilla/metrics-graphics', **github_creds),
    outputs.debug())

This source returns following fields:

source: github.releases
type: github.release
payload: The object returned by GitHub's API. See section "Response" at GitHub's docs on repos/releases.

twitter

Note

To use this source, you need to obtain an access token from twitter. There is a detailed instruction how to do this Twitter's documentation. You could encapsulate twitter credentials into a dict:

twitter_creds = dict(consumer_key='*', consumer_secret='', access_token='', access_secret='') sources.twitter.search('Some query',twitter_creds) sources.twitter.followers(*twitter_creds)

twitter.search

This source runs search by given query in Twitter and returns fresh results:

from processor import run_pipeline, source, outputs
run_pipeline(
    sources.twitter.search('iOS release notes', **twitter_creds),
    outputs.debug())

It returns following fields:

source: twitter.search
type: twitter.tweet
other: Other fields are same as them returns Twitter API. See section "Example Result" at twitter's docs on search/tweets.

twitter.followers

First invocation returns all who you follows, each next -- only new followers:

from processor import run_pipeline, source, outputs
run_pipeline(
    sources.twitter.followers(**twitter_creds),
    outputs.debug())

It returns following fields:

source: twitter.followers
type: twitter.user
other: Other fields are same as them returns Twitter API. See section "Example Result" at twitter's docs on followers/list.

web.hook

This source starts a webserver which listens on a given interface and port. All GET and POST requests are transformed into the data objects.

Configuration example:

run_pipeline(sources.web.hook(host='0.0.0.0', port=1999),
             outputs.debug())

By default, it starts on localhost:8000, but in this case on 0.0.0.0:1999.

Here is example of data objects, produced by this source when somebody posts JSON:

{'data': {'some-value': 0},
 'headers': {'Accept': 'application/json',
   'Accept-Encoding': 'gzip, deflate',
   'Connection': 'keep-alive',
   'Content-Length': '17',
   'Content-Type': 'application/json; charset=utf-8',
   'Host': '127.0.0.1:1999',
   'User-Agent': 'HTTPie/0.8.0'},
 'method': 'POST',
 'path': '/the-hook',
 'query': {'query': ['var']},
 'source': 'web.hook',
 'type': 'http-request'}

This source returns data objects with following fields:

source: web.hook
type: http-request
method: GET or POST
path: Resource path without query arguments
query: Query arguments
headers: A headers dictionary. Please, note, this is usual dictionary with case sensitive keys.
data: Request data, if this was a POST, None for GET. If requests has application/json content type, then data decoded automatically into the python representation. For other content types, if there is charset part, then data is decoded from bytes into a string, otherwise, it remains as bytes.

Note

This source runs in blocking mode. This means it blocks run_pipeline execution until somebody interupt it.

No other sources could be processed together with web.hook.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sources.rst

sources.rst

Sources

mix

imap

github

Access to private repositories

github.releases

twitter

twitter.search

twitter.followers

web.hook

Files

sources.rst

Latest commit

History

sources.rst

File metadata and controls

Sources

mix

imap

github

Access to private repositories

github.releases

twitter

twitter.search

twitter.followers

web.hook