Sources

mix

This is a helper that mixes data objects from two or more sources into one stream. When mixed, data objects are interleaved. For example:

>>> from processor import sources
>>> source1 = [1,2,3]
>>> source2 = [5,6,7,8]
>>> print(list(sources.mix(source1, source2)))

[1, 5, 2, 6, 3, 7, 8]

The mix source iterates over every given source until each one raises StopIteration. This means that if you give it an infinite source, such as web.hook, the resulting source will also be infinite.
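The interleaving order in the example is consistent with a round-robin over the sources. A minimal sketch of that idea follows; `round_robin_mix` is a hypothetical name used only for illustration and is not necessarily how `sources.mix` is implemented.

```python
def round_robin_mix(*sources):
    """Yield one item from each source in turn until all are exhausted.

    Hypothetical illustration; not the actual code behind sources.mix.
    """
    iterators = [iter(source) for source in sources]
    while iterators:
        # Iterate over a copy so exhausted iterators can be dropped safely.
        for iterator in list(iterators):
            try:
                yield next(iterator)
            except StopIteration:
                # This source is finished; keep draining the others.
                iterators.remove(iterator)


print(list(round_robin_mix([1, 2, 3], [5, 6, 7, 8])))
# [1, 5, 2, 6, 3, 7, 8]
```

Because the generator only stops when every iterator is exhausted, a single infinite source keeps the mixed stream infinite as well.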

imap

The imap source reads new emails from a specified folder on an IMAP server. You only need to specify the server's address, an optional port, and the user credentials:

Example:

from processor import run_pipeline, sources, outputs
run_pipeline(
    sources.imap("imap.gmail.com",
                 "username",
                 "****word",
                 "Inbox"),
    outputs.debug())

This script reads the Inbox folder on the server imap.gmail.com and prints the resulting dicts to the terminal.

github

Access to private repositories

To access private repositories, you need to generate a "personal access token" on GitHub.

To do this, click on the image below. It opens a page with only the scopes that Processor needs preselected:

image

Then copy the token to the clipboard and pass it as the access_token parameter to each github.* source (for example, github.releases).

Note

An access token not only lets Processor read from private repositories, it also raises the rate limits, so you can poll GitHub's API more frequently.

Without a token you can make only 60 requests per hour; with a token, the limit is 5000 requests per hour.
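To see what these limits mean for polling frequency, the arithmetic can be sketched as below. The helper name `min_poll_interval` is hypothetical, and the default of one API request per poll is an assumption; a source that needs several requests per poll has to wait proportionally longer.

```python
def min_poll_interval(requests_per_hour, requests_per_poll=1):
    """Return the shortest polling interval, in seconds, that stays within the limit.

    requests_per_poll is an assumption: how many API calls one poll costs.
    """
    return 3600 * requests_per_poll / requests_per_hour


print(min_poll_interval(60))    # 60.0 seconds between polls without a token
print(min_poll_interval(5000))  # 0.72 seconds between polls with a token
```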

github.releases

Outputs new releases of the given repository. On the first call, it outputs the most recent releases and remembers its position; subsequent calls return only new releases, if any were found.

Example:

from processor import run_pipeline, sources, outputs

github_creds = dict(access_token='keep-it-in-secret')
run_pipeline(
    sources.github.releases('https://github.com/mozilla/metrics-graphics', **github_creds),
    outputs.debug())

This source returns the following fields:

source

github.releases

type

github.release

payload

The object returned by GitHub's API. See section "Response" at GitHub's docs on repos/releases.
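The "remember the position, then return only new releases" behavior of github.releases can be pictured with a small sketch. How Processor actually stores its position is not described here, so the in-memory set, the helper name `only_new`, and the sample release dicts are all assumptions made for illustration. The `id` and `tag_name` keys do exist in GitHub's release objects.

```python
def only_new(items, seen_ids, key="id"):
    """Return the items whose key has not been seen yet and record them as seen."""
    fresh = [item for item in items if item[key] not in seen_ids]
    seen_ids.update(item[key] for item in fresh)
    return fresh


# Made-up sample data standing in for two consecutive API responses.
seen = set()
first_poll = [{"id": 1, "tag_name": "v1.0"}, {"id": 2, "tag_name": "v1.1"}]
second_poll = first_poll + [{"id": 3, "tag_name": "v1.2"}]

print(only_new(first_poll, seen))   # both releases: nothing was seen before
print(only_new(second_poll, seen))  # only the release with id 3
```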

twitter

Note

To use this source, you need to obtain an access token from Twitter. Twitter's documentation has detailed instructions on how to do this. You can keep the Twitter credentials in a dict:

twitter_creds = dict(consumer_key='***', consumer_secret='***',
                     access_token='***', access_secret='***')
sources.twitter.search('Some query', **twitter_creds)
sources.twitter.followers(**twitter_creds)

twitter.search

This source runs a search on Twitter for the given query and returns fresh results:

from processor import run_pipeline, sources, outputs
run_pipeline(
    sources.twitter.search('iOS release notes', **twitter_creds),
    outputs.debug())

It returns the following fields:

source

twitter.search

type

twitter.tweet

other

Other fields are the same as those returned by the Twitter API. See the "Example Result" section in Twitter's docs on search/tweets.

twitter.followers

The first invocation returns everyone who follows you; each subsequent invocation returns only new followers:

from processor import run_pipeline, sources, outputs
run_pipeline(
    sources.twitter.followers(**twitter_creds),
    outputs.debug())

It returns the following fields:

source

twitter.followers

type

twitter.user

other

Other fields are the same as those returned by the Twitter API. See the "Example Result" section in Twitter's docs on followers/list.

web.hook

This source starts a web server that listens on a given interface and port. All GET and POST requests are transformed into data objects.

Configuration example:

from processor import run_pipeline, sources, outputs
run_pipeline(sources.web.hook(host='0.0.0.0', port=1999),
             outputs.debug())

By default, the server listens on localhost:8000; in this example it listens on 0.0.0.0:1999.

Here is an example of a data object produced by this source when somebody posts JSON:

{'data': {'some-value': 0},
 'headers': {'Accept': 'application/json',
   'Accept-Encoding': 'gzip, deflate',
   'Connection': 'keep-alive',
   'Content-Length': '17',
   'Content-Type': 'application/json; charset=utf-8',
   'Host': '127.0.0.1:1999',
   'User-Agent': 'HTTPie/0.8.0'},
 'method': 'POST',
 'path': '/the-hook',
 'query': {'query': ['var']},
 'source': 'web.hook',
 'type': 'http-request'}
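The path and query fields in the sample above look like the result of splitting a request target and parsing its query string. The sketch below reproduces those two values with the standard library. The request target `/the-hook?query=var` is an assumption inferred from the sample, `split_target` is a hypothetical helper, and whether web.hook uses `urllib.parse` internally is not stated here.

```python
from urllib.parse import parse_qs, urlsplit


def split_target(request_target):
    """Split a request target into its path and a dict of query arguments."""
    parts = urlsplit(request_target)
    # parse_qs maps every argument name to a list of values,
    # because an argument may appear several times in a query string.
    return parts.path, parse_qs(parts.query)


path, query = split_target('/the-hook?query=var')
print(path)   # /the-hook
print(query)  # {'query': ['var']}
```

When consuming such data objects, remember that each query value is a list, so a single argument is read as `query['query'][0]`.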

This source returns data objects with the following fields:

source

web.hook

type

http-request

method

GET or POST

path

Resource path without query arguments

query

Query arguments

headers

A dictionary of headers. Note that this is an ordinary dictionary with case-sensitive keys.

data

Request data if this was a POST, or None for a GET. If the request has the application/json content type, the data is decoded automatically into its Python representation. For other content types, if the content type includes a charset part, the data is decoded from bytes into a string; otherwise, it remains as bytes.
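The decoding rules for the data field, together with the case-sensitive headers dictionary, can be illustrated with a short sketch. Both helpers, `get_header` and `decode_body`, are hypothetical names written for this example and are not part of Processor. The sketch assumes the raw body is a bytes object and falls back to UTF-8 for JSON without an explicit charset.

```python
import json


def get_header(headers, name, default=None):
    """Look up a header by name, ignoring case, in an ordinary dict."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return default


def decode_body(body, headers):
    """Decode a raw request body following the rules described above."""
    content_type = get_header(headers, 'Content-Type', '')
    media_type, _, params = content_type.partition(';')
    media_type = media_type.strip().lower()

    # Find an optional "charset=..." parameter after the media type.
    charset = None
    for param in params.split(';'):
        name, _, value = param.partition('=')
        if name.strip().lower() == 'charset':
            charset = value.strip().strip('"')

    if media_type == 'application/json':
        # Assumption: JSON without a charset is treated as UTF-8.
        return json.loads(body.decode(charset or 'utf-8'))
    if charset:
        return body.decode(charset)
    return body  # no charset given: leave the bytes untouched


headers = {'Content-Type': 'application/json; charset=utf-8'}
print(decode_body(b'{"some-value": 0}', headers))  # {'some-value': 0}
```

The `get_header` helper is also handy on the consumer side: since the headers field is a plain dict, `headers['content-type']` would raise KeyError when the client sent `Content-Type`.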

Note

This source runs in blocking mode: it blocks run_pipeline execution until somebody interrupts it.

No other sources can be processed together with web.hook.