The Yahoo pipelines are translated into pipelines of Python generators which should give a close match to the original data flow. Each call to the final generator will ripple through the pipeline issuing .next() calls until the source is exhausted.
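The data flow described above can be illustrated with a minimal sketch. The stage names (fetch, filter_titles, truncate) and the sample feed are hypothetical and are not pipe2py modules; the point is only that nothing runs until the final generator is asked for a value.

```python
def fetch(items):
    # Source stage (hypothetical): yields items one at a time.
    for item in items:
        yield item

def filter_titles(source, keyword):
    # Middle stage: pulls from the upstream generator only when asked.
    for item in source:
        if keyword in item["title"]:
            yield item

def truncate(source, count):
    # Final stage: stops pulling once `count` items have been yielded.
    for index, item in enumerate(source):
        if index >= count:
            break
        yield item

feed = [
    {"title": "python generators"},
    {"title": "yahoo pipes"},
    {"title": "python queues"},
    {"title": "python modules"},
]

# Building the pipeline does no work; iterating the last stage drives it.
pipeline = truncate(filter_titles(fetch(feed), "python"), 2)
results = list(pipeline)
```

Here list() plays the role of the caller that keeps requesting the next value from the final generator.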
The modules are topologically sorted to give their creation order. The main output and inputs are connected via the yielded values and the first parameter. Other inputs are passed as named parameters referencing the input module.
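As a rough sketch of the ordering step, assuming modules are identified by plain strings and wires are (source, target) pairs (both hypothetical representations, not pipe2py's internal format), a depth-first topological sort could look like this:

```python
def topological_order(modules, wires):
    """Return module ids so that every module follows the modules feeding it."""
    # Map each module to the set of modules it depends on.
    depends_on = dict((module_id, set()) for module_id in modules)
    for source_id, target_id in wires:
        depends_on[target_id].add(source_id)

    order = []
    visited = set()

    def visit(module_id, path):
        if module_id in visited:
            return
        if module_id in path:
            raise ValueError("cycle detected at %s" % module_id)
        # Create everything upstream before the module itself.
        for upstream in sorted(depends_on[module_id]):
            visit(upstream, path | set([module_id]))
        visited.add(module_id)
        order.append(module_id)

    for module_id in sorted(modules):
        visit(module_id, set())
    return order

modules = ["output", "filter", "fetch", "textinput"]
wires = [("fetch", "filter"), ("textinput", "filter"), ("filter", "output")]
order = topological_order(modules, wires)
```

With this ordering, each generator can be created after the generators it reads from, so an upstream generator is always available to pass as the first or a named parameter.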
The JSON representation of the configuration parameters maps closely onto Python dictionaries and so is left as-is and passed and parsed as-and-when needed.
Each Yahoo module is coded as a separate Python module. This might help in future if the generators are made to run on separate processors/machines and we could use queues to plumb them together.
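To make the queue idea concrete, here is a speculative sketch in which a thread stands in for a separate processor or machine. The helper names (pump, drain) and the sample stages are assumptions for illustration; nothing like this is described as existing in pipe2py.

```python
import threading

try:
    import queue  # Python 3
except ImportError:
    import Queue as queue  # Python 2

_DONE = object()  # sentinel marking the end of the stream

def pump(source, out_queue):
    # Producer side: drain a generator into a queue.
    for item in source:
        out_queue.put(item)
    out_queue.put(_DONE)

def drain(in_queue):
    # Consumer side: expose the queue as a generator again.
    while True:
        item = in_queue.get()
        if item is _DONE:
            break
        yield item

def numbers():
    for n in range(5):
        yield n

def doubled(source):
    for n in source:
        yield n * 2

link = queue.Queue()
worker = threading.Thread(target=pump, args=(numbers(), link))
worker.start()
results = list(doubled(drain(link)))
worker.join()
```

Because drain() is itself a generator, the downstream stage does not need to know whether its input comes from a local generator or from a queue.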
If using a Python version before 2.6, simplejson is needed.
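A common way to support both cases is a fallback import, sketched below. The configuration snippet is a hypothetical example rather than a real pipe definition; it only shows that the parsed JSON is an ordinary Python dictionary.

```python
try:
    import json  # in the standard library from Python 2.6 onwards
except ImportError:
    import simplejson as json  # third-party fallback for older versions

# Hypothetical module configuration, for illustration only.
conf_text = '{"URL": {"type": "url", "value": "http://example.com/feed"}}'
conf = json.loads(conf_text)

# The parsed configuration is a plain dictionary and can be read as needed.
url = conf["URL"]["value"]
```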
Put the source code in a directory, say, pipeline/pipe2py.
Make the package available to Python, e.g.
export PYTHONPATH=pipeline
Run in the test directory:
python testbasics.py
In test mode, modules needing user input use their default values rather than prompting the user. This is done by setting context.test to True.
There are two ways to translate a Yahoo pipe into Python. One outputs a Python script which wraps the pipeline in a function which can then be imported and run from another Python program (i.e. compiled). The other interprets the pipeline on-the-fly and executes it within the current process (i.e. interpreted).
Both of the following commands create a Python file named after the input argument with a .py extension (using the compile.parse_and_write_pipe function). This file can then be run directly or imported into other pipelines.
The first pulls the pipeline definition directly from Yahoo. The second loads the pipeline definition from a file:
- python compile.py -p pipelineid
or
- python compile.py pipelinefile
Subpipelines are expected to be contained in Python files named pipe_PIPEID.py, where PIPEID is the Yahoo ID for the pipeline, e.g.
pipe_2de0e4517ed76082dcddf66f7b218057.py
So if you do use the second option you should store your pipeline definitions in files named the same way, e.g.
pipe_2de0e4517ed76082dcddf66f7b218057.json
then compile.py will output files with the expected naming convention.
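The naming convention can be sketched as follows. The helper compiled_name is hypothetical and only illustrates the mapping from a definition file to the expected output name; it is not taken from compile.py.

```python
import os

def compiled_name(pipe_file):
    # Swap the extension of the definition file for .py, keeping the base name.
    base, _extension = os.path.splitext(os.path.basename(pipe_file))
    return base + ".py"

name = compiled_name("pipe_2de0e4517ed76082dcddf66f7b218057.json")
```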
Example of interpreting a pipeline on-the-fly:
from pipe2py.compile import parse_and_build_pipe
from pipe2py import Context

pipe_def = """json representation of the pipe"""
p = parse_and_build_pipe(Context(), pipe_def)
for i in p:
    print i
Some pipelines need to prompt the user for input values. When running a compiled pipe, it defaults to prompting the user via the console, but in other situations this may not be appropriate, e.g. when integrating with a website. In such cases, the input values can instead be read from the pipe's context (a set of values passed into every pipe). The context.inputs dictionary can be pre-populated with user input before the pipe is executed.
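One way an input module might choose its value is sketched below. StubContext and resolve_input are hypothetical stand-ins, and the lookup order (pre-populated value, then default when in test mode or without a console, then console prompt) is an assumption rather than a description of pipe2py's actual code.

```python
try:
    read_line = raw_input  # Python 2
except NameError:
    read_line = input  # Python 3

class StubContext(object):
    # Hypothetical stand-in for pipe2py's Context, holding only the
    # attributes discussed in the text.
    def __init__(self, inputs=None, console=True, test=False):
        self.inputs = inputs or {}
        self.console = console
        self.test = test

def resolve_input(context, name, prompt, default):
    # Assumed order: a pre-populated value wins, then the default is used
    # when prompting is not possible, and only then is the console asked.
    if name in context.inputs:
        return context.inputs[name]
    if context.test or not context.console:
        return default
    return read_line(prompt + ": ") or default

# A website integration would pre-populate inputs and disable the console.
web_context = StubContext(inputs={"username": "greg"}, console=False)
username = resolve_input(web_context, "username", "Twitter username", "")
statustitle = resolve_input(web_context, "statustitle", "Status title", "logo")
```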
To determine which prompts are needed, the pipeline can first be called with context.describe_input set to True. This does not execute the pipe; instead it returns a list of tuples defining the inputs needed, e.g.:
context = Context(describe_input=True)
p = pipe_ac45e9eb9b0174a4e53f23c4c9903c3f(context, None)
need_inputs = p
print need_inputs
>>> [(u'0', u'username', u'Twitter username', u'text', u''),
...  (u'1', u'statustitle',
...   u'Status title [string] or [logo] means twitter icon', u'text',
...   u'logo')]
Each tuple is of the form:
(position, name, prompt, type, default)
The list of tuples is sorted by position, i.e. the order in which the inputs should be presented to the user. The name should be used as the key in the context.inputs dictionary. The prompt is the text to show the user. The type is the data type, e.g. text or number. The default is the value used if no value is given. For example, to run the above pipe with pre-defined inputs and no console prompting:
context = Context(inputs={'username':'greg', 'statustitle':'logo'}, console=False)
p = pipe_ac45e9eb9b0174a4e53f23c4c9903c3f(context, None)
for i in p:
    print i
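The two steps can be combined: the tuples returned in describe_input mode can be turned into an inputs dictionary, with defaults filling any gaps. The sketch below assumes the tuple layout shown above; build_inputs and the supplied dictionary (standing in for values collected from, say, a web form) are hypothetical.

```python
# Tuples in the (position, name, prompt, type, default) form shown above.
need_inputs = [
    (u'0', u'username', u'Twitter username', u'text', u''),
    (u'1', u'statustitle',
     u'Status title [string] or [logo] means twitter icon', u'text',
     u'logo'),
]

def build_inputs(need_inputs, supplied):
    # Use the supplied value for each named input, else its default.
    inputs = {}
    for position, name, prompt, kind, default in need_inputs:
        inputs[name] = supplied.get(name, default)
    return inputs

# Only the username was supplied; statustitle falls back to its default.
inputs = build_inputs(need_inputs, {'username': 'greg'})
```

The resulting dictionary has the same shape as the inputs argument passed to Context in the example above.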