Any example to use pipeline offline? #911
Comments
@mickeydonkey the best place to look for examples of how the pipeline machinery works is the pipeline test suite, which lives in

There isn't a great short answer to the question of "how do I use Pipeline with real data", because the Pipeline API exists primarily for simplifying computations on large point-in-time datasets, and there aren't many such datasets freely available for public use. The most promising one that I'm aware of is Quandl's WIKI dataset, which contains a couple thousand assets and includes dividends and splits. I have a branch lying around somewhere that started building machinery for creating a zipline-compatible asset database from there.

The long answer to your question is that, to run an algorithm using the Pipeline machinery, you need to write a function,
If the dataset you want to use is small enough to hold in memory all at once, then you can use the built-in
The |
@ssanderson Thanks for the pointer. I am now reading through the tests and running them. I think it will take me some time to fully understand your long answer. |
Given this recommendation: "The most promising one that I'm aware of is Quandl's WIKI dataset, which contains a couple thousand assets and includes dividends and splits." in relation to its implementation as a data bundle in 1.0, was the intent to enable Pipeline computations in zipline? If so, my naive attempt at running a Quantopian-tested algorithm did not seem to initialize the Pipeline. |
I got a bit further. Guided by the zipline source code, I instantiated TradingAlgorithm() instead of calling run_algorithm(). I gave a get_pipeline_loader parameter to its constructor => get_pipeline_loader=lambda column: pipeline_loader, where pipeline_loader comes from USEquityPricingLoader.from_files(path, path). However, this seems to only factor in securities which I explicitly referenced using symbols(), not the broader universe. It would be great if I could download complete data bundles by sector and use that in the pipeline. |
|
Great! I hesitate to add much more info to this issue as I might have loaded the data incorrectly. I do not want to add noise to your issues. As soon as 1.0.1 is out, or this particular fix is committed to the master, I will try again using the natural method and give a more detailed account should I continue to experience the behavior. |
Is it possible to extend See: |
What's the relationship between a data bundle and pipeline.data? |
I've cobbled together a minimal example of running a pipeline combining pricing and a custom data source as:
The pipeline recognizes
I can see that the
which returns
Can you give some guidance on where I am going wrong pointing the |
hey @marketneutral! I think the issue you're running into here is that In general, zipline represents dates as midnight of the date in question, localized to UTC. If I could wave a magic wand, I would remove the timezone localization (or, even better, use a pandas Period instead of a timestamp), but doing so would be a major backwards compatibility headache. Given the current state of things, the right fix for this is to change the construction of
which appears to print the expected result:
|
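The date convention described in the comment above (midnight of the date in question, labelled as UTC) can be sketched with the standard library alone. This is a minimal illustration, not zipline code: `to_midnight_utc` is a hypothetical helper name, and the sample timestamp is made up. In pandas, the analogous value would be `pd.Timestamp('2014-01-02', tz='UTC')`.

```python
from datetime import date, datetime, time, timezone


def to_midnight_utc(day):
    """Label a calendar date as midnight UTC (hypothetical helper, not zipline API)."""
    if isinstance(day, datetime):
        # Keep only the calendar date; drop the time of day and any timezone.
        day = day.date()
    return datetime.combine(day, time.min, tzinfo=timezone.utc)


naive = datetime(2014, 1, 2, 15, 30)  # made-up timestamp with no timezone
label = to_midnight_utc(naive)

print(label.isoformat())  # 2014-01-02T00:00:00+00:00
```

Note that in Python a timezone-naive midnight and a UTC-labelled midnight do not compare equal, so the two kinds of labels should not be mixed within one index.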
Hi @marketneutral and @ssanderson, I have been using the code in this thread as a starting point to include fundamental data in a backtest algorithm. Hopefully you can help me understand the errors coming from
However, the following code (in an IPython cell below the previous code) produces the following errors:
Error 1
Error 2
It seems that when you use the Can you please help me understand how to overcome these errors? |
@calmitchell617 the error you're getting there is happening because your algorithm doesn't know anything about your Internally,

    pipeline_loader = USEquityPricingLoader(
        bundle_data.equity_daily_bar_reader,
        bundle_data.adjustment_reader,
    )

    def choose_loader(column):
        if column in USEquityPricing.columns:
            return pipeline_loader
        raise ValueError(
            "No PipelineLoader registered for column %s." % column
        )

(source)

The easiest short-term fix for this is probably to add a new optional In the medium term, it'd be nice to allow people to register their own pipeline dispatch functions as part of the Zipline extension machinery. @llllllllll might have thoughts about what an API for that would look like. |
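The dispatch idea in the comment above, where a function maps each pipeline column to the loader that can serve it, can be sketched without zipline installed. Everything here is a stand-in: `make_choose_loader`, the string column names, and the string "loaders" are hypothetical, and only illustrate how a custom loader could be consulted before the error is raised.

```python
def make_choose_loader(pricing_columns, pricing_loader, extra_loaders=None):
    """Build a column -> loader dispatch function (illustrative, not zipline API)."""
    extra_loaders = dict(extra_loaders or {})

    def choose_loader(column):
        # Pricing columns keep going to the pricing loader, as before.
        if column in pricing_columns:
            return pricing_loader
        # Any other column is looked up in the user-supplied mapping.
        if column in extra_loaders:
            return extra_loaders[column]
        raise ValueError("No PipelineLoader registered for column %s." % column)

    return choose_loader


# Plain strings stand in for real column and loader objects.
choose_loader = make_choose_loader(
    pricing_columns={"close", "volume"},
    pricing_loader="pricing-loader",
    extra_loaders={"my_fundamental": "custom-loader"},
)

print(choose_loader("close"))           # pricing-loader
print(choose_loader("my_fundamental"))  # custom-loader
```

Keeping the extra loaders in a mapping means more than one external column can be registered without editing the dispatch function itself.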
Thank you Scott, that worked. For anyone wondering what I did, here is a brief summary of the changes I made to load external data, using the In zipline/utils/run_algo.py, I changed
I also changed the
I then changed the
Lastly, I changed the nested
As @ssanderson said, this is a short-term solution, and it only allows you to load one column of external data, but it could definitely be expanded upon. Here is the part of the IPython notebook where I call
|
@calmitchell617 glad that worked for you. Would you be interested in putting together a PR to update |
I would be happy to give it a shot, will follow up early next week. |
That would be great, Cal. Your code helped me as well. If you need to generate earnings data, I made a script that does that and creates files for every bundle you have registered. It is not production ready, but it is useful if you want to test with dates. https://github.com/peterfabakker/zipline-utils/blob/master/getearnings.py |
Hey @ssanderson, I've been trying to create a minimal pipeline example to get data via the blaze loader, similar to the minimal
The test I can see from your tests is
And now on to get the engine going
and on to running the pipeline
This returns a
If you've gotten this far, thank you 😃 ... I feel like I am close here! Thanks in advance for any pointers. |
hey @marketneutral. I won't have time to write up a full reply today, but take a look at the module docs for |
@marketneutral are you sure those are running on the same database? If I run your script and look at the schema on the sqlite CLI, I see:
which is what I would expect given that you're calling |
Yes, thank you. That was the genesis of the error. Now I |
Hey @ssanderson This is my complete working minimal example: https://github.com/marketneutral/research-tools/blob/master/pipeline-blaze-minimal.ipynb |
Awesome! One thing to be mindful of if you're using the blaze loader is that the expected semantics for In our experience, these semantics are generally what you want for things like pricing and fundamental data, but they can be "off by one" from what you might expect for other datasets, so it's important to be mindful of date-labelling conventions when you're setting up your data. |
Hi @marketneutral thank you for putting together this minimal example, it is very helpful. I'm having some trouble running it though. Do you have a dependency list that you know this works with? |
@RaymondMcT I've spent a little (very little) time refactoring this into a Python package with a proper |
I hear ya. If I get it going, I'll report back with a list of working dependencies. |
hey @RaymondMcT, try this https://github.com/marketneutral/alphatools and please lmk. |
@RaymondMcT can you file an issue here so we don't pollute this thread with non-value-added things for the fine Quantopian folks? |
@marketneutral Hi, I have run your code, it works. But when I change the dates, I get the following error: Traceback (most recent call last): |
This means your date does not exist in the history. |
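One way to guard against this is to check the requested dates against the sessions actually present in the data before running. A minimal sketch, assuming the available sessions can be listed as sorted `date` objects; `clamp_to_sessions` and the sample sessions are hypothetical and not part of zipline.

```python
from bisect import bisect_left, bisect_right
from datetime import date


def clamp_to_sessions(sessions, start, end):
    """Return the first and last available session inside [start, end].

    `sessions` must be sorted in ascending order (hypothetical helper).
    """
    lo = bisect_left(sessions, start)   # index of first session >= start
    hi = bisect_right(sessions, end)    # one past the last session <= end
    if lo >= hi:
        raise ValueError("No data between %s and %s." % (start, end))
    return sessions[lo], sessions[hi - 1]


# Made-up sessions standing in for the dates available in the loaded data.
sessions = [date(2016, 1, 4), date(2016, 1, 5), date(2016, 1, 6), date(2016, 1, 7)]

first, last = clamp_to_sessions(sessions, date(2016, 1, 1), date(2016, 1, 6))
print(first, last)  # 2016-01-04 2016-01-06
```

Raising early with the requested range in the message tends to be easier to diagnose than a lookup failure deep inside the run.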
This question was originally posted on zipline's Google group:
There are some posts on quantopian.com introducing Pipeline in the online algorithm environment, but I am wondering how to modify an algorithm to run Pipeline offline.
Can anyone provide a short example? I would really appreciate any help you could provide.