Data source that uses a local hdf5 file #110

Closed

ehebert opened this issue Mar 15, 2013 · 10 comments

Comments

@ehebert (Contributor) commented Mar 15, 2013

Analogous to #108, provide a wrapper for using a local hdf5 file(s) as a data source.
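
(For a rough sense of the shape this could take: a minimal sketch, assuming the hdf5 file was written with pandas' HDFStore and reusing the existing DataFrame-based run path; the file path, key, and callbacks below are placeholders, not a proposed API.)

```python
# Hypothetical sketch only: read a local hdf5 file into a DataFrame and feed it to
# the existing DataFrame run path. 'prices.h5' and the 'prices' key are placeholders.
import pandas as pd
from zipline.algorithm import TradingAlgorithm

def initialize(context):
    pass

def handle_data(context, data):
    pass

prices = pd.read_hdf('prices.h5', 'prices')   # date-indexed DataFrame of prices
algo = TradingAlgorithm(initialize=initialize, handle_data=handle_data)
perf = algo.run(prices)
```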

@benmccann (Contributor)

+1

I would like to be able to cache quotes retrieved from Yahoo (or other sources) instead of fetching them on every run of the algorithm. Along with hdf5 support, I would also like load_from_yahoo to take an existing DataFrame and be able to add stocks to it. Also, I'm not really familiar with DataFrame yet, but I wonder whether storing every stock in the same DataFrame or hdf5 file is the best way to go. If the data sets grow very large (e.g. a thousand stocks with per-second quotes), will this become difficult to deal with? Does a DataFrame need to fit into memory?
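
(As an illustration of the caching idea: a sketch only, assuming load_from_yahoo returns a date-indexed DataFrame of prices; the cache path and key are made up.)

```python
# Hypothetical caching sketch: fetch from Yahoo once, reuse the local hdf5 copy afterwards.
import os
import pandas as pd
from zipline.utils.factory import load_from_yahoo

CACHE_PATH = 'yahoo_cache.h5'   # placeholder file name
CACHE_KEY = 'prices'            # placeholder hdf5 key

def load_prices(stocks, start, end):
    """Return cached prices if present, otherwise fetch from Yahoo and cache them."""
    if os.path.exists(CACHE_PATH):
        return pd.read_hdf(CACHE_PATH, CACHE_KEY)
    prices = load_from_yahoo(stocks=stocks, start=start, end=end)
    prices.to_hdf(CACHE_PATH, CACHE_KEY)  # persist for subsequent runs
    return prices
```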

@ehebert (Contributor, Author) commented Apr 2, 2013

@benmccann, I agree, caching the Yahoo data would be a great improvement.
(With regard to this ticket, I think hdf5 would be a good format for that cache.)

Would the existing DataFrame you would like to pass to load_from_yahoo contain OHLCV data, or other types of data?

A DataFrame does need to fit into memory.
There is a thread here, https://groups.google.com/forum/?fromgroups=&hl=en#!topic/zipline/fLojh3EfJp0, about using PyTables directly (which provides a generator from an hdf5 source).
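
(For reference, the generator-style access discussed in that thread looks roughly like this with PyTables; the node name '/quotes' is an assumption.)

```python
# Sketch of streaming rows out of an hdf5 table with PyTables, so the full data set
# never has to be materialized in memory as a single DataFrame.
import tables

def iter_quotes(path, node='/quotes'):
    """Yield rows from an hdf5 table one at a time as plain dicts."""
    with tables.open_file(path, mode='r') as h5:
        table = h5.get_node(node)
        colnames = table.colnames
        for row in table.iterrows():
            yield {name: row[name] for name in colnames}
```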

@benmccann (Contributor)

My thought with passing a DataFrame to load_from_yahoo was that if I loaded 10 securities into a DataFrame by calling load_from_yahoo and then wanted to add another 10 to my dataset, there's no real way to do that right now.
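
(The closest workaround today is to do the merge by hand in pandas, e.g. the sketch below, which assumes load_from_yahoo returns a date-indexed DataFrame with one column per symbol; the symbols and dates are placeholders.)

```python
# Rough sketch of the manual workaround: fetch the second batch separately and
# glue its columns onto the existing DataFrame.
from datetime import datetime
import pandas as pd
import pytz
from zipline.utils.factory import load_from_yahoo

start = datetime(2012, 1, 1, tzinfo=pytz.utc)
end = datetime(2012, 12, 31, tzinfo=pytz.utc)

first = load_from_yahoo(stocks=['AAPL', 'MSFT'], start=start, end=end)
more = load_from_yahoo(stocks=['IBM', 'ORCL'], start=start, end=end)

combined = pd.concat([first, more], axis=1)  # align on the shared date index
```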

@MichaelWS (Contributor)

Something like this works, with each node being a date in ISO format:

https://gist.github.com/MichaelWS/e5eb873e32b089a4487e
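
(The gist is not reproduced here; as a rough illustration of the one-node-per-ISO-date layout, something like the following with pandas' HDFStore. The function names and columns are guesses, not the gist's actual code.)

```python
# Illustrative only: store one DataFrame of quotes per trading day, keyed by the
# ISO date string. PyTables will warn that such keys are not valid Python
# identifiers, but string-keyed access still works.
import pandas as pd

def write_day(store_path, date, quotes):
    """Store one day's quotes under a node named after the ISO date."""
    with pd.HDFStore(store_path) as store:
        store.put('/%s' % date.isoformat(), quotes)

def read_day(store_path, date):
    with pd.HDFStore(store_path) as store:
        return store['/%s' % date.isoformat()]
```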

@ehebert (Contributor, Author) commented Jul 8, 2013

Michael:

Apologies for the delay in following up. The gist link appears to be dead.

- Eddie

@MichaelWS (Contributor)

Sorry about that. This should work:

https://gist.github.com/MichaelWS/e5eb873e32b089a4487e

@MichaelWS (Contributor)

Here's a pull request to fix this. I had to throw something together for a friend, so I figured I would contribute it back.
#244

@llllllllll (Contributor)

@ehebert, has this been addressed?

@ehebert (Contributor, Author) commented Oct 30, 2015

It has not, but we could make a wrapper for BcolzDailyBarWriter that reformats an hdf5 file to bcolz (which is a quick ctable.fromhdf5(table_path).copy(rootdir=output_path)).

With the incoming changes on the lazy-mainline branch, the backtest data (not just pipeline) will be sourced from files created by the BcolzDailyBarWriter and BcolzMinuteBarWriter classes.
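
(To make that concrete, the conversion step alone would be on the order of the sketch below; the paths are placeholders, and wiring the result into BcolzDailyBarWriter is the part that would still need design.)

```python
# Sketch of the hdf5 -> bcolz copy mentioned above.
import bcolz

def hdf5_to_bcolz(table_path, output_path, nodepath='/ctable'):
    """Copy an hdf5 table into an on-disk bcolz ctable rooted at output_path."""
    return bcolz.ctable.fromhdf5(table_path, nodepath=nodepath).copy(rootdir=output_path)
```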

@llllllllll (Contributor)

I think we have settled on bcolz as the internal format for zipline. With the new data bundle changes, we also support caching Yahoo data instead of downloading it on each run.
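
(For anyone landing here later, a sketch of the bundle-based caching as I understand the then-new machinery; the bundle name and symbols are placeholders, and yahoo_equities is assumed to be the bundle factory available at this time.)

```python
# Placed in ~/.zipline/extension.py; bundle name and symbols are placeholders.
from zipline.data.bundles import register, yahoo_equities

register('my-yahoo-bundle', yahoo_equities(('AAPL', 'MSFT', 'IBM')))
```

Running `zipline ingest -b my-yahoo-bundle` then downloads and caches the data once, and subsequent backtests can pass `--bundle my-yahoo-bundle` instead of re-downloading.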
