-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Implement csvdir bundle #1860
Conversation
71a93a9
to
2cbfa9b
Compare
da7c3cb
to
2340d1a
Compare
83cbf8c
to
5f7b851
Compare
@freddiev4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work @bartosh! I have some feedback for you below 🙂
zipline/data/bundles/csvdir.py
Outdated
|
||
Parameters | ||
---------- | ||
tframe: tuple, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small typo here. Looks like it should be tframes
@@ -1,5 +1,7 @@ | |||
# These imports are necessary to force module-scope register calls to happen. | |||
from . import quandl # noqa | |||
from . import csvdir # noqa | |||
|
|||
from .core import ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also add csvdir
to __all__
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should as quandl is not there.
zipline/data/bundles/csvdir.py
Outdated
|
||
from . import core as bundles | ||
|
||
logger = logbook.Logger(__name__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can make the output from the logger a little cleaner by removing the timestamp and log level. We can do this with a CustomHandler
handler = StreamHandler(sys.stdout, format_string=" | {record.message}")
logger = Logger(__name__)
logger.handlers.append(handler)
Output before hander:
Loading custom pricing data: [#########---------------------------] 25%[2017-10-19 18:00:46.369496] INFO: zipline.data.bundles.csvdir: AAPL: sid 0
Loading custom pricing data: [##################------------------] 50%[2017-10-19 18:00:46.405959] INFO: zipline.data.bundles.csvdir: IBM: sid 1
Loading custom pricing data: [###########################---------] 75%[2017-10-19 18:00:46.447738] INFO: zipline.data.bundles.csvdir: KO: sid 2
Loading custom pricing data: [####################################] 100%[2017-10-19 18:00:46.481477] INFO: zipline.data.bundles.csvdir: MSFT: sid 3
Loading custom pricing data: [####################################] 100%
Merging daily equity files: [####################################]
Output after handler
Loading custom pricing data: [#########---------------------------] 25% | AAPL: sid 0
Loading custom pricing data: [##################------------------] 50% | IBM: sid 1
Loading custom pricing data: [###########################---------] 75% | KO: sid 2
Loading custom pricing data: [####################################] 100% | MSFT: sid 3
Loading custom pricing data: [####################################] 100%
Merging daily equity files: [####################################]
If we want to actually log this data that might require some other Handler and a default directory in which to put those logs (like ~/.zipline/logs/
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. Thank you for the suggestion!
zipline/data/bundles/csvdir.py
Outdated
------- | ||
ingest : callable | ||
The bundle ingest function | ||
Examples |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Newline above Examples
# Hardcode the exchange to "CSVDIR" for all assets and (elsewhere) | ||
# register "CSVDIR" to resolve to the NYSE calendar, because these | ||
# are all equities and thus can use the NYSE calendar. | ||
metadata['exchange'] = "CSVDIR" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can leave this for another PR, as long as we document this somewhere in the code with a TODO
Users can pass in their own trading calendar using the --trading-calendar
flag; if they want to make a bundle with their own csv data, the dates in that bundle might not map correctly to the dates in the NYSE
calendar.
I think the solution to this is to change the calendar in zipline.data.bundles.core
in our register()
function
@curry
def register(name,
f,
calendar_name='NYSE',
start_session=None,
end_session=None,
minutes_per_day=390,
create_writers=True):
And add an option to zipline ingest
that works in the same way that we pass --trading-calendar
to zipline run
. #1800 shows how we do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented csvdir_equities function especially for this purpose. It can be used in extension.py to use another calendar or other custom parameters, e.g.
register('csvdir-custom', csvdir_equities(['daily', 'minute'],
"/Users/test/csvdir"),
calendar_name='CUSTOM', minutes_per_day=24*60)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good call. That slipped past me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code you share the code of your 'CUSTOM' calendar? :)
register('custom-csvdir-bundle', | ||
csvdir_equities(["daily", "minute"], | ||
'/full/path/to/the/csvdir/directory')) | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe not in this PR but I think we should add an example of how this works, with expected output and such; the best place for that would be http://www.zipline.io/bundles.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the suggestion. I'll create pr for that.
for tframe in tframes: | ||
ddir = os.path.join(csvdir, tframe) | ||
|
||
symbols = sorted(item.split('.csv')[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for consistent symbol order
tests/data/bundles/test_csvdir.py
Outdated
def per_symbol(symbol): | ||
df = pd.read_csv( | ||
test_resource_path('csvdir_samples', 'csvdir', | ||
'daily', symbol + '.csv'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we gzip
these files?
1 similar comment
@bartosh can you gzip the |
Sure, will do. |
445c683
to
1b51ace
Compare
@freddiev4 Thank you for asking me to gzip test data. As a nice side effect of this the bundle now supports compressed csv files! @freddiev4 AppVeyor failure is not caused by my changes I believe:
Let me know when it's fixed in master. I'll rebase my branch. |
This should be good to merge 👍 don't worry about the Py34 build failure here as we removed Py34 package builds on master. |
However, could you actually rebase your branch @bartosh? As it looks like I can't merge it b/c of that failure in particular. |
Only Python 3.4 build failed. It still looks like infra error or something else not related to this PR. Can you rebuild it again? |
OK, I'll rebase it. |
The bundle for loading historical price data from the set of .csv files. It supports daily and minute time frames.
rebased |
@freddiev4 thank you for merging it! |
Thanks for your awesome work! @bartosh |
This bundle is for loading historical price data from the
set of .csv files. It supports daily and minute time frames.