New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Implement csvdir bundle #1860

Merged
merged 2 commits into from Nov 9, 2017

Conversation

Projects
None yet
4 participants
@bartosh
Contributor

bartosh commented Jun 26, 2017

This bundle is for loading historical price data from the
set of .csv files. It supports daily and minute time frames.

@bartosh bartosh force-pushed the bartosh:csvdir branch from bfad4c2 to ca076c7 Jun 29, 2017

@coveralls

This comment has been minimized.

coveralls commented Jun 29, 2017

Coverage Status

Coverage decreased (-0.4%) to 87.281% when pulling ca076c7 on bartosh:csvdir into 5e6110e on quantopian:master.

@bartosh bartosh force-pushed the bartosh:csvdir branch from ca076c7 to 5c18613 Jun 30, 2017

@coveralls

This comment has been minimized.

coveralls commented Jun 30, 2017

Coverage Status

Coverage decreased (-0.4%) to 87.281% when pulling 5c18613 on bartosh:csvdir into 5e6110e on quantopian:master.

@bartosh bartosh force-pushed the bartosh:csvdir branch 2 times, most recently from 71a93a9 to 2cbfa9b Sep 24, 2017

@coveralls

This comment has been minimized.

coveralls commented Sep 24, 2017

Coverage Status

Coverage decreased (-0.04%) to 87.16% when pulling 2cbfa9b on bartosh:csvdir into 64af3d3 on quantopian:master.

@bartosh bartosh force-pushed the bartosh:csvdir branch from 2cbfa9b to 17ff27c Sep 24, 2017

@coveralls

This comment has been minimized.

coveralls commented Sep 24, 2017

Coverage Status

Coverage decreased (-0.04%) to 87.16% when pulling 17ff27c on bartosh:csvdir into 64af3d3 on quantopian:master.

@bartosh bartosh force-pushed the bartosh:csvdir branch from 17ff27c to 6255331 Sep 24, 2017

@coveralls

This comment has been minimized.

coveralls commented Sep 24, 2017

Coverage Status

Coverage increased (+0.03%) to 87.232% when pulling 6255331 on bartosh:csvdir into 64af3d3 on quantopian:master.

@bartosh bartosh changed the title from implement csvdir bundle to ENH: implement csvdir bundle Sep 25, 2017

@bartosh bartosh changed the title from ENH: implement csvdir bundle to ENH: Implement csvdir bundle Sep 25, 2017

@bartosh bartosh force-pushed the bartosh:csvdir branch 2 times, most recently from da7c3cb to 2340d1a Sep 27, 2017

@coveralls

This comment has been minimized.

coveralls commented Sep 27, 2017

Coverage Status

Coverage increased (+0.01%) to 87.215% when pulling 2340d1a on bartosh:csvdir into fbdfaf8 on quantopian:master.

@coveralls

This comment has been minimized.

coveralls commented Sep 27, 2017

Coverage Status

Coverage increased (+0.01%) to 87.215% when pulling 2340d1a on bartosh:csvdir into fbdfaf8 on quantopian:master.

@bartosh bartosh force-pushed the bartosh:csvdir branch from 2340d1a to 21337b7 Sep 27, 2017

@coveralls

This comment has been minimized.

coveralls commented Sep 27, 2017

Coverage Status

Coverage increased (+0.01%) to 87.215% when pulling 21337b7 on bartosh:csvdir into fbdfaf8 on quantopian:master.

@bartosh bartosh force-pushed the bartosh:csvdir branch 4 times, most recently from 83cbf8c to 5f7b851 Oct 14, 2017

@bartosh

This comment has been minimized.

Contributor

bartosh commented Oct 17, 2017

@freddiev4 FWIW, my plan is to review this sometime this week 🙂
Thank you. That would be great to have it reviewed!

@freddiev4

Awesome work @bartosh! I have some feedback for you below 🙂

Parameters
----------
tframe: tuple, optional

This comment has been minimized.

@freddiev4

freddiev4 Oct 21, 2017

Contributor

Small typo here. Looks like it should be tframes

@@ -1,5 +1,7 @@
# These imports are necessary to force module-scope register calls to happen.
from . import quandl # noqa
from . import csvdir # noqa
from .core import (

This comment has been minimized.

@freddiev4

freddiev4 Oct 21, 2017

Contributor

Should we also add csvdir to __all__?

This comment has been minimized.

@bartosh

bartosh Oct 23, 2017

Contributor

I don't think we should as quandl is not there.

from . import core as bundles
logger = logbook.Logger(__name__)

This comment has been minimized.

@freddiev4

freddiev4 Oct 21, 2017

Contributor

I think we can make the output from the logger a little cleaner by removing the timestamp and log level. We can do this with a CustomHandler

handler = StreamHandler(sys.stdout, format_string=" | {record.message}")
logger = Logger(__name__)
logger.handlers.append(handler)

Output before hander:

Loading custom pricing data:   [#########---------------------------]   25%[2017-10-19 18:00:46.369496] INFO: zipline.data.bundles.csvdir: AAPL: sid 0
Loading custom pricing data:   [##################------------------]   50%[2017-10-19 18:00:46.405959] INFO: zipline.data.bundles.csvdir: IBM: sid 1
Loading custom pricing data:   [###########################---------]   75%[2017-10-19 18:00:46.447738] INFO: zipline.data.bundles.csvdir: KO: sid 2
Loading custom pricing data:   [####################################]  100%[2017-10-19 18:00:46.481477] INFO: zipline.data.bundles.csvdir: MSFT: sid 3
Loading custom pricing data:   [####################################]  100%
Merging daily equity files:  [####################################]

Output after handler

Loading custom pricing data:   [#########---------------------------]   25% | AAPL: sid 0
Loading custom pricing data:   [##################------------------]   50% | IBM: sid 1
Loading custom pricing data:   [###########################---------]   75% | KO: sid 2
Loading custom pricing data:   [####################################]  100% | MSFT: sid 3
Loading custom pricing data:   [####################################]  100%
Merging daily equity files:  [####################################] 

If we want to actually log this data that might require some other Handler and a default directory in which to put those logs (like ~/.zipline/logs/)

This comment has been minimized.

@bartosh

bartosh Oct 23, 2017

Contributor

done. Thank you for the suggestion!

-------
ingest : callable
The bundle ingest function
Examples

This comment has been minimized.

@freddiev4

freddiev4 Oct 21, 2017

Contributor

Newline above Examples

# Hardcode the exchange to "CSVDIR" for all assets and (elsewhere)
# register "CSVDIR" to resolve to the NYSE calendar, because these
# are all equities and thus can use the NYSE calendar.
metadata['exchange'] = "CSVDIR"

This comment has been minimized.

@freddiev4

freddiev4 Oct 21, 2017

Contributor

I think we can leave this for another PR, as long as we document this somewhere in the code with a TODO

Users can pass in their own trading calendar using the --trading-calendar flag; if they want to make a bundle with their own csv data, the dates in that bundle might not map correctly to the dates in the NYSE calendar.

I think the solution to this is to change the calendar in zipline.data.bundles.core in our register() function

@curry
def register(name,
             f,
             calendar_name='NYSE',
             start_session=None,
             end_session=None,
             minutes_per_day=390,
             create_writers=True):

And add an option to zipline ingest that works in the same way that we pass --trading-calendar to zipline run. #1800 shows how we do that.

This comment has been minimized.

@bartosh

bartosh Oct 23, 2017

Contributor

I implemented csvdir_equities function especially for this purpose. It can be used in extension.py to use another calendar or other custom parameters, e.g.

register('csvdir-custom', csvdir_equities(['daily', 'minute'], 
              "/Users/test/csvdir"),
             calendar_name='CUSTOM', minutes_per_day=24*60)

This comment has been minimized.

@freddiev4

freddiev4 Oct 27, 2017

Contributor

Ah good call. That slipped past me.

register('custom-csvdir-bundle',
csvdir_equities(["daily", "minute"],
'/full/path/to/the/csvdir/directory'))
"""

This comment has been minimized.

@freddiev4

freddiev4 Oct 21, 2017

Contributor

Maybe not in this PR but I think we should add an example of how this works, with expected output and such; the best place for that would be http://www.zipline.io/bundles.html

This comment has been minimized.

@bartosh

bartosh Oct 23, 2017

Contributor

Thank you for the suggestion. I'll create pr for that.

for tframe in tframes:
ddir = os.path.join(csvdir, tframe)
symbols = sorted(item.split('.csv')[0]

This comment has been minimized.

@freddiev4

freddiev4 Oct 21, 2017

Contributor

👍 for consistent symbol order

def per_symbol(symbol):
df = pd.read_csv(
test_resource_path('csvdir_samples', 'csvdir',
'daily', symbol + '.csv'),

This comment has been minimized.

@freddiev4

freddiev4 Oct 21, 2017

Contributor

Can we gzip these files?

@bartosh bartosh force-pushed the bartosh:csvdir branch from 5f7b851 to d0dcd2d Oct 23, 2017

@coveralls

This comment has been minimized.

coveralls commented Oct 23, 2017

Coverage Status

Coverage increased (+0.02%) to 87.237% when pulling d0dcd2d on bartosh:csvdir into 0770d3b on quantopian:master.

1 similar comment
@coveralls

This comment has been minimized.

coveralls commented Oct 23, 2017

Coverage Status

Coverage increased (+0.02%) to 87.237% when pulling d0dcd2d on bartosh:csvdir into 0770d3b on quantopian:master.

@freddiev4

This comment has been minimized.

Contributor

freddiev4 commented Oct 27, 2017

@bartosh can you gzip the csv files?

@bartosh

This comment has been minimized.

Contributor

bartosh commented Oct 27, 2017

Sure, will do.

@bartosh bartosh force-pushed the bartosh:csvdir branch 3 times, most recently from 445c683 to 1b51ace Nov 5, 2017

@coveralls

This comment has been minimized.

coveralls commented Nov 5, 2017

Coverage Status

Coverage increased (+0.006%) to 87.248% when pulling 1b51ace on bartosh:csvdir into 856313c on quantopian:master.

@bartosh bartosh force-pushed the bartosh:csvdir branch from 1b51ace to 688aa2b Nov 5, 2017

@coveralls

This comment has been minimized.

coveralls commented Nov 5, 2017

Coverage Status

Coverage increased (+0.006%) to 87.248% when pulling 688aa2b on bartosh:csvdir into 856313c on quantopian:master.

@coveralls

This comment has been minimized.

coveralls commented Nov 5, 2017

Coverage Status

Coverage decreased (-0.003%) to 87.239% when pulling 688aa2b on bartosh:csvdir into 856313c on quantopian:master.

@bartosh

This comment has been minimized.

Contributor

bartosh commented Nov 6, 2017

@freddiev4 Thank you for asking me to gzip test data. As a nice side effect of this the bundle now supports compressed csv files!

@freddiev4 AppVeyor failure is not caused by my changes I believe:

WARNING: could not import _license.show_info
# try:
# $ conda install -n root _license
conda install conda=4.1.11 conda-build=1.21.11 anaconda-client=1.5.1 --yes -q
Fetching package metadata: ..
Solving package specifications: 
Error: Could not find some dependencies for conda 4.1.11*: vc 9.*, vc 10.*, vc 14.*

Let me know when it's fixed in master. I'll rebase my branch.

@freddiev4

This comment has been minimized.

Contributor

freddiev4 commented Nov 9, 2017

This should be good to merge 👍 don't worry about the Py34 build failure here as we removed Py34 package builds on master.

@freddiev4

This comment has been minimized.

Contributor

freddiev4 commented Nov 9, 2017

However, could you actually rebase your branch @bartosh? As it looks like I can't merge it b/c of that failure in particular.

@bartosh

This comment has been minimized.

Contributor

bartosh commented Nov 9, 2017

Only Python 3.4 build failed. It still looks like infra error or something else not related to this PR. Can you rebuild it again?

@bartosh

This comment has been minimized.

Contributor

bartosh commented Nov 9, 2017

OK, I'll rebase it.

bartosh added some commits Jun 26, 2017

ENH: implement csvdir bundle
The bundle for loading historical price data from the
set of .csv files. It supports daily and minute time frames.

@bartosh bartosh force-pushed the bartosh:csvdir branch from 688aa2b to 30c2431 Nov 9, 2017

@bartosh

This comment has been minimized.

Contributor

bartosh commented Nov 9, 2017

rebased

@coveralls

This comment has been minimized.

coveralls commented Nov 9, 2017

Coverage Status

Coverage increased (+0.006%) to 87.239% when pulling 30c2431 on bartosh:csvdir into 505c191 on quantopian:master.

@freddiev4 freddiev4 merged commit 67d983b into quantopian:master Nov 9, 2017

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@bartosh

This comment has been minimized.

Contributor

bartosh commented Nov 9, 2017

@freddiev4 thank you for merging it!

@freddiev4

This comment has been minimized.

Contributor

freddiev4 commented Nov 9, 2017

Thanks for your awesome work! @bartosh

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment