Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Implement csvdir bundle #1860

Merged
merged 2 commits into from
Nov 9, 2017
Merged

ENH: Implement csvdir bundle #1860

merged 2 commits into from
Nov 9, 2017

Conversation

bartosh
Copy link
Contributor

@bartosh bartosh commented Jun 26, 2017

This bundle is for loading historical price data from the
set of .csv files. It supports daily and minute time frames.

@coveralls
Copy link

coveralls commented Jun 29, 2017

Coverage Status

Coverage decreased (-0.4%) to 87.281% when pulling ca076c7 on bartosh:csvdir into 5e6110e on quantopian:master.

@coveralls
Copy link

coveralls commented Jun 30, 2017

Coverage Status

Coverage decreased (-0.4%) to 87.281% when pulling 5c18613 on bartosh:csvdir into 5e6110e on quantopian:master.

@bartosh bartosh force-pushed the csvdir branch 2 times, most recently from 71a93a9 to 2cbfa9b Compare September 24, 2017 19:34
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.04%) to 87.16% when pulling 2cbfa9b on bartosh:csvdir into 64af3d3 on quantopian:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.04%) to 87.16% when pulling 17ff27c on bartosh:csvdir into 64af3d3 on quantopian:master.

@coveralls
Copy link

coveralls commented Sep 24, 2017

Coverage Status

Coverage increased (+0.03%) to 87.232% when pulling 6255331 on bartosh:csvdir into 64af3d3 on quantopian:master.

@bartosh bartosh changed the title implement csvdir bundle ENH: implement csvdir bundle Sep 25, 2017
@bartosh bartosh changed the title ENH: implement csvdir bundle ENH: Implement csvdir bundle Sep 25, 2017
@bartosh bartosh force-pushed the csvdir branch 2 times, most recently from da7c3cb to 2340d1a Compare September 27, 2017 08:33
@coveralls
Copy link

coveralls commented Sep 27, 2017

Coverage Status

Coverage increased (+0.01%) to 87.215% when pulling 2340d1a on bartosh:csvdir into fbdfaf8 on quantopian:master.

@coveralls
Copy link

coveralls commented Sep 27, 2017

Coverage Status

Coverage increased (+0.01%) to 87.215% when pulling 2340d1a on bartosh:csvdir into fbdfaf8 on quantopian:master.

@coveralls
Copy link

coveralls commented Sep 27, 2017

Coverage Status

Coverage increased (+0.01%) to 87.215% when pulling 21337b7 on bartosh:csvdir into fbdfaf8 on quantopian:master.

@bartosh
Copy link
Contributor Author

bartosh commented Oct 17, 2017

@freddiev4 FWIW, my plan is to review this sometime this week 🙂
Thank you. That would be great to have it reviewed!

Copy link
Contributor

@freddiev4 freddiev4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work @bartosh! I have some feedback for you below 🙂


Parameters
----------
tframe: tuple, optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small typo here. Looks like it should be tframes

@@ -1,5 +1,7 @@
# These imports are necessary to force module-scope register calls to happen.
from . import quandl # noqa
from . import csvdir # noqa

from .core import (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also add csvdir to __all__?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should as quandl is not there.


from . import core as bundles

logger = logbook.Logger(__name__)
Copy link
Contributor

@freddiev4 freddiev4 Oct 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can make the output from the logger a little cleaner by removing the timestamp and log level. We can do this with a CustomHandler

handler = StreamHandler(sys.stdout, format_string=" | {record.message}")
logger = Logger(__name__)
logger.handlers.append(handler)

Output before hander:

Loading custom pricing data:   [#########---------------------------]   25%[2017-10-19 18:00:46.369496] INFO: zipline.data.bundles.csvdir: AAPL: sid 0
Loading custom pricing data:   [##################------------------]   50%[2017-10-19 18:00:46.405959] INFO: zipline.data.bundles.csvdir: IBM: sid 1
Loading custom pricing data:   [###########################---------]   75%[2017-10-19 18:00:46.447738] INFO: zipline.data.bundles.csvdir: KO: sid 2
Loading custom pricing data:   [####################################]  100%[2017-10-19 18:00:46.481477] INFO: zipline.data.bundles.csvdir: MSFT: sid 3
Loading custom pricing data:   [####################################]  100%
Merging daily equity files:  [####################################]

Output after handler

Loading custom pricing data:   [#########---------------------------]   25% | AAPL: sid 0
Loading custom pricing data:   [##################------------------]   50% | IBM: sid 1
Loading custom pricing data:   [###########################---------]   75% | KO: sid 2
Loading custom pricing data:   [####################################]  100% | MSFT: sid 3
Loading custom pricing data:   [####################################]  100%
Merging daily equity files:  [####################################] 

If we want to actually log this data that might require some other Handler and a default directory in which to put those logs (like ~/.zipline/logs/)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. Thank you for the suggestion!

-------
ingest : callable
The bundle ingest function
Examples
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Newline above Examples

# Hardcode the exchange to "CSVDIR" for all assets and (elsewhere)
# register "CSVDIR" to resolve to the NYSE calendar, because these
# are all equities and thus can use the NYSE calendar.
metadata['exchange'] = "CSVDIR"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can leave this for another PR, as long as we document this somewhere in the code with a TODO

Users can pass in their own trading calendar using the --trading-calendar flag; if they want to make a bundle with their own csv data, the dates in that bundle might not map correctly to the dates in the NYSE calendar.

I think the solution to this is to change the calendar in zipline.data.bundles.core in our register() function

@curry
def register(name,
             f,
             calendar_name='NYSE',
             start_session=None,
             end_session=None,
             minutes_per_day=390,
             create_writers=True):

And add an option to zipline ingest that works in the same way that we pass --trading-calendar to zipline run. #1800 shows how we do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented csvdir_equities function especially for this purpose. It can be used in extension.py to use another calendar or other custom parameters, e.g.

register('csvdir-custom', csvdir_equities(['daily', 'minute'], 
              "/Users/test/csvdir"),
             calendar_name='CUSTOM', minutes_per_day=24*60)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good call. That slipped past me.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code you share the code of your 'CUSTOM' calendar? :)

register('custom-csvdir-bundle',
csvdir_equities(["daily", "minute"],
'/full/path/to/the/csvdir/directory'))
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not in this PR but I think we should add an example of how this works, with expected output and such; the best place for that would be http://www.zipline.io/bundles.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the suggestion. I'll create pr for that.

for tframe in tframes:
ddir = os.path.join(csvdir, tframe)

symbols = sorted(item.split('.csv')[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for consistent symbol order

def per_symbol(symbol):
df = pd.read_csv(
test_resource_path('csvdir_samples', 'csvdir',
'daily', symbol + '.csv'),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we gzip these files?

@coveralls
Copy link

Coverage Status

Coverage increased (+0.02%) to 87.237% when pulling d0dcd2d on bartosh:csvdir into 0770d3b on quantopian:master.

1 similar comment
@coveralls
Copy link

coveralls commented Oct 23, 2017

Coverage Status

Coverage increased (+0.02%) to 87.237% when pulling d0dcd2d on bartosh:csvdir into 0770d3b on quantopian:master.

@freddiev4
Copy link
Contributor

freddiev4 commented Oct 27, 2017

@bartosh can you gzip the csv files?

@bartosh
Copy link
Contributor Author

bartosh commented Oct 27, 2017

Sure, will do.

@coveralls
Copy link

coveralls commented Nov 5, 2017

Coverage Status

Coverage increased (+0.006%) to 87.248% when pulling 1b51ace on bartosh:csvdir into 856313c on quantopian:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.006%) to 87.248% when pulling 688aa2b on bartosh:csvdir into 856313c on quantopian:master.

@coveralls
Copy link

coveralls commented Nov 5, 2017

Coverage Status

Coverage decreased (-0.003%) to 87.239% when pulling 688aa2b on bartosh:csvdir into 856313c on quantopian:master.

@bartosh
Copy link
Contributor Author

bartosh commented Nov 6, 2017

@freddiev4 Thank you for asking me to gzip test data. As a nice side effect of this the bundle now supports compressed csv files!

@freddiev4 AppVeyor failure is not caused by my changes I believe:

WARNING: could not import _license.show_info
# try:
# $ conda install -n root _license
conda install conda=4.1.11 conda-build=1.21.11 anaconda-client=1.5.1 --yes -q
Fetching package metadata: ..
Solving package specifications: 
Error: Could not find some dependencies for conda 4.1.11*: vc 9.*, vc 10.*, vc 14.*

Let me know when it's fixed in master. I'll rebase my branch.

@freddiev4
Copy link
Contributor

freddiev4 commented Nov 9, 2017

This should be good to merge 👍 don't worry about the Py34 build failure here as we removed Py34 package builds on master.

@freddiev4
Copy link
Contributor

However, could you actually rebase your branch @bartosh? As it looks like I can't merge it b/c of that failure in particular.

@bartosh
Copy link
Contributor Author

bartosh commented Nov 9, 2017

Only Python 3.4 build failed. It still looks like infra error or something else not related to this PR. Can you rebuild it again?

@bartosh
Copy link
Contributor Author

bartosh commented Nov 9, 2017

OK, I'll rebase it.

The bundle for loading historical price data from the
set of .csv files. It supports daily and minute time frames.
@bartosh
Copy link
Contributor Author

bartosh commented Nov 9, 2017

rebased

@coveralls
Copy link

coveralls commented Nov 9, 2017

Coverage Status

Coverage increased (+0.006%) to 87.239% when pulling 30c2431 on bartosh:csvdir into 505c191 on quantopian:master.

@freddiev4 freddiev4 merged commit 67d983b into quantopian:master Nov 9, 2017
@bartosh
Copy link
Contributor Author

bartosh commented Nov 9, 2017

@freddiev4 thank you for merging it!

@freddiev4
Copy link
Contributor

Thanks for your awesome work! @bartosh

@cemal95
Copy link

cemal95 commented Jun 1, 2020

Does anyone know how to solve this problem. Its data from Quandl, EoD data I purchased. But I get this:
Schermafbeelding 2020-06-01 om 16 18 15

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants