New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Update quandl_bundle to use Quandl API v3. #1990

Merged
merged 10 commits into from Nov 20, 2017

Conversation

Projects
None yet
6 participants
@ernestoeperez88
Member

ernestoeperez88 commented Oct 17, 2017

No description provided.

''' Import WIKI Prices data table from Quandl
'''
if show_progress:
print('Downloading WIKI metadata.')

This comment has been minimized.

@prsutherland

prsutherland Oct 20, 2017

Member

Should this use log?

asset_metadata['symbol'] = symbols
data.set_index('symbol', inplace=True)
asset_metadata['start_date'] = \

This comment has been minimized.

@prsutherland

prsutherland Oct 20, 2017

Member

This might be something that can be replaced a groupby on a DataFrame, something like:

asset_metadata = (data.ix[:, ['symbol', 'date']]
           .groupby(by='symbol',)
           .agg({'date': [np.min, np.max]})
           .reset_index())
asset_metadata['start_date'] = asset_metadata.date.amin
asset_metadata['end_date'] = asset_metadata.date.amax
del asset_metadata['date']
asset_metadata.columns = asset_metadata.columns.get_level_values(0)
@coveralls

This comment has been minimized.

coveralls commented Oct 28, 2017

Coverage Status

Coverage increased (+0.006%) to 87.208% when pulling e2cdc2a on quandl_bundle_update into 658d106 on master.

@ernestoeperez88 ernestoeperez88 changed the title from [WIP] Update quandl_bundle to latest version of Quandl API. to Updated quandl bundle to use latest version of Quandl's API. Oct 28, 2017

@ernestoeperez88 ernestoeperez88 requested a review from llllllllll Oct 28, 2017

@prsutherland

A couple of minor comments, the biggest one would be documenting the api key. Otherwise, LGTM

print('Fetching from %s' % url)
response = requests.get(url)
response.raise_for_status()
api_key = 'Replace with Quandl API key.'

This comment has been minimized.

@prsutherland

prsutherland Oct 30, 2017

Member

Might be worth documenting or linking to how to get an API key. On a less important note, parameterizing this so that the user can pass the API key from the commandline would be helpful.

QUANDL_DATA_URL = (

This comment has been minimized.

@prsutherland

prsutherland Oct 30, 2017

Member

Should we just import this from zipline/data/bundles/quandl.py?

@coveralls

This comment has been minimized.

coveralls commented Oct 31, 2017

Coverage Status

Coverage increased (+0.006%) to 87.208% when pulling 442c8ed on quandl_bundle_update into 658d106 on master.

@prsutherland

LGTM

body=quandl_response.read(),
content_type='application/zip',
status=200
)

This comment has been minimized.

@richafrank

richafrank Nov 6, 2017

Member

FWIW I don't think you need to indent all the rest of this, once you've read the quandl_response. You can close the file here, and leave the rest of the indentation unchanged.

@richafrank

Left some thoughts for you about the deprecation.

@@ -391,6 +297,10 @@ def quantopian_quandl_bundle(environ,
cache,
show_progress,
output_dir):
log.warn(
'quantopian-quandl has been deprecated and '

This comment has been minimized.

@richafrank

richafrank Nov 6, 2017

Member

Should we use the warnings module for this?

This comment has been minimized.

@richafrank

richafrank Nov 6, 2017

Member

Also, if this is now deprecated, probably we should change the default ingest bundle to be the quandl bundle itself (in zipline/__main__.py.

@coveralls

This comment has been minimized.

coveralls commented Nov 10, 2017

Coverage Status

Coverage increased (+0.1%) to 87.319% when pulling 93729e9 on quandl_bundle_update into 658d106 on master.

@richafrank

Nice @ernestoeperez88 ! Had just a couple comments.

from zipline.testing.fixtures import ZiplineTestCase
from zipline.testing.fixtures import (
ZiplineTestCase,
WithResponses

This comment has been minimized.

@richafrank

richafrank Nov 17, 2017

Member

Not a blocker, but just to give a style recommendation, you might include a trailing comma on the final item in a list (such as multiple imports) so that the next person to add an element to the list doesn't see a change on this line as well.

This comment has been minimized.

@ernestoeperez88

ernestoeperez88 Nov 17, 2017

Member

Thanks for the tip. I had noticed this in other places, but didn't know what the purpose of it was.

)
cache[key] = raw
warnings.simplefilter('once', DeprecationWarning)

This comment has been minimized.

@richafrank

richafrank Nov 17, 2017

Member

Does this mean that all DeprecationWarnings from any module will only print once? I don't think we want to decide that for users - I think we probably don't want to set up any filters for any modules. What do you think?

This comment has been minimized.

@ernestoeperez88

ernestoeperez88 Nov 17, 2017

Member

I think I misunderstood the correct use of the module on my first pass through the documentation. I found a good explanation here.

def parse_pricing_and_vol(data,
sessions,
symbol_map):
for asset_id, symbol in symbol_map.iteritems():

This comment has been minimized.

@richafrank

richafrank Nov 17, 2017

Member

I think iteritems is gone on python 3.

@richafrank

This comment has been minimized.

Member

richafrank commented Nov 17, 2017

Oh, also could you add commit prefixes to your commit messages?

@ernestoeperez88 ernestoeperez88 force-pushed the quandl_bundle_update branch from 93729e9 to ccfa692 Nov 17, 2017

@coveralls

This comment has been minimized.

coveralls commented Nov 17, 2017

Coverage Status

Coverage increased (+0.02%) to 87.341% when pulling ccfa692 on quandl_bundle_update into 28a41a5 on master.

@ernestoeperez88 ernestoeperez88 changed the title from Updated quandl bundle to use latest version of Quandl's API. to ENH: Update quandl_bundle to use Quandl API v3. Nov 17, 2017

@prsutherland

LGTM too

@richafrank

This comment has been minimized.

Member

richafrank commented Nov 20, 2017

LGTM!

@ernestoeperez88 ernestoeperez88 merged commit 079ea3d into master Nov 20, 2017

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@ernestoeperez88 ernestoeperez88 deleted the quandl_bundle_update branch Nov 20, 2017

@VitalyErmilov

This comment has been minimized.

VitalyErmilov commented Nov 21, 2017

I tried to use the new version of quandl.py, unfortunаtely, it went out of memory on 4GB ram amazon instance when reading df from downloaded cvs file. On 8GB instance everything worked ok. Cаn it be optimized? Mаybe by loаding csv into df in pаrts.

@richafrank

This comment has been minimized.

Member

richafrank commented Nov 21, 2017

Thanks @VitalyErmilov - I opened #2022 to track it.

@westurner

This comment has been minimized.

westurner commented Nov 22, 2017

Dask provides a similar API to Pandas and supports datasets larger than can fit into RAM.

From https://dask.pydata.org/en/latest/dataframe.html :

A Dask DataFrame is a large parallel dataframe composed of many smaller Pandas dataframes, split along the index. These pandas dataframes may live on disk for larger-than-memory computing on a single machine, or on many different machines in a cluster.

@ernestoeperez88

This comment has been minimized.

Member

ernestoeperez88 commented Dec 20, 2017

@VitalyErmilov can you please test the latest master branch on your 4GB instance? and maybe post your findings in #2022

@freddiev4 freddiev4 referenced this pull request Jul 16, 2018

Closed

BUG: fix #1843 #1913

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment