Benchmark downloading is broken #1965

Closed
dmichalowicz opened this issue Sep 27, 2017 · 36 comments

Comments

@dmichalowicz
Contributor

commented Sep 27, 2017

Fix benchmark downloading from Google with pandas-datareader. This issue was originally brought up here.

We now get benchmark data from Google instead of Yahoo, as seen here.

However, it appears that as of only a week or two ago, Google changed the URL from which it serves its financial data, causing pandas-datareader to break. This is also preventing us from rebuilding the test_examples data. (For more info, see the original post above.)

@cyniphile

Contributor

commented Sep 27, 2017

I had this issue too. In the meantime I just switched get_benchmark_returns back to Yahoo, which seems to work.

@MBattagl

commented Sep 27, 2017

Same issue here #1953

@freddiev4

Contributor

commented Sep 28, 2017

@yiorgosn

commented Oct 4, 2017

Quick solution: use a manually downloaded local copy of SPY (Yahoo lets you download the entire history manually). I modified benchmarks.py to look for a local CSV copy instead. I attach the modified benchmarks.py file; it should replace the existing one, so make a copy of the original before you overwrite it. The benchmarks.py file is usually found in %USERPROFILE%\Anaconda3\envs\py34\Lib\site-packages\zipline\data. If you didn't create a separate environment for it, don't specify py34 after envs.

Also make sure that your local directory is reflected in this line in the code:
new_dir = 'c:/Downloaded_csv'

benchmarks.txt
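
For readers who want to see the idea behind the local-CSV workaround, here is a minimal sketch. It is not the attached benchmarks.py. The function name `load_benchmark_returns` is hypothetical, and the `Date` and `Close` column names are assumptions about the layout of a Yahoo price export.

```python
import csv
import io


def load_benchmark_returns(csv_file, price_column="Close"):
    """Return (date, daily_return) pairs from a price CSV.

    Assumes a header row with a 'Date' column holding ISO dates
    (YYYY-MM-DD) and a numeric price column.
    """
    rows = list(csv.DictReader(csv_file))
    # ISO date strings sort chronologically, so a plain sort is enough.
    rows.sort(key=lambda row: row["Date"])
    returns = []
    for previous, current in zip(rows, rows[1:]):
        prev_close = float(previous[price_column])
        close = float(current[price_column])
        returns.append((current["Date"], close / prev_close - 1.0))
    return returns


# Tiny inline sample standing in for the manually downloaded file.
sample = io.StringIO(
    "Date,Open,High,Low,Close,Adj Close,Volume\n"
    "2017-10-02,100.0,101.0,99.0,100.0,100.0,1000\n"
    "2017-10-03,100.0,103.0,99.5,102.0,102.0,1200\n"
    "2017-10-04,102.0,102.5,100.0,100.98,100.98,900\n"
)
benchmark_returns = load_benchmark_returns(sample)
```

With a real download you would pass an open file handle, for example one opened from the directory named in new_dir, instead of the inline sample.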

JoaoAparicio added a commit to JoaoAparicio/zipline that referenced this issue Oct 4, 2017
ENH: --force-reload optional parameter.
zipline run calls function load_market_data. This function has a
cooldown of 1 hour. If it downloaded data less than 1 hour ago it won't
download again.

This has an inconvenient side-effect: if zipline run is executed,
data is not cached, data is downloaded but for some reason data is
still not present after download (like what happens with issue quantopian#1965),
if then the issue is fixed, zipline run can still not be executed until
the cooldown time has expired. This can then lead users to open
new issues which seem related to the original one, but are not (see
for example issue quantopian#1957 and discussion therein).

This commit adds an optional --force-redownload flag to zipline run.
The default is set to false.
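
The cooldown behaviour described in this commit message can be illustrated with a short sketch. This is not zipline's actual load_market_data logic: `should_download` and `COOLDOWN_SECONDS` are hypothetical names, and the only facts taken from the message are the one-hour cooldown and the idea of a flag that bypasses it.

```python
import time

COOLDOWN_SECONDS = 60 * 60  # the one-hour cooldown described in the commit message


def should_download(last_download_time, now=None, force=False):
    """Decide whether benchmark data should be downloaded again.

    last_download_time is a Unix timestamp, or None if nothing
    has been downloaded yet.
    """
    if force:
        # A force flag skips the cooldown entirely.
        return True
    if last_download_time is None:
        return True
    if now is None:
        now = time.time()
    return now - last_download_time >= COOLDOWN_SECONDS
```

Without the flag, a failed download 30 minutes ago blocks a retry; with `force=True` the retry happens immediately.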
JoaoAparicio added a commit to JoaoAparicio/zipline that referenced this issue Oct 17, 2017
ENH: --force-reload optional parameter.

@tanaytrivedi

commented Oct 23, 2017

Hi,
Is there an official solution for running backtests without the system breaking every time because of this benchmark issue? @yiorgosn's solution doesn't work for me; I think you have to do more than just replace the file. I get the exact same failure even with his file. Of course, I have updated the directory in the file to make sure it looks in the right place for the CSV.

Is there a way I can run the backtest without doing a benchmark until it is fixed? Without, that is, ripping up the code and removing any mention of benchmarks.
Thanks

@brian-from-quantrocket

commented Oct 23, 2017

You can try setting the benchmark to an asset that's already in your bundle. For example if running the example algos with AAPL, tell Zipline to use AAPL as your benchmark.

from zipline.api import symbol, set_benchmark

def initialize(context):
    set_benchmark(symbol("AAPL"))

My experience has been that Zipline still downloads the SPY data (limited to a year) but at least refrains from using it in the backtest, and thus the backtest doesn't fail.

@edwardlun

commented Oct 30, 2017

I have the same problem as @tanaytrivedi. I tried @yiorgosn's solution but it still doesn't work. Are there any extra steps needed in addition to replacing benchmarks.py? Thanks a lot.

@Steven-Sakurai

commented Nov 1, 2017

Thanks! @yiorgosn
I simply replaced the file and the benchmark is working fine now.

@alexkojin

commented Dec 19, 2017

Google has changed the URL for its finance data. Instead of http://www.google.com/ you need to use https://finance.google.com/. Open the source code of the pandas-datareader package and change the URLs.

@scotthuang1989

commented Jan 11, 2018

I have a similar issue when I try to run the example buyapple.py. The error message is:

pandas_datareader._utils.RemoteDataError: Unable to read URL: http://www.google.com/finance/historical?q=SPY&output=csv&startdate=Dec+29%2C+1989&enddate=Jan+09%2C+2018

When I open the URL in a web browser, Google gives the following message:

... but your computer or network may be sending automated queries. To protect our users, we can't process your request right now.

It seems Google has some anti-crawler measure that blocks automated data requests.

Does anyone else have a similar issue?

@alexkojin

This comment has been minimized.

@scotthuang1989

commented Jan 11, 2018

@alexkojin, I put this URL into Chrome and got a 404 error.

BTW, do you mean I need to change the pandas-datareader source code and reinstall it to override the official release?

@alexkojin

commented Jan 11, 2018

@scotthuang1989 Sorry, the URL is fixed now.
Yes, you can just change the source code. Or you can fork pandas-datareader, apply the fix, and install pandas-datareader from your fork.

@scotthuang1989

commented Jan 11, 2018

@alexkojin, after digging into it a little, I went with @yiorgosn's solution: I changed benchmarks.py to read data from another source.
I also think the next release will fix this issue, because the master branch already has a modified benchmarks.py.

@beevor

commented Jan 31, 2018

@scotthuang1989, the fix proposed by @alexkojin is rather simple. Edit ~/anaconda3/envs/zipline/lib/python3.5/site-packages/pandas_datareader/google/daily.py. Change the url from 'http://www.google.com/finance/historical' to 'https://finance.google.com/finance/historical' and you should be good. Works on zipline=1.1.1-np1111py35, pandas_datareader=0.5.0 and pandas=0.18.1. Or, fork pandas-datareader.
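
The URL change itself amounts to swapping the scheme and host while keeping the path and query. The sketch below shows that transformation on a plain string; `rewrite_google_finance_url` is a hypothetical helper, not part of pandas-datareader, and it does not patch the installed package.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "www.google.com"
NEW_HOST = "finance.google.com"


def rewrite_google_finance_url(url):
    """Point an old Google Finance URL at the new host over HTTPS."""
    parts = urlsplit(url)
    # Leave anything that is not an old-style Google Finance URL untouched.
    if parts.netloc != OLD_HOST or not parts.path.startswith("/finance/"):
        return url
    return urlunsplit(parts._replace(scheme="https", netloc=NEW_HOST))


old_url = "http://www.google.com/finance/historical?q=SPY&output=csv"
new_url = rewrite_google_finance_url(old_url)
```

The query string is carried over unchanged, so the symbol and date parameters in the original request are preserved.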

@dannypurcell

commented Feb 17, 2018

Why are we doing this in the first place when the benchmark symbol should be the quandl wiki bundle?

@seanfuture

commented Mar 17, 2018

Thank you @yiorgosn. For me, the Mac OS X path was /usr/local/lib/python3.4/site-packages/zipline/data and I downloaded all historical SPY data from this URL: https://finance.yahoo.com/quote/SPY/history?period1=728283600&period2=1521259200&interval=1d&filter=history&frequency=1d

Once the data was downloaded and your updated benchmarks.txt code was in place, it worked fine. Much appreciated. Aggravating when open source software doesn't work out of the box.
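
The period1 and period2 values in that Yahoo URL look like Unix timestamps in seconds. That reading is an assumption, not something stated in the thread. A small sketch that decodes them to check the date range being requested:

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit

url = (
    "https://finance.yahoo.com/quote/SPY/history"
    "?period1=728283600&period2=1521259200"
    "&interval=1d&filter=history&frequency=1d"
)

query = parse_qs(urlsplit(url).query)
# Interpret both values as seconds since the Unix epoch, in UTC.
start = datetime.fromtimestamp(int(query["period1"][0]), tz=timezone.utc)
end = datetime.fromtimestamp(int(query["period2"][0]), tz=timezone.utc)

print(start.isoformat())  # 1993-01-29T05:00:00+00:00
print(end.isoformat())    # 2018-03-17T04:00:00+00:00
```

Under that assumption the range runs from late January 1993 to the date of the comment, which is consistent with "all historical SPY data".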

@niklasamslgruber

commented Mar 17, 2018

Is there a solution yet? I tried changing the Google URL, but I get a "max retries exceeded with url" error when running the program.

@freddiev4

Contributor

commented Mar 17, 2018

@niklas-amslgruber there's a fix on master that uses IEX. You should be able to run a backtest up to 5 years from the current date using the zipline master branch, which you can install using:

git clone git@github.com:quantopian/zipline.git
pip install zipline/

or fork it and then do the same steps above, replacing quantopian with your-github-username.

Hoping to do a release of zipline in the next week or two as well so people can just pip install without cloning.

Also doing work here #2107 for a more permanent fix, but haven't had the chance to finish it.

@niklasamslgruber

commented Mar 19, 2018

pip install doesn't work for me (I don't have permission to read from the remote repository). I can only install via conda, where the latest version from GitHub master is not available.

@freddiev4

Contributor

commented Mar 19, 2018

Hi @niklas-amslgruber you should be able to fork zipline and then run pip install/

The latest master is also available via conda by running:

conda install -c quantopian/label/ci -c quantopian zipline
@niklasamslgruber

commented Mar 19, 2018

I always get this error message (installing with pip)

Command "/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 -u -c "import setuptools, tokenize;__file__='/private/var/folders/1n/525648kx213bn0q2bhn3tjrh0000gn/T/pip-build-kzyizl42/pandas/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /var/folders/1n/525648kx213bn0q2bhn3tjrh0000gn/T/pip-wm71_e5s-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/1n/525648kx213bn0q2bhn3tjrh0000gn/T/pip-build-kzyizl42/pandas/

@freddiev4

Contributor

commented Mar 19, 2018

@niklas-amslgruber the reason for that is that we only build Zipline packages for Py27 and Py35 (you can see the badge in the README).

For conda, you can create a new conda env for Python 3.5 using

conda create -n py35 python=3.5

Then run

conda install -c quantopian/label/ci -c quantopian zipline

Or create a Python 3.5 virtualenv and then run pip install zipline/.
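
Before installing, it can help to confirm that the active interpreter is one of the versions packages are built for. A minimal sketch, assuming only the two versions named above (Py27 and Py35); `is_supported` and `SUPPORTED_VERSIONS` are hypothetical names, not part of Zipline:

```python
import sys

# Versions named in the comment above; this is not read from Zipline itself.
SUPPORTED_VERSIONS = {(2, 7), (3, 5)}


def is_supported(version_info=None):
    """Return True if the (major, minor) version is in the supported set."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) in SUPPORTED_VERSIONS


if not is_supported():
    print("This interpreter is Python %d.%d; create a 3.5 environment first."
          % tuple(sys.version_info[:2]))
```

Run inside the Python 3.6 interpreter from the failing pip command above, this check would report the mismatch before pip tries to build pandas from source.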

@niklasamslgruber

commented Mar 24, 2018

This error still exists even though I followed your instructions and installed it on Python 3.5 with Anaconda.

pandas_datareader._utils.RemoteDataError: Unable to read URL: http://www.google.com/finance/historical?output=csv&q=SPY&enddate=Mar+21%2C+2018&startdate=Dec+29%2C+1989

@Sentdex

commented Mar 25, 2018

Not only does this problem still exist: even after fixing the URL to be finance.google.com, you still get an error that you're sending automated requests. We can overcome this, but the Google Finance API is just plain unstable anyway. Better off using Quandl, or a custom bundle symbol.

What I am failing to understand is why we're downloading a benchmark from any website when we have a bundle? set_benchmark doesn't seem to care at all, which is very strange. Should be able to use benchmarking symbol from our custom set.

@xrvo

commented Mar 25, 2018

Changing the benchmark data source to morningstar worked for me.

To do this, in [your_env]/lib/python3.5/site-packages/zipline/data/benchmarks.py make the 2 changes marked by # NEW

data = pd_reader.DataReader(
    symbol,
    'morningstar',  # NEW
    first_date,
    last_date
)

data = data.reset_index(0, drop=True)  # NEW
data = data['Close']

However, I agree with @Sentdex: fetching the benchmark data from the local bundle would be an improvement -- both in speed and stability.

Edit: Morningstar data was new for pandas-datareader v0.6.0, so a version upgrade may be necessary.

@kelvinho8

commented Apr 2, 2018

@xrvo Hi, Matt. I made the 2 changes in benchmarks.py but it's not working. The error message is as follows:

======================================
File "C:\Users\Kelvin\AppData\Local\conda\conda\envs\py35\lib\site-packages\pandas_datareader\data.py", line 175, in DataReader
raise NotImplementedError(msg)

NotImplementedError: data_source='morningstar' is not implemented

=================================
Am I missing something that needs to be changed too?

Thanks.
Kelvin

@xrvo

commented Apr 2, 2018

@kelvinho8: It looks like the Morningstar data connector is a fairly recent addition to pandas-datareader. It was added in v0.6.0.

You should be able to resolve this error by upgrading your pandas-datareader to v0.6.0, which is currently the latest release.
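
A quick way to tell whether an installed pandas-datareader is new enough is to compare its version string against 0.6.0. The sketch below uses hypothetical helper names and only handles plain dotted versions such as "0.5.0"; development or local version suffixes would need a proper version parser.

```python
MORNINGSTAR_MIN_VERSION = (0, 6, 0)  # per the comment above


def parse_version(version):
    """Turn a plain dotted version such as '0.6.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def has_morningstar(version):
    """Return True if this pandas-datareader version should include Morningstar."""
    return parse_version(version) >= MORNINGSTAR_MIN_VERSION


print(has_morningstar("0.5.0"))  # False
print(has_morningstar("0.6.0"))  # True
```

In a real environment the version string would typically come from `pandas_datareader.__version__`. Comparing integer tuples rather than raw strings matters once a component reaches two digits, for example "0.10.0".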

@phlsmk

commented Apr 4, 2018

thanks @xrvo #1965 (comment) worked for me too with pandas-datareader v0.6.0.

@blackcabbage1023

commented Apr 5, 2018

I tried upgrading to pandas-datareader v0.6.0, but it still does not work.
I ran into this problem:

File "/Applications/anaconda3/envs/introduction_programming/lib/python3.5/site-packages/pandas_datareader/compat/__init__.py", line 8, in <module>
import pandas.io.common as com

AttributeError: module 'pandas.io' has no attribute 'common'

@blackcabbage1023

commented Apr 5, 2018

Update: I then tried to change pandas.io into pandas_datareader as mentioned below.
https://github.com/pydata/pandas-datareader

But still it does not work.

@xrvo

commented Apr 5, 2018

@blackcabbage1023 it seems like this error would only happen if there's either a problem with your environment or you have a really old version of pandas. Your pandas version should be v0.18.1 as per the zipline requirements.
If this doesn't work, I suggest you try creating a new virtual environment from scratch.

@freddiev4

Contributor

commented Apr 5, 2018

We recently released Zipline 1.2.0 on PyPI, as well as conda packages for Linux and Windows (macOS soon); please try installing the latest release.

You can see the release notes here. Feel free to update to 1.2.0 with either:

pip install -U zipline

or

conda update zipline -c quantopian

@freddiev4 freddiev4 closed this Apr 9, 2018

@niklasamslgruber

commented Apr 27, 2018

When will you release the macOS version? @freddiev4

@freddiev4

Contributor

commented Apr 27, 2018

@niklay14 I currently don't have a time-line in mind as exams are coming up. I believe if you have conda you can also just pip install zipline as well

@niklasamslgruber

commented Apr 27, 2018

I get this error message while installing zipline with pip: @freddiev4

Command "/Users/niklasamslgruber/anaconda3/envs/py35/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/1n/525648kx213bn0q2bhn3tjrh0000gn/T/pip-req-build-qb36zhft/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/var/folders/1n/525648kx213bn0q2bhn3tjrh0000gn/T/pip-record-fcez5zho/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/1n/525648kx213bn0q2bhn3tjrh0000gn/T/pip-req-build-qb36zhft/
