Exception downloading yahoo historical data with adjust_price=True on fix-yahoo branch #342

Closed
OlegShteynbuk opened this Issue May 24, 2017 · 7 comments

@OlegShteynbuk
Contributor

OlegShteynbuk commented May 24, 2017

Exception downloading yahoo historical data with adjust_price=True on fix-yahoo branch

This can be reproduced with symbol = 'SRCE' and a date range that includes Jun 29, 2016:

    yahooData = pdr.data.get_data_yahoo(symbol, self.start, self.end, adjust_price=True)
  File "/home/oleg/projects/openSource/python/pandas-datareader/yahoofix/pandas-datareader/pandas_datareader/data.py", line 40, in get_data_yahoo
    return YahooDailyReader(*args, **kwargs).read()
  File "/home/oleg/projects/openSource/python/pandas-datareader/yahoofix/pandas-datareader/pandas_datareader/yahoo/daily.py", line 117, in read
    df = _adjust_prices(df)
  File "/home/oleg/projects/openSource/python/pandas-datareader/yahoofix/pandas-datareader/pandas_datareader/yahoo/daily.py", line 170, in _adjust_prices
    adj_ratio = hist_data['Adj Close'] / hist_data['Close']
  File "/home/oleg/programs/python/anaconda2/envs/yahoofix/lib/python2.7/site-packages/pandas/core/ops.py", line 721, in wrapper
    result = wrap_results(safe_na_op(lvalues, rvalues))
  File "/home/oleg/programs/python/anaconda2/envs/yahoofix/lib/python2.7/site-packages/pandas/core/ops.py", line 692, in safe_na_op
    lambda x: op(x, rvalues))
  File "pandas/_libs/algos_common_helper.pxi", line 1212, in pandas._libs.algos.arrmap_object (pandas/_libs/algos.c:31954)
  File "/home/oleg/programs/python/anaconda2/envs/yahoofix/lib/python2.7/site-packages/pandas/core/ops.py", line 692, in <lambda>
    lambda x: op(x, rvalues))
TypeError: unsupported operand type(s) for /: 'str' and 'str'

Environment:
I followed instructions found on one of the forums, probably Stack Overflow:
$ git clone https://github.com/rgkimball/pandas-datareader
$ cd pandas-datareader
$ git checkout fix-yahoo
$ pip install -e .

It turns out that the Yahoo web site displays the data as:
Jun 30, 2016 31.73 32.44 31.65 32.39 32.39 28,600
Jun 29, 2016 0.00 0.00 0.00 0.00 0.00 -
Jun 28, 2016 31.10 31.16 30.63 31.05 31.05 49,800

but when you click the download button you get:
2016-06-28 31.1 31.16 30.629999 31.049999 31.049999 49800
2016-06-29 null null null null null null
2016-06-30 31.73 32.439999 31.65 32.389999 32.389999 28600

So the download contains the string 'null' instead of 0.00 or '-'.
The same lowercase 'null' values appear if you use adjust_price=False in pdr.data.get_data_yahoo.

The solution is to add lowercase 'null' to the na_values argument of read_csv in the function _read_lines in base.py:
na_values=('-', 'null')

I have made this change in my repository.
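
A minimal sketch of what that na_values setting changes, using the three downloaded rows quoted above. The header row is an assumption (it follows the column names used elsewhere in this thread), read_prices is a hypothetical helper, and keep_default_na=False is there only to emulate a pandas whose defaults do not yet include lowercase 'null'. The real change belongs in _read_lines; this only illustrates the parsing behaviour.

```python
import io

import pandas as pd

# Rows as quoted above; the header line is assumed, not copied from Yahoo.
CSV_TEXT = """Date,Open,High,Low,Close,Adj Close,Volume
2016-06-28,31.1,31.16,30.629999,31.049999,31.049999,49800
2016-06-29,null,null,null,null,null,null
2016-06-30,31.73,32.439999,31.65,32.389999,32.389999,28600
"""


def read_prices(na_values):
    """Hypothetical helper: parse CSV_TEXT with the given NA markers only."""
    return pd.read_csv(
        io.StringIO(CSV_TEXT),
        index_col="Date",
        parse_dates=True,
        na_values=na_values,
        keep_default_na=False,  # emulate defaults that lack lowercase 'null'
    )


before = read_prices(("-",))        # 'null' is kept as a string
after = read_prices(("-", "null"))  # 'null' is parsed as a missing value

print(before["Close"].tolist())
print(after.dtypes)
```

With only '-' listed, a single 'null' keeps every price column as text, which is what makes the division in _adjust_prices fail. With 'null' listed as well, the columns should parse as floats and the bad row becomes NaN.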

OlegShteynbuk commented May 24, 2017

I created a pull request but got a message about conflicts that must be resolved; the branch has probably been modified since. I can recreate it against another branch if needed, because the merge was for some reason targeted at master rather than fix-yahoo.

OlegShteynbuk/olegyahoofix#1

OlegShteynbuk commented May 24, 2017

Add 'null' in lower case to na_values defaults in pandas.read_csv #16471
pandas-dev/pandas#16471

@OlegShteynbuk OlegShteynbuk reopened this May 24, 2017

OlegShteynbuk commented May 24, 2017

Another solution would be to add lowercase 'null' to the na_values defaults in pandas.read_csv.
I just created a new issue for that:
pandas-dev/pandas#16471

Either way, this solution should not cause performance or other problems.

OlegShteynbuk commented Jun 5, 2017

pandas-dev/pandas#16471 is scheduled for 0.21.0, the next major release.
Until then, one of the following two workarounds can be used, depending on how the Yahoo data are downloaded.

  1. If you save your data
    import pandas as pd
    import pandas_datareader as pdr
    import pandas_datareader.yahoo.daily as yahoo_daily

    # symbol, date_from, date_to and test_file are supplied by the caller
    yahooData = pdr.data.get_data_yahoo(symbol, date_from, date_to, adjust_price=False)
    # save the raw data to test_file
    yahooData.to_csv(test_file)
    # re-read it, treating '-' and 'null' as missing values
    df = pd.read_csv(test_file, na_values=('-', 'null'))
    # adjust prices
    adj_df = yahoo_daily._adjust_prices(df)


  2. If you don't save your data
    import numpy as np
    import pandas as pd
    import pandas_datareader as pdr
    import pandas_datareader.yahoo.daily as yahoo_daily

    yahooData = pdr.data.get_data_yahoo(symbol, date_from, date_to, adjust_price=False)
    # replace 'null' with NaN, then convert the price columns to numeric
    yahooData = yahooData.replace('null', np.nan)
    price_cols = ['Open', 'High', 'Low', 'Close', 'Adj Close']
    yahooData[price_cols] = yahooData[price_cols].apply(pd.to_numeric)

    # adjust prices
    adj_df = yahoo_daily._adjust_prices(yahooData)

Even if you don't need price adjustment, it is still a good idea to remove the 'null' values.
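
An offline sketch of the second workaround, for trying it without a download. It rebuilds the three downloaded rows from the first comment as a string-valued frame. clean_yahoo_frame and adjust_prices_sketch are hypothetical names; the latter only mirrors the ratio step visible in the traceback and is not the library's private _adjust_prices. Passing errors="coerce" to pd.to_numeric is a swapped-in alternative to replace-then-convert, and it also covers Volume.

```python
import pandas as pd


def clean_yahoo_frame(raw):
    """Convert every column to numeric; unparseable text such as 'null' becomes NaN."""
    return raw.apply(pd.to_numeric, errors="coerce")


def adjust_prices_sketch(hist, price_cols=("Open", "High", "Low", "Close")):
    """Hypothetical stand-in: scale prices by Adj Close / Close."""
    ratio = hist["Adj Close"] / hist["Close"]
    out = hist.copy()
    for col in price_cols:
        out[col] = hist[col] * ratio
    return out


# Every value is a string, as happens when one 'null' row pollutes a column.
raw = pd.DataFrame(
    {
        "Open": ["31.1", "null", "31.73"],
        "High": ["31.16", "null", "32.439999"],
        "Low": ["30.629999", "null", "31.65"],
        "Close": ["31.049999", "null", "32.389999"],
        "Adj Close": ["31.049999", "null", "32.389999"],
        "Volume": ["49800", "null", "28600"],
    },
    index=pd.to_datetime(["2016-06-28", "2016-06-29", "2016-06-30"]),
)

cleaned = clean_yahoo_frame(raw)
adjusted = adjust_prices_sketch(cleaned)
print(adjusted)
```

One trade-off of errors="coerce" is that any other unexpected text is silently turned into NaN as well, so the explicit replace('null', np.nan) shown above may be preferable when unknown markers should raise an error.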

rgkimball added a commit to rgkimball/pandas-datareader that referenced this issue Jun 8, 2017

@jreback

Contributor

jreback commented Jul 2, 2017

this still looks like an issue after #355

In [8]: pdr.data.get_data_yahoo('SRCE', '20160626', '20160705', adjust_price=False)
Out[8]: 
                 Open       High        Low      Close  Adj Close Volume
Date                                                                    
2016-06-27  31.120001  31.120001  30.450001  30.740000  30.179176  51500
2016-06-28  31.100000  31.160000  30.629999  31.049999  30.483521  49800
2016-06-29       null       null       null   0.000000   0.000000   null
2016-06-30  31.730000  32.439999  31.650000  32.389999  31.799072  28600
2016-07-01  32.389999  32.860001  31.889999  32.090000  31.504547  28100

In [9]: pdr.data.get_data_yahoo('SRCE', '20160626', '20160705', adjust_price=True)
TypeError: ufunc 'multiply' did not contain a loop with signature matching types dtype('<U32') dtype('<U32') dtype('<U32')

need a .replace('null',np.nan) (somewhere).

@OlegShteynbuk want to do a PR?
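
One detail about the .replace('null', np.nan) suggestion, sketched under assumptions: if the column has already been parsed as text, replacing 'null' leaves the remaining prices as strings, so a numeric conversion is still needed before any arithmetic. multiplies_cleanly is a hypothetical helper and the ratio values are placeholders, not data from this issue.

```python
import numpy as np
import pandas as pd


def multiplies_cleanly(prices, ratio):
    """Return True if prices * ratio succeeds, False if it raises TypeError."""
    try:
        prices * ratio
    except TypeError:
        return False
    return True


# One 'null' means the whole column arrives as strings.
open_raw = pd.Series(["31.100000", "null", "31.730000"], dtype=object)
ratio = pd.Series([1.0, np.nan, 1.0])  # placeholder adjustment ratios

replaced = open_raw.replace("null", np.nan)  # strings plus NaN, still object dtype
numeric = pd.to_numeric(replaced)            # numeric column with NaN

for label, column in [("raw", open_raw), ("replaced", replaced), ("numeric", numeric)]:
    print(label, column.dtype, multiplies_cleanly(column, ratio))
```

Only the converted column is expected to multiply without error, which matches the replace-then-to_numeric order used in the workaround earlier in this thread.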

@jreback jreback added the bug label Jul 2, 2017

@jreback jreback added this to the 0.5.0 milestone Jul 2, 2017

OlegShteynbuk commented Jul 2, 2017

will do it

OlegShteynbuk commented Jul 3, 2017

PR #357
