ENH: Added method to pandas.data.Options to download all option data for... #5602

Merged
merged 1 commit into from Jun 17, 2014
+1,154 −104
@@ -55,6 +55,10 @@ performance improvements along with a large number of bug fixes.
Highlights include:
+Experimental Features
+~~~~~~~~~~~~~~~~~~~~~
+- ``pandas.io.data.Options`` has a get_all_data method and now consistently returns a multi-indexed ''DataFrame'' (:issue:`5602`)
+
@jorisvandenbossche

jorisvandenbossche Jun 17, 2014

Owner

This should be moved to v0.14.1.txt (or removed if it is already there)

@davidastephens

davidastephens Jun 17, 2014

Contributor

Yes, it's in v0.14.1.txt; I will remove it here.

See the :ref:`v0.14.1 Whatsnew <whatsnew_0141>` overview or the issue tracker on GitHub for an extensive list
of all API changes, enhancements and bugs that have been fixed in 0.14.1.
@@ -52,6 +52,43 @@ Yahoo! Finance
f=web.DataReader("F", 'yahoo', start, end)
f.ix['2010-01-04']
+.. _remote_data.yahoo_Options:
+
+Yahoo! Finance Options
+----------------------
+***Experimental***
+
+The Options class allows the download of options data from Yahoo! Finance.
+
+The ''get_all_data'' method downloads and caches option data for all expiry months
@jorisvandenbossche

jorisvandenbossche Jun 17, 2014

Owner

Can you use double backticks (``) instead of ''? Then it renders as code.

+and provides a formatted ''DataFrame'' with a hierarchical index, so it's easy to get
+to the specific option you want.
+
+.. ipython:: python
+
+ from pandas.io.data import Options
+ aapl = Options('aapl', 'yahoo')
+ data = aapl.get_all_data()
@jreback

jreback May 12, 2014

Contributor

This fails here; in general, conversions need to be protected with a try/except. You probably need to wrap all of the float conversions with a ',' replacement or, better yet, not convert them individually and let them be object dtype.
Then, only on columns that should be numeric (to avoid accidentally changing other data), use df[column].replace(',',''). This kind of check needs a test as well.

ipdb> l
    523 
    524 def _unpack(row, kind):
    525     def _parse_row_values(val):
    526         ret = val.text_content()
    527         if 'neg_arrow' in val.xpath('.//@class'):
--> 528             ret = float(ret)*(-1.0)
    529         return ret
    530 
    531     els = row.xpath('.//%s' % kind)
    532     return [_parse_row_values(val) for val in els]
    533 

ipdb> p ret
'2,240.10'

@davidastephens

davidastephens May 13, 2014

Contributor

Yeah, I hit this issue in my code over the weekend. I did the replace; I'll push the update and add a test tonight.

@davidastephens

davidastephens May 13, 2014

Contributor

What do you suggest doing on a ValueError here? Raise, or return the string with an appended '-'?

@jreback

jreback May 13, 2014

Contributor

Well, you can try to replace the commas, then convert; on failure I would make it np.nan. If some values are string-like and some are not, then you are forced to leave the column as object. However, before you go down that road, see WHY it's not converting: is bogus data coming in, or are you misinterpreting the field? In either case the value should be made missing.
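
The suggestion above (strip the commas, convert, and fall back to a missing value on failure) can be sketched in plain Python. This is only an illustration, not the code from this PR: `parse_quote_value` and its `negative` flag are hypothetical names, and returning NaN for unconvertible text is an assumption taken from the comment.

```python
import math


def parse_quote_value(text, negative=False):
    """Convert scraped quote text such as '2,240.10' to a float.

    Hypothetical helper: returns NaN when the text is not numeric,
    following the suggestion to treat unconvertible values as missing.
    """
    try:
        # Drop thousands separators before converting.
        value = float(text.replace(',', ''))
    except (ValueError, AttributeError):
        # ValueError: non-numeric text; AttributeError: text is not a string.
        return math.nan
    return -value if negative else value


print(parse_quote_value('2,240.10'))                  # 2240.1
print(parse_quote_value('2,240.10', negative=True))   # -2240.1
print(parse_quote_value('N/A'))                       # nan
```

Returning NaN rather than raising keeps a single bad cell from aborting the whole download, at the cost of hiding the reason the value failed to parse.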

+ data.head()
+
+ #Show the $600 strike puts at all expiry dates:
+ data.loc[(600, slice(None), 'put'),:].head()
+
+ #Show the volume traded of $600 strike puts at all expiry dates:
+ data.loc[(600, slice(None), 'put'),'Vol'].head()
+
+If you don't want to download all the data, more specific requests can be made.
+
+.. ipython:: python
+
+ import datetime
+ expiry = datetime.date(2016, 1, 1)
+ data = aapl.get_call_data(expiry=expiry)
+ data.head()
+
+Note that if you call ''get_all_data'' first, this second call will happen much faster, as the data is cached.
+
+
.. _remote_data.google:
@jreback

jreback May 12, 2014

Contributor

This works, but I think you need an example of how to slice this, because unless the Symbol is included in the index you can't slice it.

This works

In [48]: data.set_index(['Symbol'],append=True).loc[(330,slice(None),'call'),:]
Out[48]: 
                                               Last  Chg  Bid  Ask  Vol  Open Int   Root IsNonstandard Underlying  Underlying_Price          Quote_Time
Strike Expiry     Type Symbol                                                                                                                          
330    2016-01-15 call AAPL160115C00330000   258.17    0  NaN  NaN    4        43   AAPL         False       AAPL            585.54 2014-05-09 04:00:00
                       AAPL7160115C00330000  270.00    0  NaN  NaN    5        21  AAPL7          True       AAPL            585.54 2014-05-09 04:00:00

[2 rows x 11 columns]

but simply slicing will not (though using .xs on a specific level will work as well)

@davidastephens

davidastephens May 13, 2014

Contributor

That code doesn't work for me, I get: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (3), lexsort depth (0)'

What about data.loc[(330,slice(None), 'call')]?

@jreback

jreback May 13, 2014

Contributor

You need to do a df.sortlevel() on the created frame; the index must always be sorted to do any real indexing. Furthermore, I think the index should be ['Strike','Expiry','Type','Symbol'], as it's completely unique and much more useful. Show a slicing example as well.
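
A minimal sketch of the sorted four-level index described above, built from a few hand-written rows rather than downloaded data. The two 330 call rows reuse the `Last` values quoted earlier in this thread; the other rows are invented for illustration. `sort_index()` is used here because `sortlevel()` is not available in recent pandas releases.

```python
import pandas as pd

# Stand-in rows; only the two 330 calls come from the thread, the rest are invented.
rows = [
    (600, '2016-01-15', 'put', 'AAPL160115P00600000', 75.00),
    (330, '2016-01-15', 'call', 'AAPL160115C00330000', 258.17),
    (330, '2016-01-15', 'call', 'AAPL7160115C00330000', 270.00),
    (330, '2015-01-17', 'put', 'AAPL150117P00330000', 1.25),
]
df = pd.DataFrame(rows, columns=['Strike', 'Expiry', 'Type', 'Symbol', 'Last'])
df['Expiry'] = pd.to_datetime(df['Expiry'])

# Index on all four levels so each row is unique, then sort so that
# MultiIndex slicing with .loc is allowed.
data = df.set_index(['Strike', 'Expiry', 'Type', 'Symbol']).sort_index()

# All 330-strike calls, at any expiry and for any symbol.
calls_330 = data.loc[(330, slice(None), 'call'), :]
print(calls_330)

# Selecting on a single named level with .xs, as mentioned earlier in the thread.
all_calls = data.xs('call', level='Type')
print(all_calls)
```

Skipping the sort step is what produces the "index to be fully lexsorted" error reported above.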

Google Finance
@@ -148,7 +148,23 @@ Performance
Experimental
~~~~~~~~~~~~
-There are no experimental changes in 0.14.1
+``pandas.io.data.Options`` has a get_all_data method and now consistently returns a multi-indexed ''DataFrame'' (PR `#5602`)
+ See :ref:`the docs<remote_data.yahoo_Options>` ***Experimental***
+
+ .. ipython:: python
+
+ from pandas.io.data import Options
+ aapl = Options('aapl', 'yahoo')
+ data = aapl.get_all_data()
+ data.head()
+
+ .. ipython:: python
+
+ from pandas.io.data import Options
+ aapl = Options('aapl', 'yahoo')
+ data = aapl.get_all_data()
+ data.head()
+
.. _whatsnew_0141.bug_fixes: