Permalink
574 lines (402 sloc) 18.5 KB
.. currentmodule:: pandas_datareader

.. ipython:: python
   :suppress:

   import pandas as pd

   import numpy as np
   np.set_printoptions(precision=4, suppress=True)

   pd.options.display.max_rows=15


Remote Data Access

Warning

Yahoo! Finance, Google Finance, and Morningstar have been immediately deprecated. Endpoints from these providers have been retired

Functions from :mod:`pandas_datareader.data` and :mod:`pandas_datareader.wb` extract data from various Internet sources into a pandas DataFrame. Currently the following sources are supported:

It should be noted, that various sources support different kinds of data, so not all sources implement the same methods and the data elements returned might also differ.

Tiingo

Tiingo is a tracing platform that provides a data api with historical end-of-day prices on equities, mutual funds and ETFs. Free registration is required to get an API key. Free accounts are rate limited and can access a limited number of symbols (500 at the time of writing).

.. ipython:: python

    import os
    import pandas_datareader as pdr

    df = pdr.get_data_tiingo('GOOG', api_key=os.getenv('TIINGO_API_KEY'))
    df.head()


IEX

The Investors Exchange (IEX) provides a wide range of data through an API. Historical stock prices are available for up to 5 years:

.. ipython:: python

    import pandas_datareader.data as web
    from datetime import datetime
    start = datetime(2016, 9, 1)
    end = datetime(2018, 9, 1)
    f = web.DataReader('F', 'iex', start, end)
    f.loc['2018-08-31']

There are additional interfaces to this API that are directly exposed: tops ('iex-tops') and last ('iex-lasts'). A third interface to the deep API is exposed through Deep class or the get_iex_book function.

.. ipython:: python

    import pandas_datareader.data as web
    f = web.DataReader('gs', 'iex-tops')
    f[:10]


Robinhood

Robinhood is a stock trading platform with an API that provides a limited set of data. Historical daily data is limited to 1 year relative to today.

.. ipython:: python

    import pandas_datareader.data as web
    from datetime import datetime
    f = web.DataReader('F', 'robinhood')
    f.head()

Alpha Vantage

Alpha Vantage provides realtime equities and forex data. Free registration is required to get an API key.

Historical Time Series Data

Through the Alpha Vantage Time Series endpoints, it is possible to obtain historical equities data for individual symbols. The following endpoints are available:

  • av-daily - Daily Time Series
  • av-daily-adjusted - Daily Time Series (Adjusted)
  • av-weekly - Weekly Time Series
  • av-weekly-adjusted - Weekly Time Series (Adjusted)
  • av-monthly - Monthly Time Series
  • av-monthly-adjusted - Monthly Time Series (Adjusted)
.. ipython:: python

    import os
    from datetime import datetime
    import pandas_datareader.data as web

    f = web.DataReader("AAPL", "av-daily", start=datetime(2017, 2, 9),
                       end=datetime(2017, 5, 24),
                       access_key=os.getenv('ALPHAVANTAGE_API_KEY'))
    f.loc["2017-02-09"]

The top-level function get_data_alphavantage is also provided. This function will return the TIME_SERIES_DAILY endpoint for the symbol and date range provided.

Quotes

Alpha Vantage Batch Stock Quotes endpoint allows the retrieval of realtime stock quotes for up to 100 symbols at once. These quotes are accessible through the top-level function get_quote_av.

.. ipython:: python

    import os
    from datetime import datetime
    import pandas_datareader.data as web

    web.get_quote_av(["AAPL", "TSLA"])


Note

Most quotes are only available during market hours.

Forex

Alpha Vantage provides realtime currency exchange rates (for physical and digital currencies).

To request the exchange rate of physical or digital currencies, simply format as "FROM/TO" as in "USD/JPY".

.. ipython:: python

    import os
    import pandas_datareader.data as web

    f = web.DataReader("USD/JPY", "av-forex",
                       access_key=os.getenv('ALPHAVANTAGE_API_KEY'))

Multiple pairs are are allowable:

.. ipython:: python

    import os
    import pandas_datareader.data as web

    f = web.DataReader(["USD/JPY", "BTC/CNY"], "av-forex",
                       access_key=os.getenv('ALPHAVANTAGE_API_KEY'))


Sector Performance

Alpha Vantage provides sector performances through the top-level function get_sector_performance_av.

.. ipython:: python

    import os
    import pandas_datareader.data as web

    web.get_sector_performance_av().head()


Enigma

Access datasets from Enigma, the world's largest repository of structured public data. Note that the Enigma URL has changed from app.enigma.io as of release 0.6.0, as the old API deprecated.

Datasets are unique identified by the uuid4 at the end of a dataset's web address. For example, the following code downloads from USDA Food Recalls 1996 Data.

.. ipython:: python

    import os
    import pandas_datareader as pdr

    df = pdr.get_data_enigma('292129b0-1275-44c8-a6a3-2a0881f24fe1', os.getenv('ENIGMA_API_KEY'))
    df.columns

Quandl

Daily financial data (prices of stocks, ETFs etc.) from Quandl. The symbol names consist of two parts: DB name and symbol name. DB names can be all the free ones listed on the Quandl website. Symbol names vary with DB name; for WIKI (US stocks), they are the common ticker symbols, in some other cases (such as FSE) they can be a bit strange. Some sources are also mapped to suitable ISO country codes in the dot suffix style shown above, currently available for BE, CN, DE, FR, IN, JP, NL, PT, UK, US.

As of June 2017, each DB has a different data schema, the coverage in terms of time range is sometimes surprisingly small, and the data quality is not always good.

.. ipython:: python

    import pandas_datareader.data as web
    symbol = 'WIKI/AAPL'  # or 'AAPL.US'
    df = web.DataReader(symbol, 'quandl', '2015-01-01', '2015-01-05')
    df.loc['2015-01-02']

FRED

.. ipython:: python

    import pandas_datareader.data as web
    import datetime
    start = datetime.datetime(2010, 1, 1)
    end = datetime.datetime(2013, 1, 27)
    gdp = web.DataReader('GDP', 'fred', start, end)
    gdp.loc['2013-01-01']

    # Multiple series:
    inflation = web.DataReader(['CPIAUCSL', 'CPILFESL'], 'fred', start, end)
    inflation.head()


Fama/French

Access datasets from the Fama/French Data Library. The get_available_datasets function returns a list of all available datasets.

.. ipython:: python

    from pandas_datareader.famafrench import get_available_datasets
    import pandas_datareader.data as web
    len(get_available_datasets())
    ds = web.DataReader('5_Industry_Portfolios', 'famafrench')
    print(ds['DESCR'])
    ds[4].head()

World Bank

pandas users can easily access thousands of panel data series from the World Bank's World Development Indicators by using the wb I/O functions.

Indicators

Either from exploring the World Bank site, or using the search function included, every world bank indicator is accessible.

For example, if you wanted to compare the Gross Domestic Products per capita in constant dollars in North America, you would use the search function:

In [1]: from pandas_datareader import wb
In [2]: matches = wb.search('gdp.*capita.*const')

Then you would use the download function to acquire the data from the World Bank's servers:

In [3]: dat = wb.download(indicator='NY.GDP.PCAP.KD', country=['US', 'CA', 'MX'], start=2005, end=2008)

In [4]: print(dat)
                      NY.GDP.PCAP.KD
country       year
Canada        2008  36005.5004978584
              2007  36182.9138439757
              2006  35785.9698172849
              2005  35087.8925933298
Mexico        2008  8113.10219480083
              2007  8119.21298908649
              2006  7961.96818458178
              2005  7666.69796097264
United States 2008  43069.5819857208
              2007  43635.5852068142
              2006   43228.111147107
              2005  42516.3934699993

The resulting dataset is a properly formatted DataFrame with a hierarchical index, so it is easy to apply .groupby transformations to it:

In [6]: dat['NY.GDP.PCAP.KD'].groupby(level=0).mean()
Out[6]:
country
Canada           35765.569188
Mexico            7965.245332
United States    43112.417952
dtype: float64

Now imagine you want to compare GDP to the share of people with cellphone contracts around the world.

In [7]: wb.search('cell.*%').iloc[:,:2]
Out[7]:
                     id                                               name
3990  IT.CEL.SETS.FE.ZS  Mobile cellular telephone users, female (% of ...
3991  IT.CEL.SETS.MA.ZS  Mobile cellular telephone users, male (% of po...
4027      IT.MOB.COV.ZS  Population coverage of mobile cellular telepho...

Notice that this second search was much faster than the first one because pandas now has a cached list of available data series.

In [13]: ind = ['NY.GDP.PCAP.KD', 'IT.MOB.COV.ZS']
In [14]: dat = wb.download(indicator=ind, country='all', start=2011, end=2011).dropna()
In [15]: dat.columns = ['gdp', 'cellphone']
In [16]: print(dat.tail())
                        gdp  cellphone
country   year
Swaziland 2011  2413.952853       94.9
Tunisia   2011  3687.340170      100.0
Uganda    2011   405.332501      100.0
Zambia    2011   767.911290       62.0
Zimbabwe  2011   419.236086       72.4

Finally, we use the statsmodels package to assess the relationship between our two variables using ordinary least squares regression. Unsurprisingly, populations in rich countries tend to use cellphones at a higher rate:

In [17]: import numpy as np
In [18]: import statsmodels.formula.api as smf
In [19]: mod = smf.ols('cellphone ~ np.log(gdp)', dat).fit()
In [20]: print(mod.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:              cellphone   R-squared:                       0.297
Model:                            OLS   Adj. R-squared:                  0.274
Method:                 Least Squares   F-statistic:                     13.08
Date:                Thu, 25 Jul 2013   Prob (F-statistic):            0.00105
Time:                        15:24:42   Log-Likelihood:                -139.16
No. Observations:                  33   AIC:                             282.3
Df Residuals:                      31   BIC:                             285.3
Df Model:                           1
===============================================================================
                  coef    std err          t      P>|t|      [95.0% Conf. Int.]
-------------------------------------------------------------------------------
Intercept      16.5110     19.071      0.866      0.393       -22.384    55.406
np.log(gdp)     9.9333      2.747      3.616      0.001         4.331    15.535
==============================================================================
Omnibus:                       36.054   Durbin-Watson:                   2.071
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              119.133
Skew:                          -2.314   Prob(JB):                     1.35e-26
Kurtosis:                      11.077   Cond. No.                         45.8
==============================================================================

Country Codes

The country argument accepts a string or list of mixed two or three character ISO country codes, as well as dynamic World Bank exceptions to the ISO standards.

For a list of the the hard-coded country codes (used solely for error handling logic) see pandas_datareader.wb.country_codes.

Problematic Country Codes & Indicators

Note

The World Bank's country list and indicators are dynamic. As of 0.15.1, :func:`wb.download()` is more flexible. To achieve this, the warning and exception logic changed.

The world bank converts some country codes, in their response, which makes error checking by pandas difficult. Retired indicators still persist in the search.

Given the new flexibility of 0.15.1, improved error handling by the user may be necessary for fringe cases.

To help identify issues:

There are at least 4 kinds of country codes:

  1. Standard (2/3 digit ISO) - returns data, will warn and error properly.
  2. Non-standard (WB Exceptions) - returns data, but will falsely warn.
  3. Blank - silently missing from the response.
  4. Bad - causes the entire response from WB to fail, always exception inducing.

There are at least 3 kinds of indicators:

  1. Current - Returns data.
  2. Retired - Appears in search results, yet won't return data.
  3. Bad - Will not return data.

Use the errors argument to control warnings and exceptions. Setting errors to ignore or warn, won't stop failed responses. (ie, 100% bad indicators, or a single 'bad' (#4 above) country code).

See docstrings for more info.

OECD

OECD Statistics are available via DataReader. You have to specify OECD's data set code.

To confirm data set code, access to each data -> Export -> SDMX Query. Following example is to download 'Trade Union Density' data which set code is 'TUD'.

.. ipython:: python

    import pandas_datareader.data as web
    import datetime

    df = web.DataReader('TUD', 'oecd', end=datetime.datetime(2012, 1, 1))

    df.columns

    df[['Japan', 'United States']]

Eurostat

Eurostat are available via DataReader.

Get Rail accidents by type of accident (ERA data) data. The result will be a DataFrame which has DatetimeIndex as index and MultiIndex of attributes or countries as column. The target URL is:

You can specify dataset ID 'tran_sf_railac' to get corresponding data via DataReader.

.. ipython:: python

    import pandas_datareader.data as web

    df = web.DataReader('tran_sf_railac', 'eurostat')
    df

TSP Fund Data

Download mutual fund index prices for the TSP.

.. ipython:: python

    import pandas_datareader.tsp as tsp
    tspreader = tsp.TSPReader(start='2015-10-1', end='2015-12-31')
    tspreader.read()


Nasdaq Trader Symbol Definitions

Download the latest symbols from Nasdaq.

Note that Nasdaq updates this file daily, and historical versions are not available. More information on the field definitions.

In [12]: from pandas_datareader.nasdaq_trader import get_nasdaq_symbols
In [13]: symbols = get_nasdaq_symbols()
In [14]: print(symbols.loc['IBM'])
    Nasdaq Traded                                                    True
    Security Name       International Business Machines Corporation Co...
    Listing Exchange                                                    N
    Market Category
    ETF                                                             False
    Round Lot Size                                                    100
    Test Issue                                                      False
    Financial Status                                                  NaN
    CQS Symbol                                                        IBM
    NASDAQ Symbol                                                     IBM
    NextShares                                                      False
    Name: IBM, dtype: object

Stooq Index Data

Google finance doesn't provide common index data download. The Stooq site has the data for download.

.. ipython:: python

    import pandas_datareader.data as web
    f = web.DataReader('^DJI', 'stooq')
    f[:10]

MOEX Data

The Moscow Exchange (MOEX) provides historical data.

.. ipython:: python

   import pandas_datareader.data as web
   f = web.DataReader('USD000UTSTOM', 'moex', start='2017-07-01', end='2017-07-31')
   f.head()