In [16]:
# libraries 
import yfinance as yf
import pandas as pd
import numpy as np

# 1. Download Historical Stocks Prices

In this notebook we download historical prices for S&P500 components. We use the [Yahoo Finance Python API](https://pypi.org/project/yfinance/). Other cool data sources:
- [Quandl](https://www.quandl.com/tools/python)
- [Tiingo](https://pypi.org/project/tiingo/)

First, we define functions to download historical prices from Yahoo Finance and to calculate log returns $R_{t} = \ln (p_{t} / p_{t-1})$:

In [17]:
def download_price(stock, start, end):
    
    """
    This function downloads historical prices for a specific stock in a given time window 
    using yahoo finance python API:
        :param stock (string): ticker of stock
        :param start (string): start date (format yyyy-mm-dd)
        :param end (string): end date (format yyyy-mm-dd)      
        :return: pandas DataFrame of 'Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume' ('Date' index)
    """
    
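    try:
        # auto_adjust=False keeps the 'Adj Close' column documented above
        # (recent yfinance releases adjust prices by default)
        prices = yf.download(stock, start=start, end=end, auto_adjust=False)
        return prices
    except Exception as exc:
        # catch Exception instead of a bare except, so Ctrl-C still interrupts
        print('error with ticker: ', stock, '-', exc)
        return pd.DataFrame()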
    
    
def logreturn(prices, column='Adj Close'):
    
    """
    This function computes log returns from a time series of prices
        :param prices (DataFrame): time series of prices
        :param column (string, default='Adj Close'): column on which we compute log returns
        :return: time series of log returns
    """
    
    return np.log(prices[column] / prices[column].shift(1))
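
As a quick sanity check of the log-return formula, here is a minimal sketch on a made-up three-day price series. The prices 100, 110 and 121 are illustrative values, not market data, and `logreturn` is repeated only so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

def logreturn(prices, column='Adj Close'):
    # same definition as above, repeated so this snippet is self-contained
    return np.log(prices[column] / prices[column].shift(1))

# hypothetical prices growing by 10% per day (illustrative values only)
toy = pd.DataFrame(
    {'Adj Close': [100.0, 110.0, 121.0]},
    index=pd.date_range('2020-01-01', periods=3, name='Date'),
)

toy_ret = logreturn(toy)

# the first value is NaN because there is no previous price;
# each later value equals ln(1.1), roughly 0.0953
print(toy_ret)

# log returns add up over time: their sum is ln(last price / first price)
total = toy_ret.sum()
print(total)
```

The leading NaN is why the saved `LogRet_AdjClose` column starts with a missing value for every stock.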

We import the tickers (i.e. the short symbols that identify the stocks) of S&P500 components. These are taken from the [Wikipedia List of S&P 500 companies](https://en.wikipedia.org/wiki/List_of_S%26P_500_companies) (more details [here](https://analyzingalpha.com/sp500-historical-components-and-changes)):

In [18]:
tickers = pd.read_csv('../data/tickers.csv')
tickers.head()

Unnamed: 0,ticker
0,A
1,AAL
2,AAP
3,AAPL
4,ABBV
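
The `Unnamed: 0` column in the output above is what pandas produces when a CSV is written with its index and read back without declaring it. That this is how `tickers.csv` was created is an assumption. A minimal sketch, using an in-memory stand-in for the file, of how `index_col=0` would avoid the extra column:

```python
import io
import pandas as pd

# in-memory stand-in for '../data/tickers.csv'
# (assumed layout: a saved index column followed by 'ticker')
csv_text = ",ticker\n0,A\n1,AAL\n2,AAP\n3,AAPL\n4,ABBV\n"

# default read: the unnamed first column appears as 'Unnamed: 0'
raw = pd.read_csv(io.StringIO(csv_text))

# index_col=0 treats the first column as the row index instead
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(raw.columns))
print(list(clean.columns))
```

Either way, `row['ticker']` works unchanged in the download loop; the second form just keeps the frame free of the redundant column.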


Finally, we download prices and save them to CSV files:

In [19]:
start = '1990-01-01'
end   = '2021-01-01'

for index, row in tickers.iterrows():
    
    # show advancement
    print('Downloading stock: ', row['ticker'])
    prices = download_price(row['ticker'], start, end)
    
    # if not empty (i.e. ticker found)
    if not prices.empty:
        
        # compute log returns and save
        prices['LogRet_AdjClose'] = logreturn(prices)
        prices.to_csv('../data/prices/' + row['ticker'] + '.csv')

Downloading stock:  A
[*********************100%***********************]  1 of 1 completed
Downloading stock:  AAL
[*********************100%***********************]  1 of 1 completed
Downloading stock:  AAP
[*********************100%***********************]  1 of 1 completed
Downloading stock:  AAPL
[*********************100%***********************]  1 of 1 completed
Downloading stock:  ABBV
[*********************100%***********************]  1 of 1 completed
Downloading stock:  ABC
[*********************100%***********************]  1 of 1 completed
Downloading stock:  ABK
[*********************100%***********************]  1 of 1 completed
Downloading stock:  ABMD
[*********************100%***********************]  1 of 1 completed
Downloading stock:  ABT
[*********************100%***********************]  1 of 1 completed
Downloading stock:  ACAS
[*********************100%***********************]  1 of 1 completed

1 Failed download:
- ACAS: No data found for this date range, symbo

  result = getattr(ufunc, method)(*inputs, **kwargs)


[*********************100%***********************]  1 of 1 completed
Downloading stock:  CHTR
[*********************100%***********************]  1 of 1 completed
Downloading stock:  CI
[*********************100%***********************]  1 of 1 completed
Downloading stock:  CINF
[*********************100%***********************]  1 of 1 completed
Downloading stock:  CL
[*********************100%***********************]  1 of 1 completed
Downloading stock:  CLF
[*********************100%***********************]  1 of 1 completed
Downloading stock:  CLX
[*********************100%***********************]  1 of 1 completed
Downloading stock:  CMA
[*********************100%***********************]  1 of 1 completed
Downloading stock:  CMCSA
[*********************100%***********************]  1 of 1 completed
Downloading stock:  CMCSK
[*********************100%***********************]  1 of 1 completed

1 Failed download:
- CMCSK: No data found for this date range, symbol may be delisted
Dow