# Data download and basic preparation
### Rodrigo Leo
---

This script uses the `yfinance` library, an unofficial interface to Yahoo! Finance data, to download historical data for selected markets. It saves the original daily observations and a copy with one observation per year as `.csv` files.

```python
# Import the required libraries
import glob
import os
import pandas as pd
import yfinance as yf
```

The markets to download are specified in the `tickers` dictionary. Each key is a descriptive name for the data (for example, `sp500`), and each value is the corresponding symbol used on the Yahoo! Finance website (for example, `^GSPC`). Both keys and values are strings.

```python
# Dictionary of data to download
tickers = {
    'ftse100': '^FTSE',
    'ni225':   '^N225',
    'sp500':   '^GSPC'
}
```

## Download data

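For each ticker, the full available history is downloaded and only the daily closing prices up to 31 December 2020 are kept. The result is saved in a `downloads` folder as `name.csv`, where `name` is the corresponding key of the `tickers` dictionary.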

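```python
# Create the output folder if it does not exist yet
os.makedirs('downloads', exist_ok=True)

results = {}

for ticker, symbol in tickers.items():

    # Download the full available history
    t = yf.Ticker(symbol).history(period='max')

    # Keep only the closing price
    t = t[['Close']]

    # Recent yfinance versions return a timezone-aware index; drop the
    # timezone so the dates are saved (and later parsed) as plain dates
    if t.index.tz is not None:
        t.index = t.index.tz_localize(None)

    # Keep only the values up to the end of 2020
    t = t.loc[:'2020-12-31']

    # Save the data frame to a .csv file
    t.to_csv(f'downloads/{ticker}.csv')

    # Record the number of observations kept
    results[ticker] = len(t)
```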
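The filtering in the download loop can be tried offline on made-up data. The sketch below assumes a price table shaped like the output of `history()` in recent `yfinance` versions, with a timezone-aware `Date` index; the `America/New_York` timezone and all values are illustrative assumptions. It keeps only `Close`, removes the timezone while keeping the local calendar dates, and cuts the data at the end of 2020 with label-based slicing, which includes the end date.

```python
import pandas as pd

# Made-up daily prices around the end of 2020 (illustrative values only),
# with a timezone-aware index similar to what recent yfinance versions return
dates = pd.date_range('2020-12-29', periods=5, freq='D',
                      tz='America/New_York', name='Date')
t = pd.DataFrame(
    {
        'Open': [1.0, 2.0, 3.0, 4.0, 5.0],
        'Close': [1.5, 2.5, 3.5, 4.5, 5.5],
        'Volume': [100, 200, 300, 400, 500],
    },
    index=dates,
)

# Keep only the closing price
t = t[['Close']]

# Drop the timezone, keeping the local calendar dates
t.index = t.index.tz_localize(None)

# Keep only the values up to the end of 2020 (the end label is included)
t = t.loc[:'2020-12-31']

# Three rows remain: 2020-12-29, 2020-12-30 and 2020-12-31
print(t)
```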

```python
# Show number of observations for each ticker
for ticker, count in results.items():
    print(f'{ticker}: {count} observations found.')
```

```
ftse100: 9415 observations found.
ni225: 13772 observations found.
sp500: 23362 observations found.
```


## Convert to yearly observations

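The last available observation of each year (the closing price on the last trading day of that year) is then extracted from each daily file and saved in the same `downloads` folder as `name_yearly.csv`, where `name` is the corresponding key of the `tickers` dictionary.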
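The conversion keeps, for every year, the row with the latest date actually present in the data rather than 31 December, which may fall on a weekend or a market holiday. A minimal sketch of that idea on made-up prices, grouping by the year of the index and keeping the last row of each group with `tail(1)` (this assumes the rows are in chronological order):

```python
import pandas as pd

# Made-up daily closing prices spanning two years (illustrative values only)
data = pd.DataFrame(
    {'Close': [10.0, 11.0, 12.0, 13.0, 14.0]},
    index=pd.to_datetime(
        ['2019-12-27', '2019-12-30', '2020-01-02', '2020-12-30', '2020-12-31']
    ).rename('Date'),
)

# Group the rows by calendar year and keep the last row of each group;
# the original trading dates are preserved in the index
yearly = data.groupby(data.index.year).tail(1)

print(yearly)
```

For 2019 the kept row is dated 2019-12-30, the last date present in the data, not 2019-12-31.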

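```python
# Get the list of daily files, skipping any yearly files from a previous run
files = sorted(f for f in glob.glob('downloads/*.csv') if not f.endswith('_yearly.csv'))

# Show the list of files
for file in files:
    print(file)
```

```
downloads/ftse100.csv
downloads/ni225.csv
downloads/sp500.csv
```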
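On its own, the pattern `downloads/*.csv` also matches the `*_yearly.csv` files created by the conversion step, so running the notebook a second time would produce files such as `sp500_yearly_yearly.csv`. The hypothetical helper `daily_files` below is a minimal sketch of one way to list only the daily files and derive each data name from the file name using `os.path` instead of searching for `'/'`, tried on a temporary folder with empty placeholder files.

```python
import glob
import os
import tempfile


def daily_files(folder):
    """Return (name, path) pairs for the daily .csv files in `folder`,
    skipping any '*_yearly.csv' files written by a previous run."""
    pairs = []
    for path in sorted(glob.glob(os.path.join(folder, '*.csv'))):
        # os.path.basename uses the operating system's path separator,
        # unlike searching the string for '/'
        name = os.path.splitext(os.path.basename(path))[0]
        if not name.endswith('_yearly'):
            pairs.append((name, path))
    return pairs


# Usage example on a temporary folder with empty placeholder files
with tempfile.TemporaryDirectory() as folder:
    for fname in ['sp500.csv', 'sp500_yearly.csv', 'ftse100.csv']:
        open(os.path.join(folder, fname), 'w').close()
    found = [name for name, _ in daily_files(folder)]

print(found)  # ['ftse100', 'sp500']
```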


```python
for file in files:

    # Get the data name from the file name (e.g. 'downloads/sp500.csv' -> 'sp500')
    name = os.path.splitext(os.path.basename(file))[0]

    # Load the data
    data = pd.read_csv(file)

    # Convert the 'Date' column to datetime, parsing only the 'YYYY-MM-DD' part
    # so that any timezone offset written alongside the dates is ignored
    data['Date'] = pd.to_datetime(data['Date'].str[:10])

    # Use 'Date' as the index and make sure the rows are in chronological order
    data = data.set_index('Date').sort_index()

    # Keep the last available observation (the last trading day) of each year
    closing = data.groupby(data.index.year).tail(1)

    # Save the data frame to a .csv file
    closing.to_csv(f'downloads/{name}_yearly.csv')
```