## Tutorial 1: Loading data using `yfinance`

* Recommended reading: 
    * Reliably download historical market data from Yahoo! Finance with Python: https://aroussi.com/post/python-yahoo-finance

In [109]:
import yfinance as yf
import pandas as pd

In [112]:
# all the tickers mentioned in the spec
TICKERS = [
    "AAPL", "MSFT", "AMZN", "GOOG", 
    "INTC", "NVDA", "META", "CSCO", 
    "TSLA", "ORCL", "IBM", "CRM", 
    "TSM", "ADBE", "QCOM", "TCEHY", 
    "AVGO", "BABA"
]
# yfinance treats `end` as exclusive, so the last row is the trading day before end_date
start_date = "2015-04-30"
end_date = "2024-04-30"

import os

# make sure the output directory exists before writing CSVs
os.makedirs("./data", exist_ok=True)

# download the daily price history of every ticker and save it to CSV
for tk in TICKERS:
    data = yf.download(tk, start=start_date, end=end_date)
    # if any values are missing, forward-fill them from the previous trading day
    if data.isnull().values.any():
        print("> {} found null. Forward filled.".format(tk))
        data = data.ffill()
    # save data
    path = "./data/{}_{}_TO_{}.csv".format(tk, start_date, end_date)
    data.to_csv(path)
    print("> {} saved. size = {}".format(tk, data.shape))

> AAPL saved. size = (2265, 6)
> MSFT saved. size = (2265, 6)
> AMZN saved. size = (2265, 6)
> GOOG saved. size = (2265, 6)
> INTC saved. size = (2265, 6)
> NVDA saved. size = (2265, 6)
> META saved. size = (2265, 6)
> CSCO saved. size = (2265, 6)
> TSLA saved. size = (2265, 6)
> ORCL saved. size = (2265, 6)
> IBM saved. size = (2265, 6)
> CRM saved. size = (2265, 6)
> TSM saved. size = (2265, 6)
> ADBE saved. size = (2265, 6)
> QCOM saved. size = (2265, 6)
> TCEHY saved. size = (2265, 6)
> AVGO saved. size = (2265, 6)
> BABA saved. size = (2265, 6)
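Forward filling only copies the last valid value forward, so a gap at the very start of a series (before its first valid observation) stays NaN and would still trip the null check in the next cell. The sketch below, built around a hypothetical helper `ffill_and_report` that is not part of the original notebook, makes that visible by counting the NaNs left after `DataFrame.ffill()`.

```python
import numpy as np
import pandas as pd


def ffill_and_report(df):
    """Hypothetical helper: forward-fill gaps and count what remains.

    Returns (filled_df, n_remaining_nans). Forward fill copies the last
    valid observation forward, so NaNs before a column's first valid
    value cannot be filled and are still counted.
    """
    filled = df.ffill()
    remaining = int(filled.isnull().sum().sum())
    return filled, remaining


# toy data (not from the downloaded files): a leading gap and an inner gap
example = pd.DataFrame({"Close": [np.nan, 10.0, np.nan, 12.0]})
filled, remaining = ffill_and_report(example)
# filled["Close"] is [NaN, 10.0, 10.0, 12.0]; remaining is 1
```

If `remaining` is non-zero, a backward fill or dropping the leading rows would be needed before the assertion below can pass.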


In [81]:
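# check that every saved file exists and contains no missing values
for tk in TICKERS:
    path = "./data/{}_{}_TO_{}.csv".format(tk, start_date, end_date)
    try:
        data = pd.read_csv(path)
    except FileNotFoundError:
        print("{} in the range {} to {} does not exist locally.".format(tk, start_date, end_date))
        # skip the null check instead of re-checking the previous ticker's data
        continue
    assert not data.isnull().values.any(), "{} contains missing values".format(tk)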

In [54]:
data.shape

(2418, 7)

Ensure that all date indices match.

In [94]:
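# compare every ticker's Date column with the first ticker's
reference = None
for tk in TICKERS:
    path = "./data/{}_{}_TO_{}.csv".format(tk, start_date, end_date)
    dates = pd.read_csv(path).Date
    if reference is None:
        reference = dates
    else:
        # Series.equals returns False (instead of raising) when lengths differ
        assert dates.equals(reference), "{} has different dates".format(tk)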
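The assertion above only says that some ticker's dates differ, not which dates. A minimal diagnostic sketch, using a hypothetical helper `date_mismatches` (not part of the original notebook), lists for each ticker the dates present in at least one other file but missing from its own:

```python
def date_mismatches(dates_by_ticker):
    """Hypothetical helper: report missing dates per ticker.

    dates_by_ticker maps a ticker to an iterable of dates (e.g. the Date
    column of its CSV). Returns {ticker: sorted missing dates}, including
    only tickers that lack at least one date seen in another file.
    """
    all_dates = set()
    for dates in dates_by_ticker.values():
        all_dates.update(dates)

    report = {}
    for tk, dates in dates_by_ticker.items():
        missing = sorted(all_dates - set(dates))
        if missing:
            report[tk] = missing
    return report


# usage with the saved files (assumes the CSVs from the download step exist):
# dates_by_ticker = {
#     tk: pd.read_csv("./data/{}_{}_TO_{}.csv".format(tk, start_date, end_date)).Date
#     for tk in TICKERS
# }
# print(date_mismatches(dates_by_ticker))
```

An empty result means every file covers the same set of dates; combined with equal lengths and ordering, that is what the assertion checks.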

### Get Fundamentals

We are mainly interested in each company's market cap at Q1 2024. `yfinance` does not appear to provide historical market cap directly through its API, so one workaround is:
1. find the date on which the company published its quarterly report;
2. query the total number of shares outstanding and the closing price (`close`) on that date;
3. compute $\text{market cap} = \text{total number of shares} \times \text{close}$.
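Steps 2 and 3 above can be sketched with plain pandas, assuming you already have a shares-outstanding series and a closing-price series, each indexed by date. For instance, recent `yfinance` versions offer `Ticker.get_shares_full(start=..., end=...)` for share counts, but treat that source as an assumption. The helper `market_cap_on` is hypothetical. It uses `Series.asof`, so a report date that falls on a non-trading day picks up the most recent earlier value.

```python
import pandas as pd


def market_cap_on(report_date, shares, close):
    """Hypothetical helper: market cap = shares outstanding * close.

    shares and close are pd.Series indexed by dates. For each series the
    last value at or before report_date is used (Series.asof); the result
    is NaN if either series starts after report_date.
    """
    report_date = pd.Timestamp(report_date)
    # asof needs a sorted index; keep the last share count reported per day
    shares = shares.sort_index()
    shares = shares[~shares.index.duplicated(keep="last")]
    close = close.sort_index()
    n_shares = shares.asof(report_date)
    price = close.asof(report_date)
    return float(n_shares * price)
```

With toy data (shares of 110 as of 2024-03-01 and a close of 12.0 on 2024-03-28), a report date of 2024-03-30 gives $110 \times 12.0 = 1320.0$.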

**Update**: the current market cap is reported instead.

Save an example data frame as an image.

In [107]:
import dataframe_image as dfi

In [108]:
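import os

# `tk` and `data` are left over from the loops above (the last ticker in TICKERS)
os.makedirs("./img", exist_ok=True)
dfi.export(data.head(), "./img/{}_dataframe.png".format(tk))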