yfinance (https://pypi.org/project/yfinance/) is an open-source tool
that uses Yahoo's publicly available APIs to download finance data.

It works well for intervals longer than 30 minutes. Shorter intervals (and, to a lesser extent, hourly data) only go back a limited number of days:

- Minute Data: 7 days
- 2 Minute Data: 60 days
- 5 Minute Data: 60 days
- 15 Minute Data: 60 days
- 30 Minute Data: 60 days
- Hourly Data: 730 days
- Daily/Weekly/Monthly: No limit

valid periods: 1d,5d,1mo,3mo,6mo,1y,2y,5y,10y,ytd,max

valid intervals: 1m,2m,5m,15m,30m,60m,90m,1h,1d,5d,1wk,1mo,3mo
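The lookback limits above can be checked before a download is requested. The following is a minimal sketch: `MAX_LOOKBACK_DAYS`, `PERIOD_DAYS`, and `period_fits` are hypothetical names, not part of yfinance, the day counts for each period string are approximations, and intervals the list does not mention (60m, 90m, 5d, 3mo) are left out rather than guessed.

```python
# Maximum lookback in days per interval, copied from the list above.
# None means no limit.
MAX_LOOKBACK_DAYS = {
    "1m": 7, "2m": 60, "5m": 60, "15m": 60, "30m": 60,
    "1h": 730, "1d": None, "1wk": None, "1mo": None,
}

# Approximate length in days of the fixed-length period strings (assumption).
# "ytd" and "max" depend on the current date, so they are not covered.
PERIOD_DAYS = {
    "1d": 1, "5d": 5, "1mo": 31, "3mo": 92, "6mo": 183,
    "1y": 365, "2y": 730, "5y": 1826, "10y": 3652,
}


def period_fits(interval: str, period: str) -> bool:
    """Return True if the requested period is within the interval's lookback limit."""
    limit = MAX_LOOKBACK_DAYS[interval]
    if limit is None:
        return True
    return PERIOD_DAYS[period] <= limit


print(period_fits("1h", "1y"))   # hourly data for one year
print(period_fits("1m", "1mo"))  # minute data for one month
```

A check like this is only a rough guard; the actual limits are enforced by Yahoo and may change.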

<!-- EXAMPLE USAGE:

tickers = 'MA V'
start = '2011-12-30'
end = '2022-01-01'
data = yf.download(tickers, start=start, end=end)['Close']
data
yf.Ticker("MA").calendar # next event
yf.Ticker("MA").earnings_dates # historical events
yf.Ticker("MA").recommendations # grades
yf.Ticker("MA").actions # dividends & splits -->

In [56]:
import numpy as np
import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt
from pathlib import Path

# create the data directory if it does not exist
Path("data").mkdir(parents=True, exist_ok=True)

### GET TICKER NAMES

In [57]:
# get dataframe from the previous step
df = pd.read_pickle("pairs_to_download.pkl")

tickerStrings = list(df.index.union(df.columns))
tickerStrings

['APO',
 'BAC',
 'BAM',
 'BEN',
 'BX',
 'C',
 'CADE',
 'CG',
 'COF',
 'JHG',
 'KKR',
 'MS',
 'RF',
 'SCHW',
 'STT',
 'TROW',
 'TW',
 'USB']

### DOWNLOAD & CREATE CSV FILE

#### SETUP

In [58]:
int_per = {'1d':'1y', '1h':'1y'}  # define interval and corresponding period

#### OPTION 1 (DOWNLOAD & CREATE A FILE FOR EACH TICKER PER INTERVAL)

In [59]:
# int_per = {'1d':'3mo', '1h':'3mo'}  # define interval and corresponding period


# enable to enter manually
#tickerStrings = ['MA', 'V', 'LNT', 'FTS', 'POR', 'CMS', 'OUT', 'WELL']

for ticker in tickerStrings:
    for key in int_per:
        data = yf.download(ticker, group_by="Ticker", period=int_per[key], interval=key)
        data['ticker'] = ticker
        # the datetime index comes with different names (or unnamed), so give it one name
        data.index.names = ['time']

        # use if sorting or renaming is needed
        #data = data.set_index(["time"]).sort_index()
        #data = data.rename(columns={"Date": "time"})

        # save as separate files
        data.to_csv(f'data/{ticker}_{key.upper()}.csv')

[*********************100%***********************]  1 of 1 completed
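The per-ticker loop writes a file for every download, even if nothing came back. The sketch below adds a guard for that case, assuming a failed or empty download arrives as an empty DataFrame. `save_if_not_empty` is a hypothetical helper, and the frames are made up so the example runs without a network connection.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_if_not_empty(data: pd.DataFrame, ticker: str, interval: str, out_dir: str = "data") -> bool:
    """Write data to <out_dir>/<ticker>_<INTERVAL>.csv unless the frame is empty."""
    if data.empty:
        print(f"skipping {ticker} ({interval}): no rows returned")
        return False
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    data = data.copy()            # leave the caller's frame untouched
    data["ticker"] = ticker
    data.index.name = "time"
    data.to_csv(out / f"{ticker}_{interval.upper()}.csv")
    return True


# Demo with made-up frames instead of a real download.
good = pd.DataFrame({"Close": [1.0, 2.0]}, index=pd.to_datetime(["2022-10-28", "2022-10-31"]))
empty = pd.DataFrame()

with tempfile.TemporaryDirectory() as tmp:
    wrote_good = save_if_not_empty(good, "AAA", "1d", tmp)
    wrote_empty = save_if_not_empty(empty, "BBB", "1d", tmp)
    written = sorted(p.name for p in Path(tmp).iterdir())
```

In the loop, the call would take the place of the `data.to_csv(...)` line.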

Check if downloaded correctly:

In [60]:
filename = 'data/' +  tickerStrings[0] + '_1H.csv'
df = pd.read_csv(filename, parse_dates=["time"])
df

,time,Open,High,Low,Close,Adj Close,Volume,ticker
0,2021-11-01 09:30:00-04:00,77.620003,78.010002,76.379997,76.514999,76.514999,799831,APO
1,2021-11-01 10:30:00-04:00,76.589996,76.889999,75.849998,75.910004,75.910004,823608,APO
2,2021-11-01 11:30:00-04:00,76.260002,76.370003,75.570000,76.080002,76.080002,722579,APO
3,2021-11-01 12:30:00-04:00,75.739998,76.639999,75.650002,76.559998,76.559998,641080,APO
4,2021-11-01 13:30:00-04:00,76.324997,76.639999,76.139999,76.290001,76.290001,675032,APO
...,...,...,...,...,...,...,...,...
1757,2022-10-31 11:30:00-04:00,55.432499,55.930000,55.410000,55.820000,55.820000,326874,APO
1758,2022-10-31 12:30:00-04:00,55.810001,55.959999,55.549999,55.950001,55.950001,275487,APO
1759,2022-10-31 13:30:00-04:00,55.990002,56.000000,55.639999,55.639999,55.639999,167958,APO
1760,2022-10-31 14:30:00-04:00,55.639999,55.830002,55.595001,55.700001,55.700001,90865,APO
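The timestamps in the file carry a UTC offset (-04:00 in the rows shown). A year of hourly data spans a daylight-saving change, so the column likely mixes -04:00 and -05:00, and depending on the pandas version `parse_dates` may then leave it as plain objects. A minimal sketch of a more robust parse, using a small made-up CSV in place of the real file and assuming the exchange time zone is America/New_York:

```python
import io

import pandas as pd

# Made-up rows: one before and one after the November 2021 clock change.
csv_text = """time,Close,ticker
2021-11-01 09:30:00-04:00,1.0,AAA
2021-11-08 09:30:00-05:00,2.0,AAA
"""

df_demo = pd.read_csv(io.StringIO(csv_text))

# Convert everything to UTC first; this works with mixed offsets.
utc_time = pd.to_datetime(df_demo["time"], utc=True)

# Then convert back to exchange-local time if that is more convenient.
local_time = utc_time.dt.tz_convert("America/New_York")

df_demo["time"] = local_time
print(df_demo)
```

Both rows end up at 09:30 local time, while the UTC values differ by an hour.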


#### OPTION 2 (DOWNLOAD & CREATE SINGLE DF FROM ALL TICKERS)

In [61]:
# int_per = {'1d':'1y', '1h':'1y'}  # define interval and corresponding period

for key in int_per:
    df_list = []  # start a fresh list for each interval

    for ticker in tickerStrings:
        data = yf.download(ticker, group_by="Ticker", period=int_per[key], interval=key)
        data['ticker'] = ticker
        data.index.names = ['time']
        df_list.append(data)

    # combine all dataframes into a single dataframe
    df = pd.concat(df_list)

    # save to csv
    df.to_csv(f'data/tickers_{key.upper()}.csv')

[*********************100%***********************]  1 of 1 completed

Check if downloaded correctly:

In [62]:
filename = 'data/tickers_1H.csv'
df = pd.read_csv(filename, parse_dates=["time"])

In [63]:
df_c = df.set_index(["ticker", "time"]).sort_index()  # index by ticker, then time
df_c.xs(tickerStrings[0])  # cross-section for the first ticker

time,Open,High,Low,Close,Adj Close,Volume
2021-11-01 09:30:00-04:00,77.620003,78.010002,76.379997,76.514999,76.514999,799831
2021-11-01 10:30:00-04:00,76.589996,76.889999,75.849998,75.910004,75.910004,823608
2021-11-01 11:30:00-04:00,76.260002,76.370003,75.570000,76.080002,76.080002,722579
2021-11-01 12:30:00-04:00,75.739998,76.639999,75.650002,76.559998,76.559998,641080
2021-11-01 13:30:00-04:00,76.324997,76.639999,76.139999,76.290001,76.290001,675032
...,...,...,...,...,...,...
2022-10-31 11:30:00-04:00,55.432499,55.930000,55.410000,55.820000,55.820000,326874
2022-10-31 12:30:00-04:00,55.810001,55.959999,55.549999,55.950001,55.950001,275487
2022-10-31 13:30:00-04:00,55.990002,56.000000,55.639999,55.639999,55.639999,167958
2022-10-31 14:30:00-04:00,55.639999,55.830002,55.595001,55.700001,55.700001,90865
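The combined file is in long format: one row per ticker and timestamp. The sketch below rebuilds that layout from two made-up frames (the tickers `AAA` and `BBB` and their values are placeholders), then shows the `xs` cross-section used above and, as one possible next step, a pivot to one Close column per ticker.

```python
import pandas as pd

times = pd.to_datetime(["2022-10-31 09:30", "2022-10-31 10:30"])

# Build one small frame per ticker, the same way the download loop does.
frames = []
for ticker, base in [("AAA", 10.0), ("BBB", 20.0)]:
    frame = pd.DataFrame({"Close": [base, base + 1.0]}, index=times)
    frame.index.name = "time"
    frame["ticker"] = ticker
    frames.append(frame)

# Long format: stack the frames and index by (ticker, time).
long_df = pd.concat(frames).reset_index()
indexed = long_df.set_index(["ticker", "time"]).sort_index()

one_ticker = indexed.xs("AAA")                    # rows for one ticker, indexed by time
wide_close = indexed["Close"].unstack("ticker")   # one Close column per ticker

print(one_ticker)
print(wide_close)
```

The wide layout is often handier when prices of several tickers are compared side by side.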


#### EXERCISE (DOWNLOAD MULTIPLE TICKERS AND FLATTEN THE LEVELS)

In [64]:
data = yf.download(  # or pdr.get_data_yahoo(...
        # tickers list or string as well
        tickers = "OUT WELL",

        # use "period" instead of start/end
        # valid periods: 1d,5d,1mo,3mo,6mo,1y,2y,5y,10y,ytd,max
        # (optional, default is '1mo')
        period = "1mo",

        # fetch data by interval (including intraday if period < 60 days)
        # valid intervals: 1m,2m,5m,15m,30m,60m,90m,1h,1d,5d,1wk,1mo,3mo
        # (optional, default is '1d')
        interval = "30m",

        # group by ticker (to access via data['SPY'])
        # (optional, default is 'column')
        group_by = 'ticker',

        # adjust all OHLC automatically
        # (optional, default is False)
        auto_adjust = True,

        # download pre/post regular market hours data
        # (optional, default is False)
        prepost = False,

        # use threads for mass downloading? (True/False/Integer)
        # (optional, default is True)
        threads = True,

        # proxy URL scheme to use when downloading
        # (optional, default is None)
        proxy = None
    )
data

[*********************100%***********************]  2 of 2 completed


,WELL,WELL,WELL,WELL,WELL,OUT,OUT,OUT,OUT,OUT
Datetime,Open,High,Low,Close,Volume,Open,High,Low,Close,Volume
2022-09-30 14:30:00-04:00,64.309998,64.334999,64.209999,64.209999,0,15.190000,15.190000,15.150000,15.160000,0
2022-09-30 15:00:00-04:00,64.209999,64.440002,64.150002,64.440002,208633,15.160000,15.295000,15.145000,15.295000,153201
2022-09-30 15:30:00-04:00,64.440002,64.665001,64.279999,64.330002,554969,15.300000,15.330000,15.170000,15.190000,378534
2022-10-03 09:30:00-04:00,65.099998,65.300003,63.849998,64.169998,256361,15.520000,15.540000,15.010000,15.100000,70525
2022-10-03 10:00:00-04:00,64.220001,65.029999,64.180000,64.995003,222513,15.140000,15.470000,15.120000,15.450000,83731
...,...,...,...,...,...,...,...,...,...,...
2022-10-31 12:30:00-04:00,61.180000,61.240002,61.000000,61.055000,117275,18.459999,18.510000,18.400000,18.440001,54625
2022-10-31 13:00:00-04:00,61.040001,61.189999,60.965000,61.189999,92784,18.455000,18.455000,18.299999,18.320000,84271
2022-10-31 13:30:00-04:00,61.189999,61.240002,61.049999,61.095001,79221,18.309999,18.340000,18.250000,18.280001,102875
2022-10-31 14:00:00-04:00,61.099998,61.209999,61.060001,61.150002,53796,18.275000,18.299999,18.170000,18.184999,119722


To flatten the MultiIndex columns, use map with join:

In [65]:
data_flat = data.copy()
data_flat.columns = data_flat.columns.map('_'.join)
data_flat = data_flat.reset_index()
data_flat

,Datetime,WELL_Open,WELL_High,WELL_Low,WELL_Close,WELL_Volume,OUT_Open,OUT_High,OUT_Low,OUT_Close,OUT_Volume
0,2022-09-30 14:30:00-04:00,64.309998,64.334999,64.209999,64.209999,0,15.190000,15.190000,15.150000,15.160000,0
1,2022-09-30 15:00:00-04:00,64.209999,64.440002,64.150002,64.440002,208633,15.160000,15.295000,15.145000,15.295000,153201
2,2022-09-30 15:30:00-04:00,64.440002,64.665001,64.279999,64.330002,554969,15.300000,15.330000,15.170000,15.190000,378534
3,2022-10-03 09:30:00-04:00,65.099998,65.300003,63.849998,64.169998,256361,15.520000,15.540000,15.010000,15.100000,70525
4,2022-10-03 10:00:00-04:00,64.220001,65.029999,64.180000,64.995003,222513,15.140000,15.470000,15.120000,15.450000,83731
...,...,...,...,...,...,...,...,...,...,...,...
269,2022-10-31 12:30:00-04:00,61.180000,61.240002,61.000000,61.055000,117275,18.459999,18.510000,18.400000,18.440001,54625
270,2022-10-31 13:00:00-04:00,61.040001,61.189999,60.965000,61.189999,92784,18.455000,18.455000,18.299999,18.320000,84271
271,2022-10-31 13:30:00-04:00,61.189999,61.240002,61.049999,61.095001,79221,18.309999,18.340000,18.250000,18.280001,102875
272,2022-10-31 14:00:00-04:00,61.099998,61.209999,61.060001,61.150002,53796,18.275000,18.299999,18.170000,18.184999,119722
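Flattening with `'_'.join` can be undone as long as the ticker symbols contain no underscore. A minimal sketch on a made-up two-level frame (values are placeholders), splitting each flat name at the first underscore:

```python
import pandas as pd

# Two-level columns shaped like the download above: (ticker, field).
cols = pd.MultiIndex.from_product([["WELL", "OUT"], ["Open", "Close"]])
demo = pd.DataFrame([[1.0, 2.0, 3.0, 4.0]], columns=cols)

# Flatten: ("WELL", "Open") becomes "WELL_Open".
flat = demo.copy()
flat.columns = flat.columns.map("_".join)

# Restore: split at the first underscore only.
restored = flat.copy()
restored.columns = pd.MultiIndex.from_tuples(
    [tuple(name.split("_", 1)) for name in flat.columns]
)

print(list(flat.columns))
print(list(restored.columns))
```

If a symbol could contain an underscore, a separator that cannot appear in either level would be a safer choice.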


Alternatively, use the values of the column MultiIndex to select each ticker's Close column and save it as a separate file:

In [66]:
multiindex = data.columns
ticker_list = {item[0] for item in multiindex}

for ticker in ticker_list:
    data_i = data[(ticker, 'Close')].reset_index().droplevel(level=0, axis=1)
    data_i = data_i.rename(columns={ data_i.columns[0]: "time" })
    data_i = data_i.set_index(["time"]).sort_index()
    # save as separate files
    data_i.to_csv(f'data/ticker_{ticker}.csv')
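When only one field is needed for every ticker, a cross-section over the second column level gives all of them in a single frame. This is a sketch on made-up data with the same (ticker, field) column layout, not a replacement for the loop above, which writes one file per ticker.

```python
import pandas as pd

# Made-up frame with (ticker, field) columns and a "Datetime" index.
cols = pd.MultiIndex.from_product([["WELL", "OUT"], ["Open", "Close"]])
idx = pd.DatetimeIndex(["2022-10-31 13:30", "2022-10-31 14:00"], name="Datetime")
demo = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]],
    index=idx,
    columns=cols,
)

# Select "Close" on column level 1; the field level is dropped,
# leaving one column per ticker.
closes = demo.xs("Close", axis=1, level=1)
closes.index.name = "time"

print(closes)
```

From here a single `closes.to_csv(...)` call would save all closing prices in one file.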