yfinance (https://pypi.org/project/yfinance/) is an open-source tool
that uses Yahoo's publicly available APIs to download financial data.

GOOD FOR daily and coarser resolutions, but intraday data has LIMITATIONS on how far back it goes:

- Minute Data: 7 days
- 2 Minute Data: 60 days
- 5 Minute Data: 60 days
- 15 Minute Data: 60 days
- 30 Minute Data: 60 days
- Hourly Data: 730 days
- Daily/Weekly/Monthly: No limit

valid periods: 1d,5d,1mo,3mo,6mo,1y,2y,5y,10y,ytd,max

valid intervals: 1m,2m,5m,15m,30m,60m,90m,1h,1d,5d,1wk,1mo,3mo
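The lookback limits listed above can be kept in a small table and checked before a download is requested. This is a minimal sketch: the names `MAX_LOOKBACK_DAYS` and `earliest_start` are hypothetical, and treating `60m` the same as `1h` is an assumption, since the list only mentions hourly data. Intervals the list does not cover (such as `90m`) are deliberately rejected rather than guessed.

```python
from datetime import date, timedelta

# Maximum lookback in days per interval, copied from the list above.
# None means "no limit". "60m" is assumed to behave like "1h".
MAX_LOOKBACK_DAYS = {
    "1m": 7,
    "2m": 60, "5m": 60, "15m": 60, "30m": 60,
    "60m": 730, "1h": 730,
    "1d": None, "1wk": None, "1mo": None,
}


def earliest_start(interval, today=None):
    """Return the earliest start date the table allows, or None if unlimited."""
    if interval not in MAX_LOOKBACK_DAYS:
        raise ValueError(f"no lookback limit recorded for interval {interval!r}")
    days = MAX_LOOKBACK_DAYS[interval]
    if days is None:
        return None
    today = today or date.today()
    return today - timedelta(days=days)


print(earliest_start("1m", today=date(2022, 10, 28)))  # 2022-10-21
print(earliest_start("1d"))                            # None
```

A date returned by this helper could be passed as the `start` argument instead of a `period` string when the requested window would otherwise exceed the limit.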

<!-- EXAMPLE USAGE:

tickers = 'MA V'
start = '2011-12-30'
end = '2022-01-01'
data = pd.DataFrame()
data = yf.download(tickers, start, end)['Close']
data
yf.Ticker("MA").calendar # next event
yf.Ticker("MA").earnings_dates # historical events
yf.Ticker("MA").recommendations # grades
yf.Ticker("MA").actions # dividends & splits -->

In [737]:
import numpy as np
import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt
from pathlib import Path

# create the data directory if it does not exist yet
Path("data").mkdir(parents=True, exist_ok=True)

### GET TICKER NAMES

In [738]:
# get dataframe from the previous step
df = pd.read_pickle("pairs_to_download.pkl")

tickerStrings = list(df.index.union(df.columns))
tickerStrings

['AEE', 'AEP', 'CMS', 'CNP', 'DUK', 'ED', 'OGE', 'VST', 'XEL']

### DOWNLOAD & CREATE CSV FILE

#### OPTION 1 (DOWNLOAD & CREATE A FILE FOR EACH TICKER PER INTERVAL)

In [739]:
int_per = {'1d':'1y', '1h':'1y'}  # define interval and corresponding period

# uncomment to enter the tickers manually
#tickerStrings = ['MA', 'V', 'LNT', 'FTS', 'POR', 'CMS', 'OUT', 'WELL']

for ticker in tickerStrings:
    for key in int_per:
        data = yf.download(ticker, group_by="Ticker", period=int_per[key], interval=key)
        data['ticker'] = ticker
        # the datetime index arrives named 'Date', 'Datetime', or unnamed, so give it one fixed name
        data.index.names = ['time']
    
        # uncomment if sorting or renaming is needed
        #data = data.set_index(["time"]).sort_index()
        #data = data.rename(columns={"Date": "time"})

        # save as separate files
        data.to_csv(f'data/{ticker}_{key.upper()}.csv')

[*********************100%***********************]  1 of 1 completed

Check if downloaded correctly:

In [740]:
filename = 'data/' +  tickerStrings[0] + '_1H.csv'
df = pd.read_csv(filename, parse_dates=["time"])
df

,time,Open,High,Low,Close,Adj Close,Volume,ticker
0,2021-10-29 09:30:00-04:00,84.709999,85.290001,84.599998,85.059998,85.059998,0,AEE
1,2021-10-29 10:30:00-04:00,85.070000,85.070000,84.330002,84.370003,84.370003,45237,AEE
2,2021-10-29 11:30:00-04:00,84.370003,84.559998,84.220001,84.330002,84.330002,52396,AEE
3,2021-10-29 12:30:00-04:00,84.349998,84.419998,84.269997,84.300003,84.300003,33964,AEE
4,2021-10-29 13:30:00-04:00,84.315002,84.315002,83.959999,84.000000,84.000000,48334,AEE
...,...,...,...,...,...,...,...,...
1757,2022-10-28 12:30:00-04:00,81.760002,81.820000,81.610001,81.629997,81.629997,55766,AEE
1758,2022-10-28 13:30:00-04:00,81.660004,81.754997,81.580002,81.750000,81.750000,63663,AEE
1759,2022-10-28 14:30:00-04:00,81.739998,82.360001,81.739998,82.339996,82.339996,148978,AEE
1760,2022-10-28 15:30:00-04:00,82.320999,82.449997,82.279999,82.290001,82.290001,152771,AEE
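Eyeballing the first and last rows catches gross failures only. The sketch below shows one way to check a downloaded file programmatically. It runs on a small synthetic CSV rather than a real download, and `check_prices` is a hypothetical helper, not part of yfinance or pandas. One detail worth noting: the hourly timestamps carry a UTC offset, and a one-year window crosses a daylight-saving change, so the offsets in one file can differ (`-04:00` and `-05:00`). Converting with `utc=True` puts them on a common footing.

```python
import io

import pandas as pd

# Synthetic stand-in for one hourly CSV file; the second row falls after
# the November 2021 DST change, so its UTC offset differs from the first.
csv_text = """time,Open,High,Low,Close,Adj Close,Volume,ticker
2021-11-05 15:30:00-04:00,84.0,84.5,83.9,84.2,84.2,1000,AEE
2021-11-08 09:30:00-05:00,84.3,84.8,84.1,84.6,84.6,1200,AEE
"""


def check_prices(df):
    """Return a list of problems found in one downloaded frame."""
    problems = []
    # normalise mixed UTC offsets to a single timezone before comparing
    times = pd.to_datetime(df["time"], utc=True)
    if not times.is_monotonic_increasing:
        problems.append("timestamps are not sorted")
    if times.duplicated().any():
        problems.append("duplicate timestamps")
    if df[["Open", "High", "Low", "Close"]].isna().any().any():
        problems.append("missing OHLC values")
    if (df["High"] < df["Low"]).any():
        problems.append("High below Low")
    return problems


df_check = pd.read_csv(io.StringIO(csv_text))
problems = check_prices(df_check)
print(problems or "no problems found")
```

The same function could be called in a loop over every file in `data/`, collecting the problem lists per ticker.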


#### OPTION 2 (DOWNLOAD & CREATE SINGLE DF FROM ALL TICKERS)

In [741]:
int_per = {'1d':'1y', '1h':'1y'}  # define interval and corresponding period

df_list = list()

for key in int_per:
    for ticker in tickerStrings:
        data = yf.download(ticker, group_by="Ticker", period=int_per[key], interval=key)
        data['ticker'] = ticker
        data.index.names = ['time']
        df_list.append(data)

    # combine all dataframes into a single dataframe
    df = pd.concat(df_list)

    # save to csv
    df.to_csv('data/tickers_'+key.upper()+'.csv')
    
    df_list = []

[*********************100%***********************]  1 of 1 completed

Check if downloaded correctly:

In [742]:
filename = 'data/tickers_1H.csv'
df = pd.read_csv(filename, parse_dates=["time"])

In [743]:
df_c = df.set_index(["ticker", "time"]).sort_index() # set indexes
df_c
df_c.xs(tickerStrings[0]) # check the first ticker

time,Open,High,Low,Close,Adj Close,Volume
2021-10-29 09:30:00-04:00,84.709999,85.290001,84.599998,85.059998,85.059998,0
2021-10-29 10:30:00-04:00,85.070000,85.070000,84.330002,84.370003,84.370003,45237
2021-10-29 11:30:00-04:00,84.370003,84.559998,84.220001,84.330002,84.330002,52396
2021-10-29 12:30:00-04:00,84.349998,84.419998,84.269997,84.300003,84.300003,33964
2021-10-29 13:30:00-04:00,84.315002,84.315002,83.959999,84.000000,84.000000,48334
...,...,...,...,...,...,...
2022-10-28 12:30:00-04:00,81.760002,81.820000,81.610001,81.629997,81.629997,55766
2022-10-28 13:30:00-04:00,81.660004,81.754997,81.580002,81.750000,81.750000,63663
2022-10-28 14:30:00-04:00,81.739998,82.360001,81.739998,82.339996,82.339996,148978
2022-10-28 15:30:00-04:00,82.320999,82.449997,82.279999,82.290001,82.290001,152771
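With the `(ticker, time)` index in place, the combined frame can be sliced per ticker or reshaped so that each ticker gets its own column. The following is a minimal sketch on synthetic data standing in for `data/tickers_1H.csv`; the AEP prices are made up for illustration and the variable names are hypothetical.

```python
import pandas as pd

# Synthetic long-format frame: one row per (ticker, time) pair
long_df = pd.DataFrame({
    "time": pd.to_datetime(["2022-10-28 14:30", "2022-10-28 15:30"] * 2),
    "ticker": ["AEE", "AEE", "AEP", "AEP"],
    "Close": [82.34, 82.29, 88.10, 88.25],
})

indexed = long_df.set_index(["ticker", "time"]).sort_index()

# one ticker as a plain time-indexed frame
aee = indexed.xs("AEE")

# all tickers side by side: one Close column per ticker
wide_close = indexed["Close"].unstack("ticker")
print(wide_close)
```

The wide layout is convenient when two tickers need to be compared row by row, since both series then share one time index.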


#### EXERCISE (DOWNLOAD MULTIPLE TICKERS AND FLATTEN THE LEVELS)

In [744]:
data = yf.download(  # or pdr.get_data_yahoo(...
        # tickers list or string as well
        tickers = "OUT WELL",

        # use "period" instead of start/end
        # valid periods: 1d,5d,1mo,3mo,6mo,1y,2y,5y,10y,ytd,max
        # (optional, default is '1mo')
        period = "1mo",

        # fetch data by interval (including intraday if period < 60 days)
        # valid intervals: 1m,2m,5m,15m,30m,60m,90m,1h,1d,5d,1wk,1mo,3mo
        # (optional, default is '1d')
        interval = "30m",

        # group by ticker (to access via data['SPY'])
        # (optional, default is 'column')
        group_by = 'ticker',

        # adjust all OHLC automatically
        # (optional, default is False)
        auto_adjust = True,

        # download pre/post regular market hours data
        # (optional, default is False)
        prepost = False,

        # use threads for mass downloading? (True/False/Integer)
        # (optional, default is True)
        threads = True,

        # proxy URL scheme to use when downloading?
        # (optional, default is None)
        proxy = None
    )
data

[*********************100%***********************]  2 of 2 completed


,OUT,OUT,OUT,OUT,OUT,WELL,WELL,WELL,WELL,WELL
Datetime,Open,High,Low,Close,Volume,Open,High,Low,Close,Volume
2022-09-29 09:30:00-04:00,15.730,15.745000,15.100000,15.165000,127557,64.629997,64.959999,62.900002,63.110001,212260
2022-09-29 10:00:00-04:00,15.160,15.195000,15.030000,15.080000,119118,63.080002,63.150002,62.650002,63.060001,413215
2022-09-29 10:30:00-04:00,15.090,15.130000,14.970000,15.090000,96243,63.040001,63.590000,62.889999,63.569000,144799
2022-09-29 11:00:00-04:00,15.080,15.225000,15.070000,15.215000,97446,63.529999,63.935001,63.509998,63.770000,150106
2022-09-29 11:30:00-04:00,15.220,15.320000,15.150000,15.275000,109828,63.830002,64.315002,63.810001,64.160004,124114
...,...,...,...,...,...,...,...,...,...,...
2022-10-28 14:00:00-04:00,18.055,18.110001,18.030001,18.045000,138847,60.910000,60.998501,60.880001,60.919998,56700
2022-10-28 14:30:00-04:00,18.045,18.264999,18.035000,18.264999,91410,60.919998,61.349998,60.910000,61.345001,239021
2022-10-28 15:00:00-04:00,18.270,18.334999,18.200001,18.334999,113195,61.349998,61.490002,61.340000,61.439999,148446
2022-10-28 15:30:00-04:00,18.330,18.580000,18.285000,18.549999,414983,61.439999,61.689999,61.369999,61.459999,596001


To flatten the MultiIndex columns into single-level names, use map with join:

In [745]:
data_flat = data.copy()
data_flat.columns = data_flat.columns.map('_'.join)
data_flat = data_flat.reset_index()
data_flat

,Datetime,OUT_Open,OUT_High,OUT_Low,OUT_Close,OUT_Volume,WELL_Open,WELL_High,WELL_Low,WELL_Close,WELL_Volume
0,2022-09-29 09:30:00-04:00,15.730,15.745000,15.100000,15.165000,127557,64.629997,64.959999,62.900002,63.110001,212260
1,2022-09-29 10:00:00-04:00,15.160,15.195000,15.030000,15.080000,119118,63.080002,63.150002,62.650002,63.060001,413215
2,2022-09-29 10:30:00-04:00,15.090,15.130000,14.970000,15.090000,96243,63.040001,63.590000,62.889999,63.569000,144799
3,2022-09-29 11:00:00-04:00,15.080,15.225000,15.070000,15.215000,97446,63.529999,63.935001,63.509998,63.770000,150106
4,2022-09-29 11:30:00-04:00,15.220,15.320000,15.150000,15.275000,109828,63.830002,64.315002,63.810001,64.160004,124114
...,...,...,...,...,...,...,...,...,...,...,...
282,2022-10-28 14:00:00-04:00,18.055,18.110001,18.030001,18.045000,138847,60.910000,60.998501,60.880001,60.919998,56700
283,2022-10-28 14:30:00-04:00,18.045,18.264999,18.035000,18.264999,91410,60.919998,61.349998,60.910000,61.345001,239021
284,2022-10-28 15:00:00-04:00,18.270,18.334999,18.200001,18.334999,113195,61.349998,61.490002,61.340000,61.439999,148446
285,2022-10-28 15:30:00-04:00,18.330,18.580000,18.285000,18.549999,414983,61.439999,61.689999,61.369999,61.459999,596001
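The flattening step can be tried without a download. The sketch below rebuilds a two-ticker frame from a few of the values shown above (rounded, Open and Close only), flattens it the same way, and then splits the names again to show that the operation is reversible. The variable names are hypothetical, and splitting on the first underscore assumes that no ticker symbol itself contains an underscore.

```python
import pandas as pd

# Small two-level frame: first level is the ticker, second the field
cols = pd.MultiIndex.from_product([["OUT", "WELL"], ["Open", "Close"]])
wide = pd.DataFrame(
    [[15.73, 15.165, 64.63, 63.11],
     [15.16, 15.08, 63.08, 63.06]],
    index=pd.DatetimeIndex(["2022-09-29 09:30", "2022-09-29 10:00"], name="Datetime"),
    columns=cols,
)

# flatten: ("OUT", "Open") becomes "OUT_Open"
flat = wide.copy()
flat.columns = flat.columns.map("_".join)
flat = flat.reset_index()
print(list(flat.columns))

# reverse: split each name on the first underscore to rebuild the two levels
restored = flat.set_index("Datetime")
restored.columns = pd.MultiIndex.from_tuples(
    [tuple(name.split("_", 1)) for name in restored.columns]
)
```

Note that `'_'.join` only works when every column level holds strings, which is the case for the ticker and field names here.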


Alternatively, select each ticker through the column index values and save its Close series as a separate file:

In [746]:
# the first level of the column MultiIndex holds the ticker symbols
ticker_list = data.columns.get_level_values(0).unique()

for ticker in ticker_list:
    # keep only the Close column and name the datetime index "time"
    data_i = data[ticker][['Close']].rename_axis('time').sort_index()
    # save as separate files
    data_i.to_csv(f'data/ticker_{ticker}.csv')