yfinance takes a ticker symbol and a period, and returns a DataFrame with the historical data for that symbol. The first step is to get a list of all traded ticker symbols, then feed each one into yfinance to get its full history, and store the results as CSV.

#### Get list of stock symbols traded

In [1]:
import pandas as pd

data = pd.read_csv('http://www.nasdaqtrader.com/dynamic/SymDir/nasdaqtraded.txt', sep='|')
data.head()

Unnamed: 0,Nasdaq Traded,Symbol,Security Name,Listing Exchange,Market Category,ETF,Round Lot Size,Test Issue,Financial Status,CQS Symbol,NASDAQ Symbol,NextShares
0,Y,A,"Agilent Technologies, Inc. Common Stock",N,,N,100.0,N,,A,A,N
1,Y,AA,Alcoa Corporation Common Stock,N,,N,100.0,N,,AA,AA,N
2,Y,AAA,Listed Funds Trust AAF First Priority CLO Bond...,P,,Y,100.0,N,,AAA,AAA,N
3,Y,AAAU,Goldman Sachs Physical Gold ETF Shares,P,,Y,100.0,N,,AAAU,AAAU,N
4,Y,AAC.U,"Ares Acquisition Corporation Units, each consi...",N,,N,100.0,N,,AAC.U,AAC=,N


Store this file as metadata in case we need it later.

In [2]:
data.to_csv('metadata.csv')

Looking into what these fields mean, I found this definition of `Test Issue`:
> Indicates whether the security is a test security.

Since these are tests and not actual securities, they can safely be ignored.

In [3]:
data['Test Issue'].value_counts()

N    9830
Y      34
Name: Test Issue, dtype: int64

In [4]:
data = data[data['Test Issue'] == 'N']
len(data)

9830

In [5]:
stock_symbols = data['Symbol'].to_list()
stock_symbols[:5]

['A', 'AA', 'AAA', 'AAAU', 'AAC.U']

#### Get historical data for each symbol and store in CSV

In [6]:
!mkdir -p data

Although `yfinance.download` can take a string of many ticker symbols, I kept running into threading or JSON decoding errors after a long wait, and then had to start over. I switched to downloading one symbol at a time in a for loop and storing each result as its own CSV. That makes it easy to resume from where a run failed or left off: check which CSVs already exist and skip the yfinance request for those symbols.

In [7]:
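import os

import yfinance as yf

data_dir = os.path.join(os.getcwd(), 'data')

# Skip symbols whose CSV already exists so a rerun resumes where it left off.
files_downloaded = set(os.listdir(data_dir))
not_downloaded_yet = [symbol for symbol in stock_symbols if f'{symbol}.csv' not in files_downloaded]

for symbol in not_downloaded_yet:
    # One request per symbol; each result is written to its own CSV.
    stock_data = yf.download(symbol, period='max')
    stock_data.to_csv(os.path.join(data_dir, f'{symbol}.csv'))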
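If a single request raises (for example one of the JSON decoding errors mentioned above), the loop stops and has to be restarted by hand. A minimal sketch of a variant that records the failure and moves on is shown here. The names `download_all`, `default_fetch`, and the `fetch` parameter are hypothetical and only used for illustration; `fetch` is assumed to return an object with a `to_csv` method, as `yf.download` does.

```python
from pathlib import Path


def default_fetch(symbol):
    """Fetch the full history with yfinance (imported lazily, only when called)."""
    import yfinance as yf
    return yf.download(symbol, period='max')


def download_all(symbols, data_dir, fetch=default_fetch):
    """Download every symbol that has no CSV yet; return (symbol, error) pairs."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for symbol in symbols:
        target = data_dir / f'{symbol}.csv'
        if target.exists():
            continue  # downloaded in an earlier run, skip the request
        try:
            fetch(symbol).to_csv(target)
        except Exception as exc:  # deliberately broad: record and keep going
            failed.append((symbol, repr(exc)))
    return failed
```

Calling `download_all(stock_symbols, 'data')` again after a partial run only retries the symbols that still have no file, and the returned list shows which ones raised.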

Check how many individual files we ended up with vs. the total number of symbols we attempted.

In [8]:
# Re-list the directory: files_downloaded was captured before the loop ran.
len(os.listdir(data_dir))

9830
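A matching file count does not by itself show that every file contains data, since writing an empty DataFrame still produces a CSV. As a rough sketch, the helper below lists symbols whose file has nothing beyond its header. The name `symbols_without_rows` and the `header_lines` parameter are hypothetical; the default of one header line is an assumption and should be raised if your CSVs start with more than one header row.

```python
import csv
from pathlib import Path


def symbols_without_rows(data_dir, header_lines=1):
    """Return symbols whose CSV has no non-empty rows beyond the header lines."""
    empty = []
    for path in sorted(Path(data_dir).glob('*.csv')):
        with path.open(newline='') as f:
            # Count non-empty rows without loading the whole file into memory.
            row_count = sum(1 for row in csv.reader(f) if row)
        if row_count <= header_lines:
            empty.append(path.stem)  # 'AAC.U.csv' -> 'AAC.U'
    return empty
```

Running `symbols_without_rows('data')` gives a list of symbols worth retrying or dropping before further analysis.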