## Create a Stock Price Database with Python & SQL

### Contents:

1. Create functions
2. Dow Jones Industrial Average (DJIA)
    - Scrape the ticker list from Wikipedia
    - Download historical data from Yahoo! Finance's API
3. S&P/TSX 60
    - Scrape the ticker list from Wikipedia
    - Clean the ticker list to match Yahoo! Finance's symbol format
    - Download historical data from Yahoo! Finance's API
4. Download fundamental data from Yahoo! Finance's API
5. Market information
    - Download historical market data for major indexes
6. Create the SQLite database and import the data

I am going to use this data to build a financial analysis dashboard in Power BI and to answer some financial analysis questions using SQL.

In [6]:
# Import important packages
import sqlalchemy
import yfinance as yf
import pandas as pd

##### Key parameters of `yf.download()` and `Ticker.history()`:
* _start_ and _end_: the date range to download (the end date is exclusive)
* _period_: use instead of start/end
* valid periods: 1d, 5d, 1mo, 3mo, 6mo, 1y, 2y, 5y, 10y, ytd, max
* valid intervals: 1m, 2m, 5m, 15m, 30m, 60m, 90m, 1h, 1d, 5d, 1wk, 1mo, 3mo
* _actions=True_: (optional) include dividends and stock splits
* _prepost=True_: (optional) include pre-market and after-hours data
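These options are passed as keyword arguments. A minimal sketch, assuming you want to catch a mistyped value before any network request is made: the hypothetical helper `download_kwargs` checks `period` and `interval` against the values listed above and returns a dict that can be unpacked into `yf.download(ticker, **kwargs)`. Newer yfinance releases may accept other values, so treat the two sets as a snapshot rather than an authoritative list.

```python
# Hypothetical helper: validate yfinance download options before calling the API.
# The sets mirror the lists above; weekly bars are spelled '1wk' in yfinance.
VALID_PERIODS = {"1d", "5d", "1mo", "3mo", "6mo", "1y", "2y", "5y", "10y", "ytd", "max"}
VALID_INTERVALS = {"1m", "2m", "5m", "15m", "30m", "60m", "90m", "1h",
                   "1d", "5d", "1wk", "1mo", "3mo"}


def download_kwargs(period="max", interval="1d", actions=False, prepost=False):
    """Return keyword arguments for yf.download(), raising ValueError on bad values."""
    if period not in VALID_PERIODS:
        raise ValueError(f"invalid period: {period!r}")
    if interval not in VALID_INTERVALS:
        raise ValueError(f"invalid interval: {interval!r}")
    return {"period": period, "interval": interval,
            "actions": actions, "prepost": prepost}


kwargs = download_kwargs(period="5y", interval="1wk", actions=True)
print(kwargs)
# With network access and yfinance installed: yf.download("MMM", **kwargs)
```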

## 1. Create Functions

In [7]:
# Function to download the full price history ('max' period) for each ticker;
# returns one DataFrame per ticker, with Date moved from the index to a column
def getData(tickers):
    data = []
    for ticker in tickers:
        data.append(yf.download(ticker, period='max').reset_index())
    return data


# Function to add a Symbol column to each ticker's DataFrame and stack them into one DataFrame
def merge_data(frames, symbols):
    # assign() returns a copy, so the downloaded frames are left unchanged
    tagged = [df.assign(Symbol=symbol) for df, symbol in zip(frames, symbols)]
    # Stack vertically (one row per ticker per date) with a fresh 0..n-1 index
    return pd.concat(tagged, ignore_index=True)


# Function to create a SQLAlchemy engine for a SQLite file named <name>.db
def createengine(name):
    engine = sqlalchemy.create_engine('sqlite:///' + name + '.db')
    return engine


# Function to write a DataFrame to a database table;
# if_exists='replace' lets the notebook be re-run without a "table already exists" error
def toSQL(frames, name_table, engine):
    frames.to_sql(name_table, engine, index=False, if_exists='replace')
    print('Successfully imported data')
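`DataFrame.to_sql` defaults to `if_exists='fail'`, so writing to a table that already exists raises `ValueError`. The sketch below demonstrates this against an in-memory SQLite database, using Python's built-in `sqlite3` module instead of the SQLAlchemy engine (pandas accepts a `sqlite3` connection directly). The table name and the two sample rows are illustrative only.

```python
import sqlite3

import pandas as pd

# Sample rows shaped like the price tables (Date, Close, Symbol)
prices = pd.DataFrame({"Date": ["2022-09-12", "2022-09-13"],
                       "Close": [138.07, 135.22],
                       "Symbol": ["WMT", "WMT"]})

conn = sqlite3.connect(":memory:")
prices.to_sql("DJIA", conn, index=False)  # first write creates the table

try:
    # Default if_exists='fail': writing to an existing table raises ValueError
    prices.to_sql("DJIA", conn, index=False)
    rerun_failed = False
except ValueError:
    rerun_failed = True

# 'replace' drops and recreates the table, so re-running the notebook is safe;
# 'append' would add the same rows a second time instead
prices.to_sql("DJIA", conn, index=False, if_exists="replace")
row_count = conn.execute("SELECT COUNT(*) FROM DJIA").fetchone()[0]
conn.close()
print(rerun_failed, row_count)
```

`if_exists='append'` is the better fit when new daily rows are added to an existing table rather than rebuilding it.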

## 2. Dow Jones Industrial Average (^DJI)

### Scrape ticker lists from Wikipedia

In [8]:
# Scrape the DJIA component tickers from Wikipedia (the components table is at index 1 on the page)
wiki = 'https://en.wikipedia.org/wiki/'

tickerDOW = pd.read_html(wiki+'Dow_Jones_Industrial_Average')[1].Symbol.to_list()

### Download historical data from Yahoo! Finance's API

In [9]:
USDOWJONES = getData(tickerDOW)

[*********************100%***********************]  1 of 1 completed

Since each ticker is downloaded into its own DataFrame, I add a Symbol column to each one and merge them into a single DataFrame.
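A self-contained sketch of the same idea on two tiny frames: the tickers AAA and BBB and their prices are made up, and the helper name `stack_with_symbol` is hypothetical. It tags each frame with `assign` and stacks them with `pd.concat(..., ignore_index=True)`, producing one long table with one row per ticker per date.

```python
import pandas as pd


def stack_with_symbol(frames, symbols):
    """Add a Symbol column to each frame and stack the frames vertically."""
    tagged = [df.assign(Symbol=symbol) for df, symbol in zip(frames, symbols)]
    return pd.concat(tagged, ignore_index=True)


# Two tiny, made-up price frames standing in for yf.download(...).reset_index()
aaa = pd.DataFrame({"Date": pd.to_datetime(["2022-09-12", "2022-09-13"]),
                    "Close": [10.0, 10.5]})
bbb = pd.DataFrame({"Date": pd.to_datetime(["2022-09-12", "2022-09-13"]),
                    "Close": [20.0, 19.5]})

long_df = stack_with_symbol([aaa, bbb], ["AAA", "BBB"])
print(long_df)
```

This long format maps directly onto a single SQL table with a Symbol column, which is what the later `to_sql` calls write.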

In [10]:
new_USDOWJONES = merge_data(USDOWJONES, tickerDOW)

In [11]:
new_USDOWJONES

|  | Date | Open | High | Low | Close | Adj Close | Volume | Symbol |
|---|---|---|---|---|---|---|---|---|
| 0 | 1962-01-02 | 0.000000 | 4.250000 | 4.125000 | 4.156250 | 0.754036 | 212800 | MMM |
| 1 | 1962-01-03 | 0.000000 | 4.187500 | 4.085938 | 4.187500 | 0.759706 | 422400 | MMM |
| 2 | 1962-01-04 | 0.000000 | 4.257813 | 4.187500 | 4.187500 | 0.759706 | 212800 | MMM |
| 3 | 1962-01-05 | 0.000000 | 4.171875 | 4.062500 | 4.078125 | 0.739863 | 315200 | MMM |
| 4 | 1962-01-08 | 0.000000 | 4.085938 | 4.031250 | 4.054688 | 0.735610 | 334400 | MMM |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 344531 | 2022-09-08 | 135.399994 | 136.869995 | 134.880005 | 136.429993 | 136.429993 | 5652000 | WMT |
| 344532 | 2022-09-09 | 136.300003 | 137.500000 | 136.130005 | 136.839996 | 136.839996 | 5380200 | WMT |
| 344533 | 2022-09-12 | 137.080002 | 138.250000 | 136.970001 | 138.070007 | 138.070007 | 4761500 | WMT |
| 344534 | 2022-09-13 | 136.860001 | 137.949997 | 134.809998 | 135.220001 | 135.220001 | 5895800 | WMT |


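Check that the data was downloaded correctly by comparing the first rows with a direct download of MMM (the `end` date is exclusive, so the direct download stops at 1962-01-05):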

In [12]:
new_USDOWJONES.head()

|  | Date | Open | High | Low | Close | Adj Close | Volume | Symbol |
|---|---|---|---|---|---|---|---|---|
| 0 | 1962-01-02 | 0.0 | 4.25 | 4.125 | 4.15625 | 0.754036 | 212800 | MMM |
| 1 | 1962-01-03 | 0.0 | 4.1875 | 4.085938 | 4.1875 | 0.759706 | 422400 | MMM |
| 2 | 1962-01-04 | 0.0 | 4.257813 | 4.1875 | 4.1875 | 0.759706 | 212800 | MMM |
| 3 | 1962-01-05 | 0.0 | 4.171875 | 4.0625 | 4.078125 | 0.739863 | 315200 | MMM |
| 4 | 1962-01-08 | 0.0 | 4.085938 | 4.03125 | 4.054688 | 0.73561 | 334400 | MMM |


In [13]:
yf.download('MMM', start='1962-01-02', end='1962-01-08').reset_index()

[*********************100%***********************]  1 of 1 completed


|  | Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|---|
| 0 | 1962-01-02 | 0.0 | 4.25 | 4.125 | 4.15625 | 0.754036 | 212800 |
| 1 | 1962-01-03 | 0.0 | 4.1875 | 4.085938 | 4.1875 | 0.759706 | 422400 |
| 2 | 1962-01-04 | 0.0 | 4.257813 | 4.1875 | 4.1875 | 0.759706 | 212800 |
| 3 | 1962-01-05 | 0.0 | 4.171875 | 4.0625 | 4.078125 | 0.739862 | 315200 |
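Comparing rows by eye works for a handful of values. A sketch of a programmatic check, assuming both frames have a `Date` column and share the price columns: the hypothetical helper `matches_reference` filters the merged frame to one symbol and to the reference dates, then calls `pd.testing.assert_frame_equal` with a relative tolerance. A tolerance is needed because the two outputs above differ slightly in `Adj Close` on 1962-01-05 (0.739863 vs 0.739862). The sample values are taken from those outputs.

```python
import pandas as pd


def matches_reference(merged, reference, symbol, rtol=1e-5):
    """Return True if merged rows for `symbol` match `reference` on its dates."""
    subset = merged[merged["Symbol"] == symbol].drop(columns="Symbol")
    subset = subset[subset["Date"].isin(reference["Date"])].reset_index(drop=True)
    ref = reference.reset_index(drop=True)[subset.columns]
    try:
        # Compare values within a relative tolerance instead of exactly
        pd.testing.assert_frame_equal(subset, ref, check_exact=False, rtol=rtol)
        return True
    except AssertionError:
        return False


merged = pd.DataFrame({
    "Date": pd.to_datetime(["1962-01-04", "1962-01-05", "1962-01-08"]),
    "Close": [4.1875, 4.078125, 4.054688],
    "Adj Close": [0.759706, 0.739863, 0.73561],
    "Symbol": ["MMM", "MMM", "MMM"],
})
reference = pd.DataFrame({
    "Date": pd.to_datetime(["1962-01-04", "1962-01-05"]),
    "Close": [4.1875, 4.078125],
    "Adj Close": [0.759706, 0.739862],
})

print(matches_reference(merged, reference, "MMM"))
```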


## 3. S&P/TSX 60: a stock market index of 60 large companies listed on the Toronto Stock Exchange
Summary: the Dow Jones tickers download without any problem, but the TSX60 tickers scraped from Wikipedia fail when passed to Yahoo! Finance's API because:
1. Some tickers use '.' where Yahoo! Finance expects '-'; for example, BAM.A should be BAM-A.
2. Yahoo! Finance requires TSX tickers to end with '.TO'.

I am going to fix these issues.

### Scrape ticker lists from Wikipedia

In [14]:
tickerTSX = pd.read_html(wiki+'S%26P/TSX_60')[0].Symbol.to_list()

In [15]:
tickerTSX[:10]

['AEM', 'AQN', 'ATD', 'BCE', 'BMO', 'BNS', 'ABX', 'BHC', 'BAM.A', 'BIP.UN']

### Clean the ticker list to match Yahoo! Finance's symbol format

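After downloading and checking manually, I realized that the ticker at index 39 is wrong: it is National Bank of Canada's ticker 'NA', which pandas reads as a missing value (NaN) by default.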

In [16]:
# Find entries that are not strings (e.g. NaN values produced by read_html)
for i, ticker in enumerate(tickerTSX):
    if not isinstance(ticker, str):
        print(ticker, i, type(ticker))

nan 39 <class 'float'>


In [17]:
# Restore the ticker 'NA' (National Bank of Canada), which pandas parsed as NaN
tickerTSX[39] = 'NA'
tickerTSX[39]

'NA'
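The NaN comes from pandas' default missing-value handling: the string 'NA' is one of the values its parsers treat as missing. `pd.read_html` accepts the same `keep_default_na` option as `pd.read_csv`; the sketch below shows the behaviour with `read_csv` on a small inline CSV, so no HTML parser is needed. Passing `keep_default_na=False` to the `read_html` call should likewise keep 'NA' as text and make the manual fix unnecessary.

```python
import io

import pandas as pd

csv_text = "Symbol,Company\nNA,National Bank of Canada\nRY,Royal Bank of Canada\n"

# Default parsing: 'NA' is on pandas' list of missing-value markers
default = pd.read_csv(io.StringIO(csv_text))
# keep_default_na=False: only explicitly listed na_values are treated as missing
kept = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

print(default["Symbol"].tolist())
print(kept["Symbol"].tolist())
```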

Several tickers use '.' where Yahoo! Finance expects '-', and every ticker needs '.TO' at the end.

In [18]:
for ticker in tickerTSX:
    if '.' in str(ticker):
        print(ticker)

BAM.A
BIP.UN
CCL.B
GIB.A
CAR.UN
CTC.A
RCI.B
SJR.B
TECK.B


In [19]:
# I tried to download using this ticker but it failed
yf.download('CTC.A')['Close']

[*********************100%***********************]  1 of 1 completed

1 Failed download:
- CTC.A: No data found, symbol may be delisted


Series([], Name: Close, dtype: float64)

In [20]:
#The correct ticker has '-' and '.TO'
yf.download('CTC-A.TO')['Close']

[*********************100%***********************]  1 of 1 completed


Date
1986-01-02     11.750000
1986-01-03     11.630000
1986-01-06     11.630000
1986-01-07     11.750000
1986-01-08     11.630000
                 ...    
2022-09-08    160.240005
2022-09-09    160.509995
2022-09-12    163.410004
2022-09-13    162.130005
2022-09-14    157.570007
Name: Close, Length: 9226, dtype: float64

In [21]:
# Replace '.' with '-' (e.g. BAM.A -> BAM-A) and append '.TO' to every ticker
newtickerTSX = []
for ticker in tickerTSX:
    if '.' in ticker:
        dot_index = ticker.rfind('.')
        newticker = ticker[:dot_index] + '-' + ticker[dot_index + 1:] + '.TO'
        newtickerTSX.append(newticker)
    else:
        newtickerTSX.append(ticker + '.TO')
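The conversion above can also be written as a small function that is easy to test. A minimal sketch (the name `to_yahoo_tsx` is hypothetical): it replaces every '.' with '-', which gives the same result as the `rfind` version for these tickers because each contains at most one dot, and it raises an error on non-string values such as the NaN found earlier instead of silently producing 'nan.TO'.

```python
def to_yahoo_tsx(ticker):
    """Convert a TSX ticker as listed on Wikipedia (e.g. 'BAM.A') to Yahoo! form ('BAM-A.TO')."""
    if not isinstance(ticker, str):
        # pandas may turn tickers such as 'NA' into NaN; fail loudly instead
        raise ValueError(f"ticker is not a string: {ticker!r}")
    return ticker.replace(".", "-") + ".TO"


print([to_yahoo_tsx(t) for t in ["AEM", "BAM.A", "BIP.UN", "NA"]])
```

Usage on the scraped list would be `newtickerTSX = [to_yahoo_tsx(t) for t in tickerTSX]`.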

### Download historical data from Yahoo! Finance's API

In [22]:
# Download historical price data
TSX60 = getData(newtickerTSX)

[*********************100%***********************]  1 of 1 completed

In [23]:
new_TSX60 = merge_data(TSX60, newtickerTSX)
new_TSX60.head()

|  | Date | Open | High | Low | Close | Adj Close | Volume | Symbol |
|---|---|---|---|---|---|---|---|---|
| 0 | 1995-01-12 | 14.88 | 15.0 | 14.5 | 14.5 | 11.853491 | 147100 | AEM.TO |
| 1 | 1995-01-13 | 14.38 | 14.5 | 14.13 | 14.25 | 11.649123 | 26500 | AEM.TO |
| 2 | 1995-01-16 | 14.25 | 14.25 | 13.88 | 13.88 | 11.346649 | 3800 | AEM.TO |
| 3 | 1995-01-17 | 14.25 | 14.25 | 13.88 | 13.88 | 11.346649 | 18500 | AEM.TO |
| 4 | 1995-01-18 | 14.13 | 14.5 | 14.13 | 14.25 | 11.649123 | 18500 | AEM.TO |


In [24]:
new_TSX60.tail()

|  | Date | Open | High | Low | Close | Adj Close | Volume | Symbol |
|---|---|---|---|---|---|---|---|---|
| 342431 | 2022-09-08 | 41.75 | 42.669998 | 41.560001 | 42.43 | 42.43 | 892900 | WPM.TO |
| 342432 | 2022-09-09 | 42.900002 | 43.209999 | 42.450001 | 43.060001 | 43.060001 | 957500 | WPM.TO |
| 342433 | 2022-09-12 | 43.970001 | 44.220001 | 43.43 | 43.82 | 43.82 | 1016700 | WPM.TO |
| 342434 | 2022-09-13 | 42.84 | 43.82 | 42.560001 | 42.849998 | 42.849998 | 954900 | WPM.TO |
| 342435 | 2022-09-14 | 43.150002 | 43.689999 | 42.869999 | 43.029999 | 43.029999 | 732500 | WPM.TO |


## 4. Download fundamental data from Yahoo! Finance's API

In [25]:
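# Function to download fundamental data: one row per ticker, built from yf.Ticker(ticker).info
def download_fundamental(tickers):
    frames = []
    for ticker in tickers:
        info = yf.Ticker(ticker).info
        frames.append(pd.DataFrame([info]))
    # DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; concatenate once instead
    df_fundamental = pd.concat(frames)
    return df_fundamental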
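Because `.info` returns a plain dictionary, another option is to collect the dictionaries in a list and build the DataFrame in one call. A minimal sketch with made-up dictionaries standing in for `.info` results (the helper `fundamentals_from_infos` and the tickers are hypothetical): pandas aligns the keys into columns and fills keys that a ticker lacks with NaN, which is why the real table ends up with so many sparse columns.

```python
import pandas as pd


def fundamentals_from_infos(infos):
    """Build one DataFrame from a list of .info-style dicts, one dict per ticker."""
    # Each dict becomes a row; keys missing from a dict become NaN in that row
    return pd.DataFrame(infos)


# Hypothetical, made-up stand-ins for yf.Ticker(t).info
infos = [
    {"symbol": "AAA.TO", "sector": "Energy", "currentPrice": 10.0},
    {"symbol": "BBB.TO", "currentPrice": 20.0, "trailingPE": 15.2},
]
fundamentals = fundamentals_from_infos(infos)
print(fundamentals)
# With network access: fundamentals_from_infos([yf.Ticker(t).info for t in tickers])
```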

#### Download TSX60 fundamental data

In [26]:

TSXFundamental = download_fundamental(new_TSX60)



In [27]:
# Looking at Fundamental dataset
TSXFundamental.shape

(8, 166)

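The result has only 8 rows and 166 columns instead of one row per TSX60 company. `new_TSX60` is a DataFrame, and looping over a DataFrame yields its 8 column names (Date, Open, High, Low, Close, Adj Close, Volume, Symbol), so `yf.Ticker` was called on those names rather than on the tickers. That is why the symbols below are DATE, OPEN, LOW and so on. Calling `download_fundamental(newtickerTSX)` passes the converted TSX60 tickers instead.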
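The shape (8, 166) follows from how Python iterates over a DataFrame: a `for` loop over a DataFrame yields its column labels, not its rows or values. A tiny sketch with one made-up row; `prices["Symbol"].unique()` shows one way to get a ticker list back out of a long price table.

```python
import pandas as pd

prices = pd.DataFrame({"Date": ["2022-09-14"], "Close": [43.03], "Symbol": ["WPM.TO"]})

# Iterating over a DataFrame yields its column labels
column_labels = [item for item in prices]
# The tickers themselves live in the Symbol column
tickers = prices["Symbol"].unique().tolist()

print(column_labels)
print(tickers)
```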

In [28]:
TSXFundamental.head()

|  | symbol | quoteType | exchange | exchangeTimezoneName | exchangeTimezoneShortName | gmtOffSetMilliseconds | market | isEsgPopulated | quoteSourceName | regularMarketOpen | ... | fiftyTwoWeekHigh | fiveYearAvgDividendYield | fiftyTwoWeekLow | bid | tradeable | dividendYield | bidSize | dayHigh | coinMarketCapLink | trailingPE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | DATE | MUTUALFUND | YHD | America/New_York | EDT | -14400000.0 | us_market | False | Delayed Quote |  | ... |  |  |  |  |  |  |  |  |  |  |
| 0 | OPEN | EQUITY | NMS | America/New_York | EDT | -14400000.0 | us_market | False |  | 4.22 | ... | 25.325 |  | 4.035 | 4.31 | False |  | 3000.0 | 4.48 |  |  |
| 0 |  |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 0 | LOW | EQUITY | NYQ | America/New_York | EDT | -14400000.0 | us_market | False |  | 193.32 | ... | 263.31 | 1.63 | 170.12 | 193.27 | False | 0.0217 | 800.0 | 194.49 |  | 15.172685 |
| 0 |  |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |


In [29]:
TSXFundamental.tail()

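|  | symbol | quoteType | exchange | exchangeTimezoneName | exchangeTimezoneShortName | gmtOffSetMilliseconds | market | isEsgPopulated | quoteSourceName | regularMarketOpen | ... | fiftyTwoWeekHigh | fiveYearAvgDividendYield | fiftyTwoWeekLow | bid | tradeable | dividendYield | bidSize | dayHigh | coinMarketCapLink | trailingPE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LOW | EQUITY | NYQ | America/New_York | EDT | -14400000.0 | us_market | False |  | 193.32 | ... | 263.31 | 1.63 | 170.12 | 193.27 | False | 0.0217 | 800.0 | 194.49 |  | 15.172685 |
| 0 |  |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 0 |  |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 0 |  |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 0 |  |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |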


In [30]:
TSXFundamental.isnull().sum()

symbol                       5
quoteType                    5
exchange                     5
exchangeTimezoneName         5
exchangeTimezoneShortName    5
                            ..
dividendYield                7
bidSize                      6
dayHigh                      6
coinMarketCapLink            8
trailingPE                   7
Length: 166, dtype: int64

Many columns are mostly or entirely empty. Instead of dropping every column that contains null values, I keep only the columns needed for later analysis and visualization.

In [31]:
# Select columns that we will use for future data analysis and visualization

select_columns = ['symbol', 'shortName','longName', 'sector', 'longBusinessSummary', 'city', 'state', 'country', 'website','industry','ebitdaMargins',
       'profitMargins', 'grossMargins', 'operatingCashflow',
       'revenueGrowth', 'operatingMargins', 'ebitda', 'targetLowPrice',
       'recommendationKey', 'grossProfits', 'freeCashflow',
       'targetMedianPrice', 'currentPrice', 'earningsGrowth',
       'currentRatio', 'returnOnAssets', 'numberOfAnalystOpinions',
       'targetMeanPrice', 'debtToEquity', 'returnOnEquity',
       'targetHighPrice', 'totalCash', 'totalDebt', 'totalRevenue',
       'totalCashPerShare', 'financialCurrency', 'revenuePerShare',
       'market',
       ]

In [32]:
subTSX_Fundamental = TSXFundamental[select_columns]
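Selecting with `TSXFundamental[select_columns]` raises `KeyError` if any requested column is absent, which can happen because the keys returned by `.info` vary between tickers and yfinance versions. A more tolerant sketch, shown on made-up data, is `reindex(columns=...)`: it keeps the requested column order and fills missing columns with NaN. Whether silently accepting a missing column is acceptable depends on the analysis.

```python
import pandas as pd

# Made-up fundamental rows; 'trailingPE' is absent from every row
fundamental = pd.DataFrame([{"symbol": "AAA.TO", "sector": "Energy"},
                            {"symbol": "BBB.TO", "sector": "Utilities"}])
wanted = ["symbol", "sector", "trailingPE"]

# Plain selection fails when a requested column does not exist
try:
    fundamental[wanted]
    strict_failed = False
except KeyError:
    strict_failed = True

# reindex keeps the requested order and fills missing columns with NaN
subset = fundamental.reindex(columns=wanted)
print(strict_failed)
print(subset)
```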

In [33]:
subTSX_Fundamental.head()

|  | symbol | shortName | longName | sector | longBusinessSummary | city | state | country | website | industry | ... | debtToEquity | returnOnEquity | targetHighPrice | totalCash | totalDebt | totalRevenue | totalCashPerShare | financialCurrency | revenuePerShare | market |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | DATE |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  | us_market |
| 0 | OPEN | Opendoor Technologies Inc | Opendoor Technologies Inc. | Real Estate | Opendoor Technologies Inc. operates a digital ... | Tempe | AZ | United States | https://www.opendoor.com | Real Estate Services | ... | 320.671 | -0.11479 | 19.0 | 2472000000.0 | 7555000000.0 | 15437000000.0 | 3.931 | USD | 25.095 | us_market |
| 0 |  |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 0 | LOW | Lowe's Companies, Inc. | Lowe's Companies, Inc. | Consumer Cyclical | Lowe's Companies, Inc., together with its subs... | Mooresville | NC | United States | https://www.lowes.com | Home Improvement Retail | ... |  |  | 300.0 | 2148000000.0 | 33661000000.0 | 95392000000.0 | 3.461 | USD | 143.447 | us_market |
| 0 |  |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |


#### Download Dow Jones fundamental data

In [34]:
DOWFundamental = download_fundamental(tickerDOW)


In [35]:
subDOWFundamental = DOWFundamental[select_columns]

## 5. Market information: download historical prices for major indexes

In [36]:
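# Yahoo! Finance symbols: S&P/TSX 60 (attempted as 'TX60.TS'), Dow Jones Industrial Average,
# S&P 500 and NASDAQ Composite. 'TX60.TS' returns no data (see the failed download below),
# so the markets table only holds ^DJI, ^GSPC and ^IXIC
index_list = ['TX60.TS', '^DJI', '^GSPC', '^IXIC']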

In [37]:
index_price_data = getData(index_list)

[*********************100%***********************]  1 of 1 completed

1 Failed download:
- TX60.TS: No data found for this date range, symbol may be delisted
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
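Because `TX60.TS` produced no data, `merge_data` receives an empty frame for it and the symbol silently disappears from the table. Assuming, as the output above suggests, that a failed download comes back as an empty DataFrame, here is a sketch of a hypothetical helper `drop_failed` that separates failed symbols before merging so they can be reported. The frames below are made up.

```python
import pandas as pd


def drop_failed(frames, symbols):
    """Split downloads into (non-empty frames, their symbols, failed symbols)."""
    kept_frames, kept_symbols, failed = [], [], []
    for df, symbol in zip(frames, symbols):
        if df.empty:
            failed.append(symbol)
        else:
            kept_frames.append(df)
            kept_symbols.append(symbol)
    return kept_frames, kept_symbols, failed


# Made-up stand-ins: an empty frame for the failed symbol, one row for ^DJI
frames = [pd.DataFrame(columns=["Date", "Close"]),
          pd.DataFrame({"Date": ["1992-01-02"], "Close": [3172.4]})]
kept_frames, kept_symbols, failed = drop_failed(frames, ["TX60.TS", "^DJI"])
print("Failed:", failed)
# Then: merge_data(kept_frames, kept_symbols)
```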


In [38]:
merge_index_price_data = merge_data(index_price_data, index_list)

In [39]:
merge_index_price_data.head()

|  | Date | Open | High | Low | Close | Adj Close | Volume | Symbol |
|---|---|---|---|---|---|---|---|---|
| 0 | 1992-01-02 | 3152.100098 | 3172.629883 | 3139.310059 | 3172.399902 | 3172.399902 | 23550000 | ^DJI |
| 1 | 1992-01-03 | 3172.399902 | 3210.639893 | 3165.919922 | 3201.5 | 3201.5 | 23620000 | ^DJI |
| 2 | 1992-01-06 | 3201.5 | 3213.330078 | 3191.860107 | 3200.100098 | 3200.100098 | 27280000 | ^DJI |
| 3 | 1992-01-07 | 3200.100098 | 3210.199951 | 3184.47998 | 3204.800049 | 3204.800049 | 25510000 | ^DJI |
| 4 | 1992-01-08 | 3204.800049 | 3229.199951 | 3185.820068 | 3203.899902 | 3203.899902 | 29040000 | ^DJI |


In [40]:
merge_index_price_data.shape

(39044, 8)

## 6. Create the SQLite database and import the data

In [41]:
stockengine = createengine('StockDatabase')
toSQL(new_USDOWJONES, "DJIA", stockengine)

Successfully imported data


In [42]:
# Write the TSX60 price data to a new table in the same database
toSQL(new_TSX60, "TSX60", stockengine)

Successfully imported data


In [43]:
toSQL(subTSX_Fundamental, "TSX60Fundamental", stockengine)

Successfully imported data


In [44]:
toSQL(subDOWFundamental, "DOWFundamental", stockengine)

Successfully imported data


In [45]:
toSQL(merge_index_price_data, "markets", stockengine)

Successfully imported data
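With the tables in place, financial questions can be answered in SQL. As one example of a query this layout supports, the sketch below finds each symbol's most recent close. It runs against an in-memory `sqlite3` database filled with made-up rows (tickers AAA and BBB are hypothetical), so it is self-contained. Against `StockDatabase.db` the same query should work on the `DJIA` or `TSX60` tables, assuming dates are stored as ISO-formatted text so that `MAX(Date)` picks the latest one.

```python
import sqlite3

import pandas as pd

# Made-up rows shaped like the DJIA table (Date, Close, Symbol)
prices = pd.DataFrame({
    "Date": ["2022-09-12", "2022-09-13", "2022-09-12", "2022-09-13"],
    "Close": [100.0, 110.0, 50.0, 45.0],
    "Symbol": ["AAA", "AAA", "BBB", "BBB"],
})

conn = sqlite3.connect(":memory:")
prices.to_sql("DJIA", conn, index=False)

# Latest close per symbol: join each symbol's MAX(Date) back to the price rows
query = """
SELECT p.Symbol, p.Date, p.Close
FROM DJIA AS p
JOIN (SELECT Symbol, MAX(Date) AS LastDate FROM DJIA GROUP BY Symbol) AS m
  ON p.Symbol = m.Symbol AND p.Date = m.LastDate
ORDER BY p.Symbol
"""
latest = pd.read_sql_query(query, conn)
conn.close()
print(latest)
```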
