# Canadian Banks Stock Data Analysis
### Authors: Mariia-Olena Zhupnyk & Mariia Shekhovtsova

## Table of Contents
1. [Introduction](#1-introduction)
2. [Data Collection](#2-data-collection)
   - [Install and Import Libraries](#21-install-and-import-libraries)
   - [Scrape Data for Selected Banks](#22-scrape-data-for-selected-banks)
3. [Data Preprocessing & Cleaning](#3-data-preprocessing--cleaning)
4. [Exploratory Data Analysis (EDA)](#4-exploratory-data-analysis-eda)
5. [Statistical Analysis & Financial Metrics](#5-statistical-analysis--financial-metrics)
6. [Power BI Dashboard](#6-power-bi-dashboard)
7. [Conclusion & Insights](#7-conclusion--insights)

## 1. Introduction

This project analyzes the stock performance of Canada’s leading banks (**BMO, CIBC, TD, RBC, and Scotiabank**) over the five years from January 2020 through December 2024. The analysis aims to provide data-driven insights into market trends, investment risks, and the relationships between the banks' stock prices. The findings can support investors, analysts, and financial professionals in making informed decisions.

### Objectives:
- **[Stock Market Performance Analysis](#4-exploratory-data-analysis-eda)** – Evaluating stock trends and overall performance.
- **[Volatility Assessment](#5-statistical-analysis--financial-metrics)** – Measuring price fluctuations to identify risk levels and market stability.
- **[Stock Price Correlation](#5-statistical-analysis--financial-metrics)** – Examining relationships between different banks' stock prices.
- **[Prediction Model Development](#5-statistical-analysis--financial-metrics)** – Building a predictive model to forecast stock trends and help investors determine which bank is more stable or offers better returns.

We collect stock price data and financial statements from [Yahoo Finance](https://finance.yahoo.com/), process them using Python, store them in a PostgreSQL database, and visualize insights using Power BI.
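
The volatility and correlation objectives listed above can be illustrated with a minimal sketch. It uses synthetic prices for two made-up tickers (`BANK_A` and `BANK_B`, not real market data) and assumes simple daily returns and roughly 252 trading days per year. The analysis later in this notebook may define these metrics differently.

```python
import numpy as np
import pandas as pd

# Synthetic closing prices for two hypothetical tickers (not real market data).
rng = np.random.default_rng(seed=0)
dates = pd.bdate_range("2024-01-01", periods=252)
prices = pd.DataFrame(
    {
        "BANK_A": 100 * np.exp(np.cumsum(rng.normal(0, 0.01, len(dates)))),
        "BANK_B": 80 * np.exp(np.cumsum(rng.normal(0, 0.02, len(dates)))),
    },
    index=dates,
)

# Daily simple returns: percentage change from one close to the next.
returns = prices.pct_change().dropna()

# Annualized volatility: daily standard deviation scaled by sqrt(252),
# assuming roughly 252 trading days per year.
annual_volatility = returns.std() * np.sqrt(252)

# Pearson correlation between the daily return series.
correlation = returns.corr()

print(annual_volatility)
print(correlation)
```

A higher annualized volatility indicates larger day-to-day price swings, and a correlation close to 1 means the two stocks tend to move together.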

## 2. Data Collection

### 2.1 Install and Import Libraries

In [9]:
!pip install yfinance
!pip install yahooquery
%pip install matplotlib
!pip install seaborn

In [10]:
import yfinance as yf
from yahooquery import Ticker
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime

In [11]:
import warnings
# Suppress FutureWarning messages only (other warnings are still shown)
warnings.filterwarnings("ignore", category=FutureWarning)

### 2.2 Scrape Data for Selected Banks

#### Scrape Historical Stock Prices

In [9]:
# Step 1: Define the tickers for Canadian banks
bank_tickers = {
    "TD": "TD.TO",
    "BMO": "BMO.TO",
    "RBC": "RY.TO",
    "CIBC": "CM.TO",
    "Scotiabank": "BNS.TO"
}

In [5]:
# Step 2: Download historical stock prices
historical_data = {}
for bank, ticker in bank_tickers.items():
    print(f"Downloading historical data for {bank} ({ticker})...")
    historical_data[bank] = yf.download(ticker, start="2020-01-01", end="2025-01-01")

Downloading historical data for TD (TD.TO)...
Downloading historical data for BMO (BMO.TO)...
Downloading historical data for RBC (RY.TO)...
Downloading historical data for CIBC (CM.TO)...
Downloading historical data for Scotiabank (BNS.TO)...

In [6]:
# Step 3: Save historical data to CSV
for bank, data in historical_data.items():
    data.to_csv(f"{bank}_historical_data.csv")
    print(f"Saved historical data for {bank} to {bank}_historical_data.csv")

Saved historical data for TD to TD_historical_data.csv
Saved historical data for BMO to BMO_historical_data.csv
Saved historical data for RBC to RBC_historical_data.csv
Saved historical data for CIBC to CIBC_historical_data.csv
Saved historical data for Scotiabank to Scotiabank_historical_data.csv


In [7]:
# Step 4: Fetch financial data (e.g., market cap, PE ratio, dividend yield)
financial_data = {}
for bank, ticker in bank_tickers.items():
    print(f"Fetching financial data for {bank} ({ticker})...")
    stock = yf.Ticker(ticker)
    info = stock.info  # look up the metadata dictionary once per ticker
    financial_data[bank] = {
        "Market Cap": info.get("marketCap"),
        "PE Ratio": info.get("trailingPE"),
        "Dividend Yield": info.get("dividendYield"),
        "Beta": info.get("beta"),
    }

Fetching financial data for TD (TD.TO)...
Fetching financial data for BMO (BMO.TO)...
Fetching financial data for RBC (RY.TO)...
Fetching financial data for CIBC (CM.TO)...
Fetching financial data for Scotiabank (BNS.TO)...


In [8]:
# Step 5: Save financial data to CSV
financial_df = pd.DataFrame(financial_data).transpose()
financial_df.to_csv("financial_data.csv")
print("Saved financial data to financial_data.csv")

Saved financial data to financial_data.csv


In [9]:
# Step 6: Preview the saved data
print("Sample historical data for TD:")
print(historical_data["TD"].head())

Sample historical data for TD:
Price           Close       High        Low       Open   Volume
Ticker          TD.TO      TD.TO      TD.TO      TD.TO    TD.TO
Date                                                           
2020-01-02  57.946812  57.994171  57.560080  57.781070  2207900
2020-01-03  57.899452  57.946805  57.544292  57.678463  3472900
2020-01-06  57.907341  57.986265  57.583745  57.615316  8359000
2020-01-07  57.844208  58.175692  57.788960  58.017844  3622600
2020-01-08  58.294079  58.562421  57.796853  57.867883  6465500
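
The preview above shows a two-level column header (`Price` and `Ticker`). When such a frame is written with `to_csv`, both levels and the index name end up as separate header lines in the file. One possible way to avoid that, sketched here on a small stand-in frame built from the first two preview rows, is to drop the `Ticker` level before saving. This is an optional alternative, not the approach used in the rest of this notebook.

```python
import pandas as pd

# Small stand-in for the downloaded frame: two column levels named
# "Price" and "Ticker". The values are copied from the TD preview above.
columns = pd.MultiIndex.from_product(
    [["Close", "High", "Low", "Open", "Volume"], ["TD.TO"]],
    names=["Price", "Ticker"],
)
sample = pd.DataFrame(
    [
        [57.946812, 57.994171, 57.560080, 57.781070, 2207900],
        [57.899452, 57.946805, 57.544292, 57.678463, 3472900],
    ],
    index=pd.to_datetime(["2020-01-02", "2020-01-03"]).rename("Date"),
    columns=columns,
)

# Drop the "Ticker" level so each column has a single name.
flat = sample.droplevel("Ticker", axis=1)
flat.columns.name = None  # remove the leftover "Price" label

# The CSV text now starts with a single header line.
print(flat.to_csv())
```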


In [10]:
print("\nFinancial data:")
print(financial_df)


Financial data:
              Market Cap   PE Ratio  Dividend Yield   Beta
TD          1.441382e+11  17.449154          0.0512  0.822
BMO         1.049765e+11  15.108192          0.0462  1.160
RBC         2.458259e+11  15.434279          0.0323  0.842
CIBC        8.540744e+10  12.449175          0.0402  1.128
Scotiabank  1.005176e+11  12.582624          0.0531  0.978
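
Market capitalization in scientific notation and dividend yield as a decimal fraction are hard to compare at a glance. The sketch below rebuilds the table from the values printed above and derives two display columns. The column names `Market Cap (B)` and `Dividend Yield (%)` are illustrative, and the code assumes the yield is stored as a fraction, as it appears in this output.

```python
import pandas as pd

# Values copied from the financial data printed above.
metrics = pd.DataFrame(
    {
        "Market Cap": [1.441382e11, 1.049765e11, 2.458259e11, 8.540744e10, 1.005176e11],
        "Dividend Yield": [0.0512, 0.0462, 0.0323, 0.0402, 0.0531],
    },
    index=["TD", "BMO", "RBC", "CIBC", "Scotiabank"],
)

# Express market cap in billions and dividend yield in percent.
readable = pd.DataFrame(
    {
        "Market Cap (B)": (metrics["Market Cap"] / 1e9).round(1),
        "Dividend Yield (%)": (metrics["Dividend Yield"] * 100).round(2),
    }
)

# Largest bank by market cap first.
print(readable.sort_values("Market Cap (B)", ascending=False))
```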


#### Scrape Financial Data for Each Bank

In [18]:
# Function to scrape financial data for each bank
def scrape_financial_data():
    for bank, ticker in bank_tickers.items():
        print(f"Scraping financial data for {bank} ({ticker})...")

        # Create Ticker object
        stock = Ticker(ticker)

        # Get financial data
        income_statement = stock.income_statement()
        balance_sheet = stock.balance_sheet()
        cash_flow = stock.cash_flow()

        # Fetch dividends using the history method
        dividends = stock.history(period="max")  # Get full historical data
        dividends = dividends[dividends.index.get_level_values("symbol") == ticker]  # Filter for current ticker
        dividends = dividends[["dividends"]]  # Select only dividends column

        # Convert to DataFrame and save as CSV
        pd.DataFrame(income_statement).to_csv(f"{bank}_income_statement.csv", index=False)
        pd.DataFrame(balance_sheet).to_csv(f"{bank}_balance_sheet.csv", index=False)
        pd.DataFrame(cash_flow).to_csv(f"{bank}_cash_flow.csv", index=False)
        
        if not dividends.empty:
            dividends.to_csv(f"{bank}_dividends.csv")

        print(f"Data saved for {bank} ✅")

In [19]:
# Run the scraper
scrape_financial_data()

Scraping financial data for TD (TD.TO)...
Data saved for TD ✅
Scraping financial data for BMO (BMO.TO)...
Data saved for BMO ✅
Scraping financial data for RBC (RY.TO)...
Data saved for RBC ✅
Scraping financial data for CIBC (CM.TO)...
Data saved for CIBC ✅
Scraping financial data for Scotiabank (BNS.TO)...
Data saved for Scotiabank ✅


## 3. Data Preprocessing & Cleaning

In [11]:
# Load one of the historical CSV files (e.g., TD_historical_data.csv)
td_data = pd.read_csv("../PythonForDA/Canadian_banks/TD_historical_data.csv")

In [12]:
print(td_data.head())

        Price              Close               High                 Low  \
0      Ticker              TD.TO              TD.TO               TD.TO   
1        Date                NaN                NaN                 NaN   
2  2020-01-02   57.9467887878418  57.99414792760275  57.560057605242065   
3  2020-01-03  57.89945602416992  57.94680915907252   57.54429546938549   
4  2020-01-06  57.90734100341797   57.9862648895292   57.58374524240345   

                 Open   Volume  
0               TD.TO    TD.TO  
1                 NaN      NaN  
2   57.78104685244191  2207900  
3   57.67846669928117  3472900  
4  57.615316001149246  8359000  
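
The `Ticker` and `Date` rows in this output come from the extra header lines in the saved file, and they also force every column to be read as text. As an alternative to dropping those rows by hand, `pd.read_csv` can be told that the file has a two-level header. The sketch below uses an in-memory sample in the same layout (values shortened for readability) rather than the real file.

```python
import io
import pandas as pd

# A few lines in the same layout as TD_historical_data.csv (values shortened).
raw_csv = io.StringIO(
    "Price,Close,High,Low,Open,Volume\n"
    "Ticker,TD.TO,TD.TO,TD.TO,TD.TO,TD.TO\n"
    "Date,,,,,\n"
    "2020-01-02,57.9468,57.9941,57.5601,57.7810,2207900\n"
    "2020-01-03,57.8995,57.9468,57.5443,57.6785,3472900\n"
)

# header=[0, 1] reads the first two lines as a two-level column header;
# index_col=0 uses the first column (the dates) as the index.
td = pd.read_csv(raw_csv, header=[0, 1], index_col=0)

# Keep only the first header level (Close, High, ...) and parse the dates.
td.columns = td.columns.get_level_values(0)
td.index = pd.to_datetime(td.index)

# The price columns are now numeric instead of object.
print(td.dtypes)
```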


In [13]:
# Check for missing values
print("\nMissing values in TD data:")
print(td_data.isnull().sum())


Missing values in TD data:
Price     0
Close     1
High      1
Low       1
Open      1
Volume    1
dtype: int64


In [14]:
print(td_data.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1257 entries, 0 to 1256
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Price   1257 non-null   object
 1   Close   1256 non-null   object
 2   High    1256 non-null   object
 3   Low     1256 non-null   object
 4   Open    1256 non-null   object
 5   Volume  1256 non-null   object
dtypes: object(6)
memory usage: 59.1+ KB
None


In [15]:
#td_data = td_data.dropna()  # Drop rows with missing values

In [16]:
#print(td_data.isnull().sum())

In [17]:
#print(td_data.head())

In [15]:
# Drop the first two rows (the leftover "Ticker" and "Date" header rows)
td_data = td_data.iloc[2:].reset_index(drop=True)
td_data

| | Price | Close | High | Low | Open | Volume |
|---|---|---|---|---|---|---|
| 0 | 2020-01-02 | 57.9467887878418 | 57.99414792760275 | 57.560057605242065 | 57.78104685244191 | 2207900 |
| 1 | 2020-01-03 | 57.89945602416992 | 57.94680915907252 | 57.54429546938549 | 57.67846669928117 | 3472900 |
| 2 | 2020-01-06 | 57.90734100341797 | 57.9862648895292 | 57.58374524240345 | 57.615316001149246 | 8359000 |
| 3 | 2020-01-07 | 57.84420394897461 | 58.17568790912178 | 57.788956622283415 | 58.017840124720365 | 3622600 |
| 4 | 2020-01-08 | 58.2940788269043 | 58.56242127421098 | 57.7968528690381 | 57.867882568087325 | 6465500 |
| ... | ... | ... | ... | ... | ... | ... |
| 1250 | 2024-12-23 | 74.59191131591797 | 74.61163870401722 | 73.73362316363925 | 73.9901243689176 | 5763300 |
| 1251 | 2024-12-24 | 75.1937026977539 | 75.25289239220729 | 74.48339625765291 | 75.1937026977539 | 1345700 |
| 1252 | 2024-12-27 | 75.39100646972656 | 75.61791284222221 | 75.07531555581855 | 75.12464531882436 | 5200200 |
| 1253 | 2024-12-30 | 75.20356750488281 | 75.51925840861496 | 74.83854683973492 | 74.97666113970118 | 14855800 |


In [16]:
# Rename columns
td_data.columns = ["Date", "Price_Close", "Price_High", "Price_Low", "Price_Open", "Volume"]
td_data

| | Date | Price_Close | Price_High | Price_Low | Price_Open | Volume |
|---|---|---|---|---|---|---|
| 0 | 2020-01-02 | 57.9467887878418 | 57.99414792760275 | 57.560057605242065 | 57.78104685244191 | 2207900 |
| 1 | 2020-01-03 | 57.89945602416992 | 57.94680915907252 | 57.54429546938549 | 57.67846669928117 | 3472900 |
| 2 | 2020-01-06 | 57.90734100341797 | 57.9862648895292 | 57.58374524240345 | 57.615316001149246 | 8359000 |
| 3 | 2020-01-07 | 57.84420394897461 | 58.17568790912178 | 57.788956622283415 | 58.017840124720365 | 3622600 |
| 4 | 2020-01-08 | 58.2940788269043 | 58.56242127421098 | 57.7968528690381 | 57.867882568087325 | 6465500 |
| ... | ... | ... | ... | ... | ... | ... |
| 1250 | 2024-12-23 | 74.59191131591797 | 74.61163870401722 | 73.73362316363925 | 73.9901243689176 | 5763300 |
| 1251 | 2024-12-24 | 75.1937026977539 | 75.25289239220729 | 74.48339625765291 | 75.1937026977539 | 1345700 |
| 1252 | 2024-12-27 | 75.39100646972656 | 75.61791284222221 | 75.07531555581855 | 75.12464531882436 | 5200200 |
| 1253 | 2024-12-30 | 75.20356750488281 | 75.51925840861496 | 74.83854683973492 | 74.97666113970118 | 14855800 |


In [17]:
# Convert the "Date" column to datetime (it can later be set as the index if needed for analysis)
td_data["Date"] = pd.to_datetime(td_data["Date"])  # Ensure Date is in datetime format
td_data

| | Date | Price_Close | Price_High | Price_Low | Price_Open | Volume |
|---|---|---|---|---|---|---|
| 0 | 2020-01-02 | 57.9467887878418 | 57.99414792760275 | 57.560057605242065 | 57.78104685244191 | 2207900 |
| 1 | 2020-01-03 | 57.89945602416992 | 57.94680915907252 | 57.54429546938549 | 57.67846669928117 | 3472900 |
| 2 | 2020-01-06 | 57.90734100341797 | 57.9862648895292 | 57.58374524240345 | 57.615316001149246 | 8359000 |
| 3 | 2020-01-07 | 57.84420394897461 | 58.17568790912178 | 57.788956622283415 | 58.017840124720365 | 3622600 |
| 4 | 2020-01-08 | 58.2940788269043 | 58.56242127421098 | 57.7968528690381 | 57.867882568087325 | 6465500 |
| ... | ... | ... | ... | ... | ... | ... |
| 1250 | 2024-12-23 | 74.59191131591797 | 74.61163870401722 | 73.73362316363925 | 73.9901243689176 | 5763300 |
| 1251 | 2024-12-24 | 75.1937026977539 | 75.25289239220729 | 74.48339625765291 | 75.1937026977539 | 1345700 |
| 1252 | 2024-12-27 | 75.39100646972656 | 75.61791284222221 | 75.07531555581855 | 75.12464531882436 | 5200200 |
| 1253 | 2024-12-30 | 75.20356750488281 | 75.51925840861496 | 74.83854683973492 | 74.97666113970118 | 14855800 |


In [18]:
# Save the cleaned data
td_data.to_csv("TD_historical_data_cleaned.csv")

# Display the cleaned DataFrame
print(td_data.head())

        Date        Price_Close         Price_High           Price_Low  \
0 2020-01-02   57.9467887878418  57.99414792760275  57.560057605242065   
1 2020-01-03  57.89945602416992  57.94680915907252   57.54429546938549   
2 2020-01-06  57.90734100341797   57.9862648895292   57.58374524240345   
3 2020-01-07  57.84420394897461  58.17568790912178  57.788956622283415   
4 2020-01-08   58.2940788269043  58.56242127421098    57.7968528690381   

           Price_Open   Volume  
0   57.78104685244191  2207900  
1   57.67846669928117  3472900  
2  57.615316001149246  8359000  
3  58.017840124720365  3622600  
4  57.867882568087325  6465500  
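
The cleaning steps above were applied to TD only, and the price and volume values remain strings because they were read together with the extra header rows. A minimal sketch of how the same steps might be wrapped in a reusable function, with an added numeric conversion, is shown below. The function name `clean_history` and the tiny `raw` sample are hypothetical; the sample mimics the layout of the frame loaded earlier.

```python
import pandas as pd


def clean_history(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the TD cleaning steps and convert the columns to numeric types."""
    # Drop the two leftover header rows ("Ticker" and "Date").
    cleaned = raw.iloc[2:].reset_index(drop=True)
    cleaned.columns = [
        "Date", "Price_Close", "Price_High", "Price_Low", "Price_Open", "Volume",
    ]
    cleaned["Date"] = pd.to_datetime(cleaned["Date"])
    # The values were read as strings, so convert them explicitly.
    price_cols = ["Price_Close", "Price_High", "Price_Low", "Price_Open"]
    cleaned[price_cols] = cleaned[price_cols].apply(pd.to_numeric)
    cleaned["Volume"] = pd.to_numeric(cleaned["Volume"]).astype("int64")
    return cleaned


# Tiny stand-in for a raw frame as loaded with pd.read_csv.
raw = pd.DataFrame(
    {
        "Price": ["Ticker", "Date", "2020-01-02", "2020-01-03"],
        "Close": ["TD.TO", None, "57.9467887878418", "57.89945602416992"],
        "High": ["TD.TO", None, "57.99414792760275", "57.94680915907252"],
        "Low": ["TD.TO", None, "57.560057605242065", "57.54429546938549"],
        "Open": ["TD.TO", None, "57.78104685244191", "57.67846669928117"],
        "Volume": ["TD.TO", None, "2207900", "3472900"],
    }
)

cleaned = clean_history(raw)
print(cleaned.dtypes)
```

The same function could then be applied in a loop over the other banks' CSV files, assuming they share this layout.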


In [20]:
# Step 1: Load the financial data CSV file
financial_data = pd.read_csv("../PythonForDA/Canadian_banks/financial_data.csv")

# Step 2: Display the data
print("\nFinancial Data:")
print(financial_data)

# Step 3: Check for missing values
print("\nMissing values in financial data:")
print(financial_data.isnull().sum())

# Step 4: Check data types and ensure numerical fields are correct
print("\nFinancial Data Info:")
print(financial_data.info())



Financial Data:
   Unnamed: 0    Market Cap   PE Ratio  Dividend Yield   Beta
0          TD  1.441382e+11  17.449154          0.0512  0.822
1         BMO  1.049765e+11  15.108192          0.0462  1.160
2         RBC  2.458259e+11  15.434279          0.0323  0.842
3        CIBC  8.540744e+10  12.449175          0.0402  1.128
4  Scotiabank  1.005176e+11  12.582624          0.0531  0.978

Missing values in financial data:
Unnamed: 0        0
Market Cap        0
PE Ratio          0
Dividend Yield    0
Beta              0
dtype: int64

Financial Data Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Unnamed: 0      5 non-null      object 
 1   Market Cap      5 non-null      float64
 2   PE Ratio        5 non-null      float64
 3   Dividend Yield  5 non-null      float64
 4   Beta            5 non-null      float64
dtypes: float64(4), object