# Mini Project 2

**2025 Introduction to Quantiative Methods in Finance**

**The Erdös Institute**


###  Hypothesis Testing of Standard Assumptions Theoretical Financial Mathematics

In the theory of mathematical finance, it is common to assume the log returns of a stock/index are normally distributed.


Investigate if the log returns of stocks or indexes of your choosing are normally distributed. Some suggestions for exploration include:

    1) Test if there are period of times when the log-returns of a stock/index have evidence of normal distribution.
    
    2) Test if removing extremal return data creates a distribution with evidence of being normal.
    
    3) Create a personalized portfolio of stocks with historical log return data that is normally distributed.
    
    4) Test if the portfolio you created in the first mini-project has significant periods of time with evidence of normally distributed log returns.
    
    5) Gather x-number of historical stock data and just perform a normality test on their log return data to see if any of the stocks exhibit evidence of log returns that are normally distributed.

In [34]:
import yfinance as yf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import normaltest, shapiro, anderson
import statsmodels.api as sm

tickers = ['AAPL', 'GOOGL', 'TSLA',  'JNJ']

# Download 5 years of daily data
data = yf.download(tickers, start="2020-06-01", end="2025-06-01")['Close']
log_returns = np.log(data / data.shift(1)).dropna()

  data = yf.download(tickers, start="2020-06-01", end="2025-06-01")['Close']
[*********************100%***********************]  4 of 4 completed


In [23]:
# This function simply computes and prints p-values on Pearson and Shapiro tests

def test_normality(series, name=""):
    print(f"\n--- {name} ---")
    
    # Pearson
    stat, p = normaltest(series)
    print(f"D’Agostino-Pearson: p={p:.4f} {'Normal' if p > 0.05 else 'Reject Normal'}")

    # Shapiro
    stat, p = shapiro(series)
    print(f"Shapiro-Wilk: p={p:.4f} {'Normal' if p > 0.05 else 'Reject Normal'}")

In [24]:
# Q1: Is there a normal one?
# We look at 5 years from 2020 to 2024
periods = {
    "2020": ("2020-06-01", "2020-12-31"),
    "2021": ("2021-01-01", "2021-12-31"),
    "2022": ("2022-01-01", "2022-12-31"),
    "2023": ("2023-01-01", "2023-12-31"),
    "2024": ("2024-01-01", "2024-12-31")
}


In [25]:
# Run tests. From the result, p-values are small, they are not normal.
for ticker in tickers:
    returns = log_returns[ticker]
    for label, (start, end) in periods.items():
        segment = returns.loc[start:end]
        name = f"{ticker} {label}"
        test_normality(segment, name=name)


--- AAPL 2020 ---
D’Agostino-Pearson: p=0.0024 Reject Normal
Shapiro-Wilk: p=0.0018 Reject Normal

--- AAPL 2021 ---
D’Agostino-Pearson: p=0.3453 Normal
Shapiro-Wilk: p=0.1532 Normal

--- AAPL 2022 ---
D’Agostino-Pearson: p=0.0262 Reject Normal
Shapiro-Wilk: p=0.0610 Normal

--- AAPL 2023 ---
D’Agostino-Pearson: p=0.0034 Reject Normal
Shapiro-Wilk: p=0.0057 Reject Normal

--- AAPL 2024 ---
D’Agostino-Pearson: p=0.0000 Reject Normal
Shapiro-Wilk: p=0.0000 Reject Normal

--- GOOGL 2020 ---
D’Agostino-Pearson: p=0.0007 Reject Normal
Shapiro-Wilk: p=0.0012 Reject Normal

--- GOOGL 2021 ---
D’Agostino-Pearson: p=0.0000 Reject Normal
Shapiro-Wilk: p=0.0000 Reject Normal

--- GOOGL 2022 ---
D’Agostino-Pearson: p=0.0459 Reject Normal
Shapiro-Wilk: p=0.0950 Normal

--- GOOGL 2023 ---
D’Agostino-Pearson: p=0.0000 Reject Normal
Shapiro-Wilk: p=0.0000 Reject Normal

--- GOOGL 2024 ---
D’Agostino-Pearson: p=0.0000 Reject Normal
Shapiro-Wilk: p=0.0000 Reject Normal

--- TSLA 2020 ---
D’Agostino-Pea

In [26]:
# Q2: Is there a normal one after removing outliers

In [27]:
# Ourlier trimmer
def trim_outliers(series, lower=0.025, upper=0.975):
    return series[(series > series.quantile(lower)) & (series < series.quantile(upper))]

In [28]:
# Run tests with outlier removal
for ticker in tickers:
    returns = log_returns[ticker]
    for label, (start, end) in periods.items():
        segment = returns.loc[start:end]
        trimmed = trim_outliers(segment)

        print(f"\n=== {ticker} {label} ===")

        # After trimming
        stat, p = normaltest(trimmed)
        print(f"Trimmed D’Agostino-Pearson: p={p:.4f} {'Normal' if p > 0.05 else 'Reject Normal'}")
        stat, p = shapiro(trimmed)
        print(f"Trimmed Shapiro-Wilk: p={p:.4f} {'Normal' if p > 0.05 else 'Reject Normal'}")


=== AAPL 2020 ===
Trimmed D’Agostino-Pearson: p=0.3710 Normal
Trimmed Shapiro-Wilk: p=0.0892 Normal

=== AAPL 2021 ===
Trimmed D’Agostino-Pearson: p=0.4838 Normal
Trimmed Shapiro-Wilk: p=0.1069 Normal

=== AAPL 2022 ===
Trimmed D’Agostino-Pearson: p=0.0093 Reject Normal
Trimmed Shapiro-Wilk: p=0.0356 Reject Normal

=== AAPL 2023 ===
Trimmed D’Agostino-Pearson: p=0.0335 Reject Normal
Trimmed Shapiro-Wilk: p=0.1106 Normal

=== AAPL 2024 ===
Trimmed D’Agostino-Pearson: p=0.1850 Normal
Trimmed Shapiro-Wilk: p=0.0664 Normal

=== GOOGL 2020 ===
Trimmed D’Agostino-Pearson: p=0.0737 Normal
Trimmed Shapiro-Wilk: p=0.0250 Reject Normal

=== GOOGL 2021 ===
Trimmed D’Agostino-Pearson: p=0.7282 Normal
Trimmed Shapiro-Wilk: p=0.1702 Normal

=== GOOGL 2022 ===
Trimmed D’Agostino-Pearson: p=0.0810 Normal
Trimmed Shapiro-Wilk: p=0.0964 Normal

=== GOOGL 2023 ===
Trimmed D’Agostino-Pearson: p=0.4371 Normal
Trimmed Shapiro-Wilk: p=0.1877 Normal

=== GOOGL 2024 ===
Trimmed D’Agostino-Pearson: p=0.0189 Re

In [29]:
# Q3: Based on the result from Q2, TSLA and GOOGL are apprxmately normal.
profolio = ['TSLA', 'GOOGL']

In [35]:
# Q4 : below is the stocks in Q1
high_risk_tickers = ['TSLA', 'NVDA', 'COIN']
low_risk_tickers = ['JNJ', 'PG', 'WMT']

In [36]:
# Let us test them!
# Run tests with removal
tickers = high_risk_tickers + low_risk_tickers

data = yf.download(tickers, start="2020-06-01", end="2025-06-01")['Close']
log_returns = np.log(data / data.shift(1)).dropna()
for ticker in tickers:
    returns = log_returns[ticker]
    for label, (start, end) in periods.items():
        segment = returns.loc[start:end]
        trimmed = trim_outliers(segment)

        print(f"\n=== {ticker} {label} ===")

        # After trimming
        stat, p = normaltest(trimmed)
        print(f"Trimmed D’Agostino-Pearson: p={p:.4f} {'Normal' if p > 0.05 else 'Reject Normal'}")
        stat, p = shapiro(trimmed)
        print(f"Trimmed Shapiro-Wilk: p={p:.4f} {'Normal' if p > 0.05 else 'Reject Normal'}")

  data = yf.download(tickers, start="2020-06-01", end="2025-06-01")['Close']
[*********************100%***********************]  6 of 6 completed


=== TSLA 2020 ===
Trimmed D’Agostino-Pearson: p=nan Reject Normal
Trimmed Shapiro-Wilk: p=nan Reject Normal

=== TSLA 2021 ===
Trimmed D’Agostino-Pearson: p=0.4261 Normal
Trimmed Shapiro-Wilk: p=0.0798 Normal

=== TSLA 2022 ===
Trimmed D’Agostino-Pearson: p=0.2764 Normal
Trimmed Shapiro-Wilk: p=0.0237 Reject Normal

=== TSLA 2023 ===
Trimmed D’Agostino-Pearson: p=0.6392 Normal
Trimmed Shapiro-Wilk: p=0.3721 Normal

=== TSLA 2024 ===
Trimmed D’Agostino-Pearson: p=0.1270 Normal
Trimmed Shapiro-Wilk: p=0.1081 Normal

=== NVDA 2020 ===
Trimmed D’Agostino-Pearson: p=nan Reject Normal
Trimmed Shapiro-Wilk: p=nan Reject Normal

=== NVDA 2021 ===
Trimmed D’Agostino-Pearson: p=0.4199 Normal
Trimmed Shapiro-Wilk: p=0.0660 Normal

=== NVDA 2022 ===
Trimmed D’Agostino-Pearson: p=0.0128 Reject Normal
Trimmed Shapiro-Wilk: p=0.0298 Reject Normal

=== NVDA 2023 ===
Trimmed D’Agostino-Pearson: p=0.0384 Reject Normal
Trimmed Shapiro-Wilk: p=0.0323 Reject Normal

=== NVDA 2024 ===
Trimmed D’Agostino-Pe


  stat, p = normaltest(trimmed)
  stat, p = shapiro(trimmed)
  stat, p = normaltest(trimmed)
  stat, p = shapiro(trimmed)
  stat, p = normaltest(trimmed)
  stat, p = shapiro(trimmed)
  stat, p = normaltest(trimmed)
  stat, p = shapiro(trimmed)
  stat, p = normaltest(trimmed)
  stat, p = shapiro(trimmed)
  stat, p = normaltest(trimmed)
  stat, p = shapiro(trimmed)


In [37]:
# The below 5 are normal after removing outliers
["TSLA", "JNJ", "PG", "WMT"]

['TSLA', 'JNJ', 'PG', 'WMT']