# Mini Project 1

**2025 Introduction to Quantitative Methods in Finance**

**The Erdős Institute**

**Instructions** Use current stock data to create two potentially profitable investment portfolios: one that is higher risk and one that is lower risk.

-- You are to interpret and explain your interpretation of a high risk profile and low risk profile of a portfolio. You should provide some measurable quantitative data in your explanation.

# Outline

**Goal**: Construct two portfolios, one with high risk and another with low risk.

I am going to quantify risk using the volatility of the historic returns of the portfolio. To make the comparison transparent and meaningful, I have constructed two portfolios that have roughly equal expected return based on historical data.

The basic idea behind reducing risk is to increase diversification. One way to diversify is to hold more stocks, since a portfolio with more stocks generally has lower risk. In this mini project, however, I have focused on the diversification that comes from picking stocks from different sectors while keeping the number of stocks the same.
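
The role of correlation in an equally weighted portfolio can be made concrete with a stylized calculation. The sketch below assumes that every stock has the same volatility `sigma` and every pair of stocks has the same correlation `rho`; both are simplifications made for illustration, and the helper `equal_weight_vol` is a hypothetical name that is not used in the analysis below.

```python
import numpy as np

def equal_weight_vol(n, sigma, rho):
    """Volatility of an equally weighted portfolio of n stocks.

    Assumes (for illustration only) a common volatility sigma and a
    common pairwise correlation rho for all stocks.
    """
    # Var = sigma^2 * [1/n + (1 - 1/n) * rho]
    variance = sigma**2 * (1.0 / n + (1.0 - 1.0 / n) * rho)
    return np.sqrt(variance)

# Same number of stocks and same single-stock volatility, different correlation
for rho in (0.3, 0.5, 0.7):
    vol = equal_weight_vol(n=10, sigma=0.30, rho=rho)
    print(f"rho = {rho:.1f}: portfolio volatility = {vol:.2%}")
```

In this simplified model, once the number of stocks is fixed, lowering the typical correlation between them is the remaining way to reduce portfolio volatility.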

### High Risk
The high risk portfolio has 10 stocks, all from the Technology sector, held with equal weights.

**Sector representation**: Information Technology (10).

**Stocks**: Intel (INTC), Western Digital (WDC), Lam Research (LRCX), KLA Corporation (KLAC), DXC Technology (DXC), Arista Networks (ANET), Fidelity National Information Services (FIS), ServiceNow (NOW), HP Inc. (HPQ), Micron Technology (MU).


### Low Risk

The low risk portfolio also has 10 equally weighted stocks, but they are chosen from 10 different sectors.

**Sector representation**: (One from each) Utilities, Consumer Staples, Health Care, Communication Services, Financials, Materials, Industrials, Information Technology, Consumer Discretionary, Energy

**Stocks**: NextEra Energy (NEE), Costco Wholesale Corp. (COST), Johnson & Johnson (JNJ), Verizon Communications (VZ), CME Group (CME), Sherwin-Williams (SHW), Lockheed Martin (LMT), IBM (IBM), McDonald’s (MCD), ONEOK (OKE).


Note:
* All the companies chosen are part of the S&P 500. This ensures that all the stocks are large cap, so that the size effect is not important.
* The companies were chosen from names that I was familiar with, and so that the two portfolios have roughly equal expected return.
* Expected return is measured as the mean of the historical return. I have used 10 years of daily return data, from 2015 to 2024.
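
Two return conventions are in common use, and the choice matters slightly when returns are averaged across stocks. The sketch below uses made-up prices for two hypothetical tickers `A` and `B` (not market data) to illustrate that a weighted sum of simple returns gives the return of a portfolio held at those weights, while a weighted sum of log returns is a close approximation to the portfolio's log return rather than an exact identity.

```python
import numpy as np
import pandas as pd

# Made-up prices for two hypothetical tickers (illustration only)
prices = pd.DataFrame({
    "A": [100.0, 102.0, 101.0, 105.0],
    "B": [50.0, 49.0, 51.0, 50.5],
})
weights = np.array([0.5, 0.5])

simple = prices.pct_change().dropna()            # simple daily returns
log = np.log(prices / prices.shift(1)).dropna()  # daily log returns

# Portfolio simple return at the given weights, converted to a log return
port_log_exact = np.log1p(simple.dot(weights))

# Weighted average of the individual log returns (an approximation)
port_log_approx = log.dot(weights)

gap = (port_log_exact - port_log_approx).abs().max()
print(f"Largest daily gap between the two: {gap:.6f}")
```

For daily data the gap is small, so the approximation is usually harmless for a comparison of this kind.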

In [46]:
# Package imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as stats
import seaborn as sns
import yfinance as yf
import datetime as dt

sns.set_style('darkgrid')

In [47]:
# Download daily price data for the stocks in both portfolios (2015-2024)
tickers_high = ["INTC", "WDC", "LRCX", "KLAC", "DXC", "ANET", "FIS", "NOW", "HPQ", "MU"]
tickers_low = ["NEE", "COST", "JNJ", "VZ", "CME", "SHW", "LMT", "IBM", "MCD", "OKE"]
tickers = tickers_high + tickers_low

start_date = "2015-01-01"
end_date = "2024-12-31"

stock = yf.download(tickers, start=start_date, end=end_date)

# Daily log returns; the first row is NaN after the shift and is dropped
daily_returns = np.log(stock['Close'] / stock['Close'].shift(1))
daily_returns = daily_returns.dropna()

# Equal weights: 10% in each of the 10 stocks of a portfolio
weights = np.full(10, 0.1)



In [48]:
# Portfolio daily returns (equal-weighted average of the stocks' log returns)
high_daily = daily_returns[tickers_high].dot(weights)
low_daily  = daily_returns[tickers_low ].dot(weights)

# Annualized expected return (252 trading days per year)
high_exp_ret = high_daily.mean() * 252
low_exp_ret  = low_daily.mean()  * 252

# Annualized volatility
high_vol = high_daily.std(ddof=0) * np.sqrt(252)
low_vol  = low_daily.std(ddof=0)  * np.sqrt(252)

# Sharpe ratio (risk-free rate taken as zero)
high_sharpe = high_exp_ret / high_vol
low_sharpe  = low_exp_ret  / low_vol


In [49]:
# print summary
print(f"High-risk portfolio  |  Exp. return: {high_exp_ret:.2%}   "
      f"Volatility: {high_vol:.2%}   Sharpe: {high_sharpe:.2f}")
print(f"Low-risk  portfolio  |  Exp. return: {low_exp_ret:.2%}   "
      f"Volatility: {low_vol:.2%}   Sharpe: {low_sharpe:.2f}")

High-risk portfolio  |  Exp. return: 12.17%   Volatility: 28.69%   Sharpe: 0.42
Low-risk  portfolio  |  Exp. return: 11.79%   Volatility: 15.77%   Sharpe: 0.75


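We see that while both portfolios have an expected return of around 12% per year, the volatility of the high risk portfolio (28.69%) is almost twice that of the low risk portfolio (15.77%). To compare the return earned per unit of risk, I have also computed the Sharpe ratio, here taken as the expected return divided by the volatility with the risk-free rate set to zero: 0.42 for the high risk portfolio versus 0.75 for the low risk portfolio.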
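
A more general form of the Sharpe ratio subtracts a risk-free rate from the expected return before dividing by the volatility; the calculation above corresponds to a rate of zero. The following is a minimal sketch of that general form. The helper `annualized_sharpe`, the 2% value for `rf_annual`, and the randomly generated `fake_daily` series are all assumptions chosen for illustration and are not part of the analysis.

```python
import numpy as np

def annualized_sharpe(daily_returns, rf_annual=0.0, periods=252):
    """Annualized Sharpe ratio from a series of daily returns.

    rf_annual is an assumed annual risk-free rate; 0.0 reproduces
    the simple return-over-volatility ratio.
    """
    daily_returns = np.asarray(daily_returns, dtype=float)
    exp_ret = daily_returns.mean() * periods
    vol = daily_returns.std(ddof=0) * np.sqrt(periods)
    return (exp_ret - rf_annual) / vol

# Synthetic daily returns (random numbers, not market data)
rng = np.random.default_rng(0)
fake_daily = rng.normal(loc=0.0005, scale=0.01, size=2520)

print(f"Sharpe with rf = 0%: {annualized_sharpe(fake_daily):.2f}")
print(f"Sharpe with rf = 2%: {annualized_sharpe(fake_daily, rf_annual=0.02):.2f}")
```

A positive risk-free rate lowers both Sharpe ratios, and because the subtraction is divided by a smaller volatility, the reduction is larger in absolute terms for the lower-volatility portfolio.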

**Conclusion**: Diversifying the portfolio with stocks from different sectors produces lower risk than a portfolio whose stocks are all concentrated in a single sector.

To see explicitly that the benefit of sector diversification comes from reduced correlation between the stocks, I have looked at the correlation matrices of the two portfolios.
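
The link between pairwise co-movement and portfolio risk runs through the covariance matrix: for a weight vector `w` and covariance matrix `Sigma`, the portfolio variance is `w @ Sigma @ w`. The sketch below checks this identity on synthetic returns (random numbers that share a common factor, not market data); the names `fake_returns`, `S0` to `S3`, and the factor loading of 0.5 are illustrative assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_days, n_stocks = 1000, 4

# Synthetic returns: a shared factor plus independent noise, so the
# stocks are positively correlated with each other
common = rng.normal(0.0, 0.01, size=n_days)
fake_returns = pd.DataFrame({
    f"S{i}": 0.5 * common + rng.normal(0.0, 0.01, size=n_days)
    for i in range(n_stocks)
})
weights = np.full(n_stocks, 1.0 / n_stocks)

# Portfolio variance from the covariance matrix: w' Sigma w
sigma = np.cov(fake_returns.to_numpy(), rowvar=False, ddof=0)
var_from_cov = weights @ sigma @ weights

# Portfolio variance computed directly from the weighted return series
var_direct = fake_returns.dot(weights).var(ddof=0)

print(f"w' Sigma w : {var_from_cov:.8f}")
print(f"direct     : {var_direct:.8f}")
```

The two numbers agree up to floating-point error. Since each off-diagonal covariance is a correlation multiplied by two volatilities, lower correlations reduce the off-diagonal terms and hence the portfolio variance.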

In [50]:
corr_high = daily_returns[tickers_high].corr()
corr_low = daily_returns[tickers_low].corr()




In [51]:
print("Correlation matrix of the high risk portfolio ")
corr_high.style.format("{:.2f}")

Correlation matrix of the high risk portfolio 


| Ticker | INTC | WDC | LRCX | KLAC | DXC | ANET | FIS | NOW | HPQ | MU |
|---|---|---|---|---|---|---|---|---|---|---|
| INTC | 1.00 | 0.48 | 0.58 | 0.58 | 0.33 | 0.37 | 0.37 | 0.38 | 0.45 | 0.53 |
| WDC | 0.48 | 1.00 | 0.61 | 0.55 | 0.40 | 0.37 | 0.38 | 0.36 | 0.51 | 0.69 |
| LRCX | 0.58 | 0.61 | 1.00 | 0.87 | 0.40 | 0.47 | 0.40 | 0.49 | 0.50 | 0.69 |
| KLAC | 0.58 | 0.55 | 0.87 | 1.00 | 0.36 | 0.47 | 0.38 | 0.49 | 0.47 | 0.62 |
| DXC | 0.33 | 0.40 | 0.40 | 0.36 | 1.00 | 0.28 | 0.41 | 0.27 | 0.44 | 0.35 |
| ANET | 0.37 | 0.37 | 0.47 | 0.47 | 0.28 | 1.00 | 0.31 | 0.47 | 0.35 | 0.42 |
| FIS | 0.37 | 0.38 | 0.40 | 0.38 | 0.41 | 0.31 | 1.00 | 0.39 | 0.37 | 0.34 |
| NOW | 0.38 | 0.36 | 0.49 | 0.49 | 0.27 | 0.47 | 0.39 | 1.00 | 0.31 | 0.41 |
| HPQ | 0.45 | 0.51 | 0.50 | 0.47 | 0.44 | 0.35 | 0.37 | 0.31 | 1.00 | 0.47 |
| MU | 0.53 | 0.69 | 0.69 | 0.62 | 0.35 | 0.42 | 0.34 | 0.41 | 0.47 | 1.00 |


In [52]:
print("Correlation matrix of the low risk portfolio ")
corr_low.style.format("{:.2f}")

Correlation matrix of the low risk portfolio 


| Ticker | NEE | COST | JNJ | VZ | CME | SHW | LMT | IBM | MCD | OKE |
|---|---|---|---|---|---|---|---|---|---|---|
| NEE | 1.00 | 0.33 | 0.39 | 0.40 | 0.35 | 0.37 | 0.34 | 0.31 | 0.42 | 0.27 |
| COST | 0.33 | 1.00 | 0.38 | 0.29 | 0.32 | 0.39 | 0.32 | 0.35 | 0.37 | 0.19 |
| JNJ | 0.39 | 0.38 | 1.00 | 0.41 | 0.38 | 0.36 | 0.42 | 0.43 | 0.42 | 0.22 |
| VZ | 0.40 | 0.29 | 0.41 | 1.00 | 0.32 | 0.29 | 0.31 | 0.38 | 0.32 | 0.25 |
| CME | 0.35 | 0.32 | 0.38 | 0.32 | 1.00 | 0.37 | 0.38 | 0.36 | 0.47 | 0.36 |
| SHW | 0.37 | 0.39 | 0.36 | 0.29 | 0.37 | 1.00 | 0.34 | 0.40 | 0.47 | 0.32 |
| LMT | 0.34 | 0.32 | 0.42 | 0.31 | 0.38 | 0.34 | 1.00 | 0.43 | 0.39 | 0.30 |
| IBM | 0.31 | 0.35 | 0.43 | 0.38 | 0.36 | 0.40 | 0.43 | 1.00 | 0.41 | 0.40 |
| MCD | 0.42 | 0.37 | 0.42 | 0.32 | 0.47 | 0.47 | 0.39 | 0.41 | 1.00 | 0.37 |
| OKE | 0.27 | 0.19 | 0.22 | 0.25 | 0.36 | 0.32 | 0.30 | 0.40 | 0.37 | 1.00 |


In [56]:
# Average pairwise correlation: mean of the off-diagonal entries
# (the diagonal of ones is masked out before averaging)
avg_rho_high = corr_high.where(~np.eye(len(corr_high), dtype=bool)).stack().mean()
avg_rho_low = corr_low.where(~np.eye(len(corr_low), dtype=bool)).stack().mean()

print(f"Average pairwise correlation of the high risk portfolio: {avg_rho_high:.2f}")
print(f"Average pairwise correlation of the low risk portfolio: {avg_rho_low:.2f}")

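By calculating the average pairwise correlation of the stocks in each portfolio, we see that stocks concentrated in one sector are on average more correlated with each other than stocks from different sectors. Averaging the off-diagonal entries of the rounded matrices above gives roughly 0.45 for the high risk portfolio versus roughly 0.36 for the low risk portfolio.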