# Stock Portfolio Optimization: Modern Portfolio Theory
**Description**: Optimizes an investment portfolio to match a user's risk tolerance, expressed through the portfolio's weighted-average volatility, its weighted-average beta, and sector diversification constraints.

**User base**: Any retail investor 

**Purposes**: Can be used for Roth IRA investment strategy, structuring custom portfolios, and learning about risk's effect on returns.

**Limitations**: 
- Historical performance is NOT an indicator of future performance. 
- Each asset's beta and volatility are computed from historical data, which is a reasonable assumption. Expected returns, however, are simulated: each one is a random draw centered on the average past return of the asset's sector and scaled by that sector's average volatility.

## Data Collection and Cleaning

#### Part 1: Data Collection

Google Finance was used (through Google Sheets) to collect weekly closing prices for stocks in the S&P 500. Only the 285 stocks with complete weekly closing price information were kept.

In [94]:
import numpy as np
import pandas as pd

# Reading in file data
stock_data_spy = pd.read_csv("complete_stock_data_SPY.csv")

stock_data_spy['Date'] = pd.to_datetime(stock_data_spy['Date'])
stock_data_spy.set_index('Date', inplace=True)

# Most Recent Closing Price
most_recent_closing_prices = stock_data_spy.iloc[-1]

# Weekly returns for each stock and for SPY
weekly_returns = stock_data_spy.pct_change().dropna()

# Annualized Average Return
years = (stock_data_spy.index[-1] - stock_data_spy.index[0]).days / 365.25
annualized_avg_return = ((stock_data_spy.iloc[-1] / stock_data_spy.iloc[0]) ** (1/years)) - 1

# Weekly Volatility --> Annualized Volatility
annualized_volatility = weekly_returns.std() * np.sqrt(52)

# Beta Value of Each Stock
cov_matrix = weekly_returns.cov()
market_variance = weekly_returns['SPY'].var()
beta_values = cov_matrix['SPY'] / market_variance

# Creating a DataFrame for the results
final_stocks_data = pd.DataFrame({
    'Most Recent Closing Price': most_recent_closing_prices,
    'Annualized Average Return': annualized_avg_return,
    'Annualized Volatility': annualized_volatility,
    'Beta Value': beta_values
})

final_stocks_data.head()

| | Most Recent Closing Price | Annualized Average Return | Annualized Volatility | Beta Value |
|---|---|---|---|---|
| SPY | 455.3 | 0.116487 | 0.203725 | 1.0 |
| MMM | 95.95 | -0.136077 | 0.259216 | 0.865184 |
| ABT | 102.87 | 0.080846 | 0.260149 | 0.755388 |
| ANF | 73.31 | 0.316185 | 0.630581 | 1.514816 |
| ACN | 334.04 | 0.16371 | 0.275473 | 1.085449 |
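
The three statistics computed above can be checked on a tiny example. The sketch below is illustrative only: the ticker `XYZ` and all prices are made up, and it simply repeats the same formulas (beta as covariance with the market divided by market variance, weekly volatility scaled by the square root of 52, and compound annual growth from first to last price) on five weekly closes.

```python
import numpy as np
import pandas as pd

# Hypothetical weekly closing prices for a market proxy (SPY) and one made-up stock (XYZ)
prices = pd.DataFrame(
    {"SPY": [100.0, 102.0, 101.0, 104.0, 106.0],
     "XYZ": [50.0, 52.0, 51.0, 54.0, 56.0]},
    index=pd.date_range("2023-01-06", periods=5, freq="W-FRI"),
)

# Week-over-week percentage returns (the first row is NaN and is dropped)
weekly = prices.pct_change().dropna()

# Beta = Cov(stock, market) / Var(market); the market's beta against itself is 1
beta = weekly.cov()["SPY"] / weekly["SPY"].var()

# Annualize weekly volatility with the square-root-of-time rule (52 weeks per year)
annual_vol = weekly.std() * np.sqrt(52)

# Compound annual growth rate from the first to the last price
years = (prices.index[-1] - prices.index[0]).days / 365.25
cagr = (prices.iloc[-1] / prices.iloc[0]) ** (1 / years) - 1

print(beta, annual_vol, cagr, sep="\n\n")
```

Because `XYZ` moves roughly twice as far as `SPY` each week in this toy data, its beta and annualized volatility come out larger than the market's. With only four weeks of data the annualized figures are extreme and serve only to show the mechanics.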


ChatGPT's 'Data Analysis' tool was used to combine this DataFrame with an S&P 500 sectors CSV file, adding a column for each asset's sector.

Here are the summary statistics for each of the sectors:

In [95]:
final_stocks_data = pd.read_csv("final_stocks.csv")
industry_stats = final_stocks_data.groupby('Sector').agg(
    Total_Stocks=('Stock', 'count'),
    Average_Return=('Annualized Average Return', 'mean'),
    Average_Volatility=('Annualized Volatility', 'mean'),
    Average_Beta=('Beta Value', 'mean')
)
industry_stats

| Sector | Total_Stocks | Average_Return | Average_Volatility | Average_Beta |
|---|---|---|---|---|
| Communication Services | 9 | 0.062688 | 0.042627 | 0.908172 |
| Consumer Discretionary | 29 | 0.079549 | 0.054587 | 1.325326 |
| Consumer Staples | 28 | 0.043557 | 0.036499 | 0.626798 |
| Energy | 18 | 0.085124 | 0.065446 | 1.156831 |
| Financials | 47 | 0.056666 | 0.048894 | 1.202312 |
| Health Care | 29 | 0.080711 | 0.04112 | 0.817957 |
| Industrials | 40 | 0.124102 | 0.04365 | 1.070276 |
| Information Technology | 33 | 0.165247 | 0.048432 | 1.140272 |
| Materials | 16 | 0.063667 | 0.050561 | 1.093781 |
| Real Estate | 12 | -0.000628 | 0.053066 | 1.206909 |


In [106]:
final_stocks_data = pd.read_csv("final_stocks.csv")

# Fix the seed so the simulated expected returns are reproducible
np.random.seed(42)

# Attach each sector's average return and average volatility to its stocks
final_stocks_data = final_stocks_data.merge(industry_stats[['Average_Return', 'Average_Volatility']],
                                            left_on='Sector',
                                            right_index=True)

# Expected return = sector average return + (standard normal draw * sector average volatility)
final_stocks_data['Expected_Return'] = final_stocks_data['Average_Return'] + \
                                       (np.random.randn(len(final_stocks_data)) *
                                        final_stocks_data['Average_Volatility'])

# Keep only the columns needed downstream; .copy() avoids a SettingWithCopyWarning below
final_stocks_data = final_stocks_data[['Stock',
                                       'Most Recent Closing Price',
                                       'Expected_Return',
                                       'Annualized Volatility',
                                       'Beta Value',
                                       'Sector']].copy()

# The volatility column read from final_stocks.csv is on a weekly scale; convert it to annualized
final_stocks_data['Annualized Volatility'] = final_stocks_data['Annualized Volatility'] * np.sqrt(52)

# Rename 'Expected_Return' to 'Expected Return'
data = final_stocks_data.rename(columns={'Expected_Return': 'Expected Return'})

In [107]:
data

| | Stock | Most Recent Closing Price | Expected Return | Annualized Volatility | Beta Value | Sector |
|---|---|---|---|---|---|---|
| 0 | ABT | 102.87 | 0.101136 | 0.260149 | 0.755388 | Health Care |
| 6 | A | 126.62 | 0.075026 | 0.269528 | 0.953477 | Health Care |
| 20 | BAX | 36.02 | 0.107345 | 0.266092 | 0.629239 | Health Care |
| 21 | BDX | 238.89 | 0.143339 | 0.235329 | 0.491206 | Health Care |
| 24 | BIIB | 231.95 | 0.071083 | 0.467142 | 0.735806 | Health Care |
| ... | ... | ... | ... | ... | ... | ... |
| 184 | NFLX | 479.56 | 0.067527 | 0.434105 | 1.040286 | Communication Services |
| 186 | NWSA | 22.01 | 0.090912 | 0.337754 | 1.006534 | Communication Services |
| 198 | OMC | 80.09 | 0.130294 | 0.297058 | 0.920965 | Communication Services |
| 267 | VZ | 37.41 | 0.009924 | 0.197792 | 0.399793 | Communication Services |
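
The risk measures named in the description (weighted-average volatility, weighted-average beta, and sector limits) can be computed directly from a table shaped like `data`. The following is a minimal sketch, not the notebook's optimizer: it uses three rows copied from the output above, and the weights and the 60% sector cap are hypothetical values chosen only for illustration.

```python
import pandas as pd

# Three rows copied from the `data` output above
sample = pd.DataFrame({
    "Stock": ["ABT", "NFLX", "VZ"],
    "Expected Return": [0.101136, 0.067527, 0.009924],
    "Annualized Volatility": [0.260149, 0.434105, 0.197792],
    "Beta Value": [0.755388, 1.040286, 0.399793],
    "Sector": ["Health Care", "Communication Services", "Communication Services"],
})

# Hypothetical portfolio weights (assumed; they must sum to 1)
weights = pd.Series([0.5, 0.3, 0.2], index=sample.index)

# Weighted averages of each per-stock measure
portfolio_return = (weights * sample["Expected Return"]).sum()
portfolio_vol = (weights * sample["Annualized Volatility"]).sum()
portfolio_beta = (weights * sample["Beta Value"]).sum()

# Total weight held in each sector, compared against an assumed cap
sector_weights = weights.groupby(sample["Sector"]).sum()
max_sector_weight = 0.6  # hypothetical diversification limit, not from the source
within_cap = bool((sector_weights <= max_sector_weight).all())

print(f"return={portfolio_return:.4f} vol={portfolio_vol:.4f} beta={portfolio_beta:.4f}")
print(sector_weights)
print("within sector cap:", within_cap)
```

Note that a weighted-average volatility ignores correlations between assets, so for long-only weights it is an upper bound on the portfolio's actual volatility rather than an estimate of it.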
