# **Getting them stonks: An introduction to the Mean-Variance framework for portfolio optimization**
---

## Contents:

>[1 - Introduction](#1---Introduction)
>
>[2 - Importing modules](#2---Importing-modules)
>
>[3 - Data retrieval](#3---Data-retrieval)
>
>[4 - Preprocessing](#4---Preprocessing)
>
>[5 - Modelling](#5---Modelling)
>
>[6 - Conclusion](#6---Conclusion)
>


## 1 - Introduction

1. Present aim
2. Explain Theory (mean-variance optimization)
3. Get data (summary stats)
4. Plot Efficient frontier + portfolio points (MC simulation)
5. Change parameters
6. Extensions

### **Q:** Situation: You won the lottery, received the paycheck for your summer internship, or that distant uncle you didn't even know passed away and left you some money... what do you do?
### **A:** Invest... but how?

Recipe for investment:

1. Define a goal/strategy
2. Pick suitable assets
3. **Construct a suitable portfolio**
4. Check and repeat

### **Q:** Given $n$ assets, what is the optimal allocation of these within a portfolio?
### **A:** There are many...

### The Mean-Variance framework:
- Developed by Harry Markowitz in 1952 (it later earned him the Nobel Prize in Economics)
- Aims to solve the above problem using two ingredients:
    1. The volatility of asset returns (risk) - for stocks, this is the sample covariance of periodic returns
    2. The expected asset returns (reward) - for stocks, this is the average log first difference in stock prices
- Shortcomings:
    - Stock returns can be non-stationary $\implies$ we can't use average historical returns as a reasonable forecast
    - Stock returns are notoriously hard to forecast (Efficient Market Hypothesis)
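
A minimal sketch of how the two ingredients can be estimated from prices. The tickers `AAA` and `BBB` and their prices are made up purely for illustration and are not part of the data used later.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices for two made-up tickers (illustrative only)
prices = pd.DataFrame(
    {"AAA": [100.0, 102.0, 101.0, 105.0],
     "BBB": [50.0, 49.0, 51.0, 52.0]}
)

# Log returns: first difference of log prices (the first row is NaN and is dropped)
log_returns = np.log(prices).diff().dropna()

# Reward ingredient: sample mean of the periodic log returns
expected_returns = log_returns.mean()

# Risk ingredient: sample covariance matrix of the periodic log returns
covariance = log_returns.cov()

print(expected_returns)
print(covariance)
```

Because log returns add up over time, the mean log return of `AAA` here is simply `log(105/100) / 3`.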
    
### Goal: Using those two ingredients, find a set of weights describing how much of the total portfolio each asset should make up
    

## 2 - Importing modules

In [1]:
import numpy as np
import pandas as pd
import cvxpy as cp
import pandas_datareader.data as web
import matplotlib.pyplot as plt
from random import seed, random

seed(1)
%matplotlib inline

## 3 - Data retrieval

First, we need to gather data on stock prices for a selection of assets. We focus our attention on [Investopedia's Top Stocks for March 2021](https://www.investopedia.com/top-stocks-4581225) during the period Feb 2019 - Feb 2021.

In [2]:
# Specify asset symbols
stocks = ['NRG','BIO','VIRT','WTM','ALL','MAT','FCX','IAC','ZM','CE','MRNA','PTON','ETSY','TSLA','ZS']
data = web.DataReader(stocks, 'yahoo', start='2019/02/10', end='2021/02/10')
data.head()

Attributes,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,...,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume
Symbols,NRG,BIO,VIRT,WTM,ALL,MAT,FCX,IAC,ZM,CE,...,MAT,FCX,IAC,ZM,CE,MRNA,PTON,ETSY,TSLA,ZS
Date,,,,,,,,,,,,,,,,,,,,,
2019-02-11,40.368961,252.929993,23.600658,920.919434,88.779411,15.74,11.329431,,,91.073799,...,16303600.0,15561600.0,,,1123000.0,597900.0,,2122300.0,35648500.0,1122800.0
2019-02-12,40.521839,258.100006,23.862984,913.227234,89.144028,16.469999,11.290126,,,94.205116,...,15181600.0,15617900.0,,,1511900.0,753100.0,,2576600.0,27588000.0,1493600.0
2019-02-13,40.655602,261.790009,23.763483,901.424561,90.189827,17.07,12.07621,,,93.768631,...,10584100.0,36169400.0,,,811700.0,1333500.0,,2073500.0,25708000.0,1194800.0
2019-02-14,40.71294,261.420013,23.265959,898.381592,89.45105,16.91,11.948471,,,94.451813,...,6968400.0,15313400.0,,,921500.0,1091400.0,,1595400.0,26004000.0,1315000.0
2019-02-15,40.569614,270.339996,22.858894,903.908752,90.55442,13.82,12.066383,,,95.653938,...,33526900.0,16573400.0,,,1337400.0,2087400.0,,2537500.0,19524500.0,1018100.0


In [3]:
data.info(verbose=False)

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 505 entries, 2019-02-11 to 2021-02-10
Columns: 90 entries, ('Adj Close', 'NRG') to ('Volume', 'ZS')
dtypes: float64(90)
memory usage: 359.0 KB


## 4 - Preprocessing

In [4]:
data = data['Adj Close']
data.head()

Date,NRG,BIO,VIRT,WTM,ALL,MAT,FCX,IAC,ZM,CE,MRNA,PTON,ETSY,TSLA,ZS
2019-02-11,40.368961,252.929993,23.600658,920.919434,88.779411,15.74,11.329431,,,91.073799,18.17,,53.150002,62.568001,48.59
2019-02-12,40.521839,258.100006,23.862984,913.227234,89.144028,16.469999,11.290126,,,94.205116,18.690001,,55.830002,62.362,50.099998
2019-02-13,40.655602,261.790009,23.763483,901.424561,90.189827,17.07,12.07621,,,93.768631,18.530001,,55.040001,61.633999,49.0
2019-02-14,40.71294,261.420013,23.265959,898.381592,89.45105,16.91,11.948471,,,94.451813,19.66,,54.060001,60.754002,50.080002
2019-02-15,40.569614,270.339996,22.858894,903.908752,90.55442,13.82,12.066383,,,95.653938,21.440001,,54.66,61.576,50.110001


In [5]:
returns = (np.log(data)).diff()
returns.info(verbose=False)


<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 505 entries, 2019-02-11 to 2021-02-10
Columns: 15 entries, NRG to ZS
dtypes: float64(15)
memory usage: 63.1 KB


In [10]:
ex_returns = returns.mean()
cov_returns = returns.cov()
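
The price preview above shows missing values for some tickers (for example `IAC`, `ZM` and `PTON`) at the start of the sample. pandas computes each covariance entry from the rows where both columns are present, so different entries can be based on different date ranges, and such a pairwise estimate is not guaranteed to be positive semidefinite. The sketch below, on synthetic data with made-up asset names, shows one way to compare it with a complete-case estimate and inspect the smallest eigenvalue. It is a sanity check under these assumptions, not a recommendation for either estimator.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic daily returns for three made-up assets (illustrative only)
toy = pd.DataFrame(rng.normal(0.0, 0.01, size=(250, 3)), columns=["A", "B", "C"])
toy.loc[:99, "C"] = np.nan   # asset C has no data for the first 100 rows

# Each entry is computed from the rows where both columns are present
cov_pairwise = toy.cov()

# Alternative: restrict to the rows on which every asset has a return
cov_complete = toy.dropna().cov()

# Smallest eigenvalue of each estimate; a clearly negative value means not PSD
min_eig_pairwise = np.linalg.eigvalsh(cov_pairwise.to_numpy()).min()
min_eig_complete = np.linalg.eigvalsh(cov_complete.to_numpy()).min()
print(min_eig_pairwise, min_eig_complete)
```

The complete-case estimate discards data for the older assets, so the trade-off between sample length and consistency is a modelling choice.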

## 5 - Modelling

#### Goal:

$W=\begin{bmatrix}
w_1\\
\vdots \\
w_n
\end{bmatrix}$

#### Ingredients:

#### $R=\begin{bmatrix}
\mathbb{E}[r_1]\\
\vdots \\
\mathbb{E}[r_n]
\end{bmatrix} \quad,\quad\Sigma = \begin{bmatrix}
\sigma_{11} & \dots & \sigma_{1n}\\
\vdots & \ddots & \vdots\\
\sigma_{n1} & \dots & \sigma_{nn}
\end{bmatrix}$
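
To make the notation concrete, here is a small sketch of how a weight vector $W$ combines with $R$ and $\Sigma$ into a portfolio return $R^T W$ and a portfolio variance $W^T \Sigma W$. The numbers are made up for three hypothetical assets.

```python
import numpy as np

# Made-up expected returns R and covariance matrix Sigma for three assets
R = np.array([0.0010, 0.0005, 0.0008])
Sigma = np.array([
    [4.0e-4, 1.0e-4, 0.0],
    [1.0e-4, 1.0e-4, 2.0e-5],
    [0.0,    2.0e-5, 2.5e-4],
])

# Equal weights as an example allocation W (sums to 1)
W = np.ones(3) / 3

portfolio_return = R @ W              # R^T W
portfolio_variance = W @ Sigma @ W    # W^T Sigma W
portfolio_volatility = np.sqrt(portfolio_variance)

print(portfolio_return, portfolio_variance, portfolio_volatility)
```

Every model in this section either minimizes or constrains one of these two quantities.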

#### Plan of Attack:
1. Define our objective function
2. OPTIMIZE!
3. Analyze solutions

### Model 1: Minimum volatility

#### $ \underset{W}{\text{min}} \quad  W^T\:\Sigma \: W$
#### $\textrm{s.t}\quad \sum_{i=1}^{n}{w_i}=1 \quad , \quad w_i\geq 0$


In [7]:
# Defining weights vector
n = 15
w = cp.Variable(n)

# Creating objective function and constraints
objective = cp.quad_form(w, cov_returns)
constraints = [w>=0,                    # no short-selling constraint
               (np.ones(n))@w == 1]     # budget constraint: weights sum to 1

# Solving for optimal portfolio
problem = cp.Problem(cp.Minimize(objective), constraints)
problem.solve()

# Print result
print("\nThe optimal value is", problem.value)
print("Solution weight is")
print(w.value)



The optimal value is 0.00016442906276205004
Solution weight is
[ 6.95216244e-02  1.03680106e-01  2.00519006e-01  2.55487462e-01
  1.09232164e-01  3.30430180e-02  1.20304320e-19  1.08345114e-01
  5.40576310e-02  2.03043329e-19  5.54032645e-02  1.07106109e-02
  2.93221901e-19  3.63203996e-19 -5.95152625e-21]
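
The raw solver output is easier to read once the weights are labelled with their tickers and the numerical noise around zero is removed. The sketch below copies the weights and the optimal value from the output above. The `1e-8` cutoff and the 252-trading-day annualization are assumptions made here for illustration, and the square-root-of-time scaling presumes uncorrelated daily returns.

```python
import numpy as np
import pandas as pd

# Tickers in the order they were passed to the data reader
stocks = ['NRG','BIO','VIRT','WTM','ALL','MAT','FCX','IAC','ZM','CE','MRNA','PTON','ETSY','TSLA','ZS']

# Weights and objective value copied from the solver output above
weights = np.array([
    6.95216244e-02, 1.03680106e-01, 2.00519006e-01, 2.55487462e-01,
    1.09232164e-01, 3.30430180e-02, 1.20304320e-19, 1.08345114e-01,
    5.40576310e-02, 2.03043329e-19, 5.54032645e-02, 1.07106109e-02,
    2.93221901e-19, 3.63203996e-19, -5.95152625e-21,
])
daily_variance = 0.00016442906276205004

# Attach tickers, zero out solver noise, and sort from largest to smallest holding
allocation = pd.Series(weights, index=stocks)
allocation = allocation.where(allocation.abs() > 1e-8, 0.0).sort_values(ascending=False)

# The objective is a daily variance; its square root is the daily volatility
daily_vol = np.sqrt(daily_variance)
annual_vol = daily_vol * np.sqrt(252)   # assumes 252 trading days per year

print(allocation)
print(f"daily volatility: {daily_vol:.4%}, annualised: {annual_vol:.2%}")
```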


### Model 2: Minimum volatility (with short-selling)

#### $ \underset{W}{\text{min}} \quad  W^T\:\Sigma \: W$
#### $\textrm{s.t}\quad \sum_{i=1}^{n}{w_i}=1$


In [8]:
# Defining weights vector
n = cov_returns.shape[0]
w = cp.Variable(n)

# Creating objective function and constraints
objective = cp.quad_form(w, cov_returns)
constraints = [(np.ones(n))@w == 1]     # budget constraint: weights sum to 1

# Solving for optimal portfolio
problem = cp.Problem(cp.Minimize(objective), constraints)
problem.solve()

# Print result
print("\nThe optimal value is", problem.value)
print("Solution weight is")
print(w.value)


The optimal value is 0.00015173208467840986
Solution weight is
[ 0.09393574  0.10101893  0.19661463  0.2360158   0.14246694  0.04654119
 -0.06193708  0.15767521  0.06110636  0.05979388  0.05569257  0.0164063
 -0.04202623 -0.06248073 -0.00082351]
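
With short-selling allowed and only the budget constraint left, the minimum-variance problem has a well-known closed-form solution, $W = \Sigma^{-1}\mathbf{1} \,/\, (\mathbf{1}^T \Sigma^{-1} \mathbf{1})$, which follows from the Lagrangian first-order conditions. The sketch below evaluates it on a made-up covariance matrix and assumes $\Sigma$ is invertible. It can serve as a cross-check on a numerical solver, not as a replacement for it once inequality constraints are added.

```python
import numpy as np

# Made-up positive-definite covariance matrix for three assets (illustrative only)
Sigma = np.array([
    [4.0e-4, 1.0e-4, 0.0],
    [1.0e-4, 1.0e-4, 2.0e-5],
    [0.0,    2.0e-5, 2.5e-4],
])
ones = np.ones(Sigma.shape[0])

# Solve Sigma x = 1 instead of forming the inverse explicitly
x = np.linalg.solve(Sigma, ones)

# Normalise so the weights sum to one: w = Sigma^{-1} 1 / (1^T Sigma^{-1} 1)
w_closed_form = x / (ones @ x)
min_variance = w_closed_form @ Sigma @ w_closed_form

print(w_closed_form, min_variance)
```

At the optimum every asset has the same marginal contribution to variance, which is why `Sigma @ w_closed_form` is a constant vector.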


### Model 3: Risk-efficient

#### $ \underset{W}{\text{min}} \quad  W^T\:\Sigma \: W $
#### $\textrm{s.t}\quad R^T\!W = \mu \quad , \quad \sum_{i=1}^{n}{w_i}=1\quad , \quad w_i\geq 0$

In [15]:
# Defining weights vector
n = 15
w = cp.Variable(n)

# Defining a random target return in [0, 0.25)
mu = (random())/4

# Creating objective function and constraints
# (no w >= 0 constraint is imposed here, so short positions are allowed)
objective = cp.quad_form(w, cov_returns)
constraints = [(ex_returns.to_numpy())@w == mu,      # target return constraint
               (np.ones(n))@w == 1]     # budget constraint: weights sum to 1

# Solving for optimal portfolio
problem = cp.Problem(cp.Minimize(objective), constraints)
problem.solve()

# Print result
print('target return: ', mu)
print("\nThe optimal value is", problem.value)
print("Solution weight is")
print(w.value)

target return:  0.06376725643485542

The optimal value is 0.08316548264715101
Solution weight is
[-2.93034223 -1.63874226 -3.56089474 -1.34375183 -2.24498806 -1.09036456
  2.83962263  8.46558582  0.07907244 -0.2487511   1.8957191   1.62294545
 -1.1480467   1.04629547 -0.74335944]
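
The weights above are very large in absolute value. A likely explanation is that the target is a daily log return: with weights that sum to one, any target above the largest single-asset expected return can only be reached through short positions. The sketch below solves the same equality-constrained problem (no sign constraint) through its KKT linear system and sweeps the target across the range of the asset expected returns to trace a few frontier points. The data and the helper name `min_variance_for_target` are made up for illustration.

```python
import numpy as np

def min_variance_for_target(R, Sigma, mu):
    """Solve min w^T Sigma w  s.t.  R^T w = mu, 1^T w = 1 via the KKT linear system."""
    n = len(R)
    ones = np.ones(n)
    # KKT matrix: stationarity rows on top, the two equality constraints below
    kkt = np.zeros((n + 2, n + 2))
    kkt[:n, :n] = 2.0 * Sigma
    kkt[:n, n] = R
    kkt[:n, n + 1] = ones
    kkt[n, :n] = R
    kkt[n + 1, :n] = ones
    rhs = np.concatenate([np.zeros(n), [mu, 1.0]])
    solution = np.linalg.solve(kkt, rhs)
    return solution[:n]   # drop the two Lagrange multipliers

# Made-up daily expected returns and covariance (illustrative only)
R = np.array([0.0010, 0.0005, 0.0008])
Sigma = np.array([
    [4.0e-4, 1.0e-4, 0.0],
    [1.0e-4, 1.0e-4, 2.0e-5],
    [0.0,    2.0e-5, 2.5e-4],
])

# Sweep targets between the smallest and largest single-asset expected return
targets = np.linspace(R.min(), R.max(), 5)
frontier = []
for mu in targets:
    w = min_variance_for_target(R, Sigma, mu)
    frontier.append((mu, np.sqrt(w @ Sigma @ w)))

for mu, vol in frontier:
    print(f"target return {mu:.5f} -> volatility {vol:.5f}")
```

Plotting volatility against target return for a finer grid of targets gives the efficient frontier. Adding $w_i \geq 0$ turns the problem into a quadratic program that needs a solver such as `cvxpy`.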


### Model 4: Return-efficient

#### $ \underset{W}{\text{max}} \quad R^T\!W$
#### $\textrm{s.t}\quad W^T\:\Sigma \: W = \sigma^2 \quad , \quad \sum_{i=1}^{n}{w_i}=1 \quad , \quad w_i\geq 0$

### Model 5: Maximum Sharpe Ratio *
#### $ \underset{W}{\text{max}} \frac{R^T\!W - r_f}{\sqrt{W^T\:\Sigma \: W}}$
#### $\textrm{s.t}\quad \sum_{i=1}^{n}{w_i}=1 \quad , \quad w_i\geq 0$
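
One simple way to explore Model 5 without a solver is a Monte Carlo search: sample many random long-only portfolios, compute the Sharpe ratio of each, and keep the best one. The sketch below does this on made-up data with an assumed risk-free rate of zero. The best sampled portfolio only approximates the true maximizer, so this is an illustration of the objective and not an optimization method.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up daily expected returns and covariance (illustrative only)
R = np.array([0.0010, 0.0005, 0.0008])
Sigma = np.array([
    [4.0e-4, 1.0e-4, 0.0],
    [1.0e-4, 1.0e-4, 2.0e-5],
    [0.0,    2.0e-5, 2.5e-4],
])
risk_free = 0.0   # assumed daily risk-free rate r_f

# Draw random long-only portfolios; each Dirichlet sample is non-negative and sums to 1
n_portfolios = 5000
weights = rng.dirichlet(np.ones(len(R)), size=n_portfolios)

# Return and volatility of every sampled portfolio
port_returns = weights @ R
port_vols = np.sqrt(np.einsum("ij,jk,ik->i", weights, Sigma, weights))

# Sharpe ratio as defined in Model 5, then pick the best sampled portfolio
sharpe = (port_returns - risk_free) / port_vols
best = sharpe.argmax()

print("best sampled weights:", weights[best].round(3))
print("Sharpe ratio:", sharpe[best])
```

A scatter plot of `port_vols` against `port_returns` shows the cloud of feasible portfolios that the efficient frontier bounds.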


## 6 - Conclusion


## 7 - References
