# **Getting them stonks: Portfolio mean-variance optimization**
---

## Contents:

>[1 - Introduction](#1---Introduction)
>
>[2 - Importing modules](#2---Importing-modules)
>
>[3 - Data retrieval](#3---Data-retrieval)
>
>[4 - Preprocessing](#4---Preprocessing)
>
>[5 - Modelling](#5---Modelling)
>
>[6 - Conclusion](#6---Conclusion)
>


## 1 - Introduction

1. Present aim
2. Explain Theory (mean-variance optimization)
3. Get data (summary stats)
4. Plot Efficient frontier + portfolio points (MC simulation)
5. Change parameters
6. Extensions

### **Q:** Situation: You won the lottery, received the paycheck for your summer internship, or a distant uncle you didn't even know passed away and left you some money... what do you do?
### **A:** Invest... but how?

Recipe for investment:

1. Define a goal/strategy
2. Pick suitable assets
3. **Construct a suitable portfolio**
4. Check and repeat

### **Q:** Given $n$ assets, what is the optimal allocation of these within a portfolio?
### **A:** There are many...

### The Mean-Variance framework:
- Developed by Harry Markowitz in 1952 (it earned him the 1990 Nobel Memorial Prize in Economic Sciences)
- Aims to solve the above problem using two ingredients:
    1. The volatility of asset returns (risk) - for stocks, this is the sample covariance of periodic returns
    2. The expected asset returns (reward) - for stocks, this is the average log first difference in stock prices
- Shortcomings:
    - Stock returns can be non-stationary
    
### Goal: Using those two ingredients, find a set of weights describing how much of the total portfolio each asset should make up
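
To make the goal concrete, here is a minimal sketch of how a weight vector maps to the two ingredients. The three-asset expected returns and covariance matrix below are made-up illustration values, not estimates from the data used later in this notebook.

```python
import numpy as np

# Hypothetical expected returns and covariance matrix for three assets
exp_ret = np.array([0.10, 0.05, 0.08])
cov = np.array([
    [0.040, 0.006, 0.010],
    [0.006, 0.010, 0.004],
    [0.010, 0.004, 0.025],
])

# Candidate allocation: fraction of the portfolio held in each asset (sums to 1)
weights = np.array([0.5, 0.3, 0.2])

port_return = weights @ exp_ret          # reward: w^T R
port_variance = weights @ cov @ weights  # risk:   w^T Sigma w
port_vol = np.sqrt(port_variance)        # volatility is the square root of the variance

print(port_return, port_variance, port_vol)
```

Different weight vectors trade off `port_return` against `port_variance`; the optimization in section 5 searches over the weights.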
    

## 2 - Importing modules

In [1]:
import numpy as np
import pandas as pd
import pandas_datareader.data as web
import matplotlib.pyplot as plt
%matplotlib inline

## 3 - Data retrieval

We focus our attention on [Investopedia's Top Stocks for March 2021](https://www.investopedia.com/top-stocks-4581225)

In [2]:
# Specify asset symbols
stocks = ['NRG','BIO','VIRT','WTM','ALL','MAT','FCX','IAC','ZM','CE','MRNA','PTON','ETSY','TSLA','ZS']
data = web.DataReader(stocks, 'yahoo', start='2020/01/01', end='2021/02/10')
data.head()

| Date | Adj Close: NRG | Adj Close: BIO | Adj Close: VIRT | Adj Close: WTM | Adj Close: ALL | Adj Close: MAT | Adj Close: FCX | Adj Close: IAC | Adj Close: ZM | Adj Close: CE | ... | Volume: MAT | Volume: FCX | Volume: IAC | Volume: ZM | Volume: CE | Volume: MRNA | Volume: PTON | Volume: ETSY | Volume: TSLA | Volume: ZS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2020-01-02 | 37.159161 | 372.160004 | 15.453154 | 1121.527832 | 110.420639 | 13.3 | 13.151219 | NaN | 68.720001 | 117.754089 | ... | 3288500.0 | 22771700.0 | NaN | 1315500.0 | 911300.0 | 1233600.0 | 5916200.0 | 2152300.0 | 47660500.0 | 1377200.0 |
| 2020-01-03 | 36.584541 | 366.779999 | 15.813854 | 1119.320557 | 110.43042 | 13.48 | 12.752698 | NaN | 67.279999 | 116.15403 | ... | 2928400.0 | 20401300.0 | NaN | 1127900.0 | 770900.0 | 1751000.0 | 4974100.0 | 2109800.0 | 88892500.0 | 1165200.0 |
| 2020-01-06 | 35.799213 | 372.029999 | 15.320263 | 1118.881104 | 110.75341 | 14.07 | 12.802513 | NaN | 70.32 | 114.873993 | ... | 4184200.0 | 19145300.0 | NaN | 3151600.0 | 710900.0 | 1606500.0 | 4028900.0 | 2077100.0 | 50665000.0 | 1534600.0 |
| 2020-01-07 | 35.31078 | 380.540009 | 15.272803 | 1110.760986 | 109.804039 | 14.18 | 13.001774 | NaN | 71.900002 | 114.427917 | ... | 8298600.0 | 20849500.0 | NaN | 6985400.0 | 1231300.0 | 1461400.0 | 3072000.0 | 1945500.0 | 89410500.0 | 1714900.0 |
| 2020-01-08 | 35.301205 | 381.790009 | 15.291789 | 1114.196899 | 110.107437 | 14.13 | 13.131293 | NaN | 72.550003 | 113.535767 | ... | 3304000.0 | 17484700.0 | NaN | 2482300.0 | 826300.0 | 1041600.0 | 7474100.0 | 3222700.0 | 155721500.0 | 3232500.0 |


In [3]:
data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 280 entries, 2020-01-02 to 2021-02-10
Data columns (total 90 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   (Adj Close, NRG)   280 non-null    float64
 1   (Adj Close, BIO)   280 non-null    float64
 2   (Adj Close, VIRT)  280 non-null    float64
 3   (Adj Close, WTM)   280 non-null    float64
 4   (Adj Close, ALL)   280 non-null    float64
 5   (Adj Close, MAT)   280 non-null    float64
 6   (Adj Close, FCX)   280 non-null    float64
 7   (Adj Close, IAC)   155 non-null    float64
 8   (Adj Close, ZM)    280 non-null    float64
 9   (Adj Close, CE)    280 non-null    float64
 10  (Adj Close, MRNA)  280 non-null    float64
 11  (Adj Close, PTON)  280 non-null    float64
 12  (Adj Close, ETSY)  280 non-null    float64
 13  (Adj Close, TSLA)  280 non-null    float64
 14  (Adj Close, ZS)    280 non-null    float64
 15  (Close, NRG)       280 non-null    float64
 16  (Close,

## 4 - Preprocessing

In [4]:
data = data['Adj Close']
data.head()

| Date | NRG | BIO | VIRT | WTM | ALL | MAT | FCX | IAC | ZM | CE | MRNA | PTON | ETSY | TSLA | ZS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2020-01-02 | 37.159161 | 372.160004 | 15.453154 | 1121.527832 | 110.420639 | 13.3 | 13.151219 | NaN | 68.720001 | 117.754089 | 19.23 | 29.74 | 45.189999 | 86.052002 | 47.330002 |
| 2020-01-03 | 36.584541 | 366.779999 | 15.813854 | 1119.320557 | 110.43042 | 13.48 | 12.752698 | NaN | 67.279999 | 116.15403 | 18.889999 | 30.6 | 44.900002 | 88.601997 | 47.380001 |
| 2020-01-06 | 35.799213 | 372.029999 | 15.320263 | 1118.881104 | 110.75341 | 14.07 | 12.802513 | NaN | 70.32 | 114.873993 | 18.129999 | 29.75 | 44.834999 | 90.307999 | 48.700001 |
| 2020-01-07 | 35.31078 | 380.540009 | 15.272803 | 1110.760986 | 109.804039 | 14.18 | 13.001774 | NaN | 71.900002 | 114.427917 | 17.780001 | 30.4 | 45.779999 | 93.811996 | 48.400002 |
| 2020-01-08 | 35.301205 | 381.790009 | 15.291789 | 1114.196899 | 110.107437 | 14.13 | 13.131293 | NaN | 72.550003 | 113.535767 | 17.98 | 29.65 | 45.005001 | 98.428001 | 50.75 |
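
The `data['Adj Close']` step relies on the downloaded frame having two column levels (attribute, then symbol). A small sketch of that selection on a toy frame follows; the symbols `AAA` and `BBB` and all numbers are hypothetical.

```python
import pandas as pd

# Toy frame with (attribute, symbol) column pairs, mimicking the downloaded layout
columns = pd.MultiIndex.from_product(
    [["Adj Close", "Volume"], ["AAA", "BBB"]],
    names=["Attributes", "Symbols"],
)
toy = pd.DataFrame(
    [[10.0, 20.0, 1000.0, 2000.0],
     [11.0, 19.0, 1100.0, 2100.0]],
    index=pd.to_datetime(["2020-01-02", "2020-01-03"]),
    columns=columns,
)

# Indexing with a top-level label drops that level, leaving one column per symbol
adj_close = toy["Adj Close"]
print(adj_close.columns.tolist())
```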


In [5]:
returns = (np.log(data)).diff()
returns.info()


<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 280 entries, 2020-01-02 to 2021-02-10
Data columns (total 15 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   NRG     279 non-null    float64
 1   BIO     279 non-null    float64
 2   VIRT    279 non-null    float64
 3   WTM     279 non-null    float64
 4   ALL     279 non-null    float64
 5   MAT     279 non-null    float64
 6   FCX     279 non-null    float64
 7   IAC     154 non-null    float64
 8   ZM      279 non-null    float64
 9   CE      279 non-null    float64
 10  MRNA    279 non-null    float64
 11  PTON    279 non-null    float64
 12  ETSY    279 non-null    float64
 13  TSLA    279 non-null    float64
 14  ZS      279 non-null    float64
dtypes: float64(15)
memory usage: 35.0 KB
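
Each column loses one observation (279 instead of 280, 154 instead of 155 for IAC) because the first difference is undefined for the first available price. A minimal sketch on a made-up three-day price series shows this, along with the fact that log returns add up over time:

```python
import numpy as np
import pandas as pd

# Hypothetical prices on three consecutive days
prices = pd.Series([100.0, 110.0, 99.0])

# Same transformation as in the cell above: difference of log prices
log_ret = np.log(prices).diff()

# Equivalent form: log of the price ratio between consecutive days
log_ret_ratio = np.log(prices / prices.shift(1))

# Summing log returns gives the log return over the whole period (NaN is skipped)
total = log_ret.sum()
print(log_ret.tolist(), total)
```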


In [6]:
ex_returns = returns.mean()
cov_returns = returns.cov()

<class 'pandas.core.frame.DataFrame'>
Index: 15 entries, NRG to ZS
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   0       15 non-null     float64
dtypes: float64(1)
memory usage: 880.0+ bytes
<class 'pandas.core.frame.DataFrame'>
Index: 15 entries, NRG to ZS
Data columns (total 15 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   NRG     15 non-null     float64
 1   BIO     15 non-null     float64
 2   VIRT    15 non-null     float64
 3   WTM     15 non-null     float64
 4   ALL     15 non-null     float64
 5   MAT     15 non-null     float64
 6   FCX     15 non-null     float64
 7   IAC     15 non-null     float64
 8   ZM      15 non-null     float64
 9   CE      15 non-null     float64
 10  MRNA    15 non-null     float64
 11  PTON    15 non-null     float64
 12  ETSY    15 non-null     float64
 13  TSLA    15 non-null     float64
 14  ZS      15 non-null     float64
dtypes: float64
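
Because IAC has a shorter history than the other symbols, it is worth knowing how pandas treats the missing values: `DataFrame.mean()` skips them per column, and `DataFrame.cov()` excludes them pairwise, so different entries of the covariance matrix can be based on different numbers of observations. The sketch below illustrates this on made-up returns (columns `A` and `B` are hypothetical) and shows one possible alternative, restricting all columns to the common window with `dropna()`. Which choice is appropriate is a modelling decision; a pairwise matrix is not guaranteed to be positive semidefinite.

```python
import numpy as np
import pandas as pd

# Hypothetical returns; column B has a shorter history, like IAC above
toy_returns = pd.DataFrame({
    "A": [0.01, -0.02, 0.03, 0.00, 0.02],
    "B": [np.nan, np.nan, 0.01, -0.01, 0.02],
})

# Each column is averaged over its own non-missing rows
mean_all = toy_returns.mean()

# Each entry uses only the rows where both columns are present
cov_pairwise = toy_returns.cov()

# Alternative: keep only rows where every column is present
cov_common = toy_returns.dropna().cov()

print(cov_pairwise.loc["A", "A"], cov_common.loc["A", "A"])
```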

## 5 - Modelling

1. Define our objective function
2. Find gradient vector
3. Implement Gradient Descent
4. OPTIMIZE!

#### Mathematically:
#### $ \underset{W}{\text{min}} \quad  W^T\:\Sigma \: W \quad \text{s.t.}\quad  R^T\!W = \mu \;, \quad \sum_{i=1}^{n}{w_i}=1$
#### where:

#### $W=\begin{bmatrix}
w_1\\
\vdots \\
w_n
\end{bmatrix} \quad,\quad R=\begin{bmatrix}
\mathbb{E}[r_1]\\
\vdots \\
\mathbb{E}[r_n]
\end{bmatrix} \quad,\quad\Sigma = \begin{bmatrix}
\sigma_{11} & \dots & \sigma_{1n}\\
\vdots & \ddots & \vdots\\
\sigma_{n1} & \dots & \sigma_{nn}
\end{bmatrix}$

#### $ \underset{W}{\text{min}} \quad  W^T\:\Sigma \: W \quad \text{s.t.}\quad  R^T\!W = \mu \;, \quad \sum_{i=1}^{n}{w_i}=1$
#### $\implies \mathcal{L}(W,\lambda_1,\lambda_2) = W^T\:\Sigma \: W - \lambda_1( R^T\!W - \mu) - \lambda_2\left(\sum_{i=1}^{n}{w_i} - 1\right)$
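
Since the objective is quadratic and both constraints are linear, the problem can also be solved directly. Using one multiplier per equality constraint and setting the gradient of the Lagrangian with respect to $W$ to zero gives $2\Sigma W - \lambda_1 R - \lambda_2 \mathbf{1} = 0$, which together with the two constraints forms a linear system. The sketch below solves that system with NumPy; the function name `min_variance_weights` and the three-asset inputs are hypothetical, and it assumes $\Sigma$ is positive definite and $R$ is not a multiple of the ones vector. It is meant as a reference point for the iterative approach, not as the method used in this notebook.

```python
import numpy as np

def min_variance_weights(cov, exp_ret, target):
    """Solve the stationarity and constraint equations as one linear system."""
    n = len(exp_ret)
    ones = np.ones(n)

    kkt = np.zeros((n + 2, n + 2))
    kkt[:n, :n] = 2.0 * cov      # gradient of W^T Sigma W
    kkt[:n, n] = -exp_ret        # multiplier of the return constraint
    kkt[:n, n + 1] = -ones       # multiplier of the budget constraint
    kkt[n, :n] = exp_ret         # R^T W = target
    kkt[n + 1, :n] = ones        # sum of weights = 1

    rhs = np.zeros(n + 2)
    rhs[n] = target
    rhs[n + 1] = 1.0

    sol = np.linalg.solve(kkt, rhs)
    return sol[:n], sol[n:]      # weights, (lambda_1, lambda_2)

# Hypothetical inputs for three assets
exp_ret = np.array([0.10, 0.05, 0.08])
cov = np.array([
    [0.040, 0.006, 0.010],
    [0.006, 0.010, 0.004],
    [0.010, 0.004, 0.025],
])

weights, multipliers = min_variance_weights(cov, exp_ret, target=0.08)
print(weights, weights.sum(), exp_ret @ weights)
```

Note that nothing in this formulation keeps the weights non-negative, so the solution may include short positions.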

In [12]:
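def objective(w):
    # Portfolio variance W^T Σ W; the two equality constraints are not enforced here
    w = np.ravel(w)
    return float(w @ cov_returns.to_numpy() @ w)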


Approximating the gradient numerically using **finite differences method**

* Backward difference  $f'(x) \approx \frac{f(x_k) - f(x_k - \epsilon)}{\epsilon}$
* Forward difference  $f'(x) \approx \frac{f(x_k + \epsilon) - f(x_k)}{\epsilon}$
* Central difference $f'(x) \approx \frac{f(x_k + \frac{\epsilon}{2}) - f(x_k - \frac{\epsilon}{2})}{\epsilon}$

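The central difference is the most accurate of the three: its truncation error shrinks like $\epsilon^2$, whereas the forward and backward differences have an error of order $\epsilon$. Therefore, let's implement that one here.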
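
As a quick sanity check of that accuracy claim, the sketch below applies the three formulas to $\sin(x)$, whose exact derivative $\cos(x)$ is known. The test function, evaluation point and step size are arbitrary choices made for illustration.

```python
import numpy as np

def f(x):
    return np.sin(x)

x_k = 1.0            # evaluation point (arbitrary)
eps = 1e-3           # step size (arbitrary, large enough to make the errors visible)
exact = np.cos(x_k)  # known derivative of sin

backward = (f(x_k) - f(x_k - eps)) / eps
forward = (f(x_k + eps) - f(x_k)) / eps
central = (f(x_k + eps / 2) - f(x_k - eps / 2)) / eps

errors = {
    "backward": abs(backward - exact),
    "forward": abs(forward - exact),
    "central": abs(central - exact),
}
for name, err in errors.items():
    print(f"{name:8s} error: {err:.2e}")
```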

In [None]:
def central_finite_diff(f, x):
    # Work on a flat float copy so the gradient has the same 1-D shape as x
    x   = np.ravel(x).astype(float)
    dim = x.size
    
    eps  = np.sqrt(np.finfo(float).eps)  # finite-difference step size
    grad = np.zeros(dim)
    
    for i in range(dim):
        # Perturb only coordinate i
        e = np.zeros(dim)
        e[i] = eps
        grad[i] = (f(x + e/2) - f(x - e/2)) / eps
    return grad


In [None]:
def gradient_descent(f, x0, grad_f, lr, max_iter=1e5, grad_tol=1e-4, traj=False):
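    '''
    Gradient Descent
    INPUTS:
        f        : Function to minimize
        x0       : Initial guess
        grad_f   : Gradient function, called as grad_f(f, x)
        lr       : Learning rate (step size)
        max_iter : Maximum number of iterations
        grad_tol : Stop once the sum of absolute gradient entries is at or below this tolerance
        traj     : If True, record and return the trajectory of x and f(x)
    OUTPUTS:
        x        : Final point
        iter_i   : Number of iterations used (returned when traj is False)
        x_list, f_list : Visited points and function values (returned instead of iter_i when traj is True)
    '''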
    
    # Initialize problem
    x      = np.copy(x0)
    iter_i = 0
    grad_i = grad_tol*10
    
    # Containers for the trajectory (used for plotting)
    if traj:
        x_list = []
        f_list = []
        
    while np.sum(np.abs(grad_i)) > grad_tol and iter_i < max_iter:
        
        grad_i  = grad_f(f, x) # compute gradient
        x       = x - lr*grad_i       # compute step
        iter_i += 1
        
        # Record the trajectory (used for plotting)
        if traj:
            x_list.append(x.flatten().tolist())
            f_list.append(f(x))
        
    print(' Optimization using Gradient Descent \n')
    print('Iterations: ', iter_i)
    print('Optimal x : ', x) 
    print('Final grad: ', grad_i)
    
    # Trajectory
    if traj:
        return x, x_list, f_list
        
    return x, iter_i
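
Plain gradient descent on the portfolio variance ignores the constraints. One possible way to respect the budget constraint $\sum_i w_i = 1$ is to project back onto it after every step (projected gradient descent). This is a technique swapped in for illustration and is not part of the functions above; the sketch also uses the analytic gradient $2\Sigma W$ instead of finite differences, drops the target-return constraint, and runs on a made-up covariance matrix. The helper names are hypothetical. The result is compared with the known closed form for the minimum-variance portfolio, $W \propto \Sigma^{-1}\mathbf{1}$.

```python
import numpy as np

# Hypothetical covariance matrix for three assets
cov = np.array([
    [0.040, 0.006, 0.010],
    [0.006, 0.010, 0.004],
    [0.010, 0.004, 0.025],
])

def variance_grad(w):
    # Analytic gradient of w^T Sigma w for a symmetric Sigma
    return 2.0 * cov @ w

def project_to_budget(w):
    # Euclidean projection onto the hyperplane sum(w) = 1
    return w - (w.sum() - 1.0) / w.size

w = np.full(3, 1.0 / 3.0)  # start from equal weights
lr = 5.0                   # step size chosen by hand for this small example
n_iter = 0
for n_iter in range(1, 10_001):
    w_new = project_to_budget(w - lr * variance_grad(w))
    step = np.abs(w_new - w).sum()
    w = w_new
    if step < 1e-12:       # stop once the iterates no longer move
        break

# Closed-form minimum-variance weights for comparison
w_exact = np.linalg.solve(cov, np.ones(3))
w_exact /= w_exact.sum()

print("iterations:", n_iter)
print("projected GD:", w)
print("closed form :", w_exact)
```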

## 6 - Conclusion
