# Problem 1 Gram-Schmidt

To prove the existence of a solution, one only needs to validate one.

Consider:

$$\alpha = x \cdot y$$
$$\zeta =  y - \alpha x$$
$$\beta =  |\zeta|_{2}$$
$$ z = \beta^{-1} \zeta$$

One can easily prove that:
\begin{eqnarray}
 |z| &=& |\beta^{-1} \zeta|\\
 &=&1
\end{eqnarray}

And:
\begin{eqnarray}
 \alpha x+ \beta z &=& \alpha x + (y-\alpha x)\\
 &=& y
\end{eqnarray}

And:
\begin{eqnarray}
 x \cdot z &=& x \cdot \beta^{-1} \zeta\\
 &=& \beta^{-1} x \cdot (y-(x\cdot y) x)\\
 &=& \beta^{-1} [x \cdot y-(x\cdot y)] \\
 &=& 0
\end{eqnarray}

So $x \bot z$ .

Notice that solution exists only when $\beta \neq 0$

Equivalently,

$$|y-(x \cdot y) x| \neq 0$$

So if $x$ and $y$ are colinear, solution does not exists.

# Problem 2 Multi-stock model

Consider $n$ stocks with models:

$$ S_t^{(k)} = S_0^{(k)} \cdot exp(\tilde{\mu}^{(k)} t + \sigma^{(k)} B_t^{(k)}) $$

Assuming $cov(B_{t}^i,B_{s}^j) = min\{t,s\} \cdot \rho_{i,j}$ 

And: 
$$\sigma_k B_t^{(k)} = \sum_{i}^{k} \sigma_{k,i} W_t^{(i)}$$
$$\sigma^2_k = \sum_{j}^{min\{k,j\}} \sigma^2_{k,j}$$

Or in matrix form: 
$$\begin{bmatrix}
\sigma_1 B^{(1)}\\
\sigma_2 B^{(2)}\\
\sigma_3 B^{(3)}\\
...\\
\sigma_n B^{(n)}
\end{bmatrix}
{=}
\begin{bmatrix}
\sigma_{11}  & 0 & 0 &...&0\\
\sigma_{21}  & \sigma_{22} & 0 &...&0\\
\sigma_{31} & \sigma_{32 }&\sigma_{33} &... &0\\
...&...&...&...&...\\
\sigma_{n1} &\sigma_{n2} &\sigma_{n3} &...&\sigma_{nn}
\end{bmatrix}
\begin{bmatrix}
W^{(1)}\\
W^{(2)}\\
W^{(3)}\\
...\\
W^{(n)}
\end{bmatrix}
$$

Denote $\{\sigma_{i,j}\}$ as matrix $\Sigma$, $\{\sigma_{i}\}$ as the vector $\sigma$. The last equation can be written as:
$$diag[\sigma] B_t = \Sigma W_t$$

For simplicity, subscript t is ommited in this homework.

Since cov(W) is just $ t I$:
$$ cov(\Sigma W) = \Sigma ( tI) \Sigma^T = t \cdot \Sigma \Sigma^T $$

From historical data, $cov(\sigma \odot B)$ is directly observable, a.k.a the covariance matrix of stocks log returns.
Scale the covariance matrix by $t$, and decompose it into two matrix: $\Sigma, \Sigma^T$.
Cholesky decomposition is unique, so: 

\begin{eqnarray}
\Sigma &=& CholeskyDecompose(cov(diag[\sigma] B) / t)
\end{eqnarray}

In this particular problem, the setup is:
$$cov(diag[\sigma] B) = \begin{bmatrix}
\sigma_{1}^{2}  & \rho \sigma_{1} \sigma_{2} \\
\rho \sigma_{1} \sigma_{2}  & \sigma_{2}^{2} \\
\end{bmatrix} t $$

From (9) we know:
$$\Sigma = \begin{bmatrix}
\sigma_{1}  & 0 \\
\rho \sigma_{2} & \sqrt{1-\rho^2} \sigma_{2} \\
\end{bmatrix}$$

$$B = diag(\sigma)^{-1}\Sigma W$$
$$W = (diag(\sigma)^{-1}\Sigma)^{-1} B$$
$$W = diag[\sigma]\Sigma^{-1}B$$

Denote $diag(\sigma)^{-1}\Sigma:= S$

$$S = \begin{bmatrix}
1 & 0 \\
\rho  & \sqrt{1-\rho^2} \\
\end{bmatrix}$$


* Assuming that $\rho$ is not 1, i.e. these two stocks are different, we have:

$$W = S^{-1}B$$
$$W = \frac{1}{\sqrt{1-\rho^2}}\begin{bmatrix}
\sqrt{1-\rho^2} & 0 \\
-\rho  & 1 \\
\end{bmatrix} B$$

(1) $W$ is a combination of Brownian motions $B$, so $W$ is composed of Brownian motions.

(2) The covariance matrix of $W$ is:

$$S^{-1}cov(B)(S^{-1})^T = {(\frac{1}{\sqrt{1-\rho^2}})}^2 \begin{bmatrix}
 1-\rho^2 & 0\\
0 &  1-\rho^2 \\
\end{bmatrix} t = \begin{bmatrix}
t&0\\
0&t\\
\end{bmatrix}$$

* Elements in $W$ **independent standard Brownian motions**.

* **Comment**:

    In Problem 1, given two different unit vectors $x$ and $y$, we can solve for a unit vector $z$ such that $y$ lives in the space tensed by two orthogonal basises $x$ and $z$.

    In this problem, S is a matrix composed of two row vectors on a unit sphere. To solve $W = S^{-1}B$ for W, we need S to be full-rank which means the two row unit vectors in $S$ are different.

    Both problem leads to the fact that if the information in a two-dimension world can be represented by one vector, we can not recover the underlying brownian motions that drives the events. In stock market, that means two stocks have identical exposure to identical set of factors and the diversity represented by $det(S)$ is 0. Concludsions from Problem 1 and Problem 2 are consistent.

# Problem 3  Diversified Portfolio? 

In [2]:
import numpy as np #Handle math objects like vectors and matrixs 
import scipy as sp
from scipy.stats import norm # Normal cdf is used
import pandas as pd #Contains data from yahoo
import matplotlib.pyplot as plt # Seeing is believing! --Thrall
from datetime import datetime # Very usefull when you need to operate on date
from datetime import timedelta # Very usefull when you need to operate on date
from yahoo_finance import Share # API from Yahoo to fetch data
%matplotlib inline
float_formatter = lambda x: "%.3f" % x
np.set_printoptions(formatter={'float_kind':float_formatter})
pd.options.display.float_format = '{:20,.3f}'.format

We choose S&P 500 (ETF), Google, Apple and Amazon.

In [28]:
universe = ['SPY','GOOG','AAPL','AMZN'] # ,'MSFT','IBM','M','MNST','PNC','ROST','COO']

n = len(universe)

i = datetime.now()
j = i - timedelta(days=252*2)
currentDate = "%s-%s-%s"%(i.year,i.month,i.day)
startDate = "%s-%s-%s"%(j.year,j.month,j.day)

M is the dataframe that stores log returns of all the stocks.

In [6]:
count = 0
for symbol in universe:
    equity = Share(symbol)
    df = pd.DataFrame(equity.get_historical(startDate,currentDate))
    df.index = df.Date
    df.drop('Date',1,inplace = True)
    df.sort_index(inplace = True) # ,ascending = False
    df['Adj_Close'] = pd.to_numeric(df['Adj_Close'])
    label = symbol#+'_logReturn'
    df[label] = np.log(1+df.Adj_Close.diff(1)/df.Adj_Close)
#     df.fillna(0,inplace = True)
    df.dropna(inplace = True)
    
    if count==0:
        M = df[[label]].copy()
    else:
        M = M.join(df[[label]])
    
    count += 1


### (a) Compute the covariance matrix for the log returns


Solve the covariance matrix of the stocks ($cov(diag[\sigma] B)$):
$$\mu^{(k)} = \bar X^{(k)}-\sigma^{(k)}$$

In [82]:
dt = 1/252
covMat = np.cov(M.values.T)/dt

mu = M.mean()/dt
print(mu)

SPY                   0.096
GOOG                  0.052
AAPL                  0.124
AMZN                  0.155
dtype: float64


Vector $\sigma$ is the standard deviation of each stock:

$$\sigma = diag[cov(diag[\sigma] B)]^{\frac{1}{2}}$$

In [32]:
sigma = np.diag(covMat)**(0.5)
print(sigma)

[0.124 0.190 0.221 0.271]


### (b) -(c) Compute $\rho$, $R$ and $\Sigma$

Perform Cholesky decomposition on $Cov(diag[\sigma] B)$:

In [10]:
Sigma = np.linalg.cholesky(covMat)
print(Sigma)

[[0.124 0.000 0.000 0.000]
 [0.119 0.149 0.000 0.000]
 [0.126 0.027 0.180 0.000]
 [0.133 0.104 0.012 0.212]]


Recover standard deviation of log returns of stocks $\sigma_k$ from formula:

$$\sigma^2_k = \sum_{j}^{min\{k,j\}} \sigma^2_{k,j}$$

In [14]:
sigma = np.sum(Sigma**2,axis = 1)**(0.5)
print(sigma)

[0.124 0.190 0.221 0.271]


It should be consistent with the $\sigma$ we computed in the last part.

Recover the corrlation matrx $\rho$:

In [35]:
ss = np.outer(sigma,sigma)

corrMat = covMat/ss
print(corrMat)

[[1.000 0.624 0.570 0.490]
 [0.624 1.000 0.453 0.606]
 [0.570 0.453 1.000 0.363]
 [0.490 0.606 0.363 1.000]]


Recover matrix $\Sigma_R$:

$$\Sigma_R = diag[\sigma]^{-1} \Sigma$$

In [37]:
R = np.dot(np.diag(sigma**(-1)),Sigma)
print(R)

[[1.000 0.000 0.000 0.000]
 [0.624 0.781 0.000 0.000]
 [0.570 0.124 0.812 0.000]
 [0.490 0.384 0.044 0.781]]


## (d) Measure of diversity $d$:

$$ d = det(R)^{1/n}$$

In [51]:
d = np.linalg.det(R)**(1/n)
print(d)

0.839125928067


To answer which 2 stocks out of the 3 are the most diversified combination, we can maximize the determinant of the remaining R after removing 1 stocks.

In [80]:
print('If we remove stock 1, d = %.5f\nIf we remove stock 2, d = %.5f \nIf we remove stock 3, d = %.5f' % (np.product(R[[0,2,3],[0,2,3]]),
    np.product(R[[0,1,3],[0,1,3]]),
    np.product(R[[0,1,2],[0,1,2]])))

If we remove stock 1, d = 0.63443
If we remove stock 2, d = 0.61063 
If we remove stock 3, d = 0.63454


In [81]:
universe

['SPY', 'GOOG', 'AAPL', 'AMZN']

In concludsion, the second stock (GOOGLE) and the third (APPLE) are the best.