In [None]:
from importlib import reload

import pandas as pd
import numpy as np
from enum import Enum

import helper
from helper import Col

%matplotlib notebook

In [None]:
sp500_df = helper.get_sp500_df('sp500_monthly.csv')
sp500_df.head()

## Expected Yearly Returns

If I invest in the S&P 500 today, what return should I expect after one year?

Let the `PRICE` at month $i$ be $P_i$. Let the `RETURNS` of month $i$ be $X_i = \frac{P_{i+1}}{P_i}$.
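
The implementation of `helper.get_returns` is not shown in this notebook. As a rough sketch of the definition above, the ratio $P_{i+1} / P_i$ could be computed with pandas as follows. The function name `gross_returns` and the sample prices are made up for illustration and are not taken from `helper`.

```python
import pandas as pd


def gross_returns(prices: pd.Series) -> pd.Series:
    """Compute X_i = P_{i+1} / P_i for every month that has a successor."""
    # shift(-1) aligns P_{i+1} with P_i; the last entry becomes NaN and is dropped.
    return (prices.shift(-1) / prices).dropna()


# Hypothetical prices, not S&P 500 data.
example_prices = pd.Series([100.0, 110.0, 99.0, 108.9])
example_returns = gross_returns(example_prices)
print(example_returns)
```

A series of $n$ prices yields $n - 1$ returns, since the last month has no following price to compare against.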

Assume:
1. $X_1, \ldots, X_n$ are i.i.d. (this is a very strong assumption; it will be relaxed later).

Now we have by the LLN:
1. $\overline{\mu_X} = \overline{X_n} \to_{n \to \infty} \mu_X$
2. $\overline{\sigma^2_X} = \overline{X^2_n} - \overline{X_n}^2 \to_{n \to \infty} \sigma^2_X$
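
One detail worth checking: the variance estimator written above divides by $n$, while pandas' `Series.var()` divides by $n - 1$ by default (`ddof=1`). The sketch below compares the two on a handful of made-up returns; the numbers are illustrative only. For a long monthly series the difference is negligible.

```python
import numpy as np
import pandas as pd

# Made-up monthly returns, for illustration only.
x = pd.Series([1.02, 0.97, 1.01, 1.05, 0.99])
n = len(x)

sample_mean = x.mean()
# Plug-in estimator from the text: mean of squares minus squared mean.
plug_in_var = (x**2).mean() - sample_mean**2

# ddof=0 divides by n and matches the plug-in estimator.
matches_ddof0 = np.isclose(plug_in_var, x.var(ddof=0))
# The pandas default (ddof=1) divides by n - 1, so it is larger by n / (n - 1).
matches_ddof1 = np.isclose(x.var(), plug_in_var * n / (n - 1))
print(matches_ddof0, matches_ddof1)
```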


So let's calculate the sample mean ($\overline{\mu_X}$) and the sample variance ($\overline{\sigma^2_X}$).

In [None]:
reload(helper)

In [None]:
returns_srs = helper.get_returns(sp500_df.PRICE)
returns_srs.head()

Let's see what the distribution of returns looks like.

In [None]:
returns_srs.hist(bins=20)

In [None]:
sample_mean_returns = returns_srs.mean()
sample_mean_returns

In [None]:
sample_variance_returns = returns_srs.var()
sample_variance_returns

In [None]:
yearly_mean_returns = sample_mean_returns**12
yearly_mean_returns

The variable $y =$ `yearly_mean_returns` answers the question: if I invest $P$ dollars in the S&P 500 today, how much do I expect to have in a year? Answer: $P \times y$.
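
Raising the monthly mean to the 12th power relies on the independence assumption: for independent returns, $\mathbb{E}[X_1 \times \ldots \times X_{12}] = \mathbb{E}[X_1]^{12}$. The sketch below checks this numerically on simulated data. The normal distribution and its parameters (`monthly_mean`, `monthly_std`) are arbitrary choices for illustration, not estimates from the S&P 500 series.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative parameters, not fitted to real data.
monthly_mean = 1.008
monthly_std = 0.04
n_trials = 200_000

# Each row is one simulated year of 12 independent monthly returns.
monthly = rng.normal(loc=monthly_mean, scale=monthly_std, size=(n_trials, 12))
yearly = monthly.prod(axis=1)

predicted_yearly_mean = monthly_mean**12
simulated_yearly_mean = yearly.mean()
print(predicted_yearly_mean, simulated_yearly_mean)
```

If the monthly returns were correlated, the two numbers would generally differ, which is one reason the i.i.d. assumption matters here.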

Because $\mathrm{Var}[X_1 \times \ldots \times X_{12}]$ is not easy to calculate, let's switch to a different model for interpreting the returns.

Let's assume that the price at month $i$, $P_i$, is the result of continuous compounding. The compounding rate is constant within each month, but it can change from one month to the next. To put this more precisely:

- Let $Y_i$ be the rate of returns over month $i$. 
- Let $P_i$ be the price at the beginning of month $i$.
- This implies that $P_i = P_1 \exp(Y_1) \times \ldots \times \exp(Y_{i-1}) = P_1 \exp(Y_1 + \ldots + Y_{i-1})$.
- This implies that $Y_i = \log(\frac{P_{i+1}}{P_i}) = \log P_{i+1} - \log P_i$.

$Y_i$ is the rate of returns during month $i$ if the returns are continuously compounding. Let's call it the "log returns" for short. This should make sense by looking at the last bullet point above, since $Y_i$ is just the $\log$ of the returns as we had previously defined them. 
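
As with `helper.get_returns`, the body of `helper.get_log_returns` is not shown here. A minimal sketch of the last bullet point, assuming prices are held in a pandas Series, might look like this. The name `log_returns` and the sample prices are hypothetical.

```python
import numpy as np
import pandas as pd


def log_returns(prices: pd.Series) -> pd.Series:
    """Compute Y_i = log P_{i+1} - log P_i for every month with a successor."""
    log_prices = np.log(prices)
    # shift(-1) aligns log P_{i+1} with log P_i; the last entry is NaN and dropped.
    return (log_prices.shift(-1) - log_prices).dropna()


# Hypothetical prices, not S&P 500 data.
sample_prices = pd.Series([100.0, 110.0, 99.0, 108.9])
y = log_returns(sample_prices)

# The sum telescopes: Y_1 + ... + Y_{n-1} = log(P_n / P_1).
total_log_return = y.sum()
print(total_log_return, np.log(sample_prices.iloc[-1] / sample_prices.iloc[0]))
```

The telescoping sum is what makes log returns convenient: the return over a long period is a sum rather than a product of the monthly terms.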

In [None]:
reload(helper)

In [None]:
log_returns_srs = helper.get_log_returns(sp500_df.PRICE)
log_returns_srs.head()

In [None]:
sample_mean_log_returns = log_returns_srs.mean()
sample_mean_log_returns

In [None]:
sample_variance_log_returns = log_returns_srs.var()
sample_variance_log_returns

Now we can calculate the variance for our yearly returns, since the rate of returns for the whole year will simply be $Y_1 + \ldots + Y_{12}$.

Since the $X_i$ are assumed to be i.i.d., so are the $Y_i = \log X_i$, and therefore $\mathrm{Var}[Y_1 + \ldots + Y_{12}] = \mathrm{Var}[Y_1] + \ldots + \mathrm{Var}[Y_{12}] = 12 \times \mathrm{Var}[Y_1] \approx 12 \times \overline{\sigma_Y^2}$.
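
In code, scaling the monthly statistics to a year could be sketched as below. The function `annualize_log_return_stats` and the input values are made up for illustration; in the notebook the inputs would be `sample_mean_log_returns` and `sample_variance_log_returns`.

```python
import numpy as np


def annualize_log_return_stats(monthly_mean, monthly_var, months=12):
    """Scale the mean and variance of i.i.d. monthly log returns to `months` months."""
    # Mean and variance of a sum of i.i.d. terms both scale linearly.
    yearly_mean = months * monthly_mean
    yearly_var = months * monthly_var
    # The standard deviation therefore scales with the square root of time.
    yearly_std = np.sqrt(yearly_var)
    return yearly_mean, yearly_var, yearly_std


# Illustrative values only, not computed from the S&P 500 data.
yearly_mean, yearly_var, yearly_std = annualize_log_return_stats(0.006, 0.0018)
print(yearly_mean, yearly_var, yearly_std)
```

Note that the standard deviation grows by a factor of $\sqrt{12}$ rather than 12, because it is the variance that adds up across independent months.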