# Asset / Portfolio Returns


## Data preparation

The notebook uses some sample data, available in both CSV and MessagePack formats. To load it, use the provided helpers:

In [None]:
from sample_data import stock_prices
aapl = stock_prices("aapl")
msft = stock_prices("msft")

As some month-end prices might be missing (mainly because of weekends and holidays), we need to pad the series with the latest known values:

In [None]:
month_end_prices = msft.Close.resample("M").ffill()
month_end_prices[:5]
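To see what the padding does, here is a minimal sketch on a made-up daily series (the values are hypothetical, not taken from the sample data). January 31, 2016 was a Sunday, so there is no price on that date and `ffill()` takes the last known close. The sketch passes `pd.offsets.MonthEnd()` instead of the `"M"` string, our own assumption to sidestep the alias renaming (`"M"` to `"ME"`) in recent pandas versions.

```python
import pandas as pd

# Hypothetical daily closes; 2016-01-31 (a Sunday) has no price
closes = pd.Series(
    [10.0, 11.0, 12.0, 13.0, 14.0],
    index=pd.to_datetime(
        ["2016-01-28", "2016-01-29", "2016-02-01", "2016-02-26", "2016-02-29"]
    ),
)

# One value per calendar month end, padded with the last known close
month_end = closes.resample(pd.offsets.MonthEnd()).ffill()
print(month_end)
```

The January 31 entry is filled with the January 29 close, while February 29 was a trading day and keeps its own price.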

We can get some basic summary statistics for this series using `describe()`:

In [None]:
month_end_prices.describe()

## Asset Returns

The net return over month $t$ is defined as

$\displaystyle R_t = \frac{P_t - P_{t-1}}{P_{t-1}} = \frac{P_t}{P_{t-1}} - 1 = \%\Delta P_{t}$

Intuitively, it is the _percentage change in price_.

In [None]:
net_return = msft.Close['2016-03-31'] / msft.Close['2016-02-29'] - 1
net_return

This is equivalent to using the `ohlc` function we saw in the Pandas introduction:

In [None]:
pt = msft.Close['2016-02-29':'2016-03-31'].resample("M").ohlc().close
pt

In [None]:
net_return = (pt.iloc[1] / pt.iloc[0]) - 1
net_return

Pandas already provides a method for this, `pct_change`. For instance, this computes the one-month net returns:

In [None]:
one_month_net_returns = month_end_prices.pct_change()
one_month_net_returns[:5]
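As a quick sanity check, a sketch on a made-up three-month series (the prices $[2, 3, 1.5]$ are illustrative only) shows that `pct_change()` matches the definition $P_t / P_{t-1} - 1$; the first value is `NaN` because there is no previous price.

```python
import pandas as pd

# Illustrative month-end prices (not from the sample data)
prices = pd.Series([2.0, 3.0, 1.5])

# Built-in method
pct = prices.pct_change()

# Same thing, written from the definition R_t = P_t / P_{t-1} - 1
manual = prices / prices.shift(1) - 1

print(pct.tolist())  # [nan, 0.5, -0.5]
print(pct.equals(manual))
```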

### Alternative method

Another way to calculate it is the _log return_ (continuously compounded return), $r_t = \ln\left(\frac{P_t}{P_{t-1}}\right) = \ln P_t - \ln P_{t-1}$.

For small percentage changes it is a good approximation of the net return, since $r_t = \ln(1 + R_t) \approx R_t$ when $R_t$ is close to zero. It also turns the division into a subtraction of logarithms, which was convenient when logarithms were looked up in tables. There are additional notes on the math behind this in [this answer on StackExchange](https://stats.stackexchange.com/questions/244199/why-is-it-that-natural-log-changes-are-percentage-changes-what-is-about-logs-th).
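A short sketch of how good the approximation $\ln(1 + R) \approx R$ is, using NumPy's `log1p` (which computes $\ln(1 + x)$) on a few arbitrary net returns chosen for illustration:

```python
import numpy as np

# Arbitrary net returns, from small to large (illustrative values)
net_returns = np.array([0.001, 0.01, 0.1, 0.5])

# Log returns r = ln(1 + R)
log_returns = np.log1p(net_returns)

for R, r in zip(net_returns, log_returns):
    print(f"R = {R:.3f}  r = {r:.5f}  gap = {R - r:.5f}")
```

The gap is negligible for a 1% move but grows to roughly 0.09 for a 50% move.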

For this implementation we'll use the [`shift` method of Pandas' Series](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.shift.html), which shifts the values by the given number of periods: `closes.shift(1)` aligns each price with the previous one in the sequence.

In [None]:
import numpy as np

def monthly_returns(closes):
    # Log return: ln(P_t) - ln(P_{t-1}); the first value is NaN (no previous price)
    return np.log(closes) - np.log(closes.shift(1))

returns = monthly_returns(month_end_prices)
returns[:5]
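To make the role of `shift(1)` concrete, here is a sketch on a made-up series: the shifted series lines up each price with the previous one, and the log-difference matches $\ln(P_t / P_{t-1})$ up to floating-point rounding.

```python
import numpy as np
import pandas as pd

# Illustrative prices (not from the sample data)
closes = pd.Series([2.0, 3.0, 1.5])

# shift(1) moves values one position down: row t now holds P_{t-1}
previous = closes.shift(1)
print(previous.tolist())  # [nan, 2.0, 3.0]

# Log returns computed both ways
log_diff = np.log(closes) - np.log(previous)
log_ratio = np.log(closes / previous)
print(np.allclose(log_diff, log_ratio, equal_nan=True))
```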

### K-Month Generalization

Beware of adding two simple one-period returns. Given prices for 3 months of $[2, 3, 1.5]$, the one-month returns are $R_{t-1} = (3/2)-1 = 0.5$ and $R_t = (1.5/3)-1 = -0.5$. Adding them gives a two-period return of $R_{t-1}+R_t = 0.5 - 0.5 = 0$, when it should be $R_t(2) = (P_t/P_{t-2}) - 1 = (1.5 / 2) - 1 = -0.25$.

We need to account for the combined growth by multiplying the gross returns $1 + R$: $1 + R_t(2) = (1 + R_t)(1 + R_{t-1}) = (1 - 0.5)(1 + 0.5) = 0.75$, so $R_t(2) = -0.25$.

In general, the $k$-period gross return is the product (geometric compounding) of the one-period gross returns:

$\displaystyle 1 + R_t(k) = \prod_{j=0}^{k-1} (1 + R_{t-j})$
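A minimal sketch of the product formula on the toy prices $[2, 3, 1.5]$ used above. It also shows that log returns, unlike simple returns, can be summed: $\sum_j r_{t-j} = \ln(1 + R_t(k))$, so exponentiating the sum recovers the same $k$-period return.

```python
import numpy as np
import pandas as pd

prices = pd.Series([2.0, 3.0, 1.5])

# One-period simple returns: [0.5, -0.5] after dropping the leading NaN
simple = prices.pct_change().dropna()

naive_sum = simple.sum()              # wrong: ignores compounding
compounded = (1 + simple).prod() - 1  # 1 + R_t(k) = prod(1 + R_{t-j})

# Log returns add up across periods
log_returns = np.log(prices).diff().dropna()
from_logs = np.exp(log_returns.sum()) - 1

print(naive_sum, compounded, from_logs)
```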

For instance, the prices for February to April 2016 are:

In [None]:
p = month_end_prices["2016-02":"2016-04"]
p

The two-month return from February to April is $R_t(2) = P_t/P_{t-2} - 1$:

In [None]:
(p.iloc[2] / p.iloc[0]) - 1

Applying the product formula above gives the same result:

In [None]:
((p.iloc[1] / p.iloc[0]) * (p.iloc[2]/p.iloc[1])) - 1
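To compute $k$-month returns for every month at once, one option (a sketch, not the only approach) is `pct_change(periods=k)`, which compares each price with the one $k$ rows earlier; it agrees with a rolling product of one-month gross returns. The prices below are made up for illustration, and `k = 2` is an arbitrary choice.

```python
import numpy as np
import pandas as pd

prices = pd.Series([2.0, 3.0, 1.5, 3.0])  # illustrative month-end prices
k = 2

# Direct k-period return: P_t / P_{t-k} - 1
direct = prices.pct_change(periods=k)

# Same via compounding one-period gross returns over a rolling window of k
gross = 1 + prices.pct_change()
compounded = gross.rolling(k).apply(np.prod, raw=True) - 1

print(direct.tolist())  # [nan, nan, -0.25, 0.0]
print(np.allclose(direct, compounded, equal_nan=True))
```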

## Portfolio Returns

The returns we've calculated for single assets extend naturally to whole _portfolios_.

Let's set the number of shares held in each stock and calculate the initial portfolio value:

In [None]:
msft_shares = 10
aapl_shares = 10
initial_portfolio_value = (msft_shares * msft.Close['2016-03-30']) + (aapl_shares * aapl.Close['2016-03-30'])
initial_portfolio_value

The fraction of the portfolio invested in each stock (the fractions should add up to 1):

In [None]:
x_msft = (msft_shares * msft.Close['2016-03-30']) / initial_portfolio_value
x_aapl = (aapl_shares * aapl.Close['2016-03-30']) / initial_portfolio_value
[x_msft, x_aapl, x_msft+x_aapl]

One-month returns for MSFT and AAPL:

In [None]:
ret_msft = msft.Close['2016-04-29'] / msft.Close['2016-03-30'] - 1
ret_aapl = aapl.Close['2016-04-29'] / aapl.Close['2016-03-30'] - 1
[ret_msft, ret_aapl]

The investment shares `x_msft` and `x_aapl` act as _weights_. Using them, the one-month rate of return on the portfolio is the weighted sum of the asset returns:

In [None]:
rpt = (x_msft*ret_msft) + (x_aapl*ret_aapl)
rpt

The portfolio value at the end of month $t$ is $V_t = V_{t-1}(1 + R_{p,t})$:

In [None]:
vt = initial_portfolio_value * (1 + rpt)
vt
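Applying $V_t = V_{t-1}(1 + R_{p,t})$ month after month amounts to a cumulative product of gross portfolio returns. A sketch with a hypothetical starting value and hypothetical monthly portfolio returns (not computed from the sample data):

```python
import pandas as pd

initial_value = 1000.0                              # hypothetical V_0
portfolio_returns = pd.Series([0.10, -0.05, 0.02])  # hypothetical R_{p,t}

# V_t = V_0 * prod_{s<=t} (1 + R_{p,s})
values = initial_value * (1 + portfolio_returns).cumprod()
print(values.round(2).tolist())
```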

In general, for a portfolio of $n$ assets with investment shares $x_i$ such that $x_1+\dots+x_n=1$, the one-period portfolio simple (net) return is

$\displaystyle R_{p,t} = \sum\limits_{i=1}^n x_i R_{i,t}$

and, since the shares add up to 1, the gross return is $\displaystyle 1 + R_{p,t} = \sum\limits_{i=1}^n x_i (1 + R_{i,t})$.
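For $n$ assets the weighted sum is a dot product. A sketch with NumPy, using hypothetical share counts and prices (not taken from the sample data), which also checks that $\sum_i x_i R_{i,t}$ equals the return computed directly from the portfolio values, $V_t / V_{t-1} - 1$:

```python
import numpy as np

# Hypothetical holdings and prices for three assets
shares = np.array([10, 10, 5])
start_prices = np.array([50.0, 100.0, 20.0])
end_prices = np.array([55.0, 90.0, 22.0])

# Investment shares x_i (they add up to 1)
start_values = shares * start_prices
x = start_values / start_values.sum()

# Asset returns R_i and portfolio return R_p = sum_i x_i R_i
asset_returns = end_prices / start_prices - 1
portfolio_return = x @ asset_returns

# Same return from the portfolio values V_t / V_{t-1} - 1
direct = (shares * end_prices).sum() / start_values.sum() - 1

print(portfolio_return, direct)
```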

---

## Exercise 3.1

Read the file `data/prices.csv` using Spark and compute the daily returns for all the symbols in the file. Can you spot any possible performance problem in the solution presented? How does it compare to Pandas?

_Tip: you can use [window functions](https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.functions.window.html) to access the data._