# Asset / Portfolio Returns


## Data preparation

This notebook uses a few data samples, available in both CSV and MessagePack formats. To load them, use the helpers provided:

In [1]:
from sample_data import stock_prices
aapl = stock_prices("aapl")
msft = stock_prices("msft")

As some month-end data might be missing (mainly because of holidays and weekends), we need to pad the series with the latest known values:

In [2]:
month_end_prices = msft.Close.resample("ME").ffill()
month_end_prices[:5]

Date
2010-01-31    28.180000
2010-02-28    28.670000
2010-03-31    29.290001
2010-04-30    30.540001
2010-05-31    25.799999
Freq: ME, Name: Close, dtype: float64
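To see what the forward fill does, here is a minimal sketch on a made-up series (the dates and values are illustrative, not taken from the sample data). Both month ends fall on a Sunday, so each one receives the last value known before it. The sketch passes `pd.offsets.MonthEnd()` instead of the `"ME"` alias only so that it also runs on pandas versions that predate that alias.

```python
import pandas as pd

# Hypothetical daily closes; 2010-01-31 and 2010-02-28 are Sundays with no data.
toy = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=pd.to_datetime(["2010-01-28", "2010-01-29", "2010-02-25", "2010-02-26"]),
)

# Reindex to calendar month ends, carrying the last known value forward.
toy_month_end = toy.resample(pd.offsets.MonthEnd()).ffill()
print(toy_month_end)
```

The resulting index holds the calendar month ends, while the values come from the last trading day on or before each of them.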

We can extract some basic summary statistics for this series by using `describe()`:

In [3]:
month_end_prices.describe()

count    151.000000
mean      95.898609
std       86.182672
min       23.010000
25%       31.930000
50%       55.090000
75%      132.280006
max      336.320007
Name: Close, dtype: float64

## Asset Returns

The net return over month $t$ is defined by

$\displaystyle R_t = \frac{P_t - P_{t-1}}{P_{t-1}} = \frac{P_t}{P_{t-1}} - 1 = \%\Delta P_{t}$

It is intuitively known as the _percentage change in price_.

In [4]:
net_return = msft.Close['2016-03-31'] / msft.Close['2016-02-29'] - 1
net_return

np.float64(0.08549525123432211)

This is equivalent to using the `ohlc` function we saw in the Pandas introduction:

In [5]:
pt = msft.Close['2016-02-29':'2016-03-31'].resample("ME").ohlc().close
pt

Date
2016-02-29    50.880001
2016-03-31    55.230000
Freq: ME, Name: close, dtype: float64

In [6]:
net_return = (pt.iloc[1] / pt.iloc[0]) - 1
net_return

np.float64(0.08549525123432211)

Pandas already has a function to calculate it, named `pct_change`. For instance, this calculates the one-month net returns:

In [7]:
one_month_net_returns = month_end_prices.pct_change()
one_month_net_returns[:5]

Date
2010-01-31         NaN
2010-02-28    0.017388
2010-03-31    0.021625
2010-04-30    0.042677
2010-05-31   -0.155206
Freq: ME, Name: Close, dtype: float64

### Alternative method

Another way to calculate it is using $r_i = \log(\frac{p_i}{p_{i-1}}) = \log{p_i} - \log{p_{i-1}}$

The result is a good approximation of the simple return when percentage changes are not too large. It has the benefit that the division is replaced by a subtraction of logarithms, and that log returns over consecutive periods can simply be added. There are some additional notes on the math behind it in [this answer from StackExchange](https://stats.stackexchange.com/questions/244199/why-is-it-that-natural-log-changes-are-percentage-changes-what-is-about-logs-th).

For this implementation we'll use the [`shift` method of Pandas' Series](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.shift.html), which shifts the values by one position so that each row lines up with the previous element in the sequence.

In [8]:
import numpy as np

def monthly_returns(closes):
    return np.log(closes) - np.log(closes.shift(1))

returns = monthly_returns(month_end_prices)
returns[:5]

Date
2010-01-31         NaN
2010-02-28    0.017239
2010-03-31    0.021395
2010-04-30    0.041791
2010-05-31   -0.168663
Freq: ME, Name: Close, dtype: float64
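As a rough check of how close the two definitions are, the sketch below recomputes both returns in plain Python from the first five month-end closes printed earlier (copied by hand, so they carry the rounding of the printed output). It also illustrates that log returns add up across periods.

```python
import math

# First month-end closes shown earlier in this notebook (rounded as printed).
prices = [28.18, 28.67, 29.290001, 30.540001, 25.799999]

simple = [p1 / p0 - 1 for p0, p1 in zip(prices, prices[1:])]
log_ret = [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

for R, r in zip(simple, log_ret):
    # The gap between the two grows as the price move gets larger.
    print(f"simple={R:+.6f}  log={r:+.6f}  gap={R - r:+.6f}")

# Log returns are additive: their sum is the log of the total price ratio.
total_log = sum(log_ret)
print(math.isclose(total_log, math.log(prices[-1] / prices[0])))
```

The two are linked by $r_t = \log(1 + R_t)$, which is why they nearly coincide for small moves and drift apart for the large drop in May 2010.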

### K-Month Generalization

Beware of adding two simple one-period returns. With prices for 3 months of $[2, 3, 1.5]$, the one-period returns are $R_{t-1} = (3/2)-1 = 0.5$ and $R_t = (1.5/3)-1 = -0.5$. Adding them gives a two-period return of zero, $R_{t-1}+R_t = 0.5 - 0.5 = 0$, when it should be $R_t(2) = (P_t/P_{t-2}) - 1 = (1.5 / 2) - 1 = -0.25$.

We need to account for the combined growth by multiplying the gross returns: $1 + R_t(2) = (1 + R_t) \cdot (1 + R_{t-1}) = 0.5 \cdot 1.5 = 0.75$, so $R_t(2) = -0.25$.

More generally, the $k$-period gross return is the product of the one-period gross returns:

$\displaystyle 1 + R_t(k) = \prod_{j=0}^{k-1} (1 + R_{t-j})$
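A minimal sketch of this compounding rule in plain Python, using the toy prices $[2, 3, 1.5]$ from the text. The helper name `compound` is hypothetical, chosen only for this illustration; each simple return is turned into a gross return $1 + R$ before multiplying.

```python
import math

def compound(returns):
    """Compound simple one-period returns into a single multi-period return."""
    return math.prod(1 + r for r in returns) - 1

prices = [2, 3, 1.5]
one_period = [p1 / p0 - 1 for p0, p1 in zip(prices, prices[1:])]  # [0.5, -0.5]

two_period = compound(one_period)
print(two_period)                   # -0.25
print(prices[-1] / prices[0] - 1)   # -0.25, the same value computed from prices
```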

For instance, the prices for February to April 2016 are:

In [9]:
p = month_end_prices["2016-02":"2016-04"]
p

Date
2016-02-29    50.880001
2016-03-31    55.230000
2016-04-30    49.869999
Freq: ME, Name: Close, dtype: float64

The percentage change between February and April follows from $1 + R_t(2) = P_t/P_{t-2}$:

In [10]:
(p.iloc[2] / p.iloc[0]) - 1

np.float64(-0.019850670499757528)

Therefore, we can apply the formula above as:

In [11]:
((p.iloc[1] / p.iloc[0]) * (p.iloc[2]/p.iloc[1])) - 1

np.float64(-0.01985067049975764)
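The same two-month figure can also be obtained with `pct_change(periods=2)`, or by compounding the one-month returns with `cumprod`. The sketch below rebuilds the three prices by hand from the output above so that it runs on its own; in the notebook you would apply the same calls to `p` directly.

```python
import pandas as pd

# Month-end closes for February to April 2016, as printed above.
p = pd.Series(
    [50.880001, 55.230000, 49.869999],
    index=pd.to_datetime(["2016-02-29", "2016-03-31", "2016-04-30"]),
)

# Two-month return taken directly from prices two periods apart.
two_month = p.pct_change(periods=2)

# Cumulative return since February, built by compounding one-month gross returns.
cumulative = (1 + p.pct_change()).cumprod() - 1

print(two_month.iloc[-1], cumulative.iloc[-1])
```

Both values agree up to floating-point rounding, which is the same small discrepancy visible between the outputs of the two cells above.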

## Portfolio Returns

The same values we've calculated before for assets can easily be translated to whole _portfolios_.

Let's set the number of shares held and calculate the initial portfolio value:

In [12]:
msft_shares = 10
aapl_shares = 10
initial_portfolio_value = (msft_shares * msft.Close['2016-03-30']) + (aapl_shares * aapl.Close['2016-03-30'])
initial_portfolio_value

np.float64(824.3999862670898)

The investment share of each stock in the portfolio; the shares should add up to 1:

In [13]:
x_msft = (msft_shares * msft.Close['2016-03-30']) / initial_portfolio_value
x_aapl = (aapl_shares * aapl.Close['2016-03-30']) / initial_portfolio_value
[x_msft, x_aapl, x_msft+x_aapl]

[np.float64(0.6677583715925172),
 np.float64(0.33224162840748284),
 np.float64(1.0)]

One-month returns for AAPL and MSFT

In [14]:
ret_msft = msft.Close['2016-04-29'] / msft.Close['2016-03-30'] - 1
ret_aapl = aapl.Close['2016-04-29'] / aapl.Close['2016-03-30'] - 1
[ret_msft, ret_aapl]

[np.float64(-0.0940962829603188), np.float64(-0.14439576530990283)]

The investment shares `x_msft` and `x_aapl` computed earlier act as _weights_. Using them, the one-month rate of return on the portfolio is the weighted sum of the individual returns:

In [15]:
rpt = (x_msft*ret_msft) + (x_aapl*ret_aapl)
rpt

np.float64(-0.11080786488419804)

The portfolio value at the end of month $t$ is $V_t = V_{t-1}(1 + R_{p,t})$

In [16]:
vt = initial_portfolio_value * (1 + rpt)
vt

np.float64(733.0499839782715)

In general, for a portfolio of $n$ assets with investment shares $x_i$ such that $x_1+\dots+x_n=1$, the one-period portfolio gross and simple returns are defined as

$\displaystyle 1 + R_{p,t} = \sum\limits_{i=1}^n x_i (1 + R_{i,t})$

$\displaystyle R_{p,t} = \sum\limits_{i=1}^n x_i R_{i,t}$
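For $n$ assets, the weighted sum is a dot product. The sketch below is one possible way to write it with NumPy; the function name `portfolio_return` is hypothetical, and the weights and returns are copied by hand from the outputs of the cells above.

```python
import numpy as np

def portfolio_return(weights, returns):
    """One-period simple portfolio return: the sum of x_i * R_i."""
    weights = np.asarray(weights, dtype=float)
    returns = np.asarray(returns, dtype=float)
    if weights.shape != returns.shape:
        raise ValueError("weights and returns must have the same shape")
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must add up to 1")
    return float(weights @ returns)

# Weights and one-month returns for MSFT and AAPL, copied from the cells above.
x = [0.6677583715925172, 0.33224162840748284]
R = [-0.0940962829603188, -0.14439576530990283]

rp = portfolio_return(x, R)

# Gross portfolio return; it equals 1 + rp because the weights add up to 1.
gross = float(np.dot(x, 1 + np.asarray(R)))
print(rp, gross)
```

Checking that the weights add up to 1 is a design choice of this sketch: without it, the gross and simple forms of the formula would no longer differ by exactly 1.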

---

## Exercise 3.1

Read the file `data/prices.csv` using Spark and compute the daily returns for all the symbols in the file. Can you spot any possible performance problem in the solution presented? How does it compare to Pandas?

_Tip: you can use [window functions](https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.functions.window.html) to access the data._