# <font color='tomato' style="font-size:40px"><center><b>02. Fundamentals of Risk and Returns</b></center></font>

In this module we discuss returns, risk, and risk-adjusted performance measures. These concepts are key to portfolio analysis and management. In particular, we discuss simple and cumulative returns, expected returns, and standard deviation, as well as a measure of downside risk: drawdown. We also discuss risk-adjusted performance measures, including the return-risk ratio and the Sharpe ratio.

On the programming side of things, we review how to import and manipulate financial data using Pandas, how to plot the output of our analysis using Plotly, and how to implement in Python all of the concepts that we talk about in class.

It may be useful to consult the Bootcamp lectures on Pandas and Plotly if you have difficulty understanding the code that we use. Hopefully, though, it should be sufficiently clear as is.

In every class, we import packages necessary for further work:
* <font color='mediumseagreen'><b>Pandas</b></font>
* <font color='mediumseagreen'><b>DateTime</b></font>
* <font color='mediumseagreen'><b>NumPy</b></font>
* <font color='mediumseagreen'><b>Plotly - Graph objects</b></font>
* function <font color='DodgerBlue'><b>make_subplots</b></font> from <font color='mediumseagreen'><b>Plotly</b></font>

In [2]:
import datetime as dt
import pandas as pd
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots

## <font color='orange' style="font-size:25px"><b>Returns</b></font>

Suppose we want to compare the performance of two stocks. The price of the first one was initially 10 dollars per share and grew to 12, while the price of the second one was 1000 dollars and grew to 1002 dollars per share. Both increased by 2 dollars:

In [1]:
12-10, 1002-1000

(2, 2)

On the other hand, it is clear that the first stock grew a lot, relative to the initial price, while the second stock barely moved. So, dollar returns, while important, do not tell investors the full story about their performance.

Most investors are concerned with (relative) returns rather than dollar returns (price changes of assets, or absolute returns). There are good reasons for that:

* Return on an asset is a scale-free measure of the investment performance. Scale-free means that, using returns, we can compare two investments that have different dollar costs.

* Also, (relative) returns have nicer statistical properties than dollar returns

### <font color='MediumVioletRed' style="font-size:20px"><b>Simple returns</b></font>

The **simple (or arithmetic) return** over a unit time period is defined as follows (here $P_t$ is the asset price at time $t$):

$$r_{t+1}=\frac{P_{t+1}-P_{t}}{P_{t}}=\frac{\Delta P_{t}}{P_{t}}=\frac{P_{t+1}}{P_{t}}-1$$

The unit time period can be 1 day (in which case this is the simple daily return), 1 week (the simple weekly return), etc.

For the example above, simple returns would be

In [3]:
(12-10)/10,(1002-1000)/1000

(0.2, 0.002)

Price at time $t+1$ is related to the simple return as follows:

$$P_{t+1}=P_{t}(1+r_{t+1})$$

So, if we know the price at time $t$, in order to forecast the price at time $t+1$ we need to forecast the return between $t$ and $t+1$. In other words, we need to answer the following question: what one-day (one-week, one-month, etc.) return do we expect to be realized in the next period?

It is convenient to use both net returns $r_t$ (numbers like 0.03, -0.01, etc.) and gross returns $(1+r_t)$ (numbers like 1.03, 0.99, etc.). Thus, the price tomorrow is the price today times the gross return between today and tomorrow.

We have seen this in the first lectures when we talked about $u$ and $d$. Those were gross returns on risky assets, while $u-1$ and $d-1$ are net returns.
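As a minimal sketch (plain NumPy and a made-up price path, not the course data), we can check the formulas above against each other: compute net and gross returns from prices, then rebuild the prices using $P_{t+1}=P_t(1+r_{t+1})$. The variable names are ours, chosen for illustration.

```python
import numpy as np

# Hypothetical price path, chosen only for illustration
toy_prices = np.array([10.0, 12.0, 11.4, 11.97])

# Net simple returns: r_{t+1} = P_{t+1} / P_t - 1
net_returns = toy_prices[1:] / toy_prices[:-1] - 1

# Gross returns: 1 + r_{t+1} = P_{t+1} / P_t
gross_returns = 1 + net_returns

# Rebuild prices: each price is the previous one times the gross return
rebuilt = np.concatenate(([toy_prices[0]], toy_prices[0] * np.cumprod(gross_returns)))

print(net_returns)                       # approximately [0.2, -0.05, 0.05]
print(np.allclose(rebuilt, toy_prices))  # True
```

Starting from the first price, knowing the sequence of returns is enough to recover the whole price path.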

If you buy a stock for 10 dollars and sell it for 12, the net return (from now on we drop the word "net" and simply call it the return) is:

In [4]:
(12-10)/10

0.2

On the other hand, gross return would be $\frac{P_{t+1}}{P_{t}}$. In our example this is:

In [5]:
12/10

1.2

If the stock paid dividends $D_{t,t+1}$ between $t$ and $t+1$, the relevant return capturing the monetary gain or loss of the shareholder is

$$TR_{t,t+1} = \frac{P_{t+1} - P_{t} + D_{t,t+1}}{P_{t}}$$

This is the **total return** (returns computed without dividends are sometimes called **price returns**).
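A minimal sketch with hypothetical numbers (buy at 10 dollars, sell at 12, and receive a 0.5-dollar dividend in between; none of these come from the course data) shows how the dividend term changes the result. The helper names are ours, for illustration only.

```python
def price_return(p_start, p_end):
    """Price return over one period: (P_{t+1} - P_t) / P_t, dividends ignored."""
    return (p_end - p_start) / p_start

def total_return(p_start, p_end, dividend=0.0):
    """Total return over one period: (P_{t+1} - P_t + D_{t,t+1}) / P_t."""
    return (p_end - p_start + dividend) / p_start

print(price_return(10, 12))       # 0.2
print(total_return(10, 12, 0.5))  # 0.25
```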

Having said that, using adjusted stock prices, available in most databases such as Yahoo! Finance, takes care of the dividends. In that case, calculating price returns from adjusted prices is effectively the same as calculating total returns.

Unless otherwise stated, and for simplicity of exposition, we use price returns but apply them to adjusted stock prices.

### <font color='MediumVioletRed' style="font-size:20px"><b>Multiperiod returns</b></font>

Suppose you have purchased a stock at time $t$ and sold it at time $t+2$. What is the cumulative return on your investment? Clearly, it is

$$r_{t,t+2} = \frac{P_{t+2}}{P_{t}} -1$$

If our unit of time is 1 day, one way to think of this expression is in terms of returns that are cumulated over the two days. Thus, we call the return over multiple periods the **cumulative return**.

We can calculate cumulative returns using either time series of prices or using returns data. Note that:

$$r_{t,t+2}=\frac{P_{t+2}}{P_{t}} -1 =\frac{P_{t+2}}{P_{t+1}}\frac{P_{t+1}}{P_{t}}-1 $$

Each of the two ratios on the right-hand side is a gross return:

$$\frac{P_{t+1}}{P_{t}} = 1+r_{t,t+1}$$

$$\frac{P_{t+2}}{P_{t+1}} = 1+r_{t+1,t+2}$$

Therefore, the cumulative return for 2 periods is

$$r_{t,t+2} = (1+r_{t,t+1})(1+r_{t+1,t+2}) -1$$


Suppose you buy a stock that gains 10 percent on the first day and loses 3 percent on the second day. The cumulative return for the two-day period is:

In [6]:
(1+0.1)*(1-0.03)-1

0.06699999999999995

Thus, the gain is 6.7 percent for the two-day period.

Q: You buy a stock at the closing price on Monday. On Tuesday, the stock closes at 10% above Monday's closing price. On Wednesday, it falls and closes at 10% below Tuesday’s closing price and you sell it at that closing price. Did you have a negative, positive or zero return?

In [9]:
((1+0.1)*(1-0.1)-1)

-0.009999999999999898
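The two-period formula extends to any number of periods: the cumulative return is the product of the gross returns minus one. A small helper (the name `cumulative_return` is hypothetical, introduced only for illustration) reproduces both examples above; in Pandas the same idea is `(1 + returns).prod() - 1`.

```python
import numpy as np

def cumulative_return(period_returns):
    """Cumulative (multi-period) return: prod(1 + r_i) - 1."""
    gross = 1 + np.asarray(period_returns, dtype=float)
    return gross.prod() - 1

print(cumulative_return([0.10, -0.03]))  # about 0.067: +10% then -3%
print(cumulative_return([0.10, -0.10]))  # about -0.01: +10% then -10% is a loss
```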

### <font color='MediumVioletRed' style="font-size:20px"><b>Annualization</b></font>

Suppose a monthly return is 1 percent. What would the annual return be provided that this return persists for 12 months?

One way to answer this is to say $0.01 \times 12 = 0.12$. While many people calculate it this way, it is not entirely accurate because it ignores compounding, i.e. the accumulation of interest on interest (or return on return).

Cumulative returns for 12 months would satisfy:

$$1+r_{t,t+12} = (1+r_{t,t+1})(1+r_{t+1,t+2})\cdots (1+r_{t+11,t+12})$$

Assuming that return in period 1 persists in all other periods (i.e. that returns will be the same in all periods) we obtain the following expression:

$$1+r_{t,t+12} = (1+r_{t,t+1})^{12} \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \Longleftrightarrow  \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, r_{t,t+12} = (1+r_{t,t+1})^{12}-1$$

This implies that the annualized return would be equal to:

In [10]:
(1+0.01)**12-1

0.12682503013196977

So, it is 12.68 percent.

If the monthly return is very small, the difference is minuscule. Namely, using a first-order Taylor expansion,

$$(1+r_{t,t+1})^{12}-1 \approx (1 + 12 \, r_{t,t+1}) -1 = 12 \, r_{t,t+1}$$

Q: A stock gains 1% over a quarter (i.e. a 3-month period). What is its annualized return?

In [11]:
(1+0.01)**4-1

0.040604010000000024

To summarize, the first (simple) way to annualize is to multiply the single-period return by the number of periods in a year.

The second, more accurate rule for annualization takes compounding into account, as explained above.
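The two annualization rules can be written as small helper functions. This is a sketch: the function names are hypothetical, and the inverse (de-annualizing) function is an extra not used elsewhere in these notes.

```python
def annualize_simple(r_period, periods_per_year):
    """Simple rule: multiply the per-period return by the number of periods."""
    return r_period * periods_per_year

def annualize_compound(r_period, periods_per_year):
    """Compounding rule: (1 + r)^n - 1, so return on return is included."""
    return (1 + r_period) ** periods_per_year - 1

def deannualize_compound(r_annual, periods_per_year):
    """Inverse of the compounding rule: per-period return implied by an annual one."""
    return (1 + r_annual) ** (1 / periods_per_year) - 1

# 1% per month (12 periods) and 1% per quarter (4 periods)
for r, n in [(0.01, 12), (0.01, 4)]:
    print(n, round(annualize_simple(r, n), 6), round(annualize_compound(r, n), 6))
# 12 0.12 0.126825
# 4 0.04 0.040604
```

For small per-period returns the two rules nearly coincide, in line with the Taylor approximation above.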

### <font color='MediumVioletRed' style="font-size:20px"><b>Importing financial data into Pandas</b></font>

Let us import data from the file 'fivepricesNew.csv'. If you remember the Bootcamp, you know that this can easily be done with <font color='mediumseagreen'><b>Pandas</b></font>' function <font color='DodgerBlue'><b>pd.read_csv</b></font>. Google Colab users should upload the data first:

In [12]:
# Google Colab only: upload the file from your computer (skip this cell when running locally)
from google.colab import files
uploaded1=files.upload()

Saving fivepricesNew.csv to fivepricesNew.csv


In [13]:
prices = pd.read_csv('fivepricesNew.csv',index_col=0,parse_dates=True)
prices

| Date | AMZN | YHOO | IBM | AAPL | ^GSPC |
|---|---|---|---|---|---|
| 2013-01-02 | 257.309998 | 20.080000 | 196.350006 | 78.432899 | 1462.42 |
| 2013-01-03 | 258.480011 | 19.780001 | 195.270004 | 77.442299 | 1459.37 |
| 2013-01-04 | 259.149994 | 19.860001 | 193.990005 | 75.285698 | 1466.47 |
| 2013-01-07 | 268.459015 | 19.400000 | 193.139999 | 74.842903 | 1461.89 |
| 2013-01-08 | 266.380005 | 19.660000 | 192.869995 | 75.044296 | 1457.15 |
| ... | ... | ... | ... | ... | ... |
| 2014-12-24 | 303.029999 | 50.650002 | 161.820007 | 112.010002 | 2081.88 |
| 2014-12-26 | 309.089996 | 50.860001 | 162.339996 | 113.989998 | 2088.77 |
| 2014-12-29 | 312.040008 | 50.529999 | 160.509995 | 113.910004 | 2090.57 |
| 2014-12-30 | 310.299988 | 51.220001 | 160.050003 | 112.519997 | 2080.35 |


This file contains adjusted daily prices of 4 large US stocks and of the SP500 index (ticker ^GSPC). This index is one of the most frequently used proxies for the US stock market; it tracks about 500 large-cap US companies, weighted by market capitalization.

This is how we can calculate simple (daily) returns in <font color='Green'><b>Pandas</b></font>:

In [14]:
returns = prices.pct_change()
returns

| Date | AMZN | YHOO | IBM | AAPL | ^GSPC |
|---|---|---|---|---|---|
| 2013-01-02 | NaN | NaN | NaN | NaN | NaN |
| 2013-01-03 | 0.004547 | -0.014940 | -0.005500 | -0.012630 | -0.002086 |
| 2013-01-04 | 0.002592 | 0.004044 | -0.006555 | -0.027848 | 0.004865 |
| 2013-01-07 | 0.035921 | -0.023162 | -0.004382 | -0.005882 | -0.003123 |
| 2013-01-08 | -0.007744 | 0.013402 | -0.001398 | 0.002691 | -0.003242 |
| ... | ... | ... | ... | ... | ... |
| 2014-12-24 | -0.010627 | 0.012595 | -0.002589 | -0.004709 | -0.000139 |
| 2014-12-26 | 0.019998 | 0.004146 | 0.003213 | 0.017677 | 0.003310 |
| 2014-12-29 | 0.009544 | -0.006488 | -0.011273 | -0.000702 | 0.000862 |
| 2014-12-30 | -0.005576 | 0.013655 | -0.002866 | -0.012203 | -0.004889 |


Since we have no price before 2013-01-02, the first return cannot be computed, so <font color='DodgerBlue'><b>pct_change</b></font> returns the missing-data symbol NaN in the first row. We can drop this row:

In [16]:
returns = returns.dropna()
returns

| Date | AMZN | YHOO | IBM | AAPL | ^GSPC |
|---|---|---|---|---|---|
| 2013-01-03 | 0.004547 | -0.014940 | -0.005500 | -0.012630 | -0.002086 |
| 2013-01-04 | 0.002592 | 0.004044 | -0.006555 | -0.027848 | 0.004865 |
| 2013-01-07 | 0.035921 | -0.023162 | -0.004382 | -0.005882 | -0.003123 |
| 2013-01-08 | -0.007744 | 0.013402 | -0.001398 | 0.002691 | -0.003242 |
| 2013-01-09 | -0.000113 | -0.016785 | -0.002852 | -0.015629 | 0.002656 |
| ... | ... | ... | ... | ... | ... |
| 2014-12-24 | -0.010627 | 0.012595 | -0.002589 | -0.004709 | -0.000139 |
| 2014-12-26 | 0.019998 | 0.004146 | 0.003213 | 0.017677 | 0.003310 |
| 2014-12-29 | 0.009544 | -0.006488 | -0.011273 | -0.000702 | 0.000862 |
| 2014-12-30 | -0.005576 | 0.013655 | -0.002866 | -0.012203 | -0.004889 |


We can now compute the cumulative returns on all five assets (the four stocks and the index) over the entire two-year period (from the beginning of January 2013 to the end of December 2014). The method <font color='DodgerBlue'><b>prod</b></font> multiplies the gross returns over the entire period, so `(returns+1).prod()` gives the cumulative gross return.

As the output below shows, the Apple stock performed similarly to the market, while Amazon had about half of that cumulative return. On the other hand, IBM was the clear loser, and Yahoo! the clear winner. **If you had invested 1000 dollars in Yahoo stock at the beginning of 2013, you would have ended up with over 2500 dollars at the end of 2014 (before commissions).**

In [17]:
(returns+1).prod()

AMZN     1.206133
YHOO     2.515438
IBM      0.817112
AAPL     1.407318
^GSPC    1.407872
dtype: float64

Suppose we want to see how our investment of, say, 1 dollar accumulated day after day, rather than in aggregate for the 2 years. In that case we can calculate gross cumulative returns for each day.

The easiest way to do this is to divide stock prices at each point in time by the initial price.

In [19]:
prices_init = prices.iloc[0] #initial stock prices, i.e. row number 0
prices_init

AMZN      257.309998
YHOO       20.080000
IBM       196.350006
AAPL       78.432899
^GSPC    1462.420000
Name: 2013-01-02 00:00:00, dtype: float64

The cumulative gross return (how much a 1-dollar investment would have grown to over time) is:

In [20]:
cum_gross_return = prices/prices_init
cum_gross_return

| Date | AMZN | YHOO | IBM | AAPL | ^GSPC |
|---|---|---|---|---|---|
| 2013-01-02 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| 2013-01-03 | 1.004547 | 0.985060 | 0.994500 | 0.987370 | 0.997914 |
| 2013-01-04 | 1.007151 | 0.989044 | 0.987981 | 0.959874 | 1.002769 |
| 2013-01-07 | 1.043329 | 0.966135 | 0.983652 | 0.954228 | 0.999638 |
| 2013-01-08 | 1.035249 | 0.979084 | 0.982276 | 0.956796 | 0.996396 |
| ... | ... | ... | ... | ... | ... |
| 2014-12-24 | 1.177685 | 2.522410 | 0.824141 | 1.428100 | 1.423586 |
| 2014-12-26 | 1.201236 | 2.532869 | 0.826789 | 1.453344 | 1.428297 |
| 2014-12-29 | 1.212701 | 2.516434 | 0.817469 | 1.452324 | 1.429528 |
| 2014-12-30 | 1.205938 | 2.550797 | 0.815126 | 1.434602 | 1.422539 |
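A quick sanity check on synthetic data (a made-up two-asset price table, not `fivepricesNew.csv`) suggests why the two routes agree: compounding daily returns with Pandas' `cumprod` reproduces the price ratio $P_t/P_0$ used above, and its last row equals the aggregate `prod` result from earlier.

```python
import numpy as np
import pandas as pd

# Hypothetical prices for two assets over four business days
prices_demo = pd.DataFrame(
    {"A": [100.0, 102.0, 99.96, 104.958], "B": [20.0, 19.0, 19.95, 21.945]},
    index=pd.date_range("2013-01-02", periods=4, freq="B"),
)
rets_demo = prices_demo.pct_change().dropna()

# Route 1: divide every price by the initial price
ratio = prices_demo / prices_demo.iloc[0]

# Route 2: compound the daily returns day by day
compounded = (1 + rets_demo).cumprod()

# Both routes agree from the second day on (the first day is 1 by construction)
print(np.allclose(ratio.iloc[1:].values, compounded.values))  # True
# The last row equals the aggregate product of gross returns
print(np.allclose(compounded.iloc[-1].values, (1 + rets_demo).prod().values))  # True
```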


It is often easier to see what is going on when we represent the data graphically. For this purpose Aleksandar and I use Plotly. Plotly allows for powerful interactive plots, but for now simple plots suffice.

Those unfamiliar with Plotly can learn some details of the implementation from the example we provide, and can also consult the Bootcamp materials or the Plotly documentation.

In [21]:
fig=go.Figure()

# x-axis: dates (the DataFrame index); y-axis: value of 1 dollar invested at the start
fig.add_trace(go.Scatter(x=cum_gross_return.index,y=cum_gross_return['AMZN'],mode='lines',line=dict(color='red',width=2),name='AMZN'))
fig.add_trace(go.Scatter(x=cum_gross_return.index,y=cum_gross_return['YHOO'],mode='lines',line=dict(color='blue',width=2),name='YHOO'))

fig.update_layout(width=800,height=600)
fig.update_layout(title=dict(text='<b>Gross cumulative returns</b>',font=dict(color='FireBrick',size=40),x=0.5,y=0.9))

fig.show()

## <font color='orange' style="font-size:25px"><b>Measures of risk and reward</b></font>  

**Expected returns** on investment measure our prediction of gain (or loss) when undertaking an investment. **Volatility** (whose square is variance) measures risk of our investment. Below we briefly discuss these concepts.

### <font color='MediumVioletRed' style="font-size:20px"><b>Expected returns</b></font>

If there are $S$ possible states of the world, the expected return would be:

$$ \mathbb{E}(r) = \sum_{s=1}^{S}p_s r_s  $$

Here $p_s$ is the probability that state $s$ occurs, while $r_s$ is the return in that state.
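For instance, with three hypothetical states of the world (the probabilities and returns are made up purely for illustration), the expected return is just a probability-weighted sum:

```python
import numpy as np

p = np.array([0.25, 0.50, 0.25])   # probability of each state; must sum to 1
r = np.array([-0.10, 0.05, 0.20])  # return in each state

assert np.isclose(p.sum(), 1.0)
expected_return = np.sum(p * r)    # E(r) = sum_s p_s * r_s
print(round(expected_return, 4))   # 0.05
```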

Of course, strictly speaking, we know neither the possible values of returns nor the probabilities with which they occur. What we do know are past returns.


In practice, we *estimate* expected returns. We mention here two approaches. Both are based on past returns. While past performance is not necessarily indicative of future performance, we still proceed as these approaches are used a lot in practice.

The first approach of estimating expected return on an asset is **sample mean of returns** (daily, weekly, or monthly):

$$\overline{r}=\frac{1}{N}\sum_{t=1}^{N}r_t$$

Here, $N$ is the number of past returns, and $r_t$ is the return realized at time $t$.

Investors usually think in terms of annual quantities (i.e. 12-month returns). On the other hand, the returns we observe are usually daily, weekly or monthly.

How do we annualize?


$$\overline{r}_a=252\,\overline{r}_d \;\;\text{(daily data)}, \qquad \overline{r}_a=52\,\overline{r}_w \;\;\text{(weekly data)}, \qquad \overline{r}_a=12\,\overline{r}_m \;\;\text{(monthly data)}$$

Note that in this definition, the daily (weekly, or monthly) expected return is estimated as the arithmetic mean of simple daily (weekly, monthly) returns. It is then annualized without taking compounding into account, i.e. by simply multiplying by the number of trading periods in one year.

Let us calculate the expected returns using this approach:

In [22]:
returns.mean() #daily expected returns

AMZN     0.000550
YHOO     0.002014
IBM     -0.000338
AAPL     0.000807
^GSPC    0.000705
dtype: float64

In [23]:
ExpRet1 = returns.mean()*252  # annualized mean of daily returns
ExpRet1

AMZN     0.138659
YHOO     0.507439
IBM     -0.085088
AAPL     0.203447
^GSPC    0.177588
dtype: float64

Note that, using this method, the expected annual return on Yahoo is close to 51 percent, while IBM is expected to lose money.

Another way of estimating expected returns is to use **annualized compounded sample returns**. Namely, we calculate the geometric average of gross returns and then annualize it, taking compounding (return on return) into account.

This is how that is done:

$$\overline{r}_a=\left(\left(\prod_{t=1}^{N}(1+r_t)\right)^{\frac{1}{N}}\right)^n-1 = \left(\prod_{t=1}^{N}(1+r_t)\right)^{\frac{n}{N}}-1$$

where the compounding factor $n$ (lowercase) is the number of periods in a year, i.e. 252 for daily, 52 for weekly and 12 for monthly compounding, and $N$ is the size of the sample used for the calculation.
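To see why the arithmetic and geometric averages differ even before annualizing, here is a toy sketch using the +10%/-10% quiz example above as a two-observation sample (illustrative numbers only):

```python
import numpy as np

# Toy sample: +10% followed by -10%
r = np.array([0.10, -0.10])
N = len(r)

arith_mean = r.mean()                     # arithmetic mean per period
geo_mean = np.prod(1 + r) ** (1 / N) - 1  # geometric mean per period

print(round(arith_mean, 6))  # 0.0
print(round(geo_mean, 6))    # -0.005013
```

For returns above -100%, the geometric mean never exceeds the arithmetic mean, and the gap grows with volatility. The final annualized estimates can still be ordered either way, because the two estimators also annualize differently (multiplying by $n$ versus compounding), as the Yahoo and Amazon numbers in this section illustrate.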

Let's calculate expected returns using this approach:

In [24]:
ExpRet2=(1+returns).prod()**(252/len(returns))-1
ExpRet2

AMZN     0.098445
YHOO     0.587468
IBM     -0.096239
AAPL     0.186707
^GSPC    0.186941
dtype: float64

The numbers obtained clearly indicate that the two methods do not give identical estimates for the returns. In fact, when we use other methods such as CAPM, multi-factor models, or Black-Litterman, discrepancy between different estimates may be even bigger. One should bear this in mind when attempting to use portfolio optimization in practice - the results are very sensitive to the way we prepare inputs.

Before we continue any further, I would like to introduce the concept of **excess returns with respect to a benchmark**.

Here we take a market proxy, the SP500 index, as the benchmark. Stocks that are expected to do better than the market are said to be expected to **outperform the market**. Here we use the simple-mean approach.


In [26]:
ExpRet1.iloc[-1]  # expected return on SP500 (the last column), i.e. on the market

0.17758845046197477

In [27]:
ExpRet1.iloc[:-1]-ExpRet1.iloc[-1]  # annualized expected return of each stock in excess of the market

AMZN   -0.038929
YHOO    0.329851
IBM    -0.262676
AAPL    0.025858
dtype: float64

Note that, using the first method for calculating expected returns, Amazon is expected to underperform the market by close to 4 percentage points annually, while IBM is expected to underperform it by about 26 percentage points. On the other hand, Yahoo is expected to outperform the market by about 33 percentage points. Apple is also expected to outperform, but only slightly.

However, if we use the second method (annualized compounded returns), only Yahoo is expected to outperform the market; Apple essentially matches it:

In [28]:
ExpRet2.iloc[:-1]-ExpRet2.iloc[-1]

AMZN   -0.088496
YHOO    0.400527
IBM    -0.283180
AAPL   -0.000234
dtype: float64

### <font color='MediumVioletRed' style="font-size:20px"><b>Measuring risk - Variance and Volatility</b></font>

Variance measures the expected squared deviation from the mean:

$$ \mathbb{V}(r) = \mathbb{E}((r- \mathbb{E}(r))^2)  $$

We can estimate variance, like the mean, using many different methods. Here we focus on sample variance. It can be estimated as follows:

$$\sigma_{d}^2 = \frac{1}{N-1} \sum_{t=1}^{N} (r_t - \overline{r})^2$$
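As a small check on hypothetical daily returns (made-up numbers), computing the sample variance directly from the formula, with the $N-1$ denominator, gives the same result as Pandas' `var` and `std`, which use `ddof=1` by default:

```python
import numpy as np
import pandas as pd

r = pd.Series([0.01, -0.02, 0.015, 0.005, -0.01])  # hypothetical daily returns
N = len(r)

r_bar = r.mean()
var_manual = ((r - r_bar) ** 2).sum() / (N - 1)  # sample variance, N - 1 denominator

print(np.isclose(var_manual, r.var()))           # True
print(np.isclose(np.sqrt(var_manual), r.std()))  # True
```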

While variance can be used as a measure of risk, it is expressed in units of squared returns. Taking its square root gives the standard deviation, which has the same units as returns, so the two can be compared.

We commonly use the standard deviation to measure the volatility of returns:

$$\sigma_{d} = \sqrt{ \frac{1}{N-1} \sum_{t=1}^{N} (r_t - \overline{r})^2}$$

How to annualize daily volatility? If we use daily data, annualizing is often done by the square root law (in the statistics course you shall revisit the conditions under which this is a sensible thing to do):

$$\sigma_a = \sigma_d \sqrt{252}$$
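The square-root law relies on daily returns being independent and identically distributed (and, strictly, on adding daily returns rather than compounding them). Here is a small simulation sketch under exactly those assumptions, with i.i.d. normal daily returns and a daily volatility of 1% that we chose arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
sigma_d = 0.01                 # assumed daily volatility (1%)
n_days, n_years = 252, 10_000  # 252 trading days, 10,000 simulated years

# i.i.d. normal daily returns, one row per simulated year
daily = rng.normal(loc=0.0, scale=sigma_d, size=(n_years, n_days))

# Annual return approximated by the sum of the daily returns
annual = daily.sum(axis=1)

print(annual.std(ddof=1))         # close to sigma_d * sqrt(252)
print(sigma_d * np.sqrt(n_days))  # about 0.159
```

With real data, autocorrelation or changing volatility can make the rule less accurate.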

If we use weekly data we need to multiply by square root of 52, and if we use monthly by square root of 12. Now, let us calculate volatility of returns in our five assets. Here we will use <font color='mediumseagreen'><b>Pandas</b></font> method <font color='DeepPink'><b>std</b></font>:

In [29]:
vol=returns.std()*252**0.5
vol

AMZN     0.297614
YHOO     0.300029
IBM      0.178456
AAPL     0.252674
^GSPC    0.110939
dtype: float64

So, Yahoo has only marginally higher volatility than Amazon, but a much higher expected return. On the other hand, the SP500 index is the least risky. Why would that be?

We can perform the same calculation using <font color='mediumseagreen'><b>NumPy</b></font>'s function <font color='DodgerBlue'><b>np.std</b></font>. Note, however, that by default it calculates the population standard deviation (dividing by $N$) rather than the sample standard deviation (dividing by $N-1$).

To obtain the sample standard deviation with <font color='mediumseagreen'><b>NumPy</b></font>, set the argument **ddof** (delta degrees of freedom) to 1:

In [30]:
np.std(returns,axis=0,ddof=1)*np.sqrt(252)  # axis=0: one standard deviation per column

AMZN     0.297614
YHOO     0.300029
IBM      0.178456
AAPL     0.252674
^GSPC    0.110939
dtype: float64

Annualized standard deviation is one possible proxy of volatility of returns. We sometimes use the two words interchangeably.

### <font color='MediumVioletRed' style="font-size:20px"><b>Risk-adjusted performance ratios</b></font>

Higher expected returns usually come with higher risk.

To judge which investments are more attractive, we need to compare expected returns with risk. One way to do that is to calculate the **expected return per unit of risk**.

This is the ratio of the expected return to the corresponding volatility of returns. Since we can calculate expected returns in two different ways, our comparisons may depend on which calculation we use:

In [31]:
ExpRet1/vol #comparison using the sample mean

AMZN     0.465903
YHOO     1.691300
IBM     -0.476800
AAPL     0.805175
^GSPC    1.600770
dtype: float64

Note that Yahoo seems to provide the best risk-return tradeoff, followed by the market proxy, SP500 portfolio.

In [32]:
ExpRet2/vol

AMZN     0.330782
YHOO     1.958039
IBM     -0.539287
AAPL     0.738925
^GSPC    1.685076
dtype: float64

Yahoo looks even better when we use the second (compounded) approach.

The second, and much more commonly used, way is to compare the corresponding **Sharpe ratios**, i.e. the expected returns in excess of the risk-free rate divided by the corresponding volatility.

$$\text{Sharpe} = \frac{\overline{r}_a - r_f}{\sigma_a}$$

This is very similar to the previous ratio, except that the numerator is now the annualized expected return in excess of the (also annual) risk-free rate. This numerator is called the **risk premium** of the risky asset (or portfolio).

Essentially, we use the risk-free rate as a benchmark of performance. Again, we consider both approaches. Suppose that the risk-free rate is 3 percent per annum.

In [33]:
rf=0.03
(ExpRet1-rf)/vol

AMZN     0.365101
YHOO     1.591310
IBM     -0.644909
AAPL     0.686445
^GSPC    1.330352
dtype: float64

Yahoo looks the most attractive followed by the market proxy.

Let's now use the second measure of expected returns in calculation of the Sharpe ratio:

In [34]:
(ExpRet2-rf)/vol

AMZN     0.229980
YHOO     1.858048
IBM     -0.707396
AAPL     0.620195
^GSPC    1.414658
dtype: float64

We would rank the assets similarly using the second method for the expected return estimation but the actual values of the Sharpe ratio vary substantially.

## <font color='orange' style="font-size:25px"><b>Let us now write functions that do these calculations (and will be used a lot)</b></font>

To avoid unnecessary repetition in our calculations, we now present functions that automate the calculation of expected returns, volatility, and the Sharpe ratio. We will use them later on.

In [36]:
def expRet(ret,n=252,com=False):
  """Calculates the annualized expected return using one of two methods. Parameters:
  -ret: Pandas DataFrame or Series of returns
  -n: optional, number of periods in a year (252 for days, 52 for weeks, 12 for months, 4 for quarters...)
  -com: optional, determines whether expected returns are calculated as the annualized sample mean (False)
      or as the annualized compounded return (True)
  """
  if com:
    # geometric (compounded) estimate: (prod(1 + r))^(n/N) - 1
    return (1+ret).prod()**(n/len(ret))-1
  else:
    # arithmetic estimate: sample mean times the number of periods per year
    return ret.mean()*n


Applying this function to the data set of returns created earlier, we calculate expected returns using the first method (annualized sample mean):

In [37]:
er1=expRet(returns,252) #Using the arithmetic mean
er1

AMZN     0.138659
YHOO     0.507439
IBM     -0.085088
AAPL     0.203447
^GSPC    0.177588
dtype: float64

Now calculate expected returns using the second method (i.e. annualized compounded sample returns):

In [38]:
er2=expRet(returns,252,True) #Using the geometric mean
er2

AMZN     0.098445
YHOO     0.587468
IBM     -0.096239
AAPL     0.186707
^GSPC    0.186941
dtype: float64

Next, let us define a function that calculates the annualized volatility of sample returns:

In [39]:
def annualize_vol(ret,n=252):
  """Calculates the annualized volatility of sample returns. Parameters:
  -ret: Pandas DataFrame or Series of returns
  -n: optional, number of periods in a year (252 for days, 52 for weeks, 12 for months, 4 for quarters...)
  """
  # sample standard deviation (ddof=1) scaled by the square-root law
  return ret.std()*n**0.5

Now use it to calculate the volatility of each stock. Which stock is the most risky? Is it the same stock that has the highest expected returns? What do you conclude?

In [40]:
annualize_vol(returns,252)

AMZN     0.297614
YHOO     0.300029
IBM      0.178456
AAPL     0.252674
^GSPC    0.110939
dtype: float64

Finally, we define a function that calculates the Sharpe ratio. It supports both approaches to calculating expected returns, and it lets the user set her own annualized risk-free rate:

In [43]:
def sharpe(ret,n=252,rf=0,com=False):
  """Calculates the Sharpe ratio. Parameters:
  -ret: Pandas DataFrame or Series of returns
  -n: optional, number of periods in a year (252 for days, 52 for weeks, 12 for months, 4 for quarters...)
  -rf: optional, annual risk-free rate given as a decimal (if omitted, it is assumed to be zero)
  -com: optional, determines whether expected returns are calculated as the annualized sample mean (False)
      or as the annualized compounded return (True)
  """
  # (annualized expected return - risk-free rate) / annualized volatility
  return (expRet(ret,n,com)-rf)/annualize_vol(ret,n)

Calculate the Sharpe ratio for each stock in the sample under assumption that risk-free rate is 3% annually. If we calculate expected returns using the first approach, the Sharpe ratio is:

In [44]:
sharpe(returns,252,rf=0.03)

AMZN     0.365101
YHOO     1.591310
IBM     -0.644909
AAPL     0.686445
^GSPC    1.330352
dtype: float64

Note that the second argument, n, is optional and equals 252 by default. Thus, we can omit it:

In [45]:
sharpe(returns,rf=0.03)

AMZN     0.365101
YHOO     1.591310
IBM     -0.644909
AAPL     0.686445
^GSPC    1.330352
dtype: float64

The second approach leads to:

In [46]:
sharpe(returns,252,rf=0.03,com=True)

AMZN     0.229980
YHOO     1.858048
IBM     -0.707396
AAPL     0.620195
^GSPC    1.414658
dtype: float64

### <font color='MediumVioletRed' style="font-size:20px"><b>Data set for Fama-French model</b></font>

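Here we have a large data set capturing the historical performance of the US stock market. The data are monthly returns, in percentage points, on portfolios formed by sorting stocks on market capitalization. For example, the decile portfolios hold the stocks in the first decile by market cap, in the second decile, and so on; the file also contains portfolios formed on other breakpoints (bottom 30%/middle 40%/top 30%, and quintiles).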

In [47]:
# Google Colab only: upload the file from your computer (skip this cell when running locally)
from google.colab import files
uploaded2=files.upload()

Saving Portfolios_Formed_on_ME_Branko.csv to Portfolios_Formed_on_ME_Branko.csv


In [49]:
returnsFF = pd.read_csv("Portfolios_Formed_on_ME_Branko.csv",header=0,index_col=0,parse_dates=True,na_values=-99.99)
returnsFF

| | <= 0 | Lo 30 | Med 40 | Hi 30 | Lo 20 | Qnt 2 | Qnt 3 | Qnt 4 | Hi 20 | Lo 10 | 2-Dec | 3-Dec | 4-Dec | 5-Dec | 6-Dec | 7-Dec | 8-Dec | 9-Dec | Hi 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 192607 | NaN | 0.14 | 1.59 | 3.42 | 0.37 | 0.78 | 1.68 | 1.39 | 3.67 | -0.12 | 0.52 | -0.05 | 1.31 | 1.21 | 2.04 | 1.58 | 1.29 | 3.53 | 3.71 |
| 192608 | NaN | 3.21 | 2.77 | 2.91 | 2.26 | 3.51 | 3.75 | 1.53 | 3.07 | 1.33 | 2.55 | 4.00 | 3.20 | 2.81 | 4.45 | 1.61 | 1.49 | 0.61 | 3.79 |
| 192609 | NaN | -1.74 | -0.88 | 0.80 | -1.39 | -1.06 | 0.05 | -0.26 | 0.81 | 0.59 | -2.00 | -2.01 | -0.46 | -0.06 | 0.14 | -2.02 | 0.74 | -0.77 | 1.25 |
| 192610 | NaN | -2.94 | -3.28 | -2.79 | -2.57 | -3.93 | -2.67 | -3.38 | -2.74 | -4.33 | -2.01 | -3.25 | -4.35 | -2.93 | -2.48 | -3.60 | -3.26 | -3.36 | -2.56 |
| 192611 | NaN | -0.38 | 3.73 | 2.74 | -0.95 | 2.94 | 3.52 | 3.25 | 2.71 | -3.30 | -0.23 | 0.08 | 4.74 | 3.64 | 3.44 | 3.63 | 3.05 | 3.86 | 2.40 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 201908 | NaN | -6.47 | -5.13 | -1.94 | -6.33 | -5.41 | -5.66 | -4.38 | -1.75 | -5.63 | -6.73 | -6.62 | -4.64 | -6.17 | -5.28 | -4.69 | -4.19 | -2.64 | -1.58 |
| 201909 | NaN | 2.78 | 1.65 | 1.64 | 2.92 | 2.35 | 1.67 | 1.73 | 1.62 | 2.48 | 3.17 | 2.62 | 2.18 | 1.10 | 2.09 | 1.43 | 1.91 | 1.27 | 1.69 |
| 201910 | NaN | 0.90 | 1.88 | 2.31 | 0.06 | 2.73 | 2.14 | 0.87 | 2.43 | -1.83 | 1.15 | 1.90 | 3.24 | 3.06 | 1.46 | 1.07 | 0.75 | 0.45 | 2.82 |
| 201911 | NaN | 5.28 | 4.41 | 3.90 | 5.05 | 4.62 | 4.34 | 5.03 | 3.78 | 3.17 | 6.12 | 5.55 | 4.06 | 4.54 | 4.19 | 4.62 | 5.28 | 3.83 | 3.78 |


Note that here we had to parse the data slightly differently (for example, `na_values=-99.99` tells Pandas to treat the value -99.99 as missing). We will use this data set again, so it is worth learning how to parse it. It requires some adjustments that other data sets do not, but none of the steps is complicated.

We won't work with the whole data set. Instead, we focus on two portfolios: the bottom decile ('Lo 10', the smallest companies by market cap) and the top decile ('Hi 10', the largest).

In [50]:
returns = returnsFF[['Lo 10','Hi 10']].copy()  # copy, so later changes do not affect returnsFF
returns

| | Lo 10 | Hi 10 |
|---|---|---|
| 192607 | -0.12 | 3.71 |
| 192608 | 1.33 | 3.79 |
| 192609 | 0.59 | 1.25 |
| 192610 | -4.33 | -2.56 |
| 192611 | -3.30 | 2.40 |
| ... | ... | ... |
| 201908 | -5.63 | -1.58 |
| 201909 | 2.48 | 1.69 |
| 201910 | -1.83 | 2.82 |
| 201911 | 3.17 | 3.78 |


Note that the index here is not a <font color='mediumseagreen'><b>DatetimeIndex</b></font>: the dates are stored as YYYYMM values (e.g. 192607). We can use <font color='DodgerBlue'><b>pd.to_datetime</b></font> with the format "%Y%m" to convert them into a <font color='mediumseagreen'><b>DatetimeIndex</b></font>:

In [53]:
pd.to_datetime(returns.index,format="%Y%m")

DatetimeIndex(['1926-07-01', '1926-08-01', '1926-09-01', '1926-10-01',
               '1926-11-01', '1926-12-01', '1927-01-01', '1927-02-01',
               '1927-03-01', '1927-04-01',
               ...
               '2019-03-01', '2019-04-01', '2019-05-01', '2019-06-01',
               '2019-07-01', '2019-08-01', '2019-09-01', '2019-10-01',
               '2019-11-01', '2019-12-01'],
              dtype='datetime64[ns]', length=1122, freq=None)

Let's save it as the index of our data set:

In [54]:
returns.index=pd.to_datetime(returns.index,format="%Y%m")
returns

| | Lo 10 | Hi 10 |
|---|---|---|
| 1926-07-01 | -0.12 | 3.71 |
| 1926-08-01 | 1.33 | 3.79 |
| 1926-09-01 | 0.59 | 1.25 |
| 1926-10-01 | -4.33 | -2.56 |
| 1926-11-01 | -3.30 | 2.40 |
| ... | ... | ... |
| 2019-08-01 | -5.63 | -1.58 |
| 2019-09-01 | 2.48 | 1.69 |
| 2019-10-01 | -1.83 | 2.82 |
| 2019-11-01 | 3.17 | 3.78 |


The numbers are in percentage points, so we convert them to decimals:

In [55]:
returns = returns*0.01
returns

| | Lo 10 | Hi 10 |
|---|---|---|
| 1926-07-01 | -0.0012 | 0.0371 |
| 1926-08-01 | 0.0133 | 0.0379 |
| 1926-09-01 | 0.0059 | 0.0125 |
| 1926-10-01 | -0.0433 | -0.0256 |
| 1926-11-01 | -0.0330 | 0.0240 |
| ... | ... | ... |
| 2019-08-01 | -0.0563 | -0.0158 |
| 2019-09-01 | 0.0248 | 0.0169 |
| 2019-10-01 | -0.0183 | 0.0282 |
| 2019-11-01 | 0.0317 | 0.0378 |


We also rename the columns, since the labels 'Lo 10' and 'Hi 10' may not mean much to people without a financial background. We will use the labels *SmallCap* and *LargeCap*:

In [56]:
returns.columns=['SmallCap','LargeCap']
returns

| | SmallCap | LargeCap |
|---|---|---|
| 1926-07-01 | -0.0012 | 0.0371 |
| 1926-08-01 | 0.0133 | 0.0379 |
| 1926-09-01 | 0.0059 | 0.0125 |
| 1926-10-01 | -0.0433 | -0.0256 |
| 1926-11-01 | -0.0330 | 0.0240 |
| ... | ... | ... |
| 2019-08-01 | -0.0563 | -0.0158 |
| 2019-09-01 | 0.0248 | 0.0169 |
| 2019-10-01 | -0.0183 | 0.0282 |
| 2019-11-01 | 0.0317 | 0.0378 |


We can summarize all of these steps in the following commands:


```python
# Summary of the data transformation needed for the FF data set
import pandas as pd

returnsFF = pd.read_csv("Portfolios_Formed_on_ME_Branko.csv", header=0, index_col=0, parse_dates=True, na_values=-99.99)
returns = returnsFF[['Lo 10', 'Hi 10']].copy()                  # keep only the two extreme deciles
returns.columns = ['SmallCap', 'LargeCap']                      # readable column names
returns.index = pd.to_datetime(returns.index, format="%Y%m")    # YYYYMM -> DatetimeIndex
returns = returns * 0.01                                        # percent -> decimals
```




Now let's plot the two series of returns:

In [57]:
fig = go.Figure()

fig.add_trace(go.Scatter(x=returns.index,y=returns['SmallCap'],line=dict(color='Purple'),name="SmallCap"))
fig.add_trace(go.Scatter(x=returns.index,y=returns['LargeCap'],line=dict(color='Orange'),name="LargeCap"))

fig.update_layout(title=dict(text='Returns',font=dict(size=30),x=0.5,y=0.9),yaxis=dict(zerolinecolor='Black'))

fig.show()

Based on the graph, which of the two portfolios looks more volatile to you? Why? Let's check your answer:

In [59]:
annualize_vol(returns,12)

SmallCap    0.341059
LargeCap    0.173711
dtype: float64

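Thus, the large-cap portfolio is considerably less risky than the small-cap one: its annualized volatility is about 17%, versus about 34% for small caps.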

Next, let's compute the expected returns of the two portfolios in both ways we have discussed:

In [60]:
print(expRet(returns,12)) #Expected returns using sample mean
print("*"*50)
expRet(returns,12,True) #Expected returns using compounding

SmallCap    0.164308
LargeCap    0.107184
dtype: float64
**************************************************


SmallCap    0.121027
LargeCap    0.096091
dtype: float64

The riskier portfolio offers higher expected returns. Which of the two portfolios is more attractive according to the Sharpe ratio (for both types of expected returns), assuming that the risk-free rate is 5% annually?

In [61]:
print(sharpe(returns,12,0.05)) #Exp returns use sample mean
print("_"*50)
sharpe(returns,12,0.05,True) #exp returns use compounding

SmallCap    0.335156
LargeCap    0.329191
dtype: float64
__________________________________________________


SmallCap    0.208255
LargeCap    0.265333
dtype: float64
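To see where these numbers come from, we can reproduce two of them by hand. Assuming (consistently with the outputs above) that `sharpe` divides the annualized expected excess return by the annualized volatility reported by `annualize_vol`, with $r_f=0.05$:

$$SR=\frac{E[R]-r_f}{\sigma}$$

$$\frac{0.164308-0.05}{0.341059}\approx 0.3352 \ \text{(SmallCap, sample mean)},\qquad \frac{0.096091-0.05}{0.173711}\approx 0.2653 \ \text{(LargeCap, compounding)}$$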

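The two approaches do not lead to the same conclusion. With expected returns based on the sample mean, the two Sharpe ratios are almost identical, with small caps slightly ahead (0.335 vs. 0.329). With compounded expected returns, large caps clearly outperform small caps (0.265 vs. 0.208). So the way we estimate expected returns matters for this comparison.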

## <font color='orange' style="font-size:25px"><b>Downside measures</b></font>

Returns are not normally distributed: in practice, extreme returns occur more often than a normal distribution would predict. We therefore need ways to measure the risk of such large losses, so that we can protect ourselves against them. Measures of this kind are called downside measures.

### <font color='MediumVioletRed' style="font-size:20px"><b>Maximum Drawdown</b></font>

Let us now introduce another measure of risk called the **Maximum drawdown**. It shows how much you would lose if you bought the asset at its very peak and sold it at the lowest point that followed. This is a measure of the ultimate **bad timing** risk. Investors care a lot about drawdowns: funds often lose many clients after a large drawdown.

How do we compute Maximum drawdown? Here is the algorithm:

1. Compute a wealth index - Start with a fixed initial investment (say, 1000 dollars). Using the historical returns of your portfolio, compute the *wealth index*, i.e. how much money you would have had in your account over time.
2. Compute previous peaks - For each point in time, find the highest value of the wealth index reached up to that point.
3. Compute drawdowns and the maximum drawdown - For a given point in time, the (relative) drawdown is the return we would have realized had we bought at the highest previous peak and sold at that point in time. The maximum drawdown is the most negative of these drawdowns, i.e. the return obtained if we had bought at a peak and sold at the subsequent bottom.
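The three steps can be sketched with plain NumPy on a tiny, made-up series of four monthly returns (the numbers are hypothetical, picked so the result can be checked by hand). Here `np.maximum.accumulate` plays the role of the running maximum in step 2:

```python
import numpy as np

# Hypothetical monthly returns, chosen so the result is easy to check by hand
toy_returns = np.array([0.10, -0.20, 0.05, 0.30])

# Step 1: wealth index of a 1000-dollar investment
toy_wealth = 1000 * np.cumprod(1 + toy_returns)       # 1100, 880, 924, 1201.2

# Step 2: previous peaks = running maximum of the wealth index
toy_peaks = np.maximum.accumulate(toy_wealth)         # 1100, 1100, 1100, 1201.2

# Step 3: relative drawdowns; the maximum drawdown is the most negative one
toy_drawdowns = (toy_wealth - toy_peaks) / toy_peaks  # 0, -0.20, -0.16, 0
toy_mdd = toy_drawdowns.min()

print(round(toy_mdd, 4))   # -0.2
```

The worst outcome is buying at the 1100 peak and selling at 880, a 20% loss, even though the portfolio later recovers to a new high.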

With this measure we can get the feeling about how much we can lose if we make the ultimately bad timing decision with regard to our portfolio.

There is also a risk-adjusted ratio based on drawdown, called the **Calmar ratio**: the annualized return over the trailing 36 months divided by the maximum drawdown over those same 36 months (taken in absolute value).
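A minimal sketch of the Calmar ratio as defined above, assuming monthly returns in decimals stored in a 1-D array or Series. The helper name `calmar_ratio`, the use of compounding for the annualized return, and dividing by the absolute value of the (negative) maximum drawdown are our assumptions, not a library API:

```python
import numpy as np

def calmar_ratio(monthly_returns, window=36, periods_per_year=12):
    """Hypothetical helper: annualized return over the trailing `window`
    periods divided by the magnitude of the maximum drawdown in that window."""
    r = np.asarray(monthly_returns, dtype=float)[-window:]
    if len(r) < window:
        raise ValueError("need at least `window` observations")

    # Annualized (compounded) return over the trailing window
    ann_return = np.prod(1 + r) ** (periods_per_year / window) - 1

    # Wealth index inside the window, starting from 1 so that a loss
    # in the first month also counts as a drawdown
    wealth = np.concatenate(([1.0], np.cumprod(1 + r)))
    peaks = np.maximum.accumulate(wealth)
    max_dd = np.min(wealth / peaks - 1)   # always <= 0

    if max_dd == 0:
        return np.inf                     # no drawdown in the window
    return ann_return / abs(max_dd)

# Usage: 36 made-up monthly returns with a single 20% drawdown
example = [0.10, -0.20] + [0.0] * 34
print(calmar_ratio(example))
```

Prepending the starting value 1 to the wealth index is a small deviation from the wealth index used in this notebook, which starts at the end of the first month.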

Drawbacks of this measure are:
* It is sensitive to outliers, since a drawdown is determined by just two points: a peak and a trough.
* It depends heavily on the data frequency. E.g., a deep drawdown visible in daily data may completely disappear if we consider monthly data.

Despite these drawbacks, drawdown is extremely popular with practitioners.
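The frequency problem can be illustrated with a made-up daily price path (all values hypothetical). We compute the maximum drawdown once on every observation and once on a coarser sampling that keeps only every 5th value, as if we only looked at weekly closes; `max_drawdown` is a helper defined just for this sketch:

```python
import numpy as np

def max_drawdown(wealth):
    """Hypothetical helper: most negative relative drawdown of a wealth (price) path."""
    wealth = np.asarray(wealth, dtype=float)
    peaks = np.maximum.accumulate(wealth)
    return np.min(wealth / peaks - 1)

# Hypothetical daily values: a sharp dip (102 -> 75) that recovers within days
daily = np.array([100, 102, 90, 75, 88, 103, 101, 96, 99, 104, 107])

# Coarser sampling: keep every 5th observation (days 0, 5 and 10)
coarse = daily[::5]                      # 100, 103, 107

print(round(max_drawdown(daily), 4))     # -0.2647, i.e. 75/102 - 1
print(round(max_drawdown(coarse), 4))    # 0.0: the dip is invisible
```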

#### <font color='MediumPurple' style="font-size:16px"><b>Step 1: Compute the wealth index from returns</b></font>

Let's start with cumulative returns. Cumulative return from time $0$ to time $t$ can be denoted as $r_{0,t} \equiv cr_t$. Suppose $V_t$ is portfolio value at time $t$. Gross cumulative portfolio returns satisfy:

$$1+cr_t=\frac{V_t}{V_0}=(1+r_1)(1+r_2)\cdots (1+r_t)$$

The wealth index at time $t$ is obtained by multiplying both sides of this equation by the initial investment ($V_0$):

$$V_t=V_0(1+cr_t)$$

Thus, portfolio wealth at time $t$ is equal to the product of the initial investment and the gross cumulative return up to time $t$.

Suppose our initial investment is 1000 dollars and that we invest this money in the LargeCap portfolio. The wealth index of this investment is computed as follows:

In [62]:
wealth_index= 1000*(1+returns['LargeCap']).cumprod()
wealth_index.head()

1926-07-01    1037.100000
1926-08-01    1076.406090
1926-09-01    1089.861166
1926-10-01    1061.960720
1926-11-01    1087.447778
Name: LargeCap, dtype: float64
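As a check of the formula $V_t=V_0(1+cr_t)$, the second value of the wealth index follows directly from the first two monthly LargeCap returns shown earlier (3.71% and 3.79%):

$$V_2 = 1000\,(1+0.0371)(1+0.0379) = 1000 \times 1.07640609 = 1076.40609$$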

Let's present it graphically:

In [67]:
fig=go.Figure()

fig.add_trace(go.Scatter(x=wealth_index.index,y=wealth_index.values,line=dict(color='Green')))

fig.update_layout(title=dict(text='Wealth Index of LargeCap', font=dict(size=30),x=0.5,y=0.9),yaxis=dict(zerolinecolor='Black'))

fig.show()

#### <font color='MediumPurple' style="font-size:16px"><b>Step 2: Compute the previous peaks</b></font>  

The previous peak at time $t$ is the maximum value of the wealth index over the period beginning at time $0$ and ending at time $t$. For example, for $t=5$, the previous peak is the maximum of the observations at $t=0,1,2,3,4,5$.

We can calculate this value for each date in the sample (our $t$) very elegantly using the <font color='mediumseagreen'><b>Pandas</b></font>' method <font color='DeepPink'><b>cummax</b></font>. It calculates the cumulative maximum, which is exactly what we need in this case:
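To see what `cummax` does, here is a tiny made-up series. Writing $P_t$ for the previous peak (our notation), it computes $P_t=\max_{0\le s\le t}V_s$ for every $t$:

```python
import pandas as pd

# Made-up wealth values: the running maximum only changes at new highs
wealth_toy = pd.Series([100, 120, 110, 130, 90])
print(wealth_toy.cummax().tolist())   # [100, 120, 120, 130, 130]
```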

In [68]:
previous_peaks= wealth_index.cummax()
previous_peaks.head()

1926-07-01    1037.100000
1926-08-01    1076.406090
1926-09-01    1089.861166
1926-10-01    1089.861166
1926-11-01    1089.861166
Name: LargeCap, dtype: float64

Now, let us present this series of cumulative maximums (i.e. previous peaks) graphically:

In [69]:
fig=go.Figure()

fig.add_trace(go.Scatter(x=previous_peaks.index,y=previous_peaks.values,line=dict(color='SkyBlue')))

fig.update_layout(title=dict(text='Previous Peaks', font=dict(size=30),x=0.5,y=0.9),yaxis=dict(zerolinecolor='Black'))

fig.show()

#### <font color='MediumPurple' style="font-size:16px"><b>Step 3: Compute drawdowns</b></font>

We can speak of **absolute** and **relative** drawdowns. Absolute drawdown is the difference between the wealth index at time $t$ and the previous peak calculated at time $t$ (for each $t$ in our sample).

The relative drawdown, in turn, is the absolute drawdown divided by the value of the previous peak at time $t$ (again for each $t$ in the sample).
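In symbols, writing $P_t=\max_{0\le s\le t}V_s$ for the previous peak (our notation), the absolute drawdown, the relative drawdown $D_t$ and the maximum drawdown are:

$$V_t - P_t,\qquad D_t=\frac{V_t-P_t}{P_t},\qquad \text{MDD}=\min_t D_t$$

For example, with the LargeCap wealth index computed above, at 1926-10-01 we get $D_t=(1061.96-1089.86)/1089.86\approx-0.0256$, which is simply that month's return of $-2.56\%$, because the peak was reached the month before.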

Below we focus on *relative drawdowns*. Let's calculate them:

In [70]:
drawdowns = (wealth_index-previous_peaks)/previous_peaks
drawdowns.head()

1926-07-01    0.000000
1926-08-01    0.000000
1926-09-01    0.000000
1926-10-01   -0.025600
1926-11-01   -0.002214
Name: LargeCap, dtype: float64

Next, let's present drawdowns graphically. Here I would like to show you the beauty of <font color='mediumseagreen'><b>Plotly</b></font>'s **rangeslider**:

In [74]:
fig= go.Figure()

fig.add_trace(go.Scatter(x=drawdowns.index,y=drawdowns.values,line=dict(color='Purple')))

fig.update_layout(title=dict(text="Drawdowns",font=dict(size=30),x=0.5,y=0.9),yaxis=dict(zerolinecolor='black'),
                  xaxis_rangeslider_visible=True,height=600,width=1000)

fig.show()

What is the maximum drawdown? To answer that question, we use <font color='mediumseagreen'><b>Pandas</b></font>' method <font color='DeepPink'><b>min</b></font>, which returns the smallest value in the series:

In [75]:
drawdowns.min()

-0.8215272613546656

This is, by definition, the Maximum drawdown. It tells us that, in the worst case, we would have lost about 82% of our wealth: this is what an investor who bought this portfolio at its peak and sold it at the subsequent lowest value would have experienced.

Instead of looking at drawdowns over the whole sample, suppose that we want to find the largest drawdown since 1970. Recall that <font color='mediumseagreen'><b>Pandas</b></font> can select part of a data set based on a date range, so in this example we can write:

In [78]:
drawdowns["1970":].min()

-0.4947207957938188

So, when did this happen? For this we use <font color='mediumseagreen'><b>Pandas</b></font>' method <font color='DeepPink'><b>idxmin</b></font>, which returns the index label (here, the date) at which the minimum occurs:

In [79]:
drawdowns["1970":].idxmin()

Timestamp('2009-02-01 00:00:00')
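Note that `drawdowns["1970":]` keeps the drawdowns computed over the full sample, so they are still measured against peaks that may have been reached before 1970. If you would rather start the clock in 1970, one option is to rebuild the wealth index from the sliced returns. A sketch on made-up data (the dates, the returns and the helper `relative_drawdowns` are hypothetical) shows that the two choices can differ:

```python
import pandas as pd

def relative_drawdowns(ret: pd.Series) -> pd.Series:
    """Hypothetical helper: relative drawdowns of a 1000-dollar wealth index built from `ret`."""
    wealth = 1000 * (1 + ret).cumprod()
    peaks = wealth.cummax()
    return (wealth - peaks) / peaks

# Made-up monthly returns: a crash at the end of 1969, a partial recovery in 1970
toy_ret = pd.Series(
    [0.0, -0.5, 0.2, 0.1],
    index=pd.to_datetime(["1969-11-01", "1969-12-01", "1970-01-01", "1970-02-01"]),
)

# Slice the full-sample drawdowns: still measured against the 1969 peak
dd_sliced = relative_drawdowns(toy_ret)["1970":].min()
# Rebuild the wealth index from 1970 returns only: peaks start in 1970
dd_restarted = relative_drawdowns(toy_ret["1970":]).min()

print(round(dd_sliced, 4), round(dd_restarted, 4))   # -0.4 0.0
```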

While we are at it, let's find when the maximum drawdown occurred over the entire sample:

In [80]:
drawdowns.idxmin()

Timestamp('1932-06-01 00:00:00')
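`idxmin` gives the date of the trough. The date of the peak that preceded it can be found by looking for the highest wealth value up to that trough. A sketch on a made-up wealth index (hypothetical values and dates):

```python
import pandas as pd

# Made-up wealth index with one drawdown: peak in February, trough in April
toy_wealth = pd.Series(
    [100.0, 120.0, 90.0, 60.0, 80.0, 130.0],
    index=pd.date_range("2000-01-01", periods=6, freq="MS"),
)
toy_peaks = toy_wealth.cummax()
toy_drawdowns = (toy_wealth - toy_peaks) / toy_peaks

trough_date = toy_drawdowns.idxmin()                # when the maximum drawdown is reached
peak_date = toy_wealth.loc[:trough_date].idxmax()   # highest wealth up to that trough

print(peak_date.date(), trough_date.date())         # 2000-02-01 2000-04-01
```

With the notebook's variables, the analogous call would be `wealth_index.loc[:drawdowns.idxmin()].idxmax()`.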

#### <font color='MediumPurple' style="font-size:16px"><b>The function that calculates drawdowns</b></font>

Everything we have done so far can be put into a single function:

In [81]:
def drawdown(ret: pd.Series):
    """Computes and returns a DataFrame that contains the wealth index, previous peaks and relative drawdowns.
    Parameters:
    - ret: time series of asset returns (in decimals)"""
    wealth_index = 1000 * (1 + ret).cumprod()                      # Step 1: wealth index of a 1000-dollar investment
    previous_peaks = wealth_index.cummax()                          # Step 2: running maximum (previous peaks)
    drawdowns = (wealth_index - previous_peaks) / previous_peaks   # Step 3: relative drawdowns
    return pd.DataFrame({"Wealth": wealth_index, "Peaks": previous_peaks, "Drawdowns": drawdowns})


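The function takes a single argument: a series of returns (a `pd.Series`, as the type hint indicates). To analyse several portfolios, call it on one column at a time, e.g. `drawdown(returns['SmallCap'])`.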
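Because `cumprod` and `cummax` work column by column on a DataFrame, the same three steps can also produce drawdowns for several portfolios at once. A sketch, with made-up returns and a hypothetical helper name `drawdowns_all`:

```python
import pandas as pd

def drawdowns_all(rets: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: relative drawdowns for every column of `rets`."""
    wealth = 1000 * (1 + rets).cumprod()   # one wealth index per column
    peaks = wealth.cummax()                # running maximum, column by column
    return (wealth - peaks) / peaks

# Made-up monthly returns for two portfolios
toy_rets = pd.DataFrame({"A": [0.10, -0.20, 0.05], "B": [-0.05, 0.02, -0.10]})
print(drawdowns_all(toy_rets).min())       # maximum drawdown of each portfolio
```

Applied to the notebook's data, `drawdowns_all(returns).min()` would give the maximum drawdowns of SmallCap and LargeCap side by side.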

Note that the output is a <font color='DodgerBlue'><b>pd.DataFrame</b></font> containing the wealth index, previous peaks and drawdowns. Let's use it to calculate these values for the same portfolio as in the previous example:

In [82]:
df_largecap = drawdown(returns['LargeCap'])
df_largecap.head()

| | Wealth | Peaks | Drawdowns |
|---|---|---|---|
| 1926-07-01 | 1037.1 | 1037.1 | 0.0 |
| 1926-08-01 | 1076.40609 | 1076.40609 | 0.0 |
| 1926-09-01 | 1089.861166 | 1089.861166 | 0.0 |
| 1926-10-01 | 1061.96072 | 1089.861166 | -0.0256 |
| 1926-11-01 | 1087.447778 | 1089.861166 | -0.002214 |


Let's plot wealth and previous peaks on the same graph. The difference between these two lines is the absolute drawdown: