# Advanced Certification Program in Computational Data Science
## A program by IISc and TalentSprint
### Assignment 08: Finance Portfolio

## Learning Objectives

At the end of the experiment, you will be able to:

* understand portfolio optimization
* compute volatility, covariance, correlation, and expected returns for a portfolio of stocks from two companies
* optimize a portfolio made up of stocks from four companies

## Information

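**Portfolio optimization** is the process of choosing how much to invest in each asset of a portfolio so that the portfolio offers the highest expected return for a given level of risk (or, equivalently, the lowest risk for a given expected return).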

A **portfolio** is the investment in different kinds of assets from different companies. For example, if we have investments in 3 companies, say, Google, Amazon and Tesla, then these 3 companies make up our investment portfolio.

When we build a portfolio of assets from different companies, each investment gains (or loses) value over a given period of time, and the gain is usually not the same for every investment. This gain or loss, expressed as a percentage of the amount invested, is termed the **return**.

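Returns are not guaranteed. The price of a stock can fall, and if the company goes bankrupt, we may lose the capital we invested instead of making a profit. This uncertainty about returns is termed the **risk** of an investment; in portfolio theory it is usually measured by the volatility (standard deviation) of returns.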

Modern Portfolio Theory (MPT) is a mathematical framework for building portfolios that maximize the expected return for a given level of risk.

**Efficient frontier** is a graph with ‘returns’ on the Y-axis and ‘volatility’ on the X-axis. It shows the set of optimal portfolios that offer the highest expected return for a given risk level or the lowest risk for a given level of expected return.

### Setup Steps:

In [None]:
#@title Please enter your registration id to start: { run: "auto", display-mode: "form" }
Id = "" #@param {type:"string"}

In [None]:
#@title Please enter your password (your registered phone number) to continue: { run: "auto", display-mode: "form" }
password = "" #@param {type:"string"}

In [None]:
#@title Run this cell to complete the setup for this Notebook
from IPython import get_ipython

ipython = get_ipython()

notebook= "M8_AST_08_Finance_Portfolio_C" #name of the notebook

def setup():

    from IPython.display import HTML, display
    display(HTML('<script src="https://dashboard.talentsprint.com/aiml/record_ip.html?traineeId={0}&recordId={1}"></script>'.format(getId(),submission_id)))
    print("Setup completed successfully")
    return

def submit_notebook():
    ipython.run_line_magic("notebook", "-e " + notebook + ".ipynb")

    import requests, json, base64, datetime

    url = "https://dashboard.talentsprint.com/xp/app/save_notebook_attempts"
    if not submission_id:
      data = {"id" : getId(), "notebook" : notebook, "mobile" : getPassword()}
      r = requests.post(url, data = data)
      r = json.loads(r.text)

      if r["status"] == "Success":
          return r["record_id"]
      elif "err" in r:
        print(r["err"])
        return None
      else:
        print ("Something is wrong, the notebook will not be submitted for grading")
        return None

    elif getAnswer() and getComplexity() and getAdditional() and getConcepts() and getComments() and getMentorSupport():
      with open(notebook + ".ipynb", "rb") as f:
        file_hash = base64.b64encode(f.read())

      data = {"complexity" : Complexity, "additional" :Additional,
              "concepts" : Concepts, "record_id" : submission_id,
              "answer" : Answer, "id" : Id, "file_hash" : file_hash,
              "notebook" : notebook,
              "feedback_experiments_input" : Comments,
              "feedback_mentor_support": Mentor_support}
      r = requests.post(url, data = data)
      r = json.loads(r.text)
      if "err" in r:
        print(r["err"])
        return None
      else:
        print("Your submission is successful.")
        print("Ref Id:", submission_id)
        print("Date of submission: ", r["date"])
        print("Time of submission: ", r["time"])
        print("View your submissions: https://learn-iisc.talentsprint.com/notebook_submissions")
        #print("For any queries/discrepancies, please connect with mentors through the chat icon in LMS dashboard.")
        return submission_id
    else:
      return submission_id


def getAdditional():
  try:
    if not Additional:
      raise NameError
    else:
      return Additional
  except NameError:
    print ("Please answer Additional Question")
    return None

def getComplexity():
  try:
    if not Complexity:
      raise NameError
    else:
      return Complexity
  except NameError:
    print ("Please answer Complexity Question")
    return None

def getConcepts():
  try:
    if not Concepts:
      raise NameError
    else:
      return Concepts
  except NameError:
    print ("Please answer Concepts Question")
    return None


# def getWalkthrough():
#   try:
#     if not Walkthrough:
#       raise NameError
#     else:
#       return Walkthrough
#   except NameError:
#     print ("Please answer Walkthrough Question")
#     return None

def getComments():
  try:
    if not Comments:
      raise NameError
    else:
      return Comments
  except NameError:
    print ("Please answer Comments Question")
    return None


def getMentorSupport():
  try:
    if not Mentor_support:
      raise NameError
    else:
      return Mentor_support
  except NameError:
    print ("Please answer Mentor support Question")
    return None

def getAnswer():
  try:
    if not Answer:
      raise NameError
    else:
      return Answer
  except NameError:
    print ("Please answer Question")
    return None


def getId():
  try:
    return Id if Id else None
  except NameError:
    return None

def getPassword():
  try:
    return password if password else None
  except NameError:
    return None

submission_id = None
### Setup
if getPassword() and getId():
  submission_id = submit_notebook()
  if submission_id:
    setup()
else:
  print ("Please complete Id and Password cells before running setup")

### Import required packages

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import yfinance as yf
import warnings
warnings.filterwarnings("ignore")

Let's look at some fundamental statistical terms used in portfolio optimization through the example below.

**Example 1:** Consider a portfolio made up of stocks from just two companies, Tesla and Facebook (now Meta Platforms, ticker `META`). Calculate the volatility, covariance, correlation, and expected returns for this portfolio.

In the code cell below, we download the required stock price data from Yahoo Finance for the period from 1st January 2018 to 31st December 2019. Note that `yf.download()` treats the `end` date as exclusive, so the last trading day included is 30th December 2019.

In [None]:
# Read data
dt = yf.download("TSLA META", start='2018-01-01', end='2019-12-31')
dt.head()

As we can see, there are a lot of different columns for different prices throughout the day, but we will only focus on the ‘Close’ column. This column gives us the closing price of the company’s stock on the given day.

In [None]:
# Closing price
test = dt['Close']
test.head()

**Calculate the percentage change in stock prices**

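Next, we compute the daily returns of Tesla. For each day we take the percentage change in the closing price, $r_t = P_t / P_{t-1} - 1$, and then its log, $\ln(1 + r_t) = \ln(P_t / P_{t-1})$. These log returns are used because they are time additive: the log return over several days is the sum of the daily log returns.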

It is common practice in portfolio optimization to take the log of returns for calculations of covariance and correlation.
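
The time-additivity property can be checked on a few made-up prices (not Tesla data). This minimal sketch uses `np.log1p(x)`, which computes the same $\ln(1 + x)$ as the `np.log(1+x)` used in the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices over four trading days (illustrative only)
prices = pd.Series([100.0, 110.0, 99.0, 108.9])

simple = prices.pct_change()   # P_t / P_{t-1} - 1; the first value is NaN
log_ret = np.log1p(simple)     # ln(1 + r_t) = ln(P_t / P_{t-1})

# Log returns add up over time: their sum equals the log of the total price ratio
total_log = log_ret.sum()      # NaN is skipped by default
print(np.isclose(total_log, np.log(prices.iloc[-1] / prices.iloc[0])))

# Simple returns do not add up: roughly 0.1 versus an actual rise of 0.089
print(simple.sum(), prices.iloc[-1] / prices.iloc[0] - 1)
```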

Here, we use pandas' `pct_change()` function to compute percentage change between the current and a prior element. This function by default calculates the percentage change from the immediately previous row.

To learn more about the `pct_change()` function, see the [pandas documentation](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pct_change.html).
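
As a quick sanity check on hypothetical prices, `pct_change()` gives the same values as dividing each row by the previous row and subtracting 1. The `periods` argument changes how many rows back the comparison looks.

```python
import numpy as np
import pandas as pd

# Hypothetical daily closing prices indexed by date (illustrative only)
close = pd.Series(
    [50.0, 52.0, 51.0, 53.04],
    index=pd.date_range("2019-01-01", periods=4, freq="D"),
)

# pct_change() compares each row with the previous one: (P_t - P_{t-1}) / P_{t-1}
daily = close.pct_change()
manual = close / close.shift(1) - 1
print(np.allclose(daily.iloc[1:], manual.iloc[1:]))

# periods=2 compares each row with the row two steps earlier instead
two_day = close.pct_change(periods=2)
print(two_day)
```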

In [None]:
# Log of percentage change for Tesla
tesla = test['TSLA'].pct_change().apply(lambda x: np.log(1+x))
tesla.head()

In [None]:
# Variance for Tesla
var_tesla = tesla.var()
var_tesla

Similarly, for Facebook (ticker `META`):

In [None]:
# Log of Percentage change for Facebook
fb = test['META'].pct_change().apply(lambda x: np.log(1+x))
fb.head()

In [None]:
# Variance for Facebook
var_fb = fb.var()
var_fb

**Volatility**

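Volatility is measured as the annualized standard deviation of a stock's returns. The square root of the daily variance gives only the daily standard deviation. To annualize it, we multiply the daily variance by the number of trading days in a year (about 250) and take the square root, which is the same as multiplying the daily standard deviation by $\sqrt{250}$. This scaling assumes that daily returns are independent of each other.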
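
The two ways of annualizing described above give the same number. This sketch uses simulated daily log returns (an assumption, not market data) and keeps the notebook's choice of 250 trading days.

```python
import numpy as np
import pandas as pd

# Assumption: 250 trading days per year, as in the text (252 is another common choice)
TRADING_DAYS = 250

# Hypothetical daily log returns with a 2% daily standard deviation (illustrative only)
rng = np.random.default_rng(0)
daily_log_returns = pd.Series(rng.normal(loc=0.0005, scale=0.02, size=500))

daily_var = daily_log_returns.var()   # daily variance (pandas uses ddof=1)
daily_sd = np.sqrt(daily_var)         # daily standard deviation

# Scale the variance by the number of trading days, then take the square root ...
annual_vol_from_var = np.sqrt(daily_var * TRADING_DAYS)
# ... which equals scaling the daily standard deviation by sqrt(250)
annual_vol_from_sd = daily_sd * np.sqrt(TRADING_DAYS)

print(annual_vol_from_var, annual_vol_from_sd)
```

With a daily standard deviation near 0.02, both values should be close to $0.02 \times \sqrt{250} \approx 0.32$, i.e. roughly 32% per year.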

**Note:** This volatility is for individual stocks. It is different from the volatility of a portfolio we saw in the previous assignment, which also depends on the covariances between the assets in the portfolio.

In [None]:
# Volatility
tesla_vol = np.sqrt(var_tesla * 250)
fb_vol = np.sqrt(var_fb * 250)
tesla_vol, fb_vol

Let's plot the volatility of both Tesla and Facebook for better visualization.

In [None]:
# Volatility of both stocks
test.pct_change().apply(lambda x: np.log(1+x)).std().apply(lambda x: x*np.sqrt(250)).plot(kind='bar')
plt.ylabel("Volatility")
plt.show()

**Covariance and Correlation**

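Covariance measures the directional relationship between the returns on two assets: a positive covariance means their returns tend to move in the same direction, and a negative covariance means they tend to move in opposite directions. Correlation is the covariance scaled by the two standard deviations, $\rho_{a,b} = Cov(r_a, r_b) / (\sigma_a \sigma_b)$, so it always lies between -1 and 1 and is easier to compare across pairs of assets. Risk and volatility can be reduced in a portfolio by pairing assets that have a low or negative covariance, because a fall in one asset is then partly offset by the other.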
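
A minimal sketch on two simulated return series (hypothetical assets `A` and `B`, built so that `B` partly moves against `A`) shows that the correlation is the covariance divided by both standard deviations, and that a negative covariance lowers the volatility of a two-asset portfolio.

```python
import numpy as np
import pandas as pd

# Two hypothetical daily return series (illustrative only); B is built to move
# partly against A, so their covariance should come out negative
rng = np.random.default_rng(42)
a = rng.normal(0.0, 0.02, size=1000)
b = -0.5 * a + rng.normal(0.0, 0.015, size=1000)
returns = pd.DataFrame({"A": a, "B": b})

cov = returns.cov()
sd = returns.std()

# Correlation = covariance scaled by both standard deviations; always in [-1, 1]
corr_ab = cov.loc["A", "B"] / (sd["A"] * sd["B"])
print(corr_ab, returns.corr().loc["A", "B"])

# Two-asset portfolio variance: w_a^2 var_a + w_b^2 var_b + 2 w_a w_b cov_ab
w_a, w_b = 0.5, 0.5
port_var = (w_a**2 * cov.loc["A", "A"] + w_b**2 * cov.loc["B", "B"]
            + 2 * w_a * w_b * cov.loc["A", "B"])

# With a negative covariance the last term is negative, so the portfolio is
# less volatile than the weighted average of the two volatilities
print(np.sqrt(port_var), w_a * sd["A"] + w_b * sd["B"])
```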

In [None]:
# Log of Percentage change
test1 = test.pct_change().apply(lambda x: np.log(1+x))
test1.head()

In [None]:
# Covariance
test1.cov()

We can see that there is a small positive covariance between Tesla and Facebook.

In [None]:
# Correlation
test1.corr()

In line with the covariance, the correlation between Tesla and Facebook is also positive.

**Expected Returns**

The expected return of an asset is estimated here as the mean of its historical daily (log) returns.

For expected returns, we need to define weights for the assets chosen. In simpler terms, this means we need to decide what percentage of the total money we want to hold in each company’s stock.

Usually, these weights are chosen using optimization techniques, but for now we will use arbitrary weights for Tesla and Facebook.

In [None]:
# Log of percentage change
test2 = test.pct_change().apply(lambda x: np.log(1+x))
test2.head()

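Let’s define an array of arbitrary weights that represent the percentage allocation of the investment between these two stocks. The weights must add up to 1. Note that the weights are matched to the columns of `test2` by position, so check `test2.columns` to see which stock each weight applies to.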

In [None]:
# Define weights for allocation
w = [0.2, 0.8]
e_r_ind = test2.mean()
e_r_ind

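The formula for the expected return of a two-asset portfolio is:

$$E(r_p) = w_a E(r_a) + w_b E(r_b)$$

where $w_a$ and $w_b$ are the weights of assets $a$ and $b$ (with $w_a + w_b = 1$), and $E(r_a)$ and $E(r_b)$ are their expected returns. Strictly, this weighted sum is exact for simple returns; for log returns, as used here, it is a close approximation when daily returns are small.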
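
The formula can be sketched with made-up mean returns. As an alternative to the positional list used in the notebook, this sketch keys the weights by ticker in a pandas Series, so the multiplication aligns each weight with the right stock by label.

```python
import pandas as pd

# Hypothetical mean daily log returns for the two stocks (illustrative only)
mean_returns = pd.Series({"META": 0.0004, "TSLA": 0.0010})

# Weights keyed by ticker, so they cannot be matched to the wrong column
weights = pd.Series({"TSLA": 0.2, "META": 0.8})
assert abs(weights.sum() - 1) < 1e-12   # weights must sum to 1

# E(r_p) = w_a * E(r_a) + w_b * E(r_b); pandas aligns the two Series by ticker
expected_portfolio_return = (weights * mean_returns).sum()
print(expected_portfolio_return)        # 0.2 * 0.0010 + 0.8 * 0.0004
```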

In [None]:
# Total expected return
e_r = (e_r_ind * w).sum()
e_r

### Building an Optimal Risky Portfolio

Now we create an optimal portfolio using the above concepts.

**Example 2:** Consider a portfolio of stocks from four companies, namely Apple, Nike, Google and Amazon, over a period of five years:

* find the optimal weights of the assets
* calculate the expected returns, the minimum variance portfolio, the optimal risky portfolio and the efficient frontier
* find the portfolio with the highest Sharpe ratio

Let’s start by pulling the required asset data from Yahoo.

In [None]:
#Download the required asset data from Yahoo
tickers = ['AAPL', 'NKE', 'GOOGL', 'AMZN']
df_ = yf.download(tickers, start='2015-01-01', end='2019-12-31')
df_.head()

In [None]:
# Keep the ‘Close’ column to perform calculations
df = df_['Close']
df.head()

**Covariance and Correlation Matrices**

In [None]:
# Covariance matrix of daily log returns
cov_matrix = df.pct_change().apply(lambda x: np.log(1+x)).cov()
cov_matrix

In [None]:
# Correlation matrix
corr_matrix = df.pct_change().apply(lambda x: np.log(1+x)).corr()
corr_matrix

**Portfolio Variance**

The formula for calculating portfolio variance is given as,

$$\sigma^2(r_p) = \sum_{i=1}^{n} \sum_{j=1}^{n} w_i w_j Cov(r_i, r_j) $$

Here, $w_i$ and $w_j$ denote weights of all assets from $1$ to $n$ (in this case from 1 to 4) and $Cov(r_i, r_j)$ is the covariance of the two assets denoted by $i$ and $j$.

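The simplest way to do this calculation is to write it in matrix form, $\sigma^2(r_p) = w^T \Sigma w$, where $w$ is the vector of weights and $\Sigma$ is the covariance matrix. In pandas, this amounts to multiplying each row of the covariance matrix by the corresponding weight (`axis=0`), then each column by the corresponding weight (`axis=1`), and adding up all the entries.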

Let’s define a random list of weights for all 4 assets.

In [None]:
# Randomly weighted portfolio's variance
w = {'AAPL': 0.1, 'NKE': 0.2, 'GOOGL': 0.5, 'AMZN': 0.2}
port_var = cov_matrix.mul(w, axis=0).mul(w, axis=1).sum().sum()
port_var
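
The pandas `mul(..., axis=0).mul(..., axis=1).sum().sum()` chain computes the same quantity as the double sum and as $w^T \Sigma w$. This sketch checks all three on a small hypothetical covariance matrix (tickers `X`, `Y`, `Z` are made up).

```python
import numpy as np
import pandas as pd

# Hypothetical 3-asset covariance matrix of daily returns (illustrative only)
tickers = ["X", "Y", "Z"]
cov = pd.DataFrame(
    [[0.00040, 0.00010, 0.00005],
     [0.00010, 0.00030, 0.00002],
     [0.00005, 0.00002, 0.00020]],
    index=tickers, columns=tickers,
)
w = pd.Series({"X": 0.5, "Y": 0.3, "Z": 0.2})

# 1) Double sum from the formula: sum_i sum_j w_i w_j Cov(r_i, r_j)
double_sum = sum(w[i] * w[j] * cov.loc[i, j] for i in tickers for j in tickers)

# 2) The pandas approach: scale rows, then columns, then add every entry
pandas_way = cov.mul(w, axis=0).mul(w, axis=1).sum().sum()

# 3) Matrix form w^T Sigma w
matrix_way = w.values @ cov.values @ w.values

print(double_sum, pandas_way, matrix_way)
```

For large numbers of portfolios, the matrix form is usually the fastest of the three.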

Thus, we have found the variance of this portfolio. However, arbitrary weights do not make a portfolio optimal: to optimize it, we need to find the weights that give the best trade-off between expected return and risk.

**Portfolio expected returns**

The mean of an asset's historical returns (the percentage changes in its stock price) gives us an estimate of its expected return.
The expected return of the portfolio is the sum of the individual expected returns, each multiplied by the weight of the corresponding asset.

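Note that we use the `resample()` function to get yearly returns: `resample('Y').last()` takes the last closing price of each calendar year, and `pct_change()` then gives the year-over-year returns. The first year has no previous year-end, so its return is NaN and is ignored by `mean()`. Without resampling, we would get daily returns. In pandas 2.2 and later, the `'Y'` alias is deprecated in favour of `'YE'` (year end), so it may raise a `FutureWarning`.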
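
Because the accepted frequency alias depends on the pandas version, a version-independent sketch is to group the prices by calendar year instead of resampling. This gives the same year-end prices as `resample('Y').last()`, except that the index holds year numbers rather than year-end dates. The prices below are synthetic.

```python
import numpy as np
import pandas as pd

# Hypothetical business-day closing prices covering three calendar years
dates = pd.bdate_range("2017-01-02", "2019-12-31")
prices = pd.Series(np.linspace(100.0, 160.0, len(dates)), index=dates)

# Last closing price of each calendar year
year_end = prices.groupby(prices.index.year).last()

# Year-over-year percentage change; the first year has no prior value, so it is NaN
yearly_returns = year_end.pct_change()
print(yearly_returns)
print("Mean yearly return:", yearly_returns.mean())
```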

In [None]:
# Yearly returns for individual companies
ind_er = df.resample('Y').last().pct_change().mean()
ind_er

In [None]:
# Portfolio returns
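w = pd.Series({'AAPL': 0.1, 'NKE': 0.2, 'GOOGL': 0.5, 'AMZN': 0.2})  # same weights as above, keyed by ticker
port_er = (w * ind_er).sum()  # pandas aligns the weights with ind_er by ticker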
port_er

**Plotting the efficient frontier**

Below are the calculations and code for finding the optimal weights of the assets and plotting the efficient frontier for the given portfolio.
But first, let's look at the volatility and returns of the individual assets.

In [None]:
# Volatility is given by the annual standard deviation
ann_sd = df.pct_change().apply(lambda x: np.log(1+x)).std().apply(lambda x: x*np.sqrt(250))
ann_sd

In [None]:
# Creating a table for visualising returns and volatility of assets
assets = pd.concat([ind_er, ann_sd], axis=1)
assets.columns = ['Returns', 'Volatility']
assets

From the above results, we can see that Amazon has the highest volatility, but it also offers a higher return.

To plot the efficient frontier, we repeat this calculation many times (a Monte Carlo simulation). In each iteration, we draw random weights for the assets and calculate the return and volatility of that particular portfolio.
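
The same simulation can be vectorized with NumPy so that all portfolios are computed at once instead of in a loop. This is a sketch with hypothetical expected returns and a hypothetical covariance matrix; it uses NumPy's `default_rng` generator, so its random weights differ from those produced by `np.random.seed(1)` in the notebook.

```python
import numpy as np

# Hypothetical inputs for 4 assets (illustrative only):
# mean yearly returns and a covariance matrix of daily log returns
mu = np.array([0.25, 0.30, 0.20, 0.12])
cov_daily = np.array([
    [2.5e-4, 1.2e-4, 1.1e-4, 0.6e-4],
    [1.2e-4, 3.4e-4, 1.5e-4, 0.7e-4],
    [1.1e-4, 1.5e-4, 2.3e-4, 0.6e-4],
    [0.6e-4, 0.7e-4, 0.6e-4, 2.1e-4],
])
TRADING_DAYS = 250
num_portfolios = 10_000

rng = np.random.default_rng(1)
# One row of random weights per portfolio, normalized so each row sums to 1
W = rng.random((num_portfolios, len(mu)))
W = W / W.sum(axis=1, keepdims=True)

# Expected return of every portfolio at once: W @ mu
p_ret = W @ mu
# Daily variance w^T Sigma w for every row, then annualize the standard deviation
p_var = np.einsum("ij,jk,ik->i", W, cov_daily, W)
p_vol = np.sqrt(p_var) * np.sqrt(TRADING_DAYS)

print(p_ret.shape, p_vol.shape)
```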

In [None]:
# Define an empty list for portfolio returns, volatility and asset weights
p_ret = []
p_vol = []
p_weights = []

num_assets = len(df.columns)
print("Number of assets: ", num_assets)
num_portfolios = 10000
print("Number of portfolios: ", num_portfolios)

In [None]:
np.random.seed(1)
for portfolio in range(num_portfolios):
    weights = np.random.random(num_assets)
    # sum of weights must be 1
    weights = weights/np.sum(weights)
    p_weights.append(weights)
    # Returns are the product of individual expected returns of asset and its weights
    returns = np.dot(weights, ind_er)
    p_ret.append(returns)
    # Portfolio Variance
    var = cov_matrix.mul(weights, axis=0).mul(weights, axis=1).sum().sum()
    # Daily standard deviation
    sd = np.sqrt(var)
    # Annual standard deviation = volatility (named vol so the ann_sd Series above is not overwritten)
    vol = sd * np.sqrt(250)
    p_vol.append(vol)

In [None]:
data_ = {'Returns':p_ret, 'Volatility':p_vol}

for counter, symbol in enumerate(df.columns.tolist()):
    print(counter, symbol)
    data_[symbol + '_weight'] = [w[counter] for w in p_weights]

# Create dataframe of the 10000 portfolios
portfolios = pd.DataFrame(data_)
portfolios.head()

We now have 10,000 portfolios, each with its own weights, return and volatility. Plotting their returns against their volatilities shows the set of possible portfolios; the upper-left edge of this cloud of points is the efficient frontier.

In [None]:
# Visualize efficient frontier
plt.figure(figsize=(8,6))
plt.scatter(portfolios['Volatility'], portfolios['Returns'], marker='o', s=10, alpha=0.3)
plt.title("Efficient Frontier")
plt.xlabel("Risk")
plt.ylabel("Returns")
plt.grid()
plt.show()

Each point on the upper-left edge of this plot represents an optimal portfolio, one that maximizes the return for its level of risk. Together, these points form the efficient frontier.

The points (portfolios) in the interior are sub-optimal: for every interior point, there is another portfolio that offers a higher return for the same risk.

The graph also shows the whole range of possible weight combinations: the portfolio with minimum volatility (leftmost point), the one with maximum return (topmost point), and everything in between.
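
The points on that upper-left edge can be picked out of the simulated portfolios programmatically. A minimal sketch (the function name `efficient_frontier_points` is hypothetical): sort the portfolios by volatility and keep each one whose return beats every portfolio seen so far.

```python
import numpy as np

def efficient_frontier_points(vol, ret):
    """Return indices of simulated portfolios on the upper-left boundary.

    After sorting by volatility, a portfolio is kept only if its return is
    higher than the return of every portfolio that comes before it. Sorting
    plus one pass makes this O(n log n).
    """
    vol = np.asarray(vol)
    ret = np.asarray(ret)
    order = np.argsort(vol, kind="stable")
    keep = []
    best_ret = -np.inf
    for idx in order:
        if ret[idx] > best_ret:
            keep.append(idx)
            best_ret = ret[idx]
    return np.array(keep, dtype=int)

# Tiny hypothetical example: (volatility, return) pairs
vol = [0.10, 0.12, 0.15, 0.15, 0.20]
ret = [0.05, 0.04, 0.08, 0.06, 0.07]
print(efficient_frontier_points(vol, ret))   # [0 2]
```

With the notebook's DataFrame, the same function could be applied as `portfolios.iloc[efficient_frontier_points(portfolios['Volatility'], portfolios['Returns'])]` to get only the frontier portfolios.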

**Minimum Variance Portfolio**

In [None]:
# Minimum variance portfolio
min_var_port = portfolios.iloc[np.argmin(portfolios['Volatility'])]
min_var_port
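
The simulated minimum variance portfolio is only the best of 10,000 random long-only portfolios, so it approximates the true minimum. For comparison, a sketch of the closed-form global minimum variance portfolio, $w = \Sigma^{-1}\mathbf{1} / (\mathbf{1}^T \Sigma^{-1} \mathbf{1})$, on a hypothetical covariance matrix. Note that this formula allows short selling (negative weights), which the simulation does not.

```python
import numpy as np

# Hypothetical 3-asset covariance matrix of daily returns (illustrative only)
cov = np.array([
    [4.0e-4, 1.0e-4, 0.5e-4],
    [1.0e-4, 3.0e-4, 0.2e-4],
    [0.5e-4, 0.2e-4, 2.0e-4],
])
ones = np.ones(len(cov))

# Closed form (short selling allowed): w = Sigma^{-1} 1 / (1^T Sigma^{-1} 1)
inv_times_ones = np.linalg.solve(cov, ones)   # Sigma^{-1} 1 without forming the inverse
w_min = inv_times_ones / inv_times_ones.sum()

min_var = w_min @ cov @ w_min
print(w_min, w_min.sum())

# Sanity check: random long-only portfolios should never have a lower variance
rng = np.random.default_rng(0)
W = rng.random((5000, len(cov)))
W = W / W.sum(axis=1, keepdims=True)
random_vars = np.einsum("ij,jk,ik->i", W, cov, W)
print(min_var <= random_vars.min())
```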

In [None]:
# Weights for Minimum variance portfolio
sns.barplot(x = min_var_port.iloc[2:].index, y = min_var_port.iloc[2:])
plt.ylabel("Weights")
plt.show()

The minimum volatility portfolio has weights of about 25% for Apple, 4% for Amazon, 30% for Google and 40% for Nike (the exact values depend on the downloaded data). This point can be plotted on the efficient frontier graph as shown:

In [None]:
# Visualize Efficient frontier
plt.subplots(figsize=[8,6])
plt.scatter(portfolios['Volatility'], portfolios['Returns'], marker='o', s=10, alpha=0.3)

# Visualize Minimum variance portfolio
plt.scatter(min_var_port['Volatility'], min_var_port['Returns'], color='r', marker='*', s=300)
plt.title("Efficient Frontier")
plt.xlabel("Risk")
plt.ylabel("Returns")
plt.grid()
plt.show()

The red star denotes the portfolio with the minimum volatility. Note that any point to the right of the efficient frontier boundary is a sub-optimal portfolio.

We have found the portfolio with minimum volatility, but its return is fairly low. We want to maximize the return, even if that means accepting some additional risk. This is where a measure called the Sharpe ratio comes in.

The Sharpe ratio is the average return earned ($E(r_p)$) in excess of the risk-free rate ($r_f$) per unit of volatility ($\sigma_p$) or total risk.

$$\text{Sharpe ratio} = \frac{E(r_p) - r_f}{\sigma_p}$$
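
A small worked example with hypothetical yearly figures shows how the Sharpe ratio can favour a portfolio with a lower return: portfolio A has $(0.15 - 0.01)/0.20 = 0.7$, while portfolio B has $(0.25 - 0.01)/0.40 = 0.6$.

```python
def sharpe_ratio(expected_return, volatility, risk_free_rate):
    """(E(r_p) - r_f) / sigma_p, with all inputs on the same (e.g. yearly) scale."""
    return (expected_return - risk_free_rate) / volatility

# Hypothetical yearly figures for two portfolios (illustrative only)
a = sharpe_ratio(0.15, 0.20, 0.01)   # (0.15 - 0.01) / 0.20
b = sharpe_ratio(0.25, 0.40, 0.01)   # (0.25 - 0.01) / 0.40
# A earns less than B, but it earns more excess return per unit of risk
print(a, b)
```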

**Tangency Portfolio**

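The tangency portfolio, also called the optimal risky portfolio, is the portfolio with the highest Sharpe ratio. Its name comes from the fact that it is the point where a straight line drawn from the risk-free rate on the return axis (the capital allocation line) just touches, or is tangent to, the efficient frontier.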
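
If short selling is allowed, the tangency portfolio also has a closed form, $w \propto \Sigma^{-1}(\mu - r_f \mathbf{1})$, normalized so the weights sum to 1. This sketch uses hypothetical yearly expected returns and a hypothetical covariance matrix, and assumes $\Sigma$ is invertible and that $\Sigma^{-1}(\mu - r_f \mathbf{1})$ has a positive sum. Unlike the long-only simulation, this closed form can produce negative weights.

```python
import numpy as np

# Hypothetical yearly expected returns and yearly covariance matrix (illustrative only)
mu = np.array([0.12, 0.10, 0.07])
cov = np.array([
    [0.040, 0.010, 0.005],
    [0.010, 0.030, 0.002],
    [0.005, 0.002, 0.020],
])
rf = 0.01

def sharpe(w):
    return (w @ mu - rf) / np.sqrt(w @ cov @ w)

# Closed form (short selling allowed): w proportional to Sigma^{-1} (mu - rf)
raw = np.linalg.solve(cov, mu - rf)
w_tan = raw / raw.sum()
print(w_tan, sharpe(w_tan))

# No random long-only portfolio should beat it
rng = np.random.default_rng(0)
W = rng.random((5000, len(mu)))
W = W / W.sum(axis=1, keepdims=True)
best_random = max(sharpe(w) for w in W)
print(sharpe(w_tan) >= best_random)
```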

Let's find the optimal portfolio for our case:

In [None]:
# Risk-free rate (assumed to be 1% per year)
rf = 0.01
# Sharpe ratios for different weight combinations
sharpe_ratios = (portfolios['Returns'] - rf)/portfolios['Volatility']

# Highest sharpe ratio
optimal_idx = np.argmax(sharpe_ratios)
print("Highest Sharpe ratio: ", sharpe_ratios[optimal_idx])

In [None]:
# Tangent portfolio
optimal_risky_port = portfolios.iloc[optimal_idx]
optimal_risky_port

In [None]:
# Weights for Tangency portfolio
sns.barplot(x = optimal_risky_port.iloc[2:].index, y = optimal_risky_port.iloc[2:])
plt.ylabel("Weights")
plt.show()

In [None]:
# Difference between the tangency portfolio and the minimum variance portfolio
optimal_risky_port - min_var_port

From the above results, we can see that the optimal risky portfolio has only slightly more risk than the minimum volatility portfolio, but a considerably higher return.

We can also plot this point on efficient frontier graph.

In [None]:
# Visualize Efficient frontier
plt.subplots(figsize=[8,6])
plt.scatter(portfolios['Volatility'], portfolios['Returns'], marker='o', s=10, alpha=0.3)

# Visualize Minimum variance portfolio
plt.scatter(min_var_port['Volatility'], min_var_port['Returns'], color='r', marker='*', s=300)

# Visualize optimal or tangent portfolio
plt.scatter(optimal_risky_port['Volatility'], optimal_risky_port['Returns'], color='g', marker='*', s=300)
plt.title("Efficient Frontier")
plt.xlabel("Risk")
plt.ylabel("Returns")
plt.grid()
plt.show()

In the above plot, the green star represents the optimal risky portfolio.

### Please answer the questions below to complete the experiment:




In [None]:
#@title The tangency portfolio is defined as the portfolio that has the { run: "auto", form-width: "500px", display-mode: "form" }
Answer = "" #@param ["", "Highest returns", "Lowest variance", "Highest Sharpe ratio"]

In [None]:
#@title How was the experiment? { run: "auto", form-width: "500px", display-mode: "form" }
Complexity = "" #@param ["","Too Simple, I am wasting time", "Good, But Not Challenging for me", "Good and Challenging for me", "Was Tough, but I did it", "Too Difficult for me"]

In [None]:
#@title If it was too easy, what more would you have liked to be added? If it was very difficult, what would you have liked to have been removed? { run: "auto", display-mode: "form" }
Additional = "" #@param {type:"string"}

In [None]:
#@title Can you identify the concepts from the lecture which this experiment covered? { run: "auto", vertical-output: true, display-mode: "form" }
Concepts = "" #@param ["","Yes", "No"]

In [None]:
#@title  Text and image description/explanation and code comments within the experiment: { run: "auto", vertical-output: true, display-mode: "form" }
Comments = "" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]

In [None]:
#@title Mentor Support: { run: "auto", vertical-output: true, display-mode: "form" }
Mentor_support = "" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]

In [None]:
#@title Run this cell to submit your notebook for grading { vertical-output: true }
try:
  if submission_id:
      return_id = submit_notebook()
      if return_id : submission_id = return_id
  else:
      print("Please complete the setup first.")
except NameError:
  print ("Please complete the setup first.")