# Markowitz Efficient Frontier on NSE Stocks

Source: https://towardsdatascience.com/python-markowitz-optimization-b5e1623060f5
The credit for this notebook goes to Fábio Neves and his posts on Medium. This is my first notebook, and I wrote it primarily to learn analytics. The comments may seem excessive, but I added them so that I can use this notebook for future reference.
Thanks for going through this!

## What Is the Efficient Frontier?

The efficient frontier is the set of optimal portfolios that offer the highest expected return for a defined level of risk or the lowest risk for a given level of expected return. Portfolios that lie below the efficient frontier are sub-optimal because they do not provide enough return for the level of risk. Portfolios that cluster to the right of the efficient frontier are sub-optimal because they have a higher level of risk for the defined rate of return.
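
For reference, the definition above can be written as a small optimization problem. This is a sketch in my own notation, not taken from the source article: w is the vector of portfolio weights, μ the vector of expected annual returns, Σ the annualised covariance matrix and r* a chosen target return. The long-only bounds on w are an assumption that matches the (0,1) bounds used later in this notebook.

```latex
\min_{w} \; \sqrt{w^{\top} \Sigma \, w}
\quad \text{subject to} \quad
w^{\top} \mu = r^{*}, \qquad
\sum_{i} w_i = 1, \qquad
0 \le w_i \le 1
```

Solving this for a range of values of r* and plotting the minimum volatility against r* traces the frontier.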

I'm using data for the NSE NIFTY50 stocks to prepare a Markowitz Efficient Frontier.


In [None]:
#Importing necessary libraries
import os
import pandas as pd
import numpy as np
import scipy
import matplotlib.pyplot as plt
from scipy.optimize import minimize

### Data Exploration
The data is available as independent .csv files in the folder, each with its own Open, Close, etc. since 2000. For this problem I need to compile the closing prices of all the stocks into a single DataFrame, stocks.
For this, I have used pandas' DataFrame.join() in a loop to merge the 5 closing-price columns into a single DataFrame.

I am limiting the number of stocks to 5, but this can be extended to all 50 stocks.
The code is written so that changing the value of 5 below to the number of stocks you want in the portfolio is enough to create the efficient frontier. It requires no other changes.


In [None]:
#The final dataframe
stocks=pd.DataFrame()
for dirname, _, filenames in os.walk('/kaggle/input'):
    #just change the 5 to 50 to utilise all 50 stocks
    for filename in filenames[:5]:
        data=pd.DataFrame()
        #reading the csv files one at a time, keeping only the 'Date' and 'Close' columns via a callable passed to usecols
        data = pd.read_csv(os.path.join(dirname, filename), usecols=lambda x: x in ['Date', 'Close'], parse_dates=True)
        #renaming the Close column to the Stock-name i.e., filename
        data.rename(columns = {'Close':(filename.replace('.csv',''))}, inplace = True)
        data.set_index('Date',inplace=True)
        data.index = pd.to_datetime(data.index)
        #join returns a new DataFrame; it does not modify the old one in place
        stocks=stocks.join(data, how='outer')

stocks.head()
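
To see what the outer alignment does when stocks have different trading dates, here is a minimal sketch on two made-up price series. The names STOCK_A and STOCK_B and all values are hypothetical. It uses pd.concat with axis=1 rather than the join loop above; this is a different call, but it should produce the same outer alignment on the date index.

```python
import pandas as pd

# Two hypothetical closing-price series with partly different dates
a = pd.DataFrame(
    {"STOCK_A": [100.0, 101.0, 102.0]},
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
)
b = pd.DataFrame(
    {"STOCK_B": [50.0, 51.0]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03"]),
)

# axis=1 places the frames side by side; the default join is 'outer',
# so every date is kept and missing prices become NaN
combined = pd.concat([a, b], axis=1)
print(combined)
```

Dates on which a stock has no price show up as NaN, which is why the first rows of stocks can contain missing values for stocks listed later.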

### Normalizing Data
I’m going to use logarithmic returns, since it’s more convenient and it takes care of the normalization for the rest of the project. Converting everything to logarithmic returns is simple. Think of it as the log of an arithmetic daily return (which is obtained by dividing the price at day n, by the price at day n-1).

In [None]:
log_ret = np.log(stocks/stocks.shift(1))
log_ret.head()
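
One reason log returns are convenient is that they add up over time. The sketch below checks this on a short, made-up price series (the values are hypothetical); the names prices and log_ret_demo are only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices over four days
prices = pd.Series([100.0, 110.0, 99.0, 105.0])

# Daily log return: log(price_n / price_(n-1)); the first value is NaN
log_ret_demo = np.log(prices / prices.shift(1))

# Summing the daily log returns (NaN is skipped by default) gives the
# log return over the whole period
total_from_daily = log_ret_demo.sum()
total_direct = np.log(prices.iloc[-1] / prices.iloc[0])

print(total_from_daily, total_direct)
```

The first row is NaN because there is no previous price to compare with, which is also what log_ret.head() shows.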

We need a for loop that simulates many different weight combinations of the 5 stocks and saves the Sharpe ratio of each. I’m going to use 10000 portfolios. The random seed at the top of the code ensures I get the same random numbers every time, for reproducibility.

In [None]:
np.random.seed(42)
num_ports = 10000
all_weights = np.zeros((num_ports, len(stocks.columns)))
ret_arr = np.zeros(num_ports)
vol_arr = np.zeros(num_ports)
sharpe_arr = np.zeros(num_ports)

for x in range(num_ports):
    # Weights
    weights = np.array(np.random.random(len(stocks.columns)))
    weights = weights/np.sum(weights)
    
    # Save weights
    all_weights[x,:] = weights
    
    # Expected yearly-return
    ret_arr[x] = np.sum((log_ret.mean() * weights * 252))
    
    # Expected volatility
    vol_arr[x] = np.sqrt(np.dot(weights.T, np.dot(log_ret.cov()*252, weights)))
    
    # Sharpe Ratio (the risk-free rate is taken as zero here)
    sharpe_arr[x] = ret_arr[x]/vol_arr[x]
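
The loop above recomputes log_ret.mean() and log_ret.cov() on every iteration. A possible alternative, sketched below under stated assumptions, computes them once and evaluates all portfolios with array operations. The returns here are randomly generated stand-ins for log_ret, and the names daily, mean_annual and cov_annual are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical daily log returns for 5 assets (stand-in for log_ret)
daily = rng.normal(loc=0.0005, scale=0.01, size=(1000, 5))

# Annualised mean vector and covariance matrix, computed only once
mean_annual = daily.mean(axis=0) * 252
cov_annual = np.cov(daily, rowvar=False) * 252

# One row of random weights per portfolio, each row summing to 1
num_ports = 10000
w = rng.random((num_ports, 5))
w /= w.sum(axis=1, keepdims=True)

# Return, volatility and Sharpe ratio for every portfolio at once
ret = w @ mean_annual
vol = np.sqrt(np.einsum("ij,jk,ik->i", w, cov_annual, w))
sharpe = ret / vol

best = sharpe.argmax()
print(w[best], ret[best], vol[best], sharpe[best])
```

The einsum expression computes the same quadratic form as np.dot(weights.T, np.dot(cov, weights)), once per row of w.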

Let us find out the maximum Sharpe Ratio among the given portfolios.

In [None]:
print("Max Sharpe Ratio in the array: {}".format(sharpe_arr.max()))
print("Its location in the array: {}".format(sharpe_arr.argmax()))

Let’s check the allocation weights at that index for the maximum SR and save the return and volatility figures to use in the chart later.

In [None]:
print(all_weights[sharpe_arr.argmax(),:])

max_sr_ret = ret_arr[sharpe_arr.argmax()]
max_sr_vol = vol_arr[sharpe_arr.argmax()]

## Plot the portfolios:

We have everything we need to plot a chart that compares all combinations in terms of volatility (or risk) and return, colored by Sharpe ratio. 

The red dot is obtained from the previous calculation above and it represents the return and volatility for the simulation with the maximum Sharpe ratio.

In [None]:
plt.figure(figsize=(15,8))
plt.scatter(vol_arr, ret_arr, c=sharpe_arr, cmap='viridis')
plt.colorbar(label='Sharpe Ratio')
plt.xlabel('Volatility')
plt.ylabel('Return')
plt.scatter(max_sr_vol, max_sr_ret,c='red', s=50) # red dot
plt.show()


The basic bullet shape can be observed in the given plot, which sort of outlines the efficient frontier for the given stocks.

## Important Functions

get_ret_vol_sr: returns an array with the annualised return, volatility and Sharpe ratio for the given set of weights.
neg_sharpe: returns the negative Sharpe ratio for the given weights.
check_sum: checks whether the weights sum to 1. It returns 0 (zero) if the sum is 1.

In [None]:
def get_ret_vol_sr(weights):
    weights= np.array(weights)
    ret=np.sum(log_ret.mean() * weights)*252
    vol=np.sqrt(np.dot(weights.T, np.dot(log_ret.cov()*252, weights)))
    sr=ret/vol
    return np.array([ret, vol, sr])

def neg_sharpe(weights):
    return get_ret_vol_sr(weights)[2]*-1

def check_sum(weights):
    return np.sum(weights)-1
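
The volatility line in get_ret_vol_sr is the matrix form of the portfolio variance. As a small worked example, here is the two-asset case with a made-up covariance matrix (all numbers are hypothetical), comparing the matrix form with the expanded textbook formula.

```python
import numpy as np

# Hypothetical weights and annualised covariance matrix for two assets
weights = np.array([0.6, 0.4])
cov = np.array([[0.040, 0.006],
                [0.006, 0.090]])

# Matrix form, as used in get_ret_vol_sr: sqrt(w^T * Cov * w)
vol_matrix = np.sqrt(weights @ cov @ weights)

# Expanded form: w1^2*var1 + w2^2*var2 + 2*w1*w2*cov12
var_expanded = 0.6**2 * 0.040 + 0.4**2 * 0.090 + 2 * 0.6 * 0.4 * 0.006
vol_expanded = np.sqrt(var_expanded)

print(vol_matrix, vol_expanded)
```

Both forms give a variance of 0.03168, so the two volatilities agree.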

Moving on, we will need to create a variable to hold our constraints, such as check_sum. We’ll also define an initial guess and bounds to help the minimization run faster. Our initial guess will be an equal weight of 1/n for each stock (1/5 in this case), and the bounds will be a tuple (0,1) for each stock, since each weight can range from 0 to 1.

In [None]:
#constraints
cons = ({'type':'eq', 'fun': check_sum})
n_cols=len(log_ret.columns)
#bounds
bnds = (((0,1),)*n_cols)
#initial guess
#a 1-D array; recent SciPy versions reject an initial guess with more than one dimension
init_guess = np.repeat(1/n_cols, n_cols)

Enter the minimize function. I chose the method ‘SLSQP’ because it is a general-purpose method that supports both bounds and equality constraints. In case you are wondering, it stands for Sequential Least Squares Programming. Make sure to pass the initial guess, the bounds and the constraints using the variables defined above. If we print the result, it will look like this:

In [None]:
opt_results = minimize(neg_sharpe, init_guess, method = 'SLSQP', bounds = bnds, constraints = cons)
print(opt_results)
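
Before using the weights, it is worth checking that the optimizer reports success. The sketch below repeats the same kind of call on randomly generated returns, so it can be run on its own. The data and the names mu, sigma and neg_sharpe_demo are hypothetical stand-ins, not the notebook's variables.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Hypothetical daily log returns for 4 assets
daily = rng.normal(0.0005, 0.01, size=(1000, 4))
mu = daily.mean(axis=0) * 252
sigma = np.cov(daily, rowvar=False) * 252

def neg_sharpe_demo(w):
    # Negative Sharpe ratio with a zero risk-free rate
    return -(w @ mu) / np.sqrt(w @ sigma @ w)

n = len(mu)
x0 = np.full(n, 1 / n)  # equal weights as a 1-D initial guess

res = minimize(
    neg_sharpe_demo,
    x0,
    method="SLSQP",
    bounds=[(0, 1)] * n,
    constraints={"type": "eq", "fun": lambda w: np.sum(w) - 1},
)

if res.success:
    print("weights:", res.x.round(3), "Sharpe:", -res.fun)
else:
    print("optimizer did not converge:", res.message)
```

The attributes success, message, x and fun are standard fields of the result object returned by minimize.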

We want the key x from the result, which is an array with the weights of the portfolio that has the maximum Sharpe ratio. If we pass it to our function get_ret_vol_sr, we get the return, volatility, and Sharpe ratio:

In [None]:
get_ret_vol_sr(opt_results.x)

So the optimizer gives a better Sharpe ratio than the simulation did (0.66 vs 0.62 earlier).

We’re now ready to check all optimal portfolios, which is basically our efficient frontier. The efficient frontier is the set of portfolios that gets us the highest expected return for any given risk level. Or, from another perspective, the minimum amount of risk for a given expected return. To trace this line, we can define a variable frontier_y. Going back to the chart above, we can see the maximum return doesn’t go much higher than 0.25, so frontier_y will be defined from 0 to 0.25.

In [None]:
frontier_y = np.linspace(0,0.25,200)

To finish plotting the frontier, we have to define one last function that will help us minimize the volatility. It returns the volatility (index 1) for the given weights.

In [None]:
def minimize_volatility(weights):
    return get_ret_vol_sr(weights)[1]

And now the last bit of code, which gets our x values for the efficient frontier. We use the same code as above with a few changes to the constraints. The for loop goes through every value in our previously defined frontier_y and obtains the minimum volatility (the key ‘fun’ of the result), which is our x axis in the chart.

In [None]:
frontier_x = []

for possible_return in frontier_y:
    cons = ({'type':'eq', 'fun':check_sum},
            {'type':'eq', 'fun': lambda w: get_ret_vol_sr(w)[0] - possible_return})
    
    result = minimize(minimize_volatility,init_guess,method='SLSQP', bounds=bnds, constraints=cons)
    frontier_x.append(result['fun'])
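
One thing to watch for in a loop like this: with long-only weights, a target return above the best single stock or below the worst one cannot be reached, and the value in ‘fun’ from a failed run is not meaningful. The sketch below is a hedged variant on randomly generated returns. The data and the names mu, sigma, port_vol and targets are hypothetical. It restricts the targets to the attainable range, binds each target to its lambda through a default argument, and stores NaN when the optimizer reports failure.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Hypothetical daily log returns for 4 assets
daily = rng.normal(0.0005, 0.01, size=(1000, 4))
mu = daily.mean(axis=0) * 252
sigma = np.cov(daily, rowvar=False) * 252
n = len(mu)

def port_vol(w):
    # Annualised portfolio volatility
    return np.sqrt(w @ sigma @ w)

# With weights in [0, 1] summing to 1, the portfolio return always lies
# between the smallest and largest single-asset return
targets = np.linspace(mu.min(), mu.max(), 20)

frontier_vol = []
for target in targets:
    cons = (
        {"type": "eq", "fun": lambda w: np.sum(w) - 1},
        # t=target freezes the current target inside the lambda
        {"type": "eq", "fun": lambda w, t=target: w @ mu - t},
    )
    res = minimize(port_vol, np.full(n, 1 / n), method="SLSQP",
                   bounds=[(0, 1)] * n, constraints=cons)
    # Keep the volatility only if the optimizer converged
    frontier_vol.append(res.fun if res.success else np.nan)

print(np.round(frontier_vol, 4))
```

Matplotlib leaves gaps where the values are NaN, so failed targets simply do not appear on the frontier line.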

Finally we can plot the actual efficient frontier by passing the variables frontier_x and frontier_y.

In [None]:
plt.figure(figsize=(12,8))
plt.scatter(vol_arr, ret_arr, c=sharpe_arr, cmap='viridis')
plt.colorbar(label='Sharpe Ratio')
plt.xlabel('Volatility')
plt.ylabel('Return')
plt.plot(frontier_x,frontier_y, 'r--', linewidth=3)
plt.savefig('cover.png')
plt.show()