# Portfolio Optimization

Portfolio optimization helps risk-averse investors construct portfolios that maximize expected return for a given level of market risk, recognizing that risk is an inherent part of higher reward.

This notebook:
1. Runs an example Monte Carlo simulation to search for an optimal portfolio and reports the resulting returns
2. Creates an efficient frontier, which identifies the set of optimal portfolios that offer the highest expected return for a defined level of risk or the lowest risk for a given level of expected return

## Monte Carlo Simulation for Optimization Search


Analysts use Monte Carlo simulations to estimate the expected value and optimal allocation of a portfolio.

In [None]:
from io import BytesIO

import numpy as np
import pandas as pd
import hvplot.pandas  # noqa
import panel as pn

In [None]:
stocks = pd.read_csv('./data/stocks.csv', index_col='Date', parse_dates=True)

In [None]:
stocks.head()

In [None]:
mean_daily_ret = stocks.pct_change(1).mean()
mean_daily_ret

In [None]:
stocks.pct_change(1).corr()

# Simulating Thousands of Possible Allocations

In [None]:
stocks.head()

In [None]:
stock_normed = stocks/stocks.iloc[0]
timeseries = stock_normed.hvplot()
timeseries

In [None]:
stock_daily_ret = stocks.pct_change(1)
stock_daily_ret.head()

## Log Returns vs Arithmetic Returns

We will now switch from arithmetic returns to log returns. For many of our use cases the two are almost the same, but most technical analyses require detrending or normalizing the time series, and log returns are a convenient way to do that.
Log returns are also convenient to work with in many of the algorithms we will encounter.

For a full analysis of why we use log returns, check [this great article](https://quantivity.wordpress.com/2011/02/21/why-log-returns/).
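
As a small illustration of the points above, the following sketch compares the two return definitions on a made-up price series. The prices and the names `prices`, `arith_ret`, and `log_ret_demo` are assumptions for illustration and are not taken from `stocks.csv`. It checks that the two definitions are close for small daily moves and that log returns add up over time.

```python
import numpy as np

# Hypothetical closing prices, for illustration only
prices = np.array([100.0, 101.0, 99.5, 100.2, 102.0])

# Arithmetic (simple) returns: P_t / P_{t-1} - 1
arith_ret = prices[1:] / prices[:-1] - 1

# Log returns: ln(P_t / P_{t-1})
log_ret_demo = np.log(prices[1:] / prices[:-1])

# For small moves the two definitions nearly coincide
max_gap = np.max(np.abs(arith_ret - log_ret_demo))

# Log returns are additive over time: their sum equals the log of the
# total price ratio, whereas arithmetic returns must be compounded
total_log_ret = log_ret_demo.sum()
total_from_prices = np.log(prices[-1] / prices[0])

print(max_gap)
print(total_log_ret, total_from_prices)
```

The additivity is one reason log returns are easy to aggregate, for example when annualizing a mean daily return by multiplying by the number of trading days.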


In [None]:
log_ret = np.log(stocks/stocks.shift(1))
log_ret.head()

In [None]:
log_ret_hists = log_ret.hvplot.hist(bins=100, subplots=True, width=400, group_label='Ticker', grid=True).cols(2)
log_ret_hists

In [None]:
log_ret.describe().transpose()

In [None]:
log_ret.mean() * 252

In [None]:
# Compute pairwise covariance of columns
log_ret.cov()

In [None]:
log_ret.cov() * 252  # annualize: multiply by the number of trading days per year

## Single Run for Some Random Allocation
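
The cell below computes three quantities for one weight vector. In the notation assumed here (the symbols are introduced for illustration and do not appear in the code), $w$ is the vector of portfolio weights, $\mu$ the vector of mean daily log returns, and $\Sigma$ the daily covariance matrix of log returns, with 252 trading days used to annualize:

```latex
\begin{aligned}
\mathbb{E}[R_p] &= 252 \, w^\top \mu \\
\sigma_p &= \sqrt{252 \, w^\top \Sigma \, w} \\
\mathrm{SR} &= \frac{\mathbb{E}[R_p]}{\sigma_p}
\end{aligned}
```

The Sharpe ratio as computed in this notebook subtracts no risk-free rate, which amounts to assuming a risk-free rate of zero.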

In [None]:
# Set seed (optional)
np.random.seed(101)

# Stock Columns
print('Stocks')
print(stocks.columns)
print('\n')

# Create Random Weights
print('Creating Random Weights')
weights = np.random.random(len(stocks.columns))  # one weight per stock
print(weights)
print('\n')

# Rebalance Weights
print('Rebalance to sum to 1.0')
weights = weights / np.sum(weights)
print(weights)
print('\n')

# Expected Return
print('Expected Portfolio Return')
exp_ret = np.sum(log_ret.mean() * weights) *252
print(exp_ret)
print('\n')

# Expected Volatility
print('Expected Volatility')
exp_vol = np.sqrt(np.dot(weights.T, np.dot(log_ret.cov() * 252, weights)))
print(exp_vol)
print('\n')

# Sharpe Ratio
SR = exp_ret/exp_vol
print('Sharpe Ratio')
print(SR)


Great! Now we can just run this many times over!
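
Before running the loop on the real data, here is a minimal, self-contained sketch of the same idea on synthetic returns, written in vectorized form. The return series, the seed, and the names `fake_log_ret`, `n_assets`, and `n_ports` are assumptions made for illustration; the computation mirrors the single run above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily log returns for 4 hypothetical assets (not real data)
n_days, n_assets, n_ports = 500, 4, 1000
fake_log_ret = rng.normal(loc=0.0005, scale=0.01, size=(n_days, n_assets))

mean_ret = fake_log_ret.mean(axis=0)                    # mean daily log return per asset
cov_annual = np.cov(fake_log_ret, rowvar=False) * 252   # annualized covariance matrix

# Draw random weights and normalize each row so it sums to 1
w = rng.random((n_ports, n_assets))
w /= w.sum(axis=1, keepdims=True)

# Annualized expected return and volatility of every random portfolio
ret = w @ mean_ret * 252
vol = np.sqrt(np.einsum('ij,jk,ik->i', w, cov_annual, w))
sharpe = ret / vol  # no risk-free rate subtracted, as in the notebook

# Portfolio with the highest Sharpe ratio among the random draws
best = sharpe.argmax()
print(w[best], ret[best], vol[best], sharpe[best])
```

The `einsum` call evaluates the quadratic form for every row of `w` at once, which avoids the Python-level loop at the cost of being slightly less readable.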

In [None]:
num_ports = 15000

all_weights = np.zeros((num_ports,len(stocks.columns)))
ret_arr = np.zeros(num_ports)
vol_arr = np.zeros(num_ports)
sharpe_arr = np.zeros(num_ports)

for ind in range(num_ports):

    # Create Random Weights
    weights = np.random.random(len(stocks.columns))

    # Rebalance Weights
    weights = weights / np.sum(weights)
    
    # Save Weights
    all_weights[ind,:] = weights

    # Expected Return
    ret_arr[ind] = np.sum((log_ret.mean() * weights) *252)

    # Expected Volatility
    vol_arr[ind] = np.sqrt(np.dot(weights.T, np.dot(log_ret.cov() * 252, weights)))

    # Sharpe Ratio
    sharpe_arr[ind] = ret_arr[ind]/vol_arr[ind]

In [None]:
sharpe_arr.max()

In [None]:
sharpe_arr.argmax()

In [None]:
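# Look up the weights at the index of the highest Sharpe ratio instead of hard-coding it
all_weights[sharpe_arr.argmax(), :]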

In [None]:
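# Return and volatility of the portfolio with the highest Sharpe ratio
max_sr_idx = sharpe_arr.argmax()
max_sr_ret = ret_arr[max_sr_idx]
max_sr_vol = vol_arr[max_sr_idx]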

## Plotting the data

In [None]:
import holoviews as hv

In [None]:
scatter = hv.Scatter((vol_arr, ret_arr, sharpe_arr), 'Volatility', ['Return', 'Sharpe Ratio'])
max_sharpe = hv.Scatter([(max_sr_vol,max_sr_ret)])

scatter.opts(color='Sharpe Ratio', cmap='plasma', width=600, height=400, colorbar=True, padding=0.1) *\
max_sharpe.opts(color='red', line_color='black', size=10)

# Mathematical Optimization

There are much better ways to find good allocation weights than just guess and check! We can use optimization functions to find the ideal weights mathematically!

### Functionalize Return and SR operations

In [None]:
def get_ret_vol_sr(weights):
    """
    Takes in weights, returns an array of return, volatility, and Sharpe ratio
    """
    weights = np.array(weights)
    ret = np.sum(log_ret.mean() * weights) * 252
    vol = np.sqrt(np.dot(weights.T, np.dot(log_ret.cov() * 252, weights)))
    sr = ret/vol
    return np.array([ret,vol,sr])

In [None]:
from scipy.optimize import minimize

To fully understand all the parameters, check out the [`scipy.optimize.minimize` documentation](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html).
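
As a rough sketch of the parameters used in the following cells, the toy problem below minimizes `x**2 + y**2` subject to `x + y = 1`, with both variables bounded to the interval from 0 to 1. The problem and the names `objective` and `sum_to_one` are assumptions for illustration and are unrelated to the stock data. An equality constraint is passed as a dict whose `'fun'` returns zero when the constraint is satisfied.

```python
import numpy as np
from scipy.optimize import minimize

def objective(x):
    # Toy objective: squared distance from the origin
    return np.sum(x ** 2)

def sum_to_one(x):
    # Equality constraints must evaluate to 0 when satisfied
    return np.sum(x) - 1

res = minimize(
    objective,
    x0=[0.9, 0.1],                 # initial guess
    method='SLSQP',
    bounds=((0, 1), (0, 1)),       # one (min, max) pair per variable
    constraints={'type': 'eq', 'fun': sum_to_one},
)
print(res.x)  # expected to be close to [0.5, 0.5]
```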

In [None]:
#help(minimize)

Optimization works as a minimization function. Since we actually want to maximize the Sharpe ratio, we need to negate it so that we can minimize the negative Sharpe ratio (the same as maximizing the positive Sharpe ratio).
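
The equivalence can be checked on a toy function. In this sketch the function `f` and the grid `xs` are made up for illustration: the point that maximizes `f` is the same point that minimizes `-f`.

```python
import numpy as np

def f(x):
    # Toy concave function with its maximum at x = 2
    return -(x - 2.0) ** 2 + 3.0

xs = np.linspace(0.0, 4.0, 401)   # evenly spaced grid with step 0.01

idx_max = np.argmax(f(xs))        # index of the maximizer of f
idx_min_neg = np.argmin(-f(xs))   # index of the minimizer of -f

print(xs[idx_max], xs[idx_min_neg])
```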

In [None]:
def neg_sharpe(weights):
    return -get_ret_vol_sr(weights)[2]

In [None]:
# Constraints
def check_sum(weights):
    '''
    Returns 0 if sum of weights is 1.0
    '''
    return np.sum(weights) - 1

In [None]:
# By convention, minimize expects an equality constraint to be a function that returns zero when the condition is met
cons = ({'type':'eq','fun': check_sum})

In [None]:
# 0-1 bounds for each weight
bounds = ((0, 1), (0, 1), (0, 1), (0, 1))

In [None]:
# Initial Guess (equal distribution)
init_guess = [0.25,0.25,0.25,0.25]

In [None]:
# Sequential Least Squares Programming (SLSQP)
opt_results = minimize(neg_sharpe,init_guess,method='SLSQP',bounds=bounds,constraints=cons)

In [None]:
opt_results

In [None]:
opt_results.x

In [None]:
get_ret_vol_sr(opt_results.x)

# All Optimal Portfolios (Efficient Frontier)

The [efficient frontier](https://www.investopedia.com/terms/e/efficientfrontier.asp) is the set of optimal portfolios that offers the highest expected return for a defined level of risk or the lowest risk for a given level of expected return. Portfolios that lie below the efficient frontier are sub-optimal, because they do not provide enough return for the level of risk. Portfolios that cluster to the right of the efficient frontier are also sub-optimal, because they have a higher level of risk for the defined rate of return.
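
Stated as an optimization problem, each point on the curve computed below solves the following. The notation is assumed for illustration: $w$ is the weight vector, $\mu$ the mean daily log returns, $\Sigma$ the daily covariance matrix of log returns, and $r^{*}$ a target annualized return.

```latex
\begin{aligned}
\min_{w} \quad & \sqrt{252 \, w^\top \Sigma \, w} \\
\text{subject to} \quad & \textstyle\sum_i w_i = 1, \\
& 252 \, w^\top \mu = r^{*}, \\
& 0 \le w_i \le 1 \quad \text{for all } i .
\end{aligned}
```

Sweeping $r^{*}$ over a range of target returns and recording the minimized volatility for each one traces out the curve.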

In [None]:
# Our returns go from 0 to somewhere around 0.3
# Create evenly spaced target returns at which to compute the minimum volatility
frontier_y = np.linspace(0,0.3,100) # Change 100 to a lower number for slower computers!

In [None]:
def minimize_volatility(weights):
    return  get_ret_vol_sr(weights)[1] 

In [None]:
frontier_volatility = []

for possible_return in frontier_y:
    # Constraints: weights sum to 1 and the portfolio return equals the target return
    cons = ({'type':'eq','fun': check_sum},
            {'type':'eq','fun': lambda w: get_ret_vol_sr(w)[0] - possible_return})
    
    result = minimize(minimize_volatility,init_guess,method='SLSQP',bounds=bounds,constraints=cons)
    
    frontier_volatility.append(result['fun'])

In [None]:
ef_graph = scatter * hv.Curve((frontier_volatility, frontier_y)).opts(color='green', line_dash='dashed')
ef_graph

Now let's build a Panel app so that the analysis above can be explored interactively.

In [None]:
pn.extension('tabulator', design='material', template='material', loading_indicator=True)
file_input = pn.widgets.FileInput(sizing_mode='stretch_width')
selector = pn.widgets.MultiSelect(
    name='Select stocks', sizing_mode='stretch_width',
    options=stocks.columns.to_list()
)


n_samples = pn.widgets.IntSlider(
    name='Random samples', value=10_000, start=1000, end=20_000, step=1000, sizing_mode='stretch_width'
)
button = pn.widgets.Button(name='Run Analysis', sizing_mode='stretch_width')
posxy = hv.streams.Tap(x=None, y=None)

text = """
#  Portfolio optimization

This application performs portfolio optimization given a set of stock time series.

To optimize your portfolio:

1. Upload a CSV of the daily stock time series for the stocks you are considering
2. Select the stocks to be included.
3. Run the Analysis
4. Click on the Return/Volatility plot to select the desired risk/reward profile

Upload a CSV containing stock data:
"""

explanation = """
The code for this app was taken from [this excellent introduction to Python for Finance](https://github.com/PrateekKumarSingh/Python/tree/master/Python%20for%20Finance/Python-for-Finance-Repo-master).
To learn some of the background and theory about portfolio optimization see [this notebook](https://github.com/PrateekKumarSingh/Python/blob/master/Python%20for%20Finance/Python-for-Finance-Repo-master/09-Python-Finance-Fundamentals/02-Portfolio-Optimization.ipynb).
"""

sidebar = pn.layout.WidgetBox(
    pn.pane.Markdown(text, margin=(0, 10)),
    file_input,
    selector,
    n_samples,
    explanation,
    max_width=350,
    sizing_mode='stretch_width'
).servable(area='sidebar')

sidebar

In [None]:
@pn.cache
def get_stocks(data):
    if data is None:
        stock_file = 'https://datasets.holoviz.org/stocks/v1/stocks.csv'
    else:
        stock_file = BytesIO(data)
    return pd.read_csv(stock_file, index_col='Date', parse_dates=True)


file_input = pn.widgets.FileInput(sizing_mode='stretch_width')

stocks = hvplot.bind(get_stocks, file_input).interactive()

selector = pn.widgets.MultiSelect(
    name='Select stocks', sizing_mode='stretch_width',
    options=stocks.columns.to_list()
)

selected_stocks = stocks.pipe(
    lambda df, cols: df[cols] if cols else df, selector
)

In [None]:
def compute_random_allocations(log_return, num_ports=15000):
    _, ncols = log_return.shape
    
    # Compute the mean daily log return of each column
    mean_return = np.nanmean(log_return, axis=0)
    
    # Allocate normalized weights
    weights = np.random.random((num_ports, ncols))
    normed_weights = (weights.T / np.sum(weights, axis=1)).T
    data = dict(zip(log_return.columns, normed_weights.T))

    # Compute expected return and volatility of random portfolios
    data['Return'] = expected_return = np.sum((mean_return * normed_weights) * 252, axis=1)
    return_covariance = np.cov(log_return[1:], rowvar=False) * 252
    if not return_covariance.shape:
        return_covariance = np.array([[252.]])
    data['Volatility'] = volatility = np.sqrt((normed_weights * np.tensordot(return_covariance, normed_weights.T, axes=1).T).sum(axis=1))
    data['Sharpe'] = expected_return/volatility
    
    df = pd.DataFrame(data)
    df.attrs['mean_return'] = mean_return
    df.attrs['log_return'] = log_return
    return df

def check_sum(weights):
    return np.sum(weights) - 1

def get_return(mean_ret, weights):
    return np.sum(mean_ret * weights) * 252

def get_volatility(log_ret, weights):
    return np.sqrt(np.dot(weights.T, np.dot(np.cov(log_ret[1:], rowvar=False) * 252, weights)))

def compute_frontier(df, n=30):
    frontier_ret = np.linspace(df.Return.min(), df.Return.max(), n)
    frontier_volatility = []

    cols = len(df.columns) - 3
    bounds = tuple((0, 1) for i in range(cols))
    init_guess = [1./cols for i in range(cols)]
    for possible_return in frontier_ret:
        cons = (
            {'type':'eq', 'fun': check_sum},
            {'type':'eq', 'fun': lambda w: get_return(df.attrs['mean_return'], w) - possible_return}
        )
        result = minimize(lambda w: get_volatility(df.attrs['log_return'], w), init_guess, bounds=bounds, constraints=cons)
        frontier_volatility.append(result['fun'])
    return pd.DataFrame({'Volatility': frontier_volatility, 'Return': frontier_ret})

def minimize_difference(weights, des_vol, des_ret, log_ret, mean_ret):
    ret = get_return(mean_ret, weights)
    vol = get_volatility(log_ret, weights)
    return abs(des_ret-ret) + abs(des_vol-vol)

@pn.cache
def find_best_allocation(log_return, vol, ret):
    cols = log_return.shape[1]
    vol = vol or 0
    ret = ret or 0
    mean_return = np.nanmean(log_return, axis=0)
    bounds = tuple((0, 1) for i in range(cols))
    init_guess = [1./cols for i in range(cols)]
    cons = (
        {'type':'eq','fun': check_sum},
        {'type':'eq','fun': lambda w: get_return(mean_return, w) - ret},
        {'type':'eq','fun': lambda w: get_volatility(log_return, w) - vol}
    )
    opt = minimize(
        minimize_difference, init_guess, args=(vol, ret, log_return, mean_return),
        bounds=bounds, constraints=cons
    )
    ret = get_return(mean_return, opt.x)
    vol = get_volatility(log_return, opt.x)
    return pd.Series(list(opt.x)+[ret, vol], index=list(log_return.columns)+['Return', 'Volatility'], name='Weight')

Overlaying `ef_graph * closest_point` marks the closest point, which gives the weights of each stock shown on the right side. We can compare it to `max_sharpe`, which was calculated earlier.

In [None]:
# Set up data pipelines

log_return = np.log(selected_stocks/selected_stocks.shift(1))
selected_stocks = stocks
closest_allocation = log_return.pipe(find_best_allocation, posxy.param.x, posxy.param.y)

opts = {'x': 'Volatility', 'y': 'Return', 'responsive': True}

closest_point = closest_allocation.to_frame().T.hvplot.scatter(color='green', line_color='black', size=50, **opts).dmap()

ef_graph = ef_graph * closest_point * max_sharpe

summary = pn.pane.Markdown(
    pn.bind(lambda p: f"""
    The selected portfolio has a volatility of {p.Volatility:.2f}, a return of {p.Return:.2f}
    and Sharpe ratio of {p.Return/p.Volatility:.2f}.""", closest_allocation), width=250
)

table = pn.widgets.Tabulator(closest_allocation.to_frame().iloc[:-2])
pn.Row(ef_graph, pn.Column(summary, table), sizing_mode='stretch_both')



In [None]:
investment = pn.widgets.Spinner(name='Investment Value in $', value=5000, step=1000, start=1000, end=100000)
year = pn.widgets.DateRangeSlider(name='Year', value=(stocks.index.min().eval(), stocks.index.max().eval()), start=stocks.index.min(), end=stocks.index.max())

stocks_between_dates = selected_stocks[year.param.value_start:year.param.value_end]
price_on_start_date = selected_stocks[year.param.value_start:].iloc[0]
allocation = (closest_allocation.iloc[:-2] * investment)

performance_plot = (stocks_between_dates * allocation / price_on_start_date).sum(axis=1).rename().hvplot.line(
    ylabel='Total Value ($)', title='Portfolio performance', responsive=True, min_height=400
).dmap()

performance = pn.Column(
    pn.Row(year, investment),
    performance_plot,
    sizing_mode='stretch_both'
)

performance

In [None]:
main = pn.Tabs(
    ('Analysis', pn.Column(
            pn.Row(
                ef_graph, pn.Column(summary, table),
                sizing_mode='stretch_both'
            ),
            performance,
            sizing_mode='stretch_both'
        )
    ),
    ('Timeseries', timeseries),
    ('Log Return', pn.Column(
        '## Daily normalized log returns',
        'Width of distribution indicates volatility and center of distribution the mean daily return.',
        log_ret_hists,
        sizing_mode='stretch_both'
    )),
    sizing_mode='stretch_both', min_height=1000
).servable(title='Portfolio Optimizer')

pn.Row(sidebar, main)