<img src="images/inmas.png" width=130x align='right' />

# Exercise 18 - SciPy Basics


### Prerequisite
Notebook 18

----

### Housekeeping (if needed)

In [None]:
import numpy as np
import matplotlib.pyplot as plt


### 1.
- Rewrite the Lotka-Volterra example so that the derivative function is a class that contains all the coefficients. (Hint: Use the `__call__` method)
- Which approach do you think is better?
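
As a hint only, the `__call__` pattern can be sketched on a simpler equation (exponential decay, not the Lotka-Volterra system). The class name `Decay`, its `rate` attribute, and the use of `solve_ivp` with the `f(t, y)` argument order are assumptions made for this illustration; Notebook 18 may use a different solver or argument order.

```python
import numpy as np
from scipy.integrate import solve_ivp


class Decay:
    '''Right-hand side of dy/dt = -rate * y, with the coefficient stored on the instance.'''

    def __init__(self, rate):
        self.rate = rate

    def __call__(self, t, y):
        # Called by the solver exactly like a plain function f(t, y).
        return -self.rate * y


rhs = Decay(rate=0.5)
sol = solve_ivp(rhs, (0.0, 2.0), [1.0], rtol=1e-8, atol=1e-10)
y_end = sol.y[0, -1]
print('y(2) =', y_end, ' exact =', np.exp(-1.0))
```

The instance carries its own coefficients, so several parameter sets can coexist without global variables or extra `args` tuples.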

### 2.
- Import `ode` from the `scipy.integrate` submodule
- Read the help for `ode` (`help(ode)`)


### 3.*
Solve the Lotka-Volterra example, but this time using the `ode` class.

### 4.
This exercise is meant to expose you to a larger Python program which contains all the elements we learned so far:
- Python Scientific Software Stack (pandas, SciPy, NumPy, Matplotlib)
- The use of arrays, lambda expressions, linear algebra, optimization

By using a script that is longer and more complex than what can be achieved in a few lines, this exercise will better demonstrate the importance of coding structure and style.
It also introduces the concept of mean-variance optimization, an approach introduced in the 1950s by Markowitz.
This, and further refinements of this theory, led to multiple Nobel Prizes in Economics.

#### 4.1
Read the formulation of Markowitz Portfolio Optimization.

#### 4.2
Run the script with a few different scenarios, possibly with different stocks. (Make sure each stock existed throughout the time period you are considering; if its history is shorter, reduce `totalMonths` and other related variables accordingly.)


#### **Mean-Variance Optimization with risk-free assets** (*Solution with variance constraint*) 

Did you ever ask yourself what the ultimate rate of return could be on a limited-choice investment portfolio if one could guess the market a little better? Or even perfectly? A related and more practical question is *Can one determine if the past choice of asset allocation is consistent with one's tolerance to risk?* Answering these questions could help better manage current and future choices for asset allocation in a retirement portfolio.

Formulating and answering these questions is what this script is all about. Following common practice, it considers the volatility of the market as a measure of risk and maintains the volatility of the allocated portfolio below a desired value as a constraint, ensuring that the choice of asset allocation is in line with the desired risk tolerance. The frequency at which changes in asset allocation could be performed can be selected from monthly to yearly, all the way down to only once in the 16-year period that represents the range of historical data considered here and obtained from a public source.

Mathematically, the solution to this problem consists in maximizing  the portfolio return under a variance inequality constraint and a long-only portfolio, i.e., a portfolio in which one can only invest in the assets, not short them. This problem has no analytical solution, but it can easily be solved numerically with modern algorithms.

This script only considers five market assets: ExxonMobil stock (XOM), S&P 500 (^GSPC), Dow Jones US Completion Total (^DWCPF), MSCI World ex US Market Index (ACWX), and Bloomberg Aggregate Bond market (AGG), plus risk-free cash (^IRX) represented by 3-month Treasury bills. Only growth is considered; inflation is not. For tracking the US Aggregate Bond market, we use the AGG ETF, which has an inception year of 2003.

#### Mathematical formulation of the problem
The level of mathematics involved here only requires basic linear algebra, in particular, matrix-vector multiplication, and basic statistics.
Following Markowitz modern portfolio theory, we consider the variance of a market portfolio consisting of assets allocated with weights stored in a vector $w$, and having a covariance matrix typically represented by $\Sigma$ which is calculated between the time series of these assets. 
The variance is then expressed as \\[ \sigma^2 = w^T \Sigma w, \\] where $T$ is the transpose operator changing a column vector into a row vector.
The square root of the variance, $\sigma$, is the standard deviation that quantifies the volatility.
Under this approach, the standard deviation is a measure of risk. If assets were uncorrelated, the $\Sigma$ matrix would be diagonal and the variance $\sigma^2$ would simply be the sum of the individual variances weighted by the squared weights, $\sum_i w_i^2 \sigma_i^2$. But the S&P 500 index is obviously correlated with the performance of the XOM stock, and this approach takes this correlation into consideration.
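
As a small numerical illustration of $\sigma^2 = w^T \Sigma w$, the sketch below compares a correlated and an uncorrelated two-asset case. All numbers (weights, volatilities, and the correlation `rho`) are made up for the example and are not taken from the data used in this notebook.

```python
import numpy as np

# Hypothetical weights and annual volatilities for two assets.
w = np.array([0.6, 0.4])
sigmas = np.array([0.20, 0.10])
rho = 0.5  # assumed correlation between the two assets

# Covariance matrix with and without the off-diagonal (correlation) terms.
cov_12 = rho * sigmas[0] * sigmas[1]
Sigma_corr = np.array([[sigmas[0] ** 2, cov_12],
                       [cov_12, sigmas[1] ** 2]])
Sigma_uncorr = np.diag(sigmas ** 2)

var_corr = w @ Sigma_corr @ w
var_uncorr = w @ Sigma_uncorr @ w

print('variance (correlated):  ', var_corr)
print('variance (uncorrelated):', var_uncorr)
print('sum of w_i^2 sigma_i^2: ', np.sum(w ** 2 * sigmas ** 2))
```

With a positive correlation, the portfolio variance is larger than in the uncorrelated case, which is why the off-diagonal terms of $\Sigma$ matter.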

The rate of return on the market part of the portfolio, i.e., excluding the risk-free asset is 
\\[ w^T \alpha, \\]
where $\alpha$ is a vector containing the average rates of return for each of the $n$ assets in which the portfolio is invested over the period considered. It is just a weighted sum of average rates, where the weights are a fraction of unity.

Let vector $1_n$ be a vector having 1 for all its elements.
We consider a portfolio which also has a risk-free asset available for investment with a rate of return $r_0$
in which we can invest the remaining fraction $(1 - w^T 1_n)$ not invested in the market.
The objective function that we would like to maximize is the total return of a portfolio that can also invest in a risk-free asset with return $r_0$, 
\\[f(w) = w^T \alpha + (1 - w^T 1_n)r_0, \\]
subject to the variance of the whole portfolio (i.e., considering the risk-free part)
\\[ (w^T 1_n)^2\  w^T\Sigma w \le \sigma_o^2,\\]
being smaller than a target value $\sigma_o^2$,
and under the condition that we only invest, requiring that $w \ge 0$ element-wise, (i.e., no short position), and no borrowing on our cash position,
\\[ w^T 1_n \le 1. \\]
It can be observed that if $w=0$, then the portfolio is totally invested in the risk-free part, the variance is 0, and the return is $r_0$.
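
As a rough sketch of how this formulation can be handed to a numerical solver, the toy problem below uses two hypothetical assets with invented returns and covariances. The variance constraint mirrors the one stated above, and `method='SLSQP'` is chosen explicitly here for the illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical inputs: average returns, risk-free rate, covariance, target volatility.
alpha = np.array([0.08, 0.04])
r0 = 0.02
Sigma = np.array([[0.04, 0.002],
                  [0.002, 0.0025]])
sigma_o = 0.05


def neg_return(w):
    # Negative of f(w) = w^T alpha + (1 - w^T 1_n) r0, so that minimizing maximizes the return.
    return -(w @ alpha + (1 - w.sum()) * r0)


# SciPy expects inequality constraints written as g(w) >= 0.
constraints = [
    {'type': 'ineq', 'fun': lambda w: sigma_o ** 2 - w.sum() ** 2 * (w @ Sigma @ w)},
    {'type': 'ineq', 'fun': lambda w: 1 - w.sum()},
]
bounds = [(0.0, 1.0)] * len(alpha)  # long-only weights

res = minimize(neg_return, x0=np.array([0.1, 0.1]), method='SLSQP',
               bounds=bounds, constraints=constraints)

w_opt = res.x
expected_return = -res.fun
total_var = w_opt.sum() ** 2 * (w_opt @ Sigma @ w_opt)
print('weights:', w_opt, ' cash:', 1 - w_opt.sum())
print('return:', expected_return, ' variance:', total_var)
```

Since $w = 0$ is always feasible and yields $r_0$, the optimized return should never fall below the risk-free rate.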

The desired volatility $\sigma_o$ is specified as the standard deviation on the annual return of the total portfolio, the one containing the risk-free asset.
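
Because the computations are carried out on daily data, the annual volatility has to be converted to a trading-day variance. A minimal sketch of that conversion follows, assuming 252 trading days per year and variance that scales linearly with time (independent daily returns); the variable names are chosen for this illustration only.

```python
import math

TRADING_DAYS = 252
desired_volatility_pct = 5.0  # annual volatility in percent

# Percent -> decimal, standard deviation -> variance, annual -> per trading day.
annual_sigma = desired_volatility_pct / 100
daily_variance = annual_sigma ** 2 / TRADING_DAYS
daily_sigma = math.sqrt(daily_variance)

# Going back: daily standard deviation -> annual volatility.
recovered_annual_sigma = math.sqrt(TRADING_DAYS) * daily_sigma

print('daily variance:', daily_variance)
print('daily sigma:   ', daily_sigma)
print('annual sigma:  ', recovered_annual_sigma)
```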

While the inequality constraint on variance is a 4th-order polynomial in $w$, the problem can still be solved using sequential quadratic programming with the inequality constraints and bounds presented here.
For this purpose, we use the SciPy library.
We run the optimization using historical daily values divided into periods of one to several months, depending on the user selection.
The time series for the assets' daily performance are downloaded from Yahoo Finance, and the covariance is calculated from these data over each period. For each period, optimal weights are calculated, and the portfolio's annualized rates of return are computed and stored for display in graphs at the end of the calculations. We restrict the weights $w$ to be between 0 and 1.
We assume 252 trading days per year to convert annual rates to trading-day rates.
More details on the computation are given in the comments inserted in the Python code below.
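
The data preparation described above can be sketched on synthetic prices, so that no download is needed. The random-walk prices, the column names `A` and `B`, and the assumption that the window spans exactly one year are all hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic daily closing prices for two hypothetical assets over 253 trading days.
log_steps = rng.normal(loc=0.0003, scale=0.01, size=(253, 2))
prices = pd.DataFrame(100 * np.exp(np.cumsum(log_steps, axis=0)), columns=['A', 'B'])

# Daily return is (closing - previous closing) / previous closing; the first row has no predecessor.
daily_returns = (prices / prices.shift(1) - 1).dropna()

# Covariance between daily returns, and the annualized volatility of each asset.
cov_matrix = daily_returns.cov()
annual_vol = np.sqrt(252 * np.diag(cov_matrix))

# Return over the window from first and last prices, then the equivalent trading-day rate.
period_return = prices.iloc[-1] / prices.iloc[0] - 1
mean_daily_return = (1 + period_return) ** (1 / 252) - 1

print(cov_matrix)
print('annualized volatility:', annual_vol)
print('period return:', period_return.to_numpy())
```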

Additional bounds can be imposed on the fraction of investment made in the risk-free asset (`maxCash`) and on holding positions for each market asset (`maxAssetFraction`).
The data used are the adjusted daily data at closing (adjusted for splits and dividends). Daily data are grouped in time periods represented in multiples of months (`nMonths`).
Choosing 12 months gives an optimization that can re-adjust asset allocation once a year, while choosing 192 months only allows for a single asset allocation over the 16 years considered. Note that this is not re-balancing, as accounts are implicitly assumed to always be in balance with the chosen asset allocation during each period.
Choosing a high number such as 96 months (8 years) gives the historical rate of return from a scenario where one chooses a constant allocation ratio over the first 8 years, and another one for the other 8 following years.
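
To see how a choice of `nMonths` splits the 16 years into periods, here is a small sketch. The helper `period_offsets` is a hypothetical name introduced for this illustration; it returns, for each period, how many months before the reference day that period ends.

```python
total_months = 16 * 12

# Every value of nMonths that divides the full history evenly.
valid_choices = [n for n in range(1, total_months + 1) if total_months % n == 0]


def period_offsets(n_months, total=total_months):
    '''End of each period, in months before the reference day, oldest period first.'''
    if total % n_months != 0:
        raise ValueError('n_months must be a divisor of %d' % total)
    return list(range(total - n_months, -1, -n_months))


print('valid choices:', valid_choices)
print('96-month periods end at offsets:', period_offsets(96))
print('number of 12-month periods:', len(period_offsets(12)))
```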

The last thing to select is the desired annualized volatility $\sigma_o$ denoted by `desiredVolatility`.
When selecting long-term periods, such as 96 months, one can easily realize that the tolerance for volatility needs to be relaxed in order to achieve higher rates of returns.
Interestingly, for the last 16 years, the optimal long-term asset allocation is not the 60/40 stock/bonds portfolio commonly recommended by financial advisors, but rather something else (cash and stocks).
While bonds are part of the solution in downturn years, when `nMonths` $\le$ 96, no choice of volatility yielded a 60/40 stocks/bonds portfolio for a 16-year long-term allocation solution.
For 2 blocks of 8 years, the solution has only 2 assets: extended stocks/bonds for the first block, but stock/cash allocation for the second half.
This echoes what many analysts have said regarding the fate of the 60/40 portfolio wisdom with recent market performance. Exploring this in more detail, for the whole period of the last 192 months, the average annual returns of a portfolio of S&P500/bonds compared to a portfolio of S&P500/cash look like the following:

| S&P500 | Bonds | Cash | Return| Volatility|
| ------: |-------:|------:|-------:|-----------:|
| 80% | 20% | -- | 7.7%| 16.3%|
| 80% | -- | 20% | 7.5%| 13.0%|
| 75% | 25% | -- | 7.5%| 15.3%|
| 75% | -- | 25% | 7.3%| 11.4%|
| 70% | 30% | -- | 7.2%| 14.3%|
| 70% | -- | 30% | 7.0%| 10.0%|
| 65% | 35% | -- | 7.0%| 13.3%|
| 65% | -- | 35% | 6.7%|  8.6%|
| 60% | 40% | -- | 6.7%| 12.4%|
| 60% | -- | 40% | 6.4%|  7.3%|

This shows that holding bonds instead of cash was beneficial when considering the last 16 years, but this observation is reversed when only considering the last 8 years:

| S&P500 | Bonds | Cash | Return| Volatility|
| ------:|-------:|------:|-------:|-------:|
| 80% | 20% | -- | 10.7%| 15.0%|
| 80% | -- | 20% | 10.8%| 11.9%|
| 75% | 25% | -- | 10.2%| 14.1%|
| 75% | -- | 25% | 10.4%| 10.4%|
| 70% | 30% | -- | 9.7%| 13.3%|
| 70% | -- | 30% | 9.9%|  9.1%|
| 65% | 35% | -- | 9.2%| 12.4%|
| 65% | -- | 35% | 9.5%|  7.4%|
| 60% | 40% | -- | 8.7%| 11.6%|
| 60% | -- | 40% | 9.0%|  6.7%|


Other fixed asset combinations can be easily explored by properly adjusting the `fixedWeights` and `myWeights` variables in the **Other Adjustable Parameters** section below.


#### Additional packages required
This notebook requires the installation of the `yfinance` package. In Anaconda, this can be installed using
    
    conda install -c conda-forge yfinance
or

    pip install yfinance
otherwise.

#### Adjustable parameters
Adjustable parameters should be isolated from the code.
Parameters scattered all over the code are difficult to find and make the code harder to read and debug. 

In [None]:
'''
### Always good to leave some info in the code in the event it gets reused as a module.
This program uses Mean-Variance Optimization for determining
the optimal composition of a fixed percentage portfolio
by using historical data.

It is part of a Python workshop and is also used for demonstrating how to 
structure a program that has some complexity.

This code was purposely written without OOP. Refactoring this code with an OOP
approach was meant to be left as an exercise.

### Always good to say who wrote the program as well as additional contributors if any.
### Even better if you use github!

Martin-D. Lacasse - 2024
'''
# Desired annual volatility in percent:
desiredVolatility = 5.0

# Period over which to divide and perform analysis in months. Note that the underlying data is daily.
# Pick 1,2,3,6,12,24,32,48,64,96, or 192 as these values all divide the 16 years of historical data.
nMonths = 12

# Maximum fraction to hold in cash.
maxCash = 0.8

# Maximum fraction in any market asset:
maxAssetFraction = 0.85

#### Other adjustable parameters 

In [None]:
# Default is to sync all time periods to today's date.
# Change to True to use Jan 1 as a reference, which allows comparing annual rates with public data.
useJan1 = False

# If you do not want to optimize but rather just use fixed asset allocation:
# Change variable to True and set myWeights accordingly (in decimal).
fixedWeights = False
# Weights are [XOM, S&P_500, DJ_US_Total_Completion, MSCI_World_Ex_US, Bonds]
myWeights = [0, 0.6, 0, 0, 0.40]

#### The code - nothing to adjust below

Just run all cells and look at the graphs below. Adjust the parameters and repeat.

Read the code if you want more details.

-------------------------------------------------------------------------
#### Housekeeping

In [None]:
import yfinance as yf
import numpy as np
import sys
from datetime import timedelta, date
from dateutil.relativedelta import relativedelta
from scipy.optimize import minimize
import pandas as pd
import matplotlib.pyplot as plt

#### Defining a few useful functions
It is best practice to isolate algorithms and group them in functions.
Breaking larger algorithms into smaller units (functions) makes testing much easier.

In [None]:
def variance(weights, covMatrix):
    '''
    The variance on the market portion of the portfolio.
    This is a vector matrix vector multiplication leading to a scalar.
    '''
    return weights.T @ covMatrix @ weights


def stdDev(weights, covMatrix):
    '''
    The standard deviation (volatility) of the market portion of the portfolio.
    '''
    return np.sqrt(variance(weights, covMatrix))


def objective(weights, alphas, r0):
    '''
    This is the (negative) return on a portfolio with a risk-free asset.
    The negative sign is so that this function can be minimized.
    '''
    return -(weights.T @ alphas + (1 - np.sum(weights)) * r0)


def a2td(ar):
    '''
    Convert an annual rate to a trading-day rate.
    This is used for converting rates to daily trading rates for computation.
    '''
    return (1 + ar) ** (1 / 252) - 1


def shiftWeekends(day):
    '''
    Avoid requesting data for days on weekend as they cause spurious effects in yfinance.
    Shift day back to the Friday just before weekend.
    The argument is named day so that it does not shadow the imported date class.
    '''
    dow = day.strftime('%w')
    if dow == '0':  # Sunday
        return day - timedelta(days=2)
    if dow == '6':  # Saturday
        return day - timedelta(days=1)

    return day
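
Small functions like these are easy to check in isolation. The sketch below repeats two of the definitions so that it runs on its own, then verifies a round trip on the rate conversion and the weekend shift on three dates in June 2024 picked for this example.

```python
from datetime import date, timedelta


def a2td(ar):
    '''Convert an annual rate to a trading-day rate (252 trading days per year).'''
    return (1 + ar) ** (1 / 252) - 1


def shiftWeekends(day):
    '''Shift a Saturday or Sunday back to the preceding Friday.'''
    dow = day.strftime('%w')
    if dow == '0':  # Sunday
        return day - timedelta(days=2)
    if dow == '6':  # Saturday
        return day - timedelta(days=1)
    return day


# Compounding the daily rate over 252 days should give back the annual rate.
daily = a2td(0.05)
recompounded = (1 + daily) ** 252 - 1
print('daily rate:', daily, ' recompounded:', recompounded)

saturday = date(2024, 6, 15)
sunday = date(2024, 6, 16)
monday = date(2024, 6, 17)
for d in (saturday, sunday, monday):
    print(d, '->', shiftWeekends(d))
```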

#### Putting it all together

In [None]:
tickers = ['XOM', '^GSPC', '^DWCPF', 'ACWX', 'AGG']
alltickers = tickers + ['cash']

now = date.today()
if useJan1:
    # Use Jan 1 of this year as reference point.
    refDay = date(now.year, 1, 1)
else:
    # Or start from today.
    refDay = date(now.year, now.month, now.day)

# We define constraints through dictionaries as required by scipy minimize().
# We limit variance to be below the desired value.
# We ensure market total weights are below 1, the rest being in cash, but smaller than maxCash.
# Inequality expressions are meant to be >= 0.
constraints = [
    {
        'type': 'ineq',
        'fun': lambda weights: targetVariance
        - (np.sum(weights) ** 2) * variance(weights, covMatrix),
    },
    {'type': 'ineq', 'fun': lambda weights: 1 - np.sum(weights)},
    {'type': 'ineq', 'fun': lambda weights: maxCash - 1 + np.sum(weights)},
]

if not fixedWeights:
    # We only consider long positions (no short, i.e., >= 0) with a maximum asset weight (<= maxAssetFraction).
    bounds = [(0.0, maxAssetFraction) for _ in range(len(tickers))]
else:
    # You requested fixed bounds to test the performance of fixed-ratio portfolios.
    # To ensure feasibility, we override the desired volatility with a high number (100%).
    desiredVolatility = 100
    # Use the values provided by the user as both low and high bounds.
    bounds = [(w, w) for w in myWeights]

# Convert from percent to decimal, from standard deviation to variance, and from annualized to trading days.
targetVariance = ((desiredVolatility / 100) ** 2) / 252

# We create four dataframes for storing results:
computedWeights = pd.DataFrame(columns=alltickers)
assetsAnnualReturns = pd.DataFrame(columns=alltickers)
portfolioPeriodReturns = pd.DataFrame(columns=['return'])
portfolioAnnualReturns = pd.DataFrame(columns=['return'])

# Note that Yahoo Finance does not provide daily data before 2008 for some indices here.
totalMonths = 16 * 12
if totalMonths % nMonths != 0:
    print("Sorry, nMonths must be a divisor of", totalMonths)
    sys.exit(-1)

# A list for storing timestamps of the beginning of each period.
years = []

startDate = refDay + relativedelta(months=-totalMonths)
for monthsAgo in range(totalMonths - nMonths, -1, -nMonths):
    # Periods must overlap by one day for completeness.
    # See below where startDate gets reset to endDate - 1.
    endDate = refDay + relativedelta(months=-monthsAgo)
    # Avoid weekends as they cause gaps in data.
    startDate = shiftWeekends(startDate)
    endDate = shiftWeekends(endDate)
    print('From', startDate, 'to', endDate)
    years.append(startDate)

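    # Download the data.
    # auto_adjust=False keeps the 'Adj Close' column even when the installed yfinance adjusts prices by default.
    adjCloseDf = pd.DataFrame()
    for ticker in tickers:
        data = yf.download(ticker, start=startDate, end=endDate, auto_adjust=False)
        # squeeze() reduces a one-column frame (returned by some yfinance versions) to a series.
        adjCloseDf[ticker] = data['Adj Close'].squeeze()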

    adjCloseDf = adjCloseDf.dropna()
    # print(adjCloseDf)

    # Daily market return is (closing - previous-day closing)/previous-day closing.
    dailyReturns = (adjCloseDf / adjCloseDf.shift(1)) - 1
    # Drop NA first row as first day of the set has no previous day.
    dailyReturns = dailyReturns.dropna()
    # print(dailyReturns)
    # Compute covariance matrix between returns of all requested tickers for the period considered.
    covMatrix = dailyReturns.cov()
    # print(covMatrix)
    # Diagonal of covariance matrix is the variance on that asset.
    print("Volatility: ", np.sqrt(np.diag(covMatrix)))

    # For each ticker, compute the returns for the selected multi-month period using first and last data points.
    # Then convert to mean trading-day daily returns.
    meanAnnualReturns = []
    meanDailyReturns = []
    meanPeriodReturns = []
    # print(adjCloseDf.index[0], '->', adjCloseDf.index[-1])
    for ticker in tickers:
        # print(ticker, adjCloseDf[ticker].iloc[0], '->', adjCloseDf[ticker].iloc[-1])
        pr = (adjCloseDf[ticker].iloc[-1] / adjCloseDf[ticker].iloc[0]) - 1
        # Convert period growth to annual growth.
        ar = (1 + pr) ** (12 / nMonths) - 1
        # Then convert annualized rate to mean trading-day rate.
        mdr = a2td(ar)
        meanDailyReturns.append(mdr)
        meanAnnualReturns.append(ar)
        meanPeriodReturns.append(pr)

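    # Download short-term percent rates for treasury bills.
    # We will use this rate as the annual rate on our risk-free asset.
    # auto_adjust=False keeps the 'Adj Close' column even when the installed yfinance adjusts prices by default.
    data = yf.download('^IRX', start=startDate, end=endDate, auto_adjust=False)
    # Reset startDate for next period.
    startDate = endDate - timedelta(days=1)

    # Take the mean over period and convert from percent to decimal.
    # squeeze() reduces a one-column frame (returned by some yfinance versions) to a series.
    ar0 = data['Adj Close'].squeeze().mean() / 100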
    print('Annualized risk-free r0: %.2f%%' % (100 * ar0))
    # Return rate over period.
    pr0 = (1 + ar0) ** (nMonths / 12) - 1
    # Convert from annualized return to return per trading day.
    dr0 = a2td(ar0)

    # Convert lists to arrays: period and annual returns are for humans to read.
    palphas = np.array(meanPeriodReturns)
    print('Assets returns over period (%):', (100 * palphas))
    alphas = np.array(meanAnnualReturns)
    print('Annualized returns over period (%):', (100 * alphas))
    # The other one is for the optimization steps, where everything is expressed in trading days.
    dalphas = np.array(meanDailyReturns)

    # Start with 10% in each market asset, leaving the rest (half of the portfolio for 5 assets) in cash.
    initialWeights = np.array([0.1] * len(tickers))
    # Solve
    solution = minimize(
        objective,
        initialWeights,
        args=(dalphas, dr0),
        constraints=constraints,
        bounds=bounds,
        tol=1e-14,
        options={'maxiter': 1000},
    )
    if not solution.success:
        print('WARNING: Optimization failed: ', solution.message, solution.success)

    optWeights = solution.x
    marketVolatility = np.sqrt(252) * stdDev(optWeights, covMatrix)
    actualVolatility = (np.sum(optWeights)) * marketVolatility
    optPeriodReturn = -objective(optWeights, palphas, pr0)
    print(
        'Expected period return of %.2f%% with volatility %.1f%%'
        % (100 * optPeriodReturn, 100 * actualVolatility)
    )
    print('Market volatility %.1f%%' % (100 * marketVolatility))

    # Add cash weight to asset allocation.
    computedWeights.loc[len(computedWeights)] = np.append(
        optWeights, (1 - sum(optWeights))
    )
    # Add cash annual return to our list.
    assetsAnnualReturns.loc[len(assetsAnnualReturns)] = np.append(alphas, ar0)
    portfolioPeriodReturns.loc[len(portfolioPeriodReturns)] = [optPeriodReturn]
    annReturn = (1 + optPeriodReturn) ** (12 / nMonths) - 1
    portfolioAnnualReturns.loc[len(portfolioAnnualReturns)] = [annReturn]

# Generate a new index from timestamps marking beginning of period.
newIndex = {}
for i in range(len(years)):
    newIndex[i] = years[i].strftime('%Y.%m.%d')
# and use to replace default index.
computedWeights.rename(index=newIndex, inplace=True)
assetsAnnualReturns.rename(index=newIndex, inplace=True)
portfolioPeriodReturns.rename(index=newIndex, inplace=True)
portfolioAnnualReturns.rename(index=newIndex, inplace=True)

print('Optimal weights:')
pd.set_option('display.width', 100)
print(computedWeights)

# Take geometric mean of returns over all periods.
prates = portfolioPeriodReturns.to_numpy() + 1
cumGain = prates.prod()
print('Cumulative gain over epoch: %.1f%%' % (100 * (cumGain - 1)))
meanPeriodReturn = np.exp(np.log(cumGain) / len(prates)) - 1
# Then annualize it.
meanAnnualReturn = (1 + meanPeriodReturn) ** (12 / nMonths) - 1

print('Mean period return: %.2f%%' % (100 * meanPeriodReturn))
print('Mean annual return: %.2f%%' % (100 * meanAnnualReturn))
# print('Period', portfolioPeriodReturns)
# print('Annual', portfolioAnnualReturns)
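
The averaging used above is a geometric mean, which reproduces the cumulative gain when compounded. A minimal sketch with three invented 6-month returns (the numbers are hypothetical and not produced by the optimization):

```python
import numpy as np

# Hypothetical returns for three consecutive 6-month periods.
period_returns = np.array([0.10, -0.05, 0.20])
n_months = 6

growth = 1 + period_returns
cum_gain = growth.prod()

# Geometric mean return per period, then annualized.
mean_period = cum_gain ** (1 / len(growth)) - 1
mean_annual = (1 + mean_period) ** (12 / n_months) - 1

# The arithmetic mean is shown only for comparison; it overstates compounded growth.
arithmetic_mean = period_returns.mean()

print('cumulative gain: %.1f%%' % (100 * (cum_gain - 1)))
print('geometric mean per period: %.2f%%' % (100 * mean_period))
print('arithmetic mean per period: %.2f%%' % (100 * arithmetic_mean))
print('annualized: %.2f%%' % (100 * mean_annual))
```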

#### Plotting the results for efficient portfolios

In [None]:
# For the next line to work, LaTeX must be installed on the host computer.
# plt.rcParams['text.usetex'] = True
plt.rcParams["figure.figsize"] = (14, 6)

fig, ax = plt.subplots()
assetsAnnualReturns.plot(ax=ax, kind='bar', legend=True)
ax.set(
    xlabel='period starting date',
    ylabel='return (decimal)',
    title='Annualized rates of return by %d-month periods' % nMonths,
)
ax.set_xticklabels(ax.get_xticklabels(), rotation=60, ha='right')
ax.grid()
plt.show()

fig, ax = plt.subplots()
computedWeights.plot(ax=ax, kind='bar', legend=True)
ax.set(
    xlabel='period starting date',
    ylabel='weights (decimal)',
    title='Optimal asset allocation with risk-free asset and <= '
    + str(desiredVolatility)
    + '% volatility',
)
ax.set_xticklabels(ax.get_xticklabels(), rotation=60, ha='right')
ax.grid()
plt.show()

fig, ax = plt.subplots()
portfolioAnnualReturns.plot(ax=ax, kind='bar', legend=True)
ax.set(
    xlabel='period starting date',
    ylabel='return (decimal)',
    title='Return of optimal asset allocation with risk-free asset and <= %.1f%% volatility, <mean> %.2f%%'
    % (desiredVolatility, 100 * meanAnnualReturn),
)
ax.set_xticklabels(ax.get_xticklabels(), rotation=60, ha='right')
ax.grid()
plt.show()