# Example 1: Discrete + CVaR optimization with Gaussian modelling

This first example showcases the basic usage of the framework. Reading the README first is recommended to better understand the notebook.

## Data retrieval

In [1]:
import yfinance as yf
import pandas as pd
import numpy as np

### Get price data

The asset universe consists of a set of broad market indices with some geographic and sector diversification.

In [3]:
def get_save_weekly_prices(indices, start_date=None, end_date=None):
    """
    Get the weekly prices of specified indices.

    Parameters:
    - indices: List of index tickers (e.g., ['^GSPC', '^DJI']).
    - start_date: Start date in the format 'YYYY-MM-DD'.
    - end_date: End date in the format 'YYYY-MM-DD'.

    Returns:
    - DataFrame containing weekly prices of the specified indices.
    """
    weekly_prices = pd.DataFrame()

    for index_ticker in indices:
        data = yf.download(index_ticker, start=start_date, end=end_date)
        weekly_data = data['Close'].resample('W-Fri').last()  # Resample to get weekly data (Friday as the end of the week)
        weekly_prices[index_ticker] = weekly_data

    return weekly_prices

# Example usage
indices_list = ['^GSPC', '^DJI', '^IXIC', '^RUT', '^FTSE', '^N225', '^HSI', '^XAU', '^GDAXI']
df_prices = get_save_weekly_prices(indices=indices_list)
df_prices = df_prices.dropna()
df_prices.to_csv('./data/weekly_example_1.csv')

[*********************100%%**********************]  1 of 1 completed
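
To see what the `W-Fri` resampling in `get_save_weekly_prices` does without a network call, here is a minimal sketch on synthetic daily values (the dates and numbers are made up for illustration and are not market data). Each weekly value is the last available observation of the week ending on Friday.

```python
import numpy as np
import pandas as pd

# Synthetic daily "closes" on business days (illustrative values only)
dates = pd.bdate_range("2024-01-01", "2024-01-19")
daily_close = pd.Series(np.arange(len(dates), dtype=float), index=dates)

# Keep the last observation of each week, with weeks ending on Friday
weekly_close = daily_close.resample("W-FRI").last()
print(weekly_close)
```

Three weekly observations remain, each labelled with the Friday that closes its week.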


In [2]:
df_prices = pd.read_csv('./data/weekly_example_1.csv', index_col=0, parse_dates=True)
df_prices.head()

| Date | ^GSPC | ^DJI | ^IXIC | ^RUT | ^FTSE | ^N225 | ^HSI | ^XAU | ^GDAXI |
|---|---|---|---|---|---|---|---|---|---|
| 1992-01-03 | 419.339996 | 3201.5 | 592.650024 | 192.089996 | 2504.100098 | 22983.769531 | 4307.100098 | 79.18 | 1604.050049 |
| 1992-01-10 | 415.100006 | 3199.5 | 615.700012 | 199.309998 | 2477.899902 | 22381.900391 | 4348.899902 | 79.059998 | 1613.819946 |
| 1992-01-17 | 418.859985 | 3265.0 | 626.849976 | 205.059998 | 2536.699951 | 21321.369141 | 4454.899902 | 82.949997 | 1669.290039 |
| 1992-01-24 | 415.480011 | 3232.800049 | 624.679993 | 205.419998 | 2510.399902 | 21072.150391 | 4600.100098 | 81.5 | 1669.52002 |
| 1992-01-31 | 408.779999 | 3223.399902 | 620.210022 | 205.160004 | 2571.199951 | 22023.050781 | 4601.799805 | 83.529999 | 1685.699951 |


### Get Cash Flow data

For this example, synthetic cash flow data is generated because of privacy concerns.

In [3]:
from development.synth_data import generate_synth_income, generate_synth_expenses

In [4]:
arr_income = generate_synth_income(mean=2000, std=50)
arr_expenses = generate_synth_expenses(mean=1600, std=80)

## Returns Model

Now the model for the asset returns is chosen and fitted to the data. Note that in this case, because of the optimization model used, the model outputs prices directly instead of returns.

In [5]:
from src.returns import LogNormalReturns

In [6]:
returns_model = LogNormalReturns()
returns_model.fit(df_prices=df_prices)
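
The internals of `LogNormalReturns` are not shown in this notebook. As a rough sketch of what a log-normal price model typically does, assuming i.i.d. multivariate normal weekly log returns, the code below fits a mean vector and covariance matrix and simulates price paths. The helper names `fit_lognormal` and `simulate_prices` are hypothetical and not part of the framework. The output layout `(scenario, asset, time)` follows the way `prices_syms` is indexed later in this notebook.

```python
import numpy as np

def fit_lognormal(prices):
    """Estimate mean and covariance of log returns.

    prices: array of shape (n_periods, n_assets).
    """
    log_returns = np.diff(np.log(prices), axis=0)
    mu = log_returns.mean(axis=0)
    cov = np.atleast_2d(np.cov(log_returns, rowvar=False))
    return mu, cov

def simulate_prices(last_prices, mu, cov, horizon, n_paths, seed=0):
    """Simulate price paths of shape (n_paths, n_assets, horizon + 1)."""
    rng = np.random.default_rng(seed)
    # Log-return shocks, shape (n_paths, horizon, n_assets)
    shocks = rng.multivariate_normal(mu, cov, size=(n_paths, horizon))
    # Cumulative log returns turn into price levels relative to the last price
    future = last_prices * np.exp(np.cumsum(shocks, axis=1))
    # Prepend the known starting price as time step 0
    start = np.broadcast_to(last_prices, (n_paths, 1, len(last_prices)))
    paths = np.concatenate([start, future], axis=1)
    return paths.transpose(0, 2, 1)

# Toy price history for two assets (illustrative numbers only)
toy_prices = np.array([[100.0, 50.0], [102.0, 49.0], [101.0, 51.0], [105.0, 52.0]])
mu, cov = fit_lognormal(toy_prices)
sim = simulate_prices(toy_prices[-1], mu, cov, horizon=13, n_paths=150)
print(sim.shape)  # (150, 2, 14)
```

Every path starts at the last observed price, which is why the notebook later checks that the price data is up to date.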

## CashFlow Model

Now the models for income and expenses are chosen and fitted to the data.

In [7]:
from src.cashflows import NormalCashFlows

In [8]:
income_model = NormalCashFlows()
income_model.fit(cash_flow=arr_income)
expenses_model = NormalCashFlows()
expenses_model.fit(cash_flow=arr_expenses)
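
`NormalCashFlows` is not reproduced here. A minimal stand-in, assuming each period's cash flow is an independent draw from a normal distribution fitted by sample mean and standard deviation, could look as follows. The class `SimpleNormalCashFlows` is hypothetical and only mirrors the `fit`/`predict` calls used in this notebook.

```python
import numpy as np

class SimpleNormalCashFlows:
    """Hypothetical stand-in: i.i.d. normal cash flows per period."""

    def fit(self, cash_flow):
        cash_flow = np.asarray(cash_flow, dtype=float)
        self.mean_ = cash_flow.mean()
        self.std_ = cash_flow.std(ddof=1)  # sample standard deviation
        return self

    def predict(self, horizon, n_paths, seed=0):
        rng = np.random.default_rng(seed)
        # One row per scenario, one column per period
        return rng.normal(self.mean_, self.std_, size=(n_paths, horizon))

# Synthetic history with the same parameters as the income series above
history = np.random.default_rng(42).normal(2000, 50, size=200)
model = SimpleNormalCashFlows().fit(history)
paths = model.predict(horizon=13, n_paths=150)
print(paths.shape)  # (150, 13)
```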

## Asset Allocation

As the last step, the optimization model is chosen and solved.

In [111]:
from src.optimization.om_discrete_cvar import create_model

These are the basic parameters of the model. They are included in the `constants.py` file, but are set explicitly here for explanatory purposes.

In [134]:
N_SCENARIOS = 150  # More scenarios give a more accurate solution but make the problem harder to solve
HORIZON_PERIODS = 13  # Number of weeks to plan for. Only the first step is executed, but planning several steps ahead gives more accurate and realistic decisions
STARTING_CASH = 100_000
TRADING_FEE = 0.001
CVAR_ALPHA = 0.1  # Quantile at which the CVaR is calculated. The smaller the value, the more the CVaR focuses on extreme cases
CVAR_GAMMA = 0.5  # Weight given to CVaR in the objective function
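
To make `CVAR_ALPHA` and `CVAR_GAMMA` concrete, the sketch below estimates the CVaR of a sample of final wealth values as the average of the worst `alpha` fraction of scenarios, then blends it with the mean. The blend `(1 - gamma) * mean + gamma * CVaR` is an assumed, commonly used form; the exact objective in `om_discrete_cvar` may differ. The helper `lower_tail_cvar` is hypothetical.

```python
import math
import numpy as np

def lower_tail_cvar(values, alpha):
    """Average of the worst `alpha` fraction of outcomes (lower tail)."""
    values = np.sort(np.asarray(values, dtype=float))
    n_tail = max(1, math.ceil(alpha * len(values)))
    return values[:n_tail].mean()

# Toy final-wealth sample: 1, 2, ..., 20 (illustrative numbers only)
final_wealth = np.arange(1.0, 21.0)
cvar = lower_tail_cvar(final_wealth, alpha=0.1)  # mean of the 2 worst values

gamma = 0.5
# Assumed objective form: trade off expected wealth against tail wealth
objective = (1 - gamma) * final_wealth.mean() + gamma * cvar
print(cvar, objective)  # 1.5 6.0
```

With the notebook's settings (`N_SCENARIOS = 150`, `CVAR_ALPHA = 0.1`), this estimator would average the 15 worst scenarios.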

The simulated paths of cash flows and prices are generated according to the fitted models.

In [135]:
df_prices.index[-1]  # Check that the price data is up to date, so that simulated prices start from a recent price

Timestamp('2024-01-05 00:00:00')

In [136]:
prices_syms = returns_model.predict(horizon=HORIZON_PERIODS, n_paths=N_SCENARIOS) # Prices are simulated
income_syms = income_model.predict(horizon=HORIZON_PERIODS, n_paths=N_SCENARIOS) # Income is simulated
expenses_syms = expenses_model.predict(horizon=HORIZON_PERIODS, n_paths=N_SCENARIOS) # Expenses are simulated

Now the data is arranged in the format expected by the model, providing all model parameters and sets.

In [137]:


non_cash_assets = list(df_prices.columns)
sScenarios = list(range(N_SCENARIOS))
sInitialTime = [0]
sIntermediateTime = list(range(1, HORIZON_PERIODS))
sFinalTime = [HORIZON_PERIODS]
sNonInitialTimes = sIntermediateTime + sFinalTime
sNonFinalTimes = sInitialTime + sIntermediateTime
sTime = sInitialTime + sIntermediateTime + sFinalTime

data = {
    None: {
        'sInitialTime': {None: sInitialTime},
        'sIntermediateTime': {None: sIntermediateTime},
        'sFinalTime': {None: sFinalTime},
        'sNonCashAssets': {None: non_cash_assets},
        'sScenarios': {None: sScenarios},
        'pPrices': {(s_i,non_cash_assets[a_i], t_i): prices_syms[s_i,a_i,t_i] for s_i in sScenarios for t_i in sTime for a_i in range(len(non_cash_assets))},
        'pInitialNonCashAllocations': {a: 0 for a in non_cash_assets},
        'pInitialCashAllocations': {None: STARTING_CASH},
        'pIncome': {(s_i,t_i): income_syms[s_i, t_i] for s_i in sScenarios for t_i in sNonFinalTimes},
        'pExpense': {(s_i,t_i): expenses_syms[s_i, t_i] for s_i in sScenarios for t_i in sNonFinalTimes},
        'pTradeFee': {None: TRADING_FEE},
        'pCVaRAlpha': {None: CVAR_ALPHA},
        'pCVaRGamma': {None: CVAR_GAMMA},
    }
}
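
As a quick sanity check on the index sets above, the sketch below rebuilds them for a short horizon and shows how they partition the time axis. It is plain Python, and the small `horizon` value is an illustrative choice so the sets are easy to read.

```python
horizon = 3  # illustrative; the notebook uses HORIZON_PERIODS = 13

initial = [0]
intermediate = list(range(1, horizon))
final = [horizon]

non_initial = intermediate + final   # every period except the first
non_final = initial + intermediate   # periods over which income and expenses are indexed
all_times = initial + intermediate + final

print(all_times)    # [0, 1, 2, 3]
print(non_final)    # [0, 1, 2]
print(non_initial)  # [1, 2, 3]
```

Prices are indexed over all `horizon + 1` time steps, while income and expenses only cover the `horizon` non-final steps, so the simulated arrays need at least that many time columns.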


Now the optimization model is created, the data is fed in, the solver is chosen, and the model is solved.

In [138]:
from src.optimization.om_discrete_cvar import create_model
from pyomo.environ import SolverFactory
from pyomo.opt.results import SolverStatus
import time

In [139]:

optimization_model = create_model()
instance = optimization_model.create_instance(data)
solver = SolverFactory('cbc')  # CBC is open source and not very fast. Any solver supported by Pyomo can be used.
# Note that the solver must be installed on the machine for the optimization to work.

# solver.options['seconds'] = 120  # Solver will exit after 120 s even if an optimal solution has not been found

print('Starting to solve...')
t1 = time.time()
result_obj = solver.solve(instance, tee=True)
# result_obj.Solver.Status = SolverStatus.warning
# instance.solutions.load_from(result_obj)   
t2 = time.time()
print(f"Solved in {t2-t1:.2f} s")

Starting to solve...


Welcome to the CBC MILP Solver 
Version: 2.10.7 
Build Date: Feb 14 2022 

command line - /usr/bin/cbc -printingOptions all -import /tmp/tmpt31jki04.pyomo.lp -stat=1 -solve -solu /tmp/tmpt31jki04.pyomo.soln (default strategy 1)
Option for printingOptions changed from normal to all
 CoinLpIO::readLp(): Maximization problem reformulated as minimization
Coin0009I Switching back to maximization to get correct duals etc
Presolve is modifying 9 integer bounds and re-presolving
Presolve 8133 (-2429) rows, 6193 (-2420) columns and 54774 (-22708) elements
Statistics for presolved model
Original problem has 360 integers (0 of which binary)
Presolved problem has 342 integers (0 of which binary)
==== 5883 zero objective 13 different
==== absolute objective values 13 different
==== for integers 333 zero objective 10 different
333 variables have objective of -0
1 variables have objective of 67.7137
1 variables have objective of 1021.69
1 variables have objective of 2460.23
1 variables have objective
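
Since the solve step depends on an external solver binary, it can help to check that it is reachable before building the model. A minimal sketch using only the standard library, assuming the executable is named `cbc` and is expected on the `PATH` (the helper `solver_on_path` is hypothetical):

```python
import shutil

def solver_on_path(executable="cbc"):
    """Return the full path of the solver executable, or None if it is not found."""
    return shutil.which(executable)

path = solver_on_path("cbc")
if path is None:
    print("cbc not found on PATH; install it or choose another solver")
else:
    print(f"cbc found at {path}")
```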

## Results Analysis

A helper class is provided to analyze the proposed solution.

In [140]:
from src.results import ResultsAnalyzer

In [141]:

analyzer = ResultsAnalyzer(instance)
# The first type of plot shows all simulated time series
# All plotting functions need the name of the variable/parameter to be analyzed
analyzer.plot_ts(name='vTotalWealth', time_col='time',col_names=['scenario', 'time'], colors='scenario', inline_plot=True)

In [142]:
# This second plot shows the distribution of any variable/parameter
analyzer.plot_dist(name='vTotalWealth', col_names=['scenario', 'time'], filter={'time':5}, inline_plot=True)


In [143]:
# This third plot shows the time series evolution with a certain confidence interval
analyzer.plot_ci(name='vTotalWealth', time_col='time', scenario_col='scenario',filter=None, col_names=['scenario', 'time'], confidence=.90, inline_plot=True)
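
The implementation of `plot_ci` is not shown here. One common way to build such a band, sketched below under that assumption, is to take symmetric quantiles across scenarios at each time step. `scenario_band` is a hypothetical helper and the wealth paths are synthetic.

```python
import numpy as np

def scenario_band(values, confidence=0.90):
    """values: array (n_scenarios, n_times). Returns lower, mean, upper per time step."""
    tail = (1.0 - confidence) / 2.0
    lower = np.quantile(values, tail, axis=0)
    upper = np.quantile(values, 1.0 - tail, axis=0)
    return lower, values.mean(axis=0), upper

# Synthetic wealth paths: random-walk increments around a starting value
rng = np.random.default_rng(0)
wealth = 100_000 + np.cumsum(rng.normal(400, 300, size=(150, 14)), axis=1)

lower, mean, upper = scenario_band(wealth, confidence=0.90)
print(lower.shape, upper.shape)  # (14,) (14,)
```

With `confidence=0.90` the band spans the 5% to 95% quantiles, so roughly 10% of scenarios fall outside it at any time step.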

In [144]:
# Any variable can also be retrieved as a dataframe
df = analyzer.get_df(name='vNonCashAllocations',col_names=['asset', 'time'])
df[df['value']!=0]

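| | asset | time | value |
|---|---|---|---|
| 29 | ^IXIC | 1 | 2.0 |
| 30 | ^IXIC | 2 | 4.0 |
| 31 | ^IXIC | 3 | 4.0 |
| 32 | ^IXIC | 4 | 4.0 |
| 33 | ^IXIC | 5 | 4.0 |
| 34 | ^IXIC | 6 | 4.0 |
| 35 | ^IXIC | 7 | 4.0 |
| 36 | ^IXIC | 8 | 4.0 |
| 37 | ^IXIC | 9 | 4.0 |
| 38 | ^IXIC | 10 | 4.0 |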


In [145]:
analyzer.plot_ci(name='pPrices', time_col='time', scenario_col='scenario',filter={'asset':'^GDAXI'}, col_names=['scenario', 'asset', 'time'], confidence=.95, inline_plot=True)

In [146]:
analyzer.plot_ci(name='vCashAllocations', time_col='time', scenario_col='scenario',filter=None, col_names=['scenario', 'time'], confidence=.95, inline_plot=True)

In [147]:
analyzer.plot_ci(name='pIncome', time_col='time', scenario_col='scenario',filter=None, col_names=['scenario', 'time'], confidence=.95, inline_plot=True)

Here we see, for example, that the number of simulated scenarios should probably be higher. Since income follows a normal distribution that does not depend on time, the mean and the bounds of the 95% interval should be fairly constant across periods. Here, however, the effect of individual deviations is still recognizable.
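
This sampling noise can be reproduced with synthetic draws. The sketch below uses the income parameters from earlier in the notebook (mean 2000, standard deviation 50) and compares how much the per-period sample mean varies across 13 periods with 150 versus 15,000 scenarios. The larger scenario count is an arbitrary illustrative choice, and `wobble` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)

def wobble(n_scenarios, horizon=13, mean=2000.0, std=50.0):
    """Range (max - min) of the per-period sample mean across the horizon."""
    draws = rng.normal(mean, std, size=(n_scenarios, horizon))
    per_period_mean = draws.mean(axis=0)
    return per_period_mean.max() - per_period_mean.min()

small = wobble(150)
large = wobble(15_000)
print(f"range of per-period means: {small:.2f} (150 scenarios) vs {large:.2f} (15000 scenarios)")
```

The standard error of a sample mean shrinks with the square root of the sample size, so 100 times more scenarios should flatten the curve by roughly a factor of 10, at the cost of a larger optimization problem.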

In [148]:
analyzer.plot_ci(name='pExpense', time_col='time', scenario_col='scenario',filter=None, col_names=['scenario', 'time'], confidence=.95, inline_plot=True)