# Example 3: Discrete vs Continuous optimization

This example compares the continuous and discrete versions of the asset allocation problem. Reading the README first is recommended to better understand the notebook.

## Data retrieval

In [1]:
import yfinance as yf
import pandas as pd
import numpy as np

### Get price data

The asset universe consists of a set of broad indices with some geographic and sector diversification.

In [2]:
def get_save_monthly_prices(indices, start_date=None, end_date=None):
    """
    Get the monthly closing prices of the specified indices.

    Parameters:
    - indices: List of index tickers (e.g., ['^GSPC', '^DJI']).
    - start_date: Start date in the format 'YYYY-MM-DD'.
    - end_date: End date in the format 'YYYY-MM-DD'.

    Returns:
    - DataFrame containing monthly prices of the specified indices.
    """
    monthly_prices = pd.DataFrame()

    for index_ticker in indices:
        data = yf.download(index_ticker, start=start_date, end=end_date)
        monthly_data = data['Close'].resample('M').last()  # Resample to get monthly data
        monthly_prices[index_ticker] = monthly_data

    return monthly_prices

# Example usage
indices_list = ['^GSPC', '^DJI', '^IXIC', '^RUT', '^FTSE', '^N225', '^HSI', '^XAU', '^GDAXI']
df_prices = get_save_monthly_prices(indices=indices_list)
df_prices = df_prices.dropna()
df_prices.to_csv('./data/monthly_example_3.csv')

[*********************100%%**********************]  1 of 1 completed


In [3]:
df_prices = pd.read_csv('./data/monthly_example_3.csv', index_col=0, parse_dates=True)
df_prices.head()

Date,^GSPC,^DJI,^IXIC,^RUT,^FTSE,^N225,^HSI,^XAU,^GDAXI
1992-01-31,408.779999,3223.399902,620.210022,205.160004,2571.199951,22023.050781,4601.799805,83.529999,1685.699951
1992-02-29,412.700012,3267.699951,633.469971,211.149994,2562.100098,21338.810547,4929.100098,81.400002,1746.76001
1992-03-31,403.690002,3235.5,603.77002,203.699997,2440.100098,19345.949219,4938.299805,72.040001,1718.349976
1992-04-30,414.950012,3359.100098,578.679993,196.259995,2654.100098,17390.710938,5369.600098,69.93,1728.089966
1992-05-31,415.350006,3396.899902,585.309998,198.520004,2707.600098,18347.75,6080.200195,75.019997,1806.359985
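
The month-end resampling step used in `get_save_monthly_prices` can be illustrated on a small synthetic series. This is a minimal sketch: the daily series below is made up, and grouping by monthly period is used here instead of `resample('M')` because recent pandas versions deprecate the `'M'` offset alias in favor of `'ME'`.

```python
import numpy as np
import pandas as pd

# Synthetic daily "closing prices" for three months (illustrative data only)
rng = np.random.default_rng(0)
days = pd.date_range("2023-01-01", "2023-03-31", freq="D")
daily = pd.Series(100 + rng.normal(0, 1, len(days)).cumsum(), index=days, name="close")

# Keep the last observation of each calendar month
monthly = daily.groupby(daily.index.to_period("M")).last()
print(monthly)
```

Each row of `monthly` holds the last available daily value of that month, which is what the notebook stores per index.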


### Get Cash Flow data

Because of privacy concerns, this example uses synthetically generated cash-flow data.

In [4]:
from development.synth_data import generate_synth_income, generate_synth_expenses

In [5]:
arr_income = generate_synth_income(mean=2000, std=50)
arr_expenses = generate_synth_expenses(mean=1600, std=80)

## Returns Model

Now the model for the asset returns is chosen and fitted to the data. Note that in this case, because of the optimization model used, the model outputs prices directly instead of returns.

In [6]:
from src.returns import LogNormalReturns

In [7]:
returns_model = LogNormalReturns()
returns_model.fit(df_prices=df_prices)
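
To give an idea of what a log-normal returns model that outputs prices might do, the sketch below fits the mean and covariance of historical log-returns and simulates price paths from the last observed price. The function `simulate_lognormal_prices` and the toy price history are hypothetical; the actual `LogNormalReturns` implementation is not shown in this notebook and may differ. The output layout `(n_paths, n_assets, horizon + 1)` is an assumption chosen to match how `prices_syms[s_i, a_i, t_i]` is indexed later on.

```python
import numpy as np

def simulate_lognormal_prices(prices, horizon, n_paths, seed=0):
    """Hypothetical sketch: simulate price paths from i.i.d. Gaussian log-returns."""
    prices = np.asarray(prices, dtype=float)           # shape (n_obs, n_assets)
    log_ret = np.diff(np.log(prices), axis=0)          # historical log-returns
    mu = log_ret.mean(axis=0)
    cov = np.atleast_2d(np.cov(log_ret, rowvar=False))
    rng = np.random.default_rng(seed)
    # shocks has shape (n_paths, horizon, n_assets)
    shocks = rng.multivariate_normal(mu, cov, size=(n_paths, horizon))
    cum = np.cumsum(shocks, axis=1)
    # Prepend a zero so that time 0 is the last observed price
    cum = np.concatenate([np.zeros((n_paths, 1, prices.shape[1])), cum], axis=1)
    paths = prices[-1] * np.exp(cum)                   # (n_paths, horizon + 1, n_assets)
    return paths.transpose(0, 2, 1)                    # (n_paths, n_assets, horizon + 1)

# Toy history for two assets (illustrative data only)
rng = np.random.default_rng(1)
toy_prices = 100 * np.exp(np.cumsum(rng.normal(0.005, 0.04, size=(60, 2)), axis=0))
toy_paths = simulate_lognormal_prices(toy_prices, horizon=3, n_paths=5)
print(toy_paths.shape)
```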

## CashFlow Model

Now the model for the income and the expenses is chosen and fitted to the data.

In [8]:
from src.cashflows import NormalCashFlows

In [9]:
income_model = NormalCashFlows()
income_model.fit(cash_flow=arr_income)
expenses_model = NormalCashFlows()
expenses_model.fit(cash_flow=arr_expenses)
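
As a rough illustration of the fit/predict pattern used here, a normal cash-flow model can be sketched as estimating a mean and a standard deviation and then sampling independent draws per scenario and period. The class `SketchNormalCashFlows` is a hypothetical stand-in, not the framework's `NormalCashFlows`, and the `(n_paths, horizon)` output shape is an assumption based on how `income_syms[s_i, t_i]` is indexed later.

```python
import numpy as np

class SketchNormalCashFlows:
    """Hypothetical stand-in for a normal cash-flow model (not the framework's class)."""

    def fit(self, cash_flow):
        values = np.asarray(cash_flow, dtype=float)
        self.mean_ = values.mean()
        self.std_ = values.std(ddof=1)   # sample standard deviation
        return self

    def predict(self, horizon, n_paths, seed=0):
        rng = np.random.default_rng(seed)
        # One independent normal draw per scenario and period
        return rng.normal(self.mean_, self.std_, size=(n_paths, horizon))

# Illustrative history with the same mean and std used for the synthetic income above
history = np.random.default_rng(1).normal(2000, 50, size=120)
sketch = SketchNormalCashFlows().fit(history)
paths = sketch.predict(horizon=26, n_paths=150)
print(paths.shape)
```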

## Asset Allocation

For the last step, the optimization model is chosen and solved.

These are the basic parameters for the model. They are included in the `constants.py` file, but are set explicitly here for explanatory purposes.

In [11]:
N_SCENARIOS = 150  # More scenarios give a more accurate solution, but make the problem harder to solve
HORIZON_PERIODS = 26  # Number of periods to plan for (monthly periods here, since the price data is monthly). Only the first step is executed, but planning several steps ahead yields more accurate and realistic decisions
STARTING_CASH = 100_000
TRADING_FEE = 0.001
CVAR_ALPHA1 = 0.8  # The quantile for which the CVaR is calculated. The smaller it is, the more the extreme cases are taken into account.
CVAR_GAMMA1 = 0.7  # Weight given to CVaR in the objective function
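
To make the two CVaR parameters more concrete, the sketch below computes an empirical CVaR as the average of the worst outcomes in a sample and blends it with the mean. Both helpers are hypothetical: how `CVAR_ALPHA1` maps to the tail fraction and the exact form of the objective depend on the optimization model, which is not shown here. The convex combination of mean and CVaR is an assumption made only for illustration.

```python
import numpy as np

def empirical_cvar(wealth, tail_fraction):
    """Average of the worst `tail_fraction` share of outcomes (hypothetical helper)."""
    ordered = np.sort(np.asarray(wealth, dtype=float))
    n_tail = max(1, int(np.ceil(round(tail_fraction * len(ordered), 9))))
    return ordered[:n_tail].mean()

def mean_cvar_score(wealth, tail_fraction, gamma):
    """Assumed objective form: (1 - gamma) * mean + gamma * CVaR."""
    wealth = np.asarray(wealth, dtype=float)
    return (1 - gamma) * wealth.mean() + gamma * empirical_cvar(wealth, tail_fraction)

toy_wealth = np.arange(1, 11)                      # outcomes 1, 2, ..., 10
cvar_20 = empirical_cvar(toy_wealth, 0.2)          # mean of the two worst outcomes
score = mean_cvar_score(toy_wealth, 0.2, gamma=0.7)
print(cvar_20, score)
```

With a larger `gamma`, the score leans more on the bad scenarios and less on the average outcome.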

The simulated paths of cash flows and prices are generated according to the fitted models.

In [12]:
df_prices.index[-1]  # Check that the price data is up to date, so that the simulation starts from a recent price

Timestamp('2024-01-31 00:00:00')

In [13]:
prices_syms = returns_model.predict(horizon=HORIZON_PERIODS, n_paths=N_SCENARIOS) # Prices are simulated
income_syms = income_model.predict(horizon=HORIZON_PERIODS, n_paths=N_SCENARIOS) # Income is simulated
expenses_syms = expenses_model.predict(horizon=HORIZON_PERIODS, n_paths=N_SCENARIOS) # Expenses are simulated

Now the data is arranged in the format expected by the model, providing all model parameters and sets.

In [14]:


non_cash_assets = list(df_prices.columns)
sScenarios = list(range(N_SCENARIOS))
sInitialTime = [0]
sIntermediateTime = list(range(1, HORIZON_PERIODS))
sFinalTime = [HORIZON_PERIODS]
sNonInitialTimes = sIntermediateTime + sFinalTime
sNonFinalTimes = sInitialTime + sIntermediateTime
sTime = sInitialTime + sIntermediateTime + sFinalTime

data = {
    None: {
        'sInitialTime': {None: sInitialTime},
        'sIntermediateTime': {None: sIntermediateTime},
        'sFinalTime': {None: sFinalTime},
        'sNonCashAssets': {None: non_cash_assets},
        'sScenarios': {None: sScenarios},
        'pPrices': {(s_i,non_cash_assets[a_i], t_i): prices_syms[s_i,a_i,t_i] for s_i in sScenarios for t_i in sTime for a_i in range(len(non_cash_assets))},
        'pInitialNonCashAllocations': {a: 0 for a in non_cash_assets},
        'pInitialCashAllocations': {None: STARTING_CASH},
        'pIncome': {(s_i,t_i): income_syms[s_i, t_i] for s_i in sScenarios for t_i in sNonFinalTimes},
        'pExpense': {(s_i,t_i): expenses_syms[s_i, t_i] for s_i in sScenarios for t_i in sNonFinalTimes},
        'pTradeFee': {None: TRADING_FEE},
        'pCVaRAlpha': {None: CVAR_ALPHA1},
        'pCVaRGamma': {None: CVAR_GAMMA1},
    }
}
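
The way the time sets and the indexed parameters fit together is easier to see on a tiny case. The sketch below rebuilds the same kind of sets and a price dictionary for a horizon of 3, 2 scenarios and 2 assets; the asset names and the price array are made up, and only the indexing pattern mirrors the cell above.

```python
import numpy as np

horizon, n_scenarios = 3, 2
assets = ["A", "B"]                      # illustrative asset names

initial = [0]
intermediate = list(range(1, horizon))   # [1, 2]
final = [horizon]                        # [3]
all_times = initial + intermediate + final
non_final = initial + intermediate

# Toy price array with layout (scenario, asset, time)
prices = np.arange(n_scenarios * len(assets) * (horizon + 1), dtype=float)
prices = prices.reshape(n_scenarios, len(assets), horizon + 1)

# Keys are (scenario, asset name, time), as in 'pPrices'
p_prices = {
    (s, assets[a], t): prices[s, a, t]
    for s in range(n_scenarios)
    for a in range(len(assets))
    for t in all_times
}
print(all_times, non_final, len(p_prices))
```

Prices are needed at every time step, including the final one, whereas income and expenses are only indexed over the non-final times.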


Now the optimization model is created, the data is fed, the solver is chosen and the model is solved.

In [15]:
from src.optimization.om_discrete_cvar import create_model as create_model_discrete
from src.optimization.om_continuous_cvar import create_model as create_model_continuous
from pyomo.environ import SolverFactory
from pyomo.opt.results import SolverStatus
import time

## Continuous

In [17]:
optimization_model = create_model_continuous()
instance = optimization_model.create_instance(data)
# solver = SolverFactory('cbc')  # Open-source alternative, although not very fast. Any solver that works with Pyomo can be chosen.
solver = SolverFactory('gurobi', solver_io="python")
# Note that for the optimization to work, the solver needs to be installed on the machine.
solver.options['TimeLimit'] = 300  # Maximum solve time, in seconds
solver.options['NoRelHeurTime'] = 120  # Seconds allotted to Gurobi's NoRel heuristic (only used for MIP models)

print('Starting to solve...')
t1 = time.time()
result_obj = solver.solve(instance, tee=True)
# If the solver stopped without an "ok" status (e.g. at the time limit),
# downgrade the status to a warning so the best solution found can still be loaded.
if result_obj.Solver.Status != SolverStatus.ok:
    result_obj.Solver.Status = SolverStatus.warning
    instance.solutions.load_from(result_obj)
t2 = time.time()
print(f"Solved in {t2-t1:.2f} s")

Starting to solve...
Set parameter TimeLimit to value 300
Set parameter NoRelHeurTime to value 120
Gurobi Optimizer version 11.0.0 build v11.0.0rc2 (linux64 - "Ubuntu 22.04.3 LTS")

CPU model: Intel(R) Core(TM) i5-4300U CPU @ 1.90GHz, instruction set [SSE2|AVX|AVX2]
Thread count: 2 physical cores, 4 logical processors, using up to 4 threads

Academic license 2460039 - for non-commercial use only - registered to fe___@alu.comillas.edu
Optimize a model with 20663 rows, 16764 columns and 152401 nonzeros
Model fingerprint: 0xbaf370bd
Coefficient statistics:
  Matrix range     [1e-03, 8e+04]
  Objective range  [1e+00, 1e+00]
  Bounds range     [0e+00, 0e+00]
  RHS range        [3e+02, 1e+05]
Presolve removed 8129 rows and 8120 columns
Presolve time: 0.28s
Presolved: 12534 rows, 8644 columns, 108243 nonzeros

Concurrent LP optimizer: dual simplex and barrier
Showing barrier log only...

Ordering time: 0.93s

Barrier statistics:
 Free vars  : 217
 AA' NZ     : 4.538e+05
 Factor NZ  : 3.962e+0

In [18]:
print(result_obj['Problem'])
print(result_obj['Solver'])


- Name: unknown
  Lower bound: 127535.02085151561
  Upper bound: 127535.02085151561
  Number of objectives: 1
  Number of constraints: 20663
  Number of variables: 16764
  Number of binary variables: 0
  Number of integer variables: 0
  Number of continuous variables: 16764
  Number of nonzeros: 152401
  Sense: -1
  Number of solutions: 1


- Name: Gurobi 11.00
  Status: ok
  Wallclock time: 23.25827693939209
  Termination condition: optimal
  Termination message: Model was solved to optimality (subject to tolerances), and an optimal solution is available.



In [19]:
from src.results import ResultsAnalyzer

In [21]:
analyzer = ResultsAnalyzer(instance)
# Plot the distribution of any variable/parameter (here, total wealth at time 5)
analyzer.plot_dist(name='vTotalWealth', col_names=['scenario', 'time'], filter={'time':5}, inline_plot=True)


In [22]:
# Plot the time series evolution with a given confidence interval
analyzer.plot_ci(name='vTotalWealth', time_col='time', scenario_col='scenario',filter=None, col_names=['scenario', 'time'], confidence=.90, inline_plot=True)

In [24]:
analyzer.plot_ci(name='vCashAllocations', time_col='time', scenario_col='scenario',filter=None, col_names=['scenario', 'time'], confidence=.95, inline_plot=True)
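
A confidence band over scenarios, like the one these plots display, can be approximated with per-period quantiles. The sketch below is an assumption about what such a plot summarizes, not the `ResultsAnalyzer.plot_ci` implementation; `scenario_band` and the toy wealth paths are hypothetical.

```python
import numpy as np

def scenario_band(values, confidence=0.90):
    """Per-period lower bound, median and upper bound across scenarios (hypothetical helper)."""
    values = np.asarray(values, dtype=float)   # shape (n_scenarios, n_times)
    tail = (1.0 - confidence) / 2.0
    lower = np.quantile(values, tail, axis=0)
    median = np.quantile(values, 0.5, axis=0)
    upper = np.quantile(values, 1.0 - tail, axis=0)
    return lower, median, upper

# Toy wealth paths: 150 scenarios over 27 time steps (illustrative data only)
rng = np.random.default_rng(0)
toy_wealth = 100_000 + rng.normal(300, 500, size=(150, 27)).cumsum(axis=1)
lower, median, upper = scenario_band(toy_wealth, confidence=0.90)
print(lower[-1], median[-1], upper[-1])
```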

## Discrete

In [25]:
optimization_model = create_model_discrete()
instance = optimization_model.create_instance(data)
# solver = SolverFactory('cbc')  # Open-source alternative, although not very fast. Any solver that works with Pyomo can be chosen.
solver = SolverFactory('gurobi', solver_io="python")
# Note that for the optimization to work, the solver needs to be installed on the machine.
solver.options['TimeLimit'] = 300  # Maximum solve time, in seconds
solver.options['NoRelHeurTime'] = 120  # Seconds allotted to Gurobi's NoRel heuristic before the main MIP search

print('Starting to solve...')
t1 = time.time()
result_obj = solver.solve(instance, tee=True)
# If the solver stopped without an "ok" status (e.g. at the time limit),
# downgrade the status to a warning so the best solution found can still be loaded.
if result_obj.Solver.Status != SolverStatus.ok:
    result_obj.Solver.Status = SolverStatus.warning
    instance.solutions.load_from(result_obj)
t2 = time.time()
print(f"Solved in {t2-t1:.2f} s")

Starting to solve...
Set parameter TimeLimit to value 300
Set parameter NoRelHeurTime to value 120
Gurobi Optimizer version 11.0.0 build v11.0.0rc2 (linux64 - "Ubuntu 22.04.3 LTS")

CPU model: Intel(R) Core(TM) i5-4300U CPU @ 1.90GHz, instruction set [SSE2|AVX|AVX2]
Thread count: 2 physical cores, 4 logical processors, using up to 4 threads

Academic license 2460039 - for non-commercial use only - registered to fe___@alu.comillas.edu
Optimize a model with 20663 rows, 16764 columns and 152401 nonzeros
Model fingerprint: 0xcf739155
Variable types: 16053 continuous, 711 integer (0 binary)
Coefficient statistics:
  Matrix range     [1e-03, 8e+04]
  Objective range  [1e+00, 1e+00]
  Bounds range     [0e+00, 0e+00]
  RHS range        [3e+02, 1e+05]
Found heuristic solution: objective 57.0304421
Presolve removed 8438 rows and 8279 columns
Presolve time: 0.43s
Presolved: 12225 rows, 8485 columns, 106125 nonzeros
Variable types: 7801 continuous, 684 integer (0 binary)
Starting NoRel heuristic
F

In [26]:
print(result_obj['Problem'])
print(result_obj['Solver'])


- Name: unknown
  Lower bound: 126295.02490712496
  Upper bound: 127495.93013992083
  Number of objectives: 1
  Number of constraints: 20663
  Number of variables: 16764
  Number of binary variables: 0
  Number of integer variables: 711
  Number of continuous variables: 16053
  Number of nonzeros: 152401
  Sense: -1
  Number of solutions: 10


- Name: Gurobi 11.00
  Wallclock time: 300.2094039916992
  Termination condition: maxTimeLimit
  Termination message: Optimization terminated because the time expended exceeded the value specified in the TimeLimit parameter.
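
One way to quantify how far this run is from proven optimality is the relative gap between the two reported bounds. The sketch below assumes that, for this maximization problem (`Sense: -1`), the reported lower bound is the best feasible objective found and the upper bound is the best bound proven by the solver. The helper `relative_gap` is illustrative and not part of the framework.

```python
def relative_gap(best_bound, incumbent):
    """Relative distance between the best bound and the best solution found."""
    return abs(best_bound - incumbent) / abs(incumbent)

# Values copied from the results printed above
lower_bound = 126295.02490712496   # assumed: best feasible objective found
upper_bound = 127495.93013992083   # assumed: best proven bound

gap = relative_gap(upper_bound, lower_bound)
print(f"Relative gap: {gap:.2%}")
```

Under these assumptions the gap is a little under 1%, so the solution found is close to the best bound even though optimality was not proven within the time limit.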



In [27]:
from src.results import ResultsAnalyzer

In [28]:
analyzer = ResultsAnalyzer(instance)
# Plot the distribution of any variable/parameter (here, total wealth at time 5)
analyzer.plot_dist(name='vTotalWealth', col_names=['scenario', 'time'], filter={'time':5}, inline_plot=True)


In [29]:
# Plot the time series evolution with a given confidence interval
analyzer.plot_ci(name='vTotalWealth', time_col='time', scenario_col='scenario',filter=None, col_names=['scenario', 'time'], confidence=.90, inline_plot=True)

In [30]:
analyzer.plot_ci(name='vCashAllocations', time_col='time', scenario_col='scenario',filter=None, col_names=['scenario', 'time'], confidence=.95, inline_plot=True)

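As we can see, the continuous allocation problem is solved to optimality in less than 30 seconds (about 23 seconds, with an objective of 127,535.02), whereas the discrete one does not prove optimality within the 300-second time limit and stops with a lower bound of 126,295.02 and an upper bound of 127,495.93. Furthermore, because the discrete problem is more constrained, its objective value can be at most that of the continuous one.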
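
The effect of the extra constraint can be reproduced on a toy problem that is small enough to enumerate. The sketch below is not the framework's model: the budget, prices and expected returns are made up, fees are ignored, and uninvested cash is assumed to earn nothing. It compares buying fractional units with buying whole units only.

```python
from itertools import product

budget = 100.0
prices = [30.0, 45.0]            # illustrative unit prices
expected_returns = [0.10, 0.08]  # illustrative one-period expected returns

# Continuous case: fractional units are allowed, so the whole budget
# goes into the asset with the highest expected return.
continuous_value = budget * (1.0 + max(expected_returns))

# Discrete case: only whole units can be bought; enumerate all affordable combinations.
max_units = [int(budget // p) for p in prices]
discrete_value = max(
    budget + sum(n * p * r for n, p, r in zip(units, prices, expected_returns))
    for units in product(*(range(m + 1) for m in max_units))
    if sum(n * p for n, p in zip(units, prices)) <= budget
)

print(continuous_value, discrete_value)
```

In this toy case the whole-unit restriction leaves part of the budget uninvested, so the discrete optimum is lower than the continuous one.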