# Example: Online Stochastic Bandit Algorithms for Portfolio Rebalancing
In this example, we propose a stochastic multi-armed bandit approach to portfolio rebalancing: a bandit algorithm selects which assets to include in the portfolio, and a Utility Maximization (UM) step determines the optimal weights for the selected assets. Thus, we simultaneously address the asset selection and weight allocation problems in portfolio management.

> __Learning Objectives__
> 
> By the end of this example, you should be able to:
> * __Apply multi-armed bandit algorithms__ to explore different portfolio compositions by balancing exploration of uncertain subsets with exploitation of known high-utility combinations.
> * __Formulate and solve utility maximization problems__ using Cobb-Douglas utility functions to determine optimal asset share counts for a selected portfolio, noting that this problem has an analytical solution once the asset selection is fixed.
> * __Integrate bandit selection with utility maximization__ to demonstrate how these two approaches work together to simultaneously address asset selection and weight allocation problems in portfolio management.

This material will be presented at the upcoming [INFORMS Annual Meeting 2025](https://meetings.informs.org/wordpress/annual/) in Atlanta, GA on October 26-29, 2025. Let's get started!
___

## Setup, Data, and Prerequisites
First, we set up the computational environment by including the `Include-informs.jl` file and loading any needed resources.

> __Include:__ The [`include(...)` command](https://docs.julialang.org/en/v1/base/base/#include) evaluates the contents of the input source file, `Include-informs.jl`, in the notebook's global scope. The `Include-informs.jl` file sets paths, loads required external packages, etc.

Let's set up our code environment:

In [3]:
include(joinpath(@__DIR__, "Include-informs.jl")); # include the Include-informs.jl file

For additional information on functions and types used in this material, see the [Julia programming language documentation](https://docs.julialang.org/en/v1/) and the [VLQuantitativeFinancePackage.jl documentation](https://github.com/varnerlab/VLQuantitativeFinancePackage.jl). 

### Data
We gathered daily open-high-low-close (OHLC) data for each firm in the [S&P 500](https://en.wikipedia.org/wiki/S%26P_500) from `01-03-2025` until `10-17-2025`, along with data for several exchange-traded funds and volatility products during that time period.

Let's load the out-of-sample data into the `original_out_of_sample_dataset::Dict{String, DataFrame}` variable by calling [the `MyTestingMarketDataSet()` function](https://varnerlab.github.io/VLQuantitativeFinancePackage.jl/dev/data/#VLQuantitativeFinancePackage.MyTestingMarketDataSet).

In [6]:
original_out_of_sample_dataset = MyTestingMarketDataSet() |> x-> x["dataset"] # load the original dataset (testing)

Dict{String, DataFrame} with 482 entries:
  "NI"   => 182×8 DataFrame…
  "EMR"  => 182×8 DataFrame…
  "CTAS" => 182×8 DataFrame…
  "HSIC" => 182×8 DataFrame…
  "KIM"  => 182×8 DataFrame…
  "PLD"  => 182×8 DataFrame…
  "IEX"  => 182×8 DataFrame…
  "BAC"  => 182×8 DataFrame…
  "CBOE" => 182×8 DataFrame…
  "EXR"  => 182×8 DataFrame…
  "NCLH" => 182×8 DataFrame…
  "CVS"  => 182×8 DataFrame…
  "DRI"  => 182×8 DataFrame…
  "DTE"  => 182×8 DataFrame…
  "ZION" => 182×8 DataFrame…
  "AVY"  => 182×8 DataFrame…
  "EW"   => 182×8 DataFrame…
  "EA"   => 182×8 DataFrame…
  "NWSA" => 182×8 DataFrame…
  "BBWI" => 182×8 DataFrame…
  "CAG"  => 182×8 DataFrame…
  "GPC"  => 182×8 DataFrame…
  "FCX"  => 182×8 DataFrame…
  "GILD" => …

Not all tickers in our dataset have the maximum number of trading days for various reasons, such as acquisition or delisting events. Let's collect only those tickers with the maximum number of trading days.

First, let's compute the number of records for a firm that we know has the maximum value, e.g., `AAPL`, and save that value in the `maximum_number_trading_out_of_sample_days::Int64` variable:

In [8]:
maximum_number_trading_out_of_sample_days = original_out_of_sample_dataset["AAPL"] |> nrow; # maximum number of trading days in our dataset 

Now, let's iterate through our data and collect only tickers with `maximum_number_trading_out_of_sample_days` records. Save that data in the `dataset::Dict{String,DataFrame}` variable:

In [10]:
dataset = let

    # initialize -
    dataset = Dict{String, DataFrame}();

    # iterate through the dictionary; we can't guarantee a particular order
    for (ticker, data) ∈ original_out_of_sample_dataset  # we get each (K, V) pair!
        if (nrow(data) == maximum_number_trading_out_of_sample_days) # check if ticker has maximum trading days
            dataset[ticker] = data;
        end
    end
    dataset; # return
end;
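
The same filtering step can also be written with Julia's built-in `filter`, which accepts a dictionary and a predicate on `key => value` pairs. The sketch below is only an illustration: it uses plain vectors in place of `DataFrame` values (so `length` stands in for `nrow`), and the names `toy_dataset`, `toy_max_days`, and `toy_clean` are hypothetical.

```julia
# toy stand-in for the price dictionary: ticker => vector of daily records
toy_dataset = Dict("AAA" => collect(1:5), "BBB" => collect(1:3), "CCC" => collect(1:5));
toy_max_days = length(toy_dataset["AAA"]); # reference ticker with a full history

# keep only the entries whose record count matches the reference count
toy_clean = filter(pair -> length(pair.second) == toy_max_days, toy_dataset);
sort(collect(keys(toy_clean))) # tickers that survive the filter
```

The explicit loop in the cell above does the same thing and may be easier to extend, e.g., if we later want to log which tickers were dropped.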

Let's get a list of the firms in our cleaned dataset and sort them alphabetically. We store the sorted firm ticker symbols in the `list_of_tickers_price_data::Array{String,1}` variable:

In [12]:
list_of_tickers_price_data = keys(dataset) |> collect |> sort;

Finally, let's load the single index model parameters that we computed in the previous example. We'll store this data in the `sim_model_parameters::Dict{String,NamedTuple}` variable. In addition, we return a few other useful variables, such as the historical market growth rate, the mean and variance of the market growth, etc.

In [14]:
sim_model_parameters,Gₘ,Ḡₘ, Varₘ = let

    # initialize -
    path_to_sim_model_parameters = joinpath(_PATH_TO_DATA,"SIMs-SPY-SP500-01-03-14-to-12-31-24.jld2");
    sim_model_parameters = JLD2.load(path_to_sim_model_parameters);
    parameters = sim_model_parameters["data"]; # SIM parameters for each ticker

    Gₘ = sim_model_parameters["Gₘ"]; # Get the past market growth rate 
    Ḡₘ = sim_model_parameters["Ḡₘ"]; # mean of market growth rates
    Varₘ = sim_model_parameters["Varₘ"]; # variance of market growth

    # return -
    parameters,  Gₘ , Ḡₘ, Varₘ;
end;

Now let's get a list of all tickers for which we have single index model parameters:

In [16]:
tickers_that_we_sim_sim_data_for = keys(sim_model_parameters) |> collect |> sort;

We need to use only the tickers for which we have both price data and SIM parameters. We'll compute [the intersection of the two lists](https://docs.julialang.org/en/v1/base/collections/#Base.intersect) and store the result in the `list_of_tickers::Array{String,1}` variable:

In [18]:
list_of_tickers = intersect(tickers_that_we_sim_sim_data_for, list_of_tickers_price_data);

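Finally, we need to compute the matrix `X̄::Array{Float64,2}`, defined as $\bar{\mathbf{X}} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}$, i.e., the left pseudoinverse of $\mathbf{X}$ (which will be handy later), where $\mathbf{X}$ is the market factor matrix (a column of ones next to the market growth rates `Gₘ`):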

In [20]:
X̄ = let
    
    
    Rₘ = Gₘ; # get Gₘ from the sim model parameters
    max_length = length(Rₘ);
    X = [ones(max_length) Rₘ];

    a = inv(transpose(X)*X)*transpose(X)
    a;
end

2×2766 Matrix{Float64}:
  0.000367293  0.000355057  0.0003607   …   0.000382956   0.000370094
 -5.4289e-5    6.10402e-5   7.85216e-6     -0.000201909  -8.06852e-5
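
A matrix of this form is commonly used to recover least-squares estimates: multiplying it by a vector of asset growth rates gives the intercept and slope of a single index model fit. The sketch below illustrates this on synthetic, noise-free data; the names `gₘ`, `gᵢ`, `X_toy`, `X̄_toy`, and `θ̂` are hypothetical and are not part of the notebook's workflow.

```julia
using LinearAlgebra

# synthetic market factor and an asset generated with known (α, β); no noise for clarity
gₘ = [0.02, -0.01, 0.03, 0.00, 0.015];
α_true, β_true = 0.01, 1.2;
gᵢ = α_true .+ β_true .* gₘ;

# same construction as X̄ above, applied to the toy data
X_toy = [ones(length(gₘ)) gₘ];
X̄_toy = inv(transpose(X_toy)*X_toy)*transpose(X_toy);

# least-squares estimate: θ̂[1] is the intercept, θ̂[2] is the slope
θ̂ = X̄_toy*gᵢ;
```

For a one-off fit, `X_toy \ gᵢ` gives the same estimate without forming the inverse explicitly; precomputing the matrix may pay off when the same market factor is reused for many assets.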

### Constants
Finally, let's set some constants we'll use later in this notebook. The comments describe the constants, their units, and permissible values:

In [22]:
Δt = (1/252); # time-step
total_number_of_tickers = length(list_of_tickers); # how many tickers do we have in my dataset?
investment_budget = 10000.0; # initial budget of the agent
risk_free_rate = 0.0418; # risk free rate
μₘ =  Ḡₘ; # expected growth for SPY
B = investment_budget; # TODO: Budget for portfolio. We can change this later if we want
ϵ = 0.001; # hyperparameter: minimum share number for each asset
α::Float64 = 0.80; # learning rate for moving average calculation
Tₘ::Int64 = 10; # how many days for market time
λ::Float64 = 1; # risk scale

### Implementation
Next, we build the `world(...)` function. It takes the time index `t::Int`, the action vector `a::Vector{Int64}`, and a context model. The elements of `a::Vector{Int64}` are binary variables indicating whether to select an asset (`1`) or not (`0`). The length of the action vector `a` is $|\dim\mathcal{P}|$, i.e., the total number of assets available for selection.

> __What's in the action vector?__ The action vector `a::Vector{Int64}` is a binary vector of length $|\dim\mathcal{P}|$ where each element indicates whether to include (`1`) or exclude (`0`) the corresponding asset from the portfolio. For example, if we have 5 assets and the action vector is `[1, 0, 1, 0, 1]`, it means we are selecting assets 1, 3, and 5 for inclusion in the portfolio while excluding assets 2 and 4.
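
To make the action vector concrete, the sketch below reproduces the five-asset example and shows how a binary vector maps to an integer and back using Julia's `digits` function, which this notebook calls later when decoding the best portfolio. How the bandit implementation indexes its arms internally is not shown in this example, so treat the integer encoding here as an illustration only; the names `selected`, `j`, and `a_decoded` are hypothetical.

```julia
K = 5;                                    # number of assets in the toy example
a = [1, 0, 1, 0, 1];                      # include assets 1, 3, and 5
selected = findall(==(1), a);             # indices of the selected assets
j = sum(a[i]*2^(i-1) for i ∈ 1:K);        # integer whose binary digits are a
a_decoded = digits(j, base = 2, pad = K); # least-significant digit first
```

Note that `digits` returns the least-significant digit first, so element `i` of the decoded vector corresponds to asset `i`.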

The `world(...)` function returns the portfolio utility, the optimal number of shares for each selected asset, the fill price for each selected asset, and the user preference values $\gamma_i$ for each selected asset. 

In [24]:
function world(t::Int, a::Vector{Int64}, context::MyDynamicBanditPortfolioAllocationContextModel)

    # initialize -
    total_number_of_assets = context.number_of_assets; # total number of assets that we *could* purchase
    bounds = context.bounds; # bounds for how much we purchase
    B = context.B; # budget for this period
    mylocaltickers = context.tickers; # tickers for the assets in this portfolio
    localdataset = context.dataset; # dataset for the assets in this portfolio
    singleindexmodels = context.singleindexmodels; # single index models for the assets in this portfolio
    X̄ = context.X̄;
    number_of_samples_to_draw = context.number_of_samples_to_draw;

    # what is the min share purchase?
    min_share_purchase = bounds[1,1];

    # Compute the fill price vector -
    pₒ = zeros(total_number_of_assets);
    for i ∈ eachindex(mylocaltickers)
        ticker = mylocaltickers[i];
        H = localdataset[ticker][t,:high];
        L = localdataset[ticker][t,:low];
        f = rand();
        pₒ[i] = f*H + (1-f)*L; # randomness in the fill price
    end

    # next, compute the preference parameters γ -
    γ = zeros(total_number_of_assets);
    for i ∈ eachindex(mylocaltickers)
        ticker = mylocaltickers[i];

        # get the model -
        simmodel = singleindexmodels[ticker];
        α = simmodel.alpha;
        β = simmodel.beta;
        CI_alpha_L = simmodel.alpha_95_CI_lower;
        CI_alpha_U = simmodel.alpha_95_CI_upper;
        CI_beta_L = simmodel.beta_95_CI_lower;
        CI_beta_U = simmodel.beta_95_CI_upper;

        # compute random parameters -
        # αᵢ = rand(Uniform(CI_alpha_L, CI_alpha_U)); # sample from the confidence interval
        # βᵢ = rand(Uniform(CI_beta_L, CI_beta_U));
        αᵢ = α; # use the mean value
        βᵢ = β; # use the mean value

        # Compute the expected growth rate from the single index model -
        R = αᵢ + βᵢ*μₘ; # expected growth (no random draw from the error distribution)
        γ[i] = tanh_fast(R/(βᵢ^λ)); # squash the risk-scaled growth into (-1, 1)
    end

    # Compute the optimal share count -
    n = zeros(total_number_of_assets); # initialize space for optimal solution
    S = findall(aᵢ -> aᵢ == 1, a); # Which assets does our bandit want us to buy?
    number_of_selected_arms = length(S);

    # In the set of assets to explore, do we have any non-preferred assets?
    negative_gamma_flag = any(γ[S] .< 0);
    if (negative_gamma_flag == false)
        
        # easy case: all of my potential assets are preferred.
        γ̄ = sum(γ[S]);
        B̄ = B;
        for s ∈ S
            n[s] = (γ[s]/γ̄)*(B̄/pₒ[s]);
        end
    else

        # hard case: some assets are *not* preferred. 
        
        # Prep work for non-preferred case
        # First: the non-preferred assets are min_share_purchase -
        # Second: Compute the adjusted budget
        # Third: Compute γ̄
        B̄ = B;
        γ̄ = 0.0;
        for s ∈ S
            if (γ[s] < 0.0)
                B̄ += -min_share_purchase*pₒ[s];
                n[s] = min_share_purchase;
            else
                γ̄ += γ[s];
            end
        end

        # compute the optimal preferred assets -
        for s ∈ S
            if (γ[s] ≥ 0.0)
                n[s] = (γ[s]/γ̄)*(B̄/pₒ[s]);
            end
        end
    end

    # premultiplier -
    κ = negative_gamma_flag == true ? -1.0 : 1.0;
    
    # Compute the optimal utility -
    U = κ;
    for s ∈ S
        U *= (n[s]^γ[s])
    end
    
    # Return the utility and the share count that we allocated
    return U, n, pₒ, γ
end;
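
The allocation step inside `world(...)` matches the closed-form solution of a budget-constrained Cobb-Douglas problem. As a sketch (the multiplier $\nu$ and the sets $S^{+}$, $S^{-}$ are notation introduced here for illustration; they do not appear in the code), consider a selected set $S$ in which every $\gamma_s > 0$. Maximizing $\prod_{s\in S} n_s^{\gamma_s}$ is equivalent to maximizing its logarithm:

$$
\begin{aligned}
\max_{n}\;& \sum_{s\in S}\gamma_s \ln n_s \quad \text{subject to}\quad \sum_{s\in S} p_s\, n_s = B \\
\frac{\gamma_s}{n_s} &= \nu\, p_s \;\Longrightarrow\; p_s\, n_s = \frac{\gamma_s}{\nu}, \qquad \nu = \frac{\bar{\gamma}}{B}, \qquad \bar{\gamma} = \sum_{s\in S}\gamma_s \\
n_s^{\star} &= \frac{\gamma_s}{\bar{\gamma}}\,\frac{B}{p_s}
\end{aligned}
$$

When some selected assets are non-preferred, let $S^{-} = \{s \in S : \gamma_s < 0\}$ and $S^{+} = \{s \in S : \gamma_s \geq 0\}$. The code pins the non-preferred assets at the minimum share count and applies the same rule to the remaining budget:

$$
n_s = \epsilon \;\; (s \in S^{-}), \qquad \bar{B} = B - \epsilon\sum_{s\in S^{-}} p_s, \qquad n_s^{\star} = \frac{\gamma_s}{\bar{\gamma}}\,\frac{\bar{B}}{p_s} \;\; (s\in S^{+}), \qquad \bar{\gamma} = \sum_{s \in S^{+}} \gamma_s
$$

The other two ingredients of the function are the randomized fill price, $p_i = f\,H_i + (1-f)\,L_i$ with $f$ drawn uniformly from $[0,1]$, and the preference parameter $\gamma_i = \tanh\left((\alpha_i + \beta_i\,\mu_m)/\beta_i^{\lambda}\right)$.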

___

## Task 1: Maximum Utility Portfolio Optimization Problem
In this task, we'll solve the maximum utility portfolio optimization problem for a given set of selected assets, i.e., we choose the share counts that maximize __investor satisfaction__ (utility). Let's start by specifying the possible tickers that our online stochastic bandit algorithm can choose from. We'll store these tickers in the `my_test_portfolio_tickers::Array{String,1}` variable:

In [27]:
my_test_portfolio_tickers = ["AAPL", "MSFT", "INTC", "MU", "AMD", "GS", "BAC", "WFC", "C", "F", "GM", 
    "JNJ", "PG", "UPS", "COST", "TGT", "WMT", "MRK", "PFE", "ADBE"]; # tickers selected for portfolio

In [28]:
length(my_test_portfolio_tickers) # how many tickers do we have?

20

Given this set of possible tickers, let's compute the expected growth rate vector and the covariance matrix for these assets using their single index model parameters. We don't use these values in the bandit calculation, but we will need them to compute the Sharpe ratio of the resulting portfolio later. 

We'll store these in the `μ̂_sim::Array{Float64,1}` and `Σ̂_sim::Array{Float64,2}` variables:

In [30]:
μ̂_sim, Σ̂_sim = let

    # initialize -
    N = length(my_test_portfolio_tickers); # number of assets in portfolio
    μ_sim = Array{Float64,1}(); # drift vector
    Σ̂_sim = Array{Float64,2}(undef, N, N); # covariance matrix for *our* portfolio
    
    # Ḡₘ = mean(Gₘ); # mean of market factor # TODO: already loaded above, we'll use 2014 - 2024 data value
    σ²ₘ = Varₘ; # variance of market factor # TODO: already loaded above, we'll use 2014 - 2024 data value

    # compute the expected growth rate (return) for each of our tickers -
    for i ∈ eachindex(my_test_portfolio_tickers)
        ticker = my_test_portfolio_tickers[i];
        data = sim_model_parameters[ticker]; # get the data for this ticker
        αᵢ = data.alpha; # get alpha
        βᵢ = data.beta; # get beta
        Ḡᵢ = αᵢ + βᵢ* Ḡₘ; # compute the growth rate for this ticker
        push!(μ_sim, Ḡᵢ); # append growth rate value to μ_sim
    end

    # compute the covariance matrix using the single index model -
    for i ∈ eachindex(my_test_portfolio_tickers)

        ticker_i = my_test_portfolio_tickers[i];
        data_i = sim_model_parameters[ticker_i]; # get the data for ticker i
        βᵢ = data_i.beta; # get beta for ticker i
        σ²_εᵢ = (Δt)*data_i.training_variance; # residual variance for ticker i

        for j ∈ eachindex(my_test_portfolio_tickers)
            
            ticker_j = my_test_portfolio_tickers[j];
            data_j = sim_model_parameters[ticker_j]; # get the data for ticker j
            βⱼ = data_j.beta; # get beta for ticker j
            σ²_εⱼ = (Δt)*data_j.training_variance; # residual variance for ticker j
            
            if i == j
                Σ̂_sim[i,j] = βᵢ*βⱼ*σ²ₘ + σ²_εᵢ; # diagonal elements
            else
                Σ̂_sim[i,j] = βᵢ*βⱼ*σ²ₘ; # off-diagonal elements
            end
        end
    end

    (μ_sim, Σ̂_sim*Δt); # return
end;
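
The nested loop above fills in the single index model covariance, $\Sigma_{ij} = \beta_i\beta_j\sigma_m^2 + \delta_{ij}\,\sigma^2_{\varepsilon_i}$, one entry at a time. As a compact cross-check, the same structure can be written as an outer product plus a diagonal matrix. The numbers below are made up for illustration, the names `β_toy`, `σ²_ε_toy`, `σ²_m_toy`, and `Σ_toy` are hypothetical, and the `Δt` scaling used in the cell above is deliberately left out.

```julia
using LinearAlgebra

β_toy = [1.1, 0.8, 1.4];          # toy betas
σ²_ε_toy = [0.02, 0.01, 0.03];    # toy residual variances
σ²_m_toy = 0.04;                  # toy market variance

# systematic part (outer product of betas) plus idiosyncratic part (diagonal)
Σ_toy = σ²_m_toy .* (β_toy*transpose(β_toy)) + Diagonal(σ²_ε_toy);
```

Because the systematic part is a rank-one matrix and the residual variances are positive, a covariance built this way is symmetric and positive definite.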

To implement our bandit strategy, we first build the epsilon-greedy dynamic noise algorithm model. This model holds some information about the problem, e.g., the number of assets available for selection, and the learning rate $\alpha$. We store this model in the `dynamic_algorithm_model::MyEpsilonGreedyDynamicNoiseAlgorithmModel` variable:

In [32]:
dynamic_algorithm_model = let

    # initialize -
    algorithm = nothing; # Initialize the algorithm variable to nothing; this variable will be used to store the algorithm model
    K = length(my_test_portfolio_tickers); # TODO: Number of tickers to consider
    
    # Build the algorithm model -
    algorithm = build(MyEpsilonGreedyDynamicNoiseAlgorithmModel, (
        K = K, # arms 
        α = α, # learning rate
    ));

    # return the algorithm -
    algorithm;
end;

Next, we set bounds on the number of shares that we can hold for each asset in our portfolio. The lower bound `ϵ::Float64` represents the __minimum number of shares__ required for portfolio inclusion, while the upper bound is set to `Inf` (no cap) to allow for flexible allocation. 

> __Why a non-zero lower bound?__ The non-zero lower bound `ϵ::Float64` follows from the form of the Cobb-Douglas utility function, which is a product over the selected assets: if any selected asset with a positive exponent has zero allocation, the overall utility becomes zero, which does not reflect realistic investor behavior. Thus, we set a small positive value for `ϵ::Float64` to ensure that each selected asset contributes positively to the portfolio's utility.

We'll store this in the `static_share_bounds::Array{Float64,2}` variable:

In [34]:
static_share_bounds = let

    # initialize -
    K = length(my_test_portfolio_tickers); # TODO: Number of tickers to consider
    bounds = Array{Float64,2}(undef, K, 2);
    
    # build my bounds array -
    for i ∈ eachindex(my_test_portfolio_tickers)
        bounds[i,1] = ϵ; # min shares that we can hold of this asset
        bounds[i,2] = Inf; # max number of shares, Inf says this is unbounded
    end
    bounds;
end;
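
The reason for the lower bound can be checked numerically. The sketch below evaluates a two-asset Cobb-Douglas product for a few hand-picked holdings; all numbers and the names `γ_pos`, `U_ok`, `U_zero`, `U_blowup`, and `U_floor` are hypothetical and chosen only to show the edge cases.

```julia
γ_pos = [0.3, 0.7];                              # toy positive exponents

U_ok = prod([2.0, 3.0] .^ γ_pos);                # both holdings positive
U_zero = prod([0.0, 3.0] .^ γ_pos);              # one zero holding collapses the product
U_blowup = prod([0.0, 3.0] .^ [-0.3, 0.7]);      # zero holding with a negative exponent
U_floor = prod([0.001, 3.0] .^ [-0.3, 0.7]);     # a small floor keeps the value finite
```

A zero holding therefore either wipes out the utility or, for a negative exponent, makes it infinite, which is why a small positive floor is a convenient choice here.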

Now we build the context model that encapsulates all the data and parameters needed for the bandit algorithm to make portfolio decisions. The context includes the market data, single index model parameters, ticker symbols, budget constraints, and bounds on share purchases. We'll store this in the `dynamic_context_model::MyDynamicBanditPortfolioAllocationContextModel` variable:

In [36]:
dynamic_context_model = let

    # initialize -
    K = length(my_test_portfolio_tickers); # TODO: Number of tickers to consider
    nₒ = ones(K); # initial guess, assume 1 x share for each

    # build -
    contextmodel = build(MyDynamicBanditPortfolioAllocationContextModel, (
        dataset = dataset,
        singleindexmodels = sim_model_parameters,
        tickers = my_test_portfolio_tickers,
        nₒ = nₒ,
        B = B,
        bounds = static_share_bounds,
        X̄ = X̄,
        R̄ₘ = Ḡₘ,
        number_of_samples_to_draw = size(X̄,2),
        μₒ = zeros(2^K) # initially we have *no* idea which arm is best
    ));

    contextmodel;
end;

Finally, we test our approach by solving the utility maximization problem for the case where we select all available assets (by setting the action vector `a::Vector{Int64}` to all ones). This gives us a baseline utility `U::Float64` we could achieve with our full portfolio. 

> __Expectation:__ If this doesn't blow up, then it will return a finite utility value, optimal share counts, fill prices, and preference parameters for each asset in the portfolio. Then we can move on to implementing the bandit algorithm to select subsets of assets dynamically.

So what do we see?

In [38]:
(U, n, pₒ, γ) = let
   
    # call my world function to test -
    K = length(my_test_portfolio_tickers); # TODO: Number of tickers to consider
    t = 1; # what time period are we in?

    # call the world function directly -
    U, n, pₒ, γ = world(t, ones(Int64,K), dynamic_context_model);
end;

In [39]:
U

-17.70410917415204
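
The negative value is consistent with the `κ` premultiplier in `world(...)`: the product of positive share counts raised to real powers is always positive, so a negative utility can only arise when the selected set contains at least one asset with $\gamma_i < 0$. The sketch below mirrors that allocation branch on three made-up assets. The helper `toy_allocation` is hypothetical, treats every asset as selected, and is not part of the notebook's codebase.

```julia
# Hypothetical helper mirroring the allocation logic of world(...) on toy inputs.
# γ: preference parameters, p: prices, B: budget, ϵ: minimum share count
function toy_allocation(γ::Vector{Float64}, p::Vector{Float64}, B::Float64; ϵ::Float64 = 0.001)
    n = zeros(length(γ));
    preferred = findall(γ .≥ 0.0);
    nonpreferred = findall(γ .< 0.0);
    n[nonpreferred] .= ϵ;                      # hold the minimum share count
    B̄ = B - ϵ*sum(p[nonpreferred]);            # budget left for the preferred assets
    γ̄ = sum(γ[preferred]);
    n[preferred] .= (γ[preferred] ./ γ̄) .* (B̄ ./ p[preferred]);
    κ = isempty(nonpreferred) ? 1.0 : -1.0;    # sign flip if a non-preferred asset is held
    U = κ*prod(n .^ γ);
    return U, n
end

γ_toy = [0.2, 0.1, -0.02]; p_toy = [100.0, 50.0, 20.0]; B_toy = 1000.0;
U_toy, n_toy = toy_allocation(γ_toy, p_toy, B_toy);
```

In this toy case, the third asset is held at the minimum share count, the remaining budget is split between the first two assets in proportion to their `γ` values, and the utility comes out negative.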

__Preferred versus Non-Preferred Assets:__ The `γ::Array{Float64,1}` vector computed above indicates whether an asset is preferred or non-preferred. If $\gamma_i \geq 0$, then the asset is preferred; otherwise, it is non-preferred. Let's look at what assets are preferred versus non-preferred:

In [41]:
let

    # initialize -
    df = DataFrame();

    for i ∈ eachindex(my_test_portfolio_tickers)
        ticker = my_test_portfolio_tickers[i];

        # get the model -
        row_df = (
            ticker = ticker,
            γᵢ = γ[i],
            preferred = γ[i] >= 0 ? "Yes" : "No",

        );
        push!(df, row_df);
    end

    # make a table -
     pretty_table(df, backend = :text, fit_table_in_display_vertically = false, fit_table_in_display_horizontally = false,
         table_format = TextTableFormat(borders = text_table_borders__compact)
    );
end

 -------- ------------ -----------
  ticker           γᵢ   preferred
  String      Float64      String
 -------- ------------ -----------
    AAPL      0.19245         Yes
    MSFT     0.190485         Yes
    INTC   -0.0192358          No
      MU    0.0745648         Yes
     AMD     0.177492         Yes
      GS    0.0810726         Yes
     BAC    0.0659029         Yes
     WFC    0.0322735         Yes
       C     0.017186         Yes
       F   -0.0283554          No
      GM     0.018269         Yes
     JNJ    0.0759124         Yes
      PG     0.135188         Yes
     UPS    0.0203514         Yes
    COST     0.255981         Yes
     TGT    0.0826703         Yes
     WMT     0.230516         Yes
     MRK     0.111904         Yes
     PFE   -0.0210359          No
    ADBE     0.144215         Yes
 -------- ------------ -----------


Now that we have the preferred versus non-preferred assets, we'll make a table that summarizes the resulting shares `n::Array{Float64,1}` purchased for each asset in our test portfolio:

In [43]:
let

    # initialize -
    df = DataFrame();
    CS = 0;
    t = 1;
    K = length(my_test_portfolio_tickers); # TODO: Number of tickers to consider
    context = dynamic_context_model;
    total_number_of_assets = context.number_of_assets; # total number of assets that we *could* purchase
    bounds = context.bounds; # bounds for how much we purchase
    B = context.B; # budget for this period
    mylocaltickers = context.tickers; # tickers for the assets in this portfolio
    localdataset = context.dataset; # dataset for the assets in this portfolio
    singleindexmodels = context.singleindexmodels; # single index models for the assets in this portfolio
    N = 2^K; # number of possible portfolios
    

    for i ∈ 1:K

        CS += n[i]*pₒ[i];
        wᵢ = (n[i]*pₒ[i]/B);

        row_df = (
            ticker = my_test_portfolio_tickers[i],
            Sᵢ = pₒ[i],
            γᵢ = γ[i],
            nᵢ = n[i],
            wᵢ = wᵢ,
            UC = n[i]*pₒ[i],
            CS = CS,
        );
        push!(df, row_df);
    end

    pretty_table(df, backend = :text, fit_table_in_display_vertically = false, fit_table_in_display_horizontally = false,
         table_format = TextTableFormat(borders = text_table_borders__compact)
    );
end

 -------- --------- ------------ ---------- ------------ ------------ ---------
  ticker        Sᵢ           γᵢ         nᵢ           wᵢ           UC        CS
  String   Float64      Float64    Float64      Float64      Float64   Float64
 -------- --------- ------------ ---------- ------------ ------------ ---------
    AAPL   242.232      0.19245    4.16738     0.100947      1009.47   1009.47
    MSFT   420.973     0.190485    2.37346    0.0999163      999.163   2008.63
    INTC   20.2662   -0.0192358      0.001   2.02662e-6    0.0202662   2008.65
      MU   89.0603    0.0745648    4.39163     0.039112       391.12   2399.77
     AMD   125.309     0.177492    7.42974    0.0931013      931.013   3330.79
      GS   575.044    0.0810726   0.739518    0.0425255      425.255   3756.04
     BAC    44.779    0.0659029     7.7198    0.0345685      345.685   4101.73
     WFC

## Task 2: Solve the Online Bandit Portfolio Optimization Problem
In this task, we use the epsilon-greedy bandit algorithm to explore different portfolio combinations and learn which asset subsets maximize investor utility. 

The algorithm iteratively selects actions (portfolio compositions), observes the resulting utility, and updates its beliefs about the reward distribution. The results are stored in the return variables `R::Array{Float64,2}`, `μ::Array{Float64,1}`, `S::Array{Float64,2}`, `P::Array{Float64,2}`, and `G::Array{Float64,2}`. 
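
For intuition, the select, observe, and update loop can be sketched in a few lines. This is a minimal epsilon-greedy loop under stated assumptions, not the implementation behind `my_bandit_solve`: the function `epsilon_greedy_sketch`, a fixed exploration probability `p_explore` (named this way because `ϵ` already denotes the minimum share count in this notebook), and the exponential moving-average update with step size `α` are all choices made here for illustration.

```julia
using Random

# Hypothetical sketch: epsilon-greedy over K arms with a moving-average update.
# reward(a) stands in for the world function and returns the reward of arm a.
function epsilon_greedy_sketch(reward::Function, K::Int; T::Int = 250,
    p_explore::Float64 = 0.1, α::Float64 = 0.8, rng::AbstractRNG = Random.default_rng())

    μ = zeros(K);              # running reward estimate for each arm
    history = zeros(Int, T);   # arm pulled in each round
    for t ∈ 1:T
        a = rand(rng) < p_explore ? rand(rng, 1:K) : argmax(μ); # explore or exploit
        r = reward(a);                                           # observe the reward
        μ[a] = μ[a] + α*(r - μ[a]);                             # moving-average update
        history[t] = a;
    end
    return μ, history
end

# toy usage: four arms with noisy rewards; arm 3 pays the most on average
rng = MersenneTwister(1234);
true_means = [0.1, 0.5, 0.9, 0.3];
μ̂, history = epsilon_greedy_sketch(a -> true_means[a] + 0.05*randn(rng), 4; T = 500, rng = rng);
```

In the portfolio setting of this example, each arm is a portfolio composition (a binary action vector) and the reward is the utility returned by `world(...)`.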

Let's run the algorithm for 250 iterations:

In [45]:
R, μ, S, P, G = my_bandit_solve(dynamic_algorithm_model, T = 250, world = world, context = dynamic_context_model); # run the bandit for T = 250 rounds using our world function

In [46]:
S # share counts for each trial

250×20 Matrix{Float64}:
  4.14801  2.38068  0.001   4.46029  …  13.4101   5.91984  0.001  1.76709
 11.8644   0.0      0.001  12.5902       0.0     16.9063   0.001  0.0
  0.0      0.0      0.001  13.7247       0.0      0.0      0.0    0.0
  0.0      0.0      0.001   0.0          0.0     12.5078   0.001  3.72355
  0.0      0.0      0.001   0.0         49.3494  21.837    0.0    0.0
  0.0      0.0      0.0     0.0      …   0.0      0.0      0.0    0.0
  8.11606  0.0      0.001   0.0          0.0      0.0      0.0    3.45939
  0.0      0.0      0.0     0.0          0.0      0.0      0.0    0.0
  7.93297  4.53686  0.0     8.42912     25.5535  11.3487   0.0    0.0
  7.96371  4.53715  0.0     8.3248      25.5728  11.288    0.0    0.0
  0.0      6.25621  0.0     0.0      …  35.4953  15.6272   0.001  0.0
  0.0      7.39054  0.001   0.0          0.0      0.0      0.0    5.54549
  7.96437  4.54224  0.0     8.32898     25.4604  11.2767   0.0    0.0
  ⋮                                  ⋱            

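We can sort the possible portfolio compositions by the estimated average utility `μ::Array{Float64,1}` to see which portfolios the bandit algorithm thinks are best. Keep in mind that 250 rounds can visit at most 250 of the $2^{20} = 1{,}048{,}576$ possible compositions, so this ranking rests on very sparse sampling: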

In [48]:
sorted_reward_index_array = sortperm(μ, rev=true) # sort the index of the rewards

1048576-element Vector{Int64}:
  204827
  212153
  731202
  207153
  250339
   55538
  783507
   31146
  785475
  716081
  732352
   12787
   18546
       ⋮
  459312
 1039118
  306903
  125165
  158703
  864850
    8031
  473966
  445757
  776935
  219725
  506778

__What's in the best portfolio?__  
Let's build a table that shows the best portfolio composition identified by the bandit algorithm after 250 rounds of exploration. 

The utility value `U::Float64` indicates the investor satisfaction level for this portfolio, while the weights `wᵢ::Float64` show the proportion of budget allocated to each asset. Notice that the bandit algorithm may have selected a portfolio subset (where some `nᵢ::Float64` values are near zero) rather than including all assets, demonstrating how the algorithm balances portfolio complexity with investor utility.

In [50]:
let

    # initialize -
    df = DataFrame();
    CS = 0;
    t = 1;
    K = length(my_test_portfolio_tickers); # TODO: Number of tickers to consider
    context = dynamic_context_model;
    total_number_of_assets = context.number_of_assets; # total number of assets that we *could* purchase
    bounds = context.bounds; # bounds for how much we purchase
    B = context.B; # budget for this period
    mylocaltickers = context.tickers; # tickers for the assets in this portfolio
    localdataset = context.dataset; # dataset for the assets in this portfolio
    singleindexmodels = context.singleindexmodels; # single index models for the assets in this portfolio
    N = 2^K; # number of possible portfolios

    # compute the best portfolio -
    j = sorted_reward_index_array[1]; # get a portfolio index 
    a = digits(j, base=2, pad=K); # generate a binary representation of the number, with K digits  
    if (j == N)
        a = digits(j - 1, base=2, pad=K); # generate a binary representation of the number, with K digits  
    end

    # call world -
    U, n, pₒ, γ = world(t, a, dynamic_context_model);

    @show U,j

    for i ∈ 1:K

        CS += n[i]*pₒ[i];
        wᵢ = (n[i]*pₒ[i]/B);

        row_df = (
            ticker = my_test_portfolio_tickers[i],
            Sᵢ = pₒ[i],
            γᵢ = γ[i],
            nᵢ = n[i],
            wᵢ = wᵢ,
            UC = n[i]*pₒ[i],
            CS = CS,
        );
        push!(df, row_df);
    end

     pretty_table(df, backend = :text, fit_table_in_display_vertically = false, fit_table_in_display_horizontally = false,
         table_format = TextTableFormat(borders = text_table_borders__compact)
    );
end

(U, j) = (10.463601731372094, 204827)
 -------- --------- ------------ --------- ---------- --------- ---------
  ticker        Sᵢ           γᵢ        nᵢ         wᵢ        UC        CS
  String   Float64      Float64   Float64    Float64   Float64   Float64
 -------- --------- ------------ --------- ---------- --------- ---------
    AAPL   244.038      0.19245   7.90375   0.192882   1928.82   1928.82
    MSFT    419.84     0.190485   4.54726   0.190912   1909.12   3837.94
    INTC   20.4067   -0.0192358       0.0        0.0       0.0   3837.94
      MU   90.1479    0.0745648   8.28993   0.074732    747.32   4585.26
     AMD   122.676     0.177492   14.5009    0.17789    1778.9   6364.16
      GS   573.161    0.0810726       0.0        0.0       0.0   6364.16
     BAC   44.7943    0.0659029       0.0        0.0       0.0   6364.16
     WFC   71.3067    0.0322735     

Ok, so what is the Sharpe ratio of this portfolio?

In [52]:
let

    # initialize -
    t = 1;
    K = length(my_test_portfolio_tickers); # TODO: Number of tickers to consider
    N = 2^K; # number of possible portfolios

    # compute the best portfolio -
    j = sorted_reward_index_array[1]; # get a portfolio index 
    a = digits(j, base=2, pad=K); # generate a binary representation of the number, with K digits  
    if (j == N)
        a = digits(j - 1, base=2, pad=K); # generate a binary representation of the number, with K digits  
    end

    # call world -
    U, n, pₒ, γ = world(t, a, dynamic_context_model);

    # compute the weights -
    w = zeros(K);
    for i ∈ 1:K
        w[i] = (n[i]*pₒ[i])/B;
    end

    # get the alpha and beta values for each asset in the portfolio -
    α_vector = zeros(K);
    β_vector = zeros(K);
    for i ∈ 1:K
        ticker = my_test_portfolio_tickers[i];
        α_vector[i] = sim_model_parameters[ticker].alpha;
        β_vector[i] = sim_model_parameters[ticker].beta;
    end

    # compute the alpha and beta for the portfolio, then the Sharpe ratio -
    α_portfolio = dot(w, α_vector);
    β_portfolio = dot(w, β_vector);
    numerator = α_portfolio + β_portfolio*μₘ - risk_free_rate; # expected excess growth
    denominator = sqrt(transpose(w)*Σ̂_sim*w); # portfolio risk
    sharpe_ratio = numerator/denominator;

    println("The Sharpe ratio of this portfolio is: $(sharpe_ratio)");
end

The Sharpe ratio of this portfolio is: 0.8173463948910116
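
For reference, the quantity computed in the cell above can be written compactly as follows, where $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ collect the single index model parameters of the assets, $r_f$ is the risk-free rate, and $\hat{\Sigma}$ is the covariance matrix `Σ̂_sim` (the symbols $\text{SR}$, $\alpha_p$, and $\beta_p$ are shorthand introduced here):

$$
\text{SR} = \frac{\alpha_p + \beta_p\,\mu_m - r_f}{\sqrt{\mathbf{w}^{\top}\hat{\Sigma}\,\mathbf{w}}},
\qquad \alpha_p = \mathbf{w}^{\top}\boldsymbol{\alpha},
\qquad \beta_p = \mathbf{w}^{\top}\boldsymbol{\beta},
\qquad w_i = \frac{n_i\,p_i}{B}
$$

Because `world(...)` draws a random fill price on every call, repeating the cell may give a slightly different value.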


## Summary
This example demonstrates how online stochastic bandit algorithms can be effectively applied to solve the combined asset selection and weight allocation problems in portfolio management through the integration of epsilon-greedy exploration with utility maximization.

> __Key Takeaways:__
> 
> * __Bandit Algorithms for Portfolio Selection:__ Multi-armed bandit algorithms like epsilon-greedy enable adaptive asset selection by balancing exploration of uncertain portfolios with exploitation of known high-utility combinations, avoiding the need to exhaustively evaluate all possible subsets.
> * __Utility Maximization for Weight Allocation:__ Once a portfolio subset is selected, Cobb-Douglas utility functions provide a principled approach to weight allocation that reflects investor preferences and risk tolerance, with the preference parameters `γ::Array{Float64,1}` distinguishing between preferred and non-preferred assets.
> * __Integration of Selection and Allocation:__ The combination of bandit algorithms for asset selection with Cobb-Douglas utility maximization for weight allocation provides a theoretically sound and computationally efficient framework for simultaneously addressing both problems in portfolio optimization.

By combining these techniques, investors can manage both the complexity of asset selection and the precision of weight allocation in a computationally efficient and theoretically sound framework.
___

## Disclaimer and Risks
__This content is offered solely for training and informational purposes__. No offer or solicitation to buy or sell securities or derivative products or any investment or trading advice or strategy is made, given, or endorsed by the teaching team. 

__Trading involves risk__. Carefully review your financial situation before investing in securities, futures contracts, options, or commodity interests. Past performance, whether actual or indicated by historical tests of strategies, is no guarantee of future performance or success. Trading is generally inappropriate for someone with limited resources, investment or trading experience, or a low-risk tolerance.  Only risk capital that is not required for living expenses.

__You are fully responsible for any investment or trading decisions you make__. You should decide solely based on your financial circumstances, investment or trading objectives, risk tolerance, and liquidity needs.