# Example: Online Stochastic Bandit Algorithms for Portfolio Rebalancing
In this example, we propose a stochastic multi-armed bandit approach to portfolio rebalancing: a bandit algorithm selects which assets to include in the portfolio, and a Utility Maximization (UM) step determines the optimal weights for the selected assets. Thus, we address the asset selection and weight allocation problems in portfolio management simultaneously.

> __Learning Objectives__
> 
> By the end of this example, you should be able to:
> * __Apply multi-armed bandit algorithms__ to dynamically select which assets to include in a portfolio based on historical performance and uncertainty.
> * __Formulate and solve utility maximization problems__ using Cobb-Douglas utility functions to determine optimal asset weights for a selected portfolio.
> * __Combine selection and allocation strategies__ to implement an integrated online learning framework that adapts portfolio composition over time.

This material will be presented at the upcoming [INFORMS Annual Meeting 2025](https://meetings.informs.org/annual2024/) in Atlanta, GA on October 26-29, 2025. Let's get started!
___

## Setup, Data, and Prerequisites
First, we set up the computational environment by including the `Include-informs.jl` file and loading any needed resources.

> __Include:__ The [`include(...)` command](https://docs.julialang.org/en/v1/base/base/#include) evaluates the contents of the input source file, `Include-informs.jl`, in the notebook's global scope. The `Include-informs.jl` file sets paths, loads required external packages, etc.

Let's set up our code environment:

In [1]:
include(joinpath(@__DIR__, "Include-informs.jl")); # include the Include-informs.jl file

For additional information on functions and types used in this material, see the [Julia programming language documentation](https://docs.julialang.org/en/v1/) and the [VLQuantitativeFinancePackage.jl documentation](https://github.com/varnerlab/VLQuantitativeFinancePackage.jl). 

### Data
We gathered daily open-high-low-close (OHLC) data for each firm in the [S&P 500](https://en.wikipedia.org/wiki/S%26P_500) from `01-03-2025` until `10-17-2025`, along with data for several exchange-traded funds and volatility products during that time period.

Let's load the `original_out_of_sample_dataset::Dict{String,DataFrame}` by calling [the `MyTestingMarketDataSet()` function](https://varnerlab.github.io/VLQuantitativeFinancePackage.jl/dev/data/#VLQuantitativeFinancePackage.MyTestingMarketDataSet) and extracting its `"dataset"` entry.

In [2]:
original_out_of_sample_dataset = MyTestingMarketDataSet() |> x-> x["dataset"] # load the original dataset (testing)

Dict{String, DataFrame} with 482 entries:
  "NI"   => 182×8 DataFrame…
  "EMR"  => 182×8 DataFrame…
  "CTAS" => 182×8 DataFrame…
  "HSIC" => 182×8 DataFrame…
  "KIM"  => 182×8 DataFrame…
  "PLD"  => 182×8 DataFrame…
  "IEX"  => 182×8 DataFrame…
  "BAC"  => 182×8 DataFrame…
  "CBOE" => 182×8 DataFrame…
  "EXR"  => 182×8 DataFrame…
  "NCLH" => 182×8 DataFrame…
  "CVS"  => 182×8 DataFrame…
  "DRI"  => 182×8 DataFrame…
  "DTE"  => 182×8 DataFrame…
  "ZION" => 182×8 DataFrame…
  "AVY"  => 182×8 DataFrame…
  "EW"   => 182×8 DataFrame…
  "EA"   => 182×8 DataFrame…
  "NWSA" => 182×8 DataFrame…
  ⋮      => ⋮

Not all tickers in our dataset have the maximum number of trading days for various reasons, such as acquisition or delisting events. Let's collect only those tickers with the maximum number of trading days.

First, let's compute the number of records for a firm that we know has the maximum value, e.g., `AAPL`, and save that value in the `maximum_number_trading_out_of_sample_days::Int64` variable:

In [3]:
maximum_number_trading_out_of_sample_days = original_out_of_sample_dataset["AAPL"] |> nrow; # maximum number of trading days in our dataset 

Now, let's iterate through our data and collect only tickers with `maximum_number_trading_out_of_sample_days` records. Save that data in the `dataset::Dict{String,DataFrame}` variable:

In [4]:
dataset = let

    # initialize -
    dataset = Dict{String, DataFrame}();

    # iterate through the dictionary; we can't guarantee a particular order
    for (ticker, data) ∈ original_out_of_sample_dataset  # we get each (K, V) pair!
        if (nrow(data) == maximum_number_trading_out_of_sample_days) # check if ticker has maximum trading days
            dataset[ticker] = data;
        end
    end
    dataset; # return
end;

Let's get a list of the firms in our cleaned dataset and sort them alphabetically. We store the sorted firm ticker symbols in the `list_of_tickers_price_data::Array{String,1}` variable:

In [5]:
list_of_tickers_price_data = keys(dataset) |> collect |> sort;

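Next, let's load the single index model (SIM) parameters that we computed in the previous example, along with the historical market growth rates `Gₘ`, their mean `Ḡₘ`, and their variance `Varₘ`. We'll store the parameters in the `sim_model_parameters::Dict{String,NamedTuple}` variable: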

In [6]:
sim_model_parameters,Gₘ,Ḡₘ, Varₘ = let

    # initialize -
    path_to_sim_model_parameters = joinpath(_PATH_TO_DATA,"SIMs-SPY-SP500-01-03-14-to-12-31-24.jld2");
    sim_model_parameters = JLD2.load(path_to_sim_model_parameters);
    parameters = sim_model_parameters["data"]; # single index model parameters, keyed by ticker

    Gₘ = sim_model_parameters["Gₘ"]; # Get the past market growth rate 
    Ḡₘ = sim_model_parameters["Ḡₘ"]; # mean of market growth rates
    Varₘ = sim_model_parameters["Varₘ"]; # variance of market growth

    # return -
    parameters,  Gₘ , Ḡₘ, Varₘ;
end;

Now let's get a list of all tickers for which we have single index model parameters:

In [7]:
tickers_that_we_sim_sim_data_for = keys(sim_model_parameters) |> collect |> sort;

We need to use only the tickers for which we have both price data and SIM parameters. We'll compute the intersection of the two lists and store the result in the `list_of_tickers::Array{String,1}` variable:

In [8]:
list_of_tickers = intersect(tickers_that_we_sim_sim_data_for, list_of_tickers_price_data);

### Prepare the Past Market Data
Next, we construct a matrix $\mathbf{X}$ that contains the historical market data used to compute the ordinary least squares (OLS) estimates of the single index model parameters. The matrix $\mathbf{X}$ has two columns: a constant term and the past market growth rates. From it we compute the OLS estimator matrix (the left pseudoinverse of $\mathbf{X}$, not the projection matrix) `X̄::Array{Float64,2}` = $(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$:

In [9]:
X̄ = let
    
    
    Rₘ = Gₘ; # get Gₘ from the sim model parameters
    max_length = length(Rₘ);
    X = [ones(max_length) Rₘ];

    a = inv(transpose(X)*X)*transpose(X)
    a;
end

2×2766 Matrix{Float64}:
  0.000367293  0.000355057  0.0003607   …   0.000382956   0.000370094
 -5.4289e-5    6.10402e-5   7.85216e-6     -0.000201909  -8.06852e-5
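
As a quick sanity check on what `X̄` does, the sketch below applies the same formula to synthetic data. The names `G_market`, `G_asset`, `α_true`, and `β_true` are hypothetical, and the data are simulated rather than taken from the dataset. Multiplying the estimator matrix by a vector of asset growth rates should approximately recover the intercept and slope used to generate them.

```julia
using LinearAlgebra, Random

rng = MersenneTwister(42);                 # fixed seed so the sketch is reproducible
T = 500;                                   # number of synthetic observations (assumption)
G_market = 0.10 .+ 0.20 .* randn(rng, T);  # synthetic market growth rates
α_true, β_true = 0.02, 1.3;                # hypothetical SIM parameters
G_asset = α_true .+ β_true .* G_market .+ 0.01 .* randn(rng, T); # synthetic asset growth

X = [ones(T) G_market];                    # design matrix: constant term + market growth
X̄_demo = inv(transpose(X)*X)*transpose(X); # same formula as in the cell above
θ_hat = X̄_demo*G_asset;                    # θ_hat[1] ≈ α, θ_hat[2] ≈ β

# cross-check against Julia's built-in least-squares solve
θ_check = X \ G_asset;
```

For larger or ill-conditioned problems, the backslash solve `X \ y` is generally preferred over forming the inverse explicitly; the explicit form is kept here only to mirror the cell above.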

### Constants
Finally, let's set some constants we'll use later in this notebook. The comments describe the constants, their units, and permissible values:

In [10]:
Δt = (1/252); # time-step
total_number_of_tickers = length(list_of_tickers); # how many tickers do we have in my dataset?
investment_budget = 10000.0; # initial budget of the agent
risk_free_rate = 0.0418; # risk free rate
μₘ =  Ḡₘ; # expected growth for SPY
B = investment_budget; # TODO: Budget for portfolio. We can change this later if we want
ϵ = 0.001; # hyperparameter: minimum share number for each asset
α::Float64 = 0.80; # learning rate for moving average calculation
Tₘ::Int64 = 10; # how many days for market time
λ::Float64 = 1; # risk scale

### Implementation
Next, we build the `world(...)` function. It takes the time index `t::Int`, the action vector `a::Vector{Int64}`, and the context model. The elements of `a` are binary variables indicating whether to select an asset (`1`) or not (`0`). The length of `a` is $K$, the number of candidate assets, so there are $N = 2^{K}$ possible _combinations_ (arms) available for selection. The function returns the utility `U`, the share counts `n`, the fill prices `pₒ`, and the preference parameters `γ`:

In [11]:
function world(t::Int, a::Vector{Int64}, context::MyDynamicBanditPortfolioAllocationContextModel)

    # initialize -
    total_number_of_assets = context.number_of_assets; # total number of assets that we *could* purchase
    bounds = context.bounds; # bounds for how much we purchase
    B = context.B; # budget for this period
    mylocaltickers = context.tickers; # tickers for the assets in this portfolio
    localdataset = context.dataset; # dataset for the assets in this portfolio
    singleindexmodels = context.singleindexmodels; # single index models for the assets in this portfolio
    X̄ = context.X̄;
    number_of_samples_to_draw = context.number_of_samples_to_draw;

    # what is the min share purchase?
    min_share_purchase = bounds[1,1];

    # Compute the fill price vector -
    pₒ = zeros(total_number_of_assets);
    for i ∈ eachindex(mylocaltickers)
        ticker = mylocaltickers[i];
        H = localdataset[ticker][t,:high];
        L = localdataset[ticker][t,:low];
        f = rand();
        pₒ[i] = f*H + (1-f)*L; # randomness in the fill price
        # pₒ[i] = localdataset[ticker][t,:close]; # remove randomness 
    end

    # next, compute the preference parameters γ -
    γ = zeros(total_number_of_assets);
    for i ∈ eachindex(mylocaltickers)
        ticker = mylocaltickers[i];

        # get the model -
        simmodel = singleindexmodels[ticker];
        α = simmodel.alpha;
        β = simmodel.beta;
        CI_alpha_L = simmodel.alpha_95_CI_lower;
        CI_alpha_U = simmodel.alpha_95_CI_upper;
        CI_beta_L = simmodel.beta_95_CI_lower;
        CI_beta_U = simmodel.beta_95_CI_upper;

        # compute random parameters -
        αᵢ = rand(Uniform(CI_alpha_L, CI_alpha_U)); # sample from the confidence interval
        βᵢ = rand(Uniform(CI_beta_L, CI_beta_U));

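        # compute the expected growth under the sampled parameters, then map it to a preference -
        R = αᵢ + βᵢ*μₘ; # expected growth given the sampled (αᵢ, βᵢ); μₘ is a global constant
        γ[i] = tanh_fast(R/(βᵢ^λ)); # scale by βᵢ^λ and squash into (-1, 1)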
    end

    # Compute the optimal share count -
    n = zeros(total_number_of_assets); # initialize space for optimal solution
    S = findall(aᵢ -> aᵢ == 1, a); # Which assets does our bandit want us to buy?
    number_of_selected_arms = length(S);

    # In the set of assets to explore, do we have any non-preferred assets?
    negative_gamma_flag = any(γ[S] .< 0);
    if (negative_gamma_flag == false)
        
        # easy case: all of my potential assets are preferred.
        γ̄ = sum(γ[S]);
        B̄ = B;
        for s ∈ S
            n[s] = (γ[s]/γ̄)*(B̄/pₒ[s]);
        end
    else

        # hard case: some assets are *not* preferred. 
        
        # Prep work for non-preferred case
        # First: the non-preferred assets are min_share_purchase -
        # Second: Compute the adjusted budget
        # Third: Compute γ̄
        B̄ = B;
        γ̄ = 0.0;
        for s ∈ S
            if (γ[s] < 0.0)
                B̄ += -min_share_purchase*pₒ[s];
                n[s] = min_share_purchase;
            else
                γ̄ += γ[s];
            end
        end

        # compute the optimal preferred assets -
        for s ∈ S
            if (γ[s] ≥ 0.0)
                n[s] = (γ[s]/γ̄)*(B̄/pₒ[s]);
            end
        end
    end

    # premultiplier -
    κ = negative_gamma_flag == true ? -1.0 : 1.0;
    
    # Compute the optimal utility -
    U = κ;
    for s ∈ S
        U *= (n[s]^γ[s])
    end
    
    # Return the utility and the share count that we allocated
    return U, n, pₒ, γ
end;
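
The "easy case" branch of `world(...)` uses the closed-form solution of the Cobb-Douglas problem: maximizing $U = \prod_i n_i^{\gamma_i}$ subject to $\sum_i n_i p_i = B$ with all $\gamma_i > 0$ gives $n_i = (\gamma_i/\bar{\gamma})(B/p_i)$, where $\bar{\gamma} = \sum_i \gamma_i$. The sketch below reproduces that allocation for three hypothetical assets. The function name `cobb_douglas_allocation` and all numbers are illustrative assumptions, not part of the package.

```julia
# Hypothetical helper: closed-form Cobb-Douglas allocation for preferred assets (all γ > 0)
function cobb_douglas_allocation(γ::Vector{Float64}, p::Vector{Float64}, B::Float64)
    γ̄ = sum(γ);                    # normalization constant
    n = (γ ./ γ̄) .* (B ./ p);      # shares: budget fraction γᵢ/γ̄ spent at price pᵢ
    U = prod(n .^ γ);              # resulting Cobb-Douglas utility
    return n, U
end

γ_demo = [0.2, 0.3, 0.5];          # illustrative preference parameters
p_demo = [100.0, 50.0, 25.0];      # illustrative share prices
B_demo = 10_000.0;                 # illustrative budget

n_demo, U_demo = cobb_douglas_allocation(γ_demo, p_demo, B_demo);
w_demo = n_demo .* p_demo ./ B_demo; # portfolio weights, equal to γᵢ/γ̄

# a different allocation that spends the same budget, for comparison
n_alt = [30.0, 40.0, 200.0];
U_alt = prod(n_alt .^ γ_demo);
```

The weights equal $\gamma_i/\bar{\gamma}$, so the budget is spent exactly, and any other budget-feasible allocation such as `n_alt` yields a lower utility. In the non-preferred branch of `world(...)`, assets with $\gamma_i < 0$ are pinned at the minimum share count and the same formula is applied to the remaining budget.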

___

## Task 1: Maximum Utility Portfolio Optimization Problem
In this task, we solve the maximum utility portfolio optimization problem for a given set of selected assets, i.e., we choose the share counts that maximize __investor satisfaction__. Let's start by specifying the tickers that our online stochastic bandit algorithm can choose from. We'll store these tickers in the `my_test_portfolio_tickers::Array{String,1}` variable:

In [12]:
my_test_portfolio_tickers = ["AAPL", "MSFT", "INTC", "MU", "AMD", "GS", "BAC", "WFC", "C", "F", "GM", 
    "JNJ", "PG", "UPS", "COST", "TGT", "WMT", "MRK", "PFE", "ADBE"]; # tickers selected for portfolio

To implement our bandit strategy, we first build the epsilon-greedy dynamic noise algorithm model. This model maintains estimates of the reward (utility) for each possible portfolio (arm) and balances exploration and exploitation using the learning rate `α::Float64` to update these estimates over time:

In [13]:
dynamic_algorithm_model = let

    # initialize -
    algorithm = nothing; # placeholder; will hold the algorithm model
    K = length(my_test_portfolio_tickers); # number of tickers to consider
    
    # build the epsilon-greedy algorithm model -
    algorithm = build(MyEpsilonGreedyDynamicNoiseAlgorithmModel, (
        K = K, # arms 
        α = α, # learning rate
    ));

    # return the algorithm -
    algorithm;
end;
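
The internals of `MyEpsilonGreedyDynamicNoiseAlgorithmModel` are not shown in this example. As a rough illustration of the general idea, the sketch below implements a plain epsilon-greedy loop with a constant step-size update, $\mu_a \leftarrow \mu_a + \alpha(r - \mu_a)$, on a toy three-arm problem. Everything in it is an assumption made for illustration (the function name `epsilon_greedy_sketch`, the fixed exploration rate `ϵ_explore`, the Gaussian rewards) and may differ from the package's implementation. Note that `ϵ_explore` is unrelated to the minimum share count `ϵ` defined in the constants above.

```julia
using Random

# Hypothetical sketch: epsilon-greedy with an exponentially weighted reward estimate
function epsilon_greedy_sketch(rng::AbstractRNG, true_means::Vector{Float64};
        T::Int = 2000, ϵ_explore::Float64 = 0.1, α::Float64 = 0.1)

    K = length(true_means);
    μ = zeros(K);                 # reward estimates, one per arm
    counts = zeros(Int, K);       # number of pulls per arm
    for t ∈ 1:T
        # explore with probability ϵ_explore, otherwise exploit the current best estimate
        a = rand(rng) < ϵ_explore ? rand(rng, 1:K) : argmax(μ);
        r = true_means[a] + 0.1*randn(rng); # noisy reward from the chosen arm
        μ[a] += α*(r - μ[a]);     # constant step-size update (recent rewards weigh more)
        counts[a] += 1;
    end
    return μ, counts
end

μ_demo, counts_demo = epsilon_greedy_sketch(MersenneTwister(1), [0.2, 0.5, 0.8]);
best_arm = argmax(μ_demo);        # arm with the highest estimated reward
```

A constant step size keeps the estimate responsive when rewards drift over time, which is one plausible reason to prefer it over a running sample average in a market setting.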

Next, we set bounds on the number of shares that we can hold for each asset in our portfolio. The lower bound `ϵ::Float64` is the minimum number of shares held for any asset included in the portfolio, while the upper bound is `Inf` (no limit) to allow flexible allocation. We'll store these bounds in the `static_share_bounds::Array{Float64,2}` variable:

In [14]:
static_share_bounds = let

    # initialize -
    K = length(my_test_portfolio_tickers); # TODO: Number of tickers to consider
    bounds = Array{Float64,2}(undef, K, 2);
    
    # build my bounds array -
    for i ∈ eachindex(my_test_portfolio_tickers)
        bounds[i,1] = ϵ; # min shares that we can hold of this asset
        bounds[i,2] = Inf; # max number of shares, Inf says this is unbounded
    end
    bounds;
end;

Now we build the context model that encapsulates all the data and parameters needed for the bandit algorithm to make portfolio decisions. The context includes the market data, single index model parameters, ticker symbols, budget constraints, and bounds on share purchases. We'll store this in the `dynamic_context_model::MyDynamicBanditPortfolioAllocationContextModel` variable:

In [15]:
dynamic_context_model = let

    # initialize -
    K = length(my_test_portfolio_tickers); # TODO: Number of tickers to consider
    nₒ = ones(K); # initial guess, assume 1 x share for each

    # build -
    contextmodel = build(MyDynamicBanditPortfolioAllocationContextModel, (
        dataset = dataset,
        singleindexmodels = sim_model_parameters,
        tickers = my_test_portfolio_tickers,
        nₒ = nₒ,
        B = B,
        bounds = static_share_bounds,
        X̄ = X̄,
        R̄ₘ = Ḡₘ,
        number_of_samples_to_draw = size(X̄,2),
        μₒ = zeros(2^K) # initially we have *no* idea which arm is best
    ));

    contextmodel;
end;

To test our approach, we solve the utility maximization problem for the case where we select all available assets (by setting the action vector `a::Vector{Int64}` to all ones). This gives a baseline utility `U::Float64` for the full portfolio. Because `world(...)` draws a random fill price and samples the single index model parameters, the value changes from run to run:

In [16]:
(U, n, pₒ, γ) = let
   
    # call my world function to test -
    K = length(my_test_portfolio_tickers); # TODO: Number of tickers to consider
    t = 1; # what time period are we in?

    # call the world function directly -
    U, n, pₒ, γ = world(t, ones(Int64,K), dynamic_context_model);
end;

In [17]:
U

-323.22280976309804

__Preferred versus Non-Preferred Assets:__ The `γ::Array{Float64,1}` vector computed above indicates whether an asset is preferred or non-preferred. If $\gamma_i \geq 0$, then the asset is preferred; otherwise, it is non-preferred. Let's look at which assets are preferred and which are not:

In [18]:
let

    # initialize -
    df = DataFrame();

    for i ∈ eachindex(my_test_portfolio_tickers)
        ticker = my_test_portfolio_tickers[i];

        # get the model -
        row_df = (
            ticker = ticker,
            γᵢ = γ[i],
            preferred = γ[i] >= 0 ? "Yes" : "No",

        );
        push!(df, row_df);
    end

    # make a table -
     pretty_table(df, backend = :text, fit_table_in_display_vertically = false, fit_table_in_display_horizontally = false,
         table_format = TextTableFormat(borders = text_table_borders__compact)
    );
end

 -------- ------------- -----------
  ticker            γᵢ   preferred
  String       Float64      String
 -------- ------------- -----------
    AAPL      0.164367         Yes
    MSFT      0.196774         Yes
    INTC     0.0526081         Yes
      MU    -0.0398832          No
     AMD     0.0763603         Yes
      GS    0.00191259         Yes
     BAC     0.0129425         Yes
     WFC     0.0313431         Yes
       C   -0.00745081          No
       F     -0.101487          No
      GM    -0.0460617          No
     JNJ      0.109393         Yes
      PG      0.100411         Yes
     UPS    -0.0151326          No
    COST      0.360878         Yes
     TGT     0.0690993         Yes
     WMT      0.427658         Yes
     MRK    -0.0408884          No
     PFE     -0.174446          No
    ADBE      0.146447         Yes
 -------- ------------- -----------
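
The γ values in this table come from the sampling step inside `world(...)`: draw α and β uniformly from their 95% confidence intervals, form the expected growth $R = \alpha_i + \beta_i\mu_m$, and pass $R/\beta_i^{\lambda}$ through `tanh` so that $\gamma_i \in (-1, 1)$. The sketch below repeats that step many times for one hypothetical asset to show how the preferred or non-preferred label can flip between draws. The interval bounds and `μₘ_demo` are made-up numbers, and `Base.tanh` stands in for `tanh_fast`.

```julia
using Random

# Hypothetical helper: one draw of the preference parameter γ for a single asset
function sample_gamma(rng::AbstractRNG, α_CI::Tuple{Float64,Float64},
        β_CI::Tuple{Float64,Float64}, μₘ::Float64; λ::Float64 = 1.0)
    αᵢ = α_CI[1] + (α_CI[2] - α_CI[1])*rand(rng); # uniform draw from the α interval
    βᵢ = β_CI[1] + (β_CI[2] - β_CI[1])*rand(rng); # uniform draw from the β interval
    R = αᵢ + βᵢ*μₘ;                               # expected growth under the sampled parameters
    return tanh(R/(βᵢ^λ));                        # squash into (-1, 1)
end

rng = MersenneTwister(7);
α_CI_demo = (-0.15, 0.05);   # made-up 95% CI for α (straddles zero)
β_CI_demo = (0.9, 1.3);      # made-up 95% CI for β (strictly positive)
μₘ_demo = 0.08;              # made-up expected market growth

γ_samples = [sample_gamma(rng, α_CI_demo, β_CI_demo, μₘ_demo) for _ ∈ 1:1000];
fraction_preferred = count(>=(0.0), γ_samples)/length(γ_samples);
```

When an asset's α interval straddles zero, as in this made-up case, the asset is preferred in some draws and non-preferred in others, so a single table like the one above is one realization rather than a fixed classification.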


Now that we have the preferred versus non-preferred assets, we'll make a table that summarizes the resulting shares `n::Array{Float64,1}` purchased for each asset in our test portfolio:

In [19]:
let

    # initialize -
    df = DataFrame();
    CS = 0;
    t = 1;
    K = length(my_test_portfolio_tickers); # TODO: Number of tickers to consider
    context = dynamic_context_model;
    total_number_of_assets = context.number_of_assets; # total number of assets that we *could* purchase
    bounds = context.bounds; # bounds for how much we purchase
    B = context.B; # budget for this period
    mylocaltickers = context.tickers; # tickers for the assets in this portfolio
    localdataset = context.dataset; # dataset for the assets in this portfolio
    singleindexmodels = context.singleindexmodels; # single index models for the assets in this portfolio
    N = 2^K; # number of possible portfolios
    

    for i ∈ 1:K

        CS += n[i]*pₒ[i];
        wᵢ = (n[i]*pₒ[i]/B);

        row_df = (
            ticker = my_test_portfolio_tickers[i],
            Sᵢ = pₒ[i],
            γᵢ = γ[i],
            nᵢ = n[i],
            wᵢ = wᵢ,
            UC = n[i]*pₒ[i],
            CS = CS,
        );
        push!(df, row_df);
    end

    pretty_table(df, backend = :text, fit_table_in_display_vertically = false, fit_table_in_display_horizontally = false,
         table_format = TextTableFormat(borders = text_table_borders__compact)
    );
end

 -------- --------- ------------- ----------- ------------ ------------ ---------
  ticker        Sᵢ            γᵢ          nᵢ           wᵢ           UC        CS
  String   Float64       Float64     Float64      Float64      Float64   Float64
 -------- --------- ------------- ----------- ------------ ------------ ---------
    AAPL    243.02      0.164367     3.86425    0.0939089      939.089   939.089
    MSFT   422.858      0.196774     2.65868     0.112424      1124.24   2063.33
    INTC   20.1255     0.0526081     14.9348     0.030057       300.57    2363.9
      MU    87.682    -0.0398832       0.001    8.7682e-6     0.087682   2363.99
     AMD    122.38     0.0763603     3.56492    0.0436276      436.276   2800.27
      GS   573.822    0.00191259   0.0190431   0.00109274      10.9274   2811.19
     BAC   44.6962     0.0129425     1.65441   0.00739457      73.9

## Task 2: Solve the Online Bandit Portfolio Optimization Problem
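In this task, we use the epsilon-greedy bandit algorithm to explore different portfolio combinations over time and learn which asset subsets maximize investor utility. The algorithm iteratively selects an action (a portfolio composition), observes the resulting utility, and updates its estimate of that arm's reward. The results are stored in the return variables `R::Float64`, `μ::Array{Float64,1}`, `S::Array{Float64,2}`, `P::Array{Int64,1}`, and `G::Array{Int64,1}`. As the outputs below show, `μ` has one entry per arm ($2^{20} = 1{,}048{,}576$ entries) and `S` is a `250×20` matrix whose rows hold the share counts chosen in each period. We run the algorithm for 250 time periods. With $2^{20}$ arms, this visits only a small fraction of the possible portfolios, so treat the ranking below as a preliminary estimate rather than a converged result: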

In [20]:
R, μ, S, P, G = my_bandit_solve(dynamic_algorithm_model, T = 250, world = world, context = dynamic_context_model); # run the bandit for T = 250 periods

In [21]:
S

250×20 Matrix{Float64}:
  4.23362  2.093     0.001    6.52222  …   5.10918   0.001   2.3484
  0.0      0.0      20.7625   0.0          0.0      69.4796  3.38877
  8.50586  2.67864   0.001    0.0          0.0       0.0     0.0
  6.65741  0.0      23.1589   9.46357      0.0       0.001   0.0
 10.9476   0.0       0.001    0.001        0.0       0.001   0.0
  0.0      4.89787   0.001    2.67553  …  23.2936    0.001   1.22707
  0.0      4.14459   0.0      0.0         27.9929    0.0     0.0
  0.0      0.0       0.001    0.0         21.0946    0.001   4.15096
  5.52983  0.0       0.0     17.0152       0.0       0.0     2.36496
  9.92596  4.59702   0.0      3.94073      0.0       0.0     0.0
  ⋮                                    ⋱                     
  3.8828   0.0       0.0      4.12187     15.0258    0.0     0.0
  5.14429  3.36628   0.0     10.6324      19.6901    0.0     0.0
  0.0      1.49326   0.0      0.001        5.2295    0.0     1.86412
  0.0      7.20304   0.001    0.0          0.0

In [22]:
sorted_reward_index_array = sortperm(μ, rev=true) # sort the index of the rewards

1048576-element Vector{Int64}:
  515706
  105849
  260275
  466170
  391067
 1030262
  963697
  506090
  733267
  745201
       ⋮
  244215
  997271
  464019
  244339
  843543
  243932
  229055
  998015
  464633
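
Each entry of `sorted_reward_index_array` is an arm index between 1 and $2^K$. One way to turn such an index into a binary action vector is Julia's `digits` function, which returns the base-2 digits least-significant first. The sketch below uses a toy `K_demo = 4` and a hypothetical index `j_demo`. Whether `my_bandit_solve` uses exactly this encoding internally is an assumption.

```julia
K_demo = 4;                                   # toy number of assets (assumption)
j_demo = 5;                                   # hypothetical arm index
a_demo = digits(j_demo, base=2, pad=K_demo);  # binary digits, least-significant first
selected = findall(==(1), a_demo);            # positions of the selected assets

# round trip: rebuild the integer index from the action vector
j_back = sum(a_demo[i]*2^(i-1) for i ∈ 1:K_demo);

# the largest index needs one extra digit
a_overflow = digits(2^K_demo, base=2, pad=K_demo);
```

Here `a_demo` selects the first and third assets. Note that `a_overflow` has `K_demo + 1` entries, which is why the next cell special-cases `j == N`.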

In [23]:
let

    # initialize -
    df = DataFrame();
    CS = 0;
    t = 1;
    K = length(my_test_portfolio_tickers); # TODO: Number of tickers to consider
    context = dynamic_context_model;
    total_number_of_assets = context.number_of_assets; # total number of assets that we *could* purchase
    bounds = context.bounds; # bounds for how much we purchase
    B = context.B; # budget for this period
    mylocaltickers = context.tickers; # tickers for the assets in this portfolio
    localdataset = context.dataset; # dataset for the assets in this portfolio
    singleindexmodels = context.singleindexmodels; # single index models for the assets in this portfolio
    N = 2^K; # number of possible portfolios

    # compute the best portfolio -
    j = sorted_reward_index_array[1]; # get a portfolio index 
    a = digits(j, base=2, pad=K); # generate a binary representation of the number, with K digits  
    if (j == N)
        a = digits(j - 1, base=2, pad=K); # generate a binary representation of the number, with K digits  
    end

    # call world -
    U, n, pₒ, γ = world(t, a, dynamic_context_model);

    @show U,j

    for i ∈ 1:K

        CS += n[i]*pₒ[i];
        wᵢ = (n[i]*pₒ[i]/B);

        row_df = (
            ticker = my_test_portfolio_tickers[i],
            Sᵢ = pₒ[i],
            γᵢ = γ[i],
            nᵢ = n[i],
            wᵢ = wᵢ,
            UC = n[i]*pₒ[i],
            CS = CS,
        );
        push!(df, row_df);
    end

     pretty_table(df, backend = :text, fit_table_in_display_vertically = false, fit_table_in_display_horizontally = false,
         table_format = TextTableFormat(borders = text_table_borders__compact)
    );
end

(U, j) = (-43.23291438523442, 515706)
 -------- --------- ------------- --------- ------------ ------------ ---------
  ticker        Sᵢ            γᵢ        nᵢ           wᵢ           UC        CS
  String   Float64       Float64   Float64      Float64      Float64   Float64
 -------- --------- ------------- --------- ------------ ------------ ---------
    AAPL   242.569      0.255923       0.0          0.0          0.0       0.0
    MSFT   421.416      0.165748   3.15868     0.133112      1331.12   1331.12
    INTC   20.3989     0.0736846       0.0          0.0          0.0   1331.12
      MU   90.1169      0.188723   16.8185     0.151563      1515.63   2846.75
     AMD   121.992      0.295744   19.4695     0.237513      2375.13   5221.88
      GS   575.623      0.114424   1.59642    0.0918937      918.937   6140.82
     BAC   44.5283     0.0408242   7.36295     0.

The table above shows the portfolio with the highest estimated reward after 250 periods of exploration, re-evaluated at `t = 1`. The utility value `U::Float64` measures investor satisfaction for this portfolio, and the weights `wᵢ::Float64` give the fraction of the budget allocated to each asset. Because `world(...)` resamples the fill prices and the single index model parameters on every call, the `γᵢ` values here differ from those in Task 1. Notice that the bandit selected a subset of the assets: rows with `nᵢ = 0.0` (e.g., `AAPL` and `INTC`) were left out of the portfolio, while any selected non-preferred asset is held at the minimum share count `ϵ`.

## Summary
This example demonstrates how online stochastic bandit algorithms can be effectively applied to solve the combined asset selection and weight allocation problems in portfolio management through the integration of epsilon-greedy exploration with utility maximization.

> __Key Takeaways:__
> 
> * __Bandit Algorithms for Portfolio Selection:__ Multi-armed bandit algorithms like epsilon-greedy enable adaptive asset selection by balancing exploration of uncertain portfolios with exploitation of known high-utility combinations, avoiding the need to exhaustively evaluate all possible subsets.
> * __Utility Maximization for Weight Allocation:__ Once a portfolio subset is selected, Cobb-Douglas utility functions provide a principled approach to weight allocation that reflects investor preferences and risk tolerance, with the preference parameters `γ::Array{Float64,1}` distinguishing between preferred and non-preferred assets.
> * __Online Learning Framework:__ The integration of bandit selection with utility-based allocation creates a dynamic system that continuously learns and adapts portfolio composition, enabling responsive portfolio management that incorporates real-time market information and evolving estimates of asset characteristics.

By combining these techniques, investors can manage both the complexity of asset selection and the precision of weight allocation in a computationally efficient and theoretically sound framework.
___

## Disclaimer and Risks
__This content is offered solely for training and informational purposes__. No offer or solicitation to buy or sell securities or derivative products or any investment or trading advice or strategy is made, given, or endorsed by the teaching team. 

__Trading involves risk__. Carefully review your financial situation before investing in securities, futures contracts, options, or commodity interests. Past performance, whether actual or indicated by historical tests of strategies, is no guarantee of future performance or success. Trading is generally inappropriate for someone with limited resources, limited investment or trading experience, or a low risk tolerance. Risk only capital that is not required for living expenses.

__You are fully responsible for any investment or trading decisions you make__. You should decide solely based on your financial circumstances, investment or trading objectives, risk tolerance, and liquidity needs.