# Example: N-Ary Recombining Lattice Models
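In this example, we estimate the parameters of an n-ary recombining lattice model from historical share price data, build the lattice for a single firm, and compute the probability of reaching each future price.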

___

## Setup, Data, and Prerequisites
First, we set up the computational environment by including the `Include.jl` file and loading any needed resources.

>__Include:__ The [`include(...)` command](https://docs.julialang.org/en/v1/base/base/#include) evaluates the contents of the input source file, `Include.jl`, in the notebook's global scope. The `Include.jl` file sets paths, loads required external packages, etc.

Let's set up our code environment:

In [1]:
include(joinpath(@__DIR__, "Include.jl")); # include the Include.jl file

For additional information on functions and types used in this material, see the [Julia programming language documentation](https://docs.julialang.org/en/v1/) and the [VLQuantitativeFinancePackage.jl documentation](https://github.com/varnerlab/VLQuantitativeFinancePackage.jl). 

### Data
We gathered daily open-high-low-close (OHLC) data for each firm in the [S&P500](https://en.wikipedia.org/wiki/S%26P_500) from `01-03-2014` until `12-31-2024`, along with data for a few exchange-traded funds and volatility products during that time period. 

Let's load the `original_dataset::Dict{String,DataFrame}` by calling [the `MyTrainingMarketDataSet()` function](https://varnerlab.github.io/VLQuantitativeFinancePackage.jl/dev/data/#VLQuantitativeFinancePackage.MyTrainingMarketDataSet) and remove firms that do not have the maximum number of trading days. The cleaned dataset $\mathcal{D}$ will be stored in the `dataset` variable.

In [2]:
original_dataset = MyTrainingMarketDataSet() |> x-> x["dataset"];

Not all tickers in our dataset have the maximum number of trading days for various reasons, e.g., acquisition or delisting events. Let's collect only those tickers with the maximum number of trading days.

First, let's compute the number of records for a firm that we know has the maximum value, e.g., `AAPL`, and save that value in the `maximum_number_trading_days::Int64` variable:

In [3]:
maximum_number_trading_days = original_dataset["AAPL"] |> nrow # nrow? (check out: DataFrames.jl)

2767

Now, let's iterate through our data and collect only tickers with `maximum_number_trading_days` records. We'll save that data in the `dataset::Dict{String,DataFrame}` variable:

In [4]:
dataset = let

    # initialize -
    dataset = Dict{String, DataFrame}();

    # iterate through the dictionary; we can't guarantee a particular order
    for (ticker, data) ∈ original_dataset  # we get each (K, V) pair!
        if (nrow(data) == maximum_number_trading_days) # what is this doing?
            dataset[ticker] = data;
        end
    end
    dataset; # return
end;
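
The same filtering step can also be written with Julia's built-in `filter`, which accepts a dictionary and a predicate on `key => value` pairs. The sketch below is a toy illustration rather than the notebook's code: the dictionary `toy_dataset` and its contents are made up, and plain vectors stand in for the `DataFrame` values, so `length` plays the role of `nrow`.

```julia
# Toy stand-in for the market data: ticker => vector of daily records (made-up values)
toy_dataset = Dict(
    "AAA" => [1.0, 2.0, 3.0, 4.0],  # full history
    "BBB" => [1.0, 2.0],            # shortened history, e.g., a delisted firm
    "CCC" => [5.0, 6.0, 7.0, 8.0],  # full history
);

# reference length taken from a ticker known to have a full history
max_records = length(toy_dataset["AAA"]);

# keep only the entries whose record count matches the reference length
toy_clean = filter(pair -> length(pair.second) == max_records, toy_dataset);

println(sort(collect(keys(toy_clean)))) # tickers that survive the filter
```

The explicit loop in the cell above does the same work; the `filter` form simply avoids the mutable accumulator.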

Finally, let's get a list of the firms in our cleaned dataset and sort them alphabetically. We store the sorted firm ticker symbols in the `list_of_tickers::Array{String,1}` variable.

In [5]:
list_of_tickers = keys(dataset) |> collect |> sort # list of firm "ticker" symbols in alphabetical order

424-element Vector{String}:
 "A"
 "AAL"
 "AAP"
 "AAPL"
 "ABBV"
 "ABT"
 "ACN"
 "ADBE"
 "ADI"
 "ADM"
 ⋮
 "WYNN"
 "XEL"
 "XOM"
 "XRAY"
 "XYL"
 "YUM"
 "ZBRA"
 "ZION"
 "ZTS"

### Constants
Finally, let's set some constants we'll use later in this notebook. The comments describe the constants, their units, and permissible values.

In [None]:
TSIM = 8; # number of trading days to simulate
Δt = (1.0/252); # step size: 1 trading day in units of years
r̄ = 0.05; # risk-free rate (annualized)
n = 7; # number of branches (states) at each node of the n-ary lattice

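Next, we define two helper functions for indexing the nodes of the lattice. Because the lattice recombines, a node at level `i` is determined by how many times each of the `n` branches was taken, so level `i` has $\binom{i+n-1}{i}$ nodes. Nodes are numbered from zero, level by level, so the index of the first node at level `i` equals the number of nodes in levels `0` through `i - 1`, which is $\binom{i+n-1}{i-1}$.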

In [7]:
nodes_at_level(i::Integer, n::Integer) = binomial(i + n - 1, i)
level_offset(i::Integer, n::Integer) = i == 0 ? 0 : binomial(i + n - 1, i - 1) # start of level i

level_offset (generic function with 1 method)
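
As a quick sanity check of the two helpers, the sketch below verifies that the offset of each level equals the running total of the node counts of the earlier levels, and then evaluates both helpers for seven branches and eight levels, the values that are consistent with the outputs shown in this notebook. The helper definitions are repeated so that the block runs on its own; the names `n_branches` and `depth` are introduced here only for illustration.

```julia
# copies of the two helpers defined above, repeated so this block runs on its own
nodes_at_level(i::Integer, n::Integer) = binomial(i + n - 1, i)
level_offset(i::Integer, n::Integer) = i == 0 ? 0 : binomial(i + n - 1, i - 1)

n_branches, depth = 7, 8 # illustrative values, consistent with the outputs in this notebook

# the offset of level i equals the total node count of levels 0, ..., i - 1
for i in 0:depth
    running_total = sum(nodes_at_level(k, n_branches) for k in 0:(i - 1); init = 0)
    @assert level_offset(i, n_branches) == running_total
end

first_index = level_offset(depth, n_branches)     # zero-based index of the first node at the last level
count_last = nodes_at_level(depth, n_branches)    # number of nodes at the last level
total_nodes = level_offset(depth + 1, n_branches) # number of nodes in levels 0, ..., depth
println((first_index, count_last, total_nodes))   # → (3432, 3003, 6435)
```

These three numbers match the first node index, the number of nodes at level eight, and the number of dictionary entries reported later in the notebook.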

___

## Task 1: Compute lattice parameters and future prices from historical data
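In this task, we estimate the lattice parameters from the historical growth rates of a single firm and then build the price tree. We start by selecting a firm; here we fix `AAPL`, but the commented line shows how to pick a ticker at random.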

In [8]:
# random_firm_ticker = rand(list_of_tickers);
random_firm_ticker = "AAPL"
random_firm_index = findfirst(x-> x == random_firm_ticker, list_of_tickers);
random_firm_data = dataset[random_firm_ticker];


Next, we randomly select the `start_index`, the trading day index in the dataset that serves as the root of the tree (level `L = 0`); the tree ends `TSIM` trading days later at `stop_index`. The initial price per share at the root, `Sₒ`, is the [volume-weighted average price (VWAP)](https://en.wikipedia.org/wiki/Volume-weighted_average_price) on that day; we set it when we build the model below.

In [9]:
start_index = rand(1:(maximum_number_trading_days - TSIM - 1))
stop_index = start_index + TSIM
println("Visualize Firm-$(random_firm_index) between trading days ($(start_index) -> $(stop_index))")

Visualize Firm-4 between trading days (1136 -> 1144)


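Next, we compute the growth rate time series for the selected firm by calling the `log_growth_matrix(...)` function. The series has one fewer entry than the number of trading days because each growth rate is computed between consecutive days.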

In [10]:
log_growth_array = log_growth_matrix(dataset, random_firm_ticker) # array holding growth rate time series

2766-element Vector{Float64}:
 -1.9954223478275546
  0.15124876922831154
  0.9224750663536962
 -1.4261203338251638
 -3.0008957287125435
  1.5049147899343094
  2.9156076155145367
  6.502627170431676
 -1.2561327051214097
 -4.024741418592851
  ⋮
 -2.325529514930973
 -0.7888121514647656
  3.27667430452004
  1.6089944655721826
  2.6547619721872575
  1.4556917537856608
 -3.6286559032528003
 -2.934268163874707
 -1.5398398549769603
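
The magnitudes above (single-digit values for daily data) suggest that the growth rates are annualized, i.e., $\mu_{t} = (1/\Delta{t})\ln(S_{t}/S_{t-1})$. The sketch below shows that computation on a made-up price series. The function name `annualized_log_growth` and the prices are hypothetical; this is an assumption about what the series represents, not the package's `log_growth_matrix` implementation.

```julia
# Hypothetical helper (not from VLQuantitativeFinancePackage.jl):
# annualized log growth rate between consecutive prices, μ = (1/Δt)·ln(Sₜ/Sₜ₋₁)
function annualized_log_growth(prices::AbstractVector{<:Real}; Δt::Float64 = 1.0 / 252)
    return [log(prices[t] / prices[t - 1]) / Δt for t in 2:length(prices)]
end

toy_prices = [100.0, 101.0, 99.5, 100.2]; # made-up daily prices
μ = annualized_log_growth(toy_prices);

# a series of T prices yields T - 1 growth rates
println(length(μ))

# multiplying the first price by the successive factors exp(μ·Δt) recovers the later prices
recovered = accumulate(*, exp.(μ .* (1.0 / 252)); init = toy_prices[1]);
println(recovered ≈ toy_prices[2:end])
```

The last check is the property the lattice relies on: a growth rate $\mu$ over one step corresponds to the multiplicative factor $\exp(\mu\Delta{t})$.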

In [11]:
result = build_nary_lattice_from_growth_rate(log_growth_array; n = n, dt = Δt, method = :equalwidth);

In [12]:
typeof(result) |> T-> fieldnames(T) # check out the fields of the returned struct

(:edges, :avg_factor, :freq, :counts, :labels, :method, :dt, :N)

In [13]:
result.avg_factor

7-element Vector{Float64}:
 0.9300836130146076
 0.9568933320775288
 0.9813018182599933
 1.0014819811236884
 1.0206311663699725
 1.04840834419366
 1.076795419529746

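The `avg_factor` field holds the average growth factor for each state, ordered from the largest down move to the largest up move. The `freq` field holds the fraction of observations that fall in each state, which we use as the real-world probability of that move: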

In [14]:
result.freq

7-element Vector{Float64}:
 0.0021691973969631237
 0.013015184381778741
 0.15112075198843095
 0.6778741865509761
 0.14461315979754158
 0.009761388286334056
 0.0014461315979754157

In [15]:
print_lattice(result)

n-ary lattice (method=equalwidth, Δt=0.003968253968253968, N=2766)
State μ-bin [low, high)               avg factor    freq      count
S1    [-21.211458 , -15.080382)       0.930084      0.002169  6
S2    [-15.080382 , -8.949305)        0.956893      0.013015  36
S3    [-8.949305 , -2.818228)         0.981302      0.151121  418
S4    [-2.818228 , 3.312848)          1.001482      0.677874  1875
S5    [3.312848 , 9.443925)           1.020631      0.144613  400
S6    [9.443925 , 15.575002)          1.048408      0.009761  27
S7    [15.575002 , 21.706078]         1.076795      0.001446  4
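
The table above is consistent with an equal-width discretization: the range between the smallest and largest growth rate is split into `n` bins of equal width, `freq` is the fraction of observations in each bin, and the average factor of a bin is close to $\exp(\mu\Delta{t})$ for the growth rates $\mu$ in that bin. The sketch below reproduces that idea on made-up data. The function `equalwidth_summary` is hypothetical, and averaging $\exp(\mu\Delta{t})$ within each bin is an assumption, not necessarily what `build_nary_lattice_from_growth_rate` does internally.

```julia
# Hypothetical sketch (not the package implementation) of an equal-width discretization
function equalwidth_summary(μ::AbstractVector{<:Real}, n::Int; Δt::Float64 = 1.0 / 252)
    lo, hi = extrema(μ)
    edges = collect(range(lo, hi; length = n + 1)) # n bins of equal width
    counts = zeros(Int, n)
    factor_sum = zeros(Float64, n)
    for x in μ
        # bin index; the maximum value is placed in the last (closed) bin
        k = x == hi ? n : clamp(searchsortedlast(edges, x), 1, n)
        counts[k] += 1
        factor_sum[k] += exp(x * Δt) # assumed: growth factor over one time step
    end
    freq = counts ./ length(μ)
    # average factor per bin; empty bins are reported as NaN
    avg_factor = [c > 0 ? s / c : NaN for (s, c) in zip(factor_sum, counts)]
    return (edges = edges, counts = counts, freq = freq, avg_factor = avg_factor)
end

toy_μ = [-3.0, -1.0, -0.5, 0.0, 0.2, 0.4, 1.0, 3.0]; # made-up annualized growth rates
toy_summary = equalwidth_summary(toy_μ, 3);
println(toy_summary.counts)    # observations per bin
println(sum(toy_summary.freq)) # frequencies sum to one
```

With equal-width bins, most observations land in the central bins and the tails are sparsely populated, which is the pattern visible in the `count` column above.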


In [35]:
UnicodePlots.histogram(log_growth_array, nbins=21, closed=:left)

                  ┌                                        ┐ 
   [-22.0, -20.0) ┤▎ 3
   [-20.0, -18.0) ┤  0
   [-18.0, -16.0) ┤▏ 1
   [-16.0, -14.0) ┤▎ 3
   [-14.0, -12.0) ┤▌ 11
   [-12.0, -10.0) ┤▋ 13
   [-10.0,  -8.0) ┤█▍ 30
   ⋮

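Now we build the lattice model. We take the starting price `Sₒ` from the VWAP on `start_index`, reverse the average growth factors so that they run from up to down, construct an empty `MyGeneralAdjacencyRecombiningCommodityPriceTree` instance with the `build(...)` function, and then compute the node prices with the `populate(...)` function: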

In [17]:
my_nary_lattice_model = let

    # initialize -
    model = nothing;
    Sₒ = random_firm_data[start_index,:volume_weighted_average_price];
    Δ = result.avg_factor |> reverse; # average growth factors (up to down)

    # build an empty model -
    model = build(MyGeneralAdjacencyRecombiningCommodityPriceTree, (
        n = n,
        h = TSIM, # how many days to simulate 
    ));

    # populate the data in the model -
    model = populate(model, Sₒ, Δ);

    # print -
    println("Starting price: $(Sₒ) USD for firm $(random_firm_ticker)");

    model; # return
end;

Starting price: 47.555 USD for firm AAPL


What's in the `my_nary_lattice_model::MyGeneralAdjacencyRecombiningCommodityPriceTree` instance?

In [18]:
typeof(my_nary_lattice_model) |> T-> fieldnames(T) # check out the fields of the returned struct

(:data, :connectivity, :h, :n)

In [19]:
my_nary_lattice_model.connectivity

Dict{Int64, Vector{Int64}} with 6435 entries:
  4986 => [8814, 8815, 8819, 8834, 8869, 9199, 9991]
  4700 => [8383, 8384, 8387, 8393, 8403, 8418, 9705]
  4576 => [8143, 8144, 8145, 8148, 8168, 8294, 9581]
  6073 => [10668, 10669, 10670, 10680, 10700, 10826, 11078]
  2288 => [4298, 4299, 4302, 4317, 4373, 4499, 5291]
  1703 => [3383, 3384, 3386, 3389, 3393, 3398, 3419]
  1956 => [3766, 3767, 3771, 3781, 3837, 4167, 4959]
  2350 => [4400, 4401, 4405, 4415, 4435, 4561, 5353]
  5975 => [10518, 10519, 10525, 10546, 10602, 10728, 10980]
  3406 => [6353, 6354, 6357, 6363, 6373, 6388, 6409]
  2841 => [5247, 5248, 5249, 5252, 5256, 5382, 5844]
  2876 => [5306, 5307, 5309, 5312, 5347, 5417, 5879]
  687  => [1450, 1451, 1454, 1460, 1470, 1485, 1611]
  185  => [441, 442, 444, 447, 451, 521, 647]
  1090 => [2177, 2178, 2180, 2183, 2218, 2344, 2806]
  2015 => [3862, 3863, 3866, 3876, 3896, 4226, 5018]
  3293 => [6100, 6101, 6105, 6115, 6135, 6170, 6296]
  1704 => [3384, 3385, 3387, 3390, 3394, 3399,

In [20]:
my_nary_lattice_model.data

Dict{Int64, NamedTuple} with 6435 entries:
  4986 => (price = 50.4881, path = [1, 2, 1, 0, 3, 0, 1])
  4700 => (price = 40.1271, path = [0, 2, 0, 0, 0, 6, 0])
  4576 => (price = 41.0721, path = [0, 0, 1, 2, 2, 3, 0])
  6073 => (price = 39.1718, path = [0, 0, 3, 0, 2, 0, 3])
  2288 => (price = 49.9304, path = [0, 2, 2, 1, 0, 2, 0])
  1703 => (price = 36.6675, path = [1, 0, 0, 0, 0, 1, 4])
  1956 => (price = 57.3439, path = [3, 0, 0, 2, 2, 0, 0])
  2350 => (price = 48.3192, path = [0, 3, 0, 0, 2, 2, 0])
  5975 => (price = 52.5077, path = [3, 2, 0, 0, 0, 0, 3])
  3406 => (price = 36.3804, path = [0, 2, 0, 0, 0, 0, 5])
  2841 => (price = 40.0554, path = [0, 0, 1, 0, 4, 1, 1])
  2876 => (price = 42.6486, path = [0, 1, 0, 3, 0, 2, 1])
  687  => (price = 46.9582, path = [2, 0, 0, 0, 0, 2, 1])
  185  => (price = 48.3879, path = [1, 0, 0, 0, 3, 0, 0])
  1090 => (price = 47.0243, path = [0, 1, 0, 3, 1, 1, 0])
  2015 => (price = 52.1845, path = [2, 0, 1, 0, 4, 0, 0])
  3293 => (price = 44.5064, p
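
Each node stores a `price` and a `path`, where `path` appears to count how many times each branch was taken (the entries of a node's `path` sum to its level). Under that reading, the price of a node is $S_{\circ}\prod_{k}\Delta_{k}^{c_{k}}$, where $c_{k}$ is the $k$-th entry of `path`; because only the counts matter and not the order of the moves, the lattice recombines. The sketch below checks this reading against one node printed above, using the starting price and average growth factors from this notebook's output. The helper `price_from_path` is hypothetical and is not part of the package.

```julia
# Hypothetical helper: price of a node from its branch-count vector
price_from_path(Sₒ::Real, Δ::AbstractVector{<:Real}, path::AbstractVector{<:Integer}) = Sₒ * prod(Δ .^ path)

Sₒ = 47.555; # starting price printed above
# average growth factors printed above, reversed so that they run from up to down
Δ = reverse([0.9300836130146076, 0.9568933320775288, 0.9813018182599933, 1.0014819811236884,
             1.0206311663699725, 1.04840834419366, 1.076795419529746]);

# node 185 in the output above: one move in the top state and three moves in the fifth state
path = [1, 0, 0, 0, 3, 0, 0];
price = price_from_path(Sₒ, Δ, path);

println(price)                                  # compare with price = 48.3879 printed for node 185
println(isapprox(price, 48.3879; atol = 1e-3))
```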

## Task 2: Let's compute the probability of each price path
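In this task, we compute the real-world probability of reaching each node at a given level of the tree. First, we collect the indices of the nodes at level `l = 8` using the helper functions defined above. Then, we evaluate the probability of each node's `path` using a multinomial distribution whose event probabilities are the observed state frequencies.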

In [21]:
nodes_at_tree_level = let

    # initialize -
    l = 8; # level
    start = level_offset(l, n);
    stop = start + nodes_at_level(l, n) - 1;

    # compute the nodes indices at level l -
    nodes = range(start, stop=stop, step=1) |> collect;
end

3003-element Vector{Int64}:
 3432
 3433
 3434
 3435
 3436
 3437
 3438
 3439
 3440
 3441
    ⋮
 6426
 6427
 6428
 6429
 6430
 6431
 6432
 6433
 6434

In [22]:
data_array_at_level = let
   
    # initialize -
    number_of_nodes = length(nodes_at_tree_level);
    results_array = Array{Float64, 2}(undef, number_of_nodes, 2);
    Δ = result.avg_factor |> reverse; # average growth factors (up to down)
    p = result.freq |> reverse; # real-world probabilities (up to down)
    d = Multinomial(TSIM, p)

    for i ∈ 1:number_of_nodes
        
        # get node index -
        j = nodes_at_tree_level[i];
        nodemodel = my_nary_lattice_model.data[j];

        # get the price from the node model -
        price = nodemodel.price;

        # ok: let's compute the probability of reaching this node -
        path = nodemodel.path; # path to reach this node
       
        # capture the results -
        results_array[i, 1] = price;
        results_array[i, 2] = pdf(d, path);
    end

    tmp = results_array[:,1];
    sorted_results_array = sortperm(tmp) |> I-> results_array[I, :]; # sort by price
end

3003×2 Matrix{Float64}:
 26.6301  4.90222e-22
 27.3977  2.35306e-20
 28.0965  2.73217e-19
 28.1874  4.94144e-19
 28.6743  1.22555e-18
 28.9064  1.14751e-17
 28.9999  5.92972e-18
 29.2226  2.61452e-19
 29.5009  5.14733e-17
 29.6438  6.66194e-17
  ⋮       
 77.8339  3.38919e-18
 78.3306  1.59907e-20
 79.3222  7.23027e-19
 79.3329  3.29429e-19
 79.9414  7.17288e-20
 81.4699  1.53022e-20
 81.481   2.44022e-20
 83.6872  1.0329e-21
 85.9531  1.91277e-23
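
For a node with branch counts $c_{1},\dots,c_{n}$ that sum to $l$, the multinomial probability is $\frac{l!}{\prod_{k}c_{k}!}\prod_{k}p_{k}^{c_{k}}$: the product gives the probability of one particular ordering of the moves, and the coefficient counts the orderings that lead to the same node. The sketch below evaluates this expression by hand, without external packages, for the two extreme rows of the table above. The helper `multinomial_probability` is hypothetical; the frequencies are copied from the `result.freq` output.

```julia
# Hand-written multinomial probability mass (hypothetical helper, no external packages)
function multinomial_probability(p::AbstractVector{<:Real}, counts::AbstractVector{<:Integer})
    l = sum(counts)
    coefficient = factorial(l) / prod(factorial.(counts)) # number of distinct orderings of the moves
    return coefficient * prod(p .^ counts)
end

# frequencies printed by result.freq above, reversed so that they run from up to down
p = reverse([0.0021691973969631237, 0.013015184381778741, 0.15112075198843095, 0.6778741865509761,
             0.14461315979754158, 0.009761388286334056, 0.0014461315979754157]);

all_down = [0, 0, 0, 0, 0, 0, 8]; # eight moves in the lowest state
all_up = [8, 0, 0, 0, 0, 0, 0];   # eight moves in the highest state

println(multinomial_probability(p, all_down)) # compare with 4.90222e-22 in the first row above
println(multinomial_probability(p, all_up))   # compare with 1.91277e-23 in the last row above
```

Both extreme nodes can be reached in only one way, so their coefficient is one and the probability reduces to a single frequency raised to the eighth power.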

__Check:__ The sum of the probabilities at each level should equal `1.0`. Let's check that for level `l = 8`, the last level of the tree:

In [23]:
@assert data_array_at_level[:,2] |> sum ≈ 1.0 # should be 1.0
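
The check passes because the nodes at level `l` correspond one-to-one to the vectors of `n` non-negative branch counts that sum to `l`, and the multinomial probabilities over all such vectors sum to $(\sum_{k}p_{k})^{l} = 1$. The sketch below illustrates this on a small, made-up lattice with three branches; the helper `count_vectors` and the probabilities are hypothetical and chosen only to keep the enumeration short.

```julia
# Hypothetical helper: all vectors of n non-negative integers that sum to l
function count_vectors(l::Int, n::Int)
    n == 1 && return [[l]]
    vectors = Vector{Vector{Int}}()
    for k in 0:l, tail in count_vectors(l - k, n - 1)
        push!(vectors, vcat(k, tail))
    end
    return vectors
end

p = [0.2, 0.5, 0.3]; # made-up branch probabilities for a small 3-branch lattice
l = 4;               # level to check

vectors = count_vectors(l, length(p));
probabilities = [factorial(l) / prod(factorial.(c)) * prod(p .^ c) for c in vectors];

println(length(vectors) == binomial(l + length(p) - 1, l)) # one count vector per node at level l
println(sum(probabilities) ≈ 1.0)                           # probabilities at a level sum to one
```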

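Let's visualize the distribution of possible prices at level `l = 8`. Note that this histogram counts the number of nodes in each price bin; it does not weight the nodes by their probabilities: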

In [36]:
UnicodePlots.histogram(data_array_at_level[:,1], nbins=21, closed=:left)

                ┌                                        ┐ 
   [25.0, 30.0) ┤▋ 12
   [30.0, 35.0) ┤██████▎ 118
   [35.0, 40.0) ┤██████████████████▊ 364
   [40.0, 45.0) ┤███████████████████████████████▎ 601
   [45.0, 50.0) ┤███████████████████████████████████  675
   [50.0, 55.0) ┤███████████████████████████▋ 533
   [55.0, 60.0) ┤██████████████████▌ 357
   ⋮

## Disclaimer and Risks
__This content is offered solely for training and informational purposes__. No offer or solicitation to buy or sell securities or derivative products, or any investment or trading advice or strategy, is made, given, or endorsed by the teaching team. 

__Trading involves risk__. Carefully review your financial situation before investing in securities, futures contracts, options, or commodity interests. Past performance, whether actual or indicated by historical tests of strategies, is no guarantee of future performance or success. Trading is generally inappropriate for someone with limited resources, limited investment or trading experience, or a low risk tolerance. Only risk capital that is not required for living expenses should be used.

__You are fully responsible for any investment or trading decisions you make__. Such decisions should be based solely on evaluating your financial circumstances, investment or trading objectives, risk tolerance, and liquidity needs.