# CHEME 5660 Binomial Lattice Markov Decision Process Example

## Introduction
Let's formulate a Markov decision process (MDP) to manage my holdings of the equity `XYZ`, where the agent uses a binomial lattice model as its internal model for how the share price of `XYZ` could change over time.

### Binomial lattice model
A binomial lattice model assumes that at each discrete time increment, the state of the system, e.g., the share price of an equity, the spot rate, etc., can either increase by a factor $u$ with probability $p$ or decrease by a factor $d$ with probability $(1-p)$. Different models can be developed by choosing specific values for the tuple $(u,d,p)$. One particular model is the Cox, Ross, and Rubinstein (CRR) model:

* [Cox, J. C.; Ross, S. A.; Rubinstein, M. (1979). "Option pricing: A simplified approach". Journal of Financial Economics. 7 (3): 229. CiteSeerX 10.1.1.379.7582. doi:10.1016/0304-405X(79)90015-1](https://www.sciencedirect.com/science/article/pii/0304405X79900151?via%3Dihub)

#### Cox, Ross and Rubinstein (CRR) model
The [CRR binomial lattice model](https://en.wikipedia.org/wiki/Binomial_options_pricing_model) was initially developed for options pricing in 1979. However, one of the critical aspects of estimating an option’s price is calculating the underlying asset’s share price. Thus, let's use the [CRR model](https://en.wikipedia.org/wiki/Binomial_options_pricing_model) to compute the share price of a stock, Advanced Micro Devices, Inc, with the ticker symbol [AMD](https://finance.yahoo.com/quote/AMD?.tsrc=applewf). In the [CRR model](https://en.wikipedia.org/wiki/Binomial_options_pricing_model), the `up` and `down` moves are symmetric:

$$ud = 1$$

where the magnitude of an `up` move $u$ is given by:

$$u = \exp(\sigma\sqrt{\Delta{T}})$$

The quantity $\sigma$ denotes a _volatility parameter_, and $\Delta{T}$ represents the time step. The probability $p$ of an `up` move in a [CRR model](https://en.wikipedia.org/wiki/Binomial_options_pricing_model) is given by:

$$p = \frac{\exp(\mu\Delta{T}) - d}{u - d}$$

where $\mu$ denotes a _return parameter_. In the [CRR model](https://en.wikipedia.org/wiki/Binomial_options_pricing_model) paradigm, the return parameter $\mu$ and the volatility parameter $\sigma$ are constant for every node of the lattice.
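The three CRR expressions above can be evaluated directly. The following is a minimal sketch, not the `PQEcolaPoint` implementation: the helper name `crr_parameters` is hypothetical, and the numerical inputs (including the choice of 60 time steps) are illustrative assumptions.

```julia
# Hypothetical helper (not part of PQEcolaPoint): CRR parameters for one time step.
function crr_parameters(σ::Float64, μ::Float64, ΔT::Float64)
    u = exp(σ * sqrt(ΔT))              # magnitude of an up move
    d = 1.0 / u                        # magnitude of a down move, from ud = 1
    p = (exp(μ * ΔT) - d) / (u - d)    # probability of an up move
    return (u = u, d = d, p = p)
end

# Assumed example inputs: 55.15% volatility, 4.03% return, 30 days split into 60 steps
ΔT = (30.0 / 365.0) / 60
params = crr_parameters(0.5515, 0.0403, ΔT)
println("u = $(params.u), d = $(params.d), p = $(params.p)")
```

For small $\Delta{T}$ the probability $p$ sits close to one half, so the lattice is nearly symmetric in probability as well as in move size.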

### Markov decision process
A Markov decision process is the tuple $\left(\mathcal{S}, \mathcal{A}, R_{a}\left(s, s^{\prime}\right), T_{a}\left(s,s^{\prime}\right), \gamma\right)$ where:

* The state space $\mathcal{S}$ is the set of all possible states $s$ that a system can exist in
* The action space $\mathcal{A}$ is the set of all possible actions $a$ that are available to the agent, where $\mathcal{A}_{s} \subseteq \mathcal{A}$ is the subset of the action space $\mathcal{A}$ that is accessible from state $s$.
* An expected immediate reward $R_{a}\left(s, s^{\prime}\right)$ is received after transitioning from state $s\rightarrow{s}^{\prime}$ due to action $a$. 
* The transition $T_{a}\left(s,s^{\prime}\right) = P(s_{t+1} = s^{\prime}~|~s_{t}=s,a_{t} = a)$ denotes the probability that action $a$ in state $s$ at time $t$ will result in state $s^{\prime}$ at time $t+1$
* The quantity $\gamma$ is a _discount factor_; the discount factor is used to weigh the _future expected utility_.

Finally, a policy function $\pi$ is the (potentially probabilistic) mapping from states $s\in\mathcal{S}$ to actions $a\in\mathcal{A}$ used by the agent to solve the decision task. 
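To make the tuple concrete, here is a toy two-state, two-action MDP stored as plain arrays. This is only a sketch: every number in it is invented for illustration and has nothing to do with the `XYZ` problem, and the array layout `[s, s′, a]` is an assumption.

```julia
# Toy MDP (illustrative values only)
states = [1, 2]                   # state space 𝒮
actions = [1, 2]                  # action space 𝒜
γ = 0.95                          # discount factor

# T[s, s′, a] = P(s′ | s, a); each row T[s, :, a] must sum to one
T = zeros(2, 2, 2)
T[:, :, 1] = [0.9 0.1; 0.2 0.8]
T[:, :, 2] = [0.5 0.5; 0.0 1.0]

# R[s, s′, a] = reward for the transition s → s′ under action a
R = zeros(2, 2, 2)
R[1, 2, 1] = 1.0

mdp = (states = states, actions = actions, R = R, T = T, γ = γ)

# a deterministic policy maps each state to one action
policy = Dict(1 => 1, 2 => 2)

# sanity check: every transition row is a probability distribution
rows_ok = all(isapprox(sum(mdp.T[s, :, a]), 1.0) for s ∈ mdp.states, a ∈ mdp.actions)
println("transition rows sum to one: $(rows_ok)")
```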

### Example setup

In [1]:
import Pkg; Pkg.activate("."); Pkg.resolve(); Pkg.instantiate();

[32m[1m  Activating[22m[39m project at `~/Desktop/julia_work/CHEME-5660-Markets-Mayhem-Example-Notebooks/jupyter-notebooks/CHEME-5660-CRR-MDP-Example`
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-5660-Markets-Mayhem-Example-Notebooks/jupyter-notebooks/CHEME-5660-CRR-MDP-Example/Project.toml`
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-5660-Markets-Mayhem-Example-Notebooks/jupyter-notebooks/CHEME-5660-CRR-MDP-Example/Manifest.toml`


In [2]:
# load reqd packages -
using PQEcolaPoint

In [62]:
include("CHEME-5660-Example-CodeLib.jl");

#### Constants

In [22]:
# set some constants -
μₘ = 0.0403;   # assume we grow at the risk free rate -
Sₒ = 100.0;   # current share price
IV = 55.15;   # implied vol
σₘ = (IV/100); # volatility 
L = 60;       # number of levels on the tree
B = 365.0     # days in a year (all values are per year)
DTE = 30.0;   # planning horizon

#### Build the lattice model

In [23]:
lattice_model = build(CRRLatticeModel; 
    number_of_levels=(L+1), Sₒ = Sₒ, σ = σₘ, μ = μₘ, T = (DTE/B));

In [83]:
P = lattice_model.data[:,1];
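The layout of `lattice_model.data` is not shown in this notebook. As a library-independent sketch, the share price at a node reached by $k$ up moves after $t$ steps is $S_{\circ}u^{k}d^{t-k}$, which could be tabulated as follows. The helper `lattice_prices` is hypothetical, and ordering each level by the number of up moves is an assumption that may differ from the ordering used by `PQEcolaPoint`.

```julia
# Hypothetical helper: prices for every node, level by level (t = 0,…,L).
# Within a level, entry k + 1 corresponds to k up moves (an assumed ordering).
function lattice_prices(Sₒ::Float64, u::Float64, d::Float64, L::Int)
    prices = Dict{Int, Vector{Float64}}()
    for t ∈ 0:L
        prices[t] = [Sₒ * u^k * d^(t - k) for k ∈ 0:t]
    end
    return prices
end

# Assumed inputs mirroring the constants above, with ΔT = (DTE/B)/L
u = exp(0.5515 * sqrt((30.0 / 365.0) / 60))
prices = lattice_prices(100.0, u, 1 / u, 60)
println("number of nodes on the last level: $(length(prices[60]))")
```

Because $ud = 1$, one up move followed by one down move returns the price to $S_{\circ}$, so the lattice recombines.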

#### Build: node dictionary `id`

In [24]:
id = build_nodes_dictionary(L) # zero based

Dict{Int64, Vector{Int64}} with 61 entries:
  5  => [16, 17, 18, 19, 20, 21]
  56 => [1597, 1598, 1599, 1600, 1601, 1602, 1603, 1604, 1605, 1606  …  1644, 1…
  35 => [631, 632, 633, 634, 635, 636, 637, 638, 639, 640  …  657, 658, 659, 66…
  55 => [1541, 1542, 1543, 1544, 1545, 1546, 1547, 1548, 1549, 1550  …  1587, 1…
  60 => [1831, 1832, 1833, 1834, 1835, 1836, 1837, 1838, 1839, 1840  …  1882, 1…
  30 => [466, 467, 468, 469, 470, 471, 472, 473, 474, 475  …  487, 488, 489, 49…
  32 => [529, 530, 531, 532, 533, 534, 535, 536, 537, 538  …  552, 553, 554, 55…
  6  => [22, 23, 24, 25, 26, 27, 28]
  45 => [1036, 1037, 1038, 1039, 1040, 1041, 1042, 1043, 1044, 1045  …  1072, 1…
  4  => [11, 12, 13, 14, 15]
  13 => [92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105]
  54 => [1486, 1487, 1488, 1489, 1490, 1491, 1492, 1493, 1494, 1495  …  1531, 1…
  58 => [1712, 1713, 1714, 1715, 1716, 1717, 1718, 1719, 1720, 1721  …  1761, 1…
  52 => [1379, 1380, 1381, 1382, 1383, 1384, 1385, 1386, 
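The output above suggests that level $t$ holds $t+1$ consecutively numbered nodes, starting from node `1` at the root. A minimal sketch that reproduces this numbering is shown below; `sketch_nodes_dictionary` is a hypothetical stand-in, and the actual `build_nodes_dictionary` in the example code library may be written differently.

```julia
# Hypothetical re-implementation of the node numbering seen in the output above.
function sketch_nodes_dictionary(L::Int)
    nodes = Dict{Int, Vector{Int}}()
    next_id = 1                                    # the root node has id 1
    for t ∈ 0:L
        nodes[t] = collect(next_id:(next_id + t))  # level t holds t + 1 nodes
        next_id += t + 1
    end
    return nodes
end

nodes = sketch_nodes_dictionary(60)
println("level 4 nodes: $(nodes[4])")
```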

#### Build: probability dictionary `pd`
The probability dictionary holds the probability values for each node at a particular time level:

$$P(S_{t} = S_{\circ}u^{k}d^{t-k}) = \binom{t}{k}p^{k}\left(1-p\right)^{t-k}$$

where $t$ denotes the time index and $k=0,1,\dots,t$.

In [25]:
pd = build_probability_dictionary(lattice_model, L) # zero based

Dict{Int64, Vector{Float64}} with 61 entries:
  5  => [0.0301048, 0.15278, 0.310141, 0.31479, 0.159754, 0.0324298]
  56 => [9.1351e-18, 5.19234e-16, 1.4493e-14, 2.64784e-13, 3.56099e-12, 3.75894…
  35 => [2.24102e-11, 7.96115e-10, 1.37368e-8, 1.5337e-7, 1.24536e-6, 7.83695e-…
  55 => [1.84071e-17, 1.02757e-15, 2.81603e-14, 5.04956e-13, 6.66283e-12, 6.897…
  60 => [5.54143e-19, 3.3747e-17, 1.01046e-15, 1.98284e-14, 2.8679e-13, 3.2602e…
  30 => [7.44408e-10, 2.2667e-8, 3.33598e-7, 3.16026e-6, 2.16515e-5, 0.00011427…
  32 => [1.83343e-10, 5.95493e-9, 9.36851e-8, 9.50894e-7, 6.99733e-6, 3.97724e-…
  6  => [0.0149404, 0.0909862, 0.230875, 0.312448, 0.237849, 0.0965657, 0.01633…
  45 => [2.03103e-14, 9.27664e-13, 2.07145e-11, 3.01359e-10, 3.2117e-9, 2.67308…
  4  => [0.0606608, 0.24628, 0.374958, 0.25372, 0.0643807]
  13 => [0.000110777, 0.00146169, 0.00890164, 0.0331286, 0.084063, 0.153582, 0.…
  54 => [3.70902e-17, 2.0329e-15, 5.46793e-14, 9.61982e-13, 1.24491e-11, 1.2635…
  58 => [2.24992
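The binomial expression for the node probabilities can be evaluated level by level. The sketch below is an assumption about how such a dictionary could be built, not the code behind `build_probability_dictionary`. It orders each level by the number of up moves $k$, which may be the reverse of the ordering in the output above.

```julia
# Hypothetical helper: probs[t][k + 1] = P(k up moves in t steps)
function sketch_probability_dictionary(p::Float64, L::Int)
    probs = Dict{Int, Vector{Float64}}()
    for t ∈ 0:L
        # binomial(t, k) is exact for Int64 here; much larger L would need BigInt
        probs[t] = [binomial(t, k) * p^k * (1 - p)^(t - k) for k ∈ 0:t]
    end
    return probs
end

probs = sketch_probability_dictionary(0.5, 60)
println("level 4 probabilities for p = 0.5: $(probs[4])")
```

Each level is a full binomial distribution, so the probabilities on any level sum to one.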

#### Build: children dictionary `children_dict`

In [71]:
children_dict = build_children_dictionary(id) # root node = 1

Dict{Int64, Vector{Int64}} with 1891 entries:
  1144 => [1192, 1193]
  1175 => [1223, 1224]
  719  => [757, 758]
  1546 => [1602, 1603]
  1703 => [1761, 1762]
  1028 => [1073, 1074]
  699  => [736, 737]
  831  => [872, 873]
  1299 => [1350, 1351]
  1438 => [1492, 1493]
  1074 => [1120, 1121]
  319  => [344, 345]
  687  => [724, 725]
  1812 => [1872, 1873]
  1199 => [1248, 1249]
  185  => [204, 205]
  823  => [864, 865]
  1090 => [1137, 1138]
  420  => [449, 450]
  1370 => [1422, 1423]
  1437 => [1491, 1492]
  1662 => [1720, 1721]
  525  => [557, 558]
  365  => [392, 393]
  638  => [674, 675]
  ⋮    => ⋮
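In the output above, the node at position $j$ on level $t$ connects to the nodes at positions $j$ and $j+1$ on level $t+1$. A minimal sketch of that connectivity follows. The function name is hypothetical, and unlike the 1891-entry dictionary above it gives no entry to the leaf nodes, because how the library treats leaves is not shown.

```julia
# Hypothetical helper: map each non-leaf node id to its two children.
function sketch_children_dictionary(nodes::Dict{Int, Vector{Int}})
    children = Dict{Int, Vector{Int}}()
    L = maximum(keys(nodes))
    for t ∈ 0:(L - 1)
        for (j, node) ∈ enumerate(nodes[t])
            # position j on level t connects to positions j and j + 1 on level t + 1
            children[node] = [nodes[t + 1][j], nodes[t + 1][j + 1]]
        end
    end
    return children
end

# node ids per level, numbered consecutively from the root (id 1)
nodes = Dict(t => collect((t * (t + 1) ÷ 2 + 1):((t + 1) * (t + 2) ÷ 2)) for t ∈ 0:60)
children = sketch_children_dictionary(nodes)
println("children of node 1144: $(children[1144])")
```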

### Let's build my MDP components

#### a) What are my possible states $\mathcal{S}$?
Let the states $\mathcal{S}$ be the set of nodes in the binomial lattice (there are $|L|$ of them) plus one trade-closed state, which we enter if we sell all of our shares.

In [78]:
number_of_states = maximum(id[L]); # largest node id on the last level = number of lattice nodes
println("|L| = $(number_of_states) + trade closed state gives |𝒮| = $(number_of_states + 1) states")

|L| = 1891 + trade closed state gives |𝒮| = 1892 states


#### b) What are my possible actions $\mathcal{A}$?
The most intuitive action set for equity might be: 

$$\mathcal{A} = \left\{\texttt{buy}, \texttt{sell}, \texttt{hold}\right\}$$

where the $a_{1} = \texttt{buy}$ action purchases a block of shares at the current market price, the $a_{2} = \texttt{sell}$ action sells a block of shares from your holdings back to the market and the $a_{3} = \texttt{hold}$ action does nothing.

#### c) What are my possible rewards $R(s, a)$?
Intuitively, we expect the reward to be related to the capital gain (or loss) associated with the sale of shares of `XYZ`. However, there could be other reward models, e.g., we want to continuously lower our portfolio’s average share price of `XYZ`. 

Suppose we own `XYZ` at an average share price of $\bar{S}$ USD/share. Further, suppose our transactional unit is $n_{a}$ shares, i.e., if we choose $a_{1}$ or $a_{2}$ we buy or sell $n_{a}$ shares, and that we own $n_{t}$ shares at time $t$.

In [None]:
# build a reward array -

# initialize -
R = Array{Float64,2}(undef, (number_of_states+1), 3);
fill!(R,0.0); # initially all zeros -
nₐ = 1; # number of shares that we buy/sell for a₁ or a₂ -

for s ∈ 1:number_of_states
    
    # setup reward for each action
    R[s,1] = 0.0; # reward in state s if we execute a₁ (buy nₐ)
    R[s,2] = 0.0; # reward in state s if we execute a₂ (sell nₐ)
    R[s,3] = 0.0; # reward in state s if we execute a₃ (hold)
    
end
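The cell above leaves every reward at zero. One possible way to fill it in, following the capital-gain intuition, is sketched below. This is an assumption rather than the reward model of this example: selling $n_{a}$ shares at price $S$ realizes $n_{a}(S - \bar{S})$, while buying and holding realize nothing. The function name, the three prices, and the average price `S_avg` are all hypothetical.

```julia
# Hypothetical capital-gain reward; actions: 1 = buy, 2 = sell, 3 = hold
function capital_gain_reward(S::Float64, S_avg::Float64, nₐ::Int, a::Int)
    return a == 2 ? nₐ * (S - S_avg) : 0.0   # only a sale realizes a gain or loss
end

prices = [100.0, 102.0, 98.0]       # illustrative share prices for three lattice states
S_avg = 99.0                        # assumed average share price of our holdings
nₐ = 1                              # shares bought or sold per transaction

R = zeros(length(prices) + 1, 3)    # last row: trade-closed state, reward stays zero
for s ∈ eachindex(prices), a ∈ 1:3
    R[s, a] = capital_gain_reward(prices[s], S_avg, nₐ, a)
end
println("reward for selling in state 1: $(R[1, 2])")
```

A different objective, such as lowering the average share price, would need a different reward function and probably a richer state.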

#### d) What are my possible transitions $T_{a}\left(s,s^{\prime}\right)$?
To answer this, we really need to understand what the states are, especially for actions $a_{1}$ and $a_{2}$.
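For the $\texttt{hold}$ action alone the holdings do not change, so one plausible choice is to let the lattice itself supply the transitions: from a node, move to the up child with probability $p$ and to the down child with probability $(1-p)$. The sketch below assumes this, assumes that the second child of a node is the up move, and assumes that leaf nodes are absorbing. The function name is hypothetical, and the transitions for $a_{1}$ and $a_{2}$ are deliberately left open.

```julia
# Hypothetical sketch: T_hold[s, s′] for a lattice with levels t = 0,…,L
function sketch_hold_transitions(L::Int, p::Float64)
    N = (L + 1) * (L + 2) ÷ 2               # number of lattice nodes
    T_hold = zeros(N, N)
    for t ∈ 0:(L - 1)
        first_id = t * (t + 1) ÷ 2 + 1      # id of the first node on level t
        for j ∈ 0:t
            s = first_id + j
            T_hold[s, s + t + 1] = 1 - p    # first child (assumed down move)
            T_hold[s, s + t + 2] = p        # second child (assumed up move)
        end
    end
    for s ∈ (L * (L + 1) ÷ 2 + 1):N
        T_hold[s, s] = 1.0                  # leaves are absorbing (an assumption)
    end
    return T_hold
end

T_hold = sketch_hold_transitions(2, 0.6)
println("row sums: $(vec(sum(T_hold, dims = 2)))")
```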

### Disclaimer and Risks
__This content is offered solely for training and informational purposes__. No offer or solicitation to buy or sell securities or derivative products, or any investment or trading advice or strategy, is made, given, or endorsed by the teaching team.

__Trading involves risk__. Carefully review your financial situation before investing in securities, futures contracts, options, or commodity interests. Past performance, whether actual or indicated by historical tests of strategies, is no guarantee of future performance or success. Trading is generally inappropriate for someone with limited resources, limited investment or trading experience, or a low risk tolerance. Only risk capital that is not required for living expenses should be used.

__You are fully responsible for any investment or trading decisions you make__. Such decisions should be based solely on your evaluation of your financial circumstances, investment or trading objectives, risk tolerance, and liquidity needs.