# Write-up and code for Assignment 6 - Week 3

### To do
- Model Merton's Portfolio problem as an MDP (write the model in LaTeX)
- Implement this MDP model in code
- Try recovering the closed-form solution with a DP algorithm that you implemented previously

## Merton's Portfolio Problem
### Informal Problem Statement
- You will live for $T$ more years, where $T$ is deterministic. 
- $W_0 > 0$ is the current wealth plus the present value of future income, minus debts.
- You can invest in $n$ risky assets and 1 riskless asset. 
- Each asset has a known normal distribution of returns. You are allowed to go long or short any fractional quantities of assets. 
- Trading is done in continuous time $0 \leq t < T$, with no transaction costs.
- You can consume any fractional amount of wealth at any time.
- Dynamic Decision: Optimal Allocation and Consumption at each time with the goal of maximizing the lifetime-aggregated utility of consumption.
- Consumption utility is assumed to have Constant Relative Risk-Aversion (CRRA).

### Problem Notation
- The riskless asset is described by: $dR_t = r\cdot R_t \cdot dt$
- Risky asset $i$ is governed by $dS_{i,t} = \mu_i\cdot S_{i,t}\cdot dt + \sigma_i\cdot S_{i,t} \cdot dz_{i,t}$, where the $z_{i,t}$ are (possibly correlated) standard Brownian motions. Treating $\mu$ as a vector and $\sigma$ as a matrix (with $dz_t$ a vector of independent Brownian increments), the geometric Brownian motion can be written in vector form as $dS_{t} = \mathrm{diag}(S_t)\cdot(\mu\cdot dt + \sigma\cdot dz_{t})$, where $S_t$ and $dS_t$ are vectors.
- $\mu_i > r > 0$ and $\sigma_i > 0$ for all $n$ assets.
- The wealth at time $t$ is denoted by $W_t > 0$.
- Fraction of wealth allocated to risky asset $i$ is denoted by $\pi_i(t,W_t)$.
- Fraction of wealth allocated to riskless asset is denoted by $1 - \sum_{i=1}^n \pi_i$.
- Consumption (wealth consumed per unit time) is denoted by $c(t, W_t) \geq 0$.
- Utility of consumption function $U(x) = \frac{x^{1-\gamma}}{1-\gamma}$ for $0<\gamma\neq1$
- Utility of consumption function $U(x) = \log(x)$ for $\gamma =1$
- $\gamma=$ (constant) Relative Risk-Aversion $\frac{-x\cdot U''(x)}{U'(x)}$
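A minimal Python sketch of the utility function defined above, including the $\gamma = 1$ log case, with a finite-difference check that $-x\cdot U''(x)/U'(x)$ really is the constant $\gamma$. The function names `crra_utility` and `relative_risk_aversion` are hypothetical and not part of the assignment code.

```python
import math


def crra_utility(x: float, gamma: float) -> float:
    """CRRA utility U(x); requires x > 0 and gamma > 0."""
    if x <= 0 or gamma <= 0:
        raise ValueError("need x > 0 and gamma > 0")
    if gamma == 1.0:
        return math.log(x)  # log-utility case
    return x ** (1.0 - gamma) / (1.0 - gamma)


def relative_risk_aversion(x: float, gamma: float, h: float = 1e-3) -> float:
    """Estimate -x U''(x) / U'(x) with central finite differences."""
    def u(y: float) -> float:
        return crra_utility(y, gamma)

    du = (u(x + h) - u(x - h)) / (2 * h)
    d2u = (u(x + h) - 2 * u(x) + u(x - h)) / h ** 2
    return -x * d2u / du


print(crra_utility(1.0, 2.0))  # -1.0
print(relative_risk_aversion(5.0, 3.0))  # close to 3, independent of x
```

The finite-difference estimate is only a sanity check: it should return (approximately) the same $\gamma$ for any $x > 0$, which is exactly what "constant relative risk aversion" means.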

### Markov Decision Process Model
We can leverage the MDP framework to model Merton's Portfolio Problem and thereby use Dynamic Programming to find solutions (more efficient techniques will be covered later). We model the problem with a single risky asset and one riskless asset.
- The _State_ is $(t,W_t)$
- The _Action_ is $[\pi_t, c_t]$
- The _Reward_ per unit time is $U(c_t)$
- The _Return_ is the usual accumulated discounted _Reward_
- The goal is to find a _Policy_: $(t, W_t) \rightarrow [\pi_t, c_t]$ that maximizes the _Expected Return_
- Note: $c_t \geq 0$, but $\pi_t$ is unconstrained
- The _Transitions_ are governed by the processes mentioned previously
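To make the transitions concrete for the single-risky-asset model, one can write out the wealth dynamics they imply: $\pi_t W_t$ is held in the risky asset (instantaneous return $\mu\,dt + \sigma\,dz_t$), $(1-\pi_t)W_t$ in the riskless asset (return $r\,dt$), and wealth is consumed at rate $c_t$. The discount rate $\rho$ used in the Return below is an assumption here (presumably what the `rho` field in the code represents), and no bequest term for terminal wealth is included.

$$
\begin{aligned}
dW_t &= \pi_t W_t \frac{dS_t}{S_t} + (1 - \pi_t) W_t \frac{dR_t}{R_t} - c_t\,dt \\
     &= \big((\pi_t(\mu - r) + r)\,W_t - c_t\big)\,dt + \pi_t\,\sigma\,W_t\,dz_t
\end{aligned}
$$

$$
V^*(t, W_t) = \max_{\pi,\,c}\; \mathbb{E}\left[\int_t^T e^{-\rho(s-t)}\,U(c_s)\,ds \;\middle|\; W_t\right]
$$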

### Code for modeling Merton's Portfolio Problem as MDP

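```python
from typing import NamedTuple
import numpy as np


class MertonPortfolio(NamedTuple):
    T: float          # investment horizon (years)
    rho: float        # discount rate applied to utility
    r: float          # riskless rate
    mu: np.ndarray    # expected returns of the risky assets, shape (n,)
    cov: np.ndarray   # covariance matrix of risky returns, shape (n, n)
    gamma: float      # CRRA coefficient (relative risk aversion)
```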
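To have something to compare a DP result against, here is a sketch of the standard closed-form Merton solution for CRRA utility with no bequest (zero utility for terminal wealth), obtained from the HJB equation with the ansatz $V = f(t)^\gamma W^{1-\gamma}/(1-\gamma)$: $\pi^* = \frac{1}{\gamma}\Sigma^{-1}(\mu - r\mathbf{1})$ and $\frac{c^*}{W} = \frac{\nu}{1 - e^{-\nu(T-t)}}$, with $\nu = \frac{\rho - (1-\gamma)\left(\frac{\theta^2}{2\gamma} + r\right)}{\gamma}$, $\theta^2 = (\mu - r\mathbf{1})^\top\Sigma^{-1}(\mu - r\mathbf{1})$, and $c^*/W = 1/(T-t)$ when $\nu = 0$. It assumes `cov` is the covariance matrix $\Sigma$ of instantaneous returns and `rho` is the utility discount rate; the helper names are hypothetical, and the class is repeated so the block runs on its own.

```python
from typing import NamedTuple

import numpy as np


class MertonPortfolio(NamedTuple):
    # Same fields as the class above, repeated so this block is self-contained.
    T: float
    rho: float
    r: float
    mu: np.ndarray
    cov: np.ndarray
    gamma: float


def merton_allocation(m: MertonPortfolio) -> np.ndarray:
    """Optimal risky fractions pi* = (1/gamma) * cov^{-1} (mu - r)."""
    excess = np.asarray(m.mu, dtype=float) - m.r
    return np.linalg.solve(np.asarray(m.cov, dtype=float), excess) / m.gamma


def merton_consumption_fraction(m: MertonPortfolio, t: float) -> float:
    """Optimal c*/W at time t < T, assuming no bequest."""
    if t >= m.T:
        raise ValueError("need t < T")
    excess = np.asarray(m.mu, dtype=float) - m.r
    # theta^2 = (mu - r)^T cov^{-1} (mu - r), the squared maximal Sharpe ratio
    theta_sq = float(excess @ np.linalg.solve(np.asarray(m.cov, dtype=float), excess))
    nu = (m.rho - (1.0 - m.gamma) * (theta_sq / (2.0 * m.gamma) + m.r)) / m.gamma
    tau = m.T - t  # time remaining
    if abs(nu) < 1e-12:
        return 1.0 / tau  # limit of nu / (1 - exp(-nu * tau)) as nu -> 0
    return float(nu / (1.0 - np.exp(-nu * tau)))


m = MertonPortfolio(T=1.0, rho=0.05, r=0.02, mu=np.array([0.08]),
                    cov=np.array([[0.04]]), gamma=2.0)
print(merton_allocation(m))                 # about [0.75] = 0.06 / (2 * 0.04)
print(merton_consumption_fraction(m, 0.0))
```

Note that the optimal allocation does not depend on $t$ or $W$, while the consumption fraction grows as $t \to T$; a DP solution on the discretized model should approximately reproduce both features.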

Assume we have 10 trading days and are allowed to adjust our decisions only once per trading day. Assume also that wealth is discretized to the 100 integer values $0, 1, \dots, 99$.

```python
# Each state is (trading day t, integer wealth level w)
States = {(t, w) for t in range(10) for w in range(100)}
```
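The state set above still needs a matching transition model. A hedged sketch for the single-risky-asset case: for a state $(t, w)$ and an action $(\pi, c)$, it assumes $c$ units of wealth are consumed at the start of the day, the remainder is invested for one trading day ($\Delta t = 1/252$ years, with annualized $\mu$, $\sigma$, $r$; all assumptions), the risky asset follows the exact one-step GBM return, and $P[(t+1, w') \mid (t, w), (\pi, c)]$ is estimated by sampling and snapping to the wealth grid. The function `transition_probs` and its parameters are hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple

import numpy as np

State = Tuple[int, int]  # (trading day t, integer wealth level w)


def transition_probs(state: State, pi: float, c: int, mu: float, sigma: float,
                     r: float, dt: float = 1.0 / 252, n_days: int = 10,
                     w_max: int = 99, n_samples: int = 10_000,
                     seed: int = 0) -> Dict[State, float]:
    """Monte Carlo estimate of P[(t+1, w') | (t, w), action (pi, c)]."""
    t, w = state
    if t >= n_days - 1:
        raise ValueError("no transition out of the last trading day")
    if not 0 <= c <= w:
        raise ValueError("consumption must satisfy 0 <= c <= w")
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_samples)
    # Exact one-step gross return of a GBM risky asset, and of the riskless asset.
    risky = np.exp((mu - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * z)
    riskless = np.exp(r * dt)
    # Consume c first, then invest the remainder: fraction pi risky, 1 - pi riskless.
    next_w = (w - c) * (pi * risky + (1.0 - pi) * riskless)
    # Snap to the grid {0, ..., w_max}; leveraged or short positions can push
    # wealth outside the grid, so it is clipped at both ends.
    grid_w = np.clip(np.rint(next_w), 0, w_max).astype(int)
    counts = Counter(grid_w.tolist())
    return {(t + 1, wp): k / n_samples for wp, k in counts.items()}


probs = transition_probs((0, 50), pi=0.5, c=1, mu=0.08, sigma=0.2, r=0.02)
print(max(probs, key=probs.get))  # most likely next state
```

The returned dictionary has the same shape as the innermost level of the `MDP.P` mapping in the appendix, so it could fill `P[s][(pi, c)]` once the action set is discretized as well. Quadrature would give smoother probabilities than sampling; sampling is used here only for brevity.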

## Appendix Code

```python
from typing import TypeVar

S = TypeVar('S')  # generic state type
A = TypeVar('A')  # generic action type
```

```python
from typing import Dict, NamedTuple, Set, Union


class MP(NamedTuple):
    States: Set[S]
    # P[s][s'] = probability of moving from s to s'
    P: Dict[S, Dict[S, float]]


class MRP(NamedTuple):
    mp: MP
    # reward per state R[s], or per transition R[s][s']
    R: Union[Dict[S, float], Dict[S, Dict[S, float]]]
    gamma: float
```
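For a finite MRP in the layout above, the value function can be computed exactly by solving the Bellman equation $V = R + \gamma P V$ as a linear system. This sketch assumes rewards in the `Dict[S, float]` form (expected reward per state) and $\gamma < 1$ so that $I - \gamma P$ is invertible; the function name `evaluate_mrp` is hypothetical.

```python
from typing import Dict, TypeVar

import numpy as np

S = TypeVar('S')


def evaluate_mrp(P: Dict[S, Dict[S, float]], R: Dict[S, float],
                 gamma: float) -> Dict[S, float]:
    """Solve (I - gamma * P) V = R for a finite MRP (requires gamma < 1)."""
    states = list(P)
    index = {s: i for i, s in enumerate(states)}
    n = len(states)
    P_mat = np.zeros((n, n))
    for s, row in P.items():
        for s_next, p in row.items():
            P_mat[index[s], index[s_next]] = p
    R_vec = np.array([R[s] for s in states], dtype=float)
    V = np.linalg.solve(np.eye(n) - gamma * P_mat, R_vec)
    return {s: float(V[index[s]]) for s in states}


# Two states: 'a' pays 1 and moves to the absorbing, zero-reward state 'b'.
print(evaluate_mrp({'a': {'b': 1.0}, 'b': {'b': 1.0}}, {'a': 1.0, 'b': 0.0}, 0.9))
```

If rewards are given per transition (`R[s][s']`), the expected reward per state is $\sum_{s'} P[s][s']\,R[s][s']$, which can be precomputed and passed in instead.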

```python
class MDP(NamedTuple):
    States: Set[S]
    # P[s][a][s'] = probability of reaching s' after taking action a in state s
    P: Dict[S, Dict[A, Dict[S, float]]]
    Actions: Set[A]
    # reward R[s][a] depends on the current state and the action,
    # or R[s][a][s'] when it also depends on the next state
    R: Union[Dict[S, Dict[A, float]], Dict[S, Dict[A, Dict[S, float]]]]
    gamma: float


class Policy(NamedTuple):
    # pi[s][a] = probability of taking action a in state s
    pi: Dict[S, Dict[A, float]]


class state_value_function(NamedTuple):
    # vf[s] = value of state s
    vf: Dict[S, float]


class action_value_function(NamedTuple):
    # vf[s][a] = value of taking action a in state s
    vf: Dict[S, Dict[A, float]]
```
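As one possible DP routine for the to-do item on recovering the closed form, here is a sketch of plain value iteration over the dictionary layout used by the `MDP` class (`P[s][a][s']`, rewards in the `R[s][a]` form, $\gamma < 1$). It returns the value function and a deterministic greedy policy in the same `{s: {a: prob}}` shape as `Policy.pi`. The function names are hypothetical, and this is not necessarily the DP algorithm the to-do list refers to.

```python
from typing import Dict, Tuple, TypeVar

S = TypeVar('S')
A = TypeVar('A')


def q_value(P: Dict[S, Dict[A, Dict[S, float]]], R: Dict[S, Dict[A, float]],
            gamma: float, V: Dict[S, float], s: S, a: A) -> float:
    """One-step lookahead: R[s][a] + gamma * sum_{s'} P[s][a][s'] * V[s']."""
    return R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a].items())


def value_iteration(P: Dict[S, Dict[A, Dict[S, float]]], R: Dict[S, Dict[A, float]],
                    gamma: float, tol: float = 1e-8, max_iter: int = 10_000
                    ) -> Tuple[Dict[S, float], Dict[S, Dict[A, float]]]:
    """Value iteration for a finite MDP stored as nested dicts."""
    V = {s: 0.0 for s in P}
    for _ in range(max_iter):
        new_V = {s: max(q_value(P, R, gamma, V, s, a) for a in P[s]) for s in P}
        delta = max(abs(new_V[s] - V[s]) for s in P)
        V = new_V
        if delta < tol:
            break
    # Deterministic greedy policy, stored in the Policy.pi layout {s: {a: prob}}.
    policy = {s: {max(P[s], key=lambda a: q_value(P, R, gamma, V, s, a)): 1.0}
              for s in P}
    return V, policy


# State 1 pays 1 forever; from state 0 the best action is to 'go' to state 1.
P = {0: {'stay': {0: 1.0}, 'go': {1: 1.0}}, 1: {'stay': {1: 1.0}}}
R = {0: {'stay': 0.0, 'go': 0.0}, 1: {'stay': 1.0}}
V, pi = value_iteration(P, R, 0.9)
print(V, pi)
```

Because the trading day $t$ is part of the Merton state, the finite-horizon problem could be passed to such a routine by making last-day states absorbing with zero reward (an assumption about how the pieces would be wired together); backward induction over $t$ would reach the same answer in a single sweep.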