# Write-up and code for February 1

### To Do
- Work out (in LaTeX) the solution to the Linear Impact model we covered in class
- Model a real-world Optimal Trade Order Execution problem as an MDP (with complete order book included in the State)

## Optimal Trade Order Execution
Our task is to sell a large number $N$ of shares within $T$ discrete time steps. Our goal is to maximize the Expected Total Utility of Sales Proceeds, accounting for both the temporary and the permanent price impact of our market orders. This is a Dynamic Optimization problem that can be modeled as an MDP.
- We have $T$ time steps indexed by $t=1,...,T$
- $P_t$ denotes the Bid Price at the start of time step $t$
- $N_t$ denotes number of shares sold in time step $t$
- $R_t = N - \sum_{i=1}^{t-1}N_i$ denotes the number of shares remaining to be sold at the start of time step $t$. It follows that $R_1 = N$ and, since everything must be sold by the final step, $N_T = R_T$
- Price Dynamics are given by $$P_{t+1} = F_t(P_t, N_t, \epsilon_t)$$ where $F_t(\cdot)$ is an arbitrary function representing permanent price impact and $\epsilon_t$ is a random noise term
- Sales proceeds in time step $t$ are defined as $$N_t\cdot Q_t = N_t \cdot (P_t - g_t(P_t, N_t))$$ where $g_t(\cdot)$ is an arbitrary function representing temporary price impact
- The Utility of Sales Proceeds function is denoted by $U(\cdot)$
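
To make the bookkeeping for $R_t$ concrete, here is a minimal Python sketch. The helper name `remaining_shares` and the example schedule are hypothetical, chosen only for illustration; the schedule is assumed to sum to $N$.

```python
from itertools import accumulate


def remaining_shares(total_shares, sales):
    """Return [R_1, ..., R_T], where R_t = N - sum_{i<t} N_i."""
    # Shares sold strictly before each step: 0, N_1, N_1 + N_2, ...
    sold_before = [0] + list(accumulate(sales))[:-1]
    return [total_shares - s for s in sold_before]


# Hypothetical schedule: sell N = 100 shares over T = 4 time steps.
N = 100
sales = [40, 30, 20, 10]  # N_1, ..., N_T (assumed to sum to N)
remaining = remaining_shares(N, sales)
print(remaining)  # [100, 60, 30, 10]
```

The first entry equals $N$ and the last entry equals the last sale, matching $R_1 = N$ and $N_T = R_T$.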

We can formulate this as an MDP:
- The State is a tuple $\langle t, P_t, R_t\rangle$, where $1\leq t\leq T$
- Perform Action $N_t$
- Receive Reward $U(N_t\cdot Q_t) = U(N_t\cdot (P_t-g_t(P_t,N_t)))$
- Experience price dynamics $P_{t+1} = F_t(P_t, N_t, \epsilon_t)$
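
The transition described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `State`, `step`, `price_fn`, `temp_impact_fn` and `utility_fn` are hypothetical stand-ins for $\langle t, P_t, R_t\rangle$, the MDP transition, $F_t$, $g_t$ and $U$, and the noise $\epsilon_t$ is passed in by the caller rather than sampled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    t: int            # time step, 1 <= t <= T
    price: float      # P_t, bid price at the start of step t
    remaining: float  # R_t, shares still to be sold


def step(state, shares_sold, T, price_fn, temp_impact_fn, utility_fn, eps):
    """One MDP transition: returns (next_state, reward)."""
    if state.t == T:
        # On the final step everything left must be sold: N_T = R_T.
        shares_sold = state.remaining
    # Q_t = P_t - g_t(P_t, N_t): execution price after temporary impact.
    exec_price = state.price - temp_impact_fn(state.t, state.price, shares_sold)
    reward = utility_fn(shares_sold * exec_price)  # U(N_t * Q_t)
    # P_{t+1} = F_t(P_t, N_t, eps_t): permanent impact plus noise.
    next_price = price_fn(state.t, state.price, shares_sold, eps)
    next_state = State(state.t + 1, next_price, state.remaining - shares_sold)
    return next_state, reward


# Example with hypothetical linear impact functions and zero noise.
alpha, beta = 0.5, 0.25
next_state, reward = step(
    State(t=1, price=100.0, remaining=100.0),
    shares_sold=10.0,
    T=4,
    price_fn=lambda t, p, n, e: p - alpha * n + e,
    temp_impact_fn=lambda t, p, n: beta * n,
    utility_fn=lambda x: x,
    eps=0.0,
)
print(next_state, reward)
```

Passing $F_t$, $g_t$ and $U$ in as functions keeps the sketch agnostic about the price dynamics, so a specific model can be swapped in without touching `step`.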

Now we need to make assumptions about the underlying structure of the price dynamics. Consider a simple model with a Linear Price Impact:
- $N$, $N_t$ and $P_t$ are all real-valued: $N, N_t, P_t \in \mathbb R$
- Let $P_{t+1} = P_t - \alpha N_t + \epsilon_t$ where $\alpha \in \mathbb R^+$
- The random variables $\epsilon_t$ are i.i.d. with $\mathbb E[\epsilon_t|N_t,P_t] = 0$
- Temporary price impact is given by $\beta N_t$, so $Q_t = P_t - \beta N_t$ where $\beta \in \mathbb R^+$
- The utility function $U(\cdot)$ is the identity function, meaning that there is no risk-aversion
- MDP Discount factor $\gamma = 1$
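
As a quick numerical illustration of this linear model (not a derivation of the optimal policy), the sketch below computes the expected total proceeds of a sale schedule that is fixed in advance. Because $U$ is the identity, $\gamma = 1$ and $\epsilon_t$ has zero mean, the expectation can be obtained by tracking $\mathbb E[P_t]$ alone. The function name `expected_proceeds` and the parameter values are hypothetical.

```python
def expected_proceeds(sales, initial_price, alpha, beta):
    """Expected sum_t N_t * Q_t for a schedule fixed in advance."""
    price = initial_price  # E[P_t]; the zero-mean noise drops out
    total = 0.0
    for n in sales:
        total += n * (price - beta * n)  # N_t * Q_t with Q_t = P_t - beta * N_t
        price -= alpha * n               # E[P_{t+1}] = E[P_t] - alpha * N_t
    return total


# Hypothetical parameters: N = 100 shares, T = 4 steps, P_1 = 100.
alpha, beta = 0.125, 0.25
even_split = expected_proceeds([25, 25, 25, 25], 100.0, alpha, beta)
sell_at_once = expected_proceeds([100, 0, 0, 0], 100.0, alpha, beta)
print(even_split, sell_at_once)  # 8906.25 7500.0
```

With these particular values, where $2\beta > \alpha$, spreading the sale evenly yields higher expected proceeds than selling everything in the first step. Other parameter choices may behave differently.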