# Chapter 4 Optimal Stopping Reading Note

1. Theory of Optimal Stopping: 
   - Theory: The stopping problem, Lifetime values, Policy operators, the value functions, the Bellman Operator, Optimal policies, value function iteration
   - Firm Valuation and Exit: Optional exit, exit vs no-exit
   - Monotonicity: monotone values, monotone actions
   - Continuation values: continuation value operator, dimensionality reduction, application to firm value

2. Applications:
   - American Options
   - Research and Development: Constant R&D costs, IID R&D costs

## Theorems

1. $T_\sigma$ is a contraction
2. $T$ is a contraction, $v^*$ is the unique fixed point
3. Optimal policy $\sigma^*$ is the $v^*$-greedy policy
4. $e,c$ increasing and $P$ monotone increasing implies $v^*,h^*$ increasing
5. $e$ increasing, $h^*$ decreasing implies $\sigma^*$ increasing
6. $e$ decreasing, $h^*$ increasing implies $\sigma^*$ decreasing
7. continuation value operator is a contraction with unique fixed point $h^*\in\mathbb{R}^X$

### Proposition 4.1.1. $T_\sigma$ is a contraction

For any $\sigma\in\Sigma$, the policy operator $T_\sigma$ is a contraction of modulus $\beta$ on $\mathbb{R}^X$ **under the supremum norm**.


### Proposition 4.1.2. $T$ is a contraction, $v^*$ is the unique fixed point

If $\mathscr{S}$ is an optimal stopping problem with Bellman operator $T$ and value function $v^*$, then

1. $T$ is a contraction of modulus $\beta$ on $\mathbb{R}^X$ under the supremum norm
2. The unique fixed point of $T$ on $\mathbb{R}^X$ is the value function $v^*$

### Proposition 4.1.3. Optimal policy $\sigma^*$ is $v^*$-greedy policy

Policy $\sigma\in\Sigma$ is optimal if and only if it is $v^*$-greedy policy.

(**A version of Bellman's principle of optimality**)

### Lemma 4.1.4. 

If $e,c\in i\mathbb{R}^X$ and $P$ is monotone increasing, then $h^*$ and $v^*$ are both increasing. 

### Proposition 4.1.5. Continuation value operator is a contraction with unique fixed point

The continuation value operator $C$ is a contraction of modulus $\beta$ on $\mathbb{R}^X$ with unique fixed point $h^*\in\mathbb{R}^X$.

### The Stopping Problem

- An entrepreneur who decides to enter or exit a market
- A borrower who considers defaulting on a loan
- A firm that contemplates introducing a new technology
- A portfolio manager who decides whether to exercise a real or financial option.

They are all two-action (or binary choice) problems.

### Optimal Stopping Problem

Let $X$ be a finite set. Given $X$, an **optimal stopping problem** is a 4-tuple $\mathscr{S} = (\beta, P, c, e)$ that consists of

1. a discount factor $\beta\in(0,1)$
2. a Markov operator $P\in\mathscr{M}(\mathbb{R}^X)$
3. a **continuation reward function** $c\in\mathbb{R}^X$
4. an **exit reward function** $e\in\mathbb{R}^X$

Given a $P$-Markov chain $(X_t)_{t\ge 0}$, a decision maker observes the state $X_t$ in each period and decides whether to continue or stop.

- If she chooses to stop, she receives final reward $e(X_t)$ and the process terminates.

- If she chooses to continue, then she receives $c(X_t)$ and the process repeats next period.


### Lifetime reward

$$
\mathbb{E}\sum_{t\ge 0} \beta^t R_t
$$

- While she continues, $R_t = c(X_t)$.

- In the period she stops, $R_t = e(X_t)$, and $R_t = 0$ in every period afterwards.

### Policy function

Decisions are described by a **policy function**, which is a map

$$
\sigma:X\to\{0,1\}
$$

After observing state $x$ at any given time, the decision maker takes action $\sigma(x)$, where $0$ means to continue and $1$ means to stop.

**Implicit in this formulation is the assumption that the current state contains enough information for the agent to decide whether or not to stop.**

### $\sigma$-value function

Let $\Sigma$ be the set of functions from $X$ to $\{0,1\}$ (**Policy function space**).

Let $v_\sigma(x)$ denote the expected lifetime value of following policy $\sigma$ now and in every future period, given the optimal stopping problem $\mathscr{S}=(\beta,P,c,e)$ and current state $x\in X$.

We call $v_\sigma$ the **$\sigma$-value function**. We also call $v_\sigma(x)$ the **lifetime value of policy $\sigma$ conditional on initial state $x$**. 

A policy $\sigma^*\in\Sigma$ is called **optimal** for $\mathscr{S}$ if

$$
v_{\sigma^*}(x)=\max_{\sigma\in\Sigma}v_{\sigma}(x)
$$

for all $x\in X$.





### Lifetime values

Fix $\sigma\in\Sigma$, let's consider how to compute the lifetime value $v_\sigma(x)$ of following $\sigma$ conditional on $X_0=x$.

$v_\sigma$ satisfies:

$$
v_\sigma(x)=\sigma(x)e(x) + (1-\sigma(x))\left[c(x)+\beta\sum_{x'\in X} v_\sigma(x')P(x,x')\right] \tag{1}
$$

for all $x\in X$.

**The value of continuing is the current reward plus the discounted expected reward obtained by continuing with policy $\sigma$ next period**.

### Solve (1)

Define $r_\sigma\in\mathbb{R}^X$ and $L_\sigma \in\mathscr{L}(\mathbb{R}^X)$ via

$$
r_\sigma(x):= \sigma(x)e(x)+(1-\sigma(x)) c(x)
$$

$$
L_\sigma(x,x') = \beta(1-\sigma(x))P(x,x')
$$

With this notation, we have,

$$
v_\sigma = r_\sigma + L_\sigma v_\sigma
$$

Since $L_\sigma$ is nonnegative with row sums at most $\beta$, we have $\rho(L_\sigma)\le\beta<1$, so $v_\sigma$ is **uniquely defined by**

$$
v_\sigma = (I-L_\sigma)^{-1} r_\sigma
$$
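
As a quick numerical check of the formula above, here is a minimal sketch that builds $r_\sigma$ and $L_\sigma$ and solves the linear system. The three-state primitives (`P`, `c`, `e`, `sigma`) are made-up illustrative values, not taken from the chapter.

```python
import numpy as np

# Hypothetical 3-state example; all numbers are illustrative assumptions.
beta = 0.9
P = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])
c = np.array([0.0, 1.0, 2.0])   # continuation reward
e = np.array([5.0, 5.0, 5.0])   # exit reward
sigma = np.array([1, 0, 0])     # stop in state 0, continue elsewhere

r_sigma = sigma * e + (1 - sigma) * c
# Row x of L_sigma is zero wherever sigma(x) = 1 (the process has stopped).
L_sigma = beta * (1 - sigma)[:, None] * P

# Solve (I - L_sigma) v = r_sigma instead of forming the inverse explicitly.
v_sigma = np.linalg.solve(np.eye(len(e)) - L_sigma, r_sigma)
print(v_sigma)
```

In states where $\sigma(x)=1$ the row of $L_\sigma$ vanishes, so $v_\sigma(x)=e(x)$ there, as expected.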

### Policy Operator

**It is helpful to view $v_\sigma$ as the fixed point of an operator**.

We associate each $\sigma\in \Sigma$ with a **policy operator $T_\sigma$** defined at $v\in \mathbb{R}^X$ by

$$
(T_\sigma v)(x) = \sigma(x)e(x)+(1-\sigma(x))\left[c(x)+\beta\sum_{x'\in X}v(x') P(x,x')\right]
$$

for each $x\in X$. 

$v_\sigma$ is the fixed point of $T_\sigma$, i.e., 

$$
T_\sigma v_\sigma =v_\sigma
$$

Because $T_\sigma$ is a contraction, $v_\sigma$ is the unique fixed point of $T_\sigma$ in $\mathbb{R}^X$, and iterating $T_\sigma$ from any starting point in $\mathbb{R}^X$ converges to $v_\sigma$.
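
To see the convergence claim in action, the sketch below iterates $T_\sigma$ from zero and compares the result with the direct linear solve. The function name `T_sigma` and the three-state primitives are illustrative assumptions.

```python
import numpy as np

def T_sigma(v, sigma, e, c, P, beta):
    """Policy operator: exit reward where sigma = 1, continuation payoff where sigma = 0."""
    return sigma * e + (1 - sigma) * (c + beta * P @ v)

# Hypothetical primitives (illustrative values only).
beta = 0.9
P = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])
c = np.array([0.0, 1.0, 2.0])
e = np.array([5.0, 5.0, 5.0])
sigma = np.array([1, 0, 0])

# Successive approximation from an arbitrary starting point.
v = np.zeros(3)
for _ in range(500):
    v = T_sigma(v, sigma, e, c, P, beta)

# Direct solution v_sigma = (I - L_sigma)^{-1} r_sigma for comparison.
L = beta * (1 - sigma)[:, None] * P
v_direct = np.linalg.solve(np.eye(3) - L, sigma * e + (1 - sigma) * c)
print(np.max(np.abs(v - v_direct)))
```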

### Value function

Given an optimal stopping problem $\mathscr{S}=(\beta, P, c, e)$ with $\sigma$-value functions $\{v_\sigma\}_{\sigma\in\Sigma}$, we define the **value function** $v^*$ of $\mathscr{S}$ via

$$
v^*(x) := \max_{\sigma\in\Sigma} v_\sigma(x)
$$

for all $x\in X$.

So that $v^*(x)$ is the maximal lifetime value available to an agent facing current state $x$.

We write $v^* =\bigvee_\sigma v_\sigma$



## Steps to obtain the value function

1. Formulate a **Bellman equation** for the value function of the optimal stopping problem, namely

$$
v(x) =\max\left\{e(x),c(x)+\beta\sum_{x'\in X}v(x')P(x,x')\right\} \tag{$x\in X$}
$$

2. Prove that this Bellman equation **has a unique solution in $\mathbb{R}^X$**

3. Show that this unique solution equals the value function $v^*(x)=\max_{\sigma\in\Sigma} v_\sigma(x)$

### Bellman Operator

Define the **Bellman Operator** for the optimal stopping problem $\mathscr{S}=(\beta, P,c,e)$ as

$$
(Tv)(x)=\max\left\{e(x), c(x)+\beta\sum_{x'}v(x')P(x,x')\right\}
$$

where $x\in X$, $v\in\mathbb{R}^X$.

By construction, any fixed point of $T$ solves the Bellman equation and vice versa.

Pointwise, we can express $T$ via $Tv = e\vee (c+\beta Pv)$
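
The vector form $Tv = e\vee(c+\beta Pv)$ translates almost literally into NumPy. Below is a minimal sketch of value function iteration; the function names, tolerance, and the three-state primitives are illustrative assumptions.

```python
import numpy as np

def bellman(v, e, c, P, beta):
    """Bellman operator T v = e v (c + beta P v), taken pointwise."""
    return np.maximum(e, c + beta * P @ v)

def value_function_iteration(e, c, P, beta, tol=1e-10, max_iter=10_000):
    """Iterate T from zero until successive iterates are within tol (sup norm)."""
    v = np.zeros_like(e, dtype=float)
    for _ in range(max_iter):
        v_new = bellman(v, e, c, P, beta)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v

# Hypothetical primitives (illustrative values only).
beta = 0.9
P = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])
c = np.array([0.0, 1.0, 2.0])
e = np.array([5.0, 5.0, 5.0])

v_star = value_function_iteration(e, c, P, beta)
print(v_star)
```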

**Proof strategy of Proposition 4.1.2. looks important**

We want to show that the unique fixed point of $T$, $\bar v$ is $v^*$ by showing

1. $\bar v\le v^*$
2. $v^*\le \bar v$

Part 1:

$\bar v$ induces a $\bar v$-greedy policy $\sigma$, which in turn induces a policy operator $T_\sigma$. Because $\sigma$ is $\bar v$-greedy, $T_\sigma\bar v = T\bar v=\bar v$, so $\bar v$ is a fixed point of $T_\sigma$. Since $T_\sigma$ has a unique fixed point, $\bar v =v_\sigma$. Since $v^*=\bigvee_\sigma v_\sigma$, we have $\bar v\le v^*$.

Part 2:

$T$ is globally stable with unique fixed point $\bar v$, and $T_\sigma v\le Tv$ for all $v\in\mathbb{R}^X$ and all $\sigma\in\Sigma$ by definition. Since $T$ is also order-preserving, iterating gives $T_\sigma^k v\le T^k v$ for all $k$, and taking $k\to\infty$ yields $v_\sigma\le\bar v$ for every $\sigma\in\Sigma$. Hence $v^*=\bigvee_\sigma v_\sigma \le \bar v$.
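
On a small state space the claim $\bar v = \bigvee_\sigma v_\sigma$ can be checked by brute force: enumerate every policy, compute each $v_\sigma$, and take the pointwise maximum. This is only a sanity check under made-up three-state primitives, and the helper name `v_of` is hypothetical.

```python
import itertools
import numpy as np

# Hypothetical primitives (illustrative values only).
beta = 0.9
P = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])
c = np.array([0.0, 1.0, 2.0])
e = np.array([5.0, 5.0, 5.0])
n = len(e)

def v_of(sigma):
    """Lifetime value of policy sigma via v_sigma = (I - L_sigma)^{-1} r_sigma."""
    sigma = np.array(sigma)
    r = sigma * e + (1 - sigma) * c
    L = beta * (1 - sigma)[:, None] * P
    return np.linalg.solve(np.eye(n) - L, r)

# Pointwise maximum over all 2^n policies.
policies = list(itertools.product([0, 1], repeat=n))
v_max = np.max([v_of(s) for s in policies], axis=0)

# Fixed point of the Bellman operator by successive approximation.
v_bar = np.zeros(n)
for _ in range(1000):
    v_bar = np.maximum(e, c + beta * P @ v_bar)

print(np.max(np.abs(v_bar - v_max)))
```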

### Optimal policies

For each $v\in\mathbb{R}^X$, we call $\sigma\in\Sigma$ a **$v$-greedy policy** if for all $x\in X$, we have,

$$
\sigma(x)\in \arg\max_{a\in\{0,1\}}\left\{ae(x)+(1-a)\left[c(x)+\beta\sum_{x'\in X}v(x')P(x,x')\right]\right\}
$$

**A $v$-greedy policy uses $v$ to assign values to states and then chooses to stop or continue based on the action that generates a higher payoff**.
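
A $v$-greedy policy is a one-line comparison in code. In the sketch below the function name `greedy` and the numbers are illustrative assumptions, and ties are broken in favor of stopping (any selection from the argmax would also be $v$-greedy).

```python
import numpy as np

def greedy(v, e, c, P, beta):
    """Return a v-greedy policy: 1 (stop) where e >= c + beta P v, else 0 (continue)."""
    return (e >= c + beta * P @ v).astype(int)

# Hypothetical primitives (illustrative values only).
beta = 0.9
P = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])
c = np.array([0.0, 1.0, 2.0])
e = np.array([1.5, 1.5, 1.5])

# With v = 0 the comparison reduces to e >= c.
print(greedy(np.zeros(3), e, c, P, beta))  # [1 1 0]
```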

## Firm valuation with EXIT

ADD EXIT OPTION:

The firm now has the option to cease operations and sell all remaining assets.

### Optional Exit

Consider a firm with
- exogenous $Q$-Markov productivity: $(Z_t)_{t\ge 0}$, $Z$ finite.

- profit is a fixed function of productivity: $\pi_t= \pi(Z_t)$

- At the start of each period, the firm chooses to continue or exit:
  - continue: receive $\pi_t$
  - exit: receive $s>0$, the scrap value of the firm
- Discount factor at $\beta = 1/(1+r)$, where $r$ is the interest rate, $r>0$

- $\Sigma = \{0,1\}^Z$: policy function space (all functions from $Z$ to $\{0,1\}$)

- Policy operator:

$$
(T_\sigma v)(z) = \sigma(z)s+(1-\sigma(z))\left[\pi(z)+ \frac{1}{1+r}\sum_{z'\in Z} v(z') Q(z,z')\right]\tag{$z\in Z$}
$$

- $v_\sigma$ is the unique fixed point of $T_\sigma$, and $v_\sigma(z)$ represents the value of following $\sigma$ conditional on the initial state $Z_0=z$.

- **Bellman operator** $T$ is

$$
(Tv)(z) =\max\left\{s, \pi(z) +\beta\sum_{z'\in Z}v(z')Q(z,z')\right\}\tag{$z\in Z$}
$$

or 

$$
Tv = s\vee(\pi + \beta Qv)
$$

- **Stopping value**: $s$
- **Continuation value function**: 

$$
h^*(z) = \pi(z)+\beta\sum_{z'\in Z}v^*(z')Q(z,z')
$$

or 

$$
h^* = \pi+\beta Qv^*
$$


Using successive approximation, we get $v^*$ and then a $v^*$-greedy policy $\sigma^*$. **The $v^*$-greedy policy instructs the firm to exit when the continuation value is less than the scrap value**.
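
A minimal sketch of this procedure for the firm problem is given below. The productivity grid, the matrix `Q`, the profit function $\pi(z)=z$, and the scrap value are all made-up illustrative values, and ties are broken in favor of exit.

```python
import numpy as np

# Hypothetical primitives (illustrative values only).
r = 0.05
beta = 1 / (1 + r)
z_grid = np.array([-1.0, 0.0, 1.0])   # productivity grid
Q = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])       # monotone increasing Markov matrix
pi = z_grid.copy()                    # assumed profit function pi(z) = z
s = 2.0                               # scrap value

# Successive approximation with T v = s v (pi + beta Q v).
v = np.zeros(len(z_grid))
for _ in range(2000):
    v = np.maximum(s, pi + beta * Q @ v)

h_star = pi + beta * Q @ v              # continuation value function
sigma_star = (s >= h_star).astype(int)  # 1 = exit, 0 = continue

print(v, h_star, sigma_star)
```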

### EXIT vs NO EXIT

We define 

$$
w(z) = \mathbb{E}_z\sum_{t\ge 0}\beta^t \pi_t\tag{$z\in Z$}
$$

$w(z)$ is the value of the firm given $Z_0=z$ when the **firm never exits**. The option to exit cannot lower the value of the firm: $w\le v^*$.

**Intuitive proof**

Let $\sigma_0\equiv 0$ denote the policy of never exiting, so $\sigma_0\in\Sigma$ and $v_{\sigma_0}=w$. Since $v^*=\bigvee_\sigma v_\sigma$, we have $v^*\ge v_{\sigma_0}=w$.
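
The inequality $v^*\ge w$ is easy to confirm numerically, since the never-exit value solves $w=\pi+\beta Qw$. The sketch below uses the same kind of made-up three-state primitives as before. The name `option_value` for the gap $v^*-w$ is my own label, not the chapter's.

```python
import numpy as np

# Hypothetical primitives (illustrative values only).
r = 0.05
beta = 1 / (1 + r)
Q = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])
pi = np.array([-1.0, 0.0, 1.0])
s = 2.0

# Never-exit value: w = (I - beta Q)^{-1} pi.
w = np.linalg.solve(np.eye(3) - beta * Q, pi)

# Value with the exit option, by successive approximation.
v = np.full(3, s)
for _ in range(2000):
    v = np.maximum(s, pi + beta * Q @ v)

option_value = v - w   # nonnegative in every state
print(option_value)
```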

## Monotonicity

### Monotone values

Let $v^*$ be the value function of an optimal stopping problem defined by $X,P,\beta,c,e$ and define the **continuation value function** $h^*$ by

$$
h^*(x) = c(x)+\beta\sum_{x'\in X} v^*(x')P(x,x') \tag{$x\in X$}
$$

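Let $X$ be partially ordered and let $i\mathbb{R}^X$ be the set of increasing functions in $\mathbb{R}^X$.

If $e,c\in i\mathbb{R}^X$ and $P$ is monotone increasing, then $h^*,v^*\in i\mathbb{R}^X$.

Proof idea: when $P$ is monotone increasing, $P$ maps $i\mathbb{R}^X$ into itself, so $T$ does too. Since $i\mathbb{R}^X$ is closed and $T$ is a contraction, the fixed point $v^*$ lies in $i\mathbb{R}^X$, and then $h^*=c+\beta Pv^*$ is increasing as well.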

### Monotone actions

Take $X\subset \mathbb{R}$, finite, and ordered by $\le$. 

**Sufficient not necessary conditions for monotone actions**:

- If $e$ decreasing, $h^*$ increasing, this implies $\sigma^*$ decreasing.
- If $e$ increasing, $h^*$ decreasing, this implies $\sigma^*$ increasing.

For a binary function on $X\subset \mathbb{R}$, the condition that 

- **$\sigma^*$ is decreasing means that the decision maker chooses to exit when $x$ is sufficiently small**.

- **$\sigma^*$ is increasing means that the decision maker chooses to exit when $x$ is sufficiently large**.

This applies to the firm exit problem: since $Q$ is monotone increasing, low current values of $z$ predict low future values of $z$ (low expectation). So profits associated with continuing can be anticipated to be low.


**Since $X\subset \mathbb{R}$ is totally ordered, monotonicity implies that a threshold policy is optimal**. For example, when $\sigma^*$ is increasing and stopping occurs in at least one state, take $x^*$ to be the smallest $x\in X$ such that $\sigma^*(x)=1$. For such $x^*$, we have

$$
x<x^* \implies \sigma^*(x)=0, \qquad x\ge x^*\implies \sigma^*(x)=1
$$
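
Extracting the threshold from an increasing binary policy on a sorted grid can be sketched as follows. The grid and policy here are made-up illustrative values, and `x_threshold` is a hypothetical name for $x^*$.

```python
import numpy as np

x_grid = np.array([0.0, 0.5, 1.0, 1.5, 2.0])   # hypothetical ordered state space
sigma_star = np.array([0, 0, 1, 1, 1])          # hypothetical increasing policy

# Check monotonicity before reading off a threshold.
is_increasing = bool(np.all(np.diff(sigma_star) >= 0))

# np.argmax returns the first index where the maximum (here 1) is attained.
x_threshold = x_grid[np.argmax(sigma_star)] if sigma_star.any() else None

print(is_increasing, x_threshold)  # True 1.0
```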

## Continuation values

While all relevant state components must be included in the value function, purely transitory components do not affect continuation values. Hence, the continuation value approach is at least as efficient and sometimes substantially more so.

### The continuation value operator

Let $h^*$ be the continuation value function for the optimal stopping problem. To compute $h^*$ directly we begin with the optimal stopping version of the Bellman equation evaluated at $v^*$ and written as

$$
v^*(x') = \max\{e(x'),h^*(x')\}\tag{$x'\in X$}
$$

By the formula of continuation value function, we have,

\begin{align*}
h^*(x) &= c(x)+\beta\sum_{x'\in X}v^*(x')P(x,x')\\
&=c(x)+\beta\sum_{x'\in X} \max\{e(x'),h^*(x')\}P(x,x')\tag{$x\in X$}
\end{align*}

This expression motivates us to introduce a **continuation value operator** $C:\mathbb{R}^X\to \mathbb{R}^X$ via

$$
(Ch)(x) = c(x)+\beta\sum_{x'\in X}\max\{e(x'), h(x')\}P(x,x') \tag{$x\in X$}
$$

or

$$
Ch = c+\beta P(e\vee h)
$$

The continuation value operator is a contraction with a unique fixed point $h^*\in\mathbb{R}^X$. 

This implies we have the following **algorithm to compute optimal policy:**

1. Use successive approximation to find $h^*$
2. Calculate $\sigma^*(x)=\mathbb{1}\{e(x)\ge h^*(x)\}$
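
The two steps above can be sketched as follows. The function name `C_op` and the three-state primitives are illustrative assumptions. The final loop recomputes $v^*$ by value function iteration only to confirm that both routes agree.

```python
import numpy as np

def C_op(h, e, c, P, beta):
    """Continuation value operator C h = c + beta P (e v h)."""
    return c + beta * P @ np.maximum(e, h)

# Hypothetical primitives (illustrative values only).
beta = 0.9
P = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])
c = np.array([0.0, 1.0, 2.0])
e = np.array([12.0, 12.0, 12.0])

# Step 1: successive approximation on C.
h = np.zeros(3)
for _ in range(1000):
    h = C_op(h, e, c, P, beta)

# Step 2: stop exactly where the exit reward weakly exceeds h*.
sigma_star = (e >= h).astype(int)

# Cross-check against value function iteration.
v = np.zeros(3)
for _ in range(1000):
    v = np.maximum(e, c + beta * P @ v)

print(h, sigma_star)
```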

### Dimensionality reduction

Continuation value iteration can substantially reduce the dimensionality of the problem in some cases. 

Let $W,Z$ be two finite sets. Suppose that $(W_t)$ is IID with common distribution $\varphi\in\mathcal{D}(W)$ and that $(Z_t)$ is a $Q$-Markov chain on $Z$ for some $Q\in \mathscr{M}(\mathbb{R}^Z)$.

**If $(W_t)$ and $(Z_t)$ are independent, then $(X_t)$ defined by $X_t = (W_t, Z_t)$ is $P$-Markov on $X:=W\times Z$**, where

$$
P(x,x') = P((w,z),(w',z')) = \varphi(w')Q(z,z')
$$

Suppose that the continuation reward depends only on $z$ so that we can write the Bellman operator as

$$
(Tv)(w,z)=\max\left\{e(w,z), c(z)+\beta\sum_{w'\in W}\sum_{z'\in Z} v(w',z') \varphi(w') Q(z,z')\right\}
$$

Since the right-hand side depends on both $w$ and $z$, the Bellman operator acts on an $n$-dimensional space, where $n:=|X|=|W|\times |Z|$.

However, the continuation value function depends only on $z$: the continuation reward $c(z)$ does not involve $w$, and dependence on $w$ vanishes from the expectation because $w$ does not help predict $(w',z')$.

Thus, the continuation value function is an object in $|Z|$-dimensional space.

The continuation value operator is

$$
(Ch)(z) = c(z)+\beta\sum_{w'\in W}\sum_{z'\in Z} \max\{e(w',z'), h(z')\}\varphi(w')Q(z,z')
$$
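
The sketch below illustrates the reduction: $h$ is stored as a vector of length $|Z|$, while the comparison run of value function iteration needs a $|W|\times|Z|$ array. All primitives (`phi`, `Q`, `c`, `e`) are made-up illustrative values, and `e[w, z]` is my own array layout.

```python
import numpy as np

# Hypothetical primitives (illustrative values only).
beta = 0.9
phi = np.array([0.3, 0.7])            # distribution of w' on W, |W| = 2
Q = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])       # Markov matrix on Z, |Z| = 3
c = np.array([0.0, 1.0, 2.0])         # continuation reward c(z)
e = np.array([[4.0, 5.0, 6.0],
              [6.0, 7.0, 8.0]])       # exit reward, indexed e[w, z]

def C_op(h):
    """(C h)(z) = c(z) + beta * sum_{w', z'} max{e(w', z'), h(z')} phi(w') Q(z, z')."""
    best = np.maximum(e, h[None, :])   # shape (|W|, |Z|)
    return c + beta * Q @ (phi @ best) # phi @ best integrates out w'

# Iterate in |Z| dimensions.
h = np.zeros(3)
for _ in range(1000):
    h = C_op(h)
sigma_star = (e >= h[None, :]).astype(int)   # policy on the full state (w, z)

# Comparison: value function iteration in |W| x |Z| dimensions.
v = np.zeros((2, 3))
for _ in range(1000):
    v = np.maximum(e, (c + beta * Q @ (phi @ v))[None, :])

print(h.shape, v.shape)
```

The optimal policy still depends on $(w,z)$ through $e(w,z)$, but the object being iterated on does not.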

## Application: American Call Option

American call option: provides the right to buy a particular stock or bond at a fixed **strike price** $K$ at any time before a set expiration date. The market price of the asset at time $t$ is denoted by $S_t$.

We are interested in **computing the expected value of holding the option when discounting with a fixed interest rate**

Finite horizon American options can be priced by 
- backward induction
- embedding finite horizon options into the theory of infinite horizon optimal stopping.

We take
- $T\in\mathbb{N}$ be the expiration date. 
- Option purchased at $t=0$ and can be exercised at any $t\le T$.
- We set:
   - $\top:=\{1, \cdots, T+1\}$, $m(t):=\min\{t+1, T+1\}$ for all $t\in \top$
   - (Time is updated via $t'=m(t)$, so that time increments at each update until $t=T+1$. After that we hold $t$ constant. Bounding time at $T+1$ keeps the state space finite.)

- Stock price $S_t$ evolves according to
   - $S_t = W_t + Z_t$
   - $(W_t)_{t\ge 0} \sim_{IID} \varphi$ for some $\varphi\in\mathcal{D}(W)$
   - $(Z_t)_{t\ge 0}$ is $Q$-Markov for some $Q\in\mathscr{M}(\mathbb{R}^Z)$
   - This implies the share price has both persistent and transient stochastic components
   
- State space $X:= \top \times W\times Z$

- $(X_t)$ is a $P$-Markov with

$$
P((t,w,z),(t',w',z')):=\mathbb{1}\{t'=m(t)\}\varphi(w')Q(z,z')
$$
   - time updates deterministically via $t'=m(t)$
   - $z',w'$ are drawn independently from $Q(z,\cdot),\varphi$
- Continuation reward is zero
- Discount factor $\beta = 1/(1+r), r>0$ is fixed risk-free rate.

- Exit(Exercising) reward: $\mathbb{1}\{t\le T\}(S_t-K)$ so that exercising at time $t$ earns the owner $S_t-K$ up to expiration date and zero afterwards.

   - In terms of the state $(t,w,z)$, the exit reward is
   $$
   e(t,w,z):= \mathbb{1}\{t\le T\}[z+w-K]
   $$

**Bellman equation**

$$
v(t,w,z)=\max\{e(t,w,z), \beta\sum_{w'\in W}\sum_{z'\in Z}v(t', w',z')\varphi(w') Q(z,z')\}
$$

where $t'=m(t)$.

Interpretation:

The value of the option is the maximum of current exercise value and the discounted expected value of carrying the option over the next period.

This is an optimal stopping problem. Hence, iterating the Bellman operator converges to the unique fixed point $v^*$, and a policy is optimal if and only if it is $v^*$-greedy (Bellman's principle of optimality).

### Continuation value function

We can do better than value function iteration, as $(W_t)\sim_{IID}\varphi$. We define the continuation value operator as

$$
(Ch)(t,z)=\beta\sum_{z'\in Z}\sum_{w'\in W}\max\{e(t',w',z'), h(t',z')\}\varphi(w')Q(z,z'), \qquad t'=m(t)
$$

As proved before, the continuation value operator is a contraction, here with a unique fixed point $h^*\in\mathbb{R}^{\top\times Z}$.

After obtaining $h^*$, we can compute the optimal policy as

$$
\sigma^*(t,w,z) =\mathbb{1}\{e(t,w,z)\ge h^*(t,z)\}
$$

where $\sigma^*(t,w,z)=1$ means exercising the option.
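
A minimal sketch of continuation value iteration for the option problem follows. The expiration date `T_exp`, strike, rate, grids, and distributions are all made-up illustrative values. Time $t\in\{1,\dots,T+1\}$ is stored at array index $t-1$, and `m_idx` is a hypothetical name for the index version of $m(t)$.

```python
import numpy as np

# Hypothetical primitives (illustrative values only).
T_exp, K, r = 5, 1.0, 0.05
beta = 1 / (1 + r)
w_vals = np.array([-0.1, 0.0, 0.1])
phi = np.array([0.25, 0.5, 0.25])     # distribution of the IID component
z_vals = np.array([0.8, 1.0, 1.2])
Q = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])

t_vals = np.arange(1, T_exp + 2)      # t = 1, ..., T + 1
# Exit reward e[t, w, z] = 1{t <= T} (z + w - K).
e = (t_vals <= T_exp)[:, None, None] * (
    w_vals[None, :, None] + z_vals[None, None, :] - K)
# Index of t' = m(t) = min{t + 1, T + 1} for each time index.
m_idx = np.minimum(np.arange(T_exp + 1) + 1, T_exp)

def C_op(h):
    """(C h)(t, z) = beta * sum_{z', w'} max{e(t', w', z'), h(t', z')} phi(w') Q(z, z')."""
    best = np.maximum(e, h[:, None, :])           # indexed [t', w', z']
    inner = np.einsum('w,twz->tz', phi, best)     # integrate out w'
    cont = beta * inner @ Q.T                     # cont[t', z] sums over z'
    return cont[m_idx]                            # evaluate at t' = m(t)

h = np.zeros((T_exp + 1, len(z_vals)))
for _ in range(200):
    h = C_op(h)

sigma_star = (e >= h[:, None, :]).astype(int)     # 1 = exercise
print(h)
```

Here $h$ has $(T+1)\times|Z|$ entries, while the value function would need $(T+1)\times|W|\times|Z|$.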

## Application: Research and Development