## Definitions

**Definition (State)**

The state variable $X_t$ is a vector listing current values of variables relevant to choosing the current action.

**Definition (Value function)**

The value function $v_t(w)$ tracks the optimal lifetime payoff from a given state at a given time. It depends only on the parameters (think about the *indirect utility function*).

**Definition (Bellman equation)**

**Definition (Bellman's Principle of Optimality)**

**Definition (Globally Stable)**

A self-map $T$ is **globally stable** on $U\subset \mathbb{R}^n$ if 
- $T$ has a unique fixed point $u^*\in U$
- $T^ku\to u^*$ as $k\to\infty$ for all $u\in U$
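As a minimal sketch of global stability, the snippet below iterates a hypothetical affine map $Tu = 0.5u + 1$ on $U=\mathbb{R}$ (chosen purely for illustration, not taken from the notes) from several starting points. Its unique fixed point is $u^*=2$, and every trajectory should approach it.

```python
def T(u):
    """Hypothetical affine self-map on the real line with fixed point u* = 2."""
    return 0.5 * u + 1.0


def iterate(T, u0, k):
    """Apply the map T to u0 a total of k times and return T^k u0."""
    u = u0
    for _ in range(k):
        u = T(u)
    return u


# Every starting point is pulled toward the same fixed point
for u0 in (-10.0, 0.0, 25.0):
    print(u0, iterate(T, u0, 60))
```

The distance to the fixed point halves at every step here, which is why a few dozen iterations are enough.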

**Definition (Invariant)**

Let $T$ be a self-map on $U\subset\mathbb{R}^n$. $C\subset U$ is **invariant** if 

$$\forall u\in C, Tu\in C$$

($T$ is a self-map on $C$ also.)



## Theorem

**Theorem**
The value function $v^*$ is the solution of the Bellman equation.

**Neumann Series Lemma**
Let $\rho(A)$ be the spectral radius of the matrix $A$. If $\rho(A)<1$, then

- $I-A$ is nonsingular
- $\sum_{k\ge 0} A^k$ converges to $(I-A)^{-1}$
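The lemma can be checked numerically. The sketch below uses a hypothetical $2\times 2$ matrix (an assumption made for illustration), computes its spectral radius, and compares a partial sum of the series with the inverse of $I-A$. The helper name `neumann_partial_sum` is introduced here and is not part of the notes.

```python
import numpy as np

# Hypothetical matrix chosen for illustration; its spectral radius is below one
A = np.array([[0.4, 0.1],
              [0.7, 0.2]])

# Spectral radius: the largest eigenvalue in absolute value
rho = np.max(np.abs(np.linalg.eigvals(A)))


def neumann_partial_sum(A, K):
    """Return the partial sum I + A + A^2 + ... + A^K."""
    total = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for _ in range(K):
        term = term @ A        # term now holds the next power of A
        total = total + term
    return total


inverse = np.linalg.inv(np.eye(2) - A)
approx = neumann_partial_sum(A, 200)
print(rho < 1, np.allclose(approx, inverse))
```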

**Lemma**
For any square matrix $B$ and any matrix norm $\|\cdot\|$, we have

- $\rho(B)^k \le \|B^k\|$, $\forall k\in\mathbb{N}$

- $\rho(B) = \lim_{k\to\infty} \|B^k\|^{1/k}$  (Gelfand's Formula)
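Gelfand's formula can also be explored numerically. The sketch below assumes a hypothetical upper-triangular matrix with $\rho(B)=0.5$ and uses the spectral norm; both choices are illustrative. The estimate $\|B^k\|^{1/k}$ stays above $\rho(B)$, consistent with the first bullet, and drifts toward it as $k$ grows.

```python
import numpy as np

# Hypothetical non-normal matrix; both eigenvalues equal 0.5
B = np.array([[0.5, 1.0],
              [0.0, 0.5]])

rho = np.max(np.abs(np.linalg.eigvals(B)))  # spectral radius


def gelfand_estimate(B, k):
    """Return ||B^k||^(1/k), measured in the spectral (2-) norm."""
    return np.linalg.norm(np.linalg.matrix_power(B, k), ord=2) ** (1.0 / k)


for k in (1, 10, 50, 200):
    print(k, gelfand_estimate(B, k))
```

Convergence is slow for this matrix because $\|B\|$ is well above $\rho(B)$, which is a useful reminder that a single norm can be a poor proxy for the spectral radius.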

**Lemma**
If $\exists \overline u \in U, m\in\mathbb{N}$, s.t. $T^k u =\overline u$, $\forall u\in U$ and $k\ge m$, then $\overline u$ is the unique fixed point of $T$ in $U$.

## Structure of a typical dynamic program

The objective is to maximize the expected present value (**EPV**) of lifetime rewards. In each period $t<T$, we

- observe the current state $X_t$

- choose an action $A_t$

- receive a reward $R_t(X_t,A_t)$

- update $X_{t+1} = F(X_t, A_t, \xi_{t+1})$
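The four steps above can be sketched as a simulation loop. This is only an illustration under stated assumptions: the function `simulate`, its arguments `policy`, `reward` and `transition`, the standard normal shock, and discounting by `β**t` are all hypothetical choices, not something the notes prescribe.

```python
import numpy as np


def simulate(policy, reward, transition, x0, T, β, seed=0):
    """Run one path of a generic dynamic program and return its discounted reward."""
    rng = np.random.default_rng(seed)
    x, total = x0, 0.0
    for t in range(T):
        a = policy(t, x)                 # choose an action given the observed state
        total += β**t * reward(x, a)     # collect the discounted reward
        ξ = rng.standard_normal()        # draw the next shock (assumed standard normal)
        x = transition(x, a, ξ)          # update the state
    return total


# Toy usage: the state is a random walk and the reward is the state itself
total = simulate(
    policy=lambda t, x: 0,               # always take action 0
    reward=lambda x, a: x,               # reward equals the current state
    transition=lambda x, a, ξ: x + ξ,    # random-walk state update
    x0=1.0, T=5, β=0.96,
)
print(total)
```

Averaging `simulate` over many seeds would approximate the EPV of a given policy; finding the best policy is what the Bellman equation is for.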

## Finite-Horizon Job Search Problem

This section uses a finite-horizon job search problem to illustrate finite-period DP. In particular

- the agent begins unemployed at time $t=0$

- the agent receives a new job offer paying wage $W_t$ in each period $t = 0,1, 2,\ldots, T$

- Two choices and corresponding rewards
  - *accept* $\implies$ work *permanently* at the wage offered at the time of acceptance
  - *reject* $\implies$ receive constant unemployment compensation $c$ for the current period

The state variable is the wage offer $W_t$, drawn iid from a known distribution $\varphi\in \mathcal{D}(W)$ on a finite set $W\subset \mathbb{R}_{+}$. The action $A_t$ is to reject ($A_{t}=0$) or accept ($A_{t} = 1$) the offer.


We can represent this problem using Bellman equations for each period $t = 0,1,2,\ldots, T$, i.e.,

$$
\begin{align}
v_t(w_t) &= \max\left\{\text{stopping value, continuation value}\right\}\\
&=\max\left\{\sum_{\tau = 0}^{T-t} \beta^\tau w_t, c + \beta \sum_{w'\in W} v_{t+1}(w')\varphi(w')\right\}
\end{align}
$$

We can solve for all $v_t$ by backward induction to calculate the reservation wage at each period. This solves the problem of whether to accept or reject the offer.
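To make the backward induction concrete, the following derivation spells out the terminal condition and the reservation wage. It assumes nothing is earned after period $T$; the symbols $h_t$ (continuation value at $t$) and $\bar w_t$ (reservation wage at $t$) are introduced here for illustration.

$$
\begin{align}
v_T(w) &= \max\{w,\, c\}\\
h_t &:= c + \beta \sum_{w'\in W} v_{t+1}(w')\varphi(w') \qquad (t<T)\\
\text{accept at } t &\iff \sum_{\tau = 0}^{T-t} \beta^\tau w \ge h_t \iff w \ge \bar w_t := \frac{(1-\beta)\,h_t}{1-\beta^{T-t+1}}
\end{align}
$$

The last step uses the geometric sum $\sum_{\tau=0}^{T-t}\beta^\tau = \frac{1-\beta^{T-t+1}}{1-\beta}$, so each $\bar w_t$ follows directly once $v_{t+1}$ is known.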

### Code of Finite-Horizon Job Search Problem (T period)

First, we start with importing `numpy` for numerical operations and `namedtuple` to store the model parameters.

In [1]:
import numpy as np
from collections import namedtuple

A `namedtuple` is a convenient way to define a lightweight class. This data structure lets us create tuple-like objects whose fields are accessible by attribute lookup, while remaining indexable and iterable.
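A short sketch of these access patterns, using a hypothetical `Point` type that is unrelated to the job search model:

```python
from collections import namedtuple

# Hypothetical two-field type used only to demonstrate namedtuple behavior
Point = namedtuple("Point", ("x", "y"))
p = Point(x=1.0, y=2.0)

print(p.x)           # attribute lookup
print(p[1])          # positional indexing
for value in p:      # iteration over the fields in order
    print(value)

x, y = p             # unpacking works as for any tuple
q = p._replace(x=3.0)  # namedtuples are immutable, so this returns a new object
print(q)
```

Immutability is convenient for model parameters, since a function cannot silently change them.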

In this model, we want to use the `namedtuple` to store values of the model, hence, we name it as `Model`. It requires the following parameters:

- `c`: the unemployment compensation
- `w_vals`: $W$, the finite wage space
- `n`: the cardinality of the wage space (the code below assumes a uniform distribution over the wage space, which simplifies things by avoiding extra parameters for the distribution)
- `β`: the discount factor
- `T`: the final period

In [4]:
Model = namedtuple("Model", ("c", "w_vals", "n", "β","T"))

Then we use a function to input specific values into the `namedtuple`, i.e.,

In [35]:
def create_job_search_model(
    n = 50,          # wage grid size
    w_min = 11,      # lowest wage
    w_max = 60,      # highest wage
    c = 10,          # unemployment compensation
    β = 0.96,        # discount factor
    T = 10           # final period; offers arrive at t = 0, 1, ..., T
):
    """
    Build a Model namedtuple from the given parameters, using the defaults above when none are supplied.
    """
    w_vals = np.linspace(w_min, w_max, n)  # n evenly spaced wage values on [w_min, w_max]

    return Model(c = c, w_vals = w_vals, n = n, β = β, T = T)  # bundle the parameters into the namedtuple

Now we define a function that obtains the continuation values and reservation wages by backward induction.

In [41]:
def reservation_wage(model):
    c, w_vals, n, β, T = model.c, model.w_vals, model.n, model.β, model.T  # Unpack the model parameters
    H = np.zeros(T+1)      # Continuation value at each period
    R = np.zeros(T+1)      # Reservation wage at each period
    S = np.zeros((T+1,n))  # Value function: one row per period, one column per wage
    H[T] = c               # Rejecting in the last period pays only the unemployment compensation
    R[T] = c               # So the last-period reservation wage is the unemployment compensation
    S[T,:] = np.maximum(c, w_vals)  # At period T, compare the unemployment compensation with each wage
    for t in range(1, T+1):
        H[T-t] = c + β * np.mean(S[T-t+1,:])  # Under the uniform distribution, the expectation is the mean
        df = np.geomspace(1, β**t, t+1)  # Discount factors 1, β, ..., β^t
        dfs = np.sum(df)                 # Present value of one unit of wage paid from period T-t through T
        R[T-t] = H[T-t]/dfs              # Wage at which stopping and continuing at time T-t are equally good
        S[T-t,:] = np.maximum(dfs * w_vals, H[T-t])  # Larger of the stopping value and the continuation value
    return R


This function generates the reservation wage sequence by backward induction. Note that the reservation wage at time $T-t$ uses the continuation value of the same period, `H[T-t]`. We can show the result by creating the model and passing it to this function, rounding to two decimals for display.

In [42]:
model = create_job_search_model()
print(np.round(reservation_wage(model), 2))

[40.11 39.4  38.57 37.6  36.44 35.02 33.23 30.87 27.59 22.49 10.  ]

The key idea is to break down this multi-stage decision problem into a two-stage decision problem. We obtain the value functions by comparing the continuation value and the stopping value. 

## Infinite-Horizon Job Search Problem

The above example motivates the infinite-horizon job search problem. We let,

- $v^*(w)$ denote the maximum lifetime EPV for the wage offer $w$.

In the infinite horizon, we have

$$\text{Stopping value} = \frac{w}{1-\beta}$$

$$\text{Continuation value: }h^* = c+\beta\sum_{w'\in W}v^*(w')\varphi(w')$$

This implies the optimal choice is

$$\mathbb{1}\{\text{Stopping value}\ge \text{Continuation value}\} = \mathbb{1}\left\{\frac{w}{1-\beta}\ge h^*\right\}$$

**Key Idea**
Solve the Bellman equation to obtain the value function $v^*$. The corresponding Bellman equation is

$$
v^*(w) = \max\left\{\dfrac{w}{1-\beta}, c+\beta\sum_{w'\in W}v^*(w')\varphi(w')\right\} \,\,\,\,(w\in W)
$$
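One way to solve this equation numerically is successive approximation: start from a guess and apply the right-hand side repeatedly until the iterates stop changing. The sketch below is a hedged illustration rather than the notes' own method. The function name `solve_infinite_horizon`, the tolerance, the iteration cap, and the uniform $\varphi$ on the same wage grid as the finite-horizon example are all assumptions.

```python
import numpy as np


def solve_infinite_horizon(w_vals, φ, c=10.0, β=0.96, tol=1e-8, max_iter=10_000):
    """Iterate on the Bellman equation; return the value function and reservation wage."""
    stop = w_vals / (1 - β)              # stopping value for each wage offer
    v = stop.copy()                      # initial guess: accept every offer
    for _ in range(max_iter):
        h = c + β * np.sum(v * φ)        # continuation value implied by the current guess
        v_new = np.maximum(stop, h)      # apply the right-hand side of the Bellman equation
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:
            break
    h_star = c + β * np.sum(v * φ)
    return v, (1 - β) * h_star           # accept iff w / (1 - β) >= h*, i.e. w >= (1 - β) h*


w_vals = np.linspace(11, 60, 50)         # same wage grid as the finite-horizon example
φ = np.full(50, 1 / 50)                  # assumed uniform distribution over the grid
v_star, w_bar = solve_infinite_horizon(w_vals, φ)
print(w_bar)
```

Because the right-hand side shrinks distances between candidate value functions by a factor of $\beta$ in the sup norm, the iteration converges from any starting guess, which ties this section back to the definition of global stability.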