<a id = "chap:markov"></a>
# 3 Markov Processes

We call a random vector $X_t$  the *state* because it completely describes the position  of a dynamic system at
time $t$ from the perspective of a model builder or an econometrician.  
 We construct  a consistent  sequence of probability distributions  $Pr_\ell$  for a sequence of
random vectors 

$$
X^{[\ell]}  \doteq \begin{bmatrix}  X_0  \\ X_1  \\ \vdots \\ X_\ell  \end{bmatrix}
$$ 

for all nonnegative integers $\ell $ by specifying the following two  elementary components of a *Markov process*:  
(i)  a probability
distribution for $X_0$, and (ii) a time-invariant distribution for $X_{t+1}$ conditional on $X_t$ for $t \geq 0$.
All other probabilities are functions of these  two distributions.  By creatively defining the state vector $X_t$, a Markov specification
includes many models used in applied research.

## 3.1 Constituents
Assume a state space $\mathcal{X}$ and a transition distribution
$P(dx^*|x)$. For example, $\mathcal{X}$ could be $\mathbb{R}^n$ or a subset of $\mathbb{R}^n$.
The transition distribution $P$ is a conditional probability measure for each $X_t = x$ in the state space,
    so it satisfies $\int_{\{x^* \in \mathcal{X} \}}P(dx^* | x) = 1$ for every $x$ in the state space.
If in addition we specify a marginal distribution $Q_0$ for the initial state $x_0$ over $\mathcal{X}$, then we have completely specified all joint distributions for the
stochastic process $\{X_t, t = 0, 1, \ldots\}$.

The notation $P(dx^*|x)$ denotes a conditional probability measure; integration is over $x^*$ and conditioning is captured by $x$.
Thus, $x^*$ is a possible realization of next period's state and $x$ is a realization of this period's state.
The conditional probability measure $P(dx^* |x)$ assigns conditional probabilities to next period's state given that this period's state is $x$.
Often, but not always, the conditional distributions have densities against a common distribution $\lambda(dx^*)$ to be used to integrate over states.
That lets us use a *transition density* to represent the conditional
probability measure.

<a id = "ex:ex01"></a>
**Example 3.1.1** A first-order vector autoregression is a Markov Process. Here $Q_0(x)$ is a normal distribution with mean $\mu_0$ and covariance matrix $\Sigma_0$
and $P(dx^*|x)$ is a normal distribution with mean $Ax$ and covariance matrix $BB'$ for a square matrix $A$
and a matrix $B$ with full column rank.[^1] These assumptions imply the vector autoregressive (VAR) representation

$$
X_{t+1} = A X_t + B W_{t+1} ,
$$

for $t \geq 0$, where $W_{t+1}$ is a multivariate standard normally distributed random vector that is independent of $X_t$.

[^1]: When $BB'$ is singular, a density may not exist with respect to Lebesgue measure. The covariance matrix $B B'$ is typically singular for a first-order vector autoregression constructed by rewriting a higher-order vector autoregression.
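
The VAR representation in Example 3.1.1 can be simulated directly. The following is a minimal sketch, not a definitive implementation: the function name `simulate_var1` and the numerical values of $A$, $B$, $\mu_0$, and $\Sigma_0$ are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_var1(A, B, mu0, Sigma0, T, seed=0):
    """Simulate X_{t+1} = A X_t + B W_{t+1} with X_0 ~ N(mu0, Sigma0)."""
    rng = np.random.default_rng(seed)
    n, k = B.shape
    X = np.empty((T + 1, n))
    X[0] = rng.multivariate_normal(mu0, Sigma0)   # draw the initial state from Q_0
    for t in range(T):
        W = rng.standard_normal(k)                # W_{t+1} ~ N(0, I), independent of X_t
        X[t + 1] = A @ X[t] + B @ W
    return X

# Hypothetical parameter values, chosen only for illustration.
A = np.array([[0.9, 0.1],
              [0.0, 0.5]])
B = np.array([[1.0],
              [0.5]])          # full column rank, so the 2 x 2 matrix BB' is singular
path = simulate_var1(A, B, mu0=np.zeros(2), Sigma0=np.eye(2), T=200)
```

Each row of `path` is one realization of the state, so `path[t]` plays the role of $X_t$.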

<a id = "ex:ex02"></a>
**Example 3.1.2** A discrete-state Markov chain consists of a $Q_0$ represented as a row vector and a transition probability $P(dx^*|x)$ represented as a matrix with one row and one column for each possible value of the state $x$. Rows contain vectors of probabilities of next period's state conditioned on a realized value of this period's state.
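
As a small illustration of Example 3.1.2, the sketch below propagates a row-vector initial distribution through a transition matrix. The two-state matrix `P`, the vector `Q0`, and the helper `marginal` are assumptions made only for this example.

```python
import numpy as np

# Hypothetical two-state chain; each row of P is a conditional distribution.
Q0 = np.array([1.0, 0.0])          # initial distribution as a row vector
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

def marginal(Q0, P, t):
    """Distribution of X_t, computed as the row vector Q0 P^t."""
    return Q0 @ np.linalg.matrix_power(P, t)

Q1 = marginal(Q0, P, 1)   # first row of P, because Q0 puts all mass on the first state
Q2 = marginal(Q0, P, 2)
```

The initial distribution and the transition matrix are the only inputs, in line with the claim that these two objects determine all other probabilities.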

It is useful to construct an operator by applying a one-step conditional expectation operator to functions of a Markov state. Let $f:{\mathcal X} \rightarrow {\mathbb R}$. For bounded $f$, define:

$$\tag{1}
{\mathbb T} f (x) = E \left[ f(X_{t+1}) | X_t = x \right] = \int_{\{x^* \in {\mathcal X}\}}  f(x^*) P(d x^*|x).
$$

The Law of Iterated Expectations justifies iterating on ${\mathbb T}$ to form conditional expectations of the function $f$ of the Markov state over longer horizons:

$$
{\mathbb T}^j f(x) = E  \left[ f(X_{t+j}) | X_t = x \right].
$$

We can use the operator ${\mathbb T}$ to characterize a Markov process. Indeed, by applying ${\mathbb T}$ to a suitable range of test functions $f$, we can construct a conditional probability measure.

<a id = "eqn:Toperatordef"></a>
**Fact 3.1.3.** Start with a conditional expectation operator ${\mathbb T}$ that maps a space of bounded functions into itself. We can use ${\mathbb T}$ to construct a conditional probability measure $P(dx^*|x)$ provided that ${\mathbb T}$ (a) is well defined on the space of bounded functions, (b) preserves the bound, (c) maps nonnegative functions into nonnegative functions, and (d) maps the unit function into the unit function.

<a id = "sec:MarkErgodic"></a>
## 3.2 Stationarity

We can construct a stationary Markov process by carefully choosing the distribution of the initial state $X_0$.

<a id = "def:stationdist"></a>
**Definition 3.2.1** (Stationarity) A probability measure $Q$ over a state space ${\mathcal X}$ for a Markov process with transition probability $P$ is a **stationary distribution** if it satisfies 

$$
\int_{ \{ x \in {\mathcal X} \}} P(d x^*|x) Q(dx)  = Q(d x^*)  .
$$

We will sometimes refer to a stationary density $q$. A density is always relative to a measure. With this in mind, let $\lambda$ be a measure used to integrate over possible Markov states on the state space ${\mathcal X}$. Then a density $q$ is a nonnegative (Borel measurable) function of the state for which $\int q(x) \lambda(dx) = 1$.

<a id = "def:stationdist"></a>
**Definition 3.2.2** A **stationary density** over a state space $\mathcal{X}$ for a Markov process with transition probability $P$ is a probability density $q$ with respect to a measure $\lambda$ over the state space $\mathcal{X}$ that satisfies 

$$
\int P(d x^*|x) q(x) \lambda(dx) = q(x^*) \lambda(dx^*).
$$ 


<!-- <a id = "eqn:reversiblenew"></a>
**Definition 3.2.2** (Reversible)  
A Markov process with stationary density $q$ and transition density $P(dx|x^*)$ is said to be **reversible** if   

$$
P(dx^*|x)q(x) \lambda(d x) = P(dx|x^*)q(x^*)  \lambda(d x^*). \tag{1}
$$ -->

Various  sufficient  conditions   imply the existence of a stationary distribution. Given a transition distribution $P$, one such condition that is widely used to justify some calculations from  numerical simulations is that the Markov process be *time reversible*, which means that 

$$P(dx^*|x) Q(dx) = P(dx|x^*) Q(dx^*) \tag{eqn:reversiblenew}$$

for some probability distribution $Q$ on $\mathcal{X}$. Because a  transition distribution satisfies $\int_{\{ x \in \mathcal{X}\}}  P(dx|x^*) =1 $,

$$
\int_{\{ x \in \mathcal{X}\}} P(dx^*|x) Q(dx)  = \int_{\{ x \in \mathcal{X}\}} P(dx|x^*) Q(dx^*)  = Q(dx^*) ,
$$

so $Q$ is a stationary distribution by Definition 3.2.1. Restriction [\eqref{eqn:reversiblenew}](#def:reversible) implies that the process is time reversible in the sense that forward and backward transition distributions coincide. Time reversibility is special, so later we will explore other sufficient conditions for the existence of stationary distributions.[^2]

[^2]: Numerical Bayesian statistical analysis often computes a posterior probability distribution by iterating to convergence a reversible Markov process whose stationary distribution is that posterior distribution.
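
For a finite-state chain, the reversibility restriction says that $Q_i P_{ij} = Q_j P_{ji}$ for all pairs of states, and summing over $i$ gives stationarity. The sketch below checks both properties numerically; the three-state transition matrix `P` and the candidate distribution `Q` are hypothetical.

```python
import numpy as np

# Hypothetical three-state chain that moves only between neighboring states.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
Q = np.array([0.25, 0.50, 0.25])

# flow[i, j] is the joint probability of (X_t = i, X_{t+1} = j) when X_t ~ Q.
flow = Q[:, None] * P

reversible = np.allclose(flow, flow.T)   # Q_i P_ij == Q_j P_ji for all i, j
stationary = np.allclose(Q @ P, Q)       # Q P == Q
```

Here the joint distribution of adjacent states is symmetric, so the chain looks the same run forward or backward.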

<a id = "chap:process"></a>
<a id = "sec:stochprocessconstructionI"></a>
**Remark 3.2.3** When a Markov process starts at a stationary distribution, we can construct the process $\{ X_t : t=1,2,...\}$ with a measure-preserving transformation ${\mathbb S}$ of the type featured in chapter [3](#chap:process), section [1](#sec:stochprocessconstructionI).

Given a stationary distribution $Q$, form the space of functions ${\mathcal L}^2$

$$
{\mathcal L}^2 = \{ f:{\mathcal X} \rightarrow {\mathbb R} : \int f(x)^2 Q(dx) < \infty \} .
$$

It can be shown that ${\mathbb T} : {\mathcal L}^2 \rightarrow {\mathcal L}^2$. On this space, a well-defined norm is

$$
\| f \| = \left[\int f(x)^2 Q(dx)\right]^{1/2} . 
$$ 

<a id = "sec:eigfns"></a>
## 3.3 ${\mathcal L}^2$ and Eigenfunctions

We connected ergodicity to a statistical notion of invariance in chapter 1.  

The word invariance brings to mind a generalization of eigenvectors called eigenfunctions. Eigenfunctions of a linear mapping characterize an invariant subspace of functions such that the application of a linear mapping to any element of that space remains in the same subspace. Eigenfunctions associated with a unit eigenvalue are themselves invariant under the mapping. So perhaps it is not surprising that such eigenfunctions of ${\mathbb T}$ come in handy for studying ergodicity of Markov processes.


We now study eigenfunctions of the conditional expectation operator $\mathbb{T}$.

**Definition 3.3.1** A function $\tilde f \in {\mathcal L}^2$ that solves ${\mathbb T} f = f$ is an eigenfunction of ${\mathbb T}$ associated with a unit eigenvalue.

The following proposition asserts that an eigenfunction $\tilde{f}(X_t)$ associated with a unit eigenvalue is constant as $X_t$ moves through time.

<a id = "lem:uniteigen"></a>
**Proposition 3.3.2**  Suppose that $\tilde{f}$ is an eigenfunction of $\mathbb{T}$ associated with a unit eigenvalue. Then $\{ \tilde{f}(X_t) : t=0,1,...\}$ is constant over time with probability one.

**Proof.** 

$$
E [\tilde{f}(X_{t+1}) \tilde{f}(X_t)] = \int (\mathbb{T}\tilde{f})(x) \tilde{f}(x) Q(dx) = \int \tilde{f}(x)^2 Q(dx) =
E [\tilde{f}(X_t)^2]
$$

where the first equality follows from the Law of Iterated Expectations. 
Then because $Q$ is a stationary distribution,

$$
\begin{aligned}
E\left([\tilde{f}(X_{t+1}) - \tilde{f}(X_t)]^2\right)  &=  E\left[\tilde{f}(X_{t+1})^2\right] + E \left[\tilde{f}(X_t)^2\right] \\
& \quad - 2 E\left[ \tilde{f}(X_{t+1})\tilde{f}(X_t) \right] \\
  &=  0. 
\end{aligned}
$$

<a id = "sec:MarkErgodic"></a>
## 3.4 Ergodic Markov Processes 

Chapter 1 studied special statistical models that, because they are ergodic, are affiliated with a Law of Large Numbers in which limit points are constant across sample points $\omega \in \Omega$. Section 1.8 described other statistical models that are not ergodic and that are components of more general probability specifications that we used to express the idea that a statistical model is unknown.[^3] 

[^3]: Unknown parameters manifest themselves as unknown statistical models. 

We now explore ergodicity in the context of Markov processes.

From **Proposition 3.3.2** we know that time-series averages of an eigenfunction $\tilde f$ that solves $\mathbb{T} \tilde f = \tilde f$ are invariant over time, so 

$$
{\frac{1}{N}} \sum_{t=1}^N \tilde f(X_t) = \tilde f(X).
$$

However, when ${\tilde f}(x)$ varies across sets of states $x$ that occur with positive probability under $Q$,
a time series average ${\frac{1}{N}} \sum_{t=1}^N \tilde f(X_t)$ can differ from $\int \tilde f(x) Q(dx)$. This happens when observations of $\tilde f(X_t)$ along a sample path for $\{X_t\}$ convey an inaccurate impression of how $\tilde f(X)$ varies across the stationary distribution $Q(dx)$. See **Example 3.6.4** below. We can exclude the possibility of such inaccurate impressions by imposing a restriction on the eigenfunction equation $\mathbb{T}f = f$.

**Proposition 3.4.1** When the only solutions to the equation 

$$
{\mathbb T}f = f
$$ 

are constant functions (with $Q$ measure one), it is possible to construct $\{ X_t : t=0,1,2,...\}$ as a stationary and ergodic Markov process with ${\mathbb T}$ as the one-period conditional expectation operator and $Q$ as the initial distribution for $X_0$.[^4] 

[^4]: In particular, the process can be represented using a probability measure $Pr$ defined over events in ${\mathfrak F}$, a transformation ${\mathbb S}$ for which $({\mathbb S}, Pr)$ is measure preserving and ergodic, and a measurement function ${\widetilde X}$ such that $\left\{ {\widetilde X}\circ {\mathbb S}^t : t=0,1, \ldots \right\}$ has the same induced distribution as the process $\{X_t : t=0,1,2, \ldots \}$. 

Evidently, ergodicity is a property that obtains relative to a stationary distribution $Q$ of the Markov process. If there are multiple stationary distributions, it is possible that only constant functions $f$ solve ${\mathbb T}f = f$ for one stationary distribution and that non-constant solutions exist for other stationary distributions. 

### Invariant events for a Markov process 

Consider an eigenfunction ${\tilde f}$ of ${\mathbb T}$ associated with a unit eigenvalue. Let $\phi : {\mathbb R} \rightarrow {\mathbb R}$ be a bounded Borel measurable function. Since $\{ {\tilde f}(X_t) : t=0,1,2,... \}$ is invariant over time, so is $\left\{ \phi\left[{\tilde f}(X_t)\right] : t=0,1,2, \ldots \right\}$ and it is necessarily true that 

$$
{\mathbb T} (\phi \circ {\tilde f}) = \phi \circ {\tilde f}.
$$

Therefore, from an eigenfunction ${\tilde f}$ associated with a unit eigenvalue, we can construct other eigenfunctions,[^5] for example 

[^5]: This construction also works for unbounded functions $\phi$ provided that $\phi \circ \tilde f$ is square integrable under the $Q$ measure.

<a id="newjunk1"></a> 

$$
\phi[{\tilde f}(x)] = \begin{cases} 1 & \text{if} \ {\tilde f}(x) \in {\tilde {\mathfrak b}} \\ 0 & \text{if} \ {\tilde f}(x) \notin {\tilde {\mathfrak b}} \end{cases} 
$$ 

for some Borel set ${\tilde {\mathfrak b}}$ in ${\mathbb R}$. It follows that 

$$
\Lambda = \{ \omega \in \Omega : {\tilde f}[X_0(\omega)] \in {\tilde {\mathfrak b}} \}
$$ 

is an invariant event in $\Omega$. Note that by constructing the Borel set ${\mathfrak b}$ in ${\mathcal X}$ 

$$
 {\mathfrak b} = \left\{ x : {\tilde f}(x) \in {\tilde {\mathfrak b}} \right\}
$$ 

we can represent $\Lambda$ as

<a id="invariantrep"></a>

$$
\Lambda = \left\{ \omega \in \Omega : X_0(\omega) \in {\mathfrak b} \right\}. \tag{2}
$$

Thus we have shown how to construct many non-degenerate eigenfunctions, starting from an initial such function.

For Markov processes, all invariant events can be represented as in (2), which is expressed in terms of the initial state $X_0$. See [Doob (1953, p.~460, Theorem 1.1)](#doob). Thus, associated with an invariant event is a Borel set in ${\mathcal X}$. Let ${\mathfrak J}$ denote the collection of Borel subsets of ${\mathcal X}$ for which $\Lambda$ constructed as in (2) is an invariant event. From these invariant events, we can also construct many non-degenerate eigenfunctions as indicator functions of sets in ${\mathfrak J}$. Formally, if ${\mathfrak b} \in {\mathfrak J}$, then the indicator function 

<a id="neweigen"></a>

$$
f(x) = \begin{cases} 1 & \text{if} \ x \in {\mathfrak b} \\ 0 & \text{if} \ x \notin {\mathfrak b} \end{cases} \tag{3}
$$

satisfies 

$$
{\mathbb T} f = f
$$

with $Q$ probability one. Provided that the probability of $\Lambda$ is neither zero nor one, then we have constructed a nonnegative function $f$ that is strictly positive on a set of positive $Q$ measure and zero on a set with strictly positive $Q$ measure.

More generally, when a Markov process $\left\{ X_t : t \ge 0\right\}$ is not ergodic, there exist bounded eigenfunctions with unit eigenvalues that are not constant with $Q$ measure one. For a non-degenerate eigenfunction ${\tilde f}$ with unit eigenvalue to be constant with $Q$ measure one, it shouldn't be possible for the Markov process permanently to get stuck in a subset of the state space which has probability different from one or zero. Suppose now we consider any Borel set ${\mathfrak b}$ of ${\mathcal X}$ that has $Q$ measure that is neither zero nor one. Let $f$ be constructed as in (3) without restricting ${\mathfrak b}$ to be in ${\mathfrak J}$. Then ${\mathbb T}^j$ applied to $f$ is the conditional probability of $\{ X_j \in {\mathfrak b} \}$ as of date zero. If we want time series averages to converge to unconditional expectations, we must require that the set ${\mathfrak b}$ be visited eventually with positive probability. To account properly for all possible future dates we use a mathematically convenient resolvent operator defined by 

$$
{\mathbb M} f(x) = (1-\lambda) \sum_{j=0}^\infty \lambda^j {\mathbb T}^j f(x)
$$

for some constant discount factor $0 < \lambda < 1$. Notice that if $\tilde f$ is an eigenfunction of ${\mathbb T}$ associated with a unit eigenvalue, then the same is true for ${\mathbb T}^j$ and hence for ${\mathbb M}$. We translate the requirement that the set ${\mathfrak b}$ be eventually visited into a restriction that applying ${\mathbb M}$ to the indicator function $f$ yields a strictly positive function. The following statement extends this restriction to all nonnegative functions that are distinct from zero.

<a id = "prop:ergod100"></a>
**Proposition 3.4.2** Suppose that for any $f \ge 0$ such that $\int f(x) Q(dx) > 0$, ${\mathbb M} f(x) > 0$ for all $x$ in ${\mathcal X}$ with $Q$ measure one.  Then any solution ${\tilde f}$ to ${\mathbb T}  f =  f$ is necessarily constant with  $Q$ measure one.

**Proof.** Consider an eigenfunction ${\tilde f}$ associated with a unit eigenvalue.  The function $f = \phi \circ {\tilde f}$ necessarily satisfies: 

$$
{\mathbb M}f = f
$$ 

for any $\phi$ of the indicator form displayed [above](#newjunk1).  If such an $f$ also satisfies $\int f(x) Q(dx) > 0$, then the hypothesis of the proposition implies that $f = {\mathbb M} f > 0$ with $Q$ probability one. Because $f$ is an indicator function, it follows that $f(x)=1$
  with $Q$ probability one.  Since this holds for any Borel set ${\tilde {\mathfrak b}}$ in ${\mathbb R}$, ${\tilde f}$ must be constant with $Q$ probability one.
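
For a finite-state chain with transition matrix $\mathbb{P}$, the geometric sum defining ${\mathbb M}$ collapses to the matrix $(1-\lambda)(I - \lambda \mathbb{P})^{-1}$. When the stationary distribution puts positive probability on every state, the hypothesis of Proposition 3.4.2 amounts to every entry of this matrix being strictly positive. The sketch below illustrates the check; the function `resolvent`, the value $\lambda = 0.9$, and the two example matrices are assumptions made for illustration.

```python
import numpy as np

def resolvent(P, lam=0.9):
    """Matrix of M = (1 - lam) * sum_j lam^j P^j = (1 - lam) (I - lam P)^{-1}."""
    n = P.shape[0]
    return (1.0 - lam) * np.linalg.inv(np.eye(n) - lam * P)

P_alternating = np.array([[0.0, 1.0],
                          [1.0, 0.0]])    # every state is eventually visited
P_stuck = np.eye(2)                       # the chain never leaves its initial state

M_alternating = resolvent(P_alternating)
M_stuck = resolvent(P_stuck)

# M f > 0 for the indicator f of each single state exactly when all entries of M are positive.
condition_alternating = bool((M_alternating > 0).all())
condition_stuck = bool((M_stuck > 0).all())
```

The alternating chain passes the check while the stuck chain fails it, which matches the intuition that the process must not be trapped in a proper subset of the state space.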

Proposition 3.4.2 supplies a sufficient condition for ergodicity. A more restrictive  sufficient condition is that there exists an integer $m \geq 1$ such that 

$$
{\mathbb T}^{m} f(x) > 0
$$

 for any $f \ge 0$ such that $\int f(x) Q(dx) > 0$
on a set with $Q$ measure one.


**Remark 3.4.3** The sufficient conditions imposed in **Proposition 3.4.2** imply a property called *irreducibility* relative to the probability measure $Q$. While this proposition presumes that $Q$ is a stationary distribution, *irreducibility* allows for a more general specification of $Q$.

**Proposition 3.4.2** provides a way to verify ergodicity. As discussed in Chapter [3](#chap:process), ergodicity is a property of a statistical model. As statisticians or econometricians, we often entertain a set of Markov models, each of which is ergodic. For each model, we can build a probability $Pr$ using the canonical construction given at the outset of Chapter [3](#chap:process). Convex combinations of these probabilities are measure-preserving but not necessarily ergodic when used in conjunction with the shift transformation $\mathbb{S}$. We can take the ergodic Markov models to be the building blocks for a specification to be used in a statistical investigation. There can be a finite number of these building blocks or even a continuum of them represented in terms of an unknown parameter vector.

<!-- <a id = "sec:limitapprox"></a>
**Definition 3.8.1** The process $\left\{ X_t \right\}$ is said to be irreducible with respect to ${\widetilde Q}$ if for any $f \ge 0$ such that $\int f(x) {\widetilde Q}(dx) > 0$, ${\mathbb M} f(x) > 0$ for all $x in {\mathcal X}$ with ${\widetilde Q}$ measure one. -->

<!-- <a id = "prop:ergod100"></a>
**Proposition 3.4.2** When ${\widetilde{Q}}$ is a stationary distribution and $\left\{ X_t \right\}$ is irreducible with respect to $\widetilde{Q}$, the process is necessarily ergodic.

**Proof.** By imitating the proof of Proposition [3.4.2](#prop:ergod100), we can establish that irreducibility rules out bounded eigenfunctions that are not constant with ${\widetilde{Q}}$ measure one. -->

<a id = "lem:uniteigen"></a>
## 3.5 Periodicity

Next, we study a notion of periodicity of a stationary and ergodic Markov process. To define periodicity of a Markov process, for a given positive integer $p$ we construct a new Markov process by sampling an original process every $p$ time periods. This is called 'skip-sampling' at sampling interval $p$. With a view toward applying **Proposition 3.3.2** to $\mathbb{T}^p$, solve

$$
\mathbb{T}^p f = f \tag{perioddef}
$$

for a function $\tilde{f}$. We know from **Proposition 3.3.2** that for a $\tilde{f}$ that solves $\mathbb{T}^p f = f$, $\{ \tilde{f}(X_t) : t=0, p, 2p, \ldots \}$ is invariant and so is $\{ \tilde{f}(X_t) : t=1,p+1,2p+1,\ldots\}$. The process $\tilde{f}(X_t)$ is periodic with period $p$ or $np$ for any positive integer $n$.

<a id = "result:pinterval"></a>
**Result 3.5.1** Consider a counterpart of the resolvent operator $\mathbb{M}$ constructed by sampling at interval given by positive integer $p$:

$$
\mathbb{M}_p f(x) = (1 - \lambda) \sum_{j=0}^\infty \lambda^{j} \mathbb{T}^{pj} f(x). \tag{eqn:Mpdef}
$$ 

Provided that $\mathbb{M}_p f(x) > 0$ with $Q$ measure one for all positive integers $p$ and for any $f \ge 0$ such that $\int f(x) Q(dx) > 0$, the Markov process is aperiodic.
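
In the finite-state case the skip-sampled operator has the matrix representation $(1-\lambda)(I - \lambda \mathbb{P}^p)^{-1}$, so the condition in Result 3.5.1 can be inspected for particular values of $p$. The following sketch does this for a hypothetical two-state chain that alternates deterministically; the helper `resolvent_p` and $\lambda = 0.9$ are illustrative choices.

```python
import numpy as np

def resolvent_p(P, p, lam=0.9):
    """Matrix of M_p = (1 - lam) * sum_j lam^j P^(p j) = (1 - lam) (I - lam P^p)^{-1}."""
    n = P.shape[0]
    Pp = np.linalg.matrix_power(P, p)
    return (1.0 - lam) * np.linalg.inv(np.eye(n) - lam * Pp)

P = np.array([[0.0, 1.0],
              [1.0, 0.0]])       # alternates between the two states

M1 = resolvent_p(P, 1)           # strictly positive entries
M2 = resolvent_p(P, 2)           # P^2 is the identity, so M_2 is the identity too

positive_p1 = bool((M1 > 0).all())
positive_p2 = bool((M2 > 0).all())
```

The positivity condition holds at $p = 1$ but fails at $p = 2$, so this chain does not satisfy the sufficient condition for aperiodicity.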

## 3.6 Finite-State Markov Chains 

Suppose that $\mathcal{X}$ consists of $n$ possible states. We can label these states in a variety of ways, but for now we suppose that state $x_j$ is the coordinate vector consisting entirely of zeros except in position $j$, where there is a one.
Let $\mathbb{P}$ be an $n$ by $n$ transition matrix, where entry $i,j$ is the probability of moving from state $i$ to state $j$ in a single period. Thus, the entries of $\mathbb{P}$ are all nonnegative and 

$$
\mathbb{P} \textbf{1}_n = \textbf{1}_n,
$$

where $\textbf{1}_n$ is an $n$-dimensional vector of ones.

Let $\textbf{q}$ be an $n$-dimensional vector of probabilities. Stationarity requires that

$$
\textbf{q}'\mathbb{P} = \textbf{q}', \tag{1}
$$

where $\textbf{q}$ is a row eigenvector (also called a left eigenvector) of $\mathbb{P}$ associated with a unit eigenvalue.
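
One way to compute a stationary vector is to stack the stationarity equations with the adding-up constraint and solve the resulting linear system. The sketch below assumes that the stationary distribution is unique; the function name `stationary_distribution` and the example matrix are hypothetical.

```python
import numpy as np

def stationary_distribution(P):
    """Solve q'P = q' together with q'1 = 1 (assumes a unique stationary q)."""
    n = P.shape[0]
    # Rows of the stacked system: (P' - I) q = 0 and 1'q = 1.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.concatenate([np.zeros(n), [1.0]])
    q, *_ = np.linalg.lstsq(A, b, rcond=None)
    return q

P = np.array([[0.8, 0.2],
              [0.1, 0.9]])
q = stationary_distribution(P)
```

For this matrix the solution is $\textbf{q} = (1/3, 2/3)'$. When several stationary distributions exist, the linear system has many solutions and this routine returns only one of them.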

We use a vector $\textbf{f}$ to represent a function from the state space to the real line. Each coordinate of $\textbf{f}$
gives the value of the function at the corresponding coordinate vector. Then the conditional expectation operator $\mathbb{T}$
can be represented in terms of the transition matrix $\mathbb{P}$:

$$
E(\textbf{f}\cdot X_{t+1}| X_t = x) =
(\mathbb{T}\textbf{f})\cdot x = x'\mathbb{P} \textbf{f}.
$$
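
The matrix representation means that applying $\mathbb{T}$ is a matrix-vector product, and iterating $\mathbb{T}$ corresponds to taking powers of $\mathbb{P}$. A minimal sketch with hypothetical values for `P` and `f`:

```python
import numpy as np

P = np.array([[0.8, 0.2],
              [0.1, 0.9]])        # hypothetical transition matrix
f = np.array([1.0, -1.0])         # function values at the two states

Tf = P @ f                        # one-step conditional expectations, state by state
x = np.array([1.0, 0.0])          # coordinate vector for the first state
cond_exp = x @ P @ f              # E[f . X_{t+1} | X_t = x] = x' P f

T2f = P @ Tf                      # two-step forecasts, the same as P^2 f
```

Here `Tf` equals `[0.6, -0.8]`, and `cond_exp` picks out its first entry.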

Now consider column eigenvectors, called "right eigenvectors", of $\mathbb{P}$ that are associated with a unit eigenvalue. 

<a id = "prop:finiteP1"></a>
**Proposition 3.6.1** Suppose that the only solutions to 

$$
{\mathbb T} {\bf f} = {\bf f}
$$

are of the form ${\bf f} \propto \textbf{1}_n$, where $\propto$ means "proportional to".
Then we can construct a process that is stationary and ergodic by initializing the process with density ${\bf q}$ determined by equation [qstab](#qstab). 

We can weaken the sufficient condition for stationarity and ergodicity to allow nonconstant right eigenvectors. This weakening is of interest when there are multiple stationary distributions.

**Proposition 3.6.2** Assume that the right eigenvectors $\mathbf{f}$ of $\mathbb{P}$ associated with a unit eigenvalue and a stationary distribution $\mathbf{q}$ satisfy 

$$
\min_{\mathsf{r}} \sum_{i=1}^n (\mathsf{f}_i - \mathsf{r})^2 \mathsf{q}_i = 0,
$$ 

where the minimization is over real numbers $\mathsf{r}$. Then the process is stationary and ergodic.

Notice that if $\mathsf{q}_i$ is zero, the contribution of $\mathsf{f}_i$ to the least squares objective can be neglected. This allows for non-constant $\mathbf{f}$'s, albeit in a limited way.

Three examples illustrate ideas in these propositions.

<a id = "ex:MC1"></a>
**Example 3.6.3** Recast Example 1.4.3 as a Markov chain with transition matrix ${\mathbb P}=\begin{bmatrix}0 & 1 \\ 1 & 0\end{bmatrix}$. This chain has a unique stationary distribution $q=\begin{bmatrix}.5 & .5 \end{bmatrix}'$ and the invariant functions are $\begin{bmatrix} {\sf r} & {\sf r} \end{bmatrix}'$ for any scalar ${\sf r}$. Therefore, the process initiated from the stationary distribution is ergodic. The process is periodic with period two since the matrix ${\mathbb P}^2$ is an identity matrix and all two-dimensional vectors are eigenvectors associated with a unit eigenvalue.

<a id = "ex:MC2"></a>
**Example 3.6.4** Recast Example 1.4.4 as a Markov chain with transition matrix ${\mathbb P}=\begin{bmatrix}1 & 0 \\ 0 & 1\end{bmatrix}$.  This chain has a continuum of stationary distributions $\pi \begin{bmatrix}1 \\ 0 \end{bmatrix}+ (1- \pi )\begin{bmatrix}0 \\ 1 \end{bmatrix}$ for any $\pi \in [0,1]$ and invariant functions $\begin{bmatrix} {\sf r}_1 \\ {\sf r}_2 \end{bmatrix}$ for any scalars ${\sf r}_1, {\sf r}_2$.  Therefore, when $\pi \in (0,1)$ the process is not ergodic because if ${\sf r}_1 \ne {\sf r}_2$ the resulting invariant function fails to be constant across states that have positive probability under the stationary distribution associated with $\pi \in (0,1)$. When $\pi \in (0,1)$,  nature chooses state $i=1$ or $i=2$ with probabilities $\pi, 1-\pi$, respectively, at time $0$.   Thereafter, the chain remains stuck in the realized time $0$ state. Its failure ever to visit the unrealized state prevents the sample average from converging to the population mean of  an arbitrary  function of the state.

<a id="ex:MC3"></a>
**Example 3.6.5** A Markov chain with transition matrix
${\mathbb P}=\begin{bmatrix}.8 & .2 & 0 \\ .1 & .9 & 0 \\ 0 & 0 & 1\end{bmatrix}$ has a continuum of
stationary distributions
$\pi \begin{bmatrix} \frac{1}{3} & \frac{2}{3} & 0 \end{bmatrix}'
+(1- \pi) \begin{bmatrix} 0 & 0 & 1 \end{bmatrix}'$ for $\pi \in [0,1]$ and
invariant functions
$\begin{bmatrix} {\sf r}_1  &  {\sf r}_1 & {\sf r}_2 \end{bmatrix}'$
for any scalars ${\sf r}_1, {\sf r}_2$. Under any stationary distribution associated with $\pi \in (0,1)$,
the chain is not ergodic because some invariant functions are not constant with probability one. But under stationary distributions associated with $\pi =1$ or $\pi=0$, the
chain is ergodic.
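
The claims about invariant functions in the three examples can be checked by counting the linearly independent solutions of $\mathbb{P} \mathbf{f} = \mathbf{f}$. The sketch below is one way to do that; the helper `unit_eigenspace_dim` and its tolerance are assumptions introduced for illustration.

```python
import numpy as np

def unit_eigenspace_dim(P, tol=1e-10):
    """Number of linearly independent solutions f of P f = f."""
    n = P.shape[0]
    return n - np.linalg.matrix_rank(P - np.eye(n), tol=tol)

P_alternating = np.array([[0.0, 1.0],
                          [1.0, 0.0]])            # the alternating two-state chain
P_identity = np.eye(2)                            # the two-state chain that never moves
P_block = np.array([[0.8, 0.2, 0.0],
                    [0.1, 0.9, 0.0],
                    [0.0, 0.0, 1.0]])             # the three-state chain

dims = [unit_eigenspace_dim(P) for P in (P_alternating, P_identity, P_block)]
dim_skip_sampled = unit_eigenspace_dim(P_alternating @ P_alternating)

# In the three-state chain, the indicator of the first two states is invariant.
f = np.array([1.0, 1.0, 0.0])
indicator_is_invariant = np.allclose(P_block @ f, f)
```

A dimension of one means that only constant vectors are invariant. A dimension of two signals nonconstant invariant functions, which also appear for the alternating chain once it is sampled every two periods.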


## 3.7 Limited Dependence
Recall the conditional expectations operator $\mathbb{T}$ defined in equation [\eqref{eqn:Toperatordef}](#eqn:Toperatordef)
for a space $\mathcal{L}^2$ of functions $f$ of a Markov process with
transition probability $P$ and stationary distribution $Q$ and for which $f(X_t)$ has a finite second moment under $Q$: 

$$
\mathbb{T} f (x) = E \left[ f(X_{t+1}) \mid X_t = x \right] = \int_{\{x^* \in \mathcal{X}\}} f(x^*) P(d x^*|x) .
$$

We suppose that under the stationary distribution $Q$, the process is ergodic (see section [3.4](#sec:MarkErgodic)).

Because it is often useful to work with random variables that have been "centered" by subtracting
out their means, we define the following subspace of $\mathcal{L}^2$: 

$$
\mathcal{N} = \left\{ f \in \mathcal{L}^2 : \int f(x) Q(dx) = 0 \right\}. \tag{def:N}
$$

We use the same norm
$\| f \| = \left[ \int f(x)^2 Q(dx)\right]^{1/2}$ on both $\mathcal{L}^2$ and $\mathcal{N}$.

**Definition 3.7.1** The conditional expectation operator $\mathbb{T}$ is said to be a *strong contraction* on $\mathcal{N}$ if there exists $0 < \rho < 1$ such that

$$
\| \mathbb{T} f \| \le \rho \| f \|
$$

for all $f \in \mathcal{N}$.

When $\mathbb{T}^m$ is a strong contraction for some positive integer $m$ and some $\rho \in (0,1)$, the Markov process is said to be $\rho$-mixing conditioned on the invariant events.  

<a id = "eqn:Toperatordef"></a>
**Remark 3.7.2** ${\mathbb T}$ being a strong contraction on ${\mathcal N}$ limits intertemporal dependence of the Markov process $\{X_t\}$.

Let ${\mathbb I}$ be the identity operator. When the conditional expectation operator ${\mathbb T}$ is a strong contraction, the operator $({\mathbb I} - {\mathbb T})^{-1}$ is well defined, bounded on ${\mathcal N}$, and equal to the geometric sum:[^8] 

[^8]: The geometric series after the first equality sign is well defined under the weaker restriction that ${\mathbb T}^m$ is a strong contraction for some integer $m\geq 1$.

$$
\left({\mathbb I} - {\mathbb T}\right)^{-1} f(x) = \sum_{j=0}^\infty {\mathbb T}^j f(x) = \sum_{j=0}^\infty E \left[ f(X_{t+j}) \vert X_t = x \right].
$$
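
For a finite-state chain the geometric sum can be compared with a closed form. On the subspace $\mathcal{N}$, the matrix $I - \mathbb{P} + \textbf{1}_n \mathbf{q}'$ acts like $I - \mathbb{P}$ and is invertible for the chain used here, which is a standard device that is not taken from the text above. The sketch uses a hypothetical two-state chain and a function `f` with mean zero under `q`.

```python
import numpy as np

P = np.array([[0.8, 0.2],
              [0.1, 0.9]])
q = np.array([1 / 3, 2 / 3])           # stationary distribution of P
f = np.array([2.0, -1.0])              # q @ f == 0, so f lies in N

# Truncated geometric sum: f + P f + P^2 f + ... (200 terms).
partial, term = np.zeros(2), f.copy()
for _ in range(200):
    partial += term
    term = P @ term

# Closed form on N: solve (I - P + 1 q') g = f; the rank-one term vanishes on N.
closed = np.linalg.solve(np.eye(2) - P + np.outer(np.ones(2), q), f)
```

Because `P @ f` equals `0.7 * f` in this example, both calculations return `f / 0.3`.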

<a id = "qstab"></a>
**Example 3.7.3** Consider the Markov chain setting of section 3.6 with a transition matrix $\mathbb{P}$. A stationary density $\mathbf{q}$ is a nonnegative vector that satisfies

$$
\mathbf{q}' \mathbb{P} = \mathbf{q}'
$$

and $\mathbf{q} \cdot \textbf{1}_n = 1 $. If every column eigenvector of $\mathbb{T}$ associated with a unit eigenvalue is constant over states $i$ for which $\mathsf{q}_i > 0$, then the process is ergodic. If in addition every eigenvector of $\mathbb{P}$ that is associated with an eigenvalue that has a unit norm (such an eigenvalue might be complex) is constant over states $i$ for which $\mathsf{q}_i > 0$, then $\mathbb{T}^m$ is a strong contraction for some integer $m \geq 1$.[^9] 

[^9]: This follows from Gelfand's Theorem, which asserts the following. Let $\mathcal{N}$ be the $n-1$ dimensional space of vectors that are orthogonal to $\mathbf{q}$. $\mathbb{T}$ maps $\mathcal{N}$ into itself. The spectral radius of $\mathbb{T}$ restricted to $\mathcal{N}$ is the maximum of the absolute values of the eigenvalues. Gelfand's Theorem asserts that the spectral radius governs the behavior as $m$ gets large of the decay factor of the $\mathbb{T}$ transformation applied $m$ times. Provided that the spectral radius is less than one, the strong contraction property prevails for any $\rho < 1$ that is larger than the spectral radius.

The strong contraction property implies that the process is ergodic. It also rules out the presence of periodic components that can be forecast perfectly.
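
The role of the eigenvalues can be seen in a two-state example. The sketch below computes the largest eigenvalue modulus after the unit eigenvalue and compares it with the contraction ratio $\| \mathbb{T} f \| / \| f \|$ for a mean-zero function. The matrices, the function `f`, and the helper `q_norm` are hypothetical.

```python
import numpy as np

P = np.array([[0.8, 0.2],
              [0.1, 0.9]])             # hypothetical ergodic, aperiodic chain
q = np.array([1 / 3, 2 / 3])           # its stationary distribution

def q_norm(f, q):
    """Norm of f in L^2 under the stationary distribution q."""
    return np.sqrt(np.sum(q * f ** 2))

# Eigenvalue moduli of P in decreasing order; the first one is the unit eigenvalue.
moduli = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
rho = moduli[1]

f = np.array([2.0, -1.0])              # mean zero under q, so f lies in N
ratio = q_norm(P @ f, q) / q_norm(f, q)

# For the alternating chain, the eigenvalue -1 also has unit modulus.
P_alternating = np.array([[0.0, 1.0],
                          [1.0, 0.0]])
rho_alternating = np.sort(np.abs(np.linalg.eigvals(P_alternating)))[::-1][1]
```

In the first chain both `rho` and `ratio` equal 0.7, so $\mathbb{T}$ contracts on $\mathcal{N}$. In the alternating chain the second modulus is one, and no power of $\mathbb{T}$ is a strong contraction.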

<a id = "sec:limitapprox"></a>
## 3.8 Limits of Multi-Period Forecasts

When a Markov process is aperiodic, there are interesting situations in which 

$$
\lim_{j \rightarrow \infty} {\mathbb T}^j f(x) =  {\sf r} 
$$

for some ${\sf r} \in {\mathbb R}$, where convergence is either pointwise in $x$ or in the ${\mathcal  L}^2$ norm. This limit asserts that long-run forecasts do not depend on the current Markov state. Meyn and Tweedie (1993) provide a comprehensive treatment of such convergence. Let $Q$ be a stationary distribution. Then it is necessarily true that 

$$
\int {\mathbb T}^j f(x) Q(dx)  = \int f(x) Q(dx)
$$

for all $j$. Thus,

$$
{\sf r} = \int f(x) Q (dx),
$$

so that the limiting forecast is necessarily the mathematical expectation of $f(x)$ under a stationary distribution. Here we have assumed that the limit point is a number and not a random variable; we have not assumed that the stationary distribution is unique.
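
A finite-state illustration of this limit: forecasts $\mathbb{P}^j \mathbf{f}$ settle down to the same number in every state, and that number is $\mathbf{q} \cdot \mathbf{f}$. The chain and the function values below are hypothetical.

```python
import numpy as np

P = np.array([[0.8, 0.2],
              [0.1, 0.9]])
q = np.array([1 / 3, 2 / 3])           # stationary distribution of P
f = np.array([1.0, 4.0])

# j-period-ahead forecasts of f, one entry per current state.
forecasts = {j: np.linalg.matrix_power(P, j) @ f for j in (1, 5, 25, 100)}

r = q @ f                              # expectation of f under q
means = {j: q @ forecasts[j] for j in forecasts}
```

The one-period forecasts still depend on the current state, while the 100-period forecasts are both essentially equal to `r`. The entries of `means` equal `r` at every horizon, as the stationarity argument requires.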

Notice that if this convergence holds, then any function $f$ that satisfies 

$$
{\mathbb T} f = f
$$ 

is necessarily constant with probability one. Also, if $\int f(x) Q(dx) = 0$ and convergence is sufficiently fast, then

$$
\lim_{N \rightarrow \infty} \sum_{j=0}^N {\mathbb T}^j f(x) 
$$

is a well-defined function of the Markov state. We shall construct the limit when we extract martingales from additive functionals in a later chapter.

A set of sufficient conditions for the convergence outcome

$$
\lim_{j \rightarrow \infty} {\mathbb T}^j f (x^*) = \int f(x) Q(dx) 
$$

for each $x^* \in {\mathcal X}$ and each bounded $f$ is:

**Condition 3.8.1** A Markov process with stationary distribution $Q$ satisfies:

(i) For any $f \ge 0$ such that $\int f(x) Q(dx) > 0$, ${\mathbb M}_p f(x) > 0$ for all $x \in {\mathcal X}$ with $Q$ measure one and all positive integers $p$, where the operator ${\mathbb M}_p$ is defined in Result 3.5.1.

(ii) ${\mathbb T}$ maps bounded continuous functions into bounded continuous functions, i.e., the Markov process is said to satisfy the Feller property.

(iii) The support of $Q$ has a nonempty interior in ${\mathcal X}$.

(iv) ${\mathbb T} V(x) - V(x) \le -1$ outside a compact subset of ${\mathcal X}$ for some nonnegative function $V$.

We encountered condition (i) in our discussion of Markov processes that are ergodic and aperiodic. Condition (iv) is a drift condition for stability that requires that we find a function $V$ that satisfies the requisite inequality. Heuristically, the drift condition says that outside a compact subset of the state space, application of the conditional expectation operator pushes the function inward. The choice of $-1$ as a comparison point is made only for convenience, since we can always multiply the function $V$ by a number greater than one. Thus, $-1$ could be replaced by any strictly negative number. In section 3.9, we will apply Condition 3.8.1 to verify ergodicity of a vector autoregression.

<a id = "sec:VAR44"></a>
## 3.9 Vector Autoregressions

A square matrix $\mathbb{A}$ is said to be *stable* when all of its eigenvalues have absolute values that are strictly less than one. For a stable $\mathbb{A}$, suppose that 

$$
X_{t+1} = \mathbb{A} X_t + \mathbb{B} W_{t+1},
$$ 

where $\{ W_{t+1} : t = 0, 1, 2, \ldots \}$ is an i.i.d. sequence of multivariate normally distributed random vectors with mean vector zero and covariance matrix $I$, and $X_0 \sim \mathcal{N}(\mu_0, \Sigma_0)$ is independent of the shocks. This specification constitutes a first-order *vector autoregression*.
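A sample path of such a vector autoregression can be generated directly from the recursion. The sketch below is illustrative only: the function name `simulate_var` and the numerical values of `A` and `B` are hypothetical choices, not taken from the text.

```python
import numpy as np

def simulate_var(A, B, mu0, Sigma0, T, rng):
    """Simulate X_0, ..., X_T from X_{t+1} = A X_t + B W_{t+1}, W ~ N(0, I)."""
    n, k = B.shape
    X = np.empty((T + 1, n))
    X[0] = rng.multivariate_normal(mu0, Sigma0)   # draw the initial state
    for t in range(T):
        W = rng.standard_normal(k)                # shock W_{t+1}
        X[t + 1] = A @ X[t] + B @ W
    return X

# Hypothetical parameter values; A is upper triangular with eigenvalues 0.8 and 0.5.
A = np.array([[0.8, 0.1],
              [0.0, 0.5]])
B = np.array([[1.0, 0.0],
              [0.3, 0.5]])
spectral_radius = np.max(np.abs(np.linalg.eigvals(A)))  # below one, so A is stable

rng = np.random.default_rng(0)
path = simulate_var(A, B, mu0=np.zeros(2), Sigma0=np.eye(2), T=200, rng=rng)
```

Each row of `path` is one realization of $X_t$, starting from a draw of $X_0$.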

Let $\mu_t = E X_t$. Notice that 

$$
\mu_{t+1} = \mathbb{A} \mu_t.
$$ 

The mean $\mu$ of a stationary distribution satisfies

$$
\mu = \mathbb{A} \mu.
$$

Because we have assumed that $\mathbb{A}$ is a stable matrix, one is not an eigenvalue of $\mathbb{A}$, so $\mathbb{A} - \mathbb{I}$ is nonsingular and $\mu = 0$ is the only solution of $(\mathbb{A} - \mathbb{I}) \mu = 0$. Thus, the mean of the stationary distribution is $\mu = 0$.

Let $\Sigma_{t} = E(X_t - \mu_t) (X_t - \mu_t)'$ be the covariance matrix of $X_t$. Then

$$
\Sigma_{t+1} = \mathbb{A} \Sigma_t \mathbb{A}' + \mathbb{B}\mathbb{B}'.
$$

For $\Sigma_t = \Sigma$ to be invariant over time, it must satisfy the discrete Lyapunov equation

$$\tag{eq:Sylvester}
\Sigma = \mathbb{A} \Sigma \mathbb{A}' + \mathbb{B}\mathbb{B}'.
$$

The following theorem guarantees that such a $\Sigma$ exists and is unique.

**Theorem 3.9.1** Given a stable matrix ${\mathbb A}$, the equation $\Sigma = {\mathbb A} \Sigma {\mathbb A}' + {\mathbb B}{\mathbb B}'$ has a unique solution for a positive semidefinite matrix $\Sigma$.
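One way to compute the solution in Theorem 3.9.1 numerically is to vectorize the equation, which gives $\text{vec}(\Sigma) = (I - \mathbb{A} \otimes \mathbb{A})^{-1} \text{vec}(\mathbb{B}\mathbb{B}')$; the inverse exists because the eigenvalues of $\mathbb{A} \otimes \mathbb{A}$ are products of eigenvalues of $\mathbb{A}$. The sketch below uses hypothetical matrices, and the helper name `solve_discrete_lyapunov_kron` is our own. For large state vectors a dedicated solver would be preferable.

```python
import numpy as np

def solve_discrete_lyapunov_kron(A, Q):
    """Solve Sigma = A Sigma A' + Q by vectorization (suitable for small systems)."""
    n = A.shape[0]
    lhs = np.eye(n * n) - np.kron(A, A)
    vec_sigma = np.linalg.solve(lhs, Q.reshape(-1))
    return vec_sigma.reshape(n, n)

# Hypothetical stable A (eigenvalues 0.8 and 0.5) and shock loading B.
A = np.array([[0.8, 0.1],
              [0.0, 0.5]])
B = np.array([[1.0, 0.0],
              [0.3, 0.5]])

Sigma = solve_discrete_lyapunov_kron(A, B @ B.T)

# Check the fixed-point property and positive semidefiniteness.
residual = Sigma - (A @ Sigma @ A.T + B @ B.T)
min_eig = np.linalg.eigvalsh((Sigma + Sigma.T) / 2).min()
print(np.max(np.abs(residual)), min_eig)
```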

Suppose that $\Sigma_0 = 0$ (a matrix of zeros) and for $t \geq 1$ define the matrix

$$
\Sigma_t = \sum_{j=0}^{t-1} \mathbb{A}^j \mathbb{B}\mathbb{B}'(\mathbb{A}^j)'.
$$

The limit of the sequence $\{\Sigma_t\}_{t=0}^\infty$ is

$$
\Sigma = \sum_{j=0}^{\infty} \mathbb{A}^j \mathbb{B}\mathbb{B}'(\mathbb{A}^j)',
$$

which can be verified to satisfy the Lyapunov equation [(eq:Sylvester)](#eq:Sylvester). Thus, $\Sigma$ equals the covariance matrix of the stationary distribution. Similarly, for any $\mu_0 = E X_0$,

$$
\mu_t = \mathbb{A}^t \mu_0,
$$

converges to zero, the mean of the stationary distribution.
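The convergence of $\Sigma_t$ and $\mu_t$ can be illustrated by iterating the two recursions directly. This is a sketch with hypothetical parameter values and an arbitrary initial mean; the number of iterations is chosen generously rather than by a stopping rule.

```python
import numpy as np

# Hypothetical stable A (eigenvalues 0.8 and 0.5) and shock loading B.
A = np.array([[0.8, 0.1],
              [0.0, 0.5]])
B = np.array([[1.0, 0.0],
              [0.3, 0.5]])

mu = np.array([5.0, -3.0])   # arbitrary initial mean mu_0
Sigma = np.zeros((2, 2))     # Sigma_0 = 0, as in the text

for t in range(500):
    mu = A @ mu                          # mu_{t+1} = A mu_t
    Sigma = A @ Sigma @ A.T + B @ B.T    # Sigma_{t+1} = A Sigma_t A' + B B'

# At numerical convergence, Sigma is a fixed point of the Lyapunov recursion.
lyapunov_gap = np.max(np.abs(Sigma - (A @ Sigma @ A.T + B @ B.T)))
print(np.linalg.norm(mu), lyapunov_gap)
```

Starting the covariance recursion at $\Sigma_0 = 0$ reproduces the partial sums $\sum_{j=0}^{t-1} \mathbb{A}^j \mathbb{B}\mathbb{B}'(\mathbb{A}^j)'$ term by term.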

The linear structure implies that the stationary distribution is Gaussian with mean $\mu$ and covariance matrix $\Sigma$. To verify ergodicity, we suppose that the covariance matrix $\Sigma$ of the stationary distribution has full rank and verify the restrictions in Condition 3.8.1. Restriction (iii) of Condition 3.8.1 is satisfied. Furthermore, $\Sigma_t$ has full rank for some $t$, which guarantees that the process is irreducible and aperiodic so that restriction (i) is satisfied. As a candidate for $V(x)$ in restriction (iv), take $V(x) = |x|^2$. Then

$$
\mathbb{T} V(x) = x'\mathbb{A}'\mathbb{A} x + \text{trace}(\mathbb{B}'\mathbb{B})
$$

so

$$
\mathbb{T} V(x) - V(x) = x'(\mathbb{A}'\mathbb{A} - \mathbb{I})x + \text{trace}(\mathbb{B}'\mathbb{B}).
$$

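Stability of $\mathbb{A}$ alone does not guarantee that $\mathbb{A}'\mathbb{A} - \mathbb{I}$ is negative definite. When it is, which requires the largest singular value of $\mathbb{A}$ to be less than one, drift restriction (iv) of Condition 3.8.1 is satisfied for $|x|$ sufficiently large. For a general stable $\mathbb{A}$, the same argument applies with $V(x) = x'\mathbb{P}x$, where $\mathbb{P} = \sum_{j=0}^\infty (\mathbb{A}')^j \mathbb{A}^j$ is the positive definite solution of $\mathbb{P} = \mathbb{A}'\mathbb{P}\mathbb{A} + \mathbb{I}$, because then $\mathbb{T}V(x) - V(x) = -|x|^2 + \text{trace}(\mathbb{B}'\mathbb{P}\mathbb{B})$.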
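The drift restriction can be checked numerically. The sketch below uses a hypothetical stable matrix whose large off-diagonal entry makes $\mathbb{A}'\mathbb{A} - \mathbb{I}$ indefinite, and then works with a quadratic function $V(x) = x'\mathbb{P}x$ in which $\mathbb{P}$ solves $\mathbb{P} = \mathbb{A}'\mathbb{P}\mathbb{A} + \mathbb{I}$. The matrix values and helper names are assumptions made for illustration.

```python
import numpy as np

def solve_discrete_lyapunov_kron(A, Q):
    """Solve X = A X A' + Q by vectorization (suitable for small systems)."""
    n = A.shape[0]
    vec_x = np.linalg.solve(np.eye(n * n) - np.kron(A, A), Q.reshape(-1))
    return vec_x.reshape(n, n)

# Hypothetical stable matrix (both eigenvalues 0.5) with a large off-diagonal entry.
A = np.array([[0.5, 2.0],
              [0.0, 0.5]])
B = np.eye(2)

# Largest eigenvalue of A'A - I; it is positive here, so V(x) = |x|^2 is not enough.
max_eig = np.linalg.eigvalsh(A.T @ A - np.eye(2)).max()

# P solves P = A'PA + I, so that T V(x) - V(x) = -|x|^2 + trace(B'PB) for V(x) = x'Px.
P = solve_discrete_lyapunov_kron(A.T, np.eye(2))
const = np.trace(B.T @ P @ B)

def drift(x):
    """T V(x) - V(x) for V(x) = x'Px when next period's state is A x + B W, W ~ N(0, I)."""
    return x @ A.T @ P @ A @ x + const - x @ P @ x

radius = np.sqrt(1.0 + const)          # drift(x) <= -1 whenever |x| >= radius
x = radius * np.array([0.6, 0.8])      # a point on the boundary of that region
print(max_eig, drift(x), drift(2 * x))
```

Outside the ball of radius `radius` the drift is at most $-1$, so the compact set in restriction (iv) can be taken to be that ball.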


We can extend this example to allow the mean of the stationary distribution not to be zero. Partition the Markov state as

$$
x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
$$

where $x_2$ is a scalar, and write $X_t^1$ and $X_t^2$ for the corresponding components of $X_t$. Similarly, partition the matrices $\mathbb{A}$ and $\mathbb{B}$ as

$$
\begin{align*}
\mathbb{A} & = \begin{bmatrix} \mathbb{A}_{11} & \mathbb{A}_{12} \\ 0 & 1 \end{bmatrix}  \\
\mathbb{B} & = \begin{bmatrix} \mathbb{B}_1 \\ 0 \end{bmatrix}
\end{align*}
$$

where $\mathbb{A}_{11}$ is a stable matrix. Notice that the dynamics imply 

$$
X_{t+1}^{2} = X_{t}^{2} = \cdots = X_0^{2}
$$

so that $X_t^{2}$ is invariant over time. Let $\mu_{2}$ denote the mean of $X_t^{2}$ for any $t$. For a stationary distribution we require that the mean $\mu_{1}$ of $X_t^{1}$ satisfy

$$
\mu_1 = \mathbb{A}_{11} \mu_1 + \mathbb{A}_{12} \mu_2.
$$

Hence

$$
\mu_1 = \left(\mathbb{I} - \mathbb{A}_{11} \right)^{-1} \mathbb{A}_{12} \mu_2.
$$

Imitating our earlier argument, the covariance matrix $\Sigma_{11}$ of $X_t^1$ satisfies

$$
\Sigma_{11} = \sum_{j=0}^{\infty} \left(\mathbb{A}_{11}\right)^j \mathbb{B}_1(\mathbb{B}_1)'\left(\mathbb{A}_{11}'\right)^j
+ \left(\mathbb{I} - \mathbb{A}_{11}\right)^{-1} \mathbb{A}_{12} \Sigma_{22} \mathbb{A}_{12}'\left(\mathbb{I} - \mathbb{A}_{11}'\right)^{-1}
$$

where $\Sigma_{22}$ is the variance of $X_t^{2}$ for all $t$. Stationarity imposes no restriction on the mean $\mu_{2}$ and variance $\Sigma_{22}$.
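The formulas for $\mu_1$ and $\Sigma_{11}$ can be checked by assembling the full system and confirming that its first two moments are invariant. The sketch below uses hypothetical values for every matrix and for $\mu_2$ and $\Sigma_{22}$. The cross covariance $\Sigma_{12} = (\mathbb{I} - \mathbb{A}_{11})^{-1}\mathbb{A}_{12}\Sigma_{22}$ used in the check is derived here from the dynamics and is not stated in the text.

```python
import numpy as np

# Hypothetical partitioned system: X^1 is 2x1, X^2 is a scalar that never changes.
A11 = np.array([[0.8, 0.1],
                [0.0, 0.5]])
A12 = np.array([[1.0],
                [0.5]])
B1 = np.array([[1.0, 0.0],
               [0.3, 0.5]])
mu2 = 2.0        # mean of X^2 (unrestricted by stationarity)
Sigma22 = 0.25   # variance of X^2 (unrestricted by stationarity)

M = np.linalg.solve(np.eye(2) - A11, A12)   # (I - A11)^{-1} A12
mu1 = (M * mu2).ravel()

# First term of Sigma_11: sum over j of A11^j B1 B1' (A11')^j, truncated at 500 terms.
S = np.zeros((2, 2))
term = B1 @ B1.T
for _ in range(500):
    S += term
    term = A11 @ term @ A11.T
Sigma11 = S + Sigma22 * (M @ M.T)

# Assemble the full system and its candidate stationary moments.
A = np.block([[A11, A12], [np.zeros((1, 2)), np.ones((1, 1))]])
B = np.vstack([B1, np.zeros((1, 2))])
mu = np.append(mu1, mu2)
Sigma12 = Sigma22 * M                        # Cov(X^1, X^2), implied by the dynamics
Sigma = np.block([[Sigma11, Sigma12], [Sigma12.T, np.array([[Sigma22]])]])

mean_gap = np.max(np.abs(A @ mu - mu))
cov_gap = np.max(np.abs(A @ Sigma @ A.T + B @ B.T - Sigma))
print(mean_gap, cov_gap)
```

Changing `mu2` or `Sigma22` changes `mu1` and `Sigma11` but leaves both gaps at numerical zero, which is the sense in which stationarity leaves these two parameters unrestricted.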

Since $\{X_t^{2} : t\ge0\}$ is invariant, the process $\{X_t : t\ge 0\}$ is ergodic only when the variance $\Sigma_{22}$ is zero. When $\{X_t : t \ge 0 \}$ is not ergodic, the limit points in the Law of Large Numbers should be computed by conditioning on $X_0^{2}$.
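A small simulation illustrates why the limit points depend on $X_0^2$. The sketch below assumes a scalar $X_t^1$ and hypothetical coefficients `a11`, `a12`, and `b1`; it computes time averages along two paths that differ only in the realized value of the invariant component.

```python
import numpy as np

def time_average_x1(x2, N, rng, a11=0.5, a12=1.0, b1=1.0):
    """Time average of a scalar X^1_t when X^2_t = x2 for all t (hypothetical parameters)."""
    x1 = 0.0
    total = 0.0
    for _ in range(N):
        x1 = a11 * x1 + a12 * x2 + b1 * rng.standard_normal()
        total += x1
    return total / N

rng = np.random.default_rng(0)
N = 50_000
avg_low = time_average_x1(x2=-1.0, N=N, rng=rng)
avg_high = time_average_x1(x2=1.0, N=N, rng=rng)

# Each average should be near the conditional mean a12 * x2 / (1 - a11) = 2 * x2,
# not near the unconditional mean of X^1.
print(avg_low, avg_high)
```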


<a id = "sec:VAR_inf_past"></a>
## 3.10 Inventing a Past Again

In section [1.10](#sec:VAR_inf_past), we invented an infinite past for a stochastic process. Here we invent an infinite past for a vector autoregression in a way that is equivalent to drawing an initial condition $X_0$ at time $t=0$ from the stationary distribution $\mathcal{N}(0, \Sigma_\infty)$, where $\Sigma_\infty$ solves the discrete Lyapunov equation [1](#eq:Sylvester), namely, 

$$\Sigma_\infty = \mathbb{A} \Sigma_\infty \mathbb{A}' + \mathbb{B} \mathbb{B}' $$

Thus, consider the vector autoregression

$$ X_{t+1} = \mathbb{A} X_t + \mathbb{B} W_{t+1}  $$

where $\mathbb{A}$ is a stable matrix, $\{W_{t+1}\}_{t=-\infty}^\infty$ is now a two-sided infinite sequence of i.i.d. $\mathcal{N}(0,I)$ random vectors, and $t$ is an integer. We can solve this difference equation backwards to get the moving average representation

$$ X_{t} = \sum_{j=0}^\infty \mathbb{A}^j \mathbb{B} W_{t -j} .  $$

Then

$$
E\left[X_t (X_t)' \right] = \sum_{j=0}^\infty \mathbb{A}^j \mathbb{B} \mathbb{B}' (\mathbb{A}^j)' = \Sigma_\infty
$$

where $\Sigma_\infty$ is also the unique positive semidefinite matrix that solves $\Sigma_\infty = \mathbb{A} \Sigma_\infty \mathbb{A}' + \mathbb{B} \mathbb{B}'$.
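The moving average representation can be approximated by truncating the infinite sum. The sketch below, with hypothetical matrices and a truncation length `J` chosen for illustration, builds $X_0$ and $X_1$ from a common set of shocks and checks that they satisfy the vector autoregression up to truncation error. It also checks that the truncated covariance sum is close to a fixed point of the Lyapunov equation.

```python
import numpy as np

# Hypothetical stable A (eigenvalues 0.8 and 0.5) and shock loading B.
A = np.array([[0.8, 0.1],
              [0.0, 0.5]])
B = np.array([[1.0, 0.0],
              [0.3, 0.5]])

J = 300                               # truncation length (an approximation)
rng = np.random.default_rng(0)
W = rng.standard_normal((J + 2, 2))   # W[i] plays the role of W_{i-J}: W[J] is W_0, W[J+1] is W_1

# Precompute A^j B for j = 0, ..., J.
AjB = [B]
for _ in range(J):
    AjB.append(A @ AjB[-1])

def x_ma(last):
    """Truncated moving average whose most recent shock is W[last]."""
    return sum(AjB[j] @ W[last - j] for j in range(J + 1))

x0 = x_ma(J)        # X_0 built from W_0, W_{-1}, ..., W_{-J}
x1 = x_ma(J + 1)    # X_1 built from W_1, W_0, ..., W_{1-J}

# The moving average satisfies the recursion up to a term of order A^{J+1}.
recursion_gap = np.max(np.abs(x1 - (A @ x0 + B @ W[J + 1])))

# Truncated version of the covariance sum, compared with the Lyapunov equation.
Sigma_inf = sum(M @ M.T for M in AjB)
lyapunov_gap = np.max(np.abs(Sigma_inf - (A @ Sigma_inf @ A.T + B @ B.T)))
print(recursion_gap, lyapunov_gap)
```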


