# Chapter 2 Operators and Fixed Points Reading Note

## Three subsections and seven subsubsections

### Stability

- conjugate maps

- convergence rates

- Newton's method

### Order

- Partial orders

- Order-preserving maps

- Fixed points and order

### Matrices and Operators

- Linear operators

## Definition

**Definition (Dynamical system)**

A **dynamical system** is a pair $(U,T)$ where

- $U$: state space is a subset of $\mathbb{R}^n$

- $T$ is a self-map on $U$. 

**Definition (Conjugate)**

Dynamical systems $(U,T)$ and $(\hat U, \hat T)$ are called **conjugate** under $\Phi$ if

1. $\Phi: U\to\hat U$ a bijection
2. $T = \Phi^{-1} \circ \hat T\circ \Phi$ on $U$

**Definition (Homeomorphism)**

$\Phi:U\to \hat U$ is called a **homeomorphism** if it is

- continuous
- bijective
- has continuous inverse

**Definition (Topologically conjugate)**

The dynamical systems $(U,T)$ and $(\hat U,\hat T)$ are **topologically conjugate** under $\Phi$ if

- $(U,T)$ and $(\hat U,\hat T)$ are conjugate under $\Phi$

- $\Phi$ is a homeomorphism

**Definition (locally stable)**

Let $U$ be a subset of $\mathbb{R}^n$  and let $T$ be a self-map on $U$. 

A fixed point $u^*$ of $T$ in $U$ is called **locally stable** for the dynamical system $(U,T)$ if there exists an open set $O\subset U$ such that $u^*\in O$ and $T^k u\to u^*$ as $k\to\infty$ for every $u\in O$.

In other words, the domain of attraction for $u^*$ contains an open neighborhood of $u^*$.

**Definition (at rate at least $q$)**

Let $e_k := \|u_k-u^*\|$ for all $k\in\mathbb{N}$.

We say that $(u_k)_{k\ge 0}$ converges to $u^*$ **at rate at least $q$** if $q\ge 1$ and for some $\beta \in (0,\infty)$, $N\in\mathbb{N}$, we have

$$
e_{k+1}\le \beta e_k^q,\,\,\forall k\ge N
$$

**Definition (at rate $q$)**

We say that convergence occurs **at rate $q$** if, in addition, 

$$
\limsup_{k\to\infty} \frac{e_{k+1}}{e_k^q} = \beta
$$

-  If $q=2$, then we say that convergence is (at least) **quadratic**
-  If $q=1$ and $\beta <1$, then we say that convergence is (at least) **linear**
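To make the rate definitions concrete, here is a minimal sketch that tracks the ratios $e_{k+1}/e_k^q$ along a trajectory. The affine map $u\mapsto 0.5u+1$ with fixed point $u^*=2$ and the helper name `error_ratios` are illustrative assumptions, not taken from the chapter.

```python
import numpy as np

def error_ratios(T, u0, u_star, q, n_iter=10):
    """Return the ratios e_{k+1} / e_k**q along the trajectory of T from u0."""
    u, ratios = u0, []
    for _ in range(n_iter):
        u_next = T(u)
        e_k, e_next = abs(u - u_star), abs(u_next - u_star)
        if e_k == 0:          # already at the fixed point; ratio undefined
            break
        ratios.append(e_next / e_k**q)
        u = u_next
    return np.array(ratios)

# Hypothetical example map: T u = 0.5 u + 1 with fixed point u* = 2
T = lambda u: 0.5 * u + 1
ratios = error_ratios(T, u0=10.0, u_star=2.0, q=1)
```

Here every ratio equals $0.5$, so the sequence converges linearly with $\beta = 0.5 < 1$.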

**Definition (Worst case complexity)**

**Worst-case complexity** measures the number of fundamental operations, such as addition and multiplication of floating point numbers, that an algorithm requires in the worst case.

**Definition (partial order)**

A **partial order** on a nonempty set $P$ is a relation $\precsim$ on $P$ that, for any $p,q,r\in P$, satisfies

- Reflexivity: $p\precsim p$
- Anti-symmetry: $p\precsim q, q\precsim p \implies p=q$
- Transitivity: $p\precsim q, q\precsim r\implies p\precsim r$

**Definition (partially ordered set)**

The pair $(P,\precsim)$ is called a partially ordered set. 

**Definition (pointwise order)**

Fix an arbitrary nonempty set $X$. The **pointwise order** $\le$ on the set $\mathbb{R}^X$ of all functions from $X$ to $\mathbb{R}$ is defined as follows,

given $u,v$ in $\mathbb{R}^X$, set $u\le v$ if $u(x)\le v(x)$ for all $x\in X$.


**Definition (total order)**

A partial order $\precsim$ on $P$ is called **total** if for all $p,q\in P$, either $p\precsim q$ or $q\precsim p$

**Definition (greatest element)**

Given a partially ordered set $(P,\precsim)$ and $A\subset P$, we say that $g\in P$ is a **greatest element** of $A$ if $g\in A$ and

$$
a\in A\implies a\precsim g
$$

**Definition (least element)**

Given a partially ordered set $(P,\precsim)$ and $A\subset P$, we say that $l\in P$ is a **least element** of $A$ if $l\in A$ and

$$
a\in A\implies l\precsim a
$$

**Definition (maximum)**

If $A$ is totally ordered, then a greatest element $g$ of $A$ is called a maximum of $A$.

**Definition (minimum)**

If $A$ is totally ordered, then a least element $l$ of $A$ is called a minimum of $A$.

**Definition (upper bound)**

Given a partially ordered set $(P,\precsim)$ and a nonempty subset $A$ of $P$, we call $u\in P$ an **upper bound** of $A$ if $a\precsim u$ for all $a\in A$.

**Definition (supremum)**

Letting $U_p(A)$ be the set of all upper bounds of $A$ in $P$, we call $\bar u\in P$ a **supremum** of $A$ if

$$
\bar u \in U_p(A), \bar u\precsim u\,\,\forall u\in U_p(A)
$$

Thus, $\bar u$ is the least element of the set of upper bounds $U_p(A)$, whenever it exists.

The supremum of $A$ is typically denoted as $\bigvee A$.

**Definition (lower bound)**

We call $\ell\in P$  a **lower bound** of $A$ if $a\succsim\ell$ for all $a\in A$.

**Definition (infimum)**

An element $\bar\ell \in P$ is called an **infimum** of $A$ if $\bar\ell$ is a lower bound of $A$ and $\bar\ell\succsim \ell$ for every lower bound $\ell$ of $A$.

We write $\bar \ell = \bigwedge A$ to denote the infimum.

**Definition (sublattice)**

A subset $V$ of $\mathbb{R}^X$ is called a **sublattice** of $\mathbb{R}^X$ if

$$
u,v\in V\implies u\vee v\in V, u\wedge v\in V
$$

In other words, $V$ is closed under pairwise supremum and infimum.

**Definition (order interval)**

Given a partially ordered set $(P,\precsim)$, and $a,b\in P$, the **order interval** $[a,b]$ is defined as all $p\in P$ such that $a\precsim p\precsim b$.

If $a\not\precsim b$, then $[a,b] = \emptyset$.



**Definition (upper envelope)**

Take $\{T_\sigma\}:= \{T_\sigma\}_{\sigma\in\Sigma}$ to be a finite family of self-maps on a sublattice $V\subset \mathbb{R}^X$. 

Define, 

$$
Tv = \bigvee_{\sigma\in\Sigma} T_\sigma v \,\,\,(v\in V)
$$

From the sublattice property, $T$ is a self-map on $V$. 

$T$ is called the **upper envelope** of the functions $\{T_\sigma\}$.

**Definition (order-preserving)**

Given two partially ordered sets $(P,\precsim)$ and $(Q, \trianglelefteq)$, a map $T$ from $P$ to $Q$ is called **order-preserving** if given $p,p'\in P$, we have,

$$
p\precsim p' \implies Tp\trianglelefteq Tp'
$$

**Definition (order-reversing)**

$T$ is called **order-reversing** if,

$$
p\precsim p' \implies Tp'\trianglelefteq Tp
$$

**Definition (increasing)**

Given two partially ordered sets $(P,\precsim)$, $(\mathbb{R},\le)$, we call $h\in\mathbb{R}^P$ **increasing** if

$$
p\precsim p'\implies h(p)\le h(p')
$$

**We use the symbol $i\mathbb{R}^P$ for the set of increasing functions in $\mathbb{R}^P$**.

**Definition (decreasing)**

Given two partially ordered sets $(P,\precsim)$, $(\mathbb{R},\le)$, we call $h\in\mathbb{R}^P$ **decreasing** if

$$
p\precsim p'\implies h(p)\ge h(p')
$$

**Definition (Strictly increasing)**

If $h:P\to Q$ and $P,Q \subset \mathbb{R}$, then we will call $h$ **strictly increasing** if $x<y\implies h(x)<h(y)$.

**Definition (Strictly decreasing)**

If $h:P\to Q$ and $P,Q \subset \mathbb{R}$, then we will call $h$ **strictly decreasing** if $x<y\implies h(x)>h(y)$.

**Definition (stochastically dominates, first order stochastic dominance)**

Given a finite set $X$ partially ordered by $\precsim$, and $\varphi,\psi\in\mathscr{D}(X)$, we say that $\psi$ **stochastically dominates** $\varphi$ and write $\varphi\precsim_F \psi$ if

$$
\sum_{x\in X} u(x)\varphi(x) \le \sum_{x\in X} u(x) \psi(x) 
$$

for every $u\in i\mathbb{R}^X$.

The relation $\precsim_F$ is also called **first order stochastic dominance** to differentiate it from other forms of stochastic order.

**Definition (counter CDF)**

For a given distribution $\varphi$, the function

$$
G^\varphi (y) := \sum_{x\succsim y} \varphi(x) \,\,\,\,\,\,\,(\varphi\in\mathscr{D}(X), y\in X)
$$

is called the **counter CDF** (counter cumulative distribution function) of $\varphi$. 

**Definition (dominates)**

Let $(P,\precsim)$ be a partially ordered set. Given two self-maps $S$ and $T$ on the set $P$.

We write $S\precsim T$ if $Su\precsim Tu$ for every $u\in P$, and say that $T$ **dominates** $S$ on $P$.

**Definition (nonnegative)**

We call a matrix $A$ **nonnegative** and write $A\ge 0$ if all elements of $A$ are nonnegative. 

**Definition (everywhere positive)**

We call $A$ **everywhere positive** and write $A\gg 0$ if all elements of $A$ are strictly positive.

**Definition (irreducible)**

A square matrix $A$ is called **irreducible** if $A\ge 0$ and $\sum_{k=1}^\infty A^k\gg 0$.
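The following sketch checks irreducibility numerically. It relies on a standard fact that these notes do not state: for an $n\times n$ nonnegative matrix it suffices to look at powers $k=1,\dots,n$, since any path in the associated graph can be shortened to length at most $n$. The function name `is_irreducible` and the two test matrices are hypothetical.

```python
import numpy as np

def is_irreducible(A):
    """Check whether the nonnegative square matrix A is irreducible."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    # Work with the 0/1 pattern of A to avoid overflow in high powers
    B = (A > 0).astype(float)
    total, power = np.zeros((n, n)), np.eye(n)
    for _ in range(n):
        power = ((power @ B) > 0).astype(float)   # zero pattern of A^k
        total += power
    return bool(np.all(total > 0))

A_irreducible = np.array([[0.0, 1.0], [1.0, 0.0]])   # the two states swap
A_reducible = np.array([[1.0, 1.0], [0.0, 1.0]])     # second state never reaches the first
```

Only the zero pattern of $A$ matters for irreducibility, which is why the sketch discards magnitudes.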

**Definition (stochastic matrix or Markov matrix)**

An $n\times n$ matrix $P$ is called a **stochastic matrix or Markov matrix** if 

$$
P\ge 0, P\mathbb{1} =\mathbb{1}
$$

where $\mathbb{1}$ is a column vector of ones, so that $P$ is nonnegative and has unit row sums.

**Definition (Stationary distribution)**

A row vector $\psi\in\mathbb{R}_+^n$ such that $\psi\mathbb{1}=1,\psi P=\psi$ is called a **stationary distribution** for $P$.
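As an illustration, a stationary distribution can be computed by solving the linear system $\psi(P-I)=0$, $\psi\mathbb{1}=1$. The sketch below assumes the solution is unique (for example, when $P$ is irreducible); the function name and the example matrix are hypothetical.

```python
import numpy as np

def stationary_distribution(P):
    """Solve psi P = psi, psi 1 = 1 by least squares on the stacked system."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    # psi (P - I) = 0 and psi 1 = 1, written as M psi' = b
    M = np.vstack([(P - np.eye(n)).T, np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    psi, *_ = np.linalg.lstsq(M, b, rcond=None)
    return psi

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
psi = stationary_distribution(P)
```

For this $P$ the result is $\psi = (0.8, 0.2)$, which can be confirmed by hand from $\psi P = \psi$.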

**Definition (dominant eigenvector)**

In the language of Perron-Frobenius theory, the right eigenvector $\bar x$ is called the **dominant eigenvector**, since it corresponds to the dominant (i.e., largest) eigenvalue $\rho(A)$.

This eigenvector plays an important role in determining long-run outcomes.

In the Lake model, the dominant eigenvector provides both the long-run rate of unemployment and the stable growth path, to which all trajectories with positive initial conditions converge over time.

**Definition (Linear operator)**

A **linear operator** on $\mathbb{R}^n$ is a map $L:\mathbb{R}^n \to \mathbb{R}^n$ such that,

$$
L(\alpha u+\beta v) = \alpha Lu +\beta Lv
$$

for all $u,v\in\mathbb{R}^n$, $\alpha,\beta \in\mathbb{R}$.

A **linear operator on $\mathbb{R}^X$** is a map $L:\mathbb{R}^X\to \mathbb{R}^X$ such that, for all $u,v\in\mathbb{R}^X$, and $\alpha,\beta\in\mathbb{R}$, we have,

$$
L(\alpha u+\beta v) = \alpha Lu +\beta Lv
$$

In what follows, let

$$
\mathscr{L}(\mathbb{R}^X):=\{L:L(\alpha u+\beta v) = \alpha Lu +\beta Lv\}
$$
be the set of all linear operators on $\mathbb{R}^X$.

Let $L$ be a function from $X\times X$ to $\mathbb{R}$. This function induces an operator $L$ from $\mathbb{R}^X$ to itself via

$$
(Lu)(x) = \sum_{x'\in X} L(x,x')u(x') \tag{$x\in X, u\in\mathbb{R}^X$}
$$

The function $L(x,x')$ on the right-hand side is called the **kernel** of the operator $L$.

Since $L(x,x') = L(x_i,x_j)$ is just an $n\times n$ array of real numbers, we also call it the **matrix representation** of $L$ when more precision is required.

The eigenvalues and eigenvectors of the linear operator $L\in\mathscr{L}(\mathbb{R}^X)$ are defined as the eigenvalues and eigenvectors of its matrix representation.

The spectral radius $\rho(L)$ of $L$ is defined as the spectral radius of its matrix representation.

**Definition (column-major ordering)**

Let $A$ be a $j\times k$ matrix. We call the arrangement of stacking all $k$ columns vertically into one long column as **column-major ordering**.

**Definition (row-major ordering)**

Let $A$ be a $j\times k$ matrix. We call the arrangement of concatenating all $j$ rows horizontally into one long row as **row-major ordering**.
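A small NumPy illustration of the two orderings, using the `order` argument of `ndarray.ravel` (`"F"` for column-major, `"C"` for row-major). The matrix is an arbitrary example.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])          # a 2 x 3 matrix

col_major = A.ravel(order="F")     # stack columns: [1, 4, 2, 5, 3, 6]
row_major = A.ravel(order="C")     # concatenate rows: [1, 2, 3, 4, 5, 6]
```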

**Definition (positive cone)**

The set

$$
\mathbb{R}_+^X:=\{u\in\mathbb{R}^X: u\ge 0\}
$$

is called the **positive cone of $\mathbb{R}^X$**.

**Definition (positive operator)**

A linear operator $L\in\mathscr{L}(\mathbb{R}^X)$ is called a **positive linear operator** if the positive cone is invariant under $L$, that is, if

$$
u\ge 0\implies Lu\ge 0
$$

**Definition (Markov operator)**

An operator $P\in\mathscr{L}(\mathbb{R}^X)$ is called a **Markov operator** on $\mathbb{R}^X$ if $P$ is positive and $P\mathbb{1}=\mathbb{1}$.

We let

$$
\mathscr{M}(\mathbb{R}^X):=\{P\in \mathscr{L}(\mathbb{R}^X): u\ge 0\implies Pu\ge 0, P\mathbb{1} = \mathbb{1}\}
$$

denote the set of all Markov operators on $\mathbb{R}^X$.

The matrix representation of a Markov operator is a Markov matrix.

$\mathscr{M}(\mathbb{R}^X)$ is closed under multiplication.

## Theorem and some key takeaway

### Key takeaway

- Linear transformations are conjugate (diffeomorphic) to their Jordan normal form through their generalized eigenbasis

- Conjugation implies fixed point in one system is the fixed point in the other system

- $\Phi: fix(T)\mapsto fix(\hat T)$ is a bijection.

- Topological conjugacy preserves convergence.

- Topological conjugacy is an equivalence relation

- Rates of convergence are studied for errors $e_k$ in a neighborhood of zero, where a higher power $e_k^q$ is smaller, so a higher rate $q$ means faster convergence.

- Successive approximation typically converges at a **linear rate**.

- Successive approximation always converges when global stability holds

- Under mild conditions, there exists a neighborhood of the fixed point within which the Newton iterates converge **quadratically**

- We can accelerate computation by exploiting the problem's special structure such as differentiability, convexity, monotonicity. But we face a **tradeoff between speed and robustness**. **More robust methods exploit less structure (less speed)**

- Successive approximation can be partially parallelized, but the algorithm is inherently serial

- Newton's method is less serial (it involves fewer iteration steps), but each step is more expensive (it requires inverting high-dimensional matrices). Being less serial means more potential for parallelization.

- The objective in dynamic programming is to maximize/minimize a lifetime value/cost function, which is a function over a state space. Thus, the objective takes values in a particular ordered set and we seek greatest/least elements.

- If a subset of a partially ordered set has a greatest element, then that element is the supremum of the subset. Conversely, if the supremum lies in the subset, then it is the greatest element.

- If $V$ is a sublattice, then the supremum and infimum of any finite subset of $V$ are in $V$.

- $i\mathbb{R}^P$ is a sublattice of $\mathbb{R}^P$.

- While a one-to-one correspondence between linear operators and matrices holds in $\mathbb{R}^n$, the concept of linear operators is far more general. Linear operators can be defined over many different kinds of sets whose elements have vector-like properties.

**Proposition 2.1.1.**

If $(U,T)$ and $(\hat U, \hat T)$ are conjugate dynamical systems under $\Phi$, then

- $u \in fix (T)\iff \Phi u \in fix(\hat T)$

- $\Phi^{-1} \hat u \in fix (T)\iff \hat u \in fix(\hat T)$

- $|fix(T)| = |fix(\hat T)|$
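A minimal numerical sketch of this proposition. The example is an assumption chosen for illustration: $Tu = Au^\alpha$ on $U=(0,\infty)$ is conjugate to the affine map $\hat T x = \ln A + \alpha x$ on $\hat U=\mathbb{R}$ under $\Phi = \ln$. The parameter values are hypothetical.

```python
import numpy as np

A, alpha = 2.0, 0.5                       # hypothetical parameters

T = lambda u: A * u**alpha                # self-map on U = (0, inf)
T_hat = lambda x: np.log(A) + alpha * x   # self-map on U_hat = R
Phi, Phi_inv = np.log, np.exp             # bijection U -> U_hat and its inverse

# Conjugacy: T = Phi^{-1} o T_hat o Phi, checked on a grid of points in U
u_grid = np.linspace(0.1, 10.0, 50)
conjugacy_holds = np.allclose(T(u_grid), Phi_inv(T_hat(Phi(u_grid))))

# Fixed points correspond under Phi
x_star = np.log(A) / (1 - alpha)          # fixed point of the affine map T_hat
u_star = Phi_inv(x_star)                  # candidate fixed point of T
```

With these parameters $u^* = 4$, and $Tu^* = 2\sqrt 4 = 4$ confirms that $\Phi^{-1}$ carries the fixed point of $\hat T$ to a fixed point of $T$.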

**Proposition 2.1.2.**

If $(U,T)$ and $(\hat U,\hat T)$ are topologically conjugate, then

- $T$ is globally stable on $U$ if and only if $\hat T$ is globally stable on $\hat U$

- the unique fixed points $u^*\in U$ and $\hat u^*\in\hat U$ satisfy $\hat u^* = \Phi u^*$.

**Hartman-Grobman Theorem**

If $J_T(u^*)$ is nonsingular and has no eigenvalues on the unit circle in $\mathbb{C}$, then there exists an open neighborhood $O$ of $u^*$ such that $(O,T)$ and $(O,\hat T)$ are topologically conjugate.

Here $\hat T$ is the linearization (first-order Taylor approximation) of $T$ at $u^*$, i.e.,

$$
\hat T u = u^* + J_T(u^*) (u-u^*)
$$

**Corollary of Hartman-Grobman Theorem**

Under the conditions of the Hartman-Grobman theorem, the fixed point $u^*$ is locally stable whenever $\rho(J_T(u^*))<1$.
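A quick numerical illustration of the corollary in one dimension, where the Jacobian is just the derivative. The logistic map with $r=2.5$ is a hypothetical example that does not appear in these notes.

```python
import numpy as np

r = 2.5                                   # hypothetical parameter
T = lambda u: r * u * (1 - u)             # logistic map, a self-map on [0, 1]
dT = lambda u: r * (1 - 2 * u)            # its derivative (1 x 1 Jacobian)

u_star = 1 - 1 / r                        # interior fixed point, here 0.6
rho = abs(dT(u_star))                     # spectral radius of the Jacobian

# Iterate T from a point near u_star and record the remaining error
u = u_star + 0.05
for _ in range(50):
    u = T(u)
final_error = abs(u - u_star)
```

Since $\rho = 0.5 < 1$, the corollary predicts local stability, and the iterates started near $u^*$ do settle on it.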

**Lemma (Inequalities and identities related to pointwise partial order on $\mathbb{R}^X$, $X$ is finite)**

For $f,g,h \in\mathbb{R}^X$, we have

1. Triangle inequality: $|f+g|\le |f| + |g|$

2. Distributive laws with addition:

$$
(f\wedge g)+h = (f+h)\wedge (g+h)
$$

$$
(f\vee g)+h = (f+h)\vee (g+h)
$$

3. Distributive laws for wedge and vee:

$$
(f\vee g)\wedge h = (f\wedge h)\vee(g\wedge h)
$$

$$
(f\wedge g)\vee h = (f\vee h)\wedge (g\vee h)
$$

4. The difference of minima with a common function is at most the difference

$$
|f\wedge h- g\wedge h| \le |f-g|
$$

5. The difference of maxima with a common function is at most the difference

$$
|f\vee h - g\vee h| \le|f-g|
$$

If $f,g,h\in\mathbb{R}^X_+$, we also have that the **minimum with a sum is at most the sum of the two minima**

$$
(f+g)\wedge h \le (f\wedge h)+(g\wedge h)
$$

**Lemma (useful to DP)**

Let $D$ be a finite set. If $f$ and $g$ are elements of $\mathbb{R}^D$, then,

$$
|\max_{z\in D} f(z)- \max_{z\in D} g(z)|\le \max_{z\in D}|f(z)-g(z)|
$$

**Lemma (useful to DP)**

If, for each $\sigma\in \Sigma$, the operator $T_\sigma$ is a contraction of modulus $\lambda_\sigma$ under the supremum norm, then $T = \bigvee_\sigma T_\sigma$ is a contraction of modulus $\max_\sigma \lambda_\sigma$ under the same norm.
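The lemma can be checked numerically on a small example. The sketch below assumes a hypothetical family of affine maps $T_\sigma v = r_\sigma + \beta P_\sigma v$ with stochastic $P_\sigma$ (each a contraction of modulus $\beta$ in the supremum norm) and tests the contraction inequality for their upper envelope on random pairs.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 4, 0.9                          # hypothetical size and discount factor

def random_stochastic(n):
    """Draw a random n x n stochastic matrix (rows sum to one)."""
    M = rng.random((n, n))
    return M / M.sum(axis=1, keepdims=True)

# Hypothetical finite family: T_sigma v = r_sigma + beta * P_sigma v
family = [(rng.random(n), random_stochastic(n)) for _ in range(3)]

def T(v):
    """Upper envelope: pointwise maximum of T_sigma v over sigma."""
    return np.max([r + beta * P @ v for r, P in family], axis=0)

sup_norm = lambda w: np.max(np.abs(w))

# Check || T v - T w || <= beta * || v - w || on random pairs
pairs = [(rng.normal(size=n), rng.normal(size=n)) for _ in range(100)]
contraction_holds = all(
    sup_norm(T(v) - T(w)) <= beta * sup_norm(v - w) + 1e-12 for v, w in pairs
)
```

A random check like this is not a proof, but a violation would indicate an error in the construction of the operators.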

**Lemma (Order-preserving and contraction)**

Let $X$ be finite and let $U\subset\mathbb{R}^X$ be partially ordered by the pointwise order $\le$.

Suppose that $u+c\in U$ for all $u\in U$ and $c\in\mathbb{R}$, where $u+c$ denotes $u+c\mathbb{1}$.

If $T:U\to U$ is an order-preserving self-map on $U$ and there exists a constant $\beta\in(0,1)$ such that

$$
T(u+c)\le Tu + \beta c
$$

for all $u\in U$ and $c\in \mathbb{R}_+$

then, $T$ is a contraction of modulus $\beta$ on $U$ with respect to the supremum norm.

**Lemma (related to Counter CDF)**

For each $\varphi,\psi \in\mathscr{D}(X)$, the following statements hold:

$$
\varphi\precsim_F \psi \implies G^\varphi \le G^\psi
$$

and


If $X$ is totally ordered by $\precsim$, then $G^\varphi\le G^\psi\implies \varphi\precsim_F\psi$
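For a totally ordered state space the lemma gives a practical test for stochastic dominance: compare counter CDFs. The sketch below assumes $X=\{0,1,2\}$ with its usual order and two hypothetical distributions stored as arrays.

```python
import numpy as np

def counter_cdf(phi):
    """G(y) = sum of phi(x) over x >= y, for X = {0, ..., n-1} in its usual order."""
    return np.cumsum(phi[::-1])[::-1]

phi = np.array([0.5, 0.3, 0.2])   # hypothetical distributions on X = {0, 1, 2}
psi = np.array([0.2, 0.3, 0.5])   # psi puts more mass on high states

G_phi, G_psi = counter_cdf(phi), counter_cdf(psi)
psi_dominates = bool(np.all(G_phi <= G_psi + 1e-12))

# Spot check of the definition with one increasing function u
u = np.array([0.0, 1.0, 4.0])
expected_phi, expected_psi = u @ phi, u @ psi
```

Here $G^\varphi = (1, 0.5, 0.2) \le (1, 0.8, 0.5) = G^\psi$, so $\varphi\precsim_F\psi$, consistent with the expectations $1.1 \le 2.3$ for the chosen $u$.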

**Lemma (Stochastic dominance is a partial order on $\mathscr{D}(X)$)**

Stochastic dominance is a partial order on $\mathscr{D}(X)$.

**Proposition (Dominance and globally stable)**

Let $S$ and $T$ be self-maps on $M\subset \mathbb{R}^n$ and let $\le$ be the pointwise order.

If $T$ dominates $S$ on $M$ and, in addition, $T$ is order-preserving and globally stable on $M$, then the unique fixed point of $T$ dominates any fixed point of $S$.

### Perron-Frobenius Theorem

If $A\ge 0$, then $\rho(A)$ is an eigenvalue of $A$ with nonnegative, real-valued right and left eigenvectors.

In particular, we can find a nonnegative, nonzero column vector $e$ and a nonnegative, nonzero row vector $\varepsilon$ such that

$$
Ae = \rho(A)e, \,\,\text{and} \,\, \varepsilon A = \rho(A) \varepsilon
$$

If $A$ is irreducible, then, the right and left eigenvectors are everywhere positive and unique. 

Moreover, if $A$ is everywhere positive, then with $e$ and $\varepsilon$ normalized so that $\langle \varepsilon, e\rangle =1$, we have,

$$
\rho(A)^{-t} A^t\to e\varepsilon  \tag{$t\to\infty$}
$$

The assumption that $A$ is everywhere positive can be weakened without affecting this convergence.
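The convergence $\rho(A)^{-t}A^t\to e\varepsilon$ can be observed numerically. The sketch below uses a hypothetical $2\times 2$ everywhere positive matrix and obtains the left eigenvector from the eigendecomposition of $A^\top$.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 2.0]])                 # hypothetical everywhere positive matrix

# Right eigenvector e: A e = rho e
vals, vecs = np.linalg.eig(A)
i = np.argmax(vals.real)
rho, e = vals[i].real, vecs[:, i].real

# Left eigenvector eps: eps A = rho eps, i.e. A' eps' = rho eps'
vals_l, vecs_l = np.linalg.eig(A.T)
eps = vecs_l[:, np.argmax(vals_l.real)].real

eps = eps / (eps @ e)                      # normalize so that <eps, e> = 1

t = 50
limit_approx = np.linalg.matrix_power(A, t) / rho**t
```

For this matrix $\rho(A)=4$, and `limit_approx` agrees with `np.outer(e, eps)` up to floating point error. The normalization step also removes the arbitrary sign that `np.linalg.eig` may attach to each eigenvector.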

**Lemma (PF)**

We can use the Perron-Frobenius theorem to provide bounds on the spectral radius of a nonnegative matrix. Fix $n\times n$ matrix $A=(a_{ij})$ and set,

- $rowsum_i(A): =\sum_{j=1}^n a_{ij}=$ the $i$-th row sum of $A$

- $colsum_j(A): =\sum_{i=1}^n a_{ij}=$ the $j$-th column sum of $A$


If $A\ge 0$,  then,

$$
\min_{i\in[n]} rowsum_i (A) \le \rho(A) \le \max_{i\in[n]} rowsum_i(A)
$$

$$
\min_{j\in[n]} colsum_j(A) \le \rho(A) \le \max_{j\in[n]} colsum_{j} (A)
$$
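A short numerical check of these bounds on a hypothetical nonnegative matrix:

```python
import numpy as np

A = np.array([[0.2, 0.5, 0.0],
              [0.1, 0.3, 0.4],
              [0.6, 0.0, 0.3]])            # hypothetical nonnegative matrix

rho = np.max(np.abs(np.linalg.eigvals(A)))  # spectral radius

row_sums, col_sums = A.sum(axis=1), A.sum(axis=0)
row_bounds = (row_sums.min(), row_sums.max())
col_bounds = (col_sums.min(), col_sums.max())
```

Both pairs of bounds here are $(0.7, 0.9)$, so $\rho(A)<1$ can be concluded without computing any eigenvalues.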

**Lemma (Local spectral radius result)**

Let $\|\cdot\|$ be any norm on $\mathbb{R}^n$. If $A$ is nonnegative and $h\in\mathbb{R}^n$ obeys $h\gg 0$, then

$$
\|A^kh\|^{1/k} \to \rho(A),\,\,\text{as $k\to\infty$}
$$

The expression on the left is called the **local spectral radius** of $A$ at $h$. This lemma gives one set of conditions under which a local spectral radius equals the spectral radius.
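The sketch below evaluates the local spectral radius for increasing $k$ on a hypothetical nonnegative matrix with $\rho(A)=0.7$, using the Euclidean norm and $h=\mathbb{1}$ (both arbitrary choices allowed by the lemma).

```python
import numpy as np

A = np.array([[0.5, 0.4],
              [0.2, 0.3]])                 # hypothetical nonnegative matrix
h = np.ones(2)                             # any h >> 0 will do

rho = np.max(np.abs(np.linalg.eigvals(A)))

def local_spectral_radius(A, h, k):
    """Return || A^k h ||^(1/k) using the Euclidean norm."""
    return np.linalg.norm(np.linalg.matrix_power(A, k) @ h) ** (1 / k)

approximations = [local_spectral_radius(A, h, k) for k in (1, 10, 100, 1000)]
```

Convergence in $k$ is slow because of the $1/k$ exponent, so this is better suited to illustration than to computing $\rho(A)$ in practice.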

**Theorem (matrix is a linear operator)**

If $L$ is a linear operator on $\mathbb{R}^n$, then there exists an $n\times n$ matrix $A=(a_{ij})$ such that $Lu=Au$ for all $u\in\mathbb{R}^n$.

**Lemma (Equivalence of linear operators)**

When $X=\{x_1,\cdots,x_n\}$, the following sets are in one-to-one correspondence:

1. The set of all $n\times n$ real matrices.

2. The set of all linear operators on $\mathbb{R}^n$.

3. The set $\mathscr{L}(\mathbb{R}^X)$ of linear operators on $\mathbb{R}^X$. 

4. The set of all functions from $X\times X$ to $\mathbb{R}$.

**Lemma (Positive linear operator)**

A linear operator $L\in\mathscr{L}(\mathbb{R}^X)$ is positive if and only if its matrix representation is a nonnegative matrix.

### Newton's fixed point method

Let $T$ be a differentiable self-map on an open set $U\subset \mathbb{R}^n$.

We want to find a fixed point of $T$ on $U$.

1. Start from an arbitrary guess $u_0\in U$.

2. Find the fixed point of $\hat T$, the linearization of $T$ around the guess $u_0$, and call this fixed point $u_1$.

The linearization is $\hat T u = Tu_0 + J_T(u_0)(u-u_0)$. Since $u_1$ is a fixed point of $\hat T$,

$$
u_1 = \hat T u_1 = Tu_0 + J_T(u_0)(u_1-u_0) = Tu_0 + J_T(u_0)u_1-J_T(u_0)u_0
$$

$$
(I-J_T(u_0))u_1 = Tu_0 - J_T(u_0)u_0
$$

$$
u_1 = (I-J_T(u_0))^{-1}(Tu_0 - J_T(u_0)u_0)
$$

3. Set $u_1$ as the new guess

4. Find the fixed point of the linearization of $T$ around $u_1$, and let this be $u_2$

5. Iterate the above procedure to lead to the sequence of points

$$
u_{k+1} = Qu_k
$$

where

$$
Qu: = (I-J_T(u))^{-1}(Tu - J_T(u)u)
$$
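The iteration $u_{k+1}=Qu_k$ can be sketched as follows. The function name `newton_fixed_point`, the stopping rule, and the Solow-Swan style test map are assumptions made for illustration. The sketch solves the linear system $(I-J_T(u))\,u' = Tu - J_T(u)u$ instead of forming the inverse explicitly.

```python
import numpy as np

def newton_fixed_point(T, J, u0, tol=1e-12, max_iter=50):
    """Iterate u_{k+1} = Q u_k with Q u = (I - J(u))^{-1} (T u - J(u) u)."""
    u = np.atleast_1d(np.asarray(u0, dtype=float))
    I = np.eye(len(u))
    for k in range(max_iter):
        Ju = J(u)
        # Solve (I - J) u_new = T u - J u rather than forming the inverse
        u_new = np.linalg.solve(I - Ju, T(u) - Ju @ u)
        if np.max(np.abs(u_new - u)) < tol:
            return u_new, k + 1
        u = u_new
    return u, max_iter

# Hypothetical example: Solow-Swan style map T k = s A k^alpha + (1 - delta) k
s, A, alpha, delta = 0.3, 2.0, 0.3, 0.4
T = lambda k: s * A * k**alpha + (1 - delta) * k
J = lambda k: np.diag(s * A * alpha * k**(alpha - 1) + (1 - delta))
k_star = (s * A / delta) ** (1 / (1 - alpha))   # known fixed point

k_newton, n_steps = newton_fixed_point(T, J, u0=[1.0])
```

Using `np.linalg.solve` is a design choice: it is generally cheaper and numerically safer than computing $(I-J_T(u))^{-1}$ and multiplying.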



### Suprema and Infima under a pointwise order

For a pair of functions $\{u,v\}$, the supremum in $(\mathbb{R}^X,\le)$ is the pointwise maximum, while the infimum in $(\mathbb{R}^X, \le)$ is the pointwise minimum.

The same principle holds for **finite collections of functions**.


Thus, if $\{v_i\} = \{v_i\}_{i\in I}$ is a finite subset of $\mathbb{R}^X$, then, for all $x\in X$, 

$$
\left(\bigvee_{i} v_i\right)(x) := \max_{i\in I}v_i(x)
$$

$$
\left(\bigwedge_{i} v_i\right)(x) := \min_{i\in I}v_i(x)
$$
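When $X$ is finite, functions in $\mathbb{R}^X$ can be stored as arrays, and these suprema and infima become elementwise operations. A minimal sketch with hypothetical values:

```python
import numpy as np

# Hypothetical functions on X = {x_1, x_2, x_3}, stored as arrays of values
v1 = np.array([1.0, 5.0, 2.0])
v2 = np.array([3.0, 4.0, 0.0])
v3 = np.array([2.0, 6.0, 1.0])

# Pairwise supremum and infimum are the pointwise max and min
join_pair = np.maximum(v1, v2)            # [3., 5., 2.]
meet_pair = np.minimum(v1, v2)            # [1., 4., 0.]

# Finite collections: reduce along the collection axis
family = np.stack([v1, v2, v3])
join_all = family.max(axis=0)             # [3., 6., 2.]
meet_all = family.min(axis=0)             # [1., 4., 0.]
```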

## Stochastic dominance

- First-order stochastic dominance can be understood by considering an agent with a utility function $u \in \mathbb{R}^X$ that prefers more to less ($u \in i\mathbb{R}^X$). The agent ranks lotteries over $X$ by expected utility, evaluating $\varphi \in \mathscr{D}(X)$ as $\sum_x u(x)\varphi(x)$. The agent (weakly) prefers $\psi$ to $\varphi$ if $\varphi \precsim_F \psi$

Consider the class $\mathscr{A}$ of all agents who

1. have preferences over $X$, i.e., $u\in\mathbb{R}^X$;
2. prefer more to less, i.e., $u\in i\mathbb{R}^X$;
3. rank lotteries over $X$ according to expected utilities, i.e., $\sum_x u(x)\varphi(x)$

Then,

$$
\varphi\precsim_F \psi \iff (a\in\mathscr{A} \implies \text{$a$ prefers $\psi$ to $\varphi$})
$$

## Computational Issues

Linear operators bring some computational advantages compared to matrices.

Consider a setting where the state space $X$ takes the form

$$
X = Y\times Z, |Y|=j, |Z|=k
$$

A typical element of $X$ is $x=(y,z)$.

Let $Q$ be a map from $Z\times Z$ to $\mathbb{R}$, i.e., a $k\times k$ matrix, and consider the operator $L$ sending $u\in\mathbb{R}^X$ to $Lu$ according to the rule,

$$
(Lu)(x)=(Lu)(y,z) = \sum_{z'\in Z} u(y,z')Q(z,z') \tag{natural representation}
$$

Then $L\in\mathscr{L}(\mathbb{R}^X)$.

Since $L$ is a linear operator on $\mathbb{R}^X$, by Lemma 2.3.5, $L$ can be represented as an $n\times n$ matrix $(L(x_i,x_j))=(L_{ij})$, where $n = |X| =j\times k$.

To construct this matrix, we first need to **flatten** $Y\times Z$ into a set $X=\{x_1,\cdots,x_n\}$ with a single index.

### Two natural ways to flatten $Y\times Z$

Consider $Y\times Z$ as a two-dimensional array with typical element $(y_i,z_j)$.

We can

1. **Column-major ordering** stack all $k$ columns vertically into one long column (**default for Julia and Fortran**)

2. **Row-major ordering** concatenate all $j$ rows into one long row (**default for Python and C**)

After adopting one of these conventions, by Lemma 2.3.5, we can construct a uniquely defined $n\times n$ matrix that represents $L$.

Once we decide how to construct this matrix, we can instantiate it in computer memory and compute the operation $u\mapsto Lu$ by matrix operation. 

#### Disadvantages of the matrix approach

1. Constructing the matrix representation is tedious

2. Confusion can arise when swapping between column- and row-major ordering in order to shift between languages or to communicate with others.

3. Differences between the computer code and the natural representation can be a source of bugs.

4. An $n\times n$ matrix has to be instantiated in memory, even though the linear operation is only an inner product in $\mathbb{R}^k$. (This issue can be alleviated in most languages by employing sparse matrices, but doing so adds boilerplate and can be a source of inefficiency.)

Hence, modern scientific computing environments support linear operators directly, as well as actions on linear operators such as inverting linear maps. This encourages us to take an operator-based approach.
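To compare the two approaches, the sketch below applies the operator $(Lu)(y,z)=\sum_{z'}u(y,z')Q(z,z')$ directly on a $j\times k$ array and then through its $n\times n$ matrix representation under both flattening conventions. The sizes and random data are hypothetical. The Kronecker product forms $I_j\otimes Q$ (row-major) and $Q\otimes I_j$ (column-major) follow from the index maps noted in the comments.

```python
import numpy as np

rng = np.random.default_rng(0)
j, k = 3, 4                                # hypothetical sizes |Y| = j, |Z| = k
Q = rng.random((k, k))                     # Q(z, z') as a k x k array
u = rng.random((j, k))                     # u(y, z) as a j x k array

# Operator approach: (Lu)(y, z) = sum_{z'} u(y, z') Q(z, z')
Lu_operator = u @ Q.T

# Matrix approach with row-major flattening: x = (y, z) has index y * k + z
L_row = np.kron(np.eye(j), Q)              # n x n with n = j * k
Lu_row = (L_row @ u.ravel(order="C")).reshape((j, k), order="C")

# Matrix approach with column-major flattening: index z * j + y
L_col = np.kron(Q, np.eye(j))
Lu_col = (L_col @ u.ravel(order="F")).reshape((j, k), order="F")
```

All three results coincide, but the operator version is a single line that mirrors the natural representation and never allocates the $n\times n$ matrix.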