You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
(Policy Improvement) Find policy $\pi^{n + 1}$ that improves upon policy $\pi^n$ by being greedy on $V^{\pi_n}$.
If $\pi^{n+1} = \pi^n$ then stop else update the policy and $n$ and repeat step two onwards.
\begin{corollary}
If $f$ is not $\alpha$-optimal, I can always improve it or if you cannot improve a policy for atleast one state, then it is already $\alpha$-optimal.
\end{corollary}
Contraction Mappings
For any function $u \in B(\mathcal{S})$ let $|u| = \sup_{i\ge 0} |u(i)|$
\begin{note}
$\sup$ used when the set is unbounded for example $u(i) = 1 - \frac{1}{i}$. Here $\max$ doesn't make sense as we never reach 1 (only closer to it).
\end{note}
\begin{definition}
A mapping $T: B(\mathcal{S}) \ra B(\mathcal{S})$ is said to be a \textbf{contraction mapping} if
$$
| Tu - Tv | \le \beta |u - v |
$$
for some $\beta < 1, \ \forall u, v \in B(\mathcal{S})$.
Called contraction as the vectors we get after appling the operator are closer than they were before. (norm decreases).
\end{definition}
\begin{theorem}
If $T: B(\mathcal{S}) \ra B(\mathcal{S})$ is a contraction mapping then there exists a unique function $g\in B(\mathcal{S})$ such that $T g= g$. Furthermore $\forall u \in B( \mathcal{S} ), \ T^n u \ra g$ as $n \ra \far$.
\end{theorem}
\begin{proof}
Check slides.
\end{proof}
This theorem is a special case of Banach fixed-point theorem. Other fixed-point theorems are Brouwer and Kakutani.
Optimality Operator $T_\alpha$
\begin{theorem}
The optimality operator $T_\alpha$
$$
(T_\alpha u)(i) = \max_a \bc{ r(i,a) + \alpha \sum_j P_{ij}(a) u(j)}
$$
is a contraction mapping.
\end{theorem}