<div><div class='wrapper'>
<p><a name='1'></a></p>
<h1>EE 221A: Linear System Theory</h1>
<h2>August 23, 2012</h2>
<h2>Administrivia</h2>
<p>Prof. Claire Tomlin (tomlin@eecs). 721 Sutardja Dai Hall. Somewhat
tentative office hours on schedule: T 1-2, W 11-12.
http://inst.eecs.berkeley.edu/~ee221a</p>
<p>GSI: Insoon Yang (iyang@eecs). Insoon's office hours: M 1:30 - 2:30, Th
11-12.</p>
<p>Homeworks typically due on Thursday or Friday.</p>
<h2>Intro</h2>
<p>Bird's eye view of modeling in engineering + design vs. in science.</p>
<p>"Science":</p>
<p><mathjax>$$
\mbox{Dynamical system}
\rightarrow \mbox{experiments}
\leftrightarrow \mbox{Model}
$$</mathjax></p>
<p>"Engineering":</p>
<p><mathjax>$$
\mbox{dynamical system}
\rightarrow \mbox{experiments}
\leftrightarrow \mbox{Model}
\rightarrow \mbox{control}
$$</mathjax></p>
<p>Control validation, verification, testing.</p>
<p>Broad brush of a couple of concepts: modeling. We're going to spend a lot
of time talking about modeling in this course.</p>
<p>First: for any dynamical system, there's an infinite number of models you
could design, depending on your level of abstraction. Typically, you choose
level of abstraction based on use case. Often, only able to use certain
kinds of experiments. e.g. probing of protein concentration levels. If
you're able to measure just this, then the signals in your model should
have something to do with these concentration levels.</p>
<p>As we said, same phys system can have many different models. Another
example: MEMS device. Can think about having models at various different
levels of abstraction. e.g. electrical model: silicon / electrostatics of the
system. Might be interested in manipulation of the device.</p>
<p>Alt: mechanical model (could have a free-body diagram, e.g.).</p>
<p>Another example: Hubble telescope. Could think of orbital
dynamics. Individual rigid body dynamics. Or properties of the telescope,
the individual optical models of the mirrors and their interactions. The
idea here is to just realize that the word model can mean very different
things. The logical model to use depends on the task at hand. The main
point: a basic tenet of engineering: the value of a model: choose the
simplest model that answers the questions you are asking about the
system. The simplest model that will allow you to predict something that
you didn't build into it.</p>
<p>Predict IO relations that you didn't explicitly design the model on. One of
the properties of a good linear model for a system: it obeys linearity, so
if you form a basis for your domain, then you have the system response to
any input spanned by this basis. Probably the most important thing to take
away from this course: linearity is a <em>very</em> strong principle that allows
us to build up a set of tools.</p>
<h2>Time</h2>
<p>We have this term a "dynamical system". A key part is that it changes with
time, responding with behavior over time. Time will turn out to be quite
important. Depending on how we model time, we can come up with different
models. We call time (t) a privileged variable because it has certain
properties. Namely, when we think about time, we think about time marching
forward (unidirectionality of evolution). Different models: continuous time
(<mathjax>$t \in \Re$</mathjax>, could be negative, could go backwards, if we are interested
in backwards evolution), or discrete time <mathjax>$t \in \{nT, n \in \mathbb{Z}\}$</mathjax>,
where <mathjax>$T$</mathjax> is some sampling time. So in that sense, discrete time, we have
some set. We can also come up with more complicated models of time, like
discrete-time asynchronous. The previous model was some constant period
<mathjax>$T$</mathjax>. In DT asynchronous, we just have a set of points in time. Now becoming
a more important model now with asynchronous processes (react to events
that are going to happen at previously undefined points in time).</p>
<h2>Linear vs. nonlinear models</h2>
<p>More on this later. Suppose we could take the system, and we could
represent it as being in one of a number of states.</p>
<p>First: suppose a finite number of states (so can be modeled by a FSM),
which represent some configuration of the system. State space represents
states system can be in at any point in time. If state space is finite, we
can use a finite-state automaton. Each state has an output (prints out a
message, or a measurement is taken), and we also consider inputs. The
inputs are used to evolve the dynamic system. Input affects a
transition. We can build up the dynamics of the system by just defining the
transition function.</p>
<p>Packet transmitting node: first state is "ready-to-send"; second state is
"send packet &amp; wait"; and the third state is "flush buffer". If buffer
empty, stay in <mathjax>$q_1$</mathjax>. If not empty, transitions to <mathjax>$q_2$</mathjax>. If ACK received,
then transition to <mathjax>$q_3$</mathjax> and return to <mathjax>$q_1$</mathjax>. If <mathjax>$T$</mathjax> time units elapse, we
time out and transition directly to <mathjax>$q_1$</mathjax>. Here, no notion of linear or
nonlinear systems. To be able to talk about linear or nonlinear models, we
need to be able to put some vector space structure on these three
elements. System must then satisfy superposition.</p>
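<p>A minimal sketch of this packet-transmitting automaton in Python (the event names and the exact transition table here are illustrative assumptions, not part of the notes): the dynamics are built up entirely by the transition function.</p>
<pre><code>
# Finite-state automaton for the packet-transmitting node.
# q1 = ready-to-send, q2 = send packet &amp; wait, q3 = flush buffer.
TRANSITIONS = {
    ("q1", "buffer_empty"): "q1",     # nothing to send; stay put
    ("q1", "buffer_nonempty"): "q2",  # send packet and wait
    ("q2", "ack"): "q3",              # ACK received: flush buffer
    ("q2", "timeout"): "q1",          # T time units elapse: back to ready
    ("q3", "flushed"): "q1",          # buffer flushed: ready again
}

def step(state, event):
    """The transition function defines the dynamics entirely."""
    return TRANSITIONS[(state, event)]

state = "q1"
for event in ["buffer_nonempty", "ack", "flushed", "buffer_empty"]:
    state = step(state, event)
print(state)  # "q1"
</code></pre>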
<p>Back to abstract dynamical system (thing we could never hope to model
perfectly): rather than thinking about a set of rules, we're going to think
about a mathematical model. Three classes: CT, DT [synchronous], and
discrete-state (typically finite). Within each of these classes we can
further break each down. For the first two, we can consider linearity, and
we can further break these down into time-varying (TV) and time-invariant
(TI). This course is going to focus just on the linear systems in
continuous and discrete time, both time-varying and time-invariant. We'll
use differential equation models in continuous time and difference equation
models in discrete time. We usually develop in continuous-time and show
analogies in discrete-time.</p>
<h2>Analysis and Control</h2>
<p>Control is pervasive. If you go to any of the control conferences, you see
areas where techniques from this course are applied. Modern control came
about because of aerospace in the 50s. e.g. autopilot, air traffic
control. There the system itself is the system of aircraft. Chemical
process control. Mechatronics, MEMS, robotics. Novel ways to automate
things that hadn't been automated previously, mostly because of a
renaissance in sensing. Power systems. Network control systems: how you
combine models of the system itself with the control models. Quantum
chemistry. Typically, when we think about state spaces, we think about the
state as a vector in <mathjax>$\Re^n$</mathjax>. In many cases, you want to think about the
state spaces as more complicated (e.g. <mathjax>$C^\infty$</mathjax>, the class of smooth
functions).</p>
<h2>Difference between verification, simulation, and validation</h2>
<p>One of the additional basic tenets of this course: if you have a model of
the system, and you can analytically verify that the model behaves in given
ways for ranges of initial conditions, then that is a very valuable thing
to have: you have a proof that as long as the system adheres to the model,
then your model will work as expected. Simulation gives you system behavior
for a certain set of parameters. Very different, but they complement each
other. Analyze simpler models, simulate more complex models.</p>
<h2>Linear Algebra</h2>
<p>Functions and their properties.</p>
<p>Fields, vector spaces, properties and subspaces.</p>
<p>(note regarding notation: <mathjax>$\Re^+$</mathjax> means non-negative reals, as does
<mathjax>$\mathbb{C}_+$</mathjax> (non-negative real part).)</p>
<p><mathjax>$\exists!$</mathjax>: exists a unique, <mathjax>$\exists?$</mathjax>: does there exist, <mathjax>$\ni$</mathjax>:
such that.</p>
<p>Cartesian product: <mathjax>$\{(x,y) \vert x \in X \land y \in Y\}$</mathjax> (set of ordered
n-tuples)</p>
<p><a name='2'></a></p>
<h1>Functions and Vector Spaces</h1>
<h2>August 28, 2012</h2>
<p>OH: M/W 5-6, 258 Cory</p>
<p>Today: beginning of the course: review of lin. alg topics needed for the
course. We're going to go through lecture notes 2 and probably start on
the third set of notes. Will bring copies of 3 and 4 on Thursday.</p>
<p>We did an introduction to notation and topics last time. First topic:
functions, which will be used synonymously with "maps". Terminology will be
used interchangeably.</p>
<p>Given two sets of elements X, Y, we defined <mathjax>$\fn{f}{X}{Y}$</mathjax>. Notion of
range vs. codomain (range is merely the subset of the codomain covered by
f). We define <mathjax>$f(X) \defequals \set{f(x)}{x \in X}$</mathjax> to be the
range.</p>
<h2>Properties of functions</h2>
<p>Injectivity of functions ("one-to-one"). A function <mathjax>$f$</mathjax> is said to be
injective iff the function maps each x in X to a distinct y in
Y. Equivalently, <mathjax>$f(x_1) = f(x_2) \implies x_1 = x_2$</mathjax>. This is also equivalent
(by contraposition) to <mathjax>$x_1 \neq x_2 \implies f(x_1) \neq f(x_2)$</mathjax>.</p>
<p>Surjectivity of functions ("onto"). A function <mathjax>$f$</mathjax> is said to be surjective
if the codomain is equal to the range. Basically, the map <mathjax>$f$</mathjax> covers the
entire codomain. A way to write this formally is that <mathjax>$f$</mathjax> is surjective iff
<mathjax>$\forall y \in Y \exists x \in X \ni y = f(x)$</mathjax>.</p>
<p>And then a map <mathjax>$f$</mathjax> is bijective iff it is both injective and surjective. We
can write this formally as: for each <mathjax>$y \in Y$</mathjax> there is a unique <mathjax>$x \in X$</mathjax> with <mathjax>$f(x) = y$</mathjax>.</p>
<p>Example: inverse of a map. We can talk about left and right inverses of
maps. Suppose we have a map <mathjax>$\fn{f}{X}{Y}$</mathjax>. We're going to define this
map <mathjax>$\mathbb{1}_X$</mathjax> as the identity map on X. Namely, application of this
map to any <mathjax>$x \in X$</mathjax> will yield the same <mathjax>$x$</mathjax>.</p>
<p>The left inverse of <mathjax>$f$</mathjax> is <mathjax>$\fn{g_L}{Y}{X}$</mathjax> such that <mathjax>$g_L \circ f =
\mathbb{1}_X$</mathjax>. In other words, <mathjax>$\forall x\in X, (g_L \circ f)(x) = x$</mathjax>.</p>
<p>Prove: <mathjax>$f$</mathjax> has a left inverse <mathjax>$g_L$</mathjax> iff <mathjax>$f$</mathjax> is injective. First of all, let
us prove the backwards implication. Assume <mathjax>$f$</mathjax> is injective. Prove that
<mathjax>$g_L$</mathjax> exists. We're going to construct the map <mathjax>$\fn{g_L}{Y}{X}$</mathjax> as
<mathjax>$g_L(f(x)) = x$</mathjax>, where the domain here is the range of <mathjax>$f$</mathjax>. In order for
this to be a well-defined function, we require that <mathjax>$x$</mathjax> is unique, which is
met by injectivity of <mathjax>$f$</mathjax>.</p>
<p>Now let us prove the forward implication. Assume that this left inverse
<mathjax>$g_L$</mathjax> exists. By definition, <mathjax>$g_L \circ f = \mathbb{1}_X \iff \forall x
\in X, g_L(f(x)) = x$</mathjax>. If <mathjax>$f$</mathjax> were not injective, then <mathjax>$g_L$</mathjax> would not be
well-defined (<mathjax>$\exists x_1 \neq x_2$</mathjax> such that <mathjax>$f(x_1) = f(x_2)$</mathjax>, and so
<mathjax>$g_L$</mathjax> is no longer a function).</p>
<p>review: contrapositive: <mathjax>$(A \implies B) \iff (\lnot B \implies \lnot A)$</mathjax>;
contradiction: <mathjax>$(A \not\implies B) \implies \text{contradiction}$</mathjax>. </p>
<p>We can similarly show that surjectivity <mathjax>$\iff$</mathjax> existence of a right
inverse. With these two, we can then trivially show that bijectivity <mathjax>$\iff$</mathjax>
existence of an inverse (rather, both a left and right inverse, which we
can easily show must be equal). Proof will likely be part of the first
homework assignment.</p>
<h2>Fields</h2>
<p>We need the definition of a vector and a field in order to define a vector
space.</p>
<p>A field is an object: a set of elements <mathjax>$S$</mathjax> with two closed binary
operations defined upon <mathjax>$S$</mathjax>. These two operations are addition (which
forms an abelian group over <mathjax>$S$</mathjax>) and multiplication (which forms an abelian
group over <mathjax>$S - \{0\}$</mathjax>) such that multiplication distributes over
addition. Note that convention dictates <mathjax>$0$</mathjax> to be the additive identity and
<mathjax>$1$</mathjax> to be the multiplicative identity.</p>
<p>Other silly proofs include showing that if both a left and right identity
exist, they must be equivalent, or that multiplication by <mathjax>$0$</mathjax> maps any
element to <mathjax>$0$</mathjax>.</p>
<h2>Vector spaces (linear spaces)</h2>
<p>A vector space is a set of vectors V and a field of scalars <mathjax>$\mathbb{F}$</mathjax>,
combined with vector addition and scalar multiplication. Vector addition
forms an abelian group, but this time, scalar multiplication has the
properties of a monoid (existence of an identity and associativity). We
then have the distributive laws <mathjax>$(\alpha + \beta)x = \alpha x + \beta x$</mathjax> and
<mathjax>$\alpha (x + y) = \alpha x + \alpha y$</mathjax>.</p>
<h2>Function spaces</h2>
<p>We define a space <mathjax>$F(D,V)$</mathjax>, where <mathjax>$(V, \mathbb{F})$</mathjax> is a vector space and
<mathjax>$D$</mathjax> is a set. <mathjax>$F(D, V)$</mathjax> is the set of all functions <mathjax>$\fn{f}{D}{V}$</mathjax>. Is
<mathjax>$(F, \mathbb{F})$</mathjax> a vector space (yes) where vector addition is pointwise
addition of functions and scalar multiplication is pointwise multiplication
by a scalar?</p>
<p>Examples of this: space of continuous functions on the closed interval
<mathjax>$\fn{\mathcal{C}}{\bracks{t_0, t_1}}{\Re^n}$</mathjax>, (<mathjax>$(C(\bracks{t_0, t_1},
\Re^n), \Re)$</mathjax>). This is indeed a vector space.</p>
<h2>Lebesgue spaces</h2>
<p><mathjax>$L_p([t_0, t_1]) = \set{\fn{f}{[t_0, t_1]}{\Re}}{\int_{t_0}^{t_1}
\abs{f(t)}^p dt &lt; \infty}$</mathjax>.</p>
<p>We can then talk about <mathjax>$\ell_p$</mathjax>, which are spaces of sequences. <mathjax>$\ell_2$</mathjax> is
the space of square-summable sequences of real numbers. Informally, <mathjax>$\ell_2
= \set{ v = \{v_1, v_2, ..., v_k, ...\}}{v_k \in \Re, \sum_k \abs{v_k}^2 &lt;
\infty}$</mathjax>.</p>
<p>In general, when looking at vector spaces, often we use <mathjax>$\mathbb{F} = \Re$</mathjax>,
and we refer to the space as simply <mathjax>$V$</mathjax>.</p>
<p>Next: subspaces, bases, linear dependence/independence, linearity. One of
the main things we're going to do is look at properties of linear functions
and representation as multiplication by matrices.</p>
<p><a name='3'></a></p>
<h1>Vector Spaces and Linearity</h1>
<h2>August 30, 2012</h2>
<h2>From last time</h2>
<p>Subspaces, bases, linear dependence/independence, linearity. One of the
main things we're going to do is look at properties of linear functions and
representation as multiplication by matrices.</p>
<h2>Example (of a vector space)</h2>
<p><mathjax>$\ell_2 = \{v = \{v_1, v_2, ...\} \st \sum_{i=1}^\infty \abs{v_i}^2 &lt;
\infty, v_i \in \Re \}$</mathjax></p>
<p>Vector addition and scalar multiplication? ("pointwise" addition,
multiplication by reals)</p>
<h2>What is a vector subspace?</h2>
<p>Consider vector space <mathjax>$(V, \mathbb{F})$</mathjax>. Consider a subset W of V combined
with the same field. <mathjax>$(W, \mathbb{F})$</mathjax> is a subspace of <mathjax>$(V, \mathbb{F})$</mathjax>
if it is closed under vector addition and scalar multiplication (formally,
this must be a vector space in its own right, but these are the only
vector space properties that we need to check).</p>
<p>Consider vectors from <mathjax>$\Re^n$</mathjax>. A plane (in <mathjax>$\Re^3$</mathjax>) is a subspace of
<mathjax>$\Re^3$</mathjax> if it contains the origin.</p>
<p>Aside: for <mathjax>$x \in V$</mathjax>, <strong>span</strong><mathjax>$(x) = \set{\alpha x}{\alpha \in \mathbb{F}}$</mathjax>.</p>
<h2>Linear dependence, linear independence.</h2>
<p>Consider a set of <mathjax>$p$</mathjax> vectors <mathjax>$\{v_1, v_2, ..., v_p\}, v_i \in V$</mathjax>. This set
of vectors is said to be a <strong>linearly independent set</strong> iff the only
solution to the homogeneous equation is the trivial one, i.e. <mathjax>$\sum_i \alpha_i v_i = 0 \implies \forall
i, \alpha_i = 0$</mathjax>. This is equivalent to saying that no one vector can be
written as a linear combination of the others.</p>
<p>Otherwise, the set is said to be <strong>linearly dependent</strong>.</p>
<h2>Bases</h2>
<p>Recall: a set of vectors <mathjax>$W$</mathjax> is said to span a space <mathjax>$(V, \mathbb{F})$</mathjax> if
any vector in the space can be written as a <strong>linear combination</strong> of
vectors in the set, i.e. <mathjax>$\forall v \in V, \exists \set{(\alpha_i,
w_i)}{v = \sum \alpha_i w_i}$</mathjax> for <mathjax>$w_i \in W, \alpha_i \in \mathbb{F}$</mathjax>.</p>
<p>W is a <strong>basis</strong> iff it is also linearly independent.</p>
<h2>Coordinates</h2>
<p>Given a basis <mathjax>$B$</mathjax> of a space <mathjax>$(V, \mathbb{F})$</mathjax>, there is a unique
representation (trivial proof) of every <mathjax>$v \in V$</mathjax> as a linear combination
of elements of <mathjax>$B$</mathjax>. We define our <strong>coordinates</strong> to be the coefficients
that appear in this unique representation. A visual representation is the
<strong>coordinate vector</strong>, which defines</p>
<p><mathjax>$$\alpha = \begin{bmatrix}\alpha_1 \\ \vdots \\ \alpha_n \end{bmatrix}$$</mathjax></p>
<p>Basis is not uniquely defined, but what is constant is the number of
elements in the basis. This number is the <strong>dimension</strong> (rank) of the
space. Another notion is that a basis generates the corresponding space,
since once you have a basis, you can acquire any element in the space.</p>
<h2>Linearity</h2>
<p>A function <mathjax>$\fn{f}{(V, \mathbb{F})}{(W, \mathbb{F})}$</mathjax> (note that these
spaces are defined over the same field!) is <strong>linear</strong> iff <mathjax>$f(\alpha_1 v_1 +
\alpha_2 v_2) = \alpha_1 f(v_1) + \alpha_2 f(v_2)$</mathjax>.</p>
<p>This property is known as <strong>superposition</strong>, which is an amazing property,
because if you know what this function does to the basis elements of a
vector space, then you know what it does to <em>any</em> element in the space.</p>
<p>An interesting corollary is that a linear map will <em>always</em> map the zero
vector to itself.</p>
<h2>Definitions associated with linear maps</h2>
<p>Suppose we have a linear map <mathjax>$\fn{\mathcal{A}}{U}{V}$</mathjax>. The <strong>range
(image)</strong> of <mathjax>$\mathcal{A}$</mathjax> is defined to be <mathjax>$R(\mathcal{A}) = \set{v}{v =
A(u), u \in U} \subset V$</mathjax>. The <strong>nullspace (kernel)</strong> of <mathjax>$\mathcal{A}$</mathjax> is
defined to be <mathjax>$N(\mathcal{A}) = \set{u}{\mathcal{A}(u) = 0} \subset U$</mathjax>. Also
trivial (from definition of linearity) to prove that these are subspaces.</p>
<p>We have a couple of very important properties now that we've defined range
and nullspace.</p>
<h2>Properties of linear maps <mathjax>$\fn{\mathcal{A}}{U}{V}$</mathjax></h2>
<p><mathjax>$$(b \in V) \implies (\mathcal{A}(u) = b \iff b \in R(\mathcal{A}))$$</mathjax></p>
<p><mathjax>$$(b \in R(\mathcal{A})) \implies (\exists!\ u\ \st \mathcal{A}(u) = b \iff
N(\mathcal{A}) = \{0\})$$</mathjax></p>
<p>(if the nullspace only contains the zero vector, we say it is <strong>trivial</strong>)</p>
<p><mathjax>$$\mathcal{A}(x_0) = \mathcal{A}(x_1) \iff x_1 - x_0 \in N(\mathcal{A})$$</mathjax></p>
<p><a name='4'></a></p>
<h1>Matrix Representation of Linear Maps</h1>
<h2>September 4, 2012</h2>
<h2>Today</h2>
<p>Matrix multiplication as a representation of a linear map; change of basis
-- what happens to matrices; norms; inner products. We may get to adjoints
today.</p>
<p>Last time, we talked about the concept of the range and the nullspace of a
linear map, and we ended with a relationship that related properties of the
nullspace to properties of the linear equation <mathjax>$\mathcal{A}(x) = b$</mathjax>. As
we've written here, this is not <em>matrix</em> multiplication. As we'll see
today, it can be represented as matrix multiplication, in which case, we'll
write this as <mathjax>$Ax = b$</mathjax>.</p>
<p>There's one more important result, called the rank-nullity theorem. We
defined the range and nullspace of a linear operator. We also showed that
these are subspaces (range of codomain; nullspace of domain). We call
<mathjax>$\text{dim}(R(\mathcal{A})) = \text{rank}(\mathcal{A})$</mathjax> and
<mathjax>$\text{dim}(N(\mathcal{A})) = \text{nullity}(\mathcal{A})$</mathjax>. Taking the
dimension of the domain as <mathjax>$n$</mathjax> and the dimension of the codomain as <mathjax>$m$</mathjax>,
<mathjax>$\text{rank}(\mathcal{A}) + \text{nullity}(\mathcal{A}) = n$</mathjax>. Left as an
exercise. Hints: choose a basis for the nullspace. Presumably you'd extend
it to a basis for the domain (without loss of generality, because any set
of <mathjax>$n$</mathjax> linearly independent vectors will form a basis). Then consider how
these relate to the range of <mathjax>$\mathcal{A}$</mathjax>. Then map <mathjax>$\mathcal{A}$</mathjax> over
this basis.</p>
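<p>A quick numerical illustration of the rank-nullity theorem (a numpy sketch; the example matrix is an arbitrary choice):</p>
<pre><code>
import numpy as np

# Rank-nullity: rank(A) + nullity(A) = n, the dimension of the domain.
A = np.array([[1., 2., 3.],
              [2., 4., 6.],   # dependent row, so the rank drops
              [0., 1., 1.]])
n = A.shape[1]
rank = np.linalg.matrix_rank(A)
nullity = n - rank            # dimension of the nullspace
print(rank, nullity)          # 2 1
</code></pre>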
<h2>Matrix representation</h2>
<p><strong>Any linear map between finite-dimensional vector spaces can be
represented as matrix multiplication.</strong> We're going to show that it's true
via construction.</p>
<p><mathjax>$\fn{\mathcal{A}}{U}{V}$</mathjax>. We're going to choose bases for the domain and
codomain. <mathjax>$\forall x \in U, x = \sum_{j=1}^n \xi_j u_j$</mathjax>. Now consider
<mathjax>$\mathcal{A}(x) = \mathcal{A}(\sum_{j=1}^n \xi_j u_j) = \sum_{j=1}^n \xi_j
\mathcal{A}(u_j)$</mathjax> (through linearity). Each <mathjax>$\mathcal{A}(u_j) =
\sum_{i=1}^m a_{ij} v_i$</mathjax>. Uniqueness of <mathjax>$a_{ij}$</mathjax> and <mathjax>$\xi_j$</mathjax> follows from
writing the vector spaces in terms of a basis.</p>
<p><mathjax>$$
\mathcal{A}(x) = \sum_{j=1}^n \xi_j \sum_{i=1}^m a_{ij} v_i
\\ = \sum_{i=1}^m \left(\sum_{j=1}^n a_{ij} \xi_j\right) v_i
\\ = \sum_{i=1}^m \eta_i v_i
$$</mathjax></p>
<p>Uniqueness of representation tells me that <mathjax>$\eta_i \equiv \sum_{j=1}^n
a_{ij} \xi_j$</mathjax>. We've got <mathjax>$i = \{1 .. m\}$</mathjax> and <mathjax>$j = \{1 .. n\}$</mathjax>. We can turn
this representation into a matrix by defining <mathjax>$\eta = A\xi$</mathjax>. <mathjax>$A \in
\mathbb{F}^{m \times n}$</mathjax> is defined such that its <mathjax>$j^{\text{th}}$</mathjax> column is
<mathjax>$\mathcal{A}(u_j)$</mathjax> written with respect to the <mathjax>$v_i$</mathjax>s.</p>
<p>All we used here was the definitions of basis, coordinate vectors, and
linearity.</p>
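<p>A sketch of this construction in numpy (the example map is an arbitrary choice; both bases here are standard, so codomain coordinates are just the images themselves):</p>
<pre><code>
import numpy as np

def matrix_of(linear_map, n):
    """Column j of the matrix is the image of the j-th basis vector."""
    return np.column_stack([linear_map(np.eye(n)[:, j]) for j in range(n)])

f = lambda x: np.array([2 * x[0] - x[1], x[0] + x[1]])  # a linear map on R^2
A = matrix_of(f, 2)
x = np.array([3., 4.])
print(np.allclose(A @ x, f(x)))  # True: matrix multiplication represents f
</code></pre>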
<p>Let's do a couple of examples. Foreshadowing of work later in
controllability of systems. Consider a linear map <mathjax>$\fn{\mathcal{A}}
{(\Re^n, \Re)}{(\Re^n, \Re)}$</mathjax>. Try to derive the matrix <mathjax>$A \in \Re^{n
\times n}$</mathjax>. Both the domain and codomain have as basis <mathjax>$\{b,
\mathcal{A}(b), \mathcal{A}^2(b), ..., \mathcal{A}^{n-1}(b)\}$</mathjax>, where <mathjax>$b
\in \Re^n$</mathjax> and <mathjax>$\mathcal{A}^n = -\sum_{i=1}^n \alpha_i \mathcal{A}^{n-i}$</mathjax>. Your task is
to show that the representation of <mathjax>$b$</mathjax> and <mathjax>$\mathcal{A}$</mathjax> is:</p>
<p><mathjax>$$
\bar{b} = \begin{bmatrix}1 \\ 0 \\ \vdots \\ 0\end{bmatrix}
\\ \bar{A} = \begin{bmatrix}
0 &amp; 0 &amp; \dots &amp; 0 &amp; -\alpha_n
\\ 1 &amp; 0 &amp; \dots &amp; 0 &amp; -\alpha_{n-1}
\\ 0 &amp; 1 &amp; \dots &amp; 0 &amp; -\alpha_{n-2}
\\ \vdots &amp; \vdots &amp; \ddots &amp; \vdots &amp; \vdots
\\ 0 &amp; 0 &amp; \dots &amp; 1 &amp; -\alpha_1
\end{bmatrix}
$$</mathjax></p>
<p>This is really quite simple; it's almost by definition.</p>
<p>Note that these are composable maps, where composition corresponds to
matrix multiplication.</p>
<h2>Change of basis</h2>
<p>Consider we have <mathjax>$\fn{\mathcal{A}}{U}{V}$</mathjax> and two sets of bases for the
domain and codomain. There exist maps between the first set of bases and
the second set; composing those appropriately will give you your change of
basis. Essentially, do a change of coordinates to those in which <mathjax>$A$</mathjax> is
defined (represented as <mathjax>$P$</mathjax>), apply <mathjax>$A$</mathjax>, then change the coordinates
of the codomain back (represented as <mathjax>$Q$</mathjax>). Thus <mathjax>$\bar{A} = QAP$</mathjax>.</p>
<p>If <mathjax>$V = U$</mathjax>, then you can easily derive that <mathjax>$Q = P^{-1}$</mathjax>, so <mathjax>$\bar{A} =
P^{-1}AP$</mathjax>.</p>
<p>We consider this transformation (<mathjax>$\bar{A} = QAP$</mathjax>) to be a <strong>similarity
transformation</strong>, and <mathjax>$A$</mathjax> and <mathjax>$\bar{A}$</mathjax> are called <strong>similar</strong>
(<strong>equivalent</strong>).</p>
<p>We derived these two matrices from the same linear map, but they're derived
using different bases.</p>
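<p>A small numerical check of this (numpy sketch; the matrices are arbitrary, using the <mathjax>$V = U$</mathjax> case <mathjax>$\bar{A} = P^{-1}AP$</mathjax>):</p>
<pre><code>
import numpy as np

A = np.array([[2., 1.],
              [0., 3.]])
P = np.array([[1., 1.],
              [0., 1.]])           # invertible change of basis
A_bar = np.linalg.inv(P) @ A @ P   # same map, represented in the new basis

# Similar matrices share eigenvalues, since they represent one linear map.
print(np.sort(np.linalg.eigvals(A)), np.sort(np.linalg.eigvals(A_bar)))
</code></pre>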
<p>Proof of Sylvester's inequality on homework 2.</p>
<p>One last note about the dimension of the rank of a linear map, which
corresponds to the rank of the associated matrix representation: that is
<mathjax>$\text{dim}(R(A)) = \text{dim}(R(\mathcal{A}))$</mathjax>. Similarly, <mathjax>$\text
{nullity}(A) = \text{dim}(\text{nullspace}(A)) = \text{dim}(\text
{nullspace}(\mathcal{A}))$</mathjax>.</p>
<p>Sylvester's inequality, which is an important relationship, says the
following: <strong>Suppose you have <mathjax>$A \in \mathbb{F}^{m \times n}$</mathjax>, <mathjax>$B \in
\mathbb{F}^{n \times p}$</mathjax>, then <mathjax>$AB \in \mathbb{F}^{m \times p}$</mathjax>, then
<mathjax>$\text{rk}(A) + \text{rk}(B) - n \le \text{rk}(AB) \le \min(\text{rk}(A),
\text{rk}(B))$</mathjax>.</strong> On the homework, you'll have to show both
inequalities. Note at the end about elementary row operations.</p>
<p>Next important concept about vector spaces: that of norms.</p>
<h2>Norms</h2>
<p>With some vector spaces, you can associate some entity called a norm. We
can then speak of a <strong>normed vector space</strong> (which, with the metric
induced by its norm, is also a <strong>metric space</strong>). Suppose you have a vector space <mathjax>$(V, \mathbb{F})$</mathjax>, where
<mathjax>$\mathbb{F}$</mathjax> is either <mathjax>$\Re$</mathjax> or <mathjax>$\mathbb{C}$</mathjax>. This is a normed space if you
can find <mathjax>$\fn{\mag{\cdot}}{V}{\Re_+}$</mathjax> that satisfies the following axioms:</p>
<p><mathjax>$\mag{v_1 + v_2} \le \mag{v_1} + \mag{v_2}$</mathjax></p>
<p><mathjax>$\mag{\alpha v} = \abs{\alpha}\mag{v}$</mathjax></p>
<p><mathjax>$\mag{v} = 0 \iff v = \theta$</mathjax></p>
<p>We have some common norms on these fields:</p>
<p><mathjax>$\mag{x}_1 = \sum_{i=1}^n \abs{x_i}$</mathjax> (<mathjax>$\ell_1$</mathjax>)</p>
<p><mathjax>$\mag{x}_2 = \left(\sum_{i=1}^n \abs{x_i}^2\right)^{1/2}$</mathjax> (<mathjax>$\ell_2$</mathjax>)</p>
<p><mathjax>$\mag{x}_p = \left(\sum_{i=1}^n \abs{x_i}^p\right)^{1/p}$</mathjax> (<mathjax>$\ell_p$</mathjax>)</p>
<p><mathjax>$\mag{x}_\infty = \max \abs{x_i}$</mathjax> (<mathjax>$\ell_\infty$</mathjax>)</p>
<p>One of the most important norms that we'll be using: the <strong>induced norm</strong>
is that induced by a linear operator. We'll define <mathjax>$\mathcal{A}$</mathjax> to be a
continuous linear map between two metric spaces; the induced norm is
defined as</p>
<p><mathjax>$$ \mag{\mathcal{A}}_i = \sup_{u \neq \theta}
\frac{\mag{\mathcal{A}u}_V}{\mag{u}_U} $$</mathjax></p>
<p>From analysis: the <strong>supremum</strong> is the least upper bound (the smallest
<mathjax>$x$</mathjax> such that <mathjax>$x \ge y, \forall y \in S$</mathjax>).</p>
<p><a name='5'></a></p>
<h1>Guest Lecture: Induced Norms and Inner Products</h1>
<h2>September 6, 2012</h2>
<h2>Induced norms of matrices</h2>
<p>The reason that we're going to start talking about induced norms: today
we're just going to build abstract algebra machinery, and at the end, we'll
do the first application: least squares. We'll see why we need this
machinery and why abstraction is a useful tool.</p>
<p>The idea is that we want to find a norm on a matrix using existing norms on vectors.</p>
<p>Let 1) <mathjax>$\fn{A}{(U,\mathbb{F})}{(V,\mathbb{F})}$</mathjax>, 2) let U have the norm <mathjax>$\mag{\ }_u$</mathjax>, 3) let
V have the norm <mathjax>$\mag{\ }_v$</mathjax>. Let the <strong>induced norm</strong> be <mathjax>$\mag{A}_{u,v} =
\sup_{x\neq 0} \frac{\mag{Ax}_v}{\mag{x}_u}$</mathjax>. Theorem: the induced norm is
a norm. Not going to bother showing positive homogeneity and triangle
inequality (trivial in this case). Only going to show last property:
separates points. Essentially, <mathjax>$\mag{A}_{u,v} = 0 \iff A = 0$</mathjax>. The reason
that this is not necessarily trivial is because of the supremum. It's a
complex operator that's trying to maximize this function over an infinite
set of points. It's possible that the supremum does not actually exist at a
finite point.</p>
<p>The first direction is easy: if <mathjax>$A$</mathjax> is zero, then its norm is 0 (by
definition -- numerator is 0).</p>
<p>The second direction is a hard one. If <mathjax>$\mag{A}_{u,v} = 0$</mathjax>, then given any
<mathjax>$x \neq 0$</mathjax>, it holds that <mathjax>$\frac{\mag{Ax}_v}{\mag{x}_u} \le 0$</mathjax> (from the
definition of supremum). The denominator must be positive (being the
norm of a nonzero vector), and the numerator must be nonnegative (also being a
norm). Thus the ratio is also bounded below by zero, which means that the
numerator is zero for all nonzero x. Thus everything is in the nullspace of
<mathjax>$A$</mathjax>, which means that <mathjax>$A$</mathjax> is zero.</p>
<p>Proposition: the induced norm has (a) <mathjax>$\mag{Ax}_v \le \mag{A}_{u,v}
\mag{x}_u$</mathjax>; (b) <mathjax>$\mag{AB}_{u,v} \le \mag{A}_{u,v} \mag{B}_{u,v}$</mathjax>. (b)
follows from (a).</p>
<p>Not emphasized in Claire's notes: induced norms form a small amount of all
possible norms on matrices.</p>
<p>Examples of induced norms:</p>
<ul>
<li><mathjax>$\mag{A}_{1,1} = \max_j \sum_i \abs{a_{ij}}$</mathjax>: maximum column sum;</li>
<li><mathjax>$\mag{A}_{2,2} = \max_j \sqrt{\lambda_j(A^*A)}$</mathjax>: maximum singular value;</li>
<li><mathjax>$\mag{A}_{\infty, \infty} = \max_i \sum_j \abs{a_{ij}}$</mathjax>: maximum row sum.</li>
</ul>
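<p>These formulas are easy to check numerically; a numpy sketch with an arbitrary example matrix:</p>
<pre><code>
import numpy as np

A = np.array([[1., -2.],
              [3.,  4.]])

print(np.linalg.norm(A, 1), np.abs(A).sum(axis=0).max())       # column sums
print(np.linalg.norm(A, np.inf), np.abs(A).sum(axis=1).max())  # row sums
print(np.linalg.norm(A, 2),
      np.sqrt(np.linalg.eigvalsh(A.T @ A).max()))              # max singular value
</code></pre>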
<p>Other matrix norms exist that are not induced by vector norms: a notable
special case is the family of Schatten norms. (a) The Frobenius norm
<mathjax>$\sqrt{\text{trace}(A^* A)}$</mathjax>, which is also the square root of the sum of
the squared singular values. The Schatten norms also give a convenient way
to write the nuclear norm (the sum of the singular values).</p>
<p>These appear in statistical regularization: the Frobenius norm is analogous to <mathjax>$\ell_2$</mathjax>
regularization; the nuclear norm is analogous to <mathjax>$\ell_1$</mathjax> regularization. It is
important to be aware that these other norms exist.</p>
<h2>Sensitivity analysis</h2>
<p>Nice application of norms, but we won't see that it's a nice application
until later.</p>
<p>Computation for numerical linear algebra.</p>
<p>Some algebra can be performed to show that if <mathjax>$Ax_0 = b$</mathjax> (when <mathjax>$A$</mathjax> is
invertible), then for <mathjax>$(A + \delta A)(x_0 + \delta x) = b + \delta b$</mathjax>, we
have an approximate bound of <mathjax>$\frac{\mag{\delta x}}{\mag{x_0}} \le
\mag{A}\mag{A^{-1}} \bracks{\frac{\mag{\delta A}}{\mag{A}} +
\frac{\mag{\delta b}}{\mag{b}}}$</mathjax>. Need to engineer computation to improve
the situation. Namely, we're perturbing <mathjax>$A$</mathjax> and <mathjax>$b$</mathjax> slightly: how much can the
solution vary? In some sense, we have a measure of effect
(<mathjax>$\mag{A}\mag{A^{-1}}$</mathjax>) and a measure of perturbation. The first quantity
is important enough that people in linear algebra have defined it and
called it a <strong>condition number</strong>: <mathjax>$\kappa(A) = \mag{A}\mag{A^{-1}} \ge
1$</mathjax>. The best you can do is 1. If you have a condition number of 1, your
system is well-conditioned and very robust to perturbations. Larger
condition number will mean less robustness to perturbation.</p>
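<p>A numerical sketch of this sensitivity result (numpy assumed; the nearly-singular example matrix is an illustrative choice):</p>
<pre><code>
import numpy as np

A = np.array([[1., 1.],
              [1., 1.001]])        # nearly singular, so badly conditioned
b = np.array([2., 2.001])
x0 = np.linalg.solve(A, b)

db = np.array([0., 1e-4])          # small perturbation of b only
x1 = np.linalg.solve(A, b + db)

kappa = np.linalg.cond(A)          # ||A|| ||A^{-1}|| in the 2-norm
rel_out = np.linalg.norm(x1 - x0) / np.linalg.norm(x0)
rel_in = np.linalg.norm(db) / np.linalg.norm(b)
print(kappa, rel_out / rel_in)     # amplification is bounded by kappa
</code></pre>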
<h2>More machinery: Inner Product &amp; Hilbert Spaces</h2>
<p>Consider a linear space <mathjax>$(H, \mathbb{F})$</mathjax>, and define a function
<mathjax>$\fn{\braket{\cdot}{\cdot}}{H \times H}{\mathbb{F}}$</mathjax>. This function is an
inner product if it satisfies the following properties.</p>
<ul>
<li>Conjugate symmetry. <mathjax>$\braket{x}{y} = \braket{y}{x}^*$</mathjax>.</li>
<li>Homogeneity. <mathjax>$\braket{x}{\alpha y} = \alpha \braket{x}{y}$</mathjax>.</li>
<li>Linearity. <mathjax>$\braket{x}{y + z} = \braket{x}{y} + \braket{x}{z}$</mathjax>.</li>
<li>Positive definiteness. <mathjax>$\braket{x}{x} \ge 0$</mathjax>, where equality only occurs
when <mathjax>$x = 0$</mathjax>.</li>
</ul>
<p>Inner product spaces have a natural norm (might not be the official name),
and that's the norm induced by the inner product.</p>
<p>One can define <mathjax>$\mag{x}^2 = \braket{x}{x}$</mathjax>, which satisfies the axioms of a
norm.</p>
<p>Examples of Hilbert spaces: finite-dimensional vectors. Most of the time,
results for infinite-dimensional Hilbert spaces match up with the
finite-dimensional case. All linear operators on finite-dimensional vector
spaces are continuous because they can be written as a matrix (not always
the case with infinite-dimensional vector
spaces). Suppose I have the field <mathjax>$\mathbb{F}$</mathjax>; <mathjax>$(\mathbb{F}^n,
\mathbb{F})$</mathjax>, where the inner product <mathjax>$\braket{x}{y} = \sum_i \bar{x_i}
y_i$</mathjax>, but another important inner product space is the space of
square-integrable functions, <mathjax>$L^2([t_0, t_1], \mathbb{F}^n
)$</mathjax>. Infinite-dimensional space which actually is the space spanned by
Fourier series. It turns out that the inner product (of functions) is
<mathjax>$\int_{t_0}^{t_1} f(t)^* g(t) dt$</mathjax>.</p>
<p>We're going to power through a little more machinery, but we're getting
very close to the application. Need to go through adjoints and
orthogonality before we can start doing applications.</p>
<h2>Adjoints</h2>
<p>Consider Hilbert spaces <mathjax>$(U, \mathbb{F}, \braket{}{}_u), (V, \mathbb{F},
\braket{}{}_v)$</mathjax>, and let <mathjax>$\fn{A}{U}{V}$</mathjax> be a continuous linear
function. The <strong>adjoint</strong> of <mathjax>$A$</mathjax> is denoted <mathjax>$A^*$</mathjax> and is the map
<mathjax>$\fn{A^*}{V}{U}$</mathjax> such that <mathjax>$\braket{x}{Ay}_v = \braket{A^*x}{y}_u$</mathjax>.</p>
<p>Reasoning? Sometimes you can simplify things. Suppose <mathjax>$A$</mathjax> maps an
infinite-dimensional space to a finite-dimensional space (e.g. functions to
numbers). In some sense, you can convert that function into something that
goes from real numbers to functions on numbers. Generalization of the
Hermitian transpose.</p>
<p>Consider functions <mathjax>$f, g \in C([t_0, t_1], \Re^n)$</mathjax>. What is the adjoint of
<mathjax>$\fn{A}{C([t_0, t_1], \Re^n)}{\Re}$</mathjax>, where <mathjax>$A(f) = \braket{g}{f}_{C
([t_0, t_1], \Re^n)}$</mathjax>? (aside: this notion of the adjoint will be very
important when we get to observability and reachability)</p>
<p>Observe that <mathjax>$\braket{v}{A(f)}_\Re = v \cdot A(f) = v \braket{g}{f}_C = \braket{v
g}{f}_C$</mathjax>, and so consequently, we have that the adjoint is <mathjax>$A^*[v] = v g$</mathjax>.</p>
<h2>Orthogonality</h2>
<p>With Hilbert spaces, one can define orthogonality in an axiomatic manner (a
more abstract form, rather). Let <mathjax>$(H, \mathbb{F}, \braket{}{})$</mathjax> be a
Hilbert space. Two vectors <mathjax>$x, y$</mathjax> are defined to be <strong>orthogonal</strong> if
<mathjax>$\braket{x}{y} = 0$</mathjax>.</p>
<p>Cute example: suppose <mathjax>$c = a + b$</mathjax> and <mathjax>$a, b$</mathjax> are orthogonal. In fact,
<mathjax>$\mag{c}^2 = \mag{a + b}^2 = \braket{a + b}{a + b} = \braket{a}{a} +
\braket{b}{b} + \braket{a}{b} + \braket{b}{a} = \mag{a}^2 +
\mag{b}^2$</mathjax>. Cute because the result is the Pythagorean theorem, which we
got just through these axioms.</p>
<p>One more thing: the orthogonal complement of a subspace <mathjax>$M$</mathjax> in a Hilbert
space is defined as <mathjax>$M^\perp = \set{y \in H}{\forall x \in M,
\braket{x}{y} = 0}$</mathjax>.</p>
<p>We are at a point now where we can talk about an important theorem:</p>
<h2>Fundamental Theorem of Linear Algebra (partially)</h2>
<p>Let <mathjax>$A \in \Re^{m \times n}$</mathjax>. Then:</p>
<ul>
<li><mathjax>$R(A) \perp N(A^T)$</mathjax></li>
<li><mathjax>$R(A^T) \perp N(A)$</mathjax></li>
<li><mathjax>$R(AA^T) = R(A)$</mathjax></li>
<li><mathjax>$R(A^TA) = R(A^T)$</mathjax></li>
<li><mathjax>$N(AA^T) = N(A)$</mathjax></li>
<li><mathjax>$N(A^TA) = N(A^T)$</mathjax></li>
</ul>
<p>Proofs:</p>
<ul>
<li>
<p>Given any <mathjax>$x \in \Re^n, y \in \Re^m \st A^T y = 0$</mathjax> (<mathjax>$y \in N(A^T)$</mathjax>),
consider the quantity <mathjax>$\braket{y}{Ax} = \braket{A^Ty}{x} = 0$</mathjax>.</p>
</li>
<li>
<p>Given any <mathjax>$x \in \Re^n, \exists y \in \Re^m \st x = A^T y + z$</mathjax>, where <mathjax>$z
\in N(A)$</mathjax> (as a result of the decomposition above). Thus <mathjax>$Ax =
AA^Ty$</mathjax>. This implies that <mathjax>$R(A) \subset R(A A^T)$</mathjax>.</p>
</li>
</ul>
<p>Now for the application.</p>
<h2>Application: Least Squares</h2>
<p>Consider the following problem: minimize <mathjax>$\mag{y - Ax}_2$</mathjax>, where <mathjax>$y \not\in
R(A)$</mathjax>. If <mathjax>$y$</mathjax> were in the range of A, and A were invertible, the solution
would be trivial (<mathjax>$A^{-1}y$</mathjax>). In many problems, <mathjax>$A \in \Re^{m\times n}$</mathjax>,
where <mathjax>$m \gg n$</mathjax>, <mathjax>$y \in \Re^m$</mathjax>, <mathjax>$x \in \Re^n$</mathjax>.</p>
<p>Since we cannot solve <mathjax>$Ax = y$</mathjax>, we instead solve <mathjax>$Ax = \hat{y}$</mathjax>. According
to our intuition, we would like <mathjax>$y - \hat{y}$</mathjax> to be orthogonal to
<mathjax>$R(A)$</mathjax>. From the preceding (partial) theorem, this means that <mathjax>$y - \hat{y}
\in N(A^T) \iff A^T(y - \hat{y}) = 0$</mathjax>. Remember: what we really want to solve
is <mathjax>$A^T(y - Ax) = 0 \implies A^T Ax = A^T y \implies x = (A^T A)^{-1} A^T
y$</mathjax> if <mathjax>$A^T A$</mathjax> is invertible.</p>
<p>If A has full column-rank (that is, for <mathjax>$A \in \Re^{m \times n}$</mathjax>, we have
<mathjax>$\text{rank}(A) = n$</mathjax>), then this means that in fact <mathjax>$N(A) = \{0\}$</mathjax>, and the preceding
theorem implies that the dimension of <mathjax>$R(A^T) = n$</mathjax>, which means that the
dimension of <mathjax>$R(A^T A) = n$</mathjax>. However, <mathjax>$A^T A \in \Re^{n \times n}$</mathjax>. Thus,
<mathjax>$A^T A$</mathjax> is invertible.</p>
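<p>A numpy sketch of the normal-equations solution alongside the library solver (random data purely for illustration):</p>
<pre><code>
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 3
A = rng.standard_normal((m, n))   # tall matrix: full column rank (generically)
y = rng.standard_normal(m)        # y is almost surely not in R(A)

x_ne = np.linalg.solve(A.T @ A, A.T @ y)        # normal equations
x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)    # library least squares
print(np.allclose(x_ne, x_ls))                  # True

# The residual is orthogonal to R(A): A^T (y - A x) = 0.
print(np.allclose(A.T @ (y - A @ x_ne), 0.))    # True
</code></pre>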
<h2>Back to condition numbers (special case)</h2>
<p>Consider a self-adjoint and invertible matrix in <mathjax>$\Re^{n \times
n}$</mathjax>. <mathjax>$\hat{x} = (A^T A)^{-1} A^T y = A^{-1} y$</mathjax>. We have two ways of
determining this value: the overdetermined least-squares solution and the
standard inverse. Let us look at the condition numbers.</p>
<p><mathjax>$\kappa(A^T A) = \mag{A^T A}\mag{(A^T A)^{-1}} = \mag{A^2}\mag{(A^{-1})^2}
= \bracks{\kappa(A)}^2$</mathjax>. This result is more general: also applies in the
<mathjax>$L^2$</mathjax> case even if <mathjax>$A$</mathjax> is not self-adjoint. As you can see, this is worse
than if we simply use the inverse.</p>
<h2>Gram-Schmidt orthonormalization</h2>
<p>This is a theoretical toy, not used for computation (numerics are very bad).</p>
<p>More definitions:</p>
<p>A <em>set</em> of vectors S is <strong>orthogonal</strong> if <mathjax>$x \perp y \forall x
\neq y$</mathjax> and <mathjax>$x, y \in S$</mathjax>.</p>
<p>The set is <strong>orthonormal</strong> if also <mathjax>$\mag{x} = 1, \forall x \in S$</mathjax>. Why do we
care about orthonormality? Consider Parseval's theorem. The reason you get
that theorem is that the bases are required to be orthonormal so that you
can get that result. Otherwise it wouldn't be as clean. That's typically
why people like orthonormal bases: you can represent your vectors as just
coefficients (and you don't need to store the length of the vectors).</p>
<p>We conclude with an example of Gram-Schmidt orthonormalization. Consider
the space <mathjax>$L^2([t_0, t_1], \Re)$</mathjax>. Suppose I have <mathjax>$v_1 = 1, v_2 = t, v_3 =
t^2$</mathjax>, <mathjax>$t_0 = 0$</mathjax>, <mathjax>$t_1 = 1$</mathjax>, and <mathjax>$\mag{v_1}^2 = \int_0^1 1 \cdot 1 dt =
1$</mathjax>. The key idea of Gram-Schmidt orthonormalization is the following: start
with <mathjax>$b_1 \equiv \frac{v_1}{\mag{v_1}}$</mathjax>. Then go on with <mathjax>$b_2 = \frac{v_2 -
\braket{v_2}{b_1}b_1}{\mag{v_2 - \braket{v_2}{b_1}b_1}}$</mathjax>, and repeat until
you're done (in essence: you want to preserve only the component that is
orthogonal to the space spanned by the vectors you've computed so far, then
renormalize).</p>
<p>Basically, you get after all this computation that <mathjax>$b_2 = \sqrt{12}\left(t -
\frac{1}{2}\right)$</mathjax>. Same construction for <mathjax>$b_3$</mathjax>.</p>
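<p>A sketch of the same computation using sympy for the exact integrals (assuming sympy is available):</p>
<pre><code>
import sympy as sp

t = sp.symbols('t')
inner = lambda f, g: sp.integrate(f * g, (t, 0, 1))  # L^2([0,1]) inner product
norm = lambda f: sp.sqrt(inner(f, f))

basis = []
for v in [sp.Integer(1), t, t**2]:
    # keep only the component orthogonal to the span so far, then normalize
    w = v - sum(inner(v, b) * b for b in basis)
    basis.append(sp.simplify(w / norm(w)))

print(basis)  # [1, sqrt(3)*(2*t - 1), sqrt(5)*(6*t**2 - 6*t + 1)]
</code></pre>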
<p><a name='6'></a></p>
<h1>Singular Value Decomposition &amp; Introduction to Differential Equations</h1>
<h2>September 11, 2012</h2>
<p>Reviewing the adjoint, suppose we have two vector spaces <mathjax>$U, V$</mathjax>; like we
have with norms, let us associate with each a field that is either <mathjax>$\Re$</mathjax> or
<mathjax>$\mathbb{C}$</mathjax>. Assume that these spaces are inner product spaces (we're
associating with each an inner product). Suppose we have a continuous
(linear) map <mathjax>$\fn{\mathcal{A}}{U}{V}$</mathjax>. We define the <strong>adjoint</strong> of this
map to be <mathjax>$\fn{\mathcal{A}^*}{V}{U}$</mathjax> such that <mathjax>$\braket{v}{\mathcal{A} u}_V =
\braket{\mathcal{A}^* v}{u}_U$</mathjax>.</p>
<p>We define <strong>self-adjoint</strong> maps as maps that are equal to their adjoints,
i.e. <mathjax>$\fn{\mathcal{A}}{U}{U} \st \mathcal{A} = \mathcal{A}^*$</mathjax>.</p>
<p>In finite-dimensional vector spaces, the adjoint of a map is equivalent to
the conjugate transpose of the matrix representation of the map. We refer
to matrices that correspond to self-adjoint maps as <strong>hermitian</strong>.</p>
<h2>Unitary matrices</h2>
<p>Suppose that we have <mathjax>$U \in \mathbb{F}^{n\times n}$</mathjax>. <mathjax>$U$</mathjax> is <strong>unitary</strong> iff
<mathjax>$U^*U = UU^* = I_n$</mathjax>. If <mathjax>$\mathbb{F}$</mathjax> is <mathjax>$\Re$</mathjax>, the matrix is called
<strong>orthogonal</strong>.</p>
<p>These constructions lead us to something useful: singular value
decomposition. We'll come back to this later when we talk about matrix
operations.</p>
<h2>Singular Value Decomposition (SVD)</h2>
<p>Suppose you have a matrix <mathjax>$M \in \mathbb{F}^{m\times m}$</mathjax>. A complex number
<mathjax>$\lambda$</mathjax> is an <strong>eigenvalue</strong> of <mathjax>$M$</mathjax> iff there exists a nonzero vector <mathjax>$v$</mathjax>
such that <mathjax>$Mv = \lambda v$</mathjax> (<mathjax>$v$</mathjax> is thus called the <strong>eigenvector</strong>
associated to <mathjax>$\lambda$</mathjax>). Now we can think about how to define singular
values of a matrix in terms of these definitions.</p>
<p>Let us think about this in general for a matrix <mathjax>$A \in \mathbb{F}^{m \times
n}$</mathjax> (which we consider to be a matrix representation of some linear map
with respect to a basis). Note that <mathjax>$A A^* \in \mathbb{F}^{m \times m}$</mathjax>,
which will have <mathjax>$m$</mathjax> eigenvalues <mathjax>$\lambda_i, i = 1 ... m$</mathjax>.</p>
<p>Note that <mathjax>$AA^*$</mathjax> is hermitian. We note that from the Spectral theorem, we
can decompose the matrix into an orthonormal basis of eigenvectors
corresponding to real eigenvalues. In fact, in this case, the eigenvalues
must be real and non-negative.</p>
<p>If we write the eigenvalues of <mathjax>$AA^*$</mathjax> as <mathjax>$\lambda_1 \ge \lambda_2 \ge
... \ge \lambda_m$</mathjax>, where the first <mathjax>$r$</mathjax> are nonzero, note that <mathjax>$r =
\text{rank}(AA^*)$</mathjax>. We define the <strong>non-zero singular values</strong> of <mathjax>$A$</mathjax> to be
<mathjax>$\sigma_i = \sqrt{\lambda_i}, i \le r$</mathjax>. The remaining singular values are
zero.</p>
<p>Recall the <strong>induced 2-norm</strong>: let us relate this notion of singular values
back to the induced 2-norm of a matrix <mathjax>$A$</mathjax> (<mathjax>$\mag{A}_{2,i}$</mathjax>). Consider the
induced norm to be the norm induced by the action of <mathjax>$A$</mathjax> on the domain of
<mathjax>$A$</mathjax>; thus if we take the induced 2-norm, then this is the <mathjax>$\max (\lambda_i
(A^*A))^{1/2}$</mathjax>, which is simply the maximum singular value.</p>
<p>Now that we know what singular values are, we can do a useful decomposition
called singular value decomposition.</p>
<p>Take <mathjax>$A \in \mathbb{C}^{m \times n}$</mathjax>. We have the following theorem: there
exist unitary matrices <mathjax>$U \in \mathbb{C}^{m \times m}, V \in \mathbb{C}^{n
\times n}$</mathjax> such that <mathjax>$A = U \Sigma V^*$</mathjax>, where <mathjax>$\Sigma$</mathjax> is defined as a
diagonal matrix containing the singular values of <mathjax>$A$</mathjax>. Consider the first
<mathjax>$r$</mathjax> columns of <mathjax>$U$</mathjax> to be <mathjax>$U_1$</mathjax>, the first <mathjax>$r$</mathjax> columns of <mathjax>$V$</mathjax> to be <mathjax>$V_1$</mathjax>,
and the <mathjax>$r \times r$</mathjax> block of <mathjax>$\Sigma$</mathjax> containing the nonzero singular
values to be <mathjax>$\Sigma_r$</mathjax>. Then <mathjax>$A = U \Sigma V^* = U_1 \Sigma_r
V_1^*$</mathjax>.</p>
<p>Consider <mathjax>$AA^*$</mathjax>. With a bit of algebra, we can show that <mathjax>$AA^*U_1 = U_1
\Sigma_r^2$</mathjax>. The columns <mathjax>$u_i$</mathjax> of <mathjax>$U_1$</mathjax> are the eigenvectors of
<mathjax>$AA^*$</mathjax> associated to eigenvalues <mathjax>$\sigma_i^2$</mathjax>; these are called the
<strong>left-singular vectors</strong>.</p>
<p>Similarly, if we consider <mathjax>$A^*A$</mathjax>, we can show that <mathjax>$A^*A = V_1 \Sigma_r^2
V_1^*$</mathjax> and that <mathjax>$A^*A v_i = \sigma_i^2 v_i$</mathjax>; the columns <mathjax>$v_i$</mathjax> of <mathjax>$V_1$</mathjax>
are called the <strong>right-singular vectors</strong>.</p>
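<p>A numpy sketch tying these facts together (arbitrary example matrix; note that numpy's <code>svd</code> returns <mathjax>$V^*$</mathjax> directly):</p>
<pre><code>
import numpy as np

A = np.array([[3., 1., 1.],
              [-1., 3., 1.]])

U, s, Vh = np.linalg.svd(A)             # A = U @ Sigma @ Vh, Vh = V*
Sigma = np.zeros_like(A)
Sigma[:len(s), :len(s)] = np.diag(s)
print(np.allclose(A, U @ Sigma @ Vh))   # True

# Squared singular values are the eigenvalues of A A^*.
print(np.allclose(np.sort(s**2), np.sort(np.linalg.eigvalsh(A @ A.T))))

# The induced 2-norm is the largest singular value.
print(np.isclose(np.linalg.norm(A, 2), s[0]))
</code></pre>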
<h2>Recap</h2>
<p>We've covered a lot of ground these past few weeks: we covered functions,
vector spaces, bases, and then we started to consider linearity. And then
we started talking about endowing vector spaces with things like norms,
inner products; induced norms. From that, we went on to talk about
adjoints. We used adjoints, we went on to talk a little about projection
and least-squares optimization. We then went on to talk about Hermitian
matrices and singular value decomposition. I think about this first unit as
having many basic units that we'll use over and over again. Two interesting
applications: least-squares, SVD.</p>
<p>So we have this basis now to build on as we talk about linear
systems. We'll also need to build a foundation on linear differential
equations. We'll spend some time going over the basics: what a solution
means, under what conditions a solution exists (i.e. what properties does
the differential equation need to have?). We'll spend the next couple weeks
talking about properties of differential equations.</p>
<p>All of what we've done up to now has been covered in appendix A of Callier
&amp; Desoer. For the introduction to differential equations, we'll follow
appendix B of Callier &amp; Desoer. Not the easiest to read, but very
comprehensive background reading.</p>
<p>The existence and uniqueness theorems are in many places, however.</p>
<p>Lecture notes 7.</p>
<h2>Differential Equations</h2>
<p><mathjax>$$
\dot{x}(t) = f(x(t), t), \quad x(t_0) = x_0
\\ x \in \Re^n
\\ \fn{f}{\Re^n \times \Re}{\Re^n}
$$</mathjax></p>
<p>(strictly speaking, <mathjax>$f$</mathjax> maps <mathjax>$x$</mathjax> to the tangent space, but for this course,
we're going to consider the two spaces to be equivalent)</p>
<p>Often, we're going to consider the <strong>time-invariant</strong> case (where there is
no dependence on <mathjax>$t$</mathjax>, but rather only on <mathjax>$x$</mathjax>), but the general form above is the
time-varying case. Recall that we consider time to be a privileged variable,
i.e. always "marching forward".</p>
<p>What we're going to talk about now is how we can solve this differential
equation. Rather (for now), under what conditions does there exist a
(unique) solution to the differential equation (with initial condition)?
We're interested in these two properties: existence and uniqueness. The
solution we call <mathjax>$x(t)$</mathjax> where <mathjax>$x(t_0) = x_0$</mathjax>. We need some understanding of
some properties of that function <mathjax>$f$</mathjax>. We'll talk about continuity,
piecewise continuity, Lipschitz continuity (thinking about the
existence). In terms of uniqueness, we'll be talking about Cauchy
sequences, Banach spaces, Bellman-Gr&ouml;nwall lemma.</p>
<p>A couple of different ways to prove uniqueness and existence; we'll use the
Callier &amp; Desoer method.</p>
<p>We'll finish today's lecture by just talking about some definitions of
continuity. Suppose we have a function <mathjax>$f(x)$</mathjax> that is said to be
<strong>continuous</strong>: that is, <mathjax>$\forall \epsilon &gt; 0, \exists \delta &gt; 0 \st
\abs{x_1
- x_2} &lt; \delta \implies \abs{f(x_1) - f(x_2)} &lt; \epsilon$</mathjax>
(<mathjax>$\epsilon$</mathjax>-<mathjax>$\delta$</mathjax> definition).</p>
<p>Suppose we have <mathjax>$\fn{f(x,t)}{\Re^n \times \Re}{\Re^n}$</mathjax>. <mathjax>$f$</mathjax> is said to be
piece-wise continuous (w.r.t. <mathjax>$t$</mathjax>), <mathjax>$\forall x$</mathjax> if <mathjax>$\fn{f(x,
\cdot)}{\Re}{\Re^n}$</mathjax> is continuous except at a finite number of
(well-behaved) discontinuities in any closed and bounded interval of
time. What I'm not allowing in this definition are functions with
infinitely many points of discontinuity.</p>
<p>Next time we'll talk about Lipschitz continuity.</p>
<p><a name='7'></a></p>
<h1>Existence and Uniqueness of Solutions to Differential Equations</h1>
<h2>September 13, 2012</h2>
<p>Section this Friday only, 9:30 - 10:30, Cory 299.</p>
<p>Today: existence and uniqueness of solutions to differential equations.</p>
<p>We called this a DE or ODE, and we associated with it an initial
condition. We started to talk about properties of the function <mathjax>$f$</mathjax> as a
function of <mathjax>$x$</mathjax> only, but we can consider thinking about this as a function
of <mathjax>$x$</mathjax> for all t. This is a map from <mathjax>$\Re^n \to \Re^n$</mathjax>. In this class,
recall, we used the <mathjax>$\epsilon$</mathjax>-<mathjax>$\delta$</mathjax> definition for continuity.</p>
<p>We also introduced the concept of piecewise continuity, which will be
important for thinking about the right-hand-side of the differential
equation.</p>
<p>We defined piecewise continuity as <mathjax>$\fn{f(t)}{\Re_+}{\Re^n}$</mathjax>, where <mathjax>$f(t)$</mathjax>
is said to be piecewise continuous in <mathjax>$t$</mathjax>, where the function is continuous
except at a set of well-behaved discontinuities (finitely many in any
closed and bounded, i.e. <strong>compact</strong>, interval).</p>
<p>Finally, we will define Lipschitz continuity as follows: a function
<mathjax>$\fn{f(\cdot, t)}{\Re^n}{\Re^n}$</mathjax> is <strong>Lipschitz continuous</strong> in x if there
exists a piecewise continuous function of time <mathjax>$\fn{k(t)}{\Re_+}{\Re_+}$</mathjax>
such that the following inequality holds: <mathjax>$\mag{f(x_1, t) - f(x_2, t)} \le
k(t)\mag{x_1 - x_2}, \forall x_1, x_2 \in \Re^n, \forall t \in \Re_+$</mathjax>. This
inequality (condition) is called the <strong>Lipschitz condition</strong>.</p>
<p>An important thing in this inequality is that there has to be one function
<mathjax>$k(t)$</mathjax>, and it has to be piecewise continuous. That is, there must exist such a
function that does not go to infinity within any compact time
interval.</p>
<p>It's an interesting condition, and if we look at this and compare the
Lipschitz continuity definition to the general continuity definition, we
can easily show that if the function is LC (Lipschitz continuous), then
it's C (continuous), since LC is a stricter condition than C. That
implication is fairly straightforward to show, but the converse
is not necessarily true (i.e. continuity does not necessarily imply
Lipschitz continuity).</p>
<p>Aside: think about this condition and what it takes to show that a function
is Lipschitz continuous. Need to come up with a candidate <mathjax>$k(t)$</mathjax> (often
called the Lipschitz function or constant, if it's constant). Often the
hardest part: trying to extract from <mathjax>$f$</mathjax> what a possible <mathjax>$k$</mathjax> is.</p>
<p>But there's a useful possible candidate for <mathjax>$k(t)$</mathjax>, given a particular
function <mathjax>$f$</mathjax>. Let's forget about time for a second and consider a function
just of <mathjax>$x$</mathjax>. Consider the <strong>Jacobian</strong> <mathjax>$Df$</mathjax> (often also written <mathjax>$\pderiv{f}{x}$</mathjax>),
which is an <mathjax>$n \times n$</mathjax> matrix with entries <mathjax>$(Df)_{ij} = \pderiv{f_i}{x_j}$</mathjax>. If
the Jacobian <mathjax>$Df$</mathjax> exists, then its norm provides a candidate Lipschitz
function <mathjax>$k(t)$</mathjax>.</p>
<p>A norm of the Jacobian of <mathjax>$f$</mathjax>, if independent of <mathjax>$x$</mathjax>, tells you that the
function is Lipschitz. If the norm always seems to depend on <mathjax>$x$</mathjax>, you can
still say something about the Lipschitz properties of the function: you can
call it locally Lipschitz by bounding the value of <mathjax>$x$</mathjax> in some region.</p>
<p>Sketch of proof: generalization of mean value theorem (easy to sketch in
<mathjax>$\Re^1$</mathjax>). Mean value theorem states that there exists a point such that the
instantaneous slope is the same as the average slope (assuming that the
function is differentiable). If we want to generalize it to more
dimensions, we say <mathjax>$f(x_1) - f(x_2) = Df(\lambda x_1 + (1 - \lambda)
x_2)(x_1 - x_2)$</mathjax> (for some <mathjax>$0 &lt; \lambda &lt; 1$</mathjax>). All we've required is the
existence of <mathjax>$Df$</mathjax>.</p>
<p>Now we can just take norms (and this is what's interesting now) and use
some of the results we have from norms. This provides a very useful
construction for a candidate for <mathjax>$k$</mathjax> (might not provide a great bound), but
it's the second thing to try if you can't immediately extract out a
function <mathjax>$k(t)$</mathjax>.</p>
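<p>A numerical sketch of this idea (the function here is an arbitrary illustration, and the supremum of <mathjax>$\mag{Df(x)}$</mathjax> is estimated by sampling, so the resulting <mathjax>$k$</mathjax> is a candidate rather than a proof):</p>
<pre><code>
import numpy as np

# f(x) = [sin(x1) + x2, -x1 + cos(x2)]; its Jacobian has bounded norm.
f = lambda x: np.array([np.sin(x[0]) + x[1], -x[0] + np.cos(x[1])])
Df = lambda x: np.array([[np.cos(x[0]), 1.],
                         [-1., -np.sin(x[1])]])

# Estimate sup_x ||Df(x)||_2 by sampling: a candidate Lipschitz constant.
samples = np.random.default_rng(0).uniform(-10., 10., size=(1000, 2))
k = max(np.linalg.norm(Df(x), 2) for x in samples)

# Spot-check the Lipschitz inequality ||f(x) - f(y)|| &lt;= k ||x - y||.
x, y = np.array([1., 2.]), np.array([-3., 0.5])
print(np.linalg.norm(f(x) - f(y)) &lt;= k * np.linalg.norm(x - y))  # True
</code></pre>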
<p>Something not in the notes, but useful. Let's go back to where we started,
the differential equation with initial condition, and state the main
theorem.</p>
<h2>Fundamental Theorem of DEs / the Existence and Uniqueness theorem of (O)DEs</h2>
<p>Suppose we have a differential equation with an initial condition. Assume
that <mathjax>$f(x, t)$</mathjax> is piecewise continuous in <mathjax>$t$</mathjax> and Lipschitz continuous in
<mathjax>$x$</mathjax>. With that information, we have that there exists a unique function of
time which maps <mathjax>$\Re_+ \to \Re^n$</mathjax>, which is differentiable (<mathjax>$C^1$</mathjax>) <em>almost</em>
everywhere (derivative exists at all points at which <mathjax>$f$</mathjax> is continuous),
and it satisfies the initial condition and differential equation. This
derivative exists at all points <mathjax>$t \in [t_1, t_2] - D$</mathjax>, where
<mathjax>$D$</mathjax> is the set of points where <mathjax>$f$</mathjax> is discontinuous in <mathjax>$t$</mathjax>.</p>
<p>We are going to be interested in studying differential equations where we
know these conditions hold. We're also going to prove the theorem. It's a
nice thing to do (a little in depth) because it demonstrates some proof
techniques (as well as giving you an idea of why the theorem works).</p>
<h2>LC condition</h2>
<p>The norm of the Jacobian of the example is bounded for bounded <mathjax>$x$</mathjax>. That
is, we can choose a local region in <mathjax>$\Re$</mathjax> for which our <mathjax>$Df$</mathjax> is bounded to
be less than some constant. That gives us a candidate Lipschitz constant
for that local region. We say then that <mathjax>$f(x)$</mathjax> is (at least) <strong>locally
Lipschitz continuous</strong> (usually we just say this without specifying a
region, since you can usually find a bound given any region). Further, it
is trivially piecewise continuous in time (since it doesn't depend on
time).</p>
<p>Note: if the Lipschitz condition holds only locally, it may be that the
solution is only defined over a certain range of time.</p>
<p>We didn't show this, but in this example, the Lipschitz condition does not
hold globally.</p>
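<p>A standard concrete instance of this (a sketch, not necessarily the example
from lecture): <mathjax>$\dot{x} = x^2$</mathjax> is locally but not globally Lipschitz, and its
solution <mathjax>$x(t) = x_0/(1 - x_0 t)$</mathjax> escapes to infinity at <mathjax>$t = 1/x_0$</mathjax>, so the
solution exists only on a bounded interval.</p>
<pre><code>import numpy as np
from scipy.integrate import solve_ivp

# dx/dt = x^2, x(0) = 1: closed-form solution x(t) = 1/(1 - t) blows up
# at t = 1, so the solution only exists on [0, 1).
sol = solve_ivp(lambda t, x: x**2, (0.0, 2.0), [1.0], rtol=1e-8)
print(sol.status)      # -1: integration step failed (finite escape time)
print(sol.t[-1])       # close to 1.0, the escape time</code></pre>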
<h2>Local Fundamental theorem of DEs</h2>
<p>Now assume that <mathjax>$f(x, t)$</mathjax> is piecewise continuous in <mathjax>$t$</mathjax> and Lipschitz
continuous in <mathjax>$x$</mathjax> for all <mathjax>$x \in G \subseteq \Re^n$</mathjax>. We now have that there
exist an interval <mathjax>$[t_0, t_1]$</mathjax> and a unique function of time mapping <mathjax>$[t_0,
t_1] \to \Re^n$</mathjax> (whose values remain in <mathjax>$G$</mathjax>) which is
differentiable (<mathjax>$C^1$</mathjax>) <em>almost</em> everywhere (the derivative exists at all points
at which <mathjax>$f$</mathjax> is continuous), and which satisfies the initial condition and
differential equation. As before, this derivative exists at all points <mathjax>$t
\in [t_0, t_1] - D$</mathjax>, where <mathjax>$D$</mathjax> is the set of points where <mathjax>$f$</mathjax> is
discontinuous in <mathjax>$t$</mathjax>. If the Lipschitz condition holds globally, we can make
the interval as large as desired.</p>
<h2>Proof</h2>
<p>There are two pieces: the proof of existence and the proof of
uniqueness. Today will likely just be existence.</p>
<h2>Existence</h2>
<p>Roadmap: construct an infinite sequence of continuous functions defined
(recursively) as follows <mathjax>$x_{m+1}(t) = x_0 + \int_{t_0}^t f(x_m(\tau),
\tau) d\tau$</mathjax>. First, show that this sequence converges to a continuous
function <mathjax>$\fn{\Phi(\cdot)}{\Re_+}{\Re^n}$</mathjax> which solves the DE/IC pair.</p>
<p>Would like to be able to prove the first thing here: I've constructed a
sequence, and I want to show that the limit of this sequence is a solution
to the differential equation.</p>
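<p>A minimal sketch of this construction in code (assuming the toy problem
<mathjax>$\dot{x} = -x$</mathjax>, <mathjax>$x_0 = 1$</mathjax>, <mathjax>$t_0 = 0$</mathjax>, whose solution we know is
<mathjax>$e^{-t}$</mathjax>), approximating each integral with the trapezoidal rule:</p>
<pre><code>import numpy as np
from scipy.integrate import cumulative_trapezoid

# Picard iterates x_{m+1}(t) = x0 + integral_{t0}^t f(x_m(tau), tau) dtau,
# sketched for the assumed problem f(x, t) = -x, x0 = 1, t0 = 0, on [0, 1].
t = np.linspace(0.0, 1.0, 1001)
x0 = 1.0
x = np.full_like(t, x0)        # x_0(t) = x0, a constant function
for m in range(15):
    x = x0 + cumulative_trapezoid(-x, t, initial=0.0)

# The iterates converge (in the sup norm) to the known solution exp(-t).
print(np.max(np.abs(x - np.exp(-t))))   # small; limited by the quadrature</code></pre>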
<p>The tool that I'm going to use is a property called Cauchy, and then I'm
going to invoke the result that if I have a complete space, any Cauchy
sequence on the space converges to something in the space. Gives me the
basis of the existence of the thing that this converges to.</p>
<p>Goal: (1) to show that this sequence is a Cauchy sequence in a
complete normed vector space, which means the sequence converges to
something in the space, and (2) to show that the limit of this sequence
satisfies the DE/IC pair.</p>
<p>A <strong>Cauchy sequence</strong> (on a normed vector space) is one where, for any
arbitrarily small <mathjax>$\epsilon &gt; 0$</mathjax>, there exists some point in the sequence
(some finite index <mathjax>$m$</mathjax>) such that the distance between any two points
beyond that index is smaller than <mathjax>$\epsilon$</mathjax>. In other words: by dropping
a finite number of elements from the start of the sequence, the
distance between any remaining elements can be made arbitrarily small.</p>
<p>We define a <strong>Banach space</strong> (equivalently, a <strong>complete normed vector
space</strong>) as one in which all Cauchy sequences converge; implicit in that,
they converge to something in the space itself.</p>
<p>Just an aside, a <strong>Hilbert space</strong> is a <strong>complete inner product
space</strong>. If you have an inner product space, and you define the norm in
that inner product space induced by that inner product, if all Cauchy
sequences of that space converge (to a limit in the space) with this norm,
then it is a Hilbert space.</p>
<p>Think about a Cauchy sequence on a space that converges to something not
necessarily in the space. Example: the rational convergents of a continued
fraction for an irrational number form a Cauchy sequence in <mathjax>$\mathbb{Q}$</mathjax>
whose limit is not in <mathjax>$\mathbb{Q}$</mathjax>.</p>
<p>To show (1), we'll show that this sequence <mathjax>$\{x_m\}$</mathjax> that we constructed is
a Cauchy sequence in a Banach space. Interestingly, it matters what norm
you choose.</p>
<p><a name='8'></a></p>
<h1>Proof of Existence and Uniqueness Theorem</h1>
<h2>September 18, 2012</h2>
<p>Today:</p>
<ul>
<li>proof of existence and uniqueness theorem.</li>
<li>[ if time ] introduction to dynamical systems.</li>
</ul>
<p>First couple of weeks of review to build up basic concepts that we'll be
drawing upon throughout the course. Either today or Thursday we will launch
into linear system theory.</p>
<p>We're going to recall where we were last time. We had the fundamental
theorem of differential equations, which said the following: if we had a
differential equation, <mathjax>$\dot{x} = f(x,t)$</mathjax>, with initial condition <mathjax>$x(t_0) =
x_0$</mathjax>, where <mathjax>$x(t) \in \Re^n$</mathjax>, etc, if <mathjax>$f( \cdot , t)$</mathjax> is Lipschitz
continuous, and <mathjax>$f(x, \cdot )$</mathjax> is piecewise continuous, then there exists a
unique solution to the differential equation / initial condition pair (some
function <mathjax>$\phi(t)$</mathjax>) wherever you can take the derivative (may not be
differentiable everywhere: loses differentiability on the points where
discontinuities exist).</p>
<p>We spent quite a lot of time discussing Lipschitz continuity. The job is
usually to test both conditions; the first one requires work. We described a
popular candidate function by looking at the mean value theorem and
applying it to <mathjax>$f$</mathjax>: a norm of the Jacobian provides a candidate
Lipschitz function, if it works out to a suitable bound.</p>
<p>We also described local Lipschitz continuity, and often, when using a norm
of the Jacobian, that's fairly easy to show.</p>
<p>Important point to recall: a norm of the Jacobian of <mathjax>$f$</mathjax> provides a
candidate Lipschitz function.</p>
<p>Another important thing to say here is that we can use any norm we want, so
we can be creative in our choice of norm when looking for a better bound.</p>
<p>We started our proof last day, and we talked a little about the structure
of the proof. We are going to proceed by constructing a sequence of
functions, then show (1) that it converges to a solution, then show (2)
that it is unique.</p>
<h2>Proof of Existence</h2>
<p>We are going to construct this sequence of functions as follows:
<mathjax>$x_{m+1}(t) = x_0 + \int_{t_0}^t f(x_m(\tau), \tau) d\tau$</mathjax>. Here we're dealing with
an arbitrary interval from <mathjax>$t_1$</mathjax> to <mathjax>$t_2$</mathjax>, with <mathjax>$t_0 \in [t_1, t_2]$</mathjax>. We
want to show that this sequence is a Cauchy sequence, and we're going to
rely on our knowledge that the space these functions are defined in is a
Banach space (hence this sequence converges to something in the space).</p>
<p>We have to put a norm on this space of functions, so we'll use the infinity
norm. Not going to prove it, but rather state that it's a Banach space. If we
show that this is a Cauchy sequence, then the limit of that Cauchy sequence
exists in the space. The reason that's interesting is that it's this limit
that provides a candidate for this differential equation.</p>
<p>We will then prove that this limit satisfies the DE/IC pair. That is
adequate to show existence. We'll then go on to prove uniqueness.</p>
<p>Our immediate goal is to show that this sequence is Cauchy; that is, we
should show that <mathjax>$\mag{x_{m+p} - x_m} \to 0$</mathjax> as <mathjax>$m$</mathjax> gets large, for any <mathjax>$p$</mathjax>.</p>
<p>First let us look at the difference between <mathjax>$x_{m+1}$</mathjax> and <mathjax>$x_m$</mathjax>. These are just
functions of time, and we can compute this: <mathjax>$\mag{x_{m+1}(t) - x_m(t)} =
\mag{\int_{t_0}^t (f(x_m, \tau) - f(x_{m-1}, \tau)) d\tau}$</mathjax>. Use the fact that <mathjax>$f$</mathjax>
is Lipschitz continuous, and so this is <mathjax>$\le \int_{t_0}^t k(\tau)\mag{x_m(\tau) -
x_{m-1}(\tau)} d\tau$</mathjax>. The Lipschitz function <mathjax>$k$</mathjax> is piecewise continuous, so it
has a supremum on this interval. Let <mathjax>$\bar{k}$</mathjax> be the supremum of <mathjax>$k$</mathjax> over
the whole interval <mathjax>$[t_1, t_2]$</mathjax>. This means that we can take this
inequality and rewrite it as <mathjax>$\mag{x_{m+1}(t) - x_m(t)} \le \bar{k} \int_{t_0}^t
\mag{x_m(\tau) - x_{m-1}(\tau)} d\tau$</mathjax>. Now we have a bound that relates
the distance between <mathjax>$x_{m+1}$</mathjax> and <mathjax>$x_m$</mathjax> to the distance between <mathjax>$x_m$</mathjax> and
<mathjax>$x_{m-1}$</mathjax>. You can essentially relate the distance between two subsequent
elements to any earlier such distance by induction.</p>
<p>Let us do two things: sort out the integral on the right-hand-side, then
look at arbitrary elements beyond an index.</p>
<p>We know that <mathjax>$x_1(t) = x_0 + \int_{t_0}^t f(x_0, \tau) d\tau$</mathjax>, and that <mathjax>$\mag{x_1
- x_0} \le \int_{t_0}^{t} \mag{f(x_0, \tau)} d\tau \le \int_{t_1}^{t_2}
\mag{f(x_0, \tau)} d\tau \defequals M$</mathjax>. From the above inequalities,
<mathjax>$\mag{x_2 - x_1} \le M \bar{k}\abs{t - t_0}$</mathjax>. Now I can look at general
bounds: <mathjax>$\mag{x_3 - x_2} \le \frac{M\bar{k}^2 \abs{t - t_0}^2}{2!}$</mathjax>. In general,
<mathjax>$\mag{x_{m+1} - x_m} \le \frac{M\parens{\bar{k} \abs{t - t_0}}^m}{m!}$</mathjax>.</p>
<p>If we look at the norm of <mathjax>$x_{m+1} - x_m$</mathjax> as a function of time, that is going
to be a function norm. What I've been doing up to now is looking at the norm at
a particular value <mathjax>$t_1 &lt; t &lt; t_2$</mathjax> (a pointwise norm).</p>
<p>Try to relate this to the norm <mathjax>$\mag{x_{m+1} - x_m}_\infty$</mathjax>. Can what we've
done so far give us a bound on the difference between two functions? We
can, because the infinity norm of a function is the maximum value that the
function assumes (maximum vector norm for all points <mathjax>$t$</mathjax> in the interval
we're interested in). If we let <mathjax>$T = t_2 - t_1$</mathjax>, the length of the
interval, we can use the previous result on the pointwise norm: if the
pointwise norm is less than this bound for all relevant <mathjax>$t$</mathjax>, then its
maximum value, the function norm, must be less than the same bound.</p>
<p>That gets us on the road we want to be, since that now gets us a bound. We
can now go back to where we started. What we're actually interested in is
given an index <mathjax>$m$</mathjax>, we can construct a bound on all later elements in the
sequence.</p>
<p><mathjax>$\mag{x_{m+p} - x_m}_\infty = \mag{x_{m+p} - x_{m+p-1} + x_{m+p-1} - ... -
x_m}_\infty = \mag{\sum_{k=0}^{p-1} (x_{m+k+1} - x_{m+k})}_\infty \le M \sum_{k=0}^{p-1}
\frac{(\bar{k}T)^{m+k}}{(m+k)!}$</mathjax>.</p>
<p>We're going to recall a few things from undergraduate calculus: Taylor
expansion of the exponential function and <mathjax>$(m+k)! \ge m!k!$</mathjax>.</p>
<p>With these, we can say that <mathjax>$\mag{x_{m+p} - x_m}_\infty \le
M\frac{(\bar{k}T)^m}{m!} e^{\bar{k} T}$</mathjax>. What we'd like to show is that this
can be made arbitrarily small as <mathjax>$m$</mathjax> gets large. We study this bound as <mathjax>$m
\to \infty$</mathjax>, and we recall that we can use the Stirling approximation,
which shows that factorial grows faster than the exponential function. That
is enough to show that <mathjax>$\{x_m\}_0^\infty$</mathjax> is Cauchy. Since it is in a
Banach space (not proving, since beyond our scope), it converges to
a function (call it <mathjax>$x^\ell$</mathjax>) in the same space.</p>
<p>Now we just need to show that the limit <mathjax>$x^\ell$</mathjax> solves the differential
equation (and initial condition). Let's go back to the sequence that
determines <mathjax>$x^\ell$</mathjax>. <mathjax>$x_{m+1} = x_0 + \int_{t_0}^t f(x_m, \tau)
d\tau$</mathjax>. We've proven that this limit converges to <mathjax>$x^\ell$</mathjax>. What we want to
show is that if we evaluate <mathjax>$f(x^\ell, t)$</mathjax>, then <mathjax>$\int_{t_0}^t f(x_m, \tau)
\to \int_{t_0}^t f(x^\ell, \tau) d\tau$</mathjax>. Would be immediate if we had that
the function were continuous. Clear that it satisfies initial condition by
the construction of the sequence, but we need to show that it satisfies the
differential equation. Conceptually, this is probably more difficult than
what we've just done (establishing bounds, Cauchy sequences). Thinking
about what that function limit is and what it means for it to satisfy that
differential equation.</p>
<p>Now, you can basically use some of the machinery we've been using all along
to show this. Difference between these goes to <mathjax>$0$</mathjax> as <mathjax>$m$</mathjax> gets large.</p>
<p><mathjax>$$\mag{\int_{t_0}^t (f(x_m, \tau) - f(x^\ell, \tau)) d\tau}
\\ \le \int_{t_0}^t k(\tau) \mag{x_m - x^\ell} d\tau \le \bar{k}\mag{x_m - x^\ell}_\infty T
\\ \le \bar{k} M e^{\bar{k} T} \frac{(\bar{k} T)^m}{m!}T
$$</mathjax></p>
<p>Thus <mathjax>$x^\ell$</mathjax> solves the DE/IC pair. A solution <mathjax>$\Phi$</mathjax> is <mathjax>$x^\ell$</mathjax>,
i.e. <mathjax>$\dot{x}^\ell(t) = f(x^\ell(t), t) \; \forall t \in [t_1, t_2] - D$</mathjax> and <mathjax>$x^\ell(t_0) =
x_0$</mathjax>.</p>
<p>To show that this solution is unique, we will use the Bellman-Gronwall
lemma, which is very important. Used ubiquitously when you want to show
that functions of time are equal to each other: candidate mechanism to do
that.</p>
<h2>Bellman-Gronwall Lemma</h2>
<p>Let <mathjax>$u, k$</mathjax> be real-valued positive piece-wise continuous functions of time,
and we'll have a constant <mathjax>$c_1 \ge 0$</mathjax> and <mathjax>$t_0 \ge 0$</mathjax>. If we have such
constants and functions, then the following is true: if <mathjax>$u(t) \le c_1 +
\int_{t_0}^t k(\tau)u(\tau) d\tau$</mathjax>, then <mathjax>$u(t) \le c_1 e^{\int_{t_0}^t
k(\tau) d\tau}$</mathjax>.</p>
<h2>Proof (of B-G)</h2>
<p><mathjax>$t &gt; t_0$</mathjax> WLOG.</p>
<p><mathjax>$$U(t) = c_1 + \int_{t_0}^t k(\tau) u(\tau) d\tau
\\ u(t) \le U(t)
\\ u(t)k(t)e^{-\int_{t_0}^t k(\tau) d\tau} \le U(t)k(t)e^{-\int_{t_0}^t k(\tau) d\tau}
\\ \deriv{}{t}\parens{U(t)e^{-\int_{t_0}^t k(\tau) d\tau}} \le 0 \text{ (then integrate this derivative, noting that } U(t_0) = c_1 \text{)}
\\ u(t) \le U(t) \le c_1 e^{\int_{t_0}^t k(\tau) d\tau}
$$</mathjax></p>
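<p>A quick numerical sanity check of the lemma (an assumed toy case, not from
the notes): take <mathjax>$k(t) = 1$</mathjax>, <mathjax>$c_1 = 1$</mathjax>, and <mathjax>$u(t) = e^{t/2}$</mathjax>, which
satisfies the integral inequality; the lemma then asserts <mathjax>$u(t) \le e^t$</mathjax>.</p>
<pre><code>import numpy as np
from scipy.integrate import cumulative_trapezoid

t = np.linspace(0.0, 3.0, 3001)
c1, k = 1.0, np.ones_like(t)
u = np.exp(t / 2)

# Hypothesis: u(t) &lt;= c1 + integral_0^t k(tau) u(tau) dtau.
print(np.all(u &lt;= c1 + cumulative_trapezoid(k * u, t, initial=0.0) + 1e-9))
# Conclusion: u(t) &lt;= c1 * exp(integral_0^t k(tau) dtau) = exp(t).
print(np.all(u &lt;= c1 * np.exp(cumulative_trapezoid(k, t, initial=0.0)) + 1e-9))</code></pre>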
<h2>Using this to prove uniqueness of DE/IC solutions</h2>
<p>Here is how we're going to use the B-G lemma to prove uniqueness.</p>
<p>We have a solution that we constructed, <mathjax>$\Phi$</mathjax>, and someone else gives us a
solution <mathjax>$\Psi$</mathjax>, constructed via a different method. Show that these must
be equal. Since they're both solutions, they both satisfy the DE/IC
pair; subtract the corresponding integral equations and take norms:</p>
<p><mathjax>$$\mag{\Phi - \Psi} \le \bar{k} \int_{t_0}^t \mag{\Phi - \Psi} d\tau \forall
t_0, t \in [t_1, t_2]$$</mathjax></p>
<p>From the Bellman-Gronwall Lemma, we can rewrite this inequality as
<mathjax>$\mag{\Phi - \Psi} \le c_1 e^{\bar{k}(t - t_0)}$</mathjax>. Since <mathjax>$c_1 = 0$</mathjax>, this
norm is less than or equal to 0. By positive definiteness, this norm must
be equal to 0, and so the functions are equal to each other.</p>
<h2>Reverse time differential equation</h2>
<p>We think about time as monotonic (either increasing or decreasing, usually
increasing). Suppose that time is decreasing: we have <mathjax>$\dot{x} =
f(x,t)$</mathjax> and want to explore existence and uniqueness going
backwards in time. Suppose we had a time variable <mathjax>$\tau$</mathjax> which goes from
<mathjax>$t_0$</mathjax> backwards, defined as <mathjax>$\tau \defequals t_0 - t$</mathjax>. We want to define
the solution to that differential equation backwards in time as <mathjax>$z(\tau) =
x(t)$</mathjax> for <mathjax>$t \le t_0$</mathjax>. Derive what the reverse-time derivative is. The equation
is just <mathjax>$-f$</mathjax>; we're going to use <mathjax>$\bar{f}$</mathjax> to represent this
function (<mathjax>$\deriv{}{\tau}z = -\deriv{}{t}x = -f(x, t) = -f(z, t_0 - \tau)
\defequals \bar{f}(z, \tau)$</mathjax>).</p>
<p>If I solve the reverse-time differential equation, we'll
have some corresponding backwards solution. Concluding statement: we can think
about solutions forwards and backwards in time. Existence of a unique
solution forward in time means existence of a unique solution backward in
time (and vice versa). You can't have solutions crossing themselves in
time-invariant systems.</p>
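<p>A small numerical sketch of this idea (with an assumed linear system, and
running time backwards from the endpoint rather than from <mathjax>$t_0$</mathjax>): integrate
forward, then integrate <mathjax>$\dot{z} = -f(z)$</mathjax> from the final state, and recover
the initial condition.</p>
<pre><code>import numpy as np
from scipy.integrate import solve_ivp

# Assumed time-invariant system f(x) = Ax.
A = np.array([[0.0, 1.0], [-2.0, -0.5]])
f = lambda x: A @ x
x0 = np.array([1.0, 0.0])

# Forward solution on [0, T].
T = 5.0
fwd = solve_ivp(lambda t, x: f(x), (0.0, T), x0, rtol=1e-10, atol=1e-12)
xT = fwd.y[:, -1]

# Reverse time: dz/dtau = -f(z), z(0) = x(T); z(T) should recover x0.
back = solve_ivp(lambda tau, z: -f(z), (0.0, T), xT, rtol=1e-10, atol=1e-12)
print(np.linalg.norm(back.y[:, -1] - x0))   # ~0: unique backward solution</code></pre>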
<p><a name='9'></a></p>
<h1>Introduction to dynamical systems</h1>
<h2>September 20, 2012</h2>
<p>Suppose we have equations <mathjax>$\dot{x} = f(x, u, t)$</mathjax>, <mathjax>$\fn{f}{\Re^n \times
\Re^{n_i} \times \Re_+}{\Re^n}$</mathjax> and <mathjax>$y = h(x, u, t)$</mathjax>, <mathjax>$\fn{h}{\Re^n \times
\Re^{n_i} \times \Re_+}{\Re^{n_o}}$</mathjax>. We define <mathjax>$n_i$</mathjax> as the dimension of the input
space, <mathjax>$n_o$</mathjax> as the dimension of the output space, and <mathjax>$n$</mathjax> as the dimension of
the state space.</p>
<p>We've looked at the form, and if we specify a particular <mathjax>$\bar{u}(t)$</mathjax> over some
time interval of interest, then we can plug this into the right hand side
of this differential equation. Typically we do not supply a particular
input. Thinking about solutions to this differential equation, for now,
let's suppose that it's specified.</p>
<p>Suppose we have some feedback function of the state. If <mathjax>$u$</mathjax> is specified,
as long as the resulting right-hand side satisfies the conditions for the existence and
uniqueness theorem, we have a differential equation we can solve.</p>
<p>Another example: instead of differential equation (which corresponds to
continuous time), we have a difference equation (which corresponds to
discrete time).</p>
<p>Example: dynamic system represented by an LRC circuit. One practical way to
define the state <mathjax>$x$</mathjax> is as a vector of elements whose derivatives appear in
our differential equation. Not formal, but practical for this example.</p>
<p>Notions of discretizing.</p>
<h2>What is a dynamical system?</h2>
<p>As discussed in first lecture, we consider time <mathjax>$\tau$</mathjax> to be a privileged
variable. Based on our definition of time, the inputs and outputs are all
functions of time.</p>
<p>Now we're going to define a <strong>dynamical system</strong> as a 5-tuple: <mathjax>$(\mathcal{U},
\Sigma, \mathcal{Y}, s, r)$</mathjax> (input space, state space, output space, state
transition function, output map).</p>
<p>We define the <strong>input space</strong> as the set of input functions over time to an
input set <mathjax>$U$</mathjax> (i.e. <mathjax>$\mathcal{U} = \{\fn{u}{\tau}{U}\}$</mathjax>. Typically, <mathjax>$U =
\Re^{n_i}$</mathjax>).</p>
<p>We also define the <strong>output space</strong> as the set of output functions over time to
an output set <mathjax>$Y$</mathjax> (i.e. <mathjax>$\mathcal{Y} = \{\fn{y}{\tau}{Y}\}$</mathjax>). Typically, <mathjax>$Y
= \Re^{n_o}$</mathjax>.</p>
<p><mathjax>$\Sigma$</mathjax> is our <strong>state space</strong>. Not defined as the function, but the actual
state space. Typically, <mathjax>$\Sigma = \Re^n$</mathjax>, and we can go back and think
about the function <mathjax>$x(t) \in \Sigma$</mathjax>. <mathjax>$\fn{x}{\tau}{\Sigma}$</mathjax> is called the
state trajectory.</p>
<p><mathjax>$s$</mathjax> is called the <strong>state transition function</strong> because it defines how the
state changes in response to time and the initial state and the
input. <mathjax>$\fn{s}{\tau \times \tau \times \Sigma \times U }{\Sigma}$</mathjax>. Usually
we write this as <mathjax>$x(t_1) = s(t_1, t_0, x_0, u)$</mathjax>, where <mathjax>$u$</mathjax> is the function
<mathjax>$u(\cdot) |_{t_0}^{t_1}$</mathjax>. This is important: coming towards how we define
state. Only things you need to get to state at the new time are the initial
state, inputs, and dynamics.</p>
<p>Finally, we have this <strong>output map</strong> (sometimes called the readout map)
<mathjax>$r$</mathjax>. <mathjax>$\fn{r}{\tau \times \Sigma \times U}{Y}$</mathjax>. That is, we can think about
<mathjax>$y(t) = r(t, x(t), u(t))$</mathjax>. There's something fundamentally different
between <mathjax>$r$</mathjax> and <mathjax>$s$</mathjax>. <mathjax>$s$</mathjax> depended on the function <mathjax>$u$</mathjax>, whereas <mathjax>$r$</mathjax> only
depended on the current value of <mathjax>$u$</mathjax> at a particular time.</p>
<p><mathjax>$s$</mathjax> captures dynamics, while <mathjax>$r$</mathjax> is static. Remark: <mathjax>$s$</mathjax> has dynamics
(memory) -- things that depend on previous time, whereas <mathjax>$r$</mathjax> is static:
everything it depends on is at the current time (memoryless).</p>
<p>In order to be a dynamical system, we need to satisfy two axioms: a
dynamical system is a five-tuple with the following two axioms:</p>
<ul>
<li>The <strong>state transition axiom</strong>: <mathjax>$\forall t_1 \ge t_0$</mathjax>, given <mathjax>$u, \tilde{u}$</mathjax>
that are equal to each other over a particular time interval, the state
transition functions must be equal over that interval, i.e. <mathjax>$s(t_1, t_0,
x_0, u) = s(t_1, t_0, x_0, \tilde{u})$</mathjax>. Requires us to not have
dependence on the input outside of the time interval of interest.</li>
<li>The <strong>semigroup axiom</strong>: suppose you start a system at <mathjax>$t_0$</mathjax> and evolve it to
<mathjax>$t_2$</mathjax>, and you're considering the state. You have an input <mathjax>$u$</mathjax> defined
over the whole time interval. If you were to look at an intermediate
point <mathjax>$t_1$</mathjax>, and you computed the state at <mathjax>$t_1$</mathjax> via the state transition
function, we can split our time interval into two intervals, and we can
compute the result any way we like. Stated as the following: <mathjax>$s(t_2, t_1,
s(t_1, t_0, x_0, u), u) = s(t_2, t_0, x_0, u)$</mathjax>.</li>
</ul>
<p>When we talk about a dynamical system, we have to satisfy these axioms.</p>
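<p>To make these axioms concrete, here is a sketch (with an assumed system and
input) in which the state transition function is realized by numerically
solving a differential equation, and the semigroup axiom is checked:</p>
<pre><code>import numpy as np
from scipy.integrate import solve_ivp

# Assumed example dynamics and input, for illustration only.
f = lambda t, x, u: np.array([x[1], -x[0] - 0.3 * x[1] + u(t)])
u = lambda t: np.sin(t)

def s(t1, t0, x0, u):
    """State transition function: solve dx/dt = f(t, x, u) from t0 to t1."""
    sol = solve_ivp(lambda t, x: f(t, x, u), (t0, t1), x0,
                    rtol=1e-10, atol=1e-12)
    return sol.y[:, -1]

x0 = np.array([1.0, 0.0])
t0, t1, t2 = 0.0, 1.0, 2.5
direct = s(t2, t0, x0, u)                  # evolve in one shot
via_t1 = s(t2, t1, s(t1, t0, x0, u), u)    # stop at t1, then continue
print(np.linalg.norm(direct - via_t1))     # ~0: the semigroup axiom</code></pre>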
<h2>Response function</h2>
<p>Since we're interested in the outputs and not the states, we can define
what we call the <strong>response map</strong>. It's not considered part of the definition
of a dynamical system because it can be easily derived.</p>
<p>It's the composition of the state transition function and the readout map,
i.e. <mathjax>$y(t) = r(t, x(t), u(t)) = r(t, s(t, t_0, x_0, u), u(t)) \defequals
\rho(t, t_0, x_0, u)$</mathjax>. This is an important function because it is used to
define properties of a dynamical system. Why is that? We've said that
states are somehow mysterious. Not something we typically care about:
typically we care about the outputs. Thus we define properties like
linearity and time invariance.</p>
<h2>Time Invariance</h2>
<p>We define a time-shift operator <mathjax>$\fn{T_\tau}{\mathcal{U}}{\mathcal{U}}$</mathjax>,
<mathjax>$\fn{T_\tau}{\mathcal{Y}}{\mathcal{Y}}$</mathjax>. <mathjax>$(T_\tau u)(t) \defequals u(t -
\tau)$</mathjax>. Namely, the value of <mathjax>$T_\tau u$</mathjax> is that of the old signal at
<mathjax>$t-\tau$</mathjax>.</p>
<p>A <strong>time-invariant</strong> (dynamical) system is one in which the input space and
output space are closed under <mathjax>$T_\tau$</mathjax> for all <mathjax>$\tau$</mathjax>, and <mathjax>$\rho(t, t_0,
x_0, u) = \rho(t + \tau, t_0 + \tau, x_0, T_\tau u)$</mathjax>.</p>
<h2>Linearity</h2>
<p>A <strong>linear</strong> dynamical system is one in which the input, state, and output
spaces are all linear spaces over the same field <mathjax>$\mathbb{F}$</mathjax>, and the
response map <mathjax>$\rho$</mathjax> is a linear map of <mathjax>$\Sigma \times \mathcal{U}$</mathjax> into
<mathjax>$\mathcal{Y}$</mathjax>.</p>
<p>This is a strict requirement: you have to check that the response map
satisfies these conditions. Question that comes up: why do we define
linearity of a dynamical system in terms of linearity of the response and
not the state transition function? Goes back to a system being
intrinsically defined by its inputs and outputs. Often states, you can have
many different ways to define states. Typically we can't see all of
them. It's accepted that when we talk about a system and think about its
I/O relations, it makes sense that we define linearity in terms of this
memory function of the system, as opposed to the state transition function.</p>
<p>Let's just say a few remarks about this: <strong>zero-input response</strong>,
<strong>zero-state response</strong>. If we look at the zero element in our spaces (so
we have a zero vector), then we can take our superposition, which implies
that the response at time <mathjax>$t$</mathjax> is equal to the zero-state response, which is
the response, given that we started at the zero state, plus the zero input
response.</p>
<p>That is: <mathjax>$\rho(t, t_0, x_0, u) = \rho(t, t_0, \theta_x, u) + \rho(t, t_0,
x_0, \theta_u)$</mathjax> (from the definition of linearity).</p>
<p>The second remark is that the zero-state response is linear in the input,
and similarly, the zero-input response is linear in the state.</p>
<p>One more property of dynamical systems before we finish: <strong>equivalence</strong> (a
property derived from the definition). Take two dynamical systems <mathjax>$D = (U,
\Sigma, Y, s, r), \tilde{D} = (U, \bar{\Sigma}, Y, \bar{s}, \bar{r})$</mathjax>. A state <mathjax>$x_0
\in \Sigma$</mathjax> is equivalent to <mathjax>$\tilde{x_0} \in \bar{\Sigma}$</mathjax> at <mathjax>$t_0$</mathjax> if <mathjax>$\forall t
\ge t_0$</mathjax> and all inputs <mathjax>$u$</mathjax>, <mathjax>$\rho(t, t_0, x_0, u) = \tilde{\rho}(t, t_0, \tilde{x_0}, u)$</mathjax>.
If this holds <mathjax>$\forall x_0$</mathjax> and some corresponding <mathjax>$\tilde{x_0}$</mathjax> (and vice versa), the two systems are equivalent.</p>
<p><a name='10'></a></p>
<h1>Linear time-varying systems</h1>
<h2>September 25, 2012</h2>
<p>Recall the state transition function gives the state at the current
time as a function of the initial state, initial time, and inputs. Suppose you have a
differential equation; how do you acquire the state transition function?
Solve the differential equation.</p>
<p>For a general dynamical system, there are different ways to get the state
transition function. This is an instantiation of a dynamical system, and
we're going to get the state transition function by solving the
differential equation / initial condition pair.</p>
<p>We're going to call <mathjax>$\dot{x}(t) = A(t)x(t) + B(t)u(t)$</mathjax> a vector
differential equation with initial condition <mathjax>$x(t_0) = x_0$</mathjax>.</p>
<p>So that requires us to think about solving that differential equation. Do a
dimension check, to make sure we know the dimensions of the matrices. <mathjax>$x
\in \Re^n$</mathjax>, so <mathjax>$A \in \Re^{n \times n}$</mathjax>. We could define the matrix
function <mathjax>$A$</mathjax>, which maps points of the real line over to
matrices. As a function, <mathjax>$A$</mathjax> is a piecewise continuous matrix function in
time.</p>
<p>The entries are piecewise-continuous scalars in time. We would like to get
at the state transition function; to do that, we need to solve the
differential equation.</p>
<p>Let's assume for now that <mathjax>$A, B, U$</mathjax> are given (part of the system
definition).</p>
<p>Piecewise continuity in time is trivial to check; for the Lipschitz condition we can
use the induced norm of <mathjax>$A(t)$</mathjax>. Since this induced norm is piecewise continuous in
time, this is a fine bound. Therefore <mathjax>$f$</mathjax> is globally Lipschitz continuous.</p>
<p>We're going to back off for a bit and introduce the state transition
matrix. Background for solving the VDE. We're going to introduce a matrix
differential equation, <mathjax>$\dot{X} = A(t) X$</mathjax> (where <mathjax>$A(t)$</mathjax> is same as before).</p>
<p>I'm going to define <mathjax>$\Phi(t, t_0)$</mathjax> as the solution to the matrix
differential equation (MDE) for the initial condition <mathjax>$\Phi(t_0, t_0) =
1_{n \times n}$</mathjax>. That is, <mathjax>$\Phi$</mathjax> is the solution to the <mathjax>$n
\times n$</mathjax> matrix differential equation when it starts out at the identity
matrix.</p>
<p>Let's first talk about properties of this matrix <mathjax>$\Phi$</mathjax> just from the
definition we have.</p>
<ul>
<li>If you go back to the vector differential equation, and let's just drop
the term that depends on <mathjax>$u$</mathjax> (either consider <mathjax>$B$</mathjax> to be 0, or the input
to be 0), the solution of <mathjax>$\dot{x} = A(t)x(t)$</mathjax> is given by <mathjax>$x(t) =
\Phi(t, t_0)x_0$</mathjax>.</li>
<li>This is what we call the semigroup property, since it's reminiscent of
the semigroup axiom. <mathjax>$\Phi(t, t_0) = \Phi(t, t_1) \Phi(t_1, t_0) \forall
t, t_0, t_1 \in \Re^+$</mathjax></li>
<li><mathjax>$\Phi^{-1}(t, t_0) = \Phi(t_0, t)$</mathjax>.</li>
<li><mathjax>$\text{det} \Phi(t, t_0) = \exp\parens{\int_{t_0}^t \text{tr} \parens{A
(\tau)} d\tau}$</mathjax> (properties (2)-(4) are checked numerically in the sketch after this list).</li>
</ul>
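<p>Here is a numerical sketch of properties (2)-(4) for an assumed
time-varying <mathjax>$A(t)$</mathjax>, with <mathjax>$\Phi$</mathjax> computed by solving the matrix
differential equation from the identity:</p>
<pre><code>import numpy as np
from scipy.integrate import solve_ivp

# Assumed example A(t); note tr A(t) = -0.5 for all t.
A = lambda t: np.array([[0.0, 1.0], [-np.cos(t), -0.5]])

def Phi(t, t0):
    """Solve dX/dt = A(t) X, X(t0) = I, and return X(t)."""
    rhs = lambda s, X: (A(s) @ X.reshape(2, 2)).ravel()
    sol = solve_ivp(rhs, (t0, t), np.eye(2).ravel(), rtol=1e-11, atol=1e-13)
    return sol.y[:, -1].reshape(2, 2)

t0, t1, t2 = 0.0, 1.0, 2.0
# (2) semigroup property: Phi(t2, t0) = Phi(t2, t1) Phi(t1, t0)
print(np.linalg.norm(Phi(t2, t0) - Phi(t2, t1) @ Phi(t1, t0)))   # ~0
# (3) inverse: Phi(t0, t1) = Phi(t1, t0)^{-1}
print(np.linalg.norm(Phi(t0, t1) - np.linalg.inv(Phi(t1, t0))))  # ~0
# (4) det Phi(t, t0) = exp(integral of tr A) = exp(-0.5 (t - t0))
print(np.linalg.det(Phi(t2, t0)) - np.exp(-0.5 * (t2 - t0)))     # ~0</code></pre>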
<p>Here let's talk about some machinery we can now invoke when
we want to show that two functions of time are equal to each other when
they're both solutions to the differential equation. You can simply show by
the existence and uniqueness theorem (assuming it applies) that they
satisfy the same initial condition and the same differential
equation. That's an important point, and we tend to use it a lot.</p>
<p>(i.e. when faced with showing that two functions of time are equal to each
other, you can show that they both satisfy the same initial condition and
the same differential equation [as long as the differential equation
satisfies the hypotheses of the existence and uniqueness theorem])</p>
<p>Obvious, but good to state.</p>
<p>Note: the initial condition doesn't have to be the initial condition given;
it just has to hold at one point in the interval. Pick your point in time
judiciously.</p>
<p>Proof of (2): check <mathjax>$t=t_1$</mathjax>. (3) follows directly from (2). (4) you can
look at if you want. Gives you a way to compute <mathjax>$\Phi(t, t_0)$</mathjax>. We've
introduced a matrix differential equation and an abstract solution.</p>
<p>Consider (1). <mathjax>$\Phi(t, t_0)$</mathjax> is a map that takes the initial state and
transitions to the new state. Thus we call <mathjax>$\Phi$</mathjax> the <strong>state transition
matrix</strong> because of what it does to the states of this vector differential
equation: it transfers them from their initial value to their final value,
and it transfers them through matrix multiplication.</p>
<p>Let's go back to the original differential equation. Claim that the
solution to that differential equation has the following form: <mathjax>$x(t) =
\Phi(t, t_0)x_0 + \int_{t_0}^t \Phi(t, \tau)B(\tau)u(\tau) d\tau$</mathjax>. Proof:
we can use the same machinery. If someone gives you a candidate solution,
you can easily show that it is the solution.</p>
<p>Recall the Leibniz rule, which we'll state in general as follows:
<mathjax>$\pderiv{}{z} \int_{a(z)}^{b(z)} f(x, z) dx = \int_{a(z)}^{b(z)}
\pderiv{}{z}f(x, z) dx + \pderiv{b}{z} f(b, z) - \pderiv{a}{z} f(a, z)$</mathjax>.</p>
<p><mathjax>$$
\dot{x}(t) = A(t) \Phi(t, t_0) x_0 + \int_{t_0}^t
\pderiv{}{t} \parens{\Phi(t, \tau)B(\tau)u(\tau)} d\tau +
\pderiv{t}{t}\parens{\Phi(t, t)B(t)u(t)} - \pderiv{t_0}{t}\parens{...}
\\ = A(t)\Phi(t, t_0)x_0 + \int_{t_0}^t A(t)\Phi(t,\tau)B(\tau)u(\tau)d\tau + B(t)u(t)
\\ = A(t)\Phi(t, t_0) x_0 + A(t)\int_{t_0}^t \Phi(t, \tau)B(\tau)
u(\tau) d\tau + B(t) u(t)
\\ = A(t)\parens{\Phi(t, t_0) x_0 + \int_{t_0}^t \Phi(t, \tau)B(\tau)
u(\tau) d\tau} + B(t) u(t)
$$</mathjax></p>
<p><mathjax>$x(t) = \Phi(t,t_0)x_0 + \int_{t_0}^t \Phi(t,\tau)B(\tau)u(\tau) d\tau$</mathjax> is
good to remember.</p>
<p>Not surprisingly, it depends on the input function over an interval of
time.</p>
<p>The differential equation is changing over time, therefore the system
itself is time-varying. No way in general that will be time-invariant,
since the equation that defines its evolution is changing. You test
time-invariance or time variance through the response map. But is it
linear? You have the state transition function, so we can compute the
response function (recall: readout map composed with the state transition
function) and ask if this is a linear map.</p>
<p><a name='11'></a></p>
<h1>Linear time-Invariant systems</h1>
<h2>September 27, 2012</h2>
<p>Last time, we talked about the time-varying differential equation, and we
expressed <mathjax>$R(\cdot) = \bracks{A(\cdot), B(\cdot), C(\cdot),
D(\cdot)}$</mathjax>. We used the state transition matrix to show that the solution was
given by <mathjax>$x(t) = \Phi(t, t_0) x_0 + \int_{t_0}^t \Phi(t, \tau) B(\tau) u(\tau)
d\tau$</mathjax>. The integrand involves the state transition matrix, and we haven't
talked about how we would compute this matrix. In general, computing the
state transition matrix is hard. But there's one important class where
computing it becomes much simpler than usual. That is where the
system does not depend on time.</p>
<p>Linear time-invariant case: <mathjax>$\dot{x} = Ax + Bu, y = Cx + Du, x(t_0) =
x_0$</mathjax>. Does not matter at what time we start. Typically, WLOG, we use <mathjax>$t_0 =
0$</mathjax> (we can't do this in the time-varying case).</p>
<h2>Aside: Jacobian linearization</h2>
<p>In practice, it's generally the case that someone
doesn't present you with a model that looks like this. Usually, you derive
this (usually nonlinear) model through physics and whatnot. What can I do
to come up with a linear representation of that system? What is typically
done is an approximation technique called Jacobian linearization.</p>
<p>So suppose someone gives you a nonlinear system and an output equation,
and you want to come up with some linear representation of the system.</p>
<p>Two points of view: we could look at the system, and suppose we applied a
particular input to the system and solve the differential equation
(<mathjax>$u^0(t) \mapsto x^0(t)$</mathjax>, the <strong>nominal input</strong> and <strong>nominal
solution</strong>). That would result in a solution (<strong>state trajectory</strong>, in
general). Now suppose that we for some reason want to perturb that input
(<mathjax>$u^0(t) + \delta u(t)$</mathjax>, the <strong>perturbed input</strong>). Suppose in general
that <mathjax>$\delta u$</mathjax> is a small perturbation. What this results in is a new
state trajectory, that we'll define as <mathjax>$x^0(t) + \delta x(t)$</mathjax>, the
<strong>perturbed solution</strong>.</p>
<p>Now we can derive from that what we call the Jacobian linearization. That
tells us that if we apply the nominal input, the nominal solution satisfies <mathjax>$\dot{x}^0 =
f(x^0, u^0, t)$</mathjax>, and I also have that <mathjax>$x^0(t_0) = x_0$</mathjax>.</p>
<p><mathjax>$\dot{x}^0 + \dot{\delta x} = f(x^0 + \delta x, u^0 + \delta u, t)$</mathjax>, where
<mathjax>$(x^0 + \delta x)(t_0) = x_0 + \delta x_0$</mathjax>. Now I'm going to look at these
two and perform a Taylor expansion about the nominal input and
solution. Thus <mathjax>$f(x^0 + \delta x, u^0 + \delta u, t) = f(x^0, u^0, t) +
\pderiv{}{x} f(x, u, t)\vert_{(x^0, u^0)}\delta x +
\pderiv{}{u}f(x,u,t)\vert_{(x^0, u^0)} \delta u + \text{higher order
terms}$</mathjax> (recall that we also called <mathjax>$\pderiv{}{x}$</mathjax> <mathjax>$D_1$</mathjax>, i.e. the
derivative with respect to the first argument).</p>
<p>What I've done is expanded the right hand side of the differential
equation. Thus <mathjax>$\dot{\delta x} = \pderiv{}{x} f(x, u, t)\vert_{(x^0, u^0)} \delta
x + \pderiv{}{u} f(...)\vert_{(x^0, u^0)}\delta u + ...$</mathjax>. If <mathjax>$\delta u,
\delta x$</mathjax> are small, then we can assume the higher-order terms are approximately zero, which
gives us an approximate first-order linear differential equation. This
gives us a linear time-varying approximation of the dynamics of this
perturbation vector, in response to a perturbation input. That's what the
Jacobian linearization gives you: the perturbation away from the nominal
(we linearized about a bias point).</p>
<p>Consider A(t) to be the Jacobian matrix with respect to x, and B(t) to be
the Jacobian matrix with respect to u. Remember that this is an
approximation, and if your system is really nonlinear, and you perturb the
system a lot (stray too far from the bias point), then this linearization
may cease to hold.</p>
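<p>As a sketch of this procedure (with assumed pendulum-like dynamics, matching
the example later in these notes), the Jacobians can be approximated by
finite differences about the nominal point:</p>
<pre><code>import numpy as np

# Assumed dynamics: x = (theta, theta_dot), u = torque/(m l^2),
# Omega^2 = g/l, and f(x, u) = (x2, Omega^2 sin(x1) + u).
Omega2 = 9.81
f = lambda x, u: np.array([x[1], Omega2 * np.sin(x[0]) + u])

def jacobians(f, x0, u0, eps=1e-6):
    """Finite-difference approximations of A = df/dx and B = df/du at (x0, u0)."""
    fx0 = f(x0, u0)
    A = np.column_stack([(f(x0 + eps * e, u0) - fx0) / eps
                         for e in np.eye(len(x0))])
    B = ((f(x0, u0 + eps) - fx0) / eps).reshape(-1, 1)
    return A, B

# Linearize about the upright equilibrium x0 = 0, u0 = 0.
A, B = jacobians(f, np.zeros(2), 0.0)
print(A)   # ~[[0, 1], [Omega2, 0]]
print(B)   # ~[[0], [1]]</code></pre>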
<h2>Linear time-invariant systems</h2>
<p>Motivated by the fact that we have a solution to the time-varying equation,
it depends on the state transition matrix, which right now is an abstract
thing which we don't have a way of solving. Let's go to a more specific
class of systems: that where <mathjax>$A, B, C, D$</mathjax> do not depend on time. We know
that this system is linear (we don't know yet that it is time-invariant; we
have to find the response function and show that it satisfies the
definition of a time-invariant system), so this still requires proof.</p>
<p>Since these don't depend on time, we can use some familiar tools
(e.g. Laplace transforms) and remember what taking the Laplace transform of
a derivative is. Denote <mathjax>$\hat{x}(s)$</mathjax> to be the Laplace transform of
<mathjax>$x(t)$</mathjax>. The transformed system is therefore <mathjax>$s\hat{x}(s) - x_0 = A\hat{x}(s)
+ B\hat{u}(s)$</mathjax>; <mathjax>$\hat{y}(s) = C\hat{x}(s) + D\hat{u}(s)$</mathjax>. The first
equation becomes <mathjax>$(sI - A)\hat{x}(s) = x_0 + B\hat{u}(s)$</mathjax>, and we'll leave
the second equation alone.</p>
<p>Let's first consider <mathjax>$\dot{x} = Ax$</mathjax>, <mathjax>$x(0) = x_0$</mathjax>. I could have done the
same thing, except my right hand side doesn't depend on <mathjax>$B$</mathjax>: <mathjax>$(sI -
A)\hat{x}(s) = x_0$</mathjax>. Let's leave that for a second and come back to it, and
make the following claim: the state transition matrix for <mathjax>$\dot{x} = Ax,
x(t_0) = x_0$</mathjax> is <mathjax>$\Phi(t,t_0) = e^{A(t-t_0)}$</mathjax>, which is called the <strong>matrix
exponential</strong>, defined as <mathjax>$e^{A(t-t_0)} = I + A(t-t_0) + \frac{A^2(t-t_0)^2}{2!}
+ ...$</mathjax> (Taylor expansion of the exponential function).</p>
<p>We just need to show that the state transition matrix, using definitions we
had last day, is indeed the state transition matrix for that system. We
could go back to the definition of the state transition matrix for the
system, or we could go back to the state transition function for the vector
differential equation.</p>
<p>From last time, we know that the solution to <mathjax>$\dot{x} = A(t)x, x(t_0) = x_0$</mathjax>
is given by <mathjax>$x(t) = \Phi(t, t_0)x_0$</mathjax>; here, we are claiming then that <mathjax>$x(t)
= e^{A(t - t_0)} x_0$</mathjax>, where <mathjax>$x(t)$</mathjax> is the solution to <mathjax>$\dot{x} = Ax$</mathjax> with
initial condition <mathjax>$x_0$</mathjax>.</p>
<p>First show that it satisfies the vector differential equation: <mathjax>$\dot{x} =
\pderiv{}{t}\exp\parens{A(t-t_0)} x_0 = (0 + A + A^2(t - t_0) + ...)x_0 =
A(I + A(t-t_0) + \frac{A^2}{2}(t-t_0)^2 + ...) x_0 = Ae^{A(t-t_0)} x_0 = Ax(t)$</mathjax>,
so it satisfies the differential equation. Checking the initial condition,
we get <mathjax>$e^{A \cdot 0}x_0 = I x_0 = x_0$</mathjax>. We've proven that this represents
the solution to this time-invariant differential equation. By the existence
and uniqueness theorem, this is the same solution.</p>
<p>Through this proof, we've shown a couple of things: the derivative of the
matrix exponential, and we evaluated it at <mathjax>$t-t_0=0$</mathjax>. So now let's go back
and reconsider its infinite series representation and classify some of its
other properties.</p>
<h2>Properties of the matrix exponential</h2>
<ul>
<li><mathjax>$e^0 = I$</mathjax></li>
<li><mathjax>$e^{A(t+s)} = e^{At}e^{As}$</mathjax></li>
<li><mathjax>$e^{(A+B)t} = e^{At}e^{Bt}$</mathjax> iff <mathjax>$\comm{A}{B} = 0$</mathjax>.</li>
<li><mathjax>$\parens{e^{At}}^{-1} = e^{-At}$</mathjax>, and these properties hold in general if
you're looking at <mathjax>$t$</mathjax> or <mathjax>$t - t_0$</mathjax>.</li>
<li><mathjax>$\deriv{e^{At}}{t} = Ae^{At} = e^{At}A$</mathjax> (i.e. <mathjax>$\comm{e^{At}}{A} = 0$</mathjax>); see the numerical check after this list.</li>
<li>Suppose <mathjax>$X(t) \in \Re^{n \times n}$</mathjax>, <mathjax>$\dot{X} = AX, X(0) = I$</mathjax>, then the
solution of this matrix differential equation and initial condition pair
is given by <mathjax>$X(t) = e^{At}$</mathjax>. Proof in the notes; very similar to what we
just did (more general proof, that the state transition matrix is just
given by the matrix exponential).</li>
</ul>
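<p>A quick numerical check of a few of these properties (random matrices,
assumed example), using <code>scipy.linalg.expm</code>:</p>
<pre><code>import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
t, s = 0.7, 0.4

# e^{A(t+s)} = e^{At} e^{As}
print(np.allclose(expm(A * (t + s)), expm(A * t) @ expm(A * s)))   # True
# (e^{At})^{-1} = e^{-At}
print(np.allclose(np.linalg.inv(expm(A * t)), expm(-A * t)))       # True
# e^{(A+B)t} = e^{At} e^{Bt} generally fails when A and B do not commute...
print(np.allclose(expm((A + B) * t), expm(A * t) @ expm(B * t)))   # False
# ...but holds when they commute, e.g. C = A^2.
C = A @ A
print(np.allclose(expm((A + C) * t), expm(A * t) @ expm(C * t)))   # True</code></pre>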
<h2>Calculating <mathjax>$e^{At}$</mathjax>, given <mathjax>$A$</mathjax></h2>
<p>What this is now useful for is making more concrete this state transition
concept. Still a little abstract, since we're still considering the
exponential of a matrix.</p>
<p>The first point is that using the infinite series representation to compute
<mathjax>$e^{At}$</mathjax> is in general hard.</p>
<p>It would be doable if you knew <mathjax>$A$</mathjax> were nilpotent (<mathjax>$A^k = 0$</mathjax> for some
positive integer <mathjax>$k$</mathjax>), but it's not always feasible, and would not be feasible if <mathjax>$k$</mathjax> were
large.</p>
<p>The way one usually computes the state transition matrix <mathjax>$e^{At}$</mathjax> is as
follows:</p>
<p>Recall: <mathjax>$\dot{X}(t) = AX(t)$</mathjax>, with <mathjax>$X(0) = I$</mathjax>. We know from what we've done
before (property 6) that we can easily prove <mathjax>$X(t) = e^{At}$</mathjax>. We also know
that <mathjax>$(sI - A)\hat{X}(s) = I$</mathjax>, so <mathjax>$\hat{X}(s) = (sI - A)^{-1}$</mathjax>. That tells
me that <mathjax>$e^{At} = \mathcal{L}^{-1}\parens{(sI - A)^{-1}}$</mathjax>. That gives us a
way of computing <mathjax>$e^{At}$</mathjax>, assuming we have a way to compute a matrix's
inverse and an inverse Laplace transform. This is what people usually do,
and most algorithms approach the problem this way. Generally hard to
compute the inverse and the inverse Laplace transform.</p>
<p>It requires proof that <mathjax>$sI - A$</mathjax> always has an inverse, and that this
inverse is the Laplace transform of <mathjax>$e^{At}$</mathjax>.</p>
<p>Cleve Moler started LINPACK (the linear algebra package; the engine behind
MATLAB). Famous in computational linear algebra. Paper: "Nineteen Dubious Ways
to Compute the Exponential of a Matrix". Actually a hard problem in
general, related to factoring <mathjax>$n$</mathjax>-degree polynomials.</p>
<p>If we were to consider our simple nilpotent case, we'll compute <mathjax>$sI - A =
\begin{bmatrix}s &amp; -1 \\ 0 &amp; s\end{bmatrix}$</mathjax>. We can immediately write down
its inverse as <mathjax>$\begin{bmatrix}\frac{1}{s} &amp; \frac{1}{s^2} \\ 0 &amp;
\frac{1}{s}\end{bmatrix}$</mathjax>. Inverse Laplace transform takes no work; it's
simply <mathjax>$\begin{bmatrix}1 &amp; t \\ 0 &amp; 1\end{bmatrix}$</mathjax>.</p>
<p>In the next lecture (and next series of lectures) we will be talking about
the Jordan form of a matrix. We have a way to compute <mathjax>$e^{At}$</mathjax>. We'll write
<mathjax>$A = TJT^{-1}$</mathjax>. In its simplest case, it's diagonal. Either way, all of the
work is in exponentiating <mathjax>$J$</mathjax>. You still end up taking the
inverse Laplace transform of <mathjax>$(sI - J)^{-1}$</mathjax>.</p>
<p>We've shown that for a linear TI system, <mathjax>$\dot{x} = Ax + Bu$</mathjax>; <mathjax>$y = Cx + Du$</mathjax>
(<mathjax>$x(0) = x_0$</mathjax>). <mathjax>$x(t) = e^{At}x_0 + \int_0^t e^{A(t-\tau)} Bu(\tau)
d\tau$</mathjax>. We proved it last time, but you can check this satisfies the
differential equation and initial condition.</p>
<p>From that, you can compute the response function and show that it's
time-invariant. Let's conclude today's class with a planar inverted
pendulum. Let's call the angle of rotation away from the vertical <mathjax>$\theta$</mathjax>,
mass <mathjax>$m$</mathjax>, length <mathjax>$\ell$</mathjax>, and torque <mathjax>$\tau$</mathjax>. Equations of motion: <mathjax>$m\ell^2
\ddot{\theta} - mg\ell \sin \theta = \tau$</mathjax>. Perform Jacobian
linearization; <mathjax>$\theta = 0$</mathjax> is the upright position, and we're linearizing
about the trivial trajectory where the pendulum is straight up. Therefore
<mathjax>$\delta \theta = \theta \implies m\ell^2 \ddot{\theta} - mg\ell\theta
= \tau$</mathjax>, and with <mathjax>$u = \frac{\tau}{m\ell^2}$</mathjax> and <mathjax>$\Omega^2 = \frac{g}{\ell}$</mathjax>,
<mathjax>$\dot{x}_1 = x_2$</mathjax> and <mathjax>$\dot{x}_2 = \Omega^2 x_1 + u$</mathjax>.</p>
<p>With <mathjax>$y = \theta = x_1$</mathjax>: <mathjax>$\dot{x}_1 = x_2, \dot{x}_2 = \Omega^2 x_1 + u, y =
x_1$</mathjax>. Stabilization of the system via feedback by considering poles of the Laplace
transform, etc. <mathjax>$\frac{\hat{y}}{\hat{u}} = \frac{1}{s^2 - \Omega^2} =
G(s)$</mathjax> (the plant).</p>
<p>In general, not a good idea: canceling an unstable pole and then using
feedback. In the notes, this is some controller <mathjax>$K(s)$</mathjax>. If we look at the
open-loop transfer function (<mathjax>$K(s)G(s) = \frac{1}{s(s+\Omega)}$</mathjax>), <mathjax>$u =
\frac{s-\Omega}{s}\bar{u}$</mathjax>, so <mathjax>$\dot{u} = \dot{\bar{u}} - \Omega\bar{u}$</mathjax>
(assume zero initial conditions on <mathjax>$u, \bar{u}$</mathjax>). If we define a third
state variable now, <mathjax>$x_3 = \bar{u} - u$</mathjax>, then that tells us that <mathjax>$\dot{x}_3
= \Omega \bar{u}$</mathjax>. Here, I have <mathjax>$A = \begin{bmatrix} 0 &amp; 1 &amp; 0 \\ \Omega^2
&amp; 0 &amp; -1 \\ 0 &amp; 0 &amp; 0 \end{bmatrix}$</mathjax>, <mathjax>$B = \begin{bmatrix}0 \\ 1 \\
\Omega\end{bmatrix}$</mathjax>, <mathjax>$C = \begin{bmatrix}1 &amp; 0 &amp; 0\end{bmatrix}$</mathjax>, <mathjax>$D =
0$</mathjax>. Out of time today, but we'll solve this at the beginning of Tuesday's class.</p>
<p>Solve for <mathjax>$x(t) = \begin{bmatrix}x_1, x_2, x_3\end{bmatrix}$</mathjax>. We have a few
approaches:</p>
<ul>
<li>Using <mathjax>$A,B,C,D$</mathjax>: compute the following: <mathjax>$y(t) = Ce^{At} x_0 + C\int_0^t
e^{A(t - \tau)}Bu(\tau) d\tau$</mathjax>. In doing that, we'll need to compute
<mathjax>$e^{At}$</mathjax>, and then we have this expression for general <mathjax>$u$</mathjax>: suppose you
supply a step input.</li>
<li>Suppose <mathjax>$\bar{u} = -y = -Cx$</mathjax>. Therefore <mathjax>$\dot{x} = Ax + B(-Cx) = (A -
BC)x$</mathjax>. We have a new <mathjax>$A_{CL} = A - BC$</mathjax>, and we can exponentiate this
instead.</li>
</ul>
<p>This foreshadows later material, when we think about control, and introduces the standard
notion of feedback for stabilizing systems. Using our newfound knowledge of the
state transition matrix for TI systems (and how to compute it), we can see how to
compute the response, and see what MATLAB is doing.</p>
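<p>A sketch of the second approach in code (in Python rather than MATLAB, with
an assumed <mathjax>$\Omega = 1$</mathjax>): build <mathjax>$A_{CL} = A - BC$</mathjax>, inspect its
eigenvalues, and compute the zero-input response via the matrix exponential.</p>
<pre><code>import numpy as np
from scipy.linalg import expm

Om = 1.0                                   # assumed value of Omega
A = np.array([[0, 1, 0], [Om**2, 0, -1], [0, 0, 0]], dtype=float)
B = np.array([[0.0], [1.0], [Om]])
C = np.array([[1.0, 0.0, 0.0]])

# Close the loop with u_bar = -y = -Cx.
A_CL = A - B @ C
print(np.linalg.eigvals(A_CL))   # one eigenvalue at +Om: the cancelled
                                 # unstable pole survives as a closed-loop mode

# Zero-input response x(t) = e^{A_CL t} x0.
x0 = np.array([0.1, 0.0, 0.0])
print(expm(A_CL * 5.0) @ x0)     # grows like e^{Om t}</code></pre>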
<p><a name='12'></a></p>
<h1>Computing the Matrix Exponential; Dyadic Expansion</h1>
<h2>October 2, 2012</h2>
<p>Today: example computing <mathjax>$x(t)$</mathjax> using <mathjax>$e^{At}$</mathjax>. Sastry guest lecturing on
Thursday; LN11 is linear quadratic optimization.</p>
<p>Date for midterm: in class, Tuesday Oct. 16 (two weeks from today). It's
going to cover material up to and including what we cover in homework
assignments (e.g. won't have anything on linear quadratic optimization;
will probably be everything up to what we finish this week: up to lecture
notes 13, not including lecture notes 11). Sample midterms, which will be
posted; next Friday, Oct. 12, will be midterm review in section. On Monday,
Oct. 15, Insoon will have extended office hours. One more homework: #4,
posted tonight, due before midterm (so likely next Thursday).</p>
<p>In general, you'll compute the matrix exponential via the inverse Laplace
transform of <mathjax>$(sI - A)^{-1}$</mathjax>. Similarity transformation: <mathjax>$TJT^{-1}$</mathjax>. We
don't in general talk about just <mathjax>$e^{At}$</mathjax>, but rather functions of a matrix <mathjax>$A$</mathjax>
which can be represented in its Jordan form.</p>
<p>Continuation of example. We recall that we're working in the realm of LTI
systems described by the matrices <mathjax>$\dot{x} = Ax + Bu, y = Cx +
Du$</mathjax>. Remember the solution given by the convolution (so to speak) of the
input with the matrix exponential.</p>
<p>We set up this model with the inverted pendulum, and we considered the
open-loop representation. We proposed a controller, <mathjax>$k(s)$</mathjax>, and then we
wrote out the system dynamics. We then defined a new state variable to
represent the dynamics of the controller, so we had three first-order
differential equations where the right-hand side is just a function of the
inputs.</p>
<p>We then began to consider a closed-loop representation of the system. If we
write the system in terms of the transfer functions, what we're doing is
measuring the output and feeding it back through negative feedback to the
input. If we look at the new state update equation and the output equation,
we recognize that even though the matrices are different, they still have
the same dimensions.</p>
<p>Computed a few things in MATLAB: CL step response. What we see is that it
has a step response indicative of a nice, stable system, but that's the
closed loop zero-state response. If we consider the zero-input response, we
find that the solution diverges.</p>
<p>So what's going on here? We can go back and look at the system. At some
times, we can compute <mathjax>$x_{CL}(t)$</mathjax>.</p>
<p>Once we compute the matrix's inverse, we consider the partial fraction
decomposition of the elements of the inverse. In general, you would have to
do that for all nine terms (but in this case, there are repeated terms, so
you'd have to do this just three times). You can perform this partial
fraction expansion, and then you have to remember what the inverse Laplace
transform is.</p>
<p>In our problem, we're asking for less: we only need to compute the first
column (which is actually still everything).</p>
<p>One of the things you're tempted to do as an undergraduate in controls is
attempt to cancel out your poles, but it is impossible to get perfect
pole-zero cancellation.</p>
<p>Back to the main point: we're in general faced with computing the matrix
exponential, and taking the inverse Laplace transform is hard. The idea is
to try to represent A in a canonical form to simply compute what that
Laplace transform is, if possible.</p>
<h2>General: Functions of a matrix <mathjax>$A \in \mathbb{C}^{n\times n}$</mathjax></h2>
<p>(Goal: find simpler ways to compute <mathjax>$e^{At}$</mathjax>, but in general, functions of
a matrix)</p>
<p>We started with LTI systems; we know the solution in terms of the state
transition matrix is given by <mathjax>$e^{At}x_0$</mathjax>. More for terminology than
anything, we're going to write the inverse of the matrix <mathjax>$sI - A$</mathjax> in the
following form: the ratio of something called the adjugate of <mathjax>$sI - A$</mathjax> over
the determinant, i.e. <mathjax>$\frac{\mathrm{adj} (sI - A)}{\det (sI - A)}$</mathjax>. I'm
going to define the <strong>characteristic polynomial</strong> of <mathjax>$A$</mathjax> as
<mathjax>$\hat{\chi}_A(s) \defequals \det (sI - A)$</mathjax> (so it will look like <mathjax>$s^n +
\alpha_1 s^{n-1} + ... + \alpha_n$</mathjax>). The <strong>adjugate</strong> is given by a
polynomial <mathjax>$B_0 s^{n-1} + B_1 s^{n-2} + ... + B_{n-2} s + B_{n-1}$</mathjax>, where
<mathjax>$B_i \in \mathbb{C}^{n \times n}$</mathjax>. We define this matrix as follows:
<mathjax>$\mathrm{adj} (sI - A) = C^T$</mathjax>, where <mathjax>$C_{ij} = (-1)^{i+j} M_{ij}$</mathjax> (where
<mathjax>$M_{ij}$</mathjax> is the determinant of the <mathjax>$(n-1)\times(n-1)$</mathjax> matrix formed by deleting row <mathjax>$i$</mathjax> and
column <mathjax>$j$</mathjax>).</p>
<p>What we really want to do is define a notation here for understanding the
Cayley-Hamilton theorem. If we had to, we could compute it, but let's go
back and use the notation.</p>
<p>The theorem we're going to state here is the <strong>Cayley-Hamilton theorem</strong>,
which states that every <mathjax>$n \times n$</mathjax> matrix satisfies its own
characteristic polynomial. What does that mean? If you set the
characteristic polynomial of the matrix to 0, and everywhere you see <mathjax>$s$</mathjax>
you plug in the matrix, that will sum up to 0.</p>
<p>Given the setup here, we can easily prove this. With the definition of an
inverse, let's multiply both sides (of our definition of the inverse) by
<mathjax>$sI - A$</mathjax> and the characteristic polynomial. This will yield the equivalent
expression that <mathjax>$\mathrm{adj}(sI - A)(sI - A) = \hat{\chi}_A(s) I$</mathjax>. Just
matching coefficients, we can write out the <mathjax>$B_i$</mathjax>. From that, we can write
out the polynomial and simply work through the math. All we used in the
proof of that is just the general form of the matrix in terms of <mathjax>$sI - A$</mathjax>.</p>
<p>Really important result, since we can use this to say general things. This
tells us something immediately about polynomials of matrices: if you were
looking at some polynomial of <mathjax>$A$</mathjax> and wanted to simplify it, we can rewrite
powers of <mathjax>$A$</mathjax> of degree <mathjax>$n$</mathjax> or higher in terms of lower powers. If you
had general k-degree polynomials in <mathjax>$s$</mathjax>, <mathjax>$\hat{p}_1, \hat{p}_2$</mathjax>, then if we
divide <mathjax>$\hat{p}_1$</mathjax> by <mathjax>$\hat{\chi}_A$</mathjax>, then we can write these as <mathjax>$\hat{q}_1
\chi_A + \hat{r}_1$</mathjax> and <mathjax>$\hat{q}_2 \chi_A + \hat{r}_2$</mathjax>. Even if the two are
not equal, if the remainders are the same (i.e. <mathjax>$\hat{r}_1 = \hat{r}_2$</mathjax>),
if you evaluate the two on the matrix A, <mathjax>$\hat{p}_1(A) = \hat{p}_2
(A)$</mathjax>. Just a result of the Cayley-Hamilton theorem. That tells us that
every polynomial function of <mathjax>$A$</mathjax> can be written as a function of <mathjax>$I, A,
A^2, ..., A^{n-1}$</mathjax>.</p>
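<p>A numerical check of the theorem on a random matrix (assumed example):</p>
<pre><code>import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))

# np.poly(A) returns the coefficients [1, a1, ..., an] of the
# characteristic polynomial chi_A(s) = det(sI - A).
coeffs = np.poly(A)

# Evaluate chi_A(A) = A^n + a1 A^{n-1} + ... + an I; should vanish.
n = len(coeffs) - 1
chi_A = sum(c * np.linalg.matrix_power(A, n - i) for i, c in enumerate(coeffs))
print(np.linalg.norm(chi_A))   # ~0 up to rounding</code></pre>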
<p>In general, interesting functions are not necessarily polynomials, and we
have an infinite series representation. How do we compute functions of a
matrix A? In order to do that, we're going to have to think about two
cases: we're going to look at the matrix A and compute its eigenvalues and
eigenvectors. We've mentioned that before; we'll have a quick review of
computing eigenvalues and eigenvectors. We'll break this up into two cases:
we have <mathjax>$n$</mathjax> linearly independent eigenvectors, and we have less than <mathjax>$n$</mathjax>
linearly independent eigenvectors. In the first case, <mathjax>$A$</mathjax> is
diagonalizable, meaning there exists a similarity transformation that can
compute a diagonalization of the matrix where the diagonal elements are the
eigenvalues, and in the latter case, <mathjax>$A$</mathjax> can be represented in Jordan form.</p>
<p>If <mathjax>$A$</mathjax> has <mathjax>$n$</mathjax> distinct eigenvalues, that implies that <mathjax>$A$</mathjax> has <mathjax>$n$</mathjax> linearly
independent eigenvectors (but the reverse implication is not true!).</p>
<p>Next: Sastry will talk about diagonalizable case, and he'll probably get
through part of lecture notes 13.</p>
<p><a name='13'></a></p>
<h1>Eigenvalues, etc.</h1>
<h2>October 4, 2012</h2>
<p>If <mathjax>$A$</mathjax> is real, then the characteristic polynomial coefficients <mathjax>$d_1, ... d_n \in \Re$</mathjax>, which also tells us that its roots
appear in complex conjugate pairs. If the matrix is complex, all bets are
off.</p>
<p>Amazingly, the study of these eigenvalues and eigenvectors is actually a
lot easier if the <mathjax>$\lambda_i$</mathjax> are <strong>distinct</strong>, i.e. <mathjax>$\lambda_i \neq \lambda_j$</mathjax>
for <mathjax>$i \neq j$</mathjax>. The first thing to notice is that if you have <mathjax>$n$</mathjax> distinct
eigenvalues, from the definition of eigenvectors, there's at least one
eigenvector associated with every eigenvalue.</p>
<p>The first thing to prove is that eigenvectors corresponding to distinct
eigenvalues are linearly independent. Consider a solution to the homogeneous equation
<mathjax>$\sum_i \alpha_i e_i = 0$</mathjax>. Apply <mathjax>$A - \lambda_i I$</mathjax> for each <mathjax>$i$</mathjax> except for one index <mathjax>$k$</mathjax>;
as a result, we have <mathjax>$\alpha_k (\lambda_k - \lambda_1)(\lambda_k -
\lambda_2) ... (\lambda_k - \lambda_n) e_k = 0$</mathjax> (with the <mathjax>$k$</mathjax>th factor omitted). Thus since the
<mathjax>$\lambda_i$</mathjax> are all distinct, <mathjax>$\alpha_k$</mathjax> must be identically <mathjax>$0$</mathjax> for all
<mathjax>$k$</mathjax>.</p>
<h2>Jordan's Theorem</h2>
<p>We have <mathjax>$A\begin{bmatrix}e_1 &amp; e_2 &amp; ... &amp; e_n \end{bmatrix} =
\begin{bmatrix}e_1 &amp; e_2 &amp; ... &amp; e_n\end{bmatrix}
\begin{bmatrix} \lambda_1 &amp; 0 &amp; ... &amp; 0 \\ 0 &amp; \lambda_2 &amp; ... &amp; 0 \\
\vdots &amp; &amp; \ddots &amp; \\ 0 &amp; 0 &amp; ... &amp; \lambda_n\end{bmatrix}$</mathjax>, i.e. <mathjax>$A =
T^{-1}\Lambda T$</mathjax>, where <mathjax>$T^{-1} = \begin{bmatrix}e_1 &amp; e_2 &amp; ... &amp;
e_n\end{bmatrix}$</mathjax>. If you look at <mathjax>$A$</mathjax> in a new
basis given by the eigenvectors, <mathjax>$A$</mathjax> is a diagonal matrix, where the
diagonal entries are given by the eigenvalues.</p>
<p>Jordan (canonical) form of a matrix.</p>
<p>Let's give a name to the inverse of <mathjax>$T^{-1}$</mathjax>: <mathjax>$T = \begin{bmatrix} v_1^T \\
v_2^T \\ \vdots \\ v_n^T \end{bmatrix}$</mathjax>. These rows also look like eigenvectors;
they are called the left eigenvectors (<mathjax>$\{e_i\}$</mathjax> are the right
eigenvectors) -- these are actually row vectors as opposed to column
vectors.</p>
<p>Notion of an outer product. The dyadic (outer) product of the right and
left eigenvectors is sometimes called the residue matrix (basically, a
projection matrix): <mathjax>$e_i v_i^T = R_i$</mathjax>; note that <mathjax>$\sum_i R_i = T^{-1}T = I$</mathjax>. If you
flip the order of the factors, what you get instead is a
scalar: <mathjax>$v_i^T e_j = \delta_{ij}$</mathjax>. How would you then define a function
<mathjax>$f(A)$</mathjax>? <mathjax>$f(A) = \sum_i f(\lambda_i) R_i$</mathjax>, where <mathjax>$f$</mathjax> is required to be <strong>analytic</strong>
at each <mathjax>$\lambda_i$</mathjax>.</p>
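<p>To make this concrete, here is a small numerical sketch (Python with numpy/scipy; the matrix below is made up for illustration): build the residue matrices <mathjax>$R_i = e_i v_i^T$</mathjax> from the right and left eigenvectors, and check that <mathjax>$\sum_i f(\lambda_i) R_i$</mathjax> reproduces <mathjax>$f(A)$</mathjax> for <mathjax>$f = \exp$</mathjax>.</p>
<pre><code>import numpy as np
from scipy.linalg import expm

# Hypothetical matrix with distinct eigenvalues -1, -2, -3 (illustration only).
A = np.array([[  0.,   1.,   0.],
              [  0.,   0.,   1.],
              [ -6., -11.,  -6.]])

lam, E = np.linalg.eig(A)     # columns of E are right eigenvectors e_i
V = np.linalg.inv(E)          # rows of V are left eigenvectors v_i^T

# Residue matrices R_i = e_i v_i^T, and f(A) = sum_i f(lambda_i) R_i.
f = np.exp
fA = sum(f(lam[i]) * np.outer(E[:, i], V[i, :]) for i in range(3))

print(np.allclose(fA, expm(A)))   # True: agrees with the matrix exponential
</code></pre>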
<p>Let's try to apply this to (single input, single output) linear
systems. Assume that <mathjax>$A$</mathjax> has distinct eigenvalues. That means that there
exists <mathjax>$T$</mathjax> such that <mathjax>$A = T^{-1}\Lambda T$</mathjax>. If I were to rewrite this with
a change of basis with <mathjax>$z = Tx$</mathjax>, then <mathjax>$\dot{z} = \Lambda z + Tbu$</mathjax>, <mathjax>$y =
cT^{-1}z$</mathjax>. If we consider the Laplace transform, we have poles
corresponding to the eigenvalues. No coupling; cannot observe. By putting
systems in modal form, you can actually see which inputs are influencing
the overall transfer function. Just the transient response corresponding to
the initial condition. The multi-input, multi-output case is handled
identically, by linearity.</p>
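<p>A tiny numerical illustration of modal form (a sketch; all numbers are invented): after the change of basis <mathjax>$z = Tx$</mathjax>, the zero patterns of <mathjax>$Tb$</mathjax> and <mathjax>$cT^{-1}$</mathjax> show directly which modes the input excites and which modes the output sees.</p>
<pre><code>import numpy as np

# Hypothetical 2-state example (all numbers made up for illustration).
Tinv = np.array([[1., 1.],
                 [0., 1.]])      # columns are the eigenvectors e_1, e_2
Lam  = np.diag([-1., -2.])
T    = np.linalg.inv(Tinv)
A    = Tinv @ Lam @ T            # A = T^{-1} Lambda T
b    = Tinv[:, [0]]              # input aligned with e_1 only
c    = np.array([[0., 1.]])

# Modal form: z = Tx  gives  zdot = Lam z + (Tb) u,  y = (c T^{-1}) z.
print(T @ b)        # [[1.], [0.]]: the input never excites mode 2
print(c @ Tinv)     # [[0., 1.]]:   the output never sees mode 1
# Here every mode is hidden from the I/O map: c (sI - A)^{-1} b = 0.
</code></pre>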
<p>Just to tell you that this kind of manipulation has some quite interesting
other implications, let me lead you through another calculation and talk
about some popular misconceptions about linear systems. In most
undergraduate courses, you'll be told about the "eigenfunction property":
if you put in an input at some frequency, you'll get an output at the same
frequency. Two problems: is that always true for every frequency, and
secondly, what about the role of the initial condition: what if the system
is unstable?</p>
<p>What I'm going to do is just an exercise in Laplace transforms,
really. (end of lecture 12 notes)</p>
<p>Output generally corresponds to the <strong>forced response</strong>, the response to the
input at the same frequency, (by the way, without further assumption, the
eigenfunction property is immediately false if you force the system at one of its
<strong>natural modes</strong>, i.e. eigenvalues) and a second component, the
contribution due to the initial condition. If <mathjax>$A$</mathjax> is stable, then the
second component goes to 0. Is it possible to still use this response by
clever choice of initial condition? Yes: regularization (unique choice of
<mathjax>$x_0$</mathjax> once <mathjax>$u_0$</mathjax> is specified and <mathjax>$B$</mathjax> is given: <mathjax>$x_0 = (\lambda I -
A)^{-1}Bu_0$</mathjax>). Otherwise, we have a transient response, and if <mathjax>$A$</mathjax> is
growing exponentially, then this will dominate the output.</p>
<p>The other thing this calculation is good for is that it tells you something
about the zeros of a linear system. The <strong>zeros</strong> are those <mathjax>$\lambda$</mathjax> for which <mathjax>$\exists u_0 \neq
\theta \st \hat{H}(\lambda) u_0 = \theta$</mathjax>; that is, <mathjax>$\lambda$</mathjax> is a <strong>zero of
transmission</strong> if at this value, <mathjax>$\hat{H}$</mathjax> has a nontrivial right
nullspace. Here is a way to think about what a zero of transmission is: it
blocks transmission at that value <mathjax>$\lambda$</mathjax>. If you've chosen the initial
condition properly, you'll see nothing at the output.</p>
<p>What you need to remember: distinct eigenvalues give linearly independent
eigenvectors; you can decompose such a matrix into its Jordan (here, diagonal)
form; we can see which modes affect the output; and here is a definition of a zero of
transmission: a frequency <mathjax>$\lambda$</mathjax> with a nontrivial choice of <mathjax>$u_0$</mathjax>
where you can block transmission.</p>
<h2>Lecture Notes 13</h2>
<p>Repeated eigenvalue case. Invariant subspaces. If <mathjax>$\fn{A}{(V,
\mathbb{C})}{(V, \mathbb{C})}$</mathjax>, <mathjax>$\mathcal{M} \subset V$</mathjax> is said to be an
<strong>invariant subspace</strong> if <mathjax>$x \in \mathcal{M} \implies Ax \in
\mathcal{M}$</mathjax>. Aside: <mathjax>$\mathcal{M}$</mathjax> is also invariant under any function of
<mathjax>$A$</mathjax>. Examples of invariant subspaces: <mathjax>$\mathcal{N}(A)$</mathjax>, <mathjax>$\mathcal{R}(A)$</mathjax>,
<mathjax>$\mathcal{N}(A - \lambda I)$</mathjax>, <mathjax>$\mathcal{M}_1 \cap \mathcal{M}_2$</mathjax> (with invariant subspaces
<mathjax>$\mathcal{M}_1, \mathcal{M}_2$</mathjax>), <mathjax>$\mathcal{M}_1 + \mathcal{M}_2$</mathjax> (<mathjax>$+$</mathjax> is the <strong>sum</strong>). The reason why <mathjax>$\mathcal{M}_1 \cup
\mathcal{M}_2$</mathjax> is in general not an invariant subspace is because it is not a subspace.</p>
<p><a name='14'></a></p>
<h1>Jordan Form</h1>
<h2>October 9, 2012</h2>
<p>We just started talking about invariant subspaces. Midterm will be up to
and including lecture notes 12 (but we haven't gone through 11). Today I'd
like to continue in this vein that we've been presenting for the past
couple of lectures. What we've been doing is being motivated by our need to
efficiently compute <mathjax>$e^{At}$</mathjax>, and we've been thinking more generally about
how to compute functions of a square matrix <mathjax>$A$</mathjax>. What we've done in our
presentation is divide that problem into two classes: case 1 (where <mathjax>$A$</mathjax> is
diagonalizable) -- that is, <mathjax>$A$</mathjax> has <mathjax>$n$</mathjax> linearly independent eigenvectors,
each of which is an element of <mathjax>$\mathbb{C}^n$</mathjax>. Remember that if <mathjax>$A$</mathjax> has <mathjax>$n$</mathjax>
distinct eigenvalues, it is given that <mathjax>$A$</mathjax> has <mathjax>$n$</mathjax> eigenvectors. However,
we don't need <mathjax>$n$</mathjax> distinct eigenvalues (consider degeneracy of quantum
states).</p>
<p>What we proved last time was a theorem that in this case, the eigenvectors
form a basis for <mathjax>$\mathbb{C}^n$</mathjax>, and we went ahead and constructed a
similarity transform which transforms <mathjax>$A$</mathjax> to a diagonal matrix. Consider
<mathjax>$T^{-1}$</mathjax>, whose columns are the eigenvectors (since the columns form a
basis, it is invertible, and <mathjax>$T$</mathjax> exists). <mathjax>$\Lambda = TAT^{-1}$</mathjax>, where
<mathjax>$\Lambda$</mathjax> is a diagonal matrix containing the corresponding eigenvalues.</p>
<p>Consider a function <mathjax>$\fn{f}{\mathbb{C}^{n\times n}}{\mathbb{C}^{n\times
n}}$</mathjax>. We talked about polynomial functions of a matrix (recall
Cayley-Hamilton theorem). Let us consider a more general case: <mathjax>$f$</mathjax> is
<strong>analytic</strong>, that is, one that is locally given by a convergent power
series. We have that <mathjax>$f(A) = f(T^{-1}\Lambda T) = T^{-1}f(\Lambda)T$</mathjax>. What
does this mean? This is just <mathjax>$f$</mathjax> applied to the diagonal matrix.</p>
<p>In general, <mathjax>$A$</mathjax> is not diagonalizable, and so we have to think about what a
canonical form (a simple form for a general <mathjax>$n \times n$</mathjax> matrix) looks like. We now
are considering case 2, where <mathjax>$A$</mathjax> is not diagonalizable, i.e. <mathjax>$A$</mathjax> has
strictly fewer than <mathjax>$n$</mathjax> linearly independent eigenvectors. This can only happen if <mathjax>$A$</mathjax> has
repeated eigenvalues (note: repeated eigenvalues do not guarantee this!).</p>
<p>We will present the "generalization" of the diagonal form, which is called
Jordan form. Instead of using <mathjax>$\Lambda$</mathjax>, we'll use <mathjax>$J$</mathjax>. Basically, it'll be
an upper triangular matrix with the eigenvalues on the diagonal, and on the
off-diagonal, we'll have either ones or zeros (depending on how many
eigenvectors we have for a given eigenvalue). We call this a canonical form
(note that the diagonal form is a special case of the Jordan form when
there are n eigenvectors).</p>
<p>We're going to approach this today (by constructing a similarity
transform). In order to do that, we need a little bit of background: we
need to look in more detail at the eigenstructures of a matrix. We're going
to do that through this concept of <mathjax>$\mathcal{A}$</mathjax>-invariant
subspaces. Suppose you have a linear map <mathjax>$\fn{\mathcal{A}}{V}{V}$</mathjax>. <mathjax>$M$</mathjax>, a
subspace of <mathjax>$V$</mathjax> is said to be <strong><mathjax>$\mathcal{A}$</mathjax>-invariant</strong> (or invariant under
<mathjax>$\mathcal{A}$</mathjax>) if <mathjax>$\fn{\mathcal{A}}{M}{M}$</mathjax>.</p>
<p>Example: the nullspace of <mathjax>$\mathcal{A}$</mathjax> is <mathjax>$\mathcal{A}$</mathjax>-invariant. Very
easy to show that a subspace is <mathjax>$\mathcal{A}$</mathjax>-invariant, typically.</p>
<p>Now consider <mathjax>$N(A - \lambda I)$</mathjax>, where <mathjax>$\lambda$</mathjax> is an eigenvalue of
<mathjax>$A$</mathjax>. We know that this space contains the eigenvectors of <mathjax>$A$</mathjax> that
correspond to <mathjax>$\lambda$</mathjax>; we can then show that this is also an invariant
subspace. A few more examples: if you have a polynomial function of a
matrix <mathjax>$A$</mathjax>, then the nullspace of this polynomial function is
<mathjax>$A$</mathjax>-invariant. Further, given two <mathjax>$A$</mathjax>-invariant subspaces, the intersection
of these subspaces is also <mathjax>$A$</mathjax>-invariant. Further, if you define sums of
subspaces in the usual manner, the sum of two <mathjax>$A$</mathjax>-invariant subspaces is
also <mathjax>$A$</mathjax>-invariant.</p>
<h2>Direct sum of subspaces</h2>
<p>Again we have a vector space <mathjax>$X$</mathjax>, and we'll have <mathjax>$k$</mathjax> subspaces <mathjax>$M_1, M_2,
..., M_k$</mathjax> of <mathjax>$X$</mathjax>. <mathjax>$V$</mathjax> is said to be a <strong>direct sum</strong> (i.e. <mathjax>$V = \Oplus_i
M_i$</mathjax>) if <mathjax>$\forall v \in V, \exists! m_i \in M_i \st v = \sum_i m_i$</mathjax>. In
some sense, this is a generalization of linear independence.</p>
<p>A direct result of the uniqueness constraint of the definition of the
direct sum is that the intersection of any two of these subspaces contains
only the zero vector.</p>
<p>We need two more things before we consider the Jordan form of a matrix.</p>
<p>One way we can relate <mathjax>$A$</mathjax>-invariance and direct sum is going back to the
beginning of the class, if you look at <mathjax>$A \in \mathbb{C}^{n \times n}$</mathjax> with
<mathjax>$n$</mathjax> distinct eigenvalues (or even just <mathjax>$n$</mathjax> linearly independent
eigenvectors), then <mathjax>$\mathbb{C}^n = \Oplus_i N(A-\lambda_i I)$</mathjax> (we proved last
time that in this case, the eigenvectors form a basis for the space). This
is a <strong>direct sum decomposition</strong> of <mathjax>$\mathbb{C}^n$</mathjax>.</p>
<p>So now let's go on to something quite important, which we generally call
the Second Representation Theorem. First, let's consider what
representation means: we define <strong>representation</strong> as the representation of
a linear map between finite-dimensional vector spaces as a matrix. Aside:
the first representation theorem was that we could construct a matrix
representation for any linear map between finite-dimensional vector
spaces.</p>
<p>Now we consider the case where <mathjax>$\fn{\mathcal{A}}{V}{V}$</mathjax>. Consider <mathjax>$V = M_1
\oplus M_2$</mathjax>, where <mathjax>$V$</mathjax> has dimension <mathjax>$n$</mathjax>, <mathjax>$M_1$</mathjax> has dimension <mathjax>$k$</mathjax>, and
<mathjax>$M_2$</mathjax> has dimension <mathjax>$n-k$</mathjax>. If <mathjax>$M_1$</mathjax> is <mathjax>$A$</mathjax>-invariant, we can say more about
what this matrix <mathjax>$A$</mathjax> can look like: <mathjax>$\mathcal{A}$</mathjax> can be represented as a
block matrix <mathjax>$A$</mathjax> in which the lower left block is the zero matrix. The
upper left block is <mathjax>$k$</mathjax>-dimensional, and the remaining block sizes can be
inferred from this.</p>
<p>Easy to prove: can use first representation theorem. We'd like to prove
that there exists a basis for <mathjax>$V$</mathjax> such that <mathjax>$A$</mathjax> has the stated form. Begin
by constructing a basis for <mathjax>$M_1, M_2$</mathjax>. Let <mathjax>$\{b_1, ..., b_k\}$</mathjax> be a basis
for <mathjax>$M_1$</mathjax> and <mathjax>$\{b_{k+1}, ..., b_n\}$</mathjax> be a basis for <mathjax>$M_2$</mathjax>. The image of
<mathjax>$A$</mathjax> on the first <mathjax>$k$</mathjax> basis vectors can be written as linear combinations of
the first <mathjax>$k$</mathjax> basis vectors (i.e. you don't need any components involving
<mathjax>$b_{k+1}, ..., b_n$</mathjax>), and by uniqueness (guaranteed by this being a direct
sum decomposition), any representation with respect to this basis will
produce the lower-left block of zeros (by the first representation
theorem).</p>
<p>If <mathjax>$M_1, M_2$</mathjax> are both <mathjax>$A$</mathjax>-invariant, then the top-right block can also be
zero (by symmetry); in fact, the basis we chose earlier will yield this
result. The proof is identical.</p>
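<p>A small numerical sketch of the second representation theorem (the numbers are invented): pick a basis adapted to <mathjax>$V = M_1 \oplus M_2$</mathjax> with <mathjax>$M_1$</mathjax> <mathjax>$A$</mathjax>-invariant; the representation of <mathjax>$A$</mathjax> in that basis has the promised zero lower-left block.</p>
<pre><code>import numpy as np

# A basis adapted to V = M1 (+) M2, with M1 = span{b1, b2} A-invariant.
# (All numbers are made up for illustration.)
b1 = np.array([1., 0., 1.])
b2 = np.array([0., 1., 1.])
b3 = np.array([0., 0., 1.])
T_inv = np.column_stack([b1, b2, b3])     # columns form the basis

A_hat = np.array([[2., 1., 5.],
                  [0., 3., 7.],
                  [0., 0., 4.]])          # zero lower-left (1x2) block
A = T_inv @ A_hat @ np.linalg.inv(T_inv)  # the same map in standard coords

# Check invariance: A b1 and A b2 stay in span{b1, b2}, so their
# coordinates in this basis have no b3 component (last row is zero).
coords = np.linalg.solve(T_inv, A @ np.column_stack([b1, b2]))
print(np.round(coords, 10))
</code></pre>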
<p>Example:</p>
<ul>
<li>
<p>Suppose <mathjax>$A$</mathjax> has only one eigenvector, associated to the (single) eigenvalue
<mathjax>$\lambda_1$</mathjax>. Then <mathjax>$\mathbb{C}^n = N((A - \lambda_1 I)^n)$</mathjax>, while <mathjax>$N(A - \lambda_1 I)$</mathjax>
itself is only one-dimensional. Let us just understand this example as we leave
today's class: <mathjax>$A$</mathjax> is no longer diagonalizable. Going back to our second
representation theorem and applying it to this case, we'll derive the
Jordan form (not on the midterm).</p>
</li>
<li>
<p>Suppose <mathjax>$A$</mathjax> has <mathjax>$n$</mathjax> eigenvectors. <mathjax>$\mathbb{C}^n = \Oplus_i N(A -
\lambda_i I)$</mathjax>. By the second representation theorem, we are guaranteed that
this matrix is diagonalizable.</p>
</li>
</ul>
<p><a name='15'></a></p>
<h1>Jordan Form</h1>
<h2>October 11, 2012</h2>
<p>We're going to finish up talking about the case where we don't have <mathjax>$n$</mathjax>
eigenvectors. Recall that if we have <mathjax>$n$</mathjax> distinct eigenvalues, we are
guaranteed <mathjax>$n$</mathjax> eigenvectors, but this is not a necessary condition.</p>
<p>We'll then start an example, the linear quadratic regulator.</p>
<p>Recall: we defined the characteristic polynomial <mathjax>$\hat{\chi}_A(s)$</mathjax> as
<mathjax>$\det(sI-A)$</mathjax>, which we rewrote as <mathjax>$\hat{\chi}_A(s) = \prod_i (s -
\lambda_i)^{d_i}$</mathjax>. We're using <mathjax>$d_i$</mathjax> to represent the (algebraic)
multiplicity of eigenvalue <mathjax>$\lambda_i$</mathjax>. The Cayley-Hamilton theorem tells
us that the characteristic polynomial evaluated on the matrix <mathjax>$A$</mathjax> is the
<mathjax>$n$</mathjax>-by-<mathjax>$n$</mathjax> zero matrix.</p>
<p>Now we're going to define something new. We've been talking about the
characteristic polynomial, and we know its properties. The roots of the
characteristic polynomial are the eigenvalues of the matrix, and by
Cayley-Hamilton, every matrix satisfies its characteristic polynomial.</p>
<p>Let's now define something called the minimal polynomial of <mathjax>$A$</mathjax>. The
<strong>minimal polynomial</strong> <mathjax>$\hat{\psi}_A(s)$</mathjax> is the polynomial of least degree
such that <mathjax>$\hat{\psi}_A(A) = \theta_{n \times n}$</mathjax>. The question is whether
there is a polynomial of degree less than <mathjax>$n$</mathjax> such that this identity is
satisfied. It is the structure of the characteristic polynomial and that of
the minimal polynomial that allows us to explore the Jordan form.</p>
<p>Just by definition, we have the result that <mathjax>$\hat{\psi}_A(s)$</mathjax> divides
<mathjax>$\hat{\chi}_A(s)$</mathjax>. Proof: by the definition of "divides", we have a
remainder term <mathjax>$\hat{r}$</mathjax>; by linearity, <mathjax>$\hat{r}(A) = 0$</mathjax>. <mathjax>$\hat{r}$</mathjax> must
either be lesser degree than <mathjax>$\hat{\psi}$</mathjax> (which would violate our
definition), or it is identically the zero polynomial.</p>
<p>Thus, we can write <mathjax>$\hat{\psi}_A(s) = \prod_i (s - \lambda_i)^{m_i}$</mathjax>, <mathjax>$m_i
\le d_i$</mathjax>.</p>
<p>It turns out that the exponents <mathjax>$m_i$</mathjax> in the minimal polynomial are
determined by the largest Jordan block associated with <mathjax>$\lambda_i$</mathjax>. (We
define a <strong>Jordan block</strong> to be a block of the Jordan form that contains
only ones as the super-diagonal elements.) In general, given <mathjax>$A$</mathjax>, how do we
find its minimal polynomial?</p>
<p>(thoughts: consider each Jordan block of <mathjax>$A - \lambda_i$</mathjax> to be analogous to
a linear feedback shift register, with no new inputs; as such, it shows
that the largest Jordan block has nilpotency corresponding to the number of
registers, and all other Jordan blocks for the eigenvalue reach 0
earlier. Alternately, repeat the above while considering raising and
lowering operators)</p>
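<p>One concrete numerical answer to "how do we find the minimal polynomial": its degree is the first <mathjax>$m$</mathjax> at which <mathjax>$\{I, A, ..., A^m\}$</mathjax> becomes linearly dependent. A sketch (Python/numpy; the test matrices are invented):</p>
<pre><code>import numpy as np

def minimal_poly_degree(A, tol=1e-9):
    """Degree of the minimal polynomial: first m with I, A, ..., A^m dependent."""
    n = A.shape[0]
    vecs = [np.eye(n).ravel()]
    P = np.eye(n)
    for m in range(1, n + 1):
        P = P @ A
        vecs.append(P.ravel())
        M = np.stack(vecs)                       # rows are vec(A^k)
        if len(vecs) > np.linalg.matrix_rank(M, tol):
            return m                             # first linear dependence
    return n

# One 3x3 Jordan block (eigenvalue 2): minimal poly is (s-2)^3, degree 3.
J = np.array([[2., 1., 0.],
              [0., 2., 1.],
              [0., 0., 2.]])
print(minimal_poly_degree(J))                    # 3

# Diagonalizable with the same repeated eigenvalue: minimal poly (s-2).
print(minimal_poly_degree(2.0 * np.eye(3)))      # 1
</code></pre>
<p>Consistent with the claim above: the exponent of <mathjax>$(s - \lambda_i)$</mathjax> in the minimal polynomial matches the size of the largest Jordan block for <mathjax>$\lambda_i$</mathjax>.</p>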
<p><strong>Theorem</strong>: <mathjax>$\mathbb{C}^n = \Oplus_i N(A - \lambda_i I)^{m_i}$</mathjax>, where <mathjax>$N(A -
\lambda_i I)^{m_i}$</mathjax> is of dimension <mathjax>$d_i$</mathjax>.</p>
<p>Recall: if there were <mathjax>$d_i$</mathjax> eigenvectors associated to <mathjax>$\lambda_i$</mathjax> for all
<mathjax>$i$</mathjax>, then the matrix would be diagonalizable. Generally, the things you add
to this form are called generalized eigenvectors.</p>
<p>Proof: We have the minimal polynomial <mathjax>$\hat{\psi}_A(s)$</mathjax>. Consider the partial
fraction expansion <mathjax>$\frac{1}{\hat{\psi}_A(s)} = \frac{1}{\prod_i (s - \lambda_i)^{m_i}} = \sum_i
\frac{\hat{n}_i(s)}{(s-\lambda_i)^{m_i}}$</mathjax>. Now, multiplying both sides by
the minimal polynomial, we end up with <mathjax>$1 = \sum_i \hat{n}_i(s)\hat{p}_i(s)$</mathjax>,
as in the notes, where <mathjax>$\hat{p}_i(s) = \frac{\hat{\psi}_A(s)}{(s - \lambda_i)^{m_i}}$</mathjax>.
Evaluating this polynomial identity at the matrix <mathjax>$A$</mathjax>, we get <mathjax>$I =
\sum_i \hat{n}_i(A)\hat{p}_i(A)$</mathjax>. Multiplying by an arbitrary <mathjax>$x \in
\mathbb{C}^n$</mathjax>, <mathjax>$x = \sum_i \hat{n}_i(A)\hat{p}_i(A)x$</mathjax>, which we've written as the
sum of <mathjax>$\sigma$</mathjax> terms (one per distinct eigenvalue).</p>
<p>In general, <mathjax>$x_i = \hat{n}_i(A)\hat{p}_i(A)x$</mathjax>. Since <mathjax>$\hat{p}_i(s)
= \frac{\hat{\psi}_A(s)}{(s - \lambda_i)^{m_i}}$</mathjax>, we have <mathjax>$(A -
\lambda_i I)^{m_i}x_i = \hat{n}_i(A)\hat{\psi}_A(A)x = \theta$</mathjax>, i.e. <mathjax>$x_i \in N((A -
\lambda_i I)^{m_i})$</mathjax>.</p>
<p>In order to prove the theorem, we must next show that this decomposition is
<strong>unique</strong> (meaning that for any vector <mathjax>$x$</mathjax>, there is not another way you
can break it up into <mathjax>$\sum_i x_i$</mathjax>; see notes), and that the nullity of <mathjax>$(A
- \lambda_i I)^{m_i}$</mathjax> is <mathjax>$d_i$</mathjax>, which is the multiplicity of <mathjax>$(s - \lambda_i)$</mathjax> in
the characteristic polynomial.</p>
<p>As a concrete but slightly abstract example, we're going to look at the
geometric structure of eigenspaces.</p>
<h2>Geometric Structure of Eigenspaces</h2>
<p>Consider <mathjax>$A$</mathjax> with just one eigenvalue, whose characteristic polynomial is
given as defined above. In this case, <mathjax>$d_1 = n, 1 \le m_1 \le d_1$</mathjax>.</p>
<p>Suppose <mathjax>$n=6, m=3$</mathjax>. One question is whether you can uniquely determine a
Jordan form (no; we'll see this in a second). What we're saying from the
theorem we just proved is that we can decompose <mathjax>$\mathbb{C}^6 = N(A -
\lambda I)^3$</mathjax>. We don't have enough information yet to determine what the
Jordan form looks like, and how the characteristic and minimal polynomials
lead to the Jordan form.</p>
<p>Suppose, in addition, that <mathjax>$N(A - \lambda I) = \mathrm{span}\{e_1, e_2,
e_3\}$</mathjax> (where <mathjax>$e_i$</mathjax> is an eigenvector of <mathjax>$A$</mathjax> associated to the eigenvalue
<mathjax>$\lambda$</mathjax>). Further, suppose that <mathjax>$N((A - \lambda I)^2) = N(A - \lambda I)
\oplus \mathrm{span}\{v_1\} \oplus \mathrm{span}\{v_2\}$</mathjax>, where <mathjax>$v_1, v_2$</mathjax> are linearly
independent vectors that <mathjax>$(A - \lambda I)$</mathjax> maps into <mathjax>$N(A - \lambda I)$</mathjax>.</p>
<p>Then, suppose without loss of generality that <mathjax>$(A - \lambda I) v_1 = e_1$</mathjax>,
and <mathjax>$(A - \lambda I) v_2 = e_2$</mathjax>.</p>
<p>Finally, suppose <mathjax>$N((A - \lambda I)^3) = N((A - \lambda I)^2) \oplus \mathrm{span}\{w_1\}$</mathjax>,
where <mathjax>$(A - \lambda I)w_1 = v_1$</mathjax> (without loss of generality, again).</p>
<p>In this case (and in general), <mathjax>$e_1$</mathjax> is an eigenvector associated to
<mathjax>$\lambda$</mathjax>, <mathjax>$v_i$</mathjax> is a generalized eigenvector (of degree 1) associated to
<mathjax>$\lambda, e_1$</mathjax>, and <mathjax>$w_1$</mathjax> is a generalized eigenvector (of degree 2)
associated to <mathjax>$\lambda, e_1, v_1$</mathjax>.</p>
<p><mathjax>$e_2$</mathjax> is also an eigenvector associated to <mathjax>$\lambda$</mathjax>, and <mathjax>$v_2$</mathjax> is also a
generalized eigenvector (of degree 1) corresponding to <mathjax>$\lambda, e_2$</mathjax>.</p>
<p>Finally, <mathjax>$e_3$</mathjax> is an eigenvector associated to <mathjax>$\lambda$</mathjax>.</p>
<p>As such, through these definitions, we can construct a similarity
transformation that gives us a canonical form for the matrix <mathjax>$A$</mathjax>, the
Jordan form.</p>
<h2>Constructing the Jordan Form</h2>
<p><mathjax>$$
Ae_1 = \lambda e_1
\\ Ae_2 = \lambda e_2
\\ Ae_3 = \lambda e_3
\\ Av_1 = \lambda v_1 + e_1
\\ Av_2 = \lambda v_2 + e_2
\\ Aw_1 = \lambda w_1 + v_1
$$</mathjax></p>
<p>I know I can always construct these (in a linearly independent
fashion). Now, consider a similarity transform <mathjax>$J = TAT^{-1}$</mathjax>, where
<mathjax>$T^{-1} = \begin{pmatrix} e_1 &amp; v_1 &amp; w_1 &amp; e_2 &amp; v_2 &amp; e_3
\end{pmatrix}$</mathjax>. I'm putting my eigenvectors with their generalized
eigenvectors. <mathjax>$J$</mathjax> is the Jordan form of <mathjax>$A$</mathjax>, which is effectively a
diagonal matrix with some ones in the super-diagonal. The blocks might have
been ordered differently, so the Jordan form is unique only up to
permutations of the Jordan blocks.</p>
<p>Note: we can't have multiple generalized eigenvectors that correspond to
the same (generalized) eigenvector, or else they are no longer linearly
independent with the generalized eigenvectors established thus far; we
require this chain structure.</p>
<p>What we've done is define a construct called the generalized
eigenvector. We'll start next Thursday's class with computing <mathjax>$e^{Jt}$</mathjax>.</p>
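<p>For checking these constructions by machine, sympy will compute a Jordan decomposition directly (a sketch; the example matrix is invented, built to have one eigenvalue with Jordan blocks of sizes 3 and 1):</p>
<pre><code>import sympy as sp

# Jordan form with eigenvalue 2 and two blocks, of sizes 3 and 1.
J = sp.Matrix([[2, 1, 0, 0],
               [0, 2, 1, 0],
               [0, 0, 2, 0],
               [0, 0, 0, 2]])
# Any invertible T produces a matrix A = T J T^{-1} with this Jordan form.
T = sp.Matrix([[1, 0, 0, 0],
               [1, 1, 0, 0],
               [0, 1, 1, 0],
               [0, 0, 1, 1]])
A = T * J * T.inv()

P, JJ = A.jordan_form()   # sympy returns (P, JJ) with A = P * JJ * P**-1
print(JJ)                 # recovers the 3 + 1 block structure (up to order)
</code></pre>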
<p><a name='16'></a></p>
<h1>Functions of Jordan Form, Introduction to Optimal Control</h1>
<h2>October 18, 2012</h2>
<p>Alternative derivation of linear quadratic regulator; intro to optimal
control via LQR (Dr. Jerry Ding).</p>
<p>Today's a bit of a different lecture: finish up the last note we have in
this functions of a matrix. If we want to compute an arbitrary analytic
function of a matrix (which we showed we could generally transform to its
Jordan canonical form), how do we do that? We'll then go on to optimal
control.</p>
<p>For the first five or ten minutes, let's talk about this Jordan form and
where we ended last day.</p>
<h2>Functions of a matrix (<mathjax>$n\times n$</mathjax>) in Jordan Form</h2>
<p>From what we derived last day, we can write down a minimal polynomial, and
recall that the exponent corresponds to the largest Jordan block for that
eigenvalue. This Jordan form could have come (generally) from a matrix <mathjax>$A$</mathjax>,
and we could have computed <mathjax>$J$</mathjax> through the similarity transform we
discussed last day. We recall that the columns of <mathjax>$T^{-1}$</mathjax> are the
eigenvectors and generalized eigenvectors. The number of Jordan blocks
tells you the number of eigenvectors of the matrix.</p>
<p><mathjax>$f(A) = f(T^{-1} J T) = T^{-1} f(J) T$</mathjax> for any
analytic function <mathjax>$f$</mathjax> (i.e. one locally given by a convergent power series). If
we have the Jordan form and an analytic function, we have the result that
if we compute <mathjax>$f(J)$</mathjax> (which we can do by applying <mathjax>$f$</mathjax> to each of the Jordan
blocks), we can use these results to compute <mathjax>$f(A)$</mathjax>.</p>
<p>So we look at each Jordan block, and when we compute <mathjax>$f$</mathjax> of each Jordan
block, the diagonal elements are just the function applied to the
eigenvalue; the blocks are upper triangular, where the entries to the right
of the diagonal are (scaled) derivatives of <mathjax>$f$</mathjax>, evaluated at the eigenvalue.</p>
<p>If in general you were looking at a Jordan block <mathjax>$J_k$</mathjax> with eigenvalue
<mathjax>$\lambda_k$</mathjax>, the claim is that the elements of <mathjax>$f(J_k)$</mathjax> have the following structure:</p>
<p><mathjax>$$(f(J_k))_{ij} = \begin{cases} \frac{f^{(j-i)}(\lambda_k)}{(j-i)!} &amp; j \ge i \\ 0 &amp; j &lt; i \end{cases}$$</mathjax></p>
<p>In the notes, we prove it -- this is a very interesting proof, but it's
very involved and doesn't add much to our discussion. I'll refer you to the
notes; we use the minimal polynomial structure and what's called the method
of interpolating polynomials.</p>
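<p>Rather than reproduce the proof, here is a numerical check of the claim for a single (invented) Jordan block with <mathjax>$f = \exp$</mathjax>; since every derivative of <mathjax>$\exp$</mathjax> is <mathjax>$\exp$</mathjax>, the entries should be <mathjax>$e^{\lambda}/(j-i)!$</mathjax>:</p>
<pre><code>import numpy as np
from math import factorial
from scipy.linalg import expm

lam = -1.5
J = lam * np.eye(3) + np.diag([1., 1.], k=1)   # 3x3 Jordan block

# Closed form: (f(J))_{ij} = f^{(j-i)}(lam) / (j-i)!  for j >= i, else 0.
closed = np.zeros((3, 3))
for i in range(3):
    for j in range(i, 3):
        closed[i, j] = np.exp(lam) / factorial(j - i)

print(np.allclose(expm(J), closed))   # True
</code></pre>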
<p>This claim gives us an important theorem, which we can pull off from this
result, the <strong>spectral mapping theorem</strong>, which tells us something about
the eigenvalues of a function of a matrix. Recall we used <mathjax>$\sigma$</mathjax> to
represent the spectrum (set of eigenvalues). This theorem says that
<mathjax>$\sigma(f(J)) = f(\sigma(J))$</mathjax>. Specifically we see this from the Jordan
form, and since we know that we can always construct this matrix, we can
consider that this applies to all matrices.</p>
<p>This is probably one of our most powerful tools: we can look at the
eigenvalues of the matrix, and by the spectral mapping theorem, we can
consider the effect of the function on the eigenvalues to figure something
out about the matrix.</p>
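<p>A quick numerical sanity check of the spectral mapping theorem with <mathjax>$f = \exp$</mathjax> (a sketch; the matrix is random, so nothing is special about the numbers):</p>
<pre><code>import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

# sigma(f(A)) = f(sigma(A)): compare the two spectra as sorted multisets.
lhs = np.sort_complex(np.linalg.eigvals(expm(A)))
rhs = np.sort_complex(np.exp(np.linalg.eigvals(A)))
print(np.allclose(lhs, rhs))   # True, up to round-off
</code></pre>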
<p>Another thing: 10/18 is apparently the California ShakeOut day. We
practice what we're supposed to do in the event of an earthquake.</p>
<h2>Linear Quadratic Optimization</h2>
<p>You have some linear time-varying system <mathjax>$\dot{x}(t) = A(t)x(t) + B(t)u(t)$</mathjax> with
initial condition <mathjax>$x(0) = x_0$</mathjax>. Consider our state to be in <mathjax>$\Re^n, u \in
\Re^{n_i}, t \in [t_0, t_1]$</mathjax>. Our objective function in this case will be
the integral of a penalty on control plus a penalty on state plus a
terminal cost, i.e. <mathjax>$J(u) = \frac{1}{2}\bracks{\int_{t_0}^{t_1}
(\mag{u(t)}^2_2 + \mag{C(t)x(t)}^2_2) dt + x^T(t_1)Sx(t_1)}$</mathjax>. In some
sense, we want to penalize deviations of the state from zero without using
too much control effort.</p>
<p>Our optimization space <mathjax>$U$</mathjax> here is a set of piecewise-continuous
functions. Obviously not differentiable at points of discontinuity, so it
satisfies the differential equation almost everywhere.</p>
<p>The quadratic optimization problem is this: compute the optimal cost of our
objective function, subject to LTV, i.e. <mathjax>$J^* = \min_{u \in U} J(u)$</mathjax>.</p>
<p>What are the applications? e.g. stabilization and minimum energy control,
set point tracking. More interestingly, you can do trajectory tracking (one
slight problem is you also have some dynamics in the trajectory that you
want to track; this changes the structure of the controller a little bit,
but all the ideas are the same).</p>
<p>Finally, an interesting problem is LQG (linear quadratic gaussian)
control. Suppose you have a plant, which is linear, plus some gaussian
noise. With some input, we get some output (and in general you don't get
to observe the full state vector). Estimate the system's state based on
measurements and compute the control. With the LQG problem, you're minimizing
the expected cost. We have a Kalman filter fed directly into an LQR. This is one
of the famous instances of the certainty-equivalence (separation) principle, i.e. you can do the
estimation and the control completely separately. This is one of the few cases in stochastic
control where you actually have this principle.</p>
<p>There are typically two types of solutions: either go by dynamic
programming or optimality condition. So what does optimality condition
mean?</p>
<p>Suppose we have a continuously differentiable scalar function on Euclidean
space, and we want to solve <mathjax>$\min_{x \in \Re^n} \phi(x)$</mathjax>. If <mathjax>$x$</mathjax> is
optimal, then the gradient of the cost function is 0 (i.e. <mathjax>$\nabla \phi(x)
= 0$</mathjax>). Your optimal point must be one of your critical points; you don't
know which one you're working with. In a lot of numerical optimization
problems, what you're trying to do is find the gradient; you're trying to
proceed in the direction of least cost. However, you can get stuck in a
local minimum.</p>
<p>An equivalent condition, in terms of directional derivatives: the
directional derivative <mathjax>$\nabla_v\phi(x) = 0 \; \forall v \in \Re^n$</mathjax> (what
we're doing is taking an <mathjax>$\epsilon$</mathjax> deviation along this direction and
figuring out what the perturbation is). Ideally, you'd like to find a
similar derivative for the cost function, but you've got a few problems:
(1) perturbation in <mathjax>$u$</mathjax>. The nice thing about <mathjax>$U$</mathjax> is that it's a vector
space. We can now define our perturbation as follows: <mathjax>$u_\epsilon = u +
\epsilon\delta u, \delta u \in U$</mathjax>.</p>
<p>Next question: how does the state trajectory change as a result of the control
perturbation? The trajectory of the system with this input is <mathjax>$x_\epsilon(t) = \Phi(t,t_0)x_0 +
\int_{t_0}^t \Phi(t, \tau) B(\tau)(u(\tau) + \epsilon\delta u(\tau))d\tau =
x(t) + \epsilon \int_{t_0}^{t} \Phi(t,\tau)B(\tau) \delta u(\tau) d\tau$</mathjax>;
we can call this second term <mathjax>$\epsilon\,\delta x(t)$</mathjax>, where <mathjax>$\delta x$</mathjax> is our
perturbation of the state trajectory. Basically, this is the zero-state
response to <mathjax>$\delta u$</mathjax>.</p>
<p><mathjax>$\delta \dot{x} = A(t) \delta x(t) + B(t) \delta u(t); \delta x(0) = 0$</mathjax>. </p>
<p><mathjax>$$J(u + \epsilon\delta u) = \frac{1}{2}\bracks{\int_{t_0}^{t_1}\parens{\mag{u +
\epsilon \delta u}^2_2 + \mag{C(x + \epsilon \delta x)}^2_2} dt + (x(t_1) +
\epsilon \delta x(t_1))^T S (x(t_1) + \epsilon \delta x(t_1))} = J(u) +
\epsilon F(u, \delta u) + \epsilon^2 Q(\delta u)$$</mathjax></p>
<p>What you see is that if you introduce an <mathjax>$\epsilon$</mathjax> perturbation in your
input, you get an <mathjax>$\epsilon$</mathjax> perturbation in your output. As such, the
directional derivative is well-defined, and can be found explicitly as
<mathjax>$F(u, \delta u)$</mathjax>. This is the first variation of <mathjax>$J$</mathjax>. This first variation
we can write down in an analogous manner to the static analysis we talked
about.</p>
<p><mathjax>$u$</mathjax> is optimal <mathjax>$\iff \nabla_{\delta u}J(u) = 0, \forall \delta u \in U$</mathjax>
(this is two-way because it is a convex optimization problem).</p>
<p>Two parts: first assume <mathjax>$u$</mathjax> is optimal. This implies that <mathjax>$\forall \delta
u, \epsilon$</mathjax>, <mathjax>$J(u + \epsilon\delta u) \ge J(u)$</mathjax>, which means that <mathjax>$J(u +
\epsilon \delta u) - J(u) = \epsilon(F(u, \delta u) + \epsilon Q(\delta u))
\ge 0$</mathjax>. Case 1: <mathjax>$\epsilon &gt; 0$</mathjax>, so <mathjax>$\nabla_{\delta u}J(u) + \epsilon
Q(\delta u) \ge 0$</mathjax>; letting <mathjax>$\epsilon \to 0$</mathjax>, <mathjax>$\nabla_{\delta u} J(u) \ge 0$</mathjax>. In case 2,
<mathjax>$\epsilon &lt; 0 \implies \nabla_{\delta u} J(u) + \epsilon Q(\delta u) \le 0
\implies \nabla_{\delta u} J(u) \le 0$</mathjax>; together, <mathjax>$\nabla_{\delta u} J(u) = 0$</mathjax>.</p>
<p>For the other direction, assume <mathjax>$\nabla_{\delta u} J(u) = 0, \forall \delta
u$</mathjax>. <mathjax>$J(u + \epsilon \delta u) = J(u) + \epsilon \nabla_{\delta u}J(u) +
\epsilon^2 Q(\delta u) \ge J(u)$</mathjax> (<mathjax>$\forall \delta u, \epsilon$</mathjax>), so <mathjax>$u$</mathjax> is
optimal.</p>
<p>This is where we deviate from the approach in Callier and Desoer. Because
of this result, what we're really interested in is this first
variation. Now I'll write down more explicitly what that cross-term
is. <mathjax>$\nabla_{\delta u} J(u) = \int_{t_0}^{t_1} \bracks{u(t)^T \delta u(t) +
x(t)^T C(t)^TC(t)\delta x(t)} dt + x(t_1)^T S\delta x(t_1)$</mathjax>.</p>
<p>We'll begin by introducing a quantity that we call the control Hamiltonian:
<mathjax>$H(x, p, u, t) = p^T(A(t)x + B(t)u) + \frac{1}{2}(u^Tu + x^TC(t)^T C(t)x)$</mathjax>. <mathjax>$p$</mathjax> is
basically your costate. The thing to note here is that your original system
equation comes from <mathjax>$\dot{x} = \nabla_p H(x, p, u, t)$</mathjax>. The costate evolves
according to the negative gradient with respect to <mathjax>$x$</mathjax>, i.e. <mathjax>$\dot{p} =
-\nabla_x H(x, p, u, t)$</mathjax>. This comes from classical mechanics and
conservation of energy. It's sort of intuitive as an optimal control
problem that there's a coupling relationship between your dynamics and your
system, and you're trying to descend to the state of lowest cost
(minimizing the action). Instead of solving the dynamic optimization
problem, all we're going to minimize is our Hamiltonian.</p>
<p>I'll try to give an informal derivation for now. Here, <mathjax>$\dot{p}(t) =
-A^T(t) p(t) - C(t)^TC(t) x(t)$</mathjax>; <mathjax>$p(t_1) = \nabla_x \parens{\frac{1}{2}
x^T(t_1) S x(t_1)} = Sx(t_1)$</mathjax>. What we do now is take a time derivative of
<mathjax>$p(t)^T \delta x(t)$</mathjax>. When you evaluate this derivative, something magical
sort of happens: you get <mathjax>$\dot{p}(t)^T \delta x(t) + p(t)^T\delta \dot{x}(t)$</mathjax>. What you
can show from this is if you just plug in the dynamics, this equals <mathjax>$-x(t)^T C(t)^T C(t)
\delta x(t) + p(t)^T B(t) \delta u(t)$</mathjax>. If you integrate both sides of this
expression from <mathjax>$t_0$</mathjax> to <mathjax>$t_1$</mathjax>, you get <mathjax>$p(t_1)^T \delta x(t_1) - p(t_0)^T
\delta x(t_0) = \int_{t_0}^{t_1} \bracks{p(t)^T B(t) \delta u(t) - x(t)^T C(t)^T
C(t) \delta x(t)} dt$</mathjax>.</p>
<p>Rearranging terms (using <mathjax>$\delta x(t_0) = 0$</mathjax> and <mathjax>$p(t_1) = Sx(t_1)$</mathjax>), this
basically becomes <mathjax>$x(t_1)^T S \delta x(t_1) +
\int_{t_0}^{t_1} x(t)^T C(t)^T C(t) \delta x(t) dt = \int_{t_0}^{t_1} p(t)^T
B(t) \delta u(t) dt$</mathjax>. Essentially, we can rewrite the first variation as
<mathjax>$\nabla_{\delta u} J(u) = \int_{t_0}^{t_1} \parens{u(t)^T + p(t)^T B(t)}
\delta u(t) dt$</mathjax>. If <mathjax>$u$</mathjax> is optimal, then (choosing <mathjax>$\delta u = u + B^Tp$</mathjax>)
<mathjax>$\int_{t_0}^{t_1} \mag{u(t)^T + p(t)^T B(t)}^2_2 dt = 0$</mathjax>, namely <mathjax>$\nabla_u H(x, p, u, t) = 0$</mathjax>. This
informal proof doesn't tell you what the critical points are.</p>
<p>If you go through a more formal proof, we'd apply Pontryagin's minimum
principle: if <mathjax>$u$</mathjax> is optimal, let <mathjax>$(x,p)$</mathjax> be corresponding state and
costate trajectories. Along this trajectory, <mathjax>$H(x(t), p(t), u(t), t) =
\min_{u \in \Re^{n_i}} H(x(t), p(t), u,t)$</mathjax>. Basically, you get a feedback
function in terms of <mathjax>$x$</mathjax>. However, this is only a necessary condition,
so you only have a set of possible solutions.</p>
<p>Applying the minimum principle, the unique solution to this problem <mathjax>$\min_u
H(x, p, u, t)$</mathjax> is <mathjax>$u^* = -B^T p$</mathjax>. Basically, what this says is by the
minimum principle, if <mathjax>$u$</mathjax> is optimal, then it must have this form. If you
knew nothing about optimal control, and you just wrote down the
Hamiltonian, this gives you the form, if it exists at all.</p>
<p>You then have to go back and see which value works, and then going back to
the other problem (which is both necessary and sufficient, since this is a
convex optimization problem), then you can verify the solution.</p>
<p>This is basically the main result for this linear quadratic optimization
problem. Theorem 2.1.157 from Callier and Desoer: </p>
<p><strong>The optimal solution to (LQ) is <mathjax>$u(t) = -B(t)^Tp(t)$</mathjax>, for almost every <mathjax>$t
\in [t_0, t_1]$</mathjax>, where <mathjax>$(x,p)$</mathjax> is the solution of <mathjax>$\dot{x} = A(t)x(t) -
B(t)B(t)^Tp(t), x(t_0) = x_0$</mathjax> and <mathjax>$\dot{p} = -C(t)^TC(t)x(t) - A(t)^T
p(t), p(t_1) = Sx(t_1)$</mathjax>. So this is basically the structure of the
controller.</strong></p>
<p>The rest of the proof tries to find a more tractable form that you can
compute. Because in general this two-point boundary value problem is hard to
solve, you want to turn it into an initial value or terminal condition
problem. You can represent your costate as a linear function of your state
<mathjax>$p(t) = P(t)x(t)$</mathjax>; what you get out is a linear state feedback law: <mathjax>$u(t) =
-B(t)^TP(t) x(t)$</mathjax>. This matrix then evolves as follows: <mathjax>$-\dot{P}(t) =
A(t)^T P(t) + P(t) A(t) - P(t)B(t)B(t)^T P(t) + C(t)^T C(t), P(t_1) =
S$</mathjax>.</p>
<p>The thing to note is that this equation is derived from the theorem.</p>
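<p>Numerically, this Riccati differential equation is integrated backwards in time from the terminal condition <mathjax>$P(t_1) = S$</mathjax>. A sketch (Python/scipy; the LTI data below, a double integrator, is made up just to exercise the machinery):</p>
<pre><code>import numpy as np
from scipy.integrate import solve_ivp

# Made-up LTI data: double integrator, full state penalty.
A = np.array([[0., 1.], [0., 0.]])
B = np.array([[0.], [1.]])
C = np.eye(2)
S = np.eye(2)
t0, t1 = 0.0, 5.0

def riccati_rhs(t, p_flat):
    P = p_flat.reshape(2, 2)
    # -Pdot = A^T P + P A - P B B^T P + C^T C
    Pdot = -(A.T @ P + P @ A - P @ B @ B.T @ P + C.T @ C)
    return Pdot.ravel()

# solve_ivp integrates backwards when the time span is decreasing.
sol = solve_ivp(riccati_rhs, (t1, t0), S.ravel(), rtol=1e-8)
P0 = sol.y[:, -1].reshape(2, 2)   # P(t0); the feedback is u = -B^T P(t) x
print(P0)
</code></pre>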
<p><a name='17'></a></p>
<h1>Linear Quadratic Optimization</h1>
<h2>October 23, 2012</h2>
<p>How old is control theory? As old as Troy. Legend goes that Aeneas, when in
Carthage, met Queen Dido and heard the following legend: the local
chieftains of Carthage gave her a piece of oxhide and told her she could be
the queen of whatever she could cover with the oxhide. Legend says that she
took the oxhide, cut it into a long thin strip, and she solved the
corresponding isoperimetric optimization problem.</p>
<p>Mechanics is a special case of optimal control. We're not trying
to drag you through thousands of years of thinking, but rather
to encapsulate that many years of human thought into two lectures.</p>
<p>Jerry gave you one version of this; different way, to get to where optimal
control is going.</p>
<p>Start by writing down a linear system: <mathjax>$\dot{x} = A(t)x + B(t)u, y(t) =
C(t)x + D(t) u$</mathjax>. Associated with this linear system, one defines the dual
system (also known as the costate) -- last time we represented this with
<mathjax>$p$</mathjax>. Following from the notes, <mathjax>$-\dot{\tilde{x}} = A^*\tilde{x} + C^*(t)
\tilde{u}, \tilde{y} = B^*\tilde{x} + D^*\tilde{u}$</mathjax>. Recall that these are linear
maps between continuous functions. This is the adjoint (or dual) system,
which plays a large role in optimal control.</p>
<p>In circuit theory or mechanical systems, you might find the dual of a
circuit by replacing capacitors with inductors, series with parallel, etc,
or you might see mass-damper spring systems. And so that's the relationship
between them.</p>
<p>A pretty interesting equality that relates them is the so-called pairing
lemma, which is a way to associate the initial conditions and final
state. The pairing lemma states that <mathjax>$\braket{\tilde{x}(t)}{x(t)} +
\int_{t_0}^t \braket{\tilde{u}(\tau)}{y(\tau)}d\tau = \braket
{\tilde{x}(t_0)}{x(t_0)} + \int_{t_0}^t \braket{\tilde{y}(\tau)} {u(\tau)}
d\tau$</mathjax>. The proof is not very involved: start with <mathjax>$\braket{\dot{\tilde{x}}
+ A^*\tilde{x} + C^*\tilde{u}}{x} + \braket{-y + B^*\tilde{x} +
D^*\tilde{u}}{u} = 0$</mathjax>; work through algebra. Think of this as a little
algebraic fact; it has some meaning in terms of the language of the
adjoint, but let's not get too sidetracked yet.</p>
<p>Using the pairing lemma, there's a more algebraic proof, compared to the
one on Thursday. In these notes, we set up the optimal control problem.</p>
<p>The optimal control problem that you give yourself is stated for the given
system: minimize, for a certain choice of <mathjax>$u$</mathjax>, the cost function as defined
last time: <mathjax>$J(u) = \frac{1}{2}\bracks{\int_{t_0}^{t_1} (\mag{u(t)}^2_2 +
\mag{C(t)x(t)}^2_2) dt + x^T(t_1)Sx(t_1)}$</mathjax> (note that <mathjax>$S = S^* \ge 0$</mathjax>, a
Hermitian positive semi-definite matrix). Note: there's a cost on the final
state, cost associated with regulation of the state, and in fact, this whole
thing doesn't need you to define an output <mathjax>$y$</mathjax>. Optimal control says that
given some dynamics, minimize this.</p>
<p>The reason you talk about Newton and Calculus of variations is that Newton
considered this. Newton's laws were a consequence of the particles
minimizing the total action (<mathjax>$(\mag{u(t)}^2_2 + \mag{C(t)x(t)}^2_2)$</mathjax> is the
Lagrangian).</p>
<p>The way you do this is via calculus of variations: <mathjax>$J(u + \epsilon\delta
u) = J(u) + \epsilon\delta J(\delta u) + O(\epsilon^2)$</mathjax>. If <mathjax>$u$</mathjax> were the
best choice, then <mathjax>$\forall \delta u$</mathjax> (in some class), <mathjax>$\delta J(\delta u) = 0$</mathjax>.</p>
<p>Newton invented calculus to solve these calculus of variations problems
because he wanted to describe how things moved.</p>
<p>So what you get from this is that <mathjax>$\delta u$</mathjax> causes a <mathjax>$\delta x$</mathjax>, and as
last time, you get <mathjax>$\delta \dot{x} = A\delta x + B \delta u, \delta x(t_0)
= 0$</mathjax>. From this (when you do the first variation), we saw that you get
<mathjax>$\int_{t_0}^{t_1} \braket{u}{\delta u} dt + \int_{t_0}^{t_1}\braket{C^*Cx}
{\delta x}dt + \braket{Sx(t_1)}{\delta x(t_1)} = 0$</mathjax>. This is a necessary condition for
optimality, but it isn't so explicit. The reason to use this pairing lemma
is to simplify the differential equation. I'm going to associate with the
system <mathjax>$\delta\dot{x} = A\delta x + B\delta u$</mathjax> a dummy output, i.e. <mathjax>$y =
\delta x$</mathjax>. The dual of this system is simply <mathjax>$-\dot{\tilde{x}} =
A^*\tilde{x} + \tilde{u}, \tilde{y} = B^*\tilde{x}$</mathjax>.</p>
<p>Using the pairing lemma and working through a bit of algebra, we get that
<mathjax>$\int_{t_0}^{t_1} \bracks{\braket{u}{\delta u} + \braket{B^* \tilde{x}}
{\delta u}}dt = 0$</mathjax>, so <mathjax>$\int_{t_0}^{t_1} \braket{u +
B^*\tilde{x}}{\delta u}dt = 0$</mathjax>.</p>
<p>(aside: Kalman filtering is the dual of optimal control)</p>
<p>The Hamiltonian system: <mathjax>$\begin{bmatrix}\dot{x} \\ \dot{\tilde{x}}
\end{bmatrix} = \begin{bmatrix}A &amp; -BB^* \\ -C^*C &amp; -A^* \end{bmatrix}
\begin{bmatrix}x \\ \tilde{x} \end{bmatrix}$</mathjax>, <mathjax>$x(t_0) = x_0, \tilde{x}(t_1)
= Sx(t_1)$</mathjax>. In mathematics, we call this a two-point boundary equation. Not
fun to solve, generally.</p>
<p>However, the theorem (of Linear Quadratic Optimal Control) is as follows:
<mathjax>$u(t) = -B^*(t) P(t) x(t)$</mathjax>, where we have <mathjax>$-\dot{P} = A^*P + PA - PBB^*P + C^*C$</mathjax>,
<mathjax>$P(t_1) = S$</mathjax>. This has a very famous name associated with it:
Riccati.</p>
<p>Riccati did the following: given <mathjax>$x$</mathjax> and <mathjax>$y$</mathjax> satisfying a pair of coupled linear
differential equations, if you define the new state variable <mathjax>$z
= y/x$</mathjax>, what differential equation does <mathjax>$z$</mathjax> satisfy? A quadratic one: <mathjax>$\dot{z} = a + bz +
cz^2$</mathjax> (the scalar Riccati equation). Further, the reason that our matrix equation looks quadratic
is because <mathjax>$P(t) = \tilde{X} X^{-1}$</mathjax>, where the matrices satisfy the
Hamiltonian system we had before. Amazingly, the answer to this problem is
simply a state feedback law, <mathjax>$u = F(t)x(t)$</mathjax>, where <mathjax>$F(t) =
-B^*(t)P(t)$</mathjax>.</p>
<p>If we consider the matrix differential equation <mathjax>$\deriv{}{t}
\begin{bmatrix}X \\ \tilde{X} \end{bmatrix} = \begin{bmatrix}A &amp; -BB^* \\
-C^*C &amp; -A^* \end{bmatrix} \begin{bmatrix}X \\ \tilde{X} \end{bmatrix}$</mathjax>,
with terminal conditions <mathjax>$X(t_1) = I, \tilde{X}(t_1)
= S$</mathjax>,</p>
<p>What we want to show is that <mathjax>$u(t) = -B^*\tilde{X}X^{-1}x(t)$</mathjax>. One part of
this proof is to show that <mathjax>$X(t)$</mathjax> is invertible (page 36 of C&amp;D, so we'll
skip this).</p>
<p>Begin by choosing <mathjax>$k$</mathjax> such that <mathjax>$X(t_0)k = x_0$</mathjax>, then define <mathjax>$x(t) = X(t) k,
\tilde{x}(t) = \tilde{X}(t) k$</mathjax>. By construction, <mathjax>$x,
\tilde{x}$</mathjax> satisfy the Hamiltonian system, and by the terminal conditions,
<mathjax>$x(t_1) = k$</mathjax>, so <mathjax>$\tilde{x}(t_1) = Sx(t_1)$</mathjax>. This shows now that <mathjax>$u(t) =
-B^* \tilde{X} X^{-1} x(t)$</mathjax>, which is our linear feedback law, so we
must show that <mathjax>$\tilde{X} X^{-1} = P$</mathjax>, which is more algebra.</p>
<p>Note: we can find <mathjax>$\deriv{}{t}X^{-1}$</mathjax> by differentiating
<mathjax>$X^{-1} X = I$</mathjax>, which gives <mathjax>$\deriv{}{t}X^{-1} = -X^{-1}\dot{X}X^{-1}$</mathjax>.</p>
<p>If you take the ratio of two quantities satisfying a linear differential equation, then
this ratio satisfies the Riccati equation. People say that the Riccati
equation is the simplest nonlinear differential equation, because it's just a
linear differential equation in disguise.</p>
<p>The person we should really credit for finishing off optimal control
(coming up with this composite story) was R. E. Kalman. This paper (1961)
was rejected by most journals, including IEEE.</p>
<p>There had been a Russian called Pontryagin, who came up with a version of
optimal control: in classical mechanics, it's Hamilton's principle of
least action; Pontryagin considered minimizing a general cost <mathjax>$\int_{t_0}^{t_1} L(x, u)dt +
V(x(t_1))$</mathjax> and gave a necessary condition for the optimum (Pontryagin's
maximum principle). Zadeh and Desoer rewrote
Pontryagin's maximum principle. Kalman, of course, to his credit, made
everything quadratic.</p>
<p>Algebraic Riccati equation: <mathjax>$C^*C + A^*\bar{P} + \bar{P}A - \bar{P}BB^*
\bar{P} = 0$</mathjax>, in the time-invariant case. To do this right, you'll have to
cover notions of stability, then controllability and observability. In
undergraduate classes, you heard about root locus, in which you tweak a
parameter and make sure that poles don't end up in the right half plane.</p>
<p><a name='18'></a></p>
<h1>Input-Output Stability</h1>
<h2>October 25, 2012</h2>
<p>Think back to LTI systems: consider a system specified by <mathjax>$A, B, C, D$</mathjax>;
assume single-input, single-output; <mathjax>$D = 0$</mathjax>; and <mathjax>$A$</mathjax> is a diagonal
matrix. We've got our system model, and since we've seen examples like this
before, we have a keen sense as to what the solution should look like.</p>
<p>IO stability refers to the stability of the system from the point of view
of the input/output relation. If we call the input/output response <mathjax>$H(s)$</mathjax>,
we know that <mathjax>$H(s) = Y(s) / U(s)$</mathjax>, and we know from our Laplace transform
when we were looking at solutions to the time-invariant equation, that this
is equal to <mathjax>$C(sI-A)^{-1}B$</mathjax>. We can write <mathjax>$(sI-A)^{-1}$</mathjax> by inspection,
since this is a diagonal matrix.</p>
<p>For a multi-input, multi-output system, with <mathjax>$A$</mathjax> as the same as before, we
still take the Laplace transform; and we now can't just take the same
ratio. The same formulation is still used to compute <mathjax>$H(s)$</mathjax>, which is now a
2-dimensional matrix.</p>
<p>When looking at an I/O relationship, you may not have all of the information
available in <mathjax>$A$</mathjax> in the I/O relationship because of either how you're
controlling <mathjax>$B$</mathjax> or how you're sensing: notion of <em>hidden modes</em>.</p>
<p>Important when we start thinking about I/O stability.</p>
<p>Consider a system whose output <mathjax>$y(t)$</mathjax> is characterized by <mathjax>$\int_{-\infty}^t
H(t,\tau) u(\tau) d\tau$</mathjax>. We define <mathjax>$H(t,\tau)$</mathjax> (the weighting pattern /
function of the system; a matrix for each <mathjax>$(t,\tau)$</mathjax>) and <mathjax>$u(\tau)$</mathjax>, which
are piecewise continuous in <mathjax>$\tau$</mathjax>; require that the integral of the norm
of <mathjax>$H(t,\tau)u(\tau)$</mathjax> is finite.</p>
<p>For linear time-varying systems, <mathjax>$H(t,\tau)$</mathjax> is given by the following
matrix: <mathjax>$C(t)\Phi(t,\tau)B(\tau) + D(t)\delta(t-\tau)$</mathjax> (<mathjax>$\delta(t)$</mathjax> here
is the delta function).</p>
<p>For linear time-invariant systems, it's a little easier: we can do the same
thing, but it simplifies: it only depends (unsurprisingly) on the
difference of <mathjax>$t$</mathjax> and <mathjax>$\tau$</mathjax>, so we can write it as <mathjax>$H(t-\tau) =
C\exp(A(t-\tau))B + D\delta(t-\tau)$</mathjax>. Then take the
unilateral Laplace transform.</p>
<p>If we have a vector in <mathjax>$\Re^n$</mathjax>, we know that the infinity norm is simply
the max element of a vector. Recall that the corresponding induced norm is
simply the max row sum.</p>
<p>Suppose we have a function <mathjax>$u$</mathjax>; the infinity norm is simply the supremum of
the infinity norm of the resulting vectors. <mathjax>$L_{\infty}^{n_i}$</mathjax> is simply
the set of functions that are finite over the infinity-norm.</p>
<p>We'll consider a particular case of I/O stability, called bounded input,
bounded output stability.</p>
<h2>Bounded Input, Bounded Output Stability (BIBO)</h2>
<p>A system (<mathjax>$L$</mathjax>) is said to be BIBO stable if <mathjax>$u \in
L_\infty^{n_i} \implies y(u) \in L_\infty^{n_o}$</mathjax>, or equivalently, <mathjax>$\exists
K &lt; \infty \st \forall u \in L_\infty^{n_i}, \mag{y(\cdot)}_\infty \le
K\mag{u(\cdot)}_\infty$</mathjax>.</p>
<p>Equivalently, a system is not BIBO stable if no such <mathjax>$K$</mathjax> exists for this
condition; that is, <mathjax>$\forall k &lt; \infty, \exists u \st \mag{y(u)}_\infty &gt;
k\mag{u}_\infty$</mathjax>.</p>
<h2>BIBO stability theorem:</h2>
<p>(L) is BIBO stable iff <mathjax>$\sup_t \int_{-\infty}^t \mag{H(t,\tau)}_{i,\infty}
d\tau &lt; \infty$</mathjax>.</p>
<p><a name='19'></a></p>
<h1>Infinite Horizon, BIBO for LTI systems, State space stability</h1>
<h2>October 30, 2012</h2>
<p>We want to find the input that minimizes the cost function (with <mathjax>$t_0=0$</mathjax>)
<mathjax>$J = \int_0^\infty (y^TQy + u^TRu) dt$</mathjax>. We could consider the first term as
a weighted norm <mathjax>$\mag{Cx}^2$</mathjax> and the second term as a weighted norm <mathjax>$\mag{u}^2$</mathjax>;
<mathjax>$Q, R$</mathjax> are specified by the designer; the only rules we have to follow here are that <mathjax>$Q$</mathjax>
is positive semi-definite (all eigenvalues nonnegative)
and <mathjax>$R$</mathjax> is a positive-definite matrix (all eigenvalues strictly
positive). Typically, <mathjax>$Q, R$</mathjax> are chosen to be diagonal, since we generally
like to penalize components of the input and output relative to each other.</p>
<p>The penalty on a component <mathjax>$y_i$</mathjax> being nonzero is simply the associated
diagonal entry (eigenvalue) of <mathjax>$Q$</mathjax>. What's typically important is the ratio of the penalties.</p>
<p>Optimization problem whose constraints are defined by a dynamical
system. It turns out that the optimal input is given by linear state
feedback where the state is multiplied by a matrix <mathjax>$F$</mathjax> determined by the
system model and the matrices of the cost function: (as seen before): <mathjax>$u =
-Fx, F = R^{-1}B^T P$</mathjax>, where <mathjax>$P$</mathjax> is the positive-definite solution to the
algebraic Riccati equation: <mathjax>$PA + A^TP - PBR^{-1}B^TP + C^TQC = 0$</mathjax>. <mathjax>$P \in
\Re^{n \times n}$</mathjax>. Out of all possible inputs, the best one is given by
linear state feedback.</p>
<p>We have some intuition given the linear time-varying case, which is more
complicated; easy to justify through completing the square. Intuition as to
what cost means; now we just need to solve this equation (there's a MATLAB
routine for this, ARE).</p>
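<p>A Python analogue of that routine is scipy's <code>solve_continuous_are</code> (a sketch with made-up double-integrator data; note the state weight passed in is <mathjax>$C^TQC$</mathjax> to match the cost above):</p>
<pre><code>import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0., 1.], [0., 0.]])   # made-up LTI data: double integrator
B = np.array([[0.], [1.]])
C = np.eye(2)
Q = np.eye(2)                        # output weight, PSD
R = np.array([[1.]])                 # input weight, PD

# Solves PA + A^T P - P B R^{-1} B^T P + C^T Q C = 0 (stabilizing P).
P = solve_continuous_are(A, B, C.T @ Q @ C, R)
F = np.linalg.solve(R, B.T @ P)      # u = -Fx with F = R^{-1} B^T P

print(np.linalg.eigvals(A - B @ F))  # closed-loop eigenvalues in C_-^o
</code></pre>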
<p>If we were to take the derivative of <mathjax>$x^TPx$</mathjax> (<mathjax>$\dot{x}^TPx + x^TP\dot{x}$</mathjax>)
and integrate it, we get <mathjax>$(x^TPx)(\infty) - (x^TPx)(0)$</mathjax>. Since we defined
<mathjax>$J = \int_0^\infty (y^TQy + u^TRu)dt$</mathjax>, <mathjax>$J - (x^TPx)(0) = -(x^TPx)(\infty) +
\int_0^\infty (y^TQy + u^TRu)dt + \int_0^\infty \deriv{}{t}(x^TPx)
dt$</mathjax>. From the definition of <mathjax>$\dot{x}$</mathjax>, we can rewrite the derivative. After
some tweaking, <mathjax>$J - (x^TPx)(0) = -(x^TPx)(\infty) + \int_0^\infty (u +
R^{-1}B^TPx)^TR(u + R^{-1}B^TPx)dt$</mathjax></p>
<p>By our hypothesis, we are assuming that <mathjax>$P$</mathjax> solves the algebraic Riccati
equation. We can use that to simplify the equation. If the system under
this control law is stable (exponential, internal stability), then
<mathjax>$x^TPx(\infty) = 0$</mathjax>. If we choose <mathjax>$u = -R^{-1}B^TPx$</mathjax>, then the term under
the integral is 0, and so <mathjax>$J$</mathjax> is minimized -- cost depends only on initial
state, which you can't affect.</p>
<p>The other thing we talked about was the positive semi-definite solution to
this equation: it has only one positive semi-definite (symmetric) solution. We
can also prove that if you choose this control input, then it will
stabilize the system: not only optimal control law, but also stabilizing
control law.</p>
<p>We defined BIBO stability as describing a system in which all bounded
inputs provide bounded outputs.</p>
<p>We get the following theorem: if we write out the transfer function of the
weighting pattern of the system, <mathjax>$\hat{H} = C(sI - A)^{-1}B + D$</mathjax>, which is
a proper matrix, then <mathjax>$R = [A,B,C,D]$</mathjax> is BIBO stable iff the poles of <mathjax>$\hat{H}$</mathjax> are all in
the open left half of the complex plane (<mathjax>$\mathbb{C}^o_-$</mathjax>).</p>
<p>We can ignore <mathjax>$D$</mathjax> because it's a constant, and so it does not generate (or
cancel) any poles.</p>
<p><a name='20'></a></p>
<h1>State space stability</h1>
<h2>November 1, 2012</h2>
<p>overslept :/</p>
<p><a name='21'></a></p>
<h1>Internal stability, Lyapunov condition for LTI systems, Controllability / Observability</h1>
<h2>November 6, 2012</h2>
<p>LTI case: <mathjax>$\dot{x} = Ax$</mathjax>. Asymptotic / exponential stability iff all
eigenvalues of <mathjax>$A$</mathjax> are in <mathjax>$C_-^o$</mathjax>. Stability iff all eigenvalues are in <mathjax>$C_-$</mathjax> and
each eigenvalue on the <mathjax>$j\omega$</mathjax> axis has Jordan blocks of size 1.</p>
<h2>Connections between internal stability and BIBO stability</h2>
<p>Go back to the general linear time-varying case. If we have the result that
the equilibrium at 0 (WLOG) of the zero-input system is exponentially
stable, then assuming (which we do) that <mathjax>$B,C,D$</mathjax> are bounded matrices,
the system is BIBO stable.</p>
<p>We just need to show, now, that given that this is exponentially stable,
the condition (with weighting patterns and all) for BIBO stability holds.</p>
<p>(show, as an exercise, that given those bounds, and given our knowledge
that this is an exponentially stable system and we can develop a bound on
the norm, that this is BIBO).</p>
<p>What if <mathjax>$\dot{x} = Ax$</mathjax> is just asymptotically stable? Do we have BIBO
stability? Something to think about.</p>
<p>We know from our examples that a system may be BIBO stable but not
internally stable (notion of modes being blocked by <mathjax>$B$</mathjax> or not exposed by
<mathjax>$C$</mathjax>) due to the possibility of what we call <strong>hidden modes</strong>,
i.e. eigenvalues of <mathjax>$A$</mathjax> may be not accessible through the input or
invisible at the output.</p>
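<p>A tiny numerical illustration of a hidden mode (made-up numbers): the system below is internally unstable (eigenvalue at <mathjax>$+1$</mathjax>), but the input never excites that mode, so its factor cancels out of <mathjax>$\hat{H}(s)$</mathjax> and the I/O map is BIBO stable anyway.</p>
<pre><code>import numpy as np
from scipy.signal import ss2tf

A = np.diag([1., -1.])       # eigenvalue +1 is internally unstable...
B = np.array([[0.], [1.]])   # ...but the input never excites it
C = np.array([[0., 1.]])
D = np.array([[0.]])

num, den = ss2tf(A, B, C, D)
print(num, den)   # (s - 1) over (s - 1)(s + 1): the factor (s - 1) cancels,
                  # leaving H(s) = 1/(s+1), whose only pole is at -1.
</code></pre>
<p>Note that <code>ss2tf</code> does not cancel the common factor for you; the point is that the minimal transfer function has no pole at <mathjax>$+1$</mathjax>.</p>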
<p>So when can you relate BIBO stability to internal stability? Foreshadowing
of what we'll start in a minute.</p>
<p>If we have an LTI system, we say that if the (A,B) pair is completely
controllable, and the (A,C) pair is completely observable, then we call the
system (A,B,C) minimal. If this is the case, then it's true that BIBO
stability is equivalent to internal exponential stability.</p>
<p>For linear time-varying systems, I'll make one remark. It is often tempting to
look at the eigenvalues of <mathjax>$A(t)$</mathjax>. In general there is no connection between the
spectrum of <mathjax>$A(t)$</mathjax> (<mathjax>$\sigma(A(t))$</mathjax>) and the stability of the system, as the
counterexample below shows.</p>
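<p>A classic counterexample (it appears in several texts; not from lecture) makes this sharp: the <mathjax>$A(t)$</mathjax> below has constant eigenvalues <mathjax>$-0.25 \pm 0.66j$</mathjax>, yet <mathjax>$\dot{x} = A(t)x$</mathjax> has the exact growing solution <mathjax>$e^{t/2}(\cos t, -\sin t)^T$</mathjax>:</p>
<pre><code>import numpy as np
from scipy.integrate import solve_ivp

a = 1.5
def A(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[-1 + a*c*c,  1 - a*s*c],
                     [-1 - a*s*c, -1 + a*s*s]])

# Eigenvalues are constant and in the open left half plane...
print(np.linalg.eigvals(A(0.0)))          # -0.25 +/- 0.66j

# ...yet solutions grow like e^{t/2}.
sol = solve_ivp(lambda t, x: A(t) @ x, (0., 20.), [1., 0.], rtol=1e-8)
print(np.linalg.norm(sol.y[:, -1]))       # roughly e^{10}, about 2.2e4
</code></pre>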
<p>There are two cases in which the eigenvalues tell us something about the
stability:</p>
<ul>
<li>
<p><mathjax>$A(t)$</mathjax> is symmetric (or more generally, normal: <mathjax>$\comm{A}{A^\dag} = 0$</mathjax>): <mathjax>$A$</mathjax> can
be decomposed into an orthonormal basis of eigenvectors (for symmetric <mathjax>$A$</mathjax>,
corresponding to real eigenvalues). If <mathjax>$\sigma(A(t)) \le -\mu$</mathjax> (<mathjax>$\mu &gt; 0$</mathjax>), then we have
that the equilibrium at 0 is exponentially stable.</p>
</li>
<li>
<p>Proof: Consider <mathjax>$V(x) = x^Tx$</mathjax>. Its time derivative (actually a special
kind of derivative) is <mathjax>$\dot{x}^Tx + x^T\dot{x} = 2x^TAx$</mathjax>. Rewrite <mathjax>$A =
T^{-1} \Lambda T$</mathjax>, and we have <mathjax>$x^TAx = x^T T^T \Lambda T x$</mathjax> (we have
that <mathjax>$T^{-1} = T^T$</mathjax>, since the eigenvectors are orthonormal). Our initial assumption gives us an upper bound on
the product <mathjax>$x^T A x$</mathjax>, which tells us that <mathjax>$\dot{V} \le -2\mu V$</mathjax>. We can
integrate this inequality and come up with the following bound:
<mathjax>$V(x(t)) \le V(x(0))\exp(-2\mu t)$</mathjax>. Note that <mathjax>$V$</mathjax> is the square of a norm of <mathjax>$x$</mathjax>, so
using this norm, we have that <mathjax>$\mag{x(t)}^2 \le \mag{x(0)}^2\exp(-2\mu t)$</mathjax>,
so <mathjax>$\mag{x(t)} \le \mag{x(0)}\exp(-\mu t)$</mathjax>.</p>
</li>
<li>
<p>The reason why this proof is interesting is that it invokes a technique
common to control theory: we bring in this function and use the
derivative to establish the stability of the system. This is a special
case of what we call a Lyapunov function. Lyapunov theory is the most
popular tool for assessing stability.</p>
</li>
<li>
<p>If <mathjax>$\text{Re}(\sigma(A(t))) &lt; -\lambda$</mathjax> for all <mathjax>$t$</mathjax> and <mathjax>$\mag{\dot{A}(t)}
\le \epsilon$</mathjax>, with <mathjax>$\epsilon$</mathjax> small enough (a slowly-varying system), then
you can show that <mathjax>$\dot{x} = A(t)x$</mathjax> is exponentially stable
(beyond the scope of this course).</p>
</li>
</ul>
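<p>To make the caution about <mathjax>$\sigma(A(t))$</mathjax> concrete, here is a small numerical
sketch (my own illustration, not from lecture) of a classic counterexample that
appears in several texts: every <mathjax>$A(t)$</mathjax> below has constant eigenvalues with real
part <mathjax>$-0.25$</mathjax>, and yet <mathjax>$x(t) = e^{t/2}(\cos t, -\sin t)$</mathjax> solves <mathjax>$\dot{x} = A(t)x$</mathjax>,
so the state grows without bound:</p>
<pre><code># Eigenvalues of A(t) are in the open left half plane for every t,
# but the time-varying system is unstable.
import numpy as np
from scipy.integrate import solve_ivp

def A(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[-1 + 1.5 * c * c,  1 - 1.5 * s * c],
                     [-1 - 1.5 * s * c, -1 + 1.5 * s * s]])

print(np.linalg.eigvals(A(0.0)))   # -0.25 +/- 0.66j
print(np.linalg.eigvals(A(1.0)))   # same: the spectrum is constant in t

sol = solve_ivp(lambda t, x: A(t) @ x, (0, 10), [1.0, 0.0], rtol=1e-9)
print(np.linalg.norm(sol.y[:, -1]))  # ~ e^5: the state grew, it did not decay
</code></pre>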
<p>Today we'll be looking just at Lyapunov as it applies to the time-invariant
case. We spend a third of EE 222 studying Lyapunov theory and its
variations and how to construct Lyapunov functions. Most exportable: use of
external functions to avoid needing to integrate the system itself.</p>
<p>This leads us to our last test: a taste of what Lyapunov theory
is.</p>
<h2>Theorem (Lyapunov condition for exponential stability of <mathjax>$\dot{x} = Ax$</mathjax>)</h2>
<p>We consider the following matrix equation (the <strong>Lyapunov equation</strong>) <mathjax>$A^TP
+ PA = -Q$</mathjax>, where <mathjax>$P,Q$</mathjax> are square matrices. Theorem: the system is
exponentially stable iff <mathjax>$\forall Q = Q^T &gt; 0$</mathjax> the Lyapunov equation has a
unique solution <mathjax>$P = P^T &gt; 0$</mathjax>.</p>
<p>Consider <mathjax>$\fn{V}{\Re^n}{\Re}$</mathjax>, <mathjax>$V(x) = x^TPx$</mathjax>, where for <mathjax>$Q = Q^T &gt; 0$</mathjax>,
<mathjax>$P = P^T &gt; 0$</mathjax> is the solution to the Lyapunov equation.</p>
<p>We're going to look at <mathjax>$\dot{V}$</mathjax> (this is the same derivative as before,
the <strong>Lie derivative</strong>: wherever we see the time derivative of the state,
we apply the dynamics of the system). This derivative gives a measure of
how much <mathjax>$V(x)$</mathjax> is changing along the trajectories of the system. It
suffices to say that <mathjax>$\dot{x}$</mathjax> represents the dynamics of the system.</p>
<p>So <mathjax>$\dot{V} = x^TA^TPx + x^TPAx = x^T(A^TP + PA)x = -x^TQx$</mathjax>. What we want
to do is relate the changes in <mathjax>$V$</mathjax> back to the changes in <mathjax>$x$</mathjax>. As before,
<mathjax>$\dot{V}(x) = -x^TQx \le -\lambda_{\min}(Q)\mag{x}^2$</mathjax>. Note that <mathjax>$-\mag{x}^2
\le -\frac{V(x)}{\lambda_{\max}(P)}$</mathjax>, so <mathjax>$\dot{V} \le
-\frac{\lambda_{\min}(Q)}{\lambda_{\max}(P)} V$</mathjax>, and we can proceed as before.</p>
<p>This in itself is an important result. More of a technique for analysis
than something that you'd use for assessing stability. The introduction of
a Lyapunov function with a specific form (a positive definite function
whose Lie derivative is a negative definite function) is what's useful.</p>
<p>Often the work in using Lyapunov theory is constructing such functions. In
the linear case it's easy because the function is just given to you.</p>
<p>(There are four equivalent statements, proven in Callier and Desoer -- one of
these is that if one <mathjax>$Q$</mathjax> exists such that this holds, then it holds for all <mathjax>$Q$</mathjax>.)</p>
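<p>A minimal numerical sketch of this test (mine, using SciPy's Lyapunov solver;
the matrices are arbitrary illustrative choices):</p>
<pre><code># Solve A^T P + P A = -Q for a stable A and check that P = P^T > 0.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[0.0, 1.0], [-2.0, -3.0]])   # eigenvalues -1, -2: exponentially stable
Q = np.eye(2)

# solve_continuous_lyapunov(M, R) solves M X + X M^H = R, so pass
# M = A^T and R = -Q to get A^T P + P A = -Q.
P = solve_continuous_lyapunov(A.T, -Q)

print(np.allclose(A.T @ P + P @ A, -Q))    # True: P solves the Lyapunov equation
print(np.linalg.eigvalsh(P))               # both eigenvalues positive: P = P^T > 0
</code></pre>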
<h2>Controllability / Observability</h2>
<p>Motivation: designing and manipulating the system so you can get it to do
a host of things. What we're going to launch into in the last part of the
class: understanding what controllability of a system means. Part of that
question is whether or not we can sufficiently observe the dynamics of the
system through the output. That tells us how well we can control the
system. Given that, we go on to ponder how to design controllers; we'll use
the feedback topology we've seen twice already.</p>
<p>Two parts: assessment of controllability of systems, controller design.</p>
<p>In the last ten minutes: intuitive discussion about what controllability
and observability mean. These are general to dynamic systems (not specific
to linear systems).</p>
<p>Recall our definition of dynamical systems: five-tuple representation <mathjax>$D =
(\mathcal{U}, \Sigma, \mathcal{Y}, s,r)$</mathjax>, semigroup and state transition
axioms hold. The input is said to <strong>steer</strong> the initial condition <mathjax>$x_0$</mathjax> @
<mathjax>$t_0$</mathjax> to <mathjax>$x_1$</mathjax> @ <mathjax>$t_1$</mathjax> if <mathjax>$x_1 = s(t_1, t_0, x_0, u)$</mathjax>.</p>
<p>We say that D is <strong>controllable</strong> on some time interval if <mathjax>$\forall x_0, x_1
\in \Sigma \exists u \in \mathcal{U}$</mathjax> such that <mathjax>$u$</mathjax> steers <mathjax>$x_0$</mathjax> @ <mathjax>$t_0$</mathjax> to
<mathjax>$x_1$</mathjax> @ <mathjax>$t_1$</mathjax>.</p>
<p>We define controllability to be specific to a time interval; <strong>completely
controllable</strong> (cc) denotes that the system is controllable on every time
interval (if controllability is not indicated with an interval, it usually denotes
complete controllability). Controllability of the system corresponds to
surjectivity of the state transition function, i.e. <mathjax>$\forall x_0 \in
\Sigma, \fn{s(t_1, t_0, x_0, \cdot)}{\mathcal{U}}{\Sigma}$</mathjax> is surjective.</p>
<p>We'll start next day by talking about observability and thinking about more
specific tests.</p>
<p><a name='22'></a></p>
<h1>Controllability and Observability</h1>
<h2>November 8, 2012</h2>
<p>Difference between dynamical system and its representation. Instantiating
it with the representation is simply our model of the system; the system
itself is just the five-tuple. Same system with different representations;
notion of a minimal representation. Controllability and observability are
properties of a representation.</p>
<p>The computational tests for controllability and observability -- these
grammians -- are dependent on the system representations.</p>
<p>We can define it more basically with the properties of the state transition
function and the readout map (rather, recall that the composition of the
two yields the response function).</p>
<p>note: cc = completely controllable, co = completely observable.</p>
<p>We discussed what it meant for an input to steer the system, and then we
defined controllability as surjectivity of the state transition
function. In other words, a dynamic system <mathjax>$D$</mathjax> is controllable on
<mathjax>$\bracks{t_0, t_1}$</mathjax> if the map from inputs to states (given some
initial condition) is surjective.</p>
<p>We'll now define observability as follows: <mathjax>$D$</mathjax> is said to be observable on
<mathjax>$\bracks{t_0, t_1}$</mathjax> if for all valid input functions and all valid output
functions, the initial state <mathjax>$x_0$</mathjax> at <mathjax>$t_0$</mathjax> is uniquely determined. In
other words, the response map <mathjax>$\rho(t_1, t_0, \cdot, u)$</mathjax> is injective.</p>
<p>What happens with controllability and observability under feedback? We're
going to talk about feedback and one feed-forward topology. What can we say
about the composite system? Is controllability / observability preserved
with feedback? We're going to look at two kinds of feedback: state feedback
and output feedback.</p>
<h2>Memoryless feedback</h2>
<p>No memory; static. Recall that the solution to LQR was just linear state
feedback. That's memoryless: takes the state and multiplies by a constant
gain matrix.</p>
<p>State feedback takes states and gives us inputs; output feedback takes
outputs and gives us inputs.</p>
<p>Suppose you have a dynamical system <mathjax>$D$</mathjax>. We can actually measure the state
and feed it back through this function <mathjax>$F_s$</mathjax> to construct a new input.</p>
<p>Negative state feedback -- subtract this from our input. Initial input is
the <strong>auxiliary input</strong>.</p>
<p>The output feedback case is typically more popular because we don't usually
have the state to measure. We can no longer measure the state, but we can
take the output.</p>
<p>With these feedback forms, we're going to make the <strong>well-posedness
assumption</strong>: for all auxiliary inputs <mathjax>$v$</mathjax> and all initial states and
times, when you look at the algebraic relationships of those forms, there
exists unique trajectories <mathjax>$x(t), y(t)$</mathjax> over that time interval.</p>
<p>If you don't have well-posedness, you can come up with problems: consider
<mathjax>$A=B=C=0, D=1, u = v+y$</mathjax>. Then <mathjax>$y = u = v + y$</mathjax>, which forces <mathjax>$v = 0$</mathjax>:
the only <mathjax>$v$</mathjax> that works is 0.</p>
<p>If we have that the systems are well-posed, we can make some nice
observations about the observability of the feedback forms.</p>
<p>The first theorem states the following: suppose you have a well-posed
system. <mathjax>$D$</mathjax> is controllable on the interval iff <mathjax>$D_s$</mathjax> is controllable on
the same interval, and iff <mathjax>$D_o$</mathjax> is controllable on the same interval.</p>
<p>Namely, controllability is preserved under static state feedback.</p>
<p>Formally this rests on the fact that <mathjax>$F_s, F_o$</mathjax> are known maps. We have all
the information needed to compute one input from the other uniquely (by the
well-posedness assumption).</p>
<h2>Observability</h2>
<p>Still considering memoryless output feedback, but now we're also going to
consider input feed-forward.</p>
<p>We have another system: D is observable on <mathjax>$\bracks{t_0, t_1}$</mathjax> iff <mathjax>$D_o$</mathjax> is
observable and iff <mathjax>$D_f$</mathjax> is observable.</p>
<p><mathjax>$D$</mathjax> is observable on the interval if, given all valid inputs and outputs over
the time interval, the initial condition is uniquely determined.</p>
<p>We've talked about preservation of controllability over state and output
feedback; we've talked about preservation of observability of output
feedback and feedforward. But the case we've missed is state feedback. Is
observability preserved by state feedback? The answer, in general, is
no. We can explain this in a couple of ways.</p>
<p>Observability is not necessarily preserved under state feedback -- consider
the nullspace of <mathjax>$F_f$</mathjax>.</p>
<p>Dynamic feedback. More assumptions that have to be made. Up to now, we've
been assuming that all of these feedback / feedforward functions are
memoryless. What if they actually have dynamics associated to them? A lot
of control design involves using dynamic controllers.</p>
<p>In general these properties we've talked about (preservation of
controllability / observability) hold for the memoryless case. If we now
consider that the feedback and feedforward systems themselves are dynamical
systems (and have memory), we have to be careful in what we mean by
controllability and observability of the resulting system.</p>
<p>The input space of a state feedback function is the space of state
trajectories of <mathjax>$D$</mathjax>, and the output space is the input space of <mathjax>$D$</mathjax>.</p>
<p>In order to talk about preservation of controllability and observability,
we must now ask about the controllability and observability of the feedback
system (i.e. the "loop").</p>
<p>We're going to actually approach that later. For now, skip over
preservation of controllability and observability under dynamic (state and
output) feedback until later. We'll discuss different topologies under
which these are preserved. We'll return to it under the pretext of
controllability and observability of linear systems.</p>
<h2>Grammians</h2>
<p>Given a system representation, how do we construct tests for
controllability and observability? We have a test for the linear
time-varying case, which is in general hard to compute, and tests for the
linear time-invariant case, which are easier. In both cases, they're both
matrix-rank conditions.</p>
<h2>Controllability Grammians</h2>
<p>Let us look at our linear system representation and focus on the
time-varying case. When we talk about controllability of a linear system,
we care about controllability of <mathjax>$(A, B)$</mathjax> -- we know that controllability
on <mathjax>$\bracks{t_0, t_1}$</mathjax> is the same as surjectivity of the state transition
function, which is defined only in terms of <mathjax>$A, B$</mathjax>: recall that <mathjax>$s(t_1,
t_0, x_0, u_{[t_0,t_1]}(\cdot)) = \Phi(t_1,t_0)x_0 + \int_{t_0}^{t_1}
\Phi(t_1, \tau)B(\tau) u(\tau) d\tau$</mathjax>. Let us refer to the integral as
<mathjax>$\mathcal{L}_c(u)$</mathjax>, which maps input functions to states. Now we can
characterize controllability in the following manner: we say that <mathjax>$(A,B)$</mathjax>
(or generally <mathjax>$R$</mathjax>) is completely controllable on <mathjax>$[t_0, t_1]$</mathjax> iff
<mathjax>$R(\mathcal{L}_c) = \Re^n$</mathjax>, which is equivalent to saying <mathjax>$\forall x_0 \in
\Re^n \exists u_{[t_0,t_1]}$</mathjax> that steers <mathjax>$(x_0, t_0)$</mathjax> to <mathjax>$(\theta_n, t_1)$</mathjax>
(steering to the origin), which is further equivalent to saying that you can
steer <mathjax>$\theta_n$</mathjax> to any state (reaching from the origin).</p>
<p>What is initially unintuitive is that either of the two latter
statements is equivalent to the first statement. Notion of equating
this to any vector in <mathjax>$\Re^n$</mathjax>: because the first piece of the state
transition function does not depend on <mathjax>$u$</mathjax>, it all boils down to
surjectivity of <mathjax>$\mathcal{L}_c$</mathjax> (since this is the only part that
depends on <mathjax>$u$</mathjax>). Thus surjectivity of <mathjax>$s$</mathjax> in <mathjax>$u$</mathjax> is equivalent to
surjectivity of <mathjax>$\mathcal{L}_c$</mathjax>.</p>
<p><a name='23'></a></p>
<h1>Controllability and Observability: Grammians and tools</h1>
<h2>November 13, 2012</h2>
<h2>Controllability of <mathjax>$[A, B]$</mathjax></h2>
<p><mathjax>$s(t_1, t_0, x_0, u_{[t_0, t_1]}) = \Phi(t_1, t_0) x_0 + \int_{t_0}^{t_1}
\Phi(t_1,\tau)B(\tau)u(\tau)d\tau$</mathjax></p>
<p>We refer to the term with the integral as <mathjax>$\mathcal{L}_c(u)$</mathjax> (<strong>L</strong>inear
<strong>c</strong>ontroller map). Recall that we showed that controllability of <mathjax>$[A,B]$</mathjax>
on <mathjax>$[t_0, t_1]$</mathjax> is equivalent to surjectivity of <mathjax>$\mathcal{L}_c$</mathjax>.</p>
<p>Grammians: matrix tests for controllability and observability for
time-varying systems, and end up being something quite nice in the LTI
case.</p>
<p>Recall: a system is said to be controllable if for any pair of states,
there exists an input that takes you from the initial state to the terminal
state over this time interval. We further discussed that this was
equivalent to saying that we could reach any final state from 0, which was
further equivalent to saying that we could reach 0 from any initial
state. (reachability from the origin, steering to the origin)</p>
<p>Let us pair this, now, with</p>
<h2>Observability</h2>
<p>You have your model, the output over some defined time interval, and from
that information, you can uniquely determine the initial state of the
system. For now we'll assume we also know the input, but that turns out to be
extraneous information.</p>
<p>Response map is the readout map composed with state transition function: in
particular, it has memory.</p>
<p>Given the output and the input, observability relates to uniquely
determining the initial state. Just as we isolated the term that depended
on input in the controllability case, in observability, we isolate the
dependence on the initial state.</p>
<p>We've turned injectivity of <mathjax>$\rho$</mathjax> into injectivity of the linear operator
<mathjax>$\fn{\mathcal{L}_o}{\Sigma}{\mathcal{Y}}$</mathjax>, <mathjax>$(\mathcal{L}_o x_0)(t) \defequals
C(t)\Phi(t, t_0)x_0$</mathjax> (<strong>L</strong>inear <strong>o</strong>bservability map).</p>
<p>Theorem: complete observability of the system <mathjax>$A, C$</mathjax> on the interval
<mathjax>$[t_0, t_1]$</mathjax> is equivalent to saying that <mathjax>$\mathcal{N}(\mathcal{L}_o) =
\{\theta_n\}$</mathjax> (injectivity of the observability map; saying that the nullspace is
trivial).</p>
<p>Controllability and observability grammians are named after the same Gram as in
Gram-Schmidt. Recall our work on adjoints: we'll turn these tests into matrix
tests through our knowledge of adjoint maps.</p>
<p>Recall: adjoint of linear maps. Take two inner product spaces and a map
that goes from one space to the other. The map's adjoint is defined in
terms of the inner product.</p>
<p>We can show that the adjoint of the linear controller map is simply
<mathjax>$\mathcal{L}_c^*(v) = B^*(\cdot)\Phi^*(t_1, \cdot)v$</mathjax>: just consider the inner
product and bring <mathjax>$v^T$</mathjax> under the integral.</p>
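<p>Filling in that computation for the real case: for <mathjax>$v \in \Re^n$</mathjax> and an input <mathjax>$u$</mathjax>,</p>
<p><mathjax>$$\langle v, \mathcal{L}_c u\rangle
= v^T \int_{t_0}^{t_1} \Phi(t_1, \tau) B(\tau) u(\tau) d\tau
= \int_{t_0}^{t_1} \left(B^*(\tau) \Phi^*(t_1, \tau) v\right)^T u(\tau) d\tau
= \langle \mathcal{L}_c^* v, u \rangle,$$</mathjax></p>
<p>so <mathjax>$(\mathcal{L}_c^* v)(\tau) = B^*(\tau)\Phi^*(t_1, \tau)v$</mathjax>.</p>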
<p>Given <mathjax>$\fn{\mathcal{A}}{\mathcal{U}}{\mathcal{V}}$</mathjax>, we have the following
propositions:</p>
<ol>
<li><mathjax>$V = R(A) \overset{\perp}{\oplus} N(A^*)$</mathjax></li>
<li><mathjax>$U = R(A^*) \overset{\perp}{\oplus} N(A)$</mathjax></li>
</ol>
<p>What we'd want to show is one pair; you'd prove this by showing that the
nullspace of <mathjax>$A^*$</mathjax> is perpendicular to the range of <mathjax>$A$</mathjax>.</p>
<p>You'll often see this as bubble diagrams: the space <mathjax>$U$</mathjax> is often drawn as
two bubbles that intersect only at the zero vector.</p>
<p>More important propositions:</p>
<ol>
<li><mathjax>$N(AA^*) = N(A^*)$</mathjax></li>
<li><mathjax>$R(AA^*) = R(A)$</mathjax></li>
<li><mathjax>$N(A^*A) = N(A)$</mathjax></li>
<li><mathjax>$R(A^*A) = R(A^*)$</mathjax></li>
</ol>
<p>These are very important. Quick sketch of the proof of (1): for <mathjax>$x \in
N(AA^*)$</mathjax>, <mathjax>$0 = \langle x, AA^*x\rangle = \mag{A^*x}^2$</mathjax>, so <mathjax>$A^*x = \theta$</mathjax>;
this is adequate, since the reverse inclusion is immediate.</p>
<p>Sketch of the proof of (2): use the decomposition from the above propositions
(<mathjax>$u = u_1 + u_2$</mathjax>, <mathjax>$u_1 \in R(A^*)$</mathjax>, <mathjax>$u_2 \in N(A)$</mathjax>) and apply <mathjax>$A$</mathjax> to this.</p>
<p>We've deviated a little. Started with tests for controllability and
observability; did adjoints, showed decomposition of domain and codomain
spaces when mapping through <mathjax>$A$</mathjax> and its adjoint, we showed how to break
down these spaces.</p>
<p>We can now relate these to our observability and controllability tests.</p>
<h2>Controllability</h2>
<p><mathjax>$\fn{\mathcal{L}_c}{\mathcal{U}_{[t_0, t_1]}}{\Re^n}$</mathjax>, <mathjax>$\mathcal{L}_c(u) =
\int_{t_0}^{t_1}\Phi(t_1, \tau)B(\tau)u(\tau)d\tau$</mathjax>. Recall that we just
computed that <mathjax>$\mathcal{L}_c^*(x) = B^*(\cdot)\Phi^*(t_1, \cdot)x$</mathjax>. Complete
controllability on this interval, using a proposition from above, is
equivalent to the following: <mathjax>$R(\mathcal{L}_c) = \Re^n =
R(\mathcal{L}_c\mathcal{L}_c^*)$</mathjax>. This is often easier to
compute. The composition of <mathjax>$\mathcal{L}_c$</mathjax> with <mathjax>$\mathcal{L}_c^*$</mathjax> yields
<mathjax>$W_c \defequals \int_{t_0}^{t_1}\Phi(t_1, \tau) B(\tau)B^*(\tau)\Phi^*(t_1, \tau) d\tau$</mathjax>,
which is an <mathjax>$n\times n$</mathjax> matrix. The system is completely controllable on
this interval iff <mathjax>$W_c$</mathjax> (the <strong>controllability grammian</strong>) has full rank.</p>
<p>If you look at the integrand, this matrix is a (symmetric) positive
semidefinite matrix (just write it in terms of the norm). If completely
controllable, it is positive definite.</p>
<p>Exercise (hint: look at the quadratic form <mathjax>$x^* W_c x$</mathjax>, and show, using
that quadratic form for any <mathjax>$x$</mathjax>, that it's positive semidefinite). Our
controllability test is that <mathjax>$\text{rank } W_c = n$</mathjax>.</p>
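<p>A quick sketch of computing <mathjax>$W_c$</mathjax> numerically for an LTI pair (my own example;
for LTI systems <mathjax>$\Phi(t_1, \tau) = e^{A(t_1 - \tau)}$</mathjax>, and a change of variables
gives the integral below):</p>
<pre><code># Finite-horizon controllability grammian W_c = int_0^T e^{As} B B^T e^{A^T s} ds,
# approximated by a Riemann sum, followed by the rank test.
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])

T, N = 1.0, 2000
W_c = np.zeros((2, 2))
for s in np.linspace(0.0, T, N):
    M = expm(A * s) @ B
    W_c += (M @ M.T) * (T / N)

print(np.linalg.matrix_rank(W_c))   # 2 = n: the pair (A, B) is completely controllable
</code></pre>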
<h2>Observability</h2>
<p>Recall this involved looking at the nullspace of the map. <mathjax>$\mathcal{L}_o^*
\mathcal{L}_o x_0 = \int_{t_0}^{t_1} \Phi^*(\tau, t_0) C^*(\tau)
C(\tau) \Phi(\tau, t_0) d\tau \; x_0$</mathjax>. Complete observability is equivalent to
this composition of maps having full rank. The matrix is called <mathjax>$W_o$</mathjax>, the
observability grammian. We've turned some fairly abstract tests into something
that's more concrete: a matrix that involves computing the state transition
matrix. The main theme is to get at simple tests for the time-invariant
case. Todo: show how these grammians simplify to very simple forms for
the time-invariant case.</p>
<p>Last five minutes: talk about the controller and observability maps, what
they mean in terms of grammians.</p>
<h2>Kalman filters; some thoughts about observability</h2>
<p>Kalman filter: the way we tend to reconstruct state from measurements given
our understanding of the system.</p>
<p>We started today by talking about the nullspace of the observability map
being trivial. So how do we solve for <mathjax>$x_0$</mathjax>? Use the left-inverse, if the
system is completely observable. Take the equation <mathjax>$y = \mathcal{L}_o x_0$</mathjax>
and pre-multiply both sides by <mathjax>$\mathcal{L}_o^*$</mathjax>. We know that this product
(<mathjax>$\mathcal{L}_o^*\mathcal{L}_o$</mathjax>) is bijective, so we can invert it. This is
not actually done in Kalman filters; they keep a running estimate.</p>
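<p>A discrete-time sketch of this left-inverse idea (mine; sampling is assumed so
the maps become finite matrices): recover <mathjax>$x_0$</mathjax> from samples <mathjax>$y_k = CA^k x_0$</mathjax>
by applying <mathjax>$(\mathcal{L}_o^*\mathcal{L}_o)^{-1}\mathcal{L}_o^*$</mathjax>:</p>
<pre><code># Stack C, CA, CA^2, ... (a discrete analogue of the observability map),
# then solve the normal equations for x0.
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
x0 = np.array([2.0, -1.0])

L_o = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(10)])
y = L_o @ x0                                       # noiseless measurements

x0_hat = np.linalg.solve(L_o.T @ L_o, L_o.T @ y)   # (L_o* L_o)^{-1} L_o* y
print(x0_hat)                                      # recovers [2, -1]
</code></pre>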
<p><a name='24'></a></p>
<h1>Controllability and Observability of LTI systems</h1>
<h2>November 15, 2012</h2>
<p>What does it mean in terms of controlling the state and observing the
system? Controllable and observable subspaces.</p>
<p>Recall: the injectivity of <mathjax>$\mathcal{L}_o$</mathjax> corresponds to observability,
which relates to the bijectivity of what we called the observability
grammian, <mathjax>$\mathcal{L}_o^*\mathcal{L}_o$</mathjax>, on our time interval. Suppose
that this is injective. How do we solve for the initial state? What we came
up with last time was finding the inverse of the grammian, i.e. <mathjax>$x_0 =
W_o^{-1} \int_{t_0}^{t_1}\Phi^*(\tau, t_0)C^*(\tau)y(\tau) d\tau$</mathjax>.</p>
<p>Suppose <mathjax>$y$</mathjax> is not in the range of <mathjax>$\mathcal{L}_o$</mathjax>, but <mathjax>$\mathcal{L}_o$</mathjax> is
injective. What is the "best" estimate of <mathjax>$x_0$</mathjax>? Claim: our computation
from before still gives us the best estimate. What do we mean by best?
We'll explore this.</p>
<p>We've got this map <mathjax>$\mathcal{L}_o$</mathjax>. We can write <mathjax>$y$</mathjax> as the sum of two
components, one in the range of <mathjax>$\mathcal{L}_o$</mathjax>, and the other
in the nullspace of <mathjax>$\mathcal{L}_o^*$</mathjax> (recall that we showed that these two
spaces are orthogonal). The estimate keeps only the first component, and
therefore minimizes the distance between the measured output and the output
our estimate of the initial state would produce.</p>
<p>Our third case is that <mathjax>$\mathcal{L}_o$</mathjax> is not injective, but it is
surjective. <mathjax>$\mathcal{L}_o\mathcal{L}_o^*$</mathjax> is bijective, therefore. Even
though this is a map between infinite-dimensional spaces, we can define an
inverse because it is bijective.</p>
<p>Our fourth case is that <mathjax>$\mathcal{L}_o$</mathjax> is neither injective nor
surjective. You can compute how you would find the least-squares estimate
or least norm.</p>
<p>This formulation leads us to the idea of gathering all of the information
about the input and output, and then compute exactly or estimate the
state. Typically, you update your best estimate with some online algorithm.</p>
<p>(derivation of recursive formula, which yields Kalman filter; information
form of the Kalman filter.)</p>
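<p>A sketch of that information-form idea (my own illustration, not the lecture's
derivation): rather than storing all measurements, keep the information matrix
<mathjax>$\Lambda_k = \sum_j C_j^T C_j$</mathjax> and vector <mathjax>$\eta_k = \sum_j C_j^T y_j$</mathjax>, folding
in each new sample as it arrives:</p>
<pre><code># Recursive (information-form) least-squares estimate of x0 from y_k = C A^k x0.
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
x0 = np.array([2.0, -1.0])

Lam = np.zeros((2, 2))       # information matrix
eta = np.zeros(2)            # information vector
Ak = np.eye(2)
for k in range(10):
    Ck = C @ Ak              # row block of the observability map at step k
    yk = Ck @ x0             # new measurement arrives
    Lam += Ck.T @ Ck         # Lambda <- Lambda + C_k^T C_k
    eta += Ck.T @ yk         # eta    <- eta    + C_k^T y_k
    Ak = A @ Ak

print(np.linalg.solve(Lam, eta))   # running estimate: recovers [2, -1]
</code></pre>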
<h2>LTI case</h2>
<p>Our representation is simply <mathjax>$[A,B,C,D]$</mathjax>. Let's start with
controllability. </p>
<p><a name='25'></a></p>
<h1>Controllability and Observability of LTI systems</h1>
<h2>November 20, 2012</h2>
<p><mathjax>$\{A,B,C,D\}$</mathjax> is completely observable (note that again, the time interval is
irrelevant for LTI systems because the integrand of the grammian does not depend on <mathjax>$t$</mathjax>)
<mathjax>$\iff$</mathjax> rank of <mathjax>$\begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1}\end{bmatrix}
= n$</mathjax> <mathjax>$\iff$</mathjax> rank <mathjax>$\begin{bmatrix} sI - A \\ C \end{bmatrix} = n \;\forall s
\in \sigma(A)$</mathjax>.</p>
<p>Same proofs as controllability theorem: instead of considering <mathjax>$\{A, B\}$</mathjax>,
consider matrix pair <mathjax>$\{A^T, C^T\}$</mathjax>.</p>
<p>We call <mathjax>$Q = \begin{bmatrix}B &amp; AB &amp; \cdots &amp; A^{n-1} B \end{bmatrix}$</mathjax> the
controllability matrix; we might also use <mathjax>$\mathcal{C}$</mathjax>. Similarly, we call
<mathjax>$\mathcal{O} = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1}\end{bmatrix}$</mathjax>
the observability matrix.</p>
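<p>A quick numerical version of these matrix tests (my own helper functions,
analogous to MATLAB's <code>ctrb</code> / <code>obsv</code>):</p>
<pre><code># Build the controllability and observability matrices and check their ranks.
import numpy as np

def ctrb(A, B):
    n = A.shape[0]
    return np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])

def obsv(A, C):
    n = A.shape[0]
    return np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

print(np.linalg.matrix_rank(ctrb(A, B)) == 2)   # True: completely controllable
print(np.linalg.matrix_rank(obsv(A, C)) == 2)   # True: completely observable
</code></pre>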
<p>We can easily prove the following, but we'll state it as a theorem: the
range of the controllability grammian (it does not matter which time
interval, so again we'll consider <mathjax>$[0, \Delta]$</mathjax>) is equal to the range of
the controllability map, which is further equal to the range of the
controllability matrix. We haven't shown this directly; however, it is a
direct extension of the proofs of the theorems about the controllability
matrix, via the Cayley-Hamilton theorem.</p>
<p>Similarly, the nullspace of the observability grammian is equal to the
nullspace of the observability map (already shown), but that's equal to the
nullspace of the observability matrix. That's in general some subset of
<mathjax>$\Re^n$</mathjax>. If the system is completely observable, the nullspace of these
maps is trivial.</p>
<p>The range of the controllability matrix is the set of all states reachable
from <mathjax>$\theta$</mathjax>; The nullspace of the observability matrix is the set of
unobservable states.</p>
<p>A few definitions:</p>
<ul>
<li>
<p><mathjax>$R$</mathjax> is stabilizable if all of its uncontrollable states are stable.</p>
</li>
<li>
<p><mathjax>$R$</mathjax> is detectable if all of its unobservable states are stable.</p>
</li>
<li>
<p><mathjax>$R$</mathjax> is stabilizable <mathjax>$\iff \text{rk}\begin{bmatrix} sI - A &amp; B \end{bmatrix} =
n \forall s \in \sigma(A) \cap \mathbb{C}_+$</mathjax>.</p>
</li>
<li>
<p><mathjax>$R$</mathjax> is detectable <mathjax>$\iff \text{rk}\begin{bmatrix} sI - A \\ C
\end{bmatrix} = n \forall s \in \sigma(A) \cap \mathbb{C}_+$</mathjax>.</p>
</li>
</ul>
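<p>A sketch of the stabilizability test above in code (mine; the detectability
test is the same with <mathjax>$\begin{bmatrix} sI - A \\ C \end{bmatrix}$</mathjax>):</p>
<pre><code># Check rank [sI - A, B] = n at every eigenvalue of A in the closed RHP.
import numpy as np

def is_stabilizable(A, B, tol=1e-9):
    n = A.shape[0]
    for s in np.linalg.eigvals(A):
        if s.real &gt;= -tol:                       # s in the closed right half plane
            M = np.hstack([s * np.eye(n) - A, B])
            if np.linalg.matrix_rank(M, tol) &lt; n:
                return False
    return True

A = np.array([[1.0, 0.0], [0.0, -1.0]])              # unstable mode at s = 1
print(is_stabilizable(A, np.array([[1.0], [0.0]])))  # True: the input reaches it
print(is_stabilizable(A, np.array([[0.0], [1.0]])))  # False: unstable mode uncontrollable
</code></pre>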
<h2>Intro to control design via pole placement</h2>
<p>Using state feedback, when can we place the eigenvalues of the closed-loop
system anywhere in <mathjax>$\mathbb{C}$</mathjax> (in conjugate pairs, so that the gain is
real)? The system must be controllable.</p>
<p><a name='26'></a></p>
<h1>Control design using eigenvalue placement</h1>
<h2>November 27, 2012</h2>
<p>Observer design. Final exam Tues Dec. 11, 9-12, 293 Cory.</p>
<p>Theorem: Let <mathjax>$(A, b)$</mathjax> be completely controllable. Let <mathjax>$\hat{\pi}(s)$</mathjax> be any
monic polynomial of degree <mathjax>$n$</mathjax> with real coefficients. Then <mathjax>$\exists! f^T
\in \Re^{1 \times n}$</mathjax> such that the characteristic polynomial
<mathjax>$\hat{\chi}_{A + bf^T}(s) = \hat{\pi}(s)$</mathjax>, where <mathjax>$f^T =
-\begin{bmatrix} 0 &amp; 0 &amp; \cdots &amp; 1\end{bmatrix}\begin{bmatrix}b &amp; Ab &amp; \cdots &amp;
A^{n-1} b\end{bmatrix}^{-1} \hat{\pi}(A)$</mathjax> (note the inverse of the
controllability matrix, which exists by complete controllability).</p>
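<p>A numerical check of the formula on a small example (my own; place the
eigenvalues of <mathjax>$A + bf^T$</mathjax> at <mathjax>$\{-2, -3\}$</mathjax>, i.e. <mathjax>$\hat{\pi}(s) = s^2 + 5s + 6$</mathjax>):</p>
<pre><code># Ackermann's formula for the double integrator.
import numpy as np

A = np.array([[0.0, 1.0], [0.0, 0.0]])
b = np.array([[0.0], [1.0]])

Qc = np.hstack([b, A @ b])                # controllability matrix [b, Ab]
pi_A = A @ A + 5 * A + 6 * np.eye(2)      # pi(A) = A^2 + 5A + 6I
fT = -np.array([[0.0, 1.0]]) @ np.linalg.inv(Qc) @ pi_A

print(fT)                                 # [[-6., -5.]]
print(np.linalg.eigvals(A + b @ fT))      # -2, -3, as requested
</code></pre>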
<p>Proposition: Complete controllability is equivalent to saying <mathjax>$\exists T
\in \Re^{n\times n}$</mathjax>, nonsingular, s.t. <mathjax>$\tilde{A} = T^{-1}AT = \begin{bmatrix}0 &amp; 1 &amp; 0
&amp; \cdots &amp; 0\\ 0 &amp; 0 &amp; 1 &amp; \cdots &amp; 0\\ \vdots &amp; &amp; &amp; \ddots &amp; \vdots \\
-\alpha_n &amp; -\alpha_{n-1} &amp; \cdots &amp; \cdots &amp; -\alpha_1\end{bmatrix}$</mathjax>.</p>
<p>This result is basically a change of basis; we define a new state <mathjax>$z =
T^{-1}x$</mathjax>. That's why we're interested in this particular similarity
transform.</p>
<p>Recall what it means to be completely controllable. For LTI systems, we
have our tests for controllability, which are easy to check.</p>
<p>Controllable canonical form. Canonical means that once you take a general
matrix and transform it into this form, the coefficients are always the same. In
fact, as we saw last time, these are the coefficients of the characteristic
polynomial.</p>
<p><a name='27'></a></p>
<h1>Multiple-input, multiple output; Observer design</h1>
<h2>November 29, 2012</h2>
<p>The closed-loop matrix is the same; we always have the same structure:
we're feeding back an input equal to <mathjax>$Fx + z$</mathjax>.</p>
<p>The algorithm in LN 21 shows how to simplify the problem by considering it as a
single-input system. It basically asks: does there exist a
feedback gain matrix <mathjax>$F$</mathjax> and a vector <mathjax>$b \in R(B)$</mathjax> such that <mathjax>$(A + BF, b)$</mathjax> is
completely controllable?</p>
<p>State estimation: whether or not <mathjax>$\hat{z}$</mathjax> is a good estimate of the
state. Measure the error, which tells you how good your estimate is. Luenberger
observer.</p>
<p>Theorem: if <mathjax>$A, C$</mathjax> is completely observable, <mathjax>$\exists T \in \Re^{n \times
n_o}$</mathjax> such that <mathjax>$\sigma(A - TC)$</mathjax> can be placed arbitrarily in <mathjax>$\mathbb{C}$</mathjax>.</p>
<p>We can think about this problem by relating it back to the complete
controllability and placement of spectrum of <mathjax>$A - BF$</mathjax>.</p>
<p>If we look at the transpose of <mathjax>$A - TC$</mathjax> (eigenvalues are preserved under
transposition), then we're back to the complete controllability case.</p>
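<p>A sketch of this duality in code (mine, using SciPy's pole-placement routine):
design the observer gain by placing the spectrum of <mathjax>$A^T - C^T K$</mathjax> and taking
<mathjax>$T = K^T$</mathjax>:</p>
<pre><code># sigma(A - TC) = sigma(A^T - C^T T^T), so do pole placement on (A^T, C^T).
import numpy as np
from scipy.signal import place_poles

A = np.array([[0.0, 1.0], [0.0, 0.0]])
C = np.array([[1.0, 0.0]])

res = place_poles(A.T, C.T, [-4.0, -5.0])   # gain K with sigma(A^T - C^T K) = {-4, -5}
T = res.gain_matrix.T                       # observer gain

print(np.linalg.eigvals(A - T @ C))         # -4, -5
</code></pre>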
<p>Don't typically move the observer eigenvalues much farther left than
needed: that stipulates a large gain matrix, meaning large gain in the
feedback loop, which is typically undesirable.</p>
<p>We do want the observer error to converge faster than the dynamics we are
trying to estimate.</p></div><div class='pos'></div>
<script src='mathjax/unpacked/MathJax.js?config=default'></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Register.StartupHook("TeX Jax Ready",function () {
var TEX = MathJax.InputJax.TeX;
var PREFILTER = TEX.prefilterMath;
TEX.Augment({
prefilterMath: function (math,displaymode,script) {
math = "\\displaystyle{"+math+"}";
return PREFILTER.call(TEX,math,displaymode,script);
}
});
});
var a = document.getElementsByTagName('a'),
ll = a.length;
if (ll > 0) {
var div = document.getElementsByClassName('pos')[0];
div.style.float = 'right';
div.style.position = 'fixed';
div.style.background = '#FFF';
div.style.fontSize = '90%';
div.style.top = '10%';
div.style.right = '5%';
div.style.width = '15%';
for (var i = 0; i < ll; i++) {
div.innerHTML += '<a href="\#' + a[i].name + '">'
+ a[i].parentElement.nextElementSibling
.nextElementSibling.innerHTML
+ '</a><br />';
}
var div = document.getElementsByClassName('wrapper')[0];
div.style.width = '80%';
}
</script></div>