<p><a name='1'></a></p>
<h1>EE 221A: Linear System Theory</h1>
<h2>August 23, 2012</h2>
<h2>Administrivia</h2>
<p>Prof. Claire Tomlin (tomlin@eecs). 721 Sutardja Dai Hall. Somewhat
tentative office hours on schedule: T 1-2, W 11-12.
http://inst.eecs.berkeley.edu/~ee221a</p>
<p>GSI: Insoon Yang (iyang@eecs). Insoon's office hours: M 1:30 - 2:30, Th
11-12.</p>
<p>Homeworks typically due on Thursday or Friday.</p>
<h2>Intro</h2>
<p>Bird's eye view of modeling in engineering + design vs. in science.</p>
<p>"Science":</p>
<p><mathjax>$$
\mbox{Dynamical system}
\rightarrow \mbox{experiments}
\leftrightarrow \mbox{Model}
$$</mathjax></p>
<p>"Engineering":</p>
<p><mathjax>$$
\mbox{dynamical system}
\rightarrow \mbox{experiments}
\leftrightarrow \mbox{Model}
\rightarrow \mbox{control}
$$</mathjax></p>
<p>Control validation, verification, testing.</p>
<p>Broad brush of a couple of concepts: modeling. We're going to spend a lot
of time talking about modeling in this course.</p>
<p>First: for any dynamical system, there's an infinite number of models you
could design, depending on your level of abstraction. Typically, you choose
level of abstraction based on use case. Often, only able to use certain
kinds of experiments. e.g. probing of protein concentration levels. If
you're able to measure just this, then the signals in your model should
have something to do with these concentration levels.</p>
<p>As we said, the same physical system can have many different models. Another
example: a MEMS device. Can think about models at various different
levels, e.g. an electrical model: silicon / electrostatics of the
system. Might be interested in manipulation of the device.</p>
<p>Alt: mechanical model (could have a free-body diagram, e.g.).</p>
<p>Another example: Hubble telescope. Could think of orbital
dynamics. Individual rigid body dynamics. Or properties of the telescope,
the individual optical models of the mirrors and their interactions. The
idea here is to just realize that the word model can mean very different
things. The logical model to use depends on the task at hand. The main
point: a basic tenet of engineering: the value of a model: choose the
simplest model that answers the questions you are asking about the
system. The simplest model that will allow you to predict something that
you didn't build into it.</p>
<p>Predict IO relations that you didn't explicitly design the model on. One of
the properties of a good linear model for a system: it obeys linearity, so
if you form a basis for your domain, then you have the system response to
any input spanned by this basis. Probably the most important thing to take
away from this course: linearity is a <em>very</em> strong principle that allows
us to build up a set of tools.</p>
<h2>Time</h2>
<p>We have this term a "dynamical system". A key part is that it changes with
time, responding with behavior over time. Time will turn out to be quite
important. Depending on how we model time, we can come up with different
models. We call time (t) a privileged variable because it has certain
properties. Namely, when we think about time, we think about time marching
forward (unidirectionality of evolution). Different models: continuous time
(<mathjax>$t \in \Re$</mathjax>, could be negative, could go backwards, if we are interested
in backwards evolution), or discrete time <mathjax>$t \in \{nT, n \in \mathbb{Z}\}$</mathjax>,
where <mathjax>$T$</mathjax> is some sampling time. So in that sense, discrete time, we have
some set. We can also come up with more complicated models of time, like
discrete-time asynchronous. The previous model was some constant period
<mathjax>$T$</mathjax>. In DT asynchronous, we just have a set of points in time. Now becoming
a more important model now with asynchronous processes (react to events
that are going to happen at previously undefined points in time).</p>
<h2>Linear vs. nonlinear models</h2>
<p>More on this later. Suppose we could take the system, and we could
represent it as being in one of a number of states.</p>
<p>First: suppose a finite number of states (so can be modeled by a FSM),
which represent some configuration of the system. State space represents
states system can be in at any point in time. If state space is finite, we
can use a finite-state automaton. Each state has an output (prints out a
message, or a measurement is taken), and we also consider inputs. The
inputs are used to evolve the dynamic system. Input affects a
transition. We can build up the dynamics of the system by just defining the
transition function.</p>
<p>Packet transmitting node: first state is "ready-to-send"; second state is
"send packet &amp; wait"; and the third state is "flush buffer". If buffer
empty, stay in <mathjax>$q_1$</mathjax>. If not empty, transitions to <mathjax>$q_2$</mathjax>. If ACK received,
then transition to <mathjax>$q_3$</mathjax> and return to <mathjax>$q_1$</mathjax>. If <mathjax>$T$</mathjax> time units elapse, we
time out and transition directly to <mathjax>$q_1$</mathjax>. Here, no notion of linear or
nonlinear systems. To be able to talk about linear or nonlinear models, we
need to be able to put some vector space structure on these three
elements. The system must then satisfy superposition. (A sketch of this node
as a transition function follows.)</p>
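<p>(Aside, not from lecture: a minimal Python sketch of the packet-transmitting
node as a transition function. The state and event names are ours, purely
illustrative.)</p>
<pre><code>transitions = {
    ("q1", "buffer_nonempty"): "q2",  # ready-to-send -&gt; send packet &amp; wait
    ("q1", "buffer_empty"):    "q1",  # nothing to send; stay put
    ("q2", "ack_received"):    "q3",  # send &amp; wait -&gt; flush buffer
    ("q2", "timeout"):         "q1",  # T time units elapse: back to start
    ("q3", "flushed"):         "q1",  # flush buffer -&gt; ready-to-send
}

def step(state, event):
    """Evolve the machine one step; undefined (state, event) pairs do nothing."""
    return transitions.get((state, event), state)

state = "q1"
for event in ["buffer_nonempty", "timeout", "buffer_nonempty",
              "ack_received", "flushed"]:
    state = step(state, event)
print(state)  # "q1"
</code></pre>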
<p>Back to abstract dynamical system (thing we could never hope to model
perfectly): rather than thinking about a set of rules, we're going to think
about a mathematical model. Three classes: CT, DT [synchronous], and
discrete-state (typically finite). Within each of these classes we can
further break each down. For the first two, we can consider linearity, and
we can further break these down into time-varying (TV) and time-invariant
(TI). This course is going to focus just on the linear systems in
continuous and discrete time, both time-varying and time-invariant. We'll
use differential equation models in continuous time and difference equation
models in discrete time. We usually develop in continuous-time and show
analogies in discrete-time.</p>
<h2>Analysis and Control</h2>
<p>Control is pervasive. If you go to any of the control conferences, you see
areas where techniques from this course are applied. Modern control came
about because of aerospace in the 50s. e.g. autopilot, air traffic
control. There the system itself is the system of aircraft. Chemical
process control. Mechatronics, MEMS, robotics. Novel ways to automate
things that hadn't been automated previously, mostly because of a
renaissance in sensing. Power systems. Network control systems: how you
combine models of the system itself with the control models. Quantum
chemistry. Typically, when we think about state spaces, we think about the
state as a vector in <mathjax>$\Re^n$</mathjax>. In many cases, you want to think about the
state spaces as more complicated (e.g. <mathjax>$C^\infty$</mathjax>, the class of smooth
functions).</p>
<h2>Difference between verification, simulation, and validation</h2>
<p>One of the additional basic tenets of this course: if you have a model of
the system, and you can analytically verify that the model behaves in given
ways for ranges of initial conditions, then that is a very valuable thing
to have: you have a proof that as long as the system adheres to the model,
then your model will work as expected. Simulation gives you system behavior
for a certain set of parameters. Very different, but they complement each
other. Analyze simpler models, simulate more complex models.</p>
<h2>Linear Algebra</h2>
<p>Functions and their properties.</p>
<p>Fields, vector spaces, properties and subspaces.</p>
<p>(note regarding notation: <mathjax>$\Re^+$</mathjax> means non-negative reals, as does
<mathjax>$\mathbb{C}_+$</mathjax> (non-negative real part))</p>
<p><mathjax>$\exists!$</mathjax>: exists a unique, <mathjax>$\exists?$</mathjax>: does there exist, <mathjax>$\ni$</mathjax>:
such that.</p>
<p>Cartesian product: <mathjax>$\{(x,y) \vert x \in X \land y \in Y\}$</mathjax> (set of ordered
n-tuples)</p>
<p><a name='2'></a></p>
<h1>Functions and Vector Spaces</h1>
<h2>August 28, 2012</h2>
<p>OH: M/W 5-6, 258 Cory</p>
<p>Today: beginning of the course: review of lin. alg topics needed for the
course. We're going to go through lecture notes 2 and probably start on
the third set of notes. Will bring copies of 3 and 4 on Thursday.</p>
<p>We did an introduction to notation and topics last time. First topic:
functions, which will be used synonymously with "maps". Terminology will be
used interchangeably.</p>
<p>Given two sets of elements X, Y, we defined <mathjax>$\fn{f}{X}{Y}$</mathjax>. Notion of
range vs. codomain (range is merely the subset of the codomain covered by
f). We define <mathjax>$f(X) \defequals \set{f(x)}{x \in X}$</mathjax> to be the
range.</p>
<h2>Properties of functions</h2>
<p>Injectivity of functions ("one-to-one"). A function <mathjax>$f$</mathjax> is said to be
injective iff the function maps each x in X to a distinct y in
Y. Equivalently, <mathjax>$f(x_1) = f(x_2) \iff x_1 = x_2$</mathjax>. This is also equivalent
to <mathjax>$x_1 \neq x_2 \iff f(x_1) \neq f(x_2)$</mathjax>.</p>
<p>Surjectivity of functions ("onto"). A function <mathjax>$f$</mathjax> is said to be surjective
if the codomain is equal to the range. Basically, the map <mathjax>$f$</mathjax> covers the
entire codomain. A way to write this formally is that <mathjax>$f$</mathjax> is surjective iff
<mathjax>$\forall y \in Y \exists x \in X \ni y = f(x)$</mathjax>.</p>
<p>And then a map <mathjax>$f$</mathjax> is bijective iff it is both injective and surjective. We
can write this formally as there being a unique <mathjax>$x \in X$</mathjax> for each <mathjax>$y \in Y$</mathjax>
with <mathjax>$f(x) = y$</mathjax>.</p>
<p>Example: inverse of a map. We can talk about left and right inverses of
maps. Suppose we have a map <mathjax>$\fn{f}{X}{Y}$</mathjax>. We're going to define this
map <mathjax>$\mathbb{1}_X$</mathjax> as the identity map on X. Namely, application of this
map to any <mathjax>$x \in X$</mathjax> will yield the same <mathjax>$x$</mathjax>.</p>
<p>The left inverse of <mathjax>$f$</mathjax> is <mathjax>$\fn{g_L}{Y}{X}$</mathjax> such that <mathjax>$g_L \circ f =
\mathbb{1}_X$</mathjax>. In other words, <mathjax>$\forall x\in X, (g_L \circ f)(x) = x$</mathjax>.</p>
<p>Prove: <mathjax>$f$</mathjax> has a left inverse <mathjax>$g_L$</mathjax> iff <mathjax>$f$</mathjax> is injective. First of all, let
us prove the backwards implication. Assume <mathjax>$f$</mathjax> is injective. Prove that
<mathjax>$g_L$</mathjax> exists. We're going to construct the map <mathjax>$\fn{g_L}{Y}{X}$</mathjax> as
<mathjax>$g_L(f(x)) = x$</mathjax>, where the domain here is the range of <mathjax>$f$</mathjax>. In order for
this to be a well-defined function, we require that <mathjax>$x$</mathjax> is unique, which is
met by injectivity of <mathjax>$f$</mathjax>.</p>
<p>Now let us prove the forward implication. Assume that this left inverse
<mathjax>$g_L$</mathjax> exists. By definition, <mathjax>$g_L \circ f = \mathbb{1}_X \iff \forall x
\in X, g_L(f(x)) = x$</mathjax>. If <mathjax>$f$</mathjax> were not injective, then <mathjax>$g_L$</mathjax> would not be
well-defined (<mathjax>$\exists x_1 \neq x_2$</mathjax> such that <mathjax>$f(x_1) = f(x_2)$</mathjax>, and so
<mathjax>$g_L$</mathjax> is no longer a function).</p>
<p>review: contrapositive: <mathjax>$(A \implies B) \iff (\lnot B \implies \lnot A)$</mathjax>;
contradiction: assume <mathjax>$A \land \lnot B$</mathjax> and derive a contradiction.</p>
<p>We can similarly show surjectivity <mathjax>$\iff$</mathjax> existence of a right
inverse. With these two, we can then trivially show that bijectivity <mathjax>$\iff$</mathjax>
existence of an inverse (rather, both a left and right inverse, which we
can easily show must be equal). Proof will likely be part of the first
homework assignment.</p>
<h2>Fields</h2>
<p>We need the definition of a vector and a field in order to define a vector
space.</p>
<p>A field is an object: a set of elements <mathjax>$S$</mathjax> with two closed binary
operations defined upon <mathjax>$S$</mathjax>. These two operations are addition (which
forms an abelian group over <mathjax>$S$</mathjax>) and multiplication (which forms an abelian
group over <mathjax>$S - \{0\}$</mathjax>) such that multiplication distributes over
addition. Note that convention dictates <mathjax>$0$</mathjax> to be the additive identity and
<mathjax>$1$</mathjax> to be the multiplicative identity.</p>
<p>Other silly proofs include showing that if both a left and right identity
exist, they must be equivalent, or that multiplication by <mathjax>$0$</mathjax> maps any
element to <mathjax>$0$</mathjax>.</p>
<h2>Vector spaces (linear spaces)</h2>
<p>A vector space is a set of vectors V and a field of scalars <mathjax>$\mathbb{F}$</mathjax>,
combined with vector addition and scalar multiplication. Vector addition
forms an abelian group, but this time, scalar multiplication has the
properties of a monoid (existence of an identity and associativity). We
then have the distributive laws <mathjax>$(\alpha + \beta)x = \alpha x + \beta x$</mathjax> and
<mathjax>$\alpha (x + y) = \alpha x + \alpha y$</mathjax>.</p>
<h2>Function spaces</h2>
<p>We define a space <mathjax>$F(D,V)$</mathjax>, where <mathjax>$(V, \mathbb{F})$</mathjax> is a vector space and
<mathjax>$D$</mathjax> is a set: <mathjax>$F(D, V)$</mathjax> is the set of all functions <mathjax>$\fn{f}{D}{V}$</mathjax>. Is
<mathjax>$(F, \mathbb{F})$</mathjax> a vector space (yes) where vector addition is pointwise
addition of functions and scalar multiplication is pointwise multiplication
by a scalar?</p>
<p>Examples of this: space of continuous functions on the closed interval
<mathjax>$\fn{\mathcal{C}}{\bracks{t_0, t_1}}{\Re^n}$</mathjax>, (<mathjax>$(C(\bracks{t_0, t_1},
\Re^n), \Re)$</mathjax>). This is indeed a vector space.</p>
<h2>Lebesgue spaces</h2>
<p><mathjax>$L_p[t_0, t_1] = \set{\fn{f}{[t_0, t_1]}{\Re}}{\int_{t_0}^{t_1}
\abs{f(t)}^p dt &lt; \infty}$</mathjax>.</p>
<p>We can then talk about <mathjax>$\ell_p$</mathjax>, which are spaces of sequences. <mathjax>$\ell_2$</mathjax> is
the space of square-summable sequences of real numbers. Informally, <mathjax>$\ell_2
= \set{v = \{v_1, v_2, ..., v_k, ...\}}{v_k \in \Re, \sum_k \abs{v_k}^2 &lt;
\infty}$</mathjax>.</p>
<p>In general, when looking at vector spaces, often we use <mathjax>$\mathbb{F} = \Re$</mathjax>,
and we refer to the space as simply <mathjax>$V$</mathjax>.</p>
<p>Next: subspaces, bases, linear dependence/independence, linearity. One of
the main things we're going to do is look at properties of linear functions
and representation as multiplication by matrices.</p>
<p><a name='3'></a></p>
<h1>Vector Spaces and Linearity</h1>
<h2>August 30, 2012</h2>
<h2>From last time</h2>
<p>Subspaces, bases, linear dependence/independence, linearity. One of the
main things we're going to do is look at properties of linear functions and
representation as multiplication by matrices.</p>
<h2>Example (of a vector space)</h2>
<p><mathjax>$\ell_2 = \{v = \{v_1, v_2, ...\} \st \sum_{i=1}^\infty \abs{v_i}^2 &lt;
\infty, v_i \in \Re \}$</mathjax></p>
<p>Vector addition and scalar multiplication? ("pointwise" addition,
multiplication by reals)</p>
<h2>What is a vector subspace?</h2>
<p>Consider vector space <mathjax>$(V, \mathbb{F})$</mathjax>. Consider a subset W of V combined
with the same field. <mathjax>$(W, \mathbb{F})$</mathjax> is a subspace of <mathjax>$(V, \mathbb{F})$</mathjax>
if it is closed under vector addition and scalar multiplication (formally,
this must be a vector space in its own right, but these are the only
vector space properties that we need to check).</p>
<p>Consider vectors from <mathjax>$\Re^n$</mathjax>. A plane (in <mathjax>$\Re^3$</mathjax>) is a subspace of
<mathjax>$\Re^3$</mathjax> if it contains the origin.</p>
<p>Aside: for <mathjax>$x \in V$</mathjax>, <strong>span</strong><mathjax>$(x) = \set{\alpha x}{\alpha \in \mathbb{F}}$</mathjax>.</p>
<h2>Linear dependence, linear independence.</h2>
<p>Consider a set of <mathjax>$p$</mathjax> vectors <mathjax>$\{v_1, v_2, ..., v_p\}, v_i \in V$</mathjax>. This set
of vectors is said to be a <strong>linearly independent set</strong> iff the
homogeneous equation has only the trivial solution, i.e. <mathjax>$\sum_i \alpha_i v_i = 0 \implies \forall
i, \alpha_i = 0$</mathjax>. This is equivalent to saying that no one vector can be
written as a linear combination of the others.</p>
<p>Otherwise, the set is said to be <strong>linearly dependent</strong>.</p>
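<p>(Aside, not from lecture: a quick numerical check of linear independence,
assuming numpy is available. A set of <mathjax>$p$</mathjax> vectors in <mathjax>$\Re^n$</mathjax> is linearly
independent iff the <mathjax>$n \times p$</mathjax> matrix with those vectors as columns has
rank <mathjax>$p$</mathjax>.)</p>
<pre><code>import numpy as np

def is_independent(vectors):
    # Stack the vectors as columns; independence means full column rank.
    V = np.column_stack(vectors)
    return np.linalg.matrix_rank(V) == V.shape[1]

v1, v2 = np.array([1., 0., 1.]), np.array([0., 1., 1.])
print(is_independent([v1, v2]))           # True
print(is_independent([v1, v2, v1 + v2]))  # False: third is a combination
</code></pre>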
<h2>Bases</h2>
<p>Recall: a set of vectors <mathjax>$W$</mathjax> is said to span a space <mathjax>$(V, \mathbb{F})$</mathjax> if
any vector in the space can be written as a <strong>linear combination</strong> of
vectors in the set, i.e. <mathjax>$\forall v \in V, \exists \set{(\alpha_i,
w_i)}{v = \sum \alpha_i w_i}$</mathjax> for <mathjax>$w_i \in W, \alpha_i \in \mathbb{F}$</mathjax>.</p>
<p>W is a <strong>basis</strong> iff it is also linearly independent.</p>
<h2>Coordinates</h2>
<p>Given a basis <mathjax>$B$</mathjax> of a space <mathjax>$(V, \mathbb{F})$</mathjax>, there is a unique
representation (trivial proof) of every <mathjax>$v \in V$</mathjax> as a linear combination
of elements of <mathjax>$B$</mathjax>. We define our <strong>coordinates</strong> to be the coefficients
that appear in this unique representation. This is often written as the
<strong>coordinate vector</strong></p>
<p><mathjax>$$\alpha = \begin{bmatrix}\alpha_1 \\ \vdots \\ \alpha_n \end{bmatrix}$$</mathjax></p>
<p>Basis is not uniquely defined, but what is constant is the number of
elements in the basis. This number is the <strong>dimension</strong> (rank) of the
space. Another notion is that a basis generates the corresponding space,
since once you have a basis, you can acquire any element in the space.</p>
<h2>Linearity</h2>
<p>A function <mathjax>$\fn{f}{(V, \mathbb{F})}{(W, \mathbb{F})}$</mathjax> (note that these
spaces are defined over the same field!) is <strong>linear</strong> iff <mathjax>$f(\alpha_1 v_1 +
\alpha_2 v_2) = \alpha_1 f(v_1) + \alpha_2 f(v_2)$</mathjax>.</p>
<p>This property is known as <strong>superposition</strong>, which is an amazing property,
because if you know what this function does to the basis elements of a
vector space, then you know what it does to <em>any</em> element in the space.</p>
<p>An interesting corollary is that a linear map will <em>always</em> map the zero
vector to itself.</p>
<h2>Definitions associated with linear maps</h2>
<p>Suppose we have a linear map <mathjax>$\fn{\mathcal{A}}{U}{V}$</mathjax>. The <strong>range
(image)</strong> of <mathjax>$\mathcal{A}$</mathjax> is defined to be <mathjax>$R(\mathcal{A}) = \set{v}{v =
\mathcal{A}(u), u \in U} \subset V$</mathjax>. The <strong>nullspace (kernel)</strong> of <mathjax>$\mathcal{A}$</mathjax> is
defined to be <mathjax>$N(\mathcal{A}) = \set{u}{\mathcal{A}(u) = 0} \subset U$</mathjax>. Also
trivial (from definition of linearity) to prove that these are subspaces.</p>
<p>We have a couple of very important properties now that we've defined range
and nullspace.</p>
<h2>Properties of linear maps <mathjax>$\fn{\mathcal{A}}{U}{V}$</mathjax></h2>
<p><mathjax>$$(b \in V) \implies (\mathcal{A}(u) = b \iff b \in R(\mathcal{A}))$$</mathjax></p>
<p><mathjax>$$(b \in R(\mathcal{A})) \implies (\exists!\ u \st \mathcal{A}(u) = b
\iff N(\mathcal{A}) = \{0\})$$</mathjax></p>
<p>(if the nullspace only contains the zero vector, we say it is <strong>trivial</strong>)</p>
<p><mathjax>$$\mathcal{A}(x_0) = \mathcal{A}(x_1) \iff x_1 - x_0 \in N(\mathcal{A})$$</mathjax></p>
<p><a name='4'></a></p>
<h1>Matrix Representation of Linear Maps</h1>
<h2>September 4, 2012</h2>
<h2>Today</h2>
<p>Matrix multiplication as a representation of a linear map; change of basis
-- what happens to matrices; norms; inner products. We may get to adjoints
today.</p>
<p>Last time, we talked about the concept of the range and the nullspace of a
linear map, and we ended with a relationship that related properties of the
nullspace to properties of the linear equation <mathjax>$\mathcal{A}(x) = b$</mathjax>. As
we've written here, this is not <em>matrix</em> multiplication. As we'll see
today, it can be represented as matrix multiplication, in which case, we'll
write this as <mathjax>$Ax = b$</mathjax>.</p>
<p>There's one more important result, called the rank-nullity theorem. We
defined the range and nullspace of a linear operator. We also showed that
these are subspaces (range of codomain; nullspace of domain). We call
<mathjax>$\text{dim}(R(\mathcal{A})) = \text{rank}(\mathcal{A})$</mathjax> and
<mathjax>$\text{dim}(N(\mathcal{A})) = \text{nullity}(\mathcal{A})$</mathjax>. Taking the
dimension of the domain as <mathjax>$n$</mathjax> and the dimension of the codomain as <mathjax>$m$</mathjax>,
<mathjax>$\text{rank}(\mathcal{A}) + \text{nullity}(\mathcal{A}) = n$</mathjax>. Left as an
exercise. Hints: choose a basis for the nullspace. Presumably you'd extend
it to a basis for the domain (without loss of generality, because any set
of <mathjax>$n$</mathjax> linearly independent vectors will form a basis). Then consider how
these relate to the range of <mathjax>$\mathcal{A}$</mathjax>. Then map <mathjax>$\mathcal{A}$</mathjax> over
this basis.</p>
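<p>(Aside, not from lecture: a numerical illustration of rank-nullity, assuming
numpy and scipy are available.)</p>
<pre><code>import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
m, n = 4, 6
A = rng.standard_normal((m, n))        # a generic map from R^6 to R^4

rank = np.linalg.matrix_rank(A)        # dim R(A)
nullity = null_space(A).shape[1]       # dim N(A), via an orthonormal basis
print(rank, nullity, rank + nullity == n)  # 4 2 True
</code></pre>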
<h2>Matrix representation</h2>
<p><strong>Any linear map between finite-dimensional vector spaces can be
represented as matrix multiplication.</strong> We're going to show that it's true
via construction.</p>
<p><mathjax>$\fn{\mathcal{A}}{U}{V}$</mathjax>. We're going to choose bases for the domain and
codomain. <mathjax>$\forall x \in U, x = \sum_{j=1}^n \xi_j u_j$</mathjax>. Now consider
<mathjax>$\mathcal{A}(x) = \mathcal{A}(\sum_{j=1}^n \xi_j u_j) = \sum_{j=1}^n \xi_j
\mathcal{A}(u_j)$</mathjax> (through linearity). Each <mathjax>$\mathcal{A}(u_j) =
\sum_{i=1}^m a_{ij} v_i$</mathjax>. Uniqueness of <mathjax>$a_{ij}$</mathjax> and <mathjax>$\xi_j$</mathjax> follows from
writing the vector spaces in terms of a basis.</p>
<p><mathjax>$$
\mathcal{A}(x) = \sum_{j=1}^n \xi_j \sum_{i=1}^m a_{ij} v_i
\\ = \sum_{i=1}^m \left(\sum_{j=1}^n a_{ij} \xi_j\right) v_i
\\ = \sum_{i=1}^m \eta_i v_i
$$</mathjax></p>
<p>Uniqueness of representation tells me that <mathjax>$\eta_i \equiv \sum_{j=1}^n
a_{ij} \xi_j$</mathjax>. We've got <mathjax>$i = \{1 .. m\}$</mathjax> and <mathjax>$j = \{1 .. n\}$</mathjax>. We can turn
this representation into a matrix by defining <mathjax>$\eta = A\xi$</mathjax>. <mathjax>$A \in
\mathbb{F}^{m \times n}$</mathjax> is defined such that its <mathjax>$j^{\text{th}}$</mathjax> column is
<mathjax>$\mathcal{A}(u_j)$</mathjax> written with respect to the <mathjax>$v_i$</mathjax>s.</p>
<p>All we used here was the definitions of basis, coordinate vectors, and
linearity.</p>
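<p>(Aside, not from lecture: the construction on a concrete example of our
choosing, assuming numpy. Take the derivative map <mathjax>$d/dt$</mathjax> on polynomials of
degree less than 3, with basis <mathjax>$\{1, t, t^2\}$</mathjax> for both domain and codomain;
column <mathjax>$j$</mathjax> of <mathjax>$A$</mathjax> is <mathjax>$\mathcal{A}(u_j)$</mathjax> written in the <mathjax>$v_i$</mathjax> basis.)</p>
<pre><code>import numpy as np

# d/dt(1) = 0, d/dt(t) = 1, d/dt(t^2) = 2t, so the columns are:
A = np.array([[0., 1., 0.],
              [0., 0., 2.],
              [0., 0., 0.]])

xi = np.array([3., 2., 1.])  # coordinates of p(t) = 3 + 2t + t^2
print(A @ xi)                # [2. 2. 0.], i.e. p'(t) = 2 + 2t
</code></pre>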
<p>Let's do a couple of examples. Foreshadowing of work later in
controllability of systems. Consider a linear map <mathjax>$\fn{\mathcal{A}}
{(\Re^n, \Re)}{(\Re^n, \Re)}$</mathjax>. Try to derive the matrix <mathjax>$A \in \Re^{n
\times n}$</mathjax>. Both the domain and codomain have as basis <mathjax>$\{b,
\mathcal{A}(b), \mathcal{A}^2(b), ..., \mathcal{A}^{n-1}(b)\}$</mathjax>, where <mathjax>$b
\in \Re^n$</mathjax> and <mathjax>$\mathcal{A}^n(b) = -\sum_{i=1}^n \alpha_i \mathcal{A}^{n-i}(b)$</mathjax>. Your task is
to show that the representation of <mathjax>$b$</mathjax> and <mathjax>$\mathcal{A}$</mathjax> is:</p>
<p><mathjax>$$
\bar{b} = \begin{bmatrix}1 \\ 0 \\ \vdots \\ 0\end{bmatrix}
\qquad
\bar{A} = \begin{bmatrix}
0 &amp; 0 &amp; \dots &amp; 0 &amp; -\alpha_n
\\ 1 &amp; 0 &amp; \dots &amp; 0 &amp; -\alpha_{n-1}
\\ 0 &amp; 1 &amp; \dots &amp; 0 &amp; -\alpha_{n-2}
\\ \vdots &amp; \vdots &amp; \ddots &amp; \vdots &amp; \vdots
\\ 0 &amp; 0 &amp; \dots &amp; 1 &amp; -\alpha_1
\end{bmatrix}
$$</mathjax></p>
<p>This is really quite simple; it's almost by definition.</p>
<p>Note that these are composable maps, where composition corresponds to
matrix multiplication.</p>
<h2>Change of basis</h2>
<p>Consider we have <mathjax>$\fn{\mathcal{A}}{U}{V}$</mathjax> and two sets of bases for the
domain and codomain. There exist maps between the first set of bases and
the second set; composing those appropriately will give you your change of
basis. Essentially, do a change of coordinates to those in which <mathjax>$A$</mathjax> is
defined (represent this as <mathjax>$P$</mathjax>), apply <mathjax>$A$</mathjax>, then change the coordinates
of the codomain back (represented as <mathjax>$Q$</mathjax>). Thus <mathjax>$\bar{A} = QAP$</mathjax>.</p>
<p>If <mathjax>$V = U$</mathjax>, then you can easily derive that <mathjax>$Q = P^{-1}$</mathjax>, so <mathjax>$\bar{A} =
P^{-1}AP$</mathjax>.</p>
<p>We consider this transformation (<mathjax>$\bar{A} = QAP$</mathjax>) to be a <strong>similarity
transformation</strong>, and <mathjax>$A$</mathjax> and <mathjax>$\bar{A}$</mathjax> are called <strong>similar</strong>
(<strong>equivalent</strong>).</p>
<p>We derived these two matrices from the same linear map, but they're derived
using different bases.</p>
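<p>(Aside, not from lecture: a toy similarity transformation in numpy, for the
<mathjax>$V = U$</mathjax> case. If the columns of <mathjax>$P$</mathjax> are the new basis vectors written in the
old coordinates, the same map is represented by <mathjax>$P^{-1}AP$</mathjax> in the new
basis.)</p>
<pre><code>import numpy as np

A = np.array([[2., 1.],
              [0., 3.]])
P = np.array([[1., 1.],
              [0., 1.]])            # new basis: e1 and e1 + e2
A_bar = np.linalg.inv(P) @ A @ P    # same map, new coordinates

# Similar matrices share basis-independent properties, e.g. eigenvalues:
print(np.sort(np.linalg.eigvals(A)))      # [2. 3.]
print(np.sort(np.linalg.eigvals(A_bar)))  # [2. 3.]
</code></pre>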
<p>Proof of Sylvester's inequality on homework 2.</p>
<p>One last note about the rank of a linear map, which
corresponds to the rank of the associated matrix representation: that is,
<mathjax>$\text{rank}(A) = \text{dim}(R(A)) = \text{dim}(R(\mathcal{A}))$</mathjax>. Similarly,
<mathjax>$\text{nullity}(A) = \text{dim}(N(A)) = \text{dim}(N(\mathcal{A}))$</mathjax>.</p>
<p>Sylvester's inequality, which is an important relationship, says the
following: <strong>Suppose you have <mathjax>$A \in \mathbb{F}^{m \times n}$</mathjax>, <mathjax>$B \in
\mathbb{F}^{n \times p}$</mathjax>, then <mathjax>$AB \in \mathbb{F}^{m \times p}$</mathjax>, then
<mathjax>$\text{rk}(A) + \text{rk}(B) - n \le \text{rk}(AB) \le \min(\text{rk}(A),
\text{rk}(B))$</mathjax>.</strong> On the homework, you'll have to show both
inequalities. Note at the end about elementary row operations.</p>
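<p>(Aside, not from lecture: a numpy spot check of Sylvester's inequality on
random matrices.)</p>
<pre><code>import numpy as np

rng = np.random.default_rng(1)
m, n, p = 5, 4, 6
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, p))

rkA = np.linalg.matrix_rank(A)
rkB = np.linalg.matrix_rank(B)
rkAB = np.linalg.matrix_rank(A @ B)
# rk(A) + rk(B) - n &lt;= rk(AB) &lt;= min(rk(A), rk(B)):
print(rkA + rkB - n &lt;= rkAB &lt;= min(rkA, rkB))  # True
</code></pre>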
<p>Next important concept about vector spaces: that of norms.</p>
<h2>Norms</h2>
<p>With some vector spaces, you can associate some entity called a norm. We
can then speak of a <strong>normed vector space</strong> (every norm induces a metric,
so a normed space is in particular a <strong>metric space</strong>). Suppose you have a
vector space <mathjax>$(V, \mathbb{F})$</mathjax>, where
<mathjax>$\mathbb{F}$</mathjax> is either <mathjax>$\Re$</mathjax> or <mathjax>$\mathbb{C}$</mathjax>. This is a normed space if you
can find <mathjax>$\fn{\mag{\cdot}}{V}{\Re_+}$</mathjax> that satisfies the following axioms:</p>
<p><mathjax>$\mag{v_1 + v_2} \le \mag{v_1} + \mag{v_2}$</mathjax></p>
<p><mathjax>$\mag{\alpha v} = \abs{\alpha}\mag{v}$</mathjax></p>
<p><mathjax>$\mag{v} = 0 \iff v = \theta$</mathjax></p>
<p>We have some common norms on these fields:</p>
<p><mathjax>$\mag{x}_1 = \sum_{i=1}^n \abs{x_i}$</mathjax> (<mathjax>$\ell_1$</mathjax>)</p>
<p><mathjax>$\mag{x}_2 = \left(\sum_{i=1}^n \abs{x_i}^2\right)^{1/2}$</mathjax> (<mathjax>$\ell_2$</mathjax>)</p>
<p><mathjax>$\mag{x}_p = \left(\sum_{i=1}^n \abs{x_i}^p\right)^{1/p}$</mathjax> (<mathjax>$\ell_p$</mathjax>)</p>
<p><mathjax>$\mag{x}_\infty = \max \abs{x_i}$</mathjax> (<mathjax>$\ell_\infty$</mathjax>)</p>
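<p>(Aside, not from lecture: the same norms via numpy, as a sanity check of
the formulas above.)</p>
<pre><code>import numpy as np

x = np.array([3., -4., 0.])
print(np.linalg.norm(x, 1))       # 7.0 : sum of |x_i|
print(np.linalg.norm(x, 2))       # 5.0 : (sum of |x_i|^2)^(1/2)
print(np.linalg.norm(x, np.inf))  # 4.0 : max |x_i|
</code></pre>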
<p>One of the most important norms that we'll be using: the <strong>induced norm</strong>
is that induced by a linear operator. We'll define <mathjax>$\mathcal{A}$</mathjax> to be a
continuous linear map between two metric spaces; the induced norm is
defined as</p>
<p><mathjax>$$ \mag{\mathcal{A}}_i = \sup_{u \neq \theta}
\frac{\mag{\mathcal{A}u}_V}{\mag{u}_U} $$</mathjax></p>
<p>From analysis: the <strong>supremum</strong> is the least upper bound (the smallest
<mathjax>$x$</mathjax> such that <mathjax>$x \ge y$</mathjax> for all <mathjax>$y \in S$</mathjax>).</p>
<p><a name='5'></a></p>
<h1>Guest Lecture: Induced Norms and Inner Products</h1>
<h2>September 6, 2012</h2>
<h2>Induced norms of matrices</h2>
<p>The reason that we're going to start talking about induced norms: today
we're just going to build abstract algebra machinery, and at the end, we'll
do the first application: least squares. We'll see why we need this
machinery and why abstraction is a useful tool.</p>
<p>The idea is that we want to find a norm on a matrix using existing norms on vectors.</p>
<p>Let 1) <mathjax>$\fn{A}{(U,\mathbb{F})}{(V,\mathbb{F})}$</mathjax>, 2) let U have the norm <mathjax>$\mag{\ }_u$</mathjax>, 3) let
V have the norm <mathjax>$\mag{\ }_v$</mathjax>. Let the <strong>induced norm</strong> be <mathjax>$\mag{A}_{u,v} =
\sup_{x\neq 0} \frac{\mag{Ax}_v}{\mag{x}_u}$</mathjax>. Theorem: the induced norm is
a norm. Not going to bother showing positive homogeneity and triangle
inequality (trivial in this case). Only going to show last property:
separates points. Essentially, <mathjax>$\mag{A}_{u,v} = 0 \iff A = 0$</mathjax>. The reason
that this is not necessarily trivial is because of the supremum. It's a
complex operator that's trying to maximize this function over an infinite
set of points. It's possible that the supremum does not actually exist at a
finite point.</p>
<p>The first direction is easy: if <mathjax>$A$</mathjax> is zero, then its norm is 0 (by
definition -- numerator is 0).</p>
<p>The second direction is a hard one. If <mathjax>$\mag{A}_{u,v} = 0$</mathjax>, then given any
<mathjax>$x \neq 0$</mathjax>, it holds that <mathjax>$\frac{\mag{Ax}_v}{\mag{x}_u} \le 0$</mathjax> (from the
definition of supremum). The denominator must be positive (being the
norm of a nonzero vector), and the numerator must be non-negative (also being a
norm). Thus the quotient is also bounded below by zero, which means that the
numerator is zero for all nonzero x. Thus everything is in the nullspace of
<mathjax>$A$</mathjax>, which means that <mathjax>$A$</mathjax> is zero.</p>
<p>Proposition: the induced norm has (a) <mathjax>$\mag{Ax}_v \le \mag{A}_{u,v}
\mag{x}_u$</mathjax>; (b) <mathjax>$\mag{AB}_{u,v} \le \mag{A}_{u,v} \mag{B}_{u,v}$</mathjax>. (b)
follows from (a).</p>
<p>Not emphasized in Claire's notes: induced norms form a small amount of all
possible norms on matrices.</p>
<p>Examples of induced norms:</p>
<ul>
<li><mathjax>$\mag{A}_{1,1} = \max_j \sum_i \abs{a_{ij}}$</mathjax>: maximum column sum: maximum
of the sum of columns;</li>
<li><mathjax>$\mag{A}_{2,2} = \max_j \sqrt{\lambda_j(A^* A)}$</mathjax>: max singular value norm;</li>
<li><mathjax>$\mag{A}_{\infty, \infty} = \max_i \sum_j \abs{a_{ij}}$</mathjax>: maximum row sum.</li>
</ul>
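<p>(Aside, not from lecture: checking these three formulas against numpy's
built-in matrix norms.)</p>
<pre><code>import numpy as np

A = np.array([[1., -2.],
              [3.,  4.]])
print(np.linalg.norm(A, 1), np.abs(A).sum(axis=0).max())       # 6.0 : max column sum
print(np.linalg.norm(A, np.inf), np.abs(A).sum(axis=1).max())  # 7.0 : max row sum
# Induced 2-norm = largest singular value = sqrt(max eigenvalue of A^T A):
print(np.linalg.norm(A, 2), np.sqrt(np.linalg.eigvalsh(A.T @ A).max()))
</code></pre>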
<p>Other matrix norms: special cases of the Schatten norms. (a) The Frobenius
norm <mathjax>$\sqrt{\text{trace}(A^* A)}$</mathjax>, which is also the square root of the sum
of the squared singular values. Schatten norms also give a convenient way to
write the nuclear norm (the sum of the singular values).</p>
<p>Statistical regularization; Frobenius norm is analogous to <mathjax>$\ell_2$</mathjax>
regularization; nuclear norm analogous to <mathjax>$\ell_1$</mathjax> regularization. It is
important to be aware that these other norms exist.</p>
<h2>Sensitivity analysis</h2>
<p>Nice application of norms, but we won't see that it's a nice application
until later.</p>
<p>Computation for numerical linear algebra.</p>
<p>Some algebra can be performed to show that if <mathjax>$Ax_0 = b$</mathjax> (when <mathjax>$A$</mathjax> is
invertible), then for <mathjax>$(A + \delta A)(x_0 + \delta_x) = b + \delta b$</mathjax>, we
have an approximate bound of <mathjax>$\frac{\mag{\delta_x}}{\mag{x_0}} \le
\mag{A}\mag{A^{-1}} \bracks{\frac{\mag{\delta A}}{\mag{A}} +
\frac{\mag{\delta b}}{\mag{b}}}$</mathjax>. Need to engineer computation to improve
situation. Namely, we're perturbing <mathjax>$A$</mathjax> and <mathjax>$b$</mathjax> slightly: how much can the
solution vary? In some sense, we have a measure of effect
(<mathjax>$\mag{A}\mag{A^{-1}}$</mathjax>) and a measure of perturbation. The first quantity
is important enough that people in linear algebra have defined it and
called it a <strong>condition number</strong>: <mathjax>$\kappa(A) = \mag{A}\mag{A^{-1}} \ge
1$</mathjax>. The best you can do is 1. If you have a condition number of 1, your
system is well-conditioned and very robust to perturbations. Larger
condition number will mean less robustness to perturbation.</p>
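<p>(Aside, not from lecture: a numerical illustration of conditioning, assuming
numpy. We perturb <mathjax>$b$</mathjax> only (<mathjax>$\delta A = 0$</mathjax>), in which case the bound above is
exact rather than approximate.)</p>
<pre><code>import numpy as np

A = np.array([[1., 1.],
              [1., 1.001]])    # nearly singular, so badly conditioned
b = np.array([2., 2.001])
x0 = np.linalg.solve(A, b)

db = np.array([0., 1e-4])      # small perturbation of b
dx = np.linalg.solve(A, b + db) - x0

kappa = np.linalg.cond(A)      # ||A|| ||A^{-1}|| in the induced 2-norm
lhs = np.linalg.norm(dx) / np.linalg.norm(x0)
rhs = kappa * np.linalg.norm(db) / np.linalg.norm(b)
print(kappa)       # ~4e3: a tiny relative change in b is amplified ~4000x
print(lhs &lt;= rhs)  # True: the bound holds
</code></pre>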
<h2>More machinery: Inner Product &amp; Hilbert Spaces</h2>
<p>Consider a linear space <mathjax>$(H, \mathbb{F})$</mathjax>, and define a function
<mathjax>$\fn{\braket{}{}}{H \times H}{\mathbb{F}}$</mathjax>. This function is an
inner product if it satisfies the following properties.</p>
<ul>
<li>Conjugate symmetry. <mathjax>$\braket{x}{y} = \braket{y}{x}^*$</mathjax>.</li>
<li>Homogeneity. <mathjax>$\braket{x}{\alpha y} = \alpha \braket{x}{y}$</mathjax>.</li>
<li>Linearity. <mathjax>$\braket{x}{y + z} = \braket{x}{y} + \braket{x}{z}$</mathjax>.</li>
<li>Positive definiteness. <mathjax>$\braket{x}{x} \ge 0$</mathjax>, where equality only occurs
when <mathjax>$x = 0$</mathjax>.</li>
</ul>
<p>Inner product spaces have a natural norm (might not be the official name),
and that's the norm induced by the inner product.</p>
<p>One can define <mathjax>$\mag{x}^2 = \braket{x}{x}$</mathjax>, which satisfies the axioms of a
norm.</p>
<p>Examples of Hilbert spaces: finite-dimensional vectors. Much of the
finite-dimensional intuition carries over to infinite-dimensional Hilbert
spaces. All linear operators on finite-dimensional vector spaces are
continuous because they can be written as a matrix (not always the case with
infinite-dimensional vector spaces). Suppose I have the field <mathjax>$\mathbb{F}$</mathjax>; <mathjax>$(\mathbb{F}^n,
\mathbb{F})$</mathjax>, where the inner product <mathjax>$\braket{x}{y} = \sum_i \bar{x_i}
y_i$</mathjax>, but another important inner product space is the space of
square-integrable functions, <mathjax>$L^2([t_0, t_1], \mathbb{F}^n
)$</mathjax>. Infinite-dimensional space which actually is the space spanned by
Fourier series. It turns out that the inner product (of functions) is
<mathjax>$\int_{t_0}^{t_1} f(t)^* g(t) dt$</mathjax>.</p>
<p>We're going to power through a little more machinery, but we're getting
very close to the application. Need to go through adjoints and
orthogonality before we can start doing applications.</p>
<h2>Adjoints</h2>
<p>Consider Hilbert spaces <mathjax>$(U, \mathbb{F}, \braket{}{}_u), (V, \mathbb{F},
\braket{}{}_v)$</mathjax>, and let <mathjax>$\fn{A}{U}{V}$</mathjax> be a continuous linear
function. The <strong>adjoint</strong> of <mathjax>$A$</mathjax> is denoted <mathjax>$A^*$</mathjax> and is the map
<mathjax>$\fn{A^*}{V}{U}$</mathjax> such that <mathjax>$\braket{x}{Ay}_v = \braket{A^*x}{y}_u$</mathjax>.</p>
<p>Reasoning? Sometimes you can simplify things. Suppose <mathjax>$A$</mathjax> maps an
infinite-dimensional space to a finite-dimensional space (e.g. functions to
numbers). In some sense, you can convert that function into something that
goes from real numbers to functions on numbers. Generalization of the
Hermitian transpose.</p>
<p>Consider functions <mathjax>$f, g \in C([t_0, t_1], \Re^n)$</mathjax>. What is the adjoint of
<mathjax>$\fn{A}{C([t_0, t_1], \Re^n)}{\Re}$</mathjax>, where <mathjax>$A(f) = \braket{g}{f}_{C
([t_0, t_1], \Re^n)}$</mathjax>? (aside: this notion of the adjoint will be very
important when we get to observability and reachability)</p>
<p>Observe that <mathjax>$\braket{v}{A(f)}_\Re = v \cdot A(f) = v \braket{g}{f}_C = \braket{v
g}{f}_C$</mathjax>, and so consequently, we have that the adjoint is <mathjax>$A^*(v) = v g$</mathjax>.</p>
<h2>Orthogonality</h2>
<p>With Hilbert spaces, one can define orthogonality in an axiomatic manner (a
more abstract form, rather). Let <mathjax>$(H, \mathbb{F}, \braket{}{})$</mathjax> be a
Hilbert space. Two vectors <mathjax>$x, y$</mathjax> are defined to be <strong>orthogonal</strong> if
<mathjax>$\braket{x}{y} = 0$</mathjax>.</p>
<p>Cute example: suppose <mathjax>$c = a + b$</mathjax> and <mathjax>$a, b$</mathjax> are orthogonal. In fact,
<mathjax>$\mag{c}^2 = \mag{a + b}^2 = \braket{a + b}{a + b} = \braket{a}{a} +
\braket{b}{b} + \braket{a}{b} + \braket{b}{a} = \mag{a}^2 +
\mag{b}^2$</mathjax>. Cute because the result is the Pythagorean theorem, which we
got just through these axioms.</p>
<p>One more thing: the orthogonal complement of a subspace <mathjax>$M$</mathjax> in a Hilbert
space is defined as <mathjax>$M^\perp = \set{y \in H}{\forall x \in M,
\braket{x}{y} = 0}$</mathjax>.</p>
<p>We are at a point now where we can talk about an important theorem:</p>
<h2>Fundamental Theorem of Linear Algebra (partially)</h2>
<p>Let <mathjax>$A \in \Re^{m \times n}$</mathjax>. Then:</p>
<ul>
<li><mathjax>$R(A) \perp N(A^T)$</mathjax></li>
<li><mathjax>$R(A^T) \perp N(A)$</mathjax></li>
<li><mathjax>$R(AA^T) = R(A)$</mathjax></li>
<li><mathjax>$R(A^TA) = R(A^T)$</mathjax></li>
<li><mathjax>$N(AA^T) = N(A)$</mathjax></li>
<li><mathjax>$N(A^TA) = N(A^T)$</mathjax></li>
</ul>
<p>Proofs:</p>
<ul>
<li>
<p>Given any <mathjax>$x \in \Re^n, y \in \Re^m \st A^T y = 0$</mathjax> (<mathjax>$y \in N(A^T)$</mathjax>),
consider the quantity <mathjax>$\braket{y}{Ax} = \braket{A^Ty}{x} = 0$</mathjax>.</p>
</li>
<li>
<p>Given any <mathjax>$x \in \Re^n, \exists y \in \Re^m \st x = A^T y + z$</mathjax>, where <mathjax>$z
\in N(A)$</mathjax> (as a result of the decomposition above). Thus <mathjax>$Ax =
AA^Ty$</mathjax>. This implies <mathjax>$R(A) \subset R(A A^T)$</mathjax>; the reverse inclusion is
immediate, so <mathjax>$R(AA^T) = R(A)$</mathjax>.</p>
</li>
</ul>
<p>Now for the application.</p>
<h2>Application: Least Squares</h2>
<p>Consider the following problem: minimize <mathjax>$\mag{y - Ax}_2$</mathjax>, where <mathjax>$y \not\in
R(A)$</mathjax>. If <mathjax>$y$</mathjax> were in the range of A, and A were invertible, the solution
would be trivial (<mathjax>$A^{-1}y$</mathjax>). In many problems, <mathjax>$A \in \Re^{m\times n}$</mathjax>,
where <mathjax>$m \gg n$</mathjax>, <mathjax>$y \in \Re^m$</mathjax>, <mathjax>$x \in \Re^n$</mathjax>.</p>
<p>Since we cannot solve <mathjax>$Ax = y$</mathjax>, we instead solve <mathjax>$Ax = \hat{y}$</mathjax>. According
to our intuition, we would like <mathjax>$y - \hat{y}$</mathjax> to be orthogonal to
<mathjax>$R(A)$</mathjax>. From the preceding (partial) theorem, this means that <mathjax>$y - \hat{y}
\in N(A^T) \iff A^T(y - \hat{y}) = 0$</mathjax>. Remember: what we really want to solve
is <mathjax>$A^T(y - Ax) = 0 \implies A^T Ax = A^T y \implies x = (A^T A)^{-1} A^T
y$</mathjax> if <mathjax>$A^T A$</mathjax> is invertible.</p>
<p>If A has full column-rank (that is, for <mathjax>$A \in \Re^{m \times n}$</mathjax>, we have
<mathjax>$\text{dim}(R(A)) = n$</mathjax>), then this means that in fact <mathjax>$N(A) = \{0\}$</mathjax>, and the preceding
theorem implies that the dimension of <mathjax>$R(A^T) = n$</mathjax>, which means that the
dimension of <mathjax>$R(A^T A) = n$</mathjax>. However, <mathjax>$A^T A \in \Re^{n \times n}$</mathjax>. Thus,
<mathjax>$A^T A$</mathjax> is invertible.</p>
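<p>(Aside, not from lecture: least squares two ways in numpy: the normal
equations derived above, and <code>np.linalg.lstsq</code>, which uses an
orthogonal factorization and avoids squaring the condition number, as
discussed in the next section.)</p>
<pre><code>import numpy as np

rng = np.random.default_rng(2)
m, n = 100, 3
A = rng.standard_normal((m, n))   # tall, full column rank (almost surely)
y = rng.standard_normal(m)

x_normal = np.linalg.solve(A.T @ A, A.T @ y)    # x = (A^T A)^{-1} A^T y
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.allclose(x_normal, x_lstsq))           # True

# The residual is orthogonal to R(A): A^T (y - A x) = 0 (up to roundoff).
print(np.allclose(A.T @ (y - A @ x_normal), 0., atol=1e-10))  # True
</code></pre>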
<h2>Back to condition numbers (special case)</h2>
<p>Consider a self-adjoint and invertible matrix in <mathjax>$\Re^{n \times
n}$</mathjax>. <mathjax>$\hat{x} = (A^T A)^{-1} A^T y = A^{-1} y$</mathjax>. We have two ways of
determining this value: the overdetermined least-squares solution and the
standard inverse. Let us look at the condition numbers.</p>
<p><mathjax>$\kappa(A^T A) = \mag{A^T A}\mag{(A^T A)^{-1}} = \mag{A}^2\mag{A^{-1}}^2
= \bracks{\kappa(A)}^2$</mathjax>. This result is more general: also applies in the
<mathjax>$L^2$</mathjax> case even if <mathjax>$A$</mathjax> is not self-adjoint. As you can see, this is worse
than if we simply use the inverse.</p>
<h2>Gram-Schmidt orthonormalization</h2>
<p>This is a theoretical toy, not used for computation (numerics are very bad).</p>
<p>More definitions:</p>
<p>A <em>set</em> of vectors S is <strong>orthogonal</strong> if <mathjax>$x \perp y$</mathjax> for all distinct
<mathjax>$x, y \in S$</mathjax>.</p>
<p>The set is <strong>orthonormal</strong> if also <mathjax>$\mag{x} = 1, \forall x \in S$</mathjax>. Why do we
care about orthonormality? Consider Parseval's theorem. The reason you get
that theorem is that the bases are required to be orthonormal so that you
can get that result. Otherwise it wouldn't be as clean. That's typically
why people like orthonormal bases: you can represent your vectors as just
coefficients (and you don't need to store the length of the vectors).</p>
<p>We conclude with an example of Gram-Schmidt orthonormalization. Consider
the space <mathjax>$L^2([t_0, t_1], \Re)$</mathjax>. Suppose I have <mathjax>$v_1 = 1, v_2 = t, v_3 =
t^2$</mathjax>, <mathjax>$t_0 = 0$</mathjax>, <mathjax>$t_1 = 1$</mathjax>, and <mathjax>$\mag{v_1}^2 = \int_0^1 1 \cdot 1 dt =
1$</mathjax>. The key idea of Gram-Schmidt orthonormalization is the following: start
with <mathjax>$b_1 \equiv \frac{v_1}{\mag{v_1}}$</mathjax>. Then go on with <mathjax>$b_2 = \frac{v_2 -
\braket{v_2}{b_1}b_1}{\mag{v_2 - \braket{v_2}{b_1}b_1}}$</mathjax>, and repeat until
you're done (in essence: you want to preserve only the component that is
orthogonal to the space spanned by the vectors you've computed so far, then
renormalize).</p>
<p>Basically, you get after all this computation that <mathjax>$b_2 = \sqrt{12}\left(t -
\frac{1}{2}\right)$</mathjax>. Same construction for <mathjax>$b_3$</mathjax>.</p>
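<p>(Aside, not from lecture: the same Gram-Schmidt computation done numerically
in <mathjax>$L^2([0,1], \Re)$</mathjax>, with the inner product integral approximated by a
Riemann sum on a fine grid. A sketch only; as noted above, the numerics of
classical Gram-Schmidt are bad.)</p>
<pre><code>import numpy as np

t = np.linspace(0., 1., 100001)
dt = t[1] - t[0]
inner = lambda f, g: np.sum(f * g) * dt   # crude quadrature for &lt;f, g&gt;

def gram_schmidt(vs):
    basis = []
    for v in vs:
        # Keep only the component orthogonal to what we have so far...
        w = v - sum(inner(v, b) * b for b in basis)
        # ...then renormalize.
        basis.append(w / np.sqrt(inner(w, w)))
    return basis

b1, b2, b3 = gram_schmidt([np.ones_like(t), t, t**2])
print(np.allclose(b2, np.sqrt(12) * (t - 0.5), atol=1e-3))  # True
</code></pre>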
<p><a name='6'></a></p>
<h1>Singular Value Decomposition &amp; Introduction to Differential Equations</h1>
<h2>September 11, 2012</h2>
<p>Reviewing the adjoint, suppose we have two vector spaces <mathjax>$U, V$</mathjax>; like we
did with norms, let us associate a field that is either <mathjax>$\Re$</mathjax> or
<mathjax>$\mathbb{C}$</mathjax>. Assume that these spaces are inner product spaces (we're
associating with each an inner product). Suppose we have a continuous
(linear) map <mathjax>$\fn{\mathcal{A}}{U}{V}$</mathjax>. We define the <strong>adjoint</strong> of this
map to be <mathjax>$\fn{\mathcal{A}^*}{V}{U}$</mathjax> such that <mathjax>$\braket{u}{\mathcal{A} v}_V =
\braket{\mathcal{A}^* u}{v}_U$</mathjax>.</p>
<p>We define <strong>self-adjoint</strong> maps as maps that are equal to their adjoints,
i.e. <mathjax>$\fn{\mathcal{A}}{U}{U} \st \mathcal{A} = \mathcal{A}^*$</mathjax>.</p>
<p>In finite-dimensional vector spaces, the adjoint of a map is equivalent to
the conjugate transpose of the matrix representation of the map. We refer
to matrices that correspond to self-adjoint maps as <strong>hermitian</strong>.</p>
<h2>Unitary matrices</h2>
<p>Suppose that we have <mathjax>$U \in \mathbb{F}^{n\times n}$</mathjax>. <mathjax>$U$</mathjax> is <strong>unitary</strong> iff
<mathjax>$U^*U = UU^* = I_n$</mathjax>. If <mathjax>$\mathbb{F}$</mathjax> is <mathjax>$\Re$</mathjax>, the matrix is called
<strong>orthogonal</strong>.</p>
<p>These constructions lead us to something useful: singular value
decomposition. We'll come back to this later when we talk about matrix
operations.</p>
<h2>Singular Value Decomposition (SVD)</h2>
<p>Suppose you have a matrix <mathjax>$M \in \mathbb{F}^{m\times m}$</mathjax>. An <strong>eigenvalue</strong>
of <mathjax>$M$</mathjax> is a complex number <mathjax>$\lambda$</mathjax> such that there exists a nonzero vector <mathjax>$v$</mathjax>
with <mathjax>$Mv = \lambda v$</mathjax> (<mathjax>$v$</mathjax> is thus called the <strong>eigenvector</strong>
associated to <mathjax>$\lambda$</mathjax>). Now we can think about how to define singular
values of a matrix in terms of these definitions.</p>
<p>Let us think about this in general for a matrix <mathjax>$A \in \mathbb{F}^{m \times
n}$</mathjax> (which we consider to be a matrix representation of some linear map
with respect to a basis). Note that <mathjax>$A A^* \in \mathbb{F}^{m \times m}$</mathjax>,
which will have <mathjax>$m$</mathjax> eigenvalues <mathjax>$\lambda_i, i = 1 ... m$</mathjax>.</p>
<p>Note that <mathjax>$AA^*$</mathjax> is hermitian. We note that from the Spectral theorem, we
can decompose the matrix into an orthonormal basis of eigenvectors
corresponding to real eigenvalues. In fact, in this case, the eigenvalues
must be real and non-negative.</p>
<p>If we write the eigenvalues of <mathjax>$AA^*$</mathjax> as <mathjax>$\lambda_1 \ge \lambda_2 \ge
... \ge \lambda_m$</mathjax>, where the first <mathjax>$r$</mathjax> are nonzero, note that <mathjax>$r =
\text{rank}(AA^*)$</mathjax>. We define the <strong>non-zero singular values</strong> of <mathjax>$A$</mathjax> to be
<mathjax>$\sigma_i = \sqrt{\lambda_i}, i \le r$</mathjax>. The remaining singular values are
zero.</p>
<p>Recall the <strong>induced 2-norm</strong>: let us relate this notion of singular values
back to the induced 2-norm of a matrix <mathjax>$A$</mathjax> (<mathjax>$\mag{A}_{2,i}$</mathjax>). Consider the
induced norm to be the norm induced by the action of <mathjax>$A$</mathjax> on the domain of
<mathjax>$A$</mathjax>; thus if we take the induced 2-norm, then this is <mathjax>$\max_i (\lambda_i
(A^*A))^{1/2}$</mathjax>, which is simply the maximum singular value.</p>
<p>Now that we know what singular values are, we can do a useful decomposition
called singular value decomposition.</p>
<p>Take <mathjax>$A \in \mathbb{C}^{m \times n}$</mathjax>. We have the following theorem: there
exist unitary matrices <mathjax>$U \in \mathbb{C}^{m \times m}, V \in \mathbb{C}^{n
\times n}$</mathjax> such that <mathjax>$A = U \Sigma V^*$</mathjax>, where <mathjax>$\Sigma$</mathjax> is defined as a
diagonal matrix containing the singular values of <mathjax>$A$</mathjax>. Consider the first
<mathjax>$r$</mathjax> columns of <mathjax>$U$</mathjax> to be <mathjax>$U_1$</mathjax>, the first <mathjax>$r$</mathjax> columns of <mathjax>$V$</mathjax> to be <mathjax>$V_1$</mathjax>,
and the <mathjax>$r \times r$</mathjax> block of <mathjax>$\Sigma$</mathjax> containing the nonzero singular
values to be <mathjax>$\Sigma_r$</mathjax>. Then <mathjax>$A = U \Sigma V^* = U_1 \Sigma_r
V_1^*$</mathjax>.</p>
<p>Consider <mathjax>$AA^*$</mathjax>. With a bit of algebra, we can show that <mathjax>$AA^*U_1 = U_1
\Sigma_r^2$</mathjax>. The columns <mathjax>$u_i$</mathjax> of <mathjax>$U_1$</mathjax> are the eigenvectors of
<mathjax>$AA^*$</mathjax> associated to eigenvalues <mathjax>$\sigma_i^2$</mathjax>; these are called the
<strong>left-singular vectors</strong>.</p>
<p>Similarly, if we consider <mathjax>$A^*A$</mathjax>, we can show that <mathjax>$A^*A V_1 = V_1
\Sigma_r^2$</mathjax>; the columns <mathjax>$v_i$</mathjax> of <mathjax>$V_1$</mathjax> are the eigenvectors of <mathjax>$A^*A$</mathjax> and
are called the <strong>right-singular vectors</strong>.</p>
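<p>(Aside, not from lecture: checking these claims numerically with numpy: the
singular values are the square roots of the eigenvalues of <mathjax>$AA^*$</mathjax>, and
<mathjax>$A = U\Sigma V^*$</mathjax>.)</p>
<pre><code>import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 6))

U, s, Vh = np.linalg.svd(A)                # numpy returns V^* as Vh
lam = np.linalg.eigvalsh(A @ A.T)[::-1]    # eigenvalues of AA^*, descending
print(np.allclose(s, np.sqrt(np.clip(lam, 0., None))))  # True

Sigma = np.zeros_like(A)                   # 4 x 6, singular values on diagonal
np.fill_diagonal(Sigma, s)
print(np.allclose(A, U @ Sigma @ Vh))      # True: A = U Sigma V^*
</code></pre>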
<h2>Recap</h2>
<p>We've covered a lot of ground these past few weeks: we covered functions,
vector spaces, bases, and then we started to consider linearity. And then
we started talking about endowing vector spaces with things like norms,
inner products; induced norms. From that, we went on to talk about
adjoints. We used adjoints, we went on to talk a little about projection
and least-squares optimization. We then went on to talk about Hermitian
matrices and singular value decomposition. I think about this first unit as
having many basic units that we'll use over and over again. Two interesting
applications: least-squares, SVD.</p>
<p>So we have this basis now to build on as we talk about linear
systems. We'll also need to build a foundation on linear differential
equations. We'll spend some time going over the basics: what a solution
means, under what conditions a solution exists (i.e. what properties does
the differential equation need to have?). We'll spend the next couple weeksn
talking about properties of differential equations.</p>
<p>All of what we've done up to now has been covered in appendix A of Callier
&amp; Desoer. For the introduction to differential equations, we'll follow
appendix B of Callier &amp; Desoer. Not the easiest to read, but very
comprehensive background reading.</p>
<p>The existence and uniqueness theorems are in many places, however.</p>
<p>Lecture notes 7.</p>
<h2>Differential Equations</h2>
<p><mathjax>$$
\dot{x} = f(x(t), t), \quad x(t_0) = x_0
\\ x \in \Re^n
\\ \fn{f}{\Re^n \times \Re}{\Re^n}
$$</mathjax></p>
<p>(strictly speaking, <mathjax>$f$</mathjax> maps <mathjax>$x$</mathjax> to the tangent space, but for this course,
we're going to consider the two spaces to be equivalent)</p>
<p>Often, we're going to consider the <strong>time-invariant</strong> case (where there is
no dependence on <mathjax>$t$</mathjax>, but rather only on <mathjax>$x$</mathjax>), but this is a time-variant
case. Recall that we consider time to be a privileged variable, i.e. always
"marching forward".</p>
<p>What we're going to talk about now is how we can solve this differential
equation. Rather (for now), under what conditions does there exist a
(unique) solution to the differential equation (with initial condition)?
We're interested in these two properties: existence and uniqueness. The
solution we call <mathjax>$x(t)$</mathjax> where <mathjax>$x(t_0) = x_0$</mathjax>. We need some understanding of
some properties of that function <mathjax>$f$</mathjax>. We'll talk about continuity,
piecewise continuity, Lipschitz continuity (thinking about the
existence). In terms of uniqueness, we'll be talking about Cauchy
sequences, Banach spaces, Bellman-Gr&ouml;nwall lemma.</p>
<p>A couple of different ways to prove uniqueness and existence; we'll use the
Callier &amp; Desoer method.</p>
<p>We'll finish today's lecture by just talking about some definitions of
continuity. Suppose we have a function <mathjax>$f(x)$</mathjax> that is said to be
<strong>continuous</strong>: that is, <mathjax>$\forall \epsilon &gt; 0, \exists \delta &gt; 0 \st
\abs{x_1
- x_2} &lt; \delta \implies \abs{f(x_1) - f(x_2)} &lt; \epsilon$</mathjax>
(<mathjax>$\epsilon$</mathjax>-<mathjax>$\delta$</mathjax> definition).</p>
<p>Suppose we have <mathjax>$\fn{f(x,t)}{\Re^n \times \Re}{\Re^n}$</mathjax>. <mathjax>$f$</mathjax> is said to be
piece-wise continuous (w.r.t. <mathjax>$t$</mathjax>), <mathjax>$\forall x$</mathjax> if <mathjax>$\fn{f(x,
\cdot)}{\Re}{\Re^n}$</mathjax> is continuous except at a finite number of
(well-behaved) discontinuities in any closed and bounded interval of
time. What I'm not allowing in this definition are functions with
infinitely many points of discontinuity.</p>
<p>Next time we'll talk about Lipschitz continuity.</p>
<p><a name='7'></a></p>
<h1>Existence and Uniqueness of Solutions to Differential Equations</h1>
<h2>September 13, 2012</h2>
<p>Section this Friday only, 9:30-10:30, Cory 299.</p>
<p>Today: existence and uniqueness of solutions to differential equations.</p>
<p>We called this a DE or ODE, and we associated with it an initial
condition. We started to talk about properties of the function <mathjax>$f$</mathjax> as a
function of <mathjax>$x$</mathjax> only, but we can consider thinking about this as a function
of <mathjax>$x$</mathjax> for all t. This is a map from <mathjax>$\Re^n \to \Re^n$</mathjax>. In this class,
recall, we used the <mathjax>$\epsilon$</mathjax>-<mathjax>$\delta$</mathjax> definition for continuity.</p>
<p>We also introduced the concept of piecewise continuity, which will be
important for thinking about the right-hand-side of the differential
equation.</p>
<p>We defined piecewise continuity as <mathjax>$\fn{f(t)}{\Re_+}{\Re^n}$</mathjax>, where <mathjax>$f(t)$</mathjax>
is said to be piecewise continuous in <mathjax>$t$</mathjax>, where the function is continuous
except at a set of well-behaved discontinuities (finitely many in any
closed and bounded, i.e. <strong>compact</strong>, interval).</p>
<p>Finally, we will define Lipschitz continuity as follows: a function
<mathjax>$\fn{f(\cdot, t)}{\Re^n}{\Re^n}$</mathjax> is <strong>Lipschitz continuous</strong> in x if there
exists a piecewise continuous function of time <mathjax>$\fn{k(t)}{\Re_+}{\Re_+}$</mathjax>
such that the following inequality holds: <mathjax>$\mag{f(x_1, t) - f(x_2, t)} \le
k(t)\mag{x_1 - x_2}, \forall x_1, x_2 \in \Re^n, \forall t \in \Re_+$</mathjax>. This
inequality (condition) is called the <strong>Lipschitz condition</strong>.</p>
<p>An important thing in this inequality is that there has to be one function
<mathjax>$k(t)$</mathjax>, and it has to be piecewise continuous. That is, there exists such a
function that is not allowed to go to infinity in compact time
intervals.</p>
<p>It's an interesting condition, and if we look at this and compare the
Lipschitz continuity definition to the general continuity definition, we
can easily show that if the function is LC (Lipschitz continuous), then
it's C (continuous), since LC is a stricter condition than C. That
implication is fairly straightforward to show, but the converse
is not necessarily true (i.e. continuity does not imply
Lipschitz continuity).</p>
<p>Aside: think about this condition and what it takes to show that a function
is Lipschitz continuous. Need to come up with a candidate <mathjax>$k(t)$</mathjax> (often
called the Lipschitz function or constant, if it's constant). Often the
hardest part: trying to extract from <mathjax>$f$</mathjax> what a possible <mathjax>$k$</mathjax> is.</p>
<p>But there's a useful possible candidate for <mathjax>$k(t)$</mathjax>, given a particular
function <mathjax>$f$</mathjax>. Let's forget about time for a second and consider a function
just of <mathjax>$x$</mathjax>. Consider the <strong>Jacobian</strong> <mathjax>$Df$</mathjax> (often you also use <mathjax>$\pderiv{f}{x}$</mathjax>),
which is the <mathjax>$n \times n$</mathjax> matrix with entries <mathjax>$(Df)^j_i = \pderiv{f_j}{x_i}$</mathjax>. If
the Jacobian <mathjax>$Df$</mathjax> exists, then its norm provides a candidate Lipschitz
function <mathjax>$k(t)$</mathjax>.</p>
<p>A norm of the Jacobian of <mathjax>$f$</mathjax>, if independent of <mathjax>$x$</mathjax>, tells you that the
function is Lipschitz. If the norm always seems to depend on <mathjax>$x$</mathjax>, you can
still say something about the Lipschitz properties of the function: you can
call it locally Lipschitz by bounding the value of <mathjax>$x$</mathjax> in some region.</p>
<p>Sketch of proof: generalization of mean value theorem (easy to sketch in
<mathjax>$\Re^1$</mathjax>). Mean value theorem states that there exists a point such that the
instantaneous slope is the same as the average slope (assuming that the
function is differentiable). If we want to generalize it to more
dimensions, we say <mathjax>$f(x_1) - f(x_2) = Df(\lambda x_1 + (1 - \lambda)
x_2)(x_1 - x_2)$</mathjax> (where <mathjax>$0 &lt; \lambda &lt; 1$</mathjax>). All we've required is the
existence of <mathjax>$Df$</mathjax>.</p>
<p>Now we can just take norms (and this is what's interesting now) and use
some of the results we have from norms. This provides a very useful
construction for a candidate for <mathjax>$k$</mathjax> (might not provide a great bound), but
it's the second thing to try if you can't immediately extract out a
function <mathjax>$k(t)$</mathjax>.</p>
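<p>(Aside, not from lecture: the Jacobian recipe on a one-dimensional example
of our choosing, <mathjax>$f(x) = \sin(x)$</mathjax>. Here <mathjax>$Df = \cos(x)$</mathjax>, so <mathjax>$\abs{Df} \le 1$</mathjax>
everywhere and <mathjax>$k = 1$</mathjax> is a candidate Lipschitz constant.)</p>
<pre><code>import numpy as np

f = np.sin
k = 1.0  # sup over x of |cos(x)|, the Jacobian bound

# Spot check the Lipschitz condition |f(x1) - f(x2)| &lt;= k |x1 - x2|:
rng = np.random.default_rng(4)
x1, x2 = rng.uniform(-10., 10., size=(2, 1000))
print(np.all(np.abs(f(x1) - f(x2)) &lt;= k * np.abs(x1 - x2)))  # True
</code></pre>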
<p>Something not in the notes, but useful. Let's go back to where we started,
the differential equation with initial condition, and state the main
theorem.</p>
<h2>Fundamental Theorem of DEs / the Existence and Uniqueness theorem of (O)DEs</h2>
<p>Suppose we have a differential equation with an initial condition. Assume
that <mathjax>$f(x, t)$</mathjax> is piecewise continuous in <mathjax>$t$</mathjax> and Lipschitz continuous in
<mathjax>$x$</mathjax>. With that information, we have that there exists a unique function of
time which maps <mathjax>$\Re_+ \to \Re^n$</mathjax>, which is differentiable (<mathjax>$C^1$</mathjax>) <em>almost</em>
everywhere (derivative exists at all points at which <mathjax>$f$</mathjax> is continuous),
and it satisfies the initial condition and differential equation. This
derivative exists at all points <mathjax>$t \in [t_1, t_2] - D$</mathjax>, where
<mathjax>$D$</mathjax> is the set of points where <mathjax>$f$</mathjax> is discontinuous in <mathjax>$t$</mathjax>.</p>
<p>We are going to be interested in studying differential equations where we
know these conditions hold. We're also going to prove the theorem. It's a
nice thing to do (a little in depth) because it demonstrates some proof
techniques (as well as giving you an idea of why the theorem works).</p>
<h2>LC condition</h2>
<p>The norm of the Jacobian of the example is bounded for bounded <mathjax>$x$</mathjax>. That
is, we can choose a local region in <mathjax>$\Re$</mathjax> for which our <mathjax>$Df$</mathjax> is bounded to
be less than some constant. That gives us a candidate Lipschitz constant
for that local region. We say then that <mathjax>$f(x)$</mathjax> is (at least) <strong>locally
Lipschitz continuous</strong> (usually we just say this without specifying a
region, since you can usually find a bound given any region). Further, it
is trivially piecewise continuous in time (since it doesn't depend on
time).</p>
<p>Note: if the Lipschitz condition holds only locally, it may be that the
solution is only defined over a certain range of time.</p>
<p>We didn't show this, but in this example, the Lipschitz condition does not
hold globally.</p>
<h2>Local Fundamental theorem of DEs</h2>
<p>Now assume only that <mathjax>$f(x,t)$</mathjax> is piecewise continuous in <mathjax>$t$</mathjax> and Lipschitz
continuous in <mathjax>$x$</mathjax> for all <mathjax>$x$</mathjax> in some region <mathjax>$G \subset \Re^n$</mathjax>. We now have
that there exists an interval <mathjax>$[t_0, t_1]$</mathjax> over which the solution remains in
<mathjax>$G$</mathjax>, and a unique function of time defined on that interval, which is
differentiable (<mathjax>$C^1$</mathjax>) <em>almost</em> everywhere (the derivative exists at all points
at which <mathjax>$f$</mathjax> is continuous), and which satisfies the initial condition and
differential equation. As before, the derivative exists at all points of the
interval excluding <mathjax>$D$</mathjax>, the set of points where <mathjax>$f$</mathjax> is
discontinuous in <mathjax>$t$</mathjax>. If the Lipschitz condition holds globally, we can make
the interval as large as desired.</p>
<h2>Proof</h2>
<p>There are two pieces: the proof of existence and the proof of
uniqueness. Today will likely just be existence.</p>
<h2>Existence</h2>
<p>Roadmap: construct an infinite sequence of continuous functions defined
(recursively) as follows <mathjax>$x_{m+1}(t) = x_0 + \int_{t_0}^t f(x_m(\tau),
\tau) d\tau$</mathjax>. First, show that this sequence converges to a continuous
function <mathjax>$\fn{\Phi(\cdot)}{\Re_+}{\Re^n}$</mathjax> which solves the DE/IC pair.</p>
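<p>As an aside (not in the notes), this construction is the classical Picard
iteration, and it's easy to watch it converge numerically. Below is a minimal
sketch for the assumed scalar example <mathjax>$\dot{x} = -x$</mathjax>, <mathjax>$x(0) = 1$</mathjax>, whose
iterates approach <mathjax>$e^{-t}$</mathjax> in the sup norm.</p>
<pre><code>import numpy as np

def picard(f, x0, t, iterations=8):
    """Picard iterates x_{m+1}(t) = x0 + integral from t0 to t of f(x_m, tau)."""
    xs = [np.full_like(t, x0)]              # x_0(t): the constant function x0
    for _ in range(iterations):
        integrand = f(xs[-1], t)
        # cumulative trapezoidal rule approximates the running integral
        steps = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t)
        xs.append(x0 + np.concatenate(([0.0], np.cumsum(steps))))
    return xs

t = np.linspace(0.0, 2.0, 201)
iterates = picard(lambda x, tau: -x, 1.0, t)    # assumed example: f(x, t) = -x
print("sup-norm error of final iterate:",
      np.max(np.abs(iterates[-1] - np.exp(-t))))
</code></pre>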
<p>Would like to be able to prove the first thing here: I've constructed a
sequence, and I want to show that the limit of this sequence is a solution
to the differential equation.</p>
<p>The tool that I'm going to use is a property called Cauchy, and then I'm
going to invoke the result that if I have a complete space, any Cauchy
sequence on the space converges to something in the space. Gives me the
basis of the existence of the thing that this converges to.</p>
<p>Goal: (1) to show that this sequence is a Cauchy sequence in a
complete normed vector space, which means the sequence converges to
something in the space, and (2) to show that the limit of this sequence
satisfies the DE/IC pair.</p>
<p>A <strong>Cauchy sequence</strong> (on a normed vector space) is one where, for any
arbitrarily small <mathjax>$\epsilon &gt; 0$</mathjax>, there exists some point in the sequence
(some finite index <mathjax>$m$</mathjax>) beyond which the distance between any two later
points is less than <mathjax>$\epsilon$</mathjax>. In other words: if
we drop enough elements from the start of the sequence, the
distance between any remaining elements can be made arbitrarily small.</p>
<p>A <strong>Banach space</strong> (equivalently, a <strong>complete normed vector
space</strong>) is one in which all Cauchy sequences converge. Implicit in that:
they converge to something in the space itself.</p>
<p>Just an aside, a <strong>Hilbert space</strong> is a <strong>complete inner product
space</strong>. If you have an inner product space, and you define the norm in
that inner product space induced by that inner product, if all Cauchy
sequences of that space converge (to a limit in the space) with this norm,
then it is a Hilbert space.</p>
<p>Think about a Cauchy sequence on a space that converges to something not
necessarily in the space. Example: a sequence of rationals converging to an
irrational number (e.g. the convergents of its continued fraction expansion)
is Cauchy in <mathjax>$\mathbb{Q}$</mathjax>, but its limit lies outside <mathjax>$\mathbb{Q}$</mathjax>.</p>
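<p>A concrete toy version of this (my example, using Newton iterates rather
than continued-fraction convergents): kept as exact rationals, the iterates
below form a Cauchy sequence in <mathjax>$\mathbb{Q}$</mathjax>, but their limit <mathjax>$\sqrt{2}$</mathjax> is
not in <mathjax>$\mathbb{Q}$</mathjax>.</p>
<pre><code>from fractions import Fraction

x = Fraction(1)
seq = [x]
for _ in range(6):
    x = (x + 2 / x) / 2      # Newton step for sqrt(2); stays rational
    seq.append(x)

# successive gaps shrink rapidly, so the sequence is Cauchy in Q,
# yet the limit (sqrt 2) lies outside Q
print([float(abs(seq[i + 1] - seq[i])) for i in range(len(seq) - 1)])
</code></pre>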
<p>To show (1), we'll show that this sequence <mathjax>$\{x_m\}$</mathjax> that we constructed is
a Cauchy sequence in a Banach space. Interestingly, it matters what norm
you choose.</p>
<p><a name='8'></a></p>
<h1>Proof of Existence and Uniqueness Theorem</h1>
<h2>September 18, 2012</h2>
<p>Today:</p>
<ul>
<li>proof of existence and uniqueness theorem.</li>
<li>[ if time ] introduction to dynamical systems.</li>
</ul>
<p>First couple of weeks of review to build up basic concepts that we'll be
drawing upon throughout the course. Either today or Thursday we will launch
into linear system theory.</p>
<p>We're going to recall where we were last time. We had the fundamental
theorem of differential equations, which said the following: if we had a
differential equation, <mathjax>$\dot{x} = f(x,t)$</mathjax>, with initial condition <mathjax>$x(t_0) =
x_0$</mathjax>, where <mathjax>$x(t) \in \Re^n$</mathjax>, etc, if <mathjax>$f( \cdot , t)$</mathjax> is Lipschitz
continuous, and <mathjax>$f(x, \cdot )$</mathjax> is piecewise continuous, then there exists a
unique solution to the differential equation / initial condition pair (some
function <mathjax>$\phi(t)$</mathjax>) wherever you can take the derivative (may not be
differentiable everywhere: loses differentiability on the points where
discontinuities exist).</p>
<p>We spent quite a lot of time discussing Lipschitz continuity. The job is
usually to test both conditions; the first one requires work. We described a
popular candidate function by looking at the mean value theorem and
applying it to <mathjax>$f$</mathjax>: a norm of the Jacobian provides a candidate
Lipschitz function, if it works.</p>
<p>We also described local Lipschitz continuity, and often, when using a norm
of the Jacobian, that's fairly easy to show.</p>
<p>Important point to recall: a norm of the Jacobian of <mathjax>$f$</mathjax> provides a
candidate Lipschitz function.</p>
<p>Another important thing to say here is that we can use any norm we want, so
we can be creative in our choice of norm when looking for a better bound.</p>
<p>We started our proof last day, and we talked a little about the structure
of the proof. We are going to proceed by constructing a sequence of
functions, then show (1) that it converges to a solution, then show (2)
that it is unique.</p>
<h2>Proof of Existence</h2>
<p>We are going to construct this sequence of functions as follows:
<mathjax>$x_{m+1}(t) = x_0 + \int_{t_0}^t f(x_m(\tau), \tau) d\tau$</mathjax>. Here we're dealing with
an arbitrary interval from <mathjax>$t_1$</mathjax> to <mathjax>$t_2$</mathjax>, with <mathjax>$t_0 \in [t_1, t_2]$</mathjax>. We
want to show that this sequence is a Cauchy sequence, and we're going to
rely on our knowledge that the space these functions are defined in is a
Banach space (hence this sequence converges to something in the space).</p>
<p>We have to put a norm on this space of functions, and we'll use the infinity
norm. Not going to prove it, but rather state, that this normed space is a
Banach space. If we show that this is a Cauchy sequence, then the limit of
that Cauchy sequence exists in the space. The reason that's interesting is
that it's this limit that provides a candidate solution for this differential
equation.</p>
<p>We will then prove that this limit satisfies the DE/IC pair. That is
adequate to show existence. We'll then go on to prove uniqueness.</p>
<p>Our immediate goal is to show that this sequence is Cauchy; that is, to show
that for any <mathjax>$\epsilon &gt; 0$</mathjax> there exists an index <mathjax>$m$</mathjax> such that
<mathjax>$\mag{x_{m+p} - x_m} &lt; \epsilon$</mathjax> for all <mathjax>$p$</mathjax>.</p>
<p>First let us look at the difference between <mathjax>$x_{m+1}$</mathjax> and <mathjax>$x_m$</mathjax>. These are
just functions of time, and we can compute this: <mathjax>$\mag{x_{m+1}(t) - x_m(t)} =
\mag{\int_{t_0}^t \parens{f(x_m, \tau) - f(x_{m-1}, \tau)} d\tau}$</mathjax>. Use the fact that <mathjax>$f$</mathjax>
is Lipschitz continuous, and so this is <mathjax>$\le \int_{t_0}^t k(\tau)\mag{x_m(\tau) -
x_{m-1}(\tau)} d\tau$</mathjax>. The Lipschitz function <mathjax>$k$</mathjax> is piecewise continuous, so it
has a supremum on this interval. Let <mathjax>$\bar{k}$</mathjax> be the supremum of <mathjax>$k$</mathjax> over
the whole interval <mathjax>$[t_1, t_2]$</mathjax>. This means that we can take this
inequality and rewrite it as <mathjax>$\mag{x_{m+1}(t) - x_m(t)} \le \bar{k} \int_{t_0}^t
\mag{x_m(\tau) - x_{m-1}(\tau)} d\tau$</mathjax>. Now we have a bound that relates
the distance between subsequent elements; we can chain it to bound the
distance between elements further apart.</p>
<p>Let us do two things: sort out the integral on the right-hand-side, then
look at arbitrary elements beyond an index.</p>
<p>We know that <mathjax>$x_1(t) = x_0 + \int_{t_0}^t f(x_0, \tau) d\tau$</mathjax>, and that <mathjax>$\mag{x_1(t)
- x_0} \le \int_{t_0}^{t} \mag{f(x_0, \tau)} d\tau \le \int_{t_1}^{t_2}
\mag{f(x_0, \tau)} d\tau \defequals M$</mathjax>. From the above inequalities,
<mathjax>$\mag{x_2 - x_1} \le M \bar{k}\abs{t - t_0}$</mathjax>. Now I can look at general
bounds: <mathjax>$\mag{x_3 - x_2} \le \frac{M\bar{k}^2 \abs{t - t_0}^2}{2!}$</mathjax>. In general,
<mathjax>$\mag{x_{m+1} - x_m} \le \frac{M\parens{\bar{k} \abs{t - t_0}}^m}{m!}$</mathjax>.</p>
<p>These are pointwise bounds: what I've been doing up to now is looking at a
particular value of <mathjax>$t$</mathjax>, with <mathjax>$t_1 &lt; t &lt; t_2$</mathjax>. What we actually need is a
bound in the function norm.</p>
<p>Try to relate this to the norm <mathjax>$\mag{x_{m+1} - x_m}_\infty$</mathjax>. Can what we've
done so far give us a bound on the difference between two functions? It can,
because the infinity norm of a function is the maximum value that the
function assumes (the maximum vector norm over all points <mathjax>$t$</mathjax> in the interval
we're interested in). If we let <mathjax>$T = t_2 - t_1$</mathjax> be the length of the interval,
we can use the previous result on the pointwise norm: if a pointwise norm is
less than a bound for all relevant <mathjax>$t$</mathjax>, then the function's maximum value,
and hence its function norm, must be less than the same bound. Thus
<mathjax>$\mag{x_{m+1} - x_m}_\infty \le \frac{M(\bar{k}T)^m}{m!}$</mathjax>.</p>
<p>That gets us on the road we want to be, since that now gets us a bound. We
can now go back to where we started. What we're actually interested in is
given an index <mathjax>$m$</mathjax>, we can construct a bound on all later elements in the
sequence.</p>
<p><mathjax>$\mag{x_{m+p} - x_m}_\infty = \mag{x_{m+p} - x_{m+p-1} + x_{m+p-1} - \dots -
x_m}_\infty = \mag{\sum_{k=0}^{p-1} (x_{m+k+1} - x_{m+k})}_\infty \le M \sum_{k=0}^{p-1}
\frac{(\bar{k}T)^{m+k}}{(m+k)!}$</mathjax>.</p>
<p>We're going to recall a few things from undergraduate calculus: Taylor
expansion of the exponential function and <mathjax>$(m+k)! \ge m!k!$</mathjax>.</p>
<p>With these, we can say that <mathjax>$\mag{x_{m+p} - x_m}_\infty \le
M\frac{(\bar{k}T)^m}{m!} e^{\bar{k} T}$</mathjax>. What we'd like to show is that this
can be made arbitrarily small as <mathjax>$m$</mathjax> gets large. We study this bound as <mathjax>$m
\to \infty$</mathjax>, and we recall (e.g. from the Stirling approximation) that the
factorial grows faster than the exponential function. That
is enough to show that <mathjax>$\{x_m\}_0^\infty$</mathjax> is Cauchy. Since it lives in a
Banach space (not proven here, since that's beyond our scope), it converges
to a function (call it <mathjax>$x^\ell$</mathjax>) in the same space.</p>
<p>Now we just need to show that the limit <mathjax>$x^\ell$</mathjax> solves the differential
equation (and initial condition). Let's go back to the sequence that
determines <mathjax>$x^\ell$</mathjax>: <mathjax>$x_{m+1} = x_0 + \int_{t_0}^t f(x_m, \tau)
d\tau$</mathjax>. We've proven that this sequence converges to <mathjax>$x^\ell$</mathjax>. What we want to
show is that <mathjax>$\int_{t_0}^t f(x_m, \tau) d\tau \to \int_{t_0}^t f(x^\ell, \tau)
d\tau$</mathjax>. This would be immediate if <mathjax>$f$</mathjax> were continuous in the appropriate
sense; here the Lipschitz condition does the work. It's clear that the limit
satisfies the initial condition by the construction of the sequence, but we
need to show that it satisfies the differential equation. Conceptually, this
is probably more difficult than what we've just done (establishing bounds,
Cauchy sequences): thinking about what that function limit is and what it
means for it to satisfy that differential equation.</p>
<p>Now, you can basically use some of the machinery we've been using all along
to show this. Difference between these goes to <mathjax>$0$</mathjax> as <mathjax>$m$</mathjax> gets large.</p>
<p><mathjax>$$\mag{\int_{t_0}^t \parens{f(x_m, \tau) - f(x^\ell, \tau)} d\tau}
\\ \le \int_{t_0}^t k(\tau) \mag{x_m - x^\ell} d\tau \le \bar{k}\mag{x_m - x^\ell}_\infty T
\\ \le \bar{k} M e^{\bar{k} T} \frac{(\bar{k} T)^m}{m!}T
$$</mathjax></p>
<p>Thus <mathjax>$x^\ell$</mathjax> solves the DE/IC pair. That is, a solution <mathjax>$\Phi$</mathjax> is <mathjax>$x^\ell$</mathjax>:
<mathjax>$\dot{x}^\ell(t) = f(x^\ell(t), t) \;\forall t \in [t_1, t_2] - D$</mathjax> and <mathjax>$x^\ell(t_0) =
x_0$</mathjax>.</p>
<p>To show that this solution is unique, we will use the Bellman-Gronwall
lemma, which is very important. Used ubiquitously when you want to show
that functions of time are equal to each other: candidate mechanism to do
that.</p>
<h2>Bellman-Gronwall Lemma</h2>
<p>Let <mathjax>$u, k$</mathjax> be real-valued, nonnegative, piecewise-continuous functions of
time, and let <mathjax>$c_1 \ge 0$</mathjax> and <mathjax>$t_0 \ge 0$</mathjax> be constants. If we have such
constants and functions, then the following is true: if <mathjax>$u(t) \le c_1 +
\int_{t_0}^t k(\tau)u(\tau) d\tau$</mathjax>, then <mathjax>$u(t) \le c_1 e^{\int_{t_0}^t
k(\tau) d\tau}$</mathjax>.</p>
<h2>Proof (of B-G)</h2>
<p>Take <mathjax>$t &gt; t_0$</mathjax> WLOG. Define <mathjax>$U(t)$</mathjax> as the right-hand side of the
hypothesis, so that <mathjax>$U(t_0) = c_1$</mathjax> and <mathjax>$\dot{U} = ku$</mathjax>.</p>
<p><mathjax>$$U(t) \defequals c_1 + \int_{t_0}^t k(\tau) u(\tau) d\tau
\\ u(t) \le U(t) \implies \dot{U}(t) = k(t)u(t) \le k(t)U(t)
\\ \implies \deriv{}{t}\parens{U(t)e^{-\int_{t_0}^t k(\tau) d\tau}} = \parens{\dot{U}(t) - k(t)U(t)} e^{-\int_{t_0}^t k(\tau) d\tau} \le 0 \text{ (integrate from } t_0 \text{ to } t \text{, noting } U(t_0) = c_1 \text{)}
\\ \implies u(t) \le U(t) \le c_1 e^{\int_{t_0}^t k(\tau) d\tau}
$$</mathjax></p>
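<p>Not in the notes, but a quick numerical sanity check of the lemma is easy to
run. The data below (<mathjax>$k \equiv 1$</mathjax>, <mathjax>$c_1 = 1$</mathjax>, and a <mathjax>$u$</mathjax> chosen so the
hypothesis holds) are assumed purely for illustration.</p>
<pre><code>import numpy as np

def cumtrapz(y, t):
    """Cumulative trapezoidal integral of y over the grid t, starting at 0."""
    return np.concatenate(([0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(t))))

t = np.linspace(0.0, 3.0, 301)
c1 = 1.0
k = np.ones_like(t)                  # assumed k(t) = 1
u = np.exp(0.5 * t)                  # assumed u(t), satisfies the hypothesis

hypothesis = np.all(u <= c1 + cumtrapz(k * u, t) + 1e-9)   # u <= c1 + int k u
conclusion = np.all(u <= c1 * np.exp(cumtrapz(k, t)) + 1e-9)
print(hypothesis, conclusion)        # True True, as the lemma promises
</code></pre>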
<h2>Using this to prove uniqueness of DE/IC solutions</h2>
<p>Here is how we're going to use the B-G lemma to do that.</p>
<p>We have the solution <mathjax>$\Phi$</mathjax> that we constructed, and someone else gives us a
solution <mathjax>$\Psi$</mathjax>, constructed via a different method. We show that these must
be equivalent. Since they're both solutions, they both satisfy the DE/IC
pair; subtracting the corresponding integral equations and taking norms,</p>
<p><mathjax>$$\mag{\Phi(t) - \Psi(t)} \le \bar{k} \int_{t_0}^t \mag{\Phi(\tau) - \Psi(\tau)} d\tau \quad \forall
t_0, t \in [t_1, t_2]$$</mathjax></p>
<p>This is the hypothesis of the Bellman-Gronwall lemma with <mathjax>$c_1 = 0$</mathjax>, so
<mathjax>$\mag{\Phi(t) - \Psi(t)} \le c_1 e^{\bar{k}(t - t_0)} = 0$</mathjax>. By positive
definiteness of the norm, <mathjax>$\Phi(t) = \Psi(t)$</mathjax> for all <mathjax>$t$</mathjax>, and so the
functions are equal to each other.</p>
<h2>Reverse time differential equation</h2>
<p>We think about time as monotonic (either increasing or decreasing, usually
increasing). Suppose instead that time is decreasing, and we have <mathjax>$\dot{x} =
f(x,t)$</mathjax>; let's explore existence and uniqueness going backwards in
time. Suppose we had a time variable <mathjax>$\tau$</mathjax> which runs backwards from
<mathjax>$t_0$</mathjax>, defined by <mathjax>$\tau \defequals t_0 - t$</mathjax>. We define the solution to the
differential equation backwards in time as <mathjax>$z(\tau) = x(t)$</mathjax> for <mathjax>$t \le t_0$</mathjax>.
Deriving the reverse-time derivative, the right-hand side is just <mathjax>$-f$</mathjax>; we'll
use <mathjax>$\bar{f}$</mathjax> to represent this function
(<mathjax>$\deriv{}{\tau}z = -\deriv{}{t}x = -f(x, t) = -f(z, t_0 - \tau) \defequals
\bar{f}(z, \tau)$</mathjax>).</p>
<p>If I solve this reverse-time differential equation, I'll have the
corresponding backwards solution. Concluding statement: we can think
about solutions forwards and backwards in time. Existence of a unique
solution forward in time means existence of a unique solution backward in
time (and vice versa). One consequence: solution trajectories cannot cross
themselves in time-invariant systems.</p>
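<p>A small numerical sketch of this idea (the dynamics are an assumed example):
integrate forward, then integrate the reverse-time equation from the endpoint;
by uniqueness we recover the initial condition.</p>
<pre><code>import numpy as np
from scipy.integrate import solve_ivp

f = lambda t, x: -x + np.sin(t)      # assumed example dynamics
t0, t1 = 0.0, 2.0
x0 = [1.0]

fwd = solve_ivp(f, (t0, t1), x0, rtol=1e-10, atol=1e-12)
x_end = fwd.y[:, -1]

# reverse time, here measured backwards from t1: zdot = -f(t1 - tau, z)
fbar = lambda tau, z: -f(t1 - tau, z)
bwd = solve_ivp(fbar, (0.0, t1 - t0), x_end, rtol=1e-10, atol=1e-12)
print(bwd.y[:, -1], "recovers x0 =", x0)
</code></pre>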
<p><a name='9'></a></p>
<h1>Introduction to dynamical systems</h1>
<h2>September 20, 2012</h2>
<p>Suppose we have equations <mathjax>$\dot{x} = f(x, u, t)$</mathjax>, <mathjax>$\fn{f}{\Re^n \times
\Re^{n_i} \times \Re_+}{\Re^n}$</mathjax> and <mathjax>$y = h(x, u, t)$</mathjax>, <mathjax>$\fn{h}{\Re^n \times
\Re^{n_i} \times \Re_+}{\Re^{n_o}}$</mathjax>. We define <mathjax>$n_i$</mathjax> as the dimension of the input
space, <mathjax>$n_o$</mathjax> as the dimension of the output space, and <mathjax>$n$</mathjax> as the dimension of
the state space.</p>
<p>We've looked at the form, and if we specify a particular <mathjax>$\bar{u}(t)$</mathjax> over some
time interval of interest, then we can plug this into the right hand side
of this differential equation. Typically we do not supply a particular
input. Thinking about solutions to this differential equation, for now,
let's suppose that it's specified.</p>
<p>Suppose instead we have some feedback function of the state (say <mathjax>$u = g(x)$</mathjax>);
substituting gives a closed-loop right-hand side <mathjax>$\bar{f}(x, t) = f(x, g(x), t)$</mathjax>.
As long as <mathjax>$\bar{f}$</mathjax> satisfies the conditions for the existence and
uniqueness theorem, we again have a differential equation we can solve.</p>
<p>Another example: instead of differential equation (which corresponds to
continuous time), we have a difference equation (which corresponds to
discrete time).</p>
<p>Example: dynamic system represented by an LRC circuit. One practical way to
define the state <mathjax>$x$</mathjax> is as a vector of elements whose derivatives appear in
our differential equation. Not formal, but practical for this example.</p>
<p>Notions of discretizing.</p>
<h2>What is a dynamical system?</h2>
<p>As discussed in first lecture, we consider time <mathjax>$\tau$</mathjax> to be a privileged
variable. Based on our definition of time, the inputs and outputs are all
functions of time.</p>
<p>Now we're going to define a <strong>dynamical system</strong> as a 5-tuple: <mathjax>$(\mathcal{U},
\Sigma, \mathcal{Y}, s, r)$</mathjax> (input space, state space, output space, state
transition function, output map).</p>
<p>We define the <strong>input space</strong> as the set of input functions over time to an
input set <mathjax>$U$</mathjax> (i.e. <mathjax>$\mathcal{U} = \{\fn{u}{\tau}{U}\}$</mathjax>; typically <mathjax>$U =
\Re^{n_i}$</mathjax>).</p>
<p>We also define the <strong>output space</strong> as the set of output functions over time to
an output set <mathjax>$Y$</mathjax> (i.e. <mathjax>$\mathcal{Y} = \{\fn{y}{\tau}{Y}\}$</mathjax>). Typically, <mathjax>$Y
= \Re^{n_o}$</mathjax>.</p>
<p><mathjax>$\Sigma$</mathjax> is our <strong>state space</strong>. Not defined as the function, but the actual
state space. Typically, <mathjax>$\Sigma = \Re^n$</mathjax>, and we can go back and think
about the function <mathjax>$x(t) \in \Sigma$</mathjax>. <mathjax>$\fn{x}{\tau}{\Sigma}$</mathjax> is called the
state trajectory.</p>
<p><mathjax>$s$</mathjax> is called the <strong>state transition function</strong> because it defines how the
state changes in response to time and the initial state and the
input. <mathjax>$\fn{s}{\tau \times \tau \times \Sigma \times U }{\Sigma}$</mathjax>. Usually
we write this as <mathjax>$x(t_1) = s(t_1, t_0, x_0, u)$</mathjax>, where <mathjax>$u$</mathjax> is the function
<mathjax>$u(\cdot) |_{t_0}^{t_1}$</mathjax>. This is important: coming towards how we define
state. Only things you need to get to state at the new time are the initial
state, inputs, and dynamics.</p>
<p>Finally, we have this <strong>output map</strong> (sometimes called the readout map)
<mathjax>$r$</mathjax>. <mathjax>$\fn{r}{\tau \times \Sigma \times U}{Y}$</mathjax>. That is, we can think about
<mathjax>$y(t) = r(t, x(t), u(t))$</mathjax>. There's something fundamentally different
between <mathjax>$r$</mathjax> and <mathjax>$s$</mathjax>. <mathjax>$s$</mathjax> depended on the function <mathjax>$u$</mathjax>, whereas <mathjax>$r$</mathjax> only
depended on the current value of <mathjax>$u$</mathjax> at a particular time.</p>
<p><mathjax>$s$</mathjax> captures dynamics, while <mathjax>$r$</mathjax> is static. Remark: <mathjax>$s$</mathjax> has dynamics
(memory) -- things that depend on previous time, whereas <mathjax>$r$</mathjax> is static:
everything it depends on is at the current time (memoryless).</p>
<p>In order to be a dynamical system, we need to satisfy two axioms: a
dynamical system is a five-tuple with the following two axioms:</p>
<ul>
<li>The <strong>state transition axiom</strong>: <mathjax>$\forall t_1 \ge t_0$</mathjax>, given <mathjax>$u, \tilde{u}$</mathjax>
that are equal to each other over a particular time interval, the state
transition functions must be equal over that interval, i.e. <mathjax>$s(t_1, t_0,
x_0, u) = s(t_1, t_0, x_0, \tilde{u})$</mathjax>. Requires us to not have
dependence on the input outside of the time interval of interest.</li>
<li>The <strong>semigroup axiom</strong>: suppose you start a system at <mathjax>$t_0$</mathjax> and evolve it to
<mathjax>$t_2$</mathjax>, and you're considering the state. You have an input <mathjax>$u$</mathjax> defined
over the whole time interval. If you were to look at an intermediate
point <mathjax>$t_1$</mathjax>, and you computed the state at <mathjax>$t_1$</mathjax> via the state transition
function, we can split our time interval into two intervals, and we can
compute the result any way we like. Stated as the following: <mathjax>$s(t_2, t_1,
s(t_1, t_0, x_0, u), u) = s(t_2, t_0, x_0, u)$</mathjax>.</li>
</ul>
<p>When we talk about a dynamical system, we have to satisfy these axioms.</p>
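<p>A hedged computational sketch of the 5-tuple (the scalar system
<mathjax>$\dot{x} = -x + u(t)$</mathjax>, <mathjax>$y = x$</mathjax> is my assumed example): the state transition
function comes from solving the ODE, the readout is static, and the semigroup
axiom can be checked numerically.</p>
<pre><code>import numpy as np
from scipy.integrate import solve_ivp

def s(t1, t0, x0, u):
    """State transition: evolve xdot = -x + u(t) from (t0, x0) to time t1."""
    sol = solve_ivp(lambda t, x: -x + u(t), (t0, t1), [x0],
                    rtol=1e-10, atol=1e-12)
    return sol.y[0, -1]

def r(t, x, u_now):
    """Readout map: static, uses only current values (memoryless)."""
    return x

u = np.sin                            # an input signal defined for all time
t0, t1, t2, x0 = 0.0, 1.0, 2.5, 3.0

direct = s(t2, t0, x0, u)
via_t1 = s(t2, t1, s(t1, t0, x0, u), u)
print(direct, via_t1)   # agree up to solver tolerance: the semigroup axiom
</code></pre>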
<h2>Response function</h2>
<p>Since we're interested in the outputs and not the states, we can define
what we call the <strong>response map</strong>. It's not considered part of the definition
of a dynamical system because it can be easily derived.</p>
<p>It's the composition of the state transition function and the readout map,
i.e. <mathjax>$y(t) = r(t, x(t), u(t)) = r(t, s(t, t_0, x_0, u), u(t)) \defequals
\rho(t, t_0, x_0, u)$</mathjax>. This is an important function because it is used to
define properties of a dynamical system. Why is that? We've said that
states are somehow mysterious. Not something we typically care about:
typically we care about the outputs. Thus we define properties like
linearity and time invariance.</p>
<h2>Time Invariance</h2>
<p>We define a time-shift operator <mathjax>$\fn{T_\tau}{\mathcal{U}}{\mathcal{U}}$</mathjax>,
<mathjax>$\fn{T_\tau}{\mathcal{Y}}{\mathcal{Y}}$</mathjax>. <mathjax>$(T_\tau u)(t) \defequals u(t -
\tau)$</mathjax>. Namely, the value of <mathjax>$T_\tau u$</mathjax> is that of the old signal at
<mathjax>$t-\tau$</mathjax>.</p>
<p>A <strong>time-invariant</strong> (dynamical) system is one in which the input space and
output space are closed under <mathjax>$T_\tau$</mathjax> for all <mathjax>$\tau$</mathjax>, and <mathjax>$\rho(t, t_0,
x_0, u) = \rho(t + \tau, t_0 + \tau, x_0, T_\tau u)$</mathjax>.</p>
<h2>Linearity</h2>
<p>A <strong>linear</strong> dynamical system is one in which the input, state, and output
spaces are all linear spaces over the same field <mathjax>$\mathbb{F}$</mathjax>, and the
response map <mathjax>$\rho$</mathjax> is a linear map of <mathjax>$\Sigma \times \mathcal{U}$</mathjax> into
<mathjax>$\mathcal{Y}$</mathjax>.</p>
<p>This is a strict requirement: you have to check that the response map
satisfies these conditions. Question that comes up: why do we define
linearity of a dynamical system in terms of linearity of the response and
not the state transition function? Goes back to a system being
intrinsically defined by its inputs and outputs. Often states, you can have
many different ways to define states. Typically we can't see all of
them. It's accepted that when we talk about a system and think about its
I/O relations, it makes sense that we define linearity in terms of this
memory function of the system, as opposed to the state transition function.</p>
<p>Let's just say a few remarks about this: <strong>zero-input response</strong>,
<strong>zero-state response</strong>. If we look at the zero element in our spaces (so
we have a zero vector), then we can take our superposition, which implies
that the response at time <mathjax>$t$</mathjax> is equal to the zero-state response, which is
the response, given that we started at the zero state, plus the zero input
response.</p>
<p>That is: <mathjax>$\rho(t, t_0, x_0, u) = \rho(t, t_0, \theta_x, u) + \rho(t, t_0,
x_0, \theta_u)$</mathjax> (from the definition of linearity).</p>
<p>The second remark is that the zero-state response is linear in the input,
and similarly, the zero-input response is linear in the state.</p>
<p>One more property of dynamical systems before we finish: <strong>equivalence</strong> (a
property derived from the definition). Take two dynamical systems <mathjax>$D = (\mathcal{U},
\Sigma, \mathcal{Y}, s, r)$</mathjax>, <mathjax>$\tilde{D} = (\mathcal{U}, \tilde{\Sigma}, \mathcal{Y}, \tilde{s}, \tilde{r})$</mathjax>. The state <mathjax>$x_0$</mathjax>
of <mathjax>$D$</mathjax> is equivalent to <mathjax>$\tilde{x}_0$</mathjax> of <mathjax>$\tilde{D}$</mathjax> at <mathjax>$t_0$</mathjax> if <mathjax>$\forall t
\ge t_0$</mathjax> and for every input <mathjax>$u$</mathjax>, <mathjax>$\rho(t, t_0, x_0, u) = \tilde{\rho}(t, t_0, \tilde{x}_0, u)$</mathjax>.
The two systems are equivalent if for every <mathjax>$x_0$</mathjax> there is such an equivalent
<mathjax>$\tilde{x}_0$</mathjax> (and vice versa).</p>
<p><a name='10'></a></p>
<h1>Linear time-varying systems</h1>
<h2>September 25, 2012</h2>
<p>Recall that the state transition function gives the state at the current
time from the initial state, initial time, and input. Suppose you have a
differential equation; how do you acquire the state transition function?
Solve the differential equation.</p>
<p>For a general dynamical system, there are different ways to get the state
transition function. This is an instantiation of a dynamical system, and
we're going to get the state transition function by solving the
differential equation / initial condition pair.</p>
<p>We're going to call <mathjax>$\dot{x}(t) = A(t)x(t) + B(t)u(t)$</mathjax> a vector
differential equation with initial condition <mathjax>$x(t_0) = x_0$</mathjax>.</p>
<p>So that requires us to think about solving that differential equation. Do a
dimension check, to make sure we know the dimensions of the matrices: <mathjax>$x
\in \Re^n$</mathjax>, so <mathjax>$A(t) \in \Re^{n \times n}$</mathjax>. We can regard <mathjax>$A$</mathjax> as a
matrix-valued function of time, <mathjax>$\fn{A}{\Re_+}{\Re^{n \times n}}$</mathjax>; as a
function, <mathjax>$A$</mathjax> is a piecewise-continuous matrix function in time.</p>
<p>The entries are piecewise-continuous scalars in time. We would like to get
at the state transition function; to do that, we need to solve the
differential equation.</p>
<p>Let's assume for now that <mathjax>$A, B, u$</mathjax> are given (part of the system
definition).</p>
<p>To apply the existence and uniqueness theorem: piecewise continuity in <mathjax>$t$</mathjax>
is immediate, and we can use the induced norm of <mathjax>$A(t)$</mathjax> as the Lipschitz
function <mathjax>$k(t)$</mathjax>. Since this induced norm is piecewise continuous in
time, this is a fine bound, and <mathjax>$f$</mathjax> is globally Lipschitz continuous in <mathjax>$x$</mathjax>.</p>
<p>We're going to back off for a bit and introduce the state transition
matrix. Background for solving the VDE. We're going to introduce a matrix
differential equation, <mathjax>$\dot{X} = A(t) X$</mathjax> (where <mathjax>$A(t)$</mathjax> is same as before).</p>
<p>I'm going to define <mathjax>$\Phi(t, t_0)$</mathjax> as the solution to the matrix
differential equation (MDE) for the initial condition <mathjax>$\Phi(t_0, t_0) =
1_{n \times n}$</mathjax>. I'm going to define <mathjax>$\Phi$</mathjax> as the solution to the <mathjax>$n
\times n$</mathjax> matrix when my differential equation starts out in the identity
matrix.</p>
<p>Let's first talk about properties of this matrix <mathjax>$\Phi$</mathjax> just from the
definition we have.</p>
<ul>
<li>If you go back to the vector differential equation and drop
the term that depends on <mathjax>$u$</mathjax> (either consider <mathjax>$B$</mathjax> to be 0, or the input
to be 0), the solution of <mathjax>$\dot{x} = A(t)x(t)$</mathjax> is given by <mathjax>$x(t) =
\Phi(t, t_0)x_0$</mathjax>.</li>
<li>This is what we call the semigroup property, since it's reminiscent of
the semigroup axiom. <mathjax>$\Phi(t, t_0) = \Phi(t, t_1) \Phi(t_1, t_0) \forall
t, t_0, t_1 \in \Re^+$</mathjax></li>
<li><mathjax>$\Phi^{-1}(t, t_0) = \Phi(t_0, t)$</mathjax>.</li>
<li><mathjax>$\text{det} \Phi(t, t_0) = \exp\parens{\int_{t_0}^t \text{tr} \parens{A
(\tau)} d\tau}$</mathjax>.</li>
</ul>
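<p>These properties are easy to check numerically for an assumed <mathjax>$A(t)$</mathjax>; the
sketch below computes <mathjax>$\Phi$</mathjax> by integrating the matrix differential equation
and tests properties (2), (3), and (4).</p>
<pre><code>import numpy as np
from scipy.integrate import solve_ivp

n = 2
A = lambda t: np.array([[0.0, 1.0], [-2.0 - np.sin(t), -0.5]])  # assumed A(t)

def Phi(t, t0):
    """Solve Xdot = A(t) X, X(t0) = I, flattening X into a vector."""
    if t == t0:
        return np.eye(n)
    rhs = lambda tau, x: (A(tau) @ x.reshape(n, n)).ravel()
    sol = solve_ivp(rhs, (t0, t), np.eye(n).ravel(), rtol=1e-10, atol=1e-12)
    return sol.y[:, -1].reshape(n, n)

t0, t1, t2 = 0.0, 1.0, 2.0
print(np.allclose(Phi(t2, t0), Phi(t2, t1) @ Phi(t1, t0)))      # property (2)
print(np.allclose(np.linalg.inv(Phi(t1, t0)), Phi(t0, t1)))     # property (3)
trace_int = solve_ivp(lambda tau, y: [np.trace(A(tau))], (t0, t1), [0.0],
                      rtol=1e-10, atol=1e-12).y[0, -1]
print(np.isclose(np.linalg.det(Phi(t1, t0)), np.exp(trace_int)))  # property (4)
</code></pre>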
<p>Here let's talk about some machinery we can now invoke when
we want to show that two functions of time are equal to each other when
they're both solutions of the same differential equation. By
the existence and uniqueness theorem (assuming it applies), you can simply
show that they satisfy the same initial condition and the same differential
equation. That's an important point, and we tend to use it a lot.</p>
<p>(i.e. when faced with showing that two functions of time are equal to each
other, you can show that they both satisfy the same initial condition and
the same differential equation [as long as the differential equation
satisfies the hypotheses of the existence and uniqueness theorem])</p>
<p>Obvious, but good to state.</p>
<p>Note: the initial condition doesn't have to be the initial condition given;
it just has to hold at one point in the interval. Pick your point in time
judiciously.</p>
<p>Proof of (2): both sides satisfy the same matrix differential equation in
<mathjax>$t$</mathjax> and agree at <mathjax>$t=t_1$</mathjax>, so by uniqueness they are equal. (3) follows
directly from (2) by setting <mathjax>$t = t_0$</mathjax>. (4) you can look at if you want; it
gives you a way to compute <mathjax>$\det \Phi(t, t_0)$</mathjax>. We've
introduced a matrix differential equation and an abstract solution.</p>
<p>Consider (1). <mathjax>$\Phi(t, t_0)$</mathjax> is a map that takes the initial state and
transitions to the new state. Thus we call <mathjax>$\Phi$</mathjax> the <strong>state transition
matrix</strong> because of what it does to the states of this vector differential
equation: it transfers them from their initial value to their final value,
and it transfers them through matrix multiplication.</p>
<p>Let's go back to the original differential equation. Claim that the
solution to that differential equation has the following form: <mathjax>$x(t) =
\Phi(t, t_0)x_0 + \int_{t_0}^t \Phi(t, \tau)B(\tau)u(\tau) d\tau$</mathjax>. Proof:
we can use the same machinery. If someone gives you a candidate solution,
you can easily show that it is the solution.</p>
<p>Recall the Leibniz rule, which we'll state in general as follows:
<mathjax>$\deriv{}{z} \int_{a(z)}^{b(z)} f(x, z) dx = \int_{a(z)}^{b(z)}
\pderiv{}{z}f(x, z) dx + \deriv{b}{z} f(b, z) - \deriv{a}{z} f(a, z)$</mathjax>.</p>
<p><mathjax>$$
\dot{x}(t) = A(t) \Phi(t, t_0) x_0 + \int_{t_0}^t
\pderiv{}{t} \parens{\Phi(t, \tau)B(\tau)u(\tau)} d\tau +
\deriv{t}{t}\,\Phi(t, t)B(t)u(t) - \deriv{t_0}{t}\,\Phi(t, t_0)B(t_0)u(t_0)
\\ = A(t)\Phi(t, t_0)x_0 + \int_{t_0}^t A(t)\Phi(t,\tau)B(\tau)u(\tau)d\tau + B(t)u(t)
\\ = A(t)\parens{\Phi(t, t_0) x_0 + \int_{t_0}^t \Phi(t, \tau)B(\tau)
u(\tau) d\tau} + B(t) u(t) = A(t)x(t) + B(t)u(t)
$$</mathjax></p>
<p><mathjax>$x(t) = \Phi(t,t_0)x_0 + \int_{t_0}^t \Phi(t,\tau)B(\tau)u(\tau) d\tau$</mathjax> is
good to remember.</p>
<p>Not surprisingly, it depends on the input function over an interval of
time.</p>
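<p>The formula is also straightforward to verify numerically; the sketch below
(with assumed <mathjax>$A(t), B(t), u(t)$</mathjax>) compares it against direct integration of
the vector differential equation.</p>
<pre><code>import numpy as np
from scipy.integrate import solve_ivp, trapezoid

n = 2
A = lambda t: np.array([[0.0, 1.0], [-2.0, -0.5 * np.cos(t)]])  # assumed
B = lambda t: np.array([[0.0], [1.0]])                          # assumed
u = lambda t: np.array([np.sin(t)])                             # assumed input

def Phi(t, t0):
    """State transition matrix via the matrix differential equation."""
    if t == t0:
        return np.eye(n)
    rhs = lambda tau, x: (A(tau) @ x.reshape(n, n)).ravel()
    return solve_ivp(rhs, (t0, t), np.eye(n).ravel(),
                     rtol=1e-10, atol=1e-12).y[:, -1].reshape(n, n)

t0, t, x0 = 0.0, 2.0, np.array([1.0, 0.0])

# direct integration of xdot = A(t) x + B(t) u(t)
direct = solve_ivp(lambda tau, x: A(tau) @ x + B(tau) @ u(tau),
                   (t0, t), x0, rtol=1e-10, atol=1e-12).y[:, -1]

# variation-of-constants formula, with the integral done by quadrature
taus = np.linspace(t0, t, 101)
vals = np.stack([Phi(t, tau) @ B(tau) @ u(tau) for tau in taus])
formula = Phi(t, t0) @ x0 + trapezoid(vals, taus, axis=0)
print(np.max(np.abs(direct - formula)))   # small: the two solutions agree
</code></pre>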
<p>The differential equation is changing over time, therefore the system
itself is time-varying. No way in general that will be time-invariant,
since the equation that defines its evolution is changing. You test
time-invariance or time variance through the response map. But is it
linear? You have the state transition function, so we can compute the
response function (recall: readout map composed with the state transition
function) and ask if this is a linear map.</p>
<p><a name='11'></a></p>
<h1>Linear time-invariant systems</h1>
<h2>September 27, 2012</h2>
<p>Last time, we talked about the time-varying differential equation, and we
expressed it as <mathjax>$R(\cdot) = \bracks{A(\cdot), B(\cdot), C(\cdot),
D(\cdot)}$</mathjax>. We used the state transition matrix to show that the solution is
given by <mathjax>$x(t) = \Phi(t, t_0) x_0 + \int_{t_0}^t \Phi(t, \tau) B(\tau) u(\tau)
d\tau$</mathjax>. The state transition matrix appears both outside and inside the
integral, and we haven't talked about how we would compute it. In general,
computing the state transition matrix is hard. But there's one important
class where the computation becomes much simpler than usual: that is where
the system does not depend on time.</p>
<p>Linear time-invariant case: <mathjax>$\dot{x} = Ax + Bu, y = Cx + Du, x(t_0) =
x_0$</mathjax>. Does not matter at what time we start. Typically, WLOG, we use <mathjax>$t_0 =
0$</mathjax> (we can't do this in the time-varying case).</p>
<h2>Aside: Jacobian linearization</h2>
<p>In practice, it's generally the case that no one
presents you with a model that looks like this. Usually, you derive
this (usually nonlinear) model through physics and whatnot. What can you do
to come up with a linear representation of that system? What is typically
done is an approximation technique called Jacobian linearization.</p>
<p>So suppose someone gives you a nonlinear system and an output equation,
and you want to come up with some linear representation of the system.</p>
<p>Two points of view: we could look at the system, and suppose we applied a
particular input to the system and solve the differential equation
(<mathjax>$u^0(t) \mapsto x^0(t)$</mathjax>, the <strong>nominal input</strong> and <strong>nominal
solution</strong>). That would result in a solution (<strong>state trajectory</strong>, in
general). Now suppose that we for some reason want to perturb that input
(<mathjax>$u^0(t) + \delta u(t)$</mathjax>, the <strong>perturbed input</strong>). Suppose in general
that <mathjax>$\delta u$</mathjax> is a small perturbation. What this results in is a new
state trajectory, that we'll define as <mathjax>$x^0(t) + \delta x(t)$</mathjax>, the
<strong>perturbed solution</strong>.</p>
<p>Now we can derive from that what we call the Jacobian linearization. The
nominal pair satisfies <mathjax>$\dot{x}^0 =
f(x^0, u^0, t)$</mathjax>, and I also have that <mathjax>$x^0(t_0) = x_0$</mathjax>.</p>
<p>The perturbed pair satisfies <mathjax>$\dot{x}^0 + \dot{\delta x} = f(x^0 + \delta x, u^0 + \delta u, t)$</mathjax>, where
<mathjax>$(x^0 + \delta x)(t_0) = x_0 + \delta x_0$</mathjax>. Now I'm going to look at these
two and perform a Taylor expansion about the nominal input and
solution. Thus <mathjax>$f(x^0 + \delta x, u^0 + \delta u, t) = f(x^0, u^0, t) +
\pderiv{}{x} f(x, u, t)\vert_{(x^0, u^0)}\delta x +
\pderiv{}{u}f(x,u,t)\vert_{(x^0, u^0)} \delta u + \text{higher order
terms}$</mathjax> (recall that we also called <mathjax>$\pderiv{}{x}$</mathjax> <mathjax>$D_1$</mathjax>, i.e. the
derivative with respect to the first argument).</p>
<p>What I've done is expanded the right-hand side of the differential
equation. Subtracting the nominal equation, <mathjax>$\dot{\delta x} = \pderiv{}{x} f(x, u, t)\vert_{(x^0, u^0)} \delta
x + \pderiv{}{u} f(x, u, t)\vert_{(x^0, u^0)}\delta u + \text{h.o.t.}$</mathjax> If <mathjax>$\delta u,
\delta x$</mathjax> are small, then the higher-order terms are approximately zero, which
gives us an approximate first-order linear differential equation. This
gives us a linear time-varying approximation of the dynamics of this
perturbation vector, in response to a perturbation input. That's what the
Jacobian linearization gives you: the perturbation away from the nominal
(we linearized about a bias point).</p>
<p>Consider A(t) to be the Jacobian matrix with respect to x, and B(t) to be
the Jacobian matrix with respect to u. Remember that this is an
approximation, and if your system is really nonlinear, and you perturb the
system a lot (stray too far from the bias point), then this linearization
may cease to hold.</p>
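<p>In practice the Jacobians can even be approximated by finite differences. A
hedged sketch (using the inverted pendulum that appears at the end of this
lecture, <mathjax>$\dot{x} = (x_2, \Omega^2 \sin x_1 + u)$</mathjax>, with an assumed value of
<mathjax>$\Omega$</mathjax>):</p>
<pre><code>import numpy as np

W = 2.0                                   # Omega = sqrt(g / l), assumed value

def f(x, u, t):
    """Nonlinear pendulum dynamics: x = (theta, theta_dot), scalar input u."""
    return np.array([x[1], W**2 * np.sin(x[0]) + u[0]])

def jacobians(f, x0, u0, t, eps=1e-6):
    """Central finite-difference estimates of df/dx and df/du at (x0, u0)."""
    n, m = len(x0), len(u0)
    Aj, Bj = np.zeros((n, n)), np.zeros((n, m))
    for i in range(n):
        dx = np.zeros(n); dx[i] = eps
        Aj[:, i] = (f(x0 + dx, u0, t) - f(x0 - dx, u0, t)) / (2 * eps)
    for j in range(m):
        du = np.zeros(m); du[j] = eps
        Bj[:, j] = (f(x0, u0 + du, t) - f(x0, u0 - du, t)) / (2 * eps)
    return Aj, Bj

Aj, Bj = jacobians(f, np.zeros(2), np.zeros(1), 0.0)
print(Aj)    # approximately [[0, 1], [W**2, 0]]: linearization about upright
print(Bj)    # approximately [[0], [1]]
</code></pre>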
<h2>Linear time-invariant systems</h2>
<p>Motivated by the fact that we have a solution to the time-varying equation,
it depends on the state transition matrix, which right now is an abstract
thing which we don't have a way of solving. Let's go to a more specific
class of systems: that where <mathjax>$A, B, C, D$</mathjax> do not depend on time. We know
that this system is linear (we don't know yet that it is time-invariant; we
have to find the response function and show that it satisfies the
definition of a time-invariant system), so this still requires proof.</p>
<p>Since these don't depend on time, we can use some familiar tools
(e.g. Laplace transforms), remembering what taking the Laplace transform of
a derivative is. Denote by <mathjax>$\hat{x}(s)$</mathjax> the Laplace transform of
<mathjax>$x(t)$</mathjax>. The transformed equations are <mathjax>$s\hat{x}(s) - x_0 = A\hat{x}(s)
+ B\hat{u}(s)$</mathjax> and <mathjax>$\hat{y}(s) = C\hat{x}(s) + D\hat{u}(s)$</mathjax> (the output
equation has no derivative, so no initial-condition term appears). The first
equation becomes <mathjax>$(sI - A)\hat{x}(s) = x_0 + B\hat{u}(s)$</mathjax>, and we'll leave
the second equation alone.</p>
<p>Let's first consider <mathjax>$\dot{x} = Ax$</mathjax>, <mathjax>$x(0) = x_0$</mathjax>. I could have done the
same thing, except my right-hand side doesn't depend on <mathjax>$B$</mathjax>: <mathjax>$(sI -
A)\hat{x}(s) = x_0$</mathjax>. Let's leave that for a second and come back to it, and
make the following claim: the state transition matrix for <mathjax>$\dot{x} = Ax,
x(t_0) = x_0$</mathjax> is <mathjax>$\Phi(t,t_0) = e^{A(t-t_0)}$</mathjax>, which is called the <strong>matrix
exponential</strong>, defined as <mathjax>$e^{A(t-t_0)} = I + A(t-t_0) + \frac{A^2(t-t_0)^2}{2!}
+ ...$</mathjax> (the Taylor expansion of the exponential function).</p>
<p>We just need to show that the state transition matrix, using definitions we
had last day, is indeed the state transition matrix for that system. We
could go back to the definition of the state transition matrix for the
system, or we could go back to the state transition function for the vector
differential equation.</p>
<p>From last time, we know that the solution to <mathjax>$\dot{x} = A(t)x, x(t_0) = x_0$</mathjax>
is given by <mathjax>$x(t) = \Phi(t, t_0)x_0$</mathjax>; here, we are claiming that <mathjax>$x(t)
= e^{A(t - t_0)} x_0$</mathjax>, where <mathjax>$x(t)$</mathjax> is the solution to <mathjax>$\dot{x} = Ax$</mathjax> with
initial condition <mathjax>$x_0$</mathjax>.</p>
<p>First show that it satisfies the vector differential equation: <mathjax>$\dot{x} =
\deriv{}{t}e^{A(t-t_0)} x_0 = \parens{0 + A + A^2(t - t_0) + \frac{A^3(t-t_0)^2}{2!} + ...}x_0 =
A\parens{I + A(t-t_0) + \frac{A^2(t-t_0)^2}{2!} + ...} x_0 = Ae^{A(t-t_0)} x_0 = Ax(t)$</mathjax>,
so it satisfies the differential equation. Checking the initial condition,
we get <mathjax>$e^{A \cdot 0}x_0 = I x_0 = x_0$</mathjax>. We've proven that this represents
the solution to this time-invariant differential equation; by the existence
and uniqueness theorem, it is <em>the</em> solution.</p>
<p>Through this proof, we've shown a couple of things: the derivative of the
matrix exponential, and we evaluated it at <mathjax>$t-t_0=0$</mathjax>. So now let's go back
and reconsider its infinite series representation and classify some of its
other properties.</p>
<h2>Properties of the matrix exponential</h2>
<ul>
<li><mathjax>$e^0 = I$</mathjax></li>
<li><mathjax>$e^{A(t+s)} = e^{At}e^{As}$</mathjax></li>
<li><mathjax>$e^{(A+B)t} = e^{At}e^{Bt}$</mathjax> if <mathjax>$\comm{A}{B} = 0$</mathjax> (i.e. if the matrices commute; in general the equality fails).</li>
<li><mathjax>$\parens{e^{At}}^{-1} = e^{-At}$</mathjax>, and these properties hold in general if
you're looking at <mathjax>$t$</mathjax> or <mathjax>$t - t_0$</mathjax>.</li>
<li><mathjax>$\deriv{e^{At}}{t} = Ae^{At} = e^{At}A$</mathjax> (i.e. <mathjax>$\comm{e^{At}}{A} = 0$</mathjax>)</li>
<li>Suppose <mathjax>$X(t) \in \Re^{n \times n}$</mathjax>, <mathjax>$\dot{X} = AX, X(0) = I$</mathjax>, then the
solution of this matrix differential equation and initial condition pair
is given by <mathjax>$X(t) = e^{At}$</mathjax>. Proof in the notes; very similar to what we
just did (more general proof, that the state transition matrix is just
given by the matrix exponential).</li>
</ul>
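<p>These properties are quick to confirm numerically with
<code>scipy.linalg.expm</code> (small assumed matrices; note the deliberately
non-commuting pair for the third property):</p>
<pre><code>import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[1.0, 0.0], [0.0, 2.0]])        # here A @ B != B @ A
t, s = 0.7, 0.4

print(np.allclose(expm(A * 0), np.eye(2)))                        # e^0 = I
print(np.allclose(expm(A * (t + s)), expm(A * t) @ expm(A * s)))  # True
print(np.allclose(expm((A + B) * t), expm(A * t) @ expm(B * t)))  # False: [A,B] != 0
print(np.allclose(np.linalg.inv(expm(A * t)), expm(-A * t)))      # True
print(np.allclose(A @ expm(A * t), expm(A * t) @ A))              # True
</code></pre>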
<h2>Calculating <mathjax>$e^{At}$</mathjax>, given <mathjax>$A$</mathjax></h2>
<p>What this is now useful for is making more concrete this state transition
concept. Still a little abstract, since we're still considering the
exponential of a matrix.</p>
<p>The first point is that using the infinite series representation to compute
<mathjax>$e^{At}$</mathjax> is in general hard.</p>
<p>It would be doable if you knew <mathjax>$A$</mathjax> were nilpotent (<mathjax>$A^k = 0$</mathjax> for some
positive integer <mathjax>$k$</mathjax>), so that the series terminates, but that's a special
case, and it would not be feasible if <mathjax>$k$</mathjax> were large.</p>
<p>The way one usually computes the state transition matrix <mathjax>$e^{At}$</mathjax> is as
follows:</p>
<p>Recall: <mathjax>$\dot{X}(t) = AX(t)$</mathjax>, with <mathjax>$X(0) = I$</mathjax>. We know from what we've done
before (property 6) that we can easily prove <mathjax>$X(t) = e^{At}$</mathjax>. We also know
that <mathjax>$(sI - A)\hat{X}(s) = I$</mathjax>, so <mathjax>$\hat{X}(s) = (sI - A)^{-1}$</mathjax>. That tells
me that <mathjax>$e^{At} = \mathcal{L}^{-1}\parens{(sI - A)^{-1}}$</mathjax>. That gives us a
way of computing <mathjax>$e^{At}$</mathjax>, assuming we have a way to compute a matrix's
inverse and an inverse Laplace transform. This is what people usually do,
and most algorithms approach the problem this way. Generally hard to
compute the inverse and the inverse Laplace transform.</p>
<p>This requires proof that <mathjax>$sI - A$</mathjax> is invertible (it is, for all <mathjax>$s$</mathjax> away
from the eigenvalues of <mathjax>$A$</mathjax>) and that its inverse is indeed the Laplace
transform of <mathjax>$e^{At}$</mathjax>.</p>
<p>Cleve Moler started LINPACK (the linear algebra package; the original engine
behind MATLAB); he is famous in computational linear algebra. See the paper
"Nineteen Dubious Ways to Compute the Exponential of a Matrix". It is
actually a hard problem in general, related to the factoring of
degree-<mathjax>$n$</mathjax> polynomials (eigenvalue computation).</p>
<p>If we were to consider our simple nilpotent case, we'll compute <mathjax>$sI - A =
\begin{bmatrix}s &amp; -1 \\ 0 &amp; s\end{bmatrix}$</mathjax>. We can immediately write down
its inverse as <mathjax>$\begin{bmatrix}\frac{1}{s} &amp; \frac{1}{s^2} \\ 0 &amp;
\frac{1}{s}\end{bmatrix}$</mathjax>. Inverse Laplace transform takes no work; it's
simply <mathjax>$\begin{bmatrix}1 &amp; t \\ 0 &amp; 1\end{bmatrix}$</mathjax>.</p>
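<p>The same computation can be scripted symbolically (a sketch, assuming sympy
is available): invert <mathjax>$sI - A$</mathjax> and take the inverse Laplace transform entry
by entry.</p>
<pre><code>import sympy as sp

s, t = sp.symbols('s t', positive=True)
A = sp.Matrix([[0, 1], [0, 0]])              # nilpotent: A**2 == 0
resolvent = (s * sp.eye(2) - A).inv()        # [[1/s, 1/s**2], [0, 1/s]]
eAt = resolvent.applyfunc(lambda entry: sp.inverse_laplace_transform(entry, s, t))
print(eAt)                                   # Matrix([[1, t], [0, 1]])
</code></pre>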
<p>In the next lecture (and next series of lectures) we will be talking about
the Jordan form of a matrix, which gives another way to compute <mathjax>$e^{At}$</mathjax>. We'll write
<mathjax>$A = TJT^{-1}$</mathjax>; in the simplest case, <mathjax>$J$</mathjax> is diagonal. Either way, all of the
work is in exponentiating <mathjax>$J$</mathjax>; you still end up doing something like the
inverse Laplace transform of <mathjax>$(sI - J)^{-1}$</mathjax>.</p>
<p>We've shown that for a linear TI system, <mathjax>$\dot{x} = Ax + Bu$</mathjax>; <mathjax>$y = Cx + Du$</mathjax>
(<mathjax>$x(0) = x_0$</mathjax>). <mathjax>$x(t) = e^{At}x_0 + \int_0^t e^{A(t-\tau)} Bu(\tau)
d\tau$</mathjax>. We proved it last time, but you can check this satisfies the
differential equation and initial condition.</p>
<p>From that, you can compute the response function and show that it's
time-invariant. Let's conclude today's class with a planar inverted
pendulum. Let's call the angle of rotation away from the vertical <mathjax>$\theta$</mathjax>,
mass <mathjax>$m$</mathjax>, length <mathjax>$\ell$</mathjax>, and torque <mathjax>$\tau$</mathjax>. Equations of motion: <mathjax>$m\ell^2
\ddot{\theta} - mg\ell \sin \theta = \tau$</mathjax>. Perform Jacobian
linearization about the trivial trajectory in which the pendulum is straight
up (<mathjax>$\theta \equiv 0$</mathjax>, measured from the upright position). Then
<mathjax>$\delta \theta = \theta$</mathjax> and <mathjax>$\sin\theta \approx \theta$</mathjax>, so <mathjax>$m\ell^2 \ddot{\theta} - mg\ell\theta
= \tau$</mathjax>. With <mathjax>$u = \frac{\tau}{m\ell^2}$</mathjax>, <mathjax>$\Omega^2 = \frac{g}{\ell}$</mathjax>, and
states <mathjax>$x_1 = \theta, x_2 = \dot{\theta}$</mathjax>:
<mathjax>$\dot{x}_1 = x_2$</mathjax>, and <mathjax>$\dot{x}_2 = \Omega^2 x_1 + u$</mathjax>.</p>
<p>The output is the angle: <mathjax>$y = \theta = x_1$</mathjax>. Stabilization of the system via
feedback can be considered through the poles of the transfer
function: <mathjax>$\frac{\hat{y}}{\hat{u}} = \frac{1}{s^2 - \Omega^2} =
G(s)$</mathjax> (the plant), which has an unstable pole at <mathjax>$s = \Omega$</mathjax>.</p>
<p>In general, it is not a good idea to cancel an unstable pole and then use
feedback; in the notes, this is done with some controller <mathjax>$K(s)$</mathjax>. If we look at the
open-loop transfer function (<mathjax>$K(s)G(s) = \frac{1}{s(s+\Omega)}$</mathjax>), then <mathjax>$\hat{u} =
\frac{s-\Omega}{s}\hat{\bar{u}}$</mathjax>, so <mathjax>$\dot{u} = \dot{\bar{u}} - \Omega\bar{u}$</mathjax>
(assume zero initial conditions on <mathjax>$u, \bar{u}$</mathjax>). If we define a third
state variable <mathjax>$x_3 = \bar{u} - u$</mathjax> (so the signs below work out), then <mathjax>$\dot{x}_3
= \Omega \bar{u}$</mathjax>. Here, I have <mathjax>$A = \begin{bmatrix} 0 &amp; 1 &amp; 0 \\ \Omega^2
&amp; 0 &amp; -1 \\ 0 &amp; 0 &amp; 0 \end{bmatrix}$</mathjax>, <mathjax>$B = \begin{bmatrix}0 \\ 1 \\
\Omega\end{bmatrix}$</mathjax>, <mathjax>$C = \begin{bmatrix}1 &amp; 0 &amp; 0\end{bmatrix}$</mathjax>, <mathjax>$D =
0$</mathjax> (with input <mathjax>$\bar{u}$</mathjax>). Out of time today, but we'll solve this at the
beginning of Tuesday's class.</p>
<p>Solve for <mathjax>$x(t) = \begin{bmatrix}x_1 &amp; x_2 &amp; x_3\end{bmatrix}^T$</mathjax>. We have a few
approaches:</p>
<ul>
<li>Using <mathjax>$A,B,C,D$</mathjax>: compute the following: <mathjax>$y(t) = Ce^{At} x_0 + C\int_0^t
e^{A(t - \tau)}Bu(\tau) d\tau$</mathjax>. In doing that, we'll need to compute
<mathjax>$e^{At}$</mathjax>, and then we have this expression for general <mathjax>$u$</mathjax>: suppose you
supply a step input.</li>
<li>Suppose <mathjax>$\bar{u} = -y = -Cx$</mathjax>. Therefore <mathjax>$\dot{x} = Ax + B(-Cx) = (A -
BC)x$</mathjax>. We have a new <mathjax>$A_{CL} = A - BC$</mathjax>, and we can exponentiate this
instead.</li>
</ul>
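<p>A minimal numerical sketch of both approaches (the matrices are from above;
the value of <mathjax>$\Omega$</mathjax>, the initial condition, and the step input are assumed
for illustration):</p>
<pre><code>import numpy as np
from scipy.linalg import expm
from scipy.integrate import trapezoid

W = 2.0                                       # Omega, assumed
A = np.array([[0.0, 1.0, 0.0],
              [W**2, 0.0, -1.0],
              [0.0, 0.0, 0.0]])
B = np.array([[0.0], [1.0], [W]])
C = np.array([[1.0, 0.0, 0.0]])
x0 = np.array([0.1, 0.0, 0.0])                # assumed initial condition
t = 1.0

# Approach 1: open loop with a unit step input ubar(t) = 1
taus = np.linspace(0.0, t, 201)
vals = np.stack([(expm(A * (t - tau)) @ B).ravel() for tau in taus])
y = C @ (expm(A * t) @ x0 + trapezoid(vals, taus, axis=0))
print("open-loop y(t):", y)

# Approach 2: feedback ubar = -y = -Cx, then exponentiate A_CL = A - BC
A_CL = A - B @ C
print("closed-loop x(t):", expm(A_CL * t) @ x0)
</code></pre>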
<p>This foreshadows later parts of the course, when we think about control; it
introduces the standard notion of feedback for stabilizing systems. Using our
newfound knowledge of the state transition matrix for TI systems (and how to
compute it), we can carry out the computation by hand and see what MATLAB is
doing under the hood.</p></div></div>