# steveWang/Notes



EE 221A: Linear System Theory

August 23, 2012

Prof. Claire Tomlin (tomlin@eecs). 721 Sutardja Dai Hall. Somewhat tentative office hours on schedule: T 1-2, W 11-12. http://inst.eecs.berkeley.edu/~ee221a

GSI: Insoon Yang (iyang@eecs). Insoon's office hours: M 1:30-2:30, Th 11-12.

Homeworks typically due on Thursday or Friday.

Intro

Bird's eye view of modeling in engineering + design vs. in science.

"Science":

$$\mbox{Dynamical system} \rightarrow \mbox{experiments} \leftrightarrow \mbox{Model}$$

"Engineering":

$$\mbox{dynamical system} \rightarrow \mbox{experiments} \leftrightarrow \mbox{Model} \rightarrow \mbox{control}$$

Control validation, verification, testing.

Broad brush of a couple of concepts: modeling. We're going to spend a lot of time talking about modeling in this course.

First: for any dynamical system, there's an infinite number of models you could design, depending on your level of abstraction. Typically, you choose level of abstraction based on use case. Often, only able to use certain kinds of experiments. e.g. probing of protein concentration levels. If you're able to measure just this, then the signals in your model should have something to do with these concentration levels.

As we said, the same physical system can have many different models. Another example: a MEMS device. We can have models at various levels of abstraction, e.g. an electrical model: the silicon / electrostatics of the system. Might be interested in manipulation of the device.

Alt: mechanical model (could have a free-body diagram, e.g.).

Another example: Hubble telescope. Could think of orbital dynamics. Individual rigid body dynamics. Or properties of the telescope, the individual optical models of the mirrors and their interactions. The idea here is to just realize that the word model can mean very different things. The logical model to use depends on the task at hand. The main point: a basic tenet of engineering: the value of a model: choose the simplest model that answers the questions you are asking about the system. The simplest model that will allow you to predict something that you didn't build into it.

Predict IO relations that you didn't explicitly design the model on. One of the properties of a good linear model for a system: it obeys linearity, so if you form a basis for your domain, then you have the system response to any input spanned by this basis. Probably the most important thing to take away from this course: linearity is a very strong principle that allows us to build up a set of tools.

Time

We have this term, a "dynamical system". A key part is that it changes with time, responding with behavior over time. Time will turn out to be quite important. Depending on how we model time, we can come up with different models. We call time ($t$) a privileged variable because it has certain properties: namely, when we think about time, we think about time marching forward (unidirectionality of evolution). Different models: continuous time ($t \in \Re$; could be negative, could go backwards if we are interested in backwards evolution), or discrete time, $t \in \{nT, n \in \mathbb{Z}\}$, where $T$ is some sampling time. So in discrete time, we have some set. We can also come up with more complicated models of time, like discrete-time asynchronous. The previous model had some constant period $T$; in DT asynchronous, we just have a set of points in time. This is becoming a more important model now with asynchronous processes (reacting to events that happen at previously undefined points in time).

Linear vs. nonlinear models

More on this later. Suppose we could take the system, and we could represent it as being in one of a number of states.

First: suppose a finite number of states (so can be modeled by a FSM), which represent some configuration of the system. State space represents states system can be in at any point in time. If state space is finite, we can use a finite-state automaton. Each state has an output (prints out a message, or a measurement is taken), and we also consider inputs. The inputs are used to evolve the dynamic system. Input affects a transition. We can build up the dynamics of the system by just defining the transition function.

Packet transmitting node: first state is "ready-to-send"; second state is "send packet & wait"; and the third state is "flush buffer". If buffer empty, stay in $q_1$. If not empty, transitions to $q_2$. If ACK received, then transition to $q_3$ and return to $q_1$. If $T$ time units elapse, we time out and transition directly to $q_1$. Here, no notion of linear or nonlinear systems. To be able to talk about linear or nonlinear models, we need to be able to put some vector space structure on these three elements. System must then satisfy superposition.
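The three-state node above can be sketched as a transition table; a minimal Python sketch, where the event names (`buffer_empty`, `ack`, `timeout`, `done`) are assumptions for illustration (the notes only name the states):

```python
# A sketch of the packet-transmitting node as a finite-state automaton.
# State names follow q1/q2/q3 from the notes; event names are assumed.
TRANSITIONS = {
    ("q1_ready",     "buffer_empty"):    "q1_ready",      # stay in q1
    ("q1_ready",     "buffer_nonempty"): "q2_send_wait",  # q1 -> q2
    ("q2_send_wait", "ack"):             "q3_flush",      # ACK received
    ("q2_send_wait", "timeout"):         "q1_ready",      # T time units elapse
    ("q3_flush",     "done"):            "q1_ready",      # return to q1
}

def step(state, event):
    """The transition function: evolve the automaton by one input event."""
    return TRANSITIONS[(state, event)]
```

The dynamics of the automaton are built up entirely from `step`, mirroring how defining the transition function defines the system.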

Back to abstract dynamical system (thing we could never hope to model perfectly): rather than thinking about a set of rules, we're going to think about a mathematical model. Three classes: CT, DT [synchronous], and discrete-state (typically finite). Within each of these classes we can further break each down. For the first two, we can consider linearity, and we can further break these down into time-varying (TV) and time-invariant (TI). This course is going to focus just on the linear systems in continuous and discrete time, both time-varying and time-invariant. We'll use differential equation models in continuous time and difference equation models in discrete time. We usually develop in continuous-time and show analogies in discrete-time.

Analysis and Control

Control is pervasive. If you go to any of the control conferences, you see areas where techniques from this course are applied. Modern control came about because of aerospace in the 50s. e.g. autopilot, air traffic control. There the system itself is the system of aircraft. Chemical process control. Mechatronics, MEMS, robotics. Novel ways to automate things that hadn't been automated previously, mostly because of a renaissance in sensing. Power systems. Network control systems: how you combine models of the system itself with the control models. Quantum chemistry. Typically, when we think about state spaces, we think about the state as a vector in $\Re^n$. In many cases, you want to think about the state spaces as more complicated (e.g. $C^\infty$, the class of smooth functions).

Difference between verification, simulation, and validation

One of the additional basic tenets of this course: if you have a model of the system, and you can analytically verify that the model behaves in given ways for ranges of initial conditions, then that is a very valuable thing to have: you have a proof that as long as the system adheres to the model, then your model will work as expected. Simulation gives you system behavior for a certain set of parameters. Very different, but they complement each other. Analyze simpler models, simulate more complex models.

Linear Algebra

Functions and their properties.

Fields, vector spaces, properties and subspaces.

(Note regarding notation: $\Re_+$ means the non-negative reals; similarly, $\mathbb{C}_+$ denotes the complex numbers with non-negative real part.)

$\exists!$: exists a unique, $\exists?$: does there exist, $\ni$: such that.

Cartesian product: $\{(x,y) \vert x \in X \land y \in Y\}$ (set of ordered n-tuples)

Functions and Vector Spaces

August 28, 2012

OH: M/W 5-6, 258 Cory

Today: beginning of the course: review of lin. alg. topics needed for the course. We're going to go through lecture notes 2 and probably start on the third set of notes. Will bring copies of 3 and 4 on Thursday.

We did an introduction to notation and topics last time. First topic: functions, which will be used synonymously with "maps". Terminology will be used interchangeably.

Given two sets of elements X, Y, we defined $\fn{f}{X}{Y}$. Notion of range vs. codomain (range is merely the subset of the codomain covered by f). We define $f(X) \defequals \set{f(x)}{x \in X}$ to be the range.

Properties of functions

Injectivity of functions ("one-to-one"). A function $f$ is said to be injective iff it maps each $x \in X$ to a distinct $y \in Y$; equivalently, $f(x_1) = f(x_2) \implies x_1 = x_2$. The contrapositive form is $x_1 \neq x_2 \implies f(x_1) \neq f(x_2)$.

Surjectivity of functions ("onto"). A function $f$ is said to be surjective if the codomain is equal to the range. Basically, the map $f$ covers the entire codomain. A way to write this formally is that $f$ is surjective iff $\forall y \in Y \exists x \in X \ni y = f(x)$.

And then a map $f$ is bijective iff it is both injective and surjective. We can write this formally: for each $y \in Y$ there exists a unique $x \in X$ such that $f(x) = y$.

Example: inverse of a map. We can talk about left and right inverses of maps. Suppose we have a map $\fn{f}{X}{Y}$. We're going to define this map $\mathbb{1}_X$ as the identity map on X. Namely, application of this map to any $x \in X$ will yield the same $x$.

The left inverse of $f$ is $\fn{g_L}{Y}{X}$ such that $g_L \circ f = \mathbb{1}_X$. In other words, $\forall x\in X, (g_L \circ f)(x) = x$.

Prove: $f$ has a left inverse $g_L$ iff $f$ is injective. First of all, let us prove the backwards implication. Assume $f$ is injective. Prove that $g_L$ exists. We're going to construct the map $\fn{g_L}{Y}{X}$ as $g_L(f(x)) = x$, where the domain here is the range of $f$. In order for this to be a well-defined function, we require that $x$ is unique, which is met by injectivity of $f$.

Now let us prove the forward implication. Assume that this left inverse $g_L$ exists. By definition, $g_L \circ f = \mathbb{1}_X \iff \forall x \in X,\ g_L(f(x)) = x$. If $f$ were not injective, then $g_L$ would not be well-defined ($\exists x_1 \neq x_2$ such that $f(x_1) = f(x_2)$, and so $g_L$ is no longer a function).

review: contrapositive: $(A \implies B) \iff (\lnot B \implies \lnot A)$; contradiction: to show $A \implies B$, assume $A \land \lnot B$ and derive a contradiction.

We can similarly show that surjectivity $\iff$ existence of a right inverse. With these two, we can then trivially show that bijectivity $\iff$ existence of an inverse (rather, both a left and a right inverse, which we can easily show must be equal). Proof will likely be part of the first homework assignment.
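On finite sets, the left-inverse construction from the proof can be sketched directly; the `left_inverse` helper below is hypothetical, not from the notes:

```python
def left_inverse(f, domain):
    """Construct g_L with g_L(f(x)) = x on the range of f, assuming f is
    injective on `domain`; mirrors the construction in the proof above."""
    table = {}
    for x in domain:
        y = f(x)
        if y in table and table[y] != x:
            # Two distinct points collide: f is not injective, so no
            # well-defined left inverse exists (the forward implication).
            raise ValueError("f is not injective")
        table[y] = x
    return lambda y: table[y]
```

Note how the non-injective case fails exactly where the proof says $g_L$ stops being well-defined.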

Fields

We need the definition of a vector and a field in order to define a vector space.

A field is an object: a set of elements $S$ with two closed binary operations defined upon $S$. These two operations are addition (which forms an abelian group over $S$) and multiplication (which forms an abelian group over $S - \{0\}$) such that multiplication distributes over addition. Note that convention dictates $0$ to be the additive identity and $1$ to be the multiplicative identity.

Other silly proofs include showing that if both a left and right identity exist, they must be equivalent, or that multiplication by $0$ maps any element to $0$.

Vector spaces (linear spaces)

A vector space is a set of vectors $V$ and a field of scalars $\mathbb{F}$, combined with vector addition and scalar multiplication. Vector addition forms an abelian group, but this time, scalar multiplication has the properties of a monoid (existence of an identity and associativity). We then have the distributive laws $(\alpha + \beta)x = \alpha x + \beta x$ and $\alpha(x + y) = \alpha x + \alpha y$.

Function spaces

We define a space $F(D,V)$, where $(V, \mathbb{F})$ is a vector space and $D$ is a set: $F(D, V) = \set{f}{\fn{f}{D}{V}}$, the set of all functions from $D$ to $V$. Is $(F(D,V), \mathbb{F})$ a vector space (yes), where vector addition is pointwise addition of functions and scalar multiplication is pointwise multiplication by a scalar?

Example: the space of continuous functions on a closed interval, $C([t_0, t_1], \Re^n)$; the pair $(C([t_0, t_1], \Re^n), \Re)$ is indeed a vector space.

Lebesgue spaces

$L_p[t_0, t_1] = \set{\fn{f}{[t_0, t_1]}{\Re}}{\int_{t_0}^{t_1} \abs{f(t)}^p dt < \infty}$.

We can then talk about $\ell_p$, which are spaces of sequences. $\ell_2$ is the space of square-summable sequences of real numbers. Informally, $\ell_2 = \set{v = \{v_1, v_2, \ldots\}}{v_k \in \Re,\ \sum_k \abs{v_k}^2 < \infty}$.

In general, when looking at vector spaces, often we use $\mathbb{F} = \Re$, and we refer to the space as simply $V$.

Next: subspaces, bases, linear dependence/independence, linearity. One of the main things we're going to do is look at properties of linear functions and representation as multiplication by matrices.

Vector Spaces and Linearity

August 30, 2012

From last time

Subspaces, bases, linear dependence/independence, linearity. One of the main things we're going to do is look at properties of linear functions and representation as multiplication by matrices.

Example (of a vector space)

$\ell_2 = \{v = \{v_1, v_2, ...\} \st \sum_{i=1}^\infty \abs{v_i}^2 < \infty, v_i \in \Re \}$

What is a vector subspace?

Consider vector space $(V, \mathbb{F})$. Consider a subset W of V combined with the same field. $(W, \mathbb{F})$ is a subspace of $(V, \mathbb{F})$ if it is closed under vector addition and scalar multiplication (formally, this must be a vector space in its own right, but these are the only vector space properties that we need to check).

Consider vectors from $\Re^n$. A plane (in $\Re^3$) is a subspace of $\Re^3$ if it contains the origin.

Aside: for $x \in V$, $\text{span}(x) = \set{\alpha x}{\alpha \in \mathbb{F}}$.

Linear dependence, linear independence.

Consider a set of $p$ vectors $\{v_1, v_2, \ldots, v_p\}$, $v_i \in V$. This set of vectors is said to be linearly independent iff the only solution to the homogeneous equation $\sum_i \alpha_i v_i = 0$ is the trivial one: $\sum_i \alpha_i v_i = 0 \implies \forall i,\ \alpha_i = 0$. This is equivalent to saying that no one vector can be written as a linear combination of the others.

Otherwise, the set is said to be linearly dependent.

Bases

Recall: a set of vectors $W$ is said to span a space $(V, \mathbb{F})$ if any vector in the space can be written as a linear combination of vectors in the set, i.e. $\forall v \in V, \exists \set{(\alpha_i, w_i)}{v = \sum \alpha_i w_i}$ for $w_i \in W, \alpha_i \in \mathbb{F}$.

W is a basis iff it is also linearly independent.

Coordinates

Given a basis $B$ of a space $(V, \mathbb{F})$, there is a unique representation (trivial proof) of every $v \in V$ as a linear combination of elements of $B$. We define our coordinates to be the coefficients that appear in this unique representation. A compact visual representation is the coordinate vector:

$$\alpha = \begin{bmatrix}\alpha_1 \\ \vdots \\ \alpha_n \end{bmatrix}$$

A basis is not uniquely defined, but what is constant is the number of elements in the basis. This number is the dimension of the space. Another notion is that a basis generates the corresponding space, since once you have a basis, you can acquire any element in the space.
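Finding the coordinates of a vector in a given basis amounts to solving a linear system; a small numpy sketch with an illustrative (non-orthogonal) basis of $\Re^3$:

```python
import numpy as np

# Basis vectors are the columns of B; the particular numbers are illustrative.
B = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
v = np.array([3.0, 2.0, 1.0])

# Unique representation v = sum_i alpha_i b_i  <=>  solve B @ alpha = v.
alpha = np.linalg.solve(B, v)
```

Since the basis vectors are linearly independent, $B$ is invertible and the coordinate vector $\alpha$ is unique.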

Linearity

A function $\fn{f}{(V, \mathbb{F})}{(W, \mathbb{F})}$ (note that these spaces are defined over the same field!) is linear iff $f(\alpha_1 v_1 + \alpha_2 v_2) = \alpha_1 f(v_1) + \alpha_2 f(v_2)$.

This property is known as superposition, which is an amazing property, because if you know what this function does to the basis elements of a vector space, then you know what it does to any element in the space.

An interesting corollary is that a linear map will always map the zero vector to itself.

Definitions associated with linear maps

Suppose we have a linear map $\fn{\mathcal{A}}{U}{V}$. The range (image) of $\mathcal{A}$ is defined to be $R(\mathcal{A}) = \set{v}{v = A(u), u \in U} \subset V$. The nullspace (kernel) of $\mathcal{A}$ is defined to be $N(\mathcal{A}) = \set{u}{\mathcal{A}(u) = 0} \subset U$. Also trivial (from definition of linearity) to prove that these are subspaces.

We have a couple of very important properties now that we've defined range and nullspace.

Properties of linear maps $\fn{\mathcal{A}}{U}{V}$

$$\forall b \in V:\ (\exists u \st \mathcal{A}(u) = b) \iff b \in R(\mathcal{A})$$

$$\forall b \in R(\mathcal{A}):\ (\exists!\ u \st \mathcal{A}(u) = b) \iff N(\mathcal{A}) = \{\theta\}$$

(if the nullspace only contains the zero vector, we say it is trivial)

$$\mathcal{A}(x_0) = \mathcal{A}(x_1) \iff x_1 - x_0 \in N(\mathcal{A})$$

Matrix Representation of Linear Maps

September 4, 2012

Today

Matrix multiplication as a representation of a linear map; change of basis -- what happens to matrices; norms; inner products. We may get to adjoints today.

Last time, we talked about the concept of the range and the nullspace of a linear map, and we ended with a relationship that related properties of the nullspace to properties of the linear equation $\mathcal{A}(x) = b$. As we've written here, this is not matrix multiplication. As we'll see today, it can be represented as matrix multiplication, in which case, we'll write this as $Ax = b$.

There's one more important result, called the rank-nullity theorem. We defined the range and nullspace of a linear operator. We also showed that these are subspaces (range of codomain; nullspace of domain). We call $\text{dim}(R(\mathcal{A})) = \text{rank}(\mathcal{A})$ and $\text{dim}(N(\mathcal{A})) = \text{nullity}(\mathcal{A})$. Taking the dimension of the domain as $n$ and the dimension of the codomain as $m$, $\text{rank}(\mathcal{A}) + \text{nullity}(\mathcal{A}) = n$. Left as an exercise. Hints: choose a basis for the nullspace. Presumably you'd extend it to a basis for the domain (without loss of generality, because any set of $n$ linearly independent vectors will form a basis). Then consider how these relate to the range of $\mathcal{A}$. Then map $\mathcal{A}$ over this basis.

Matrix representation

Any linear map between finite-dimensional vector spaces can be represented as matrix multiplication. We're going to show that it's true via construction.

$\fn{\mathcal{A}}{U}{V}$. We're going to choose bases for the domain and codomain. $\forall x \in U,\ x = \sum_{j=1}^n \xi_j u_j$. Now consider $\mathcal{A}(x) = \mathcal{A}(\sum_{j=1}^n \xi_j u_j) = \sum_{j=1}^n \xi_j \mathcal{A}(u_j)$ (through linearity). Each $\mathcal{A}(u_j) = \sum_{i=1}^m a_{ij} v_i$. Uniqueness of $a_{ij}$ and $\xi_j$ follows from writing the vectors in terms of a basis.

$$\mathcal{A}(x) = \sum_{j=1}^n \xi_j \sum_{i=1}^m a_{ij} v_i \\ = \sum_{i=1}^m \left(\sum_{j=1}^n a_{ij} \xi_j\right) v_i \\ = \sum_{i=1}^m \eta_i v_i$$

Uniqueness of representation tells me that $\eta_i \equiv \sum_{j=1}^n a_{ij} \xi_j$, with $i \in \{1, \dots, m\}$ and $j \in \{1, \dots, n\}$. We can turn this representation into matrix form by defining $\eta = A\xi$, where $A \in \mathbb{F}^{m \times n}$ is defined such that its $j^{\text{th}}$ column is $\mathcal{A}(u_j)$ written with respect to the $v_i$s.

All we used here was the definitions of basis, coordinate vectors, and linearity.
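A minimal sketch of this construction, using the derivative map on polynomials of degree less than 3 as an assumed example (basis $\{1, t, t^2\}$ for both domain and codomain; not an example from the notes):

```python
import numpy as np

# Column j of A is A(u_j) written in the codomain basis. For the derivative
# map with basis {1, t, t^2} on both sides:
#   d/dt 1 = 0,  d/dt t = 1,  d/dt t^2 = 2t
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0],
              [0.0, 0.0, 0.0]])

xi = np.array([3.0, 2.0, 1.0])   # p(t) = 3 + 2t + t^2
eta = A @ xi                     # coordinates of p'(t) = 2 + 2t
```

Column $j$ of $A$ is exactly the image of the $j^{\text{th}}$ basis polynomial written in codomain coordinates, as in the construction above.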

Let's do a couple of examples. Foreshadowing of work later in controllability of systems. Consider a linear map $\fn{\mathcal{A}}{(\Re^n, \Re)}{(\Re^n, \Re)}$. Try to derive the matrix $A \in \Re^{n \times n}$. Both the domain and codomain have as basis $\{b, \mathcal{A}(b), \mathcal{A}^2(b), \ldots, \mathcal{A}^{n-1}(b)\}$, where $b \in \Re^n$ and $\mathcal{A}^n(b) = -\sum_{i=1}^n \alpha_i \mathcal{A}^{n-i}(b)$. Your task is to show that the representations of $b$ and $\mathcal{A}$ are:

$$\bar{b} = \begin{bmatrix}1 \\ 0 \\ \vdots \\ 0\end{bmatrix} \\ \bar{A} = \begin{bmatrix} 0 & 0 & \dots & 0 & -\alpha_n \\ 1 & 0 & \dots & 0 & -\alpha_{n-1} \\ 0 & 1 & \dots & 0 & -\alpha_{n-2} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \dots & 1 & -\alpha_1 \end{bmatrix}$$

This is really quite simple; it's almost by definition.

Note that these are composable maps, where composition corresponds to matrix multiplication.

Change of basis

Suppose we have $\fn{\mathcal{A}}{U}{V}$ and two sets of bases for the domain and codomain. There exist maps between the first set of bases and the second set; composing those appropriately will give you your change of basis. Essentially, do a change of coordinates to those in which $A$ is defined (call this $P$), apply $A$, then change the coordinates of the codomain back (call this $Q$). Thus $\bar{A} = QAP$.

If $V = U$, then you can easily derive that $Q = P^{-1}$, so $\bar{A} = P^{-1}AP$.

We consider this transformation ($\bar{A} = QAP$) to be a similarity transformation, and $A$ and $\bar{A}$ are called similar (equivalent).

We derived these two matrices from the same linear map, but they're derived using different bases.
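A numerical sketch: similar matrices represent the same map in different bases, so basis-independent quantities such as the eigenvalues coincide (random matrices chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))   # the map in the original basis
P = rng.standard_normal((3, 3))   # change-of-basis matrix (invertible w.p. 1)

# Same map, new basis: Abar = P^{-1} A P. Similar matrices share eigenvalues,
# since they represent the same underlying linear map.
Abar = np.linalg.inv(P) @ A @ P
ev_A = np.sort_complex(np.linalg.eigvals(A))
ev_Abar = np.sort_complex(np.linalg.eigvals(Abar))
```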

Proof of Sylvester's inequality on homework 2.

One last note: the rank of a linear map corresponds to the rank of the associated matrix representation, i.e. $\text{rank}(A) = \text{dim}(R(A)) = \text{dim}(R(\mathcal{A}))$. Similarly, $\text{nullity}(A) = \text{dim}(N(A)) = \text{dim}(N(\mathcal{A}))$.

Sylvester's inequality, which is an important relationship, says the following: suppose you have $A \in \mathbb{F}^{m \times n}$, $B \in \mathbb{F}^{n \times p}$, so that $AB \in \mathbb{F}^{m \times p}$; then $\text{rk}(A) + \text{rk}(B) - n \le \text{rk}(AB) \le \min(\text{rk}(A), \text{rk}(B))$. On the homework, you'll have to show both inequalities. Note at the end about elementary row operations.
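A numerical spot-check of the inequality on one random example (an illustration, not the requested proof):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(-2, 3, size=(4, 5)).astype(float)   # m x n, m=4, n=5
B = rng.integers(-2, 3, size=(5, 3)).astype(float)   # n x p, p=3
n = A.shape[1]

rkA = np.linalg.matrix_rank(A)
rkB = np.linalg.matrix_rank(B)
rkAB = np.linalg.matrix_rank(A @ B)
# Sylvester: rkA + rkB - n <= rkAB <= min(rkA, rkB)
```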

Next important concept about vector spaces: that of norms.

Norms

With some vector spaces, you can associate some entity called a norm, and then speak of a normed vector space (which is in particular a metric space, with the metric $d(u, v) = \mag{u - v}$). Suppose you have a vector space $(V, \mathbb{F})$, where $\mathbb{F}$ is either $\Re$ or $\mathbb{C}$. This is a normed space if you can find $\fn{\mag{\cdot}}{V}{\Re_+}$ that satisfies the following axioms:

$\mag{v_1 + v_2} \le \mag{v_1} + \mag{v_2}$

$\mag{\alpha v} = \abs{\alpha}\mag{v}$

$\mag{v} = 0 \iff v = \theta$

We have some common norms on these fields:

$\mag{x}_1 = \sum_{i=1}^n \abs{x_i}$ ($\ell_1$)

$\mag{x}_2 = \left(\sum_{i=1}^n \abs{x_i}^2\right)^{1/2}$ ($\ell_2$)

$\mag{x}_p = \left(\sum_{i=1}^n \abs{x_i}^p\right)^{1/p}$ ($\ell_p$)

$\mag{x}_\infty = \max \abs{x_i}$ ($\ell_\infty$)
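These vector norms are available directly in numpy; a small sketch with an illustrative vector:

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

l1 = np.linalg.norm(x, 1)          # sum of |x_i|
l2 = np.linalg.norm(x, 2)          # (sum of |x_i|^2)^(1/2)
linf = np.linalg.norm(x, np.inf)   # max |x_i|
```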

One of the most important norms that we'll be using is the induced norm, that induced by a linear operator. We'll define $\mathcal{A}$ to be a continuous linear map between two normed spaces; the induced norm is defined as

$$\mag{\mathcal{A}}_i = \sup_{u \neq \theta} \frac{\mag{\mathcal{A}u}_V}{\mag{u}_U}$$

From analysis: the supremum of a set $S$ is the least upper bound: the smallest $x$ such that $x \ge y$ for all $y \in S$.

Guest Lecture: Induced Norms and Inner Products

September 6, 2012

Induced norms of matrices

The reason that we're going to start talking about induced norms: today we're just going to build abstract algebra machinery, and at the end, we'll do the first application: least squares. We'll see why we need this machinery and why abstraction is a useful tool.

The idea is that we want to find a norm on a matrix using existing norms on vectors.

Let 1) $\fn{A}{(U,\mathbb{F})}{(V,\mathbb{F})}$, 2) let $U$ have the norm $\mag{\ }_u$, 3) let $V$ have the norm $\mag{\ }_v$. Let the induced norm be $\mag{A}_{u,v} = \sup_{x\neq 0} \frac{\mag{Ax}_v}{\mag{x}_u}$. Theorem: the induced norm is a norm. Not going to bother showing positive homogeneity and the triangle inequality (trivial in this case). Only going to show the last property: it separates points. Essentially, $\mag{A}_{u,v} = 0 \iff A = 0$. The reason that this is not necessarily trivial is because of the supremum. It's a complex operator that's trying to maximize this function over an infinite set of points. It's possible that the supremum does not actually exist at a finite point.

The first direction is easy: if $A$ is zero, then its norm is 0 (by definition -- numerator is 0).

The second direction is the hard one. If $\mag{A}_{u,v} = 0$, then given any $x \neq 0$, it holds that $\frac{\mag{Ax}_v}{\mag{x}_u} \le 0$ (from the definition of supremum). The denominator is positive (being the norm of a nonzero vector), and the numerator is non-negative (also being a norm). Thus the ratio is also bounded below by zero, which means the numerator is zero for every nonzero $x$. Thus every vector is in the nullspace of $A$, which means that $A$ is zero.

Proposition: the induced norm has (a) $\mag{Ax}_v \le \mag{A}_{u,v} \mag{x}_u$; (b) $\mag{AB}_{u,v} \le \mag{A}_{u,v} \mag{B}_{u,v}$. (b) follows from (a).

Not emphasized in Claire's notes: induced norms form only a small subset of all possible norms on matrices.

Examples of induced norms:

• $\mag{A}_{1,1} = \max_j \sum_i \abs{a_{ij}}$: maximum absolute column sum;
• $\mag{A}_{2,2} = \sqrt{\lambda_{\max}(A^T A)}$: maximum singular value norm;
• $\mag{A}_{\infty, \infty} = \max_i \sum_j \abs{a_{ij}}$: maximum absolute row sum.
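A quick numerical check that numpy's matrix norms match these formulas (example matrix chosen for illustration):

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

col_sum = np.abs(A).sum(axis=0).max()                    # induced 1-norm
row_sum = np.abs(A).sum(axis=1).max()                    # induced inf-norm
sigma_max = np.sqrt(np.linalg.eigvalsh(A.T @ A).max())   # induced 2-norm
```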

Other matrix norms: the Schatten norms. One special case is the Frobenius norm $\sqrt{\text{trace}(A^T A)}$, which equals the square root of the sum of the squared singular values. The Schatten family is also a convenient way to write the nuclear norm (the sum of the singular values).

Statistical regularization; Frobenius norm is analogous to $\ell_2$ regularization; nuclear norm analogous to $\ell_1$ regularization. It is important to be aware that these other norms exist.

Sensitivity analysis

Nice application of norms, but we won't see that it's a nice application until later.

Computation for numerical linear algebra.

Some algebra can be performed to show that if $Ax_0 = b$ (with $A$ invertible), then for $(A + \delta A)(x_0 + \delta x) = b + \delta b$, we have an approximate bound of $\frac{\mag{\delta x}}{\mag{x_0}} \le \mag{A}\mag{A^{-1}} \bracks{\frac{\mag{\delta A}}{\mag{A}} + \frac{\mag{\delta b}}{\mag{b}}}$. Need to engineer computation to improve the situation. Namely, we're perturbing $A$ and $b$ slightly: how much can the solution vary? In some sense, we have a measure of effect ($\mag{A}\mag{A^{-1}}$) and a measure of perturbation. The first quantity is important enough that people in linear algebra have defined it and called it the condition number: $\kappa(A) = \mag{A}\mag{A^{-1}} \ge 1$. The best you can do is 1. If you have a condition number of 1, your system is well-conditioned and very robust to perturbations. A larger condition number means less robustness to perturbation.
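A sketch of this sensitivity phenomenon with a nearly singular $2 \times 2$ matrix (numbers chosen for illustration): perturbing $b$ slightly moves the solution a lot, while the amplification stays within the $\kappa(A)$ bound.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])      # nearly singular => large condition number
b = np.array([2.0, 2.0])
x0 = np.linalg.solve(A, b)

db = np.array([0.0, 1e-4])         # tiny perturbation of b only (dA = 0)
x1 = np.linalg.solve(A, b + db)

kappa = np.linalg.cond(A)          # ||A|| ||A^{-1}|| in the 2-norm
rel_in = np.linalg.norm(db) / np.linalg.norm(b)
rel_out = np.linalg.norm(x1 - x0) / np.linalg.norm(x0)
```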

More machinery: Inner Product & Hilbert Spaces

Consider a linear space $(H, \mathbb{F})$, and define a function $\fn{\braket{}{}}{H \times H}{\mathbb{F}}$. This function is an inner product if it satisfies the following properties.

• Conjugate symmetry. $\braket{x}{y} = \braket{y}{x}^*$.
• Homogeneity. $\braket{x}{\alpha y} = \alpha \braket{x}{y}$.
• Linearity. $\braket{x}{y + z} = \braket{x}{y} + \braket{x}{z}$.
• Positive definiteness. $\braket{x}{x} \ge 0$, where equality only occurs when $x = 0$.

Inner product spaces have a natural norm (might not be the official name), and that's the norm induced by the inner product.

One can define $\mag{x}^2 = \braket{x}{x}$, which satisfies the axioms of a norm.

Examples of Hilbert spaces: finite-dimensional vectors. Much of the finite-dimensional intuition carries over to infinite-dimensional Hilbert spaces. All linear operators on finite-dimensional vector spaces are continuous because they can be written as a matrix (not always the case with infinite-dimensional spaces). Suppose I have the field $\mathbb{F}$; one example is $(\mathbb{F}^n, \mathbb{F})$ with the inner product $\braket{x}{y} = \sum_i \bar{x_i} y_i$. Another important inner product space is the space of square-integrable functions, $L^2([t_0, t_1], \mathbb{F}^n)$, an infinite-dimensional space which is in fact the space spanned by Fourier series. It turns out that the inner product (of functions) is $\int_{t_0}^{t_1} f(t)^* g(t) dt$.

We're going to power through a little more machinery, but we're getting very close to the application. Need to go through adjoints and orthogonality before we can start doing applications.

Consider Hilbert spaces $(U, \mathbb{F}, \braket{}{}_u)$ and $(V, \mathbb{F}, \braket{}{}_v)$, and let $\fn{A}{U}{V}$ be a continuous linear function. The adjoint of $A$ is denoted $A^*$ and is the map $\fn{A^*}{V}{U}$ such that $\braket{x}{Ay}_v = \braket{A^* x}{y}_u$.

Reasoning? Sometimes you can simplify things. Suppose $A$ maps an infinite-dimensional space to a finite-dimensional space (e.g. functions to numbers). In some sense, you can convert that function into something that goes from real numbers to functions on numbers. Generalization of the Hermitian transpose.

Consider functions $f, g \in C([t_0, t_1], \Re^n)$. What is the adjoint of $\fn{A}{C([t_0, t_1], \Re^n)}{\Re}$, where $A[f] = \braket{g}{f}_{C([t_0, t_1], \Re^n)}$? (aside: this notion of the adjoint will be very important when we get to observability and reachability)

Observe that $\braket{v}{A[f]}_\Re = v \cdot A[f] = v \braket{g}{f}_C = \braket{vg}{f}_C$, and so consequently the adjoint is $A^*[v] = vg$.

Orthogonality

With Hilbert spaces, one can define orthogonality in an axiomatic manner (a more abstract form, rather). Let $(H, \mathbb{F}, \braket{}{})$ be a Hilbert space. Two vectors $x, y$ are defined to be orthogonal if $\braket{x}{y} = 0$.

Cute example: suppose $c = a + b$ and $a, b$ are orthogonal. In fact, $\mag{c}^2 = \mag{a + b}^2 = \braket{a + b}{a + b} = \braket{a}{a} + \braket{b}{b} + \braket{a}{b} + \braket{b}{a} = \mag{a}^2 + \mag{b}^2$. Cute because the result is the Pythagorean theorem, which we got just through these axioms.

One more thing: the orthogonal complement of a subspace $M$ in a Hilbert space is defined as $M^\perp = \set{y \in H}{\forall x \in M,\ \braket{x}{y} = 0}$.

We are at a point now where we can talk about an important theorem:

Fundamental Theorem of Linear Algebra (partially)

Let $A \in \Re^{m \times n}$. Then:

• $R(A) \perp N(A^T)$
• $R(A^T) \perp N(A)$
• $R(AA^T) = R(A)$
• $R(A^TA) = R(A^T)$
• $N(AA^T) = N(A)$
• $N(A^TA) = N(A^T)$

Proofs:

• Given any $x \in \Re^n, y \in \Re^m \st A^T y = 0$ ($y \in N(A^T)$), consider the quantity $\braket{y}{Ax} = \braket{A^Ty}{x} = 0$.

• Given any $x \in \Re^n$, $\exists y \in \Re^m \st x = A^T y + z$, where $z \in N(A)$ (as a result of the decomposition above). Thus $Ax = AA^Ty$, which implies that $R(A) \subset R(A A^T)$.
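The rank and orthogonality claims above can be spot-checked numerically (random matrix for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))
rk = np.linalg.matrix_rank

# Rank consequences of R(A A^T) = R(A) and R(A^T A) = R(A^T):
same_rank = rk(A @ A.T) == rk(A) and rk(A.T @ A) == rk(A.T)

# R(A) is orthogonal to N(A^T): project a random y off R(A); the
# remainder lies in N(A^T).
y = rng.standard_normal(5)
proj = A @ np.linalg.lstsq(A, y, rcond=None)[0]   # projection onto R(A)
y_perp = y - proj                                  # in N(A^T)
```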

Now for the application.

Application: Least Squares

Consider the following problem: minimize $\mag{y - Ax}_2$, where $y \not\in R(A)$. If $y$ were in the range of $A$, and $A$ were invertible, the solution would be trivial ($x = A^{-1}y$). In many problems, $A \in \Re^{m\times n}$, where $m \gg n$, $y \in \Re^m$, $x \in \Re^n$.

Since we cannot solve $Ax = y$, we instead solve $Ax = \hat{y}$. According to our intuition, we would like $y - \hat{y}$ to be orthogonal to $R(A)$. From the preceding (partial) theorem, this means that $y - \hat{y} \in N(A^T) \iff A^T(y - \hat{y}) = 0$. Remember: what we really want to solve is $A^T(y - Ax) = 0 \implies A^T Ax = A^T y \implies x = (A^T A)^{-1} A^T y$ if $A^T A$ is invertible.

If $A$ has full column rank (that is, for $A \in \Re^{m \times n}$, $\text{rank}(A) = n$), then this means that in fact $N(A) = \{0\}$, and the preceding theorem implies that $\text{dim}(R(A^T)) = n$, which means that $\text{dim}(R(A^T A)) = n$. However, $A^T A \in \Re^{n \times n}$. Thus, $A^T A$ is invertible.
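A sketch of the normal-equations solution on a random overdetermined system; the residual comes out orthogonal to $R(A)$, exactly as derived:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((20, 4))   # m >> n, full column rank w.p. 1
y = rng.standard_normal(20)

# Normal equations, exactly as derived: A^T A x = A^T y.
x_hat = np.linalg.solve(A.T @ A, A.T @ y)
residual = y - A @ x_hat           # orthogonal to R(A)
```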

Back to condition numbers (special case)

Consider a self-adjoint and invertible matrix in $\Re^{n \times n}$. $\hat{x} = (A^T A)^{-1} A^T y = A^{-1} y$. We have two ways of determining this value: the overdetermined least-squares solution and the standard inverse. Let us look at the condition numbers.

$\kappa(A^T A) = \mag{A^T A}\mag{(A^T A)^{-1}} = \mag{A^2}\mag{(A^{-1})^2} = \mag{A}^2\mag{A^{-1}}^2 = \bracks{\kappa(A)}^2$ (using self-adjointness for $\mag{A^2} = \mag{A}^2$ in the 2-norm). This result is more general: it also applies in the $L^2$ case even if $A$ is not self-adjoint. As you can see, this is worse than simply using the inverse.
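A numerical check of the squaring (random matrix for illustration); this is the usual argument for avoiding explicit formation of $A^T A$ in least-squares computations:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 6))

kappa_A = np.linalg.cond(A)          # 2-norm condition number
kappa_AtA = np.linalg.cond(A.T @ A)  # condition number after forming A^T A
```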

Gram-Schmidt orthonormalization

This is a theoretical toy, not used for computation (numerics are very bad).

More definitions:

A set of vectors $S$ is orthogonal if $x \perp y$ for all $x, y \in S$ with $x \neq y$.

The set is orthonormal if also $\mag{x} = 1, \forall x \in S$. Why do we care about orthonormality? Consider Parseval's theorem. The reason you get that theorem is that the bases are required to be orthonormal so that you can get that result. Otherwise it wouldn't be as clean. That's typically why people like orthonormal bases: you can represent your vectors as just coefficients (and you don't need to store the length of the vectors).

We conclude with an example of Gram-Schmidt orthonormalization. Consider the space $L^2([t_0, t_1], \Re)$. Suppose I have $v_1 = 1, v_2 = t, v_3 = t^2$, $t_0 = 0$, $t_1 = 1$, and $\mag{v_1}^2 = \int_0^1 1 \cdot 1 dt = 1$. The key idea of Gram-Schmidt orthonormalization is the following: start with $b_1 \equiv \frac{v_1}{\mag{v_1}}$. Then go on with $b_2 = \frac{v_2 - \braket{v_2}{b_1}b_1}{\mag{v_2 - \braket{v_2}{b_1}b_1}}$, and repeat until you're done (in essence: you want to preserve only the component that is orthogonal to the space spanned by the vectors you've computed so far, then renormalize).

Carrying out this computation, you get $v_2 - \braket{v_2}{b_1}b_1 = t - \frac{1}{2}$, whose norm is $\frac{1}{\sqrt{12}}$, so $b_2 = \sqrt{12}\parens{t - \frac{1}{2}} = \sqrt{3}(2t - 1)$. Same construction for $b_3$.
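
The whole construction can be carried out exactly in rational arithmetic, representing polynomials by coefficient lists (a sketch, not from the notes):

```python
from fractions import Fraction as F
from math import isclose, sqrt

# Gram-Schmidt on {1, t, t^2} in L^2([0,1]): polynomials are coefficient
# lists [c0, c1, ...], and <p, q> = integral_0^1 p(t) q(t) dt, computed exactly.

def inner(p, q):
    return sum(F(ci * dj, i + j + 1)
               for i, ci in enumerate(p) for j, dj in enumerate(q))

def axpy(a, p, q):   # q + a*p, padding to a common length
    n = max(len(p), len(q))
    p = p + [F(0)] * (n - len(p)); q = q + [F(0)] * (n - len(q))
    return [qi + a * pi for pi, qi in zip(p, q)]

vs = [[F(1)], [F(0), F(1)], [F(0), F(0), F(1)]]   # 1, t, t^2

ortho = []
for v in vs:                       # classical Gram-Schmidt, unnormalized
    u = v
    for w in ortho:
        u = axpy(-inner(v, w) / inner(w, w), w, u)
    ortho.append(u)

assert ortho[1] == [F(-1, 2), F(1)]               # t - 1/2
assert ortho[2] == [F(1, 6), F(-1), F(1)]         # t^2 - t + 1/6
assert inner(ortho[0], ortho[1]) == 0 and inner(ortho[1], ortho[2]) == 0

# Normalizing: b2 = (t - 1/2)/||t - 1/2|| = sqrt(12) (t - 1/2)
norm_u2 = sqrt(inner(ortho[1], ortho[1]))         # = 1/sqrt(12)
assert isclose(1 / norm_u2, sqrt(12))
```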

Singular Value Decomposition & Introduction to Differential Equations

September 11, 2012

Reviewing the adjoint: suppose we have two vector spaces $U, V$; as with norms, we associate with each a field that is either $\Re$ or $\mathbb{C}$. Assume that these spaces are inner product spaces (we're associating with each an inner product). Suppose we have a continuous (linear) map $\fn{\mathcal{A}}{U}{V}$. We define the adjoint of this map to be $\fn{\mathcal{A}^*}{V}{U}$ such that $\braket{v}{\mathcal{A} u} = \braket{\mathcal{A}^* v}{u}$ for all $u \in U$, $v \in V$.

We define self-adjoint maps as maps that are equal to their adjoints, i.e. $\fn{\mathcal{A}}{U}{U} \st \mathcal{A} = \mathcal{A}^*$.

In finite-dimensional vector spaces, the adjoint of a map is equivalent to the conjugate transpose of the matrix representation of the map. We refer to matrices that correspond to self-adjoint maps as hermitian.

Unitary matrices

Suppose that we have $U \in \mathbb{F}^{n\times n}$. $U$ is unitary iff $U^*U = UU^* = I_n$. If $\mathbb{F}$ is $\Re$, the matrix is called orthogonal.

These constructions lead us to something useful: singular value decomposition. We'll come back to this later when we talk about matrix operations.

Singular Value Decomposition (SVD)

Suppose you have a matrix $M \in \mathbb{F}^{m\times m}$. A complex number $\lambda$ is an eigenvalue of $M$ iff there exists a nonzero vector $v$ such that $Mv = \lambda v$ ($v$ is then called an eigenvector associated to $\lambda$). Now we can think about how to define singular values of a matrix in terms of these definitions.

Let us think about this in general for a matrix $A \in \mathbb{F}^{m \times n}$ (which we consider to be a matrix representation of some linear map with respect to a basis). Note that $A A^* \in \mathbb{F}^{m \times m}$, which will have $m$ eigenvalues $\lambda_i, i = 1 ... m$.

Note that $AA^*$ is hermitian. We note that from the Spectral theorem, we can decompose the matrix into an orthonormal basis of eigenvectors corresponding to real eigenvalues. In fact, in this case, the eigenvalues must be real and non-negative.

If we write the eigenvalues of $AA^*$ as $\lambda_1 \ge \lambda_2 \ge ... \ge \lambda_m$, where the first $r$ are nonzero, note that $r = \text{rank}(AA^*) = \text{rank}(A)$. We define the non-zero singular values of $A$ to be $\sigma_i = \sqrt{\lambda_i}, i \le r$. The remaining singular values are zero.

Recall the induced 2-norm: let us relate this notion of singular values back to the induced 2-norm of a matrix $A$ ($\mag{A}_{2,i}$), the norm induced by the action of $A$ on its domain. The induced 2-norm is $\max_i (\lambda_i (A^*A))^{1/2}$, which is simply the maximum singular value.

Now that we know what singular values are, we can do a useful decomposition called singular value decomposition.

Take $A \in \mathbb{C}^{m \times n}$. We have the following theorem: there exist unitary matrices $U \in \mathbb{C}^{m \times m}, V \in \mathbb{C}^{n \times n}$ such that $A = U \Sigma V^*$, where $\Sigma$ is defined as a diagonal matrix containing the singular values of $A$. Consider the first $r$ columns of $U$ to be $U_1$, the first $r$ columns of $V$ to be $V_1$, and the $r \times r$ block of $\Sigma$ containing the nonzero singular values to be $\Sigma_r$. Then $A = U \Sigma V^* = U_1 \Sigma_r V_1^*$.

Consider $AA^*$. With a bit of algebra, we can show that $AA^*U_1 = U_1 \Sigma_r^2$. The columns $u_i$ of $U_1$ are thus eigenvectors of $AA^*$ associated to the eigenvalues $\sigma_i^2$; these are called the left-singular vectors.

Similarly, if we consider $A^*A$, we can show that $A^*A = V_1 \Sigma_r^2 V_1^*$ and that $A^*A v_i = \sigma_i^2 v_i$; the columns $v_i$ of $V_1$ are called the right-singular vectors.
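
As a small sanity check (the $2 \times 2$ matrix below is hand-picked, not from the lecture), the singular values can be computed directly from the eigenvalues of $A^TA$ via the quadratic formula:

```python
from math import sqrt, isclose

# Singular values of a hand-picked 2x2 matrix, computed from the
# eigenvalues of A^T A (quadratic formula); sigma_i = sqrt(lambda_i).

A = [[3.0, 0.0],
     [4.0, 5.0]]

# entries of A^T A (symmetric, positive semidefinite)
a = A[0][0]**2 + A[1][0]**2            # (A^T A)_{11}
b = A[0][0]*A[0][1] + A[1][0]*A[1][1]  # (A^T A)_{12} = (A^T A)_{21}
c = A[0][1]**2 + A[1][1]**2            # (A^T A)_{22}

# eigenvalues of [[a, b], [b, c]]
disc = sqrt((a - c)**2 + 4*b*b)
lam1, lam2 = (a + c + disc) / 2, (a + c - disc) / 2
s1, s2 = sqrt(lam1), sqrt(lam2)        # singular values, s1 >= s2

assert isclose(s1, 3 * sqrt(5)) and isclose(s2, sqrt(5))
# product of singular values = |det A|; the largest is the induced 2-norm of A
detA = A[0][0]*A[1][1] - A[0][1]*A[1][0]
assert isclose(s1 * s2, abs(detA))
```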

Recap

We've covered a lot of ground these past few weeks: we covered functions, vector spaces, bases, and then we started to consider linearity. And then we started talking about endowing vector spaces with things like norms, inner products; induced norms. From that, we went on to talk about adjoints. We used adjoints, we went on to talk a little about projection and least-squares optimization. We then went on to talk about Hermitian matrices and singular value decomposition. I think about this first unit as having many basic units that we'll use over and over again. Two interesting applications: least-squares, SVD.

So we have this basis now to build on as we talk about linear systems. We'll also need to build a foundation on linear differential equations. We'll spend some time going over the basics: what a solution means, under what conditions a solution exists (i.e. what properties does the differential equation need to have?). We'll spend the next couple weeks talking about properties of differential equations.

All of what we've done up to now has been covered in appendix A of Callier & Desoer. For the introduction to differential equations, we'll follow appendix B of Callier & Desoer. Not the easiest to read, but very comprehensive background reading.

The existence and uniqueness theorems are in many places, however.

Lecture notes 7.

Differential Equations

$$\dot{x} = f(x(t), t), \quad x(t_0) = x_0 \\ x \in \Re^n \\ \fn{f}{\Re^n \times \Re}{\Re^n}$$

(strictly speaking, $f$ maps $x$ to the tangent space, but for this course, we're going to consider the two spaces to be equivalent)

Often, we're going to consider the time-invariant case (where there is no dependence on $t$, but rather only on $x$), but this is a time-variant case. Recall that we consider time to be a privileged variable, i.e. always "marching forward".

What we're going to talk about now is how we can solve this differential equation. Rather (for now), under what conditions does there exist a (unique) solution to the differential equation (with initial condition)? We're interested in these two properties: existence and uniqueness. The solution we call $x(t)$ where $x(t_0) = x_0$. We need some understanding of some properties of that function $f$. We'll talk about continuity, piecewise continuity, Lipschitz continuity (thinking about the existence). In terms of uniqueness, we'll be talking about Cauchy sequences, Banach spaces, Bellman-Grönwall lemma.

A couple of different ways to prove uniqueness and existence; we'll use the Callier & Desoer method.

We'll finish today's lecture by just talking about some definitions of continuity. Suppose we have a function $f(x)$ that is said to be continuous: that is, $\forall \epsilon > 0, \exists \delta > 0 \st \abs{x_1 - x_2} < \delta \implies \abs{f(x_1) - f(x_2)} < \epsilon$ ($\epsilon$-$\delta$ definition).

Suppose we have $\fn{f(x,t)}{\Re^n \times \Re}{\Re^n}$. $f$ is said to be piece-wise continuous (w.r.t. $t$), $\forall x$ if $\fn{f(x, \cdot)}{\Re}{\Re^n}$ is continuous except at a finite number of (well-behaved) discontinuities in any closed and bounded interval of time. What I'm not allowing in this definition are functions with infinitely many points of discontinuity.

Next time we'll talk about Lipschitz continuity.

Existence and Uniqueness of Solutions to Differential Equations

September 13, 2012

Section this Friday only, 9:30 - 10:30, Cory 299.

Today: existence and uniqueness of solutions to differential equations.

We called this a DE or ODE, and we associated with it an initial condition. We started to talk about properties of the function $f$ as a function of $x$ only, but we can consider thinking about this as a function of $x$ for all t. This is a map from $\Re^n \to \Re^n$. In this class, recall, we used the $\epsilon$-$\delta$ definition for continuity.

We also introduced the concept of piecewise continuity, which will be important for thinking about the right-hand-side of the differential equation.

We defined piecewise continuity as $\fn{f(t)}{\Re_+}{\Re^n}$, where $f(t)$ is said to be piecewise continuous in $t$, where the function is continuous except at a set of well-behaved discontinuities (finitely many in any closed and bounded, i.e. compact, interval).

Finally, we will define Lipschitz continuity as follows: a function $\fn{f(\cdot, t)}{\Re^n}{\Re^n}$ is Lipschitz continuous in $x$ if there exists a piecewise continuous function of time $\fn{k(t)}{\Re_+}{\Re_+}$ such that the following inequality holds: $\mag{f(x_1, t) - f(x_2, t)} \le k(t)\mag{x_1 - x_2}, \forall x_1, x_2 \in \Re^n, \forall t \in \Re_+$. This inequality (condition) is called the Lipschitz condition.

An important point about this inequality is that there has to be a single function $k(t)$, and it has to be piecewise continuous: it is not allowed to blow up to infinity within any compact time interval.

It's an interesting condition, and if we compare the Lipschitz continuity definition to the general continuity definition, we can easily show that if the function is LC (Lipschitz continuous), then it's C (continuous), since LC is a stricter condition than C. That implication is fairly straightforward to show, but the converse is not true (i.e. continuity does not necessarily imply Lipschitz continuity).

Aside: think about this condition and what it takes to show that a function is Lipschitz continuous. Need to come up with a candidate $k(t)$ (often called the Lipschitz function or constant, if it's constant). Often the hardest part: trying to extract from $f$ what a possible $k$ is.

But there's a useful possible candidate for $k(t)$, given a particular function $f$. Let's forget about time for a second and consider a function just of $x$. Consider the Jacobian $Df$ (often also written $\pderiv{f}{x}$), the $n \times n$ matrix with entries $(Df)^j_i = \pderiv{f_j}{x_i}$. If the Jacobian $Df$ exists, then its norm provides a candidate Lipschitz function $k(t)$.

A norm of the Jacobian of $f$, if independent of $x$, tells you that the function is Lipschitz. If the norm always seems to depend on $x$, you can still say something about the Lipschitz properties of the function: you can call it locally Lipschitz by bounding the value of $x$ in some region.

Sketch of proof: generalization of mean value theorem (easy to sketch in $\Re^1$). Mean value theorem states that there exists a point such that the instantaneous slope is the same as the average slope (assuming that the function is differentiable). If we want to generalize it to more dimensions, we say $f(x_1) - f(x_2) = Df(\lambda x_1 + (1 - \lambda) x_2)(x_1 - x_2)$ (where $0 < \lambda < 1$). All we've required is the existence of $Df$.

Now we can just take norms (and this is what's interesting now) and use some of the results we have from norms. This provides a very useful construction for a candidate for $k$ (might not provide a great bound), but it's the second thing to try if you can't immediately extract out a function $k(t)$.
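
A minimal numeric illustration of the Jacobian-bound candidate (the function here, $f = \sin$, is hand-picked, not from the lecture): $\sup_x \abs{Df} = \sup_x \abs{\cos x} = 1$, so $k = 1$ should satisfy the Lipschitz condition:

```python
from math import sin

# If |Df| is bounded, the bound is a Lipschitz candidate. Here f = sin,
# Df = cos, and sup|cos| = 1, so k = 1 should work (a hand-picked example).
f, k = sin, 1.0

pairs = [(0.0, 1.0), (-2.0, 3.5), (10.0, 10.1), (-0.3, 0.7)]
for x1, x2 in pairs:
    assert abs(f(x1) - f(x2)) <= k * abs(x1 - x2)   # the Lipschitz condition
```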

Something not in the notes, but useful. Let's go back to where we started, the differential equation with initial condition, and state the main theorem.

Fundamental Theorem of DEs / the Existence and Uniqueness theorem of (O)DEs

Suppose we have a differential equation with an initial condition. Assume that $f(x, t)$ is piecewise continuous in $t$ and Lipschitz continuous in $x$. With that information, we have that there exists a unique function of time which maps $\Re_+ \to \Re^n$, which is differentiable ($C^1$) almost everywhere (the derivative exists at all points at which $f$ is continuous), and which satisfies the initial condition and differential equation. This derivative exists at all points $t \in [t_1, t_2] - D$, where $D$ is the set of points where $f$ is discontinuous in $t$.

We are going to be interested in studying differential equations where we know these conditions hold. We're also going to prove the theorem. It's a nice thing to do (a little in depth) because it demonstrates some proof techniques (as well as giving you an idea of why the theorem works).

LC condition

The norm of the Jacobian of the example is bounded for bounded $x$. That is, we can choose a local region in $\Re$ for which our $Df$ is bounded to be less than some constant. That gives us a candidate Lipschitz constant for that local region. We say then that $f(x)$ is (at least) locally Lipschitz continuous (usually we just say this without specifying a region, since you can usually find a bound given any region). Further, it is trivially piecewise continuous in time (since it doesn't depend on time).

Note: if the Lipschitz condition holds only locally, it may be that the solution is only defined over a certain range of time.

We didn't show this, but in this example, the Lipschitz condition does not hold globally.

Local Fundamental theorem of DEs

Now assume that $f(x, t)$ is piecewise continuous in $t$ and Lipschitz continuous in $x$ (for all $x \in G \subset \Re^n$). We now have that there exists a unique function of time, defined on some interval $[t_0, t_1]$ over which the solution remains in $G$, which is differentiable ($C^1$) almost everywhere (the derivative exists at all points at which $f$ is continuous), and which satisfies the initial condition and differential equation. As before, this derivative exists at all points $t \in [t_0, t_1] - D$, where $D$ is the set of points where $f$ is discontinuous in $t$. If the Lipschitz condition holds globally, we can make the interval as large as desired.

Proof

There are two pieces: the proof of existence and the proof of uniqueness. Today will likely just be existence.

Existence

Roadmap: construct an infinite sequence of continuous functions defined (recursively) as follows $x_{m+1}(t) = x_0 + \int_{t_0}^t f(x_m(\tau), \tau) d\tau$. First, show that this sequence converges to a continuous function $\fn{\Phi(\cdot)}{\Re_+}{\Re^n}$ which solves the DE/IC pair.

Would like to be able to prove the first thing here: I've constructed a sequence, and I want to show that the limit of this sequence is a solution to the differential equation.

The tool that I'm going to use is a property called Cauchy, and then I'm going to invoke the result that if I have a complete space, any Cauchy sequence on the space converges to something in the space. Gives me the basis of the existence of the thing that this converges to.

Goal: (1) to show that this sequence is a Cauchy sequence in a complete normed vector space, which means the sequence converges to something in the space, and (2) to show that the limit of this sequence satisfies the DE/IC pair.

A Cauchy sequence (on a normed vector space) is one where, for any $\epsilon > 0$, there is some point in the sequence (some finite index $m$) such that the distance between any two elements beyond that index is less than $\epsilon$. In other words: by dropping a suitable finite number of elements from the start of the sequence, the distance between any remaining elements can be made arbitrarily small.

We define a Banach space (equivalently, a complete normed vector space) as one in which all Cauchy sequences converge. Implicit in that: they converge to something in the space itself.

Just an aside, a Hilbert space is a complete inner product space. If you have an inner product space, and you define the norm in that inner product space induced by that inner product, if all Cauchy sequences of that space converge (to a limit in the space) with this norm, then it is a Hilbert space.

Think about a Cauchy sequence on a space that converges to something not necessarily in the space. Example: the convergents of a continued fraction, a Cauchy sequence of rationals that may converge to an irrational number.

To show (1), we'll show that this sequence $\{x_m\}$ that we constructed is a Cauchy sequence in a Banach space. Interestingly, it matters what norm you choose.
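
The recursive construction above can be tried out concretely. A minimal sketch for the hand-picked example $\dot{x} = x$, $x(0) = 1$ (not from the notes), where the Picard iterates are exactly the Taylor polynomials of $e^t$:

```python
from math import exp, factorial, isclose

# Picard iteration for xdot = x, x(0) = 1 (a hand-picked example):
# x_{m+1}(t) = 1 + integral_0^t x_m(tau) dtau; polynomials as coefficient lists.

def picard_step(coeffs):
    # integral of sum c_k t^k is sum c_k t^{k+1}/(k+1); then add x0 = 1
    integ = [0.0] + [c / (k + 1) for k, c in enumerate(coeffs)]
    integ[0] += 1.0
    return integ

x = [1.0]                      # x_0(t) = x0 = 1
for _ in range(12):
    x = picard_step(x)

# the m-th iterate is the degree-m Taylor polynomial of e^t
assert all(isclose(c, 1 / factorial(k)) for k, c in enumerate(x))
t = 0.5
assert isclose(sum(c * t**k for k, c in enumerate(x)), exp(t), rel_tol=1e-9)
```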

Proof of Existence and Uniqueness Theorem

September 18, 2012

Today:

• proof of existence and uniqueness theorem.
• [ if time ] introduction to dynamical systems.

First couple of weeks of review to build up basic concepts that we'll be drawing upon throughout the course. Either today or Thursday we will launch into linear system theory.

We're going to recall where we were last time. We had the fundamental theorem of differential equations, which said the following: if we had a differential equation, $\dot{x} = f(x,t)$, with initial condition $x(t_0) = x_0$, where $x(t) \in \Re^n$, etc, if $f( \cdot , t)$ is Lipschitz continuous, and $f(x, \cdot )$ is piecewise continuous, then there exists a unique solution to the differential equation / initial condition pair (some function $\phi(t)$) wherever you can take the derivative (may not be differentiable everywhere: loses differentiability on the points where discontinuities exist).

We spent quite a lot of time discussing Lipschitz continuity. The job is usually to test both conditions; the first one requires work. We described a popular candidate function by looking at the mean value theorem and applying it to $f$: a norm of the Jacobian, if bounded, provides a candidate Lipschitz function.

We also described local Lipschitz continuity, and often, when using a norm of the Jacobian, that's fairly easy to show.

Important point to recall: a norm of the Jacobian of $f$ provides a candidate Lipschitz function.

Another important thing to say here is that we can use any norm we want, so we can be creative in our choice of norm when looking for a better bound.

We started our proof last day, and we talked a little about the structure of the proof. We are going to proceed by constructing a sequence of functions, then show (1) that it converges to a solution, then show (2) that it is unique.

Proof of Existence

We are going to construct this sequence of functions as follows: $x_{m+1}(t) = x_0 + \int_{t_0}^t f(x_m(\tau), \tau) d\tau$, where $t_0$ lies in an arbitrary interval $[t_1, t_2]$. We want to show that this sequence is a Cauchy sequence, and we're going to rely on our knowledge that the space these functions are defined in is a Banach space (hence this sequence converges to something in the space).

We have to put a norm on this space of functions, so we'll use the infinity norm. We're not going to prove it, but rather state, that this is a Banach space. If we show that this is a Cauchy sequence, then the limit of that Cauchy sequence exists in the space. The reason that's interesting is that it's this limit that provides a candidate solution for this differential equation.

We will then prove that this limit satisfies the DE/IC pair. That is adequate to show existence. We'll then go on to prove uniqueness.

Our immediate goal is to show that this sequence is Cauchy; that is, we should show that $\mag{x_{m+p} - x_m} \to 0$ as $m$ gets large, for all $p$.

First let us look at the difference between $x_{m+1}$ and $x_m$. These are just functions of time, and we can compute: $\mag{x_{m+1}(t) - x_m(t)} = \mag{\int_{t_0}^t \parens{f(x_m, \tau) - f(x_{m-1}, \tau)} d\tau} \le \int_{t_0}^t k(\tau)\mag{x_m(\tau) - x_{m-1}(\tau)} d\tau$, using the fact that $f$ is Lipschitz continuous. The Lipschitz function is piecewise continuous, so it has a supremum on this interval: let $\bar{k}$ be the supremum of $k$ over the whole interval $[t_1, t_2]$. This means that we can rewrite the inequality as $\mag{x_{m+1}(t) - x_m(t)} \le \bar{k} \int_{t_0}^t \mag{x_m(\tau) - x_{m-1}(\tau)} d\tau$. Now we have a bound that relates the distance between consecutive elements to the distance between the previous pair, and we can iterate it.

Let us do two things: sort out the integral on the right-hand-side, then look at arbitrary elements beyond an index.

We know that $x_1(t) = x_0 + \int_{t_0}^t f(x_0, \tau) d\tau$, so $\mag{x_1(t) - x_0} \le \int_{t_0}^{t} \mag{f(x_0, \tau)} d\tau \le \int_{t_1}^{t_2} \mag{f(x_0, \tau)} d\tau \defequals M$. From the above inequalities, $\mag{x_2(t) - x_1(t)} \le M \bar{k}\abs{t - t_0}$. Iterating, $\mag{x_3(t) - x_2(t)} \le \frac{M\bar{k}^2 \abs{t - t_0}^2}{2!}$, and in general, $\mag{x_{m+1}(t) - x_m(t)} \le \frac{M\parens{\bar{k} \abs{t - t_0}}^m}{m!}$.

If we look at the norm of $x$ as a function, that is going to be a function norm. What we've been doing up to now is looking at pointwise values at a particular $t$, $t_1 < t < t_2$.

Try to relate this to the norm $\mag{x_{m+1} - x_m}_\infty$. Can what we've done so far give us a bound on the difference between two functions? It can, because the infinity norm of a function is the maximum value that the function assumes (the maximum vector norm over all points $t$ in the interval we're interested in). If we let $T = t_2 - t_1$, we can use the previous pointwise result: if the pointwise norm is less than this bound for all relevant $t$, then the maximum value must be less than the same bound, i.e. $\mag{x_{m+1} - x_m}_\infty \le \frac{M\parens{\bar{k} T}^m}{m!}$.

That gets us on the road we want to be, since that now gets us a bound. We can now go back to where we started. What we're actually interested in is given an index $m$, we can construct a bound on all later elements in the sequence.

$\mag{x_{m+p} - x_m}_\infty = \mag{\sum_{k=0}^{p-1} (x_{m+k+1} - x_{m+k})}_\infty \le \sum_{k=0}^{p-1} \mag{x_{m+k+1} - x_{m+k}}_\infty \le M \sum_{k=0}^{p-1} \frac{(\bar{k}T)^{m+k}}{(m+k)!}$.

We're going to recall a few things from undergraduate calculus: Taylor expansion of the exponential function and $(m+k)! \ge m!k!$.

With these, we can say that $\mag{x_{m+p} - x_m}_\infty \le M\frac{(\bar{k}T)^m}{m!} e^{\bar{k} T}$. What we'd like to show is that this can be made arbitrarily small as $m$ gets large. We study this bound as $m \to \infty$, and we recall from the Stirling approximation that the factorial grows faster than the exponential function. That is enough to show that $\{x_m\}_0^\infty$ is Cauchy. Since it lives in a Banach space (which we won't prove, as that's beyond our scope), it converges to a function (call it $x^\ell$) in the same space.

Now we just need to show that the limit $x^\ell$ solves the differential equation (and initial condition). Let's go back to the sequence that determines $x^\ell$. $x_{m+1} = x_0 + \int_{t_0}^t f(x_m, \tau) d\tau$. We've proven that this limit converges to $x^\ell$. What we want to show is that if we evaluate $f(x^\ell, t)$, then $\int_{t_0}^t f(x_m, \tau) \to \int_{t_0}^t f(x^\ell, \tau) d\tau$. Would be immediate if we had that the function were continuous. Clear that it satisfies initial condition by the construction of the sequence, but we need to show that it satisfies the differential equation. Conceptually, this is probably more difficult than what we've just done (establishing bounds, Cauchy sequences). Thinking about what that function limit is and what it means for it to satisfy that differential equation.

Now, you can basically use some of the machinery we've been using all along to show this. Difference between these goes to $0$ as $m$ gets large.

$$\mag{\int_{t_0}^t (f(x_m, \tau) - f(x^\ell, \tau)) d\tau} \\ \le \int_{t_0}^t k(\tau) \mag{x_m - x^\ell} d\tau \le \bar{k}\mag{x_m - x^\ell}_\infty T \\ \le \bar{k} M e^{\bar{k} T} \frac{(\bar{k} T)^m}{m!}T$$

Thus $x^\ell$ solves the DE/IC pair. That is, a solution $\Phi$ is $x^\ell$: $\dot{x}^\ell(t) = f(x^\ell(t), t)\ \forall t \in [t_1, t_2] - D$, and $x^\ell(t_0) = x_0$.

To show that this solution is unique, we will use the Bellman-Gronwall lemma, which is very important. Used ubiquitously when you want to show that functions of time are equal to each other: candidate mechanism to do that.

Bellman-Gronwall Lemma

Let $u, k$ be real-valued positive piece-wise continuous functions of time, and we'll have a constant $c_1 \ge 0$ and $t_0 \ge 0$. If we have such constants and functions, then the following is true: if $u(t) \le c_1 + \int_{t_0}^t k(\tau)u(\tau) d\tau$, then $u(t) \le c_1 e^{\int_{t_0}^t k(\tau) d\tau}$.

Proof (of B-G)

$t > t_0$ WLOG.

$$U(t) = c_1 + \int_{t_0}^t k(\tau) u(\tau) d\tau \\ u(t) \le U(t) \\ \dot{U}(t) = k(t)u(t) \le k(t)U(t) \\ \deriv{}{t}\parens{U(t)e^{-\int_{t_0}^t k(\tau) d\tau}} = \parens{\dot{U}(t) - k(t)U(t)} e^{-\int_{t_0}^t k(\tau) d\tau} \le 0 \text{ (then integrate this derivative from } t_0 \text{ to } t \text{; note that } U(t_0) = c_1 \text{)} \\ u(t) \le U(t) \le c_1 e^{\int_{t_0}^t k(\tau) d\tau}$$
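
As a numeric sanity check of the lemma (the functions below are hand-picked, not from the lecture): take $u(t) = e^t$, $k(t) = 2$, $c_1 = 1$, for which both the hypothesis and the conclusion have closed forms:

```python
from math import exp

# Bellman-Gronwall sanity check (hand-picked example): u(t) = e^t,
# k(t) = 2, c1 = 1.
# Hypothesis:  u(t) <= c1 + integral_0^t k u dtau = 1 + 2(e^t - 1).
# Conclusion:  u(t) <= c1 * exp(integral_0^t k dtau) = e^{2t}.
c1 = 1.0
u = exp
for i in range(101):
    t = i * 0.05                             # grid on [0, 5]
    hypothesis_rhs = c1 + 2.0 * (exp(t) - 1.0)
    conclusion_rhs = c1 * exp(2.0 * t)
    assert u(t) <= hypothesis_rhs            # the integral inequality holds
    assert u(t) <= conclusion_rhs            # so the lemma's bound holds too
```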

Using this to prove uniqueness of DE/IC solutions

Here is how we're going to use the B-G lemma to prove uniqueness.

We have a solution that we constructed $\Phi$, and someone else gives us a solution $\Psi$, constructed via a different method. Show that these must be equivalent. Since they're both solutions, they have to satisfy the DE/IC pair. Take the norm of the difference between the differential equations.

$$\mag{\Phi - \Psi} \le \bar{k} \int_{t_0}^t \mag{\Phi - \Psi} d\tau \forall t_0, t \in [t_1, t_2]$$

From the Bellman-Gronwall Lemma, we can rewrite this inequality as $\mag{\Phi - \Psi} \le c_1 e^{\bar{k}(t - t_0)}$. Since $c_1 = 0$, this norm is less than or equal to 0. By positive definiteness, this norm must be equal to 0, and so the functions are equal to each other.

Reverse time differential equation

We think about time as monotonic (either increasing or decreasing, usually increasing). Now suppose that time is decreasing: given $\dot{x} = f(x,t)$, we can explore existence and uniqueness going backwards in time. Suppose we had a time variable $\tau$ which goes from $t_0$ backwards, defined as $\tau \defequals t_0 - t$, and define the solution to the differential equation backwards in time as $z(\tau) = x(t)$ for $t < t_0$. We can derive the reverse-time derivative: the equation is just $-f$; we're going to use $\bar{f}$ to represent this function ($\deriv{}{\tau}z = -\deriv{}{t}x = -f(x, t) = -f(z, t_0 - \tau) \defequals \bar{f}(z, \tau)$).

This equation, if I solve the reverse time differential equation, we'll have some corresponding backwards solution. Concluding statement: can think about solutions forwards and backwards in time. Existence of unique solution forward in time means existence of unique solution backward in time (and vice versa). You can't have solutions crossing themselves in time-invariant systems.

Introduction to dynamical systems

September 20, 2012

Suppose we have equations $\dot{x} = f(x, u, t)$, $\fn{f}{\Re^n \times \Re^{n_i} \times \Re_+}{\Re^n}$ and $y = h(x, u, t)$, $\fn{h}{\Re^n \times \Re^{n_i} \times \Re_+}{\Re^{n_o}}$. We define $n_i$ as the dimension of the input space, $n_o$ as the dimension of the output space, and $n$ as the dimension of the state space.

We've looked at the form, and if we specify a particular $\bar{u}(t)$ over some time interval of interest, then we can plug this into the right hand side of this differential equation. Typically we do not supply a particular input. Thinking about solutions to this differential equation, for now, let's suppose that it's specified.

Suppose we have some feedback function of the state. If $u$ is specified, as long as $\bar{f}$ satisfies the conditions for the existence and uniqueness theorem, we have a differential equation we can solve.

Another example: instead of differential equation (which corresponds to continuous time), we have a difference equation (which corresponds to discrete time).

Example: dynamic system represented by an LRC circuit. One practical way to define the state $x$ is as a vector of elements whose derivatives appear in our differential equation. Not formal, but practical for this example.

Notions of discretizing.

What is a dynamical system?

As discussed in first lecture, we consider time $\tau$ to be a privileged variable. Based on our definition of time, the inputs and outputs are all functions of time.

Now we're going to define a dynamical system as a 5-tuple: $(\mathcal{U}, \Sigma, \mathcal{Y}, s, r)$ (input space, state space, output space, state transition function, output map).

We define the input space as the set of input functions over time to an input set $U$ (i.e. $\mathcal{U} = \{\fn{u}{\tau}{U}\}$. Typically, $U = \Re^{n_i}$).

We also define the output space as the set of output functions over time to an output set $Y$ (i.e. $\mathcal{Y} = \{\fn{y}{\tau}{Y}\}$). Typically, $Y = \Re^{n_o}$.

$\Sigma$ is our state space. Not defined as the function, but the actual state space. Typically, $\Sigma = \Re^n$, and we can go back and think about the function $x(t) \in \Sigma$. $\fn{x}{\tau}{\Sigma}$ is called the state trajectory.

$s$ is called the state transition function because it defines how the state changes in response to time and the initial state and the input. $\fn{s}{\tau \times \tau \times \Sigma \times U }{\Sigma}$. Usually we write this as $x(t_1) = s(t_1, t_0, x_0, u)$, where $u$ is the function $u(\cdot) |_{t_0}^{t_1}$. This is important: coming towards how we define state. Only things you need to get to state at the new time are the initial state, inputs, and dynamics.

Finally, we have this output map (sometimes called the readout map) $r$. $\fn{r}{\tau \times \Sigma \times U}{Y}$. That is, we can think about $y(t) = r(t, x(t), u(t))$. There's something fundamentally different between $r$ and $s$. $s$ depended on the function $u$, whereas $r$ only depended on the current value of $u$ at a particular time.

$s$ captures dynamics, while $r$ is static. Remark: $s$ has dynamics (memory) -- things that depend on previous time, whereas $r$ is static: everything it depends on is at the current time (memoryless).

In order to be a dynamical system, we need to satisfy two axioms: a dynamical system is a five-tuple with the following two axioms:

• The state transition axiom: $\forall t_1 \ge t_0$, given $u, \tilde{u}$ that are equal to each other over a particular time interval, the state transition functions must be equal over that interval, i.e. $s(t_1, t_0, x_0, u) = s(t_1, t_0, x_0, \tilde{u})$. Requires us to not have dependence on the input outside of the time interval of interest.
• The semigroup axiom: suppose you start a system at $t_0$ and evolve it to $t_2$, and you're considering the state. You have an input $u$ defined over the whole time interval. If you were to look at an intermediate point $t_1$, and you computed the state at $t_1$ via the state transition function, we can split our time interval into two intervals, and we can compute the result any way we like. Stated as the following: $s(t_2, t_1, s(t_1, t_0, x_0, u), u) = s(t_2, t_0, x_0, u)$.
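
The semigroup axiom can be checked concretely for a system with a closed-form state transition function. A minimal sketch, assuming the scalar LTI system $\dot{x} = ax + u$ with constant input (a hand-picked example, not from the lecture):

```python
from math import exp, isclose

# Semigroup check for the scalar LTI system xdot = a x + u with constant
# input (hand-picked example); the closed-form transition function is
# s(t1, t0, x0) = e^{a(t1-t0)} x0 + (u/a)(e^{a(t1-t0)} - 1).

a, u = -0.7, 2.0

def s(t1, t0, x0):
    e = exp(a * (t1 - t0))
    return e * x0 + (u / a) * (e - 1.0)

t0, t1, t2, x0 = 0.0, 1.3, 4.0, 5.0
direct = s(t2, t0, x0)
via_t1 = s(t2, t1, s(t1, t0, x0))   # stop at t1, restart from x(t1)
assert isclose(direct, via_t1)      # s(t2, t1, s(t1, t0, x0, u), u) = s(t2, t0, x0, u)
```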

When we talk about a dynamical system, we have to satisfy these axioms.

Response function

Since we're interested in the outputs and not the states, we can define what we call the response map. It's not considered part of the definition of a dynamical system because it can be easily derived.

It's the composition of the state transition function and the readout map, i.e. $y(t) = r(t, x(t), u(t)) = r(t, s(t, t_0, x_0, u), u(t)) \defequals \rho(t, t_0, x_0, u)$. This is an important function because it is used to define properties of a dynamical system. Why is that? We've said that states are somehow mysterious. Not something we typically care about: typically we care about the outputs. Thus we define properties like linearity and time invariance.

Time Invariance

We define a time-shift operator $\fn{T_\tau}{\mathcal{U}}{\mathcal{U}}$, $\fn{T_\tau}{\mathcal{Y}}{\mathcal{Y}}$. $(T_\tau u)(t) \defequals u(t - \tau)$. Namely, the value of $T_\tau u$ is that of the old signal at $t-\tau$.

A time-invariant (dynamical) system is one in which the input space and output space are closed under $T_\tau$ for all $\tau$, and $\rho(t, t_0, x_0, u) = \rho(t + \tau, t_0 + \tau, x_0, T_\tau u)$.
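As a sanity check of this definition, a small Python sketch (illustrative, not from the lecture): for a scalar LTI system with a constant input $c$ (a constant input satisfies $T_\tau u = u$), the response depends only on $t - t_0$, so shifting both $t$ and $t_0$ by $\tau$ leaves it unchanged.

```python
import math

# Illustrative scalar LTI system xdot = a*x + u with constant input u(t) = c.
a = -0.3

def rho(t, t0, x0, c):
    # Readout r = identity, so the response equals the state (closed-form solution).
    return math.exp(a * (t - t0)) * x0 + (c / a) * (math.exp(a * (t - t0)) - 1.0)

t, t0, tau, x0, c = 2.0, 0.5, 1.7, 1.0, 0.4
lhs = rho(t, t0, x0, c)
rhs = rho(t + tau, t0 + tau, x0, c)  # shifted time, shifted (constant) input
print(abs(lhs - rhs) < 1e-9)  # True
```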

Linearity

A linear dynamical system is one in which the input, state, and output spaces are all linear spaces over the same field $\mathbb{F}$, and the response map $\rho$ is a linear map of $\Sigma \times \mathcal{U}$ into $\mathcal{Y}$.

This is a strict requirement: you have to check that the response map satisfies these conditions. Question that comes up: why do we define linearity of a dynamical system in terms of linearity of the response and not the state transition function? Goes back to a system being intrinsically defined by its inputs and outputs. Often states, you can have many different ways to define states. Typically we can't see all of them. It's accepted that when we talk about a system and think about its I/O relations, it makes sense that we define linearity in terms of this memory function of the system, as opposed to the state transition function.

Let's just say a few remarks about this: zero-input response, zero-state response. Since our spaces are linear, each contains a zero vector, and superposition implies that the response at time $t$ equals the zero-state response (the response given that we started at the zero state) plus the zero-input response (the response to the zero input).

That is: $\rho(t, t_0, x_0, u) = \rho(t, t_0, \theta_x, u) + \rho(t, t_0, x_0, \theta_u)$ (from the definition of linearity).

The second remark is that the zero-state response is linear in the input, and similarly, the zero-input response is linear in the state.
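This decomposition can be verified numerically. A minimal Python sketch, assuming an LTI system with invertible $A$ and constant input $c$ (the specific matrices are illustrative), where the response has the closed form $\rho(t, 0, x_0, u) = Ce^{At}x_0 + CA^{-1}(e^{At} - I)Bc$:

```python
import numpy as np
from scipy.linalg import expm

# Illustrative LTI system xdot = A x + B u, y = C x, with constant input c.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

def rho(t, x0, c):
    eAt = expm(A * t)
    zero_input = C @ eAt @ x0                                     # from x0, u = 0
    zero_state = C @ np.linalg.solve(A, eAt - np.eye(2)) @ B * c  # from x0 = 0, u = c
    return zero_input + zero_state

t = 1.5
x0 = np.array([[1.0], [-0.5]])
c = 0.7
theta_x = np.zeros((2, 1))  # the zero state

full = rho(t, x0, c)
decomposed = rho(t, theta_x, c) + rho(t, x0, 0.0)  # zero-state + zero-input
print(np.allclose(full, decomposed))  # True
```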

One more property of dynamical systems before we finish: equivalence (a property derived from the definition). Take two dynamical systems $D = (U, \Sigma, Y, s, r)$, $\tilde{D} = (U, \tilde{\Sigma}, Y, \tilde{s}, \tilde{r})$. A state $x_0 \in \Sigma$ is equivalent to $\tilde{x}_0 \in \tilde{\Sigma}$ at $t_0$ if $\forall t \ge t_0$, $\rho(t, t_0, x_0, u) = \tilde{\rho}(t, t_0, \tilde{x}_0, u)$ for every input $u$. If every $x_0$ has an equivalent $\tilde{x}_0$ and vice versa, the two systems are equivalent.

Linear time-varying systems

September 25, 2012

Recall the state transition function gives the state at the current time as a function of the initial state, initial time, and inputs. Suppose you have a differential equation; how do you acquire the state transition function? Solve the differential equation.

For a general dynamical system, there are different ways to get the state transition function. This is an instantiation of a dynamical system, and we're going to get the state transition function by solving the differential equation / initial condition pair.

We're going to call $\dot{x}(t) = A(t)x(t) + B(t)u(t)$ a vector differential equation with initial condition $x(t_0) = x_0$.

So that requires us to think about solving that differential equation. Do a dimension check, to make sure we know the dimensions of the matrices: $x \in \Re^n$, so $A \in \Re^{n \times n}$. We could define the matrix function $A$, which maps the real line (time) into $\Re^{n \times n}$. As a function, $A$ is a piecewise-continuous matrix function in time.

The entries are piecewise-continuous scalars in time. We would like to get at the state transition function; to do that, we need to solve the differential equation.

Let's assume for now that $A, B, U$ are given (part of the system definition).

Checking the hypotheses of the existence and uniqueness theorem: piecewise continuity in $t$ is immediate, and we can use the induced norm of $A(t)$ for a Lipschitz condition. Since this induced norm is piecewise continuous in time, this is a fine bound. Therefore the right-hand side $f$ is globally Lipschitz continuous in $x$.

We're going to back off for a bit and introduce the state transition matrix. Background for solving the VDE. We're going to introduce a matrix differential equation, $\dot{X} = A(t) X$ (where $A(t)$ is same as before).

I'm going to define $\Phi(t, t_0)$ as the solution to the matrix differential equation (MDE) for the initial condition $\Phi(t_0, t_0) = 1_{n \times n}$. That is, $\Phi$ is the solution to the $n \times n$ matrix differential equation when it starts out at the identity matrix.

Let's first talk about properties of this matrix $\Phi$ just from the definition we have.

1. If you go back to the vector differential equation and drop the term that depends on $u$ (either consider $B$ to be 0, or the input to be 0), the solution of $\dot{x}(t) = A(t)x(t)$ is given by $x(t) = \Phi(t, t_0)x_0$.
2. The semigroup property, so called since it's reminiscent of the semigroup axiom: $\Phi(t, t_0) = \Phi(t, t_1) \Phi(t_1, t_0)\ \forall t, t_0, t_1 \in \Re^+$.
3. $\Phi^{-1}(t, t_0) = \Phi(t_0, t)$.
4. $\text{det} \Phi(t, t_0) = \exp\parens{\int_{t_0}^t \text{tr} \parens{A (\tau)} d\tau}$.
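In the special case of a constant $A$ (treated in a later lecture), $\Phi(t, t_0) = e^{A(t-t_0)}$ and the trace integral in property (4) reduces to $\text{tr}(A)(t - t_0)$, so properties (2)-(4) can be spot-checked numerically. A Python sketch with an illustrative $A$:

```python
import numpy as np
from scipy.linalg import expm

# Spot-check of the state transition matrix properties in the constant-A case,
# where Phi(t, t0) = e^{A(t - t0)} (scipy.linalg.expm computes the matrix exponential).
A = np.array([[0.0, 1.0], [-1.0, -0.2]])
Phi = lambda t, t0: expm(A * (t - t0))

t0, t1, t = 0.5, 1.2, 3.0

semigroup = np.allclose(Phi(t, t0), Phi(t, t1) @ Phi(t1, t0))   # property (2)
inverse   = np.allclose(np.linalg.inv(Phi(t, t0)), Phi(t0, t))  # property (3)
determ    = np.isclose(np.linalg.det(Phi(t, t0)),               # property (4)
                       np.exp(np.trace(A) * (t - t0)))
print(semigroup, inverse, determ)  # True True True
```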

Here let's talk about some machinery we can now invoke when we want to show that two functions of time are equal to each other when they're both solutions to the same differential equation. You can simply show by the existence and uniqueness theorem (assuming it applies) that they satisfy the same initial condition and the same differential equation. That's an important point, and we tend to use it a lot.

(i.e. when faced with showing that two functions of time are equal to each other, you can show that they both satisfy the same initial condition and the same differential equation [as long as the differential equation satisfies the hypotheses of the existence and uniqueness theorem])

Obvious, but good to state.

Note: the initial condition doesn't have to be the initial condition given; it just has to hold at one point in the interval. Pick your point in time judiciously.

Proof of (2): check $t=t_1$. (3) follows directly from (2). (4) you can look at if you want. Gives you a way to compute $\Phi(t, t_0)$. We've introduced a matrix differential equation and an abstract solution.

Consider (1). $\Phi(t, t_0)$ is a map that takes the initial state and transitions to the new state. Thus we call $\Phi$ the state transition matrix because of what it does to the states of this vector differential equation: it transfers them from their initial value to their final value, and it transfers them through matrix multiplication.

Let's go back to the original differential equation. Claim that the solution to that differential equation has the following form: $x(t) = \Phi(t, t_0)x_0 + \int_{t_0}^t \Phi(t, \tau)B(\tau)u(\tau) d\tau$. Proof: we can use the same machinery. If someone gives you a candidate solution, you can easily show that it is the solution.

Recall the Leibniz rule, which we'll state in general as follows: $\pderiv{}{z} \int_{a(z)}^{b(z)} f(x, z) dx = \int_{a(z)}^{b(z)} \pderiv{}{z}f(x, z) dx + \pderiv{b}{z} f(b(z), z) - \pderiv{a}{z} f(a(z), z)$.

$$\dot{x}(t) = A(t) \Phi(t, t_0) x_0 + \int_{t_0}^t \pderiv{}{t} \parens{\Phi(t, \tau)B(\tau)u(\tau)} d\tau + \pderiv{t}{t}\parens{\Phi(t, t)B(t)u(t)} - \pderiv{t_0}{t}\parens{\Phi(t, t_0)B(t_0)u(t_0)} \\ = A(t)\Phi(t, t_0)x_0 + \int_{t_0}^t A(t)\Phi(t,\tau)B(\tau)u(\tau)d\tau + B(t)u(t) \\ = A(t)\Phi(t, t_0) x_0 + A(t)\int_{t_0}^t \Phi(t, \tau)B(\tau) u(\tau) d\tau + B(t) u(t) \\ = A(t)\parens{\Phi(t, t_0) x_0 + \int_{t_0}^t \Phi(t, \tau)B(\tau) u(\tau) d\tau} + B(t) u(t) = A(t)x(t) + B(t)u(t)$$

$x(t) = \Phi(t,t_0)x_0 + \int_{t_0}^t \Phi(t,\tau)B(\tau)u(\tau) d\tau$ is good to remember.
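As a numerical check of this formula, a Python sketch (an illustrative constant-$A$ system with a step input $u \equiv 1$, so the integral has a closed form): the variation-of-constants expression agrees with a crude forward-Euler integration of the differential equation.

```python
import numpy as np
from scipy.linalg import expm

# Compare x(t) = e^{At} x0 + \int_0^t e^{A(t-tau)} B u(tau) dtau against a
# forward-Euler simulation of xdot = A x + B u, for constant u = 1.
A = np.array([[0.0, 1.0], [-2.0, -0.5]])
B = np.array([[0.0], [1.0]])
x0 = np.array([[1.0], [0.0]])
t_final, n = 2.0, 50_000
dt = t_final / n

# For constant u = 1 and invertible A, the integral term is A^{-1}(e^{At} - I)B.
x_formula = expm(A * t_final) @ x0 + np.linalg.solve(A, expm(A * t_final) - np.eye(2)) @ B

# Crude forward-Euler integration of the same ODE.
x = x0.copy()
for _ in range(n):
    x = x + dt * (A @ x + B)  # u = 1
print(np.allclose(x, x_formula, atol=1e-2))  # True
```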

Not surprisingly, it depends on the input function over an interval of time.

The differential equation changes over time, so the system itself is time-varying; in general there is no way it will be time-invariant, since the equation that defines its evolution is changing. (Strictly, you test time invariance or time variance through the response map.) But is it linear? We have the state transition function, so we can compute the response function (recall: readout map composed with the state transition function) and ask whether this is a linear map.

Linear time-invariant systems

September 27, 2012

Last time, we talked about the time-varying differential equation, and we expressed $R(\cdot) = \bracks{A(\cdot), B(\cdot), C(\cdot), D(\cdot)}$. We used the state transition matrix to show that the solution is given by $x(t) = \Phi(t, t_0) x_0 + \int_{t_0}^t \Phi(t, \tau) B(\tau) u(\tau) d\tau$. The integrand involves the state transition matrix, and we haven't talked about how we would compute this matrix. In general, computing the state transition matrix is hard. But there's one important class for which the computation becomes much simpler than usual: systems that do not depend on time.

Linear time-invariant case: $\dot{x} = Ax + Bu, y = Cx + Du, x(t_0) = x_0$. Does not matter at what time we start. Typically, WLOG, we use $t_0 = 0$ (we can't do this in the time-varying case).

Aside: Jacobian linearization

In practice, generally the case that someone doesn't present you with a model that looks like this. Usually, you derive this (usually nonlinear) model through physics and whatnot. What can I do to come up with a linear representation of that system? What is typically done is an approximation technique called Jacobian linearization.

So suppose someone gives you a nonlinear system and an output equation, and you want to come up with some linear representation of the system.

Two points of view: we could look at the system, and suppose we applied a particular input to the system and solve the differential equation ($u^0(t) \mapsto x^0(t)$, the nominal input and nominal solution). That would result in a solution (state trajectory, in general). Now suppose that we for some reason want to perturb that input ($u^0(t) + \delta u(t)$, the perturbed input). Suppose in general that $\delta u$ is a small perturbation. What this results in is a new state trajectory, that we'll define as $x^0(t) + \delta x(t)$, the perturbed solution.

Now we can derive from that what we call the Jacobian linearization. That tells us that if we apply the nominal input, the solution satisfies $\dot{x}^0 = f(x^0, u^0, t)$, and I also have that $x^0(t_0) = x_0$.

$\dot{x}^0 + \dot{\delta x} = f(x^0 + \delta x, u^0 + \delta u, t)$, where $(x^0 + \delta x)(t_0) = x_0 + \delta x_0$. Now I'm going to look at these two and perform a Taylor expansion about the nominal input and solution. Thus $f(x^0 + \delta x, u^0 + \delta u, t) = f(x^0, u^0, t) + \pderiv{}{x} f(x, u, t)\vert_{(x^0, u^0)}\delta x + \pderiv{}{u}f(x,u,t)\vert_{(x^0, u^0)} \delta u + \text{higher order terms}$ (recall that we also called $\pderiv{}{x}$ $D_1$, i.e. the derivative with respect to the first argument).

What I've done is expanded the right hand side of the differential equation. Subtracting the nominal equation, $\dot{\delta x} = \pderiv{}{x} f(x, u, t)\vert_{(x^0, u^0)} \delta x + \pderiv{}{u} f(x, u, t)\vert_{(x^0, u^0)}\delta u + \text{higher order terms}$. If $\delta u, \delta x$ are small, then the higher-order terms are negligible, which gives us an approximate first-order linear differential equation. This gives us a linear time-varying approximation of the dynamics of this perturbation vector, in response to a perturbation input. That's what the Jacobian linearization gives you: the perturbation away from the nominal (we linearized about a bias point).

Take $A(t)$ to be the Jacobian matrix of $f$ with respect to $x$, and $B(t)$ to be the Jacobian matrix with respect to $u$, both evaluated along $(x^0(t), u^0(t))$. Remember that this is an approximation, and if your system is really nonlinear and you perturb the system a lot (stray too far from the bias point), then this linearization may cease to hold.
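As a concrete sketch (Python, with illustrative parameter values), consider pendulum dynamics of the kind used later in these notes, $\dot{x}_1 = x_2$, $\dot{x}_2 = (g/\ell)\sin x_1 + u/(m\ell^2)$: finite-difference Jacobians at the nominal point $(x^0, u^0) = (0, 0)$ agree with the analytic $A$ and $B$.

```python
import numpy as np

# Jacobian linearization of an inverted pendulum about (x^0, u^0) = (0, 0);
# g, l, m are illustrative values.
g, l, m = 9.81, 1.0, 1.0

def f(x, u):
    return np.array([x[1], (g / l) * np.sin(x[0]) + u / (m * l**2)])

x_nom, u_nom = np.zeros(2), 0.0
eps = 1e-6

# Central-difference Jacobians df/dx and df/du at the nominal point.
A_num = np.column_stack([
    (f(x_nom + eps * e, u_nom) - f(x_nom - eps * e, u_nom)) / (2 * eps)
    for e in np.eye(2)
])
B_num = (f(x_nom, u_nom + eps) - f(x_nom, u_nom - eps)) / (2 * eps)

A_analytic = np.array([[0.0, 1.0], [g / l, 0.0]])  # d(sin x1)/dx1 = cos(0) = 1
B_analytic = np.array([0.0, 1.0 / (m * l**2)])
print(np.allclose(A_num, A_analytic, atol=1e-5),
      np.allclose(B_num, B_analytic, atol=1e-5))  # True True
```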

Linear time-invariant systems

Motivation: we have a solution to the time-varying equation, but it depends on the state transition matrix, which right now is an abstract object we don't have a way of computing. Let's go to a more specific class of systems: that where $A, B, C, D$ do not depend on time. We know that this system is linear (we don't know yet that it is time-invariant; we have to find the response function and show that it satisfies the definition of a time-invariant system), so this still requires proof.

Since these don't depend on time, we can use some familiar tools (e.g. Laplace transforms); remember what taking the Laplace transform of a derivative is. Denote $\hat{x}(s)$ to be the Laplace transform of $x(t)$. The Laplace transform of the state equation is therefore $s\hat{x}(s) - x_0 = A\hat{x}(s) + B\hat{u}(s)$; the output equation, which contains no derivative, simply transforms to $\hat{y}(s) = C\hat{x}(s) + D\hat{u}(s)$. The first equation becomes $(sI - A)\hat{x}(s) = x_0 + B\hat{u}(s)$, and we'll leave the second equation alone.

Let's first consider $\dot{x} = Ax$, $x(0) = x_0$. I could have done the same thing, except my right hand side doesn't depend on $B$: $(sI - A)\hat{x}(s) = x_0$. Let's leave that for a second and come back to it, and make the following claim: the state transition matrix for $\dot{x} = Ax, x(t_0) = x_0$ is $\Phi(t,t_0) = e^{A(t-t_0)}$, which is called the matrix exponential, defined as $e^{A(t-t_0)} = I + A(t-t_0) + \frac{A^2(t-t_0)^2}{2!} + ...$ (Taylor expansion of the exponential function).

We just need to show that the state transition matrix, using definitions we had last day, is indeed the state transition matrix for that system. We could go back to the definition of the state transition matrix for the system, or we could go back to the state transition function for the vector differential equation.

From last time, we know that the solution to $\dot{x} = A(t)x, x(t_0) = x_0$ is given by $x(t) = \Phi(t, t_0)x_0$; here, we are claiming then that $x(t) = e^{A(t - t_0)} x_0$, where $x(t)$ is the solution to $\dot{x} = Ax$ with initial condition $x_0$.

First show that it satisfies the vector differential equation: $\dot{x} = \pderiv{}{t}\exp\parens{A(t-t_0)} x_0 = \parens{0 + A + A^2(t - t_0) + ...}x_0 = A\parens{I + A(t-t_0) + \frac{A^2}{2!}(t-t_0)^2 + ...} x_0 = Ae^{A(t-t_0)} x_0 = Ax(t)$, so it satisfies the differential equation. Checking the initial condition, we get $e^{A \cdot 0}x_0 = I x_0 = x_0$. We've shown that this represents a solution to this time-invariant differential equation; by the existence and uniqueness theorem, it is the solution.

Through this proof, we've shown a couple of things: the derivative of the matrix exponential, and we evaluated it at $t-t_0=0$. So now let's go back and reconsider its infinite series representation and classify some of its other properties.

Properties of the matrix exponential

• $e^0 = I$
• $e^{A(t+s)} = e^{At}e^{As}$
• $e^{(A+B)t} = e^{At}e^{Bt}$ for all $t$ if $\comm{A}{B} = 0$ (i.e. if $A$ and $B$ commute).
• $\parens{e^{At}}^{-1} = e^{-At}$, and these properties hold in general if you're looking at $t$ or $t - t_0$.
• $\deriv{e^{At}}{t} = Ae^{At} = e^{At}A$ (i.e. $\comm{e^{At}}{A} = 0$)
• Suppose $X(t) \in \Re^{n \times n}$, $\dot{X} = AX, X(0) = I$, then the solution of this matrix differential equation and initial condition pair is given by $X(t) = e^{At}$. Proof in the notes; very similar to what we just did (more general proof, that the state transition matrix is just given by the matrix exponential).
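These properties are easy to spot-check numerically with `scipy.linalg.expm`, which computes the matrix exponential. A Python sketch with an illustrative $A$ and, for the commuting-matrices property, the commuting choice $B = 2A$:

```python
import numpy as np
from scipy.linalg import expm

# Numerical spot-check of the matrix exponential properties listed above.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
B = 2 * A  # commutes with A, so the product property applies
t, s = 0.7, 1.9

p1 = np.allclose(expm(A * 0.0), np.eye(2))                        # e^0 = I
p2 = np.allclose(expm(A * (t + s)), expm(A * t) @ expm(A * s))    # e^{A(t+s)} = e^{At} e^{As}
p3 = np.allclose(expm((A + B) * t), expm(A * t) @ expm(B * t))    # commuting A, B
p4 = np.allclose(np.linalg.inv(expm(A * t)), expm(-A * t))        # inverse
p5 = np.allclose(expm(A * t) @ A, A @ expm(A * t))                # e^{At} commutes with A
print(p1, p2, p3, p4, p5)  # True True True True True
```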

Calculating $e^{At}$, given $A$

What this is now useful for is making more concrete this state transition concept. Still a little abstract, since we're still considering the exponential of a matrix.

The first point is that using the infinite series representation to compute $e^{At}$ is in general hard.

Would be doable if you knew $A$ were nilpotent ($A^k = 0$ for some positive integer $k$), but it's not always feasible. Would not be feasible if $k$ large.

The way one usually computes the state transition matrix $e^{At}$ is as follows:

Recall: $\dot{X}(t) = AX(t)$, with $X(0) = I$. We know from what we've done before (property 6) that we can easily prove $X(t) = e^{At}$. We also know that $(sI - A)\hat{X}(s) = I$, so $\hat{X}(s) = (sI - A)^{-1}$. That tells me that $e^{At} = \mathcal{L}^{-1}\parens{(sI - A)^{-1}}$. That gives us a way of computing $e^{At}$, assuming we have a way to compute a matrix's inverse and an inverse Laplace transform. This is what people usually do, and most algorithms approach the problem this way. Generally hard to compute the inverse and the inverse Laplace transform.

Requires proof regarding why $sI - A$ always has an inverse (it does whenever $s$ is not an eigenvalue of $A$) and why that inverse is the Laplace transform of $e^{At}$.

Cleve Moler, who co-authored LINPACK (linear algebra package; the original engine behind MATLAB) and later created MATLAB, is famous in computational linear algebra. Paper (with Van Loan): "Nineteen Dubious Ways to Compute the Exponential of a Matrix." It is actually a hard problem in general, related to the factoring of $n$th-degree polynomials.

If we were to consider our simple nilpotent case $A = \begin{bmatrix}0 & 1 \\ 0 & 0\end{bmatrix}$, we'll compute $sI - A = \begin{bmatrix}s & -1 \\ 0 & s\end{bmatrix}$. We can immediately write down its inverse as $\begin{bmatrix}\frac{1}{s} & \frac{1}{s^2} \\ 0 & \frac{1}{s}\end{bmatrix}$. The inverse Laplace transform takes no work; it's simply $\begin{bmatrix}1 & t \\ 0 & 1\end{bmatrix}$.
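This example is easy to confirm numerically; a short Python check that the inverse-Laplace answer matches scipy's matrix exponential:

```python
import numpy as np
from scipy.linalg import expm

# Nilpotent case: A^2 = 0, so the series for e^{At} terminates, and the
# inverse Laplace transform of (sI - A)^{-1} gives [[1, t], [0, 1]].
A = np.array([[0.0, 1.0], [0.0, 0.0]])
t = 2.5
expected = np.array([[1.0, t], [0.0, 1.0]])
print(np.allclose(expm(A * t), expected))  # True
```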

In the next lecture (and next series of lectures) we will be talking about the Jordan form of a matrix. We have a way to compute $e^{At}$. We'll write $A = TJT^{-1}$. In its simplest case, it's diagonal. Either way, all of the work is in exponentiating $J$. You still end up doing something that's the inverse Laplace transform of $sI - J$.

We've shown that for a linear TI system, $\dot{x} = Ax + Bu$; $y = Cx + Du$ ($x(0) = x_0$). $x(t) = e^{At}x_0 + \int_0^t e^{A(t-\tau)} Bu(\tau) d\tau$. We proved it last time, but you can check this satisfies the differential equation and initial condition.

From that, you can compute the response function and show that it's time-invariant. Let's conclude today's class with a planar inverted pendulum. Let's call the angle of rotation away from the vertical $\theta$, mass $m$, length $\ell$, and torque $\tau$. Equations of motion: $m\ell^2 \ddot{\theta} - mg\ell \sin \theta = \tau$. Perform Jacobian linearization about the trivial trajectory $\theta^0 \equiv 0$ (the pendulum straight up). Therefore $\delta \theta = \theta \implies m\ell^2 \ddot{\theta} - mg\ell\theta = \tau$. With $u = \frac{\tau}{m\ell^2}$ and $\Omega^2 = \frac{g}{\ell}$: $\dot{x}_1 = x_2$ and $\dot{x}_2 = \Omega^2 x_1 + u$.

With states $x_1 = \theta$, $x_2 = \dot{\theta}$: $\dot{x}_1 = x_2, \dot{x}_2 = \Omega^2 x_1 + u, y = \theta = x_1$. Stabilization of the system via feedback by considering poles of the Laplace transform, etc.: $\frac{\hat{y}}{\hat{u}} = \frac{1}{s^2 - \Omega^2} = G(s)$ (the plant).

In general, not a good idea: canceling an unstable pole, and then using feedback. In the notes, this is some controller $K(s)$. If we look at the open-loop transfer function ($K(s)G(s) = \frac{1}{s(s+\Omega)}$), $u = \frac{s-\Omega}{s}\bar{u}$, so $\dot{u} = \dot{\bar{u}} - \Omega\bar{u}$ (assume zero initial conditions on $u, \bar{u}$). If we define a third state variable now, $x_3 = \bar{u} - u$, then that tells us that $\dot{x}_3 = \Omega \bar{u}$. Here, I have $A = \begin{bmatrix} 0 & 1 & 0 \\ \Omega^2 & 0 & -1 \\ 0 & 0 & 0 \end{bmatrix}$, $B = \begin{bmatrix}0 \\ 1 \\ \Omega\end{bmatrix}$, $C = \begin{bmatrix}1 & 0 & 0\end{bmatrix}$, $D = 0$. Out of time today, but we'll solve at the beginning of Tuesday's class.

Solve for $x(t) = \begin{bmatrix}x_1 & x_2 & x_3\end{bmatrix}^T$. We have a few approaches:

• Using $A,B,C,D$: compute the following: $y(t) = Ce^{At} x_0 + C\int_0^t e^{A(t - \tau)}Bu(\tau) d\tau$. In doing that, we'll need to compute $e^{At}$, and then we have this expression for general $u$: suppose you supply a step input.
• Suppose $\bar{u} = -y = -Cx$. Therefore $\dot{x} = Ax + B(-Cx) = (A - BC)x$. We have a new $A_{CL} = A - BC$, and we can exponentiate this instead.
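The second approach can be sketched numerically in Python ($\Omega = 2$ is an illustrative value). Note how it connects to the pole-cancellation warning above: the closed-loop state matrix $A_{CL} = A - BC$ still has an eigenvalue at $+\Omega$, the canceled unstable pole.

```python
import numpy as np

# Close the loop with u_bar = -y = -Cx, so xdot = (A - BC) x; Omega = 2.0 is
# an illustrative value for Omega^2 = g / l.
Omega = 2.0
A = np.array([[0.0, 1.0, 0.0],
              [Omega**2, 0.0, -1.0],
              [0.0, 0.0, 0.0]])
B = np.array([[0.0], [1.0], [Omega]])
C = np.array([[1.0, 0.0, 0.0]])

A_cl = A - B @ C
eigs = np.linalg.eigvals(A_cl)

# The canceled unstable pole at s = Omega survives as an eigenvalue of the
# closed-loop state matrix; the other two eigenvalues are stable.
unstable_mode = any(np.isclose(e, Omega) for e in eigs.real)
print(unstable_mode)  # True
```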

This foreshadows later material, when we think about control: it introduces the standard notion of feedback for stabilizing systems. Using our newfound knowledge of the state transition matrix for TI systems (how to compute it), we can see how to compute the response, and see what MATLAB is doing under the hood.
