
EE 221A: Linear System Theory

August 23, 2012

Prof. Claire Tomlin (tomlin@eecs). 721 Sutardja Dai Hall. Somewhat tentative office hours on schedule: T 1-2, W 11-12. http://inst.eecs.berkeley.edu/~ee221a

GSI: Insoon Yang (iyang@eecs). Insoon's office hours: M 1:30 - 2:30, Th 11-12.

Homeworks typically due on Thursday or Friday.

Intro

Bird's eye view of modeling in engineering + design vs. in science.

"Science":

$$\mbox{Dynamical system} \rightarrow \mbox{experiments} \leftrightarrow \mbox{Model}$$

"Engineering":

$$\mbox{dynamical system} \rightarrow \mbox{experiments} \leftrightarrow \mbox{Model} \rightarrow \mbox{control}$$

Control validation, verification, testing.

Broad brush of a couple of concepts: modeling. We're going to spend a lot of time talking about modeling in this course.

First: for any dynamical system, there's an infinite number of models you could design, depending on your level of abstraction. Typically, you choose the level of abstraction based on use case. Often, you are only able to use certain kinds of experiments, e.g. probing of protein concentration levels. If you're able to measure just this, then the signals in your model should have something to do with these concentration levels.

As we said, the same physical system can have many different models. Another example: a MEMS device. Can think about having models at various different levels, e.g. an electrical model: silicon / electrostatics of the system. Might be interested in manipulation of the device.

Alt: mechanical model (could have a free-body diagram, e.g.).

Another example: the Hubble telescope. Could think of orbital dynamics, individual rigid body dynamics, or properties of the telescope: the individual optical models of the mirrors and their interactions. The idea here is to just realize that the word "model" can mean very different things. The logical model to use depends on the task at hand. The main point, a basic tenet of engineering: the value of a model is that you choose the simplest model that answers the questions you are asking about the system, the simplest model that will allow you to predict something that you didn't build into it.

Predict I/O relations that you didn't explicitly design the model on. One of the properties of a good linear model for a system: it obeys linearity, so if you form a basis for your domain, then you have the system response to any input spanned by this basis. Probably the most important thing to take away from this course: linearity is a very strong principle that allows us to build up a set of tools.

Time

We have this term, a "dynamical system". A key part is that it changes with time, responding with behavior over time. Time will turn out to be quite important. Depending on how we model time, we can come up with different variables. We call time ($t$) a privileged variable because it has certain properties. Namely, when we think about time, we think about time marching forward (unidirectionality of evolution). Different models: continuous time ($t \in \Re$; could be negative, could go backwards, if we are interested in backwards evolution), or discrete time $t \in \{nT, n \in \mathbb{Z}\}$, where $T$ is some sampling time. So in that sense, in discrete time, we have some set. We can also come up with more complicated models of time, like discrete-time asynchronous. The previous model had some constant period $T$; in DT asynchronous, we just have a set of points in time. This is becoming a more important model now with asynchronous processes (reacting to events that will happen at previously undefined points in time).

Linear vs. nonlinear models

More on this later. Suppose we could take the system, and we could represent it as being in one of a number of states.

First: suppose a finite number of states (so it can be modeled by a FSM), which represent some configuration of the system. The state space represents the states the system can be in at any point in time. If the state space is finite, we can use a finite-state automaton. Each state has an output (prints out a message, or a measurement is taken), and we also consider inputs. The inputs are used to evolve the dynamic system. An input effects a transition. We can build up the dynamics of the system by just defining the transition function.

Packet-transmitting node: the first state is "ready-to-send"; the second state is "send packet & wait"; and the third state is "flush buffer". If the buffer is empty, stay in $q_1$. If not empty, transition to $q_2$. If an ACK is received, transition to $q_3$ and then return to $q_1$. If $T$ time units elapse, we time out and transition directly to $q_1$. Here, there is no notion of linear or nonlinear systems. To be able to talk about linear or nonlinear models, we need to be able to put some vector space structure on these three elements (states, inputs, outputs). The system must then satisfy superposition.

Back to the abstract dynamical system (the thing we could never hope to model perfectly): rather than thinking about a set of rules, we're going to think about a mathematical model. Three classes: CT, DT [synchronous], and discrete-state (typically finite). Within each of these classes we can further break each down. For the first two, we can consider linearity, and we can further break these down into time-varying (TV) and time-invariant (TI). This course is going to focus just on the linear systems in continuous and discrete time, both time-varying and time-invariant. We'll use differential equation models in continuous time and difference equation models in discrete time. We usually develop in continuous time and show analogies in discrete time.

Analysis and Control

Control is pervasive. If you go to any of the control conferences, you see areas where techniques from this course are applied. Modern control came about because of aerospace in the 50s, e.g. autopilot, air traffic control. There the system itself is the system of aircraft. Chemical process control. Mechatronics, MEMS, robotics. Novel ways to automate things that hadn't been automated previously, mostly because of a renaissance in sensing. Power systems. Network control systems: how you combine models of the system itself with the control models. Quantum chemistry. Typically, when we think about state spaces, we think about the state as a vector in $\Re^n$. In many cases, you want to think about the state spaces as more complicated (e.g. $C^\infty$, the class of smooth functions).

Difference between verification, simulation, and validation

One of the additional basic tenets of this course: if you have a model of the system, and you can analytically verify that the model behaves in given ways for ranges of initial conditions, then that is a very valuable thing to have: you have a proof that as long as the system adheres to the model, then your model will work as expected. Simulation gives you system behavior for a certain set of parameters. Very different, but they complement each other. Analyze simpler models, simulate more complex models.

Linear Algebra

Functions and their properties.

Fields, vector spaces, properties and subspaces.

(Note regarding notation: $\Re^+$ means the non-negative reals; similarly, $\mathbb{C}_+$ means complex numbers with non-negative real part.)

$\exists!$: there exists a unique; $\exists?$: does there exist; $\ni$: such that.

Cartesian product: $X \times Y = \{(x,y) \vert x \in X \land y \in Y\}$ (the set of ordered pairs)

Functions and Vector Spaces

August 28, 2012

OH: M/W 5-6, 258 Cory

Today: beginning of the course: review of the linear algebra topics needed for the course. We're going to go through lecture notes 2 and probably start on the third set of notes. Will bring copies of 3 and 4 on Thursday.

We did an introduction to notation and topics last time. First topic: functions, which will be used synonymously with "maps". Terminology will be used interchangeably.

Given two sets of elements $X$, $Y$, we defined $\fn{f}{X}{Y}$. Notion of range vs. codomain (the range is merely the subset of the codomain covered by $f$). We define $f(X) \defequals \set{f(x)}{x \in X}$ to be the range.

Properties of functions

Injectivity of functions ("one-to-one"). A function $f$ is said to be injective iff the function maps each $x$ in $X$ to a distinct $y$ in $Y$. Equivalently, $f(x_1) = f(x_2) \implies x_1 = x_2$. This is also equivalent to $x_1 \neq x_2 \implies f(x_1) \neq f(x_2)$.

Surjectivity of functions ("onto"). A function $f$ is said to be surjective iff the codomain is equal to the range. Basically, the map $f$ covers the entire codomain. A way to write this formally is that $f$ is surjective iff $\forall y \in Y, \exists x \in X \ni y = f(x)$.

And then a map $f$ is bijective iff it is both injective and surjective. We can write this formally as: for all $y \in Y$ there exists a unique $x \in X$ with $y = f(x)$.

Example: inverse of a map. We can talk about left and right inverses of maps. Suppose we have a map $\fn{f}{X}{Y}$. We're going to define the map $\mathbb{1}_X$ as the identity map on $X$. Namely, application of this map to any $x \in X$ will yield the same $x$.

The left inverse of $f$ is $\fn{g_L}{Y}{X}$ such that $g_L \circ f = \mathbb{1}_X$. In other words, $\forall x \in X, (g_L \circ f)(x) = x$.

Prove: $f$ has a left inverse $g_L$ iff $f$ is injective. First of all, let us prove the backwards implication. Assume $f$ is injective. Prove that $g_L$ exists. We're going to construct the map $\fn{g_L}{Y}{X}$ as $g_L(f(x)) = x$, where the domain here is the range of $f$. In order for this to be a well-defined function, we require that $x$ is unique, which is met by injectivity of $f$.

Now let us prove the forward implication. Assume that this left inverse $g_L$ exists. By definition, $g_L \circ f = \mathbb{1}_X \iff \forall x \in X,\ g_L(f(x)) = x$. If $f$ were not injective, then $g_L$ would not be well-defined ($\exists x_1 \neq x_2$ such that $f(x_1) = f(x_2)$, and so $g_L$ is no longer a function).

Review: contrapositive: $(A \implies B) \iff (\lnot B \implies \lnot A)$; contradiction: to show $A \implies B$, assume $A \land \lnot B$ and derive a contradiction.

We can similarly show surjectivity $\iff$ existence of a right inverse. With these two, we can then trivially show that bijectivity $\iff$ existence of an inverse (rather, both a left and a right inverse, which we can easily show must be equal). The proof will likely be part of the first homework assignment.
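As a concrete sketch of these existence results (the matrices here are assumed examples, not from the notes): a full-column-rank matrix is an injective linear map and admits a left inverse, while a full-row-rank matrix is surjective and admits a right inverse. NumPy's pseudoinverse happens to produce one such inverse in each case.

```python
import numpy as np

# An injective map R^2 -> R^3: full column rank, so a left inverse exists.
f = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
g_L = np.linalg.pinv(f)                  # one particular left inverse
assert np.allclose(g_L @ f, np.eye(2))   # g_L ∘ f = identity on the domain

# A surjective map R^3 -> R^2: full row rank, so a right inverse exists.
h = f.T
g_R = np.linalg.pinv(h)                  # one particular right inverse
assert np.allclose(h @ g_R, np.eye(2))   # h ∘ g_R = identity on the codomain
```

Note that neither inverse is unique unless the map is bijective; the pseudoinverse is just a convenient choice.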

Fields

We need the definition of a vector and a field in order to define a vectorspace.

A field is an object: a set of elements $S$ with two closed binary operations defined upon $S$. These two operations are addition (which forms an abelian group over $S$) and multiplication (which forms an abelian group over $S \setminus \{0\}$) such that multiplication distributes over addition. Note that convention dictates $0$ to be the additive identity and $1$ to be the multiplicative identity.

Other silly proofs include showing that if both a left and a right identity exist, they must be equal, or that multiplication by $0$ maps any element to $0$.

Vector spaces (linear spaces)

A vector space is a set of vectors $V$ and a field of scalars $\mathbb{F}$, combined with vector addition and scalar multiplication. Vector addition forms an abelian group, but this time, scalar multiplication has the properties of a monoid (existence of an identity and associativity). We then have the distributive laws $(\alpha + \beta)x = \alpha x + \beta x$ and $\alpha (x + y) = \alpha x + \alpha y$.

Function spaces

We define a space $F(D,V)$, where $(V, \mathbb{F})$ is a vector space and $D$ is a set: $F(D, V)$ is the set of all functions $\fn{f}{D}{V}$. Is $(F(D,V), \mathbb{F})$ a vector space (yes), where vector addition is pointwise addition of functions and scalar multiplication is pointwise multiplication by a scalar?

An example of this: the space of continuous functions on the closed interval, $(C(\bracks{t_0, t_1}, \Re^n), \Re)$. This is indeed a vector space.

Lebesgue spaces

$L_p(t_0, t_1) = \set{\fn{f}{[t_0, t_1]}{\Re}}{\int_{t_0}^{t_1} \abs{f(t)}^p dt < \infty}$.

We can then talk about $\ell_p$, which are spaces of sequences. $\ell_2$ is the space of square-summable sequences of real numbers. Informally, $\ell_2 = \set{v = \{v_1, v_2, \ldots\}}{v_k \in \Re,\ \sum_k \abs{v_k}^2 < \infty}$.

In general, when looking at vector spaces, often we use $\mathbb{F} = \Re$,and we refer to the space as simply $V$.

Next: subspaces, bases, linear dependence/independence, linearity. One of the main things we're going to do is look at properties of linear functions and their representation as multiplication by matrices.

Vector Spaces and Linearity

August 30, 2012

From last time

Subspaces, bases, linear dependence/independence, linearity. One of the main things we're going to do is look at properties of linear functions and their representation as multiplication by matrices.

Example (of a vector space)

$\ell_2 = \{v = \{v_1, v_2, ...\} \st \sum_{i=1}^\infty \abs{v_i}^2 <\infty, v_i \in \Re \}$

What is a vector subspace?

Consider a vector space $(V, \mathbb{F})$. Consider a subset $W$ of $V$ combined with the same field. $(W, \mathbb{F})$ is a subspace of $(V, \mathbb{F})$ if it is closed under vector addition and scalar multiplication (formally, it must be a vector space in its own right, but these are the only vector space properties that we need to check).

Consider vectors from $\Re^n$. A plane (in $\Re^3$) is a subspace of$\Re^3$ if it contains the origin.

Aside: for $x \in V$, $\text{span}(x) = \set{\alpha x}{\alpha \in \mathbb{F}}$.

Linear dependence, linear independence.

Consider a set of $p$ vectors $\{v_1, v_2, \ldots, v_p\}$, $v_i \in V$. This set of vectors is said to be a linearly independent set iff the only solution to the homogeneous equation is the trivial one, i.e. $\sum_i \alpha_i v_i = 0 \implies \forall i,\ \alpha_i = 0$. This is equivalent to saying that no one vector can be written as a linear combination of the others.

Otherwise, the set is said to be linearly dependent.
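Numerically, the definition above can be checked by stacking the vectors as columns of a matrix (the vectors below are assumed examples): the set is linearly independent iff the matrix's rank equals the number of vectors.

```python
import numpy as np

# Two independent vectors in R^3, as columns of a 3x2 matrix.
V_indep = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 1.0]])

# Append a third column that is a linear combination of the first two.
V_dep = np.column_stack([V_indep, V_indep @ np.array([1.0, -2.0])])

assert np.linalg.matrix_rank(V_indep) == 2   # rank = #vectors: independent
assert np.linalg.matrix_rank(V_dep) == 2     # rank < 3: the set of 3 is dependent
```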

Bases

Recall: a set of vectors $W$ is said to span a space $(V, \mathbb{F})$ if any vector in the space can be written as a linear combination of vectors in the set, i.e. $\forall v \in V, \exists \set{(\alpha_i, w_i)}{v = \sum \alpha_i w_i}$ for $w_i \in W, \alpha_i \in \mathbb{F}$.

W is a basis iff it is also linearly independent.

Coordinates

Given a basis $B$ of a space $(V, \mathbb{F})$, there is a unique representation (trivial proof) of every $v \in V$ as a linear combination of elements of $B$. We define our coordinates to be the coefficients that appear in this unique representation. A visual representation is the coordinate vector, which defines

$$\alpha = \begin{bmatrix}\alpha_1 \\ \vdots \\ \alpha_n \end{bmatrix}$$

A basis is not uniquely defined, but what is constant is the number of elements in the basis. This number is the dimension (rank) of the space. Another notion is that a basis generates the corresponding space, since once you have a basis, you can acquire any element in the space.
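In coordinates, finding the coefficients means solving a linear system: with the basis vectors as columns of $B$, the coordinate vector $\alpha$ satisfies $B\alpha = v$ (the basis and vector below are assumed examples).

```python
import numpy as np

# Basis vectors of R^2 as columns; they are linearly independent,
# so B is invertible and the coordinates are unique.
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])
v = np.array([3.0, 2.0])

alpha = np.linalg.solve(B, v)      # coordinate vector of v w.r.t. basis B
assert np.allclose(alpha, [1.0, 2.0])
assert np.allclose(B @ alpha, v)   # the unique representation reconstructs v
```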

Linearity

A function $\fn{f}{(V, \mathbb{F})}{(W, \mathbb{F})}$ (note that these spaces are defined over the same field!) is linear iff $f(\alpha_1 v_1 + \alpha_2 v_2) = \alpha_1 f(v_1) + \alpha_2 f(v_2)$.

This property is known as superposition, which is an amazing property, because if you know what this function does to the basis elements of a vector space, then you know what it does to any element in the space.

An interesting corollary is that a linear map will always map the zerovector to itself.

Definitions associated with linear maps

Suppose we have a linear map $\fn{\mathcal{A}}{U}{V}$. The range (image) of $\mathcal{A}$ is defined to be $R(\mathcal{A}) = \set{v}{v = \mathcal{A}(u), u \in U} \subset V$. The nullspace (kernel) of $\mathcal{A}$ is defined to be $N(\mathcal{A}) = \set{u}{\mathcal{A}(u) = 0} \subset U$. It is also trivial (from the definition of linearity) to prove that these are subspaces.

We have a couple of very important properties now that we've defined the range and nullspace.

Properties of linear maps $\fn{\mathcal{A}}{U}{V}$

$$\text{For } b \in V: \quad \exists u \st \mathcal{A}(u) = b \iff b \in R(\mathcal{A})$$

$$\text{For } b \in R(\mathcal{A}): \quad \exists!\ u \st \mathcal{A}(u) = b \iff N(\mathcal{A}) = \{0\}$$

(if the nullspace only contains the zero vector, we say it is trivial)

$$\mathcal{A}(x_0) = \mathcal{A}(x_1) \iff x_1 - x_0 \in N(\mathcal{A})$$

Matrix Representation of Linear Maps

September 4, 2012

Today

Matrix multiplication as a representation of a linear map; change of basis -- what happens to matrices; norms; inner products. We may get to adjoints today.

Last time, we talked about the concept of the range and the nullspace of a linear map, and we ended with a relationship that related properties of the nullspace to properties of the linear equation $\mathcal{A}(x) = b$. As we've written here, this is not matrix multiplication. As we'll see today, it can be represented as matrix multiplication, in which case we'll write this as $Ax = b$.

There's one more important result, called the rank-nullity theorem. We defined the range and nullspace of a linear operator. We also showed that these are subspaces (the range of the codomain; the nullspace of the domain). We call $\text{dim}(R(\mathcal{A})) = \text{rank}(\mathcal{A})$ and $\text{dim}(N(\mathcal{A})) = \text{nullity}(\mathcal{A})$. Taking the dimension of the domain as $n$ and the dimension of the codomain as $m$, $\text{rank}(\mathcal{A}) + \text{nullity}(\mathcal{A}) = n$. Left as an exercise. Hints: choose a basis for the nullspace. Presumably you'd extend it to a basis for the domain (without loss of generality, because any set of $n$ linearly independent vectors will form a basis). Then consider how these relate to the range of $\mathcal{A}$. Then map $\mathcal{A}$ over this basis.
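The theorem can be spot-checked numerically (the matrix below is an assumed example): the SVD gives both the rank (number of nonzero singular values) and a basis for the nullspace (the remaining right singular vectors), and the two dimensions sum to $n$.

```python
import numpy as np

# A 2x3 matrix of rank 1: second row is twice the first.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])

U, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-12))        # number of nonzero singular values
kernel_basis = Vt[rank:]             # remaining right singular vectors span N(A)

assert rank + kernel_basis.shape[0] == A.shape[1]   # rank + nullity = n = 3
assert np.allclose(A @ kernel_basis.T, 0)           # they really lie in N(A)
```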

Matrix representation

Any linear map between finite-dimensional vector spaces can be represented as matrix multiplication. We're going to show that it's true via construction.

$\fn{\mathcal{A}}{U}{V}$. We're going to choose bases for the domain and codomain: $\forall x \in U,\ x = \sum_{j=1}^n \xi_j u_j$. Now consider $\mathcal{A}(x) = \mathcal{A}(\sum_{j=1}^n \xi_j u_j) = \sum_{j=1}^n \xi_j \mathcal{A}(u_j)$ (through linearity). Each $\mathcal{A}(u_j) = \sum_{i=1}^m a_{ij} v_i$. Uniqueness of the $a_{ij}$ and $\xi_j$ follows from writing the vectors in terms of a basis.

$$\mathcal{A}(x) = \sum_{j=1}^n \xi_j \sum_{i=1}^m a_{ij} v_i\\ = \sum_{i=1}^m \left(\sum_{j=1}^n a_{ij} \xi_j\right) v_i\\ = \sum_{i=1}^m \eta_i v_i$$

Uniqueness of representation tells me that $\eta_i \equiv \sum_{j=1}^n a_{ij} \xi_j$. We've got $i \in \{1, \ldots, m\}$ and $j \in \{1, \ldots, n\}$. We can turn this representation into a matrix equation by defining $\eta = A\xi$, where $A \in \mathbb{F}^{m \times n}$ is defined such that its $j^{\text{th}}$ column is $\mathcal{A}(u_j)$ written with respect to the $v_i$s.

All we used here was the definitions of basis, coordinate vectors, andlinearity.
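The construction is mechanical enough to code directly (the map below, rotation by 90 degrees, is an assumed example): apply the map to each domain basis vector and use the results as columns.

```python
import numpy as np

# An abstract linear map on R^2: rotation by 90 degrees counterclockwise.
def rotate90(x):
    return np.array([-x[1], x[0]])

# j-th column of A = image of the j-th (standard) basis vector.
basis = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
A = np.column_stack([rotate90(u) for u in basis])

# The matrix now reproduces the map on an arbitrary vector.
x = np.array([3.0, 4.0])
assert np.allclose(A @ x, rotate90(x))   # both give [-4, 3]
```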

Let's do a couple of examples. Foreshadowing of work later in controllability of systems. Consider a linear map $\fn{\mathcal{A}}{(\Re^n, \Re)}{(\Re^n, \Re)}$. Try to derive the matrix $A \in \Re^{n \times n}$. Both the domain and codomain have as basis $\{b, \mathcal{A}(b), \mathcal{A}^2(b), \ldots, \mathcal{A}^{n-1}(b)\}$, where $b \in \Re^n$ and $\mathcal{A}^n(b) = -\sum_{i=1}^n \alpha_i \mathcal{A}^{n-i}(b)$. Your task is to show that the representation of $b$ and $\mathcal{A}$ is:

$$\bar{b} = \begin{bmatrix}1 \\ 0 \\ \vdots \\ 0\end{bmatrix}, \qquad \bar{A} = \begin{bmatrix} 0 & 0 & \dots & 0 & -\alpha_n \\ 1 & 0 & \dots & 0 & -\alpha_{n-1} \\ 0 & 1 & \dots & 0 & -\alpha_{n-2} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \dots & 1 & -\alpha_1 \end{bmatrix}$$

This is really quite simple; it's almost by definition.
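A quick sanity check of this companion-form matrix (the coefficients $\alpha_i$ below are assumed example values): its characteristic polynomial should be $s^n + \alpha_1 s^{n-1} + \dots + \alpha_n$, which is exactly the recursion defining $\mathcal{A}^n(b)$.

```python
import numpy as np

alpha = np.array([2.0, -3.0, 5.0])    # alpha_1, alpha_2, alpha_3 (n = 3)
n = len(alpha)

# Build bar_A: ones on the subdiagonal, last column -alpha_n, ..., -alpha_1.
bar_A = np.zeros((n, n))
bar_A[1:, :-1] = np.eye(n - 1)
bar_A[:, -1] = -alpha[::-1]

# np.poly returns characteristic polynomial coefficients [1, c_1, ..., c_n];
# here they should match [1, alpha_1, ..., alpha_n].
assert np.allclose(np.poly(bar_A), np.concatenate(([1.0], alpha)))
```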

Note that these are composable maps, where composition corresponds to matrix multiplication.

Change of basis

Suppose we have $\fn{\mathcal{A}}{U}{V}$ and two sets of bases for the domain and codomain. There exist maps between the first set of bases and the second set; composing those appropriately will give you your change of basis. Essentially, do a change of coordinates to those in which $A$ is defined (represented as $P$), apply $A$, then change the coordinates of the codomain back (represented as $Q$). Thus $\bar{A} = QAP$.

If $V = U$, then you can easily derive that $Q = P^{-1}$, so $\bar{A} =P^{-1}AP$.

We consider the transformation $\bar{A} = P^{-1}AP$ to be a similarity transformation, and $A$ and $\bar{A}$ are called similar; in the general case $\bar{A} = QAP$, the matrices are called equivalent.

We derived these two matrices from the same linear map, but they're derivedusing different bases.
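Since similar matrices represent the same map in different bases, basis-independent quantities must agree; a quick numerical check (random matrices below are assumed examples) compares characteristic polynomials and traces.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
P = rng.standard_normal((3, 3))     # columns: a new basis (invertible w.p. 1)

bar_A = np.linalg.inv(P) @ A @ P    # same map, represented in the new basis

# Similar matrices share characteristic polynomial, hence eigenvalues.
assert np.allclose(np.poly(bar_A), np.poly(A))
assert np.isclose(np.trace(bar_A), np.trace(A))
```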

Proof of Sylvester's inequality on homework 2.

One last note: the rank of a linear map corresponds to the rank of the associated matrix representation, that is, $\text{dim}(R(A)) = \text{dim}(R(\mathcal{A}))$. Similarly, $\text{nullity}(A) = \text{dim}(N(A)) = \text{dim}(N(\mathcal{A}))$.

Sylvester's inequality, which is an important relationship, says the following: suppose you have $A \in \mathbb{F}^{m \times n}$ and $B \in \mathbb{F}^{n \times p}$; then $AB \in \mathbb{F}^{m \times p}$, and $\text{rk}(A) + \text{rk}(B) - n \le \text{rk}(AB) \le \min(\text{rk}(A), \text{rk}(B))$. On the homework, you'll have to show both inequalities. Note at the end about elementary row operations.
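A numerical spot-check of both inequalities (the rank-deficient factors below are assumed examples, constructed so the ranks are known):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3)) @ rng.standard_normal((3, 5))   # 4x5, rank <= 3
B = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 6))   # 5x6, rank <= 2

rk = np.linalg.matrix_rank
n = A.shape[1]   # inner dimension: A is m x n, B is n x p

# Sylvester: rk(A) + rk(B) - n <= rk(AB) <= min(rk(A), rk(B))
assert rk(A) + rk(B) - n <= rk(A @ B) <= min(rk(A), rk(B))
```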

Next important concept about vector spaces: that of norms.

Norms

With some vector spaces, you can associate some entity called a norm. We can then speak of a normed vector space (which is in particular a metric space, with metric $d(x,y) = \mag{x - y}$). Suppose you have a vector space $(V, \mathbb{F})$, where $\mathbb{F}$ is either $\Re$ or $\mathbb{C}$. This is a normed space if you can find $\fn{\mag{\cdot}}{V}{\Re_+}$ that satisfies the following axioms:

$\mag{v_1 + v_2} \le \mag{v_1} + \mag{v_2}$

$\mag{\alpha v} = \abs{\alpha}\mag{v}$

$\mag{v} = 0 \iff v = \theta$

We have some common norms on these fields:

$\mag{x}_1 = \sum_{i=1}^n \abs{x_i}$ ($\ell_1$)

$\mag{x}_2 = \left(\sum_{i=1}^n \abs{x_i}^2\right)^{1/2}$ ($\ell_2$)

$\mag{x}_p = \left(\sum_{i=1}^n \abs{x_i}^p\right)^{1/p}$ ($\ell_p$)

$\mag{x}_\infty = \max \abs{x_i}$ ($\ell_\infty$)
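These four norms are available directly in NumPy (the vector below is an assumed example):

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

assert np.linalg.norm(x, 1) == 7.0                          # |3| + |-4| + |0|
assert np.linalg.norm(x, 2) == 5.0                          # sqrt(9 + 16)
assert np.isclose(np.linalg.norm(x, 3), (27 + 64) ** (1/3)) # general ell_p
assert np.linalg.norm(x, np.inf) == 4.0                     # max |x_i|
```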

One of the most important norms that we'll be using is the induced norm, that induced by a linear operator. We'll define $\mathcal{A}$ to be a continuous linear map between two normed spaces; the induced norm is defined as

$$\mag{\mathcal{A}}_i = \sup_{u \neq \theta}\frac{\mag{\mathcal{A}u}_V}{\mag{u}_U}$$

From analysis: the supremum of a set $S$ is the least upper bound (the smallest $x$ such that $x \ge y$ for all $y \in S$).

Guest Lecture: Induced Norms and Inner Products

September 6, 2012

Induced norms of matrices

The reason that we're going to start talking about induced norms: today we're just going to build abstract algebra machinery, and at the end, we'll do the first application: least squares. We'll see why we need this machinery and why abstraction is a useful tool.

The idea is that we want to find a norm on a matrix using existing norms on vectors.

Let 1) $\fn{A}{(U,\mathbb{F})}{(V,\mathbb{F})}$, 2) let $U$ have the norm $\mag{\cdot}_u$, 3) let $V$ have the norm $\mag{\cdot}_v$. Let the induced norm be $\mag{A}_{u,v} = \sup_{x\neq 0} \frac{\mag{Ax}_v}{\mag{x}_u}$. Theorem: the induced norm is a norm. Not going to bother showing positive homogeneity and the triangle inequality (trivial in this case). Only going to show the last property: it separates points; essentially, $\mag{A}_{u,v} = 0 \iff A = 0$. The reason that this is not necessarily trivial is because of the supremum. It's a complex operator that's trying to maximize this function over an infinite set of points. It's possible that the supremum is not actually attained at a finite point.

The first direction is easy: if $A$ is zero, then its norm is 0 (by definition -- the numerator is 0).

The second direction is the hard one. If $\mag{A}_{u,v} = 0$, then given any $x \neq 0$, it holds that $\frac{\mag{Ax}_v}{\mag{x}_u} \le 0$ (from the definition of the supremum). The denominator must be positive (being the norm of a nonzero vector), and the numerator must be non-negative (also being a norm). Thus the ratio is also bounded below by zero, which means that the numerator is zero for all nonzero $x$. Thus everything is in the nullspace of $A$, which means that $A$ is zero.

Proposition: the induced norm satisfies (a) $\mag{Ax}_v \le \mag{A}_{u,v}\mag{x}_u$; (b) $\mag{AB}_{u,v} \le \mag{A}_{u,v} \mag{B}_{u,v}$. (b) follows from (a).

Not emphasized in Claire's notes: induced norms form a small subset of all possible norms on matrices.

Examples of induced norms:

• $\mag{A}_{1,1} = \max_j \sum_i \abs{a_{ij}}$: maximum column sum;
• $\mag{A}_{2,2} = \max_j \sqrt{\lambda_j(A^T A)}$: maximum singular value norm;
• $\mag{A}_{\infty, \infty} = \max_i \sum_j \abs{a_{ij}}$: maximum row sum.
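These three induced norms can be cross-checked against NumPy (the matrix below is an assumed example):

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

# ||A||_{1,1}: maximum absolute column sum.
assert np.linalg.norm(A, 1) == max(1 + 3, 2 + 4)          # = 6

# ||A||_{inf,inf}: maximum absolute row sum.
assert np.linalg.norm(A, np.inf) == max(1 + 2, 3 + 4)     # = 7

# ||A||_{2,2}: largest singular value = sqrt(max eigenvalue of A^T A).
sigma_max = np.sqrt(np.max(np.linalg.eigvalsh(A.T @ A)))
assert np.isclose(np.linalg.norm(A, 2), sigma_max)
```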

Other matrix norms exist: special cases of the Schatten norms include (a) the Frobenius norm $\sqrt{\text{trace}(A^T A)}$, which is also the square root of the sum of the squared singular values, and (b) the nuclear norm, the sum of the singular values.

Statistical regularization: the Frobenius norm is analogous to $\ell_2$ regularization; the nuclear norm is analogous to $\ell_1$ regularization. It is important to be aware that these other norms exist.

Sensitivity analysis

A nice application of norms, but we won't see that it's a nice application until later.

Computation for numerical linear algebra.

Some algebra can be performed to show that if $Ax_0 = b$ (when $A$ is invertible), then for $(A + \delta A)(x_0 + \delta x) = b + \delta b$, we have an approximate bound of $\frac{\mag{\delta x}}{\mag{x_0}} \le \mag{A}\mag{A^{-1}} \bracks{\frac{\mag{\delta A}}{\mag{A}} + \frac{\mag{\delta b}}{\mag{b}}}$. Need to engineer the computation to improve the situation. Namely, we're perturbing $A$ and $b$ slightly: how much can the solution vary? In some sense, we have a measure of effect ($\mag{A}\mag{A^{-1}}$) and a measure of perturbation. The first quantity is important enough that people in linear algebra have defined it and called it a condition number: $\kappa(A) = \mag{A}\mag{A^{-1}} \ge 1$. The best you can do is 1. If you have a condition number of 1, your system is well-conditioned and very robust to perturbations. A larger condition number will mean less robustness to perturbation.
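A small experiment illustrating the bound (the matrices and perturbation below are assumed examples): an ill-conditioned system amplifies the same tiny perturbation of $b$ far more than a well-conditioned one.

```python
import numpy as np

def relative_change(A, b, db):
    """Relative change in the solution of Ax = b when b is perturbed by db."""
    x = np.linalg.solve(A, b)
    dx = np.linalg.solve(A, b + db) - x
    return np.linalg.norm(dx) / np.linalg.norm(x)

b = np.array([1.0, 1.0])
db = 1e-6 * np.array([1.0, -1.0])          # tiny perturbation of b

well = np.eye(2)                           # kappa = 1: best possible
ill = np.array([[1.0, 1.0],
                [1.0, 1.0001]])            # nearly singular: huge kappa

assert np.linalg.cond(well) == 1.0
assert np.linalg.cond(ill) > 1e4
assert relative_change(ill, b, db) > relative_change(well, b, db)
```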

More machinery: Inner Product & Hilbert Spaces

Consider a linear space $(H, \mathbb{F})$, and define a function $\fn{\braket{\cdot}{\cdot}}{H \times H}{\mathbb{F}}$. This function is an inner product if it satisfies the following properties.

• Conjugate symmetry. $\braket{x}{y} = \braket{y}{x}^*$.
• Homogeneity. $\braket{x}{\alpha y} = \alpha \braket{x}{y}$.
• Linearity. $\braket{x}{y + z} = \braket{x}{y} + \braket{x}{z}$.
• Positive definiteness. $\braket{x}{x} \ge 0$, where equality only occurs  when $x = 0$.

Inner product spaces have a natural norm (this might not be the official name): the norm induced by the inner product.

One can define $\mag{x}^2 = \braket{x}{x}$, which satisfies the axioms of a norm.

Examples of Hilbert spaces: finite-dimensional vector spaces. Most of the time, infinite-dimensional Hilbert spaces behave much like finite-dimensional ones. All linear operators on finite-dimensional vector spaces are continuous because they can be written as a matrix (not always the case in infinite-dimensional vector spaces). Suppose we have the field $\mathbb{F}$; one example is $(\mathbb{F}^n, \mathbb{F})$ with the inner product $\braket{x}{y} = \sum_i \bar{x_i} y_i$. Another important inner product space is the space of square-integrable functions, $L^2([t_0, t_1], \mathbb{F}^n)$: an infinite-dimensional space, which is in fact the space spanned by Fourier series. It turns out that the inner product (of functions) is $\int_{t_0}^{t_1} f(t)^* g(t) dt$.

We're going to power through a little more machinery, but we're getting very close to the application. We need to go through adjoints and orthogonality before we can start doing applications.

Consider Hilbert spaces $(U, \mathbb{F}, \braket{}{}_u)$ and $(V, \mathbb{F}, \braket{}{}_v)$, and let $\fn{A}{U}{V}$ be a continuous linear function. The adjoint of $A$ is denoted $A^*$ and is the map $\fn{A^*}{V}{U}$ such that $\braket{x}{Ay}_v = \braket{A^*x}{y}_u$.

Reasoning? Sometimes you can simplify things. Suppose $A$ maps an infinite-dimensional space to a finite-dimensional space (e.g. functions to numbers). In some sense, you can convert that map into one that goes the other way (from numbers back to functions). It is a generalization of the Hermitian transpose.

Consider functions $f, g \in C([t_0, t_1], \Re^n)$. What is the adjoint of $\fn{A}{C([t_0, t_1], \Re^n)}{\Re}$, where $A[f] = \braket{g}{f}_{C([t_0, t_1], \Re^n)}$? (Aside: this notion of the adjoint will be very important when we get to observability and reachability.)

Observe that $\braket{v}{A[f]}_\Re = v \cdot A[f] = v \braket{g}{f}_C = \braket{vg}{f}_C$, and so consequently the adjoint is $A^*[v] = v g$.

Orthogonality

With Hilbert spaces, one can define orthogonality in an axiomatic manner (a more abstract form, rather). Let $(H, \mathbb{F}, \braket{}{})$ be a Hilbert space. Two vectors $x, y$ are defined to be orthogonal if $\braket{x}{y} = 0$.

Cute example: suppose $c = a + b$ and $a, b$ are orthogonal. In fact, $\mag{c}^2 = \mag{a + b}^2 = \braket{a + b}{a + b} = \braket{a}{a} + \braket{b}{b} + \braket{a}{b} + \braket{b}{a} = \mag{a}^2 + \mag{b}^2$, since the cross terms vanish by orthogonality. Cute because the result is the Pythagorean theorem, which we got just through these axioms.

One more thing: the orthogonal complement of a subspace $M$ in a Hilbert space is defined as $M^\perp = \set{y \in H}{\braket{x}{y} = 0 \; \forall x \in M}$.

We are at a point now where we can talk about an important theorem:

Fundamental Theorem of Linear Algebra (partially)

Let $A \in \Re^{m \times n}$. Then:

• $R(A) \perp N(A^T)$
• $R(A^T) \perp N(A)$
• $R(AA^T) = R(A)$
• $R(A^TA) = R(A^T)$
• $N(AA^T) = N(A)$
• $N(A^TA) = N(A^T)$

Proofs:

• Given any $x \in \Re^n, y \in \Re^m \st A^T y = 0$ ($y \in N(A^T)$),  consider the quantity $\braket{y}{Ax} = \braket{A^Ty}{x} = 0$.

• Given any $x \in \Re^n$, $\exists y \in \Re^m \st x = A^T y + z$, where $z \in N(A)$ (as a result of the decomposition above). Thus $Ax = AA^Ty$. This implies that $R(A) \subset R(A A^T)$.
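
These statements are easy to spot-check numerically (the matrix below is a hypothetical random example; `matrix_rank` and the SVD are used only as verification tools):

```python
import numpy as np

rng = np.random.default_rng(1)
# A random tall matrix (hypothetical example; any A works).
A = rng.standard_normal((5, 3))

rank = np.linalg.matrix_rank

# R(AA^T) = R(A): the ranks agree, and stacking the blocks side by
# side does not increase the rank, so the column spaces coincide.
assert rank(A @ A.T) == rank(A)
assert rank(np.hstack([A @ A.T, A])) == rank(A)

# R(A) is orthogonal to N(A^T): build a vector in N(A^T) from the SVD
# (for a rank-3 A in R^5, the last left-singular vectors span N(A^T)).
U, s, Vt = np.linalg.svd(A)
y = U[:, -1]
assert np.allclose(A.T @ y, 0)           # y is in N(A^T)
x = rng.standard_normal(3)
assert abs(np.dot(y, A @ x)) < 1e-10     # y is orthogonal to Ax
```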

Now for the application.

Application: Least Squares

Consider the following problem: minimize $\mag{y - Ax}_2$, where $y \not\in R(A)$. If $y$ were in the range of $A$, and $A$ were invertible, the solution would be trivial ($A^{-1}y$). In many problems, $A \in \Re^{m\times n}$, where $m \gg n$, $y \in \Re^m$, $x \in \Re^n$.

Since we cannot solve $Ax = y$, we instead solve $Ax = \hat{y}$. According to our intuition, we would like $y - \hat{y}$ to be orthogonal to $R(A)$. From the preceding (partial) theorem, this means that $y - \hat{y} \in N(A^T) \iff A^T(y - \hat{y}) = 0$. Remember: what we really want to solve is $A^T(y - Ax) = 0 \implies A^T Ax = A^T y \implies x = (A^T A)^{-1} A^T y$ if $A^T A$ is invertible.

If $A$ has full column rank (that is, for $A \in \Re^{m \times n}$, we have $\text{rank}(A) = n$), then this means that in fact $N(A) = \{0\}$, and the preceding theorem implies that the dimension of $R(A^T)$ is $n$, which means that the dimension of $R(A^T A)$ is $n$. However, $A^T A \in \Re^{n \times n}$. Thus, $A^T A$ is invertible.
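
The normal-equations solution can be sketched with numpy on a hypothetical overdetermined system, and checked against the orthogonality condition above and against numpy's own least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(2)

# Overdetermined system: m >> n, so y is generally not in R(A).
m, n = 100, 3
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)

# Normal-equations solution x = (A^T A)^{-1} A^T y.
x = np.linalg.solve(A.T @ A, A.T @ y)

# The residual y - Ax must be orthogonal to R(A), i.e. lie in N(A^T).
residual = y - A @ x
assert np.allclose(A.T @ residual, 0)

# Agrees with the library least-squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose(x, x_lstsq)
```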

Back to condition numbers (special case)

Consider a self-adjoint and invertible matrix $A \in \Re^{n \times n}$. Then $\hat{x} = (A^T A)^{-1} A^T y = A^{-1} y$. We have two ways of determining this value: the overdetermined least-squares solution and the standard inverse. Let us look at the condition numbers.

$\kappa(A^T A) = \mag{A^T A}\mag{(A^T A)^{-1}} = \mag{A}^2\mag{A^{-1}}^2 = \bracks{\kappa(A)}^2$. This result is more general: it also applies in the induced 2-norm case even if $A$ is not self-adjoint. As you can see, this is worse than if we simply use the inverse.
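
The squaring of the condition number is easy to observe numerically (hypothetical random matrix; `np.linalg.cond` uses the induced 2-norm by default):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))

# Forming the normal equations squares the 2-norm condition number:
# sigma_max(A^T A) = sigma_max(A)^2 and likewise for sigma_min.
kappa_A = np.linalg.cond(A)
kappa_AtA = np.linalg.cond(A.T @ A)
assert np.isclose(kappa_AtA, kappa_A ** 2, rtol=1e-6)
```

This is one practical reason to solve least-squares problems via QR or SVD rather than by explicitly forming $A^T A$.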

Gram-Schmidt orthonormalization

This is a theoretical toy, not used for computation (numerics are very bad).

More definitions:

A set of vectors $S$ is orthogonal if $x \perp y$ for all $x \neq y$ with $x, y \in S$.

The set is orthonormal if, in addition, $\mag{x} = 1, \forall x \in S$. Why do we care about orthonormality? Consider Parseval's theorem. The reason you get that theorem is that the bases are required to be orthonormal, so that you can get that result; otherwise it wouldn't be as clean. That's typically why people like orthonormal bases: you can represent your vectors as just coefficients (and you don't need to store the lengths of the vectors).

We conclude with an example of Gram-Schmidt orthonormalization. Consider the space $L^2([t_0, t_1], \Re)$. Suppose I have $v_1 = 1, v_2 = t, v_3 = t^2$, $t_0 = 0$, $t_1 = 1$, and $\mag{v_1}^2 = \int_0^1 1 \cdot 1 \, dt = 1$. The key idea of Gram-Schmidt orthonormalization is the following: start with $b_1 \equiv \frac{v_1}{\mag{v_1}}$. Then go on with $b_2 = \frac{v_2 - \braket{v_2}{b_1}b_1}{\mag{v_2 - \braket{v_2}{b_1}b_1}}$, and repeat until you're done (in essence: you want to preserve only the component that is orthogonal to the space spanned by the vectors you've computed so far, then renormalize).

Carrying out this computation ($\braket{v_2}{b_1} = \tfrac{1}{2}$ and $\mag{t - \tfrac{1}{2}}^2 = \tfrac{1}{12}$), you get $b_2 = \sqrt{3}\,(2t - 1)$. Same construction for $b_3$.
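
A numerical sketch of this computation, discretizing $[0, 1]$ and approximating the $L^2$ inner product with the trapezoidal rule (the grid size and helper `ip` are arbitrary choices, not from the notes):

```python
import numpy as np

# Discretize [0, 1]; approximate <f, g> = integral of f*g by trapezoids.
t = np.linspace(0.0, 1.0, 100001)

def ip(f, g):
    h = f * g
    return float(np.sum(0.5 * (h[1:] + h[:-1]) * np.diff(t)))

v1, v2 = np.ones_like(t), t.copy()

# Gram-Schmidt: normalize v1, then strip its component out of v2.
b1 = v1 / np.sqrt(ip(v1, v1))
w2 = v2 - ip(b1, v2) * b1
b2 = w2 / np.sqrt(ip(w2, w2))

# Analytically, b2(t) = sqrt(3) * (2t - 1); b1, b2 are orthonormal.
assert np.allclose(b2, np.sqrt(3) * (2 * t - 1), atol=1e-3)
assert abs(ip(b1, b2)) < 1e-8
assert abs(ip(b2, b2) - 1.0) < 1e-8
```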

Singular Value Decomposition & Introduction to Differential Equations

September 11, 2012

Reviewing the adjoint: suppose we have two vector spaces $U, V$; as we did with norms, let us associate a field that is either $\Re$ or $\mathbb{C}$. Assume that these spaces are inner product spaces (we're associating with each an inner product). Suppose we have a continuous (linear) map $\fn{\mathcal{A}}{U}{V}$. We define the adjoint of this map to be $\fn{\mathcal{A}^*}{V}{U}$ such that $\braket{u}{\mathcal{A} v} = \braket{\mathcal{A}^* u}{v}$.

We define self-adjoint maps as maps that are equal to their adjoints, i.e. $\fn{\mathcal{A}}{U}{U} \st \mathcal{A} = \mathcal{A}^*$.

In finite-dimensional vector spaces, the adjoint of a map is equivalent to the conjugate transpose of the matrix representation of the map. We refer to matrices that correspond to self-adjoint maps as hermitian.

Unitary matrices

Suppose that we have $U \in \mathbb{F}^{n\times n}$. $U$ is unitary iff $U^*U = UU^* = I_n$. If $\mathbb{F}$ is $\Re$, the matrix is called orthogonal.

These constructions lead us to something useful: singular value decomposition. We'll come back to this later when we talk about matrix operations.

Singular Value Decomposition (SVD)

Suppose you have a matrix $M \in \mathbb{F}^{m\times m}$. A complex number $\lambda$ is an eigenvalue of $M$ iff there exists a nonzero vector $v$ such that $Mv = \lambda v$ ($v$ is then called an eigenvector associated to $\lambda$). Now we can think about how to define the singular values of a matrix in terms of these definitions.

Let us think about this in general for a matrix $A \in \mathbb{F}^{m \times n}$ (which we consider to be a matrix representation of some linear map with respect to a basis). Note that $A A^* \in \mathbb{F}^{m \times m}$, which will have $m$ eigenvalues $\lambda_i, i = 1 \dots m$.

Note that $AA^*$ is hermitian. From the spectral theorem, we can decompose the matrix using an orthonormal basis of eigenvectors corresponding to real eigenvalues. In fact, in this case, the eigenvalues must be real and non-negative.

If we write the eigenvalues of $AA^*$ as $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m$, where the first $r$ are nonzero, note that $r = \text{rank}(AA^*)$. We define the non-zero singular values of $A$ to be $\sigma_i = \sqrt{\lambda_i}, i \le r$. The remaining singular values are zero.

Recall the induced 2-norm: let us relate this notion of singular values back to the induced 2-norm of a matrix $A$ ($\mag{A}_{2,2}$). Consider the induced norm to be the norm induced by the action of $A$ on the domain of $A$; thus if we take the induced 2-norm, then this is $\max_i (\lambda_i(A^*A))^{1/2}$, which is simply the maximum singular value.

Now that we know what singular values are, we can perform a useful decomposition called the singular value decomposition.

Take $A \in \mathbb{C}^{m \times n}$. We have the following theorem: there exist unitary matrices $U \in \mathbb{C}^{m \times m}, V \in \mathbb{C}^{n \times n}$ such that $A = U \Sigma V^*$, where $\Sigma$ is defined as a diagonal matrix containing the singular values of $A$. Consider the first $r$ columns of $U$ to be $U_1$, the first $r$ columns of $V$ to be $V_1$, and the $r \times r$ block of $\Sigma$ containing the nonzero singular values to be $\Sigma_r$. Then $A = U \Sigma V^* = U_1 \Sigma_r V_1^*$.

Consider $AA^*$. With a bit of algebra, we can show that $AA^*U_1 = U_1 \Sigma_r^2$. The columns $u_i$ of $U_1$ are thus eigenvectors of $AA^*$ associated to the eigenvalues $\sigma_i^2$; these are called the left-singular vectors.

Similarly, if we consider $A^*A$, we can show that $A^*A V_1 = V_1 \Sigma_r^2$; the columns of $V_1$ are eigenvectors of $A^*A$ and are called the right-singular vectors.
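
These relationships can all be verified with numpy's SVD routine on a hypothetical complex matrix (numpy returns $V^*$ as `Vh`, so `A = U @ Sigma @ Vh`):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6))

U, s, Vh = np.linalg.svd(A)   # A = U @ Sigma @ Vh, with Vh = V^*

# Reconstruct A from the decomposition.
Sigma = np.zeros((4, 6))
Sigma[:4, :4] = np.diag(s)
assert np.allclose(U @ Sigma @ Vh, A)

# Singular values are square roots of the eigenvalues of A A^*.
eig = np.sort(np.linalg.eigvalsh(A @ A.conj().T))[::-1]
assert np.allclose(np.sqrt(eig), s)

# Columns of U (left-singular vectors) are eigenvectors of A A^*,
# and the induced 2-norm of A is the largest singular value.
assert np.allclose((A @ A.conj().T) @ U[:, 0], s[0] ** 2 * U[:, 0])
assert np.isclose(np.linalg.norm(A, 2), s[0])
```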

Recap

We've covered a lot of ground these past few weeks: we covered functions, vector spaces, bases, and then we started to consider linearity. Then we started talking about endowing vector spaces with things like norms, inner products, and induced norms. From that, we went on to talk about adjoints. Using adjoints, we went on to talk a little about projection and least-squares optimization. We then went on to talk about Hermitian matrices and singular value decomposition. I think about this first unit as having many basic units that we'll use over and over again. Two interesting applications: least-squares and the SVD.

So we have this basis now to build on as we talk about linear systems. We'll also need to build a foundation on linear differential equations. We'll spend some time going over the basics: what a solution means, and under what conditions a solution exists (i.e. what properties does the differential equation need to have?). We'll spend the next couple of weeks talking about properties of differential equations.

All of what we've done up to now has been covered in appendix A of Callier & Desoer. For the introduction to differential equations, we'll follow appendix B of Callier & Desoer. Not the easiest to read, but very comprehensive background reading.

The existence and uniqueness theorems are in many places, however.

Lecture notes 7.

Differential Equations

$$\dot{x} = f(x(t), t), \quad x(t_0) = x_0\\ x \in \Re^n\\ \fn{f}{\Re^n \times \Re}{\Re^n}$$

(strictly speaking, $f$ maps $x$ to the tangent space, but for this course, we're going to consider the two spaces to be equivalent)

Often, we're going to consider the time-invariant case (where there is no dependence on $t$, but rather only on $x$), but the above is the time-varying case. Recall that we consider time to be a privileged variable, i.e. always "marching forward".

What we're going to talk about now is how we can solve this differential equation. Rather (for now), under what conditions does there exist a (unique) solution to the differential equation (with initial condition)? We're interested in these two properties: existence and uniqueness. The solution we call $x(t)$, where $x(t_0) = x_0$. We need some understanding of some properties of the function $f$. We'll talk about continuity, piecewise continuity, and Lipschitz continuity (thinking about existence). In terms of uniqueness, we'll be talking about Cauchy sequences, Banach spaces, and the Bellman-Grönwall lemma.

There are a couple of different ways to prove existence and uniqueness; we'll use the Callier & Desoer method.

We'll finish today's lecture by just talking about some definitions of continuity. A function $f(x)$ is said to be continuous if $\forall \epsilon > 0, \exists \delta > 0 \st \abs{x_1 - x_2} < \delta \implies \abs{f(x_1) - f(x_2)} < \epsilon$ (the $\epsilon$-$\delta$ definition).

Suppose we have $\fn{f(x,t)}{\Re^n \times \Re}{\Re^n}$. $f$ is said to be piecewise continuous (w.r.t. $t$), $\forall x$, if $\fn{f(x,\cdot)}{\Re}{\Re^n}$ is continuous except at a finite number of (well-behaved) discontinuities in any closed and bounded interval of time. What I'm not allowing in this definition are functions with infinitely many points of discontinuity.

Next time we'll talk about Lipschitz continuity.

Existence and Uniqueness of Solutions to Differential Equations

September 13, 2012

Section this Friday only, 9:30 - 10:30, Cory 299.

Today: existence and uniqueness of solutions to differential equations.

We called this a DE or ODE, and we associated with it an initial condition. We started to talk about properties of the function $f$ as a function of $x$ only, but we can think about this as a function of $x$ for all $t$. This is a map from $\Re^n \to \Re^n$. In this class, recall, we used the $\epsilon$-$\delta$ definition of continuity.

We also introduced the concept of piecewise continuity, which will be important for thinking about the right-hand side of the differential equation.

We defined piecewise continuity as follows: $\fn{f(t)}{\Re_+}{\Re^n}$ is said to be piecewise continuous in $t$ if the function is continuous except at a set of well-behaved discontinuities (finitely many in any closed and bounded, i.e. compact, interval).

Finally, we will define Lipschitz continuity as follows: a function $\fn{f(\cdot, t)}{\Re^n}{\Re^n}$ is Lipschitz continuous in $x$ if there exists a piecewise continuous function of time $\fn{k(t)}{\Re_+}{\Re_+}$ such that the following inequality holds: $\mag{f(x_1, t) - f(x_2, t)} \le k(t)\mag{x_1 - x_2}, \forall x_1, x_2 \in \Re^n, \forall t \in \Re_+$. This inequality (condition) is called the Lipschitz condition.

An important thing in this inequality is that there has to be one function $k(t)$, and it has to be piecewise continuous. That is, there must exist such a function, and it is not allowed to go to infinity on compact time intervals.

It's an interesting condition, and if we compare the Lipschitz continuity definition to the general continuity definition, we can easily show that if a function is LC (Lipschitz continuous), then it's C (continuous), since LC is a stricter condition than C. That implication is fairly straightforward to show, but the converse is not true (i.e. continuity does not imply Lipschitz continuity).

Aside: think about this condition and what it takes to show that a function is Lipschitz continuous. You need to come up with a candidate $k(t)$ (often called the Lipschitz function, or the Lipschitz constant if it's constant). This is often the hardest part: trying to extract from $f$ what a possible $k$ is.

But there's a useful candidate for $k(t)$, given a particular function $f$. Let's forget about time for a second and consider a function just of $x$. Consider the Jacobian $Df$ (often also written $\pderiv{f}{x}$), which is an $n \times n$ matrix with entries $(Df)_{ij} = \pderiv{f_i}{x_j}$. If the Jacobian $Df$ exists, then its norm provides a candidate Lipschitz function $k(t)$.

A norm of the Jacobian of $f$, if independent of $x$, tells you that the function is Lipschitz. If the norm always seems to depend on $x$, you can still say something about the Lipschitz properties of the function: you can call it locally Lipschitz by bounding the value of $x$ in some region.

Sketch of proof: a generalization of the mean value theorem (easy to sketch in $\Re^1$). The mean value theorem states that there exists a point such that the instantaneous slope is the same as the average slope (assuming that the function is differentiable). If we want to generalize it to more dimensions, we say $f(x_1) - f(x_2) = Df(\lambda x_1 + (1 - \lambda) x_2)(x_1 - x_2)$ (where $0 < \lambda < 1$). All we've required is the existence of $Df$.

Now we can just take norms (and this is what's interesting) and use some of the results we have about norms. This provides a very useful construction for a candidate $k$ (it might not provide a great bound), but it's the second thing to try if you can't immediately extract a function $k(t)$.
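
A small numerical sketch of this recipe (the map $f$ below is a hypothetical example, not from the notes): bound the Jacobian's norm independently of $x$, take that bound as the candidate Lipschitz constant, and check the Lipschitz inequality on random pairs of points.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical map f : R^2 -> R^2 whose Jacobian is bounded in x.
def f(x):
    return np.array([np.sin(x[0]) + x[1], np.cos(x[1])])

# Jacobian of f: rows are gradients of the components of f.
def Df(x):
    return np.array([[np.cos(x[0]), 1.0],
                     [0.0, -np.sin(x[1])]])

# ||Df(x)||_2 <= ||Df(x)||_F = sqrt(cos^2 x0 + 1 + sin^2 x1) <= sqrt(3),
# so k = sqrt(3) is a candidate (global) Lipschitz constant.
k = np.sqrt(3)

for _ in range(1000):
    x1, x2 = rng.standard_normal(2), rng.standard_normal(2)
    lhs = np.linalg.norm(f(x1) - f(x2))
    rhs = k * np.linalg.norm(x1 - x2)
    assert lhs <= rhs + 1e-12
```

As the notes say, this bound need not be tight; it only needs to certify Lipschitz continuity.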

Something not in the notes, but useful. Let's go back to where we started, the differential equation with initial condition, and state the main theorem.

Fundamental Theorem of DEs / the Existence and Uniqueness theorem of (O)DEs

Suppose we have a differential equation with an initial condition. Assume that $f(x, t)$ is piecewise continuous in $t$ and Lipschitz continuous in $x$. With that information, we have that there exists a unique function of time which maps $\Re_+ \to \Re^n$, which is differentiable ($C^1$) almost everywhere (the derivative exists at all points at which $f$ is continuous), and which satisfies the initial condition and differential equation. This derivative exists at all points $t \in [t_1, t_2] - D$, where $D$ is the set of points where $f$ is discontinuous in $t$.

We are going to be interested in studying differential equations where we know these conditions hold. We're also going to prove the theorem. It's a nice thing to do (in a little depth) because it demonstrates some proof techniques (as well as giving you an idea of why the theorem works).

LC condition

The norm of the Jacobian of the example is bounded for bounded $x$. That is, we can choose a local region in $\Re$ for which our $Df$ is bounded to be less than some constant. That gives us a candidate Lipschitz constant for that local region. We say then that $f(x)$ is (at least) locally Lipschitz continuous (usually we just say this without specifying a region, since you can usually find a bound given any region). Further, it is trivially piecewise continuous in time (since it doesn't depend on time).

Note: if the Lipschitz condition holds only locally, it may be that the solution is only defined over a certain range of time.

We didn't show this, but in this example, the Lipschitz condition does not hold globally.

Local Fundamental theorem of DEs

Now assume that $f(x)$ is piecewise continuous in $t$ and Lipschitz continuous in $x$ only for $x \in G \subset \Re^n$. We now have that there exist an interval $[t_0, t_1]$ (over which the solution remains in $G$) and a unique function of time mapping $\Re_+ \to \Re^n$, which is differentiable ($C^1$) almost everywhere (the derivative exists at all points at which $f$ is continuous), and which satisfies the initial condition and differential equation. As before, this derivative exists at all points $t \in [t_1, t_2] - D$, where $D$ is the set of points where $f$ is discontinuous in $t$. If the Lipschitz condition is global, we can make the interval as large as desired.

Proof

There are two pieces: the proof of existence and the proof of uniqueness. Today will likely just be existence.

Existence

Roadmap: construct an infinite sequence of continuous functions defined (recursively) as follows: $x_{m+1}(t) = x_0 + \int_{t_0}^t f(x_m(\tau), \tau) d\tau$, with $x_0(t) \equiv x_0$. First, show that this sequence converges to a continuous function $\fn{\Phi(\cdot)}{\Re_+}{\Re^n}$ which solves the DE/IC pair.
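
This recursion is the classical Picard iteration, and it can be sketched numerically. The test problem below ($\dot{x} = -x$, $x(0) = 1$, exact solution $e^{-t}$) is a hypothetical choice, not from the notes; the integral is approximated with a cumulative trapezoidal rule.

```python
import numpy as np

# Picard iteration for x' = -x, x(0) = 1; exact solution exp(-t).
# The m-th iterate is the m-th partial sum of the Taylor series.
t = np.linspace(0.0, 1.0, 1001)
x0 = 1.0

x = np.full_like(t, x0)            # x_0(t) = x0
for _ in range(20):
    # x_{m+1}(t) = x0 + integral_0^t f(x_m(tau)) dtau, with f(x) = -x.
    integrand = -x
    integral = np.concatenate(([0.0], np.cumsum(
        0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t))))
    x = x0 + integral

assert np.allclose(x, np.exp(-t), atol=1e-6)
```

Twenty iterations already agree with $e^{-t}$ to within the quadrature error, illustrating the factorial-rate convergence that the proof below establishes.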

We would like to be able to prove the first thing here: I've constructed a sequence, and I want to show that the limit of this sequence is a solution to the differential equation.

The tool that I'm going to use is a property called Cauchy, and then I'm going to invoke the result that if I have a complete space, any Cauchy sequence on the space converges to something in the space. This gives me the existence of the thing that the sequence converges to.

Goal: (1) to show that this sequence is a Cauchy sequence in a complete normed vector space, which means the sequence converges to something in the space, and (2) to show that the limit of this sequence satisfies the DE/IC pair.

A Cauchy sequence (on a normed vector space) is one such that there exists some point in the sequence (some finite index $m$) such that if you look at any points beyond that index, the distance between those later points can be made smaller than some arbitrarily small $\epsilon > 0$. In other words: if we drop a finite number of elements from the start of the sequence, the distance between any remaining elements can be made arbitrarily small.

We define a Banach space (equivalently, a complete normed vector space) as one in which all Cauchy sequences converge; implicitly, this means converge to something in the space itself.

Just an aside: a Hilbert space is a complete inner product space. If you have an inner product space, and you define the norm in that space as the one induced by the inner product, and all Cauchy sequences of that space converge (to a limit in the space) with this norm, then it is a Hilbert space.

Think about a Cauchy sequence on a space that converges to something not necessarily in the space. Example: a sequence of rationals (e.g. the convergents of a continued fraction) converging to an irrational number.

To show (1), we'll show that this sequence $\{x_m\}$ that we constructed is a Cauchy sequence in a Banach space. Interestingly, it matters what norm you choose.

Proof of Existence and Uniqueness Theorem

September 18, 2012

Today:

• proof of existence and uniqueness theorem.
• [ if time ] introduction to dynamical systems.

The first couple of weeks were review, to build up basic concepts that we'll be drawing upon throughout the course. Either today or Thursday we will launch into linear system theory.

We're going to recall where we were last time. We had the fundamental theorem of differential equations, which said the following: if we have a differential equation $\dot{x} = f(x,t)$ with initial condition $x(t_0) = x_0$, where $x(t) \in \Re^n$, etc., and if $f(\cdot, t)$ is Lipschitz continuous and $f(x, \cdot)$ is piecewise continuous, then there exists a unique solution to the differential equation / initial condition pair (some function $\phi(t)$) wherever you can take the derivative (it may not be differentiable everywhere: it loses differentiability at the points where discontinuities exist).

We spent quite a lot of time discussing Lipschitz continuity. The job is usually to test both conditions; the first one requires work. We described a popular candidate function by looking at the mean value theorem and applying it to $f$: a norm of the Jacobian provides a candidate Lipschitz function, if it works.

We also described local Lipschitz continuity, and often, when using a norm of the Jacobian, that's fairly easy to show.

Important point to recall: a norm of the Jacobian of $f$ provides a candidate Lipschitz function.

Another important thing to say here is that we can use any norm we want, so we can be creative in our choice of norm when looking for a better bound.

We started our proof last day, and we talked a little about the structure of the proof. We are going to proceed by constructing a sequence of functions, then show (1) that it converges to a solution, then show (2) that it is unique.

Proof of Existence

We are going to construct this sequence of functions as follows: $x_{m+1}(t) = x_0 + \int_{t_0}^t f(x_m(\tau), \tau) d\tau$. Here we're dealing with an arbitrary interval from $t_1$ to $t_2$, with $t_0 \in [t_1, t_2]$. We want to show that this sequence is a Cauchy sequence, and we're going to rely on our knowledge that the space these functions are defined in is a Banach space (hence this sequence converges to something in the space).

We have to put a norm on this space of functions, so we'll use the infinity norm. We're not going to prove it, but rather state that it's a Banach space. If we show that this is a Cauchy sequence, then the limit of that Cauchy sequence exists in the space. The reason that's interesting is that it's this limit that provides a candidate solution for this differential equation.

We will then prove that this limit satisfies the DE/IC pair. That is adequate to show existence. We'll then go on to prove uniqueness.

Our immediate goal is to show that this sequence is Cauchy; that is, we should show that $\mag{x_{m+p} - x_m} \to 0$ as $m$ gets large, for any $p$.

First let us look at the difference between $x_{m+1}$ and $x_m$. These are just functions of time, and we can compute: $\mag{x_{m+1} - x_m} = \mag{\int_{t_0}^t (f(x_m, \tau) - f(x_{m-1}, \tau)) d\tau} \le \int_{t_0}^t \mag{f(x_m, \tau) - f(x_{m-1}, \tau)} d\tau$. Use the fact that $f$ is Lipschitz continuous: this is $\le \int_{t_0}^t k(\tau)\mag{x_m(\tau) - x_{m-1}(\tau)} d\tau$. The function $k$ is piecewise continuous, so well-defined, and it has a supremum on this interval. Let $\bar{k}$ be the supremum of $k$ over the whole interval $[t_1, t_2]$. This means that we can take this inequality and rewrite it as $\mag{x_{m+1} - x_m} \le \bar{k} \int_{t_0}^t \mag{x_m(\tau) - x_{m-1}(\tau)} d\tau$. Now we have a bound that relates the distance between consecutive elements to the previous such distance, and we can iterate it by counting.

Let us do two things: sort out the integral on the right-hand-side, thenlook at arbitrary elements beyond an index.

We know that $x_1(t) = x_0 + \int_{t_0}^t f(x_0, \tau) d\tau$, so $\mag{x_1 - x_0} \le \int_{t_0}^{t} \mag{f(x_0, \tau)} d\tau \le \int_{t_1}^{t_2} \mag{f(x_0, \tau)} d\tau \defequals M$. From the above inequalities, $\mag{x_2 - x_1} \le M \bar{k}\abs{t - t_0}$. Now I can look at general bounds: $\mag{x_3 - x_2} \le \frac{M\bar{k}^2 \abs{t - t_0}^2}{2!}$. In general, $\mag{x_{m+1} - x_m} \le \frac{M\parens{\bar{k} \abs{t - t_0}}^m}{m!}$.

If we look at the norm of $x$ as a function, that is going to be a function norm. What I've been doing up to now is looking at a particular value $t_1 < t < t_2$.

Try to relate this to the norm $\mag{x_{m+1} - x_m}_\infty$. Can what we've done so far give us a bound on the difference between two functions? It can, because the infinity norm of a function is the maximum value that the function assumes (the maximum vector norm over all points $t$ in the interval we're interested in). If we let $T$ be the length of our larger interval, $t_2 - t_1$, we can use the previous result on the pointwise norm: a bound on the function norm has to be less than the same bound, i.e. if a pointwise norm is less than this bound for all relevant $t$, then its max value must be less than this bound.

That gets us on the road we want to be on, since it now gets us a bound. We can now go back to where we started. What we're actually interested in: given an index $m$, we can construct a bound on all later elements in the sequence.

$\mag{x_{m+p} - x_m}_\infty = \mag{x_{m+p} - x_{m+p-1} + x_{m+p-1} - \cdots - x_m}_\infty = \mag{\sum_{k=0}^{p-1} (x_{m+k+1} - x_{m+k})}_\infty \le M \sum_{k=0}^{p-1} \frac{(\bar{k}T)^{m+k}}{(m+k)!}$.

We're going to recall a few things from undergraduate calculus: the Taylor expansion of the exponential function, and the fact that $(m+k)! \ge m!k!$.

With these, we can say that $\mag{x_{m+p} - x_m}_\infty \le M\frac{(\bar{k}T)^m}{m!} e^{\bar{k} T}$. What we'd like to show is that this can be made arbitrarily small as $m$ gets large. We study this bound as $m \to \infty$, and we recall the Stirling approximation, which shows that the factorial grows faster than the exponential function. That is enough to show that $\{x_m\}_0^\infty$ is Cauchy. Since it lives in a Banach space (which we won't prove, since it's beyond our scope), it converges to a function (call it $x^\ell$) in the same space.

Now we just need to show that the limit $x^\ell$ solves the differential equation (and initial condition). Let's go back to the sequence that determines $x^\ell$: $x_{m+1} = x_0 + \int_{t_0}^t f(x_m, \tau) d\tau$. We've proven that this sequence converges to $x^\ell$. What we want to show is that $\int_{t_0}^t f(x_m, \tau) d\tau \to \int_{t_0}^t f(x^\ell, \tau) d\tau$. This would be immediate if we had that the function were continuous. It's clear that $x^\ell$ satisfies the initial condition by the construction of the sequence, but we need to show that it satisfies the differential equation. Conceptually, this is probably more difficult than what we've just done (establishing bounds, Cauchy sequences): thinking about what that function limit is and what it means for it to satisfy that differential equation.

Now, you can basically use some of the machinery we've been using all along to show this. The difference between these goes to $0$ as $m$ gets large.

$$\mag{\int_{t_0}^t (f(x_m, \tau) - f(x^\ell, \tau)) d\tau}\\ \le \int_{t_0}^t k(\tau) \mag{x_m - x^\ell} d\tau \le \bar{k}\mag{x_m - x^\ell}_\infty T\\ \le \bar{k} M e^{\bar{k} T} \frac{(\bar{k} T)^m}{m!}T$$

Thus $x^\ell$ solves the DE/IC pair. A solution $\Phi$ is this $x^\ell$, i.e. $\dot{x}^\ell(t) = f(x^\ell(t), t) \; \forall t \in [t_1, t_2] - D$ and $x^\ell(t_0) = x_0$.

To show that this solution is unique, we will use the Bellman-Gronwall lemma, which is very important. It is used ubiquitously when you want to show that functions of time are equal to each other: it is a candidate mechanism to do that.

Bellman-Gronwall Lemma

Let $u, k$ be real-valued, positive, piecewise continuous functions of time, and let $c_1 \ge 0$ and $t_0 \ge 0$ be constants. If we have such constants and functions, then the following is true: if $u(t) \le c_1 + \int_{t_0}^t k(\tau)u(\tau) d\tau$, then $u(t) \le c_1 e^{\int_{t_0}^t k(\tau) d\tau}$.

Proof (of B-G)

$t > t_0$ WLOG.

$$U(t) = c_1 + \int_{t_0}^t k(\tau) u(\tau) d\tau\\ u(t) \le U(t)\\ \dot{U}(t) = u(t)k(t) \le U(t)k(t)\\ \deriv{}{t}\parens{U(t)e^{-\int_{t_0}^t k(\tau) d\tau}} = \parens{\dot{U}(t) - U(t)k(t)}e^{-\int_{t_0}^t k(\tau) d\tau} \le 0 \text{ (then integrate this derivative, noting that } U(t_0) = c_1\text{)}\\ u(t) \le U(t) \le c_1 e^{\int_{t_0}^t k(\tau) d\tau}$$

Using this to prove uniqueness of DE/IC solutions

Here is how we're going to use the B-G lemma to prove uniqueness.

We have the solution $\Phi$ that we constructed, and someone else gives us a solution $\Psi$, constructed via a different method. We show that these must be equivalent. Since they're both solutions, they have to satisfy the DE/IC pair. Take the norm of the difference between the differential equations.

$$\mag{\Phi - \Psi} \le \bar{k} \int_{t_0}^t \mag{\Phi - \Psi} d\tau \quad \forall t_0, t \in [t_1, t_2]$$

From the Bellman-Gronwall Lemma, we can rewrite this inequality as $\mag{\Phi - \Psi} \le c_1 e^{\bar{k}(t - t_0)}$, where $c_1 = 0$ because both solutions satisfy the same initial condition. This norm is therefore less than or equal to 0. By positive definiteness, this norm must be equal to 0, and so the functions are equal to each other.

Reverse time differential equation

We think about time as monotonic (either increasing or decreasing, usually increasing). Suppose instead that time is decreasing: given $\dot{x} = f(x,t)$, explore existence and uniqueness going backwards in time. Suppose we have a time variable $\tau$ which runs backwards from $t_0$, defined by $\tau \defequals t_0 - t$, and define the solution to the differential equation backwards in time as $z(\tau) = x(t)$ for $t < t_0$. Deriving the reverse-time derivative, the equation is just $-f$; we're going to use $\bar{f}$ to represent this function: $\deriv{}{\tau}z = -\deriv{}{t}x = -f(x, t) = -f(z, t_0 - \tau) \defequals \bar{f}(z, \tau)$.

This equation, if I solve the reverse time differential equation, will have some corresponding backwards solution. Concluding statement: we can think about solutions forwards and backwards in time. Existence of a unique solution forward in time means existence of a unique solution backward in time (and vice versa). You can't have solutions crossing themselves in time-invariant systems.
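A sketch of this forward/backward correspondence (the dynamics below are an arbitrary Lipschitz example, not from the notes, and here we reverse from the final time $T$ rather than $t_0$; same idea): integrate forward, then integrate the reverse-time equation and recover the initial state.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Arbitrary illustrative dynamics (globally Lipschitz in x).
def f(t, x):
    return np.array([x[1], -np.sin(x[0])])

t0, T = 0.0, 5.0
x0 = np.array([1.0, 0.0])

# Forward solution x(t) on [t0, T].
fwd = solve_ivp(f, (t0, T), x0, rtol=1e-10, atol=1e-12)
xT = fwd.y[:, -1]

# Reverse-time equation: with tau = T - t and z(tau) = x(t), dz/dtau = -f(T - tau, z).
fbar = lambda tau, z: -f(T - tau, z)
bwd = solve_ivp(fbar, (0.0, T - t0), xT, rtol=1e-10, atol=1e-12)

# Running the unique solution backwards recovers the initial state.
assert np.allclose(bwd.y[:, -1], x0, atol=1e-6)
```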

Introduction to dynamical systems

September 20, 2012

Suppose we have equations $\dot{x} = f(x, u, t)$, $\fn{f}{\Re^n \times \Re^{n_i} \times \Re_+}{\Re^n}$ and $y = h(x, u, t)$, $\fn{h}{\Re^n \times \Re^{n_i} \times \Re_+}{\Re^{n_o}}$. We define $n_i$ as the dimension of the input space, $n_o$ as the dimension of the output space, and $n$ as the dimension of the state space.

We've looked at the form, and if we specify a particular $\bar{u}(t)$ over some time interval of interest, then we can plug this into the right hand side of this differential equation. Typically we do not supply a particular input. Thinking about solutions to this differential equation, for now, let's suppose that it's specified.

Suppose we have some feedback function of the state. If $u$ is specified, as long as $\bar{f}$ satisfies the conditions for the existence and uniqueness theorem, we have a differential equation we can solve.

Another example: instead of a differential equation (which corresponds to continuous time), we have a difference equation (which corresponds to discrete time).

Example: a dynamic system represented by an LRC circuit. One practical way to define the state $x$ is as a vector of elements whose derivatives appear in our differential equation. Not formal, but practical for this example.

Notions of discretizing.

What is a dynamical system?

As discussed in the first lecture, we consider time $\tau$ to be a privileged variable. Based on our definition of time, the inputs and outputs are all functions of time.

Now we're going to define a dynamical system as a 5-tuple: $(\mathcal{U}, \Sigma, \mathcal{Y}, s, r)$ (input space, state space, output space, state transition function, output map).

We define the input space as the set of input functions over time to an input set $U$ (i.e. $\mathcal{U} = \{\fn{u}{\tau}{U}\}$; typically, $U = \Re^{n_i}$).

We also define the output space as the set of output functions over time to an output set $Y$ (i.e. $\mathcal{Y} = \{\fn{y}{\tau}{Y}\}$; typically, $Y = \Re^{n_o}$).

$\Sigma$ is our state space. It is not defined as a function, but as the actual state space. Typically, $\Sigma = \Re^n$, and we can go back and think about the function $x(t) \in \Sigma$. $\fn{x}{\tau}{\Sigma}$ is called the state trajectory.

$s$ is called the state transition function because it defines how the state changes in response to time, the initial state, and the input: $\fn{s}{\tau \times \tau \times \Sigma \times \mathcal{U}}{\Sigma}$. Usually we write this as $x(t_1) = s(t_1, t_0, x_0, u)$, where $u$ is the function $u(\cdot) |_{t_0}^{t_1}$. This is important: we are coming towards how we define state. The only things you need to get to the state at the new time are the initial state, the inputs, and the dynamics.

Finally, we have this output map (sometimes called the readout map) $r$: $\fn{r}{\tau \times \Sigma \times U}{Y}$. That is, we can think about $y(t) = r(t, x(t), u(t))$. There's something fundamentally different between $r$ and $s$: $s$ depended on the function $u$, whereas $r$ only depends on the current value of $u$ at a particular time.

$s$ captures dynamics, while $r$ is static. Remark: $s$ has dynamics(memory) -- things that depend on previous time, whereas $r$ is static:everything it depends on is at the current time (memoryless).

In order to be a dynamical system, we need to satisfy two axioms: a dynamical system is a five-tuple with the following two axioms:

• The state transition axiom: $\forall t_1 \ge t_0$, given $u, \tilde{u}$  that are equal to each other over a particular time interval, the state  transition functions must be equal over that interval, i.e. $s(t_1, t_0, x_0, u) = s(t_1, t_0, x_0, \tilde{u})$. Requires us to not have  dependence on the input outside of the time interval of interest.
• The semigroup axiom: suppose you start a system at $t_0$ and evolve it to  $t_2$, and you're considering the state. You have an input $u$ defined  over the whole time interval. If you were to look at an intermediate  point $t_1$, and you computed the state at $t_1$ via the state transition  function, we can split our time interval into two intervals, and we can  compute the result any way we like. Stated as the following: $s(t_2, t_1, s(t_1, t_0, x_0, u), u) = s(t_2, t_0, x_0, u)$.

When we talk about a dynamical system, we have to satisfy these axioms.
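As a concrete (hypothetical) instance of the five-tuple, here is a scalar discrete-time linear system with both axioms checked numerically. The names `s` and `r` mirror the state transition function and readout map above; all numeric values are arbitrary illustrations.

```python
import numpy as np

# Toy discrete-time system: x[k+1] = a*x[k] + b*u[k], y[k] = c*x[k] + d*u[k].
a, b, c, d = 0.9, 1.0, 2.0, 0.5

def s(t1, t0, x0, u):
    """State transition function: evolve x from time t0 to t1 under input sequence u."""
    x = x0
    for k in range(t0, t1):
        x = a * x + b * u[k]
    return x

def r(t, x, ut):
    """Readout map: memoryless, depends only on the current input value."""
    return c * x + d * ut

u = {k: np.sin(k) for k in range(10)}
x0, t0, t1, t2 = 1.0, 0, 4, 9

# Semigroup axiom: stopping at t1 and restarting gives the same state at t2.
assert np.isclose(s(t2, t0, x0, u), s(t2, t1, s(t1, t0, x0, u), u))

# State transition axiom: changing u outside the evolution range [t0, t2) changes nothing.
u2 = dict(u)
u2[9] = 100.0   # index 9 is never used when evolving over [0, 9)
assert np.isclose(s(t2, t0, x0, u), s(t2, t0, x0, u2))
```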

Response function

Since we're interested in the outputs and not the states, we can define what we call the response map. It's not considered part of the definition of a dynamical system because it can be easily derived.

It's the composition of the state transition function and the readout map, i.e. $y(t) = r(t, x(t), u(t)) = r(t, s(t, t_0, x_0, u), u(t)) \defequals \rho(t, t_0, x_0, u)$. This is an important function because it is used to define properties of a dynamical system. Why is that? We've said that states are somehow mysterious, not something we typically care about: typically we care about the outputs. Thus we define properties like linearity and time invariance.

Time Invariance

We define a time-shift operator $\fn{T_\tau}{\mathcal{U}}{\mathcal{U}}$, $\fn{T_\tau}{\mathcal{Y}}{\mathcal{Y}}$, by $(T_\tau u)(t) \defequals u(t - \tau)$. Namely, the value of $T_\tau u$ is that of the old signal at $t - \tau$.

A time-invariant (dynamical) system is one in which the input space and output space are closed under $T_\tau$ for all $\tau$, and $\rho(t, t_0, x_0, u) = \rho(t + \tau, t_0 + \tau, x_0, T_\tau u)$.

Linearity

A linear dynamical system is one in which the input, state, and output spaces are all linear spaces over the same field $\mathbb{F}$, and the response map $\rho$ is a linear map of $\Sigma \times \mathcal{U}$ into $\mathcal{Y}$.

This is a strict requirement: you have to check that the response map satisfies these conditions. A question that comes up: why do we define linearity of a dynamical system in terms of linearity of the response and not the state transition function? It goes back to a system being intrinsically defined by its inputs and outputs. You can often have many different ways to define states, and typically we can't see all of them. It's accepted that when we talk about a system and think about its I/O relations, it makes sense that we define linearity in terms of this memory function of the system, as opposed to the state transition function.

Let's just say a few remarks about this: zero-input response and zero-state response. If we look at the zero element in our spaces (so we have a zero vector), then we can apply superposition, which implies that the response at time $t$ is equal to the zero-state response (the response given that we started at the zero state) plus the zero-input response.

That is: $\rho(t, t_0, x_0, u) = \rho(t, t_0, \theta_x, u) + \rho(t, t_0, x_0, \theta_u)$ (from the definition of linearity).

The second remark is that the zero-state response is linear in the input,and similarly, the zero-input response is linear in the state.

One more property of dynamical systems before we finish: equivalence (a property derived from the definition). Take two dynamical systems $D = (\mathcal{U}, \Sigma, \mathcal{Y}, s, r)$, $\tilde{D} = (\mathcal{U}, \bar{\Sigma}, \mathcal{Y}, \bar{s}, \bar{r})$. $x_0 \in \Sigma$ is equivalent to $\tilde{x}_0 \in \bar{\Sigma}$ at $t_0$ if $\forall t \ge t_0$, $\rho(t, t_0, x_0, u) = \tilde{\rho}(t, t_0, \tilde{x}_0, u)$ for all inputs $u$. If this holds for every $x_0$ and some corresponding $\tilde{x}_0$ (and vice versa), the two systems are equivalent.

Linear time-varying systems

September 25, 2012

Recall that the state transition function gives the state at the current time as a function of the initial state, initial time, and inputs. Suppose you have a differential equation; how do you acquire the state transition function? Solve the differential equation.

For a general dynamical system, there are different ways to get the state transition function. This is an instantiation of a dynamical system, and we're going to get the state transition function by solving the differential equation / initial condition pair.

We're going to call $\dot{x}(t) = A(t)x(t) + B(t)u(t)$ a vector differential equation with initial condition $x(t_0) = x_0$.

So that requires us to think about solving that differential equation. Do a dimension check, to make sure we know the dimensions of the matrices. $x \in \Re^n$, so $A \in \Re^{n \times n}$ and $B \in \Re^{n \times n_i}$. We could define the matrix function $A$, which takes intervals of the real line and maps them over to matrices. As a function, $A$ is a piecewise-continuous matrix function in time; its entries are piecewise-continuous scalars in time.

We would like to get at the state transition function; to do that, we need to solve the differential equation.

Let's assume for now that $A, B, u$ are given (part of the system definition).

Piecewise continuity is trivial; we can use the induced norm of $A$ for a Lipschitz condition. Since this induced norm is piecewise-continuous in time, this is a fine bound. Therefore $f$ is globally Lipschitz continuous.

We're going to back off for a bit and introduce the state transition matrix.
Background for solving the VDE. We're going to introduce a matrix differential equation, $\dot{X} = A(t) X$ (where $A(t)$ is the same as before). I'm going to define $\Phi(t, t_0)$ as the solution to the matrix differential equation (MDE) for the initial condition $\Phi(t_0, t_0) = 1_{n \times n}$. That is, $\Phi$ is the solution to the $n \times n$ matrix differential equation when it starts out at the identity matrix.

Let's first talk about properties of this matrix $\Phi$ just from the definition we have.

• If you go back to the vector differential equation, and drop the term that depends on $u$ (either consider $B$ to be 0, or the input to be 0), the solution of $\dot{x} = A(t)x(t)$ is given by $x(t) = \Phi(t, t_0)x_0$.
• This is what we call the semigroup property, since it's reminiscent of the semigroup axiom: $\Phi(t, t_0) = \Phi(t, t_1) \Phi(t_1, t_0) \quad \forall t, t_0, t_1 \in \Re_+$.
• $\Phi^{-1}(t, t_0) = \Phi(t_0, t)$.
• $\text{det} \Phi(t, t_0) = \exp\parens{\int_{t_0}^t \text{tr}\parens{A(\tau)} d\tau}$.

Here let's talk about some machinery we can now invoke when we want to show that two functions of time are equal to each other when they're both solutions to the differential equation. You can simply show by the existence and uniqueness theorem (assuming it applies) that they satisfy the same initial condition and the same differential equation. That's an important point, and we tend to use it a lot.

(i.e. when faced with showing that two functions of time are equal to each other, you can show that they both satisfy the same initial condition and the same differential equation [as long as the differential equation satisfies the hypotheses of the existence and uniqueness theorem])

Obvious, but good to state.

Note: the initial condition doesn't have to be the initial condition given; it just has to hold at one point in the interval. Pick your point in time judiciously.

Proof of (2): check $t = t_1$. (3) follows directly from (2). (4) you can look at if you want.
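The listed properties of $\Phi$ can be spot-checked numerically in the constant-$A$ case, where $\Phi(t, t_0) = e^{A(t - t_0)}$ (a sketch; the matrix and times are arbitrary choices, and scipy is assumed available):

```python
import numpy as np
from scipy.linalg import expm

# For constant A, the state transition matrix is Phi(t, t0) = expm(A*(t - t0)).
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
t0, t1, t2 = 0.3, 1.1, 2.0
Phi = lambda t, s: expm(A * (t - s))

# (2) Semigroup property.
assert np.allclose(Phi(t2, t0), Phi(t2, t1) @ Phi(t1, t0))

# (3) Inverse property.
assert np.allclose(np.linalg.inv(Phi(t2, t0)), Phi(t0, t2))

# (4) Determinant = exp of the integral of the trace of A.
assert np.isclose(np.linalg.det(Phi(t2, t0)), np.exp(np.trace(A) * (t2 - t0)))
```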
This gives you a way to compute $\Phi(t, t_0)$. We've introduced a matrix differential equation and an abstract solution.

Consider (1). $\Phi(t, t_0)$ is a map that takes the initial state and transitions to the new state. Thus we call $\Phi$ the state transition matrix because of what it does to the states of this vector differential equation: it transfers them from their initial value to their final value, and it transfers them through matrix multiplication.

Let's go back to the original differential equation. Claim: the solution to that differential equation has the following form: $x(t) = \Phi(t, t_0)x_0 + \int_{t_0}^t \Phi(t, \tau)B(\tau)u(\tau) d\tau$. Proof: we can use the same machinery. If someone gives you a candidate solution, you can easily show that it is the solution.

Recall the Leibniz rule, which we'll state in general as follows: $\deriv{}{z} \int_{a(z)}^{b(z)} f(x, z) dx = \int_{a(z)}^{b(z)} \pderiv{}{z}f(x, z) dx + \deriv{b}{z} f(b, z) - \deriv{a}{z} f(a, z)$.

$$\dot{x}(t) = A(t) \Phi(t, t_0) x_0 + \int_{t_0}^t \pderiv{}{t} \parens{\Phi(t, \tau)B(\tau)u(\tau)} d\tau + \deriv{t}{t}\Phi(t, t)B(t)u(t) - \deriv{t_0}{t}\Phi(t, t_0)B(t_0)u(t_0)\\ = A(t)\Phi(t, t_0)x_0 + \int_{t_0}^t A(t)\Phi(t, \tau)B(\tau)u(\tau)d\tau + B(t)u(t)\\ = A(t)\parens{\Phi(t, t_0) x_0 + \int_{t_0}^t \Phi(t, \tau)B(\tau)u(\tau) d\tau} + B(t) u(t) = A(t)x(t) + B(t)u(t)$$

$x(t) = \Phi(t,t_0)x_0 + \int_{t_0}^t \Phi(t,\tau)B(\tau)u(\tau) d\tau$ is good to remember.

Not surprisingly, it depends on the input function over an interval of time. The differential equation is changing over time, therefore the system itself is time-varying. There is no way in general that it will be time-invariant, since the equation that defines its evolution is changing. You test time-invariance or time-variance through the response map. But is it linear?
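The claimed solution formula can be verified numerically in the constant-coefficient case, where $\Phi(t, \tau) = e^{A(t - \tau)}$ (a sketch; the matrices and input are arbitrary illustrative choices, and scipy is assumed):

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp, trapezoid

# Arbitrary constant-coefficient system x' = Ax + Bu with a sinusoidal input.
A = np.array([[0.0, 1.0], [-2.0, -0.5]])
B = np.array([[0.0], [1.0]])
u = lambda t: np.array([np.sin(t)])
x0 = np.array([1.0, -1.0])
t0, t = 0.0, 3.0

# Direct numerical solution of the vector differential equation.
sol = solve_ivp(lambda s, x: A @ x + B @ u(s), (t0, t), x0, rtol=1e-10, atol=1e-12)

# x(t) = Phi(t, t0) x0 + int_{t0}^t Phi(t, tau) B u(tau) dtau, via a fine trapezoid rule.
taus = np.linspace(t0, t, 4001)
integrand = np.stack([expm(A * (t - tau)) @ B @ u(tau) for tau in taus])
xt = expm(A * (t - t0)) @ x0 + trapezoid(integrand, taus, axis=0)

assert np.allclose(sol.y[:, -1], xt, atol=1e-5)
```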
You have the state transition function, so we can compute the response function (recall: the readout map composed with the state transition function) and ask if this is a linear map.

Linear time-invariant systems

September 27, 2012

Last time, we talked about the time-varying differential equation, and we expressed $R(\cdot) = \bracks{A(\cdot), B(\cdot), C(\cdot), D(\cdot)}$. We used the state transition matrix to show that the solution was given by $x(t) = \Phi(t, t_0) x_0 + \int_{t_0}^t \Phi(t, \tau) B(\tau) u(\tau) d\tau$. The integrand contains the state transition matrix, and we haven't talked about how we would compute this matrix. In general, computing the state transition matrix is hard. But there's one important class where computing it becomes much simpler than usual: that is where the system does not depend on time.

Linear time-invariant case: $\dot{x} = Ax + Bu$, $y = Cx + Du$, $x(t_0) = x_0$. It does not matter at what time we start. Typically, WLOG, we use $t_0 = 0$ (we can't do this in the time-varying case).

Aside: Jacobian linearization

In practice, it is generally the case that someone doesn't present you with a model that looks like this. Usually, you derive this (usually nonlinear) model through physics and whatnot. What can I do to come up with a linear representation of that system? What is typically done is an approximation technique called Jacobian linearization.

So suppose someone gives you a nonlinear system and an output equation, and you want to come up with some linear representation of the system.

Two points of view: we could look at the system, and suppose we applied a particular input to the system and solve the differential equation ($u^0(t) \mapsto x^0(t)$, the nominal input and nominal solution). That would result in a solution (a state trajectory, in general). Now suppose that we for some reason want to perturb that input ($u^0(t) + \delta u(t)$, the perturbed input). Suppose in general that $\delta u$ is a small perturbation.
What this results in is a new state trajectory, which we'll define as $x^0(t) + \delta x(t)$, the perturbed solution.

Now we can derive from that what we call the Jacobian linearization. If we apply the nominal input, the nominal solution satisfies $\dot{x}^0 = f(x^0, u^0, t)$, with $x^0(t_0) = x_0$. For the perturbed input, $\dot{x}^0 + \dot{\delta x} = f(x^0 + \delta x, u^0 + \delta u, t)$, where $(x^0 + \delta x)(t_0) = x_0 + \delta x_0$. Now I'm going to look at these two and perform a Taylor expansion about the nominal input and solution. Thus $f(x^0 + \delta x, u^0 + \delta u, t) = f(x^0, u^0, t) + \pderiv{}{x} f(x, u, t)\vert_{(x^0, u^0)}\delta x + \pderiv{}{u}f(x,u,t)\vert_{(x^0, u^0)} \delta u + \text{higher order terms}$ (recall that we also called $\pderiv{}{x}$ $D_1$, i.e. the derivative with respect to the first argument).

What I've done is expanded the right hand side of the differential equation. Thus $\dot{\delta x} = \pderiv{}{x} f(x, u, t)\vert_{(x^0, u^0)} \delta x + \pderiv{}{u} f(x, u, t)\vert_{(x^0, u^0)}\delta u + \text{h.o.t.}$. If $\delta u, \delta x$ are small, then we can assume that the higher-order terms are approximately zero, which gives us an approximate first-order linear differential equation. This gives us a linear time-varying approximation of the dynamics of this perturbation vector, in response to a perturbation input. That's what the Jacobian linearization gives you: the perturbation away from the nominal (we linearized about a bias point).

Consider $A(t)$ to be the Jacobian matrix with respect to $x$, and $B(t)$ to be the Jacobian matrix with respect to $u$. Remember that this is an approximation, and if your system is really nonlinear and you perturb the system a lot (stray too far from the bias point), then this linearization may cease to hold.

Linear time-invariant systems

Motivated by the fact that we have a solution to the time-varying equation: it depends on the state transition matrix, which right now is an abstract thing which we don't have a way of computing.
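The Jacobian linearization procedure can be sketched numerically. This is a hypothetical illustration (the `jacobians` helper and step size are my own choices), using pendulum-like dynamics $\ddot\theta = (g/\ell)\sin\theta + u$ as in the inverted-pendulum example that appears later in these notes:

```python
import numpy as np

# Pendulum-like dynamics: x1 = theta, x2 = theta', u = tau/(m*l^2).
g, l = 9.81, 1.0
Omega2 = g / l   # Omega^2 = g/l

def f(x, u):
    return np.array([x[1], Omega2 * np.sin(x[0]) + u[0]])

def jacobians(f, x0, u0, eps=1e-6):
    """Forward finite-difference Jacobians of f with respect to x and u at (x0, u0)."""
    fx0 = f(x0, u0)
    A = np.column_stack([(f(x0 + eps * e, u0) - fx0) / eps for e in np.eye(len(x0))])
    B = np.column_stack([(f(x0, u0 + eps * e) - fx0) / eps for e in np.eye(len(u0))])
    return A, B

# Linearize about the upright equilibrium (nominal x^0 = 0, u^0 = 0).
A, B = jacobians(f, np.zeros(2), np.zeros(1))

# Matches the analytic linearization A = [[0, 1], [Omega^2, 0]], B = [[0], [1]].
assert np.allclose(A, [[0.0, 1.0], [Omega2, 0.0]], atol=1e-4)
assert np.allclose(B, [[0.0], [1.0]], atol=1e-4)
```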
Let's go to a more specific class of systems: that where $A, B, C, D$ do not depend on time. We know that this system is linear (we don't know yet that it is time-invariant; we have to find the response function and show that it satisfies the definition of a time-invariant system), so this still requires proof.

Since these don't depend on time, we can use some familiar tools (e.g. Laplace transforms) and remember what taking the Laplace transform of a derivative is. Denote $\hat{x}(s)$ to be the Laplace transform of $x(t)$. The Laplace transform of the state equation is $s\hat{x}(s) - x_0 = A\hat{x}(s) + B\hat{u}(s)$; the output equation transforms to $\hat{y}(s) = C\hat{x}(s) + D\hat{u}(s)$. The first equation becomes $(sI - A)\hat{x}(s) = x_0 + B\hat{u}(s)$, and we'll leave the second equation alone.

Let's first consider $\dot{x} = Ax$, $x(0) = x_0$. I could have done the same thing, except my right hand side doesn't depend on $B$: $(sI - A)\hat{x}(s) = x_0$. Let's leave that for a second and come back to it, and make the following claim: the state transition matrix for $\dot{x} = Ax, x(t_0) = x_0$ is $\Phi(t,t_0) = e^{A(t-t_0)}$, which is called the matrix exponential, defined as $e^{A(t-t_0)} = I + A(t-t_0) + \frac{A^2(t-t_0)^2}{2!} + \ldots$ (the Taylor expansion of the exponential function).

We just need to show that the matrix exponential, using the definitions we had last day, is indeed the state transition matrix for that system. We could go back to the definition of the state transition matrix for the system, or we could go back to the state transition function for the vector differential equation.

From last time, we know that the solution to $\dot{x} = A(t)x, x(t_0) = x_0$ is given by $x(t) = \Phi(t, t_0)x_0$; here, we are claiming then that $x(t) = e^{A(t - t_0)} x_0$, where $x(t)$ is the solution to $\dot{x} = Ax$ with initial condition $x_0$.

First show that it satisfies the vector differential equation: $\dot{x} = \deriv{}{t}e^{A(t-t_0)} x_0 = \parens{0 + A + A^2(t - t_0) + \ldots} x_0 = A\parens{I + A(t-t_0) + \frac{A^2}{2!}(t-t_0)^2 + \ldots}
x_0 = Ae^{A(t-t_0)} x_0 = Ax(t)$, so it satisfies the differential equation. Checking the initial condition, we get $e^{A \cdot 0}x_0 = I x_0 = x_0$. We've proven that this represents the solution to this time-invariant differential equation. By the existence and uniqueness theorem, this is the same solution.

Through this proof, we've shown a couple of things: the derivative of the matrix exponential, and its value at $t - t_0 = 0$. So now let's go back and reconsider its infinite series representation and classify some of its other properties.

Properties of the matrix exponential

• $e^0 = I$
• $e^{A(t+s)} = e^{At}e^{As}$
• $e^{(A+B)t} = e^{At}e^{Bt}$ iff $\comm{A}{B} = 0$.
• $\parens{e^{At}}^{-1} = e^{-At}$, and these properties hold in general if you're looking at $t$ or $t - t_0$.
• $\deriv{e^{At}}{t} = Ae^{At} = e^{At}A$ (i.e. $\comm{e^{At}}{A} = 0$)
• Suppose $X(t) \in \Re^{n \times n}$, $\dot{X} = AX, X(0) = I$; then the solution of this matrix differential equation and initial condition pair is given by $X(t) = e^{At}$. Proof in the notes; very similar to what we just did (a more general proof that the state transition matrix is just given by the matrix exponential).

Calculating $e^{At}$, given $A$

What this is now useful for is making more concrete this state transition concept. It is still a little abstract, since we're still considering the exponential of a matrix.

The first point is that using the infinite series representation to compute $e^{At}$ is in general hard. It would be doable if you knew $A$ were nilpotent ($A^k = 0$ for some $k \in \mathbb{Z}_+$), but it's not always feasible; it would not be feasible if $k$ were large.

The way one usually computes the state transition matrix $e^{At}$ is as follows. Recall: $\dot{X}(t) = AX(t)$, with $X(0) = I$. We know from what we've done before (property 6) that we can easily prove $X(t) = e^{At}$. We also know that $(sI - A)\hat{X}(s) = I$, so $\hat{X}(s) = (sI - A)^{-1}$. That tells me that $e^{At} = \mathcal{L}^{-1}\parens{(sI - A)^{-1}}$.
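As a quick check of the series definition in the nilpotent case mentioned above (a sketch assuming numpy/scipy): for $A = \begin{bmatrix}0 & 1\\ 0 & 0\end{bmatrix}$ we have $A^2 = 0$, so the series terminates after two terms.

```python
import numpy as np
from scipy.linalg import expm

# Nilpotent case: A^2 = 0, so e^{At} = I + At exactly (all higher terms vanish).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
t = 1.7

series = np.eye(2) + A * t
assert np.allclose(series, [[1.0, t], [0.0, 1.0]])   # the closed form from the notes
assert np.allclose(expm(A * t), series)              # agrees with scipy's expm
```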
That gives us a way of computing $e^{At}$, assuming we have a way to compute a matrix's inverse and an inverse Laplace transform. This is what people usually do, and most algorithms approach the problem this way. It is generally hard to compute the inverse and the inverse Laplace transform.

(This requires proof regarding why $sI - A$ always has an inverse, and why its inverse Laplace transform is $e^{At}$.)

Cleve Moler started LINPACK (the linear algebra package; the engine behind MATLAB). Famous in computational linear algebra. Paper: "Nineteen dubious ways to compute the exponential of a matrix." It is actually a hard problem in general, related to factoring $n$-degree polynomials.

If we were to consider our simple nilpotent case, we compute $sI - A = \begin{bmatrix}s & -1 \\ 0 & s\end{bmatrix}$. We can immediately write down its inverse as $\begin{bmatrix}\frac{1}{s} & \frac{1}{s^2} \\ 0 & \frac{1}{s}\end{bmatrix}$. The inverse Laplace transform takes no work; it's simply $\begin{bmatrix}1 & t \\ 0 & 1\end{bmatrix}$.

In the next lecture (and next series of lectures) we will be talking about the Jordan form of a matrix. We have a way to compute $e^{At}$. We'll write $A = TJT^{-1}$. In its simplest case, $J$ is diagonal. Either way, all of the work is in exponentiating $J$. You still end up doing something that's the inverse Laplace transform of $sI - J$.

We've shown that for a linear TI system, $\dot{x} = Ax + Bu$; $y = Cx + Du$ ($x(0) = x_0$), we have $x(t) = e^{At}x_0 + \int_0^t e^{A(t-\tau)} Bu(\tau) d\tau$. We proved it last time, but you can check this satisfies the differential equation and initial condition. From that, you can compute the response function and show that it's time-invariant.

Let's conclude today's class with a planar inverted pendulum. Let's call the angle of rotation away from the vertical $\theta$, the mass $m$, the length $\ell$, and the torque $\tau$. Equations of motion: $m\ell^2 \ddot{\theta} - mg\ell \sin \theta = \tau$. Perform Jacobian linearization; we'll define $\theta = 0$ at the vertical (i.e. $\pi/2$ from the horizontal), and we're linearizing about the trivial trajectory that the pendulum is straight up.
Therefore $\delta\theta = \theta$, and the linearization is $m\ell^2 \ddot{\theta} - mg\ell\theta = \tau$. With $u = \frac{\tau}{m\ell^2}$, $\Omega^2 = \frac{g}{\ell}$, and states $x_1 = \theta$, $x_2 = \dot{\theta}$: $\dot{x}_1 = x_2$, $\dot{x}_2 = \Omega^2 x_1 + u$, $y = \theta = x_1$.

Stabilization of the system via feedback by considering poles of the Laplace transform, etc.: $\frac{\hat{y}}{\hat{u}} = \frac{1}{s^2 - \Omega^2} = G(s)$ (the plant).

In general, not a good idea: canceling the unstable pole, and then using feedback. In the notes, this is some controller $K(s)$. If we look at the open-loop transfer function ($K(s)G(s) = \frac{1}{s(s+\Omega)}$, i.e. $K(s) = \frac{s - \Omega}{s}$), then $u = \frac{s-\Omega}{s}\bar{u}$, so $\dot{u} = \dot{\bar{u}} - \Omega\bar{u}$ (assume zero initial conditions on $u, \bar{u}$). If we define a third state variable now, $x_3 = \bar{u} - u$, then that tells us that $\dot{x}_3 = \Omega \bar{u}$ and $\dot{x}_2 = \Omega^2 x_1 - x_3 + \bar{u}$. Here, with input $\bar{u}$, I have $A = \begin{bmatrix} 0 & 1 & 0 \\ \Omega^2 & 0 & -1 \\ 0 & 0 & 0 \end{bmatrix}$, $B = \begin{bmatrix}0 \\ 1 \\ \Omega\end{bmatrix}$, $C = \begin{bmatrix}1 & 0 & 0\end{bmatrix}$, $D = 0$.

Out of time today, but we'll solve at the beginning of Tuesday's class. Solve for $x(t) = \begin{bmatrix}x_1 & x_2 & x_3\end{bmatrix}^T$. We have a few approaches:

• Using $A, B, C, D$: compute the following: $y(t) = Ce^{At} x_0 + C\int_0^t e^{A(t - \tau)}Bu(\tau) d\tau$. In doing that, we'll need to compute $e^{At}$, and then we have this expression for general $u$: suppose you supply a step input.
• Suppose $\bar{u} = -y = -Cx$. Therefore $\dot{x} = Ax + B(-Cx) = (A - BC)x$. We have a new $A_{CL} = A - BC$, and we can exponentiate this instead.
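The second approach can be sketched numerically (a hypothetical illustration; $\Omega = 1$ and the initial state are arbitrary choices, and scipy is assumed available):

```python
import numpy as np
from scipy.linalg import expm

# Close the loop with u_bar = -y = -Cx and propagate the state via expm(A_CL * t).
Om = 1.0   # arbitrary choice of Omega for illustration
A = np.array([[0.0,    1.0,  0.0],
              [Om**2,  0.0, -1.0],
              [0.0,    0.0,  0.0]])
B = np.array([[0.0], [1.0], [Om]])
C = np.array([[1.0, 0.0, 0.0]])

A_cl = A - B @ C   # closed-loop dynamics matrix

# Free evolution of the closed-loop system from a small initial angle.
x0 = np.array([0.1, 0.0, 0.0])
x1 = expm(A_cl * 1.0) @ x0   # state at t = 1
y1 = (C @ x1)[0]             # output at t = 1

assert np.allclose(A_cl, [[0.0, 1.0, 0.0],
                          [Om**2 - 1.0, 0.0, -1.0],
                          [-Om, 0.0, 0.0]])
```

Note that forming $A_{CL}$ does not by itself establish stability; as the notes warn, canceling the unstable pole is in general a bad idea, and checking the eigenvalues of $A_{CL}$ is the honest test.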

This foreshadows later, when we think about control. It introduces the standard notion of feedback for stabilizing systems. Using our newfound knowledge of the state transition matrix for TI systems (how to compute it), see how to compute the response. See what MATLAB is doing.
