
# Tracking Intro

- here you can see the Google self-driving car using a road map, localizing itself, but in addition what's shown here in red are measurements of other vehicles
  - the car uses lasers and radars to track other vehicles

<img src="resources/googlecar_sensor_map.png"/>

- we're going to talk about how to find other cars
- we want to find other cars because we don't want to run into them
- we have to interpret sensor data to assess not just where these other cars are, as in the localization case, but also how fast they're moving
- that way you can drive so that you avoid collisions with them in the future; this matters not just for cars but also for pedestrians and bicyclists
- understanding where the cars are and making predictions where they're going to move is absolutely essential for safe driving


- in this class we will talk about tracking, and the technique I'd like to teach you is called a **Kalman filter**
  - this is an insanely popular technique for estimating the state of a system
  - Kalman filters estimate a continuous state, and they represent the belief about that state with a unimodal distribution

- let's begin with an example
  - consider the car down here
  - let's assume it senses an object at the times $t=0$, $t=1$, $t=2$, and $t=3$
  - where would you assume the object would be at $t=4$?

<img src="resources/tracking_question.png"/>

- from those observations you would say that the velocity points in the direction of this vector
- assuming no drastic change in velocity, you expect that the 5th position would be over here

<img src="resources/tracking_answer.png"/>

- the Kalman filter takes observations like these and estimates the object's velocity and future locations from them


- I'm going to teach you how to write a piece of software that lets you take points like those, even if they're noisy and uncertain, and automatically estimate where future locations might be and how fast the object is moving
- the Google self-driving car uses methods like these to understand where other traffic is based on radar and laser-range data

# Gaussian Intro

- in Kalman filters, the distribution is given by what's called a *Gaussian*
- a Gaussian is a continuous function over the space of locations, and the area underneath it sums up to $1$

<img src="resources/gaussian.png"/>


- here's our Gaussian again and if we call the space X then the Gaussian is characterized by two parameters
  - the mean, $\mu$
  - the width of the Gaussian, often called the variance, $\sigma^2$
    - $\sigma$ itself is called the standard deviation, which is why the variance is written as a square
-  any Gaussian in 1-D, which means the parameter space is one dimensional, is characterized by $\mu$ and $\sigma^2$

<img src="resources/gaussian_mu_sigma.png"/>

- our task in Kalman filters is to maintain a $\mu$ and a $\sigma^2$ as our best estimate of the location of the object we are trying to find
- the exact formula is $f(x) = \frac{1}{\sigma \sqrt {2\pi}}e^{-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}}$

  - it is an exponential of a quadratic function: we take the squared difference between the query point $x$ and the mean $\mu$, divide it by $\sigma^2$, and multiply the result by $-\frac{1}{2}$
  - if $x$ equals $\mu$, the numerator becomes $0$ and we get $e^0$, which is $1$
  - it turns out we have to normalize this by a constant, $\frac{1}{\sqrt{2\pi\sigma^2}}$, so that the area under the curve is $1$
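A minimal sketch of the density formula above in plain Python (the function name `gaussian` is our own choice for illustration). It evaluates the curve at $x = \mu$, where only the normalizer remains, and checks with a simple Riemann sum that the area under the curve is close to $1$.

```python
from math import sqrt, pi, exp


def gaussian(mu, sigma2, x):
    """1-D Gaussian density with mean mu and variance sigma2."""
    return exp(-0.5 * (x - mu) ** 2 / sigma2) / sqrt(2.0 * pi * sigma2)


mu, sigma2 = 10.0, 4.0

# At x == mu the exponential term is exp(0) == 1, so only the normalizer remains.
peak = gaussian(mu, sigma2, mu)
print(peak, 1.0 / sqrt(2.0 * pi * sigma2))

# Left Riemann sum over mu +/- 10 standard deviations with a small step.
step = 0.001
sigma = sqrt(sigma2)
n = int(20 * sigma / step)
area = sum(gaussian(mu, sigma2, mu - 10 * sigma + i * step) for i in range(n)) * step
print(area)  # very close to 1
```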


- let me draw you a couple of functions, and you tell me which ones you believe are Gaussians by checking the box on the right side

<img src="resources/gaussian_question.png"/>

- the answer is these 3 functions
  - they all have a symmetric exponential drop-off on both sides, and they have a single peak
    - they are what's called "unimodal"
  - other functions are "bimodal" functions that have two peaks and as a result are not Gaussian

# Variance Comparison

- let me check your intuition and draw three more Gaussians
- I'm going to ask you about the variance
  - for each of those, check exactly one box
  - is the variance large, medium, or small?
  - obviously, one of those is the largest, one is a medium, and one is small

<img src="resources/variance_question.png"/>

- the answer is shown on the image above
- wide Gaussians have higher variance than narrow ones


- to see how this is being found in the formula, the difference between $x$ and $\mu$ is being normalized by the variance
  - the larger this value, the less the difference matters and, as a result, the more the function is spread out
- put differently, the $\sigma^2$ variance is a measure of uncertainty
  - the larger the $\sigma^2$, the more uncertain we are about the actual state
  - the second function in the image is a very certain distribution where the expected deviation is small
  - the third function in the image is a relatively uncertain distribution where we know very little
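To see the effect of $\sigma^2$ numerically, here is a small sketch using the same density formula; the variances 1, 4, and 16 are arbitrary illustrative choices. For the same mean, a larger variance lowers the peak and puts more density far away from the mean.

```python
from math import sqrt, pi, exp


def gaussian(mu, sigma2, x):
    return exp(-0.5 * (x - mu) ** 2 / sigma2) / sqrt(2.0 * pi * sigma2)


mu = 0.0
for sigma2 in (1.0, 4.0, 16.0):
    peak = gaussian(mu, sigma2, mu)        # height at the mean
    tail = gaussian(mu, sigma2, mu + 4.0)  # height 4 units away from the mean
    print(f"sigma2={sigma2:5.1f}  peak={peak:.4f}  f(mu+4)={tail:.4f}")
```

The narrow Gaussian (small $\sigma^2$) is tall and almost zero four units away; the wide one is flatter but still has noticeable density there.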

# Preferred Gaussian

- if we track another car with our Google self-driving car, which Gaussian would we prefer?
  - the first, second, or third?

<img src="resources/preferred_gaussian_answer.png"/>

- the answer is the third function, because it is the most certain; since it means we know more about the other car than under the two other distributions, it makes the chance of accidentally hitting that car the smallest
- we would definitely prefer a narrow Gaussian, since that means we are confident about the other car's location

# Maximize Gaussian

- starting with the following source code, here's a question for you
  - how should I modify x (the 8) to get the maximum return value from the function `f`?

In [2]:
# For this problem, you aren't writing any code.
# Instead, please just change the last argument in f() to maximize the output.

from math import sqrt, pi, exp


def f(mu, sigma2, x):
    # 1-D Gaussian density with mean mu and variance sigma2
    return 1 / sqrt(2.0 * pi * sigma2) * exp(-0.5 * (x - mu) ** 2 / sigma2)


print(f(10.0, 4.0, 8.0))  # Change the 8. to something else!
print(f(10.0, 4.0, 10.0))  # x equal to mu gives the maximum

0.12098536225957168
0.19947114020071635



- the answer is to set x to the same value as mu, in which case the exponent becomes zero and we get the maximum, the peak of the Gaussian
- with x set to $10$, the same value as mu, the output is approximately $0.2$
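As a quick check of this answer, a brute-force scan finds where `f` is largest; the grid from 0 to 20 in steps of 0.01 is an arbitrary choice for this sketch. The best x is exactly mu, and the value there is $1/\sqrt{2\pi\sigma^2} \approx 0.1995$ for $\sigma^2 = 4$, matching the second printed output above.

```python
from math import sqrt, pi, exp


def f(mu, sigma2, x):
    return 1 / sqrt(2.0 * pi * sigma2) * exp(-0.5 * (x - mu) ** 2 / sigma2)


mu, sigma2 = 10.0, 4.0
xs = [i / 100 for i in range(2001)]  # 0.00, 0.01, ..., 20.00
best_x = max(xs, key=lambda x: f(mu, sigma2, x))
print(best_x, f(mu, sigma2, best_x))
```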

# Measurement and Motion

- the Kalman filter represents our distributions by Gaussians and iterates on two main cycles
- Sebastian summarizes some of the key concepts from these cycles in the below referenced links
- if you are interested, please feel free to check out these links directly from Sebastian's class on Artificial Intelligence for Robotics


- the first cycle is the Measurement Update
  - requires a [product](https://classroom.udacity.com/courses/cs373/lessons/48739381/concepts/487235990923#)
  - uses [Bayes rule](https://classroom.udacity.com/courses/cs373/lessons/48739381/concepts/487221690923#)
  
- the second cycle is the Motion Update
  - involves a convolution
  - uses [total probability](https://classroom.udacity.com/courses/cs373/lessons/48739381/concepts/486736290923#)

# Shifting the Mean

- in Kalman filters we iterate measurement (often called "measurement update") and motion (often called "prediction")
  - in the update we'll use Bayes rule, which is nothing else but a product, or a multiplication
  - in the prediction we'll use total probability, which is a convolution, or simply an addition
- let's talk first about the measurement cycle and then the prediction cycle, using Gaussians for implementing those steps


- suppose you're localizing another vehicle, and you have a prior distribution that looks as follows (black)
  - it's a very wide Gaussian with the mean over here
- now, say we get a measurement that tells us something about the localization of the vehicle, and it comes in like this (blue)
  - it has a mean over here called $\nu$ (Nu), and in this example the measurement has a much smaller variance than the prior
- this is an example where in our prior we were fairly uncertain about a location, but the measurement told us quite a bit as to where the vehicle is


- where will the new mean of the subsequent Gaussian be?

<img src="resources/shifting_the_mean_answer.png"/>

- the answer is over here in the middle
  - it's between the two old means--the mean of the prior and the mean of the measurement
  - it's slightly further on the measurement side, because the measurement was more certain as to where the vehicle is than the prior


- the more certain we are, the more we pull the mean in the direction of the certain answer

# Predicting the Peak

- when we graph the new Gaussian, it could come out very wide or very peaky
- consider three candidates for the new Gaussian
  - the first is a very narrow and skinny Gaussian
  - the second is one whose width lies in between the two original Gaussians
  - the third is one that's even wider than the two original Gaussians


- which one do you believe is the correct posterior after multiplying these two Gaussians?

<img src="resources/predicting_the_peak_answer.png"/>

- the resulting Gaussian is more certain than the two component Gaussians
  - that is, the variance is smaller than either of the two variances in isolation
  - intuitively speaking, this is the case because we actually gain information
  - the two Gaussians together have a higher information content than either Gaussian in isolation
- the new belief will be more certain than either the previous belief OR the measurement
- the takeaway lesson here: more measurements mean greater certainty

# Parameter Update

- suppose we multiply two Gaussians as in Bayes rule-- a prior $p(x)$ and a measurement probability $p(z)$
  - the prior has a mean of $\mu$ and a variance of $\sigma^2$
  - the measurement has a mean of $\nu$ and a variance of $r^2$
- the new mean is $\mu' = \frac{r^2\mu + \sigma^2\nu}{r^2+\sigma^2}$
- the new variance is ${\sigma^2}^\prime = \frac{1}{\frac{1}{r^2} + \frac{1}{\sigma^2}}$


- for the previous example
  - clearly, the prior Gaussian has a much higher uncertainty, therefore $\sigma^2$ is larger
    - that means that $\nu$ is weighted much more heavily than $\mu$, so the new mean will be closer to $\nu$ than to $\mu$, somewhere like the green area drawn above
  - interestingly enough, the variance term is unaffected by the actual means
    - it just uses the previous variances and comes up with a new one that's even peakier
- the result is called posterior $p(x|z)$
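The mean formula can be rewritten in an equivalent "gain" form, $\mu' = \mu + K(\nu - \mu)$ with $K = \frac{\sigma^2}{\sigma^2 + r^2}$, and the variance as ${\sigma^2}' = (1 - K)\sigma^2$. This is only algebra on the two equations above, but it previews the role of the Kalman gain $K$ in the multidimensional equations later in this lesson. A sketch that checks both forms agree (the function names and numbers are illustrative):

```python
def update_weighted(mu, sigma2, nu, r2):
    """Measurement update exactly as written in the formulas above."""
    new_mu = (r2 * mu + sigma2 * nu) / (r2 + sigma2)
    new_sigma2 = 1.0 / (1.0 / r2 + 1.0 / sigma2)
    return new_mu, new_sigma2


def update_gain(mu, sigma2, nu, r2):
    """The same update written with a scalar gain k between 0 and 1."""
    k = sigma2 / (sigma2 + r2)      # how much we trust the measurement
    new_mu = mu + k * (nu - mu)     # move toward nu by the fraction k
    new_sigma2 = (1.0 - k) * sigma2  # shrink the variance by the factor (1 - k)
    return new_mu, new_sigma2


# Wide prior (sigma2 = 20) and a sharp measurement (r2 = 2): k is close to 1.
a = update_weighted(0.0, 20.0, 10.0, 2.0)
b = update_gain(0.0, 20.0, 10.0, 2.0)
print(a)
print(b)
```

With a wide prior the gain is about 0.91, so the posterior mean lands near $\nu$ and the posterior variance is below both input variances.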

# Separated Gaussians

- suppose we have a prior that sits over here (left) and a measurement probability that sits over here (right)--really far away--and both have the same covariance
- where would the new mean be?

<img src="resources/separated_gaussians_answer.png"/>

- the answer is in the middle, exactly halfway, because the two variances are the same (they have the same width, which means the same certainty), so we just average the means

- let me ask the hard question now
- will it be a Gaussian like this where the variance is larger, a Gaussian with exactly the same variance, or an even more peaked Gaussian that's more certain than the two original factors in this calculation?

<img src="resources/separated_gaussians_2_answer.png"/>

- the answer is the more peaky Gaussian
  - that is somewhat counter-intuitive
  - this can be hard to wrap your head around, but multiple measurements ALWAYS give us a more certain (and therefore taller and narrower) belief
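The update formulas make this concrete. In this sketch the prior is centred at 0 and the measurement at 10, both with variance 4 (numbers chosen only for illustration); the posterior mean lands exactly halfway and the variance halves:

```python
def update(mean1, var1, mean2, var2):
    new_mean = (var2 * mean1 + var1 * mean2) / (var1 + var2)
    new_var = 1.0 / (1.0 / var1 + 1.0 / var2)
    return [new_mean, new_var]


print(update(0.0, 4.0, 10.0, 4.0))  # [5.0, 2.0]
```

The posterior variance does not depend on how far apart the two means are; the formula only uses the two variances, which is why the result is narrower even when prior and measurement disagree strongly.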

# New Mean and Variance

In [3]:
# Write a program to update your mean and variance
# when given the mean and variance of your belief
# and the mean and variance of your measurement.

# This program will update the parameters of your belief function.


def update(mean1, var1, mean2, var2):
    new_mean = (var2 * mean1 + var1 * mean2) / (var1 + var2)
    new_var = 1 / (1 / var1 + 1 / var2)
    return [new_mean, new_var]


print(update(10.0, 8.0, 13.0, 2.0))

[12.4, 1.6]



# Gaussian Motion

- let's step back and look at what we've achieved
  - we knew there was a measurement update and a motion update
  - measurement update is implemented by multiplication, which is the same as Bayes rule
  - motion update is also called prediction and is done by total probability or an addition


- motion update is a really, really easy step
- suppose you live in a world like this
  - this is your current best estimate of where you are (in blue), and this is your uncertainty
  - now say you move to the right side a certain distance, and that motion itself has its own uncertainty (in green)
  - then you arrive at a prediction (in red) that adds the motion command to the mean and has a larger uncertainty than the initial uncertainty

<img src="resources/gaussian_motion.png"/>

- intuitively this makes sense
  - if you move to the right by this distance, in expectation you're exactly where you wish to be but you've lost information because your motion tends to lose information as manifested by this uncertainty over here (in green)


- the math for this is really, really easy
  - your new mean is your old mean plus the motion, often called $u$
    - $\mu' \leftarrow \mu + u$
    - if you move over 10 meters, $u$ will be 10 meters
  - your new variance is your old variance plus a variance of the motion Gaussian $r^2$
    - ${\sigma^2}^\prime \leftarrow \sigma^2 + r^2$


- this is all you need to know; it's just an addition
- in summary, we have a Gaussian over here (in blue), we have a Gaussian for the motion (in green), with $u$ as the mean and $r^2$ as its own motion uncertainty, and the resulting Gaussian in the prediction step (in red) just adds these two things up
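Because prediction only adds, repeating it without any measurement keeps shifting the mean and makes the variance grow linearly. A small sketch, with an arbitrary starting belief of mean 0 and variance 1, a motion of 10 m, and a motion variance of 2:

```python
def predict(mean1, var1, mean2, var2):
    # mean2 is the motion u, var2 is the variance of the motion
    return [mean1 + mean2, var1 + var2]


mu, var = 0.0, 1.0
for step in range(1, 4):
    mu, var = predict(mu, var, 10.0, 2.0)
    print(step, mu, var)  # 1 10.0 3.0, then 2 20.0 5.0, then 3 30.0 7.0
```

Only a measurement update can bring the variance back down.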

# Predict Function

In [4]:
def predict(mean1, var1, mean2, var2):
    new_mean = mean1 + mean2
    new_var = var1 + var2
    return [new_mean, new_var]


print(predict(10.0, 4.0, 12.0, 4.0))

[22.0, 8.0]



# Kalman Filter Code

- let's put everything together
- let's write a main program that takes these 2 functions, `update` and `predict`, and feeds them a sequence of measurements and motions


- in the example I've chosen the measurements are $5., 6., 7., 9., 10.$
- the motions are $1., 1., 2., 1., 1.$
- this all would work out really well if the initial estimate was $5$, but we're setting it to $0$ with a very large uncertainty of $10,000$
- let's assume the measurement uncertainty is constant $4$ and the motion uncertainty is constant $2$

In [5]:
# Write a program that will iteratively update and predict
# based on the location measurements and inferred motions shown below.


def update(mean1, var1, mean2, var2):
    new_mean = float(var2 * mean1 + var1 * mean2) / (var1 + var2)
    new_var = 1.0 / (1.0 / var1 + 1.0 / var2)
    return [new_mean, new_var]


def predict(mean1, var1, mean2, var2):
    new_mean = mean1 + mean2
    new_var = var1 + var2
    return [new_mean, new_var]


measurements = [5.0, 6.0, 7.0, 9.0, 10.0]
motion = [1.0, 1.0, 2.0, 1.0, 1.0]
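measurement_sig = 4.0  # measurement variance (not a standard deviation)
motion_sig = 2.0  # motion variance
mu = 0.0  # initial mean
sig = 10000.0  # initial variance: very uncertain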

for n in range(len(measurements)):
    [mu, sig] = update(mu, sig, measurements[n], measurement_sig)
    print("update: ", [mu, sig])
    [mu, sig] = predict(mu, sig, motion[n], motion_sig)
    print("predict: ", [mu, sig])

print("\nfinal: ", [mu, sig])

update:  [4.998000799680128, 3.9984006397441023]
predict:  [5.998000799680128, 5.998400639744102]
update:  [5.999200191953932, 2.399744061425258]
predict:  [6.999200191953932, 4.399744061425258]
update:  [6.999619127420922, 2.0951800575117594]
predict:  [8.999619127420921, 4.09518005751176]
update:  [8.999811802788143, 2.0235152416216957]
predict:  [9.999811802788143, 4.023515241621696]
update:  [9.999906177177365, 2.0058615808441944]
predict:  [10.999906177177365, 4.005861580844194]

final:  [10.999906177177365, 4.005861580844194]



- when you run this, your first position estimate becomes almost exactly $5$ ($4.998$), because your initial uncertainty is so large that the estimate is dominated by the first measurement
- your uncertainty shrinks to $3.998$, which is slightly better than the measurement uncertainty of $4$
- you then predict that you move by $1$, and the uncertainty increases to $5.998$, because the motion uncertainty of $2$ is added
- you update again based on the measurement $6$ and get an estimate of $5.999$, which is almost $6$
- you move $1$ again and measure $7$
- you move $2$ and measure $9$
- you move $1$ and measure $10$
- you move a final $1$, and out comes the final result
  - a prediction of $10.999$ for the position, which is your $10$ position moved by $1$, with a residual uncertainty of about $4$


- this code that you just wrote implements a full Kalman filter for 1D
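The variances in the run above settle near $2$ after each update and $4$ after each prediction. A short sketch shows why, assuming the same constant measurement variance $R = 4$ and motion variance $Q = 2$: the variance recursion never looks at the measured values, and its fixed point $p$ satisfies $\frac{1}{p} = \frac{1}{R} + \frac{1}{p + Q}$, which rearranges to $p^2 + Qp - QR = 0$, so $p = 2$ here.

```python
from math import sqrt

measurement_var = 4.0  # R
motion_var = 2.0       # Q

# Iterate only the variance part of update/predict, starting very uncertain.
var = 10000.0
for _ in range(20):
    var = 1.0 / (1.0 / var + 1.0 / measurement_var)  # update
    var = var + motion_var                            # predict
print(var)  # approaches 4

# Closed form: positive root of p^2 + Q*p - Q*R = 0.
R, Q = measurement_var, motion_var
p = (-Q + sqrt(Q * Q + 4 * Q * R)) / 2.0
print(p, p + Q)  # 2.0 after update, 4.0 after predict
```

This is why the final uncertainty in the run above is about $4$ no matter what the measured positions were.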

# Kalman Prediction

- now we understand a lot about the 1D Kalman filter; you've programmed one
- you understand how to incorporate measurements, you understand how to incorporate motion
- in reality, we often have many dimensions, and then things become more involved, so I'm going to just tell you how things work with an example, and why it's great to estimate in higher-dimensional state spaces


- suppose you have a 2-dimensional state space of x and y--like a camera image, or in our case, we might have a car that uses a radar to detect the location of a vehicle over time
- then what the 2D Kalman filter affords you is something really amazing, and here is how it goes


- suppose at time t = 0, you observe the object of interest to be at this coordinate
  - this might be another car in traffic for the Google self-driving car
- one time step later, you see it over here, and another time step later, you see it right over here
- where would you now expect at time $t = 3$ the object to be?

<img src="resources/kalman_prediction.png"/>

- we'd expect the car to continue in a straight line
- what the Kalman filter does for you, if you estimate in higher-dimensional spaces, is not just track the x and y positions but also implicitly figure out the velocity of the object, and then use that velocity estimate to make a really good prediction about the future


- notice the sensor itself only sees position; it never sees the actual velocity
- velocity is inferred from seeing multiple positions
- one of the most amazing things about Kalman filters in tracking applications is that they can figure out the velocity of the object, even though they never measure it directly, and from there make predictions about future locations that incorporate velocity
- that's one of the reasons that Kalman filters are such a popular algorithm in artificial intelligence and in control theory at large

# Kalman Filter Land

- to explain how this works, I have to talk about high-dimensional Gaussians
  - these are often called multivariate Gaussians
  - the mean is now a vector with 1 element for each of the dimensions
  - the variance is replaced by what's called a covariance, which is a matrix with D rows and D columns if the dimensionality of the estimate is D
  - the formula takes some getting used to; to tell you the truth, even I have to look it up for this class, so please don't get confused by it
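For reference, the standard density of a $D$-dimensional Gaussian with mean vector $\mu$ and covariance matrix $\Sigma$ is $f(x) = (2\pi)^{-D/2} |\Sigma|^{-1/2} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$. The sketch below evaluates it for $D = 2$ using only the standard library (the 2×2 determinant and inverse are written out by hand; the function names and numbers are illustrative). With a diagonal covariance it reduces to a product of two 1-D Gaussians, which the last two lines compare.

```python
from math import sqrt, pi, exp


def gaussian_2d(mu, cov, x):
    """Density of a 2-D Gaussian; mu and x are (a, b) pairs, cov is 2x2."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))  # inverse of a 2x2 matrix
    dx = (x[0] - mu[0], x[1] - mu[1])
    # quadratic form dx^T * inv * dx
    q = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    return exp(-0.5 * q) / (2.0 * pi * sqrt(det))


def gaussian_1d(mu, sigma2, x):
    return exp(-0.5 * (x - mu) ** 2 / sigma2) / sqrt(2.0 * pi * sigma2)


mu = (1.0, 2.0)
cov = ((4.0, 0.0), (0.0, 9.0))  # no correlation between the two dimensions
x = (2.0, 0.5)
joint = gaussian_2d(mu, cov, x)
product = gaussian_1d(1.0, 4.0, 2.0) * gaussian_1d(2.0, 9.0, 0.5)
print(joint, product)  # equal up to rounding
```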


- let me explain it to you more intuitively
- here's a 2-dimensional space
- a 2-dimensional Gaussian is defined over that space, and it's possible to draw the contour lines of the Gaussian

<img src="resources/2d_gaussian.png"/>

- the mean of this Gaussian is the pair $(x_0, y_0)$
- the covariance now defines the spread of the Gaussian, as indicated by these contour lines
  - it's possible to have a fairly small uncertainty in one dimension but a huge uncertainty in the other
- when the Gaussian is tilted as shown here, the uncertainties of x and y are correlated: if I learn that x actually sits over here, that makes me believe that y probably sits somewhere over here
  - that's called correlation
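The tilted case can be made concrete with the standard conditioning rule for a 2-D Gaussian: if x and y are jointly Gaussian with means $\mu_x, \mu_y$, variances $\sigma_x^2, \sigma_y^2$, and covariance $\sigma_{xy}$, then observing x moves the expected y to $\mu_y + \frac{\sigma_{xy}}{\sigma_x^2}(x - \mu_x)$ and shrinks its variance to $\sigma_y^2 - \frac{\sigma_{xy}^2}{\sigma_x^2}$. A sketch with made-up numbers (the function name is ours):

```python
def condition_y_on_x(mu_x, mu_y, var_x, var_y, cov_xy, x):
    """Mean and variance of y given an observed x for a 2-D Gaussian."""
    mean = mu_y + cov_xy / var_x * (x - mu_x)
    var = var_y - cov_xy ** 2 / var_x
    return mean, var


# Positive covariance (tilted Gaussian): a larger x makes a larger y more likely.
print(condition_y_on_x(0.0, 0.0, 4.0, 4.0, 3.0, 2.0))  # (1.5, 1.75)
# Zero covariance (untilted Gaussian): observing x tells us nothing about y.
print(condition_y_on_x(0.0, 0.0, 4.0, 4.0, 0.0, 2.0))  # (0.0, 4.0)
```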


- I can explain to you the entire effect of estimating velocity and using it in filtering using Gaussians like these, and it becomes really simple
- the problem I'm going to choose is a 1-dimensional motion example
  - let's assume we observe the object at $t = 1$, $t = 2$, and $t = 3$, as in the image
  - you would assume that at $t = 4$ the object sits over here; even though you have only seen these discrete locations, you can infer from them that there is a velocity driving the object to the right

<img src="resources/estimation_velocity.png"/>

# Kalman Filter Prediction

- in Kalman filter land, we're going to build a 2-dimensional estimate
  - one for the location $x$, and one for the velocity denoted $\dot{x}$
  - the velocity can be $0$, it can be negative, or it can be positive


- if initially I know my location, but not my velocity, then I represent it with a Gaussian that's elevated around the correct location, but really, really broad in the space of velocities
- let's look at the prediction step
  - in the prediction step, I don't know my velocity, so I can't predict my location directly
  - but miraculously, there will be an interesting correlation
  - so let's, for a second, pick a point on this distribution over here
    - let me assume my velocity is $0$ (of course, in practice, I don't know the velocity, but let me assume for a moment that it is $0$)
    - where would my posterior be after the prediction?
      - well, we started at location $1$ and the velocity is $0$, so my prediction would likely be at $(1, 0)$, that is, location $1$ and velocity $0$
  - now let's change my belief in velocity and pick a different one
    - let's say the velocity is $1$
    - where would my prediction be $1$ time step later, starting at location $1$ with velocity $1$?
      - the answer is at $(2, 1)$
      - if we advance by one time step, we also move forward in the x direction by one
      - if a car's starting point is $(1, 1)$, meaning the location is $1$ and the velocity is $1$, and we predict $1$ time step into the future, then we know the location will be $2$; the velocity might be a little uncertain, but it stays about the same

<img src="resources/kalman_filter_prediction.png"/>

<img src="resources/kalman_filters_prediction_multiple_gaussian.png"/>

- when you put all this together, you find that all these possibilities on the Gaussian over here (blue) link to a Gaussian that looks like this (red)
  - this is a really interesting 2-dimensional Gaussian, which you should really think about
  - clearly, if I were to project this Gaussian uncertainty into the space of possible locations, I can't predict a thing
  - it's impossible to predict where the object is, because I don't know the velocity
  - also, clearly, if I project this Gaussian into the space of $\dot{x}$, it is impossible to say what the velocity is
  - a single observation or a single prediction is insufficient to determine either one
  - however, what we do know is that our location is correlated with the velocity
    - the faster I move, the further to the right the location is; this Gaussian (red) expresses exactly that
      - if I, for example, figured out that my velocity was $2$, then under this Gaussian I could really nail down that my location is $3$
    - we still haven't figured out where we are, and we haven't figured out how fast we're moving, but we've learned a lot about the relation between these two things from this tilted Gaussian
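The tilt can also be computed directly. In the prediction step the covariance is propagated as $F \cdot P \cdot F^T$, using the same transition matrix $F$ and prediction equation given in the Kalman Filter Design section. This sketch starts with a covariance that is fairly certain about location, very uncertain about velocity, and has no correlation; after one prediction the off-diagonal entries become large, which is the tilted red Gaussian. It uses plain nested lists, and the numbers are illustrative.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


F = [[1.0, 1.0],   # x'     = x + x_dot
     [0.0, 1.0]]   # x_dot' = x_dot

# Location known well (variance 0.1), velocity almost unknown (variance 100).
P = [[0.1, 0.0],
     [0.0, 100.0]]

P_pred = matmul(matmul(F, P), transpose(F))
print(P_pred)  # [[100.1, 100.0], [100.0, 100.0]]
```

The large off-diagonal entry says that location and velocity are now strongly correlated, even though neither is known on its own.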


- to understand how powerful this is, let's now fold in the second observation at time $t = 2$
  - this observation tells us nothing about the velocity and only something about the location
  - if I were to draw this as a Gaussian--it's a Gaussian just like this (green), which says something about the location but not about the velocity
  - but if we multiply my prior (red) from the prediction step with the measurement probability (green), then miraculously, I get a Gaussian that sits at $(2, 1)$
    - this Gaussian now has a really good estimate what my velocity is and a really good estimate where I am
      - if I take this Gaussian, and predict $1$ step forward, then I find myself at $(3, 2)$


- think about this; this is a really deep insight about how Kalman filters work
- in particular, we've only been able to *observe* one variable, $x$, and we've used those observations to *infer* this other variable, $\dot{x}$
  - we've been able to infer it because there's a set of physical equations which say that my location after a time step is my old location plus my velocity, $x' = x + \dot{x}$
    - this equation propagates constraints from subsequent measurements back to the unobservable variable, velocity, so we are able to estimate the velocity as well
    - this is really key to understanding Kalman filters; it is key to understanding how a Google self-driving car estimates the locations of other cars and is able to make predictions even if it's unable to measure velocity directly


- the variables of a Kalman filter are often called *states*, because they reflect states of the physical world, like where the other car is and how fast it's moving
  - they separate into 2 subsets
    - the observables, like the momentary location
    - the hidden variables, which in our example is the velocity, which I can never directly observe
  - because those 2 subsets interact, subsequent observations of the observable variables give us information about the hidden variables, so we can also estimate what the hidden variables are
  - from multiple observations of the object's location, we can estimate how fast it's moving
    - that is actually true for all of the different filters, but because Kalman filters happen to be very efficient to calculate, we often use a Kalman filter for problems like this

# Kalman Filter Design

- when we design a Kalman filter, you effectively need 2 things, a state transition function and a measurement function, both built from the physics of the problem
  - we know that the new location is the old location plus the velocity, $x' \leftarrow x + \dot{x}$
  - the new velocity should just be the old velocity, $\dot{x}' \leftarrow \dot{x}$
  
  
- for the state, you need a state transition function, which is usually a matrix, so we're now in the world of linear algebra
  - $\begin{pmatrix} x' \\ \dot{x}' \end{pmatrix} \leftarrow \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ \dot{x} \end{pmatrix}$
    - this matrix of 1s and 0s is called $F$
- for the measurements, you need a measurement function
  - we only observe the first component of the state, the location, not the velocity, so we use a matrix like this
  - $z \leftarrow \begin{pmatrix} 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ \dot{x} \end{pmatrix}$
    - this matrix of 1s and 0s is called $H$
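Applying the two matrices to a concrete state shows what they encode. A sketch with plain nested lists (the helper `matvec` is our own): starting at location 1 with velocity 1, $F$ predicts location 2 and velocity 1, matching the worked example earlier, while $H$ extracts only the location as the measurement.

```python
def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


F = [[1, 1],
     [0, 1]]   # state transition: x' = x + x_dot, x_dot' = x_dot
H = [[1, 0]]   # measurement function: observe x only

state = [1, 1]  # location 1, velocity 1
print(matvec(F, state))  # [2, 1]
print(matvec(H, state))  # [1]
```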


- the actual update equations for a Kalman filter are involved; I give them to you here, but please don't memorize them, and I won't prove them for you, since the proof is very involved
  - variables
    - x = estimate
    - P = uncertainty covariance
    - F = state transition matrix
    - u = motion vector
    - z = measurement
    - H = measurement function
    - y = error
    - R = measurement noise
    - S = system uncertainty, projected into the measurement space using the measurement function H
    - K = Kalman gain
    - I = identity matrix
  - prediction
    - $x' = F \cdot x + u$
    - $P' = F \cdot P \cdot F^T$
  - measurement update
    - $y = z - H \cdot x$
    - $S = H \cdot P \cdot H^T + R$
    - $K = P \cdot H^T \cdot S^{-1}$
    - update the estimate: $x' = x + (K \cdot y)$
    - update the uncertainty: $P' = (I - K \cdot H) \cdot P$


- I wrote this down so that you have a complete definition, but this is something you should not memorize
- if you really wish to understand this math, it happens to be just a generalization of the math I gave you to higher dimensional spaces, but I would recommend just not to worry about this
- there's a set of linear algebra equations that implement the Kalman filter in higher dimensions
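To make the equations concrete, here is a compact sketch of the measurement update followed by the prediction for the 1-D position-velocity example, written with plain nested lists rather than the course's `matrix` class. Assumptions chosen for illustration: measurements 1, 2, 3; initial state $(0, 0)$ with variance 1000 on both components; measurement noise $R = 1$; zero motion vector $u$; and, as in the equations above, no extra noise term in the prediction. Because only one quantity is measured, $S$ is 1×1 and its inverse is just a reciprocal; a general version needs a real matrix inverse, such as the `CholeskyInverse` method in the class that follows.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a, b):
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def kalman_step(x, P, z, F, H, R, u, I):
    # measurement update
    y = sub([[z]], matmul(H, x))                    # error
    S = add(matmul(matmul(H, P), transpose(H)), R)  # 1x1 in this example
    S_inv = [[1.0 / S[0][0]]]                       # valid only because S is 1x1
    K = matmul(matmul(P, transpose(H)), S_inv)      # Kalman gain
    x = add(x, matmul(K, y))
    P = matmul(sub(I, matmul(K, H)), P)
    # prediction
    x = add(matmul(F, x), u)
    P = matmul(matmul(F, P), transpose(F))
    return x, P


x = [[0.0], [0.0]]                  # initial location and velocity
P = [[1000.0, 0.0], [0.0, 1000.0]]  # very uncertain initially
F = [[1.0, 1.0], [0.0, 1.0]]
H = [[1.0, 0.0]]
R = [[1.0]]
u = [[0.0], [0.0]]
I = [[1.0, 0.0], [0.0, 1.0]]

for z in [1.0, 2.0, 3.0]:
    x, P = kalman_step(x, P, z, F, H, R, u, I)

print(x)  # location close to 4, velocity close to 1
print(P)
```

The filter never measures velocity, yet after three position measurements it estimates a velocity of about 1 and predicts the next location to be about 4.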

# Kalman Matrices

- I have a new, challenging programming assignment for you; I would like you to implement a multidimensional Kalman filter for the example I've just given you
- the `matrix` class is a class for manipulating matrices that should make this really easy
  - it has a function that initializes matrices
  - it can set all entries to $0$ (`zero`)
  - it can compute an identity matrix (`identity`)
  - it can print out a matrix with `show`
  - it overloads operators like addition, subtraction, and multiplication
  - it even computes the transpose of a matrix
  - it can invert a matrix using Cholesky factorization
- this matrix class is a small version of what is found in typical libraries

In [6]:
# Write a function 'kalman_filter' that implements a multi-
# dimensional Kalman Filter for the example given

from math import *


class matrix:

    # implements basic operations of a matrix class

    def __init__(self, value):
        self.value = value
        self.dimx = len(value)
        self.dimy = len(value[0])
        if value == [[]]:
            self.dimx = 0

    def zero(self, dimx, dimy):
        # check if valid dimensions
        if dimx < 1 or dimy < 1:
            raise ValueError("Invalid size of matrix")
        else:
            self.dimx = dimx
            self.dimy = dimy
            self.value = [[0 for row in range(dimy)] for col in range(dimx)]

    def identity(self, dim):
        # check if valid dimension
        if dim < 1:
            raise ValueError("Invalid size of matrix")
        else:
            self.dimx = dim
            self.dimy = dim
            self.value = [[0 for row in range(dim)] for col in range(dim)]
            for i in range(dim):
                self.value[i][i] = 1

    def show(self):
        for i in range(self.dimx):
            print(self.value[i])
        print(" ")

    def __add__(self, other):
        # check if correct dimensions
        if self.dimx != other.dimx or self.dimy != other.dimy:
            raise ValueError("Matrices must be of equal dimensions to add")
        else:
            # add if correct dimensions
            res = matrix([[]])
            res.zero(self.dimx, self.dimy)
            for i in range(self.dimx):
                for j in range(self.dimy):
                    res.value[i][j] = self.value[i][j] + other.value[i][j]
            return res

    def __sub__(self, other):
        # check if correct dimensions
        if self.dimx != other.dimx or self.dimy != other.dimy:
            raise ValueError("Matrices must be of equal dimensions to subtract")
        else:
            # subtract if correct dimensions
            res = matrix([[]])
            res.zero(self.dimx, self.dimy)
            for i in range(self.dimx):
                for j in range(self.dimy):
                    res.value[i][j] = self.value[i][j] - other.value[i][j]
            return res

    def __mul__(self, other):
        # check if correct dimensions
        if self.dimy != other.dimx:
            raise ValueError("Matrices must be m*n and n*p to multiply")
        else:
            # multiply if correct dimensions
            res = matrix([[]])
            res.zero(self.dimx, other.dimy)
            for i in range(self.dimx):
                for j in range(other.dimy):
                    for k in range(self.dimy):
                        res.value[i][j] += self.value[i][k] * other.value[k][j]
            return res

    def transpose(self):
        # compute transpose
        res = matrix([[]])
        res.zero(self.dimy, self.dimx)
        for i in range(self.dimx):
            for j in range(self.dimy):
                res.value[j][i] = self.value[i][j]
        return res

    # Thanks to Ernesto P. Adorio for use of Cholesky and CholeskyInverse functions

    def Cholesky(self, ztol=1.0e-5):
        # Computes the upper triangular Cholesky factorization of
        # a positive definite matrix.
        res = matrix([[]])
        res.zero(self.dimx, self.dimx)

        for i in range(self.dimx):
            S = sum([(res.value[k][i]) ** 2 for k in range(i)])
            d = self.value[i][i] - S
            if abs(d) < ztol:
                res.value[i][i] = 0.0
            else:
                if d < 0.0:
                    raise ValueError("Matrix not positive-definite")
                res.value[i][i] = sqrt(d)
            for j in range(i + 1, self.dimx):
                # only rows above i contribute to the upper triangular factor
                S = sum([res.value[k][i] * res.value[k][j] for k in range(i)])
                if abs(S) < ztol:
                    S = 0.0
                try:
                    res.value[i][j] = (self.value[i][j] - S) / res.value[i][i]
                except ZeroDivisionError:
                    raise ValueError("Zero diagonal")
        return res

    def CholeskyInverse(self):
        # Computes the inverse of a matrix given its upper triangular
        # Cholesky decomposition.
        res = matrix([[]])
        res.zero(self.dimx, self.dimx)

        # Backward step for inverse.
        for j in reversed(range(self.dimx)):
            tjj = self.value[j][j]
            S = sum(
                [self.value[j][k] * res.value[j][k] for k in range(j + 1, self.dimx)]
            )
            res.value[j][j] = 1.0 / tjj ** 2 - S / tjj
            for i in reversed(range(j)):
                res.value[j][i] = res.value[i][j] = (
                    -sum(
                        [
                            self.value[i][k] * res.value[k][j]
                            for k in range(i + 1, self.dimx)
                        ]
                    )
                    / self.value[i][i]
                )
        return res

    def inverse(self):
        aux = self.Cholesky()
        res = aux.CholeskyInverse()
        return res

    def __repr__(self):
        return repr(self.value)


########################################

# Implement the filter function below


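def kalman_filter(x, P):
    # Note: measurements, F, u, H, R and I are read from the module-level
    # variables defined in the test code below.
    for n in range(len(measurements)):

        # measurement update
        Z = matrix([[measurements[n]]])
        y = Z - (H * x)  # innovation: measurement minus predicted measurement
        S = H * P * H.transpose() + R  # innovation covariance
        K = P * H.transpose() * S.inverse()  # Kalman gain
        x = x + (K * y)

        P = (I - (K * H)) * P

        # prediction (no process noise term is added to P here)
        x = (F * x) + u
        P = F * P * F.transpose()

    return x, P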


############################################
### use the code below to test your filter!
############################################

measurements = [1, 2, 3]  # filter with these measurements

x = matrix([[0.0], [0.0]])  # initial state (location and velocity)
P = matrix([[1000.0, 0.0], [0.0, 1000.0]])  # initial uncertainty
u = matrix([[0.0], [0.0]])  # external motion
F = matrix([[1.0, 1.0], [0, 1.0]])  # next state function
H = matrix([[1.0, 0.0]])  # measurement function
R = matrix([[1.0]])  # measurement uncertainty
I = matrix([[1.0, 0.0], [0.0, 1.0]])  # identity matrix

print(kalman_filter(x, P))
# output should be:
# x: [[3.9996664447958645], [0.9999998335552873]]
# P: [[2.3318904241194827, 0.9991676099921091], [0.9991676099921067, 0.49950058263974184]]

([[3.9996664447958645], [0.9999998335552873]], [[2.3318904241194827, 0.9991676099921091], [0.9991676099921067, 0.49950058263974184]])
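
- for comparison, here is a minimal sketch of the same update/predict loop written with plain Python lists, with every model matrix passed in as an argument instead of being read from global variables
  - the helper names (`matmul`, `transpose`, `add`, `sub`, `kalman_filter_explicit`) are our own, not part of the course code
  - it assumes scalar measurements (H is 1 x n), so the 1 x 1 matrix S is inverted by a plain division instead of the Cholesky route used by `matrix.inverse`

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def kalman_filter_explicit(measurements, x, P, F, u, H, R):
    """Same loop as kalman_filter, but with no global variables.

    Assumes scalar measurements (H is 1 x n), so S is 1 x 1.
    """
    n = len(x)
    I = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for z in measurements:
        # measurement update
        y = sub([[z]], matmul(H, x))
        S = add(matmul(matmul(H, P), transpose(H)), R)
        PHt = matmul(P, transpose(H))
        K = [[row[0] / S[0][0]] for row in PHt]  # 1 x 1 inverse is a division
        x = add(x, matmul(K, y))
        P = matmul(sub(I, matmul(K, H)), P)
        # prediction (no process noise, as in kalman_filter)
        x = add(matmul(F, x), u)
        P = matmul(matmul(F, P), transpose(F))
    return x, P


x_out, P_out = kalman_filter_explicit(
    measurements=[1, 2, 3],
    x=[[0.0], [0.0]],
    P=[[1000.0, 0.0], [0.0, 1000.0]],
    F=[[1.0, 1.0], [0.0, 1.0]],
    u=[[0.0], [0.0]],
    H=[[1.0, 0.0]],
    R=[[1.0]],
)
print(x_out)
print(P_out)
```

- the printed x and P should agree with the output above to many decimal places; any tiny differences come from the different floating-point order of operations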



# Conclusion

- you've understood something fundamental here that I believe is essential to artificial intelligence and to building self-driving cars
- you've effectively implemented our method for finding other cars
- let me put this in context
  - here's a Google self-driving car and here's another car
  - our Google self-driving car uses radar on the front bumper that measures the distance to vehicles and also gives a noisy estimate of the velocity
  - it also uses its lasers, which again measure the distance to other cars but not their velocities
- if you take the same situation from above, here is the Google car; it is localized on a map and here are other vehicles
  - using radars and lasers, the Google car estimates the distance and the velocity of all these vehicles, and it does so using a Kalman filter 
  - it feeds in range data from the laser and uses a state space like this one, made up of the relative distance in x and y and the relative velocity in x and y, together with state transition matrices of the type I've just shown you, to find out where these other cars are
    - this is exactly what you've just learned and programmed yourself

<img src="resources/kalman_filters_example_conclusion.png"/>
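
- the 4D state space described above can be sketched the same way: a minimal example, assuming a state [px, py, vx, vy] with a constant-velocity transition over a time step `dt`, and a laser-like sensor that measures only the position (px, py)
  - all names and numbers here (`track_2d`, `dt`, the noise values, the simulated track) are hypothetical, chosen for illustration; the helpers are plain-list matrix operations, and `inv2` is a closed-form 2 x 2 inverse
  - like the filter above, it adds no process noise

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def inv2(S):
    """Closed-form inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = S
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def track_2d(measurements, dt=1.0):
    # state [px, py, vx, vy] as a 4 x 1 column, with a large initial uncertainty
    x = [[0.0], [0.0], [0.0], [0.0]]
    P = [[1000.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    # constant velocity: position advances by velocity * dt each step
    F = [[1.0, 0.0, dt, 0.0],
         [0.0, 1.0, 0.0, dt],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]
    # the sensor sees px and py only
    H = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0]]
    R = [[1.0, 0.0], [0.0, 1.0]]
    I = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for px, py in measurements:
        # measurement update
        y = sub([[px], [py]], matmul(H, x))
        S = add(matmul(matmul(H, P), transpose(H)), R)
        K = matmul(matmul(P, transpose(H)), inv2(S))
        x = add(x, matmul(K, y))
        P = matmul(sub(I, matmul(K, H)), P)
        # prediction
        x = matmul(F, x)
        P = matmul(matmul(F, P), transpose(F))
    return x, P


# hypothetical noise-free track: starts at (0, 5), moves with velocity (2, -1)
track = [(2.0 * k, 5.0 - 1.0 * k) for k in range(10)]
x_est, P_est = track_2d(track)
print([round(v[0], 3) for v in x_est])
```

- with these noise-free positions, the final estimate should be close to position (20, -5) and velocity (2, -1), even though velocity is never measured directly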

- I didn't tell you how to extract the location of other cars from radar and laser
  - there's something called a correspondence problem
  - sometimes you don't know which measurement belongs to which car, and I won't go into it in much depth
- but you understand the gist of the solution now, and you've been able to program it


- in a situation like this, you can use range data, such as laser and radar data, to build a rational algorithm that takes momentary measurements of other cars and estimates not just where they are but also how fast they're moving
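
- the difference between the two sensors can be sketched in the 1D [distance, velocity] state from the exercise, assuming the radar can be modeled as measuring distance and relative velocity directly (H = identity) while the laser measures distance only (H = [1, 0])
  - the sensor matrices, the noise values, and the function names are hypothetical

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def inv_small(S):
    """Closed-form inverse of a 1 x 1 or 2 x 2 matrix."""
    if len(S) == 1:
        return [[1.0 / S[0][0]]]
    (a, b), (c, d) = S
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def update(x, P, z, H, R):
    """Kalman measurement update for whichever sensor produced z."""
    y = sub(z, matmul(H, x))
    S = add(matmul(matmul(H, P), transpose(H)), R)
    K = matmul(matmul(P, transpose(H)), inv_small(S))
    x = add(x, matmul(K, y))
    P = matmul(sub(identity(len(x)), matmul(K, H)), P)
    return x, P


# hypothetical sensor models for the state [distance, velocity]
H_LASER = [[1.0, 0.0]]               # laser: distance only
R_LASER = [[0.1]]
H_RADAR = [[1.0, 0.0], [0.0, 1.0]]   # radar: distance and noisy velocity
R_RADAR = [[1.0, 0.0], [0.0, 0.5]]

x0 = [[0.0], [0.0]]
P0 = [[1000.0, 0.0], [0.0, 1000.0]]

x_laser, _ = update(x0, P0, [[10.0]], H_LASER, R_LASER)
x_radar, _ = update(x0, P0, [[10.0], [3.0]], H_RADAR, R_RADAR)
print("after one laser reading:", x_laser)
print("after one radar reading:", x_radar)
```

- after a single laser reading the velocity estimate is still exactly 0, because neither the measurement nor the prior links velocity to distance; after a single radar reading it is already close to the measured 3. Calling `update` with whichever sensor reported next is one simple way to combine the two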

- this completes my unit on Kalman filters
- you learned about Gaussians, how to do measurement updates using multiplication, how to do prediction or state transitions using convolution, and you even implemented your first Kalman filter
- you implemented it in the context of vehicle tracking, and you used it to estimate an unobservable velocity from measurement data
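
- as a recap of the 1D case, here is a minimal sketch of the two Gaussian steps: the measurement update multiplies the prior and measurement Gaussians, and the prediction convolves with the motion Gaussian, which for Gaussians just adds the means and the variances; the function names are our own

```python
def gaussian_update(mean1, var1, mean2, var2):
    """Measurement update: product of the prior and measurement Gaussians."""
    new_mean = (var2 * mean1 + var1 * mean2) / (var1 + var2)
    new_var = 1.0 / (1.0 / var1 + 1.0 / var2)
    return new_mean, new_var


def gaussian_predict(mean1, var1, mean2, var2):
    """Prediction: convolution with the motion Gaussian adds means and variances."""
    return mean1 + mean2, var1 + var2


print(gaussian_update(10.0, 8.0, 13.0, 2.0))
print(gaussian_predict(10.0, 4.0, 12.0, 4.0))
```

- `gaussian_update(10.0, 8.0, 13.0, 2.0)` returns `(12.4, 1.6)`: the mean moves toward the more certain measurement, and the new variance is smaller than either input variance; `gaussian_predict(10.0, 4.0, 12.0, 4.0)` returns `(22.0, 8.0)`, so uncertainty grows with each motion step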