This notebook contains my notes from the Udacity course [Artificial Intelligence for Robotics](https://www.udacity.com/course/artificial-intelligence-for-robotics--cs373). I learned many fundamental concepts of autonomous driving from this course, and I am thankful to Udacity for that.

## Localization

Localization is the process by which a robot identifies its location in an environment. The robot does this using probability theory. Initially the robot does not know where it is, so every location is equally probable. In localization we update these probabilities with the robot's measurements, so that the probability of locations consistent with the measurements increases while the probability of the others decreases. The location with the highest probability is taken as the location of the robot, and this is how the robot localizes itself in the environment.

##### Steps in localization:

   1. Initialize all locations with a uniform probability distribution. This is called the prior belief.
   2. Sense the world and update the prior belief according to the measurement. The result is called the posterior belief. To update the prior, multiply each location's probability by a factor that reflects whether the measurement agrees with that location (a larger factor for a match, a smaller one for a mismatch). Then normalize the result so that it is a valid probability distribution.
   3. Take an action and shift the posterior belief according to the action taken. This operation is called convolution.
   4. Sense the world again, using the belief from the previous step as the new prior. After repeated sensing and moving the posterior becomes meaningful: the probability concentrates on one location, and that is how the robot localizes itself.

To understand this, let's assume a robot in a one-dimensional discrete world. Each location in this world has a color, and the robot can sense these colors. We will now apply the above algorithm to this robot and see whether it can localize itself:

In [9]:
# one dimensional world
world = ['red', 'green', 'green','red', 'red', 'green']
# step 1: initialize all locations with a uniform probability
p = [1/len(world) for _ in range(len(world))]
# weights applied to the prior when the measurement matches a cell (pHit) or does not (pMiss)
pHit = 0.6 
pMiss = 0.2

# valid actions the robot can take
action = [0,1] # 0 means move left and 1 means move right

# Step 2 is a function because we apply it repeatedly.
def sense(measurement, prior):
    '''Return the posterior distribution given a measurement and the prior.'''
    posterior = [0.0 for _ in range(len(prior))]
    # weight each prior probability by pHit (match) or pMiss (mismatch)
    for i in range(len(prior)):
        if measurement == world[i]:
            posterior[i] = prior[i] * pHit
        else:
            posterior[i] = prior[i] * pMiss
    # normalize so the posterior is a valid probability distribution
    total = sum(posterior)
    posterior = [q / total for q in posterior]
    return posterior


# Step 3 is a function too, because we also apply it repeatedly.
def convolution(action, posterior):
    '''Shift the posterior according to the action taken by the robot.
    The world is assumed to be cyclic: probability that falls off
    the right end reappears on the left, and vice versa.'''
    n = len(posterior)
    if action == 0:
        # robot moves to the left
        new_prior = [posterior[(i+1) % n] for i in range(n)]
    else:
        # robot moves to the right
        new_prior = [posterior[(i-1) % n] for i in range(n)]
    return new_prior

# applying the algorithm
# step 2
posterior = sense('green',p)
# step 3
new_prior = convolution(1, posterior)
# step 2 repeated
new_posterior = sense('green',new_prior)
print(new_posterior)

[0.13636363636363638, 0.13636363636363638, 0.4090909090909091, 0.13636363636363638, 0.04545454545454547, 0.13636363636363638]


As we can see, the highest probability is at location 2 (counting from 0). This is consistent with our world setup: if the robot senses green, moves one cell to the right, and senses green again, location 2 is the only cell that is green and has a green cell to its left. So by applying the above algorithm our robot can localize itself in this one-dimensional world.

### Inexact robot motion

So far we have covered the basic concepts of localization. In the previous example, however, we assumed that the robot's motion is exact: if the robot wants to move right by one cell, it does so correctly every time. This is not the case in real life, where robot motion is uncertain.

For example, suppose we command the robot to move right by one cell. It stays in the current cell with probability 0.1, it moves to the cell on the right with probability 0.8, and it overshoots the goal by one cell with probability 0.1.

With this uncertain motion, the new probability of a cell is the sum of the contributions from every cell the robot could have started from, each weighted by the probability of that particular motion outcome.
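To see what this summation does, here is a minimal sketch. The helper name `move_inexact` and the starting belief are assumptions made for illustration only. It starts from a belief that is certain the robot is in cell 1 of a cyclic five-cell world and applies one commanded move to the right with the probabilities given above.

```python
def move_inexact(p, step, p_exact=0.8, p_overshoot=0.1, p_undershoot=0.1):
    """Spread a belief over a cyclic world for one uncertain move of `step` cells."""
    n = len(p)
    q = []
    for i in range(n):
        # cell i can be reached from three candidate cells
        s = p_exact * p[(i - step) % n]             # robot moved exactly `step` cells
        s += p_overshoot * p[(i - step - 1) % n]    # robot moved one cell too far
        s += p_undershoot * p[(i - step + 1) % n]   # robot moved one cell too few
        q.append(s)
    return q

belief = [0.0, 1.0, 0.0, 0.0, 0.0]   # robot is certainly in cell 1 (illustrative)
moved = move_inexact(belief, 1)
print(moved)
```

The single spike at cell 1 is spread over cells 1, 2 and 3 with weights 0.1, 0.8 and 0.1, so every move without a measurement makes the belief less certain.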

In [17]:
world=['green', 'red', 'red', 'green', 'green']
p = [1/len(world) for _ in range(len(world))]
measurements = ['red', 'green'] # a sequence of measurements
motions = [1, 1] # a sequence of motions (one after each measurement)
pHit = 0.6
pMiss = 0.2
# probability for inexact robot motion
pExact = 0.8 # probability for the correct move
pOvershoot = 0.1 # probability for overshooting the goal 
pUndershoot = 0.1 # probability for undershooting the goal

def sense(measurement, prior):
    '''Return the posterior distribution given a measurement and the prior.'''
    posterior = [0.0 for _ in range(len(prior))]
    # weight each prior probability by pHit (match) or pMiss (mismatch)
    for i in range(len(prior)):
        if measurement == world[i]:
            posterior[i] = prior[i] * pHit
        else:
            posterior[i] = prior[i] * pMiss
    # normalize so the posterior is a valid probability distribution
    total = sum(posterior)
    posterior = [q / total for q in posterior]
    return posterior

def convolution(p, U):
    '''Return the new prior after a commanded move of U cells with inexact motion.
    U=1 means move right by one cell, U=-1 means move left
    by one cell. The world is assumed to be cyclic.'''
    q = []
    for i in range(len(p)):
        s = pExact * p[(i-U) % len(p)] # calculating probability of correct motion
        s = s + pOvershoot * p[(i-U-1) % len(p)] # probability of overshoot motion
        s = s + pUndershoot * p[(i-U+1) % len(p)] # probability of undershoot motion
        q.append(s)
    return q

for i,s in enumerate(measurements):
    # step 2
    p = sense(s,p)
    # step 3
    p = convolution(p, motions[i])
print(p)

[0.21157894736842103, 0.1515789473684211, 0.08105263157894739, 0.16842105263157897, 0.3873684210526316]


From the above output we can see that the belief is updated in the same way as before, but this time the update accounts for inexact robot motion. This version of the convolution (or move) function is therefore closer to a real-life scenario.

### Understanding sense and move through probability theory

If you look closely at the sense function, it leads us to Bayes' rule. Let X be the robot's location and Z the measurement. The prior belief is p(X), and the sense function computes the belief after observing Z, written p(X|Z). According to Bayes' theorem,

    p(X|Z) = (p(Z|X) * p(X)) / p(Z)
    
Here p(Z|X) is the probability of observing the measurement Z from location X; in our case it is given by pHit and pMiss. p(X) is the prior belief. p(Z) does not depend on X, so it acts only as a normalization term: since the posterior must sum to 1, we can compute the unnormalized products first and then divide by their sum. That is the beauty of Bayes' rule. So,

    p(Z) = sum of p(Z|X) * p(X) over all cells X
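As a small worked example of these two formulas, the sketch below applies Bayes' rule once to the five-cell world used earlier, starting from a uniform prior and observing red. The variable names (`p_hit`, `unnormalized`, `p_z`) are chosen here for illustration.

```python
world = ['green', 'red', 'red', 'green', 'green']
p_hit, p_miss = 0.6, 0.2               # p(Z|X) for a matching / non-matching cell
prior = [1 / len(world)] * len(world)  # p(X): uniform prior
z = 'red'                              # the measurement Z

# numerator of Bayes' rule for every cell: p(Z|X) * p(X)
unnormalized = [(p_hit if color == z else p_miss) * px
                for color, px in zip(world, prior)]
# p(Z): total probability of the measurement, summed over all cells
p_z = sum(unnormalized)
posterior = [u / p_z for u in unnormalized]

print(p_z)
print(posterior)
```

Here p(Z) comes out at roughly 0.36, and the posterior is about 1/3 for each red cell and about 1/9 for each green cell.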
    
Now the move, or convolution, function is related to the theorem of total probability. To compute the probability of one cell after a move, we look at all the grid cells the robot could have come from one time step earlier, take the prior probability of each of those cells, and multiply it by the probability that the motion command carries the robot from that cell to the current cell. In probability terms this is written as follows:

    p(A) = sum over all cells B of ( p(A|B) * p(B) )
    
Here A is the current cell and B ranges over all possible previous cells; p(B) is the prior at the previous time step and p(A|B) is the probability of transitioning from cell B to cell A. This is known as the theorem of total probability.
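The sum can also be written out literally. The sketch below is only an illustration: the `transition` helper is a hypothetical name, and it encodes the same motion model as above (0.8 exact, 0.1 overshoot, 0.1 undershoot) for a commanded move of one cell to the right in a cyclic world.

```python
def transition(a, b, n, p_exact=0.8, p_overshoot=0.1, p_undershoot=0.1):
    """p(A=a | B=b): chance of ending in cell a when commanded one cell right from cell b."""
    offset = (a - b) % n   # how many cells the robot actually travelled (cyclic world)
    if offset == 0:
        return p_undershoot
    if offset == 1:
        return p_exact
    if offset == 2:
        return p_overshoot
    return 0.0

prior = [1/9, 1/3, 1/3, 1/9, 1/9]   # p(B): belief before the move (illustrative)
n = len(prior)
# p(A) = sum over B of p(A|B) * p(B), evaluated for every cell A
new_belief = [sum(transition(a, b, n) * prior[b] for b in range(n))
              for a in range(n)]
print(new_belief)
```

Because the transition probabilities out of every cell B sum to 1, the new belief is already a valid distribution and needs no normalization, unlike the sense step.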

The above procedure is called histogram-based localization, or Monte Carlo robot localization. Next we will learn about Kalman filters, which are used for tracking other cars on the road.

## Kalman Filters

Using the localization techniques discussed above, our robot can find itself in an environment. For safe driving, however, we must also know the location and velocity of the other cars in the environment. Kalman filters help us estimate both.

Although Kalman filters are generally used to track other cars, the approach is similar to Monte Carlo localization. The difference is that a Kalman filter estimates a continuous state (in simple words, the world is treated as continuous instead of a grid of discrete cells), whereas Monte Carlo localization estimates a discrete state. As a result, a Kalman filter gives a unimodal distribution, while Monte Carlo localization can give a multimodal one.

Since the state of the robot is now continuous, we can represent the prior belief as a Gaussian distribution. A Gaussian is parameterized by two values, its mean and its variance. Because it represents a probability distribution, the area under the curve must equal 1. The mean marks the most probable location. The larger the variance, the less confident the belief is about that location. Each measurement reduces the variance, so after several measurements we expect a narrow Gaussian.
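For reference, the one-dimensional Gaussian density can be evaluated with a few lines of Python. This is a minimal sketch using the standard formula; the function name and the sample values are chosen here for illustration.

```python
from math import exp, pi, sqrt

def gaussian(mu, sigma2, x):
    """Density at x of a 1-D Gaussian with mean mu and variance sigma2."""
    return exp(-0.5 * (x - mu) ** 2 / sigma2) / sqrt(2.0 * pi * sigma2)

# same mean, different variances: the narrower Gaussian has the taller peak
print(gaussian(10.0, 4.0, 10.0))
print(gaussian(10.0, 1.0, 10.0))
```

The density peaks at the mean, and the peak is higher for the smaller variance, which is what a more confident belief looks like.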

The basic principle is the same as in Monte Carlo localization: we alternate between a measurement and a move. Only the mathematical formulas are different. In the sense function we update the mean and variance of our belief according to the following formulas:

    new_mean = (mean * var' + mean' * var) / (var + var')
    new_var = 1. / (1/var + 1/var')

Here mean and var are the parameters of the current belief, and mean' and var' are the parameters of the measurement distribution. The interesting thing to notice is that each mean is weighted by the other distribution's variance, so the new mean is pulled toward whichever of the two is more certain. The new variance is always smaller than both var and var'.
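A quick numeric check makes the weighting visible. In this sketch the function name `update` and the numbers are assumptions for illustration: the prior has mean 10 and variance 8, and the measurement has mean 13 and variance 2.

```python
def update(mean1, var1, mean2, var2):
    """Combine a prior belief (mean1, var1) with a measurement (mean2, var2)."""
    new_mean = (var2 * mean1 + var1 * mean2) / (var1 + var2)
    new_var = 1.0 / (1.0 / var1 + 1.0 / var2)
    return new_mean, new_var

new_mean, new_var = update(10.0, 8.0, 13.0, 2.0)
print(new_mean, new_var)
```

The result is a mean of about 12.4 and a variance of about 1.6: the mean lands closer to the measurement, which is the more certain of the two, and the variance is smaller than either input.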

The move update is simpler. To get the location after a move, we add the distance moved to the previous mean. Motion uncertainty enters through the variance of the motion, and the new variance is the sum of the two variances.

    new_mean = mean + mean'
    new_var = var + var'
    
Here mean and var describe the belief before the move, and mean' and var' describe the motion. The move function updates the mean and variance according to these formulas.

In [3]:
def sense(mean1, var1, mean2, var2):
    new_mean = float(var2 * mean1 + var1 * mean2) / (var1 + var2)
    new_var = 1./(1./var1 + 1./var2)
    return [new_mean, new_var]

def move(mean1, var1, mean2, var2):
    new_mean = mean1 + mean2
    new_var = var1 + var2
    return [new_mean, new_var]

measurements = [5., 6., 7., 9., 10.]
motion = [1., 1., 2., 1., 1.]
measurement_sig = 4. # measurement uncertainty
motion_sig = 2. # motion uncertainty
mu = 0. # initial belief
sig = 10000. # initial belief uncertainty

for i in range(len(measurements)):
    mu,sig = sense(measurements[i],measurement_sig,mu,sig)
    mu,sig = move(mu,sig,motion[i],motion_sig)
print([mu, sig])

[10.999906177177365, 4.005861580844194]


So now we know how to perform the measurement and move updates in a continuous environment, where the state of the robot is represented as a Gaussian distribution. The example above is one-dimensional, but in reality the environment is multidimensional and things become more involved. To understand how this works we need the multidimensional Gaussian, often called the multivariate Gaussian.

In a multivariate Gaussian the mean is a vector and the variance is replaced by a DxD matrix (where D is the number of dimensions), called the covariance matrix. A two-dimensional Gaussian can be drawn as a contour plot like the following.

<img src="images/contour_kalman_filter.png" width=400/>

Here the center of the contours represents the mean, and the shape and extent of the contours represent the covariance matrix. The more certain we are about one property of the robot, the narrower the contour is along that dimension. If the Gaussian is tilted diagonally, the properties it represents are correlated: knowing one property tells us something about the other. This is the true beauty of the Kalman filter. If one dimension represents location and the other velocity, then a measurement of the location lets us infer the velocity of the robot.
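The effect of the tilt can be made concrete with the standard conditioning formula for a two-dimensional Gaussian. The sketch below is an illustration only; the means, variances and covariance are made-up numbers, not values from the course.

```python
# assumed 2-D Gaussian over (location, velocity)
mean_loc, mean_vel = 0.0, 0.0
var_loc, var_vel = 2.0, 2.0
cov_loc_vel = 1.5           # off-diagonal covariance: the "tilt" of the contour

observed_loc = 1.0          # suppose we learn the location exactly

# conditional distribution of velocity given the location
cond_mean_vel = mean_vel + cov_loc_vel / var_loc * (observed_loc - mean_loc)
cond_var_vel = var_vel - cov_loc_vel ** 2 / var_loc
print(cond_mean_vel, cond_var_vel)
```

With these numbers the expected velocity shifts from 0 to 0.75 and its variance drops from 2 to 0.875. If `cov_loc_vel` were 0 (an untilted Gaussian), the location would tell us nothing about the velocity.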

Other filters can in principle do this as well, but the Kalman filter does it very efficiently, so it is the usual choice for this kind of problem. To design a Kalman filter we need two things:

    1. for the state we need a state transition function, F
    2. for the measurement we need a measurement function, H
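To make F and H concrete, here is a minimal sketch for a state made of location and velocity. The matrices are the ones used in the example later in this notebook; the `matvec` helper and the sample state are assumptions for illustration.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

F = [[1.0, 1.0],   # new location = location + velocity (time step of 1)
     [0.0, 1.0]]   # new velocity = velocity (constant-velocity assumption)
H = [[1.0, 0.0]]   # the sensor reports the location only

state = [3.0, 1.0]               # location 3, velocity 1 (illustrative)
predicted = matvec(F, state)     # where the model expects the robot next
observed = matvec(H, predicted)  # what the sensor would report for that state
print(predicted, observed)
```

F moves the state forward by one time step, and H projects the state onto what can actually be measured. The velocity never appears in the measurement, which is why it has to be inferred.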
    
The move and sense formulas for a higher-dimensional Kalman filter are given below.

For the move function (prediction):

    x' = F * x + u
    p' = F * p * F(transpose)
    
For the sense function (measurement update):

    y = z - H * x
    S = H * p * H(transpose) + R
    K = p * H(transpose) * S(inverse)
    x' = x + (K*y)
    p' = (I-K*H)*p
    
Here, 

    x = estimate
    p = uncertainty covariance
    F = state transition matrix
    u = motion vector
    z = measurement
    H = measurement function
    R = measurement noise
    I = identity matrix

In [12]:
import numpy as np

def kalman_filter(x, P):
    for n in range(len(measurements)):
        # measurement update
        y = np.array([[measurements[n]]]).transpose() - H * x
        S = H * P * H.transpose() + R
        K = P * H.transpose() * S.I
        x = x + (K*y)
        P = (I - K*H) * P
        # prediction
        x = F * x + u
        P = F * P * F.transpose()
        print('Observe ',n)
        print('x = ', x)
        print('p = ', P)
    return x,P

measurements = [1, 2, 3] # location measurements for motion along one dimension

x = np.matrix([[0.], [0.]]) # initial state (location and velocity)
P = np.matrix([[1000., 0.], [0., 1000.]]) # initial uncertainty with no correlation (off-diagonals are 0.0)
u = np.matrix([[0.], [0.]]) # external motion
F = np.matrix([[1., 1.], [0, 1.]]) # next state function
H = np.matrix([[1., 0.]]) # measurement function
R = np.matrix([[1.]]) # measurement uncertainty
I = np.matrix([[1., 0.], [0., 1.]]) # identity matrix

x, P = kalman_filter(x, P)

Observe  0
x =  [[0.999001]
 [0.      ]]
p =  [[1000.999001 1000.      ]
 [1000.       1000.      ]]
Observe  1
x =  [[2.99800299]
 [0.999002  ]]
p =  [[4.99002494 2.99301795]
 [2.99301795 1.99501297]]
Observe  2
x =  [[3.99966644]
 [0.99999983]]
p =  [[2.33189042 0.99916761]
 [0.99916761 0.49950058]]


From the above output we can see that after the first measurement we observed location 1, which is reflected in the x vector as 0.999001. The measurement says nothing direct about the velocity, so it is still 0, the initialized value. The uncertainty matrix now shows a strong correlation between location and velocity (1000 in the off-diagonal elements).

When we then observe location 2, the filter predicts the next location to be 3 (2.998 in the output) and now has a very good estimate of the velocity, about 1.0 (0.999 in the output). The filter can infer the velocity because two successive location measurements, combined through the correlation in the covariance matrix, constrain it. The covariance matrix is updated as well, with much smaller entries.

After the third observation, at location 3, the prediction is again correct for both location and velocity (about 4 and 1). The entries of the covariance matrix are now the smallest so far, that is, the filter is at its most certain. So the more observations we make, the more certain the Kalman filter becomes about its prediction.

That is how a Kalman filter works, and it is very useful for tracking a robot or a car. It can also infer a hidden property of a robot that depends on an observable one, such as the velocity, which depends on the location. I find this really cool. To understand the prediction and measurement update formulas, it helps to build intuition from the contour plot.