In [1]:
%load_ext nb_black


- localization is what allows an autonomous car to know precisely where it is
- without localization, it would be impossible for a self-driving car to drive safely

# Localization Intuition

- conceptually, localization is pretty straightforward
- a robot takes information about its environment and compares that to information it already knows about the real world
- humans do something similar
  - imagine you were suddenly kidnapped, blindfolded, and stuffed into a car that drove for hours, so you had no idea where in the world you were--then the blindfold was removed and you saw this (the Eiffel Tower)
  - now, what would you say if you were asked where you are?
    - if you recognized this as the Eiffel Tower, you would probably say something like Paris or France
    - this may not seem very impressive, but it's actually remarkable
    - before the blindfold was removed, you had zero understanding of where you were in the world
    - you could have made a guess, but you would have had no idea if you were right
    - but after being shown a tiny amount of data, a single image, you reduced that uncertainty to a radius of a few kilometers


- this is the main intuition behind localization
- a robot gathers information about its current environment and compares that to a known map to understand where it is in the world

# Localizing a Self Driving Car

- assume there is a car that is totally lost, which means you, as a driver or as a car, have no clue where you are
- now assume that you have a global map of the environment
- generally speaking, localization answers the question: where is our car in a given map, with high accuracy?
  - a high accuracy means between 3 and 10 centimeters


- traditionally, we use global navigation satellite systems such as GPS to find the car with respect to the map
  - but GPS is not precise enough
  - most of the time, GPS has an accuracy of about the width of a lane, roughly one to three meters
  - but sometimes the error can be as large as 10 to 50 meters
  - clearly this is not reliable enough for a self-driving car, so you can't trust GPS alone and you have to find another technique to localize yourself inside a given map


- it is common practice to use the onboard sensor data, along with our global map, to solve the localization issue
  - with the onboard sensors it is possible to measure distances to static obstacles, like trees, poles, or walls
  - we measure these distances, and the bearing of these static objects in the local coordinate system of our car


- when you're lucky, the same obstacles that were observed by the onboard sensors are also part of the map
- and, of course, the map has its own global coordinate system
- to estimate where the car is in the map, you have to match the observations with the map information
  - when you do it correctly, this results in a transformation between both coordinate systems--the local car coordinate system and the global coordinate system of the map
  - this transformation should be as accurate as possible-- let's say within a range of 10 centimeters or less
  - if you are able to estimate this transformation, you solve the localization issue

 <img src="resources/transform_onboard_global.png"/>

- let's summarize
  - first, localization answers the question of where the car is in a given map within an accuracy of 10 centimeters or less
  - second, onboard sensors are used to estimate the transformation between local measurements and a given map
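
The transformation between the car's local frame and the map's global frame can be sketched in a few lines. This is only an illustration under stated assumptions: a flat 2D world, a car pose given as `(x, y, theta)` in map coordinates, and a range/bearing measurement to one static landmark. The function names `range_bearing_to_local` and `car_to_map` are hypothetical and do not come from the lesson.

```python
import math


def range_bearing_to_local(distance, bearing):
    """Convert a range/bearing measurement into x/y in the car's local frame."""
    return distance * math.cos(bearing), distance * math.sin(bearing)


def car_to_map(pose, point_local):
    """Rotate and translate a local point into the global map frame.

    pose is (x, y, theta): the assumed car position and heading (radians) in the map.
    """
    x, y, theta = pose
    x_local, y_local = point_local
    x_map = x + math.cos(theta) * x_local - math.sin(theta) * y_local
    y_map = y + math.sin(theta) * x_local + math.cos(theta) * y_local
    return x_map, y_map


# Assumed example: car at (10, 5) heading along the map's +y axis,
# landmark measured 2 m straight ahead.
pose = (10.0, 5.0, math.pi / 2)
landmark_local = range_bearing_to_local(2.0, 0.0)
landmark_map = car_to_map(pose, landmark_local)
print(landmark_map)  # approximately (10.0, 7.0)
```

Localization is the reverse problem: the landmark's map position is known, and the pose that makes the transformed observation line up with it is what has to be estimated.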

# Localization

**Note:** Much of the content in this lesson assumes you have a good map first. Without it, the techniques here either won't work or won't work very well. There is also another version of localization called SLAM, or Simultaneous Localization and Mapping, that does not need a good map prior to beginning.

- the very first problem I'm trying to solve is called localization
- it involves a robot that's lost in space--it could be a car, it could be a mobile robot
  - here is the environment, and the poor robot has no clue where it is
- similarly, we might have a car driving on a highway, and this car would like to know where it is
  - is it inside the lane or is it crossing lane markers?


- the traditional way to solve this problem is by using satellites
  - these satellites emit signals that the car can perceive
    - that's known as GPS, short for "Global Positioning System"; it's what you have in your dashboard if your car has a navigation unit that shows you maps and where you are
    - unfortunately, the problem with GPS is that it's really not very accurate
  - it's really common for a car to believe it is at one position while actually being off by 2 meters, or even up to 10 meters
    - if you try to stay in the lane with 10 meters of error, you're far off, and if you're driving, you crash


- for our self-driving cars to be able to stay in lanes using localization, we need something like $2$ to $10$ centimeters of error
  - then we can drive with GPS in lanes
- the question is, how can we know where we are with $10$ cm accuracy?
  - that's the localization question


- in the Google self-driving car, localization plays a key role
- we record images of the road surface and then use the techniques I'm just about to teach you to find out exactly where the robot is
  - it does so within a few centimeters of accuracy, and that makes it possible to stay inside the lane even if the lane markers are missing

 <img src="resources/google_sdc_localization.png"/>

- localization has a lot of math, but before I dive into mathematical detail, I want to give you an intuition for the basic principles
- I want to tell you the story of how we will localize this, and then we can go through the math together so you can understand it
- I also want to let you program your own localizer so you can program a self-driving car

# Uniform Probability

- let's move into our first programming exercise, and let's program together the very first version of robot localization
- here's a bit of program code--an empty list, and what I'd like you to program is a world with 5 different cells or places where each cell has the same probability that the robot might be in that cell
  - so probabilities add up to 1


- here's a simple quiz: for the cells $x_1$ all the way to $x_5$, what is the probability of any of those $x$'s?
  - it's $0.2$ because $1/5 = 0.2$

# Uniform Distribution

- now in our Python interface, I'd like to take this code over here, which assigns to `p` an empty list and modify it into code where `p` becomes a uniform distribution over $5$ grid cells as expressed in a vector of $5$ probabilities


- here's an easy solution; you just initialize the vector with five $0.2$s
- or use a loop, as below:

In [2]:
p = []
n = 5
for i in range(n):
    p.append(1.0 / n)
print(p)

[0.2, 0.2, 0.2, 0.2, 0.2]



# Probability After Sense

- let's look at the measurement of this robot in its world with $5$ different grid cells--$x_1$ through $x_5$
- let's assume two of those cells are colored red ($x_2$ and $x_3$) whereas the other three are green
- as before, we assign uniform probability to each cell of $0.2$, and our robot is now allowed to sense
- what it sees is a red color
- how will this affect my belief over different places?
  - obviously, the ones for $x_2$ and $x_3$ should go up, and the ones for $x_1$, $x_4$, and $x_5$ should go down


- we're going to incorporate this measurement into our belief with a very simple rule--a product
  - any cell where the color is correct--any of the red cells--we multiply by a relatively large number, say $0.6$
    - that feels small, but as we will see later, it is actually a large number
  - whereas all the green cells will be multiplied by $0.2$
- if we look at the ratio of those, it is 3 times as likely to be in a red cell as in a green cell, because $0.6$ is $3$ times larger than $0.2$
- for the red cells we get $0.12$, whereas for the green cells we get $0.04$, which are the products $0.2 \times 0.6$ and $0.2 \times 0.2$
  - but notice that our probabilities don't add up to $1$--they add up to $2 \times 0.12 + 3 \times 0.04 = 0.36$
  - we'll have to fix that; let's learn about renormalization

# Normalize Distribution

- to turn this back into a probability distribution, we will now divide each of these numbers by $0.36$
- put differently, we normalize


- so $0.12$ divided by $0.36$ is the same as $12$ divided by $36$, which is $1/3$ or about $0.333$
- and $0.04$ divided by $0.36$ is the same as $4$ divided by $36$, that is $1/9$
- if you add up these numbers, $1/9, 1/3, 1/3, 1/9, 1/9$, they give exactly $1$


- so this is a probability distribution, which is often written in the following way: $p(X_i \mid Z)$
  - the probability of each cell $i$, where $i$ ranges from $1$ to $5$, after we've seen our measurement $Z$
- a probabilist would also call it the posterior distribution of place $X_i$ given measurement $Z$
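
Written out as a formula, the multiply-then-normalize procedure has the shape of Bayes' rule. The sketch below assumes that $p(Z \mid X_i)$ stands for the measurement factor of cell $i$ ($0.6$ where the color matches, $0.2$ where it does not) and $p(X_i)$ for the prior; this notation is an assumption added here, not taken from the lesson.

$$
p(X_i \mid Z) = \frac{p(Z \mid X_i)\, p(X_i)}{\sum_{j=1}^{5} p(Z \mid X_j)\, p(X_j)}
$$

For a red cell in the example this gives $\frac{0.6 \times 0.2}{0.36} = \frac{1}{3}$, and for a green cell $\frac{0.2 \times 0.2}{0.36} = \frac{1}{9}$.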

# pHit and pMiss

- here's our distribution again, along with our factors for getting the color right ($0.6$) or wrong ($0.2$); let's first start with a non-normalized version
- write a piece of code that outputs `p` after multiplying with `pHit` and `pMiss`
- also, get the sum of all the entries of `p`


- one way to do this is to go explicitly through all 5 cases, from index 0 to 4, and manually multiply in the hit or miss factor
  - this is not particularly elegant, but it does the job
- we sum the elements with the built-in `sum` function

In [3]:
# Write code that outputs p after multiplying each entry by pHit or pMiss at the appropriate places.
# Remember that the red cells (indices 1 and 2) are hits and the other green cells are misses.

p = [0.2, 0.2, 0.2, 0.2, 0.2]
pHit = 0.6
pMiss = 0.2

# Enter code here
p[0] *= pMiss
p[1] *= pHit
p[2] *= pHit
p[3] *= pMiss
p[4] *= pMiss

p_sum = sum(p)

print(p)
print(p_sum)

[0.04000000000000001, 0.12, 0.12, 0.04000000000000001, 0.04000000000000001]
0.3600000000000001



# Sense Function

- I want to make this a little bit more beautiful
- I will introduce a variable called `world`; for each of the 5 grid cells, `world` specifies the color of the cell--green, red, red, green, green
- further, I define the measurement `Z` to be red

 
- can you define a function called `sense`, the measurement update, which takes as input the initial distribution `p` and the measurement `Z` (and uses the other global variables) and outputs a non-normalized distribution `q`, where `q` is the product of the input probability (which will be $0.2$ and so on) and the corresponding `pHit` or `pMiss`, depending on whether the color of each cell agrees or disagrees with the measurement?
- when I call `sense(p, Z)`, I expect to get the same vector as before, but now as the output of a function
  - the reason I'd like to have a function here is because later on as we build our localizer we will apply this to every single measurement over and over again
  - this function should really respond to any arbitrary `p` and arbitrary `Z`, either red or green, and give me the non-normalized `q`, which gives me the vector $0.04$ or $0.12$ and so on

In [4]:
# Modify the code below so that the function sense, which takes p and Z as inputs, will output the NON-normalized
# probability distribution, q, after multiplying the entries in p by pHit or pMiss according to the color in the
# corresponding cell in world.

p = [0.2, 0.2, 0.2, 0.2, 0.2]
world = ["green", "red", "red", "green", "green"]
Z = "red"
pHit = 0.6
pMiss = 0.2


def sense(p, Z):
    # ADD YOUR CODE HERE
    q = []
    for i in range(len(p)):
        if world[i] == Z:
            q.append(p[i] * pHit)
        else:
            q.append(p[i] * pMiss)
    return q


print(sense(p, Z))

[0.04000000000000001, 0.12, 0.12, 0.04000000000000001, 0.04000000000000001]



# Normalized Sense Function

- let's take that same piece of code and modify it to give me a valid probability distribution
  - that is, it normalizes the output of the function `sense` so that it adds up to 1


- with this, we implement the absolute key function of localization, which is called the **measurement update**

In [5]:
# Modify your code so that it normalizes the output for the function sense.
# This means that the entries in q should sum to one.

p = [0.2, 0.2, 0.2, 0.2, 0.2]
world = ["green", "red", "red", "green", "green"]
Z = "red"
pHit = 0.6
pMiss = 0.2


def sense(p, Z):
    # ADD YOUR CODE HERE
    q = []
    for i in range(len(p)):
        if world[i] == Z:
            q.append(p[i] * pHit)
        else:
            q.append(p[i] * pMiss)
    sum_q = sum(q)
    q = [i / sum_q for i in q]
    return q


print(sense(p, Z))

[0.1111111111111111, 0.3333333333333332, 0.3333333333333332, 0.1111111111111111, 0.1111111111111111]



# Test Sense Function

- let's just go back to our example and see what an amazing thing you've just programmed
- we had a uniform distribution over places--each place had a probability of $0.2$
- then you wrote a piece of code that used the measurement to turn this prior into a posterior, in which the probability of each red cell is 3 times that of each green cell

<img src="resources/sense_function_prior_posterior.png"/>

- you've done exactly what I gave you intuitively in the beginning as the secret of localization
  - you manipulated a probability distribution over places into a new one by incorporating the measurement


- in fact, let's go back to the code and test whether we get a good result when we replace the measurement `red` with `green`
  - please set your measurement variable to `green` and rerun your code to see if you get the correct result
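
As a quick check of that experiment, here is a minimal, self-contained sketch that reuses the lesson's `world`, `pHit`, and `pMiss` values with a compact variant of the normalized `sense` function. The list-comprehension form and the variable name `posterior` are choices made for this sketch only.

```python
world = ["green", "red", "red", "green", "green"]
pHit = 0.6
pMiss = 0.2


def sense(p, Z):
    # Weight each cell by pHit if its color matches the measurement, else pMiss.
    q = [p[i] * (pHit if world[i] == Z else pMiss) for i in range(len(p))]
    # Normalize so the result sums to 1.
    total = sum(q)
    return [value / total for value in q]


p = [0.2, 0.2, 0.2, 0.2, 0.2]
posterior = sense(p, "green")
print(posterior)
```

With a green measurement the roles flip: the three green cells each end up at $3/11 \approx 0.273$ and the two red cells at $1/11 \approx 0.091$, so green cells are again three times as likely as red ones.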

# Multiple Measurements

- I'd like you to modify this code a little bit more in a way that we have multiple measurements
- instead of `Z`, we're going to make a measurement vector called `measurements`
  - we're going to assume that we're going to first sense red and then green


- can you modify the code so that it updates the probability twice and gives me the posterior after both of these measurements are incorporated?
  - in fact, can you modify it in a way that any sequence of measurements of any length can be processed?
  - do not modify the sense function--add code so that `p` is the correct probability after making the two measurements
    - make sure your code works for measurement lists of arbitrary length


- the modification is simple; we will call the procedure `sense` multiple times, in fact, as often as we have measurements
  - we grab the $k$th measurement and apply it to the current belief, then feed the updated belief back in for the next measurement
  - in this case, we run it twice
- for this specific example, we get back the uniform distribution--these are all $0.2$s approximately
  - the reason is that each cell was multiplied once by $0.6$ and once by $0.2$
  - these effects were in total the same for each cell

In [6]:
# Modify the code so that it updates the probability twice
# and gives the posterior distribution after both measurements are incorporated.
# Make sure that your code allows for any sequence of measurement of any length.

p = [0.2, 0.2, 0.2, 0.2, 0.2]
world = ["green", "red", "red", "green", "green"]
measurements = ["red", "green"]
pHit = 0.6
pMiss = 0.2


def sense(p, Z):
    # ADD YOUR CODE HERE
    q = []
    for i in range(len(p)):
        if world[i] == Z:
            q.append(p[i] * pHit)
        else:
            q.append(p[i] * pMiss)
    sum_q = sum(q)
    q = [i / sum_q for i in q]
    return q


# ADD YOUR CODE HERE
for i in range(len(measurements)):
    p = sense(p, measurements[i])

print(p)

[0.20000000000000004, 0.19999999999999996, 0.19999999999999996, 0.20000000000000004, 0.20000000000000004]



# Exact Motion

- before we're done with localization, I'd like to talk about robot motion
- suppose we have a distribution over those cells--such as this one: $1/9, 1/3, 1/3, 1/9, 1/9$--and even though we don't know where the robot is, the robot moves, and it moves to the right
- in fact, the way we're going to program is we will assume the world is cyclic, so if it drops off the right-most cell it finds itself in the left-most cell
- suppose we know for a fact the robot moved exactly 1 grid cell to the right, including the cyclic motion
  - can you tell me for all these 5 values, what the posterior probability is after that motion?


- the answer is all of these are shifted to the right--the $1/9$ in the left-most cell goes over here, the $1/3$ over here, and finally the right-most $1/9$ finds itself on the left side
- if the robot's motion is perfect (which means it moves exactly as far as it thinks it does), then all of the probabilities move one place to the right

<img src="resources/exact_robot_motion.png"/>

- in the case of exact motion, we have a perfect robot
- we just shift the probabilities by the actual robot motion
- now, that's a degenerate case, but it's a good one to program first so let's program this one

# Move Function

- I define a function `move` with an input distribution `p` and a motion number `U`, where `U` is the number of grid cells that the robot moves to the right (positive) or to the left (negative)
- I want you to program a function that returns the new distribution `q` after the move
  - if `U` equals $0$, `q` is the same as `p`
  - if `U` equals $1$, all the values are cyclically shifted to the right by $1$
  - if `U` equals $3$, they are cyclically shifted to the right by $3$
  - if `U` equals $-1$, they're cyclically shifted to the left
- please call the function with argument `p` and a shift to the right by $1$
- I've commented out my measurement part because for now I don't want to do measurement updates
- in addition to this, I will use a very simple `p`, that has a $1$ at the second position and zeros elsewhere
  - otherwise, if we were to use the uniform `p`, we couldn't tell from the output whether the motion is programmed correctly

In [7]:
# Program a function that returns a new distribution q, shifted to the right by U units.
# If U=0, q should be the same as p.

p = [0, 1, 0, 0, 0]
world = ["green", "red", "red", "green", "green"]
measurements = ["red", "green"]
pHit = 0.6
pMiss = 0.2


def sense(p, Z):
    q = []
    for i in range(len(p)):
        hit = Z == world[i]
        q.append(p[i] * (hit * pHit + (1 - hit) * pMiss))
    s = sum(q)
    for i in range(len(q)):
        q[i] = q[i] / s
    return q


def move(p, U):
    # ADD CODE HERE
    q = []
    for i in range(len(p)):
        q.append(p[(i - U) % len(p)])
    return q


print(move(p, 1))

[0, 0, 1, 0, 0]



- we start with the empty list
- we go through all the elements in `p`
- we construct `q` element by element by accessing the corresponding entry of `p`, shifted by `U`; if this shift falls outside the range of `p`, we apply the modulo operator with the number of states as its argument
  - in this case, it'll be $5$
- the reason why there is a minus sign is tricky
  - to shift the distribution to the right, `U = 1`, we need to find in `p` the element $1$ place to the left
  - rather than shifting `p` to the right directly, what I've done is I've constructed `q` by searching for where the robot might have come from
    - that's of course, in hindsight, from the left


- think about this, as it's a little bit nontrivial, but it's going to be important as we go forward and define probabilistic convolution and generalize this to the noisy case


- alternate solution:
```python
U = U % len(p)
q = p[-U:] + p[:-U]
```
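
To see that the slicing version behaves like the loop version, the two can be compared side by side. The sketch below wraps both in functions; the names `move_loop` and `move_slice` are made up for this comparison and are not part of the lesson.

```python
def move_loop(p, U):
    # Build q by looking up where each cell's probability came from.
    q = []
    for i in range(len(p)):
        q.append(p[(i - U) % len(p)])
    return q


def move_slice(p, U):
    # Reduce U to the range 0..len(p)-1, then rotate the list with slices.
    U = U % len(p)
    return p[-U:] + p[:-U]


p = [0, 1, 0, 0, 0]
for U in [-6, -1, 0, 1, 3, 5, 7]:
    assert move_loop(p, U) == move_slice(p, U)

print(move_slice(p, 1))   # [0, 0, 1, 0, 0]
print(move_slice(p, -1))  # [1, 0, 0, 0, 0]
```

The modulo step matters for the slice version: it maps negative shifts and shifts larger than the list length onto an equivalent shift between $0$ and $4$, and for a shift of $0$ the slices `p[0:]` and `p[:0]` return the list unchanged.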

# Inexact Motion

- let's talk about inaccurate robot motion
- we are again given $5$ grid cells
- for $U = 2$, let's assume the robot executes its action correctly with high probability, say $0.8$, but with $0.1$ chance it falls short of the intended target, and with another $0.1$ chance it overshoots
- you can define the same for other values of `U`, say $U = 1$--then with $0.8$ chance it ends up one cell over, with $0.1$ chance it stays in the same cell, and with $0.1$ chance it hops $2$ cells ahead

<img src="resources/inaccurate_robot_motion.png"/>

- now this is a model of inaccurate robot motion
- this robot attempts to go U grid cells, but occasionally falls short of its goal or overshoots
- this is the more common case: robots accrue uncertainty as they move, and it's really important to model this, because robot inaccuracy is the primary reason why localization is hard

- we're now going to look into this first from the mathematical side
- I will be giving you a prior distribution, and we're going to be using the value of $U = 2$
  - for the motion model that shifts the robot exactly $2$ steps, we believe there is a $0.8$ chance
  - we assign a $0.1$ to the cases where the robot over or under shoots by exactly $1$
  - this is captured by the formula over here, where a shift of two gets a $0.8$ probability, and shifts of one and three each get a $0.1$ probability
- I'm going to ask you now for the initial distribution that I'm writing up here, can you give me the distribution after the motion?

<img src="resources/inexact_motion_solution_1.png"/>

- the answer is $0.8$ for our intended cell over here, $0.1$ for each of its two neighbors, and $0$ and $0$ at the beginning
- notice that this motion has added some uncertainty to the robot's position

- let's assume we have a $0.5$ in the second cell and a $0.5$ in the fourth cell
- remember that this is a cyclic-motion model, so whatever falls off on the right side, you'll find on the left side
- can you again for $U = 2$ fill in the posterior distribution?

<img src="resources/inexact_motion_solution_2.png"/>

- this is a pretty tricky question, which I'm going to answer in two phases
- let's just look at the first $0.5$ over here
  - $0.8$ of that, which is $0.4$, ends up two cells over, and $0.1$ of it, which is $0.05$, ends up in each of the two cells next to that target
  - the reason why I write it so small (green) is because this is not the correct answer quite yet
- let's look at the other $0.5$
  - $0.4$ goes two steps and wraps around to the left-most cell, $0.05$ falls short and lands in the last grid cell, and $0.05$ overshoots into the second cell
- interestingly enough, for the cell on the right side, there are two possible ways you could've gotten there
  - either by overshooting starting in the second cell, or by undershooting starting in the fourth cell
  - so the total probability is the sum of these two things--$0.1$
- this is the final answer: $0.4, 0.05, 0.05, 0.4, 0.1$
- notice that the robot is now pretty uncertain about its location

- let me give you a final example in which I assume a uniform distribution, and I want you to fill in for me the distribution after motion

<img src="resources/inexact_motion_solution_3.png"/>

- the answer as it turns out will be just $0.2$ everywhere, and the reason is with every grid cell being equally likely, applying this motion model will still make each grid cell equally likely
- you can't get any more uncertain than the uniform distribution!
  - let's pick one of them--say this one over here (4th cell)
  - we could have arrived here in 3 different ways
    - perhaps we started in $x_2$ and our motion went well--this gives us a $0.2 \times 0.8$
    - perhaps we started in $x_1$ and we overshot, which gives us a $0.2$, for the cell $x_1$, times a $0.1$ for overshooting
    - or perhaps we started in $x_3$ and we undershot, which gives us $0.2 \times 0.1$
  - if we add those up, then we find it is the same as $0.2 \times 1$, because the factors over here add up exactly to $1$, which makes $0.2$
  - you can apply this same logic to all the other cells
    - that's called a convolution, and as we'll see later, there's a very nice way to write this mathematically, using something called the *Theorem of Total Probability*
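
The three worked examples can be checked numerically. The sketch below follows the narration literally: it takes the probability mass in each starting cell and spreads it over the three possible landing cells. The function name `spread` and its keyword arguments are hypothetical, and the first two priors are inferred from the answers stated above (the originals are only shown in the figures).

```python
def spread(prior, U, p_exact=0.8, p_over=0.1, p_under=0.1):
    """Push each cell's probability to its possible destinations (cyclic world)."""
    n = len(prior)
    posterior = [0.0] * n
    for start, mass in enumerate(prior):
        posterior[(start + U) % n] += p_exact * mass      # lands on target
        posterior[(start + U + 1) % n] += p_over * mass   # overshoots by one
        posterior[(start + U - 1) % n] += p_under * mass  # undershoots by one
    return posterior


examples = [
    [0, 1, 0, 0, 0],            # assumed prior for the first example
    [0, 0.5, 0, 0.5, 0],        # assumed prior for the second example
    [0.2, 0.2, 0.2, 0.2, 0.2],  # uniform prior
]
results = [[round(v, 3) for v in spread(prior, 2)] for prior in examples]
for result in results:
    print(result)
# [0.0, 0.0, 0.1, 0.8, 0.1]
# [0.4, 0.05, 0.05, 0.4, 0.1]
# [0.2, 0.2, 0.2, 0.2, 0.2]
```

This "push" formulation gives the same result as asking, for each destination cell, where the robot could have come from; both are ways of computing the same sum over starting cells.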

# Inexact Move Function

- I'm going to give us a `pExact` of $0.8$, `pOvershoot` of $0.1$, and `pUndershoot` of $0.1$
- I'd like you to modify the move procedure to accommodate these extra probabilities

In [8]:
# Modify the move function to accommodate the added probabilities
# of overshooting or undershooting the intended destination.

p = [0, 1, 0, 0, 0]
world = ["green", "red", "red", "green", "green"]
measurements = ["red", "green"]
pHit = 0.6
pMiss = 0.2
pExact = 0.8
pOvershoot = 0.1
pUndershoot = 0.1


def sense(p, Z):
    q = []
    for i in range(len(p)):
        hit = Z == world[i]
        q.append(p[i] * (hit * pHit + (1 - hit) * pMiss))
    s = sum(q)
    for i in range(len(q)):
        q[i] = q[i] / s
    return q


def move(p, U):
    q = []
    for i in range(len(p)):
        s = pExact * p[(i - U) % len(p)]
        s = s + pOvershoot * p[(i - U - 1) % len(p)]
        s = s + pUndershoot * p[(i - U + 1) % len(p)]
        q.append(s)
    return q


print(move(p, 1))

[0.0, 0.1, 0.8, 0.1, 0.0]



- we're going to introduce the auxiliary variable `s` which we build up in three different steps
  - we multiply the `p` value at the exact offset by `pExact`, as before
  - then we add two more terms, multiplied by `pOvershoot` and `pUndershoot`, where overshooting means going $1$ step further than `U` and undershooting means stopping $1$ step short
  - then we add these things up and finally append the sum of those to our output probability `q`
  - when we run this, we get for our example prior of $0, 1, 0, 0, 0$ the answer $0, 0.1, 0.8, 0.1, 0$

# Limit Distribution

- suppose we have $5$ grid cells as before with an initial distribution that assigns $1$ to the first grid cell and $0$ to all the other ones
- let's assume we do $U = 1$, which means with $0.8$ chance in each action we transition $1$ to the right, with $0.1$ chance we don't move at all, and with $0.1$ chance again we skip and move $2$ steps
- again, let's assume the world is cyclic, so every time I fall off on the right side, I find myself back on the left side


- suppose I run infinitely many motion steps--then I get what's called a **limit distribution**
- what's going to happen to my robot if it never senses but executes the action of going $1$ to the right on our little cyclic environment forever?
- what will the so-called limit or stationary distribution be in the very end?


- the answer is the uniform distribution--there's an intuitive reasoning behind this
- every time we move, we lose information--that is, in the initial distribution we know exactly where we are
- one step in, the most likely cell has a $0.8$ chance, but that peak falls to something smaller as we move on--$0.66$ after two steps, and so on
- the distribution of the absolute least information is the uniform distribution--it has no preference whatsoever
  - that is really the result of moving many, many times
- as the robot continues to get more and more uncertain about where it is, eventually it will reach the state of maximal uncertainty: the uniform distribution

<img src="resources/limit_distribution.png"/>

- there is a way to derive this mathematically, and I can prove a property that's highly related, which is a *balance property*
  - say we take $x_4$, and we'd like to understand how $p(x_4)$ at some time step $t$ relates to the distribution over all these variables at the previous time step
  - for this to be stationary, it has to be the same
  - put differently, the probability of $x_4$ must be the same as $0.8\,p(x_2) + 0.1\,p(x_1) + 0.1\,p(x_3)$
    - this is exactly the same calculation we did before where we asked what's the chance of being in $x_4$--you might be coming from $x_2$, $x_1$, or $x_3$, and the probabilities $0.8$, $0.1$, and $0.1$ govern the likelihood that you came from each of them
    - those together must hold true in the limit when things don't move anymore
    - now, you might think there are many different ways to solve this and the $0.2$ is just one solution, but it turns out $0.2$ is the only solution
      - if you plug in $0.2$ over here and $0.2$ over here and $0.2$ over here, you get $1 \times 0.2$, and that's $0.2$ on the right side
      - clearly, those $0.2$s over here meet the balance that is necessary to define a valid solution in the limit
    - the formula given is for $U=2$, not for $U=1$--this mistake in the formula does not change the result, however: at the end we get a uniform distribution
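
Both claims, that the uniform distribution satisfies the balance property and that repeated motion converges to it, can be checked with a short sketch. It assumes the same `move` model as in the previous section ($0.8$ exact, $0.1$ overshoot, $0.1$ undershoot, cyclic world); the helper names `is_stationary` and `peaks` and the choice of 200 steps are arbitrary choices for this illustration.

```python
pExact, pOvershoot, pUndershoot = 0.8, 0.1, 0.1


def move(p, U):
    n = len(p)
    return [
        pExact * p[(i - U) % n]
        + pOvershoot * p[(i - U - 1) % n]
        + pUndershoot * p[(i - U + 1) % n]
        for i in range(n)
    ]


# Balance check: moving the uniform distribution leaves it unchanged.
uniform = [0.2] * 5
after = move(uniform, 1)
is_stationary = all(abs(a - b) < 1e-12 for a, b in zip(after, uniform))
print(is_stationary)  # True

# Convergence check: start fully certain and track the largest probability.
p = [1, 0, 0, 0, 0]
peaks = []
for step in range(200):
    p = move(p, 1)
    peaks.append(max(p))

print([round(v, 2) for v in peaks[:2]])  # [0.8, 0.66]
print(abs(peaks[-1] - 0.2) < 1e-6)       # True
```

For $U = 1$ the cells feeding $x_4$ are $x_3$ (exact), $x_2$ (overshoot), and $x_4$ itself (undershoot), but as noted above the uniform solution is the same either way.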

# Move Twice and Move 1000

- now let's go back to our code and move many times
- let's move twice, so please write a piece of code that makes the robot move twice, starting with the initial distribution as shown over here--$0, 1, 0, 0, 0$
  - here's a piece of code that moves twice by the same amount as before, and the output now is a vector that assigns $0.66$ as the largest value and not $0.8$ anymore


- let's move 1,000 times--write a piece of code that moves 1,000 steps and give me the final distribution
  - we have a loop for 1,000 steps--we move 1,000 times, and we print the corresponding distribution over here--it's $0.2$ in each case as expected

In [9]:
p = [0, 1, 0, 0, 0]
world = ["green", "red", "red", "green", "green"]
measurements = ["red", "green"]
pHit = 0.6
pMiss = 0.2
pExact = 0.8
pOvershoot = 0.1
pUndershoot = 0.1


def sense(p, Z):
    q = []
    for i in range(len(p)):
        hit = Z == world[i]
        q.append(p[i] * (hit * pHit + (1 - hit) * pMiss))
    s = sum(q)
    for i in range(len(q)):
        q[i] = q[i] / s
    return q


def move(p, U):
    q = []
    for i in range(len(p)):
        s = pExact * p[(i - U) % len(p)]
        s = s + pOvershoot * p[(i - U - 1) % len(p)]
        s = s + pUndershoot * p[(i - U + 1) % len(p)]
        q.append(s)
    return q


# Write code that makes the robot move twice and then
# prints out the resulting distribution, starting with the initial distribution p = [0, 1, 0, 0, 0]
# for i in range(2):
#     p = move(p, 1)

# Write code that moves 1000 times and then prints the resulting probability distribution.
for i in range(1000):
    p = move(p, 1)

# Make sure to print out p!
print(p)

[0.20000000000000365, 0.20000000000000373, 0.20000000000000365, 0.2000000000000035, 0.2000000000000035]



# Sense and Move

- we talked about measurement updates, and we talked about motion
  - we called these two routines *sense* and *move*
- localization is nothing else but the iteration of *sense* and *move*
  - there is an initial belief that is tossed into this loop
  - if you sense first, if comes to the left side--then localization cycles through these--move, sense cycle


- every time the robot moves, it loses information as to where it is
  - that's because robot motion is inaccurate
- every time it senses it gains information
- that is manifest by the fact that after motion, the probability distribution is a little bit flatter and a bit more spread out and after sensing, it's focused a little bit more
  - in fact, as a footnote, there is a measure of information called *entropy*
    - here is one of the many ways you can write it: $-\sum_i p(X_i)\log p(X_i)$, the negative expected log probability over the grid cells
    - without going into detail, this is a measure of the uncertainty in the distribution, and it can be shown that the motion step makes the entropy go up and the measurement step makes it go down--you're really losing and gaining information

### Clarification Regarding Entropy

- the video mentions that entropy will decrease after the motion update step and that entropy will increase after the measurement step
- what is meant is that entropy will decrease after the measurement update (sense) step and that entropy will increase after the movement (move) step


- in general, entropy represents the amount of uncertainty in a system
- since the measurement update step decreases uncertainty, entropy will decrease
- the movement step increases uncertainty, so entropy will increase after this step


- let's look at our current example where the robot could be at one of five different positions
- the maximum uncertainty occurs when all positions have equal probabilities $[0.2, 0.2, 0.2, 0.2, 0.2]$
- following the formula $Entropy = -\sum_i p(X_i)\log p(X_i)$ with a base-10 logarithm, we get $-5 \times 0.2 \times \log_{10}(0.2) = 0.699$


- taking a measurement will decrease uncertainty and entropy
- let's say after taking a measurement, the probabilities become $[0.05, 0.05, 0.05, 0.8, 0.05]$
- now we have a more certain guess as to where the robot is located and our entropy has decreased to 0.338
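
- the two entropy values above can be checked with a short sketch; the helper name `entropy` and the choice of a base-10 logarithm are assumptions made here to match the numbers in the text, not something the lecture code defines

```python
import math


def entropy(p, base=10):
    """Return -sum(p_i * log(p_i)); cells with zero probability contribute nothing."""
    return -sum(p_i * math.log(p_i, base) for p_i in p if p_i > 0)


uniform = [0.2, 0.2, 0.2, 0.2, 0.2]          # maximum uncertainty over five cells
after_sense = [0.05, 0.05, 0.05, 0.8, 0.05]  # belief after an informative measurement

print(round(entropy(uniform), 3))      # 0.699
print(round(entropy(after_sense), 3))  # 0.338
```

- a different logarithm base (for example base 2, which gives bits) only rescales both values; the ordering stays the same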

- I would now love to implement this in our code
- in addition to the two measurements we had before, red and green, I'm going to give you 2 motions--1 and 1, which means the robot moves right and right again
- can you compute the posterior distribution if the robot first senses red, then moves right by 1, then senses green, then moves right again?
- let's start with a uniform prior distribution

In [10]:
# Given the list motions=[1,1] which means the robot
# moves right and then right again, compute the posterior
# distribution if the robot first senses red, then moves
# right one, then senses green, then moves right again,
# starting with a uniform prior distribution.

p = [0.2, 0.2, 0.2, 0.2, 0.2]
world = ["green", "red", "red", "green", "green"]
# measurements = ["red", "green"]  # first scenario: sense red, then green
measurements = ["red", "red"]  # second scenario; produces the output shown below
motions = [1, 1]
pHit = 0.6
pMiss = 0.2
pExact = 0.8
pOvershoot = 0.1
pUndershoot = 0.1


def sense(p, Z):
    q = []
    for i in range(len(p)):
        hit = Z == world[i]
        q.append(p[i] * (hit * pHit + (1 - hit) * pMiss))
    s = sum(q)
    for i in range(len(q)):
        q[i] = q[i] / s
    return q


def move(p, U):
    q = []
    for i in range(len(p)):
        s = pExact * p[(i - U) % len(p)]
        s = s + pOvershoot * p[(i - U - 1) % len(p)]
        s = s + pUndershoot * p[(i - U + 1) % len(p)]
        q.append(s)
    return q


# ADD CODE HERE
for i in range(len(measurements)):
    p = sense(p, measurements[i])
    p = move(p, motions[i])

print(p)

[0.07882352941176471, 0.07529411764705884, 0.22470588235294123, 0.4329411764705882, 0.18823529411764706]



- the world has five fields: green, red, red, green, green
- in the first scenario (the commented-out measurements, red and then green), the robot saw red, moved right, and then saw green
- that suggests it most likely started in grid cell number 3, which is the right-most of the two red cells
- it saw red correctly and then moved to the right by 1
- it saw green correctly and moved right again
- it now finds itself most likely in the right-most cell


- let's pick a different case
  - let's assume the robot saw red twice
  - it senses red, it moves, it senses red, it moves again
  - what is the most likely cell?
    - we find that the most likely cell is the 4th cell (index 3), with a probability of about $0.43$ in the output above
    - that makes sense, because the best match of red, red to the world is the two red cells at indexes 1 and 2
    - after seeing the 2nd red, the robot still moved 1 to the right and so finds itself in the 4th cell
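
- both cases can be compared side by side with a small sketch; it reuses the same world and probabilities as the cell above, while the helper `localize` and the variable `best` are names introduced here only for illustration

```python
world = ["green", "red", "red", "green", "green"]
pHit, pMiss = 0.6, 0.2
pExact, pOvershoot, pUndershoot = 0.8, 0.1, 0.1


def sense(p, Z):
    # Weight each cell by how well its color matches the measurement, then normalize.
    q = [p[i] * (pHit if Z == world[i] else pMiss) for i in range(len(p))]
    s = sum(q)
    return [x / s for x in q]


def move(p, U):
    # Cyclic shift by U cells with a small chance of over- or undershooting by one.
    n = len(p)
    return [
        pExact * p[(i - U) % n]
        + pOvershoot * p[(i - U - 1) % n]
        + pUndershoot * p[(i - U + 1) % n]
        for i in range(n)
    ]


def localize(measurements, motions):
    # Start from a uniform prior and alternate sense and move.
    p = [1.0 / len(world)] * len(world)
    for Z, U in zip(measurements, motions):
        p = move(sense(p, Z), U)
    return p


for measurements in (["red", "green"], ["red", "red"]):
    p = localize(measurements, [1, 1])
    best = max(range(len(p)), key=lambda i: p[i])
    print(measurements, "-> most likely index:", best)
```

- with red, green the most likely index is 4 (the right-most cell); with red, red it is 3 (the 4th cell)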


- now I want to celebrate with you the code that you just wrote, which is a piece of software that implements the essence of Google's self-driving car's localization approach
- as I said in the beginning, it's absolutely crucial that the car knows exactly where it is relative to the map of its road
- while the road isn't painted green and red, the road has lane markers
- instead of those green and red cells over here, we plug in the color of the lane markings relative to the color of the pavement
- it isn't just one observation per time step but an entire field of observations, an entire camera image; you can do the same with a camera image as long as you can match the camera image predicted by your model against the camera image in your measurements
- then a piece of code not much more difficult than what you coded yourself (the last for loop) is responsible for localizing the Google self-driving car
  - you just implemented a major, major function that makes Google's car drive itself

# Localization Summary

- we learned that localization maintains a function over all possible places where the robot might be, where each cell has an associated probability value
  - belief = probability


- the measurement update function, or *sense*, is nothing else but a product in which we take those probability values and multiply them up or down depending on the exact measurement
- because the product might violate the fact that probabilities add up to $1$, there was a product followed by normalization
  - sense = product followed by normalization


- motion was a convolution (addition)
  - this word itself might sound cryptic, but what it really means is that for each possible location after the motion, we reverse engineered the situation, guessed where the robot might have come from, and then added up the corresponding probabilities


- something as simple as multiplication and addition solves all of localization and is the foundation for autonomous driving

# Formal Definition of Probability

- I want to spend a few minutes and go over the formal definition of localization
- I'm going to introduce probability and ask you lots of questions


- formally, we define a probability function to be $P(X)$, and it's a value that is bounded below and above by $0$ and $1$: $0 \leq P(X) \leq 1$
  - $X$ often can take multiple values
  - probabilities always add up to $1$: $\sum_i P(X_i) = 1$

# Bayes' Rule

- let's look into measurements, and they will lead to something called *Bayes Rule*
- you might have heard about Bayes Rule before--it's the most fundamental consideration in probabilistic inference, but the Bayes Rule is really, really simple
- suppose $X$ is my grid cell and $Z$ is my measurement
  - then the measurement update seeks to calculate a belief over my location after seeing the measurement
  - Bayes Rule looks like this: $P(X_i|Z) = \dfrac{P(Z|X_i)P(X_i)}{P(Z)}$
    - what it does is it takes my prior distribution, $P(X)$, and multiplies in the chances of seeing a red or green tile for every possible location and out comes the non-normalized posterior distribution we had before
    - $P(X)$ is prior and $P(Z|X)$ is measurement probability
    - if we put a little index *i* over here, then this is just the product of the prior of the grid cell times the measurement probability, which was large if the measurement corresponded to the correct color and small if it corresponded to a false color
      - that product gave us the non-normalized posterior distribution for the grid cell
      - we programmed this; a product between the prior probability distribution and a number
    - the normalization is now the constant $P(Z)$
      - technically, that is the probability of seeing a measurement devoid of any location information


- the easiest way to understand what's going on is to realize that this is a function that assigns to each grid cell a number, and the $P(Z)$ doesn't have the grid cell as an index so no matter what grid cell we consider, the $P(Z)$ is the same
- no matter what $P(Z)$ is, because the final posterior has to be a probability distribution, by normalizing these non-normalized products, we will exactly calculate $P(Z)$
  - put differently, $P(Z)$ is the sum over all $i$ of just this product: $P(Z) = \sum_{i} P(Z|X_i)P(X_i)$
    - it's a product of our prior distribution with a measurement probability, which we know to be large if the color is correct and small otherwise


- we do this and assign it to a so-called non-normalized probability $\overline{P}(X_i|Z) \leftarrow {P(Z|X_i)P(X_i)}$
- then we compute the normalizer $\alpha \leftarrow \sum_i \overline{P}(X_i|Z)$
- then we just normalize
- our resulting probability will be $\dfrac{1}{\alpha}$ of the non-normalized probability: $P(X_i|Z) \leftarrow \dfrac{1}{\alpha} \overline{P}(X_i|Z)$


- this is exactly what we did, and this is exactly Bayes Rule
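
- the three steps above (product, normalizer, division) can be written as one small function; this is a minimal sketch in which `bayes_update`, `prior`, and `likelihood` are illustrative names, and the example numbers are the uniform prior and the red measurement from the five-cell world used earlier

```python
def bayes_update(prior, likelihood):
    """prior[i] stands for P(X_i); likelihood[i] stands for P(Z | X_i)."""
    # Non-normalized posterior: product of measurement probability and prior.
    unnormalized = [l * p for l, p in zip(likelihood, prior)]
    # Normalizer alpha, which plays the role of P(Z).
    alpha = sum(unnormalized)
    # Normalized posterior P(X_i | Z).
    posterior = [u / alpha for u in unnormalized]
    return posterior, alpha


prior = [0.2, 0.2, 0.2, 0.2, 0.2]
likelihood = [0.2, 0.6, 0.6, 0.2, 0.2]  # sensing red in green, red, red, green, green

posterior, alpha = bayes_update(prior, likelihood)
print(posterior)  # the two red cells each get about 1/3, the green cells about 1/9
print(alpha)      # about 0.36
```

- note that the normalizer never has to be known in advance; it falls out of summing the non-normalized products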

# Cancer Test

- let me ask you about Bayes Rule in the context of a completely different example to see if you understand how to apply it
- this time it's about cancer testing; it is an example that is commonly studied in statistics classes
- suppose there exists a certain type of cancer, but the cancer is rare: only 1 in 1000 people has the cancer, whereas 999 in 1000 people don't have it, illustrated by the probability of cancer and the probability of not cancer
- suppose we have a test, and the test can come out positive or negative
  - the probability that the test comes out positive if I have cancer is $0.8$, and the probability that the test comes out positive given that I'm cancer free is only $0.1$
  - clearly the test has a strong correlation to whether I have cancer


- here's a really difficult question
- can you compute for me the probability of cancer given that I just received a positive test?
- think of the cancer/non cancer as the robot position and think of the positive as whether the colored door observed is the correct one

<img src="resources/cancer_test.png"/>

- the non-normalized result of Bayes Rule for C given POS is simply the product of my prior probability, $0.001$, and $0.8$, which is the probability of a positive result in the cancer state
  - that ends up to be $0.0008$
- the non-normalized probability for the opposite event, the non-cancerous event, given a positive test, is $0.999 \times 0.1$
  - that's obviously $0.0999$
- our normalizer is the sum of both of those, which is $0.1007$
- dividing $0.0008$, the non-normalized probability, by $0.1007$ gives us $0.0079$
- the answer is $0.0079$; in other words, there's only a $0.79\%$ chance, $0.79$ out of $100$, that you have cancer despite the positive test result


- we just applied Bayes Rule to compute a really involved probability of having cancer after seeing a test result
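
- the arithmetic above can be reproduced in a few lines; the variable names are chosen here for readability and are not part of the lecture code

```python
p_cancer = 0.001            # prior P(C)
p_pos_given_cancer = 0.8    # P(POS | C)
p_pos_given_healthy = 0.1   # P(POS | not C)

# Non-normalized posteriors: prior times measurement probability.
unnorm_cancer = p_cancer * p_pos_given_cancer           # 0.0008
unnorm_healthy = (1 - p_cancer) * p_pos_given_healthy   # 0.0999

# Normalizer, i.e. the overall probability of a positive test.
alpha = unnorm_cancer + unnorm_healthy                  # 0.1007

p_cancer_given_pos = unnorm_cancer / alpha
print(round(p_cancer_given_pos, 4))  # 0.0079
```

- the posterior stays small because the prior is so low: most positive results come from the much larger cancer-free group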

# Theorem of Total Probability

- let's look at motion, which will turn out to be something we will call total probability
- you remember that we cared about a grid cell $X_i$ and we asked what is the chance of being in $X_i$ after robot motion?
- to indicate the after and before, let me add a time index--$t$ up here, is an index for time
  - I write it superscript so there is no confusion with the index $i$, which is the grid cell


- you might remember the way we computed this was by looking at all the grid cells the robot could have come from one time step earlier, indexed here by $j$
- we looked at the prior probability of those grid cells at time $t - 1$
- we multiply with the probability that our motion command would carry us from $X_j$ to $X_i$
- this is written as a conditional distribution as follows: $P(X_i^t) = \sum_{j}P(X_j^{t-1}) \cdot P(X_i|X_j)$

<img src="resources/total_probability.png"/>

- this was exactly what we implemented
  - if these were our grid cells and we asked one time step later about a specific grid cell, we would combine $0.8$ from over here, $0.1$ from over here, and $0.1$ from over here into the probability of this grid cell
  - it's the same formula as above
    - this is now $X_i$, and the way we find the posterior probability for $X_i$ is to go through all possible places from which we could have come, all the different $j$'s, look at the prior probabilities, and multiply each by the probability that I transition from $j$ to $i$ given my motion command, which in this case is go $1$ to the right


- in probability terms, people often write it as follows: $P(A) = \sum_{B} P(A|B)P(B)$
  - this is just the way you'd find it in textbooks, and you can see directly the correspondence of $A$ as a place $i$ at time $t$ and all the different $B$s as the possible prior locations
  - that is often called the *Theorem of Total Probability*
- the operation of a weighted sum over other variables is often called a *convolution*
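
- to make the correspondence with the formula explicit, here is a sketch that spells out $P(X_i|X_j)$ as its own function and then sums over all $j$; the names `transition` and `total_probability` are introduced here for illustration, the probabilities $0.8$, $0.1$, $0.1$ are the ones used in `move`, and the world is assumed to be cyclic with at least three cells

```python
def transition(i, j, n, U=1):
    """P(X_i | X_j): chance of ending in cell i when starting in cell j with motion U."""
    step = (i - j) % n            # how far the robot actually travelled (cyclic world)
    if step == U % n:
        return 0.8                # exact motion
    if step == (U + 1) % n:
        return 0.1                # overshoot by one cell
    if step == (U - 1) % n:
        return 0.1                # undershoot by one cell
    return 0.0


def total_probability(prior, U=1):
    """P(X_i^t) = sum over j of P(X_j^{t-1}) * P(X_i | X_j)."""
    n = len(prior)
    return [
        sum(prior[j] * transition(i, j, n, U) for j in range(n))
        for i in range(n)
    ]


prior = [0, 1, 0, 0, 0]
print(total_probability(prior))  # [0.0, 0.1, 0.8, 0.1, 0.0]
```

- this gives the same numbers as `move(p, 1)`; the only difference is that the sum over prior locations is written out instead of being unrolled into three terms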

# Coin Flip

- suppose I flip a coin, and the coin comes up tails or heads
- suppose it's a fair coin; the probabilities of tails and of heads are both $\frac{1}{2}$
- let's say that the coin comes up tails, and I just accept and don't do anything
- but suppose it comes up heads; then I flip it again, and after that $1$ additional flip, I accept the result
- my quiz for you is what is the probability that the final result is heads?
  - that's an example of total probability

<img src="resources/coin_flip.png"/>

- the answer is $\frac{1}{4}$
- it's easy to see that the probability of heads in step $2$ is the probability of heads in step $2$ conditioned on heads in step $1$ times the probability of heads in step $1$, plus the probability of heads in step $2$ given tails in step $1$ times the probability of tails in step $1$: $P(H^2) = P(H^2|H^1)P(H^1) + P(H^2|T^1)P(T^1)$
  - now, the way I set it up, heads and tails in step $1$ are equally likely
  - however, if we did have tails in step $1$, we would never toss the coin again and just accept it
  - it's impossible to end up with heads in step $2$ in that case; its probability is zero
  - whereas if I found heads, I would flip again and then arrive at heads with a $0.5$ chance
  - so the tails term becomes zero, and the heads term multiplies out to $\frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$
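
- the same sum can be evaluated exactly with Python's `fractions` module; the variable names below are made up for this sketch

```python
from fractions import Fraction

p_h1 = Fraction(1, 2)           # heads on the first flip
p_t1 = Fraction(1, 2)           # tails on the first flip
p_h2_given_h1 = Fraction(1, 2)  # heads first: flip again, heads with probability 1/2
p_h2_given_t1 = Fraction(0)     # tails first: result is accepted, so heads is impossible

# Total probability: sum over both outcomes of the first flip.
p_h2 = p_h2_given_h1 * p_h1 + p_h2_given_t1 * p_t1
print(p_h2)  # 1/4
```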

# Two Coins

- there are two coins; one is fair and one is loaded
  - the fair coin has a probability of heads of $0.5$
  - the loaded coin has a probability of heads of $0.1$
- I'm going to grab a coin at random; the fair coin will be chosen with $50\%$ chance, and the loaded coin will be chosen with $50\%$ chance, but I don't know which one it is
- I flip it and I observe heads
- what's the probability that the coin I hold in my hand is fair?
  - apply anything you've learned before--one of the rules you've learned before is exactly the right one to apply here

<img src="resources/two_coins.png"/>

- what I'm really asking you is the probability of a fair coin $F$  given that I observed $H$
- this has nothing to do with total probability and everything to do with Bayes Rule, because I'm talking about observations
- the non-normalized probability according to Bayes Rule is obtained as follows:
  - the probability of observing $H$ for the fair coin is $0.5$, and the probability of having grabbed the fair coin is $0.5$ as well, so the product is $0.25$
- the non-normalized probability of not $F$ given $H$, which is the loaded coin, is the probability of $H$ given not $F$, which we know to be $0.1$, times the probability of not picking the fair coin, which is $0.5$; that ends up being $0.05$


- the normalizer is $\alpha = 0.25 + 0.05$, which is $0.3$
- if we now normalize the $0.25$ over here with the $0.3$, we get $0.833$, which is the same as $\frac{5}{6}$
- that's our posterior probability that we hold the fair coin after we observed $H$
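
- as a quick check of the two-coin result, the Bayes Rule steps can be carried out with exact fractions; the variable names are illustrative only

```python
from fractions import Fraction

p_fair = Fraction(1, 2)             # prior: fair coin picked with 50% chance
p_h_given_fair = Fraction(1, 2)     # P(H | F)
p_h_given_loaded = Fraction(1, 10)  # P(H | not F)

# Non-normalized posteriors: measurement probability times prior.
unnorm_fair = p_h_given_fair * p_fair              # 1/4
unnorm_loaded = p_h_given_loaded * (1 - p_fair)    # 1/20

# Normalizer: overall probability of observing heads.
alpha = unnorm_fair + unnorm_loaded                # 3/10

p_fair_given_h = unnorm_fair / alpha
print(p_fair_given_h)         # 5/6
print(float(p_fair_given_h))  # roughly 0.833
```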