- to understand what the prediction team does, imagine a T-shaped intersection
- you are a self-driving car that has just pulled up to the stop sign
- you want to turn left but your sensors notice another vehicle coming from the left
- at this point, you as a human probably know the other vehicle will do one of two things - either it will go straight or it will go right
- let's say that at this point, the other vehicle starts slowing down and moving right in the lane
- you probably know that they are turning right which means it is safe for you to go left
- by making a successful prediction, you were able to make a decision that got you to your destination safely and efficiently


- what makes prediction interesting but also challenging is that it is inherently multi-modal
- a good way of thinking about what that means is to think of how you would answer the question, "Where is the other car likely to be in five seconds?"
- if we try to come up with the probability distribution, we would see that it has multiple peaks or modes

<img src="resources/t_intersection_probability_distribution.png"/>

- if the car is going straight, then the car is likely to be somewhere here (green) but if the car turns right, then it's more likely to be here (blue)
- in general, the way we think about handling multi-modal uncertainty is by maintaining some beliefs about how probable each potential mode is
- initially, if we just see this green car coming from far away, those beliefs could be initialized using some prior knowledge about this intersection
- in this case, let's say that cars generally go straight at this intersection, but as we continue watching the car, we may notice that it is slowing down
- since this behavior is more consistent with turning right, the probability of turning right increases
- and then, at the next timestep, we might notice that the car has already started turning right which again increases the probability of turning right
- and as we keep observing, we continue updating our belief based on new evidence until eventually we can predict with high certainty that the vehicle is turning right at this intersection
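The belief update described above can be sketched in a few lines. This is only an illustration: the prior and the per-observation likelihoods are made-up numbers, and `update_beliefs` is a hypothetical helper, not something defined in this lesson.

```python
def update_beliefs(beliefs, likelihoods):
    """Bayes update: multiply each mode's prior belief by the likelihood of
    the new observation under that mode, then renormalize to sum to 1."""
    unnormalized = {mode: beliefs[mode] * likelihoods[mode] for mode in beliefs}
    total = sum(unnormalized.values())
    return {mode: value / total for mode, value in unnormalized.items()}


# Hypothetical prior: cars usually go straight at this intersection.
beliefs = {"straight": 0.8, "right": 0.2}

# Hypothetical likelihoods of each observation under each mode.
observations = [
    {"straight": 0.3, "right": 0.7},  # the car is slowing down
    {"straight": 0.1, "right": 0.9},  # the car has started to turn
]

for likelihoods in observations:
    beliefs = update_beliefs(beliefs, likelihoods)
    print(beliefs)
```

With these numbers the belief in a right turn starts below the belief in going straight and overtakes it as evidence accumulates.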


- the responsibility of the prediction module is to do the following:
  - we take as input a map of the world and data from sensor fusion and generate as output some predictions of the future state of all the vehicles and other moving objects in the vicinity of our vehicle
- typically, these predictions are represented by a set of possible trajectories, like the two dotted arrows emanating from the green car in this scenario, with an associated probability for each trajectory

<img src="resources/t_intersection_trajectories_dotted.png"/>

- before we get into the details, let me explain what we are going to discuss in this lesson
- first, we'll go through a brief overview where you will learn a bit more about the inputs and outputs to prediction
- next, we will talk about how prediction is actually done
  - we will discuss the two main classes of prediction techniques: model-based approaches and data-driven approaches
    - model-based approaches use mathematical models of motion to predict trajectories
    - data-driven approaches rely on machine learning and examples to learn from
- then, I will briefly walk you through how to apply a strictly data-driven approach for prediction called trajectory clustering
- then, we will dig into model-based approaches
  - I'll introduce process models as a mathematical technique for modeling various maneuvers like lane changes, vehicle following, etc.
  - I'll introduce multi-modal estimators as an effective technique for handling the uncertainty associated with prediction, namely, the uncertainty about which maneuver an object will do in a particular situation
- finally, we will dive deep into hybrid approaches, which use data and process models to predict motion through a cycle of intent classification and trajectory generation
  - in intent classification, we try to figure out what a driver wants to do
  - in trajectory generation, we try to figure out how they are likely to do it
  - we will end by implementing an algorithm called Naive Bayes to predict the motion of a car at a T-shaped intersection like the one you just saw

# I/O Recap

<img src="resources/prediction_io_recap_directions.jpg"/>

- a prediction module uses a map and data from sensor fusion to generate predictions for what all other **dynamic** objects in view are likely to do
- to make this clearer, let's look at an example (in JSON format) of what the input to and output from prediction might look like

### Example Input - Sensor Fusion

```python
{
    "timestamp" : 34512.21,
    "vehicles" : [
        {
            "id"  : 0,
            "x"   : -10.0,
            "y"   : 8.1,
            "v_x" : 8.0,
            "v_y" : 0.0,
            "sigma_x" : 0.031,
            "sigma_y" : 0.040,
            "sigma_v_x" : 0.12,
            "sigma_v_y" : 0.03,
        },
        {
            "id"  : 1,
            "x"   : 10.0,
            "y"   : 12.1,
            "v_x" : -8.0,
            "v_y" : 0.0,
            "sigma_x" : 0.031,
            "sigma_y" : 0.040,
            "sigma_v_x" : 0.12,
            "sigma_v_y" : 0.03,
        },
    ]
}
```

### Example Output - Prediction

```python
{
    "timestamp" : 34512.21,
    "vehicles" : [
        {
            "id" : 0,
            "length": 3.4,
            "width" : 1.5,
            "predictions" : [
                {
                    "probability" : 0.781,
                    "trajectory"  : [
                        {
                            "x": -10.0,
                            "y": 8.1,
                            "yaw": 0.0,
                            "timestamp": 34512.71
                        },
                        {
                            "x": -6.0,
                            "y": 8.1,
                            "yaw": 0.0,
                            "timestamp": 34513.21
                        },
                        {
                            "x": -2.0,
                            "y": 8.1,
                            "yaw": 0.0,
                            "timestamp": 34513.71
                        },
                        {
                            "x": 2.0,
                            "y": 8.1,
                            "yaw": 0.0,
                            "timestamp": 34514.21
                        },
                        {
                            "x": 6.0,
                            "y": 8.1,
                            "yaw": 0.0,
                            "timestamp": 34514.71
                        },
                        {
                            "x": 10.0,
                            "y": 8.1,
                            "yaw": 0.0,
                            "timestamp": 34515.21
                        },
                    ]
                },
                {
                    "probability" : 0.219,
                    "trajectory"  : [
                        {
                            "x": -10.0,
                            "y": 8.1,
                            "yaw": 0.0,
                            "timestamp": 34512.71
                        },
                        {
                            "x": -7.0,
                            "y": 7.5,
                            "yaw": -5.2,
                            "timestamp": 34513.21
                        },
                        {
                            "x": -4.0,
                            "y": 6.1,
                            "yaw": -32.0,
                            "timestamp": 34513.71
                        },
                        {
                            "x": -3.0,
                            "y": 4.1,
                            "yaw": -73.2,
                            "timestamp": 34514.21
                        },
                        {
                            "x": -2.0,
                            "y": 1.2,
                            "yaw": -90.0,
                            "timestamp": 34514.71
                        },
                        {
                            "x": -2.0,
                            "y":-2.8,
                            "yaw": -90.0,
                            "timestamp": 34515.21
                        },
                    ]

                }
            ]
        },
        {
            "id" : 1,
            "length": 3.4,
            "width" : 1.5,
            "predictions" : [
                {
                    "probability" : 1.0,
                    "trajectory" : [
                        {
                            "x": 10.0,
                            "y": 12.1,
                            "yaw": -180.0,
                            "timestamp": 34512.71
                        },
                        {
                            "x": 6.0,
                            "y": 12.1,
                            "yaw": -180.0,
                            "timestamp": 34513.21
                        },
                        {
                            "x": 2.0,
                            "y": 12.1,
                            "yaw": -180.0,
                            "timestamp": 34513.71
                        },
                        {
                            "x": -2.0,
                            "y": 12.1,
                            "yaw": -180.0,
                            "timestamp": 34514.21
                        },
                        {
                            "x": -6.0,
                            "y": 12.1,
                            "yaw": -180.0,
                            "timestamp": 34514.71
                        },
                        {
                            "x": -10.0,
                            "y": 12.1,
                            "yaw": -180.0,
                            "timestamp": 34515.21
                        }
                    ]
                }
            ]
        }
    ]
}
```

- the predicted trajectories shown above only extend out a few seconds
  - in reality the predictions we make extend to a horizon of 10 to 20 seconds
- the trajectories shown have $0.5$ second resolution
  - in reality we would generate slightly finer-grained predictions
- this example only shows vehicles but in reality we would also generate predictions for all dynamic objects in view


- **Q:** How many possible trajectories are given for the car on the left (with id of 0) in the example output above?
- **A:** 2 and the probabilities for those two trajectories sum to 1.
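As a rough sketch, the check behind this answer can be automated. The function name `check_prediction_output` and the tolerance are assumptions made for illustration; the dictionary is an abbreviated copy of the example output with the trajectory points left out.

```python
import math


def check_prediction_output(output, tolerance=1e-6):
    """Return the number of predicted trajectories per vehicle id, raising
    ValueError if any vehicle's trajectory probabilities do not sum to 1."""
    counts = {}
    for vehicle in output["vehicles"]:
        total = sum(p["probability"] for p in vehicle["predictions"])
        if not math.isclose(total, 1.0, abs_tol=tolerance):
            raise ValueError(f"vehicle {vehicle['id']}: probabilities sum to {total}")
        counts[vehicle["id"]] = len(vehicle["predictions"])
    return counts


# Abbreviated copy of the example output (trajectory points omitted).
example_output = {
    "timestamp": 34512.21,
    "vehicles": [
        {"id": 0, "predictions": [{"probability": 0.781, "trajectory": []},
                                   {"probability": 0.219, "trajectory": []}]},
        {"id": 1, "predictions": [{"probability": 1.0, "trajectory": []}]},
    ],
}

trajectory_counts = check_prediction_output(example_output)
print(trajectory_counts)
```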

# Model-Based vs Data-Driven Approaches

- imagine a T-shaped intersection
- the blue self-driving car pulls up to that stop sign and would like to make a left turn but it sees this green car coming from the left
- if the green car is turning right, it is safe for the blue car to go, but if the green car is going straight, then the blue car should wait

<img src="resources/model_based_vs_data_driven.png"/>

- the way we would handle this with the model based approach is as follows:
  - we would come up with two process models, one for going straight and one for turning right
  - we would use some simple trajectory generator to figure out what trajectory we would expect to see if the driver were going straight or turning right
  - we would pay attention to the actual behavior of the target vehicle and, using a multimodal estimation algorithm (which is still a black box for now), we would compare the observed trajectory to the ones we would expect for each of our models
  - based on that we would assign a probability to each of the possible trajectories
- the important takeaway for purely model-based prediction is that we have some bank of possible behaviors, and each has a mathematical model of motion which takes into account the physical capabilities of the object as well as the constraints imposed by the road, traffic laws, and other restrictions


- with the purely data-driven approach, we have a truly black-box algorithm that is trained on lots of training data
- once it's trained, we just feed it the observed behavior and let it make a prediction about what will happen next


- we can see that each approach has its own strengths
- model-based approaches incorporate our knowledge of physics, constraints imposed by the road, traffic laws, etc.
- data driven approaches are nice because they let us use data to extract subtle patterns that would otherwise be missed by model based approaches
  - for example differences in vehicle behavior at an intersection during different times of the day

# Which is Best?

- neither approach (model based or data driven) is strictly better than the other but there are certain situations in which one is more useful than the other
- think about the following situations and whether model-based or data-driven approaches would be more useful


- **Q:** Determining maximum safe turning speed on a wet road.
- **A:** In this situation we could use a model based approach to incorporate our knowledge of physics (friction, forces, etc...) to figure out exactly (or almost exactly) when a vehicle would begin to skid on a wet road.
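As a minimal illustration of the kind of physics a model-based approach can encode, consider a flat, unbanked turn where friction alone supplies the centripetal force. Setting $m v^2 / r = \mu m g$ gives $v_{max} = \sqrt{\mu g r}$. The friction coefficients and the turn radius below are assumed values chosen only for the example.

```python
import math

G_ACCEL = 9.81  # gravitational acceleration, m/s^2


def max_turning_speed(friction_coefficient, turn_radius_m):
    """Speed (m/s) at which the required centripetal force m*v^2/r equals the
    maximum friction force mu*m*g on a flat, unbanked turn."""
    return math.sqrt(friction_coefficient * G_ACCEL * turn_radius_m)


# Hypothetical friction coefficients; real values depend on tires and surface.
dry_speed = max_turning_speed(0.8, turn_radius_m=50.0)
wet_speed = max_turning_speed(0.4, turn_radius_m=50.0)
print(f"dry: {dry_speed:.1f} m/s, wet: {wet_speed:.1f} m/s")
```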


- **Q:** Predicting the behavior of an unidentified object sitting on the road.
- **A:** Even with data driven approaches this would still be a very hard problem but since we don't even know what this object is, a model based approach to prediction would be nearly impossible.


- **Q:** Predicting the behavior of a vehicle on a two lane highway in light traffic.
- **A:** You could really use either approach (or a hybrid approach) in this situation. On the one hand there are very few behaviors we need to model in a highway driving situation and the physics are all very well understood so model based approaches could work. On the other hand it would be relatively easy to collect a lot of training data in similar situations so a purely data driven approach could work too.

# Data Driven Example - Trajectory Clustering

- since you already went through a machine learning class, we won't go into these techniques in too much detail
- in this video, I would like to show you one example that is representative of what these algorithms are good at--trajectory clustering
- as is the case with all data driven prediction techniques, there will be two phases
  - an offline training phase where the algorithm learns a model from data
  - an online prediction phase where it uses that model to generate predictions


- [Trajectory Clustering for Motion Prediction](http://video.udacity-data.com.s3.amazonaws.com/topher/2017/July/5978c2c6_trajectory-clustering/trajectory-clustering.pdf) - *Sung, Feldman, and Rus*

- let's discuss the offline phase first
  - the first step is to get a lot of data which you might do by placing a static camera at an intersection
  - then, we have to clean the data since some of the cars we observe may be occluded or something else went wrong in the processing step--so we need to discard the bad data
  - once the data is gathered and cleaned up, we would be left with a bunch of trajectories that look something like this (white lines)
  - next, we need to define some mathematical measure of trajectory similarity
    - there are many ways to do this but intuitively we want something that says a trajectory like this one in red is more similar to this one in pink than it is to this one in blue, even though the red and blue trajectories overlap more closely for a while than the red and pink ever do
  - once we have a measure of similarity, we can use a machine learning algorithm like agglomerative clustering or spectral clustering to cluster these trajectories (perform unsupervised learning)
    - in the case of a four-way stop intersection, we would expect to see 12 clusters since at each of the four stop signs cars can do one of three things: turn right, go straight or turn left
    - if we were looking at just one of those four stop signs, we would expect to see a cluster of trajectories for left turns, going straight, and turning right
    - note that in some situations you may obtain even more clusters than that--for example, if this lane is controlled by a traffic light instead of a stop sign, your clustering algorithm will probably create twice as many clusters
  - once the trajectories have been grouped into clusters, it is useful to define what prototype trajectories look like for each cluster
  - at this point, we have a trained model of typical car behavior at this intersection and the next step is to use this model on the road to actually generate predictions
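The offline phase above can be sketched with a toy example. Several things here are assumptions rather than the method from the referenced paper: the similarity measure is a plain mean point-to-point distance (a stand-in, not necessarily one that captures the red/pink/blue intuition), the clustering is a naive average-linkage agglomerative loop, and the prototype is chosen as the cluster medoid. All function names are hypothetical.

```python
import math


def trajectory_distance(a, b):
    """Mean Euclidean distance between corresponding points. Assumes both
    trajectories were resampled to the same number of points."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def agglomerative_clusters(trajectories, n_clusters):
    """Average-linkage agglomerative clustering: start with one cluster per
    trajectory and repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(trajectories))]

    def linkage(c1, c2):
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(trajectory_distance(trajectories[i], trajectories[j])
                   for i, j in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        a, b = min(
            ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))),
            key=lambda xy: linkage(clusters[xy[0]], clusters[xy[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def prototype(trajectories, cluster):
    """Pick the member with the smallest total distance to the other members
    (the medoid) as the cluster's prototype trajectory."""
    return min(cluster, key=lambda i: sum(
        trajectory_distance(trajectories[i], trajectories[j]) for j in cluster))


# Toy data: three roughly straight trajectories and three right turns.
straight = [[(x, 0.1 * k) for x in range(5)] for k in range(3)]
right = [[(x, -0.5 * x * x + 0.1 * k) for x in range(5)] for k in range(3)]
trajectories = straight + right

clusters = agglomerative_clusters(trajectories, n_clusters=2)
prototypes = [prototype(trajectories, c) for c in clusters]
print(clusters, prototypes)
```

In practice you would more likely reach for a library implementation of agglomerative or spectral clustering; the hand-written loop is only meant to make the steps visible.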

<img src="resources/offline_phase_lines.png"/>

- once our clustering algorithm has identified clusters and prototype trajectories, in this case three clusters with three prototype trajectories each, we can begin the job of online prediction for a vehicle that we meet on the road
  - first, we observe the vehicle's partial trajectory
  - next we compare it to the corresponding segments of the prototype trajectories for each cluster
    - this comparison is done using the same similarity measure we used earlier to perform the clustering
  - the belief for each cluster is updated based on how similar the partial trajectory is to the prototype trajectories
  - finally, we compute a predicted trajectory for each cluster
    - for example, by taking the most similar prototype trajectory

<img src="resources/online_prediction_example.png"/>

- let's make this more clear by following this car forward in time from $t = 0$ to $t = 1$--let's go through these steps
  - one, we observe the partial trajectory between time $0$ and $1$
    - it is this green line behind the vehicle
  - two, since all of the prototype trajectories overlap up to this point, the trajectory comparison step will yield the same probability for each cluster
  - three, even though there is no clear winner within each cluster, we still have to choose one prototype trajectory to represent each cluster, and we broadcast these three trajectories with their associated probabilities
  - next, at $t = 2$ things get more interesting
    - now when we compare the partial trajectory with the nine prototype trajectories, we find that the vehicle's partial trajectory seems more similar to the red than to the purple or blue
    - when we update the associated probabilities we might get something like this (third column)--note that red grows in probability and the blue and purple both shrink, but blue shrinks more than purple since the partial trajectory is a worse match for blue
  - then, we pick the best prototype trajectory for each cluster and use them to represent the future trajectory of the car
  - as we continue this process, we see the probability for the red cluster quickly approaches one
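Here is a minimal sketch of the online steps walked through above. The notes do not say exactly how similarity is turned into a belief update, so the `exp(-distance / scale)` score is an assumption, as are the function names and the toy prototypes (one per cluster instead of three, to keep the example short).

```python
import math


def prefix_distance(partial, prototype):
    """Mean distance between the observed partial trajectory and the
    corresponding first points of a prototype trajectory."""
    return sum(math.dist(p, q) for p, q in zip(partial, prototype)) / len(partial)


def update_cluster_beliefs(beliefs, partial, prototypes, scale=1.0):
    """Reweight each cluster's belief by a similarity score for its
    best-matching prototype, then renormalize. Returns the new beliefs and
    the index of the best prototype per cluster."""
    weighted, best = {}, {}
    for name, protos in prototypes.items():
        distances = [prefix_distance(partial, p) for p in protos]
        best[name] = min(range(len(protos)), key=distances.__getitem__)
        # Assumed similarity score: exp(-distance / scale).
        weighted[name] = beliefs[name] * math.exp(-distances[best[name]] / scale)
    total = sum(weighted.values())
    return {name: w / total for name, w in weighted.items()}, best


# Toy prototypes that overlap for the first two points, then diverge.
prototypes = {
    "red": [[(0, 0), (1, 0), (2, -1), (3, -2)]],
    "purple": [[(0, 0), (1, 0), (2, 0), (3, 0)]],
    "blue": [[(0, 0), (1, 0), (2, 1), (3, 2)]],
}
beliefs_t0 = {name: 1 / 3 for name in prototypes}

# t = 1: the partial trajectory matches every prototype equally well.
beliefs_t1, _ = update_cluster_beliefs(beliefs_t0, [(0, 0), (1, 0)], prototypes)
# t = 2: the vehicle starts drifting toward the red prototype.
beliefs_t2, best = update_cluster_beliefs(
    beliefs_t1, [(0, 0), (1, 0), (2, -0.8)], prototypes)
print(beliefs_t1, beliefs_t2)
```

As in the walkthrough, the beliefs stay equal while the prototypes overlap, then red grows while blue shrinks more than purple.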

# Thinking about Model Based Approaches

- data driven approaches can be very useful, particularly when we have access to plenty of training data but in some ways purely data driven approaches are naïve since they rely solely on historical evidence to make predictions about likely future behavior
- ideally, we would also like to include, in our predictions, all the insights we have about driver behavior, physics, or vehicle dynamics
- this is where model based approaches can help
- the way these approaches typically work is as follows:
  - for each object identify all the behaviors that object is likely to do in the current situation
    - the behavior for a vehicle could be something like change lanes or turn left, and for a pedestrian it could be cross the street at a pedestrian crossing
    - for our intersection scenario, the behaviors could be go straight, turn left, turn right
    - whatever it is, it needs to be something that we can describe mathematically
  - step two, define a process model for each behavior
    - a process model is a mathematical description of object motion for a behavior
    - it is a function which can be used to compute the state of the object at time $t + 1$ from the state at time $t$
    - the process model must incorporate some uncertainty which represents how much we trust our model
    - if you keep running the process models your uncertainty will increase
  - once we have a process model for each behavior we can go to the next step, step three, which is to use the process models to compute the probability of each behavior (i.e. update beliefs by comparing the observation with the output of the process model)
    - this is done by taking the observed state of the object at time $t-1$, running the process models to compute the expected state of the object at time $t$
    - then we compare the observed state of the object at time $t$ with what our process models predicted
    - we use a multimodal estimation algorithm to derive the probability of each maneuver
    - the purpose of these algorithms is to maintain some belief about how likely it is that the driver intends to perform each behavior--we'll go into more detail later
  - the fourth and final step is to predict a trajectory for each behavior
    - this is done easily by iterating on the process models until the prediction horizon is reached
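Step four can be sketched as a simple loop. The example below assumes a constant-velocity lane-following model with state `(s, d, s_dot)` and independent zero-mean Gaussian noise on $s$ at every step; the function names and the numbers are illustrative only. It also shows why uncertainty grows the longer the process model is iterated.

```python
def constant_velocity_step(state, dt):
    """Process model for lane following at constant speed: s advances by
    s_dot * dt while the lateral offset d stays constant."""
    s, d, s_dot = state
    return (s + s_dot * dt, d, s_dot)


def predict_trajectory(state, process_model, horizon, dt, sigma_step):
    """Iterate a process model until the prediction horizon is reached.
    Returns (time, state, sigma_s) tuples, where sigma_s is the standard
    deviation of s assuming independent noise of std sigma_step per step."""
    trajectory = []
    variance = 0.0
    steps = round(horizon / dt)
    for k in range(1, steps + 1):
        state = process_model(state, dt)
        variance += sigma_step ** 2  # independent noise: variances add
        trajectory.append((k * dt, state, variance ** 0.5))
    return trajectory


trajectory = predict_trajectory(
    state=(0.0, 0.0, 8.0), process_model=constant_velocity_step,
    horizon=3.0, dt=0.5, sigma_step=0.2)

for t, (s, d, _), sigma_s in trajectory:
    print(f"t={t:.1f}s  s={s:.1f}m  d={d:.1f}m  sigma_s={sigma_s:.2f}m")
```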

# Frenet Coordinates

- before we discuss process models, we should mention "Frenet Coordinates", which are a way of representing position on a road in a more intuitive way than traditional $(x,y)$ Cartesian Coordinates
- with Frenet coordinates, we use the variables $s$ and $d$ to describe a vehicle's position on the road
  - the $s$ coordinate represents distance along the road (also known as longitudinal displacement) 
  - the $d$ coordinate represents side-to-side position on the road (also known as lateral displacement)


- why do we use Frenet coordinates?
- imagine a curvy road like the one below with a Cartesian coordinate system laid on top of it

<img src="resources/frenet_1.png"/>

- using these Cartesian coordinates, we can try to describe the path a vehicle would normally follow on the road

<img src="resources/frenet_2.png"/>

<img src="resources/frenet_3.png"/>

- notice how curvy that path is!--if we wanted equations to describe this motion it wouldn't be easy!


- ideally, it should be mathematically easy to describe such common driving behavior--but how do we do that?
- one way is to use a new coordinate system
- now instead of laying down a normal Cartesian grid, we do something like you see below

<img src="resources/frenet_4.png"/>

- here, we've defined a new system of coordinates
- at the bottom we have $s=0$ to represent the beginning of the segment of road we are thinking about and $d=0$ to represent the center line of that road
- to the left of the center line we have negative $d$ and to the right $d$ is positive


- so what does a typical trajectory look like when presented in Frenet coordinates?

<img src="resources/frenet_5.png"/>

<img src="resources/frenet_6.png"/>

- it looks straight!
- in fact, if this vehicle were moving at a constant speed of $v_0$ we could write a mathematical description of the vehicle's position as:
  - $s(t) = v_0t$
  - $d(t) = 0$


- we'll be working with Frenet coordinates a good deal in the rest of the course, because straight lines are so much easier than curved ones
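To make the coordinate change concrete, here is a rough sketch that converts a Cartesian point to $(s, d)$ by projecting it onto a center line given as a polyline. The function name and the polyline representation are assumptions for illustration (it also assumes no two consecutive center-line points coincide); the sign of $d$ follows the convention above, negative to the left of the center line and positive to the right.

```python
import math


def cartesian_to_frenet(point, centerline):
    """Project a Cartesian point onto a polyline center line and return (s, d).
    Sign convention: d < 0 left of the center line, d > 0 right of it."""
    px, py = point
    best = None
    s_start = 0.0  # arc length at the start of the current segment
    for (x0, y0), (x1, y1) in zip(centerline, centerline[1:]):
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        # Fraction along this segment of the closest point, clamped to [0, 1].
        t = max(0.0, min(1.0, ((px - x0) * dx + (py - y0) * dy) / length ** 2))
        cx, cy = x0 + t * dx, y0 + t * dy
        dist = math.hypot(px - cx, py - cy)
        if best is None or dist < best[0]:
            # Cross product > 0 means the point is left of the travel direction.
            cross = dx * (py - y0) - dy * (px - x0)
            d = -dist if cross > 0 else dist
            best = (dist, s_start + t * length, d)
        s_start += length
    return best[1], best[2]


# A road that heads east for 10 m and then north for 10 m.
centerline = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
left_point = cartesian_to_frenet((5.0, 1.0), centerline)
right_point = cartesian_to_frenet((11.0, 5.0), centerline)
print(left_point, right_point)
```

A vehicle that follows the bend while staying one meter off the center line has a constant $d$ in this representation, even though its Cartesian path turns a corner.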

<img src="resources/frenet_7.png"/>

# Process Models

- let's consider some process models for a situation where a self-driving car is trying to merge onto a highway
- but, let's say there is another vehicle in the rightmost lane
- now, this vehicle might do a few things: it might just ignore us, it might speed up to let us merge behind it, it might slow down to let us get ahead of it, or it might change lanes
- for each of these behaviors, we want to come up with a process model that formalizes the likely motion of the car
  - if the car ignores us, it will likely follow lane A (current) with constant velocity
  - if they speed up, we may choose to model their motion as lane following with positive acceleration
  - if they slow down, we would do the same thing, but with negative acceleration
  - lane changing, we could model as lane following on lane B (left of lane A), with constant velocity


- what is lane following--how do we describe it mathematically?
- in general, there is a tradeoff between simplicity and accuracy when choosing a process model
  - one very simple approach is to treat the car as a point particle with holonomic properties
    - this means we assume the point can move in any direction at any time, which of course is a very simplistic assumption
    - the simplest motion models are linear--for example, constant velocity lane following in Frenet coordinates, where the car moves forward at each timestep and is assumed to keep a constant distance to the lane center
    - in practice, linear point models usually wind up being too simplistic
  - the next step in complexity happens when we allow non-linearities into our model
    - typically, if we start incorporating heading into our state vector, we will end up with sines and cosines in our model equations
    - note the presence of cosine and sine, which are where the non-linearity comes in
  - the next jump in complexity happens when we take into account that a car is a non-holonomic system
    - a popular approach is to use a bicycle model, which takes two inputs, the steering angle and the acceleration
    - for the steering angle, we could use a PID controller with the target lane center line as the reference line
    - for the acceleration, we could once again use a constant velocity model, or a constant acceleration model, or if we wanted more complex acceleration behavior, we could use a PID controller with the speed limit as the target


- in practice, these sorts of models tend to strike a good balance between simplicity and accuracy but you could always go more complex by including more details about vehicle dynamics
  - for example, you could use a dynamic bicycle model
    - note the presence of terms like $F_{c,f}$, which represents the lateral force on the tires at the front of the vehicle, and $F_{c,r}$, which represents the lateral force on the rear tires
    - you could even add more complexity and model the four wheels of the car
- while these models are technically more accurate than any of the others, in practice, using them doesn't usually make sense for prediction
- there is so much uncertainty inherent to predicting the behaviors of other drivers that minor accuracy improvements to process models just aren't worth the computational overhead that they come with

<img src="resources/process_models.png"/>

- note how all the models contain an additional term $W$
- this is where the uncertainty on the process model is stored
- a classic choice to represent uncertainty is a multivariate Gaussian with zero mean
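As a hedged sketch of the bicycle-model idea described in this section, the code below uses the standard kinematic bicycle equations with a simple Euler step. The wheelbase, controller gains, and the proportional steering law (a stand-in for the PID controller mentioned above) are all assumed values, and the noise term is applied to position only to keep the example short.

```python
import math
import random


def bicycle_step(state, steering, accel, dt, wheelbase=2.7, noise_std=0.0):
    """One Euler step of a kinematic bicycle model.
    state = (x, y, heading, speed); steering in radians; accel in m/s^2."""
    x, y, theta, v = state
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += v / wheelbase * math.tan(steering) * dt
    v += accel * dt
    # W: zero-mean Gaussian process noise, applied here to position only.
    x += random.gauss(0.0, noise_std)
    y += random.gauss(0.0, noise_std)
    return (x, y, theta, v)


def lane_keeping_steering(state, lane_center_y, k_offset=0.1, k_heading=1.0,
                          max_steer=0.5):
    """Proportional steering toward a lane center line parallel to the x axis,
    with the steering angle clamped to +/- max_steer."""
    _, y, theta, _ = state
    steer = k_offset * (lane_center_y - y) - k_heading * theta
    return max(-max_steer, min(max_steer, steer))


# Start 1 m off the lane center at 8 m/s and simulate 10 s of lane following.
state = (0.0, 1.0, 0.0, 8.0)
for _ in range(100):
    steering = lane_keeping_steering(state, lane_center_y=0.0)
    state = bicycle_step(state, steering, accel=0.0, dt=0.1)
print(state)
```

With `noise_std` left at zero the rollout is deterministic; a nonzero value gives one noisy sample of where the vehicle might end up.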

# More on Process Models

- later in the lesson I'm going to ask you to read a paper titled [A comparative study of multiple-model algorithms for maneuvering target tracking](https://d17h27t6h515a5.cloudfront.net/topher/2017/June/5953fc34_a-comparative-study-of-multiple-model-algorithms-for-maneuvering-target-tracking/a-comparative-study-of-multiple-model-algorithms-for-maneuvering-target-tracking.pdf) but for now I'd like you to take a look at section 3.1 and 3.2 only
  - this section, titled MM Tracking Algorithms' Design, discusses the 9 process models used in the earlier part of the paper


### Notes on Notation

#### 1. Matrix Notation

- when you see something like the following: $F_{CV} = \text{diag}[F_2, F_2], F_2 = \begin{bmatrix} 1 & T \\ 0 & 1 \end{bmatrix}$ it means that $F_{CV}$ is a $4 \times 4$ matrix, with $F_{2}$ as blocks along the diagonal
- written out fully, this means: $F_{CV} = \begin{bmatrix} 1 & T & 0 & 0\\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & T \\ 0 & 0 & 0 & 1 \end{bmatrix}$
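The same block-diagonal construction can be written out in code. The helper `block_diag` below is a hypothetical pure-Python function written for this example, and the value of $T$ is arbitrary.

```python
def block_diag(*blocks):
    """Place square blocks along the diagonal of a larger zero matrix."""
    size = sum(len(b) for b in blocks)
    result = [[0.0] * size for _ in range(size)]
    offset = 0
    for block in blocks:
        for i, row in enumerate(block):
            for j, value in enumerate(row):
                result[offset + i][offset + j] = value
        offset += len(block)
    return result


T = 0.5  # sample period in seconds (an arbitrary value for illustration)
F_2 = [[1.0, T], [0.0, 1.0]]
F_CV = block_diag(F_2, F_2)

for row in F_CV:
    print(row)
```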

#### 2. State Space

- the process models all use Cartesian coordinates
- the state vector is $\mathbf{x} = \begin{bmatrix} x\\ \dot{x} \\ y \\ \dot{y} \end{bmatrix}$

#### 3. Variables

- the equation $x_{k} = Fx_{k-1} + Gu_{k-1} + Gw_k, \ \ w_k \sim \mathcal{N}(0,Q)$ should be read as follows:
  - the predicted state at time $k$ $(x_k)$ is given by evolving $(F)$ the previous state $(x_{k-1})$, incorporating $(G)$ the controls $(u_{k-1})$ given at the previous time step, and adding normally distributed noise $(w_k)$
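One step of this equation can be sketched as follows. The matrix $G$ used here, which maps an acceleration in $x$ and in $y$ onto position and velocity, is an assumption for illustration and not necessarily the one used in the paper; the noise is drawn with independent components, which corresponds to a diagonal $Q$. The helper names are hypothetical.

```python
import random


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def propagate(x_prev, u_prev, F, G, noise_std=0.0):
    """x_k = F x_{k-1} + G u_{k-1} + G w_k, with w_k drawn from a zero-mean
    Gaussian (independent components, standard deviation noise_std)."""
    w = [random.gauss(0.0, noise_std) for _ in u_prev]
    evolved = mat_vec(F, x_prev)
    control = mat_vec(G, u_prev)
    noise = mat_vec(G, w)
    return [e + c + n for e, c, n in zip(evolved, control, noise)]


T = 1.0
F = [[1, T, 0, 0], [0, 1, 0, 0], [0, 0, 1, T], [0, 0, 0, 1]]
# Assumed G: maps an acceleration in x and in y onto position and velocity.
G = [[T ** 2 / 2, 0], [T, 0], [0, T ** 2 / 2], [0, T]]

x_prev = [0.0, 8.0, 8.1, 0.0]  # state ordered as [x, x_dot, y, y_dot]
x_next = propagate(x_prev, u_prev=[0.0, 0.0], F=F, G=G)
x_accel = propagate(x_prev, u_prev=[1.0, 0.0], F=F, G=G)
print(x_next, x_accel)
```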

# Multimodal Estimation

- we have talked about individual process models but we haven't yet talked about how to maintain some beliefs about which behavior the driver intends to perform
- this is the role of multimodal estimation algorithms
- a simple approach to multimodal estimation is called *autonomous multiple model estimation* or *AMM*
- while describing AMM, I will use the same notation that is used in the paper linked in the previous section
  - the variable $M$ represents the number of process models or behaviors, and each variable $\mu^{(i)}$ represents the probability of behavior $i$


- to understand how these probabilities are computed, let's go back to our T intersection example
- let's say we have two process models here--one to go straight and one to turn right and they both have Gaussian uncertainty
- let's say that we observe this state for the vehicle at time $k-1$ and we observe this state for the vehicle at time $k$
- in order to compute the new probabilities for the behaviors based on the new observation, we will run our two process models for one step starting from the state at time $k-1$ 
- when we do this, we get these two expected states (white clouds) for time $k$

<img src="resources/multimodal_estimation_1.png"/>

- if we just look at what the distribution on the vehicle's $s$ coordinate looks like for the two expected states, this is what we see
  - the red curve gives the probability density function of $s$ for turning right
  - the blue curve represents going straight
  - the observation at timestep $k$ is somewhere here (green)
- we can see that this observation is substantially more consistent with turn right than it is with go straight
- this is measured by the likelihood of the observation for each model, and the probability of each behavior is a function of these likelihoods and of the probabilities computed in the previous timestep

<img src="resources/multimodal_estimation_2.png"/>

- the important quantity for the AMM is the ratio of these likelihoods after they get multiplied by the previous probability
- the equation ends up looking like this: $\mu_k^{(i)} = \dfrac{\mu_{k-1}^{(i)} L_k^{(i)}}{\sum_{j=1}^{M} \mu_{k-1}^{(j)} L_k^{(j)}}$, where $\mu_k^{(i)}$ is the probability of model $i$ at timestep $k$
  - it includes the probability of model $i$ at timestep $k-1$
  - the $L$ term is the likelihood of the observation at time $k$ for that model
  - the denominator just serves to normalize our probabilities so that they all sum to one
  - $M$, in this situation would be two since we are only considering two maneuvers
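The AMM update described above (previous probability times likelihood, normalized over the $M$ models) can be sketched like this. The expected $s$ values, their standard deviations, and the observation are made-up numbers, and the function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1D Gaussian, used as the likelihood of an observation."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def amm_update(mu_prev, likelihoods):
    """mu_k[i] = mu_prev[i] * L[i] / sum_j(mu_prev[j] * L[j])."""
    weighted = [m * lik for m, lik in zip(mu_prev, likelihoods)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical expected s at time k from each process model: (mean, std).
expected_s = {"straight": (12.0, 1.0), "right": (9.0, 1.0)}
observed_s = 9.5  # hypothetical observation at time k

models = list(expected_s)
mu_prev = [0.5, 0.5]  # probabilities from timestep k-1
likelihoods = [gaussian_pdf(observed_s, *expected_s[m]) for m in models]
mu = dict(zip(models, amm_update(mu_prev, likelihoods)))
print(mu)
```

Because the observation sits much closer to the expected state for turning right, that model ends up with most of the probability mass.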

# Summary of Data Driven and Model Based Approaches

#### Data-Driven Approaches

- they solve the prediction problem in two phases:
  - offline training
    - in this phase the goal is to feed some machine learning algorithm a lot of data to train it
    - for the trajectory clustering example this involved:
      - define similarity - we first need a definition of similarity that agrees with human common-sense definition
      - unsupervised clustering - at this step some machine learning algorithm clusters the trajectories we've observed
      - define prototype trajectories - for each cluster identify some small number of typical "prototype" trajectories
  - online prediction
    - once the algorithm is trained we bring it onto the road
    - when we encounter a situation for which the trained algorithm is appropriate (returning to an intersection for example) we can use that algorithm to actually predict the trajectory of the vehicle
    - for the intersection example this meant:
      - observe partial trajectory
        - as the target vehicle drives we can think of it leaving a "partial trajectory" behind it
      - compare to prototype trajectories
        - we can compare this partial trajectory to the corresponding parts of the prototype trajectories
        - the more similar the observed partial trajectory is to a prototype (using the same notion of similarity defined earlier), the more that prototype's likelihood should increase relative to the other trajectories
      - generate predictions
        - for each cluster we identify the most likely prototype trajectory
        - we broadcast each of these trajectories along with the associated probability (see the image below)

<img src="resources/online_prediction_summary.jpg"/>
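- the online comparison step can be sketched as follows; note that the similarity measure (mean point-to-point distance) and the `exp(-distance / scale)` weighting are assumptions made for illustration, since the lesson does not fix a particular definition, and the names `Point`, `mean_distance` and `prototype_probabilities` are hypothetical:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point {
  double s;
  double d;
};
using Trajectory = std::vector<Point>;

// Mean point-to-point distance between the observed partial trajectory and
// the corresponding leading part of a prototype (an assumed similarity measure).
double mean_distance(const Trajectory& partial, const Trajectory& prototype) {
  const std::size_t n = std::min(partial.size(), prototype.size());
  if (n == 0) return 0.0;
  double total = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    total += std::hypot(partial[i].s - prototype[i].s,
                        partial[i].d - prototype[i].d);
  }
  return total / static_cast<double>(n);
}

// Converts distances into probabilities: closer prototypes get more weight.
// The exponential weighting is an illustrative choice.
std::vector<double> prototype_probabilities(
    const Trajectory& partial, const std::vector<Trajectory>& prototypes,
    double scale = 1.0) {
  std::vector<double> probs;
  double total = 0.0;
  for (const Trajectory& prototype : prototypes) {
    const double weight = std::exp(-mean_distance(partial, prototype) / scale);
    probs.push_back(weight);
    total += weight;
  }
  for (double& p : probs) p /= total;  // normalize so the weights sum to one
  return probs;
}
```

- as more of the partial trajectory is observed, the distances to the wrong prototypes grow and their probabilities shrink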


#### Model Based Approaches

- you can think of model based solutions to the prediction problem as also having an "offline" and "online" component
- in that view, this approach requires:
  - defining process models (offline)
    - you saw how process models can vary in complexity:
      - from very simple... $\begin{bmatrix} \dot{s}\\ \dot{d} \end{bmatrix} = \begin{bmatrix} s_{0} \\ 0 \end{bmatrix} + \mathbf{w}$
      - to very complex... $\begin{bmatrix} \ddot{s} \\ \ddot{d} \\ \ddot{\theta} \end{bmatrix} = \begin{bmatrix} \dot{\theta}\dot{d} + a_s \\ -\dot{\theta}\dot{s} + \frac{2}{m}(F_{c,f}\cos\delta + F_{c,r}) \\ \frac{2}{I_z} (l_f F_{c,f} - l_rF_{c,r}) \end{bmatrix} + \mathbf{w}$
  - using process models to compare driver behavior to what would be expected for each model
    - process models are first used to compare a target vehicle's observed behavior to the behavior we would expect for each of the maneuvers we've created models for
    - the pictures below help explain how process models are used to calculate these likelihoods
     <img src="resources/process_models_likelihoods.jpg"/>
    - on the left we see two images of a car
      - at time $k-1$ we predicted where the car would be if it were to go straight vs go right
      - then at time $k$ we look at where the car actually is
      - the graph on the right shows the car's observed $s$ coordinate along with the probability distributions for where we expected the car to be at that time
      - in this case, the $s$ that we observe is substantially more consistent with turning right than going straight
  - probabilistically classifying driver intent by comparing the likelihoods of various behaviors with a multiple-model algorithm
    - in the image above you can see a bar chart representing probabilities of various clusters over time
    - multiple model algorithms serve a similar purpose for model based approaches: they are responsible for maintaining beliefs for the probability of each maneuver
    - the algorithm we discussed is called the Autonomous Multiple Model algorithm (AMM) and it can be summarized with this equation: $\large \mu_k^{(i)} = \frac{\mu_{k-1}^{(i)}L_k^{(i)}}{\sum_{j=1}^M\mu_{k-1}^{(j)}L_k^{(j)}}$
    - or, if we ignore the denominator (since it just serves to normalize the probabilities), we can capture the essence of this algorithm with $\mu_k^{(i)} \propto \mu_{k-1}^{(i)}L_k^{(i)}$, where $\mu_k^{(i)}$ is the probability that model number $i$ is the correct model at time $k$ and $L_k^{(i)}$ is the likelihood for that model (as computed by comparison to the process model)
  - extrapolating process models to generate trajectories
    - trajectory generation is straightforward once we have a process model
    - we simply iterate our model over and over until we've generated a prediction that spans whatever time horizon we are supposed to cover
    - note that each iteration of the process model will necessarily add uncertainty to our prediction

# Overview of Hybrid Approaches

- so far you have seen that prediction can be done with model based or data driven approaches
- you learned that model based approaches incorporate our knowledge of the object's motion dynamics with process models and handle uncertainty on maneuvers using multimodal estimators, and you have seen that there are many ways to implement model based approaches
- in learning about data driven approaches you saw that there are many versions of "data driven" approaches and that these approaches can extract subtle patterns from training data which means they can produce predictions which are very well tailored to a specific driving situation


- in practice, the best way to do prediction is often by taking a hybrid approach that takes advantage of the strengths of both types of approaches
- remember earlier when we talked about how model based approaches combine process models with a multimodal estimator?
- well, the multimodal estimator could be replaced with a machine learning approach
- to replace that component with a machine learning approach, the type of algorithm we need is a classifier

# Intro to Naive Bayes

- one strategy that is often used in hybrid approaches for behavior classification is to combine a classifier with a filter
- for now we will introduce a simple behavior classifier, Naive Bayes

<img src="resources/naive_bayes.png"/>

- let's walk through how Naive Bayes works by using an example
- we are going to reason about gender using statistics of two feature variables: height and weight
- as an output we want the probability that a person is male or female given their height and weight
- in a Naive Bayes classifier, the probability of being male given height and weight is the probability of that height given male, times the probability of that weight given male, times the prior probability of being male, divided by the probability of that height and weight in the overall population; writing $h$ for height and $w$ for weight: $P(\text{male} \mid h, w) = \frac{p(h \mid \text{male})\, p(w \mid \text{male})\, P(\text{male})}{p(h, w)}$
- the reason why this algorithm is called Naive Bayes is that it assumes that all features contribute independently while in reality there is correlation between height and weight in people
- in practice the independence assumption often works well enough anyway
- the equation above can be simplified: this term (red), the denominator, affects the male and female probabilities in the same way, so it is just a normalization factor
  - this means we can first compute the numerator for both male and female, and then obtain the final probability of male given height and weight and the probability of female given height and weight by normalizing the two results so that they sum to one


- the problem is all about finding these terms
- often, we can assume a Gaussian distribution for the feature variables
- if you make this assumption the algorithm is then called Gaussian Naive Bayes
- so in practice, implementing a good Gaussian Naive Bayes classifier is all about:
  - selecting the correct feature variables for the classification problem
    - this is where we can use some human intuition combined with feature selection algorithms to anticipate what factors are relevant for a given classification situation
    - eye color for example would not be very useful in predicting gender
  - identifying some good means and variances for different classes
    - we can either guess these numbers or we can look at lots of data to learn them
    - for example, if you have access to lots of data about how drivers handle intersections and you define some good features that indicate the intentions of drivers, you could use Naive Bayes to compute the probability of each behavior at each time step
    - for the trajectory prediction part you could use one of the process models we talked about earlier


- **Q:** A car on a highway is approaching an exit ramp. We want to classify the driver's intent as "go straight" or "exit right". Which of the following state variables would be least useful to this classification?
- **A:** The $s$ coordinate will not help in distinguishing between these two behaviors.

# Implement Naive Bayes

- in this exercise you will implement a Gaussian Naive Bayes classifier to predict the behavior of vehicles on a highway
- in the image below you can see the behaviors you'll be looking for on a 3 lane highway (with lanes of 4 meter width)
- the dots represent the $d$ (y axis) and $s$ (x axis) coordinates of vehicles as they either:
  - change lanes left (shown in blue)
  - keep lane (shown in black)
  - change lanes right (shown in red)

<img src="resources/naive_bayes_highway.png"/>

- your job is to write a classifier that can predict which of these three maneuvers a vehicle is engaged in given a single coordinate (sampled from the trajectories shown above)
- each coordinate contains 4 features:
  - $s$, $d$, $\dot{s}$, $\dot{d}$
- you also know the lane width is 4 meters (this might be helpful in engineering additional features for your algorithm)

### Instructions

- implement the `train(data, labels)` method in the class `GNB` in `classifier.cpp`
  - training a Gaussian Naive Bayes classifier consists of computing and storing the mean and standard deviation from the data for each label/feature pair
    - for example, given the label "change lanes left" and the feature $\dot{s}$, it would be necessary to compute and store the mean and standard deviation of $\dot{s}$ over all data points with the "change lanes left" label
  - additionally, it will be convenient in this step to compute and store the prior probability $p(C_k)$ for each label $C_k$
    - this can be done by keeping track of the number of times each label appears in the training data
- implement the `predict(observation)` method in `classifier.cpp`
  - given a new data point, prediction requires two steps:
    - compute the conditional probabilities for each feature/label combination
      - for a feature $x$ and label $C$ with mean $\mu$ and standard deviation $\sigma$ (computed in training), the conditional probability can be computed using the formula [here](https://en.wikipedia.org/wiki/Naive_Bayes_classifier#Gaussian_naive_Bayes): $p(x = v | C) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(v-\mu)^2}{2\sigma^2}\right)$
        - here $v$ is the value of feature $x$ in the new data point
    - use the conditional probabilities in a Naive Bayes classifier
      - this can be done using the formula [here](https://en.wikipedia.org/wiki/Naive_Bayes_classifier#Constructing_a_classifier_from_the_probability_model): $y = \underset{k\in \{1,\ldots, K\}}{\operatorname{argmax}} \,\,p(C_k) \prod^n_{i=1}p(x_i = v_i| C_k)$
        - in this formula, the argmax is taken over all possible labels $C_k$ and the product is taken over all features $x_i$ with values $v_i$
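- one possible shape for the two methods is sketched below; this is not the reference solution, and the internal `ClassStats` layout, the `const` reference signatures, the small floor added to each standard deviation, and the use of log-probabilities (which gives the same argmax as the product but avoids underflow) are all choices made for this sketch:

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <map>
#include <string>
#include <vector>

class GNB {
 public:
  // Stores the prior plus per-feature mean and standard deviation per label.
  void train(const std::vector<std::vector<double>>& data,
             const std::vector<std::string>& labels) {
    std::map<std::string, std::vector<std::vector<double>>> rows_by_label;
    for (std::size_t i = 0; i < data.size(); ++i) {
      rows_by_label[labels[i]].push_back(data[i]);
    }
    stats_.clear();
    for (const auto& entry : rows_by_label) {
      const std::vector<std::vector<double>>& rows = entry.second;
      const std::size_t n_features = rows.front().size();
      const double count = static_cast<double>(rows.size());
      ClassStats cs;
      cs.prior = count / static_cast<double>(data.size());
      cs.mean.assign(n_features, 0.0);
      cs.stddev.assign(n_features, 0.0);
      for (const auto& row : rows) {
        for (std::size_t f = 0; f < n_features; ++f) cs.mean[f] += row[f] / count;
      }
      for (const auto& row : rows) {
        for (std::size_t f = 0; f < n_features; ++f) {
          const double diff = row[f] - cs.mean[f];
          cs.stddev[f] += diff * diff / count;  // accumulates the variance
        }
      }
      // Convert variance to standard deviation; the floor avoids sigma = 0.
      for (double& s : cs.stddev) s = std::sqrt(s) + 1e-9;
      stats_[entry.first] = cs;
    }
  }

  // Returns the label maximizing log p(C) + sum_i log p(x_i = v_i | C).
  std::string predict(const std::vector<double>& observation) const {
    const double two_pi = 2.0 * std::acos(-1.0);
    std::string best_label;
    double best_score = std::numeric_limits<double>::lowest();
    for (const auto& entry : stats_) {
      const ClassStats& cs = entry.second;
      double score = std::log(cs.prior);
      for (std::size_t f = 0; f < observation.size(); ++f) {
        const double var = cs.stddev[f] * cs.stddev[f];
        const double diff = observation[f] - cs.mean[f];
        // Log of the Gaussian density for feature f under this label.
        score += -0.5 * std::log(two_pi * var) - diff * diff / (2.0 * var);
      }
      if (score > best_score) {
        best_score = score;
        best_label = entry.first;
      }
    }
    return best_label;
  }

 private:
  struct ClassStats {
    double prior;
    std::vector<double> mean;
    std::vector<double> stddev;
  };
  std::map<std::string, ClassStats> stats_;
};
```

- the sketch assumes every row has the same number of features and that `data` and `labels` have the same length; a real implementation would want to validate both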

- you are welcome to use some existing implementation of a Gaussian Naive Bayes classifier
- but to get the best results you will still need to put some thought into what features you provide the algorithm when classifying
- though you will only be given the 4 coordinates listed above, you may get better performance by "engineering" additional features
  - for example: the raw value of the $d$ coordinate may not be that useful
  - but `d % lane_width` might be helpful since it gives the relative position of a vehicle in its lane regardless of which lane the vehicle is in
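- in C++ the `%` operator only works on integers, so for a floating-point $d$ the same idea needs `std::fmod`; a minimal sketch, assuming the feature order $s$, $d$, $\dot{s}$, $\dot{d}$ and a hypothetical helper name `engineer_features`:

```cpp
#include <cmath>
#include <vector>

// Maps [s, d, s_dot, d_dot] to [s, d mod lane_width, s_dot, d_dot] so the
// lateral feature describes position within a lane, not which lane.
std::vector<double> engineer_features(const std::vector<double>& coordinate,
                                      double lane_width = 4.0) {
  std::vector<double> features = coordinate;
  features[1] = std::fmod(coordinate[1], lane_width);
  return features;
}
```

- the same transformation has to be applied both to the training data and to every observation passed to `predict`, otherwise the stored means and variances describe a different feature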


- helpful resources
  - [sklearn documentation on GaussianNB](http://scikit-learn.org/stable/modules/naive_bayes.html#gaussian-naive-bayes)
  - [Wikipedia article on Naive Bayes / GNB](https://en.wikipedia.org/wiki/Naive_Bayes_classifier#Gaussian_naive_Bayes)


- `Nd013_Pred_Data` has all the training and testing data for this problem

- code is available in `code/01_implement_naive_bayes/`

# Conclusion

- in this lesson, you learned about the prediction problem and the two main classes of solutions, model based solutions and data driven solutions
- you also learned about one particular hybrid approach which uses a Gaussian Naive Bayes classifier to predict driver behavior
- there isn't any one correct approach--that's part of what makes the prediction problem so fundamentally difficult


- in fact, in this lesson we actually made a simplifying assumption by only considering one object at a time
- if you were to take into account multiple objects, then you would also have to take into account interactions between those objects
  - these interactions can get very complex, very quickly
- fortunately, even when you only consider one object at a time, you can still make useful predictions
  - this is especially true during highway driving which is the situation you'll be dealing with in the final project