# Writeup: Track 3D-Objects Over Time - Midterm Project


This is the midterm project for Sensor Fusion and Tracking, the second course in the [Udacity Self-Driving Car Engineer Nanodegree Program](https://www.udacity.com/course/c-plus-plus-nanodegree--nd213).

In this project, real-world 3D point clouds from the [Waymo Open Dataset](https://console.cloud.google.com/storage/browser/waymo_open_dataset_v_1_2_0_individual_files) are used for lidar-based object detection.


## Project Sections

The project is divided into four sections:

1. Compute Lidar Point-Cloud from Range Image 
-  'ID_S1_EX1': Visualize range image channels
-  'ID_S1_EX2': Visualize lidar point-cloud

2. Create Birds-Eye View from Lidar PCL 
- 'ID_S2_EX1': Convert sensor coordinates to BEV-map coordinates
- 'ID_S2_EX2': Compute intensity layer of the BEV map
- 'ID_S2_EX3': Compute height layer of the BEV map

3. Model-based Object Detection in BEV Image 
- 'ID_S3_EX1': Add a second model from a GitHub repo
- 'ID_S3_EX2': Extract 3D bounding boxes from model response

4. Performance Evaluation for Object Detection 
- 'ID_S4_EX1': Compute intersection-over-union between labels and detections
- 'ID_S4_EX2': Compute false-negatives and false-positives
- 'ID_S4_EX3': Compute precision and recall


To run the project:

1. refer to the README.md file for all the requirements (libraries, dataset)


2. run the `loop_over_dataset.py` as follows:

```
python3 loop_over_dataset.py
```
In `loop_over_dataset.py`, you can run the `ID_S*_EX*` sections separately by selecting them at line 87.

All corresponding code for this project can be found in the `student` directory.
The project has been run locally on a 2021 M1 MacBook Pro.



## Project recap and analysis

Let's go through the project sections with a closer look at:

- Finding and displaying 10 examples of vehicles with varying degrees of visibility in the point-cloud
- Identifying features that appear stable on most vehicles (e.g. rear bumper, tail lights), describing them briefly, and using the range image viewer with the lidar intensity channel to support the findings.

### Section 1. Compute Lidar Point-Cloud from Range Image 

This section includes the following steps in order to visualize the range-intensity images and the 3D point clouds:
-  'ID_S1_EX1': Visualize range image channels
-  'ID_S1_EX2': Visualize lidar point-cloud

We start by retrieving the lidar data and range images of the roof-mounted lidar from the dataset. We then convert two channels (range and intensity) to 8-bit scale, normalizing the intensity channel between its 1st and 99th percentiles in order to discard outliers. Finally, we stack the range and intensity channels vertically to visualize the range image.
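The conversion described above can be sketched as follows. This is a minimal illustration rather than the project code: the function names are hypothetical, the inputs are assumed to be 2D NumPy arrays holding the range and intensity channels, and negative entries are assumed to mark missing returns.

```python
import numpy as np

def to_8bit_range(ri_range):
    """Scale the range channel linearly to 0..255 (hypothetical helper)."""
    ri = ri_range.astype(np.float64)  # astype returns a copy
    ri[ri < 0] = 0.0  # assumption: negative values mean "no return"
    span = ri.max() - ri.min()
    if span == 0:
        return np.zeros(ri.shape, dtype=np.uint8)
    return ((ri - ri.min()) / span * 255.0).astype(np.uint8)

def to_8bit_intensity(ri_intensity, lo_pct=1.0, hi_pct=99.0):
    """Clip intensity to its [lo_pct, hi_pct] percentiles, then scale to 0..255."""
    ri = ri_intensity.astype(np.float64)
    ri[ri < 0] = 0.0
    lo, hi = np.percentile(ri, [lo_pct, hi_pct])
    if hi == lo:
        return np.zeros(ri.shape, dtype=np.uint8)
    clipped = np.clip(ri, lo, hi)  # outliers collapse onto the percentile bounds
    return ((clipped - lo) / (hi - lo) * 255.0).astype(np.uint8)

def stack_range_intensity(ri_range, ri_intensity):
    """Stack the 8-bit range image on top of the 8-bit intensity image."""
    return np.vstack((to_8bit_range(ri_range), to_8bit_intensity(ri_intensity)))
```

Clipping before scaling keeps a few very bright returns from compressing the rest of the intensity image into the darkest gray levels.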

<img src="img/writeup-midterm/range_image.png"/>

Then we visualize the lidar point cloud using the Open3D library.
This is the starting perspective:

<img src="img/writeup-midterm/upperview-pcd.png"/>


I've modified `loop_over_dataset.py` and `object_pcl.py` in the ID_S1_EX1 part in order to display the complete range-intensity image as well as its FRONT and LEFT portions.
By zooming into the point cloud, we can find the corresponding areas and compare the two visualizations, as shown below:

<img src="img/writeup-midterm/range_image_total.png"/>
<img src="img/writeup-midterm/pcd_front.png"/>
<img src="img/writeup-midterm/pcd_left.png"/>


Looking at the range-intensity images, it is clear that the intensity channel is very sensitive to reflective vehicle parts such as license plates, tail lights and headlights. It also picks out lane markings well.

<img src="img/writeup-midterm/range_image_boxes.png"/>

On the other hand, 3D point clouds capture the 3D shapes of objects and their distinctive traits. In the images below, we can see how well windshields, wheels, side mirrors and the general vehicle shape are captured by the 3D point cloud.

<img src="img/writeup-midterm/windshields.png"/>
<img src="img/writeup-midterm/windshields_and_wheels.png"/>

As seen during the course, an important aspect of the point cloud is how multiple signal returns are handled. For the sake of the course, the `ri_return2` data from the Waymo dataset are not used, but it would be very interesting to analyze how the empty areas of the point cloud change when these data are included.


### Section 2. Create Birds-Eye View from Lidar PCL 

This section includes the following steps in order to create and visualize the bird's-eye view (BEV) map:
- 'ID_S2_EX1': Convert sensor coordinates to BEV-map coordinates
- 'ID_S2_EX2': Compute intensity layer of the BEV map
- 'ID_S2_EX3': Compute height layer of the BEV map

In order to perform object detection, we consider projection-based approaches, which reduce the dimensionality of the 3D point cloud along a specified dimension. One of the most widely used representations is the bird's-eye view (BEV) map, a top-down 2D projection of the 3D point cloud that retains much of the information needed for behavior prediction and planning tasks.

The BEV map has the following advantages:
- The objects of interest are located on the same plane as the sensor-equipped vehicle, with only little variance.
- The BEV projection preserves the physical size and the proximity relations between objects, separating them more clearly than either the front-view (FV) or the range-view (RV) projection.

The BEV map is obtained by compacting the point cloud along the upward-facing axis (the z-axis in the Waymo vehicle coordinate system). The BEV is divided into a grid of equally sized cells, which enables us to treat it as an image in which each pixel corresponds to a region on the road surface. [Source: course notes]
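The mapping from metric sensor coordinates to BEV grid cells can be sketched as below. The detection range (`lim_x`, `lim_y`) and the grid size (`bev_size`) are illustrative assumptions, not values confirmed by this writeup, and `to_bev_cells` is a hypothetical helper name.

```python
import numpy as np

def to_bev_cells(points, lim_x=(0.0, 50.0), lim_y=(-25.0, 25.0), bev_size=608):
    """Map (x, y, ...) points in meters to integer BEV (row, col) cells.

    Returns the cell indices and the points that fall inside the limits.
    """
    pts = np.asarray(points, dtype=np.float64)
    keep = (
        (pts[:, 0] >= lim_x[0]) & (pts[:, 0] < lim_x[1])
        & (pts[:, 1] >= lim_y[0]) & (pts[:, 1] < lim_y[1])
    )
    pts = pts[keep]
    # Cells per meter along each axis.
    scale_x = bev_size / (lim_x[1] - lim_x[0])
    scale_y = bev_size / (lim_y[1] - lim_y[0])
    # Shift so that the lower limit maps to cell 0, then discretize.
    rows = np.floor((pts[:, 0] - lim_x[0]) * scale_x).astype(int)
    cols = np.floor((pts[:, 1] - lim_y[0]) * scale_y).astype(int)
    # Guard against floating-point rounding at the upper border.
    rows = np.clip(rows, 0, bev_size - 1)
    cols = np.clip(cols, 0, bev_size - 1)
    return np.column_stack((rows, cols)), pts
```

With these assumed values, each cell covers roughly 8 cm x 8 cm of road surface (50 m / 608 cells).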

Below: the BEV map from a frame in our dataset

<img src="img/writeup-midterm/bev_map.png"/>

Next, we want to fill the "intensity" channel of the BEV map with data from the point cloud. To do so, we must identify all points that fall into the same (x,y) cell of the BEV map and then assign the intensity value of the top-most lidar point to the respective BEV pixel. We must also normalize the resulting intensity image using percentiles, so that the influence of outlier values (very bright and very dark regions) is sufficiently mitigated and objects of interest (e.g. vehicles) are clearly separated from the background.
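One way to select the top-most point per cell is to sort by cell and descending height, then keep the first point of each cell. The sketch below assumes the points have already been mapped to integer `(row, col)` cells; the function names and the 1st/99th percentile bounds are illustrative assumptions.

```python
import numpy as np

def topmost_per_cell(cells, z, intensity):
    """Return each occupied cell once, with the intensity of its highest point."""
    cells = np.asarray(cells)
    z = np.asarray(z, dtype=np.float64)
    intensity = np.asarray(intensity, dtype=np.float64)
    # np.lexsort treats the LAST key as primary: row, then col, then -z.
    order = np.lexsort((-z, cells[:, 1], cells[:, 0]))
    cells_sorted = cells[order]
    intensity_sorted = intensity[order]
    # The first occurrence of each (row, col) is now its highest point.
    _, first = np.unique(cells_sorted, axis=0, return_index=True)
    return cells_sorted[first], intensity_sorted[first]

def intensity_layer(cells, z, intensity, bev_size=608, lo_pct=1.0, hi_pct=99.0):
    """Build a BEV intensity image normalized to [0, 1] between two percentiles."""
    top_cells, top_int = topmost_per_cell(cells, z, intensity)
    lo, hi = np.percentile(top_int, [lo_pct, hi_pct])
    if hi > lo:
        norm = (np.clip(top_int, lo, hi) - lo) / (hi - lo)
    else:
        norm = np.zeros_like(top_int)
    layer = np.zeros((bev_size, bev_size))
    layer[top_cells[:, 0], top_cells[:, 1]] = norm
    return layer
```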

Below: the intensity layer of the BEV map

<img src="img/writeup-midterm/bev_intensity.png"/>

As we can see, the intensity layer alone does not make the point cloud easy to interpret. We therefore also fill the "height" channel of the BEV map with data from the point cloud, as shown below:

<img src="img/writeup-midterm/bev_map_final.png"/>

### Section 3. Model-based Object Detection in BEV Image

Now, within the BEV map we can draw the bboxes of the objects from the ground truth labels which are present in the dataset. 

Before the detections can move along in the processing pipeline, they need to be converted into metric coordinates in vehicle space. This task is about performing this conversion such that all detections have the format [1, x, y, z, h, w, l, yaw], where 1 denotes the class id for the object type vehicle.
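A minimal sketch of this conversion is shown below. The input layout `(bev_x, bev_y, z, h, bev_w, bev_l, yaw)`, the axis convention (BEV rows along the vehicle x-axis, columns along the y-axis), and the range and grid values are all assumptions made for illustration; the actual model output may be organized differently.

```python
def bev_detection_to_metric(det, lim_x=(0.0, 50.0), lim_y=(-25.0, 25.0), bev_size=608):
    """Convert one BEV-pixel detection to [1, x, y, z, h, w, l, yaw] in meters.

    det = (bev_x, bev_y, z, h, bev_w, bev_l, yaw), where bev_* are in pixels
    (hypothetical layout).
    """
    bev_x, bev_y, z, h, bev_w, bev_l, yaw = det
    range_x = lim_x[1] - lim_x[0]
    range_y = lim_y[1] - lim_y[0]
    # Assumption: image rows (bev_y) run along vehicle x, columns (bev_x) along y.
    x = bev_y / bev_size * range_x + lim_x[0]
    y = bev_x / bev_size * range_y + lim_y[0]
    w = bev_w / bev_size * range_y
    l = bev_l / bev_size * range_x
    # The yaw sign convention depends on the model; it is passed through unchanged here.
    return [1, x, y, z, h, w, l, yaw]
```

For example, under these assumptions a detection centered at pixel (304, 304) maps to x = 25 m, y = 0 m, i.e. straight ahead of the vehicle in the middle of the detection range.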


Next, we detect objects by loading the pretrained model from [Super Fast and Accurate 3D Object Detection based on 3D LiDAR Point Clouds (SFA3D)](https://github.com/maudzung/SFA3D).

The network has the following architecture:

- ResNet-based Keypoint Feature Pyramid Network (KFPN) 

- Input: a bird's-eye view (BEV) map, encoded by the height, intensity, and density of the 3D lidar point cloud. Assume that the size of the BEV input is (H, W, 3).

- Outputs: Heatmap for main center with a size of (H/S, W/S, C) where S=4 (the down-sample ratio), and C=3 (the number of classes) 

- Objects: Cars, Pedestrians, Cyclists, but we'll perform the detection for the Cars/Vehicles class

Below: the detection results on the SFA3D pretrained model

<img src="img/writeup-midterm/detected.png"/>

### Section 4. Performance Evaluation for Object Detection 

In this section we finally assess the performance of the vehicle detection on the dataset.

In order to do so, we perform the following steps:
- 'ID_S4_EX1': Compute intersection-over-union between labels and detections
- 'ID_S4_EX2': Compute false-negatives and false-positives
- 'ID_S4_EX3': Compute precision and recall
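The metrics in these steps can be illustrated with a small sketch. Note the simplification: the IoU below is for axis-aligned boxes only and ignores yaw, whereas the detections in this project are rotated boxes; the function names are hypothetical.

```python
def iou_axis_aligned(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def precision_recall(true_pos, false_pos, false_neg):
    """precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    detected = true_pos + false_pos
    labeled = true_pos + false_neg
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / labeled if labeled else 0.0
    return precision, recall
```

A detection counts as a true positive when its IoU with a label exceeds a chosen threshold; labels without a matching detection are false negatives, and detections without a matching label are false positives.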

The charts below represent our results:

**Please note**: we could not display the plots from the terminal because of system limitations on the M1 MacBook. To display them, we run `loop_over_dataset.py` with `ex=ID_S4_EX3` set at line 87. The script creates a `data.json` file containing the Python dictionary with our metric results. Running the script `display_charts.py` from a Python IDE then displays the plots shown below.

<img src="img/writeup-midterm/performance.png"/>


# Writeup: Sensor Fusion and Object Tracking - Final Project

This is the final project for Sensor Fusion and Tracking, the second course in the [Udacity Self-Driving Car Engineer Nanodegree Program](https://www.udacity.com/course/c-plus-plus-nanodegree--nd213).

The final project builds on the midterm project code and consists of four main steps:

<img src="img/writeup-final/track.png"/>


## Project Sections

Step 1: Implement EKF to track a single real-world target with lidar
'F_ID_S1'

Step 2: Implement the track management to initialize and delete tracks, set a track state and a track score
'F_ID_S2'

Step 3: Implement a single nearest neighbor data association to associate measurements to tracks (multitarget tracking)
'F_ID_S3'

Step 4: Implement the nonlinear camera measurement model
'F_ID_S4'


To run the project:

1. refer to the README.md file for all the requirements (libraries, dataset)


2. run the `loop_over_dataset.py` as follows (using `pythonw` from the MacBook terminal):

```
pythonw loop_over_dataset.py
```
In `loop_over_dataset.py`, you can run the `F_ID` sections separately by selecting them at line 103.

All corresponding code for this project can be found in the `student` directory.
The project has been run locally on a 2021 M1 MacBook Pro.

## Project recap and analysis

Let's walk through the project steps

### Step 1. Implement EKF to track a single real-world target with lidar

This task involves writing code within the file `student/filter.py` in order to implement an Extended Kalman Filter to track the objects detected from our lidar sensor.

The EKF algorithm involves the definition of the tracking problem in the state-space form as follows:

1. The problem is defined by a linear motion model with the state vector x = (px, py, pz, vx, vy, vz). Thus, the system matrix F has dimension (6, 6).

2. The vehicles are modeled with constant velocity.

3. Process noise, arising from this simplified model of the vehicle dynamics, is assumed to be random with zero mean and covariance matrix Q. The larger the entries of Q, the more acceleration and braking we expect in the motion scenario. Here, by default, `params.q = 3 m/s^2`, which describes a normal highway driving scenario.

4. Measurement noise is assumed to have zero mean and covariance matrix R. The values of R depend on the sensor calibration; here they are `params.sigma_lidar = 0.1` for the lidar sensor and `params.cam = 5` for the camera sensor.

5. The tracking results are evaluated via the RMSE.
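For a constant-velocity model with this state vector, F and Q can be written down directly. The sketch below uses one common discretization of Q; the time step `dt = 0.1` s is an assumption made for the example (it is not stated in this writeup), and the function names are hypothetical.

```python
import numpy as np

def system_matrix(dt):
    """6x6 constant-velocity system matrix: position += velocity * dt."""
    F = np.eye(6)
    F[:3, 3:] = dt * np.eye(3)
    return F

def process_noise(dt, q):
    """6x6 process noise covariance for the constant-velocity model.

    One common discretization, obtained by integrating a continuous
    white-noise acceleration with intensity q over the time step dt.
    """
    q3 = dt ** 3 / 3.0 * q
    q2 = dt ** 2 / 2.0 * q
    q1 = dt * q
    I = np.eye(3)
    return np.block([[q3 * I, q2 * I],
                     [q2 * I, q1 * I]])

F = system_matrix(dt=0.1)          # dt = 0.1 s is an assumed frame interval
Q = process_noise(dt=0.1, q=3.0)   # q = 3 as in params.q
```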


In this step we'll define the `update()` and `predict()` functions in order to complete the Kalman filter.
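A minimal sketch of the two functions for the linear lidar case is shown below. It is an illustration under stated assumptions, not the code in `student/filter.py`: the signatures are hypothetical, the state is a 1D NumPy array, and the lidar is assumed to measure the position (px, py, pz) directly.

```python
import numpy as np

# Assumed lidar measurement matrix: picks the position out of the 6D state.
LIDAR_H = np.hstack((np.eye(3), np.zeros((3, 3))))

def predict(x, P, F, Q):
    """Propagate the state estimate and covariance one time step ahead."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

def update(x, P, z, H, R):
    """Correct the prediction with a measurement z (linear measurement model)."""
    gamma = z - H @ x                    # residual
    S = H @ P @ H.T + R                  # residual covariance
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    x_new = x + K @ gamma
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new
```

For the camera, the residual would use the nonlinear function h(x) instead of `H @ x`, with H replaced by the Jacobian evaluated at the current state.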

We run the filter on a single track; these are the RMSE results for the tracking (note that the plot was saved after updating the standard x and P values with the calculated ones).

As the plot shows, the RMSE is below 0.32, as expected.

<img src="img/writeup-final/step1-rmse.png"/>


### Step 2: Implement the track management to initialize and delete tracks, set a track state and a track score

This second task involves the completion of some functions inside the `student/trackmanagement.py` file.

The track management system allows the EKF to perform tracking of multiple objects simultaneously, handling the current tracks, the vanishing ones and the new tracks in an effective way.

Each tracked object is assigned a track ID, a track score and a track state at each frame. The track state can take the values 'initialized', 'tentative' and 'confirmed'.

After a track is initialized, the track-measurement pairing process determines its score and its state. Tracks with low scores, high uncertainty or no further measurements are likely to be removed.

The track management system is also able to deal with false positives (clutter or ghost tracks) and false negatives (occlusions).
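The score and state logic can be sketched as follows. All thresholds, the window length and the function names are hypothetical values chosen for illustration; they are not taken from `student/trackmanagement.py`.

```python
from dataclasses import dataclass

# Hypothetical parameters, for illustration only.
WINDOW = 6                    # number of frames the score is averaged over
CONFIRMED_THRESHOLD = 0.8     # score above which a track is 'confirmed'
DELETE_THRESHOLD = 0.6        # confirmed tracks below this score are deleted
MAX_POSITION_VARIANCE = 9.0   # delete any track more uncertain than this

@dataclass
class Track:
    track_id: int
    score: float = 1.0 / WINDOW
    state: str = "initialized"

def on_associated(track):
    """A measurement was assigned to the track: raise score, update state."""
    track.score = min(1.0, track.score + 1.0 / WINDOW)
    track.state = "confirmed" if track.score > CONFIRMED_THRESHOLD else "tentative"

def on_missed(track):
    """The track was in the sensor's view but received no measurement."""
    track.score = max(0.0, track.score - 1.0 / WINDOW)

def should_delete(track, position_variance):
    """Delete very uncertain tracks and confirmed tracks whose score dropped."""
    if position_variance > MAX_POSITION_VARIANCE:
        return True
    return track.state == "confirmed" and track.score < DELETE_THRESHOLD
```

In this sketch a ghost track never collects enough associations to be confirmed and is eventually dropped once its uncertainty grows, while a briefly occluded confirmed track survives a missed frame before its score falls below the deletion threshold.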

Below, the RMSE plot of a single track which has been deleted from the tracking list after some frames:

<img src="img/writeup-final/step2-rmse.png"/>

### Step 3: Implement a single nearest neighbor data association to associate measurements to tracks (multitarget tracking)

Data association is a key step of the tracking process: it assigns each measurement to the track that most likely represents the measured object.

In this first commit, we use the SNN (single nearest neighbor) algorithm to perform the association. Note that this algorithm, while simple, makes greedy choices and can therefore end up with an assignment that is not globally optimal. More sophisticated algorithms such as GNN and JPDA will be tested in the Improvement section.

SNN computes the Mahalanobis distance for each track-measurement pair, thus populating the association matrix A. Gating is also applied in order to reduce the computational cost.
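The association step can be sketched as below for the linear lidar case. This is an illustration, not the code in `student/association.py`: the function names are hypothetical, tracks are passed as `(x, P)` tuples, and the gate is hardcoded to an approximate 0.995 chi-square quantile for 3 degrees of freedom, which is an assumed gating probability.

```python
import numpy as np

# Approximate chi-square 0.995 quantile for 3 degrees of freedom (assumed gate).
GATE_3DOF = 12.838

def mahalanobis_sq(x, P, z, H, R):
    """Squared Mahalanobis distance between a track and a measurement."""
    gamma = z - H @ x
    S = H @ P @ H.T + R
    return float(gamma @ np.linalg.solve(S, gamma))

def associate_nearest_neighbor(tracks, measurements, H, R, gate=GATE_3DOF):
    """Greedy nearest-neighbor association.

    tracks: list of (x, P) tuples; measurements: list of vectors z.
    Returns a list of (track_index, measurement_index) pairs.
    """
    # Association matrix: infinity marks pairs that fail the gate.
    A = np.full((len(tracks), len(measurements)), np.inf)
    for i, (x, P) in enumerate(tracks):
        for j, z in enumerate(measurements):
            d = mahalanobis_sq(x, P, z, H, R)
            if d < gate:
                A[i, j] = d
    pairs = []
    # Repeatedly take the smallest remaining entry, then remove its row and column.
    while np.isfinite(A).any():
        i, j = np.unravel_index(np.argmin(A), A.shape)
        pairs.append((int(i), int(j)))
        A[i, :] = np.inf
        A[:, j] = np.inf
    return pairs
```

Measurements that end up in no pair can be used to initialize new tracks, and tracks that end up in no pair have their score decreased.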

You'll find this step's code in the `student/association.py` file.

Below, the RMSE plot of this multi-track step, which correctly tracks 3 confirmed objects 

<img src="img/writeup-final/step3-rmse.png"/>


### Step 4: Implement the nonlinear camera measurement model

So far, we have relied only on the lidar sensor data for our tracking process. We now introduce the camera data and check whether the results improve on the same dataset.

The camera model must handle the nonlinearity of the measurement function h(x), which arises from projecting the 3D position in vehicle space onto the 2D image plane. The function h(x) is therefore linearized via a first-order Taylor expansion, which is the core of the Extended Kalman Filter approach.

Because the state and the measurement are multivariate, this linearization takes the form of the measurement Jacobian Hj(x), which must be calculated.
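The projection and its Jacobian can be sketched with a pinhole model. The axis convention (x pointing forward along the optical axis, y to the left, z up), the intrinsics `f_i`, `f_j`, `c_i`, `c_j`, and the function names are assumptions made for this example; the position is assumed to be already transformed into camera coordinates.

```python
import numpy as np

def camera_hx(pos, f_i, f_j, c_i, c_j):
    """Project a 3D position (camera coordinates) to image coordinates (i, j)."""
    x, y, z = pos
    if x <= 0:
        raise ValueError("point must lie in front of the camera (x > 0)")
    return np.array([c_i - f_i * y / x,
                     c_j - f_j * z / x])

def camera_jacobian(pos, f_i, f_j):
    """2x6 Jacobian of camera_hx w.r.t. the state (px, py, pz, vx, vy, vz)."""
    x, y, z = pos
    if x <= 0:
        raise ValueError("point must lie in front of the camera (x > 0)")
    H = np.zeros((2, 6))
    H[0, 0] = f_i * y / x ** 2   # d(i)/d(px)
    H[0, 1] = -f_i / x           # d(i)/d(py)
    H[1, 0] = f_j * z / x ** 2   # d(j)/d(px)
    H[1, 2] = -f_j / x           # d(j)/d(pz)
    # The velocity columns stay zero: the image position does not depend on velocity.
    return H
```

The division by x is what makes the model nonlinear, and it is also why points at or behind the camera plane have to be rejected before the projection.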

The camera field of view is narrower than the lidar's, but as we can see below, the RMSE results improved after introducing the camera data (as we expected):

<img src="img/writeup-final/step4-rmse.png"/>

Below is an example of how the camera data helped in dealing with ghost tracks:

Ghost tracks without camera data:

<img src="img/writeup-final/ghost-lidar.png"/>


After introducing camera data, the faulty detection performed by lidar is not tracked:

<img src="img/writeup-final/ghost-camera.png"/>


## Writeup questions

#### Which part of the project was most difficult for you to complete, and why?
Following the exercises in class covered the majority of the steps in this final project. However, I spent most of the debugging time trying to define the association and camera model in the correct way.

#### Do you see any benefits in camera-lidar fusion compared to lidar-only tracking (in theory and in your concrete results)?
The strength of the sensor fusion approach is its capacity to get the best from each sensor used. Camera and lidar each have their own advantages and disadvantages, and combining different data sources generally improves the robustness and reliability of the system by providing a second check on false positives and false negatives, which in the lidar realm can be caused by very light-absorbent or reflective objects. In our results, the RMSE improved after adding the camera data, and a ghost track caused by a faulty lidar detection was no longer tracked (see Step 4).


#### Which challenges will a sensor fusion system face in real-life scenarios? Did you see any of these challenges in the project?
Using a state-space approach always implies creating a model of the real-world scenario: the more precise the model, the more accurate the output. In this project, we modeled the vehicles as constant-velocity objects and accounted for a modest amount of braking and acceleration through the process noise covariance matrix. Simple as it is, this model cannot effectively predict the complex behaviour of cars in a real urban environment (note also that we considered only vehicle detections, without tracking pedestrians, cyclists and so on). A more suitable motion model would likely increase the robustness of our tracking system.

#### Can you think of ways to improve your tracking results in the future?
There are many ways to improve our results, as also suggested in the project page:

- Parameter fine-tuning: e.g. we could apply the standard deviation values for lidar, which can be obtained from the 3D object detection in the midterm project, to the lidar measurement noise covariance R.
- Model choice: a more specific model (e.g. a nonlinear bicycle model) would describe the dynamics of our vehicles more effectively.
- Data association algorithm: our results should improve with a more sophisticated association algorithm such as GNN or JPDA.
- Better object detection: improving the object detection performance would make our system less prone to misclassification errors.
- Adding the object's width, length and height to the Kalman filter state.
- Varying dt: using the actual, varying time step between frames could give more accurate predictions than a constant dt.


