### Vehicle Detection and Tracking
___
The goal of this project is to detect vehicles in a video stream and draw bounding boxes around them. A classifier is trained to distinguish vehicles from non-vehicles, using the dataset provided by Udacity.

* __Feature Extraction and Training:__ Different types of signatures are extracted from the image to train the classifier.

* __Detect vehicles using sliding windows:__ The trained classifier is used to detect vehicles in a given frame with a sliding window approach.

* __Merge Multiple Detections:__ The sliding window approach produces multiple detections for the same object because the windows overlap. A heatmap is used to form a union of the detections within a region.




---

### Dataset Exploration:

The dataset consists of car images (8792) and non-car images (8968). The two classes are well balanced,
so class imbalance is not a concern when training.

#### Cars
<p align="center">
  <img src="car_tracking_report_img/cars.png" alt=""/>
</p>

#### Non Cars
<p align="center">
  <img src="car_tracking_report_img/non_cars.png" alt=""/>
</p>

---

### Feature Extraction:

The images in the dataset are RGB and 64x64 pixels in size. The following features are extracted from each image.

* __Histogram of Oriented Gradients (HOG)__
Different colour spaces and HOG parameters were explored before settling on the following HOG parameters.

```
orientations=10 | pixels_per_cell=(8, 8) | cells_per_block=2
```

##### Images

<p align="center">
  <img src="car_tracking_report_img/img.png" alt=""/>
</p>

##### Hog feature
<p align="center">
  <img src="car_tracking_report_img/hog.png" alt=""/>
</p>

The images are converted to the YUV colour space. HOG features are extracted from each channel individually and concatenated. The `hog` function from `skimage.feature` (scikit-image) is used to compute them.

```python
yuv = cvt_image_colour_space(img, 'YUV')
features = []
# Extract HOG features from each YUV channel separately
for channel in range(3):
    features.append(get_hog_features(yuv[:, :, channel], vis=False, orient=10, pix_per_cell=8))
```

Increasing `pixels_per_cell` improved the speed of feature extraction but did not improve classifier accuracy. The `get_hog_features` method is used to extract the features.
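
As a quick sanity check on the parameters above, the length of the HOG vector can be worked out by hand. The sketch below assumes the usual dense block layout, where blocks slide one cell at a time (as scikit-image's `hog` does when returning a flat feature vector). `hog_feature_length` is a hypothetical helper and is not part of the notebook.

```python
def hog_feature_length(img_size=64, orientations=10, pix_per_cell=8, cells_per_block=2):
    """Length of a HOG vector for one square channel, assuming blocks slide one cell at a time."""
    cells = img_size // pix_per_cell        # cells along one side
    blocks = cells - cells_per_block + 1    # block positions along one side
    return blocks * blocks * cells_per_block ** 2 * orientations


per_channel = hog_feature_length()
print(per_channel)      # 1960 values per channel
print(3 * per_channel)  # 5880 values for the three YUV channels
```

Under the same assumption, doubling `pix_per_cell` to 16 shrinks the vector to 360 values per channel, which is consistent with the faster extraction observed for larger cells.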

* __Colour Channel Histogram__

The histograms of the individual channels are concatenated into a single vector.
```
color_hist(img, nbins=32, bins_range=(0, 256))
```

<p align="center">
  <img src="car_tracking_report_img/image_histogram.png" alt=""/>
</p>
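
A minimal sketch of what such a function might look like, assuming NumPy and a 3-channel image. The signature mirrors the call shown above, but the body is an assumption and may differ from the notebook.

```python
import numpy as np


def color_hist(img, nbins=32, bins_range=(0, 256)):
    """Concatenate per-channel histograms of a multi-channel image (assumed body)."""
    hists = [
        np.histogram(img[:, :, ch], bins=nbins, range=bins_range)[0]
        for ch in range(img.shape[2])
    ]
    return np.concatenate(hists)


# Example on a random 64x64 image (stand-in for a dataset sample)
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
hist_features = color_hist(img)
print(hist_features.shape)  # (96,)
```

With 32 bins per channel, this contributes 96 values to the final feature vector.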

* __Spatial Binning__

To capture the locality of pixels, the images are reduced to 32x32 px and flattened into an array.
```
features = cv2.resize(feature_image, size).ravel() 
```


The `get_features()` method is used to extract the features from an image. It combines the following:
 
 * HOG features
 * Colour Histogram features
 * Spatial Binning features


---

### Training the Classifier



A `LinearSVC` is used as the classifier. Features are extracted from every image in the dataset and normalised with sklearn's `StandardScaler`. The normalised dataset is then split into training and test sets using `train_test_split`.

The implementation is in the attached notebook.
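
The training steps can be sketched as follows. This is not the notebook code: the feature arrays are synthetic stand-ins, and the split size and random seed are assumptions. Note that the sketch fits the scaler on the training split only, which differs from the order described above; it is a common way to keep test-set statistics out of the scaler.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for the extracted feature vectors (not the real dataset)
rng = np.random.default_rng(0)
car_features = rng.normal(loc=1.0, size=(200, 20))
non_car_features = rng.normal(loc=-1.0, size=(200, 20))

X = np.vstack([car_features, non_car_features])
y = np.hstack([np.ones(len(car_features)), np.zeros(len(non_car_features))])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler().fit(X_train)
clf = LinearSVC()
clf.fit(scaler.transform(X_train), y_train)

accuracy = clf.score(scaler.transform(X_test), y_test)
print(f"test accuracy: {accuracy:.3f}")
```

The fitted `scaler` has to be kept alongside the classifier, since features extracted from video frames must be transformed the same way before prediction.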

---

### Sliding windows

The classifier is trained on images of size 64x64 pixels. To detect cars in a given frame, windows of different sizes are moved across the region of interest within the image. A vehicle in the scene appears smaller the farther it is from the camera car. Larger windows are therefore used to pick up nearby vehicles, and smaller windows to pick up vehicles farther away.

When a window is not 64x64 pixels, the region it bounds has to be resized to 64x64 pixels before classification. As an implementation detail, instead of changing the window size, the image is resized to different scales and scanned with the same 64x64 window, which produces equivalent results.

The `find_cars(image,scale,y_scale,window_width=64)` method scales the input image down, slides a window across it, and returns all the detections within the image.

The window is moved with a stride of no less than one-third of the window width. The implementation can be found in the `find_cars` method.

<p align="center">
  <img src="car_tracking_report_img/sliding_windows.png" alt=""/>
</p>
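
The scaling idea can be illustrated with a small pure-Python sketch. `slide_windows` is a hypothetical helper, not the notebook's `find_cars`; the frame size, the `y_start`/`y_stop` search band, and the default stride of half a window are all assumptions chosen for illustration.

```python
def slide_windows(img_width, scale, y_start, y_stop, window=64, stride=None):
    """Return (x1, y1, x2, y2) boxes in original-image coordinates.

    The search band is treated as if it were shrunk by `scale`, scanned with a
    fixed `window`, and each position is mapped back by multiplying by `scale`.
    """
    if stride is None:
        stride = window // 2  # assumed default; at least a third of the window
    scaled_w = int(img_width / scale)
    scaled_h = int((y_stop - y_start) / scale)

    boxes = []
    for y in range(0, scaled_h - window + 1, stride):
        for x in range(0, scaled_w - window + 1, stride):
            boxes.append((
                int(x * scale),
                int(y * scale) + y_start,
                int((x + window) * scale),
                int((y + window) * scale) + y_start,
            ))
    return boxes


boxes = slide_windows(1280, scale=1.5, y_start=400, y_stop=656)
print(len(boxes), boxes[0])  # 100 (0, 400, 96, 496)
```

At a scale of 1.5, the fixed 64-pixel window covers 96 pixels of the original frame, so larger scales pick up closer vehicles.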



---

### Merging detections

Since the image is scanned multiple times with windows of different sizes, the same vehicle may be picked up several times. A heatmap is built by adding one to every pixel inside each detection, so regions where detections overlap accumulate higher values.

```python
for coordinate in coordinates:
  #(x1, y1, x2, y2)
  x1 = coordinate[0]
  y1 = coordinate[1]
  x2 = coordinate[2]
  y2 = coordinate[3]
            
  heatmap[y1:y2, x1:x2] += 1
```

The heatmap is thresholded, and the `label` function from `scipy.ndimage.measurements` is used to find connected groups and their bounding boxes. The implementation is in the `merge_detections` method.
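
The thresholding and labelling step might look roughly like the sketch below. `merge_boxes` is a hypothetical stand-in for `merge_detections`; the strict `>` comparison and the use of `find_objects` to recover bounding boxes are assumptions, and `label` is imported from the top-level `scipy.ndimage` namespace.

```python
import numpy as np
from scipy.ndimage import find_objects, label


def merge_boxes(heatmap, threshold=1):
    """Zero out pixels at or below `threshold`, then return one box per connected blob."""
    hot = np.where(heatmap > threshold, heatmap, 0)
    labels, _ = label(hot)
    boxes = []
    for rows, cols in find_objects(labels):
        boxes.append((cols.start, rows.start, cols.stop, rows.stop))  # (x1, y1, x2, y2)
    return boxes


# Two overlapping detections on one vehicle, plus one isolated false positive
heatmap = np.zeros((100, 200), dtype=int)
for x1, y1, x2, y2 in [(20, 30, 84, 94), (40, 30, 104, 94), (150, 10, 170, 30)]:
    heatmap[y1:y2, x1:x2] += 1

print(merge_boxes(heatmap, threshold=1))  # [(40, 30, 84, 94)]
```

In this example the isolated detection is discarded, and the merged box covers only the area where the two remaining detections agree.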

###### Result from pipeline
<p align="center">
  <img src="car_tracking_report_img/result.png" alt=""/>
</p>

        

---

### Video pipeline

All the above stages are combined in the `pipeline` method to process the videos.

Because video is temporal in nature, detections from past frames are cached and used to form the heatmap, which is thresholded accordingly. Using the last N detections to form the heatmap helps reduce false positives.

```python
from collections import deque

detection_history = deque(maxlen=20)  # used to cache the previous detections
```
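
A minimal sketch of how such a history could feed the heatmap, assuming each cache entry is the list of `(x1, y1, x2, y2)` boxes from one frame. `accumulate_heat`, the history length of 5, and the threshold of 3 are illustrative choices, not values taken from the notebook.

```python
from collections import deque

import numpy as np

detection_history = deque(maxlen=5)  # short history for the example


def accumulate_heat(frame_shape, history):
    """Sum the boxes of every cached frame into a single heatmap."""
    heatmap = np.zeros(frame_shape, dtype=int)
    for boxes in history:
        for x1, y1, x2, y2 in boxes:
            heatmap[y1:y2, x1:x2] += 1
    return heatmap


# A car detected in all five frames, and a false positive seen only once
for frame_idx in range(5):
    boxes = [(10, 10, 30, 30)]
    if frame_idx == 2:
        boxes.append((60, 60, 80, 80))
    detection_history.append(boxes)

heatmap = accumulate_heat((100, 100), detection_history)
persistent = heatmap >= 3  # assumed threshold: seen in at least 3 of 5 frames
print(heatmap[20, 20], heatmap[70, 70])        # 5 1
print(persistent[20, 20], persistent[70, 70])  # True False
```

Because the `deque` has a fixed `maxlen`, appending a new frame silently drops the oldest one, so the heatmap always reflects only the most recent frames.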



###### The left video stream shows all detections in the frame; the right video stream shows the merged detections from multiple windows.
___

<video controls src="car_tracking_report_img/car_detection_report_video.mp4" width=900/>


---

### Discussion

This implementation is very crude in terms of performance and in the way bounding boxes are drawn. The system could be sped up with an implementation in a lower-level language. Combining the techniques used here with a convolutional network for object detection would improve the confidence of the detections.

* HOG feature extraction for sliding windows is very slow. This can be optimised by extracting the features once for the whole region and looking them up per window.
* The algorithm is only as good as the data that is fed into it. If the input image is noisy, false positives become much more likely.
* This pipeline does not cater for different weather conditions (rain and low light).  
* A method to automatically detect vanishing points could be implemented to select the region of interest.
