### **Vehicle Detection Project**

The goals / steps of this project are the following:

* Perform a Histogram of Oriented Gradients (HOG) feature extraction on a labeled training set of images and train a Linear SVM classifier.
* Optionally, you can also apply a color transform and append binned color features, as well as histograms of color, to your HOG feature vector. 
* Note: for those first two steps don't forget to normalize your features and randomize a selection for training and testing.
* Implement a sliding-window technique and use your trained classifier to search for vehicles in images.
* Run your pipeline on a video stream (start with the test_video.mp4 and later implement on full project_video.mp4) and create a heat map of recurring detections frame by frame to reject outliers and follow detected vehicles.
* Estimate a bounding box for vehicles detected.


### Writeup / README

#### 1. Provide a Writeup / README that includes all the rubric points and how you addressed each one.  You can submit your writeup as markdown or pdf.  [Here](https://github.com/udacity/CarND-Vehicle-Detection/blob/master/writeup_template.md) is a template writeup for this project you can use as a guide and a starting point. 

This document is the writeup and addresses each rubric point in turn.

### Histogram of Oriented Gradients (HOG)

#### 1. Explain how (and identify where in your code) you extracted HOG features from the training images.


The HOG transform is controlled by the parameters `orientations`, `pixels_per_cell`, `cells_per_block`, and `transform_sqrt`. One also needs to choose a color space to work in. I experimented with the values given below. The documentation for the `hog` function is [here](http://scikit-image.org/docs/dev/api/skimage.feature.html#hog).

* `orientations = [6, 8, 9, 12]`
* `pixels_per_cell`: kept at the supplied default of 8.
* `cells_per_block`: kept at 2 so that I could work with easy divisions/multiples of 2 across the project.
* `transform_sqrt`: kept constant at `True`. The documentation says: *"Power law compression, also known as Gamma correction, is used to reduce the effects of shadowing and illumination variations."*

`color_spaces=['RGB', 'HSV', 'YCrCb']` 
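As a rough illustration of how these parameters feed into `skimage.feature.hog`, the sketch below extracts HOG features from a single 64x64 channel. The helper name `get_hog_features` and the random input are assumptions made for illustration, not necessarily what the notebook does. The defaults shown are just one of the combinations listed above.

```python
import numpy as np
from skimage.feature import hog


def get_hog_features(channel, orient=9, pix_per_cell=8, cell_per_block=2):
    """Return the flattened HOG descriptor of one 2-D image channel."""
    return hog(
        channel,
        orientations=orient,
        pixels_per_cell=(pix_per_cell, pix_per_cell),
        cells_per_block=(cell_per_block, cell_per_block),
        transform_sqrt=True,   # power-law compression; input must be non-negative
        feature_vector=True,   # flatten the grid of blocks into a 1-D vector
    )


# Hypothetical stand-in for one channel of a 64x64 training image, values in [0, 1).
rng = np.random.default_rng(0)
channel = rng.random((64, 64))

features = get_hog_features(channel)
print(features.shape)
```

With 8-pixel cells and 2x2-cell blocks, a 64x64 channel has 7 x 7 block positions, each contributing 2 x 2 x 9 values, so the descriptor has 1764 entries per channel.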


#### 2. Explain how you settled on your final choice of HOG parameters.

After some experimentation, the final values I settled on were:

`orientations = 9 and color_space = 'YCrCb', channels = 'ALL'`

This selection balances accuracy against the length of the HOG feature vector. A higher value of `orientations` and computing HOG on all three channels give finer-grained accuracy but increase the time needed to extract the HOG features. I decided to stay at the higher end of the range to get better accuracy, but this made the final video processing slow, even though the HOG features are computed in a single pass for each frame of the video.

Sample HOG plots with the final parameters on `car` and `non-car` training images are shown below.


![Image of HOG](./writeup_images/hog.png)

As the HOG images show, HOG tracks the sharp edges of the car well. However, it is likely to produce false positives for images with other sharp edges such as lane lines, side rails, and shadows.

To reduce these false positives, I use a form of multi-frame averaging, described in the Video Implementation section.




#### 3. Describe how (and identify where in your code) you trained a classifier using your selected HOG features (and color features if you used them).


Along with HOG features on all three channels of the YCrCb color space (`orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2), transform_sqrt=True`), I also used spatially binned color features on each YCrCb channel (with `size=(32, 32)`) and color histograms on each channel with `bin_count = 32`. This generates a feature vector of length 8460. The dataset was split into training and test sets, with the test set being 10% of the total. I experimented with Random Forest and `LinearSVC`. `LinearSVC` gave a test-set accuracy of 99.04%, with Random Forest hovering around the same value. I decided to stick with `LinearSVC` with default parameters because it predicts faster than Random Forest.
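The feature-vector length can be checked with a little arithmetic. The sketch below assumes 64x64 training images (the size is not stated in this writeup, but it is the one that reproduces 8460); the function name `hog_length` is hypothetical.

```python
def hog_length(img_size=64, orient=9, pix_per_cell=8, cell_per_block=2, channels=3):
    """Number of HOG values for a square image, summed over all channels."""
    cells = img_size // pix_per_cell          # 8 cells per side
    blocks = cells - cell_per_block + 1       # 7 block positions per side
    per_channel = blocks * blocks * cell_per_block * cell_per_block * orient
    return per_channel * channels


hog_len = hog_length()        # 7 * 7 * 2 * 2 * 9 * 3 = 5292
spatial_len = 32 * 32 * 3     # binned colors, 3 channels      = 3072
hist_len = 32 * 3             # 32-bin histogram per channel   = 96
total = hog_len + spatial_len + hist_len

print(hog_len, spatial_len, hist_len, total)  # 5292 3072 96 8460
```

HOG accounts for well over half of the vector, which is consistent with HOG extraction dominating the per-frame cost.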

The feature vector was standardized using `StandardScaler`.

The code to train a model is encapsulated in the function `train_model` in the notebook `vehicle-detection-FINAL.ipynb`.
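The training steps described above can be sketched as follows. This is not the notebook's `train_model`; it is a minimal stand-in that uses synthetic 20-dimensional features in place of the real 8460-dimensional ones. The variable names, the random seed, and the decision to fit the scaler on the training split only are assumptions of this sketch.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-ins for the extracted car / non-car feature vectors.
rng = np.random.default_rng(0)
car_features = rng.normal(loc=1.0, size=(200, 20))
notcar_features = rng.normal(loc=-1.0, size=(200, 20))

X = np.vstack([car_features, notcar_features])
y = np.hstack([np.ones(len(car_features)), np.zeros(len(notcar_features))])

# Shuffle and hold out 10% of the data for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0
)

# Standardize each feature to zero mean and unit variance.
scaler = StandardScaler().fit(X_train)

# Linear SVM with default parameters.
clf = LinearSVC()
clf.fit(scaler.transform(X_train), y_train)

accuracy = clf.score(scaler.transform(X_test), y_test)
print(f"test accuracy: {accuracy:.4f}")
```

The same fitted `scaler` has to be applied to every window's features at prediction time, otherwise the classifier sees inputs on a different scale from the one it was trained on.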

The code expects the data to be in a folder called `./dataset` with two sub-folders, `./dataset/vehicles` and `./dataset/non-vehicles`, containing the images for the two classes, **cars** and **non-cars**.

### Sliding Window Search

#### 1. Describe how (and identify where in your code) you implemented a sliding window search.  How did you decide what scales to search and how much to overlap windows?

After the classifier was trained, I applied it to video frames with a sliding-window approach: regions of the image were taken at different window sizes, features were extracted from each region, and the features were fed into the trained model to predict whether that region contained a vehicle.

For the sliding windows, I used the lower half of the image with window sizes of (64, 64) and (96, 96) and an overlap of 0.5. The output detected vehicles well, with some false positives. However, because the initial approach computed HOG features for each window separately, prediction was very slow, taking close to 2 seconds to process a single image.

These original functions are `slide_window`, `get_all_slide_windows`, `single_img_features`, and `search_windows_old`.
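To make the window layout concrete, here is a minimal sketch of generating the window coordinates. It is not the notebook's `slide_window`; the function `make_windows` is hypothetical, and the 1280x720 frame size is an assumption.

```python
def make_windows(img_shape, y_start, y_stop, size=64, overlap=0.5):
    """Return ((x1, y1), (x2, y2)) boxes tiling rows y_start..y_stop."""
    width = img_shape[1]
    step = int(size * (1 - overlap))   # 32 px for 64x64, 48 px for 96x96
    boxes = []
    for y in range(y_start, y_stop - size + 1, step):
        for x in range(0, width - size + 1, step):
            boxes.append(((x, y), (x + size, y + size)))
    return boxes


img_shape = (720, 1280)            # assumed frame size (rows, cols)
y_start, y_stop = 360, 720         # lower half of the frame

windows = []
for size in (64, 96):
    windows += make_windows(img_shape, y_start, y_stop, size=size)

print(len(windows))
```

Under these assumptions the sketch produces 540 windows per frame (390 at 64x64 and 150 at 96x96), which gives a sense of why extracting HOG separately for every window is expensive.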

I then turned to the optimization suggested in the lectures. I computed HOG once over the full search region (the lower half of the image) for each window size (64x64 and 96x96) and then used the computed HOG to extract the HOG features for the individual sliding windows. This led to a large improvement in performance. The function `search_windows` slides windows over the image using the HOG optimization and outputs the windows that were predicted to contain a vehicle.
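The index arithmetic behind this optimization can be sketched without running HOG at all. The block below uses a random array as a stand-in for the unflattened HOG output of one channel (the layout `skimage`'s `hog` returns with `feature_vector=False`). The function `window_hog_slices`, the 360x1280 search region, and the 4-cell step (which corresponds to 0.5 overlap for a 64-pixel window) are assumptions of this sketch, not the notebook's `search_windows`.

```python
import numpy as np

PIX_PER_CELL = 8
CELL_PER_BLOCK = 2
ORIENT = 9
WINDOW = 64   # window side in pixels


def window_hog_slices(hog_blocks, cells_per_step=4):
    """Yield (x_px, y_px, features) for every window over a precomputed HOG grid.

    hog_blocks has shape (blocks_y, blocks_x, cell_per_block, cell_per_block, orient).
    Consecutive blocks are one cell apart, so block index * PIX_PER_CELL is the
    pixel offset of the window's top-left corner inside the search region.
    """
    blocks_per_window = WINDOW // PIX_PER_CELL - CELL_PER_BLOCK + 1   # 7
    n_blocks_y, n_blocks_x = hog_blocks.shape[:2]
    for yb in range(0, n_blocks_y - blocks_per_window + 1, cells_per_step):
        for xb in range(0, n_blocks_x - blocks_per_window + 1, cells_per_step):
            patch = hog_blocks[yb:yb + blocks_per_window, xb:xb + blocks_per_window]
            yield xb * PIX_PER_CELL, yb * PIX_PER_CELL, patch.ravel()


# Stand-in for the HOG of a 360x1280 search region (one channel).
region_h, region_w = 360, 1280
n_blocks_y = region_h // PIX_PER_CELL - CELL_PER_BLOCK + 1   # 44
n_blocks_x = region_w // PIX_PER_CELL - CELL_PER_BLOCK + 1   # 159
hog_blocks = np.random.default_rng(0).random(
    (n_blocks_y, n_blocks_x, CELL_PER_BLOCK, CELL_PER_BLOCK, ORIENT)
)

results = list(window_hog_slices(hog_blocks))
x_px, y_px, feats = results[1]
print(len(results), (x_px, y_px), feats.shape)
```

The returned pixel offsets are relative to the search region, so the region's top row has to be added to `y_px` to get frame coordinates. A larger window such as 96x96 could be handled by shrinking the region by 64/96 before computing HOG and reusing the same 64-pixel logic; whether the notebook does exactly this is not stated here.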

#### 2. Show some examples of test images to demonstrate how your pipeline is working.  What did you do to try to minimize false positives and reliably detect cars?

I tried several combinations of window sizes, such as 64x64, 96x96, 128x128, and 192x192, but finally settled on 64x64 and 96x96 to balance prediction speed against accuracy. The other classifier parameters were:

* `spatial_size = (32, 32)`
* `hist_bins = 32`
* `color_space = 'YCrCb'`
* `orient = 9`
* `pix_per_cell = 8`
* `cell_per_block = 2`
* `hog_channel = 'ALL'`

Sample output of the sliding-window detection for the images in the folder `./test_images/` is shown below. The code is implemented in the function `pipeline`, which is called from `process_image`.

![Classifier Detection](./writeup_images/classifier_detection1.png)
![Classifier Detection](./writeup_images/classifier_detection2.png)
![Classifier Detection](./writeup_images/classifier_detection3.png)
![Classifier Detection](./writeup_images/classifier_detection4.png)
![Classifier Detection](./writeup_images/classifier_detection5.png)
![Classifier Detection](./writeup_images/classifier_detection6.png)


We can see that the classifier does a good job of identifying the regions of the images that contain vehicles. We also see some false positives that need to be removed. The approach is explained below.

### Video Implementation

#### 1. Provide a link to your final video output.  Your pipeline should perform reasonably well on the entire project video (somewhat wobbly or unstable bounding boxes are ok as long as you are identifying the vehicles most of the time with minimal false positives.)

Here's a [link to my video result](./project_video_output.mp4)

I also combined this with the advanced lane detection pipeline. Here's a [link to my combined pipeline video result](./project_video_output_combined.mp4)


#### 2. Describe how (and identify where in your code) you implemented some kind of filter for false positives and some method for combining overlapping bounding boxes.

I recorded the positions of positive detections in each frame of the video.  From the positive detections I created a heatmap and then thresholded that map to identify vehicle positions.   
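A minimal sketch of the per-frame heatmap step follows. The function names `add_heat` and `apply_threshold`, the example boxes, and the convention that values less than or equal to the threshold are zeroed are assumptions of this sketch.

```python
import numpy as np


def add_heat(heatmap, boxes):
    """Add 1 to every pixel inside each ((x1, y1), (x2, y2)) detection box."""
    for (x1, y1), (x2, y2) in boxes:
        heatmap[y1:y2, x1:x2] += 1
    return heatmap


def apply_threshold(heatmap, threshold):
    """Return a copy with every pixel at or below the threshold set to zero."""
    out = heatmap.copy()
    out[out <= threshold] = 0
    return out


# Two overlapping detections on one object and one isolated detection.
boxes = [((10, 10), (74, 74)), ((42, 10), (106, 74)), ((200, 50), (264, 114))]

heat = add_heat(np.zeros((120, 300), dtype=np.int32), boxes)
hot = apply_threshold(heat, threshold=1)

print(heat.max(), hot[20, 50], hot[60, 220])  # 2 2 0
```

Only the region covered by both overlapping boxes survives the threshold; the isolated detection, which never exceeds a count of 1, is discarded.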

To reduce false positives, I accumulated the heatmaps over a rolling window of 5 frames and then thresholded the combined heatmap with `threshold=9`. I then used `scipy.ndimage.measurements.label()` to identify individual blobs in the heatmap, assuming that each blob corresponded to a vehicle. Finally, I constructed bounding boxes to cover the area of each detected blob. Documentation for `label` can be found [here](https://docs.scipy.org/doc/scipy-0.18.1/reference/generated/scipy.ndimage.label.html#scipy.ndimage.label)
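The rolling accumulation and labeling can be sketched as below. The history length of 5 and threshold of 9 come from the text; the function `detect`, the synthetic frame heatmaps, and the use of `scipy.ndimage.find_objects` to turn labels into boxes are assumptions of this sketch rather than the notebook's code. `label` is imported from `scipy.ndimage`, the location the linked documentation describes.

```python
from collections import deque

import numpy as np
from scipy.ndimage import find_objects, label

HISTORY = 5      # number of frames kept
THRESHOLD = 9    # combined heat at or below this value is discarded


def detect(history, frame_heatmap):
    """Add one frame's heatmap to the history and return bounding boxes."""
    history.append(frame_heatmap)              # the deque drops the oldest frame
    combined = np.stack(list(history)).sum(axis=0)
    combined[combined <= THRESHOLD] = 0
    labels, _ = label(combined)                # one integer label per blob
    boxes = []
    for rows, cols in find_objects(labels):    # slices bounding each blob
        boxes.append(((cols.start, rows.start), (cols.stop, rows.stop)))
    return boxes


def make_frame(with_false_positive):
    """Synthetic heatmap: a car with heat 3, optionally a one-off blob with heat 2."""
    heat = np.zeros((120, 300), dtype=np.int32)
    heat[40:104, 100:164] = 3
    if with_false_positive:
        heat[10:74, 220:284] = 2
    return heat


history = deque(maxlen=HISTORY)
detections = [detect(history, make_frame(i == 0)) for i in range(5)]

print(detections[-1])  # [((100, 40), (164, 104))]
```

In this toy sequence the car needs four consecutive frames before its combined heat exceeds 9, while the one-off blob never does. This is the trade-off of the approach: fewer false positives at the cost of a short delay before a new vehicle is reported.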


Here's an example result showing the heatmap from a series of video frames. For each frame, I show 6 images: `Original Image`, `Positive regions in frame`, `Frame Heatmap`, `Cumulative Heatmap (5 frame rolling)`, `Cumulative Heatmap thresholded`, `Final Image`.

The set of 6 images per frame was produced for a 30-frame video. Here I show six of the 30 frames to illustrate how false positives are removed by integrating the heatmaps of the last 5 frames and thresholding the result with a value of 9.

#### Frame 1

This frame has a false positive, which is removed by thresholding the 5-frame combined heatmap.

![Pipeline 1](./writeup_images/pipeline_frame1.png)

#### Frame 2

![Pipeline 2](./writeup_images/pipeline_frame2.png)

#### Frame 3

![Pipeline 3](./writeup_images/pipeline_frame3.png)

#### Frame 4

![Pipeline 4](./writeup_images/pipeline_frame4.png)

#### Frame 5

![Pipeline 5](./writeup_images/pipeline_frame5.png)

#### Frame 6

Like frame 1, this frame has a false positive, which is removed by thresholding the cumulative heatmap.

![Pipeline 6](./writeup_images/pipeline_frame6.png)




---

### Discussion

#### 1. Briefly discuss any problems / issues you faced in your implementation of this project.  Where will your pipeline likely fail?  What could you do to make it more robust?

I had to spend a lot of time trying various combinations of color spaces, whether to use histogram and binned color features, what sizes to use, HOG parameters, etc.

Even more difficult was coming up with a proper set of sliding windows. I first tried varying the window size by image region, based on the likely size of a vehicle in that region. The aim was to keep the number of sliding windows as small as possible and have a faster pipeline. However, this turned out to be very difficult, so I switched to two window sizes, 64x64 and 96x96, in the lower half of the image, using an optimized sliding-window detection. In this approach HOG is calculated once and then subsampled for each sliding window.

Implementing this optimized sliding-window search turned out to be difficult, especially mapping a windowed region of the image to the right slice of the HOG result array.


The pipeline was made more robust by keeping track of all detected regions in the last 5 frames. These regions were combined into a single heatmap, which was then thresholded.

This approach helped remove false positives, but the bounding boxes are still jittery. I could use some kind of weighted average or a more sophisticated way of matching hot regions from one frame to the next. One approach would be to use the last few frames with high-confidence detections to predict the position of the box in the next frame and then combine that prediction with the actual detection in that frame.
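One simple form such a weighted average could take is an exponential moving average of the box corners. This is only a sketch of the idea, not something the current pipeline does. The function `smooth_box` and the weight `alpha` are hypothetical, and the sketch assumes the previous and current boxes have already been matched to the same vehicle.

```python
def smooth_box(previous, current, alpha=0.3):
    """Blend the current box with the previous smoothed box.

    Boxes are ((x1, y1), (x2, y2)). alpha is the weight of the new detection:
    smaller values give steadier but slower-moving boxes.
    """
    if previous is None:          # first detection of this vehicle
        return current
    return tuple(
        tuple(round(alpha * c + (1 - alpha) * p) for p, c in zip(prev_pt, cur_pt))
        for prev_pt, cur_pt in zip(previous, current)
    )


# A box that jumps 10 px to the right moves only half that distance with alpha=0.5.
first = smooth_box(None, ((100, 40), (164, 104)))
second = smooth_box(first, ((110, 40), (174, 104)), alpha=0.5)

print(second)  # ((105, 40), (169, 104))
```

The harder part, matching blobs between frames when vehicles appear, disappear, or overlap, is left out here.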

