# Vehicle Detection Project

The goals / steps of this project are the following:

  • Perform a Histogram of Oriented Gradients (HOG) feature extraction on a labeled training set of images and train a Linear SVM classifier.
  • Optionally, you can also apply a color transform and append binned color features, as well as histograms of color, to your HOG feature vector.
  • Note: for those first two steps don't forget to normalize your features and randomize a selection for training and testing.
  • Implement a sliding-window technique and use your trained classifier to search for vehicles in images.
  • Run your pipeline on a video stream (start with the test_video.mp4 and later implement on full project_video.mp4) and create a heat map of recurring detections frame by frame to reject outliers and follow detected vehicles.
  • Estimate a bounding box for vehicles detected.

## Rubric Points

Here I will consider the rubric points individually and describe how I addressed each point in my implementation.

### Spatial and Histogram Features

The spatial features are extracted with the bin_spatial function in code cell 10. The histogram features are extracted with the color_hist function in code cell 11. In order to find the best spatial and histogram parameters, 5-fold cross validation was performed over a subset of the images, containing 1196 car images and 1125 notcar images. The code is in the spahist.py file. Several sets of parameters were tested. The tested parameters, the parameters selected according to the highest average accuracy, and the parameter set with the highest average accuracy over the 5 folds are given in the following table.

| Spatial (tested) | Histogram (tested) | Spatial (selected) | Histogram (selected) | Best (Spatial, Hist) | Best Acc |
|---|---|---|---|---|---|
| 8,16,32,64,128 | 8,16,32,64,128 | 8,16,32 | 128,64,32 | 16, 32 | 0.9897 |
| 8,12,16,24,32 | 32,48,64,96,128 | 12,16,32 | 128,96,64 | 12, 32 | 0.9905 |
| 8,10,12,14,16 | 64,96,128,192,256 | 12,14,16 | 64,96,128 | 12, 96 | 0.9935 |
| 12,13,14,15,16 | 64,80,96,112,128 | 12,13,16 | 96,112,128 | 12, 112 | 0.9935 |
| 11,12,13,14,15,16 | 100,114,128,142,156 | 14,15,16 | 114,128,142 | 14, 142 | 0.9919 |
| 11,12,13,14,15,16 | 100,114,128,142,156 | 12,14,16 | 100,114,142 | 14, 100 | 0.9940 |

The highest accuracy depends on the data; however, spatial values between 12 and 16 performed better. The number of histogram bins did not have much effect on the accuracy; values greater than 32 usually performed better than values smaller than 32. A value of 128 bins was usually used in the following tests.
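
For reference, here is a minimal sketch of the two feature extractors described above; the actual code is in notebook cells 10 and 11, and the default parameter values shown here are only illustrative.

```python
import cv2
import numpy as np

def bin_spatial(img, size=(16, 16)):
    # Resize the image and unravel it into a 1-D spatial feature vector
    return cv2.resize(img, size).ravel()

def color_hist(img, nbins=128, bins_range=(0, 256)):
    # Histogram each color channel separately and concatenate the results
    hists = [np.histogram(img[:, :, ch], bins=nbins, range=bins_range)[0]
             for ch in range(3)]
    return np.concatenate(hists)
```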

### Histogram of Oriented Gradients (HOG)

#### 1. Extracting HOG features from the training images.

HOG features are extracted in the 9th code cell of the IPython notebook. First, the image is converted to the desired color space, then each channel of the image is passed to the hog function of the skimage.feature module. The required parameters are orient, pix_per_cell, and cell_per_block. The helper function is the get_hog_features function in code cell 7.
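
A hedged sketch of that helper is given below; the notebook's get_hog_features may differ in details (for example, the transform_sqrt setting and visualization handling are assumptions here).

```python
from skimage.feature import hog

def get_hog_features(channel, orient=9, pix_per_cell=8, cell_per_block=2,
                     feature_vec=True):
    # Compute HOG features for a single image channel
    return hog(channel,
               orientations=orient,
               pixels_per_cell=(pix_per_cell, pix_per_cell),
               cells_per_block=(cell_per_block, cell_per_block),
               transform_sqrt=True,   # assumption; not stated in the writeup
               feature_vector=feature_vec)
```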


Here is an example using one of the best parameter sets for extracting HOG features: the YCrCb color space and HOG parameters of orientations=9, pixels_per_cell=(8, 8), and cells_per_block=(2, 2):

[Image: HOG feature visualization for the example parameters]

#### 2. Explain how you settled on your final choice of HOG parameters.

##### 2.1. Comparison 1

A similar 5-fold test was performed for the HOG parameters on the subset data set. The parameters used in this experiment are as follows:

  • Colorspace : HLS, HSV, LUV, RGB, YCrCb, YUV
  • Orient: 6, 9, 12
  • Pix Per Cell (PPC): 4, 6, 8, 10, 12
  • Cell Per Block (CPB): 1, 2, 3, 4
  • Hog Channel (HOGCH): 0, 1, 2, ALL

The top ten performing parameter sets are given in the following table.

| Color | Orient | PPC | CPB | HOGCH | HOGsec | TRNsec | Acc |
|---|---|---|---|---|---|---|---|
| HLS | 12 | 4 | 2 | ALL | 26.958 | 1.494 | 0.9970 |
| HSV | 12 | 4 | 1 | ALL | 28.722 | 0.412 | 0.9966 |
| YCrCb | 6 | 4 | 3 | ALL | 23.835 | 1.492 | 0.9961 |
| YCrCb | 6 | 4 | 4 | ALL | 21.723 | 2.24 | 0.9961 |
| HLS | 12 | 4 | 3 | ALL | 25.159 | 2.916 | 0.9961 |
| HLS | 12 | 4 | 1 | ALL | 28.297 | 0.396 | 0.9961 |
| HLS | 12 | 6 | 1 | ALL | 14.959 | 0.168 | 0.9957 |
| YUV | 6 | 4 | 4 | ALL | 21.394 | 2.236 | 0.9957 |
| HSV | 12 | 4 | 2 | ALL | 27.452 | 1.484 | 0.9957 |
| HSV | 12 | 6 | 1 | ALL | 15.236 | 0.162 | 0.9952 |

The only parameter that could clearly be determined from this test was the HOG channel: using ALL channels performed best.

Since this test and the previous test on spatial and histogram parameters were not sufficient to select the HOG parameters, further tests combining all parameters were performed.
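
The parameter sweeps above were driven by cross-validation. Below is a simplified sketch of that loop, assuming a helper named extract_features that builds the combined feature vector for a given parameter set, and lists cars and notcars of training images; the real code in spahist.py and crossval.py is more elaborate.

```python
import itertools
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

def evaluate(params, car_imgs, notcar_imgs, folds=5):
    # extract_features (assumed) builds the feature vector for one image
    X = np.vstack([extract_features(img, **params)
                   for img in car_imgs + notcar_imgs])
    y = np.hstack([np.ones(len(car_imgs)), np.zeros(len(notcar_imgs))])
    X = StandardScaler().fit_transform(X)
    # Mean accuracy over k folds for this parameter combination
    return cross_val_score(LinearSVC(), X, y, cv=folds).mean()

grid = {'orient': [6, 9, 12], 'pix_per_cell': [4, 6, 8, 10, 12],
        'cell_per_block': [1, 2, 3, 4]}
results = {combo: evaluate(dict(zip(grid, combo)), cars, notcars)
           for combo in itertools.product(*grid.values())}
```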

##### 2.2. Comparison 2

The changes can be listed as follows:

  • The data set was changed to the complete data set, which contains 8792 car and 8968 notcar images.
  • 5-fold cross validation was changed to 5x2-fold cross validation; for each repetition, half of the data is used for training and the other half for testing.
  • imreader was added as a new parameter, indicating the module used to read the images. Since mpimg reads PNG images with pixel values between 0 and 1 (while cv2 reads them as 0-255), the choice affects classifier performance; see the snippet below.
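
The scale and channel-order differences between the two readers, which explain why they cannot be mixed without rescaling (the file path is a placeholder):

```python
import cv2
import matplotlib.image as mpimg

img_cv2 = cv2.imread('example.png')    # uint8, BGR channel order, values 0-255
img_mpl = mpimg.imread('example.png')  # float32, RGB channel order, values 0.0-1.0 for PNG

# To make mpimg-read PNGs comparable with cv2-read images they would need
# rescaling, e.g. (img_mpl * 255).astype('uint8'), plus a channel reorder.
```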

For the following comparison, only cv2 was used as image reader. The parameters used in this experiment are as follows:

  • Colorspace : HSV, LUV, YCrCb, YUV
  • Orient: 8, 9, 10, 11, 12
  • Pix Per Cell (PPC): 4, 6, 8, 10, 12, 14, 16
  • Cell Per Block (CPB): 1, 2
  • Hog Channel (HOGCH): ALL

The top 15 parameter sets are given in the following table. The HOGsec column shows the feature extraction time.

| Color | Orient | PPC | CPB | HOGsec | TRNsec | TESTsec | Acc |
|---|---|---|---|---|---|---|---|
| YCrCb | 12 | 8 | 2 | 86.96 | 2.436 | 0.0006 | 0.9891 |
| YUV | 12 | 8 | 2 | 86.7 | 2.503 | 0.0017 | 0.989 |
| YCrCb | 12 | 4 | 2 | 215.8 | 6.5 | 0.0022 | 0.9883 |
| YCrCb | 10 | 8 | 2 | 85.68 | 2.906 | 0.0015 | 0.9882 |
| YUV | 12 | 4 | 2 | 212.62 | 6.465 | 0.0013 | 0.988 |
| YUV | 10 | 8 | 2 | 84.66 | 1.416 | 0.0003 | 0.9878 |
| YCrCb | 10 | 4 | 2 | 212.89 | 5.579 | 0.0016 | 0.9876 |
| YCrCb | 11 | 4 | 2 | 215.36 | 6.152 | 0.0025 | 0.9876 |
| YCrCb | 11 | 8 | 2 | 86.37 | 5.403 | 0.0023 | 0.9875 |
| YUV | 10 | 4 | 2 | 205.88 | 5.481 | 0.0015 | 0.9874 |
| YCrCb | 12 | 4 | 1 | 225.85 | 2.002 | 0.0016 | 0.9874 |
| YUV | 12 | 4 | 1 | 223.39 | 1.972 | 0.0011 | 0.9871 |
| YUV | 11 | 8 | 2 | 85.3 | 1.525 | 0.0023 | 0.987 |
| LUV | 10 | 4 | 2 | 201.01 | 6.027 | 0.0006 | 0.987 |
| YCrCb | 9 | 8 | 2 | 83.47 | 5.304 | 0.0012 | 0.9869 |

##### 2.3. Comparison 3

Another comparison was performed to determine whether the spatial size or the number of histogram bins affects the accuracy when the three feature types are combined. In addition, the effect of changing the image reader was tested. The parameters used in this experiment are as follows:

  • imreader: cv2, mpimg
  • Colorspace : YCrCb, YUV
  • Orient: 6, 9, 12
  • Pix Per Cell (PPC): 16
  • Cell Per Block (CPB): 1, 2, 3
  • Hog Channel (HOGCH): 0, 1, 2, ALL
  • Spatial (Spa): 8, 12, 14, 16, 32
  • Number of Bins(Hbin): 64, 114, 128, 142, 192

| # | imreader | Color | Orient | CPB | HOGCH | Spa | Hbin | FeatVLen | HOGsec | TRNsec | TESTsec | Acc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | cv2 | YCrCb | 12 | 1 | ALL | 16 | 114 | 1686 | 72.56 | 1.296 | 0.0186 | 0.994 |
| 2 | cv2 | YUV | 12 | 1 | ALL | 16 | 114 | 1686 | 72.03 | 1.691 | 0.0181 | 0.994 |
| 3 | cv2 | YUV | 12 | 1 | ALL | 16 | 128 | 1728 | 72.22 | 1.594 | 0.0165 | 0.994 |
| 4 | cv2 | YUV | 12 | 1 | ALL | 12 | 114 | 1350 | 73.23 | 1.343 | 0.0121 | 0.9939 |
| 5 | cv2 | YCrCb | 12 | 1 | ALL | 16 | 128 | 1728 | 72.82 | 1.336 | 0.0179 | 0.9939 |
| 6 | cv2 | YUV | 12 | 1 | ALL | 12 | 128 | 1392 | 72.64 | 1.021 | 0.0119 | 0.9939 |
| 7 | cv2 | YCrCb | 12 | 1 | ALL | 14 | 114 | 1506 | 72.57 | 1.22 | 0.0126 | 0.9938 |
| 8 | cv2 | YCrCb | 12 | 1 | ALL | 12 | 114 | 1350 | 72.47 | 1.218 | 0.0114 | 0.9938 |
| 9 | cv2 | YUV | 12 | 2 | ALL | 12 | 192 | 2304 | 68.75 | 2.144 | 0.021 | 0.9938 |
| 10 | cv2 | YUV | 12 | 1 | ALL | 16 | 64 | 1536 | 72.17 | 1.298 | 0.0222 | 0.9938 |
| 11 | cv2 | YUV | 12 | 2 | ALL | 14 | 192 | 2460 | 68.67 | 1.556 | 0.0211 | 0.9937 |
| 12 | cv2 | YCrCb | 12 | 1 | ALL | 16 | 142 | 1770 | 73.08 | 1.541 | 0.0162 | 0.9937 |
| 13 | cv2 | YUV | 12 | 1 | ALL | 16 | 192 | 1920 | 72.41 | 1.987 | 0.0169 | 0.9937 |
| 14 | cv2 | YUV | 12 | 2 | ALL | 14 | 114 | 2226 | 68.07 | 2.166 | 0.0229 | 0.9937 |
| 210 | mpimg | YCrCb | 12 | 1 | ALL | 12 | 192 | 1584 | 64.52 | 1.539 | 0.0149 | 0.9921 |

As a result, cv2 (average accuracy 0.9799) performed better than mpimg (0.9759) as the image reader: it had a higher average accuracy, and the best-performing parameter set that used mpimg ranked only 210th. It is hard to differentiate the Hbin values from each other, but spatial values between 12 and 16 mostly performed better than 8 and 32.

##### 2.4. Comparison 4

Other tests were also applied. The effect of using the original image for spatial and histogram feature extraction while using YCrCb for HOG feature extraction was tested. Both changes lowered the accuracy. The decrease was larger when changing the color space for histogram feature extraction, from 0.9912 to 0.9893; for spatial feature extraction, the decrease was from 0.9905 to 0.9901.

##### 2.5. Comparison 5

The performance of ensembling was tested in another comparison. Three SVMs were trained: one with spatial features, one with histogram features, and one with HOG features. Classification was performed by majority voting (a minimal sketch of this voting scheme is given after the results table below). The ensemble usually performed better than each of the classifiers separately; however, combining all features and training a single SVM outperformed the ensemble. The reason is that the SVM trained on HOG features has much higher accuracy than the others, so some samples correctly classified by the HOG SVM were misclassified by the majority vote. The parameters used in this experiment are as follows:

  • imreader: cv2
  • Colorspace : YCrCb, YUV
  • Orient: 9
  • Pix Per Cell (PPC): 4, 8, 16
  • Cell Per Block (CPB): 1, 2
  • Hog Channel (HOGCH): ALL
  • Spatial (Spa): 12, 14, 16
  • Number of Bins(Hbin): 114, 128, 142

The average accuracy values for the ensemble, combined, and separate SVMs are given in the following table.

| Features | Accuracy |
|---|---|
| All | 0.9924 |
| Ensemble | 0.9858 |
| Hog | 0.9822 |
| Hist | 0.9367 |
| Spatial | 0.9137 |
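
A minimal sketch of the majority-vote ensemble that was compared against the single combined-feature SVM; the feature matrices (X_spatial, X_hist, X_hog and their test counterparts) and labels are assumed to be precomputed and scaled.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_ensemble(feature_sets, y):
    # One LinearSVC per feature family (spatial, histogram, HOG)
    return [LinearSVC().fit(X, y) for X in feature_sets]

def predict_ensemble(clfs, feature_sets):
    # Majority vote over the three per-feature predictions (labels are 0/1)
    votes = np.vstack([clf.predict(X) for clf, X in zip(clfs, feature_sets)])
    return (votes.sum(axis=0) >= 2).astype(int)

clfs = train_ensemble([X_spatial, X_hist, X_hog], y_train)
y_pred = predict_ensemble(clfs, [X_spatial_test, X_hist_test, X_hog_test])
```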

##### 2.6. Comparison 6

The results shown in the following table come from the search both for the models with the highest accuracy and for models with relatively high accuracy and low feature extraction time.

| # | imreader | Orient | PPC | CPB | HOGCH | FeatVLen | HOGsec | TRNsec | TESTsec | Acc |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | cv2 | 12 | 8 | 1 | ALL | 3456 | 106.87 | 2.959 | 0.0322 | 0.9943 |
| 2 | cv2 | 12 | 8 | 2 | ALL | 8208 | 100.72 | 1.507 | 0.0731 | 0.994 |
| 3 | cv2 | 12 | 12 | 1 | ALL | 2052 | 78.04 | 1.971 | 0.0204 | 0.9935 |
| 4 | cv2 | 12 | 16 | 1 | ALL | 1728 | 72.43 | 1.79 | 0.0159 | 0.9935 |
| 5 | cv2 | 12 | 16 | 2 | ALL | 2448 | 69.3 | 2.289 | 0.024 | 0.9934 |
| 6 | cv2 | 9 | 8 | 2 | ALL | 6444 | 94.05 | 4.263 | 0.0574 | 0.9934 |
| 7 | cv2 | 9 | 8 | 1 | ALL | 2880 | 103.16 | 2.317 | 0.026 | 0.9933 |
| 8 | cv2 | 9 | 16 | 2 | ALL | 2124 | 66.32 | 1.616 | 0.0324 | 0.9932 |
| 9 | cv2 | 12 | 12 | 2 | ALL | 3456 | 71.25 | 1.865 | 0.0344 | 0.9929 |
| 10 | cv2 | 9 | 16 | 1 | ALL | 1584 | 70.11 | 1.712 | 0.0156 | 0.9929 |
| - | cv2 | 9 | 16 | 1 | 0 | 1296 | 35.34 | 1.323 | 0.0116 | 0.9886 |
| - | cv2 | 12 | 16 | 1 | 0 | 1344 | 35.88 | 1.369 | 0.0117 | 0.9889 |
| - | cv2 | 6 | 16 | 2 | 0 | 1368 | 33.14 | 1.664 | 0.0127 | 0.986 |
| - | cv2 | 12 | 12 | 1 | 0 | 1452 | 37.77 | 1.727 | 0.0132 | 0.988 |
| - | cv2 | 9 | 12 | 1 | 0 | 1377 | 37.11 | 1.624 | 0.0138 | 0.9874 |
| - | cv2 | 9 | 12 | 2 | 0 | 1728 | 34.55 | 2.103 | 0.015 | 0.9885 |
| - | cv2 | 9 | 16 | 2 | 0 | 1476 | 33.78 | 1.567 | 0.0151 | 0.9889 |
| - | cv2 | 6 | 8 | 1 | 0 | 1536 | 45.19 | 1.758 | 0.0153 | 0.9876 |
| - | cv2 | 9 | 8 | 1 | 0 | 1728 | 46.27 | 2.229 | 0.0162 | 0.9899 |
| - | mpimg | 12 | 8 | 1 | 0 | 1920 | 45.41 | 2.464 | 0.0166 | 0.9864 |
| - | mpimg | 9 | 8 | 1 | 0 | 1728 | 45.07 | 2.244 | 0.0175 | 0.987 |

The parameters used in this experiment are as follows:

  • imreader: cv2, mpimg
  • Colorspace : YCrCb
  • Orient: 6, 9, 12
  • Pix Per Cell (PPC): 8, 12, 16, 24, 32
  • Cell Per Block (CPB): 1, 2
  • Hog Channel (HOGCH): 0, 1, 2, ALL
  • Spatial (Spa): 16
  • Number of Bins(Hbin): 128

##### 2.7. Comparison 7

After trying the models on the project video and switching to multiple window scales, the time required to process the video was considerably high. In order to decrease the processing time, the time spent on each feature extraction step was investigated. HOG features can be extracted once at the beginning for each frame and each scale value. The most time-consuming operation is extracting the histogram features for a frame, since each window has to be processed separately. As a result, models without histogram features were examined. In addition, in previous experiments, HOG channels 0 and 1 for YCrCb and channels 0 and 2 for YUV performed better than the remaining channel.

| # | Color | Orient | PPC | CPB | HOGCH | FeatVLen | HOGsec | TRNsec | TESTsec | Acc |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | YUV | 12 | 8 | 2 | 0, 1, 2 | 7824 | 87.1 | 1.562 | 0.0684 | 0.9926 |
| 2 | YCrCb | 12 | 8 | 2 | 0, 1, 2 | 7824 | 88.21 | 2.73 | 0.0673 | 0.9926 |
| 3 | YCrCb | 12 | 8 | 2 | 0, 1 | 5472 | 60.82 | 1.975 | 0.0495 | 0.9919 |
| 4 | YCrCb | 10 | 8 | 2 | 0, 1, 2 | 6648 | 85.73 | 5.221 | 0.0589 | 0.9917 |
| 5 | YCrCb | 12 | 8 | 1 | 0, 1, 2 | 3072 | 94.72 | 3.898 | 0.034 | 0.9917 |
| 6 | YUV | 12 | 8 | 2 | 0, 2 | 5472 | 60.43 | 3.061 | 0.0478 | 0.9917 |
| 7 | YUV | 10 | 8 | 2 | 0, 1, 2 | 6648 | 85.84 | 1.529 | 0.0568 | 0.9917 |
| 8 | YUV | 12 | 8 | 1 | 0, 1, 2 | 3072 | 93.52 | 4.892 | 0.0319 | 0.9913 |
| 9 | YCrCb | 10 | 8 | 2 | 0, 1 | 4688 | 58.86 | 2.622 | 0.0445 | 0.9911 |
| 10 | YUV | 10 | 8 | 2 | 0, 2 | 4688 | 59.66 | 1.533 | 0.0432 | 0.9908 |
| 11 | YCrCb | 12 | 16 | 2 | 0, 1, 2 | 2064 | 56.75 | 1.658 | 0.0231 | 0.9908 |
| 12 | YCrCb | 12 | 8 | 1 | 0, 1 | 2304 | 65.5 | 3.205 | 0.0228 | 0.9907 |
| 13 | YUV | 12 | 16 | 2 | 0, 1, 2 | 2064 | 57.23 | 1.779 | 0.0225 | 0.9907 |
| 14 | YUV | 8 | 8 | 2 | 0, 1, 2 | 5472 | 83.03 | 3.3 | 0.0509 | 0.9907 |
| 15 | YCrCb | 8 | 8 | 2 | 0, 1, 2 | 5472 | 84.02 | 3.999 | 0.0491 | 0.9905 |
| 16 | YUV | 10 | 16 | 2 | 0, 1, 2 | 1848 | 55.33 | 1.629 | 0.0209 | 0.9905 |
| 17 | YCrCb | 12 | 12 | 2 | 0, 1, 2 | 3072 | 61.6 | 2.811 | 0.0326 | 0.9904 |
| 18 | YCrCb | 8 | 8 | 2 | 0, 1 | 3904 | 58.15 | 3.459 | 0.036 | 0.9904 |
| 19 | YUV | 12 | 16 | 2 | 0, 2 | 1632 | 40.49 | 1.453 | 0.0211 | 0.9904 |
| 20 | YCrCb | 9 | 8 | 2 | 0, 1, 2 | 6060 | 84.56 | 3.516 | 0.0559 | 0.9904 |
| 21 | YCrCb | 12 | 16 | 2 | 0, 1 | 1632 | 40.29 | 1.555 | 0.0138 | 0.9904 |

The parameters used in this experiment are as follows:

  • imreader: cv2
  • Colorspace : YCrCb, YUV
  • Orient: 6, 8, 9, 10, 12
  • Pix Per Cell (PPC): 8, 12, 16
  • Cell Per Block (CPB): 1, 2
  • Hog Channel (HOGCH): 0, 1, 2, ALL, 01, 02, 12
  • Spatial (Spa): 16

##### 2.8. Conclusion

The conclusions up to this point were as follows:

  • cv2 performs better than mpimg when reading png images.
  • YCrCb and YUV are the best-performing color spaces; YCrCb seems to have slightly higher accuracy.
  • An Orient value of 12 results in higher accuracy values; however, on the project video it does not perform significantly better than 9.
  • A Pix Per Cell value of 4 may perform better in some iterations, but it increases the computational cost significantly. 8 can be considered the optimum value; for a faster pipeline, 16 can be used.
  • For the best accuracy, all of the HOG channels can be used. For higher speed, HOG channel 0 of the YCrCb color space can be used. For balanced speed and accuracy, HOG channels 0 and 1 can be used.
  • Histogram features can be omitted to decrease the computational cost.

As the final models, the 9th model in the previous table (Comparison 7) was used as the Fast model, and the 6th model from Comparison 6 was used as the Best model according to accuracy.

| Name | Comparison | # | imreader | Color | Orient | PPC | CPB | HOGCH | Spatial | Hbin | FeatVLen | HOGsec | TRNsec | TESTsec | Acc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Fast | 7 | 9 | cv2 | YCrCb | 10 | 8 | 2 | 0, 1 | 16 | None | 4688 | 58.86 | 2.622 | 0.0445 | 0.9911 |
| Best | 6 | 6 | cv2 | YCrCb | 9 | 8 | 2 | ALL | 16 | 128 | 6444 | 94.05 | 4.263 | 0.0574 | 0.9934 |

The final code for performing the parameter tests is given in crossval.py.

#### 3. Describe how (and identify where in your code) you trained a classifier using your selected HOG features (and color features if you used them).

After deciding on the parameters as described in the previous section, I used the Classifier class, located in the 13th code cell, to train the SVM classifier. The LinearSVC class of the sklearn.svm module was used to train the SVM. Before training, the extracted features were normalized using the StandardScaler class of the sklearn.preprocessing module.
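
A hedged sketch of that training step, assuming the stacked feature matrix X and label vector y (1 for car, 0 for notcar) have already been built; the actual Classifier class may structure this differently.

```python
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

# Normalize the features, then hold out a test split and fit a linear SVM
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2)
svc = LinearSVC()
svc.fit(X_train, y_train)
print('Test accuracy: %.4f' % svc.score(X_test, y_test))
```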

### Sliding Window Search

#### 1. Describe how (and identify where in your code) you implemented a sliding window search. How did you decide what scales to search and how much to overlap windows?

The sliding window search was performed together with classification. The functions for this operation are called find_cars, and there are three of them. find_cars_multi (code cell 17) was used with the SVM classifiers; it takes ystart_list, ystop_list, and height_list, and performs classification for all of the different scales. To make the process faster, the feature vectors of all windows are first collected in the test_images list, the classifier is run once on the whole batch, and the windows predicted to be cars are then looped over. find_cars_multi_log (code cell 16) is almost the same function as find_cars_multi and was used for debugging and logging. find_cars_cnn (code cell 18) performs the same operation, but instead of extracting features and predicting with the SVM, it predicts with the CNN.
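
A simplified, single-scale sketch of the idea behind find_cars_multi: collect the feature vectors of every window first, run the classifier once on the whole batch, and keep the windows predicted as cars. The helper extract_window_features and the window-step rule are illustrative; the real code also handles multiple scales and reuses HOG maps across windows.

```python
import numpy as np

def find_cars(img, clf, scaler, ystart, ystop, height, overlap=2):
    windows, features = [], []
    step = height // overlap                        # pixel step between windows
    for y in range(ystart, ystop - height + 1, step):
        for x in range(0, img.shape[1] - height + 1, step):
            # extract_window_features (assumed) resizes the patch to 64x64 and
            # builds the same feature vector used during training
            features.append(extract_window_features(img[y:y + height, x:x + height]))
            windows.append(((x, y), (x + height, y + height)))
    preds = clf.predict(scaler.transform(np.array(features)))  # one batch call
    return [w for w, p in zip(windows, preds) if p == 1]
```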

I decided to search at three different scales: 1.0, 1.5, and 2.0. The windows were defined with ystart and ystop values, which are given in the following table.

| Scale | Height | Ystart | Ystop | Overlap |
|---|---|---|---|---|
| 1.0 | 64 | 400 | 496 | 3 |
| 1.5 | 96 | 380 | 500 | 2 |
| 2.0 | 128 | 392 | 520 | 1 |

#### 2. Show some examples of test images to demonstrate how your pipeline is working. What did you do to optimize the performance of your classifier?

Ultimately I searched on three scales using YCrCb two-channel (0 and 1) HOG features plus spatially binned color (spatial size of 16x16) in the feature vector. Some example images are given below.

[Image: example detections on the test images]

#### 3. Convolutional Neural Network

Another classifier used in the project was a convolutional neural network. The model has two convolution layers, as follows (a hedged Keras sketch is given after the list):

  • 64 x 64 x 3 input
  • 8 x 8 filter, with 6 depth, 4 x 4 stride, activation relu
  • Dropout 0.5
  • 8 x 8 filter, with 12 depth, 4 x 4 stride
  • Flatten
  • Dropout 0.5
  • Activation relu
  • Fully connected to 2 outputs
  • Softmax
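
A Keras-2-style sketch of that architecture; the optimizer and loss shown here are assumptions, since the writeup does not state them.

```python
from keras.models import Sequential
from keras.layers import Conv2D, Dropout, Flatten, Dense, Activation

model = Sequential([
    # 64 x 64 x 3 input; 8x8 filters, depth 6, 4x4 stride, ReLU
    Conv2D(6, (8, 8), strides=(4, 4), activation='relu', input_shape=(64, 64, 3)),
    Dropout(0.5),
    # Second convolution: 8x8 filters, depth 12, 4x4 stride
    Conv2D(12, (8, 8), strides=(4, 4)),
    Flatten(),
    Dropout(0.5),
    Activation('relu'),
    # Two outputs (car / not-car) with softmax
    Dense(2),
    Activation('softmax'),
])
# Optimizer and loss are assumptions, not taken from the notebook
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```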

The convolutional neural network was also trained on the complete data set, with 20% of the data split off as a test set. The accuracy obtained was 0.9924 at epoch 50. The model is simple, but it is faster than the SVM model with HOG features.

### Video Implementation

#### 1. Provide a link to your final video output. Your pipeline should perform reasonably well on the entire project video (somewhat wobbly or unstable bounding boxes are ok as long as you are identifying the vehicles most of the time with minimal false positives.)

Project video processed with Fast SVM model

Fast SVM Model

Project video processed with Best SVM model

Best SVM Model

Project video processed with CNN model

CNN Model

Project video processed with Best SVM model (debug output)

Best SVM Model Debug

Project video processed with CNN model (debug output)

CNN Model Debug

#### 2. Describe how (and identify where in your code) you implemented some kind of filter for false positives and some method for combining overlapping bounding boxes.

I recorded the positions of positive detections in each frame of the video and created a heatmap from them. If it was the first image of the pipeline, or a single image, I thresholded that heatmap with 1. Otherwise, I combined the heatmaps of up to the previous 5 images by summing their values, and then thresholded the combined heatmap by the number of heatmaps combined multiplied by a ratio of 0.75. These thresholds were applied in the process_image function in code cell 29 and were used for the Fast model; for the Best SVM model the process_image_best function in code cell 30 was used, and for the CNN model the process_image_cnn function in code cell 37. I then used scipy.ndimage.measurements.label() to identify individual blobs in the heatmap. For each blob, I compared its x and y extents with the vehicles in the tracked list; if a blob and a vehicle intersect, the blob is assumed to be paired with that vehicle.
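
A minimal sketch of the heat-map accumulation and thresholding described above. The deque length and threshold ratio mirror the values in the text; the function name and structure are illustrative rather than the notebook's actual code.

```python
from collections import deque
import numpy as np
from scipy.ndimage.measurements import label

recent_heatmaps = deque(maxlen=5)   # heatmaps of up to the previous 5 frames

def heat_labels(frame_shape, hot_windows, ratio=0.75):
    heat = np.zeros(frame_shape[:2], dtype=np.float32)
    for (x1, y1), (x2, y2) in hot_windows:       # add heat for each detection
        heat[y1:y2, x1:x2] += 1
    recent_heatmaps.append(heat)
    combined = sum(recent_heatmaps)              # sum over the stored frames
    # Threshold: 1 for a single frame, otherwise (number of heatmaps) * ratio
    threshold = 1 if len(recent_heatmaps) == 1 else len(recent_heatmaps) * ratio
    combined[combined <= threshold] = 0          # reject weak, sporadic detections
    return label(combined)                       # blobs of the thresholded map
```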

I constructed bounding boxes to cover the area of each vehicle in the vehicle list. The vehicle list is kept by the Tracker class in code cell 22; the Vehicle class is in code cell 20. The function draw_labeled_bboxes_reverse, located in the 25th code cell, takes the image, the labels, and the tracker object as input parameters. When matching the blobs of the labels with vehicles, if a label has an area more than 1.5 times the area of the vehicle, the label is assumed to be too big for that vehicle, and another vehicle is assumed to be associated with the same label. A vehicle is assumed to be hidden if it lies completely inside the label.

If a label is not associated with any existing vehicle, it is added as a new vehicle. A vehicle is considered reliable if it has been detected at least 10 times.

If a vehicle is not detected for 5 (v_visible) consecutive frames, it is no longer shown. For reliable vehicles, the number of non-detected frames allowed is increased by 15 (last_n). If a vehicle is not detected 10 (remove_n) times, it is removed from the list; for reliable vehicles, this number is doubled.
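
A hedged sketch of the blob-to-vehicle matching rules described above. The box representation, the vehicle.box field, and the helper names are illustrative simplifications of the Tracker and Vehicle classes.

```python
def boxes_intersect(a, b):
    # Boxes are ((x1, y1), (x2, y2)); True if they overlap at all
    return not (a[1][0] < b[0][0] or b[1][0] < a[0][0] or
                a[1][1] < b[0][1] or b[1][1] < a[0][1])

def box_area(box):
    return (box[1][0] - box[0][0]) * (box[1][1] - box[0][1])

def match_label_to_vehicle(label_box, vehicle):
    # vehicle.box is assumed to hold the vehicle's last bounding box
    if not boxes_intersect(label_box, vehicle.box):
        return 'no_match'
    if box_area(label_box) > 1.5 * box_area(vehicle.box):
        return 'label_too_big'   # likely another vehicle under the same label
    return 'match'
```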

Here are six frames and their corresponding heatmaps:

[Image: six frames with their corresponding heatmaps]

Here is the output of scipy.ndimage.measurements.label() on the integrated heatmap from all six frames:

[Image: output of scipy.ndimage.measurements.label() on the integrated heatmap]

Here the resulting bounding boxes are drawn onto the last frame in the series:

[Image: resulting bounding boxes drawn onto the last frame]


### Discussion

#### 1. Briefly discuss any problems / issues you faced in your implementation of this project. Where will your pipeline likely fail? What could you do to make it more robust?

I spent a significant amount of time searching for feature extraction parameters that would let the classifier run in near real time while keeping an accuracy above 0.99, which I believe is necessary to make the pipeline reliable. However, I was not able to find such parameters. The fastest parameters I could come up with processed the project video at a rate of about 3 frames per second. I also tried a faster model with a processing rate greater than 3.5 frames per second; however, it produced false positives. More time would be needed to find a faster model that is also reliable.

I also tried to train a convolutional neural network with the same goal. I did not spend much time on it, but it seemed more promising.

This pipeline assumes that the camera position stays fixed and searches only the windows defined for this camera position. I tried it with the challenge video from the Advanced Lane Finding project; it was able to detect vehicles, but there were false positives that could be dangerous.

As future work, models with other parameters, especially the parameters obtained from Comparison 7, can be tried. In addition, other ensembles, such as ensembling three HOG channels or ensembling HOG channels from different color spaces, may perform better. To increase processing speed, all of the training data could be resized to a lower resolution. However, working with convolutional neural networks would probably lead to better results.

Process times and the models are given in the table below:

| Model | Process Time (mm:ss) | FPS | Accuracy |
|---|---|---|---|
| Fast | 7:15 | 2.80 | 99.11% |
| Best | 13:13 | 1.62 | 99.34% |
| CNN | 1:50 | 11.40 | 99.24% |

The tracking pipeline was optimized for the Fast SVM model; therefore, the performance of the other models can still be improved.
