## Data Exploration

I created the sample data by driving around the track in both directions, 3-4 laps each way. I controlled speed/acceleration with the keyboard and steering with the mouse, since earlier attempts with keyboard steering produced steering angles that were not smooth but very spiky. Mouse input gave much smoother steering angles.

The recorded data consists of the paths to the left, center and right images, the steering angle (label), and general information such as throttle, brake and speed. All three images are used for CNN training, as in NVIDIA's approach.
The final model only uses the center image to determine a steering angle.

Additionally, the vehicle needs throttle to actually drive. The steering angle depends on the speed of the vehicle at the moment of steering: higher speed results in a larger turning radius. The data was recorded at top speed (~30mph), so the dependence of steering on speed can be ignored.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import cv2
import numpy as np
from sklearn.utils import shuffle

In [None]:
# I had 4 separate recordings, so merge them all into one dataframe
columns = ['C', 'L', 'R', 'Steering', 'Throttle', 'Break', 'Speed']

frames = []
for run in ['1', '2', '3', '4']:
    run_df = pd.read_csv(f'{run}/driving_log.csv', header=None, names=columns)
    # adjust path: keep only the file name (the log stores backslash-separated
    # paths) and point it at this recording's IMG folder
    for col in ['C', 'L', 'R']:
        run_df[col] = [f'{run}/IMG/' + p.split("\\")[-1] for p in run_df[col]]
    frames.append(run_df)

df = pd.concat(frames)
print(f'Total Observations: {len(df)}')

As can be seen, there are a total of 21323 center images available. This count could later be doubled by flipping the images horizontally and negating their labels. Because flipping mirrors every angle, it is enough to check the distribution of the absolute steering angles. Additionally, with the ImageDataGenerator by Keras we can apply some augmentation during training so that the model generalizes better.
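
The flipping idea can be sketched as follows. This is only an illustration on a tiny synthetic array, not the code used for training; `flip_sample` is a hypothetical helper name, and the image is assumed to be an array of shape (height, width, channels).

```python
import numpy as np

def flip_sample(image, angle):
    """Mirror an image left-to-right and negate its steering angle."""
    # Axis 1 is the width axis for an array of shape (height, width, channels)
    return image[:, ::-1, :].copy(), -angle

# Tiny synthetic "image" (2 x 3 pixels, 3 channels) standing in for a camera frame
image = np.arange(18, dtype=np.uint8).reshape(2, 3, 3)
flipped, flipped_angle = flip_sample(image, 0.15)
```

A right curve with angle 0.15 becomes a left curve with angle -0.15, which is why the sign of the label has to change together with the image.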

In [None]:
plt.hist(df['Steering'].abs(), bins=150)
plt.title('Distribution of Steering Angles')
plt.show()

In [None]:
cols = 3
rows = 3

fig, ax = plt.subplots(rows, cols, dpi=160, figsize=(8,4))
ax = ax.ravel()
for i in range(cols*rows):
    
    # copy so cv2.putText gets a writable array (imread may return a read-only one)
    img = mpimg.imread(df.iloc[i*1500]['C']).copy()
    angle = str(round(df.iloc[i*1500]['Steering'],6))
    cv2.putText(img, angle,(50, 120),cv2.FONT_HERSHEY_SIMPLEX,1,(0, 255, 0),3)
    
    ax[i].axis('off')
    ax[i].imshow(img)

plt.show()

After a short display of some images with their labels, it can clearly be seen that left curves require a negative steering angle, whereas right curves require a positive one. This information can be used to take the left and right images into account when training the model, because those images are taken from a shifted viewpoint and look towards the outer edges of the road. With a correction of ~0.25 the training data would triple and also feature additional perspectives.
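
As a minimal sketch of that correction, the labels for the three cameras can be derived from the recorded center angle like this. `camera_labels` and `CORRECTION` are hypothetical names used only for illustration; the value 0.25 is the approximate correction mentioned above.

```python
CORRECTION = 0.25  # assumed correction, taken from the ~0.25 mentioned in the text

def camera_labels(center_angle, correction=CORRECTION):
    """Return the steering label for the left, center and right camera image."""
    return {
        'L': center_angle + correction,  # left view: steer further to the right
        'C': center_angle,               # center view: keep the recorded angle
        'R': center_angle - correction,  # right view: steer further to the left
    }

# Example: a slight left curve recorded with the center camera
labels = camera_labels(-0.1)
```

The left image gets a more positive angle and the right image a more negative one, so both side views push the car back towards the center of the road.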

As the histogram shows, the absolute steering angle ranges from 0 to 1, where 1 represents a 25° angle. The data is heavily skewed towards 0, so some kind of sampling needs to be applied to represent curves adequately. I therefore create a linspace between 0 and 1 with 200 points and sort each observation into the resulting buckets. Afterwards I take a random sample of at most 150 observations per bin (all of them if the bin holds fewer) and store their indices in a list. This list is then used to select the corresponding rows of the dataframe.

In [None]:
# bins
bins = 200
max_n_samples = 150
space = np.linspace(0.0, 1.0, bins)

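# dict mapping each bin's lower edge to the positions of its observations
d = {key:[] for key in space}
l = [abs(i) for i in df['Steering']]

for i in range(len(l)):    
    for c in range(bins-1):
        # the last bin also includes its upper edge, so |angle| == 1.0 is kept
        if space[c] <= l[i] < space[c+1] or (c == bins-2 and l[i] == space[c+1]):
            d[space[c]].append(i)

# stores the selected positional indices (used with df.iloc)
inds = []

for key in d.keys():
    if len(d[key]) >= max_n_samples:
        # sample without replacement so no observation is picked twice
        for i in np.random.choice(d[key], max_n_samples, replace=False):
            inds.append(i)
    else:
        for i in d[key]:
            inds.append(i)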
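
The same bucketing can also be written without the nested loop. The following is a sketch under assumptions, not the cell that produced the data: `balanced_positions` is a hypothetical helper, it is run on synthetic angles, and it uses `np.digitize` with a seeded generator so the result is reproducible.

```python
import numpy as np

def balanced_positions(angles, n_edges=200, max_per_bin=150, seed=0):
    """Return row positions with at most `max_per_bin` samples per |angle| bin."""
    abs_angles = np.abs(np.asarray(angles, dtype=float))
    edges = np.linspace(0.0, 1.0, n_edges)  # n_edges points give n_edges - 1 bins
    # np.digitize yields 1-based bin numbers; clip so |angle| == 1.0 lands in the last bin
    bin_ids = np.clip(np.digitize(abs_angles, edges) - 1, 0, n_edges - 2)
    rng = np.random.default_rng(seed)
    selected = []
    for b in np.unique(bin_ids):
        members = np.flatnonzero(bin_ids == b)
        if len(members) > max_per_bin:
            # draw without replacement so no row is selected twice
            members = rng.choice(members, size=max_per_bin, replace=False)
        selected.extend(members.tolist())
    return selected

# Synthetic data: many straight-driving samples plus some uniformly spread curves
data_rng = np.random.default_rng(1)
angles = np.concatenate([np.zeros(1000), data_rng.uniform(-1, 1, 500)])
positions = balanced_positions(angles)
```

The returned positions can be used with `df.iloc` in the same way as `inds`.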

Next I create the dataframe for the ImageDataGenerator. It contains all the images selected from the bins above. The steps are:
- take the left, center and right image
- add .25, 0 and -.25 to the respective steering angle to create an incentive to go back to the center of the road
- crop each image to rows 50:130, i.e. along the vertical axis
- save the images and the dataframe so they can be uploaded to the server
- shuffle the dataframe

In [None]:
steerings = []
counter = 0
direction = ['L', 'C', 'R']
direction_corr = [.25, 0, -.25]

for index in inds:
    row = df.iloc[index]
    for cam, corr in zip(direction, direction_corr):
        # keep only rows 50:130 of the camera image
        img = mpimg.imread(row[cam])[50:130,:,:]
        path = f'imgs_test/img_{counter}_{cam}.jpg'
        mpimg.imsave(path, img)
        # shift the label of the side cameras so they steer back to the center
        steerings.append([path, row['Steering'] + corr])
        counter += 1

newdf = pd.DataFrame(steerings, columns=['path', 'label'])
shuffle(newdf).to_csv('imgs/log.csv', index=False)

In [None]:
cols = 3
rows = 3

fig, ax = plt.subplots(rows, cols, dpi=160, figsize=(8,2))
ax = ax.ravel()
for i in range(cols*rows):
    
    # copy so cv2.putText gets a writable array (imread may return a read-only one)
    img = mpimg.imread(newdf.iloc[i*1000]['path']).copy()
    angle = str(round(newdf.iloc[i*1000]['label'],6))
    
    cv2.putText(img, angle,(50, 50),cv2.FONT_HERSHEY_SIMPLEX,1,(0, 255, 0),3)
    
    ax[i].axis('off')
    ax[i].imshow(img)

plt.show()

After saving all the images in a separate directory, I have around 14000 images to start training the model.