# Behavioral Cloning Notes

### Lesson 15

#### Recovery Laps
If you drive and record normal laps around the track, even if you record a lot of them, it might not be enough to train your model to drive properly.

Here's the problem: if your training data is all focused on driving down the middle of the road, your model won't ever learn what to do if it gets off to the side of the road. And when you run your model to predict steering measurements, things probably won't go perfectly and the car will wander off to the side of the road at some point.

So you need to teach your car what to do when it's off on the side of the road.

One approach might be to constantly wander off the side of the road and then steer back to the middle.

A better approach is to only record data when the car is driving from the side of the road back toward the center line.

So as the human driver, you're still weaving back and forth between the middle of the road and the shoulder, but you need to turn off data recording when you weave out to the side and turn it back on when you steer back to the middle.

#### Driving Counter-Clockwise
Track one has a left turn bias. If you only drive around the first track in a clockwise direction, the data will be biased towards left turns. One way to combat the bias is to turn the car around and record counter-clockwise laps around the track. Driving counter-clockwise is also like giving the model a new track to learn from, so the model will generalize better.
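
One way to check for this bias before training is to count how many recorded steering angles point each way. The sketch below is only an illustration: it assumes negative angles mean left turns and positive angles mean right turns (verify this against your own simulator data), and the function name `steering_bias`, the `tolerance` parameter, and the sample angles are all hypothetical.

```python
def steering_bias(angles, tolerance=0.01):
    """Count left, straight, and right samples in a list of steering angles.

    Assumption: negative angles steer left, positive angles steer right.
    Angles within `tolerance` of zero are counted as straight.
    """
    left = sum(1 for a in angles if a < -tolerance)
    right = sum(1 for a in angles if a > tolerance)
    straight = len(angles) - left - right
    return {'left': left, 'straight': straight, 'right': right}


# Made-up angles for illustration only
example_angles = [-0.2, -0.1, 0.0, -0.3, 0.15, 0.0, -0.05]
counts = steering_bias(example_angles)
print(counts)
```

A strongly lopsided count is a hint that laps in the opposite direction would help balance the data.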

#### Using Both Tracks
If you end up using data from only one track, the CNN could essentially memorize the track. Consider using data from both track one and track two to make a more generalized model.

#### Collecting Enough Data
How do you know when you have collected enough data? Machine learning involves trying out ideas and testing them to see if they work. If the model is overfitting or underfitting, then try to figure out why and adjust accordingly.

Since this model outputs a single continuous numeric value, one appropriate error metric would be mean squared error. If the mean squared error is high on both a training and validation set, the model is underfitting. If the mean squared error is low on a training set but high on a validation set, the model is overfitting. Collecting more data can help improve a model when the model is overfitting.
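
The rule of thumb above can be written down as a small helper. This is a rough sketch, not a definitive test: the function name `diagnose_fit` and the `high` threshold are hypothetical, and what counts as a "high" mean squared error depends on your data.

```python
def diagnose_fit(train_mse, val_mse, high=0.05):
    """Roughly classify a model from its training and validation MSE.

    `high` is an illustrative threshold, not a recommended value.
    """
    train_high = train_mse > high
    val_high = val_mse > high
    if train_high and val_high:
        return 'underfitting'   # high error on both sets
    if not train_high and val_high:
        return 'overfitting'    # low training error, high validation error
    if not train_high and not val_high:
        return 'ok'             # low error on both sets
    return 'inconclusive'       # high training error but low validation error


print(diagnose_fit(0.20, 0.25))
print(diagnose_fit(0.01, 0.20))
print(diagnose_fit(0.01, 0.02))
```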

What if the model has a low mean squared error on both the training and validation sets, but the car is falling off the track?

Try to figure out the cases where the vehicle is falling off the track. Does it occur only on turns? Then maybe it's important to collect more turning data. The vehicle's driving behavior is only as good as the behavior of the driver who provided the data.

Here are some general guidelines for data collection:

* two or three laps of center lane driving
* one lap of recovery driving from the sides
* one lap focusing on driving smoothly around curves

### Lesson 16: Visualizing Loss

#### Outputting Training and Validation Loss Metrics
In Keras, the `model.fit()` and `model.fit_generator()` methods have a `verbose` parameter that tells Keras to output loss metrics as the model trains. The `verbose` parameter can optionally be set to `verbose = 1` or `verbose = 2`.

Setting `model.fit(verbose=1)` will

* output a progress bar in the terminal as the model trains
* output the loss metric on the training set as the model trains
* output the loss on the training and validation sets after each epoch

With `model.fit(verbose = 2)`, Keras will only output the loss on the training set and validation set after each epoch.

#### Model History Object
When you call `model.fit()` or `model.fit_generator()`, Keras returns a history object that contains the training and validation loss for each epoch. Here is an example of how you can use the history object to visualize the loss:

![Verbose Output](examples/verbose.png)

******

```
from keras.models import Model
import matplotlib.pyplot as plt

# The argument names below (samples_per_epoch, nb_val_samples, nb_epoch)
# belong to the Keras 1.x API
history_object = model.fit_generator(train_generator,
    samples_per_epoch=len(train_samples),
    validation_data=validation_generator,
    nb_val_samples=len(validation_samples),
    nb_epoch=5, verbose=1)

### print the keys contained in the history object
print(history_object.history.keys())

### plot the training and validation loss for each epoch
plt.plot(history_object.history['loss'])
plt.plot(history_object.history['val_loss'])
plt.title('model mean squared error loss')
plt.ylabel('mean squared error loss')
plt.xlabel('epoch')
plt.legend(['training set', 'validation set'], loc='upper right')
plt.show()
```

******
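
Besides plotting, the same per-epoch lists can be inspected directly. The sketch below uses a plain dictionary as a stand-in for `history_object.history`; the loss values are made up and the helper name `best_epoch` is hypothetical.

```python
# Stand-in for history_object.history; the numbers are made up
history = {
    'loss': [0.050, 0.031, 0.024, 0.020, 0.018],
    'val_loss': [0.045, 0.033, 0.029, 0.030, 0.034],
}


def best_epoch(history):
    """Return the 1-based epoch with the lowest validation loss."""
    val_loss = history['val_loss']
    return val_loss.index(min(val_loss)) + 1


print(best_epoch(history))
```

In this made-up run the training loss keeps falling while the validation loss starts rising after the third epoch, which is the overfitting pattern described in Lesson 15.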

### Lesson 17: Generators

#### How to Use Generators
The images captured in the car simulator are much larger than the images encountered in the Traffic Sign Classifier Project: 320 x 160 x 3 compared to 32 x 32 x 3. Storing 10,000 traffic sign images would take about 30 MB, but storing 10,000 simulator images would take over 1.5 GB. That's a lot of memory! Not to mention that preprocessing can change the data type from an `int` to a `float`, which can increase the size of the data by a factor of 4.
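
The arithmetic behind those figures can be checked with a few lines of Python. This sketch assumes one byte per pixel value for the raw images (8-bit integers) and four bytes per value after conversion to 32-bit floats; the helper name `dataset_bytes` is hypothetical.

```python
def dataset_bytes(num_images, width, height, channels, bytes_per_value=1):
    """Memory needed to hold all images at once, in bytes."""
    return num_images * width * height * channels * bytes_per_value


sign_bytes = dataset_bytes(10000, 32, 32, 3)            # traffic sign images
sim_bytes = dataset_bytes(10000, 320, 160, 3)           # simulator images
sim_float_bytes = dataset_bytes(10000, 320, 160, 3,
                                bytes_per_value=4)      # after int -> float

print(sign_bytes / 1e6, 'MB')
print(sim_bytes / 1e9, 'GB')
print(sim_float_bytes / 1e9, 'GB')
```

Under these assumptions the three totals come to roughly 30.7 MB, 1.54 GB, and 6.14 GB.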

Generators can be a great way to work with large amounts of data. Instead of storing the preprocessed data in memory all at once, a generator lets you pull pieces of the data and process them on the fly only when you need them, which is much more memory-efficient.

A generator is like a coroutine, a process that can run separately from another main routine, which makes it a useful Python function. Instead of using `return`, the generator uses `yield`, which still returns the desired output values but saves the current values of all the generator's variables. When the generator is called a second time, it restarts right after the `yield` statement, with all its variables set to the same values as before.
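
A minimal illustration of this resume-after-`yield` behavior, using a hypothetical `countdown` generator that is not part of the project code:

```python
def countdown(start):
    """Yield start, start - 1, ..., 1, keeping `current` between calls."""
    current = start
    while current > 0:
        yield current
        # Execution resumes here on the next call to next()
        current -= 1


gen = countdown(3)
first = next(gen)
second = next(gen)
third = next(gen)
print(first, second, third)
```

Each call to `next()` picks up where the previous one stopped, so `current` keeps its value between calls.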

Below is a short quiz using a generator. This generator appends a new Fibonacci number to its list every time it is called. To pass, simply modify the generator's `yield` so it returns the list instead of `1`. As a result, we can get the first 10 Fibonacci numbers simply by calling our generator 10 times. If we need to do something else for a while, we can do that and then call the generator again whenever we need more Fibonacci numbers.

******
```
def fibonacci():
    numbers = []  # the list of Fibonacci numbers generated so far
    while True:
        if len(numbers) < 2:
            numbers.append(1)
        else:
            numbers.append(numbers[-1] + numbers[-2])
        yield 1
        # change this line so it yields its list instead of 1

our_generator = fibonacci()
my_output = []

for i in range(10):
    my_output = (next(our_generator))
    
print(my_output)
```
******
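
One possible solution to the quiz is sketched below. The list is named `numbers` here to avoid shadowing Python's built-in `list`; that naming is a choice made for this sketch.

```python
def fibonacci():
    numbers = []
    while True:
        if len(numbers) < 2:
            numbers.append(1)
        else:
            numbers.append(numbers[-1] + numbers[-2])
        # Yield the list itself; the same list object grows by one
        # entry every time the generator is resumed
        yield numbers


our_generator = fibonacci()
my_output = []

for i in range(10):
    my_output = next(our_generator)

print(my_output)
```

After ten calls, `my_output` holds the first ten Fibonacci numbers.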

Here is an example of how you could use a generator to load data and preprocess it on the fly, in batch-sized portions, to feed into your Behavioral Cloning model.

******
```
import csv
import random

import cv2
import numpy as np
import sklearn.utils
from keras.models import Sequential
from keras.layers import Lambda
from sklearn.model_selection import train_test_split

# Read every row of the driving log (paths and measurements only, not images)
samples = []
with open('./driving_log.csv') as csvfile:
    reader = csv.reader(csvfile)
    for line in reader:
        samples.append(line)

train_samples, validation_samples = train_test_split(samples, test_size=0.2)

def generator(samples, batch_size=32):
    num_samples = len(samples)
    while True: # Loop forever so the generator never terminates
        random.shuffle(samples)  # reshuffle in place at the start of each pass
        for offset in range(0, num_samples, batch_size):
            batch_samples = samples[offset:offset+batch_size]

            images = []
            angles = []
            for batch_sample in batch_samples:
                # Column 0 is read as the center image path, column 3 as the angle
                name = './IMG/'+batch_sample[0].split('/')[-1]
                center_image = cv2.imread(name)
                center_angle = float(batch_sample[3])
                images.append(center_image)
                angles.append(center_angle)

            # trim image to only see section with road
            # (placeholder: no trimming is done here, but the model below
            # expects trimmed 80 x 320 images)
            X_train = np.array(images)
            y_train = np.array(angles)
            yield sklearn.utils.shuffle(X_train, y_train)

# compile and train the model using the generator function
train_generator = generator(train_samples, batch_size=32)
validation_generator = generator(validation_samples, batch_size=32)

row, col, ch = 80, 320, 3  # Trimmed image format

model = Sequential()
# Preprocess incoming data, centered around zero with small standard deviation
# Assumes channels-last ordering (rows, columns, channels), as returned by cv2.imread
model.add(Lambda(lambda x: x/127.5 - 1.,
        input_shape=(row, col, ch),
        output_shape=(row, col, ch)))
# ... finish defining the rest of your model architecture here ...

model.compile(loss='mse', optimizer='adam')
model.fit_generator(train_generator,
            samples_per_epoch=len(train_samples),
            validation_data=validation_generator,
            nb_val_samples=len(validation_samples), nb_epoch=3)
```
******
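
To see the batching logic in isolation, here is a stripped-down sketch with no shuffling, image loading, or Keras. The name `batch_generator` and the toy sample list are hypothetical; integers stand in for rows of the driving log.

```python
from itertools import islice


def batch_generator(samples, batch_size=32):
    """Yield consecutive batches forever (no shuffling or image loading)."""
    num_samples = len(samples)
    while True:
        for offset in range(0, num_samples, batch_size):
            # The final slice of each pass may be shorter than batch_size
            yield samples[offset:offset + batch_size]


toy_samples = list(range(10))
# Take only the first four batches; the generator itself never terminates
batches = list(islice(batch_generator(toy_samples, batch_size=4), 4))
print(batches)
```

The third batch is shorter because 10 is not a multiple of 4, and the fourth batch starts a new pass over the data.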

### Lesson 18: Recording Video in Autonomous Mode

#### Recording Video in Autonomous Mode

Because your hardware setup might be different from a reviewer's hardware setup, driving behavior could be different on your machine than on the reviewer's. To help with reviewing your submission, we require that you submit a video recording of your vehicle driving autonomously around the track. The video should include at least one full lap around the track. Keep in mind the rubric specifications:

> "No tire may leave the drivable portion of the track surface. The car may not pop up onto ledges or roll over any surfaces that would otherwise be considered unsafe (if humans were in the vehicle)."

In the GitHub repo, we have included a file called `video.py`, which can be used to create the video recording when in autonomous mode.

The README file in the GitHub repo contains instructions on how to make the video recording. Here are the instructions as well:

```
python drive.py model.h5 run1
```

The fourth argument, `run1`, is the directory in which to save the images seen by the agent. If the directory already exists, it will be overwritten.

```
ls run1

[2017-01-09 16:10:23 EST]  12KiB 2017_01_09_21_10_23_424.jpg
[2017-01-09 16:10:23 EST]  12KiB 2017_01_09_21_10_23_451.jpg
[2017-01-09 16:10:23 EST]  12KiB 2017_01_09_21_10_23_477.jpg
```

The image file name is a timestamp of when the image was seen. This information is used by `video.py` to create a chronological video of the agent driving.
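
To illustrate why timestamped names are enough to recover the frame order, the sketch below parses file names like the ones listed above and sorts them by time. It is not taken from `video.py` and may not match how that script works; the helper name `frame_time` and the assumed format `%Y_%m_%d_%H_%M_%S_%f` are inferred from the example file names.

```python
from datetime import datetime
import os


def frame_time(filename):
    """Parse a frame file name such as 2017_01_09_21_10_23_424.jpg."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    # Assumed layout: year_month_day_hour_minute_second_milliseconds
    return datetime.strptime(stem, '%Y_%m_%d_%H_%M_%S_%f')


frames = [
    '2017_01_09_21_10_23_477.jpg',
    '2017_01_09_21_10_23_424.jpg',
    '2017_01_09_21_10_23_451.jpg',
]
ordered = sorted(frames, key=frame_time)
print(ordered)
```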

#### Using video.py
```
python video.py run1
```
This creates a video from the images found in the `run1` directory. The name of the video will be the name of the directory followed by `.mp4`, so in this case the video will be `run1.mp4`.

Optionally, you can specify the FPS (frames per second) of the video:

```
python video.py run1 --fps 48
```

The video will run at 48 FPS. The default FPS is 60.
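
The FPS setting determines how long the resulting video plays for a given number of recorded frames. A quick sketch of that arithmetic, with a hypothetical helper `video_seconds` and a made-up frame count:

```python
def video_seconds(num_frames, fps=60):
    """Playback length in seconds for a given frame count and frame rate."""
    return num_frames / fps


# 1440 frames is a made-up count for illustration
at_48 = video_seconds(1440, fps=48)
at_default = video_seconds(1440)
print(at_48, at_default)
```

The same set of images plays back more slowly, and therefore for longer, at 48 FPS than at the default 60 FPS.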