# Follow Me Project Write-up

In this project I was required to train a neural network to identify a target ("hero") from a drone with a camera within a simulated urban environment. The following image is an example of the drone following the hero within the environment. This is the behaviour that I'm aiming to achieve in this project.

<img src="files/writeup_files/quad_following.png">

The steps taken to complete the project were:

1. Setting up a local environment with the RoboND Quad Simulator specific for this project.

2. Setting up an AWS environment from an Amazon Machine Image (AMI). This environment was used for all coding, training and testing of the model.

3. Collecting data from the simulator to train the network. This step is optional because of the extensive training dataset provided by Udacity.

4. Building a neural network.

5. Training the network and extracting the final model and weights from the AWS environment.

6. Testing the model with the Follow Me simulator.

## Step 1: Local environment

There were three components to setting up the local environment:

1. Installing the required Python tools for the RoboND environment - this was however already complete from earlier lessons
2. Installing the QuadSim simulation software
3. Downloading the project data

The QuadSim simulation software was downloaded from a [Udacity RoboND repository on GitHub.](https://github.com/udacity/RoboND-DeepLearning-Project/releases/latest)

The project data was then downloaded from an AWS S3 hosted Udacity data repository. The data downloaded was: [Training Data](https://s3-us-west-1.amazonaws.com/udacity-robotics/Deep+Learning+Data/Lab/train.zip), [Validation Data](https://s3-us-west-1.amazonaws.com/udacity-robotics/Deep+Learning+Data/Lab/validation.zip) and [Sample Evaluation Data](https://s3-us-west-1.amazonaws.com/udacity-robotics/Deep+Learning+Data/Project/sample_evaluation_data.zip).

## Step 2: AWS remote environment

In this project it is possible to train the neural network on a local PC. Students from earlier cohorts, however, found that training the model to a level that earns a passing submission could take more than 12 hours. As a solution Udacity has encouraged students to use AWS EC2 cloud computing services, which offer far more suitable (i.e. GPU) computing resources. Student credits were also available to cover the training.

To establish my AWS environment I created an account, claimed my credits and launched a **p2.xlarge** EC2 instance. I launched this instance with the **Udacity Robotics Deep Learning Laboratory** AMI. This AMI has pre-loaded all the tools required to train a neural network with TensorFlow on GPU hardware.

As part of establishing my remote environment I also forked Udacity's RoboND-DeepLearning-Project repository to my own GitHub account ([Michael Hetherington](https://github.com/michaelhetherington/RoboND-DeepLearning-Project)). I then uploaded all the project data downloaded in Step 1 to my DeepLearning-Project repository.

I found this was an effective method to fully manage my project remotely in the cloud. All communication between my AWS EC2 instance and my GitHub repository could then be managed from the command line. I could further synchronise the repository with my local PC, which enabled me to upload screenshots from the local simulator.

Using the guidance from the course notes I established a Jupyter Notebook service on my AWS EC2 instance that could be accessed through a web browser on a local PC. The command to initiate the Jupyter Notebook service is: 
```shell
jupyter notebook --ip='*' --port=8888 --no-browser
```

## Step 3: Collecting data from the simulator

In previous cohorts collecting data was a substantial task. In later cohorts, mine included, Udacity has provided links to download training, validation and sample evaluation data from an AWS S3 hosted data repository. The download of this data was described in Step 1 and its upload to my GitHub repository in Step 2.

The data provided contained:
- 4,131 training images
- 1,184 validation images
- 1,134 sample evaluation images (542 following, 270 patrol_non_target and 322 patrol_w_targ)

I subsequently collected an additional 2,125 training images and 1,206 validation images from the quad simulator. I used this data to supplement the Udacity data in the last few model training runs I undertook. This additional data increases the training set by ~50% (4,131 to 6,256 images) and roughly doubles the validation set (1,184 to 2,390 images).

## Step 4: Building a neural network

A Fully Convolutional Network (FCN) is adopted for this project because it preserves spatial information throughout the entire network as the convolutions are applied.

For example, if we were trying to identify an apple, and the apple was the sole object in an image, then we could use a typical convolutional network with only encoder blocks and a fully connected block. This set-up is illustrated in the following image, with the three leftmost blocks being the encoders and the far right block the fully connected layer:

<img src="files/writeup_files/Conv_diagram.png">

An FCN, however, also features the same number of decoder blocks as encoder blocks, with a 1x1 convolution taking the place of the fully connected layer between them. These decoder blocks effectively upscale the output and allow the neural network to preserve the spatial information of the input image. This is illustrated in the following image, with the three decoder blocks shown to the right of the middle layer, which was the fully connected layer in the above example.

<img src="files/writeup_files/FCN_diagram.png">


In the project template file Udacity has provided the code to build the FCN layers. The layers provided are:

- Separable Convolutions;
- 1x1 Regular Convolution; and
- Bilinear Upsampling.

The Separable Convolutions are used throughout the encoder and decoder blocks because they need far fewer parameters than regular convolutions. The 1x1 Regular Convolution sits between the encoder and decoder blocks in place of a fully connected layer, while the Bilinear Upsampling layer is used exclusively within the decoder blocks.
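
The parameter saving from separable convolutions can be illustrated with a little arithmetic. The sketch below is not part of the project template; it assumes a 3x3 kernel, a depth multiplier of 1 and no bias terms, and the layer sizes (64 input channels, 128 output channels) are picked only for illustration.

```python
def conv_params(kernel_size, in_channels, out_channels):
    """Weights in a regular 2D convolution (biases ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def separable_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a depthwise separable convolution (biases ignored)."""
    depthwise = kernel_size * kernel_size * in_channels  # one k x k filter per input channel
    pointwise = in_channels * out_channels               # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative layer (assumed sizes): 3x3 kernel, 64 channels in, 128 channels out
regular = conv_params(3, 64, 128)
separable = separable_conv_params(3, 64, 128)
print(f"regular: {regular}, separable: {separable}, ratio: {regular / separable:.1f}x")
```

For this assumed layer the separable version needs 8,768 weights against 73,728 for the regular convolution, roughly an 8x reduction.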

Once the layers were prepared the next step was to build the model. 

The initial step was to create an Encoder Block and a Decoder Block. From these blocks I constructed the model by combining several copies of each with a 1x1 convolution layer in between. In testing the model I varied the number of blocks used as well as the filter depths within the blocks. This fine tuning was required to achieve better final scores from the trained network.

The code for the Encoder, Decoder and constructed model are shown here successively:

**Encoder Block**
```python 
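def encoder_block(input_layer, filters, strides):
    # A single separable convolution followed by batch normalisation
    # (separable_conv2d_batchnorm comes from the project template)
    output_layer = separable_conv2d_batchnorm(input_layer, filters, strides)

    return output_layer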
```

**Decoder Block**
```python
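def decoder_block(small_ip_layer, large_ip_layer, filters):
    # Upsample the smaller (deeper) layer so it matches the larger layer's size
    upsample = bilinear_upsample(small_ip_layer)

    # Skip connection: concatenate the upsampled layer with the larger layer
    concat = layers.concatenate([upsample, large_ip_layer])

    # Use the depth passed in via `filters`; a hard-coded value would ignore the argument
    output_layer = separable_conv2d_batchnorm(concat, filters=filters, strides=1)

    return output_layer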
```
*With this encoder block I tried using up to three separable convolution layers but noticed no real difference in the final score for my trained model.

**Constructed Model**
```python
def fcn_model(inputs, num_classes):

    # Encoder: five separable convolutions, each halving height and width (strides=2)
    encode_1 = separable_conv2d_batchnorm(inputs, filters=64, strides=2)
    encode_2 = separable_conv2d_batchnorm(encode_1, filters=128, strides=2)
    encode_3 = separable_conv2d_batchnorm(encode_2, filters=256, strides=2)
    encode_4 = separable_conv2d_batchnorm(encode_3, filters=512, strides=2)
    encode_5 = separable_conv2d_batchnorm(encode_4, filters=1024, strides=2)

    # 1x1 convolution between the encoder and the decoder
    conv_1x1 = conv2d_batchnorm(encode_5, filters=2048, kernel_size=1, strides=1)

    # Decoder: each block upsamples and concatenates the matching encoder output
    decode_1 = decoder_block(conv_1x1, encode_4, filters=1024)
    decode_2 = decoder_block(decode_1, encode_3, filters=512)
    decode_3 = decoder_block(decode_2, encode_2, filters=256)
    decode_4 = decoder_block(decode_3, encode_1, filters=128)
    x = decoder_block(decode_4, inputs, filters=64)

    # Per-pixel class probabilities at the input resolution
    return layers.Conv2D(num_classes, 1, activation='softmax', padding='same')(x)
```

*This is the second revision ("Revised 2") of the original model I constructed. The table below shows the variations in filter depth I used for each model. Every model retained the same number of encoder and decoder blocks (5 each).

| Model | # Blocks | Block Strides | Block Depth | 1x1 Conv Block Depth |
| :-: | :-: | :-: | :-: | :-: |
| Original| 5 | 2 | (8,16,32,64,128) | 256 |
| Revised | 5 | 2 | (16,32,64,128,256) | 512 |
| Revised 2 | 5 | 2 | (64,128,256,512,1024) | 2048 |
| Revised 3 | 5 | 2 | (128,256,512,1024,2048) | 4096 |
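
The five-encoder, five-decoder layout can be sanity-checked by tracing the feature-map size through the network. This is a rough sketch rather than project code: it assumes a 160 x 160 input image, 'same' padding in the strided convolutions and a 2x bilinear upsampling step, none of which is stated above. `decode_5` stands for the final decoder output, called `x` in the model code.

```python
def trace_spatial_sizes(input_size, num_blocks=5, stride=2):
    """Side length of the feature map after each encoder and decoder block."""
    sizes = {"inputs": input_size}
    size = input_size
    for i in range(1, num_blocks + 1):
        size = -(-size // stride)  # 'same' padding gives ceil(size / stride)
        sizes[f"encode_{i}"] = size
    sizes["conv_1x1"] = size  # a 1x1 convolution with stride 1 keeps the size
    for i in range(1, num_blocks + 1):
        size *= 2  # assumed 2x upsampling in every decoder block
        sizes[f"decode_{i}"] = size
    return sizes


sizes = trace_spatial_sizes(160)  # 160 is an assumed input side length
for name, side in sizes.items():
    print(f"{name:>9}: {side} x {side}")

# Each decoder output must match the layer it is concatenated with
assert sizes["decode_1"] == sizes["encode_4"]
assert sizes["decode_5"] == sizes["inputs"]
```

Under these assumptions the feature map shrinks to 5 x 5 at the 1x1 convolution and returns to 160 x 160 at the output. An input side that is not a multiple of 32 would fail the matching checks.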

## Step 5: Training the neural network and extracting the model weights

Once the model was constructed in the previous step, a number of hyperparameters had to be chosen to control how the model trained. While Udacity provided recommended starting values, it was necessary to fine tune these parameters across a number of training and evaluation runs to achieve better final scores for the project. Below is a description of each of the hyperparameters used in this project.

### Hyperparameters

- **learning_rate**: step size the optimizer uses when updating the network weights.
- **batch_size**: number of training samples/images that get propagated through the network in a single pass.
- **num_epochs**: number of times the entire training dataset gets propagated through the network.
- **steps_per_epoch**: number of batches of training images that go through the network in one epoch. Udacity provides a default value; a recommended value to try is the total number of images in the training dataset divided by the batch_size.
- **validation_steps**: number of batches of validation images that go through the network in one epoch. This is similar to steps_per_epoch, except that it applies to the validation dataset. Udacity provides a default value for this as well.
- **workers**: maximum number of processes to spin up. This can affect training speed and depends on the hardware. Udacity provides a recommended value to work with.

#### Batch Size and Steps per Epoch
I selected the Batch Size and Steps per Epoch values such that their product would roughly equate to the quantity of training images I was using. I found however that this didn't necessarily make much of a difference and that raising the Steps per Epoch would produce better results than raising the Batch Size.

Increasing the Batch Size and Steps per Epoch could have positive or negative effects on the final score of the model but invariably increased the training time.
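
The "images divided by batch size" rule of thumb is easy to tabulate. The helper below is a minimal sketch; `steps_for` is a name made up for this example, and the image counts are the ones listed in Step 3.

```python
import math


def steps_for(num_images, batch_size):
    """Number of batches needed to see every image once per epoch."""
    return math.ceil(num_images / batch_size)


udacity_train = 4131                # Udacity training images
supplemented_train = 4131 + 2125    # plus my additional simulator images

for batch_size in (16, 32, 64):
    print(
        f"batch_size={batch_size}: "
        f"{steps_for(udacity_train, batch_size)} steps (Udacity), "
        f"{steps_for(supplemented_train, batch_size)} steps (supplemented)"
    )
```

With a batch size of 32 this gives 130 steps for the Udacity data, so a setting of 400 steps per epoch draws 12,800 images per epoch, roughly three times the 4,131 Udacity training images. The same helper applies to validation_steps when given the validation image count.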

#### Number of Epochs
Initially I started using a low Number of Epochs to get a feel for how well the hyperparameters performed. Generally, if the hyperparameters were producing good results, then I could achieve a good final score even with just a few epochs. 

Once I'd found a set of hyperparameters I was comfortable with I would increase the Number of Epochs in an attempt to reduce training and validation loss. In some instances, however, training would finish on an epoch with a sudden jump in validation loss, which I believe indicates that the final weights were poorly matched to the task.
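
One common guard against finishing on a bad epoch is to keep the weights from the epoch with the lowest validation loss and to stop once the loss has not improved for a few epochs. The sketch below shows only the selection logic in plain Python. The loss values are made up for illustration and are not taken from my runs, and `pick_epoch` and `patience` are names chosen for this example.

```python
def pick_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 1-indexed, for a simple patience rule."""
    best_loss = float("inf")
    best_idx = 0
    for idx, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_idx = loss, idx   # new best: remember this epoch
        elif idx - best_idx >= patience:
            return idx, best_idx              # no improvement for `patience` epochs
    return len(val_losses), best_idx


# Made-up validation losses, ending with a sudden jump like the one described above
val_losses = [0.090, 0.060, 0.045, 0.050, 0.041, 0.048, 0.052, 0.120]
stop_epoch, best_epoch = pick_epoch(val_losses, patience=3)
print(f"stop after epoch {stop_epoch}, keep weights from epoch {best_epoch}")
```

Keras provides `EarlyStopping` and `ModelCheckpoint` callbacks built on the same idea, which could be passed to the training call instead of hand-rolling the logic.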

#### Learning Rate
I found that varying Learning Rate could have a substantial impact upon model effectiveness. 

With a relatively low Learning Rate (e.g. 0.001) the model would take quite a few epochs at the beginning to settle down to a training loss <0.1 but once there would progressively minimise loss without large jumps. 

Overall I discovered that a Learning Rate of 0.1 produced the most consistent and highest results. Even with a relatively low number of epochs (e.g. 5) a Learning Rate of 0.1 could produce a good result.

Reducing the Learning Rate generally slowed the model's convergence. Higher Learning Rates, such as 0.1, reached a low loss in far fewer epochs than low rates like 0.001.

#### Validation Steps
This hyperparameter I generally left untouched and set at 50. I would sometimes change it such that when it was multiplied with Batch Size it would roughly equal the number of images in the validation dataset.

#### Workers
I was unsure how many workers I could allocate to the model training and unsure how to write code that would report how many workers were actually in use. After starting with 10 and 20, I set this at 50 on the assumption that this maximum would exceed the capacity of the AWS instance I was using and thus would be fastest. If this assumption was wrong it wouldn't have a detrimental effect on the training output.
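
One way to ground this setting is to ask Python how many CPU cores the instance exposes. The snippet below is a small sketch using only the standard library; treating one worker per core as a sensible ceiling is my assumption, not a recommendation from the course notes.

```python
import os

# Logical CPU cores visible to Python; os.cpu_count() may return None
cpu_cores = os.cpu_count() or 1

# Assumption: one data-loading worker per core is a reasonable upper bound
suggested_workers = cpu_cores
print(f"CPU cores: {cpu_cores}, suggested workers: {suggested_workers}")
```

This reports the cores available rather than the workers actually started, so it only bounds the value worth trying.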

### Summary of Training Runs
The below table summarises the hyperparameters used and the final scores achieved for a series of runs I undertook to train the model. For each run I've also designated the model used (i.e. the original or one of three revisions) and the dataset used. 

Initially I conducted all my training on the Udacity provided training dataset, as I'd read on the Slack channels that this was comprehensive and should be more than adequate to achieve a final score of 0.4 or greater as required to complete the project. However, I was unable to achieve a final score above 0.38. This seemed to be an upper limit that I was unable to overcome. I then supplemented the Udacity dataset with my own data collected from the Quad simulator. This supplementary data increased my training dataset by ~50% and roughly doubled my validation dataset. Unfortunately I've still been unable to tune my model to a point where it can achieve a final score of 0.4. Regardless, with my best model, I can perform the simulation test and the quad copter will locate and follow the target without issues.

**Note to reviewer: I'm seeking assistance with improving my model to achieve a final_score of >=0.4 for a passing grade**

| Run No. | Final Score | Learning Rate | Batch Size | Num Epochs | Steps per Epoch | Val Steps | Workers | Model | Data Set |    
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
|1| **0.29** | 0.05 | 32 | 3 | 200 | 50 | 10 | Original | Udacity |
|2| **0.34** | 0.1 | 64 | 4 | 400 | 50 | 10 | Original | Udacity |
|3| **0.01** | 0.1 | 128 | 3 | 256 | 64 | 20 | Original | Udacity |
|4| **0.30** | 0.05 | 64 | 3 | 400 | 50 | 20 | Original | Udacity |
|5| **0.25** | 0.05 | 16 | 5 | 256 | 50 | 20 | Original | Udacity |
|6| **0.16** | 0.1 | 32 | 5 | 256 | 70 | 20 | Original | Udacity |
|7| **0.29** | 0.15 | 64 | 5 | 400 | 50 | 10 | Original | Udacity |
|8| **0.31** | 0.0015 | 20 | 5 | 200 | 55 | 10 | Revised | Udacity |
|9| **0.36** | 0.1 | 32 | 12 | 200 | 50 | 50 | Revised | Udacity |
|10| **0.38** | 0.1 | 32 | 15 | 400 | 50 | 50 | Revised_2 | Udacity |
|11| **0.25** | 0.1 | 32 | 15 | 400 | 50 | 50 | Revised_3 | Udacity |
|12| **0.21** | 0.001 | 32 | 15 | 200 | 50 | 50 | Revised_3 | Udacity |
|13| **0.35** | 0.1 | 32 | 30 | 400 | 50 | 50 | Revised_3 | Udacity |
|14| **0.31** | 0.1 | 16 | 30 | 250 | 50 | 50 | Revised_2 | Udacity |
|15| **0.16** | 0.1 | 32 | 15 | 500 | 50 | 50 | Revised_2 | Udacity |
|16| **0.30** | 0.1 | 32 | 4 | 200 | 50 | 50 | Revised_2 | Udacity |
|17| **0.31** | 0.1 | 32 | 4 | 300 | 50 | 50 | Revised_2 | Udacity |
|18| **0.25** | 0.1 | 32 | 4 | 400 | 50 | 50 | Revised_2 | Udacity |
|19| **0.37** | 0.1 | 32 | 45 | 400 | 50 | 50 | Revised_2 | Udacity |
|20| **0.36** | 0.1 | 32 | 15 | 400 | 50 | 50 | Revised_2 | Udacity_supplemented |
|21| **0.34** | 0.15 | 64 | 15 | 200 | 50 | 50 | Revised_2 | Udacity_supplemented |
|22| **0.37** | 0.1 | 32 | 15 | 400 | 75 | 50 | Revised_2 | Udacity_supplemented |
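
To make the table easier to read by model, the final scores can be grouped and summarised. This is a quick sketch using the run numbers and scores copied from the table above. The runs differ in several hyperparameters at once, so it is not a controlled comparison.

```python
# {model: {run number: final score}}, copied from the table of training runs
scores = {
    "Original":  {1: 0.29, 2: 0.34, 3: 0.01, 4: 0.30, 5: 0.25, 6: 0.16, 7: 0.29},
    "Revised":   {8: 0.31, 9: 0.36},
    "Revised_2": {10: 0.38, 14: 0.31, 15: 0.16, 16: 0.30, 17: 0.31, 18: 0.25,
                  19: 0.37, 20: 0.36, 21: 0.34, 22: 0.37},
    "Revised_3": {11: 0.25, 12: 0.21, 13: 0.35},
}

summary = {}
for model, runs in scores.items():
    best_run = max(runs, key=runs.get)            # run with the highest final score
    mean_score = sum(runs.values()) / len(runs)   # average over this model's runs
    summary[model] = (best_run, runs[best_run], round(mean_score, 2))
    print(f"{model:>9}: best {runs[best_run]:.2f} (run {best_run}), mean {mean_score:.2f}")
```

The best score for each model rises from Original (0.34) through Revised (0.36) to Revised_2 (0.38), then falls back to 0.35 for Revised_3.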

As the training progresses the cross-entropy loss between the model and the training/validation datasets is plotted for each epoch. This provides insight into how effectively the model is training given the input hyperparameters. The image below shows the final loss plot from Run_22 with a total of 15 epochs.

<img src="files/writeup_files/Run_22_loss_curves.png">

Throughout this model training it is evident that there is significant and varying divergence between the training and validation losses from epoch to epoch. While the training loss generally reduces over time, there is no clear pattern or value that the validation loss tends toward. One concern I've had, and which may be a cause of the loss divergence, is that my model is overfitting the data. If the model is overfitting it will effectively be learning the training data too well and won't be able to accurately classify new images it hasn't been presented with previously.

If the model were presented with a different target to track - for example a dog, cat, car, etc. - I believe this model structure would be adequate for identifying the target. This would *only* be possible however if a different, target appropriate, dataset were provided. It would be impossible for the current model to learn to track a dog by using the provided training and validation data which only features humans. There would be further limits on this model if it were also trying to identify and then track a target within an extremely dense environment such as a crowd at a rock concert.

Some key observations I made as I progressively tuned and re-ran my model training include:

- I think I had problems with over-fitting which was resulting in wildly varying training and validation loss values
- I’ve reduced learning rate to 0.001 and given more epochs to allow for the slow convergence to a low train_loss that occurs in the first few epochs
- I’ve noted that even with a low learning rate (0.001) and many epochs (20) the training and validation loss can diverge and converge between later stage epochs suggesting that there may be a more optimal approach to determining how many epochs to use
- It’s not clear how to ensure over-fitting doesn’t occur from employing too many epochs
- I also increased the number of filters/parameters in the network in an attempt to get large changes in final score. This doesn't seem to have had a linear effect; sometimes the final score is lower for the same hyperparameters
- Once the number of filters/parameters was increased this greatly reduced the rate of loss convergence while the learning rate was retained at 0.001
- High learning rates (0.1) converge fast but also result in seemingly random loss divergence across epochs

### Future Enhancements

For future enhancements the following items could be considered:
- Using a different optimizer such as Nadam
- Trying different activation functions other than *softmax*
- Redesigning how skip-layer connections are constructed. I feel that this is not working effectively in this current design
- Varying the number of encode/decode blocks present within the model
- Defining a new function that automatically runs a series of training runs and systematically varies the hyperparameters between them
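
The last item could look something like the sketch below, which tries every combination in a grid and ranks the results. `train_and_score` is a hypothetical callable that would train the FCN with the given hyperparameters and return its final score. The `dummy_train_and_score` stand-in only returns an arbitrary number so that the example runs; it says nothing about which values actually work best.

```python
import itertools


def run_grid(train_and_score, grid):
    """Call train_and_score(**params) for every combination in grid; best score first."""
    names = list(grid)
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((train_and_score(**params), params))
    return sorted(results, key=lambda item: item[0], reverse=True)


def dummy_train_and_score(learning_rate, batch_size, num_epochs):
    # Hypothetical placeholder: a real version would train the model and return final_score
    return -abs(learning_rate - 0.01) - abs(batch_size - 32) / 1000 - 1 / num_epochs


grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [16, 32, 64],
    "num_epochs": [5, 15],
}
ranking = run_grid(dummy_train_and_score, grid)
print(f"{len(ranking)} runs, best parameters: {ranking[0][1]}")
```

With real training each combination could take a long time on the GPU, so in practice the grid would need to stay small or be sampled at random.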

One outstanding question which I will continue to research and seek to understand is: 'Does it necessarily follow that the model is getting better after each epoch? Or are the divergent losses indicating a reduction in model effectiveness?'

## Step 6: Testing the model with Follow Me simulator

Once the model was trained and the weights file was created it was time to test how well it could identify the "hero" within the Follow Me simulator.

Because I'd trained a number of models (actually in excess of the 22 I have files for) I was able to compare how well models with high and low final scores found and followed the "hero". I found that:
- for the model from Run_15, with a final score of just 0.16, the quad can find the hero but generally can't maintain tracking and will quickly lose the target. After a long period however the quad managed to lock on to the target and continue to follow it without loss; and
- for the model from Run_10, with a final score of 0.38, the quad can easily find the target and then lock on to it without losing it at all

To demonstrate the success of my model, I provide below screenshots of my quad following the target and processing camera images using the model produced from **Run_10**.

The following images are screenshots of:
- the quad copter following the target within the simulator; and
- the camera view and model classification output

<img src="files/writeup_files/quad_follow_sim.png">

<img src="files/writeup_files/quad_follow_output.png">