# Continuous Control Project

## 1. Models tested

We trained a DDPG agent over a range of hyperparameter values:
- learning batch size: [50, 100, 150, 200, 250]
- number of steps between learning updates: [1, 4, 8, 12, 16, 20]
- Ornstein-Uhlenbeck noise parameters (theta, sigma): [(0.15, 0.2), (0, 0)]
- actor and critic learning rates: [(1e-4, 1e-3), (1e-3, 1e-3)]
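
To make the role of the noise parameters concrete, here is a minimal sketch of a discretised Ornstein-Uhlenbeck process in the form commonly paired with DDPG. The class name `OUNoise`, the `mu` default and the seed are illustrative assumptions, not this project's code. With (theta, sigma) = (0, 0) the process never leaves its starting value, so no exploration noise is added to the actions.

```python
import numpy as np


class OUNoise:
    """Hypothetical Ornstein-Uhlenbeck noise generator (illustrative only)."""

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, seed=0):
        self.mu = mu * np.ones(size)
        self.theta = theta
        self.sigma = sigma
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        # Restart the process at its long-run mean
        self.state = self.mu.copy()

    def sample(self):
        # x <- x + theta * (mu - x) + sigma * N(0, 1)
        pull = self.theta * (self.mu - self.state)
        shock = self.sigma * self.rng.standard_normal(self.state.shape)
        self.state = self.state + pull + shock
        return self.state


# One noise value per action dimension (4 is an assumed size for this example)
noisy = OUNoise(size=4, theta=0.15, sigma=0.2)
silent = OUNoise(size=4, theta=0.0, sigma=0.0)

noisy_sample = noisy.sample()
silent_sample = silent.sample()
```

In this sketch, `theta` controls how strongly the noise is pulled back toward `mu`, and `sigma` scales the random shock added at each step.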

### Notation

A DDPG model is identified by a string of the following form:
"[batch_size]_[nb_learning_steps]_[actor_lr]_[critic_lr]_[noise_theta]_[noise_sigma]_[actor_hidden_layers]_[critic_hidden_layers]"

For instance, "150_8_0.001_0.001_0.15_0.1_(128, 128)_(128, 128)" designates a DDPG agent trained with:
- an actor neural network with 2 fully connected hidden layers of 128 nodes each
- a critic neural network with 2 fully connected hidden layers of 128 nodes each
- a learning batch size of 150 steps sampled at a time
- a learning update every 8 steps
- an actor learning rate of 1e-3
- a critic learning rate of 1e-3
- noise parameters of (theta = 0.15, sigma = 0.1)
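
The notation can be sketched as a pair of helpers that build a tag from parameter values and parse it back. The names `build_model_tag`, `parse_model_tag` and `TAG_FIELDS` are hypothetical and are not part of `results_analysis`. The sketch assumes each value is formatted with Python's `str()` and that no field contains an underscore.

```python
import ast

# Field order follows the notation string above (names are illustrative)
TAG_FIELDS = (
    "batch_size", "nb_learning_steps", "actor_lr", "critic_lr",
    "noise_theta", "noise_sigma", "actor_hidden_layers", "critic_hidden_layers",
)


def build_model_tag(*values):
    """Join the 8 parameter values with underscores."""
    if len(values) != len(TAG_FIELDS):
        raise ValueError(f"expected {len(TAG_FIELDS)} values, got {len(values)}")
    return "_".join(str(v) for v in values)


def parse_model_tag(tag):
    """Split a tag on underscores and convert each part to a Python literal."""
    parts = tag.split("_")
    if len(parts) != len(TAG_FIELDS):
        raise ValueError(f"unexpected tag format: {tag!r}")
    return {name: ast.literal_eval(part) for name, part in zip(TAG_FIELDS, parts)}


tag = build_model_tag(150, 8, 1e-3, 1e-3, 0.15, 0.1, (128, 128), (128, 128))
params = parse_model_tag(tag)
```

Parsing with `ast.literal_eval` recovers the hidden-layer sizes as tuples without evaluating arbitrary code.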

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import results_analysis

## 2. Training results

Below we gather the results of all the training runs, across the different sets of parameters.

In [None]:
# Get results from all training runs contained in the Results directory
df_results_step, df_results, df_details = results_analysis.get_training_results()

# Get best training runs
# best_results_step finds the highest step-values among all training runs
# best_results finds the highest end-of-episode values among all training runs
best_results_step, best_results = results_analysis.get_best_runs(df_results_step, df_results, nb_runs=5)

display(best_results_step.iloc[:5])
display(best_results.iloc[:5])

## 3. Parameter analysis

To find the best model, I started from my first training runs and, for each parameter, kept the value that gave the "best" results across all runs (judged by max and mean values).

After running further training passes with the most promising combinations of parameter values, I obtained the summary table below.
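
As a rough illustration of this kind of per-parameter comparison, the sketch below computes the max and mean score for each value of each parameter with pandas. The function `stats_per_parameter`, the column names and the scores are all made up for the example. They are not results from this project, and this is not necessarily how `results_analysis.get_stats_per_parameter` works.

```python
import pandas as pd

# Made-up scores for illustration only; NOT results from this project
runs = pd.DataFrame({
    "batch_size": [50, 50, 150, 150],
    "learn_every": [1, 8, 1, 8],
    "score": [10.0, 20.0, 30.0, 36.0],
})


def stats_per_parameter(df, parameters, score_col="score"):
    """Return max and mean score for every value of every listed parameter."""
    frames = []
    for name in parameters:
        stats = df.groupby(name)[score_col].agg(["max", "mean"])
        # Label each row with (parameter name, parameter value)
        stats.index = pd.MultiIndex.from_product(
            [[name], stats.index], names=["parameter", "value"]
        )
        frames.append(stats)
    return pd.concat(frames)


summary = stats_per_parameter(runs, ["batch_size", "learn_every"])
```

Each parameter is summarised independently of the others, so this view can hide interactions between parameters. That is one reason to re-run the most promising combinations afterwards.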

In [None]:
# parameters_summary contains summary statistics for each parameter value tried
parameters_summary = results_analysis.get_stats_per_parameter(df_details)
display(parameters_summary)

## 4. Best model

I selected as the "best model" the one that reached an average score of 30 the fastest: in 782 steps.
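
The selection criterion can be sketched as "the first point at which a moving average of the scores reaches the target". The helper `first_step_reaching` is hypothetical. The default 100-score window is an assumption, and the project's own code may compute the average differently.

```python
import numpy as np


def first_step_reaching(scores, target=30.0, window=100):
    """Return how many scores had been seen when the moving average over the
    last `window` scores first reached `target`, or None if it never did."""
    scores = np.asarray(scores, dtype=float)
    if len(scores) < window:
        return None
    # Moving average over full windows, computed from cumulative sums
    csum = np.cumsum(np.insert(scores, 0, 0.0))
    means = (csum[window:] - csum[:-window]) / window
    hits = np.nonzero(means >= target)[0]
    if hits.size == 0:
        return None
    # means[i] averages scores[i : i + window], so i + window scores were seen
    return int(hits[0]) + window


# Tiny synthetic example with a window of 4 (values are made up)
example_scores = [0, 0, 0, 0, 0, 40, 40, 40, 40, 40]
reached_at = first_step_reaching(example_scores, target=30.0, window=4)
```

Applying such a helper to every run and keeping the smallest result would rank the runs by how quickly they reach the target.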

In [None]:
selected_model = best_results.index[0][1]

display(df_results[df_results['model_tag']==selected_model])

### 4.1 Training plots

In [None]:
# Plot 100-steps average scores
training_run, selected_model = best_results.index[0]
results_analysis.plot_model(df_results_step, df_results, training_run, selected_model, by_step=False, kind='bar')

In [None]:
# Plot all steps
results_analysis.plot_model(df_results_step, df_results, training_run, selected_model, by_step=True, kind='line')

### 4.2 Test

Below is a test run using the selected "best model".

In [1]:
import os 
import results_analysis

# You need to be in the root directory of the project to run the model in the next cell
os.chdir('..')
print(os.getcwd())

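try:
    selected_model = best_results.index[0][1]
except (NameError, IndexError):
    # best_results is undefined (e.g. after a kernel restart) or empty: use a hard-coded tag
    selected_model = '150_8_0.001_0.001_0.15_0.1_(128, 128)_(128, 128)'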

print('selected_model', selected_model)

C:\Users\J\Programming\Visual Studio Code\UdacityRL\ContinuousControl
selected_model 150_8_0.001_0.001_0.15_0.1_(128, 128)_(128, 128)


In [2]:
# If you get the "handle is closed" error, restart your kernel and re-run from the first cell of this Test section.
# I have not found another way to fix it.
%run -i continuous_control.py test --test_params=best_params.json --test_model="auto" --test_results_path=""

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		goal_speed -> 1.0
		goal_size -> 5.0
Unity brain name: ReacherBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 33
        Number of stacked Vector Observation: 1
        Vector Action space type: continuous
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 


Load model weights ./ModelWeights/150_8_0.001_0.001_0.15_0.1_(128, 128)_(128, 128).pth
Score: 38.27