# Hyperparameter Study

In [1]:
PATH = '07_no_clip_restart'


## 1) Tau without Regularization

1. We test the policy's sensitivity with respect to the temperature parameter $\tau$.
2. Initially, $\tau = 100$; it then decays linearly with the number of episodes over the exploration phase (`EXPLORE_EPISODES = 475` in the parameters below) down to its final value `TAU`.
3. Each test DataFrame holds the `DataFrame.describe()` statistics of **N** = 30 independent random trials, each consisting of `M = 100` rollouts, with the final $\tau$ set to a predetermined value.

Parameters:
```
ALPHA = 0.5
BETA = 0.3
TAU = 1.0   # Final TAU; only active if EXPLORE=True
EXPLORE_EPISODES = 475
EPISODES = 500
EXPLORE = True  # Whether or not we use exploration
RESTART = True
```
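A minimal sketch of the annealing schedule described above, assuming a plain linear interpolation from $\tau = 100$ down to the final `TAU` over `EXPLORE_EPISODES` episodes, after which $\tau$ stays constant. The helper name `tau_schedule` is hypothetical; its defaults are taken from the parameter block, not from the training code.

```python
def tau_schedule(episode: int,
                 tau_start: float = 100.0,
                 tau_final: float = 1.0,
                 explore_episodes: int = 475) -> float:
    """Linearly anneal the temperature from tau_start to tau_final.

    Hypothetical helper: assumes a linear decay over the first
    `explore_episodes` episodes and a constant tau_final afterwards.
    """
    if episode >= explore_episodes:
        return tau_final
    fraction = episode / explore_episodes  # 0.0 at the start, approaches 1.0 at the end
    return tau_start + fraction * (tau_final - tau_start)


if __name__ == "__main__":
    for ep in (0, 100, 200, 475, 500):
        print(ep, round(tau_schedule(ep), 3))
```

Once exploration ends, every remaining episode (476 to 500) runs at the final temperature.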

## 1.1) Out-of-sample simulations

### 1.1.1) Tau=1 and Seed=0

![t01s00](07_no_clip_restart/00_tau01/00/simulation-seed00.gif)

### 1.1.2) Tau=1 and Seed=1

![t01s01](07_no_clip_restart/00_tau01/01/simulation-seed01.gif)

### 1.1.3) Tau=5 and Seed=1

![t05s01](07_no_clip_restart/03_tau05/01/simulation-seed01.gif)

## 1.2) Leaderboard

In [2]:
import pandas as pd

tau01_df = pd.read_csv(PATH + '/00_tau01/02/pipeline.csv', sep=',', index_col=0)
tau02_df = pd.read_csv(PATH + '/01_tau02/02/pipeline.csv', sep=',', index_col=0)
tau03_df = pd.read_csv(PATH + '/02_tau03/02/pipeline.csv', sep=',', index_col=0)
tau05_df = pd.read_csv(PATH + '/03_tau05/02/pipeline.csv', sep=',', index_col=0)
tau10_df = pd.read_csv(PATH + '/04_tau10/02/pipeline.csv', sep=',', index_col=0)

def describe(dataframe: pd.DataFrame, label: str) -> pd.DataFrame:
    """Summarize the per-trial average returns across trials.

    Parameters
    ----------
    dataframe: pd.DataFrame
        The output of ``df.describe()`` for N independent trials, each
        summarizing M samples: trials are in the columns and the
        describe() statistics are in the rows.
    label: str
        Name given to the resulting column.

    Returns
    -------
    pd.DataFrame
        A one-column frame with the describe() statistics of the
        per-trial average return.
    """
    df = dataframe.drop(['std', 'count', '25%', '50%', '75%'], axis=0)
    ts = df.T.describe()['mean']
    ts.name = label
    return ts.to_frame()
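`describe` ends up using only the per-trial `mean` row, so it is a describe-of-means. The sketch below checks this on synthetic data (5 trials of 100 normally distributed returns, not the experiment's data) and compares it with a hypothetical one-liner, `describe_means`, which should produce the same table.

```python
import numpy as np
import pandas as pd


def describe(dataframe: pd.DataFrame, label: str) -> pd.DataFrame:
    """Same logic as the notebook cell: summarize the per-trial 'mean' row."""
    df = dataframe.drop(['std', 'count', '25%', '50%', '75%'], axis=0)
    ts = df.T.describe()['mean']
    ts.name = label
    return ts.to_frame()


def describe_means(dataframe: pd.DataFrame, label: str) -> pd.DataFrame:
    """Hypothetical equivalent: describe the 'mean' row directly."""
    return dataframe.loc['mean'].describe().rename(label).to_frame()


# Synthetic stand-in for pipeline.csv: 5 trials (columns) x 100 returns each.
rng = np.random.default_rng(0)
returns = pd.DataFrame(rng.normal(-0.4, 0.1, size=(100, 5)),
                       columns=[str(i) for i in range(5)])
pipeline = returns.describe()  # rows: count, mean, std, min, 25%, 50%, 75%, max

summary = describe(pipeline, label='toy')
pd.testing.assert_frame_equal(summary, describe_means(pipeline, label='toy'))
print(summary)
```

In the resulting column, `count` is the number of trials and `max` is the best per-trial average return, which is how the leaderboard's `max` row should be read.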

In [3]:
tau01_df.T.describe()

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| count | 30.0 | 30.0 | 30.0 | 30.0 | 30.0 | 30.0 | 30.0 | 30.0 |
| mean | 100.0 | -0.826094 | 0.364042 | -1.664242 | -1.083374 | -0.807345 | -0.532816 | -0.222745 |
| std | 0.0 | 1.341297 | 0.882586 | 2.83496 | 2.08777 | 1.303901 | 0.609076 | 0.43185 |
| min | 100.0 | -7.614286 | 0.0 | -16.444468 | -11.901067 | -7.377226 | -2.995352 | -1.899361 |
| 25% | 100.0 | -0.709591 | 0.098551 | -1.587146 | -0.866781 | -0.660018 | -0.526641 | -0.198577 |
| 50% | 100.0 | -0.516601 | 0.207527 | -1.052941 | -0.637413 | -0.492935 | -0.346795 | -0.057756 |
| 75% | 100.0 | -0.356736 | 0.307522 | -0.745237 | -0.416555 | -0.339049 | -0.223996 | -0.026103 |
| max | 100.0 | -0.127377 | 4.982876 | -0.4588 | -0.145477 | -0.115342 | -0.07752 | -0.005976 |


In [4]:
dataframes = []
dataframes.append(describe(tau01_df, label='tau01'))
dataframes.append(describe(tau02_df, label='tau02'))
dataframes.append(describe(tau03_df, label='tau03'))
dataframes.append(describe(tau05_df, label='tau05'))
dataframes.append(describe(tau10_df, label='tau10'))
noregdf = pd.concat(dataframes, axis=1)
noregdf

| | tau01 | tau02 | tau03 | tau05 | tau10 |
|---|---|---|---|---|---|
| count | 30.0 | 30.0 | 30.0 | 30.0 | 30.0 |
| mean | -0.826094 | -0.479021 | -0.442162 | -0.345316 | -0.37029 |
| std | 1.341297 | 0.304314 | 0.29002 | 0.119567 | 0.090621 |
| min | -7.614286 | -1.469546 | -1.139234 | -0.714682 | -0.565207 |
| 25% | -0.709591 | -0.509073 | -0.50144 | -0.410162 | -0.412021 |
| 50% | -0.516601 | -0.378446 | -0.315756 | -0.321267 | -0.325997 |
| 75% | -0.356736 | -0.26734 | -0.253503 | -0.261022 | -0.302905 |
| max | -0.127377 | -0.168934 | -0.171007 | -0.159647 | -0.250193 |


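1. The trials suggest a non-monotonic relationship between $\tau$ and performance: the mean return improves steadily from $\tau = 1$ to $\tau = 5$ and degrades slightly at $\tau = 10$ (an inverted U in return, i.e. a U in cost).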
2. $\tau = 5$ performs best in terms of the mean return across trials.
3. However, the single best-performing policy (max) was trained with $\tau = 1$, suggesting that even this simple scenario is highly sensitive to the initial positions of the landmark and to the induced topology of the gradient-based parameter search.
4. Moreover, within the upper quartile, the 7th-best $\tau = 1$ policy still outranks those from $\tau = 5$, indicating better upper-tail behaviour.
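Claims about the k-th best policy require ranking individual trials across settings, which the aggregated leaderboard does not show. A minimal sketch, assuming each `pipeline.csv` frame has trials in the columns and a `mean` row with each trial's average return (higher is better); `rank_policies` is a hypothetical helper and the toy numbers are made up.

```python
import pandas as pd


def rank_policies(frames: dict) -> pd.DataFrame:
    """Pool the trials of several tau settings and rank them by mean return.

    `frames` maps a setting label (e.g. 'tau01') to a pipeline-style frame
    whose columns are trials and whose 'mean' row holds each trial's
    average return (higher is better).
    """
    pooled = pd.concat(
        {label: frame.loc['mean'] for label, frame in frames.items()},
        names=['setting', 'trial'],
    )
    ranked = pooled.sort_values(ascending=False).rename('mean').reset_index()
    ranked['rank'] = range(1, len(ranked) + 1)  # 1 = best policy overall
    return ranked


# Toy example with made-up numbers: two settings, two trials each.
toy = {
    'tau01': pd.DataFrame({'a': [-0.9, 0.1], 'b': [-0.2, 0.1]}, index=['mean', 'std']),
    'tau05': pd.DataFrame({'c': [-0.3, 0.1], 'd': [-0.4, 0.1]}, index=['mean', 'std']),
}
ranked = rank_policies(toy)
print(ranked)
```

With the real data, one would pass the unregularized `tau01_df` and `tau05_df` (loaded from `07_no_clip_restart`) and read off the overall rank of each setting's 7th-best trial.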

## 1.3) In-Sample Simulations


### 1.3.1) Best rollout (Tau=1)

> tau01_df['64']
```
count    100.000000
mean      -0.127377
std        0.096640
min       -0.669727
25%       -0.145477
50%       -0.115342
75%       -0.077520
max       -0.019049
Name: 64, dtype: float64
```

![t01s64](07_no_clip_restart/00_tau01/02/simulation-pipeline-best.gif)

### 1.3.2) Best rollout of the best-mean setting (Tau=5)

> tau05_df['67']
```
count    100.000000
mean      -0.159647
std        0.080587
min       -0.469426
25%       -0.202637
50%       -0.149833
75%       -0.108500
max       -0.014875
Name: 67, dtype: float64
```

![t05s67](07_no_clip_restart/03_tau05/02/simulation-pipeline-best.gif)


## 2) Tau with Regularization


1. We further regularize the variables $\delta_t$ and $\omega_t$ by applying parameter clipping.

2. All other parameters keep their values from Section 1.

Parameters:
```
ALPHA = 0.5
BETA = 0.3
TAU = 1.0   # Final TAU; only active if EXPLORE=True
EXPLORE_EPISODES = 475
EPISODES = 500
EXPLORE = True  # Whether or not we use exploration
RESTART = True
```
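Parameter clipping bounds each variable to a fixed interval after it is updated. The sketch below shows the idea with `numpy.clip`, assuming symmetric bounds $[-b, b]$ applied element-wise after a generic additive update. The helper `clip_parameters`, the bound `b = 1.0` and the update vectors are hypothetical placeholders; the bounds used in the `08_clipping_restart` runs are not stated here.

```python
import numpy as np


def clip_parameters(delta: np.ndarray, omega: np.ndarray,
                    bound: float = 1.0) -> tuple:
    """Clip both arrays element-wise into [-bound, bound].

    `bound` is a hypothetical placeholder, not the value used in the runs.
    """
    return np.clip(delta, -bound, bound), np.clip(omega, -bound, bound)


# Hypothetical update step: an additive update followed by clipping.
delta = np.array([0.5, -0.8, 0.9])
omega = np.array([-0.1, 0.7])
update_delta = np.array([0.2, -0.6, 0.4])
update_omega = np.array([0.0, 0.6])

delta, omega = clip_parameters(delta + update_delta, omega + update_omega, bound=1.0)
print(delta, omega)
```

Entries that leave the interval are pulled back to its edge, while entries already inside it are left unchanged.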

In [5]:
PATH = '08_clipping_restart'
tau01_df = pd.read_csv(PATH + '/00_tau01/02/pipeline.csv', sep=',', index_col=0)
tau02_df = pd.read_csv(PATH + '/01_tau02/02/pipeline.csv', sep=',', index_col=0)
tau03_df = pd.read_csv(PATH + '/02_tau03/02/pipeline.csv', sep=',', index_col=0)
tau05_df = pd.read_csv(PATH + '/03_tau05/02/pipeline.csv', sep=',', index_col=0)
tau10_df = pd.read_csv(PATH + '/04_tau10/02/pipeline.csv', sep=',', index_col=0)


In [6]:
dataframes = []
dataframes.append(describe(tau01_df, label='tau01'))
dataframes.append(describe(tau02_df, label='tau02'))
dataframes.append(describe(tau03_df, label='tau03'))
dataframes.append(describe(tau05_df, label='tau05'))
dataframes.append(describe(tau10_df, label='tau10'))
regdf = pd.concat(dataframes, axis=1)
regdf

| | tau01 | tau02 | tau03 | tau05 | tau10 |
|---|---|---|---|---|---|
| count | 30.0 | 30.0 | 30.0 | 30.0 | 30.0 |
| mean | -0.506445 | -0.386484 | -0.331849 | -0.34209 | -0.417652 |
| std | 0.235331 | 0.160066 | 0.094523 | 0.071565 | 0.090178 |
| min | -1.014563 | -0.737667 | -0.643118 | -0.500263 | -0.59855 |
| 25% | -0.72075 | -0.450062 | -0.373464 | -0.387874 | -0.497402 |
| 50% | -0.559567 | -0.338805 | -0.320137 | -0.344609 | -0.414975 |
| 75% | -0.298594 | -0.268294 | -0.266351 | -0.271376 | -0.339129 |
| max | -0.121159 | -0.171241 | -0.175994 | -0.237887 | -0.281009 |


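We find that regularization improves the average return for $\tau \le 5$ (only marginally for $\tau = 5$, while it worsens it for $\tau = 10$) and reduces the standard deviation across trials for every $\tau$, most markedly for $\tau = 1$ (from 1.34 to 0.24).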

In [7]:
df = pd.merge(
    noregdf,
    regdf,
    how='inner',
    left_index=True,
    right_index=True,
    suffixes=('_noreg', '_regul'),
).T.sort_index()

In [8]:
df

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| tau01_noreg | 30.0 | -0.826094 | 1.341297 | -7.614286 | -0.709591 | -0.516601 | -0.356736 | -0.127377 |
| tau01_regul | 30.0 | -0.506445 | 0.235331 | -1.014563 | -0.72075 | -0.559567 | -0.298594 | -0.121159 |
| tau02_noreg | 30.0 | -0.479021 | 0.304314 | -1.469546 | -0.509073 | -0.378446 | -0.26734 | -0.168934 |
| tau02_regul | 30.0 | -0.386484 | 0.160066 | -0.737667 | -0.450062 | -0.338805 | -0.268294 | -0.171241 |
| tau03_noreg | 30.0 | -0.442162 | 0.29002 | -1.139234 | -0.50144 | -0.315756 | -0.253503 | -0.171007 |
| tau03_regul | 30.0 | -0.331849 | 0.094523 | -0.643118 | -0.373464 | -0.320137 | -0.266351 | -0.175994 |
| tau05_noreg | 30.0 | -0.345316 | 0.119567 | -0.714682 | -0.410162 | -0.321267 | -0.261022 | -0.159647 |
| tau05_regul | 30.0 | -0.34209 | 0.071565 | -0.500263 | -0.387874 | -0.344609 | -0.271376 | -0.237887 |
| tau10_noreg | 30.0 | -0.37029 | 0.090621 | -0.565207 | -0.412021 | -0.325997 | -0.302905 | -0.250193 |
| tau10_regul | 30.0 | -0.417652 | 0.090178 | -0.59855 | -0.497402 | -0.414975 | -0.339129 | -0.281009 |
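The paired rows of the merged table can be reduced to one change per $\tau$. The sketch below computes regularized minus unregularized values for `mean` and `std`, assuming row labels of the form `<tau>_noreg` / `<tau>_regul` as produced by the merge. The helper `regularization_effect` is hypothetical, and the numbers are copied from the table above.

```python
import pandas as pd


def regularization_effect(summary: pd.DataFrame,
                          columns=('mean', 'std')) -> pd.DataFrame:
    """Per-tau change (regularized minus unregularized) of selected statistics.

    Assumes row labels like 'tau01_noreg' / 'tau01_regul', as produced by
    pd.merge(..., suffixes=('_noreg', '_regul')).T
    """
    cols = list(columns)

    def strip_suffix(label: str) -> str:
        return label.rsplit('_', 1)[0]  # 'tau01_noreg' -> 'tau01'

    noreg = summary.filter(like='_noreg', axis=0)[cols].rename(index=strip_suffix)
    regul = summary.filter(like='_regul', axis=0)[cols].rename(index=strip_suffix)
    return regul - noreg  # aligned on the shared 'tauXX' labels


# Mean and std copied from the merged table above.
summary = pd.DataFrame(
    {
        'mean': [-0.826094, -0.506445, -0.479021, -0.386484, -0.442162,
                 -0.331849, -0.345316, -0.34209, -0.37029, -0.417652],
        'std': [1.341297, 0.235331, 0.304314, 0.160066, 0.29002,
                0.094523, 0.119567, 0.071565, 0.090621, 0.090178],
    },
    index=['tau01_noreg', 'tau01_regul', 'tau02_noreg', 'tau02_regul',
           'tau03_noreg', 'tau03_regul', 'tau05_noreg', 'tau05_regul',
           'tau10_noreg', 'tau10_regul'],
)
effect = regularization_effect(summary)
print(effect.round(4))
```

A positive `mean` change means a higher (better) average return under clipping; a negative `std` change means less spread across trials.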
