# Statistical Analysis of Data

## Environment Settings

A statistical analysis of the captured data will be performed.

The environment configuration is the following:

- A rectangular arena measuring 2 x 1.5 meters was used.
- A custom robot similar to an e-puck was used.
- The robot starts in the middle of the arena.
- The robot moves randomly around the environment while avoiding obstacles.
- The robot has 8 sensors that measure the distance between the robot and the walls.
- Noise was added to the robot's sensor measurements through the [lookup tables](https://cyberbotics.com/doc/reference/distancesensor) of the Webots simulator. According to the Webots documentation: "The first column of the table specifies the input distances, the second column specifies the corresponding desired response values, and the third column indicates the desired standard deviation of the noise. The noise on the return value is computed according to a gaussian random number distribution whose range is calculated as a percent of the response value (two times the standard deviation is often referred to as the signal quality)". The following values were used:

    - First experiment:
        - (0, 0, 0.01)
        - (10, 10, 0.01)
    - Second experiment:
        - (0, 0, 0.2)
        - (10, 10, 0.2)
- The simulation ran for 10 minutes in fast mode, which corresponds to 12 hours of collected data.
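To make the effect of the two lookup tables concrete, here is a minimal sketch of how a noisy reading could be generated from such a table. It assumes linear interpolation between table rows and treats the third column as a standard deviation relative to the response value, following the quoted documentation. The function name `sensor_response` and the constants are hypothetical and are not part of Webots.

```python
import numpy as np

# Lookup tables listed above: (input distance, response value, relative noise std)
FIRST_EXPERIMENT = [(0, 0, 0.01), (10, 10, 0.01)]
SECOND_EXPERIMENT = [(0, 0, 0.2), (10, 10, 0.2)]


def sensor_response(distance, lookup_table, rng):
    """Return a noisy response for `distance` (illustrative sketch only)."""
    table = np.asarray(lookup_table, dtype=float)
    distances, responses, rel_noise = table[:, 0], table[:, 1], table[:, 2]
    # assumption: values between table rows are linearly interpolated
    response = np.interp(distance, distances, responses)
    rel_std = np.interp(distance, distances, rel_noise)
    # Gaussian noise whose standard deviation is a fraction of the response
    return response + rng.normal(0.0, 1.0) * rel_std * response


rng = np.random.default_rng(0)
low_noise = np.array([sensor_response(1.0, FIRST_EXPERIMENT, rng) for _ in range(5000)])
high_noise = np.array([sensor_response(1.0, SECOND_EXPERIMENT, rng) for _ in range(5000)])
```

Under these assumptions, a wall 1 m away yields readings that scatter by roughly 1 cm in the first experiment and by roughly 20 cm in the second.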

In [1]:
# Install a pip package in the current Jupyter kernel
import sys
!{sys.executable} -m pip install scikit-learn
!{sys.executable} -m pip install keras

import pandas as pd
import tensorflow as tf
import numpy as np
import math
from sklearn.ensemble import RandomForestRegressor
from keras import models
from keras import layers
from keras import regularizers
import matplotlib.pyplot as plt
from keras import optimizers



Using TensorFlow backend.


# First Experiment

In [2]:
csv_file = 'robot_info_dataset-jumped.csv'
df = pd.read_csv(csv_file)
df.head()

|  | Unnamed: 0 | x | y | theta | dx | dy | dtheta | sensor_1 | sensor_2 | sensor_3 | ... | sensor_7 | sensor_8 | dsensor_1 | dsensor_2 | dsensor_3 | dsensor_4 | dsensor_5 | dsensor_6 | dsensor_7 | dsensor_8 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.920614 | 0.761198 | 168.209483 | -0.07067 | 0.011198 | -11.790739 | 1.085179 | 0.790267 | 0.893342 | ... | 1.13979 | 1.144901 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 1 | 0.850135 | 0.775909 | 168.212418 | -0.070479 | 0.014711 | 0.002935 | 0.571635 | 0.596799 | 0.88334 | ... | 0.830057 | 1.028332 | -0.513544 | -0.193468 | -0.010002 | -0.430864 | -0.070277 | -0.387726 | -0.309733 | -0.116568 |
| 2 | 2 | 0.779657 | 0.790625 | 168.209551 | -0.070478 | 0.014716 | -0.002867 | 0.581452 | 0.904627 | 0.689004 | ... | 0.4912 | 0.88913 | 0.009817 | 0.307828 | -0.194336 | 0.239518 | 0.20648 | 0.293382 | -0.338857 | -0.139203 |
| 3 | 3 | 0.709174 | 0.80534 | 168.212871 | -0.070483 | 0.014715 | 0.003319 | 0.956302 | 0.842911 | 0.796714 | ... | 1.246415 | 0.712158 | 0.374849 | -0.061716 | 0.10771 | 0.075412 | -0.345782 | -0.084918 | 0.755215 | -0.176971 |
| 4 | 4 | 0.638698 | 0.820056 | 168.208857 | -0.070477 | 0.014716 | -0.004013 | 0.671731 | 0.779896 | 0.962191 | ... | 0.567806 | 0.595164 | -0.28457 | -0.063014 | 0.165477 | 0.005216 | 0.12815 | -0.054777 | -0.678608 | -0.116994 |


## Data pre-processing

A total of 1384848 samples were collected. Note that the file loaded here contains 65342 rows:

In [3]:
df.shape

(65342, 23)

The data set contains some null values, so the rows that hold them are dropped.

In [4]:
df = df.dropna()
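In the preview above, the only row with missing values is row 0, and the `dsensor_1` value in row 1 (-0.513544) equals the change in `sensor_1` between rows 0 and 1. The sketch below assumes the `dsensor_*` columns were built as first differences, which would explain the nulls; the toy frame `sample` is illustrative and reuses only the three `sensor_1` readings shown earlier.

```python
import pandas as pd

# sensor_1 readings copied from the first three rows of the preview
sample = pd.DataFrame({"sensor_1": [1.085179, 0.571635, 0.581452]})

# assumption: dsensor_1 is the first difference of sensor_1
sample["dsensor_1"] = sample["sensor_1"].diff()

# the first row has no predecessor, so its difference is NaN
rows_with_nulls = int(sample.isna().any(axis=1).sum())

# dropna() removes every row that contains at least one NaN
clean = sample.dropna()
```

If that assumption holds, `dropna()` discards exactly one row, which matches the count of 65341 reported after normalization.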

Next, every column is min-max normalized to the range [0, 1].

In [None]:
# min-max scaling: map each column linearly onto [0, 1]
normalized_df = (df - df.min()) / (df.max() - df.min())
normalized_df.describe()

|  | Unnamed: 0 | x | y | theta | dx | dy | dtheta | sensor_1 | sensor_2 | sensor_3 | ... | sensor_7 | sensor_8 | dsensor_1 | dsensor_2 | dsensor_3 | dsensor_4 | dsensor_5 | dsensor_6 | dsensor_7 | dsensor_8 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 65341.0 | 65341.0 | 65341.0 | 65341.0 | 65341.0 | 65341.0 | 65341.0 | 65341.0 | 65341.0 | 65341.0 | ... | 65341.0 | 65341.0 | 65341.0 | 65341.0 | 65341.0 | 65341.0 | 65341.0 | 65341.0 | 65341.0 | 65341.0 |
| mean | 0.5 | 0.498321 | 0.504753 | 0.502063 | 0.499785 | 0.500412 | 0.501624 | 0.239976 | 0.236145 | 0.261438 | ... | 0.251293 | 0.242889 | 0.449485 | 0.468761 | 0.513828 | 0.507022 | 0.519272 | 0.531825 | 0.446832 | 0.426383 |
| std | 0.288682 | 0.272549 | 0.264025 | 0.290735 | 0.353425 | 0.335002 | 0.114192 | 0.140647 | 0.14903 | 0.169722 | ... | 0.160999 | 0.143636 | 0.078247 | 0.073403 | 0.077416 | 0.078125 | 0.081846 | 0.080184 | 0.072415 | 0.07785 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 0.25 | 0.255777 | 0.269104 | 0.251332 | 0.139102 | 0.181098 | 0.496242 | 0.127116 | 0.108774 | 0.119219 | ... | 0.112714 | 0.127502 | 0.412053 | 0.436092 | 0.483323 | 0.477268 | 0.488782 | 0.501383 | 0.415117 | 0.388997 |
| 50% | 0.5 | 0.498537 | 0.503544 | 0.498631 | 0.50002 | 0.500797 | 0.501627 | 0.216759 | 0.215659 | 0.237666 | ... | 0.22488 | 0.220298 | 0.442927 | 0.467409 | 0.516314 | 0.512401 | 0.524693 | 0.534465 | 0.445713 | 0.419784 |
| 75% | 0.75 | 0.735371 | 0.740156 | 0.7524 | 0.860369 | 0.821281 | 0.506991 | 0.328127 | 0.337889 | 0.375692 | ... | 0.360337 | 0.333973 | 0.479438 | 0.498031 | 0.547039 | 0.544322 | 0.557577 | 0.565527 | 0.475271 | 0.456114 |
| max | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
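The scaling applied above computes (value - min) / (max - min) per column, which is why every `min` is 0 and every `max` is 1 in the summary. The notebook does not keep the column minima and maxima, so the following sketch shows one way to retain them and map normalized values back to their original units. The helper names `min_max_normalize` and `denormalize` and the toy frame are hypothetical.

```python
import pandas as pd


def min_max_normalize(frame):
    """Scale each column to [0, 1] and return the statistics needed to undo it."""
    col_min, col_max = frame.min(), frame.max()
    normalized = (frame - col_min) / (col_max - col_min)
    return normalized, col_min, col_max


def denormalize(normalized, col_min, col_max):
    """Invert min_max_normalize using the stored column statistics."""
    return normalized * (col_max - col_min) + col_min


# toy positions spanning the 2 x 1.5 m arena
toy = pd.DataFrame({"x": [0.0, 1.0, 2.0], "y": [0.0, 0.75, 1.5]})
toy_norm, toy_min, toy_max = min_max_normalize(toy)
toy_back = denormalize(toy_norm, toy_min, toy_max)
```

Storing `col_min` and `col_max` would also allow a model's normalized sensor predictions to be converted back to meters.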


## Input and output variables

The data will be split into training, testing and validation sets: 60% of the data will be used for training, 20% for testing and 20% for validation.

In [None]:

# split sizes: 60% train, 20% test, remaining rows (about 20%) for validation
test_size_percentage = .2
train_size_percentage = .6
ds_size = normalized_df.shape[0]
train_size = int(train_size_percentage * ds_size)
test_size = int(test_size_percentage * ds_size)

# shuffle dataset
normalized_df = normalized_df.sample(frac=1)

# separate inputs from outputs
inputs = normalized_df[['x', 'y', 'theta']]
targets = normalized_df[['sensor_1', 'sensor_2', 'sensor_3', 'sensor_4', 'sensor_5', 'sensor_6', 'sensor_7', 'sensor_8']]

# train
train_inputs = inputs[:train_size]
train_targets = targets[:train_size]

# test
test_inputs = inputs[train_size:(train_size + test_size)]
test_targets = targets[train_size:(train_size + test_size)]

# validation
validation_inputs = inputs[(train_size + test_size):]
validation_targets = targets[(train_size + test_size):]
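As a check on the slicing above, the sketch below computes the slice boundaries for the 65341 rows left after `dropna()` and shows that passing `random_state` to `DataFrame.sample` makes the shuffle repeatable. The helper `split_bounds` and the seed value are illustrative assumptions; the notebook itself shuffles without a seed.

```python
import pandas as pd


def split_bounds(n_rows, train_fraction=0.6, test_fraction=0.2):
    """Return the end indices of the train and test slices."""
    train_end = int(train_fraction * n_rows)
    test_end = train_end + int(test_fraction * n_rows)
    return train_end, test_end


n_rows = 65341  # row count reported after dropping nulls
train_end, test_end = split_bounds(n_rows)
# (train, test, validation) sizes; validation takes whatever remains
sizes = (train_end, test_end - train_end, n_rows - test_end)

# a fixed seed (arbitrary value) gives the same shuffle on every run
frame = pd.DataFrame({"x": range(10)})
shuffled_a = frame.sample(frac=1, random_state=42)
shuffled_b = frame.sample(frac=1, random_state=42)
```

Because both sizes are truncated with `int()`, the validation set ends up one row larger than the test set here.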

## Neural Network


The neural network receives the x and y coordinates and the rotation angle $\theta$ as input and outputs a sensor measurement. One model will be created per sensor.
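The one-model-per-sensor idea can be sketched as a loop over the sensor columns. This is an illustration under assumptions, not the notebook's own code: `train_per_sensor` is a hypothetical helper, `build_model` stands for any zero-argument factory that returns an object with a Keras-style `fit` method, and `targets` is assumed to be a pandas DataFrame.

```python
def train_per_sensor(build_model, inputs, targets, sensor_names, **fit_kwargs):
    """Fit one independent model per sensor column and return them by name."""
    trained = {}
    for name in sensor_names:
        # a fresh, untrained model for each sensor
        model = build_model()
        # targets[[name]] keeps a single-column frame as the regression target
        model.fit(inputs, targets[[name]], **fit_kwargs)
        trained[name] = model
    return trained
```

With a model factory and training frames in scope, a call could look like `train_per_sensor(get_model, train_inputs, train_targets, [f"sensor_{i}" for i in range(1, 9)], epochs=75)`.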

In [None]:

def get_model():
    # feed-forward network: three hidden layers (10, 6 and 3 neurons) and one linear output
    model = models.Sequential()
    model.add(layers.Dense(10, activation='relu', input_shape=(3,)))
#     model.add(layers.Dropout(0.5))
    model.add(layers.Dense(6, activation='relu'))
    model.add(layers.Dense(3, activation='relu'))
    model.add(layers.Dense(1))
    
#     rmsprop = optimizers.RMSprop(learning_rate=0.01)
    model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
              
    return model
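To give a sense of the size of this network, the number of trainable parameters can be worked out by hand: each dense layer has one weight per input-output pair plus one bias per output. The helper `dense_parameter_count` below is a hypothetical illustration and does not depend on Keras.

```python
def dense_parameter_count(layer_sizes):
    """Count weights and biases of a fully connected network."""
    return sum(
        n_in * n_out + n_out  # weight matrix plus bias vector
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# 3 inputs (x, y, theta) -> 10 -> 6 -> 3 -> 1 output
per_layer = [dense_parameter_count([a, b]) for a, b in [(3, 10), (10, 6), (6, 3), (3, 1)]]
total = dense_parameter_count([3, 10, 6, 3, 1])
```

The layers contribute 40, 66, 21 and 4 parameters, for a total of 131, so each per-sensor model is very small compared with the tens of thousands of available samples.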

In [None]:
model = get_model()

# fit on the training split only, so the test and validation sets stay unseen
history = model.fit(train_inputs, train_targets[['sensor_7']], epochs=75, batch_size=1, verbose=1)
history.history['mae']
model.save("nn_sensor_7.h5")

Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Use tf.cast instead.
Epoch 1/75
Epoch 2/75
Epoch 3/75
Epoch 4/75
Epoch 5/75
Epoch 6/75
Epoch 7/75
Epoch 9/75
Epoch 10/75
Epoch 11/75
Epoch 17/75
Epoch 18/75
Epoch 19/75
Epoch 20/75
Epoch 22/75
Epoch 23/75
Epoch 24/75
Epoch 25/75
Epoch 26/75
Epoch 28/75
Epoch 29/75
Epoch 30/75
Epoch 31/75
Epoch 32/75
Epoch 33/75
Epoch 34/75
Epoch 35/75
Epoch 36/75
Epoch 37/75
Epoch 38/75
Epoch 40/75
Epoch 41/75