## Creating a simple dataset to train a simple autoencoder

In this notebook I build a simple dataset for training a first version of a simple autoencoder.

In [35]:
import numpy as np
import random
import pandas as pd

Let's define the fixed parameters:

- M_max: maximum mass that the box can take
- N: number of shots per episode
- S: total number of episodes generated
- mu: friction coefficient
- g: gravitational acceleration

In [36]:
M_max = 50
N = 2
S = 50000
mu = 0.75
g = 9.8

## Creating a first sample
In this section I create a simple sample of what one instance of the dataset will look like. After this, it is just a matter of wrapping the code in a loop and converting the result to a pandas DataFrame.

In [23]:
# First we define the initial variables and initialize the corresponding lists.
x = [] # List to store the displacement of the box in each step of the episode
v = [] # List to store the velocities of the bullet in each step of the episode
m = [] # List to store the masses of the bullet in each step of the episode
m1 = 1.5 # First allowed value for the mass of the bullet
m2 = 1.5 # Second allowed value for the mass of the bullet (equal to m1 in this first sample)
v1 = 2 # First allowed value for the velocity of the bullet
v2 = 6 # Second allowed value for the velocity of the bullet
M = random.uniform(0.1, M_max) # Randomly initialize the mass of the box
cum_x = 0 # Variable to store the cumulative sum of the displacements, i.e. the current position of the box
# Now we create a loop to generate the sample
for i in range(0, N):
    # We choose a random velocity for the bullet 
    if bool(random.getrandbits(1)):
        v.append(v1)
    else:
        v.append(v2)
    # We choose a random mass for the bullet:
    if bool(random.getrandbits(1)):
        m.append(m1)
    else:
        m.append(m2)
    M = M + m[i]
    V = v[i]
    delta_x=0.5*g*m[i]*V**2/(M*mu)
    cum_x = cum_x + delta_x
    x.append(delta_x)
sample=np.column_stack((v, x))
sample.shape
print(sample)
print(M)

[[ 6.         33.31088781]
 [ 2.          3.24204557]]
12.091131705435421


The code above generates a NumPy array of shape `(N, 2)`, here `(2, 2)`: each row is one step of the episode, and the two columns hold the velocity of the bullet and the displacement of the box in that step. The full dataset built in the next section also stores the mass of the bullet and uses the cumulative position of the box, so each step contributes three features; with N = 10 steps this gives d = 10 × 3 = 30 features for each instance of the dataset. To determine the mass of the box we only need two steps (6 features) in an optimal situation in which we don't take the same shot choice for both steps (remember that mu is, in theory, unknown to the agent).
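
The two-step argument can be checked on the sample printed above. The sketch below assumes the displacement model used in the sample code, `delta_x = 0.5*g*m*V**2/(M*mu)`, in which `g` and `mu` cancel when the two displacements are divided. The helper name `recover_box_mass` is hypothetical and is not part of the notebook.

```python
def recover_box_mass(v, m, dx):
    """Estimate the initial box mass from two consecutive shots.

    Assumes dx_i = 0.5 * g * m_i * v_i**2 / (M_i * mu), where M_i is the
    mass of the box after i bullets have been added (as in the sample code).
    """
    # Ratio of the two displacements gives r = M_2 / M_1 (g and mu cancel)
    r = (dx[0] * m[1] * v[1] ** 2) / (dx[1] * m[0] * v[0] ** 2)
    M_1 = m[1] / (r - 1.0)  # because M_2 = M_1 + m[1]
    return M_1 - m[0]       # because M_1 = M_init + m[0]

# Values copied from the printed sample (both bullets have mass 1.5)
M_init = recover_box_mass(v=[6.0, 2.0], m=[1.5, 1.5],
                          dx=[33.31088781, 3.24204557])
print(M_init + 3.0)  # box mass after both shots, to compare with the printed M
```

The printed value should agree with the final `M` shown in the cell output, up to the rounding of the displayed displacements.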

## Creating the data set
The dataset consists of S instances, each holding N steps with 3 features per step, and each instance is associated with a value (label): the initial mass of the box, M_init. So we should end up with two datasets:
- Samples, designated by **X**: S instances of shape (N, 3), flattened into a table of size (S, 3N)
- Labels, designated by **y**: a vector of size S (one box mass per instance)

In [134]:
# Let's create a loop to create a list of samples.
m1 = 1.5 # First allowed value for the mass of the bullet
m2 = 3 # Second allowed value for the mass of the bullet
v1 = 4 # First allowed value for the velocity of the bullet
v2 = 6 # Second allowed value for the velocity of the bullet
Box_masses = []
samples = []
for j in range(0, S):
    x = [] # List to store the positions of the box in each step of the episode
    v = [] # List to store the velocities of the bullet in each step of the episode
    m = [] # List to store the masses of the bullet in each step of the episode
    M = random.uniform(0.1, M_max) # Randomly initialize the mass of the box
    Box_masses.append(M) # We add the mass of the box to the list of masses
    cum_x = 0 # Variable to store the cumulative sum of the displacement, i.e the current position of the box
    
    for i in range(0, N):
        # We choose a random velocity for the bullet 
        if bool(random.getrandbits(1)):
            v.append(v1)
        else:
            v.append(v2)
        # We choose a random mass for the bullet:
        if bool(random.getrandbits(1)):
            m.append(m1)
        else:
            m.append(m2)
        M = M + m[i]
        V = v[i]
        delta_x=0.5*g*m[i]*V**2/(M*mu)
        cum_x = cum_x + delta_x
        x.append(cum_x)
    samples.append([v, m, x])
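
The same generation loop can also be written in vectorized form. The following is a minimal sketch, not the notebook's method: it assumes NumPy's `default_rng` in place of the `random` module (so the random stream differs), and the function name `generate_episodes` and its keyword arguments are hypothetical.

```python
import numpy as np

def generate_episodes(S, N, M_max=50.0, mu=0.75, g=9.8,
                      m_choices=(1.5, 3.0), v_choices=(4.0, 6.0), seed=0):
    """Return (X, y): X has shape (S, 3N) as v1, m1, x1, v2, ...; y has shape (S,)."""
    rng = np.random.default_rng(seed)
    M0 = rng.uniform(0.1, M_max, size=S)        # initial box mass per episode
    v = rng.choice(v_choices, size=(S, N))      # bullet velocity per step
    m = rng.choice(m_choices, size=(S, N))      # bullet mass per step
    M = M0[:, None] + np.cumsum(m, axis=1)      # box mass after each shot
    delta_x = 0.5 * g * m * v**2 / (M * mu)     # same formula as in the loop
    x = np.cumsum(delta_x, axis=1)              # cumulative position of the box
    X = np.stack((v, m, x), axis=2).reshape(S, 3 * N)
    return X, M0

X_demo, y_demo = generate_episodes(S=5, N=2)
print(X_demo.shape, y_demo.shape)
```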



In [140]:
# Now we flatten each sample so that every feature gets its own column (v1, m1, x1, v2, ...)
samples3 = []
for i in range(0, S):
    samples2 = []
    for k in range(0, N):
        for j in range(0,3):
            samples2.append(samples[i][j][k])
    samples3.append(samples2)
            
samples3 = np.array(samples3)
samples3.shape
labels=[]
for i in range(0, N):
    labels.append('v' + str(i+1))
    labels.append('m' + str(i+1))
    labels.append('x' + str(i+1))
X = pd.DataFrame(samples3, columns=labels)
X


Unnamed: 0,v1,m1,x1,v2,m2,x2,v3,m3,x3,v4,...,x7,v8,m8,x8,v9,m9,x9,v10,m10,x10
0,6.0,1.5,51.776910,6.0,3.0,149.588785,6.0,1.5,197.175409,6.0,...,431.909221,4.0,3.0,466.700134,6.0,1.5,504.990325,4.0,3.0,537.609938
1,4.0,3.0,44.960388,4.0,3.0,87.482256,4.0,1.5,108.181848,4.0,...,289.200132,6.0,3.0,366.104523,4.0,3.0,398.856383,6.0,1.5,434.948348
2,4.0,3.0,80.646564,4.0,3.0,153.771131,6.0,1.5,232.370732,4.0,...,504.211763,4.0,1.5,529.964926,4.0,3.0,578.296051,4.0,1.5,601.739021
3,6.0,3.0,993.824295,4.0,3.0,1276.350723,4.0,1.5,1396.046782,6.0,...,2414.572376,6.0,1.5,2535.810129,6.0,1.5,2649.251203,4.0,1.5,2696.623024
4,4.0,1.5,30.366574,6.0,1.5,96.143632,4.0,3.0,150.554333,4.0,...,428.949709,4.0,3.0,470.411599,6.0,1.5,515.854598,4.0,3.0,554.269154
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
49995,4.0,3.0,60.956487,6.0,3.0,188.214215,6.0,1.5,249.627841,6.0,...,485.597449,4.0,1.5,507.543930,4.0,1.5,528.892793,6.0,3.0,620.000807
49996,6.0,3.0,172.782510,4.0,1.5,209.385982,6.0,1.5,288.070177,6.0,...,555.015533,6.0,1.5,617.087265,6.0,3.0,733.068425,6.0,3.0,841.894392
49997,6.0,3.0,795.565645,4.0,3.0,1039.248970,4.0,1.5,1144.702127,6.0,...,2008.083367,4.0,1.5,2062.397380,4.0,3.0,2157.805960,4.0,1.5,2202.774062
49998,6.0,3.0,596.306778,6.0,1.5,851.352101,6.0,1.5,1074.180161,6.0,...,1949.459069,6.0,1.5,2060.288028,4.0,1.5,2106.633537,4.0,1.5,2150.392285
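
The nested loops in the reshaping cell can be replaced by a transpose followed by a reshape. This is a small sketch on toy data; `samples_demo` is a made-up stand-in for `samples`, with the same `[v, m, x]` layout per episode.

```python
import numpy as np

# Toy stand-in for `samples`: 2 episodes, 2 steps each, stored as [v, m, x]
samples_demo = [
    [[4.0, 6.0], [1.5, 3.0], [10.0, 25.0]],
    [[6.0, 6.0], [3.0, 1.5], [40.0, 55.0]],
]
arr = np.array(samples_demo)  # shape (S, 3, N)
# Move the step axis before the feature axis, then flatten each episode
flat = arr.transpose(0, 2, 1).reshape(len(samples_demo), -1)  # shape (S, 3N)
# Column names in the same order as the flattened features
columns = [f"{name}{k + 1}" for k in range(arr.shape[2]) for name in ("v", "m", "x")]
print(columns)
print(flat)
```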


In [139]:
# We do something similar for 'y':
y = pd.DataFrame({'M box': Box_masses})
y

           M box
0      49.603860
1      49.312716
2      26.164293
3       2.324885
4      37.226792
...          ...
49995  35.584901
49996  27.628100
49997   3.651871
49998   5.874627


In [141]:
X.to_csv('Box_Samples.csv')
y.to_csv('Box_Masses.csv')
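
Since `to_csv` writes the row index as an unnamed first column by default, reading the files back needs `index_col=0` to get the original columns. The sketch below illustrates this on a small made-up frame written to a temporary directory; the file name `Box_Samples_demo.csv` is hypothetical.

```python
import os
import tempfile
import pandas as pd

X_demo = pd.DataFrame({'v1': [4.0, 6.0], 'm1': [1.5, 3.0], 'x1': [10.0, 20.0]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'Box_Samples_demo.csv')
    X_demo.to_csv(path)                        # index is written as the first column
    naive = pd.read_csv(path)                  # index comes back as an extra column
    restored = pd.read_csv(path, index_col=0)  # index is restored as the index

print(list(naive.columns))
print(list(restored.columns))
```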

## Experiment 2: Sliding mass + spring 
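In this section I am going to create a dataset for the case where we have 2 experiments on the same box: a sliding experiment, which gives the stopping distance d, and a spring experiment, which gives the elongation l.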
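
The two measurements can be written down as follows. This is a sketch read off from the formulas in the code of this section, under the assumptions that the box starts sliding with speed u = m v / M after the shot and that the spring measurement is a static elongation under the weight of the box; neither assumption is stated explicitly in the notebook.

```latex
\begin{aligned}
u &= \frac{m v}{M} && \text{(speed of the box after the shot)} \\
\mu M g \, d &= \tfrac{1}{2} M u^{2}
  \;\Rightarrow\; d = \frac{1}{2 \mu g} \left( \frac{m v}{M} \right)^{2}
  && \text{(friction stops the box)} \\
K l &= M g \;\Rightarrow\; l = \frac{M g}{K}
  && \text{(spring in equilibrium)}
\end{aligned}
```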

In [16]:
M_max = 20
N = 2
S = 50000
g = 9.8
m = 2.5
v= 3
K = 5


In [24]:
M_max = 10
N = 2
S = 50000
g = 9.8
m = 2.5
v= 3
K = 5
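d = []        # Stopping distance of the box in the sliding experiment
l = []        # Elongation of the spring in the spring experiment
mu_list = []  # Friction coefficient of each instance (label)
M_list = []   # Mass of the box of each instance (label)
for i in range(0, S):
    M = random.uniform(1, M_max)  # Random mass of the box
    mu = random.uniform(0.2, 1)   # Random friction coefficient
    l.append(M * g / K)  # Spring: K * l = M * g
    d.append(0.5 * 1/(g * mu) * (m*v/M)**2)  # Sliding: d = u**2 / (2 * mu * g), with u = m*v/M
    mu_list.append(mu)
    M_list.append(M)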
X = pd.DataFrame(
    {'d': d,
     'l': l,
    })
y = pd.DataFrame({
    'mu':mu_list,
    'M': M_list
})
print(X,y)

              d          l
0      2.140001   2.382146
1      0.314245  11.019755
2      1.954126   3.133234
3      0.692523   7.956502
4      0.058429  15.097720
...         ...        ...
49995  0.119000  10.526453
49996  0.155158   8.530427
49997  0.148940  11.016285
49998  3.688913   3.219366
49999  3.867187   3.517949

[50000 rows x 2 columns]              mu         M
0      0.907879  1.215381
1      0.288913  5.622324
2      0.574699  1.598589
3      0.251478  4.059440
4      0.827808  7.702918
...         ...       ...
49995  0.836119  5.370639
49996  0.976479  4.352258
49997  0.609954  5.620554
49998  0.288363  1.642534
49999  0.230359  1.794872

[50000 rows x 2 columns]
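
Because the two formulas are invertible, the labels can be recovered in closed form from the features, which gives a simple sanity check for any model trained on this data. The sketch below assumes the same constants as the generating cell (m = 2.5, v = 3, K = 5, g = 9.8); the helper name `invert_measurements` is hypothetical.

```python
def invert_measurements(d, l, m=2.5, v=3.0, K=5.0, g=9.8):
    """Recover (M, mu) from the stopping distance d and the elongation l."""
    M = K * l / g                          # from l = M * g / K
    mu = (m * v / M) ** 2 / (2.0 * g * d)  # from d = (m*v/M)**2 / (2*mu*g)
    return M, mu

# First row of the printed X, to compare with the first row of the printed y
M_hat, mu_hat = invert_measurements(d=2.140001, l=2.382146)
print(M_hat, mu_hat)
```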
