## How will the data be preprocessed?

In our current environment, the data files have already been generated by the simulator. Each data file records a complete flight path, from the drone's starting position to the target position assigned to it at the beginning of the flight. In total, 502 flight paths were generated, each with a randomly assigned target position, and all of the files are stored in the `data` directory.

## What is the goal of the preprocessing?

The goal is to generate data files that describe the change in a quadrotor's position over a sequence of IMU readings from the accelerometer and gyroscope. Here, position refers to the quadrotor's 6D pose: its linear position ($x$, $y$, $z$) and its Euler angles ($\phi$, $\theta$, $\psi$). The accelerometer readings reflect the quadrotor's sensed $x$, $y$, and $z$ accelerations, and the gyroscope readings reflect its sensed $\phi$, $\theta$, and $\psi$ angular velocities.
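Written out with notation introduced here only for illustration (it does not appear in the simulator output), one generated example covers a window of $k$ readings starting at row $s$. With $\mathbf{p}$ the 6D pose, $a$ the accelerometer readings, and $\omega$ the gyroscope readings, it pairs:

$$
\begin{aligned}
\mathbf{p}_t &= (x_t,\ y_t,\ z_t,\ \phi_t,\ \theta_t,\ \psi_t) \\
\Delta t &= t_{s+k} - t_s \approx 0.03\,k \ \text{s} \\
\Delta\mathbf{p} &= \mathbf{p}_{s+k} - \mathbf{p}_s \\
\text{readings} &= \left( a_{x,i},\ a_{y,i},\ a_{z,i},\ \omega_{\phi,i},\ \omega_{\theta,i},\ \omega_{\psi,i} \right)_{i = s}^{s+k-1}
\end{aligned}
$$

Each example therefore uses the readings from $k$ rows and the positions from $k + 1$ rows.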

```python
import os

files = os.listdir("../data/")
print("In total there are {} files,\n\tthe first file is {},\n\tthe last one is {}".format(len(files), files[0], files[-1]))
```

```
In total there are 502 files,
	the first file is result-1613260479,
	the last one is result-1613264056
```
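`os.listdir` returns entries in arbitrary order and includes any hidden files in the directory, which is why `read_random_file` has to watch for `.DS_Store`. One way to handle both up front, assuming every flight file follows the `result-<integer>` pattern seen above, is a small filter. `filter_flight_files` is a hypothetical helper, not part of the original notebook:

```python
def filter_flight_files(names):
    """Keep only names shaped like 'result-<integer>' and sort them numerically.

    Assumption: every flight file follows the 'result-<unix timestamp>' pattern
    seen in the listing; hidden files such as '.DS_Store' and anything else
    are dropped.
    """
    flights = []
    for name in names:
        prefix, sep, suffix = name.partition("-")
        if prefix == "result" and sep and suffix.isdigit():
            flights.append(name)
    # os.listdir() gives no ordering guarantee, so sort by the numeric suffix
    return sorted(flights, key=lambda name: int(name.partition("-")[2]))
```

In the notebook, `files = filter_flight_files(os.listdir("../data/"))` would give a stable, numerically ordered list, so `files[0]` refers to the same flight on every machine.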


```python
import os
import pandas as pd

FILE_PATHS = "../data/"

def get_dataframe(file_name):
    """Load one simulator flight file from FILE_PATHS as a DataFrame."""
    return pd.read_csv(os.path.join(FILE_PATHS, file_name))

df = get_dataframe(files[0])
df.head()
```

| |timestamp|attitude_controller_id|desired_linear_position_x|desired_linear_position_y|desired_linear_position_z|linear_position_x|linear_position_y|linear_position_z|angular_position_phi|angular_position_theta|angular_position_psi|accelerometer_reading_x|accelerometer_reading_y|accelerometer_reading_z|gyroscope_reading_phi|gyroscope_reading_theta|gyroscope_reading_psi|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|0|0.03|k|-6.706236|-6.229772|-7.095647|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|
|1|0.06|k|-6.706236|-6.229772|-7.095647|-0.000211|-0.000211|-0.022924|-0.014155|-0.014155|1.004945e-17|0.01949|-0.017832|-2.302526|-2.157392|-2.157392|1.531695e-15|
|2|0.09|k|-6.706236|-6.229772|-7.095647|-0.000905|-0.000905|-0.037148|-0.014717|-0.014662|-4.95017e-06|-0.336713|-0.332503|-4.943869|4.209502|4.247487|-0.003428095|
|3|0.12|k|-6.706236|-6.229772|-7.095647|-0.002052|-0.00205|-0.04945|-0.014699|-0.014502|-9.207557e-06|-0.349644|-0.263271|1.511464|0.390787|0.447006|0.0005411242|
|4|0.15|k|-6.706236|-6.229772|-7.095647|-0.003625|-0.003616|-0.060045|-0.014417|-0.014042|-9.153503e-06|-0.312502|-0.270486|1.123085|0.173689|0.197567|0.00283461|


```python
df.describe()
```

| |timestamp|desired_linear_position_x|desired_linear_position_y|desired_linear_position_z|linear_position_x|linear_position_y|linear_position_z|angular_position_phi|angular_position_theta|angular_position_psi|accelerometer_reading_x|accelerometer_reading_y|accelerometer_reading_z|gyroscope_reading_phi|gyroscope_reading_theta|gyroscope_reading_psi|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|count|2291.0|2291.0|2291.0|2291.0|2291.0|2291.0|2291.0|2291.0|2291.0|2291.0|2291.0|2291.0|2291.0|2291.0|2291.0|2291.0|
|mean|34.38|-6.706236|-6.229772|-7.095647|-3.975464|-3.733606|-6.832862|-0.010498|-0.009159|-2.665493e-06|-0.01191|-0.010289|-0.023513|0.003741|0.003765|0.002053314|
|std|19.844972|1.687907e-13|3.695629e-13|1.448047e-13|2.161224|2.006842|1.459724|0.005741|0.005296|3.742408e-06|0.961686|0.899738|5.221305|0.106009|0.109896|0.008106396|
|min|0.03|-6.706236|-6.229772|-7.095647|-6.696166|-6.209644|-8.53139|-0.014926|-0.014662|-1.494647e-05|-4.11731|-3.912165|-71.764529|-2.157392|-2.157392|-0.003428095|
|25%|17.205|-6.706236|-6.229772|-7.095647|-6.123736|-5.722466|-7.230153|-0.014708|-0.013258|-3.884662e-06|-0.323593|-0.296221|-0.067208|-8.2e-05|-9.1e-05|-1.41096e-07|
|50%|34.38|-6.706236|-6.229772|-7.095647|-4.211702|-3.997975|-7.131009|-0.014504|-0.012893|-8.753875e-07|-0.011048|-0.009768|0.005747|4e-06|5e-06|1.08764e-08|
|75%|51.555|-6.706236|-6.229772|-7.095647|-2.068425|-1.963027|-7.061219|-0.004838|-0.003414|-8.558166e-08|0.250168|0.231908|0.085865|0.001208|0.001067|1.021864e-06|
|max|68.73|-6.706236|-6.229772|-7.095647|0.0|0.0|0.0|3.3e-05|1.5e-05|1.004945e-17|6.605947|6.253863|80.635386|4.209502|4.247487|0.05576225|


From the first file, we display the first few rows and the summary statistics of the dataset; the column descriptions are given below. The `timestamp` column records the time elapsed since the start of the simulation, and the timestamps are evenly spaced: every 0.03 seconds the simulator records the quadrotor's current linear and angular positions, along with the accelerometer and gyroscope readings. Together, the linear and angular positions describe the drone's 6D pose. The linear positions $x$, $y$, and $z$ give the drone's displacement from its starting position, which is why the first row is all zeros. The angular positions $\phi$, $\theta$, and $\psi$ give the drone's orientation relative to level flight, in radians: $\phi$ is the roll, $\theta$ is the pitch, and $\psi$ is the yaw.

|Column Name|Description|
|---|---|
|timestamp|Time elapsed since the start of the simulation, in seconds. Consecutive rows are 0.03 s apart.|
|attitude_controller_id|The attitude controller used during the flight, which defines how the quadrotor flies. All flights should use the Kemper ("k") attitude controller.|
|desired_linear_position_x|The target position the drone is flying toward in the $x$ direction.|
|desired_linear_position_y|The target position the drone is flying toward in the $y$ direction.|
|desired_linear_position_z|The target position the drone is flying toward in the $z$ direction.|
|linear_position_x|The drone's actual $x$ position in the simulated environment, relative to its starting position.|
|linear_position_y|The drone's actual $y$ position in the simulated environment, relative to its starting position.|
|linear_position_z|The drone's actual $z$ position in the simulated environment, relative to its starting position.|
|angular_position_phi|The drone's actual $\phi$ angle in the simulated environment, in radians.|
|angular_position_theta|The drone's actual $\theta$ angle in the simulated environment, in radians.|
|angular_position_psi|The drone's actual $\psi$ angle in the simulated environment, in radians.|
|accelerometer_reading_x|The accelerometer reading along the $x$ axis.|
|accelerometer_reading_y|The accelerometer reading along the $y$ axis.|
|accelerometer_reading_z|The accelerometer reading along the $z$ axis.|
|gyroscope_reading_phi|The gyroscope's angular-velocity reading for the $\phi$ angle.|
|gyroscope_reading_theta|The gyroscope's angular-velocity reading for the $\theta$ angle.|
|gyroscope_reading_psi|The gyroscope's angular-velocity reading for the $\psi$ angle.|
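Every later step assumes that rows are exactly 0.03 s apart, so `time_change` is just the window length times 0.03. A cheap per-file sanity check can confirm that assumption before sampling. This is a minimal sketch; `has_uniform_timestep` and its tolerance are assumptions, not part of the original notebook:

```python
import pandas as pd

def has_uniform_timestep(df, dt=0.03, tol=1e-6):
    """Return True if consecutive `timestamp` values differ by `dt` within `tol`.

    `dt=0.03` matches the spacing described above; `tol` absorbs
    floating-point drift in the stored timestamps.
    """
    steps = df["timestamp"].diff().dropna()
    return bool(((steps - dt).abs() <= tol).all())
```

In the notebook this could be run as `has_uniform_timestep(get_dataframe(name))` for each file, flagging any flight whose sampling is irregular.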

Since our data is spread across 502 files, each with thousands of rows (the first file alone has 2,291), we want to build a single dataframe that relates the change in $x$, $y$, $z$, $\phi$, $\theta$, and $\psi$ to a vector of accelerometer and gyroscope readings. To pair each reading vector with the corresponding change in the linear and angular positions, we need a function that loads a random file, selects a random starting row in it, and returns the change in time and position together with the vector of readings.

First, we need a function that loads a random file as a dataframe and validates its structure by comparing its columns with those of the first file.

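```python
import random
import pandas as pd

def read_random_file(files, columns):
    """Load a randomly chosen flight file and validate its columns against `columns`."""
    # Exclude hidden files such as macOS's .DS_Store up front; re-drawing once
    # after hitting one could still land on it a second time.
    candidates = [name for name in files if not name.startswith(".")]
    file = random.choice(candidates)
    df = get_dataframe(file)
    if list(df.columns) != list(columns):
        print(file + " failed")
        # An empty frame is rejected by get_random_rows, so this draw is skipped downstream
        return pd.DataFrame()
    return df

read_random_file(files, df.columns).head()
```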

| |timestamp|attitude_controller_id|desired_linear_position_x|desired_linear_position_y|desired_linear_position_z|linear_position_x|linear_position_y|linear_position_z|angular_position_phi|angular_position_theta|angular_position_psi|accelerometer_reading_x|accelerometer_reading_y|accelerometer_reading_z|gyroscope_reading_phi|gyroscope_reading_theta|gyroscope_reading_psi|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|0|0.03|k|7.865248|-7.432587|-3.682956|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|
|1|0.06|k|7.865248|-7.432587|-3.682956|0.000132|-0.000132|-0.017872|0.011725|-0.011725|0.0|0.025327|0.027484|-2.489576|0.063415|-2.452213|0.04332|
|2|0.09|k|7.865248|-7.432587|-3.682956|0.000625|-0.000625|-0.030246|0.012387|-0.012346|-4e-06|0.260476|-0.272284|-4.777292|-4.209884|4.241545|-0.003116|
|3|0.12|k|7.865248|-7.432587|-3.682956|0.001465|-0.001463|-0.04096|0.012515|-0.012368|-7e-06|0.161525|-0.244614|1.897228|-0.379185|0.417172|0.00077|
|4|0.15|k|7.865248|-7.432587|-3.682956|0.002558|-0.002552|-0.049744|0.01239|-0.012108|-8e-06|0.244287|-0.287739|0.66351|-0.186227|0.215858|0.001753|
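Because `random.randint` draws from Python's module-level generator, two runs of this notebook sample different files and produce different processed tables. If reproducible datasets matter, one option (an assumption, not what the notebook does) is to pass a seeded `random.Random` instance into the sampling helpers. A minimal sketch, where `pick_file` is a hypothetical name:

```python
import random

def pick_file(files, rng):
    """Pick one flight file uniformly at random with the supplied generator.

    Hidden files (names starting with '.') such as '.DS_Store' are excluded.
    """
    candidates = [name for name in files if not name.startswith(".")]
    if not candidates:
        raise ValueError("no flight files to choose from")
    return rng.choice(candidates)

# Two generators seeded identically make the same sequence of choices.
names = ["result-1", "result-2", "result-3", ".DS_Store"]
rng_a, rng_b = random.Random(7), random.Random(7)
run_a = [pick_file(names, rng_a) for _ in range(5)]
run_b = [pick_file(names, rng_b) for _ in range(5)]
print(run_a == run_b)  # True
```

The same `rng` object would also need to replace `random.randint` inside `get_random_rows` for the whole table to be reproducible.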


`read_random_file` returns a dataframe for a random file in the `FILE_PATHS` directory. From it, we can take a snapshot of a fixed number of consecutive rows and derive the change in position along with the vector of readings. The result could either be appended to an existing dataframe or returned directly; here it is returned as a list. The starting row is chosen at random, but it can be no greater than the number of rows in the file minus the number of rows we want to collect, so that the snapshot never runs past the end of the file.

`get_random_rows` draws a random `start_index` and returns the `amount + 1` consecutive rows from `start_index - 1` up to, but not including, `start_index + amount`. The change in the quadrotor's position needs at least two rows: the earlier rows provide the IMU readings, and the final row shows where the quadrotor ended up (its linear and angular positions).

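```python
def get_random_rows(df, amount):
    """Return amount + 1 consecutive rows from a random position, or an empty frame."""
    # Files that are too short for a window of this size yield no rows
    if df.shape[0] - amount <= 1:
        return pd.DataFrame()
    start_index = random.randint(1, df.shape[0] - amount)
    # iloc's end bound is exclusive: rows start_index - 1 .. start_index + amount - 1
    return df.iloc[(start_index - 1):(start_index + amount), :]
```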
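The index arithmetic is easy to get off by one, so it can be checked in isolation. This sketch restates only the start/stop computation from `get_random_rows` as a hypothetical `window_bounds` helper; its guard accepts the smallest file that still fits a window (`n_rows == amount + 1`), which is slightly looser than the notebook's check:

```python
import random

def window_bounds(n_rows, amount, rng=random):
    """Return (first, stop) iloc bounds for a window of amount + 1 rows, or None.

    start_index is drawn from [1, n_rows - amount] and the slice runs from
    start_index - 1 up to, but not including, start_index + amount.
    """
    if n_rows - amount < 1:
        return None  # the file is too short for a window of this size
    start_index = rng.randint(1, n_rows - amount)
    return start_index - 1, start_index + amount

# Randomized check on a small example: a 10-row file and windows of 4 readings.
rng = random.Random(0)
for _ in range(1000):
    first, stop = window_bounds(10, 4, rng)
    assert 0 <= first and stop <= 10  # never runs off either end of the file
    assert stop - first == 4 + 1      # amount reading rows plus one closing row
```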

```python
get_random_rows(df, 25)
```

| |timestamp|attitude_controller_id|desired_linear_position_x|desired_linear_position_y|desired_linear_position_z|linear_position_x|linear_position_y|linear_position_z|angular_position_phi|angular_position_theta|angular_position_psi|accelerometer_reading_x|accelerometer_reading_y|accelerometer_reading_z|gyroscope_reading_phi|gyroscope_reading_theta|gyroscope_reading_psi|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|1468|44.07|k|-6.706236|-6.229772|-7.095647|-5.425993|-5.125421|-7.209032|-0.012298|-0.009181|-2.869079e-07|-0.056698|-0.049059|-0.01181|-0.000883|-0.000699|5.051589e-10|
|1469|44.1|k|-6.706236|-6.229772|-7.095647|-5.429702|-5.128631|-7.210185|-0.012258|-0.009149|-2.857736e-07|-0.183135|-0.158012|-0.050602|0.002135|0.001687|1.413635e-08|
|1470|44.13|k|-6.706236|-6.229772|-7.095647|-5.43322|-5.131676|-7.211269|-0.012217|-0.009117|-2.84632e-07|0.130183|0.112981|0.047489|0.000162|0.000131|5.123452e-09|
|1471|44.16|k|-6.706236|-6.229772|-7.095647|-5.436733|-5.134715|-7.212342|-0.012178|-0.009086|-2.835505e-07|-0.311388|-0.268899|-0.089095|0.002787|0.002209|5.195264e-08|
|1472|44.19|k|-6.706236|-6.229772|-7.095647|-5.439871|-5.137431|-7.213294|-0.01214|-0.009056|-2.825057e-07|0.612766|0.529779|0.19403|0.041163|-0.002159|0.04817848|
|1473|44.22|k|-6.706236|-6.229772|-7.095647|-5.443557|-5.14062|-7.2144|-0.012104|-0.009027|-2.808563e-07|-1.132124|-0.978713|-0.336515|0.006476|0.005106|7.944462e-07|
|1474|44.25|k|-6.706236|-6.229772|-7.095647|-5.446502|-5.143167|-7.215278|-0.012067|-0.008998|-2.791907e-07|0.950707|0.824772|0.337043|-0.004403|-0.003495|-1.763449e-07|
|1475|44.28|k|-6.706236|-6.229772|-7.095647|-5.449902|-5.146108|-7.216282|-0.012032|-0.00897|-2.777396e-07|-0.133018|-0.115411|-0.034137|-0.002645|-0.002101|-2.185494e-07|
|1476|44.31|k|-6.706236|-6.229772|-7.095647|-5.453571|-5.149282|-7.217353|-0.011994|-0.00894|-2.767412e-07|-0.378231|-0.326477|-0.103996|0.003711|0.002923|-2.511547e-07|
|1477|44.34|k|-6.706236|-6.229772|-7.095647|-5.45696|-5.152213|-7.218332|-0.011955|-0.008909|-2.755041e-07|-0.015066|-0.01213|0.003888|0.003824|0.003051|2.765243e-07|


```python
def convert_to_table_row(df):
    """Flatten a window of consecutive rows into one table row.

    The first seven values are the changes between the window's last and
    first rows; they are followed by each IMU channel's readings from every
    row except the last.
    """
    try:
        row = [
            df["timestamp"].iloc[-1] - df["timestamp"].iloc[0],  # time_change
            df["linear_position_x"].iloc[-1] - df["linear_position_x"].iloc[0],  # x_change
            df["linear_position_y"].iloc[-1] - df["linear_position_y"].iloc[0],  # y_change
            df["linear_position_z"].iloc[-1] - df["linear_position_z"].iloc[0],  # z_change
            df["angular_position_phi"].iloc[-1] - df["angular_position_phi"].iloc[0],  # phi_change
            df["angular_position_theta"].iloc[-1] - df["angular_position_theta"].iloc[0],  # theta_change
            df["angular_position_psi"].iloc[-1] - df["angular_position_psi"].iloc[0],  # psi_change
        ]

        row += df["accelerometer_reading_x"].iloc[:-1].tolist()
        row += df["accelerometer_reading_y"].iloc[:-1].tolist()
        row += df["accelerometer_reading_z"].iloc[:-1].tolist()
        row += df["gyroscope_reading_phi"].iloc[:-1].tolist()
        row += df["gyroscope_reading_theta"].iloc[:-1].tolist()
        row += df["gyroscope_reading_psi"].iloc[:-1].tolist()
        return row
    except (IndexError, KeyError):
        # KeyError: empty frame without columns; IndexError: columns but no rows.
        # (The original `df + " failed"` would itself raise on a DataFrame.)
        print("convert_to_table_row failed on a window of {} rows".format(len(df)))
        return False

convert_to_table_row(get_random_rows(df, 2))
```

```
[0.060000000000002274,
 -0.0017059737223039662,
 -0.0014357887374423228,
 0.00035631164561600315,
 1.3700738787858317e-05,
 9.035646290915056e-06,
 1.686980775243855e-10,
 -0.1301009912085481,
 -0.1472401253260835,
 -0.109586760444982,
 -0.123787587646628,
 0.0275238922454394,
 0.0312751698665907,
 -0.00021827185993155643,
 0.0017064805082757,
 -0.00014360169328254128,
 0.0011195798972105,
 3.151235984342194e-08,
 -4.068319029863621e-08]
```

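`convert_to_table_row` flattens a window of rows into a single 1D row: the first seven values are the changes in time and position, followed by the readings of each IMU channel in turn.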
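To see that layout on numbers that are easy to check by eye, here is a loop-based restatement of the same flattening applied to a toy three-row window ($k = 2$). `window_to_row` and the toy values are illustrative only; the real rows come from the simulator files:

```python
import pandas as pd

POSITION_COLUMNS = [
    "linear_position_x", "linear_position_y", "linear_position_z",
    "angular_position_phi", "angular_position_theta", "angular_position_psi",
]
READING_COLUMNS = [
    "accelerometer_reading_x", "accelerometer_reading_y", "accelerometer_reading_z",
    "gyroscope_reading_phi", "gyroscope_reading_theta", "gyroscope_reading_psi",
]

def window_to_row(window):
    """Flatten k + 1 rows into [time_change, 6 position changes, 6 * k readings]."""
    first, last = window.iloc[0], window.iloc[-1]
    row = [float(last["timestamp"] - first["timestamp"])]
    row += [float(last[c] - first[c]) for c in POSITION_COLUMNS]
    for c in READING_COLUMNS:
        # Only the first k rows supply readings; the final row supplies the end pose.
        row += window[c].iloc[:-1].tolist()
    return row

# Toy window: every position goes 0 -> 1 -> 3, every reading is 10, 20, 30.
toy = pd.DataFrame({
    "timestamp": [0.0, 1.0, 2.0],
    **{c: [0.0, 1.0, 3.0] for c in POSITION_COLUMNS},
    **{c: [10.0, 20.0, 30.0] for c in READING_COLUMNS},
})
row = window_to_row(toy)
```

Here `row` starts with `2.0` (elapsed time), then six `3.0` position changes, then the pair `10.0, 20.0` once per channel; the `30.0` readings from the closing row are never used.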

```python
def generate_string_names(name, amount):
    """Return [name_0, name_1, ..., name_{amount - 1}]."""
    return [name + "_" + str(i) for i in range(amount)]

def generate_columns(amount):
    """Column names matching the layout produced by convert_to_table_row."""
    columns = [
        "time_change",
        "x_change",
        "y_change",
        "z_change",
        "phi_change",
        "theta_change",
        "psi_change",
    ]
    # One block of `amount` columns per IMU channel, in the same order as the row
    columns += generate_string_names("accelerometer_reading_x", amount)
    columns += generate_string_names("accelerometer_reading_y", amount)
    columns += generate_string_names("accelerometer_reading_z", amount)
    columns += generate_string_names("gyroscope_reading_phi", amount)
    columns += generate_string_names("gyroscope_reading_theta", amount)
    columns += generate_string_names("gyroscope_reading_psi", amount)
    return columns

generate_columns(2)
```

```
['time_change',
 'x_change',
 'y_change',
 'z_change',
 'phi_change',
 'theta_change',
 'psi_change',
 'accelerometer_reading_x_0',
 'accelerometer_reading_x_1',
 'accelerometer_reading_y_0',
 'accelerometer_reading_y_1',
 'accelerometer_reading_z_0',
 'accelerometer_reading_z_1',
 'gyroscope_reading_phi_0',
 'gyroscope_reading_phi_1',
 'gyroscope_reading_theta_0',
 'gyroscope_reading_theta_1',
 'gyroscope_reading_psi_0',
 'gyroscope_reading_psi_1']
```
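A processed table therefore has $7 + 6 \times$ `vector_size` columns. Working this out for the vector sizes passed to `create_files` shows how quickly the tables widen; `table_width` is an illustrative helper, not part of the notebook:

```python
# Vector sizes used in the create_files calls of this notebook
VECTOR_SIZES = [1, 2, 10, 25, 100, 225, 400, 625, 900]

def table_width(vector_size):
    """7 change columns plus one column per IMU channel (6) per reading."""
    return 7 + 6 * vector_size

widths = {size: table_width(size) for size in VECTOR_SIZES}
print(widths)
# {1: 13, 2: 19, 10: 67, 25: 157, 100: 607, 225: 1357, 400: 2407, 625: 3757, 900: 5407}
```

The largest tables, with 5,407 columns and up to 10,000 rows, are worth keeping in mind when loading them back for training.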

To get the column names for a table row, we call `generate_columns` with the size of the reading vector from the accelerometer and gyroscope. To build a single dataframe containing many windows, we call `convert_to_table_row` repeatedly and collect the rows into a 2D list, which is then converted to a DataFrame with the generated column names. The function `generate_table` takes a list of files, the size of the reading vector to grab, and the number of rows to generate.

```python
def generate_table(files, vector_size, row_size):
    """Sample up to row_size windows of vector_size readings from random files."""
    # Reference columns that every randomly chosen file is validated against
    columns = get_dataframe(files[0]).columns
    table = []

    for _ in range(row_size):
        random_rows = get_random_rows(read_random_file(files, columns), vector_size)
        if random_rows.empty:
            continue  # file too short for a window of this size: skip this draw

        row = convert_to_table_row(random_rows)
        if not row:
            continue  # conversion failed
        table.append(row)

    return pd.DataFrame(table, columns=generate_columns(vector_size))

generate_table(files, 1, 3)
```

| |time_change|x_change|y_change|z_change|phi_change|theta_change|psi_change|accelerometer_reading_x_0|accelerometer_reading_y_0|accelerometer_reading_z_0|gyroscope_reading_phi_0|gyroscope_reading_theta_0|gyroscope_reading_psi_0|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|0|0.03|0.000359|0.000612|-0.000153|-1.881299e-06|-4e-06|2.661223e-10|-0.097983|-0.165058|0.041348|0.000243|0.00047|-8.140327e-08|
|1|0.03|0.003234|-0.003067|-0.017866|2.751412e-06|-3e-06|3.23745e-08|0.275792|-0.270047|-1.122884|8e-05|-4e-05|2.861026e-06|
|2|0.03|-1.5e-05|0.003253|-1.4e-05|2.034391e-07|2.2e-05|-5.328962e-06|0.004438|-0.318732|0.018239|-2.7e-05|-0.029357|0.0004996363|
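`generate_table` skips a draw whenever the sampled file is too short or a row cannot be converted, so the returned table can hold fewer than `row_size` rows, and the shortfall grows with `vector_size`. If an exact row count matters, one option is to keep drawing until enough rows are collected, with a cap so the loop cannot run forever. A minimal sketch; `collect_rows` and `max_attempts` are hypothetical names:

```python
def collect_rows(sample_one, row_size, max_attempts=None):
    """Call `sample_one()` until `row_size` rows are collected or attempts run out.

    `sample_one` is any zero-argument callable returning a row (a list) or a
    falsy value (None/False) when the draw failed. `max_attempts` defaults to
    10 * row_size so a directory of mostly-too-short files cannot hang the loop.
    """
    if max_attempts is None:
        max_attempts = 10 * row_size
    rows = []
    attempts = 0
    while len(rows) < row_size and attempts < max_attempts:
        attempts += 1
        row = sample_one()
        if row:
            rows.append(row)
    return rows
```

In the notebook, `sample_one` could wrap `read_random_file`, `get_random_rows`, and `convert_to_table_row`, and the result would be passed to `pd.DataFrame(..., columns=generate_columns(vector_size))` as before.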


After generating a table by hand, we generate several tables covering a range of vector and row sizes. The files are stored in a `processed_data` directory that holds all of the preprocessed data, and each file name records the vector size and the number of rows.

```python
import os

PROCESSED_FILE_PATHS = "../processed_data/"

def create_files(files, vector_sizes, row_sizes):
    """Write one CSV per (vector_size, row_size) pair to PROCESSED_FILE_PATHS."""
    os.makedirs(PROCESSED_FILE_PATHS, exist_ok=True)
    for vector_size in vector_sizes:
        for row_size in row_sizes:
            # Label each size with its own value; the original code swapped the two
            file_name = "row_size" + str(row_size) + "_vector_size" + str(vector_size) + ".csv"
            generate_table(files, vector_size, row_size).to_csv(
                os.path.join(PROCESSED_FILE_PATHS, file_name), index=False, header=True)

create_files(files, [1, 2, 10, 25, 100, 225], [2000, 4000, 6000, 8000, 10000])
```

```python
create_files(files, [400], [2000, 4000, 6000, 8000, 10000])
```

```python
create_files(files, [625], [2000, 4000, 6000, 8000, 10000])
```

```python
create_files(files, [900], [2000, 4000, 6000, 8000, 10000])
```
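Since each file name is meant to record the vector size and row count, later notebooks have to read those numbers back. A pair of hypothetical helpers that build and parse the same name keeps the two directions in sync; the `row_size<n>_vector_size<k>.csv` pattern here is an assumption about the intended naming:

```python
import os
import re

def processed_file_name(vector_size, row_size, directory="../processed_data/"):
    """Build a processed-table path such as row_size2000_vector_size2.csv."""
    return os.path.join(directory, "row_size{}_vector_size{}.csv".format(row_size, vector_size))

_NAME_PATTERN = re.compile(r"row_size(\d+)_vector_size(\d+)\.csv$")

def parse_processed_file_name(path):
    """Recover (vector_size, row_size) from a name built by processed_file_name."""
    match = _NAME_PATTERN.search(os.path.basename(path))
    if match is None:
        raise ValueError("not a processed table name: " + path)
    return int(match.group(2)), int(match.group(1))
```

Parsing the sizes from the name, rather than relying on the order of a directory listing, also lets a training script pick, for example, every table with `vector_size == 25`.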

Each of these CSVs contains, per row, the change in linear and angular position over the elapsed time `time_change`, followed by the vector of IMU readings. The table below describes the columns for `vector_size=2`.

|Column Names|Description|
|---|---|
|time_change|The time elapsed between the first and last rows of the sampled window, in seconds.|
|x_change|The change in the $x$ position over the window.|
|y_change|The change in the $y$ position over the window.|
|z_change|The change in the $z$ position over the window.|
|phi_change|The change in the $\phi$ angular position over the window.|
|theta_change|The change in the $\theta$ angular position over the window.|
|psi_change|The change in the $\psi$ angular position over the window.|
|accelerometer_reading_x_0|The first $x$ reading from the accelerometer.|
|accelerometer_reading_x_1|The last $x$ reading from the accelerometer.|
|accelerometer_reading_y_0|The first $y$ reading from the accelerometer.|
|accelerometer_reading_y_1|The last $y$ reading from the accelerometer.|
|accelerometer_reading_z_0|The first $z$ reading from the accelerometer.|
|accelerometer_reading_z_1|The last $z$ reading from the accelerometer.|
|gyroscope_reading_phi_0|The first $\phi$ reading from the gyroscope.|
|gyroscope_reading_phi_1|The last $\phi$ reading from the gyroscope.|
|gyroscope_reading_theta_0|The first $\theta$ reading from the gyroscope.|
|gyroscope_reading_theta_1|The last $\theta$ reading from the gyroscope.|
|gyroscope_reading_psi_0|The first $\psi$ reading from the gyroscope.|
|gyroscope_reading_psi_1|The last $\psi$ reading from the gyroscope.|
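Because each IMU channel occupies a contiguous block of `vector_size` columns, a processed table can be reshaped into a (rows, channels, readings) array, for example to feed a sequence model. A minimal sketch; `split_processed_table` is a hypothetical helper, and which of the change columns serve as model inputs or targets is left open:

```python
import pandas as pd

CHANGE_COLUMNS = ["time_change", "x_change", "y_change", "z_change",
                  "phi_change", "theta_change", "psi_change"]

def split_processed_table(table, vector_size):
    """Split a processed table into changes (n, 7) and IMU readings (n, 6, vector_size).

    Relies on the column order produced by generate_columns: the 7 change
    columns, then each of the 6 IMU channels as a block of vector_size readings.
    """
    changes = table[CHANGE_COLUMNS].to_numpy()
    readings = table.drop(columns=CHANGE_COLUMNS).to_numpy()
    if readings.shape[1] != 6 * vector_size:
        raise ValueError("expected {} reading columns, got {}".format(6 * vector_size, readings.shape[1]))
    # Row-major reshape: axis 1 is the channel, axis 2 the reading index in the window
    return changes, readings.reshape(len(table), 6, vector_size)
```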