# Toss Training

We'll load the data that people uploaded for homework and use it to train a network.

In [None]:
!pip install tensorflow

In [None]:
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split

from pathlib import Path
from math import sqrt
import matplotlib.pyplot as plt

## Loading the Data

Most of this is pulled from our data examination notebook. However, the data we want to train on is a little different.

We are going to analyze each _toss_. That means all the data from a single toss is going to be fed to our network in order to understand what that _toss_ was.

Looking back at the previous training - that means each toss is going to be a _row_ in our training data. This means we need to load in the data from each toss, and _transpose_ it so that each measurement is a column.

Finally - we'll look only at the _total_ acceleration, as we did in the previous plots.

We saw that each toss has 26-27 measurements, so let's take only the first 25 to be safe. That gives us 25 inputs. We need to rotate them into a single row, which we will then append to the master `DataFrame`. One row per file!
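
As a minimal sketch of the reshaping described above, the following builds one training row from a single toss. The sample values and the helper name `toss_to_row` are made up for illustration; only the `ax`, `ay`, `az` column names come from the class data files.

```python
import numpy as np
import pandas as pd

N_SAMPLES = 25  # number of measurements kept per toss


def toss_to_row(df_sample: pd.DataFrame, n: int = N_SAMPLES) -> pd.DataFrame:
    """Hypothetical helper: turn one toss (many rows) into a single row."""
    # Total acceleration is the magnitude of the (ax, ay, az) vector.
    a = np.sqrt(df_sample.ax**2 + df_sample.ay**2 + df_sample.az**2).to_numpy()
    # Keep the first n measurements and lay them out as one row.
    row = a[:n].reshape(1, -1)
    columns = [f"a{i}" for i in range(1, row.shape[1] + 1)]
    return pd.DataFrame(row, columns=columns)


# Made-up toss: 27 measurements of a constant acceleration (3, 4, 0).
toss = pd.DataFrame({"ax": [3.0] * 27, "ay": [4.0] * 27, "az": [0.0] * 27})
row = toss_to_row(toss)
print(row.shape)
```

Each file contributes one such row, so stacking the rows gives a table with one toss per row and one measurement per column.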

In [None]:
# Define the directory where the CSV files are located
directory = Path('./data/ClassData')

def fetch_data(sub_dir_name: str) -> pd.DataFrame:
    # The files for one kind of toss live in a subdirectory of the data directory
    f_dir = directory / sub_dir_name
    # Recursively get a list of all .txt files (CSV formatted) in this directory and below.
    csv_files = list(f_dir.glob('**/*.txt'))
    # Initialize an empty list to store the one-row DataFrames
    dfs = []
    # Loop over the list of CSV files
    for file in csv_files:
        # Read the CSV file into a DataFrame
        df_sample = pd.read_csv(file)
        # Calculate the total acceleration (magnitude of the acceleration vector)
        a = np.sqrt(df_sample.ax**2 + df_sample.ay**2 + df_sample.az**2).to_numpy()
        # Keep the first 25 measurements as a single row with 25 columns,
        # labeled "a1", "a2", etc.
        df_row = pd.DataFrame([a[:25]], columns=['a' + str(i) for i in range(1, 26)])
        dfs.append(df_row)
    # Concatenate all the one-row DataFrames into a single DataFrame
    df = pd.concat(dfs, ignore_index=True)
    return df

And load in the actual data.

In [None]:
df_holding = fetch_data('held')
df_horizontal = fetch_data('horizontal')
df_up = fetch_data('up')

Next we need to label it. The label is the expected output. We have two choices for our NN output. The first is a single number that takes one value per kind of toss (say 0, 1, or 2). The other is to have three outputs from the network: the first says whether the toss was directly `up`, the second `horizontal`, and the third `holding`. The three-output version is the right way to go. Otherwise the network will try to interpolate between the three classes along a single output.

Another way to think about this: what should the network do if the actual toss was very close between a toss straight up and a toss onto a couch?
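
To make the interpolation problem concrete, here is a small sketch comparing the two encodings. The integer codes 0, 1, 2 for `up`, `horizontal`, `holding` are an arbitrary assumption for illustration, not something defined in the class data.

```python
import numpy as np

classes = ["up", "horizontal", "holding"]
codes = np.array([0, 2, 1, 0])  # four made-up tosses, one integer code each

# One-hot encoding: one column per class, a single 1 in each row.
one_hot = np.eye(len(classes))[codes]
print(one_hot)

# A single-output network torn between "up" (0) and "holding" (2)
# lands near the midpoint, which happens to be the code for "horizontal".
single_output_guess = (0 + 2) / 2
print(classes[int(single_output_guess)])

# With three outputs the same uncertainty is expressed directly:
# half weight on "up", half on "holding", none on "horizontal".
three_output_guess = (one_hot[0] + one_hot[1]) / 2
print(three_output_guess)
```

With a single output, being unsure between two classes can look like a confident vote for a third one. With three outputs, the uncertainty stays visible.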

In [None]:
def add_label(df: pd.DataFrame, label_holding: bool, label_horizontal: bool, label_up: bool):
    'In place add new columns to the DataFrame to store the label'
    df['is_up'] = np.ones(len(df)) if label_up else np.zeros(len(df))
    df['is_horizontal'] = np.ones(len(df)) if label_horizontal else np.zeros(len(df))
    df['is_holding'] = np.ones(len(df)) if label_holding else np.zeros(len(df))

In [None]:
add_label(df_holding, True, False, False)
add_label(df_horizontal, False, True, False)
add_label(df_up, False, False, True)

And the training data is everything all together.

In [None]:
data = pd.concat([df_holding, df_horizontal, df_up], ignore_index=True)

In [None]:
data

## Preparing the data for training

Create training and test samples for later work.

In [None]:
x = data.iloc[:, :-3]
y = data.iloc[:, -3:]

In [None]:
x

In [None]:
y

In [None]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)

In [None]:
print(len(x_train), len(x_test))

Those numbers are small for training a network, so expect the results to vary from run to run.
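
With so few tosses, it is worth checking that every class shows up in both samples. The sketch below counts tosses per class using a made-up label table with the same `is_up`, `is_horizontal`, `is_holding` columns; the helper name `class_counts` is hypothetical.

```python
import pandas as pd


def class_counts(y: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: number of tosses per class in a one-hot label table."""
    # Each label column holds 0.0 or 1.0, so the column sum is the class count.
    return y.sum().astype(int)


# Made-up labels: 2 up, 1 horizontal, 3 holding.
y_demo = pd.DataFrame({
    "is_up":         [1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    "is_horizontal": [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    "is_holding":    [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
})
counts = class_counts(y_demo)
print(counts)
```

In the notebook, the same call on `y_train` and `y_test` would show whether a class is missing from either sample. If one is, `train_test_split` accepts a `stratify` argument that keeps the class proportions similar in both.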

In [None]:
y_test

## Network and Training

Let's copy the network from the earlier sample, with two changes:

* We have 25 inputs
* We have 3 outputs
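
To see what those two numbers mean for the shapes involved, here is a plain NumPy sketch of a forward pass through a network of the same layout. The weights are random placeholders and the hidden sizes (25 and 16) mirror the Keras model defined in the next cell; this is an illustration, not how Keras is implemented internally.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Layer widths: 25 inputs, two hidden layers, 3 outputs.
sizes = [25, 25, 16, 3]
weights = [rng.normal(scale=0.1, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]


def forward(x):
    h = x
    # Hidden layers use ReLU.
    for w, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ w + b)
    # The output layer uses a sigmoid, one score per class.
    return sigmoid(h @ weights[-1] + biases[-1])


x_demo = rng.normal(size=(4, 25))  # four made-up tosses
out = forward(x_demo)
print(out.shape)

# Trainable parameters: one weight per connection plus one bias per unit.
n_params = sum(w.size + b.size for w, b in zip(weights, biases))
print(n_params)
```

Each sigmoid output is squashed into (0, 1) independently, so the three scores for a toss need not sum to one.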

In [None]:
model = tf.keras.Sequential()
model.add(layers.Dense(25, activation='relu', input_shape=(25,)))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dense(3, activation='sigmoid'))
# summary() prints the table itself and returns None, so no print() is needed
model.summary()

In [None]:
model.compile(optimizer='rmsprop', loss='mse')

In [None]:
model.fit(x_train, y_train, epochs=30, validation_data=(x_test, y_test))

## Looking at the results!

For each output, we want to look at its distribution for each class of toss. For holding the bluefruit, we'd expect the _up_ and _horizontal_ outputs to both be near zero, and the _holding_ output to be near one. Let's plot all this.
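
One way to condense that expectation into a single number is to pick the largest of the three outputs as the predicted class and compare it with the label. The sketch below does this on made-up scores; the arrays are illustrative and are not results from the trained network.

```python
import numpy as np

classes = ["up", "horizontal", "holding"]

# Made-up network outputs, columns in the order up, horizontal, holding.
scores = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.4, 0.5, 0.1],
    [0.1, 0.1, 0.8],
])
# Made-up one-hot labels for the same four tosses.
truth = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
])

# The predicted class is the column with the largest score.
predicted = scores.argmax(axis=1)
actual = truth.argmax(axis=1)

# Fraction of tosses where the prediction matches the label.
accuracy = (predicted == actual).mean()
print([classes[i] for i in predicted])
print(accuracy)
```

The histograms give more detail than this single number, since they show how confident the network is, not just which class wins.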

First, of course, we need to run the prediction on the test data.

In [None]:
y_p = model.predict(x_test)
y_predict = pd.DataFrame(y_p, columns=['p_up', 'p_horizontal', 'p_holding'])
y_predict['is_up'] = y_test['is_up'].to_numpy()
y_predict['is_horizontal'] = y_test['is_horizontal'].to_numpy()
y_predict['is_holding'] = y_test['is_holding'].to_numpy()

In [None]:
y_predict

In [None]:
for label_col in ['is_up', 'is_horizontal', 'is_holding']:
    for col in ['p_up', 'p_horizontal', 'p_holding']:
        # Histogram of one network output for tosses of a single true class
        plt.hist(y_predict[y_predict[label_col] == 1.0][col], alpha=0.5, range=(0, 1))
        plt.xlabel(col)
        plt.title(f"{col} for {label_col}=1.0")
        plt.show()