# Analyze the Toss Data

We will load every data file we can find by scanning the data directory.

In [None]:
import pandas as pd
import os
import matplotlib.pyplot as plt
from math import sqrt
import seaborn as sns

## Load the data

We will load everything into a pandas dataframe.

In [None]:
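# Define the directory where the data files are located
directory = './data'

# Get a list of all data files in the directory (CSV content, .txt extension).
# Sorting keeps file_index stable between runs, since os.listdir returns
# entries in arbitrary order.
csv_files = sorted(f for f in os.listdir(directory) if f.endswith('.txt'))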

# Initialize an empty list to store the DataFrames
dfs = []

# Loop over the list of CSV files
for index, file in enumerate(csv_files):
    # Read the CSV file into a DataFrame
    df = pd.read_csv(os.path.join(directory, file))
    
    # Record which file each row came from and its position within that file
    df['file_name'] = file
    df['file_index'] = index
    df['measurement_index'] = range(1, len(df) + 1)
    
    # Append the DataFrame to the list
    dfs.append(df)

# Concatenate all the DataFrames in the list into a single DataFrame
df = pd.concat(dfs, ignore_index=True)

In [None]:
df

## A few things about the data we've loaded

What are the minimum and maximum numbers of measurements per file?

In [None]:
by_file = df.groupby('file_index')
print(f'Min # of measurements: {by_file.size().min()}, max: {by_file.size().max()}')

How much "jitter" is there in the time between consecutive measurements?

In [None]:
# Calculate the difference in the time column
df['time_diff'] = df['time'].diff()

# Plot the histogram. The simple cut at -0.5 drops the large negative jumps
# where the clock resets at the start of each file, so no group-by is needed.
df[df.time_diff > -0.5].time_diff.plot(kind='hist', bins=100)
plt.xlabel('Time Difference')
plt.ylabel('Frequency')
plt.title('Histogram of Time Differences')
plt.show()
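For comparison, the group-by route mentioned in the comment above might look like the sketch below. It uses a tiny made-up frame (`demo`, not the real toss data) with the same `file_index` and `time` columns, and assumes the clock restarts at zero in each file.

```python
import pandas as pd

# Made-up stand-in for the concatenated toss data:
# two files whose clocks both restart at zero.
demo = pd.DataFrame({
    'file_index': [0, 0, 0, 1, 1, 1],
    'time': [0.00, 0.01, 0.02, 0.00, 0.01, 0.03],
})

# diff() within each file: the first row of every file becomes NaN
# instead of a large negative jump back to the start of the clock.
demo['time_diff'] = demo.groupby('file_index')['time'].diff()

print(demo)
```

With this approach no cut is needed, because the histogram simply ignores the NaN at the start of each file.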

Quickly: what are the large entries around 0.16?

In [None]:
df[df.time_diff > 0.16]

That seems to say something happens the first time the file is written out. Perhaps we should hold everything in memory and flush it at the end, and so avoid this?

This is worth knowing because the data has to be interpreted later, and that interpretation does not (I hope) assume anything about the delta-t between measurements.
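A minimal sketch of the hold-in-memory idea follows. The recording code is not part of this notebook, so everything here is an assumption: `read_sample` is a hypothetical stand-in for the real sensor read, `record` is an illustrative name, and the column names are simply the ones used in the dataframe above.

```python
import csv
import io
import time


def read_sample():
    """Hypothetical stand-in for the real sensor read; returns (ax, ay, az)."""
    return (0.0, 0.0, 9.8)


def record(n_samples, out):
    """Collect samples in memory first, then write them all in one go."""
    rows = []
    start = time.monotonic()
    for _ in range(n_samples):
        ax, ay, az = read_sample()
        rows.append((time.monotonic() - start, ax, ay, az))

    # File I/O happens only after sampling has finished, so a slow first
    # write cannot delay any measurement.
    writer = csv.writer(out)
    writer.writerow(['time', 'ax', 'ay', 'az'])
    writer.writerows(rows)
    return len(rows)


buffer = io.StringIO()  # stands in for an open file
n_written = record(5, buffer)
```

The trade-off is that the whole recording has to fit in memory, and nothing reaches disk if the recorder stops before the final write.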

## Looking at the data

Mainly, we want to see if the data looks any different between the various types of throws we've done.

Let's look at the total acceleration first.

In [None]:
# Magnitude of the acceleration vector, computed column-wise
df['a'] = (df.ax**2 + df.ay**2 + df.az**2) ** 0.5

In [None]:
df['a'].plot(kind='hist', bins=100)
plt.xlabel('Total Acceleration')
plt.ylabel('Frequency')
plt.title('Histogram of Total Acceleration')
plt.show()

In [None]:
# Create a FacetGrid with file_index as the row variable
g = sns.FacetGrid(data=df, row='file_index', sharey=True, aspect=4, height=2)

# Plot line plots for each file_index
g.map(sns.lineplot, 'measurement_index', 'a')

# Add a red reference line at 9.8, the reading expected from gravity alone
g.map(plt.axhline, y=9.8, color='red')

# Adjust the layout of the plots
g.fig.tight_layout()

# Show the plots
plt.show()


OK, these traces do look different, but telling a toss straight up from a longer toss is going to be difficult!
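One way to start putting numbers on those differences is to reduce each file to a few summary values of `a`. The sketch below runs on made-up values rather than the toss data, and the cut-off `FREE_FALL_CUT` is an assumed threshold for "roughly in free fall" (an accelerometer in free fall reads close to zero total acceleration), not something tuned on these measurements.

```python
import pandas as pd

# Made-up total-acceleration values for two pretend tosses
demo = pd.DataFrame({
    'file_index': [0, 0, 0, 0, 1, 1, 1, 1],
    'a': [9.8, 20.0, 1.0, 9.8, 9.8, 30.0, 0.5, 0.4],
})

# Assumed threshold below which we call a sample "in free fall"
FREE_FALL_CUT = 2.0

# One row per file: peak acceleration, lowest acceleration, and the
# number of samples that fall below the free-fall threshold.
summary = demo.groupby('file_index')['a'].agg(
    peak='max',
    low='min',
    n_free_fall=lambda s: int((s < FREE_FALL_CUT).sum()),
)

print(summary)
```

Whether values like these actually separate a straight-up toss from a longer one is exactly the open question; the sketch only shows how to tabulate them.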