*Music Enabled Running - FR Corp*

# **Exploratory Data Analysis - Test Person One**

This notebook contains the exploratory data analysis of the data from the first test person in the Music Enabled Running project. It is organised in chapters in which the data is imported, converted, filtered and analysed.



In [1]:
#Necessary to load in files from Google Drive with Colaboratory
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## **1.Libraries**

In [2]:
import pandas as pd        
import numpy as np

import seaborn as sns
import matplotlib as mat
import matplotlib.pyplot as plt   
%matplotlib inline

import time
from datetime import datetime

import scipy as sy
import scipy.fftpack as syfp
import pylab as pyl

import plotly.express as px

## **2.Dataset Import**

In [3]:
df_combined = pd.read_csv('/content/drive/MyDrive/Fontys/Fontys Semester 7/Mini Company - FR Corp/1.Projects/Music/2.Exploratory Data Analysis/Datasets/TestRunnerOne/Combined/combined_v3.csv')


## **3.Data Understanding**

In this chapter we get to know the data by looking at its basic metrics, variables and data types. Knowing what each column contains guides the converting and filtering steps that follow.

In [4]:
df_combined.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 176314 entries, 0 to 176313
Data columns (total 24 columns):
 #   Column        Non-Null Count   Dtype  
---  ------        --------------   -----  
 0   t             176314 non-null  object 
 1   foot_x        176314 non-null  object 
 2   pronation     176314 non-null  float64
 3   braking       176314 non-null  float64
 4   impact        176314 non-null  float64
 5   contact_time  176314 non-null  int64  
 6   flight_ratio  176314 non-null  float64
 7   strike        176314 non-null  int64  
 8   power         176314 non-null  int64  
 9   session_id    176314 non-null  object 
 10  user_id       176314 non-null  object 
 11  t_start       176314 non-null  object 
 12  t_end         176314 non-null  object 
 13  duration      176314 non-null  float64
 14  foot_y        176314 non-null  object 
 15  cadence       176314 non-null  int64  
 16  speed         176314 non-null  float64
 17  track_uri     176314 non-null  object 
 18  paus

### **3.1.Basic Metrics**

In [5]:
df_combined.describe()

Unnamed: 0,pronation,braking,impact,contact_time,flight_ratio,strike,power,duration,cadence,speed,position
count,176314.0,176314.0,176314.0,176314.0,176314.0,176314.0,176314.0,176314.0,176314.0,176314.0,176314.0
mean,-11.43449,6.129579,13.870403,292.811342,17.801944,6.624477,251.163453,11078.975387,83.930913,3.552511,89.691403
std,3.278634,1.372021,1.994003,61.502141,7.1018,2.077714,31.929497,14580.24726,6.8119,0.465999,60.509569
min,-53.0,0.3125,0.25,219.0,-190.0,1.0,0.0,1313.822852,0.0,0.0,0.0
25%,-13.5,5.25,13.0,276.0,16.125,5.0,237.0,3811.848831,84.0,3.410156,36.976
50%,-11.6,6.0,14.375,286.0,18.375,6.0,251.0,4546.86529,85.0,3.582031,87.418
75%,-9.6,6.8125,15.3125,295.0,21.0,8.0,266.0,6770.491056,86.0,3.769531,138.551
max,19.2,12.75,15.75,4907.0,42.125,16.0,577.0,46753.661441,117.0,18.035156,372.213


In [6]:
#count the number of steps recorded per foot
count_foot = df_combined.groupby('foot_x')['power'].count().reset_index()
count_foot

Unnamed: 0,foot_x,power
0,left,87667
1,right,88647


In [7]:
#see if there are any null values in footpods dataset
df_combined.isnull().sum()

t                0
foot_x           0
pronation        0
braking          0
impact           0
contact_time     0
flight_ratio     0
strike           0
power            0
session_id       0
user_id          0
t_start          0
t_end            0
duration         0
foot_y           0
cadence          0
speed            0
track_uri        0
paused           0
artist          27
track            0
context_uri      0
context          0
position         0
dtype: int64

### **3.2.Variables and Data Types**

In [8]:
df_combined.dtypes

t                object
foot_x           object
pronation       float64
braking         float64
impact          float64
contact_time      int64
flight_ratio    float64
strike            int64
power             int64
session_id       object
user_id          object
t_start          object
t_end            object
duration        float64
foot_y           object
cadence           int64
speed           float64
track_uri        object
paused             bool
artist           object
track            object
context_uri      object
context          object
position        float64
dtype: object

## **4.Data Converting**
In this chapter we convert the date and time columns to the correct data types. The converted timestamps are then used to calculate the step frequency.

### **4.1.Data Type Conversion**

In [9]:
#Check all the datatypes in the dataset.
df_combined.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 176314 entries, 0 to 176313
Data columns (total 24 columns):
 #   Column        Non-Null Count   Dtype  
---  ------        --------------   -----  
 0   t             176314 non-null  object 
 1   foot_x        176314 non-null  object 
 2   pronation     176314 non-null  float64
 3   braking       176314 non-null  float64
 4   impact        176314 non-null  float64
 5   contact_time  176314 non-null  int64  
 6   flight_ratio  176314 non-null  float64
 7   strike        176314 non-null  int64  
 8   power         176314 non-null  int64  
 9   session_id    176314 non-null  object 
 10  user_id       176314 non-null  object 
 11  t_start       176314 non-null  object 
 12  t_end         176314 non-null  object 
 13  duration      176314 non-null  float64
 14  foot_y        176314 non-null  object 
 15  cadence       176314 non-null  int64  
 16  speed         176314 non-null  float64
 17  track_uri     176314 non-null  object 
 18  paus

In [11]:
#convert columns to datetime columns
df_combined['t'] = pd.to_datetime(df_combined['t'])
df_combined['t_start'] = pd.to_datetime(df_combined['t_start'])
df_combined['t_end'] = pd.to_datetime(df_combined['t_end'])

### **4.2.Split Date and Time Data**

In [12]:
#split the datetime column into date and time columns
df_combined['date'] = df_combined['t'].dt.date
df_combined['time'] = df_combined['t'].dt.time

In [13]:
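#Convert the date column to a datetime datatype.
df_combined['date'] = pd.to_datetime(df_combined['date'])
#The trailing [0] selects only the time of the first row, so that single value is assigned to every row of the time column.
df_combined['time'] = pd.to_datetime(df_combined['t'], format = '%H:%M:%S.%f').dt.time[0]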
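
Because the `[0]` at the end of the last statement selects a single value, every row receives the same `time`. A minimal sketch of a per-row split with the `.dt` accessor is shown below. The frame `df_demo` and its two timestamps are made up for illustration; in the notebook the same calls would be applied to `df_combined['t']`.

```python
import pandas as pd

# Hypothetical sample; the notebook itself would use df_combined['t'].
df_demo = pd.DataFrame({"t": ["2020-10-20 17:41:15.181020", "2020-10-20 17:41:21.721952"]})
df_demo["t"] = pd.to_datetime(df_demo["t"])

# Timestamps normalised to midnight keep the datetime64 dtype for the date part.
df_demo["date"] = df_demo["t"].dt.normalize()
# One datetime.time object per row (stored with object dtype).
df_demo["time"] = df_demo["t"].dt.time

print(df_demo)
```

With this variant the `time` column differs per row, which would matter as soon as the column is used in a later analysis step.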

### **4.3.Frequency Calculation**

In [14]:
#Calculate time difference between each row of the t column.
df_combined['timediff'] = df_combined['t'].diff()
#Convert the time difference to seconds.
df_combined['timediff'] = df_combined['timediff'].dt.total_seconds()
#Calculate the frequency based on the time difference.
df_combined['frequency'] = 1 / df_combined['timediff']
#Remove first row with null timediff value.
df_combined = df_combined.iloc[1: , :]

In [15]:
#Checking the frequencies of the first 10 rows.
df_combined['frequency'].head(10)

1     0.708970
2     0.152883
3     0.514992
4     1.732424
5     1.686613
6     1.728632
7     2.263560
8     1.730059
9     2.118303
10    2.528636
Name: frequency, dtype: float64
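
Note that `diff()` over the whole column also measures the gap between the last step of one session and the first step of the next session. One possible variation, assuming the rows are sorted by `t` within each session, is to compute the difference per `session_id`, so that those gaps become missing values instead of very low frequencies. The frame `df_demo` below contains made-up sample data.

```python
import pandas as pd

# Made-up sample with two sessions, A and B, recorded on different days.
df_demo = pd.DataFrame({
    "session_id": ["A", "A", "A", "B", "B"],
    "t": pd.to_datetime([
        "2020-10-20 17:41:15.0", "2020-10-20 17:41:15.5", "2020-10-20 17:41:16.0",
        "2020-11-24 18:00:00.0", "2020-11-24 18:00:00.4",
    ]),
})

# Difference to the previous step within the same session only.
df_demo["timediff"] = df_demo.groupby("session_id")["t"].diff().dt.total_seconds()
df_demo["frequency"] = 1 / df_demo["timediff"]

# The first step of every session has no predecessor, so those rows are dropped.
df_demo = df_demo.dropna(subset=["timediff"]).reset_index(drop=True)
print(df_demo)
```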

## **5.Data Filtering**

After converting all the necessary columns to the correct datatype, it is time to filter out all the unnecessary data from the dataset.

### **5.1.Music per Session**

In this chapter we filter out the so-called bad sessions. Because the dataset also contains short test sessions, we label a session as bad when it lasted less than 30 minutes. To find these sessions, we count the number of unique songs that were played within each session. The song count gives a quick estimate of how long a session lasted and therefore of whether it is valuable.
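
As a cross-check on the song-count approach, the `duration` column could also be used directly. The sketch below assumes that `duration` holds the session length in seconds, which is consistent with the first session in the dataset (17:40:51 to 18:37:47 is about 3416 seconds). The constant `MIN_SESSION_SECONDS` is a hypothetical name, the session ids are shortened, and the frame `df_demo` is only an illustration built from three durations that occur in this dataset.

```python
import pandas as pd

MIN_SESSION_SECONDS = 30 * 60  # assumed threshold: 30 minutes

# Illustrative rows; session ids are shortened versions of real ones.
df_demo = pd.DataFrame({
    "session_id": ["20DDD5B1", "20DDD5B1", "F8E0AEA8", "6623D25F"],
    "duration": [3416.607238, 3416.607238, 1313.822852, 4206.436572],
})

# One duration per session, then keep the sessions that are long enough.
session_duration = df_demo.groupby("session_id")["duration"].max()
good_ids = session_duration[session_duration >= MIN_SESSION_SECONDS].index

# Keep only the rows that belong to those sessions.
df_good = df_demo[df_demo["session_id"].isin(good_ids)]
print(sorted(good_ids))
```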

In [16]:
df_combined.head()

Unnamed: 0,t,foot_x,pronation,braking,impact,contact_time,flight_ratio,strike,power,session_id,user_id,t_start,t_end,duration,foot_y,cadence,speed,track_uri,paused,artist,track,context_uri,context,position,date,time,timediff,frequency
1,2020-10-20 17:41:15.181020021,right,-13.2,0.6875,0.8125,991,-33.875,11,72,20DDD5B1-20F1-4278-BDF8-058BE39F053E,1865743f-9faa-41e6-8a70-74aa34726884,2020-10-20 17:40:51.200,2020-10-20 18:37:47.807,3416.607238,left,0,0.882812,spotify:track:0LlleCJVkI3axaEZvksrXd,False,The Chainsmokers,Somebody - Fluencee Remix,spotify:playlist:37i9dQZF1E38o5h27AIod2,Daily Mix 1,19.039,2020-10-20,17:41:13.770524,1.410496,0.70897
2,2020-10-20 17:41:21.721952915,right,-13.3,0.5625,0.8125,962,-31.875,11,73,20DDD5B1-20F1-4278-BDF8-058BE39F053E,1865743f-9faa-41e6-8a70-74aa34726884,2020-10-20 17:40:51.200,2020-10-20 18:37:47.807,3416.607238,left,28,0.914062,spotify:track:0LlleCJVkI3axaEZvksrXd,False,The Chainsmokers,Somebody - Fluencee Remix,spotify:playlist:37i9dQZF1E38o5h27AIod2,Daily Mix 1,19.039,2020-10-20,17:41:13.770524,6.540933,0.152883
3,2020-10-20 17:41:23.663728952,left,-7.5,0.625,1.0625,889,-12.375,10,65,20DDD5B1-20F1-4278-BDF8-058BE39F053E,1865743f-9faa-41e6-8a70-74aa34726884,2020-10-20 17:40:51.200,2020-10-20 18:37:47.807,3416.607238,right,42,1.066406,spotify:track:0LlleCJVkI3axaEZvksrXd,False,The Chainsmokers,Somebody - Fluencee Remix,spotify:playlist:37i9dQZF1E38o5h27AIod2,Daily Mix 1,19.039,2020-10-20,17:41:13.770524,1.941776,0.514992
4,2020-10-20 17:41:24.240954876,right,-14.2,0.4375,0.875,917,-32.125,11,77,20DDD5B1-20F1-4278-BDF8-058BE39F053E,1865743f-9faa-41e6-8a70-74aa34726884,2020-10-20 17:40:51.200,2020-10-20 18:37:47.807,3416.607238,left,29,0.964844,spotify:track:0LlleCJVkI3axaEZvksrXd,False,The Chainsmokers,Somebody - Fluencee Remix,spotify:playlist:37i9dQZF1E38o5h27AIod2,Daily Mix 1,19.039,2020-10-20,17:41:13.770524,0.577226,1.732424
5,2020-10-20 17:41:24.833858967,left,-8.3,0.75,1.0625,850,-11.5,9,65,20DDD5B1-20F1-4278-BDF8-058BE39F053E,1865743f-9faa-41e6-8a70-74aa34726884,2020-10-20 17:40:51.200,2020-10-20 18:37:47.807,3416.607238,left,0,0.0,spotify:track:0LlleCJVkI3axaEZvksrXd,False,The Chainsmokers,Somebody - Fluencee Remix,spotify:playlist:37i9dQZF1E38o5h27AIod2,Daily Mix 1,19.039,2020-10-20,17:41:13.770524,0.592904,1.686613


In [17]:
sessions_count = df_combined["date"].value_counts(normalize = True)
print(sessions_count)

2020-12-31    0.155615
2020-11-29    0.106152
2021-06-28    0.082688
2021-06-15    0.069212
2021-07-07    0.064363
2021-04-21    0.063881
2021-04-13    0.062293
2021-06-02    0.062111
2021-03-25    0.060342
2021-01-27    0.052197
2021-03-17    0.051896
2020-11-24    0.050790
2020-10-20    0.049940
2021-03-04    0.048953
2020-12-08    0.019567
Name: date, dtype: float64


In [18]:
#group on date, assuming there is only one session per day, and count the unique songs played during that day
df_date_music = df_combined.groupby('date')['track_uri'].nunique().reset_index()
#count the unique songs played per session_id
df_sessionid_music = df_combined.groupby('session_id')['track_uri'].nunique().reset_index()

In [19]:
#visualisation of the number of songs played per date
fig = px.bar(df_date_music, x = "date", y = "track_uri", 
             labels = {"date": "Date", "track_uri": "Number of songs played",}, 
             title = "Count of Songs Played per Day")
fig.show()

In [20]:
#visualisation of the number of songs played per session
fig = px.bar(df_sessionid_music, x = "session_id", y = "track_uri", 
             labels = {"session_id":"Session ID", "track_uri":"Number of songs played",},
             title = "Count of Songs Played per Session")
fig.show()

**INFO: Average length of a song** 

According to a report by Quartz, the average song on the Billboard Hot 100 in 2018 was about 3 minutes and 30 seconds long. 

https://qz.com/1519823/is-spotify-making-songs-shorter/


To estimate how long each session lasted, we multiply the number of songs played by this average song length of 3.5 minutes.

In [21]:
#estimate the duration of music per day in minutes (3.5 minutes per song)
df_date_music['duration w music'] = df_date_music['track_uri'] * 3.5
df_date_music.head()

Unnamed: 0,date,track_uri,duration w music
0,2020-10-20,17,59.5
1,2020-11-24,20,70.0
2,2020-11-29,37,129.5
3,2020-12-08,7,24.5
4,2020-12-31,34,119.0


In [22]:
#estimate the duration of music per session in minutes (3.5 minutes per song)
df_sessionid_music['duration w music'] = df_sessionid_music['track_uri'] * 3.5
df_sessionid_music

Unnamed: 0,session_id,track_uri,duration w music
0,0AD451FE-D853-4084-BE4C-1B0B8B471FDF,20,70.0
1,16A796BA-4F88-4C13-9245-4360066D7D3E,37,129.5
2,20DDD5B1-20F1-4278-BDF8-058BE39F053E,17,59.5
3,2826E86F-B9A1-44EB-8962-B0CBD54979E4,34,119.0
4,6623D25F-CA67-4A39-B792-2618977C5F7D,21,73.5
5,67C64B4E-1CDB-45DC-9DD9-E5DAF03A1048,18,63.0
6,79A52EF2-F99C-4628-ABE2-27B904F50B8E,24,84.0
7,819FF543-6009-4F63-A2DE-EA1F8498D0F8,31,108.5
8,A47F2D5A-3F6E-4EB4-BD72-2CE028F7A4F4,18,63.0
9,B9495C65-F6D3-45F4-8DA4-8D1D44C6F3E9,22,77.0


In the results above we can see that there are a lot of sessions with more than 20 songs played and only one with fewer than 10 songs. This suggests that the runner ran for a long time in most of these sessions.

In [23]:
#keep the days with an estimated music duration between 20 and 75 minutes
df_date_music_good = df_date_music[(df_date_music['duration w music'] < 75) & (df_date_music['duration w music'] > 20)]
df_date_music_good

Unnamed: 0,date,track_uri,duration w music
0,2020-10-20,17,59.5
1,2020-11-24,20,70.0
3,2020-12-08,7,24.5
5,2021-01-27,18,63.0
6,2021-03-04,18,63.0
7,2021-03-17,18,63.0
10,2021-04-21,21,73.5


In [24]:
#keep the sessions with an estimated music duration between 20 and 75 minutes
df_sessionid_music_good = df_sessionid_music[(df_sessionid_music['duration w music'] < 75) & (df_sessionid_music['duration w music'] > 20)]
df_sessionid_music_good

Unnamed: 0,session_id,track_uri,duration w music
0,0AD451FE-D853-4084-BE4C-1B0B8B471FDF,20,70.0
2,20DDD5B1-20F1-4278-BDF8-058BE39F053E,17,59.5
4,6623D25F-CA67-4A39-B792-2618977C5F7D,21,73.5
5,67C64B4E-1CDB-45DC-9DD9-E5DAF03A1048,18,63.0
8,A47F2D5A-3F6E-4EB4-BD72-2CE028F7A4F4,18,63.0
11,F8E0AEA8-E246-43BB-8256-A82F25C6444F,7,24.5
14,FF61EDAF-3C20-4121-8F3B-EB2B6944FFBB,18,63.0


#### **5.1.1.Drop Sessions**

In this section we keep only the rows that belong to the good sessions found above.

In [25]:
#session ids of the good sessions
good_session_ids = ['0AD451FE-D853-4084-BE4C-1B0B8B471FDF',
                    '20DDD5B1-20F1-4278-BDF8-058BE39F053E',
                    '6623D25F-CA67-4A39-B792-2618977C5F7D',
                    '67C64B4E-1CDB-45DC-9DD9-E5DAF03A1048',
                    'A47F2D5A-3F6E-4EB4-BD72-2CE028F7A4F4',
                    'F8E0AEA8-E246-43BB-8256-A82F25C6444F',
                    'FF61EDAF-3C20-4121-8F3B-EB2B6944FFBB']
#keep only the rows that belong to one of the good sessions
df_cleaned_music = df_combined[df_combined['session_id'].isin(good_session_ids)].reset_index()

In [26]:
df_combined.shape

(176313, 28)

In [27]:
df_cleaned_music.shape

(59457, 29)

### **5.2.Outliers**

In this chapter we identify the outliers and drop them, so that only the data we are interested in remains.

In [28]:
#calculating statistical data 
df_cleaned_music[['timediff']].describe()

Unnamed: 0,timediff
count,59457.0
mean,187.1889
std,21206.6
min,0.2000082
25%,0.2945971
50%,0.3547509
75%,0.4276402
max,3103191.0


In [29]:
#look for outliers with the help of a histogram
fig = px.histogram(
    data_frame = df_cleaned_music,
    x = 'timediff')
fig.show()

In [30]:
#look for outliers with the help of a boxplot
fig = px.box(df_cleaned_music, y = 'timediff')
fig.show()

In [31]:
#Dropping the values that fall outside the boxplot (timediff above 0.63 or below 0.2 seconds).
df_filtered = df_cleaned_music.drop(df_cleaned_music[(df_cleaned_music['timediff'] > 0.63)].index)
df_filtered = df_filtered.drop(df_filtered[(df_filtered['timediff'] < 0.2)].index)
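
The upper cut-off of 0.63 appears to be close to the upper fence of a standard boxplot, which lies at Q3 + 1.5 × IQR. A small sketch of that rule is given below, using the `timediff` quartiles from the `describe()` output above. The helper name `iqr_fences` is hypothetical and not part of any library.

```python
def iqr_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) boxplot fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles of timediff taken from the describe() output of df_cleaned_music.
lower, upper = iqr_fences(0.2945971, 0.4276402)
print(round(lower, 3), round(upper, 3))  # prints 0.095 0.627
```

On a DataFrame the two quartiles could be obtained with `df_cleaned_music['timediff'].quantile([0.25, 0.75])`. The lower fence lies below the observed minimum of about 0.2 seconds, so under this rule only the upper bound removes rows.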

In [32]:
#looking at the changes made of the filtered data with a boxplot.
fig = px.box(df_filtered, y = 'timediff')
fig.show()

In [33]:
#looking at the changes made of the filtered data with a histogram.
fig = px.histogram(
    data_frame = df_filtered,
    x = 'timediff')
fig.show()

In [34]:
#Checking the summary statistics; the count row shows whether any column has extra null values.
df_filtered.describe()

Unnamed: 0,index,pronation,braking,impact,contact_time,flight_ratio,strike,power,duration,cadence,speed,position,timediff,frequency
count,58659.0,58659.0,58659.0,58659.0,58659.0,58659.0,58659.0,58659.0,58659.0,58659.0,58659.0,58659.0,58659.0,58659.0
mean,64640.634481,-13.060344,6.008934,14.152469,284.479637,19.654083,5.964404,257.163436,3597.045652,84.409127,3.704104,90.554017,0.366117,2.91332
std,41747.948223,2.480513,1.07428,1.777127,49.082011,6.245343,1.704127,28.510256,639.089491,4.899356,0.413856,62.020408,0.092705,0.752317
min,4.0,-50.7,0.4375,0.625,219.0,-55.0,1.0,0.0,1313.822852,0.0,0.0,0.0,0.200008,1.587367
25%,14864.5,-14.7,5.375,13.625,270.0,17.5,5.0,244.0,3460.017518,84.0,3.523438,37.338,0.294111,2.343927
50%,75878.0,-12.8,6.0,14.5625,281.0,19.625,6.0,257.0,3475.261888,85.0,3.664062,86.94,0.353919,2.825505
75%,90785.5,-11.3,6.625,15.3125,290.0,22.5,7.0,272.0,3990.54327,85.0,3.894531,135.742,0.426634,3.400077
max,127231.0,8.1,9.8125,15.75,3017.0,42.125,16.0,532.0,4206.436572,92.0,6.574219,372.213,0.629974,4.999796


In [35]:
#Removing all rows with a flight ratio of zero or below.
df_filtered = df_filtered.drop(df_filtered[(df_filtered['flight_ratio'] <= 0)].index)

In [36]:
#Looking at the changes in statistical data.
df_filtered.describe()

Unnamed: 0,index,pronation,braking,impact,contact_time,flight_ratio,strike,power,duration,cadence,speed,position,timediff,frequency
count,57921.0,57921.0,57921.0,57921.0,57921.0,57921.0,57921.0,57921.0,57921.0,57921.0,57921.0,57921.0,57921.0,57921.0
mean,64789.861346,-13.029896,6.04716,14.278838,279.976848,20.189325,5.953557,258.719428,3598.637935,84.756323,3.730176,90.764061,0.364843,2.921682
std,41720.062475,2.432297,1.014341,1.340591,16.794254,3.943107,1.697928,24.601943,637.971881,3.321095,0.339491,62.048839,0.091989,0.750555
min,22.0,-22.3,1.375,6.75,219.0,0.125,1.0,0.0,1313.822852,0.0,0.0,0.0,0.200008,1.587367
25%,14944.0,-14.6,5.375,13.6875,270.0,17.625,5.0,244.0,3460.017518,84.0,3.53125,37.696,0.293815,2.347488
50%,75937.0,-12.8,6.0,14.5625,281.0,19.625,6.0,258.0,3475.261888,85.0,3.667969,87.575,0.353524,2.828661
75%,90837.0,-11.3,6.625,15.3125,290.0,22.5,7.0,272.0,3990.54327,85.0,3.902344,136.154,0.425987,3.403503
max,127215.0,-4.1,9.8125,15.75,480.0,42.125,14.0,511.0,4206.436572,92.0,5.496094,372.213,0.629974,4.999796


### **5.3.Contact Time**

In [37]:
#scatter plot of the contact time to spot extreme values
fig = px.scatter(df_filtered, y = 'contact_time', x = 'contact_time')
fig.show()

In [38]:
# Dropping the rows with a contact time above 600 milliseconds
df_filtered = df_filtered.drop(df_filtered[(df_filtered['contact_time'] > 600)].index)
fig = px.scatter(df_filtered, y = 'contact_time', x = 'contact_time')
fig.show()

### **5.4.Drop Columns**

In [39]:
df_filtered.columns

Index(['index', 't', 'foot_x', 'pronation', 'braking', 'impact',
       'contact_time', 'flight_ratio', 'strike', 'power', 'session_id',
       'user_id', 't_start', 't_end', 'duration', 'foot_y', 'cadence', 'speed',
       'track_uri', 'paused', 'artist', 'track', 'context_uri', 'context',
       'position', 'date', 'time', 'timediff', 'frequency'],
      dtype='object')

In [40]:
#drop unnecessary columns
df_filtered = df_filtered.drop(["index", "foot_y", "user_id", "t_end", "paused"], axis=1)
#change the name of the foot column
df_filtered = df_filtered.rename(columns = {'foot_x': 'foot'})

In [41]:
df_filtered.columns

Index(['t', 'foot', 'pronation', 'braking', 'impact', 'contact_time',
       'flight_ratio', 'strike', 'power', 'session_id', 't_start', 'duration',
       'cadence', 'speed', 'track_uri', 'artist', 'track', 'context_uri',
       'context', 'position', 'date', 'time', 'timediff', 'frequency'],
      dtype='object')

## **6.Analysis**

### **6.1.Choosing Session**

In [42]:
#find the longest session
df_longest_session = df_filtered.groupby("session_id")['duration'].max().reset_index()
df_longest_session

Unnamed: 0,session_id,duration
0,0AD451FE-D853-4084-BE4C-1B0B8B471FDF,3460.017518
1,20DDD5B1-20F1-4278-BDF8-058BE39F053E,3416.607238
2,6623D25F-CA67-4A39-B792-2618977C5F7D,4206.436572
3,67C64B4E-1CDB-45DC-9DD9-E5DAF03A1048,3990.54327
4,A47F2D5A-3F6E-4EB4-BD72-2CE028F7A4F4,3746.141195
5,F8E0AEA8-E246-43BB-8256-A82F25C6444F,1313.822852
6,FF61EDAF-3C20-4121-8F3B-EB2B6944FFBB,3475.261888


In [43]:
df_chosen_session = df_filtered[(df_filtered['session_id'] == '6623D25F-CA67-4A39-B792-2618977C5F7D')].reset_index()

### **6.2.Basic Correlations**

In [44]:
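#correlate the numeric columns only; text and datetime columns are left out
corr = df_chosen_session.select_dtypes(include = 'number').corr()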
corr.style.background_gradient(cmap='coolwarm')

Unnamed: 0,index,pronation,braking,impact,contact_time,flight_ratio,strike,power,duration,cadence,speed,position,timediff,frequency
index,1.0,0.205815,-0.064464,-0.317279,0.59096,-0.690883,-0.178521,-0.448858,-0.0,0.01108,-0.624772,-0.01075,0.014005,-0.010978
pronation,0.205815,1.0,0.273883,0.507847,0.253778,-0.31154,-0.620426,-0.138873,0.0,0.040119,-0.146264,-0.009569,0.012706,-0.006343
braking,-0.064464,0.273883,1.0,0.32979,-0.033865,-0.009027,-0.355006,0.062239,-0.0,0.072767,0.125106,0.010504,-0.009923,0.00927
impact,-0.317279,0.507847,0.32979,1.0,-0.190171,0.163139,-0.397456,0.148009,-0.0,0.116311,0.310053,-0.00625,-0.00843,0.017776
contact_time,0.59096,0.253778,-0.033865,-0.190171,1.0,-0.931116,-0.193419,-0.499043,0.0,-0.418953,-0.866175,0.001813,0.033501,-0.027012
flight_ratio,-0.690883,-0.31154,-0.009027,0.163139,-0.931116,1.0,0.254411,0.51445,0.0,0.164249,0.862765,-0.028753,-0.02537,0.019114
strike,-0.178521,-0.620426,-0.355006,-0.397456,-0.193419,0.254411,1.0,0.127199,0.0,-0.04456,-0.025613,-0.024397,-0.021207,0.01003
power,-0.448858,-0.138873,0.062239,0.148009,-0.499043,0.51445,0.127199,1.0,0.0,0.16723,0.514394,-0.049673,-0.017371,0.015992
duration,-0.0,0.0,-0.0,-0.0,0.0,0.0,0.0,0.0,1.0,-0.0,-0.0,0.0,-0.0,-0.0
cadence,0.01108,0.040119,0.072767,0.116311,-0.418953,0.164249,-0.04456,0.16723,-0.0,1.0,0.350128,0.060455,-0.032527,0.024865


In [45]:
fig = px.scatter_matrix(df_chosen_session,
    dimensions = ["pronation", "speed", "braking", "impact", "power", "timediff", "frequency", 'contact_time', 'flight_ratio'],
    color = "foot", width = 1400, height = 1400,
    title = "Pairplot Test Person #1 - Both Feet")
fig.update_traces(diagonal_visible = False)
fig.show()

In [46]:
feet_left = ['left']
df_left_foot = df_chosen_session[df_chosen_session['foot'].isin(feet_left)]

fig = px.scatter_matrix(df_left_foot,
    dimensions = ["pronation", "speed", "braking", "impact", "power", "timediff", "frequency", 'contact_time', 'flight_ratio'],
    color = "foot", width = 1400, height = 1400,
    title = "Pairplot Test Person #1 - Left Foot")
fig.update_traces(diagonal_visible = False)
fig.show()

In [47]:
feet_right = ['right']
df_right_foot = df_chosen_session[df_chosen_session['foot'].isin(feet_right)]
fig = px.scatter_matrix(df_right_foot,
    dimensions = ["pronation", "speed", "braking", "impact", "power", "timediff", "frequency", 'contact_time', 'flight_ratio'],
    color = "foot", width = 1400, height = 1400,
    title = "Pairplot Test Person #1 - Right Foot")
fig.update_traces(diagonal_visible = False)
fig.show()

### **6.3.In Depth Correlations**

In this chapter we take a more in-depth look at some interesting correlations that were found in the scatter matrix.
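
Besides reading the scatter matrix by eye, the strongest correlations can also be listed programmatically. The sketch below ranks every pair of columns by the absolute value of its correlation coefficient. The frame `df_demo` holds made-up numbers, chosen so that `contact_time` falls as `flight_ratio` rises; in the notebook the same steps could be applied to the numeric columns of `df_chosen_session`.

```python
from itertools import combinations

import pandas as pd

# Hypothetical sample in which contact_time falls as flight_ratio rises.
df_demo = pd.DataFrame({
    "contact_time": [300, 290, 285, 270, 260],
    "flight_ratio": [15.0, 17.5, 18.0, 22.0, 24.5],
    "braking": [6.0, 5.5, 6.5, 6.0, 5.0],
})

corr = df_demo.corr()

# Collect each pair of distinct columns once.
pairs = pd.Series({(a, b): corr.loc[a, b] for a, b in combinations(corr.columns, 2)})

# Order the pairs by the absolute strength of the correlation.
ranked = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked)
```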

#### **6.3.1.Flight Ratio and Contact Time**

In [48]:
fig = px.scatter(df_chosen_session, x = "flight_ratio", y = "contact_time", color = "foot",
                 labels = {"flight_ratio": "Flight Ratio", "contact_time": "Contact Time in milliseconds", "foot": "Foot of Runner"},
                 title = "Correlation between Contact Time and Flight Ratio")
fig.show()

In [49]:
fig = px.scatter_3d(df_chosen_session, 
                    x = 'flight_ratio', 
                    y = 'contact_time', 
                    z = 'foot',
                    color = 'foot')
fig.show()

#### **6.3.2.Number of Steps per Foot**

In [50]:
df_footsteps = df_chosen_session.groupby('foot')['t'].count().reset_index()
df_footsteps.columns = ["foot", "count"]

fig = px.bar(df_footsteps, x = "foot", y = "count", color = 'foot', 
             labels = {"foot": "Foot", "count":"Number of Steps",},
             title = "The Number of Steps by Each Foot")
fig.show()

#### **6.3.3.Frequency**

In this chapter we look at the step frequency of test person one, that is, the number of steps per second during the chosen session.

In [51]:
fig = px.line(df_chosen_session, x = 't', y = "frequency", color = 'foot', title = 'Frequency between steps')
#fig = px.line(df_chosen_session, x = 't', y = "frequency", title = 'Frequency between steps')
fig.show()
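
The step-to-step frequency can be noisy, because a single late or early timestamp produces a spike. One option, not used in this notebook, is to smooth the series with a rolling median before plotting it. The sketch below uses made-up frequencies and an arbitrarily chosen window of 3 steps, and also converts the result to steps per minute.

```python
import pandas as pd

# Made-up step frequencies in steps per second, including one spike.
freq = pd.Series([2.8, 2.9, 2.7, 4.9, 2.8, 2.9, 3.0])

# Centred rolling median over 3 steps; min_periods=1 keeps the edge values.
freq_smooth = freq.rolling(window=3, center=True, min_periods=1).median()

# Convert steps per second to steps per minute.
steps_per_minute = freq_smooth * 60
print(steps_per_minute.round(1).tolist())
```

A median is used here rather than a mean so that a single spike does not pull the smoothed value upwards.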