# Exploratory data analysis on Fitbit activity data

In [4]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline


In [5]:
data = pd.read_csv('fitbit_export_activity_jan_mar.csv')

The data analysed here was recorded by a Fitbit activity tracker for one particular person over a period of three months. Let's see whether we can find any trends and dependencies in it.

In [10]:
data.head(15)

Unnamed: 0,Date,CalBurned,Steps,Distance,Floors,MinNoActivity,MinLowActivity,MinMidActivity,MinHighActivity,Cal
0,01-01-2016,1491,0,0,0,1440,0,0,0,0
1,02-01-2016,1491,0,0,0,1440,0,0,0,0
2,03-01-2016,1491,0,0,0,1440,0,0,0,0
3,04-01-2016,1491,0,0,0,1440,0,0,0,0
4,05-01-2016,1491,0,0,0,1440,0,0,0,0
5,06-01-2016,1491,0,0,0,1440,0,0,0,0
6,07-01-2016,1491,0,0,0,1440,0,0,0,0
7,08-01-2016,1491,0,0,0,1440,0,0,0,0
8,09-01-2016,1491,0,0,0,1440,0,0,0,0
9,10-01-2016,1993,11290,764,0,1180,115,10,51,660


Let's take a look at some basic statistics.

In [7]:
data.describe()

Unnamed: 0,CalBurned,Steps,Floors,MinNoActivity,MinLowActivity,MinMidActivity,MinHighActivity,Cal
count,91.0,91.0,91,91.0,91.0,91.0,91.0,91.0
mean,2254.505495,10170.549451,0,809.791209,206.912088,18.351648,36.582418,998.571429
std,470.214097,6838.523493,0,342.394948,113.502486,17.776622,35.737956,606.28754
min,1463.0,0.0,0,295.0,0.0,0.0,0.0,0.0
25%,2096.5,7236.5,0,566.0,184.5,6.0,16.0,783.5
50%,2298.0,10270.0,0,691.0,237.0,14.0,28.0,1060.0
75%,2498.5,13682.0,0,1001.0,274.0,25.0,49.5,1319.5
max,3438.0,31093.0,0,1440.0,420.0,75.0,203.0,2459.0


# Cleaning of the data
The data contains several records where absolutely no activity was tracked. Most likely the person was not wearing their device on those days. These records contain no useful information, so we'll discard them.

In [None]:
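# Keep only days with some tracked activity; .copy() avoids a SettingWithCopyWarning when columns are added later
data = data[data['MinNoActivity'] != 1440].copy()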
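
To see how many days such a filter removes, the row counts before and after can be compared. The following is a minimal sketch on a toy frame rather than the real export: the first two rows mirror values from the preview above, and the third row is made up for illustration.

```python
import pandas as pd

# Toy frame: the first two rows mirror the preview above,
# the third row is hypothetical.
toy = pd.DataFrame({
    'Date': ['09-01-2016', '10-01-2016', '11-01-2016'],
    'Steps': [0, 11290, 8000],
    'MinNoActivity': [1440, 1180, 900],
})

# Keep only days with at least one tracked minute of activity.
cleaned = toy[toy['MinNoActivity'] != 1440].copy()

dropped = len(toy) - len(cleaned)
print(f"Dropped {dropped} of {len(toy)} days")
```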


One day consists of 24 * 60 = 1440 minutes. We can see, however, that the activity minutes within one day in our data do not add up to this value. To account for this, we introduce a new column, 'MinUnknown', containing the missing minutes.

In [None]:
# Minutes per day that the tracker assigned to one of the four activity levels
dfKnown = data['MinNoActivity'] + data['MinLowActivity'] + data['MinMidActivity'] + data['MinHighActivity']


In [None]:
data['MinUnknown'] = 1440 - dfKnown

In [None]:
data

Let's check that we did a good job filling in the missing minutes.

In [None]:
plt.plot(data.index.values, data['MinNoActivity']+data['MinLowActivity']+data['MinMidActivity']+data['MinHighActivity'] + data['MinUnknown'])

OK, it looks like every day now has 1440 minutes, which is what we wanted.
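
A plot shows this at a glance, but the same check can also be done numerically. Below is a minimal sketch, assuming the four activity columns and a `MinUnknown` column computed as above. The single toy row uses the values shown in the preview for 10-01-2016; the names `activity_cols` and `MINUTES_PER_DAY` are introduced here for illustration only.

```python
import pandas as pd

activity_cols = ['MinNoActivity', 'MinLowActivity',
                 'MinMidActivity', 'MinHighActivity']

# One toy row with the values shown in the preview for 10-01-2016.
toy = pd.DataFrame({
    'MinNoActivity': [1180],
    'MinLowActivity': [115],
    'MinMidActivity': [10],
    'MinHighActivity': [51],
})

MINUTES_PER_DAY = 24 * 60

# Minutes not assigned to any activity level.
toy['MinUnknown'] = MINUTES_PER_DAY - toy[activity_cols].sum(axis=1)

# Every day should now sum to a full day, and no day should be over-counted.
total = toy[activity_cols + ['MinUnknown']].sum(axis=1)
all_days_complete = bool((total == MINUTES_PER_DAY).all())
no_negative_unknown = bool((toy['MinUnknown'] >= 0).all())
print(all_days_complete, no_negative_unknown)
```

The second check is worth having because a negative `MinUnknown` would mean the tracker reported more than 1440 minutes for a day, which the sum-to-1440 check alone cannot reveal.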

# Visualizing the data
Let's look at some graphical representations of the data.

In [None]:
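fig, ax = plt.subplots()
# Stack the minutes per activity level for each day; labels make the bands identifiable
ax.stackplot(data.index.values, data['MinNoActivity'], data['MinUnknown'], data['MinLowActivity'], data['MinMidActivity'], data['MinHighActivity'],
             labels=['None', 'Unknown', 'Low', 'Mid', 'High'])
ax.legend(loc='upper left')
plt.show()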

It looks like we still have a few records where "no activity" fills almost the whole day. The Fitbit device probably did not function correctly on those days. However, we leave this noisy data in for now.

Let's investigate how active this person is on average.

In [None]:
print( "Average activity per day in hours None=", round(data['MinNoActivity'].mean()/60,1), "Low=",round(data['MinLowActivity'].mean()/60,1), "Mid=",round(data['MinMidActivity'].mean()/60,1), "High=",round(data['MinHighActivity'].mean()/60,1))
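
The same per-level averages can also be computed in one step by taking the column means and converting minutes to hours. This is only a sketch on a two-row toy frame: the first row uses the preview values for 10-01-2016, the second row is hypothetical, so the numbers are not the real averages.

```python
import pandas as pd

# First row: preview values for 10-01-2016; second row: hypothetical.
toy = pd.DataFrame({
    'MinNoActivity': [1180, 900],
    'MinLowActivity': [115, 245],
    'MinMidActivity': [10, 26],
    'MinHighActivity': [51, 69],
})

# Mean minutes per day for each level, converted to hours, one decimal place.
hours_per_day = toy.mean().div(60).round(1)
print(hours_per_day)
```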

In [None]:
data['Steps'].mean()

In conclusion, the person does not spend much time being active. However, they do reach the doctors' recommendation of 10 thousand steps per day.

# Adding new features
Since we suspect that the activity depends on the day of the week, we'll add the day of the week as a feature.

In [None]:
# This is how to convert a date string to the day of the week.
# The dates in the data use a four-digit year, hence %Y rather than %y.
import datetime
datetime.datetime.strptime('01-01-2016', '%d-%m-%Y').strftime('%a')


In [None]:
#TODO: add the day of the week feature and provide some stats on that

In [None]:
# Dates in the data look like '01-01-2016', so the format needs %Y (four-digit year), not %y
#data['Date'].apply(lambda x: datetime.datetime.strptime(x, '%d-%m-%Y').strftime('%a'))
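
One possible way to fill in the TODO is sketched below. It is not part of the original analysis. It assumes the `Date` column holds day-month-year strings with a four-digit year, as in the preview, and uses `pd.to_datetime` with `Series.dt.day_name()` instead of `strftime('%a')`, so that the day names do not depend on the locale. The column name `DayOfWeek` and the toy rows (only the first `Steps` value comes from the preview) are assumptions made for illustration.

```python
import pandas as pd

# Toy frame: the first row is from the preview, the others are hypothetical.
toy = pd.DataFrame({
    'Date': ['10-01-2016', '11-01-2016', '17-01-2016'],
    'Steps': [11290, 8000, 9000],
})

# Parse day-month-year strings with a four-digit year.
parsed = pd.to_datetime(toy['Date'], format='%d-%m-%Y')

# Add the weekday name as a new feature.
toy['DayOfWeek'] = parsed.dt.day_name()

# Some basic stats per weekday: mean number of steps.
steps_by_day = toy.groupby('DayOfWeek')['Steps'].mean()
print(steps_by_day)
```

With only three months of data each weekday has roughly 13 observations at most, so any per-weekday differences should be read with caution.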