# For The Bit

## 1 - Getting my `fitbit` data 

### Part V.  Add exercise flags to the intraday data

We want to know whether a given minute in the intraday data was during exercise and, if so, what type.

To do that, we add one extra column to the MGS intraday data: `exercise`, which holds the activity name (e.g. `Walk`, `Bike`) for every minute I was exercising and is empty (NaN) otherwise.
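Because the column stores the activity name rather than a True/False value, a plain boolean "am I exercising?" flag can be derived from it whenever needed. A minimal sketch on a made-up four-minute frame; the `is_exercising` column name is hypothetical and not part of the saved data:

```python
import numpy as np
import pandas as pd

# Toy minute-level frame (made-up values): the `exercise` column holds the
# activity name during exercise and NaN otherwise.
idx = pd.date_range("2017-04-27 21:18", periods=4, freq="min")
toy = pd.DataFrame(
    {"exercise": [np.nan, np.nan, "Aerobic Workout", "Aerobic Workout"]},
    index=idx,
)

# A boolean flag falls out of the name column directly.
toy["is_exercising"] = toy["exercise"].notna()
print(toy)
```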

In [1]:
import pandas as pd
import json
import datetime
import dateutil
import os

In [2]:
mgs = pd.read_csv('../data/gully/intraday/mgs_intraday.csv', index_col=0, parse_dates=True)

In [3]:
with open('../data/gully/exercise/logs/exercise_log_6180663039.json', 'r') as f:
    this_json = json.load(f)

Examine the log properties:

In [4]:
#pd.Series(this_json['activities'][0])
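The loop below reads only three fields from each log: `startTime`, `duration` (in milliseconds) and `activityName`. A minimal sketch of that extraction on a hypothetical log entry; the values are made up, and the `startTime` format (milliseconds plus a UTC offset) is an assumption. The `[:-13]` slice only yields minute precision if the real logs use exactly that format:

```python
import pandas as pd

# Hypothetical log entry with the fields the loop reads; values are made up.
log = {"activities": [{"startTime": "2017-04-27T21:20:00.000-07:00",
                       "duration": 1800000,          # milliseconds
                       "activityName": "Aerobic Workout"}]}

act = log["activities"][0]
# Dropping the last 13 characters removes ":00.000-07:00"
# (seconds, milliseconds and UTC offset), leaving "YYYY-MM-DDTHH:MM".
start = act["startTime"][:-13]
minutes = act["duration"] / 1000.0 / 60.0  # milliseconds -> minutes
print(pd.Series({"start": start, "minutes": minutes, "name": act["activityName"]}))
```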

### Make a minute-level date vector for each exercise log
For each log, build a one-minute date range spanning the activity, then loop over all the log files.
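For a single activity, the minute-level vector is just a `pd.date_range` starting at the (minute-precision) start time with one period per minute of duration. A sketch with example values; note that `periods` has to be an integer, so a fractional duration in minutes must be converted first (here `int()` truncates any partial final minute):

```python
import pandas as pd

start = "2017-04-27T21:20"   # minute-precision start time (example value)
minutes = 30.0               # duration in minutes, as a float

# One timestamp per minute of the activity.
dr = pd.date_range(start=start, periods=int(minutes), freq="1min")
print(dr[0], dr[-1], len(dr))
```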

In [5]:
exer_dir = '../data/gully/exercise/logs/'
flist = [f for f in os.listdir(exer_dir) if f.endswith('.json')]  # skip non-JSON files (e.g. .DS_Store)

In [6]:
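frames = []  # one DataFrame of exercise minutes per log file

In [7]:
for i, fname in enumerate(flist):
    
    with open(os.path.join(exer_dir, fname), 'r') as f:
        this_json = json.load(f)
    
    activity = this_json['activities'][0]
    # Trim the last 13 characters (seconds, milliseconds and UTC offset, assuming
    # a timestamp like '2017-04-27T21:20:00.000-07:00') to get minute precision
    startTime = activity['startTime'][:-13]
    duration = activity['duration']/1000.0/60.0   # milliseconds -> minutes
    activityName = activity['activityName']
    # date_range needs an integer number of periods; int() drops any partial final minute
    dr = pd.date_range(start=startTime, periods=int(duration), freq='1min')
    #print("{:>4d}  {:>20}  {:>6.2f}  {:<20}".format(i, startTime, duration, activityName))
    df = pd.DataFrame(index=dr)
    df['exercise'] = activityName
    frames.append(df)

# DataFrame.append was removed in pandas 2.0; concatenate all logs once instead
master_df = pd.concat(frames)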

In [8]:
master_df.exercise.unique()

array(['Walk', 'Run', 'Elliptical', 'Spinning', 'Hike', 'Bike',
       'Skateboarding', 'Dancing', 'Sport', 'Yoga', 'Workout',
       'Aerobic Workout'], dtype=object)
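Since each row of `master_df` is one minute, counting rows per activity type gives the total minutes logged for each type. A sketch on toy data standing in for `master_df` (made-up values); on the real frame the equivalent call would be `master_df.exercise.value_counts()`:

```python
import pandas as pd

# Toy stand-in for master_df: one row per exercise minute (made-up data).
idx = pd.date_range("2017-04-01 11:00", periods=5, freq="min")
toy = pd.DataFrame({"exercise": ["Bike", "Bike", "Bike", "Walk", "Walk"]}, index=idx)

# Minutes logged per activity type, largest first.
counts = toy["exercise"].value_counts()
print(counts)
```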

In [9]:
mgs_out = pd.merge(mgs, master_df, how='outer', left_index=True, right_index=True)

In [10]:
len(mgs), len(mgs_out)

(179186, 179309)

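Hmm, that's strange... these *should* be the same length: an outer merge keeps every timestamp from either frame, and every exercise minute ought to already exist in the minute-by-minute intraday index.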
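To see why the lengths can differ: an outer merge adds a new row for any exercise minute that has no matching intraday timestamp. A toy illustration with made-up values, in which one exercise minute (12:43) is missing from the intraday index:

```python
import pandas as pd

# Three intraday minutes (made-up step counts).
minutes = pd.date_range("2017-04-22 12:40", periods=3, freq="min")
intraday = pd.DataFrame({"steps": [10, 12, 0]}, index=minutes)

# Two exercise minutes: 12:42 overlaps the intraday index, 12:43 does not.
exer = pd.DataFrame({"exercise": ["Bike", "Bike"]},
                    index=pd.DatetimeIndex(["2017-04-22 12:42", "2017-04-22 12:43"]))

merged = pd.merge(intraday, exer, how="outer", left_index=True, right_index=True)
print(len(intraday), len(merged))  # the unmatched 12:43 minute adds a row
print(merged)                      # and its steps value is NaN
```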

In [11]:
extras = set(mgs_out.index) - set(mgs.index)

In [12]:
bads = sorted(list(extras))

In [13]:
len(bads), len(master_df)

(119, 5581)

Hmm... 119 of the 5581 exercise minutes were recorded in the exercise logs but *not* in the intraday data.  **Strange.**
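The counts also don't quite add up: the outer merge added 123 rows (179309 - 179186), but only 119 of them are new timestamps. The remaining 4 rows presumably come from timestamps that appear more than once, for example overlapping exercise logs; that is an unchecked guess. A toy sketch of how a repeated exercise minute inflates an outer merge, and how `Index.difference` and `Index.duplicated` separate the two effects (made-up values):

```python
import pandas as pd

intraday = pd.DataFrame({"steps": [5, 7]},
                        index=pd.date_range("2017-03-25 17:17", periods=2, freq="min"))
# Two overlapping logs both claim 17:18, so that timestamp appears twice.
exer = pd.DataFrame({"exercise": ["Walk", "Run"]},
                    index=pd.DatetimeIndex(["2017-03-25 17:18", "2017-03-25 17:18"]))

merged = pd.merge(intraday, exer, how="outer", left_index=True, right_index=True)

new_minutes = merged.index.difference(intraday.index)  # timestamps absent from intraday
dupes = exer.index[exer.index.duplicated()]            # repeated exercise minutes
# Growth of 1 row, with no new timestamps: it all comes from the duplicate.
print(len(merged) - len(intraday), len(new_minutes), len(dupes))
```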

In [14]:
master_df.loc[bads].tail()

| timestamp | exercise |
|---|---|
| 2017-03-25 17:19:00 | Walk |
| 2017-04-01 11:50:00 | Bike |
| 2017-04-15 10:05:00 | Bike |
| 2017-04-22 12:43:00 | Bike |
| 2017-04-27 21:20:00 | Aerobic Workout |

In [15]:
mgs_out.loc[bads].tail()

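| timestamp | steps | HR | sleep | exercise |
|---|---|---|---|---|
| 2017-03-25 17:19:00 | NaN | NaN | NaN | Walk |
| 2017-04-01 11:50:00 | NaN | NaN | NaN | Bike |
| 2017-04-15 10:05:00 | NaN | NaN | NaN | Bike |
| 2017-04-22 12:43:00 | NaN | NaN | NaN | Bike |
| 2017-04-27 21:20:00 | NaN | NaN | NaN | Aerobic Workout |
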

Sure enough... no steps, HR, or sleep data at any of those minutes.  
I'll look into this later; for now, let's save the merged data.

In [16]:
mgs_out.to_csv('../data/gully/intraday/mgs_intraday_exercise.csv')
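When coming back to the orphaned minutes, the `indicator=True` option of `pd.merge` labels every row by which frame it came from (`'left_only'`, `'right_only'` or `'both'`), which makes the exercise-only minutes easy to pull out. Alternatively, a `how='left'` merge would keep the intraday rows as they are and silently drop those minutes. A sketch on toy data (made-up values):

```python
import pandas as pd

intraday = pd.DataFrame({"steps": [10, 12]},
                        index=pd.date_range("2017-04-15 10:03", periods=2, freq="min"))
exer = pd.DataFrame({"exercise": ["Bike", "Bike"]},
                    index=pd.DatetimeIndex(["2017-04-15 10:04", "2017-04-15 10:05"]))

# indicator=True adds a `_merge` column: 'left_only', 'right_only' or 'both'.
merged = pd.merge(intraday, exer, how="outer", left_index=True, right_index=True,
                  indicator=True)
orphans = merged[merged["_merge"] == "right_only"]  # exercise minutes with no intraday row
print(orphans)

# A left merge keeps exactly the intraday rows (as long as the exercise
# index has no repeated minutes), dropping the orphans.
left = pd.merge(intraday, exer, how="left", left_index=True, right_index=True)
print(len(left) == len(intraday))
```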