# Apple Health Extractor

This code parses your Apple Health export data, creates a CSV file for each data type, and runs some simple data checks and analysis.

Enjoy! 

--------

## Extract Data and Export to CSVs from Apple Health's export.xml

* Command-line tool to process Apple Health's export.xml file.
* Creates a separate CSV file for each data type.
* Original source: https://github.com/tdda/applehealthdata
* Depending on the size of your Apple Health data, this script may take several minutes to complete.

**NOTE: The parser currently reports a few minor errors on newer data that Apple Health has added; it needs some updates to handle them.**

## Setup and Usage

* Export your data from the Apple Health app on your phone.
* Unzip export.zip into this directory and rename the extracted folder to data.
* The export file should then be at data/export.xml inside this directory.
* Run the parser inside this project (as in the cell below) or from the command line.
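
Before running the parser, it can help to confirm that the export file is where the steps above expect it. The following is a minimal sketch; `find_export` is a hypothetical helper name, not part of the original parser, and it assumes the `data/export.xml` layout described above.

```python
from pathlib import Path


def find_export(data_dir="data"):
    """Return the path to export.xml inside data_dir, or None if it is missing."""
    candidate = Path(data_dir) / "export.xml"
    return candidate if candidate.is_file() else None


# Hypothetical usage: report whether the setup steps produced the expected file.
export_path = find_export()
if export_path is None:
    print("data/export.xml not found; check the unzip and rename steps")
else:
    print(f"Found {export_path}")
```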

In [10]:
# %run -i 'apple-health-data-parser' 'export.xml' 
%run -i 'apple-health-data-parser' 'data/export.xml' 

Reading data from data/export.xml . . . done


  self.nodes = self.root.getchildren()


Unexpected node of type ExportDate.

Tags:
ActivitySummary: 673
ExportDate: 1
Me: 1
Record: 1230834
Workout: 239

Fields:
HKCharacteristicTypeIdentifierBiologicalSex: 1
HKCharacteristicTypeIdentifierBloodType: 1
HKCharacteristicTypeIdentifierDateOfBirth: 1
HKCharacteristicTypeIdentifierFitzpatrickSkinType: 1
activeEnergyBurned: 673
activeEnergyBurnedGoal: 673
activeEnergyBurnedUnit: 673
appleExerciseTime: 673
appleExerciseTimeGoal: 673
appleStandHours: 673
appleStandHoursGoal: 673
creationDate: 1231073
dateComponents: 673
device: 1217099
duration: 239
durationUnit: 239
endDate: 1231073
sourceName: 1231073
sourceVersion: 1226778
startDate: 1231073
totalDistance: 239
totalDistanceUnit: 239
totalEnergyBurned: 239
totalEnergyBurnedUnit: 239
type: 1230834
unit: 1219792
value: 1230581
workoutActivityType: 239

Record types:
ActiveEnergyBurned: 591394
AppleExerciseTime: 21938
AppleStandHour: 10786
AppleStandTime: 4258
AudioExposureEvent: 2
BasalEnergyBurned: 169982
BodyMass: 313
DietaryCalciu

-----

# Apple Health Data Check and Simple Data Analysis

In [3]:
import numpy as np
import pandas as pd
import glob
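
Since `glob` is imported here, one possible use is listing which CSV files the parser actually produced before trying to load them. This is only a sketch: `list_health_csvs` is a hypothetical helper, and it assumes the CSVs were written to the `data` directory as in the cells that follow.

```python
import glob
import os


def list_health_csvs(data_dir="data"):
    """Return the sorted base names (without .csv) of the CSV files in data_dir."""
    paths = glob.glob(os.path.join(data_dir, "*.csv"))
    return sorted(os.path.splitext(os.path.basename(p))[0] for p in paths)


# Prints an empty list if the parser has not been run yet.
print(list_health_csvs())
```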

----

## Weight

In [4]:
weight = pd.read_csv("data/BodyMass.csv")

In [5]:
weight.tail()

Unnamed: 0,sourceName,sourceVersion,device,type,unit,creationDate,startDate,endDate,value
304,MyFitnessPal,28552,,BodyMass,lb,2020-03-11 08:19:26 -0400,2020-03-10 08:19:00 -0400,2020-03-10 08:19:00 -0400,204.2
305,MyFitnessPal,28552,,BodyMass,lb,2020-03-11 21:11:20 -0400,2020-03-11 21:11:00 -0400,2020-03-11 21:11:00 -0400,204.1
306,Health,13.3.1,,BodyMass,lb,2020-03-12 05:48:05 -0400,2020-03-12 05:47:00 -0400,2020-03-12 05:47:00 -0400,203.5
307,MyFitnessPal,28552,,BodyMass,lb,2020-03-12 21:51:16 -0400,2020-03-12 21:51:00 -0400,2020-03-12 21:51:00 -0400,203.5
308,Health,13.3.1,,BodyMass,lb,2020-03-13 08:45:08 -0400,2020-03-13 05:45:00 -0400,2020-03-13 05:45:00 -0400,204.1


In [5]:
weight.describe()

Unnamed: 0,device,value
count,0.0,309.0
mean,,193.038511
std,,9.005642
min,,179.5
25%,,183.6
50%,,195.5
75%,,200.0
max,,209.2
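
The date columns load as plain strings such as `2020-03-10 08:19:00 -0400`, so any time-based analysis needs them parsed first. Below is a minimal sketch of one way to do that; `parse_health_dates` is a hypothetical helper, and converting everything to UTC is an assumption made here to avoid mixed-offset columns. It does mean that calendar days are UTC days rather than local ones.

```python
import pandas as pd

DATE_COLUMNS = ("creationDate", "startDate", "endDate")


def parse_health_dates(df, columns=DATE_COLUMNS):
    """Return a copy of df with the given timestamp columns parsed as UTC datetimes."""
    out = df.copy()
    for col in columns:
        out[col] = pd.to_datetime(out[col], format="%Y-%m-%d %H:%M:%S %z", utc=True)
    return out


# Two rows copied from the weight.tail() output above.
sample = pd.DataFrame({
    "creationDate": ["2020-03-11 08:19:26 -0400", "2020-03-13 08:45:08 -0400"],
    "startDate": ["2020-03-10 08:19:00 -0400", "2020-03-13 05:45:00 -0400"],
    "endDate": ["2020-03-10 08:19:00 -0400", "2020-03-13 05:45:00 -0400"],
    "value": [204.2, 204.1],
})
parsed = parse_health_dates(sample)

# Time span covered by the sample measurements.
print(parsed["startDate"].max() - parsed["startDate"].min())
```

On the real data this would be called as `parse_health_dates(weight)`.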


----

## Steps

In [6]:
steps = pd.read_csv("data/StepCount.csv")

In [7]:
len(steps)

105845

In [8]:
steps.columns

Index(['sourceName', 'sourceVersion', 'device', 'type', 'unit', 'creationDate',
       'startDate', 'endDate', 'value'],
      dtype='object')

In [9]:
steps.describe()

Unnamed: 0,value
count,105845.0
mean,85.740838
std,159.318891
min,1.0
25%,17.0
50%,39.0
75%,85.0
max,6192.0


In [10]:
steps.tail()

Unnamed: 0,sourceName,sourceVersion,device,type,unit,creationDate,startDate,endDate,value
105840,Greg Ames’s iPhone,13.3.1,"<<HKDevice: 0x282c3c780>, name:iPhone, manufac...",StepCount,count,2020-04-09 07:09:22 -0400,2020-04-09 06:57:11 -0400,2020-04-09 07:06:06 -0400,229
105841,Greg’s Apple Watch,6.1.3,"<<HKDevice: 0x282c3c820>, name:Apple Watch, ma...",StepCount,count,2020-04-09 07:08:33 -0400,2020-04-09 06:57:38 -0400,2020-04-09 07:06:10 -0400,175
105842,Greg’s Apple Watch,6.1.3,"<<HKDevice: 0x282c3c8c0>, name:Apple Watch, ma...",StepCount,count,2020-04-09 08:20:29 -0400,2020-04-09 08:07:53 -0400,2020-04-09 08:17:32 -0400,99
105843,Greg’s Apple Watch,6.1.3,"<<HKDevice: 0x282c3c960>, name:Apple Watch, ma...",StepCount,count,2020-04-09 08:28:14 -0400,2020-04-09 08:18:08 -0400,2020-04-09 08:23:41 -0400,62
105844,Greg’s Apple Watch,6.1.3,"<<HKDevice: 0x282c3ca00>, name:Apple Watch, ma...",StepCount,count,2020-04-09 08:43:09 -0400,2020-04-09 08:32:31 -0400,2020-04-09 08:32:33 -0400,1


In [11]:
# total all-time steps
steps.value.sum()

9075239
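
The all-time total above adds up rows from every source. In the `steps.tail()` output, the iPhone and the Apple Watch both report steps for roughly the same interval (starting around 06:57 on 2020-04-09), so a plain sum may count some steps twice. A minimal sketch for checking this is to total the steps per source; `steps_by_source` is a hypothetical helper name.

```python
import pandas as pd


def steps_by_source(steps):
    """Sum the step counts separately for each recording source."""
    return steps.groupby("sourceName")["value"].sum().sort_values(ascending=False)


# Source names and values copied from the steps.tail() output above.
phone = "Greg Ames’s iPhone"
watch = "Greg’s Apple Watch"
sample = pd.DataFrame({
    "sourceName": [phone, watch, watch, watch, watch],
    "value": [229, 175, 99, 62, 1],
})
print(steps_by_source(sample))
```

On the real data this would be called as `steps_by_source(steps)`. Comparing the per-source totals with the combined sum gives a rough sense of how much overlap there might be.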

-------

## Stand Count

In [12]:
stand = pd.read_csv("data/AppleStandHour.csv")

In [13]:
len(stand)

10701

In [14]:
stand.columns

Index(['sourceName', 'sourceVersion', 'device', 'type', 'unit', 'creationDate',
       'startDate', 'endDate', 'value'],
      dtype='object')

In [15]:
stand.describe()

Unnamed: 0,unit
count,0.0
mean,
std,
min,
25%,
50%,
75%,
max,


In [16]:
stand.tail()

Unnamed: 0,sourceName,sourceVersion,device,type,unit,creationDate,startDate,endDate,value
10696,Greg’s Apple Watch,6.1.3,"<<HKDevice: 0x282eb0730>, name:Apple Watch, ma...",AppleStandHour,,2020-04-08 21:03:57 -0400,2020-04-08 21:00:00 -0400,2020-04-08 22:00:00 -0400,HKCategoryValueAppleStandHourStood
10697,Greg’s Apple Watch,6.1.3,"<<HKDevice: 0x282eb07d0>, name:Apple Watch, ma...",AppleStandHour,,2020-04-08 22:13:13 -0400,2020-04-08 22:00:00 -0400,2020-04-08 23:00:00 -0400,HKCategoryValueAppleStandHourStood
10698,Greg’s Apple Watch,6.1.3,"<<HKDevice: 0x282eb0870>, name:Apple Watch, ma...",AppleStandHour,,2020-04-09 06:58:07 -0400,2020-04-09 06:00:00 -0400,2020-04-09 07:00:00 -0400,HKCategoryValueAppleStandHourStood
10699,Greg’s Apple Watch,6.1.3,"<<HKDevice: 0x282eb0910>, name:Apple Watch, ma...",AppleStandHour,,2020-04-09 07:01:47 -0400,2020-04-09 07:00:00 -0400,2020-04-09 08:00:00 -0400,HKCategoryValueAppleStandHourStood
10700,Greg’s Apple Watch,6.1.3,"<<HKDevice: 0x282eb09b0>, name:Apple Watch, ma...",AppleStandHour,,2020-04-09 08:07:05 -0400,2020-04-09 08:00:00 -0400,2020-04-09 09:00:00 -0400,HKCategoryValueAppleStandHourStood
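
`stand.describe()` comes back empty because `describe()` summarizes only numeric columns by default, and here `value` holds category labels while `unit` is entirely missing. Counting the labels is more informative. The following is a minimal sketch; `stand_value_counts` is a hypothetical helper name.

```python
import pandas as pd


def stand_value_counts(stand):
    """Count how often each category label appears in the value column."""
    return stand["value"].value_counts()


# Five rows mirroring the stand.tail() output above: no unit, one category label.
sample = pd.DataFrame({
    "unit": [None] * 5,
    "value": ["HKCategoryValueAppleStandHourStood"] * 5,
})
counts = stand_value_counts(sample)
print(counts)
```

On the real data this would be called as `stand_value_counts(stand)`.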


------

## Resting Heart Rate (HR)

In [17]:
restingHR = pd.read_csv("data/RestingHeartRate.csv")

In [18]:
len(restingHR)

661

In [19]:
restingHR.describe()

Unnamed: 0,device,value
count,0.0,661.0
mean,,58.73525
std,,3.740578
min,,49.0
25%,,56.0
50%,,59.0
75%,,61.0
max,,76.0


---

## Walking Heart Rate (HR) Average

In [20]:
walkingHR = pd.read_csv("data/WalkingHeartRateAverage.csv")

In [21]:
len(walkingHR)

628

In [22]:
walkingHR.describe()

Unnamed: 0,device,value
count,0.0,628.0
mean,,83.103503
std,,10.313837
min,,59.0
25%,,76.5
50%,,81.25
75%,,87.0
max,,137.5


---

## Heart Rate Variability (HRV)

In [23]:
hrv = pd.read_csv("data/HeartRateVariabilitySDNN.csv")

In [24]:
len(hrv)

1151

In [25]:
hrv.columns

Index(['sourceName', 'sourceVersion', 'device', 'type', 'unit', 'creationDate',
       'startDate', 'endDate', 'value'],
      dtype='object')

In [26]:
hrv.describe()

Unnamed: 0,value
count,1151.0
mean,37.705395
std,15.626666
min,7.98436
25%,26.57445
50%,35.0259
75%,45.5933
max,123.408


In [27]:
hrv.tail()

Unnamed: 0,sourceName,sourceVersion,device,type,unit,creationDate,startDate,endDate,value
1146,Greg’s Apple Watch,6.1.3,"<<HKDevice: 0x282f5cff0>, name:Apple Watch, ma...",HeartRateVariabilitySDNN,ms,2020-04-08 14:24:29 -0400,2020-04-08 14:23:24 -0400,2020-04-08 14:24:29 -0400,42.5272
1147,Greg’s Apple Watch,6.1.3,"<<HKDevice: 0x282f5d090>, name:Apple Watch, ma...",HeartRateVariabilitySDNN,ms,2020-04-08 14:27:41 -0400,2020-04-08 14:26:36 -0400,2020-04-08 14:27:41 -0400,54.4237
1148,Greg’s Apple Watch,6.1.3,"<<HKDevice: 0x282f5d130>, name:Apple Watch, ma...",HeartRateVariabilitySDNN,ms,2020-04-08 22:27:06 -0400,2020-04-08 22:26:01 -0400,2020-04-08 22:27:06 -0400,28.7913
1149,Greg’s Apple Watch,6.1.3,"<<HKDevice: 0x282f5d1d0>, name:Apple Watch, ma...",HeartRateVariabilitySDNN,ms,2020-04-09 07:08:32 -0400,2020-04-09 07:07:31 -0400,2020-04-09 07:08:32 -0400,51.2957
1150,Greg’s Apple Watch,6.1.3,"<<HKDevice: 0x282f5d270>, name:Apple Watch, ma...",HeartRateVariabilitySDNN,ms,2020-04-09 09:03:20 -0400,2020-04-09 09:02:20 -0400,2020-04-09 09:03:20 -0400,42.0119


-------

## VO2 Max

In [28]:
vo2max = pd.read_csv("data/VO2Max.csv")

In [29]:
len(vo2max)

52

In [30]:
vo2max.describe()

Unnamed: 0,sourceVersion,device,value
count,0.0,0.0,52.0
mean,,,38.380846
std,,,2.71747
min,,,33.5271
25%,,,36.815725
50%,,,38.04175
75%,,,40.507325
max,,,43.7816


----

## Blood Pressure

In [31]:
diastolic = pd.read_csv("data/BloodPressureDiastolic.csv")
systolic = pd.read_csv("data/BloodPressureSystolic.csv")

FileNotFoundError: [Errno 2] File data/BloodPressureDiastolic.csv does not exist: 'data/BloodPressureDiastolic.csv'

In [None]:
diastolic.describe()

In [None]:
systolic.describe()
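
The `FileNotFoundError` above suggests this export contains no blood pressure records, so the parser never wrote those CSV files. One way to keep the notebook running is to load optional data types defensively. This is a sketch only: `read_health_csv` is a hypothetical helper, not part of the original code, and it assumes the CSVs live in the `data` directory.

```python
from pathlib import Path

import pandas as pd


def read_health_csv(name, data_dir="data"):
    """Load data_dir/<name>.csv as a DataFrame, or return None if the file is missing."""
    path = Path(data_dir) / f"{name}.csv"
    if not path.is_file():
        print(f"{path} not found; skipping")
        return None
    return pd.read_csv(path)


# Hypothetical usage: only describe the data when the file exists.
diastolic = read_health_csv("BloodPressureDiastolic")
if diastolic is not None:
    print(diastolic.describe())
```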

------

## Sleep

In [None]:
sleep = pd.read_csv("data/SleepAnalysis.csv")

In [None]:
sleep.tail()

In [None]:
sleep.describe()