# Apple Health Extractor

This code parses your Apple Health export data, creates multiple CSV files, and runs some simple data checks and analysis.


--------

## Extract Data and Export to CSVs from Apple Health's Export.xml

* Command-line tool to process Apple Health's `export.xml` file.
* Creates a separate CSV file for each data type.
* Original source: https://github.com/tdda/applehealthdata
* Depending on the size of your Apple Health data, the script may take several minutes to complete.

**NOTE: There are currently a few minor errors caused by additional data in Apple Health exports. The script still needs some updates to handle them.**
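
The extraction described above can be sketched with Python's standard library alone. This is not the original script from tdda/applehealthdata. It is a minimal sketch that rests on some assumptions:

* It streams `<Record>` elements with `xml.etree.ElementTree.iterparse`.
* It strips the `HKQuantityTypeIdentifier`, `HKCategoryTypeIdentifier` and `HKDataType` prefixes. The choice of prefixes is an assumption.
* It writes one CSV per type, using the same column order that `steps.columns` shows further down.

The helper names `short_type` and `export_records` are hypothetical.

```python
import csv
import xml.etree.ElementTree as ET
from collections import Counter
from contextlib import ExitStack
from pathlib import Path

# Assumed identifier prefixes, e.g. HKQuantityTypeIdentifierStepCount -> StepCount.
PREFIXES = ("HKQuantityTypeIdentifier", "HKCategoryTypeIdentifier", "HKDataType")
# Same column order as the CSV files read later in this notebook.
FIELDS = ["sourceName", "sourceVersion", "device", "type", "unit",
          "creationDate", "startDate", "endDate", "value"]


def short_type(type_id):
    """Strip a known identifier prefix from a record type (hypothetical helper)."""
    for prefix in PREFIXES:
        if type_id.startswith(prefix):
            return type_id[len(prefix):]
    return type_id


def export_records(xml_path, out_dir):
    """Stream <Record> elements and write one CSV per type; return row counts per type."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    counts = Counter()
    writers = {}
    with ExitStack() as stack:  # closes every CSV file when the loop finishes
        for _event, elem in ET.iterparse(xml_path, events=("end",)):
            if elem.tag != "Record":
                continue
            rtype = short_type(elem.get("type", ""))
            if rtype not in writers:
                f = stack.enter_context(
                    open(out_dir / f"{rtype}.csv", "w", newline="", encoding="utf-8"))
                writers[rtype] = csv.DictWriter(f, fieldnames=FIELDS, extrasaction="ignore")
                writers[rtype].writeheader()
            row = dict(elem.attrib)
            row["type"] = rtype
            writers[rtype].writerow(row)  # missing attributes are written as ""
            counts[rtype] += 1
            elem.clear()  # release the record's children and attributes
    return dict(counts)
```

Usage would be `export_records("data/export.xml", "data")`. Files are opened lazily, so only record types that actually occur produce a CSV.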

## Setup and Usage Notes

* Export your data from the Apple Health app on your phone.
* Unzip `export.zip` into this directory and rename the extracted folder to `data`.
* The export file should then be located at `data/export.xml`.
* Run the parser inside the project (for example with `%run` in Jupyter) or from the command line.
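
If you prefer not to unzip and rename by hand, a hypothetical helper like `extract_export` copies only `export.xml` out of the archive into `data/`. It matches the archive member by file name, because the folder name inside `export.zip` is treated as unknown here. It does not extract the other files in the archive.

```python
import shutil
import zipfile
from pathlib import Path


def extract_export(zip_path, data_dir="data"):
    """Copy the export.xml member of an Apple Health export.zip to data_dir/export.xml."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        # Match on the file name only; the enclosing folder name may vary.
        members = [n for n in zf.namelist() if Path(n).name == "export.xml"]
        if not members:
            raise FileNotFoundError("no export.xml found in " + str(zip_path))
        target = data_dir / "export.xml"
        with zf.open(members[0]) as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)  # stream the copy instead of loading it in memory
    return target
```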

```python
# %run -i 'apple-health-data-parser' 'export.xml'
%run -i "apple-health-data-parser" "data/export.xml"
```

```text
Reading data from data/export.xml . . . done
Unexpected node of type ExportDate.

Tags:
ActivitySummary: 553
ExportDate: 1
Me: 1
Record: 989147
Workout: 1358

Fields:
HKCharacteristicTypeIdentifierBiologicalSex: 1
HKCharacteristicTypeIdentifierBloodType: 1
HKCharacteristicTypeIdentifierCardioFitnessMedicationsUse: 1
HKCharacteristicTypeIdentifierDateOfBirth: 1
HKCharacteristicTypeIdentifierFitzpatrickSkinType: 1
activeEnergyBurned: 553
activeEnergyBurnedGoal: 553
activeEnergyBurnedUnit: 553
appleExerciseTime: 553
appleExerciseTimeGoal: 553
appleMoveTime: 553
appleMoveTimeGoal: 553
appleStandHours: 553
appleStandHoursGoal: 553
creationDate: 990505
dateComponents: 553
device: 967905
duration: 1358
durationUnit: 1358
endDate: 990505
sourceName: 990505
sourceVersion: 980027
startDate: 990505
totalDistance: 1358
totalDistanceUnit: 1358
totalEnergyBurned: 1358
totalEnergyBurnedUnit: 1358
type: 989147
unit: 982920
value: 989119
workoutActivityType: 1358

Record types:
ActiveEnergyBurned: 3824
```
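
The Tags and Fields summary above counts each element directly under the root, plus each attribute name on those elements. The counts are consistent with that reading, for example `Me: 1` together with the `HKCharacteristicTypeIdentifier...` fields. Here is a minimal sketch of that tally, assuming only direct children of the root are counted. `summarize_export` is a hypothetical name, not the parser's API. Depth tracking keeps nested elements such as `MetadataEntry` out of the counts.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_export(xml_path):
    """Count direct children of the root by tag, and attribute names on those children."""
    tags, fields = Counter(), Counter()
    depth = 0
    for event, elem in ET.iterparse(xml_path, events=("start", "end")):
        if event == "start":
            depth += 1
            continue
        depth -= 1
        if depth == 1:  # a direct child of the root element has just closed
            tags[elem.tag] += 1
            fields.update(elem.attrib.keys())
            elem.clear()  # free memory for large exports
    return tags, fields
```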

-----

# Apple Health Data Check and Simple Data Analysis

```python
import numpy as np
import pandas as pd
import glob
```
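
`glob` is imported but not used in the cells shown. One possible use, sketched here as a hypothetical helper, is to load every CSV the parser produced into a dictionary keyed by data type. It assumes the CSV files sit directly inside `data/`.

```python
import glob
import os

import pandas as pd


def load_all_csvs(data_dir="data"):
    """Read every CSV in data_dir into a dict keyed by file name without extension."""
    frames = {}
    for path in sorted(glob.glob(os.path.join(data_dir, "*.csv"))):
        name = os.path.splitext(os.path.basename(path))[0]  # e.g. "StepCount"
        frames[name] = pd.read_csv(path)
    return frames
```

With this helper, `load_all_csvs()["StepCount"]` returns the same frame as the `steps` cell below.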

----

## Weight

```python
weight = pd.read_csv("data/BodyMass.csv")
```

```python
weight.tail()
```

| | sourceName | sourceVersion | device | type | unit | creationDate | startDate | endDate | value |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Health | 13.1.3 | | BodyMass | kg | 2019-10-19 02:50:14 +0530 | 2019-10-19 02:50:14 +0530 | 2019-10-19 02:50:14 +0530 | 75 |


```python
weight.describe()
```

| | device | value |
|---|---|---|
| count | 0.0 | 1.0 |
| mean | | 75.0 |
| std | | |
| min | | 75.0 |
| 25% | | 75.0 |
| 50% | | 75.0 |
| 75% | | 75.0 |
| max | | 75.0 |
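
Timestamps are stored as strings such as `2019-10-19 02:50:14 +0530`. Here is a sketch for parsing them, assuming every value follows that exact format. `utc=True` converts all rows to UTC, so records with different offsets (for example after travel) can share one datetime column. `parse_health_dates` is a hypothetical helper.

```python
import pandas as pd


def parse_health_dates(df, cols=("creationDate", "startDate", "endDate")):
    """Return a copy of df with Apple Health timestamp strings parsed as UTC datetimes."""
    out = df.copy()
    for col in cols:
        # "%z" reads offsets like "+0530"; utc=True normalises mixed offsets.
        out[col] = pd.to_datetime(out[col], format="%Y-%m-%d %H:%M:%S %z", utc=True)
    return out
```

For example, `weight = parse_health_dates(weight)` turns the single BodyMass record's dates into timezone-aware timestamps.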


----

## Steps

```python
steps = pd.read_csv("data/StepCount.csv")
```

```python
len(steps)
```

41097

```python
steps.columns
```

```text
Index(['sourceName', 'sourceVersion', 'device', 'type', 'unit', 'creationDate',
       'startDate', 'endDate', 'value'],
      dtype='object')
```

```python
steps.describe()
```

| | value |
|---|---|
| count | 41097.0 |
| mean | 78.535441 |
| std | 147.717801 |
| min | 1.0 |
| 25% | 12.0 |
| 50% | 27.0 |
| 75% | 74.0 |
| max | 1219.0 |


```python
steps.tail()
```

| | sourceName | sourceVersion | device | type | unit | creationDate | startDate | endDate | value |
|---|---|---|---|---|---|---|---|---|---|
| 41092 | Seshathiri’s Apple Watch | 7.4.1 | `<<HKDevice: 0x283877bb0>, name:Apple Watch, ma...` | StepCount | count | 2021-05-23 20:06:53 +0530 | 2021-05-23 20:02:59 +0530 | 2021-05-23 20:03:57 +0530 | 106 |
| 41093 | Seshathiri’s Apple Watch | 7.4.1 | `<<HKDevice: 0x283877bb0>, name:Apple Watch, ma...` | StepCount | count | 2021-05-23 20:06:53 +0530 | 2021-05-23 20:06:35 +0530 | 2021-05-23 20:06:51 +0530 | 33 |
| 41094 | Seshathiri’s Apple Watch | 7.4.1 | `<<HKDevice: 0x283877bb0>, name:Apple Watch, ma...` | StepCount | count | 2021-05-23 20:16:58 +0530 | 2021-05-23 20:06:51 +0530 | 2021-05-23 20:16:25 +0530 | 185 |
| 41095 | Seshathiri’s Apple Watch | 7.4.1 | `<<HKDevice: 0x283877bb0>, name:Apple Watch, ma...` | StepCount | count | 2021-05-23 20:32:37 +0530 | 2021-05-23 20:17:00 +0530 | 2021-05-23 20:25:58 +0530 | 108 |
| 41096 | Seshathiri’s Apple Watch | 7.4.1 | `<<HKDevice: 0x283877bb0>, name:Apple Watch, ma...` | StepCount | count | 2021-05-23 20:47:01 +0530 | 2021-05-23 20:27:33 +0530 | 2021-05-23 20:36:56 +0530 | 75 |


```python
# total all-time steps
steps.value.sum()
```

3227571
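
`steps.value.sum()` adds every record regardless of source. If both an iPhone and an Apple Watch logged steps over the same period, that total may double-count them. The Health app's own totals apply a source priority to avoid this. A sketch that splits the counts by local day and by `sourceName` makes any overlap visible. It assumes the first ten characters of `startDate` give the local date. `daily_steps_by_source` is a hypothetical helper.

```python
import pandas as pd


def daily_steps_by_source(steps):
    """Sum step counts per local calendar day (rows) and per recording source (columns)."""
    # "2021-05-23 20:02:59 +0530"[:10] -> "2021-05-23", the date in the recorded offset.
    day = steps["startDate"].str[:10].rename("date")
    return steps.groupby([day, "sourceName"])["value"].sum().unstack(fill_value=0)
```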

-------

## Stand Count

```python
stand = pd.read_csv("data/AppleStandHour.csv")
```

```python
len(stand)
```

6040

```python
stand.columns
```

```text
Index(['sourceName', 'sourceVersion', 'device', 'type', 'unit', 'creationDate',
       'startDate', 'endDate', 'value'],
      dtype='object')
```

```python
stand.describe()
```

| | unit |
|---|---|
| count | 0.0 |
| mean | |
| std | |
| min | |
| 25% | |
| 50% | |
| 75% | |
| max | |


```python
stand.tail()
```

| | sourceName | sourceVersion | device | type | unit | creationDate | startDate | endDate | value |
|---|---|---|---|---|---|---|---|---|---|
| 6035 | Seshathiri’s Apple Watch | 7.4.1 | `<<HKDevice: 0x283898d70>, name:Apple Watch, ma...` | AppleStandHour | | 2021-05-23 16:41:01 +0530 | 2021-05-23 16:00:00 +0530 | 2021-05-23 17:00:00 +0530 | HKCategoryValueAppleStandHourStood |
| 6036 | Seshathiri’s Apple Watch | 7.4.1 | `<<HKDevice: 0x283898d70>, name:Apple Watch, ma...` | AppleStandHour | | 2021-05-23 17:12:26 +0530 | 2021-05-23 17:00:00 +0530 | 2021-05-23 18:00:00 +0530 | HKCategoryValueAppleStandHourStood |
| 6037 | Seshathiri’s Apple Watch | 7.4.1 | `<<HKDevice: 0x283898d70>, name:Apple Watch, ma...` | AppleStandHour | | 2021-05-23 18:01:37 +0530 | 2021-05-23 18:00:00 +0530 | 2021-05-23 19:00:00 +0530 | HKCategoryValueAppleStandHourStood |
| 6038 | Seshathiri’s Apple Watch | 7.4.1 | `<<HKDevice: 0x283898d70>, name:Apple Watch, ma...` | AppleStandHour | | 2021-05-23 19:07:36 +0530 | 2021-05-23 19:00:00 +0530 | 2021-05-23 20:00:00 +0530 | HKCategoryValueAppleStandHourStood |
| 6039 | Seshathiri’s Apple Watch | 7.4.1 | `<<HKDevice: 0x283898d70>, name:Apple Watch, ma...` | AppleStandHour | | 2021-05-23 20:02:54 +0530 | 2021-05-23 20:00:00 +0530 | 2021-05-23 21:00:00 +0530 | HKCategoryValueAppleStandHourStood |
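
`describe()` summarises numeric columns only, and the only numeric column here is the empty `unit` column. The `value` column holds category strings such as `HKCategoryValueAppleStandHourStood`, so `stand["value"].value_counts()` is a quicker check of the categories. A sketch that counts stood hours per local day follows. It assumes the first ten characters of `startDate` give the local date. `stood_hours_per_day` is a hypothetical helper.

```python
import pandas as pd

STOOD = "HKCategoryValueAppleStandHourStood"


def stood_hours_per_day(stand):
    """Count hourly records marked as 'Stood' for each local calendar day."""
    day = stand["startDate"].str[:10].rename("date")
    # Boolean mask summed per day gives the number of stood hours.
    return (stand["value"] == STOOD).groupby(day).sum()
```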


------

## Resting Heart Rate (HR)

```python
restingHR = pd.read_csv("data/RestingHeartRate.csv")
```

```python
len(restingHR)
```

436

```python
restingHR.describe()
```

| | device | value |
|---|---|---|
| count | 0.0 | 436.0 |
| mean | | 72.307339 |
| std | | 8.711838 |
| min | | 55.0 |
| 25% | | 66.0 |
| 50% | | 72.0 |
| 75% | | 78.0 |
| max | | 100.0 |
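
A single mean hides changes over time. Here is a minimal sketch of a weekly average, assuming `startDate` marks when each reading applies. The week boundaries are pandas' default `"W"` frequency, which ends weeks on Sunday. `weekly_mean` is a hypothetical helper.

```python
import pandas as pd


def weekly_mean(df):
    """Average the value column per calendar week, using startDate parsed to UTC."""
    ts = pd.to_datetime(df["startDate"], format="%Y-%m-%d %H:%M:%S %z", utc=True)
    # Index by timestamp so resample can bin the readings into weeks.
    return df.set_index(ts)["value"].resample("W").mean()
```

`weekly_mean(restingHR)` would give one value per week. Weeks without readings appear as `NaN`.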


---

## Walking Heart Rate (HR) Average

```python
walkingHR = pd.read_csv("data/WalkingHeartRateAverage.csv")
```

```python
len(walkingHR)
```

438

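The `describe()` output below counts 185 values, although `len(walkingHR)` returns 438. It therefore appears to come from an earlier run of the notebook.

```python
walkingHR.describe()
```

| | sourceVersion | device | value |
|---|---|---|---|
| count | 185.0 | 0.0 | 185.0 |
| mean | 4.092432 | | 80.927027 |
| std | 0.072584 | | 12.104564 |
| min | 4.0 | | 60.0 |
| 25% | 4.1 | | 73.0 |
| 50% | 4.1 | | 78.0 |
| 75% | 4.1 | | 86.0 |
| max | 4.3 | | 135.0 |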
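
Walking and resting heart rate can be compared day by day. Here is a sketch that averages each metric per local date, keeps only the dates present in both, and takes the difference. It assumes the first ten characters of `startDate` give the local date. `walking_minus_resting` is a hypothetical helper.

```python
import pandas as pd


def walking_minus_resting(walking, resting):
    """Join daily walking-average and resting heart rate on local date; add their difference."""
    def daily(df, name):
        day = df["startDate"].str[:10].rename("date")
        return df.groupby(day)["value"].mean().rename(name)

    # join="inner" keeps only dates with both measurements.
    both = pd.concat([daily(walking, "walking"), daily(resting, "resting")],
                     axis=1, join="inner")
    both["difference"] = both["walking"] - both["resting"]
    return both
```

Usage: `walking_minus_resting(walkingHR, restingHR)`.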


---

## Heart Rate Variability (HRV)

```python
hrv = pd.read_csv("data/HeartRateVariabilitySDNN.csv")
```

```python
len(hrv)
```

1163

```python
hrv.columns
```

```text
Index(['sourceName', 'sourceVersion', 'device', 'type', 'unit', 'creationDate',
       'startDate', 'endDate', 'value'],
      dtype='object')
```

```python
hrv.describe()
```

| | value |
|---|---|
| count | 1163.0 |
| mean | 44.842997 |
| std | 22.81742 |
| min | 6.72913 |
| 25% | 28.7326 |
| 50% | 40.7132 |
| 75% | 56.2772 |
| max | 157.426 |


```python
hrv.tail()
```

| | sourceName | sourceVersion | device | type | unit | creationDate | startDate | endDate | value |
|---|---|---|---|---|---|---|---|---|---|
| 1158 | Seshathiri’s Apple Watch | 7.4 | `<<HKDevice: 0x2838e7700>, name:Apple Watch, ma...` | HeartRateVariabilitySDNN | ms | 2021-05-08 20:42:53 +0530 | 2021-05-08 20:41:52 +0530 | 2021-05-08 20:42:53 +0530 | 12.7671 |
| 1159 | Seshathiri’s Apple Watch | 7.4.1 | `<<HKDevice: 0x2838e76b0>, name:Apple Watch, ma...` | HeartRateVariabilitySDNN | ms | 2021-05-23 12:34:42 +0530 | 2021-05-23 12:33:36 +0530 | 2021-05-23 12:34:42 +0530 | 19.6715 |
| 1160 | Seshathiri’s Apple Watch | 7.4.1 | `<<HKDevice: 0x2838e76b0>, name:Apple Watch, ma...` | HeartRateVariabilitySDNN | ms | 2021-05-23 14:16:20 +0530 | 2021-05-23 14:15:17 +0530 | 2021-05-23 14:16:20 +0530 | 69.9346 |
| 1161 | Seshathiri’s Apple Watch | 7.4.1 | `<<HKDevice: 0x2838e76b0>, name:Apple Watch, ma...` | HeartRateVariabilitySDNN | ms | 2021-05-23 16:22:34 +0530 | 2021-05-23 16:21:29 +0530 | 2021-05-23 16:22:34 +0530 | 45.2034 |
| 1162 | Seshathiri’s Apple Watch | 7.4.1 | `<<HKDevice: 0x2838e76b0>, name:Apple Watch, ma...` | HeartRateVariabilitySDNN | ms | 2021-05-23 20:24:05 +0530 | 2021-05-23 20:23:00 +0530 | 2021-05-23 20:24:05 +0530 | 27.0578 |
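
The HRV mean (44.8 ms) sits above the median (40.7 ms), and the maximum reaches 157.4 ms. This suggests a right-skewed distribution, where a monthly median is less sensitive to occasional high readings than a mean. Here is a sketch, assuming the first seven characters of `startDate` give the local year and month. `monthly_median` is a hypothetical helper.

```python
import pandas as pd


def monthly_median(df):
    """Median of the value column per local calendar month ("YYYY-MM")."""
    month = df["startDate"].str[:7].rename("month")
    return df.groupby(month)["value"].median()
```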


-------

## VO2 Max

```python
vo2max = pd.read_csv("data/VO2Max.csv")
```

```python
len(vo2max)
```

35

```python
vo2max.describe()
```

| | sourceVersion | device | value |
|---|---|---|---|
| count | 0.0 | 0.0 | 35.0 |
| mean | | | 31.901463 |
| std | | | 1.374384 |
| min | | | 31.1791 |
| 25% | | | 31.2954 |
| 50% | | | 31.2954 |
| 75% | | | 31.2954 |
| max | | | 35.1375 |
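
The 25th, 50th and 75th percentiles of VO2 max are all 31.2954, so many of the 35 rows repeat the same estimate. Here is a sketch that sorts by parsed `startDate` and keeps only the rows where the value changes, so the distinct estimates are easier to see. `value_changes` is a hypothetical helper.

```python
import pandas as pd

FMT = "%Y-%m-%d %H:%M:%S %z"


def value_changes(df):
    """Sort by startDate and keep rows whose value differs from the previous row."""
    ordered = df.sort_values(
        "startDate", key=lambda s: pd.to_datetime(s, format=FMT, utc=True))
    # The first row always counts as a change because shift() yields NaN there.
    changed = ordered["value"].ne(ordered["value"].shift())
    return ordered.loc[changed, ["startDate", "value"]]
```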


------

## Sleep

```python
sleep = pd.read_csv("data/SleepAnalysis.csv")
```

```python
sleep.tail()
```

| | sourceName | sourceVersion | device | type | unit | creationDate | startDate | endDate | value |
|---|---|---|---|---|---|---|---|---|---|
| 77 | Seshathiri’s Apple Watch | 7.1 | | SleepAnalysis | | 2020-12-03 12:55:28 +0530 | 2020-12-03 08:12:09 +0530 | 2020-12-03 08:47:09 +0530 | HKCategoryValueSleepAnalysisAsleep |
| 78 | Seshathiri’s Apple Watch | 7.1 | | SleepAnalysis | | 2020-12-03 12:55:28 +0530 | 2020-12-03 08:50:39 +0530 | 2020-12-03 11:56:09 +0530 | HKCategoryValueSleepAnalysisAsleep |
| 79 | Seshathiri’s Apple Watch | 7.1 | | SleepAnalysis | | 2020-12-03 12:55:28 +0530 | 2020-12-03 11:58:39 +0530 | 2020-12-03 12:39:39 +0530 | HKCategoryValueSleepAnalysisAsleep |
| 80 | Seshathiri’s iPhone | 14.2 | | SleepAnalysis | | 2020-12-03 12:40:00 +0530 | 2020-12-03 06:40:12 +0530 | 2020-12-03 12:40:00 +0530 | HKCategoryValueSleepAnalysisInBed |
| 81 | Seshathiri’s iPhone | 14.2 | | SleepAnalysis | | 2020-12-04 11:30:00 +0530 | 2020-12-04 02:09:07 +0530 | 2020-12-04 11:30:00 +0530 | HKCategoryValueSleepAnalysisInBed |


```python
sleep.describe()
```

| | unit |
|---|---|
| count | 0.0 |
| mean | |
| std | |
| min | |
| 25% | |
| 50% | |
| 75% | |
| max | |
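
As with stand hours, `describe()` only covers the empty `unit` column, because `value` holds category strings. The tail also shows overlapping records from two sources. The Apple Watch logs `Asleep` segments, and the iPhone logs a single `InBed` interval.

Here is a minimal sketch that totals hours per category for each local date. It attributes each night to the date in `endDate`, which is an assumption. `sleep_hours` is a hypothetical helper.

```python
import pandas as pd

FMT = "%Y-%m-%d %H:%M:%S %z"


def sleep_hours(sleep):
    """Total hours per local end date (rows) and sleep category (columns)."""
    start = pd.to_datetime(sleep["startDate"], format=FMT, utc=True)
    end = pd.to_datetime(sleep["endDate"], format=FMT, utc=True)
    hours = (end - start).dt.total_seconds() / 3600
    night = sleep["endDate"].str[:10].rename("date")
    # "HKCategoryValueSleepAnalysisAsleep" -> "Asleep", "...InBed" -> "InBed"
    category = sleep["value"].str.replace("HKCategoryValueSleepAnalysis", "", regex=False)
    return hours.groupby([night, category]).sum().unstack(fill_value=0)
```

Keeping the categories in separate columns avoids adding the iPhone's `InBed` time to the Watch's `Asleep` time.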
