# Apple Health Extractor

This code parses your Apple Health export data, creates one CSV file per data type, and runs some simple data checks and data analysis.

Enjoy! 

--------

## Extract Data and Export to CSVs from Apple Health's Export.xml

* Command-line tool to process Apple Health's export.xml file.
* Creates a separate CSV file for each data type.
* Original Source: https://github.com/tdda/applehealthdata
* Depending on the size of your Apple Health data, this script may take several minutes to complete.

**NOTE: There are currently a few minor errors, caused by newer data fields in Apple Health, that still require updates.**

## Setup and Usage

* Export your data from the Apple Health app on your phone.
* Unzip export.zip into this directory and rename the unzipped folder to data.
* The project directory should then contain the file data/export.xml.
* Run the parser from inside the project (for example, this notebook) or from the command line.
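
The layout described in the setup steps can be checked before the parser runs. This is a minimal sketch, assuming the unzipped folder was renamed to `data`; `find_export` is a hypothetical helper and not part of the project.

```python
from pathlib import Path


def find_export(base_dir="."):
    """Return the path to data/export.xml under base_dir, or raise a clear error."""
    export_path = Path(base_dir) / "data" / "export.xml"
    if not export_path.is_file():
        raise FileNotFoundError(
            f"Expected {export_path}; unzip export.zip here and rename the folder to 'data'."
        )
    return export_path


# Usage (run from the project directory):
# export_file = find_export()
```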

## Apple Health Data
- Converting Data from XML to CSV
- Data Cleansing
- Exploratory Analysis

## XML to CSV
The cell below converts the XML export into CSV files.

In [39]:
# %run -i 'apple-health-data-parser' 'export.xml' 
#%run -i 'apple-health-data-parser' 'data/export.xml' 
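
To give a feel for what the conversion involves, here is a small sketch that groups `<Record>` elements by type and writes one CSV text per type. It is not the parser's actual implementation: `short_type`, `records_to_csv_text`, and the prefix list are assumptions made for illustration, and the column order is simply copied from the CSV files shown in this notebook.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Column order taken from the CSV files shown in this notebook.
FIELDS = ["sourceName", "sourceVersion", "device", "type", "unit",
          "creationDate", "startDate", "endDate", "value"]
# Prefixes stripped from the type attribute (assumed; yields names like "BodyMass").
PREFIXES = ("HKQuantityTypeIdentifier", "HKCategoryTypeIdentifier")


def short_type(type_name):
    """Drop the HealthKit identifier prefix from a record type."""
    for prefix in PREFIXES:
        if type_name.startswith(prefix):
            return type_name[len(prefix):]
    return type_name


def records_to_csv_text(xml_text):
    """Group <Record> elements by type and return {type: CSV text}."""
    grouped = defaultdict(list)
    for record in ET.fromstring(xml_text).iter("Record"):
        row = {field: record.get(field, "") for field in FIELDS}
        row["type"] = short_type(row["type"])
        grouped[row["type"]].append(row)

    csv_by_type = {}
    for type_name, rows in grouped.items():
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        csv_by_type[type_name] = buffer.getvalue()
    return csv_by_type


sample = """<HealthData>
  <Record type="HKQuantityTypeIdentifierBodyMass" sourceName="Health" unit="kg"
          startDate="2022-01-07 20:37:00 +0100" value="62.7"/>
  <Record type="HKQuantityTypeIdentifierStepCount" sourceName="Laura iPhone" unit="count"
          startDate="2022-03-04 13:33:54 +0100" value="125"/>
</HealthData>"""
result = records_to_csv_text(sample)
print(sorted(result))  # ['BodyMass', 'StepCount']
```

This sketch loads the whole document into memory, which is fine for a sample but may be slow for a full export.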

-----

# Apple Health Data Check and Simple Data Analysis

In [40]:
import numpy as np
import pandas as pd
import glob

----

## Weight

In [41]:
weight = pd.read_csv("data/BodyMass.csv")

In [42]:
weight

Unnamed: 0,sourceName,sourceVersion,device,type,unit,creationDate,startDate,endDate,value
0,Health,15.1,,BodyMass,kg,2022-01-07 20:38:01 +0100,2022-01-07 20:37:00 +0100,2022-01-07 20:37:00 +0100,62.7
1,Health,15.1,,BodyMass,kg,2022-01-07 20:38:12 +0100,2022-01-07 20:38:00 +0100,2022-01-07 20:38:00 +0100,62.7
2,Health,15.1,,BodyMass,kg,2022-01-07 20:38:35 +0100,2021-06-07 23:00:00 +0100,2021-06-07 23:00:00 +0100,58.0
3,Apple Watch von Laura,7.6.1,,BodyMass,kg,2021-11-13 19:59:28 +0100,2021-11-13 19:59:28 +0100,2021-11-13 19:59:28 +0100,61.0
4,Health,10.3.1,,BodyMass,kg,2017-05-15 21:28:47 +0100,2017-05-15 21:28:47 +0100,2017-05-15 21:28:47 +0100,59.5
5,Health,15.2.1,,BodyMass,kg,2022-01-31 20:59:21 +0100,2022-01-31 20:59:00 +0100,2022-01-31 20:59:00 +0100,62.5
6,Health,15.2.1,,BodyMass,kg,2022-02-17 18:40:33 +0100,2022-02-17 18:40:00 +0100,2022-02-17 18:40:00 +0100,62.2


In [43]:
weight.describe()

Unnamed: 0,device,value
count,0.0,7.0
mean,,61.228571
std,,1.841842
min,,58.0
25%,,60.25
50%,,62.2
75%,,62.6
max,,62.7


----

## Steps

In [44]:
steps = pd.read_csv("data/StepCount.csv")

In [45]:
len(steps)

82085

In [46]:
steps.columns

Index(['sourceName', 'sourceVersion', 'device', 'type', 'unit', 'creationDate',
       'startDate', 'endDate', 'value'],
      dtype='object')

In [47]:
steps.describe()

Unnamed: 0,value
count,82085.0
mean,126.395322
std,283.919366
min,1.0
25%,11.0
50%,35.0
75%,128.0
max,21282.0


In [48]:
steps.tail()

Unnamed: 0,sourceName,sourceVersion,device,type,unit,creationDate,startDate,endDate,value
82080,Apple Watch von Laura,8.4.2,"<<HKDevice: 0x282412710>, name:Apple Watch, ma...",StepCount,count,2022-03-04 13:32:11 +0100,2022-03-04 13:28:59 +0100,2022-03-04 13:29:56 +0100,34
82081,Apple Watch von Laura,8.4.2,"<<HKDevice: 0x282412710>, name:Apple Watch, ma...",StepCount,count,2022-03-04 13:32:11 +0100,2022-03-04 13:29:58 +0100,2022-03-04 13:30:09 +0100,14
82082,Apple Watch von Laura,8.4.2,"<<HKDevice: 0x282412710>, name:Apple Watch, ma...",StepCount,count,2022-03-04 13:32:11 +0100,2022-03-04 13:31:20 +0100,2022-03-04 13:32:01 +0100,30
82083,Apple Watch von Laura,8.4.2,"<<HKDevice: 0x282412710>, name:Apple Watch, ma...",StepCount,count,2022-03-04 13:43:08 +0100,2022-03-04 13:35:35 +0100,2022-03-04 13:35:48 +0100,32
82084,Laura iPhone,15.3.1,"<<HKDevice: 0x282412530>, name:iPhone, manufac...",StepCount,count,2022-03-04 13:44:57 +0100,2022-03-04 13:33:54 +0100,2022-03-04 13:37:41 +0100,125


In [49]:
# total all-time steps
steps.value.sum()

10375160

In [50]:
#pip install pytz
import pytz

In [51]:
convert_tz = lambda x: x.to_pydatetime().replace(tzinfo=pytz.utc).astimezone(pytz.timezone('Europe/Zurich'))
get_year = lambda x: convert_tz(x).year
get_month = lambda x: '{}-{:02}'.format(convert_tz(x).year, convert_tz(x).month) #inefficient
get_date = lambda x: '{}-{:02}-{:02}'.format(convert_tz(x).year, convert_tz(x).month, convert_tz(x).day) #inefficient
get_day = lambda x: convert_tz(x).day
get_hour = lambda x: convert_tz(x).hour
get_day_of_week = lambda x: convert_tz(x).weekday()

In [52]:
steps['startDate'] = pd.to_datetime(steps['startDate'])
steps['year'] = steps['startDate'].map(get_year)
steps['month'] = steps['startDate'].map(get_month)
steps['date'] = steps['startDate'].map(get_date)
steps['day'] = steps['startDate'].map(get_day)
steps['hour'] = steps['startDate'].map(get_hour)
steps['dow'] = steps['startDate'].map(get_day_of_week)
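
One caveat about the helpers above: `replace(tzinfo=pytz.utc)` relabels a timestamp as UTC without shifting it, so a value that already carries a `+0100` offset ends up one hour off after conversion. For example, the `sleep.head()` output later in this notebook shows hour 3 for a record that starts at 01:29:52+01:00. The following is a hedged alternative sketch using pandas' vectorised `.dt` accessors; the sample strings are copied from the `startDate` columns in this notebook, and the explicit `format` string is an assumption about how the CSV stores dates.

```python
import pandas as pd

# Sample values copied from the startDate column shown in this notebook.
raw = pd.Series(["2017-04-21 01:29:52 +0100", "2022-03-04 13:33:54 +0100"])

# Parse the offset, normalise to UTC, then convert to the local zone.
local = (
    pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S %z", utc=True)
    .dt.tz_convert("Europe/Zurich")
)

parts = pd.DataFrame({
    "year": local.dt.year,
    "month": local.dt.strftime("%Y-%m"),
    "date": local.dt.strftime("%Y-%m-%d"),
    "day": local.dt.day,
    "hour": local.dt.hour,
    "dow": local.dt.dayofweek,  # Monday=0, same convention as weekday()
})
print(parts)
```

Working on the whole column at once also avoids calling the conversion several times per row, which the `#inefficient` comments above point out.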

In [53]:
steps_by_date = steps.groupby(["date"])["value"].sum().reset_index(name="Steps")

In [54]:
# number of days with step data, expressed in years
steps_by_date.count()/365

date     4.369863
Steps    4.369863
dtype: float64

-------

## Stand Count

In [55]:
stand = pd.read_csv("data/AppleStandHour.csv")

In [56]:
len(stand)

2230

In [57]:
stand.columns

Index(['sourceName', 'sourceVersion', 'device', 'type', 'unit', 'creationDate',
       'startDate', 'endDate', 'value'],
      dtype='object')

In [58]:
stand.describe()

Unnamed: 0,unit
count,0.0
mean,
std,
min,
25%,
50%,
75%,
max,


In [59]:
stand.tail()

Unnamed: 0,sourceName,sourceVersion,device,type,unit,creationDate,startDate,endDate,value
2225,Apple Watch von Laura,8.4.2,"<<HKDevice: 0x2827ccc80>, name:Apple Watch, ma...",AppleStandHour,,2022-03-04 09:24:28 +0100,2022-03-04 09:00:00 +0100,2022-03-04 10:00:00 +0100,HKCategoryValueAppleStandHourStood
2226,Apple Watch von Laura,8.4.2,"<<HKDevice: 0x2827ccc80>, name:Apple Watch, ma...",AppleStandHour,,2022-03-04 10:05:58 +0100,2022-03-04 10:00:00 +0100,2022-03-04 11:00:00 +0100,HKCategoryValueAppleStandHourStood
2227,Apple Watch von Laura,8.4.2,"<<HKDevice: 0x2827ccc80>, name:Apple Watch, ma...",AppleStandHour,,2022-03-04 11:01:23 +0100,2022-03-04 11:00:00 +0100,2022-03-04 12:00:00 +0100,HKCategoryValueAppleStandHourStood
2228,Apple Watch von Laura,8.4.2,"<<HKDevice: 0x2827ccc80>, name:Apple Watch, ma...",AppleStandHour,,2022-03-04 12:03:11 +0100,2022-03-04 12:00:00 +0100,2022-03-04 13:00:00 +0100,HKCategoryValueAppleStandHourStood
2229,Apple Watch von Laura,8.4.2,"<<HKDevice: 0x2827ccc80>, name:Apple Watch, ma...",AppleStandHour,,2022-03-04 13:24:25 +0100,2022-03-04 13:00:00 +0100,2022-03-04 14:00:00 +0100,HKCategoryValueAppleStandHourStood
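
`describe()` returns nothing useful here because `value` holds category labels rather than numbers. A minimal sketch of how the labels could be counted instead, using a tiny stand-in frame (`stand_sample` is hypothetical data modelled on the rows above, not the real file):

```python
import pandas as pd

# Tiny stand-in for data/AppleStandHour.csv, using values seen in the table above.
stand_sample = pd.DataFrame({
    "startDate": ["2022-03-04 09:00:00 +0100",
                  "2022-03-04 10:00:00 +0100",
                  "2022-03-03 11:00:00 +0100"],
    "value": ["HKCategoryValueAppleStandHourStood"] * 3,
})

# Frequency of each category label.
print(stand_sample["value"].value_counts())

# Stood hours per calendar day, taking the date from the first 10 characters.
stood = stand_sample[stand_sample["value"].str.endswith("Stood")]
stood_per_day = stood.groupby(stood["startDate"].str[:10]).size()
print(stood_per_day)
```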


------

## Resting Heart Rate (HR)

In [60]:
restingHR = pd.read_csv("data/RestingHeartRate.csv")

In [61]:
len(restingHR)

114

In [62]:
restingHR.describe()

Unnamed: 0,device,value
count,0.0,114.0
mean,,59.868421
std,,6.046301
min,,48.0
25%,,56.0
50%,,59.0
75%,,63.0
max,,82.0


---

## Walking Heart Rate (HR) Average

In [63]:
walkingHR = pd.read_csv("data/WalkingHeartRateAverage.csv")

In [64]:
len(walkingHR)

98

In [65]:
walkingHR.describe()

Unnamed: 0,device,value
count,0.0,98.0
mean,,108.132653
std,,13.385165
min,,80.0
25%,,99.625
50%,,107.0
75%,,115.875
max,,167.0


---

## Heart Rate Variability (HRV)

In [66]:
hrv = pd.read_csv("data/HeartRateVariabilitySDNN.csv")

In [67]:
len(hrv)

619

In [68]:
hrv.columns

Index(['sourceName', 'sourceVersion', 'device', 'type', 'unit', 'creationDate',
       'startDate', 'endDate', 'value'],
      dtype='object')

In [69]:
hrv.describe()

Unnamed: 0,value
count,619.0
mean,52.077868
std,25.700866
min,6.1084
25%,33.17235
50%,48.0508
75%,66.7794
max,182.977


In [70]:
hrv.tail()

Unnamed: 0,sourceName,sourceVersion,device,type,unit,creationDate,startDate,endDate,value
614,Apple Watch von Laura,8.4.2,"<<HKDevice: 0x2827f5ae0>, name:Apple Watch, ma...",HeartRateVariabilitySDNN,ms,2022-03-04 03:12:38 +0100,2022-03-04 03:11:33 +0100,2022-03-04 03:12:31 +0100,51.7397
615,Apple Watch von Laura,8.4.2,"<<HKDevice: 0x2827f5ae0>, name:Apple Watch, ma...",HeartRateVariabilitySDNN,ms,2022-03-04 05:13:16 +0100,2022-03-04 05:12:14 +0100,2022-03-04 05:13:12 +0100,35.5677
616,Apple Watch von Laura,8.4.2,"<<HKDevice: 0x2827f5ae0>, name:Apple Watch, ma...",HeartRateVariabilitySDNN,ms,2022-03-04 07:12:19 +0100,2022-03-04 07:11:17 +0100,2022-03-04 07:12:15 +0100,71.4405
617,Apple Watch von Laura,8.4.2,"<<HKDevice: 0x2827f5ae0>, name:Apple Watch, ma...",HeartRateVariabilitySDNN,ms,2022-03-04 09:12:17 +0100,2022-03-04 09:11:16 +0100,2022-03-04 09:12:13 +0100,67.2449
618,Apple Watch von Laura,8.4.2,"<<HKDevice: 0x2827f5ae0>, name:Apple Watch, ma...",HeartRateVariabilitySDNN,ms,2022-03-04 13:47:23 +0100,2022-03-04 13:46:22 +0100,2022-03-04 13:47:21 +0100,28.8736


-------

## VO2 Max

In [71]:
vo2max = pd.read_csv("data/VO2Max.csv")

In [72]:
len(vo2max)

38

In [73]:
vo2max.describe()

Unnamed: 0,device,value
count,0.0,38.0
mean,,39.610526
std,,1.631959
min,,36.25
25%,,38.4375
50%,,39.63
75%,,40.8125
max,,41.94


----

## Blood Pressure

In [74]:
diastolic = pd.read_csv("data/BloodPressureDiastolic.csv")
systolic = pd.read_csv("data/BloodPressureSystolic.csv")

FileNotFoundError: [Errno 2] No such file or directory: 'data/BloodPressureDiastolic.csv'
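
The error above suggests that this export produced no blood pressure CSV. Here is a minimal sketch of one way to let the notebook continue in that case; `read_optional_csv` is a hypothetical helper, not part of pandas or the parser.

```python
from pathlib import Path

import pandas as pd


def read_optional_csv(path):
    """Return a DataFrame, or None when the export produced no such file."""
    csv_path = Path(path)
    if not csv_path.is_file():
        print(f"Skipping {csv_path}: file not found")
        return None
    return pd.read_csv(csv_path)


diastolic = read_optional_csv("data/BloodPressureDiastolic.csv")
if diastolic is not None:
    print(diastolic.describe())
```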

In [None]:
diastolic.describe()

Unnamed: 0,device,value
count,0.0,29.0
mean,,65.586207
std,,5.0816
min,,55.0
25%,,63.0
50%,,67.0
75%,,69.0
max,,76.0


In [None]:
systolic.describe()

Unnamed: 0,device,value
count,0.0,29.0
mean,,113.206897
std,,8.973689
min,,95.0
25%,,106.0
50%,,112.0
75%,,122.0
max,,128.0


------

## Sleep

In [79]:
#read in sleep data
sleep = pd.read_csv("data/SleepAnalysis.csv")

In [80]:
sleep.tail()

Unnamed: 0,sourceName,sourceVersion,device,type,unit,creationDate,startDate,endDate,value
723,AutoSleep,6.7.40,,SleepAnalysis,,2022-03-04 09:27:42 +0100,2022-03-04 01:13:00 +0100,2022-03-04 03:44:00 +0100,HKCategoryValueSleepAnalysisAsleep
724,AutoSleep,6.7.40,,SleepAnalysis,,2022-03-04 09:27:42 +0100,2022-03-04 04:08:00 +0100,2022-03-04 06:44:00 +0100,HKCategoryValueSleepAnalysisAsleep
725,AutoSleep,6.7.40,,SleepAnalysis,,2022-03-04 09:27:42 +0100,2022-03-04 06:58:00 +0100,2022-03-04 08:17:00 +0100,HKCategoryValueSleepAnalysisAsleep
726,AutoSleep,6.7.40,,SleepAnalysis,,2022-03-04 09:27:42 +0100,2022-03-03 22:45:00 +0100,2022-03-04 09:14:00 +0100,HKCategoryValueSleepAnalysisInBed
727,AutoSleep,6.7.40,,SleepAnalysis,,2022-03-04 09:27:42 +0100,2022-03-04 08:30:00 +0100,2022-03-04 09:14:00 +0100,HKCategoryValueSleepAnalysisAsleep


In [81]:
# check for unique values
sleep["value"].unique()

array(['HKCategoryValueSleepAnalysisInBed',
       'HKCategoryValueSleepAnalysisAsleep'], dtype=object)

Unnamed: 0,unit
count,0.0
mean,
std,
min,
25%,
50%,
75%,
max,


In [82]:
# parse the date columns as datetimes

for col in ['creationDate', 'startDate', 'endDate']:
    sleep[col] = pd.to_datetime(sleep[col])


In [None]:
#sleep['type'] = sleep['type'].str.replace('HKCategoryTypeIdentifier', '')

0      SleepAnalysis
1      SleepAnalysis
2      SleepAnalysis
3      SleepAnalysis
4      SleepAnalysis
           ...      
723    SleepAnalysis
724    SleepAnalysis
725    SleepAnalysis
726    SleepAnalysis
727    SleepAnalysis
Name: type, Length: 728, dtype: object

In [83]:
sleep.head()

Unnamed: 0,sourceName,sourceVersion,device,type,unit,creationDate,startDate,endDate,value
0,Uhr,50,"<<HKDevice: 0x282413ac0>, name:iPhone, manufac...",SleepAnalysis,,2017-04-21 09:00:10+01:00,2017-04-21 01:29:52+01:00,2017-04-21 04:51:28+01:00,HKCategoryValueSleepAnalysisInBed
1,Uhr,50,"<<HKDevice: 0x282413ac0>, name:iPhone, manufac...",SleepAnalysis,,2017-04-21 09:00:10+01:00,2017-04-21 04:51:32+01:00,2017-04-21 09:00:05+01:00,HKCategoryValueSleepAnalysisInBed
2,Uhr,50,"<<HKDevice: 0x282413ac0>, name:iPhone, manufac...",SleepAnalysis,,2017-04-22 09:00:24+01:00,2017-04-22 02:33:28+01:00,2017-04-22 06:51:32+01:00,HKCategoryValueSleepAnalysisInBed
3,Uhr,50,"<<HKDevice: 0x282413ac0>, name:iPhone, manufac...",SleepAnalysis,,2017-04-22 09:00:24+01:00,2017-04-22 06:52:44+01:00,2017-04-22 07:49:56+01:00,HKCategoryValueSleepAnalysisInBed
4,Uhr,50,"<<HKDevice: 0x282413ac0>, name:iPhone, manufac...",SleepAnalysis,,2017-04-22 09:00:24+01:00,2017-04-22 07:50:56+01:00,2017-04-22 07:57:20+01:00,HKCategoryValueSleepAnalysisInBed


In [84]:
#sleep['startDate'] = pd.to_datetime(sleep['startDate'])
sleep['year'] = sleep['startDate'].map(get_year)
sleep['month'] = sleep['startDate'].map(get_month)
sleep['date'] = sleep['startDate'].map(get_date)
sleep['day'] = sleep['startDate'].map(get_day)
sleep['hour'] = sleep['startDate'].map(get_hour)
sleep['dow'] = sleep['startDate'].map(get_day_of_week)

In [85]:
sleep.head()

Unnamed: 0,sourceName,sourceVersion,device,type,unit,creationDate,startDate,endDate,value,year,month,date,day,hour,dow
0,Uhr,50,"<<HKDevice: 0x282413ac0>, name:iPhone, manufac...",SleepAnalysis,,2017-04-21 09:00:10+01:00,2017-04-21 01:29:52+01:00,2017-04-21 04:51:28+01:00,HKCategoryValueSleepAnalysisInBed,2017,2017-04,2017-04-21,21,3,4
1,Uhr,50,"<<HKDevice: 0x282413ac0>, name:iPhone, manufac...",SleepAnalysis,,2017-04-21 09:00:10+01:00,2017-04-21 04:51:32+01:00,2017-04-21 09:00:05+01:00,HKCategoryValueSleepAnalysisInBed,2017,2017-04,2017-04-21,21,6,4
2,Uhr,50,"<<HKDevice: 0x282413ac0>, name:iPhone, manufac...",SleepAnalysis,,2017-04-22 09:00:24+01:00,2017-04-22 02:33:28+01:00,2017-04-22 06:51:32+01:00,HKCategoryValueSleepAnalysisInBed,2017,2017-04,2017-04-22,22,4,5
3,Uhr,50,"<<HKDevice: 0x282413ac0>, name:iPhone, manufac...",SleepAnalysis,,2017-04-22 09:00:24+01:00,2017-04-22 06:52:44+01:00,2017-04-22 07:49:56+01:00,HKCategoryValueSleepAnalysisInBed,2017,2017-04,2017-04-22,22,8,5
4,Uhr,50,"<<HKDevice: 0x282413ac0>, name:iPhone, manufac...",SleepAnalysis,,2017-04-22 09:00:24+01:00,2017-04-22 07:50:56+01:00,2017-04-22 07:57:20+01:00,HKCategoryValueSleepAnalysisInBed,2017,2017-04,2017-04-22,22,9,5


In [86]:
sleep['time_asleep'] = sleep['endDate'] - sleep['startDate']



In [87]:
sleep.head()

Unnamed: 0,sourceName,sourceVersion,device,type,unit,creationDate,startDate,endDate,value,year,month,date,day,hour,dow,time_asleep
0,Uhr,50,"<<HKDevice: 0x282413ac0>, name:iPhone, manufac...",SleepAnalysis,,2017-04-21 09:00:10+01:00,2017-04-21 01:29:52+01:00,2017-04-21 04:51:28+01:00,HKCategoryValueSleepAnalysisInBed,2017,2017-04,2017-04-21,21,3,4,0 days 03:21:36
1,Uhr,50,"<<HKDevice: 0x282413ac0>, name:iPhone, manufac...",SleepAnalysis,,2017-04-21 09:00:10+01:00,2017-04-21 04:51:32+01:00,2017-04-21 09:00:05+01:00,HKCategoryValueSleepAnalysisInBed,2017,2017-04,2017-04-21,21,6,4,0 days 04:08:33
2,Uhr,50,"<<HKDevice: 0x282413ac0>, name:iPhone, manufac...",SleepAnalysis,,2017-04-22 09:00:24+01:00,2017-04-22 02:33:28+01:00,2017-04-22 06:51:32+01:00,HKCategoryValueSleepAnalysisInBed,2017,2017-04,2017-04-22,22,4,5,0 days 04:18:04
3,Uhr,50,"<<HKDevice: 0x282413ac0>, name:iPhone, manufac...",SleepAnalysis,,2017-04-22 09:00:24+01:00,2017-04-22 06:52:44+01:00,2017-04-22 07:49:56+01:00,HKCategoryValueSleepAnalysisInBed,2017,2017-04,2017-04-22,22,8,5,0 days 00:57:12
4,Uhr,50,"<<HKDevice: 0x282413ac0>, name:iPhone, manufac...",SleepAnalysis,,2017-04-22 09:00:24+01:00,2017-04-22 07:50:56+01:00,2017-04-22 07:57:20+01:00,HKCategoryValueSleepAnalysisInBed,2017,2017-04,2017-04-22,22,9,5,0 days 00:06:24


In [88]:

#calculates the total sleep time by date
sleep_time_by_date = sleep.groupby(["date"])["time_asleep"].sum().reset_index(name="Sleep")

In [89]:
sleep_time_by_date.head(10)

Unnamed: 0,date,Sleep
0,2017-04-21,0 days 07:30:09
1,2017-04-22,0 days 06:24:14
2,2017-04-23,0 days 08:00:02
3,2017-04-24,0 days 07:20:45
4,2017-04-25,0 days 09:00:23
5,2017-04-26,0 days 08:48:05
6,2017-04-29,0 days 07:29:19
7,2017-04-30,0 days 08:14:33
8,2017-05-01,0 days 07:50:24
9,2017-05-02,0 days 08:12:56
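
Note that `time_asleep` is computed for every row, both `InBed` and `Asleep`. Where a source logs both for the same night (as the AutoSleep rows in the `sleep.tail()` output do), the sum counts overlapping time twice. One possible refinement, sketched on a stand-in frame (`sleep_sample` is hypothetical data modelled on those rows), is to keep only the `Asleep` records before summing:

```python
import pandas as pd

# Stand-in rows modelled on the last night in the table printed by sleep.tail().
sleep_sample = pd.DataFrame({
    "date": ["2022-03-04", "2022-03-04", "2022-03-03"],
    "startDate": pd.to_datetime(["2022-03-04 01:13:00", "2022-03-04 04:08:00",
                                 "2022-03-03 22:45:00"]),
    "endDate": pd.to_datetime(["2022-03-04 03:44:00", "2022-03-04 06:44:00",
                               "2022-03-04 09:14:00"]),
    "value": ["HKCategoryValueSleepAnalysisAsleep",
              "HKCategoryValueSleepAnalysisAsleep",
              "HKCategoryValueSleepAnalysisInBed"],
})
sleep_sample["duration"] = sleep_sample["endDate"] - sleep_sample["startDate"]

# Keep only the Asleep rows so InBed spans do not inflate the nightly total.
asleep_only = sleep_sample[sleep_sample["value"] == "HKCategoryValueSleepAnalysisAsleep"]
asleep_by_date = asleep_only.groupby("date")["duration"].sum().reset_index(name="Sleep")
print(asleep_by_date)
```

Older records in this export contain only `InBed` rows, so a filter like this would drop those nights entirely. Whether that is acceptable depends on the question being asked.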


In [90]:
# calculation of the REM phase (disabled; raised the NameError below because datetime was never imported)
#sleep = sleep.groupby('creationDate').agg(time_asleep=('time_asleep', 'sum'),
 #   bed_time=('startDate', 'min'), 
  #  awake_time=('endDate', 'max'), 
   # sleep_counts=('creationDate','count'), 
   #  rem_cycles=pd.NamedAgg(column='time_asleep', aggfunc=lambda x: (x // datetime.timedelta(minutes=90)).sum()))

NameError: name 'datetime' is not defined
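
The `NameError` comes from using `datetime.timedelta` without importing `datetime`. Below is a hedged sketch of the same aggregation that uses `pd.Timedelta` instead and precomputes the 90-minute block count as its own column. `sleep_sample` and the `cycles` column are hypothetical names used for illustration. Counting whole 90-minute blocks is only a rough heuristic and does not measure REM sleep.

```python
import pandas as pd

# Stand-in rows modelled on the first two records shown by sleep.head().
sleep_sample = pd.DataFrame({
    "creationDate": pd.to_datetime(["2017-04-21 09:00:10"] * 2),
    "startDate": pd.to_datetime(["2017-04-21 01:29:52", "2017-04-21 04:51:32"]),
    "endDate": pd.to_datetime(["2017-04-21 04:51:28", "2017-04-21 09:00:05"]),
})
sleep_sample["time_asleep"] = sleep_sample["endDate"] - sleep_sample["startDate"]

# Whole 90-minute blocks per record; pd.Timedelta needs no extra import.
sleep_sample["cycles"] = sleep_sample["time_asleep"] // pd.Timedelta(minutes=90)

nightly = sleep_sample.groupby("creationDate").agg(
    time_asleep=("time_asleep", "sum"),
    bed_time=("startDate", "min"),
    awake_time=("endDate", "max"),
    sleep_counts=("startDate", "count"),  # counts rows via a non-grouping column
    rem_cycles=("cycles", "sum"),
)
print(nightly)
```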