# Apple iPhone Health app analytics

This notebook lets you explore and analyze the data recorded by the iPhone Health app. First, export the data from your phone:

1. On your iPhone, open the Health app.
2. Tap your profile picture in the top-right corner.
3. Scroll down and tap "Export All Health Data".
4. Confirm the export and choose where to send or save the resulting .zip file.
5. Unzip the .zip file on your computer.
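If you would rather unzip the archive from Python, the standard-library `zipfile` module is enough. This is a minimal sketch. `unzip_export` is a hypothetical helper, and it assumes the archive holds a single top-level folder (such as `apple_health_export`). The paths in the usage comment are placeholders.

```python
import zipfile
from pathlib import Path


def unzip_export(zip_path, dest_dir):
    '''Extract the Health export archive into dest_dir and return the extracted folder.

    Hypothetical helper: assumes the archive has one top-level folder
    (for example apple_health_export); otherwise dest_dir itself is returned.
    '''
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        # First path component of every member, e.g. 'apple_health_export'
        top_level = {Path(name).parts[0] for name in archive.namelist()}
    if len(top_level) == 1:
        return dest / top_level.pop()
    return dest


# Usage (placeholder paths):
# path = str(unzip_export(r'C:\path\to\export.zip', r'C:\path\to\output'))
```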

Then set the `path` variable below to the unzipped folder and run the notebook.

```python
# Path to the unzipped export folder (replace with your own)
path = r'C:\Users\Eduardo\Desktop\apple_health_export'
```

## 1. Modules

```python
import os
from datetime import datetime

import numpy as np
import pandas as pd
import seaborn as sns
import xmltodict  # third-party XML parser: pip install xmltodict
from matplotlib import pyplot as plt
```

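Set the plotting options

```python
# 'notebook' gives interactive figures in the classic Jupyter Notebook;
# use %matplotlib inline instead for static figures (e.g. in JupyterLab)
%matplotlib notebook
sns.set_theme()
plt.rcParams['figure.figsize'] = [10, 5]  # width, height in inches
plt.rcParams['figure.dpi'] = 100
```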

## 2. Functions and global variables

```python
# Translation from HealthKit identifiers to readable metric names
translate = {
    'HKQuantityTypeIdentifierStepCount': 'Step count',
    'HKQuantityTypeIdentifierFlightsClimbed': 'Flights climbed',
}


def plot_distributions(df, group, metric, order=None):
    '''
    Plots the distribution of a metric (metric) across the groups
    defined by a grouping column (group).

    Inputs
    ------
    df     -- dataframe holding the data.
    group  -- grouping column, e.g. 'weekday' or 'year'.
    metric -- metric whose distribution is plotted.
    order  -- order in which the groups are shown.

    Outputs
    -------
    None
    '''

    # Keep only data for the input metric
    df = df.loc[df['@type'] == metric].copy()

    # Violins show each group's distribution, dots the daily values
    # and the black points the group means
    fig, axes = plt.subplots()
    sns.violinplot(data=df, x=group, y='value', palette='pastel', inner=None, ax=axes, order=order)
    sns.stripplot(data=df, x=group, y='value', alpha=0.15, ax=axes, order=order)
    sns.pointplot(data=df, x=group, y='value', estimator=np.mean, color='black', errorbar=None, ax=axes, order=order)

    # Plot labelling
    axes.set_title(f'Distribution of {translate[metric]} across {group}')
    axes.set_xlabel(group.capitalize())
    axes.set_ylabel(translate[metric].capitalize())


def plot_daily(df, metric, N=100):
    '''
    Plots the daily time series of a metric (metric).

    Inputs
    ------
    df     -- dataframe holding the data.
    metric -- metric whose time series is plotted.
    N      -- label only every N-th day on the x axis.

    Outputs
    -------
    None
    '''

    # Keep only data for the input metric, sorted by date and renumbered 0..n-1
    df = df.loc[df['@type'] == metric].sort_values('date').reset_index(drop=True)

    # Plot variables: x position, daily value and date label of each day
    x = df.index
    y = df['value']
    labels = df['date']

    # Plot time series
    fig, axes = plt.subplots()
    axes.plot(x, y, color='lightcoral', alpha=0.75)

    # Plot labelling, with a tick on every N-th day with data
    axes.set_title(f'Time series of {translate[metric]}')
    axes.set_xlabel('Date')
    axes.set_ylabel(translate[metric].capitalize())
    axes.set_xticks(x[::N])
    axes.set_xticklabels(labels[::N], rotation=45)
    plt.subplots_adjust(bottom=0.25)
```
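`plot_daily` places each day at its row index and labels every `N`-th tick, so days without data do not appear as gaps on the axis. One alternative is to put the real dates on the x axis and let matplotlib choose the ticks. The sketch below assumes matplotlib 3.1 or later, which is required for `ConciseDateFormatter`. `plot_daily_dates` and its `label` argument are hypothetical names, not part of the notebook.

```python
import matplotlib.dates as mdates
import pandas as pd
from matplotlib import pyplot as plt


def plot_daily_dates(df, metric, label=None):
    '''Hypothetical variant of plot_daily with real dates on the x axis.

    Expects the columns of df_iphone: 'date', '@type' and 'value'.
    '''
    # Keep one metric and convert the datetime.date objects to datetime64
    data = df.loc[df['@type'] == metric].sort_values('date')
    dates = pd.to_datetime(data['date']).to_numpy()

    fig, axes = plt.subplots()
    axes.plot(dates, data['value'].to_numpy(), color='lightcoral', alpha=0.75)

    # Let matplotlib choose the tick dates and a compact label format
    locator = mdates.AutoDateLocator()
    axes.xaxis.set_major_locator(locator)
    axes.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))

    axes.set_title(f'Time series of {label or metric}')
    axes.set_xlabel('Date')
    axes.set_ylabel(label or metric)
    return fig, axes


# Usage with the notebook's objects:
# plot_daily_dates(df_iphone, 'HKQuantityTypeIdentifierStepCount', label='Step count')
```

With this approach, a stretch of days without samples shows up as a straight segment spanning the gap, rather than being compressed out of the axis.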
## 3. Import data

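```python
# Locate the main export file. The folder also contains other entries
# (e.g. export_cda.xml and sub-folders), so the file is picked by name;
# assumption: it is the only .xml file whose name does not contain 'cda'.
xml_file_name = next(f for f in os.listdir(path)
                     if f.lower().endswith('.xml') and 'cda' not in f.lower())
xml_file_path = os.path.join(path, xml_file_name)

# Parse the XML file; force_list keeps these elements as lists even if only one exists
with open(xml_file_path, 'rb') as file:
    dict_xml = xmltodict.parse(file, force_list=('Record', 'Workout', 'ActivitySummary'))

# Build one dataframe per element type
health_data = dict_xml['HealthData']
df_records = pd.DataFrame(health_data.get('Record', []))
df_workout = pd.DataFrame(health_data.get('Workout', []))
df_summary = pd.DataFrame(health_data.get('ActivitySummary', []))
```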
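`xmltodict.parse` builds the whole document in memory, which can become slow or memory-hungry for an export that spans several years. One possible alternative, not the approach this notebook uses, is to stream the file with the standard library's `xml.etree.ElementTree.iterparse` and keep only the `Record` elements. `read_records` and its `types` filter are hypothetical. Like `df_records`, the result keeps only records that are direct children of `<HealthData>`, and the values remain strings.

```python
import xml.etree.ElementTree as ET

import pandas as pd


def read_records(xml_path, types=None):
    '''Stream the top-level <Record> elements of a Health export into a DataFrame.

    Hypothetical helper: attribute names get an '@' prefix so the columns
    match df_records. Nested elements such as MetadataEntry are ignored.
    '''
    rows = []
    depth = 0
    context = ET.iterparse(xml_path, events=('start', 'end'))
    _, root = next(context)  # first event is the start of <HealthData>
    for event, elem in context:
        if event == 'start':
            depth += 1
            continue
        depth -= 1
        if depth == 0:  # a direct child of <HealthData> just finished
            if elem.tag == 'Record' and (types is None or elem.get('type') in types):
                rows.append({f'@{key}': value for key, value in elem.attrib.items()})
            # Drop finished children of the root so memory does not grow with file size
            root.clear()
    return pd.DataFrame(rows)


# Usage with the notebook's objects:
# df_records = read_records(xml_file_path)
```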

## 4. iPhone activity analysis
This section explores the two main activity metrics that the iPhone records in *df_records*: step count and flights climbed. The first cell preprocesses the imported records to aggregate each metric per day and adds year and weekday columns.

```python
# Keep only the desired metrics recorded by the iPhone.
# '@sourceName' is the name of the recording device, so 'iPhone' may need
# to be replaced by your phone's name.
metrics = ['HKQuantityTypeIdentifier' + x for x in ['StepCount', 'FlightsClimbed']]
is_metric = df_records['@type'].isin(metrics)
is_iphone = df_records['@sourceName'] == 'iPhone'
df_iphone = df_records.loc[is_metric & is_iphone].copy().reset_index(drop=True)

# Convert the value column from string to numeric
df_iphone['@value'] = pd.to_numeric(arg=df_iphone['@value'], downcast='integer')

# Create a date column (local calendar date of each sample) and sum each metric per day
local_dates = df_iphone['@creationDate'].apply(lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S %z').date())
df_iphone.insert(loc=1, column='date', value=local_dates)
df_iphone = df_iphone.groupby(['date', '@type'])['@value'].sum().reset_index(name='value')

# Create a year column
df_iphone.insert(loc=1, column='year', value=df_iphone['date'].apply(lambda x: x.year))

# Create a weekday column
weekdays = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
df_iphone.insert(loc=2, column='weekday', value=df_iphone['date'].apply(lambda x: weekdays[x.weekday()]))

# Show data
df_iphone
```

|      | date       | year | weekday   | @type                                  | value   |
|------|------------|------|-----------|----------------------------------------|---------|
| 0    | 2018-09-19 | 2018 | Wednesday | HKQuantityTypeIdentifierStepCount      | 2911.0  |
| 1    | 2018-09-20 | 2018 | Thursday  | HKQuantityTypeIdentifierFlightsClimbed | 2.0     |
| 2    | 2018-09-20 | 2018 | Thursday  | HKQuantityTypeIdentifierStepCount      | 1599.0  |
| 3    | 2018-09-21 | 2018 | Friday    | HKQuantityTypeIdentifierFlightsClimbed | 3.0     |
| 4    | 2018-09-21 | 2018 | Friday    | HKQuantityTypeIdentifierStepCount      | 10814.0 |
| ...  | ...        | ...  | ...       | ...                                    | ...     |
| 2999 | 2023-07-16 | 2023 | Sunday    | HKQuantityTypeIdentifierStepCount      | 2662.0  |
| 3000 | 2023-07-17 | 2023 | Monday    | HKQuantityTypeIdentifierFlightsClimbed | 3.0     |
| 3001 | 2023-07-17 | 2023 | Monday    | HKQuantityTypeIdentifierStepCount      | 9909.0  |
| 3002 | 2023-07-18 | 2023 | Tuesday   | HKQuantityTypeIdentifierFlightsClimbed | 1.0     |
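The per-day aggregation above takes the calendar date straight from `@creationDate`. Parsing with `%z` and then calling `.date()` keeps the local date written in the timestamp, not the UTC date. The toy example below uses made-up values, not data from the export. It shows the groupby-sum and what would change if the timestamps were converted to UTC first.

```python
from datetime import datetime, timezone

import pandas as pd

FMT = '%Y-%m-%d %H:%M:%S %z'  # timestamp format used in @creationDate

# Made-up samples shaped like df_records (values are illustrative only)
records = pd.DataFrame({
    '@type': ['HKQuantityTypeIdentifierStepCount'] * 3
             + ['HKQuantityTypeIdentifierFlightsClimbed'],
    '@creationDate': ['2023-07-17 08:05:00 +0200', '2023-07-17 19:40:00 +0200',
                      '2023-07-18 00:30:00 +0200', '2023-07-17 12:00:00 +0200'],
    '@value': ['1200', '800', '50', '3'],
})
records['@value'] = pd.to_numeric(records['@value'])

# .date() on the parsed timestamp keeps the local calendar date written in the string
records['date'] = records['@creationDate'].apply(lambda s: datetime.strptime(s, FMT).date())

# One row per (day, metric) with the summed value
daily = records.groupby(['date', '@type'])['@value'].sum().reset_index(name='value')
print(daily)

# Converting to UTC first would move the 00:30 sample back to the previous day
late_sample = datetime.strptime('2023-07-18 00:30:00 +0200', FMT)
print(late_sample.date(), late_sample.astimezone(timezone.utc).date())
```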


### 4.1. Steps

Daily step count

```python
plot_daily(df=df_iphone, metric='HKQuantityTypeIdentifierStepCount')
```


Step distribution per weekday

```python
plot_distributions(df=df_iphone, group='weekday', metric='HKQuantityTypeIdentifierStepCount', order=weekdays)
```
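The black points that `plot_distributions` draws with `sns.pointplot` are group means. A small table can help you read the numbers behind the plot. Below is a minimal sketch with a hypothetical `summarize_metric` helper that expects the columns of `df_iphone`. The `count` column is worth checking for the year grouping. The data shown above starts on 2018-09-19 and ends on 2023-07-18, so the first and last years are only partially covered.

```python
import pandas as pd


def summarize_metric(df, group, metric, order=None):
    '''Mean, median and number of days of one metric per group.

    Hypothetical helper; expects the columns '@type', 'value' and the group column.
    '''
    data = df.loc[df['@type'] == metric]
    summary = data.groupby(group)['value'].agg(['mean', 'median', 'count'])
    # Reorder the groups (e.g. Monday..Sunday) when an explicit order is given
    if order is not None:
        summary = summary.reindex(order)
    return summary


# Usage with the notebook's objects:
# summarize_metric(df_iphone, 'weekday', 'HKQuantityTypeIdentifierStepCount', order=weekdays)
# summarize_metric(df_iphone, 'year', 'HKQuantityTypeIdentifierStepCount')
```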


Step distribution per year

```python
plot_distributions(df=df_iphone, group='year', metric='HKQuantityTypeIdentifierStepCount')
```


### 4.2. Flights climbed

Daily flights climbed

```python
plot_daily(df=df_iphone, metric='HKQuantityTypeIdentifierFlightsClimbed')
```
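`df_iphone` only has a row for a day when at least one sample of that metric was recorded. In the table above, for example, 2018-09-19 has a step-count row but no flights-climbed row. Means computed from it therefore describe days with recorded flights only. If a missing day is taken to mean zero, the series can be reindexed to every calendar day. This is an assumption: a missing day may instead mean the phone was not carried. `fill_missing_days` is a hypothetical helper sketching this.

```python
import pandas as pd


def fill_missing_days(df, metric, fill_value=0):
    '''Return one row per calendar day for a metric, filling absent days.

    Hypothetical helper. Assumes a day without samples means fill_value,
    which is not true if the phone was simply not carried that day.
    '''
    data = df.loc[df['@type'] == metric, ['date', 'value']].copy()
    data['date'] = pd.to_datetime(data['date'])
    # Every calendar day between the first and last recorded day
    full_range = pd.date_range(data['date'].min(), data['date'].max(), freq='D')
    filled = data.set_index('date')['value'].reindex(full_range, fill_value=fill_value)
    return filled.rename_axis('date').reset_index(name='value')


# Usage with the notebook's objects:
# flights = fill_missing_days(df_iphone, 'HKQuantityTypeIdentifierFlightsClimbed')
```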


Flights climbed distribution per weekday

```python
plot_distributions(df=df_iphone, group='weekday', metric='HKQuantityTypeIdentifierFlightsClimbed', order=weekdays)
```


Flights climbed distribution per year

```python
plot_distributions(df=df_iphone, group='year', metric='HKQuantityTypeIdentifierFlightsClimbed')
```
