# Explore HealthKit Data from StrivePD

This template walks you through the process of exploring and pulling Apple HealthKit data.

We utilize the Rune Stream API and Python package `runeq`.
- Full API documentation can be found here: https://docs.runelabs.io/.
- Information about the Python package (SDK) can be found here: https://runeq.readthedocs.io/en/latest/.
- Information on StrivePD can be found here: https://strive.group/.
- Information on HealthKit (HK) can be found here: https://developer.apple.com/documentation/healthkit

The HK variables that can currently be pulled with the Rune API are:

| Data Category | Fields | Unit / Values |
| --- | --- | --- |
| Activity | stepCount | counts |
| Activity | distanceWalkingRunning | length |
| Activity | vo2Max | ml/kg/min |
| Activity | lowCardioFitnessEvent | Low, Below Average, Above Average, High  |
| Activity | appleExerciseTime | time units |
| Activity | appleStandTime | time units |
| Mobility | sixMinuteWalkTestDistance | meter |
| Mobility | walkingSpeed | m/s |
| Mobility | walkingStepLength | meter |
| Mobility | walkingAsymmetryPercentage | percent |
| Mobility | walkingDoubleSupportPercentage | percent |
| Mobility | stairAscentSpeed | m/s |
| Mobility | stairDescentSpeed | m/s |
| Vitals | heartRate | count/time units |
| Vitals | restingHeartRate | count/time units |
| Vitals | heartRateVariabilitySDNN | milliseconds |
| Vitals | bloodPressure | pressure units |
| Vitals | oxygenSaturation | percent |
| Sleep | inBed, awake, asleep | 0, 1, 2 |


Additional variables may be added over time.
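
The span output later in this template reports HealthKit types as snake_case enums (for example, `distance_walking_running` for `distanceWalkingRunning`). The following is a minimal sketch of that naming pattern. It assumes the same convention holds for every field in the table, which is confirmed here only for the handful of enums that appear in the output. `field_to_enum` is a hypothetical helper, not part of `runeq`.

```python
import re


def field_to_enum(field):
    """Convert a camelCase HealthKit field name to a snake_case enum name."""
    # Insert an underscore wherever a lowercase letter or digit is
    # followed by an uppercase letter, then lowercase everything.
    return re.sub(r'(?<=[a-z0-9])(?=[A-Z])', '_', field).lower()


for field in ['stepCount', 'distanceWalkingRunning', 'appleStandTime']:
    print(field, '->', field_to_enum(field))
```

If an enum produced this way returns no data, check the API documentation for the exact name rather than relying on the pattern.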

To query HK data from the Rune platform, use the Span endpoint of the Rune API with a start time and an end time. HealthKit data can be generated by an iPhone and/or an Apple Watch, and both devices may record in parallel.


## Import the necessary packages/modules

We will use some common Python packages/modules, including pandas for dataframes. `Config` and `stream` are imported from the Rune SDK, `runeq`.

In [1]:
import os
import datetime as dt
import pandas as pd
from runeq import Config, stream


## Function for pulling "Span" data from the API

In Rune's Stream API, there are two different endpoints for pulling different types of patient-reported "event" data:

* Event: Events that occur at a single point in time
* Span:  Events that have duration, including "fuzzy" start and end times

More information on these endpoints can be found here: https://docs.runelabs.io/#tag/v1event-and-span

`get_spans` will pull HK data with the Stream API. This wrapper function can be easily modified to access the other API endpoints, or to write the pulled data directly to a file.

In [2]:
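def get_spans(client, params):
    """Makes API calls for spans, outputs dataframe"""

    accessor = client.Span(**params)

    # Collect one dataframe per page of results. (DataFrame.append was
    # removed in pandas 2.0, so the pages are combined with pd.concat.)
    pages = [pd.DataFrame(page['span']) for page in accessor.iter_json_data()]
    if not pages:
        return pd.DataFrame()

    return pd.concat(pages, ignore_index=True)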
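
As one way to write the pulled data directly to a file, the sketch below appends each page to a CSV as it arrives instead of building one large dataframe. It assumes each page is a dict with a `'span'` list, which is the shape `get_spans` reads. `write_span_pages` and the stand-in pages are hypothetical and for illustration only.

```python
import os
import tempfile

import pandas as pd


def write_span_pages(pages, filepath):
    """Append each page of spans to a CSV file; return the row count.

    The file is opened in append mode, so filepath should not already exist.
    """
    n_rows = 0
    for page in pages:
        df_page = pd.DataFrame(page['span'])
        if df_page.empty:
            continue
        # Write the header only once, with the first non-empty page.
        df_page.to_csv(filepath, mode='a', header=(n_rows == 0), index=False)
        n_rows += len(df_page)
    return n_rows


# Stand-in pages with the same shape that get_spans reads (a 'span' list).
fake_pages = [
    {'span': [{'start_time': 1.0, 'end_time': 2.0, 'event_enum': 'step_count'}]},
    {'span': [{'start_time': 3.0, 'end_time': 4.0, 'event_enum': 'step_count'}]},
]

demo_path = os.path.join(tempfile.mkdtemp(), 'spans_demo.csv')
rows_written = write_span_pages(fake_pages, demo_path)
print(rows_written, 'rows written to', demo_path)
```

With a real client, the pages would come from `client.Span(**params).iter_json_data()`, as in `get_spans`.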


## 1. Initialize API credentials

First initialize your API credentials. These credentials are analogous to having a username/password for accessing patient data. You can set up an access token for read access to all patients within your organization. See our [API doc](https://docs.runelabs.io/stream/#section/Overview/Authentication) for instructions on how to set this up.

Next, set up a .yaml file with your token ID and secret. This is a text file that will store your credentials. See our [`runeq` quickstart](https://runeq.readthedocs.io/en/latest/pages/quickstart.html#configuration) for how to set this up.

Once this .yaml file is in place, it can be used to create a client object.

In [3]:
# Set up a client.

cfg = Config()
client = stream.V1Client(cfg)


If your .yaml file is not in the default location, its path can be passed as an argument to `Config()`.

This client object can now be used to make API calls. Next, we will specify the parameters for our API calls. Check out the full API documentation for required vs. optional parameters per endpoint.


## 2. Specify endpoint parameters and retrieve data

The patient ID and device ID can be accessed in the [research portal](https://app.runelabs.io/). The patient ID is at the top left corner of the screen when viewing patient data, and a "copy device ID" button is under each data stream. Both are also available in the patient settings menu (click the "gear" icon).

An optional device ID parameter can be included to pull events from a specific device. If it is omitted, as in the `params` used in this template, all events are pulled regardless of their source.
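
The optional device filter can be sketched as a small helper that adds the device ID only when one is supplied. The key name `device_id` is an assumption based on the column of that name in the span output; check the API documentation for the exact parameter. `build_params` is a hypothetical helper.

```python
def build_params(patient_id, start_time, end_time, event, device_id=None):
    """Build a parameter dict, adding the device ID only if one is given."""
    params = {
        'patient_id': patient_id,
        'start_time': start_time,
        'end_time': end_time,
        'event': event,
    }
    if device_id is not None:
        # Assumed key name; confirm it against the API documentation.
        params['device_id'] = device_id
    return params


# Placeholder IDs and times, for illustration only.
all_devices = build_params('PATIENT_ID', 0, 1, 'device.healthkit')
one_device = build_params('PATIENT_ID', 0, 1, 'device.healthkit',
                          device_id='DEVICE_ID')
print(sorted(all_devices))
print(sorted(one_device))
```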

The [Rune research portal](https://app.runelabs.io/) is handy for finding windows of time that include the desired data. In this case, event data was examined in the research portal to find a two-day window that includes a variety of HealthKit events.


In [4]:
params = {
    'patient_id': '637c548a3c3c4e92ae46e4098df0f8d0',
    'start_time': dt.datetime(2021, 11, 14).timestamp(),
    'end_time': dt.datetime(2021, 11, 16).timestamp(),
    'event': 'device.healthkit',
}
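
Note that calling `.timestamp()` on a naive `datetime` interprets it in the local time zone of the machine running the code, so the same cell can request slightly different windows on different machines. If a fixed window is preferred, one option is to pin the times to UTC, as in this sketch. Whether UTC or the patient's local time is the right reference depends on the analysis. `utc_timestamp` is a hypothetical helper.

```python
import datetime as dt


def utc_timestamp(year, month, day):
    """Return the Unix timestamp for midnight UTC on the given date."""
    return dt.datetime(year, month, day, tzinfo=dt.timezone.utc).timestamp()


start_time = utc_timestamp(2021, 11, 14)
end_time = utc_timestamp(2021, 11, 16)

# Window length in days.
print((end_time - start_time) / 86400)
```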


## 3. Explore the Spans

We will explore the "Spans" (which have duration). To get a feel for the structure of the data, first examine the contents of the dataframe directly.


In [5]:
# Get data Spans

spans = get_spans(client, params)


In [6]:
# Display the contents of the dataframe.

spans


Unnamed: 0,start_time_min,start_time,end_time,end_time_max,device_id,id,event_namespace,event_type,event_enum,payload,display_name
0,1.636850e+09,1.636850e+09,1.636851e+09,1.636851e+09,SjsbJKZR,event-df0f8d0-276929ad46ba2d01c2a151bc8610b8c3...,device,healthkit,distance_walking_running,"{'device': 'iPhone', 'distance_m': 137.946}",device.healthkit.distance_walking_running
1,1.636850e+09,1.636850e+09,1.636851e+09,1.636851e+09,SjsbJKZR,event-df0f8d0-e970b814f19c7d56e5ca7cfde04b8a9c...,device,healthkit,step_count,"{'device': 'iPhone', 'step_count': 171}",device.healthkit.step_count
2,1.636851e+09,1.636851e+09,1.636851e+09,1.636851e+09,SjsbJKZR,event-df0f8d0-807b8c3f75b52127aa28a5a01e076b62...,device,healthkit,distance_walking_running,"{'device': 'iPhone', 'distance_m': 26.5125}",device.healthkit.distance_walking_running
3,1.636851e+09,1.636851e+09,1.636851e+09,1.636851e+09,SjsbJKZR,event-df0f8d0-9960b61ff7f2f3e25c7345885fe35c81...,device,healthkit,step_count,"{'device': 'iPhone', 'step_count': 36}",device.healthkit.step_count
4,1.636852e+09,1.636852e+09,1.636853e+09,1.636853e+09,SjsbJKZR,event-df0f8d0-6a151e5d1f7b96ec4f73990d4242ffd8...,device,healthkit,distance_walking_running,"{'device': 'iPhone', 'distance_m': 54.669}",device.healthkit.distance_walking_running
...,...,...,...,...,...,...,...,...,...,...,...
264,1.637009e+09,1.637009e+09,1.637009e+09,1.637009e+09,SjsbJKZR,event-df0f8d0-9bbf4703287f16240f586280d4ce6aad...,device,healthkit,distance_walking_running,"{'device': 'iPhone', 'distance_m': 89.5135}",device.healthkit.distance_walking_running
265,1.637010e+09,1.637010e+09,1.637010e+09,1.637010e+09,SjsbJKZR,event-df0f8d0-533cacd9cf09b4ab4938eb7534956737...,device,healthkit,distance_walking_running,"{'device': 'iPhone', 'distance_m': 163.934}",device.healthkit.distance_walking_running
266,1.637010e+09,1.637010e+09,1.637010e+09,1.637010e+09,SjsbJKZR,event-df0f8d0-7cd7ec64760055c9532dfce6730d839f...,device,healthkit,step_count,"{'device': 'iPhone', 'step_count': 169}",device.healthkit.step_count
267,1.637016e+09,1.637016e+09,1.637016e+09,1.637016e+09,SjsbJKZR,event-df0f8d0-16c2106991036ee4f4fd8fe090e9839f...,device,healthkit,distance_walking_running,"{'device': 'iPhone', 'distance_m': 14.9082}",device.healthkit.distance_walking_running
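
The time columns are easier to read as datetimes. The sketch below assumes they are Unix timestamps in seconds, which is consistent with the values passed as `start_time` and `end_time` in the request. It uses a small stand-in dataframe with illustrative values, and the same lines can be applied to `spans`.

```python
import pandas as pd

# Two rows shaped like the output above (values are illustrative).
demo = pd.DataFrame({
    'start_time': [1636850000.0, 1636851000.0],
    'end_time': [1636850600.0, 1636851090.0],
    'event_enum': ['step_count', 'step_count'],
})

# Interpret the timestamps as seconds since the Unix epoch, in UTC.
demo['start_dt'] = pd.to_datetime(demo['start_time'], unit='s', utc=True)
demo['end_dt'] = pd.to_datetime(demo['end_time'], unit='s', utc=True)

# Span duration in seconds.
demo['duration_s'] = demo['end_time'] - demo['start_time']

print(demo[['start_dt', 'end_dt', 'duration_s']])
```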


#### Find the available event types:

We expect to find 'healthkit' as the only event_type, because we used the parameter 'event': 'device.healthkit' in the API call.


In [7]:
span_event_types = spans.loc[spans['event_namespace'] == 'device'].event_type.unique()
span_event_types


array(['healthkit'], dtype=object)

#### Explore the possible event enums for event types in the device namespace:

In [8]:
# Loop through the event types and print all unique enums for each.

for event_type in span_event_types:
    print(f'* "{event_type}" event enums')
    enums = spans.loc[spans['event_type'] == event_type].event_enum.unique()
    for enum in enums:
        print(f'    * {enum}')
        

* "healthkit" event enums
    * distance_walking_running
    * step_count
    * walking_speed
    * walking_double_support_percentage
    * walking_step_length
    * walking_asymmetry_percentage
    * sleep_analysis
    * apple_stand_time
    * apple_exercise_time



The list above shows the HealthKit enums that were recorded during the chosen time window. To extract specific spans, one could filter them from the complete dataframe or pull them directly from the API.
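
For pulling a single enum directly from the API, one possibility is to narrow the `event` filter. The sketch below assumes the `event` parameter accepts the full `namespace.type.enum` string shown in the `display_name` column (for example, `device.healthkit.step_count`); check the API documentation to confirm. `narrow_to_enum` is a hypothetical helper.

```python
def narrow_to_enum(params, enum):
    """Return a copy of params whose 'event' filter targets one enum."""
    narrowed = dict(params)
    narrowed['event'] = f"{params['event']}.{enum}"
    return narrowed


# Placeholder parameters, for illustration only.
base_params = {'patient_id': 'PATIENT_ID', 'event': 'device.healthkit'}
step_params = narrow_to_enum(base_params, 'step_count')
print(step_params['event'])
```

The original dict is left unchanged, so the broader query can still be reused.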


### Save the complete dataframe to a CSV file:

In [9]:
BASE_PATH = '~/Documents/api_data/'


In [10]:
# Create specific path for this file by appending the current date/time to avoid overwriting previous data.
timestr = dt.datetime.now().strftime("%Y%m%d-%H%M%S")
save_filename = 'spans_' + timestr + '.csv'
save_filepath = os.path.join(BASE_PATH, save_filename)

# Save the dataframe to the specified file.
spans.to_csv(save_filepath, index=False)
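
`to_csv` raises an error if the target directory does not exist. A minimal sketch of a helper that expands `~`, creates the directory if needed, and builds the same timestamped filename is shown below. `timestamped_path` is a hypothetical helper, and a temporary directory stands in for `BASE_PATH`.

```python
import datetime as dt
import tempfile
from pathlib import Path


def timestamped_path(base_path, prefix):
    """Return base_path/prefix_<timestamp>.csv, creating base_path if needed."""
    base = Path(base_path).expanduser()
    base.mkdir(parents=True, exist_ok=True)
    timestr = dt.datetime.now().strftime('%Y%m%d-%H%M%S')
    return base / f'{prefix}_{timestr}.csv'


# Demonstrated on a temporary directory rather than ~/Documents/api_data/.
demo_dir = Path(tempfile.mkdtemp()) / 'api_data'
demo_filepath = timestamped_path(demo_dir, 'spans')
print(demo_filepath)
```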


### Examine the payload of a span:

Additional information that may be relevant, such as the quality or interruptions of sleep, is packed into the payload.

Examine the payload of the first row in the dataframe:


In [11]:
spans.iloc[0].payload


{'device': 'iPhone', 'distance_m': 137.946}

There seem to be several interesting variables in the payload, so it might be inconvenient to have the payload packed into one column of the CSV file. To make the data clearer, extract one type of span from the dataframe, and then expand the payload into separate columns in the dataframe:


In [12]:
# Extract 'healthkit' rows from the dataframe. Reset the index so the rows
# line up with the expanded payload columns in the concat below.
healthkit_spans = spans[spans.event_type == 'healthkit'].reset_index(drop=True)

# Expand the payload into columns.
healthkit_spans_expanded = pd.concat([healthkit_spans.drop(
    'payload', axis=1), pd.DataFrame(healthkit_spans['payload'].tolist())], axis=1)

# Examine result.
healthkit_spans_expanded

Unnamed: 0,start_time_min,start_time,end_time,end_time_max,device_id,id,event_namespace,event_type,event_enum,display_name,...,step_count,speed_mps,percentage,length_m,placement_side_index,placement_side_label,state_index,state_label,time_zone,time_s
0,1.636850e+09,1.636850e+09,1.636851e+09,1.636851e+09,SjsbJKZR,event-df0f8d0-276929ad46ba2d01c2a151bc8610b8c3...,device,healthkit,distance_walking_running,device.healthkit.distance_walking_running,...,,,,,,,,,,
1,1.636850e+09,1.636850e+09,1.636851e+09,1.636851e+09,SjsbJKZR,event-df0f8d0-e970b814f19c7d56e5ca7cfde04b8a9c...,device,healthkit,step_count,device.healthkit.step_count,...,171.0,,,,,,,,,
2,1.636851e+09,1.636851e+09,1.636851e+09,1.636851e+09,SjsbJKZR,event-df0f8d0-807b8c3f75b52127aa28a5a01e076b62...,device,healthkit,distance_walking_running,device.healthkit.distance_walking_running,...,,,,,,,,,,
3,1.636851e+09,1.636851e+09,1.636851e+09,1.636851e+09,SjsbJKZR,event-df0f8d0-9960b61ff7f2f3e25c7345885fe35c81...,device,healthkit,step_count,device.healthkit.step_count,...,36.0,,,,,,,,,
4,1.636852e+09,1.636852e+09,1.636853e+09,1.636853e+09,SjsbJKZR,event-df0f8d0-6a151e5d1f7b96ec4f73990d4242ffd8...,device,healthkit,distance_walking_running,device.healthkit.distance_walking_running,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
264,1.637009e+09,1.637009e+09,1.637009e+09,1.637009e+09,SjsbJKZR,event-df0f8d0-9bbf4703287f16240f586280d4ce6aad...,device,healthkit,distance_walking_running,device.healthkit.distance_walking_running,...,,,,,,,,,,
265,1.637010e+09,1.637010e+09,1.637010e+09,1.637010e+09,SjsbJKZR,event-df0f8d0-533cacd9cf09b4ab4938eb7534956737...,device,healthkit,distance_walking_running,device.healthkit.distance_walking_running,...,,,,,,,,,,
266,1.637010e+09,1.637010e+09,1.637010e+09,1.637010e+09,SjsbJKZR,event-df0f8d0-7cd7ec64760055c9532dfce6730d839f...,device,healthkit,step_count,device.healthkit.step_count,...,169.0,,,,,,,,,
267,1.637016e+09,1.637016e+09,1.637016e+09,1.637016e+09,SjsbJKZR,event-df0f8d0-16c2106991036ee4f4fd8fe090e9839f...,device,healthkit,distance_walking_running,device.healthkit.distance_walking_running,...,,,,,,,,,,


**Note**: The result is a bit ugly, because these spans correspond to different HK enums and have different payloads. Some filtering could be applied to reduce the rows to a single enum before expanding the payload.
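
One way to apply that filtering is sketched here: keep the rows for a single enum, then expand the payload, so that only the columns relevant to that enum appear. The stand-in dataframe uses illustrative values shaped like the output above, and `expand_enum` is a hypothetical helper.

```python
import pandas as pd

# A few rows shaped like the spans dataframe (values are illustrative).
demo_spans = pd.DataFrame({
    'start_time': [1.0, 2.0, 3.0],
    'event_enum': ['distance_walking_running', 'step_count', 'step_count'],
    'payload': [
        {'device': 'iPhone', 'distance_m': 137.946},
        {'device': 'iPhone', 'step_count': 171},
        {'device': 'iPhone', 'step_count': 36},
    ],
})


def expand_enum(spans, enum):
    """Keep rows for one enum and expand their payloads into columns."""
    # Reset the index so the rows align with the expanded payload.
    subset = spans[spans['event_enum'] == enum].reset_index(drop=True)
    payload = pd.DataFrame(subset['payload'].tolist())
    return pd.concat([subset.drop('payload', axis=1), payload], axis=1)


step_counts = expand_enum(demo_spans, 'step_count')
print(step_counts)
```

Because every remaining row shares the same payload keys, the expanded columns contain no filler `NaN` values from other enums.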


### Save the expanded version to a CSV file:

In [13]:
# Create specific path for this file by appending the current date/time to avoid overwriting previous data.
timestr = dt.datetime.now().strftime("%Y%m%d-%H%M%S")
save_filename = 'healthkit_expanded_' + timestr + '.csv'
save_filepath = os.path.join(BASE_PATH, save_filename)

# Save the dataframe to the specified file.
healthkit_spans_expanded.to_csv(save_filepath, index=False)


## 4. Summary

This template serves as a tutorial for exploring and pulling [HealthKit](https://developer.apple.com/documentation/healthkit) data from the StrivePD app. The data is pulled using the Span endpoint of the [Rune API](https://docs.runelabs.io/). The procedure above can be followed to check whether HealthKit data is available. Once the desired classification of the data (namespace, type, and enum) is known, specific HK data can be pulled from the API and manipulated or stored for further use.