# Explore Patient-Reported Data from StrivePD

This demo walks through the process of exploring and pulling patient-reported data (e.g., symptoms, medications, activity, food/drink) from the Rune platform API. Data recorded via Rune's StrivePD app is used as an example, but the included methods will work for all data available from the Event and Span endpoints.

We utilize the Rune Stream API and Python package `runeq`. Full API documentation can be found here: https://docs.runelabs.io/. Information about the Python package (SDK) can be found here: https://runeq.readthedocs.io/en/latest/.

Information on StrivePD can be found here: https://strive.group/.

## Import the necessary packages/modules

We will use some common Python packages/modules, including pandas dataframes. `Config` and `stream` are imported from the Rune SDK, `runeq`.

In [1]:
import os
import datetime as dt
import pandas as pd
from runeq import Config, stream

## Functions for pulling "Event" and "Span" data from the API

In Rune's Stream API, there are two different endpoints for pulling different types of patient-reported "event" data:

* Event: Events that occur at a single point in time
* Span:  Events that have duration, including "fuzzy" start and end times

More information on these endpoints can be found here: https://docs.runelabs.io/#tag/v1event-and-span

`get_events` and `get_spans` will pull data with the Stream API. These wrapper functions can be easily modified to access the other API endpoints, or to directly write the pulled data to a file.

In [2]:
def get_events(client, params):
    """Makes API calls for events, outputs dataframe"""

    accessor = client.Event(**params)

    # Build one dataframe per page, then concatenate once.
    # (DataFrame.append was removed in pandas 2.0.)
    pages = [pd.DataFrame(page['event']) for page in accessor.iter_json_data()]

    return pd.concat(pages, ignore_index=True) if pages else pd.DataFrame()


def get_spans(client, params):
    """Makes API calls for spans, outputs dataframe"""

    accessor = client.Span(**params)

    # Build one dataframe per page, then concatenate once.
    # (DataFrame.append was removed in pandas 2.0.)
    pages = [pd.DataFrame(page['span']) for page in accessor.iter_json_data()]

    return pd.concat(pages, ignore_index=True) if pages else pd.DataFrame()
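
The page-accumulation step in these wrappers can be tried offline. The following is a minimal sketch, not part of the Rune SDK: `pages_to_dataframe` and `fake_pages` are hypothetical names, and the page contents are made up to resemble the JSON that each page is assumed to carry. It uses `pd.concat`, since `DataFrame.append` was removed in pandas 2.0.

```python
import pandas as pd


def pages_to_dataframe(pages, key):
    """Collect the records stored under `key` in each page into one dataframe."""
    frames = [pd.DataFrame(page[key]) for page in pages]
    if not frames:
        # No pages at all: return an empty dataframe instead of raising.
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# Hypothetical pages (assumed shape: a dict with a list of records per key).
fake_pages = [
    {'event': [{'time': 1.0, 'event_enum': 'caffeine'}]},
    {'event': [{'time': 2.0, 'event_enum': 'advil'},
               {'time': 3.0, 'event_enum': 'advil'}]},
]

df = pages_to_dataframe(fake_pages, 'event')
print(df)
```

Passing the key as an argument is one way a single helper could serve both the `'event'` and `'span'` responses.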

## 1. Initialize API credentials

First initialize your API credentials. These credentials are analogous to having a username/password for accessing patient data. You can set up an access token for read access to all patients within your organization. See our [API doc](https://docs.runelabs.io/stream/#section/Overview/Authentication) for instructions on how to set this up.

Next, set up a .yaml file with your token ID and secret. This is a text file that will store your credentials. See our [`runeq` quickstart](https://runeq.readthedocs.io/en/latest/pages/quickstart.html#configuration) for how to set this up.

Once this .yaml file is in place, it can be used to create a client object.

In [3]:
# Set up a client.

cfg = Config()
client = stream.V1Client(cfg)

If your .yaml file is not in the default location, its path can be passed as an argument to `Config()`.

This client object can now be used to make API calls. Next, we will specify the parameters for our API calls. Check out the full API documentation for required vs. optional parameters per endpoint.

## 2. Specify endpoint parameters and retrieve data

The patient ID and device ID can be accessed in the [research portal](https://app.runelabs.io/). The patient ID is at the top left corner of the screen when viewing patient data, and a "copy device ID" button is under each data stream. Both are also available in the patient settings menu (click the "gear" icon).

An optional device ID parameter can be included to pull events from a specific device. To pull all events regardless of their source, remove (or comment out) the device ID line.

The [Rune research portal](https://app.runelabs.io/) is handy for finding windows of time that include the desired data. In this case, event data was examined in the research portal to find a five-day window that includes a variety of patient-reported events.

In [4]:
params = {
    'patient_id': 'c985db94656040cc8c4b191ea0e82d4f',
    'device_id': '84JO8cfE',
    'start_time': dt.datetime(2021, 2, 15).timestamp(),
    'end_time': dt.datetime(2021, 2, 20).timestamp()
}
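
Note that calling `.timestamp()` on a naive `datetime` interprets it in the local timezone of the machine running the code, so the window above can shift by a few hours between machines. A minimal sketch of pinning the boundaries to UTC follows; whether UTC midnight is the right boundary for a given patient is an assumption, and `window` is a hypothetical name.

```python
import datetime as dt

# Build the window boundaries in UTC so the result does not depend on
# the timezone of the machine running the notebook.
start = dt.datetime(2021, 2, 15, tzinfo=dt.timezone.utc)
end = dt.datetime(2021, 2, 20, tzinfo=dt.timezone.utc)

window = {
    'start_time': start.timestamp(),
    'end_time': end.timestamp(),
}

print(window)
print((end - start).days, 'days')
```

These two values could replace the `start_time` and `end_time` entries in `params`.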

Each endpoint requires a separate pull from the API:

In [5]:
# Retrieve data.

events = get_events(client, params)
spans = get_spans(client, params)

## 3. Explore the Events

First, we'll focus on the "Events" (which occur at a single point in time).

To get a feel for the structure of the data, first examine the contents of the dataframe directly.

In [6]:
# Display the contents of the dataframe.

events

Unnamed: 0,time,created_time,device_id,id,event_namespace,event_type,event_enum,payload,display_name
0,1613506000.0,1613520000.0,84JO8cfE,event-0e82d4f-03fed649a0177b60ba5613645d2117b1...,patient,medication,carbidopa-levodopa,"{'category': 'Medication', 'dosage_quantity': ...",patient.medication.carbidopa-levodopa
1,1613507000.0,1613520000.0,84JO8cfE,event-0e82d4f-38b68f62b47d7d2fc7222d460b828b6c...,patient,food-drink,caffeine,"{'category': 'Food & Drink', 'name': 'Caffeine'}",patient.food-drink.caffeine
2,1613513000.0,1613521000.0,84JO8cfE,event-0e82d4f-50a6710320ab862cb5ad5e1c838f5f35...,patient,medication,carbidopa-levodopa,"{'category': 'Medication', 'dosage_quantity': ...",patient.medication.carbidopa-levodopa
3,1613513000.0,1613520000.0,84JO8cfE,event-0e82d4f-fff8551303f31e8587a5c3ee51671f89...,patient,medication,advil,"{'category': 'Medication', 'dosage_quantity': ...",patient.medication.advil
4,1613601000.0,1613601000.0,84JO8cfE,event-0e82d4f-94fa4c50ed604d26acd02101c9e5a7dd...,patient,food-drink,caffeine,"{'category': 'Food & Drink', 'name': 'Caffeine'}",patient.food-drink.caffeine
5,1613601000.0,1613601000.0,84JO8cfE,event-0e82d4f-5ca83f287a96fbcb05b81b6248a7d183...,patient,symptom,on-period,"{'category': 'Symptom', 'duration': None, 'int...",patient.symptom.on-period
6,1613601000.0,1613601000.0,84JO8cfE,event-0e82d4f-cc75f6224eaa96c95df7778ca01d52f9...,patient,medication,advil,"{'category': 'Medication', 'dosage_quantity': ...",patient.medication.advil
7,1613749000.0,1613749000.0,84JO8cfE,event-0e82d4f-045554080432c0abc94fb8a415bf6f77...,patient,symptom,ocd,"{'category': 'StriveStudy', 'duration': None, ...",patient.symptom.ocd
8,1613749000.0,1613749000.0,84JO8cfE,event-0e82d4f-b5c524f696f6b0f0f33c290ca8f59a3b...,patient,medication,advil,"{'category': 'Medication', 'dosage_quantity': ...",patient.medication.advil
9,1613751000.0,1613751000.0,84JO8cfE,event-0e82d4f-88701de815e9b2ce6f214d3dc181c3a5...,patient,symptom,ocd,"{'category': 'StriveStudy', 'duration': None, ...",patient.symptom.ocd


A few things are evident from a quick glance at the dataframe:

* There are 10 different events in this time frame (as there are 10 rows in the dataframe).
* There are two timestamps:
    * "time": When the event occurred
    * "created_time": When the patient entered the event into Strive
* The event_namespace, event_type, and event_enum categorize the events
    * This is explained in the [API documentation](https://docs.runelabs.io/#tag/v1event-and-span).
* There is additional information about many of the events in the payload.
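
The two timestamps are Unix times in seconds, which are hard to read directly. As a sketch, they can be converted to datetimes and compared to see how long after an event the patient logged it. The `sample` dataframe and the new column names are hypothetical, and the values only approximate those displayed above.

```python
import pandas as pd

# Two illustrative rows; values approximate those shown in the output above.
sample = pd.DataFrame({
    'time': [1613506500, 1613512800],
    'created_time': [1613520000.0, 1613521000.0],
})

# Interpret the numbers as seconds since the Unix epoch, in UTC.
sample['time_utc'] = pd.to_datetime(sample['time'], unit='s', utc=True)
sample['created_utc'] = pd.to_datetime(sample['created_time'], unit='s', utc=True)

# How long after the event the report was entered.
sample['report_delay'] = sample['created_utc'] - sample['time_utc']

print(sample[['time_utc', 'created_utc', 'report_delay']])
```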

### Find the available event namespaces:

Explore the categorization of events further by finding all of the namespaces (sources of data) available in this dataset.

In [7]:
events.event_namespace.unique()

array(['patient'], dtype=object)

Because a device ID was specified, and it corresponds to the StrivePD app, only the "patient" namespace appears. Other devices (e.g., Apple Watches or deep brain stimulators) are associated with different namespaces.

### Find the available event types for each event namespace:

In [8]:
events.loc[events['event_namespace'] == 'patient'].event_type.unique()

array(['medication', 'food-drink', 'symptom'], dtype=object)

* patient:
    * medication
    * food-drink
    * symptom
    
These are the types of events that exist for this patient during the chosen time window, but different event types might exist at other times, or for other devices or patients.

### Explore the possible event enums for event types in the patient namespace:

In [9]:
# Loop through the event types and print all unique enums for each.

for event_type in ('medication', 'food-drink', 'symptom'):
    print(f'* "{event_type}" event enums')
    enums = events.loc[events['event_type'] == event_type].event_enum.unique()
    for enum in enums:
        print(f'    * {enum}')

* "medication" event enums
    * carbidopa-levodopa
    * advil
* "food-drink" event enums
    * caffeine
* "symptom" event enums
    * on-period
    * ocd


These are the specific events that the patient reported during the chosen time window.
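
As an alternative to the loop, a `groupby` gives the same breakdown along with how often each enum was reported. This is a sketch on a hypothetical `sample` dataframe whose type/enum pairs are copied from the rows displayed earlier.

```python
import pandas as pd

# Type/enum pairs copied from the rows of the events dataframe shown earlier.
sample = pd.DataFrame({
    'event_type': ['medication', 'food-drink', 'medication', 'medication',
                   'food-drink', 'symptom', 'medication', 'symptom',
                   'medication', 'symptom'],
    'event_enum': ['carbidopa-levodopa', 'caffeine', 'carbidopa-levodopa',
                   'advil', 'caffeine', 'on-period', 'advil', 'ocd',
                   'advil', 'ocd'],
})

# Count the rows for each (type, enum) combination.
counts = sample.groupby(['event_type', 'event_enum']).size()
print(counts)
```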

The symptom events may seem out of place, as one might expect symptoms to have a duration (and thus come from the Span endpoint). In this case, the "on-period" symptom was mistakenly reported without a duration. The "ocd" symptoms are periodic reports of obsessive compulsive disorder symptom severity using an OCD-specific measure.

Let's focus on the "carbidopa-levodopa" medication events. To extract just those events, one could parse them from the complete dataframe, or pull them directly from the API.

To pull them directly from the API, use the "event" parameter of the Event endpoint.

### Pull a specific type of event from the API:

In [10]:
# Copy the previously used parameters.
params_levodopa = params.copy()

# Add the desired "event" parameter to the new set of parameters.
params_levodopa['event'] = 'patient.medication.carbidopa-levodopa'

# Pull from the API with the specific parameter set.
medication_levodopa_events = get_events(client, params_levodopa)
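
The other option mentioned earlier, parsing the events out of an already-pulled dataframe, avoids a second API call. A minimal sketch with a boolean mask follows; `sample_events` is a hypothetical stand-in for the `events` dataframe, reduced to three columns.

```python
import pandas as pd

# Hypothetical stand-in for the `events` dataframe pulled earlier.
sample_events = pd.DataFrame({
    'time': [1613506500, 1613507000, 1613512800],
    'event_type': ['medication', 'food-drink', 'medication'],
    'event_enum': ['carbidopa-levodopa', 'caffeine', 'carbidopa-levodopa'],
})

# Keep only rows that match both the event type and the event enum.
mask = (
    (sample_events['event_type'] == 'medication')
    & (sample_events['event_enum'] == 'carbidopa-levodopa')
)

# Reset the index so the filtered rows are numbered from zero.
levodopa_local = sample_events.loc[mask].reset_index(drop=True)
print(levodopa_local)
```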

### Examine the dataframe of carbidopa-levodopa medication events:

In [11]:
medication_levodopa_events

Unnamed: 0,time,created_time,device_id,id,event_namespace,event_type,event_enum,payload,display_name
0,1613506500,1613520000.0,84JO8cfE,event-0e82d4f-03fed649a0177b60ba5613645d2117b1...,patient,medication,carbidopa-levodopa,"{'category': 'Medication', 'dosage_quantity': ...",patient.medication.carbidopa-levodopa
1,1613512800,1613521000.0,84JO8cfE,event-0e82d4f-50a6710320ab862cb5ad5e1c838f5f35...,patient,medication,carbidopa-levodopa,"{'category': 'Medication', 'dosage_quantity': ...",patient.medication.carbidopa-levodopa


This dataframe can be saved to a CSV file for local storage and access in other environments (e.g., Matlab).

### Save the complete dataframe to a CSV file:

In [12]:
# Specify base path for saving data.
BASE_PATH = '~/Documents/api_data/'

# Create specific path for this file,
#  appending the current date/time to avoid overwriting previous data.
timestr = dt.datetime.now().strftime("%Y%m%d-%H%M%S")
save_filename = 'levodopa_' + timestr + '.csv'
save_filepath = os.path.join(BASE_PATH, save_filename)

# Save the dataframe to the specified file.
medication_levodopa_events.to_csv(save_filepath, index=False)
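
`to_csv` raises an error if the target directory does not exist. One way to guard against that, sketched here with a temporary directory standing in for `BASE_PATH` (the names `base_path` and `save_dir` and the file name are hypothetical), is to create the directory before saving.

```python
import os
import tempfile

import pandas as pd

# A temporary directory stands in for BASE_PATH so this sketch is harmless to run.
base_path = os.path.join(tempfile.mkdtemp(), 'api_data')

# Expand "~" if present (a no-op here) and create the directory if it is missing.
save_dir = os.path.expanduser(base_path)
os.makedirs(save_dir, exist_ok=True)

# Save a tiny illustrative dataframe into the new directory.
save_filepath = os.path.join(save_dir, 'example.csv')
pd.DataFrame({'time': [1613506500]}).to_csv(save_filepath, index=False)

print(os.path.exists(save_filepath))
```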

### Examine the payload of an event:

In this particular case, additional information may be relevant, such as the dosage strength or quantity, which are packed into the payload.

Examine the payload of the first row in the dataframe:

In [12]:
medication_levodopa_events.iloc[0].payload

{'category': 'Medication',
 'dosage_quantity': None,
 'dosage_strength': None,
 'failed_dose': False,
 'name': 'Carbidopa/Levodopa',
 'on_schedule': False}

It appears this patient did not report the dosage strength or quantity in this instance.

In some cases, it might be inconvenient to have the payload packed into one column of the CSV file.

Expand the payload into separate columns in the dataframe:

In [13]:
medication_levodopa_events_expanded = pd.concat([medication_levodopa_events.drop(
    'payload', axis=1), pd.DataFrame(medication_levodopa_events['payload'].tolist())], axis=1)

# Examine the result.
medication_levodopa_events_expanded

Unnamed: 0,time,created_time,device_id,id,event_namespace,event_type,event_enum,display_name,category,dosage_quantity,dosage_strength,failed_dose,name,on_schedule
0,1613506500,1613520000.0,84JO8cfE,event-0e82d4f-03fed649a0177b60ba5613645d2117b1...,patient,medication,carbidopa-levodopa,patient.medication.carbidopa-levodopa,Medication,,,False,Carbidopa/Levodopa,False
1,1613512800,1613521000.0,84JO8cfE,event-0e82d4f-50a6710320ab862cb5ad5e1c838f5f35...,patient,medication,carbidopa-levodopa,patient.medication.carbidopa-levodopa,Medication,0.25,,False,Carbidopa/Levodopa,False


### Save the expanded version to a CSV file:

In [15]:
# Create a specific path for this expanded levodopa file,
#  appending the current date/time to avoid overwriting previous data.
timestr = dt.datetime.now().strftime("%Y%m%d-%H%M%S")
save_filename = 'medication_levodopa_expanded_' + timestr + '.csv'
save_filepath = os.path.join(BASE_PATH, save_filename)

# Save the dataframe to the specified file.
medication_levodopa_events_expanded.to_csv(save_filepath, index=False)

### An aside on expanding payloads in dataframes:

The code used above to expand the payload of a particular event type or enum can be applied to dataframes that include more varied events. All of the fields in the different payloads become columns, and events that do not include some of those fields will have NaN values.

In the above example, just substitute `events` for `medication_levodopa_events`.
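
To illustrate the NaN behavior, here is a small sketch with two hypothetical events whose payloads are abbreviated from the examples above. The medication payload has a `failed_dose` field and the food/drink payload does not.

```python
import pandas as pd

# Hypothetical mixed events; payloads are abbreviated for illustration.
mixed = pd.DataFrame({
    'event_enum': ['carbidopa-levodopa', 'caffeine'],
    'payload': [
        {'category': 'Medication', 'name': 'Carbidopa/Levodopa',
         'failed_dose': False},
        {'category': 'Food & Drink', 'name': 'Caffeine'},
    ],
})

# Same expansion as above: drop the packed column, add one column per field.
expanded = pd.concat(
    [mixed.drop('payload', axis=1), pd.DataFrame(mixed['payload'].tolist())],
    axis=1,
)

# The caffeine row has NaN in the 'failed_dose' column.
print(expanded)
```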

## 4. Explore the Spans

Next, we'll explore the "Spans" (which have duration).

To get a feel for the structure of the data, first examine the contents of the dataframe directly.

In [14]:
# Display the contents of the dataframe.

spans

Unnamed: 0,start_time_min,start_time,end_time,end_time_max,created_time,device_id,id,event_namespace,event_type,event_enum,payload,display_name
0,1613502000.0,1613502000.0,1613504000.0,1613504000.0,1613520000.0,84JO8cfE,event-0e82d4f-7cfc718bc93ec92e1c2cc407065792b7...,patient,sleep,normal,"{'category': 'Sleep', 'hours_slept': '0.5', 'n...",patient.sleep.normal
1,1613505000.0,1613505000.0,1613505000.0,1613508000.0,1613520000.0,84JO8cfE,event-0e82d4f-9af8d6d42c8f34b7984f40c77e3d48eb...,patient,symptom,off-period,"{'category': 'Symptom', 'duration': '<=1h', 'i...",patient.symptom.off-period
2,1613509000.0,1613509000.0,1613509000.0,1613513000.0,1613521000.0,84JO8cfE,event-0e82d4f-284e282ff185184c317b30be1faf0fc3...,patient,symptom,on-period,"{'category': 'Symptom', 'duration': '<=1h', 'i...",patient.symptom.on-period
3,1613509000.0,1613509000.0,1613509000.0,1613511000.0,1613520000.0,84JO8cfE,event-0e82d4f-2edb4019ed381ff7e0d8260373638fdd...,patient,activity,boxing,"{'category': 'Activity', 'duration': '<=30m', ...",patient.activity.boxing
4,1613516000.0,1613516000.0,1613516000.0,1613517000.0,1613521000.0,84JO8cfE,event-0e82d4f-9bb5604f3d43b24a253edb95ff3bdefc...,patient,side-effect,dyskinesia,"{'category': 'Side Effect', 'duration': '<=30m...",patient.side-effect.dyskinesia
5,1613604000.0,1613604000.0,1613604000.0,1613606000.0,1613608000.0,84JO8cfE,event-0e82d4f-c1aedb55a9f635218bb9c82df260b77e...,patient,activity,tai-chi,"{'category': 'Activity', 'duration': '<=30m', ...",patient.activity.tai-chi
6,1613607000.0,1613607000.0,1613607000.0,1613608000.0,1613608000.0,84JO8cfE,event-0e82d4f-74323a6322148bc8a90bfe00bf7e8bd9...,patient,symptom,balance-issues,"{'category': 'Symptom', 'duration': '<=15m', '...",patient.symptom.balance-issues
7,1613756000.0,1613756000.0,1613756000.0,1613758000.0,1613758000.0,84JO8cfE,event-0e82d4f-eef1c7766693431ea992b27f9af3ef2b...,patient,activity,tai-chi,"{'category': 'Activity', 'duration': '<=30m', ...",patient.activity.tai-chi


A few things are evident from a quick glance at the dataframe:

* There are only eight different spans in this time frame (as there are eight rows in the dataframe).
* There are five timestamps:
    * "start_time_min": A "fuzzy" or uncertain start of the event
    * "start_time": The time when the event is certain to have started
    * "end_time": The time when the event is certain to have ended
    * "end_time_max": A "fuzzy" or uncertain end of the event
    * "created_time": When the patient entered the event into Strive
    * (If the patient did not mark a "fuzzy" duration, the pairs of start/end times will match.)
* The event_namespace, event_type, and event_enum categorize the spans, as they do events.
* There is additional information about the spans in the payload.
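
The four start/end timestamps can be combined into a range of plausible durations. The sketch below reads `start_time`/`end_time` as the inner bounds and `start_time_min`/`end_time_max` as the outer bounds, which is an interpretation of the list above rather than something confirmed by the API documentation. The `sample_spans` values are the rounded timestamps displayed above, and the duration column names are hypothetical.

```python
import pandas as pd

# Illustrative rows using the rounded timestamps from the first two spans above.
sample_spans = pd.DataFrame({
    'start_time_min': [1613502000.0, 1613505000.0],
    'start_time': [1613502000.0, 1613505000.0],
    'end_time': [1613504000.0, 1613505000.0],
    'end_time_max': [1613504000.0, 1613508000.0],
})

# Shortest duration consistent with the report (inner bounds), in seconds.
sample_spans['duration_min_s'] = (
    sample_spans['end_time'] - sample_spans['start_time']
)

# Longest duration consistent with the report (outer bounds), in seconds.
sample_spans['duration_max_s'] = (
    sample_spans['end_time_max'] - sample_spans['start_time_min']
)

print(sample_spans[['duration_min_s', 'duration_max_s']])
```

When the two durations match, the patient did not mark a "fuzzy" duration for that span.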

The possible event_namespace, event_type, and event_enum values can be explored for spans exactly as they were explored for events.

### Find the available event types and enums:

In [15]:
spans.loc[spans['event_namespace'] == 'patient'].event_type.unique()

array(['sleep', 'symptom', 'activity', 'side-effect'], dtype=object)

In [16]:
# Loop through the event types and print all unique enums for each.

for event_type in ('activity', 'side-effect', 'sleep', 'symptom'):
    print(f'* "{event_type}" event enums')
    enums = spans.loc[spans['event_type'] == event_type].event_enum.unique()
    for enum in enums:
        print(f'    * {enum}')

* "activity" event enums
    * boxing
    * tai-chi
* "side-effect" event enums
    * dyskinesia
* "sleep" event enums
    * normal
* "symptom" event enums
    * off-period
    * on-period
    * balance-issues


These are the specific spans that the patient reported during the chosen time window.

To extract specific spans, one could parse them from the complete dataframe, or pull them directly from the API, as in the previous "Event" example.

### Save the complete dataframe to a CSV file:

In [19]:
# Create specific path for this file,
#  appending the current date/time to avoid overwriting previous data.
timestr = dt.datetime.now().strftime("%Y%m%d-%H%M%S")
save_filename = 'spans_' + timestr + '.csv'
save_filepath = os.path.join(BASE_PATH, save_filename)

# Save the dataframe to the specified file.
spans.to_csv(save_filepath, index=False)

### Examine the payload of a span:

Additional information may be relevant, such as the quality or interruptions of sleep, which are packed into the payload.

Examine the payload of the first row in the dataframe:

In [17]:
spans.iloc[0].payload

{'category': 'Sleep',
 'hours_slept': '0.5',
 'name': 'Sleep',
 'nap': True,
 'quality': 'poor',
 'times_woken_up': '0'}

There seem to be several interesting variables in the payload, so it might be inconvenient to have the payload packed into one column of the CSV file.

However, the fields in the payload may not apply to other types of spans. Symptoms have no "nap" variable. If all of these span types are expanded together, there will be columns with many NaN values.

To make the data clearer, extract one type of span from the dataframe, and then expand the payload into separate columns in the dataframe:

In [18]:
# Extract 'sleep' rows from the dataframe.
# Reset the index so the rows align with the expanded payload in pd.concat.
sleep_spans = spans[spans.event_type == 'sleep'].reset_index(drop=True)

# Expand the payload into columns.
sleep_spans_expanded = pd.concat([sleep_spans.drop(
    'payload', axis=1), pd.DataFrame(sleep_spans['payload'].tolist())], axis=1)

# Examine result.
sleep_spans_expanded

Unnamed: 0,start_time_min,start_time,end_time,end_time_max,created_time,device_id,id,event_namespace,event_type,event_enum,display_name,category,hours_slept,name,nap,quality,times_woken_up
0,1613502000.0,1613502000.0,1613504000.0,1613504000.0,1613520000.0,84JO8cfE,event-0e82d4f-7cfc718bc93ec92e1c2cc407065792b7...,patient,sleep,normal,patient.sleep.normal,Sleep,0.5,Sleep,True,poor,0
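
The sleep payload shown earlier stores `hours_slept` and `times_woken_up` as strings (`'0.5'` and `'0'`), so they may need converting before any arithmetic. A minimal sketch on a hypothetical one-row `sleep_sample` dataframe:

```python
import pandas as pd

# One illustrative row mirroring the payload fields of the sleep span above.
sleep_sample = pd.DataFrame({
    'hours_slept': ['0.5'],
    'times_woken_up': ['0'],
    'quality': ['poor'],
})

for column in ('hours_slept', 'times_woken_up'):
    # errors='coerce' turns unparseable entries into NaN instead of raising.
    sleep_sample[column] = pd.to_numeric(sleep_sample[column], errors='coerce')

print(sleep_sample.dtypes)
```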


### Save the expanded version to a CSV file:

In [34]:
# Create specific path for this file,
#  appending the current date/time to avoid overwriting previous data.
timestr = dt.datetime.now().strftime("%Y%m%d-%H%M%S")
save_filename = 'sleep_expanded_' + timestr + '.csv'
save_filepath = os.path.join(BASE_PATH, save_filename)

# Save the dataframe to the specified file.
sleep_spans_expanded.to_csv(save_filepath, index=False)

## 5. Summary

This notebook serves as a tutorial for exploring and pulling patient-reported data from the Event and Span endpoints of the [Rune API](https://docs.runelabs.io/). The included procedure can be followed to explore any data that takes the form of "events" (i.e., occur at one point in time) or "spans" (i.e., have duration). Once the desired classifications of data (namespace, type, and enum) are ascertained, specific data can be pulled from the API and manipulated or stored for further use.