# Understanding Types of Data Streams Used by the Rune Labs API/SDK

This tutorial explains the structure of the data "stream types" returned by the Rune Labs API. Because the platform handles a wide variety of multimodal data, it uses a set of standardized stream types.

This tutorial follows: [Pulling Stream Data with the Rune Labs API/SDK](05_pulling_stream_data.ipynb).

For detailed information:
* [Rune Labs API documentation](https://docs.runelabs.io)
* [Rune Labs SDK documentation](https://runeq.readthedocs.io/en/latest/)
* [Rune Labs open source code repository](https://github.com/rune-labs/runeq-python/tree/main/examples) (which includes this notebook)

---

## Set Up

Initialize the Rune SDK with your platform credentials, as described previously: [Getting Started with the Rune Labs API/SDK](01_getting_started_with_Rune_SDK.ipynb)

In [1]:
# Initialize the SDK.
from runeq import initialize

initialize()

To confirm that you have successfully initialized the SDK in your current script or notebook, pull your own information using the `get_current_user()` function.

In [2]:
# Get the ID and name of the current user, based on API credentials.
from runeq.resources.user import get_current_user

my_user = get_current_user()
print(my_user)

User(id="user-b9c372f2b315a6c6cfd9b5ef7eba81e5ef7866d1,user", name="Gavin Philips")


In [3]:
# Confirm user's current active org.
print('Active Org:', my_user.active_org_name)

Active Org: Rune Demo


---

## Stream Types

The Rune Labs platform handles many different types of data, so each stream of data is ingested as the most appropriate stream type. The type defines the dimensions and units of a stream.

To list the stream types currently used on the platform, call the `get_all_stream_types` function.

In [4]:
# Get and display a list of all stream types.
from runeq.resources.stream_metadata import get_all_stream_types
import pandas as pd

stream_types = get_all_stream_types()
stream_types.to_dataframe().sort_values(by='name', ignore_index=True)

| | name | description | dimensions | id |
|---|---|---|---|---|
| 0 | Acceleration | Acceleration sampled in the time-domain. | [{'data_type': 'timestamp', 'quantity_name': '... | acceleration |
| 1 | Boolean | True/False condition (or On/Off state) calcula... | [{'data_type': 'timestamp', 'quantity_name': '... | boolean |
| 2 | Byte Data | Raw binary data stored as a sequence of bytes. | [{'data_type': 'uint', 'quantity_name': 'Offse... | bytes |
| 3 | Count | Count of events or other data points, observed... | [{'data_type': 'timestamp', 'quantity_name': '... | count |
| 4 | Current | Electric current (in Amperes) sampled in the t... | [{'data_type': 'timestamp', 'quantity_name': '... | current |
| 5 | Distance | Distance, sampled or calculated over time. | [{'data_type': 'timestamp', 'quantity_name': '... | distance |
| 6 | Duration | Durations, sampled or calculated over time. | [{'data_type': 'timestamp', 'quantity_name': '... | duration |
| 7 | Event | Occurrences of a specific type or class of eve... | [{'data_type': 'timestamp', 'quantity_name': '... | event |
| 8 | FFT Time-Domain | Electric potential in frequency domain, calcul... | [{'data_type': 'timestamp', 'quantity_name': '... | voltage-spectrum |
| 9 | Frequency | Frequency samples (in Hertz) recorded in the t... | [{'data_type': 'timestamp', 'quantity_name': '... | frequency |


Each stream type has a name, description, set of dimensions, and an ID that can be used as a query parameter.
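As a minimal sketch of how these four fields fit together, the snippet below summarizes a stream type held as a plain Python dictionary. It does not call the SDK. The `summarize` helper is hypothetical, and the record is an illustrative copy of the boolean stream type with only two keys kept per dimension.

```python
# Illustrative record with the four fields listed above. The values mirror
# the boolean stream type; only two keys per dimension are kept for brevity.
stream_type = {
    "id": "boolean",
    "name": "Boolean",
    "description": "True/False condition (or On/Off state) calculated or "
                   "measured over time.",
    "dimensions": [
        {"id": "time", "data_type": "timestamp"},
        {"id": "bool", "data_type": "bool"},
    ],
}


def summarize(stream_type):
    """Return a one-line summary: ID, name, and 'dimension (data type)' pairs."""
    dims = ", ".join(
        f"{dim['id']} ({dim['data_type']})" for dim in stream_type["dimensions"]
    )
    return f"{stream_type['id']}: {stream_type['name']} -> {dims}"


print(summarize(stream_type))
```

The same helper should work on any dictionary with this shape, which makes it a quick way to scan the dimensions of an unfamiliar stream type.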

---

### Examples

#### Boolean streams

Take a closer look at one of the available stream types using its ID:

In [5]:
# Examine the boolean stream type.
from pprint import pprint

pprint(stream_types.get('boolean').to_dict())

{'description': 'True/False condition (or On/Off state) calculated or measured '
                'over time.',
 'dimensions': [{'data_type': 'timestamp',
                 'id': 'time',
                 'quantity_abbrev': 't',
                 'quantity_name': 'Time',
                 'unit_abbrev': 'ns',
                 'unit_name': 'Nanoseconds'},
                {'data_type': 'bool',
                 'id': 'bool',
                 'quantity_abbrev': 'Result',
                 'quantity_name': 'Binary Result',
                 'unit_abbrev': '',
                 'unit_name': ''}],
 'id': 'boolean',
 'name': 'Boolean'}


All of the available streams of time series data have `time` as their first dimension, representing the timestamp of each sample. This is stored in nanoseconds, in UTC time. However, our API can return timestamps in several formats (using the `timestamp` parameter in a call such as [get_stream_dataframe()](https://runeq.readthedocs.io/en/latest/pages/resources.html#runeq.resources.stream_metadata.StreamMetadataSet.get_stream_dataframe)).
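If you work with the raw nanosecond values directly, they can be converted with the Python standard library. The sketch below assumes the values are integer nanoseconds since the Unix epoch (an assumption, not something stated above). The `ns_to_utc_datetime` helper and the sample timestamp are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def ns_to_utc_datetime(timestamp_ns):
    """Convert integer nanoseconds since the Unix epoch to an aware UTC datetime.

    Python datetimes have microsecond resolution, so anything below one
    microsecond is dropped.
    """
    seconds, remainder_ns = divmod(timestamp_ns, 1_000_000_000)
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(seconds=seconds, microseconds=remainder_ns // 1000)


# Hypothetical sample timestamp (assumed to be Unix epoch nanoseconds).
example_ns = 1_640_995_200_500_000_000
print(ns_to_utc_datetime(example_ns).isoformat())
```

Integer arithmetic is used instead of dividing by `1e9` so that large nanosecond values are not rounded by floating-point conversion.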

The boolean stream type has one other dimension, `bool`, whose `data_type` is `bool` (a binary zero or one). Each dimension becomes a column in a retrieved dataframe, so a boolean stream yields two columns: `time` and `bool`.

Time series of boolean data will be ingested into our platform using this boolean stream type. Each timestamp will have one binary value. This stream type is often used to represent the state of a setting on a device over time, such as therapy (stimulation) of a DBS device being on/off.
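One common use of such a stream is finding the periods during which a setting was on. The following is a minimal sketch of that idea on plain `(time, state)` pairs, not SDK code. The `on_intervals` helper and the sample values are hypothetical, and times are small integers for readability rather than nanoseconds.

```python
def on_intervals(samples):
    """Collapse (time, state) samples into (start, end) pairs where state is True.

    `samples` must be sorted by time. An interval ends at the first later
    sample whose state is False; an interval still open at the last sample
    is closed at that sample's time.
    """
    intervals = []
    start = None
    for time, state in samples:
        if state and start is None:
            start = time  # the state switched on
        elif not state and start is not None:
            intervals.append((start, time))  # the state switched off
            start = None
    if start is not None:
        intervals.append((start, samples[-1][0]))
    return intervals


# Hypothetical samples, sorted by time.
samples = [(0, False), (10, True), (20, True), (30, False), (40, True)]
print(on_intervals(samples))
```

Closing a still-open interval at the last sample is a design choice for this sketch; depending on the analysis, it may be more appropriate to leave it open-ended.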

#### Voltage spectrum streams

Take a closer look at a stream type with more dimensions:

In [6]:
# Examine the voltage spectrum stream type.
pprint(stream_types.get('voltage-spectrum').to_dict())

{'description': 'Electric potential in frequency domain, calculated over a '
                'series of time windows.',
 'dimensions': [{'data_type': 'timestamp',
                 'id': 'time',
                 'quantity_abbrev': 't',
                 'quantity_name': 'Time',
                 'unit_abbrev': 'ns',
                 'unit_name': 'Nanoseconds'},
                {'data_type': 'ufloat',
                 'id': 'frequency',
                 'quantity_abbrev': 'Freq',
                 'quantity_name': 'Frequency',
                 'unit_abbrev': 'Hz',
                 'unit_name': 'Hertz'},
                {'data_type': 'sfloat',
                 'id': 'potential',
                 'quantity_abbrev': 'Potential',
                 'quantity_name': 'Potential',
                 'unit_abbrev': 'V',
                 'unit_name': 'Volts'}],
 'id': 'voltage-spectrum',
 'name': 'FFT Time-Domain'}


Each voltage spectrum is two-dimensional, with a `potential` value (in volts) for each `frequency` bin. Because multiple spectra can be captured over time, `time` is the third dimension; it marks the beginning of the period during which each spectrum was captured. A retrieved dataframe therefore has one column per dimension: `time`, `frequency`, and `potential`.

This stream type is used to represent data such as "BrainSense Survey" results or the "frequency snapshots" associated with "Patient Events" from the Medtronic Percept DBS device.
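Because every row carries one `(time, frequency, potential)` triple, rows that share a timestamp belong to the same spectrum. The sketch below groups such rows with the standard library and finds the peak frequency of each spectrum. The helper names and the sample values are hypothetical, and the code does not depend on the SDK.

```python
from collections import defaultdict


def group_spectra(rows):
    """Group (time, frequency, potential) rows into one spectrum per timestamp.

    Returns {time: [(frequency, potential), ...]} with each spectrum sorted
    by frequency.
    """
    spectra = defaultdict(list)
    for time, frequency, potential in rows:
        spectra[time].append((frequency, potential))
    return {time: sorted(bins) for time, bins in spectra.items()}


def peak_frequency(spectrum):
    """Return the frequency of the bin with the largest potential."""
    return max(spectrum, key=lambda freq_pot: freq_pot[1])[0]


# Hypothetical rows: two spectra with three frequency bins each.
rows = [
    (1000, 4.0, 0.2), (1000, 8.0, 0.9), (1000, 12.0, 0.4),
    (2000, 4.0, 0.7), (2000, 8.0, 0.3), (2000, 12.0, 0.1),
]
for time, spectrum in group_spectra(rows).items():
    print(time, peak_frequency(spectrum))
```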

#### Event streams

An `event` type stream is a bit different from numerical stream types. Take a closer look:

In [7]:
# Examine the event stream type.
pprint(stream_types.get('event').to_dict())

{'description': 'Occurrences of a specific type or class of event over time',
 'dimensions': [{'data_type': 'timestamp',
                 'id': 'time',
                 'quantity_abbrev': 't',
                 'quantity_name': 'Time',
                 'unit_abbrev': 'ns',
                 'unit_name': 'Nanoseconds'},
                {'data_type': 'dict',
                 'id': 'event',
                 'quantity_abbrev': 'Payload',
                 'quantity_name': 'Payload',
                 'unit_abbrev': '',
                 'unit_name': ''}],
 'id': 'event',
 'name': 'Event'}


Like other stream types, the first dimension is `time`. However, while the additional dimension(s) of numerical stream types match the data (e.g., `sfloat` for voltage potential), the second dimension of an `event` stream has `'data_type': 'dict'`. This is a payload in the form of a Python dictionary, which can contain a variety of data.

If a patient logs that they have taken a medication with the StrivePD app, this is stored as an event. Rather than being represented as a numerical value, it is a timestamped payload of information. The payload may contain the name of the medication taken, the dosage, etc. This payload is packed into one value in the `event` column, and can be parsed and used as desired.

This stream type is used to represent a wide variety of data, including patient-reported events (e.g., medication) and device information (e.g., settings or diagnostics).
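To illustrate what parsing a payload can look like, the sketch below flattens `(time, payload)` pairs into one flat record per event. The payload keys (`medication`, `dosage_mg`) and the `flatten_events` helper are invented for this example; real payloads vary by data source and should be inspected first.

```python
def flatten_events(events):
    """Turn (time, payload) pairs into flat records, one dict per event.

    Payload keys become top-level keys next to `time`. If a payload has its
    own `time` key, the event timestamp takes precedence.
    """
    return [{**payload, "time": time} for time, payload in events]


# Hypothetical events: the payload keys are invented for illustration.
events = [
    (1000, {"medication": "example-med", "dosage_mg": 100}),
    (2000, {"medication": "example-med"}),
]
records = flatten_events(events)
for record in records:
    # Payloads may omit fields, so read optional keys with .get().
    print(record["time"], record["medication"], record.get("dosage_mg"))
```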

#### Span streams

A `span` type stream is similar to an `event` stream. However, while an `event` represents a single point in time, a `span` represents a span of time. Rather than one timestamp, it has a beginning and an end. Take a closer look:

In [8]:
# Examine the span stream type.
pprint(stream_types.get('span').to_dict())

{'description': 'Occurrences of a specific type or class of event that spans a '
                'finite interval of time.',
 'dimensions': [{'data_type': 'timestamp',
                 'id': 'start_time_min',
                 'quantity_abbrev': 't',
                 'quantity_name': 'Time',
                 'unit_abbrev': 'ns',
                 'unit_name': 'Nanoseconds'},
                {'data_type': 'timestamp',
                 'id': 'start_time',
                 'quantity_abbrev': 't',
                 'quantity_name': 'Time',
                 'unit_abbrev': 'ns',
                 'unit_name': 'Nanoseconds'},
                {'data_type': 'timestamp',
                 'id': 'end_time',
                 'quantity_abbrev': 't',
                 'quantity_name': 'Time',
                 'unit_abbrev': 'ns',
                 'unit_name': 'Nanoseconds'},
                {'data_type': 'timestamp',
                 'id': 'end_time_max',
                 'quantity_abbrev': 't',
         

Like an `event` stream, a `span` stream has one dimension that holds a Python dictionary payload. However, the single `time` dimension is replaced by four timestamp dimensions:
* `start_time_min`
* `start_time`
* `end_time`
* `end_time_max`

`start_time` and `end_time` are straightforward, representing the start and end of the span of time. `start_time_min` and `end_time_max` exist for cases in which the start and end times are imprecise or "fuzzy." For example, a patient might log that they experienced dyskinesia starting "3-4 hours ago," or exercised for "1-2 hours." In most cases, where timings are precise, `start_time` will match `start_time_min` and `end_time` will match `end_time_max`.
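The relationship between the four timestamps can be sketched as follows. The `span_summary` helper and the sample values are hypothetical. The sketch assumes that `start_time_min` is never later than `start_time` and that `end_time` is never later than `end_time_max`, which the names suggest but the text above does not state outright.

```python
HOUR_NS = 3_600 * 1_000_000_000  # one hour in nanoseconds


def span_summary(span):
    """Summarize a span: reported duration, widest possible duration, fuzziness."""
    return {
        "duration_ns": span["end_time"] - span["start_time"],
        "max_duration_ns": span["end_time_max"] - span["start_time_min"],
        "is_fuzzy": (
            span["start_time"] != span["start_time_min"]
            or span["end_time"] != span["end_time_max"]
        ),
    }


# Hypothetical fuzzy span: the start and end are each uncertain by one hour.
fuzzy_span = {
    "start_time_min": 0,
    "start_time": 1 * HOUR_NS,
    "end_time": 3 * HOUR_NS,
    "end_time_max": 4 * HOUR_NS,
}
summary = span_summary(fuzzy_span)
print(summary["duration_ns"] // HOUR_NS, summary["max_duration_ns"] // HOUR_NS,
      summary["is_fuzzy"])
```

For a precise span, where the paired timestamps match, the two durations are equal and `is_fuzzy` is `False`.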

This stream type is also used to represent a wide variety of data, including patient-reported events that lasted for a period of time (e.g., symptoms or activity) and device programming or recording sessions.

---

The stream types used to represent data produced by different algorithms are indicated in the algorithm documentation, and can be queried for directly by using the `stream_type_id` parameter.

Additional stream types are added as needed for new data sources.

For a simplified representation of data availability, which is consistent across stream types, see: [Checking Data Availability](07_checking_data_availability.ipynb)