In this notebook, we will analyze the dataset titled "Welltory COVID-19 and Wearables Open Data Research". This dataset is part of a research project carried out in 2020.

The goal behind this dataset is to "detect patterns regarding the COVID-19 disease; progression and recovery".

The dataset, created by the Welltory team, was made available to the public on a non-commercial basis in an effort to fight the pandemic.

Source: https://github.com/Welltory/hrv-covid19/tree/master?tab=readme-ov-file

In [122]:
import pandas as pd 
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [123]:
pd.set_option('display.max_columns', None)
pd.set_option("expand_frame_repr", False)

# Initial Data Exploration & Assessment

### Data Lineage & Provenance

The dataset contains data from users who tested positive for COVID-19 and agreed to participate in the research. The research involved tracking the participants' symptoms, heart rate variability, and other metrics recorded by wearables and made accessible through the Welltory app.

The data collected by the researchers involved: 
- "Heart rate variability measurements. Measurements were made with any Bluetooth-enabled heart rate monitor or with a smartphone camera with a high resolution - a method called Photoplethysmography (PPG). It is a simple optical technique used to detect blood volume changes in the microvascular bed of tissue to track the heartbeat. 

- Data from user-connected gadgets including devices such as Apple Watch and Garmin that sync with Google Fit or Apple Health.

- Clinically validated physical and mental health assessments. We created a feature specifically for this project, where people would add information about symptoms and test results." (Source: https://github.com/Welltory/hrv-covid19/tree/master?tab=readme-ov-file)

Since the research is based mainly on HRV measurements, it is important to note that many factors, such as the person's movements or hardware issues, can influence the PPG readings used to estimate HRV, which may raise questions about data quality.

The Welltory Team published a paper titled "Wavelet Analysis And Self-Similarity Of Photoplethysmography Signals For HRV Estimation And Quality Assessment", in which they discuss their strategy for collecting data with accurate readings and document their signal processing. Specifically, they introduced a new algorithm that does "not only detect peaks, but also identify corrupted signal parts". Their preprocessing procedure included specific steps to avoid signal issues such as "vanished signal and abrupt shift". Although the researchers seem to have taken appropriate steps to report the most accurate measurements, they mention one limitation of their algorithm: they conclude that "the algorithm perform well on various PPG signals", but "cannot be used for HRV estimation from PPG signals collected during or right after exercise since [in which case] the PPG signal does not contain sufficient information". (Source: https://www.mdpi.com/1424-8220/21/20/6798#sec3dot1-sensors-21-06798)

The dataset is split across 9 CSV files:
- participants.csv: contains general information about users (participants)
- hrv_measurements.csv: contains heart rate variability (HRV) measurements collected from COVID-19 participants via the Welltory app
- blood_pressure.csv: contains blood pressure data and derivatives. The fields functional_changes_index, circulatory_efficiency, kerdo_vegetation_index, and robinson_index are calculated when heart rate data is available during measurement
- heart_rate.csv: contains raw heart rate intervals
- wearables.csv: contains data collected from supported gadgets and aggregated by day
- sleep.csv: contains data about user sleep collected from supported gadgets and aggregated by day
- weather.csv: contains data about weather conditions for the user's location, aggregated by day
- surveys.csv: contains results of health-related surveys that users take in the Welltory app
- scales_description.csv: contains each scale's description, values, and meanings

To learn more about the columns in each file, please refer to the datatypes.md file provided by the researchers.

# Data Profiling (EDA)

Profile the dataset to understand its characteristics (Done via EDA)

- Explore and interpret data structure, descriptive statistics, data quality, and variable relationships
    - Code and documented interpretation of data structure
    - Code and documented interpretation of descriptive statistics
    - Code and documented interpretation of data quality
    - Code and documented interpretation of variable relationships (and value distributions)
- Explore data visually with appropriate visualizations
    - Visualizations used are complete and appropriate and interpretation(s) are documented in the notebook
    - Visualizations follow best practices (titles, axes labels, etc)




In [124]:
df_blood_pressure = pd.read_csv("data/blood_pressure.csv")
print(df_blood_pressure.head())

    user_code  measurement_datetime  diastolic  systolic  functional_changes_index  circulatory_efficiency  kerdo_vegetation_index  robinson_index
0  01bad5a519  2020-04-29  22:33:33        100       150                       NaN                     NaN                     NaN             NaN
1  01bad5a519  2020-04-30  01:33:33        100       150                       NaN                     NaN                     NaN             NaN
2  01bad5a519  2020-04-30  09:16:38         95       140                      3.38                  4545.0                     6.0           141.4
3  01bad5a519  2020-04-30  12:16:38         95       140                       NaN                     NaN                     NaN             NaN
4  01bad5a519  2020-05-01  06:58:06         80       130                      2.89                  4000.0                     NaN           104.0


In [125]:
df_heart_rate = pd.read_csv("data/heart_rate.csv")
df_heart_rate.head()

Unnamed: 0,user_code,datetime,heart_rate,is_resting
0,007b8190cf,2020-04-26 04:49:25,70,0
1,01bad5a519,2020-04-23 06:21:03,74,0
2,01bad5a519,2020-04-23 09:46:01,82,0
3,01bad5a519,2020-04-23 14:05:06,90,0
4,01bad5a519,2020-04-24 03:41:18,72,0


In [126]:
df_hrv_measurements = pd.read_csv("data/hrv_measurements.csv")
df_hrv_measurements.head()

Unnamed: 0,user_code,rr_code,measurement_datetime,time_of_day,bpm,meanrr,mxdmn,sdnn,rmssd,pnn50,mode,amo,lf,hf,vlf,lfhf,total_power,how_feel,how_mood,how_sleep,tags,rr_data
0,007b8190cf,10489a6aea,2020-04-21 21:23:08,morning,75,795.9,0.12,45.802,54.174,15.15,0.775,53.0,508.0,1076.0,267.0,0.472,1851.0,0,-1,,COVID-19; Workout; Sex; Hobby; Studying; Sleep; Smoking; Music; Morning; Day; Evening; Night; Hydrotherapy; Walk,81910088318477857788668398017938468568007918168478227907838008117547678138237937828308528017616608371027766785861807749745745769776771782753756783773795757767790780783811805819786776783782739725728742763784754756764789819838793789806810775770798829841839792798796800774799801801767802828793766770786782
1,007b8190cf,9610d4d4dc,2020-04-26 11:19:25,morning,70,858.0,0.11,32.889,33.022,16.16,0.875,54.0,409.0,310.0,176.0,1.319,895.0,0,0,0.0,,888775811883890894894899893889890832848873902870880826880890903877837783812826838836847818868856867872878867954885890854853867850881890874890837826831852830836855843839812796779824890900821806886826840875902916926869884854843861880890822801820835849869855855834836896867882878851847861876862853824818
2,013f6d3e5b,f3de056155,2020-05-15 04:14:21,night,83,724.1,0.17,54.811,65.987,17.17,0.725,46.0,432.0,881.0,194.0,0.49,1507.0,-1,-2,,COVID-19; Fast/Diet; Hungry; Tired; Fever; I could do better; Illness; Period; Pregnancy; Coffee; Meal; Meeting,6948326428017517167377427737607017327737807657337057757867837126536556446566061047830792711726756766754705718715764737718708745769776755683730742750758687711740748736703715740769728695734740741721693707742703707731712674691709731668665678751731706663677607630652656634670726752752697710734757718685721
3,013f6d3e5b,b04489e32f,2020-05-19 03:06:02,night,75,802.64,0.2,72.223,70.039,22.22,0.825,43.0,814.0,1487.0,1719.0,0.547,4020.0,0,0,,,82181777180583378874772479282577575877780386511831156839836763743809820790732792831858843793790820921776684764774736788766739776817680670854635779753695662691745724751832845806836792820837801780806866861798781827834854824798801847878845804768754800830824784801838850823786826843826780814829835809776816
4,01bad5a519,ac52c706c6,2019-12-31 09:07:43,morning,78,768.07,0.1,29.65,21.196,4.04,0.775,56.0,489.0,128.0,96.0,3.82,713.0,0,0,0.0,,741740734737740731751747745728747763769775807713737719705698707733765816811805795789780747728726721719716752783807818798781767759760750734728745787775787788780786786784768736734748768778799790749791796797787779809758813831813788783794798800782758767781791807801781791770779786774749760772780798777756


In [127]:
df_participants = pd.read_csv("data/participants.csv")
df_participants.head()

Unnamed: 0,user_code,gender,age_range,city,country,height,weight,symptoms_onset
0,007b8190cf,m,25-34,Mandalay,Myanmar,170.18,96.162,
1,013f6d3e5b,f,18-24,São Paulo,Brazil,174.0,77.3,5/15/2020
2,01bad5a519,m,45-54,St Petersburg,Russia,178.0,92.0,4/5/2020
3,0210b20eea,f,25-34,Sochi,Russia,169.0,60.0,5/6/2020
4,024719e7da,f,45-54,St Petersburg,Russia,158.0,68.5,5/27/2020


In [128]:
df_scales_description = pd.read_csv("data/scales_description.csv")
df_scales_description.head()

Unnamed: 0,Scale,Description,Value,Meaning
0,S_COVID_SYMPTOMS,How long the user has been experiencing symptoms,1,Less than 3 days
1,S_COVID_SYMPTOMS,How long the user has been experiencing symptoms,2,3 to 6 days
2,S_COVID_SYMPTOMS,How long the user has been experiencing symptoms,3,7 to 14 days
3,S_COVID_SYMPTOMS,How long the user has been experiencing symptoms,4,More than 14 days
4,S_COVID_COUGH,Symptom intensity: Coughing,1,User isn’t experiencing symptom


In [129]:
df_sleep = pd.read_csv("data/sleep.csv")
df_sleep.head()

Unnamed: 0,user_code,day,sleep_begin,sleep_end,sleep_duration,sleep_awake_duration,sleep_rem_duration,sleep_light_duration,sleep_deep_duration,pulse_min,pulse_max,pulse_average
0,0d297d2410,2019-12-31,2019-12-31 07:50:32,2019-12-31 08:45:22,3290.0,,,,,,,
1,0d297d2410,2020-01-01,2020-01-01 04:13:41,2020-01-01 09:45:02,19881.0,,,,,,,
2,0d297d2410,2020-01-02,2020-01-02 02:14:52,2020-01-02 08:06:00,21068.0,,,,,,,
3,0d297d2410,2020-01-03,2020-01-03 00:10:00,2020-01-03 08:45:10,30910.0,,,,,,,
4,0d297d2410,2020-01-04,2020-01-04 01:27:25,2020-01-04 08:52:20,26695.0,,,21480.0,,55.0,95.0,72.5


In [130]:
df_surveys = pd.read_csv("data/surveys.csv")
df_surveys.head()

Unnamed: 0,user_code,scale,created_at,value,text
0,01bad5a519,S_CORONA,2020-04-23,2,Symptoms are characteristic of coronavirus
1,01bad5a519,S_COVID_BLUISH,2020-04-23,1,User isn’t experiencing symptom
2,01bad5a519,S_COVID_BLUISH,2020-04-25,1,User isn’t experiencing symptom
3,01bad5a519,S_COVID_BLUISH,2020-04-27,1,User isn’t experiencing symptom
4,01bad5a519,S_COVID_BLUISH,2020-04-29,1,User isn’t experiencing symptom


In [131]:
df_wearables = pd.read_csv("data/wearables.csv")
df_wearables.head()

Unnamed: 0,user_code,day,resting_pulse,pulse_average,pulse_min,pulse_max,average_spo2_value,body_temperature_avg,stand_hours_total,steps_count,distance,steps_speed,total_number_of_flights_climbed,active_calories_burned,basal_calories_burned,total_calories_burned,average_headphone_exposure,average_environment_exposure
0,007b8190cf,2020-04-26,,70.0,70.0,70.0,,,,,,,,,2859.0,2859.0,,
1,01bad5a519,2020-02-12,,,,,,,,8574.0,,57.9,,,2624.0,2624.0,,
2,01bad5a519,2020-02-13,,,,,,,,7462.0,,59.1,,,2624.0,2624.0,,
3,01bad5a519,2020-02-15,,,,,,,,2507.0,,60.97,,,2624.0,2624.0,,
4,01bad5a519,2020-02-16,,,,,,,,10131.0,,49.1,,,2624.0,2624.0,,


In [132]:
df_weather = pd.read_csv("data/weather.csv")
df_weather.head()

Unnamed: 0,user_code,day,avg_temperature_C,atmospheric_pressure,precip_intensity,humidity,clouds
0,013f6d3e5b,2020-05-22,18.0667,1017.6,0.0002,70.0,67.0
1,01bad5a519,2020-01-11,-1.2111,1016.4,0.0002,92.0,6.0
2,01bad5a519,2020-01-30,0.5056,1004.7,0.0009,85.0,100.0
3,01bad5a519,2020-04-02,-0.2444,994.4,0.0025,91.0,87.0
4,01bad5a519,2020-04-12,5.1778,1016.1,0.0,61.0,91.0


Some of the steps taken in the EDA were inspired by Dr. Brinnae Bent's EDA code demo.

### Data Structure

In [133]:
df_list = [df_blood_pressure, df_heart_rate, df_hrv_measurements, df_participants, df_scales_description, df_sleep, df_surveys, df_wearables, df_weather]
df_list_names = ["df_blood_pressure", "df_heart_rate", "df_hrv_measurements", "df_participants", "df_scales_description", "df_sleep", "df_surveys", "df_wearables", "df_weather"]

In [134]:
#TODO: Include analysis for each table. 
#TODO: Ahmed does analysis for the odd csvs
#TODO: Tal does analysis for the even csvs

for i, df_name in enumerate(df_list_names):
    #Data Structure
    print(f"Data Structure for {df_name}")
    print("-"*10)
    print(f"Dimensions: {df_list[i].shape}")
    print(f"Data Types:\n{df_list[i].dtypes}")
    print(f"Missing Values:\n{df_list[i].isnull().sum()}")
    print(f"Unique observations:\n{df_list[i].nunique()}")
    print('\n')

Data Structure for df_blood_pressure
----------
Dimensions: (721, 8)
Data Types:
user_code                    object
measurement_datetime         object
diastolic                     int64
systolic                      int64
functional_changes_index    float64
circulatory_efficiency      float64
kerdo_vegetation_index      float64
robinson_index              float64
dtype: object
Missing Values:
user_code                     0
measurement_datetime          0
diastolic                     0
systolic                      0
functional_changes_index    422
circulatory_efficiency      422
kerdo_vegetation_index      438
robinson_index              422
dtype: int64
Unique observations:
user_code                    28
measurement_datetime        719
diastolic                    45
systolic                     55
functional_changes_index    110
circulatory_efficiency      230
kerdo_vegetation_index       77
robinson_index              253
dtype: int64


Data Structure for df_heart_rate
-------

### Interpretation

#### Blood Pressure
The blood pressure table contains 721 observations of users' (participants) blood pressure. Each observation has multiple features related to blood pressure measurements that take different factors into account (e.g., functional_changes_index is an assessment of how well the body can adapt to stressors, and it takes height, weight, and gender into account). **It is interesting to note that the height, weight, and gender are features of the participants table**.

The numerical features represent the blood pressure measurements (diastolic and systolic) as well as other measurements that assess how well the body reacts to stressors and how efficient the blood circulation is, among other things.

The categorical features represent the user id and the date and time at which the observation was made. **It is important to note that these measurements are for 28 unique users, meaning that most of the observations are measurements taken at different times/days for the same user (participant)**

This table is not complete. 422 observations have missing values for functional_changes_index, circulatory_efficiency, and robinson_index, while 438 have missing values for kerdo_vegetation_index. Dropping these columns may be necessary since they are missing values for most of the observations.
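
The drop-or-keep decision can be made explicit by computing the share of missing values per column. The sketch below is only illustrative: `toy_bp` reuses the first five rows printed earlier for three columns, and the 50% cut-off (`THRESHOLD`) is an assumed value, not one prescribed by the dataset authors.

```python
import numpy as np
import pandas as pd

# Small stand-in for df_blood_pressure (first five rows shown earlier, three columns only)
toy_bp = pd.DataFrame({
    "diastolic": [100, 100, 95, 95, 80],
    "systolic": [150, 150, 140, 140, 130],
    "robinson_index": [np.nan, np.nan, 141.4, np.nan, 104.0],
})

# Fraction of missing values per column (mean of a boolean mask)
missing_share = toy_bp.isnull().mean()

# Assumed cut-off: flag columns with more than half of their values missing
THRESHOLD = 0.5
cols_to_review = missing_share[missing_share > THRESHOLD].index.tolist()

print(missing_share)
print(f"Columns above {THRESHOLD:.0%} missing: {cols_to_review}")
```

Applied to the full table, the same calculation would give roughly 59% missing for the columns with 422 gaps (422 of 721 rows) and roughly 61% for the column with 438 gaps.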

#### HRV Measurements
The HRV table contains 3245 observations of users' (participants) HRV measurements. To be specific, these observations are for **185 unique users**

The numerical features represent the HRV measurements: heart rate during the measurement (bpm), average time between heartbeats in ms (meanrr), difference between the highest and lowest cardio interval values in s (mxdmn), standard deviation of normal heartbeat intervals in ms (sdnn), etc.
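
To make these time-domain features concrete, here is a minimal sketch that derives them from a short list of RR intervals using the standard textbook definitions. The interval values are made up for illustration, and the exact formulas used by Welltory (for example, artifact filtering or the degrees of freedom in the standard deviation) are an assumption here, since the notebook does not document them.

```python
import numpy as np

# Illustrative RR intervals in milliseconds (assumed values, not a real measurement)
rr_ms = np.array([800, 810, 790, 860, 800, 795], dtype=float)

mean_rr = rr_ms.mean()                       # average interval, comparable to meanrr
bpm = 60_000 / mean_rr                       # heart rate implied by the mean interval
sdnn = rr_ms.std(ddof=1)                     # sample standard deviation of the intervals
successive_diff = np.diff(rr_ms)             # differences between consecutive intervals
rmssd = np.sqrt(np.mean(successive_diff ** 2))        # root mean square of those differences
pnn50 = 100 * np.mean(np.abs(successive_diff) > 50)   # % of differences larger than 50 ms

print(f"meanrr={mean_rr:.1f} ms, bpm={bpm:.1f}, sdnn={sdnn:.1f}, rmssd={rmssd:.1f}, pnn50={pnn50:.1f}%")
```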

The categorical features represent the user id, the unique measurement id (rr_code: there are 3245 unique measurement ids, matching the total number of observations), the date and time of the measurement, tags (a delimited list of words or short phrases assigned by the user to describe their state during the measurement; this seems optional, since some users included many tags while others included none), and rr_data (intervals in ms between consecutive heartbeats, stored as a delimited list). **Since we have the average rr, dropping the rr_data column may be a good option to consider**

The table seems complete, except for the features how_sleep (1779 missing values) and tags (1044 missing values). We may consider dropping the how_sleep column since more than 50% of observations don't have a value associated with it. Before dropping the tags column, we should consider compiling a list of tags for each of the 185 unique users, as we may be able to fill in the gaps and get a better general overview of the user's mental/physical state as they reported it.
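
Compiling tags per user could look like the sketch below. It assumes the tags are separated by `"; "`, as in the rows printed earlier; the frame `toy_hrv` and its user codes are hypothetical stand-ins for `df_hrv_measurements`.

```python
import pandas as pd

# Toy stand-in for df_hrv_measurements (user codes and tags are illustrative)
toy_hrv = pd.DataFrame({
    "user_code": ["u1", "u1", "u2", "u2"],
    "tags": ["COVID-19; Workout", None, "Tired; Fever", "Fever; Coffee"],
})

# One row per (user, tag): drop missing entries, split on the assumed "; " separator
tags_long = (
    toy_hrv.dropna(subset=["tags"])
    .assign(tag=lambda d: d["tags"].str.split("; "))
    .explode("tag")
)

# Distinct tags each user has ever reported, sorted for readability
tags_per_user = tags_long.groupby("user_code")["tag"].agg(lambda s: sorted(set(s)))
print(tags_per_user)
```

A per-user list like this only summarizes what a user reported at some point; whether it is reasonable to carry tags over to measurements where none were entered is a separate modeling decision.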

#### Scales Description
The Scales Description table contains 148 observations of different scales used to describe different states of the users' mental and physical being. Each scale is associated with a description, a value denoting the intensity of the state the user is experiencing, and a meaning that deciphers the value associated with that scale. For example, the scale S_COVID_COUGH indicates a user experiencing coughing as a result of COVID, where the values range from 1 (no cough) to 6 (extremely severe coughing).

The numerical features represent the value that corresponds to each scale (e.g intensity of the symptom experienced by the participant).

The categorical features represent the scale's name, description, and its meaning (given the associated value).

The table does not have any missing data, which is understandable since it is the key to understanding what the scales and their values mean when they occur in other tables. *We may only use this table to integrate the scale description in the Surveys table.*
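
One way to bring the scale description into the Surveys table is a left join on the scale name and value. This is a sketch on hypothetical stand-in frames (`toy_scales`, `toy_surveys`); the column names follow the ones printed earlier, while the second `Meaning` entry is a placeholder rather than a value from the dataset.

```python
import pandas as pd

# Toy stand-ins for df_scales_description and df_surveys (rows are illustrative)
toy_scales = pd.DataFrame({
    "Scale": ["S_COVID_COUGH", "S_COVID_COUGH"],
    "Description": ["Symptom intensity: Coughing"] * 2,
    "Value": [1, 2],
    "Meaning": ["User isn't experiencing symptom", "(placeholder meaning)"],
})
toy_surveys = pd.DataFrame({
    "user_code": ["u1", "u2"],
    "scale": ["S_COVID_COUGH", "S_COVID_COUGH"],
    "created_at": ["2020-04-23", "2020-04-25"],
    "value": [1, 2],
})

# Left join keeps every survey row and attaches the matching description
surveys_described = toy_surveys.merge(
    toy_scales[["Scale", "Value", "Description"]],
    left_on=["scale", "value"],
    right_on=["Scale", "Value"],
    how="left",
    validate="many_to_one",  # raises if a (Scale, Value) pair is duplicated in the lookup
).drop(columns=["Scale", "Value"])

print(surveys_described)
```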

#### Surveys
The surveys table contains 2259 observations of users' (participants) reporting their physical/mental state (e.g whether or not they have diabetes). This table represents the reporting of **111 unique participants**. The scale, value, and text features correspond to the Scale, Value, and Meaning in the Scales_description table. 

The numerical and categorical features are similar to those of the scales description table. The only difference is the categorical feature created_at, which indicates the date on which the reporting happened. **It is interesting to note that the date reporting here differs from other tables, in the sense that it only indicates the date and not the time of the observation**

This table does not contain any missing values, and no imputation or deletion of rows is necessary. **However, deleting the value or text column might be necessary since they convey similar information given the scale (although the text column is more descriptive).**

#### Weather
The Weather table contains 1717 observations of weather conditions for users' (participants) locations aggregated by day. These observations are for **104 unique users**. 

The numerical features represent weather conditions such as the avg temperature in celsius for that day, atmospheric pressure, precipitation intensity (in pct), and clouds (coverage in pct).

The categorical features represent the user id and the day on which the observation was made. Only the date (and not time) was included since all these observations have been aggregated by day.

This table is complete, and no imputation or deletion of rows/columns is necessary.

### Descriptive Statistics

In [135]:
#Exclude the Scales Description table as discussed above 
try:
    del df_list[df_list_names.index('df_scales_description')]
    df_list_names.remove('df_scales_description')
except ValueError:
    print("Element already deleted from list")

In [139]:
#Descriptive Statistics for numeric features
for i, df_name in enumerate(df_list_names):
    print(f'\nDescriptive Statistics for {df_name}')
    print('-'*15)
    numeric_columns = df_list[i].select_dtypes(include=[np.number]).columns
    print('Central Tendency Measures:')
    print(df_list[i][numeric_columns].describe().loc[['mean', '50%']])
    print('\nDispersion Measures:')
    print(df_list[i][numeric_columns].describe().loc[['std', 'min', 'max']])

    #Check for distribution normality (skewness and kurtosis)
    print('\nDistribution Measures:')
    print('-'*15)
    print(df_list[i][numeric_columns].skew())
    print(df_list[i][numeric_columns].kurtosis())
    print('\n')
    


Descriptive Statistics for df_blood_pressure
---------------
Central Tendency Measures:
      diastolic    systolic  functional_changes_index  circulatory_efficiency  kerdo_vegetation_index  robinson_index
mean  81.228849  119.441054                  2.594013             2735.197057              -15.498233       84.663779
50%   82.000000  120.000000                  2.580000             2640.000000              -17.000000       83.220000

Dispersion Measures:
      diastolic    systolic  functional_changes_index  circulatory_efficiency  kerdo_vegetation_index  robinson_index
std    8.865761   10.522578                  0.291302               747.88257               18.681233       14.399208
min   25.000000   63.000000                  1.680000              1300.00000              -76.000000       49.500000
max  101.000000  157.000000                  3.510000              7875.00000               45.000000      164.850000

Distribution Measures:
---------------
diastolic              

### Interpretation

#### Blood Pressure 
- Most of our numerical values have means and medians relatively close to each other, indicating a roughly symmetric distribution (e.g diastolic mean is 81.228849 and median is 82)

- It is interesting to note the value range of the circulatory efficiency here, where the min value is 1300 and the max value is 7875. *It might be interesting to investigate whether fluctuations in circulatory efficiency are related to COVID symptoms.*

- Although the means and medians are close in value, the skewness values tell a more nuanced story: diastolic, for instance, has a left-skewed distribution, while circulatory_efficiency is right-skewed (a positive skewness value indicates right skew). None of the features have a normal distribution of values given their kurtosis value.
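
When reading these numbers, it helps to remember that pandas' `kurtosis()` reports excess kurtosis (Fisher's definition), so a normal distribution scores about 0 rather than 3. The sketch below uses synthetic columns, not the Welltory data, to show how a symmetric and a right-skewed variable differ on these measures.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Synthetic columns (not from the dataset): one symmetric, one right-skewed
demo = pd.DataFrame({
    "symmetric": rng.normal(loc=80, scale=9, size=5_000),
    "right_skewed": rng.lognormal(mean=0.0, sigma=1.0, size=5_000),
})

# Side-by-side view of central tendency and shape measures
summary = pd.DataFrame({
    "mean": demo.mean(),
    "median": demo.median(),
    "skew": demo.skew(),
    "excess_kurtosis": demo.kurtosis(),
})
print(summary.round(2))
```

For the right-skewed column the mean sits above the median and the skewness is clearly positive, which is the pattern described for circulatory_efficiency.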

#### HRV Measurements
- Close to half of the numerical features have means and medians relatively close to each other, indicating a roughly symmetric distribution. However, by checking the skewness value, we can see that almost all of the features have a right skewed distribution. Only 3 features, namely how_feel, how_mood, and how_sleep have a symmetrical distribution. Most of the features do not have a normal distribution given the kurtosis value

- It is interesting to note that the means of the how_feel and how_mood features (answers to "How do you feel physically?" and "How is your mood?" in the post-measurement survey, respectively) are negative, indicating that, on average, the 185 participants reported a bad mood and were not feeling great physically

#### Surveys

- Most features in this table are categorical. The only numerical feature is value, with a mean of 2.364763 and a median of 2.0. Since the scales differ in meaning and are sometimes boolean, the overall mean and median of the value feature do not provide interesting insights. *It would be interesting to see the mean and median for the 'negative' scales (i.e., non-boolean scales indicating the presence of a symptom).*
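
A per-scale summary could be sketched as follows. The frame `toy_surveys` is a hypothetical stand-in for `df_surveys`; the scale names reuse ones shown earlier, but the values are illustrative.

```python
import pandas as pd

# Toy stand-in for df_surveys (values are illustrative)
toy_surveys = pd.DataFrame({
    "scale": ["S_CORONA", "S_COVID_BLUISH", "S_COVID_BLUISH", "S_COVID_COUGH", "S_COVID_COUGH"],
    "value": [2, 1, 1, 3, 5],
})

# Summarize `value` within each scale instead of across the whole table
per_scale = toy_surveys.groupby("scale")["value"].agg(["count", "mean", "median"])
print(per_scale)
```

Grouping first keeps each statistic tied to a single scale, so values from scales with different ranges are never averaged together.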

#### Weather
- The mean and median values of almost all features are close (except for precipitation intensity), indicating a roughly symmetrical distribution. It is interesting to note the min and max values (-14C and 44C respectively) for the average temperature feature. Since the std is around 7 degrees Celsius, this might indicate some extreme values in the data that might cause skewness.

### Data Quality

- Checking for duplicated rows or inconsistent values
- Checking for outliers or extreme values that need attention
- Checking if the values make sense based on the context and domain knowledge

In [140]:
#Data Quality
for i, df_name in enumerate(df_list_names):
    print(f'\nData Quality for {df_name}')
    print('-'*15)
    print(f'Duplicated Rows: {df_list[i].duplicated().sum()}')
    print('Checking for Inconsistent Values:')
    print(df_list[i].apply(lambda x: x.value_counts().index[0]).to_frame('most_frequent_values'))


Data Quality for df_blood_pressure
---------------
Duplicated Rows: 0
Checking for Inconsistent Values:
                          most_frequent_values
user_code                           a1c2e6b2eb
measurement_datetime      2020-05-11  00:27:56
diastolic                                   80
systolic                                   122
functional_changes_index                  2.56
circulatory_efficiency                  2160.0
kerdo_vegetation_index                   -32.0
robinson_index                            73.2

Data Quality for df_heart_rate
---------------
Duplicated Rows: 0
Checking for Inconsistent Values:
           most_frequent_values
user_code            35c7355282
datetime    2020-05-10 13:00:00
heart_rate                   78
is_resting                    0

Data Quality for df_hrv_measurements
---------------
Duplicated Rows: 0
Checking for Inconsistent Values:
                                                                                                        

- It is interesting to note how, across all tables, the most frequent measurement days are really close, giving us insight into when most of the data was collected (May 2020)
- There are no duplicated rows across all of the tables
- It is interesting to note that the most frequent survey response was a user not experiencing the symptom, suggesting that participants often reported not experiencing the COVID symptoms they were asked about
- The most frequent value for every feature across our tables falls within a plausible range. This check alone does not rule out outliers, however: extreme values such as the diastolic minimum of 25 seen in the descriptive statistics still deserve a closer look
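
A most-frequent-value check does not look at the tails of a distribution, so a rule-based screen can complement it. Below is a minimal sketch using the 1.5 × IQR rule; this rule and the helper name `iqr_outlier_mask` are assumptions introduced for illustration, not a method used elsewhere in this notebook. The readings are made up, except that 25 mirrors the diastolic minimum reported in the descriptive statistics.

```python
import pandas as pd

def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Illustrative diastolic readings (not real rows from the table)
diastolic = pd.Series([80, 82, 78, 85, 81, 79, 83, 25], name="diastolic")

flagged = diastolic[iqr_outlier_mask(diastolic)]
print(flagged)
```

Flagged values are candidates for review rather than automatic deletion; a very low reading may be a data-entry error or a genuine measurement.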

### Data Accuracy

We can safely assume that the data is accurate given the different steps taken by the researchers to ensure accuracy. We cannot cross check it against other sources since this data is proprietary. 

### Data Consistency

The data is mostly consistent across all sources. It is important to note how some tables keep track of the date followed by time, while others only include the date measurement.

Ensure consistency across sources, formats, and time periods
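
One way to reconcile the date-only and date-time columns is to parse both into pandas datetimes and derive a day-level key. The sketch below uses strings in the two formats printed earlier (including the double space seen in the blood pressure timestamps); the variable names are hypothetical.

```python
import pandas as pd

# Illustrative strings in the two formats seen earlier (note the double space)
bp_datetime = pd.Series(["2020-04-29  22:33:33", "2020-04-30  09:16:38"])
survey_date = pd.Series(["2020-04-23", "2020-04-25"])

# Collapse repeated whitespace, then parse with an explicit format
bp_parsed = pd.to_datetime(
    bp_datetime.str.replace(r"\s+", " ", regex=True),
    format="%Y-%m-%d %H:%M:%S",
)
survey_parsed = pd.to_datetime(survey_date, format="%Y-%m-%d")

# Truncate timestamps to midnight so both tables share a day-level key
bp_day = bp_parsed.dt.normalize()
print(bp_day)
```

Passing an explicit `format` makes parsing fail loudly on unexpected strings instead of silently guessing.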

### Data Integrity

Enforce data constraints such as unique UIDs, valid value ranges, etc.
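
Such constraints can be expressed as simple boolean checks. The following sketch runs on a hypothetical stand-in for `df_hrv_measurements`; the 30-220 bpm bounds are an assumed plausibility range chosen for illustration, not a limit documented by the dataset authors.

```python
import pandas as pd

# Toy stand-in for df_hrv_measurements (values are illustrative)
toy_hrv = pd.DataFrame({
    "rr_code": ["a1", "b2", "c3"],
    "bpm": [75, 70, 310],
    "how_feel": [0, -1, 0],
})

# Constraint 1: the measurement id should identify each row uniquely
ids_unique = toy_hrv["rr_code"].is_unique

# Constraint 2: values should fall inside a plausible range (assumed bounds)
bpm_valid = toy_hrv["bpm"].between(30, 220)
violations = toy_hrv.loc[~bpm_valid]

print(f"rr_code unique: {ids_unique}")
print(violations)
```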

# Data Preprocessing

### Data Integration

Integrate multiple sources of data together (ensuring consistent formatting)
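
A possible integration strategy, sketched here on hypothetical stand-in frames, is to join the day-aggregated tables on `user_code` and `day`, and then attach participant attributes on `user_code` alone. The choice of outer versus left joins is an assumption and depends on which rows the later analysis needs.

```python
import pandas as pd

# Toy stand-ins for df_wearables, df_sleep and df_participants (values are illustrative)
toy_wearables = pd.DataFrame({
    "user_code": ["u1", "u1", "u2"],
    "day": ["2020-02-12", "2020-02-13", "2020-02-12"],
    "steps_count": [8574.0, 7462.0, 2507.0],
})
toy_sleep = pd.DataFrame({
    "user_code": ["u1", "u2"],
    "day": ["2020-02-12", "2020-02-12"],
    "sleep_duration": [19881.0, 21068.0],
})
toy_participants = pd.DataFrame({
    "user_code": ["u1", "u2"],
    "age_range": ["25-34", "45-54"],
})

# Daily tables join on (user_code, day); an outer join keeps days present in only one source
daily = toy_wearables.merge(toy_sleep, on=["user_code", "day"], how="outer")

# Participant attributes are one row per user, so they join on user_code alone
integrated = daily.merge(toy_participants, on="user_code", how="left", validate="many_to_one")
print(integrated)
```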

### Data Cleaning

- Handle missing values
- Remove duplicates
- Handle outliers

### Data Transformation

- Normalization/standardization
- Handling skewed data via log transformation
- Feature Engineering and Encoding 
    - Create at least one new feature and document your approach
    - Perform a dimensionality reduction method on the data and discuss 
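
The transformation steps listed above can be sketched as follows. The data is synthetic (right-skewed columns loosely named after HRV features), and PCA is computed with a plain SVD as one possible dimensionality reduction method; none of this is presented as the approach the final notebook must take.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed for reproducibility

# Synthetic right-skewed features (values are not real data)
features = pd.DataFrame({
    "lf": rng.lognormal(mean=6.0, sigma=0.8, size=200),
    "hf": rng.lognormal(mean=6.0, sigma=0.9, size=200),
    "total_power": rng.lognormal(mean=7.0, sigma=0.7, size=200),
})

# 1) log1p compresses the long right tail (requires non-negative inputs)
logged = np.log1p(features)

# 2) z-score standardization: zero mean and unit variance per column
standardized = (logged - logged.mean()) / logged.std(ddof=0)

# 3) PCA via SVD of the centered, scaled matrix
matrix = standardized.to_numpy()
_, singular_values, components = np.linalg.svd(matrix, full_matrices=False)
explained_variance_ratio = singular_values**2 / np.sum(singular_values**2)
projected = matrix @ components[:2].T  # keep the first two principal components

print("Skew before:", features.skew().round(2).to_dict())
print("Skew after: ", logged.skew().round(2).to_dict())
print("Explained variance ratio:", explained_variance_ratio.round(3))
```

Standardizing before the SVD matters because PCA is sensitive to scale: without it, the column with the largest variance would dominate the first component.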

### Handling imbalanced data