# Week 03 Assignment Covid

Welcome to **week three** of the course Programming 1. You will learn to combine data with pandas and numpy and to visualize it with bokeh. Concretely, you will preprocess the partly synthetic Covid data into an appropriate format for statistical and visual analysis. Learning outcomes:

- Load a tabular dataset
- Inspect the dataset for quality and metadata information
- Combine data from several tables into one dataframe
- Subselect specific data from dataframes
- Reshape the dataset into a format suitable for visual and statistical analysis
- Visualize data using bokeh 
- Use widgets to make the plot interactive (optional)
- Use geomap to plot locations (optional)


Your job is to **visualize the lab values of COVID-19 patients, comparing patients who survived with patients who did not**. 

The assignment consists of 5 parts:

- [part 1: load the data](#0)
     - [Exercise 1.1](#ex-11)
- [part 2: data wrangling](#1)
     - [Exercise 2.1](#ex-21)
- [part 3: more wrangling](#2)
     - [Exercise 3.1](#ex-31)
- [part 4: plot the data](#3)
     - [Exercise 4.1](#ex-41)
- [part 5: plot patient location](#4)
     - [Exercise 5.1](#ex-51)


Parts 1 and 4 are mandatory, part 5 is optional (bonus).
To pass the assignment you need a score of 60%. 


## About the data

The data is generated by Synthea's COVID-19 module. The data was constructed using three peer-reviewed publications published in the early stages of the global pandemic, when less was known, along with emerging resources, data, publications, and clinical knowledge. The simulation outputs synthetic Electronic Health Records (EHR), including the daily consumption of Personal Protective Equipment (PPE) and other medical devices and supplies. For this assignment the `conditions`, `patients`, `observations`, `careplans` and `encounters` tables will be used. The data is stored in separate tables to avoid redundancy; as a consequence, the tables need to be combined and reorganized into dataframes for analysis.

Source: Walonoski J, Klaus S, Granger E, Hall D, Gregorowicz A, Neyarapally G, Watson A, Eastman J. Synthea™ Novel coronavirus (COVID-19) model and synthetic data set. Intelligence-Based Medicine. 2020 Nov;1:100007. https://doi.org/10.1016/j.ibmed.2020.100007

Please <a href = "https://storage.googleapis.com/synthea-public/10k_synthea_covid19_csv.zip">download</a> the data

#### Covid Patients
Patients are considered Covid patients if they are identified with `CODE` `840539006`


#### Survivors
Patients who had Covid and tested negative after isolation have test code `94531-1`, SARS-CoV-2 RNA Pnl Resp NAA+probe (Covid SARS test), with a value of `Not detected (qualifier value)`. These patients are considered Covid survivors. 

#### Non-Survivors
Patients who did not survive Covid have a `DEATHDATE` that is not null. 
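The three definitions above can be sketched on toy data. This is only an illustration: the rows, the patient ids (`p1`, `p2`, `p3`) and the `Detected (qualifier value)` label are made up, and only the column names and codes come from the text above.

```python
import pandas as pd

# Toy stand-ins for conditions.csv, observations.csv and patients.csv (made-up rows)
conditions = pd.DataFrame({
    "PATIENT": ["p1", "p1", "p2", "p3"],
    "CODE": [840539006, 386661006, 840539006, 65363002],
})
observations = pd.DataFrame({
    "PATIENT": ["p1", "p2"],
    "CODE": ["94531-1", "94531-1"],
    "VALUE": ["Not detected (qualifier value)", "Detected (qualifier value)"],
})
patients = pd.DataFrame({
    "Id": ["p1", "p2", "p3"],
    "DEATHDATE": [None, "2020-03-20", None],
})

COVID_CODE = 840539006

# Covid patients: anyone with the Covid condition code
covid_ids = set(conditions.loc[conditions["CODE"] == COVID_CODE, "PATIENT"])

# Survivors: Covid patients with a negative SARS-CoV-2 test
negative_test = (observations["CODE"] == "94531-1") & (
    observations["VALUE"] == "Not detected (qualifier value)"
)
survivor_ids = set(observations.loc[negative_test, "PATIENT"]) & covid_ids

# Non-survivors: Covid patients with a DEATHDATE that is not null
deceased_ids = set(patients.loc[patients["DEATHDATE"].notna(), "Id"]) & covid_ids

print(sorted(covid_ids), sorted(survivor_ids), sorted(deceased_ids))
```

Using sets makes it easy to check afterwards whether a patient ends up in both groups.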


#### Lab values of COVID-19 patients

Patients are monitored for blood and heart conditions once they are admitted to hospital or are under treatment. The lab values of interest are as follows: 

- `48065-7`  Fibrin D-dimer FEU [Mass/volume] in Platelet poor plasma
- `26881-3`   Interleukin 6 [Mass/volume] in Serum or Plasma
- `2276-4` Ferritin [Mass/volume] in Serum or Plasma
- `89579-7` Troponin I.cardiac [Mass/volume] in Serum or Plasma by High sensitivity method
- `731-0` Lymphocytes [#/volume] in Blood by Automated count
- `14804-9` Lactate dehydrogenase [Enzymatic activity/volume] in Serum or Plasma by Lactate to pyruvate reaction


---

In [1]:
# Imports
import numpy as np
import pandas as pd
from pathlib import Path

<a name='0'></a>
## Part 1: Load the data (20 pt)

Instructions: Load the data of the following files

- conditions.csv
- patients.csv
- observations.csv
- careplans.csv
- encounters.csv

Familiarize yourself with the data and create some meaningful overviews. Answer the following questions:

1. How many patients are there?
2. How many Covid patients are there?
3. How many patients have a 'Hospital admission for isolation' encounter?
    
<details>    
<summary>
    <font size="3" color="darkgreen"><b>Hints</b></font>
</summary>
    <ul><li>use a unique dataframe for each file, use a meaningful name</li>
    <li>pandas.read_csv() method can be used to read a csv file</li>
    <li>pandas.DataFrame.head() method is often used to inspect the dataframe</li>
    <li>.unique() returns an array of the unique values of a column</li>
</ul>
</details>
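A minimal sketch of the loading and inspection steps, assuming a tiny in-memory CSV instead of the real files. The rows are made up; only the column names follow `conditions.csv`. It also shows one way to convert several date columns at once, through the `parse_dates` argument of `pandas.read_csv()`.

```python
import io
import pandas as pd

# Tiny in-memory stand-in for conditions.csv (made-up rows, real column names)
csv_text = """START,STOP,PATIENT,CODE,DESCRIPTION
2020-03-01,2020-03-30,p1,840539006,COVID-19
2020-03-01,2020-03-30,p1,386661006,Fever (finding)
2020-03-05,,p2,840539006,COVID-19
"""

# parse_dates converts several columns to datetime in one call
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["START", "STOP"])

print(df.head())        # first rows
print(df.dtypes)        # column types
print(df.isna().sum())  # missing values per column

n_rows = len(df)                      # number of records
n_patients = df["PATIENT"].nunique()  # number of distinct patients
print(n_rows, n_patients)
```

Note the difference between `len(df)` and `nunique()`: a patient with several conditions appears in several rows.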

<a name='ex-11'></a>
### 1.1 Code your solution

In [2]:
#YOUR CODE HERE
# Load in the data
df_conditions = pd.read_csv(Path("../data/covid_data/10k_synthea_covid19_csv/conditions.csv"))
df_patients = pd.read_csv(Path("../data/covid_data/10k_synthea_covid19_csv/patients.csv"))
df_observations = pd.read_csv(Path("../data/covid_data/10k_synthea_covid19_csv/observations.csv")) # This needs to be reshaped
df_careplans = pd.read_csv(Path("../data/covid_data/10k_synthea_covid19_csv/careplans.csv"))
df_encounters = pd.read_csv(Path("../data/covid_data/10k_synthea_covid19_csv/encounters.csv"))

# TODO: change the datatypes of the columns; find a way to convert multiple columns at once.

In [3]:
num_pat = df_patients.Id.nunique() # Check the number of unique patients
print(f"Number of unique patients: {num_pat}") 
print(f"Number of condition records: {len(df_conditions.PATIENT)}") # Rows in the conditions table, not unique patients

# Get number of covid patients
num_cov = len(df_conditions[df_conditions.CODE == 840539006])
print(f"Number of patients with covid: {num_cov}")

# get number of admitted patients
num_admitted = len(df_encounters.loc[df_encounters.DESCRIPTION.str.contains("Hospital admission for isolation", case = False)])
print(f"Number of admitted patients: {num_admitted}")

num_died = df_patients.DEATHDATE.notnull().sum()
print(f"Number of patients that have died: {num_died}")
# df_patients.loc[df_patients.DEATHDATE.notnull()]


Number of unique patients: 12352
Number of condition records: 114544
Number of patients with covid: 8820
Number of admitted patients: 1867
Number of patients that have died: 2352


In [4]:
print(df_conditions.PATIENT.nunique())
print(df_conditions.PATIENT.shape)
df_conditions.head()

12165
(114544,)


Unnamed: 0,START,STOP,PATIENT,ENCOUNTER,CODE,DESCRIPTION
0,2019-02-15,2019-08-01,f0f3bc8d-ef38-49ce-a2bd-dfdda982b271,d5ee30a9-362f-429e-a87a-ee38d999b0a5,65363002,Otitis media
1,2019-10-30,2020-01-30,f0f3bc8d-ef38-49ce-a2bd-dfdda982b271,8bca6d8a-ab80-4cbf-8abb-46654235f227,65363002,Otitis media
2,2020-03-01,2020-03-30,f0f3bc8d-ef38-49ce-a2bd-dfdda982b271,681c380b-3c84-4c55-80a6-db3d9ea12fee,386661006,Fever (finding)
3,2020-03-01,2020-03-01,f0f3bc8d-ef38-49ce-a2bd-dfdda982b271,681c380b-3c84-4c55-80a6-db3d9ea12fee,840544004,Suspected COVID-19
4,2020-03-01,2020-03-30,f0f3bc8d-ef38-49ce-a2bd-dfdda982b271,681c380b-3c84-4c55-80a6-db3d9ea12fee,840539006,COVID-19


In [5]:
df_patients.head()

Unnamed: 0,Id,BIRTHDATE,DEATHDATE,SSN,DRIVERS,PASSPORT,PREFIX,FIRST,LAST,SUFFIX,...,BIRTHPLACE,ADDRESS,CITY,STATE,COUNTY,ZIP,LAT,LON,HEALTHCARE_EXPENSES,HEALTHCARE_COVERAGE
0,f0f3bc8d-ef38-49ce-a2bd-dfdda982b271,2017-08-24,,999-68-6630,,,,Jacinto644,Kris249,,...,Beverly Massachusetts US,888 Hickle Ferry Suite 38,Springfield,Massachusetts,Hampden County,1106.0,42.151961,-72.598959,8446.49,1499.08
1,067318a4-db8f-447f-8b6e-f2f61e9baaa5,2016-08-01,,999-15-5895,,,,Alva958,Krajcik437,,...,Boston Massachusetts US,1048 Skiles Trailer,Walpole,Massachusetts,Norfolk County,2081.0,42.17737,-71.281353,89893.4,1845.72
2,ae9efba3-ddc4-43f9-a781-f72019388548,1992-06-30,,999-27-3385,S99971451,X53218815X,Mr.,Jayson808,Fadel536,,...,Springfield Massachusetts US,1056 Harris Lane Suite 70,Chicopee,Massachusetts,Hampden County,1020.0,42.181642,-72.608842,577445.86,3528.84
3,199c586f-af16-4091-9998-ee4cfc02ee7a,2004-01-09,,999-73-2461,S99956432,,,Jimmie93,Harris789,,...,Worcester Massachusetts US,201 Mitchell Lodge Unit 67,Pembroke,Massachusetts,Plymouth County,,42.075292,-70.757035,336701.72,2705.64
4,353016ea-a0ff-4154-85bb-1cf8b6cedf20,1996-11-15,,999-60-7372,S99917327,X58903159X,Mr.,Gregorio366,Auer97,,...,Patras Achaea GR,1050 Lindgren Extension Apt 38,Boston,Massachusetts,Suffolk County,2135.0,42.352434,-71.02861,484076.34,3043.04


In [6]:
df_observations.head()

Unnamed: 0,DATE,PATIENT,ENCOUNTER,CODE,DESCRIPTION,VALUE,UNITS,TYPE
0,2019-08-01,f0f3bc8d-ef38-49ce-a2bd-dfdda982b271,6a74fdef-2287-44bf-b9e7-18012376faca,8302-2,Body Height,82.7,cm,numeric
1,2019-08-01,f0f3bc8d-ef38-49ce-a2bd-dfdda982b271,6a74fdef-2287-44bf-b9e7-18012376faca,72514-3,Pain severity - 0-10 verbal numeric rating [Sc...,2.0,{score},numeric
2,2019-08-01,f0f3bc8d-ef38-49ce-a2bd-dfdda982b271,6a74fdef-2287-44bf-b9e7-18012376faca,29463-7,Body Weight,12.6,kg,numeric
3,2019-08-01,f0f3bc8d-ef38-49ce-a2bd-dfdda982b271,6a74fdef-2287-44bf-b9e7-18012376faca,77606-2,Weight-for-length Per age and sex,86.1,%,numeric
4,2019-08-01,f0f3bc8d-ef38-49ce-a2bd-dfdda982b271,6a74fdef-2287-44bf-b9e7-18012376faca,9843-4,Head Occipital-frontal circumference,46.9,cm,numeric


### 1.2 Test your solution
The following function needs to be called. You can use this as a test. There are however more meaningful overviews 
you can create. 

In [7]:
def part1(num_pat, num_cov, num_admitted, num_died):
    print(f'There are {num_pat} patients in total')
    print(f'There are {num_cov} covid patients')
    print(f'There are {num_admitted} admitted patients')
    print(f'{num_died} patients died')

part1(num_pat, num_cov, num_admitted, num_died)

There are 12352 patients in total
There are 8820 covid patients
There are 1867 admitted patients
2352 patients died


### Expected outcome

---

<a name='1'></a>
## Part 2: Data Wrangling: set up the dataframe (30 pt)

In this part we are going to combine data to create a dataframe with values of interest for the lab values analysis. 

We would like a dataframe containing the following information per record (Covid patients only!):

- `PATIENT` - the ID of the covid patient
- `days` - the number of days the patient is under observation
- `CODE-Y` - the code of the observation  
- `VALUE` - the lab value of the observation

where only the following observation codes need to be selected:

- `48065-7`  Fibrin D-dimer FEU [Mass/volume] in Platelet poor plasma
- `26881-3`   Interleukin 6 [Mass/volume] in Serum or Plasma
- `2276-4` Ferritin [Mass/volume] in Serum or Plasma
- `89579-7` Troponin I.cardiac [Mass/volume] in Serum or Plasma by High sensitivity method
- `731-0` Lymphocytes [#/volume] in Blood by Automated count
- `14804-9` Lactate dehydrogenase [Enzymatic activity/volume] in Serum or Plasma by Lactate to pyruvate reaction

The `days` value is not directly available and must be calculated by subtracting `START` from the observation `DATE`. 
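As a small sketch of that calculation (toy dates, not taken from the dataset): once both columns are datetimes, their difference is a timedelta, and `.dt.days` extracts the whole number of days.

```python
import pandas as pd

# Made-up example rows: diagnosis START and observation DATE as strings
df = pd.DataFrame({
    "START": ["2020-03-01", "2020-03-01"],
    "DATE": ["2020-03-01", "2020-03-07"],
})

# Typecast both columns to datetime
df["START"] = pd.to_datetime(df["START"])
df["DATE"] = pd.to_datetime(df["DATE"])

# Subtracting datetimes gives a timedelta; .dt.days keeps the whole days
df["days"] = (df["DATE"] - df["START"]).dt.days
print(df["days"].tolist())  # [0, 6]
```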

An example of such a dataframe is given below:

In [8]:
#Possible approach:

#Select all the patients with covid from the conditions table
#Combine conditions table (only covid patients) with the patient table into a covid_patient table
#Select only the relevant lab observations from the observations table into a lab_obs table
#Merge the covid_patient table with the lab_obs table into a covid_patients_obs table
#Clean the covid_patients_obs table (rename columns, select only relevant columns, sort, typecast, add days column)

<details>    
<summary>
    <font size="3" color="darkgreen"><b>Hints</b></font>
</summary>
    <ul><li>you can use pandas.DataFrame.merge() to merge dataframes</li>
    <li>df = df[(df.CODE == condition1) | (df.CODE == condition2)] selects rows whose CODE matches either of 2 values</li>
    <li>df.DATE - df.START returns a timedelta in days if DATE and START are in datetime format</li>
    <li>pd.to_datetime() can be used to typecast to datetime</li>
</ul>
</details>
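The hints above can be combined roughly as follows. This is a sketch on made-up rows, not the reference solution: the patient ids and values are invented, and `.isin()` is used as a shorter alternative to chaining several `==` comparisons with `|`.

```python
import pandas as pd

# Made-up stand-ins for the conditions and observations tables
conditions = pd.DataFrame({
    "PATIENT": ["p1", "p2", "p3"],
    "CODE": [840539006, 840539006, 65363002],
    "START": ["2020-03-01", "2020-03-05", "2019-02-15"],
})
observations = pd.DataFrame({
    "PATIENT": ["p1", "p1", "p2", "p3"],
    "CODE": ["2276-4", "8302-2", "731-0", "2276-4"],
    "DATE": ["2020-03-02", "2020-03-02", "2020-03-08", "2019-02-15"],
    "VALUE": ["332.4", "82.7", "0.9", "100.0"],
})

lab_codes = ["48065-7", "26881-3", "2276-4", "89579-7", "731-0", "14804-9"]

# Keep only Covid diagnoses and only the lab codes of interest
covid = conditions[conditions["CODE"] == 840539006]
labs = observations[observations["CODE"].isin(lab_codes)]

# Both tables have a CODE column, so pandas adds the _x / _y suffixes
merged = covid.merge(labs, on="PATIENT", how="inner", suffixes=("_x", "_y"))

# Lab values are read as text here, so typecast them to numbers
merged["VALUE"] = pd.to_numeric(merged["VALUE"])
print(merged[["PATIENT", "CODE_y", "VALUE"]])
```

The inner merge drops `p3` (no Covid diagnosis) and the body height observation `8302-2` (not a lab code of interest).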

<a name='ex-21'></a>
### 2.1 Code your solution

In [9]:
# Grab only patients with COVID
cov_conditions = df_conditions.loc[df_conditions.CODE == 840539006, :] 

# Define the relevant observation codes
observation_codes = ["48065-7", "26881-3", "2276-4", "89579-7", "731-0", "14804-9"]

# Get only the entries with relevant observation codes
relevant_obs = df_observations.loc[df_observations.CODE.isin(observation_codes), :]

# Merge the cov_conditions df with the relevant_obs df on the shared PATIENT column
covid_patients_obs = cov_conditions.merge(relevant_obs, how = "inner", on = "PATIENT")

covid_patients_obs

Unnamed: 0,START,STOP,PATIENT,ENCOUNTER_x,CODE_x,DESCRIPTION_x,DATE,ENCOUNTER_y,CODE_y,DESCRIPTION_y,VALUE,UNITS,TYPE
0,2020-02-19,2020-02-28,f58bf921-cba1-475a-b4f8-dc6fa3b8f89c,e3143bce-4a59-40aa-a198-7a9e54077fd8,840539006,COVID-19,2020-02-19,e97e8d37-7497-4c13-98fd-a4a45655c0bb,731-0,Lymphocytes [#/volume] in Blood by Automated c...,1.1,10*3/uL,numeric
1,2020-02-19,2020-02-28,f58bf921-cba1-475a-b4f8-dc6fa3b8f89c,e3143bce-4a59-40aa-a198-7a9e54077fd8,840539006,COVID-19,2020-02-19,e97e8d37-7497-4c13-98fd-a4a45655c0bb,48065-7,Fibrin D-dimer FEU [Mass/volume] in Platelet p...,0.4,ug/mL,numeric
2,2020-02-19,2020-02-28,f58bf921-cba1-475a-b4f8-dc6fa3b8f89c,e3143bce-4a59-40aa-a198-7a9e54077fd8,840539006,COVID-19,2020-02-19,e97e8d37-7497-4c13-98fd-a4a45655c0bb,2276-4,Ferritin [Mass/volume] in Serum or Plasma,332.4,ug/L,numeric
3,2020-02-19,2020-02-28,f58bf921-cba1-475a-b4f8-dc6fa3b8f89c,e3143bce-4a59-40aa-a198-7a9e54077fd8,840539006,COVID-19,2020-02-19,e97e8d37-7497-4c13-98fd-a4a45655c0bb,89579-7,Troponin I.cardiac [Mass/volume] in Serum or P...,2.3,pg/mL,numeric
4,2020-02-19,2020-02-28,f58bf921-cba1-475a-b4f8-dc6fa3b8f89c,e3143bce-4a59-40aa-a198-7a9e54077fd8,840539006,COVID-19,2020-02-19,e97e8d37-7497-4c13-98fd-a4a45655c0bb,14804-9,Lactate dehydrogenase [Enzymatic activity/volu...,223.9,U/L,numeric
...,...,...,...,...,...,...,...,...,...,...,...,...,...
73913,2020-03-08,2020-03-16,c9699449-7a8b-400a-8e86-fab6aa7134cb,9c7a5b12-a07d-406a-95b3-d7454fc59468,840539006,COVID-19,2020-03-16,57d8ff2e-b92c-4fb5-bbf4-d7d5f23382b4,731-0,Lymphocytes [#/volume] in Blood by Automated c...,0.9,10*3/uL,numeric
73914,2020-03-08,2020-03-16,c9699449-7a8b-400a-8e86-fab6aa7134cb,9c7a5b12-a07d-406a-95b3-d7454fc59468,840539006,COVID-19,2020-03-16,57d8ff2e-b92c-4fb5-bbf4-d7d5f23382b4,48065-7,Fibrin D-dimer FEU [Mass/volume] in Platelet p...,0.5,ug/mL,numeric
73915,2020-03-08,2020-03-16,c9699449-7a8b-400a-8e86-fab6aa7134cb,9c7a5b12-a07d-406a-95b3-d7454fc59468,840539006,COVID-19,2020-03-16,57d8ff2e-b92c-4fb5-bbf4-d7d5f23382b4,2276-4,Ferritin [Mass/volume] in Serum or Plasma,525.2,ug/L,numeric
73916,2020-03-08,2020-03-16,c9699449-7a8b-400a-8e86-fab6aa7134cb,9c7a5b12-a07d-406a-95b3-d7454fc59468,840539006,COVID-19,2020-03-16,57d8ff2e-b92c-4fb5-bbf4-d7d5f23382b4,89579-7,Troponin I.cardiac [Mass/volume] in Serum or P...,3.0,pg/mL,numeric


#### Clean data frame

In [10]:
# Columns to keep
keep = ["days", "PATIENT", "CODE_y", "VALUE", "UNITS"]

try:
    # Change dtype of START and DATE
    covid_patients_obs["START"] = pd.to_datetime(covid_patients_obs["START"])
    covid_patients_obs["DATE"] = pd.to_datetime(covid_patients_obs["DATE"])

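    # Whole days between diagnosis START and observation DATE (astype('timedelta64[D]') fails on pandas 2.x)
    covid_patients_obs["days"] = covid_patients_obs["DATE"].sub(covid_patients_obs["START"]).dt.days.astype("float64")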
    # Keep only relevant columns
    covid_patients_obs = covid_patients_obs.loc[:,keep]
    
    # Rename columns (PATIENT is the merge key, so it carries no suffix)
    covid_patients_obs.rename(columns = {"CODE_y":"CODE-Y"}, inplace = True)

    # Set patient as index
    covid_patients_obs.set_index("PATIENT", inplace = True)

    # Sort the index 
    covid_patients_obs.sort_index(inplace = True)
    
    # Change the dtype of VALUE to float64
    covid_patients_obs["VALUE"] = covid_patients_obs["VALUE"].astype('float64')
except KeyError as e:
    print(f"Dataframe has already been cleaned, error: {e}")
    
    
covid_patients_obs

PATIENT,days,CODE-Y,VALUE,UNITS
00079a57-24a8-430f-b4f8-a1cf34f90060,6.0,89579-7,2.3,pg/mL
00079a57-24a8-430f-b4f8-a1cf34f90060,6.0,2276-4,463.9,ug/L
00079a57-24a8-430f-b4f8-a1cf34f90060,6.0,48065-7,0.5,ug/mL
00079a57-24a8-430f-b4f8-a1cf34f90060,6.0,731-0,0.8,10*3/uL
00079a57-24a8-430f-b4f8-a1cf34f90060,5.0,731-0,1.0,10*3/uL
...,...,...,...,...
ffdbbb1b-745e-4e38-ade2-a19d6e778fee,6.0,14804-9,247.1,U/L
ffdbbb1b-745e-4e38-ade2-a19d6e778fee,7.0,731-0,0.9,10*3/uL
ffdbbb1b-745e-4e38-ade2-a19d6e778fee,8.0,731-0,0.8,10*3/uL
ffdbbb1b-745e-4e38-ade2-a19d6e778fee,2.0,89579-7,1.8,pg/mL


<a name='2'></a>
## Part 3: Data Wrangling, split into survived and not survived (10 pt)

Now that we have the required data, we would like to split it into survived and not survived. First we fetch the ids of the survived and deceased patients. We can use these ids to select the records of the patients who survived and of the patients who did not survive.

Your job is to split the data into survived and not survived records. There are multiple ways to do this. One way is the `.isin()` method.
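A minimal sketch of the `.isin()` approach, assuming the observations are indexed by `PATIENT`. The ids and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up observations indexed by patient id
obs = pd.DataFrame(
    {"days": [0, 6, 2], "CODE-Y": ["731-0", "731-0", "2276-4"], "VALUE": [1.1, 0.8, 525.2]},
    index=pd.Index(["p1", "p1", "p2"], name="PATIENT"),
)
survivor_ids = np.array(["p1"])
deceased_ids = pd.Series(["p2"])

# Index.isin() returns a boolean mask that selects the matching rows
df_survived = obs[obs.index.isin(survivor_ids)]
df_died = obs[obs.index.isin(deceased_ids)]

# Sanity check: ids that appear in both groups
overlap = set(survivor_ids) & set(deceased_ids)
print(len(df_survived), len(df_died), overlap)
```

Checking the overlap is worthwhile on the real data, since the two id lists are built from different tables.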

In [11]:
#the following code is given, RUN THIS CELL
#get survived and deceased ids
completed_isolation_patients = df_careplans[(df_careplans.CODE == 736376001) & (df_careplans.STOP.notna()) \
                                          & (df_careplans.REASONCODE == 840539006)].PATIENT
negative_covid_patient_ids = df_observations[(df_observations.CODE == '94531-1') \
                                          & (df_observations.VALUE == 'Not detected (qualifier value)')].PATIENT.unique()
survivor_ids = np.union1d(completed_isolation_patients, negative_covid_patient_ids)
deceased_ids = df_patients[df_patients.DEATHDATE.notna()].Id

<a name='ex-31'></a>
### 3.1 Code your solution

In [12]:
#YOUR CODE HERE
print(len(completed_isolation_patients), len(negative_covid_patient_ids), len(survivor_ids), len(deceased_ids))

df_survived = covid_patients_obs[covid_patients_obs.index.isin(survivor_ids)]
survived = len(df_survived)

df_died = covid_patients_obs[covid_patients_obs.index.isin(deceased_ids)]
died = len(df_died)

7001 1808 8759 2352


### 3.2 Test your solution

In [13]:
def test3(survived, died):
    print(f'patients records survived: {survived}, patients records deceased {died}')
#call the test3
test3(survived, died)

patients records survived: 57303, patients records deceased 16793


#### Expected outcome

---

<a name='3'></a>
## Part 4: Plot the data (20 pt)

Create plots with the lab data, for each code one plot. Separate the survivors and the deceased by color. An example of such a plot is given below. You can create 6 plots in one grid (for each code one plot) or use a widget (for instance a drop down menu widget) to select a lab CODE. Plot on the x-axis the days, on the y-axis the VALUE. Use proper labels, titles and legends.

<img src="../images/week3_plot.png" width="500" height="500"/>
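The grid variant could look roughly like the sketch below. It assumes two dataframes with `days`, `CODE-Y` and `VALUE` columns; the helper name `build_grid` and the toy rows are hypothetical and only serve the illustration.

```python
import pandas as pd
from bokeh.layouts import gridplot
from bokeh.plotting import figure


def build_grid(df_survived, df_died, codes, ncols=3):
    """Return a grid layout with one scatter figure per lab code."""
    figures = []
    groups = [(df_survived, "green", "Survived"), (df_died, "red", "Deceased")]
    for code in codes:
        p = figure(title=code, x_axis_label="Time in days", y_axis_label="VALUE")
        for frame, color, label in groups:
            subset = frame[frame["CODE-Y"] == code]
            p.scatter(subset["days"].tolist(), subset["VALUE"].tolist(),
                      color=color, size=6, legend_label=label)
        # Clicking a legend entry hides that group
        p.legend.click_policy = "hide"
        figures.append(p)
    return gridplot(figures, ncols=ncols), figures


# Made-up records for two lab codes
toy_survived = pd.DataFrame({"days": [0, 3], "CODE-Y": ["731-0", "2276-4"], "VALUE": [1.1, 332.4]})
toy_died = pd.DataFrame({"days": [1, 4], "CODE-Y": ["731-0", "2276-4"], "VALUE": [0.6, 900.0]})

grid, figs = build_grid(toy_survived, toy_died, ["731-0", "2276-4"], ncols=2)
```

In a notebook, calling `show(grid)` after `output_notebook()` should render the layout.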

<a name='ex-41'></a>
### 4.1 Code your solution

In [14]:
# IMPORTS
import re  # the standard library module is sufficient for the pattern used below

import panel as pn
from bokeh.io import output_notebook
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, show

output_notebook()
pn.extension()

In [15]:
# Create a dictionary for the description of the codes
code_description = {"48065-7": "Fibrin D-dimer FEU [Mass/volume] in Platelet poor plasma",
    "26881-3": "Interleukin 6 [Mass/volume] in Serum or Plasma",
    "2276-4": "Ferritin [Mass/volume] in Serum or Plasma",
    "89579-7": "Troponin I.cardiac [Mass/volume] in Serum or Plasma by High sensitivity method",
    "731-0": "Lymphocytes [#/volume] in Blood by Automated count",
    "14804-9": "Lactate dehydrogenase [Enzymatic activity/volume] in Serum or Plasma by Lactate to pyruvate reaction"}

# Create a dictionary for the codes of the descriptions
description_codes = {"Fibrin D-dimer FEU in Platelet poor plasma": "48065-7",
    "Interleukin 6 in Serum or Plasma":"26881-3",
    "Ferritin in Serum or Plasma": "2276-4",
    "Troponin I.cardiac in Serum or Plasma by High sensitivity method":"89579-7",
    "Lymphocytes in Blood by Automated count":"731-0",
    "Lactate dehydrogenase in Serum or Plasma by Lactate to pyruvate reaction":"14804-9"}

In [16]:
# Create a dropdown menu
select = pn.widgets.Select(name='Select lab values', options=description_codes)

def create_plot(code):
    """
    Create a plot for the lab data against the days for both patients that have survived and who have died.
    
    :parameters
    -----------
    code - String
        Selected code which represents lab data
        
    :returns
    --------
    p - bokeh.plotting.figure
        figure object
    """
    p = figure(title = code_description[code],plot_width = 750, plot_height = 400, tools="pan, hover, zoom_in, zoom_out, yzoom_in, yzoom_out")

    # Get the df of people who died and survived
    died = df_died[df_died["CODE-Y"] == code]
    survived = df_survived[df_survived["CODE-Y"] == code]
    
    # Create the points for patient who have died
    points = p.scatter(died.days, died.VALUE, color = "red", marker = "dot", size = 10, legend_label = "Deceased")
    # Create the points for patient who survived
    points2 = p.scatter(survived.days, survived.VALUE, color = "green", marker = "dot", size = 10, legend_label = "Survived")

    # Set labels
    p.xaxis.axis_label = 'Time in days'
    # Use regex to grab the info about what was measured
    y_label = re.search(r"(?<=\[)(.*)(?=\])",code_description[code])[1] 
    p.yaxis.axis_label = f"{y_label} ({died.UNITS.iloc[0]})" # Use the first row of the data frame to grab the unit

    # Make legend interactive
    p.legend.location = "top_left"
    p.legend.click_policy="hide"

    return p
    
layout = pn.interact(create_plot, code = select)
pn.Row(pn.Column(layout[0], layout[1]))

### Example: using widget as a decorator

In [17]:
"""
This is an example of how to use a widget as a decorator.
"""

# Create a dropdown menu
select2 = pn.widgets.Select(name='Select lab values', options=description_codes)

@pn.depends(select2)
def create_plot2(code):
    """
    Create a plot for the lab data against the days for both patients that have survived and who have died.
    
    :parameters
    -----------
    code - String
        Selected code which represents lab data
        
    :returns
    --------
    p - bokeh.plotting.figure
        figure object
    """
    p = figure(title = code_description[code],plot_width = 750, plot_height = 400, tools="pan, hover, zoom_in, zoom_out, yzoom_in, yzoom_out")

    # Get the df of people who died and survived
    died = df_died[df_died["CODE-Y"] == code]
    survived = df_survived[df_survived["CODE-Y"] == code]
    
    # Create the points for patient who have died
    points = p.scatter(died.days, died.VALUE, color = "red", marker = "dot", size = 10, legend_label = "Deceased")
    # Create the points for patient who survived
    points2 = p.scatter(survived.days, survived.VALUE, color = "green", marker = "dot", size = 10, legend_label = "Survived")

    # Set labels
    p.xaxis.axis_label = 'Time in days'
    # Use regex to grab the info about what was measured
    y_label = re.search(r"(?<=\[)(.*)(?=\])",code_description[code])[1] 
    p.yaxis.axis_label = f"{y_label} ({died.UNITS.iloc[0]})" # Use the first row of the data frame to grab the unit

    # Make legend interactive
    p.legend.location = "top_left"
    p.legend.click_policy="hide"

    return p
    

pn.Column(
    pn.Column(select2),
    create_plot2
)

<a name='4'></a>
## Part 5: Plot the location of the patients (10 pt)

This is a bonus part. Can you plot the patients location on a map? See also 
https://docs.bokeh.org/en/latest/docs/user_guide/geo.html
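Tile-based maps expect Web Mercator coordinates rather than latitude and longitude in degrees. A minimal conversion sketch follows; the function name `to_web_mercator` is hypothetical, and the first coordinate pair is only an illustrative point in Massachusetts.

```python
import numpy as np

EARTH_RADIUS_M = 6378137.0  # radius of the sphere used by Web Mercator, in meters


def to_web_mercator(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to Web Mercator x/y in meters."""
    lon = np.asarray(lon_deg, dtype=float)
    lat = np.asarray(lat_deg, dtype=float)
    x = EARTH_RADIUS_M * np.radians(lon)
    y = EARTH_RADIUS_M * np.log(np.tan(np.pi / 4 + np.radians(lat) / 2))
    return x, y


# Illustrative point in Massachusetts, plus the origin as a sanity check
x, y = to_web_mercator([-71.0589, 0.0], [42.3601, 0.0])
print(x, y)
```

The resulting `x` and `y` arrays can be used directly as the data columns and as the basis for the figure's `x_range` and `y_range`.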


<a name='ex-51'></a>
### 5.1 Code your solution

In [18]:
# Prepare data

# Columns you want to keep
col_to_keep = ["Id", "LAT", "LON"]

# patient_loc = covid_patients_obs.merge(df_patients.set_index("Id"), how = "inner", left_index = True, right_index = True)[col_to_keep]
patients_loc = df_patients[col_to_keep]
patients_loc_df = patients_loc.sort_values("Id")
patients_loc_df.reset_index(inplace = True, drop = True)
patients_loc_df

Unnamed: 0,Id,LAT,LON
0,0000b247-1def-417a-a783-41c8682be022,42.018180,-71.353040
1,00049ee8-5953-4edd-a277-b9c1b1a7f16b,42.383846,-71.315920
2,000769a6-23a7-426e-a264-cb0e509b2da2,41.531008,-70.999786
3,00079a57-24a8-430f-b4f8-a1cf34f90060,42.268454,-73.314909
4,0008a63c-c95c-46c2-9ef3-831d68892019,42.371697,-71.091808
...,...,...,...
12347,ffd3d544-1fcd-4a87-9514-fa6c37409cbc,42.247341,-71.092014
12348,ffd86fda-ebb9-400e-9fe3-ea1a1037dbad,41.730103,-71.195539
12349,ffdbbb1b-745e-4e38-ade2-a19d6e778fee,42.487156,-70.926180
12350,ffdf0900-bc4b-4f81-b95b-1ea57da21e07,42.053228,-71.121088


In [31]:
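# Create a new column stating which patients survived and which have passed away.
# Patients in neither id list (e.g. patients without Covid) are labeled "Unknown" rather than "Deceased".
patients_loc_df["STATUS"] = np.select(
    [patients_loc_df["Id"].isin(survivor_ids), patients_loc_df["Id"].isin(deceased_ids)],
    ["Survived", "Deceased"],
    default="Unknown",
)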
statuses = ["Survived", "Deceased"]

k = 6378137
lat = np.log(np.tan((90 + (patients_loc_df["LAT"].values)) * np.pi/360)) * k

lon = (patients_loc_df["LON"].values) * (k * np.pi / 180.0)

source = ColumnDataSource(data = dict(lat=lat, lon=lon,
                                     STATUS=patients_loc_df.STATUS.values))



In [32]:
from bokeh.tile_providers import CARTODBPOSITRON, get_provider
from bokeh.transform import factor_cmap

tile_provider = get_provider(CARTODBPOSITRON)

# range bounds supplied in web mercator coordinates, taken from the patient locations
p = figure(x_range=(np.nanmin(lon), np.nanmax(lon)), y_range=(np.nanmin(lat), np.nanmax(lat)),
           x_axis_type="mercator", y_axis_type="mercator")
p.add_tile(tile_provider)

p.scatter(x="lon", y="lat", size=5, legend_field = "STATUS", marker = "circle", fill_alpha=0.8, source=source,
        color = factor_cmap("STATUS", ["Green", "Red"], statuses))

show(p)




### GEO map with folium. It works but it is extremely slow and laggy.

In [None]:
#YOUR CODE HERE
# import folium
# # USE geo pandas packages or folium


# # Location USA
# USA = [37.0902, -95.7129]


# m=folium.Map(location=USA, zoom_start = 4)

# for i in range(len(patients_loc_df)):
#     folium.Marker([patients_loc_df.iloc[i]["LAT"], patients_loc_df.iloc[i]["LON"]],
#                  popup = patients_loc_df.iloc[i]["Id"]).add_to(m)
# m