# UNCOVER Dataset (UNCOVER Covid-19 Challenge) 
##### Kaggle challenge: https://www.kaggle.com/roche-data-science-coalition/uncover
##### Download: https://www.kaggle.com/roche-data-science-coalition/uncover

In [91]:
import pandas as pd

## Mobility datasets
### 1.Subset name: apple_mobility_trends 
###### Source: Apple

In [92]:
# sample data
data = pd.read_csv('data/UNCOVER COVID-19 Challenge/UNCOVER/apple_mobility_trends/mobility-trends.csv')
data.head()

Unnamed: 0,geo_type,region,transportation_type,date,value
0,country/region,Albania,driving,2020-01-13,100.0
1,country/region,Albania,driving,2020-01-14,95.3
2,country/region,Albania,driving,2020-01-15,101.43
3,country/region,Albania,driving,2020-01-16,97.2
4,country/region,Albania,driving,2020-01-17,103.55


##### Attributes/Description:  
It shows the relative volume of directions requests per country/region, sub-region, or city, compared to a baseline volume on January 13th, 2020.
- geo_type: type of the location (e.g. country/region)
- region: name of the location
- transportation_type: transportation type of the directions requests (e.g. driving)
- date: date
- value: volume relative to the baseline volume on January 13th, 2020 (100*volume_perday/volume_on0113)
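
The `value` formula above can be sketched as follows. This is only an illustration: the raw daily counts (`volume_perday`) are hypothetical numbers, since the file contains only the relative values.

```python
import pandas as pd

# Hypothetical raw daily request counts (not part of the dataset).
raw = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-13", "2020-01-14", "2020-01-15"]),
    "volume_perday": [2000, 1906, 2030],
})

# Volume on the baseline day, January 13th, 2020.
is_baseline_day = raw["date"] == pd.Timestamp("2020-01-13")
volume_on0113 = raw.loc[is_baseline_day, "volume_perday"].iloc[0]

# Relative volume: 100 on the baseline day, below 100 when there are fewer requests.
raw["value"] = 100 * raw["volume_perday"] / volume_on0113
print(raw)
```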

##### More: https://www.apple.com/covid19/mobility

(Includes data for Tacoma and Seattle.)

### 2.Subset name: google_mobility
###### Source: Google

In [93]:
# sample data
data = pd.read_csv('data/UNCOVER COVID-19 Challenge/UNCOVER/google_mobility/us-mobility.csv')
data.head()

Unnamed: 0,state,county,date,retail,grocery_and_pharmacy,parks,transit_stations,workplaces,residential
0,Total,Total,2020-02-15,6.0,2.0,15.0,3.0,2.0,-1.0
1,Total,Total,2020-02-16,7.0,1.0,16.0,2.0,0.0,-1.0
2,Total,Total,2020-02-17,6.0,0.0,28.0,-9.0,-24.0,5.0
3,Total,Total,2020-02-18,0.0,-1.0,6.0,1.0,0.0,1.0
4,Total,Total,2020-02-19,2.0,0.0,8.0,1.0,1.0,0.0


In [94]:
data.tail()

Unnamed: 0,state,county,date,retail,grocery_and_pharmacy,parks,transit_stations,workplaces,residential
155055,Wyoming,Weston County,2020-04-06,,,,,-39.0,
155056,Wyoming,Weston County,2020-04-07,,,,,-37.0,
155057,Wyoming,Weston County,2020-04-08,,,,,-36.0,
155058,Wyoming,Weston County,2020-04-09,,,,,-31.0,
155059,Wyoming,Weston County,2020-04-10,,,,,-43.0,


##### Attributes/Description: 
These datasets show how visits and length of stay at different places change compared to a baseline. \
The baseline is the median value, for the corresponding day of the week, during the 5-week period Jan 3–Feb 6, 2020.
- Positive values mean more people are visiting or staying at the place than at baseline; negative values mean fewer people are doing so.
- NaN means Google does not have enough data for that place and day.
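
A minimal sketch of the baseline comparison described above, assuming the reported values are percent changes from the day-of-week baseline. The daily visit counts are hypothetical; the dataset contains only the resulting changes.

```python
import pandas as pd

# Hypothetical daily visit counts for one place category (not part of the dataset).
dates = pd.date_range("2020-01-03", "2020-02-20", freq="D")
visits = pd.Series(range(100, 100 + len(dates)), index=dates, dtype=float)

# Baseline: median per day of the week over the 5-week period Jan 3 - Feb 6, 2020.
window = visits.loc["2020-01-03":"2020-02-06"]
baseline = window.groupby(window.index.dayofweek).median()

# Look up the baseline that matches each date's day of the week.
day_of_week = pd.Series(visits.index.dayofweek, index=visits.index)
baseline_per_day = day_of_week.map(baseline)

# Percent change from baseline: positive means more visits than usual.
change = 100 * (visits - baseline_per_day) / baseline_per_day
print(change.tail())
```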


- state: state	
- county: county	
- date: date
- retail: Mobility trends for places like restaurants, cafes, shopping centers, theme parks, museums, libraries, and movie theaters.  
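- grocery_and_pharmacy: Mobility trends for places like grocery markets, food warehouses, farmers markets, specialty food shops, drug stores, and pharmacies.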
- parks: parks
- transit_stations: Mobility trends for places like public transport hubs such as subway, bus, and train stations.
- workplaces: workplaces
- residential: residential
##### More: https://www.google.com/covid19/mobility/index.html?hl=en

### 3.Subset name: descartes_labs
###### Source: Descartes Labs

In [95]:
# sample data
data = pd.read_csv('data/UNCOVER COVID-19 Challenge/UNCOVER_v4/UNCOVER/descartes_labs/us-mobility-data.csv')
data.head()

Unnamed: 0,date,country_code,admin_level,admin1,admin2,fips,samples,m50,m50_index
0,2020-03-01,US,1,Alabama,,1.0,133826,8.331,79
1,2020-03-02,US,1,Alabama,,1.0,143632,10.398,98
2,2020-03-03,US,1,Alabama,,1.0,146009,10.538,100
3,2020-03-04,US,1,Alabama,,1.0,149352,10.144,96
4,2020-03-05,US,1,Alabama,,1.0,144109,10.982,104


##### Attributes/Description: 
This dataset represents the distance (in km) a typical member of a given population moves in a day, at the US admin1 (state) and admin2 (county) level. \
A technical report describing the motivation behind this work with methodology and definitions is available at arxiv.org/pdf/2003.14228.pdf. 

- country_code: ISO 3166-1 alpha-2 code.
- admin_level: 0 for country, 1 for admin1, 2 for admin2 granularity.
- admin1: GeoNames ADM1 feature name for the first-order administrative division, such as a state in the United States.
- admin2: GeoNames ADM2 feature name for the second-order administrative division, such as a county or borough in the United States.
- fips: FIPS code, a standard geographic identifier, to make it easier to combine this data with other data sets.
- samples: The number of samples observed in the specified region.
- m50: The median of the max-distance mobility of all samples in the specified region.
- m50_index: The percent of normal m50 in the region, with normal m50 defined during 2020-02-17 to 2020-03-07. (A value above 100 means that people are moving farther than normal.)
- m50_index = 100*(m50/m50_normal) 
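
As a small worked example of the formula above: the helper name and the `m50_normal` value below are hypothetical, since the file ships only `m50` and `m50_index`.

```python
def m50_index(m50: float, m50_normal: float) -> float:
    """Percent of normal mobility: 100 * (m50 / m50_normal)."""
    return 100 * m50 / m50_normal


# Hypothetical normal m50 (km) for a region over 2020-02-17 to 2020-03-07.
m50_normal = 10.5

# A median max-distance of 8.4 km is about 80% of normal.
index = m50_index(8.4, m50_normal)
print(round(index))
```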

##### More: https://mktg.descarteslabs.com/mobility-tracking

### 4.Subset name: IHME (projected dataset)
###### Source: IHME

In [96]:
# sample data
data = pd.read_csv('data/IHME_2020_07_04/Reference_hospitalization_all_locs.csv')
data.iloc[20000:20010]

Unnamed: 0,V1,location_name,date,allbed_mean,allbed_lower,allbed_upper,ICUbed_mean,ICUbed_lower,ICUbed_upper,InvVen_mean,...,totdea_lower_smoothed,totdea_upper_smoothed,mobility_data_type,mobility_composite,total_tests_data_type,total_tests,confirmed_infections,est_infections_mean,est_infections_lower,est_infections_upper
20000,20001,Community of Madrid,2020-06-27,191.999808,0.0,342.515625,62.904795,0.0,103.575893,56.15755,...,8886.066117,9067.656401,observed,-27.62071,projected,6299.85019,,499.588918,296.622079,867.286267
20001,20002,Community of Madrid,2020-06-28,186.353176,0.0,340.0,61.060436,0.0,104.287946,54.532311,...,8889.264451,9077.263614,projected,-27.62071,projected,6349.067769,,498.633725,295.558177,866.315587
20002,20003,Community of Madrid,2020-06-29,180.696066,0.0,335.557292,59.352864,0.0,104.666667,53.05802,...,8892.45427,9087.336618,projected,-27.62071,projected,6398.285349,,497.39406,294.262977,866.26
20003,20004,Community of Madrid,2020-06-30,176.407253,0.0,336.890625,57.957551,0.0,105.333333,51.803396,...,8896.009958,9097.025518,projected,-27.62071,projected,6447.502929,,496.046819,291.642693,864.605706
20004,20005,Community of Madrid,2020-07-01,174.273243,0.0,340.638787,56.762361,0.0,103.333333,50.796514,...,8899.559354,9105.651534,projected,-27.62071,projected,6496.720508,,494.85921,290.826864,865.346474
20005,20006,Community of Madrid,2020-07-02,172.963891,0.0,342.875,55.789251,0.0,103.534926,50.037137,...,8903.113987,9114.172283,projected,-27.62071,projected,6545.938088,,493.88989,290.148599,867.286596
20006,20007,Community of Madrid,2020-07-03,172.432335,0.0,344.303571,55.010803,0.0,103.338235,49.440099,...,8906.679276,9122.596815,projected,-27.62071,projected,6595.155667,,492.58836,287.862098,868.422662
20007,20008,Community of Madrid,2020-07-04,172.084039,0.0,341.770588,54.674403,0.0,103.333333,49.221296,...,8910.262872,9132.664583,projected,-27.62071,projected,6644.373247,,490.897684,285.331301,868.616211
20008,20009,Community of Madrid,2020-07-05,171.853414,0.0,341.0375,54.482353,0.0,103.125,49.14998,...,8913.775212,9142.076191,projected,-27.62071,projected,6693.590827,,489.071498,283.920843,868.422491
20009,20010,Community of Madrid,2020-07-06,172.403736,0.0,340.015625,54.398157,0.0,103.125,49.160757,...,8917.122834,9150.45848,projected,-27.62071,projected,6742.808406,,487.104737,282.94335,867.829848


##### Summary: 
IHME has developed projections for total and daily deaths, daily infections and testing, hospital resource use, and social distancing due to COVID-19 for a number of countries. Forecasts at the subnational level are included for select countries.

The projections for total deaths, daily deaths, and daily infections and testing each include a reference scenario and two additional scenarios:
 - Current projection (reference scenario), which assumes social distancing mandates are re-imposed for 6 weeks whenever daily deaths reach 8 per million (0.8 per 100k).
 - Mandates easing, which reflects continued easing of social distancing mandates, and mandates are not re-imposed.
 - Universal Masks, which reflects 95% mask usage in public in every location.

Hospital resource use forecasts are based on the Current projection scenario. Social distancing forecasts are based on the Mandates easing scenario.

These projections are produced with a model that incorporates data on observed COVID-19 deaths, hospitalizations, and cases, information about social distancing and other protective measures, mobility, and other factors. They include uncertainty intervals and are being updated daily with new data. These forecasts were developed in order to provide hospitals, policy makers, and the public with crucial information about how expected need aligns with existing resources, so that cities and countries can best prepare.

 - (This file focuses on healthcare resource estimation.)

 - location_name: Name of the country or subnational location 
 - date: Date
 - allbed_mean: Mean covid beds needed by day
 - allbed_lower: Lower uncertainty bound of covid beds needed by day
 - allbed_upper: Upper uncertainty bound of covid beds needed by day
 - ICUbed_mean: Mean ICU covid beds needed by day
 - ICUbed_lower: Lower uncertainty bound of ICU covid beds needed by day
 - ICUbed_upper: Upper uncertainty bound of ICU covid beds needed by day
 - InvVen_mean: Mean invasive ventilation needed by day
 - InvVen_lower: Lower uncertainty bound of invasive ventilation needed by day
 - InvVen_upper: Upper uncertainty bound of invasive ventilation needed by day 
 - admis_mean: Mean hospital admissions by day
 - admis_lower: Lower uncertainty bound of hospital admissions by day
 - admis_upper: Upper uncertainty bound of hospital admissions by day
 - newICU_mean: Mean number of new people going to the ICU by day
 - newICU_lower: Lower uncertainty bound of the number of new people going to the ICU by day
 - newICU_upper: Upper uncertainty bound of the number of new people going to the ICU by day
 - bedover_mean: [covid all beds needed] - ([total bed capacity] - [average all bed usage])
 - bedover_lower: Lower uncertainty bound of bedover (above)  
 - bedover_upper: Upper uncertainty bound of bedover (above)
 - icuover_mean: [covid ICU beds needed] - ([total ICU capacity] - [average ICU bed usage])
 - icuover_lower: Lower uncertainty bound of icuover (above)
 - icuover_upper: Upper uncertainty bound of icuover (above)
 - deaths_mean: Mean daily covid deaths
 - deaths_lower: Lower uncertainty bound of daily covid deaths
 - deaths_upper: Upper uncertainty bound of daily covid deaths
 - totdea_mean: Mean cumulative covid deaths
 - totdea_lower: Lower uncertainty bound of cumulative covid deaths
 - totdea_upper: Upper uncertainty bound of cumulative covid deaths
 - deaths_mean_smoothed:  Mean daily covid deaths (smoothed)
 - deaths_lower_smoothed: Lower uncertainty bound of daily covid deaths (smoothed)
 - deaths_upper_smoothed: Upper uncertainty bound of daily covid deaths (smoothed)
 - totdea_mean_smoothed: Mean cumulative covid deaths (smoothed)
 - totdea_lower_smoothed: Lower uncertainty bound of cumulative covid deaths (smoothed)
 - totdea_upper_smoothed: Upper uncertainty bound of cumulative covid deaths (smoothed)
 - mobility_data_type: Indicator of whether mobility composite is observed / projected
 - mobility_composite: Mobility composite score
 - total_tests_data_type: Indicator of whether total tests is observed or projected
 - total_tests: Total tests
 - confirmed_infections: Observed data only (confirmed infections)
 - est_infections_mean: Mean estimated infections
 - est_infections_lower: Lower uncertainty bound of estimated infections
 - est_infections_upper: Upper uncertainty bound of estimated infections
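
The `bedover_mean` definition in the list above can be illustrated with a short sketch. The capacity and average-usage figures are hypothetical, and the column names `total_bed_capacity` and `average_all_bed_usage` are made up for this example; they are not columns of the file loaded above.

```python
import pandas as pd

# Hypothetical numbers for two days at one location.
beds = pd.DataFrame({
    "allbed_mean": [1200.0, 300.0],             # covid beds needed
    "total_bed_capacity": [2000.0, 2000.0],     # assumed capacity
    "average_all_bed_usage": [1500.0, 1500.0],  # assumed usual occupancy
})

# Beds normally free for covid patients.
available = beds["total_bed_capacity"] - beds["average_all_bed_usage"]

# [covid all beds needed] - ([total bed capacity] - [average all bed usage])
beds["bedover_mean"] = beds["allbed_mean"] - available
print(beds)
```

Under the formula as written, a positive result means demand exceeds the beds normally available. The same arithmetic applies to `icuover_mean` with the ICU quantities.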


##### Download the data and data documentation: https://covid19.healthdata.org/united-states-of-america/florida
#### Papers:
 - Forecasting the impact of the first wave of the COVID-19 pandemic on hospital demand and deaths for the USA and European Economic Area countries, https://www.medrxiv.org/content/10.1101/2020.04.21.20074732v1
 - Forecasting COVID-19 impact on hospital bed-days, ICU-days, ventilator-days and deaths by US state in the next 4 months, https://www.medrxiv.org/content/10.1101/2020.03.27.20043752v1

## Individual medical record datasets 
### 5.Subset name: einstein
###### Source: the Hospital Israelita Albert Einstein, at São Paulo, Brazil

In [97]:
# sample data
data = pd.read_csv('data/UNCOVER COVID-19 Challenge/UNCOVER/einstein/diagnosis-of-covid-19-and-its-clinical-spectrum.csv')
data.head()

Unnamed: 0,patient_id,patient_age_quantile,sars_cov_2_exam_result,patient_addmited_to_regular_ward_1_yes_0_no,patient_addmited_to_semi_intensive_unit_1_yes_0_no,patient_addmited_to_intensive_care_unit_1_yes_0_no,hematocrit,hemoglobin,platelets,mean_platelet_volume,...,hb_saturation_arterial_blood_gases,pco2_arterial_blood_gas_analysis,base_excess_arterial_blood_gas_analysis,ph_arterial_blood_gas_analysis,total_co2_arterial_blood_gas_analysis,hco3_arterial_blood_gas_analysis,po2_arterial_blood_gas_analysis,arteiral_fio2,phosphor,cto2_arterial_blood_gas_analysis
0,44477f75e8169d2,13,negative,f,f,f,,,,,...,,,,,,,,,,
1,126e9dd13932f68,17,negative,f,f,f,0.236515,-0.02234,-0.517413,0.010677,...,,,,,,,,,,
2,a46b4402a0e5696,8,negative,f,f,f,,,,,...,,,,,,,,,,
3,f7d619a94f97c45,5,negative,f,f,f,,,,,...,,,,,,,,,,
4,d9e41465789c2b5,15,negative,f,f,f,,,,,...,,,,,,,,,,


##### Attributes/Description: 
This dataset contains anonymized data from patients seen at the Hospital Israelita Albert Einstein, at São Paulo, Brazil, and who had samples collected to perform the SARS-CoV-2 RT-PCR and additional laboratory tests during a visit to the hospital.

- patient_id: patient ID
- patient_age_quantile: age quantile of the patient (ranges from 0 to 20)
- sars_cov_2_exam_result: COVID-19 test result
- patient_addmited_to_regular_ward_1_yes_0_no 
- patient_addmited_to_semi_intensive_unit_1_yes_0_no 
- patient_addmited_to_intensive_care_unit_1_yes_0_no 
        -  (The remaining attributes are lab test results)
- hematocrit 
- hemoglobin 
- platelets 
- mean_platelet_volume 
- red_blood_cells 
- lymphocytes 
- mean_corpuscular_hemoglobin_concentration_mchc 
- leukocytes 
- basophils 
- mean_corpuscular_hemoglobin_mch 
- eosinophils 
- mean_corpuscular_volume_mcv 
- monocytes 
- red_blood_cell_distribution_width_rdw 
- serum_glucose 
- respiratory_syncytial_virus 
- influenza_a 
- influenza_b 
- parainfluenza_1 
- coronavirusnl63 
- rhinovirus_enterovirus 
- mycoplasma_pneumoniae 
- coronavirus_hku1 
- parainfluenza_3 
- chlamydophila_pneumoniae 
- adenovirus 
- parainfluenza_4 
- coronavirus229e 
- coronavirusoc43 
- inf_a_h1n1_2009 
- bordetella_pertussis 
- metapneumovirus 
- parainfluenza_2 
- neutrophils 
- urea 
- proteina_c_reativa_mg_dl 
- creatinine 
- potassium 
- sodium 
- influenza_b_rapid_test 
- influenza_a_rapid_test 
- alanine_transaminase 
- aspartate_transaminase 
- gamma_glutamyltransferase 
- total_bilirubin 
- direct_bilirubin 
- indirect_bilirubin 
- alkaline_phosphatase 
- ionized_calcium 
- strepto_a 
- magnesium 
- pco2_venous_blood_gas_analysis 
- hb_saturation_venous_blood_gas_analysis 
- base_excess_venous_blood_gas_analysis 
- po2_venous_blood_gas_analysis 
- fio2_venous_blood_gas_analysis 
- total_co2_venous_blood_gas_analysis 
- ph_venous_blood_gas_analysis 
- hco3_venous_blood_gas_analysis 
- rods 
- segmented 
- promyelocytes 
- metamyelocytes 
- myelocytes 
- myeloblasts 
- urine_esterase 
- urine_aspect 
- urine_ph 
- urine_hemoglobin 
- urine_bile_pigments 
- urine_ketone_bodies 
- urine_nitrite 
- urine_density 
- urine_urobilinogen 
- urine_protein 
- urine_sugar 
- urine_leukocytes 
- urine_crystals 
- urine_red_blood_cells 
- urine_hyaline_cylinders 
- urine_granular_cylinders 
- urine_yeasts 
- urine_color 
- partial_thromboplastin_time_ptt 
- relationship_patient_normal 
- international_normalized_ratio_inr 
- lactic_dehydrogenase 
- prothrombin_time_pt_activity 
- vitamin_b12 
- creatine_phosphokinase_cpk 
- ferritin 
- arterial_lactic_acid 
- lipase_dosage 
- d_dimer 
- albumin 
- hb_saturation_arterial_blood_gases 
- pco2_arterial_blood_gas_analysis 
- base_excess_arterial_blood_gas_analysis 
- ph_arterial_blood_gas_analysis 
- total_co2_arterial_blood_gas_analysis 
- hco3_arterial_blood_gas_analysis 
- po2_arterial_blood_gas_analysis 
- arteiral_fio2 
- phosphor 
- cto2_arterial_blood_gas_analysis


##### More: https://www.kaggle.com/einsteindata4u/covid19

 - website of the hospital: https://www.einstein.br/en/ (No access)
 - Comments from the dataset owner: https://www.kaggle.com/dataset/e626783d4672f182e7870b1bbe75fae66bdfb232289da0a61f08c2ceb01cab01/discussion/139484
 - No paper/description found

### 6.Subset name: covid_19_canada_open_data_working_group
###### Source: COVID-19 Canada Open Data Working Group

In [98]:
# sample data
data = pd.read_csv('data/UNCOVER COVID-19 Challenge/UNCOVER_v4/UNCOVER/covid_19_canada_open_data_working_group/individual-level-mortality.csv')
data.head()

Unnamed: 0,death_id,province_death_id,case_id,age,sex,health_region,province,country,date_death_report,death_source,additional_info,additional_source
0,1,1,60.0,80-89,Male,Vancouver Coastal,BC,Canada,2020-03-08,https://news.gov.bc.ca/releases/2020HLTH0068-0...,Lynn Valley Resident,
1,2,1,477.0,70-79,Male,Simcoe Muskoka,Ontario,Canada,2020-03-11,https://www.nationalobserver.com/2020/03/17/ne...,Was being treated at Royal Victoria Regional H...,
2,3,2,,Not Reported,Not Reported,Vancouver Coastal,BC,Canada,2020-03-16,https://news.gov.bc.ca/releases/2020HLTH0086-0...,Lynn Valley Resident,
3,4,3,,Not Reported,Not Reported,Vancouver Coastal,BC,Canada,2020-03-16,https://news.gov.bc.ca/releases/2020HLTH0086-0...,Lynn Valley Resident,
4,5,4,,Not Reported,Not Reported,Vancouver Coastal,BC,Canada,2020-03-16,https://news.gov.bc.ca/releases/2020HLTH0086-0...,Lynn Valley Resident,


##### Attributes/Description: 
Each row represents a unique case, including age, sex, health region, and history of travel where available. Sources are included as a reference for each entry. All data are collected exclusively from publicly available sources, including government reports and news media.

- death_id: death ID
- province_death_id: death ID at the province level
- case_id: case ID
- age: age
- sex: sex 
- health_region: healthcare region 
- province: province 
- country: country 
- date_death_report: date on which the death was reported 
- death_source: source of the death report 
- additional_info: additional information 
- additional_source: additional source

##### More: https://github.com/ishaberry/Covid19Canada

### 7.Subset name: nextstrain
###### Source: Nextstrain

In [99]:
# sample data
data = pd.read_csv('data/UNCOVER COVID-19 Challenge/UNCOVER_v4/UNCOVER/nextstrain/covid-19-genetic-phylogeny.csv')
data.head()

Unnamed: 0,strain,virus,gisaid_epi_isl,genbank_accession,date,region,country,division,location,region_exposure,...,length,host,age,sex,originating_lab,submitting_lab,authors,url,title,date_submitted
0,Algeria/G0638_2264/2020,ncov,EPI_ISL_418241,?,2020-03-02,Africa,Algeria,Boufarik,,Africa,...,29862,Human,28,Female,NIC Viral Respiratory Unit - Institut Pasteur ...,National Reference Center for Viruses of Respi...,Albert et al,https://www.gisaid.org,?,2020-03-29
1,Algeria/G0640_2265/2020,ncov,EPI_ISL_418242,?,2020-03-08,Africa,Algeria,Blida,,Africa,...,29867,Human,87,Male,NIC Viral Respiratory Unit - Institut Pasteur ...,National Reference Center for Viruses of Respi...,Albert et al,https://www.gisaid.org,?,2020-03-29
2,Algeria/G0860_2262/2020,ncov,EPI_ISL_420037,?,2020-03-02,Africa,Algeria,Boufarik,,Africa,...,29862,Human,41,Male,NIC Viral Respiratory Unit - Institut Pasteur ...,National Reference Center for Viruses of Respi...,Albert et al,https://www.gisaid.org,?,2020-04-04
3,Anhui/SZ005/2020,ncov,EPI_ISL_413485,?,2020-01-24,Asia,China,Anhui,Suzhou,Asia,...,29860,Human,58,Male,"Department of microbiology laboratory,Anhui Pr...","Department of microbiology laboratory,Anhui Pr...",Li et al,https://www.gisaid.org,?,2020-03-05
4,Argentina/C121/2020,ncov,EPI_ISL_420600,?,2020-03-07,South America,Argentina,Argentina,,South America,...,29903,Human,51,Male,Servicio Virosis Respiratorias-Departamento Vi...,Instituto Nacional Enfermedades Infecciosas C....,Baumeister et al,https://www.gisaid.org,?,2020-04-06


##### Attributes/Description: 
This dataset collects genome samples taken between Jan 2020 and Jun 2020.
It provides information about the genomic epidemiology of the novel coronavirus.
Attributes are listed below, but no description was found.

- strain 
- virus 
- gisaid_epi_isl 
- genbank_accession 
- date 
- region 
- country 
- division 
- location 
- region_exposure 
- country_exposure 
- division_exposure 
- segment 
- length 
- host 
- age 
- sex 
- originating_lab 
- submitting_lab 
- authors 
- url 
- title 
- date_submitted


##### More: https://nextstrain.org/ncov/north-america?c=division&f_region=North%20America&r=division

### 8.Subset name: self_care_catalysts
###### Source: self care catalysts app

In [100]:
# sample data
data = pd.read_csv('data/UNCOVER COVID-19 Challenge/UNCOVER_v4/UNCOVER/self_care_catalysts/self_care_catalysts/covid-19-symptom-tracker.csv')
data.head()

Unnamed: 0,study_id,date_measured,tool_id,tool_title,description,value,answer_text
0,108535,2020-03-16,101658,Fatigue Manager,If fatigue causes problems for you please indi...,,Side effects of medications
1,108535,2020-03-16,101658,Fatigue Manager,If fatigue causes problems for you please indi...,,Disturbed sleep
2,108535,2020-03-16,101658,Fatigue Manager,If fatigue causes problems for you please indi...,,Effort for routine and self-care tasks
3,108535,2020-03-16,101658,Fatigue Manager,If fatigue causes problems for you please indi...,,Pain
4,108535,2020-03-16,101658,Fatigue Manager,If fatigue causes problems for you please indi...,,Poor quality sleep


##### Attributes/Description: 
This dataset comes from a COVID-19 symptom tracker app.
Attributes are listed below, but no description was found for most of them.

- study_id 
- date_measured 
- tool_id 
- tool_title 
- description: symptom description
- value: severity of the symptom
- answer_text: additional information


##### More: https://www.selfcarecatalysts.com/

## Confirmed cases datasets
### 9.Subset name: johns_hopkins_csse
###### Source: JHU

In [101]:
# sample data
data = pd.read_csv('data/UNCOVER COVID-19 Challenge/UNCOVER/johns_hopkins_csse/2019-novel-coronavirus-covid-19-2019-ncov-data-repository-confirmed-cases-in-the-us.csv')
data.head()

Unnamed: 0,uid,iso2,iso3,code3,fips,admin2,province_state,country_region,lat,long,combined_key,date,confirmed
0,16.0,AS,ASM,16,60.0,,American Samoa,US,-14.271,-170.132,"American Samoa, US",2020-01-22,0
1,16.0,AS,ASM,16,60.0,,American Samoa,US,-14.271,-170.132,"American Samoa, US",2020-01-23,0
2,16.0,AS,ASM,16,60.0,,American Samoa,US,-14.271,-170.132,"American Samoa, US",2020-01-24,0
3,16.0,AS,ASM,16,60.0,,American Samoa,US,-14.271,-170.132,"American Samoa, US",2020-01-25,0
4,16.0,AS,ASM,16,60.0,,American Samoa,US,-14.271,-170.132,"American Samoa, US",2020-01-26,0


##### Attributes/Description: 
This is the data repository for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE).

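- uid: unique identifier
- iso2: two-letter country code (ISO 3166-1 alpha-2)
- iso3: three-letter country code (ISO 3166-1 alpha-3)
- code3: numeric country code (ISO 3166-1 numeric)
- fips: FIPS code
- admin2: second-level administrative division (e.g. county)
- province_state: province/state
- country_region: country/region
- lat: latitude
- long: longitude
- combined_key: combined location name (e.g. "American Samoa, US")
- date: date
- confirmed: confirmed cases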

##### More: https://github.com/CSSEGISandData/COVID-19

### 10.Subset name: New_York_Times
###### Source: New York Times

In [102]:
# sample data
data = pd.read_csv('data/UNCOVER COVID-19 Challenge/UNCOVER/New_York_Times/covid-19-county-level-data.csv')
data.head()

Unnamed: 0,date,county,state,fips,cases,deaths
0,2020-01-21,Snohomish,Washington,53061.0,1,0
1,2020-01-22,Snohomish,Washington,53061.0,1,0
2,2020-01-23,Snohomish,Washington,53061.0,1,0
3,2020-01-24,Cook,Illinois,17031.0,1,0
4,2020-01-24,Snohomish,Washington,53061.0,1,0


##### Attributes/Description: 
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. The time series is compiled from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.

- date: date
- county: county
- state: state 
- fips: fips 
- cases: confirmed cases 
- deaths: confirmed deaths
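
Because the counts are cumulative, daily new cases have to be derived by differencing within each county. A minimal sketch with hypothetical rows (the `new_cases` column is not part of the dataset):

```python
import pandas as pd

# Hypothetical cumulative counts for two counties.
nyt = pd.DataFrame({
    "date": pd.to_datetime(
        ["2020-03-01", "2020-03-02", "2020-03-03", "2020-03-02", "2020-03-03"]
    ),
    "county": ["A", "A", "A", "B", "B"],
    "state": ["X", "X", "X", "X", "X"],
    "cases": [1, 1, 3, 2, 5],
})

# Difference consecutive days within each county; the first day of a county
# has no previous row, so its cumulative count is used as the new-case count.
nyt = nyt.sort_values(["state", "county", "date"])
daily = nyt.groupby(["state", "county"])["cases"].diff()
nyt["new_cases"] = daily.fillna(nyt["cases"]).astype(int)
print(nyt)
```

The same approach applies to the `deaths` column. Negative differences can appear in real data if a source revises its counts downward, so they are worth checking for before further analysis.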

##### More: https://github.com/nytimes/covid-19-data

### 11.Subset name: USAFacts
###### Source: USAFacts

In [103]:
# sample data
data = pd.read_csv('data/UNCOVER COVID-19 Challenge/UNCOVER/USAFacts/confirmed-covid-19-cases-in-us-by-state-and-county.csv')
data.tail()

Unnamed: 0,county_fips,county_name,state_name,state_fips,date,confirmed,lat,long,geometry
313105,56045,Weston County,WY,56,2020-04-24,0,43.839612,-104.567488,POINT (-104.5674881 43.83961191)
313106,56045,Weston County,WY,56,2020-04-25,0,43.839612,-104.567488,POINT (-104.5674881 43.83961191)
313107,56045,Weston County,WY,56,2020-04-26,0,43.839612,-104.567488,POINT (-104.5674881 43.83961191)
313108,56045,Weston County,WY,56,2020-04-27,0,43.839612,-104.567488,POINT (-104.5674881 43.83961191)
313109,56045,Weston County,WY,56,2020-04-28,0,43.839612,-104.567488,POINT (-104.5674881 43.83961191)


##### Attributes/Description: 
The data is aggregated from the Centers for Disease Control and Prevention (CDC) and state- and local-level public health agencies. County-level data is confirmed by referencing state and local agencies directly. Cases, deaths, and per capita adjustments reflect cumulative totals since January 22, 2020.

- county_fips: county FIPS code
- county_name: county name
- state_name: state name (two-letter abbreviation, e.g. WY)
- state_fips: state FIPS code
- date: date 
- confirmed: confirmed cases
- lat: latitude
- long: longitude
- geometry: point geometry, written as POINT (long lat)

##### More: https://usafacts.org/articles/detailed-methodology-covid-19-data/

## Other datasets
#### Only sample rows are shown for the datasets below: their contents are clear from the column names, and they overlap substantially with the datasets introduced above.
#### canadian_outbreak_tracker

In [104]:
# sample data
data = pd.read_csv('data/UNCOVER COVID-19 Challenge/UNCOVER/canadian_outbreak_tracker/canada-cumulative-case-counts.csv')
data.head()

Unnamed: 0,date,frequency,cases,objectid2,recovered,dailyrecovered,totaldeaths,dailydeaths,totaltested,dailytested,totalactive,dailyactive,totalhospitalized,dailyhospitalized,totalicu,dailyicu,retrieved_at
0,1579953600000,1,1,1,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,,,,,2020-04-29 14:16:03
1,1580040000000,0,1,2,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,,,,,2020-04-29 14:16:03
2,1580126400000,1,2,3,0.0,0.0,0.0,0.0,0.0,0.0,2.0,1.0,,,,,2020-04-29 14:16:03
3,1580212800000,1,3,4,0.0,0.0,0.0,0.0,0.0,0.0,3.0,1.0,,,,,2020-04-29 14:16:03
4,1580299200000,0,3,5,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,,,,,2020-04-29 14:16:03


#### ECDC

In [105]:
# sample data
data = pd.read_csv('data/UNCOVER COVID-19 Challenge/UNCOVER/ECDC/current-data-on-the-geographic-distribution-of-covid-19-cases-worldwide.csv')
data.head()

Unnamed: 0,daterep,day,month,year,cases,deaths,countriesandterritories,geoid,countryterritorycode,popdata2018,continentexp
0,,28,4,2020,172,0,Afghanistan,AF,AFG,37172386.0,Asia
1,,27,4,2020,68,10,Afghanistan,AF,AFG,37172386.0,Asia
2,,26,4,2020,112,4,Afghanistan,AF,AFG,37172386.0,Asia
3,,25,4,2020,70,1,Afghanistan,AF,AFG,37172386.0,Asia
4,,24,4,2020,105,2,Afghanistan,AF,AFG,37172386.0,Asia


#### esri_covid-19 airport information

In [106]:
# sample data
data = pd.read_csv('data/UNCOVER COVID-19 Challenge/UNCOVER/esri_covid-19/esri_covid-19/coronavirus-world-airport-impacts.csv')
data.head()

Unnamed: 0,geometry,ident,type,name,latitude_d,longitude,elevation,continent,iso_countr,iso_region,municipali,scheduled,gps_code,iata_code,status,objectid
0,POINT (-76.5008 0.505228),SKAS,medium_airport,Tres De Mayo Airport,0.505228,-76.5008,815,South America,CO,CO-PUT,Puerto Asís,yes,SKAS,PUU,,3001
1,POINT (-73.1848 7.1265),SKBG,medium_airport,Palonegro Airport,7.1265,-73.1848,3897,South America,CO,CO-SAN,Bucaramanga,yes,SKBG,BGA,,3002
2,POINT (-74.1469 4.70159),SKBO,large_airport,El Dorado International Airport,4.70159,-74.1469,8361,South America,CO,CO-CUN,Bogota,yes,SKBO,BOG,Restrictions,3003
3,POINT (-74.7808 10.8896),SKBQ,medium_airport,Ernesto Cortissoz International Airport,10.8896,-74.7808,98,South America,CO,CO-ATL,Barranquilla,yes,SKBQ,BAQ,Restrictions,3004
4,POINT (-77.3947 6.20292),SKBS,medium_airport,José Celestino Mutis Airport,6.20292,-77.3947,80,South America,CO,CO-CHO,Bahía Solano,yes,SKBS,BSC,,3005


#### geotab

In [107]:
# sample data
data = pd.read_csv('data/UNCOVER COVID-19 Challenge/UNCOVER/geotab/border-wait-times-at-us-canada-border.csv')
data.head()

Unnamed: 0,borderid,canadaport,americaport,tripdirection,localhour,localdate,daytype,utc_date,utc_hour,averageduration,aggregationmethod,canadaborderzone,can_iso_3166_2,americaborderzone,us_iso_3166_2,borderlatitude,borderlongitude,bordergeohash,version
0,8,Kingsgate,Eastport,US to Canada,6,2020-01-01,Weekdays,2020-01-01,13,2.94,Yearly Average,"POLYGON ((-116.181846857 49.002442186, -116.18...",CA-BC,"POLYGON ((-116.182157993 49.0004573163, -116.1...",US-ID,49.001,-116.181,c2tqu0r,1
1,8,Kingsgate,Eastport,Canada to US,16,2020-01-01,Weekdays,2020-01-01,23,5.26,Monthly Average,"POLYGON ((-116.181846857 49.002442186, -116.18...",CA-BC,"POLYGON ((-116.182157993 49.0004573163, -116.1...",US-ID,49.001,-116.181,c2tqu0r,1
2,8,Kingsgate,Eastport,US to Canada,7,2020-01-01,Weekdays,2020-01-01,14,3.85,Yearly Average,"POLYGON ((-116.181846857 49.002442186, -116.18...",CA-BC,"POLYGON ((-116.182157993 49.0004573163, -116.1...",US-ID,49.001,-116.181,c2tqu0r,1
3,8,Kingsgate,Eastport,Canada to US,7,2020-01-01,Weekdays,2020-01-01,14,4.31,Yearly Average,"POLYGON ((-116.181846857 49.002442186, -116.18...",CA-BC,"POLYGON ((-116.182157993 49.0004573163, -116.1...",US-ID,49.001,-116.181,c2tqu0r,1
4,8,Kingsgate,Eastport,Canada to US,17,2020-01-01,Weekdays,2020-01-02,0,4.28,Yearly Average,"POLYGON ((-116.181846857 49.002442186, -116.18...",CA-BC,"POLYGON ((-116.182157993 49.0004573163, -116.1...",US-ID,49.001,-116.181,c2tqu0r,1


#### harvard_global_health_institute

In [108]:
# sample data
data = pd.read_csv('data/UNCOVER COVID-19 Challenge/UNCOVER/harvard_global_health_institute/hospital-capacity-by-state-20-population-contracted.csv')
data.head()

Unnamed: 0,state,total_hospital_beds,total_icu_beds,hospital_bed_occupancy_rate,icu_bed_occupancy_rate,available_hospital_beds,potentially_available_hospital_beds,available_icu_beds,potentially_available_icu_beds,adult_population,...,percentage_of_potentially_available_icu_beds_needed_six_months,percentage_of_total_icu_beds_needed_six_months,icu_beds_needed_twelve_months,percentage_of_available_icu_beds_needed_twelve_months,percentage_of_potentially_available_icu_beds_needed_twelve_months,percentage_of_total_icu_beds_needed_twelve_months,icu_beds_needed_eighteen_months,percentage_of_available_icu_beds_needed_eighteen_months,percentage_of_potentially_available_icu_beds_needed_eighteen_months,percentage_of_total_icu_beds_needed_eighteen_months
0,AK,1583.0,130.0,0.66,0.58,533.0,1058.0,55.0,93.0,552319.0,...,3.34,2.39,155.0,2.82,1.67,1.19,101.0,1.84,1.09,0.78
1,AL,13959.0,1870.0,0.64,0.68,4994.0,9476.0,606.0,1238.0,3748089.0,...,1.83,1.21,1131.0,1.87,0.91,0.6,738.0,1.22,0.6,0.39
2,AR,8428.0,856.0,0.52,0.58,4069.0,6248.0,362.0,609.0,2272226.0,...,2.27,1.61,690.0,1.91,1.13,0.81,450.0,1.24,0.74,0.53
3,AZ,12868.0,1742.0,0.62,0.53,4938.0,8903.0,814.0,1278.0,5187520.0,...,2.47,1.82,1581.0,1.94,1.24,0.91,1031.0,1.27,0.81,0.59
4,CA,68554.0,8131.0,0.67,0.58,22831.0,45692.0,3381.0,5756.0,29868127.0,...,3.04,2.15,8737.0,2.58,1.52,1.07,5698.0,1.69,0.99,0.7
