<h1> HARP - Healthcare Availability Resource Prediction of COVID-19 </h1>

<h2> Getting deep into how healthcare infrastructure and services affected COVID-19 growth rates </h2>

<h3> Task Details </h3>

The Roche Data Science Coalition is a group of like-minded public and private organizations with a common mission and vision to bring actionable intelligence to patients, frontline healthcare providers, institutions, supply chains, and government. The tasks associated with this dataset were developed and evaluated by global frontline healthcare providers, hospitals, suppliers, and policy makers. They represent key research questions where insights developed by the Kaggle community can be most impactful in the areas of at-risk population evaluation and capacity management. - COVID19 Uncover Challenge

<h3> Notebook Detail Study </h3>

This notebook analyzes how healthcare infrastructure and resources might have affected the course of COVID-19. I drill down into the associations and correlations between healthcare and the spread of COVID-19. After a detailed analysis of which patients actually require healthcare provider (HCP) and hospital care, we look at how well the existing infrastructure can meet that demand.

<h3> Brief overview of Notebook as per Evaluation Criteria </h3>

The notebook uses the UNCOVER, COVID-NET hospitalization rates (covidnet-hospitalization-rates), COVCSD (COVID-19 Countries Statistical Dataset) and COVID-19 in USA datasets to answer its questions. Additional open data is collected from Statista and the Johns Hopkins COVID-19 dataset. Using a range of statistical analyses (described in the individual sections), it seeks to understand the clinical causes of COVID-19 hospitalizations and, from them, the requirements placed on health infrastructure.

This should help policy makers understand what drives COVID-19 hospitalizations, so that the existing medical infrastructure can be prepared to handle the pandemic. Since this notebook extends my previously published notebooks and datasets, it uses some datasets I published and collected collaboratively to drill down into the analysis.

# <a id='lib'><h3>Importing the essential libraries</h3></a>

In [None]:
#Data Analyses Libraries
import pandas as pd                
import numpy as np    
from urllib.request import urlopen
import json
import glob
import os

#Importing Data plotting libraries
import matplotlib.pyplot as plt     
import plotly.express as px       
import plotly.offline as py       
import seaborn as sns             
import plotly.graph_objects as go 
from plotly.subplots import make_subplots
import matplotlib.ticker as ticker
import matplotlib.animation as animation
from matplotlib.ticker import MaxNLocator

#Other Miscellaneous Libraries
import warnings
warnings.filterwarnings('ignore')
from IPython.display import HTML
import matplotlib.colors as mc
import colorsys
from random import randint
import re

# <a id='data'><h3>Datasets used for analyses in this notebook</h3></a>

The various datasets that we take under consideration for this particular notebook are mentioned underneath:

1. UNCOVER dataset, uploaded for the UNCOVER COVID-19 Challenge; specifically, the clinical data available under einstein.
2. COVCSD (COVID-19 Countries Statistical Dataset), a publicly available dataset I created to analyze the cases.
3. COVID-19 NET Hospitalization Rates dataset (made available by Karl Weinmeister).
4. COVID-19 Spread in USA (made available by SRK).


# Question A : Who actually got hospitalized with COVID-19?

<h3> Analysis 1 : What is the hospital admission rate for COVID-19? </h3>

In [None]:
#Loading the US national totals from the COVID Tracking Project (a single row)
us_tracker1 = pd.read_csv('../input/uncover/UNCOVER/covid_tracking_project/covid-statistics-for-all-us-totals.csv')

#Getting the values from the first (only) row
us_tests_conducted = int(us_tracker1['totaltestresults'].iloc[0])
us_positive_tests = int(us_tracker1['positive'].iloc[0])
us_hospitalizations = int(us_tracker1['hospitalized'].iloc[0])

#Printing the values; the rate is hospitalized / positive cases, not / tests
print('Out of {} COVID-19 tests conducted in the US, {} were positive and {} patients were hospitalized, i.e. {:.3f}% of confirmed cases'
      .format(us_tests_conducted, us_positive_tests, us_hospitalizations, us_hospitalizations / us_positive_tests * 100))

For the US, out of the 5,795,728 COVID-19 tests in the available data, 10.567% of the confirmed (positive) cases required hospital care. More recent data would let us examine this at state level. The following dataset, SRK's COVID-19 in USA, reports total hospitalizations for COVID-19 for most US states.

In [None]:
#Getting the total cases (State wise) in USA
usa_covid_cases = pd.read_csv('../input/covid19-in-usa/us_states_covid19_daily.csv')

#Keeping only the snapshot for 21 June 2020 (dates are stored as integers, YYYYMMDD)
condition = usa_covid_cases['date'] == 20200621
usa_covid_cases = usa_covid_cases[condition].copy()

#Previewing the filtered data
usa_covid_cases.tail()

In [None]:
#Adding a column to calculate rate of Admission 
usa_covid_cases['Hospitalization_rate'] = usa_covid_cases['hospitalizedCumulative']/usa_covid_cases['positive']*100

#Sorting the dataframe in descending order
usa_covid_cases.sort_values(by=['Hospitalization_rate'], inplace=True, ascending=False)

<iframe src='https://flo.uri.sh/visualisation/2789537/embed' frameborder='0' scrolling='no' style='width:100%;height:600px;'></iframe><div style='width:100%!;margin-top:4px!important;text-align:right!important;'><a class='flourish-credit' href='https://public.flourish.studio/visualisation/2789537/?utm_source=embed&utm_campaign=visualisation/2789537' target='_top' style='text-decoration:none!important'><img alt='Made with Flourish' src='https://public.flourish.studio/resources/made_with_flourish.svg' style='width:105px!important;height:16px!important;border:none!important;margin:0!important;'> </a></div>

Some US states have a much higher hospitalization rate than others. We investigate this later in the notebook.

<h3> Getting hospitalization Rates of Spain </h3>

In [None]:
#Fetching the data for Spain
spain_data = pd.read_csv('../input/covcsd-covid19-countries-statistical-dataset/covid19-spain-cases.csv')

#Keeping the latest snapshot, 1 April 2020 ('fecha' = date)
latest = spain_data['fecha'] == '2020-04-01'
spain_cases = spain_data[latest].copy()

#Hospital and ICU admission rates as % of confirmed cases ('hospitalizados' = hospitalized, 'uci' = ICU, 'casos' = cases)
spain_cases['Rate_general'] = (spain_cases['hospitalizados']/spain_cases['casos'])*100
spain_cases['Rate_icu'] = (spain_cases['uci']/spain_cases['casos'])*100

#Previewing the values
spain_cases.head()

<iframe src='https://flo.uri.sh/visualisation/2019610/embed' frameborder='0' scrolling='no' style='width:100%;height:600px;'></iframe><div style='width:100%!;margin-top:4px!important;text-align:right!important;'><a class='flourish-credit' href='https://public.flourish.studio/visualisation/2019610/?utm_source=embed&utm_campaign=visualisation/2019610' target='_top' style='text-decoration:none!important'><img alt='Made with Flourish' src='https://public.flourish.studio/resources/made_with_flourish.svg' style='width:105px!important;height:16px!important;border:none!important;margin:0!important;'> </a></div>

<h3> Observations from the above dataset </h3>

Out of all confirmed COVID-19 cases in the US, 10.567% were admitted to hospital. This figure is likely an underestimate, because hospitalization reports arrive more slowly than positive case counts. For the latest available data, the hospitalization rate varies by state; the details are shown in the graph.

Cataluña has the highest hospital admission rate for COVID-19 cases in Spain. The hospitalization rates for the individual regions of Spain are shown in the graph above.

<h3> Analysis 2 : Which age groups are admitted to hospital? Getting deep into US data </h3>

We use the COVID-19 NET Hospitalization Rates dataset for a closer look. Since that dataset was last updated two months ago, we fetch the latest data directly from gis.cdc.gov for analysis.

In [None]:
#Reading the COVID-19 NET Hospitalization rates dataset
net_data = pd.read_csv('/kaggle/input/covidnet-hospitalization-rates/COVID-NET_Surveillance_03-28-2020.csv')

#Previewing the dataset
net_data.head()

In [None]:
#Dropping the rows with null values
net_data = net_data.dropna()

#Keeping the individual age bands: '65+ yr' overlaps the finer older-age bands and 'Overall' is the all-ages total ('85+' is excluded as well)
excluded = ['65+ yr', '85+', 'Overall']
df_ages = net_data[~net_data.AGE_CATEGORY.isin(excluded)]

#Using the network-wide rates rather than the individual catchment areas
df_ages = df_ages[df_ages.CATCHMENT == 'Entire Network']

#Plotting the weekly hospitalization rate for each age band
fig, ax2 = plt.subplots(figsize=(20, 5))
ax2.xaxis.set_major_locator(MaxNLocator(integer=True))
sns.lineplot(x=df_ages['MMWR-WEEK'], y=df_ages.WEEKLY_RATE, hue=df_ages.AGE_CATEGORY, ax=ax2)
plt.show()

<h3> Observations from the dataset </h3>

For the first weeks covered by the data (indexed by MMWR week), we observe the following trends in the hospitalization of COVID-19 patients in the US:

* Within 3 weeks (21 days), the weekly hospitalization rate for patients aged 75-84 exceeded 10 per 100,000 population.
* The hospitalization rate decreased with decreasing age.

We then fetched the most recent data from the website to observe the latest trends. The graphs below were made from that downloaded data.
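The x-axis above is the MMWR (epidemiological) week. As a sketch for translating week numbers into calendar dates, the helper below follows the CDC convention that MMWR weeks run Sunday to Saturday and that week 1 is the first week with at least four days in the new year, which is the week containing 4 January. The function name `mmwr_week_start` is hypothetical, not part of any library.

```python
from datetime import date, timedelta

def mmwr_week_start(year: int, week: int) -> date:
    """Return the Sunday on which MMWR week `week` of `year` begins (hypothetical helper)."""
    jan4 = date(year, 1, 4)
    # date.weekday(): Monday=0 ... Sunday=6, so days back to the previous Sunday = (weekday + 1) % 7
    week1_start = jan4 - timedelta(days=(jan4.weekday() + 1) % 7)
    return week1_start + timedelta(weeks=week - 1)

# Start dates of the first few surveillance weeks of 2020
for wk in (10, 11, 12, 13):
    print(wk, mmwr_week_start(2020, wk))
```

A span of 3 MMWR weeks therefore corresponds to 21 consecutive days starting on a Sunday.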


<h3> COVID-19 Hospital Admission Rates w.r.t Age Group in USA Counties </h3>

<img src="https://i.ibb.co/Vq3sM2q/covid-19-net-data.png" alt="covid-19-net-data" border="0" width="1200" height="1200"><br/>

<h3> Analysis of Laboratory Confirmed Hospitalizations w.r.t Age Group in USA </h3>

The following are the COVID-19 hospitalization rates per 100,000 population, by age group:

* Age 85+: a rate of 450, the most-hospitalized age group.
* Age 75-84: a rate of 200, the second most-hospitalized age group.

The trend, at least in the US, is clear: **older populations have higher hospital admission rates.**
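For capacity planning, a rate per 100,000 population can be turned into an expected number of hospitalizations for a population of known size, over the same period the rate covers: expected = rate × population / 100,000. The sketch below applies this to the two age-group rates quoted above; the population sizes are hypothetical placeholders, not data from this notebook.

```python
def expected_hospitalizations(rate_per_100k: float, population: int) -> float:
    """Expected hospitalizations implied by a rate per 100,000 population."""
    return rate_per_100k * population / 100_000

# Rates quoted above (per 100,000); the population sizes are hypothetical examples
rates = {'85+': 450, '75-84': 200}
hypothetical_population = {'85+': 20_000, '75-84': 50_000}

for group, rate in rates.items():
    n = expected_hospitalizations(rate, hypothetical_population[group])
    print(f'Age {group}: about {n:.0f} hospitalizations per {hypothetical_population[group]:,} residents')
```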

<h3> Analysis 3 : Ailments vs. Hospitalized Cases. Are certain medical conditions associated with hospitalization? </h3>

As in Analysis 2, we use the COVID-19 NET Hospitalization Rates data, fetching the latest figures directly from gis.cdc.gov. Here we look at how COVID-19 hospitalization relates to conditions the patients already had.


<h3> What are the underlying medical conditions of hospitalized COVID-19 patients? </h3>

(The underlying data is fetched from https://gis.cdc.gov/grasp/COVIDNet/COVID19_5.html)


In [None]:
# Counts of hospitalized adults by underlying condition, transcribed manually from the CDC page above

ailments_data = {'Medical_condition' : ['Asthma','Autoimmune diseases','Cardiovascular disease','Chronic lung disease','Liver disease','Hypertension','Immune Suppression','Metabolic disease','Neurological Disease','Obesity','Renal Disease','Other Disease','No medical condition'],
                'Hospitalized_adults' : [738,179,2023,1263,309,3386,582,2509,1404,2735,964,334,504]}
ailments_data = pd.DataFrame(ailments_data)

#Plotting the dataset
fig = px.bar(ailments_data, x='Medical_condition', y='Hospitalized_adults')
fig.show()

<h3> Analysis of Medical_condition vs. Hospitalized_cases </h3>

* Hypertension (3,386) and obesity (2,735) were the most common conditions among hospitalized adults.
* Metabolic disease (2,509) and cardiovascular disease (2,023) were the next most common.

These observations feed into Answer A below.

# Answer A - The people who were hospitalized (US)

<h3> The set of observations drawn while we understand which populations were admitted to Hospitals </h3>

* Older age groups were more prone to hospitalization in the US: hospitalization rates rose with age, with a correlation close to 1.
* Hypertension and obesity were other prominent factors among hospitalized COVID-19 patients.
* Metabolic and cardiovascular diseases were also among the most common conditions in hospitalized COVID-19 patients.
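The "correlation close to 1" between age and hospitalization can be checked directly. Because age bands are ordinal, a rank (Spearman) correlation between the order of the bands and their rates is a natural measure. The sketch below computes it as the Pearson correlation of ranks, using pandas only; `age_rate_spearman` is a hypothetical helper, and the labels in `AGE_ORDER` are an assumption about how the COVID-NET file spells its age bands and may need adjusting.

```python
import pandas as pd

# Assumed spelling/order of the COVID-NET age bands, youngest to oldest
AGE_ORDER = ['0-4 yr', '5-17 yr', '18-49 yr', '50-64 yr', '65-74 yr', '75-84 yr', '85+']

def age_rate_spearman(rates_by_age: pd.Series) -> float:
    """Spearman correlation between age-band order and rate.

    rates_by_age: one rate per band, indexed by the band label.
    Bands missing from the Series are skipped.
    """
    rates = rates_by_age.reindex(AGE_ORDER).dropna()
    age_position = pd.Series(range(len(rates)), index=rates.index, dtype=float)
    # Spearman = Pearson correlation of the ranks (ties get average ranks)
    return age_position.rank().corr(rates.rank())
```

For example, the cumulative rates from the latest MMWR week, indexed by `AGE_CATEGORY`, could be passed in; a value near +1 supports the claim, while a value well below 1 would mean the relationship is not monotonic.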

# Question B - Which medical parameters led a COVID-19 patient to hospital?

To answer this question we use the clinical data available on Kaggle (the Einstein dataset in UNCOVER, from Hospital Israelita Albert Einstein in São Paulo, Brazil) to find the medical parameters that stand out in confirmed COVID-19 patients.

In [None]:
#Importing the clinical spectrum data
clinical_spectrum = pd.read_csv('../input/uncover/UNCOVER/einstein/diagnosis-of-covid-19-and-its-clinical-spectrum.csv')

#Filtering the data to contain the values only for the confirmed COVID-19 Tests
confirmed = clinical_spectrum['sars_cov_2_exam_result'] == 'positive'
clinical_spectrum = clinical_spectrum[confirmed]

#Previewing the confirmed cases
clinical_spectrum.tail()

In [None]:
#Splitting the positive cases by regular-ward admission ('t' = admitted, 'f' = not admitted)
ward_col = 'patient_addmited_to_regular_ward_1_yes_0_no'
hospitalized_spectra = clinical_spectrum[clinical_spectrum[ward_col] == 't']
unhospitalized_spectra = clinical_spectrum[clinical_spectrum[ward_col] == 'f']

#Taking the mean of every numeric clinical parameter in each group (NaNs skipped)
hospitalized_mean = hospitalized_spectra.mean(axis=0, skipna=True, numeric_only=True)
unhospitalized_mean = unhospitalized_spectra.mean(axis=0, skipna=True, numeric_only=True)

#Combining both groups into one table, aligned on the parameter name
hospitalized_mean = pd.DataFrame({'Hospitalized_figures': hospitalized_mean,
                                  'Unhospitalized_figures': unhospitalized_mean})
hospitalized_mean = hospitalized_mean.rename_axis('Parameter').reset_index()

#Dropping parameters without measurements in one of the groups (their difference would be NaN)
hospitalized_mean = hospitalized_mean.dropna()

#Difference in means: negative = lower in hospitalized patients, positive = higher
hospitalized_mean['Change'] = hospitalized_mean['Hospitalized_figures'] - hospitalized_mean['Unhospitalized_figures']
hospitalized_mean.sort_values(['Change'], axis=0, ascending=True, inplace=True)

#The 10 parameters most lowered and the 10 most raised in hospitalized patients
lower = hospitalized_mean.head(10)
higher = hospitalized_mean.tail(10)

<h3> The Medical Parameters in the Clinical Data that Distinguish Hospitalized COVID-19 Positive Patients </h3>

In [None]:
#Printing the values
for i in lower['Parameter']:
    print('For lower value of {}, the patient may require HCP'.format(i))
    
for i in higher['Parameter']:
    print('For higher value of {}, the patient may require HCP'.format(i))


<iframe src='https://flo.uri.sh/visualisation/2016047/embed' frameborder='0' scrolling='no' style='width:100%;height:600px;'></iframe><div style='width:100%!;margin-top:4px!important;text-align:right!important;'><a class='flourish-credit' href='https://public.flourish.studio/visualisation/2016047/?utm_source=embed&utm_campaign=visualisation/2016047' target='_top' style='text-decoration:none!important'><img alt='Made with Flourish' src='https://public.flourish.studio/resources/made_with_flourish.svg' style='width:105px!important;height:16px!important;border:none!important;margin:0!important;'> </a></div>
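The ranking above uses the raw difference in group means, which ignores how many patients were measured for each parameter and how spread out the values are; with few admitted patients, a large difference can be noise. One way to check this (not used in the original analysis) is Welch's t statistic per parameter, t = (mean_a - mean_b) / sqrt(var_a/n_a + var_b/n_b). The sketch below computes it with pandas and NumPy only; `welch_t_by_parameter` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def welch_t_by_parameter(group_a: pd.DataFrame, group_b: pd.DataFrame) -> pd.Series:
    """Welch's t statistic (mean of group_a minus mean of group_b) per shared numeric column.

    NaNs are ignored column by column; columns with fewer than 2 values
    in either group get NaN. Hypothetical helper.
    """
    a = group_a.select_dtypes(include='number')
    b = group_b.select_dtypes(include='number')
    shared = a.columns.intersection(b.columns)
    a, b = a[shared], b[shared]

    n_a, n_b = a.count(), b.count()
    # Unpooled standard error of the difference in means (sample variances, ddof=1)
    se = np.sqrt(a.var() / n_a + b.var() / n_b)
    t = (a.mean() - b.mean()) / se
    t[(n_a < 2) | (n_b < 2)] = np.nan
    return t.sort_values()
```

It can be applied to the admitted and non-admitted subsets built in the cell above. Parameters whose |t| is well below about 2 should be treated with caution before being listed as markers.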

# Answer B - The clinical figures that led patients to hospital

1. When the levels of the following constituents in the body are low and the person has tested positive for COVID-19, he/she may require HCP:

   - Rods
   - Monocytes
   - Aspartate_transaminase
   - Po2_venous_blood_gas
   - Base_excess_arterial_blood_gas
   - Alanine_transaminase
   - hco3_arterial_blood_gas_analysis
   - Total_co2_arterial_blood_gas_analysis
   - Ionized_calcium

2. When the levels of the following constituents in the body are high and the person has tested positive for COVID-19, he/she may require HCP:

   - Phosphorus
   - fio2_venous_blood_gas_analysis
   - partial_thromboplastin_time_ptt
   - prothrombin_time_pt_activity
   - vitamin_b12
   - d_dimer

3. Elderly patients who test positive for COVID-19 and whose levels of the constituents above are abnormal should seek care immediately: the trends show that in this age group cases are more likely to become fatal, so ICU care may be required.

# Question C - Which populations died from COVID-19?

We start with an excerpt from a recent article in The New York Times:

> Pneumonia caused by the coronavirus has had a stunning impact on the city’s hospital system. Normally an E.R. has a mix of patients with conditions ranging from the serious, such as heart attacks, strokes and traumatic injuries, to the nonlife-threatening, such as minor lacerations, intoxication, orthopedic injuries and migraine headaches.
>
> These patients did not report any sensation of breathing problems, even though their chest X-rays showed diffuse pneumonia and their oxygen was below normal. We are just beginning to recognize that Covid pneumonia initially causes a form of oxygen deprivation we call “silent hypoxia” — “silent” because of its insidious, hard-to-detect nature.
>
> Pneumonia is an infection of the lungs in which the air sacs fill with fluid or pus. Normally, patients develop chest discomfort, pain with breathing and other breathing problems. But when Covid pneumonia first strikes, patients don’t feel short of breath, even as their oxygen levels fall. And by the time they do, they have alarmingly low oxygen levels and moderate-to-severe pneumonia (as seen on chest X-rays).

<H3> Analysis : Are there any relations with past diseases/infections of Patients who die due to COVID-19? </H3>

We look at the pre-existing conditions (comorbidities) of patients who died of COVID-19 in several countries and search for common patterns. These visualizations are from Statista (2020).

<img src="https://www.statista.com/graphic/1/1102796/south-korea-covid-19-deaths-by-chronic-disease.jpg" alt="Statistic: Breakdown of coronavirus (COVID-19) deaths in South Korea as of March 16, 2020, by chronic disease | Statista" style="width: 100%; height: auto !important; max-width:1000px;-ms-interpolation-mode: bicubic;"/>

<img src="https://www.statista.com/graphic/1/1108836/china-coronavirus-covid-19-fatality-rate-by-health-condition.jpg" alt="Statistic: Crude fatality rate of novel coronavirus COVID-19 in China as of February 20, 2020, by health condition | Statista" style="width: 100%; height: auto !important; max-width:1000px;-ms-interpolation-mode: bicubic;"/>

<img src="https://www.statista.com/graphic/1/1110949/common-comorbidities-in-covid-19-deceased-patients-in-italy.jpg" alt="Statistic: Most common comorbidities observed in coronavirus (COVID-19) deceased patients in Italy as of April 16, 2020 | Statista" style="width: 100%; height: auto !important; max-width:1000px;-ms-interpolation-mode: bicubic;"/>

<H3> Observations Seen from the above visualizations </H3>

* For South Korea, most deaths involved circulatory system diseases and metabolic diseases: 62.7% of the deceased patients had circulatory diseases and 46.7% had metabolic diseases (the groups overlap).

* For China, patients with cardiovascular disease had the highest crude fatality rate: 13.2% of COVID-19 patients with cardiovascular disease died.

* For Italy, apart from hypertension, cardiovascular system ailments were also common among patients who died of COVID-19.


# Question D - Which patients suffering from COVID-19 require ICU care?

<h3> Can we predict which patients would require ICU Care based on the data available? </h3>

This analysis was carried out in detail in one of my notebooks for the earlier UNCOVER challenge, TREAT : Treatment and Research Established Analysis Tally - [See here](https://www.kaggle.com/aestheteaman01/treat-trial-research-established-analysis-tally). 

That notebook is written in R and uses the COVID-19 clinical data as its source to build a model that predicts whether a patient may need ICU care from the parameters in the dataset. The model reaches an AUC of 0.92, which is good given the limited data; more data would give a more reliable estimate and could improve the score.
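For readers unfamiliar with the metric: the AUC is the probability that a randomly chosen patient who needed ICU care gets a higher model score than a randomly chosen patient who did not, with ties counting one half. The Python sketch below computes it from that rank-based (Mann-Whitney) definition. It only illustrates the metric and is not a port of the R model; `roc_auc` is a hypothetical name.

```python
import numpy as np
import pandas as pd

def roc_auc(y_true, scores) -> float:
    """AUC via the Mann-Whitney U statistic (hypothetical helper).

    y_true: 1/True for patients who needed ICU care, 0/False otherwise.
    scores: model scores, higher = more likely to need ICU care.
    """
    y = np.asarray(y_true).astype(bool)
    # Average ranks, so tied scores contribute one half
    ranks = pd.Series(np.asarray(scores, dtype=float)).rank().to_numpy()
    n_pos = int(y.sum())
    n_neg = len(y) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError('need at least one positive and one negative case')
    u = ranks[y].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to random ordering and 1.0 to perfect separation, so 0.92 means the model ranks an ICU patient above a non-ICU patient in roughly 92% of pairs.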

<h3> Prediction Score of ICU Model from Clinical Dataset </h3>

<img src="https://i.ibb.co/2qpKk5N/Annotation-2020-06-09-102440.png" alt="Annotation-2020-06-09-102440" border="0" align='left' width="400" height="400">

<H3> Answer D - The requirements of ICU </H3>

* The model and detailed analysis in one of my notebooks [- See Here](https://www.kaggle.com/aestheteaman01/treat-trial-research-established-analysis-tally) predict the need for ICU care from the available clinical figures. This could be used to estimate ICU requirements across geographical areas.

* In practice, health workers/hospitals using the proposed model would define a policy for a minimum acceptable Sensitivity level or a target number of patients to be prioritized as well as specify current prevalence. These parameters will serve as input for the model to determine patients that would be prioritized as likely to test positive. The model would then output a binary indicator for SARS-CoV-2 infection, likelihood measure and accuracy. The model's output can be used as a tool for prioritization and to support further medical decision making processes. 

* The model is also highly interpretable: it shows that patients admitted with COVID-19 symptoms who tested negative for Rhinovirus/Enterovirus, Influenza B and Inf.A.H1N1.2009, and who presented low levels of leukocytes and platelets, were more likely to test positive for SARS-CoV-2.
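The prioritization policy described above (fix a minimum acceptable sensitivity, then take the current prevalence into account) can be sketched in a few lines. The helpers below are hypothetical and are not the model's actual interface: one picks the highest score cut-off that still flags at least the target share of true positives, one reports sensitivity and specificity at a cut-off, and one applies Bayes' rule, PPV = sens·p / (sens·p + (1 - spec)(1 - p)), to show how prevalence p changes the share of flagged patients who are true positives.

```python
import math
import numpy as np

def threshold_for_sensitivity(y_true, scores, min_sensitivity: float) -> float:
    """Highest cut-off for which the rule `score >= cut-off` reaches min_sensitivity."""
    y = np.asarray(y_true).astype(bool)
    pos_scores = np.sort(np.asarray(scores, dtype=float)[y])[::-1]  # positives, highest first
    if pos_scores.size == 0:
        raise ValueError('need at least one positive case')
    # Number of positives that must be flagged (epsilon guards against float round-up)
    k = math.ceil(min_sensitivity * pos_scores.size - 1e-9)
    k = min(pos_scores.size, max(1, k))
    return float(pos_scores[k - 1])

def sensitivity_specificity(y_true, scores, cutoff: float):
    """Sensitivity and specificity of the rule `score >= cutoff`."""
    y = np.asarray(y_true).astype(bool)
    flagged = np.asarray(scores, dtype=float) >= cutoff
    sens = (flagged & y).sum() / y.sum()
    spec = (~flagged & ~y).sum() / (~y).sum()
    return float(sens), float(spec)

def positive_predictive_value(sens: float, spec: float, prevalence: float) -> float:
    """Share of flagged patients who are true positives at the given prevalence."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)
```

Usage: choose the cut-off on a labelled validation set with `threshold_for_sensitivity`, measure specificity there, then feed both into `positive_predictive_value` with the local prevalence. The same cut-off yields a lower PPV where the infection is rarer.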

# Summary so far: deaths, hospitalizations and ICU requirements

<h3> The set of observations drawn while we understand which populations were admitted to Hospitals </h3>

* Older age groups were more prone to hospitalization in the US: hospitalization rates rose with age, with a correlation close to 1.
* Hypertension and obesity were other prominent factors among hospitalized COVID-19 patients.
* Metabolic and cardiovascular diseases were also among the most common conditions in hospitalized COVID-19 patients.

<h3> The Clinical Information for COVID-19 Positive Hospitalized Patients </h3>

1. When the levels of the following constituents in the body are low and the person has tested positive for COVID-19, he/she may require HCP:

   - Rods
   - Monocytes
   - Aspartate_transaminase
   - Po2_venous_blood_gas
   - Base_excess_arterial_blood_gas
   - Alanine_transaminase
   - hco3_arterial_blood_gas_analysis
   - Total_co2_arterial_blood_gas_analysis
   - Ionized_calcium

2. When the levels of the following constituents in the body are high and the person has tested positive for COVID-19, he/she may require HCP:

   - Phosphorus
   - fio2_venous_blood_gas_analysis
   - partial_thromboplastin_time_ptt
   - prothrombin_time_pt_activity
   - vitamin_b12
   - d_dimer

3. Elderly patients who test positive for COVID-19 and whose levels of the constituents above are abnormal should seek care immediately: the trends show that in this age group cases are more likely to become fatal, so ICU care may be required.

<H3> Observations Seen from medical ailments in common with COVID-19 present in patients that died </H3>

* For South Korea, most deaths involved circulatory system diseases and metabolic diseases: 62.7% of the deceased patients had circulatory diseases and 46.7% had metabolic diseases (the groups overlap).

* For China, patients with cardiovascular disease had the highest crude fatality rate: 13.2% of COVID-19 patients with cardiovascular disease died.

* For Italy, apart from hypertension, cardiovascular system ailments were also common among patients who died of COVID-19.

<H3> The requirements of ICU for COVID-19</H3>

* The model and detailed analysis in one of my notebooks [- See Here](https://www.kaggle.com/aestheteaman01/treat-trial-research-established-analysis-tally) predict the need for ICU care from the available clinical figures. This could be used to estimate ICU requirements across geographical areas.

* The model is highly interpretable: it shows that patients admitted with COVID-19 symptoms who tested negative for Rhinovirus/Enterovirus, Influenza B and Inf.A.H1N1.2009, and who presented low levels of leukocytes and platelets, were more likely to test positive for SARS-CoV-2.

* The findings for Questions A, B and C could be combined and used to predict the severity of cases.


# Assumption 1 - Analysis of the Patients and Requirements

<h3> Taking the most severe cases into account </h3>

We investigated the hospital admission rates and deaths of COVID-19 patients through the analyses and questions above. From each answer we take the most significant (highest-ranked) factors; together they indicate which COVID-19 cases are likely to need immediate hospital care.

**A patient is likely to require HCP/ICU care if:**

* The patient's age is high and he/she is obese, suffering from cardiovascular/metabolic diseases. (Based on Answers A and C)
* The patient tested negative for Rhinovirus, Influenza B and Inf.A.H1N1 and presented low levels of leukocytes and platelets. (Based on Answers B and D)
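The two criteria can be written as a boolean rule, which makes explicit what still has to be decided before it can be applied. The sketch below is hypothetical throughout: the `Patient` fields are illustrative rather than dataset columns, and the notebook gives no numeric cut-offs for "high" age or "low" leukocytes/platelets, so they must be supplied by the user. It also assumes that the first criterion means older age combined with at least one of obesity, cardiovascular disease or metabolic disease, and that either criterion on its own is enough to flag a patient.

```python
from dataclasses import dataclass

@dataclass
class Patient:
    """Hypothetical patient record; field names are illustrative, not dataset columns."""
    age: int
    obese: bool
    cardiovascular_disease: bool
    metabolic_disease: bool
    negative_respiratory_panel: bool  # negative for Rhinovirus, Influenza B and Inf.A.H1N1
    leukocytes: float
    platelets: float

def flag_for_priority_care(p: Patient, age_cutoff: int,
                           leukocyte_cutoff: float, platelet_cutoff: float) -> bool:
    """Assumption 1 as a boolean rule; all cut-offs are chosen by the caller."""
    # Criterion 1 (Answers A and C): older age plus at least one listed comorbidity
    comorbidity_rule = p.age >= age_cutoff and (
        p.obese or p.cardiovascular_disease or p.metabolic_disease)
    # Criterion 2 (Answers B and D): negative respiratory panel with low leukocytes and platelets
    lab_rule = (p.negative_respiratory_panel
                and p.leukocytes < leukocyte_cutoff
                and p.platelets < platelet_cutoff)
    return comorbidity_rule or lab_rule
```

Changing `or` to `and` in the return statement would implement the stricter reading in which both criteria must hold.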

We now take these hospitalization factors and population demographics into consideration and check the spread of COVID-19 in the USA and Canada, wrangling the UNCOVER COVID-19 dataset to learn more about these factors.

# Analysis : The Spread of COVID-19 vs Population Demographics - US

<h3> Understanding the Trends for the United States </h3>

We take the factors identified above (age, severe obesity, cardiovascular diseases and metabolic diseases) and compare them across US counties to analyze trends in the spread, using the datasets in UNCOVER COVID-19 and the COVCSD dataset I created.

<h3> Which states have the most number of positive COVID-19 Cases? </h3>

In [None]:
#Reading the dataset (fips kept as strings to preserve leading zeros)
usa_cases_tot = pd.read_csv('../input/covcsd-covid19-countries-statistical-dataset/USA_COVID_19.csv',dtype={"fips": str})

#Getting the GeoJSON File for plotting the data
with urlopen('https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json') as response:
    counties = json.load(response)

#Enabling offline Plotly rendering in the notebook
py.init_notebook_mode(connected=True)

#Log-scaling the counts (log(1 + cases)) so a few very large counties do not dominate the colour scale
usa_cases_tot['log_ConfirmedCases'] = np.log1p(usa_cases_tot.Cases)
#Zero-padding FIPS codes to 5 digits to match the GeoJSON feature ids
usa_cases_tot['fips'] = usa_cases_tot['fips'].astype(str).str.rjust(5,'0')

#Plotting the confirmed cases of COVID-19 per county
fig = px.choropleth(usa_cases_tot, geojson=counties, locations='fips', color='log_ConfirmedCases',
                           color_continuous_scale="Viridis",
                           range_color=(0, 12),
                           scope="usa")

fig.update_layout(title_text="Confirmed Cases of COVID-19 in USA - June 09, 2020")
py.iplot(fig)

<iframe src='https://flo.uri.sh/visualisation/2789652/embed' frameborder='0' scrolling='no' style='width:100%;height:600px;'></iframe><div style='width:100%!;margin-top:4px!important;text-align:right!important;'><a class='flourish-credit' href='https://public.flourish.studio/visualisation/2789652/?utm_source=embed&utm_campaign=visualisation/2789652' target='_top' style='text-decoration:none!important'><img alt='Made with Flourish' src='https://public.flourish.studio/resources/made_with_flourish.svg' style='width:105px!important;height:16px!important;border:none!important;margin:0!important;'> </a></div>

Counties along the US coastlines have the highest numbers of confirmed COVID-19 cases. The states with the most cases are shown in the bar chart above.

<h3> Do states with more cases have higher hospital admission rates? </h3>

To answer this question we compare the COVID-19 hospitalization rate of each US state against the state's confirmed COVID-19 cases.

<iframe src='https://flo.uri.sh/visualisation/2789537/embed' frameborder='0' scrolling='no' style='width:100%;height:600px;'></iframe><div style='width:100%!;margin-top:4px!important;text-align:right!important;'><a class='flourish-credit' href='https://public.flourish.studio/visualisation/2789537/?utm_source=embed&utm_campaign=visualisation/2789537' target='_top' style='text-decoration:none!important'><img alt='Made with Flourish' src='https://public.flourish.studio/resources/made_with_flourish.svg' style='width:105px!important;height:16px!important;border:none!important;margin:0!important;'> </a></div>

From this comparison we find:

* For most states, a higher hospitalization rate did not correspond to more cases.
* For the states with very high case counts (e.g. CT, NY, FL), the hospital admission rate and the positive case count were correlated.
* The overall correlation between positive cases and hospital admissions across US states is 0.65.

A correlation of 0.65 is moderate. Because the sample (US states) is small, the estimate is uncertain, but it is sufficient for the working assumptions of this analysis.
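How uncertain is a correlation of 0.65 from a few dozen states? A standard approximation is Fisher's z-transform: z = atanh(r) has standard error 1/sqrt(n - 3), so an approximate 95% interval is tanh(z ± 1.96/sqrt(n - 3)). The sketch uses only the Python standard library; `pearson_ci` is a hypothetical name, and n = 50 in the example is a placeholder for the actual number of states in the comparison.

```python
import math

def pearson_ci(r: float, n: int, z_crit: float = 1.96) -> tuple:
    """Approximate confidence interval for a Pearson correlation (Fisher z-transform)."""
    if n <= 3:
        raise ValueError('need more than 3 observations')
    if not -1 < r < 1:
        raise ValueError('r must lie strictly between -1 and 1')
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

low, high = pearson_ci(0.65, 50)  # n = 50 is a placeholder
print(f'95% CI for r = 0.65 with n = 50: ({low:.2f}, {high:.2f})')
```

The same function applies to the n = 20 heatmap correlations below, where the intervals are considerably wider.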

<h3> Are there any population demographic factors affecting hospitalization? </h3>

To answer this question we import the Hospitalization Causes dataset I prepared as part of the COVCSD dataset published on Kaggle. It combines the County Health Rankings data with the hospitalization rates computed in this notebook.

In [None]:
#Importing the dataset and removing rows without a hospitalization rate
health_indices = pd.read_csv('../input/covcsd-covid19-countries-statistical-dataset/Hospitalization Causes.csv')
health_indices.dropna(subset=['Hospitalization Rates'], axis=0, inplace=True)

#Finding the correlation matrix of the numeric columns (pairwise-complete rows)
corr_matrix = health_indices.select_dtypes(include='number').corr()

#Keeping just the correlations with the hospitalization rate
corr_matrix = corr_matrix['Hospitalization Rates']
corr_matrix = corr_matrix.to_frame()

#Dropping indicators whose correlation is undefined (e.g. constant columns)
corr_matrix.dropna(subset=['Hospitalization Rates'], axis=0, inplace=True)

#Sorting in descending order and removing the rate's correlation with itself (1.0)
corr_matrix.sort_values(by=['Hospitalization Rates'], inplace=True, ascending=False)
corr_matrix = corr_matrix.drop(index='Hospitalization Rates')
most_correlated = corr_matrix.head(15)
#tail(15) holds the lowest (most negative) correlations
least_correlated = corr_matrix.tail(15)

#Plotting the heatmap of the 15 highest correlations
fig, ax = plt.subplots(figsize=(10,10))  
ax.set_title('Most Correlated Demographic Values with Hospitalization Rates (Positive Correlation) \n\n')
sns.heatmap(most_correlated, annot=True, ax=ax)


In [None]:
#Plotting the heatmap of the 15 lowest correlations
fig, ax = plt.subplots(figsize=(10,10))  
ax.set_title('Lowest (Most Negative) Correlations of Demographic Values with Hospitalization Rates\n\n')
sns.heatmap(least_correlated, annot=True, ax=ax)

<h3> Observations Made from Heatmaps </h3>

The following are the relationships between population demographics and hospitalizations (for the US).

The sample used for these heatmaps is very small (n=20), so the correlations are not statistically robust; they can be cross-validated as more data becomes available. For the purposes of this study:

* The percentage of younger people is negatively correlated with hospitalization rates.
* The percentage of unemployed people showed a positive correlation with hospitalization rates.
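With only n = 20 observations, a permutation test is one way to check whether correlations like these could plausibly arise by chance. This is a sketch that was not part of the original analysis: `permutation_pvalue` is a hypothetical helper, and the values in the usage lines are synthetic placeholders rather than the COVCSD data.

```python
import numpy as np


def permutation_pvalue(x, y, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation between x and y.

    Shuffling y breaks any real association, so the shuffled correlations show
    how large |r| can get by chance alone for this sample size.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    r_observed = np.corrcoef(x, y)[0, 1]
    count = 0
    for _ in range(n_permutations):
        r_perm = np.corrcoef(x, rng.permutation(y))[0, 1]
        if abs(r_perm) >= abs(r_observed):
            count += 1
    # Add-one correction so the estimated p-value is never exactly zero
    return r_observed, (count + 1) / (n_permutations + 1)


# Synthetic placeholder data (hypothetical, not from the dataset)
rng = np.random.default_rng(42)
young_share = rng.normal(size=20)                        # stand-in for "% younger people"
hosp_rate = -0.8 * young_share + rng.normal(size=20)     # stand-in for hospitalization rates
r, p = permutation_pvalue(young_share, hosp_rate)
print(f"r = {r:.2f}, permutation p-value = {p:.4f}")
```

A permutation test makes no normality assumption, which suits small samples such as the 20 rows used here.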

# Relationship between Healthcare Infrastructure and Population Demographics

<h3> Does healthcare infrastructure depend on population demographics? Case study: USA </h3>

To answer this question, we look at the available hospital capacity across US states. This dataset is available in the UNCOVER COVID-19 dataset under the name Harvard Global Health Institute.

In [None]:
#Creating a dictionary of US state codes and state names

states = {"AL":"Alabama","AK":"Alaska","AZ":"Arizona","AR":"Arkansas","CA":"California","CO":"Colorado","CT":"Connecticut","DE":"Delaware","FL":"Florida","GA":"Georgia","HI":"Hawaii","ID":"Idaho","IL":"Illinois","IN":"Indiana","IA":"Iowa","KS":"Kansas","KY":"Kentucky","LA":"Louisiana","ME":"Maine","MD":"Maryland","MA":"Massachusetts","MI":"Michigan","MN":"Minnesota","MS":"Mississippi","MO":"Missouri","MT":"Montana","NE":"Nebraska","NV":"Nevada","NH":"New Hampshire","NJ":"New Jersey","NM":"New Mexico","NY":"New York","NC":"North Carolina","ND":"North Dakota","OH":"Ohio","OK":"Oklahoma","OR":"Oregon","PA":"Pennsylvania","RI":"Rhode Island","SC":"South Carolina","SD":"South Dakota","TN":"Tennessee","TX":"Texas","UT":"Utah","VT":"Vermont","VA":"Virginia","WA":"Washington","WV":"West Virginia","WI":"Wisconsin","WY":"Wyoming"}
states_data = pd.DataFrame(list(states.items()),columns=['codes','state'])

#Merging on the state name to attach a state code to each row of the COVCSD data
#(inner join: any state name that is not in the dictionary above is dropped)

blend_data = pd.merge(states_data,health_indices,on='state')
blend_data.head()

In [None]:
#Loading the dataset
hospital_data = pd.read_csv('../input/uncover/UNCOVER/harvard_global_health_institute/hospital-capacity-by-state-20-population-contracted.csv')
hospital_data.rename(columns = {'state':'codes'},inplace = True)
hospital_data = hospital_data[['codes','total_hospital_beds','total_icu_beds','hospital_bed_occupancy_rate','icu_bed_occupancy_rate']]

#Merging the datasets together
final_data = pd.merge(blend_data,hospital_data,on='codes')

#Viewing the dataset
final_data.head()
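Because `pd.merge` performs an inner join by default, any state name or code that does not match on both sides is silently dropped from `final_data`. The `indicator=True` option of `pd.merge` makes such mismatches visible. The helper `merge_with_report` below is a hypothetical sketch, shown on toy frames rather than on `blend_data` and `hospital_data`.

```python
import pandas as pd


def merge_with_report(left, right, key):
    """Outer-merge two DataFrames on `key` and report keys found on only one side.

    indicator=True adds a '_merge' column holding 'left_only', 'right_only' or
    'both', which shows exactly what an inner join would have dropped.
    """
    merged = pd.merge(left, right, on=key, how='outer', indicator=True)
    only_left = merged.loc[merged['_merge'] == 'left_only', key].tolist()
    only_right = merged.loc[merged['_merge'] == 'right_only', key].tolist()
    matched = merged[merged['_merge'] == 'both'].drop(columns='_merge')
    return matched, only_left, only_right


# Toy frames (hypothetical values, for illustration only)
left_toy = pd.DataFrame({'codes': ['NY', 'CA', 'TX']})
right_toy = pd.DataFrame({'codes': ['NY', 'CA', 'DC'], 'total_hospital_beds': [1, 2, 3]})
matched, only_in_left, only_in_right = merge_with_report(left_toy, right_toy, 'codes')
print('Missing from right:', only_in_left, '| Missing from left:', only_in_right)
```

Running the same check on the notebook's frames before the inner merge shows which states are excluded from the correlation analysis.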

<H3> So, are there any relationships? </H3>

In [None]:
#Finding the correlation matrix (numeric columns only; 'codes' and 'state' hold text)
corr_matrix = final_data.corr(numeric_only=True)

#Keeping just the 'total_hospital_beds' column of the correlation matrix
corr_matrix = corr_matrix['total_hospital_beds']
corr_matrix = corr_matrix.to_frame()

#Removing variables whose correlation could not be computed (NaN)
corr_matrix.dropna(subset=['total_hospital_beds'], axis=0, inplace=True)

#Sorting in descending order and dropping the self-correlation (1.0) of 'total_hospital_beds'
corr_matrix.sort_values(by=['total_hospital_beds'], inplace=True, ascending=False)
corr_matrix.drop('total_hospital_beds', inplace=True)

#Getting the 30 highest and the 20 lowest (most negative) correlations
most_correlated = corr_matrix.head(30)
least_correlated = corr_matrix.tail(20)

#Printing the heatmap
fig, ax = plt.subplots(figsize=(10,10))
ax.set_title('Most Correlated Demographic Values with Hospitalization Bed Infrastructure (Positive Correlation) \n\n')
sns.heatmap(most_correlated, ax=ax)

In [None]:
#Printing the heatmap of the lowest (most negative) correlations
fig, ax = plt.subplots(figsize=(10,10))
ax.set_title('Most Correlated Demographic Values with Hospitalization Bed Infrastructure (Negative Correlation)\n\n')
sns.heatmap(least_correlated, ax=ax)

<h3> Observations Made from Heatmaps </h3>

The following relationships were observed between population demographics and infrastructure availability across US states:

1. No demographic variable showed a strong negative correlation with hospital infrastructure availability.

2. Hospital infrastructure showed a high positive correlation with multiple population demographic variables, as seen in the heatmap.

The specific variables behind these patterns are shown in the heatmaps above.
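Because `total_hospital_beds` is an absolute count, larger states are likely to have more beds, so a few large states could drive high Pearson correlations on their own. One robustness check is to compare Pearson with the rank-based Spearman correlation, which pandas supports through `DataFrame.corr(method='spearman')`. The sketch below is an assumption-level check, not part of the original analysis; `compare_correlations` is a hypothetical helper and the toy values are made up to show the effect of a single outlier.

```python
import pandas as pd


def compare_correlations(df, target):
    """Pearson and Spearman correlations of every numeric column with `target`.

    Spearman works on ranks, so one very large value cannot dominate it the
    way it can dominate Pearson on raw totals.
    """
    numeric = df.select_dtypes('number')
    result = pd.DataFrame({
        'pearson': numeric.corr(method='pearson')[target],
        'spearman': numeric.corr(method='spearman')[target],
    }).drop(index=target)  # remove the trivial self-correlation
    return result.sort_values('spearman', ascending=False)


# Toy data with one outlier row (hypothetical values)
toy = pd.DataFrame({
    'total_hospital_beds': [10, 20, 30, 40, 1000],
    'demographic_x': [5, 4, 3, 2, 100],
})
print(compare_correlations(toy, 'total_hospital_beds'))
```

In the toy data, Pearson is close to 1 only because of the outlier row, while Spearman shows no monotonic relationship; a large gap between the two columns on the real data would suggest the same caution.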



# Assumption 2 - Understanding COVID-19 Deaths 

<h3> Understanding COVID-19 Deaths w.r.t. Hospitalizations: Official COVID-19 Death Tolls Still Undercount the True Number of Fatalities </h3>





<h3> How are the Deaths due to COVID-19 distributed globally? </h3>

<img src="https://i.ibb.co/pxk72r2/Annotation-2020-06-27-011905.jpg" alt="Annotation-2020-06-27-011905" border="0" align='left' width="800" height="800">

<h3> Analysis of the Graphs of Deaths due to COVID-19 </h3>

Often the cause of death takes several days to establish and report, which creates a lag in the data. And even the most complete covid-19 records will not count people who were killed by conditions that might normally have been treated, had hospitals not been overwhelmed by a surge of patients needing intensive care. - Economist  [(See Here)](https://www.economist.com/graphic-detail/2020/04/16/tracking-covid-19-excess-deaths-across-countries)

A better way to measure the damage caused by such a medical crisis is to look at “excess mortality”: the gap between the total number of people who died from any cause, and the historical average for the same place and time of year.
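The definition of excess mortality above can be sketched in pandas. This is a minimal illustration, assuming a hypothetical weekly all-cause deaths table with `year`, `week` and `deaths` columns; the baseline here is a simple average over past years, whereas EuroMOMO and The Economist may use more elaborate baselines. The helper name `weekly_excess_deaths` and the toy values are hypothetical.

```python
import pandas as pd


def weekly_excess_deaths(deaths, baseline_years, target_year):
    """Excess deaths per week: observed deaths in `target_year` minus the average
    deaths for the same week across `baseline_years`.

    `deaths` is assumed to have the columns 'year', 'week' and 'deaths'.
    """
    baseline = (deaths[deaths['year'].isin(baseline_years)]
                .groupby('week')['deaths'].mean()
                .rename('baseline'))
    observed = deaths[deaths['year'] == target_year].set_index('week')['deaths']
    # Keep only the weeks that have both an observed value and a baseline
    result = pd.concat([observed.rename('observed'), baseline], axis=1, join='inner')
    result['excess'] = result['observed'] - result['baseline']
    return result


# Toy data (hypothetical values, not EuroMOMO figures)
toy = pd.DataFrame({
    'year':   [2018, 2018, 2019, 2019, 2020, 2020],
    'week':   [12, 13, 12, 13, 12, 13],
    'deaths': [100, 110, 120, 130, 150, 200],
})
excess = weekly_excess_deaths(toy, [2018, 2019], 2020)
print(excess)
print('Total excess deaths over the window:', excess['excess'].sum())
```

Summing the `excess` column over a date window gives a cumulative figure of the same kind as the EuroMOMO estimate quoted below.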

<H3> Understanding Excess Deaths due to COVID-19 </H3>

<img src="https://i.ibb.co/Tg5Kjn9/4.jpg" alt="4" border="0" align='left' width="800" height="800">

The charts use data from EuroMOMO, a network of epidemiologists who collect weekly reports on deaths from all causes in 24 European countries, covering 350m people. Compared to the baseline average of deaths from 2009-19, the flu seasons of 2017, 2018 and 2019 were all unusually lethal. But the covid-19 pandemic, which arrived much later in the year, has already reached a higher peak—and would have been far more damaging without social-distancing measures. EuroMOMO’s figures suggest that there were about 170,000 excess deaths between March 16th and May 31st.

<h3> How are Excess Deaths of COVID-19 related to Age? </h3>

<img src="https://i.ibb.co/x6zmw9K/5.jpg" alt="5" border="0" align='left' width="1000" height="1000">


<h3> Takeaways from the Graphs Above </h3>

1. Deaths during the COVID-19 pandemic exceeded the average range of deaths recorded in previous years.
2. Death trends vary with age: fewer COVID-19 deaths were recorded in the younger age groups.

<a href="https://ibb.co/55WvtCS"><img src="https://i.ibb.co/19fKhBP/6.jpg" alt="6" border="0"></a>

<H2> Conclusions for the notebook </H2>

This notebook answered several questions using the statistics presented above. As case counts rose, the available data showed that about 10 per cent of the population studied required hospitalisation, while detailed reports on confirmed cases arrived more slowly than the positive case counts.

Among the affected age groups, the oldest group (75-84) was the most heavily affected, and the hospital admission rate decreased with decreasing age. Hypertension and obesity were the most common medical conditions among the hospitalised population, followed by metabolic and cardiovascular diseases. Most recorded deaths involved circulatory system diseases and metabolic diseases.

The available data were then used to predict whether patients with COVID-19 would require ICU care. The results also indicated that older populations, primarily those with chronic illnesses, were more prone to death from COVID-19.

<H2> The next big steps </H2>

In the next series of analyses, we plan to examine the relationship between medical parameters and hospitalization rates. On this basis we can assess how successful the "Flattening the Curve" approach has been.

I will update this notebook regularly with newer and more diverse data to analyze further trends in the spread of COVID-19 through the lens of health and clinical information. We would like to test more datasets across countries to understand how infrastructure relates to the spread of COVID-19, and will update the notebook with new findings soon.

Contact LinkedIn - https://www.linkedin.com/in/amankumar01/

Please upvote and comment if you like this notebook or have any suggestions.