# Which populations assessed for COVID-19 should stay home and which should see an HCP?

<h2> Task Details </h2>
The Roche Data Science Coalition is a group of like-minded public and private organizations with a common mission and vision to bring actionable intelligence to patients, frontline healthcare providers, institutions, supply chains, and government. The tasks associated with this dataset were developed and evaluated by global frontline healthcare providers, hospitals, suppliers, and policy makers. They represent key research questions where insights developed by the Kaggle community can be most impactful in the areas of at-risk population evaluation and capacity management. - COVID19 Uncover Challenge

# <a id='main'><h3>Table of Contents</h3></a>
- [Importing the Essential Libraries](#lib)
- [What do we actually know?](#knw)
- [Datasets used in notebook](#data)
- [Spread COVID-19 vs HCP Requirement - Case study of USA](#conc)
- [What is the hospital admission rate of COVID-19 Patients in other regions?](#tot)
- [What is the hospital admission rate of COVID-19 Patients in US?](#usa)
- [Observing the COVID-19 Clinical Spectrum Data to understand the symptoms for the reported cases of COVID-19](#clinic)
- [Which figures contribute towards a patient of COVID-19 Ending up in hospital?](#hospital)
- [The important Findings](#findings)

# <a id='lib'><h3>Importing the essential libraries</h3></a>

In [None]:
#Data Analyses Libraries
import pandas as pd                
import numpy as np    
from urllib.request import urlopen
import json
import glob
import os

#Importing Data plotting libraries
import matplotlib.pyplot as plt     
import plotly.express as px       
import plotly.offline as py       
import seaborn as sns             
import plotly.graph_objects as go 
from plotly.subplots import make_subplots
import matplotlib.ticker as ticker
import matplotlib.animation as animation

#Other Miscellaneous Libraries
import warnings
warnings.filterwarnings('ignore')
from IPython.display import HTML
import matplotlib.colors as mc
import colorsys
from random import randint
import re

# <a id='knw'><h3>What do we actually know?</h3></a>

A general trend is that patients in older age groups face a higher risk of severe COVID-19 infection than younger people. [-Reports ABC News](https://abcnews.go.com/Health/risk-severe-covid-19-increases-decade-age/story?id=69914642)

The datasets listed under this challenge draw on data from Worldometer, which reports similar figures for the age-group distribution of COVID-19 cases. Those figures also show that people with pre-existing medical conditions, such as COPD, are at higher risk from a COVID-19 infection - [See here](https://www.worldometers.info/coronavirus/coronavirus-age-sex-demographics/)

Not everyone who tests positive for COVID-19 requires HCP (Health Care Personnel). People with mild or asymptomatic cases can recover by following quarantine and social distancing measures. Patients who test positive and are in critical condition require HCP intervention. We analyze multiple datasets to understand this better and to look for correlations and evidence that support these hypotheses.

# <a id='data'><h3>Datasets used for analyses in this notebook</h3></a>

The datasets considered in this notebook are listed below:

1. Weather Data for COVID-19 Data Analysis uploaded by Davin Bonin - [See here](https://www.kaggle.com/davidbnn92/weather-data-for-covid19-data-analysis#training_data_with_weather_info_week_4.csv). This dataset contains temperature and other weather figures for the countries with confirmed COVID-19 infections. The dataset is updated through April 14, 2020.

2. UNCOVER Dataset uploaded under the UNCOVER Covid-19 Challenge.

# <a id='conc'><h3>Spread COVID-19 vs HCP Requirement - Case study of USA</h3></a>

In [None]:
#Importing the dataset
temperature_figures = pd.read_csv('../input/weather-data-for-covid19-data-analysis/training_data_with_weather_info_week_4.csv')

#Converting temperature from Fahrenheit to Celsius
temperature_figures['Temperature'] = (temperature_figures['temp']-32)*(5/9)
#Counting days from the first reporting day (day 22 of the year)
temperature_figures['Days since reported'] = temperature_figures['day_from_jan_first']-22

#Removing the columns that are not needed from the dataset.
temperature_figures.drop(['Id','Lat','Long','day_from_jan_first','wdsp', 'prcp','fog','min','max','temp'],axis=1,inplace=True)

#Viewing the dataset
temperature_figures.head()
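
The two derived columns above follow simple formulas: Celsius = (Fahrenheit - 32) × 5/9, and the day offset subtracts 22 from the day of the year. The sketch below reproduces that arithmetic on made-up readings. It assumes, as the cell above does, that `temp` is recorded in Fahrenheit; the helper names `fahrenheit_to_celsius` and `days_since_first_report` are illustrative and not part of the dataset.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a Fahrenheit reading to Celsius."""
    return (temp_f - 32) * 5 / 9


def days_since_first_report(day_from_jan_first, first_report_day=22):
    """Days elapsed since the first reporting day (day 22 is taken from the cell above)."""
    return day_from_jan_first - first_report_day


# Made-up readings, for illustration only
print(fahrenheit_to_celsius(32))      # 0.0
print(fahrenheit_to_celsius(212))     # 100.0
print(days_since_first_report(105))   # 83
```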

<iframe src='https://flo.uri.sh/visualisation/2013830/embed' frameborder='0' scrolling='no' style='width:100%;height:800px;'></iframe><div style='width:100%!;margin-top:4px!important;text-align:right!important;'><a class='flourish-credit' href='https://public.flourish.studio/visualisation/2013830/?utm_source=embed&utm_campaign=visualisation/2013830' target='_top' style='text-decoration:none!important'><img alt='Made with Flourish' src='https://public.flourish.studio/resources/made_with_flourish.svg' style='width:105px!important;height:16px!important;border:none!important;margin:0!important;'></a></div>

We observe the following trends from the hierarchical plot above:

1. Suffolk County has the highest number of confirmed COVID-19 cases.
2. Nassau, Queens, Middlesex, Westchester, Essex, Bronx, and Cook are among the counties with the highest reported COVID-19 case counts across the US.

# <a id='usa'><h3>What is the hospital admission rate of COVID-19 Patients in US?</h3></a>
 
We turn to the dataset from the COVID Tracking Project, published under the UNCOVER COVID-19 Challenge, to analyze this.

In [None]:
#Loading the dataset
us_tracker1 = pd.read_csv('../input/uncover/covid_tracking_project/covid-statistics-for-all-us-totals.csv')

#Getting the values from the first row (the totals file is expected to hold a single row;
#calling int() on a whole Series is deprecated in recent pandas versions)
us_tests_conducted = int(us_tracker1['totaltestresults'].iloc[0])
us_positive_tests = int(us_tracker1['positive'].iloc[0])
us_hospitalizations = int(us_tracker1['hospitalized'].iloc[0])

#Printing the values, with the rate rounded to two decimals
print('Out of total {} Covid-19 tests conducted in US, {} were positive and {} were hospitalized which is {:.2f}% of total confirmed cases'.format(us_tests_conducted,us_positive_tests,us_hospitalizations,(us_hospitalizations/us_positive_tests)*100))
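
The percentage printed above is hospitalized cases divided by positive cases, times 100. As a minimal sketch of that arithmetic, the helper below (the name `hospitalization_rate` and the counts are made up for illustration, not taken from the COVID Tracking Project file) also guards against a zero denominator:

```python
def hospitalization_rate(hospitalized, positive):
    """Return hospitalized cases as a percentage of positive cases."""
    if positive <= 0:
        raise ValueError("positive must be greater than zero")
    return hospitalized / positive * 100


# Made-up counts, for illustration only
example_positive = 200_000
example_hospitalized = 5_000
rate = hospitalization_rate(example_hospitalized, example_positive)
print(f"{rate:.2f}% of confirmed cases were hospitalized")  # 2.50% of confirmed cases were hospitalized
```

Note that this ratio only reflects hospitalizations that have already been reported, so it can understate the true rate while reporting lags behind case counts.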

# <a id='tot'><h3>What is the hospital admission rate of COVID-19 Patients in other regions?</h3></a>

1. Analysis of the dataset of COVID-19 hospitalization rates in Spain.

In [None]:
#Fetching the data for Spain
spain_data = pd.read_csv('../input/covcsd-covid19-countries-statistical-dataset/covid19-spain-cases.csv')

#Getting the latest data and hospital admission rate in Spain.
latest = spain_data['fecha'] == '2020-04-01'
spain_cases = spain_data[latest].copy()  #copy so the new columns are not written to a view

#Computing hospital and ICU admissions as a percentage of confirmed cases
spain_cases['Rate_general'] = (spain_cases['hospitalizados']/spain_cases['casos'])*100
spain_cases['Rate_icu'] = (spain_cases['uci']/spain_cases['casos'])*100

#Viewing the values
spain_cases.head()
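
To see how the two rate columns behave, here is a minimal sketch on made-up counts. Only the column names `casos`, `hospitalizados` and `uci` come from the Spanish dataset; the `region` label column and all numbers are hypothetical.

```python
import pandas as pd

# Made-up counts; "region" is a hypothetical label column
toy_spain = pd.DataFrame({
    "region": ["A", "B", "C"],
    "casos": [1000, 500, 200],
    "hospitalizados": [400, 100, 120],
    "uci": [50, 10, 30],
})

# Same arithmetic as the cell above: share of confirmed cases, in percent
toy_spain["Rate_general"] = toy_spain["hospitalizados"] / toy_spain["casos"] * 100
toy_spain["Rate_icu"] = toy_spain["uci"] / toy_spain["casos"] * 100

# Rank regions by hospital admission rate, highest first
ranked = toy_spain.sort_values("Rate_general", ascending=False).reset_index(drop=True)
print(ranked[["region", "Rate_general", "Rate_icu"]])
```

In this toy data, region C has the fewest cases but the highest admission rate, which is why rates and absolute counts are best read together.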

<iframe src='https://flo.uri.sh/visualisation/2019610/embed' frameborder='0' scrolling='no' style='width:100%;height:600px;'></iframe><div style='width:100%!;margin-top:4px!important;text-align:right!important;'><a class='flourish-credit' href='https://public.flourish.studio/visualisation/2019610/?utm_source=embed&utm_campaign=visualisation/2019610' target='_top' style='text-decoration:none!important'><img alt='Made with Flourish' src='https://public.flourish.studio/resources/made_with_flourish.svg' style='width:105px!important;height:16px!important;border:none!important;margin:0!important;'> </a></div>

<h3> Observations from the above dataset </h3>

1. Out of the total confirmed COVID-19 cases in the US, the hospital admission rate was 2.35%. This rate is notably low, as the detailed reports on confirmed cases arrive at a slower pace than the positive case counts.

2. Cataluña has the highest hospital admission rate for COVID-19 cases in Spain. We would further analyze the demographic-wise counts to search for possible trends, if any.

# <a id='clinic'><h3>Observing the COVID-19 Clinical Spectrum Data to understand the symptoms for the reported cases of COVID-19</h3></a>

Understanding the symptoms of COVID-19 in people who tested positive for the virus:

We draw on analyses available on the web to answer the underlying questions about the symptoms reported for COVID-19.


<img src="https://www.statista.com/graphic/1/1105492/china-common-symptoms-of-coronavirus-covid-19-patients.jpg" alt="Statistic: Breakdown of 55,924 sample patients infected with novel coronavirus COVID-19 in China as of February 22, 2020, by symptom | Statista" style="width: 100%; height: auto !important; max-width:1000px;-ms-interpolation-mode: bicubic;"/>

In [None]:
#Importing the clinical spectrum data
clinical_spectrum = pd.read_csv('../input/uncover/einstein/diagnosis-of-covid-19-and-its-clinical-spectrum.csv')

#Filtering the data to contain the values only for the confirmed COVID-19 Tests
confirmed = clinical_spectrum['sars_cov_2_exam_result'] == 'positive'
clinical_spectrum = clinical_spectrum[confirmed]

#Viewing the first rows of the dataset
clinical_spectrum.head()

<h3> Understanding the Clinical Spectrum data </h3>

This dataset comes from Hospital Israelita Albert Einstein in São Paulo, Brazil, not from the US; the `us_` prefix in the variable names below is only a naming leftover. We use multiple plots to understand the clinical spectrum data and identify the most important variables that led to patients being admitted to hospital. We split the data into patients who were hospitalized and patients who were not, then compare both subsets across the available values to find any differences.

In [None]:
#Filtering the datasets
hospitalized_condition = clinical_spectrum['patient_addmited_to_regular_ward_1_yes_0_no'] == 't'
us_hospitalized_spectra = clinical_spectrum[hospitalized_condition]


unhospitalized_condition = clinical_spectrum['patient_addmited_to_regular_ward_1_yes_0_no'] == 'f'
us_unhospitalized_spectra = clinical_spectrum[unhospitalized_condition]

#Taking mean value of the numeric spectra columns (numeric_only avoids errors on text columns in recent pandas)
hospitalized_mean = us_hospitalized_spectra.mean(axis = 0, skipna = True, numeric_only = True) 
unhospitalized_mean = us_unhospitalized_spectra.mean(axis = 0, skipna = True, numeric_only = True) 

In [None]:
#Making columns for the dataset
hospitalized_mean = hospitalized_mean.to_frame()
hospitalized_mean = hospitalized_mean.reset_index()
hospitalized_mean.columns = ['Parameter','Hospitalized_figures']

unhospitalized_mean = unhospitalized_mean.to_frame()
unhospitalized_mean = unhospitalized_mean.reset_index()
unhospitalized_mean.columns = ['Parameter','Unhospitalized_figures']

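#Adding the unhospitalized means as a new column
#(assumes both frames list the same parameters in the same order)
hospitalized_mean['Unhospitalized_figures'] = unhospitalized_mean['Unhospitalized_figures']

#Dropping parameters with missing means (dropna returns a new frame, so it must be assigned)
hospitalized_mean = hospitalized_mean.dropna()
hospitalized_mean.head()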
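
The comparison table above can also be built by joining the two sets of means on the parameter name rather than by position. The sketch below shows that variant on made-up values; the column names `admitted`, `marker_a` and `marker_b` and the helper `group_means` are hypothetical and do not come from the clinical spectrum dataset.

```python
import pandas as pd

# Made-up values; column names are hypothetical
toy = pd.DataFrame({
    "admitted": ["t", "t", "f", "f", "f"],
    "marker_a": [1.0, 3.0, 0.0, 1.0, 2.0],
    "marker_b": [-2.0, None, 1.0, 1.0, 4.0],
})


def group_means(frame, flag, label):
    """Mean of every numeric column for rows where admitted == flag."""
    means = frame[frame["admitted"] == flag].mean(numeric_only=True)
    return means.rename_axis("Parameter").reset_index(name=label)


hospitalized = group_means(toy, "t", "Hospitalized_figures")
unhospitalized = group_means(toy, "f", "Unhospitalized_figures")

# Joining on Parameter keeps rows aligned even if the two tables differ in order
comparison = hospitalized.merge(unhospitalized, on="Parameter").dropna()
print(comparison)
```

Joining on a key is a little more verbose than assigning a column, but it does not depend on both tables having identical row order.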

# <a id='hospital'><h3>Which figures contribute towards a patient of COVID-19 Ending up in hospital?</h3></a>

<h3> Does age play a role in the hospital admission rates? </h3>

<img src="https://www.statista.com/graphic/1/1105402/covid-hospitalization-rates-us-by-age-group.jpg" alt="Statistic: Percentage of COVID-19 cases in the United States from February 12 to March 16, 2020 that resulted in hospitalization, by age group* | Statista" style="width: 100%; height: auto !important; max-width:1000px;-ms-interpolation-mode: bicubic;"/>

<h3> Findings from the graph above </h3>

1. In the United States, the oldest age groups have the highest percentage of hospital admissions.
2. Further research is needed once datasets become available with the symptomatic factors that are driving this trend.

<h3> Does age also define the ICU admission rates of COVID-19 patients in the US? </h3>

<img src="https://www.statista.com/graphic/1/1105420/covid-icu-admission-rates-us-by-age-group.jpg" alt="Statistic: Percentage of COVID-19 cases in the United States from February 12 to March 16, 2020 that required intensive care unit (ICU) admission, by age group* | Statista" style="width: 100%; height: auto !important; max-width:1000px;-ms-interpolation-mode: bicubic;"/>

Similar trends are observed for the ICU admission rates of COVID-19 patients as for the hospital admission rates shown above.

<h3> The definitive clinical Factors </h3>

Which of the available clinical factors indicate who should seek HCP/hospital care and who need not?

In [None]:
#The most important clinical factors
hospitalized_mean['Change'] =  hospitalized_mean['Hospitalized_figures'] - hospitalized_mean['Unhospitalized_figures']
hospitalized_mean.sort_values(['Change'], axis=0, ascending=True, inplace=True) 

#Getting to know the health factors that define HCP Requirement for a patient
lower = hospitalized_mean.head(10)
higher = hospitalized_mean.tail(10)

#Printing the values
for i in lower['Parameter']:
    print('For lower value of {}, the patient may require HCP'.format(i))
    
for i in higher['Parameter']:
    print('For higher value of {}, the patient may require HCP'.format(i))
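
One thing to watch when ranking by a difference column is that `sort_values` places missing values last by default, so `tail()` can pick up rows that have no difference at all. The sketch below shows the same ranking idea on made-up group means with the missing rows removed first; the parameter names are hypothetical.

```python
import pandas as pd

# Made-up group means; parameter names are hypothetical
toy_means = pd.DataFrame({
    "Parameter": ["marker_a", "marker_b", "marker_c", "marker_d"],
    "Hospitalized_figures": [0.9, -1.2, 0.1, None],
    "Unhospitalized_figures": [0.1, 0.3, 0.0, 0.2],
})

toy_means["Change"] = toy_means["Hospitalized_figures"] - toy_means["Unhospitalized_figures"]

# Drop rows without a difference before sorting in ascending order
ranked = toy_means.dropna(subset=["Change"]).sort_values("Change")

lowest = ranked.head(1)["Parameter"].tolist()
highest = ranked.tail(1)["Parameter"].tolist()
print("Lower in hospitalized patients:", lowest)
print("Higher in hospitalized patients:", highest)
```

A difference in raw means is only a first screening step; it shows which parameters differ between the groups, not that they cause hospital admission.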


<iframe src='https://flo.uri.sh/visualisation/2016047/embed' frameborder='0' scrolling='no' style='width:100%;height:600px;'></iframe><div style='width:100%!;margin-top:4px!important;text-align:right!important;'><a class='flourish-credit' href='https://public.flourish.studio/visualisation/2016047/?utm_source=embed&utm_campaign=visualisation/2016047' target='_top' style='text-decoration:none!important'><img alt='Made with Flourish' src='https://public.flourish.studio/resources/made_with_flourish.svg' style='width:105px!important;height:16px!important;border:none!important;margin:0!important;'> </a></div>

<h3> Getting the news </h3>

We look at a recent article published in The New York Times - [Available here](https://www.nytimes.com/2020/04/20/opinion/coronavirus-testing-pneumonia.html). The article states:

Pneumonia caused by the coronavirus has had a stunning impact on the city’s hospital system. Normally an E.R. has a mix of patients with conditions ranging from the serious, such as heart attacks, strokes and traumatic injuries, to the nonlife-threatening, such as minor lacerations, intoxication, orthopedic injuries and migraine headaches.

These patients did not report any sensation of breathing problems, even though their chest X-rays showed diffuse pneumonia and their oxygen was below normal. We are just beginning to recognize that Covid pneumonia initially causes a form of oxygen deprivation we call “silent hypoxia” — “silent” because of its insidious, hard-to-detect nature.

Pneumonia is an infection of the lungs in which the air sacs fill with fluid or pus. Normally, patients develop chest discomfort, pain with breathing and other breathing problems. But when Covid pneumonia first strikes, patients don’t feel short of breath, even as their oxygen levels fall. And by the time they do, they have alarmingly low oxygen levels and moderate-to-severe pneumonia (as seen on chest X-rays).


# <a id='findings'><h3>The findings from the Analyses</h3></a>

1. Patients who are admitted to the hospital after testing positive for COVID-19 have a higher age quantile than those who stay at home and don't require HCP.

2. When the levels of the following constituents in the body are low and the person has tested +ve for COVID-19, he/she may require HCP. The constituents are:

   - Rods
   - Monocytes
   - Aspartate_transaminase
   - Po2_venous_blood_gas
   - Base_excess_arterial_blood_gas
   - Alanine_transaminase
   - hco3_arterial_blood_gas_analysis
   - Total_co2_arterial_blood_gas_analysis
   - Ionized_calcium
   
   
3. When the levels of the following constituents in the body are high and the person has tested +ve for COVID-19, he/she may require HCP. The constituents are:

   - Phosphorus
   - fio2_venous_blood_gas_analysis
   - partial_thromboplastin_time_ptt
   - prothrombin_time_pt_activity
   - vitamin_b12
   - d_dimer
   
4. Elderly patients who test +ve for COVID-19 and whose levels of the above-mentioned constituents are not normal must seek care immediately, as trends show that cases in this age group tend to become lethal. ICU care might therefore be required for these patients.

**The above-mentioned clinical figures are related to pneumonia. If these figures go undetected in the population, they can be a main source of COVID-19 deaths, because patients may be hospitalized only once they have become very seriously ill. Early detection of the figures above could be a good way to reduce deaths by a significant number.**

**Early detection of these ailments can also help reduce lethal cases and might help reduce the number of patients who require HCP/ICU care for proper treatment.**

<h3> The next big steps </h3>
This notebook will be regularly updated with newer and more diverse data to analyze further trends in the spread of COVID-19 and to understand it in terms of health and clinical information. I would love to test further on more datasets across countries and will update the notebook with the new findings.

Contact LinkedIn - https://www.linkedin.com/in/amankumar01/

Do upvote and comment if you like or wish to suggest something.