<h1 align="center"> Covid-19: Who should get vaccine first? </h1>
<img align="center" src="https://i.ibb.co/L6zMgcd/vac.jpg" alt="vac" border="0">

<h2> I. Introduction : </h2> 

Data used in this Notebook are taken from [Our World in Data](https://ourworldindata.org/coronavirus-source-data) and the CDC.

When the Food and Drug Administration (FDA) authorizes or approves a COVID-19 vaccine, the Advisory Committee on Immunization Practices (ACIP) quickly holds a public meeting to review all available data about that vaccine. From these data, ACIP then votes on whether to recommend the vaccine and, if so, who should receive it.

On December 1, 2020, ACIP recommended that health care personnel and residents of long-term care facilities be offered COVID-19 vaccine in the initial phase of the vaccination program.

On December 11, 2020, the FDA issued an Emergency Use Authorization for use of the Pfizer-BioNTech COVID-19 vaccine in persons aged 16 years and older for the prevention of COVID-19.

On December 13, 2020, the ACIP issued recommendations for the use of Pfizer-BioNTech’s COVID-19 vaccine for the prevention of COVID-19.

On December 18, 2020, the FDA issued an Emergency Use Authorization for the use of the Moderna COVID-19 vaccine in individuals 18 years of age and older.

On December 20, 2020, ACIP issued recommendations for the use of Moderna COVID-19 vaccine for the prevention of COVID-19.

On December 20, 2020, ACIP updated interim vaccine allocation recommendations.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

<h2> LIST OF TOPICS IN THIS STUDY </h2>

[Data taken from CDC](https://www.cdc.gov/nchs/nvss/vsrr/covid_weekly/index.htm)
    
+ Age and sex 
+ Race and Hispanic origin
+ Comorbidities

Let's start by importing the libraries that will be used in this Notebook.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import plotly.express as px
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")

And then the data.

In [None]:
df1 = pd.read_csv('../input/conditions-contributing-to-deaths-covid19/owid-covid-data.csv')
df2 = pd.read_csv('../input/conditions-contributing-to-deaths-covid19/Provisional_COVID-19_Death_Counts_by_Sex__Age__and_State.csv')
df3 = pd.read_csv('../input/conditions-contributing-to-deaths-covid19/Deaths_involving_coronavirus_disease_2019__COVID-19__by_race_and_Hispanic_origin_group_and_age__by_state.csv')
df4 = pd.read_csv('../input/conditions-contributing-to-deaths-covid19/Conditions_contributing_to_deaths_involving_coronavirus_disease_2019__COVID-19___by_age_group_and_state__United_States.csv')

Before studying how the underlying conditions of Covid-19 patients affect the death rate, we summarize the number of Covid-19 cases and deaths worldwide.

In [None]:
df1.head(2)

In [None]:
df2.head(10)

<h2> II. Covid-19 new cases and deaths worldwide </h2>

In [None]:
df1.columns

In [None]:
df1 = pd.DataFrame(df1, columns=['iso_code', 'continent', 'location', 'date', 'total_cases', 'new_cases','total_deaths', 
                                 'new_deaths','total_vaccinations', 'total_vaccinations_per_hundred', 'population',
                                 'median_age', 'aged_65_older', 'aged_70_older','diabetes_prevalence', 'female_smokers', 
                                 'male_smokers'])

In [None]:
df1.describe().T

In [None]:
df1['date'] = pd.to_datetime(df1['date'])
df1 = df1.iloc[(df1['location']!='World').values]
df1.head(2)

In [None]:
fig = px.choropleth(df1, locations="location", color=df1["total_cases"],locationmode='country names', 
                    hover_name="location",animation_frame=df1["date"].dt.strftime('%Y-%m-%d'), # one animation frame per day
                    title='Total confirmed cases over time updated to 10 Apr 2021', color_continuous_scale=px.colors.sequential.matter)
fig.update(layout_coloraxis_showscale=True)
fig.show()

The number of new covid-19 infections globally is still increasing day by day, and global deaths have not stopped. As of December 26, 2020, the number of infections worldwide was 80,640,724 with 1,763,912 deaths.
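
One way to put these two totals in proportion is a crude case fatality ratio (confirmed deaths divided by confirmed cases). The sketch below uses only the two figures quoted above. The function name `case_fatality_ratio` is illustrative, and the ratio is only a rough indicator because it ignores undetected infections and reporting delays.

```python
def case_fatality_ratio(deaths: int, cases: int) -> float:
    """Crude case fatality ratio: confirmed deaths / confirmed cases."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return deaths / cases

# Worldwide totals quoted above for December 26, 2020
world_cases = 80_640_724
world_deaths = 1_763_912

cfr = case_fatality_ratio(world_deaths, world_cases)
print("Crude case fatality ratio: {:.2%}".format(cfr))
```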

In [None]:
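# Summing the daily new_deaths per country gives its cumulative deaths;
# numeric_only=True skips the date and text columns, which cannot be summed
px.choropleth(df1.groupby(by='location').sum(numeric_only=True).reset_index(), locations="location", locationmode='country names', 
              color='new_deaths',hover_name="location", title='Cumulative deaths due to Covid-19 (sum of new_deaths) updated on 10 Apr 2021', hover_data=['new_deaths'], 
              color_continuous_scale='matter')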

The US, Brazil and India are the three most severely affected countries in the world.
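
A ranking like this can be reproduced by summing the daily counts per country. The following is a minimal sketch on a small made-up table (the numbers are placeholders, not real data). `top_locations` is a hypothetical helper, and the column names follow those of `owid-covid-data.csv` used above.

```python
import pandas as pd

def top_locations(df: pd.DataFrame, col: str, n: int = 3) -> pd.Series:
    """Sum a daily column (e.g. 'new_cases') per location and return the n largest totals."""
    totals = df.groupby('location')[col].sum()
    return totals.nlargest(n)

# Made-up placeholder values, only to show the mechanics
sample = pd.DataFrame({
    'location': ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'],
    'new_cases': [10, 20, 5, 5, 40, 1, 2, 3],
})
ranking = top_locations(sample, 'new_cases', n=3)
print(ranking)
```

On the notebook's data, the equivalent call would be `top_locations(df1, 'new_cases')`.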

In [None]:
col = 'new_cases'
temp = df1[df1[col]>0].sort_values(col, ascending=True)
fig = px.scatter(temp, x='date', y='location', size=col, color=col, height=3500,color_continuous_scale='Viridis')
fig.update_layout(yaxis = dict(dtick = 1))
fig.update(layout_coloraxis_showscale=False)
fig.show()

In [None]:
# numeric_only=True skips the date and text columns, which cannot be summed
temp = df1.groupby(by='location').sum(numeric_only=True).reset_index().sort_values('new_cases', ascending=False).iloc[:10, :]
fig = px.scatter(temp, x='new_cases', y='new_deaths', color='location', size='new_cases', height=800, text='location', 
                 log_x=True, log_y=True,title='Deaths vs new cases of top 10 countries updated on 10 Apr 2021')
fig.update_traces(textposition='top center')
fig.update_layout(showlegend=False)
fig.update_layout(xaxis_rangeslider_visible=True)
fig.show()

Although Brazil has recorded fewer Covid-19 infections than India, it has recorded more Covid-19 deaths.

<h2> III. COVID-19 vaccine </h2>

When a new flu strain is identified, like H1N1 in 2009, vaccine manufacturers can use the same processes that are used to make the annual seasonal flu vaccine, saving valuable time. Unlike flu, coronaviruses do not yet have licensed vaccines or processes to build on. In addition, the coronavirus that causes COVID-19 is a new virus, so entirely new vaccines must be developed and tested to ensure they work and are safe. There are many steps in the vaccine testing and approval process. Multiple agencies and groups in the United States are working together to make sure that a safe and effective COVID-19 vaccine is available as quickly as possible.

Multiple COVID-19 vaccines are under development. Large-scale (Phase 3) clinical trials are in progress or being planned for five COVID-19 vaccines in the United States.

At first, there will be a limited supply of COVID-19 vaccine. Operation Warp Speed is working to get those first vaccine doses out once a vaccine is authorized or approved and recommended, rather than waiting until there is enough vaccine for everyone. However, it is important that the initial supplies of vaccine are given to people in a fair, ethical, and transparent way.

Since the pandemic began, ACIP has been holding special meetings to review U.S. data on COVID-19 and the vaccines in development to help prevent it. Before making recommendations, ACIP plans to review all available clinical trial information, including descriptions of:

+ Who is receiving each candidate vaccine (age, race, ethnicity, underlying medical conditions)
+ How different groups respond to the vaccine
+ Side effects experienced

But remember that COVID-19 is [not ‘just another flu’](https://www.reuters.com/article/uk-factcheck-not-all-covid19-victims-had/fact-check-not-all-covid-19-victims-had-underlying-health-conditions-new-coronavirus-is-not-just-another-flu-idUSKBN28A2QE).

So, stopping a pandemic requires using all the tools available. 

Vaccines work with your immune system so your body will be ready to fight the virus if you are exposed. Other steps, like covering your mouth and nose with a mask and staying at least 6 feet away from others, help reduce your chance of being exposed to the virus or spreading it to others. 

Goals for vaccination if supply is limited:

+ Decrease death and serious disease as much as possible

+ Preserve functioning of society

+ Reduce the extra burden the disease is having on people already facing disparities

Fortunately, by December 2020 the world saw a light at the end of the Covid-19 tunnel: two companies were producing vaccines and a number of other companies were running commercial trials. Although the supply of vaccines is limited compared to expectations, [the world is still hoping to have enough vaccines for everyone during 2021](https://www.raps.org/news-and-articles/news-articles/2020/3/covid-19-vaccine-tracker).

In [None]:
# total_vaccinations is already cumulative, so take its maximum per country instead of summing it over dates
temp = (df1.groupby(by='location').agg({'new_cases': 'sum', 'total_vaccinations': 'max'})
           .reset_index().sort_values('total_vaccinations', ascending=False).iloc[:20, :])
fig = px.scatter(temp, x='new_cases', y='total_vaccinations', color='location', size='new_cases', height=800, text='location', 
                 log_x=True, log_y=True,title='Total vaccinations vs total cases updated on 10 Apr 2021')
fig.update_traces(textposition='top center')
fig.update_layout(showlegend=False)
fig.update_layout(xaxis_rangeslider_visible=True)
fig.show()

<h2> IV. Who should get the vaccine first? </h2>

Which groups should be considered for early vaccination if supply is limited? While we wait for enough vaccine for everyone, we analyze who should be vaccinated first.
Pending an official recommendation, four groups may be recommended for early COVID-19 vaccination if supply is limited:

+ Healthcare personnel

+ Workers in essential and critical industries

+ People at high risk for severe COVID-19 illness due to underlying medical conditions

+ People 65 years and older

What else?

<h3> 1. Underlying health condition (Pneumonia, Influenza) and Covid-19 </h3>

Early on in the coronavirus disease (COVID-19) pandemic, there was little data on the virus and how it affects the body. As the virus spread across the globe, data showed that some people were at a higher risk of developing severe disease and dying from the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection.

The high-risk individuals are those who are older than 65 years old, those who have weakened immune systems, and those with underlying health conditions, such as heart disease, hypertension, diabetes, obesity, kidney disease, and lung disease, among others.

Now, the U.S. Centers for Disease Control and Prevention (CDC) reports that a majority of deaths in the United States tied to COVID-19 had contributing conditions or comorbidities. Let's start to analyse the data.

In [None]:
df2.head(3)

In [None]:
df2.columns

In [None]:
#df2 = df2.drop(['Footnote'], axis=1)
options = ['0-17 years', '18-29 years', '30-39 years', '40-49 years', '50-64 years', '65-74 years', '75-84 years', '85 years and over']
df2 = df2.iloc[(df2['State']!='United States').values]
df2 = df2[df2['Age Group'].isin(options)]
df2.head(2)

In [None]:
# Map each jurisdiction name to its two-letter postal code with an explicit dictionary,
# so every name is paired with its own code regardless of list ordering.
# New York City is listed as a separate jurisdiction in this dataset; assigning it 'NY' is a choice made here for the state map.
state_codes = {
    'Alabama': 'AL', 'Alaska': 'AK', 'Arizona': 'AZ', 'Arkansas': 'AR', 'California': 'CA', 'Colorado': 'CO',
    'Connecticut': 'CT', 'Delaware': 'DE', 'District of Columbia': 'DC', 'Florida': 'FL', 'Georgia': 'GA',
    'Hawaii': 'HI', 'Idaho': 'ID', 'Illinois': 'IL', 'Indiana': 'IN', 'Iowa': 'IA', 'Kansas': 'KS',
    'Kentucky': 'KY', 'Louisiana': 'LA', 'Maine': 'ME', 'Maryland': 'MD', 'Massachusetts': 'MA',
    'Michigan': 'MI', 'Minnesota': 'MN', 'Mississippi': 'MS', 'Missouri': 'MO', 'Montana': 'MT',
    'Nebraska': 'NE', 'Nevada': 'NV', 'New Hampshire': 'NH', 'New Jersey': 'NJ', 'New Mexico': 'NM',
    'New York': 'NY', 'New York City': 'NY', 'North Carolina': 'NC', 'North Dakota': 'ND', 'Ohio': 'OH',
    'Oklahoma': 'OK', 'Oregon': 'OR', 'Pennsylvania': 'PA', 'Rhode Island': 'RI', 'South Carolina': 'SC',
    'South Dakota': 'SD', 'Tennessee': 'TN', 'Texas': 'TX', 'Utah': 'UT', 'Vermont': 'VT', 'Virginia': 'VA',
    'Washington': 'WA', 'West Virginia': 'WV', 'Wisconsin': 'WI', 'Wyoming': 'WY', 'Puerto Rico': 'PR'}
# Names missing from the dictionary keep the original placeholder value 'black'
df2['Code'] = df2['State'].map(state_codes).fillna('black')

In [None]:
prop = 'COVID-19 Deaths'
fig = px.choropleth(df2.groupby(by='Code').sum().reset_index(),  # Input Pandas DataFrame
                    locations="Code",  # DataFrame column with locations
                    color=prop,  # DataFrame column with color values
                    hover_name='Code', # DataFrame column hover info
                    locationmode = 'USA-states') # Set to plot as US States
fig.update_layout(title_text = prop, # Create a Title
    geo_scope='usa')  # Plot only the USA instead of globe
fig.show()  # Output the plot to the screen

Deaths involving coronavirus disease 2019 (COVID-19), pneumonia, and influenza reported to NCHS by sex and age group. United States. Week ending 2/1/2020 to 1/12/2021.

Thus, although New York was the early epicenter of the epidemic in the United States, TX, NC, CA and FL are the states with the highest number of deaths.
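
Because the map aggregates by postal code, these state totals depend on every jurisdiction name resolving to its own code. A keyed lookup makes that pairing explicit, and it makes unmapped names easy to detect. The following is a minimal sketch on a made-up four-row table; `add_state_codes` and the small `codes` dictionary are illustrative names, not part of the dataset.

```python
import pandas as pd

def add_state_codes(df: pd.DataFrame, mapping: dict) -> pd.DataFrame:
    """Return a copy of df with a 'Code' column looked up from the 'State' column."""
    out = df.copy()
    out['Code'] = out['State'].map(mapping)
    return out

# Small illustrative mapping (a subset, for the example only)
codes = {'New York': 'NY', 'North Carolina': 'NC', 'Texas': 'TX'}
sample = pd.DataFrame({'State': ['New York', 'New York City', 'North Carolina', 'Texas']})

coded = add_state_codes(sample, codes)
# Names without a code show up as NaN, which makes gaps easy to spot
unmapped = coded.loc[coded['Code'].isna(), 'State'].tolist()
print(unmapped)
```

Checking the unmapped names before plotting helps confirm that no jurisdiction is silently counted under another state's code.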

In [None]:
x0 = 'Age Group'
x1 = 'Total Deaths'
x2 = 'COVID-19 Deaths'
x3 = 'Pneumonia Deaths'
x4 = 'Pneumonia and COVID-19 Deaths'
x5 = 'Influenza Deaths'
x6 = 'Pneumonia, Influenza, or COVID-19 Deaths'
temp = df2.groupby(by=x0).sum().reset_index()
fig = go.Figure(data=[
    go.Bar(name=x2, x=temp[x0], y=temp[x2]),
    go.Bar(name=x3, x=temp[x0], y=temp[x3]),
    go.Bar(name=x4, x=temp[x0], y=temp[x4]),
    go.Bar(name=x5, x=temp[x0], y=temp[x5]),
    go.Bar(name=x6, x=temp[x0], y=temp[x6])])
fig.update_layout(barmode='group')
fig.update_layout(legend=dict(orientation="h",
    yanchor="bottom",y=1.02,xanchor="right",x=1))
fig.show()

In [None]:
px.bar(df2.groupby(by='Age Group').sum().reset_index(),x='Age Group', y='Pneumonia and COVID-19 Deaths')

In [None]:
import plotly.graph_objects as go
temp = df2.groupby(by='Age Group').sum().reset_index()
# Plot the groups aged 30 and over, reading the labels from the data so that each value is paired with its own age group
temp = temp[~temp['Age Group'].isin(['0-17 years', '18-29 years'])]
fig = go.Figure(data=go.Scatterpolar(r=temp['Pneumonia and COVID-19 Deaths'], theta=temp['Age Group'], fill='toself'))
fig.update_layout(polar=dict(radialaxis=dict(visible=True),),showlegend=False,title="Pneumonia and COVID-19 Deaths")
fig.show()

In [None]:
fig = go.Figure(data=[go.Pie(labels=df2['Age Group'], values=df2['Total Deaths'], hole=.3)])
fig.update_layout(legend=dict(orientation="h",
    yanchor="bottom",y=1.02,xanchor="right",x=1))
fig.show()

Deaths due to pneumonia and/or Covid-19 occur mainly in patients over 50 years of age and are most frequent in patients over 75. The high-risk group is therefore NOT limited to those over 65, as previously announced by the CDC.
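
To quantify how deaths are distributed across age groups, one option is to express each group's total as a percentage of all deaths. The sketch below shows the idea on made-up numbers; `share_by_group` is a hypothetical helper, and the column names mirror those of `df2`.

```python
import pandas as pd

def share_by_group(df: pd.DataFrame, group_col: str, value_col: str) -> pd.Series:
    """Percentage of the value_col total contributed by each group."""
    totals = df.groupby(group_col)[value_col].sum()
    return (100 * totals / totals.sum()).round(1)

# Made-up placeholder values, only to show the mechanics
sample = pd.DataFrame({
    'Age Group': ['0-17 years', '50-64 years', '65-74 years', '75-84 years', '50-64 years'],
    'COVID-19 Deaths': [1, 20, 30, 40, 9],
})
shares = share_by_group(sample, 'Age Group', 'COVID-19 Deaths')
print(shares)
```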

In [None]:
plt.figure(figsize=(15,10))
xprop = 'Age Group'
yprop = 'COVID-19 Deaths'
sns.boxplot(data=df2, x=xprop, y=yprop, hue='Sex')
plt.xlabel('{} range'.format(xprop), size=14)
plt.ylabel('Number of {}'.format(yprop), size=14)
plt.title('Boxplot of {}'.format(yprop), size=20)
plt.show()

At all ages, men have a higher mortality than women, except among those over 85, where the majority of deaths are female. There are a lot of outliers for ages 50+; these could be due to other serious diseases or specific reasons, NOT just the combination of Covid-19 and pneumonia/influenza.
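
The points drawn beyond the whiskers of a box plot are, by the default convention in matplotlib and seaborn, values lying more than 1.5 times the interquartile range (IQR) outside the quartiles. The following sketch applies that rule to a made-up series; `iqr_outliers` is an illustrative helper and is not used elsewhere in this Notebook.

```python
import pandas as pd

def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying more than k * IQR outside the quartiles."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    mask = (values < q1 - k * iqr) | (values > q3 + k * iqr)
    return values[mask]

# Made-up counts: one value is far above the rest
sample = pd.Series([10, 12, 11, 13, 12, 11, 95])
print(iqr_outliers(sample).tolist())
```

Applied per age group, for example with `df2.groupby('Age Group')['COVID-19 Deaths'].apply(iqr_outliers)`, this would list which rows sit outside the whiskers.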

In [None]:
px.violin(df2, y='Pneumonia, Influenza, or COVID-19 Deaths', x='Age Group', color=None, 
          box=True, points="all", hover_data=df2.columns)

In [None]:
yprop = 'Pneumonia and COVID-19 Deaths'
xprop = 'Pneumonia Deaths'
h= 'Age Group'
px.scatter(df2, x=xprop, y=yprop, color=h, marginal_y="violin", marginal_x="box", trendline="ols", template="simple_white")

In [None]:
yprop = 'Pneumonia, Influenza, or COVID-19 Deaths'
xprop = 'Influenza Deaths'
h= 'Age Group'
px.scatter(df2, x=xprop, y=yprop, color=h, marginal_y="violin", marginal_x="box", trendline="ols", template="simple_white")

In [None]:
x0 = 'Age Group'
x1 = 'Pneumonia Deaths'
x2 = 'Pneumonia and COVID-19 Deaths'

temp = df2.groupby(by=x0).sum().reset_index()
x = temp[x1]
y = temp[x2]
fig, ax = plt.subplots(figsize=(15,15))
for xi,yi in zip(x,y):
    r2i = xi / (xi + yi)
    r1i = 1 - r2i
    print(r1i)
    ax.pie([r1i,r2i], colors=['indigo', "gold"],
           center=(xi, yi), radius=xi/20, 
           wedgeprops=dict(width=(xi/20)/2), frame=True)
ax.autoscale()
plt.show()

<h3> 2. Race and Hispanic origin and Covid-19 </h3>

In this section, the coronavirus disease 2019 (COVID-19) deaths are based on a current flow of mortality data in the United States updated on 15 Feb, 2021 (Counts include deaths occurring within the 50 states and the District of Columbia).

We next study the indicators that can be used to illustrate potential differences in the burden of deaths due to COVID-19 reported for each race and Hispanic origin group.

In [None]:
df3.head(2)

In [None]:
df3['Age group'].unique()

In [None]:
options = ['0-17 years','18-29 years','30-49 years', '50-64 years', '65-74 years', '75-84 years', '85 years and over']
df3 = df3.iloc[(df3['State']!='United States').values]
df3 = df3[df3['Age group'].isin(options)]
df3.head(2)

In [None]:
x0 = 'Race and Hispanic Origin Group'
x1 = 'Total Deaths'
x2 = 'COVID-19 Deaths'
x3 = 'Pneumonia Deaths'
x4 = 'Pneumonia and COVID-19 Deaths'
x5 = 'Influenza Deaths'
x6 = 'Pneumonia, Influenza, or COVID-19 Deaths'
px.scatter(df3.groupby(by=x0).sum().reset_index(), x=x0, y=x1, color=x4, size=x3)

Among all people who died and for whom the CDC has race and ethnicity information:

+ 9.84 percent are Hispanic, while Hispanics make up 18 percent of the U.S. population;
+ 13 percent are Black, while Black people constitute 13 percent of the population;
+ and 0.675 percent are Native American or Alaska Native, nearly double their representation in the overall population.

In [None]:
fig = go.Figure(data=[go.Pie(labels=df3['Race and Hispanic Origin Group'], values=df3['Total Deaths'], hole=.3)])
fig.update_layout(legend=dict(orientation="h",yanchor="bottom",y=1.02,xanchor="right",x=1))
fig.show()

+ This analysis also shows that the non-Hispanic White population accounts for the majority of deaths related to COVID-19 (73 percent), while making up 60 percent of the U.S. population.
+ Hispanics and Blacks have a disproportionately high percentage of deaths from COVID-19. The mortality rate for Asians, in contrast, is lower than their percentage of the population.
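
A simple way to compare a group's share of deaths with its share of the population is the ratio of the two. The sketch below applies it to the percentages quoted above, taken at face value; `representation_ratio` is an illustrative helper, and the result is only as reliable as the quoted figures.

```python
def representation_ratio(share_of_deaths: float, share_of_population: float) -> float:
    """Ratio above 1: over-represented among deaths; below 1: under-represented."""
    return share_of_deaths / share_of_population

# Percentages quoted in the text above: (share of deaths, share of U.S. population)
quoted = {
    'Hispanic': (9.84, 18),
    'Black': (13, 13),
    'Non-Hispanic White': (73, 60),
}
ratios = {group: round(representation_ratio(d, p), 2) for group, (d, p) in quoted.items()}
print(ratios)
```

A ratio near 1 means that the group's share of deaths matches its share of the population.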

In [None]:
px.bar(df3, x='Race and Hispanic Origin Group', y='Pneumonia and COVID-19 Deaths', 
       color='Age group', title='Pneumonia and COVID-19 Deaths')

There are multiple reasons for these disparities, but they may be rooted in differences in underlying medical conditions among racial groups, according to Lisa Cooper, M.D., director of the Johns Hopkins Center for Health Equity.

Living conditions, socioeconomic status and access to healthcare (and insurance) also play a role in exposure, transmission and ability to seek medical help.

In [None]:
fig = px.box(df3, x='Age group',y='Pneumonia, Influenza, or COVID-19 Deaths', color='Race and Hispanic Origin Group', notched=True)
fig.update_layout(legend=dict(orientation="h",yanchor="bottom",y=1.02,xanchor="right",x=1))
fig.show()

The disparity in mortality rates for Hispanics is greatest among those aged 30 to 49, followed by those aged 50 to 64. For the White group, the mortality disparity is greatest among those over 65.

In [None]:
plt.figure(figsize=(15,10))
xprop = 'Age group'
yprop = 'COVID-19 Deaths'
sns.boxplot(data=df3, x=xprop, y=yprop, hue='Race and Hispanic Origin Group')
plt.xlabel('{} range'.format(xprop), size=14)
plt.ylabel('Number of {}'.format(yprop), size=14)
plt.title('Boxplot of {}'.format(yprop), size=20)
plt.show()

In [None]:
yprop = 'Pneumonia and COVID-19 Deaths'
xprop = 'Pneumonia Deaths'
h= 'Race and Hispanic Origin Group'
fig = px.scatter(df3, x=xprop, y=yprop, color=h, marginal_y="violin", marginal_x="box", trendline="ols", template="simple_white")
fig.update_layout(legend=dict(orientation="h",yanchor="bottom",y=1.02,xanchor="right",x=1))
fig.show()

In [None]:
yprop = 'Influenza Deaths'
xprop = 'Pneumonia, Influenza, or COVID-19 Deaths'
h= 'Race and Hispanic Origin Group'
fig=px.scatter(df3, x=xprop, y=yprop, color=h, marginal_y="violin", marginal_x="box", trendline="ols", template="simple_white")
fig.update_layout(legend=dict(orientation="h",yanchor="bottom",y=1.02,xanchor="right",x=1))
fig.show()

<h3> 3. Comorbidities and Covid-19 </h3>

What are comorbidities?

The CDC defines comorbidity as more than one disease or condition present in the same person at the same time. These conditions are often chronic or long-term; they are also called comorbid conditions, multimorbidity, multiple chronic conditions, or coexisting conditions.

Amid the coronavirus pandemic, the CDC said that recent data show an increased risk of severe COVID-19 disease, or even death, in people with other conditions. Further, people of any age with underlying medical conditions are at heightened risk for severe illness from the coronavirus infection. These conditions include cancer, chronic kidney disease, chronic obstructive pulmonary disease, an immunocompromised state, organ transplantation, obesity, sickle cell disease, severe heart conditions, and type 2 diabetes.

Some individuals may be at an increased risk from COVID-19 infection if they have asthma, cerebrovascular disease, cystic fibrosis, high blood pressure, HIV, immune deficiencies, dementia, liver disease, smoking, pregnancy, pulmonary fibrosis, thalassemia, and type 1 diabetes.

Globally, the number of recorded infections has surpassed 136,475,875 with over 2,945,824 lives lost (updated on 10 Apr 2021).

Let's explore the data.

In [None]:
df4 = df4.iloc[(df4['State']!='US').values]
df4.head(2)

In [None]:
df4['Condition Group'].unique()

In [None]:
df4['Condition'].unique()

In [None]:
x0 = 'Age Group'
x1 = 'Condition Group'
x3 = 'COVID-19 Deaths'
x4 = 'Number of Mentions'

px.scatter(df4.groupby(by=x1).sum().reset_index(), x=x1, y=x3, color=x4, size=x3)

In [None]:
import plotly.graph_objects as go
temp = df4.groupby(by='Condition Group').sum().sort_values(by='COVID-19 Deaths', ascending=False).reset_index()

# Keep the same rows as before (ranks 0, 1, 2, 4 and 5) but read the labels from the data,
# so that each value is paired with its own condition group
top = temp.iloc[[0, 1, 2, 4, 5]]
fig = go.Figure(data=go.Scatterpolar(r=top['COVID-19 Deaths'], theta=top['Condition Group'], fill='toself'))
fig.update_layout(polar=dict(radialaxis=dict(visible=True),),showlegend=False,title="COVID-19 Deaths by Group condition")
fig.show()

These illustrations show that Americans who died from COVID-19 often had other contributing conditions. The number of deaths with each condition or cause is shown for all deaths.

This study shows that 27.5% of deaths listed COVID-19 as the only cause (compared with only 6 percent reported in April 2020), which means that 72.5 percent of the patients who died from the infection also had other underlying health conditions.

In [None]:
fig = go.Figure(data=[go.Pie(labels=df4['Condition Group'], values=df4['COVID-19 Deaths'], hole=.3)])
fig.update_layout(legend=dict(orientation="h",yanchor="bottom",y=1.02,xanchor="right",x=1))
fig.show()

For deaths with conditions or causes in addition to COVID-19, there were on average around 2 additional conditions or causes per death. Note that the number of mentions for each condition or cause is shown for all deaths.

In [None]:
df4['Age Group'].unique()

In [None]:
option = ['0-24', '25-34', '35-44', '45-54', '55-64', '65-74', '75-84','85+']
df4 = df4[df4['Age Group'].isin(option)]
df4 = df4[df4['State']!='US']

In [None]:
fig = px.scatter(df4.groupby(by=['Condition Group','Condition', 'Age Group']).sum().reset_index(), 
           x='Condition Group', y='Age Group', color='COVID-19 Deaths', size='COVID-19 Deaths')
fig.update_yaxes(automargin=True)

Of course, people with certain underlying medical conditions are at increased risk for severe COVID-19 illness, regardless of their age. Severe illness means that the person with COVID-19 may require hospitalization, intensive care, or a ventilator to help them breathe, or that they may even die. Early vaccine access is critical to ensuring the health and safety of this population that is disproportionately affected by COVID-19.

Among adults, the risk for severe illness and death from COVID-19 increases with age, with older adults at highest risk. Early vaccine access is critical to help protect this population that is disproportionately affected by COVID-19.

In [None]:
fig = px.box(df4, x='Condition Group',y='COVID-19 Deaths', color='Age Group', notched=True)
fig.update_layout(legend=dict(orientation="h",yanchor="bottom",y=1.02,xanchor="right",x=1))
fig.show()

In [None]:
fig = go.Figure(data=[go.Pie(labels=df4['Condition'], values=df4['COVID-19 Deaths'], hole=.3)])
fig.update_layout(legend=dict(orientation="h",yanchor="bottom",y=1.02,xanchor="right",x=1))
fig.show()

The top underlying health conditions include influenza and pneumonia, respiratory failure, hypertensive disease, diabetes, cardiac arrest, vascular and unspecified dementia, renal failure, heart failure and other medical conditions.

The leading comorbidities are shown above. We can see that the contribution of Covid-19 to deaths is increasing, and it is TRUE that Covid-19 IS NOT JUST a NEW FLU.

In [None]:
resp = df4[(df4['Condition Group']=='Respiratory diseases').values]
cov = df4[(df4['Condition Group']=='COVID-19').values]
cir = df4[(df4['Condition Group']=='Circulatory diseases').values]
diab = df4[(df4['Condition Group']=='Diabetes').values]
other = df4[(df4['Condition Group']=='All other conditions and causes (residual)').values]

In [None]:
px.treemap(resp.groupby(by='Condition').sum().reset_index(), path=['Condition'], labels='Condition', 
           values='COVID-19 Deaths', title='COVID-19 Deaths by Respiratory diseases')

In [None]:
fig = go.Figure(data=[go.Pie(labels=resp['Condition'], values=resp['COVID-19 Deaths'], hole=.3)])
fig.update_layout(legend=dict(orientation="h",yanchor="bottom",y=1.02,xanchor="right",x=1))
fig.show()

In [None]:
px.box(cov, x='Age Group', y="COVID-19 Deaths", title='Deaths due to Covid, directly')

As such, patients dying from Covid-19 may be of any age, not necessarily older than 65. This shows the danger of Covid-19, which is not an ordinary new flu. Vaccines should therefore become available for all ages soon.

In [None]:
fig = go.Figure(data=[go.Pie(labels=cir['Condition'], values=cir['COVID-19 Deaths'], pull=[0, 0, 0, 0, 0, 0.30, 0])])
fig.update_layout(legend=dict(orientation="h",yanchor="bottom",y=1.02,xanchor="right",x=1))
fig.show()

In [None]:
px.bar(diab, x='Age Group', y='COVID-19 Deaths')

<h2> V. Statistics </h2>

A helper function to run the statistical hypothesis test:

In [None]:
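# Assign the result back instead of using inplace=True on a column selection,
# which may not update df4 in recent pandas versions
df4['COVID-19 Deaths'] = df4['COVID-19 Deaths'].fillna(value=0)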

In [None]:
df4.isnull().sum()

In [None]:
from scipy.stats import ttest_ind

def Series_stats(df_in, var, category, prop1, prop2):
    # Step 1: State the hypotheses. H0: the two group means are equal; H1: they differ.
    # The level of significance is 5% (0.05).

    # Step 2: Collect the two samples and compute Welch's t-test (unequal variances)
    s1 = df_in[df_in[category] == prop1][var]
    s2 = df_in[df_in[category] == prop2][var]
    t, p = ttest_ind(s1, s2, equal_var=False)
    print("Two-sample t-test: t={}, p={}".format(round(t, 5), p))

    # Step 3: Compare the p-value with the significance level.
    # (The former extra check |t| > 1.96 was redundant: a two-sided p < 0.05 already implies it.)
    if p < 0.05:
        print("\n REJECT the Null Hypothesis and state that: \n at 5% significance level, the mean {} of {}-{} and {}-{} are not equal.".format(var, prop1, category, prop2, category))
        print("\n YES, the {} of {}-{} differs significantly from {}-{} in the current dataset.".format(var, prop1, category, prop2, category))
    else:
        print("\n FAIL to reject the Null Hypothesis: \n at 5% significance level, there is no evidence that the mean {} of {}-{} and {}-{} differ.".format(var, prop1, category, prop2, category))
        print("\n NO, the {} of {}-{} does NOT differ significantly from {}-{} in the current dataset.".format(var, prop1, category, prop2, category))
    # Report the two group means in both cases
    print("\n The mean value of {} for {}-{} is {} and for {}-{} is {}".format(var, prop1, category, round(s1.mean(),2), prop2, category, round(s2.mean(),2)))

In [None]:
Series_stats(df4, 'COVID-19 Deaths','Age Group','25-34','55-64')

In [None]:
Series_stats(df4, 'COVID-19 Deaths','Condition Group','Respiratory diseases','Circulatory diseases')

In [None]:
Series_stats(df4, 'COVID-19 Deaths','Condition Group','Diabetes','Obesity')

In [None]:
Series_stats(df4, 'COVID-19 Deaths','Condition','COVID-19','Diabetes')

In [None]:
Series_stats(df4, 'COVID-19 Deaths','Condition','COVID-19','Hypertensive diseases')

In [None]:
Series_stats(df4, 'COVID-19 Deaths','Condition','Diabetes','Hypertensive diseases')
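
The calls above rely on `ttest_ind(..., equal_var=False)`, which is Welch's unequal-variance t-test. To make the statistic concrete, here is a small NumPy sketch of the same formula on made-up numbers. `welch_t` is an illustrative helper, not a SciPy function, and it returns the degrees of freedom instead of a p-value.

```python
import numpy as np

def welch_t(sample1, sample2):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    a = np.asarray(sample1, dtype=float)
    b = np.asarray(sample2, dtype=float)
    # Variance of each sample mean, using the unbiased sample variance (ddof=1)
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    dof = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    return t, dof

# Made-up samples, only to show the mechanics
t, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t, 4), round(dof, 2))
```

The sign of `t` only reflects which group is listed first; the test itself looks at its absolute size relative to the degrees of freedom.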

<h2> VI. Recommendations and conclusions </h2>

Based on the above analysis and the trend in the data over the last two weeks, we draw the following conclusions and recommendations:

Conclusions:
+ The three most affected countries: US, Brazil, India.
+ The three countries with the most vaccinations: US, China, Israel.

Recommendations:

+ US states that should receive vaccines first: TX, CA, NC, FL

People who should receive the vaccine first:
+ People aged 65 and over
+ Men aged 65+ and women aged 85+
+ Individuals with an underlying medical condition such as pneumonia or influenza
+ The Black group with pneumonia comorbidities first, then the Asian group with pneumonia, then the White group with pneumonia
+ People with underlying conditions, in order of priority: respiratory > circulatory > diabetes or hypertensive diseases > cardiac arrest > ischemic heart disease.
+ By race group, however, the order of priority is: Non-Hispanic White > Black > Hispanic.
