# <font color="#E74C3C">COVID-19 Vaccine Efficacy </font>

Understanding COVID-19 vaccine efficacy

Authors: Marc Lipsitch, Natalie E. Dean - Science  13 Nov 2020: Vol. 370, Issue 6518, pp. 763-765
DOI: 10.1126/science.abe5938

The elderly and people with comorbidities are at greatest risk of severe coronavirus disease 2019 (COVID-19). A safe and effective vaccine could help to protect these groups in two distinct ways: direct protection, where high-risk groups are vaccinated to prevent disease, and indirect protection, where those in contact with high-risk individuals are vaccinated to reduce transmission.

### <b><mark style="background-color: #9B59B6"><font color="white"> Direct and Indirect Protection </font></mark></b>


Influenza vaccine campaigns initially targeted the elderly, in an effort at direct protection, but more recently have focused on the general population, in part to enhance indirect protection. Because influenza vaccines induce weaker, shorter-lived immune responses in the elderly than in young adults, increasing indirect protection may be a more effective strategy. It is unknown whether the same is true for COVID-19 vaccines.


<font color="#9B59B6">How, and how well, do these vaccines work, and in which groups of people?</font>

For COVID-19, age-structured mathematical models with realistic contact patterns are being used to explore different vaccination plans, with the recognition that vaccine doses may be limited at first and so should be deployed strategically. But as supplies grow large enough to contemplate an indirect protection strategy, the recommendations of these models depend on the details of how, and how well, these vaccines work and in which groups of people. How can the evidence needed to inform strategic decisions be generated for COVID-19 vaccines?
https://science.sciencemag.org/content/370/6518/763

![](https://www.researchgate.net/profile/Peter_Mcintyre4/publication/9080141/figure/fig3/AS:651514842464256@1532344651114/Formula-for-assessing-vaccine-effectiveness-using-screening-method.png)researchgate.net

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.offline as py
import plotly.graph_objs as go
import plotly.express as px

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# <font color="#E74C3C">Code by Radoslav Kirkov https://www.kaggle.com/rkirkov/bcg-vaccine-strain-types</font>

In [None]:
df = pd.read_csv('../input/hackathon/BCG-Strain.csv', encoding='utf8')
pd.set_option('display.max_columns', None)
df.head()

### <b><mark style="background-color: #9B59B6"><font color="white"> Phase 3 vaccine trials </font></mark></b>

<font color="#E74C3C">Individual-level Efficacy and safety  </font>


Phase 3 vaccine trials are designed to assess individual-level efficacy and safety. These trials typically focus on a primary endpoint of virologically confirmed, symptomatic disease to capture the direct benefit of the vaccine that forms the basis for regulatory decisions.

Secondary endpoints, such as infection or viral shedding, provide supporting data, along with analyses of vaccine efficacy in subgroups. Nonetheless, unanswered questions about COVID-19 vaccine characteristics are likely to remain even after trials are completed.

First, trials are typically not powered to establish subgroup-specific efficacy, yet the performance of the vaccine in high-risk groups affects the success of a direct-protection strategy. Second, can vaccines prevent infection or reduce contagiousness? This matters for achieving indirect protection. Expanding ongoing efforts or planning new studies may generate the data needed to address these questions.

<font color="#E74C3C"> Estimating Subgroup-Specific Efficacy  </font>

For estimating subgroup-specific efficacy, randomized controlled trials can provide early estimates, yet these will have wide confidence intervals, leaving substantial uncertainty about true effects in high-risk subgroups. This uncertainty would be greater in interim analyses that are based on the number of events across the whole trial population and may be exacerbated if high-risk participants are more cautious and have lower exposure to infection, reducing their contribution to the efficacy estimates.

https://science.sciencemag.org/content/370/6518/763
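
The point about wide subgroup intervals can be illustrated with a rough sketch. The case counts and arm sizes below are hypothetical, chosen only for illustration, and the Wald interval on the log risk ratio is one common textbook approximation, not necessarily the method used in any actual trial.

```python
import math

def ve_with_wald_ci(cases_vaccine, cases_placebo, n_vaccine, n_placebo, z=1.96):
    """Return (VE, lower, upper) using a Wald interval on the log risk ratio."""
    rr = (cases_vaccine / n_vaccine) / (cases_placebo / n_placebo)
    se_log_rr = math.sqrt(
        1 / cases_vaccine - 1 / n_vaccine + 1 / cases_placebo - 1 / n_placebo
    )
    rr_low = math.exp(math.log(rr) - z * se_log_rr)
    rr_high = math.exp(math.log(rr) + z * se_log_rr)
    # VE = 1 - RR, so the upper bound on RR gives the lower bound on VE
    return 1 - rr, 1 - rr_high, 1 - rr_low

# Hypothetical counts: the subgroup is one tenth the size of the whole trial
whole_trial = ve_with_wald_ci(20, 200, 20_000, 20_000)
subgroup = ve_with_wald_ci(2, 20, 2_000, 2_000)

for label, (ve, low, high) in [("whole trial", whole_trial), ("subgroup", subgroup)]:
    print(f"{label}: VE = {ve:.0%}, 95% CI {low:.0%} to {high:.0%}")
```

Both point estimates are identical, but the subgroup interval is several times wider because it rests on far fewer cases.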

In [None]:
fig ,ax = plt.subplots(2,2,figsize=(16,16))
ax1,ax2,ax3,ax4 = ax.flatten()
sns.countplot(data=df,x='Resistance to TCH\n1',hue='T Cell Epitopes Group 3',palette='gist_rainbow',ax=ax1)
sns.countplot(data=df,x='Catalase (room temperature)\nBubble column \n(mm)',hue='T Cell Epitopes Group 3',palette='viridis',ax=ax2)
sns.countplot(data=df,x='Resistance to TCH\n10',hue='T Cell Epitopes Group 3',palette='viridis',ax=ax3)
sns.countplot(data=df,x='Catalase (room temperature)\nActivity',hue='T Cell Epitopes Group 3',palette='gist_rainbow',ax=ax4)

<font color="#E74C3C">Strategies to address Subgroup-Specific Efficacy </font>

There are several strategies to address subgroup-specific efficacy, some of them already in place. Ensuring that high-risk adults are well represented in the trial population can be achieved by setting minimum enrollment targets for older adults and/or adults with comorbidities.

Another consideration relates to the stopping rules for interim analyses in trials. Vaccine trials with early interim analyses that are planning to discontinue randomization and vaccinate placebo participants after declaring efficacy are most prone to subgroup uncertainty. 

To improve the precision of efficacy estimates in high-risk subgroups, regulators could insist that interim analyses be performed only after a certain number of confirmed disease cases occur in these subgroups, in addition to existing monitoring of the overall number of events in the study.

<font color="#E74C3C">Blinded follow-up </font>

Trials that maintain blinded follow-up to assess long-term efficacy and safety may also generate more-reliable evidence on age-specific effects. For example, the World Health Organization's Solidarity Vaccines Trial will preserve placebo-controlled follow-up through month 12 or when an effective vaccine is deployed locally.

However, depending on where the trials are being done and whether the vaccine becomes rapidly available in sufficient quantities after emergency-use authorization in the population undergoing the trial, it may become unethical and/or impractical to ask participants in some subgroups to forego access to an available vaccine.

For vaccine candidates evaluated in multiple trials, such as the Oxford-AstraZeneca vaccine being studied in the United Kingdom, South Africa, Brazil, and the United States, meta-analyses can synthesize results across locations to improve precision of subgroup-specific effect estimates.
https://science.sciencemag.org/content/370/6518/763
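
As a minimal sketch of how a meta-analysis can tighten subgroup estimates, the example below pools log risk ratios from several sites with fixed-effect inverse-variance weights. The three site estimates and their standard errors are hypothetical, and a real synthesis might well use a random-effects model instead.

```python
import math

def pool_log_rr(estimates):
    """Fixed-effect inverse-variance pooling.

    estimates: list of (log_rr, standard_error) tuples, one per trial site.
    Returns the pooled log risk ratio and its standard error.
    """
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * log_rr for w, (log_rr, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se

# Hypothetical subgroup risk ratios (vaccine vs. placebo) from three sites
site_estimates = [
    (math.log(0.30), 0.60),
    (math.log(0.45), 0.50),
    (math.log(0.25), 0.70),
]

pooled_log_rr, pooled_se = pool_log_rr(site_estimates)
pooled_ve = 1 - math.exp(pooled_log_rr)
print(f"Pooled VE: {pooled_ve:.1%}, SE of pooled log RR: {pooled_se:.2f}")
```

The pooled standard error is smaller than that of any single site, which is the gain in precision the paragraph above describes.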

In [None]:
# Get the latest BCG vaccine policy information for each country
bcg_strain_country_policy = pd.read_csv('/kaggle/input/hackathon/task_2-BCG_strain_per_country-8Nov2020.csv', delimiter=',')
bcg_strain_country_policy.tail()

<font color="#E74C3C"> Regulatory Approval, Deployment and Postapproval studies. </font>

Ideally, the phase 3 trials in progress will identify more than one safe, effective vaccine for regulatory approval and deployment.

Postapproval studies will then take on an important role for continued assessment of vaccine effectiveness. These may include individual- or community-level randomized trials to compare different active vaccines without a control arm, as in the U.S. Department of Defense's individually randomized Pragmatic Assessment of Influenza Vaccine Effectiveness in the DoD (PAIVED) trial, which assesses the relative merits of three licensed influenza vaccines (NCT03734237).

Another approach to amass evidence on subgroup-specific efficacy is post-approval observational studies. This includes active surveillance of high-priority cohorts from, for example, nursing homes or assisted living facilities, as has been done for influenza.

This also includes test-negative designs, which are routinely used to assess vaccine effectiveness. Symptomatic individuals that test negative for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) function as controls for test-positive cases, and their vaccination status is compared, adjusting for selected confounders. 
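
The arithmetic behind a test-negative design can be sketched as follows. Vaccine effectiveness is commonly estimated as one minus the odds ratio of vaccination among test-positive cases versus test-negative controls. The counts are hypothetical and the estimate is unadjusted; a real analysis would adjust for confounders, typically with logistic regression.

```python
def tnd_vaccine_effectiveness(vacc_pos, unvacc_pos, vacc_neg, unvacc_neg):
    """Unadjusted VE from a test-negative design: 1 - odds ratio of vaccination."""
    odds_vaccinated_cases = vacc_pos / unvacc_pos      # test-positive (cases)
    odds_vaccinated_controls = vacc_neg / unvacc_neg   # test-negative (controls)
    return 1 - odds_vaccinated_cases / odds_vaccinated_controls

# Hypothetical counts of symptomatic people who sought testing
ve = tnd_vaccine_effectiveness(vacc_pos=30, unvacc_pos=170, vacc_neg=400, unvacc_neg=400)
print(f"Estimated vaccine effectiveness: {ve:.1%}")
```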

<font color="#E74C3C">Monitoring more than one vaccine. </font>

Test-negative designs can be integrated into outpatient testing in the community or use emergency department visits to estimate vaccine effectiveness against severe disease. To rapidly establish these systems, researchers can leverage ongoing influenza surveillance. Conveniently, these programs can simultaneously monitor more than one vaccine, enabling assessment of their relative merits.
https://science.sciencemag.org/content/370/6518/763

In [None]:
df_country_policy = pd.merge(bcg_strain_country_policy, df, left_on='mandatory_bcg_strain_2015-2020',right_on='Internal BCG Strain ID',how='left',suffixes=('_left','_right'))
df_country_policy[["country_name", "BCG Atlas: Timing of 1st BCG?", "mandatory_bcg_strain_2015-2020", "BCG Atlas: BCG Strain", "T Cell Epitopes Group 3"]].tail()

<font color="#E74C3C">Viral Shedding. Relationship between Viral Loads and Infectiousness </font>

A key limitation of observational studies is confounding. There may be many differences between individuals who do and do not get vaccinated, which may create noncausal correlations between vaccine status and outcomes. Although such biases can threaten any observational study of vaccine effectiveness, there are some approaches to detect such biases and reduce their magnitude.

The clearest evidence of indirect protection is from a vaccine that prevents infection entirely, thereby reducing transmission. These data will be generated in efficacy trials that include infection as a secondary endpoint. This endpoint is measured by a specialized assay to distinguish an infection-induced response from a vaccine-induced antibody response.

A vaccine can provide indirect protection even if it does not fully prevent infection. Vaccines that reduce disease severity can also reduce infectiousness by reducing viral shedding and/or symptoms that increase viral spread (e.g., coughing and sneezing). A worst-case scenario is a vaccine that reduces disease while permitting viral shedding; this could fail to reduce transmission or conceivably even increase transmission if it suppressed symptoms.

To assess a vaccine's impact on infectiousness, some phase 3 trials examine the amount or duration of viral shedding in laboratory-confirmed, symptomatic participants by home collection of saliva samples and frequent polymerase chain reaction (PCR) testing. However, this would not capture any change in viral shedding for asymptomatic participants.

Moreover, serology tests detect previous infection and cannot reconstruct shedding during active infection. To measure viral load in both symptomatic and asymptomatic participants, it is necessary to conduct frequent (e.g., weekly) viral testing, irrespective of symptoms, to capture participants during their period of acute infectiousness. 

Even weekly testing will not give detailed information about the effect of the vaccine on viral shedding, and the relationship between viral loads and infectiousness is unknown; nonetheless, this approach is likely to provide some evidence if viral loads are on average lower among vaccinated people. Human challenge vaccine studies, in which individuals in a randomized controlled trial are deliberately exposed to the virus, could generate high-quality data on the effect of vaccines on viral shedding.
https://science.sciencemag.org/content/370/6518/763

In [None]:
# Get the COVID-19 mortality statistics
covid_death_cases = pd.read_csv('/kaggle/input/hackathon/task_2-COVID-19-death_cases_per_country_after_fifth_death-till_22_September_2020.csv', delimiter=',')
covid_death_cases.tail()

<font color="#E74C3C">Long-term Safety. Duration of Protection. Efficacy. </font>

Other open questions about the rapidly developed COVID-19 vaccines include long-term safety (indicating the critical need for pharmacovigilance activities), the duration of vaccine protection, the efficacy of a partial vaccination series or of lower doses, the vaccine's level of protection against severe infection and death, efficacy by baseline serostatus, and the potential for the virus to evolve to escape vaccine-induced immunity. The answers to such questions inform the optimal use of any vaccine.

Availability of a COVID-19 vaccine will initially be limited, and so several expert committees are exploring strategic prioritization plans. 

Health care workers are a common first-tier group, which in turn preserves health care systems by protecting those who run them.

A next priority is to directly protect those who are at highest risk of death or hospitalization when infected: specifically, those over 65 and people with certain comorbid conditions. This strategy may be optimal for reducing mortality even if the vaccine is somewhat less effective in these groups. But if a vaccine offers little to no protection in high-risk groups yet is able to reduce infection or infectiousness in younger adults, an indirect strategy could be preferred as vaccine supplies become large enough.

A worst-case scenario for an effective vaccine is one that reduces disease in younger adults but provides neither direct nor indirect protection to high-risk groups, leaving the most vulnerable at risk. Knowing these vaccine characteristics is important when evaluating the relative merits of other products. Fortunately, there are many vaccine candidates in development that use a mixture of innovative and existing technologies. Although vaccines may vary in their characteristics, having reliable evidence on direct and indirect protection can help plan how to use these vaccines in a coordinated way.
https://science.sciencemag.org/content/370/6518/763

In [None]:
# Get country-level information that might be useful for the COVID-19 analysis, e.g. income group and median age
country_data = pd.read_csv('/kaggle/input/hackathon/BCG_country_data.csv', delimiter=',')
country_data.tail()

#GRAPHIC: KELLIE HOLOSKI/SCIENCE

![](https://science.sciencemag.org/content/sci/370/6518/763/F1.large.jpg?width=800&height=600&carousel=1)https://science.sciencemag.org/content/370/6518/763

In [None]:
# Get "alpha 2 code" for each country in covid_death_cases
covid_death_cases_country = pd.merge(covid_death_cases, country_data, left_on='alpha_3_code',right_on='alpha_3_code',how='left',
        suffixes=('_left','_right'))
covid_death_cases_country[["country_name_left", 
                            "alpha_2_code", 
                            "alpha_3_code",                             
                            "deaths_per_million_50_days_after_fifth_death", 
                            "deaths_per_million_100_days_after_fifth_death", 
                            "deaths_per_million_150_days_after_fifth_death"]].tail()

<font color="#E74C3C">Randomized Controlled Trials and Observational Studies. </font>

How do we measure how well influenza vaccines work? Two general types of studies are used to determine how well vaccines work: randomized controlled trials and observational studies.

Randomized controlled trials (RCTs)

The first type of study design is called a randomized controlled trial (RCT). In an RCT, volunteers are assigned randomly to receive a vaccine or a placebo (e.g., a shot of saline). Vaccine efficacy is measured by comparing the frequency of influenza illness in the vaccinated and the unvaccinated (placebo) groups. The RCT study design minimizes bias that could lead to invalid study results. Bias is an unintended systematic error in the way researchers select study participants, measure outcomes, or analyze data that can lead to inaccurate results. In an RCT, vaccine allocation is usually double-blinded, which means neither the study volunteers nor the researchers know if a given person has received vaccine or placebo. National regulatory authorities require RCTs to be conducted and to demonstrate the protective benefits of a new vaccine before the vaccine is licensed for routine use. However, some vaccines are licensed based on RCTs that use antibody response to the vaccine as measured in the laboratory, rather than decreases in influenza disease among people who were vaccinated.

Observational Studies

The second type of study design is an observational study. There are several types of observational studies, including cohort and case-control studies. Observational studies assess how influenza vaccines work by comparing the occurrence of influenza among vaccinated people with that among unvaccinated people. Vaccine effectiveness is the percent reduction in the frequency of influenza illness among vaccinated people compared to people not vaccinated, usually with adjustment for factors (like presence of chronic medical conditions) that are related to both influenza illness and vaccination.
https://www.cdc.gov/flu/vaccines-work/effectivenessqa.htm

In [None]:
# Merge country, covid-19 and bcg policy information
covid_death_cases_bcg_strain = pd.merge(covid_death_cases_country, df_country_policy, left_on='alpha_2_code',right_on='country_code',how='left',suffixes=('_left','_right'))
covid_death_cases_bcg_strain.tail()

<font color="#E74C3C">How do vaccine effectiveness studies differ from vaccine efficacy studies? </font>

Vaccine efficacy refers to vaccine protection measured in RCTs usually under optimal conditions where vaccine storage and delivery are monitored and participants are usually healthy. Vaccine effectiveness refers to vaccine protection measured in observational studies that include people with underlying medical conditions who have been administered vaccines by different health care providers under real-world conditions.

Once a vaccine is universally recommended, it becomes unethical to perform placebo-controlled RCTs, because assigning people to a placebo group could place them at risk for serious complications. Also, observational studies are often the only option for measuring vaccine effectiveness against more severe, less common outcomes, such as hospitalization.
https://www.cdc.gov/flu/vaccines-work/effectivenessqa.htm

In [None]:
covid_death_cases_bcg_strain[[
    "country_name_left", 
    "covid_19_test_date_reported", 
    "BCG Atlas: BCG Strain", 
    "mandatory_bcg_strain_2015-2020", 
    "lockdown_start", "T Cell Epitopes Group 3", 
    "date_first_death", 
    "BCG Atlas: Timing of 1st BCG?", 
    "deaths_per_million_150_days_after_fifth_death"]].tail()

<font color="#E74C3C">Vaccine efficacy and Vaccine Effectiveness </font>

Vaccine efficacy

Percent reduction in disease incidence in a vaccinated group compared to an unvaccinated group under optimal conditions (e.g., an RCT). Efficacy studies typically use objective outcomes (e.g., laboratory-confirmed disease) and are designed to maximize internal validity (by randomization and allocation concealment), often at the expense of generalizability.
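
The definition above can be written as a formula. The symbols $AR_U$ and $AR_V$ (the attack rates, i.e., disease incidence, in the unvaccinated and vaccinated groups) are notation introduced here for illustration and are not taken from the source slides.

$$
\mathrm{VE} = \frac{AR_U - AR_V}{AR_U} \times 100\% = \left(1 - \frac{AR_V}{AR_U}\right) \times 100\%
$$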

Vaccine effectiveness 

Ability of a vaccine to prevent outcomes of interest in the “real world”:

- Primary care settings
- Less stringent eligibility
- Assessment of relevant health outcomes
- Clinically relevant treatment selection and follow-up duration
- Assessment of relevant adverse events
- Adequate sample size to detect clinically relevant differences
- Intention-to-treat analysis

(Agency for Healthcare Research and Policy, US Dept HHS, 2006)
https://www.who.int/influenza_vaccines_plan/resources/Session4_VEfficacy_VEffectiveness.PDF

In [None]:
#covid_deatch_cases_bcg_strain[covid_deatch_cases_bcg_strain.mandatory_bcg_strain_2015 == "Unknown_BCG_Strain 	"].tail()
covid_death_cases_bcg_strain.columns = covid_death_cases_bcg_strain.columns.str.replace('-','_')
covid_death_cases_bcg_strain.columns = covid_death_cases_bcg_strain.columns.str.replace(' ','_')

Effectiveness of Vaccine in Reducing the Risk of Hospitalization (%)

![](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcT-xSw-oR1PmLwVHQLXE3yYYb1_rDa5S0L3OQ&usqp=CAU)https://www.who.int/influenza_vaccines_plan/resources/Session4_VEfficacy_VEffectiveness.PDF

In [None]:
# Remove the records where:
# * we don't know whether BCG vaccination has been mandatory in the last 5 years;
# * deaths_per_million_50_days_after_fifth_death is null; (TBD, not yet applied)
# * the country is not in the High or Upper-middle income group. (TBD, not yet applied)
covid_death_cases_bcg_strain_filtered = covid_death_cases_bcg_strain[
        covid_death_cases_bcg_strain.mandatory_bcg_strain_2015_2020.notnull()]

covid_death_cases_bcg_strain_filtered[[
    "country_name_left",     
    "population_per_km2",
    "mandatory_bcg_strain_2015_2020", 
    "Class", "T_Cell_Epitopes_Group_3", 
    "deaths_per_million_50_days_after_fifth_death", 
    "deaths_per_million_100_days_after_fifth_death", 
    "deaths_per_million_150_days_after_fifth_death"]].head()

MEASURING EFFECTIVENESS OF IMMUNIZATION PROGRAMS

![](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTKFA38fF0-SFQaUd-fCNeVgXjOD-TXQVwy-Q&usqp=CAU)ph.ucla.edu

In [None]:
plt.figure(figsize=(15, 7))
sns.scatterplot(x="population_per_km2", y="deaths_per_million_50_days_after_fifth_death", hue="T_Cell_Epitopes_Group_3", size="Class" , data=covid_death_cases_bcg_strain_filtered[[
    "country_name_left",     
    "population_per_km2",
    "mandatory_bcg_strain_2015_2020", 
    "Class", "T_Cell_Epitopes_Group_3", 
    "deaths_per_million_50_days_after_fifth_death", 
    "deaths_per_million_100_days_after_fifth_death", 
    "deaths_per_million_150_days_after_fifth_death"]], palette='vlag')

### <b><mark style="background-color: #9B59B6"><font color="white"> Vaccines Are 95% Effective. What Does That Mean? </font></mark></b>

2 Companies Say Their Vaccines Are 95% Effective. What Does That Mean? By Carl Zimmer
Published Nov. 20, 2020

Pfizer and BioNTech announced this week that their vaccine had an efficacy rate of 95 percent. Moderna put the figure for its vaccine at 94.5 percent. In Russia, the makers of the Sputnik vaccine claimed their efficacy rate was over 90 percent.

“We were all expecting 50 to 70 percent.” Indeed, the Food and Drug Administration had said it would consider granting emergency approval for vaccines that showed just 50 percent efficacy.

From the headlines, you might well assume that these vaccines — which some people may receive in a matter of weeks — will protect 95 out of 100 people who get them. But that’s not actually what the trials have shown. Exactly how the vaccines perform out in the real world will depend on a lot of factors we just don’t have answers to yet — such as whether vaccinated people can get asymptomatic infections and how many people will get vaccinated.


<font color="#E74C3C">What do the companies mean when they say their vaccines are 95 percent effective? </font>

The fundamental logic behind today’s vaccine trials was worked out by statisticians over a century ago. Researchers vaccinate some people and give a placebo to others. They then wait for participants to get sick and look at how many of the illnesses came from each group.

https://www.nytimes.com/2020/11/20/health/covid-vaccine-95-effective.html


In [None]:
sns.set(style="white")
# Compute the Spearman rank-correlation matrix.
# Only numeric columns can be correlated, so non-numeric ones are dropped first.
corr = covid_death_cases_bcg_strain_filtered[[
    "country_name_left",     
    "population_per_km2",
    "mandatory_bcg_strain_2015_2020", 
    "Class", "T_Cell_Epitopes_Group_3", 
    "deaths_per_million_50_days_after_fifth_death", 
    "deaths_per_million_100_days_after_fifth_death", 
    "deaths_per_million_150_days_after_fifth_death"]].select_dtypes(include='number').corr(method='spearman')
# Mask the upper triangle (including the diagonal) of the correlation matrix
mask = np.zeros_like(corr, dtype=bool)
mask[np.triu_indices_from(mask)] = True
# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(15,12))
# Generate a custom diverging colormap
cmap = sns.diverging_palette(220,10,as_cmap=True)
# Draw the heatmap with the mask and correct aspect ratio
sns.heatmap(corr,mask=mask,cmap=cmap,vmax=1,center=0,square=True, annot=True,
            linewidths=.5, cbar_kws={'shrink': .5}, ax=ax)
ax.set_title('Multi-Collinearity of Features')
plt.savefig('correlation2.png')

In the case of Pfizer, for example, the company recruited 43,661 volunteers and waited for 170 people to come down with symptoms of Covid-19 and then get a positive test. Out of these 170, 162 had received a placebo shot, and just eight had received the real vaccine.

From these numbers, Pfizer’s researchers calculated the fraction of volunteers in each group who got sick. Both fractions were small, but the fraction of unvaccinated volunteers who got sick was much bigger than the fraction of vaccinated ones. The scientists then determined the relative difference between those two fractions. Scientists express that difference with a value they call efficacy. If there’s no difference between the vaccine and placebo groups, the efficacy is zero. If none of the sick people had been vaccinated, the efficacy is 100 percent.
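
The calculation described above can be reproduced as a short sketch. The article gives only the total number of volunteers, so the code assumes the two arms were of equal size. That is an assumption made here for illustration; under it, the arm size cancels out of the final ratio.

```python
total_volunteers = 43_661
cases_placebo = 162
cases_vaccine = 8

# Assumption (not stated in the article): volunteers were split evenly between arms
n_per_arm = total_volunteers / 2

# Fraction of each group that got sick
risk_placebo = cases_placebo / n_per_arm
risk_vaccine = cases_vaccine / n_per_arm

# Efficacy is the relative difference between the two fractions
efficacy = 1 - risk_vaccine / risk_placebo

print(f"Placebo group risk: {risk_placebo:.3%}")
print(f"Vaccine group risk: {risk_vaccine:.3%}")
print(f"Efficacy: {efficacy:.1%}")
```

Both risks are well under 1 percent, yet their ratio gives an efficacy of about 95 percent. This is why the figure describes a relative reduction rather than any individual's chance of staying healthy.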


A 95 percent efficacy is certainly compelling evidence that a vaccine works well. But that number doesn’t tell you what your chances are of becoming sick if you get vaccinated. And on its own, it also doesn’t say how well the vaccine will bring down Covid-19 across the United States.


What’s the difference between efficacy and effectiveness?

<font color="#E74C3C">Efficacy is a measurement made during a clinical trial. “Effectiveness is how well the vaccine works out in the real world.” </font>

Efficacy and effectiveness are related to each other, but they’re not the same thing. And vaccine experts say it’s crucial not to mix them up. Efficacy is just a measurement made during a clinical trial. “Effectiveness is how well the vaccine works out in the real world.”

It’s possible that the effectiveness of coronavirus vaccines will match their impressive efficacy in clinical trials. But if previous vaccines are any guide, effectiveness may prove somewhat lower.
https://www.nytimes.com/2020/11/20/health/covid-vaccine-95-effective.html

In [None]:
# Select the columns of interest once, then split them by dtype
subset = covid_death_cases_bcg_strain_filtered[[
    "country_name_left",     
    "population_per_km2",
    "mandatory_bcg_strain_2015_2020", 
    "Class", "T_Cell_Epitopes_Group_3", 
    "deaths_per_million_50_days_after_fifth_death", 
    "deaths_per_million_100_days_after_fifth_death", 
    "deaths_per_million_150_days_after_fifth_death"]]
# Categorical columns are those stored with the 'object' dtype
cats = [c for c in subset.columns if subset[c].dtype == 'object']
# Everything else is treated as numeric
nums = [c for c in subset.columns if c not in cats]
cats

What exactly are these vaccines effective at doing?

The clinical trials run by Pfizer and other companies were specifically designed to see whether vaccines protect people from getting sick from Covid-19. If volunteers developed symptoms like a fever or cough, they were then tested for the coronavirus.

But there’s abundant evidence that people can get infected with the coronavirus without ever showing symptoms. And so it’s possible that a number of people who got vaccinated in the clinical trials got infected, too, without ever realizing it. If those cases indeed exist, none of them are reflected in the 95 percent efficacy rate.

<font color="#E74C3C">Stop taking safety measures: Chances of spreading the coronavirus to others could increase. </font>

People who are asymptomatic can still spread the virus to others. Some studies suggest that they produce fewer viruses, making them less of a threat than infected people who go on to develop symptoms. But if people get vaccinated and then stop wearing masks and taking other safety measures, their chances of spreading the coronavirus to others could increase.

Will these vaccines put a dent in the epidemic?

Vaccines don’t protect only the people who get them. Because they slow the spread of the virus, they can, over time, also drive down new infection rates and protect society as a whole.

Scientists call this broad form of effectiveness a vaccine’s impact. The smallpox vaccine had the greatest impact of all, driving the virus into oblivion in the 1970s. But even a vaccine with extremely high efficacy in clinical trials will have a small impact if only a few people end up getting it.
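
The last point can be illustrated with a deliberately crude back-of-envelope sketch: multiplying efficacy by coverage gives the share of the population that is directly protected. This ignores indirect protection, waning immunity, and differences between groups, and the two scenarios are hypothetical, so it shows only the direction of the effect, not a forecast.

```python
def directly_protected_fraction(efficacy, coverage):
    """Share of the whole population directly protected (crude approximation)."""
    return efficacy * coverage

# Hypothetical scenarios: (efficacy, coverage)
scenarios = {
    "high efficacy, low coverage": (0.95, 0.10),
    "moderate efficacy, high coverage": (0.60, 0.70),
}

results = {
    name: directly_protected_fraction(efficacy, coverage)
    for name, (efficacy, coverage) in scenarios.items()
}

for name, fraction in results.items():
    print(f"{name}: {fraction:.1%} of the population directly protected")
```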

<font color="#E74C3C">Vaccination Programs save lives </font>

“Vaccines don’t save lives,”  “Vaccination programs save lives.”

Dr. Paltiel and his colleagues published a study in the journal Health Affairs in which they simulated the coming rollout of coronavirus vaccines. They modeled vaccines with efficacy rates ranging from high to low, but also considered how quickly and widely a vaccine could be distributed as the pandemic continues to rage.
https://www.nytimes.com/2020/11/20/health/covid-vaccine-95-effective.html

In [None]:
# Extract the BCG strain column as the target variable
target = covid_death_cases_bcg_strain_filtered['mandatory_bcg_strain_2015_2020']

The results, Dr. Paltiel said, were heartbreaking. He and his colleagues found that when it comes to cutting down on infections, hospitalizations and deaths, the deployment mattered just as much as the efficacy. The study left Dr. Paltiel worried that the United States has not done enough to prepare for the massive distribution of the vaccine in the months to come.

<font color="#E74C3C">Infrastructure is going to contribute to the success of the program. </font>

“Time is really running out,” he warned. “Infrastructure is going to contribute at least as much, if not more, than the vaccine itself to the success of the program.”
https://www.nytimes.com/2020/11/20/health/covid-vaccine-95-effective.html

In [None]:
target.unique()

In [None]:
# Count missing values per selected column and keep only columns that have any
missing = covid_death_cases_bcg_strain_filtered[[
    "country_name_left",     
    "population_per_km2",
    "mandatory_bcg_strain_2015_2020", 
    "Class", "T_Cell_Epitopes_Group_3", 
    "deaths_per_million_50_days_after_fifth_death", 
    "deaths_per_million_100_days_after_fifth_death", 
    "deaths_per_million_150_days_after_fifth_death"]].isnull().sum()
missing = missing[missing>0]
missing = missing.sort_values()

plt.style.use('fivethirtyeight')
fig, ax = plt.subplots(figsize=(15, 5))

# Plotting an empty Series can raise an error, so only draw bars when something is missing
if not missing.empty:
    missing.plot.bar(color='k', ax=ax)

ax.set_title('Covid Death Cases BCG Strain Filtered Missing Values');


In [None]:
#Code by Puru Behl https://www.kaggle.com/accountstatus/mt-cars-data-analysis

# Distribution of deaths per million 50 days after the fifth death, with the mean marked in red.
deaths_50 = covid_death_cases_bcg_strain_filtered['deaths_per_million_50_days_after_fifth_death']
sns.distplot(deaths_50)
# Series.mean() skips NaN values, so the marker is still drawn when some rows are missing.
plt.axvline(deaths_50.mean(), color='red', linestyle='dashed', linewidth=1)
plt.title('Deaths per million 50 days after fifth death Distribution')

In [None]:
plot_df = covid_death_cases_bcg_strain_filtered
fig, ax = plt.subplots(2, 3, figsize=(10, 8))

# Deaths per million regressed on population density (50 and 100 days).
sns.regplot(x='population_per_km2', y='deaths_per_million_50_days_after_fifth_death',
            data=plot_df, color='coral', ax=ax[0][0])
sns.regplot(x='population_per_km2', y='deaths_per_million_100_days_after_fifth_death',
            data=plot_df, color='coral', ax=ax[0][1])

# Same relationship with the axes swapped (150 and 100 days).
sns.regplot(x='deaths_per_million_150_days_after_fifth_death', y='population_per_km2',
            data=plot_df, color='coral', ax=ax[0][2])
sns.regplot(x='deaths_per_million_100_days_after_fifth_death', y='population_per_km2',
            data=plot_df, color='coral', ax=ax[1][0])

# Counts of the categorical columns, split by a second category.
sns.countplot(x='T_Cell_Epitopes_Group_3', hue='mandatory_bcg_strain_2015_2020',
              data=plot_df, ax=ax[1][1])
sns.countplot(x='Class', hue='T_Cell_Epitopes_Group_3',
              data=plot_df, ax=ax[1][2])

plt.tight_layout()
plt.show()

In [None]:
from scipy import stats

# Histogram of each numeric column with a fitted normal curve overlaid.
numeric_cols = [
    'population_per_km2',
    'deaths_per_million_50_days_after_fifth_death',
    'deaths_per_million_100_days_after_fifth_death',
    'deaths_per_million_150_days_after_fifth_death']

fig, ax = plt.subplots(2, 2, figsize=(10, 8))
# ax.ravel() walks the grid row by row, so panels appear in the order listed above.
for col, axis in zip(numeric_cols, ax.ravel()):
    sns.distplot(covid_death_cases_bcg_strain_filtered[col],
                 fit=stats.norm, color='coral', ax=axis)

plt.tight_layout()
plt.show()

In [None]:
!pip install autoviz
 
from autoviz.AutoViz_Class import AutoViz_Class
AV = AutoViz_Class()

In [None]:
cols = [
    "country_name_left",
    "population_per_km2",
    "mandatory_bcg_strain_2015_2020",
    "Class", "T_Cell_Epitopes_Group_3",
    "deaths_per_million_50_days_after_fifth_death",
    "deaths_per_million_100_days_after_fifth_death",
    "deaths_per_million_150_days_after_fifth_death"]

# AutoViz draws its charts as a side effect. Keep its return value under a new name
# instead of assigning it back onto the source columns.
autoviz_result = AV.AutoViz(filename="", sep=',', depVar='mandatory_bcg_strain_2015_2020',
                            dfte=covid_death_cases_bcg_strain_filtered[cols],
                            header=0, verbose=2, lowess=False, chart_format='svg',
                            max_rows_analyzed=150000, max_cols_analyzed=30)

In [None]:
#Code by Olga Belitskaya https://www.kaggle.com/olgabelitskaya/sequential-data/comments
from IPython.display import display,HTML
c1, c2, f1, f2, fs1, fs2 = '#2B3A67', '#42a7f5', 'Akronim', 'Smokum', 30, 15

def dhtml(string, fontcolor=c1, font=f1, fontsize=fs1):
    """Render `string` as a styled <h1> using a Google Fonts typeface."""
    display(HTML(f"""<style>
    @import 'https://fonts.googleapis.com/css?family={font}&effect=3d-float';</style>
    <h1 class='font-effect-3d-float' style='font-family:{font}; color:{fontcolor}; font-size:{fontsize}px;'>{string}</h1>"""))
    
    
dhtml('Be patient. Marília Prata, @mpwolke was Here.' )