![](https://www.pnas.org/content/pnas/early/2020/07/07/2008410117/F1.medium.gif) By Luis E. Escobar, Alvaro Molina-Cruz, and Carolina Barillas-Mury. PNAS, first published July 9, 2020. https://doi.org/10.1073/pnas.2008410117

https://www.pnas.org/content/early/2020/07/07/2008410117

Figure above: COVID-19 mortality, human development, and BCG vaccination policy by country. (A) Map showing the COVID-19 mortality per million inhabitants in countries worldwide. COVID-19−related deaths per country per million inhabitants denoting countries with low (yellow) to high (red) mortality. (B) COVID-19 mortality per million inhabitants (log) vs. HDI in different countries worldwide. United States data appear by state. (C) Map showing the BCG vaccination policy in countries that currently have a universal BCG vaccination program (Current), countries with interrupted BCG vaccination programs (Interrupted), and countries that never implemented a universal vaccination program (Never). Countries without information appear in white. https://www.pnas.org/content/early/2020/07/07/2008410117

In [None]:
!pip install calmap

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import plotly.offline as py

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
df = pd.read_csv('../input/hackathon/BCG_COVID-19_clinical_trials-2020_06_06.csv', encoding='ISO-8859-2')
df.head()

In [None]:
df = df.rename(columns={'Study Start Date':'date'})

In [None]:
df.date = pd.to_datetime(df.date)

In [None]:
import calmap

In [None]:
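# Count trials per study start date and draw one calendar heatmap per year
trials_per_day = df.groupby('date')['Country'].count()
fig, ax = calmap.calendarplot(trials_per_day, monthticks=1, daylabels='MTWTFSS', cmap='PuRd',
                              linewidth=0, fig_kws=dict(figsize=(20, 20)))
plt.show()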

#BCG Recommendations

Children: BCG vaccination should only be considered for children who have a negative tuberculin skin test and who are continually exposed to, and cannot be separated from, adults who are untreated or ineffectively treated for TB disease (if the child cannot be given long-term treatment for infection), or who have TB caused by strains resistant to isoniazid and rifampin.

Health care workers: BCG vaccination of health care workers should be considered on an individual basis in settings in which a high percentage of TB patients are infected with M. tuberculosis strains resistant to both isoniazid and rifampin; there is ongoing transmission of such drug-resistant M. tuberculosis strains to health care workers and subsequent infection is likely; or comprehensive TB infection-control precautions have been implemented but have not been successful.

Health care workers considered for BCG vaccination should be counseled regarding the risks and benefits associated with both BCG vaccination and treatment of Latent TB Infection (LTBI). https://www.cdc.gov/tb/publications/factsheets/prevention/bcg.htm

These were the recommendations before the COVID-19 pandemic. Are they going to change?

In [None]:
cnt_srs = df['Strain'].value_counts().head()
trace = go.Bar(
    y=cnt_srs.index[::-1],
    x=cnt_srs.values[::-1],
    orientation = 'h',
    marker=dict(
        color=cnt_srs.values[::-1],
        colorscale = 'Blues',
        reversescale = True
    ),
)

layout = dict(
    title='BCG strain that has been used',
    )
data = [trace]
fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename="Strain")

In [None]:
plt.style.use('fivethirtyeight')
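# Pass the column as `x=`; recent seaborn versions treat the first positional argument as `data`
sns.countplot(x=df['Strain'], linewidth=3, palette="Set2", edgecolor='black')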
plt.title('BCG Strain used')
plt.xticks(rotation=45)
plt.show()

In [None]:
# Code adapted from Mario Filho https://www.kaggle.com/mariofilho/live26-https-youtu-be-zseefujo0zq
from category_encoders import OneHotEncoder
from sklearn.linear_model import Ridge

# One-hot encode the strain and keep the target sample size as the last column
cols_selected = ['Strain']
ohe = OneHotEncoder(cols=cols_selected, use_cat_names=True)
df_t = ohe.fit_transform(df[cols_selected + ['Target Sample Size']])

X = df_t.iloc[:, :-1]  # one-hot strain indicators
target = df_t.iloc[:, -1]
y = target.fillna(target.mean()) / target.max()  # impute missing sizes, then scale by the maximum

# Ridge regression of the scaled sample size on strain
mdl = Ridge(alpha=0.1)
mdl.fit(X, y)

# The ten smallest coefficients, one bar per strain indicator
pd.Series(mdl.coef_, index=X.columns).sort_values().head(10).plot.barh()

plt.yticks(rotation=45)
plt.title('Ridge coefficients by BCG strain')

#BCG vaccine protection from severe coronavirus disease 2019 (COVID-19)

Authors: Luis E. Escobar, Alvaro Molina-Cruz, and Carolina Barillas-Mury
PNAS first published July 9, 2020 https://doi.org/10.1073/pnas.2008410117

That epidemiological study assessed the global linkage between BCG vaccination and COVID-19 mortality. Signals of BCG vaccination effect on COVID-19 mortality are influenced by social, economic, and demographic differences between countries. After mitigating multiple confounding factors, several significant associations between BCG vaccination and reduced COVID-19 deaths were observed. That study highlights the need for mechanistic studies behind the effect of BCG vaccination on COVID-19, and for clinical evaluation of the effectiveness of BCG vaccination to protect from severe COVID-19.

They reviewed evidence for a potential biological basis of BCG cross-protection from severe COVID-19, and refined the epidemiological analysis to mitigate effects of potentially confounding factors (e.g., stage of the COVID-19 epidemic, development, rurality, population density, and age structure). A strong correlation between the BCG index, an estimation of the degree of universal BCG vaccination deployment in a country, and COVID-19 mortality in different socially similar European countries was observed (r² = 0.88; P = 8 × 10⁻⁷), indicating that every 10% increase in the BCG index was associated with a 10.4% reduction in COVID-19 mortality. Results fail to confirm the null hypothesis of no association between BCG vaccination and COVID-19 mortality, and suggest that BCG could have a protective effect. Nevertheless, the analyses are restricted to coarse-scale signals and should be considered with caution. BCG vaccination clinical trials are required to corroborate the patterns detected there, and to establish causality between BCG vaccination and protection from severe COVID-19. Public health implications of a plausible BCG cross-protection from severe COVID-19 were discussed. https://www.pnas.org/content/early/2020/07/07/2008410117

In [None]:
df1 = pd.read_csv('../input/hackathon/task_2-BCG_world_atlas_data-bcg_strain-7July2020.csv', encoding='utf8')
df1.head()

Even though the BCG vaccine has been in use for more than 90 years, with proven safety, its efficacy is still controversial. BCG vaccination has shown clear protection in children, but, in adults, its effects have been inconsistent. Many countries initiated national BCG vaccination in the middle of the twentieth century with variable levels of coverage using different BCG strains, number of doses, and delivery method.

As the prevalence of TB decreased, countries like France, Germany, and Spain stopped mass vaccination of children and moved to vaccinate only individuals at high risk. Other countries like Russia, Ukraine, and China have continued national BCG vaccination to date. Some countries never established national universal BCG vaccination, including the United States and Italy, and only target high-risk individuals. https://www.pnas.org/content/early/2020/07/07/2008410117

In [None]:
plt.style.use('fivethirtyeight')
plt.figure(figsize=(16, 10))  # set the size before plotting; a bare `figsize = ...` assignment has no effect
sns.countplot(x=df1['vaccination_timing'], linewidth=3, palette="Set2", edgecolor='black')
plt.title('Vaccination Timing')
plt.xticks(rotation=45)
plt.show()

There is ample epidemiological evidence that BCG vaccination has broad protective effects that are not specific to M. tuberculosis infection. For example, in 1927, Swedish children who received BCG vaccination at birth had a mortality rate almost threefold lower than unvaccinated children. This decrease of mortality could not be explained by TB infection, and thus, early on, it was suggested that the very low mortality among BCG-vaccinated children may be caused by nonspecific immunity.

In West Africa, a BCG vaccination scar and a positive tuberculin reaction were associated with better survival during early childhood in an area with high mortality; this was not observed with other childhood vaccines. A general reduction in neonatal mortality linked to BCG vaccination was also reported in children from Guinea-Bissau, mainly due to fewer cases of neonatal sepsis, respiratory infection, and fever. The BCG vaccine appears to confer broad enhanced immunity to respiratory infections, as infants from Guinea-Bissau with acute viral infections of the lower respiratory tract were more likely to not have received BCG vaccination than matched controls.

In Spain, hospitalizations due to respiratory infections in 0- to 14-y-old children not attributable to TB were significantly lower in BCG-vaccinated compared to non-BCG-vaccinated children. The observation that this protection was still present in 14-y-old children suggests that the broad protective effect of BCG can be long lasting. Taken together, our current understanding of broad immune protection mediated by trained immunity and the epidemiologic evidence of long-lasting protection from viral infections of the respiratory tract, conferred by BCG vaccination, offer a rational biological basis for the potential protective effect of BCG vaccination from severe coronavirus disease 2019 (COVID-19). https://www.pnas.org/content/early/2020/07/07/2008410117

In [None]:
plt.style.use('fivethirtyeight')
plt.figure(figsize=(16, 10))  # set the size before plotting; a bare `figsize = ...` assignment has no effect
sns.countplot(x=df1['were_revaccinations_recommended'], linewidth=3, palette="Set2", edgecolor='black')
plt.title('Were Revaccinations Recommended?')
plt.xticks(rotation=45)
plt.show()

Considering the cross-protection reported for BCG vaccination on viral respiratory infections, recent publications have proposed that BCG vaccination could have protective effects against COVID-19 infection. These publications, however, do not include extensive statistical analysis, and the World Health Organization has cautioned about the lack of research regarding BCG vaccination against COVID-19 infection.

In view of the growing interest in assessing the plausible association between BCG vaccination and protection from severe COVID-19, they assessed available global data on BCG and COVID-19 to investigate the hypothesis that countries without a national BCG vaccination program would have greater COVID-19 mortality than countries that have a program. They attempted to control for potential confounding variables among countries, such as level of urbanization, population density, age classes, access to health, income, education, and stage and size of the COVID-19 epidemic. https://www.pnas.org/content/early/2020/07/07/2008410117

In [None]:
plt.style.use('fivethirtyeight')
plt.figure(figsize=(16, 10))  # set the size before plotting; a bare `figsize = ...` assignment has no effect
sns.countplot(x=df1['timing_of_revaccination'], linewidth=3, palette="Set2", edgecolor='black')
plt.title('Revaccination Timing')
plt.xticks(rotation=45)
plt.show()

#Data Science Methods  

Although media reports suggest consistent underreporting of COVID-19 deaths globally, this parameter is harder to manipulate than case numbers. They used ANOVA and t tests to assess the effect of BCG vaccination policy on COVID-19 mortality, and used linear regressions to assess linkages between use of BCG vaccine and number of COVID-19 deaths (α = 0.05). BCG vaccination data were collected at the country level based on policy of vaccination (i.e., current, interrupted, never) and the mean and median percentage of vaccination coverage during the period 1980–2018, assigning zero coverage to countries where BCG national vaccination campaigns have never been conducted. COVID-19−related deaths were collected until April 22, 2020, and were standardized by population and stage of the epidemic. Independent (i.e., with vs. without BCG vaccination, policy of BCG vaccination in place, and percentage of vaccine coverage) and dependent variables (i.e., mean, median, and maximum deaths per million) were compared using data at specific times of each country’s epidemic since the first death (e.g., 21 and 30 d since first death; days since first death until 0.1 and 1 death per million; middle point and full period since first death). This design allowed fair comparisons between countries at different epidemic stages and of different population sizes, accounting for incubation period. https://www.pnas.org/content/early/2020/07/07/2008410117
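
The tests named above can be sketched with `scipy.stats`. This is only an illustration under stated assumptions: the numbers are synthetic (they are not the paper's data), the group sizes are made up, and the Welch variant of the t test is a choice made here, since the text does not say which variant was used.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05  # significance level quoted in the text

# Synthetic log10(deaths per million) by BCG policy group; illustrative only, NOT the paper's data
rng = np.random.default_rng(0)
current = rng.normal(loc=0.5, scale=0.5, size=30)
interrupted = rng.normal(loc=1.5, scale=0.5, size=15)
never = rng.normal(loc=1.8, scale=0.5, size=8)

# One-way ANOVA: does mean mortality differ across the three policy groups?
f_stat, p_anova = stats.f_oneway(current, interrupted, never)

# Two-sample t test: current programs vs. interrupted or never (Welch variant, an assumption here)
no_program = np.concatenate([interrupted, never])
t_stat, p_ttest = stats.ttest_ind(current, no_program, equal_var=False)

# Simple linear regression: mortality against mean vaccination coverage (%), also synthetic
coverage = rng.uniform(0, 100, size=40)
log_deaths = 2.0 - 0.015 * coverage + rng.normal(scale=0.4, size=40)
reg = stats.linregress(coverage, log_deaths)

print(f"ANOVA: F = {f_stat:.2f}, P = {p_anova:.3g}, significant = {p_anova < ALPHA}")
print(f"t test: t = {t_stat:.2f}, P = {p_ttest:.3g}")
print(f"regression: slope = {reg.slope:.4f}, r2 = {reg.rvalue ** 2:.2f}, P = {reg.pvalue:.3g}")
```

With real data, each array would hold one value per country (or per US state), computed at the same epidemic stage as described above.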

In [None]:
ax = df1['vaccination_timing'].value_counts().plot.barh(figsize=(16, 8))
ax.set_title('BCG Vaccination timing', size=18)
ax.set_ylabel('vaccination_timing', size=10)
ax.set_xlabel('Number of records', size=10)  # the x-axis shows value counts, not strain ids

#Refined Analysis.

The hypothesis that countries without a national BCG vaccination program would have greater COVID-19 mortality in adult populations than countries that have a program was investigated at the global level and filtered by social conditions. Analyses were performed initially for countries with BCG vaccination data, ≥1 million inhabitants, and nonzero COVID-19−related deaths. They focused on social variables as potential confounding factors, excluding climatic conditions in view of the lack of evidence for temperature dependence of the COVID-19 epidemic.

Potential confounding variables were assessed based on access to health and education services and income (i.e., Human Development Index), population size, human density, urbanization, and age structure of the population. They developed multiple linear regression models between COVID-19 deaths accumulated during the first month of mortalities by country (log) and the potential confounding variables and BCG data. They used adjusted Bayesian information criterion, adjusted r², and Mallows's Cp metrics to determine the best variable combination and select the optimal model for COVID-19 death estimation.

Analyses included all predictor variables combinations and were performed in R software using the leaps package. Considering the size and social disparities of the United States, BCG effect analyses were also performed considering the United States as a country, and separately by state. https://www.pnas.org/content/early/2020/07/07/2008410117
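
The authors ran this search in R with the leaps package. As a rough Python sketch of the same idea (not their code), the block below fits every predictor subset by ordinary least squares and scores it with adjusted r², a plain Gaussian BIC (up to a constant), and Mallows's Cp. The predictor names and the data are hypothetical and synthetic.

```python
from itertools import combinations
import numpy as np

def rss_ols(X, y):
    """Residual sum of squares of an ordinary least squares fit with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

def all_subsets(X, y, names):
    """Score every non-empty predictor subset by adjusted r2, BIC, and Mallows's Cp."""
    n, n_pred = X.shape
    tss = float(((y - y.mean()) ** 2).sum())
    sigma2_full = rss_ols(X, y) / (n - n_pred - 1)  # error variance from the full model
    results = []
    for k in range(1, n_pred + 1):
        for idx in combinations(range(n_pred), k):
            rss = rss_ols(X[:, list(idx)], y)
            p = k + 1  # fitted parameters, including the intercept
            results.append({
                "vars": tuple(names[i] for i in idx),
                "adj_r2": 1 - (rss / (n - p)) / (tss / (n - 1)),
                "bic": n * np.log(rss / n) + p * np.log(n),
                "cp": rss / sigma2_full - n + 2 * p,
            })
    return results

# Synthetic data: log deaths depend on 'hdi' and 'bcg_index' only (illustrative, not the paper's data)
rng = np.random.default_rng(1)
names = ["hdi", "density", "urban", "bcg_index"]
X = rng.normal(size=(60, 4))
y = 1.0 + 0.8 * X[:, 0] - 0.6 * X[:, 3] + rng.normal(scale=0.3, size=60)

scores = all_subsets(X, y, names)
best = min(scores, key=lambda r: r["bic"])
print(best["vars"], round(best["adj_r2"], 3))
```

An exhaustive search is feasible here because the number of candidate predictors is small; with k predictors it fits 2^k − 1 models.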

In [None]:
US = df1[(df1['country_name']=='United States of America')].reset_index(drop=True)
US.head()

BCG is NOT generally recommended for use in the United States because of the low risk of infection with Mycobacterium tuberculosis, the variable effectiveness of the vaccine against adult pulmonary TB, and the vaccine’s potential interference with tuberculin skin test reactivity. The BCG vaccine should be considered only for very select persons who meet specific criteria and in consultation with a TB expert. https://www.cdc.gov/tb/publications/factsheets/prevention/bcg.htm

In [None]:
df2 = pd.read_csv('../input/hackathon/BCG_country_data.csv', encoding='ISO-8859-2')
df2.head()

At the regional level, they assessed the COVID-19 pandemic in the Americas assuming virus invasion by air traffic, as the pandemic presumably arrived in the American continent from Europe or Asia through this route. Air traffic has revealed a strong linear correlation between international COVID-19 cases and international passenger volume (r² = 0.98, P < 0.01).

In large countries, such as the United States, Mexico, and Brazil, an epidemiological analysis at a country level does not consider the intense clustering of cases in states with large metropolitan areas and international airports, that were focal points of the pandemic. Thus, they decided to compare COVID-19 mortality in US states without BCG vaccination that have a high number of confirmed cases (more than 20,000 by April 20, 2020) with those states that were the main points of entry in Mexico and Brazil, as these two countries have current BCG vaccination programs.

Evaluations were restricted to mortality 25 days after the first COVID-19 related death was registered. At the local level, they assessed the COVID-19 pandemic in Germany considering that East and West Germany followed different BCG vaccination schemes before the reunification in 1990. In West Germany, infants were vaccinated between 1961 and 1998. In East Germany, infants and 15-y-old teenagers with a negative skin test were vaccinated from 1951 to 1975 with at least one dose of BCG. German states were compared accounting for age structure, considering that most of the mortality in Germany (95%) was observed in older individuals (60 y old or more). They explored whether the broader range in BCG vaccination in eastern states of older individuals could reduce the mortality from COVID-19. https://www.pnas.org/content/early/2020/07/07/2008410117

In [None]:
plt.style.use('fivethirtyeight')
plt.figure(figsize=(12, 8))  # set the size before plotting; a bare `figsize = ...` assignment has no effect
sns.countplot(x=df2['lockdown_start'], linewidth=3, palette="Set2", edgecolor='black')
plt.title('Lockdown Start')
plt.xticks(rotation=45)
plt.show()

![](https://www.pnas.org/content/pnas/early/2020/07/07/2008410117/F2.medium.gif) By Luis E. Escobar, Alvaro Molina-Cruz, and Carolina Barillas-Mury
PNAS first published July 9, 2020 https://doi.org/10.1073/pnas.2008410117

https://www.pnas.org/content/early/2020/07/07/2008410117

Linkage between COVID-19 mortality and BCG vaccination. (A) Coarse analysis of COVID-19 mortality per million inhabitants in countries with current, interrupted, or that never had BCG vaccination programs. United States data appear by state. (B) Filtered analysis of COVID-19 mortality per million inhabitants in countries with current, interrupted, or that never had BCG vaccination programs and similar social, economic, and epidemic stage conditions. (C) Filtered analysis of COVID-19 mortality per million inhabitants in countries with current vs. interrupted or that never had BCG vaccination programs, including only countries with similar social, economic, and epidemic stage conditions. (D) Negative association between percentage of vaccination coverage (mean) between 1980 and 2018, as a proxy of population protection, and maximum COVID-19 deaths per million inhabitants registered by country in a day, as a proxy of COVID-19 severity. 

In [None]:
plt.style.use('fivethirtyeight')
plt.figure(figsize=(12, 8))  # set the size before plotting; a bare `figsize = ...` assignment has no effect
sns.countplot(x=df2['lockdown_end'], linewidth=3, palette="Set2", edgecolor='black')
plt.title('Lockdown End')
plt.xticks(rotation=45)
plt.show()

![](https://www.pnas.org/content/pnas/early/2020/07/07/2008410117/F3.medium.gif) By Luis E. Escobar, Alvaro Molina-Cruz, and Carolina Barillas-Mury
PNAS first published July 9, 2020 https://doi.org/10.1073/pnas.2008410117

COVID-19 mortality in comparable regions that have had different BCG vaccination policies. (A) COVID-19 mortality by time in populous North and South American States. (Left) COVID-19 mortality per 10 million inhabitants in a 3-d centered average. Time adjusted according to the day with the first death in each region as day 1, up to 25 d of the epidemic. (Right) Table shows population density and COVID-19 mortality by day 25 of the epidemic for each region. Regions that have had BCG vaccination (blue) had lower mortality than regions without BCG vaccination (red; r² = 0.84, P < 0.001; t = 14.274, P < 0.001). (B) Estimated age range of people that received BCG vaccination in East and West Germany. (C) Map illustrating the regions of East and West Germany included in the analysis. (D) Mean COVID-19 mortality in East Germany was lower than mortality in West Germany (t = −2.592, P = 0.025). https://www.pnas.org/content/early/2020/07/07/2008410117
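
The time alignment and smoothing described for panel A can be sketched in pandas. This is a sketch under assumptions: the input is taken to be daily death counts, the helper name `aligned_mortality` is hypothetical, and letting the window shrink at the two ends of the series (`min_periods=1`) is a choice made here that the caption does not specify.

```python
import pandas as pd

def aligned_mortality(daily_deaths, population, n_days=25):
    """Deaths per 10 million, re-indexed so day 1 is the first death, as a 3-day centered mean."""
    s = pd.Series(list(daily_deaths), dtype=float)
    if not (s > 0).any():
        raise ValueError("no deaths recorded")
    first = int((s > 0).to_numpy().argmax())       # position of the first day with a death
    window = s.iloc[first:first + n_days].reset_index(drop=True)
    window.index = window.index + 1                # day 1 = day of the first death
    per_10m = window / population * 10_000_000
    # Centered 3-day average; the first and last day average over two values only
    return per_10m.rolling(window=3, center=True, min_periods=1).mean()

# Toy series for a hypothetical region of 10 million inhabitants
smoothed = aligned_mortality([0, 0, 1, 2, 3, 4], population=10_000_000)
print(smoothed)
```

Aligning every region on its own first death is what lets regions at different epidemic stages be compared day by day.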

They decided to compare COVID-19 mortality in states from the United States without BCG vaccination with those that were the main points of entry in Mexico and Brazil, as these two countries have current BCG vaccination programs. They found that COVID-19 mortality in the states of New York, Illinois, Louisiana, Alabama, and Florida (unvaccinated) was significantly higher (t [237] = 14.274, P < 0.001) than states from BCG-vaccinated countries (Pernambuco, Rio de Janeiro, and Sao Paulo in Brazil; Mexico State and Mexico City in Mexico) (Fig. 3 A, Left). This is remarkable, considering that three states from Latin America have much higher population densities than the North American states analyzed, including New York.

Germany provides a unique opportunity to compare the potential effect of age of BCG vaccination on susceptibility to COVID-19, as, before the unification, East and West Germany followed different vaccination schemes. In West Germany, those 22 y to 59 y old today were vaccinated, while, in East Germany, those 45 y to 84 y old today received at least one dose of BCG. A comparison of these two regions revealed that the average COVID-19 mortality rate in western German states (40.5 per million) was 2.9-fold higher than in eastern states (14.2 per million). Similarly, the mean mortality in Western Europe was 9.92 times higher than in Eastern Europe (t [11] = −2.592, P < 0.025), where countries, in general, have active universal BCG vaccination programs.
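
The age ranges and the fold difference quoted above can be checked with simple arithmetic from the vaccination periods given earlier (West Germany: infants, 1961 to 1998; East Germany: infants and 15-y-olds, 1951 to 1975). The sketch assumes that "today" means 2020, the paper's publication year; the helper name is hypothetical.

```python
REFERENCE_YEAR = 2020  # assumption: "today" in the text refers to the publication year

def cohort_age_range(first_year, last_year, max_age_at_vaccination=0,
                     reference_year=REFERENCE_YEAR):
    """Return (youngest, oldest) current age of people vaccinated during a campaign."""
    youngest = reference_year - last_year                         # infant in the last campaign year
    oldest = reference_year - first_year + max_age_at_vaccination # oldest recipient in the first year
    return youngest, oldest

west = cohort_age_range(1961, 1998)       # infants only
east = cohort_age_range(1951, 1975, 15)   # infants and 15-year-olds

# Ratio of the mean mortality rates quoted in the text (deaths per million)
fold = 40.5 / 14.2

print(west, east, round(fold, 1))
```

Both derived ranges match the ones stated in the paragraph, and the ratio rounds to the quoted 2.9.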


![](https://www.pnas.org/content/pnas/early/2020/07/07/2008410117/F4.medium.gif) By Luis E. Escobar, Alvaro Molina-Cruz, and Carolina Barillas-Mury
PNAS first published July 9, 2020 https://doi.org/10.1073/pnas.2008410117

Variation in COVID-19 mortality and BCG vaccination policy in different European countries. (A) Map illustrating COVID-19 deaths per million inhabitants in European countries. (B) Map illustrating BCG vaccination policy in European countries that currently have universal BCG vaccination program (Current), countries that interrupted in the past a BCG vaccination program (Interrupted), and countries that never implemented a universal vaccination program (Never). (C) BCG index = (age of oldest vaccinated cohort × total number of years of vaccination campaign)/standardization parameter (5,625) × Mean BCG vaccination coverage. (D) Correlation between BCG index and COVID-19 mortality per million inhabitants and in European countries with different BCG vaccination policy. (E) Correlation between BCG index and COVID-19 mortality per million inhabitants during the first month of the pandemic in socially similar European countries with different BCG vaccination policy.
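
The BCG index in panel C can be written as a small function. This is a sketch of the formula as the caption states it; treating mean coverage as a fraction between 0 and 1 is an assumption (the caption does not give the unit), and the input values below are made up.

```python
STANDARDIZATION = 5625  # standardization parameter from the caption (equal to 75 * 75)

def bcg_index(oldest_cohort_age, campaign_years, mean_coverage):
    """BCG index = (oldest cohort age * campaign years) / 5,625 * mean coverage.

    mean_coverage is assumed to be a fraction in [0, 1].
    """
    return (oldest_cohort_age * campaign_years) / STANDARDIZATION * mean_coverage

# Hypothetical countries, for illustration only
print(bcg_index(75, 75, 1.0))   # long-running program with full coverage
print(bcg_index(60, 50, 0.9))   # shorter program, 90% coverage
print(bcg_index(0, 0, 0.0))     # never had a universal program
```

Under this reading, the index reaches 1 when a program has run for 75 years, its oldest vaccinated cohort is 75 years old, and coverage is complete.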

In [None]:
ax = df2['lockdown_start'].value_counts().plot.barh(figsize=(16, 8))
ax.set_title('Lockdown Start', size=18)
ax.set_ylabel('lockdown_start', size=10)
ax.set_xlabel('Number of countries', size=10)  # the x-axis shows how many rows share each start date

In [None]:
fig = px.line(df2, x="lockdown_start", y="country_name", color_discrete_sequence=['darkseagreen'], 
              title="Lockdown Measures Start")
fig.show()

In [None]:
fig = px.bar(df2, 
             x='median_age', y='population_per_km2', color_discrete_sequence=['crimson'],
             title='Population Density by Median Age (bar labels: cumulative COVID-19 tests)', text='covid_19_test_cumulative_total')
fig.show()

In [None]:
fig = px.bar(df2, 
             x='hospital_bed_per_1000_people', y='country_name', color_discrete_sequence=['#27F1E7'],
             title='Hospital Beds by Country and Covid19 Tests', text='covid_19_test_cumulative_total')
fig.show()

#BCG Vaccination and COVID-19 Mortality.

There is limited information on the safety of administering BCG to senior persons, since BCG is a vaccine based on a live attenuated Mycobacterium that should not be administered to immunocompromised individuals. M. tuberculosis infection can remain latent for decades and reactivate in the elderly when a senescent immune system loses the ability to contain the infection. Nevertheless, a small study found that vaccination of adults ≥ 65 y old with BCG prevented acute upper respiratory tract infections, and there is an active clinical trial vaccinating adults aged >65 y with BCG to boost immunity.

If the BCG protection hypothesis holds true, it would have great implications for regions with ongoing universal vaccination programs, including most developing countries, as they may experience lower morbidity and mortality during the pandemic than Europe and North America. They found that the number of years that universal BCG vaccination has been implemented in a given country and the level of vaccination coverage may play key roles in reducing COVID-19 severity.

Similarly, individuals born in years with low BCG vaccine coverage would be populations at risk. Most Asian countries have active universal BCG vaccination programs. If BCG is conferring some basal level of protection from COVID-19, it is possible that some of the social distancing roll-back strategies taken by Asian countries, in order to restart their economies, may not be effective in North America and western European countries, and could result in a second wave of infections.

The understanding of the biology of innate immune training is in its infancy. Little is known about the capacity of BCG vaccination to confer broad immune enhancement and the functional correlates of protection. The inability to confirm the null hypothesis of no effect of BCG on COVID-19 mortality could be explained by an alternative hypothesis of cross-protection mediated by BCG vaccination. They note, however, that the data used in that epidemiological study have important sampling biases, and that the statistical signal detected at the country level may not explain COVID-19 mortality at the local level. The possibility that a single exposure to an attenuated pathogen during infancy could result in lifelong enhancement in immune surveillance is remarkable, but the available epidemiological data, in the absence of direct evidence from clinical trials, are not sufficient to recommend the use of BCG for the control and prevention of COVID-19 or other emerging infectious diseases. https://www.pnas.org/content/early/2020/07/07/2008410117

That's all, Kaggle Notebook Runner: Marília Prata @mpwolke