<a href="https://www.kaggle.com/kvs1998/plotly-eda-covid-testing?scriptVersionId=82738689" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

<center><h1>Importing the Libraries 😎</h1></center> 

<h2>Notebooks used as Inspiration:</h2>
<div>
    1. https://www.kaggle.com/gpreda/covid-19-testing-evolution<br>
    2. https://www.kaggle.com/imdevskp/covid-19-analysis-visualization-comparisons<br>
    3. https://www.kaggle.com/shrutidandagi/covid-19-analysis-plotly<br>
</div>    

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns 
import plotly.express as px
import plotly.graph_objects as go
from plotly.offline import iplot
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

/kaggle/input/covid19-world-testing-progress/covid-testing.csv


<center><h1>Loading Data 📥</h1></center>


<h2>Introduction</h2>
<p>In this kernel we analyse country-level testing for COVID-19. Countries report their testing in different units, referred to here as modes of testing.</p>
<a href='#1'>1. Find different modes of testing</a><br>
<a href='#2'>2. Distribution of Modes Country Wise</a><br>
<a href='#3'>3. Sort countries wrt number of daily/cumulative tests</a><br>    
<a href='#4'>4. Positivity Rates in different Countries</a>

In [2]:
data = pd.read_csv('../input/covid19-world-testing-progress/covid-testing.csv')
data.head()

Unnamed: 0,Entity,ISO code,Date,Source URL,Source label,Notes,Daily change in cumulative total,Cumulative total,Cumulative total per thousand,Daily change in cumulative total per thousand,7-day smoothed daily change,7-day smoothed daily change per thousand,Short-term positive rate,Short-term tests per case
0,Albania - tests performed,ALB,2020-02-25,https://shendetesia.gov.al/koronavirusi-mshms-...,Ministry of Health and Social Protection,,8.0,8.0,0.003,0.003,,,,
1,Albania - tests performed,ALB,2020-02-26,https://shendetesia.gov.al/fond-shtese-per-mas...,Ministry of Health and Social Protection,,5.0,13.0,0.005,0.002,,,,
2,Albania - tests performed,ALB,2020-02-27,https://shendetesia.gov.al/ministria-e-shendet...,Ministry of Health and Social Protection,,4.0,17.0,0.006,0.001,,,,
3,Albania - tests performed,ALB,2020-02-28,http://shendetesia.gov.al/manastirliu-asnje-ra...,Ministry of Health and Social Protection,,1.0,18.0,0.006,0.0,,,,
4,Albania - tests performed,ALB,2020-02-29,https://shendetesia.gov.al/ministria-e-shendet...,Ministry of Health and Social Protection,,8.0,26.0,0.009,0.003,,,,


In [3]:
print("Number of columns:", data.shape[1], "\nNumber of rows:", data.shape[0])

Number of columns: 14 
Number of rows: 74048


In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 74048 entries, 0 to 74047
Data columns (total 14 columns):
 #   Column                                         Non-Null Count  Dtype  
---  ------                                         --------------  -----  
 0   Entity                                         74048 non-null  object 
 1   ISO code                                       74048 non-null  object 
 2   Date                                           74048 non-null  object 
 3   Source URL                                     62975 non-null  object 
 4   Source label                                   63177 non-null  object 
 5   Notes                                          8341 non-null   object 
 6   Daily change in cumulative total               59953 non-null  float64
 7   Cumulative total                               60102 non-null  float64
 8   Cumulative total per thousand                  60102 non-null  float64
 9   Daily change in cumulative total per thousand  599

<center><h1>Data Analysis 📊</h1></center>

<a id="1"></a><h2>Finding different modes of testing</h2>
The Entity column is split into two new columns: Country and Mode (the unit in which tests are reported).
Most countries report a single mode, but a few report two or more.

In [5]:
data['Date'] = pd.to_datetime(data['Date'])
# Entity has the form "<country> - <mode>"; split it into two columns.
data['Country'] = data['Entity'].apply(lambda x: x.split(' - ')[0].strip())
data['Mode'] = data['Entity'].apply(lambda x: x.split(' - ')[1].strip())
# First reporting date per country, latest starters first.
data[['Country','Date']].groupby('Country').min().sort_values(by='Date', ascending=False)

Country,Date
Saint Vincent and the Grenadines,2021-09-30
Bahamas,2021-09-21
Gambia,2021-07-25
Antigua and Barbuda,2021-07-13
Gabon,2021-07-13
...,...
Taiwan,2020-01-16
Thailand,2020-01-04
Mexico,2020-01-01
Argentina,2020-01-01
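
The same split can also be written with pandas' vectorized string methods instead of `apply`. The snippet below is a minimal sketch on a small made-up frame (the `sample` rows are illustrative, not taken from the dataset), assuming every Entity value contains the ` - ` separator exactly as in the rows shown above.

```python
import pandas as pd

# Hypothetical rows shaped like the Entity column of the dataset.
sample = pd.DataFrame({
    "Entity": [
        "Albania - tests performed",
        "Italy - people tested",
        "Poland - samples tested",
    ]
})

# Split once on the " - " separator and expand the pieces into two columns.
parts = sample["Entity"].str.split(" - ", n=1, expand=True)
sample["Country"] = parts[0].str.strip()
sample["Mode"] = parts[1].str.strip()

print(sample[["Country", "Mode"]])
```

Splitting with `n=1` keeps any later ` - ` inside the mode text, which may or may not matter for this dataset.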


In [6]:
print('Modes of Testing:', data['Mode'].unique())

Modes of Testing: ['tests performed' 'people tested' 'units unclear' 'samples tested']


<h2>So there are 4 modes of testing</h2>
1. Tests Performed<br>
2. People Tested<br>
3. Units Unclear<br>
4. Samples Tested<br>

In [7]:
tp_data = data[data['Mode'] == 'tests performed'] 
m_tp=tp_data[['Country','Date']].groupby('Country').agg(['min', 'max'])

pt_data = data[data['Mode'] == 'people tested'] 
m_pt=pt_data[['Country','Date']].groupby('Country').agg(['min', 'max'])

uu_data = data[data['Mode'] == 'units unclear'] 
m_uu=uu_data[['Country','Date']].groupby('Country').agg(['min', 'max'])

st_data = data[data['Mode'] == 'samples tested'] 
m_st=st_data[['Country','Date']].groupby('Country').agg(['min', 'max'])

In [8]:
result = pd.concat([m_pt, m_tp], axis=1)
result = result.dropna()
print("people tested & tests performed\n",result)
result = pd.concat([m_pt, m_uu], axis=1)
result = result.dropna()
print("\n\npeople tested & units unclear\n",result)
result = pd.concat([m_pt, m_st], axis=1)
result = result.dropna()
print("\n\npeople tested & samples tested\n",result)
result = pd.concat([m_tp, m_uu], axis=1)
result = result.dropna()
print("\n\ntests performed & units unclear\n",result)
result = pd.concat([m_tp, m_st], axis=1)
result = result.dropna()
print("\n\ntests performed & samples tested\n",result)
result = pd.concat([m_uu, m_st], axis=1)
result = result.dropna()
print("\n\nunits unclear & samples tested\n",result)

people tested & tests performed
               Date                                 
               min        max        min        max
Country                                            
Canada  2020-03-11 2021-01-31 2020-01-31 2021-12-09
Italy   2020-04-19 2021-12-09 2020-02-24 2021-12-09


people tested & units unclear
 Empty DataFrame
Columns: [(Date, min), (Date, max), (Date, min), (Date, max)]
Index: []


people tested & samples tested
               Date                                 
               min        max        min        max
Country                                            
Poland  2020-04-28 2021-12-10 2020-03-06 2021-12-09


tests performed & units unclear
 Empty DataFrame
Columns: [(Date, min), (Date, max), (Date, min), (Date, max)]
Index: []


tests performed & samples tested
 Empty DataFrame
Columns: [(Date, min), (Date, max), (Date, min), (Date, max)]
Index: []


units unclear & samples tested
 Empty DataFrame
Columns: [(Date, min), (Date, max), (Date, min)

<h4>Only Poland reports both people tested and samples tested.<br>
Canada and Italy report both people tested and tests performed.</h4>
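
The pairwise comparison above can also be collapsed into one check: count the distinct modes per country and keep the countries with more than one. This is a minimal sketch on a hypothetical miniature of the Country and Mode columns; on the real `data` frame the same two lines would be applied to `data` instead of `sample`.

```python
import pandas as pd

# Hypothetical miniature of the Country/Mode columns.
sample = pd.DataFrame({
    "Country": ["Canada", "Canada", "Italy", "Italy", "Poland", "Poland", "India"],
    "Mode": [
        "people tested", "tests performed",
        "people tested", "tests performed",
        "people tested", "samples tested",
        "samples tested",
    ],
})

# Number of distinct reporting modes per country.
mode_counts = sample.groupby("Country")["Mode"].nunique()

# Countries that report in more than one mode.
multi_mode = mode_counts[mode_counts > 1].index.tolist()
print(multi_mode)
```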

<a id="2"></a><h2>Distribution of Modes Country Wise</h2>

In [9]:
# For countries that report several modes, max() keeps the alphabetically last one.
df = data[['Country', 'Mode']].groupby('Country').max().reset_index()
# Mode is categorical, so no continuous colour scale is needed.
fig = px.choropleth(df, locations="Country", locationmode='country names',
                    color='Mode', hover_name="Country",
                    title='Testing Modes used by Countries', hover_data=['Mode'])
fig.show()

<h2>Top 15 countries by cumulative tests per thousand in each mode</h2>

In [10]:
df = tp_data[['Country', 'Cumulative total per thousand']].groupby('Country').max().reset_index().sort_values(ascending=False, by='Cumulative total per thousand')[:15]
fig = px.bar(df, y='Country', x='Cumulative total per thousand', text='Cumulative total per thousand', 
      color_discrete_sequence = px.colors.qualitative.Dark2, title='Tests Performed')
fig.show()

df = pt_data[['Country', 'Cumulative total per thousand']].groupby('Country').max().reset_index().sort_values(ascending=False, by='Cumulative total per thousand')[:15]
fig = px.bar(df, y='Country', x='Cumulative total per thousand', text='Cumulative total per thousand', 
      color_discrete_sequence = px.colors.qualitative.Dark2, title='People Tested')
fig.show()

df = uu_data[['Country', 'Cumulative total per thousand']].groupby('Country').max().reset_index().sort_values(ascending=False, by='Cumulative total per thousand')[:15]
fig = px.bar(df, y='Country', x='Cumulative total per thousand', text='Cumulative total per thousand',
      color_discrete_sequence = px.colors.qualitative.Dark2, title='Units Unclear')
fig.show()

df = st_data[['Country', 'Cumulative total per thousand']].groupby('Country').max().reset_index().sort_values(ascending=False, by='Cumulative total per thousand')[:15]
fig = px.bar(df, y='Country', x='Cumulative total per thousand', text='Cumulative total per thousand',
      color_discrete_sequence = px.colors.qualitative.Dark2, title='Samples Tested')
fig.show()


In [11]:
# Share of dataset rows reported under each mode.
mode_counts = data.groupby('Mode').size().reset_index(name='Rows')
fig = px.pie(mode_counts, values='Rows', names='Mode',
             template="plotly_dark", hole=0.3)
fig.show()

<a id="3"></a><h2>Sort countries wrt number of daily/cumulative tests</h2>
<h4>Which countries run the most tests per day?</h4>

In [12]:
df = data.groupby(['Country']).agg({'Daily change in cumulative total':'mean'}).reset_index().sort_values(by='Daily change in cumulative total', ascending=False)[:10]
fig = px.scatter(df, y="Country", x="Daily change in cumulative total", title='Top 10 - Avg Daily change in cumulative total')
fig.show()

df = data.groupby(['Country']).agg({'Daily change in cumulative total per thousand':'mean'}).reset_index().sort_values(by='Daily change in cumulative total per thousand', ascending=False)[:10]
fig = px.scatter(df, y="Country", x="Daily change in cumulative total per thousand", title='Top 10 - Avg Daily change in cumulative total per thousand')
fig.show()

df = data.groupby(['Country']).agg({'Cumulative total':'max'}).reset_index().sort_values(by='Cumulative total', ascending=False)[:10]
fig = px.scatter(df, y="Country", x="Cumulative total", title='Top 10 - Cumulative total')
fig.show()

df = data.groupby(['Country']).agg({'Cumulative total per thousand':'max'}).reset_index().sort_values(by='Cumulative total per thousand', ascending=False)[:10]
fig = px.scatter(df, y="Country", x="Cumulative total per thousand", title='Top 10 - Cumulative total per thousand')
fig.show()
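
The aggregate, sort, and slice pattern used in this cell can also be expressed with `nlargest`. The sketch below uses made-up countries and values purely for illustration; only the column name is taken from the dataset.

```python
import pandas as pd

# Hypothetical daily test counts for three made-up countries.
sample = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "C", "C"],
    "Daily change in cumulative total": [10.0, 30.0, 100.0, 300.0, 5.0, None],
})

# Mean daily tests per country; missing values are skipped by mean().
avg_daily = sample.groupby("Country")["Daily change in cumulative total"].mean()

# Keep the two countries with the highest average, largest first.
top2 = avg_daily.nlargest(2)
print(top2)
```

`nlargest` returns the rows already sorted in descending order, so no separate `sort_values` call is needed.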

In [13]:
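# Correlate the numeric columns only and draw the matrix as a heatmap.
corr = data.select_dtypes('number').corr()
fig = px.imshow(corr.to_numpy(), x=list(corr.columns), y=list(corr.index))
fig.show()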

<a id='4'></a><h2>Positivity Rates in different Countries</h2>
<h4>The percent positive is exactly what it sounds like: the percentage of all coronavirus tests performed that come back positive, or: (positive tests)/(total tests) x 100%.</h4>
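
As a worked example of the formula above, here is a small sketch. The function name `positivity_rate` and the numbers are made up for illustration and are not part of the dataset or the notebook.

```python
def positivity_rate(positive_tests: int, total_tests: int) -> float:
    """Return positive tests as a percentage of all tests performed."""
    if total_tests <= 0:
        raise ValueError("total_tests must be greater than zero")
    return 100 * positive_tests / total_tests


# Made-up numbers: 50 positive results out of 1,000 tests.
rate = positivity_rate(50, 1000)
print(f"{rate:.1f}%")
```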

<h3>Positive Rate in India</h3>


In [14]:
def positivityRate(country):
    """Plot the short-term positive rate and the daily tests for one country."""
    df = data[data['Country'] == country]
    title = 'Positive rate for ' + country
    fig = px.line(df, x="Date", y="Short-term positive rate", title=title, color='Country')
    fig.show()
    # "Daily change in cumulative total" counts tests, not confirmed cases.
    fig = px.line(df, x="Date", y="Daily change in cumulative total", title="Daily tests w.r.t. date - " + country, template="plotly_dark")
    fig.show()

In [15]:
positivityRate('India')

In [16]:
positivityRate('United States')

In [17]:
positivityRate('Cuba')

<h4>In the countries plotted above, daily tests tend to increase at around the time the positivity rate reaches its peak.</h4>