<center><h1>COVID-19 Testing Evolution</h1></center>
<br>
<center><img src="https://oklahoma.gov/content/dam/ok/en/covid19/images/homepage-images/COVID-Web_Card%20-%20Testing.png" width=400></img></center>  

<br>

<h1 style='background:#2676DE; border:0; color:white'><center>Introduction</center></h1> 

The data (country testing) contains the following information:
* **Entity** - the country for which the testing information is provided, together with the type of test counting (reporting mode), such as `people tested`, `tests performed` or `unclear unit`; this entity can be further split into `Country` and `Mode`;
* **ISO code** - ISO code of the country;
* **Date** - date of the data entry;
* **Source URL** - source of the daily data for the country, usually a public site;
* **Source label** - human-readable name of the entity providing the information, e.g. Ministry of Health;
* **Notes** - explanatory notes related to the collected data;
* **Daily change in cumulative total** - number of tests for that date/country;
* **Cumulative total** - total (cumulative, up to that date) number of tests for that date/country;
* **Cumulative total per thousand** - cumulative total number of tests for that date/country per thousand people in the population;
* **Daily change in cumulative total per thousand** - number of tests for that date/country per thousand people in the population;
* **7-day smoothed daily change** - daily change in the number of tests for that date/country, averaged over 7 days;
* **7-day smoothed daily change per thousand** - daily change in the number of tests per thousand people for that date/country, averaged over 7 days;
* **Short-term positive rate** - share of positive tests, averaged over a short period, for that date/country;
* **Short-term tests per case** - number of tests per confirmed case, averaged over a short period, for that date/country.
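As a rough illustration of how the numeric columns relate to each other, the sketch below derives them from a toy daily series for one hypothetical country. The population, the case counts and the 7-day trailing window are assumptions made for illustration only; the data provider's exact method (for example, how missing days are handled) may differ.

```python
import pandas as pd

# Toy daily counts for one hypothetical country; all numbers are invented
dates = pd.date_range("2020-04-01", periods=8, freq="D")
daily_tests = pd.Series([100, 120, 80, 150, 200, 90, 110, 130], index=dates)
daily_cases = pd.Series([5, 6, 4, 10, 12, 3, 5, 7], index=dates)
population = 2_000_000  # hypothetical population

# Cumulative total is the running sum of the daily change
cumulative_total = daily_tests.cumsum()

# "Per thousand" columns divide by the population and multiply by 1,000
cumulative_per_thousand = cumulative_total / population * 1000
daily_per_thousand = daily_tests / population * 1000

# 7-day smoothing as a trailing rolling mean; the first 6 days lack a full window (NaN)
smoothed_7day = daily_tests.rolling(window=7).mean()
smoothed_7day_per_thousand = smoothed_7day / population * 1000

# Short-term positive rate: cases over tests in the same 7-day window;
# tests per case is its inverse
positive_rate = daily_cases.rolling(7).sum() / daily_tests.rolling(7).sum()
tests_per_case = 1 / positive_rate

print(cumulative_total.iloc[-1], round(smoothed_7day.iloc[-1], 2), round(tests_per_case.iloc[-1], 1))
```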



<a id="0"></a>

### Content  

* <a href='#1'><font style='background:#2676DE; border:0; color:white'>Analysis preparation</font></a>  
* <a href='#2'><font style='background:#2676DE; border:0; color:white'>Testing reporting mode per country</font></a> 
* <a href='#3'><font style='background:#2676DE; border:0; color:white'>Which country tests most?</font></a> 
* <a href='#4'><font style='background:#2676DE; border:0; color:white'>Top per country - how many are tested</font></a>  
* <a href='#5'><font style='background:#2676DE; border:0; color:white'>Countries selection - how testing progressed</font></a>  
* <a href='#6'><font style='background:#2676DE; border:0; color:white'>What is in the notes?</font></a>  


### Last updated


In [None]:
import datetime
import os
import time

# Use the modification time of the first file found in the dataset folder as the dataset update date
with os.scandir("/kaggle/input/covid19-world-testing-progress") as dir_entries:
    for entry in dir_entries:
        utc_time = time.gmtime(entry.stat().st_mtime)
        print(f"Dataset last updated: {time.strftime('%Y-%m-%d', utc_time)}")
        break

ldt = datetime.datetime.now()
print(f"Notebook last updated: {ldt:%Y-%m-%d}")

<a id="1"></a><h1 style='background:#2676DE; border:0; color:white'><center>Analysis preparation</center></h1>


We initialize the Python packages used for data ingestion, preparation and visualization; most of the plots use Plotly.
Then we read the data file and aggregate the data per entity (Entity, Country, Mode and ISO code), keeping the maximum registered value of each indicator.

We will mainly look at:
* Which countries test the most;  
* Total (cumulative) number of tests and tests per thousand people;  
* Daily tests and daily tests per thousand people.

We visualize the maximum registered values (for the cumulative totals, these are also the latest values) as well as their variation over time.
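The maximum and the latest value coincide only for columns that never decrease, such as the cumulative totals. A minimal sketch on an invented two-entity frame compares the per-entity maximum (what `max()` keeps) with the value on the most recent date:

```python
import pandas as pd

# Toy frame shaped like covid-testing.csv (two hypothetical entities, invented values)
df = pd.DataFrame({
    "Entity": ["A - tests performed"] * 3 + ["B - people tested"] * 3,
    "Date": pd.to_datetime(["2020-05-01", "2020-05-02", "2020-05-03"] * 2),
    "Daily change in cumulative total": [50, 80, 30, 10, 40, 20],
    "Cumulative total": [1000, 1080, 1110, 500, 540, 560],
})
cols = ["Daily change in cumulative total", "Cumulative total"]

# Maximum registered value per entity
max_values = df.groupby("Entity")[cols].max()
# Value on the most recent date per entity
latest_values = df.sort_values("Date").groupby("Entity")[cols].last()

print(max_values)
print(latest_values)
```

For entity A the cumulative total is the same in both tables, while the daily change differs (the peak day is not the last day).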


In [None]:
import warnings

import matplotlib.pyplot as plt
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from plotly.offline import init_notebook_mode, iplot

# Render Plotly figures inline in the notebook
init_notebook_mode(connected=True)
warnings.filterwarnings("ignore")

In [None]:
data_df = pd.read_csv("/kaggle/input/covid19-world-testing-progress/covid-testing.csv")

Let's first split `Entity` into `Country` and `Mode`.

In [None]:
# Entity has the form "<country> - <mode>": split on the first " - " only
entity_parts = data_df['Entity'].str.split(" - ", n=1, expand=True)
data_df['Country'] = entity_parts[0].str.strip()
data_df['Mode'] = entity_parts[1].str.strip()

Let's check the result for a few rows.

In [None]:
data_df[['Entity', 'Country', 'Mode']].sample(10)
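Splitting on the first `" - "` only, with pandas' vectorized string methods, also copes with an entity that has no separator: a plain `split(" - ")[1]` on each value would raise an `IndexError`, whereas here `Mode` is simply left missing. The strings below are toy values (the third one is hypothetical):

```python
import pandas as pd

# Toy Entity values: two follow the "<country> - <mode>" pattern, the third has no separator
entity = pd.Series(["Italy - tests performed", "Japan - people tested", "Nowhere"])

# Split on the first " - " only and spread the pieces into two columns
parts = entity.str.split(" - ", n=1, expand=True)
split_df = pd.DataFrame({"Country": parts[0].str.strip(),
                         "Mode": parts[1].str.strip()})
print(split_df)
```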

In [None]:
# Indicators aggregated per entity; max() keeps the highest registered value of each
value_cols = ['Daily change in cumulative total',
              'Cumulative total',
              'Cumulative total per thousand',
              'Daily change in cumulative total per thousand',
              '7-day smoothed daily change',
              '7-day smoothed daily change per thousand',
              'Short-term positive rate',
              'Short-term tests per case']
# Select the columns with a list (a bare tuple of names is rejected by pandas 2.x)
country_testing = (data_df.groupby(["Entity", "Country", "Mode", "ISO code"])[value_cols]
                   .max()
                   .reset_index()
                   .rename(columns={"ISO code": "iso_code"}))

<small><a href='#0'>Go to top</a></small>  


<a id="2"></a><h1 style='background:#2676DE; border:0; color:white'><center>Testing reporting mode per country</center></h1>

Countries use different modes to report testing.  
Let's look at the list of reporting modes.

In [None]:
print(f"Reporting mode (number): {data_df.Mode.nunique()}")
print(f"Reporting mode (items): {data_df.Mode.unique()}")
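To see how widely each mode is used, one can count the distinct countries per mode. A minimal sketch on a toy per-entity table (invented rows, with the same `Country` and `Mode` columns):

```python
import pandas as pd

# Toy per-entity table; rows are invented for illustration
ct = pd.DataFrame({
    "Country": ["Italy", "Italy", "Japan", "Spain"],
    "Mode": ["tests performed", "people tested", "people tested", "tests performed"],
})

# Number of distinct countries using each reporting mode
countries_per_mode = ct.groupby("Mode")["Country"].nunique().sort_values(ascending=False)
print(countries_per_mode)
```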

In [None]:
fig = px.choropleth(locations=country_testing['Country'], 
                    locationmode="country names",
                    color=country_testing['Mode'],
                    title="Countries with each type of reporting mode",
                    height = 500
                   )
fig.update_layout({'legend_orientation':'v'})
fig.update_layout({'legend_title':'Reporting mode'})
fig.show()

<small><a href='#0'>Go to top</a></small>  

Let's also visualize the reported testing values per mode and country using treemaps.
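In these treemaps, Plotly Express builds each `Mode` tile by summing the values of its `Country` tiles. The sketch below reproduces that aggregation with pandas on invented numbers. For ratio indicators (per thousand values, positive rate, tests per case) such sums only drive the tile sizes and are not meaningful totals.

```python
import pandas as pd

# Toy per-entity table with invented values
ct = pd.DataFrame({
    "Mode": ["tests performed", "tests performed", "people tested"],
    "Country": ["Italy", "Spain", "Japan"],
    "Cumulative total": [1000, 600, 300],
})

# With path=['Mode', 'Country'], each Mode tile's size is the sum of its Country tiles
mode_tiles = ct.groupby("Mode")["Cumulative total"].sum()
# The root tile is in turn the sum of the Mode tiles
root = mode_tiles.sum()
print(mode_tiles)
print(root)
```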

In [None]:
fig = px.treemap(country_testing, path = ['Mode', 'Country'], values = 'Cumulative total',
                title="Cumulative total (max registered value) per country, grouped by reporting mode")
fig.show()

In [None]:
fig = px.treemap(country_testing, path = ['Mode', 'Country'], values = 'Cumulative total per thousand',
                title="Cumulative total per thousand (max registered value) per country, grouped by reporting mode")
fig.show()

In [None]:
fig = px.treemap(country_testing, path = ['Mode', 'Country'], values = 'Daily change in cumulative total',
                title="Daily change in cumulative total (max registered value) per country, grouped by reporting mode")
fig.show()

In [None]:
fig = px.treemap(country_testing, path = ['Mode', 'Country'], values = 'Daily change in cumulative total per thousand',
                title="Daily change in cumulative total per thousand (max registered value) per country, grouped by reporting mode")
fig.show()

In [None]:
fig = px.treemap(country_testing, path = ['Mode', 'Country'], values = '7-day smoothed daily change',
                title="7-day smoothed daily change (max registered value) per country, grouped by reporting mode")
fig.show()

In [None]:
fig = px.treemap(country_testing, path = ['Mode', 'Country'], values = '7-day smoothed daily change per thousand',
                title="7-day smoothed daily change per thousand (max registered value) per country, grouped by reporting mode")
fig.show()

In [None]:
fig = px.treemap(country_testing, path = ['Mode', 'Country'], values = 'Short-term positive rate',
                title="Short-term positive rate (max registered value) per country, grouped by reporting mode")
fig.show()

In [None]:
fig = px.treemap(country_testing, path = ['Mode', 'Country'], values = 'Short-term tests per case',
                title="Short-term tests per case (max registered value) per country, grouped by reporting mode")
fig.show()

<small><a href='#0'>Go to top</a></small>  


<a id="3"></a><h1 style='background:#2676DE; border:0; color:white'><center>Which country tests most?</center></h1>

In [None]:
fig = px.choropleth(locations=country_testing['Country'], 
                    locationmode="country names",
                    color=country_testing['Cumulative total'],
                    title="Cumulative total of tests per country",
                    height = 500
                   )
# Continuous color scale: the title goes on the colorbar (a legend title has no effect here)
fig.update_layout(coloraxis_colorbar_title_text='Cumulative total')
fig.show()

In [None]:
fig = px.choropleth(locations=country_testing['Country'], 
                    locationmode="country names",
                    color=country_testing['Cumulative total per thousand'],
                    title="Cumulative total of tests per thousand people, per country",
                    height = 500
                   )
fig.update_layout(coloraxis_colorbar_title_text='Cumulative total/THS')
fig.show()

In [None]:
fig = px.choropleth(locations=country_testing['Country'], 
                    locationmode="country names",
                    color=country_testing['Daily change in cumulative total per thousand'],
                    title="Daily change in cumulative total per thousand (max value) per country",
                    height = 500
                   )
fig.update_layout(coloraxis_colorbar_title_text='Max daily change/THS')
fig.show()

In [None]:
fig = px.choropleth(locations=country_testing['Country'], 
                    locationmode="country names",
                    color=country_testing['Daily change in cumulative total'],
                    title="Daily change in cumulative total (max value) per country",
                    height = 500
                   )
fig.update_layout(coloraxis_colorbar_title_text='Max daily change')
fig.show()

In [None]:
fig = px.choropleth(locations=country_testing['Country'], 
                    locationmode="country names",
                    color=country_testing['7-day smoothed daily change per thousand'],
                    title="7-day smoothed daily change per thousand (max value) per country",
                    height = 500
                   )
fig.update_layout(coloraxis_colorbar_title_text='7-day avg change/THS')
fig.show()

In [None]:
fig = px.choropleth(locations=country_testing['Country'], 
                    locationmode="country names",
                    color=country_testing['7-day smoothed daily change'],
                    title="7-day smoothed daily change (max value) per country",
                    height = 500
                   )
fig.update_layout(coloraxis_colorbar_title_text='7-day avg change')
fig.show()

In [None]:
fig = px.choropleth(locations=country_testing['Country'], 
                    locationmode="country names",
                    color=country_testing['Short-term positive rate'],
                    title="Short-term positive rate (max value) per country",
                    height = 500
                   )
fig.update_layout(coloraxis_colorbar_title_text='Short-term positive rate')
fig.show()

In [None]:
fig = px.choropleth(locations=country_testing['Country'], 
                    locationmode="country names",
                    color=country_testing['Short-term tests per case'],
                    title="Short-term tests per case (max value) per country",
                    height = 500
                   )
fig.update_layout(coloraxis_colorbar_title_text='Short-term tests per case')
fig.show()

<small><a href='#0'>Go to top</a></small>  


<a id="4"></a><h1 style='background:#2676DE; border:0; color:white'><center>Top per country - how many are tested</center></h1>

Let's now look at the top countries in terms of:

* Daily change in cumulative total;   
* Cumulative total;  
* Cumulative total per thousand;  
* Daily change in cumulative total per thousand;  
* 7-day smoothed daily change;  
* 7-day smoothed daily change per thousand;    
* Short-term positive rate;  
* Short-term tests per case.


We will look at the **top 50** entries. Countries that report in more than one mode appear once per mode on the same graph.
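Selecting the top entries amounts to sorting in descending order and keeping the first rows; since missing values are sorted last, they do not enter the top. `DataFrame.nlargest` returns the same rows. A sketch on invented data, with `n=3` instead of 50:

```python
import pandas as pd

# Toy per-entity table with invented values and one missing entry
ct = pd.DataFrame({
    "Entity": ["A - tests performed", "B - people tested", "C - tests performed", "D - tests performed"],
    "Cumulative total": [500.0, None, 1200.0, 800.0],
})
n = 3  # the notebook uses 50

# Sort descending (NaN goes last by default) and keep the first n rows
top_sorted = ct.sort_values("Cumulative total", ascending=False).head(n)
# Equivalent shortcut that skips missing values
top_nlargest = ct.nlargest(n, "Cumulative total")
print(top_nlargest)
```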

In [None]:
def draw_trace_bar(data, feature, title, xlab, ylab,color='Blue'):
    data = data.sort_values(feature, ascending=False)[0:50]
    trace = go.Bar(
            x = data['Country'],
            y = data[feature],
            marker=dict(color=color),
            text=data['Entity']
        )
    data = [trace]

    layout = dict(title = title,
              xaxis = dict(title = xlab, showticklabels=True, tickangle=45, 
                           zeroline=True, zerolinewidth=1, zerolinecolor='grey',
                           showline=True, linewidth=2, linecolor='black', mirror=True,
                          tickfont=dict(
                            size=8,
                            color='black'),), 
              yaxis = dict(title = ylab, gridcolor='lightgrey', zeroline=True, zerolinewidth=1, zerolinecolor='grey',
                          showline=True, linewidth=2, linecolor='black', mirror=True),
              plot_bgcolor = 'rgba(0, 0, 0, 0)', paper_bgcolor = 'rgba(0, 0, 0, 0)',
              hovermode = 'closest',
              barmode='group',
              height = 600
             )
    fig = dict(data = data, layout = layout)
    iplot(fig, filename='draw_trace')


In [None]:
draw_trace_bar(country_testing, 'Daily change in cumulative total', 'Daily change in cumulative total per country (max value, top 50)', 'Country', 'Daily change in cumulative total', "Darkgreen" )

In [None]:
draw_trace_bar(country_testing, 'Cumulative total', 'Cumulative total per country (max value, top 50)', 'Country', 'Cumulative total' )

In [None]:
draw_trace_bar(country_testing, 'Cumulative total per thousand', 'Cumulative total per thousand per country (max value, top 50)', 'Country', 'Cumulative total per thousand', "red" )

In [None]:
draw_trace_bar(country_testing, 'Daily change in cumulative total per thousand', 'Daily change in cumulative total per thousand per country (max value, top 50)', 'Country',\
               'Daily change in cumulative total per thousand', "magenta" )

In [None]:
draw_trace_bar(country_testing, '7-day smoothed daily change', '7-day smoothed daily change per country (max value, top 50)', 'Country',\
               '7-day smoothed daily change', "lightblue" )

In [None]:
draw_trace_bar(country_testing, '7-day smoothed daily change per thousand', '7-day smoothed daily change per thousand per country (max value, top 50)', 'Country',\
               '7-day smoothed daily change per thousand', "orange" )

In [None]:
draw_trace_bar(country_testing, 'Short-term positive rate', 'Short-term positive rate per country (max value, top 50)', 'Country',\
               'Short-term positive rate', "lightgreen" )

In [None]:
draw_trace_bar(country_testing, 'Short-term tests per case', 'Short-term tests per case per country (max value, top 50)', 'Country',\
               'Short-term tests per case', "darkgreen" )

**Observation**   

Some countries appear with two entries: these countries report testing in two different modes, and there is one entry per reporting mode.
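The countries concerned can be listed by counting the distinct modes per country. A minimal sketch on a toy table (invented rows):

```python
import pandas as pd

# Toy per-entity table; rows are invented for illustration
ct = pd.DataFrame({
    "Country": ["Italy", "Italy", "Japan", "Spain"],
    "Mode": ["tests performed", "people tested", "people tested", "tests performed"],
})

# Countries reporting in more than one mode
modes_per_country = ct.groupby("Country")["Mode"].nunique()
multi_mode = modes_per_country[modes_per_country > 1].index.tolist()
print(multi_mode)  # ['Italy']
```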




In [None]:
def plot_custom_scatter(df, x, y, size, color, hover_name, title):
    fig = px.scatter(df, x=x, y=y, size=size, color=color,
               hover_name=hover_name, size_max=80, title = title)
    fig.update_layout({'legend_orientation':'h'})
    fig.update_layout({'height': 800})
    fig.update_layout(legend=dict(yanchor="top", y=-0.2))
    fig.update_layout({'legend_title':'Testing country and modality'})
    fig.update_layout({'plot_bgcolor': 'rgba(0, 0, 0, 0)','paper_bgcolor': 'rgba(0, 0, 0, 0)'})
    fig.update_xaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
    fig.update_yaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
    fig.update_xaxes(zeroline=True, zerolinewidth=1, zerolinecolor='grey')
    fig.update_yaxes(zeroline=True, zerolinewidth=1, zerolinecolor='grey')
    fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='lightgrey')
    fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='lightgrey')
    fig.show()    

In [None]:
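# Keep only entities with all indicators available
# (this also removes countries that lack only some of the indicators)
country_testing = country_testing.dropna()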

In [None]:
plot_custom_scatter(country_testing, x="Cumulative total", y="Cumulative total per thousand", size="Cumulative total", color="Mode",
           hover_name="Country", title = "Testing cumulative total and total/THS, grouped per mode (max values)")

In [None]:
plot_custom_scatter(country_testing, x="Daily change in cumulative total", y="Daily change in cumulative total per thousand", 
                    size="Daily change in cumulative total", color="Mode",
           hover_name="Country", title = "Daily change in cumulative total vs daily change per thousand, grouped per mode (max values)")

In [None]:
plot_custom_scatter(country_testing, x="Cumulative total", y="Daily change in cumulative total", size="Cumulative total", color="Mode",
           hover_name="Country", title = "Testing cumulative total and daily change in cumulative total, grouped per mode (max values)")

In [None]:
plot_custom_scatter(country_testing, x="Cumulative total per thousand", y="Daily change in cumulative total per thousand", size="Cumulative total", color="Mode",
           hover_name="Country", title = "Testing cumulative total/THS and daily change in cumulative total/THS, per country, grouped per mode (max values)")

In [None]:
plot_custom_scatter(country_testing, x="7-day smoothed daily change", y="7-day smoothed daily change per thousand", size="Cumulative total", color="Mode",
           hover_name="Country", title = "7-day smoothed daily change (total and per THS), per country, grouped per mode (max values)")

In [None]:
plot_custom_scatter(country_testing, x="Daily change in cumulative total", y="Short-term tests per case", size="Cumulative total", color="Mode",
           hover_name="Country", title = "Short-term tests per case vs Daily change in cumulative total, per country, grouped per mode (max values)")

In [None]:
plot_custom_scatter(country_testing, x="Daily change in cumulative total", y="Short-term positive rate", size="Cumulative total", color="Mode",
           hover_name="Country", title = "Short-term positive rate vs Daily change in cumulative total, per country, grouped per mode (max values)")

<small><a href='#0'>Go to top</a></small>  


<a id="5"></a><h1 style='background:#2676DE; border:0; color:white'><center>Countries selection - how testing progressed</center></h1>

We will only show progress for a selection of countries.
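For each selected country, the plotting function takes that country's rows of the long-format table and draws one line. Equivalently, the data can be filtered with `isin` and reshaped to one column per country, as in this sketch on invented values (`selected` stands in for the `countries` list):

```python
import pandas as pd

# Toy long-format time series with invented values
ts = pd.DataFrame({
    "Country": ["Italy", "Italy", "Japan", "Japan", "Chile"],
    "Date": ["2020-05-01", "2020-05-02", "2020-05-01", "2020-05-02", "2020-05-01"],
    "Cumulative total": [100, 150, 40, 60, 10],
})
selected = ["Italy", "Japan"]  # hypothetical selection

# Keep only the selected countries, then spread them into one column per country
subset = ts[ts["Country"].isin(selected)]
wide = subset.pivot(index="Date", columns="Country", values="Cumulative total")
print(wide)
```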

In [None]:
country_testing_time = data_df[["Entity", "Country", "Mode", "Date", "ISO code", 'Daily change in cumulative total', 
                                                          'Cumulative total',
                                                           'Cumulative total per thousand',
                                                           'Daily change in cumulative total per thousand',
                                                           '7-day smoothed daily change',
                                                           '7-day smoothed daily change per thousand', 
                                                           'Short-term positive rate',
                                                           'Short-term tests per case']].dropna()

In [None]:
countries = ['Austria', 'Belgium','Cyprus', 'Czechia', 'Denmark', 'Estonia', 'Finland', 'France', 'Germany',
             'Greece', 'Israel', 'Italy', 'Japan', 'Malta','Netherlands', 'Norway','Poland', 'Portugal', 'Romania', 'Spain', 'Sweden',
             'United Kingdom', 'United States', 'China']

In [None]:
def plot_time_variation_countries_group(data_df, feature, title, countries):
    data = []
    for country in countries:
        df = data_df.loc[data_df.Country==country]
        trace = go.Scatter(
            x = df['Date'],y = df[feature],
            name=country,
            mode = "lines",
            marker_line_width = 1,
            marker_size = 8,
            marker_symbol = 'circle',
            text=df['Country'])
        data.append(trace)
    layout = dict(title = title,
          xaxis = dict(title = 'Date', showticklabels=True,zeroline=True, zerolinewidth=1, zerolinecolor='grey',
                       showline=True, linewidth=2, linecolor='black', mirror=True,
                       tickfont=dict(size=10,color='darkblue'),), 
          yaxis = dict(title = feature, gridcolor='lightgrey', zeroline=True, zerolinewidth=1, zerolinecolor='grey',
                       showline=True, linewidth=2, linecolor='black', mirror=True, type="log"),
                       plot_bgcolor = 'rgba(0, 0, 0, 0)', paper_bgcolor = 'rgba(0, 0, 0, 0)',
         hovermode = 'x', 
         height=600
         )
    fig = dict(data=data, layout=layout)
    iplot(fig, filename='all_countries')

In [None]:
plot_time_variation_countries_group(country_testing_time, 'Cumulative total', 'Total testing evolution (selected countries, log scale)', countries)

In [None]:
plot_time_variation_countries_group(country_testing_time, 'Daily change in cumulative total', 'Daily change in cumulative total testing evolution (selected countries, log scale)', countries)

In [None]:
plot_time_variation_countries_group(country_testing_time, 'Cumulative total per thousand', 'Cumulative total per thousand testing evolution (selected countries, log scale)', countries)

In [None]:
plot_time_variation_countries_group(country_testing_time, 'Daily change in cumulative total per thousand', 'Daily change in cumulative total per thousand testing evolution (selected countries, log scale)', countries)

In [None]:
plot_time_variation_countries_group(country_testing_time, '7-day smoothed daily change', '7-day smoothed daily change testing evolution (selected countries, log scale)', countries)

In [None]:
plot_time_variation_countries_group(country_testing_time, '7-day smoothed daily change per thousand', '7-day smoothed daily change per thousand testing evolution (selected countries, log scale)', countries)

<small><a href='#0'>Go to top</a></small>  


<a id="6"></a><h1 style='background:#2676DE; border:0; color:white'><center>What is in the notes?</center></h1>

Let's inspect, using WordCloud, the most frequent words that appear in Notes and Source label.
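A word cloud scales each word by its frequency after removing stopwords. The counting step can be sketched with the standard library; the notes and the stopword set below are invented, and WordCloud's own tokenization differs in details (for example, by default it also counts two-word collocations).

```python
import re
from collections import Counter

# Toy notes (invented text); None plays the role of a missing value
notes = [
    "Data refers to tests performed, not people tested.",
    "Tests performed by public laboratories only.",
    None,
]
stopwords = {"to", "by", "not", "only", "refers", "data"}  # hypothetical stopword set

words = []
for note in notes:
    if note is None:
        continue  # skip missing notes, similar to dropna()
    # Lowercase and keep alphabetic tokens that are not stopwords
    words.extend(w for w in re.findall(r"[a-z]+", note.lower()) if w not in stopwords)

counts = Counter(words)
print(counts.most_common(3))
```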

In [None]:
from wordcloud import WordCloud, STOPWORDS
def show_wordcloud(data, title=""):
    text = " ".join(t for t in data.dropna())
    stopwords = set(STOPWORDS)
    stopwords.update(["t", "co", "https", "amp", "U"])
    wordcloud = WordCloud(stopwords=stopwords, scale=4, max_font_size=50, max_words=500,background_color="black").generate(text)
    fig = plt.figure(1, figsize=(16,16))
    plt.axis('off')
    fig.suptitle(title, fontsize=20)
    fig.subplots_adjust(top=2.3)
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.show()

In [None]:
show_wordcloud(data_df['Notes'], title = 'Prevalent words in Notes')

In [None]:
show_wordcloud(data_df['Source label'], title = 'Prevalent words in Source label')

<small><a href='#0'>Go to top</a></small> 