# Positivity Rate of SARS-CoV-2 in Ukraine and Neighbouring Countries

In this notebook, I look at the positivity rate of SARS-CoV-2, i.e. the percentage of all tests performed that come back positive.
This value is informative because it rises when many tests come back positive or when too few tests are carried out. In either case, a high value suggests higher transmission and, most likely, more infected people in the community who have not been tested.
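Here is a small illustration with made-up numbers. The rate is 100 × positive tests / total tests, so it goes up both when more tests are positive and when fewer tests are done. The helper name `positivity_rate` is hypothetical.

```python
def positivity_rate(positive_tests: int, total_tests: int) -> float:
    """Percentage of tests that came back positive (hypothetical helper)."""
    if total_tests <= 0:
        raise ValueError("total_tests must be positive")
    return 100 * positive_tests / total_tests

# Same 50 positives: fewer tests -> higher rate
print(positivity_rate(50, 2000))   # 2.5
print(positivity_rate(50, 500))    # 10.0
# Same 2000 tests: more positives -> higher rate
print(positivity_rate(200, 2000))  # 10.0
```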

In May 2020, the World Health Organization recommended that the positivity rate remain below 5% for at least two weeks before governments consider reopening.
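The criterion can be checked with a short plain-Python sketch. The helper `below_threshold_for` is hypothetical. It assumes a list of daily positivity rates (in %), with one value per day, oldest first and no missing days, and it reads "two weeks" as the most recent 14 values.

```python
def below_threshold_for(daily_rates, threshold=5.0, days=14):
    """Return True if the last `days` daily rates (in %) are all below `threshold`.

    Assumes one value per calendar day, ordered oldest to newest, with no gaps.
    """
    if len(daily_rates) < days:
        return False  # not enough history to judge
    return all(rate < threshold for rate in daily_rates[-days:])

rates = [7.1, 6.4, 5.2] + [4.8] * 14   # invented values
print(below_threshold_for(rates))       # True: last 14 days all at 4.8%
print(below_threshold_for(rates[:-1]))  # False: window now includes 5.2%
```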

```python
# pandas and NumPy for data manipulation
import numpy as np
import pandas as pd

# Altair for data visualization
import altair as alt
```

```python
# Read data
tests_done = pd.read_csv("https://raw.githubusercontent.com/ritamds/msc-data-science-public/master/Data%20Visualization/Covid%20Viz/Data/covid-testing.csv")
confirmed_cases = pd.read_csv("https://raw.githubusercontent.com/ritamds/msc-data-science-public/master/Data%20Visualization/Covid%20Viz/Data/covid19_global_confirmed_cases.csv")
moldova_covid = pd.read_csv("https://raw.githubusercontent.com/ritamds/msc-data-science-public/master/Data%20Visualization/Covid%20Viz/Data/moldova_cases_tests.csv")
```

```python
# Countries of interest: Ukraine and its neighbours
countries = ['Belarus', 'Hungary', 'Moldova', 'Poland', 'Romania', 'Russia', 'Slovakia', 'Ukraine']

# Clean the confirmed-cases dataset
countries_conf_cases = confirmed_cases.loc[confirmed_cases['country_region'].isin(countries)]
countries_conf_cases = countries_conf_cases.rename(columns={'date': 'Date', 'country_region': 'Country', 'confirmed_cases': 'Confirmed Cases'})

# Clean the testing dataset: keep rows whose entity name starts with a country of interest
countries_tests = tests_done[tests_done['Entity'].str.split().str.get(0).isin(countries)]
countries_tests = countries_tests[countries_tests['Entity'] != 'Poland - samples tested']  # remove Poland's duplicate series
countries_tests = countries_tests.drop(columns=['ISO code', 'Source URL', 'Source label', 'Notes',
                                                'Daily change in cumulative total', 'Cumulative total per thousand',
                                                'Daily change in cumulative total per thousand', '7-day smoothed daily change',
                                                '7-day smoothed daily change per thousand', 'Short-term tests per case',
                                                'Short-term positive rate'])
moldova_tests = moldova_covid.drop(columns=['New tests', 'New Cases', 'New Recoveries', 'Total Recoveries',
                                            'New Deaths', 'Total Deaths', 'Total Cases', 'Active Cases'])

countries_tests['Country'] = countries_tests['Entity'].str.split().str.get(0)
countries_tests = countries_tests.drop(columns=['Entity'])
countries_tests = countries_tests.rename(columns={'Cumulative total': 'Total Tests'})

# Add Moldova's tests, which come from a separate file
countries_tests_comp = pd.concat([countries_tests, moldova_tests])

# Merge tests and confirmed cases on country and date
data = pd.merge(countries_tests_comp, countries_conf_cases, how='left', on=['Country', 'Date'])

# Cumulative positivity rate (%): cumulative confirmed cases / cumulative tests
data['Positivity Rate'] = (data['Confirmed Cases'] / data['Total Tests'] * 100).round(2)

# Keep 1 May to 31 July 2020, applying both bounds in a single mask
# (assumes ISO 'YYYY-MM-DD' date strings, which sort correctly as text)
in_period = (data['Date'] >= '2020-05-01') & (data['Date'] <= '2020-07-31')
filtered_data = data[in_period]
```
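The rate above divides cumulative confirmed cases by cumulative tests, so it reflects the epidemic to date rather than recent days. WHO's criterion refers to positivity over recent weeks. A per-day rate could be derived by differencing the cumulative columns within each country.

The sketch below uses invented numbers and the column names from the cleaning step. Real cumulative totals are sometimes revised, so daily differences can be noisy or even negative. In practice, a rolling average over several days may be preferable.

```python
import pandas as pd

# Toy cumulative series for one country (numbers are invented)
df = pd.DataFrame({
    "Country": ["Ukraine"] * 4,
    "Date": pd.to_datetime(["2020-05-01", "2020-05-02", "2020-05-03", "2020-05-04"]),
    "Confirmed Cases": [10000, 10400, 10700, 11300],
    "Total Tests": [200000, 208000, 218000, 226000],
})

df = df.sort_values(["Country", "Date"])
# New cases and new tests per day, computed separately within each country
daily = df.groupby("Country")[["Confirmed Cases", "Total Tests"]].diff()
df["Daily Positivity Rate"] = (daily["Confirmed Cases"] / daily["Total Tests"] * 100).round(2)
print(df[["Date", "Daily Positivity Rate"]])  # first day is NaN (no previous day)
```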
```python
# Line chart: one line per country
chart = alt.Chart(filtered_data.dropna()).mark_line(opacity=0.8).encode(
    x=alt.X('Date:T'),
    y=alt.Y('Positivity Rate:Q', axis=alt.Axis(title='Positivity Rate (%)')),
    color=alt.Color('Country:N'),
    tooltip=['Country:N', 'Positivity Rate:Q', 'Date:T'],
).properties(
    title='Positivity Rate of SARS-CoV-2 between 01 May and 31 July 2020',
    width=800,
    height=600,
)

# Dashed horizontal rule at the WHO-recommended 5% threshold
line = alt.Chart(pd.DataFrame({'y': [5]})).mark_rule(
    strokeDash=[3, 3],
    color='black',
).encode(
    y='y:Q',
    tooltip=[alt.Tooltip('y:Q', title='Threshold (%)')],
)

# Layer the threshold rule on top of the country lines
chart + line
```

#### Plot Explanation

I chose a line plot of the positivity rate over time because it is a familiar chart type, which makes it easy to read. It also shows clearly how the value changed as time passed.

I chose this period because of limitations in the available data. Countries differ in when and how they report tests, and this was the period I found to be common to all of them.
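The common period can also be found programmatically. It starts on the latest first-reported date and ends on the earliest last-reported date across countries.

The sketch below uses an invented toy frame with the `Country` and `Date` column names from the cleaning step. It only checks the endpoints, so gaps inside a country's series would need a separate check.

```python
import pandas as pd

# Toy stand-in: each country reports over a different date range (invented)
data = pd.DataFrame({
    "Country": ["Moldova", "Moldova", "Poland", "Poland", "Russia", "Russia"],
    "Date": pd.to_datetime(["2020-04-10", "2020-08-15",
                            "2020-05-04", "2020-09-30",
                            "2020-03-20", "2020-07-28"]),
})

# First and last reported date per country
spans = data.groupby("Country")["Date"].agg(["min", "max"])

# Shared window: latest start date to earliest end date
common_start = spans["min"].max()
common_end = spans["max"].min()
print(common_start.date(), common_end.date())  # 2020-05-04 2020-07-28
```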

My original goal was to make this visualization for every country in Europe. That produced too many lines, and it was hard to tell which colour corresponded to which country. Most lines were also concentrated in the 2-3% range, so they were hard to tell apart. I therefore narrowed the selection to a group of countries that made logical sense (Ukraine and its neighbours) and was small enough to read easily.

I also added a tooltip so that the exact values behind each point can be inspected.

This visualization makes it possible to:
* See how each country's positivity rate evolved over time
* Compare positivity rates between countries
* Compare each country's positivity rate with the 5% threshold recommended by the WHO
* Look up the details of any point on a line
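The chart's threshold line can be complemented with a count. The sketch below counts, per country, the reported days on which the rate was at or above 5%, which fails the WHO "below 5%" recommendation. The toy frame stands in for `filtered_data`, and its values are invented.

```python
import pandas as pd

# Toy stand-in for `filtered_data` (values invented)
filtered_data = pd.DataFrame({
    "Country": ["Belarus", "Belarus", "Belarus", "Hungary", "Hungary", "Hungary"],
    "Positivity Rate": [5.4, 5.1, 4.9, 1.2, 1.1, 1.0],
})

THRESHOLD = 5.0  # WHO-recommended ceiling, in percent

# Number of reported days per country at or above the threshold
days_over = (
    filtered_data.assign(over=filtered_data["Positivity Rate"] >= THRESHOLD)
    .groupby("Country")["over"]
    .sum()
)
print(days_over)
```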

#### Data Sources
* https://github.com/CSSEGISandData/COVID-19
* https://github.com/owid/covid-19-data/tree/master/public/data
* https://en.wikipedia.org/wiki/COVID-19_pandemic_in_Moldova