# COVID-19 Computational Essay

## Introduction

In late 2019, reports of patients being admitted to hospital with pneumonia-like symptoms, secondary to respiratory infection, began to increase in the Wuhan region of mainland China (Harapan, 2020). The novel pathogen responsible has been named SARS-CoV-2. SARS stands for "severe acute respiratory syndrome", and CoV-2 reflects that it is the second coronavirus known to cause SARS in humans. The disease spread rapidly within the region. This prompted the Chinese government to impose a regional lockdown that banned travel to and from Wuhan, shut down local transport and confined residents to their homes (Royal Society, 2021). However, travel in and out of China soon carried the disease to many other countries. By late March 2020, Italy had overtaken China as the country with the most reported deaths from COVID-19 (BBC, 2020). As the pandemic has progressed, countries have differed widely in the measures they have employed in response to the disease, and in how effective those measures have been at reducing infections and deaths. This essay looks at three topics related to the COVID-19 pandemic, using freely available data and tools in Python. The questions it covers are:

* How do cases and deaths due to COVID-19 vary between countries?
* What features of a country explain and predict variability in COVID-19 death rates?
* How has the spread of COVID-19 in Scotland been affected by the government's implementation and easing of restrictions?


## COVID-19 infections and deaths across different countries

Data on global infections and deaths are collected and published daily by the World Health Organisation, and are freely available online at https://covid19.who.int/table. The table contains cumulative cases (total and per 100,000 population) as well as deaths. For reproducibility, the data published on the 28th of April 2021 were exported to CSV and stored in a GitHub repository, so they can be accessed without storing the data locally. First, the data were read into a pandas DataFrame. The summary of the raw dataset and its first few rows are shown below.

In [660]:
import pandas as pd
import plotly.express as px

url = "https://raw.githubusercontent.com/michaelgent2/Covid-Data-Analysis-CS5703/main/WHO%20COVID-19%20global%20table%20data%20April%2029th%202021%20at%201.47.44%20PM.csv"

who_data = pd.read_csv(url)

who_data.info()

who_data.head(3)



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 238 entries, 0 to 237
Data columns (total 13 columns):
 #   Column                                                        Non-Null Count  Dtype  
---  ------                                                        --------------  -----  
 0   Name                                                          238 non-null    object 
 1   WHO Region                                                    237 non-null    object 
 2   Cases - cumulative total                                      238 non-null    int64  
 3   Cases - cumulative total per 100000 population                237 non-null    float64
 4   Cases - newly reported in last 7 days                         238 non-null    int64  
 5   Cases - newly reported in last 7 days per 100000 population   237 non-null    float64
 6   Cases - newly reported in last 24 hours                       238 non-null    int64  
 7   Deaths - cumulative total                                     238 non-n

| | Name | WHO Region | Cases - cumulative total | Cases - cumulative total per 100000 population | Cases - newly reported in last 7 days | Cases - newly reported in last 7 days per 100000 population | Cases - newly reported in last 24 hours | Deaths - cumulative total | Deaths - cumulative total per 100000 population | Deaths - newly reported in last 7 days | Deaths - newly reported in last 7 days per 100000 population | Deaths - newly reported in last 24 hours | Transmission Classification |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Global | | 148999876 | 1908.710561 | 5793002 | 74.209217 | 856330 | 3140115 | 40.22534 | 92170 | 1.180711 | 15185 | |
| 1 | United States of America | Americas | 31835314 | 9617.84 | 367742 | 111.1 | 51939 | 567971 | 171.59 | 4747 | 1.43 | 644 | Community transmission |
| 2 | India | South-East Asia | 18376524 | 1331.63 | 2445559 | 177.21 | 379257 | 204832 | 14.84 | 20175 | 1.46 | 3645 | Clusters of cases |


The columns of interest for the geospatial analysis of COVID-19 infections and deaths were: Name, WHO Region, cases (cumulative total and total per 100,000) and deaths (cumulative total and total per 100,000). The first row, which holds the global totals rather than data for a single country, was removed. All columns not needed for the analysis were also removed, and the retained columns were renamed.

In [661]:
who_data = who_data.iloc[1:,[0,1,2,3,7,8]].rename(columns={"Name": "Country",
                                        "Cases - cumulative total": "Total Cases",
                                        "Cases - cumulative total per 100000 population": "Cases per 100,000",
                                        "Deaths - cumulative total": "Total Deaths",
                                        "Deaths - cumulative total per 100000 population": "Deaths per 100,000"})
who_data.head(3)

| | Country | WHO Region | Total Cases | Cases per 100,000 | Total Deaths | Deaths per 100,000 |
|---|---|---|---|---|---|---|
| 1 | United States of America | Americas | 31835314 | 9617.84 | 567971 | 171.59 |
| 2 | India | South-East Asia | 18376524 | 1331.63 | 204832 | 14.84 |
| 3 | Brazil | Americas | 14441563 | 6794.13 | 395022 | 185.84 |


To visualise the spread of COVID-19 across countries, the plotly express module was used. However, to plot geospatial data using plotly, country names need to match their official ISO-listed names. Several of the countries in the WHO dataset are not listed this way. To remedy this, a dictionary mapping the WHO names to the expected names was created and applied to the dataset.

In [662]:
Replace_names = {"United States of America": "United States",
             "The United Kingdom": "United Kingdom",
             "Iran (Islamic Republic of)": "Iran, Islamic Republic of",
             "occupied Palestinian territory, including east Jerusalem": "Palestine, State of",
             "Bolivia (Plurinational State of)": "Bolivia, Plurinational State of",
             "Republic of Moldova": "Moldova, Republic of",
             "Venezuela (Bolivarian Republic of)": "Venezuela, Bolivarian Republic of",
             "Republic of Korea": "Korea, Republic of",
             "Côte d’Ivoire": "Côte d'Ivoire",
             'Democratic Republic of the Congo': "Congo, The Democratic Republic of the",
             "United States Virgin Islands":"Virgin Islands, U.S.",
             "Sint Maarten": "Sint Maarten (Dutch part)",
             "Bonaire": "Bonaire, Sint Eustatius and Saba",
             "United Republic of Tanzania": "Tanzania, United Republic of",
             "British Virgin Islands": "Virgin Islands, British",
             "Northern Mariana Islands (Commonwealth of the)": "Northern Mariana Islands",
             "Saint Martin": "Saint Martin (French part)",
             "Holy See": "Holy See (Vatican City State)",
             "Democratic People's Republic of Korea": "Korea, Democratic People's Republic of",
             "Micronesia (Federated States of)": "Micronesia, Federated States of",
             "Pitcairn Islands": "Pitcairn",
             "Saint Helena": "Saint Helena, Ascension and Tristan da Cunha"}

who_data = who_data.replace(Replace_names)

Using plotly, an interactive globe (orthographic projection) was plotted. The size of each country's marker corresponds to the number of cases it has reported, and the markers are colour-coded by WHO region. The globe can be rotated and zoomed. Hovering the mouse over a country displays a pop-up with its details, including its name and number of cases.

In [663]:
globe = px.scatter_geo(who_data.dropna(), locations="Country",
                       hover_name="Country", size="Total Cases", projection="orthographic",
                       color="WHO Region", title="Global Covid-19 Cases", locationmode="country names")
globe.show()

The plot shows a notable disparity in the number of COVID-19 cases reported by individual countries. The markers for a few countries, namely India, the United States and Brazil, are particularly large. Meanwhile, a number of European countries have reported similar numbers of cases to one another, relative to the rest of the world. Countries in Europe and the Americas appear to report more cases than other regions, suggesting that these regions have been less successful than others in controlling the spread of the virus. However, the population size of each country should be considered, as countries with larger populations are inevitably more likely to report higher numbers of cases. Population size can be adjusted for by reporting cases per 100,000 population, a common denominator in epidemiology. The cases per 100,000 population are plotted below.
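As a minimal sketch of the per-100,000 normalisation described above, the helper below divides a count by the population and scales the result to 100,000. The function name `per_100k` is illustrative. The populations used are the ones implied by the WHO table (total cases divided by cases per 100,000), so the rates should reproduce the WHO figures for the United States and India to two decimal places.

```python
def per_100k(count: float, population: float) -> float:
    """Express a raw count as a rate per 100,000 population."""
    return count / population * 100_000


# Cumulative cases from the WHO table, with the populations implied by that table
us_rate = per_100k(31_835_314, 331_002_741)
india_rate = per_100k(18_376_524, 1_380_002_253)

print(f"United States: {us_rate:.2f} cases per 100,000")
print(f"India:         {india_rate:.2f} cases per 100,000")
```

Although India reports well over half as many total cases as the United States, its rate per 100,000 is roughly a seventh of the US rate.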

In [664]:
globe2 = px.scatter_geo(who_data.dropna(), locations="Country",
                        hover_name="Country", size="Cases per 100,000", projection="orthographic",
                        color="WHO Region", title="Global COVID-19 cases (per 100,000 population)", 
                        locationmode="country names")
globe2.show()

This plot shows clearer trends across the WHO regions. Many European countries report more cases per 100,000 population than countries in the Americas and South-East Asia. Countries in the Eastern Mediterranean have reported low total numbers of cases. Even so, they have performed similarly to the Americas in terms of cases per 100,000 population. India reports the second highest number of total cases. Once population size is adjusted for, however, it has performed similarly to some other South-East Asian countries and has kept its cases per 100,000 much lower than most European countries. The regions with the fewest cases per 100,000 are the Western Pacific and Africa.

As with any potentially deadly disease, the number of deaths attributable to COVID-19 increases with the number of cases. However, case numbers alone cannot fully explain the number of deaths. Death rates can also be affected by population demographics and environmental factors, as well as by the quality of, and access to, healthcare. The total number of deaths for each country and the deaths per 100,000 are visualised below using choropleth maps.


In [665]:
map1 = px.choropleth(who_data, color="Total Deaths",
                     hover_name='Country', locations="Country", color_continuous_scale="Reds",
                     title="Total Deaths", locationmode="country names")

map2 = px.choropleth(who_data, color="Deaths per 100,000",
                     hover_name='Country', locations="Country", color_continuous_scale="Blues",
                     title="Deaths per 100,000", locationmode="country names")

map1.show()
map2.show()

For the most part, countries that reported more total deaths also reported more deaths per 100,000, although there are some exceptions. India is close to the middle of the colour scale for total reported deaths, yet it reported very few deaths per 100,000 population. Conversely, countries such as Belarus and Hungary reported much higher death rates per 100,000 population, despite reporting few total deaths relative to other countries. Plotting cases per 100,000 against deaths per 100,000 shows the extent to which cases translate into deaths in different countries. The interactive scatter plot below includes an ordinary least squares (OLS) regression line. Hovering the mouse over a data point displays additional information for that country.

In [666]:
scatter = px.scatter(who_data, x="Cases per 100,000", y="Deaths per 100,000",
                     trendline="ols", hover_name="Country",
                     title="Case vs Death Rate of COVID-19")

scatter.show()

Countries above the regression line have a higher death rate than would be expected for the number of cases per 100,000 they have recorded, and the opposite is true for countries below the line. The countries with the highest death rates relative to their cases per 100,000 include Mexico, Bulgaria, Bosnia and Herzegovina, and Hungary. This likely reflects the underlying health of these countries' populations, as well as socioeconomic deprivation limiting access to high-quality healthcare. Mexico and Hungary have some of the highest rates of obesity in the world. Bosnia and Herzegovina is still improving its healthcare system after the war, and Bulgaria operates a system that mixes public and private healthcare. Countries with the lowest death rates relative to their cases per 100,000 include Andorra, Bahrain, Saint Barthélemy and Qatar. Each of these countries has a small population and generally a high income (GDP) per capita. Their populations are therefore more likely to have access to a high standard of healthcare.
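Whether a country sits above or below the fitted line can be read off its residual: the observed deaths per 100,000 minus the value the line predicts. The sketch below fits a straight line by ordinary least squares with `numpy.polyfit`, using five countries taken from the WHO table. It uses only five points, so its line (and therefore which side each country falls on) will not match the line plotly fits to the full dataset. It only illustrates the mechanics.

```python
import numpy as np

# Cases and deaths per 100,000 for a handful of countries from the WHO table
countries = ["United States", "India", "Brazil", "Iceland", "Nicaragua"]
cases = np.array([9617.84, 1331.63, 6794.13, 1770.5, 82.99])
deaths = np.array([171.59, 14.84, 185.84, 7.96, 2.75])

# Fit deaths = slope * cases + intercept by least squares (degree-1 polynomial)
slope, intercept = np.polyfit(cases, deaths, deg=1)

# A positive residual means more deaths than the line predicts for that case rate
residuals = deaths - (slope * cases + intercept)
for name, r in zip(countries, residuals):
    position = "above" if r > 0 else "below"
    print(f"{name:15s} residual {r:8.2f} ({position} the line)")
```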

Overall, these analyses show that when population size is adjusted for, clear regional trends appear in the number of COVID-19 cases. The number of cases does contribute to the death rate. However, the prevalence of COVID-19 alone does not fully explain the variability in death rates across countries. Other factors, such as the underlying health of populations and socioeconomic status, are likely to contribute.

## Linear regression to explain deaths per 100,000 population

The previous analysis suggested that factors outside the scope of the WHO COVID-19 dataset likely explain some of the variability in deaths per 100,000 across countries. However, a large amount of that variability might still be explained by combining the variables within the dataset in a multiple regression model. Such a model could then be adapted to make predictions from values of the independent variables. The countries previously highlighted as having a low death rate relative to cases generally had small populations, so population size may partially explain deaths per 100,000. To visualise the effect of population size on death rate in relation to cases, each country's population was back-calculated from its total cases and cases per 100,000. The same scatter plot as above was then drawn, with marker sizes proportional to population.


In [667]:
who_data['Population']= who_data['Total Cases']*(100000/who_data['Cases per 100,000'])
who_data['Population'] = who_data['Population'].fillna(0).round(0).astype('int')

scatter = px.scatter(who_data, x="Cases per 100,000", y="Deaths per 100,000", 
                     trendline="ols", hover_name="Country",
                     size="Population", title="Case vs Death Rate of COVID-19")

scatter.show()

Because marker sizes are proportional to population, many markers are small and difficult to see. However, more countries with large populations lie above the regression line than below it, suggesting that population size may be associated with increased deaths per 100,000. To investigate the effects and significance of the other variables in the dataset, a multiple linear regression model was fitted to deaths per 100,000 using ordinary least squares. WHO region, however, is a categorical variable with string values. To include it in the model, it was encoded by generating a dummy (indicator) variable for each region.

In [668]:
regions = pd.get_dummies(who_data['WHO Region'], dtype=int)  # 0/1 integers; recent pandas versions return booleans by default
data_encoded = who_data.join(regions)
data_encoded = data_encoded.dropna()
data_encoded = data_encoded.drop("Other", axis=1)
data_encoded.head(3)

| | Country | WHO Region | Total Cases | Cases per 100,000 | Total Deaths | Deaths per 100,000 | Population | Africa | Americas | Eastern Mediterranean | Europe | South-East Asia | Western Pacific |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | United States | Americas | 31835314 | 9617.84 | 567971 | 171.59 | 331002741 | 0 | 1 | 0 | 0 | 0 | 0 |
| 2 | India | South-East Asia | 18376524 | 1331.63 | 204832 | 14.84 | 1380002253 | 0 | 0 | 0 | 0 | 1 | 0 |
| 3 | Brazil | Americas | 14441563 | 6794.13 | 395022 | 185.84 | 212559415 | 0 | 1 | 0 | 0 | 0 | 0 |


A multiple regression model was then fitted with the statsmodels.api module, including all variables, to assess their association with deaths per 100,000 and their significance. The model summary can be viewed below. It includes the model coefficients, p-values, R-squared and the sum of squared residuals.

In [669]:
import statsmodels.api as sm

x = data_encoded.iloc[:,[2,3,4,6,7,8,9,10,11,12]]  # cases, deaths, population and the six region dummies
y = data_encoded['Deaths per 100,000']

model = sm.OLS(y,x).fit()
print(model.summary(), 
      "\n===============================================================================", 
      "\nSUM OF SQUARED RESIDUALS = ", model.ssr)

                            OLS Regression Results                            
Dep. Variable:     Deaths per 100,000   R-squared:                       0.764
Model:                            OLS   Adj. R-squared:                  0.755
Method:                 Least Squares   F-statistic:                     81.28
Date:                Thu, 13 May 2021   Prob (F-statistic):           7.72e-66
Time:                        20:59:45   Log-Likelihood:                -1153.6
No. Observations:                 236   AIC:                             2327.
Df Residuals:                     226   BIC:                             2362.
Df Model:                           9                                         
Covariance Type:            nonrobust                                         
                            coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------------
Total Cases           -1.428e-

Total cases, cases per 100,000 and total deaths were all statistically significant in the model, so the null hypothesis that each of these coefficients equals zero was rejected. The coefficient for total cases was negative, while the coefficients for cases per 100,000 and total deaths were both positive. The coefficient for population size was positive but not statistically significant, so the null hypothesis that it equals zero was not rejected. Population is, in any case, largely redundant here. It was computed from total cases and cases per 100,000, both of which are already included in the model.

Being in Europe had a positive, statistically significant coefficient. The coefficients for Africa and the Americas were also positive but not significant, and those for the remaining regions were negative and not significant. The R-squared was 0.764 (adjusted R-squared 0.755), meaning that the model explains roughly 76% of the variability in deaths per 100,000.

When fitting a regression model, it is generally good practice to include a constant, because omitting it forces the intercept to be zero. Here, a zero intercept appears justified. When every independent variable is zero (no cases, no deaths and so on), deaths per 100,000 cannot take any value other than zero. For this reason, no explicit constant was added. However, every country in the fitted data belongs to exactly one of the six regions. The region dummies therefore sum to one in every row and together act as an implicit intercept, one per region. This is why the summary above reports an ordinary (centered) R-squared and 9 model degrees of freedom for 10 regressors. To improve the goodness of fit, a second regression model was then fitted without the non-significant terms.
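A full set of dummy variables sums to one in every row when each row belongs to exactly one category. The dummies then act as an implicit intercept, even if no constant is added. This can be checked with a rank test: if a column of ones already lies in the space spanned by the regressors, adding an explicit constant does not increase the rank of the design matrix. The sketch below applies this check to hypothetical toy data (the region labels and values are illustrative only). statsmodels looks for implicit constants of this kind when fitting a model, but the function here is not its actual code.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: each row belongs to exactly one region
regions = pd.Series(["Europe", "Africa", "Americas", "Europe", "Africa", "Americas"])
dummies = pd.get_dummies(regions, dtype=int)  # columns: Africa, Americas, Europe
cases_per_100k = np.array([6190.29, 310.4, 82.99, 1770.5, 120.0, 7315.32])


def has_implicit_constant(exog: np.ndarray) -> bool:
    """True if a column of ones already lies in the column space of exog."""
    with_const = np.column_stack([np.ones(exog.shape[0]), exog])
    return bool(np.linalg.matrix_rank(with_const) == np.linalg.matrix_rank(exog))


# All region dummies (like the first model) versus the Europe dummy alone
all_regions = np.column_stack([cases_per_100k, dummies.to_numpy()])
europe_only = np.column_stack([cases_per_100k, dummies["Europe"].to_numpy()])

print("dummies sum to one in every row:", bool((dummies.sum(axis=1) == 1).all()))
print("all region dummies -> implicit constant:", has_implicit_constant(all_regions))
print("Europe dummy only  -> implicit constant:", has_implicit_constant(europe_only))
```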

In [670]:
x = data_encoded.iloc[:,[2,3,4,10]]
y = data_encoded['Deaths per 100,000']

model2 = sm.OLS(y,x).fit()
print(model2.summary(), 
      "\n===============================================================================", 
      "\nSUM OF SQUARED RESIDUALS = ", model2.ssr)

                                 OLS Regression Results                                
Dep. Variable:     Deaths per 100,000   R-squared (uncentered):                   0.846
Model:                            OLS   Adj. R-squared (uncentered):              0.843
Method:                 Least Squares   F-statistic:                              318.1
Date:                Thu, 13 May 2021   Prob (F-statistic):                    6.62e-93
Time:                        20:59:45   Log-Likelihood:                         -1155.1
No. Observations:                 236   AIC:                                      2318.
Df Residuals:                     232   BIC:                                      2332.
Df Model:                           4                                                  
Covariance Type:            nonrobust                                                  
                        coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------

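The second model reports an adjusted R-squared of 0.843, which looks higher than the first model's 0.755. The two figures are not directly comparable, however. With only the Europe dummy left, the second model has no implicit intercept, so statsmodels reports an uncentered R-squared. This measures variation around zero rather than around the mean, and it is never lower than the centered value for the same residuals. The sum of squared residuals is directly comparable between the two models, and it is slightly higher for the second model than for the first. The second model may therefore be no better, or even a slightly worse fit, than the first, although it uses far fewer terms. Part of the variability in deaths per 100,000 is still not explained by total cases, cases per 100,000, total deaths and whether a country is in Europe.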
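To make the difference between the two kinds of R-squared concrete, the sketch below computes both from the same residuals. The centered version compares the sum of squared residuals with the variation of y around its mean. The uncentered version, which statsmodels reports for models without a constant, compares it with the variation around zero. The observed values are the five sampled countries' deaths per 100,000; the fitted values are hypothetical. The sum of y squared always equals the sum of squared deviations from the mean plus n times the squared mean, so the uncentered figure can never be the smaller of the two.

```python
import numpy as np


def r_squared(y: np.ndarray, fitted: np.ndarray, centered: bool = True) -> float:
    """R-squared from observed and fitted values.

    centered=True:  1 - SSR / sum((y - mean(y))**2)
    centered=False: 1 - SSR / sum(y**2), as reported for models without a constant
    """
    ssr = np.sum((y - fitted) ** 2)
    baseline = y - y.mean() if centered else y
    return float(1.0 - ssr / np.sum(baseline ** 2))


# Observed deaths per 100,000 for the five sampled countries,
# paired with hypothetical fitted values for illustration
y = np.array([142.84, 7.96, 71.72, 2.75, 0.0])
fitted = np.array([120.0, 25.0, 80.0, 5.0, 3.0])

print(f"centered R-squared:   {r_squared(y, fitted):.3f}")
print(f"uncentered R-squared: {r_squared(y, fitted, centered=False):.3f}")
```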

The resulting model, however, is not particularly useful for making predictions, because its terms are sufficient to calculate deaths per 100,000 without any regression. A country's population can be calculated from total cases and cases per 100,000. Deaths per 100,000 then follows from total deaths. A predictive model would be more useful if one or more of these terms were unavailable. To illustrate this, a third model was fitted using only total cases, cases per 100,000 and the Europe dummy.
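The point that deaths per 100,000 follows directly from the other three terms can be checked with a little arithmetic. Population = total cases × 100,000 / cases per 100,000. It follows that deaths per 100,000 = total deaths × cases per 100,000 / total cases. The function below is a hypothetical helper, not part of the essay's code. It applies this formula to rows from the five-country sample shown later in this section and recovers the WHO values to within rounding.

```python
def deaths_per_100k(total_deaths: float, total_cases: float, cases_per_100k: float) -> float:
    """Recover deaths per 100,000 without any regression.

    population           = total_cases * 100,000 / cases_per_100k
    deaths per 100,000   = total_deaths * 100,000 / population
                         = total_deaths * cases_per_100k / total_cases
    """
    return total_deaths * cases_per_100k / total_cases


# (total deaths, total cases, cases per 100,000) from the sampled rows
rows = {
    "Moldova": (5762, 249714, 6190.29),
    "Iceland": (29, 6447, 1770.5),
    "Bonaire, Sint Eustatius and Saba": (15, 1530, 7315.32),
    "Nicaragua": (182, 5498, 82.99),
    "Saba": (0, 6, 310.4),
}
for name, (d, c, r) in rows.items():
    print(f"{name:33s} {deaths_per_100k(d, c, r):7.2f}")
```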

In [671]:
x = data_encoded.iloc[:,[2,3,10]]
y = data_encoded['Deaths per 100,000']

model3 = sm.OLS(y,x).fit()
print(model3.summary(), 
      "\n===============================================================================", 
      "\nSUM OF SQUARED RESIDUALS = ", model3.ssr)

                                 OLS Regression Results                                
Dep. Variable:     Deaths per 100,000   R-squared (uncentered):                   0.807
Model:                            OLS   Adj. R-squared (uncentered):              0.805
Method:                 Least Squares   F-statistic:                              325.7
Date:                Thu, 13 May 2021   Prob (F-statistic):                    4.92e-83
Time:                        20:59:45   Log-Likelihood:                         -1181.3
No. Observations:                 236   AIC:                                      2369.
Df Residuals:                     233   BIC:                                      2379.
Df Model:                           3                                                  
Covariance Type:            nonrobust                                                  
                        coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------

This model has a higher sum of squared residuals than the previous models, and a lower R-squared than the second model. This was expected, because total deaths is not included in the model. The purpose of this model was prediction rather than explanation. Specifically, it is designed to predict death rates for countries where only total cases and cases per 100,000 are available, along with whether the country is in Europe. Below is an example of how the model could be used to predict deaths per 100,000.

In [672]:
# The indices were selected at random, using: data_encoded.sample(5)
indices = [62, 160, 192, 164, 220]
data_encoded.loc[indices]

| | Country | WHO Region | Total Cases | Cases per 100,000 | Total Deaths | Deaths per 100,000 | Population | Africa | Americas | Eastern Mediterranean | Europe | South-East Asia | Western Pacific |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 62 | Moldova, Republic of | Europe | 249714 | 6190.29 | 5762 | 142.84 | 4033963 | 0 | 0 | 0 | 1 | 0 | 0 |
| 160 | Iceland | Europe | 6447 | 1770.5 | 29 | 7.96 | 364134 | 0 | 0 | 0 | 1 | 0 | 0 |
| 192 | Bonaire, Sint Eustatius and Saba | Americas | 1530 | 7315.32 | 15 | 71.72 | 20915 | 0 | 1 | 0 | 0 | 0 | 0 |
| 164 | Nicaragua | Americas | 5498 | 82.99 | 182 | 2.75 | 6624895 | 0 | 1 | 0 | 0 | 0 | 0 |
| 220 | Saba | Americas | 6 | 310.4 | 0 | 0.0 | 1933 | 0 | 1 | 0 | 0 | 0 | 0 |


In [673]:
moldova = model3.predict([249714, 6190.29, 1])
iceland = model3.predict([6447, 29, 1])  # NB: 29 is Iceland's total deaths; its cases per 100,000 is 1770.5
bonaire = model3.predict([1530, 7315.32, 0])
nicaragua = model3.predict([5498, 82.99, 0])
saba = model3.predict([6, 0, 0])  # NB: Saba's cases per 100,000 is 310.4, not 0

moldova_true = 142.84
iceland_true = 7.96
bonaire_true = 71.72
nicaragua_true = 2.75
saba_true = 0

print("PREDICTION RESULTS\n",
      "\nMoldova predicted:  ", moldova[0].round(2), "|",
      "Moldova true:  ", moldova_true,
      "\nIceland predicted:  ", iceland[0].round(2), " |",
      "Iceland true:  ", iceland_true,
      "\nBonaire predicted:  ", bonaire[0].round(2), " |",
      "Bonaire true:  ", bonaire_true,
      "\nNicaragua predicted:", nicaragua[0].round(2), "  |",
      "Nicaragua true:", nicaragua_true,
      "\nSaba predicted:     ", saba[0].round(2), "   |",
      "Saba true:     ", saba_true)


PREDICTION RESULTS
 
Moldova predicted:   111.19 | Moldova true:   142.84 
Iceland predicted:   30.87  | Iceland true:   7.96 
Bonaire predicted:   94.52  | Bonaire true:   71.72 
Nicaragua predicted: 1.09   | Nicaragua true: 2.75 
Saba predicted:      0.0    | Saba true:      0


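Predictions were made for a random sample of five countries from the dataset. A few of the predictions were reasonably close to the true values. However, there are obvious issues with the model's goodness of fit and, in turn, with its usefulness for predicting death rates. Two of the inputs above were also entered incorrectly. Iceland's total deaths (29) was passed in place of its cases per 100,000 (1770.5), and Saba's cases per 100,000 was passed as 0 rather than 310.4. The Iceland and Saba predictions therefore do not reflect how the model performs on those countries' actual data. A predictive model would likely benefit from additional data on the health and socioeconomic status of each country.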
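Entering model inputs by hand makes transcription slips easy. A safer pattern is to select the inputs from the DataFrame by index label and column name. The sketch below assumes a no-constant OLS model on the same three columns as the third model. It is fitted with `numpy.linalg.lstsq` on a handful of rows from the tables above, rather than with statsmodels, purely to stay self-contained. With the essay's own objects, passing `data_encoded.loc[indices, ["Total Cases", "Cases per 100,000", "Europe"]]` to `model3.predict` should follow the same pattern.

```python
import numpy as np
import pandas as pd

# A few rows from the tables above, indexed by their original row labels
train = pd.DataFrame({
    "Total Cases": [249714, 6447, 1530, 5498, 31835314, 18376524],
    "Cases per 100,000": [6190.29, 1770.5, 7315.32, 82.99, 9617.84, 1331.63],
    "Europe": [1, 1, 0, 0, 0, 0],
    "Deaths per 100,000": [142.84, 7.96, 71.72, 2.75, 171.59, 14.84],
}, index=[62, 160, 192, 164, 1, 2])

features = ["Total Cases", "Cases per 100,000", "Europe"]

# Least squares with no constant column, i.e. the same specification as sm.OLS(y, x)
coef, *_ = np.linalg.lstsq(train[features].to_numpy(dtype=float),
                           train["Deaths per 100,000"].to_numpy(), rcond=None)

# Pull the inputs for specific countries by label instead of retyping them
new_rows = train.loc[[62, 160], features]
predicted = new_rows.to_numpy(dtype=float) @ coef
print(pd.Series(predicted.round(2), index=new_rows.index))
```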

## Restrictive measures to control the spread of COVID-19

The variability in cases and deaths reported by different countries has been discussed above. However, how effectively countries have used restrictions to reduce the spread of the virus is a major factor in how badly they have been affected by the pandemic. The analysis in the first section showed that countries in Europe have generally reported more cases per 100,000 than countries in other regions. Factors such as population size and density may play a role in this. The restrictive measures used in response to the virus have also varied from region to region and from country to country. The devolved United Kingdom is a good example of how effective social restrictions can be, but also of how lifting restrictions can undo the progress they have made.

On the 20th of March 2020, the Scottish government issued an order to close pubs and restaurants. Three days later, the entire UK was in lockdown. Since then, the country has seen a lifting of measures, the implementation of a regional tier system, a second wave which spurred a second full lockdown and subsequent easing of measures again. Cumulative case data is available for each of the NHS Scotland health boards at https://www.gov.scot/publications/coronavirus-covid-19-trends-in-daily-data/. This data was used to plot time series figures, analysing the effect of key events and measures on the rate of infection in Scotland.


In [674]:
url2 = "https://raw.githubusercontent.com/michaelgent2/Covid-Data-Analysis-CS5703/main/COVID-19%2Bdaily%2Bdata%2B-%2Bby%2BNHS%2BBoard%2B-%2B28%2BApril%2B2021.csv"
scot_data = pd.read_csv(url2, skiprows=2) 

### Data cleaning
scot_data = scot_data.iloc[0:417,0:16] 
health_boards = scot_data.iloc[:,1:15].columns.tolist() 

replace = {" * ": 0}
scot_data = scot_data.replace(replace) 

commas = {",": ""}  
scot_data = scot_data.replace(commas, regex=True) 

value_cols = scot_data.columns[1:]
scot_data[value_cols] = scot_data[value_cols].astype("int")  # assign by label so the columns become integer dtype

scot_data["Date notified"] = pd.to_datetime(scot_data["Date notified"], dayfirst=True)

### Time series plot
fig = px.line(scot_data, 
              x="Date notified", 
              y=health_boards, 
              labels={"variable": "Health Board", 
                      "Date notified": "Date", 
                      "value": "Total Cases"})

fig.update_xaxes(rangeslider_visible=True)

fig.show()

The trends in COVID-19 cases have been largely consistent across the health boards. There have been three periods in which the infection rate was high and case numbers grew rapidly: shortly after the first cases were reported, between September and November, and around Christmas 2020. The plot below highlights the periods in which specific measures were implemented and lifted across Scotland as a whole, in response to the spread of the virus.
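The plots show cumulative totals, so periods of rapid spread appear as a steepening slope rather than as peaks. Differencing the cumulative series gives daily new cases. A 7-day rolling mean then typically smooths out day-of-week reporting effects. The sketch below uses hypothetical numbers. With the essay's data, the same steps would apply to a series such as `scot_data.set_index("Date notified")["Scotland"]`.

```python
import pandas as pd

# Hypothetical cumulative totals for ten consecutive days
cumulative = pd.Series(
    [100, 110, 125, 145, 170, 200, 235, 275, 320, 370],
    index=pd.date_range("2020-09-15", periods=10, freq="D"),
)

# Daily new cases are the day-to-day differences of the cumulative series;
# the first day has no previous value, so it is dropped
daily_new = cumulative.diff().dropna().astype(int)

# 7-day rolling mean of daily new cases (the first six windows are incomplete, hence NaN)
weekly_avg = daily_new.rolling(window=7).mean()

print(daily_new.tolist())
print(weekly_avg.dropna().round(2).tolist())
```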

In [675]:
import plotly.graph_objects as go

fig = go.Figure()

fig.add_trace(go.Scatter(x=scot_data["Date notified"],
                          y=scot_data["Scotland"], line={"color":"black",
                                                         "width": 2}))

fig.add_vrect(x0="2020-03-23", x1="2020-05-28", 
              fillcolor="Red", opacity=0.3, layer="below", line_width=0)

fig.add_vrect(x0="2020-05-28", x1="2020-08-11", 
              fillcolor="Yellow", opacity=0.3, layer="below", line_width=0)

fig.add_vrect(x0="2020-08-11", x1="2020-09-21", 
              fillcolor="Green", opacity=0.3, layer="below", line_width=0)

fig.add_vrect(x0="2020-09-21", x1="2020-11-05", 
              fillcolor="LightGreen", opacity=0.3, layer="below", line_width=0)

fig.add_vrect(x0="2020-11-05", x1="2020-12-25", 
              fillcolor="Yellow", opacity=0.3, layer="below", line_width=0)

fig.add_vrect(x0="2020-12-25", x1="2021-05-01", 
              fillcolor="Red", opacity=0.3, layer="below", line_width=0)



fig.add_annotation(x="2020-04-25", y=125000, text="National lockdown", showarrow=False)
fig.add_annotation(x="2020-07-04", y=125000, text="Phased lockdown exit", showarrow=False)
fig.add_annotation(x="2020-08-30", y=125000, text="Schools", showarrow=False)
fig.add_annotation(x="2020-08-30", y=110000, text="Open", showarrow=False)
fig.add_annotation(x="2020-10-12", y=125000, text="Universities", showarrow=False)
fig.add_annotation(x="2020-10-12", y=110000, text="Return", showarrow=False)
fig.add_annotation(x="2020-12-01", y=125000, text="Tier system", showarrow=False)
fig.add_annotation(x="2021-03-01", y=125000, text="National lockdown", showarrow=False)

fig.update_layout(title="COVID-19 cases over time in Scotland",
                  yaxis_title="Total COVID-19 cases", xaxis_title="Date")

fig.update_xaxes(rangeslider_visible=True)

fig.show()

Blocks of colour in the graph indicate the periods in which restrictions were enforced and lifted in response to trends in the spread of the virus in Scotland. The colours roughly reflect the level of restrictions in place during each period. Even during the periods with the fewest restrictions (green blocks), however, some restrictions remained in place, and changes were made when necessary.
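To compare the shaded periods numerically, each date can be assigned to a period with `pd.cut`, using the block boundaries from the plotting code as bin edges. The sketch below does this for a few hypothetical dates and daily case counts, then averages daily new cases within each period. The period labels paraphrase the plot annotations. Intervals are closed on the left, so a date that falls exactly on a boundary goes to the period that starts on that date.

```python
import pandas as pd

# Period boundaries taken from the shaded blocks in the plotting code
edges = pd.to_datetime([
    "2020-03-23", "2020-05-28", "2020-08-11", "2020-09-21",
    "2020-11-05", "2020-12-25", "2021-05-01",
])
labels = [
    "National lockdown", "Phased lockdown exit", "Schools open",
    "Universities return", "Tier system", "National lockdown (2)",
]

# Hypothetical daily new case counts on a few dates
daily = pd.DataFrame({
    "date": pd.to_datetime(["2020-04-01", "2020-06-15", "2020-10-01",
                            "2020-10-20", "2020-12-01", "2021-01-10"]),
    "new_cases": [300, 20, 700, 1200, 900, 2000],
})

# right=False makes each bin the half-open interval [edge_i, edge_i+1)
daily["period"] = pd.cut(daily["date"], bins=edges, labels=labels, right=False)

# Mean daily new cases per period; periods with no sampled dates are omitted
period_means = daily.groupby("period", observed=True)["new_cases"].mean()
print(period_means)
```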

The growth in case numbers at the beginning of the pandemic seems small compared with the growth experienced later on. Several factors may explain this, but above all the limited availability of COVID-19 tests early in the pandemic should be considered. In Scotland, tests were initially reserved for hospitalised patients, and people with symptoms were told to stay at home unless they became seriously ill. It was widely reported at the time that the true number of COVID-19 cases in Scotland, and in the rest of the world, was certain to be higher than the data suggested. The first drive-through testing centres opened in mid-April, with tests reserved for frontline workers until the 18th of May. From that date, anyone with symptoms could book a COVID-19 test.

Following a considerable period of reduced infection rates and fewer new cases, the government began a phased lockdown exit strategy. The first phase allowed small groups from different households to socialise outdoors, with social distancing measures in place. The second phase allowed indoor work to resume and outdoor weddings to take place, and lifted several other restrictions. After this, further restrictions were gradually lifted, allowing limited indoor socialising, the opening of pubs and restaurants and, eventually, the reopening of primary schools in August. The graph shows that total cases increased only slightly until around mid-September. This was around the time many universities returned for the start of the new academic year, and shortly afterwards the second period of major growth in COVID-19 cases in Scotland began.
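Because the figure plots cumulative totals, a change in the growth rate shows up only as a change in slope. Converting the totals into daily new cases makes the mid-September upturn easier to pin down. The following is a minimal standard-library sketch, assuming the cumulative counts are available as a list in date order. The function names are hypothetical, and the numbers in the usage lines are toy values, not Scottish data. With pandas, the first step would be `df[column].diff()`.

```python
def daily_new_cases(cumulative):
    """Convert cumulative case totals (in date order) into daily new cases.

    The first value is kept as-is because there is no previous day to subtract.
    Negative differences, e.g. from downward data revisions, are clipped to 0.
    """
    new_cases = []
    previous = 0
    for total in cumulative:
        new_cases.append(max(total - previous, 0))
        previous = total
    return new_cases


def first_index_above(values, threshold, start=0):
    """Return the first index >= start whose value exceeds threshold, else None."""
    for i in range(start, len(values)):
        if values[i] > threshold:
            return i
    return None


# Toy cumulative totals (not real Scottish data).
totals = [100, 110, 115, 125, 200, 320]
daily = daily_new_cases(totals)
print(daily)                                  # [100, 10, 5, 10, 75, 120]
print(first_index_above(daily, 50, start=1))  # 4
```

The `start=1` argument skips the first day, whose "new cases" value is really the opening cumulative total.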

During this period, some restrictions were reintroduced. Local lockdowns were used where cases in specific areas were growing faster than elsewhere, and curfews were introduced to limit the opening hours of hospitality venues. Because infection rates varied across Scotland, a tier system was introduced in November. Local authority areas were each placed in one of tiers 1 to 4 depending on the number of cases they were reporting, with each tier representing a higher level of restrictions. The graph shows that the implementation of the tier system does align with a slight reduction in the growth of case numbers. However, during this same period the government announced an easing of restrictions to allow families to reunite on Christmas Day. The resulting widespread travel and indoor socialising was enough to spark another period of increased infection, which the graph clearly shows.
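Whether the tier system "aligns with a slight reduction in growth" is judged visually from the graph above. One way to make that check numeric is to smooth the daily counts with a 7-day average and compare each week's cases with the previous week's. The sketch below assumes a list of daily new cases in date order. The helper names are hypothetical, and the usage values are toy numbers. In pandas, the smoothing step would be `series.rolling(7).mean()`.

```python
def rolling_mean(values, window=7):
    """Trailing rolling mean; positions without a full window are None."""
    if window < 1:
        raise ValueError("window must be at least 1")
    means = []
    running_total = 0
    for i, value in enumerate(values):
        running_total += value
        if i >= window:
            running_total -= values[i - window]  # drop the value leaving the window
        means.append(running_total / window if i >= window - 1 else None)
    return means


def week_on_week_ratio(daily, index):
    """Ratio of new cases in the 7 days ending at `index` to the 7 days before.

    A ratio above 1 means cases are growing; below 1 means they are falling.
    """
    if not 13 <= index < len(daily):
        raise ValueError("need 14 days of data ending at index")
    this_week = sum(daily[index - 6 : index + 1])
    last_week = sum(daily[index - 13 : index - 6])
    return this_week / last_week if last_week else float("inf")


# Toy daily counts (not real data): one flat week, then a doubled week.
toy = [10] * 7 + [20] * 7
print(rolling_mean(toy)[-1])        # 20.0
print(week_on_week_ratio(toy, 13))  # 2.0
```

Comparing this ratio in the weeks before and after the tier system's start date would show whether growth genuinely slowed. A single ratio is noisy, though, so a run of several weeks is more informative than one value.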

On the 26th of December, all of mainland Scotland was placed under level 4 restrictions to minimise the spread of COVID-19 following the relaxation of restrictions for Christmas. Although the government has come under increased scrutiny for imposing a full lockdown, particularly from those in industries that are struggling financially, the second full lockdown has been imperative to managing the spread of the virus while the country rolls out the vaccine.

The data tells a clear story. COVID-19 is a highly infectious disease, and in the absence of an effective vaccine, the only viable option for managing its spread has been to implement widespread social restrictions. Although the use of lockdowns has been criticised by some, the data shows how effective they have been in managing the infection rate, and how quickly the infection rate can increase when restrictions are lifted.

## Conclusion

The effects of COVID-19 have been felt across the globe, although the degree to which countries have been negatively impacted by the virus has varied widely. Clear trends have emerged in both the number of cases and the number of deaths across different regions, with Europe arguably the worst affected. Analysis shows that cases per 100,000 positively correlate with deaths per 100,000, possibly because a higher number of cases relative to population size reduces the resources available for treating the seriously ill. Other factors, such as wealth and the baseline health of the general population, likely also explain some of the variability in death rates. In the absence of an effective vaccine, restricting social interaction has been an effective strategy for managing the spread of COVID-19. However, easing restrictions, even for short periods, can quickly lead to an increase in the spread of the virus. Until the vaccination programme is complete, there is a clear need for some level of social restriction.
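The conclusion rests on a positive correlation between cases per 100,000 and deaths per 100,000. The essay does not name the coefficient used, so the following assumes Pearson's r and sketches what that number measures, using only the standard library. `pearson_r` is a hypothetical helper, and the usage values are toy numbers, not the WHO data. With pandas, `df[a].corr(df[b])` computes the Pearson coefficient by default, and `scipy.stats.pearsonr` also returns a p-value.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, all without the 1/n factor (it cancels out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / sqrt(var_x * var_y)


# Toy values only, not the WHO figures analysed earlier in the essay.
cases_per_100k = [1000, 3000, 5000, 7000]
deaths_per_100k = [20, 50, 110, 140]
print(round(pearson_r(cases_per_100k, deaths_per_100k), 3))  # 0.99
```

A coefficient close to 1 indicates a strong linear association. On its own, however, it cannot establish why the two measures move together.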

### References


BBC News. 2020. Coronavirus: Italy's death toll overtakes China's. [online] Available at: <https://www.bbc.co.uk/news/world-europe-51964307> [Accessed 12 May 2021].
 
Harapan, H., Itoh, N., Yufika, A., Winardi, W., Keam, S., Te, H., Megawati, D., Hayati, Z., Wagner, A. and Mudatsir, M., 2020. Coronavirus disease 2019 (COVID-19): A literature review. Journal of Infection and Public Health, 13(5).

Royal Society. 2021. What does the “new normal” look like after strict 76-day lockdown in Wuhan, China? [online] Available at: <https://royalsociety.org/blog/2020/10/life-after-lockdown-in-wuhan/> [Accessed 12 May 2021].

### Data sources

WHO data: https://covid19.who.int/table

Scotland data: https://www.gov.scot/publications/coronavirus-covid-19-trends-in-daily-data/

### Additional sources

Timeline of COVID-19 in Scotland: https://spice-spotlight.scot/2021/03/19/timeline-of-coronavirus-covid-19-in-scotland/

GDP per country: https://data.worldbank.org/indicator/NY.GDP.PCAP.CD

Obesity rates of different countries: https://www.oecd.org/els/health-systems/Obesity-Update-2017.pdf

Socioeconomic status and COVID-19 outcomes in American patients: https://link.springer.com/article/10.1007/s11606-020-06527-1