In this project we're going to analyze Covid data along with economic data of different countries to see what kind of effect Covid had on the economies.

In [1]:
# %%
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

We have three datasets.

Two of them are taken from here: [Kaggle Dataset: Impact of Covid-19 Pandemic on the Global Economy](https://www.kaggle.com/datasets/shashwatwork/impact-of-covid19-pandemic-on-the-global-economy), \
which in turn is taken from: Vitenu-Sackey, Prince Asare (2020), “The Impact of Covid-19 Pandemic on the Global Economy: Emphasis on Poverty Alleviation and Economic Growth”, Mendeley Data, V1, doi: 10.17632/b2wvnbnpj9.1

The columns in the first dataset (`transformed_data.csv`, a transformed version of the raw data) are: ['CODE', 'COUNTRY', 'DATE', 'HDI', 'TC', 'TD', 'STI', 'POP', 'GDPCAP'] \
The columns in the second dataset (`raw_data.csv`) are: ['iso_code', 'location', 'date', 'total_cases', 'total_deaths', \
'stringency_index', 'population', 'gdp_per_capita', 'human_development_index'] and some other unnamed columns. The data is available up to 2022.

Sadly, the GDP per capita figures in these datasets do not vary over time, so I had to get that data from another source: \
[Our World in Data: GDP per capita, in constant 2017 international $ – World Bank](https://ourworldindata.org/grapher/gdp-per-capita-worldbank?tab=chart)

The columns in this dataset are: ['Entity', 'Code', 'Year', 'GDP per capita, PPP (constant 2017 international $)']

In [2]:
# %%
data = pd.read_csv("../data/transformed_data.csv")
data2 = pd.read_csv("../data/raw_data.csv")
gdp_per_capita_data = pd.read_csv("../data/gdp-per-capita-worldbank.csv")

I initially did this analysis as an interactive Python file in VS Code and later converted it to this notebook, which is why the cells carry `# %%` markers.

In [3]:
# %%
data.head()

Unnamed: 0,CODE,COUNTRY,DATE,HDI,TC,TD,STI,POP,GDPCAP
0,AFG,Afghanistan,2019-12-31,0.498,0.0,0.0,0.0,17.477233,7.497754
1,AFG,Afghanistan,2020-01-01,0.498,0.0,0.0,0.0,17.477233,7.497754
2,AFG,Afghanistan,2020-01-02,0.498,0.0,0.0,0.0,17.477233,7.497754
3,AFG,Afghanistan,2020-01-03,0.498,0.0,0.0,0.0,17.477233,7.497754
4,AFG,Afghanistan,2020-01-04,0.498,0.0,0.0,0.0,17.477233,7.497754


In [4]:
# %%
data2.head()

Unnamed: 0,iso_code,location,date,total_cases,total_deaths,stringency_index,population,gdp_per_capita,human_development_index,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13
0,AFG,Afghanistan,2019-12-31,0.0,0.0,0.0,38928341,1803.987,0.498,#NUM!,#NUM!,#NUM!,17.477233,7.497754494
1,AFG,Afghanistan,2020-01-01,0.0,0.0,0.0,38928341,1803.987,0.498,#NUM!,#NUM!,#NUM!,17.477233,7.497754494
2,AFG,Afghanistan,2020-01-02,0.0,0.0,0.0,38928341,1803.987,0.498,#NUM!,#NUM!,#NUM!,17.477233,7.497754494
3,AFG,Afghanistan,2020-01-03,0.0,0.0,0.0,38928341,1803.987,0.498,#NUM!,#NUM!,#NUM!,17.477233,7.497754494
4,AFG,Afghanistan,2020-01-04,0.0,0.0,0.0,38928341,1803.987,0.498,#NUM!,#NUM!,#NUM!,17.477233,7.497754494
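Comparing the two previews suggests how the datasets relate: the `POP` and `GDPCAP` values in the first table look like natural logarithms of `population` and `gdp_per_capita` in the second. Here is a quick check on the Afghanistan row, using only the numbers printed above (the variable names are mine, for illustration):

```python
import math

# Values copied from the two previews above (Afghanistan rows)
raw_population = 38928341
raw_gdp_per_capita = 1803.987
transformed_pop = 17.477233
transformed_gdpcap = 7.497754

# If the transformation is a natural log, both differences should be close to 0
pop_diff = abs(math.log(raw_population) - transformed_pop)
gdp_diff = abs(math.log(raw_gdp_per_capita) - transformed_gdpcap)
print(pop_diff, gdp_diff)
```

If the same holds for `STI`, the stringency values used later are on a log scale. That is an inference from these rows, not something confirmed by the dataset description.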


In [5]:
gdp_per_capita_data.head()

Unnamed: 0,Entity,Code,Year,"GDP per capita, PPP (constant 2017 international $)"
0,Afghanistan,AFG,2002,1280.4631
1,Afghanistan,AFG,2003,1292.3335
2,Afghanistan,AFG,2004,1260.0605
3,Afghanistan,AFG,2005,1352.3207
4,Afghanistan,AFG,2006,1366.9932


We'll only need some of these columns. Let's get a sense of how many countries we have data for before we do some data preprocessing and aggregation.

In [6]:
# %%
data["COUNTRY"].value_counts()

COUNTRY
Afghanistan        294
Indonesia          294
Macedonia          294
Luxembourg         294
Lithuania          294
                  ... 
Tajikistan         172
Comoros            171
Lesotho            158
Hong Kong           51
Solomon Islands      4
Name: count, Length: 210, dtype: int64

In [7]:
# %%
data["COUNTRY"].value_counts().mode()

0    294
Name: count, dtype: int64

We have 294 entries for most of the countries, as shown by the mode above. We will use this number to divide the per-country sums of the human development index, the stringency index, and the population to get an aggregate. Note that for countries with fewer than 294 entries, this understates the average.
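To make the effect of dividing by the mode concrete, here is a small sketch on made-up data (the `toy` frame and its values are purely illustrative). It compares dividing each country's sum by the most common row count against taking a per-country mean:

```python
import pandas as pd

# Toy data: countries A and B have 3 rows each, country C has only 1
toy = pd.DataFrame({
    "COUNTRY": ["A", "A", "A", "B", "B", "B", "C"],
    "HDI": [0.8, 0.8, 0.8, 0.7, 0.7, 0.7, 0.6],
})

# Most common number of rows per country
mode_count = toy["COUNTRY"].value_counts().mode()[0]

# Dividing each country's sum by the mode vs. taking a per-country mean
sum_over_mode = toy.groupby("COUNTRY")["HDI"].sum() / mode_count
per_country_mean = toy.groupby("COUNTRY")["HDI"].mean()

print(sum_over_mode)     # C comes out at a third of its HDI
print(per_country_mean)  # C keeps its HDI
```

The two approaches agree for countries with a full set of rows and only diverge for the shorter ones.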

In [8]:
len(data["COUNTRY"].unique())

210

In [9]:
len(gdp_per_capita_data.loc[gdp_per_capita_data["Year"] == 2022]["Entity"].unique())

202

Since our GDP per capita and Covid data come from different sources, the two datasets don't cover exactly the same countries. One has data for 210 countries, while the other has data for 202 countries in 2022. We'll thus have to match the two and see how many countries we can get all our required data for.

In [10]:
# %%
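country = sorted(list(set(data["COUNTRY"].unique()) & 
                      set(
                          gdp_per_capita_data.loc[gdp_per_capita_data["Year"] == 2022]["Entity"].unique()
                          )))
# Look up each country's code by name so that `code` lines up with the sorted
# `country` list when the two are zipped into a DataFrame later.
code_by_country = data.drop_duplicates("COUNTRY").set_index("COUNTRY")["CODE"]
code = [code_by_country[c] for c in country]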

print(len(country))

population = []
hdi = []
tc = []
td = []
sti = []
gdp = []
gdp_per_capita_before_covid = []
gdp_per_capita_during_covid = []
gdp_per_capita_after_covid = []

175


We have adequate data for 175 countries, which for our purposes might be enough because we won't be able to visually analyze the data for all the countries.
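Some of the countries that drop out may simply be spelled differently in the two sources. One way to inspect this, sketched here on made-up frames (the names `covid`, `gdp_2022`, and their rows are illustrative assumptions), is an outer merge with `indicator=True`:

```python
import pandas as pd

# Toy stand-ins for the Covid table and the 2022 slice of the GDP table
covid = pd.DataFrame({"COUNTRY": ["Albania", "Macedonia", "India"]})
gdp_2022 = pd.DataFrame({"Entity": ["Albania", "North Macedonia", "India"]})

# An outer merge with indicator=True labels each row by where it was found
overlap = covid.merge(gdp_2022, left_on="COUNTRY", right_on="Entity",
                      how="outer", indicator=True)

matched = overlap[overlap["_merge"] == "both"]
unmatched = overlap[overlap["_merge"] != "both"]
print(len(matched))
print(unmatched[["COUNTRY", "Entity", "_merge"]])
```

The `unmatched` rows show which names exist on only one side, which makes it easier to decide whether a manual rename would recover a country.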

I'm taking 2019 as the year "before covid", 2021 as "during covid", and 2022 as "after covid", though technically the effects of covid still ravage the world's economies. This is just for the sake of easy analysis.

In [11]:
# %%
for i in country:
    # HDI, STI and population: sum / 294 (the mode of the row counts) equals the
    # per-country mean only for countries that have all 294 rows
    hdi.append((data.loc[data["COUNTRY"] == i, "HDI"]).sum()/294)
    # total_cases and total_deaths are summed over all of a country's rows
    tc.append((data2.loc[data2["location"] == i, "total_cases"]).sum())
    td.append((data2.loc[data2["location"] == i, "total_deaths"]).sum())
    sti.append((data.loc[data["COUNTRY"] == i, "STI"]).sum()/294)
    population.append((data2.loc[data2["location"] == i, "population"]).sum()/294)
    gdp_per_capita_before_covid.append(
        gdp_per_capita_data.loc[(gdp_per_capita_data["Entity"] == i) 
                                & (gdp_per_capita_data["Year"] == 2019), 
                                "GDP per capita, PPP (constant 2017 international $)"].item()
        )
    gdp_per_capita_during_covid.append(
        gdp_per_capita_data.loc[(gdp_per_capita_data["Entity"] == i) 
                                & (gdp_per_capita_data["Year"] == 2021), 
                                "GDP per capita, PPP (constant 2017 international $)"].item()
        )
    gdp_per_capita_after_covid.append(
        gdp_per_capita_data.loc[(gdp_per_capita_data["Entity"] == i) 
                                & (gdp_per_capita_data["Year"] == 2022), 
                                "GDP per capita, PPP (constant 2017 international $)"].item()
        )

aggregated_data = pd.DataFrame(list(zip(code, country, hdi, tc, td, sti, population, 
                                        gdp_per_capita_before_covid, 
                                        gdp_per_capita_during_covid, 
                                        gdp_per_capita_after_covid)), 
                               columns = ["Country Code", "Country", "HDI", 
                                          "Total Cases", "Total Deaths", 
                                          "Stringency Index", "Population", 
                                          "GDP Per Capita Before Covid", 
                                          "GDP Per Capita During Covid",
                                          "GDP Per Capita After Covid"])

aggregated_data.head()

Unnamed: 0,Country Code,Country,HDI,Total Cases,Total Deaths,Stringency Index,Population,GDP Per Capita Before Covid,GDP Per Capita During Covid,GDP Per Capita After Covid
0,ALB,Albania,0.600765,1071951.0,31056.0,3.005624,2202398.0,13653.249,14596.016,15492.067
1,DZA,Algeria,0.754,4893999.0,206429.0,3.195168,43851040.0,11627.28,11029.139,11198.233
2,AGO,Angola,0.418952,304005.0,11820.0,2.96556,23699490.0,6602.269,5911.8354,5906.1157
3,ATG,Antigua and Barbuda,0.567755,12619.0,568.0,0.0,71280.93,23638.686,20501.162,22321.87
4,ARG,Argentina,0.707143,47155234.0,1077426.0,3.475438,38739240.0,22071.748,21599.797,22461.441
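The three per-year GDP lookups in the loop can also be expressed as a single reshape. This is a minimal sketch on made-up values, assuming the same column names as the GDP dataset; `gdp_long`, `years`, and `gdp_wide` are illustrative names:

```python
import pandas as pd

gdp_col = "GDP per capita, PPP (constant 2017 international $)"

# Toy long-format table with made-up values, same columns as the GDP dataset
gdp_long = pd.DataFrame({
    "Entity": ["A", "A", "A", "B", "B", "B"],
    "Code": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    "Year": [2019, 2021, 2022, 2019, 2021, 2022],
    gdp_col: [100.0, 90.0, 95.0, 200.0, 210.0, 220.0],
})

# Keep the three years of interest and spread them into one column per year
years = {2019: "GDP Per Capita Before Covid",
         2021: "GDP Per Capita During Covid",
         2022: "GDP Per Capita After Covid"}
gdp_wide = (gdp_long[gdp_long["Year"].isin(list(years))]
            .pivot(index="Entity", columns="Year", values=gdp_col)
            .rename(columns=years))
print(gdp_wide)
```

A country that is missing one of the years ends up with `NaN` in that column here, whereas `.item()` on an empty selection raises an error.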


Let's see the top 20 countries by total cases.

In [12]:
# %%
data = aggregated_data.sort_values(by=["Total Cases"], ascending=False)
data.head(20)

Unnamed: 0,Country Code,Country,HDI,Total Cases,Total Deaths,Stringency Index,Population,GDP Per Capita Before Covid,GDP Per Capita During Covid,GDP Per Capita After Covid
169,USA,United States,0.924,746014098.0,26477574.0,3.350949,331002600.0,62470.93,63635.824,64623.125
22,BRA,Brazil,0.759,425704517.0,14340567.0,3.136028,212559400.0,14685.128,14735.582,15093.465
74,IND,India,0.64,407771615.0,7247327.0,3.610552,1380004000.0,6617.13,6677.185,7112.0396
132,RUS,Russia,0.816,132888951.0,2131571.0,3.380088,145934500.0,27254.574,28057.031,27450.45
125,PER,Peru,0.59949,74882695.0,3020038.0,3.430126,26355050.0,12735.168,12533.842,12743.942
104,MEX,Mexico,0.774,74347548.0,7295850.0,3.019289,128932800.0,20553.955,19617.76,20254.781
151,ESP,Spain,0.887969,73717676.0,5510624.0,3.393922,46595750.0,40782.234,38319.29,40223.01
149,ZAF,South Africa,0.608653,63027659.0,1357682.0,3.364333,51642940.0,13850.8955,13337.79,13478.754
36,COL,Colombia,0.581847,60543682.0,1936134.0,3.357923,39633270.0,14616.135,14661.213,15616.752
168,GBR,United Kingdom,0.922,59475032.0,7249573.0,3.353883,67886000.0,46909.08,45567.57,47587.168


Understandably, countries with bigger populations, like the USA, Brazil, India, and Russia, lead in the number of cases over the data that we have.
The US and India reportedly handled their cases catastrophically, so that might be reflected in the number of deaths. We'll look at this later. For now, let's see which countries had more than 10 million covid cases.

In [13]:
# %%
figure = px.bar(data[data["Total Cases"] > 1e7], 
                y = 'Total Cases', 
                x = 'Country',
                title = "Countries with more than 10 million Covid cases")
figure.show()

Let's see which countries had the highest number of cases per capita.

In [14]:
data["Cases Per Capita"] = data["Total Cases"] / data["Population"]
data["Cases Per Thousand"] = data["Cases Per Capita"] * 1000
data.head()

Unnamed: 0,Country Code,Country,HDI,Total Cases,Total Deaths,Stringency Index,Population,GDP Per Capita Before Covid,GDP Per Capita During Covid,GDP Per Capita After Covid,Cases Per Capita,Cases Per Thousand
169,USA,United States,0.924,746014098.0,26477574.0,3.350949,331002600.0,62470.93,63635.824,64623.125,2.253801,2253.801004
22,BRA,Brazil,0.759,425704517.0,14340567.0,3.136028,212559400.0,14685.128,14735.582,15093.465,2.002755,2002.75546
74,IND,India,0.64,407771615.0,7247327.0,3.610552,1380004000.0,6617.13,6677.185,7112.0396,0.295486,295.485739
132,RUS,Russia,0.816,132888951.0,2131571.0,3.380088,145934500.0,27254.574,28057.031,27450.45,0.910607,910.60707
125,PER,Peru,0.59949,74882695.0,3020038.0,3.430126,26355050.0,12735.168,12533.842,12743.942,2.841304,2841.303796


In [15]:
# %%
figure = px.bar(data.sort_values(by=["Cases Per Thousand"], ascending=False).head(30), 
                y = 'Cases Per Thousand', 
                x = 'Country',
                title = "Highest number of cases of Covid per thousand people")
figure.show()

Many countries appear to have more than a thousand cases per thousand people in their population. \
This is most likely an artifact of the aggregation rather than of the data: `total_cases` looks like a running total, so summing it over every daily row counts each case once for every day after it was reported. Cases among visitors or repeat infections could also contribute, but probably not at this scale.
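To see how much summing a running total can inflate a per-capita figure, here is a sketch on made-up numbers. It assumes `total_cases` is cumulative, which the raw data suggests but this notebook does not verify; the `toy` frame is illustrative:

```python
import pandas as pd

# Toy running totals for one country of 1,000 people (made-up numbers)
toy = pd.DataFrame({
    "location": ["A"] * 5,
    "total_cases": [0, 10, 40, 70, 100],  # cumulative count per day
    "population": [1000] * 5,
})

grouped = toy.groupby("location")
summed = grouped["total_cases"].sum()   # adds up every day's running total
latest = grouped["total_cases"].max()   # final running total
population = grouped["population"].first()

print(summed / population * 1000)   # cases per thousand from the sum
print(latest / population * 1000)   # cases per thousand from the final total
```

With only five days of data the summed figure is already more than twice the final total, and the gap grows with every extra day.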

Now let's take a look at countries with more than a million deaths.

In [16]:
# %%
figure = px.bar(data[data["Total Deaths"] > 1e6], 
                y = 'Total Deaths', 
                x = 'Country',
                title = "Countries with more than one million deaths")
figure.show()

Now let's compare the total cases vs total deaths for the countries with the highest number of cases.

In [17]:
# %%
top_performers = data.head(20)
fig = go.Figure()
fig.add_trace(go.Bar(
    x = top_performers["Country"],
    y = top_performers["Total Cases"],
    name='Total Cases',
    marker_color='indianred'
))
fig.add_trace(go.Bar(
    x = top_performers["Country"],
    y = top_performers["Total Deaths"],
    name='Total Deaths',
    marker_color='lightsalmon'
))
fig.update_layout(barmode='group', xaxis_tickangle = -45)
fig.show()

It might help us to look at these as a percentage.

In [18]:
data["percentage_of_deaths"] = (data["Total Deaths"] / data["Total Cases"]) * 100
figure = px.bar(data.sort_values(by = ["percentage_of_deaths"], ascending=False).head(20), 
                y = 'percentage_of_deaths', 
                x = 'Country',
                title = "Highest number of deaths as a percentage of detected covid cases")
figure.show()

This is unsurprising - the media widely reported how badly Italy had fared at managing Covid-19, to the point where the whole country was shut down and there were mass deaths.\
The next three countries by percentage of deaths are also western European countries - some of these countries didn't take covid as seriously and refused to enforce a lockdown.\
Later we will look at the stringency index, an index of how strictly countries abided by lockdown rules, and see if that correlates with these percentages.\
For now, let us look at the percentage of deaths vs detected cases all over the world.

In [19]:
# %%
cases = data["Total Cases"].sum()
deceased = data["Total Deaths"].sum()

labels = ["Total Cases", "Total Deaths"]
values = [cases, deceased]

fig = px.pie(data, values=values, names = labels,
             title = "Percentage of Deaths vs Cases", hole = 0.5)
fig.show()

We have an overall death rate of about 3.6%, and as we saw before, some countries exceeded it.
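One caveat when reading the pie chart: its two slices are cases and deaths, so the deaths slice is deaths divided by cases plus deaths, which is slightly smaller than deaths divided by cases. A small sketch with made-up totals (not the notebook's actual sums) shows the difference:

```python
# Made-up totals, chosen only to make the arithmetic easy to follow
cases = 1000
deceased = 40

# Share of the "Total Deaths" slice in a two-slice pie of cases and deaths
pie_share = deceased / (cases + deceased) * 100

# Deaths as a percentage of detected cases
deaths_per_case = deceased / cases * 100

print(round(pie_share, 2), round(deaths_per_case, 2))
```

The second figure is the one that matches the `percentage_of_deaths` column, so it is the fairer threshold to compare countries against.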

In [20]:
# Filter first, then sort, so the boolean mask is built on the frame it indexes
high_death_rate = data[data["percentage_of_deaths"] > 3.6]
figure = px.bar(high_death_rate.sort_values(by = ["percentage_of_deaths"], ascending=False), 
                y = 'percentage_of_deaths', 
                x = 'Country',
                title = "Countries with higher than average percentage of deaths")
figure.show()



Now, let's look at the stringency index: it is a composite measure of response indicators, including school closures, workplace closures, and travel bans. It shows how strictly countries are following these measures to control the spread of covid-19.

In [21]:
# %%
fig = px.bar(data.head(20), x ='Country', y = 'Total Cases', 
             hover_data=['Population', 'Total Deaths'],
             color = 'Stringency Index',
             title = "Stringency Index by Country during Covid-19")
fig.show()

We can clearly see that some countries responded to covid more strictly than others. India and Italy both had a really high number of deaths at different points, and it makes sense that they \
would then have followed lockdown rules very stringently, whereas some other countries like the US and Brazil show a comparatively lower stringency index. These countries chose to flout lockdown etiquette \
in favor of letting the economy reopen more quickly. We will later see if this is reflected in their economies rebounding quickly.

In [22]:
# %%
# Analyzing Covid-19 Impacts on Economy
fig = px.bar(data.head(20), x = 'Country', y = 'Total Cases', 
             hover_data=['Population', 'Total Deaths'],
             color = "GDP Per Capita Before Covid",
             title = "GDP Per Capita before Covid-19")
fig.show()

This way of visualising the GDP per capita is not very useful. Let's juxtapose the GDP Per Capita for the countries with the highest number of cases before, during, and after Covid.

In [23]:
# %%
fig = go.Figure()
fig.add_trace(go.Bar(
    x = top_performers["Country"],
    y = top_performers["GDP Per Capita Before Covid"],
    name='GDP Per Capita before Covid-19',
    marker_color='indianred'
))
fig.add_trace(go.Bar(
    x = top_performers["Country"],
    y = top_performers["GDP Per Capita During Covid"],
    name='GDP Per Capita during Covid-19',
    marker_color='lightsalmon'
))
fig.add_trace(go.Bar(
    x = top_performers["Country"],
    y = top_performers["GDP Per Capita After Covid"],
    name='GDP Per Capita after Covid-19',
    marker_color='pink'
))
fig.update_layout(barmode='group', xaxis_tickangle = -45)
fig.show()

For most countries we see a slight drop during covid and a subsequent bounce-back. For the US, curiously, it kept increasing year on year, while for Russia the GDP per capita was paradoxically higher during the peak of covid.

Now, let's look at some other metrics like HDI.

In [24]:
# %%
# Human Development Index: a statistical composite index of life expectancy, education, and per capita income indicators
fig = px.bar(data.head(20), x = 'Country', y = 'Total Cases', 
             hover_data=['Population', 'Total Deaths'],
             color = "HDI", height= 400,
             title = "Human Development Index during Covid-19")
fig.show()

As expected, for the most part. One thing we haven't yet confirmed from the data is how the stringency index correlates with the percentage of deaths and with the bounce-back in the economy.

## Predictive Modelling

In [26]:
data.head()

Unnamed: 0,Country Code,Country,HDI,Total Cases,Total Deaths,Stringency Index,Population,GDP Per Capita Before Covid,GDP Per Capita During Covid,GDP Per Capita After Covid,Cases Per Capita,Cases Per Thousand,percentage_of_deaths
169,USA,United States,0.924,746014098.0,26477574.0,3.350949,331002600.0,62470.93,63635.824,64623.125,2.253801,2253.801004,3.549206
22,BRA,Brazil,0.759,425704517.0,14340567.0,3.136028,212559400.0,14685.128,14735.582,15093.465,2.002755,2002.75546,3.368667
74,IND,India,0.64,407771615.0,7247327.0,3.610552,1380004000.0,6617.13,6677.185,7112.0396,0.295486,295.485739,1.777301
132,RUS,Russia,0.816,132888951.0,2131571.0,3.380088,145934500.0,27254.574,28057.031,27450.45,0.910607,910.60707,1.604024
125,PER,Peru,0.59949,74882695.0,3020038.0,3.430126,26355050.0,12735.168,12533.842,12743.942,2.841304,2841.303796,4.033025


In [32]:
data["Bounce Back"] = data["GDP Per Capita After Covid"] - data["GDP Per Capita During Covid"]
data["Bounce Back Percentage"] = (data["Bounce Back"] / data["GDP Per Capita During Covid"]) * 100
#plot of bounce back percentage against stringency index
fig = px.scatter(data, x="Stringency Index", y="Bounce Back Percentage", 
                 color="Country", size="Total Cases", 
                 title="Bounce Back Percentage vs Stringency Index")
fig.update_layout(xaxis_range=[3, 4],
                  yaxis_range=[0, 10])
fig.show()

In [35]:
top_countries = data.sort_values(by="Total Cases", ascending=False).head(20)
top_countries_sorted = top_countries.sort_values(by="Bounce Back Percentage", ascending=False)
top_countries_sorted

Unnamed: 0,Country Code,Country,HDI,Total Cases,Total Deaths,Stringency Index,Population,GDP Per Capita Before Covid,GDP Per Capita During Covid,GDP Per Capita After Covid,Cases Per Capita,Cases Per Thousand,percentage_of_deaths,Bounce Back,Bounce Back Percentage
138,SAU,Saudi Arabia,0.673116,38585191.0,442507.0,3.263361,27472170.0,47024.543,46768.434,50188.297,1.404519,1404.519373,1.146831,3419.863,7.312332
36,COL,Colombia,0.581847,60543682.0,1936134.0,3.357923,39633270.0,14616.135,14661.213,15616.752,1.527598,1527.597569,3.197913,955.539,6.517462
74,IND,India,0.64,407771615.0,7247327.0,3.610552,1380004000.0,6617.13,6677.185,7112.0396,0.295486,295.485739,1.777301,434.8546,6.512544
12,BGD,Bangladesh,0.477714,35266178.0,484534.0,3.18641,129398800.0,5467.2075,5911.013,6263.0044,0.272539,272.538677,1.373934,351.9914,5.954841
151,ESP,Spain,0.887969,73717676.0,5510624.0,3.393922,46595750.0,40782.234,38319.29,40223.01,1.582069,1582.068559,7.475309,1903.72,4.968046
163,TUR,Turkey,0.591905,41431948.0,1049757.0,3.041418,63110870.0,28476.684,31722.16,33149.5,0.656495,656.494679,2.53369,1427.34,4.499504
168,GBR,United Kingdom,0.922,59475032.0,7249573.0,3.353883,67886000.0,46909.08,45567.57,47587.168,0.876102,876.10153,12.189271,2019.598,4.432095
80,ITA,Italy,0.88,50752853.0,6664225.0,3.629838,60461830.0,42739.05,42562.555,44292.19,0.83942,839.419758,13.13074,1729.635,4.063748
4,ARG,Argentina,0.707143,47155234.0,1077426.0,3.475438,38739240.0,22071.748,21599.797,22461.441,1.217247,1217.247347,2.284849,861.644,3.98913
104,MEX,Mexico,0.774,74347548.0,7295850.0,3.019289,128932800.0,20553.955,19617.76,20254.781,0.576638,576.63818,9.813168,637.021,3.247165


The above scatter plot and table show how much economic recovery countries achieved against how stringent their lockdown measures were. \
Among the countries with a large number of cases, India saw quite a bounce back in its economy while having a comparatively high stringency index, \
whereas the United States, which had lower stringency, saw a smaller increase in GDP per capita relative to the covid period. Do note that the US never had a drop in GDP per capita during covid either and kept increasing gradually, which might be a factor here.
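Reading a relationship off a scatter plot is subjective, so a correlation coefficient can help. This is a minimal sketch on made-up values (the `toy` frame is illustrative and deliberately perfectly monotonic), using the same column names as the aggregated table:

```python
import pandas as pd

# Toy values (made up) standing in for the aggregated per-country table
toy = pd.DataFrame({
    "Stringency Index": [3.0, 3.2, 3.4, 3.6],
    "GDP Per Capita During Covid": [100.0, 200.0, 400.0, 50.0],
    "GDP Per Capita After Covid": [101.0, 204.0, 412.0, 52.0],
})

toy["Bounce Back Percentage"] = (
    (toy["GDP Per Capita After Covid"] - toy["GDP Per Capita During Covid"])
    / toy["GDP Per Capita During Covid"] * 100
)

x = toy["Stringency Index"]
y = toy["Bounce Back Percentage"]

# Pearson measures linear association between the two columns
pearson = x.corr(y)
# Spearman is the Pearson correlation of the ranks, so it only uses the ordering
spearman = x.rank().corr(y.rank())
print(pearson, spearman)
```

On the real table, a value near zero would mean the stringency index says little about the recovery, and either way a correlation across countries would not by itself show that stricter lockdowns caused a stronger bounce back.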