### *Covid-19 Impacts Analysis* ###

The first wave of Covid-19 hit the global economy hard because the world was not prepared for a pandemic. It led to a rise in cases, deaths, unemployment, and poverty, resulting in an economic slowdown.

Here, you are required to analyze the spread of Covid-19 cases and the impacts of Covid-19 on the economy.

The dataset we are using to analyze the impacts of covid-19 is downloaded from Kaggle. It contains data about:

- the country code
- name of all the countries
- date of the record
- Human development index of all the countries
- Daily covid-19 cases
- Daily deaths due to covid-19
- stringency index of the countries
- the population of the countries
- GDP per capita of the countries

Let’s start the task of Covid-19 impacts analysis by importing the necessary Python libraries and the dataset:

##### *Imports & Loads* #####

In [9]:
import pandas as pd
import numpy as np

import seaborn as sns

import plotly.io as pio
import plotly.express as px
import plotly.graph_objects as go
import plotly.offline as py
import plotly.tools as tls

import matplotlib
import matplotlib.pyplot as plt
from matplotlib.patches import Circle, RegularPolygon
from matplotlib.path import Path
from matplotlib.projections.polar import PolarAxes
from matplotlib.projections import register_projection
from matplotlib.spines import Spine
from matplotlib.transforms import Affine2D
%matplotlib inline

py.init_notebook_mode(connected=True)

pio.templates.default = "plotly_dark"
matplotlib.style.use('dark_background')

In [2]:
data = pd.read_csv("transformed_data.csv")
data2 = pd.read_csv("raw_data.csv")

##### *Data preparation* #####

The dataset contains two data files: one holds the raw data and the other a transformed version of it. We need both for this task, as each contains important information in different columns. So let’s have a look at both datasets one by one.

In [7]:
# Basic stats
print("Number of rows : {}".format(data.shape[0]))
print()

print("Display of dataset: ")
display(data.head())
print()

print("Basics statistics: ")
data_desc = data.describe(include='all')
display(data_desc)
print()

print("Percentage of missing values: ")
display(100*data.isnull().sum()/data.shape[0])

Number of rows : 50418

Display of dataset: 


Unnamed: 0,CODE,COUNTRY,DATE,HDI,TC,TD,STI,POP,GDPCAP
0,AFG,Afghanistan,2019-12-31,0.498,0.0,0.0,0.0,17.477233,7.497754
1,AFG,Afghanistan,2020-01-01,0.498,0.0,0.0,0.0,17.477233,7.497754
2,AFG,Afghanistan,2020-01-02,0.498,0.0,0.0,0.0,17.477233,7.497754
3,AFG,Afghanistan,2020-01-03,0.498,0.0,0.0,0.0,17.477233,7.497754
4,AFG,Afghanistan,2020-01-04,0.498,0.0,0.0,0.0,17.477233,7.497754



Basics statistics: 


Unnamed: 0,CODE,COUNTRY,DATE,HDI,TC,TD,STI,POP,GDPCAP
count,50418,50418,50418,44216.0,50418.0,50418.0,50418.0,50418.0,50418.0
unique,210,210,294,,,,,,
top,AFG,Afghanistan,2020-08-31,,,,,,
freq,294,294,209,,,,,,
mean,,,,0.720139,6.762125,3.413681,3.178897,15.442097,8.31858
std,,,,0.160902,3.637347,3.082761,1.673451,2.495039,3.17713
min,,,,0.0,0.0,0.0,0.0,6.695799,0.0
25%,,,,0.601,4.158883,0.0,2.867331,14.151619,7.955479
50%,,,,0.752,7.092574,3.178054,4.000583,15.929201,9.368531
75%,,,,0.847,9.504669,5.620401,4.335852,17.187513,10.237704



Percentage of missing values: 


CODE        0.000000
COUNTRY     0.000000
DATE        0.000000
HDI        12.301162
TC          0.000000
TD          0.000000
STI         0.000000
POP         0.000000
GDPCAP      0.000000
dtype: float64

In [8]:
# Basic stats
print("Number of rows : {}".format(data2.shape[0]))
print()

print("Display of dataset: ")
display(data2.head())
print()

print("Basics statistics: ")
data2_desc = data2.describe(include='all')
display(data2_desc)
print()

print("Percentage of missing values: ")
display(100*data2.isnull().sum()/data2.shape[0])

Number of rows : 50418

Display of dataset: 


Unnamed: 0,iso_code,location,date,total_cases,total_deaths,stringency_index,population,gdp_per_capita,human_development_index,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13
0,AFG,Afghanistan,2019-12-31,0.0,0.0,0.0,38928341,1803.987,0.498,#NUM!,#NUM!,#NUM!,17.477233,7.497754494
1,AFG,Afghanistan,2020-01-01,0.0,0.0,0.0,38928341,1803.987,0.498,#NUM!,#NUM!,#NUM!,17.477233,7.497754494
2,AFG,Afghanistan,2020-01-02,0.0,0.0,0.0,38928341,1803.987,0.498,#NUM!,#NUM!,#NUM!,17.477233,7.497754494
3,AFG,Afghanistan,2020-01-03,0.0,0.0,0.0,38928341,1803.987,0.498,#NUM!,#NUM!,#NUM!,17.477233,7.497754494
4,AFG,Afghanistan,2020-01-04,0.0,0.0,0.0,38928341,1803.987,0.498,#NUM!,#NUM!,#NUM!,17.477233,7.497754494



Basics statistics: 


Unnamed: 0,iso_code,location,date,total_cases,total_deaths,stringency_index,population,gdp_per_capita,human_development_index,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13
count,50418,50418,50418,47324.0,39228.0,43292.0,50418.0,44706.0,44216.0,50418,50418,50418,50418.0,50418
unique,210,210,294,,,,,,,19172,6374,170,,185
top,AFG,Afghanistan,2020-08-31,,,,,,,#NUM!,#NUM!,#NUM!,,#NUM!
freq,294,294,209,,,,,,,3594,12298,10042,,5712
mean,,,,66219.27,2978.767819,56.162022,42516010.0,20818.70624,0.720139,,,,15.442097,
std,,,,404558.2,13836.644013,27.532685,156460700.0,20441.365392,0.160902,,,,2.495039,
min,,,,0.0,0.0,0.0,809.0,661.24,0.0,,,,6.695799,
25%,,,,126.0,10.0,37.96,1399491.0,5338.454,0.601,,,,14.151619,
50%,,,,1594.0,64.0,61.11,8278737.0,13913.839,0.752,,,,15.929201,
75%,,,,15847.75,564.0,78.7,29136810.0,31400.84,0.847,,,,17.187513,



Percentage of missing values: 


iso_code                    0.000000
location                    0.000000
date                        0.000000
total_cases                 6.136697
total_deaths               22.194454
stringency_index           14.133841
population                  0.000000
gdp_per_capita             11.329287
human_development_index    12.301162
Unnamed: 9                  0.000000
Unnamed: 10                 0.000000
Unnamed: 11                 0.000000
Unnamed: 12                 0.000000
Unnamed: 13                 0.000000
dtype: float64
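
The trailing `Unnamed:` columns in `raw_data.csv` look like spreadsheet residue: `#NUM!` is what a spreadsheet shows for the logarithm of zero, and the two numeric columns match the natural log of `population` and `gdp_per_capita`. Below is a minimal sketch on a hypothetical two-row frame that drops the residue and checks that relationship. The names `raw_sample` and `clean` are illustrative, and the log relationship is an inference from the values printed above, not something the dataset documents.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mimicking the first rows of raw_data.csv shown above
raw_sample = pd.DataFrame({
    "iso_code": ["AFG", "AFG"],
    "location": ["Afghanistan", "Afghanistan"],
    "population": [38928341, 38928341],
    "gdp_per_capita": [1803.987, 1803.987],
    "Unnamed: 9": ["#NUM!", "#NUM!"],
    "Unnamed: 12": [17.477233, 17.477233],
})

# Drop every column whose name starts with "Unnamed"
clean = raw_sample.loc[:, ~raw_sample.columns.str.startswith("Unnamed")]

# Natural logs of the raw figures, to compare with POP and GDPCAP
log_pop = np.log(clean["population"].iloc[0])
log_gdp = np.log(clean["gdp_per_capita"].iloc[0])
print(round(log_pop, 6), round(log_gdp, 6))
```

The two printed numbers should agree closely with the `POP` and `GDPCAP` values shown for Afghanistan in the transformed file.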

After a first look at both datasets, I found that we need to combine them into a new dataset. But before we do that, let’s have a look at how many samples of each country are present in the dataset.

In [11]:
# Let's check the value of each country
data["COUNTRY"].value_counts()

Afghanistan        294
Indonesia          294
Macedonia          294
Luxembourg         294
Lithuania          294
                  ... 
Tajikistan         172
Comoros            171
Lesotho            158
Hong Kong           51
Solomon Islands      4
Name: COUNTRY, Length: 210, dtype: int64

We don't have an equal number of samples for each country in the dataset, so let's look at the mode to obtain the most frequent sample count.

In [12]:
# Let's check the mode value
data["COUNTRY"].value_counts().mode()

0    294
Name: COUNTRY, dtype: int64

So 294 is the mode value. We will use it to divide the sums of the human development index, stringency index, and population samples for each country. Now let’s create a new dataset by combining the necessary columns from both datasets.

In [13]:
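# Let's aggregate the data
code = data["CODE"].unique().tolist()
country = data["COUNTRY"].unique().tolist()
hdi = []
tc = []
td = []
sti = []
# Starts out holding the unique transformed POP values from transformed_data.csv
population = data["POP"].unique().tolist()
gdp = []  # left empty; GDP per capita is added manually later

for i in country:
    # Averages use the modal row count (294) as a fixed divisor
    hdi.append((data.loc[data["COUNTRY"] == i, "HDI"]).sum()/294)
    # Sums of the raw total_cases / total_deaths records for each country
    tc.append((data2.loc[data2["location"] == i, "total_cases"]).sum())
    td.append((data2.loc[data2["location"] == i, "total_deaths"]).sum())
    sti.append((data.loc[data["COUNTRY"] == i, "STI"]).sum()/294)
    # Note: these land after the POP values above; zip() stops at the shortest
    # list, so the Population column shown below comes from POP
    population.append((data2.loc[data2["location"] == i, "population"]).sum()/294)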

aggregated_data = pd.DataFrame(list(zip(code, country, hdi, tc, td, sti, population)), 
                               columns = ["Country Code", "Country", "HDI", 
                                          "Total Cases", "Total Deaths", 
                                          "Stringency Index", "Population"])
print(aggregated_data.head())

  Country Code      Country       HDI  Total Cases  Total Deaths  \
0          AFG  Afghanistan  0.498000    5126433.0      165875.0   
1          ALB      Albania  0.600765    1071951.0       31056.0   
2          DZA      Algeria  0.754000    4893999.0      206429.0   
3          AND      Andorra  0.659551     223576.0        9850.0   
4          AGO       Angola  0.418952     304005.0       11820.0   

   Stringency Index  Population  
0          3.049673   17.477233  
1          3.005624   14.872537  
2          3.195168   17.596309  
3          2.677654   11.254996  
4          2.965560   17.307957  
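
Dividing every sum by 294 understates the averages for countries with fewer than 294 rows or with missing values, and summing a cumulative column such as `total_cases` counts each case once for every day it stays in the record. A hedged alternative sketch, assuming the raw columns are cumulative as their names suggest, uses `groupby` with a per-country mean and maximum. The frame `toy` and its figures are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Made-up records: country A has three rows, country B only two
toy = pd.DataFrame({
    "location": ["A", "A", "A", "B", "B"],
    "human_development_index": [0.5, 0.5, 0.5, 0.8, 0.8],
    "total_cases": [10.0, 25.0, 40.0, 5.0, 7.0],   # cumulative counts
    "total_deaths": [0.0, 1.0, 2.0, 0.0, 1.0],     # cumulative counts
})

summary = (
    toy.groupby("location", as_index=False)
       .agg(HDI=("human_development_index", "mean"),  # mean over the rows actually present
            total_cases=("total_cases", "max"),       # latest cumulative value
            total_deaths=("total_deaths", "max"))
)
print(summary)
```

With a fixed divisor of 3, country B's HDI would come out as roughly 0.53 instead of 0.8, which is the same effect that lowers the HDI of countries with short records in the table above.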


I have not included the GDP per capita column yet, because I didn’t find the correct figures for GDP per capita in the dataset. So it is better to collect the GDP per capita data manually.

With so many countries in this data, collecting GDP per capita manually for all of them is not practical. So let’s select a subsample from this dataset: the top 10 countries with the highest number of Covid-19 cases. It gives a convenient sample for studying the economic impacts of Covid-19. So let’s sort the data according to the total cases of Covid-19.

In [14]:
# Let's sort the data according to total cases
data = aggregated_data.sort_values(by=["Total Cases"], ascending=False)
print(data.head())

    Country Code        Country      HDI  Total Cases  Total Deaths  \
200          USA  United States  0.92400  746014098.0    26477574.0   
27           BRA         Brazil  0.75900  425704517.0    14340567.0   
90           IND          India  0.64000  407771615.0     7247327.0   
157          RUS         Russia  0.81600  132888951.0     2131571.0   
150          PER           Peru  0.59949   74882695.0     3020038.0   

     Stringency Index  Population  
200          3.350949   19.617637  
27           3.136028   19.174732  
90           3.610552   21.045353  
157          3.380088   18.798668  
150          3.430126   17.311165  


Now here’s how we can select the top 10 countries with the highest number of cases

In [15]:
# Let's check the top 10 countries with highest covid cases
# .copy() avoids a SettingWithCopyWarning when columns are added later
data = data.head(10).copy()
print(data)

    Country Code         Country       HDI  Total Cases  Total Deaths  \
200          USA   United States  0.924000  746014098.0    26477574.0   
27           BRA          Brazil  0.759000  425704517.0    14340567.0   
90           IND           India  0.640000  407771615.0     7247327.0   
157          RUS          Russia  0.816000  132888951.0     2131571.0   
150          PER            Peru  0.599490   74882695.0     3020038.0   
125          MEX          Mexico  0.774000   74347548.0     7295850.0   
178          ESP           Spain  0.887969   73717676.0     5510624.0   
175          ZAF    South Africa  0.608653   63027659.0     1357682.0   
42           COL        Colombia  0.581847   60543682.0     1936134.0   
199          GBR  United Kingdom  0.922000   59475032.0     7249573.0   

     Stringency Index  Population  
200          3.350949   19.617637  
27           3.136028   19.174732  
90           3.610552   21.045353  
157          3.380088   18.798668  
150          3.430126   17.311165  

Now I will add two more columns (GDP per capita before Covid-19, GDP per capita during Covid-19) to this dataset

In [16]:
# Let's add the two columns GDP
data["GDP Before Covid"] = [65279.53, 8897.49, 2100.75, 
                            11497.65, 7027.61, 9946.03, 
                            29564.74, 6001.40, 6424.98, 42354.41]
data["GDP During Covid"] = [63543.58, 6796.84, 1900.71, 
                            10126.72, 6126.87, 8346.70, 
                            27057.16, 5090.72, 5332.77, 40284.64]
print(data)

    Country Code         Country       HDI  Total Cases  Total Deaths  \
200          USA   United States  0.924000  746014098.0    26477574.0   
27           BRA          Brazil  0.759000  425704517.0    14340567.0   
90           IND           India  0.640000  407771615.0     7247327.0   
157          RUS          Russia  0.816000  132888951.0     2131571.0   
150          PER            Peru  0.599490   74882695.0     3020038.0   
125          MEX          Mexico  0.774000   74347548.0     7295850.0   
178          ESP           Spain  0.887969   73717676.0     5510624.0   
175          ZAF    South Africa  0.608653   63027659.0     1357682.0   
42           COL        Colombia  0.581847   60543682.0     1936134.0   
199          GBR  United Kingdom  0.922000   59475032.0     7249573.0   

     Stringency Index  Population  GDP Before Covid  GDP During Covid  
200          3.350949   19.617637          65279.53          63543.58  
27           3.136028   19.174732           8897.49 

The GDP per capita figures above were collected manually.

##### *Analyzing the Spread of Covid-19* #####

Now let’s start by analyzing the spread of Covid-19 in the countries with the highest number of Covid-19 cases.

I will first have a look at the total cases in each of these countries.

In [17]:
# Let's check all the countries with the highest number of covid-19 cases
figure = px.bar(data, y='Total Cases', x='Country',
            title="Countries with Highest Covid Cases")
figure.show()

We can see that the USA has a far higher number of Covid-19 cases than Brazil and India, which are in the second and third positions.

Now let’s have a look at the total number of deaths among the countries with the highest number of Covid-19 cases.

In [18]:
# Let's check the total number of deaths among the countries with the highest number of covid-19 cases
figure = px.bar(data, y='Total Deaths', x='Country',
            title="Countries with Highest Deaths")
figure.show()


Just like the total number of covid-19 cases, the USA is leading in the deaths, with Brazil and India in the second and third positions.

One thing to notice here is that the death rate in India, Russia, and South Africa is comparatively low relative to their total number of cases.
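
The comparison above can be checked by dividing deaths by cases for each country. Here is a small sketch using figures copied from the top-10 table printed earlier. The frame `top` and the column `Death Rate (%)` are illustrative names, and because both columns are sums over daily records, the ratio is only a rough indicator.

```python
import pandas as pd

# Figures copied from the top-10 table printed earlier
top = pd.DataFrame({
    "Country": ["United States", "Brazil", "India", "Russia", "South Africa"],
    "Total Cases": [746014098.0, 425704517.0, 407771615.0, 132888951.0, 63027659.0],
    "Total Deaths": [26477574.0, 14340567.0, 7247327.0, 2131571.0, 1357682.0],
})

# Deaths as a percentage of cases, per country
top["Death Rate (%)"] = 100 * top["Total Deaths"] / top["Total Cases"]
print(top.sort_values("Death Rate (%)")[["Country", "Death Rate (%)"]])
```

On these figures India, Russia, and South Africa fall between about 1.6 and 2.2 percent, while the United States and Brazil sit around 3.4 to 3.5 percent.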

Now let’s compare the total number of cases and total deaths in all these countries.

In [22]:
# Let's compare the total number of cases and total deaths in all these countries

fig = go.Figure()
fig.add_trace(go.Bar(
    x=data["Country"],
    y=data["Total Cases"],
    name='Total Cases',
    marker_color='indianred'
))
fig.add_trace(go.Bar(
    x=data["Country"],
    y=data["Total Deaths"],
    name='Total Deaths',
    marker_color='lightsalmon'
))
fig.update_layout(barmode='group', xaxis_tickangle=-45)
fig.show()

Now let’s have a look at the percentage of total deaths and total cases among all the countries with the highest number of covid-19 cases

In [24]:
# Let's check the percentage of total deaths and total cases among all the countries with the highest number of covid-19 cases

cases = data["Total Cases"].sum()
deceased = data["Total Deaths"].sum()

labels = ["Total Cases", "Total Deaths"]
values = [cases, deceased]

# values and labels are standalone lists, so no data frame is passed
fig = px.pie(values=values, names=labels, 
             title='Percentage of Total Cases and Deaths', hole=0.5)
fig.show()

Below is how you can calculate the death rate of Covid-19 cases

In [25]:
death_rate = (data["Total Deaths"].sum() / data["Total Cases"].sum()) * 100
print("Death Rate = ", death_rate)

Death Rate =  3.6144212045653767


Another important column in this dataset is the stringency index.

It is a composite measure of response indicators, including school closures, workplace closures, and travel bans. 

It shows how strictly countries are following these measures to control the spread of covid-19.

In [26]:
fig = px.bar(data, x='Country', y='Total Cases',
             hover_data=['Population', 'Total Deaths'], 
             color='Stringency Index', height=400, 
             title= "Stringency Index during Covid-19")
fig.show()

Here we can see that India has a high stringency index during the outbreak of Covid-19, which indicates comparatively strict response measures.

##### *Analyzing Covid-19 Impacts on Economy* #####

Now let’s move to analyze the impacts of covid-19 on the economy.

Here the GDP per capita is the primary factor for analyzing the economic slowdown caused by the outbreak of Covid-19.

Let’s have a look at the GDP per capita before the outbreak of covid-19 among the countries with the highest number of covid-19 cases.

In [27]:
# Let's check the GDP per capita before the outbreak of covid-19 among the countries with the highest number of covid-19 cases

fig = px.bar(data, x='Country', y='Total Cases',
             hover_data=['Population', 'Total Deaths'], 
             color='GDP Before Covid', height=400, 
             title="GDP Per Capita Before Covid-19")
fig.show()

Now let’s have a look at the GDP per capita during the rise in the cases of covid-19

In [28]:
#  Let’s check the GDP per capita during the rise in the cases of covid-19

fig = px.bar(data, x='Country', y='Total Cases',
             hover_data=['Population', 'Total Deaths'], 
             color='GDP During Covid', height=400, 
             title="GDP Per Capita During Covid-19")
fig.show()

Now let’s compare the GDP per capita before covid-19 and during covid-19 to have a look at the impact of covid-19 on the GDP per capita

In [29]:
fig = go.Figure()
fig.add_trace(go.Bar(
    x=data["Country"],
    y=data["GDP Before Covid"],
    name='GDP Per Capita Before Covid-19',
    marker_color='indianred'
))
fig.add_trace(go.Bar(
    x=data["Country"],
    y=data["GDP During Covid"],
    name='GDP Per Capita During Covid-19',
    marker_color='lightsalmon'
))
fig.update_layout(barmode='group', xaxis_tickangle=-45)
fig.show()

You can see a drop in GDP per capita in all the countries with the highest number of covid-19 cases.
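
The size of the drop can be quantified as a percentage change. A minimal sketch, using the manually collected figures from the two GDP columns above (the frame `gdp` and the column `Change (%)` are illustrative names):

```python
import pandas as pd

# GDP per capita figures copied from the two manually collected columns
gdp = pd.DataFrame({
    "Country": ["United States", "Brazil", "India", "Russia", "Peru",
                "Mexico", "Spain", "South Africa", "Colombia", "United Kingdom"],
    "GDP Before Covid": [65279.53, 8897.49, 2100.75, 11497.65, 7027.61,
                         9946.03, 29564.74, 6001.40, 6424.98, 42354.41],
    "GDP During Covid": [63543.58, 6796.84, 1900.71, 10126.72, 6126.87,
                         8346.70, 27057.16, 5090.72, 5332.77, 40284.64],
})

# Percentage change relative to the pre-Covid figure (negative means a drop)
gdp["Change (%)"] = (
    100 * (gdp["GDP During Covid"] - gdp["GDP Before Covid"]) / gdp["GDP Before Covid"]
)
print(gdp[["Country", "Change (%)"]].round(2))
```

On these numbers the drop ranges from under 3 percent for the United States to over 23 percent for Brazil.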

Another important factor is the Human Development Index (HDI).

It is a composite statistical index of life expectancy, education, and per capita income indicators.

Let’s have a look at the HDI of the countries with the highest number of Covid-19 cases.

In [30]:
# Let’s check the human development index of these countries

fig = px.bar(data, x='Country', y='Total Cases',
             hover_data=['Population', 'Total Deaths'], 
             color='HDI', height=400, 
             title="Human Development Index during Covid-19")
fig.show()

So this is how we can analyze the spread of Covid-19 and its impact on the economy.

##### *Conclusion* #####

In this task, we studied the spread of covid-19 among the countries and its impact on the global economy. 

We saw that, among the countries studied, the United States recorded the highest number of Covid-19 cases and deaths.

One possible reason behind this is the stringency index of the United States.

It is comparatively low relative to its population.

We also analyzed how the GDP per capita of the ten most affected countries changed during the outbreak of Covid-19.