#### **Data exploration**
Here, we import the dataset and perform some data manipulation and visualization to get familiar with the data at hand before proceeding
with the data engineering step.

In [2]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

data_transformed = pd.read_csv('Covid-19 impacts on global economy/dataset/transformed_data.csv')
data_raw = pd.read_csv('Covid-19 impacts on global economy/dataset/raw_data.csv')

In [3]:
data_transformed.head()

Unnamed: 0,CODE,COUNTRY,DATE,HDI,TC,TD,STI,POP,GDPCAP
0,AFG,Afghanistan,2019-12-31,0.498,0.0,0.0,0.0,17.477233,7.497754
1,AFG,Afghanistan,2020-01-01,0.498,0.0,0.0,0.0,17.477233,7.497754
2,AFG,Afghanistan,2020-01-02,0.498,0.0,0.0,0.0,17.477233,7.497754
3,AFG,Afghanistan,2020-01-03,0.498,0.0,0.0,0.0,17.477233,7.497754
4,AFG,Afghanistan,2020-01-04,0.498,0.0,0.0,0.0,17.477233,7.497754


In [4]:
data_raw.head()

Unnamed: 0,iso_code,location,date,total_cases,total_deaths,stringency_index,population,gdp_per_capita,human_development_index,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13
0,AFG,Afghanistan,2019-12-31,0.0,0.0,0.0,38928341,1803.987,0.498,#NUM!,#NUM!,#NUM!,17.477233,7.497754494
1,AFG,Afghanistan,2020-01-01,0.0,0.0,0.0,38928341,1803.987,0.498,#NUM!,#NUM!,#NUM!,17.477233,7.497754494
2,AFG,Afghanistan,2020-01-02,0.0,0.0,0.0,38928341,1803.987,0.498,#NUM!,#NUM!,#NUM!,17.477233,7.497754494
3,AFG,Afghanistan,2020-01-03,0.0,0.0,0.0,38928341,1803.987,0.498,#NUM!,#NUM!,#NUM!,17.477233,7.497754494
4,AFG,Afghanistan,2020-01-04,0.0,0.0,0.0,38928341,1803.987,0.498,#NUM!,#NUM!,#NUM!,17.477233,7.497754494


As presented above, we have two datasets: the `raw dataset` and the `transformed one`. They share some columns and values. However, the raw dataset contains more columns, which could be useful for the next steps. Thus we combine both datasets, keeping only the columns
we find useful.
1. First, we count how many rows each country has in the dataset and retrieve the `mode()` of these counts. This gives us an idea of how the data is distributed across countries,
so we can then normalize the data with the most common row count.
2. We retrieve the columns that interest us and normalize their values by dividing their `sum()` by the `mode()`.
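
To make the two steps above concrete, here is a minimal sketch on made-up data (the `toy` frame and its values are hypothetical, not taken from the dataset). It shows that dividing a per-country `sum()` by the mode of the row counts matches the per-country mean only for countries that have exactly that many rows; a country with fewer rows ends up with a smaller value.

```python
import pandas as pd

# Hypothetical toy data: countries A and B have 3 daily rows each, C has only 2.
toy = pd.DataFrame({
    "COUNTRY": ["A", "A", "A", "B", "B", "B", "C", "C"],
    "HDI":     [0.5, 0.5, 0.5, 0.8, 0.8, 0.8, 0.6, 0.6],
})

# Step 1: rows per country, then the most common row count.
rows_per_country = toy["COUNTRY"].value_counts()
common_rows = int(rows_per_country.mode().iloc[0])

# Step 2: normalize the per-country sum by that common row count.
hdi_sum_over_mode = toy.groupby("COUNTRY")["HDI"].sum() / common_rows

# For comparison: the plain per-country mean.
hdi_mean = toy.groupby("COUNTRY")["HDI"].mean()

print(pd.DataFrame({"sum / mode": hdi_sum_over_mode, "mean": hdi_mean}))
```

If every country had the same number of rows, the two columns would agree; using `mean()` directly avoids the dependence on the row count.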

In [46]:
data_transformed['COUNTRY'].value_counts().mode()

0    294
dtype: int64

In [50]:
country_code = data_transformed['CODE'].unique().tolist()
country = data_transformed['COUNTRY'].unique().tolist()
# NOTE: population and date start out as the unique values of the whole column.
# zip() below stops at the shortest list (one entry per country), so row i generally
# gets the i-th unique POP / DATE value and the values appended in the loop are dropped.
population = data_transformed['POP'].unique().tolist()
date = data_transformed['DATE'].unique().tolist()
hdi = []
tc = []
td = []
sti = []

for element in country:
    # 294 is the most common number of rows per country (the mode computed above)
    hdi.append((data_transformed.loc[data_transformed['COUNTRY'] == element, 'HDI']).sum()/294)
    tc.append((data_raw.loc[data_raw['location'] == element, 'total_cases']).sum())
    td.append((data_raw.loc[data_raw['location']== element, 'total_deaths']).sum())
    sti.append((data_transformed.loc[data_transformed['COUNTRY']== element, 'STI']).sum()/294)
    population.append((data_raw.loc[data_raw['location'] == element, 'population']).sum())
    date.append((data_raw.loc[data_raw['location']== element, 'date']))

combined_data = pd.DataFrame(list(zip(country_code, country, hdi, tc, td, date, sti, population)), columns=
                                ["Country Code", "Country", "HDI", 
                                "Total Cases", "Total Deaths", "Date",
                                "Stringency Index", "Population"])

combined_data.head()

Unnamed: 0,Country Code,Country,HDI,Total Cases,Total Deaths,Date,Stringency Index,Population
0,AFG,Afghanistan,0.498,5126433.0,165875.0,2019-12-31,3.049673,17.477233
1,ALB,Albania,0.600765,1071951.0,31056.0,2020-01-01,3.005624,14.872537
2,DZA,Algeria,0.754,4893999.0,206429.0,2020-01-02,3.195168,17.596309
3,AND,Andorra,0.659551,223576.0,9850.0,2020-01-03,2.677654,11.254996
4,AGO,Angola,0.418952,304005.0,11820.0,2020-01-04,2.96556,17.307957


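As shown above, the new dataset contains one row per country, with values aggregated across the
different columns. Note that `Total Cases` and `Total Deaths` are sums over each country's daily rows; if the raw columns are running totals, as their names suggest, these sums are much larger than the final counts and are best read as a ranking measure. The next step consists of sorting the values by `Total Cases` and taking the 10 countries with the highest
case counts for the subsequent analysis.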
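
The per-country aggregation and top-10 selection described above can also be sketched with `groupby` and `nlargest`. This is an alternative sketch, not the code used in this notebook: the `raw` frame is made-up data that only borrows the column names of the raw dataset, and the output column names are chosen for illustration.

```python
import pandas as pd

# Made-up stand-in for the raw data (values are hypothetical).
raw = pd.DataFrame({
    "iso_code": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "location": ["Country A", "Country A", "Country A", "Country B", "Country B"],
    "total_cases": [0.0, 10.0, 25.0, 5.0, 40.0],
    "total_deaths": [0.0, 1.0, 2.0, 0.0, 3.0],
    "stringency_index": [0.0, 50.0, 70.0, 20.0, 60.0],
})

# One row per country, built in a single pass with named aggregations.
combined = (
    raw.groupby(["iso_code", "location"], as_index=False)
       .agg(total_cases_sum=("total_cases", "sum"),      # sum over daily rows
            total_cases_latest=("total_cases", "max"),   # largest running total
            total_deaths_sum=("total_deaths", "sum"),
            stringency_mean=("stringency_index", "mean"))
)

# Sort and keep the (up to) 10 countries with the largest summed case counts.
top = combined.nlargest(10, "total_cases_sum")
print(top)
```

Keeping both the sum and the maximum side by side makes it easy to see how the two measures differ when the column holds running totals.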

In [48]:
# Sort the combined data by the total number of Covid cases
data = combined_data.sort_values(by=['Total Cases'], ascending=False)

In [49]:
# Get the 10 countries with the highest number of Covid cases
sorted_data = data.head(10)
print(sorted_data)

    Country Code         Country       HDI  Total Cases  Total Deaths  \
200          USA   United States  0.924000  746014098.0    26477574.0   
27           BRA          Brazil  0.759000  425704517.0    14340567.0   
90           IND           India  0.640000  407771615.0     7247327.0   
157          RUS          Russia  0.816000  132888951.0     2131571.0   
150          PER            Peru  0.599490   74882695.0     3020038.0   
125          MEX          Mexico  0.774000   74347548.0     7295850.0   
178          ESP           Spain  0.887969   73717676.0     5510624.0   
175          ZAF    South Africa  0.608653   63027659.0     1357682.0   
42           COL        Colombia  0.581847   60543682.0     1936134.0   
199          GBR  United Kingdom  0.922000   59475032.0     7249573.0   

           Date  Stringency Index  Population  
200  2020-07-18          3.350949   19.617637  
27   2020-01-27          3.136028   19.174732  
90   2020-03-30          3.610552   21.045353  
157 

#### **Adding the GDP per capita Before and After Covid-19 for the countries with highest cases**
The dataset does not include the GDP per capita before and after Covid-19, so
we collected these values manually and added them as two new columns (named `GPG Before Covid` and `GPG After Covid` in the code).

In [10]:
GPGBeforeCovid = [65279.53, 8897.49, 2100.75, 11497.65, 7027.61, 9946.03,29564.74, 6001.40, 6424.98, 42354.41]
GPGAfterCovid = [63543.58, 6796.84, 1900.71, 10126.72, 6126.87, 8346.70, 27057.16, 5090.72, 5332.77, 40284.64]

sorted_data.insert(5,'GPG Before Covid', GPGBeforeCovid)
sorted_data.insert(6,'GPG After Covid', GPGAfterCovid)

# another way of adding the new columns (appended at the end instead of at positions 5 and 6)
# sorted_data['GPG Before Covid'] = GPGBeforeCovid
# sorted_data['GPG After Covid'] = GPGAfterCovid

print(sorted_data)

    Country Code         Country       HDI  Total Cases  Total Deaths  \
200          USA   United States  0.924000  746014098.0    26477574.0   
27           BRA          Brazil  0.759000  425704517.0    14340567.0   
90           IND           India  0.640000  407771615.0     7247327.0   
157          RUS          Russia  0.816000  132888951.0     2131571.0   
150          PER            Peru  0.599490   74882695.0     3020038.0   
125          MEX          Mexico  0.774000   74347548.0     7295850.0   
178          ESP           Spain  0.887969   73717676.0     5510624.0   
175          ZAF    South Africa  0.608653   63027659.0     1357682.0   
42           COL        Colombia  0.581847   60543682.0     1936134.0   
199          GBR  United Kingdom  0.922000   59475032.0     7249573.0   

     GPG Before Covid  GPG After Covid        Date  Stringency Index  \
200          65279.53         63543.58  2020-07-18          3.350949   
27            8897.49          6796.84  2020-01-27  
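
The manual lists above are matched to rows purely by position, so they silently depend on the row order of `sorted_data`. A hedged alternative is to key the values by country code and use `Series.map`. In the sketch below, the pairing of values to codes assumes the lists follow the top-10 order shown above (USA, Brazil, India, ...); the `top` frame and the `GDP ...` column names are illustrative only.

```python
import pandas as pd

# Illustrative frame; the row order deliberately differs from the lists above.
top = pd.DataFrame({"Country Code": ["BRA", "USA", "IND"]})

# Assumption: the first three list entries belong to USA, BRA and IND.
gdp_before = {"USA": 65279.53, "BRA": 8897.49, "IND": 2100.75}
gdp_after = {"USA": 63543.58, "BRA": 6796.84, "IND": 1900.71}

# map() looks each code up in the dict, so row order no longer matters.
top = top.assign(**{
    "GDP Before Covid": top["Country Code"].map(gdp_before),
    "GDP After Covid": top["Country Code"].map(gdp_after),
})
print(top)
```

A code missing from the dictionary yields `NaN` rather than a shifted value, which makes gaps easier to spot.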

#### **Analyzing the Spread of Covid-19**
Here, we perform a series of analyses of Covid-19, such as:
* Highest Covid-19 Cases among countries
* Highest Covid-19 Deaths among countries
* Highest Covid-19 Cases vs Highest Covid-19 Deaths among countries
* Percentage of Total Deaths and Total Cases among countries
* Stringency Index impact on Total Deaths and Total Cases among countries
* GDP per capita before Covid-19
* GDP per capita after Covid-19
* GDP per capita before vs GDP per capita after Covid-19
* HDI (Human Development Index) during Covid-19

* **Highest Covid-19 Cases among countries**

Here we can see that the country with the highest number of Covid-19 cases is the `USA`, followed by `Brazil` and `India`,
while the other countries show relatively low counts.

In [18]:
figure = px.bar(sorted_data, x = 'Country', y = 'Total Cases', hover_data=['Total Deaths', 'Population'], color = 'Date')
figure.update_layout(title = 'Highest Covid Cases per Countries', title_x = 0.5, xaxis_tickangle = -45, height = 400, width = 800)
figure.show()

* **Highest Covid-19 Deaths among countries**

Here as well, the `USA` leads with the highest number of deaths, followed by `Brazil`.
`India`, however, has a low number of deaths relative to its number of cases, and the same applies to `Russia`.
Conversely, `Mexico` has a comparatively low number of cases but a high number of deaths;
the same applies to the `United Kingdom`, `Peru`, and `Spain`.

In [12]:
figure = px.bar(sorted_data, x = 'Country', y = 'Total Deaths', hover_data=['Total Cases', 'Population'], color = 'Date')
figure.update_layout(title = 'Highest Covid Death per Countries', title_x = 0.5, xaxis_tickangle = -45, height = 400, width = 800)
figure.show()

* **Highest Covid-19 Cases vs Highest Covid-19 Deaths among countries**

After analyzing the total Covid-19 cases and deaths separately,
we now compare the two side by side for each country.

In [51]:
fig = go.Figure()
fig.add_trace(go.Bar(x = sorted_data['Country'], y = sorted_data['Total Cases'], name = 'Total Cases'))
fig.add_trace(go.Bar(x = sorted_data['Country'], y = sorted_data['Total Deaths'], name = 'Total Deaths'))
fig.update_layout(barmode = 'group', xaxis_tickangle = -45, height = 400, width = 800)
fig.show()

* **Percentage of Total Deaths and Total Cases among countries**

We analyze the relative share of total deaths and total
cases, summed over the countries with the highest number of Covid-19 cases:

In [31]:
cases = sorted_data['Total Cases'].sum()
deaths = sorted_data['Total Deaths'].sum()
labels = ['Total Cases', 'Total Deaths']
values = [cases, deaths]
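# color must be set for color_discrete_map to take effect; the values are plain lists, so no data frame is passed
fig = px.pie(values = values, names = labels, color = labels, color_discrete_map={'Total Cases':'orange','Total Deaths':'cyan'}, hole=0.5)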
fig.update_layout(title = 'Percentage of Total Cases and Deaths', title_x = 0.5, height = 400, width = 800)
fig.show()

The overall death rate, that is, total deaths as a percentage of total cases across the ten countries, is computed below:

In [56]:
Death_rate = (sorted_data['Total Deaths'].sum() / sorted_data['Total Cases'].sum()) * 100
Death_rate = round(Death_rate, 2)
print(f" The total death rate is: {Death_rate}")

 The total death rate is: 3.61
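
The same ratio can also be computed per country rather than for all ten countries together. The sketch below is illustrative only: it copies the first three rows of the top-10 table above into a small frame, and the `Death Rate (%)` column name is an assumption introduced here.

```python
import pandas as pd

# Figures copied from the first three rows of the top-10 table above.
top3 = pd.DataFrame({
    "Country": ["United States", "Brazil", "India"],
    "Total Cases": [746014098.0, 425704517.0, 407771615.0],
    "Total Deaths": [26477574.0, 14340567.0, 7247327.0],
})

# Deaths as a percentage of cases, computed row by row.
top3["Death Rate (%)"] = (top3["Total Deaths"] / top3["Total Cases"] * 100).round(2)
print(top3[["Country", "Death Rate (%)"]])
```

Applied to `sorted_data`, the same one-line expression would give a death rate for each of the ten countries, which makes the differences discussed earlier directly comparable.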


* **Stringency Index impact on Total Deaths and Total Cases among countries**

We analyze the impact of Covid-19 against the Stringency Index.
The `stringency index` is a composite measure
based on nine response indicators, including school closures,
workplace closures, and travel bans, rescaled to a value from 0 to 100 (100 = strictest).

In [35]:
fig = px.bar(sorted_data, x = 'Country', y = 'Total Cases', hover_data = ['Population', 'Total Deaths', 'Date'], color = 'Stringency Index')
fig.update_layout(title = 'Stringency Index Impact on Total Cases and Death during Covid-19', title_x = 0.5, xaxis_tickangle = -45, height = 400, width = 800)
fig.show()

#### **Analyzing Covid-19 Impacts on Economy**

We move on to analyzing the impact of Covid-19 on the economy by looking at GDP per capita (stored in the `GPG Before Covid` and `GPG After Covid` columns).
GDP per capita is gross domestic product divided by midyear population.
GDP at purchaser's prices is the sum of gross value added by all resident
producers in the economy plus any product taxes and minus any subsidies not included in the value of the products.

* **GDP per capita before Covid-19**


In [38]:
fig = px.bar(sorted_data, x = 'Country', y = 'Total Cases', hover_data = ['Population', 'Total Deaths'], color = 'GPG Before Covid')
fig.update_layout(title = 'GDP per capita before covid', title_x = 0.5, xaxis_tickangle = -45, height = 400, width = 800)
fig.show()

* **GDP per capita after Covid-19**

In [39]:
fig = px.bar(sorted_data, x = 'Country', y = 'Total Cases', hover_data = ['Population', 'Total Deaths'], color = 'GPG After Covid')
fig.update_layout(title = 'GDP per capita After covid', title_x = 0.5, xaxis_tickangle = -45, height = 400, width = 800)
fig.show()

* **GDP per capita before vs GDP per capita after Covid-19**

In [42]:
fig = go.Figure()
fig.add_trace(go.Bar(x=sorted_data['Country'], y=sorted_data['GPG Before Covid'], name='GPG Before Covid'))
fig.add_trace(go.Bar(x=sorted_data['Country'], y=sorted_data['GPG After Covid'], name='GPG After Covid'))
fig.update_layout(title='GDP per capita before and after Covid', title_x = 0.5, xaxis_tickangle = -45, height = 400, width = 800)
fig.show()
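
To put a number on the before/after comparison, the relative change in GDP per capita can be computed from the two manually collected lists. This is a small sketch in plain Python; it assumes the lists follow the order of the top-10 table (United States first, United Kingdom last), and the `countries` list is written out here only for labeling.

```python
# Assumption: list order matches the top-10 table shown earlier.
countries = ["United States", "Brazil", "India", "Russia", "Peru",
             "Mexico", "Spain", "South Africa", "Colombia", "United Kingdom"]
before = [65279.53, 8897.49, 2100.75, 11497.65, 7027.61,
          9946.03, 29564.74, 6001.40, 6424.98, 42354.41]
after = [63543.58, 6796.84, 1900.71, 10126.72, 6126.87,
         8346.70, 27057.16, 5090.72, 5332.77, 40284.64]

# Percentage change relative to the pre-Covid value.
pct_change = {c: (a - b) / b * 100 for c, b, a in zip(countries, before, after)}

# Print from the largest drop to the smallest.
for name, change in sorted(pct_change.items(), key=lambda kv: kv[1]):
    print(f"{name:<15} {change:7.2f}%")
```

Expressing the drop as a percentage makes countries with very different GDP levels comparable, which the grouped bar chart alone does not.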

* **HDI (Human Development Index) during Covid-19**

Now let's have a look at the HDI during Covid-19.
The `HDI` is a summary composite measure of a country's average
achievements in three basic aspects of human development: health,
knowledge and standard of living.

In [43]:
fig = px.bar(sorted_data, x = 'Country', y = 'Total Cases', hover_data = ['Population', 'Total Deaths'], color = 'HDI')
fig.update_layout(title = 'Human Development Index during Covid-19', title_x = 0.5, xaxis_tickangle = -45, height = 400, width = 800)
fig.show()

#### **Conclusion**

We investigated the spread of Covid-19 across countries as well as its impact on the global economy, and walked through our approach and its implementation in detail. Our findings are as follows. Among the countries studied, the United States had the greatest number of Covid-19 cases and deaths. One possible explanation is the United States' stringency index, which is quite low. We also looked at how each country's GDP per capita was affected by the Covid-19 outbreak: for all ten countries, the GDP per capita figures we collected are lower after Covid-19 than before.