# Impact of Covid-19 - Geographic Visualization

In this notebook, I visualize the data in raw_data.csv with interactive plots (plotly) and geographical plots (geoplot).

This data set contains data for 170 countries on the impact of Covid-19 on the global economy.

In particular, the following three indicators are used to show the policy and economic situation of each country.

1. [Stringency Index](https://data.humdata.org/dataset/oxford-covid-19-government-response-tracker): A composite score that measures and aggregates systematically collected information on several common policy responses governments have taken.

1. gdp_per_capita: Gross Domestic Product (GDP) per capita.

1. [human_development_index](https://en.wikipedia.org/wiki/Human_Development_Index): Composite statistic of life expectancy, education, literacy and income indices used to rank countries into four stages of human development.

For readers interested in this dataset, I provide an overview of the data and example analyses using line plots and geographical plots.

## Load libraries and data

In [None]:
from datetime import datetime as dt
import os
import warnings
warnings.filterwarnings("ignore")

import geopandas as gpd
import geoplot as gplt
import mapclassify
import matplotlib.pyplot as plt
from matplotlib import colors, cm
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import plotly.express as px
import re
import seaborn as sns

In [None]:
!ls ../input/impact-of-covid19-pandemic-on-the-global-economy

In [None]:
raw_data = pd.read_csv("../input/impact-of-covid19-pandemic-on-the-global-economy/raw_data.csv")
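# Keep only the columns used in this analysis; the rest are not needed.
cols = ["iso_code", "location", "date", "total_cases", "total_deaths", "stringency_index", "population", "gdp_per_capita", "human_development_index"]
# .copy() avoids SettingWithCopyWarning when columns are added later.
raw_data = raw_data[cols].copy()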

## Data overview

Let's take a first look at the data.

In [None]:
raw_data.head()

In [None]:
raw_data.info()

In [None]:
raw_data.describe()

In [None]:
raw_data.isna().sum()*100/len(raw_data)
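
The cell above reports the percentage of missing values per column. As a minimal sketch of the same idea, the toy frame below (made-up values, not taken from raw_data.csv) uses the mean of the boolean mask, which gives the same number as `sum() * 100 / len()`.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for raw_data (values are invented for illustration).
toy = pd.DataFrame({
    "total_deaths": [1.0, np.nan, 3.0, np.nan],
    "stringency_index": [10.0, 20.0, np.nan, 40.0],
    "population": [100, 100, 100, 100],
})

# isna() yields booleans, so the column mean is the fraction of missing rows.
missing_pct = toy.isna().mean() * 100

# Sort so the columns with the most gaps come first.
print(missing_pct.sort_values(ascending=False))
```

Sorting the result makes it easier to spot which indicators need care before averaging or plotting.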

## Preprocess data

Let's preprocess the data for later analysis.

In [None]:
# Parse the date strings once, then derive year and month with the .dt accessor.
raw_data["date"] = pd.to_datetime(raw_data["date"], format="%Y-%m-%d")
raw_data["year"] = raw_data["date"].dt.year
raw_data["month"] = raw_data["date"].dt.month

In [None]:
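# Monthly mean per country. numeric_only=True skips non-numeric columns
# such as "location", which would otherwise raise an error in recent pandas.
data_mean = raw_data.groupby(["iso_code", "year", "month"]).mean(numeric_only=True)
data_mean = data_mean.reset_index()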
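
To make the monthly aggregation concrete, here is a small sketch on invented rows. The column names mirror the notebook, but the countries and numbers are hypothetical.

```python
import pandas as pd

# Hypothetical daily rows: two in March and one in April for AAA, one for BBB.
toy = pd.DataFrame({
    "iso_code": ["AAA", "AAA", "AAA", "BBB"],
    "location": ["Country A", "Country A", "Country A", "Country B"],
    "year": [2020, 2020, 2020, 2020],
    "month": [3, 3, 4, 3],
    "total_deaths": [10.0, 20.0, 40.0, 5.0],
})

# One row per (country, year, month); the text column is left out of the mean.
toy_mean = (
    toy.groupby(["iso_code", "year", "month"])
       .mean(numeric_only=True)
       .reset_index()
)
print(toy_mean)
```

Note that total_deaths is cumulative, so its monthly mean is an average level within the month rather than the number of deaths in that month.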

In [None]:
def convert_YM2datetime(cols):
    year, month = cols
    str_tmp = str(year) + "-" + str(month)
    date = dt.strptime(str_tmp, '%Y-%m')
    return date

data_mean["date"] = data_mean[["year", "month"]].apply(convert_YM2datetime, axis=1)
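
As an alternative to the row-wise `apply` above, pandas can assemble dates from year, month, and day columns directly. This is a sketch on a toy frame, assuming the first day of each month is an acceptable timestamp for the monthly value.

```python
import pandas as pd

# Toy year/month pairs (invented); data_mean has the same two columns.
toy = pd.DataFrame({"year": [2019, 2020, 2020], "month": [12, 2, 10]})

# pd.to_datetime accepts a frame with year/month/day columns.
# day=1 is an assumption: each month is pinned to its first day.
toy["date"] = pd.to_datetime(toy[["year", "month"]].assign(day=1))
print(toy)
```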

In [None]:
NORM_TOTAL_DEATH = colors.Normalize(np.nanmin(data_mean["total_deaths"]), np.nanmax((data_mean["total_deaths"])))
NORM_STRINGENCY = colors.Normalize(np.nanmin(data_mean["stringency_index"]), np.nanmax((data_mean["stringency_index"])))
NORM_GDP_PER_CAPITA = colors.Normalize(np.nanmin(data_mean["gdp_per_capita"]), np.nanmax((data_mean["gdp_per_capita"])))
NORM_HUMAN_DEVELOPMENT = colors.Normalize(np.nanmin(data_mean["human_development_index"]), np.nanmax((data_mean["human_development_index"])))
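
The four `Normalize` objects above are built once from the full range of data_mean and are later passed to every monthly map, so a given color stands for the same value in every month. The sketch below, with invented numbers, contrasts a shared scale with one that is rescaled per month.

```python
import numpy as np
from matplotlib import colors

# Invented monthly values for three countries.
march = np.array([0.0, 10.0, 20.0])
april = np.array([0.0, 50.0, 100.0])

# Shared scale: fixed limits taken from the overall range.
shared = colors.Normalize(vmin=0.0, vmax=100.0)
print(shared(march))  # March stays in the lower part of the color range
print(shared(april))

# Per-month scale: a fresh Normalize() autoscales to whatever it is given,
# so March would use the full color range and hide the growth.
print(colors.Normalize()(march))
```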

In [None]:
data_mean.head()

## Check which countries have data

Let's find out which countries have data. The location column has 210 distinct values, more than the 170 countries stated in the introduction, so some countries seem to be registered under more than one name.

In [None]:
print(f"There are {len(set(raw_data['location']))} countries' data.\n")
print(set(raw_data["location"]))
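
One way to test the suspicion that a country appears under several names is to count distinct location values per iso_code. The frame below is a hypothetical stand-in for raw_data; whether the real data contains such duplicates is not confirmed here.

```python
import pandas as pd

# Hypothetical rows: code BBB appears under two different names.
toy = pd.DataFrame({
    "iso_code": ["AAA", "AAA", "BBB", "BBB"],
    "location": ["Country A", "Country A", "Country B", "Republic of B"],
})

# Compare the number of distinct codes and distinct names.
print(toy["iso_code"].nunique(), toy["location"].nunique())

# Codes that map to more than one name are the suspected duplicates.
names_per_code = toy.groupby("iso_code")["location"].nunique()
multi_named = names_per_code[names_per_code > 1]
print(multi_named)
```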

In the map below, countries with data are shown in purple and countries without data in yellow. The map is drawn with geoplot by matching ISO codes.

In [None]:
# Note: gpd.datasets was deprecated in geopandas 0.14 and removed in 1.0,
# so this call needs an older geopandas version.
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))

# Flag each country by whether its ISO code appears in the data set.
# Assigning on the full frame avoids SettingWithCopyWarning on slices.
has_data = world["iso_a3"].isin(set(raw_data["iso_code"]))
world["is_in_data"] = np.where(has_data, "data exists", "no data")

gplt.choropleth(world, hue='is_in_data', legend=True)
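
The map only shows which map entries have a match. To also list codes that exist in the data but not on the map, an outer merge with `indicator=True` is one option. The codes below are invented, and "-99" is used only as an illustrative placeholder for a map entry without a valid ISO code.

```python
import pandas as pd

# Invented code lists standing in for world["iso_a3"] and raw_data["iso_code"].
map_codes = pd.DataFrame({"iso_a3": ["AAA", "BBB", "-99"]})
data_codes = pd.DataFrame({"iso_code": ["AAA", "CCC"]})

# indicator=True adds a _merge column: left_only, right_only, or both.
coverage = map_codes.merge(
    data_codes, left_on="iso_a3", right_on="iso_code",
    how="outer", indicator=True,
)

only_on_map = coverage.loc[coverage["_merge"] == "left_only", "iso_a3"].tolist()
only_in_data = coverage.loc[coverage["_merge"] == "right_only", "iso_code"].tolist()
print("on the map without data:", sorted(only_on_map))
print("in the data but not on the map:", sorted(only_in_data))
```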

## Correlation between variables

There seems to be little correlation between "stringency_index" and either "total_cases" or "total_deaths".

In [None]:
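corr_cols = ["total_cases", "total_deaths", "stringency_index", "population", "gdp_per_capita", "human_development_index"]
# clustermap creates its own figure, so the size is passed via figsize
# instead of calling plt.figure() first (which would leave an empty figure).
sns.clustermap(raw_data[corr_cols].corr(), annot=True, figsize=(10, 8))
plt.show()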
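
The matrix above pools all countries and dates into one sample. As a hedged sketch, the toy example below shows that a pooled correlation and per-country correlations can differ, which may be worth checking before reading too much into the pooled numbers. All values are invented.

```python
import pandas as pd

# Invented data: within AAA the two series rise together,
# within BBB stringency falls while cases rise.
toy = pd.DataFrame({
    "iso_code": ["AAA"] * 4 + ["BBB"] * 4,
    "total_cases": [1, 2, 3, 4, 100, 200, 300, 400],
    "stringency_index": [10, 20, 30, 40, 80, 60, 40, 20],
})

# Pearson correlation over all rows at once.
pooled = toy["total_cases"].corr(toy["stringency_index"])

# Pearson correlation computed separately for each country.
per_country = pd.Series({
    code: group["total_cases"].corr(group["stringency_index"])
    for code, group in toy.groupby("iso_code")
})

print("pooled:", pooled)
print(per_country)
```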

## Data transition for each month

Let's take a look at how the monthly averages of the following variables have changed over time.

* total_deaths

* stringency_index

* gdp_per_capita

* human_development_index

We can see that total_deaths keeps rising in the United States, Brazil, Mexico, and India. The economic impact is likely to be larger in these countries.

In [None]:
fig = px.line(data_mean, x="date", y="total_deaths", color="iso_code", 
              title='total_deaths',
              template="simple_white")
fig.show()
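
With one line per country the plot is crowded, so it can help to list the countries with the highest latest value alongside it. This is a minimal sketch on invented monthly means; the column names follow data_mean.

```python
import pandas as pd

# Invented monthly means for three hypothetical countries.
toy_mean = pd.DataFrame({
    "iso_code": ["AAA", "AAA", "BBB", "BBB", "CCC", "CCC"],
    "date": pd.to_datetime(["2020-03-01", "2020-04-01"] * 3),
    "total_deaths": [10.0, 50.0, 5.0, 8.0, 20.0, 90.0],
})

# Latest non-missing value per country, then the two largest.
latest = (
    toy_mean.sort_values("date")
            .groupby("iso_code")["total_deaths"]
            .last()
)
top = latest.nlargest(2)
print(top)
```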

We can see that stringency_index rises in each country until around April and stays constant or falls after that.

In [None]:
fig = px.line(data_mean, x="date", y="stringency_index", color="iso_code", 
              title='stringency_index',
              template="simple_white")
fig.show()
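
The April peak can also be checked numerically by finding, for each country, the month in which the monthly mean of stringency_index is highest. The sketch below uses invented values; rows with missing stringency are dropped first because `idxmax` cannot pick a row from an all-missing group.

```python
import pandas as pd

# Invented monthly means for two hypothetical countries.
toy_mean = pd.DataFrame({
    "iso_code": ["AAA"] * 3 + ["BBB"] * 3,
    "date": pd.to_datetime(["2020-03-01", "2020-04-01", "2020-05-01"] * 2),
    "stringency_index": [40.0, 80.0, 60.0, 30.0, 70.0, 70.0],
})

valid = toy_mean.dropna(subset=["stringency_index"])

# idxmax returns the row label of the first maximum within each country.
peak_rows = valid.loc[valid.groupby("iso_code")["stringency_index"].idxmax()]
peak_month = peak_rows.set_index("iso_code")["date"]
print(peak_month)

# How many countries peak in each month.
print(peak_month.value_counts())
```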

gdp_per_capita is constant over the entire period for every country.

In [None]:
fig = px.line(data_mean, x="date", y="gdp_per_capita", color="iso_code", 
              title='gdp_per_capita',
              template="simple_white")
fig.show()

human_development_index is also constant over the entire period for every country.

In [None]:
fig = px.line(data_mean, x="date", y="human_development_index", color="iso_code", 
              title='human_development_index',
              template="simple_white")
fig.show()
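
Whether gdp_per_capita and human_development_index really hold a single value per country can be verified without a plot by counting distinct values per iso_code. The frame below is a toy example in which one country deliberately breaks the pattern.

```python
import pandas as pd

# Invented rows: BBB has two different gdp_per_capita values.
toy = pd.DataFrame({
    "iso_code": ["AAA", "AAA", "BBB", "BBB"],
    "gdp_per_capita": [1000.0, 1000.0, 2000.0, 2500.0],
    "human_development_index": [0.5, 0.5, 0.8, 0.8],
})

static_cols = ["gdp_per_capita", "human_development_index"]

# nunique ignores missing values, so an all-missing country counts as 0.
distinct = toy.groupby("iso_code")[static_cols].nunique()

# A column is constant if no country has more than one distinct value.
is_constant = (distinct <= 1).all()
print(is_constant)
```

If a column turns out to be constant, a single map or bar chart per country conveys the same information as the monthly line plot.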

## Data visualization on a world map

Let's display the data for each month on a world map and see how it changed over time.

In [None]:
def plot_world_data(df, year, month):
    """Draw four world maps (one per indicator) for the given year and month."""
    world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
    df_query = df.query(f'year == {year} and month == {month}')
    # Inner merge: only countries present in both the map and the data are drawn.
    world = pd.merge(world, df_query, left_on='iso_a3', right_on='iso_code')

    fig, axs = plt.subplots(2, 2, figsize=(20,10))
    fig.suptitle(f'Data at {year}/{month}', fontsize=20)

    # (column, shared color scale, subplot) for each of the four panels.
    panels = [
        ('total_deaths', NORM_TOTAL_DEATH, axs[0][0]),
        ('stringency_index', NORM_STRINGENCY, axs[1][0]),
        ('gdp_per_capita', NORM_GDP_PER_CAPITA, axs[0][1]),
        ('human_development_index', NORM_HUMAN_DEVELOPMENT, axs[1][1]),
    ]
    for column, norm, ax in panels:
        gplt.choropleth(world, hue=column, legend=True, norm=norm, ax=ax)
        ax.set_title(column)

In [None]:
time_points = [(2019, 12), (2020, 2), (2020, 3), (2020, 4), (2020, 5), (2020, 6), (2020, 7), (2020, 8), (2020, 9), (2020, 10)]
for year, month in time_points:
    plot_world_data(data_mean, year, month)
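
The list of time points above is written by hand. As a sketch, the same kind of list could be derived from the year and month columns, so that it follows whatever months the data actually contains. The toy frame below stands in for data_mean.

```python
import pandas as pd

# Toy stand-in for data_mean with repeated and unordered months.
toy_mean = pd.DataFrame({
    "year": [2020, 2019, 2020, 2020],
    "month": [3, 12, 2, 3],
})

# Unique (year, month) pairs as plain tuples, in chronological order.
time_points = sorted(
    toy_mean[["year", "month"]]
    .drop_duplicates()
    .itertuples(index=False, name=None)
)
print(time_points)
```

Deriving the list this way would include every month present in the data, which may be more plots than the hand-picked selection.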