This notebook is still a work in progress and will be updated continually. Feel free to suggest edits. Thanks!
# WELCOME

With the increasing spread of COVID-19 (the ongoing pandemic of coronavirus disease 2019), there is cause for alarm all over the world, with every country on alert. This has led to the total shutdown of some countries, while many others have remained fervent in their response to the disease. There have been rumors about its origin: some believe it originated in animals, others believe it was bioengineered. Whatever the belief, it is already here and we need to defeat it... together. In this report, I analyze the current situation as reported by the data provided here on Kaggle. I hope you find it insightful and interesting. If you do, don't forget to upvote. Thanks!

Special thanks to Kaggler [Abhinand05](https://www.kaggle.com/abhinand05/covid-19-digging-a-bit-deeper) for such an insightful kernel.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

pd.set_option("display.width", 300)

### General Information

The disease broke out in the city of Wuhan, China, in December 2019 and has since spread to the rest of the world, mainly through travellers.
Coronavirus disease spreads primarily through contact with an infected person when they cough or sneeze. It can also spread when a person touches a surface or object that has the virus on it, then touches their eyes, nose, or mouth. Its common symptoms include:
1. Fever
2. Cough
3. Shortness of breath

Symptoms can start to show 2 to 14 days after exposure.

Let me also state that stigmatizing any race, ethnicity or tribe because of this disease reflects very poorly on us as humans. We can all fight stigma by sharing the fact that viruses do not target specific people based on race or ethnicity. The fact that it can affect anyone means that we are all one at the end of the day.

#### The data

We begin our analysis by introducing the dataset.

In [None]:
DIR = '/kaggle/input/corona-virus-report/'

data = pd.read_csv(DIR + 'covid_19_clean_complete.csv', parse_dates=['Date'])
data.sample(7)

We then perform some light preprocessing on it and check out some basic information on the dataset.

In [None]:
def engineer_features(data):
    data['year'] = data.Date.dt.year
    data['month'] = data.Date.dt.month
    data['day'] = data.Date.dt.day
    data['dayofweek'] = data.Date.dt.dayofweek
    return data
    
data = engineer_features(data)

# rename for convenience
data.rename({'Confirmed': 'cases',
             'Country/Region': 'country',
             'Province/State': 'state',
             'Deaths': 'fatalities',
             'Date': 'date'}, axis='columns', inplace=True)

def basic_eda(df):
    print("First five rows of data")
    print("=" * 100)
    print(df.head())
    print("\n")
    print("Data Information")
    print("=" * 100)
    df.info()  # info() prints its report directly and returns None
    print("\n")
    print("Data Statistics")
    print("=" * 100)
    print(df.describe())
    
basic_eda(data)

We then start our analysis by viewing the total number of confirmed and fatal cases.

In [None]:
# Total number worldwide

data['state'] = data['state'].fillna('')
temp = data[[col for col in data.columns if col != 'state']]

latest = temp[temp['date'] == max(temp['date'])].reset_index()
# select several columns with a list; tuple-style selection fails on recent pandas versions
latest_grouped = latest.groupby('country')[['cases', 'fatalities']].sum().reset_index()

sns.set_style("darkgrid")
plt.figure(figsize=(25, 10))
sns.barplot(x="cases", y="country", data=latest_grouped.sort_values('cases', ascending=False)[:20])
plt.title("WORLDWIDE CONFIRMED CASES")
plt.show()

sns.set_style("darkgrid")
plt.figure(figsize=(25, 10))
sns.barplot(x="fatalities", y="country", data=latest_grouped.sort_values('fatalities', ascending=False)[:20])
plt.title("WORLDWIDE FATALITIES")
plt.show()

We can see that there are more confirmed cases in China than in any other country; however, Italy has more fatalities. Let's view the growth with respect to time.

In [None]:
# worldwide confirmed cases over time
# sum over all countries per date; 'date' is the group key, so only the numeric columns are selected
group = data.groupby('date')[['cases', 'fatalities']].sum().reset_index()

sns.set_style("ticks")
sns.set(rc={'axes.facecolor':'#80d8e0', 'figure.facecolor':'#80d8e0'})

plt.figure(figsize=(25, 10))
# plot the aggregated worldwide totals rather than the raw per-region rows
fig = sns.lineplot(x="date", y="cases", data=group)
plt.title("WORLDWIDE CONFIRMED CASES OVER TIME")
plt.show()

sns.set_style("darkgrid")
sns.set(rc={'axes.facecolor':'#9697c8', 'figure.facecolor':'#9697c8'})
plt.figure(figsize=(25, 10))
fig = sns.lineplot(x="date", y="fatalities", data=group)
plt.title("WORLDWIDE FATALITIES OVER TIME")
plt.show()

We can see from the charts above that both confirmed cases and fatalities have been trending upward globally, and the tail end of each curve suggests that the increase may continue. This does not look good for the world at all. Let's dive deeper into our study. We begin with China...

According to [The Verge](https://www.theverge.com/2020/1/23/21078457/coronavirus-outbreak-china-wuhan-quarantine-who-sars-cdc-symptoms-risk), the outbreak began in December 2019 and has continued to spread since. However, we only have data for 2020. Let us analyze the trend.

In [None]:
grouped_china = data[data['country'] == "China"].reset_index()
# sum over all provinces per date; select the numeric columns with a list
grouped_china_date = grouped_china.groupby('date')[['cases', 'fatalities']].sum().reset_index()

sns.set_style("ticks")
sns.set(rc={'axes.facecolor':'#80d8e0', 'figure.facecolor':'#80d8e0'})

plt.figure(figsize=(25, 10))
fig = sns.lineplot(x="date", y="cases", data=grouped_china_date)
plt.title("Cases for China")
plt.show()

We can see a continuous increase in the spread of the disease from 22nd January to 1st March, with the peak increase on 12th February. However, the spread has been contained from 1st March up to this moment (22nd March), and China has currently reported no new cases of the virus. Big kudos to them. Let us also look at the trend for 3 more countries: Italy, US and Nigeria. Then we'll group the rest of the world.
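The "peak increase" mentioned above is a statement about daily new cases, which a cumulative curve only shows indirectly through its slope. Here is a minimal sketch of how daily increases could be derived from cumulative totals, assuming a frame with `date` and `cases` columns shaped like `grouped_china_date`. The helper name `daily_new_cases` is hypothetical and the numbers are made up for illustration.

```python
import pandas as pd


def daily_new_cases(cumulative: pd.DataFrame) -> pd.DataFrame:
    """Add a 'new_cases' column with the day-over-day change in cumulative cases."""
    out = cumulative.sort_values("date").reset_index(drop=True)
    # diff() leaves NaN on the first row; use the first day's total as its increase
    out["new_cases"] = out["cases"].diff().fillna(out["cases"]).astype(int)
    return out


# Toy cumulative counts (illustrative only, not real data)
toy = pd.DataFrame({
    "date": pd.to_datetime(["2020-02-10", "2020-02-11", "2020-02-12", "2020-02-13"]),
    "cases": [100, 150, 400, 450],
})

daily = daily_new_cases(toy)
# the date whose day-over-day increase is the largest
peak_day = daily.loc[daily["new_cases"].idxmax(), "date"]
print(daily)
print("Largest single-day increase on:", peak_day.date())
```

Plotting `new_cases` instead of `cases` would make a peak show up as the highest point of the curve rather than as its steepest section.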

In [None]:
# plot the cumulative confirmed cases for each country in turn
for country in ["Italy", "US", "Nigeria"]:
    grouped_country = data[data['country'] == country].reset_index()
    # sum over all provinces per date; select the numeric columns with a list
    grouped_country_date = grouped_country.groupby('date')[['cases', 'fatalities']].sum().reset_index()

    sns.set_style("ticks")
    sns.set(rc={'axes.facecolor':'#cccccc', 'figure.facecolor':'#cccccc'})

    plt.figure(figsize=(25, 10))
    fig = sns.lineplot(x="date", y="cases", data=grouped_country_date)
    plt.title("Cases for " + country)
    plt.show()

They all show the same upward trend. This does not look good. Now for the other countries.
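Every per-country chart in this notebook relies on the same aggregation: filter one country, then sum its provinces for each date. A small sketch of that step as a reusable function, assuming the renamed columns `country`, `date`, `cases` and `fatalities`. The name `country_timeseries` is hypothetical and the rows below are toy values, not taken from the dataset.

```python
import pandas as pd


def country_timeseries(df: pd.DataFrame, country: str) -> pd.DataFrame:
    """Return one row per date with cases and fatalities summed over all provinces."""
    subset = df[df["country"] == country]
    # 'date' is the group key, so only the numeric columns are selected for summing
    return subset.groupby("date")[["cases", "fatalities"]].sum().reset_index()


# Toy data: country "A" reports two provinces per day, country "B" reports one row per day
toy = pd.DataFrame({
    "country": ["A", "A", "A", "A", "B", "B"],
    "state": ["x", "y", "x", "y", "", ""],
    "date": pd.to_datetime([
        "2020-03-01", "2020-03-01", "2020-03-02", "2020-03-02",
        "2020-03-01", "2020-03-02",
    ]),
    "cases": [10, 5, 20, 8, 1, 3],
    "fatalities": [1, 0, 2, 1, 0, 0],
})

series_a = country_timeseries(toy, "A")
print(series_a)
```

The returned frame can be passed directly to `sns.lineplot(x="date", y="cases", data=...)`.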

In [None]:
grouped_rest = data[~data['country'].isin(['China', 'Italy', 'US', 'Nigeria'])].reset_index()
# sum over all remaining countries per date; select the numeric columns with a list
grouped_rest_date = grouped_rest.groupby('date')[['cases', 'fatalities']].sum().reset_index()

sns.set_style("ticks")
sns.set(rc={'axes.facecolor':'cornflowerblue', 'figure.facecolor':'cornflowerblue'})

plt.figure(figsize=(25, 10))
fig = sns.lineplot(x="date", y="cases", data=grouped_rest_date)
plt.title("Cases for the rest of the world")
plt.show()

A similar upward pattern is observed. How can this spread be curbed?

### Cleanliness is Next to Godliness

It is common knowledge that being clean can help one avoid getting the virus. This is what has led to the wide search for sanitizers for constant sanitization of the hands. You should not just sanitize yourself but also your environment. The battle against COVID-19 is a group battle: if you are clean but your neighbor is not, you are still at risk, so your environment is very important. Next, we'll analyze five countries from the list of the top ten cleanest countries in the world according to [improb.com](https://improb.com/top-cleanest-countries-in-the-world/).

In [None]:
cleanest_countries = ["Finland", "Iceland", "Sweden", "Denmark", "Slovenia"]

for country in cleanest_countries:
    grouped_country = data[data['country'] == country].reset_index()
    # sum over all provinces per date; select the numeric columns with a list
    grouped_country_date = grouped_country.groupby('date')[['cases', 'fatalities']].sum().reset_index()

    sns.set_style("ticks")
    sns.set(rc={'axes.facecolor':'#8ac7d7', 'figure.facecolor':'#8ac7d7'})

    plt.figure(figsize=(25, 10))
    fig = sns.lineplot(x="date", y="cases", data=grouped_country_date)
    plt.title("Cases for " + country)
    plt.show()

We can see from the above that even in the countries ranked cleanest, an upward trend is noticeable. Things are looking really bad, especially for Sweden. Let us count the number of cases and fatalities.

In [None]:
print(f"Cases as at {max(data['date'])}:")
for country in cleanest_countries:
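    # sum over provinces per date first, so countries with several provinces get country-level totals
    max_vals = data.loc[data['country'] == country].groupby('date')[['cases', 'fatalities']].sum()
    max_case = max_vals['cases'].max()
    max_fatalities = max_vals['fatalities'].max()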
    print(f"{ country } ---> { max_case } cases, { max_fatalities } fatalities {(max_fatalities * 100) / max_case:.4f} %")

Finland shows the lowest fatality rate.
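The percentage printed above is the number of fatalities divided by the number of confirmed cases, times 100. A minimal sketch of that calculation as a function, assuming the same renamed columns. It also assumes that country totals should be summed over provinces, and it guards against a country with zero cases. The name `fatality_rate` and all numbers are illustrative.

```python
import math

import pandas as pd


def fatality_rate(df: pd.DataFrame, country: str) -> float:
    """Fatalities as a percentage of confirmed cases on the most recent date."""
    totals = df[df["country"] == country].groupby("date")[["cases", "fatalities"]].sum()
    latest = totals.iloc[-1]  # groupby sorts its keys, so the last row is the latest date
    if latest["cases"] == 0:
        return float("nan")  # rate is undefined when there are no confirmed cases
    return float(latest["fatalities"] * 100 / latest["cases"])


# Toy data (not real figures): "A" has two provinces, "C" has no cases yet
toy = pd.DataFrame({
    "country": ["A", "A", "A", "A", "C", "C"],
    "date": pd.to_datetime([
        "2020-03-01", "2020-03-01", "2020-03-02", "2020-03-02",
        "2020-03-01", "2020-03-02",
    ]),
    "cases": [60, 20, 150, 50, 0, 0],
    "fatalities": [1, 0, 3, 1, 0, 0],
})

rate_a = fatality_rate(toy, "A")
rate_c = fatality_rate(toy, "C")
print(f"A: {rate_a:.4f} %  C: {rate_c}")
```

Note that this ratio is computed while many cases are still unresolved, so it is only a snapshot and will shift as outcomes are recorded.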

Another way to curb the spread is for everyone to be quarantined. What does this mean? Everyone needs to stay indoors. How many countries would obey such an instruction? According to [WorldAtlas.com](https://www.worldatlas.com/articles/10-countries-most-likely-to-follow-the-law.html), the top 10 countries most likely to follow the law are, in order: Denmark, Norway, Finland, Sweden, Netherlands, Germany, New Zealand, Austria, Canada, Australia. We have already seen results for Denmark, Finland and Sweden, so we will analyze Norway, Netherlands, Germany, New Zealand and Austria.

In [None]:
lawful_countries = ["Norway", "Netherlands", "Germany", "New Zealand", "Austria"]

for country in lawful_countries:
    grouped_country = data[data['country'] == country].reset_index()
    # sum over all provinces per date; select the numeric columns with a list
    grouped_country_date = grouped_country.groupby('date')[['cases', 'fatalities']].sum().reset_index()

    sns.set_style("ticks")
    sns.set(rc={'axes.facecolor':'#8ac7d7', 'figure.facecolor':'#8ac7d7'})

    plt.figure(figsize=(25, 10))
    fig = sns.lineplot(x="date", y="cases", data=grouped_country_date)
    plt.title("Cases for " + country)
    plt.show()

We can see that the situation varies widely across these countries, with Germany approaching 25,000 cases and New Zealand just below 70. Let's count the cases and fatalities.

In [None]:
print(f"Cases as at {max(data['date'])}:")
for country in lawful_countries:
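    # sum over provinces per date first, so countries with several provinces get country-level totals
    max_vals = data.loc[data['country'] == country].groupby('date')[['cases', 'fatalities']].sum()
    max_case = max_vals['cases'].max()
    max_fatalities = max_vals['fatalities'].max()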
    print(f"{ country } ---> { max_case } cases, { max_fatalities } fatalities {(max_fatalities * 100) / max_case:.4f} %")

The situation in New Zealand seems impressive, with 0 fatalities.

Let us now go over to the recoveries.

For the top 10 most affected countries

In [None]:
countries_ = ["China", "Italy", "US", "Spain", "Germany", "Iran", "France", "United Kingdom", "Netherlands", "Belgium"]

for country in countries_:
    grouped_country = data[data['country'] == country].reset_index()
    grouped_country_date = grouped_country.groupby('date')['Recovered'].sum().reset_index()

    sns.set_style("ticks")
    sns.set(rc={'axes.facecolor':'#ffe8d1', 'figure.facecolor':'#ffe8d1'})

    plt.figure(figsize=(25, 10))
    fig = sns.lineplot(x="date", y="Recovered", data=grouped_country_date)
    plt.title("Recovery for " + country)
    plt.show()

Let us count them

In [None]:
print(f"Cases as at {max(data['date'])}:")
for country in countries_:
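    # sum over provinces per date first, so countries with several provinces get country-level totals
    max_vals = data.loc[data['country'] == country].groupby('date')[['cases', 'Recovered']].sum()
    max_case = max_vals['cases'].max()
    max_recoveries = max_vals['Recovered'].max()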
    print(f"{ country } ---> { max_case } cases, { max_recoveries } recoveries. {(max_recoveries * 100) / max_case:.4f} %")

The recovery rate seems low; however, remember that the death rate is also in the same range. We are currently in a struggle for humanity. What do we do about our situation?
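When the recovery and fatality percentages are both low, the remainder of the confirmed cases are still unresolved. A short sketch that splits a country's confirmed cases into the three outcomes, assuming that active cases equal confirmed cases minus fatalities minus recoveries. The function name `outcome_shares` is hypothetical and the figures are illustrative, not taken from the dataset.

```python
def outcome_shares(cases: int, fatalities: int, recovered: int) -> dict:
    """Split confirmed cases into recovered, fatal and still-active percentages."""
    if cases <= 0:
        raise ValueError("cases must be positive to compute percentages")
    # assumption: every confirmed case is either recovered, fatal, or still active
    active = cases - fatalities - recovered
    return {
        "recovered_pct": 100 * recovered / cases,
        "fatal_pct": 100 * fatalities / cases,
        "active_pct": 100 * active / cases,
    }


# Illustrative numbers only
shares = outcome_shares(cases=1000, fatalities=40, recovered=60)
for name, value in shares.items():
    print(f"{name}: {value:.2f} %")
```

Reading the three shares together gives a fuller picture than the recovery rate alone, since a large active share means most outcomes are not yet known.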

**A new challenge, pushing humanity to the wall**

**The trembling of nations, the rise and fall**

**Science or Religion, on what do we depend?**

**COVID19: The story, the trend, the data, the end?**

Please upvote if the kernel was inspiring, helpful or insightful. Thanks!