This notebook builds on [Abhinand's great work](https://www.kaggle.com/abhinand05/covid-19-digging-a-bit-deeper/data).
Building on his tremendous effort, I add daily death tolls and daily confirmed cases using SQL.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

# Load Data
1. Load using Pandas
1. Pass data to SQL Alchemy engine.
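
The two steps above can be tried on a tiny example first. This is only a minimal sketch: the `toy` table and its values are made up for illustration, and it uses a plain standard-library `sqlite3` connection rather than the SQLAlchemy engine this notebook creates further down.

```python
import sqlite3
import pandas as pd

# Toy data standing in for train.csv (values are invented for illustration).
toy = pd.DataFrame({
    "Date": ["2020-03-01", "2020-03-01", "2020-03-02", "2020-03-02"],
    "Country_Region": ["A", "B", "A", "B"],
    "ConfirmedCases": [1, 2, 3, 5],
})

# In-memory SQLite database; a plain sqlite3 connection is an assumption here,
# swapped in for the SQLAlchemy engine used in the notebook.
conn = sqlite3.connect(":memory:")
toy.to_sql("toy", con=conn, index=False)

# Once the DataFrame is a table, it can be aggregated with ordinary SQL.
totals = pd.read_sql(
    "SELECT Date, SUM(ConfirmedCases) AS confirmed FROM toy GROUP BY Date ORDER BY Date",
    conn,
)
print(totals)
```

The result comes back as a regular DataFrame, so it can be passed straight to plotting code.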

In [None]:
import plotly.express as px
import plotly.io as pio

from collections import defaultdict

import json


pio.templates.default = "plotly_dark"

In [None]:
filename_train = '/kaggle/input/covid19-global-forecasting-week-3/train.csv'
filename_test = '/kaggle/input/covid19-global-forecasting-week-3/test.csv'

### Load using Pandas

In [None]:
train = pd.read_csv(filename_train)
test = pd.read_csv(filename_test)

### Inspect content

In [None]:
display(train.head())
display(test.head())

### Using SQL Alchemy Engine

In [None]:
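from sqlalchemy import create_engine, inspect
engine = create_engine('sqlite://', echo=False)

train.to_sql('train', con=engine)
test.to_sql('test', con=engine)

# engine.table_names() was removed in SQLAlchemy 2.0; the inspector works on both older and newer versions
print(inspect(engine).get_table_names())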

# 1. Confirmed Cases Over Time

In [None]:
table = "train"
query = "SELECT Date, SUM(ConfirmedCases) as confirmed FROM {} GROUP BY Date".format(table)
df = pd.read_sql(query, engine)

df.head()

In [None]:
fig = px.line(df, x='Date', y='confirmed',
              title="Worldwide Confirmed Cases Over Time")
fig.show()

fig = px.line(df, x='Date', y='confirmed',
              title="Worldwide Confirmed Cases (Logarithmic Scale) Over Time",
              log_y=True)

fig.show()

1. It looks like the exponential growth of the pandemic is still at its peak, which is not good at all.
1. The slope of the line over the latest time frame is very steep, making matters even worse.
1. The same graph on a logarithmic scale shows that the situation is very serious all over the world, perhaps because the disease has only just started to grow outside of China.
1. At the current rate anything may happen, maybe even a million cases in just a week's time. Who knows.

In [None]:
def query_country (country, table):
    query = """
        WITH tmp AS
        (
            SELECT * FROM {} WHERE Country_Region=\'{}\'
        )
        SELECT Date, SUM(ConfirmedCases) AS confirmed
        FROM tmp
        GROUP BY Date
            """.format(table, country)

    print(query)
    df = pd.read_sql(query, engine)
    # df.head()
    return df



results = defaultdict()

# Run the same per-country query for each country of interest
table = 'train'
for country in ('China', 'Italy', 'US', 'Korea, South'):
    results[country] = query_country(country, table)

In [None]:
table = 'train'
excluded_countries = ('China', 'Italy', 'US', 'Korea, South')

query = """
    WITH tmp AS
    (
        SELECT * FROM {} WHERE Country_Region NOT IN {}
    )
    SELECT Date, SUM(ConfirmedCases) AS confirmed
    FROM tmp
    GROUP BY Date
""".format(table, excluded_countries)

print(query)

df = pd.read_sql(query, engine)

results['Rest of the World'] = df

In [None]:
colors = ('#F61067', '#91C4F2', '#6F2DBD', '#00FF00', '#FFDF64')

for c, (country, df) in zip(colors, results.items()):
    fig = px.line(results[country], x='Date', y='confirmed',
                 title="Confirmed Cases in {} Over Time".format(country),
                 color_discrete_sequence=[c],
                 height=500)
    fig.show()

1. Looking at the plot of China's cases, it is pretty clear that the disease has not been at dire levels since the turn of March. WHICH IS REALLY GOOD NEWS FOR CHINA.
1. Not so much for Italy, by the looks of it. They are being affected very badly.
1. Italy's steep rise is concerning and the next few days are really crucial.
1. The clear spike in the USA's graph might be the result of more people getting tested for the first time.
1. The USA's situation is also very concerning. The increase over the past week or so is really significant.
1. The rest of the world combined is also seeing a steady increase in confirmed cases over time.
1. South Korea has contained the coronavirus well.

In [None]:
table = "train"

query = """
    SELECT t.Date, t.Country_Region AS country, SUM(t.ConfirmedCases) as confirmed
    FROM {0} AS t
    INNER JOIN (
        SELECT max(Date) AS MaxDate, Country_Region FROM {0} GROUP BY Country_Region
    ) tmp
    ON tmp.MaxDate=t.Date and tmp.Country_Region = t.Country_Region
    GROUP BY t.Country_Region
    """.format(table)

print(query)

latest_grouped = pd.read_sql(query, engine)
latest_grouped.head()

In [None]:
fig = px.choropleth(latest_grouped, locations='country',
                   locationmode='country names', color='confirmed',
                   hover_name='country', range_color=[1, 5000],
                   color_continuous_scale='peach',
                   title='Countries with Confirmed Cases')

fig.show()

Feel free to zoom into the interactive maps.

The above graph is just an illustration of how the virus is spread out across the globe.

### I think, looking at Europe, it's worth having a closer look.

In [None]:
table = "train"

europe = list(['Austria','Belgium','Bulgaria','Croatia','Cyprus','Czechia','Denmark','Estonia','Finland','France','Germany','Greece','Hungary','Ireland',
               'Italy', 'Latvia','Luxembourg','Lithuania','Malta','Norway','Netherlands','Poland','Portugal','Romania','Slovakia','Slovenia',
               'Spain', 'Sweden', 'United Kingdom', 'Iceland', 'Russia', 'Switzerland', 'Serbia', 'Ukraine', 'Belarus',
               'Albania', 'Bosnia and Herzegovina', 'Kosovo', 'Moldova', 'Montenegro', 'North Macedonia'])


europe_grouped_latest  = latest_grouped[latest_grouped['country'].isin(europe)]
europe_grouped_latest.head()

In [None]:
fig = px.choropleth(europe_grouped_latest, locations='country',
                   locationmode='country names', color='confirmed',
                   hover_name='country', range_color=[1, 2000],
                   color_continuous_scale='portland', 
                    title='European Countries with Confirmed Cases', scope='europe', height=800)

fig.show()

### It looks like COVID-19 has its strongest hold in Western Europe right now.

Cases in most European countries have increased rapidly.

In [None]:
fig = px.bar(latest_grouped.sort_values('confirmed', ascending=False)[:20][::-1], 
             x='confirmed', y='country',
             title='Confirmed Cases Worldwide', text='confirmed', height=1000, orientation='h')
fig.show()

In [None]:
fig = px.bar(europe_grouped_latest.sort_values('confirmed', ascending=False)[:10][::-1], 
             x='confirmed', y='country', color_discrete_sequence=['#84DCC6'],
             title='Confirmed Cases in Europe', text='confirmed', orientation='h')
fig.show()

In [None]:
table = "train"
country = 'US'

query = """
    SELECT t.Date, t.Province_State AS state, t.ConfirmedCases AS confirmed
    FROM {0} AS t
    INNER JOIN
    (
        SELECT MAX(Date) as MaxDate, Province_State, Country_Region 
        FROM {0}
        GROUP BY Province_State, Country_Region
    ) AS tmp
    ON tmp.MaxDate = t.Date AND 
         tmp.Province_State = t.Province_State AND
         tmp.Country_Region = t.Country_Region
    WHERE t.Country_Region = \'{1}\'
    ORDER BY t.ConfirmedCases DESC
""".format(table, country)

print(query)

usa_latest = pd.read_sql(query, engine)
usa_latest.head()

In [None]:
fig = px.bar(usa_latest[:10][::-1], 
            x = 'confirmed', y='state', color_discrete_sequence=['#D63230'],
            title='Confirmed Cases in USA', text='confirmed', orientation='h')

fig.show()

### Confirmed Cases in the US

In [None]:
us_states_json = json.loads("""
{
    "AL": "Alabama",
    "AK": "Alaska",
    "AS": "American Samoa",
    "AZ": "Arizona",
    "AR": "Arkansas",
    "CA": "California",
    "CO": "Colorado",
    "CT": "Connecticut",
    "DE": "Delaware",
    "DC": "District Of Columbia",
    "FM": "Federated States Of Micronesia",
    "FL": "Florida",
    "GA": "Georgia",
    "GU": "Guam",
    "HI": "Hawaii",
    "ID": "Idaho",
    "IL": "Illinois",
    "IN": "Indiana",
    "IA": "Iowa",
    "KS": "Kansas",
    "KY": "Kentucky",
    "LA": "Louisiana",
    "ME": "Maine",
    "MH": "Marshall Islands",
    "MD": "Maryland",
    "MA": "Massachusetts",
    "MI": "Michigan",
    "MN": "Minnesota",
    "MS": "Mississippi",
    "MO": "Missouri",
    "MT": "Montana",
    "NE": "Nebraska",
    "NV": "Nevada",
    "NH": "New Hampshire",
    "NJ": "New Jersey",
    "NM": "New Mexico",
    "NY": "New York",
    "NC": "North Carolina",
    "ND": "North Dakota",
    "MP": "Northern Mariana Islands",
    "OH": "Ohio",
    "OK": "Oklahoma",
    "OR": "Oregon",
    "PW": "Palau",
    "PA": "Pennsylvania",
    "PR": "Puerto Rico",
    "RI": "Rhode Island",
    "SC": "South Carolina",
    "SD": "South Dakota",
    "TN": "Tennessee",
    "TX": "Texas",
    "UT": "Utah",
    "VT": "Vermont",
    "VI": "Virgin Islands",
    "VA": "Virginia",
    "WA": "Washington",
    "WV": "West Virginia",
    "WI": "Wisconsin",
    "WY": "Wyoming"
} 
""")
    
# switch key/value from code/state to state/code.
us_states = {state: abbrev for abbrev, state in us_states_json.items()}
    
    
# add state code column
usa_latest['code'] = usa_latest['state'].map(us_states)
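
Because `.map()` silently returns `NaN` for any name missing from the dictionary, it can be worth checking which states did not receive a code before plotting. The following is a minimal sketch with hypothetical names (`name_to_code`, `states`) and a made-up capitalization mismatch. Whether the real dataset contains such a mismatch is an assumption that should be checked against `usa_latest`.

```python
import pandas as pd

# Hypothetical miniature of the name -> code mapping and the state column.
name_to_code = {"New York": "NY", "District Of Columbia": "DC"}
states = pd.Series(["New York", "District of Columbia"], name="state")

# Names that are not keys of the dictionary map to NaN.
codes = states.map(name_to_code)

# Collect the names that failed to map so they can be fixed by hand.
unmapped = states[codes.isna()].tolist()
print(unmapped)
```

Any name listed in `unmapped` would be missing from a choropleth that uses the code column as `locations`.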

In [None]:
fig = px.choropleth(usa_latest, locations='code',
                   locationmode='USA-states', color='confirmed',
                    hover_name='state', range_color=[1, 10000],
                   scope='usa')

fig.show()

There are 8 states where confirmed cases exceed 10,000.
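
A count like this can be read off the table instead of the map. Below is a minimal sketch on made-up numbers. `toy_latest` is a hypothetical stand-in for `usa_latest`, and the threshold matches the upper end of the color range used above.

```python
import pandas as pd

# Hypothetical stand-in for usa_latest (values are invented for illustration).
toy_latest = pd.DataFrame({
    "state": ["X", "Y", "Z"],
    "confirmed": [12000, 9500, 30000],
})

threshold = 10_000

# Keep only the rows above the threshold and count them.
over = toy_latest[toy_latest["confirmed"] > threshold]
n_over = len(over)
print(n_over, over["state"].tolist())
```

Running the same filter on `usa_latest` would give the figure for the real data.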

### How did it happen?

#### Worldwide Analysis

In [None]:
table = 'train'

query = """
    SELECT Date, Country_Region AS country, SUM(ConfirmedCases) AS confirmed, SUM(Fatalities) AS deaths
    FROM {}
    GROUP BY Date, Country_Region
""".format(table)

print(query)

formated_gdf = pd.read_sql(query, engine)


formated_gdf['size'] = formated_gdf['confirmed'].pow(0.3)

display(formated_gdf)


In [None]:
fig = px.scatter_geo(formated_gdf, locations='country',
                    locationmode='country names', color='confirmed',
                    size='size', hover_name='country', range_color=[0, 1500],
                    projection='natural earth', animation_frame='Date',
                    title='COVID-19: Spread Over Time', color_continuous_scale="portland")

fig.show()

At the earliest point (from the data available) the disease seems to be only around China and its neighboring countries.

However, it quickly spread to Europe, Australia and even the US, which is very interesting.

Things still looked fairly good for European countries even in mid-February.

West Asia, especially Iran and Iraq, begins to catch fire at the end of February, along with Italy showing signs of the dread to come. South Korea and China are peaking at that moment.

By March 5, look at Europe. They could have locked down right at that moment.

The disease has reached Africa and the Americas too by early March, with alarm bells ringing loudly for the US at just over 500 cases.

Needless to say how it ended.

According to the data so far, the USA, UK, Spain, Italy, Germany and France are in deep trouble. The next few days are crucial for how the disease develops around the world.

In [None]:
fig = px.scatter_geo(formated_gdf, locations='country',
                    locationmode='country names', color='deaths',
                    size='size', hover_name='country', range_color=[0, 1500],
                    projection='natural earth', animation_frame='Date',
                    title='COVID-19: Deaths Over Time', color_continuous_scale="portland")

fig.show()

# 2. Daily New Confirmed Cases Over Time
The following shows the new cases each day, calculated by subtracting the previous date's 'ConfirmedCases' from the current date's value.
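
As a cross-check on the subtraction described above, the same per-country difference can be computed directly in pandas. This is a minimal sketch on made-up data. The `toy` frame is hypothetical, and `groupby(...).diff()` is a pandas alternative to the SQL approach, not the method this notebook uses.

```python
import pandas as pd

# Cumulative counts for two hypothetical countries over three days.
toy = pd.DataFrame({
    "Date": ["2020-03-01", "2020-03-02", "2020-03-03"] * 2,
    "country": ["A"] * 3 + ["B"] * 3,
    "confirmed": [10, 15, 25, 0, 4, 4],
})

# Order rows by date within each country so "previous row" means "previous day".
toy = toy.sort_values(["country", "Date"])

# Current day minus previous day, per country; the first day has no
# predecessor, so its difference is NaN and is replaced by 0.
toy["new_confirmed_cases"] = toy.groupby("country")["confirmed"].diff().fillna(0)
print(toy)
```

Each country's first row gets 0 because there is no earlier day to subtract.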

### Use window function to get difference

In [None]:
table = 'train'

query = """
    WITH country_table AS (
        SELECT Date, 
                Country_Region As country,
                SUM(ConfirmedCases) AS confirmed, 
                SUM(Fatalities) AS deaths
        FROM {0}
        GROUP BY Date, Country_Region
    )
    
    
    SELECT *,
            confirmed - LAG(confirmed) OVER country_window AS new_confirmed_cases,
            deaths - LAG(deaths) OVER country_window AS new_death_cases
    FROM country_table
    
    
    WINDOW country_window AS (
        PARTITION BY country ORDER BY Date
    )
""".format(table)

print(query)

new_cases_world = pd.read_sql(query, engine)

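# The first date of each country has no previous row, so LAG() yields NULL; treat it as 0.
# Assign the result back instead of using inplace=True on a column selection.
new_cases_world['new_confirmed_cases'] = new_cases_world['new_confirmed_cases'].fillna(0)
new_cases_world['new_death_cases'] = new_cases_world['new_death_cases'].fillna(0)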

In [None]:
countries = ('US', 'Italy', 'Spain', 'Korea, South', 'United Kingdom')

fig = px.line(new_cases_world[new_cases_world['country'].isin(countries)], 
              x='Date', y='new_confirmed_cases',
             color='country',
             title='Daily Confirmed Cases in {} Over Time'.format(countries))

fig.show()

**South Korea, Italy, and Spain have a stable increment of confirmed cases.**

In [None]:
countries = ('US', 'Italy', 'Spain', 'Korea, South', 'United Kingdom')

fig = px.line(new_cases_world[new_cases_world['country'].isin(countries)], 
              x='Date', y='new_death_cases',
             color='country',
             title='Daily Death Cases in {} Over Time'.format(countries))

fig.show()

## Daily Confirmed Cases Over Time

In [None]:
new_cases_world['size_new_confirmed_cases'] = new_cases_world['new_confirmed_cases'].pow(0.3) 

display(new_cases_world[new_cases_world['size_new_confirmed_cases'].isnull()])

new_cases_world['size_new_confirmed_cases'] = new_cases_world['size_new_confirmed_cases'].fillna(0)



new_cases_world['size_new_death_cases'] = new_cases_world['new_death_cases'].pow(0.3) 

display(new_cases_world[new_cases_world['size_new_death_cases'].isnull()])

new_cases_world['size_new_death_cases'] = new_cases_world['size_new_death_cases'].fillna(0)


In [None]:

fig = px.scatter_geo(new_cases_world, locations='country',
                   locationmode='country names', color='new_confirmed_cases',
                    size='size_new_confirmed_cases',
                   hover_name='new_confirmed_cases', range_color= [0, 1500],
                     projection='natural earth', animation_frame='Date',
                    title='Countries with Daily Confirmed Cases', color_continuous_scale="portland")

fig.show()

### Daily Fatalities Over Time

In [None]:

fig = px.scatter_geo(new_cases_world, locations='country',
                   locationmode='country names', color='new_death_cases',
                    size='size_new_death_cases',
                   hover_name='new_death_cases', range_color= [0, 500],
                     projection='natural earth', animation_frame='Date',
                    title='Countries with Daily Fatalities', color_continuous_scale="portland")

fig.show()

1. Amazingly, China's daily fatalities have never exceeded 300.
1. On April 4th, there are 5 countries whose death toll exceeds 500 per day: the US, UK, France, Spain and Italy.
1. It seems to be a trend that the death toll can easily spike from 100 to 500 within 5 days.
1. The death toll in Brazil is increasing significantly. Its current death toll is 86.

In [None]:
table = 'train'
country = 'US'

query = """
    SELECT Date, Province_State AS state, Country_Region AS country, ConfirmedCases AS confirmed, Fatalities AS death
    FROM {0}
    WHERE Country_Region = \'{1}\'
""".format(table, country)

print(query)

us_df = pd.read_sql(query, engine)

In [None]:

fig = px.line(us_df,
             x='Date', y='confirmed',
             color='state')

fig.show()

In [None]:

fig = px.line(us_df,
             x='Date', y='death',
             color='state')

fig.show()

## Daily New Confirmed Cases in US

In [None]:
table = 'train'
country = 'US'

query = """
    WITH us_table AS (
        SELECT t.*
        FROM {0} AS t
        WHERE t.Country_Region = \'{1}\'
    )
    
    SELECT u.*,
            u.ConfirmedCases - LAG(u.ConfirmedCases) OVER (PARTITION BY u.Province_State ORDER BY u.Date) AS new_confirmed,
            u.Fatalities - LAG(u.Fatalities) OVER (PARTITION BY u.Province_State ORDER BY u.Date) AS new_death
    FROM us_table AS u
""".format(table, country)

print(query)

daily_us = pd.read_sql(query, engine)

daily_us.head()

In [None]:
fig = px.line(daily_us,
             x='Date', y='new_confirmed',
             color='Province_State',
             title='Daily New Confirmed Cases in US')

fig.show()

In [None]:
fig = px.line(daily_us,
             x='Date', y='new_death',
             color='Province_State',
             title='Daily New Deaths in US')

fig.show()