
# About this Dataset

##### These data sets, provided by Johns Hopkins University, contain daily updated counts of confirmed cases, deaths, and recoveries from Covid-19. Note that these are time series data, so the numbers of cases on a given day are cumulative totals.

---

##### Source (Datasets):
##### https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series
##### https://github.com/niltontac/EspAnalise-EngDados/tree/master/data/Novel_Corona_Virus_2019_Dataset

---

##### Analyst: Nilton Thiago de Andrade Coura


# Covid-19 - Exploratory Analysis and Predictions

![alt text](https://cdn.cnn.com/cnnnext/dam/assets/200130165125-corona-virus-cdc-image-super-tease.jpg)

In [0]:
# Importing Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
from datetime import date

# Loading dataset
# Last dataset update: 03/28/2020 (the latest date column in the outputs below)

covid19confirmed = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')

covid19deaths = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')

covid19recovered = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv')

covid19 = pd.read_csv('https://raw.githubusercontent.com/niltontac/EspAnalise-EngDados/master/data/Novel_Corona_Virus_2019_Dataset/covid_19_data.csv', parse_dates=['ObservationDate', 'Last Update'])

In [0]:
last_date_update = '3/28/20'
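
The hard-coded date above has to be edited every time the source files gain a new day. As a minimal sketch, the latest date column could instead be read from the frame itself. This assumes the first four columns are metadata (`Province/State`, `Country/Region`, `Lat`, `Long`) and that every remaining column label is a date in `m/d/yy` form. The helper name `latest_date_column` and the small `sample` frame are hypothetical, used here only for illustration.

```python
import pandas as pd

# Hypothetical miniature of the wide layout: 4 metadata columns, then one column per date.
# Coordinates are illustrative values, not taken from the data set.
sample = pd.DataFrame({
    'Province/State': [None, 'Anguilla'],
    'Country/Region': ['Burma', 'United Kingdom'],
    'Lat': [21.9, 18.2],
    'Long': [95.9, -63.1],
    '3/27/20': [8, 0],
    '3/28/20': [8, 2],
})

def latest_date_column(df, n_meta_cols=4):
    """Return the label of the most recent date column in a wide time-series frame."""
    date_cols = df.columns[n_meta_cols:]
    # Parse the labels so the maximum is chronological rather than alphabetical
    parsed = pd.to_datetime(date_cols, format='%m/%d/%y')
    return date_cols[parsed.argmax()]

last_date_update = latest_date_column(sample)
print(last_date_update)
```

With the real data, the call would be `latest_date_column(covid19confirmed)`.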

Checking the last 5 rows of each data set to confirm when it was last updated:

In [3]:
print('covid19confirmed:')
print(covid19confirmed.tail())
####
print('covid19deaths:')
print(covid19deaths.tail())
####
print('covid19recovered:')
print(covid19recovered.tail())
####
print('covid19:')
print(covid19.tail())

covid19confirmed:
               Province/State  Country/Region  ...  3/27/20  3/28/20
248                       NaN           Burma  ...        8        8
249                  Anguilla  United Kingdom  ...        0        2
250    British Virgin Islands  United Kingdom  ...        0        2
251  Turks and Caicos Islands  United Kingdom  ...        0        4
252                       NaN      MS Zaandam  ...        0        2

[5 rows x 71 columns]
covid19deaths:
               Province/State  Country/Region  ...  3/27/20  3/28/20
248                       NaN           Burma  ...        0        0
249                  Anguilla  United Kingdom  ...        0        0
250    British Virgin Islands  United Kingdom  ...        0        0
251  Turks and Caicos Islands  United Kingdom  ...        0        0
252                       NaN      MS Zaandam  ...        0        0

[5 rows x 71 columns]
covid19recovered:
               Province/State  Country/Region  ...  3/27/20  3/28/20
234   

Dimensions of the data sets (rows, columns):

In [4]:
print('covid19confirmed:')
print(covid19confirmed.shape)
####
print('covid19deaths:')
print(covid19deaths.shape)
####
print('covid19recovered:')
print(covid19recovered.shape)
####
print('covid19:')
print(covid19.shape)

covid19confirmed:
(253, 71)
covid19deaths:
(253, 71)
covid19recovered:
(239, 71)
covid19:
(9735, 8)
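
The 71 columns of the three Johns Hopkins frames are 4 metadata columns plus one column per day from 1/22/20 to 3/28/20, which is a wide layout. Grouping and plotting by date is often easier in a long layout with one row per location and date. The sketch below shows one possible reshaping with `DataFrame.melt`. The `wide` frame is a hypothetical two-row sample, and the output column names `Date` and `Confirmed` are illustrative choices.

```python
import pandas as pd

# Hypothetical two-row sample in the same wide layout as covid19confirmed.
# Coordinates are illustrative values, not taken from the data set.
wide = pd.DataFrame({
    'Province/State': ['unknown', 'Anguilla'],
    'Country/Region': ['Burma', 'United Kingdom'],
    'Lat': [21.9, 18.2],
    'Long': [95.9, -63.1],
    '3/27/20': [8, 0],
    '3/28/20': [8, 2],
})

id_cols = ['Province/State', 'Country/Region', 'Lat', 'Long']

# One row per (location, date) instead of one column per date
long = wide.melt(id_vars=id_cols, var_name='Date', value_name='Confirmed')

# Convert the former column labels into real timestamps
long['Date'] = pd.to_datetime(long['Date'], format='%m/%d/%y')

print(long.shape)
```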


Checking for null or missing values:

In [5]:
print('covid19confirmed:')
print(pd.DataFrame(covid19confirmed.isnull().sum()))
####
print('covid19deaths:')
print(pd.DataFrame(covid19deaths.isnull().sum()))
####
print('covid19recovered:')
print(pd.DataFrame(covid19recovered.isnull().sum()))
####
print('covid19:')
print(pd.DataFrame(covid19.isnull().sum()))

covid19confirmed:
                  0
Province/State  174
Country/Region    0
Lat               0
Long              0
1/22/20           0
...             ...
3/24/20           0
3/25/20           0
3/26/20           0
3/27/20           0
3/28/20           0

[71 rows x 1 columns]
covid19deaths:
                  0
Province/State  174
Country/Region    0
Lat               0
Long              0
1/22/20           0
...             ...
3/24/20           0
3/25/20           0
3/26/20           0
3/27/20           0
3/28/20           0

[71 rows x 1 columns]
covid19recovered:
                  0
Province/State  175
Country/Region    0
Lat               0
Long              0
1/22/20           0
...             ...
3/24/20           0
3/25/20           0
3/26/20           0
3/27/20           0
3/28/20           0

[71 rows x 1 columns]
covid19:
                    0
SNo                 0
ObservationDate     0
Province/State   4433
Country/Region      0
Last Update         0
Confirmed          

The data sets have missing (null) values in the "Province/State" column.
Let's replace them with 'unknown':

In [0]:
# Replacing missing data

covid19confirmed = covid19confirmed.fillna('unknown')
covid19deaths = covid19deaths.fillna('unknown')
covid19recovered = covid19recovered.fillna('unknown')
covid19 = covid19.fillna('unknown')
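
A frame-wide `fillna` with a string also writes that string into any numeric column that happens to contain a gap, which turns the column into an object column. A more conservative variant, sketched below, fills only the "Province/State" column. The `sample` frame, its gap in `Confirmed`, and the `FILL_LABEL` constant are hypothetical and serve only to show the difference.

```python
import pandas as pd

FILL_LABEL = 'unknown'  # illustrative placeholder label

# Hypothetical sample with a missing province and a missing numeric value
sample = pd.DataFrame({
    'Province/State': [None, 'Anguilla'],
    'Country/Region': ['Burma', 'United Kingdom'],
    'Confirmed': [8.0, None],
})

# Fill only the text column; numeric gaps stay NaN and the dtype stays float
sample['Province/State'] = sample['Province/State'].fillna(FILL_LABEL)

print(sample.isnull().sum())
print(sample.dtypes)
```

Leaving numeric gaps as NaN keeps sums and plots working, and the gaps can then be handled explicitly if they matter.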

In [7]:
# Checking for null or missing values again

print('covid19confirmed:')
print(pd.DataFrame(covid19confirmed.isnull().sum()))
####
print('covid19deaths:')
print(pd.DataFrame(covid19deaths.isnull().sum()))
####
print('covid19recovered:')
print(pd.DataFrame(covid19recovered.isnull().sum()))
####
print('covid19:')
print(pd.DataFrame(covid19.isnull().sum()))

covid19confirmed:
                0
Province/State  0
Country/Region  0
Lat             0
Long            0
1/22/20         0
...            ..
3/24/20         0
3/25/20         0
3/26/20         0
3/27/20         0
3/28/20         0

[71 rows x 1 columns]
covid19deaths:
                0
Province/State  0
Country/Region  0
Lat             0
Long            0
1/22/20         0
...            ..
3/24/20         0
3/25/20         0
3/26/20         0
3/27/20         0
3/28/20         0

[71 rows x 1 columns]
covid19recovered:
                0
Province/State  0
Country/Region  0
Lat             0
Long            0
1/22/20         0
...            ..
3/24/20         0
3/25/20         0
3/26/20         0
3/27/20         0
3/28/20         0

[71 rows x 1 columns]
covid19:
                 0
SNo              0
ObservationDate  0
Province/State   0
Country/Region   0
Last Update      0
Confirmed        0
Deaths           0
Recovered        0


# Plotly Visualizations:

All records including confirmed cases, deaths and recovered:

In [9]:
# all confirmed, deaths and recovered cases

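# Select several columns with a list (double brackets); tuple-style selection on a groupby is no longer supported in recent pandas
case_growth = covid19.groupby('ObservationDate')[['Confirmed', 'Deaths', 'Recovered']].sum()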
case_growth = case_growth.reset_index()
case_growth = case_growth.sort_values('ObservationDate', ascending=False)

fig = go.Figure()
fig.update_layout(template='plotly_dark')

fig.add_trace(go.Scatter(x=case_growth['ObservationDate'], 
                        y=case_growth['Confirmed'], 
                        mode='lines+markers',
                        name='Confirmed',
                        line=dict(color='Yellow', width=2)))

fig.add_trace(go.Scatter(x=case_growth['ObservationDate'], 
                        y=case_growth['Deaths'], 
                        mode='lines+markers',
                        name='Deaths',
                        line=dict(color='red', width=2)))

fig.add_trace(go.Scatter(x=case_growth['ObservationDate'], 
                        y=case_growth['Recovered'], 
                        mode='lines+markers',
                        name='Recovered',
                        line=dict(color='green', width=2)))

fig.show()
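
Because the plotted totals are cumulative, each curve can only rise or stay flat. To see how many cases were added on each day, the cumulative series can be differenced. The sketch below uses a hypothetical three-day frame shaped like `case_growth`. The numbers are invented for illustration and are not taken from the data set.

```python
import pandas as pd

# Hypothetical cumulative totals in the same shape as case_growth
cumulative = pd.DataFrame({
    'ObservationDate': pd.to_datetime(['2020-03-28', '2020-03-26', '2020-03-27']),
    'Confirmed': [175, 100, 130],
    'Deaths': [5, 2, 3],
})

# diff() subtracts the previous row, so the rows must be in chronological order first
cumulative = cumulative.sort_values('ObservationDate')

# Daily new counts; the first day has no predecessor and therefore becomes NaN
daily_new = cumulative.set_index('ObservationDate').diff()

print(daily_new)
```

Note that `case_growth` above is sorted in descending order, so it would need an ascending sort before `diff()` is applied.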

Confirmed cases, deaths, and recoveries in all affected countries around the world:

In [8]:
# Latest confirmed totals per country, largest first
cases_temp = covid19confirmed.groupby('Country/Region')[[last_date_update]].sum()
cases_temp = cases_temp.sort_values(by=last_date_update, ascending=False)
# Column assignment aligns on the 'Country/Region' index, so the other frames need no sorting
cases_temp['Recovered'] = covid19recovered.groupby('Country/Region')[last_date_update].sum()
cases_temp['Deaths'] = covid19deaths.groupby('Country/Region')[last_date_update].sum()
cases_temp['Non_recovered'] = cases_temp[last_date_update] - cases_temp['Recovered'] - cases_temp['Deaths']
cases_temp = cases_temp.rename(columns={last_date_update: 'Confirmed'})

cases_temp.style.background_gradient(cmap='Reds')

| Country/Region | Confirmed | Recovered | Deaths | Non_recovered |
|---|---|---|---|---|
| US | 121478 | 1072 | 2026 | 118380 |
| Italy | 92472 | 12384 | 10023 | 70065 |
| China | 81999 | 75100 | 3299 | 3600 |
| Spain | 73235 | 12285 | 5982 | 54968 |
| Germany | 57695 | 8481 | 433 | 48781 |
| France | 38105 | 5724 | 2317 | 30064 |
| Iran | 35408 | 11679 | 2517 | 21212 |
| United Kingdom | 17312 | 151 | 1021 | 16140 |
| Switzerland | 14076 | 1530 | 264 | 12282 |
| Netherlands | 9819 | 6 | 640 | 9173 |
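
Absolute counts are hard to compare between countries of very different outbreak sizes. One possible follow-up, sketched here, is to express deaths and recoveries as a share of confirmed cases. The three rows are copied from the table above. The column names `Deaths_pct` and `Recovered_pct` are hypothetical additions and not part of the original report. These are simple ratios of the reported totals, not epidemiological estimates.

```python
import pandas as pd

# First three rows of the table above
summary = pd.DataFrame(
    {
        'Confirmed': [121478, 92472, 81999],
        'Recovered': [1072, 12384, 75100],
        'Deaths': [2026, 10023, 3299],
    },
    index=pd.Index(['US', 'Italy', 'China'], name='Country/Region'),
)

# Same definition as in the report: confirmed cases that neither recovered nor died
summary['Non_recovered'] = summary['Confirmed'] - summary['Recovered'] - summary['Deaths']

# Shares of confirmed cases, in percent (illustrative columns)
summary['Deaths_pct'] = (summary['Deaths'] / summary['Confirmed'] * 100).round(2)
summary['Recovered_pct'] = (summary['Recovered'] / summary['Confirmed'] * 100).round(2)

print(summary)
```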


Report in progress; more to come over the next few days...