# Brazil infected! Coronavirus (Covid-19) Situation and Prediction
### Current world situation, Brazil vs. the top 10 countries, and predicting the next 3 days of cases and deaths with polynomial regression

### Introduction

Brazil is a continental country with pleasant places to visit and happy people, and it is now, unfortunately, fighting the coronavirus. In this notebook, we look at the current world situation and the countries with the most confirmed cases, examine how cases grow after the first case appears, see how Brazil is positioned in this scenario, and predict cases and deaths in Brazil for the next 3 days using polynomial regression.

In [None]:
# installing external lib opencage
!pip install opencage

In [None]:
import pandas as pd
import numpy as np
import datetime

import folium 
from folium import plugins

import matplotlib.pyplot as plt
import seaborn as sns

import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from plotly.graph_objs import Layout  # only Layout is used unqualified below

from opencage.geocoder import OpenCageGeocode

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error, mean_absolute_error
from math import sqrt

import warnings

%config InlineBackend.figure_format = 'retina'

warnings.filterwarnings('ignore')

%matplotlib inline

In [None]:
brazil = pd.read_csv("../input/corona-virus-brazil/brazil_covid19.csv")

conf = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"
rec = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv"
dea = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv"

confirmed = pd.read_csv(conf)
recovered = pd.read_csv(rec)
deaths = pd.read_csv(dea)

brazil_date = '2020-06-04'

by_state = brazil[['cases','deaths']][brazil['date']==brazil_date].groupby(brazil['state']).max().sort_values(by = 'cases', ascending=False)

key = '5bc0901142f2404cbec4048336ecc36f'  # get api key from:  https://opencagedata.com
geocoder = OpenCageGeocode(key)

states = by_state.index
country ="Brazil"

by_state['latitude'] = 0.0000000
by_state['longitude'] = 0.0000000
by_state['Country'] = 'Brazil'

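for state in states:
    # Geocode "<state>,Brazil" and keep the first match
    query = f"{state},{country}"
    results = geocoder.geocode(query)

    # Assign by label with .loc to avoid chained-indexing pitfalls
    by_state.loc[state, 'latitude'] = results[0]['geometry']['lat']
    by_state.loc[state, 'longitude'] = results[0]['geometry']['lng']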
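
Writing the coordinates by label with `.loc` avoids pandas' chained-assignment pitfalls, and the API key does not need to live in the notebook. The sketch below is only an illustration: `lookup` is a hypothetical stand-in for the geocoder responses (its coordinates are rough placeholder values, not OpenCage output), `by_state_demo` is a toy table, and the environment variable name `OPENCAGE_KEY` is an assumption.

```python
import os

import numpy as np
import pandas as pd

# Assumption: the key is exported as OPENCAGE_KEY instead of being hard-coded
api_key = os.environ.get("OPENCAGE_KEY", "")

# Toy table indexed by state, like the grouped table in the notebook
by_state_demo = pd.DataFrame({"cases": [1000, 500]},
                             index=["São Paulo", "Rio de Janeiro"])
by_state_demo["latitude"] = np.nan
by_state_demo["longitude"] = np.nan

# Hypothetical stand-in for results[0]['geometry'] returned by the geocoder
lookup = {
    "São Paulo,Brazil": {"lat": -23.55, "lng": -46.63},
    "Rio de Janeiro,Brazil": {"lat": -22.91, "lng": -43.17},
}

for state in by_state_demo.index:
    geometry = lookup[f"{state},Brazil"]
    # Label-based assignment writes into the frame itself, never into a copy
    by_state_demo.loc[state, "latitude"] = geometry["lat"]
    by_state_demo.loc[state, "longitude"] = geometry["lng"]
```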

In [None]:
world_map = folium.Map(location=[-20, -50]
                       , zoom_start=3.5
                       , tiles='Stamen Terrain')

for lat, lon, value, name in zip(by_state['latitude']
                                 , by_state['longitude']
                                 , by_state['cases']
                                 , by_state['Country']):
    
    folium.CircleMarker([lat, lon]
                        , radius=0.00025*value
                        , popup = ('<strong>Country</strong>: ' 
                                   + str(name).capitalize() + '<br>'
                                '<strong>Confirmed Cases</strong>: ' 
                                   + str(value) + '<br>')
                        , color='red'
                        , fill_color='red'
                        , fill_opacity=0.7 ).add_to(world_map)
    
world_map


***NOTE: I will update the kernel every 3 days. Last update: 06-04-20***

##### Please let me know what you think about this kernel. If you find it useful, an upvote would be much appreciated! :)

In [None]:
brazil['date'] = pd.to_datetime(brazil['date'])

confirmed = confirmed.fillna('unknown')
recovered = recovered.fillna('unknown')
deaths = deaths.fillna('unknown')

### Current situation around the world: top 20 countries by confirmed cases

China has brought the virus under control, with only a few active cases. The US leads with more than 1.8 million active cases, and Brazil now has the second-highest number of confirmed cases in the world, with more than 614,000 cases and 34,000 registered deaths.

In [None]:
last_update = '6/4/20'
current_cases = confirmed
current_cases = current_cases[['Country/Region',last_update]]

current_cases = current_cases.groupby('Country/Region').sum().sort_values(by=last_update, ascending=False)

current_cases['recovered'] = recovered[['Country/Region',last_update]].groupby('Country/Region').sum().sort_values(by=last_update,ascending=False)

current_cases['deaths'] = deaths[['Country/Region',last_update]].groupby('Country/Region').sum().sort_values(by=last_update,ascending=False)

current_cases['active'] = current_cases[last_update]-current_cases['recovered']-current_cases['deaths']

# Only the date column needs renaming; the other columns already have their final names
current_cases = current_cases.rename(columns={last_update: 'confirmed'})

current_cases.head(20).style.background_gradient(cmap='Blues')
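
The active-case column follows from the identity active = confirmed - recovered - deaths, applied after summing each country's rows. A minimal sketch on made-up numbers (the countries `A` and `B`, the counts, and the helper `country_totals` are illustrative assumptions, not values from the JHU files):

```python
import pandas as pd

def country_totals(wide, date_col):
    """Sum one date column over all rows (provinces) of each country."""
    return wide.groupby("Country/Region")[date_col].sum()

date_col = "6/4/20"
confirmed_demo = pd.DataFrame({"Country/Region": ["A", "A", "B"], date_col: [10, 5, 7]})
recovered_demo = pd.DataFrame({"Country/Region": ["A", "A", "B"], date_col: [4, 1, 2]})
deaths_demo = pd.DataFrame({"Country/Region": ["A", "A", "B"], date_col: [1, 0, 1]})

# The three Series share the country index, so pandas aligns them by label
summary = pd.DataFrame({
    "confirmed": country_totals(confirmed_demo, date_col),
    "recovered": country_totals(recovered_demo, date_col),
    "deaths": country_totals(deaths_demo, date_col),
})
summary["active"] = summary["confirmed"] - summary["recovered"] - summary["deaths"]
summary = summary.sort_values("confirmed", ascending=False)
print(summary)
```

Because the columns are aligned on the country index, the order in which each source table is sorted does not affect the result.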

#### Growth of cases over time in the top 10 countries by confirmed cases

Confirmed cases exploded in the US, which now leads the world in number of active cases, followed by Spain, Italy and France.

In [None]:
# Brazil plus the ten countries it is compared against
top_10_names = ['Brazil', 'US', 'China', 'Italy', 'Spain', 'Germany',
                'France', 'Iran', 'United Kingdom', 'Russia', 'Turkey']
top_10_confirmed = confirmed[confirmed['Country/Region'].isin(top_10_names)]

top_10_confirmed = top_10_confirmed.groupby(top_10_confirmed['Country/Region']).sum()

top_10_confirmed = top_10_confirmed.drop(['Lat','Long'], axis = 1)
top_10_confirmed = top_10_confirmed.transpose()

In [None]:
top_10_countries = top_10_confirmed.drop('Brazil', axis = 1)

layout = Layout(
    paper_bgcolor='rgba(0,0,0,0)'
    , plot_bgcolor='rgba(0,0,0,0)'
    , title="Cases over time in top 10 countries in confirmed cases numbers"
)

index = top_10_countries.index
data = top_10_countries

fig = go.Figure(data=[
    # go.Line is deprecated; go.Scatter with mode='lines' draws the same trace
    go.Scatter(name=country, x=index, y=data[country], mode='lines')
    for country in ['US', 'China', 'Italy', 'Spain', 'Germany', 'France',
                    'Iran', 'United Kingdom', 'Russia', 'Turkey']
])

fig['layout'].update(layout)

fig.show()

### Current situation in Brazil

#### Total cases over time

In [None]:
# Sum all states per day and order by date (sorting by value breaks if a total is ever revised down)
brazil_over_time = brazil[['cases','deaths']].groupby(brazil['date']).sum().sort_index()

layout = Layout(
    paper_bgcolor='rgba(0,0,0,0)'
    , plot_bgcolor='rgba(0,0,0,0)'
)

fig = make_subplots(rows=2, cols=1
                    , subplot_titles=('Confirmed cases', 'Deaths'))

fig.add_trace(go.Scatter(name='Confirmed'
                        , x = brazil_over_time.index
                        , y = brazil_over_time['cases']
                        , mode="lines+markers")
                        , row=1, col=1)

fig.add_trace(go.Scatter(name='Deaths'
                        , x = brazil_over_time.index
                        , y = brazil_over_time['deaths']
                        , mode="lines+markers")
                        , row=2, col=1)

fig['layout'].update(layout)

fig.show()

#### Mortality rate over time

In [None]:
mortality_over_time = round((brazil_over_time['deaths']/brazil_over_time['cases'])*100,2)

layout = Layout(
    paper_bgcolor='rgba(0,0,0,0)'
    , plot_bgcolor='rgba(0,0,0,0)'
    , title="Mortality rate over the time"
)

index = mortality_over_time.index
data = mortality_over_time

fig = go.Figure(data=[
    
    go.Scatter(name='Mortality in %'
            , x = index
            , y=data
            , mode="lines+markers")
    
])

fig['layout'].update(layout)

fig.show()

#### New cases and deaths per day over time

In [None]:
# Cumulative national totals per day, ordered by date
daily_totals = brazil[['cases','deaths']].groupby(brazil['date']).sum().sort_index()

# diff() subtracts the previous day's cumulative total, so each value is
# labelled with the day on which the new cases or deaths were reported
cases_growth_rate = daily_totals[['cases']].diff().dropna()
deaths_growth_rate = daily_totals[['deaths']].diff().dropna()

layout = Layout(
    paper_bgcolor='rgba(0,0,0,0)'
    , plot_bgcolor='rgba(0,0,0,0)'
)

fig = make_subplots(rows=2, cols=1
                    , subplot_titles=('New cases per day over the time'
                                      , 'New deaths per day over the time'))

fig.add_trace(go.Scatter(name='New cases per day'
                        , x = cases_growth_rate.index
                        , y = cases_growth_rate['cases']
                        , mode="lines+markers")
                        , row=1, col=1)

fig.add_trace(go.Scatter(name='New deaths per day'
                        , x = deaths_growth_rate.index
                        , y = deaths_growth_rate['deaths']
                        , mode="lines+markers")
                        , row=2, col=1)

fig['layout'].update(layout)

fig.show()
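
Daily new counts are the first difference of the cumulative series. A minimal sketch with made-up cumulative totals (the dates and numbers are illustrative assumptions), showing that `diff()` labels each increment with the day on which it was reported:

```python
import pandas as pd

# Made-up cumulative totals for four consecutive days
cumulative = pd.Series(
    [1, 3, 6, 10],
    index=pd.to_datetime(["2020-02-26", "2020-02-27", "2020-02-28", "2020-02-29"]),
)

# diff() computes value[t] - value[t-1]; the first day has no predecessor and is dropped
new_per_day = cumulative.sort_index().diff().dropna()
print(new_per_day)
```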

#### Top 10 countries vs. Brazil since the first case appeared

Brazil is still behind every other country except Turkey in number of cases. Its cases are growing fast, though not as fast as those of Italy, Spain or the US.

In [None]:
top10_since_first_case = top_10_confirmed.reset_index()
top10_since_first_case = top10_since_first_case.drop('index',axis=1)
top10_since_first_case['Brazil'] = top10_since_first_case['Brazil'].shift(-35)
top10_since_first_case['France'] = top10_since_first_case['France'].shift(-2)
top10_since_first_case['Germany'] = top10_since_first_case['Germany'].shift(-5)
top10_since_first_case['Iran'] = top10_since_first_case['Iran'].shift(-28)
top10_since_first_case['Italy'] = top10_since_first_case['Italy'].shift(-9)
top10_since_first_case['Spain'] = top10_since_first_case['Spain'].shift(-10)
top10_since_first_case['Russia'] = top10_since_first_case['Russia'].shift(-9)
top10_since_first_case['Turkey'] = top10_since_first_case['Turkey'].shift(-49)
top10_since_first_case['United Kingdom'] = top10_since_first_case['United Kingdom'].shift(-9)
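
The hand-written shifts align each country on its first reported case. As a sketch of how those offsets could be derived from the data instead, the hypothetical helper `align_to_first_case` below drops the leading zero days of every column. The toy frame and its values are assumptions, and the approach relies on the counts being cumulative (non-decreasing).

```python
import pandas as pd

def align_to_first_case(df):
    """Re-index every column so that row 0 is the day of its first case."""
    aligned = {}
    for country in df.columns:
        series = df[country]
        # Cumulative counts never return to zero, so this keeps everything
        # from the first reported case onwards
        started = series[series > 0]
        aligned[country] = started.reset_index(drop=True)
    # Shorter columns are padded with NaN at the end
    return pd.DataFrame(aligned)

# Toy cumulative counts: country B reports its first case two days after A
wide = pd.DataFrame({"A": [1, 2, 4, 8], "B": [0, 0, 1, 3]})
aligned = align_to_first_case(wide)
print(aligned)
```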

In [None]:
# creating the plot: take logs, masking zeros as missing values since log(0) is -inf
top10_since_first_case_log = np.log(top10_since_first_case.where(top10_since_first_case > 0))

layout = Layout(
    paper_bgcolor='rgba(0,0,0,0)'
    , plot_bgcolor='rgba(0,0,0,0)'
    , title="Log Top 10 countries VS Brazil since first case appear"
)

index = top10_since_first_case_log.index
data = top10_since_first_case_log

fig = go.Figure(data=[
    # go.Line is deprecated; go.Scatter with mode='lines' draws the same trace
    go.Scatter(name=country, x=index, y=data[country], mode='lines')
    for country in ['Brazil', 'US', 'Italy', 'China', 'Spain', 'Germany',
                    'France', 'Iran', 'United Kingdom', 'Russia', 'Turkey']
])

fig['layout'].update(layout)

fig.show()
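
Taking logarithms turns exponential growth into a straight line, which is why the curves are compared on a log scale. One caveat is that `log(0)` is `-inf`. A small sketch on made-up counts, masking zeros as missing values before the transform:

```python
import numpy as np
import pandas as pd

# Made-up cumulative counts with a leading zero day
counts = pd.DataFrame({"A": [0, 1, 10]})

# where() keeps positive values and replaces the rest with NaN,
# so the log never sees a zero
log_counts = np.log(counts.where(counts > 0))
print(log_counts)
```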

#### Current situation in Brazil by state

São Paulo leads in confirmed cases, followed by Rio de Janeiro. São Paulo seems to account for a large share of Brazil's total cases; we look at this further below.

In [None]:
by_state = brazil[['cases','deaths']][brazil['date']==brazil_date].groupby(brazil['state']).max().sort_values(by = 'cases', ascending=False)

layout = Layout(
    paper_bgcolor='rgba(0,0,0,0)',
    plot_bgcolor='rgba(0,0,0,0)',
    title="Cases and Deaths by state (deaths highlighted by numbers)"
)

fig = go.Figure(data=[
    
    go.Bar(name='cases'
           , x=by_state.index
           , y=by_state['cases']),
    
    go.Bar(name='deaths'
           , x=by_state.index
           , y=by_state['deaths']
           , text=by_state['deaths']
           , textposition='outside')
])

fig.update_layout(barmode='stack')
fig['layout'].update(layout)

fig.show()

#### Mortality rate by state

In [None]:
mortality_by_state = round((by_state['deaths']/by_state['cases'])*100,2)
mortality_by_state = mortality_by_state.sort_values(ascending=False)

layout = Layout(
    paper_bgcolor='rgba(0,0,0,0)',
    plot_bgcolor='rgba(0,0,0,0)',
    title="Mortality rate by state"
)

fig = go.Figure(data=[
    
    go.Bar(x=mortality_by_state.index
           , y=mortality_by_state)
])

fig.update_layout(barmode='stack')
fig['layout'].update(layout)

fig.show()

#### Current situation in Brazil by region

In [None]:
by_region = brazil[['cases','deaths']][brazil['date']==brazil_date].groupby(brazil['region']).max().sort_values(by = 'cases', ascending=False)

fig = make_subplots(rows=1, cols=2, specs=[[{'type':'domain'}
                                            , {'type':'domain'}]])

fig.add_trace(go.Pie(labels=by_region.index
                     , values=by_region["cases"]
                     , name="Cases"),1, 1)

fig.add_trace(go.Pie(labels=by_region.index
                     , values=by_region["deaths"]
                     , name="Deaths"),1, 2)

# Use `hole` to create a donut-like pie chart
fig.update_traces(hole=.4, hoverinfo="label+percent+name")

fig.update_layout(
    
    title_text="Brazil cases and deaths situation by region",
    annotations=[dict(text='Cases', x=0.18, y=0.5, font_size=20, showarrow=False),
                 dict(text='Deaths', x=0.82, y=0.5, font_size=20, showarrow=False)])
fig.show()

#### How many of the confirmed cases and deaths are in São Paulo?

São Paulo clearly leads Brazil in confirmed cases and deaths, but how much of the national total does it account for?

In [None]:
others = by_state[by_state.index!='São Paulo'].sum()
sp = by_state[by_state.index=='São Paulo'].sum()
cases = pd.DataFrame([others, sp],columns = ['cases','deaths'], index = ['Other states','São Paulo'])


fig = make_subplots(rows=1, cols=2, specs=[[{'type':'domain'}
                                            , {'type':'domain'}]])

fig.add_trace(go.Pie(labels=cases.index
                     , values=cases["cases"]
                     , name="Cases"),1, 1)

fig.add_trace(go.Pie(labels=cases.index
                     , values=cases["deaths"]
                     , name="Deaths"),1, 2)

# Use `hole` to create a donut-like pie chart
fig.update_traces(hole=.4, hoverinfo="label+percent+name")

fig.update_layout(
    
    title_text="How much of confirmed cases and deaths cases are in São Paulo?",
    annotations=[dict(text='Cases', x=0.18, y=0.5, font_size=20, showarrow=False),
                 dict(text='Deaths', x=0.82, y=0.5, font_size=20, showarrow=False)])
fig.show()
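
The pie charts show the shares visually. The same shares can be computed as percentages directly. A minimal sketch on made-up state totals (the numbers are illustrative assumptions, not values from the dataset):

```python
import pandas as pd

# Made-up totals per state
by_state_demo = pd.DataFrame(
    {"cases": [60, 30, 10], "deaths": [6, 3, 1]},
    index=["São Paulo", "Rio de Janeiro", "Ceará"],
)

is_sp = by_state_demo.index == "São Paulo"
totals = pd.DataFrame({
    "São Paulo": by_state_demo[is_sp].sum(),
    "Other states": by_state_demo[~is_sp].sum(),
}).T

# Divide each column by its national total to get percentage shares
share_pct = (totals / totals.sum() * 100).round(1)
print(share_pct)
```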

#### Total cases in other states vs São Paulo over time

The first case appeared on 2/26/20, but cases only started to explode after 3/1/20.

In [None]:
# Daily totals for all other states and for São Paulo, ordered by date
other_states_over_time = brazil[['cases','deaths']][brazil['state']!='São Paulo'].groupby(brazil['date']).sum().sort_index()

sp_over_time = brazil[['cases','deaths']][brazil['state']=='São Paulo'].groupby(brazil['date']).sum().sort_index()

layout = Layout(
    paper_bgcolor='rgba(0,0,0,0)'
    , plot_bgcolor='rgba(0,0,0,0)'
    , title="Total cases in other states vs São Paulo over the time"
)

fig = go.Figure(data=[
    
    go.Scatter(name='Other states'
            , x=other_states_over_time.index
            , y=other_states_over_time['cases']
            , mode="lines+markers")
    , go.Scatter(name='São Paulo'
              , x=sp_over_time.index
              , y=sp_over_time['cases']
              , mode="lines+markers")
    
])

fig.update_layout(barmode='stack')
fig['layout'].update(layout)

fig.show()

#### Total deaths in other states vs São Paulo over time

The first death occurred on 3/17/20, and deaths have continued to grow fast.

In [None]:
layout = Layout(
    paper_bgcolor='rgba(0,0,0,0)'
    , plot_bgcolor='rgba(0,0,0,0)'
    , title="Deaths in other states vs São Paulo over the time"
)

fig = go.Figure(data=[
    
    go.Scatter(name='Other states'
            , x=other_states_over_time.index
            , y=other_states_over_time['deaths']
            , mode="lines+markers")
    , go.Scatter(name='São Paulo'
              , x=sp_over_time.index
              , y=sp_over_time['deaths']
              , mode="lines+markers")
    
])

fig.update_layout(barmode='stack')
fig['layout'].update(layout)

fig.show()

### Preparing data for modeling

To model confirmed cases and deaths, we take the cases and deaths recorded since the first case appeared, convert the data into single-column arrays, split each series into train and test sets (`X_train`/`X_test` for cases, `X_train_death`/`X_test_death` for deaths), and expand the day index into polynomial features. Every 3 days we add new data to make a new prediction.

In [None]:
# Taking confirmed cases since the first case appeared on 2/26/2020
cases = brazil['cases'].groupby(brazil['date']).sum().sort_index()
cases = cases[cases>0].reset_index().drop('date',axis=1)

# Taking deaths since the first death
deaths = brazil['deaths'].groupby(brazil['date']).sum().sort_index()
deaths = deaths[deaths>0].reset_index().drop('date',axis=1)

# extend both slices by 3 days on each update
cases = cases[0:100]
deaths = deaths[0:80]

In [None]:
# Converting our data into column arrays of shape (n, 1)
days_since_first_case = np.arange(len(cases.index)).reshape(-1, 1)
brazil_cases = np.array(cases).reshape(-1, 1)

days_since_first_death = np.arange(len(deaths.index)).reshape(-1, 1)
brazil_deaths = np.array(deaths).reshape(-1, 1)

In [None]:
# Preparing indexes to predict the next 3 days
days_in_future = 3
future_forcast = np.arange(len(cases.index) + days_in_future).reshape(-1, 1)
adjusted_dates = future_forcast[:-days_in_future]

future_forcast_deaths = np.arange(len(deaths.index) + days_in_future).reshape(-1, 1)
adjusted_dates_deaths = future_forcast_deaths[:-days_in_future]

In [None]:
#Splitting data into train and test to evaluate our model
X_train, X_test, y_train, y_test = train_test_split(days_since_first_case
                                                    , brazil_cases
                                                    , test_size= 10
                                                    , shuffle=False
                                                    , random_state = 42) 

X_train_death, X_test_death, y_train_death, y_test_death = train_test_split(days_since_first_death
                                                    , brazil_deaths
                                                    , test_size= 10
                                                    , shuffle=False
                                                    , random_state = 42) 
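
For a time series, the test set should be the most recent observations, which is what `shuffle=False` achieves. As a sketch, the same chronological split can be written with plain slicing (the toy series and the names `X_train_demo` and `X_test_demo` are assumptions for illustration):

```python
import numpy as np

# Toy series: 20 days, column-shaped like the notebook's arrays
days = np.arange(20).reshape(-1, 1)
values = (days.ravel() ** 2).reshape(-1, 1)

test_size = 5  # hold out the last 5 days

# Everything before the cut is training data, the tail is test data
X_train_demo, X_test_demo = days[:-test_size], days[-test_size:]
y_train_demo, y_test_demo = values[:-test_size], values[-test_size:]
print(X_test_demo.ravel())
```

Since nothing is shuffled, no random seed is involved in this kind of split.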

### Modeling

Now let's feed the prepared data into `PolynomialFeatures` and transform it for use in a regression later. To find the best degree for `PolynomialFeatures`, we use a simple loop that evaluates the RMSE on the test set for each degree and selects the best one for the current prediction.

In [None]:
# looking for the best degree for cases
rmse = float('inf')  # start from infinity so that any real RMSE improves on it
degree = 0
for i in range(101):
    # Transform our cases data for polynomial regression
    poly = PolynomialFeatures(degree=i)
    poly_X_train = poly.fit_transform(X_train)
    poly_X_test = poly.transform(X_test)  # reuse the fitted transformer
    poly_future_forcast = poly.transform(future_forcast)

    # polynomial regression cases
    # (normalize=True is dropped: it was ignored with fit_intercept=False and has been removed from scikit-learn)
    linear_model = LinearRegression(fit_intercept=False)
    linear_model.fit(poly_X_train, y_train)
    test_linear_pred = linear_model.predict(poly_X_test)
    linear_pred = linear_model.predict(poly_future_forcast)

    # evaluating with RMSE
    rm = sqrt(mean_squared_error(y_test, test_linear_pred))
    if rm < rmse:
        rmse = rm
        degree = i

print('the best RMSE is:', round(rmse, 2))
print('the best degree for cases is:', degree)
In [None]:
# looking for the best degree for deaths
rmse = float('inf')  # start from infinity so that any real RMSE improves on it
degree = 0
for i in range(101):
    # Transform our death data for polynomial regression
    poly_death = PolynomialFeatures(degree=i)
    poly_X_train_death = poly_death.fit_transform(X_train_death)
    poly_X_test_death = poly_death.transform(X_test_death)  # reuse the fitted transformer
    poly_future_forcast_death = poly_death.transform(future_forcast_deaths)

    # polynomial regression deaths
    # (normalize=True is dropped: it was ignored with fit_intercept=False and has been removed from scikit-learn)
    linear_model_death = LinearRegression(fit_intercept=False)
    linear_model_death.fit(poly_X_train_death, y_train_death)
    test_linear_pred_death = linear_model_death.predict(poly_X_test_death)
    linear_pred_death = linear_model_death.predict(poly_future_forcast_death)

    # evaluating with RMSE
    rm = sqrt(mean_squared_error(y_test_death, test_linear_pred_death))
    if rm < rmse:
        rmse = rm
        degree = i

print('the best RMSE is:', round(rmse, 2))
print('the best degree for deaths is:', degree)
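
The degree search can also be packaged as a small function around a scikit-learn pipeline, which keeps the feature expansion and the regression together. This is a minimal sketch, not the notebook's exact procedure: the helper name `best_polynomial_degree`, the synthetic noise-free quadratic series, and the candidate degrees 1 to 5 are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

def best_polynomial_degree(X_train, y_train, X_test, y_test, degrees):
    """Return (degree, rmse) with the lowest RMSE on the held-out data."""
    best_degree, best_rmse = None, float("inf")
    for d in degrees:
        # The pipeline fits the feature expansion on the training data only
        model = make_pipeline(PolynomialFeatures(degree=d), LinearRegression())
        model.fit(X_train, y_train)
        rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
        if rmse < best_rmse:
            best_degree, best_rmse = d, rmse
    return best_degree, best_rmse

# Synthetic quadratic series; the last 5 days are held out
days = np.arange(30).reshape(-1, 1)
values = 3 * days.ravel() ** 2 + 2 * days.ravel() + 1

best_degree, best_rmse = best_polynomial_degree(
    days[:-5], values[:-5], days[-5:], values[-5:], range(1, 6)
)
print(best_degree, best_rmse)
```

Keeping the candidate degrees low is a deliberate choice in this sketch: very high degrees tend to fit the training window closely but extrapolate poorly.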

***Perfect!*** Now that we have the best degrees for the case and death predictions, let's plug them into `PolynomialFeatures` again and transform our data for polynomial regression.

In [None]:
# Transform our cases data for polynomial regression
poly = PolynomialFeatures(degree=7)
poly_X_train = poly.fit_transform(X_train)
poly_X_test = poly.transform(X_test)  # fit on the training data only
poly_future_forcast = poly.transform(future_forcast)

# Transform our death data for polynomial regression
poly_death = PolynomialFeatures(degree=6)
poly_X_train_death = poly_death.fit_transform(X_train_death)
poly_X_test_death = poly_death.transform(X_test_death)  # fit on the training data only
poly_future_forcast_death = poly_death.transform(future_forcast_deaths)

#### Training, predicting and evaluating polynomial regression on confirmed cases

In [None]:
# polynomial regression cases
# (normalize=True is dropped: it was ignored with fit_intercept=False and has been removed from scikit-learn)
linear_model = LinearRegression(fit_intercept=False)
linear_model.fit(poly_X_train, y_train)
test_linear_pred = linear_model.predict(poly_X_test)
linear_pred = linear_model.predict(poly_future_forcast)

# evaluating with RMSE
print('RMSE:', sqrt(mean_squared_error(y_test, test_linear_pred)))
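
RMSE and MAE are easy to mix up, so here is a short worked sketch of both on made-up numbers (the arrays are illustrative assumptions). RMSE squares the errors before averaging and therefore penalises large misses more than MAE does.

```python
import numpy as np

# Made-up observed and predicted counts
actual = np.array([100.0, 110.0, 120.0])
predicted = np.array([98.0, 113.0, 120.0])

errors = predicted - actual               # [-2, 3, 0]
rmse = np.sqrt(np.mean(errors ** 2))      # square root of the mean squared error
mae = np.mean(np.abs(errors))             # mean of the absolute errors
print(rmse, mae)
```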

In [None]:
plt.figure(figsize=(12,7))

plt.plot(y_test, label = "Real cases")
plt.plot(test_linear_pred, label = "Predicted")
plt.title("Predicted vs Real cases", size = 20)
plt.xlabel('Days', size = 15)
plt.ylabel('Cases', size = 15)
plt.xticks(size=12)
plt.yticks(size=12)

# defining legend config
plt.legend(loc = "upper left"
           , frameon = True
           , ncol = 2 
           , fancybox = True
           , framealpha = 0.95
           , shadow = True
           , borderpad = 1
           , prop={'size': 15});

In [None]:
plt.figure(figsize=(16, 9))

plt.plot(adjusted_dates
         , brazil_cases
         , label = "Real cases")

plt.plot(future_forcast
         , linear_pred
         , label = "Polynomial Regression Predictions"
         , linestyle='dashed'
         , color='orange')

plt.title('Cases in Brazil over the time: Predicting Next 3 days', size=30)
plt.xlabel('Days Since 2/26/20', size=30)
plt.ylabel('Cases', size=30)
plt.xticks(size=20)
plt.yticks(size=20)

plt.axvline(len(X_train), color='black'
            , linestyle="--"
            , linewidth=1)

plt.text(18, 5000
         , "model training"
         , size = 15
         , color = "black")

plt.text((len(X_train)+0.2), 15000
         , "prediction"
         , size = 15
         , color = "black")

# defining legend config
plt.legend(loc = "upper left"
           , frameon = True
           , ncol = 2 
           , fancybox = True
           , framealpha = 0.95
           , shadow = True
           , borderpad = 1
           , prop={'size': 15})

plt.show();

#### Last predictions

Below you can see the predictions from the last days vs real cases

In [None]:
#brazil['cases'].groupby(brazil['date']).sum().sort_values(ascending=True)[50:]

In [None]:
last_predictions = pd.DataFrame([11358,12955,14842
                                 ,17699,19662,21786
                                 ,22927,25172,27576
                                ,29127,31544,34091
                                ,36399,38975,41622
                                ,43905,43959,50150
                                ,53847,57243,60754
                                ,64343,68257,72326
                                ,80683,87617,95599
                                ,105137,111865,117749
                                ,123360,130245,137383
                                ,155546,166505,178187
                                ,184195,196411,209366
                                ,218811,231836,245470
                                ,254394,268675,283579
                                ,305524,323032,341383
                                ,344890,366454,389838
                                ,416676,437880,459836
                                ,488148,513437,539800
                                ,570328,598446,627637]
                                , columns = ['Predicted']
                                , index = ['04/06/20','04/07/20','04/08/20'
                                           ,'04/09/20','04/10/20','04/11/20'
                                          ,'04/12/20','04/13/20','04/14/20'
                                          ,'04/15/20','04/16/20','04/17/20'
                                          ,'04/18/20','04/19/20','04/20/20'
                                          ,'04/21/20','04/22/20','04/23/20'
                                          ,'04/24/20','04/25/20','04/26/20'
                                          ,'04/27/20','04/28/20','04/29/20'
                                          ,'04/30/20','05/01/20','05/02/20'
                                          ,'05/03/20','05/04/20','05/05/20'
                                          ,'05/06/20','05/07/20','05/08/20'
                                          ,'05/09/20','05/10/20','05/11/20'
                                          ,'05/12/20','05/13/20','05/14/20'
                                          ,'05/15/20','05/16/20','05/17/20'
                                          ,'05/18/20','05/19/20','05/20/20'
                                          ,'05/21/20','05/22/20','05/23/20'
                                          ,'05/24/20','05/25/20','05/26/20'
                                          ,'05/27/20','05/28/20','05/29/20'
                                          ,'05/30/20','05/31/20','06/01/20'
                                          ,'06/02/20','06/03/20','06/04/20'])

last_predictions['Real cases'] = [12056,13717,15927
                                  ,17857,19638,20727
                                  ,22169,23430,25262
                                 ,28320,30425,33682
                                 ,36599,38654,40581
                                 ,43079,45757,49492
                                 ,52995,58509,61888
                                 ,66501,71886,78162
                                 ,85380,91589,96559
                                 ,101147,107780,114715
                                 ,125128,135106,145328
                                 ,155939,162699,168331
                                 ,177589,188974,202918
                                 ,218223,233142,241080
                                 ,254220,271628,291579
                                 ,310087,330890,347398
                                 ,363211,374898,391222
                                 ,411821,438238,465166
                                 ,498440,514200,526447
                                 ,555383,584016,614941]

layout = Layout(
    paper_bgcolor='rgba(0,0,0,0)'
    , plot_bgcolor='rgba(0,0,0,0)'
    , title="Last predictions vs Real cases"
)

fig = go.Figure(data=[
    
    go.Scatter(name='Predicted'
            , x=last_predictions.index
            , y=last_predictions['Predicted']
            , mode="lines+markers")
    , go.Scatter(name='Real cases'
              , x=last_predictions.index
              , y=last_predictions['Real cases']
              , mode="lines+markers")
    
])

fig.update_layout(barmode='stack')
fig['layout'].update(layout)

fig.show()

#### Cases prediction for next 3 days

In [None]:
# NumPy expects the lowercase dtype name 'int64'; 'Int64' is the pandas nullable type
pd.DataFrame(linear_pred[len(cases):].astype('int64'), columns = ['Predicted'], index = ['06-05-20','06-06-20','06-07-20']).style.background_gradient(cmap='Blues')
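
The forecast dates in the table are typed by hand on every update. As a sketch, they could instead be generated from the last observed date (the variable names `last_observed`, `days_ahead` and `labels` are assumptions introduced for this example):

```python
import pandas as pd

last_observed = pd.Timestamp("2020-06-04")  # date of the last update
days_ahead = 3

# Consecutive calendar days starting the day after the last observation
forecast_dates = pd.date_range(
    last_observed + pd.Timedelta(days=1), periods=days_ahead, freq="D"
)
labels = forecast_dates.strftime("%m-%d-%y").tolist()
print(labels)
```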

#### Training, predicting and evaluating polynomial regression on deaths

In [None]:
# polynomial regression deaths
# (normalize=True is dropped: it was ignored with fit_intercept=False and has been removed from scikit-learn)
linear_model_death = LinearRegression(fit_intercept=False)
linear_model_death.fit(poly_X_train_death, y_train_death)
test_linear_pred_death = linear_model_death.predict(poly_X_test_death)
linear_pred_death = linear_model_death.predict(poly_future_forcast_death)

# evaluating with RMSE
print('RMSE:', sqrt(mean_squared_error(y_test_death, test_linear_pred_death)))

In [None]:
plt.figure(figsize=(12,7))

plt.plot(test_linear_pred_death, label = "Predicted")
plt.plot(y_test_death, label = "Real deaths")
plt.title("Predicted vs Real deaths", size = 20)
plt.xlabel('Days', size = 15)
plt.ylabel('Deaths', size = 15)
plt.xticks(size=12)
plt.yticks(size=12)

# defining legend config
plt.legend(loc = "upper left"
           , frameon = True
           , ncol = 2 
           , fancybox = True
           , framealpha = 0.95
           , shadow = True
           , borderpad = 1
           , prop={'size': 15});

In [None]:
plt.figure(figsize=(16, 9))

plt.plot(adjusted_dates_deaths
         , brazil_deaths
         , label = "Real deaths")

plt.plot(future_forcast_deaths
         , linear_pred_death
         , label = "Polynomial Regression Predictions"
         , linestyle='dashed'
         , color='red')

plt.title('Deaths in Brazil over the time: Predicting Next 3 days', size=30)
plt.xlabel('Days Since 03/17/20', size=30)
plt.ylabel('Deaths', size=30)
plt.xticks(size=20)
plt.yticks(size=20)

plt.axvline(len(X_train_death), color='black'
            , linestyle="--"
            , linewidth=1)

plt.text(10, 200
         , "model training"
         , size = 15
         , color = "black")

plt.text((len(X_train_death)+0.2), 600
         , "prediction"
         , size = 15
         , color = "black")

# defining legend config
plt.legend(loc = "upper left"
           , frameon = True
           , ncol = 2 
           , fancybox = True
           , framealpha = 0.95
           , shadow = True
           , borderpad = 1
           , prop={'size': 15})

plt.show();

#### Last predictions

Below you can see the predictions from the last days vs real deaths

In [None]:
#brazil['deaths'].groupby(brazil['date']).sum().sort_values(ascending=True)[47:]

In [None]:
last_predictions = pd.DataFrame([574,678,799
                                 ,876,998,1132
                                ,1227,1375,1534
                                ,1507,1633,1765
                                ,2384,2616,2862
                                ,2961,3215,3483
                                ,3385,3603,3827
                                ,4390,4688,4999
                                ,5586,6028,6510
                                ,7482,8083,8706
                                ,7949,8343,8746
                                ,10254,10861,11492
                                ,12173,12879,13614
                                ,14494,15279,16092
                                ,17339,18279,19256
                                ,19953,20946,21972
                                ,21952,23010,24102
                                ,26532,27741,28987
                                ,28446,29469,30495
                                ,31199,32548,34021]
                                , columns = ['Predicted']
                                , index = ['04/06/20','04/07/20','04/08/20'
                                           ,'04/09/20','04/10/20','04/11/20'
                                          ,'04/12/20','04/13/20','04/14/20'
                                          ,'04/15/20','04/16/20','04/17/20'
                                          ,'04/18/20','04/19/20','04/20/20'
                                          ,'04/21/20','04/22/20','04/23/20'
                                          ,'04/24/20','04/25/20','04/26/20'
                                          ,'04/27/20','04/28/20','04/29/20'
                                          ,'04/30/20','05/01/20','05/02/20'
                                          ,'05/03/20','05/04/20','05/05/20'
                                          ,'05/06/20','05/07/20','05/08/20'
                                          ,'05/09/20','05/10/20','05/11/20'
                                          ,'05/12/20','05/13/20','05/14/20'
                                          ,'05/15/20','05/16/20','05/17/20'
                                          ,'05/18/20','05/19/20','05/20/20'
                                          ,'05/21/20','05/22/20','05/23/20'
                                          ,'05/24/20','05/25/20','05/26/20'
                                          ,'05/27/20','05/28/20','05/29/20'
                                          ,'05/30/20','05/31/20','06/01/20'
                                          ,'06/02/20','06/03/20','06/04/20'])

# real cumulative deaths for the same dates (the column keeps its original name, 'Real cases')
last_predictions['Real cases'] = [553,667,800
                                  ,941,1056,1124
                                 ,1223,1328,1534
                                 ,1736,1924,2141
                                 ,2347,2462,2575
                                 ,2741,2906,3313
                                 ,3670,4016,4205
                                 ,4543,5017,5466
                                 ,5901,6329,6724
                                 ,7025,7321,7921
                                 ,8536,9146,9897
                                 ,10627,11123,11519
                                 ,12400,13149,13993
                                 ,14817,15633,16118
                                 ,16792,17971,18859
                                 ,20047,21048,22013
                                 ,22666,23473,24512
                                 ,25598,26754,27878
                                 ,28834,29314,29937
                                 ,33754,35094,36460]

layout = Layout(
    paper_bgcolor='rgba(0,0,0,0)'
    , plot_bgcolor='rgba(0,0,0,0)'
    , title="Last predictions vs Real deaths"
)

# go.Scatter with mode="lines+markers" replaces the deprecated go.Line
fig = go.Figure(data=[
    
    go.Scatter(name='Predicted'
               , x=last_predictions.index
               , y=last_predictions['Predicted']
               , mode="lines+markers")
    , go.Scatter(name='Real deaths'
                 , x=last_predictions.index
                 , y=last_predictions['Real cases']
                 , mode="lines+markers")
    
])

fig['layout'].update(layout)

fig.show()
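
To put a number on the gap between the two lines in the chart, one option is to compute the absolute and percentage error per day. The sketch below uses only the first three rows of the table above (which stores the real values in a column named `'Real cases'`); the names `sample`, `abs_error`, `pct_error` and `mape` are assumptions introduced for illustration.

```python
import pandas as pd

# first three rows of the predictions table (04/06/20 to 04/08/20)
sample = pd.DataFrame(
    {'Predicted': [574, 678, 799], 'Real deaths': [553, 667, 800]},
    index=['04/06/20', '04/07/20', '04/08/20'],
)

# absolute error and error relative to the real value, in percent
sample['abs_error'] = (sample['Predicted'] - sample['Real deaths']).abs()
sample['pct_error'] = 100 * sample['abs_error'] / sample['Real deaths']

# mean absolute percentage error over the sample
mape = sample['pct_error'].mean()

print(sample.round(2))
print('MAPE (%):', round(mape, 2))
```

The same two lines of arithmetic could be applied to the full `last_predictions` table to see on which dates the forecasts drifted furthest from the reported deaths.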

#### Deaths prediction for the next 3 days

In [None]:
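# forecast for the three days after the last observed date
# (lowercase 'int64' is the NumPy dtype; 'Int64' is rejected by recent NumPy versions)
pd.DataFrame(linear_pred_death[len(deaths):].astype('int64')
             , columns = ['Predicted']
             , index = ['06-05-20','06-06-20','06-07-20']).style.background_gradient(cmap='Reds')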
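
For readers who want to see the forecasting step in isolation, here is a minimal sketch of a polynomial regression fitted on day indices and extended three days ahead. It runs on a synthetic quadratic series; the degree (2) and the names `days`, `series`, `future_days` and `forecast` are assumptions for illustration, not the settings used in this notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# synthetic cumulative series: exactly quadratic in the day index
days = np.arange(20).reshape(-1, 1)
series = 2 * days.ravel() ** 2 + 3 * days.ravel() + 5

# expand the day index into polynomial terms (1, x, x^2)
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(days)

# ordinary linear model on the polynomial terms; the bias column
# from PolynomialFeatures already plays the role of the intercept
model = LinearRegression(fit_intercept=False)
model.fit(X_poly, series)

# extend the day index three days past the last observation
future_days = np.arange(20, 23).reshape(-1, 1)
forecast = model.predict(poly.transform(future_days))

print(forecast.round().astype('int64'))
```

On real data the fit will not be exact, and a polynomial extrapolates poorly far beyond the training range, which is one reason to keep the horizon as short as three days.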

#### Thank you very much for reading this kernel. If you found it useful, please give it an upvote so I can continue to create quality content for all Kagglers :)

### References

https://www.kaggle.com/elloaguedes/panorama-do-covid-19-no-brasil

https://www.kaggle.com/dferhadi/covid-19-predictions-growth-factor-and-calculus/comments#783008

https://www.kaggle.com/therealcyberlord/coronavirus-covid-19-visualization-prediction

https://www.kaggle.com/pedrohenriquecardoso/predicting-covid-cases-with-lowest-rmse-possible