# Analysis COVID-19 Brazil

This notebook analyzes COVID-19 contagion data for Brazil, compares it with other countries, and tries to predict the peak of the contagion curve.

Keep in mind that the data used here comes from the Ministry of Health, and the accuracy of that data is disputed.

This analysis was completed on 07/23/2020.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

plt.style.use('ggplot')

df = pd.read_csv('/kaggle/input/corona-virus-report/covid_19_clean_complete.csv')
df.head()



# Brazil

First, we will analyze Brazil separately from other countries: contagion, curve behavior, and related rates.

We select only the rows for Brazil. These are the last 5 rows of the dataset for Brazil.

In [None]:
country='Brazil'
# Note: despite its name, italydf holds the rows for `country` (Brazil in this section)
italydf = df[df['Country/Region'] == country]
italydf.tail()

## Main Graph

Here we have the main graph, with the number of confirmed cases, recoveries, and deaths over time. The death curve looks flat compared to the confirmed curve, but that is only because the much larger confirmed counts set the scale of the graph. We will see that there is reason for concern.

In [None]:
#italydf = italydf.loc[28:35538]

italydf.index = italydf['Date']



spacing =2

fig, ax = plt.subplots(figsize=(15,7))

ax.plot(italydf.index, italydf['Confirmed'], 'b', label='Confirmed')
ax.plot(italydf.index, italydf['Deaths'], 'r', label='Deaths')
ax.plot(italydf.index, italydf['Recovered'], 'g', label='Recovered')
ax.plot(italydf.index, italydf['Confirmed'] - (italydf['Recovered']+italydf['Deaths']), label='Active Cases')
ax.legend(loc=2)
ax.set_xticks(np.arange(len(italydf.index)))
ax.set_xticklabels([c for c in italydf.index])


for label in ax.xaxis.get_ticklabels()[::spacing]:
    label.set_visible(False)
    
plt.xticks(rotation=60)

plt.title('Brazil')

plt.show()


This is the same graph as above, but with **normalized** data. Normalizing puts all series on a common scale, which makes the comparison fairer and easier (more about normalization: https://medium.com/@urvashilluniya/why-data-normalization-is-necessary-for-machine-learning-models-681b65a05029).

Here, each series was divided by its highest value, so all data range from 0 to 1 (or 0 to 100%). This lets us see the behavior of each curve until it reaches its maximum (1).
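As a small illustration of the divide-by-maximum normalization described above, here is a minimal sketch. The `toy` frame and its numbers are made up for the example and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical cumulative counts, for illustration only
toy = pd.DataFrame({
    'Confirmed': [10, 50, 200, 400],
    'Deaths': [0, 2, 8, 20],
})

# Divide each column by its own maximum so every series ends at 1
normalized = toy / toy.max()
print(normalized)
```

After this step both columns share the same 0 to 1 scale, so the shape of the curves can be compared even though the raw counts differ by an order of magnitude.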

In this graph we see that Deaths grow at a lower rate than Confirmed and Recovered, whose growth is closer to exponential. Fortunately, Recovered is growing at a higher rate than Confirmed. Note that Brazil has not reached its peak yet, so this behavior may change.

To help read this graph, the plot below it shows x raised to different powers: after normalization, higher powers bend toward the right and lower powers toward the left.


It is important to know that, for **x > 1**, **x ^ m** grows faster than **x ^ n** if **m > n**.

In [None]:
spacing =2

country='Brazil'
italydf = df[df['Country/Region'] == country]
italydf.index = italydf['Date']

fig, ax = plt.subplots(figsize=(15,7))

ax.plot(italydf.index, italydf['Confirmed']/italydf['Confirmed'].max(), 'b', label='Confirmed')
ax.plot(italydf.index, italydf['Deaths']/italydf['Deaths'].max(), 'r', label='Deaths')
ax.plot(italydf.index, italydf['Recovered']/italydf['Recovered'].max(), 'g', label='Recovered')
ax.plot(italydf.index, italydf['Active']/italydf['Active'].max(), label='Active')
ax.legend(loc=2)
ax.set_xticks(np.arange(len(italydf.index)))
ax.set_xticklabels([c for c in italydf.index])


for label in ax.xaxis.get_ticklabels()[::spacing]:
    label.set_visible(False)
    
plt.xticks(rotation=60)

plt.title('Data from Brazil normalized between 0 and 1')

plt.show()

x = list(range(100))

x2 = [c**2 for c in x]
x3 = [c**3 for c in x]
xh = [c**0.2 for c in x]

index = list(range(100))


plt.subplots(figsize=(15,7))
plt.plot(index, [c/max(x) for c in x], label='x')
plt.plot(index,[c/max(x2) for c in x2], label='x^2')
plt.plot(index, [c/max(x3) for c in x3], label='x^3')
plt.plot(index, [c/max(xh) for c in xh], label='x^0.2')
plt.legend(loc=2)
plt.title('Functions of X normalized between 0 and 1')
plt.show()


Here we have the graph of the difference between each day and the previous one, that is, how many deaths, confirmed cases, and recoveries were added each day. Below it is the same graph, normalized.

In the normalized graph we see that, in the last few days, the growth of deaths per day has been slowing, while confirmed cases per day have reached a new maximum. Deaths per day also seem not to have exceeded their earlier maximum (1), which suggests, for now, that they may be about to decrease and flatten the curve. But that can still change.

Below that is a chart that is also normalized, but smoothed with a 7-day moving average.

Recoveries per day had abnormal peaks, so we removed the two largest values from the normalized graphs, treating them as *outliers*.
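The daily-difference, outlier-removal, and moving-average steps can be sketched on a toy series. This is a variation under stated assumptions rather than the notebook's exact code: the counts are hypothetical, only one spike is removed, the spike is replaced with `NaN` instead of being dropped (so every value stays on its own day), and `min_periods=1` is an added choice so the early days are not lost.

```python
import numpy as np
import pandas as pd

# Hypothetical cumulative 'Recovered' counts with one abnormal jump on day 5
recovered = pd.Series([0, 10, 25, 45, 70, 1070, 1100, 1135, 1175, 1220, 1270])

daily = recovered.diff()                       # new recoveries per day
masked = daily.copy()
masked.loc[masked.idxmax()] = np.nan           # blank out the largest spike, keep its position
smoothed = masked.rolling(7, min_periods=1).mean()  # 7-day moving average, skipping NaN
normalized = smoothed / smoothed.max()         # rescale so the maximum is 1
print(normalized)
```

Masking keeps the series the same length as the date index, so it can be plotted against the original dates without slicing the index.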

In [None]:
spacing =2

country='Brazil'
italydf = df[df['Country/Region'] == country]
italydf.index = italydf['Date']

fig, ax = plt.subplots(figsize=(15,7))

ax.plot(italydf.index, italydf['Confirmed']-italydf['Confirmed'].shift(1), 'b', label='+ Confirmed p/day')
ax.plot(italydf.index, italydf['Recovered']-italydf['Recovered'].shift(1), 'g', label='+ Recovered p/day')
ax.plot(italydf.index, italydf['Deaths']-italydf['Deaths'].shift(1), 'r', label='+ Deaths p/day')
ax.legend(loc=2)
ax.set_xticks(np.arange(len(italydf.index)))
ax.set_xticklabels([c for c in italydf.index])


for label in ax.xaxis.get_ticklabels()[::spacing]:
    label.set_visible(False)
    
plt.xticks(rotation=60)

plt.title('Brazil')

plt.show()


spacing =2

fig, ax = plt.subplots(figsize=(15,7))

recuperados = (italydf['Recovered']-italydf['Recovered'].shift(1))/(italydf['Recovered']-italydf['Recovered'].shift(1)).max()
recuperados = recuperados.drop(recuperados.idxmax())
recuperados = recuperados.drop(recuperados.idxmax())
recuperados = recuperados/recuperados.max()

ax.plot(italydf.index, (italydf['Confirmed']-italydf['Confirmed'].shift(1))/(italydf['Confirmed']-italydf['Confirmed'].shift(1)).max(), 'b', label='+ Confirmed p/day')
# Plot against the series' own dates so the dropped outliers do not shift the curve
ax.plot(recuperados.index, recuperados, 'g', label='+ Recovered p/day')
ax.plot(italydf.index, (italydf['Deaths']-italydf['Deaths'].shift(1))/(italydf['Deaths']-italydf['Deaths'].shift(1)).max(), 'r', label='+ Deaths p/day')
ax.legend(loc=2)
ax.set_xticks(np.arange(len(italydf.index)))
ax.set_xticklabels([c for c in italydf.index])


for label in ax.xaxis.get_ticklabels()[::spacing]:
    label.set_visible(False)
    
plt.xticks(rotation=60)

plt.title('Normalized')

plt.show()

spacing =2

fig, ax = plt.subplots(figsize=(15,7))

conf = (italydf['Confirmed']-italydf['Confirmed'].shift(1)).rolling(7).mean()
deaths = (italydf['Deaths']-italydf['Deaths'].shift(1)).rolling(7).mean()
recuperados = (italydf['Recovered']-italydf['Recovered'].shift(1))
recuperados = recuperados.drop(recuperados.idxmax())
recuperados = recuperados.drop(recuperados.idxmax())
recuperados = recuperados/recuperados.max()
recuperados = recuperados.rolling(7).mean()

ax.plot(italydf.index, conf/conf.max(), 'b', label='+ Confirmed p/day')
# Plot against the series' own dates so the dropped outliers do not shift the curve
ax.plot(recuperados.index, recuperados/recuperados.max(), 'g', label='+ Recovered p/day')
ax.plot(italydf.index, deaths/deaths.max() , 'r', label='+ Deaths p/day')
ax.legend(loc=2)
ax.set_xticks(np.arange(len(italydf.index)))
ax.set_xticklabels([c for c in italydf.index])


for label in ax.xaxis.get_ticklabels()[::spacing]:
    label.set_visible(False)
    
plt.xticks(rotation=60)

plt.title('Normalized with Moving Average')

plt.show()

## Rates

Here we will analyze the various rates related to the contagion of the virus, such as mortality rate over time, percentage of contagion, etc.

Below is the death rate (Deaths / Confirmed) for the last 5 days.

In [None]:
death_ratio = italydf['Deaths']/italydf['Confirmed']

print('Death rate of the last 5 days')
print(death_ratio.tail())

In [None]:
import datetime as dt




In this graph we see that the death rate (Deaths / Confirmed) has varied over time, reaching a maximum of about 7%; since then it has fallen, and today it is around 4%.

The death rate has been on a downward trend for a few weeks now. This is a good sign, and the trend looks like it will continue for a while, but we cannot be sure.

This graph tells us that predicting the number of deaths is very difficult, perhaps even impossible, because the rate varies over time. So we will not predict deaths with the same method we use to predict confirmed cases.

In [None]:
italydf = df[df['Country/Region'] == country]
italydf = italydf[italydf['Confirmed']>=1]
italydf.index = italydf['Date']

death_ratio = italydf['Deaths']/italydf['Confirmed']

fig, ax = plt.subplots(figsize=(15,7))

ax.plot(italydf.index, death_ratio, label = 'Deaths p/ Confirmed')


ax.set_xticks(np.arange(len(italydf.index)))
ax.set_xticklabels([c for c in italydf.index])


spacing = 2
for label in ax.xaxis.get_ticklabels()[::spacing]:
    label.set_visible(False)
    
plt.xticks(rotation=60)

plt.title('Death Rate over days')

ax.legend(loc=2)

plt.show()

In [None]:
italydf = df[df['Country/Region'] == country]
italydf = italydf[italydf['Confirmed']>=1]
italydf.index = italydf['Date']

recovered_ratio = italydf['Recovered']/italydf['Confirmed']

fig, ax = plt.subplots(figsize=(15,7))

ax.plot(italydf.index, recovered_ratio, label = 'Recovered p/ Confirmed')


ax.set_xticks(np.arange(len(italydf.index)))
ax.set_xticklabels([c for c in italydf.index])


spacing = 2
for label in ax.xaxis.get_ticklabels()[::spacing]:
    label.set_visible(False)
    
plt.xticks(rotation=60)

plt.title('Recovered Rate over the days')

ax.legend(loc=2)

plt.show()

Below we will see the daily percentage growth for deaths, confirmed cases, and recoveries.

The following calculation was made:

        ( (Ncases_t1 - Ncases_t0) / Ncases_t0) x 100


This is computed for each day, where **t0** is the previous day and **t1** is the current day.

The beginning of the graph is full of extreme peaks because, early in the contagion, the counts are tiny: for example, there is a 400% increase when the virus spreads from 1 to 5 people. After the virus spreads to more people, the percentage starts to stabilize.
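A quick worked example of the formula, using hypothetical counts: a jump from 1 to 5 cases evaluates to (5 - 1) / 1 x 100 = 400%. The sketch below computes it both by hand and with pandas' `pct_change`, which performs the same day-over-day comparison.

```python
import pandas as pd

# Hypothetical cumulative case counts, for illustration only
cases = pd.Series([1, 5, 10, 12])

# Manual form of the formula: (N_t1 - N_t0) / N_t0 * 100
growth_manual = (cases - cases.shift(1)) / cases.shift(1) * 100

# pandas shortcut that computes the same quantity
growth = cases.pct_change() * 100
print(growth.round(1).tolist())
```

The first entry is `NaN` because day 0 has no previous day. The remaining values are roughly 400%, 100%, and 20%, which shows how quickly the percentage shrinks once the base count grows.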

In [None]:
italydf = df[df['Country/Region'] == country]
italydf.index = italydf['Date']
italydf = italydf[italydf['Confirmed']>=1]

death_ratio = italydf['Deaths'] - italydf['Deaths'].shift(1)

death_ratio = (death_ratio/(italydf['Deaths'].shift(1)))*100

fig, ax = plt.subplots(figsize=(15,7))

ax.plot(italydf.index, death_ratio, label = 'More Deaths p/ day')
ax.plot(italydf.index, death_ratio.rolling(10).mean(), label = 'Moving Average 10 days')


ax.set_xticks(np.arange(len(italydf.index)))
ax.set_xticklabels([c for c in italydf.index])

ax.set_ylim([0,80])


spacing = 2
for label in ax.xaxis.get_ticklabels()[::spacing]:
    label.set_visible(False)
    
plt.xticks(rotation=60)

plt.title('Percentage of Deaths + per day')

ax.legend(loc=2)

plt.show()

In [None]:
italydf = df[df['Country/Region'] == country]
italydf.index = italydf['Date']
italydf = italydf[italydf['Confirmed']>=1]

death_ratio = italydf['Confirmed'] - italydf['Confirmed'].shift(1)

death_ratio = (death_ratio/(italydf['Confirmed'].shift(1)))*100

fig, ax = plt.subplots(figsize=(15,7))

ax.plot(italydf.index, death_ratio, label = 'More Confirmed p/ day')
ax.plot(italydf.index, death_ratio.rolling(10).mean(), label = 'Moving Average 10 days')



ax.set_xticks(np.arange(len(italydf.index)))
ax.set_xticklabels([c for c in italydf.index])
ax.set_ylim([0,80])


spacing = 2
for label in ax.xaxis.get_ticklabels()[::spacing]:
    label.set_visible(False)
    
plt.xticks(rotation=60)

plt.title('Percentage of Confirmed + per day')

ax.legend(loc=2)

plt.show()

In [None]:
italydf = df[df['Country/Region'] == country]
italydf = italydf[italydf['Confirmed']>=1]
italydf.index = italydf['Date']

death_ratio = italydf['Confirmed'] - italydf['Confirmed'].shift(1)

death_ratio = (death_ratio/(italydf['Confirmed'].shift(1)))*100

fig, ax = plt.subplots(figsize=(15,7))

ax.plot(italydf.index, death_ratio, label = 'More Confirmed p/ day')
ax.plot(italydf.index, death_ratio.rolling(10).mean(), label = 'Moving Average 10 days')



ax.set_xticks(np.arange(len(italydf.index)))
ax.set_xticklabels([c for c in italydf.index])
ax.set_ylim([0,30])


spacing = 2
for label in ax.xaxis.get_ticklabels()[::spacing]:
    label.set_visible(False)
    
plt.xticks(rotation=60)

plt.title('Percentage of Confirmed + per day (Amplified)')

ax.legend(loc=2)


In [None]:
italydf = df[df['Country/Region'] == country]
italydf = italydf[italydf['Confirmed']>=1]
italydf.index = italydf['Date']
 
death_ratio = italydf['Recovered'] - italydf['Recovered'].shift(1)

death_ratio = (death_ratio/(italydf['Recovered'].shift(1)))*100

fig, ax = plt.subplots(figsize=(15,7))

ax.plot(italydf.index, death_ratio, label = 'More Recovered p/ day')
ax.plot(italydf.index, death_ratio.rolling(10).mean(), label = 'Moving Average 10 days')


ax.set_xticks(np.arange(len(italydf.index)))
ax.set_xticklabels([c for c in italydf.index])

ax.set_ylim([0,80])


spacing = 2
for label in ax.xaxis.get_ticklabels()[::spacing]:
    label.set_visible(False)
    
plt.xticks(rotation=60)

plt.title('Percentage of Recovered + per day')

ax.legend(loc=2)


# Analysis with Other Countries

## First case of each country

In [None]:
usdf = df[df['Country/Region'] == 'US']
usdf.index = usdf['Date']


firstcase = usdf[usdf['Confirmed'] >= 1]
firstcase.head(1)

In [None]:
italydf = df[df['Country/Region'] == 'Italy']
italydf.index = italydf['Date']


firstcase = italydf[italydf['Confirmed'] >= 1]
firstcase.head(1)

In [None]:
germandf = df[df['Country/Region'] == 'Germany']
germandf.index = germandf['Date']


firstcase = germandf[germandf['Confirmed'] >= 1]
firstcase.head(1)

In [None]:
BRdf = df[df['Country/Region'] == 'Brazil']
BRdf.index = BRdf['Date']


firstcase = BRdf[BRdf['Confirmed'] >= 1]
firstcase.head(1)

In [None]:
swedendf = df[df['Country/Region'] == 'Sweden']
swedendf.index = swedendf['Date']
#swedendf.index = [dt.datetime.strptime(c, "%m/%d/%y").strftime("%Y-%m-%d") for c in swedendf.index] 

firstcase = swedendf[swedendf['Confirmed'] >= 1]
firstcase.head(1)

In [None]:
argdf = df[df['Country/Region'] == 'Argentina']
argdf.index = argdf['Date']
#argdf.index = [dt.datetime.strptime(c, "%m/%d/%y").strftime("%Y-%m-%d") for c in argdf.index] 

firstcase = argdf[argdf['Confirmed'] >= 1]
firstcase.head(1)

With the information above we see that Brazil had its first case almost 1 month after several countries in Europe, which have already left quarantine. So it is to be expected that Brazil has not passed the peak at the same time as these countries.
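The per-country cells above all repeat the same lookup, which could be wrapped in a small helper. The sketch below is one possible version: `first_case_date` and the `toy` frame are hypothetical names introduced here, and the dates are assumed to be in a format that `pd.to_datetime` can parse.

```python
import pandas as pd

# Hypothetical rows in the same shape as the dataset (Country/Region, Date, Confirmed)
toy = pd.DataFrame({
    'Country/Region': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Date': ['2020-01-22', '2020-01-23', '2020-01-24'] * 2,
    'Confirmed': [0, 1, 3, 0, 0, 2],
})

def first_case_date(frame, country):
    """Return the first date on which `country` reports at least one confirmed case."""
    rows = frame[(frame['Country/Region'] == country) & (frame['Confirmed'] >= 1)]
    if rows.empty:
        return None
    return pd.to_datetime(rows['Date']).min()

for name in ['A', 'B']:
    print(name, first_case_date(toy, name))
```

Subtracting two of these dates gives the gap in days between the first cases of two countries.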

## Graphs

In [None]:
italydf = italydf[italydf['Confirmed']>=1]
BRdf = BRdf[BRdf['Confirmed']>=1]
germandf = germandf[germandf['Confirmed']>=1]
usdf = usdf[usdf['Confirmed']>=1]
swedendf = swedendf[swedendf['Confirmed']>=1]
argdf = argdf[argdf['Confirmed']>=1]




usdf.index = [c for c in range(0,len(usdf))]
italydf.index = [c for c in range(0,len(italydf))]
BRdf.index = [c for c in range(0,len(BRdf))]
germandf.index = [c for c in range(0,len(germandf))]
swedendf.index = [c for c in range(0,len(swedendf))]
argdf.index = [c for c in range(0,len(argdf))]

arg_dr =  argdf['Confirmed'] - argdf['Confirmed'].shift(1)
Italy_dr = italydf['Confirmed'] - italydf['Confirmed'].shift(1)
Brazil_dr = BRdf['Confirmed'] - BRdf['Confirmed'].shift(1)
Germany_dr =  germandf['Confirmed'] - germandf['Confirmed'].shift(1)
US_dr =  usdf['Confirmed'] - usdf['Confirmed'].shift(1)
Sweden_dr =  swedendf['Confirmed'] - swedendf['Confirmed'].shift(1)

Italy_dr = (Italy_dr/(italydf['Confirmed'].shift(1)))*100
Brazil_dr = (Brazil_dr/(BRdf['Confirmed'].shift(1)))*100
Germany_dr = (Germany_dr/(germandf['Confirmed'].shift(1)))*100
US_dr = (US_dr/(usdf['Confirmed'].shift(1)))*100
Sweden_dr = (Sweden_dr/(swedendf['Confirmed'].shift(1)))*100
arg_dr = (arg_dr/(argdf['Confirmed'].shift(1)))*100

fig, ax = plt.subplots(figsize=(15,7))

ax.plot(italydf.index, Italy_dr, label = '+Confirmed % p/day Italy')
ax.plot(BRdf.index, Brazil_dr, label = '+Confirmed % p/day Brazil')
ax.plot(germandf.index, Germany_dr, label = '+Confirmed % p/day Germany')
ax.plot(usdf.index, US_dr, label = '+Confirmed % p/day USA')
ax.plot(swedendf.index, Sweden_dr, label = '+Confirmed % p/day Sweden')
ax.plot(argdf.index, arg_dr, label = '+Confirmed % p/day Argentina')


ax.set_xticks(np.arange(len(usdf.index)))
ax.set_xticklabels([c for c in usdf.index])
ax.set_ylim([0,30])


spacing = 2
for label in ax.xaxis.get_ticklabels()[::spacing]:
    label.set_visible(False)
    
plt.xticks(rotation=60)

plt.title('Percentage of Confirmed + p/day p/Country')

ax.legend(loc=3)

plt.show()

In [None]:
italydf = italydf[italydf['Confirmed']>=1]
BRdf = BRdf[BRdf['Confirmed']>=1]
germandf = germandf[germandf['Confirmed']>=1]
usdf = usdf[usdf['Confirmed']>=1]
swedendf = swedendf[swedendf['Confirmed']>=1]
argdf = argdf[argdf['Confirmed']>=1]


usdf.index = [c for c in range(0,len(usdf))]
italydf.index = [c for c in range(0,len(italydf))]
BRdf.index = [c for c in range(0,len(BRdf))]
germandf.index = [c for c in range(0,len(germandf))]
swedendf.index = [c for c in range(0,len(swedendf))]
argdf.index = [c for c in range(0,len(argdf))]




Italy_dr = italydf['Deaths']/italydf['Confirmed']
Brazil_dr = BRdf['Deaths']/BRdf['Confirmed']
Germany_dr =  germandf['Deaths']/germandf['Confirmed']
US_dr =  usdf['Deaths']/usdf['Confirmed']
Sweden_dr = swedendf['Deaths']/swedendf['Confirmed']
arg_dr = argdf['Deaths']/argdf['Confirmed']



fig, ax = plt.subplots(figsize=(15,7))

ax.plot(italydf.index, Italy_dr, label = 'Deaths p / Confirmed Italy')
ax.plot(BRdf.index, Brazil_dr, label = 'Deaths p / Confirmed Brazil')
ax.plot(germandf.index, Germany_dr, label = 'Deaths p / Confirmed Germany')
ax.plot(usdf.index, US_dr, label = 'Deaths p / Confirmed USA')
ax.plot(swedendf.index, Sweden_dr, label = 'Deaths p / Confirmed Sweden')
ax.plot(argdf.index, arg_dr, label = 'Deaths p / Confirmed Argentina')


ax.set_xticks(np.arange(len(usdf.index)))
ax.set_xticklabels([c for c in usdf.index])



spacing = 2
for label in ax.xaxis.get_ticklabels()[::spacing]:
    label.set_visible(False)
    
plt.xticks(rotation=60)

plt.title('Deaths p/Confirmed p/day p/Country')

ax.legend(loc=2)


plt.show()

In [None]:
italydf = italydf[italydf['Confirmed']>=1]
BRdf = BRdf[BRdf['Confirmed']>=1]
germandf = germandf[germandf['Confirmed']>=1]
usdf = usdf[usdf['Confirmed']>=1]
swedendf = swedendf[swedendf['Confirmed']>=1]
argdf = argdf[argdf['Confirmed']>=1]


usdf.index = [c for c in range(0,len(usdf))]
italydf.index = [c for c in range(0,len(italydf))]
BRdf.index = [c for c in range(0,len(BRdf))]
germandf.index = [c for c in range(0,len(germandf))]
swedendf.index = [c for c in range(0,len(swedendf))]
argdf.index = [c for c in range(0,len(argdf))]




Italy_dr = italydf['Recovered']/italydf['Confirmed']
Brazil_dr = BRdf['Recovered']/BRdf['Confirmed']
Germany_dr =  germandf['Recovered']/germandf['Confirmed']
US_dr =  usdf['Recovered']/usdf['Confirmed']
Sweden_dr = swedendf['Recovered']/swedendf['Confirmed']
arg_dr = argdf['Recovered']/argdf['Confirmed']



fig, ax = plt.subplots(figsize=(15,7))

ax.plot(italydf.index, Italy_dr, label = 'Recovered p/ Confirmed Italy')
ax.plot(BRdf.index, Brazil_dr, label = 'Recovered p/ Confirmed Brazil')
ax.plot(germandf.index, Germany_dr, label = 'Recovered p/ Confirmed Germany')
ax.plot(usdf.index, US_dr, label = 'Recovered p/ Confirmed USA')
ax.plot(swedendf.index, Sweden_dr, label = 'Recovered p/ Confirmed Sweden')
ax.plot(argdf.index, arg_dr, label = 'Recovered p/ Confirmed Argentina')


ax.set_xticks(np.arange(len(usdf.index)))
ax.set_xticklabels([c for c in usdf.index])



spacing = 2
for label in ax.xaxis.get_ticklabels()[::spacing]:
    label.set_visible(False)
    
plt.xticks(rotation=60)

plt.title('Recovered p/Confirmed p/day p/Country')

ax.legend(loc=2)


plt.show()

In [None]:
Italy_dr = italydf['Confirmed'] - (italydf['Recovered']+italydf['Deaths'])
Brazil_dr = BRdf['Confirmed'] - (BRdf['Recovered']+BRdf['Deaths'])
Germany_dr = germandf['Confirmed'] - (germandf['Recovered']+germandf['Deaths'])
US_dr = usdf['Confirmed'] - (usdf['Recovered']+usdf['Deaths'])
Sweden_dr = swedendf['Confirmed'] - (swedendf['Recovered']+swedendf['Deaths'])
arg_dr = argdf['Confirmed'] - (argdf['Recovered']+argdf['Deaths'])

Italy_dr = Italy_dr / italydf['Confirmed']
Brazil_dr = Brazil_dr / BRdf['Confirmed']
Germany_dr = Germany_dr / germandf['Confirmed']
US_dr = US_dr / usdf['Confirmed']
Sweden_dr = Sweden_dr / swedendf['Confirmed']
arg_dr = arg_dr / argdf['Confirmed']




fig, ax = plt.subplots(figsize=(15,7))

ax.plot(italydf.index, Italy_dr, label = 'Active p/ Confirmed Italy')
ax.plot(BRdf.index, Brazil_dr, label = 'Active p/ Confirmed Brazil')
ax.plot(germandf.index, Germany_dr, label = 'Active p/ Confirmed Germany')
ax.plot(usdf.index, US_dr, label = 'Active p/ Confirmed USA')
ax.plot(swedendf.index, Sweden_dr, label = 'Active p/ Confirmed Sweden')
ax.plot(argdf.index, arg_dr, label = 'Active p/ Confirmed Argentina')


ax.set_xticks(np.arange(len(usdf.index)))
ax.set_xticklabels([c for c in usdf.index])



spacing = 2
for label in ax.xaxis.get_ticklabels()[::spacing]:
    label.set_visible(False)
    
plt.xticks(rotation=60)

plt.title('Active Cases p/Confirmed p/day p/Country')

ax.legend(loc=2)


plt.show()

### Country Behavior Graphs

This graph shows the number of confirmed cases in each country over time, normalized. In it we can see the behavior of each country's curve and compare them.

In this elliptical shape, the countries whose curves sit furthest to the left have already passed the peak and are near the end of contagion, such as Germany and Italy.

Countries with a nearly straight curve have recently passed the peak and are starting to descend the contagion curve; their curves will eventually move further to the left. This is the case for Sweden and the United States.

Brazil and Argentina, whose curves bend to the right, have not yet reached the peak of cases.

All countries tend to bend to the left in this elliptical shape as they pass through the contagion. The distance between a country's curve and those on the left indicates how far it is from the end of contagion.

In [None]:
usdf = df[df['Country/Region'] == 'US']
usdf.index = usdf['Date']
#usdf.index = [dt.datetime.strptime(c, "%m/%d/%y").strftime("%Y-%m-%d") for c in usdf.index] 

italydf = df[df['Country/Region'] == 'Italy']
italydf.index = italydf['Date']
#italydf.index = [dt.datetime.strptime(c, "%m/%d/%y").strftime("%Y-%m-%d") for c in italydf.index] 

germandf = df[df['Country/Region'] == 'Germany']
germandf.index = germandf['Date']
#germandf.index = [dt.datetime.strptime(c, "%m/%d/%y").strftime("%Y-%m-%d") for c in germandf.index] 

BRdf = df[df['Country/Region'] == 'Brazil']
BRdf.index = BRdf['Date']
#BRdf.index = [dt.datetime.strptime(c, "%m/%d/%y").strftime("%Y-%m-%d") for c in BRdf.index] 

swedendf = df[df['Country/Region'] == 'Sweden']
swedendf.index = swedendf['Date']
#swedendf.index = [dt.datetime.strptime(c, "%m/%d/%y").strftime("%Y-%m-%d") for c in swedendf.index] 

argdf = df[df['Country/Region'] == 'Argentina']
argdf.index = argdf['Date']
#argdf.index = [dt.datetime.strptime(c, "%m/%d/%y").strftime("%Y-%m-%d") for c in argdf.index] 


usdf.index = [c for c in range(0,len(usdf))]
italydf.index = [c for c in range(0,len(italydf))]
BRdf.index = [c for c in range(0,len(BRdf))]
germandf.index = [c for c in range(0,len(germandf))]
swedendf.index = [c for c in range(0,len(swedendf))]
argdf.index = [c for c in range(0,len(argdf))]


fig, ax = plt.subplots(figsize=(15,7))



ax.plot(italydf.index, italydf['Confirmed']/italydf['Confirmed'].max(), label = 'Confirmed Normalized Italy')
ax.plot(BRdf.index, BRdf['Confirmed']/BRdf['Confirmed'].max(), label = 'Confirmed Normalized Brazil')
ax.plot(germandf.index, germandf['Confirmed']/germandf['Confirmed'].max(), label = 'Confirmed Normalized Germany')
ax.plot(usdf.index, usdf['Confirmed']/usdf['Confirmed'].max(), label = 'Confirmed Normalized USA')
ax.plot(swedendf.index, swedendf['Confirmed']/swedendf['Confirmed'].max(), label = 'Confirmed Normalized Sweden')
ax.plot(argdf.index, argdf['Confirmed']/argdf['Confirmed'].max(), label = 'Confirmed Normalized Argentina')


ax.set_xticks(np.arange(len(usdf.index)))
ax.set_xticklabels([c for c in usdf.index])



spacing = 2
for label in ax.xaxis.get_ticklabels()[::spacing]:
    label.set_visible(False)
    
plt.xticks(rotation=60)


ax.legend(loc=2)

plt.show()

### Distribution of cases

The graphs below show the distribution of how many days each country spent at each stage of contagion. The confirmed cases were normalized from 0 to 1, where 1 is the total number of cases so far and 0 is when there were none or almost no cases. On the y-axis we have how many days the country stayed at that stage.

Countries that have passed the peak and most of the contagion will have 2 peaks, one close to 0 and the other close to 1. The peak near 1 means that the total number of cases has stayed relatively stagnant for several days, without a large increase, just as it was stagnant when there were close to 0 cases.

Countries that have not reached the end of contagion will only have a peak at 0, plus a small rise close to 1 if they have recently passed the peak of daily contagion.
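The one-peak versus two-peak pattern can be reproduced on a synthetic curve. The sketch below assumes a logistic curve as a stand-in for cumulative confirmed cases; real curves are noisier and need not be logistic, and `stage_histogram` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

# Synthetic logistic curve standing in for cumulative confirmed cases (an assumption)
days = np.arange(201)
cumulative = 1.0 / (1.0 + np.exp(-(days - 100) / 10.0))

def stage_histogram(curve, bins=15):
    """Count how many days the max-normalized curve spends in each of `bins` equal stages."""
    normalized = curve / curve.max()
    counts, _ = np.histogram(normalized, bins=bins, range=(0.0, 1.0))
    return counts

finished = stage_histogram(cumulative)        # whole epidemic observed
ongoing = stage_histogram(cumulative[:101])   # observed only up to the inflection point
print('finished:', finished)
print('ongoing: ', ongoing)
```

In the finished case most days fall into the first and last bins, while the truncated curve has a single large bin near 0, which matches the reading of the histograms described above.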

In [None]:
sns.distplot(italydf['Confirmed']/italydf['Confirmed'].max(),kde=False, rug=True, label='Italy', bins=15)
plt.title('Italy')
plt.show()
sns.distplot(BRdf['Confirmed']/BRdf['Confirmed'].max(),kde=False, rug=True,label='Brazil', bins=15)
plt.title('Brazil')
plt.show()
sns.distplot(germandf['Confirmed']/germandf['Confirmed'].max(),kde=False, rug=True, label='Germany', bins=15)
plt.title('Germany')
plt.show()
sns.distplot(usdf['Confirmed']/usdf['Confirmed'].max(),kde=False, rug=True,label='USA', bins=15)
plt.title('USA')
plt.show()
sns.distplot(swedendf['Confirmed']/swedendf['Confirmed'].max(),kde=False, rug=True,label='Sweden', bins=15)
plt.title('Sweden')
plt.show()
sns.distplot(argdf['Confirmed']/argdf['Confirmed'].max(),kde=False, rug=True,label='Argentina', bins=15)
plt.title('Argentina')
plt.show()

Countries with 2 peaks will have a mean of (normalized) cases around 0.5, because of those 2 peaks at 0 and 1. Therefore, we can use this mean as an indicator of how far a country has progressed toward the end of the contagion.

Average number of (normalized) cases in Italy:

In [None]:
(italydf['Confirmed']/italydf['Confirmed'].max()).mean()

### Progress of Each Country


This graph shows the mean of the total (normalized) cases of each country over time. We see that Brazil rises very slowly toward a mean of 0.5. The United States was rising at a good pace, but now, with a second spike in daily cases, its line has flattened out. Sweden was in the same situation.

The chart starts from the 40th day of contagion of the countries.
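The progress measure can be written without an explicit loop. For data up to day c, the mean of `x / max(x)` equals the expanding mean of `x` divided by its running maximum. The sketch below uses a hypothetical helper `progress_curve` on made-up counts and cross-checks it against a loop-based version.

```python
import pandas as pd

def progress_curve(confirmed, start=40):
    """Mean of the max-normalized cumulative series, using data up to each day."""
    confirmed = confirmed.reset_index(drop=True)
    # mean(x[:c] / max(x[:c])) equals the expanding mean divided by the running maximum
    progress = confirmed.expanding().mean() / confirmed.cummax()
    return progress.iloc[start:]

# Hypothetical cumulative counts, for illustration only
toy = pd.Series([1, 2, 4, 8, 16, 20, 22, 23, 23, 23])
fast = progress_curve(toy, start=3)

# Same quantity computed with an explicit loop, as a cross-check
slow = [(toy.iloc[:c + 1] / toy.iloc[:c + 1].max()).mean() for c in range(3, len(toy))]
print(fast.round(3).tolist())
```

The `start` parameter plays the role of the 40-day offset used in the chart; it is set to 3 here only because the toy series is short.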



In [None]:

usdf = df[df['Country/Region'] == 'US']
usdf.index = usdf['Date']
#usdf.index = [dt.datetime.strptime(c, "%m/%d/%y").strftime("%Y-%m-%d") for c in usdf.index] 

italydf = df[df['Country/Region'] == 'Italy']
italydf.index = italydf['Date']
#italydf.index = [dt.datetime.strptime(c, "%m/%d/%y").strftime("%Y-%m-%d") for c in italydf.index] 

germandf = df[df['Country/Region'] == 'Germany']
germandf.index = germandf['Date']
#germandf.index = [dt.datetime.strptime(c, "%m/%d/%y").strftime("%Y-%m-%d") for c in germandf.index] 

BRdf = df[df['Country/Region'] == 'Brazil']
BRdf.index = BRdf['Date']
#BRdf.index = [dt.datetime.strptime(c, "%m/%d/%y").strftime("%Y-%m-%d") for c in BRdf.index] 

swedendf = df[df['Country/Region'] == 'Sweden']
swedendf.index = swedendf['Date']
#swedendf.index = [dt.datetime.strptime(c, "%m/%d/%y").strftime("%Y-%m-%d") for c in swedendf.index] 

argdf = df[df['Country/Region'] == 'Argentina']
argdf.index = argdf['Date']
#argdf.index = [dt.datetime.strptime(c, "%m/%d/%y").strftime("%Y-%m-%d") for c in argdf.index] 


usdf.index = [c for c in range(0,len(usdf))]
italydf.index = [c for c in range(0,len(italydf))]
BRdf.index = [c for c in range(0,len(BRdf))]
germandf.index = [c for c in range(0,len(germandf))]
swedendf.index = [c for c in range(0,len(swedendf))]
argdf.index = [c for c in range(0,len(argdf))]


fig, ax = plt.subplots(figsize=(15,7))

mean = []

for c in range(40,len(italydf.index)):
    
    mindf = italydf.loc[:c]
    
    mean.append((mindf['Confirmed']/mindf['Confirmed'].max()).mean())

    
index = [c+40 for c in range(0,len(mean))]
ax.plot(index, mean, label='Italy')

mean = []

for c in range(40,len(usdf.index)):
    
    mindf = usdf.loc[:c]
    
    mean.append((mindf['Confirmed']/mindf['Confirmed'].max()).mean())

        
index = [c+40 for c in range(0,len(mean))]
ax.plot(index, mean, label='USA')

mean = []

for c in range(40,len(BRdf.index)):
    
    mindf = BRdf.loc[:c]
    
    mean.append((mindf['Confirmed']/mindf['Confirmed'].max()).mean())
    
index = [c+40 for c in range(0,len(mean))]
ax.plot(index, mean, label='Brazil')

mean = []

for c in range(40,len(swedendf.index)):
    
    mindf = swedendf.loc[:c]
    
    mean.append((mindf['Confirmed']/mindf['Confirmed'].max()).mean())
    
index = [c+40 for c in range(0,len(mean))]
ax.plot(index, mean, label='Sweden')


mean = []

for c in range(40,len(germandf.index)):
    
    mindf = germandf.loc[:c]
    
    mean.append((mindf['Confirmed']/mindf['Confirmed'].max()).mean())
    
index = [c+40 for c in range(0,len(mean))]
ax.plot(index, mean, label='Germany')


plt.legend(loc=2)
plt.title('Graph of Progression of each Country')
plt.show()
    

### Reconstruction of the Curves 


This section shows the confirmed cases curve as it progresses through the contagion for Italy, the USA, and Brazil. Italy is an example of a country that reached the end of the crisis, the USA is between the middle and the end, and Brazil is between the beginning and the middle.

Each line is the graph as it looked at a given point in time. We can see how the curve flattened, and also how the USA is going through a second peak. The numbers have been normalized.

In [None]:
fig, ax = plt.subplots(figsize=(15,7))

for c in range(20,len(italydf.index)):
    
    mindf = italydf.loc[:c]
    
    ax.plot(mindf.index, mindf['Confirmed']/mindf['Confirmed'].max())
 
plt.title('Reconstruction of the curve Italy')
plt.show()

fig, ax = plt.subplots(figsize=(15,7))

for c in range(20,len(usdf.index)):
    
    mindf = usdf.loc[:c]
    ax.plot(mindf.index, mindf['Confirmed']/mindf['Confirmed'].max())

plt.title('Reconstruction of the curve USA')
plt.show()

fig, ax = plt.subplots(figsize=(15,7))

for c in range(20,len(BRdf.index)):
    
    mindf = BRdf.loc[:c]
    ax.plot(mindf.index, mindf['Confirmed']/mindf['Confirmed'].max())

plt.title('Reconstruction of the curve Brazil')
plt.show()

# Prediction

## Preparing

Let's analyze some numbers to build a model where we will try to predict what the entire contagion curve in Brazil will be like.

First, let's look at the contamination rates, that is, how many more people are being infected and dying each day. Because the numbers vary so much, we use a 10-day moving average to compensate for weekends, which usually have fewer case records.

In [None]:
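BRdf = BRdf[BRdf['Confirmed']>=1]

# Daily growth of confirmed cases, in percent of the previous day's total
confirmed_growth = BRdf['Confirmed'] - BRdf['Confirmed'].shift(1)
confirmed_growth = (confirmed_growth/(BRdf['Confirmed'].shift(1)))*100

# New figure: the axes from the previous cell have already been shown
fig, ax = plt.subplots(figsize=(15,7))
ax.plot(BRdf.index, confirmed_growth.rolling(10).mean(), label = '10-day moving average')
ax.legend(loc=1)
plt.show()

print('10-day average of the daily % increase in confirmed cases, last 31 rows')
print(confirmed_growth.rolling(10).mean().tail(31))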


In [None]:
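# Daily growth of deaths, in percent of the previous day's total
death_ratio = BRdf['Deaths'] - BRdf['Deaths'].shift(1)
death_ratio = (death_ratio/(BRdf['Deaths'].shift(1)))*100

# New figure: the axes from the earlier cells have already been shown
fig, ax = plt.subplots(figsize=(15,7))
ax.plot(BRdf.index, death_ratio.rolling(10).mean(), label = '10-day moving average')
ax.legend(loc=1)
plt.show()

print('10-day average of the daily % increase in deaths, last 31 rows')
print(death_ratio.rolling(10).mean().tail(31))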


We see that the rates are falling over time. What we are looking for is the rate at which they fall, which we will use as a constant in our model. To get a better idea of the values of these constants, we will fit linear models.

The model will start on the 106th day of contagion so that we can compare its output with the days that follow. So let's take the contagion rate for that day.

In [None]:
death_ratio = BRdf['Confirmed'] - BRdf['Confirmed'].shift(1)

death_ratio = (death_ratio/(BRdf['Confirmed'].shift(1)))*100

death_ratio = death_ratio.rolling(10).mean()

print(death_ratio.loc[104:108])

We see that around that time the rate of contagion was around 7% on average, so we write this down and use it later.

## Linear Regression

Let's use a linear model to get an idea of the numbers we should be using. In this first model, we will use the numbers of a day *t* to predict a day *t + 1*. In other words, the model will try to predict the next day's cases based on the previous day's. Thus, the coefficient of the model will be the contagion rate that best fits every day of contagion. The fit uses scikit-learn's `SVR` with a linear kernel, so it is a linear model, although not an ordinary least-squares regression. Here we will use the data only until the 106th day of the world crisis.
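To see why the coefficient of such a model can be read as a contagion rate, here is a minimal sketch on synthetic data. It uses an ordinary least-squares fit via `numpy.polyfit` rather than the `SVR` used in this notebook, and the 7% daily growth is an assumption chosen for illustration.

```python
import numpy as np

# Synthetic cumulative series growing 7% per day (illustrative assumption).
confirmed = 100 * 1.07 ** np.arange(40)

x = confirmed[:-1]   # cases on day t
y = confirmed[1:]    # cases on day t + 1

# Degree-1 least-squares fit: y ≈ slope * x + intercept.
slope, intercept = np.polyfit(x, y, 1)

print(f"slope = {slope:.4f}")                           # growth factor per day
print(f"implied daily growth = {(slope - 1) * 100:.2f}%")
```

On real data the growth rate changes over time, so a single slope is only an average over the fitted period.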

In [None]:
from sklearn.svm import SVR

BRdf = BRdf[BRdf['Confirmed'] >= 1]

taxa_contagio = SVR(kernel='linear')

X = np.array(BRdf['Confirmed'])[:-52].reshape(-1,1)
y = np.array(BRdf['Confirmed'].shift(-1))[:-52]


taxa_contagio.fit(X,y)
# SVR.score returns the coefficient of determination R² (maximum 1.0)
print(f'R² score (max 1.0): {taxa_contagio.score(X,y)}')
plt.subplots(figsize=(15,7))
index = [c for c in range(0,len(X))]
plt.plot(index, (taxa_contagio.predict(X)-y), 'b.')
plt.title('Errors')
plt.show()
plt.subplots(figsize=(15,7))
plt.plot(index, taxa_contagio.predict(X), 'b.', label='Predicted')
plt.plot(index, y, 'r.', label='Real')
plt.title('Predicted x Real')
plt.legend(loc=2)
plt.show()

The two models below try to predict the next day's contagion rate from the previous day's rate, in the same way as the model above. Their coefficients therefore give a constant that describes how much the rate declines from one day to the next.

We use two models because the daily rate varies widely, which makes it hard to predict. We train one model on a 7-day moving average and another on a 3-day moving average to get an idea of the constants we should use.
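The idea of reading a decay constant from a next-day fit can be sketched on synthetic data. The example below is an assumption-laden illustration: it uses a least-squares slope through the origin instead of the `SVR` fitted in this notebook, and a noise-free rate that shrinks by a fixed factor each day. The helper names are hypothetical.

```python
import numpy as np

def moving_average(values, window):
    """Trailing moving average; returns len(values) - window + 1 points."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

def decay_constant(rate):
    """Least-squares slope through the origin for rate[t+1] ≈ k * rate[t]."""
    x, y = rate[:-1], rate[1:]
    return float(np.dot(x, y) / np.dot(x, x))

# Synthetic daily growth rate (percent) multiplied by 0.982 every day.
rate = 7.0 * 0.982 ** np.arange(80)

k7 = decay_constant(moving_average(rate, 7))
k3 = decay_constant(moving_average(rate, 3))
print(round(k7, 4), round(k3, 4))
```

On this idealized series both windows recover the same constant. On real, noisy rates the two windows generally disagree, which is why the notebook compares them before choosing a value.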

In [None]:
contagio_decai = SVR(kernel='linear')

traindf = pd.DataFrame()
Brazil_dr = BRdf['Confirmed'] - BRdf['Confirmed'].shift(1)
Brazil_dr = (Brazil_dr/(BRdf['Confirmed'].shift(1)))*100
# .copy() so the new columns are added to an independent frame, not a view of BRdf
traindf = BRdf[['Confirmed','Deaths','Active','Recovered']].copy()

traindf['Taxa Contagio']  = Brazil_dr

Brazil_dr = BRdf['Deaths'] - BRdf['Deaths'].shift(1)
Brazil_dr = (Brazil_dr/(BRdf['Deaths'].shift(1)))*100

traindf['Taxa Mortes']  = Brazil_dr

traindf['Morte p/dia'] = BRdf['Deaths'] - BRdf['Deaths'].shift(1)
traindf['Contaminados p/dia'] = BRdf['Confirmed'] - BRdf['Confirmed'].shift(1)
traindf = traindf.dropna()

X = np.array(traindf['Taxa Contagio'].rolling(7).mean())[40:-1].reshape(-1,1)
y = np.array(traindf['Taxa Contagio'].rolling(7).mean().shift(-1))[40:-1]



contagio_decai.fit(X,y)
# SVR.score returns the coefficient of determination R² (maximum 1.0)
print(f'R² score (max 1.0): {contagio_decai.score(X,y)}')
plt.subplots(figsize=(15,7))
index = [c for c in range(0,len(X))]
plt.plot(index, (contagio_decai.predict(X)-y), 'b.')
plt.title('Errors')
plt.show()
plt.subplots(figsize=(15,7))
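# label must be passed by keyword; a bare string would be plotted as extra data
plt.plot(index, contagio_decai.predict(X), 'b', label='Predicted')
plt.plot(index, y, 'r.', label='Real')
plt.title('Predicted x Real')
plt.legend(loc=2)
plt.show()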


In [None]:
contagio_decai2 = SVR(kernel='linear')

traindf = pd.DataFrame()
Brazil_dr = BRdf['Confirmed'] - BRdf['Confirmed'].shift(1)
Brazil_dr = (Brazil_dr/(BRdf['Confirmed'].shift(1)))*100
# .copy() so the new columns are added to an independent frame, not a view of BRdf
traindf = BRdf[['Confirmed','Deaths','Active','Recovered']].copy()

traindf['Taxa Contagio']  = Brazil_dr

Brazil_dr = BRdf['Deaths'] - BRdf['Deaths'].shift(1)
Brazil_dr = (Brazil_dr/(BRdf['Deaths'].shift(1)))*100

traindf['Taxa Mortes']  = Brazil_dr

traindf['Morte p/dia'] = BRdf['Deaths'] - BRdf['Deaths'].shift(1)
traindf['Contaminados p/dia'] = BRdf['Confirmed'] - BRdf['Confirmed'].shift(1)
traindf = traindf.dropna()

X = np.array(traindf['Taxa Contagio'].rolling(3).mean())[30:-1].reshape(-1,1)
y = np.array(traindf['Taxa Contagio'].rolling(3).mean().shift(-1))[30:-1]



contagio_decai2.fit(X,y)

# SVR.score returns the coefficient of determination R² (maximum 1.0)
print(f'R² score (max 1.0): {contagio_decai2.score(X,y)}')
plt.subplots(figsize=(15,7))
index = [c for c in range(0,len(X))]
plt.plot(index, (contagio_decai2.predict(X)-y), 'b.')
plt.title('Errors')
plt.show()
plt.subplots(figsize=(15,7))
plt.plot(index, contagio_decai2.predict(X), 'b', label='Predicted')
plt.plot(index, y, 'r.', label='Real')
plt.title('Predicted x Real')
plt.legend(loc=2)
plt.show()


In [None]:

print(f'contagion rate {taxa_contagio.coef_}')
print(f'contagion decay 1 {contagio_decai.coef_}')
print(f'contagion decay 2 {contagio_decai2.coef_}')

The two 'contagion decay' constants come from the last two models, and their values are expected to change as new data arrives.

## Prediction

The forecasting model is the following function, applied once per predicted day:

    confirmed_next = (1 + contagion * contagion_decay^i) * last_confirmed

Where:
* contagion = 7% (0.07)

* contagion_decay = 98.2% (0.982)

* i = iteration (1,2,3...n)

So the contagion rate starts at 7% and shrinks at each iteration as it is multiplied by the decay. At iteration *i* the number of confirmed cases is multiplied by *1 + contagion · contagion_decay^i*.
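The recurrence above can be written out directly. This is a minimal sketch with the constants stated in the text; the function name `project_confirmed` and the starting value of 100,000 are hypothetical (the notebook starts from `BRdf['Confirmed'].loc[106]`).

```python
def project_confirmed(last_confirmed, days, contagion=0.07, contagion_decay=0.982):
    """Iterate confirmed_next = (1 + contagion * contagion_decay**i) * last_confirmed."""
    projected = [float(last_confirmed)]
    for i in range(1, days + 1):
        rate = contagion * contagion_decay ** i   # effective growth rate on step i
        projected.append(projected[-1] * (1 + rate))
    return projected

# Hypothetical starting value, used only to show the first few steps.
curve = project_confirmed(last_confirmed=100_000, days=3)
for day, value in enumerate(curve):
    print(day, round(value))
```

Because the rate is multiplied by a factor below 1 at every step, the daily growth keeps shrinking and the cumulative curve flattens without ever decreasing.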

Below we have the curve predicted by the model. In the original run, the score printed by the cell below was 99.9%.

In [None]:
#epsilon IN: quedac0 -> queda1
#contagio_decai IN: tm0 -> OUT: tc1
#taxa_contagio IN: c0 -> OUT:c1

#\\\\\\\\\\\\\\ MODEL \\\\\\\\\\\\\\

def func(deaths, confirmed,  i):

    # The constants suggested by the fitted models may not match the behavior
    # of the most recent days, so treat them only as a rough guide.

    # Constants
    contagio = 0.07   # initial daily contagion rate (7%)
    decai = 0.982     # factor applied to the contagion rate at each iteration
    reg = 0.00        # optional extra reduction of the decay factor (unused here)

    # Growth factor for iteration i: 1 + contagio * decai**i
    fator = 1 + contagio * ((decai-reg)**i)

    # Next day's confirmed cases, from the last value in the list
    u_confirmed = confirmed[-1] * fator

    # Deaths are assumed to be a fixed 5% of confirmed cases
    u_death = u_confirmed * 0.05

    confirmed.append(u_confirmed)
    deaths.append(u_death)

    return  confirmed, deaths

#\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\

BRdf = BRdf[BRdf['Confirmed'] >= 1]

ultimos_confirmados = BRdf['Confirmed'].loc[106]
ultimos_mortes = BRdf['Deaths'].loc[106]

confirmed = []
for c in BRdf['Confirmed'].loc[:106]:
    confirmed.append(c)

    
deaths = []
for c in BRdf['Deaths'].loc[:106]:
    deaths.append(c)

dias = 0


newdf = pd.DataFrame()
newdf['Confirmed'] = confirmed
media = (newdf['Confirmed']/newdf['Confirmed'].max()).mean()



contagios = []
Brazil_dr = BRdf['Confirmed'] - BRdf['Confirmed'].shift(1)
Brazil_dr = (Brazil_dr/(BRdf['Confirmed'].shift(1)))*100
Brazil_dr = Brazil_dr.loc[:106]
for c in (Brazil_dr):
    contagios.append(c)

mortalidade = []
Brazil_dr = BRdf['Deaths'] - BRdf['Deaths'].shift(1)
Brazil_dr = (Brazil_dr/(BRdf['Deaths'].shift(1)))*100
Brazil_dr = Brazil_dr.loc[:106]
for c in (Brazil_dr):
    mortalidade.append(c)

newdf = pd.DataFrame()
i=1
while(  media < 0.45):
   

   

    newdf = pd.DataFrame()
    newdf['Confirmed'] = confirmed
    newdf['Deaths'] = deaths
    
    newdf['Taxa Mortes'] = mortalidade
    
    newdf['Taxa Contagio'] = contagios
    
    newdf['Queda Contagio'] = newdf['Taxa Contagio'] - newdf['Taxa Contagio'].shift(1)
    
    newdf = newdf[['Confirmed','Deaths','Taxa Mortes','Taxa Contagio','Queda Contagio']]
    
    newdf = newdf.dropna(axis=0)
    
    #deaths confirmed queda_contagio taxa_mortes taxa_contagio
    confirmed, deaths = func(deaths,confirmed, i)
    
    newdf = pd.DataFrame()
    
    newdf['Confirmed'] = confirmed
    
    i=i+1
    
    media = (newdf['Confirmed']/newdf['Confirmed'].max()).mean()
    

    newdf['Deaths'] = deaths
    newdf = newdf.dropna()
    
    Brazil_dr = newdf['Deaths'] - newdf['Deaths'].shift(1)
    Brazil_dr = (Brazil_dr/(newdf['Deaths'].shift(1)))*100
    
    
    newdf['Taxa Mortes'] = Brazil_dr
    taxa_mortes = newdf['Taxa Mortes'].values[-1]
    newdf = newdf.dropna()
    
    Brazil_dr = newdf['Confirmed'] - newdf['Confirmed'].shift(1)
    Brazil_dr = (Brazil_dr/(newdf['Confirmed'].shift(1)))*100
    
    newdf['Taxa Contagio'] = Brazil_dr
    newdf = newdf.dropna()
    
    taxa_c = newdf['Taxa Contagio'].values[-1]
    dias = dias+1
    
    contagios.append(taxa_c)
    mortalidade.append(taxa_mortes)
    

    
plt.subplots(figsize=(15,7))

ci = [c for c in range(0,len(confirmed))]
di = [c for c in range(0,len(deaths))]
brindex = [c for c in range (0,len(BRdf.index))]

plt.plot(ci, confirmed, label='Confirmed')
plt.plot(brindex, BRdf['Confirmed'], label='Real Confirmed')
plt.axvline(x=72, label='Start of the model', color='b')
plt.legend(loc=2)
plt.show()

space = 72+(BRdf['Confirmed'].loc[106:].shape[0])


plt.subplots(figsize=(15,7))

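error = (BRdf['Confirmed'].loc[106:] - confirmed[72:space])
# The total sum of squares must be taken around the mean of the same series
# that is being predicted (confirmed cases, not deaths)
m=BRdf['Confirmed'].loc[106:].mean()
std_err = (BRdf['Confirmed'].loc[106:] - m)

reg_err = sum([c**2 for c in error])
std_err = sum([c**2 for c in std_err])

R2 = 1 - (reg_err/std_err)

# R2 is a fraction with a maximum of 1.0, not a percentage
print(f'Score (R², max 1.0): {R2}')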

index = [c for c in range(0,len(confirmed[72:space]))]
plt.plot(index, error, label= 'Gross Error')
plt.legend(loc=2)

plt.show()





With the gross error we can see that the model missed some points, predicting less than the actual result. This indicates some increases in cases above expectations.
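The score printed above is a coefficient of determination, R² = 1 - SS_res / SS_tot, where SS_tot is measured around the mean of the same series that is being predicted. A minimal sketch, with a hypothetical helper name and made-up values:

```python
import numpy as np

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)      # squared prediction errors
    ss_tot = np.sum((actual - actual.mean()) ** 2)  # spread around the actual mean
    return 1.0 - ss_res / ss_tot

actual = [10.0, 20.0, 30.0, 40.0]
print(r_squared(actual, actual))        # perfect prediction -> 1.0
print(r_squared(actual, [25.0] * 4))    # always predicting the mean -> 0.0
```

If SS_tot is computed around a value far from the actual mean, it becomes much larger and the score is pushed toward 1 regardless of how good the prediction is.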

Below we have the death forecast, with one curve for each assumed mortality percentage. Because this percentage has varied, as seen previously, we did not try to predict deaths the way we did with confirmed cases. Instead, we plot a curve for each percentage, and we see that the actual deaths cross from one forecast band to another. The mortality percentage should continue to vary over time.

For the next analysis we will assume that the rate remains at 5%.

In [None]:
plt.subplots(figsize=(15,7))
plt.plot(ci, [c*0.07 for c in confirmed], 'k',label='Prediction 7% deaths')
plt.plot(ci, [c*0.06 for c in confirmed], 'r',label='Prediction 6% deaths')
plt.plot(di, deaths, label='Prediction 5% deaths')
plt.plot(ci, [c*0.04 for c in confirmed], 'c',label='Prediction 4% deaths')

plt.plot(brindex, BRdf['Deaths'], 'g',label='Real Deaths')

plt.axvline(x=72, label='Start of Model', color='b')
plt.legend(loc=2)
plt.show()




plt.subplots(figsize=(15,7))

error = (BRdf['Deaths'].loc[106:] - deaths[72:space])
m=BRdf['Deaths'].loc[106:].mean()
std_err = (BRdf['Deaths'].loc[106:] - m)


reg_err = sum([c**2 for c in error])
std_err = sum([c**2 for c in std_err])

R2 = 1 - (reg_err/std_err)

# R2 is a fraction with a maximum of 1.0, not a percentage
print(f'Score (R², max 1.0): {R2}')

plt.plot(index, error, label= 'Gross Error')
plt.legend(loc=2)
plt.show()
       


## Prediction Analysis


Now let’s take a closer look at the data from that forecast.

Below we will see the rates of contagion and deaths according to the model.

In [None]:
plt.subplots(figsize=(15,7))

ci = [c for c in range(0,len(confirmed))]
di = [c for c in range(0,len(deaths))]
brindex = [c for c in range (0,len(BRdf.index))]

Brazil_dr = BRdf['Confirmed'] - BRdf['Confirmed'].shift(1)
Brazil_dr = (Brazil_dr/(BRdf['Confirmed'].shift(1)))*100

plt.plot(ci, contagios, label='Predicted Contagion')
plt.plot(brindex,Brazil_dr, label='Real' )

plt.axvline(x=72, label='Start of the Model', color='b')
plt.ylim([0,20])
plt.legend(loc=2)
plt.title('% More Cases')
plt.show()

Brazil_dr = BRdf['Deaths'] - BRdf['Deaths'].shift(1)
Brazil_dr = (Brazil_dr/(BRdf['Deaths'].shift(1)))*100

plt.subplots(figsize=(15,7))
plt.plot(di, mortalidade, label='% Predicted ')
plt.plot(brindex,Brazil_dr, label='Real' )
plt.axvline(x=72, label='Start of the Model', color='b')
plt.title('% More Deaths')
plt.ylim([0,20])
plt.legend(loc=2)
plt.show()

So we have some information about the expected end of the crisis.

In [None]:
print(f'days: {dias - len(BRdf["Confirmed"])}')

newdf = pd.DataFrame()
newdf['Confirmed'] = confirmed
newdf['Deaths'] = deaths
newdf['Death Rates'] = newdf['Deaths']/newdf['Confirmed']
newdf['Confirmed p/day'] = newdf['Confirmed'] - newdf['Confirmed'].shift(1)

pico =   newdf['Confirmed p/day'].idxmax()
pico = pico - BRdf['Confirmed'].shape[0]

print(f'Days until peak {pico}')
print(f'Last Confirmed {confirmed[-1]}')
print(f'Last Deaths {deaths[-1]}')
print(f'Death Rate {deaths[-1]/confirmed[-1]} ')

We see that there are 83 days left until we reach the end of the daily case curve, and that at the moment we are at the peak of daily cases. This leads us to conclude that we are still far from the end of the crisis, if that was not already clear from the countries' progression chart.
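The "days until peak" figure comes from locating the largest daily increase in the projected cumulative curve and subtracting the number of days already observed. A minimal sketch of that calculation, using a hypothetical projection and a hypothetical count of 50 observed days:

```python
import numpy as np

def days_until_peak(projected, observed_days):
    """Days between the last observed day and the peak of daily new cases."""
    daily = np.diff(np.asarray(projected, dtype=float))  # new cases per day
    peak_day = int(np.argmax(daily)) + 1  # diff[k] is the increase arriving on day k + 1
    return peak_day - observed_days

# Hypothetical projection using the recurrence and constants from this notebook.
projected = [1000.0]
for i in range(1, 301):
    projected.append(projected[-1] * (1 + 0.07 * 0.982 ** i))

remaining = days_until_peak(projected, observed_days=50)
print(remaining)
```

A result of zero or below means the projected peak has already been reached within the observed data.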

Here we have the curve of predicted new daily cases, the real ones, and the 10-day moving average of the real cases. The model appears to follow the trend of the cases.

The daily death curve does not fit as closely, because the actual daily deaths are flattening and staying below the 5% mortality assumed by the model.

In [None]:
newdf = pd.DataFrame()
newdf['Confirmed'] = confirmed
newdf['Deaths'] = deaths
newdf['Death Rates'] = newdf['Deaths']/newdf['Confirmed']
newdf['Confirmed p/day'] = newdf['Confirmed'] - newdf['Confirmed'].shift(1)
newdf['Deaths p/day'] = newdf['Deaths'] - newdf['Deaths'].shift(1)

plt.subplots(figsize=(15,7))
plt.plot(newdf.index, newdf['Confirmed p/day'], label='Model')
plt.plot(brindex, (BRdf['Confirmed'] - BRdf['Confirmed'].shift(1)).rolling(10).mean(),'r' ,label='Moving Average')
plt.plot(brindex, BRdf['Confirmed'] - BRdf['Confirmed'].shift(1), label='Real')
plt.axvline(x=72, label='Start of the Model', color='b')
plt.title('Gross Confirmed p/day')
plt.legend(loc=2)
plt.show()

plt.subplots(figsize=(15,7))
plt.plot(newdf.index, (newdf['Confirmed'] * 0.05) - (newdf['Confirmed'].shift(1)*0.05), label='5%')
plt.plot(brindex, (BRdf['Deaths'] - BRdf['Deaths'].shift(1)).rolling(10).mean(),'r' ,label='Moving Average')
plt.plot(brindex, BRdf['Deaths'] - BRdf['Deaths'].shift(1), label='Real')
plt.axvline(x=72, label='Start of the Model', color='b')
plt.title('Gross Deaths p/day')
plt.legend(loc=2)
plt.show()



This graph shows the reconstruction of the confirmed cases curve in Brazil, as done previously, compared with the predicted final curve. The blue line marks where we currently are within the 300 days.

In [None]:
fig, ax = plt.subplots(figsize=(15,7))

for c in range(20,len(BRdf.index)):
    
    mindf = BRdf.loc[:c]
    ax.plot(mindf.index, mindf['Confirmed']/mindf['Confirmed'].max())


ax.axvline(x=mindf.index[-1], label='Current phase', color='b')
ax.plot(newdf.index, np.array(newdf['Confirmed']/newdf['Confirmed'].max()), color='grey', label='Final')
ax.legend(loc=2)

plt.title('Reconstruction of the curve Brazil')
plt.show()

Below, the current distribution of cases compared to that predicted for the end of the crisis.

In [None]:
# Note: distplot is deprecated since seaborn 0.11
sns.distplot(BRdf['Confirmed']/BRdf['Confirmed'].max(),kde=False, rug=True,label='Brazil', bins=15)
sns.distplot(newdf['Confirmed']/newdf['Confirmed'].max(),kde=False, rug=True,label='Brazil predicted', bins=15)
plt.legend()
plt.show()

Finally, a graph comparing the curve of several countries with that of Brazil and the curve forecast for Brazil.
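The comparison relies on the same three steps for every country: take daily new cases, smooth them with a 7-day moving average, and divide by the maximum so that every curve peaks at 1.0. A minimal sketch of that transformation, with a hypothetical helper name and made-up series:

```python
import pandas as pd

def normalized_daily_curve(cumulative, window=7):
    """Daily new cases, smoothed and rescaled so the peak equals 1.0."""
    daily = cumulative.diff()                 # new cases per day
    smoothed = daily.rolling(window).mean()   # moving average over `window` days
    return smoothed / smoothed.max()

# Hypothetical cumulative series for two countries of very different size.
small = pd.Series([0, 1, 3, 7, 15, 30, 50, 75, 100, 120, 135, 145, 150, 152])
large = small * 1000

curve_small = normalized_daily_curve(small)
curve_large = normalized_daily_curve(large)
print(curve_small.max(), curve_large.max())
```

After this rescaling, two countries with the same shape of outbreak produce the same curve regardless of their absolute case counts, which is what makes the curves comparable on one plot.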

In [None]:
italydf = italydf[italydf['Confirmed']>=1]
BRdf = BRdf[BRdf['Confirmed']>=1]
germandf = germandf[germandf['Confirmed']>=1]
usdf = usdf[usdf['Confirmed']>=1]
swedendf = swedendf[swedendf['Confirmed']>=1]
argdf = argdf[argdf['Confirmed']>=1]


usdf.index = [c for c in range(0,len(usdf))]
italydf.index = [c for c in range(0,len(italydf))]
BRdf.index = [c for c in range(0,len(BRdf))]
germandf.index = [c for c in range(0,len(germandf))]
swedendf.index = [c for c in range(0,len(swedendf))]
argdf.index = [c for c in range(0,len(argdf))]

italy_curve = (italydf['Confirmed']-italydf['Confirmed'].shift(1))
italy_curve = italy_curve.rolling(7).mean()
italy_curve = italy_curve/italy_curve.max()


BR_curve = (BRdf['Confirmed']-BRdf['Confirmed'].shift(1))
BR_curve = BR_curve.rolling(7).mean()
BR_curve = BR_curve/BR_curve.max()


us_curve = (usdf['Confirmed']-usdf['Confirmed'].shift(1))
us_curve = us_curve.rolling(7).mean()
us_curve = us_curve/us_curve.max()


sweden_curve = (swedendf['Confirmed']-swedendf['Confirmed'].shift(1))
sweden_curve = sweden_curve.rolling(7).mean()
sweden_curve = sweden_curve/sweden_curve.max()


german_curve = (germandf['Confirmed']-germandf['Confirmed'].shift(1))
german_curve = german_curve.rolling(7).mean()
german_curve = german_curve/german_curve.max()


new_curve = (newdf['Confirmed']-newdf['Confirmed'].shift(1))
new_curve = new_curve.rolling(7).mean()
new_curve = new_curve/new_curve.max()


plt.subplots(figsize=(15,7))

plt.plot(newdf.index, new_curve, label='Brazil_predicted')
plt.plot(italydf.index, italy_curve, label='Italy')
plt.plot(germandf.index, german_curve, label='Germany')
plt.plot(BRdf.index, BR_curve, label='Brazil')
plt.plot(swedendf.index, sweden_curve, label = 'Sweden')
plt.plot(usdf.index, us_curve, label='USA')

plt.legend(loc=2)
plt.show()

# Conclusion

We were able to make a solid analysis of the situation in Brazil and compare it with other countries. We extracted valuable information, such as the progression graph, built with normalization and the distribution of the data, and the forecasting model, which so far has predicted the trend well. In general, there are good and bad points in the Brazilian situation, but it seems that the crisis is far from over.