# Cities EDA for COVID-19 in Brazil and Neural Networks prediction
<img src="https://static.toiimg.com/photo/77502958.cms">


Brazil has reached a Dantesque point in the pandemic. We face maximum ICU bed occupancy and a possible shortage of hospital supplies in the coming days. Death records are being broken on a daily basis, and vaccination is going far too slowly to stop the spread of the virus.

The current phase of the pandemic can be described as anthropophagic: while deaths and cases are soaring, multiple illegal parties are taking place. In addition, the Federal Government is adopting unrealistic and non-scientific measures to fight the virus.

This notebook presents an Exploratory Data Analysis (EDA) of COVID-19 data for Brazilian cities, followed by predictions using neural networks.

I drew a lot of inspiration from previously posted notebooks, such as Panorama do COVID-19 no Brasil by Elloá B. Guedes. Thank you for sharing your work!

Let's start by importing the libraries used in this project:
1. Pandas
2. NumPy
3. Matplotlib
4. Seaborn
5. TensorFlow
6. Scikit-learn



In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import style
import tensorflow as tf
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error


* Set a nice matplotlib style.
* Load the datasets and assign them to variables.


In [None]:
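# 'seaborn-darkgrid' was renamed to 'seaborn-v0_8-darkgrid' in matplotlib 3.6, so use whichever name is available
style_name = 'seaborn-darkgrid' if 'seaborn-darkgrid' in style.available else 'seaborn-v0_8-darkgrid'
style.use(style_name)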

# Load the cities data
filepath = '../input/corona-virus-brazil/brazil_covid19_cities.csv'
filepath1 = '../input/corona-virus-brazil/brazil_covid19.csv'
filepath3 = '../input/corona-virus-brazil/brazil_covid19_macro.csv'
filepath4 = '../input/brazilian-cities/BRAZIL_CITIES.csv'

covid_cities = pd.read_csv(filepath, index_col='date')
covid_cities.index = pd.to_datetime(covid_cities.index)
covid_states = pd.read_csv(filepath1, index_col='date')
covid_states.index = pd.to_datetime(covid_states.index)
covid_country = pd.read_csv(filepath3, index_col='date')
covid_country.index = pd.to_datetime(covid_country.index)
cities_data = pd.read_csv(filepath4, sep=";", decimal=",")


Let's take a look at the COVID-19 data for cities.


In [None]:
covid_cities.head()


In [None]:
print('Dates range from '+str(covid_cities.index.min())+' to '+str(covid_cities.index.max()))

Now we should check the demographic information related to cities and select some useful columns.  

In [None]:
cities_data.columns

In [None]:
useful_cities_columns = ['CITY','GDP', 'GDP_CAPITA', 'COMP_Q', 'IDHM', 'IDHM_Renda', 'ESTIMATED_POP', 'CAPITAL']
cities_data = cities_data[useful_cities_columns]


Plot the raw evolution of cases separately for each capital in a sample that will be used throughout the entire analysis.


In [None]:
def datavisualization_cases(city):
    plt.figure(figsize=(8,4))
    plt.plot(covid_cities.cases[covid_cities.name == city], label=str(city))
    plt.legend()
    plt.title('Cases in '+city)
    plt.show()
    
datavisualization_cases('Curitiba')
datavisualization_cases('Rio de Janeiro')
datavisualization_cases('Porto Alegre')
datavisualization_cases('São Paulo')
datavisualization_cases('Manaus')
datavisualization_cases('Salvador')
datavisualization_cases('Goiânia')

Plot the raw evolution of deaths separately for each capital in the sample.



In [None]:
def datavisualization_deaths(city):
    plt.figure(figsize=(8,4))
    plt.plot(covid_cities.deaths[covid_cities.name == city],'r' ,label=str(city))
    plt.legend()
    plt.title('Deaths in '+city)
    plt.show()
    
datavisualization_deaths('Curitiba')
datavisualization_deaths('Rio de Janeiro')
datavisualization_deaths('Porto Alegre')
datavisualization_deaths('São Paulo')
datavisualization_deaths('Manaus')
datavisualization_deaths('Salvador')
datavisualization_deaths('Goiânia')


Although those capitals are not the best sample to represent the reality of Brazilian municipalities, which are many and heterogeneous, we can still extract some explanatory trends from them. The graphs clearly show the two most aggressive phases of the pandemic in Brazil, the "waves".

For another perspective, we can look at the daily cases and deaths rather than only the cumulative totals.

In [None]:
def datavisualization_casesbyday(city):
    plt.figure(figsize=(8,4))
    # Daily new cases: each day's cumulative total minus the previous day's (kept local to this city)
    cases_by_day = covid_cities.cases[covid_cities.name == city].diff()
    plt.plot(cases_by_day, 'g', label=str(city))
    plt.legend()
    plt.title('Variations on cases by day in '+city)
    plt.show()
    
datavisualization_casesbyday('Curitiba')
datavisualization_casesbyday('Rio de Janeiro')
datavisualization_casesbyday('Porto Alegre')
datavisualization_casesbyday('São Paulo')
datavisualization_casesbyday('Manaus')
datavisualization_casesbyday('Salvador')
datavisualization_casesbyday('Goiânia')

I want to point out that data collection may vary between cities: Curitiba appears to use a different method to register its cases.


These plots are based on the difference between the "current" day's number of cases and the previous day's. A negative value means the cumulative total was lower than on the previous day, for example after a correction.


Besides that, we again notice the wave pattern of the pandemic so far.
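
To make the daily-difference idea concrete, here is a minimal sketch on made-up numbers (the series below is hypothetical, not taken from the dataset). It shows how a downward revision of the cumulative total produces a negative daily value, and one possible way to smooth the result; clipping and the 3-day window are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical cumulative case counts for one city (not real data).
# The drop from 130 to 125 mimics a retroactive correction of the total.
cumulative = pd.Series(
    [100, 110, 130, 125, 140, 160, 165],
    index=pd.date_range("2021-01-01", periods=7, freq="D"),
)

# Daily variation: today's cumulative total minus yesterday's.
daily = cumulative.diff()

# Clip negative corrections to zero, then smooth with a 3-day rolling mean.
smoothed = daily.clip(lower=0).rolling(window=3).mean()

print(daily.tolist())
print(smoothed.tolist())
```

The first value of `daily` is NaN because there is no previous day to subtract.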

Now let's check the daily variation in deaths.

In [None]:
def datavisualization_deathsbyday(city):
    plt.figure(figsize=(8,4))
    # Daily new deaths: each day's cumulative total minus the previous day's (kept local to this city)
    deaths_by_day = covid_cities.deaths[covid_cities.name == city].diff()
    plt.plot(deaths_by_day, 'y', label=str(city))
    plt.legend()
    plt.title('Variations on deaths by day in '+city)
    plt.show()
    
datavisualization_deathsbyday('Curitiba')
datavisualization_deathsbyday('Rio de Janeiro')
datavisualization_deathsbyday('Porto Alegre')
datavisualization_deathsbyday('São Paulo')
datavisualization_deathsbyday('Manaus')
datavisualization_deathsbyday('Salvador')
datavisualization_deathsbyday('Goiânia')

How do these cities' mortality rates (deaths divided by confirmed cases) compare with the national rate? Which is the highest mortality rate, and which is the lowest?

In [None]:
def datavisualization_mortrate(city):
    plt.figure(figsize=(8,4))
    # Deaths divided by confirmed cases for the selected city
    mortality_rate = covid_cities.deaths[covid_cities.name == city]/covid_cities.cases[covid_cities.name == city]
    plt.plot(mortality_rate, 'brown', label=str(city))
    plt.plot(covid_country.deaths/covid_country.cases, label='Mortality in Brazil')
    plt.legend()
    plt.title('Mortality rate in Brazil X '+city)
    plt.show()
    
datavisualization_mortrate('Curitiba')
datavisualization_mortrate('Rio de Janeiro')
datavisualization_mortrate('Porto Alegre')
datavisualization_mortrate('São Paulo')
datavisualization_mortrate('Manaus')
datavisualization_mortrate('Salvador')
datavisualization_mortrate('Goiânia')

In [None]:
covid_cities['mortality'] = covid_cities.deaths/covid_cities.cases
print('Maximum mortality is '+str(covid_cities[(covid_cities.index == covid_cities.index.max())].mortality.max()))
print('Minimum mortality is '+str(covid_cities[(covid_cities.index == covid_cities.index.max())].mortality.min()))


We should now check whether there are any significant differences between mortality rates across Brazilian regions. If there are, socioeconomic parameters may have some explanatory power.

In [None]:
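# Sum only the numeric columns so that text columns such as the state name are not aggregated
regions = covid_states.groupby([covid_states.index,'region'])[['cases', 'deaths']].sum()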
regions.reset_index(inplace=True)
regions['mortality'] = regions.deaths/regions.cases
regions_list = ['Centro-Oeste', 'Nordeste', 'Norte', 'Sul', 'Sudeste']
def region_mortality_plotter(lists):
    plt.figure(figsize=(15,8))
    for region in lists:
        plt.plot(regions[regions.region == region].date,regions[regions.region == region].mortality, label=region)
    plt.legend()    
    plt.title('Mortality rates in brazilian regions')
    plt.show()
    
region_mortality_plotter(regions_list)

The last graph shows a significant improvement, in almost every region, in the ability to deal with the virus. This "learning curve" may derive from better-established procedures for treating the disease in hospitals.

We can also see that the richest region in Brazil has the highest mortality rates throughout the whole pandemic.
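
The regional rate is computed by summing cases and deaths first and dividing afterwards. A minimal sketch with made-up numbers (the three rows are hypothetical, not taken from the dataset) shows why that differs from averaging the state-level rates:

```python
import pandas as pd

# Hypothetical state-level totals (not real data)
states = pd.DataFrame({
    "region": ["Sul", "Sul", "Norte"],
    "cases": [1000, 3000, 500],
    "deaths": [10, 90, 20],
})

# Aggregate first, then divide: every case carries the same weight
totals = states.groupby("region")[["cases", "deaths"]].sum()
totals["mortality"] = totals["deaths"] / totals["cases"]

# Alternative for comparison: average the per-state rates (small states weigh as much as large ones)
mean_of_rates = (states["deaths"] / states["cases"]).groupby(states["region"]).mean()

print(totals["mortality"])
print(mean_of_rates)
```

For "Sul" the two approaches disagree because the larger state has the higher rate.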





# **How many cases per 1,000 inhabitants are there in the sample capitals?**

In [None]:

my_capitals = ['Curitiba', 'Rio De Janeiro', 'Porto Alegre', 'São Paulo', 'Manaus', 'Salvador', 'Goiânia', 'Recife']
def cases_by_1000(cities):
    plt.figure(figsize=(15,8))
    for city in cities:
        plt.plot(covid_cities[covid_cities.name == city].cases*1000/cities_data[cities_data.CITY == city].ESTIMATED_POP.iloc[0], label=city)
    plt.legend()
    plt.title('Cases by 1000')
    plt.show()
def deaths_by_1000(cities):
    plt.figure(figsize=(15,8))
    for city in cities:
        plt.plot(covid_cities[covid_cities.name == city].deaths*1000/cities_data[cities_data.CITY == city].ESTIMATED_POP.iloc[0], label=city)
    plt.legend()
    plt.title('Deaths by 1000')
    plt.show()

cases_by_1000(my_capitals)
deaths_by_1000(my_capitals)
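
The normalization used in both plots is simply the count multiplied by 1,000 and divided by the estimated population. A tiny sketch with made-up figures (`per_thousand` is a hypothetical helper, not part of the notebook):

```python
def per_thousand(count, population):
    """Return the count expressed per 1,000 inhabitants."""
    return count * 1000 / population

# Hypothetical figures (not real data): 45,000 cases in a city of 1,500,000 inhabitants
rate = per_thousand(45_000, 1_500_000)
print(rate)  # 30.0
```

The same expression works element-wise when `count` is a pandas Series, which is how the plotting functions apply it.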


# Manaus

The surge in Manaus deaths per 1,000 inhabitants from 2021-01 to 2021-03 is most likely due to the collapse of the city's hospitals. A combination of rising cases and an unprecedented shortage of basic medical supplies overloaded the system.

As in the picture below, the community had to take on the responsibility of transporting and supplying oxygen to the hospitals.


<img src="https://ichef.bbci.co.uk/news/800/cpsprodpb/1803E/production/_116666389_1be0925a-cf77-4afd-8475-5968340cd1f9.jpg">



After some exhausting line-plot analysis, we should now move on to scatter plots to see whether any correlation or causation between the variables can be inferred.

In [None]:
data_to_scatter = pd.DataFrame()
data_to_scatter = covid_cities.sort_values(by=['name', 'date'])[['name', 'cases', 'deaths', 'mortality']].reset_index()
data_to_scatter = data_to_scatter.loc[data_to_scatter.groupby('name').date.idxmax(),:].drop(columns=['date']).reset_index().drop(columns=['index'])
cities_data.rename(columns={'CITY': 'name'}, inplace=True)
cities_data = cities_data.sort_values(by='name')
data_to_scatter = pd.merge(data_to_scatter, cities_data, on=['name'])
print(data_to_scatter.columns)
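
The merge joins the two datasets on the exact city name, so any difference in capitalization silently drops rows (this notebook itself spells one capital both as 'Rio de Janeiro' and 'Rio De Janeiro'). Whether the real files differ in this way is an assumption worth checking; the sketch below uses toy frames and a hypothetical `normalize_name` helper to show one way to make the join more tolerant.

```python
import pandas as pd

def normalize_name(name):
    """Hypothetical helper: strip and case-fold so 'Rio De Janeiro' matches 'Rio de Janeiro'."""
    return name.strip().casefold()

# Toy frames (not real data) with differently capitalized names
covid = pd.DataFrame({"name": ["Rio de Janeiro", "Manaus"], "deaths": [10, 5]})
demo = pd.DataFrame({"name": ["Rio De Janeiro", "Manaus"], "ESTIMATED_POP": [100, 50]})

# Exact-match merge loses the row whose capitalization differs
naive = pd.merge(covid, demo, on="name")

# Merge on a normalized key instead
covid["key"] = covid["name"].map(normalize_name)
demo["key"] = demo["name"].map(normalize_name)
robust = pd.merge(covid, demo.drop(columns="name"), on="key")

print(len(naive), len(robust))
```

Matching on name alone still cannot tell apart cities that share a name in different states; adding the state to the key would be needed for that.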


There is an interesting variable in the cities dataset: "COMP_Q" is the "Number of Companies: Human health and social services".

Before trying to attribute any causal relation to this feature, it's important to note one aspect of cities in Brazil: most of them are extremely small.

Check out:

In [None]:
print(str(cities_data[cities_data.COMP_Q<=8].name.count()) + ' cities have at most 8 medical facilities.')
print(str(cities_data[cities_data.ESTIMATED_POP<=50000].name.count()) + ' cities have at most 50,000 inhabitants.')
print(str(cities_data[cities_data.COMP_Q == cities_data.COMP_Q.max()].name.iloc[0]) +' has '+ str(cities_data.COMP_Q.max())+' medical facilities')
# iloc[10] picks an arbitrary city among those tied for the minimum (it requires at least 11 such cities)
print(str(cities_data[cities_data.COMP_Q == cities_data.COMP_Q.min()].name.iloc[10]) +' has '+ str(cities_data.COMP_Q.min())+' medical facilities')

In [None]:
plt.figure(figsize=(15,8))
# histplot with kde=True replaces distplot, which is deprecated in recent seaborn versions
sns.histplot(data_to_scatter['COMP_Q'], kde=True)
plt.title('Distribution of medical facilities in brazilian cities')

plt.figure(figsize=(15,8))
sns.histplot(data_to_scatter['ESTIMATED_POP'], kde=True)
plt.title('Distribution of estimated population in brazilian cities')

In [None]:
gdp_capita_deaths = data_to_scatter[['name', 'GDP_CAPITA', 'deaths']]
gdp_capita_deaths.set_index('name', inplace=True)
gdp_capita_deaths = gdp_capita_deaths.astype(float)
plt.figure(figsize=(15,8))
plt.scatter(gdp_capita_deaths['GDP_CAPITA'], gdp_capita_deaths['deaths'], s=6)
plt.xlim([0,20000])
plt.ylim([0,5000])
plt.xlabel('GDP per Capita')
plt.ylabel('Deaths')
plt.title('Deaths X GDP per Capita')
plt.show()

There is no great insight from the graph above. In Brazil, cities with a greater GDP per capita tend to have larger populations, and that may be the cause of the positive correlation seen.

Note the unsurprising result below:

In [None]:
plt.figure(figsize=(15,8))
sns.regplot(x=data_to_scatter['ESTIMATED_POP'].astype(float), y=data_to_scatter['cases'].astype(float), scatter_kws={'s':10})
plt.show()

Now check a correlation matrix for the chosen parameters:

In [None]:
plt.figure(figsize=(15,8))
corr = data_to_scatter.drop(columns=['name', 'CAPITAL']).astype(float).corr()
sns.heatmap(corr, xticklabels=corr.columns, yticklabels=corr.columns, annot=True)
plt.title('Correlation Matrix')
plt.show()

Mortality has almost zero correlation with all of the parameters. That somewhat "kills" the attempt to use social and economic inputs as explanatory variables.

COMP_Q has a really strong relation with cases and deaths (0.93 and 0.94).
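
One possible reading of such a strong correlation is that facilities, cases and deaths all scale with city size. The sketch below uses purely synthetic data (every number is an assumption, nothing comes from the dataset) in which facilities and deaths per inhabitant are drawn independently; the raw counts still end up strongly correlated simply because both are proportional to population.

```python
import numpy as np

rng = np.random.default_rng(42)
n_cities = 2000

# Synthetic populations (log-normal, so a few cities are much larger than the rest)
population = rng.lognormal(mean=10, sigma=1.0, size=n_cities)

# Facilities and deaths per 1,000 inhabitants are drawn independently of each other
facilities_per_1000 = rng.uniform(0.5, 1.5, size=n_cities)
deaths_per_1000 = rng.uniform(0.5, 1.5, size=n_cities)

# Raw counts are the per-capita rates scaled by population
facilities = population * facilities_per_1000 / 1000
deaths = population * deaths_per_1000 / 1000

raw_corr = np.corrcoef(facilities, deaths)[0, 1]
per_capita_corr = np.corrcoef(facilities_per_1000, deaths_per_1000)[0, 1]
print(round(raw_corr, 2), round(per_capita_corr, 2))
```

This is why dividing by population is a reasonable next step before reading anything causal into the raw correlation.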

How does the Number of Companies: Human health and social services, divided by population, influence deaths?

In [None]:
plt.figure(figsize=(15,8))
sns.scatterplot(x=data_to_scatter['COMP_Q'].astype(float)/data_to_scatter['ESTIMATED_POP'].astype(float), y=data_to_scatter['deaths'].astype(float)*1000/data_to_scatter['ESTIMATED_POP'].astype(float))
sns.scatterplot(x=data_to_scatter['COMP_Q'], y=data_to_scatter['deaths'])
plt.title(' Number of Companies: Human health and social services/Population X Deaths')

plt.show()

Deaths grow along with the number of medical facilities in a city. At first sight that seems nonsensical, but on closer inspection we can infer that a COVID-19 patient who lives in a city lacking the infrastructure to treat the disease will be moved somewhere that can provide the care he needs.


If he dies in a bigger "satellite" city, his death will be counted as belonging to the place he was transferred to in search of treatment.

Check a more specific analysis below, targeting only bigger cities.

In [None]:
# Scatter only bigger cities (>100000)
big_cities = data_to_scatter[data_to_scatter.ESTIMATED_POP.astype(float) > 100000]
plt.figure(figsize=(15,8))
plt.title('Deaths by 1000 X Medical Facilities in cities with more than 100,000 population')
# Multiply by 1000 so that the y axis matches the title (deaths per 1000 inhabitants)
sns.regplot(x=big_cities.COMP_Q, y=big_cities.deaths*1000/big_cities.ESTIMATED_POP)
plt.show()


# Scatter only bigger cities (>500000)
biggest_cities = data_to_scatter[data_to_scatter.ESTIMATED_POP.astype(float) > 500000]
plt.figure(figsize=(15,8))
plt.title('Deaths by 1000 X Medical Facilities in cities with more than 500,000 population')
sns.regplot(x=biggest_cities.COMP_Q, y=biggest_cities.deaths*1000/biggest_cities.ESTIMATED_POP)
plt.show()

Now let's take a look at the distribution of cases among the Brazilian capitals (the seven capitals with the fewest deaths are left out of the chart).

In [None]:
# Hard-coded list of the 26 state capitals plus Brasília
capitals = ['Aracaju', 'Belo Horizonte', 'Belém', 'Boa Vista', 'Brasília', 'Campo Grande', 'Cuiabá', 'Curitiba', 'Florianópolis', 'Fortaleza', 'Goiânia', 'João Pessoa', 'Macapá', 'Maceió', 'Manaus', 'Natal', 'Palmas', 'Porto Alegre', 'Porto Velho', 'Recife', 'Rio Branco', 'Rio De Janeiro', 'Salvador', 'São Luís', 'São Paulo', 'Teresina', 'Vitória']
capitals_dataframe = covid_cities[(covid_cities.name.isin(capitals))][['name', 'cases', 'deaths', 'state']].sort_values(by=['name','date'])
# Keep only the rows from the most recent date; copy() avoids pandas' SettingWithCopyWarning when columns are added
last_capitals_dataframe = capitals_dataframe[capitals_dataframe.index == capitals_dataframe.index.max()][['name','deaths', 'cases']].copy()

last_capitals_dataframe['cases_to_pie'] = last_capitals_dataframe['cases']*100/last_capitals_dataframe['cases'].sum()
last_capitals_dataframe = (last_capitals_dataframe.sort_values(by=['deaths','name'])[7:])
last_capitals_dataframe.sort_values(by='cases_to_pie', inplace=True)
plt.figure(figsize=(20,12))
plt.pie(last_capitals_dataframe.cases_to_pie, labels=last_capitals_dataframe.name, shadow=True)
plt.show()

Although the heatmap already pointed to a low correlation between mortality and economic variables, let's scatter them to see how they behave.

In [None]:
last_capitals_dataframe['mortality_rate'] = last_capitals_dataframe['deaths']/last_capitals_dataframe['cases']
x = (cities_data[cities_data.name.isin(capitals)][['name','GDP_CAPITA', 'CAPITAL', 'COMP_Q', ]])
x = x[x.CAPITAL == 1].sort_values(by='name')
last_capitals_dataframe = pd.merge(x, last_capitals_dataframe.sort_values(by='name'), on=['name'])
plt.figure(figsize=(15,8))
sns.scatterplot(x=last_capitals_dataframe.GDP_CAPITA, y=last_capitals_dataframe.mortality_rate)
plt.figure(figsize=(15,8))
sns.scatterplot(x=last_capitals_dataframe.COMP_Q, y=last_capitals_dataframe.mortality_rate)

plt.show()



The results are inconclusive.


# Raves over graves

The main topic that caught people's attention on social media was the massive number of illegal parties that took place during the pandemic.


Going in the completely opposite direction from public efforts to contain the disease, some people decided that having fun was worth more than stopping the spread of COVID-19.

Many Instagram and Twitter accounts emerged with the goal of exposing those illegal gatherings and alerting the authorities.

<img src="https://i.em.com.br/tBw_y42duTCPjueuS5X650Zhd2Q=/790x/smart/imgsapp.em.com.br/app/noticia_127983242361/2020/11/09/1202754/20201109121431759395a.jpg">



Unfortunately, there is no information available on illegal parties, so we can't use it directly as a variable to explain deaths or cases.

A good substitute for the exact number of parties might be the holidays: a period when gatherings intensify.

The next section visualizes the holidays along the cases/deaths curves to see whether any relation shows up.


In [None]:
# Deadly Holidays
holiday_path = '../input/feriados-e-dias-da-semana-brasil/feriados.csv'
holidays = pd.read_csv(holiday_path, index_col='Data')
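# Assign the converted index (the original call discarded its result); assumes the 'Data' column parses as dates
holidays.index = pd.to_datetime(holidays.index)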
holidays = (holidays[holidays.index > '2020-03-01'])
holidays.drop(columns=['Feriado', 'Dia da Semana'], inplace=True)
holidays = ((holidays[holidays.index < '2021-04-06']))
holidays = (holidays.index.tolist())
capitals_dataframe.reset_index(inplace=True)
capitals_dataframe['is_holiday'] = 0
capitals_dataframe.loc[capitals_dataframe['date'].isin(holidays), 'is_holiday'] = 1
covid_cities.reset_index(inplace=True)
covid_cities['is_holiday'] = 0
covid_cities.loc[covid_cities['date'].isin(holidays), 'is_holiday'] = 1

def all_cities_holiday_plotter_cases(city):
    plt.figure(figsize=(8,5))
    plt.plot(covid_cities[covid_cities.name == city].date, covid_cities[covid_cities.name == city].cases, 'b', label='Cases in '+str(city))
    plt.scatter(covid_cities[(covid_cities.is_holiday == 1) & (covid_cities.name == city)].date, covid_cities[(covid_cities.name == city) & (covid_cities.is_holiday == 1)].cases, marker='o', color='r', label='Holiday')
    plt.legend()
    plt.title(str(city))
    plt.show()
def all_cities_holiday_plotter_deaths(city):
    plt.figure(figsize=(8,5))
    plt.plot(covid_cities[covid_cities.name == city].date, covid_cities[covid_cities.name == city].deaths, 'g', label='Deaths in ' +str(city))
    plt.scatter(covid_cities[(covid_cities.is_holiday == 1) & (covid_cities.name == city)].date, covid_cities[(covid_cities.name == city) & (covid_cities.is_holiday == 1)].deaths, marker='o', color='r', label='Holiday')
    plt.legend()
    plt.title(str(city))
    plt.show()


In [None]:
all_cities_holiday_plotter_cases('Curitiba')
all_cities_holiday_plotter_cases('Rio de Janeiro')
all_cities_holiday_plotter_cases('Porto Alegre')
all_cities_holiday_plotter_cases('São Paulo')
all_cities_holiday_plotter_cases('Manaus')
all_cities_holiday_plotter_cases('Salvador')
all_cities_holiday_plotter_cases('Goiânia')

In [None]:
all_cities_holiday_plotter_deaths('Curitiba')
all_cities_holiday_plotter_deaths('Rio de Janeiro')
all_cities_holiday_plotter_deaths('Porto Alegre')
all_cities_holiday_plotter_deaths('São Paulo')
all_cities_holiday_plotter_deaths('Manaus')
all_cities_holiday_plotter_deaths('Salvador')
all_cities_holiday_plotter_deaths('Goiânia')

Praia do Rosa is a beautiful spot to spend vacations, and the beach gets plenty of visitors during holidays.

Covid wasn't a significant "detail" when people decided whether to go there during quarantine.

The beach became a hotspot for illegal parties, and the regulators of the municipality of Imbituba were strongly criticized for the lack of enforcement.


<img src="https://ichef.bbci.co.uk/news/800/cpsprodpb/17A27/production/_115870869_7a0674c1-8b78-413d-8750-e05f9232ea78.jpg">



The tourists come and go, and if needed they can easily find COVID treatment in their own cities, in their own medical facilities.

For the locals the situation is harsher. The city of Imbituba is unlikely to have the structural resources to face mass demand for ICU beds or respirators.


In [None]:
all_cities_holiday_plotter_cases('Imbituba')
all_cities_holiday_plotter_deaths('Imbituba')

# Neural Network Predictions 

The first prediction will be made using a sequential model containing a single dense layer with 1 neuron.

This DNN was trained on data from every city in Brazil. The data was sliced into 7-day windows (in time order), and the model tries to predict the next day's number of cases or deaths.

Let's look at the outcome.
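
The preprocessing that produced the windows is not shown in this notebook, so the following is only a sketch of how a series could be sliced into 7-day inputs and next-day targets. The function name `make_windows` and the toy series are assumptions for illustration.

```python
import numpy as np

def make_windows(series, window=7):
    """Split a 1-D series into (inputs, targets): `window` past values -> the next value."""
    series = np.asarray(series, dtype=float)
    n = len(series) - window
    # Row i holds days i .. i+window-1; its target is day i+window
    x = np.stack([series[i:i + window] for i in range(n)])
    y = series[window:]
    return x, y

cases = np.arange(10)  # toy series: 0, 1, ..., 9
x, y = make_windows(cases, window=7)
print(x.shape, y.shape)  # (3, 7) (3,)
```

With this layout, each row of `x` is one input window and the matching entry of `y` is the value the model should predict.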


In [None]:
####### Load single dense layer model!
# Read the model architecture from JSON; the with-block closes the file automatically
with open('../input/modeljson/model.json', 'r') as json_file:
    loaded_model_json = json_file.read()
loaded_model = tf.keras.models.model_from_json(loaded_model_json)

# load weights into new model
loaded_model.load_weights("../input/modelh5/model.h5")
x_cases = np.genfromtxt('../input/x-cases/x_cases.csv', delimiter=',')
y_cases = np.genfromtxt('../input/x-cases/y_cases.csv', delimiter=',')
time_frame = 364
all_cities = (covid_cities[covid_cities.date == '2020-09-10'].sort_values(by='name').name).tolist()
print("Loaded model from disk")
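
For intuition, a single dense neuron without a nonlinear activation computes a weighted sum of its 7 inputs plus a bias. The loaded model's activation and weights are not shown here, so the sketch below assumes a linear activation and uses made-up weights.

```python
import numpy as np

def dense_one_neuron(x, weights, bias):
    """Forward pass of a single linear neuron: y = x . w + b."""
    return x @ weights + bias

# One window of 7 days (toy values)
x = np.array([[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]])

# Hypothetical weights that extrapolate the last step: 2 * day7 - day6
weights = np.array([0.0, 0.0, 0.0, 0.0, 0.0, -1.0, 2.0])
bias = 0.0

result = dense_one_neuron(x, weights, bias)
print(result)
```

Under that assumption the model is equivalent to a linear regression on the previous 7 days.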

In [None]:

def prediction_plotter(city_name):
    prediction = loaded_model.predict(x_cases[time_frame*all_cities.index(city_name):time_frame*(1+all_cities.index(city_name))])
    reality = y_cases[time_frame*all_cities.index(city_name):time_frame*(1+all_cities.index(city_name))]
    plt.figure(figsize=(15,8))
    plt.plot(prediction, label='Prediction')
    plt.plot(reality, label='Reality')
    plt.legend()
    plt.title(city_name)
    plt.show()
    print(mean_squared_error(reality, prediction))

prediction_plotter('Curitiba')
prediction_plotter('Rio de Janeiro')
prediction_plotter('Porto Alegre')
prediction_plotter('São Paulo')
prediction_plotter('Manaus')
prediction_plotter('Salvador')
prediction_plotter('Goiânia')



# Remember: if something is too good to be true, then it probably is! 

Jokes aside, this prediction is far too accurate, but it serves well as an example of some naive assumptions. 

The DNN is fed real-world data from the last 7 days and should give us the value for the next day. 

That is, we're not (yet) generating our own inputs to feed back into the DNN and measure how it performs; we're plotting (good) predictions that are based entirely on real data.
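
To see why one-step-ahead predictions on a cumulative curve look so good, it helps to compare against a trivial baseline. The sketch below uses a synthetic cumulative series (not the real data, and not the scaling used for `y_cases`, which is not shown) and predicts that tomorrow's total equals today's.

```python
import numpy as np

# Hypothetical cumulative case curve (not real data): grows by 50 to 149 cases a day
rng = np.random.default_rng(0)
cumulative = 10_000 + np.cumsum(rng.integers(50, 150, size=200))

# Persistence baseline: predict that tomorrow's total equals today's
prediction = cumulative[:-1]
reality = cumulative[1:]

mse = np.mean((reality - prediction) ** 2)
relative_error = np.sqrt(mse) / reality.mean()
print(relative_error)
```

Even this no-model baseline is off by less than 2% of the curve's level, so a near-perfect one-step plot says little about real forecasting ability.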




Now we won't feed the neural network complete information. Let's see how it deals with new predictions built from previous predictions made by the DNN itself.

For this, I've built a function that takes as an argument the number of days of real-world data the model receives before it has to continue from its own predictions.

**For the first example I'll feed the DNN with 100 days of real-world data**
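
The idea of feeding predictions back as inputs can be sketched independently of the trained network. In the example below, `recursive_forecast` and `toy_model` are hypothetical names, and the toy model (last value plus the mean daily increase) merely stands in for the DNN.

```python
import numpy as np

def recursive_forecast(predict_next, history, steps, window=7):
    """Roll a one-step model forward, feeding each prediction back in as input."""
    values = list(history[-window:])
    forecasts = []
    for _ in range(steps):
        # Predict from the most recent `window` values, real or previously predicted
        next_value = predict_next(np.array(values[-window:]))
        forecasts.append(next_value)
        values.append(next_value)
    return forecasts

# Stand-in for the trained network (an assumption): last value plus the mean daily increase
def toy_model(window_values):
    return window_values[-1] + np.mean(np.diff(window_values))

history = [10, 20, 30, 40, 50, 60, 70]
forecast = recursive_forecast(toy_model, history, steps=3)
print([float(v) for v in forecast])
```

Because each step builds on earlier predictions, any error made early on is carried into every later step.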


In [None]:
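def reality_plotter(city_name, level):
    # Offset of this city's block of windows inside x_cases
    start = time_frame*all_cities.index(city_name)
    prediction = []
    to_plot = []
    # Phase 1: the first `level` days use real 7-day windows as input
    for i in range(level):
        prediction.append(x_cases[start+i])
        # [0, 0] extracts the scalar from the (1, 1) output so that to_plot holds only numbers
        to_plot.append(loaded_model.predict(prediction[i].reshape(1,7))[0, 0])
    # Phase 2: drop the oldest day and append the model's own prediction
    for i in range(level, time_frame):
        prediction.append(np.append((prediction[i-1][1:7]), loaded_model.predict(prediction[i-1].reshape(1,7))))
        to_plot.append(prediction[i][6])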
    plt.figure(figsize=(15,8))
    plt.plot(to_plot, label='Prediction')
    plt.plot(y_cases[time_frame*all_cities.index(city_name):time_frame*all_cities.index(city_name)+time_frame], label='Reality')
    plt.legend()
    plt.title('Forecasting x Reality - '+str(city_name))
    plt.show()

reality_plotter('Curitiba', 100)
reality_plotter('Rio de Janeiro', 100)
reality_plotter('Porto Alegre', 100)
reality_plotter('São Paulo', 100)
reality_plotter('Manaus', 100)
reality_plotter('Salvador', 100)
reality_plotter('Goiânia', 100)

Poor results in general.

**Notice how it reacts when the number of real-world days is raised to 200**

In [None]:
reality_plotter('Curitiba', 200)
reality_plotter('Rio de Janeiro', 200)
reality_plotter('Porto Alegre', 200)
reality_plotter('São Paulo', 200)
reality_plotter('Manaus', 200)
reality_plotter('Salvador', 200)
reality_plotter('Goiânia', 200)

Accuracy improved for most of the cities.


**What if we set the days parameter to 300?**

In [None]:
reality_plotter('Curitiba', 300)
reality_plotter('Rio de Janeiro', 300)
reality_plotter('Porto Alegre', 300)
reality_plotter('São Paulo', 300)
reality_plotter('Manaus', 300)
reality_plotter('Salvador', 300)
reality_plotter('Goiânia', 300)

With 300 days of input we manage to achieve more accurate results in general.

Notice that in most cases the DNN overestimates the number of cases. That may be due to the tightening of lockdown policies within the cities: the progression rate of the disease has slowed down.



# Implementing a more complex Deep Neural Network 

The DNN used now has more layers than the previous one. It also has one interesting additional feature: the 5-10 days that come after a **holiday** receive a "1", and all other days get a "0".

This addition tries to capture a relation between the occurrence of holidays (more intensive gathering) and the increase in cases in the subsequent days.
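
How the flag was built is not shown in this notebook. The sketch below is one possible construction, assuming that "5-10 days after a holiday" means days 5 through 10 inclusive; the date range and the single holiday are made up.

```python
import pandas as pd

# Toy calendar and a hypothetical holiday list (not the real feriados.csv)
dates = pd.date_range("2020-12-20", periods=20, freq="D")
holidays = pd.to_datetime(["2020-12-25"])

# Start with 0 everywhere, then mark days 5 to 10 after each holiday with 1
flag = pd.Series(0, index=dates)
for holiday in holidays:
    start = holiday + pd.Timedelta(days=5)
    end = holiday + pd.Timedelta(days=10)
    flag.loc[start:end] = 1  # label-based slicing includes both endpoints

print(flag[flag == 1].index.strftime("%Y-%m-%d").tolist())
```

The lag reflects the delay between a gathering and the moment the resulting infections show up in the case counts.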

In [None]:
####### Load complex dense layer model!
# Read the model architecture from JSON; the with-block closes the file automatically
with open('../input/model-02/model7.json', 'r') as json_file:
    loaded_model_json2 = json_file.read()
loaded_model2 = tf.keras.models.model_from_json(loaded_model_json2)

# load weights into new model
loaded_model2.load_weights("../input/model-02/model7.h5")
x_cases_holiday = np.genfromtxt('../input/x-cases-holidays/x_cases_holidays.csv', delimiter=',')

print("Loaded model from disk")

Check some "naive" forecasting with the new model:

In [None]:

def prediction_plotter(city_name):
    prediction = loaded_model2.predict(x_cases_holiday[time_frame*all_cities.index(city_name):time_frame*(1+all_cities.index(city_name))])
    reality = y_cases[time_frame*all_cities.index(city_name):time_frame*(1+all_cities.index(city_name))]
    plt.figure(figsize=(15,8))
    plt.plot(prediction, label='Prediction')
    plt.plot(reality, label='Reality')
    plt.legend()
    plt.title(city_name+' Naive prediction')
    plt.show()
    print(mean_squared_error(reality, prediction))

prediction_plotter('Curitiba')
prediction_plotter('Rio de Janeiro')
prediction_plotter('Porto Alegre')
prediction_plotter('São Paulo')
prediction_plotter('Manaus')
prediction_plotter('Salvador')
prediction_plotter('Goiânia')

Now we shall let the DNN stand on its own feet.

In [None]:
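# Same idea as reality_plotter, but each window has 8 values: column 0 appears to be the holiday flag, columns 1-7 the cases
def reality_plotter_8(city_name, level):
    prediction = []
    to_plot = []
    for i in range(level):
        prediction.append(x_cases_holiday[time_frame*all_cities.index(city_name)+i])
        # [0, 0] extracts the scalar, assuming the model outputs one value per window
        to_plot.append(loaded_model2.predict(prediction[i].reshape(1, 8))[0, 0])

    for i in range(level, time_frame):
        # Keep the real value of column 0 for day i, shift the cases window and append the model's own prediction
        prediction.append(np.append(x_cases_holiday[(time_frame*all_cities.index(city_name))+i][0], np.append((prediction[i-1][2:8]), loaded_model2.predict(prediction[i-1].reshape(1, 8)))))
        to_plot.append(prediction[i][7])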
    reality = y_cases[time_frame*all_cities.index(city_name):time_frame*all_cities.index(city_name)+time_frame]
    print(mean_squared_error(reality, to_plot))
    plt.figure(figsize=(15,8))
    plt.plot(to_plot, label='Prediction')
    plt.plot(reality, label='Reality')
    plt.legend()
    plt.title('Forecasting x Reality - '+str(city_name))
    plt.show()

reality_plotter_8('Curitiba', 100)
reality_plotter_8('Rio de Janeiro', 100)
reality_plotter_8('Porto Alegre', 100)
reality_plotter_8('São Paulo', 100)
reality_plotter_8('Manaus', 100)
reality_plotter_8('Salvador', 100)
reality_plotter_8('Goiânia', 100)

In [None]:
reality_plotter_8('Curitiba', 300)
reality_plotter_8('Rio de Janeiro', 300)
reality_plotter_8('Porto Alegre', 300)
reality_plotter_8('São Paulo', 300)
reality_plotter_8('Manaus', 300)
reality_plotter_8('Salvador', 300)
reality_plotter_8('Goiânia', 300)

I'm not sure how well it works to feed a boolean value, encoded as 0/1, into a DNN, but it seems to bring slightly better accuracy compared to the first model.






An interesting feature for further study is how the "flag" color influences the virus progression rate. The flag is a measure of the intensity of public enforcement of the different levels of quarantine.


<img src="https://media.gazetadopovo.com.br/2020/06/09182006/protocolo-960x540.png">