# How did COVID-19 start and how much has it affected the world?

On December 31, 2019, the World Health Organisation’s (WHO) China office heard the first reports of a previously unknown virus behind a number of pneumonia cases in Wuhan, a city in central China with a population of over 11 million.<br>

What started as an epidemic mainly limited to China has now become a truly global pandemic. There have now been more than 5,595,091 confirmed cases and 350,547 deaths. The disease has been detected in more than 200 countries and territories, with the US, Brazil and Russia experiencing the most widespread outbreaks, followed by the UK, Spain and Italy. In the UK, there have been <b>265,227 confirmed cases</b> and <b>37,048 deaths</b> as of May 26.

The Chinese government responded to the initial outbreak by placing Wuhan and nearby cities under a de facto quarantine encompassing roughly 50 million people in Hubei province. This quarantine is now slowly being lifted, as authorities watch to see whether cases will rise again. The US is now the epicentre of the Covid-19 outbreak. As of May 27, the country has 1,681,418 confirmed infections and 98,929 deaths. In Italy, where the death toll surpassed that of China on March 19, the government took the unprecedented step of extending a lockdown to the entire country, shutting cinemas, theatres, gyms, discos and pubs and banning funerals and weddings. In the UK, the government has shut schools, pubs, restaurants, bars, cafés and all non-essential shops for at least nine weeks. On May 10, Boris Johnson outlined a flexible plan that would see some schools reopen by June, depending on the threat the virus poses to the UK at that point.<br>

Read more about this [here](https://www.newscientist.com/term/covid-19/#ixzz6TbAe5Tlb)

Authorities in 214 countries and territories have reported about 16.8 million Covid‑19 cases and 662,000 deaths since China reported its first cases to the World Health Organization (WHO) in December. 

The outbreak spread from the Chinese city of Wuhan to more than 180 countries and territories—affecting every continent except Antarctica. Efforts to stamp out the pneumonia-like illness have led to entire nations enforcing lockdowns, widespread halts of international travel, mass layoffs and battered financial markets.

<b><i>16,776,680</i></b> <b>Confirmed cases worldwide</b><br>
<b><i>661,203</i></b> <b>Deaths worldwide</b> 

On Jan. 24, the first two European cases were confirmed in France. By Feb. 1, eight European nations had confirmed cases of COVID-19, and a month later that count had risen to 24 countries with at least 2,200 cases, most of them in Italy. On March 11, Italy eclipsed 10,000 cases and the World Health Organization declared the outbreak a pandemic — the first since H1N1 in 2009. That's also when China, the original epicenter, began seeing drops in daily counts of new cases.<br>

March also saw exponential spread of the virus throughout the U.S., with all 50 states reporting cases by March 17.<br> 

Latin America is now the epicenter of the pandemic. Brazil is the worst-hit in the region so far, with almost 90,000 deaths.<br> 

In Asia, India has seen almost 1.5 million cases and several states re-imposed partial lockdowns just weeks after a two-month nationwide lockdown was ended.<br> 

South Africa and Egypt have seen the largest outbreaks so far in Africa. 

# How to prevent the spread of COVID-19, according to WHO

- Limit close contact between infectious people and others. Ensure a physical distance of at least 1 meter from others.  In areas where COVID-19 is circulating and this distance cannot be guaranteed, wear a <b>mask</b>.
- Identify infected people quickly so that they can be isolated and cared for and all of their close contacts can be <b>quarantined</b> in appropriate facilities.
- Clean hands and cover coughs and sneezes with a tissue or bent elbow at all times.
- Avoid crowded places, close-contact settings and confined and enclosed spaces with poor ventilation. 
- Ensure good ventilation in indoor settings, including homes and offices.
- Stay home if feeling unwell and call your medical provider as soon as possible to determine whether medical care is needed.
- In countries or areas where COVID-19 is circulating, health workers should use medical masks continuously during all routine activities in clinical areas in health care facilities.
- Health workers should also use additional personal protective equipment and <b>precautions when caring for COVID-19 patients</b>. Workplaces should have in place protective measures. 

# Preventive measures

To prevent the spread of COVID-19:

- Clean your hands often. Use soap and water, or an alcohol-based hand rub. 
- Maintain a safe distance from anyone who is coughing or sneezing. 
- Wear a mask when physical distancing is not possible. 
- Don’t touch your eyes, nose or mouth. 
- Cover your nose and mouth with your bent elbow or a tissue when you cough or sneeze. 
- Stay home if you feel unwell. 
- If you have a fever, cough and difficulty breathing, seek medical attention. Calling in advance allows your healthcare provider to quickly direct you to the right health facility. This protects you, and prevents the spread of viruses and other infections. 

![cases_up_to_100.png](attachment:cases_up_to_100.png)

![cases_up_to_1k.png](attachment:cases_up_to_1k.png)

![cases_up_to_10k.png](attachment:cases_up_to_10k.png)

![cases_up_to_40k.png](attachment:cases_up_to_40k.png)

# Predicting future cases of COVID-19 for every country based on previous data

From a technical standpoint, we can do this by creating a linear regression model and training it on the .csv data containing reported COVID-19 cases.

This is a classic machine learning task, used frequently in forecasting, for example in the finance industry to predict the value of various currencies. Even though the task is basic, that doesn't mean it is easy: however accurate regression models can be on average, they shouldn't be a source we rely on.

The results given by a regression model <b>heavily</b> rely on previous numeric data, and the model doesn't adapt to data it hasn't seen, whereas other models used in deep learning, such as Long Short-Term Memory (LSTM) models, can adapt to give different results (depending, of course, on the context they are given as data).

That said, the regression model we're going to use <b>should not</b> be relied upon. It simply represents a numeric illustration of how things <b>can</b> evolve in the future.
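
To make the idea of fitting a line to past numbers concrete, here is a minimal sketch using ordinary least squares. The day numbers and case counts are made-up illustrative values (they are not taken from the dataset), and `np.polyfit` is only a stand-in for the Keras model built later in this notebook.

```python
import numpy as np

# Hypothetical, made-up series: cumulative cases observed on days 0..4.
days = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
cases = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Fit cases ~ slope * day + intercept by ordinary least squares.
slope, intercept = np.polyfit(days, cases, deg=1)

# Extrapolate one day past the observed range.
next_day = 5.0
forecast = slope * next_day + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, forecast={forecast:.2f}")
```

The forecast is nothing more than the fitted line extended forward, which is why such a model cannot react to anything that is absent from the past data.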

# How do we create such a model?

Given the short time frame and the fact that this is a classic problem, we don't need to worry about creating new models from scratch. We can use established, state-of-the-art (SOTA) tooling instead. Our codebase will rely on Keras, a library that runs on top of TensorFlow, one of the main frameworks used for machine learning tasks.

Before this, though, we need to filter our data. The Pandas library can help us with this: it lets us carry out various data profiling and filtering tasks, and we will use it to filter our data.

In [1]:
!pip install numpy tensorflow keras scikit-learn matplotlib pandas

In [64]:
import numpy as np
import pandas as pd
import tensorflow as tf
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, BatchNormalization
import keras.backend as K
from keras.callbacks import ModelCheckpoint
from keras.optimizers import Adam, SGD
from sklearn.model_selection import train_test_split
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.metrics import mean_squared_error

import matplotlib.pyplot as plt

np.set_printoptions(precision = 3)

# Data Analysis

## Examining and Filtering data

Here, we read our data with Pandas so that we can inspect and filter it.<br>
We are only interested in the Country, Confirmed, Deaths and Recovered columns.

In [65]:
dataset_path = './data/covid_19_data.csv'

dataset = pd.read_csv(
    dataset_path, 
    usecols = [
        'Country/Region',
        'Confirmed',
        'Deaths',
        'Recovered'
    ]
)

Let's take a look at how our data looks!

In [66]:
dataset.tail()

| | Country/Region | Confirmed | Deaths | Recovered |
|---|---|---|---|---|
| 68553 | Ukraine | 678 | 20 | 551 |
| 68554 | Netherlands | 791 | 69 | 0 |
| 68555 | Mainland China | 1270 | 1 | 1267 |
| 68556 | Ukraine | 1602 | 34 | 1251 |
| 68557 | Netherlands | 11886 | 1305 | 0 |


Let's see what the general statistics of this data are.

In [67]:
dataset.describe()

| | Confirmed | Deaths | Recovered |
|---|---|---|---|
| count | 68558.0 | 68558.0 | 68558.0 |
| mean | 10472.017883 | 564.674874 | 4830.826 |
| std | 32092.929744 | 2516.087659 | 27123.76 |
| min | 0.0 | 0.0 | 0.0 |
| 25% | 107.0 | 1.0 | 0.0 |
| 50% | 998.0 | 17.0 | 137.0 |
| 75% | 5361.0 | 168.0 | 1488.0 |
| max | 416434.0 | 41128.0 | 1160087.0 |


Alright... The numbers are scary... <br>
What we are interested in is the general mean of each column.<br>
If you take a look, you can see that, averaged over all 68,558 rows of the dataset, there are around <b>10,472 confirmed cases</b> of COVID-19, <b>565 deaths</b> and <b>4,831 recovered cases</b> per row.<br>
The numbers are quite big... they will aid us in carrying out this task.

Nice, now that we've got a feel for what we're working with, let's get into preprocessing this data.

## Preprocessing the data

What we have come across is that there are a few entries that aren't properly sanitized.<br>
To give an example, the "St. Martin" entry is also entered as "('St. Martin',)".<br>

Also, we need to give our network the countries as input, but, since it can only take numeric values, we'll encode them (we'll also keep a decoded version in case we need it).<br>
We'll use a really simple encoding, numbering them from 0 up to the number of countries minus one.
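
The encoding idea can be sketched in a few lines of plain Python. The names `raw_countries`, `id_to_country` and `country_to_id` are illustrative assumptions and are not the variables used in the notebook. Deduplicating before sorting is a deliberate choice here so that the ids come out the same on every run.

```python
# Hypothetical sample of raw country names, including a duplicate.
raw_countries = ["Ukraine", "Netherlands", "Mainland China", "Ukraine"]

# Deduplicate first, then sort, so the ids are stable between runs.
unique_countries = sorted(set(raw_countries))

# id -> name and name -> id lookups built from the same ordering.
id_to_country = dict(enumerate(unique_countries))
country_to_id = {name: idx for idx, name in id_to_country.items()}

print(id_to_country)
print(country_to_id["Ukraine"])
```

Building the second lookup from the first guarantees that the two dictionaries always agree with each other.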

Our new preprocessed dataset will be in the format of:<br>
```
{<country_name> : [<mean_of_confirmed_cases>, <mean_of_death_cases>, <mean_of_recovered_cases>]}
```

In [68]:
def preprocess_dataset(dataset, column_index):
    
    # SANITIZE THE DATA
    # Replace malformed country names in a single vectorized call; this avoids
    # the chained assignment that triggers pandas' SettingWithCopyWarning.
    dataset[column_index] = dataset[column_index].replace(
        {
            "('St. Martin',)": 'St. Martin',
            " Azerbaijan": 'Azerbaijan',
            "Bahamas, The": 'Bahamas',
        }
    )
        
    # ENCODE/DECODE THE DATA 
    encoded_countries = {id : country for id, country in enumerate(set(sorted(dataset[column_index])))}
    decoded_countries = {country : id for id, country in enumerate(set(sorted(dataset[column_index])))}
    
    # CREATE NEW DATASET
    dataset = dataset.sort_values(by = column_index)
    new_dataset = {c : None for c in sorted(set(dataset[column_index]))}
    
    # CONSTRUCT DATASET
    confirmed = []
    deaths = []
    recovered = []
    
    # UPDATE DATASET SO THAT EACH COUNTRY HAS ITS CORRESPONDENT NUMBER OF CASES (MEAN)
    cached_country = dataset[column_index][0]
    for i, country in enumerate(dataset[column_index]):
        if cached_country == country:
            confirmed.append(dataset['Confirmed'][i])
            deaths.append(dataset['Deaths'][i])
            recovered.append(dataset['Recovered'][i])

        else:
            if i == len(dataset[column_index]):
                cached_country = country
      
            new_dataset.update(
                { 
                    cached_country : [
                        np.nan_to_num(np.mean(confirmed)),
                        np.nan_to_num(np.mean(deaths)),
                        np.nan_to_num(np.mean(recovered))
                    ]
                }
            )
                        
            confirmed = []
            deaths = []
            recovered = []
            
            cached_country = country
            
    # RETURN THE ENCODED AND DECODED COUNTRIES, AS WELL AS THE PREPROCESSED DATASET
    return encoded_countries, decoded_countries, pd.DataFrame(data = new_dataset)
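
As a point of comparison, the per-country means described above can also be computed with pandas' `groupby`. This is a minimal sketch, not the function that produced the outputs in this notebook. The `sample` frame is a hypothetical miniature built from four of the rows shown by `dataset.tail()` earlier, and `per_country_mean` is an illustrative name.

```python
import pandas as pd

# Hypothetical miniature of the real table, same column names as above.
sample = pd.DataFrame(
    {
        "Country/Region": ["Ukraine", "Netherlands", "Ukraine", "Netherlands"],
        "Confirmed": [678, 791, 1602, 11886],
        "Deaths": [20, 69, 34, 1305],
        "Recovered": [551, 0, 1251, 0],
    }
)

# Mean of each numeric column per country; groupby sorts the keys by default.
per_country_mean = sample.groupby("Country/Region")[
    ["Confirmed", "Deaths", "Recovered"]
].mean()

# Transpose so that countries become columns, as in the target format.
print(per_country_mean.T)
```

Grouping by the country column sidesteps the bookkeeping a manual loop needs: every row is counted, the last country is included, and the result is ordered alphabetically, so it lines up with ids assigned from a sorted list of country names.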

Alright, now that the preprocessing method is done, let's put it to use and see what our data looks like.

In [69]:
encoded_countries, decoded_countries, preprocessed_dataset = preprocess_dataset(dataset, 'Country/Region')


In [70]:
encoded_countries

{0: 'Fiji',
 1: 'occupied Palestinian territory',
 2: 'Slovakia',
 3: 'Saint Barthelemy',
 4: 'Ukraine',
 5: 'Nigeria',
 6: 'Iraq',
 7: 'Malta',
 8: 'Canada',
 9: 'French Guiana',
 10: 'Poland',
 11: 'Brunei',
 12: 'Liechtenstein',
 13: 'Kyrgyzstan',
 14: 'Latvia',
 15: 'Rwanda',
 16: 'Afghanistan',
 17: 'Chile',
 18: 'Laos',
 19: 'Congo (Kinshasa)',
 20: 'Thailand',
 21: 'Iran',
 22: 'Georgia',
 23: 'Ivory Coast',
 24: 'Senegal',
 25: 'Kuwait',
 26: 'Sudan',
 27: 'Chad',
 28: 'Guyana',
 29: 'Burkina Faso',
 30: 'Somalia',
 31: 'Benin',
 32: 'Iceland',
 33: 'Bosnia and Herzegovina',
 34: 'Reunion',
 35: 'Burma',
 36: 'Faroe Islands',
 37: 'Zimbabwe',
 38: 'Morocco',
 39: 'Puerto Rico',
 40: 'Cuba',
 41: 'Cape Verde',
 42: 'Hong Kong',
 43: 'Diamond Princess',
 44: 'Jordan',
 45: 'Colombia',
 46: 'Mozambique',
 47: 'Slovenia',
 48: 'Gabon',
 49: 'Grenada',
 50: 'Guinea',
 51: 'Honduras',
 52: 'Argentina',
 53: 'Maldives',
 54: 'Indonesia',
 55: 'Bahamas',
 56: 'Brazil',
 57: 'United Ara

In [71]:
decoded_countries

{'Fiji': 0,
 'occupied Palestinian territory': 1,
 'Slovakia': 2,
 'Saint Barthelemy': 3,
 'Ukraine': 4,
 'Nigeria': 5,
 'Iraq': 6,
 'Malta': 7,
 'Canada': 8,
 'French Guiana': 9,
 'Poland': 10,
 'Brunei': 11,
 'Liechtenstein': 12,
 'Kyrgyzstan': 13,
 'Latvia': 14,
 'Rwanda': 15,
 'Afghanistan': 16,
 'Chile': 17,
 'Laos': 18,
 'Congo (Kinshasa)': 19,
 'Thailand': 20,
 'Iran': 21,
 'Georgia': 22,
 'Ivory Coast': 23,
 'Senegal': 24,
 'Kuwait': 25,
 'Sudan': 26,
 'Chad': 27,
 'Guyana': 28,
 'Burkina Faso': 29,
 'Somalia': 30,
 'Benin': 31,
 'Iceland': 32,
 'Bosnia and Herzegovina': 33,
 'Reunion': 34,
 'Burma': 35,
 'Faroe Islands': 36,
 'Zimbabwe': 37,
 'Morocco': 38,
 'Puerto Rico': 39,
 'Cuba': 40,
 'Cape Verde': 41,
 'Hong Kong': 42,
 'Diamond Princess': 43,
 'Jordan': 44,
 'Colombia': 45,
 'Mozambique': 46,
 'Slovenia': 47,
 'Gabon': 48,
 'Grenada': 49,
 'Guinea': 50,
 'Honduras': 51,
 'Argentina': 52,
 'Maldives': 53,
 'Indonesia': 54,
 'Bahamas': 55,
 'Brazil': 56,
 'United Arab Em

In [72]:
preprocessed_dataset

Unnamed: 0,Afghanistan,Albania,Algeria,Andorra,Angola,Antigua and Barbuda,Argentina,Armenia,Aruba,Australia,...,Uzbekistan,Vatican City,Venezuela,Vietnam,West Bank and Gaza,Western Sahara,Yemen,Zambia,Zimbabwe,occupied Palestinian territory
0,23.959184,76.977444,101.335616,263.071429,176.803279,399.348837,469.100719,547.978723,1.5,914.296181,...,24720.19685,22001.0,16541.023438,21752.648045,16829.568966,17300.056604,23794.554455,23084.225806,13595.090164,
1,0.70068,2.0,2.10274,5.907143,3.54918,8.186047,9.733813,12.141844,0.0,25.758379,...,1255.503937,432.666667,528.976562,728.480447,640.862069,908.679245,1069.336634,1232.274194,374.721311,
2,0.904762,1.609023,1.883562,6.557143,5.54918,15.317829,25.165468,41.531915,0.333333,199.713172,...,20024.338583,11782.333333,8965.796875,13695.73743,7597.939655,4925.886792,11138.405941,19677.725806,6826.155738,


## Normalizing the data & Creating features and labels

Before we pass this data to the model, we need to normalize it. Otherwise, the large spread of the values makes it hard for the model to train.<br>
For this task, we normalize the data by dividing each value of a given case type by the general mean of that case type.<br>
Take, for example, the mean of confirmed cases in Afghanistan shown above. We normalize it by dividing that value (<b>23.959184</b>) by the general mean of confirmed cases (<b>10472.017883</b>).

Those cases will serve as our <b>labels</b> for this task, as they represent the result we want to see.
The IDs of the countries (the keys of the encoded_countries dictionary we created) will serve as <b>features</b>, since this is how we identify those cases.
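
The Afghanistan example works out as follows. This is only a worked illustration of the division, using the two numbers quoted above; the variable names are assumptions chosen for readability.

```python
# General mean of the Confirmed column, from dataset.describe() above.
general_confirmed_mean = 10472.017883

# Mean of confirmed cases for Afghanistan, from the preprocessed table.
afghanistan_confirmed_mean = 23.959184

# Normalize: divide the country's mean by the general mean.
normalized = afghanistan_confirmed_mean / general_confirmed_mean

# Denormalize: multiply back to recover the original scale.
restored = normalized * general_confirmed_mean

print(f"normalized={normalized:.6f}, restored={restored:.6f}")
```

Because the normalization is a plain division, multiplying a prediction by the same general mean brings it back to the original scale.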

In [73]:
# NORMALIZATION METHOD
def normalize(x, index):
    return x / dataset.describe()[index]['mean']

# FEATURE AND LABEL EXTRACTION METHOD
def create_features_labels(preprocessed_dataset):
    features = np.array(list(encoded_countries.keys()))
    
    confirmed = np.nan_to_num(np.array(preprocessed_dataset, "float")[0])
    confirmed = np.nan_to_num(normalize(confirmed, "Confirmed"))
    
    deaths = np.nan_to_num(np.array(preprocessed_dataset, "float")[1])
    deaths = np.nan_to_num(normalize(deaths, "Deaths"))
    
    recovered = np.nan_to_num(np.array(preprocessed_dataset, "float")[2])
    recovered = np.nan_to_num(normalize(recovered, "Recovered"))
 
    labels = []
    
    for c, d, r in zip(confirmed, deaths, recovered):
        labels.append([c, d, r])
        
    labels = np.array(labels)
        
    return features, labels

x, y = create_features_labels(preprocessed_dataset)

# Creating a linear regression model

## Building the model

In order to create such a model, we need to take 4 things into account (there can be more, but we can focus on 4 for now):
1. Structure:<br>
    The way we will structure this model is the following (the order matters):
    - 1 <b>input layer</b>
    - 1 <b>dropout layer</b>
    - 1 <b>batch normalization layer</b>
    - 1 <b>hidden layer</b>
    - 1 <b>dropout layer</b>
    - 1 <b>hidden layer</b>
    - 1 <b>output layer</b>
    
2. Hyperparameters:<br>
    Whenever you choose how many neurons a layer should have, what the input shape should be and so on, you are choosing and tweaking <b>hyperparameters</b>.
    These can be viewed as the knob on an oven. To preheat the oven, you turn it from 0 degrees to 200 degrees. When you see that your food is burning, you turn it down to 150 degrees. The same happens in a neural network: when things don't work with some hyperparameters, you try other ones until you find a combination that works well.
    You can see the hyperparameters we used in the code below.
    
3. Activation functions:<br>
    The hidden layers use the ReLU (Rectified Linear Unit) activation function, which passes positive values through unchanged and outputs zero for negative ones. For the output layer, we use a linear activation function, since a regression model has to produce unbounded numeric values rather than class probabilities.
    
4. Losses, Optimizers and Metrics:<br>
    For the loss, it's straightforward: since this is a regression task, we use mean squared error (MSE), which averages the squared differences between the predicted and the actual values.<br>
    For the optimizer, during training and experimenting, we found that the Adam optimizer with a learning rate of 0.001 works okay, so we stuck with that.
    Metric-wise, we track the same quantity as the loss: mean squared error. The lower the MSE, the closer the predictions are to the labels.
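
For reference, the two building blocks named in the list above, ReLU and mean squared error, can be written out in NumPy. This is a minimal sketch for illustration only; Keras provides its own implementations, and the helper names `relu` and `mse` as well as the input values are assumptions.

```python
import numpy as np

def relu(z):
    # ReLU keeps positive values and clamps negatives to zero.
    return np.maximum(z, 0.0)

def mse(y_true, y_pred):
    # Mean of the squared element-wise differences.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Hypothetical values, chosen only to illustrate the two functions.
print(relu(np.array([-2.0, 0.0, 3.0])))
print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Squaring the differences means that one large miss weighs far more than several small ones, which is worth keeping in mind when reading the MSE values later on.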
        

In [74]:
def build_model():
    model = Sequential()
    
    input_layer = Dense(
        128,
        input_dim = 1,
        activation = 'relu',
    )
    
    dropout1 = Dropout(0.1)
    batch_norm_layer = BatchNormalization(
        momentum = 0.99,
        trainable = True,
    )
        
    hl1 = Dense(
        128,
        activation = 'relu',
    )
    
    dropout2 = Dropout(0.2)
    
    hl2 = Dense(
        32,
        activation = 'relu'
    )
    
    output_layer = Dense(
        3,
        activation = 'linear'
    )
    
    model.add(input_layer)
    model.add(dropout1)
    model.add(batch_norm_layer)
    model.add(hl1)
    model.add(dropout2)
    model.add(hl2)
    model.add(output_layer)
    
    model.compile(
        loss = 'mse',
        optimizer = Adam(
            learning_rate=0.001
        ),
        metrics = ['mse']
    )
    
    return model

In [75]:
model = build_model()
model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_5 (Dense)              (None, 128)               256       
_________________________________________________________________
dropout_3 (Dropout)          (None, 128)               0         
_________________________________________________________________
batch_normalization_2 (Batch (None, 128)               512       
_________________________________________________________________
dense_6 (Dense)              (None, 128)               16512     
_________________________________________________________________
dropout_4 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_7 (Dense)              (None, 32)                4128      
_________________________________________________________________
dense_8 (Dense)              (None, 3)                

## Training the model

To train the model, we run it for 100 epochs (100 passes over the same data), feed the data in batches of 8 (8 samples at once) and split the data so that the model trains on 90% of it and evaluates its predictions on the remaining 10%. We also add a checkpoint callback so that we can store the model. We don't strictly need it, as training is quick, but it's good practice.
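
The split and batch sizes translate into concrete numbers. The sketch below only does the arithmetic; it assumes 220 samples in total (the training output further down reports 198 training and 22 validation samples) and mirrors our understanding that Keras holds out the last fraction of the data when `validation_split` is set.

```python
import math

n_samples = 220          # assumed total: 198 training + 22 validation samples
validation_split = 0.1
batch_size = 8

# Size of the training part; the rest is held out for validation.
n_train = int(n_samples * (1 - validation_split))
n_val = n_samples - n_train

# Number of weight updates (batches) per epoch on the training part.
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_val, steps_per_epoch)
```

With only a couple of hundred samples, each epoch consists of a few dozen small batches, which is why training finishes quickly.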

In [76]:
EPOCHS = 100

In [77]:
checkpoint_filepath = './checkpoint/regression_model.h5'
model_checkpoint_callback = ModelCheckpoint(
    filepath=checkpoint_filepath,
    save_weights_only=True,
    monitor='val_mse',
)

In [78]:
model.fit(
    x = x,
    y = y,
    batch_size = 8,
    validation_split = 0.1,
    epochs = EPOCHS,
    callbacks = [model_checkpoint_callback]
)

Train on 198 samples, validate on 22 samples
Epoch 1/100
...
Epoch 100/100


<keras.callbacks.callbacks.History at 0x7fa2f1bff0b8>

Nice, now that we've trained it, let's see what it has to say. Before that, though, we'll store the general means so that we can convert the normalized values back to their original scale, as it is the mean of each country's cases that we are interested in.

In [79]:
confirmed_general_mean = dataset.describe()["Confirmed"]['mean']
deaths_general_mean = dataset.describe()["Deaths"]['mean']
recovered_general_mean = dataset.describe()["Recovered"]['mean']

In [87]:
# COUNTRIES OF AFRICA
zimbabwe = decoded_countries['Zimbabwe']
benin = decoded_countries['Benin']
somalia = decoded_countries['Somalia']
sudan = decoded_countries['Sudan']
south_africa = decoded_countries['South Africa']

Here, feel free to replace the 'zimbabwe' variable with any other country variable. See the country codes above.

In [92]:
# Wrap the id in a NumPy array so predict() receives a batch of one sample
y_pred = model.predict(np.array([zimbabwe]))[0]
y_current = y[zimbabwe]
y_current, y_pred

(array([0.189, 0.153, 0.147]), array([0.43 , 0.383, 0.146], dtype=float32))

In [93]:
mean_squared_error(y_current, y_pred)

0.03701851330567224

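Based on the MSE of about 0.037, the normalized prediction is in the same range as the current values, although the predicted confirmed and deaths components are more than twice the current ones.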
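
Since a single MSE value averages over all three outputs, it can help to look at each component separately. The sketch below computes the relative difference per component from the rounded, normalized values printed above; `current_norm` and `predicted_norm` are illustrative names, not variables from the notebook.

```python
# Rounded, normalized values printed above for the selected country.
current_norm = [0.189, 0.153, 0.147]
predicted_norm = [0.43, 0.383, 0.146]
names = ["Confirmed", "Deaths", "Recovered"]

# Relative difference of each prediction with respect to the current value.
relative_diff = [(p - c) / c for c, p in zip(current_norm, predicted_norm)]

for name, diff in zip(names, relative_diff):
    print(f"{name}: {diff:+.1%}")
```

A small absolute error on normalized values can still correspond to a large relative difference, so both views are worth checking.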

In [94]:
y_current = np.array([y_current[0] * confirmed_general_mean, y_current[1] * deaths_general_mean, y_current[2] * recovered_general_mean])
y_pred = np.array([y_pred[0] * confirmed_general_mean, y_pred[1] * deaths_general_mean, y_pred[2] * recovered_general_mean])

# Prediction
Let's see what the model has predicted.

In [96]:
print(f"Current means:\nConfirmed = {y_current[0]}\nDeaths = {y_current[1]}\nRecovered = {y_current[2]}\n\nPredicted means:\nConfirmed = {y_pred[0]}\nDeaths = {y_pred[1]}\nRecovered = {y_pred[2]}")

Current means:
Confirmed = 1977.6829268292686
Deaths = 86.64227642276421
Recovered = 712.0650406504064

Predicted means:
Confirmed = 4503.325108017918
Deaths = 216.49896804095638
Recovered = 703.9827597955307
