# <center><b>Dataset: COVID-19 South Africa Cumulative Total Positive Cases Per Day - 2020</b></center>


## Forecast modelling

## Import libraries

In [None]:
import os

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
# Note: releases from 1.0 onwards are published as `prophet`; this notebook uses the older `fbprophet` name.
from fbprophet import Prophet
from fbprophet.plot import plot_plotly
import plotly.offline as py
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

# List the files available in the input directory.
print(os.listdir("../input"))

## Read in the dataset

In [None]:
df = pd.read_csv('../input/covid19zacumulativetotals/covid19-za-cumulativetotals.csv')

## Data analysis

In [None]:
df.head()

In [None]:
# Drop the row whose index label is 26, then inspect the last rows.
#df.drop(24,axis=0,inplace=True)
df.drop(26,axis=0,inplace=True)
#df.drop(23,axis=0,inplace=True)
df.tail()
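
Note that `df.drop(26, axis=0)` removes the row whose index *label* is 26, which is only the 27th row while the index is still the default 0, 1, 2, ... range. The sketch below illustrates the difference on a made-up frame; the names `toy`, `by_label` and `by_position` and all values are hypothetical and not part of the dataset.

```python
import pandas as pd

# Toy frame standing in for the real dataset (values are illustrative only).
toy = pd.DataFrame({"DATE": ["d0", "d1", "d2", "d3"], "TOTAL": [1, 3, 7, 13]})

# Remove one row so that index labels no longer match row positions.
toy = toy.drop(1, axis=0)  # remaining labels: 0, 2, 3

# drop(2) removes the row whose index label is 2, not the row at position 2.
by_label = toy.drop(2, axis=0)

# To drop by position instead, look up the label at that position first.
by_position = toy.drop(toy.index[2], axis=0)

print(by_label.index.tolist())
print(by_position.index.tolist())
```

If rows have already been removed or the frame has been re-sorted, dropping via `df.index[position]` is the safer way to target a row by position.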

In [None]:
plt.figure(figsize=(30,15), dpi=80, linewidth=1000)
x = df['DATE']
y = df['TOTAL']
plt.plot(x, y)
plt.xlabel('Date')
plt.ylabel('Total Cumulative Positive Cases')
plt.title('Total Cumulative Positive COVID-19 Cases in South Africa from 5 March 2020 to 2 April 2020')
plt.show()

## Prepare for Prophet

Prophet, a time-series forecasting library, requires an input dataframe with two columns: `ds` (the date) and `y` (the value to forecast). Here the DATE column supplies the dates and the TOTAL column supplies the cumulative positive cases, so DATE is renamed to `ds` and TOTAL to `y`.

In [None]:
df = df[['DATE','TOTAL']]
#df.columns = ['ds', 'y']
df = df.rename(columns={'DATE': 'ds', 'TOTAL': 'y'})
#df = pd.set_option('precision', 0)
df.head()
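
The CSV is read without date parsing, so `ds` may still hold strings at this point. Prophet converts them itself, but parsing them explicitly removes any day/month ambiguity. The following is a minimal sketch that assumes the dates are stored as DD-MM-YYYY; that format, the `sample` frame and its values are assumptions for illustration, so check `df.head()` before reusing the format string.

```python
import pandas as pd

# Hypothetical sample rows; the DD-MM-YYYY format and the values are assumptions.
sample = pd.DataFrame(
    {"DATE": ["05-03-2020", "06-03-2020", "13-03-2020"], "TOTAL": [1, 2, 4]}
)

# Rename to the column names Prophet expects.
prepared = sample.rename(columns={"DATE": "ds", "TOTAL": "y"})

# Parse with an explicit format so that 05-03-2020 is read as 5 March, not 3 May.
prepared["ds"] = pd.to_datetime(prepared["ds"], format="%d-%m-%Y")

print(prepared.dtypes)
print(prepared)
```

With an explicit `format`, a value that does not match raises an error instead of being silently reinterpreted.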

Check whether there are any null values in the dataset.

In [None]:
df.isnull().sum()

The above output shows that there are no null values in the dataset, since the counts are 0.
Let us now take a look at the dataset by plotting the data.

In [None]:
df.set_index('ds').y.plot()

## Running Prophet

Set up Prophet to begin modelling the data.

In [None]:
model = Prophet()
model.fit(df)

Create the dates to forecast: the observed dates plus the next 28 days.

In [None]:
future = model.make_future_dataframe(periods=28)
future.tail()
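
By default `make_future_dataframe` keeps the historical dates and appends `periods` new daily dates after the last one. The sketch below imitates that behaviour with pandas alone, to show what the resulting `ds` column looks like. It is not Prophet's implementation, and `history`, `new_dates` and `future_like` are hypothetical names.

```python
import pandas as pd

# Hypothetical five-day history, starting on the first date covered by the notebook.
history = pd.DataFrame({"ds": pd.date_range("2020-03-05", periods=5, freq="D")})

periods = 28  # same horizon as in the notebook
last_observed = history["ds"].max()

# New daily dates starting the day after the last observation.
new_dates = pd.date_range(
    start=last_observed + pd.Timedelta(days=1), periods=periods, freq="D"
)

# History followed by the new dates, as a single 'ds' column.
future_like = pd.concat(
    [history[["ds"]], pd.DataFrame({"ds": new_dates})], ignore_index=True
)

print(len(history), len(future_like))
print(future_like["ds"].iloc[-1])
```

Because the history is included, `model.predict(future)` returns fitted values for the observed dates as well as the 28 forecasts.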

Run the future dataframe through the fitted Prophet model to generate predictions.

In [None]:
forecast = model.predict(future)

In [None]:
forecast.tail()

In [None]:
forecast

In [None]:
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()

## Plot Prophet results

The plot shows the original data (black dots), the model's prediction (blue line), and the uncertainty interval of the forecast (shaded blue area).

In [None]:
model.plot(forecast);
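
The shaded area spans `yhat_lower` to `yhat_upper`, which is an 80% interval under Prophet's default `interval_width`. One rough check is the share of observed values that fall inside the band. The sketch below uses made-up numbers; `toy_forecast` is a hypothetical frame that assumes the observed `y` has already been joined onto the forecast columns.

```python
import pandas as pd

# Made-up values for illustration only; not output from the model.
toy_forecast = pd.DataFrame(
    {
        "y": [10, 20, 35, 50],
        "yhat_lower": [8, 15, 36, 40],
        "yhat_upper": [12, 25, 45, 60],
    }
)

# True where the observed value lies within the interval (bounds inclusive).
inside = toy_forecast["y"].between(
    toy_forecast["yhat_lower"], toy_forecast["yhat_upper"]
)

# Fraction of observations covered by the interval.
coverage = inside.mean()
print(coverage)
```

A coverage far below the nominal interval width suggests the band is too narrow for the data.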

In [None]:
model.plot_components(forecast);

Interactive graph.

In [None]:
py.init_notebook_mode()
fig = plot_plotly(model, forecast)  # This returns a plotly Figure
py.iplot(fig);

## Evaluating the performance of the Prophet model

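We evaluate the fit with R-squared, Mean Squared Error (MSE), and Mean Absolute Error (MAE). Because `yhat` is compared with the same observations the model was fitted on, these are in-sample scores.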

In [None]:
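# Align dtypes before joining: forecast['ds'] is datetime, while df['ds'] may still hold strings.
metrics_df = forecast.set_index('ds')[['yhat']].join(df.assign(ds=pd.to_datetime(df['ds'])).set_index('ds').y).reset_index()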

In [None]:
metrics_df.tail()

In [None]:
# Keep only dates with an observed value; the forecast-only rows have no y.
metrics_df.dropna(inplace=True)

In [None]:
metrics_df.tail()

R-Squared value for our model.

In [None]:
r2_score(metrics_df.y, metrics_df.yhat)

Mean Squared Error for our model (values closer to 0 indicate a better fit).

In [None]:
mean_squared_error(metrics_df.y, metrics_df.yhat)

We now look at the Mean Absolute Error.

In [None]:
mean_absolute_error(metrics_df.y, metrics_df.yhat)
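
To make the three scores concrete, the sketch below computes them by hand on a toy example, using the standard definitions that the scikit-learn functions follow. It also reports the root of the MSE (RMSE), which is in the same units as the case counts. The function `regression_metrics` and the toy values are hypothetical and only for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R-squared for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    ss_res = sum(e * e for e in errors)              # residual sum of squares
    mse = ss_res / n                                 # mean squared error
    mae = sum(abs(e) for e in errors) / n            # mean absolute error

    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot                         # R-squared

    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}


# Toy values: three perfect predictions and one that is off by 2.
toy_y = [1.0, 2.0, 3.0, 4.0]
toy_yhat = [1.0, 2.0, 3.0, 6.0]

metrics = regression_metrics(toy_y, toy_yhat)
print(metrics)
```

The single large error dominates the MSE because it is squared, while the MAE weights all errors linearly. Reading the two together gives a sense of whether a few dates drive most of the error.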