# Covid-19 time series prediction using Prophet
Data from the Kaggle COVID-19 Global Forecasting (Week 5) competition.  

This notebook fits the Facebook Prophet time series model to the confirmed-case data and compares its predictions against the actual values provided.
https://facebook.github.io/prophet/docs/quick_start.html

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
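try:
    from prophet import Prophet  # package name from Prophet 1.0 onwards
except ImportError:
    from fbprophet import Prophet  # package name in older releases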

# Input data files are available in the read-only "../input/" directory
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
train=pd.read_csv("/kaggle/input/covid19-global-forecasting-week-5/train.csv")


In [None]:
train['Date'] = pd.to_datetime(train['Date'], errors='coerce')


In [None]:
train.head()

Check what the data looks like on a typical day.

In [None]:
train.loc[(train['Date'] == '2020-04-15') & (train['Country_Region'] == 'United Kingdom')]

Prepare the training set for prediction. Select the UK confirmed-cases rows, then group by Date, Country_Region and Target so that states, protectorates and provinces are summed into the parent country.

In [None]:
# Keep only the UK rows for the ConfirmedCases target
df_train = train[(train['Country_Region'] == 'United Kingdom') & (train['Target'] == 'ConfirmedCases')]

# Sum provinces and territories into one row per date (numeric columns only)
df_train = df_train.groupby(['Date', 'Country_Region', 'Target']).sum(numeric_only=True).reset_index()

df_train.describe()
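
To make the grouping step concrete, here is a minimal sketch on a toy frame. The province names and case counts are made up for illustration and are not taken from the competition data; only the column names follow the training file.

```python
import pandas as pd

# Toy stand-in for the filtered UK rows: one parent row plus two territories per date.
# All values here are invented for illustration.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2020-04-15"] * 3 + ["2020-04-16"] * 3),
    "Country_Region": ["United Kingdom"] * 6,
    "Province_State": [None, "Bermuda", "Gibraltar"] * 2,
    "Target": ["ConfirmedCases"] * 6,
    "TargetValue": [4000, 2, 1, 4500, 0, 3],
})

# Grouping on Date/Country_Region/Target sums the territories into the parent country,
# leaving one row per date.
collapsed = toy.groupby(
    ["Date", "Country_Region", "Target"], as_index=False
)["TargetValue"].sum()
print(collapsed)
```

After grouping there should be exactly one row per date, which is what the spot checks that follow are looking for.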


Spot-check a date or two to make sure there are no duplicates...

In [None]:
df_train.loc[df_train['Date'] == '2020-04-15']

In [None]:
df_train.loc[df_train['Date'] == '2020-04-16']
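
Spot checks only cover the dates you happen to pick. As a sketch, the same check can be run over every date with pandas' `duplicated`. The helper name `find_duplicate_dates` and the toy frame are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd

def find_duplicate_dates(df, date_col="Date"):
    """Return every row whose date appears more than once (empty frame if none)."""
    return df[df[date_col].duplicated(keep=False)]

# Toy frame standing in for df_train; values are invented and one date is repeated on purpose.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2020-04-15", "2020-04-16", "2020-04-16"]),
    "TargetValue": [10, 20, 5],
})

dupes = find_duplicate_dates(toy)
print(dupes)
```

On the real data, `find_duplicate_dates(df_train)` returning an empty frame would confirm that the grouping left one row per date.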

Only the date and target value columns are required.

In [None]:
df_train = df_train[['Date', 'TargetValue']]

In [None]:
def plot_values(df, from_date=None):
    # Bar chart of daily target values, optionally only from from_date onwards
    if from_date is not None:
        df = df.loc[df['Date'] >= from_date]
    fig = px.bar(df, x='Date', y='TargetValue', color='TargetValue',
                 width=800, height=400,
                 color_continuous_scale=px.colors.sequential.BuGn)
    fig.show()

plot_values(df_train, '2020-03-01')

In [None]:
df_train.describe()

In [None]:
df_train.columns = ["ds", "y"]

In [None]:
df_train.tail()

Nothing has really changed apart from the column names, but check one more time :0)

In [None]:
def plot_training_values(df, from_date=None):
    # Same plot as before, using the Prophet column names ds and y
    if from_date is not None:
        df = df.loc[df['ds'] >= from_date]
    fig = px.bar(df, x='ds', y='y', color='y', width=800, height=400,
                 color_continuous_scale=px.colors.sequential.BuGn)
    fig.show()

plot_training_values(df_train, '2020-03-01')

Check that a random date has just one value.

In [None]:
df_train.loc[df_train['ds'] == '2020-04-15']

## Facebook Prophet Forecast

In [None]:
m = Prophet(daily_seasonality=True)
m.fit(df_train)

*make_future_dataframe* is a nice helper function, but I decided to leave it out and set the prediction date range manually.

In [None]:
#future = m.make_future_dataframe(periods=10)
date1 = '2020-03-01'
date2 = '2020-07-30'
my_dates = pd.date_range(date1, date2).tolist()
future = pd.DataFrame({'ds' : my_dates})
future.head()
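
For comparison, here is a pandas-only sketch of roughly what a helper like `make_future_dataframe(periods=...)` produces: the history dates plus a number of steps past the last one. The function name `extend_dates` and its parameters are hypothetical and only approximate Prophet's behaviour; the dates are toy values.

```python
import pandas as pd

def extend_dates(history_dates, periods, freq="D", include_history=True):
    """Hypothetical helper: history dates plus `periods` steps past the last one."""
    history_dates = pd.Series(pd.to_datetime(history_dates))
    last = history_dates.max()
    # date_range includes `last` itself, so request one extra date and drop the first
    new_dates = pd.Series(pd.date_range(start=last, periods=periods + 1, freq=freq)[1:])
    if include_history:
        all_dates = pd.concat([history_dates, new_dates], ignore_index=True)
    else:
        all_dates = new_dates
    return pd.DataFrame({"ds": all_dates})

# Toy history of ten days, extended by five days
history = pd.date_range("2020-06-01", "2020-06-10")
future_sketch = extend_dates(history, periods=5)
print(future_sketch.tail())
```

Setting the range by hand, as this notebook does, has the advantage that the forecast window does not depend on where the training data happens to end.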

In [None]:
future.tail()

In [None]:
forecast = m.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()

In [None]:
fig = m.plot(forecast)

In [None]:
fig = m.plot_components(forecast)

In [None]:
max_train_date = df_train['ds'].max()
max_train_date
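
The last training date marks where the forecast stops being a fit to known data and becomes a projection. A minimal sketch of splitting a forecast frame at that date is shown below. The helper `split_forecast` and the toy values are assumptions for illustration; only the `ds` and `yhat` column names come from Prophet's output.

```python
import pandas as pd

def split_forecast(forecast_df, last_observed):
    """Split on ds: rows up to and including last_observed, and rows after it."""
    last_observed = pd.Timestamp(last_observed)
    in_sample = forecast_df[forecast_df["ds"] <= last_observed]
    out_of_sample = forecast_df[forecast_df["ds"] > last_observed]
    return in_sample, out_of_sample

# Toy forecast frame; the yhat values are invented
toy_forecast = pd.DataFrame({
    "ds": pd.date_range("2020-06-08", periods=5),
    "yhat": [100.0, 95.0, 90.0, 85.0, 80.0],
})

fitted, projected = split_forecast(toy_forecast, "2020-06-10")
print(len(fitted), "fitted rows,", len(projected), "projected rows")
```

With the real objects this would be called as `split_forecast(forecast, max_train_date)`.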

In [None]:
def plot_predicted_values(df, country_region, from_date=None, target='Predicted'):
    # Bar chart of forecast values (yhat), optionally only from from_date onwards
    if from_date is not None:
        df = df.loc[df['ds'] >= from_date]
    fig = px.bar(df, x='ds', y='yhat', color='yhat', width=800, height=400,
                 color_continuous_scale=px.colors.sequential.BuGn)
    fig.update_layout(title_text=target + ' COVID-19 cases per day in ' + country_region,
                      yaxis_title='cases (' + target + ')', xaxis_title='date')
    fig.show()

plot_predicted_values(forecast, 'United Kingdom', '2020-06-01', target='Predicted')


Plot actual alongside predicted. Note that barmode='group' places the actual and predicted bars side by side for each date.

In [None]:
fig = go.Figure(data=[
    go.Bar(name='actual', x=df_train['ds'], y=df_train['y']),
    go.Bar(name='predicted', x=forecast['ds'], y=forecast['yhat'])
])
# Group the bars so actual and predicted sit side by side for each date
fig.update_layout(barmode='group',
    title='UK predicted versus actual COVID-19 cases',
    yaxis=dict(
        title=dict(text='cases', font=dict(size=16)),
        tickfont_size=14,
    ))
fig.show()
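
The chart gives a visual comparison. As a rough numerical companion, the sketch below joins actual and predicted values on `ds` and computes the mean absolute error and root mean squared error over the dates both frames share. The helper `forecast_errors` and the toy numbers are assumptions for illustration, not results from this notebook.

```python
import numpy as np
import pandas as pd

def forecast_errors(actual, predicted):
    """Compare y and yhat on the dates present in both frames."""
    merged = actual.merge(predicted[["ds", "yhat"]], on="ds", how="inner")
    err = merged["yhat"] - merged["y"]
    return {
        "n": len(merged),
        "mae": float(err.abs().mean()),
        "rmse": float(np.sqrt((err ** 2).mean())),
    }

# Toy data: three observed days, five predicted days (values invented)
toy_actual = pd.DataFrame({
    "ds": pd.date_range("2020-06-01", periods=3),
    "y": [10, 20, 30],
})
toy_pred = pd.DataFrame({
    "ds": pd.date_range("2020-06-01", periods=5),
    "yhat": [12.0, 18.0, 33.0, 40.0, 50.0],
})

scores = forecast_errors(toy_actual, toy_pred)
print(scores)
```

On the real objects this would be `forecast_errors(df_train, forecast)`. Because the model was fitted on `df_train`, those scores describe in-sample fit rather than out-of-sample accuracy.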