## Forecasting using FB Prophet

Finally in this notebook I will use FB Prophet model to forecast the weekly pollutant levels.

As Prophet predicts for given periods and in our case periods/ time step is 1, for each week prediction the model has to be trained repeatedly and hence a loop will be created to make predictions for 1 time step each time.

In [35]:
#loading libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler

from fbprophet import Prophet

In [36]:
#loading data and converting date to datetime format
df = pd.read_csv('air_pollution.csv')
df['date'] = pd.to_datetime(df['date'])

In [37]:
#this function will use Prophet model to return predictions
def prophet_prediction(df):
    
        
    #initializing the model
    m = Prophet(daily_seasonality=False,weekly_seasonality=True)
    
    #fitting the model
    m.fit(df)
    #creating predictions with weekly frequency
    future = m.make_future_dataframe(periods=1,freq='W')
    forecast = m.predict(future)
    
    #saving only the prediction for the next week
    forecast = forecast.tail(1)
    prediction = float(np.array(forecast[['yhat']]))
    
    return prediction

In [38]:
#function to pass the required weeks for training to get predictions for next 1 week
def get_predictions(df):
    
    #making a column for weeks starting from  0 to total number of weeks in the data
    df['week'] = np.arange(0,len(df))
    
    #getting the last/max week
    max_week = df['week'].max()
    
    #creating a dataframe of train dataset
    last_df = df[df['ds'].dt.year<2017]
    #getting the last/max week of this dataset
    last_week = last_df['week'].max()
    
    
    predictions = []
    actuals = []
    
    
    #looping from 1 to the number of weeks to add on the train dataset to fit
    for i in range(1,(max_week-last_week)+1):
        
        #actual is the y_true where the actual value is the value from the last week in the dataset
        actualdf = df[df['week']==(last_week+i)]
        #actual appended to the actuals list
        actuals.append(float(np.array(actualdf[['y']])))
        
        #the 'train' df if created as modeldf and passed to prophet function for predictions
        modeldf = df[df['week']<(last_week+i)]
        modeldf = modeldf[['ds','y']]
        #appending predictions
        predictions.append(prophet_prediction(modeldf))
        
        
        
    return actuals, predictions
        

In [39]:
#evaluation metrics
def smape(y_true, pred):
    return 100/len(y_true) * np.sum(2 * np.abs(pred - y_true) / (np.abs(y_true) + np.abs(pred)))

def rmse(y_true,pred):
    return np.sqrt(np.mean((pred-y_true)**2))

In [40]:
#storing all smapes and rmses
smapes = []
rmses = []

#looping through each column
for column in df.columns[1:]:
    
    #getting the date and value column to pass in the function get_prediction
    modeldf = df[['date',column]]
    modeldf.columns = ['ds','y']
    actuals, predictions = get_predictions(modeldf)

#appending smapes and rmses
smapes.append(smape(np.array(actuals),np.array(predictions)))
rmses.append(rmse(np.array(actuals),np.array(predictions)))
    

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['week'] = np.arange(0,len(df))


In [41]:
print('SMAPE: ',np.mean(smapes))
print('RMSE: ',np.mean(rmses))

SMAPE:  27.727238282335737
RMSE:  3.438118885183061
