<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Working With Time Series Data in FBProphet

_By Steven Longstreet (Washington DC) and Bryce Peake (Washington DC)_

### Learning Objectives
 
**After this lesson, you will be able to:**
- Create and visualize a Time Series model using FBProphet
- Evaluate a Time Series model

<h2><a id="A">What is a Time Series?</a></h2>
A **time series** is a series of data points that's indexed (or listed, or graphed) in time order. Most commonly, a time series is a sequence that's taken at successive equally spaced points in time. Time series are often represented as a set of observations that have a time-bound relation, which is represented as an index.

Time series are commonly found in sales analysis, stock market trends, economic phenomena, and social science problems.

These data sets are often investigated to evaluate the long-term trends, forecast the future, or perform some other form of analysis.
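
As a minimal sketch of what this looks like in pandas, a time series is a `Series` or `DataFrame` whose index is a `DatetimeIndex`. The values and the name `toy_sales` below are made up for illustration and are not from this lesson's data set.

```python
import pandas as pd

# Six made-up monthly observations, stamped on the first day of each month
dates = pd.date_range(start='2015-01-01', periods=6, freq='MS')
toy_series = pd.Series([100, 105, 98, 110, 120, 115], index=dates, name='toy_sales')

# Because the index is time-ordered, we can slice by date labels
first_quarter = toy_series.loc['2015-01':'2015-03']
print(first_quarter)
```

Slicing by a partial date string such as `'2015-01'` only works because the index holds real timestamps rather than plain strings.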

## Time Series modeling with FBProphet
[Prophet Documentation](https://facebook.github.io/prophet/docs/quick_start.html)

In [None]:
import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

from prophet import Prophet
import matplotlib.pyplot as plt
 
%matplotlib inline
 
plt.rcParams['figure.figsize']=(20,10)
plt.style.use('ggplot')

# Read in the data

Read the data in from the retail sales CSV file in the `data` folder. We also ask pandas to parse the 'date' column as dates while reading the file.

In [None]:
# parse_dates takes the list of columns to convert to datetimes
sales = pd.read_csv('./data/retail_sales.csv', parse_dates=['date'], low_memory=False)

In [None]:
sales.head()

# Prepare for Prophet

For Prophet to work, we need to change the names of these columns to 'ds' and 'y', so let's create a new dataframe and keep our old one handy (you'll see why later). The new dataframe will initially be created with an integer index so we can rename the columns.

In [None]:
# Since we are making a copy of the dataframe, we don't need inplace = True
sales_df = sales.rename(columns={'date':'ds', 'sales':'y'})
sales_df.head()

Now's a good time to take a look at your data.  Plot the data using pandas' ```plot``` function

In [None]:
# Prophet doesn't require the date ('ds') to be in the index, but df.plot() uses the index for the x-axis, so we add .set_index('ds')
sales_df.set_index('ds').y.plot();

# Running Prophet

Now, let's set prophet up to begin modeling our data.

Note: Since we are using monthly data, you'll see a message from Prophet saying ```Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.``` This is OK since we are working with monthly data, but you can suppress the message by passing ```weekly_seasonality=False``` when instantiating Prophet.

In [None]:
# Instantiate Model
model = Prophet()

# Fit Model
model.fit(sales_df)

Forecasting is fairly useless unless you can look into the future, so we need to add some future dates to our dataframe. For this example, I want to forecast 2 years into the future, so I'll build a future dataframe with 24 periods since we are working with monthly data. Note the ```freq='m'``` argument, which ensures we are adding 24 months of data.

This can be done with the following code:


In [None]:
# Create future data frame
future = model.make_future_dataframe(periods=24, freq = 'm')
future.tail()
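
To see roughly what `make_future_dataframe` is doing, the sketch below builds 24 future monthly dates by hand with `pd.date_range`. The `history` dates are made up, and the month-start frequency `'MS'` is an assumption chosen to match data stamped on the first of each month; the month-end alias used in the cell above produces end-of-month dates instead.

```python
import pandas as pd

# Made-up history: 12 monthly observations stamped on the first of each month
history = pd.date_range(start='2015-01-01', periods=12, freq='MS')
last_date = history.max()

# Generate one extra period, then keep only dates strictly after the last observed one
periods = 24
candidates = pd.date_range(start=last_date, periods=periods + 1, freq='MS')
future_only = candidates[candidates > last_date][:periods]

# Stack history and future dates into a single 'ds' column, as Prophet expects
future_sketch = pd.DataFrame({'ds': history.append(future_only)})
print(future_sketch.tail())
```

It is worth checking `future.tail()` against the last rows of your own data to confirm the new dates fall on the same day of the month as the history.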

To forecast this future data, we need to run it through Prophet's model.

In [None]:
# Add predictions to the forecast dataframe
forecast = model.predict(future)

The resulting forecast dataframe contains quite a bit of data, but we really only care about a few columns.  First, let's look at the full dataframe:

In [None]:
forecast.tail().T

We really only want to look at yhat, yhat_lower and yhat_upper, so we can do that with:

In [None]:
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()

# Plotting Prophet results

Prophet has a plotting mechanism called ```plot```.  This plot functionality draws the original data (black dots), the model (blue line) and the uncertainty interval of the forecast (shaded blue area).

In [None]:
# Plot the forecast
model.plot(forecast);

## Evaluating Prophet
The shaded blue area is the uncertainty interval of the forecast, but we can only eyeball it. Let's look at the R-squared (the proportion of variance explained) and the Mean Squared Error.

In [None]:
#To do this, we have to get the y-hat and original y's from the data
metric_df = pd.concat([forecast[['ds','yhat']],sales_df['y']], axis=1)
metric_df.head()

Remember that we added 24 months of forecast, so we don't have actual `y` data for those months.

In [None]:
# check the tail, because we added 24 months of forecast.
metric_df.tail()

In [None]:
# The tail has NaN values, because they're predictions - there was no real Y. Let's drop those for model evaluation.
metric_df.dropna(inplace = True)

In [None]:
metric_df.tail()

Generate some metrics on our model.

In [None]:
#Let's take a look at the numbers - from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
print("R-squared: ", r2_score(metric_df['y'], metric_df['yhat']))
print("Mean Squared Error: ", mean_squared_error(metric_df['y'], metric_df['yhat']))
print("RMSE: ", np.sqrt(mean_squared_error(metric_df['y'], metric_df['yhat'])))

An R2 value of .99 is phenomenal... and too good to be true. Keep in mind that we scored the model on the same data it was fit on, so these numbers say little about how well it will predict the future. Our massive MSE adds to the suspicion that the model is overfit and won't be very predictive. Part of the problem in this example is that the data is monthly, and there aren't enough data points to build a robust model.
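
For intuition about what these numbers measure, here is a sketch that computes the same three metrics by hand on a handful of made-up values (not the retail data). Note that MSE is in squared units of `y`, so its size depends on the scale of the data, while RMSE is in the same units as `y` and is usually easier to interpret.

```python
import math

# Made-up actuals and predictions, purely for illustration
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 390.0]

n = len(y_true)
mean_y = sum(y_true) / n

# Residual sum of squares and total sum of squares
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
ss_tot = sum((t - mean_y) ** 2 for t in y_true)

mse = ss_res / n            # average squared error
rmse = math.sqrt(mse)       # back in the units of y
r2 = 1 - ss_res / ss_tot    # share of variance explained
print("MSE:", mse, "RMSE:", rmse, "R-squared:", r2)
```

Here every prediction is off by 10, so the RMSE is 10 even though the MSE is 100.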

## Accounting for Seasonality and Trends

We can see from this data that there is a spike in the same month each year. While this spike could be due to many different reasons, let's assume it's because there's a major promotion that this company runs every year at that time, which is in December for this dataset.

When patterns repeat over *known, fixed periods* of time within a data set, we call this **seasonality**. A seasonal pattern exists when a series is influenced by factors related to the cyclic nature of time — i.e., time of month, quarter, year, etc. Seasonality is of a fixed and known period, otherwise it is not truly seasonality. Additionally, it must be either attributed to another factor or counted as a set of anomalous events in the data.
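
One quick way to check for a fixed-period pattern is to average the series by calendar month. The sketch below does this on made-up data with a built-in December spike; the series `toy` and its values are assumptions for illustration only.

```python
import pandas as pd

# Three years of made-up monthly values: flat at 100 with a spike to 150 every December
dates = pd.date_range(start='2013-01-01', periods=36, freq='MS')
values = [150 if d.month == 12 else 100 for d in dates]
toy = pd.Series(values, index=dates)

# Average by calendar month; a fixed-period pattern shows up as the same month standing out
monthly_mean = toy.groupby(toy.index.month).mean()
peak_month = monthly_mean.idxmax()
print(monthly_mean)
print("Peak month:", peak_month)
```

On real data the pattern will be noisier, but a month that stands out year after year is a reasonable hint of yearly seasonality.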

### Prophet calls them "holidays"

Because we know this promotion occurs every December, we want to use this knowledge to help Prophet better forecast those months, so we'll use Prophet's ```holidays``` construct (explained here: https://facebookincubator.github.io/prophet/docs/holiday_effects.html).

The holiday object is a pandas dataframe with the holiday and date of the holiday. For this example, the construct would look like this:

```
promotions = pd.DataFrame({
  'holiday': 'december_promotion',
  'ds': pd.to_datetime(['2009-12-01', '2010-12-01', '2011-12-01', '2012-12-01',
                        '2013-12-01', '2014-12-01', '2015-12-01']),
  'lower_window': 0,
  'upper_window': 0,
})
```

This ```promotions``` dataframe consists of promotion dates for December in 2009 through 2015. The ```lower_window``` and ```upper_window``` values are set to zero to indicate that we don't want Prophet to extend the effect to any days before or after the dates listed.

In [None]:
# Build the promotions dataframe from above here - be sure you understand the syntax and logic!
promotions = pd.DataFrame({
  'holiday': 'december_promotion',
  'ds': pd.to_datetime(['2009-12-01', '2010-12-01', '2011-12-01', '2012-12-01',
                        '2013-12-01', '2014-12-01', '2015-12-01']),
  'lower_window': 0,
  'upper_window': 0,
})

promotions
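
The same dates could also be generated rather than typed out. This is a sketch of one way to do it; the name `promotions_sketch` is hypothetical and is used only to avoid overwriting the dataframe above.

```python
import pandas as pd

# One promotion date per year, on December 1, for 2009 through 2015
promo_dates = pd.to_datetime([f'{year}-12-01' for year in range(2009, 2016)])

promotions_sketch = pd.DataFrame({
    'holiday': 'december_promotion',
    'ds': promo_dates,
    'lower_window': 0,
    'upper_window': 0,
})
print(promotions_sketch)
```

Prophet's documentation notes that holidays should be listed for future dates as well as past ones if they are to affect the forecast, so widening the `range` to cover the forecast horizon is one option to consider.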

In [None]:
#Now let's set up prophet to model our data using holidays - Instantiate and fit the model
model = Prophet(holidays=promotions,
                weekly_seasonality=False)

model.add_seasonality(name='monthly', period=30.5, fourier_order=5)
model.fit(sales_df)

In [None]:
#We've instantiated the model, so now we need to build our future dates to forecast into!
future = model.make_future_dataframe(periods=24, freq = 'm')
future.tail()

#... and then run our future data through prophet's model
forecast = model.predict(future)

forecast.head().T

In [None]:
#while our new df contains a bit of data, we only care about a few features...
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()

## Visualizing with holidays!
Same as above at first blush!

In [None]:
# use Prophet's .plot() method to visualize your timeseries.
model.plot(forecast);

Prophet also allows you to examine the ```components``` of a timeseries using the ```.plot_components()``` method

In [None]:
# View the components
model.plot_components(forecast);

## Why holidays matter
Let's re-run our prophet model without holidays, for comparison

In [None]:
model_no_holiday = Prophet()
model_no_holiday.fit(sales_df)

In [None]:
future_no_holiday = model_no_holiday.make_future_dataframe(periods=24, freq = 'm')
future_no_holiday.tail()

In [None]:
forecast_no_holiday = model_no_holiday.predict(future_no_holiday)

There probably won't be a massive difference, given the small amount of data with which we're working on this example. But with greater data comes greater variance...

In [None]:
#join the dataframes
forecast.set_index('ds', inplace=True)
forecast_no_holiday.set_index('ds', inplace=True)
compared_df = forecast.join(forecast_no_holiday, rsuffix="_no_holiday")

In [None]:
#we're only interested in the predictions from each model
compared_df = compared_df[['yhat', 'yhat_no_holiday']]
compared_df.head()

In [None]:
# Create a feature that is the percentage difference between holiday vs. none
compared_df['diff_per'] = 100 * (compared_df['yhat'] - compared_df['yhat_no_holiday']) / compared_df['yhat_no_holiday']
print("difference: ", round(compared_df.diff_per.mean(), 2), "%")

This is an 8.37% difference, which can be a huge amount of money left on the table if your business is a global enterprise!

# Prophet for Market prediction - lab time!
Prophet can detect changepoints in timeseries data, and we can often use it to our advantage. Let's grab FRED economic data and see how this goes.

In [None]:
#Download 01/2014 - 1/2023 current S&P500 data at https://fred.stlouisfed.org/series/SP500 and import it into pandas
market_df = pd.read_csv('./data/SP500.csv')
market_df.tail()

In [None]:
# There is a data issue that we need to work with
market_df['SP500'].iloc[2342]

In [None]:
# How many rows have the issue
market_df[market_df['SP500'] == '.']

In [None]:
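# Let's fix the issue by making a list of the index labels where the value is '.'
data_issues = market_df[market_df['SP500'] == '.'].index.to_list()

# Replace '.' with the SP500 value from the day before (index - 1).
# Assigning through .loc avoids chained assignment; this assumes the default
# integer index and that the first row is not '.'
for ind in data_issues:
    market_df.loc[ind, 'SP500'] = market_df.loc[ind - 1, 'SP500']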

# Check the correction was made
market_df[market_df['SP500'] == '.']
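
As an alternative to the loop, pandas can coerce the placeholder to `NaN` and forward-fill in one step. The sketch below runs on a made-up toy frame; the `DATE` column name and all values are assumptions, not the real FRED file.

```python
import pandas as pd

# Toy frame mimicking the issue: prices read in as strings, with '.' marking missing days
toy = pd.DataFrame({
    'DATE': ['2020-01-06', '2020-01-07', '2020-01-08', '2020-01-09'],
    'SP500': ['100.0', '.', '102.5', '.'],
})

# Coerce non-numeric entries such as '.' to NaN, then carry the previous value forward
toy['SP500'] = pd.to_numeric(toy['SP500'], errors='coerce').ffill()
print(toy)
print(toy.dtypes)
```

This also leaves the column as a float type, whereas replacing values one at a time keeps it as strings until it is converted.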

In [None]:
# Check that the data types are correct and fix any issues

In [None]:
# Prepare your data for prophet. Hint: prophet needs "ds" and "y"


In [None]:
# Look at a plot of  the data


>Notice the dip due to COVID from 2020-02-17 through 2020-06-17. Use the following code to create a non-recurring holiday for COVID. Since the dip only occurs during 2020, we don't want the model to treat it as a recurring pattern. This code tells the model that the dip only occurs on the dates that are specified.

In [None]:
# One entry per calendar day in the COVID dip window
covid_dates = pd.date_range(start='2020-02-17', end='2020-06-17').strftime('%Y-%m-%d').tolist()

# Build the covid dataframe
covid = pd.DataFrame({
  'holiday': 'covid',
  'ds': pd.to_datetime(covid_dates),
  'lower_window': 0,
  'upper_window': 0,
})

covid

In [None]:
#Instantiate the model, and fit our data
model = Prophet(holidays = covid)

# Fit the model


In [None]:
#build the future dataframe, forecasting for 1 year from now. 
future = model.make_future_dataframe(periods = 365, freq = 'D')

# Create a forecast by passing the future into model.predict()


# View the forecast


In [None]:
#now plot it!


> Notice the dip in the actual data around March 2020 (black dots). Notice that the predicted values (blue line) do not repeat the pattern going forward in March 2021, March 2022, etc. Remember that we told the model that the dip in March 2020 was a non-recurring holiday and would not occur again.

In [None]:
# Plot the components


As we saw above, if you're trying to do short-term trading then this model is useless. But if you are investing with a timeframe of months to years, this forecast might provide some value.

Our forecast does great at trending, but doesn't do well at catching the volatility of the market. This would be very good for 'riding trends', but not so good for catching peaks and dips. 

We can see this in the numbers as well

In [None]:
# Create a metrics_df by concatenating the y-hat and original y's from the data


# The tail has NaN values, because they're predictions - there was no real Y. Let's drop those for model evaluation.


# Check the NaN were dropped


In [None]:
#calculate the r2


In [None]:
#MAE


In [None]:
#RMSE
