# Introduction


#### Business problem:
Pitney Bowes' FDR business allows retailers to easily facilitate the return of merchandise (its other services are Fulfillment and Delivery). Consumers can drop off return parcels at any USPS office, and each parcel is transported to the closest FDR hub, where it is inducted into the FDR network.

Once inducted into our network, the parcel is transported through the FDR network to the client's warehouse.

The volume of parcels delivered to a client's warehouse is driven by external factors such as seasonal sales, but also by warehouse schedules, holidays, etc.

Reliable forecasts of the parcel volumes arriving at their warehouse are very important to our clients, since they need these forecasts to staff the warehouse accordingly and to manage the available warehouse space efficiently.

In this challenge, you are given 3 months of historical data for packages that have been delivered to a single client warehouse. <b>The aim is to use this historical data to predict the parcel volumes that will arrive at the client's facility over the next 5 days (i.e., 5 numbers for Monday through Friday).</b>

#### Data:

The data is <b>aggregated by delivery or induction date</b>. The <b>delivery column</b> shows the total number of parcels that have been delivered to the client on any given day. The <b>induction columns</b> give you the induction volume per day across our FDR facilities. The time it takes a parcel to travel from the induction facility to the client facility depends largely on the distance between these facilities.


#### Output Format:
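TBD, but all we need are your <b>5 parcel-volume predictions for June 3 through June 7, 2019</b>. To evaluate your predictions, we use the Mean Absolute Percentage Error (MAPE), where $y_i$ is the actual volume on day $i$, $\hat{y}_i$ is the predicted volume, and $n$ is the number of days: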

$MAPE = \frac{100}{n}\sum\limits_{i=1}^{n} \frac{|y_i - \hat{y}_i|}{y_i}$ 
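
A minimal sketch of this metric in Python, assuming the actual and predicted volumes are available as array-like sequences. The function name `mape` and the sample numbers are illustrative and not part of the challenge code. Because the formula divides by the actual value, the sketch raises an error if any actual value is zero.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error in percent (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if np.any(y_true == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * np.mean(np.abs(y_true - y_pred) / y_true)

# Made-up volumes for five weekdays, for illustration only
actual = [100, 200, 400, 250, 500]
predicted = [110, 180, 400, 300, 450]
print(mape(actual, predicted))
```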

In [1]:
import os
import sys
module_path = os.path.abspath(os.path.join('work/shared'))
if module_path not in sys.path:
    sys.path.append(module_path)

In [4]:
import warnings                                  # `do not disturb` mode
warnings.filterwarnings('ignore')

import numpy as np                               # vectors and matrices
import pandas as pd                              # tables and data manipulations
import matplotlib.pyplot as plt                  # plots

from dateutil.relativedelta import relativedelta # working with dates with style
from scipy.optimize import minimize              # for function minimization

import scipy.stats as scs

from itertools import product                    # some useful functions

%matplotlib inline

In [5]:
#Reading the provided csv file into a pandas Dataframe
Delivery = pd.read_csv('~/work/shared/Delivery_Volume.csv', index_col=['DELIVERY_DATE'], parse_dates=['DELIVERY_DATE'])
Delivery.sort_index(inplace=True)
Delivery=Delivery.fillna(0)

In [None]:
#Data Sample
Delivery.head()

In [None]:
plt.figure(figsize=(15, 7))
#Delivery.plot.bar(figsize=(10,5))
plt.bar(Delivery.index,Delivery.DELIVERED_VOLUME)
plt.title('Volume Delivered by Date')
plt.grid(False)
plt.show()

The plot above shows the daily delivery volumes at the client facility. The facility is closed on Saturdays and Sundays, which is why you don't see any deliveries on the weekends. There were no deliveries on Memorial Day either. As you can see, there is a strong seasonal pattern: the volume of parcels fluctuates quite a bit by weekday. This is why clients depend on volume forecasts to manage their warehouses accordingly.

# Example
## Moving Averages

There are many different ways in which you could try to tackle this problem.
A very simple approach would be to estimate the next day's volume using the previous day's volume. Given the daily fluctuation in volume, this is clearly not a very promising approach. Another simple way would be to use moving averages to smooth out the predictions.

In [None]:
def moving_average(series, n):
    """
        Calculate average of last n observations
    """
    return np.average(series[-n:])
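
As a rough illustration of how such an average could serve as a forecast, the sketch below repeats the mean of the last `n` observations for each of the next five days. The sample volumes are made up, and `moving_average` is repeated here only so that the example runs on its own.

```python
import numpy as np

def moving_average(series, n):
    """Average of the last n observations (same logic as the cell above)."""
    return np.average(series[-n:])

# Made-up daily volumes, for illustration only
volumes = np.array([120, 340, 310, 280, 150])

# Flat forecast: each of the next 5 days gets the same averaged value
forecast = np.repeat(moving_average(volumes, 3), 5)
print(forecast)
```

A flat forecast like this cannot reproduce weekday-specific peaks, which is the limitation the plots below illustrate.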

In [None]:
from sklearn.metrics import mean_absolute_error  # needed when plot_intervals=True

def plotMovingAverage(series, window, plot_intervals=False, scale=1.96, plot_anomalies=False, day_out=5):

    """
        series - dataframe with timeseries (first column holds the delivered volume)
        window - rolling window size
        plot_intervals - show confidence intervals
        scale - multiplier on the standard deviation of the errors for the interval width
        plot_anomalies - show anomalies
        day_out - number of days forecast from each window

    """

    actual = series.iloc[:, 0]
    rolling_mean = pd.Series(np.nan, index=series.index)
    # Forecast each block of day_out days with the mean of the `window`
    # observations immediately preceding it, so no future values are used
    for i in range(window, len(actual), day_out):
        rolling_mean.iloc[i:i + day_out] = np.round(actual.iloc[i - window:i].mean(), 1)
    series['SMA_{}'.format(day_out)] = rolling_mean

    plt.figure(figsize=(15,5))
    plt.title("Moving average\n window size = {}".format(window))
    plt.plot(rolling_mean, "g", label="Rolling mean trend")

    # Plot confidence intervals for smoothed values
    if plot_intervals:
        mae = mean_absolute_error(actual.iloc[window:], rolling_mean.iloc[window:])
        deviation = np.std(actual.iloc[window:] - rolling_mean.iloc[window:])
        lower_bound = rolling_mean - (mae + scale * deviation)
        upper_bound = rolling_mean + (mae + scale * deviation)
        plt.plot(upper_bound, "r--", label="Upper Bound / Lower Bound")
        plt.plot(lower_bound, "r--")

        # Having the intervals, find abnormal values
        if plot_anomalies:
            anomalies = actual[(actual < lower_bound) | (actual > upper_bound)]
            plt.plot(anomalies, "ro", markersize=10)

    plt.plot(actual.iloc[window:], label="Actual values")
    plt.legend(loc="upper left")
    plt.grid(True)

#### Parameters:
- window - number of days to consider when calculating the average
- day_out - the number of days for which you are generating the forecasts

This is what you get when you calculate the average using the volumes delivered over the last 2 days:

In [None]:
plotMovingAverage(Delivery,2,day_out=5)

As you can see, the forecast is somewhat smoother, but it misses the peaks in delivery volume. Now let's try the same thing using the volumes over the previous 7 days.

In [None]:
plotMovingAverage(Delivery, 7,day_out=5)

As expected, the predictions smooth out further, but they also become less useful, since they eventually converge to the average daily delivery volume. Our clients need more accurate forecasts for the individual days.
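
One way to put a number on this is a rolling-origin back-test: repeatedly forecast a 5-day block from the data available up to that point, then score the forecasts with MAPE. The sketch below is an illustration under stated assumptions, not the evaluation code used for the challenge. `backtest_moving_average` is a hypothetical helper, the volumes are synthetic weekday-only data, and days with zero actual volume are skipped because MAPE is undefined for them.

```python
import numpy as np

def backtest_moving_average(volumes, window, horizon=5):
    """Rolling-origin back-test of a flat moving-average forecast.

    Returns the MAPE (in percent) over all forecast days with non-zero
    actual volume. Hypothetical helper, for illustration only.
    """
    volumes = np.asarray(volumes, dtype=float)
    errors = []
    # Move the forecast origin forward one horizon at a time
    for origin in range(window, len(volumes) - horizon + 1, horizon):
        forecast = volumes[origin - window:origin].mean()
        actual = volumes[origin:origin + horizon]
        nonzero = actual != 0  # MAPE is undefined for zero actuals
        errors.extend(np.abs(actual[nonzero] - forecast) / actual[nonzero])
    return 100.0 * np.mean(errors)

# Synthetic weekday volumes with a weekly pattern (not the challenge data)
rng = np.random.default_rng(0)
pattern = np.array([500, 300, 250, 280, 400])
volumes = np.tile(pattern, 12) + rng.integers(-20, 21, size=60)

for window in (2, 5, 10):
    print(window, round(backtest_moving_average(volumes, window), 1))
```

Comparing the scores across window sizes on the real data would show whether a longer window actually helps or merely flattens the forecast.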

### Notes and some ideas:
- You are not limited to the 13 columns (date, delivery total, induction totals for 11 facilities) provided in the data set; you can generate additional features that might be useful for your predictions. For instance, you could add a feature like weekday to your model. This might help capture weekday-specific patterns.

- Lag variables are very useful for forecasting problems. For instance, you could create additional columns such as yesterday's volume (day-1), the volume of the day before yesterday (day-2), etc. This might be helpful when trying to use the induction volumes in your model, as the delivery volume follows the induction volumes (with a lag of a few days, depending on where the induction happened).

- When building your model, make sure that you get a good understanding of the model's performance by using the historical data for back-testing. Rolling window approaches are a common method for forecasting problems.

- The examples above are very basic and meant for illustration only. You could, for instance, explore readily available time-series packages, or try to build an ML model that makes use of additional features you generate.
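
The weekday and lag features mentioned above could be sketched with pandas as follows. This is only an illustration under stated assumptions: the small frame is synthetic, `INDUCTION_A` is a made-up stand-in for one of the induction columns, and `add_features` is a hypothetical helper. `DELIVERED_VOLUME` is the delivery column used in the bar plot earlier.

```python
import numpy as np
import pandas as pd

def add_features(df, target="DELIVERED_VOLUME", lag_columns=(), lags=(1, 2, 3)):
    """Return a copy of df with weekday and lag features (hypothetical helper)."""
    out = df.copy()
    out["weekday"] = out.index.dayofweek  # Monday=0 ... Sunday=6
    for col in (target, *lag_columns):
        for lag in lags:
            # shift(lag) moves values down by `lag` rows, so each row
            # only sees values from earlier rows
            out[f"{col}_lag{lag}"] = out[col].shift(lag)
    return out

# Synthetic example data (not the challenge data), one row per calendar day
dates = pd.date_range("2019-05-06", periods=10, freq="D")
df = pd.DataFrame(
    {"DELIVERED_VOLUME": np.arange(10) * 10, "INDUCTION_A": np.arange(10) + 100},
    index=dates,
)
features = add_features(df, lag_columns=["INDUCTION_A"])
print(features.head())
```

Note that `shift` works on rows, so the lags correspond to days only if the frame has one row per consecutive day. When forecasting several days ahead, only lags at least as long as the forecast horizon are known at prediction time.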

### Reach out if you have questions:
<span style="color:red">**Through which channel should they reach out?**</span>

### Further reading:
- [Time Series](https://www.kaggle.com/kashnitsky/topic-9-part-1-time-series-analysis-in-python/notebook)
- [Jupyter Notebooks](https://www.dataquest.io/blog/jupyter-notebook-tutorial/)

## Good luck!