# Popcorn Inventory Recommendations

This iPython Notebook is a tool for suggesting inventory levels for different popcorn products at each storefront. For more information about how it can be used, see the external documentation.

Note: Any time you change a cell, you must run that cell for changes to take effect.

Listed below are the libraries used in this file. If it is ever run offline, these Python libraries will need to be installed.

In [None]:
# imports
from math import ceil # round up function
import datetime # handle dates
import json # handle json data
import pandas as pd # create dataframes (tables) and series of data
import numpy as np # handle numeric computation
import matplotlib.pyplot as plt # plotting
import scipy.stats as st # statistical calculations
from scipy.stats import burr, norm # distribution functions
from sklearn.linear_model import LinearRegression # linear regression functionality

This file supports loading data from Google Drive. If you would like to use Google Drive for storing data, upload the data files to your Drive, remove the # characters from the beginning of each line in the cell below, and run it. If not, leave the cell as it is (lines that begin with a # character do not run), or delete it.

In [None]:
# from google.colab import drive
# drive.mount('/content/drive')

## File Paths

This code block tells the program where to look for the data files. If you are using Google Drive, the data_folder variable should begin with "drive/MyDrive/", with any further subfolders listed after. For example:
```
# data folder path
data_folder = "drive/MyDrive/data/"
```
If you would rather simply upload the data files into Google Colab, you can put them in a folder:
```
# data folder path
data_folder = "data/"
```
Or, if you just put them in the default files directory, use the following:
```
# data folder path
data_folder = "./"
```
The period indicates we are using the current folder.

For the files, make sure they are all in the same folder, which matches the folder path you gave above. Then, input the names of the files you would like to use into the variables below. For example:
```
# file path for sales history
sales_history_filename = data_folder + "PopcornSalesHistory.csv"
# file path for cost basis data
cost_basis_filename = data_folder + "flavor_pricing.csv"
# file path for product weights and best by periods
shelf_life_filename = data_folder + "shelf_lives.csv"
```

In [None]:
# FILE PATHS
# data folder path
data_folder = "drive/MyDrive/data/"
# file path for sales history
sales_history_filename = data_folder + "PopcornSalesHistory.csv"
# file path for cost basis data
cost_basis_filename = data_folder + "flavor_pricing.csv"
# file path for product weights and best by periods
shelf_life_filename = data_folder + "shelf_lives.csv"

## Configurable Variables
These variables can be changed to modify how the code works and what type of data it will output. 
* `target_products`: This is a list of the flavors to get inventory recommendations for. If you choose to edit it, be sure each flavor is in quotation marks, different flavors are separated by a comma, and that there are square brackets at the beginning and end of the list.
* `target_locations`: This is a list of the locations to make suggestions for. The same editing rules apply as when editing `target_products`.
* `sizes`: These are the size names to consider when making suggestions. The same editing rules apply as when editing `target_products` and `target_locations`, as this is another list.
* `start_date_string`: The data prior to a particular date may be too outdated to use. Change this variable to change when sales history data first starts to be considered. It must be a date in the format "YYYY-MM-DD".
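* `forecast_week`: This variable changes how many weeks in advance we are forecasting. The default (zero) is to suggest inventory for the week beginning the next Sunday (or today, if today is Sunday). Increasing this number forecasts that number of weeks past the default week.
* `remove_large_orders`: Set this to `True` to exclude unusually large orders (more than three standard deviations above the mean order total) before forecasting. The default is `False`.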

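For example, the variables might be filled in as shown below. Every value here is a hypothetical placeholder (the real product, location, and size names are not part of this document), and the date check at the end is only a suggested safeguard, not something the notebook does itself.

```python
import datetime

# Hypothetical placeholder values; substitute names that appear in your data.
target_products = ["Example Flavor A", "Example Flavor B"]
target_locations = ["Example Store"]
sizes = ["Small", "Large"]
# earliest sales date to consider, in "YYYY-MM-DD" format (assumed value)
start_date_string = "2021-01-01"
# 0 forecasts the upcoming week, 1 the week after, and so on
forecast_week = 0

# Optional safeguard: fromisoformat raises ValueError for a malformed date.
start_date = datetime.date.fromisoformat(start_date_string)
print(start_date)
```
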
In [None]:
# CONFIGURABLE VARIABLES
# some are redacted to avoid violating our NDA
target_products = ["redacted"]
# limited the locations I'm considering, also for lack of data
target_locations = ["redacted"]
# limited sizes, same reason
sizes = ["redacted"]
# option to remove large orders
remove_large_orders = False
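# earliest sales date to consider, in "YYYY-MM-DD" format
# placeholder: set this to a real date before running
start_date_string = "YYYY-MM-DD"
# forecast week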
# 0 will forecast the next week, 1 will forecast the week after, etc.
forecast_week = 0

## Functions
The functions below are essential to the operation of the suggestion program. They should only be modified by someone who definitely knows what they're doing. 

In [None]:
def within_days_before(date, days, day, month, year=None):
    """This function indicates whether a date is within a certain number of 
    days prior to another date. 

    Parameters:
    date (datetime.date): the date we are checking to see if it is close to 
        another date
    days (int): the size of the window, in days, before the target date
    day (int): day of the month for the target date
    month (int): month 1-12 of the target date
    year (int): if there is a particular year, this is it. If None, checks the
        target date in both the year of `date` and the following year

    Returns:
    bool: True if the date is between 1 and `days` days before the target 
        date, False otherwise
    """
    if year is None:
        targets = [
            # the current year
            datetime.date(year=date.year, month=month, day=day),
            # the next year, in case the target date already happened this year
            datetime.date(year=date.year+1, month=month, day=day)
        ]
    else:
        # if a year is given, use that year
        targets = [datetime.date(year=year, month=month, day=day)]
    # if the date falls inside the window before any of the candidate target 
    # dates, return true
    return np.any([0 < (t-date).days <= days for t in targets])
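
The comparison at the heart of this function can be tried on its own. The sketch below restates it with a hypothetical helper, `in_window`, that takes the target date directly; it is not part of the notebook and only illustrates which dates count as "within" the window.

```python
import datetime

def in_window(date, target, days):
    """True if `date` is 1 to `days` days before `target` (hypothetical helper)."""
    return 0 < (target - date).days <= days

christmas = datetime.date(2023, 12, 25)
# 1 December is 24 days before Christmas: inside a 28-day window
print(in_window(datetime.date(2023, 12, 1), christmas, 28))
# 26 November is 29 days before Christmas: just outside the window
print(in_window(datetime.date(2023, 11, 26), christmas, 28))
# Christmas Day itself is 0 days away: excluded by the strict lower bound
print(in_window(christmas, christmas, 28))
```

Because the lower bound is strict, the target date itself is never counted, and dates after the target give a negative difference and are also excluded.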

In [None]:
def my_cdf(x, c, d, loc, scale, date, b0, b1, min_residual):
    """This function calculates the value of the burr cumulative distribution
    of the residual for x for a given linear regression.

    Parameters:
    x (float): the sales values for a given date
    c (float): the constant c for the burr distribution
    d (float): the constant d for the burr distribution
    loc (float): the x location of the burr distribution
    scale (float): the scaling factor of the burr distribution
    date (int): the date in ordinal form (use datetime.date.toordinal())
    b0 (float): the y-intercept of the regression
    b1 (float): the slope of the regression
    min_residual (float): the smallest regression residual, subtracted from the 
    residual to ensure the shifted residual is never negative

    Returns:
    float: the value of the CDF at the given location
    """
    # first, find the residual based on the regression
    residual = max([x-(date*b1 + b0)-min_residual,0])
    # next, find the cdf of the residual
    return burr.cdf(residual, c, d, loc=loc, scale=scale)
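
To see what this function computes without SciPy, the sketch below writes out the Burr Type III CDF by hand, on the assumption that `scipy.stats.burr` uses the form (1 + y^-c)^-d with y = (x - loc) / scale, as its documentation describes. The helper names and all numbers are made up for illustration.

```python
def burr3_cdf(x, c, d, loc=0.0, scale=1.0):
    """Burr Type III CDF, (1 + y**-c)**-d for y = (x - loc) / scale > 0."""
    y = (x - loc) / scale
    if y <= 0:
        return 0.0
    return (1.0 + y ** (-c)) ** (-d)

def shifted_residual(x, date_ordinal, b0, b1, min_residual):
    """Residual from the regression line, shifted by min_residual, floored at 0."""
    return max(x - (date_ordinal * b1 + b0) - min_residual, 0)

# Made-up numbers: a flat regression predicting 10 units, smallest residual -5
residual = shifted_residual(6, 738000, b0=10.0, b1=0.0, min_residual=-5.0)
print(residual)                            # 6 - 10 + 5 = 1
print(burr3_cdf(residual, c=2.0, d=1.0))   # (1 + 1)**-1 = 0.5
```

Subtracting the (negative) smallest residual moves every residual onto the non-negative axis, which is where the fitted Burr distribution lives.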

In [None]:
def get_weekly_sales(sales_df):
    """Create the weekly sales data from a given dataframe of sales.

    Parameters:
    sales_df (pandas.DataFrame): a dataframe of sales data; if you only want 
        to examine one flavor, remove all the others BEFORE passing it here

    Returns:
    pandas.DataFrame: a dataframe containing columns for the week and for the 
        sales quantity during that week
    """
    # sum the total quantity sold for each day to get daily sales
    daily_sales = sales_df[["Date", "Total quantity sold"]].groupby("Date").apply(np.sum)[["Total quantity sold"]]
    # convert the date to a datetime.date object for each daily sales row
    daily_sales["datetime"] = [datetime.date(int(row[0][:4]),int(row[0][5:7]),int(row[0][8:])) for row in daily_sales.iterrows()]
    # i want the week to start with sunday, so find the sunday immediately preceding the start of the data set
    first_day = min(daily_sales["datetime"])-datetime.timedelta(days=1+min(daily_sales["datetime"]).weekday())
    # create a dictionary to hold lists of the week and sales
    weekly_sales = {'week':[], 'sales':[]}
    # loop over every sunday over the whole data set
    for i in range(ceil((max(daily_sales["datetime"])-first_day).days/7)):
        # add the week to the weeks list
        weekly_sales['week'].append(first_day + datetime.timedelta(days=i*7))
        # add the sum of the sales for the 7 days of this week to the sales list
        # (start inclusive, end exclusive, so no day is counted in two weeks)
        weekly_sales['sales'].append(np.sum(daily_sales[[first_day+datetime.timedelta(days=7*i) <= x < first_day+datetime.timedelta(days=7*(i+1)) for x in daily_sales["datetime"]]]["Total quantity sold"]))
    # remove the last week because it is incomplete
    weekly_sales = pd.DataFrame(weekly_sales, index=weekly_sales['week']).iloc[:-1,:].copy()
    return weekly_sales
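
The week-bucketing idea can be shown in a few lines of plain Python. This is a simplified sketch with made-up sales figures, not the notebook's implementation: it maps each day to the Sunday on or before it, and unlike the function above it does not emit zero-sales weeks or drop the final partial week.

```python
import datetime
from collections import defaultdict

def week_start(day):
    """Return the Sunday on or before `day` (weekday(): Monday=0 ... Sunday=6)."""
    return day - datetime.timedelta(days=(day.weekday() + 1) % 7)

# Made-up daily sales totals
daily = {
    datetime.date(2023, 1, 7): 3,   # Saturday
    datetime.date(2023, 1, 8): 5,   # Sunday, starts a new week
    datetime.date(2023, 1, 10): 2,  # Tuesday of that same week
}

weekly = defaultdict(int)
for day, quantity in daily.items():
    weekly[week_start(day)] += quantity

for start, total in sorted(weekly.items()):
    print(start, total)
```

Each day belongs to exactly one week here, which is the property to preserve when summing sales by week.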

In [None]:
def open_data_files():
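    """This is a helper function for opening all the files. It uses the 
    file paths given at the beginning and expects a global KNOWN_LOCATIONS
    mapping from location ID (int) to location name, which must be defined
    before this function is called.

    Returns:
    pd.DataFrame: Dataframe of sales data
    pd.DataFrame: Dataframe of cost basis data
    pd.DataFrame: Dataframe of shelf life data
    """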
    # open sales data
    sales = pd.read_csv(sales_history_filename)
    # substitute location IDs with strings
    sales['Order Location Id'] = [KNOWN_LOCATIONS[int(x)] for x in sales['Order Location Id']]
    # open cost basis data
    cost_basis_data = pd.read_csv(cost_basis_filename, index_col=0)
    # open shelf life data
    shelf_lives = pd.read_csv(shelf_life_filename)
    return (sales, cost_basis_data, shelf_lives)

In [None]:
def get_needed_forecast(weekly_sales):
    """Get the number of weeks to forecast beyond the end of the sales data.

    Parameters:
    weekly_sales (pandas.DataFrame): a dataframe containing weekly sales for the data

    Returns:
    int: the number of weeks between the end of the dataset and next week
    """
    last_date = max(weekly_sales.week)
    today = datetime.date.today()
    next_week = today+datetime.timedelta(days=6-today.weekday())
    return int((next_week-last_date).days/7)
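
A worked example makes the date arithmetic easier to check. The sketch below repeats the same calculation but takes `today` as an argument so the result does not depend on when it is run; the function name `weeks_to_forecast` and the dates are hypothetical.

```python
import datetime

def weeks_to_forecast(last_week, today):
    """Weeks between the last week in the data and the upcoming Sunday."""
    # weekday() is 0 for Monday and 6 for Sunday, so this lands on the next
    # Sunday, or on today if today is already a Sunday
    next_sunday = today + datetime.timedelta(days=6 - today.weekday())
    return (next_sunday - last_week).days // 7

last_week = datetime.date(2023, 1, 1)    # a Sunday, last week in the data
wednesday = datetime.date(2023, 1, 18)   # upcoming Sunday is 22 January
sunday = datetime.date(2023, 1, 22)      # already a Sunday

print(weeks_to_forecast(last_week, wednesday))  # 21 days / 7 = 3
print(weeks_to_forecast(last_week, sunday))     # also 3
```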

### Inventory Suggestions Function
The function below is the primary function for making the inventory suggestions.
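
As a rough illustration of the stopping rule the function uses, the sketch below raises inventory one unit at a time until the ratio of expected profit to expected loss falls to a chosen threshold. It is a simplified stand-in, not the function itself: the uniform demand distribution, the dollar figures, and the `max_units` guard are all assumptions made for the example.

```python
def suggest_inventory(cdf, profit_per_unit, cost_basis, shelf_life_weeks,
                      pl_ratio, max_units=10_000):
    """Smallest inventory at which expected profit / expected loss <= pl_ratio."""
    inventory = 0
    ratio = float("inf")
    # max_units is an added guard so the sketch always terminates
    while ratio > pl_ratio and inventory < max_units:
        inventory += 1
        # chance of selling more than this many units, times profit per unit
        expected_profit = (1 - cdf(inventory)) * profit_per_unit
        # chance of the previous unit going unsold every week of its shelf life
        expected_loss = cdf(inventory - 1) ** shelf_life_weeks * cost_basis
        ratio = expected_profit / max(expected_loss, 0.0001)
    return inventory

def uniform_cdf(x):
    """Stand-in demand distribution: weekly sales uniform between 0 and 100."""
    return min(max(x / 100, 0.0), 1.0)

aggressive, neutral, conservative = [
    suggest_inventory(uniform_cdf, profit_per_unit=2.0, cost_basis=1.0,
                      shelf_life_weeks=4, pl_ratio=r)
    for r in (0.4, 1, 2.5)
]
print(aggressive, neutral, conservative)
```

A lower threshold lets the loop run longer, so it produces a larger (more aggressive) suggestion, while a higher threshold stops earlier.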

In [None]:
def get_inventory_suggestions(weekly_sales, profit_per_unit, cost_basis, shelf_life_weeks, forecast_length=52):
    """Adds a column to the weekly_sales dataframe with suggested inventory 
    levels.

    This algorithm consists of two main parts:
    1. Construct a probability distribution of expected sales for a given week. 
    2. Use the probability distribution to maximize inventory while keeping 
       expected sales:waste ratio within a given constraint.

    Presently, the probability distribution is constructed by fitting a Burr
    distribution to the residuals of a simple linear regression on weekly
    sales. This could be swapped for another method for constructing a 
    probability distribution.

    The second part of the algorithm could also potentially be replaced, but 
    the goal should still be to have enough in inventory to meet demand most
    weeks, but not to waste very much over the course of a shelf-life period.

    Parameters:
    weekly_sales (pandas.DataFrame): a dataframe containing weekly sales for the data
    profit_per_unit (float): the profit per unit sold (price minus cost basis)
    cost_basis (float): the cost to produce a given product
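    shelf_life_weeks (float): the shelf life of the product in weeks
    forecast_length (int): the number of weeks past the end of the sales data 
        to generate suggestions for

    Note: this function also reads the global variable `pl_ratio`, which is 
    set by the loop in the main program cell.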

    Returns:
    pandas.DataFrame: Dataframe containing weekly sales and recommended 
    inventory levels
    """
    ### remove 4 weeks preceding christmas (huge outliers)
    # copy the sales data frame 
    weekly_sales_no_christmas = weekly_sales.copy()
    # for each year in the data set
    for year in set([d.year for d in weekly_sales_no_christmas['week']]):
        # set the sales value for the 4 weeks preceding Christmas to the 
        # average sales in that year prior to the 4 weeks preceding Christmas
        weekly_sales_no_christmas.loc[weekly_sales['week'].apply(within_days_before, args=(28, 25, 12),year=year),'sales'] = np.average(weekly_sales[(~weekly_sales['week'].apply(within_days_before, args=(28, 25, 12)))&([d.month!=12 for d in weekly_sales['week']])&([d.year==year for d in weekly_sales['week']])]['sales'])
    
    ### linear regression on sales without christmas, calculate residuals
    # fit a linear regression with date as explanatory variable and sales as dependent variable
    reg = LinearRegression().fit(np.array([d.toordinal() for d in weekly_sales_no_christmas.week]).reshape(-1,1), np.array(list(weekly_sales_no_christmas.sales)).reshape(-1,1))
    # get the regression predicted values for each week
    y_pred = reg.predict(np.array([d.toordinal() for d in weekly_sales_no_christmas.week]).reshape(-1,1))
    # get the regression parameters (y intercept and slope)
    reg_params = [reg.intercept_[0], reg.coef_[0][0]]
    #if reg_params[1]<0:
    #    reg_params[1]=0
    #    reg_params[0]=(y_pred[-1]+y_pred[0])/2
    # add the regression predictions to the dataframe
    weekly_sales_no_christmas['reg_pred'] = y_pred.reshape(-1)
    # create residuals from the regression by subtracting the regression 
    # predictions from the true values
    weekly_sales_no_christmas['residuals'] = weekly_sales_no_christmas['sales']-weekly_sales_no_christmas['reg_pred']
    # because the burr distribution only works for positive values, normalize 
    # all the residuals to be greater than or equal to zero
    weekly_sales_no_christmas['pos_residuals'] = [x-min(weekly_sales_no_christmas['residuals']) for x in weekly_sales_no_christmas['residuals']]
    
    ### fit burr distribution to regression residuals
    # a dictionary for storing the parameters needed to specify a burr 
    # distribution
    burr_params = {}
    # fit the burr distribution to the positive residuals and unpack the 
    # parameters into the dictionary
    burr_params['c'], burr_params['d'], burr_params['loc'], burr_params['scale'] = burr.fit(weekly_sales_no_christmas['pos_residuals'])
    
    ### set inventory levels
    # get the first day from the sales data
    first_date = min(weekly_sales.week)
    # create a dictionary to store both the week and the suggested inventory
    inventory_levels = {'week':[],'inventory':[]}
    # for each week plus some number of forecasting
    for i in range(len(weekly_sales)+forecast_length):
        # initialize the profit/loss ratio to an absurdly high value
        profit_loss = 9999999
        # initialize inventory to zero
        inventory = 0
        # set the week to the starting week plus the current index
        # this allows us to loop over every week
        week = first_date+datetime.timedelta(days=7*i)
        # create the list of params to use for the my_cdf() function
        params = list(burr_params.values())+[week.toordinal(), reg_params[0], reg_params[1], min(weekly_sales_no_christmas['residuals'])]
        # while the profit/loss ratio is better than the minimum desired ratio
        while profit_loss > pl_ratio:
            # increment the amount in inventory
            inventory+=1
            # expected profit is the probability we will sell more than the 
            # current inventory times profit per unit
            expected_profit = (1-my_cdf(inventory,*params))*profit_per_unit
            # expected loss is the probability we will sell less than the 
            # current inventory, raised to the power of the shelf life in 
            # weeks, times the cost basis per unit
            expected_loss = (my_cdf(inventory-1, *params)**(shelf_life_weeks))*cost_basis
            # profit loss ratio, but make sure we don't divide by zero
            profit_loss = expected_profit/max([expected_loss,0.0001])
        # when the loop is completed, we've found the maximum viable inventory
        # level, so we add the week to the weeks list
        inventory_levels['week'].append(week)
        # and we add the inventory level to the inventory list
        inventory_levels['inventory'].append(inventory)
    # once we've found the inventory for every week, we create a series from 
    # the dictionary because it is more convenient to add this to the dataframe
    inventory_levels = pd.Series(inventory_levels['inventory'], index=inventory_levels['week'], name="inventory_suggest")
    # then, we add the inventory suggestions to the dataframe without christmas
    weekly_sales_no_christmas = pd.concat([weekly_sales_no_christmas, inventory_levels], axis=1)
    
    ### account for the 4 weeks preceding Christmas
    # get the 4 weeks preceding christmas for each year in the data set
    christmas_weeks = weekly_sales.loc[weekly_sales['week'].apply(within_days_before, args=(28, 25, 12)),'week']
    # calculate the residuals for these weeks compared to the regression 
    christmas_residuals = np.array([weekly_sales.loc[w,"sales"] - reg.predict(np.array([w.toordinal()]).reshape(-1,1))[0] for w in christmas_weeks]).reshape((-1, 4))
    # the average residual uses a weighted average, where every additional year 
    # between the given year and the present halves the given year's influence
    # in the weighted average (the weights 2**i/(2**n - 1) sum to one)
    average_residual = [sum([val*(2**i)/(2**(len(column))-1) for i, val in enumerate(column)]) for column in christmas_residuals.T]
    # add the inventory suggestions to the dataframe with christmas
    weekly_sales = pd.concat((weekly_sales, weekly_sales_no_christmas['inventory_suggest']),axis=1)
    # update the week attribute to include the additional weeks we just added
    weekly_sales['week'] =pd.Series(weekly_sales.index.values, index=weekly_sales.index)
    # update the christmas weeks to include any weeks we are forecasting for
    # the future
    christmas_weeks = weekly_sales.loc[weekly_sales['week'].apply(within_days_before, args=(28, 25, 12)),'week']
    # for each week in the christmas weeks
    for i, w in enumerate(christmas_weeks):
        # add the calculated weighted average (or zero, if it is somehow negative)
        weekly_sales.loc[w, 'inventory_suggest'] += (ceil(average_residual[i%4]) if average_residual[i%4]>0 else 0)
    
    ### return the dataframe that now contains inventory suggestions
    return weekly_sales
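
The comment on the Christmas adjustment describes a weighting in which each year further back has half the influence of the year after it. A minimal sketch of that scheme, with a hypothetical helper name and made-up residuals, looks like this:

```python
def recency_weights(n_years):
    """Weights for n_years of data, oldest first; each year doubles the last."""
    total = 2 ** n_years - 1  # 1 + 2 + 4 + ... + 2**(n_years - 1)
    return [2 ** i / total for i in range(n_years)]

weights = recency_weights(3)           # 1/7, 2/7, 4/7
residuals = [70.0, 70.0, 140.0]        # made-up residuals, oldest year first
average = sum(w * r for w, r in zip(weights, residuals))

print(weights)
print(average)  # approximately 10 + 20 + 80 = 110
```

Dividing by 2**n - 1 keeps the weights summing to one, so the result stays on the same scale as the residuals.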

## Main Program Functionality
The cell below contains the main functionality of the program. Running this cell (as long as all other needed cells have already been run) will generate inventory recommendations for the selected products and locations. 

In [None]:
# allow pandas to print more rows of dataframes
pd.set_option('display.max_rows', 500)
# open all data
sales, cost_basis_data, shelf_lives = open_data_files()
# the profit/loss ratio thresholds to use for the aggressive, neutral, and conservative suggestions
probabilities = [0.4, 1, 2.5]
# for each location
for location in target_locations:
    all_products = []
    conservative = []
    neutral = []
    aggressive = []
    # and for each product
    for product in target_products:
        # and for each size
        for size in sizes:
            all_products.append((product + ' ' + size).replace(" ", "_"))
            for pl_ratio in probabilities:
                # get the sales only for that product and size and location
                flavor_sales = sales[(
                    (sales["Product name"]==product) & 
                    (sales["Order Location Id"]==location) &
                    (sales["Variant name"]==size) &
                    (sales["Date"]>start_date_string)
                )].copy()
                # if we are removing large orders, remove them
                if remove_large_orders:
                    # currently, this removes orders 3 standard deviations above
                    # the mean, but we may want to use a method that relies less
                    # on normality
                    flavor_sales = flavor_sales[flavor_sales["Total sales"]<=np.average(sales["Total sales"])+3*np.std(sales["Total sales"])].copy()
                # if there are fewer than 10 sales, we just skip this data
                if len(flavor_sales)<10:
                    if pl_ratio < 1:
                        aggressive.append(0)
                    elif pl_ratio > 1:
                        conservative.append(0)
                    else:
                        neutral.append(0)
                    continue
                ### get variables about product
                # get the dataframe of weekly sales
                weekly_sales = get_weekly_sales(flavor_sales)
                # get the sales price of the current flavor
                price = flavor_sales['Variant Price'].iloc[0]
                # get the cost basis of the current flavor
                cost_basis = cost_basis_data.loc[product, size]
                # get the profit margin by subtracting the cost basis from the price
                profit_margin = price - cost_basis
                # get the product shelf life
                shelf_life_weeks = shelf_lives.loc[shelf_lives['product'] == product, 'shelf_life_weeks'].iloc[0]
                # get inventory suggestions
                weekly_sales = get_inventory_suggestions(weekly_sales, profit_margin, cost_basis, shelf_life_weeks, forecast_length=get_needed_forecast(weekly_sales)+forecast_week)
                if pl_ratio < 1:
                    aggressive.append(weekly_sales.inventory_suggest.iloc[-1])
                elif pl_ratio > 1:
                    conservative.append(weekly_sales.inventory_suggest.iloc[-1])
                else:
                    neutral.append(weekly_sales.inventory_suggest.iloc[-1])
    this_week_inventory = pd.DataFrame({
        'conservative': conservative,
        'neutral': neutral,
        'aggressive': aggressive},
        index=all_products)
    print(location)
    print(this_week_inventory[(this_week_inventory != 0).any(axis=1)])
# print(f"average profit {sum(profits)/len(profits)} with ratio {pl_ratio}")

Conservative indicates a higher likelihood of stockout, but lower likelihood of having waste. Aggressive indicates a higher likelihood of waste, but lower likelihood of stocking out.
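
When `remove_large_orders` is turned on, orders more than three standard deviations above the mean are dropped before forecasting. The sketch below shows that rule on made-up order totals using only the standard library; the function name is hypothetical, and `pstdev` is used on the assumption that it matches the population standard deviation that `np.std` computes by default.

```python
import statistics

def drop_large_orders(order_totals, n_sigmas=3):
    """Keep only orders at or below mean + n_sigmas * standard deviation."""
    mean = statistics.fmean(order_totals)
    std = statistics.pstdev(order_totals)  # population standard deviation
    cutoff = mean + n_sigmas * std
    return [total for total in order_totals if total <= cutoff]

# Made-up data: twenty ordinary orders and one very large one
orders = [10.0] * 20 + [500.0]
kept = drop_large_orders(orders)
print(len(orders), len(kept))
```

Because a single very large order also inflates the mean and standard deviation, this rule can miss outliers in small samples, which is one reason a method that relies less on normality may be worth considering.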