# Grupo Bimbo Inventory Demand 

**_DESCRIPTION_**:
Planning a celebration is a balancing act of preparing just enough food to go around without being stuck eating the same leftovers for the next week. The key is `anticipating how many guests will come`. Grupo Bimbo must weigh similar considerations as it strives to meet daily consumer demand for fresh bakery products on the shelves of over 1 million stores along its 45,000 routes across Mexico.

Currently, daily inventory calculations are performed by direct delivery sales employees who must single-handedly predict the forces of supply, demand, and hunger based on their personal experience with each store. With some breads carrying a one-week shelf life, the acceptable margin for error is small.

In this competition, Grupo Bimbo invites Kagglers to develop a `model to accurately forecast inventory demand based on historical sales data`. Doing so will make sure consumers of its over 100 bakery products aren’t staring at empty shelves, while also reducing the amount spent on refunds to store owners with surplus product unfit for sale.

**_EVALUATION_**:
The evaluation metric for this competition is `Root Mean Squared Logarithmic Error`.

The RMSLE is calculated as:

$$\epsilon = \sqrt{\frac{1}{n} \sum_{i=1}^n (\log(p_i + 1) - \log(a_i+1))^2 }$$
Where:

- $\epsilon$ is the RMSLE value (score),
- $n$ is the total number of observations in the (public/private) data set,
- $p_i$ is your prediction of demand,
- $a_i$ is the actual demand for $i$, and
- $\log(x)$ is the natural logarithm of $x$.
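A direct NumPy translation of the formula can be handy for checking scores without scikit-learn. This is a minimal sketch: `rmsle` is a hypothetical helper name, and `np.log1p(x)` is used because it computes the natural $\log(x + 1)$ accurately even for small $x$.

```python
import numpy as np


def rmsle(actual, predicted):
    """Root Mean Squared Logarithmic Error, term by term as in the formula above."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if actual.shape != predicted.shape:
        raise ValueError("actual and predicted must have the same shape")
    if (actual < 0).any() or (predicted < 0).any():
        raise ValueError("demand values must be non-negative")
    # np.log1p(x) == log(x + 1) with the natural logarithm
    squared_log_errors = (np.log1p(predicted) - np.log1p(actual)) ** 2
    return float(np.sqrt(squared_log_errors.mean()))


# Worked example: log(e - 1 + 1) = 1 and log(0 + 1) = 0, so the first row contributes
# (0 - 1)^2 = 1 and the second row contributes 0; the score is sqrt(1 / 2).
print(rmsle([np.e - 1, 5.0], [0.0, 5.0]))  # ≈ 0.7071
```

Up to floating-point rounding, this should agree with the `sqrt(mean_squared_log_error(...))` computation used in the notebook below.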
**_SUBMISSION FILE_**:
For every row in the test set, the submission file should contain two columns: `id` and `Demanda_uni_equil`. The `id` corresponds to the `id` column of test.csv. The file should start with the header row `id,Demanda_uni_equil`, followed by one line per test row.
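A minimal sketch of assembling a file in this format with pandas. `make_submission`, the ids and the prediction values are hypothetical; in the notebook the ids would presumably come from the `id` column of `test_df`. Negative predictions are clipped to zero because demand cannot be negative and $\log(p_i + 1)$ is undefined for $p_i \le -1$.

```python
import io

import numpy as np
import pandas as pd


def make_submission(test_ids, predictions):
    """Return a DataFrame with the two columns the competition expects."""
    return pd.DataFrame({
        "id": test_ids,
        # Demand cannot be negative, and log(p + 1) breaks down for p <= -1,
        # so clip any negative model output to zero.
        "Demanda_uni_equil": np.clip(np.asarray(predictions, dtype=float), 0, None),
    })


# Hypothetical ids and predictions, for illustration only.
submission = make_submission([0, 1, 2], [3.0, -0.5, 12.4])

# Write to an in-memory buffer here; in practice this would be a file path.
buffer = io.StringIO()
submission.to_csv(buffer, index=False)
print(buffer.getvalue())
```

Writing to a real file would be `submission.to_csv("submission.csv", index=False)` (the file name is an assumption); `index=False` keeps the pandas row index out of the file so only the two required columns appear.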

**_DATA_**: https://www.kaggle.com/c/5260/download-all

## Model Baseline - Day 1

```python
# Import packages
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

# Autoreload
%load_ext autoreload
%autoreload 2

# Warnings
import warnings
warnings.filterwarnings('ignore')
```

```python
# Import data
from bimbo.data import Bimbo

data = Bimbo().get_data()

# Unpack the train/test sets, the lookup tables and the sample submission
train_df = data['train']
test_df = data['test']
cliente_tabla_df = data['cliente_tabla']
producto_tabla_df = data['producto_tabla']
town_state_df = data['town_state']
sample_submission_df = data['sample_submission']
```

```python
# Distinct weeks, clients, products and agencies in the training set
week_numbers_list = train_df.Semana.unique()
client_id_list = train_df.Cliente_ID.unique()
product_id_list = train_df.Producto_ID.unique()
agencies_id_list = train_df.Agencia_ID.unique()
```

```python
# Total weekly demand per (client, product) pair
baseline_df = train_df.groupby(['Semana', 'Cliente_ID', 'Producto_ID']).Demanda_uni_equil.sum()
baseline_df = baseline_df.reset_index()
```

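```python
# Reshape to one row per (client, product) pair and one column per week
baseline_df_pivot = baseline_df.pivot_table(values='Demanda_uni_equil',
                                            index=['Cliente_ID', 'Producto_ID'],
                                            columns='Semana')
```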

```python
# A missing week means no recorded demand for that pair: treat it as 0
baseline_df_pivot = baseline_df_pivot.fillna(0)
```
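The three cells above turn the long sales log into a matrix with one row per client/product pair and one column per week. A toy version on made-up numbers (not from the competition data) shows what each step does; the fill with 0 encodes the notebook's assumption that a missing client/product/week combination means zero demand.

```python
import pandas as pd

# Tiny made-up sales log: one row per (week, client, product) delivery line.
sales = pd.DataFrame({
    "Semana":            [3, 3, 3, 4, 5],
    "Cliente_ID":        [26, 26, 26, 26, 27],
    "Producto_ID":       [1182, 1182, 4767, 4767, 1182],
    "Demanda_uni_equil": [10, 5, 42, 40, 7],
})

# Step 1: total demand per (week, client, product); the two week-3 lines for (26, 1182) are summed.
weekly = (sales.groupby(["Semana", "Cliente_ID", "Producto_ID"])
          .Demanda_uni_equil.sum()
          .reset_index())

# Step 2: one row per (client, product), one column per week.
wide = weekly.pivot_table(values="Demanda_uni_equil",
                          index=["Cliente_ID", "Producto_ID"],
                          columns="Semana")

# Step 3: combinations with no sales line become 0 instead of NaN.
wide = wide.fillna(0)
print(wide)
```

Because the `groupby` already leaves one value per cell, the default `aggfunc='mean'` of `pivot_table` changes nothing here; `pivot_table(..., aggfunc='sum', fill_value=0)` should collapse the three steps into one.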

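```python
# Baseline 1: mean demand over the first five week columns (weeks 3-7) as the forecast for weeks 8 and 9
baseline_df_pivot['S8_pred'] = baseline_df_pivot.iloc[:, :5].mean(axis=1)
baseline_df_pivot['S9_pred'] = baseline_df_pivot.iloc[:, :5].mean(axis=1)
```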
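Since RMSLE compares $\log(p_i + 1)$ with $\log(a_i + 1)$, the constant forecast that minimizes it over a set of past weeks is the mean taken in log space, $\exp\left(\frac{1}{n}\sum_i \log(a_i + 1)\right) - 1$, rather than the arithmetic mean used in the cell above. The sketch checks this on one made-up demand history; it is an in-sample comparison only, so whether the log-space mean also scores better on week 9 would need to be measured. The `rmsle` helper is a hypothetical name.

```python
import numpy as np


def rmsle(actual, predicted):
    """RMSLE, using np.log1p(x) == log(x + 1)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((np.log1p(predicted) - np.log1p(actual)) ** 2)))


# Made-up demand for one client/product pair over five weeks.
history = np.array([0.0, 10.0, 12.0, 9.0, 40.0])

arithmetic_mean = history.mean()  # 14.2
# Mean in log(x + 1) space, mapped back with expm1 (the inverse of log1p).
log_space_mean = np.expm1(np.log1p(history).mean())

print("arithmetic mean:", rmsle(history, np.full_like(history, arithmetic_mean)))
print("log-space mean: ", rmsle(history, np.full_like(history, log_space_mean)))
```

On the pivot table, the same idea would read `np.expm1(np.log1p(baseline_df_pivot.iloc[:, :5]).mean(axis=1))`; this variant is untested here.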

```python
from sklearn.metrics import mean_squared_log_error
from math import sqrt

# Score the week-9 forecast: column 6 holds week 9 (week columns run 3..9, then the prediction columns)
y_test_9 = baseline_df_pivot.iloc[:, 6]
y_pred_9 = baseline_df_pivot['S9_pred']
rmsle = sqrt(mean_squared_log_error(y_test_9, y_pred_9))
print(f'rmsle {rmsle}')
```

```text
rmsle 0.969430282620381
```


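```python
# Baseline 2: persistence forecast that reuses the latest observed week (column 5 = week 8).
# S9_pred is the forecast scored below; S8_pred simply copies week 8 itself.
baseline_df_pivot['S8_pred'] = baseline_df_pivot.iloc[:, 5]
baseline_df_pivot['S9_pred'] = baseline_df_pivot.iloc[:, 5]
```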
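The cell above replaces the average with a persistence (last-value) forecast: the week-8 demand becomes the forecast for week 9. A minimal sketch of the same idea applied to every week at once uses `DataFrame.shift(1, axis=1)`, which moves each week's values into the next week's column. The rows are copied from the displayed output below (weeks 3 to 6 only), and `rmsle` is a hypothetical helper. Selecting weeks by label (`wide[week]`) rather than by `iloc` position also keeps the week being scored explicit.

```python
import numpy as np
import pandas as pd


def rmsle(actual, predicted):
    """RMSLE, using np.log1p(x) == log(x + 1)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((np.log1p(predicted) - np.log1p(actual)) ** 2)))


# Three rows of the pivot shown below, weeks 3 to 6 only.
wide = pd.DataFrame(
    {3: [39.0, 42.0, 20.0], 4: [0.0, 0.0, 16.0], 5: [0.0, 0.0, 15.0], 6: [0.0, 0.0, 15.0]},
    index=pd.MultiIndex.from_tuples(
        [(26, 1182), (26, 4767), (26, 31393)], names=["Cliente_ID", "Producto_ID"]
    ),
)

# Persistence forecast: the prediction for week w is the demand observed in week w - 1.
# The first week has no predecessor, so its forecast column is all NaN.
persistence = wide.shift(1, axis=1)

# Score each week that has a previous week to copy from, selecting columns by label.
scores = {week: rmsle(wide[week], persistence[week]) for week in wide.columns[1:]}
print(scores)
```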

```python
baseline_df_pivot
```

Columns `3` to `9` are `Semana` (week) values; rows are indexed by (`Cliente_ID`, `Producto_ID`).

| Cliente_ID | Producto_ID | 3 | 4 | 5 | 6 | 7 | 8 | 9 | S8_pred | S9_pred |
|---|---|---|---|---|---|---|---|---|---|---|
| 26 | 1182 | 39.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 26 | 4767 | 42.0 | 0.0 | 0.0 | 0.0 | 42.0 | 42.0 | 0.0 | 42.0 | 42.0 |
| 26 | 30235 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 96.0 | 0.0 | 0.0 |
| 26 | 30314 | 0.0 | 0.0 | 0.0 | 0.0 | 48.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 26 | 31393 | 20.0 | 16.0 | 15.0 | 15.0 | 18.0 | 22.0 | 13.0 | 22.0 | 22.0 |
| 26 | 31518 | 0.0 | 10.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 26 | 31690 | 42.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 26 | 32953 | 0.0 | 0.0 | 0.0 | 7.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 26 | 32962 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 10.0 | 0.0 | 0.0 |
| 26 | 33246 | 30.0 | 30.0 | 10.0 | 10.0 | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 |


```python
# Score the persistence forecast on week 9 (column 6)
y_test_9 = baseline_df_pivot.iloc[:, 6]
y_pred_9 = baseline_df_pivot['S9_pred']
rmsle = sqrt(mean_squared_log_error(y_test_9, y_pred_9))
print(f'rmsle {rmsle}')
# TODO: add MAE
# TODO: add the distribution of MAE across products
```

```text
rmsle 0.8439183827959085
```
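The last cell leaves MAE and its distribution over products as to-dos. One possible sketch uses made-up numbers laid out like `baseline_df_pivot` (a `(Cliente_ID, Producto_ID)` row index, the actual week-9 column and `S9_pred`): MAE is the mean absolute error over all rows, and the per-product view averages the absolute error within each `Producto_ID`. Reading "distribution on products" as a per-product MAE is an assumption. Unlike RMSLE, MAE is expressed in demand units, so it is easier to interpret but dominated by high-volume rows.

```python
import pandas as pd

# Made-up rows in the layout of baseline_df_pivot: actual week-9 demand and the S9_pred forecast.
frame = pd.DataFrame(
    {
        9: [0.0, 13.0, 96.0, 10.0],
        "S9_pred": [42.0, 22.0, 0.0, 10.0],
    },
    index=pd.MultiIndex.from_tuples(
        [(26, 4767), (26, 31393), (27, 4767), (27, 31393)],
        names=["Cliente_ID", "Producto_ID"],
    ),
)

# Absolute error per (client, product) row.
abs_error = (frame["S9_pred"] - frame[9]).abs()

# Overall MAE.
mae = abs_error.mean()

# MAE per product (averaged over that product's clients), worst products first.
mae_by_product = abs_error.groupby(level="Producto_ID").mean().sort_values(ascending=False)

print(f"mae {mae}")
print(mae_by_product)
print(mae_by_product.describe())  # summary of how MAE is spread across products
```

On the real pivot, `abs_error` would be `(baseline_df_pivot['S9_pred'] - baseline_df_pivot.iloc[:, 6]).abs()`, and `sklearn.metrics.mean_absolute_error` should give the same overall MAE.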
