# Modelling exercise

This is the third part of the Data Science hiring assessment. The assessment further develops the business question from the first interview. 

The goal is to forecast demand for a given product (SKU) and supermarket combination.

### Load basic packages


In [7]:
%matplotlib inline

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from utils import read_demand, read_promotions

### Import Data


In [8]:
# Demand data
demand = read_demand("./demand.csv")
demand.head(2)

date,demand,sku,supermarket
2019-01-01,93.0,desperados,albert-heijn
2019-01-02,93.0,desperados,albert-heijn


In [9]:
# Promotion data
promotions = read_promotions("./promotions.csv")
promotions.head(2)

promotion_date,sku,supermarket
2020-09-26,desperados,jumbo
2019-09-18,desperados,jumbo


### Prepare the data for modelling

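The function below cleans the demand per group, extends the promotion dates (the `7` passed to `extend_promotions_days` is presumably a number of days), merges demand with promotions, and aggregates the result to weekly data.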

In [10]:

from utils import merge, clean_demand_per_group, aggregate_to_weekly, extend_promotions_days

def prepare_data(demand, promotions):
    cleaned_demand = clean_demand_per_group(demand)
    extended_promotions = extend_promotions_days(promotions, 7)
    daily = merge(cleaned_demand, extended_promotions)
    weekly = aggregate_to_weekly(daily)
    return weekly

df = prepare_data(demand, promotions)
df.head(10)

date,sku,supermarket,demand,promotion
2019-01-06,desperados,albert-heijn,554.0,False
2019-01-13,desperados,albert-heijn,635.0,False
2019-01-20,desperados,albert-heijn,637.0,False
2019-01-27,desperados,albert-heijn,636.0,False
2019-02-03,desperados,albert-heijn,1013.0,False
2019-02-10,desperados,albert-heijn,644.0,False
2019-02-17,desperados,albert-heijn,635.0,False
2019-02-24,desperados,albert-heijn,637.0,False
2019-03-03,desperados,albert-heijn,644.0,False
2019-03-10,desperados,albert-heijn,638.0,False
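
The implementation of the `utils` helpers is not shown here. As a rough illustration of what the weekly aggregation step might do, the sketch below sums daily demand per SKU and supermarket into weeks ending on Sunday (which matches the dates in the output above) and marks a week as promoted if any of its days was promoted. The function name `aggregate_to_weekly_sketch` and the aggregation rules are assumptions, not the actual `utils.aggregate_to_weekly`.

```python
import pandas as pd


def aggregate_to_weekly_sketch(daily: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for utils.aggregate_to_weekly (real code not shown).

    Assumes `daily` has a DatetimeIndex named "date" and the columns
    sku, supermarket, demand and promotion (boolean).
    """
    weekly = daily.groupby(
        # "W" bins end on Sunday and are labelled with that Sunday.
        ["sku", "supermarket", pd.Grouper(freq="W")]
    ).agg(
        demand=("demand", "sum"),        # total units sold in the week
        promotion=("promotion", "max"),  # True if any day was promoted
    )
    # Keep the date as the index, like the table above.
    return weekly.reset_index(["sku", "supermarket"])
```

Whether a partially promoted week should count as a promotion week is a modelling choice; a share of promoted days per week would be an alternative.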


## Steps to take

The steps below are a general guideline for tackling the exercise. They are for your reference only; add or remove steps as you see fit for the problem.

- Feature engineering and data preparation for modelling
    - Prepare the data for modelling and create new features if possible. 
- Create a model
    - The business wants to predict the demand for each SKU per supermarket eight weeks in advance. 
    - You are free to choose any kind of model that you think fits the exercise best. There is no need to optimize the error metric with an advanced model.
    - Your model should be able to predict the demand for each SKU and supermarket eight weeks in advance
    - What is the impact of different features/variables on the demand?    
- Evaluation
    - Again, feel free to choose any metric that you think fits the problem best.
    - We would like to understand why you chose a particular metric and how you validated and evaluated the model. It would also be good if you could tie this in with how the business would evaluate the solution and what impact it could have on the existing process.
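
To make the steps above concrete, here is one possible starting point, not a reference solution. It builds features that are already known eight weeks before the target week, holds out the most recent weeks for testing, fits a plain least-squares model, and scores it with the weighted absolute percentage error (WAPE). The feature names, the helper functions, and the assumption that promotions are planned far enough ahead to be used as a feature are all illustrative choices.

```python
import numpy as np
import pandas as pd

HORIZON = 8  # weeks ahead, as stated in the exercise
FEATURES = ["demand_lag", "demand_roll4", "promotion"]  # hypothetical names


def add_lag_features(weekly, horizon=HORIZON):
    """Build features that are already known `horizon` weeks before the target."""
    # Assumes a DatetimeIndex named "date", as in the weekly table above.
    out = weekly.reset_index().sort_values(["sku", "supermarket", "date"])
    out = out.reset_index(drop=True)
    grouped = out.groupby(["sku", "supermarket"])["demand"]
    # Demand observed exactly `horizon` weeks before the target week.
    out["demand_lag"] = grouped.shift(horizon)
    # Mean of the four weeks ending `horizon` weeks before the target week.
    out["demand_roll4"] = grouped.transform(
        lambda s: s.shift(horizon).rolling(4).mean()
    )
    # Assumption: promotions are planned ahead, so the flag is known in advance.
    out["promotion"] = out["promotion"].astype(int)
    return out.dropna(subset=["demand_lag", "demand_roll4"])


def time_split(features, n_test_weeks=HORIZON):
    """Hold out the most recent weeks; never shuffle time series rows."""
    cutoff = features["date"].max() - pd.Timedelta(weeks=n_test_weeks)
    return features[features["date"] <= cutoff], features[features["date"] > cutoff]


def design_matrix(data, cols):
    """Prepend an intercept column to the selected feature columns."""
    return np.column_stack([np.ones(len(data)), data[cols].to_numpy(dtype=float)])


def fit_linear(train, cols):
    """Ordinary least squares; returns [intercept, coefficient per column]."""
    target = train["demand"].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(design_matrix(train, cols), target, rcond=None)
    return coef


def predict_linear(coef, data, cols):
    return design_matrix(data, cols) @ coef


def wape(actual, predicted):
    """Total absolute error divided by total actual demand."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.abs(actual - predicted).sum() / np.abs(actual).sum()
```

With the weekly `df` from above, the pieces would be combined as `train, test = time_split(add_lag_features(df))`, followed by `fit_linear(train, FEATURES)` and `wape(test["demand"], predict_linear(coef, test, FEATURES))`. Keeping the test window at most `HORIZON` weeks long means every lag feature in the test set comes from the training period. The fitted coefficients give a first, rough indication of how each feature relates to demand, and WAPE can be read as the share of total volume that was mis-forecast, which is one way to connect the metric to the business.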
    