# Exploratory Data Analysis

This is the second part of the Data Science hiring assessment. It builds on the business question from the first interview: you are working on a case to forecast product demand.

### Load libraries

We added a number of libraries that should help you get started with your exploratory data analysis. 

In [2]:
%matplotlib inline

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from utils import read_demand, read_promotions, extend_promotions_days, merge

### Import Data

We provide two generated datasets with this exercise. 

The first dataset **demand.csv** represents the historic demand of three different products (SKUs) in three different supermarkets in the Netherlands. Data is available on a daily basis. 

The second dataset, **promotions.csv**, lists the dates on which a given SKU went on promotion in a specific supermarket. It records only the first day of each promotion period; every promotion lasts one week.

Functions to import both datasets are provided, along with a function that extends the promotion data to cover the entire promotion period.

In [3]:
# Demand data
demand = read_demand("./demand.csv")
demand.head()

date,demand,sku,supermarket
2019-01-01,93.0,desperados,albert-heijn
2019-01-02,93.0,desperados,albert-heijn
2019-01-03,94.0,desperados,albert-heijn
2019-01-04,95.0,desperados,albert-heijn
2019-01-05,92.0,desperados,albert-heijn


In [4]:
# Promotion data
promotions = read_promotions("./promotions.csv")
promotions.head()

promotion_date,sku,supermarket
2020-09-26,desperados,jumbo
2019-09-18,desperados,jumbo
2021-09-28,desperados,jumbo
2021-02-11,desperados,jumbo
2021-10-03,desperados,dirk


In [6]:
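# Extend each promotion start date to the full 7-day promotion period
extended_promotions = extend_promotions_days(promotions, 7).drop("promotion_id", axis=1)
# Combine demand with the extended promotions; the result carries a boolean `promotion` column
df = merge(demand, extended_promotions)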
df.head()

date,demand,sku,supermarket,promotion
2019-01-01,93.0,desperados,albert-heijn,False
2019-01-02,93.0,desperados,albert-heijn,False
2019-01-03,94.0,desperados,albert-heijn,False
2019-01-04,95.0,desperados,albert-heijn,False
2019-01-05,92.0,desperados,albert-heijn,False
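
The source of `utils` is not shown here, so the following is only a minimal sketch of what the extend-and-merge step above might do, not the provided implementation. The function name `extend_promotion_days_sketch`, the toy frames, and the left-join logic are all assumptions made for illustration; the real helper also appears to return a `promotion_id` column, which this sketch omits.

```python
import pandas as pd


def extend_promotion_days_sketch(promotions: pd.DataFrame, n_days: int) -> pd.DataFrame:
    """Hypothetical stand-in: repeat each promotion start once per promotion day."""
    starts = promotions.reset_index()  # promotion_date becomes a regular column
    frames = []
    for offset in range(n_days):
        day = starts.copy()
        day["date"] = day["promotion_date"] + pd.Timedelta(days=offset)
        frames.append(day)
    extended = pd.concat(frames, ignore_index=True)
    extended["promotion"] = True
    columns = ["date", "sku", "supermarket", "promotion"]
    return extended[columns].sort_values(["sku", "supermarket", "date"]).reset_index(drop=True)


# Toy data (illustrative only): one promotion starting on 2020-09-26
toy_promotions = pd.DataFrame(
    {"sku": ["desperados"], "supermarket": ["jumbo"]},
    index=pd.DatetimeIndex(["2020-09-26"], name="promotion_date"),
)
toy_extended = extend_promotion_days_sketch(toy_promotions, 7)

# Toy daily demand covering the promotion week plus a few days on either side
toy_demand = pd.DataFrame(
    {
        "date": pd.date_range("2020-09-24", periods=12, freq="D"),
        "demand": [90.0] * 12,
        "sku": "desperados",
        "supermarket": "jumbo",
    }
)

# Left join keeps every demand row; rows without a match get NaN in `promotion`
toy_merged = toy_demand.merge(toy_extended, on=["date", "sku", "supermarket"], how="left")
toy_merged["promotion"] = toy_merged["promotion"].notna()
print(toy_merged)
```

A left join is used so that days without a promotion stay in the result and are flagged `False`, which matches the shape of the merged output shown above.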


### Part 1: Data quality
The first part of the assignment is to check the quality of the data provided and clean it if necessary.
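
As a possible starting point, the sketch below counts a few common problems per SKU and supermarket: missing demand values, negative demand, duplicated dates, and gaps in the daily calendar. The helper name `quality_report` and the toy frame are assumptions for illustration; which checks matter, and how to clean what they find, is part of the exercise.

```python
import numpy as np
import pandas as pd


def quality_report(df: pd.DataFrame, keys=("sku", "supermarket")) -> pd.DataFrame:
    """Count common data-quality problems per series in a date-indexed demand table."""
    rows = []
    for key, series in df.groupby(list(keys)):
        dates = series.index
        # Every calendar day between the first and last observation of this series
        full_range = pd.date_range(dates.min(), dates.max(), freq="D")
        rows.append(
            {
                **dict(zip(keys, key)),
                "missing_demand": int(series["demand"].isna().sum()),
                "negative_demand": int((series["demand"] < 0).sum()),
                "duplicate_dates": int(dates.duplicated().sum()),
                "missing_dates": len(full_range.difference(dates)),
            }
        )
    return pd.DataFrame(rows)


# Toy frame (illustrative only) with one problem of each kind
toy = pd.DataFrame(
    {
        "demand": [93.0, np.nan, -5.0, 95.0, 95.0],
        "sku": "desperados",
        "supermarket": "jumbo",
    },
    index=pd.DatetimeIndex(
        ["2019-01-01", "2019-01-02", "2019-01-03", "2019-01-05", "2019-01-05"],
        name="date",
    ),
)
report = quality_report(toy)
print(report)
```

On the real data the same call would be `quality_report(df)`, assuming `df` keeps `date` as its index as in the output above.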

### Part 2: Exploration
The business wants to predict the demand for each SKU per supermarket _eight weeks in advance_. Perform an exploratory analysis of the historical data that provides initial insights into this question. You are free to choose any kind of analysis, but tie your findings back to the business question at the end.

Your final result should cover two main aspects:
1. You can report some answers about demand back to the business.
2. You have an initial understanding of the features that affect demand, as a basis for building a model next.
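
One way to approach both aspects is sketched below on synthetic data: the average promotion uplift per SKU and supermarket (an answer for the business), and a demand feature lagged by the eight-week horizon (a candidate model input). The data, the column names `period` and `demand_lag_56`, and the choice of these two analyses are assumptions for illustration, not the expected solution.

```python
import numpy as np
import pandas as pd

# Synthetic daily series (illustrative only): weekly pattern plus two promotion weeks
rng = np.random.default_rng(0)
dates = pd.date_range("2019-01-01", periods=364, freq="D", name="date")
promotion = np.zeros(len(dates), dtype=bool)
promotion[100:107] = True
promotion[250:257] = True
weekly = 10 * np.sin(2 * np.pi * dates.dayofweek.to_numpy() / 7)
demand = 100 + weekly + 30 * promotion + rng.normal(0, 2, len(dates))
toy = pd.DataFrame(
    {"demand": demand, "sku": "desperados", "supermarket": "jumbo", "promotion": promotion},
    index=dates,
)

# 1. Promotion uplift: mean demand on promotion days relative to regular days
labelled = toy.assign(period=toy["promotion"].map({True: "promo", False: "regular"}))
by_period = (
    labelled.groupby(["sku", "supermarket", "period"])["demand"].mean().unstack("period")
)
uplift = by_period["promo"] / by_period["regular"] - 1
print(uplift)

# 2. Lag feature at the forecast horizon: when predicting eight weeks ahead,
#    the latest demand value known for a given day is the one from 56 days earlier.
#    Shifting by rows assumes one row per day with no gaps in each series.
HORIZON_DAYS = 8 * 7
toy["demand_lag_56"] = toy.groupby(["sku", "supermarket"])["demand"].shift(HORIZON_DAYS)
lag_corr = toy["demand"].corr(toy["demand_lag_56"])
print(f"Correlation between demand and its 56-day lag: {lag_corr:.2f}")
```

Because 56 is a multiple of 7, the lagged value falls on the same weekday as the target day, so any weekly pattern in the real data would show up in this correlation.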