## Objective of the Case Study
Estimate how many ordered items will be exported to a specific warehouse (warehouse 2) per day and per country, based on the sales forecast.

To achieve this, we accumulate the item counts from the different ordering processes, each of which has its own export rules.

## Imports

In [2]:
import pandas as pd

## Fetch data

In [3]:
# Read the four input CSV files
b2c_sales_df = pd.read_csv('/Users/nitinnandansingh/Documents/workspace/AboutYouLogisticsTask/assets/b2c_items_sales_forecast - b2c_items_sales_forecast.csv')
icd_sales_df = pd.read_csv('/Users/nitinnandansingh/Documents/workspace/AboutYouLogisticsTask/assets/icd_items_sales_forecast - icd_items_sales_forecast.csv')
it_cd_shift_df = pd.read_csv('/Users/nitinnandansingh/Documents/workspace/AboutYouLogisticsTask/assets/item_cross_docking_shift - item_cross_docking_shift.csv')
it_blocked_df = pd.read_csv('/Users/nitinnandansingh/Documents/workspace/AboutYouLogisticsTask/assets/items_blocked_per_cd_item - items_blocked_per_cd_item.csv')

In [4]:
# I first convert the date columns to datetime type.
# I identified these columns during an earlier exploration of the data.
# See the notebook explore_assets for a detailed exploration of the datasets.

b2c_sales_df['date'] = pd.to_datetime(b2c_sales_df['date'])
b2c_sales_df['forecast_date'] = pd.to_datetime(b2c_sales_df['forecast_date'])


icd_sales_df['date'] = pd.to_datetime(icd_sales_df['date'])
icd_sales_df['forecast_date'] = pd.to_datetime(icd_sales_df['forecast_date'])


it_cd_shift_df['exported_at_picking_warehouse'] = pd.to_datetime(it_cd_shift_df['exported_at_picking_warehouse'])


it_blocked_df['date'] = pd.to_datetime(it_blocked_df['date'])
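
The read and convert steps can also be combined. The following is a minimal sketch, assuming the date columns are known up front; it reads two rows copied from the b2c preview out of an in-memory string (`csv_text` and `b2c_demo` are illustrative names, not part of the notebook) and lets `read_csv` parse the dates directly.

```python
import io
import pandas as pd

# Stand-in for one CSV file: two rows copied from the b2c preview.
csv_text = """forecast_date,version,scenario_id,date,planning_shop_cluster,shipping_wh,sales_forecast_items
2024-06-04,1,0,2024-05-27,AT,1,2277
2024-06-04,1,0,2024-05-27,AT,2,0
"""

# parse_dates converts the listed columns to datetime64 while reading,
# so no separate pd.to_datetime call is needed afterwards.
b2c_demo = pd.read_csv(io.StringIO(csv_text), parse_dates=["date", "forecast_date"])
print(b2c_demo.dtypes)
```

With real files, the same `parse_dates` argument can be passed alongside the file path.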

### Let's look at the data

In [5]:
b2c_sales_df.head(3)

Unnamed: 0,forecast_date,version,scenario_id,date,planning_shop_cluster,shipping_wh,sales_forecast_items
0,2024-06-04,1,0,2024-05-27,AT,1,2277
1,2024-06-04,1,0,2024-05-27,AT,2,0
2,2024-06-04,1,0,2024-05-27,AT,4,0


In [6]:
icd_sales_df.head(3)

Unnamed: 0,forecast_date,version,scenario_id,date,planning_shop_cluster,shipping_wh,picking_wh,item_cd
0,2024-06-04,1,0,2024-07-22,CZ,3,3,0
1,2024-06-04,1,0,2024-05-31,HU,3,3,0
2,2024-06-04,1,0,2024-08-24,NO,3,3,0


In [7]:
it_cd_shift_df.head(3)

Unnamed: 0,exported_at_picking_warehouse,picking_warehouse_id,shipping_warehouse_id,day,share
0,2024-05-27,1,2,0,0.02
1,2024-05-27,1,2,1,0.23
2,2024-05-27,1,2,2,0.61


In [8]:
it_blocked_df.head(3)

Unnamed: 0,date,shipping_wh,country_code,items_per_cross_docking_cd_item
0,2023-03-01,1,AT,3.17
1,2023-03-01,2,AT,3.17
2,2023-03-01,3,AT,3.17


## Tasks Implementation

### Step 1

Based on export rules, aggregate B2C (shipping warehouse) and SCD (picking warehouse) on exported_at for warehouse_id=2

In [9]:
# Filter the B2C sales data to shipping warehouse 2, then group by date and planning_shop_cluster, summing sales_forecast_items.
b2c_filtered = b2c_sales_df[b2c_sales_df['shipping_wh'] == 2]
b2c_aggregated = b2c_filtered.groupby(['date', 'planning_shop_cluster'], as_index=False)['sales_forecast_items'].sum()

# Rename the columns for convenience. For regular B2C orders the created date equals the exported date, so date can serve as exported_at.
b2c_aggregated.rename(columns={'date': 'exported_at', 'sales_forecast_items': 'b2c_items'}, inplace=True)
print(b2c_aggregated.shape)
b2c_aggregated.head()

(2639, 3)


Unnamed: 0,exported_at,planning_shop_cluster,b2c_items
0,2024-05-27,AT,0
1,2024-05-27,BE,0
2,2024-05-27,BG,754
3,2024-05-27,CH,0
4,2024-05-27,CY,34


In [10]:
# Filter icd_sales_df to picking warehouse 2, then group by date and planning_shop_cluster, summing item_cd.
scd_filtered = icd_sales_df[icd_sales_df['picking_wh'] == 2]
scd_aggregated = scd_filtered.groupby(['date', 'planning_shop_cluster'], as_index=False)['item_cd'].sum()

# Rename the columns here as well; for SCD the created date also equals the exported date.
scd_aggregated.rename(columns={'date': 'exported_at', 'item_cd': 'scd_items'}, inplace=True)
print(scd_aggregated.shape)
scd_aggregated.head()

(2639, 3)


Unnamed: 0,exported_at,planning_shop_cluster,scd_items
0,2024-05-27,AT,274
1,2024-05-27,BE,47
2,2024-05-27,BG,0
3,2024-05-27,CH,392
4,2024-05-27,CY,0
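
The B2C and SCD aggregations follow the same filter, group, and rename pattern, so they could share one helper. This is a sketch only: `aggregate_for_warehouse` and the toy frame `demo` are hypothetical names and made-up rows, not part of the notebook.

```python
import pandas as pd

def aggregate_for_warehouse(df, warehouse_col, value_col, out_col, warehouse_id=2):
    """Filter to one warehouse, then sum value_col per date and country."""
    filtered = df[df[warehouse_col] == warehouse_id]
    aggregated = filtered.groupby(
        ["date", "planning_shop_cluster"], as_index=False
    )[value_col].sum()
    # Created date equals exported date for B2C and SCD, hence the rename.
    return aggregated.rename(columns={"date": "exported_at", value_col: out_col})

# Toy data (made-up rows) in the shape of b2c_sales_df.
demo = pd.DataFrame({
    "date": pd.to_datetime(["2024-05-27"] * 4),
    "planning_shop_cluster": ["AT", "AT", "BG", "BG"],
    "shipping_wh": [1, 2, 2, 2],
    "sales_forecast_items": [2277, 0, 700, 54],
})

b2c_demo = aggregate_for_warehouse(demo, "shipping_wh", "sales_forecast_items", "b2c_items")
print(b2c_demo)
```

The SCD case would call the same helper with `"picking_wh"`, `"item_cd"`, and `"scd_items"`.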


In [11]:
# Pivot the tables into the form required by the task.
# The long-format dataframes are still used for the further analysis.
b2c_pivot = b2c_aggregated.pivot(index='planning_shop_cluster', columns='exported_at', values='b2c_items').fillna(0)
scd_pivot = scd_aggregated.pivot(index='planning_shop_cluster', columns='exported_at', values='scd_items').fillna(0)


In [12]:
print(b2c_pivot.shape)
b2c_pivot.head()

(29, 91)


exported_at,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,...,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25
planning_shop_cluster
AT,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
BE,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
BG,754,705,846,945,996,964,928,608,773,1120,...,1982,2119,2339,627,903,956,956,956,956,1009
CH,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
CY,34,41,49,51,56,41,42,41,43,46,...,71,73,97,29,33,36,37,32,33,37


In [13]:
print(scd_pivot.shape)
scd_pivot.head()

(29, 91)


exported_at,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,...,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25
planning_shop_cluster
AT,274,296,455,536,472,479,602,392,419,449,...,471,487,600,264,297,304,396,271,291,350
BE,47,76,89,83,83,84,91,75,82,71,...,110,90,145,43,45,54,54,50,50,63
BG,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
CH,392,573,677,746,644,728,954,595,628,684,...,415,435,551,258,291,309,297,284,303,383
CY,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
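
`pivot` expects each (country, date) pair to appear only once, which holds here because the data was grouped first. The sketch below, on toy rows (`long_df` and `dup` are illustrative names), shows the reshape and what happens when a pair repeats; `pivot_table` with an aggregation function is one way to handle that case.

```python
import pandas as pd

# Toy long-format data (values borrowed from the BG and CY rows above).
long_df = pd.DataFrame({
    "exported_at": pd.to_datetime(["2024-05-27", "2024-05-27", "2024-05-28"]),
    "planning_shop_cluster": ["BG", "CY", "BG"],
    "b2c_items": [754, 34, 705],
})

# Countries become rows, dates become columns; missing cells are filled with 0.
wide = long_df.pivot(
    index="planning_shop_cluster", columns="exported_at", values="b2c_items"
).fillna(0)

# A repeated (country, date) pair makes pivot ambiguous, so it raises ValueError.
dup = pd.concat([long_df, long_df.iloc[[0]]], ignore_index=True)
try:
    dup.pivot(index="planning_shop_cluster", columns="exported_at", values="b2c_items")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table resolves duplicates with an explicit aggregation instead.
wide_summed = dup.pivot_table(
    index="planning_shop_cluster", columns="exported_at",
    values="b2c_items", aggfunc="sum", fill_value=0,
)
```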


### Step 2

Aggregate ICD for shipping_warehouse_id=2. This is the forecast number of items that must be sourced from other warehouses to fulfil the B2C orders in warehouse_id=2.

In [14]:
# First we begin with filtering data for shipping warehouse 2.

icd_filtered = icd_sales_df[icd_sales_df['shipping_wh'] == 2]
print(icd_filtered.shape)
icd_filtered.head()


(10556, 8)


Unnamed: 0,forecast_date,version,scenario_id,date,planning_shop_cluster,shipping_wh,picking_wh,item_cd
13,2024-06-04,1,0,2024-06-07,GR,2,3,81
23,2024-06-04,1,0,2024-07-30,NL,2,4,0
32,2024-06-04,1,0,2024-06-04,ZZ,2,1,0
37,2024-06-04,1,0,2024-08-08,DE,2,2,0
38,2024-06-04,1,0,2024-08-11,ES,2,2,0


In [15]:
# We will group the data based on date and planning_shop_cluster and sum the item_cd. 
# This will give us a sum of items forecasted for icd for shipping warehouse 2 for each created_at date and country combination.
icd_filtered_aggregated = icd_filtered.groupby(['date', 'planning_shop_cluster'], as_index=False)['item_cd'].sum()
print(icd_filtered_aggregated.shape)
icd_filtered_aggregated.head()


(2639, 3)


Unnamed: 0,date,planning_shop_cluster,item_cd
0,2024-05-27,AT,0
1,2024-05-27,BE,0
2,2024-05-27,BG,103
3,2024-05-27,CH,0
4,2024-05-27,CY,9


In [16]:
# In the ICD dataframe, date is the created date, not the exported date. The exported date comes from the item cross-docking shift dataframe:
# adding the day column to the date gives the exported date.
# First, merge the two dataframes. The merge keys on date only, so each row is paired with every shift row for that date (2639 rows become 110838).
icd_with_delay_2 = pd.merge(
    icd_filtered_aggregated,
    it_cd_shift_df,
    left_on=['date'],
    right_on=['exported_at_picking_warehouse']
)
print(icd_with_delay_2.shape)
icd_with_delay_2.head()

(110838, 8)


Unnamed: 0,date,planning_shop_cluster,item_cd,exported_at_picking_warehouse,picking_warehouse_id,shipping_warehouse_id,day,share
0,2024-05-27,AT,0,2024-05-27,1,2,0,0.02
1,2024-05-27,AT,0,2024-05-27,1,2,1,0.23
2,2024-05-27,AT,0,2024-05-27,1,2,2,0.61
3,2024-05-27,AT,0,2024-05-27,1,2,3,0.14
4,2024-05-27,AT,0,2024-05-27,1,2,4,0.0


In [17]:
# We will then add the day values to the date values to get the exported_at date values
icd_with_delay_2['exported_at'] = pd.to_datetime(icd_with_delay_2['date']) + pd.to_timedelta(icd_with_delay_2['day'], unit='D')
print(icd_with_delay_2.shape)
icd_with_delay_2.head()

(110838, 9)


Unnamed: 0,date,planning_shop_cluster,item_cd,exported_at_picking_warehouse,picking_warehouse_id,shipping_warehouse_id,day,share,exported_at
0,2024-05-27,AT,0,2024-05-27,1,2,0,0.02,2024-05-27
1,2024-05-27,AT,0,2024-05-27,1,2,1,0.23,2024-05-28
2,2024-05-27,AT,0,2024-05-27,1,2,2,0.61,2024-05-29
3,2024-05-27,AT,0,2024-05-27,1,2,3,0.14,2024-05-30
4,2024-05-27,AT,0,2024-05-27,1,2,4,0.0,2024-05-31


In [18]:
# We will group the resulting dataframe by the exported_at and planning_shop_cluster columns while summing the item_cd values
icd_aggregated_2 = icd_with_delay_2.groupby(['exported_at', 'planning_shop_cluster'], as_index=False)['item_cd'].sum()
print(icd_aggregated_2.shape)
icd_aggregated_2.head()

(2813, 3)


Unnamed: 0,exported_at,planning_shop_cluster,item_cd
0,2024-05-27,AT,0
1,2024-05-27,BE,0
2,2024-05-27,BG,618
3,2024-05-27,CH,0
4,2024-05-27,CY,54


In [19]:
icd_aggregated_2_pivot = icd_aggregated_2.pivot_table(
    index='planning_shop_cluster',
    columns='exported_at',
    values='item_cd',
    aggfunc='sum',
    fill_value=0
)

icd_aggregated_2_pivot = icd_aggregated_2_pivot.rename_axis('country', axis='index')

icd_aggregated_2_pivot.head()

exported_at,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,...,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31
country
AT,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
BE,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
BG,618,1626,2796,4092,5454,6804,8112,8346,8412,8700,...,11286,10158,8874,7404,6660,5610,4500,3390,2280,1170
CH,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
CY,54,108,180,252,336,390,450,450,456,444,...,510,468,426,354,306,258,204,150,102,54
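
The shift table also carries a `share` column and is keyed by warehouse pair. One possible variant, sketched here on toy data with made-up numbers, keeps `picking_wh` in the ICD rows, joins on the date plus the warehouse pair, and weights each shifted row by its share. It assumes that `exported_at_picking_warehouse` is the date to match against the created date (as in the merge above) and that the shares of one warehouse pair sum to 1; `icd`, `shift`, and `shifted_items` are illustrative names.

```python
import pandas as pd

# Toy ICD forecast: created-date demand per country and picking warehouse.
icd = pd.DataFrame({
    "date": pd.to_datetime(["2024-05-27", "2024-05-27"]),
    "planning_shop_cluster": ["BG", "BG"],
    "shipping_wh": [2, 2],
    "picking_wh": [1, 3],
    "item_cd": [100, 50],
})

# Toy shift table: share of items exported `day` days later, per warehouse pair.
shift = pd.DataFrame({
    "exported_at_picking_warehouse": pd.to_datetime(["2024-05-27"] * 4),
    "picking_warehouse_id": [1, 1, 3, 3],
    "shipping_warehouse_id": [2, 2, 2, 2],
    "day": [0, 1, 0, 1],
    "share": [0.2, 0.8, 0.5, 0.5],
})

# Join on date AND warehouse pair so each ICD row only meets its own shift rows.
merged = icd.merge(
    shift,
    left_on=["date", "picking_wh", "shipping_wh"],
    right_on=["exported_at_picking_warehouse", "picking_warehouse_id", "shipping_warehouse_id"],
    how="left",
)
merged["exported_at"] = merged["date"] + pd.to_timedelta(merged["day"], unit="D")
merged["shifted_items"] = merged["item_cd"] * merged["share"]

icd_shifted = merged.groupby(
    ["exported_at", "planning_shop_cluster"], as_index=False
)["shifted_items"].sum()
print(icd_shifted)
```

Because the shares of each pair sum to 1 in this toy table, the shifted total equals the original 150 items; only their export dates change.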


### Step 3

Using the results from point (B) and input (4), calculate how many items will be affected by the item cross-docking process (items transported from the picking warehouse + other items in the same order_id)  
-> results in dataframe “items_minus”


Input (4): assumption on how many items per ICD item are blocked from immediate export by the ICD process.

In [20]:
# Let's look at the it_blocked_df dataframe.
# For each ICD item, the number of items affected is given in the column items_per_cross_docking_cd_item.
print(it_blocked_df.shape)
it_blocked_df.head()

(70012, 4)


Unnamed: 0,date,shipping_wh,country_code,items_per_cross_docking_cd_item
0,2023-03-01,1,AT,3.17
1,2023-03-01,2,AT,3.17
2,2023-03-01,3,AT,3.17
3,2023-03-01,4,AT,3.17
4,2023-03-01,1,BE,2.78


In [21]:
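# Merge the dataframe from the last step with it_blocked_df on date and country.
# it_blocked_df holds one row per shipping_wh, so each (date, country) pair matches several rows (2813 rows become 11252).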

merged_icd_blocked = pd.merge(
    icd_aggregated_2,
    it_blocked_df,
    how='left',
    left_on=['exported_at', 'planning_shop_cluster'],
    right_on=['date', 'country_code']
)
print(merged_icd_blocked.shape)
merged_icd_blocked.head()


(11252, 7)


Unnamed: 0,exported_at,planning_shop_cluster,item_cd,date,shipping_wh,country_code,items_per_cross_docking_cd_item
0,2024-05-27,AT,0,2024-05-27,1,AT,3.17
1,2024-05-27,AT,0,2024-05-27,2,AT,3.17
2,2024-05-27,AT,0,2024-05-27,3,AT,3.17
3,2024-05-27,AT,0,2024-05-27,4,AT,3.17
4,2024-05-27,BE,0,2024-05-27,1,BE,2.78


In [22]:
# Calculate how many items are blocked due to the ICD process
# items_minus = ICD items * blocked items per ICD item
merged_icd_blocked['items_minus'] = merged_icd_blocked['item_cd'] * merged_icd_blocked['items_per_cross_docking_cd_item']
print(merged_icd_blocked.shape)
merged_icd_blocked.head()


(11252, 8)


Unnamed: 0,exported_at,planning_shop_cluster,item_cd,date,shipping_wh,country_code,items_per_cross_docking_cd_item,items_minus
0,2024-05-27,AT,0,2024-05-27,1,AT,3.17,0.0
1,2024-05-27,AT,0,2024-05-27,2,AT,3.17,0.0
2,2024-05-27,AT,0,2024-05-27,3,AT,3.17,0.0
3,2024-05-27,AT,0,2024-05-27,4,AT,3.17,0.0
4,2024-05-27,BE,0,2024-05-27,1,BE,2.78,0.0


In [23]:

# We will aggregate the items that are blocked for each country and exported_at date
items_minus = merged_icd_blocked.groupby(['planning_shop_cluster', 'exported_at'])['items_minus'].sum().reset_index()
print(items_minus.shape)
items_minus.head()


(2813, 3)


Unnamed: 0,planning_shop_cluster,exported_at,items_minus
0,AT,2024-05-27,0.0
1,AT,2024-05-28,0.0
2,AT,2024-05-29,0.0
3,AT,2024-05-30,0.0
4,AT,2024-05-31,0.0


In [24]:
# Pivot the table to get countries as index and exported_at dates as columns
items_minus_pivot = items_minus.pivot_table(
    index='planning_shop_cluster',
    columns='exported_at',
    values='items_minus',
    aggfunc='sum',
    fill_value=0
)

items_minus_pivot = items_minus_pivot.rename_axis('country', axis='index')

items_minus_pivot.head()

exported_at,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,...,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31
country
AT,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
BE,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
BG,5314.8,13983.6,24045.6,35191.2,46904.4,58514.4,69763.2,71775.6,72343.2,74820.0,...,97059.6,87358.8,76316.4,63674.4,57276.0,48246.0,38700.0,29154.0,19608.0,10062.0
CH,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
CY,259.2,518.4,864.0,1209.6,1612.8,1872.0,2160.0,2160.0,2188.8,2131.2,...,2448.0,2246.4,2044.8,1699.2,1468.8,1238.4,979.2,720.0,489.6,259.2
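
Since the blocked-items table has one row per shipping warehouse, one option is to restrict it to warehouse 2 before merging, so that every (date, country) pair matches exactly one factor. The sketch below uses toy data with made-up factors; `icd_by_export`, `blocked`, and `blocked_wh2` are illustrative names, and `validate="many_to_one"` asks pandas to raise if the right side still has duplicate keys.

```python
import pandas as pd

# Toy ICD items per exported_at date and country.
icd_by_export = pd.DataFrame({
    "exported_at": pd.to_datetime(["2024-05-27", "2024-05-27"]),
    "planning_shop_cluster": ["BG", "CY"],
    "item_cd": [100, 10],
})

# Toy blocked-items factors: one row per date, shipping warehouse, and country.
blocked = pd.DataFrame({
    "date": pd.to_datetime(["2024-05-27"] * 4),
    "shipping_wh": [1, 2, 1, 2],
    "country_code": ["BG", "BG", "CY", "CY"],
    "items_per_cross_docking_cd_item": [2.0, 2.5, 1.0, 1.5],
})

# Keep only the factors for shipping warehouse 2 before joining.
blocked_wh2 = blocked[blocked["shipping_wh"] == 2]

merged = icd_by_export.merge(
    blocked_wh2,
    how="left",
    left_on=["exported_at", "planning_shop_cluster"],
    right_on=["date", "country_code"],
    validate="many_to_one",  # raises MergeError if a key repeats on the right
)
merged["items_minus"] = merged["item_cd"] * merged["items_per_cross_docking_cd_item"]
print(merged[["exported_at", "planning_shop_cluster", "items_minus"]])
```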


### Step 4

Using “items minus”, calculate the distribution of exported items for each created_at and aggregate on exported_at 
-> results in dataframe “items_plus”

In [25]:
# Let's look at the item cross dock shift dataframe which we will use in this step. 
# This table has the distribution information. 
it_cd_shift_df.head()

Unnamed: 0,exported_at_picking_warehouse,picking_warehouse_id,shipping_warehouse_id,day,share
0,2024-05-27,1,2,0,0.02
1,2024-05-27,1,2,1,0.23
2,2024-05-27,1,2,2,0.61
3,2024-05-27,1,2,3,0.14
4,2024-05-27,1,2,4,0.0


In [26]:
# Let's merge the dataframes items_minus and it_cd_shift_df
merged_items_shift = pd.merge(
    items_minus.reset_index(),
    it_cd_shift_df,
    left_on=['exported_at'],
    right_on=['exported_at_picking_warehouse'],
    how='left'
)
print(merged_items_shift.shape)
merged_items_shift.head()

(118146, 9)


Unnamed: 0,index,planning_shop_cluster,exported_at,items_minus,exported_at_picking_warehouse,picking_warehouse_id,shipping_warehouse_id,day,share
0,0,AT,2024-05-27,0.0,2024-05-27,1,2,0,0.02
1,0,AT,2024-05-27,0.0,2024-05-27,1,2,1,0.23
2,0,AT,2024-05-27,0.0,2024-05-27,1,2,2,0.61
3,0,AT,2024-05-27,0.0,2024-05-27,1,2,3,0.14
4,0,AT,2024-05-27,0.0,2024-05-27,1,2,4,0.0


In [27]:

# Now calculate the distributed items based on share
merged_items_shift['distributed_items'] = merged_items_shift['items_minus'] * merged_items_shift['share']
print(merged_items_shift.shape)
merged_items_shift.head()


(118146, 10)


Unnamed: 0,index,planning_shop_cluster,exported_at,items_minus,exported_at_picking_warehouse,picking_warehouse_id,shipping_warehouse_id,day,share,distributed_items
0,0,AT,2024-05-27,0.0,2024-05-27,1,2,0,0.02,0.0
1,0,AT,2024-05-27,0.0,2024-05-27,1,2,1,0.23,0.0
2,0,AT,2024-05-27,0.0,2024-05-27,1,2,2,0.61,0.0
3,0,AT,2024-05-27,0.0,2024-05-27,1,2,3,0.14,0.0
4,0,AT,2024-05-27,0.0,2024-05-27,1,2,4,0.0,0.0


In [28]:

# We will then aggregate the distributed items by country (planning_shop_cluster) and exported_at (the day offset from the shift table is not applied here)
items_plus = merged_items_shift.groupby(['planning_shop_cluster', 'exported_at'])['distributed_items'].sum().reset_index()
print(items_plus.shape)
items_plus.head()


(2813, 3)


Unnamed: 0,planning_shop_cluster,exported_at,distributed_items
0,AT,2024-05-27,0.0
1,AT,2024-05-28,0.0
2,AT,2024-05-29,0.0
3,AT,2024-05-30,0.0
4,AT,2024-05-31,0.0


In [29]:
# Let's pivot the table to get the distributed items by country and exported_at date, in the form required for the case study
items_plus_pivot = items_plus.pivot_table(
    index='planning_shop_cluster',
    columns='exported_at',
    values='distributed_items',
    aggfunc='sum',
    fill_value=0
)

# Rename the index to country for clarity
items_plus_pivot = items_plus_pivot.rename_axis('country', axis='index')
print(items_plus_pivot.shape)
items_plus_pivot.head()

(29, 97)


exported_at,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,...,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31
country
AT,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
BE,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
BG,31888.8,83901.6,144273.6,211147.2,281426.4,351086.4,418579.2,430653.6,434059.2,448920.0,...,582357.6,524152.8,457898.4,382046.4,343656.0,289476.0,232200.0,174924.0,117648.0,60372.0
CH,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
CY,1555.2,3110.4,5184.0,7257.6,9676.8,11232.0,12960.0,12960.0,13132.8,12787.2,...,14688.0,13478.4,12268.8,10195.2,8812.8,7430.4,5875.2,4320.0,2937.6,1555.2
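
Distributing blocked items over later dates can be sketched as follows: each `items_minus` value is split by `share`, moved forward by `day`, and then summed per shifted date. This is an illustration under assumptions, not the notebook's method: it uses a single warehouse pair with made-up shares, and `released_at` is a hypothetical column name for the shifted date.

```python
import pandas as pd

# Toy blocked items per country and date.
items_minus_demo = pd.DataFrame({
    "planning_shop_cluster": ["BG", "BG"],
    "exported_at": pd.to_datetime(["2024-05-27", "2024-05-28"]),
    "items_minus": [100.0, 200.0],
})

# Toy shift table for one warehouse pair: 25% released the same day, 75% a day later.
shift_demo = pd.DataFrame({
    "exported_at_picking_warehouse": pd.to_datetime(
        ["2024-05-27", "2024-05-27", "2024-05-28", "2024-05-28"]
    ),
    "day": [0, 1, 0, 1],
    "share": [0.25, 0.75, 0.25, 0.75],
})

merged = items_minus_demo.merge(
    shift_demo,
    left_on="exported_at",
    right_on="exported_at_picking_warehouse",
    how="left",
)
# Shift each portion to the date on which it is released, then weight by share.
merged["released_at"] = merged["exported_at"] + pd.to_timedelta(merged["day"], unit="D")
merged["distributed_items"] = merged["items_minus"] * merged["share"]

items_plus_demo = merged.groupby(
    ["planning_shop_cluster", "released_at"], as_index=False
)["distributed_items"].sum()
print(items_plus_demo)
```

In this toy case the 300 blocked items reappear as 25, 125, and 150 items on three consecutive days, so the total is preserved.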


### Step 5

Calculate final output: B2C + SCD - items_minus + items_plus

In [30]:
# We will combine the results we have got until now.
# Let's begin by combining b2c_aggregated and scd_aggregated
combined_b2c_scd = pd.merge(
    b2c_aggregated,
    scd_aggregated,
    on=['exported_at', 'planning_shop_cluster'],
    how='outer'
).fillna(0)

# Adding the values of b2c_items and scd_items together to get the total items forecast
combined_b2c_scd['total_forecast_items'] = combined_b2c_scd['b2c_items'] + combined_b2c_scd['scd_items']

# Dropping individual columns, they are not required anymore
combined_b2c_scd.drop(columns=['b2c_items', 'scd_items'], inplace=True)

# Renaming the new column for clarity
combined_b2c_scd.rename(columns={'total_forecast_items': 'sales_forecast_items'}, inplace=True)

print(combined_b2c_scd.shape)
combined_b2c_scd.head()

(2639, 3)


Unnamed: 0,exported_at,planning_shop_cluster,sales_forecast_items
0,2024-05-27,AT,274
1,2024-05-27,BE,47
2,2024-05-27,BG,754
3,2024-05-27,CH,392
4,2024-05-27,CY,34


In [31]:
# Next, merge the combined dataframe with items_minus.
final_merged = pd.merge(
    combined_b2c_scd,
    items_minus,
    on=['planning_shop_cluster', 'exported_at'],
    how='left'
).fillna(0)

# Subtracting items_minus from sales_forecast_items
final_merged['sales_forecast_items'] = final_merged['sales_forecast_items'] - final_merged['items_minus']

# Dropping the items_minus column
final_merged.drop(columns=['items_minus'], inplace=True)

print(final_merged.shape)
final_merged.head()

(2639, 3)


Unnamed: 0,exported_at,planning_shop_cluster,sales_forecast_items
0,2024-05-27,AT,274.0
1,2024-05-27,BE,47.0
2,2024-05-27,BG,-4560.8
3,2024-05-27,CH,392.0
4,2024-05-27,CY,-225.2


In [32]:
# Finally, add the items_plus values to the result so far
final_output = pd.merge(
    final_merged,
    items_plus,
    on=['planning_shop_cluster', 'exported_at'],
    how='left'
).fillna(0)

# Adding distributed_items to sales_forecast_items
final_output['sales_forecast_items'] = final_output['sales_forecast_items'] + final_output['distributed_items']

# Drop irrelevant columns
final_output.drop(columns=['distributed_items'], inplace=True)

print(final_output.shape)
final_output.head()

(2639, 3)


Unnamed: 0,exported_at,planning_shop_cluster,sales_forecast_items
0,2024-05-27,AT,274.0
1,2024-05-27,BE,47.0
2,2024-05-27,BG,27328.0
3,2024-05-27,CH,392.0
4,2024-05-27,CY,1330.0


In [33]:
final_output = final_output.pivot(
    index='planning_shop_cluster', 
    columns='exported_at', 
    values='sales_forecast_items'
)

# Fill missing values. Replace NaN with 0
final_output = final_output.fillna(0)

# Display the reshaped DataFrame
final_output

exported_at,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,...,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25
planning_shop_cluster
AT,274.0,296.0,455.0,536.0,472.0,479.0,602.0,392.0,419.0,449.0,...,471.0,487.0,600.0,264.0,297.0,304.0,396.0,271.0,291.0,350.0
BE,47.0,76.0,89.0,83.0,83.0,84.0,91.0,75.0,82.0,71.0,...,110.0,90.0,145.0,43.0,45.0,54.0,54.0,50.0,50.0,63.0
BG,27328.0,70623.0,121074.0,176901.0,235518.0,293536.0,349744.0,359486.0,362489.0,375220.0,...,427940.0,482257.0,544913.0,539847.0,539865.0,539402.0,486254.0,437750.0,382538.0,319381.0
CH,392.0,573.0,677.0,746.0,644.0,728.0,954.0,595.0,628.0,684.0,...,415.0,435.0,551.0,258.0,291.0,309.0,297.0,284.0,303.0,383.0
CY,1330.0,2633.0,4369.0,6099.0,8120.0,9401.0,10842.0,10841.0,10987.0,10702.0,...,11159.0,12169.0,13921.0,13853.0,13857.0,13860.0,12277.0,11264.0,10257.0,8533.0
CZ,284.0,161.0,185.0,340.0,355.0,339.0,445.0,255.0,275.0,293.0,...,430.0,420.0,581.0,152.0,175.0,189.0,175.0,166.0,171.0,208.0
DE,1788.0,1590.0,2391.0,3347.0,2846.0,2988.0,3809.0,2313.0,2407.0,2444.0,...,2414.0,2585.0,3374.0,1409.0,1516.0,1655.0,1621.0,1516.0,1691.0,2144.0
DK,37.0,51.0,53.0,50.0,50.0,63.0,74.0,52.0,54.0,50.0,...,54.0,52.0,70.0,28.0,33.0,34.0,39.0,32.0,36.0,40.0
EE,22.0,20.0,34.0,36.0,36.0,40.0,43.0,42.0,41.0,40.0,...,36.0,36.0,38.0,38.0,36.0,37.0,35.0,50.0,49.0,56.0
ES,7.0,8.0,9.0,8.0,9.0,8.0,8.0,8.0,8.0,8.0,...,12.0,12.0,14.0,3.0,4.0,4.0,4.0,4.0,4.0,4.0


This dataframe gives the number of items that will be exported to warehouse 2 on a daily basis for each country based on the sales forecast.
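
The Step 5 formula, B2C + SCD - items_minus + items_plus, can also be written as index-aligned arithmetic on Series keyed by (exported_at, planning_shop_cluster). The sketch below uses toy values and a hypothetical helper `make_index`; unlike the left merges above, `add` and `sub` with `fill_value=0` keep the union of all keys, so dates that only appear in items_plus are retained.

```python
import pandas as pd

def make_index(pairs):
    """Build a (exported_at, planning_shop_cluster) MultiIndex from tuples."""
    return pd.MultiIndex.from_tuples(pairs, names=["exported_at", "planning_shop_cluster"])

# Toy inputs with made-up values.
b2c = pd.Series([754.0, 34.0], index=make_index([("2024-05-27", "BG"), ("2024-05-27", "CY")]))
scd = pd.Series([274.0], index=make_index([("2024-05-27", "AT")]))
minus = pd.Series([100.0], index=make_index([("2024-05-27", "BG")]))
plus = pd.Series(
    [40.0, 60.0],
    index=make_index([("2024-05-27", "BG"), ("2024-05-28", "BG")]),
)

# fill_value=0 treats a key missing on one side as zero instead of producing NaN.
final = b2c.add(scd, fill_value=0).sub(minus, fill_value=0).add(plus, fill_value=0)

# Countries as rows, dates as columns.
wide = final.unstack("exported_at").fillna(0)
print(wide)
```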