# Part 1 - Simulating Stock

This notebook details the process of creating a simulated dataset.

This simulation models a distribution warehouse for a tech e-commerce website that acts as an intermediary between a large supplier and the end customer. 

The warehouse:

- Purchases shipments in bulk from an upstream supplier.
- Stores and manages the stock.
- Fulfills customer orders through daily outbound deliveries.
- Aims to balance utilization and delivery efficiency while keeping costs low.

The simulation explores different stock replenishment strategies (scheduled vs. Just-in-Time) and tracks key performance metrics such as stock levels, unmet demand and shipment volumes. This reflects a common B2C logistics model in which warehouses play a critical role in order fulfilment.

Import the required modules:

In [1]:
# pandas dataframes:
import pandas as pd
# statistical distributions:
import numpy as np
from scipy import stats
import math

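# seed NumPy's global random generator so the simulation is reproducible
# (scipy.stats .rvs calls without a random_state also draw from it)
np.random.seed(16)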

## Basic set-up

First, we need to set up the basic parameters of the simulation, including:

- The period over which the simulation runs
- The products stocked in the warehouse
- The number of warehouses in the network


### Time period

The period will start on 1st January 2024 (which is conveniently a Monday) and run for 90 days, roughly one quarter.

In [2]:
start_date = pd.to_datetime("2024-01-01")
dates = pd.date_range(start_date, periods=90)
df_dates = pd.DataFrame({"date": dates})

### Products

The products are items that will be sold and despatched from the warehouse. For the simulation, we will use product IDs. More information can be found in the products dimension table.

In [3]:
product_ids = [f"P{str(i).zfill(3)}" for i in range(1, 16)]
df_products = pd.DataFrame({"product_id": product_ids})

Cross join the two dataframes to get every combination of product and date, representing the inventory flow for each product on each day:

In [4]:
df = df_dates.merge(df_products, how="cross")

### Warehouses

For simplicity, we will have one warehouse. This can be changed in the future to incorporate more warehouses:

In [5]:
df["warehouse_id"] = "WH1"

### Additional set-up

Initialize inbound units to zero:

In [6]:
df["inbound_units"] = 0

Classify each product by how quickly it sells. The products that sell quickly are assigned lower product numbers:

In [7]:
fast_movers = product_ids[0:5]
medium_movers = product_ids[5:10]
slow_movers = product_ids[10:]

This method of classification resembles a simplified **ABC analysis** framework.

In such a framework, goods that are expected to provide the majority of the revenue (here, the fast movers) are assigned to **class A**. These goods require the most frequent restocking. Goods with moderate demand that contribute a moderate share of revenue are classified as **class B**; in our simulation, these are the medium movers. Finally, **class C** goods are either low-value or infrequently ordered; these are represented by the slow movers.

## Inventory flow

A key part of the simulation is **inventory flow**, the movement of products into and out of the warehouse. For the purposes of the simulation, we will assume that each order is for exactly one item.

Inventory flow can be modeled by considering:

- Demand (customer orders for products).
- Outbound items (how many products actually leave the warehouse).
- Inbound items (how many items are delivered to the warehouse from the supplier).
- Inventory level (how many items are left in the warehouse after inbound and outbound items are considered).
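These four quantities are linked by a daily balance. Writing $I_t$ for a product's closing inventory on day $t$ (notation introduced here for illustration), the update applied in <code>simulate_inventory</code> below amounts to:

$$
\begin{aligned}
\text{outbound}_t &= \min\left(\text{demand}_t,\; I_{t-1} + \text{inbound}_t\right) \\
\text{unmet}_t &= \text{demand}_t - \text{outbound}_t \\
I_t &= I_{t-1} + \text{inbound}_t - \text{outbound}_t
\end{aligned}
$$

Because inbound stock is added before orders are shipped, a delivery arriving on day $t$ can be used to meet that day's demand.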

### Demand


Simulate orders for each product. Every day, there will be a number of items ordered for each product.

We can use different probability distributions to model different behaviours of base demand.

- **Fast movers** (Class A) - Normal distribution (high and consistent demand).
- **Medium movers** (Class B) - Poisson distribution (steady but lower demand). 
- **Slow movers** (Class C) - Negative binomial distribution (low and erratic demand).
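As a rough check of the descriptions above, the sketch below freezes the three SciPy distributions with the parameters used later in <code>simulate_demand</code> and reports each one's mean, standard deviation and coefficient of variation (CV, standard deviation divided by mean). It covers only the base demand, before the multipliers and extra noise described below are applied.

```python
from scipy import stats

# Base-demand distributions, using the parameters from simulate_demand
# (before date/event multipliers and the extra normal noise are applied).
base_distributions = {
    "A": stats.norm(loc=20, scale=8),   # fast movers
    "B": stats.poisson(mu=9),           # medium movers (NumPy calls this parameter lam)
    "C": stats.nbinom(n=2, p=0.3),      # slow movers
}

summary = {}
for item_class, dist in base_distributions.items():
    mean, var = (float(x) for x in dist.stats(moments="mv"))
    std = var ** 0.5
    summary[item_class] = {"mean": mean, "std": std, "cv": std / mean}
    print(f"Class {item_class}: mean={mean:.2f}, std={std:.2f}, CV={std / mean:.2f}")
```

Class C's CV (about 0.85) is more than twice that of Class A (0.40) or Class B (0.33), which matches its "low and erratic" label.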

In [8]:
def assign_group(row):
    product_id = row["product_id"]
    if product_id in fast_movers:
        return "A"
    elif product_id in medium_movers:
        return "B"
    else:
        return "C"

In [9]:
df["class"] = df.apply(assign_group, axis=1)

For more variable demand, we can also modify the underlying base demand by adding noise with an additional normal distribution. 

We can also add a chance of random demand spikes and dips, as well as demand that changes over the course of each month. This unpredictability reflects real-world conditions and means that different replenishment approaches may be needed to find the best solution.

First, generate a demand multiplier from the <code>date</code> column:

In [10]:
def date_based_demand(date):
    day = date.day
    if day <= 7:
        # higher demand at the start of the month
        return np.random.normal(1.2, 0.05)
    elif day >= 22:
        # lower demand at the end of the month
        return np.random.normal(0.7, 0.05)
    else:
        return 1

In [11]:
distributions = pd.DataFrame()

In [12]:
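# 1,000 draws from N(25, 5). Not referenced again in this notebook, but the draws
# advance the seeded RNG, so removing this line would change the simulated demand
distributions["normal"] = np.random.normal(loc=25, scale=5, size=1000)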

Use the date function in the larger <code>simulate_demand</code> function:

In [13]:
def simulate_demand(row):
    product_id = row["product_id"]
    date_multiplier = date_based_demand(row["date"])
    

    # Random events: 10% chance of a demand spike (x1.2), 10% chance of a dip (x0.7)
    event_chance = np.random.rand()

    if event_chance > 0.9:
        demand_multiplier = 1.2
    elif event_chance < 0.1:
        demand_multiplier = 0.7
    else:
        demand_multiplier = 1

    # simulate the demand using relevant distributions 
    if product_id in fast_movers:
        base_demand = np.random.normal(loc=20, scale=8)
        fluct_scale = 5
    elif product_id in medium_movers:
        base_demand = np.random.poisson(lam=9)
        fluct_scale = 3
    else:
        base_demand = stats.nbinom.rvs(n=2, p=0.3)
        fluct_scale = 1

    # simulate random fluctuations each day
    fluctuation = np.random.normal(loc=0, scale=fluct_scale)
    demand = int(max(0, base_demand * demand_multiplier * date_multiplier + fluctuation))

    return demand


In [14]:
df["demand"] = df.apply(simulate_demand, axis=1)
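The random spikes and dips are equally likely but not equally large, so they do not quite cancel out. The sketch below estimates the average multipliers implied by <code>date_based_demand</code> and <code>simulate_demand</code>, taking each normal draw at its mean (1.2 or 0.7). It ignores the zero floor and integer truncation applied to demand, so treat the results as approximate.

```python
# Event multiplier from simulate_demand: 10% chance of a spike (x1.2),
# 10% chance of a dip (x0.7), otherwise unchanged.
EVENT_OUTCOMES = [(0.1, 1.2), (0.1, 0.7), (0.8, 1.0)]
expected_event = sum(p * m for p, m in EVENT_OUTCOMES)


def mean_date_multiplier(days_in_month: int) -> float:
    """Average of date_based_demand over a month, using the mean of each normal draw."""
    multipliers = []
    for day in range(1, days_in_month + 1):
        if day <= 7:
            multipliers.append(1.2)   # start-of-month boost
        elif day >= 22:
            multipliers.append(0.7)   # end-of-month slowdown
        else:
            multipliers.append(1.0)
    return sum(multipliers) / days_in_month


print(f"Expected event multiplier: {expected_event:.2f}")
for days in (31, 29):  # January and February 2024
    print(f"{days}-day month: mean date multiplier {mean_date_multiplier(days):.3f}")
```

On these assumptions, the event multiplier lowers average demand by about 1% (0.99), and the monthly pattern lowers it by roughly 3 to 5% (0.948 for a 31-day month, 0.966 for February 2024).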

Initialize the <code>actual_outbound</code> column to be equal to the demand.

In [15]:
df["actual_outbound"] = df["demand"]

### Inbound stock
Next, we need to model deliveries from the supplier to replenish the warehouse inventory.

There are different stock replenishment strategies that vary in how often and how much stock is replenished in each cycle.

We will be comparing two replenishment strategies:

- Weekly scheduled deliveries.
- Just-in-Time (JIT) deliveries.

We will also consider **lead time**, the time it takes for an order to arrive at the warehouse after it has been ordered.

#### Weekly deliveries

**Weekly deliveries** are consolidated orders delivered on a weekly schedule. The quantity ordered each week is seven times the (rounded) average daily demand over the previous seven days, rounded up to the nearest multiple of 10 to simulate items being packed in groups of 10. In the simulation, these orders are placed every Friday, unless stock on hand is already at least 1.5 times the planned delivery, and arrive after the lead time has passed (3 days by default).
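The ordering rule can be isolated into two small functions. This sketch mirrors the weekly branch of <code>simulate_inventory</code> below; the helper names <code>weekly_order_quantity</code> and <code>should_place_weekly_order</code> are hypothetical and not part of the notebook.

```python
import math


def weekly_order_quantity(rolling_average: float, packet_size: int = 10) -> int:
    """One week of demand at the 7-day average, rounded up to whole packets."""
    delivery_quantity = 7 * int(rolling_average)
    total_packets = math.ceil(delivery_quantity / packet_size)
    return total_packets * packet_size


def should_place_weekly_order(inventory: int, total_delivery: int) -> bool:
    """Skip the order if stock on hand is already 1.5x the planned delivery or more."""
    return inventory < 1.5 * total_delivery


qty = weekly_order_quantity(23.0)  # 7 * 23 = 161 -> rounded up to 170
print(qty, should_place_weekly_order(200, qty), should_place_weekly_order(300, qty))
# 170 True False
```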

Weekly deliveries consolidate many small shipments into a single larger delivery, saving money and reducing the carbon footprint. However, they are less responsive to changes in demand and can lead to excess or insufficient stock, incurring high storage costs or lost sales.

#### Just-in-Time (JIT) deliveries

**Just-in-Time deliveries** are scheduled when the inventory of a product is running low, so that new goods arrive as they are needed. This strategy can reduce holding costs by keeping the inventory lean, but requires a reliable and responsive supplier with a short lead time to avoid stock outs.
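In <code>simulate_inventory</code> below, "running low" means that end-of-day stock has fallen below a reorder point equal to <code>restock_point_mod</code> (0.35 by default) times the class's starting stock, and each order replenishes one full starting stock. A minimal sketch of that trigger, using hypothetical helper names:

```python
# Starting (buffer) stock per ABC class, as used in simulate_inventory.
STARTING_STOCK = {"A": 250, "B": 80, "C": 40}


def jit_reorder_point(item_class: str, restock_point_mod: float = 0.35) -> float:
    """Reorder point as a fraction of the class's starting stock."""
    return STARTING_STOCK[item_class] * restock_point_mod


def jit_order_quantity(item_class: str, inventory: float, has_pending_order: bool,
                       restock_point_mod: float = 0.35) -> int:
    """Return the units to order today (0 if no order is triggered)."""
    below_reorder_point = inventory < jit_reorder_point(item_class, restock_point_mod)
    if below_reorder_point and not has_pending_order:
        return STARTING_STOCK[item_class]  # top back up by one full starting stock
    return 0


for item_class in STARTING_STOCK:
    print(item_class, jit_reorder_point(item_class))
print(jit_order_quantity("B", inventory=25, has_pending_order=False))  # 80
print(jit_order_quantity("B", inventory=25, has_pending_order=True))   # 0
```

The <code>has_pending_order</code> flag prevents a second order from being placed while one is already in transit.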

### Unmet demand

In the simulation, if there are not enough products in the inventory to meet the demand for that day, the excess orders are not fulfilled. They are recorded as unmet demand and lost, rather than carried over to the next day.

In [16]:
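def simulate_inventory(group, policy="weekly", lead_time=3, restock_point_mod=0.35):
    """Simulate daily stock for one product/warehouse group under a replenishment policy.

    policy: "weekly" (order every Friday) or "JIT" (order when stock falls below
    restock_point_mod * starting stock). Orders arrive lead_time days after being placed.
    Demand that cannot be met from stock is recorded as unmet and lost.
    """
    group = group.copy()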

    # Establish starting stock (buffer stock) based on ABC classification
    item_class = group["class"].iloc[0]
    starting_stock_dict = {"A": 250, "B": 80, "C": 40}
    inventory = starting_stock_dict.get(item_class)
    reorder_point = starting_stock_dict.get(item_class) * restock_point_mod

    # Initialize empty lists
    inventory_list = []
    actual_outbound_list = []
    inbound_list = []
    pending_orders = [] # contains tuples with date and quantity
    stockout_list = []

    # Calculate the seven-day rolling average for demand
    group["seven_day_average"] = group["demand"].shift(1).rolling(window=7, min_periods=1).mean().round()

    # iterate through the group
    for index, row in group.iterrows():
        
        date = row["date"]
        demand = row["demand"]
        rolling_average = row["seven_day_average"]

        # 1. apply inbound orders in the inbound orders list if date matches
        arrivals_today = [q for (d, q) in pending_orders if d == date]
        inbound_today = sum(arrivals_today)
        inventory += inbound_today
        pending_orders = [(d, q) for (d, q) in pending_orders if d != date]

        # 2. ship out orders equal to demand
        actual_outbound = min(demand, inventory)
        inventory -= actual_outbound

        # prevent multiple orders when an order is inbound (checks if an order is scheduled at a time after current date)
        has_pending_order = any(d > date for (d, _) in pending_orders)

        # 3. replenishment logic
        if policy == "JIT":
            # Every day, check whether stock has fallen below the reorder point
            if inventory < reorder_point and not has_pending_order:
                delivery_date = date + pd.Timedelta(days=lead_time)
                pending_orders.append((delivery_date, starting_stock_dict.get(item_class)))

        elif policy == "weekly" and date.dayofweek == 4:
            # uses the rolling average for the week
            if rolling_average and not pd.isna(rolling_average):
                delivery_quantity = 7 * int(rolling_average)
                # Simulate items being packed in groups of 10
                packet_size = 10
                total_packets = math.ceil(delivery_quantity / packet_size)
                total_delivery = total_packets * packet_size
                delivery_date = date + pd.Timedelta(days=lead_time)
                # avoid over-stocking
                if inventory < (1.5 * total_delivery):
                    pending_orders.append((delivery_date, total_delivery))

        inventory_list.append(inventory)
        actual_outbound_list.append(actual_outbound)
        inbound_list.append(inbound_today)
        stockout_list.append(actual_outbound < demand)
            
                
    group["inventory_level"] = inventory_list
    group["actual_outbound"] = actual_outbound_list
    group["inbound_units"] = inbound_list
    group["stockout_flag"] = stockout_list
    group["unmet_demand"] = group["demand"] - group["actual_outbound"]

    return group

## Simulation

With the demand already calculated, we can test how well each replenishment strategy keeps up with the same predefined demand, and how lean an inventory it maintains to minimize storage costs:

In [17]:
weekly_df = df.groupby(['product_id', 'warehouse_id'], group_keys=False).apply(simulate_inventory, policy="weekly")
jit_df = df.groupby(['product_id', 'warehouse_id'], group_keys=False).apply(simulate_inventory, policy="JIT")

Aggregate the products by class, and examine the outbound deliveries and the inventory level:

In [18]:
weekly_df.groupby("class").agg({"demand": "sum", "actual_outbound": "sum", "inventory_level": ["max", "mean"], "unmet_demand": "sum"}).round()

| class | demand (sum) | actual_outbound (sum) | inventory_level (max) | inventory_level (mean) | unmet_demand (sum) |
|:--|--:|--:|--:|--:|--:|
| A | 8212 | 8039 | 297 | 126.0 | 173 |
| B | 3680 | 3606 | 122 | 54.0 | 74 |
| C | 1843 | 1816 | 87 | 37.0 | 27 |


Total orders (weekly), counted as product-days with non-zero inbound units:

In [19]:
weekly_df["inbound_units"].ne(0).sum()

152

In [20]:
jit_df.groupby("class").agg({"demand": "sum", "actual_outbound": "sum",  "inventory_level": ["max", "mean"], "unmet_demand": "sum"}).round()

| class | demand (sum) | actual_outbound (sum) | inventory_level (max) | inventory_level (mean) | unmet_demand (sum) |
|:--|--:|--:|--:|--:|--:|
| A | 8212 | 8212 | 324 | 156.0 | 0 |
| B | 3680 | 3625 | 92 | 44.0 | 55 |
| C | 1843 | 1722 | 50 | 23.0 | 121 |


Total orders (JIT), counted as product-days with non-zero inbound units:

In [21]:
jit_df["inbound_units"].ne(0).sum()

116

#### Missed orders

In terms of missed orders, Just-in-Time deliveries clearly outperform weekly deliveries for Class A goods, which tend to have high and consistent demand: JIT leaves no demand unmet, compared with 173 units under the weekly schedule. Targeted restocking, triggered as soon as stock runs low, keeps inventory levels adequate for demand and reduces the number of stockouts.

In Class B, JIT also performs better, albeit by a smaller margin (55 vs 74 units unmet). This shows that the benefits of JIT depend on the predictability and consistency of the demand.

However, the opposite is true for Class C goods (121 units unmet under JIT vs 27 under the weekly schedule). Due to their more sporadic demand, Class C goods benefit more from weekly top-ups, which maintain a small amount of buffer stock to absorb sudden, unexpected demand. Without that buffer, JIT deliveries may struggle to keep up.

These patterns show that no single replenishment strategy is best, and the right strategy should be chosen according to the demand profile of the product. This supports the adoption of a **hybrid strategy** where the replenishment method is specific to the product class. In this case, Class A is best suited to JIT and Class C to weekly deliveries. As an intermediary class, Class B performs similarly under both strategies, although JIT has a small advantage, so the correct strategy for this class may need to be decided on a product-by-product basis.
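The same comparison can be expressed as a fill rate (units shipped divided by units demanded). The sketch below computes it from the <code>demand</code> and <code>actual_outbound</code> totals copied from the two tables above (3-day lead time); the dictionary layout is just one convenient choice.

```python
# (demand, actual_outbound) per class, copied from the weekly and JIT summary tables
results = {
    "weekly": {"A": (8212, 8039), "B": (3680, 3606), "C": (1843, 1816)},
    "JIT": {"A": (8212, 8212), "B": (3680, 3625), "C": (1843, 1722)},
}

# Fill rate = share of demanded units that were actually shipped
fill_rates = {
    policy: {cls: shipped / demand for cls, (demand, shipped) in classes.items()}
    for policy, classes in results.items()
}

for cls in ("A", "B", "C"):
    weekly, jit = fill_rates["weekly"][cls], fill_rates["JIT"][cls]
    print(f"Class {cls}: weekly {weekly:.1%}, JIT {jit:.1%}")
```

Class C under JIT (93.4%) is the only combination below 97%.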

#### Inventory level

One of the clear advantages of JIT deliveries is the ability to maintain a leaner inventory, thereby reducing storage costs.

Inventory levels tend to be lower under JIT than under weekly deliveries: the mean inventory level is 44 vs 54 units for Class B and 23 vs 37 units for Class C. Class A is the exception, with JIT holding more stock on average (156 vs 126 units), which is consistent with each JIT order replenishing the full starting stock of 250 units. However, the leaner inventory comes at the expense of missed orders, particularly in Class C. In these cases, the cost of lost revenue likely outweighs the storage savings, especially when demand is irregular. This highlights the balance needed between minimizing held inventory and ensuring product availability, and supports the implementation of a hybrid delivery strategy to optimize overall efficiency.

#### Number of orders

Unexpectedly, there are more orders under the weekly strategy (152) than under the JIT strategy (116). This may seem counter-intuitive, but these figures count product-level deliveries, which are not the same as the number of deliveries arriving at the warehouse. JIT orders are event-driven: each is triggered when the stock of a given item dips below its reorder point and restores the full starting stock, so individual products are reordered less often and their deliveries are spread out across the week. Weekly orders, on the other hand, are time-driven and occur on a set schedule, so products are reordered on most Fridays and their deliveries all arrive on the same day, where they can be consolidated into a single larger delivery. We will examine this more closely in the warehouse analytics notebook.
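The difference between product-level deliveries and delivery days can be seen on a toy example. The data below are hypothetical, and treating all products that arrive on the same day as one consolidated delivery is an assumption for illustration; <code>delivery_counts</code> is a hypothetical helper.

```python
import pandas as pd


def delivery_counts(df: pd.DataFrame) -> tuple:
    """Return (product-level deliveries, distinct days with at least one arrival)."""
    arrivals = df[df["inbound_units"].ne(0)]
    return len(arrivals), int(arrivals["date"].nunique())


# Hypothetical data: two products over three days
dates = pd.to_datetime(["2024-01-08", "2024-01-09", "2024-01-10"]).repeat(2)
products = ["P001", "P002"] * 3

# Weekly-style pattern: both products arrive together on the Monday
weekly_toy = pd.DataFrame({"date": dates, "product_id": products,
                           "inbound_units": [170, 70, 0, 0, 0, 0]})
# JIT-style pattern: each product arrives on its own day
jit_toy = pd.DataFrame({"date": dates, "product_id": products,
                        "inbound_units": [0, 0, 250, 0, 0, 80]})

print(delivery_counts(weekly_toy))  # (2, 1)
print(delivery_counts(jit_toy))     # (2, 2)
```

Both patterns contain the same number of product-level deliveries, but the weekly pattern needs only one arrival day.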

### Lead time

It is important to consider lead time when choosing a replenishment strategy.

#### Short lead time

The effectiveness of JIT deliveries is highly dependent on having a short lead time. A responsive supply chain is essential for ensuring that shipments arrive just before inventory is depleted, effectively minimizing both holding costs and the risk of stock outs.

Short lead times also benefit any scheduled deliveries that rely on forecasting (such as the seven-day moving average used in this simulation). In this case, a shorter lead time reduces the lag between when the forecast is made and when the shipment actually arrives, so replenishment is better aligned with current demand.

We will investigate how much the performance of both strategies improves when the lead time is reduced from 3 days to a single day:

In [22]:
short_weekly_df = df.groupby(['product_id', 'warehouse_id'], group_keys=False).apply(simulate_inventory, policy="weekly", lead_time=1)
short_jit_df = df.groupby(['product_id', 'warehouse_id'], group_keys=False).apply(simulate_inventory, policy="JIT", lead_time=1)

In [23]:
short_weekly_df.groupby("class").agg({"demand": "sum", "actual_outbound": "sum", "inventory_level": ["max", "mean"], "unmet_demand": "sum"}).round()

| class | demand (sum) | actual_outbound (sum) | inventory_level (max) | inventory_level (mean) | unmet_demand (sum) |
|:--|--:|--:|--:|--:|--:|
| A | 8212 | 8170 | 311 | 149.0 | 42 |
| B | 3680 | 3660 | 128 | 68.0 | 20 |
| C | 1843 | 1838 | 87 | 45.0 | 5 |


In [24]:
short_jit_df.groupby("class").agg({"demand": "sum", "actual_outbound": "sum", "inventory_level": ["max", "mean"], "unmet_demand": "sum"}).round()

| class | demand (sum) | actual_outbound (sum) | inventory_level (max) | inventory_level (mean) | unmet_demand (sum) |
|:--|--:|--:|--:|--:|--:|
| A | 8212 | 8212 | 331 | 191.0 | 0 |
| B | 3680 | 3680 | 105 | 59.0 | 0 |
| C | 1843 | 1834 | 53 | 28.0 | 9 |


Reducing lead time significantly improves the efficiency of both replenishment strategies. With a shorter lead time, there are fewer stockouts and demand is met more consistently. Under the weekly schedule, missed orders fall by about 76% (from 274 to 67 units). This comes at the expense of carrying slightly more inventory, but it is a good trade-off for better item availability.

The JIT strategy also benefits considerably from a shorter lead time, with a 95% decrease in missed orders (from 176 to 9 units). It fully meets demand for Class A and Class B goods, and keeps a leaner inventory than the weekly schedule for Classes B and C, although not for Class A (191 vs 149 units on average). Class C items also improve, but interestingly, the scheduled delivery still performs better (5 vs 9 units unmet). This highlights the robustness of time-based stock replenishment when dealing with goods that have sporadic or low demand patterns.
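The change in total missed orders (summing <code>unmet_demand</code> over the three classes) can be computed directly from the tables for the 3-day and 1-day lead times; <code>pct_change</code> is a hypothetical helper:

```python
# Unmet demand per class, copied from the summary tables, keyed by (policy, lead time)
unmet = {
    ("weekly", 3): {"A": 173, "B": 74, "C": 27},
    ("weekly", 1): {"A": 42, "B": 20, "C": 5},
    ("JIT", 3): {"A": 0, "B": 55, "C": 121},
    ("JIT", 1): {"A": 0, "B": 0, "C": 9},
}


def pct_change(before: float, after: float) -> float:
    """Percentage change from before to after (negative means a reduction)."""
    return 100 * (after - before) / before


for policy in ("weekly", "JIT"):
    before = sum(unmet[(policy, 3)].values())
    after = sum(unmet[(policy, 1)].values())
    print(f"{policy}: {before} -> {after} unmet units ({pct_change(before, after):+.1f}%)")
```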

More generally, this shows the value of a responsive supply chain: reducing the lead time markedly improves efficiency under both strategies.

#### Long lead time

Conversely, an unresponsive supply chain with a long lead time significantly reduces the benefits of JIT deliveries. Stockouts are more likely to occur while a shipment is still in transit, especially when demand is volatile or rising.

This is also true for scheduled deliveries that make use of forecasting. The lag between when the forecast is made and when the goods arrive is larger, making the forecast less reliable. As a result, the quantity delivered may no longer reflect actual demand by the time it arrives, leading to excess stock or unmet demand.

We can investigate the results of increasing the lead time from 3 days to 5 days:

In [25]:
long_weekly_df = df.groupby(['product_id', 'warehouse_id'], group_keys=False).apply(simulate_inventory, policy="weekly", lead_time=5)
long_jit_df = df.groupby(['product_id', 'warehouse_id'], group_keys=False).apply(simulate_inventory, policy="JIT", lead_time=5)

In [26]:
long_weekly_df.groupby("class").agg({"demand": "sum", "actual_outbound": "sum", "inventory_level": ["max", "mean"], "unmet_demand": "sum"}).round()

| class | demand (sum) | actual_outbound (sum) | inventory_level (max) | inventory_level (mean) | unmet_demand (sum) |
|:--|--:|--:|--:|--:|--:|
| A | 8212 | 7796 | 273 | 110.0 | 416 |
| B | 3680 | 3459 | 138 | 47.0 | 221 |
| C | 1843 | 1718 | 92 | 32.0 | 125 |


In [27]:
long_jit_df.groupby("class").agg({"demand": "sum", "actual_outbound": "sum", "inventory_level": ["max", "mean"], "unmet_demand": "sum"}).round()

| class | demand (sum) | actual_outbound (sum) | inventory_level (max) | inventory_level (mean) | unmet_demand (sum) |
|:--|--:|--:|--:|--:|--:|
| A | 8212 | 7919 | 263 | 124.0 | 293 |
| B | 3680 | 3285 | 84 | 34.0 | 395 |
| C | 1843 | 1550 | 49 | 18.0 | 293 |


Increasing the lead time worsens performance under both strategies but, as expected, it has a bigger impact on JIT deliveries: total unmet demand rises from 274 to 762 units under the weekly schedule, but from 176 to 981 units under JIT. With a longer lead time, replenishment arrives too late, resulting in multiple days of missed orders and undermining the core efficiencies of the strategy.

#### Restock point

The problem of a long lead time can be addressed by raising the reorder point or by ordering safety stock to absorb unexpected demand.
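A common textbook rule sets the reorder point to the demand expected while an order is in transit, plus any safety stock. The simulation does not use this rule (it sets the reorder point as a fixed fraction of starting stock), but the sketch below uses it as a rough yardstick, with average daily demand per product derived from the class totals in the tables (five products per class, 90 days). <code>textbook_reorder_point</code> is a hypothetical helper, and the check ignores demand variability entirely.

```python
# Rough check using average demand only (no allowance for variability).
CLASS_TOTAL_DEMAND = {"A": 8212, "B": 3680, "C": 1843}  # 90-day totals from the tables
STARTING_STOCK = {"A": 250, "B": 80, "C": 40}
PRODUCTS_PER_CLASS = 5
DAYS = 90


def textbook_reorder_point(mean_daily_demand: float, exposure_days: int,
                           safety_stock: float = 0.0) -> float:
    """Expected demand over the exposure window, plus optional safety stock."""
    return mean_daily_demand * exposure_days + safety_stock


lead_time = 5
# An order placed at the end of day d is shelved before shipping on day d + lead_time,
# so existing stock has to cover roughly lead_time - 1 days of demand.
exposure_days = lead_time - 1

covers = {}
for item_class, total in CLASS_TOTAL_DEMAND.items():
    mean_daily = total / (PRODUCTS_PER_CLASS * DAYS)
    needed = textbook_reorder_point(mean_daily, exposure_days)
    for mod in (0.35, 0.5):
        sim_point = STARTING_STOCK[item_class] * mod
        covers[(item_class, mod)] = sim_point >= needed
        print(f"Class {item_class}: ~{needed:.1f} units needed, "
              f"restock_point_mod={mod} gives {sim_point:.1f}")
```

At 35%, the reorder points for Classes B and C (28 and 14 units) fall below even the average demand expected during the window (about 33 and 16 units), whereas 50% lifts both above it. Class A clears the average at 35%, but by a margin of under 15 units, which its day-to-day swings in demand could plausibly exceed.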

We can repeat the long lead time JIT simulation, but this time reorder when inventory falls below 50% of the starting stock rather than 35% (<code>restock_point_mod=0.5</code>):

In [28]:
high_long_jit_df = df.groupby(['product_id', 'warehouse_id'], group_keys=False).apply(simulate_inventory, policy="JIT", lead_time=5, restock_point_mod=0.5)

In [29]:
high_long_jit_df.groupby("class").agg({"demand": "sum", "actual_outbound": "sum", "inventory_level": ["max", "mean"], "unmet_demand": "sum"}).round()

| class | demand (sum) | actual_outbound (sum) | inventory_level (max) | inventory_level (mean) | unmet_demand (sum) |
|:--|--:|--:|--:|--:|--:|
| A | 8212 | 8177 | 335 | 157.0 | 35 |
| B | 3680 | 3525 | 98 | 41.0 | 155 |
| C | 1843 | 1646 | 51 | 21.0 | 197 |


When the reorder point is raised, demand is met more consistently (total unmet demand falls from 981 to 387 units), but at the expense of a lean inventory: the mean inventory level rises for every class. This effectively transforms the JIT model into a hybrid model, often referred to as **JIT with safety stock**.

The same principle also applies to weekly scheduled deliveries: by ordering extra stock each week, businesses can mitigate the risk of stock outs caused by long lead times. These adaptations show that while a long lead time introduces risk, that risk can be reduced with buffering strategies, such as raising the reorder point or maintaining safety stock. Both trade a lean inventory for meeting demand more consistently.

## Final product-level dataset

We now return to the original two dataframes, which use the default lead time of 3 days.

First, merge the two dataframes to make a combined dataframe containing all of the information:


In [30]:
common_columns = ["date", "product_id", "warehouse_id", "class", "demand", "seven_day_average"]
combined_df = pd.merge(weekly_df, jit_df, how="left", on=common_columns, suffixes=("_weekly", "_jit"))

Keep only the relevant columns: <code>class</code> is already recorded in the product dimension table, and <code>seven_day_average</code> is a calculated column that was only used for forecasting under the weekly scheduled policy, so both are dropped.

Reorder the columns in a logical way:

In [31]:
ordered_columns = ["date",
                   "product_id",
                   "warehouse_id",
                   "demand",
                   "inbound_units_weekly",
                   "actual_outbound_weekly",
                   "inventory_level_weekly",
                   "unmet_demand_weekly",
                   "stockout_flag_weekly",
                   "inbound_units_jit",
                   "actual_outbound_jit",
                   "inventory_level_jit",
                   "unmet_demand_jit",
                   "stockout_flag_jit",]

In [32]:
combined_df = combined_df[ordered_columns]

By aggregating the combined product-level data by day, we can build the basis for a dataset that reflects the daily operational performance of the warehouse, including the number of inbound and outbound shipments.
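A minimal sketch of that daily aggregation is shown below. It assumes one inbound shipment per product-day with non-zero inbound units; the toy rows are hypothetical, only the weekly columns are used for brevity, and Part 2 may aggregate differently.

```python
import pandas as pd

# Hypothetical rows in the same shape as combined_df (weekly columns only)
toy = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-08", "2024-01-08", "2024-01-09", "2024-01-09"]),
    "product_id": ["P001", "P002", "P001", "P002"],
    "warehouse_id": "WH1",
    "inbound_units_weekly": [170, 70, 0, 0],
    "actual_outbound_weekly": [18, 9, 21, 0],
    "unmet_demand_weekly": [0, 0, 0, 2],
})

# One row per day and warehouse, using pandas named aggregation
daily = toy.groupby(["date", "warehouse_id"], as_index=False).agg(
    inbound_shipments=("inbound_units_weekly", lambda s: int(s.ne(0).sum())),
    inbound_units=("inbound_units_weekly", "sum"),
    outbound_units=("actual_outbound_weekly", "sum"),
    unmet_units=("unmet_demand_weekly", "sum"),
)
print(daily)
```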

Save the combined dataframe to use in the next part of the project:

In [33]:
combined_df.to_csv("../data/warehouse_products.csv", index=False)

## Navigation

[Part 2 - Warehouse Analytics](02_warehouse_analytics.ipynb)