VN2 RL/Simulation Tuning Notebook — Baseline, Segment Tuning, Feature Policy

What this notebook does
- Wraps an organizer-like simulator to evaluate ordering policies end‑to‑end (no backorders, 2‑week lead, weekly review).
- Provides three actionable paths to generate submissions:
  - Baseline parametric base‑stock tuned by a small grid → `orders_round1_rl.csv`.
  - Segment‑wise (Department) Optuna tuning of k and service level → `orders_round1_rl_segments.csv`.
  - Feature‑driven direct policy search (learned k per SKU from recent sales/low‑inventory features) → `orders_round1_rl_policy.csv`.

Key assumptions (aligned with the competition)
- Weekly transition: StartInv = EndInv_prev + InTransit_W+1_prev; new W+1 ← prev W+2; new W+2 ← Orders.
- Costs: holding on EndInv (0.2/unit/week), shortage on Missed Sales (1.0/unit).
- Protection period: P = 3 (lead 2 + review 1). Orders are placed at the end of the current week and arrive after the 2‑week lead.
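
The transition and cost rules above can be sketched for a single SKU as follows. This is a minimal illustration, not the organizer's simulator: `step_week` and its argument names are hypothetical, and units sold are assumed to be min(StartInv, demand) because there are no backorders.

```python
HOLDING_PER_UNIT = 0.2   # cost per unit of end-of-week inventory
SHORTAGE_PER_UNIT = 1.0  # cost per unit of missed sales


def step_week(end_inv_prev, in_transit_w1, in_transit_w2, orders, demand):
    """Advance one week for a single SKU (hypothetical helper).

    Returns (end_inv, new_w1, new_w2, round_cost).
    """
    start_inv = end_inv_prev + in_transit_w1   # the W+1 shipment arrives
    sales = min(start_inv, demand)             # no backorders: unmet demand is lost
    missed = demand - sales
    end_inv = start_inv - sales
    cost = HOLDING_PER_UNIT * end_inv + SHORTAGE_PER_UNIT * missed
    # Pipeline shift: new W+1 <- prev W+2, new W+2 <- this week's orders
    return end_inv, in_transit_w2, orders, cost


print(step_week(end_inv_prev=5, in_transit_w1=3, in_transit_w2=4, orders=6, demand=10))
```

In this call the starting inventory is 8 against a demand of 10, so 2 units are missed, nothing is held, and the week costs 2.0.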

Inputs required (from Week 0)
- Sales: `Week 0 - 2024-04-08 - Sales.csv`
- Initial State: `Week 0 - 2024-04-08 - Initial State.csv`
- Master (for Department segments): `Week 0 - Master.csv`
- Submission Template: `Week 0 - Submission Template.csv`

How to run (recommended sequence)
1) In [1]–In [3]: setup, simulator, baseline stats/policy.
2) In [4]: evaluate grid (k, service_level). In [5]: emit baseline submission.
3) In [6]–In [7]: run Optuna for segment‑wise tuning and emit the segment submission.
4) In [8]–In [9]: run Optuna for feature‑driven policy and emit the policy submission.

Runtime and tuning tips
- Optuna trials are set conservatively for speed; increase `n_trials` for better solutions.
- You can change the segmentation key (e.g., ProductGroup) by swapping the column in `master`.
- Feature policy uses: k = 0.8 + 0.4·sigmoid(w0 + w1·recent_sales + w2·low_inventory). Extend with richer features if desired.

Outputs
- `submissions/orders_round1_rl.csv` — baseline grid‑tuned.
- `submissions/orders_round1_rl_segments.csv` — Department‑tuned.
- `submissions/orders_round1_rl_policy.csv` — feature‑driven policy.



In [1]:
# Purpose: Enable `import vn2inventory` in notebooks by adding the project root to sys.path
# Inputs: implicit (project layout); no external state mutated beyond sys.path
# Outputs: print confirmation; subsequent imports work within this kernel
import sys
from pathlib import Path as _P
PROJECT_ROOT = _P("..").resolve()
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

try:
    import vn2inventory  # noqa: F401
    print("Imported vn2inventory from:", PROJECT_ROOT)
except Exception as e:
    print("Import error:", e)



Imported vn2inventory from: /Users/senoni/noni/vn2inventory


In [2]:
# Purpose: Load data and instantiate the simulator (organizer-like transition)
# Inputs: Week 0 CSVs from ../data
# Outputs: `sim` environment ready to step through the full horizon
import pandas as pd
import numpy as np
from pathlib import Path
from vn2inventory.sim_env import InventorySim, Costs
from vn2inventory.policy import _inv_normal_cdf

DATA_DIR = Path("../data").resolve()
sales_wide = pd.read_csv(DATA_DIR / "Week 0 - 2024-04-08 - Sales.csv")
initial_state = pd.read_csv(DATA_DIR / "Week 0 - 2024-04-08 - Initial State.csv")

INDEX = ["Store","Product"]

demand_dates = [c for c in sales_wide.columns if c not in INDEX]

sim = InventorySim(
    sales_wide=sales_wide,
    initial_state=initial_state,
    costs=Costs(holding_per_unit=0.2, shortage_per_unit=1.0),
    index_cols=("Store","Product"),
    demand_dates=demand_dates,
)
len(demand_dates)


157

Parametric base-stock policy

- Parameters: k (scale), service_level (z via inverse normal), P (protection period)
- Order rule: order = max(0, k*(μ*P + z*σ*sqrt(P)) − inv_position)
- Here μ, σ can come from historical stats per SKU as a baseline.
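
As a worked instance of the order rule, the sketch below computes one order quantity with the standard library only. It assumes the project's `_inv_normal_cdf` is the standard normal quantile function and uses `statistics.NormalDist().inv_cdf` as a stand-in; `base_stock_order` is a hypothetical name, not part of `vn2inventory`.

```python
import math
from statistics import NormalDist


def base_stock_order(inv_position, mu, sigma, k, service_level, P=3):
    """Order up to k*(mu*P + z*sigma*sqrt(P)), never below zero (hypothetical helper)."""
    z = NormalDist().inv_cdf(service_level)   # assumed equivalent of _inv_normal_cdf
    target = k * (mu * P + z * sigma * math.sqrt(P))
    return max(0, round(target - inv_position))


# mu=10, sigma=4, P=3, service level 0.95: target is about 41.4 units
print(base_stock_order(inv_position=25, mu=10, sigma=4, k=1.0, service_level=0.95))  # 16
print(base_stock_order(inv_position=50, mu=10, sigma=4, k=1.0, service_level=0.95))  # 0
```

The second call shows the clipping: when the inventory position already exceeds the target, nothing is ordered.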



In [3]:
# Purpose: Create baseline μ and σ per SKU; define a parametric base-stock order policy
# Inputs: sales_wide (weekly columns), demand_dates; hyperparams P, k, service_level
# Outputs: `stats` table (mu, sigma) and `order_policy` callable that returns integer orders
# Notes: sigma fallback uses sqrt(mean) when std is undefined
#        order up to k*(mu*P + z*sigma*sqrt(P)) minus current inventory position
#        z is computed from the target service level
# Historical stats as a quick proxy for μ, σ
stats = (
    sales_wide.set_index(INDEX)[demand_dates]
    .stack()
    .rename("Sales")
    .groupby(level=INDEX)
    .agg(["mean","std"])  # sample std
    .rename(columns={"mean":"mu","std":"sigma"})
)
stats["sigma"] = stats["sigma"].fillna(stats["mu"].clip(lower=0).pow(0.5))

P = 3

def order_policy(inv_position: pd.Series, mu: pd.Series, sigma: pd.Series, k: float, service_level: float) -> pd.Series:
    z = _inv_normal_cdf(service_level)
    target = k * (mu * P + z * sigma * np.sqrt(P))
    raw = (target - inv_position).clip(lower=0.0)
    return np.rint(raw).astype(int)



Evaluate a (k, service_level) policy over the full horizon

- Each step:
  - Compute inventory position from sim state
  - Issue orders using the parametric rule
  - Step the simulator and accumulate costs
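
The loop described above can be tried without the project package on a toy single-SKU simulator. `ToySim` below is a hypothetical stand-in that only mimics the three calls this notebook relies on (reset, inventory position, step); it is a sketch under the stated cost and lead-time assumptions, not the `InventorySim` implementation.

```python
import math
from statistics import NormalDist

HOLDING, SHORTAGE, P = 0.2, 1.0, 3


class ToySim:
    """Hypothetical single-SKU simulator with a 2-week pipeline and lost sales."""

    def __init__(self, demand, end_inv, w1, w2):
        self.demand = list(demand)
        self.initial = (end_inv, w1, w2)
        self.reset()

    def reset(self):
        self.end_inv, self.w1, self.w2 = self.initial
        self.t = 0

    def inventory_position(self):
        return self.end_inv + self.w1 + self.w2   # on hand + on order

    def step(self, order):
        start = self.end_inv + self.w1
        d = self.demand[self.t]
        sales = min(start, d)
        self.end_inv = start - sales
        self.w1, self.w2 = self.w2, order         # shift the pipeline
        self.t += 1
        return {"round_cost": HOLDING * self.end_inv + SHORTAGE * (d - sales)}


def evaluate(sim, mu, sigma, k, service_level):
    """Replay the whole horizon and return the accumulated cost."""
    sim.reset()
    z = NormalDist().inv_cdf(service_level)
    target = k * (mu * P + z * sigma * math.sqrt(P))
    total = 0.0
    for _ in sim.demand:
        order = max(0, round(target - sim.inventory_position()))
        total += sim.step(order)["round_cost"]
    return total


toy = ToySim(demand=[10] * 6, end_inv=10, w1=10, w2=10)
for k in (0.9, 1.0, 1.1):
    print(k, evaluate(toy, mu=10, sigma=0, k=k, service_level=0.95))
```

Resetting inside `evaluate` is what makes repeated calls comparable, which is the same reason the notebook resets `sim` before every evaluation.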



Build submission from best policy

- Select (k, service_level) with lowest total cost from the sweep above
- Compute inventory position from Initial State (`on_hand = End Inventory`, `on_order = W+1 + W+2`)
- Generate `submissions/orders_round1_rl.csv` in platform index order
- Note: Requires In [4] (grid sweep) to have run so that `results` is defined.
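
The two data-handling steps above (inventory position from the Initial State, then output in template order) can be sketched with the standard library. The in-memory CSV snippets and the column names `In Transit W+1` and `In Transit W+2` are assumptions for illustration; check them against the actual Week 0 files.

```python
import csv
import io

# Tiny in-memory stand-ins for the Week 0 files; values and column names are assumed
INITIAL_STATE = """Store,Product,End Inventory,In Transit W+1,In Transit W+2
0,126,3,0,2
0,182,1,4,1
"""
TEMPLATE = """Store,Product
0,182
0,126
1,124
"""


def inventory_position(initial_state_csv):
    """on_hand (End Inventory) + on_order (W+1 + W+2), keyed by (Store, Product)."""
    pos = {}
    for row in csv.DictReader(io.StringIO(initial_state_csv)):
        on_hand = int(row["End Inventory"])
        on_order = int(row["In Transit W+1"]) + int(row["In Transit W+2"])
        pos[(row["Store"], row["Product"])] = on_hand + on_order
    return pos


def in_template_order(values, template_csv, default=0):
    """Return (key, value) pairs in template row order; missing keys get `default`."""
    reader = csv.DictReader(io.StringIO(template_csv))
    keys = [(r["Store"], r["Product"]) for r in reader]
    return [(key, values.get(key, default)) for key in keys]


print(in_template_order(inventory_position(INITIAL_STATE), TEMPLATE))
```

The default of 0 for keys absent from the computed values plays the same role as `reindex(...).fillna(0)` in the notebook code.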



In [4]:
def evaluate_policy(k: float, service_level: float) -> float:
    # Reset sim state from the initial file on every run for reproducibility
    # (no `global` needed: `sim` is only read here, never reassigned)
    sim.reset_to(initial_state)
    mu = stats["mu"]
    sigma = stats["sigma"]
    total_cost = 0.0
    for _ in demand_dates:
        inv_pos = sim.inventory_position()
        orders = order_policy(inv_pos, mu, sigma, k=k, service_level=service_level)
        info = sim.step(orders)
        total_cost += info["round_cost"]
    return total_cost

# Quick sweep
grid = [(k, sl) for k in [0.9, 1.0, 1.1] for sl in [0.90, 0.95, 0.975]]
results = []
for k, sl in grid:
    cost = evaluate_policy(k, sl)
    results.append({"k": k, "service_level": sl, "total_cost": cost})

pd.DataFrame(results).sort_values("total_cost").head()


     k  service_level  total_cost
0  0.9          0.900    145607.0
3  1.0          0.900    160040.2
1  0.9          0.950    160899.2
2  0.9          0.975    176821.0
6  1.1          0.900    177511.2


In [5]:
# Pick best params from the grid
best = pd.DataFrame(results).sort_values("total_cost").iloc[0]
best_k = float(best["k"])
best_sl = float(best["service_level"])

# Build one-step orders for the next week (submission format)
# Inventory position at the end of Week 0
sim.reset_to(initial_state)
inv_pos_now = sim.inventory_position()

# Policy orders for immediate submission (for Week 1)
mu = stats["mu"]
sigma = stats["sigma"]
orders_now = order_policy(inv_pos_now, mu, sigma, k=best_k, service_level=best_sl)

# Emit submission in the platform order; SKUs missing from the policy output get 0
index_df = pd.read_csv(DATA_DIR / "Week 0 - Submission Template.csv")[INDEX].set_index(INDEX)
submission = index_df.copy()
submission["order_qty"] = orders_now.reindex(index_df.index).fillna(0).astype(int).values
SUB = Path("../submissions").resolve()
SUB.mkdir(exist_ok=True)
out_csv = SUB / "orders_round1_rl.csv"
submission.to_csv(out_csv)
out_csv


PosixPath('/Users/senoni/noni/vn2inventory/submissions/orders_round1_rl.csv')

Optuna: segment-wise parameter tuning (example: Department)

- Objective: minimize total cost over the horizon
- Parameters per segment s: k_s ∈ [0.8,1.2], service_level_s ∈ [0.88,0.99]
- Implementation: at each step compute orders with segment’s parameters
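
A minimal sketch of the per-segment lookup follows. It uses plain dictionaries instead of pandas, and `segment_orders` is a hypothetical helper. The parameter keys follow the `k_{segment}` / `sl_{segment}` naming used in the study, and `statistics.NormalDist().inv_cdf` stands in for `_inv_normal_cdf` on the assumption that both are the standard normal quantile.

```python
import math
from statistics import NormalDist

P = 3  # protection period in weeks


def segment_orders(inv_pos, mu, sigma, segment_of, params):
    """Apply each SKU's segment-specific (k, service level) to the base-stock rule."""
    orders = {}
    for sku, position in inv_pos.items():
        seg = segment_of[sku]
        k = params[f"k_{seg}"]
        z = NormalDist().inv_cdf(params[f"sl_{seg}"])
        target = k * (mu[sku] * P + z * sigma[sku] * math.sqrt(P))
        orders[sku] = max(0, round(target - position))
    return orders


# Two SKUs with identical demand statistics but different departments
skus = [("s1", "a"), ("s1", "b")]
inv_pos = {sku: 25 for sku in skus}
mu = {sku: 10.0 for sku in skus}
sigma = {sku: 4.0 for sku in skus}
segment_of = {("s1", "a"): 30, ("s1", "b"): 44}
params = {"k_30": 0.9, "sl_30": 0.90, "k_44": 1.1, "sl_44": 0.95}

print(segment_orders(inv_pos, mu, sigma, segment_of, params))
```

With identical statistics, the difference between the two orders comes only from the segment parameters, which is the degree of freedom the study tunes.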



In [6]:
import optuna

# Load hierarchy for segments
master = pd.read_csv(DATA_DIR / "Week 0 - Master.csv")[["Store","Product","Department"]]
master = master.set_index(["Store","Product"])  
seg = master["Department"]

segments = seg.unique().tolist()

P = 3
mu = stats["mu"]; sigma = stats["sigma"]

# Prejoin for speed
mu_s = mu.to_frame("mu").join(seg)
sigma_s = sigma.to_frame("sigma").join(seg)


def evaluate_params_for_segments(params: dict) -> float:
    sim.reset_to(initial_state)
    total_cost = 0.0
    for _ in demand_dates:
        inv_pos = sim.inventory_position()
        orders_parts = []
        for d in segments:
            k = params[f"k_{d}"]
            sl = params[f"sl_{d}"]
            idx = seg[seg == d].index
            o = order_policy(inv_pos.reindex(idx), mu_s.loc[idx, "mu"], sigma_s.loc[idx, "sigma"], k=k, service_level=sl)
            orders_parts.append(o)
        orders = pd.concat(orders_parts).reindex(inv_pos.index).fillna(0).astype(int)
        info = sim.step(orders)
        total_cost += info["round_cost"]
    return total_cost


def objective(trial: optuna.Trial) -> float:
    params = {}
    for d in segments:
        params[f"k_{d}"] = trial.suggest_float(f"k_{d}", 0.8, 1.2)
        params[f"sl_{d}"] = trial.suggest_float(f"sl_{d}", 0.88, 0.99)
    return evaluate_params_for_segments(params)

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=min(30, 5 * len(segments)), show_progress_bar=False)
seg_params = study.best_params
seg_params


[I 2025-09-30 15:40:11,408] A new study created in memory with name: no-name-3c13bfed-4906-4079-a9ed-76c3692f973b
[I 2025-09-30 15:40:13,856] Trial 0 finished with value: 175565.2 and parameters: {'k_30': 0.9195136770749913, 'sl_30': 0.8890410010269614, 'k_44': 0.9642445459549629, 'sl_44': 0.9333339655859303, 'k_24': 0.8464749503003948, 'sl_24': 0.9745632353748427, 'k_4': 1.0982814181115113, 'sl_4': 0.9062506940299959, 'k_20': 0.8388271604951789, 'sl_20': 0.9423026914458036, 'k_26': 0.8259933511927593, 'sl_26': 0.9288554303739056, 'k_14': 1.1533381590827627, 'sl_14': 0.9172723027245693, 'k_2': 1.0087444440369866, 'sl_2': 0.9454418628901903, 'k_47': 0.904123198458812, 'sl_47': 0.9466750984527172, 'k_53': 0.8417939399335117, 'sl_53': 0.9013391651904497, 'k_28': 1.0037519668316752, 'sl_28': 0.9213810962863542, 'k_40': 1.0421382448459597, 'sl_40': 0.9844063528377704, 'k_33': 0.8146224884617872, 'sl_33': 0.9482847748881673, 'k_59': 0.9375614

{'k_30': 0.8996255216627175,
 'sl_30': 0.91709163703126,
 'k_44': 1.0407249398337681,
 'sl_44': 0.9078298439497038,
 'k_24': 0.8604846399102547,
 'sl_24': 0.8910764888990936,
 'k_4': 1.1036670063910072,
 'sl_4': 0.9848905313591176,
 'k_20': 1.0131297969835693,
 'sl_20': 0.9446523769173987,
 'k_26': 0.8510165080713147,
 'sl_26': 0.9072317184346248,
 'k_14': 0.8432696312836371,
 'sl_14': 0.9182911945684633,
 'k_2': 0.9605091105904658,
 'sl_2': 0.922608632285525,
 'k_47': 1.1070828554487526,
 'sl_47': 0.9116845959268123,
 'k_53': 1.064875454080871,
 'sl_53': 0.896997524746356,
 'k_28': 1.0181453752913554,
 'sl_28': 0.9222378836523467,
 'k_40': 0.9779353725183414,
 'sl_40': 0.9245768600214754,
 'k_33': 0.8366510820062973,
 'sl_33': 0.8803677894873503,
 'k_59': 1.12790418691323,
 'sl_59': 0.9026717582327484,
 'k_58': 1.034102219152151,
 'sl_58': 0.9007696216639317,
 'k_45': 0.9604082242124471,
 'sl_45': 0.9061921189939808,
 'k_17': 1.0379784933697398,
 'sl_17': 0.9663848457857864,
 'k_3': 0

Submission from segment-tuned policy

- Use best segment params to generate Week 1 orders and write `orders_round1_rl_segments.csv`
- Note: Requires In [6] to finish the Optuna study and define `seg_params`.



In [7]:
# Build orders for Week 1 using tuned segment params
sim.reset_to(initial_state)
inv_pos_now = sim.inventory_position()
orders_parts = []
for d in segments:
    k = seg_params[f"k_{d}"]
    sl = seg_params[f"sl_{d}"]
    idx = seg[seg == d].index
    o = order_policy(inv_pos_now.reindex(idx), mu_s.loc[idx, "mu"], sigma_s.loc[idx, "sigma"], k=k, service_level=sl)
    orders_parts.append(o)
orders_now = pd.concat(orders_parts).reindex(inv_pos_now.index).fillna(0).astype(int)

# Emit submission in the platform order
index_df = pd.read_csv(DATA_DIR / "Week 0 - Submission Template.csv")[INDEX].set_index(INDEX)
submission = index_df.copy(); submission["order_qty"] = orders_now.reindex(index_df.index).fillna(0).astype(int).values
from pathlib import Path as _P
SUB = _P("../submissions").resolve(); SUB.mkdir(exist_ok=True)
out_csv2 = SUB / "orders_round1_rl_segments.csv"
submission.to_csv(out_csv2)
out_csv2


PosixPath('/Users/senoni/noni/vn2inventory/submissions/orders_round1_rl_segments.csv')

Direct policy search: feature-driven k with Optuna

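- Policy: k = 0.8 + 0.4*sigmoid(w0 + w1*recent_sales + w2*low_inventory), which keeps k in (0.8, 1.2)
- Features per SKU per step: recent_sales (mean of the previous 4 weeks of sales history) and low_inventory (1 if the simulated inventory position is ≤ 1, else 0)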
- Optimize w to minimize total simulated cost
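
The scalar version below illustrates how the weights map features to k, using the 0.8 + 0.4·sigmoid form implemented in the notebook code. `k_from_features` is a hypothetical name and the feature values are made up for illustration.

```python
import math


def k_from_features(recent_sales, low_inventory, w0, w1, w2):
    """Map a linear score through a sigmoid, then rescale (0, 1) to (0.8, 1.2)."""
    lin = w0 + w1 * recent_sales + w2 * low_inventory
    return 0.8 + 0.4 / (1.0 + math.exp(-lin))


# A zero score sits exactly in the middle of the range
print(k_from_features(recent_sales=0.0, low_inventory=0.0, w0=0.0, w1=1.0, w2=1.0))  # 1.0

# Illustrative weights: a negative w1 shrinks k for fast sellers, a positive w2 raises it when stock is low
for sales, low in [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0)]:
    print(sales, low, round(k_from_features(sales, low, w0=-1.0, w1=-0.8, w2=0.5), 3))
```

Because the sigmoid saturates, large feature values push k toward 0.8 or 1.2 rather than beyond them, so the search cannot scale the base-stock target by more than 20% in either direction.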



In [8]:
from collections import deque

# Build rolling features from sales for simplicity (recent sales); stockout proxy via low inventory
sales_panel = sales_wide.set_index(INDEX)[demand_dates]


def policy_k_from_features(inv_pos: pd.Series, t: int, w0: float, w1: float, w2: float) -> pd.Series:
    # recent sales: average of last 4 weeks
    start = max(0, t-4)
    recent_sales = sales_panel.iloc[:, start:t].mean(axis=1) if t > 0 else sales_panel.iloc[:, :1].mean(axis=1)
    # stockout proxy: indicator of low inv position at current step
    low_inv = (inv_pos <= 1).astype(float)
    lin = w0 + w1 * recent_sales.reindex(inv_pos.index).fillna(0.0) + w2 * low_inv
    k = 1.0 / (1.0 + np.exp(-lin))  # sigmoid to keep k in (0,1)
    # allow k to stretch around 1 by mapping (0,1) -> (0.8,1.2)
    return 0.8 + 0.4 * k


def objective_policy_search(trial: optuna.Trial) -> float:
    sl = trial.suggest_float("service_level", 0.9, 0.99)
    w0 = trial.suggest_float("w0", -2.0, 2.0)
    w1 = trial.suggest_float("w1", -1.0, 1.0)
    w2 = trial.suggest_float("w2", -1.0, 1.0)

    sim.reset_to(initial_state)
    total = 0.0
    mu = stats["mu"]; sigma = stats["sigma"]
    for t, _ in enumerate(demand_dates):
        inv_pos = sim.inventory_position()
        k_series = policy_k_from_features(inv_pos, t, w0, w1, w2)
        # per-SKU orders using individualized k
        z = _inv_normal_cdf(sl)
        target = k_series * (mu * P + z * sigma * np.sqrt(P))
        raw = (target - inv_pos).clip(lower=0.0)
        orders = np.rint(raw).astype(int)
        info = sim.step(orders)
        total += info["round_cost"]
    return total

study2 = optuna.create_study(direction="minimize")
study2.optimize(objective_policy_search, n_trials=40, show_progress_bar=False)
policy_params = study2.best_params
policy_params


[I 2025-09-30 15:41:27,801] A new study created in memory with name: no-name-78eafeed-c9a4-47f6-a804-05fe0249acc9
[I 2025-09-30 15:41:28,002] Trial 0 finished with value: 155684.00000000003 and parameters: {'service_level': 0.9613675373113764, 'w0': -1.8887900210289787, 'w1': -0.03194382813374008, 'w2': -0.15205621857744145}. Best is trial 0 with value: 155684.00000000003.
[I 2025-09-30 15:41:28,200] Trial 1 finished with value: 167903.39999999988 and parameters: {'service_level': 0.9780648670168125, 'w0': -0.8218159025791598, 'w1': -0.6200244957202643, 'w2': -0.6115423223014735}. Best is trial 0 with value: 155684.00000000003.
[I 2025-09-30 15:41:28,405] Trial 2 finished with value: 243687.9999999999 and parameters: {'service_level': 0.9708508021168047, 'w0': 1.531338202464993, 'w1': 0.3210664031687789, 'w2': -0.33471221572506593}. Best is trial 0 with value: 155684.00000000003.
[I 2025-09-30 15:41:28,600] Trial 3 finished with value: 189354.59999999998 and parameters: {'service_level

{'service_level': 0.905307309412906,
 'w0': -1.0409502380313713,
 'w1': -0.7822720607622098,
 'w2': 0.5228914357697367}

Submission from feature-driven policy

- Use tuned (w0, w1, w2, service_level) to generate Week 1 orders and write `orders_round1_rl_policy.csv`
- Note: Requires In [8] to finish the Optuna study and define `policy_params`.



In [9]:
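# Generate Week 1 orders from the tuned feature-driven policy
sim.reset_to(initial_state)
inv_pos_now = sim.inventory_position()
w0 = policy_params["w0"]; w1 = policy_params["w1"]; w2 = policy_params["w2"]; sl = policy_params["service_level"]
# Use t = len(demand_dates) so recent_sales averages the last 4 observed weeks;
# t=0 would fall back to the first (oldest) week of the history instead
k_series = policy_k_from_features(inv_pos_now, t=len(demand_dates), w0=w0, w1=w1, w2=w2)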

z = _inv_normal_cdf(sl)
mu = stats["mu"]; sigma = stats["sigma"]
target = k_series * (mu * P + z * sigma * np.sqrt(P))
raw = (target - inv_pos_now).clip(lower=0.0)
orders_now = np.rint(raw).astype(int)

index_df = pd.read_csv(DATA_DIR / "Week 0 - Submission Template.csv")[INDEX].set_index(INDEX)
submission = index_df.copy(); submission["order_qty"] = orders_now.reindex(index_df.index).fillna(0).astype(int).values
from pathlib import Path as _P
SUB = _P("../submissions").resolve(); SUB.mkdir(exist_ok=True)
out_csv3 = SUB / "orders_round1_rl_policy.csv"
submission.to_csv(out_csv3)
out_csv3


PosixPath('/Users/senoni/noni/vn2inventory/submissions/orders_round1_rl_policy.csv')