# Procurement & Supplier Management — Notebook

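This notebook follows the `STEP nA / nB / nC` pattern. In Steps 1 to 3, **A** = SEE (inspect), **B** = TREAT (fix) and **C** = VERIFY (re-run the same checks). From Step 4 onward, nA and nB are analysis cells and nC is a written interpretation.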

- **df_raw** = raw procurement PO data
- **df_treat** = cleaned dataset after TREAT steps

Steps:
- STEP 1: SEE/TREAT/VERIFY structure
- STEP 2: SEE/TREAT/VERIFY missing dates
- STEP 3: SEE/TREAT/VERIFY outliers & suppliers
- STEP 4: VERIFY KPIs
- STEP 5: Descriptive analytics
- STEP 6: Diagnostic analytics
- STEP 7: Predictive logistic regression
- STEP 8: Prescriptive supplier allocation


In [None]:
# STEP 0 — imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LogisticRegression
%matplotlib inline


In [None]:
# STEP 0B — load dataset
df_raw = pd.read_csv('procurement_po_data.csv')
df_raw.head()


## Helper Functions

We will use the following helper functions in multiple steps:

- `missing_report(df)`
- `step1_check_structure(df)`
- `step2_check_missing_and_dates(df)`
- `step3_check_outliers_and_suppliers(df)`
- `step4_check_verify_kpis(df, promised_lead_time=10)`


In [None]:
def missing_report(df):
    """Print count of missing values per column."""
    print(df.isna().sum())
    print()

def step1_check_structure(df):
    """Print dtypes, duplicate PO_IDs, non-positive lead times and supplier labels."""
    print("dtypes:\n", df.dtypes)
    print("\nDuplicate PO_ID count:", df.duplicated(subset='PO_ID').sum())
    if 'Lead_Time_Days' in df.columns:
        print("Negative/zero lead times:", (df['Lead_Time_Days'] <= 0).sum())
    print("\nUnique suppliers:", df['Supplier'].unique())
    print()

def step2_check_missing_and_dates(df):
    """Print how many Delivery_Date and Lead_Time_Days values are missing."""
    print("Missing Delivery_Date:", df['Delivery_Date'].isna().sum())
    if 'Lead_Time_Days' in df.columns:
        print("Missing Lead_Time_Days:", df['Lead_Time_Days'].isna().sum())
    print()

def step3_check_outliers_and_suppliers(df):
    """Print the lead-time distribution, the > 45-day count and supplier label counts."""
    if 'Lead_Time_Days' in df.columns:
        print(df['Lead_Time_Days'].describe())
        print("Lead times > 45 days:", (df['Lead_Time_Days'] > 45).sum())
    print("\nSuppliers frequency (NaN = unmapped label):")
    print(df['Supplier'].value_counts(dropna=False))  # dropna=False keeps unmapped labels visible
    print()

def step4_check_verify_kpis(df, promised_lead_time=10):
    """Return a per-supplier KPI table: OTD%, mean/std lead time, CV% and DefectRate%."""
    df = df.copy()
    df['Late_Flag'] = (df['Lead_Time_Days'] > promised_lead_time).astype(int)
    grp = df.groupby('Supplier')
    kpi = pd.DataFrame(index=grp.size().index)
    kpi['OTD%'] = (1 - grp['Late_Flag'].mean()) * 100
    kpi['Mean_LT'] = grp['Lead_Time_Days'].mean()
    kpi['Std_LT'] = grp['Lead_Time_Days'].std()
    kpi['CV%'] = kpi['Std_LT'] / kpi['Mean_LT'] * 100
    kpi['DefectRate%'] = grp['Defect_Flag'].mean() * 100
    print(kpi)
    return kpi


## STEP 1 — SEE / TREAT / VERIFY: Structure
Check the structure of df_raw: dtypes, duplicate PO_IDs, impossible (zero or negative) lead times and inconsistent supplier names.
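`Series.map` with a dict turns every label that is not a key into `NaN`. A new spelling such as `' Supp A '` (with spaces) therefore silently loses its supplier. Below is a minimal sketch of a more forgiving normaliser. It assumes the raw variants differ only in case, whitespace and a `sup`/`supp` prefix. The helper name `normalise_supplier` and that assumption are hypothetical and are not taken from the dataset.

```python
import re

import numpy as np
import pandas as pd


def normalise_supplier(label):
    """Hypothetical helper: map raw supplier spellings to 'A', 'B' or 'C'.

    Assumes variants differ only by case, whitespace and an optional
    'sup'/'supp' prefix; anything else becomes NaN so STEP 3 can flag it.
    """
    if pd.isna(label):
        return np.nan
    text = re.sub(r"\s+", "", str(label)).lower()  # ' Supp A ' -> 'suppa'
    text = re.sub(r"^supp?", "", text)             # drop a leading 'sup' / 'supp'
    return text.upper() if text in {"a", "b", "c"} else np.nan


raw = pd.Series(["A", "Supp A", "supA", " Sup B", "C", "Vendor D", None])
print(raw.map(normalise_supplier).tolist())
# ['A', 'A', 'A', 'B', 'C', nan, nan]
```

In STEP 1B this would read `df_treat['Supplier'] = df_treat['Supplier'].map(normalise_supplier)`. When the set of raw spellings is small and known, an explicit `mapping` dict remains the more transparent choice.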


In [None]:
# STEP 1A — SEE structure
step1_check_structure(df_raw)


In [None]:
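# STEP 1B — TREAT structure
df_treat = df_raw.copy()

# Normalise supplier spellings; any label not in `mapping` becomes NaN and is flagged in STEP 3
mapping = {
    'A': 'A', 'Supp A': 'A', 'supA': 'A',
    'B': 'B', 'Sup B': 'B',
    'C': 'C'
}
df_treat['Supplier'] = df_treat['Supplier'].str.strip().map(mapping)

# Parse dates; unparseable strings become NaT instead of raising
df_treat['PO_Date'] = pd.to_datetime(df_treat['PO_Date'], errors='coerce')
df_treat['Delivery_Date'] = pd.to_datetime(df_treat['Delivery_Date'], errors='coerce')

# A delivery on or before the PO date is impossible: blank the delivery date so STEP 2 re-imputes it.
# (Setting only Lead_Time_Days to NaN would be undone when STEP 2 recomputes it from the dates.)
lead_time = (df_treat['Delivery_Date'] - df_treat['PO_Date']).dt.days
df_treat.loc[lead_time <= 0, 'Delivery_Date'] = pd.NaT
df_treat['Lead_Time_Days'] = (df_treat['Delivery_Date'] - df_treat['PO_Date']).dt.days

df_treat = df_treat.drop_duplicates(subset='PO_ID', keep='first')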


In [None]:
# STEP 1C — VERIFY structure
step1_check_structure(df_treat)


## STEP 2 — SEE / TREAT / VERIFY: Missing Dates & Lead Times
Impute each missing Delivery_Date as PO_Date plus that supplier's median lead time, then recompute Lead_Time_Days from the dates.
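Here is a minimal sketch of the per-supplier median imputation on a hypothetical toy frame. The dates are invented purely for illustration. It uses `groupby(...).transform('median')`, which returns a value aligned to every row, as an alternative to the separate `.map(median_lt)` lookup.

```python
import pandas as pd

# Hypothetical toy data: one PO per supplier has no delivery date.
toy = pd.DataFrame({
    "Supplier": ["A", "A", "A", "B", "B"],
    "PO_Date": pd.to_datetime(["2024-01-01", "2024-01-05", "2024-01-10",
                               "2024-01-02", "2024-01-03"]),
    "Delivery_Date": pd.to_datetime(["2024-01-09", None, "2024-01-20",
                                     "2024-01-14", None]),
})
toy["Lead_Time_Days"] = (toy["Delivery_Date"] - toy["PO_Date"]).dt.days

# Median lead time of each row's supplier; groupby median skips NaN rows.
# Rounded so imputed deliveries land on whole days.
supplier_median = toy.groupby("Supplier")["Lead_Time_Days"].transform("median").round()

missing = toy["Delivery_Date"].isna()
toy.loc[missing, "Delivery_Date"] = (
    toy.loc[missing, "PO_Date"]
    + pd.to_timedelta(supplier_median[missing], unit="D")
)
toy["Lead_Time_Days"] = (toy["Delivery_Date"] - toy["PO_Date"]).dt.days
print(toy[["Supplier", "Lead_Time_Days"]])
```

Supplier A's observed lead times are 8 and 10 days, so its missing PO is filled with 9. Supplier B has a single observed value, 12, which is also used for its gap.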


In [None]:
# STEP 2A — SEE missingness
missing_report(df_treat)
step2_check_missing_and_dates(df_treat)


In [None]:
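# STEP 2B — TREAT missing Delivery_Date
# Median observed lead time per supplier (NaN lead times are skipped), rounded to whole days
median_lt = df_treat.groupby('Supplier')['Lead_Time_Days'].median().round()

# Impute a missing delivery as PO_Date + that supplier's median lead time
mask_missing_del = df_treat['Delivery_Date'].isna()
df_treat.loc[mask_missing_del, 'Delivery_Date'] = (
    df_treat.loc[mask_missing_del, 'PO_Date'] +
    pd.to_timedelta(df_treat.loc[mask_missing_del, 'Supplier'].map(median_lt), unit='D')
)

# Recompute lead times from the (partly imputed) dates
df_treat['Lead_Time_Days'] = (df_treat['Delivery_Date'] - df_treat['PO_Date']).dt.days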


In [None]:
# STEP 2C — VERIFY missingness
missing_report(df_treat)
step2_check_missing_and_dates(df_treat)


## STEP 3 — SEE / TREAT / VERIFY: Outliers & Supplier Labels
Check lead-time outliers (> 45 days) and ensure Supplier ∈ {A,B,C}.
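The 45-day cut-off used here is a fixed business rule. A common data-driven alternative is Tukey's IQR fence, which flags values above Q3 + 1.5 × IQR. It is shown only as a swapped-in technique, not as the notebook's method. Minimal sketch on hypothetical lead times:

```python
import pandas as pd


def iqr_upper_fence(values, k=1.5):
    """Return Tukey's upper fence Q3 + k * IQR for a numeric Series."""
    q1, q3 = values.quantile([0.25, 0.75])
    return q3 + k * (q3 - q1)


# Hypothetical lead times in days; 60 is the obvious outlier.
lead_times = pd.Series([8, 9, 10, 10, 11, 12, 13, 60])
fence = iqr_upper_fence(lead_times)
print(fence, lead_times[lead_times > fence].tolist())  # 16.0 [60]
```

Applied per supplier, e.g. `df_treat.groupby('Supplier')['Lead_Time_Days'].transform(iqr_upper_fence)`, the fence lets a slower but consistent supplier keep its own normal range.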


In [None]:
# STEP 3A — SEE outliers & suppliers
step3_check_outliers_and_suppliers(df_treat)


In [None]:
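# STEP 3B — TREAT outliers & suppliers
valid_supplier = df_treat['Supplier'].isin(['A', 'B', 'C'])  # drops NaN / unmapped labels
within_limit = df_treat['Lead_Time_Days'] <= 45               # NaN compares False, so rows still missing a lead time are dropped too
df_treat = df_treat[valid_supplier & within_limit].copy()     # .copy() avoids SettingWithCopyWarning when columns are added later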


In [None]:
# STEP 3C — VERIFY outliers & suppliers
step3_check_outliers_and_suppliers(df_treat)


## STEP 4 — VERIFY: KPI Stability
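Compute per-supplier KPIs to check that the cleaned data is stable:

- OTD%: share of POs delivered within the promised lead time of 10 days.
- CV%: lead-time standard deviation / mean × 100. Lower means more consistent.
- DefectRate%: share of POs with Defect_Flag = 1.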
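The KPI formulas can be checked by hand on a few hypothetical POs for one supplier. Two details are easy to miss. A PO exactly on the promise (10 days) counts as on time, because `Late_Flag` uses a strict `>`. pandas `std` is the sample standard deviation (ddof=1).

```python
import pandas as pd

promised_lead_time = 10
# Hypothetical POs for a single supplier.
lead = pd.Series([8, 10, 12, 14])
defect = pd.Series([0, 0, 1, 0])

late_flag = (lead > promised_lead_time).astype(int)  # [0, 0, 1, 1]
otd_pct = (1 - late_flag.mean()) * 100               # 50.0
cv_pct = lead.std() / lead.mean() * 100              # sample std (ddof=1) / mean
defect_pct = defect.mean() * 100                     # 25.0
print(otd_pct, round(cv_pct, 1), defect_pct)         # 50.0 23.5 25.0
```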


In [None]:
# STEP 4A — Compute KPI table
promised_lead_time = 10
df_treat['Late_Flag'] = (df_treat['Lead_Time_Days'] > promised_lead_time).astype(int)
kpi_table = step4_check_verify_kpis(df_treat, promised_lead_time=promised_lead_time)


In [None]:
# STEP 4B — Inspect KPI table
kpi_table


### STEP 4C — Interpretation (student)
- Which supplier looks most stable?
- Who has the highest OTD% and lowest CV%?


## STEP 5 — Descriptive Analytics
Build a supplier scorecard: lead time, defects, price behaviour.
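One way to condense the scorecard into a single ordering is to rank each supplier per metric and average the ranks. Lower is better for every column used here. This is a sketch on hypothetical KPI values, not results from the dataset, and it assumes all metrics carry equal weight.

```python
import pandas as pd

# Hypothetical scorecard in the same shape as `desc` from STEP 5A.
scorecard = pd.DataFrame(
    {"Mean_LT": [9.0, 12.0, 15.0],
     "CV%": [20.0, 35.0, 25.0],
     "DefectRate%": [4.0, 2.0, 6.0],
     "Price_Mean": [11.0, 10.0, 9.0]},
    index=pd.Index(["A", "B", "C"], name="Supplier"),
)

# Rank 1 = best; every metric here is "lower is better", so ascending ranks work.
ranks = scorecard.rank(method="min")
scorecard["Avg_Rank"] = ranks.mean(axis=1)
print(scorecard.sort_values("Avg_Rank"))
```

An unweighted average rank hides how large the gaps are. Supplier C is cheapest but ranks last overall in this toy example.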


In [None]:
# STEP 5A — Descriptive KPIs
grp = df_treat.groupby('Supplier')
desc = grp['Lead_Time_Days'].agg(['mean', 'std']).rename(columns={'mean':'Mean_LT','std':'Std_LT'})
desc['CV%'] = desc['Std_LT']/desc['Mean_LT']*100
desc['DefectRate%'] = grp['Defect_Flag'].mean()*100
desc['Price_Mean'] = grp['Unit_Price'].mean()
desc['Price_Std'] = grp['Unit_Price'].std()
desc


In [None]:
# STEP 5B — Visualisations
# plt and sns are already imported in STEP 0
plot_df = desc.reset_index()  # turn the Supplier index into a column for seaborn

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
sns.barplot(x='Supplier', y='Mean_LT', data=plot_df, ax=axes[0])
axes[0].set_title('Mean Lead Time by Supplier')

sns.barplot(x='Supplier', y='DefectRate%', data=plot_df, ax=axes[1])
axes[1].set_title('Defect Rate % by Supplier')
plt.tight_layout()
plt.show()


### STEP 5C — Interpretation (student)
- Summarise which supplier is fastest, most reliable, and cheapest.


## STEP 6 — Diagnostic Analytics
Explain **why** suppliers behave differently (variance drivers, relationships).
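STEP 6B compares mean lead time across `Defect_Flag` groups. The same relationship can be summarised as one number. The Pearson correlation (the default of `Series.corr`) between a 0/1 flag and a continuous variable is the point-biserial correlation. Minimal sketch on hypothetical values:

```python
import pandas as pd

# Hypothetical POs: in this toy data, defective orders (flag = 1) take longer.
po = pd.DataFrame({
    "Lead_Time_Days": [8, 9, 10, 11, 14, 16],
    "Defect_Flag":    [0, 0, 0, 0, 1, 1],
})

# Mean lead time per defect group (the view STEP 6B prints).
print(po.groupby("Defect_Flag")["Lead_Time_Days"].mean())

# Pearson r with a binary variable = point-biserial correlation, in [-1, 1].
r = po["Lead_Time_Days"].corr(po["Defect_Flag"])
print(round(r, 3))
```

A positive r only says the two move together. It does not show that long lead times cause defects, or the reverse.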


In [None]:
# STEP 6A — Lead time variance by supplier
sns.boxplot(x='Supplier', y='Lead_Time_Days', data=df_treat)
plt.title('Lead Time Distribution by Supplier')
plt.show()


In [None]:
# STEP 6B — Relationship between defects and lead time
df_treat.groupby('Defect_Flag')['Lead_Time_Days'].mean()


### STEP 6C — Interpretation (student)
- Does higher lead time correlate with more defects?
- Are some suppliers 'too cheap' and unstable?


## STEP 7 — Predictive Analytics (Logistic Regression)
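Model Late_Flag (lead time above the promised 10 days) from Defect_Flag, Supplier and Unit_Price. Lead_Time_Days is deliberately left out. Late_Flag is defined from it, so including it would let the model read the label straight off its input (target leakage).


In [None]:
# STEP 7A — Feature preparation
df_model = df_treat.copy()
df_model['Late_Flag'] = (df_model['Lead_Time_Days'] > promised_lead_time).astype(int)

# scikit-learn cannot fit on NaN, so keep only rows with complete features
df_model = df_model.dropna(subset=['Supplier', 'Defect_Flag', 'Unit_Price'])

# Supplier is nominal: one-hot encode it instead of an ordinal 0/1/2 index (A is the baseline)
supplier_dummies = pd.get_dummies(df_model['Supplier'], prefix='Supplier', drop_first=True, dtype=int)

# Lead_Time_Days is excluded because Late_Flag is derived from it
X = pd.concat([df_model[['Defect_Flag', 'Unit_Price']], supplier_dummies], axis=1)
y = df_model['Late_Flag']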


In [None]:
# STEP 7B — Train logistic regression and predict probabilities
lr = LogisticRegression(max_iter=1000)
lr.fit(X, y)
df_model['Late_Prob'] = lr.predict_proba(X)[:,1]
df_model[['Supplier','Lead_Time_Days','Late_Flag','Late_Prob']].head()


### STEP 7C — Interpretation (student)
- Which features increase late probability?
- Which supplier has lowest average Late_Prob?
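STEP 7B fits and scores the model on the same rows, so its probabilities are in-sample and will look better than on new POs. Below is a minimal sketch of a hold-out evaluation with scikit-learn's `train_test_split` and `roc_auc_score`. It runs on randomly generated data: the supplier and price effects are assumptions built into the generator, not findings from `procurement_po_data.csv`. It also one-hot encodes Supplier, since an ordinal index would impose an order A < B < C that has no meaning.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 300
# Synthetic stand-in for df_model (random values, not procurement data).
demo = pd.DataFrame({
    "Supplier": rng.choice(["A", "B", "C"], size=n),
    "Unit_Price": rng.normal(10, 2, size=n),
    "Defect_Flag": rng.integers(0, 2, size=n),
})
# Assumed data-generating rule: supplier C and cheaper orders run late more often.
logit = -1.0 + 1.5 * (demo["Supplier"] == "C") - 0.3 * (demo["Unit_Price"] - 10)
demo["Late_Flag"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# One-hot encode the nominal supplier; Supplier_A is the dropped baseline.
X = pd.get_dummies(demo[["Supplier", "Unit_Price", "Defect_Flag"]],
                   columns=["Supplier"], drop_first=True, dtype=int)
y = demo["Late_Flag"]

# Hold out 30% of rows, keeping the late/on-time ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"hold-out ROC AUC: {auc:.2f}")
print(dict(zip(X.columns, model.coef_[0].round(2))))
```

In the notebook, the same split would be applied to `X` and `y` from STEP 7A before fitting `lr`. Coefficients with a positive sign raise the predicted late probability.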


## STEP 8 — Prescriptive Decision (Supplier Allocation)
Combine KPIs and Late_Prob to recommend sourcing allocation.
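Here is a minimal sketch of turning a STEP 8A-style summary into allocation percentages. The weights, the 0.3 score floor and the pro-rata split are all hypothetical policy choices made for illustration; a real policy would be agreed with the procurement team. Each KPI is min-max scaled to 0..1 and flipped where lower is better. The weighted sum then gives a score, and suppliers below the floor get nothing.

```python
import pandas as pd

# Hypothetical summary in the shape produced by STEP 8A (values are made up).
summary_demo = pd.DataFrame(
    {"OTD": [80.0, 90.0, 60.0],
     "CV_LT": [30.0, 15.0, 40.0],
     "DefectRate": [5.0, 3.0, 8.0],
     "Price": [10.0, 11.0, 9.0],
     "Late_Prob": [0.3, 0.2, 0.5]},
    index=pd.Index(["A", "B", "C"], name="Supplier"),
)

# Assumed weights (sum to 1) and direction of each KPI.
weights = {"OTD": 0.3, "CV_LT": 0.15, "DefectRate": 0.2, "Price": 0.15, "Late_Prob": 0.2}
higher_is_better = {"OTD": True, "CV_LT": False, "DefectRate": False,
                    "Price": False, "Late_Prob": False}

score = pd.Series(0.0, index=summary_demo.index)
for col, w in weights.items():
    col_min, col_max = summary_demo[col].min(), summary_demo[col].max()
    # Min-max scale to 0..1; a constant column scales to 0 instead of dividing by zero.
    norm = (summary_demo[col] - col_min) / ((col_max - col_min) or 1.0)
    score += w * (norm if higher_is_better[col] else 1 - norm)

# Assumed policy: suppliers scoring below 0.3 get no volume; the rest split it pro rata.
eligible = score.where(score >= 0.3, 0.0)
allocation = (eligible / eligible.sum() * 100).round().astype(int)
print(score.round(2).to_dict())
print(allocation.to_dict())
```

Rounding can leave the shares summing to 99 or 101, so adjust the largest share if an exact 100 is required. If every supplier falls below the floor, `eligible.sum()` is zero and the last step fails. Set the floor with the real score range in mind.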


In [None]:
# STEP 8A — Aggregate supplier performance
summary = df_model.groupby('Supplier').agg(
    OTD=('Late_Flag', lambda x: (1-x.mean())*100),
    Mean_LT=('Lead_Time_Days','mean'),
    CV_LT=('Lead_Time_Days', lambda x: x.std()/x.mean()*100),
    DefectRate=('Defect_Flag', lambda x: x.mean()*100),
    Price=('Unit_Price','mean'),
    Late_Prob=('Late_Prob','mean')
)
summary


In [None]:
# STEP 8B — Choose allocation (example)
# Example split in % (values must sum to 100); replace with your own proposal based on STEP 8A
allocation_proposal = {'A': 30, 'B': 70, 'C': 0}
allocation_proposal


### STEP 8C — Managerial summary (student)
- Write 3–5 bullet points as advice to a Procurement Manager.
