## Project: Building and Evaluating a Reorder Prediction Model

This project formulates reorder prediction as a ranking problem and evaluates an XGBoost model against a baseline strategy. The input data is the event-level orders dataset from the Instacart Kaggle challenge. Performance is assessed using ROC-AUC and per-user Recall@K, highlighting the importance of user-centric evaluation for recommendation systems.

**Imports and Setup**

Import data loading utilities and the XGBoost training and validation helpers used throughout the notebook.


In [1]:
from src.data_loader import *
from model.xgboost_ import *

**Build Feature and Target Datasets**

Generate training and validation datasets with engineered features and the binary reorder target. The data is split according to the specified training partition (here `partition_for_training=0.8`).


In [2]:
train_df, validate_df = build_feature_target_csv(partition_for_training=0.8, Force_rewrite=False)

print('Model input features', train_df.columns.tolist()[2:-1])

Model input features ['avg_position_in_cart', 'bought_times', 'times_in_last_5_orders', 'time_since_last_order_score', 'user_aisle_freq', 'user_department_freq', 'product_reorder_rate', 'add_to_cart_order', 'avg_time_between_orders', 'user_reorder_rate']
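
The internals of `build_feature_target_csv` are not shown here, so the following is only a minimal sketch of one way a training partition of 0.8 could be applied. It assumes the split is made at the user level, so that all rows for a given user land on the same side; `split_users` and its `seed` argument are hypothetical names introduced for illustration.

```python
import random


def split_users(user_ids, partition_for_training=0.8, seed=42):
    """Assign each distinct user to the training or validation side.

    Hypothetical helper: splitting by user (rather than by row) is an
    assumption, chosen so per-user metrics are computed on unseen users.
    """
    users = sorted(set(user_ids))
    random.Random(seed).shuffle(users)  # deterministic shuffle for a fixed seed
    cut = int(len(users) * partition_for_training)
    return set(users[:cut]), set(users[cut:])


# Toy usage with 100 made-up user ids
train_users, validate_users = split_users(range(1, 101))
print(len(train_users), len(validate_users))  # 80 20
```

A user-level split keeps every candidate product of a validation user out of training, which matters when the evaluation metric is averaged per user.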


**Model Configuration**

Define the XGBoost hyperparameters for a binary logistic model, using ROC-AUC as the evaluation metric to track ranking performance.


In [3]:
PARAMS = {
    'objective': 'binary:logistic',
    'eval_metric': 'auc',
    'max_depth': 7,
    'min_child_weight': 5,
    'gamma': 0.1,
    'learning_rate': 0.05,
    'n_estimators': 400,
    'subsample': 0.8,
    'colsample_bytree': 0.5,
    'reg_alpha': 0.05,
    'reg_lambda': 1.0,
    'n_jobs': -1,
    'random_state': 42,
}
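
ROC-AUC is a ranking metric because it equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. The sketch below illustrates that definition in plain Python on made-up data; `pairwise_auc` is a hypothetical helper for illustration, not how XGBoost or scikit-learn compute the metric internally.

```python
def pairwise_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win. Illustrative O(n_pos * n_neg) version only.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data: 3 of the 4 positive/negative pairs are ordered correctly
toy_auc = pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(toy_auc)  # 0.75
```

Because only the relative order of scores matters, this metric says nothing about calibration and pools all users together, which is why the notebook also reports a per-user metric.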

**Model Training and Prediction**

Train an XGBoost model using the selected feature set to predict reorder likelihood for each user–product pair, then score the validation dataset with the trained model.


In [4]:
model = xgb_train(train_df=train_df,
                  feature_columns=train_df.columns[2:12],
                  target_column='reordered',
                  params=PARAMS)

df_model = xgb_validate(model=model,
                        validate_df=validate_df,
                        feature_columns=validate_df.columns[2:12],
                        target_column='reordered')

print(df_model)

df_model.to_csv(data_dir/'raw'/'model_output.csv', index=True)


         user_id  y_true     score
0              5       0  0.104450
1              5       0  0.120275
2              5       0  0.068323
3              5       0  0.276848
4              5       0  0.497227
...          ...     ...       ...
1637606   206195       0  0.082295
1637607   206195       0  0.030781
1637608   206195       0  0.052682
1637609   206195       1  0.138630
1637610   206195       0  0.035720

[1637611 rows x 3 columns]


**Baseline: Top-20 Previously Ordered Products**

The baseline recommends the 20 most frequently ordered products for each user using historical order counts. These products are marked as predicted reorders and used as a heuristic benchmark for comparison against the machine-learning model.


In [None]:
df_baseline = build_baseline(orders_prior, order_products_prior, order_products_train, True)
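
The implementation of `build_baseline` is not shown, so the following is only a rough sketch of the heuristic described above: count how often each user ordered each product and keep the most frequent ones. The function name `top_n_products`, the `(user_id, product_id)` input format, and the toy data are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def top_n_products(history, n=20):
    """Return each user's n most frequently ordered products.

    `history` is an iterable of (user_id, product_id) pairs, one per
    purchased item. Hypothetical helper, not the notebook's build_baseline.
    """
    counts = defaultdict(Counter)
    for user_id, product_id in history:
        counts[user_id][product_id] += 1
    return {
        user_id: [product for product, _ in counter.most_common(n)]
        for user_id, counter in counts.items()
    }


# Toy purchase history for two users
history = [
    (1, "banana"), (1, "milk"), (1, "banana"),
    (1, "eggs"), (1, "milk"), (1, "banana"),
    (2, "bread"),
]
top = top_n_products(history, n=2)
print(top)  # {1: ['banana', 'milk'], 2: ['bread']}
```

Products in a user's list would then be flagged as predicted reorders (a binary prediction), which is what makes this a heuristic benchmark rather than a scored ranking.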

**Mean Recall@K (Per User) Evaluation**

Compute Mean Recall@K by ranking predictions per user, measuring how many of the user’s actual reorders appear in their top-K recommendations, and averaging recall across users.
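
One way to write this metric in symbols, using notation introduced here for illustration: let $R_u$ be the set of candidate products user $u$ actually reordered, $\mathrm{top}_K(u)$ the $K$ highest-scored candidates for that user, and $U^{+}$ the users with at least one actual reorder.

$$
\mathrm{Recall@}K(u) = \frac{\lvert \mathrm{top}_K(u) \cap R_u \rvert}{\lvert R_u \rvert},
\qquad
\mathrm{Mean\ Recall@}K = \frac{1}{\lvert U^{+} \rvert} \sum_{u \in U^{+}} \mathrm{Recall@}K(u)
$$

Note that the denominator is the user's total number of reorders, not $\min(K, \lvert R_u \rvert)$, so a user with more than $K$ reorders cannot reach a recall of 1 under this convention.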


In [5]:
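import numpy as np

def recall_at_k_per_user(df, k):
    """Mean Recall@K over users; expects columns user_id, y_true, score."""
    recalls = []

    for _, g in df.groupby('user_id'):
        y_true = g['y_true'].to_numpy()
        scores = g['score'].to_numpy()

        positives = y_true.sum()
        if positives == 0:
            continue  # skip users with no reorders (or treat as recall = 0, depending on convention)

        # Indices of the k highest-scored candidates for this user
        idx = np.argsort(scores)[::-1][:k]
        y_topk = y_true[idx]

        # Fraction of this user's actual reorders captured in the top k
        recall_u = y_topk.sum() / positives
        recalls.append(recall_u)

    return np.mean(recalls) if recalls else 0.0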
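
To make the per-user computation concrete, here is a small worked example for a single user on made-up labels and scores. It is a plain-Python restatement of the same idea rather than the notebook's function; `recall_at_k` is a hypothetical name.

```python
def recall_at_k(y_true, scores, k):
    """Recall@k for one user: share of true reorders found in the top k."""
    positives = sum(y_true)
    if positives == 0:
        return None  # undefined when the user has no reorders
    # Candidate indices sorted by descending score, truncated to the top k
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(y_true[i] for i in ranked) / positives


# One user with five candidate products, three of which were reordered
y_true = [1, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.3, 0.2, 0.7]

r2 = recall_at_k(y_true, scores, 2)  # top 2 contains 1 of 3 reorders
r3 = recall_at_k(y_true, scores, 3)  # top 3 contains 2 of 3 reorders
print(r2, r3)
```

Raising `k` can only keep recall the same or increase it, so values reported at different `k` are not directly comparable.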

**Evaluation: ROC-AUC and Mean Recall@20 Comparison**

In [6]:
from sklearn.metrics import roc_auc_score
r_model = recall_at_k_per_user(
    df_model,
    k=20
)

r_base = recall_at_k_per_user(
    df_baseline.rename(columns={'reordered': 'y_true', 'pred_reordered': 'score'}),
    k=20
)

model_roc_auc = roc_auc_score(df_model['y_true'], df_model['score'])
baseline_roc_auc = roc_auc_score(df_baseline['reordered'], df_baseline['pred_reordered'])

print(f'MODEL      ROC AUC Score: {model_roc_auc:.4f}')
print(f'BASELINE   ROC AUC Score: {baseline_roc_auc:.4f}')
print(f'MODEL      Mean R@20:     {r_model:.4f}')
print(f'BASELINE   Mean R@20:     {r_base:.4f}')

MODEL      ROC AUC Score: 0.8261
BASELINE   ROC AUC Score: 0.5768
MODEL      Mean R@20:     0.7585
BASELINE   Mean R@20:     0.4576


Compared to the baseline heuristic, the XGBoost model substantially improves both global ranking quality and user-level recommendation effectiveness. The model achieves a ROC-AUC of ~0.83 versus ~0.58 for the baseline, a **43% relative improvement** in ranking performance. More importantly, at the decision level, the model retrieves on average ~76% of the items each user actually reorders within the top-20 recommendations, compared to ~46% for the baseline, a **66% relative improvement**. This demonstrates that the model not only ranks items more accurately overall, but also surfaces significantly more relevant items in the top-K positions where decisions are made.
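
The relative improvements can be recomputed from the printed metrics as (model − baseline) / baseline. The short sketch below does this arithmetic; `relative_improvement` is a hypothetical helper name, and the inputs are the four values from the evaluation output.

```python
def relative_improvement(new, old):
    """Relative gain of `new` over `old`, as a fraction of `old`."""
    return (new - old) / old


# Values copied from the evaluation output
auc_gain = relative_improvement(0.8261, 0.5768)
recall_gain = relative_improvement(0.7585, 0.4576)

print(f"ROC-AUC gain:   {auc_gain:.1%}")
print(f"Recall@20 gain: {recall_gain:.1%}")
```

Since a random ranker already scores 0.5 on ROC-AUC, a relative gain measured from zero understates how far the baseline sits from the model; comparing each score's margin over 0.5 is another reasonable view.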