![bse_logo_textminingcourse](https://bse.eu/sites/default/files/bse_logo_small.png)

# *Final Project: Contextual Bandits - Preprocessing Context Features*

## Reinforcement Learning

#### Authors: **Timothy Cassel, Marvin Ernst, Oliver Tausendschön**

Date: July 2, 2025

Instructors: *Hamish Flynn and Vincent Adam*

In this notebook, we preprocess the context features from the **Open Bandit Dataset (OBD)** to improve the performance of contextual bandit algorithms. The original context vectors are **high-dimensional and sparse**, with many features rarely active. This motivates a **dimensionality reduction step** (via PCA) to extract denser and more informative representations.

We also construct a **reduced-action dataset**, where only **10 arms** (the 5 with the highest and the 5 with the lowest click-through rate) are retained. This speeds up experimentation while preserving diversity in action outcomes.

Finally, we extract the **three most frequently selected arms** under the logging policy. These arms are useful for focused offline evaluation: they appear more often in the dataset, and with fewer candidate arms our evaluation policy is more likely to match the logged action, which makes **IPS and DR estimates more reliable**.

---

## Step 1: Load the Dataset

Libraries:

```python
from load_opb import load_obp_dataset
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from pathlib import Path
import os
```

Load the OBP-style dict:

```python
IS_RAW = False

data = load_obp_dataset(
    campaign="all",
    behavior_policy="random",
    n_rounds=None,
    use_raw_csv=IS_RAW
)
```

```
Loaded OBP dataset: 1374327 rounds, 80 actions
```
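The rest of the notebook reads only five entries from `data`: `context`, `action`, `reward`, `pscore` and `n_actions`. A quick consistency check on those entries can catch loader problems early. The helper below (`check_bandit_dict`, a hypothetical name, not part of the loader or of OBP) is a minimal sketch that assumes actions are integer ids in `[0, n_actions)` and that propensities are probabilities.

```python
import numpy as np

# The five entries this notebook reads from the loader's dict
REQUIRED_KEYS = ("context", "action", "reward", "pscore", "n_actions")


def check_bandit_dict(d):
    """Hypothetical sanity check for an OBP-style logged-bandit dict.
    Returns the number of rounds or raises on an inconsistency."""
    missing = [key for key in REQUIRED_KEYS if key not in d]
    if missing:
        raise KeyError(f"missing keys: {missing}")
    n_rounds = len(d["action"])
    # Every per-round array needs one entry (or row) per logged round
    for key in ("context", "reward", "pscore"):
        if len(d[key]) != n_rounds:
            raise ValueError(f"{key} has {len(d[key])} rows, expected {n_rounds}")
    # Assumption: actions are integer ids in [0, n_actions)
    actions = np.asarray(d["action"])
    if actions.min() < 0 or actions.max() >= d["n_actions"]:
        raise ValueError("action ids outside [0, n_actions)")
    # Propensities must be positive probabilities, otherwise IPS weights 1/p are undefined
    pscore = np.asarray(d["pscore"], dtype=float)
    if np.any(pscore <= 0) or np.any(pscore > 1):
        raise ValueError("pscore values must lie in (0, 1]")
    return n_rounds


# Made-up toy dict; in the notebook the call would be check_bandit_dict(data)
toy = {
    "context": np.zeros((4, 3)),
    "action": np.array([0, 1, 2, 1]),
    "reward": np.array([0, 1, 0, 0]),
    "pscore": np.full(4, 1 / 3),
    "n_actions": 3,
}
toy_rounds = check_bandit_dict(toy)
```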




## Step 2: Drop Sparse Context Features

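We drop context features whose mean activation (the fraction of rounds in which a one-hot feature equals 1) is below `threshold`. With `threshold=0.01`, a feature must be active in at least 1% of rounds to be kept.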

```python
def drop_sparse_features(X, threshold=0.01):
    # For 0/1 features, the column mean is the fraction of rounds where the feature is active
    feature_means = X.mean(axis=0)
    keep_mask = feature_means >= threshold
    print(f"Keeping {keep_mask.sum()} out of {len(keep_mask)} features")
    return X[:, keep_mask], keep_mask
```

```python
context_matrix = data["context"]
context_dropped, keep_mask = drop_sparse_features(context_matrix, threshold=0.01)
```

```
Keeping 18 out of 26 features
```
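On one-hot features, the column mean is exactly the share of rounds in which the feature is active, so `threshold` compares directly to an activation rate. The toy matrix below (made-up data, not OBD) shows the rule keeping an always-on and a half-on feature while dropping one that fires in a single round out of 200. For non-binary columns the mean would not have this interpretation.

```python
import numpy as np

# Made-up one-hot context matrix: 200 rounds, 3 binary features
X_toy = np.zeros((200, 3))
X_toy[:, 0] = 1      # active in every round
X_toy[:100, 1] = 1   # active in half of the rounds
X_toy[0, 2] = 1      # active in a single round (1 / 200 = 0.5%)

toy_activation = X_toy.mean(axis=0)   # share of rounds where each feature is 1
toy_keep = toy_activation >= 0.01     # same rule as drop_sparse_features
X_toy_kept = X_toy[:, toy_keep]
```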


## Step 3: PCA for Dimensionality Reduction

We fit PCA and keep the smallest number of principal components whose cumulative explained variance exceeds `variance_threshold` (99% by default).

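```python
def apply_pca(X, variance_threshold=0.99):
    # A float n_components in (0, 1) makes scikit-learn choose the number of
    # components needed to explain that fraction of the variance
    pca = PCA(n_components=variance_threshold)
    X_reduced = pca.fit_transform(X)
    print(f"Reduced dimensions from {X.shape[1]} to {X_reduced.shape[1]}")
    return X_reduced, pca
```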

```python
X_reduced, pca_model = apply_pca(context_dropped)
```

```
Reduced dimensions from 18 to 12
```
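scikit-learn interprets a float `n_components` between 0 and 1 as a target share of explained variance. To make that selection rule explicit, here is a plain-NumPy sketch (hypothetical helper `pca_by_variance`) that computes the explained variance from an SVD of the centered data and keeps the fewest components whose cumulative ratio exceeds the threshold. It is an illustration, not a replacement for `PCA`; the signs of the projected coordinates may differ from scikit-learn's output.

```python
import numpy as np


def pca_by_variance(X, variance_threshold=0.99):
    """Plain-NumPy PCA sketch: keep the fewest components whose cumulative
    explained-variance ratio exceeds variance_threshold."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Squared singular values of the centered data are proportional to the
    # variance captured along each principal axis (the rows of Vt)
    _, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    ratio_cumsum = np.cumsum(s**2) / np.sum(s**2)
    # First position where the cumulative ratio is strictly above the threshold
    k = int(np.searchsorted(ratio_cumsum, variance_threshold, side="right")) + 1
    k = min(k, len(s))
    return X_centered @ Vt[:k].T, ratio_cumsum


# Made-up data: three orthogonal, centered columns with very different spread
X_pca_toy = np.column_stack([
    [-10.0, 10.0, -10.0, 10.0],
    [1.0, 1.0, -1.0, -1.0],
    [0.1, -0.1, -0.1, 0.1],
])
Z_95, ratios = pca_by_variance(X_pca_toy, 0.95)    # first axis alone explains about 99.0%
Z_999, _ = pca_by_variance(X_pca_toy, 0.999)      # needs the first two axes (about 99.99%)
```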


## Step 4: Save Preprocessed Features

```python
output_dir = Path("processed")
os.makedirs(output_dir, exist_ok=True)

np.save(output_dir / "context_reduced.npy", X_reduced)
np.save(output_dir / "context_mask.npy", keep_mask)
np.save(output_dir / "actions.npy", data["action"])
np.save(output_dir / "rewards.npy", data["reward"])
np.save(output_dir / "pscores.npy", data["pscore"])

with open(output_dir / "meta.txt", "w") as f:
    f.write(f"n_actions: {data['n_actions']}\n")

print("Saved preprocessed data to 'processed/'")
```

```
Saved preprocessed data to 'processed/'
```
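A later notebook only needs the files written above. A minimal loader sketch, assuming the file names used in this step and the `key: value` layout of `meta.txt` (`load_processed` is a hypothetical name), reloads the per-round arrays and checks that they all have the same length. `context_mask.npy` is per-feature rather than per-round, so it is left out of the check.

```python
import tempfile
from pathlib import Path

import numpy as np

# File names mirror the np.save calls above
PER_ROUND_FILES = ("context_reduced", "actions", "rewards", "pscores")


def load_processed(output_dir):
    """Hypothetical loader: reload per-round arrays and meta.txt from output_dir."""
    out_dir = Path(output_dir)
    arrays = {name: np.load(out_dir / f"{name}.npy") for name in PER_ROUND_FILES}
    n_rounds = len(arrays["actions"])
    for name, arr in arrays.items():
        if len(arr) != n_rounds:
            raise ValueError(f"{name}.npy has {len(arr)} rows, expected {n_rounds}")
    # meta.txt holds "key: value" lines; values are returned as strings
    meta = {}
    for line in (out_dir / "meta.txt").read_text().splitlines():
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return arrays, meta


# Round trip on made-up arrays written to a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    np.save(tmp_dir / "context_reduced.npy", np.zeros((5, 2)))
    np.save(tmp_dir / "actions.npy", np.array([0, 1, 1, 2, 0]))
    np.save(tmp_dir / "rewards.npy", np.array([0, 1, 0, 0, 1]))
    np.save(tmp_dir / "pscores.npy", np.full(5, 1 / 80))
    (tmp_dir / "meta.txt").write_text("n_actions: 80\n")
    loaded_arrays, loaded_meta = load_processed(tmp_dir)
```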


## Step 5: Create Reduced Action Dataset (10 Arms Only)

We select the 5 arms with the highest and the 5 with the lowest click-through rate (CTR), and keep all observations for those actions only.

**1: Count how often each action was logged**

```python
unique_actions, action_counts = np.unique(data["action"], return_counts=True)
rewards = data["reward"]
```

Compute click-through rates per action:

```python
# Total clicks per action divided by the number of times that action was logged
clicks_per_action = np.array([rewards[data["action"] == a].sum() for a in unique_actions])
ctr_per_action = clicks_per_action / action_counts
```
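The list comprehension above scans the full action array once per arm. The same impressions, clicks and CTRs can be computed in a single pass with `np.unique(..., return_inverse=True)` and a weighted `np.bincount`. A sketch on made-up data (`ctr_per_arm` is a hypothetical helper name):

```python
import numpy as np


def ctr_per_arm(actions, rewards):
    """Hypothetical helper: impressions, clicks and CTR per arm in one pass."""
    arms, inverse, impressions = np.unique(actions, return_inverse=True, return_counts=True)
    # inverse[i] is the position of actions[i] in arms; a weighted bincount
    # sums the rewards that fall into each arm's bucket
    clicks = np.bincount(inverse.ravel(), weights=rewards, minlength=len(arms))
    return arms, impressions, clicks, clicks / impressions


# Made-up log: arm 3 shown twice, arm 7 three times, arm 9 once
toy_actions = np.array([3, 3, 7, 7, 7, 9])
toy_rewards = np.array([1, 0, 0, 0, 1, 0])
toy_arms, toy_impressions, toy_clicks, toy_ctr = ctr_per_arm(toy_actions, toy_rewards)
```

In the notebook, `ctr_per_arm(data["action"], data["reward"])` would return the same `unique_actions`, `action_counts`, `clicks_per_action` and `ctr_per_action` as the two cells above.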

Select the 5 lowest-CTR and 5 highest-CTR arms:

```python
sorted_indices = np.argsort(ctr_per_action)  # ascending CTR
low_arms = unique_actions[sorted_indices[:5]]
high_arms = unique_actions[sorted_indices[-5:]]

selected_arms = np.sort(np.concatenate([low_arms, high_arms]))
print("Selected Arms (original IDs):", selected_arms)
```

```
Selected Arms (original IDs): [ 4 35 54 60]
```


**2: Filter dataset**

```python
mask = np.isin(data["action"], selected_arms)
filtered_context = data["context"][mask]
filtered_action = data["action"][mask]
filtered_reward = data["reward"][mask]
filtered_pscore = data["pscore"][mask]
```

Reindex actions to 0–9:

```python
original_to_reduced = {orig: idx for idx, orig in enumerate(selected_arms)}
reduced_action = np.array([original_to_reduced[a] for a in filtered_action])
```
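Because `selected_arms` is sorted by `np.sort`, the dictionary lookup can also be replaced by `np.searchsorted`, which returns each action's position in the sorted array, i.e. its new id. A minimal sketch on made-up arm ids, with a round-trip check that every action really belongs to the selected set:

```python
import numpy as np

# Made-up ids; np.sort guarantees the real selected_arms is sorted the same way
toy_selected = np.array([4, 12, 35, 54, 60])
toy_filtered = np.array([35, 4, 60, 35, 12, 54])

# For a sorted array, searchsorted gives each value's index, i.e. its new id
toy_reduced = np.searchsorted(toy_selected, toy_filtered)

# searchsorted does not check membership: an id outside toy_selected would get a
# neighbouring index (or one past the end), so verify the round trip
assert np.array_equal(toy_selected[toy_reduced], toy_filtered)
```

Applied to the notebook's arrays, this would read `reduced_action = np.searchsorted(selected_arms, filtered_action)`.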

**3: Save reduced dataset**

```python
output_dir_small = Path("processed_small")
os.makedirs(output_dir_small, exist_ok=True)

np.save(output_dir_small / "context_reduced.npy", X_reduced[mask])
np.save(output_dir_small / "actions.npy", reduced_action)
np.save(output_dir_small / "rewards.npy", filtered_reward)
np.save(output_dir_small / "pscores.npy", filtered_pscore)

with open(output_dir_small / "meta.txt", "w") as f:
    f.write(f"original_actions: {selected_arms.tolist()}\n")
    # Derive the arm count from the selection instead of hard-coding 10
    f.write(f"n_actions: {len(selected_arms)}\n")

print("Saved reduced dataset to 'processed_small/'")
```

```
Saved reduced dataset to 'processed_small/'
```


## Step 6: Create Focused Dataset (Top 3 Most Frequently Logged Arms)

We extract the **three most frequently selected arms** under the random logging policy. This subset increases the chance of action overlap and enables more reliable IPS/DR evaluation.
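To see why overlap matters, recall the standard IPS estimator for a deterministic evaluation policy $\pi$: $\hat V_{\text{IPS}} = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}\{\pi(x_i) = a_i\}\, r_i / p_i$, where $a_i$, $r_i$ and $p_i$ are the logged action, reward and propensity. Rounds where $\pi$ disagrees with the logged action contribute exactly zero, so the estimate rests only on matched rounds. A minimal NumPy sketch on made-up data (`ips_estimate` is a hypothetical helper, not the OBP implementation):

```python
import numpy as np


def ips_estimate(logged_actions, rewards, pscores, eval_actions):
    """Hypothetical helper: IPS value estimate of a deterministic evaluation policy.

    eval_actions[i] is the arm the evaluation policy picks in round i.
    Returns the estimate and the share of rounds where it matches the log."""
    match = np.asarray(eval_actions) == np.asarray(logged_actions)
    # Unmatched rounds get weight 0; matched rounds are reweighted by 1 / propensity
    weights = match / np.asarray(pscores, dtype=float)
    value = float(np.mean(weights * np.asarray(rewards, dtype=float)))
    return value, float(match.mean())


# Made-up log over 3 arms with uniform propensities 1/3
toy_logged = np.array([0, 1, 2, 0, 1, 2])
toy_clicks = np.array([1, 0, 0, 1, 1, 0])
toy_pscores = np.full(6, 1 / 3)
# Evaluation policy that always picks arm 0: it matches 2 of the 6 rounds
toy_value, toy_match_rate = ips_estimate(
    toy_logged, toy_clicks, toy_pscores, np.zeros(6, dtype=int)
)
```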

Count frequency of each action:

```python
unique_actions, counts = np.unique(data["action"], return_counts=True)
sorted_indices = np.argsort(counts)[::-1]
top3_arms = unique_actions[sorted_indices[:3]]
```
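`np.argsort(counts)[::-1]` uses NumPy's default sort, which is not stable, so if two arms were logged equally often, which one ends up in the top 3 is not pinned down. When the logging policy is close to uniform, the counts are similar and ties are possible. A sketch of a deterministic variant (hypothetical `top_k_arms`) sorts the negated counts with `kind="stable"`, so ties are broken by the smaller arm id:

```python
import numpy as np


def top_k_arms(actions, k=3):
    """Hypothetical helper: the k most frequently logged arms, ties broken by smaller id."""
    arms, counts = np.unique(actions, return_counts=True)  # arms come out ascending
    # Stable sort on negated counts: descending by count, equal counts keep ascending id order
    order = np.argsort(-counts, kind="stable")
    return arms[order[:k]], counts[order[:k]]


# Made-up log in which arms 2 and 5 are tied with two impressions each
toy_top_arms, toy_top_counts = top_k_arms(np.array([5, 5, 2, 2, 9, 9, 9, 1]), k=3)
```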

Filter dataset for those arms only:

```python
mask_top3 = np.isin(data["action"], top3_arms)
context_top3 = data["context"][mask_top3]
action_top3 = data["action"][mask_top3]
reward_top3 = data["reward"][mask_top3]
pscore_top3 = data["pscore"][mask_top3]
```

Reindex actions to 0–2:

```python
original_to_top3 = {orig: idx for idx, orig in enumerate(top3_arms)}
action_top3_reindexed = np.array([original_to_top3[a] for a in action_top3])
```

Save to new directory:

```python
output_dir_top3 = Path("processed_top3")
output_dir_top3.mkdir(exist_ok=True)

np.save(output_dir_top3 / "context_reduced.npy", X_reduced[mask_top3])
np.save(output_dir_top3 / "actions.npy", action_top3_reindexed)
np.save(output_dir_top3 / "rewards.npy", reward_top3)
np.save(output_dir_top3 / "pscores.npy", pscore_top3)

with open(output_dir_top3 / "meta.txt", "w") as f:
    f.write(f"original_actions: {top3_arms.tolist()}\n")
    f.write("n_actions: 3\n")

print("Saved focused dataset with top 3 arms to 'processed_top3/'")
```

```
Saved focused dataset with top 3 arms to 'processed_top3/'
```
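Steps 5 and 6 repeat the same filter, reindex and save pattern. One way to factor it out is a single helper, sketched below under the assumption that the file names and `meta.txt` format above should be kept (`save_subset` is a hypothetical name). It relabels arms in the order they are passed, using a lookup array instead of a dict, and writes `n_actions` from the number of arms rather than a hard-coded value.

```python
import tempfile
from pathlib import Path

import numpy as np


def save_subset(out_dir, arms, actions, contexts, rewards, pscores):
    """Hypothetical helper: keep rounds whose logged action is in `arms`, relabel
    those actions 0..len(arms)-1 in the order given, and save them as in Steps 5 and 6."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    arms = np.asarray(arms)
    actions = np.asarray(actions)
    mask = np.isin(actions, arms)
    # Lookup array from original arm id to new id (-1 for arms that are not kept)
    lookup = np.full(max(actions.max(), arms.max()) + 1, -1)
    lookup[arms] = np.arange(len(arms))
    np.save(out_dir / "context_reduced.npy", contexts[mask])
    np.save(out_dir / "actions.npy", lookup[actions[mask]])
    np.save(out_dir / "rewards.npy", rewards[mask])
    np.save(out_dir / "pscores.npy", pscores[mask])
    (out_dir / "meta.txt").write_text(
        f"original_actions: {arms.tolist()}\nn_actions: {len(arms)}\n"
    )
    return int(mask.sum())


# Round trip on made-up data: keep arms 3 and 0 (in that order) out of 0, 1, 3, 5
with tempfile.TemporaryDirectory() as tmp:
    toy_log_actions = np.array([0, 3, 5, 3, 1])
    n_kept = save_subset(
        tmp, [3, 0], toy_log_actions,
        contexts=np.arange(10.0).reshape(5, 2),
        rewards=np.array([1, 0, 0, 1, 0]),
        pscores=np.full(5, 0.2),
    )
    saved_actions = np.load(Path(tmp) / "actions.npy")
    meta_text = (Path(tmp) / "meta.txt").read_text()
```

In this notebook the top-3 subset would then be written with `save_subset("processed_top3", top3_arms, data["action"], X_reduced, data["reward"], data["pscore"])`.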


## Summary

- Dropped rarely active context features (<1% activation)
- Applied PCA to reduce dimensionality while retaining 99% of the variance
- Created **reduced dataset** with 10 selected arms (5 highest-CTR + 5 lowest-CTR)
- Created **focused dataset** with the 3 most frequently selected arms under the random logging policy
- Saved all cleaned datasets for fast evaluation and reliable offline policy assessment

This preprocessing pipeline supports both:
- **Full-data training and benchmarking**, and
- **Faster, more statistically robust experimentation** using smaller action subsets that improve offline evaluation reliability (e.g., IPS, DR).