# Lab 3: Contextual Bandit-Based News Article Recommendation

**`Course`:** Reinforcement Learning Fundamentals  
**`Student Name`:**  
**`Roll Number`:**  
**`GitHub Branch`:** firstname_U20230xxx  

# Imports and Setup

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score

from rlcmab_sampler import sampler


# Load Datasets

In [4]:
# Load datasets
news_df = pd.read_csv("data/news_articles.csv")
train_users = pd.read_csv("data/train_users.csv")
test_users = pd.read_csv("data/test_users.csv")

news_df.head()

|   | link | headline | category | short_description | authors | date |
|---|------|----------|----------|-------------------|---------|------|
| 0 | https://www.huffpost.com/entry/covid-boosters-... | Over 4 Million Americans Roll Up Sleeves For O... | U.S. NEWS | Health experts said it is too early to predict... | Carla K. Johnson, AP | 2022-09-23 |
| 1 | https://www.huffpost.com/entry/american-airlin... | American Airlines Flyer Charged, Banned For Li... | U.S. NEWS | He was subdued by passengers and crew when he ... | Mary Papenfuss | 2022-09-23 |
| 2 | https://www.huffpost.com/entry/funniest-tweets... | 23 Of The Funniest Tweets About Cats And Dogs ... | COMEDY | "Until you have a dog you don't understand wha... | Elyse Wanshel | 2022-09-23 |
| 3 | https://www.huffpost.com/entry/funniest-parent... | The Funniest Tweets From Parents This Week (Se... | PARENTING | "Accidentally put grown-up toothpaste on my to... | Caroline Bologna | 2022-09-23 |
| 4 | https://www.huffpost.com/entry/amy-cooper-lose... | Woman Who Called Cops On Black Bird-Watcher Lo... | U.S. NEWS | Amy Cooper accused investment firm Franklin Te... | Nina Golgowski | 2022-09-22 |


In [5]:
train_users.head()

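|   | user_id | age | income | clicks | purchase_amount | label |
|---|---------|-----|--------|--------|-----------------|-------|
| 0 | 1 | 28 | 58242 | 81 | 378.38 | user3 |
| 1 | 2 | 28 | 38225 | 21 | 114.5 | user3 |
| 2 | 3 | 39 | 95017 | 41 | 66.24 | user2 |
| 3 | 4 | 52 | 33473 | 98 | 496.88 | user3 |
| 4 | 5 | 29 | 80690 | 5 | 293.24 | user1 |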


## Data Preprocessing

In this section:
- Handle missing values
- Encode categorical features
- Prepare data for user classification

In [12]:
for name, df in {
    "news": news_df,
    "train_users": train_users,
    "test_users": test_users
}.items():
    print(f"\n{name}")
    print(f"Total entries: {len(df)}")
    print(df.isna().sum())

print(f'\n{(news_df["category"] == "").sum()=}')
print(f'{(train_users["label"] == "").sum()=}')
print(f'{(test_users["label"] == "").sum()=}')


news
Total entries: 209527
link                     0
headline                 6
category                 0
short_description    19712
authors              37418
date                     0
dtype: int64

train_users
Total entries: 2000
user_id            0
age                0
income             0
clicks             0
purchase_amount    0
label              0
dtype: int64

test_users
Total entries: 2000
user_id            0
age                0
income             0
clicks             0
purchase_amount    0
label              0
dtype: int64

(news_df["category"] == "").sum()=np.int64(0)
(train_users["label"] == "").sum()=np.int64(0)
(test_users["label"] == "").sum()=np.int64(0)


**Null Value Handling**
1. Every news article has a non-null, non-empty category.
2. Every user in both the train and test datasets has a non-null, non-empty label.

Therefore, I leave the data unchanged. The only missing values are in `headline` (6), `short_description` (19,712), and `authors` (37,418), and none of these columns is used to classify users or to encode news categories as arms.

## User Classification

Train a classifier to predict the user category (`user1`, `user2`, `user3`),
which serves as the **context** for the contextual bandit.


In [13]:
import classifiers as myclsf

# Drop user_id col 
train_users = train_users.drop(columns=["user_id"], errors='ignore')
test_users = test_users.drop(columns=["user_id"], errors='ignore')

X_train, y_train, lr = myclsf.preprocess(train_users)
X_val, y_val, _ = myclsf.preprocess(test_users)

results, models = myclsf.train_classifiers(X_train, y_train, X_val, y_val)

results

STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT

Increase the number of iterations to improve the convergence (max_iter=1000).
You might also want to scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


{'Logistic Regression': 0.3365,
 'Random Forest': 0.325,
 'SVM (RBF)': 0.3225,
 'KNN': 0.303}

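Of the four classifiers tried, Logistic Regression has the best test accuracy (0.3365), even though it stopped at the iteration limit without converging. All four accuracies lie close to 1/3, the accuracy of guessing uniformly at random among three classes, so the predicted user context appears to carry only a weak signal about the true user type.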
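The lbfgs convergence warning above suggests scaling the data. A minimal sketch of that remedy, assuming raw features on very different scales (income around 1e5 vs. clicks around 1e2), wraps `StandardScaler` and `LogisticRegression` in one scikit-learn pipeline. It runs on synthetic stand-in data with the same four numeric columns, not on the lab's dataset, and it is not known whether `myclsf.preprocess` already scales the features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data with the same four numeric features as
# train_users (age, income, clicks, purchase_amount); not the real dataset.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.integers(18, 70, size=300),            # age
    rng.integers(20_000, 120_000, size=300),   # income
    rng.integers(0, 100, size=300),            # clicks
    rng.uniform(0, 500, size=300),             # purchase_amount
]).astype(float)
y = rng.choice(["user1", "user2", "user3"], size=300)

# Standardizing puts income and clicks on a comparable scale, which usually
# lets lbfgs converge well within max_iter. The scaler is fit on the
# training data only and reused unchanged at prediction time.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X, y)
pred = clf.predict(X[:5])
```

Because the scaler lives inside the pipeline, `clf.predict` on test users applies the training-set mean and standard deviation automatically, avoiding a separate fit on the test data.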

# Contextual Bandit

## Reward Sampler Initialization

The sampler is initialized using the student's roll number `i`.
Rewards are obtained using `sampler.sample(j)`.
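The real `sampler` comes from the course package `rlcmab_sampler`, whose constructor signature and reward distribution are not documented here. So that the bandit code can be exercised without it, the sketch below defines a hypothetical `MockSampler` with the same two-step usage (construct once with a roll number, then call `.sample(j)` for arm `j`), assuming 12 arms with Bernoulli rewards. The roll number `20230001` is a placeholder.

```python
import numpy as np


class MockSampler:
    """Hypothetical stand-in for rlcmab_sampler.sampler. Each of the 12 arms
    returns a Bernoulli reward with a fixed, seed-dependent mean (an
    assumption; the real sampler's reward distribution may differ)."""

    def __init__(self, roll_number: int, n_arms: int = 12):
        self._rng = np.random.default_rng(roll_number)
        self.means = self._rng.uniform(0.1, 0.9, size=n_arms)

    def sample(self, j: int) -> float:
        # Reward 1.0 with probability means[j], else 0.0.
        return float(self._rng.random() < self.means[j])


# With the course package this would presumably be: reward_sampler = sampler(i)
reward_sampler = MockSampler(roll_number=20230001)
rewards = [reward_sampler.sample(j) for j in range(12)]
```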


## Arm Mapping

| Arm Index (j) | News Category | User Context |
|--------------|---------------|--------------|
| 0–3          | Entertainment, Education, Tech, Crime | user1 |
| 4–7          | Entertainment, Education, Tech, Crime | user2 |
| 8–11         | Entertainment, Education, Tech, Crime | user3 |
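The table implies a simple formula: arm index $j = 4 \cdot \text{context} + \text{category}$. A small sketch of that mapping follows, assuming that within each block of four the categories appear in the order listed (Entertainment, Education, Tech, Crime) and that contexts are the lowercase labels from the user data. The helper names `arm_index` and `decode_arm` are hypothetical.

```python
CATEGORIES = ["Entertainment", "Education", "Tech", "Crime"]
CONTEXTS = ["user1", "user2", "user3"]


def arm_index(context: str, category: str) -> int:
    """Map a (user context, news category) pair to the sampler arm j,
    following the table above: j = 4 * context_idx + category_idx."""
    return len(CATEGORIES) * CONTEXTS.index(context) + CATEGORIES.index(category)


def decode_arm(j: int) -> tuple[str, str]:
    """Inverse mapping: arm index j back to (context, category)."""
    ctx, cat = divmod(j, len(CATEGORIES))
    return CONTEXTS[ctx], CATEGORIES[cat]
```

The category column in `news_articles.csv` is upper-case (e.g. `U.S. NEWS`), so matching article categories against these names may need a case-insensitive comparison.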

## Epsilon-Greedy Strategy

This section implements the epsilon-greedy contextual bandit algorithm.
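One way to make epsilon-greedy contextual is to keep a separate table of value estimates per user context. The sketch below assumes 3 contexts and 4 arms (the news categories from the arm-mapping table) and uses the incremental sample mean for updates; the class name `EpsilonGreedy` and its `select`/`update` interface are illustrative choices, not a prescribed API.

```python
import numpy as np


class EpsilonGreedy:
    """Contextual epsilon-greedy sketch: one row of value estimates per
    user context (3) and one column per news-category arm (4)."""

    def __init__(self, n_contexts=3, n_arms=4, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = np.random.default_rng(seed)
        self.q = np.zeros((n_contexts, n_arms))  # estimated mean reward
        self.n = np.zeros((n_contexts, n_arms))  # times each arm was pulled

    def select(self, context: int) -> int:
        if self.rng.random() < self.epsilon:
            # Explore: uniformly random arm.
            return int(self.rng.integers(self.q.shape[1]))
        # Exploit: best current estimate (ties go to the lowest index).
        return int(np.argmax(self.q[context]))

    def update(self, context: int, arm: int, reward: float) -> None:
        # Incremental sample mean: Q <- Q + (r - Q) / N.
        self.n[context, arm] += 1
        self.q[context, arm] += (reward - self.q[context, arm]) / self.n[context, arm]
```

Larger `epsilon` explores more and finds the best arm sooner but keeps paying for random pulls; smaller `epsilon` exploits more but can stay stuck on an early favorite for longer.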


## Upper Confidence Bound (UCB)

This section implements the UCB strategy for contextual bandits.
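A UCB1-style variant, again with independent statistics per user context, scores each arm as $Q(c, a) + \text{coef} \sqrt{\ln t_c / N(c, a)}$, where $t_c$ is the number of pulls so far in context $c$. The sketch below assumes an exploration coefficient `c = 2.0` as a default hyperparameter to tune; every arm is pulled once first so the bonus term never divides by zero.

```python
import numpy as np


class UCB:
    """Contextual UCB1-style sketch: separate statistics per user context.
    Score = Q + c * sqrt(ln t / N), with t the number of pulls so far in
    that context and c an exploration hyperparameter (assumed default 2)."""

    def __init__(self, n_contexts=3, n_arms=4, c=2.0):
        self.c = c
        self.q = np.zeros((n_contexts, n_arms))
        self.n = np.zeros((n_contexts, n_arms))

    def select(self, context: int) -> int:
        counts = self.n[context]
        untried = np.flatnonzero(counts == 0)
        if untried.size:
            # Pull every arm once so that N > 0 in the bonus term.
            return int(untried[0])
        t = counts.sum()
        bonus = self.c * np.sqrt(np.log(t) / counts)
        return int(np.argmax(self.q[context] + bonus))

    def update(self, context: int, arm: int, reward: float) -> None:
        # Incremental sample mean: Q <- Q + (r - Q) / N.
        self.n[context, arm] += 1
        self.q[context, arm] += (reward - self.q[context, arm]) / self.n[context, arm]
```

Unlike epsilon-greedy, exploration here is directed: rarely pulled arms get a larger bonus, and the bonus shrinks as their counts grow.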

## SoftMax Strategy

This section implements the SoftMax strategy with temperature $\tau = 1$.
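In SoftMax (Boltzmann) selection, arm $a$ is chosen in context $c$ with probability proportional to $\exp(Q(c, a) / \tau)$. A minimal sketch, assuming the same per-context value tables as the other strategies; subtracting the maximum before exponentiating avoids overflow and leaves the probabilities unchanged.

```python
import numpy as np


class SoftMax:
    """Contextual SoftMax (Boltzmann) sketch: P(a | context) is proportional
    to exp(Q[context, a] / tau)."""

    def __init__(self, n_contexts=3, n_arms=4, tau=1.0, seed=0):
        self.tau = tau
        self.rng = np.random.default_rng(seed)
        self.q = np.zeros((n_contexts, n_arms))
        self.n = np.zeros((n_contexts, n_arms))

    def probabilities(self, context: int) -> np.ndarray:
        z = self.q[context] / self.tau
        z = z - z.max()  # shift for numerical stability; probabilities unchanged
        p = np.exp(z)
        return p / p.sum()

    def select(self, context: int) -> int:
        p = self.probabilities(context)
        return int(self.rng.choice(len(p), p=p))

    def update(self, context: int, arm: int, reward: float) -> None:
        # Incremental sample mean: Q <- Q + (r - Q) / N.
        self.n[context, arm] += 1
        self.q[context, arm] += (reward - self.q[context, arm]) / self.n[context, arm]
```

If rewards lie in $[0, 1]$ (an assumption about the sampler), then with $\tau = 1$ the ratio between any two arm probabilities is at most $e \approx 2.72$, so the policy keeps exploring heavily; a smaller $\tau$ makes it greedier.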


## Reinforcement Learning Simulation

We simulate the bandit algorithms for $T = 10{,}000$ steps and record the reward at every step.

Note: $T$ can be changed if required.
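A sketch of the simulation loop follows: at each step, take the user's context, let the agent pick one of the 4 category arms, convert it to the global arm index $j = 4 \cdot \text{context} + \text{arm}$, and query the reward. Several parts are assumptions for illustration: contexts are drawn uniformly at random (in the lab they would presumably come from the classifier's predictions), a hypothetical Bernoulli `reward_fn` stands in for `sampler.sample(j)`, and a `RandomAgent` placeholder stands in for any agent exposing `select`/`update`.

```python
import numpy as np


def simulate(agent, reward_fn, contexts, n_arms=4):
    """Run one bandit episode. `contexts` holds one user context (0, 1, 2)
    per step; reward_fn(j) returns the reward for global arm
    j = n_arms * context + arm, as in the arm-mapping table."""
    rewards = np.empty(len(contexts))
    for t, ctx in enumerate(contexts):
        a = agent.select(ctx)
        r = reward_fn(n_arms * ctx + a)
        agent.update(ctx, a, r)
        rewards[t] = r
    return rewards


class RandomAgent:
    """Placeholder baseline; any agent exposing select/update fits."""

    def __init__(self, n_arms=4, seed=0):
        self.n_arms = n_arms
        self.rng = np.random.default_rng(seed)

    def select(self, context):
        return int(self.rng.integers(self.n_arms))

    def update(self, context, arm, reward):
        pass  # a random policy learns nothing


# Hypothetical environment standing in for sampler.sample(j).
env_rng = np.random.default_rng(42)
means = env_rng.uniform(0.1, 0.9, size=12)


def reward_fn(j):
    return float(env_rng.random() < means[j])


T = 10_000
contexts = env_rng.integers(0, 3, size=T)  # assumption: users arrive uniformly
rewards = simulate(RandomAgent(), reward_fn, contexts)
```

A random agent is a useful baseline: its average reward approximates the mean over all 12 arms, and a learning strategy should end up clearly above it.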


## Results and Analysis

This section presents:
- Average Reward vs Time
- Hyperparameter comparisons
- Observations and discussion
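For the "Average Reward vs Time" plot, the quantity at step $t$ is $\bar{r}_t = \frac{1}{t} \sum_{s=1}^{t} r_s$. A small sketch, assuming the rewards of each run are stored as a 1-D array; averaging the curve over several independent runs (different seeds) smooths out run-to-run noise when comparing hyperparameters.

```python
import numpy as np


def running_average(rewards):
    """Average reward up to and including each step t: (1/t) * sum_{s<=t} r_s."""
    rewards = np.asarray(rewards, dtype=float)
    return np.cumsum(rewards) / np.arange(1, len(rewards) + 1)


def average_curve(reward_runs):
    """Mean running-average curve over several independent runs
    (each element of reward_runs is one run's reward sequence)."""
    return np.mean([running_average(r) for r in reward_runs], axis=0)
```

Each strategy or hyperparameter setting can then be drawn as one line, e.g. `plt.plot(running_average(rewards), label="epsilon = 0.1")` followed by `plt.xlabel("Time step")`, `plt.ylabel("Average reward")`, and `plt.legend()`.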


## Final Observations

- Comparison of Epsilon-Greedy, UCB, and SoftMax
- Effect of hyperparameters
- Strengths and limitations of each approach
