### Random Baseline: Overview and Motivation

**Random recommendation** is the simplest possible recommendation approach: items are suggested to users uniformly at random, with no regard for user preferences or item attributes. It has no practical use in production systems, but it serves as a crucial **baseline** for evaluating other recommendation algorithms.

---

#### Why Use Random Recommendations as a Baseline?

- **Minimum performance threshold**: Any useful recommendation algorithm should significantly outperform random recommendations.
- **Sanity check**: Ensures that evaluation metrics and experimental setup are functioning correctly.
- **Identifies data biases**: Unexpectedly high performance from random recommendations can reveal hidden biases in the evaluation setup.
- **Quantifies improvements**: Provides a reference point to measure the absolute improvement of more sophisticated approaches.
- **Computation reference**: Random sampling costs almost nothing to compute, so it anchors the trade-off between algorithmic complexity and performance gains.

In [1]:
import pandas as pd
import numpy as np
import random
from tqdm import tqdm
import os
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import ndcg_score

# Set random seed for reproducibility
random.seed(42)
np.random.seed(42)

DATA_PATH = 'data_final_project/KuaiRec 2.0/data/'

### Load Data

### Dataset Description

The recommendation system is based on several CSV files that represent user interactions and video metadata:

---

#### `big_matrix.csv`
- Contains historical **user-video interactions**.
- Used for **training** user profiles.
- Includes fields like: `user_id`, `video_id`, `watch_ratio`, `play_duration`, etc.

#### `small_matrix.csv`
- Contains **test interactions**, recorded after the training period.
- Used to **evaluate** recommendation quality.

For our baseline random recommender, we only need the basic interaction data to identify which videos are available for recommendation and which users we should generate recommendations for.
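
Since the random baseline only needs identifiers and the watch ratio, the load can be restricted to those columns. The following is a minimal sketch, assuming the column names listed above; the helper name `load_interactions` and the three-row in-memory CSV are hypothetical stand-ins for the real files.

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for big_matrix.csv; the real file has more rows and columns.
SAMPLE_CSV = io.StringIO(
    "user_id,video_id,play_duration,watch_ratio\n"
    "0,3649,13838,1.27\n"
    "0,9598,13665,1.24\n"
    "1,5262,851,0.11\n"
)

def load_interactions(source, columns=("user_id", "video_id", "watch_ratio")):
    """Read only the columns the random baseline needs (illustrative helper)."""
    return pd.read_csv(source, usecols=list(columns))

sample_df = load_interactions(SAMPLE_CSV)
print(sample_df.columns.tolist())
print(sample_df["user_id"].nunique(), "users,", sample_df["video_id"].nunique(), "videos")
```

Passing a file path instead of the `StringIO` object would work the same way; `usecols` simply skips the columns that are not requested.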

In [2]:
print("Loading datasets...")

# Load interaction data
interactions_train = pd.read_csv(os.path.join(DATA_PATH, "big_matrix.csv"))
interactions_test = pd.read_csv(os.path.join(DATA_PATH, "small_matrix.csv"))

print(f"Train interactions: {interactions_train.shape}")
print(f"Test interactions: {interactions_test.shape}")

Loading datasets...
Train interactions: (12530806, 8)
Test interactions: (4676570, 8)


### Data Preprocessing

In [3]:
# Report how many rows dropna() and drop_duplicates() remove

def preprocess(df, name):
    before = df.shape[0]
    # Clean once and reuse the result instead of recomputing it for the return value
    cleaned = df.dropna().drop_duplicates()
    after = cleaned.shape[0]
    print(f"{name}: {before} rows -> {after} rows ({before - after} removed)")
    return cleaned

interactions_train = preprocess(interactions_train, "interactions_train")
interactions_test = preprocess(interactions_test, "interactions_test")

# Process watch_ratio: clamp extreme values to keep them in a reasonable range.
# A watch_ratio above 1 can indicate replays; 2.0 is used as a reasonable upper bound.
interactions_train['watch_ratio_clamped'] = np.clip(interactions_train['watch_ratio'], 0, 2.0)
interactions_test['watch_ratio_clamped'] = np.clip(interactions_test['watch_ratio'], 0, 2.0)

# Adjust the positive interaction threshold based on data analysis
POSITIVE_THRESHOLD = 0.7
interactions_train['positive_interaction'] = (interactions_train['watch_ratio_clamped'] >= POSITIVE_THRESHOLD).astype(int)
interactions_test['positive_interaction'] = (interactions_test['watch_ratio_clamped'] >= POSITIVE_THRESHOLD).astype(int)

print(f"Positive interactions in train set: {interactions_train['positive_interaction'].sum()} / {len(interactions_train)} ({interactions_train['positive_interaction'].mean()*100:.2f}%)")
print(f"Positive interactions in test set: {interactions_test['positive_interaction'].sum()} / {len(interactions_test)} ({interactions_test['positive_interaction'].mean()*100:.2f}%)")

# Get unique users and videos
unique_users = interactions_train['user_id'].unique()
all_videos = interactions_train['video_id'].unique()

print(f"Number of unique users: {len(unique_users)}")
print(f"Number of unique videos: {len(all_videos)}")

# Create user and video ID maps for quick reference
user_id_map = {user_id: idx for idx, user_id in enumerate(unique_users)}
video_id_map = {video_id: idx for idx, video_id in enumerate(all_videos)}

print("Data loaded and preprocessed successfully!")

interactions_train: 12530806 rows -> 11564987 rows (965819 removed)
interactions_test: 4676570 rows -> 4494578 rows (181992 removed)
Positive interactions in train set: 5966291 / 11564987 (51.59%)
Positive interactions in test set: 2536880 / 4494578 (56.44%)
Number of unique users: 7176
Number of unique videos: 10728
Data loaded and preprocessed successfully!
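
To make the clamping and labelling steps concrete, here is a small sketch on made-up watch ratios. The threshold and clamp bounds are the ones used in the preprocessing cell; the toy values are hypothetical and chosen to hit each case.

```python
import pandas as pd

POSITIVE_THRESHOLD = 0.7  # same value as in the preprocessing cell

# Hypothetical watch ratios: below, just below, at, above the threshold, and an extreme replay
toy = pd.DataFrame({"watch_ratio": [0.10, 0.69, 0.70, 1.50, 7.30]})

# Clamp to [0, 2.0], then label interactions at or above the threshold as positive
toy["watch_ratio_clamped"] = toy["watch_ratio"].clip(0, 2.0)
toy["positive_interaction"] = (toy["watch_ratio_clamped"] >= POSITIVE_THRESHOLD).astype(int)

print(toy)
```

Because the threshold (0.7) is below the upper clamp (2.0), clamping never changes a label; it only bounds the stored ratio.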


### Random Recommendation Implementation

The `generate_random_recommendations()` function simply selects videos at random to recommend to users. Unlike content-based or collaborative approaches, no user profile or similarity computation is needed.

---

#### Key Features:

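- **Pure randomness**: Recommendations are drawn uniformly from the training videos and are entirely unpersonalized.
- **Optional exclusion**: Videos the user has already watched could be excluded; the function below does not do this and samples from all training videos.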
- **Baseline performance**: Establishes the minimum effectiveness for any recommendation algorithm.

---

#### Why it matters:

Random recommendations serve as a performance floor: any algorithm that cannot outperform random recommendations is not providing meaningful personalization.

In [4]:
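def generate_random_recommendations(user_id, top_n=10):
    """
    Generate random recommendations for a user.

    Returns a list of (video_id, score) tuples; the score is a placeholder
    random number, not a relevance estimate.
    """
    # Check if user exists in our data
    if user_id not in user_id_map:
        print(f"User {user_id} not found in training data")
        return []

    # Sample without replacement from all training videos (watched videos are not excluded)
    sampled = random.sample(list(all_videos), min(top_n, len(all_videos)))
    return [(video_id, random.random()) for video_id in sampled]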

# Example recommendations for a few users
for user_id in unique_users[:5]:
    recommendations = generate_random_recommendations(user_id, top_n=5)
    print(f"Random recommendations for user {user_id}: {recommendations}")

Random recommendations for user 0: [(6156, 0.22321073814882275), (6259, 0.7364712141640124), (4194, 0.6766994874229113), (532, 0.8921795677048454), (4524, 0.08693883262941615)]
Random recommendations for user 1: [(5505, 0.2326608933907396), (8516, 0.6020187290499803), (4266, 0.561245062938613), (674, 0.7160196129224035), (2496, 0.7013249735902359)]
Random recommendations for user 2: [(6561, 0.8094304566778266), (2574, 0.006498759678061017), (9979, 0.8058192518328079), (9230, 0.6981393949882269), (1519, 0.3402505165179919)]
Random recommendations for user 3: [(7105, 0.3799273006373374), (9828, 0.3589793804846284), (6120, 0.3439557224789711), (1431, 0.26452086722201307), (7376, 0.04345044292309719)]
Random recommendations for user 4: [(9327, 0.552040631273227), (10048, 0.8294046642529949), (10409, 0.6185197523642461), (6867, 0.8617069003107772), (4167, 0.577352145256762)]
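
The key features mention excluding videos a user has already watched. A minimal sketch of that variant follows; the function name, the `rng` parameter, and the toy IDs are illustrative assumptions and not part of the notebook's implementation.

```python
import random

def random_recommendations_excluding_seen(candidates, seen, top_n=10, rng=None):
    """Sample up to top_n items uniformly from candidates that are not in seen.

    Illustrative names: `candidates` plays the role of all_videos and `seen`
    the set of video_ids the user has already interacted with.
    """
    rng = rng or random.Random()
    seen = set(seen)
    pool = [item for item in candidates if item not in seen]
    # Never ask for more items than remain after filtering
    return rng.sample(pool, min(top_n, len(pool)))

# Toy usage with made-up IDs
catalog = list(range(20))
watched = {0, 1, 2, 3, 4}
picks = random_recommendations_excluding_seen(catalog, watched, top_n=5, rng=random.Random(42))
print(picks)
```

In the notebook's setting, the `seen` set for each user could presumably be built from the training interactions, for example by grouping `video_id` by `user_id`.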


### Evaluation: Ranking Quality on Test Videos

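To ensure a fair comparison with the content-based approach, we use the exact same evaluation methodology. For each test user, the function below draws `k` videos uniformly at random from every video in the training and test sets, then scores that list against the videos the user **actually watched and liked** in the test set (clamped watch ratio at or above the positive threshold).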
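
The per-user ground truth (the set of liked test videos) can be built in a single pass rather than by filtering the full table once per user. This is a sketch on a made-up frame, assuming the `user_id`, `video_id`, and `watch_ratio_clamped` columns from the preprocessing step; the variable names are illustrative.

```python
import pandas as pd

# Hypothetical test interactions for two users
toy_test = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "video_id": [10, 11, 12, 10, 13],
    "watch_ratio_clamped": [0.9, 0.2, 1.4, 0.5, 0.7],
})
positive_threshold = 0.7

# Keep liked rows only, then collect each user's liked video_ids into a set
liked_rows = toy_test[toy_test["watch_ratio_clamped"] >= positive_threshold]
positives_by_user = liked_rows.groupby("user_id")["video_id"].apply(set).to_dict()

print(positives_by_user)
```

Users with no liked videos are simply absent from the dictionary, which corresponds to the "skipped" case in the evaluation.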

---

#### Metrics Computed:

For each user, the following metrics are computed at `k` (typically 5, 10, or 20):

| Metric        | Definition                                                                 |
|---------------|---------------------------------------------------------------------------|
| `Precision@k` | Fraction of top-k recommended videos that are truly liked                 |
| `Recall@k`    | Fraction of all liked videos that appear in the top-k                     |
| `NDCG@k`      | Normalized discounted cumulative gain, rewarding relevant items near the top of the list |
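
A worked example may make the three definitions in the table concrete. The sketch below assumes binary relevance and the common log2 position discount, and it normalizes NDCG by the best ranking achievable given the user's liked videos. The function name and toy IDs are illustrative, and other NDCG conventions exist, so library results can differ.

```python
import math

def precision_recall_ndcg_at_k(ranked_ids, positives, k):
    """Binary-relevance Precision@k, Recall@k and NDCG@k for a single user."""
    top_k = ranked_ids[:k]
    hits = [1 if item in positives else 0 for item in top_k]
    precision = sum(hits) / len(top_k) if top_k else 0.0
    recall = sum(hits) / len(positives) if positives else 0.0
    # DCG: a hit at 0-based position i is discounted by log2(i + 2)
    dcg = sum(h / math.log2(i + 2) for i, h in enumerate(hits))
    # Ideal DCG: as many hits as possible packed at the top of the list
    ideal_hits = min(len(positives), len(top_k))
    idcg = sum(1 / math.log2(i + 2) for i in range(ideal_hits))
    ndcg = dcg / idcg if idcg > 0 else 0.0
    return precision, recall, ndcg

# Toy case: five recommendations, three liked videos, two of them recommended
ranked = [10, 11, 12, 13, 14]
liked = {10, 13, 99}
p, r, n = precision_recall_ndcg_at_k(ranked, liked, k=5)
print(f"precision={p:.3f} recall={r:.3f} ndcg={n:.3f}")
```

Here two of the five recommendations are liked (precision 2/5) and two of the three liked videos are retrieved (recall 2/3); NDCG is below 1 because one hit sits at position 4 and one liked video is missing.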

In [5]:
def evaluate_random_model(k=10, positive_threshold=0.7):
    """
    Evaluate the ranking quality of random recommendations on test videos.
    """
    print(f"Random baseline evaluation on test videos... (top-{k})")

    test_users = interactions_test['user_id'].unique()
    test_users = [u for u in test_users if u in user_id_map]

    # Candidate pool: every video seen in either the training or the test set.
    # It does not depend on the user, so build it once instead of once per user.
    all_possible_videos = list(np.union1d(all_videos, interactions_test['video_id'].unique()))

    precision_list = []
    recall_list = []
    ndcg_list = []
    skipped = 0

    for user_id in tqdm(test_users, desc="Evaluating users"):
        # Don't filter by videos in training set - include all test videos
        user_test_data = interactions_test[interactions_test['user_id'] == user_id]

        if user_test_data.empty:
            skipped += 1
            continue

        # Ground truth: all liked videos in test set (not just ones from training)
        positive_videos = set(
            user_test_data[user_test_data['watch_ratio_clamped'] >= positive_threshold]['video_id']
        )

        if len(positive_videos) == 0:
            skipped += 1
            continue

        # Sample k videos uniformly, without replacement, from ALL possible videos
        # so that we are truly measuring random recommendations
        recommendations = [(video_id, random.random())
                           for video_id in random.sample(all_possible_videos, min(k, len(all_possible_videos)))]
        
        ranked_video_ids = [vid for vid, _ in recommendations]

        recommended_set = set(ranked_video_ids)
        intersection = positive_videos & recommended_set

        precision = len(intersection) / len(ranked_video_ids)
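        recall = len(intersection) / len(positive_videos)  # positive_videos is non-empty here
        relevance = [1 if vid in positive_videos else 0 for vid in ranked_video_ids]
        # Note: ndcg_score builds the ideal ranking from these k relevance labels only, so this
        # NDCG is normalized within the sampled list rather than against all liked videos
        ndcg = ndcg_score([relevance], [list(range(len(ranked_video_ids), 0, -1))]) if any(relevance) else 0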

        precision_list.append(precision)
        recall_list.append(recall)
        ndcg_list.append(ndcg)

    print(f"Users evaluated: {len(precision_list)} / {len(test_users)}")
    print(f"Users skipped: {skipped}")
    
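    # Estimate the expected precision of uniform sampling: the average number of liked
    # test videos per user divided by the size of the candidate pool.
    # Note: this uses the precomputed 'positive_interaction' column (POSITIVE_THRESHOLD),
    # which agrees with the positive_threshold argument only when the two values are equal.
    all_possible_videos = np.union1d(all_videos, interactions_test['video_id'].unique())
    avg_positive_per_user = np.mean([
        len(interactions_test[(interactions_test['user_id'] == u) & 
                              (interactions_test['positive_interaction'] == 1)]['video_id'].unique())
        for u in test_users[:100]  # First 100 users only, for efficiency
    ])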
    expected_precision = avg_positive_per_user / len(all_possible_videos)
    
    print(f"Expected random precision: {expected_precision:.6f}")
    print(f"Actual precision: {np.mean(precision_list):.6f}")
    print(f"Precision ratio (actual/expected): {np.mean(precision_list)/expected_precision if expected_precision > 0 else 'N/A'}")

    return {
        'precision@k': np.mean(precision_list),
        'recall@k': np.mean(recall_list),
        'ndcg@k': np.mean(ndcg_list),
        'expected_precision': expected_precision,
        'users_evaluated': len(precision_list)
    }

# Evaluate at different k values
for k_value in [5, 10, 20]:
    results = evaluate_random_model(k=k_value, positive_threshold=0.7)
    print(f"\nRandom Baseline Results at k={k_value}:")
    print(f"Precision@{k_value}: {results['precision@k']:.4f} (expected: {results['expected_precision']:.4f})")
    print(f"Recall@{k_value}:    {results['recall@k']:.4f}")
    print(f"NDCG@{k_value}:      {results['ndcg@k']:.4f}")
    print(f"Users evaluated:   {results['users_evaluated']}")


Random baseline evaluation on test videos... (top-5)


Evaluating users: 100%|██████████| 1411/1411 [00:29<00:00, 47.42it/s]


Users evaluated: 1411 / 1411
Users skipped: 0
Expected random precision: 0.169234
Actual precision: 0.164989
Precision ratio (actual/expected): 0.9749198327906096

Random Baseline Results at k=5:
Precision@5: 0.1650 (expected: 0.1692)
Recall@5:    0.0005
NDCG@5:      0.3773
Users evaluated:   1411
Random baseline evaluation on test videos... (top-10)


Evaluating users: 100%|██████████| 1411/1411 [00:30<00:00, 46.75it/s]


Users evaluated: 1411 / 1411
Users skipped: 0
Expected random precision: 0.169234
Actual precision: 0.167257
Precision ratio (actual/expected): 0.9883207926915115

Random Baseline Results at k=10:
Precision@10: 0.1673 (expected: 0.1692)
Recall@10:    0.0009
NDCG@10:      0.4517
Users evaluated:   1411
Random baseline evaluation on test videos... (top-20)


Evaluating users: 100%|██████████| 1411/1411 [00:31<00:00, 45.25it/s]


Users evaluated: 1411 / 1411
Users skipped: 0
Expected random precision: 0.169234
Actual precision: 0.167824
Precision ratio (actual/expected): 0.991671032666737

Random Baseline Results at k=20:
Precision@20: 0.1678 (expected: 0.1692)
Recall@20:    0.0019
NDCG@20:      0.4942
Users evaluated:   1411
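
The results above can be checked against a simple expectation. When `k` videos are drawn uniformly without replacement from a pool of `N`, and a user has `P` liked videos, the expected Precision@k is `P / N` and the expected Recall@k is `k / N`. If the pool holds the 10,728 videos reported earlier (an assumption: it holds only if every test video also appears in the training set), `k / N` is about 0.0005, 0.0009, and 0.0019 for k = 5, 10, and 20, in line with the recall values printed above. The sketch below checks both expectations by simulation on a hypothetical smaller catalogue; the function name and the sizes are illustrative.

```python
import random

def simulate_random_baseline(n_items, n_positive, k, trials=20000, seed=0):
    """Monte Carlo estimate of mean precision@k and recall@k under uniform sampling."""
    rng = random.Random(seed)
    positives = set(range(n_positive))  # which items count as positive is arbitrary
    total_hits = 0
    for _ in range(trials):
        sample = rng.sample(range(n_items), k)  # k distinct items, uniformly at random
        total_hits += sum(1 for item in sample if item in positives)
    mean_hits = total_hits / trials
    return mean_hits / k, mean_hits / n_positive

# Hypothetical small catalogue so the simulation runs quickly
N, P, K = 1000, 170, 10
sim_precision, sim_recall = simulate_random_baseline(N, P, K)
print(f"precision: simulated={sim_precision:.4f} expected={P / N:.4f}")
print(f"recall:    simulated={sim_recall:.4f} expected={K / N:.4f}")
```

Precision does not depend on `k` in expectation, which is consistent with the nearly constant precision across k = 5, 10, and 20, while recall grows linearly with `k`.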
