### Step 1: Environment Setup and Library Imports

This cell prepares the notebook for execution in Google Colab. It includes three key actions:

1. Mounting Google Drive:  
   Gives the notebook access to the CSV files (e.g., user reviews and listing metadata) stored in your Google Drive.

2. Installing Required Packages:  
   The `lightfm` package is used to build hybrid recommendation models combining collaborative and content-based filtering.

3. Importing Dependencies:  
   This includes general-purpose libraries (`numpy`, `pandas`, `os`, `itertools.product`) as well as the LightFM classes and functions used for model construction and evaluation. `csr_matrix` from `scipy.sparse` is imported for working with sparse matrices.

These steps are necessary to ensure that all subsequent code cells execute correctly and reproducibly in the Colab environment.
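
Since the later cells read two CSV files from the mounted Drive, a quick existence check right after mounting can surface a wrong path early. The helper below is a minimal sketch; `missing_paths` is a hypothetical name and not part of the notebook.

```python
import os

def missing_paths(paths):
    """Return the subset of `paths` that does not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]

# Paths used later in this notebook; adjust them to your own Drive layout.
expected = [
    "/content/drive/MyDrive/Reviews.csv",
    "/content/drive/MyDrive/final_listings.csv",
]
for path in missing_paths(expected):
    print(f"Missing input file: {path}")
```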


In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
!pip install lightfm



In [4]:
import numpy as np
import pandas as pd
import os
from itertools import product
from lightfm import LightFM
from lightfm.data import Dataset
from lightfm.evaluation import precision_at_k, recall_at_k, auc_score
from scipy.sparse import csr_matrix

# Set paths
reviews_path = "/content/drive/MyDrive/Reviews.csv"
listings_path = "/content/drive/MyDrive/final_listings.csv"

### Step 2: Feature Engineering Function

This cell defines a feature extraction function generator, `make_feature_fn()`, which constructs interpretable item-level features from the Airbnb listing metadata. The function takes six binary flags as input, each representing whether to include a particular category of features:

- `room_type` (e.g., Entire home/apt, Private room)
- `property_type` (e.g., Apartment, House)
- `loc_cluster` (geographical cluster label assigned to each listing)
- `price` (binned into low, mid, high)
- `review_scores_rating` (high quality if rating > 90)
- `accommodates` (number of guests the listing can host)

Each listing is transformed into a tuple of the form `(item_idx, [feature_1, feature_2, ...])`, which can be directly passed into LightFM’s item feature construction method.
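
The six flags are positional, so their order matters. Later in the notebook they are encoded as a six-character string such as `'001010'` and decoded with `list(map(int, ...))`. The sketch below mirrors that decoding and adds names for readability; `FLAG_NAMES` and `decode_flags` are illustrative names introduced here, not notebook functions.

```python
# Order matches the parameters of make_feature_fn().
FLAG_NAMES = ["room_type", "property_type", "loc_cluster",
              "price", "review_scores_rating", "accommodates"]

def decode_flags(feat):
    """Map a string like '001010' to the names of the enabled feature groups."""
    if len(feat) != len(FLAG_NAMES) or set(feat) - {"0", "1"}:
        raise ValueError(f"expected {len(FLAG_NAMES)} characters of 0/1, got {feat!r}")
    return [name for name, bit in zip(FLAG_NAMES, feat) if bit == "1"]

print(decode_flags("001010"))  # ['loc_cluster', 'review_scores_rating']
```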

#### Why this step is important:
- It enables hybrid modeling by allowing the recommender system to consider both interaction data and content features.
- It supports experimentation with different combinations of features during the grid search.
- It provides interpretability to the feature space used for training.

This feature engineering step aligns with Section 1 of the final report and demonstrates our group’s contribution in selecting meaningful, domain-informed features.
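
To make the output format concrete, here is a simplified, pure-Python sketch of the same idea for a single listing represented as a dict. The thresholds (50, 100, 90) come from the notebook; the helper name `listing_features` and the sample listing are made up for illustration, and only three of the six feature groups are shown. In this sketch a missing value produces no feature at all.

```python
import math

def _is_missing(value):
    """True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def listing_features(item_idx, listing):
    """Build an (item_idx, [feature strings]) tuple for one listing."""
    feats = []
    loc = listing.get("loc_cluster")
    if not _is_missing(loc):
        feats.append(f"loc={loc}")
    price = listing.get("price")
    if not _is_missing(price):
        p = float(price)
        feats.append("price=low" if p < 50 else "price=mid" if p < 100 else "price=high")
    rating = listing.get("review_scores_rating")
    if not _is_missing(rating) and float(rating) > 90:
        feats.append("rating=high")
    return (item_idx, feats)

sample = {"loc_cluster": 3, "price": 75.0, "review_scores_rating": 96}
print(listing_features(0, sample))  # (0, ['loc=3', 'price=mid', 'rating=high'])
```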


In [5]:
def make_feature_fn(use_room, use_prop, use_loc, use_price, use_rating, use_accom):
    def fn(items):
        out = []
        for _, row in items.iterrows():
            f = []
            if use_room and pd.notna(row.get('room_type')): f.append(f"room={row['room_type']}")
            if use_prop and pd.notna(row.get('property_type')): f.append(f"prop={row['property_type']}")
            if use_loc and pd.notna(row.get('loc_cluster')): f.append(f"loc={row['loc_cluster']}")
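            if use_price:
                try:
                    p = float(row.get('price'))
                    # Skip missing prices; a NaN would otherwise fall through to "high".
                    if pd.notna(p):
                        if p < 50: f.append("price=low")
                        elif p < 100: f.append("price=mid")
                        else: f.append("price=high")
                except (TypeError, ValueError): pass
            if use_rating:
                try:
                    r = float(row.get('review_scores_rating', 0))
                    if r > 90: f.append("rating=high")
                except (TypeError, ValueError): pass
            if use_accom:
                try:
                    # int(NaN) raises ValueError, so missing values add no feature.
                    a = int(row.get('accommodates', 0))
                    f.append(f"accom={a}")
                except (TypeError, ValueError, OverflowError): pass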
            out.append((row['item_idx'], f))
        return out
    return fn

### Step 3: Data Loading and Preprocessing

This cell defines the `load_data()` function, which is responsible for loading and preprocessing the user-item interaction data and listing metadata. It performs the following key operations:

1. **Read and clean user interaction data**  
   - Loads the review dataset and renames its four columns, in file order, to `item_id`, `review_id`, `date`, and `user_id`.
   - Converts the `date` column to datetime format for later use in time-based splitting.
   - Creates a binary target column `booked`, where a missing review ID is interpreted as 0 and a present review ID as 1.

2. **Apply interaction frequency filters**  
   - Filters out items, then users, that fall below the minimum interaction thresholds `min_item_inter` and `min_user_inter`. Each filter runs once, in that order. This reduces sparsity and tends to make training more stable.

3. **Map IDs to integer indices**  
   - Creates mappings from user and item IDs to integer indices, which are required by LightFM.

4. **Load and filter listing metadata**  
   - Loads the listing data and filters it to include only those items that remain after preprocessing the interaction data.
   - Adds `item_idx` to align the listings with the interaction matrix.
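
The ID-to-index mapping in step 3 amounts to numbering each distinct ID in order of first appearance. A minimal sketch with made-up IDs (`build_index` is an illustrative helper, not a notebook function):

```python
def build_index(ids):
    """Assign consecutive integers to distinct IDs in order of first appearance."""
    return {raw_id: idx for idx, raw_id in enumerate(dict.fromkeys(ids))}

user_ids = ["u42", "u7", "u42", "u13"]   # hypothetical raw IDs
user2idx = build_index(user_ids)
print(user2idx)                           # {'u42': 0, 'u7': 1, 'u13': 2}
print([user2idx[u] for u in user_ids])    # [0, 1, 0, 2]
```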

#### Why this step is important:
- It ensures data quality by filtering out sparse users and items.
- It aligns the structure of the data with the input format required by LightFM.
- It supports reproducibility and modularity, allowing data preparation to be repeated consistently across different experimental configurations.

This function plays a foundational role in the pipeline and directly contributes to the performance of the recommendation model by ensuring a clean and consistent dataset.
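
One detail worth knowing: `load_data()` applies the item filter first and the user filter second, each exactly once. Removing users afterwards can push some items back under `min_item_inter`. The pure-Python sketch below mirrors the two pandas filters on toy data to show the effect; the data and the helper name `filter_once` are invented for illustration.

```python
from collections import Counter

def filter_once(pairs, min_user_inter, min_item_inter):
    """Apply the item filter, then the user filter, a single time each."""
    item_counts = Counter(item for item, _ in pairs)
    pairs = [(i, u) for i, u in pairs if item_counts[i] >= min_item_inter]
    user_counts = Counter(user for _, user in pairs)
    return [(i, u) for i, u in pairs if user_counts[u] >= min_user_inter]

# (item_id, user_id) pairs; purely hypothetical.
toy = [("i1", "u1"), ("i1", "u2"), ("i2", "u1"), ("i2", "u3"), ("i3", "u1")]
kept = filter_once(toy, min_user_inter=2, min_item_inter=2)
print(kept)  # [('i1', 'u1'), ('i2', 'u1')]

# Each surviving item now has a single interaction, below min_item_inter=2.
remaining_item_counts = Counter(item for item, _ in kept)
print(remaining_item_counts)
```

If strict thresholds matter, one common remedy is to repeat both filters until the row count stops changing, at the cost of a smaller dataset.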


In [6]:
def load_data(reviews_path, listings_path, min_user_inter, min_item_inter):
    df = pd.read_csv(reviews_path)
    df.columns = ['item_id', 'review_id', 'date', 'user_id']
    df['date'] = pd.to_datetime(df['date'])
    df['booked'] = df['review_id'].notna().astype(int)

    df = df[df['item_id'].map(df['item_id'].value_counts()) >= min_item_inter]
    # .copy() makes the filtered frame independent, so adding columns below does not warn.
    df = df[df['user_id'].map(df['user_id'].value_counts()) >= min_user_inter].copy()

    user2idx = {uid: i for i, uid in enumerate(df['user_id'].unique())}
    item2idx = {iid: i for i, iid in enumerate(df['item_id'].unique())}
    df['user_idx'] = df['user_id'].map(user2idx)
    df['item_idx'] = df['item_id'].map(item2idx)

    items = pd.read_csv(listings_path)
    items = items[items['listing_id'].isin(df['item_id'])].copy()  # copy before adding item_idx
    items['item_idx'] = items['listing_id'].map(item2idx)

    return df, items

### Step 4: Time-Based Data Splitting

This cell defines the `time_based_split()` function, which partitions the user-item interaction data into training, validation, and test sets based on temporal order. The splitting is performed per user to simulate a realistic recommendation setting, where models are trained on past interactions and evaluated on future behavior.

#### Splitting Strategy:
- For each user, interactions are sorted by the `date` column.
- The first ~70% of interactions are allocated to the training set.
- The next ~10% are assigned to the validation set.
- The final ~20% are used as the test set.
- Users with fewer than 3 interactions are skipped, because each of the three sets needs at least one interaction per user.

#### Why this step is important:
- It prevents data leakage by ensuring that future interactions are not used to inform the model during training.
- It allows early stopping to be evaluated properly using validation data that simulates unseen future behavior.
- It reflects a real-world usage scenario where recommendations are made based on a user's historical interactions.

This function supports valid model evaluation and directly relates to Section 2 of the final report, which emphasizes model performance under realistic conditions.
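
The percentages above are approximate because validation and test each receive at least one interaction and training receives whatever is left. The sketch below reproduces only the size arithmetic of `time_based_split()` with its default ratios; `split_sizes` is a helper name introduced here for illustration.

```python
def split_sizes(n, val_ratio=0.1, test_ratio=0.2):
    """Return (n_train, n_val, n_test) for a user with n interactions, or None if skipped."""
    if n < 3:
        return None
    n_test = max(1, int(n * test_ratio))
    n_val = max(1, int(n * val_ratio))
    n_train = n - n_val - n_test
    if n_train < 1:
        return None
    return n_train, n_val, n_test

for n in (2, 3, 4, 10, 20):
    print(n, split_sizes(n))
# 2 None
# 3 (1, 1, 1)
# 4 (2, 1, 1)
# 10 (7, 1, 2)
# 20 (14, 2, 4)
```

For short histories the split is far from 70/10/20: a user with three interactions contributes one row to each set.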


In [7]:
def time_based_split(df, val_ratio=0.1, test_ratio=0.2):
    df = df.sort_values("date")
    train_rows, val_rows, test_rows = [], [], []

    for _, group in df.groupby("user_id"):
        group = group.sort_values("date")
        n = len(group)
        if n < 3: continue
        n_test = max(1, int(n * test_ratio))
        n_val = max(1, int(n * val_ratio))
        n_train = n - n_val - n_test
        if n_train < 1: continue
        train_rows.append(group.iloc[:n_train])
        val_rows.append(group.iloc[n_train:n_train + n_val])
        test_rows.append(group.iloc[n_train + n_val:])

    return pd.concat(train_rows), pd.concat(val_rows), pd.concat(test_rows)


### Step 5: Model Training with Early Stopping

This cell defines the `train_lightfm_with_early_stopping()` function, which trains a LightFM recommendation model using partial fitting and early stopping based on validation performance.

#### Training Procedure:
- The model is trained incrementally using `fit_partial()` for one epoch at a time.
- After each epoch, the model is evaluated on the validation set using `precision@10`.
- If validation precision improves by more than 1e-4, a copy of the current model is saved as the best model so far.
- If no improvement is observed for a predefined number of consecutive epochs (`patience`), training stops early to prevent overfitting.
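
How the snapshot of the current model is taken matters. A shallow copy of an object's attributes still shares any mutable arrays with the original, so later in-place updates leak into the snapshot. Whether this affects a given library depends on whether it updates its parameters in place. The toy class below (not LightFM; `ToyModel` is purely illustrative) contrasts a shallow copy with `copy.deepcopy`:

```python
import copy

class ToyModel:
    def __init__(self):
        self.weights = [0.0, 0.0]

    def train_step(self):
        # In-place update, similar in spirit to incremental training.
        for i in range(len(self.weights)):
            self.weights[i] += 1.0

model = ToyModel()
model.train_step()

shallow = ToyModel()
shallow.__dict__ = model.__dict__.copy()   # shares the same list object
deep = copy.deepcopy(model)                # independent copy of the list

model.train_step()
print(shallow.weights)  # [2.0, 2.0] -> follows the live model
print(deep.weights)     # [1.0, 1.0] -> frozen at snapshot time
```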

#### Parameters:
- `train_matrix`: interaction matrix for training
- `val_matrix`: interaction matrix for validation
- `item_features`: additional item-level content features
- `params`: dictionary containing model hyperparameters (e.g., learning rate, number of components)
- `max_epochs`: maximum number of epochs allowed
- `patience`: number of epochs to wait before stopping if no improvement

#### Why this step is important:
- Early stopping improves generalization by avoiding overfitting to the training data.
- Training in small increments allows precise monitoring of validation performance.
- Saving the best model ensures that subsequent evaluations are based on the most effective configuration.

This function reflects our effort to make training both efficient and robust. It directly supports the model selection process discussed in Section 2 of the final report.
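
The stopping rule can be studied in isolation by replaying a list of validation scores through it. The sketch below uses the same comparison as the training function (improvement must exceed `1e-4`) but no model; `simulate_early_stopping` and the scores are made up for illustration.

```python
def simulate_early_stopping(val_scores, patience=5, min_delta=1e-4):
    """Replay per-epoch validation scores; return (best_epoch, best_score, last_epoch_run)."""
    best_score, best_epoch, wait = float("-inf"), None, 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best_score + min_delta:
            best_score, best_epoch, wait = score, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, best_score, epoch
    return best_epoch, best_score, len(val_scores)

scores = [0.10, 0.20, 0.20, 0.19, 0.21]   # made-up validation scores
print(simulate_early_stopping(scores, patience=2))  # (2, 0.2, 4)
```

With `patience=2` training stops at epoch 4 and never sees the better score at epoch 5, which illustrates the trade-off behind the patience setting.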


In [8]:
import copy

def train_lightfm_with_early_stopping(train_matrix, val_matrix, item_features, params, max_epochs=30, patience=5):
    model = LightFM(**params)
    best_model = None
    best_score = -np.inf
    wait = 0

    for epoch in range(1, max_epochs + 1):
        model.fit_partial(train_matrix, item_features=item_features, epochs=1, num_threads=2)
        val_score = precision_at_k(model, val_matrix, item_features=item_features, k=10).mean()
        print(f"Epoch {epoch}: val precision@10 = {val_score:.4f}")

        if val_score > best_score + 1e-4:
            best_score = val_score
            # Deep-copy so later fit_partial() calls cannot modify the saved snapshot;
            # a shallow __dict__ copy would share the embedding arrays with `model`.
            best_model = copy.deepcopy(model)
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                print(f"Early stopping at epoch {epoch}")
                break

    return best_model, best_score

### Step 6: Evaluation Metrics for Top-K Recommendation

This cell defines the `evaluate_topk()` function, which evaluates a trained recommendation model using several widely accepted metrics for top-k recommendation tasks.

#### Evaluation Procedure:
For each user in the test set:
- Predict scores for all items.
- Remove items the user has already interacted with in the training set to avoid leakage.
- Rank items by predicted score and select the top-k items.
- Compare these top-k items to the actual test items to compute evaluation metrics.
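
The masking and ranking steps can be sketched without any model. The notebook achieves the same effect by setting the scores of training items to `-np.inf` before calling `np.argsort`; the pure-Python version below filters them out instead. `top_k_unseen` and the scores are illustrative only.

```python
def top_k_unseen(scores, train_items, k):
    """Rank item indices by descending score, skipping items seen in training."""
    seen = set(train_items)
    candidates = [i for i in range(len(scores)) if i not in seen]
    return sorted(candidates, key=lambda i: scores[i], reverse=True)[:k]

scores = [0.9, 0.1, 0.8, 0.4, 0.7]        # made-up scores for items 0..4
print(top_k_unseen(scores, train_items=[0], k=3))  # [2, 4, 3]
```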

#### Metrics Computed:
- **Hit Rate @k (HR@k)**: Proportion of users for whom at least one of the top-k recommended items was relevant.
- **Normalized Discounted Cumulative Gain @k (NDCG@k)**: Measures ranking quality, giving higher weight to items appearing earlier in the list.
- **Precision @k**: Fraction of recommended items in the top-k that are relevant.
- **Recall @k**: Fraction of relevant items that are recommended in the top-k.
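- **Mean Average Precision (MAP)**: Mean across users of the average precision, computed over the top-k list and normalized by the user's number of test items.
- **Area Under the ROC Curve (AUC)**: Measures the model’s ability to rank positive items above negative ones; computed with LightFM's `auc_score` rather than from the top-k list.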

#### Why this step is important:
- These metrics collectively provide a comprehensive view of recommendation quality from multiple perspectives (ranking, relevance, user coverage).
- All metrics are calculated in a user-wise manner and averaged, which reflects real-world recommendation performance.
- This evaluation method allows fair comparison of different model configurations, aligning directly with Section 2 (Models and Performance) in the report.

The results produced by this function are used to select the best model and to report final performance outcomes.
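
As a worked example, the per-user quantities can be computed by hand for one short ranking. The function below follows the same formulas as `evaluate_topk()` for a single user who has at least one test item; the item IDs are made up and `user_metrics` is a name introduced here.

```python
import math

def user_metrics(top_k_items, test_items, k):
    """Per-user hit, precision, recall, NDCG and AP for one ranked list."""
    test = set(test_items)          # assumed non-empty, as in the notebook's loop
    hits = [item in test for item in top_k_items[:k]]
    hit_count = sum(hits)
    dcg = sum(1.0 / math.log2(rank + 2) for rank, h in enumerate(hits) if h)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(min(len(test), k)))
    running, ap_sum = 0, 0.0
    for rank, h in enumerate(hits):
        if h:
            running += 1
            ap_sum += running / (rank + 1)
    return {
        "hit": int(hit_count > 0),
        "precision": hit_count / k,
        "recall": hit_count / len(test),
        "ndcg": dcg / idcg if idcg > 0 else 0.0,
        "ap": ap_sum / len(test),
    }

m = user_metrics(top_k_items=[3, 7, 1, 9], test_items=[7, 9, 5], k=4)
print(m["hit"], m["precision"])  # 1 0.5
print(round(m["recall"], 3), round(m["ndcg"], 3), round(m["ap"], 3))  # 0.667 0.498 0.333
```

Two of the three test items appear in the list, at ranks 2 and 4, which is why precision is 0.5 while NDCG and AP are lower than they would be had the hits come first.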


In [9]:
def evaluate_topk(model, train_interactions, test_interactions, item_features, k=10):
    train_csr = train_interactions.tocsr()
    test_csr = test_interactions.tocsr()
    num_users, num_items = test_csr.shape
    hit, ndcg, precision, recall, ap = [], [], [], [], []

    for user_id in range(num_users):
        test_items = test_csr[user_id].indices
        if len(test_items) == 0: continue

        scores = model.predict(user_id, np.arange(num_items), item_features=item_features)
        train_items = train_csr[user_id].indices
        scores[train_items] = -np.inf

        top_k_items = np.argsort(-scores)[:k]

        hit_count = np.isin(top_k_items, test_items).sum()
        hit.append(1 if hit_count > 0 else 0)
        precision.append(hit_count / k)
        recall.append(hit_count / len(test_items))

        dcg = sum((1.0 / np.log2(idx + 2)) for idx, item in enumerate(top_k_items) if item in test_items)
        idcg = sum((1.0 / np.log2(i + 2)) for i in range(min(len(test_items), k)))
        ndcg.append(dcg / idcg if idcg > 0 else 0)

        num_hits = 0.0
        sum_precisions = 0.0
        for idx, item in enumerate(top_k_items):
            if item in test_items:
                num_hits += 1
                sum_precisions += num_hits / (idx + 1)
        ap.append(sum_precisions / len(test_items) if len(test_items) > 0 else 0)

    auc = auc_score(model, test_interactions, item_features=item_features, num_threads=2).mean()

    return {
        f'HR@{k}': np.mean(hit),
        f'NDCG@{k}': np.mean(ndcg),
        f'Precision@{k}': np.mean(precision),
        f'Recall@{k}': np.mean(recall),
        'MAP': np.mean(ap),
        'AUC': auc
    }

### Step 7: Hyperparameter Grid Search and Model Selection

This cell defines the `simple_grid_search_and_run()` function, which performs an automated grid search over several LightFM hyperparameters and selects the best model based on validation performance.

#### Grid Search Setup:
The function searches over all combinations of the following parameters:
- `dim`: dimensionality of latent factors (e.g., 16, 32)
- `lr`: learning rate (e.g., 0.01, 0.03)
- `loss`: loss function (e.g., 'warp', 'bpr')
- `alpha`: regularization strength (e.g., 1e-6, 1e-5)

Each configuration is evaluated as follows:
1. Load data with the specified interaction thresholds.
2. Extract item features based on the binary flags in `config['feat']`.
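3. Skip configurations whose interaction matrix is too sparse. Note that the variable named `sparsity` in the code actually holds the density (non-zero entries divided by the total number of cells), so `sparsity < 0.001` means fewer than 0.1% of user-item cells are filled.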
4. Train the model using early stopping based on validation precision@10.
5. Track and store the best-performing configuration and corresponding model.
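
The quantity compared against 0.001 in step 3 is the number of non-zero entries divided by the number of cells in the interaction matrix. A small sketch with hypothetical sizes (the numbers are chosen only to illustrate the threshold, and `density` is an illustrative helper):

```python
def density(n_interactions, n_users, n_items):
    """Fraction of user-item cells that contain an interaction."""
    return n_interactions / (n_users * n_items)

# Hypothetical sizes, not taken from the dataset.
d = density(n_interactions=600, n_users=1000, n_items=500)
print(f"density={d:.4f}, sparsity={1 - d:.4f}")  # density=0.0012, sparsity=0.9988
print("skip" if d < 0.001 else "keep")           # keep
```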

#### Why this step is important:
- Grid search compares hyperparameter combinations systematically instead of relying on a single hand-picked setting.
- Skipping sparse configurations improves efficiency and stability.
- Selecting the model based on validation performance ensures that the final evaluation is meaningful and generalizable.

This function is central to the experimentation process described in Section 2 of the final report. It demonstrates our systematic and data-driven approach to model selection.
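
To see how many runs the search implies, the grid can be enumerated on its own. The sketch below uses the values passed in the final cell of this notebook and builds the configuration dictionaries without training anything; the `grid` dictionary is a convenience introduced here.

```python
from itertools import product

config_base = {'feat': '001010', 'min_user': 10, 'min_item': 20}
grid = {
    'dim': [16, 32],
    'lr': [0.01, 0.03],
    'loss': ['warp', 'bpr'],
    'alpha': [1e-6, 1e-5],
}

# One dict per combination, in the same order as the nested product in the notebook.
configs = [
    {**config_base, **dict(zip(grid, values))}
    for values in product(*grid.values())
]
print(len(configs))   # 16
print(configs[0])
# {'feat': '001010', 'min_user': 10, 'min_item': 20, 'dim': 16, 'lr': 0.01, 'loss': 'warp', 'alpha': 1e-06}
```

With up to 30 epochs per configuration, 16 combinations mean at most 480 training epochs in total.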


In [10]:
def simple_grid_search_and_run(config_base, dim_list, lr_list, loss_list, alpha_list):
    best_score = -np.inf
    best_config = None

    for dim, lr, loss, alpha in product(dim_list, lr_list, loss_list, alpha_list):
        config = config_base.copy()
        config.update({'dim': dim, 'lr': lr, 'loss': loss, 'alpha': alpha})

        try:
            print(f"\nTrying config: {config}")
            feat_flags = list(map(int, list(config['feat'])))
            extract_fn = make_feature_fn(*feat_flags)
            df, items = load_data(reviews_path, listings_path, config['min_user'], config['min_item'])
            feats = extract_fn(items)

            dset = Dataset()
            dset.fit(df['user_idx'], df['item_idx'], item_features=set(f for _, fl in feats for f in fl))

            full_matrix, _ = dset.build_interactions(list(df[['user_idx', 'item_idx', 'booked']].itertuples(index=False, name=None)))
            sparsity = full_matrix.nnz / (full_matrix.shape[0] * full_matrix.shape[1])
            print(f"Sparsity: {sparsity:.6f}")
            if sparsity < 0.001:
                print("Skipping due to high sparsity")
                continue

            train_df, val_df, test_df = time_based_split(df)
            train_i = list(train_df[['user_idx', 'item_idx', 'booked']].itertuples(index=False, name=None))
            val_i = list(val_df[['user_idx', 'item_idx', 'booked']].itertuples(index=False, name=None))
            test_i = list(test_df[['user_idx', 'item_idx', 'booked']].itertuples(index=False, name=None))

            train, _ = dset.build_interactions(train_i)
            val, _ = dset.build_interactions(val_i)
            test, _ = dset.build_interactions(test_i)
            feat_mat = dset.build_item_features(feats)

            params = {
                'no_components': config['dim'],
                'learning_rate': config['lr'],
                'loss': config['loss'],
                'item_alpha': config['alpha']
            }

            model, val_score = train_lightfm_with_early_stopping(train, val, feat_mat, params)
            print(f"Validation Precision@10: {val_score:.4f}")

            if val_score > best_score:
                best_score = val_score
                best_config = config
                best_model = model
                best_test = test
                best_feat_mat = feat_mat
                best_train = train

        except Exception as e:
            print(f"Skipped config due to error: {e}")
            continue

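    # Guard against the case where every configuration was skipped or failed.
    if best_config is None:
        print("No configuration completed successfully; nothing to evaluate.")
        return

    print("\n===== BEST CONFIG =====")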
    print(best_config)
    print("========================\n")

    # Evaluate best model
    results = evaluate_topk(best_model, best_train, best_test, item_features=best_feat_mat, k=10)
    print("Final Evaluation Results:")
    for k, v in results.items():
        print(f"{k}: {v:.4f}")

### Step 8: Execute Grid Search and Evaluate the Best Model

This final cell defines the baseline configuration and triggers the `simple_grid_search_and_run()` function, which executes the full pipeline:

1. Loads data.
2. Extracts features.
3. Trains and validates models across hyperparameter combinations.
4. Selects the best model based on validation precision@10.
5. Evaluates the best model on the test set using the six metrics defined in Step 6.

#### Configuration:
```python
config_base = {
    'feat': '001010',     # Use loc_cluster and review_scores_rating
    'min_user': 10,       # Minimum number of interactions per user
    'min_item': 20        # Minimum number of interactions per item
}
```


In [11]:
config_base = {
    'feat': '001010',
    'min_user': 10,
    'min_item': 20
}
simple_grid_search_and_run(
    config_base=config_base,
    dim_list=[16, 32],
    lr_list=[0.01, 0.03],
    loss_list=['warp', 'bpr'],
    alpha_list=[1e-6, 1e-5]
)



Trying config: {'feat': '001010', 'min_user': 10, 'min_item': 20, 'dim': 16, 'lr': 0.01, 'loss': 'warp', 'alpha': 1e-06}
Sparsity: 0.001059
Epoch 1: val precision@10 = 0.0005
Epoch 2: val precision@10 = 0.0010
Epoch 3: val precision@10 = 0.0017
Epoch 4: val precision@10 = 0.0023
Epoch 5: val precision@10 = 0.0032
Epoch 6: val precision@10 = 0.0043
Epoch 7: val precision@10 = 0.0055
Epoch 8: val precision@10 = 0.0057
Epoch 9: val precision@10 = 0.0065
Epoch 10: val precision@10 = 0.0073
Epoch 11: val precision@10 = 0.0082
Epoch 12: val precision@10 = 0.0088
Epoch 13: val precision@10 = 0.0093
Epoch 14: val precision@10 = 0.0096
Epoch 15: val precision@10 = 0.0102
Epoch 16: val precision@10 = 0.0107
Epoch 17: val precision@10 = 0.0111
Epoch 18: val precision@10 = 0.0116
Epoch 19: val precision@10 = 0.0120
Epoch 20: val precision@10 = 0.0122
Epoch 21: val precision@10 = 0.0126
Epoch 22: val precision@10 = 0.0129
Epoch 23: val precision@10 = 0.0133
Epoch 24: val precision@10 = 0.0134
Epoc