# Google Map Restaurants Recommendation

*Ron Wang, Yifan Geng, Hercy Shen*



The full code is available at https://github.com/ronyw7/cs224w-proj, and the accompanying blog post is at https://medium.com/@yifangeng/95c9325f87fc.

# 1. Preparing the dataset

In this project, we utilize pre-trained representations to augment GNN abilities.
- We assume that `src/embeddings/data` already contains the image and text embeddings necessary for training.
- Please refer to `README.md` and the blog post for how we prepared the embeddings using OpenAI's CLIP encoders.

We use the multi-modal Google Restaurants dataset, collected by [Yan et al. (2022)](https://cseweb.ucsd.edu/~jmcauley/datasets.html#google_restaurants).

In [None]:
import json
import pandas as pd

# download the dataset and place it under data
with open('data/filter_all_t.json', 'r') as file:
    data = json.load(file)

df = pd.DataFrame(data["train"])

print("Shape of training data:", df.shape)

Shape of training data: (87013, 6)


The original dataset ships with three splits. We will work with the training set as it contains enough datapoints for our purposes. We will split it into our own train, validation, and test sets.

Each row in the original dataset has `business_id`, `user_id`, `rating`, `review_text`, `pics`, and `history_reviews`. As mentioned, we have precomputed and stored an embedding for each review text and picture, so that each can be looked up by its unique ID.

In [None]:
df.head()

Unnamed: 0,business_id,user_id,rating,review_text,pics,history_reviews
0,60567465d335d0abfb415b26,101074926318992653684,4,The tang of the tomato sauce is outstanding. A...,"[AF1QipM-2IRmvitARbcJr7deWfe5hyVBg_ArPMQSYvq0,...",[[101074926318992653684_6056272797d555cc6fb0d1...
1,6050fa9f5b4ccec8d5cae994,117065749986299237881,5,Chicken and waffles were really good!,[AF1QipMpfxIZUT_aymQ3qPGO-QgGYzxbtLZGmHufAp2s],[[117065749986299237881_605206f8d8c08f462b93e8...
2,604be10877e81aaed3cc9a1e,106700937793048450809,4,The appetizer of colossal shrimp was very good...,"[AF1QipMNnqM5X9sSyZ9pXRZ1jvrURHN9bZhGdzuEXoP8,...",[[106700937793048450809_6044300b27f39b7b5d1dbf...
3,60411e017cd8bf130362365a,101643045857250355161,5,The fish tacos here omg! The salad was great ...,"[AF1QipM-a6AGGp4Hgk5RD0gY5sDRp5kEfB1hZLvlRkft,...",[[101643045857250355161_604fbdd099686c10168c91...
4,604139dd7cd8bf1303624208,109802745326785766951,4,"Ribs are great, as are the mac and cheese, fri...",[AF1QipNVys4yq-5w_3EsDdHpSc9ZNb7Nl30Mfb6Y0Gup],[[109802745326785766951_60524fa9f09a4ffff042f9...


Let us map `user_id`s and `business_id`s to consecutive values so they are easier to work with.

In [None]:
def create_id_mappings(data):
  '''
  Args:
    data (pd.DataFrame)
  Returns:
    data (pd.DataFrame)
    user_id_map (dict)
    business_id_map (dict)
  '''
  # Create a dictionary mapping user_id to consecutive values [0,..., n]
  user_id_map = {idx: i for i, idx in enumerate(data["user_id"].unique())}
  # Create a dictionary mapping business_id to consecutive values [0,..., m]
  business_id_map = {idx: i for i, idx in enumerate(data["business_id"].unique())}

  # Extend the dataframe with new ids
  data["u_id"] = data["user_id"].map(user_id_map)
  data["b_id"] = data["business_id"].map(business_id_map)
  data["r_id"] = data.index

  return data, user_id_map, business_id_map

In [None]:
data, user_id_map, business_id_map = create_id_mappings(df)
num_users, num_businesses = len(user_id_map), len(business_id_map)
num_nodes = num_users + num_businesses

print("The number of users: ", num_users)
print("The number of businesses: ", num_businesses)
print("The number of nodes: ", num_nodes)


The number of users:  29596
The number of businesses:  27896
The number of nodes:  57492
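Users and businesses share a single node index space in the cells that follow: users occupy indices `[0, num_users)` and each business index is shifted by `num_users`. Below is a minimal plain-Python sketch of that layout. The IDs and the helper `business_node` are hypothetical and exist only for illustration.

```python
# Toy review rows as (user_id, business_id); the IDs are made up.
rows = [("u_a", "b_x"), ("u_b", "b_x"), ("u_a", "b_y")]

# dict.fromkeys keeps first-appearance order, like pandas' unique().
user_id_map = {uid: i for i, uid in enumerate(dict.fromkeys(u for u, _ in rows))}
business_id_map = {bid: i for i, bid in enumerate(dict.fromkeys(b for _, b in rows))}
num_users = len(user_id_map)

def business_node(business_id):
    """Global node index of a business: local index shifted past all users."""
    return business_id_map[business_id] + num_users

# Each review becomes one (user node, business node) edge.
edges = [(user_id_map[u], business_node(b)) for u, b in rows]
print(edges)  # [(0, 2), (1, 2), (0, 3)]
```

Keeping the two ID ranges disjoint lets a single embedding table of size `num_nodes` serve both node types.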


### Generate feature embeddings for nodes

We load the image and text embeddings into memory. For extremely large datasets, it would be more efficient to use a vector store here.

Additionally, we have pre-computed two dictionaries: one maps each business to its associated image IDs, and the other maps each user to their associated text IDs. The idea is to later represent each business as the aggregation of its image embeddings and each user as the aggregation of their review text embeddings. There are many other ways to do this, and we encourage readers to explore additional schemes. The only caveat here is to use the `.pkl` file format instead of `.npz`, as the latter is much slower to load into memory.
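One simple aggregation scheme is mean pooling with a zero-vector fallback for entities that have no embeddings. The sketch below shows the idea in plain Python on made-up 3-dimensional vectors. The helper `mean_pool` and all keys are hypothetical; the real embeddings are 768-dimensional and handled with tensors.

```python
def mean_pool(vectors, dim):
    """Average a list of equal-length vectors; fall back to zeros if empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]

# Hypothetical 3-dimensional embeddings keyed by image ID.
image_embeddings = {"img1": [1.0, 0.0, 2.0], "img2": [3.0, 2.0, 0.0]}
business_to_image_keys = {"biz_a": ["img1", "img2"], "biz_b": []}

business_features = {
    biz: mean_pool([image_embeddings[k] for k in keys], dim=3)
    for biz, keys in business_to_image_keys.items()
}
print(business_features["biz_a"])  # [2.0, 1.0, 1.0]
print(business_features["biz_b"])  # [0.0, 0.0, 0.0]
```

Other pooling choices (max pooling, attention-weighted sums) would slot into the same place as `mean_pool`.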

In [None]:
import numpy as np
import pickle

# load pre-computed embeddings, the code for computing embeddings can be seen in the github repo
IMAGE_EMBEDDINGS_ = np.load('embeddings/data/embeddings_pics_train.pkl', allow_pickle=True)
TEXT_EMBEDDINGS_ = np.load('embeddings/data/embeddings_text_train.pkl', allow_pickle=True)


UNIQUE_BUSINESS_IDS = df['business_id'].unique()
UNIQUE_USER_IDS = df['user_id'].unique()

print("Number of businesses:", len(UNIQUE_BUSINESS_IDS))
print("Number of users:", len(UNIQUE_USER_IDS))

# use context managers so the mapping files are closed after loading
with open('data/business_image_mapping.pkl', 'rb') as f:
    business_to_image_keys = pickle.load(f)
business_to_image_keys = {item['business_id']: item['image_keys'] for item in business_to_image_keys}

with open('data/user_text_mapping.pkl', 'rb') as f:
    user_to_text_keys = pickle.load(f)
user_to_text_keys = {item['user_id']: [item['text_key']] for item in user_to_text_keys}

Number of businesses: 27896
Number of users: 29596


In [None]:
import torch

# create embeddings for business
business_features = torch.zeros((len(UNIQUE_BUSINESS_IDS), 768))
print(business_features.shape)

for i, biz_id in enumerate(UNIQUE_BUSINESS_IDS):
    # Look up the IDs of all images attached to this business
    img_keys = business_to_image_keys.get(biz_id, None)
    if img_keys:
        # Stack the image embeddings into a single tensor
        img_tensors = torch.from_numpy(np.stack([IMAGE_EMBEDDINGS_[k] for k in img_keys]))
        # Use mean aggregation of the image embeddings to represent a business
        img_embedding = img_tensors.mean(dim=0)
    else:
        # Fall back to a zero vector when no images are found
        img_embedding = torch.zeros(768)
    img_embedding = torch.nan_to_num(img_embedding, nan=0.0, posinf=1e6, neginf=-1e6)
    business_features[i] = img_embedding

# create embeddings for users
user_features = torch.zeros((len(UNIQUE_USER_IDS), 768))
print(user_features.shape)
for i, user_id in enumerate(UNIQUE_USER_IDS):
    # Look up the review-text keys for this user
    text_key = user_to_text_keys.get(user_id, None)
    if text_key:
        # Average the embeddings stored under each key, then stack the results
        text_tensors = torch.from_numpy(np.stack([TEXT_EMBEDDINGS_[k].mean(axis=0) for k in text_key]))
        # Use mean aggregation of the text embeddings to represent a user
        text_tensors = text_tensors.mean(axis=0)
    else:
        # Fall back to a zero vector when no texts are found
        text_tensors = torch.zeros(768)
    text_tensors = torch.nan_to_num(text_tensors, nan=0.0, posinf=1e6, neginf=-1e6)
    user_features[i] = text_tensors

torch.Size([27896, 768])
torch.Size([29596, 768])




In [None]:
# check the sizes of the feature embeddings
node_features = torch.cat([user_features, business_features], dim=0)
print(node_features.shape)
rand_embeddings = torch.nn.init.xavier_uniform_(torch.empty((num_nodes, node_features.shape[1])))
print(rand_embeddings.shape)

torch.Size([57492, 768])
torch.Size([57492, 768])


### Augment the dataset

We inspected the dataset and found that it is very sparse: each user is connected to only one or two restaurants, which makes it unsuitable for a recommendation task as is. To fix this, we augment the dataset by sampling additional (user, restaurant) edges. For one in every 10 reviews, we connect the reviewing user to the `k` restaurants most similar to the reviewed one, as measured by the cosine similarity of their feature vectors.
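The augmentation step can be illustrated on toy data. The following is a minimal, dependency-free sketch with made-up 2-dimensional features; the helpers `cosine` and `k_most_similar` are hypothetical stand-ins, and the notebook's own implementation uses tensor operations instead.

```python
import math

def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

def k_most_similar(features, idx, k):
    """Indices of the k rows most similar to row idx, excluding idx itself."""
    scores = [(cosine(features[idx], f), j) for j, f in enumerate(features) if j != idx]
    scores.sort(key=lambda s: s[0], reverse=True)
    return [j for _, j in scores[:k]]

# Hypothetical 2-d business features.
business_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
num_users = 2

edges = {(0, 0 + num_users)}        # user 0 reviewed business 0
for j in k_most_similar(business_features, idx=0, k=2):
    edges.add((0, j + num_users))   # augmented edges to similar businesses
print(sorted(edges))  # [(0, 2), (0, 3), (0, 5)]
```

Storing edges in a set means an augmented edge that duplicates an observed one is counted only once.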

In [None]:
import torch.nn.functional as F

def find_k_similar(business_features: torch.Tensor,
                  idx: int,
                  k: int = 5,
                  use_cosine: bool = True) -> tuple[torch.Tensor, torch.Tensor]:
    """
    Find k most similar tensors to tensor at index idx
    Returns: tuple of (indices, similarity_scores)
    """
    # Get the query tensor
    query = business_features[idx]

    if use_cosine:
        # Normalize the features (for cosine similarity)
        normalized_features = F.normalize(business_features, p=2, dim=1)
        normalized_query = F.normalize(query.unsqueeze(0), p=2, dim=1)

        # Compute cosine similarity
        similarities = torch.mm(normalized_query, normalized_features.T).squeeze()
    else:
        # Compute dot product
        similarities = torch.mm(query.unsqueeze(0), business_features.T).squeeze()

    # Get top k (excluding the query itself)
    similarities[idx] = -float('inf')  # Exclude self
    top_k_similarities, top_k_indices = torch.topk(similarities, k)

    return top_k_indices, top_k_similarities


In [None]:
from torch_sparse import SparseTensor

def generate_edge_index(data, user_id_map, business_id_map, business_features, k=50):
    '''
    Args:
        data              (pd.DataFrame)
        user_id_map       (dict)
        business_id_map   (dict)
        business_features (torch.tensor)
        k                 (int): number of similar businesses
    Returns:
        edge_index        (torch.tensor)
        edge_index_sparse (SparseTensor)
    '''
    # Use a set to store unique edges
    edge_set = set()
    print(len(data))
    for i in range(len(data)):
        uid = user_id_map[data["user_id"][i]]
        bid = business_id_map[data["business_id"][i]]

        # Add original edge
        edge_set.add((uid, bid + len(user_id_map)))

        # Find similar businesses and add edges
        if i % 10 == 0:
          indices, _ = find_k_similar(business_features, bid, k)
          for idx in indices:
              edge_set.add((uid, idx.item() + len(user_id_map)))

    # Convert set to edge_index format
    edge_index = torch.tensor([[edge[0] for edge in edge_set],
                             [edge[1] for edge in edge_set]])

    # Create sparse tensor
    num_nodes = len(user_id_map) + len(business_id_map)
    edge_index_sparse = SparseTensor(
        row=edge_index[0],
        col=edge_index[1],
        sparse_sizes=(num_nodes, num_nodes)
    )

    return edge_index, edge_index_sparse

We can now create a `torch_geometric.data.Data` object and use the `RandomLinkSplit` transform to split our graph into train, validation, and test components. Because the sampling procedure takes some time, we have saved the results as pickle files that we can conveniently reload.

In [None]:
from torch_geometric.data import Data
from torch_geometric.transforms import RandomLinkSplit

edge_index, edge_index_sparse = generate_edge_index(data, user_id_map, business_id_map, business_features)
graph_data = Data(edge_index = edge_index, num_nodes = num_nodes)

train_split, val_split, test_split = RandomLinkSplit(
    is_undirected=True,
    add_negative_train_samples=False,
    num_val=0.2, num_test=0.1
)(graph_data)

In [None]:
import pickle as pkl
train_split = pkl.load(open("data/train_split.pkl", "rb"))
val_split = pkl.load(open("data/val_split.pkl", "rb"))
test_split = pkl.load(open("data/test_split.pkl", "rb"))

print(type(train_split))

<class 'torch_geometric.data.data.Data'>


This completes all the data preprocessing we need before training. Each `torch_geometric.data.Data` split provides message-passing edges via its `edge_index` attribute and supervision edges via its `edge_label_index` attribute.

# 2. Creating a Custom LightGCN Recommender Model

The next step is to create a modified LightGCN model that supports multi-modal features during training. We considered several ways to do this. One naive way is to initialize the node embeddings with the pre-trained features. However, the model then uses this knowledge only at initialization and gradually "disregards" the original information. We tested this empirically, and the result was no different from random (Xavier and He) initialization.

Our solution is to incorporate this knowledge during training. Specifically, in each forward pass, the model computes the embedding of every node via the `self.get_embedding` API. Our key modification is to let the model learn a scalar fusion weight, `self.feature_weight`, that blends the learned node embeddings with the pre-trained embeddings before message passing. The pre-trained embeddings are stored in an `Embedding` table of their own, so they are fine-tuned during training as well. In code, this is:

```
if self.has_clip_features:
    x = self.feature_weight * x + (1 - self.feature_weight) * self.clip_features.weight
```

This is effective for our dataset, and we intend to test on other benchmarks.

Our implementation is in `src/lgcn/model.py`.
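To make the fusion rule concrete, here is a small worked example in plain Python. The vectors are made up and the helper `fuse` is hypothetical. In the model, the weight is a single learnable scalar initialized to 0.5 and is not constrained to stay in `[0, 1]`.

```python
def fuse(learned, clip, weight):
    """Blend two equal-length vectors: weight * learned + (1 - weight) * clip."""
    return [weight * e + (1.0 - weight) * c for e, c in zip(learned, clip)]

learned_embedding = [0.2, -0.4]   # hypothetical trainable node embedding
clip_feature = [1.0, 0.0]         # hypothetical pre-trained feature

half = fuse(learned_embedding, clip_feature, 0.5)       # the initial weight
only_learned = fuse(learned_embedding, clip_feature, 1.0)  # ignores the CLIP feature
print(half)
print(only_learned)
```

With a weight of 1.0 the rule reduces to the plain LightGCN embedding, so the baseline is a special case of the fused model.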

In [None]:
# You can also import our implementations directly
from lgcn.model import LightGCN
from lgcn.utils.negative_sampling import get_negative_samples
from lgcn.utils.recall import recall_at_k

In [None]:
"""
Model adapted from LightGCN source code:
https://pytorch-geometric.readthedocs.io/en/2.5.2/_modules/torch_geometric/nn/models/lightgcn.html#LightGCN
"""
from typing import Optional, Union

import torch
import torch.nn.functional as F
from torch import Tensor
from torch.nn import Parameter, Embedding, ModuleList
from torch.nn.modules.loss import _Loss

from torch_geometric.nn.conv import LGConv
from torch_geometric.typing import Adj, OptTensor
from torch_geometric.utils import is_sparse, to_edge_index

class LightGCN(torch.nn.Module):
    def __init__(
        self,
        num_nodes: int,
        embedding_dim: int,
        num_layers: int,
        alpha: Optional[Union[float, Tensor]] = None,
        has_clip_features = False,
        **kwargs,
    ):
        super().__init__()
        self.num_nodes = num_nodes
        self.embedding_dim = embedding_dim
        self.num_layers = num_layers
        self.has_clip_features = has_clip_features

        # add clip_features to the model
        if self.has_clip_features:
            self.clip_features = Embedding(num_nodes, embedding_dim)
            self.feature_weight = Parameter(torch.tensor(0.5))

        if alpha is None:
            alpha = 1. / (num_layers + 1)

        if isinstance(alpha, Tensor):
            assert alpha.size(0) == num_layers + 1
        else:
            alpha = torch.tensor([alpha] * (num_layers + 1))

        self.register_buffer('alpha', alpha)

        self.embedding = Embedding(num_nodes, embedding_dim)
        self.convs = ModuleList([LGConv(**kwargs) for _ in range(num_layers)])

        self.reset_parameters()

    def initialize_clip_features(self, data: torch.Tensor):
        r"""Initialize this model with pre-trained CLIP features."""
        if not self.has_clip_features:
            raise ValueError("Model was not initialized with CLIP features support")

        self.clip_features.weight.data.copy_(data)

    def initialize_node_embedding(self, data: torch.Tensor):
        self.embedding.weight.data.copy_(data)

    def reset_parameters(self):
        torch.nn.init.xavier_uniform_(self.embedding.weight)
        for conv in self.convs:
            conv.reset_parameters()

    def get_embedding(self, edge_index: Adj) -> Tensor:
        x = self.embedding.weight

        # create a weighted combination of clip features and learned embeddings
        if self.has_clip_features:
            x = self.feature_weight * x + (1 - self.feature_weight) * self.clip_features.weight
        out = x * self.alpha[0]

        for i in range(self.num_layers):
            x = self.convs[i](x, edge_index)
            out = out + x * self.alpha[i + 1]

        return out

    def forward(self, edge_index: Adj,
                edge_label_index: OptTensor = None) -> Tensor:
        if edge_label_index is None:
            if is_sparse(edge_index):
                edge_label_index, _ = to_edge_index(edge_index)
            else:
                edge_label_index = edge_index

        out = self.get_embedding(edge_index)

        out_src = out[edge_label_index[0]]
        out_dst = out[edge_label_index[1]]

        return (out_src * out_dst).sum(dim=-1)

    def predict_link(self, edge_index: Adj, edge_label_index: OptTensor = None,
                     prob: bool = False) -> Tensor:

        pred = self(edge_index, edge_label_index).sigmoid()
        return pred if prob else pred.round()


    def recommend(self, edge_index: Adj, src_index: OptTensor = None,
                  dst_index: OptTensor = None, k: int = 1) -> Tensor:
        out_src = out_dst = self.get_embedding(edge_index)

        if src_index is not None:
            out_src = out_src[src_index]

        if dst_index is not None:
            out_dst = out_dst[dst_index]

        pred = out_src @ out_dst.t()
        top_index = pred.topk(k, dim=-1).indices

        if dst_index is not None:  # Map local top-indices to original indices.
            top_index = dst_index[top_index.view(-1)].view(*top_index.size())

        return top_index


    def link_pred_loss(self, pred: Tensor, edge_label: Tensor,
                       **kwargs) -> Tensor:
        loss_fn = torch.nn.BCEWithLogitsLoss(**kwargs)
        return loss_fn(pred, edge_label.to(pred.dtype))


    def recommendation_loss(self, pos_edge_rank: Tensor, neg_edge_rank: Tensor,
                            node_id: Optional[Tensor] = None,
                            lambda_reg: float = 1e-4, **kwargs) -> Tensor:
        r"""Computes the model loss for a ranking objective via the Bayesian
        Personalized Ranking (BPR) loss."""
        loss_fn = BPRLoss(lambda_reg, **kwargs)
        emb = self.embedding.weight
        emb = emb if node_id is None else emb[node_id]
        return loss_fn(pos_edge_rank, neg_edge_rank, emb)

    def __repr__(self) -> str:
        return (f'{self.__class__.__name__}({self.num_nodes}, '
                f'{self.embedding_dim}, num_layers={self.num_layers})')


class BPRLoss(_Loss):
    r"""The Bayesian Personalized Ranking (BPR) loss.

    The BPR loss is a pairwise loss that encourages the prediction of an
    observed entry to be higher than its unobserved counterparts
    (see `here <https://arxiv.org/abs/2002.02126>`__).

    .. math::
        L_{\text{BPR}} = - \sum_{u=1}^{M} \sum_{i \in \mathcal{N}_u}
        \sum_{j \not\in \mathcal{N}_u} \ln \sigma(\hat{y}_{ui} - \hat{y}_{uj})
        + \lambda \vert\vert \textbf{x}^{(0)} \vert\vert^2

    where :math:`\lambda` controls the :math:`L_2` regularization strength.
    We compute the mean BPR loss for simplicity.

    Args:
        lambda_reg (float, optional): The :math:`L_2` regularization strength
            (default: 0).
        **kwargs (optional): Additional arguments of the underlying
            :class:`torch.nn.modules.loss._Loss` class.
    """
    __constants__ = ['lambda_reg']
    lambda_reg: float

    def __init__(self, lambda_reg: float = 0, **kwargs):
        super().__init__(None, None, "sum", **kwargs)
        self.lambda_reg = lambda_reg

    def forward(self, positives: Tensor, negatives: Tensor,
                parameters: Tensor = None) -> Tensor:
        r"""Compute the mean Bayesian Personalized Ranking (BPR) loss.

        .. note::

            The i-th entry in the :obj:`positives` vector and i-th entry
            in the :obj:`negatives` entry should correspond to the same
            entity (*e.g.*, user), as the BPR is a personalized ranking loss.

        Args:
            positives (Tensor): The vector of positive-pair rankings.
            negatives (Tensor): The vector of negative-pair rankings.
            parameters (Tensor, optional): The tensor of parameters which
                should be used for :math:`L_2` regularization
                (default: :obj:`None`).
        """
        log_prob = F.logsigmoid(positives - negatives).mean()

        regularization = 0
        if self.lambda_reg != 0:
            regularization = self.lambda_reg * parameters.norm(p=2).pow(2)
            regularization = regularization / positives.size(0)

        return -log_prob + regularization
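As a quick sanity check of the BPR loss defined above, the sketch below evaluates the mean loss without the regularization term (that is, with `lambda_reg = 0`) on a few hand-picked scores. It is plain Python; the helpers `log_sigmoid` and `bpr_loss` are hypothetical re-implementations for illustration, not part of the model code.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def bpr_loss(pos_scores, neg_scores):
    """Mean BPR loss without the L2 term (lambda_reg = 0)."""
    pairs = list(zip(pos_scores, neg_scores))
    return -sum(log_sigmoid(p - n) for p, n in pairs) / len(pairs)

print(bpr_loss([2.0, 1.0], [0.0, 1.0]))  # one well-ranked pair, one tie
print(bpr_loss([0.0], [0.0]))            # a tie gives log(2)
print(bpr_loss([5.0], [0.0]))            # a clear win gives a loss near 0
```

The loss depends only on the score difference within each (positive, negative) pair, which is why the i-th positive and i-th negative must belong to the same user.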


Two additional functions we had to implement are `get_negative_samples`, which samples negative edges, and `recall_at_k`, which calculates the overlap between recommended edges and true positives.

Our initial implementation had serious performance issues. We would like to acknowledge [this notebook](https://colab.research.google.com/drive/1DhPrtHLggaObSyKjyCQNw0Z-k8vzXZHh?usp=sharing) on building a recommender system for Spotify tracks, which has a very efficient implementation. We adopted its approach and found that tensor operations like `gather()` greatly improved efficiency.
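The index arithmetic behind the negative sampler is easy to get wrong, so here is a tiny plain-Python sketch of it. All sizes and edges are made up, and the fixed seed is only there to make the sketch repeatable. Each (user, item) cell of the interaction grid gets a flat index, negatives are drawn with replacement from the unoccupied cells, and the flat index is then converted back to a pair of node indices.

```python
import random

num_users, num_items = 2, 3
# Positive edges as (user, business_node); business nodes start at num_users.
positives = {(0, 2), (1, 4)}

# Flatten the user-by-item grid: flat = user * num_items + local_item.
taken = {u * num_items + (b - num_users) for u, b in positives}
available = [f for f in range(num_users * num_items) if f not in taken]

rng = random.Random(0)  # fixed seed only to make the sketch repeatable
sampled = [rng.choice(available) for _ in range(len(positives))]

# Undo the flattening and shift items back into the global node space.
negatives = [(f // num_items, f % num_items + num_users) for f in sampled]
print(negatives)
```

Drawing one negative per positive keeps the two score vectors the same length, which the BPR loss requires.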

In [None]:
def get_negative_samples(data, num_users, num_items):
    """
    Generate negative samples for user-business interactions
    Args:
        data: PyG Graph data object containing positive edges
        num_users: Total number of users
        num_items: Total number of items
    Returns:
        tuple: (negative edge indices, negative edge labels)
    """
    # Get positive interactions
    pos_users, pos_items = data.edge_label_index
    device = data.edge_label_index.device

    # Initialize interaction matrix
    interactions = torch.zeros(
        (num_users, num_items),
        device=device,
        dtype=torch.bool,
    )

    # Mark positive interactions
    business_indices = pos_items - num_users  # Adjust business indices
    interactions[pos_users, business_indices] = True

    # Find all possible negative interactions
    available_negatives = torch.where(~interactions.reshape(-1))[0]

    # Sample random negative interactions
    num_samples = pos_users.size(0)
    sampled_indices = available_negatives[
        torch.randint(
            0,
            available_negatives.size(0),
            size=(num_samples,),
            device=device
        )
    ]

    # Convert linear indices to user-business pairs
    sampled_users = sampled_indices // num_items
    sampled_items = (sampled_indices % num_items) + num_users

    # Create negative edge tensor
    neg_edges = torch.stack((sampled_users, sampled_items), dim=0)
    neg_labels = torch.zeros(neg_edges.shape[1], device=device)

    return neg_edges, neg_labels

def recall_at_k(data, model, num_users, k=500):
    """
    Calculate recall@k for recommendations.
    Returns average recall score across all users.

    Args:
        data: Graph data object containing edge indices and labels
        model: Neural network model with get_embedding method
    """
    model.eval()
    with torch.no_grad():
        # Get embeddings for users and items
        embeddings = model.get_embedding(data.edge_index)
        user_embeds, item_embeds = embeddings[:num_users], embeddings[num_users:]

        # Calculate similarities and initialize truth matrix
        similarities = torch.matmul(user_embeds, item_embeds.t())
        truth = torch.zeros_like(similarities, dtype=torch.bool)

        # Get training and supervision edge masks
        train_edges = data.edge_index[:, data.edge_index[0] < num_users]
        sup_edges = data.edge_label_index[:, data.edge_label_index[0] < num_users]

        # Mask out training edges from recommendations
        similarities[train_edges[0], train_edges[1] - num_users] = float('-inf')

        # Mark ground truth edges
        truth[sup_edges[0], sup_edges[1] - num_users] = True

        # Calculate recall
        topk_scores, topk_items = torch.topk(similarities, k, dim=1)
        hits = truth.gather(1, topk_items).sum(dim=1)

        # Calculate total relevant items per user
        relevants = torch.bincount(sup_edges[0], minlength=num_users)

        # Compute recall, handling users with no relevant items
        recalls = torch.where(
            relevants > 0,
            hits.float() / relevants.float(),
            torch.ones_like(relevants, dtype=torch.float)
        )

        return recalls.mean().item()
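To see what `recall_at_k` measures, the following plain-Python sketch computes the same quantity on a toy score matrix. The helper `recall_at_k_toy` and all numbers are hypothetical. Unlike the tensor version, it drops training items from the candidate list instead of setting their scores to negative infinity.

```python
def recall_at_k_toy(scores, train_items, true_items, k):
    """Mean recall@k over users.

    scores[u][i]   : predicted score of item i for user u
    train_items[u] : items already seen in training (excluded from ranking)
    true_items[u]  : held-out relevant items
    Users with no held-out items count as recall 1.0, as in the cell above.
    """
    recalls = []
    for u, row in enumerate(scores):
        if not true_items[u]:
            recalls.append(1.0)
            continue
        candidates = [i for i in range(len(row)) if i not in train_items[u]]
        top = sorted(candidates, key=lambda i: row[i], reverse=True)[:k]
        recalls.append(len(set(top) & true_items[u]) / len(true_items[u]))
    return sum(recalls) / len(recalls)

scores = [[0.9, 0.8, 0.1, 0.3], [0.2, 0.7, 0.6, 0.1]]
train_items = [{0}, set()]
true_items = [{1, 2}, {0}]
print(recall_at_k_toy(scores, train_items, true_items, k=2))  # 0.25
```

User 0 recovers one of two held-out items (recall 0.5) and user 1 recovers none, so the mean is 0.25.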

# 3. Training and Evaluation

We study the effects of multi-modal feature integration and learnable feature fusion, as described in the model implementation above.
We compare the performance of the augmented model with a baseline model, where we use Xavier initialization to generate node embeddings and do not incorporate the weight and feature matrices.

In [None]:
def train(model, optimizer, train_split, val_split, num_epochs=81, run = None):
    """
    Train the model on train_split and evaluate it on val_split.

    Args:
        model: the model we want to train
        optimizer: the optimizer used for the gradient updates
        train_split: the training data
        val_split: the validation data
        num_epochs: the number of training epochs
        run: wandb run object (optional)
    """
    for epoch in range(num_epochs):
        model.train()
        optimizer.zero_grad()

        # sample negative edges
        edge_index_negative, _ = get_negative_samples(train_split, num_users, num_businesses)

        out = model.get_embedding(train_split.edge_index)

        # perform gradient update using training data
        train_src = out[train_split.edge_label_index[0]]
        train_dst = out[train_split.edge_label_index[1]]
        pos_scores = (train_src * train_dst).sum(dim=-1)

        neg_src = out[edge_index_negative[0]]
        neg_dst = out[edge_index_negative[1]]
        neg_scores = (neg_src * neg_dst).sum(dim=-1)
        loss = model.recommendation_loss(pos_scores, neg_scores)

        loss.backward()
        optimizer.step()

        # calculate the loss on validation data
        edge_index_negative_val, _ = get_negative_samples(val_split, num_users, num_businesses)
        val_src = out[val_split.edge_label_index[0]]
        val_dst = out[val_split.edge_label_index[1]]
        pos_scores = (val_src * val_dst).sum(dim=-1)

        neg_src = out[edge_index_negative_val[0]]
        neg_dst = out[edge_index_negative_val[1]]
        neg_scores = (neg_src * neg_dst).sum(dim=-1)

        val_loss = model.recommendation_loss(pos_scores, neg_scores)

        # evaluate performance on validation every 10 epochs
        if epoch % 10 == 0:
            val_recall = recall_at_k(val_split, model, num_users)
            print(f"Epoch {epoch}, Train loss {loss}, Val loss {val_loss}, Val Recall@500 {val_recall}")
            if run:
                run.log({"epoch": epoch, "train/loss": loss, "val/loss": val_loss, "val/recall@500": val_recall})


In [None]:
# model initialization
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = LightGCN(num_nodes, 768, num_layers = 3, has_clip_features = True).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.5, weight_decay=1e-5)

train_split.to(device)
val_split.to(device)

Data(edge_index=[2, 727582], num_nodes=57492, edge_label=[77954], edge_label_index=[2, 77954])

### LightGCN + CLIP embeddings

In [None]:
# initialize clip features
if model.has_clip_features:
    model.initialize_clip_features(node_features)

In [None]:
import wandb

# perform experiments using wandb
run = wandb.init(
    project="224w-project",
    config={'clip_features': model.has_clip_features},
    reinit=True,
    name='with_clip_features'
)

train(model, optimizer, train_split, val_split, num_epochs=81, run=run)

wandb run summary:
epoch,120.0
train/loss,0.08344
val/loss,0.19308
val/recall@500,0.67333


Epoch 0, Train loss 0.7028226852416992, Val loss 1.0535435676574707, Val Recall@500 0.617391049861908
Epoch 10, Train loss 0.20973829925060272, Val loss 0.3511536121368408, Val Recall@500 0.5997821688652039
Epoch 20, Train loss 0.17751699686050415, Val loss 0.22771747410297394, Val Recall@500 0.6284925937652588
Epoch 30, Train loss 0.11669503152370453, Val loss 0.19768154621124268, Val Recall@500 0.6381189823150635
Epoch 40, Train loss 0.08847761154174805, Val loss 0.16044510900974274, Val Recall@500 0.7018598318099976
Epoch 50, Train loss 0.06255592405796051, Val loss 0.14117825031280518, Val Recall@500 0.7100704908370972
Epoch 60, Train loss 0.0509820431470871, Val loss 0.1321713626384735, Val Recall@500 0.7304561138153076
Epoch 70, Train loss 0.041059356182813644, Val loss 0.13186897337436676, Val Recall@500 0.730961263179779
Epoch 80, Train loss 0.0361776165664196, Val loss 0.13592469692230225, Val Recall@500 0.7276843190193176


In [None]:
# evaluate on test split

test_split.to(device)
test_recall = recall_at_k(test_split, model, num_users)
test_recall

0.731330156326294

### Baseline LightGCN

Let us compare this with the baseline result:

In [None]:
# model initialization
HAS_CLIP_FEATURES = False
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = LightGCN(num_nodes, 768, num_layers = 3, has_clip_features = HAS_CLIP_FEATURES).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.5, weight_decay=1e-5)

train_split.to(device)
val_split.to(device)

Data(edge_index=[2, 727582], num_nodes=57492, edge_label=[77954], edge_label_index=[2, 77954])

In [None]:
# train using wandb
run = wandb.init(
    project="224w-project",
    config={'clip_features': model.has_clip_features},
    reinit=True,
    name='with_xavier_init'
)

train(model, optimizer, train_split, val_split, num_epochs=81, run = run)

Epoch 0, Train loss 0.6929919719696045, Val loss 0.7015377879142761, Val Recall@500 0.5252739191055298
Epoch 10, Train loss 0.22964796423912048, Val loss 0.2945592999458313, Val Recall@500 0.6358852982521057
Epoch 20, Train loss 0.27541419863700867, Val loss 0.32218989729881287, Val Recall@500 0.5855546593666077
Epoch 30, Train loss 0.2691667377948761, Val loss 0.2984590530395508, Val Recall@500 0.5783751010894775
Epoch 40, Train loss 0.25886270403862, Val loss 0.2817089259624481, Val Recall@500 0.5817865133285522
Epoch 50, Train loss 0.26367825269699097, Val loss 0.2834400236606598, Val Recall@500 0.5841618180274963
Epoch 60, Train loss 0.26217812299728394, Val loss 0.2826944589614868, Val Recall@500 0.5819141864776611
Epoch 70, Train loss 0.26327621936798096, Val loss 0.2827521562576294, Val Recall@500 0.5821539163589478
Epoch 80, Train loss 0.262026846408844, Val loss 0.2822498679161072, Val Recall@500 0.5828391313552856


# 4. Dimension Reduction

Additionally, we explored whether _compressed_ feature vectors can still boost model performance. We studied two schemes: PCA and auto-encoders. We found that even when compressed, feature vectors and learnable feature fusion still jointly improve model performance over the baseline. The downside is that Recall@K, our key metric, appears to be less stable during training.

### PCA

In [None]:
from sklearn.decomposition import PCA

# perform dimension reduction on features using PCA
N_DIM = 8
pca = PCA(n_components=N_DIM)
compressed_features = pca.fit_transform(node_features.numpy())
compressed_features = torch.tensor(compressed_features)
print(compressed_features.shape)

torch.Size([57492, 8])


In [None]:
# model initialization for training with compressed embeddings
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = LightGCN(num_nodes, N_DIM, num_layers = 3, has_clip_features = True).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.5, weight_decay=1e-5)

# create the run after the model so the logged config describes this model
run = wandb.init(
    project="224w-project",
    config={'clip_features': model.has_clip_features},
    reinit=True,
    name=f'with_clip_features{N_DIM}d'
)

train_split.to(device)
val_split.to(device)

if model.has_clip_features:
    model.initialize_clip_features(compressed_features)
    print("Initialized node features")

train(model, optimizer, train_split, val_split, num_epochs=121, run=run)

wandb run summary:
epoch,120.0
train/loss,0.0373
val/loss,0.14707
val/recall@500,0.72104


Initialized node features
Epoch 0, Train loss 0.35747501254081726, Val loss 0.39174678921699524, Val Recall@500 0.6349913477897644
Epoch 10, Train loss 0.1674979031085968, Val loss 0.2161165475845337, Val Recall@500 0.6307083964347839
Epoch 20, Train loss 0.08177262544631958, Val loss 0.16880294680595398, Val Recall@500 0.6855728626251221
Epoch 30, Train loss 0.05585501343011856, Val loss 0.16720707714557648, Val Recall@500 0.6998448967933655
Epoch 40, Train loss 0.07032384723424911, Val loss 0.22334684431552887, Val Recall@500 0.6745778918266296
Epoch 50, Train loss 0.1065654531121254, Val loss 0.36214086413383484, Val Recall@500 0.6568030118942261
Epoch 60, Train loss 0.11154107749462128, Val loss 0.326681911945343, Val Recall@500 0.6498600244522095
Epoch 70, Train loss 0.0769653245806694, Val loss 0.2024274319410324, Val Recall@500 0.6701334714889526
Epoch 80, Train loss 0.06354409456253052, Val loss 0.16505947709083557, Val Recall@500 0.6929592490196228
Epoch 90, Train loss 0.05280
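
To put a number on the instability noted in this section, the sketch below summarizes the validation Recall@500 values from the PCA run. The values are copied from the log above and rounded to four decimals; the truncated epoch-90 line is left out. The variable names are illustrative and not part of the project code.

```python
from statistics import pstdev

# Val Recall@500 from the PCA run log above (epochs 0-80), rounded to 4 decimals
pca_recall = {
    0: 0.6350, 10: 0.6307, 20: 0.6856, 30: 0.6998, 40: 0.6746,
    50: 0.6568, 60: 0.6499, 70: 0.6701, 80: 0.6930,
}

values = list(pca_recall.values())
best_epoch = max(pca_recall, key=pca_recall.get)  # epoch with the highest recall
recall_range = max(values) - min(values)          # gap between best and worst epoch
recall_std = pstdev(values)                       # spread across logged epochs

print(f"best epoch: {best_epoch}, recall: {pca_recall[best_epoch]:.4f}")
print(f"range: {recall_range:.4f}, std: {recall_std:.4f}")
```

Among the logged epochs, the best checkpoint is at epoch 30, not at the end of the logged range, so keeping the checkpoint with the best validation recall may be preferable to keeping the final weights.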

### Auto-Encoders

In [None]:
import torch.nn as nn

# Implement an autoencoder class
N_DIM = 8
class Autoencoder(nn.Module):
    def __init__(self, input_dim=768, hidden_dim=N_DIM):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 512),
            nn.ReLU(),
            nn.Linear(512, hidden_dim)
        )
        self.decoder = nn.Sequential(
            nn.Linear(hidden_dim, 512),
            nn.ReLU(),
            nn.Linear(512, input_dim)
        )

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded

    def encode(self, x):
        return self.encoder(x)

# Train the autoencoder full-batch on all node features
autoencoder = Autoencoder().to(device)
optimizer = torch.optim.Adam(autoencoder.parameters())
criterion = nn.MSELoss()

all_features = torch.cat([business_features, user_features], dim=0).to(device)

for epoch in range(101):
    # Reconstruct the inputs and measure the mean squared reconstruction error
    output = autoencoder(all_features)
    loss = criterion(output, all_features)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if epoch % 10 == 0:
        print(f"Epoch {epoch}, Loss: {loss.item():.4f}")

# Encode without tracking gradients so the result carries no autograd graph
autoencoder.eval()
with torch.no_grad():
    compressed_features = autoencoder.encode(all_features)

# Split into business and user blocks, then reassemble with users first
compressed_business = compressed_features[:len(business_features)]
compressed_users = compressed_features[len(business_features):]
compressed_features = torch.cat([compressed_users, compressed_business], dim=0)

Epoch 0, Loss: 0.3262
Epoch 10, Loss: 0.2006
Epoch 20, Loss: 0.1066
Epoch 30, Loss: 0.0778
Epoch 40, Loss: 0.0693
Epoch 50, Loss: 0.0648
Epoch 60, Loss: 0.0615
Epoch 70, Loss: 0.0594
Epoch 80, Loss: 0.0579
Epoch 90, Loss: 0.0568
Epoch 100, Loss: 0.0559
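
The autoencoder's reconstruction loss is easier to interpret next to a linear baseline. The sketch below computes the mean squared reconstruction error of PCA at a given dimensionality, using the same element-wise mean as `nn.MSELoss`. The helper name `pca_reconstruction_mse` and the random `demo_features` matrix are assumptions for illustration; calling the helper on the real feature matrix with `n_dim=8` would give a number comparable to the loss printed above.

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_reconstruction_mse(features: np.ndarray, n_dim: int) -> float:
    """Mean squared error after projecting to n_dim components and back."""
    pca = PCA(n_components=n_dim)
    reduced = pca.fit_transform(features)
    restored = pca.inverse_transform(reduced)
    return float(np.mean((features - restored) ** 2))

# Stand-in data (assumption): replace with the real feature matrix.
rng = np.random.default_rng(0)
demo_features = rng.normal(size=(500, 32))

mse_8 = pca_reconstruction_mse(demo_features, 8)      # lossy compression
mse_full = pca_reconstruction_mse(demo_features, 32)  # all components kept
print(f"MSE with 8 components: {mse_8:.4f}")
print(f"MSE with 32 components: {mse_full:.2e}")
```

If the autoencoder's final loss is not clearly below the PCA error at the same dimensionality, the nonlinear encoder may not be adding much over the linear projection.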


In [None]:
# Model initialization for training with the autoencoder-compressed embeddings

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Create the model before wandb.init so the logged config describes this run
model = LightGCN(num_nodes, N_DIM, num_layers=3, has_clip_features=True).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.5, weight_decay=1e-5)

run = wandb.init(
    project="224w-project",
    config={'clip_features': model.has_clip_features},
    reinit=True,
    name=f'with_clip_features_{N_DIM}d_autoencoder'
)

train_split.to(device)
val_split.to(device)

if model.has_clip_features:
    model.initialize_clip_features(compressed_features)
    print("Initialized node features")

train(model, optimizer, train_split, val_split, num_epochs=121, run=run)

Run history:
epoch,▁▂▂▃▃▄▅▅▆▆▇▇█
train/loss,█▃▂▂▁▁▁▁▁▁▁▅▄
val/loss,▇▂▁▁▁▁▁▁▁▁▁█▅
val/recall@500,▁▁▃▆▇▇████▇▆▄

Run summary:
epoch,120.0
train/loss,0.42232
val/loss,0.98112
val/recall@500,0.65947


Initialized node features
Epoch 0, Train loss 0.2952347993850708, Val loss 0.3198264241218567, Val Recall@500 0.5913599729537964
Epoch 10, Train loss 0.07470818608999252, Val loss 0.1982196569442749, Val Recall@500 0.676962673664093
Epoch 20, Train loss 0.06170197203755379, Val loss 0.1843721568584442, Val Recall@500 0.6863926649093628
Epoch 30, Train loss 0.05988023430109024, Val loss 0.20966924726963043, Val Recall@500 0.6835346221923828
Epoch 40, Train loss 0.07343178987503052, Val loss 0.25717443227767944, Val Recall@500 0.6688758730888367
Epoch 50, Train loss 0.09906797111034393, Val loss 0.31763756275177, Val Recall@500 0.656664252281189
Epoch 60, Train loss 0.08913768082857132, Val loss 0.24978135526180267, Val Recall@500 0.6618995070457458
Epoch 70, Train loss 0.06840551644563675, Val loss 0.17765934765338898, Val Recall@500 0.6879534721374512
Epoch 80, Train loss 0.05582723394036293, Val loss 0.16374439001083374, Val Recall@500 0.6975999474525452
Epoch 90, Train loss 0.0535940