# Recommender with all attributes

## Introduction

This notebook presents a content-based two-tower recommender system that leverages multiple user and item attributes, including both traditional numerical features and text-based information. The recommender utilizes embeddings derived from a pre-trained BERT model to capture semantic relationships from user professions, item descriptions, and item features. These embeddings are combined with numerical data, such as user age and item price, to form a comprehensive feature set.

The goal of this project is to explore how incorporating rich, high-dimensional features impacts the performance of a recommender system, and to evaluate the model's effectiveness using a holdout dataset of actual user-item interactions. By comparing this complex approach with a simpler model that uses fewer attributes, we aim to understand the trade-offs in terms of model complexity, data quality, and recommendation accuracy.

## Installations and Imports

In [None]:
pip install tensorflow




In [None]:
pip install transformers torch



In [None]:
import torch
from transformers import BertTokenizer, BertModel
import numpy as np
import pandas as pd


In [None]:
# Initialize the pre-trained BERT model and tokenizer
model_name = "bert-base-uncased"  # You can choose other models like "bert-large-uncased" or domain-specific models
tokenizer = BertTokenizer.from_pretrained(model_name)
bert_model = BertModel.from_pretrained(model_name)



## Data Loading and Preparation

In this section, we load the necessary datasets, including user profiles, item details, and the user-item interaction matrix. Each entry in the matrix indicates whether a user has positively interacted with a specific item.

To prepare the data for model training, we first transform the matrix into a long-form user-item interaction table. We filter for positive interactions and randomly shuffle them to eliminate ordering bias. The dataset is then split into an 80% training set and a 20% holdout set.

The holdout set plays a crucial role in the later evaluation phase, allowing us to test how well the recommender can predict actual user-item interactions it has not seen during training. This helps ensure a realistic assessment of the model's recommendation performance.

In [None]:
users = pd.read_csv('users.csv')


items = pd.read_csv('items.csv')
ratings_df = pd.read_csv('user_product_rating_matrix.csv')

# Create user-item pairs for training based on ratings_df
interactions_df = ratings_df.melt(id_vars=['user_id'], var_name='item_id', value_name='label')
interactions_positive = interactions_df[interactions_df['label'] == 1]

In [None]:
from sklearn.model_selection import train_test_split

# Shuffle the positive interactions before splitting
interactions_shuffled = interactions_positive.sample(frac=1, random_state=42).reset_index(drop=True)

# Split into 80% train, 20% holdout
interactions_train, interactions_holdout = train_test_split(
    interactions_shuffled,
    test_size=0.2,
    random_state=42
)

# Optional: reset indices
interactions = interactions_train.reset_index(drop=True)
interactions_holdout = interactions_holdout.reset_index(drop=True)

In [None]:
print(users.shape)
print(items.shape)
print(interactions.shape)
print(interactions_holdout.shape)

(10000, 4)
(1000, 5)
(773028, 3)
(193258, 3)


## Negative Sampling for Training Balance and Robustness

To build a more robust and realistic recommender, we incorporate negative sampling into the training data. This process helps the model learn to distinguish between items a user is likely to interact with and those they are not.

We begin by selecting the user-item pairs with no positive interaction (label = 0), which we treat as implicit negative feedback. These are shuffled and, for each user, we keep up to 100 of them so that negatives do not overwhelm the positives.

We then ensure all positive training interactions are explicitly labeled and concatenate them with the sampled negatives. The resulting training set contains 773,028 positives and 1,000,000 sampled negatives (100 for each of the 10,000 users), which is roughly balanced and lets the model learn the contrast between relevant and irrelevant items.

In [None]:
import pandas as pd
from tqdm import tqdm

# Step 1: Filter to only negative samples
negatives = interactions_df[interactions_df['label'] == 0]

# Step 2: Shuffle the negatives globally
negatives = negatives.sample(frac=1, random_state=42).reset_index(drop=True)

# Step 3: For each user, take the first 100 negatives
neg_samples = (
    negatives.groupby('user_id', group_keys=False)
    .head(100)
)

# Step 4: Mark original interactions (positives) with label = 1, if not already
interactions['label'] = 1

# Step 5: Combine positives and sampled negatives
enhanced_interactions = pd.concat([interactions, neg_samples], ignore_index=True)
interactions=enhanced_interactions

In [None]:
# New shape: training positives plus 100 sampled negatives per user
interactions.shape

(1773028, 3)
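
The row count above can be cross-checked with simple arithmetic. The sketch below only uses the shapes printed earlier in this notebook; it assumes every user has at least 100 negative pairs available, which the matching total supports.

```python
# Shapes reported earlier in the notebook
n_users = 10_000
train_positives = 773_028
negatives_per_user = 100

# Sampled negatives: up to 100 per user (assumed to be exactly 100 here)
sampled_negatives = n_users * negatives_per_user

# Expected size of the combined training table
expected_rows = train_positives + sampled_negatives

# Share of positive rows in the combined table
positive_share = train_positives / expected_rows

print(expected_rows)             # matches the shape printed above
print(round(positive_share, 3))  # fraction of rows with label = 1
```

The positives make up a little under half of the rows, so the set is close to balanced but not exactly 50/50.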

## Preprocessing with Semantic and Numerical Feature Embeddings

In this section, we enrich our recommender system by incorporating both semantic and numerical features through advanced preprocessing. We leverage pre-trained BERT models to transform text-based attributes—such as user professions, item descriptions, and item features—into dense vector representations that capture semantic meaning.

To ensure feature values are on a similar scale, we also apply Min-Max scaling to numerical attributes like user age and item price. These scaled features are then concatenated with their corresponding BERT embeddings to form comprehensive feature vectors for users and items.
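
As a minimal illustration of what Min-Max scaling does, the sketch below rescales a small list of values to the range [0, 1] using the formula (x - min) / (max - min). The ages are hypothetical, and the helper `min_max_scale` is an illustrative stand-in for `MinMaxScaler`, not the code used in this notebook.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical user ages, for illustration only
ages = [18, 22, 40, 62]
scaled_ages = min_max_scale(ages)
print(scaled_ages)
```

Without this step, a raw price in the hundreds or thousands would sit next to BERT embedding components that are typically much smaller in magnitude.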

User and item IDs are mapped to row indices so that each interaction can look up its feature vectors during training. Finally, the interaction dataset is shuffled and the first 200,000 rows are selected for training, which keeps the mix of positives and negatives close to that of the full set while keeping training time manageable. This preprocessing pipeline lets the model draw on contextual information beyond simple IDs or categorical features.

In [None]:
from functools import partial

import numpy as np
import torch
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Concatenate
from sklearn.preprocessing import MinMaxScaler

# `tokenizer` and `bert_model` were already loaded in the imports section above.

def get_bert_embeddings(texts, max_length):
    """Return one mean-pooled BERT vector per text, truncating to `max_length` tokens."""
    # Tokenize input texts and convert to tensors
    inputs = tokenizer(texts, return_tensors='pt', padding=True, truncation=True, max_length=max_length)
    with torch.no_grad():
        outputs = bert_model(**inputs)
    # Mean pooling over the last layer's hidden states.
    # Note: the mean runs over all token positions, padding included.
    embeddings = outputs.last_hidden_state.mean(dim=1)
    return embeddings.numpy()

# Wrappers that keep the original per-attribute token limits
get_bert_embeddings_200 = partial(get_bert_embeddings, max_length=200)  # item descriptions
get_bert_embeddings_100 = partial(get_bert_embeddings, max_length=100)  # item features
get_bert_embeddings_50 = partial(get_bert_embeddings, max_length=50)    # user professions
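
The mean over `dim=1` used above averages every token position, including the padding added to shorter texts. A common alternative is to weight the average by the attention mask. The sketch below shows the idea on a toy NumPy array; it is not the pooling used for the results in this notebook, and `masked_mean_pool` is a hypothetical helper name.

```python
import numpy as np

def masked_mean_pool(hidden_states, attention_mask):
    """Average token vectors per sequence, ignoring padding positions.

    hidden_states:  array of shape (batch, seq_len, hidden)
    attention_mask: array of shape (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(hidden_states.dtype)  # (batch, seq_len, 1)
    summed = (hidden_states * mask).sum(axis=1)                   # sum over real tokens only
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                # guard against empty rows
    return summed / counts

# Toy batch: two sequences of length 3 with hidden size 2.
# The last position of the second sequence is padding.
hidden = np.array([
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[2.0, 2.0], [4.0, 4.0], [9.0, 9.0]],
])
mask = np.array([[1, 1, 1], [1, 1, 0]])

plain = hidden.mean(axis=1)             # padding vector leaks into the second row
masked = masked_mean_pool(hidden, mask)  # padding vector is ignored
print(plain)
print(masked)
```

In the notebook, the corresponding inputs would be `outputs.last_hidden_state` and `inputs['attention_mask']`. Switching to masked pooling would change the embeddings, so the reported scores would need to be recomputed.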


In [None]:
user_age = users[['age']].values
item_price = items[['price']].values

age_scaler = MinMaxScaler()
price_scaler = MinMaxScaler()
user_age_scaled = age_scaler.fit_transform(user_age)
item_price_scaled = price_scaler.fit_transform(item_price)

# Get BERT embeddings for user profession, item description, and item features
user_profession_embeddings = get_bert_embeddings_50(users['profession'].tolist())
item_description_embeddings = get_bert_embeddings_200(items['description'].tolist())
item_features_embeddings = get_bert_embeddings_100(items['features'].tolist())

# Alternative (text embeddings only, without the numerical features):
# user_embeddings = user_profession_embeddings
# item_embeddings = np.concatenate([item_description_embeddings, item_features_embeddings], axis=1)

# Combine the text embeddings and scaled numerical features into one vector per user and item
user_embeddings = np.concatenate([user_profession_embeddings, user_age_scaled], axis=1)
item_embeddings = np.concatenate([item_description_embeddings, item_features_embeddings, item_price_scaled], axis=1)

# Convert user_id and item_id to indices for model input
user_id_to_index = {user_id: idx for idx, user_id in enumerate(users['user_id'])}
item_id_to_index = {item_id: idx for idx, item_id in enumerate(items['item_id'])}
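
To make the resulting feature sizes concrete, the sketch below repeats the concatenation with zero-filled placeholder arrays. It assumes the 768-dimensional hidden size of `bert-base-uncased`; the row counts are toy values, not the real dataset sizes.

```python
import numpy as np

BERT_HIDDEN = 768        # hidden size of bert-base-uncased
n_users, n_items = 4, 3  # toy row counts for illustration

# Placeholder arrays with the same column layout as the real features
profession_emb = np.zeros((n_users, BERT_HIDDEN))
age_scaled = np.zeros((n_users, 1))
description_emb = np.zeros((n_items, BERT_HIDDEN))
features_emb = np.zeros((n_items, BERT_HIDDEN))
price_scaled = np.zeros((n_items, 1))

# Same concatenation pattern as in the cell above
user_vecs = np.concatenate([profession_emb, age_scaled], axis=1)
item_vecs = np.concatenate([description_emb, features_emb, price_scaled], axis=1)

print(user_vecs.shape)  # one text embedding plus one numeric column
print(item_vecs.shape)  # two text embeddings plus one numeric column
```

Each numerical feature therefore contributes a single column next to hundreds of embedding columns.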


In [None]:
interactions=interactions.sample(frac=1, random_state=42).reset_index(drop=True)
interactions_trainable=interactions.head(200000)

In [None]:
# Prepare the training data (user and item embeddings from interactions)
X_user = np.array([user_embeddings[user_id_to_index[user]] for user in interactions_trainable['user_id']])
X_item = np.array([item_embeddings[item_id_to_index[item]] for item in interactions_trainable['item_id']])
# Labels must come from the same 200,000-row subset as the features
y = interactions_trainable['label'].values

## Two-Tower Neural Recommender with Rich Feature Representations

In this section, we construct a two-tower neural recommender architecture designed to learn meaningful representations of users and items based on their embedded features. Each tower processes one side of the interaction: the user tower takes the BERT-encoded profession and scaled age, while the item tower receives BERT embeddings for item descriptions and features, along with scaled price.

Both user and item inputs are passed through dense layers to extract high-level representations. These representations are then concatenated and fed into a final dense layer with a sigmoid activation to predict the probability of an interaction between a user and an item.

The model is compiled with the Adam optimizer and binary cross-entropy loss, and trained for 10 epochs with a batch size of 32 using a 90/10 train-validation split. This structure lets the model combine semantic and numerical information in a unified framework, with the aim of generalizing better across user preferences and item characteristics.

In [None]:
# Build the 2-tower recommender model
user_input = Input(shape=(user_embeddings.shape[1],), name='user_input')
item_input = Input(shape=(item_embeddings.shape[1],), name='item_input')

user_dense = Dense(64, activation='relu')(user_input)
item_dense = Dense(64, activation='relu')(item_input)

# Concatenate user and item dense layers
concat = Concatenate()([user_dense, item_dense])

# Output layer (predict interaction)
output = Dense(1, activation='sigmoid')(concat)

# Create the model
model = Model(inputs=[user_input, item_input], outputs=output)

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
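
For a sense of the model's size, the trainable parameters can be counted by hand. The sketch below assumes input widths of 769 (user) and 1537 (item), which follow from a 768-dimensional BERT embedding per text attribute plus one scaled numeric column; `dense_params` is an illustrative helper.

```python
def dense_params(n_in, n_out):
    """Parameters of a fully connected layer: one weight per input-output pair plus biases."""
    return n_in * n_out + n_out

USER_DIM = 768 + 1       # profession embedding + scaled age (assumed widths)
ITEM_DIM = 2 * 768 + 1   # description + features embeddings + scaled price
TOWER_UNITS = 64

user_tower = dense_params(USER_DIM, TOWER_UNITS)
item_tower = dense_params(ITEM_DIM, TOWER_UNITS)
output_layer = dense_params(2 * TOWER_UNITS, 1)  # acts on the concatenated towers

total_params = user_tower + item_tower + output_layer
print(user_tower, item_tower, output_layer, total_params)
```

Calling `model.summary()` in the notebook should report the same total if these input widths hold.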

In [None]:
len(X_user)

200000
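
With 200,000 samples, the training configuration used in the next cell (`validation_split=0.1`, `batch_size=32`) implies a fixed number of optimizer steps per epoch. The sketch below works it out, assuming Keras holds out 10% of the samples for validation and trains on the rest.

```python
import math

n_samples = 200_000
batch_size = 32

# 90% of the samples are used for training, 10% for validation
n_train = n_samples * 9 // 10
n_val = n_samples - n_train

# Number of batches needed to cover the training samples once
steps_per_epoch = math.ceil(n_train / batch_size)
print(n_train, n_val, steps_per_epoch)
```

The step count should match the `5625/5625` progress counter in the training log.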

In [None]:
# Train the model
model.fit([X_user, X_item], y, epochs=10, batch_size=32, validation_split=0.1)

Epoch 1/10
5625/5625 ━━━━━━━━━━━━━━━━━━━━ 28s 5ms/step - accuracy: 0.6168 - loss: 0.6560 - val_accuracy: 0.6218 - val_loss: 0.6478
Epoch 2/10
5625/5625 ━━━━━━━━━━━━━━━━━━━━ 26s 5ms/step - accuracy: 0.6450 - loss: 0.6384 - val_accuracy: 0.6561 - val_loss: 0.6336
Epoch 3/10
5625/5625 ━━━━━━━━━━━━━━━━━━━━ 41s 5ms/step - accuracy: 0.6487 - loss: 0.6376 - val_accuracy: 0.6414 - val_loss: 0.6389
Epoch 4/10
5625/5625 ━━━━━━━━━━━━━━━━━━━━ 30s 5ms/step - accuracy: 0.6497 - loss: 0.6366 - val_accuracy: 0.6586 - val_loss: 0.6323
Epoch 5/10
5625/5625 ━━━━━━━━━━━━━━━━━━━━ 36s 4ms/step - accuracy: 0.6487 - loss: 0.6370 - val_accuracy: 0.6564 - val_loss: 0.6326
Epoch 6/10
5625/5625 ━━━━━━━━━━━━━━━━━━━━ 41s 5ms/step - accuracy: 0.6518 - loss: 0.6345 - val_accuracy: 0.6572 - val_loss: 0.6305
Epoch 7/10

<keras.src.callbacks.history.History at 0x788abe61c290>

## Personalized Recommendation Function

In this section, we define a function that generates personalized item recommendations for a given user, leveraging the feature-rich embeddings produced during preprocessing.

For a specified user, we retrieve their corresponding embedding (a combination of profession and scaled age) and identify items they haven't interacted with yet. We then generate predictions by feeding the user embedding alongside each candidate item embedding through the trained model.

The function returns the top-N highest-scoring items as recommendations, along with useful metadata such as item description, features, price, and the predicted interaction score. This allows for interpretable and high-quality personalized recommendations that go beyond simple collaborative filtering by incorporating semantic understanding from textual attributes.

In [None]:
def recommend_for_user(user_id, top_n=3):
    # Ensure user exists
    if user_id not in user_id_to_index:
        raise ValueError(f"User ID {user_id} not found.")

    # Get user row and index
    user_row = users[users['user_id'] == user_id].iloc[0]
    user_idx = user_id_to_index[user_id]

    # Get user feature vector (profession embedding + scaled age)
    user_embedding = user_embeddings[user_idx]

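    # Items the user has positively interacted with in the training data.
    # `interactions` also contains sampled negatives (label = 0), which must stay eligible.
    interacted_items = interactions[
        (interactions['user_id'] == user_id) & (interactions['label'] == 1)
    ]['item_id'].values
    candidate_items = items[~items['item_id'].isin(interacted_items)]

    if candidate_items.empty:
        return []  # no unseen items left to recommend for this user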

    # Get embeddings for candidate items
    item_idxs = [item_id_to_index[iid] for iid in candidate_items['item_id']]
    candidate_item_embeddings = item_embeddings[item_idxs]

    # Repeat user embedding to match candidate items
    repeated_user_embeddings = np.tile(user_embedding, (len(candidate_item_embeddings), 1))

    # Predict interaction scores
    scores = model.predict([repeated_user_embeddings, candidate_item_embeddings], verbose=0).flatten()

    # Get top N indices
    top_n_indices = scores.argsort()[-top_n:][::-1]
    top_scores = scores[top_n_indices]
    top_items = candidate_items.iloc[top_n_indices]

    # Format output, pairing each top item with its score
    recommendations = []
    for score, (_, row) in zip(top_scores, top_items.iterrows()):
        recommendations.append({
            'user_id': user_row['user_id'],
            'user_age': int(user_row['age']),
            'user_profession': user_row['profession'],
            'item_id': row['item_id'],
            'item_price': int(row['price']),
            'item_description': row['description'],
            'item_features': row['features'],
            'score': float(score)
        })

    return recommendations


In [None]:
top_recs = recommend_for_user("tec_0001", top_n=3)
for rec in top_recs:
    print(rec)


{'user_id': 'tec_0001', 'user_age': 22, 'user_profession': 'science', 'item_id': 'item_0475', 'item_price': 1513, 'item_description': 'With high refresh rate display and AI-powered processor, this laptop is built for performance. Perfect for remote workers, it delivers top-tier functionality.', 'item_features': 'Fast charging, USB-C connectivity, Expandable storage, Bluetooth 5.0', 'score': 0.7355149984359741}
{'user_id': 'tec_0001', 'user_age': 22, 'user_profession': 'science', 'item_id': 'item_0129', 'item_price': 699, 'item_description': 'Designed for remote workers, this laptop includes AI-powered processor and high refresh rate display. It’s built for performance and built for high efficiency.', 'item_features': 'Expandable storage, USB-C connectivity, Bluetooth 5.0', 'score': 0.734379231929779}
{'user_id': 'tec_0001', 'user_age': 22, 'user_profession': 'science', 'item_id': 'item_0705', 'item_price': 305, 'item_description': 'This laptop features advanced cooling system and high 

## Generating Representative Recommendations Across User Personalities

To better understand how our recommender model performs across diverse user profiles, this section generates recommendations for a representative sample of users from each personality type.

For every unique personality in the dataset, a small number of users are randomly selected, and personalized top-N item recommendations are generated for each. This allows us to visually inspect the diversity, relevance, and alignment of recommendations with different user personalities and interests.

This step provides qualitative insight into how well the model tailors suggestions to different users. Note that personality is used here only to group users for inspection; the model itself sees profession and scaled age as user features.

In [None]:
import random
from pprint import pprint

def recommend_for_representative_users(users_df, top_n=5, users_per_personality=3):
    personalities = users_df['personality'].unique()

    for personality in personalities:
        print(f"\n Personality: {personality}")
        personality_users = users_df[users_df['personality'] == personality]['user_id'].tolist()

        if len(personality_users) < users_per_personality:
            selected_users = personality_users
        else:
            selected_users = random.sample(personality_users, users_per_personality)

        for user_id in selected_users:
            print(f"\n Recommendations for user: {user_id}")
            recs = recommend_for_user(user_id, top_n=top_n)
            pprint(recs)


In [None]:
# Print recommendations for 3 users per personality
recommend_for_representative_users(users, top_n=5, users_per_personality=3)


 Personality: Tech Enthusiast

 Recommendations for user: tec_0511
[{'item_description': 'Designed for gamers, this laptop includes advanced '
                      'cooling system and high refresh rate display. It’s '
                      'built for performance and built for high efficiency.',
  'item_features': 'Expandable storage, Fast charging, Bluetooth 5.0',
  'item_id': 'item_0503',
  'item_price': 330,
  'score': 0.7638413310050964,
  'user_age': 18,
  'user_id': 'tec_0511',
  'user_profession': 'science'},
 {'item_description': 'With high refresh rate display and AI-powered '
                      'processor, this laptop is built for performance. '
                      'Perfect for remote workers, it delivers top-tier '
                      'functionality.',
  'item_features': 'Fast charging, USB-C connectivity, Expandable storage, '
                   'Bluetooth 5.0',
  'item_id': 'item_0475',
  'item_price': 1513,
  'score': 0.7355149984359741,
  'user_age': 18,
  'user_

## Model Evaluation Using Holdout Data

To quantitatively assess the performance of our enhanced recommender system, we evaluate how confidently the model predicts interactions that actually occurred in the holdout set, a portion of the data that was not seen during training.

In this evaluation:

* We select one real positive interaction per user from the holdout set.
* For each user-item pair, we compute the predicted interaction score using the trained model.
* These scores represent the model’s confidence in recommending each item to the corresponding user.

The final metric is the average predicted score across all sampled interactions. A higher average score indicates the model is recognizing and prioritizing items that users are truly interested in. Because only positive pairs are scored, the metric measures confidence on true interactions rather than ranking quality.

The score achieved in this evaluation was 44.5010%, indicating that on average the model assigns a moderate level of confidence to real user-item interactions from the holdout dataset, which it had no exposure to during training.

In [None]:
def get_score(user_id, item_id):
    # Check if IDs exist
    if user_id not in user_id_to_index or item_id not in item_id_to_index:
        raise ValueError("User ID or Item ID not found in the dataset.")

    # Get user and item indices
    user_idx = user_id_to_index[user_id]
    item_idx = item_id_to_index[item_id]

    # Get the corresponding user and item embedding
    user_vector = user_embeddings[user_idx].reshape(1, -1)
    item_vector = item_embeddings[item_idx].reshape(1, -1)

    # Predict using the trained model
    score = model.predict([user_vector, item_vector], verbose=0)[0][0]
    return score


In [None]:
score = get_score("tec_0001", "item_0042")
print(f"Predicted interaction score: {score:.4f}")


Predicted interaction score: 0.3912


In [None]:
from tqdm import tqdm

def evaluate_holdout_one_per_user(holdout_df):
    # Pick one random interaction per user
    sampled = holdout_df.groupby('user_id').sample(n=1, random_state=42).reset_index(drop=True)

    scores = []
    for _, row in tqdm(sampled.iterrows(), total=len(sampled), desc="Scoring one interaction per user"):
        user_id = row['user_id']
        item_id = row['item_id']
        score = get_score(user_id, item_id)
        scores.append(score)

    avg_score = sum(scores) / len(scores)
    #print(f"\n Average predicted score (1 interaction per user): {avg_score:.4f}")
    return avg_score


In [None]:
avg_one_per_user = evaluate_holdout_one_per_user(interactions_holdout)

Scoring one interaction per user: 100%|██████████| 10000/10000 [19:20<00:00,  8.61it/s]
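
The loop above calls `model.predict` once per pair, which likely accounts for most of the roughly 19-minute runtime shown in the progress bar. The same average can be computed with a single batched call. The sketch below is one way to do this; `average_score_batched` and its `predict_fn` argument are hypothetical names, and a toy scoring function stands in for the trained model so the example runs on its own.

```python
import numpy as np

def average_score_batched(pairs, user_vectors, item_vectors, user_index, item_index, predict_fn):
    """Average predicted score over (user_id, item_id) pairs using one batched prediction."""
    u_rows = [user_index[u] for u, _ in pairs]
    i_rows = [item_index[i] for _, i in pairs]
    # One call for all pairs instead of one call per pair
    scores = predict_fn([user_vectors[u_rows], item_vectors[i_rows]])
    return float(np.asarray(scores).reshape(-1).mean())

# Toy data, for illustration only
toy_user_vectors = np.array([[0.1, 0.2], [0.3, 0.4]])
toy_item_vectors = np.array([[0.5], [0.0]])
toy_user_index = {"u1": 0, "u2": 1}
toy_item_index = {"i1": 0, "i2": 1}

def toy_predict(inputs):
    """Stand-in for the trained model: sums the features of each user-item pair."""
    users_batch, items_batch = inputs
    return (users_batch.sum(axis=1) + items_batch.sum(axis=1)).reshape(-1, 1)

avg = average_score_batched(
    [("u1", "i1"), ("u2", "i2")],
    toy_user_vectors, toy_item_vectors,
    toy_user_index, toy_item_index,
    toy_predict,
)
print(avg)
```

In the notebook, `predict_fn` could plausibly be `lambda x: model.predict(x, batch_size=1024, verbose=0)` with `user_embeddings`, `item_embeddings`, and the two ID-to-index dictionaries passed in; that wiring is an assumption and has not been run here.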


In [None]:
print(f"\n Average recommendation score for actual interactions "
      f"(1 random real interaction per user from holdout): {(avg_one_per_user * 100):.4f}%")


 Average recommendation score for actual interactions (1 random real interaction per user from holdout): 44.5010%


## Overall Conclusion

In this project, we developed and evaluated a content-based two-tower recommender system that incorporates multiple user and item attributes, including text-based features like user professions, item descriptions, and item features. These features were embedded using a pre-trained BERT model, which allows us to capture deeper semantic information. However, despite the advanced design of the model, the results showed that it achieved an average recommendation score of 44% when evaluated on a holdout dataset of real user-item interactions.

In comparison, a simpler recommender model from an earlier notebook, which relied on a smaller set of attributes (such as user age and item price), outperformed the more complex model with an average recommendation score of 68%.

**Possible Reasons for the Performance Gap:**

* **Synthetic or Noisy Data:** The dataset used for this project was synthetically generated or augmented. As a result, the text-based features (like item descriptions and user professions) may not have meaningful semantics that align with the expectations of the pre-trained BERT model. This discrepancy can introduce noise, leading to less informative embeddings.

* **Model Complexity vs. Data Quality:** The more complex model may have struggled to generalize from the limited amount and quality of training data. The training log shows training and validation accuracy both plateauing around 0.65, which suggests the model extracts little signal from these features rather than overfitting in the classic sense. The simpler model, on the other hand, may have been better at extracting meaningful patterns from a smaller set of clean, structured features.

* **Dimensionality Overload:** The BERT embeddings significantly increase the input dimensionality of the model. Without a sufficient amount of high-quality data to support these additional features, the model might have been overwhelmed, diminishing its ability to learn effective patterns from the data.

* **Mismatch of Feature Importance:** Not all features are equally predictive. By using all embedded attributes (such as item descriptions and features) without distinguishing their relevance, the model may have missed the importance of more significant features like user age or item price, which could have been more directly relevant for recommendations.


The results of this project highlight an important insight: greater complexity does not always lead to better performance. While more sophisticated models that incorporate deeper semantic features have the potential to improve recommendations, their success depends heavily on the quality and relevance of the data. In this case, the simpler model performed better, plausibly because it focused on a smaller set of well-structured, highly predictive features.