# Recommender with subset of item/user attributes

## Introduction

In this notebook, we develop a personalized recommender system for tech products using a subset of user and item attributes. Starting from preprocessed data (user ID and age, product ID and price), we construct a two-tower neural network model that learns to match users with relevant items.

The pipeline includes steps such as:

* Preparing positive and negative interaction samples

* Feature preprocessing (encoding and scaling)

* Building and training the recommendation model

* Generating and displaying recommendations for diverse user personalities

* Evaluating the model’s predictive performance using a holdout set

This approach lets us build a recommender that scores user-item pairs from a small set of attributes, and lets us inspect how its recommendations vary across user profiles and product characteristics.

## Installations and Imports

In [None]:
# Install required packages
!pip install -q tensorflow scikit-learn

In [None]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
import tensorflow as tf
from tensorflow.keras import layers, Model
from pprint import pprint
from sklearn.model_selection import train_test_split

## Data Preparation and Holdout Set Creation for Recommender Training

In this phase, we begin building the recommender system using a subset of user and item attributes. The user, item, and interaction data are first loaded from the previously generated CSV files: users.csv, items.csv, and user_product_rating_matrix.csv. These files were created during the earlier stage of data generation and represent our chosen domain — tech products.

The interaction matrix is reshaped into a user-item-label format, focusing solely on positive interactions (i.e., where a user has meaningfully interacted with a product). To train and evaluate the model, we randomly split the positive interactions into 80% training and 20% holdout sets.

The holdout set is crucial as it allows us to test the recommender's ability to predict items that a user has already interacted with — simulating how well the model can recover known preferences. This step ensures that our evaluation is grounded in realistic scenarios and supports the development of a more effective recommendation strategy.
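As a small illustration of the reshaping and splitting described above, the sketch below applies the same idea to a toy matrix. The user and item ids are invented, and `DataFrame.sample` stands in for `train_test_split` only to keep the example pandas-only; it is not the notebook's actual split.

```python
import pandas as pd

# Toy wide matrix: one row per user, one column per item (1 = interacted).
# The ids and values are made up for illustration.
wide = pd.DataFrame({
    "user_id": ["u1", "u2"],
    "item_a": [1, 0],
    "item_b": [0, 1],
    "item_c": [1, 1],
})

# Reshape to long format: one row per (user, item) pair.
long_df = wide.melt(id_vars=["user_id"], var_name="item_id", value_name="label")

# Keep only the positive interactions.
positives = long_df[long_df["label"] == 1].reset_index(drop=True)

# Hold out roughly 20% of the positives with a seeded shuffle.
holdout = positives.sample(frac=0.2, random_state=42)
train = positives.drop(holdout.index)

print(len(long_df), len(positives), len(train), len(holdout))
```

Every positive pair ends up in exactly one of the two sets, which is the property the holdout evaluation relies on.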

In [None]:
users = pd.read_csv('users.csv')


items = pd.read_csv('items.csv')
ratings_df = pd.read_csv('user_product_rating_matrix.csv')

# Create user-item pairs for training based on ratings_df
interactions_df = ratings_df.melt(id_vars=['user_id'], var_name='item_id', value_name='label')
interactions_positive = interactions_df[interactions_df['label'] == 1]

In [None]:
from sklearn.model_selection import train_test_split

# Shuffle and split interactions (assuming all are positive)
interactions_shuffled = interactions_positive.sample(frac=1, random_state=42).reset_index(drop=True)

# Split into 80% train, 20% holdout
interactions_train, interactions_holdout = train_test_split(
    interactions_shuffled,
    test_size=0.2,
    random_state=42
)

# Optional: reset indices
interactions = interactions_train.reset_index(drop=True)
interactions_holdout = interactions_holdout.reset_index(drop=True)

In [None]:
print(users.shape)
print(items.shape)
print(interactions.shape)
print(interactions_holdout.shape)

(10000, 4)
(1000, 5)
(773028, 3)
(193258, 3)


## Negative Sampling for Balanced Interaction Modeling

To improve the robustness of our recommender system and train a model that can effectively distinguish between relevant and irrelevant items, we implement negative sampling.

In real-world scenarios, users interact with only a small fraction of available products — leaving most items unobserved. Treating all non-interactions as negative signals can lead to bias. To address this, we sample a manageable and representative set of negative examples.

Specifically, we:

* Extract all negative user-item pairs (label = 0).

* Shuffle these globally to ensure randomness.

* Limit the number of negatives per user by keeping the first 100 shuffled negatives for each user.

* Combine these negatives with the previously extracted positive interactions (label = 1).

This roughly balanced dataset (about 773,000 training positives and 1,000,000 sampled negatives) allows us to train models that better distinguish the items a user prefers from those they do not.
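The steps above can be sketched on a toy table as follows. The ids, labels, and the cap of 2 negatives per user are invented for illustration (the notebook uses 100), and `max_neg_per_user` is a hypothetical name.

```python
import pandas as pd

# Toy long-format interactions; ids and labels are invented for illustration.
toy = pd.DataFrame({
    "user_id": ["u1"] * 5 + ["u2"] * 3,
    "item_id": ["i1", "i2", "i3", "i4", "i5", "i1", "i2", "i3"],
    "label":   [1, 0, 0, 0, 0, 0, 1, 0],
})
max_neg_per_user = 2  # hypothetical cap; the notebook uses 100

negatives = toy[toy["label"] == 0]
# Shuffle once globally, then keep the first k rows seen for each user.
negatives = negatives.sample(frac=1, random_state=42).reset_index(drop=True)
neg_samples = negatives.groupby("user_id").head(max_neg_per_user)

positives = toy[toy["label"] == 1]
combined = pd.concat([positives, neg_samples], ignore_index=True)

neg_per_user = neg_samples.groupby("user_id").size()
print(neg_per_user.to_dict())
print(combined["label"].value_counts().to_dict())
```

Because the shuffle happens before `groupby(...).head(k)`, each user's retained negatives are a random subset rather than the first items in column order.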

In [None]:
import pandas as pd
from tqdm import tqdm

# Step 1: Filter to only negative samples
negatives = interactions_df[interactions_df['label'] == 0]

# Step 2: Shuffle the negatives globally
negatives = negatives.sample(frac=1, random_state=42).reset_index(drop=True)

# Step 3: For each user, take the first 100 negatives
neg_samples = (
    negatives.groupby('user_id', group_keys=False)
    .head(100)
)

# Step 4: Mark original interactions (positives) with label = 1, if not already
interactions['label'] = 1

# Step 5: Combine positives and sampled negatives
enhanced_interactions = pd.concat([interactions, neg_samples], ignore_index=True)
interactions=enhanced_interactions


In [None]:
# New shape after adding up to 100 sampled non-interactions per user
interactions.shape

(1773028, 3)

## Preprocessing User and Item Attributes for Model Input


Before feeding the data into the recommender model, we preprocess user and item attributes to ensure they are in a format suitable for machine learning algorithms.

We begin by merging relevant user (e.g., age) and item (e.g., price) attributes into the interaction dataset. Then, we apply the following preprocessing steps:

* Label Encoding: We convert the categorical identifiers (user_id and item_id) into numerical representations using LabelEncoder. This transformation is essential for embedding layers in recommendation models.

* Feature Scaling: Numerical features like user age and product price are scaled using MinMaxScaler to normalize their values between 0 and 1. This prevents features with larger ranges from dominating the learning process.

These steps ensure that both categorical and numerical attributes are efficiently utilized by the model, leading to improved learning and better generalization.
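To make the two transformations concrete, here is a minimal NumPy sketch of what label encoding and min-max scaling compute, assuming the default output range of 0 to 1. The ids and ages are made up, and the notebook itself uses scikit-learn's `LabelEncoder` and `MinMaxScaler` rather than this code.

```python
import numpy as np

# Made-up identifiers and ages, for illustration only.
user_ids = np.array(["bud_0002", "tec_0348", "bud_0002", "cre_1936"])
ages = np.array([26.0, 35.0, 26.0, 44.0])

# Label encoding: sort the distinct ids, then map each id to its position.
classes, codes = np.unique(user_ids, return_inverse=True)

# Min-max scaling: (x - min) / (max - min), which maps the data onto [0, 1].
ages_scaled = (ages - ages.min()) / (ages.max() - ages.min())

print(classes)      # distinct ids in sorted order
print(codes)        # integer index per row, usable by an embedding layer
print(ages_scaled)
```

The integer codes index rows of an embedding table, which is why the embedding layers are sized by the number of distinct classes.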

In [None]:
# Merge user and item attributes (including age and price)
interactions = interactions.merge(users, on='user_id')
interactions = interactions.merge(items, on='item_id')

from sklearn.preprocessing import LabelEncoder, MinMaxScaler

user_encoder = LabelEncoder()
item_encoder = LabelEncoder()

interactions['user_encoded'] = user_encoder.fit_transform(interactions['user_id'])
interactions['item_encoded'] = item_encoder.fit_transform(interactions['item_id'])

# Scale numerical features
scaler_age = MinMaxScaler()
scaler_price = MinMaxScaler()

interactions['age_scaled'] = scaler_age.fit_transform(interactions[['age']])
interactions['price_scaled'] = scaler_price.fit_transform(interactions[['price']])


## Building a Two-Tower Neural Network Recommender


In this section, we implement a two-tower neural network recommender that learns separate representations for users and items before combining them to predict interactions.

The model is structured as follows:

* User Tower: Processes user-specific inputs — a unique user ID and their scaled age. The user ID is passed through an embedding layer, flattened, and then concatenated with the age input. This merged vector is then transformed via a dense layer.

* Item Tower: Mirrors the user tower but operates on item-specific inputs — item ID and scaled price. The embedded item ID and item price are similarly combined and passed through a dense layer.

* Interaction Layer: The outputs of both towers (user and item representations) are combined using a dot product, which captures their similarity. A sigmoid activation then predicts the probability of interaction (i.e., whether the user would interact with the item).

We compile the model using binary cross-entropy loss and the Adam optimizer, training it over 5 epochs with a batch size of 64.

This two-tower architecture enables efficient learning of user-item relationships and can scale to large datasets, making it a strong baseline for personalized recommendation.
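The forward pass described above can be sketched in plain NumPy. All weights and inputs below are random stand-ins, not the trained Keras model, and the helper names (`tower`, `relu`, `sigmoid`) are hypothetical. One property worth noticing: because both towers end in a ReLU, the dot product is never negative, so the sigmoid output cannot fall below 0.5. This would be consistent with the scores of exactly 0.5 that appear in some of the recommendation outputs later in the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tower(embedding, feature, weights, bias):
    """Concatenate an ID embedding with one scalar feature, then apply Dense + ReLU."""
    concat = np.concatenate([embedding, feature], axis=1)
    return relu(concat @ weights + bias)

batch, embedding_dim, hidden = 4, 32, 64

# Random stand-ins for learned parameters and inputs (not the trained model).
user_emb = rng.normal(size=(batch, embedding_dim))
item_emb = rng.normal(size=(batch, embedding_dim))
age = rng.uniform(size=(batch, 1))
price = rng.uniform(size=(batch, 1))
w_user = rng.normal(scale=0.1, size=(embedding_dim + 1, hidden))
w_item = rng.normal(scale=0.1, size=(embedding_dim + 1, hidden))
b = np.zeros(hidden)

user_vec = tower(user_emb, age, w_user, b)
item_vec = tower(item_emb, price, w_item, b)

# Row-wise dot product followed by a sigmoid.
logits = np.sum(user_vec * item_vec, axis=1)
scores = sigmoid(logits)

print(scores.shape, bool(scores.min() >= 0.5))
```

If scores below 0.5 are wanted for negative pairs, one option would be a linear activation on the final tower layer, though that changes the model and would need retraining.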

In [None]:
import tensorflow as tf
from tensorflow.keras import layers, Model, Input

embedding_dim = 32

# Inputs
user_id_input = Input(shape=(), name='user_id', dtype=tf.int32)
user_age_input = Input(shape=(1,), name='user_age', dtype=tf.float32)

item_id_input = Input(shape=(), name='item_id', dtype=tf.int32)
item_price_input = Input(shape=(1,), name='item_price', dtype=tf.float32)

# User tower
user_embedding = layers.Embedding(len(user_encoder.classes_), embedding_dim)(user_id_input)
user_embedding = layers.Flatten()(user_embedding)
user_concat = layers.Concatenate()([user_embedding, user_age_input])
user_vector = layers.Dense(64, activation='relu')(user_concat)

# Item tower
item_embedding = layers.Embedding(len(item_encoder.classes_), embedding_dim)(item_id_input)
item_embedding = layers.Flatten()(item_embedding)
item_concat = layers.Concatenate()([item_embedding, item_price_input])
item_vector = layers.Dense(64, activation='relu')(item_concat)

# Dot product
dot_product = layers.Dot(axes=1)([user_vector, item_vector])
output = layers.Activation('sigmoid')(dot_product)

# Compile
model = Model(inputs=[user_id_input, user_age_input, item_id_input, item_price_input], outputs=output)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])


In [None]:
# Build input
X = {
    'user_id': interactions['user_encoded'].values,
    'user_age': interactions['age_scaled'].values,
    'item_id': interactions['item_encoded'].values,
    'item_price': interactions['price_scaled'].values
}
y = interactions['label'].values

model.fit(X, y, epochs=5, batch_size=64)


Epoch 1/5
27704/27704 ━━━━━━━━━━━━━━━━━━━━ 80s 3ms/step - accuracy: 0.6810 - loss: 0.6121
Epoch 2/5
27704/27704 ━━━━━━━━━━━━━━━━━━━━ 79s 3ms/step - accuracy: 0.7711 - loss: 0.5566
Epoch 3/5
27704/27704 ━━━━━━━━━━━━━━━━━━━━ 83s 3ms/step - accuracy: 0.7723 - loss: 0.5554
Epoch 4/5
27704/27704 ━━━━━━━━━━━━━━━━━━━━ 77s 3ms/step - accuracy: 0.7724 - loss: 0.5554
Epoch 5/5
27704/27704 ━━━━━━━━━━━━━━━━━━━━ 83s 3ms/step - accuracy: 0.7732 - loss: 0.5555


<keras.src.callbacks.history.History at 0x7ca6691b9bd0>

## Personalized Recommendation For Users

To enable personalized suggestions, we implemented a function that provides the top-N most relevant product recommendations for a specific user, based on the trained two-tower model.

The function follows these steps:

1. User Feature Extraction: Retrieves and encodes the user's ID and scales their age using the fitted scaler.

2. Candidate Item Pool: Considers all available items, excluding those the user has already interacted with positively.

3. Input Preparation: Constructs model inputs by replicating the user features for all remaining candidate items.

4. Prediction: The model computes interaction scores (likelihood of preference) for each user-item pair.

5. Ranking: The top-N items with the highest scores are selected as recommendations.

6. Result Formatting: For each recommended item, we provide relevant metadata (e.g., description, price, type, and features), along with the predicted score.

This function supports flexible top-N queries and lets us evaluate or demonstrate the recommender's output for individual users, highlighting its personalization capabilities.
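The filtering and ranking steps (2 and 5) can be isolated in a few lines. The sketch below assumes the scores have already been computed; `top_n_unseen` is a hypothetical helper, and the item ids and scores are invented stand-ins for `model.predict` output.

```python
import numpy as np

def top_n_unseen(scores, item_ids, seen_item_ids, top_n=5):
    """Return the top_n (item_id, score) pairs among items not in seen_item_ids."""
    scores = np.asarray(scores, dtype=float)
    item_ids = np.asarray(item_ids)

    # Step 2: drop items the user has already interacted with.
    mask = ~np.isin(item_ids, list(seen_item_ids))
    cand_ids, cand_scores = item_ids[mask], scores[mask]

    # Step 5: sort candidates by descending score and keep the first top_n.
    order = np.argsort(-cand_scores, kind="stable")[:top_n]
    return [(str(cand_ids[i]), float(cand_scores[i])) for i in order]

# Hypothetical scores standing in for model.predict output.
item_ids = ["item_a", "item_b", "item_c", "item_d"]
scores = [0.91, 0.99, 0.42, 0.77]
recs = top_n_unseen(scores, item_ids, seen_item_ids={"item_b"}, top_n=2)
print(recs)  # [('item_a', 0.91), ('item_d', 0.77)]
```

The highest-scoring item, `item_b`, is excluded because the user has already interacted with it, so the ranking starts from the remaining candidates.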

In [None]:
def recommend_for_user(user_id, top_n=5):
    # Get user encoded index and scaled age
    user_idx = user_encoder.transform([user_id])[0]
    user_row = users[users['user_id'] == user_id].iloc[0]
    user_age_scaled = scaler_age.transform([[user_row['age']]])[0][0]

    # All item data
    all_item_ids = items['item_id'].values
    all_item_indices = item_encoder.transform(all_item_ids)
    item_prices = items['price'].values
    item_prices_scaled = scaler_price.transform(item_prices.reshape(-1, 1)).flatten()

    # Remove already interacted items
    interacted_items = interactions[
        (interactions['user_id'] == user_id) & (interactions['label'] == 1)
    ]['item_id'].values
    interacted_indices = item_encoder.transform(interacted_items)

    mask = ~np.isin(all_item_indices, interacted_indices)
    item_indices = all_item_indices[mask]
    item_ids = all_item_ids[mask]
    item_prices_scaled = item_prices_scaled[mask]

    # Prepare model input
    user_id_array = np.full_like(item_indices, user_idx)
    user_age_array = np.full_like(item_prices_scaled, user_age_scaled)

    # Predict scores
    predictions = model.predict({
        'user_id': user_id_array,
        'user_age': user_age_array,
        'item_id': item_indices,
        'item_price': item_prices_scaled
    }, verbose=0).flatten()

    # Select top N
    top_idxs = predictions.argsort()[-top_n:][::-1]
    top_item_ids = item_ids[top_idxs]
    top_scores = predictions[top_idxs]

    # Merge and format output
    recommended = []
    for item_id, score in zip(top_item_ids, top_scores):
        item_row = items[items['item_id'] == item_id].iloc[0]
        rec = {
            'user_id': user_row['user_id'],
            'user_age': user_row['age'],
            'user_profession': user_row.get('profession', 'N/A'),
            'user_personality': user_row.get('personality', 'N/A'),
            'item_id': item_row['item_id'],
            'item_type': item_row.get('product_type', 'N/A'),
            'item_description': item_row.get('description', 'N/A'),
            'item_price': item_row['price'],
            'item_features': item_row.get('features', 'N/A'),
            'score': float(score)
        }
        recommended.append(rec)

    return recommended


In [None]:
# Print recommendations for one example user
pprint(recommend_for_user('bud_0002', top_n=3))




[{'item_description': 'With noise cancellation and wireless connectivity, this '
                      'headphones is great for travel. Perfect for music '
                      'lovers, it delivers top-tier functionality.',
  'item_features': 'Fast charging, Bluetooth 5.0',
  'item_id': 'item_0755',
  'item_price': np.float64(43.5),
  'item_type': 'headphones',
  'score': 0.9996224641799927,
  'user_age': np.int64(26),
  'user_id': 'bud_0002',
  'user_personality': 'Budget Conscious',
  'user_profession': 'education'},
 {'item_description': 'With wireless connectivity and noise cancellation, this '
                      'headphones is great for travel. Perfect for music '
                      'lovers, it delivers top-tier functionality.',
  'item_features': 'Bluetooth 5.0, Over-ear design, Noise cancellation, Fast '
                   'charging',
  'item_id': 'item_0341',
  'item_price': np.float64(46.06),
  'item_type': 'headphones',
  'score': 0.9976105690002441,
  'user_age': np.i

## Exploring Recommendations Across Personality Types

To better understand how the recommender adapts to different user profiles, we generated indicative recommendations for a sample of users from each personality type.

For each personality group, we randomly selected up to three users, and for each user, we retrieved the top-5 personalized recommendations using our trained model. These recommendations were then printed along with relevant user and item details, offering insights into:

* The diversity and relevance of recommended tech products.

* How user attributes, such as age and personality, influence the system’s output.

* Potential patterns or differences in recommendations across personality types.

This qualitative step serves as an initial diagnostic of model behavior, helping us assess whether personalization is occurring meaningfully across segments before moving on to quantitative evaluation.
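The per-personality sampling can also be made reproducible by seeding it. The sketch below is one way to pick up to `k` users per group; `sample_users_per_group` is a hypothetical helper, the toy table is invented, and the notebook's own function uses `random.sample` without a fixed seed.

```python
import pandas as pd

def sample_users_per_group(users_df, group_col="personality", k=3, seed=42):
    """Pick up to k user ids per group, reproducibly."""
    picked = {}
    for name, group in users_df.groupby(group_col):
        n = min(k, len(group))  # small groups contribute all their users
        picked[name] = group.sample(n=n, random_state=seed)["user_id"].tolist()
    return picked

# Toy user table; ids and personalities are invented for illustration.
toy_users = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u4", "u5"],
    "personality": ["A", "A", "A", "A", "B"],
})
selected = sample_users_per_group(toy_users, k=3)
print({name: len(ids) for name, ids in selected.items()})  # {'A': 3, 'B': 1}
```

With a fixed seed, rerunning the notebook shows recommendations for the same users, which makes before-and-after comparisons of model changes easier.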

In [None]:
import random
from pprint import pprint

def recommend_for_representative_users(users_df, top_n=5, users_per_personality=3):
    personalities = users_df['personality'].unique()

    for personality in personalities:
        print(f"\n Personality: {personality}")
        personality_users = users_df[users_df['personality'] == personality]['user_id'].tolist()

        if len(personality_users) < users_per_personality:
            selected_users = personality_users
        else:
            selected_users = random.sample(personality_users, users_per_personality)

        for user_id in selected_users:
            print(f"\n Recommendations for user: {user_id}")
            recs = recommend_for_user(user_id, top_n=top_n)
            pprint(recs)


In [None]:
# Print recommendations for 3 users per personality
recommend_for_representative_users(users, top_n=5, users_per_personality=3)


 Personality: Tech Enthusiast

 Recommendations for user: tec_0348




[{'item_description': 'This laptop features advanced cooling system and '
                      'AI-powered processor, designed for gamers. It is built '
                      'for performance, offering reliable performance and user '
                      'comfort.',
  'item_features': 'Fast charging, Bluetooth 5.0',
  'item_id': 'item_0849',
  'item_price': np.float64(483.54),
  'item_type': 'laptop',
  'score': 0.9993810653686523,
  'user_age': np.int64(35),
  'user_id': 'tec_0348',
  'user_personality': 'Tech Enthusiast',
  'user_profession': 'engineering'},
 {'item_description': 'Laptop comes equipped with advanced cooling system and '
                      'high refresh rate display, making it built for '
                      'performance. Great choice for business professionals.',
  'item_features': 'Touchscreen, Expandable storage, Fast charging',
  'item_id': 'item_0768',
  'item_price': np.float64(1612.27),
  'item_type': 'laptop',
  'score': 0.9993674159049988,
  'user_age'



[{'item_description': 'Laptop comes equipped with long battery life and '
                      'advanced cooling system, making it built for '
                      'performance. Great choice for business professionals.',
  'item_features': 'Fast charging, Expandable storage',
  'item_id': 'item_0732',
  'item_price': np.float64(407.81),
  'item_type': 'laptop',
  'score': 0.9997192025184631,
  'user_age': np.int64(30),
  'user_id': 'tec_1161',
  'user_personality': 'Tech Enthusiast',
  'user_profession': 'engineering'},
 {'item_description': 'Laptop comes equipped with advanced cooling system and '
                      'long battery life, making it built for performance. '
                      'Great choice for remote workers.',
  'item_features': 'USB-C connectivity, Bluetooth 5.0, Touchscreen',
  'item_id': 'item_0631',
  'item_price': np.float64(285.36),
  'item_type': 'laptop',
  'score': 0.9996629953384399,
  'user_age': np.int64(30),
  'user_id': 'tec_1161',
  'user_personali



[{'item_description': 'Designed for business professionals, this laptop '
                      'includes AI-powered processor and high refresh rate '
                      'display. It’s perfect for multitasking and built for '
                      'high efficiency.',
  'item_features': 'Touchscreen, Bluetooth 5.0',
  'item_id': 'item_0001',
  'item_price': np.float64(852.66),
  'item_type': 'laptop',
  'score': 0.5,
  'user_age': np.int64(25),
  'user_id': 'tec_0145',
  'user_personality': 'Tech Enthusiast',
  'user_profession': 'science'},
 {'item_description': 'Designed for gamers, this headphones includes long '
                      'battery life and noise cancellation. It’s ideal for '
                      'immersive sound and built for high efficiency.',
  'item_features': 'Bluetooth 5.0, Fast charging, Noise cancellation, Over-ear '
                   'design',
  'item_id': 'item_1000',
  'item_price': np.float64(86.29),
  'item_type': 'headphones',
  'score': 0.5,
  'user_a



[{'item_description': 'With 4K UHD display and ultra-thin bezel, this monitor '
                      'is built for high clarity. Perfect for content '
                      'creators, it delivers top-tier functionality.',
  'item_features': 'Low blue light, HDMI connectivity, Adjustable stand',
  'item_id': 'item_0505',
  'item_price': np.float64(118.19),
  'item_type': 'monitor',
  'score': 0.9999873638153076,
  'user_age': np.int64(35),
  'user_id': 'cre_1936',
  'user_personality': 'Creative Explorer',
  'user_profession': 'arts'},
 {'item_description': 'Designed for content creators, this monitor includes '
                      'HDR10 support and anti-glare coating. It’s built for '
                      'high clarity and built for high efficiency.',
  'item_features': 'Adjustable stand, 4K display, Low blue light, HDMI '
                   'connectivity',
  'item_id': 'item_0669',
  'item_price': np.float64(80.0),
  'item_type': 'monitor',
  'score': 0.9999867677688599,
  'user_



[{'item_description': 'This monitor features anti-glare coating and 4K UHD '
                      'display, designed for content creators. It is built for '
                      'high clarity, offering reliable performance and user '
                      'comfort.',
  'item_features': '4K display, Adjustable stand',
  'item_id': 'item_0025',
  'item_price': np.float64(122.67),
  'item_type': 'monitor',
  'score': 0.9999279975891113,
  'user_age': np.int64(42),
  'user_id': 'cre_1872',
  'user_personality': 'Creative Explorer',
  'user_profession': 'business'},
 {'item_description': 'With 4K UHD display and ultra-thin bezel, this monitor '
                      'is built for high clarity. Perfect for content '
                      'creators, it delivers top-tier functionality.',
  'item_features': 'Low blue light, HDMI connectivity, Adjustable stand',
  'item_id': 'item_0505',
  'item_price': np.float64(118.19),
  'item_type': 'monitor',
  'score': 0.9999266862869263,
  'user_age': 



[{'item_description': 'With 4K UHD display and HDR10 support, this monitor is '
                      'built for high clarity. Perfect for content creators, '
                      'it delivers top-tier functionality.',
  'item_features': 'HDMI connectivity, Low blue light, Adjustable stand, 4K '
                   'display',
  'item_id': 'item_0394',
  'item_price': np.float64(281.75),
  'item_type': 'monitor',
  'score': 0.9999866485595703,
  'user_age': np.int64(29),
  'user_id': 'cre_1058',
  'user_personality': 'Creative Explorer',
  'user_profession': 'arts'},
 {'item_description': 'Designed for content creators, this monitor includes '
                      'anti-glare coating and 4K UHD display. It’s built for '
                      'high clarity and built for high efficiency.',
  'item_features': 'Adjustable stand, Low blue light',
  'item_id': 'item_0466',
  'item_price': np.float64(137.37),
  'item_type': 'monitor',
  'score': 0.9999734163284302,
  'user_age': np.int64(29),



[{'item_description': 'Designed for business professionals, this laptop '
                      'includes AI-powered processor and high refresh rate '
                      'display. It’s perfect for multitasking and built for '
                      'high efficiency.',
  'item_features': 'Touchscreen, Bluetooth 5.0',
  'item_id': 'item_0001',
  'item_price': np.float64(852.66),
  'item_type': 'laptop',
  'score': 0.5,
  'user_age': np.int64(38),
  'user_id': 'pra_1501',
  'user_personality': 'Practical Buyer',
  'user_profession': 'education'},
 {'item_description': 'Designed for gamers, this headphones includes long '
                      'battery life and noise cancellation. It’s ideal for '
                      'immersive sound and built for high efficiency.',
  'item_features': 'Bluetooth 5.0, Fast charging, Noise cancellation, Over-ear '
                   'design',
  'item_id': 'item_1000',
  'item_price': np.float64(86.29),
  'item_type': 'headphones',
  'score': 0.5,
  'user



[{'item_description': 'Designed for gamers, this mouse includes adjustable DPI '
                      'and precision tracking. It’s ideal for everyday use and '
                      'built for high efficiency.',
  'item_features': 'Wireless, Silent clicks',
  'item_id': 'item_0679',
  'item_price': np.float64(68.11),
  'item_type': 'mouse',
  'score': 0.9983545541763306,
  'user_age': np.int64(55),
  'user_id': 'pra_0987',
  'user_personality': 'Practical Buyer',
  'user_profession': 'business'},
 {'item_description': 'With adjustable DPI and silent click buttons, this '
                      'mouse is ideal for everyday use. Perfect for office '
                      'use, it delivers top-tier functionality.',
  'item_features': 'Compact design, Silent clicks, Wireless, Rechargeable',
  'item_id': 'item_0054',
  'item_price': np.float64(58.87),
  'item_type': 'mouse',
  'score': 0.9966248273849487,
  'user_age': np.int64(55),
  'user_id': 'pra_0987',
  'user_personality': 'Practical



[{'item_description': 'Designed for graphic designers, this mouse includes '
                      'ergonomic design and silent click buttons. It’s great '
                      'for portability and built for high efficiency.',
  'item_features': 'Wireless, Compact design, Rechargeable, Silent clicks',
  'item_id': 'item_0121',
  'item_price': np.float64(63.81),
  'item_type': 'mouse',
  'score': 0.9994961023330688,
  'user_age': np.int64(41),
  'user_id': 'pra_0076',
  'user_personality': 'Practical Buyer',
  'user_profession': 'law'},
 {'item_description': 'With adjustable DPI and ergonomic design, this mouse is '
                      'great for portability. Perfect for gamers, it delivers '
                      'top-tier functionality.',
  'item_features': 'Wireless, Rechargeable, Compact design',
  'item_id': 'item_0871',
  'item_price': np.float64(41.28),
  'item_type': 'mouse',
  'score': 0.998749852180481,
  'user_age': np.int64(41),
  'user_id': 'pra_0076',
  'user_personalit



[{'item_description': 'Laptop comes equipped with long battery life and high '
                      'refresh rate display, making it great for travel. Great '
                      'choice for remote workers.',
  'item_features': 'Touchscreen, Expandable storage',
  'item_id': 'item_0427',
  'item_price': np.float64(410.57),
  'item_type': 'laptop',
  'score': 0.9999997615814209,
  'user_age': np.int64(28),
  'user_id': 'per_1030',
  'user_personality': 'Performance Seeker',
  'user_profession': 'business'},
 {'item_description': 'This laptop features advanced cooling system and high '
                      'refresh rate display, designed for business '
                      'professionals. It is built for performance, offering '
                      'reliable performance and user comfort.',
  'item_features': 'Expandable storage, Fast charging, Bluetooth 5.0',
  'item_id': 'item_0705',
  'item_price': np.float64(305.42),
  'item_type': 'laptop',
  'score': 0.9999992847442627,
  'use



[{'item_description': 'Designed for remote workers, this laptop includes high '
                      'refresh rate display and long battery life. It’s great '
                      'for travel and built for high efficiency.',
  'item_features': 'USB-C connectivity, Touchscreen, Expandable storage, Fast '
                   'charging',
  'item_id': 'item_0609',
  'item_price': np.float64(273.32),
  'item_type': 'laptop',
  'score': 0.9987972974777222,
  'user_age': np.int64(31),
  'user_id': 'per_1015',
  'user_personality': 'Performance Seeker',
  'user_profession': 'education'},
 {'item_description': 'This laptop features long battery life and high refresh '
                      'rate display, designed for gamers. It is perfect for '
                      'multitasking, offering reliable performance and user '
                      'comfort.',
  'item_features': 'Fast charging, Expandable storage, Touchscreen',
  'item_id': 'item_0219',
  'item_price': np.float64(521.82),
  'item_ty



[{'item_description': 'Designed for gamers, this headphones includes wireless '
                      'connectivity and noise cancellation. It’s great for '
                      'travel and built for high efficiency.',
  'item_features': 'Noise cancellation, Fast charging, Bluetooth 5.0',
  'item_id': 'item_0967',
  'item_price': np.float64(49.19),
  'item_type': 'headphones',
  'score': 0.9983124732971191,
  'user_age': np.int64(26),
  'user_id': 'bud_1133',
  'user_personality': 'Budget Conscious',
  'user_profession': 'arts'},
 {'item_description': 'Headphones comes equipped with deep bass and noise '
                      'cancellation, making it great for travel. Great choice '
                      'for commuters.',
  'item_features': 'Bluetooth 5.0, Fast charging, Noise cancellation, Over-ear '
                   'design',
  'item_id': 'item_0569',
  'item_price': np.float64(40.0),
  'item_type': 'headphones',
  'score': 0.9982671737670898,
  'user_age': np.int64(26),
  'user_i



[{'item_description': 'With ergonomic design and adjustable DPI, this mouse is '
                      'ideal for everyday use. Perfect for gamers, it delivers '
                      'top-tier functionality.',
  'item_features': 'Silent clicks, Rechargeable, Compact design, Wireless',
  'item_id': 'item_0380',
  'item_price': np.float64(31.07),
  'item_type': 'mouse',
  'score': 0.9966893196105957,
  'user_age': np.int64(24),
  'user_id': 'bud_1631',
  'user_personality': 'Budget Conscious',
  'user_profession': 'education'},
 {'item_description': 'Laptop comes equipped with long battery life and '
                      'advanced cooling system, making it great for travel. '
                      'Great choice for business professionals.',
  'item_features': 'Fast charging, USB-C connectivity',
  'item_id': 'item_0197',
  'item_price': np.float64(427.55),
  'item_type': 'laptop',
  'score': 0.9957559704780579,
  'user_age': np.int64(24),
  'user_id': 'bud_1631',
  'user_personality': 



## Evaluating Recommender Accuracy with Holdout Interactions


To assess the performance of our recommender, we evaluated how it scores real, previously observed interactions from the holdout dataset. This dataset was set aside before training and contains genuine positive user-item interactions that the model never saw as training examples, which makes it suitable for a less biased evaluation.

For each user in the holdout set, we randomly selected one actual interaction and calculated the model's predicted score for that user-item pair using the trained two-tower architecture.

The resulting average predicted score for these true interactions was:

68.80%

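This figure is the mean predicted interaction probability for held-out positive pairs, so it reflects how confidently the model scores items that users have genuinely engaged with. It is not a classification accuracy or a ranking metric, and it is best read as a coarse indication that the model has picked up some preference signal during training.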
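Scoring one pair per `predict` call is slow for thousands of users, so a batched variant of the same evaluation may be worth considering. The sketch below assumes a hypothetical `score_fn` that accepts arrays of user ids and item ids and returns one score per pair (for example, a wrapper that encodes the ids, looks up age and price, and calls `model.predict` once). The toy scorer and data are invented so the sketch runs on its own.

```python
import numpy as np
import pandas as pd

def mean_score_one_per_user(holdout_df, score_fn, seed=42):
    """Average score over one sampled holdout interaction per user.

    score_fn is a hypothetical callable: (user_ids, item_ids) -> scores.
    """
    sampled = holdout_df.groupby("user_id").sample(n=1, random_state=seed)
    scores = score_fn(sampled["user_id"].to_numpy(), sampled["item_id"].to_numpy())
    return float(np.mean(scores)), len(sampled)

# Toy stand-in scorer: NOT the trained model, only here to make the sketch run.
def fake_score_fn(user_ids, item_ids):
    return np.full(len(user_ids), 0.75)

toy_holdout = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u3", "u3", "u3"],
    "item_id": ["i1", "i2", "i1", "i2", "i3", "i4"],
    "label": [1, 1, 1, 1, 1, 1],
})
avg, n_users = mean_score_one_per_user(toy_holdout, fake_score_fn)
print(avg, n_users)  # 0.75 3
```

The sampling step matches the notebook's `groupby('user_id').sample(n=1, random_state=42)`; only the scoring is collapsed into a single call.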

In [None]:
def get_score(user_id, item_id):
    # Encode user and item
    user_idx = user_encoder.transform([user_id])[0]
    item_idx = item_encoder.transform([item_id])[0]

    # Get user and item features
    user_row = users[users['user_id'] == user_id].iloc[0]
    item_row = items[items['item_id'] == item_id].iloc[0]

    user_age_scaled = scaler_age.transform([[user_row['age']]])[0][0]
    item_price_scaled = scaler_price.transform([[item_row['price']]])[0][0]

    # Predict
    prediction = model.predict({
        'user_id': np.array([user_idx]),
        'user_age': np.array([user_age_scaled]),
        'item_id': np.array([item_idx]),
        'item_price': np.array([item_price_scaled])
    }, verbose=0).flatten()[0]

    return float(prediction)


In [None]:
score = get_score('bud_0002', 'item_0755')
print(f"Predicted score: {score:.4f}")


Predicted score: 0.9996




In [None]:

interactions[(interactions['user_id']=='bud_0002') & (interactions['item_id']=='item_0755')]

Unnamed: 0,user_id,item_id,label,age,profession,personality,product_type,description,price,features,user_encoded,item_encoded,age_scaled,price_scaled


In [None]:

interactions_holdout[(interactions_holdout['user_id']=='bud_0002') & (interactions_holdout['item_id']=='item_0755')]

| | user_id | item_id | label |
|---|---|---|---|
| 117458 | bud_0002 | item_0755 | 1 |


In [None]:
from tqdm import tqdm

def evaluate_holdout_one_per_user(holdout_df):
    # Pick one random interaction per user
    sampled = holdout_df.groupby('user_id').sample(n=1, random_state=42).reset_index(drop=True)

    scores = []
    for _, row in tqdm(sampled.iterrows(), total=len(sampled), desc="Scoring one interaction per user"):
        user_id = row['user_id']
        item_id = row['item_id']
        score = get_score(user_id, item_id)
        scores.append(score)

    avg_score = sum(scores) / len(scores)
    #print(f"\n Average predicted score (1 interaction per user): {avg_score:.4f}")
    return avg_score



In [None]:
avg_one_per_user = evaluate_holdout_one_per_user(interactions_holdout)


Streaming output truncated to the last 5000 lines.
Scoring one interaction per user: 100%|██████████| 10000/10000 [25:17<00:00,  6.59it/s]


In [None]:
print(f"\n Average recommendation score for actual interactions "
      f"(1 random real interaction per user from holdout): {(avg_one_per_user * 100):.4f}%")


 Average recommendation score for actual interactions (1 random real interaction per user from holdout): 68.7972%
