# Recommender Systems with Neural Networks

## Introduction

Recommender systems have become an integral part of modern online experiences, guiding users towards products, services, and content tailored to their preferences. With the advent of deep learning, recommender systems have evolved significantly, leveraging complex neural network architectures to capture intricate patterns in user behavior.

In this tutorial, we'll build personalized recommender systems, starting from classical matrix factorization and moving on to neural approaches such as neural collaborative filtering and autoencoders. We'll cover the underlying mathematics, implement models in Python with NumPy and PyTorch, and survey recent developments in the field. Key papers are cited throughout and listed in the references.

## Table of Contents

1. [Understanding Recommender Systems](#1)
   - [Types of Recommender Systems](#1.1)
   - [Challenges in Recommendation](#1.2)
2. [Matrix Factorization](#2)
   - [Singular Value Decomposition (SVD)](#2.1)
   - [Alternating Least Squares (ALS)](#2.2)
   - [Implementation of Matrix Factorization](#2.3)
3. [Neural Collaborative Filtering](#3)
   - [Generalized Matrix Factorization (GMF)](#3.1)
   - [Neural Matrix Factorization (NeuMF)](#3.2)
   - [Implementation of Neural Collaborative Filtering](#3.3)
4. [Autoencoders for Recommendation](#4)
   - [Denoising Autoencoders](#4.1)
   - [Variational Autoencoders (VAE)](#4.2)
   - [Implementation of Autoencoder-based Recommender](#4.3)
5. [Latest Developments](#5)
   - [Graph Neural Networks for Recommendation](#5.1)
   - [Self-Attention Mechanisms](#5.2)
6. [Conclusion](#6)
7. [References](#7)

<a id="1"></a>
# 1. Understanding Recommender Systems

Recommender systems aim to predict users' preferences and suggest items they are likely to find interesting. They play a crucial role in various domains, including e-commerce, streaming services, and social media.

<a id="1.1"></a>
## 1.1 Types of Recommender Systems

### Content-Based Filtering

- **Definition**: Recommends items similar to those a user liked in the past.
- **Approach**: Uses item features and user profiles.
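Here is a minimal content-based sketch. It assumes each item is described by a binary feature vector, such as genre flags. The user profile is the rating-weighted average of the feature vectors of the items the user rated, and unrated items are ranked by cosine similarity to that profile. The toy features, the ratings, and the helper name `content_based_scores` are hypothetical.

```python
import numpy as np

def content_based_scores(item_features, user_ratings):
    """Score items by cosine similarity to a user profile (hypothetical helper).

    item_features: (n_items, n_features) array, e.g. one-hot genres.
    user_ratings: (n_items,) array with 0 for unrated items.
    Returns an (n_items,) array of scores; already-rated items get -inf.
    """
    rated = user_ratings > 0
    # User profile: rating-weighted average of the rated items' feature vectors
    profile = user_ratings[rated] @ item_features[rated] / user_ratings[rated].sum()
    # Cosine similarity between every item and the profile
    norms = np.linalg.norm(item_features, axis=1) * np.linalg.norm(profile)
    scores = item_features @ profile / np.where(norms == 0, 1.0, norms)
    scores[rated] = -np.inf  # never re-recommend rated items
    return scores

# Toy example: 4 items, features = [action, comedy, drama]
item_features = np.array([[1, 0, 0],
                          [1, 1, 0],
                          [0, 0, 1],
                          [1, 0, 0]], dtype=float)
user_ratings = np.array([5.0, 0.0, 0.0, 0.0])  # the user liked item 0 (pure action)
scores = content_based_scores(item_features, user_ratings)
print(scores.argmax())  # 3: item 3 has exactly the same features as item 0
```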

### Collaborative Filtering

- **Definition**: Recommends items based on the preferences of similar users.
- **Approach**: Utilizes user-item interaction data (e.g., ratings, clicks).
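The following minimal user-based sketch illustrates the idea. It assumes a small dense rating matrix where 0 means unrated. A user's rating for an item is predicted as the average of other users' ratings for that item, weighted by the cosine similarity between rating vectors. This is one simple neighborhood variant chosen for illustration; the matrix and the helper `predict_user_based` are hypothetical.

```python
import numpy as np

def predict_user_based(R, user, item):
    """Predict R[user, item] from similar users' ratings (0 = unrated); hypothetical helper."""
    norms = np.linalg.norm(R, axis=1)
    sims = R @ R[user] / (norms * norms[user] + 1e-12)  # cosine similarity to every user
    sims[user] = 0.0                                    # exclude the user themself
    raters = R[:, item] > 0                             # only users who rated the item contribute
    weights = sims[raters]
    if weights.sum() <= 0:
        return 0.0                                      # no informative neighbors
    return float(weights @ R[raters, item] / weights.sum())

R = np.array([[5, 4, 0],
              [5, 4, 1],
              [1, 1, 5]], dtype=float)
pred = predict_user_based(R, user=0, item=2)
print(round(pred, 2))  # 1.86: user 0 closely matches user 1, who rated item 2 low
```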

### Hybrid Methods

- Combines content-based and collaborative filtering to leverage the strengths of both.

<a id="1.2"></a>
## 1.2 Challenges in Recommendation

- **Data Sparsity**: Most users interact with a small subset of items.
- **Cold Start Problem**: Difficulty in recommending for new users or items.
- **Scalability**: Managing large-scale datasets with millions of users and items.
- **Diversity and Serendipity**: Balancing accuracy against novel, unexpected recommendations that users would not have found on their own.
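As a worked example of data sparsity, consider the MovieLens 100k dataset used later in this notebook. It contains 100,000 ratings from 943 users on 1,682 movies, so only a small fraction of the user-item matrix is observed.

```python
n_users, n_items, n_ratings = 943, 1682, 100_000
density = n_ratings / (n_users * n_items)  # 100,000 / 1,586,126
print(f'{density:.2%} observed, {1 - density:.2%} missing')  # 6.30% observed, 93.70% missing
```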

<a id="2"></a>
# 2. Matrix Factorization

Matrix factorization is a collaborative filtering technique that decomposes the user-item interaction matrix into latent factors representing users and items.

<a id="2.1"></a>
## 2.1 Singular Value Decomposition (SVD)

SVD decomposes a matrix $R$ into the product of three matrices:

$$
R = U \Sigma V^T
$$

- $U$: User factors.
- $\Sigma$: Diagonal matrix of singular values.
- $V^T$: Item factors.
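The following small NumPy sketch shows the decomposition on a fully observed toy matrix with arbitrary values. `np.linalg.svd` returns $U$, the singular values as a vector (in descending order), and $V^T$. Keeping only the top $k$ singular values gives the best rank-$k$ approximation in the Frobenius norm (the Eckart-Young theorem), and the approximation error equals the norm of the discarded singular values.

```python
import numpy as np

R = np.array([[5.0, 4.0, 1.0, 1.0],
              [4.0, 5.0, 1.0, 2.0],
              [1.0, 1.0, 5.0, 4.0],
              [2.0, 1.0, 4.0, 5.0]])

# Full decomposition: R = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# Rank-k truncation keeps the k largest singular values
k = 2
R_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Frobenius error of the truncation equals the norm of the discarded singular values
err = np.linalg.norm(R - R_k)
print(np.isclose(err, np.sqrt((s[k:] ** 2).sum())))  # True
```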

### Limitations

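- Classical SVD is defined only for fully observed matrices, so it cannot be applied directly to a sparse rating matrix with missing entries.
- Missing values must first be imputed (e.g., with zeros or user means), which can distort the learned factors; ALS instead fits only the observed ratings.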

<a id="2.2"></a>
## 2.2 Alternating Least Squares (ALS)

ALS optimizes the following objective function, where the sum runs over the known ratings only:

$$
\min_{P,Q} \sum_{(u,i) \in K} \left[ (R_{ui} - P_u^T Q_i)^2 + \lambda \left( \| P_u \|^2 + \| Q_i \|^2 \right) \right]
$$

- $R_{ui}$: Rating of user $u$ for item $i$.
- $P_u$: Latent vector for user $u$.
- $Q_i$: Latent vector for item $i$.
- $\lambda$: Regularization parameter.
- $K$: Set of known ratings.

### Algorithm

1. Initialize user and item factors randomly.
2. Fix item factors and solve for user factors.
3. Fix user factors and solve for item factors.
4. Repeat until convergence.
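The steps above can be sketched in NumPy. The sketch assumes a dense array `R` with a boolean `mask` that marks the known ratings $K$. It reads the regularizer as applied once per observed rating, so user $u$ with $n_u$ ratings is penalized by $\lambda n_u \|P_u\|^2$. With the item factors fixed, each user's subproblem is a ridge regression with the closed form $P_u = (Q_{I_u}^T Q_{I_u} + \lambda n_u I)^{-1} Q_{I_u}^T R_{u, I_u}$, where $I_u$ is the set of items user $u$ rated. The item step is symmetric. The function name `als` and all defaults are illustrative, not a reference implementation.

```python
import numpy as np

def als(R, mask, k=2, lam=0.1, n_iters=20, seed=0):
    """Alternating least squares on the observed entries of R (illustrative sketch).

    R:    (n_users, n_items) rating matrix; entries where mask is False are ignored.
    mask: boolean array of the same shape marking the known ratings (the set K).
    Regularization is counted once per observed rating, so a user with n_u
    ratings contributes lam * n_u * ||P_u||^2 (and likewise for items).
    Returns P, Q and the objective value after each sweep.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    # Step 1: random initialization of user and item factors
    P = rng.normal(scale=0.1, size=(n_users, k))
    Q = rng.normal(scale=0.1, size=(n_items, k))
    eye = np.eye(k)

    def objective():
        err = np.where(mask, R - P @ Q.T, 0.0)
        reg = lam * (mask.sum(axis=1) @ (P ** 2).sum(axis=1) + mask.sum(axis=0) @ (Q ** 2).sum(axis=1))
        return (err ** 2).sum() + reg

    history = [objective()]
    for _ in range(n_iters):
        # Step 2: fix Q and solve a ridge regression for each user
        for u in range(n_users):
            idx = mask[u]
            if idx.any():
                Qu = Q[idx]
                P[u] = np.linalg.solve(Qu.T @ Qu + lam * idx.sum() * eye, Qu.T @ R[u, idx])
        # Step 3: fix P and solve a ridge regression for each item
        for i in range(n_items):
            idx = mask[:, i]
            if idx.any():
                Pi = P[idx]
                Q[i] = np.linalg.solve(Pi.T @ Pi + lam * idx.sum() * eye, Pi.T @ R[idx, i])
        # Step 4: record the objective; in practice, stop once it stops improving
        history.append(objective())
    return P, Q, history

# Toy data: a rank-2 matrix with roughly 70% of entries observed
rng = np.random.default_rng(42)
R_true = rng.normal(size=(8, 2)) @ rng.normal(size=(2, 6))
mask = rng.random(R_true.shape) < 0.7

P, Q, history = als(R_true, mask, k=2, lam=0.01, n_iters=30)
print(f'objective: {history[0]:.3f} -> {history[-1]:.3f}')
```

Each step minimizes its subproblem exactly, so the objective never increases from one sweep to the next. The `history` list lets you check this and pick a stopping point.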

<a id="2.3"></a>
## 2.3 Implementation of Matrix Factorization

We'll implement a truncated-SVD recommender using NumPy, pandas, and SciPy.

Install dependencies (in a Jupyter cell, run `%pip install numpy pandas scipy torch` instead):

```bash
pip install numpy pandas scipy torch
```

```python
# Import libraries
import numpy as np
import pandas as pd
from scipy.sparse.linalg import svds

# Load the MovieLens 100k dataset
# Download and unzip from: https://grouplens.org/datasets/movielens/100k/
ratings = pd.read_csv('ml-100k/u.data', sep='\t', names=['user_id', 'item_id', 'rating', 'timestamp'])
movies_df = pd.read_csv('ml-100k/u.item', sep='|', encoding='latin-1', names=['movie_id', 'title'], usecols=[0, 1])

# Create the user-item matrix (0 = not rated)
R_df = ratings.pivot(index='user_id', columns='item_id', values='rating').fillna(0)
R = R_df.values
observed = R > 0

# Mean over each user's *rated* items only; zeros are missing values, not ratings
user_ratings_mean = R.sum(axis=1) / observed.sum(axis=1)
# Center observed ratings; unobserved entries are imputed as 0, i.e. the user's own mean
R_demeaned = np.where(observed, R - user_ratings_mean.reshape(-1, 1), 0.0)

# Truncated SVD with k = 50 latent factors
U, sigma, Vt = svds(R_demeaned, k=50)

# Reconstruct predicted ratings and add the user means back
predicted_ratings = U @ np.diag(sigma) @ Vt + user_ratings_mean.reshape(-1, 1)

# Keep the original user and item IDs as row/column labels
preds_df = pd.DataFrame(predicted_ratings, index=R_df.index, columns=R_df.columns)

# Recommend the highest-predicted movies the user has not rated yet
def recommend_movies(predictions_df, user_id, movies_df, original_ratings_df, num_recommendations=5):
    # Sort this user's predicted ratings (a Series indexed by item_id)
    sorted_user_predictions = predictions_df.loc[user_id].sort_values(ascending=False)

    # Movies the user has already rated, with titles
    user_data = original_ratings_df[original_ratings_df.user_id == user_id]
    user_full = (user_data.merge(movies_df, how='left', left_on='item_id', right_on='movie_id')
                 .sort_values('rating', ascending=False))

    print(f'User {user_id} has already rated {user_full.shape[0]} movies.')
    print('Recommending the highest-predicted movies not already rated.')

    # Drop rated movies, attach predictions, and rank
    recommendations = movies_df[~movies_df['movie_id'].isin(user_full['movie_id'])]
    recommendations = recommendations.merge(
        sorted_user_predictions.rename('Predictions').reset_index(),
        how='left', left_on='movie_id', right_on='item_id')
    recommendations = recommendations.sort_values('Predictions', ascending=False)

    return user_full, recommendations.head(num_recommendations)

# Recommend movies for user 837
already_rated, predictions = recommend_movies(preds_df, 837, movies_df, ratings, 5)

print('Top 5 movie recommendations:')
print(predictions[['title', 'Predictions']])
```

**Explanation:**

- We load the MovieLens 100k ratings and movie titles.
- We build a user-item matrix, center each user's observed ratings on that user's mean, and fill unrated entries with 0 (the user's mean after centering).
- We compute a rank-50 truncated SVD with `svds` and reconstruct predicted ratings by adding the means back.
- We recommend the movies with the highest predicted ratings that the user has not already rated.

<a id="3"></a>
# 3. Neural Collaborative Filtering

Neural Collaborative Filtering (NCF) [[1]](#ref1) leverages neural networks to model user-item interactions, providing greater expressiveness than traditional matrix factorization.

<a id="3.1"></a>
## 3.1 Generalized Matrix Factorization (GMF)

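GMF generalizes the dot product of traditional matrix factorization. The element-wise product of the user and item embeddings is weighted by a learned vector and passed through an activation function. Plain matrix factorization is the special case with an identity activation and all-ones weights.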

### Model Architecture

- **Embedding Layers**: Map users and items to latent vectors.
- **Element-wise Multiplication**: Combine user and item embeddings.
- **Output Layer**: Predict interaction (e.g., rating).

### Mathematical Formulation

$$
\hat{y}_{ui} = h\left(\mathbf{w}^T (P_u \odot Q_i)\right)
$$

- $P_u$: User embedding.
- $Q_i$: Item embedding.
- $\odot$: Element-wise multiplication.
- $\mathbf{w}$: Learned output weight vector.
- $h$: Activation function (e.g., sigmoid).
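This quick NumPy check of the GMF formula uses randomly drawn embeddings as stand-ins for learned ones. In NCF, the element-wise product is projected by a learned weight vector $\mathbf{w}$ before the activation. With all-ones weights and an identity activation, GMF reduces to the plain MF dot product $P_u^T Q_i$, while a learned $\mathbf{w}$ lets latent dimensions contribute unequally. The helper names and the sigmoid choice are illustrative assumptions.

```python
import numpy as np

def gmf_score(p_u, q_i, w, activation):
    """GMF prediction: activation(w^T (p_u * q_i)); hypothetical helper."""
    return activation(w @ (p_u * q_i))

def identity(x):
    return x

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
p_u, q_i = rng.normal(size=8), rng.normal(size=8)  # stand-ins for learned embeddings

# All-ones weights with an identity activation recover the MF dot product
mf_equivalent = gmf_score(p_u, q_i, np.ones(8), identity)
print(np.isclose(mf_equivalent, p_u @ q_i))  # True

# A learned (here random) w reweights latent dimensions; sigmoid maps the score into (0, 1)
w = rng.normal(size=8)
y_hat = gmf_score(p_u, q_i, w, sigmoid)
print(y_hat)
```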

<a id="3.2"></a>
## 3.2 Neural Matrix Factorization (NeuMF)

NeuMF combines GMF and Multi-Layer Perceptron (MLP) to capture both linear and nonlinear interactions.

### Model Architecture

- **GMF Component**: Captures linear interactions.
- **MLP Component**: Captures nonlinear interactions.
- **Fusion Layer**: Concatenates outputs from GMF and MLP.
- **Output Layer**: Predicts interaction.

### Mathematical Formulation

$$
\hat{y}_{ui} = h\left(\mathbf{w}^T \left[\mathbf{h}_{GMF}(u,i) \oplus \mathbf{h}_{MLP}(u,i)\right]\right)
$$

- $\mathbf{h}_{GMF}$: Output from GMF.
- $\mathbf{h}_{MLP}$: Output from MLP.
- $\oplus$: Concatenation.
- $\mathbf{w}$: Learned output weight vector.
- $h$: Activation function.

<a id="3.3"></a>
## 3.3 Implementation of Neural Collaborative Filtering

We'll implement NeuMF using PyTorch. The original paper trains on implicit feedback with a log loss and negative sampling. To keep this example short, we regress explicit ratings with MSE instead, so the output layer has no activation (i.e., $h$ is the identity).

```python
# Import libraries
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader, random_split

torch.manual_seed(0)

# Dataset of (user, item, rating) triples
class RatingDataset(Dataset):
    def __init__(self, user_tensor, item_tensor, target_tensor):
        self.user_tensor = user_tensor
        self.item_tensor = item_tensor
        self.target_tensor = target_tensor

    def __getitem__(self, index):
        return self.user_tensor[index], self.item_tensor[index], self.target_tensor[index]

    def __len__(self):
        return self.user_tensor.size(0)

# NeuMF: a GMF branch and an MLP branch with separate embeddings, fused by a final linear layer
class NeuMF(nn.Module):
    def __init__(self, num_users, num_items, latent_dim_mf=8, latent_dim_mlp=8, layers=(16, 8, 4)):
        super().__init__()

        # GMF embeddings
        self.embed_user_GMF = nn.Embedding(num_users, latent_dim_mf)
        self.embed_item_GMF = nn.Embedding(num_items, latent_dim_mf)

        # MLP embeddings
        self.embed_user_MLP = nn.Embedding(num_users, latent_dim_mlp)
        self.embed_item_MLP = nn.Embedding(num_items, latent_dim_mlp)

        # MLP tower: 2 * latent_dim_mlp -> layers[0] -> ... -> layers[-1]
        self.fc_layers = nn.ModuleList()
        input_size = latent_dim_mlp * 2
        for layer_size in layers:
            self.fc_layers.append(nn.Linear(input_size, layer_size))
            input_size = layer_size

        # Final prediction layer over the concatenated GMF and MLP outputs.
        # No sigmoid: we regress explicit ratings rather than interaction probabilities.
        self.predict_layer = nn.Linear(latent_dim_mf + layers[-1], 1)

    def forward(self, user_indices, item_indices):
        # GMF branch: element-wise product of embeddings
        output_GMF = self.embed_user_GMF(user_indices) * self.embed_item_GMF(item_indices)

        # MLP branch: concatenate embeddings and pass them through ReLU layers
        output_MLP = torch.cat((self.embed_user_MLP(user_indices), self.embed_item_MLP(item_indices)), dim=-1)
        for fc in self.fc_layers:
            output_MLP = torch.relu(fc(output_MLP))

        # Fuse both branches
        concat = torch.cat((output_GMF, output_MLP), dim=-1)
        return self.predict_layer(concat).squeeze(-1)

# Prepare data (MovieLens IDs start at 1; embeddings are indexed from 0)
user_ids = torch.tensor(ratings['user_id'].values - 1, dtype=torch.long)
item_ids = torch.tensor(ratings['item_id'].values - 1, dtype=torch.long)
targets = torch.tensor(ratings['rating'].values, dtype=torch.float32)

dataset = RatingDataset(user_ids, item_ids, targets)
# Hold out 10% of the ratings for evaluation
n_test = len(dataset) // 10
train_set, test_set = random_split(dataset, [len(dataset) - n_test, n_test])
train_loader = DataLoader(train_set, batch_size=256, shuffle=True)
test_loader = DataLoader(test_set, batch_size=1024)

# Size the embeddings by the largest ID so every index is valid
num_users = int(ratings['user_id'].max())
num_items = int(ratings['item_id'].max())
model = NeuMF(num_users, num_items)

# Loss and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
num_epochs = 5
for epoch in range(num_epochs):
    model.train()
    total_loss = 0.0
    for user_ids_batch, item_ids_batch, ratings_batch in train_loader:
        optimizer.zero_grad()
        outputs = model(user_ids_batch, item_ids_batch)
        loss = criterion(outputs, ratings_batch)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()

    # Held-out RMSE
    model.eval()
    sq_err = 0.0
    with torch.no_grad():
        for u, i, r in test_loader:
            sq_err += ((model(u, i) - r) ** 2).sum().item()
    test_rmse = (sq_err / n_test) ** 0.5
    print(f'Epoch {epoch+1}/{num_epochs}, Train MSE: {total_loss / len(train_loader):.4f}, Test RMSE: {test_rmse:.4f}')
```

**Explanation:**

- **Data Preparation**: We convert user IDs, item IDs, and ratings to tensors (shifting IDs to start at 0) and hold out 10% of the ratings for testing.
- **Model Definition**: NeuMF combines a GMF branch and an MLP branch, each with its own embeddings.
- **Training**: We train the model with MSE loss and the Adam optimizer, reporting held-out RMSE after each epoch.

<a id="4"></a>
# 4. Autoencoders for Recommendation

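An autoencoder takes a user's rating vector (one entry per item, 0 for unrated) as input, compresses it into a low-dimensional code, and reconstructs the full vector. The reconstructed values for unrated items can then be read as predicted preferences.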

<a id="4.1"></a>
## 4.1 Denoising Autoencoders

Denoising autoencoders [[2]](#ref2) learn to reconstruct input data from corrupted versions, improving robustness.
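Here is a minimal sketch of one common corruption process, masking noise, in which a randomly chosen fraction `nu` of the input entries is set to zero. During training, the encoder receives the corrupted vector, but the loss compares the reconstruction with the clean input. The helper name `mask_noise` and the value of `nu` are illustrative.

```python
import numpy as np

def mask_noise(x, nu, rng):
    """Return a copy of x with a random fraction nu of its entries set to 0 (hypothetical helper)."""
    x_tilde = x.copy()
    n_drop = int(round(nu * x.size))
    drop = rng.choice(x.size, size=n_drop, replace=False)
    x_tilde.flat[drop] = 0.0
    return x_tilde

rng = np.random.default_rng(0)
x = np.array([5.0, 3.0, 4.0, 1.0, 2.0, 4.0, 5.0, 3.0, 2.0, 1.0])  # clean input
x_tilde = mask_noise(x, nu=0.3, rng=rng)  # corrupted input fed to the encoder
print((x_tilde == 0).sum())  # 3 entries dropped

# The training target is the clean x: loss = ||decode(encode(x_tilde)) - x||^2
```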

<a id="4.2"></a>
## 4.2 Variational Autoencoders (VAE)

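VAEs [[3]](#ref3) map each input to a distribution over latent variables (typically a diagonal Gaussian with mean $\mu$ and variance $\sigma^2$) instead of a single point. They are trained to maximize the evidence lower bound (ELBO): the expected reconstruction log-likelihood minus the KL divergence between this distribution and a standard normal prior. The KL term regularizes the latent space, which can help generalization on sparse user histories.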
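This NumPy sketch shows the two VAE-specific pieces, assuming a diagonal Gaussian posterior and a standard normal prior. The first is the reparameterization trick, which writes a sample as $z = \mu + \sigma \odot \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$ so that gradients can flow through $\mu$ and $\sigma$. The second is the closed-form KL term $\mathrm{KL} = -\tfrac{1}{2}\sum_j (1 + \log\sigma_j^2 - \mu_j^2 - \sigma_j^2)$. The encoder outputs `mu` and `logvar` below are made-up numbers.

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    """Sample z ~ N(mu, diag(exp(logvar))) as a deterministic function of mu, logvar and noise."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar))

rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0])       # hypothetical encoder outputs for one user
logvar = np.array([0.0, -2.0])

z = reparameterize(mu, logvar, rng)
print(z.shape)                                                      # (2,)
print(np.isclose(kl_to_standard_normal(np.zeros(2), np.zeros(2)), 0.0))  # True: posterior equals the prior
print(kl_to_standard_normal(mu, logvar))
```

The per-user training loss is then the reconstruction loss of the decoded `z` plus this KL term.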

<a id="4.3"></a>
## 4.3 Implementation of Autoencoder-based Recommender

We'll implement a simple autoencoder for collaborative filtering.

```python
# Import libraries
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Create the user-item rating matrix (0 = not rated)
num_users = int(ratings['user_id'].max())
num_items = int(ratings['item_id'].max())

interaction_matrix = np.zeros((num_users, num_items), dtype=np.float32)
interaction_matrix[ratings['user_id'].values - 1, ratings['item_id'].values - 1] = ratings['rating'].values

# Convert to tensor: one row per user
data = torch.from_numpy(interaction_matrix)

# A single-hidden-layer autoencoder over a user's rating vector
class AutoEncoder(nn.Module):
    def __init__(self, num_items, encoding_dim=32):
        super().__init__()
        self.encoder = nn.Linear(num_items, encoding_dim)
        self.decoder = nn.Linear(encoding_dim, num_items)

    def forward(self, x):
        encoded = torch.relu(self.encoder(x))
        return self.decoder(encoded)

# Instantiate model, optimizer, and a mini-batch loader over users
model = AutoEncoder(num_items)
optimizer = optim.Adam(model.parameters(), lr=0.01, weight_decay=1e-5)
loader = DataLoader(TensorDataset(data), batch_size=64, shuffle=True)

def masked_mse(outputs, targets):
    # Only observed ratings contribute; otherwise the model learns to predict 0 for unrated items
    mask = (targets > 0).float()
    return ((outputs - targets) ** 2 * mask).sum() / mask.sum()

# Training loop
num_epochs = 20
for epoch in range(num_epochs):
    model.train()
    total_loss = 0.0
    for (user_ratings,) in loader:
        optimizer.zero_grad()
        loss = masked_mse(model(user_ratings), user_ratings)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    print(f'Epoch {epoch+1}/{num_epochs}, Loss: {total_loss / len(loader):.4f}')

# Recommend unrated items with the highest reconstructed ratings
def recommend_ae(model, user_id, num_recommendations=5):
    model.eval()
    user_ratings = data[user_id - 1].unsqueeze(0)
    with torch.no_grad():
        scores = model(user_ratings)[0].numpy()
    already_rated = np.where(user_ratings[0].numpy() > 0)[0]
    scores[already_rated] = -np.inf
    recommended_items = np.argsort(-scores)[:num_recommendations]

    # Look up titles in ranked order (item index i corresponds to movie_id i + 1)
    recommended_movie_ids = recommended_items + 1
    return movies_df.set_index('movie_id').loc[recommended_movie_ids]

# Recommend movies for user 837
recommended_movies = recommend_ae(model, 837)
print('Recommended movies:')
print(recommended_movies['title'])
```

**Explanation:**

- **Data Preparation**: Build a user-item rating matrix with one row per user (0 = not rated).
- **Model Definition**: Define an autoencoder with a single ReLU encoding layer and a linear decoder.
- **Training**: Train in mini-batches to reconstruct each user's ratings, computing the MSE only on observed entries.
- **Recommendation**: Rank the user's unrated items by reconstructed rating and return their titles in that order.

<a id="5"></a>
# 5. Latest Developments

<a id="5.1"></a>
## 5.1 Graph Neural Networks for Recommendation

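Graph Neural Networks (GNNs) [[4]](#ref4) model users and items as nodes in a bipartite graph whose edges are observed interactions. They learn embeddings by propagating information along those edges, so a user's representation reflects multi-hop collaborative signals such as items liked by similar users.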

### Key Concepts

- **Graph Representation**: Users and items connected by edges representing interactions.
- **Message Passing**: Nodes aggregate information from neighbors.
- **Applications**: Social recommendation, session-based recommendation.
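Here is a minimal NumPy sketch of one round of message passing on the user-item bipartite graph. It assumes the symmetric normalization $1/\sqrt{|N_u||N_i|}$ used in NGCF [[4]](#ref4), but drops NGCF's per-layer weight matrices and nonlinearity (a simplification similar to LightGCN). Each user's new embedding is a normalized sum of the embeddings of the items it interacted with, and vice versa. The toy graph, the embeddings, and the helper name `propagate` are made up.

```python
import numpy as np

def propagate(R, user_emb, item_emb):
    """One message-passing step on the bipartite graph defined by the 0/1 matrix R (hypothetical helper).

    R: (n_users, n_items) interaction matrix, one edge per interaction.
    Each node aggregates its neighbors with weight 1 / sqrt(deg(u) * deg(i)).
    """
    deg_u = R.sum(axis=1)
    deg_i = R.sum(axis=0)
    # Clamp degrees at 1 to avoid division by zero for isolated nodes (their edges are all 0 anyway)
    scale = np.outer(1.0 / np.sqrt(np.maximum(deg_u, 1)), 1.0 / np.sqrt(np.maximum(deg_i, 1)))
    A = R * scale                 # normalized user-item adjacency
    new_user = A @ item_emb       # users aggregate messages from their items
    new_item = A.T @ user_emb     # items aggregate messages from their users
    return new_user, new_item

R = np.array([[1, 1, 0],
              [0, 1, 1]], dtype=float)   # 2 users, 3 items
rng = np.random.default_rng(0)
user_emb = rng.normal(size=(2, 4))
item_emb = rng.normal(size=(3, 4))

new_user, new_item = propagate(R, user_emb, item_emb)
print(new_user.shape, new_item.shape)  # (2, 4) (3, 4)
```

Applying `propagate` repeatedly spreads information further. After two steps, a user's embedding reflects other users who share items with them.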

<a id="5.2"></a>
## 5.2 Self-Attention Mechanisms

Self-attention, the core mechanism of the Transformer [[5]](#ref5), has been applied to sequential recommendation, where the goal is to predict a user's next interaction from their ordered history.

### SASRec: Self-Attentive Sequential Recommendation

- **Model**: Uses self-attention to model user behavior sequences.
- **Advantages**: Captures long-range dependencies without recurrence.
- **Reference**: Kang & McAuley, 2018 [[6]](#ref6)
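This minimal NumPy sketch implements single-head causal self-attention over a user's item sequence, the core operation that SASRec stacks. With the causal mask, position $t$ attends only to positions $\le t$, so each prediction uses only earlier interactions. The projection matrices are random stand-ins for learned weights. Positional embeddings, multiple blocks, and the feed-forward layers of the full model are omitted, and the helper name is hypothetical.

```python
import numpy as np

def causal_self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention where position t only attends to positions <= t.

    X: (seq_len, d) embeddings of the items in a user's history, oldest first.
    Returns (output, attention_weights).
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    # Causal mask: block attention to future positions
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Row-wise softmax (subtract the row max for numerical stability)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d = 5, 8
X = rng.normal(size=(seq_len, d))                       # stand-in item embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

out, attn = causal_self_attention(X, Wq, Wk, Wv)
print(np.allclose(np.triu(attn, k=1), 0))  # True: no attention to future items
```

In SASRec, the output at each position is scored against candidate item embeddings by dot product to predict the next item.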

<a id="6"></a>
# 6. Conclusion

Recommender systems are essential tools in the modern digital landscape. Deep learning techniques have significantly enhanced the capabilities of these systems, allowing for more accurate and personalized recommendations. From matrix factorization to neural collaborative filtering and autoencoders, we have explored various methods to build recommender systems using neural networks. Understanding the underlying mathematics and implementation details is crucial for developing effective models. The field continues to evolve with the incorporation of advanced architectures like GNNs and self-attention mechanisms.

<a id="7"></a>
# 7. References

1. <a id="ref1"></a>He, X., Liao, L., Zhang, H., Nie, L., Hu, X., & Chua, T.-S. (2017). *Neural Collaborative Filtering*. In Proceedings of the 26th International Conference on World Wide Web (WWW).
2. <a id="ref2"></a>Vincent, P., Larochelle, H., Bengio, Y., & Manzagol, P.-A. (2008). *Extracting and Composing Robust Features with Denoising Autoencoders*. In Proceedings of the 25th International Conference on Machine Learning (ICML).
3. <a id="ref3"></a>Kingma, D. P., & Welling, M. (2014). *Auto-Encoding Variational Bayes*. [arXiv:1312.6114](https://arxiv.org/abs/1312.6114)
4. <a id="ref4"></a>Wang, X., He, X., Wang, M., Feng, F., & Chua, T.-S. (2019). *Neural Graph Collaborative Filtering*. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval.
5. <a id="ref5"></a>Vaswani, A., et al. (2017). *Attention Is All You Need*. [arXiv:1706.03762](https://arxiv.org/abs/1706.03762)
6. <a id="ref6"></a>Kang, W.-C., & McAuley, J. (2018). *Self-Attentive Sequential Recommendation*. In IEEE International Conference on Data Mining (ICDM).

---

This notebook provides a comprehensive guide to building recommender systems with neural networks. You can run the code cells to see how models are implemented and experiment with different architectures and datasets.