## Generative Recommendation with Multi-Modal Data Integration

#### Objective
- Generate personalized product recommendations using generative models.
- Integrate multi-modal data (product descriptions and images) to understand product features better.
- Use multi-modal embeddings to represent text and image data jointly, enabling more context-aware suggestions.

#### Dataset
We will use a dataset containing:

- User-item interactions: Clicks, purchases, and ratings.
- Product descriptions: Textual data describing each product.
- Product images: Visual content associated with products.

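The three data sources above can be held in simple records before any embedding work starts. The sketch below is only one possible layout: the class names, field names, and sample IDs (`Product`, `Interaction`, `p1`, `u1`, and so on) are hypothetical and do not come from a specific dataset.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Product:
    product_id: str
    description: str   # free-text product description
    image_path: str    # path to the product image file

@dataclass
class Interaction:
    user_id: str
    product_id: str
    event: str                      # "click", "purchase", or "rating"
    rating: Optional[float] = None  # only set for rating events

# Hypothetical sample records, purely for illustration.
products = [
    Product("p1", "Elegant watch", "watch.jpg"),
    Product("p2", "Stylish sunglasses", "sunglasses.jpg"),
]
interactions = [
    Interaction("u1", "p1", "purchase"),
    Interaction("u1", "p2", "rating", rating=4.0),
]

# Group the products each user has interacted with.
items_by_user = {}
for it in interactions:
    items_by_user.setdefault(it.user_id, []).append(it.product_id)
```

Keeping interactions separate from product content makes it straightforward to embed each product once and reuse the embedding for every user who interacted with it.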
#### Required Libraries
- Python
- TensorFlow or PyTorch (the code below uses PyTorch with torchvision)
- Hugging Face Transformers (for text embeddings)
- OpenCV or Pillow (for image processing; the code below uses Pillow)
- Pre-trained models (e.g., BERT, ResNet)

#### Data Preprocessing and Embedding Creation

**Product Descriptions (Text)**

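Tokenize each product description and pass it through a pre-trained BERT model. Mean-pooling the final hidden states gives one fixed-length text embedding per description (768 dimensions for `bert-base-uncased`).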

In [None]:
from transformers import BertTokenizer, BertModel
import torch

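# Load pre-trained BERT model and tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# Named bert_model so that a model loaded later cannot overwrite it
bert_model = BertModel.from_pretrained('bert-base-uncased')
bert_model.eval()  # disable dropout so embeddings are deterministic

def get_text_embedding(text):
    """Return a (1, 768) mean-pooled BERT embedding for one description."""
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = bert_model(**inputs)
    # Average the token vectors of the final layer
    return outputs.last_hidden_state.mean(dim=1)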

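Taking the plain mean over `last_hidden_state` works for a single description, but if several descriptions of different lengths were tokenized together, the padded positions would be averaged in as well. A minimal sketch of mask-aware pooling follows. `masked_mean_pool` is a hypothetical helper, not part of Transformers, and the toy tensors stand in for real BERT output.

```python
import torch

def masked_mean_pool(last_hidden_state, attention_mask):
    """Average token vectors, ignoring padded positions.

    last_hidden_state: (batch, seq_len, hidden)
    attention_mask:    (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1.0)  # avoid division by zero
    return summed / counts

# Toy tensors: 2 sequences, 3 tokens each, hidden size 2.
hidden = torch.tensor([[[1.0, 1.0], [3.0, 3.0], [100.0, 100.0]],
                       [[2.0, 2.0], [4.0, 4.0], [6.0, 6.0]]])
mask = torch.tensor([[1, 1, 0],   # last token of the first sequence is padding
                     [1, 1, 1]])
pooled = masked_mean_pool(hidden, mask)
```

With a real batch, the two arguments would be `outputs.last_hidden_state` and `inputs["attention_mask"]`.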

**Product Images (Visual Data)**

Generate embeddings for product images using a pre-trained ResNet model.

In [None]:
from torchvision import models, transforms
from PIL import Image
import torch

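# Load pre-trained ResNet-18 (weights= replaces the deprecated pretrained=True)
resnet_model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
# Drop the classification head so the network returns 512-d features
# instead of 1000 ImageNet class logits
resnet_model.fc = torch.nn.Identity()
resnet_model.eval()

# Standard ImageNet preprocessing, built once and reused for every image
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def get_image_embedding(image_path):
    # Convert to RGB so grayscale or RGBA files also yield 3 channels
    img = Image.open(image_path).convert("RGB")
    img_tensor = preprocess(img).unsqueeze(0)  # add a batch dimension
    with torch.no_grad():
        embedding = resnet_model(img_tensor)
    return embedding  # shape (1, 512)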


**Combining Text and Image Data**

Create a unified representation by concatenating the embeddings from text and image data.

In [None]:
import torch

def combine_embeddings(text_embeddings, image_embeddings):
    return [torch.cat((text_embed, img_embed), dim=1) 
            for text_embed, img_embed in zip(text_embeddings, image_embeddings)]

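Text and image embeddings come from different models and can differ considerably in scale, so one modality may dominate the concatenated vector. One common option, shown here as a sketch rather than as part of the original pipeline, is to L2-normalize each embedding before concatenating. `combine_normalized` is a hypothetical helper and the small tensors are made-up stand-ins for real embeddings.

```python
import torch
import torch.nn.functional as F

def combine_normalized(text_embed, img_embed):
    """L2-normalize each modality, then concatenate along the feature axis."""
    text_unit = F.normalize(text_embed, p=2, dim=1)
    img_unit = F.normalize(img_embed, p=2, dim=1)
    return torch.cat((text_unit, img_unit), dim=1)

text_embed = torch.tensor([[3.0, 4.0]])        # L2 norm 5
img_embed = torch.tensor([[0.0, 10.0, 0.0]])   # L2 norm 10
combined = combine_normalized(text_embed, img_embed)
```

After normalization both halves of the combined vector have unit length, so neither modality outweighs the other purely because of its magnitude.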

#### Generating Personalized Recommendations

**Using VAEs for Recommendations**

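Train a VAE on the combined item embeddings so that it learns a compact latent representation of each product. That latent space can then serve as the basis for modeling user preferences and generating recommendations.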

In [None]:
import torch
from torch import nn

class VAE(nn.Module):
    def __init__(self, input_dim, latent_dim):
        super(VAE, self).__init__()
        # Encoder
        self.fc1 = nn.Linear(input_dim, 512)
        self.fc2_mu = nn.Linear(512, latent_dim)
        self.fc2_logvar = nn.Linear(512, latent_dim)
        # Decoder
        self.fc3 = nn.Linear(latent_dim, 512)
        self.fc4 = nn.Linear(512, input_dim)

    def encode(self, x):
        h1 = torch.relu(self.fc1(x))
        return self.fc2_mu(h1), self.fc2_logvar(h1)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std

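    def decode(self, z):
        h3 = torch.relu(self.fc3(z))
        # Linear output: the embeddings are real-valued, not confined to [0, 1]
        return self.fc4(h3)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar

def loss_function(recon_x, x, mu, logvar):
    # Reconstruction term: squared error, because binary cross-entropy
    # requires targets in [0, 1] and the embeddings are unbounded
    recon_loss = nn.functional.mse_loss(recon_x, x, reduction='sum')
    # KL divergence between N(mu, sigma^2) and the standard normal prior
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + KLD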

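The `KLD` line in `loss_function` is the closed-form KL divergence between the encoder's diagonal Gaussian and a standard normal prior. Writing $\log\sigma_j^2$ for the $j$-th entry of `logvar` and $d$ for the latent dimension (notation introduced here for illustration), the term for one input is:

```latex
D_{\mathrm{KL}}\!\left(\mathcal{N}\!\left(\mu, \operatorname{diag}(\sigma^2)\right) \,\middle\|\, \mathcal{N}(0, I)\right)
  = -\frac{1}{2} \sum_{j=1}^{d} \left(1 + \log\sigma_j^2 - \mu_j^2 - \sigma_j^2\right)
```

The term is zero when $\mu = 0$ and $\sigma^2 = 1$, and it grows as the encoder's distribution moves away from the prior. The code sums this quantity over every input in the batch.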
#### Evaluation and Testing

**Performance Metrics**

Evaluate recommendations using metrics like Precision@k, Recall@k, and Normalized Discounted Cumulative Gain (NDCG).

In [None]:
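import math

def precision_at_k(recommended, relevant, k):
    # Fraction of the top-k recommendations that are relevant
    recommended_at_k = recommended[:k]
    return len(set(recommended_at_k) & set(relevant)) / k

def recall_at_k(recommended, relevant, k):
    # Fraction of the relevant items that appear in the top k
    if not relevant:  # no ground truth: define recall as 0
        return 0.0
    recommended_at_k = recommended[:k]
    return len(set(recommended_at_k) & set(relevant)) / len(relevant)

def ndcg_at_k(recommended, relevant, k):
    relevant = set(relevant)
    # DCG with binary relevance: a hit at 0-based rank i contributes 1 / log2(i + 2)
    dcg = sum(1 / math.log2(i + 2) for i, r in enumerate(recommended[:k]) if r in relevant)
    # Ideal DCG: all relevant items ranked first
    idcg = sum(1 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0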

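The functions above score a single user's list. To report one number for a whole test set, the per-user scores are usually averaged. The sketch below assumes recommendations and ground truth are stored as dictionaries keyed by user ID. `mean_metric` and the sample IDs are hypothetical, and `precision_at_k` is repeated only so that the block runs on its own.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    return len(set(recommended[:k]) & set(relevant)) / k

def mean_metric(metric, recommendations, ground_truth, k):
    """Average a per-user metric over all users that have ground truth."""
    scores = [
        metric(recommendations[user], ground_truth[user], k)
        for user in ground_truth
        if user in recommendations
    ]
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical user IDs and item IDs, purely for illustration.
recommendations = {
    "user_a": ["watch", "sunglasses", "belt"],
    "user_b": ["belt", "wallet", "watch"],
}
ground_truth = {
    "user_a": ["watch", "belt"],
    "user_b": ["sunglasses"],
}
mean_p2 = mean_metric(precision_at_k, recommendations, ground_truth, k=2)
```

Here `user_a` has one hit in the top two (precision 0.5) and `user_b` has none (precision 0.0), so the mean Precision@2 is 0.25.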

#### Main function

In [None]:
if __name__ == "__main__":
    # Load data and preprocess
    product_descriptions = ["Elegant watch", "Stylish sunglasses"]
    product_images = ["watch.jpg", "sunglasses.jpg"]
    
    text_embeddings = [get_text_embedding(desc) for desc in product_descriptions]
    image_embeddings = [get_image_embedding(img) for img in product_images]
    
    combined_embeddings = combine_embeddings(text_embeddings, image_embeddings)
    
    # Train VAE
    vae = VAE(input_dim=combined_embeddings[0].shape[1], latent_dim=32)
    optimizer = torch.optim.Adam(vae.parameters(), lr=0.001)
    for epoch in range(10):  # Simplified training loop
        for data in combined_embeddings:
            recon, mu, logvar = vae(data)
            loss = loss_function(recon, data, mu, logvar)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    
    # Evaluate the metrics on placeholder lists (illustrative values,
    # not output produced by the VAE trained above)
    recommended = ["watch", "sunglasses"]
    relevant = ["watch"]
    print(f"Precision@1: {precision_at_k(recommended, relevant, 1)}")
    print(f"Recall@1: {recall_at_k(recommended, relevant, 1)}")
    print(f"NDCG@1: {ndcg_at_k(recommended, relevant, 1)}")

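The main function evaluates hard-coded lists and does not show how latent vectors could be turned into a ranked list. One possible approach, sketched here as an assumption rather than as the notebook's method, is to represent a user by the mean latent vector of the items they interacted with and rank all items by cosine similarity to it. `top_k_similar` is a hypothetical helper, and the 2-D vectors are made up. In practice they could be the `mu` outputs of `vae.encode(...)` for each product.

```python
import torch
import torch.nn.functional as F

def top_k_similar(query_vec, item_vecs, item_ids, k):
    """Return the k item IDs whose vectors are most cosine-similar to the query."""
    sims = F.cosine_similarity(query_vec.unsqueeze(0), item_vecs, dim=1)
    top = torch.topk(sims, k=min(k, len(item_ids))).indices
    return [item_ids[i] for i in top.tolist()]

# Made-up latent vectors for three products.
item_ids = ["watch", "sunglasses", "belt"]
item_vecs = torch.tensor([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])

# A user who interacted with "watch" only: mean of that single latent vector.
user_vec = item_vecs[[0]].mean(dim=0)
ranked = top_k_similar(user_vec, item_vecs, item_ids, k=2)
```

This sketch does not filter out items the user has already seen. A real recommender would typically remove them before taking the top k.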