# 📚 Book Recommender Jupyter Notebook Outline

## 🎯 What This Notebook Does

This notebook builds a **content-based book recommendation system** that takes an ISBN as input and returns 10 similar book recommendations. Here's what we accomplish:

### **📚 The Book Recommender:**
- **Input**: Book ISBN (e.g., "9780553103540" for "A Game of Thrones")
- **Output**: 10 similar books with titles, authors, genres, and similarity scores
- **Algorithm**: TF-IDF vectorization + SVD dimensionality reduction + cosine similarity
- **Training Data**: 41 popular books across Fantasy, Sci-Fi, Literary Fiction, and Romance
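
The three algorithm steps listed above can be sketched on a toy corpus. This is a minimal illustration, not the notebook's pipeline: the four sentences are made up, and `n_components=2` is chosen only because the corpus is tiny.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy corpus (not the notebook's dataset)
toy_docs = [
    "dragons knights and a war for the throne",
    "a young wizard attends a school of magic",
    "knights defend the throne from dragons",
    "a starship crew explores distant planets",
]

# 1) Text -> sparse TF-IDF matrix (rows = documents, columns = terms)
tfidf = TfidfVectorizer(stop_words="english")
tfidf_matrix = tfidf.fit_transform(toy_docs)

# 2) Compress each document to a few dense components
svd = TruncatedSVD(n_components=2, random_state=42)
dense = svd.fit_transform(tfidf_matrix)

# 3) Pairwise cosine similarity between all documents
sim = cosine_similarity(dense)

print(sim.shape)            # (4, 4)
print(round(sim[0, 2], 3))  # documents 0 and 2 share most of their terms
```

Documents 0 and 2 share the words "dragons", "knights", and "throne", so they end up pointing in nearly the same direction after the reduction.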

### **🔧 Notebook Workflow:**
1. **Wrap Algorithm in Class**: Create `BookRecommenderPipeline`, a class with a scikit-learn-style `fit`/`predict` interface
2. **Train Locally**: Call `recommender.fit(df)` to build the complete recommendation system
3. **Log as Scikit-Learn Model**: Use `frogml.scikit_learn.log_model()` to store model + serving code
4. **Enable Production**: The logged model can be built and deployed as a real-time API

### **🚀 Deployment Flow:**
```
┌─────────────────────────────────────┐
│  LOCAL ENVIRONMENT                  │
│  ✓ Train BookRecommenderPipeline    │
│  ✓ Log to JFrogML Registry          │
└─────────────────────────────────────┘
              ↓
┌─────────────────────────────────────┐
│  JFROGML PLATFORM                   │
│  ✓ Build (package as container)     │
│  ✓ Deploy (launch as API endpoint)  │
└─────────────────────────────────────┘
```

## Setup & Data Loading

In [7]:
# Import libraries and load data
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity
import re

# Load dataset
df = pd.read_csv('main/books_dataset.csv')
print(f"📚 Loaded {len(df)} books")

📚 Loaded 41 books
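
The class defined below reads six columns from this DataFrame (`isbn`, `title`, `author`, `genre`, `description`, `rating`) and joins the four text columns into one string per book. A small sketch of that step, using a made-up two-row frame with some missing values, shows why `fillna('')` matters:

```python
import pandas as pd

# Hypothetical two-row frame with the columns the pipeline reads
toy = pd.DataFrame({
    "isbn": ["111", "222"],
    "title": ["Book A", "Book B"],
    "author": ["Ann Author", None],             # missing author
    "genre": ["Fantasy", "Sci-Fi"],
    "description": [None, "Robots in space"],   # missing description
    "rating": [4.5, 4.1],
})

# Check up front that every expected column is present
required = ["isbn", "title", "author", "genre", "description", "rating"]
missing = [col for col in required if col not in toy.columns]
print("Missing columns:", missing)

# Same concatenation pattern as fit(): blanks replace missing values,
# so one absent field does not turn the whole string into NaN
content = (toy["title"].fillna("") + " " +
           toy["author"].fillna("") + " " +
           toy["genre"].fillna("") + " " +
           toy["description"].fillna(""))
print(content.tolist())
```

Without the `fillna("")` calls, adding a missing value to a string would produce a missing result for the whole row, and the vectorizer would then fail on it.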


## BookRecommenderPipeline Class

In [8]:
class BookRecommenderPipeline:
    def __init__(self):
        # All assets needed for serving will be stored here
        self.tfidf_vectorizer = None
        self.svd_model = None
        self.similarity_matrix = None
        self.book_metadata = None
        self.isbn_to_index = {}
        self.index_to_isbn = {}
        self.is_fitted = False
    
    def fit(self, book_data):
        # Create rich content features (title + author + genre + description)
        content = (book_data['title'].fillna('') + ' ' + 
                  book_data['author'].fillna('') + ' ' + 
                  book_data['genre'].fillna('') + ' ' + 
                  book_data['description'].fillna(''))
        
        # TF-IDF vectors -> SVD-reduced features -> pairwise cosine similarity
        self.tfidf_vectorizer = TfidfVectorizer(max_features=5000, stop_words='english')
        tfidf_matrix = self.tfidf_vectorizer.fit_transform(content)
        # The TF-IDF matrix has rank at most len(book_data), so components beyond that add no signal
        self.svd_model = TruncatedSVD(n_components=100, random_state=42)
        features = self.svd_model.fit_transform(tfidf_matrix)
        self.similarity_matrix = cosine_similarity(features)
        
        # Store all metadata and mappings (everything needed for serving)
        self.book_metadata = book_data[['isbn', 'title', 'author', 'genre', 'rating']].copy()
        self.isbn_to_index = {isbn: idx for idx, isbn in enumerate(book_data['isbn'])}
        self.index_to_isbn = {idx: isbn for isbn, idx in self.isbn_to_index.items()}
        self.is_fitted = True
        return self
    
    def predict(self, isbn_list, top_n=10):
        # This method contains all logic needed for serving
        recommendations = []
        for isbn in isbn_list:
            if isbn in self.isbn_to_index:
                idx = self.isbn_to_index[isbn]
                scores = self.similarity_matrix[idx]
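                # Rank by descending similarity and drop the query book explicitly,
                # since a tie (e.g. a duplicate entry) can keep it out of first place
                ranked = np.argsort(scores)[::-1]
                top_indices = [i for i in ranked if i != idx][:top_n]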
                
                for i in top_indices:
                    book = self.book_metadata.iloc[i]
                    recommendations.append({
                        'input_isbn': isbn,
                        'recommended_isbn': book['isbn'],
                        'title': book['title'],
                        'author': book['author'],
                        'genre': book['genre'],
                        'similarity_score': float(scores[i])
                    })
            else:
                # Unknown ISBN: fall back to the top-rated books.
                # The 0.8 score below is a fixed placeholder, not a computed similarity.
                popular = self.book_metadata.nlargest(top_n, 'rating')
                for _, book in popular.iterrows():
                    recommendations.append({
                        'input_isbn': isbn,
                        'recommended_isbn': book['isbn'],
                        'title': book['title'],
                        'author': book['author'],
                        'genre': book['genre'],
                        'similarity_score': 0.8
                    })
        return recommendations

print("✅ Complete pipeline with all serialized assets ready!")

✅ Complete pipeline with all serialized assets ready!


## Train & Test

In [9]:
# Define hyperparameters (consolidated in one place)
HYPERPARAMS = {
    "tfidf_max_features": 5000,
    "tfidf_stop_words": "english", 
    "svd_components": 100,
    "svd_random_state": 42,
    "similarity_metric": "cosine",
    "top_n_recommendations": 10
}

# Train the model (fit() uses the values hard-coded in the class; HYPERPARAMS mirrors them for logging)
recommender = BookRecommenderPipeline()
recommender.fit(df)

# Quick test
test_recs = recommender.predict(['9780553103540'], top_n=HYPERPARAMS["top_n_recommendations"])
print("🧪 Test recommendations:")
for rec in test_recs[:3]:  # Show first 3
    print(f"- {rec['title']} by {rec['author']}")

🧪 Test recommendations:
- The Lord of the Rings by J.R.R. Tolkien
- The Way of Kings by Brandon Sanderson
- Harry Potter and the Goblet of Fire by J.K. Rowling
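
The ranking step inside `predict` can be looked at in isolation. The sketch below uses a made-up similarity matrix and a hypothetical helper, `top_n_similar`, that excludes the query row by index instead of assuming it always sorts first.

```python
import numpy as np

# Hypothetical 4x4 similarity matrix (symmetric, ones on the diagonal)
sim = np.array([
    [1.0, 0.2, 0.9, 0.1],
    [0.2, 1.0, 0.3, 0.4],
    [0.9, 0.3, 1.0, 0.0],
    [0.1, 0.4, 0.0, 1.0],
])

def top_n_similar(sim_matrix, idx, top_n):
    """Return indices of the top_n rows most similar to idx, excluding idx itself."""
    ranked = np.argsort(sim_matrix[idx])[::-1]  # descending by score
    return [int(i) for i in ranked if i != idx][:top_n]

print(top_n_similar(sim, 0, 2))  # [2, 1]

# With a tie at 1.0 (rows 0 and 1 are duplicates), the query row is not
# guaranteed to sort first, which is why it is filtered out by index
dup = np.array([
    [1.0, 1.0, 0.3],
    [1.0, 1.0, 0.3],
    [0.3, 0.3, 1.0],
])
print(top_n_similar(dup, 0, 2))  # [1, 2]
```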


## Prepare Metadata

In [10]:
# Calculate performance metrics
avg_similarity = np.mean(recommender.similarity_matrix[np.triu_indices_from(recommender.similarity_matrix, k=1)])

# Model properties (string metadata for categorization)
MODEL_PROPERTIES = {
    "model_type": "content_based_recommender",
    "algorithm": "tfidf_cosine_similarity",
    "framework": "scikit_learn", 
    "domain": "book_recommendations"
}

# Training parameters (use consolidated hyperparameters)
MODEL_PARAMETERS = HYPERPARAMS.copy()

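# Performance metrics (calculated from trained model and data)
# Note: Only use basic JSON-serializable types (int, float, str, bool)
# Note: "sparsity_ratio" is the fraction of similarity entries above 0.5 (diagonal included), not a true sparsity measure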
MODEL_METRICS = {
    "total_books": int(len(df)),
    "total_genres": int(len(df['genre'].unique())),
    "average_similarity": float(avg_similarity),
    "similarity_matrix_rows": int(recommender.similarity_matrix.shape[0]),
    "similarity_matrix_cols": int(recommender.similarity_matrix.shape[1]),
    "average_rating": float(df['rating'].mean()),
    "rating_std": float(df['rating'].std()),
    "min_rating": float(df['rating'].min()),
    "max_rating": float(df['rating'].max()),
    "feature_dimensions": int(HYPERPARAMS["svd_components"]),
    "sparsity_ratio": float(np.sum(recommender.similarity_matrix > 0.5) / recommender.similarity_matrix.size)
}

print("✅ Metadata prepared for logging:")
print(f"📊 {len(MODEL_PARAMETERS)} parameters, {len(MODEL_METRICS)} metrics, {len(MODEL_PROPERTIES)} properties")

✅ Metadata prepared for logging:
📊 6 parameters, 11 metrics, 4 properties
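
The `average_similarity` metric relies on `np.triu_indices_from(..., k=1)` to average each pair of books once and to leave out the diagonal of self-similarities. A small sketch with a made-up 3x3 matrix shows the difference this makes:

```python
import numpy as np

# Hypothetical similarity matrix for three books
sim = np.array([
    [1.0, 0.2, 0.6],
    [0.2, 1.0, 0.4],
    [0.6, 0.4, 1.0],
])

# k=1 selects entries strictly above the diagonal: (0,1), (0,2), (1,2)
rows, cols = np.triu_indices_from(sim, k=1)
pair_scores = sim[rows, cols]

avg_pairwise = float(pair_scores.mean())  # each pair counted once, no diagonal
avg_naive = float(sim.mean())             # inflated by the 1.0 self-similarities

print(pair_scores)
print(round(avg_pairwise, 3), round(avg_naive, 3))
```

Here the pairwise average is about 0.4, while the naive mean over the full matrix is about 0.6 because the diagonal contributes three entries of 1.0.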


## Log to JFrogML Model Registry (Artifactory)

In [11]:
import frogml

# Log model to JFrog ML Registry
frogml.scikit_learn.log_model(
    model=recommender,
    model_name = "book_recommender",
    repository = "ml-prod",
    version = "",  # Left empty, so the SDK uses the current datetime as the version (see log output)
    properties = MODEL_PROPERTIES,
    parameters = MODEL_PARAMETERS,
    metrics = MODEL_METRICS,
    dependencies = ["main/conda.yml"],
    code_dir = "serving_code",
    predict_file = "serving_code/predict.py"
)

print("✅ Model logged to registry!")
print("🚀 Ready for build and deployment in the JFrogML UI!")

INFO:frogml.sdk.model_version.utils.model_log_config:No version provided; using current datetime as the version
INFO:ScikitLearnModelVersionManager:Logging model book_recommender to ml-prod
INFO:JmlCustomerClient:Customer exists in JML.
INFO:JmlCustomerClient:Getting project key for repository ml-prod
INFO:frogml.sdk.model_version.utils.files_tools:Code directory, predict file and dependencies are provided. Setup template files for model_name book_recommender
/private/var/folders/mt/wvz9xr_s7k3cwk3r0b96hyn00000gn/T/tmpchee4h7p/book_recommender.joblib: 100%|██████████| 151k/151k [00:00<00:00, 2.09GB/s]
main/conda.yml: 100%|██████████| 165/165 [00:00<00:00, 2.90MB/s]
/var/folders/mt/wvz9xr_s7k3cwk3r0b96hyn00000gn/T/tmpchee4h7p/code.zip: 100%|██████████| 2.52k/2.52k [00:00<00:00, 3.20kB/s]

2025-10-08 19:09:45,586 - INFO - frogml.storage.logging._log_config.frog_ml.__upload_model:540 - Model: "book_recommender", version: "2025-10-08-16-09-41-901" has been uploaded successfully





✅ Model logged to registry!
🚀 Ready for build and deployment in the JFrogML UI!



## 🚀 After Running This Notebook

### **What Gets Logged to JFrog ML Registry:**
- ✅ **Complete BookRecommenderPipeline Object**: Pre-computed similarity matrix, TF-IDF vectorizer, SVD model, book metadata, ISBN mappings
- ✅ **Production Serving Code**: `serving_code/predict.py` that handles ISBN input → 10 recommendations output
- ✅ **Training Configuration**: TF-IDF parameters, SVD components, similarity metrics
- ✅ **Model Performance**: Average similarity scores, dataset statistics
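
The upload log above shows the model stored as a `book_recommender.joblib` file. As a rough illustration of what that kind of round trip preserves, the sketch below saves and reloads pre-computed state with `joblib`. The dictionary is a made-up stand-in for the fitted pipeline, and how the platform loads the artifact at serving time is not shown here.

```python
import os
import tempfile

import joblib
import numpy as np

# Hypothetical stand-in for the fitted pipeline's state
state = {
    "similarity_matrix": np.array([[1.0, 0.7], [0.7, 1.0]]),
    "isbn_to_index": {"111": 0, "222": 1},
}

# Write to a temporary file, then load it back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_recommender.joblib")
    joblib.dump(state, path)
    restored = joblib.load(path)

# The lookup tables and the matrix survive unchanged, so no retraining is needed
idx = restored["isbn_to_index"]["222"]
print(restored["similarity_matrix"][idx])
```

One general caveat: `joblib`, like `pickle`, stores a reference to a custom class and not its source code, so any environment that loads a pickled `BookRecommenderPipeline` must be able to import the class definition.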

### **Next Steps - Production Deployment:**

#### **1. Build Container (JFrogML UI)**
- Navigate to **JFrog UI → AI/ML → Models → "book_recommender"**
- Click **"Build Version"** to package the BookRecommenderPipeline into a deployable container

#### **2. Deploy API (JFrogML UI)**
- Click **"Deploy → Real-time"** to launch the model as an auto-scaling API endpoint
- API accepts: `POST /predict` with `{"isbn": "9780553103540"}`
- API returns: JSON with 10 book recommendations
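
A request matching the shape described above could be built with the standard library as follows. This is only a sketch: `ENDPOINT_URL` is a placeholder, and authentication headers and the exact response schema depend on the deployment and are not covered here.

```python
import json
import urllib.request

# Hypothetical endpoint URL: replace with the one shown for your deployment
ENDPOINT_URL = "https://example.invalid/predict"

payload = {"isbn": "9780553103540"}  # "A Game of Thrones", as used in the notebook

request = urllib.request.Request(
    ENDPOINT_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

print(request.get_method(), request.full_url)

# To actually send it (needs a live endpoint and, most likely, credentials):
# with urllib.request.urlopen(request) as response:
#     recommendations = json.loads(response.read())
```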

#### **3. Test Production API**
```bash
python test_live_endpoint.py
# Sends "A Game of Thrones" ISBN → Gets fantasy book recommendations
```