- **NumPy** is the main library Python uses for heavy-duty math.

- Google Colab just updated its servers to use the brand-new **NumPy 2.0**.

- Our recommender library, **surprise**, was built and compiled using the older **NumPy 1.x**.

- When **surprise** tries to talk to NumPy 2.0 using the "language" of NumPy 1.x, the new version does not understand it, so the import crashes.

- **Solution:** We just need to tell Colab to install the older, stable version of NumPy that **surprise** knows how to talk to.

In [None]:
# Cell 0: Fix NumPy Incompatibility
# We are forcing Colab to use an older version of NumPy that works with 'surprise'
!pip install "numpy<2.0"

Collecting numpy<2.0
  Downloading numpy-1.26.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
Downloading numpy-1.26.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.0 MB)
Installing collected packages: numpy
  Attempting uninstall: numpy
    Found existing installation: numpy 2.0.2
    Uninstalling numpy-2.0.2:
      Successfully uninstalled numpy-2.0.2
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
opencv-python 4.12.0.88 requires numpy<2.3.0,>=2; py
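
A quick sanity check can confirm which NumPy the notebook is actually using. This is only a sketch: the helper name `is_numpy_1x` is made up for illustration, and it assumes the Colab runtime has been restarted after the install, since a NumPy that was already imported stays in memory until then.

```python
import numpy as np


def is_numpy_1x(version: str) -> bool:
    """Return True when a version string such as '1.26.4' has major version 1."""
    return int(version.split(".")[0]) == 1


# Report the version that is actually loaded in this session
print(f"NumPy version in use: {np.__version__}")
print(f"Compatible with a NumPy 1.x build of surprise: {is_numpy_1x(np.__version__)}")
```

If the second line prints `False`, the downgrade has not taken effect in the running session yet.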

## Step 0: Setup & Installations

First, we need to set up our environment. We'll be using pandas to manage our data and a special library called **surprise** which, as the name suggests, makes building recommender systems surprisingly easy!

In [None]:
# Cell 1: Install the 'surprise' library
# This is the main library we'll use for our first model.
!pip install surprise

Collecting surprise
  Downloading surprise-0.1-py2.py3-none-any.whl.metadata (327 bytes)
Collecting scikit-surprise (from surprise)
  Downloading scikit_surprise-1.1.4.tar.gz (154 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Downloading surprise-0.1-py2.py3-none-any.whl (1.8 kB)
Building wheels for collected packages: scikit-surprise
  Building wheel for scikit-surprise (pyproject.toml) ... done
  Created wheel for scikit-surprise: filename=scikit_surprise-1.1.4-cp312-cp312-linux_x86_64.whl size=2544603 sha256=d517682c931ef32fc095ba28999e798105a40f15260842e294412c69654a6893
  Stored in directory: /root/.cache

In [None]:
# Cell 2: Import all our tools
import pandas as pd
import numpy as np

# --- For Model 1: Collaborative Filtering ---
from surprise import Reader, Dataset, SVD
from surprise.model_selection import train_test_split
from surprise import accuracy

# --- For Model 2: Content-Based Filtering ---
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

print("All libraries imported successfully!")

All libraries imported successfully!


## Step 1: Load Our Data

Our data is split into many files. We need to load them and merge them together to get the information we need. We don't need all 9 files right away, just the key ones.

In [None]:
# Cell 3: Load all the necessary CSV files
# These paths assume Google Drive is already mounted at /content/drive
# and that the Olist CSV files are stored in the folder below.
DATA_DIR = '/content/drive/MyDrive/Olist E-commerce Recommender System'

customers = pd.read_csv(f'{DATA_DIR}/olist_customers_dataset.csv')
orders = pd.read_csv(f'{DATA_DIR}/olist_orders_dataset.csv')
order_items = pd.read_csv(f'{DATA_DIR}/olist_order_items_dataset.csv')
order_reviews = pd.read_csv(f'{DATA_DIR}/olist_order_reviews_dataset.csv')
products = pd.read_csv(f'{DATA_DIR}/olist_products_dataset.csv')
translations = pd.read_csv(f'{DATA_DIR}/product_category_name_translation.csv')

print("All 6 key CSV files loaded successfully.")
print("Files ready for merging.")

All 6 key CSV files loaded successfully.
Files ready for merging.


## Model 1: Collaborative Filtering (SVD)

> SVD stands for **Singular Value Decomposition** — it’s a mathematical way to **break down a large matrix into smaller, simpler parts** so we can understand or work with it more easily.


**The Idea**: We're going to find users who have similar "tastes." To do this, we need to create one big table that links a **user** to a **product** and the **rating** they gave it.

## Step 2: Prepare Data for Model 1

We need to join our tables to get these three columns:

- **User**: customer_unique_id (from customers)

- **Item**: product_id (from order_items)

- **Rating**: review_score (from order_reviews)


1. **SVD** - A method to break data into simpler parts

2. **Used for** - Dimensionality reduction, pattern discovery, noise removal, recommendations

3. **Intuition** - Find hidden relationships in data

4. **Example** - Netflix movie recommendations, LSA for text, PCA internally uses SVD

In [None]:
# Cell 4: Merge the datasets
print("Merging dataframes to get user-item-rating format...")

# 1. Link orders to customers to get the unique user ID
merged_orders = orders.merge(customers, on='customer_id')

# 2. Link order_items to reviews to get product IDs and scores
merged_order_items = order_items.merge(order_reviews, on='order_id')

# 3. Now, merge the two big dataframes to link users to their product reviews
df_model_1 = merged_orders.merge(merged_order_items, on='order_id')

# 4. We only need the 3 key columns for this model
ratings_df = df_model_1[['customer_unique_id', 'product_id', 'review_score']]

# 5. Clean up: drop any rows with missing values
ratings_df = ratings_df.dropna()

print("Data merging complete! Here's a sample of our ratings data:")
print(ratings_df.head())

Merging dataframes to get user-item-rating format...
Data merging complete! Here's a sample of our ratings data:
                 customer_unique_id                        product_id  \
0  7c396fd4830fd04220f754e42b4e5bff  87285b34884572647811a353c7ac498a   
1  af07308b275d755c9edb36a90c618231  595fac2a385ac33a80bd5114aec74eb8   
2  3a653a41f6f9fc3d2a113cf8398680e8  aa4383b373c6aca5d8797843e5594415   
3  7c142cf63193a1473d2e66489a9ae977  d0b61bfb1de832b15ba9d266ca96e5b0   
4  72632f0f9dd73dfee390c9b22eb56dd6  65266b2da20d04dbe00c5c2d3bb7859e   

   review_score  
0             4  
1             4  
2             5  
3             5  
4             5  
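
One caveat of this merge is worth knowing: `review_score` is recorded per order, so every item in an order inherits that order's score, and an order containing the same product several times yields repeated (user, product) rows. Below is a minimal sketch of one way to collapse such repeats. The toy rows and IDs are made up, and averaging the scores is an assumption for illustration, not something the notebook does.

```python
import pandas as pd

# Hypothetical toy data: user u1 appears twice for product p1
toy_ratings = pd.DataFrame({
    "customer_unique_id": ["u1", "u1", "u1", "u2"],
    "product_id": ["p1", "p1", "p2", "p1"],
    "review_score": [4, 2, 5, 3],
})

# Collapse repeated (user, product) pairs into one row with the mean score
deduped = (
    toy_ratings
    .groupby(["customer_unique_id", "product_id"], as_index=False)["review_score"]
    .mean()
)

print(deduped)
```

Without this step, copies of the same (user, product) pair can land on both sides of the train/test split, which can make the test score look better than it should.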


## Step 3: Train the SVD Model

Now we use the surprise library. We'll use its algorithm called SVD (Singular Value Decomposition).

**Simple Explanation:** Imagine a giant, mostly empty spreadsheet where every row is a user and every column is a product. The model "compresses" this sheet into two smaller ones. One sheet describes users by their "hidden preferences" (e.g., "likes modern tech," "is budget-conscious") and the other describes products by their "hidden attributes" (e.g., "is high-end," "is practical"). This is called Matrix Factorization. Strictly speaking, surprise's `SVD` is not a textbook SVD: it learns the two smaller sheets (plus per-user and per-item bias terms) by stochastic gradient descent over the known ratings only, so the empty cells never have to be filled in.
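
To make the matrix factorization idea concrete, here is a toy sketch in plain NumPy. It is deliberately simplified (no bias terms, made-up ratings, hand-picked learning rate and regularization) and is not how surprise is implemented internally. It only shows user and item factor vectors being nudged by gradient steps until their dot products approach the known ratings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical (user, item, rating) triples for a 3-user x 3-item example
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
n_users, n_items, n_factors = 3, 3, 2

P = rng.normal(0, 0.1, (n_users, n_factors))  # user "hidden preference" vectors
Q = rng.normal(0, 0.1, (n_items, n_factors))  # item "hidden attribute" vectors
mu = np.mean([r for _, _, r in ratings])      # global mean rating


def train_rmse():
    """RMSE of the current factors on the known ratings."""
    sq_errors = [(r - (mu + P[u] @ Q[i])) ** 2 for u, i, r in ratings]
    return float(np.sqrt(np.mean(sq_errors)))


rmse_before = train_rmse()

learning_rate, reg = 0.05, 0.02
for _ in range(200):                          # epochs
    for u, i, r in ratings:
        err = r - (mu + P[u] @ Q[i])          # prediction error for this rating
        p_old = P[u].copy()
        P[u] += learning_rate * (err * Q[i] - reg * P[u])
        Q[i] += learning_rate * (err * p_old - reg * Q[i])

rmse_after = train_rmse()
print(f"Training RMSE before: {rmse_before:.3f}, after: {rmse_after:.3f}")
```

In this picture, `n_factors` plays the same role as the `n_factors=100` argument passed to `SVD` in the next cells.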

In [None]:
# Cell 5: Load the data into the 'surprise' library format
# The Reader object tells 'surprise' what our rating scale is (1 to 5 stars)
# load_from_df expects the columns in this order: user, item, rating
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(ratings_df, reader)

print("Data loaded into Surprise format.")

Data loaded into Surprise format.


In [None]:
# Cell 6: Train the SVD model
# We'll split our data: 80% to train the model, 20% to test how good it is.
trainset, testset = train_test_split(data, test_size=0.2, random_state=42)

# 'n_factors=100' means we are asking it to find 100 "hidden preference" types
model_svd = SVD(n_factors=100, n_epochs=20, random_state=42)

print("Training the SVD model... (This might take a minute or two)")
model_svd.fit(trainset)
print("Training complete!")

Training the SVD model... (This might take a minute or two)
Training complete!


## Step 4: Test Model 1

How do we know if it's any good? We use the test data we set aside. We'll calculate the RMSE (Root Mean Squared Error).

**Simple Explanation:** RMSE measures how far our model's predicted star ratings are from the actual star ratings. It squares each error, averages the squares, and takes the square root, so large misses count more than small ones. A lower number is better! An RMSE of 1.0 means a typical prediction is off by roughly 1 star.
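
As a small worked example with made-up numbers, the sketch below computes RMSE by hand next to the plain average error (MAE), to show how a single large miss pulls RMSE up.

```python
import math

# Hypothetical actual vs. predicted star ratings
toy_actual = [5, 4, 1, 3]
toy_predicted = [4.0, 4.0, 3.0, 3.0]

errors = [p - a for p, a in zip(toy_predicted, toy_actual)]

# Mean absolute error: the plain average of the misses
toy_mae = sum(abs(e) for e in errors) / len(errors)

# Root mean squared error: square, average, then take the square root
toy_rmse = math.sqrt(sum(e * e for e in errors) / len(errors))

print(f"MAE:  {toy_mae:.3f}")
print(f"RMSE: {toy_rmse:.3f}")
```

Here the one 2-star miss is enough to push the RMSE above the MAE.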

In [None]:
# Cell 7: Evaluate the SVD Model
print("Evaluating model...")
predictions = model_svd.test(testset)

# Calculate the RMSE (accuracy.rmse prints the score itself and also returns it)
rmse = accuracy.rmse(predictions)
print(f"Test Set RMSE: {rmse}")

Evaluating model...
RMSE: 1.2681
Test Set RMSE: 1.2680982230386


## Model 2: Content-Based Filtering (NLP)

**The Idea:** What if a product is brand new and has no ratings? Model 1 has nothing to learn from, so it cannot say anything specific about that product. This model solves that! It reads the content of a product (in our case, its category) and finds other products with similar content.

## Step 5: Prepare Data for Model 2

We'll use the products file and the translations file to get the English category names for each product.


In [None]:
# Cell 8: Load and prepare data for content-based model
print("Loading product and translation data...")

# We only need the product_id and its category
products_df = products[['product_id', 'product_category_name']]

# Merge with translations to get English names
products_with_names = products_df.merge(translations, on='product_category_name', how='left')

# Clean up: Fill missing categories with an empty string
products_with_names['product_category_name_english'] = products_with_names['product_category_name_english'].fillna('')

# Note: the Olist dataset does not include product names, so we identify
# each product by its product_id and category only.

print("Product data for Model 2 is ready:")
print(products_with_names.head())

Loading product and translation data...
Product data for Model 2 is ready:
                         product_id  product_category_name  \
0  1e9e8ef04dbcff4541ed26657ea517e5             perfumaria   
1  3aa071139cb16b67ca9e5dea641aaa2f                  artes   
2  96bd76ec8810374ed1b65e291975717f          esporte_lazer   
3  cef67bcfe19066a932b7673e239eb23d                  bebes   
4  9dc1a7de274444849c219cff195d0b71  utilidades_domesticas   

  product_category_name_english  
0                     perfumery  
1                           art  
2                sports_leisure  
3                          baby  
4                    housewares  


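## Step 6: Vectorize the Content (TF-IDF)

How does a computer read "home_decor" or "sports_leisure"? It can't. We have to turn the words into numbers. We'll use TF-IDF (Term Frequency-Inverse Document Frequency).

**Simple Explanation:** TF-IDF is a clever way to score how "important" a word is to a document. Here, each product's category name is one document. A word gets a high score when it appears in that document but is rare across all the other documents. This helps find unique, defining words. One caveat: the vectorizer's default tokenizer treats the underscore as part of a word, so "sports_leisure" stays a single token. Each category therefore becomes one feature, which is consistent with the 71 columns in the matrix below.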

In [None]:
# Cell 9: Create the TF-IDF Matrix
# 1. Initialize the TF-IDF Vectorizer
# 'stop_words='english'' tells it to ignore common words like 'and', 'the', 'is'
tfidf = TfidfVectorizer(stop_words='english')

# 2. Fit and transform the category names into a matrix of numbers
print("Fitting TF-IDF Vectorizer...")
tfidf_matrix = tfidf.fit_transform(products_with_names['product_category_name_english'])

print("TF-IDF Matrix created:")
print(tfidf_matrix.shape)

Fitting TF-IDF Vectorizer...
TF-IDF Matrix created:
(32951, 71)


## Step 7: Calculate Similarity (Cosine Similarity)

Now that all our products are represented as number vectors, we can compare them. We'll use Cosine Similarity.

**Simple Explanation:** This measures the "angle" between two product vectors. If two products are very similar (e.g., both "baby_toys"), the angle between them is tiny (score near 1). If they are very different (e.g., "baby_toys" and "car_parts"), the angle is large (score near 0). Because each product's text here is a single category token, products in the same category score 1 and products in different categories score 0.
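
The formula itself is short: the dot product of the two vectors divided by the product of their lengths. Here is a hand-rolled sketch with made-up two-dimensional vectors. The helper name `cosine` is just for illustration; the notebook itself uses scikit-learn's `cosine_similarity`.

```python
import numpy as np


def cosine(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.0 if denom == 0 else float(a @ b / denom)


same_direction = cosine([1, 0], [1, 0])   # identical vectors
right_angle = cosine([1, 0], [0, 1])      # nothing in common
in_between = cosine([1, 1], [1, 0])       # 45 degrees apart

print(same_direction, right_angle, round(in_between, 4))
```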

In [None]:
# Cell 10: Compute the Cosine Similarity Matrix
print("Calculating Cosine Similarity matrix...")
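# This creates a giant matrix where cell (i, j) is the similarity score between product i and product j
# Note: the result is a dense 32951 x 32951 float64 array (roughly 8.7 GB), so it needs plenty of RAM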
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)

print(f"Cosine similarity matrix shape: {cosine_sim.shape}")
print("Model 2 built successfully!")

Calculating Cosine Similarity matrix...
Cosine similarity matrix shape: (32951, 32951)
Model 2 built successfully!
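
If the full matrix does not fit in memory, one alternative is to compute a single row of similarities only when it is needed. This is a sketch on made-up data, not what the notebook does. The helper name `similarity_row` is hypothetical, and it relies on `cosine_similarity` accepting a one-row slice of the sparse TF-IDF matrix.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical category strings standing in for the product table
toy_categories = ["housewares", "baby", "housewares", "perfumery", "housewares"]
toy_matrix = TfidfVectorizer().fit_transform(toy_categories)


def similarity_row(matrix, idx):
    """Similarity of row `idx` against every row, as a 1-D array."""
    return cosine_similarity(matrix[idx:idx + 1], matrix).ravel()


toy_row = similarity_row(toy_matrix, 0)
print(toy_row)
```

This trades a little computation per lookup for a much smaller memory footprint, since only one row of scores exists at a time.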


## Step 8: Get Recommendations!

Now for the fun part. Let's create two functions to use our models.

In [None]:
# Cell 11: Create a function to get Content-Based recommendations
# We need a way to map a product_id to its row position in the matrix
indices = pd.Series(products_with_names.index, index=products_with_names['product_id'])
indices = indices[~indices.index.duplicated(keep='first')]  # keep one row per product_id

def get_similar_products(product_id, n=10):
    try:
        # 1. Get the index of the product we want to match
        idx = indices[product_id]
    except KeyError:
        return f"Product ID {product_id} not found."

    # 2. Get the similarity scores for this product with all other products,
    #    leaving out the product itself. Many products tie at the same score,
    #    so we cannot rely on the query product being sorted first.
    sim_scores = [(i, score) for i, score in enumerate(cosine_sim[idx]) if i != idx]

    # 3. Sort the products based on their similarity score (highest first)
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # 4. Keep the top n most similar products
    sim_scores = sim_scores[:n]

    # 5. Get the original indices of those top products
    product_indices = [i[0] for i in sim_scores]

    # 6. Return the product_ids and categories of the most similar items
    return products_with_names[['product_id', 'product_category_name_english']].iloc[product_indices]

In [None]:
# Cell 12: TEST MODEL 2!
# Let's pick a random product ID from our order_items
sample_product_id = order_items['product_id'].sample(1).values[0]
original_category = products_with_names.loc[products_with_names['product_id'] == sample_product_id, 'product_category_name_english'].values[0]

print(f"--- Recommendations for Product: {sample_product_id} ---")
print(f"--- Original Category: {original_category} ---\n")
print("--- Top 10 Similar Products: ---")

recommendations = get_similar_products(sample_product_id, n=10)
print(recommendations)

--- Recommendations for Product: b166907a5770631b7deedd0891b1b7ab ---
--- Original Category: housewares ---

--- Top 10 Similar Products: ---
                           product_id product_category_name_english
25   8ba4f2a4ae695d26e5626c1bf710975e                    housewares
38   e6af694343b45b56304ad91974a110b9                    housewares
109  67bea89008edcb996cfe4e3d062b62a8                    housewares
114  2be2be0a6a5916840503fdf50808ebcb                    housewares
135  bb09cce52b336261572a5a7e25a33795                    housewares
158  722fb67c17907e21734449c091420bf5                    housewares
167  f9471562478eba8761bc985b968a0092                    housewares
170  c7147724cd430269c6296fc758c0a086                    housewares
184  a85543a371d5cfafedb80a8177a692b5                    housewares
188  9aa70106d14b83c7ad033592870b2b30                    housewares


## Final Conclusion

And there you have it! You've successfully:

- Loaded and merged a complex, multi-file dataset.

- Built a Collaborative Filtering model (SVD) to predict user ratings.

- Built a Content-Based Filtering model (TF-IDF) to find similar products.

## A Recommendation Function for Model 1

In [None]:
# Cell 13: DEFINE THE SVD RECOMMENDATION FUNCTION

def get_top_n_recommendations_svd(user_id, n=10):
    # 1. Get a list of all item IDs
    all_item_ids = ratings_df['product_id'].unique()

    # 2. Get a list of items the user has already rated
    items_rated_by_user = ratings_df.loc[ratings_df['customer_unique_id'] == user_id, 'product_id']

    # 3. Get items the user has *not* rated (the candidates for recommendation)
    items_to_predict = np.setdiff1d(all_item_ids, items_rated_by_user)

    # 4. Predict ratings for all unrated items
    # We create a "test set" for this user and all items they haven't seen
    test_set_for_user = [[user_id, item_id, 0] for item_id in items_to_predict]

    # 5. Predict all the ratings
    user_predictions = model_svd.test(test_set_for_user)

    # 6. Sort predictions by estimated rating
    user_predictions.sort(key=lambda x: x.est, reverse=True)

    # 7. Return the top-N predicted items
    top_n = user_predictions[:n]

    top_n_item_ids = [pred.iid for pred in top_n]
    return top_n_item_ids

print("Function 'get_top_n_recommendations_svd' is now defined!")

Function 'get_top_n_recommendations_svd' is now defined!


## SVD Model Test 1: The "Power User"
**What it is:** Let's find the user with the most rows in `ratings_df` (one row per reviewed order item). Our SVD model should have a good "understanding" of their taste, so the recommendations should be personalized.

In [None]:
# Cell 14: SVD Test Case (Power User)
power_users = ratings_df['customer_unique_id'].value_counts()
power_user_id = power_users.idxmax() # .idxmax() gets the ID of the top user
num_reviews = power_users.max()

print(f"--- Testing SVD Model with a Power User ---")
print(f"User ID: {power_user_id}")
print(f"Number of reviews this user left: {num_reviews}\n")

print(f"--- Top 10 Recommendations for this Power User: ---")
recommendations = get_top_n_recommendations_svd(power_user_id, n=10)

# Look up the category for each recommended ID
# Note: isin() returns rows in table order, not in ranking order
recommended_products = products_with_names[products_with_names['product_id'].isin(recommendations)]
print(recommended_products[['product_id', 'product_category_name_english']])

--- Testing SVD Model with a Power User ---
User ID: d97b3cfb22b0d6b25ac9ed4e9c2d481b
Number of reviews this user left: 24

--- Top 10 Recommendations for this Power User: ---
                             product_id product_category_name_english
2269   88dd63919fc9ab693803578a04a20209         computers_accessories
8290   d1c427060a0f73f6b889a5c7c61f2ac4         computers_accessories
9367   57e089e3103f5cda6a4ce23b77399bdb                          baby
21450  698b3ddae2f0b80c2a48fb40624ca4e4               furniture_decor
24842  3e4176d545618ed02f382a3057de32b4           luggage_accessories
27244  a7d756e8f7c4b7e5b679e248a57d91ec      fashion_bags_accessories
30397  3af6d5f9fdb78f106c003ce49d7f0186                 health_beauty
31561  6109d0cae3bcb57d579bc0fab6e61814           luggage_accessories
31807  73326828aa5efe1ba096223de496f596                          food
31832  475e8a9ddbebf13af503d1c7eccadb1a              office_furniture


## SVD Model Test 2: The "New User" (Cold Start Problem)
**What it is:** Let's find a user who left only one review. This is close to the "cold start" problem: the model has very little data on this user (none at all if that single rating fell into the test split), so it will probably just recommend popular, generic items.

In [None]:
# Cell 15: SVD Test Case (New User)
new_users = ratings_df['customer_unique_id'].value_counts()
new_user_id = new_users[new_users == 1].index[0] # Get the first user with exactly 1 review

print(f"--- Testing SVD Model with a 'New' User ---")
print(f"User ID: {new_user_id}\n")

print(f"--- Top 10 Recommendations for this 'New' User: ---")
recommendations = get_top_n_recommendations_svd(new_user_id, n=10)

recommended_products = products_with_names[products_with_names['product_id'].isin(recommendations)]
print(recommended_products[['product_id', 'product_category_name_english']])

--- Testing SVD Model with a 'New' User ---
User ID: 7d2252746734931a8177e2680680eeeb

--- Top 10 Recommendations for this 'New' User: ---
                             product_id product_category_name_english
8565   afeeea6271148ee1bb15173b8187c431                     telephony
9073   a3ceb95649a48c0c54ae4bd1dd66d035                     telephony
9367   57e089e3103f5cda6a4ce23b77399bdb                          baby
12648  c7b3b8509e06ae21abdd78b541215cda                     perfumery
13444  c1617123e66d2491ca93ceadfd36203e                bed_bath_table
16147  79366d6a24de9351b7ca6e3cf75a68ec              small_appliances
16571  4d38a4daf13a87012b73156f834afec0                bed_bath_table
20966  e7f85e7f0203b7b95cc1b4c21b4b070c                    cool_stuff
24842  3e4176d545618ed02f382a3057de32b4           luggage_accessories
31807  73326828aa5efe1ba096223de496f596                          food
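
For users the model knows little about, a common fallback is to recommend well-rated, frequently rated products directly. The sketch below shows one possible version on made-up data. The function name `most_popular` and the `min_ratings` threshold are assumptions for illustration and are not part of the notebook.

```python
import pandas as pd

# Hypothetical ratings: p1 has three ratings, p2 has two, p3 has one
toy_ratings = pd.DataFrame({
    "product_id": ["p1", "p1", "p1", "p2", "p2", "p3"],
    "review_score": [5, 4, 5, 5, 5, 1],
})


def most_popular(ratings, n=2, min_ratings=2):
    """Return up to n product IDs ranked by mean score, then by rating count."""
    stats = ratings.groupby("product_id")["review_score"].agg(["count", "mean"])
    stats = stats[stats["count"] >= min_ratings]  # ignore rarely rated products
    ranked = stats.sort_values(["mean", "count"], ascending=False)
    return ranked.head(n).index.tolist()


fallback = most_popular(toy_ratings)
print(fallback)
```

The `min_ratings` filter keeps a product with a single 5-star review from outranking products with many good reviews.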


## Content-Based Model Test 1: Specific Category
**What it is:** Let's test Model 2. This model doesn't know about users. It only knows about product similarity. If we give it a product from health_beauty, it should give us other products from health_beauty.

In [None]:
# Cell 16: Content-Based Test (Specific Category)
specific_product = products_with_names[products_with_names['product_category_name_english'] == 'health_beauty'].sample(1)
specific_product_id = specific_product['product_id'].values[0]
original_category = specific_product['product_category_name_english'].values[0]

print(f"--- Testing Content-Based Model with a Specific Product ---")
print(f"Product ID: {specific_product_id}")
print(f"Original Category: {original_category}\n")

print(f"--- Top 10 Similar Products: ---")
# This uses the 'get_similar_products' function already in your notebook
recommendations = get_similar_products(specific_product_id, n=10)
print(recommendations)

--- Testing Content-Based Model with a Specific Product ---
Product ID: e4bf89766decbc6dd5e5c29edff02990
Original Category: health_beauty

--- Top 10 Similar Products: ---
                           product_id product_category_name_english
49   c5d8079278e912d7e3b6beb48ecb56e8                 health_beauty
62   36555a2f528d7b2a255c504191445d39                 health_beauty
75   e586ebb6022265ae1eea38f46ffe3ead                 health_beauty
80   75b4372e69a42f8ae1d908c076f547b2                 health_beauty
81   3569d4374a919941a50f57371b1dc93d                 health_beauty
91   3a6a0247ced9dcb444b46caafdcdd368                 health_beauty
92   adf591c625cb265c12bc6749d3a2f757                 health_beauty
156  50556c630443502c11acde1c320fe278                 health_beauty
157  88d2c501ec765f5d7e8038fa6aab0e62                 health_beauty
193  b29ca3d3127057c43ef4b364bbe360ea                 health_beauty


## Content-Based Model Test 2: Niche Category
**What it is:** Let's try another one to be sure. How about computers_accessories?

In [None]:
# Cell 17: Content-Based Test (Niche Category)
niche_product = products_with_names[products_with_names['product_category_name_english'] == 'computers_accessories'].sample(1)
niche_product_id = niche_product['product_id'].values[0]
original_category = niche_product['product_category_name_english'].values[0]

print(f"--- Testing Content-Based Model with a Niche Product ---")
print(f"Product ID: {niche_product_id}")
print(f"Original Category: {original_category}\n")

print(f"--- Top 10 Similar Products: ---")
recommendations = get_similar_products(niche_product_id, n=10)
print(recommendations)

--- Testing Content-Based Model with a Niche Product ---
Product ID: eecc2c78b528d8073b4f1c4bddf92aae
Original Category: computers_accessories

--- Top 10 Similar Products: ---
                           product_id product_category_name_english
27   c78b767da00efb70c1bcccab87c28cd5         computers_accessories
28   a0253d43394dd4da9a5d7b1f546f1a32         computers_accessories
89   c478b1bbf9ec8c5691f37ccb83187386         computers_accessories
101  a2e2851eae0aebb8ee4df32348b42e2b         computers_accessories
171  dbb399a8be7395d5b136d49fcdce13df         computers_accessories
177  8e71b24c3e25a92fef6176120a67fac7         computers_accessories
210  21db47f6493b06e8e7fc562ec9890e77         computers_accessories
239  9e48435521202c8795e21ac42efcc761         computers_accessories
264  a1bf559ac1eab015ba992bd76d9d76c7         computers_accessories
284  d68bd4dedccc5545b1ff6629de8fb021         computers_accessories


So in this project we created two different types of recommenders.

Here’s the simple breakdown of the difference:


1. ***get_top_n_recommendations_svd(user_id, n=10)***

- **What it Asks:** "Which products would this *user* like the most?"

- **How it Works (Collaborative Filtering):** This function uses **model_svd** (our *Matrix Factorization model*). The model learns from the past ratings of all users, so users with similar tastes end up with similar "hidden preference" profiles. It then predicts a rating for every product this user hasn't rated yet and recommends the ones with the highest predictions.

- **Input:** It needs a *user_id*.

- **Key Idea:** It's all about personalization based on user behavior. It recommends what you might like, even if the product is in a totally different category from what you've bought before.





__________________________________________________________________

2. ***get_similar_products(product_id, n=10)***

- **What it Asks:** "Which products are most similar to this *product*?"

- **How it Works (Content-Based Filtering):** This function uses the *cosine_sim* matrix (our NLP model). It ignores all users and only looks at the product's content (in our case, its category). It finds other products that have the most similar category text.

- **Input:** It needs a *product_id*.

- **Key Idea:** It's all about similarity. It's great for "Customers who viewed this item also viewed..." or for solving the "cold start" problem (recommending new products that have no ratings yet).


__________________________________________________________________
**Simple Analogy**:

1. ***get_top_n_recommendations_svd (SVD):***  
This is like asking a friend who has the same taste in movies as you what you should watch next. They might recommend a comedy, even if you just watched a sci-fi, because they know you.



2. ***get_similar_products (Content):***   
 This is like clicking on the "sci-fi" genre tag on a streaming site. It will only show you other sci-fi movies, regardless of what you or other users like.