<a href="https://colab.research.google.com/github/rakeshxp2007/Machine-Learning/blob/main/collaborative_filtering_user_based_and_item_based.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **FLOW DIAGRAM OF THE USER-BASED ALGORITHM**
```
INPUT: User 0 wants recommendations
                        ↓
STEP 1: Find all movies User 0 hasn't watched
                        ↓
STEP 2: For each unwatched movie:
                        ↓
    2a. Find all users who watched this movie
                        ↓
    2b. Check how similar they are to User 0
                        ↓
    2c. Take top 3 most similar users
                        ↓
    2d. Calculate weighted average of their ratings
                        ↓
    2e. That's the predicted rating!
                        ↓
STEP 3: Sort all predictions (highest first)
                        ↓
STEP 4: Return top 3
                        ↓
OUTPUT: Movie F (4.49), Movie C (4.00), Movie D (3.00)
```


# **LIVE DEMO - Setting Up Our Tools**


In [None]:
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity



# **LIVE DEMO - Creating Our Rating Matrix**

In [None]:
# Create a rating matrix
# Rows = Users, Columns = Movies
# Values = Ratings (1-5 stars), 0 = Not rated yet

ratings = np.array([
    [5, 4, 0, 0, 2, 0],  # User 0
    [4, 0, 0, 3, 0, 0],  # User 1
    [0, 5, 4, 0, 0, 0],  # User 2
    [0, 0, 5, 4, 0, 5],  # User 3
    [2, 0, 0, 0, 5, 4],  # User 4
    [0, 3, 0, 0, 4, 5],  # User 5
])

# Let's give names to make it clearer
movies = ['Movie A', 'Movie B', 'Movie C', 'Movie D', 'Movie E', 'Movie F']
users = ['User 0', 'User 1', 'User 2', 'User 3', 'User 4', 'User 5']

# Convert to a nice table format
ratings_df = pd.DataFrame(ratings, columns=movies, index=users)
print(ratings_df)

        Movie A  Movie B  Movie C  Movie D  Movie E  Movie F
User 0        5        4        0        0        2        0
User 1        4        0        0        3        0        0
User 2        0        5        4        0        0        0
User 3        0        0        5        4        0        5
User 4        2        0        0        0        5        4
User 5        0        3        0        0        4        5
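Because 0 doubles as "not rated yet", it is worth checking how sparse the matrix is before computing any similarities. A minimal sketch using the same toy matrix (re-declared so the snippet runs on its own):

```python
import numpy as np

# Same toy matrix as above: rows = users, columns = movies, 0 = not rated yet
ratings = np.array([
    [5, 4, 0, 0, 2, 0],  # User 0
    [4, 0, 0, 3, 0, 0],  # User 1
    [0, 5, 4, 0, 0, 0],  # User 2
    [0, 0, 5, 4, 0, 5],  # User 3
    [2, 0, 0, 0, 5, 4],  # User 4
    [0, 3, 0, 0, 4, 5],  # User 5
])

observed = ratings > 0                 # True where a real rating exists
n_observed = int(observed.sum())
density = n_observed / ratings.size    # fraction of cells that hold a rating

print(f"{n_observed} of {ratings.size} cells rated ({density:.0%} dense)")
print("Ratings per user: ", observed.sum(axis=1))
print("Ratings per movie:", observed.sum(axis=0))
```

Here 16 of the 36 cells are filled, and Movie C and Movie D have only two ratings each, so any prediction for them rests on at most two neighbors.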


# **LIVE DEMO - Finding Who's Similar to Whom**

In [None]:
# Calculate cosine similarity between ALL users
user_similarity = cosine_similarity(ratings)

# Convert to a nice table
user_similarity_df = pd.DataFrame(
    user_similarity,
    columns=users,
    index=users
)

print("\nUser Similarity Matrix:")
print(user_similarity_df)


User Similarity Matrix:
          User 0    User 1    User 2    User 3    User 4    User 5
User 0  1.000000  0.596285  0.465620  0.000000  0.444444  0.421637
User 1  0.596285  1.000000  0.000000  0.295420  0.238514  0.000000
User 2  0.465620  0.000000  1.000000  0.384473  0.000000  0.331295
User 3  0.000000  0.295420  0.384473  1.000000  0.366988  0.435194
User 4  0.444444  0.238514  0.000000  0.366988  1.000000  0.843274
User 5  0.421637  0.000000  0.331295  0.435194  0.843274  1.000000
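Cosine similarity is the dot product of two rating vectors divided by the product of their lengths, with unrated cells counting as 0. As a hand check, the sketch below recomputes the User 0 / User 4 entry: both rated Movie A and Movie E, so the dot product is 5·2 + 2·5 = 20, and both vectors have length √45, giving 20/45 ≈ 0.444.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

ratings = np.array([
    [5, 4, 0, 0, 2, 0],  # User 0
    [4, 0, 0, 3, 0, 0],  # User 1
    [0, 5, 4, 0, 0, 0],  # User 2
    [0, 0, 5, 4, 0, 5],  # User 3
    [2, 0, 0, 0, 5, 4],  # User 4
    [0, 3, 0, 0, 4, 5],  # User 5
])


def cosine(u, v):
    """Cosine similarity between two rating vectors (0 = not rated)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


manual = cosine(ratings[0], ratings[4])       # User 0 vs User 4
library = cosine_similarity(ratings)[0, 4]    # same entry from scikit-learn

print(f"manual:  {manual:.6f}")
print(f"sklearn: {library:.6f}")
```

The same formula explains the 0.000000 between User 0 and User 3: they share no rated movie, so their dot product is 0.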


# **LIVE DEMO - Predicting a Rating (The Magic Moment!)**

In [None]:
def predict_rating(user_id, item_id, ratings, user_similarity, k=3):
    """
    Predict what rating a user would give to an item

    user_id: Which user we're predicting for
    item_id: Which item (movie) we're predicting
    ratings: Our rating table
    user_similarity: Our similarity table
    k: How many similar users to consider (neighbors)
    """

    # Step 1: Get how similar this user is to everyone else
    similarities = user_similarity[user_id]

    # Step 2: Get ratings for this specific movie by all users
    item_ratings = ratings[:, item_id]

    # Step 3: Find who has actually watched this movie (rating > 0),
    # leaving out the target user so their own rating can't be used
    rated_mask = item_ratings > 0
    rated_mask[user_id] = False

    # Step 4: Filter to get only relevant similarities and ratings
    relevant_similarities = similarities[rated_mask]
    relevant_ratings = item_ratings[rated_mask]

    # Step 5: Pick the top-k most similar users
    if len(relevant_similarities) > k:
        top_k_indices = np.argsort(relevant_similarities)[-k:]
        relevant_similarities = relevant_similarities[top_k_indices]
        relevant_ratings = relevant_ratings[top_k_indices]

    # Step 6: Calculate weighted average
    if np.sum(relevant_similarities) == 0:
        return 0  # No similar users found - can't predict

    predicted_rating = np.sum(relevant_similarities * relevant_ratings) / np.sum(relevant_similarities)

    return predicted_rating

# Let's predict: What would User 0 rate Movie C?
user_idx = 0
movie_idx = 2  # Movie C is column 2
predicted = predict_rating(user_idx, movie_idx, ratings, user_similarity, k=3)

print(f"\nüé¨ Predicted rating for {users[user_idx]} on {movies[movie_idx]}: {predicted:.2f} stars")


üé¨ Predicted rating for User 0 on Movie C: 4.00 stars
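`predict_rating` computes a similarity-weighted average of the ratings given by $N_k(u,i)$, the (at most $k$) most similar users who rated item $i$. Written out, with the Movie C numbers from the similarity table (only User 2, similarity 0.466, and User 3, similarity 0.000, rated it):

```latex
\hat{r}_{u,i} = \frac{\sum_{v \in N_k(u,i)} \operatorname{sim}(u,v)\, r_{v,i}}{\sum_{v \in N_k(u,i)} \operatorname{sim}(u,v)},
\qquad
\hat{r}_{\text{User 0},\,\text{Movie C}} = \frac{0.466 \cdot 4 + 0.000 \cdot 5}{0.466 + 0.000} = 4.00
```

User 3 shares no movie with User 0, so its 5-star rating gets zero weight and the prediction collapses to User 2's rating of 4.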


# **LIVE DEMO - Getting Full Recommendations**

In [None]:
def get_recommendations(user_id, ratings, user_similarity, n_recommendations=3):
    """
    Get top N recommendations for a user
    """
    # Step 1: Find all movies this user hasn't watched yet
    user_ratings = ratings[user_id]
    unrated_items = np.where(user_ratings == 0)[0]

    # Step 2: Predict ratings for ALL unwatched movies
    predictions = []
    for item_id in unrated_items:
        pred = predict_rating(user_id, item_id, ratings, user_similarity)
        predictions.append((item_id, pred))

    # Step 3: Sort by predicted rating (highest first)
    predictions.sort(key=lambda x: x[1], reverse=True)

    # Step 4: Return top N recommendations
    return predictions[:n_recommendations]

# Get recommendations for User 0
user_idx = 0
recommendations = get_recommendations(user_idx, ratings, user_similarity, n_recommendations=3)

print(f"\nüéØ Top 3 Recommendations for {users[user_idx]}:")
print("=" * 50)
for item_id, predicted_rating in recommendations:
    print(f"  üé¨ {movies[item_id]}: Predicted rating {predicted_rating:.2f} ‚≠ê")


üéØ Top 3 Recommendations for User 0:
  üé¨ Movie F: Predicted rating 4.49 ‚≠ê
  üé¨ Movie C: Predicted rating 4.00 ‚≠ê
  üé¨ Movie D: Predicted rating 3.00 ‚≠ê
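Only User 3, User 4 and User 5 rated Movie F, so with k=3 every rater is used. The sketch below (a standalone re-run of the same weighted average, not the notebook's function) varies k to show how neighborhood size moves the Movie F prediction:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

ratings = np.array([
    [5, 4, 0, 0, 2, 0],  # User 0
    [4, 0, 0, 3, 0, 0],  # User 1
    [0, 5, 4, 0, 0, 0],  # User 2
    [0, 0, 5, 4, 0, 5],  # User 3
    [2, 0, 0, 0, 5, 4],  # User 4
    [0, 3, 0, 0, 4, 5],  # User 5
])
user_similarity = cosine_similarity(ratings)

user_id, item_id = 0, 5                          # User 0, Movie F
raters = np.where(ratings[:, item_id] > 0)[0]    # users who rated Movie F
sims = user_similarity[user_id, raters]          # their similarity to User 0
vals = ratings[raters, item_id]                  # their ratings of Movie F

preds = {}
for k in (1, 2, 3):
    top = np.argsort(sims)[-k:]                  # positions of the k largest similarities
    preds[k] = np.sum(sims[top] * vals[top]) / np.sum(sims[top])
    print(f"k={k}: neighbors {raters[top].tolist()} -> {preds[k]:.2f}")
```

With k=1 only User 4 counts and the prediction is 4.00; User 5 joins at k=2, giving 4.49; User 3 joins at k=3 with similarity 0, so the result stays at 4.49.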


# **ITEM-BASED COLLABORATIVE FILTERING**

# **LIVE DEMO - Setting Up Our Data**


In [None]:
print("Our Rating Matrix:")
print(ratings_df)


Our Rating Matrix:
        Movie A  Movie B  Movie C  Movie D  Movie E  Movie F
User 0        5        4        0        0        2        0
User 1        4        0        0        3        0        0
User 2        0        5        4        0        0        0
User 3        0        0        5        4        0        5
User 4        2        0        0        0        5        4
User 5        0        3        0        0        4        5


# **LIVE DEMO - Calculating Item Similarities (The Key Difference!)**

In [None]:
# Calculate item similarity
# We TRANSPOSE the ratings matrix so items become rows
item_similarity = cosine_similarity(ratings.T)  # Note the .T for transpose!

# Create a nice table to visualize
item_similarity_df = pd.DataFrame(
    item_similarity,
    columns=movies,
    index=movies
)

print("\nItem Similarity Matrix:")
print(item_similarity_df)

print("\nLet's round to 2 decimals for clarity:")
print(item_similarity_df.round(2))


Item Similarity Matrix:
          Movie A   Movie B   Movie C   Movie D   Movie E   Movie F
Movie A  1.000000  0.421637  0.000000  0.357771  0.444444  0.146795
Movie B  0.421637  1.000000  0.441726  0.000000  0.421637  0.261116
Movie C  0.000000  0.441726  1.000000  0.624695  0.000000  0.480592
Movie D  0.357771  0.000000  0.624695  1.000000  0.000000  0.492366
Movie E  0.444444  0.421637  0.000000  0.000000  1.000000  0.733976
Movie F  0.146795  0.261116  0.480592  0.492366  0.733976  1.000000

Let's round to 2 decimals for clarity:
         Movie A  Movie B  Movie C  Movie D  Movie E  Movie F
Movie A     1.00     0.42     0.00     0.36     0.44     0.15
Movie B     0.42     1.00     0.44     0.00     0.42     0.26
Movie C     0.00     0.44     1.00     0.62     0.00     0.48
Movie D     0.36     0.00     0.62     1.00     0.00     0.49
Movie E     0.44     0.42     0.00     0.00     1.00     0.73
Movie F     0.15     0.26     0.48     0.49     0.73     1.00
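The only change from the user-based version is the transpose: each movie's column of ratings becomes the vector being compared. An equivalent way to see this, sketched below, is to scale every column to unit length and take all pairwise dot products. The Movie C / Movie D entry is 20 / (√41 · 5) ≈ 0.62, because the two movies share only User 3 (5 · 4 = 20).

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

ratings = np.array([
    [5, 4, 0, 0, 2, 0],  # User 0
    [4, 0, 0, 3, 0, 0],  # User 1
    [0, 5, 4, 0, 0, 0],  # User 2
    [0, 0, 5, 4, 0, 5],  # User 3
    [2, 0, 0, 0, 5, 4],  # User 4
    [0, 3, 0, 0, 4, 5],  # User 5
])

# Scale each movie column to unit length.
# Assumes every movie has at least one rating (true here); otherwise a norm would be 0.
unit_columns = ratings / np.linalg.norm(ratings, axis=0)

# Pairwise dot products of unit-length columns = cosine similarity between movies
item_sim_manual = unit_columns.T @ unit_columns

print(np.allclose(item_sim_manual, cosine_similarity(ratings.T)))
print(round(item_sim_manual[2, 3], 2))   # Movie C vs Movie D
```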


# **LIVE DEMO - Item-Based Prediction Function (The Core Algorithm)**

In [None]:
def predict_rating_item_based(user_id, item_id, ratings, item_similarity, k=3):
    """
    Predict rating using item-based collaborative filtering

    user_id: Which user we're predicting for
    item_id: Which item we want to predict rating for
    ratings: Our rating matrix
    item_similarity: Our item similarity matrix
    k: How many similar items to consider
    """

    # STEP 1: Get this user's ratings for all items
    user_ratings = ratings[user_id]

    # STEP 2: Get how similar the target item is to all other items
    item_similarities = item_similarity[item_id]

    # STEP 3: Find which items this user has actually rated
    rated_mask = user_ratings > 0

    # STEP 4: Exclude the target item itself if the user already rated it
    # (we don't want an item to predict itself!)
    rated_mask[item_id] = False

    # STEP 5: Keep only similarities and ratings for the remaining rated items
    relevant_similarities = item_similarities[rated_mask]
    relevant_ratings = user_ratings[rated_mask]

    # STEP 6: Select only the top-k most similar items
    if len(relevant_similarities) > k:
        top_k_indices = np.argsort(relevant_similarities)[-k:]
        relevant_similarities = relevant_similarities[top_k_indices]
        relevant_ratings = relevant_ratings[top_k_indices]

    # STEP 7: Calculate weighted average
    if np.sum(np.abs(relevant_similarities)) == 0:
        return 0  # No similar items found - can't predict

    predicted_rating = np.sum(relevant_similarities * relevant_ratings) / np.sum(np.abs(relevant_similarities))

    return predicted_rating
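In formula form, `predict_rating_item_based` averages user $u$'s own ratings of $N_k(i;u)$, the (at most $k$) items most similar to the target item $i$ that $u$ has rated (excluding $i$ itself), weighted by item-to-item similarity:

```latex
\hat{r}_{u,i} = \frac{\sum_{j \in N_k(i;u)} \operatorname{sim}(i,j)\, r_{u,j}}{\sum_{j \in N_k(i;u)} \lvert \operatorname{sim}(i,j) \rvert}
```

The absolute value in the denominator only matters when similarities can be negative. Cosine similarity between vectors with non-negative entries is never negative, so on this star-rating data it gives the same result as the plain sum in the user-based version.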

# **LIVE DEMO - Testing Item-Based Prediction (The Moment of Truth!)**

In [None]:
# Predict: What would User 0 rate Movie C using ITEM-BASED approach?
user_idx = 0
movie_idx = 2  # Movie C

predicted_item_based = predict_rating_item_based(
    user_idx,
    movie_idx,
    ratings,
    item_similarity,
    k=3
)

print(f"\nüé¨ Item-Based Prediction:")
print(f"{users[user_idx]} rating for {movies[movie_idx]}: {predicted_item_based:.2f} ‚≠ê")

# Let's compare with user-based prediction
predicted_user_based = predict_rating(user_idx, movie_idx, ratings, user_similarity, k=3)

print(f"\nüìä COMPARISON:")
print(f"User-Based Prediction: {predicted_user_based:.2f} ‚≠ê")
print(f"Item-Based Prediction: {predicted_item_based:.2f} ‚≠ê")


üé¨ Item-Based Prediction:
User 0 rating for Movie C: 4.00 ‚≠ê

üìä COMPARISON:
User-Based Prediction: 4.00 ‚≠ê
Item-Based Prediction: 4.00 ‚≠ê


# **LIVE DEMO - Item-Based Recommendations (Complete System!)**

In [None]:
def get_recommendations_item_based(user_id, ratings, item_similarity, n_recommendations=3):
    """
    Get top N recommendations using item-based CF
    """
    # STEP 1: Find items the user hasn't rated yet
    user_ratings = ratings[user_id]
    unrated_items = np.where(user_ratings == 0)[0]

    # STEP 2: Predict ratings for ALL unrated items
    predictions = []
    for item_id in unrated_items:
        pred = predict_rating_item_based(
            user_id,
            item_id,
            ratings,
            item_similarity
        )
        predictions.append((item_id, pred))

    # STEP 3: Sort by predicted rating (highest first)
    predictions.sort(key=lambda x: x[1], reverse=True)

    # STEP 4: Return top N
    return predictions[:n_recommendations]

# Get recommendations for User 0
user_idx = 0
recommendations_item = get_recommendations_item_based(
    user_idx,
    ratings,
    item_similarity,
    n_recommendations=3
)

print(f"\nüéØ Top 3 Recommendations for {users[user_idx]} (ITEM-BASED):")
print("=" * 60)
for item_id, predicted_rating in recommendations_item:
    print(f"  üé¨ {movies[item_id]}: Predicted rating {predicted_rating:.2f} ‚≠ê")


üéØ Top 3 Recommendations for User 0 (ITEM-BASED):
  üé¨ Movie D: Predicted rating 5.00 ‚≠ê
  üé¨ Movie C: Predicted rating 4.00 ‚≠ê
  üé¨ Movie F: Predicted rating 2.84 ‚≠ê
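Movie F drops to last place under the item-based approach because, among the movies User 0 has rated, the one it most resembles is Movie E (similarity 0.73), which User 0 gave only 2 stars. A sketch reproducing that number by hand from the item similarity table:

```python
# Similarities of Movie F to the three movies User 0 has rated (from the item similarity table)
sim_to_f = {"Movie A": 0.146795, "Movie B": 0.261116, "Movie E": 0.733976}
# User 0's own ratings of those movies
user0_ratings = {"Movie A": 5, "Movie B": 4, "Movie E": 2}

numerator = sum(sim_to_f[m] * user0_ratings[m] for m in sim_to_f)
denominator = sum(abs(s) for s in sim_to_f.values())
prediction = numerator / denominator

print(f"{numerator:.4f} / {denominator:.4f} = {prediction:.2f}")
```

Movie E carries about 64% of the total weight (0.73 / 1.14), so its 2-star rating pulls the prediction well below User 0's 5 and 4 for Movie A and Movie B.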


# **LIVE DEMO - Head-to-Head Comparison (User-Based vs Item-Based)**

In [None]:
# Get recommendations from BOTH approaches
recommendations_user = get_recommendations(user_idx, ratings, user_similarity, n_recommendations=3)

print(f"\n{'='*60}")
print(f"üî• BATTLE OF THE ALGORITHMS üî•")
print(f"Recommendations for {users[user_idx]}")
print(f"{'='*60}")

print("\nüë• USER-BASED Collaborative Filtering:")
for item_id, predicted_rating in recommendations_user:
    print(f"  {movies[item_id]}: {predicted_rating:.2f} ‚≠ê")

print("\nüì¶ ITEM-BASED Collaborative Filtering:")
for item_id, predicted_rating in recommendations_item:
    print(f"  {movies[item_id]}: {predicted_rating:.2f} ‚≠ê")

# Let's also show what User 0 already loved
print(f"\n‚ù§Ô∏è  {users[user_idx]}'s Existing High Ratings:")
user_high_ratings = np.where(ratings[user_idx] >= 4)[0]
for item_id in user_high_ratings:
    print(f"  {movies[item_id]}: {ratings[user_idx][item_id]} ‚≠ê")


üî• BATTLE OF THE ALGORITHMS üî•
Recommendations for User 0

üë• USER-BASED Collaborative Filtering:
  Movie F: 4.49 ‚≠ê
  Movie C: 4.00 ‚≠ê
  Movie D: 3.00 ‚≠ê

üì¶ ITEM-BASED Collaborative Filtering:
  Movie D: 5.00 ‚≠ê
  Movie C: 4.00 ‚≠ê
  Movie F: 2.84 ‚≠ê

‚ù§Ô∏è  User 0's Existing High Ratings:
  Movie A: 5 ‚≠ê
  Movie B: 4 ‚≠ê
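In this toy matrix no movie has more than three raters and User 0 has rated exactly three movies, so the k=3 cutoff never drops a neighbor in either function. Under that assumption (it stops holding once k is smaller than the neighborhood), each prediction loop reduces to two matrix products. A sketch that puts both approaches side by side for User 0's unwatched movies:

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

ratings = np.array([
    [5, 4, 0, 0, 2, 0],  # User 0
    [4, 0, 0, 3, 0, 0],  # User 1
    [0, 5, 4, 0, 0, 0],  # User 2
    [0, 0, 5, 4, 0, 5],  # User 3
    [2, 0, 0, 0, 5, 4],  # User 4
    [0, 3, 0, 0, 4, 5],  # User 5
])
movies = [f"Movie {c}" for c in "ABCDEF"]
rated = (ratings > 0).astype(float)      # 1.0 where a rating exists, else 0.0

user_sim = cosine_similarity(ratings)
item_sim = cosine_similarity(ratings.T)

u = 0
# User-based: similarity-weighted average over every user who rated each movie
user_based = (user_sim[u] @ ratings) / (user_sim[u] @ rated)
# Item-based: user u's own ratings, weighted by each rated movie's similarity to the target
item_based = (ratings[u] @ item_sim) / (rated[u] @ np.abs(item_sim))

# Keep only unwatched movies; for watched ones the shortcut would include
# User 0's own rating (or a movie's similarity to itself)
unrated = ratings[u] == 0
table = pd.DataFrame({"user-based": user_based, "item-based": item_based}, index=movies)
print(table[unrated].round(2))
```

It reproduces the head-to-head numbers above: the two approaches agree on Movie C (4.00) but swap the order of Movie D and Movie F.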


# **LIVE DEMO - Understanding the Reasoning (Looking Under the Hood)**

In [None]:
print("\n" + "="*60)
print("üß† UNDERSTANDING THE REASONING")
print("="*60)

# For ITEM-BASED: Show the logic
print("\nüì¶ Item-Based Logic for Movie C:")
movie_c_idx = 2

print(f"\nMovies similar to Movie C that User 0 has watched:")
similarities_to_c = item_similarity[movie_c_idx]

for i, sim in enumerate(similarities_to_c):
    if i != movie_c_idx and sim > 0.3:  # Show items with similarity > 0.3
        user_rating = ratings[user_idx][i]
        if user_rating > 0:
            print(f"  üé¨ {movies[i]}: Similarity={sim:.2f}, User 0 rated it {user_rating} ‚≠ê")

# For USER-BASED: Show the logic
print("\nüë• User-Based Logic for Movie C:")
print(f"\nUsers similar to User 0 who rated Movie C:")

for i, sim in enumerate(user_similarity[user_idx]):
    if i != user_idx and sim > 0.3:  # Show users with similarity > 0.3
        user_rating_for_c = ratings[i][movie_c_idx]
        if user_rating_for_c > 0:
            print(f"  üë§ {users[i]}: Similarity={sim:.2f}, rated Movie C as {user_rating_for_c} ‚≠ê")


🧠 UNDERSTANDING THE REASONING

📦 Item-Based Logic for Movie C:

Movies similar to Movie C that User 0 has watched:
  🎬 Movie B: Similarity=0.44, User 0 rated it 4 ⭐

👥 User-Based Logic for Movie C:

Users similar to User 0 who rated Movie C:
  👤 User 2: Similarity=0.47, rated Movie C as 4 ⭐
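The printout above filters neighbors by a fixed 0.3 threshold, which is not the rule `predict_rating` actually uses. A hypothetical helper, `explain_user_based` (not part of the original notebook), instead reports each top-k neighbor's share of the total weight. Here it is applied to User 0's Movie F prediction:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

ratings = np.array([
    [5, 4, 0, 0, 2, 0],  # User 0
    [4, 0, 0, 3, 0, 0],  # User 1
    [0, 5, 4, 0, 0, 0],  # User 2
    [0, 0, 5, 4, 0, 5],  # User 3
    [2, 0, 0, 0, 5, 4],  # User 4
    [0, 3, 0, 0, 4, 5],  # User 5
])
users = [f"User {i}" for i in range(len(ratings))]
user_similarity = cosine_similarity(ratings)


def explain_user_based(user_id, item_id, k=3):
    """Hypothetical helper: list the top-k neighbors behind a user-based prediction.

    Returns (user name, similarity, their rating, share of total weight) tuples,
    most similar neighbor first.
    """
    raters = np.where(ratings[:, item_id] > 0)[0]
    raters = raters[raters != user_id]        # never count the target user's own rating
    sims = user_similarity[user_id, raters]
    top = np.argsort(sims)[::-1][:k]          # same top-k set as predict_rating, best first
    total = sims[top].sum()
    rows = []
    for idx in top:
        share = sims[idx] / total if total > 0 else 0.0
        rows.append((users[raters[idx]], round(float(sims[idx]), 2),
                     int(ratings[raters[idx], item_id]), round(float(share), 2)))
    return rows


# Why does User 0 get 4.49 for Movie F?
for name, sim, rating, share in explain_user_based(0, 5):
    print(f"{name}: similarity={sim}, rating={rating}, weight share={share}")
```

User 4 and User 5 split the weight roughly 51/49, while User 3 contributes nothing despite its 5-star rating; 0.51 · 4 + 0.49 · 5 ≈ 4.49, which is the user-based prediction for Movie F.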
