### User-user collaborative filtering
User-user collaborative filtering is a technique commonly used in recommendation systems. Here's an intuitive way to understand it:

Analogy: Friends' Recommendations
Imagine you're looking to watch a new movie. You ask your friends for suggestions. Each of your friends has a history of movies they've watched and enjoyed. You'd likely give more weight to recommendations from friends who have a similar taste in movies to yours. For example, if both you and a friend loved "The Matrix" and "Inception," you'd probably be more inclined to watch other movies they recommend.

How User-User Collaborative Filtering Works
Similarity Calculation: The system first identifies users who are similar to the target user (you, in this case). This similarity is based on how similarly users have rated items (like movies, products, etc.). If two users have rated many items similarly, they are considered similar.

Rating Predictions: Once similar users are found, the system predicts the target user's interest in an item (a movie, for example) that they haven't yet rated or seen. This prediction is based on how the similar users have rated that item.

Recommendation Generation: The system then recommends items to the target user that have high predicted ratings, suggesting that the user will like these items based on the preferences of similar users.
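
The similarity and prediction steps above can be written compactly. The notation is an assumption introduced here for illustration: r_{u,i} is user u's rating of item i, I_{uv} is the set of items both u and v have rated, and N_i(u) is the set of users other than u who have rated item i. Cosine similarity is only one possible choice of similarity measure; it is the one used in the Python example later in this section.

```latex
\mathrm{sim}(u, v) =
  \frac{\sum_{i \in I_{uv}} r_{u,i}\, r_{v,i}}
       {\sqrt{\sum_{i \in I_{uv}} r_{u,i}^{2}}\;\sqrt{\sum_{i \in I_{uv}} r_{v,i}^{2}}},
\qquad
\hat{r}_{u,i} =
  \frac{\sum_{v \in N_i(u)} \mathrm{sim}(u, v)\, r_{v,i}}
       {\sum_{v \in N_i(u)} \lvert \mathrm{sim}(u, v) \rvert}
```

Recommendation generation then amounts to ranking the items the user has not rated by their predicted rating. When all ratings are positive, every similarity is non-negative, so the absolute value in the denominator changes nothing.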

Key Points
User Profiles: The system builds a profile for each user based on their past ratings or interactions.
Neighborhood Selection: It selects a "neighborhood" of users who have the most similar tastes to the target user.
Weighted Averages: The ratings from these similar users are often combined using a weighted average to predict the target user's preference for an unrated item.
Dynamic: The recommendations change as users' preferences evolve and as new users and new ratings are added to the system.

Limitations
Cold Start Problem: New users who haven't rated many items yet can be hard to match with similar users.
Scalability: As the number of users and items grows, finding the most similar users becomes computationally expensive, because a naive approach compares the target user with every other user.
Diversity of Recommendations: The system may recommend very popular items repeatedly, which reduces the diversity of recommendations.

In essence, user-user collaborative filtering is like getting recommendations from a large, constantly evolving group of friends who share your tastes and preferences.

Let's create a basic example of user-user collaborative filtering using Python. For simplicity, we'll use a small dataset of users and their ratings: first we create the dataset, then we calculate the similarity between users, and finally we predict a rating for a movie that a user has not rated yet.

Step 1: Create a Simple Dataset
We'll use a Python dictionary to represent user ratings for different movies.

In [1]:
ratings = {
    'User1': {'Matrix': 5, 'Inception': 3, 'Avengers': 2},
    'User2': {'Matrix': 3, 'Inception': 5, 'Avengers': 1},
    'User3': {'Matrix': 4, 'Inception': 4, 'Avengers': 3},
    'User4': {'Inception': 4, 'Avengers': 5}
}


Step 2: Calculate Similarity
We'll use cosine similarity to measure the similarity between users, computed only over the movies that both users have rated. Let's define a function to calculate this.

In [2]:
import numpy as np

def cosine_similarity(user1, user2):
    # Compare the two users only on the movies both of them have rated
    common_movies = set(ratings[user1]).intersection(set(ratings[user2]))
    if not common_movies:
        return 0  # no overlap, so treat the users as unrelated

    user1_ratings = np.array([ratings[user1][movie] for movie in common_movies])
    user2_ratings = np.array([ratings[user2][movie] for movie in common_movies])

    # Cosine of the angle between the two rating vectors
    return np.dot(user1_ratings, user2_ratings) / (np.linalg.norm(user1_ratings) * np.linalg.norm(user2_ratings))
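
As a worked check of the function, here is the arithmetic for User4 and User1 from the Step 1 dataset. They share two movies, 'Inception' and 'Avengers', which User4 rated (4, 5) and User1 rated (3, 2):

```latex
\mathrm{sim}(\text{User4}, \text{User1})
  = \frac{4 \cdot 3 + 5 \cdot 2}{\sqrt{4^{2} + 5^{2}}\,\sqrt{3^{2} + 2^{2}}}
  = \frac{22}{\sqrt{41}\,\sqrt{13}}
  \approx 0.953
```

The same calculation gives roughly 0.766 for User4 and User2, and roughly 0.968 for User4 and User3. Because all ratings are positive, the values stay fairly close to 1 even when two users disagree on a movie (User4 and User1 rate 'Avengers' 5 and 2).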


Step 3: Predict Ratings
Now, we'll write a function to predict ratings for a user. The prediction is a similarity-weighted average of the ratings given by every other user who has rated the movie, so more similar users count for more.

In [3]:
def predict_rating(user, movie):
    total_similarity = 0
    weighted_ratings = 0

    # Accumulate similarity-weighted ratings from every other user who rated the movie
    for other_user in ratings.keys():
        if other_user != user and movie in ratings[other_user]:
            similarity = cosine_similarity(user, other_user)
            total_similarity += similarity
            weighted_ratings += similarity * ratings[other_user][movie]

    # No usable neighbors: return 0 as a "no prediction" placeholder
    if total_similarity == 0:
        return 0

    return weighted_ratings / total_similarity
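
The function above uses every other user who rated the movie. The "neighborhood selection" idea described earlier can be sketched by keeping only the k most similar of those users. This is a minimal sketch under assumptions: the names `cosine_sim` and `predict_rating_knn` and the parameter `k` are hypothetical, and the Step 1 dataset is repeated so the block runs on its own.

```python
import numpy as np

# Same toy data as Step 1, repeated so this block is self-contained.
ratings = {
    'User1': {'Matrix': 5, 'Inception': 3, 'Avengers': 2},
    'User2': {'Matrix': 3, 'Inception': 5, 'Avengers': 1},
    'User3': {'Matrix': 4, 'Inception': 4, 'Avengers': 3},
    'User4': {'Inception': 4, 'Avengers': 5},
}

def cosine_sim(u, v, data):
    """Cosine similarity over the movies both users have rated."""
    common = sorted(set(data[u]) & set(data[v]))
    if not common:
        return 0.0
    a = np.array([data[u][m] for m in common], dtype=float)
    b = np.array([data[v][m] for m in common], dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict_rating_knn(user, movie, data, k=2):
    """Weighted average over the k most similar users who rated `movie`."""
    candidates = [
        (cosine_sim(user, other, data), data[other][movie])
        for other in data
        if other != user and movie in data[other]
    ]
    # Keep only the k neighbors with the highest similarity.
    neighbors = sorted(candidates, key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(sim for sim, _ in neighbors)
    if total == 0:
        return None  # no usable neighbors, so no prediction
    return sum(sim * rating for sim, rating in neighbors) / total

knn_prediction = predict_rating_knn('User4', 'Matrix', ratings, k=2)
print(f"k=2 prediction for User4 on 'Matrix': {knn_prediction:.2f}")
```

With k=2 only User3 and User1 contribute, so User2's lower rating of 'Matrix' no longer pulls the estimate down. On real data, k is usually tuned against held-out ratings.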


Step 4: Run the Example
Let's predict the rating for User4 for the movie "Matrix".

In [4]:
predicted_rating = predict_rating('User4', 'Matrix')
print(f"Predicted rating for User4 for 'Matrix': {predicted_rating}")


Predicted rating for User4 for 'Matrix': 4.069678945992079
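
Predicting one rating is only part of the pipeline; the recommendation step ranks all movies a user has not rated by their predicted rating. The sketch below illustrates that step under assumptions: the function names are hypothetical, and the movie 'Up' with its ratings is invented purely so that there is more than one candidate to rank.

```python
import numpy as np

# Step 1 ratings plus a hypothetical movie, 'Up', added only for illustration.
ratings = {
    'User1': {'Matrix': 5, 'Inception': 3, 'Avengers': 2, 'Up': 4},
    'User2': {'Matrix': 3, 'Inception': 5, 'Avengers': 1, 'Up': 2},
    'User3': {'Matrix': 4, 'Inception': 4, 'Avengers': 3, 'Up': 5},
    'User4': {'Inception': 4, 'Avengers': 5},
}

def cosine_sim(u, v, data):
    """Cosine similarity over the movies both users have rated."""
    common = sorted(set(data[u]) & set(data[v]))
    if not common:
        return 0.0
    a = np.array([data[u][m] for m in common], dtype=float)
    b = np.array([data[v][m] for m in common], dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict(user, movie, data):
    """Similarity-weighted average of other users' ratings for `movie`."""
    num = den = 0.0
    for other, their in data.items():
        if other != user and movie in their:
            s = cosine_sim(user, other, data)
            num += s * their[movie]
            den += s
    return num / den if den else None

def recommend(user, data, n=2):
    """Return up to n unrated movies, highest predicted rating first."""
    all_movies = {m for their in data.values() for m in their}
    unseen = all_movies - set(data[user])
    scored = [(m, predict(user, m, data)) for m in unseen]
    scored = [(m, p) for m, p in scored if p is not None]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]

recommendations = recommend('User4', ratings)
for movie, score in recommendations:
    print(f"{movie}: {score:.2f}")
```

For User4 this ranks 'Matrix' ahead of 'Up', since the users most similar to User4 rated 'Matrix' higher on balance.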


### Example Using Test and Train Data

In [5]:
import random

# Expanded dataset
ratings = {
    'User1': {'Matrix': 5, 'Inception': 3, 'Avengers': 2, 'Toy Story': 4},
    'User2': {'Matrix': 3, 'Inception': 5, 'Avengers': 1, 'Toy Story': 3},
    'User3': {'Matrix': 4, 'Inception': 4, 'Avengers': 3, 'Toy Story': 5},
    'User4': {'Inception': 4, 'Avengers': 5, 'Toy Story': 2},
    'User5': {'Matrix': 3, 'Avengers': 4, 'Toy Story': 3}
}

# Function to split data into train and test sets
def split_data(data, test_ratio=0.2):
    train_data = {}
    test_data = {}

    for user, movies in data.items():
        train_data[user] = {}
        test_data[user] = {}

        for movie, rating in movies.items():
            if random.random() < test_ratio:
                test_data[user][movie] = rating
            else:
                train_data[user][movie] = rating

    return train_data, test_data

# No random seed is set, so the split (and the error reported below) changes from run to run
train_ratings, test_ratings = split_data(ratings)


In [6]:
def cosine_similarity(user1, user2, data):
    common_movies = set(data[user1]).intersection(set(data[user2]))
    if not common_movies:
        return 0

    user1_ratings = np.array([data[user1][movie] for movie in common_movies])
    user2_ratings = np.array([data[user2][movie] for movie in common_movies])

    return np.dot(user1_ratings, user2_ratings) / (np.linalg.norm(user1_ratings) * np.linalg.norm(user2_ratings))


In [7]:
def predict_rating(user, movie, train_data):
    total_similarity = 0
    weighted_ratings = 0

    for other_user in train_data.keys():
        if other_user != user and movie in train_data[other_user]:
            similarity = cosine_similarity(user, other_user, train_data)
            total_similarity += similarity
            weighted_ratings += similarity * train_data[other_user][movie]

    if total_similarity == 0:
        return 0

    return weighted_ratings / total_similarity
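
Returning 0 when no similar user is available produces a value outside the 1 to 5 rating scale, which inflates the measured error for cold-start users. One possible alternative, sketched here as an assumption rather than as part of the original example, is to fall back to the movie's average rating and then to the overall average. The dataset, `NewUser`, and the function names are hypothetical.

```python
import numpy as np

# Hypothetical training data; 'NewUser' has no ratings yet (cold start).
train = {
    'User1': {'Matrix': 5, 'Inception': 3},
    'User2': {'Matrix': 2, 'Inception': 5},
    'NewUser': {},
}

def cosine_sim(u, v, data):
    """Cosine similarity over the movies both users have rated."""
    common = sorted(set(data[u]) & set(data[v]))
    if not common:
        return 0.0
    a = np.array([data[u][m] for m in common], dtype=float)
    b = np.array([data[v][m] for m in common], dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict_with_fallback(user, movie, data):
    """Weighted-average prediction with mean-based fallbacks."""
    num = den = 0.0
    for other, their in data.items():
        if other != user and movie in their:
            s = cosine_sim(user, other, data)
            num += s * their[movie]
            den += s
    if den > 0:
        return num / den
    # Fallback 1: average rating of this movie across all users.
    movie_ratings = [their[movie] for their in data.values() if movie in their]
    if movie_ratings:
        return float(np.mean(movie_ratings))
    # Fallback 2: average of every rating in the data.
    all_ratings = [r for their in data.values() for r in their.values()]
    return float(np.mean(all_ratings)) if all_ratings else None

print(predict_with_fallback('NewUser', 'Matrix', train))    # movie average
print(predict_with_fallback('NewUser', 'Avengers', train))  # overall average
```

The fallbacks ignore personal taste entirely, so they are a baseline for users or movies with too little data, not a replacement for the similarity-based prediction.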


In [8]:
def evaluate_model(train_data, test_data):
    # Mean absolute error (MAE) between predicted and held-out ratings
    total_error = 0
    count = 0

    for user in test_data.keys():
        for movie, actual_rating in test_data[user].items():
            predicted_rating = predict_rating(user, movie, train_data)
            total_error += abs(predicted_rating - actual_rating)
            count += 1

    return total_error / count if count else 0

error = evaluate_model(train_ratings, test_ratings)
print(f"Average prediction error: {error}")


Average prediction error: 1.0032964351234455
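
On a dataset this small, the random split leaves only a handful of test ratings, and the reported error changes between runs. A deterministic alternative is leave-one-out evaluation: hide one rating at a time, predict it from all the others, and average the absolute errors. The following is a sketch under assumptions (the function names are hypothetical); it reuses the expanded dataset from above.

```python
import numpy as np

# Expanded dataset from above, repeated so this block is self-contained.
ratings = {
    'User1': {'Matrix': 5, 'Inception': 3, 'Avengers': 2, 'Toy Story': 4},
    'User2': {'Matrix': 3, 'Inception': 5, 'Avengers': 1, 'Toy Story': 3},
    'User3': {'Matrix': 4, 'Inception': 4, 'Avengers': 3, 'Toy Story': 5},
    'User4': {'Inception': 4, 'Avengers': 5, 'Toy Story': 2},
    'User5': {'Matrix': 3, 'Avengers': 4, 'Toy Story': 3},
}

def cosine_sim(u, v, data):
    """Cosine similarity over the movies both users have rated."""
    common = sorted(set(data[u]) & set(data[v]))
    if not common:
        return 0.0
    a = np.array([data[u][m] for m in common], dtype=float)
    b = np.array([data[v][m] for m in common], dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict(user, movie, data):
    """Similarity-weighted average of other users' ratings for `movie`."""
    num = den = 0.0
    for other, their in data.items():
        if other != user and movie in their:
            s = cosine_sim(user, other, data)
            num += s * their[movie]
            den += s
    return num / den if den else None

def leave_one_out_mae(data):
    """Hide each rating in turn, predict it, and average the absolute errors."""
    errors = []
    for user, movies in data.items():
        for movie, actual in movies.items():
            # Copy the data and remove only the rating being tested.
            held_out = {u: dict(m) for u, m in data.items()}
            del held_out[user][movie]
            predicted = predict(user, movie, held_out)
            if predicted is not None:  # skip ratings that cannot be predicted
                errors.append(abs(predicted - actual))
    return sum(errors) / len(errors) if errors else float('nan')

mae = leave_one_out_mae(ratings)
print(f"Leave-one-out MAE: {mae:.3f}")
```

Every rating serves as a test case exactly once, so the result is the same on every run. The cost grows with the number of ratings, which makes this practical mainly for small datasets.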


### Example Using sklearn

In [9]:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Sample dataset
ratings = {
    'User1': {'Matrix': 5, 'Inception': 3, 'Avengers': 2, 'Toy Story': 4},
    'User2': {'Matrix': 3, 'Inception': 5, 'Avengers': 1, 'Toy Story': 3},
    'User3': {'Matrix': 4, 'Inception': 4, 'Avengers': 3, 'Toy Story': 5},
    'User4': {'Inception': 4, 'Avengers': 5, 'Toy Story': 2},
    'User5': {'Matrix': 3, 'Avengers': 4, 'Toy Story': 3}
}

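# Convert the ratings dictionary to a user-item matrix (missing ratings are stored as 0).
# Movies are sorted alphabetically, so the columns are: Avengers, Inception, Matrix, Toy Story.
movies = sorted({movie for user_ratings in ratings.values() for movie in user_ratings})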
user_ids = sorted(ratings.keys())
user_item_matrix = np.zeros((len(user_ids), len(movies)))

for i, user in enumerate(user_ids):
    for j, movie in enumerate(movies):
        user_item_matrix[i, j] = ratings[user].get(movie, 0)


In [10]:
similarity_matrix = cosine_similarity(user_item_matrix)


In [11]:
def predict_rating(user_index, movie_index):
    # Weights are the similarities of other users to the target user.
    # Copy the row so that zeroing the self-similarity below does not modify similarity_matrix.
    weights = similarity_matrix[user_index].copy()

    # Ratings are the ratings of all users for the target movie (0 where a user has not rated it)
    movie_ratings = user_item_matrix[:, movie_index]

    # Exclude the target user from the calculation
    weights[user_index] = 0

    weighted_sum = np.dot(weights, movie_ratings)
    sum_of_weights = np.sum(np.abs(weights))

    if sum_of_weights == 0:
        return 0

    return weighted_sum / sum_of_weights

# Example: Predict the rating for User1 (index 0) for 'Matrix' (index 2).
# Columns follow the sorted movie list, so index 2 is 'Matrix', not 'Avengers'.
user_index = user_ids.index('User1')  # 0
movie_index = movies.index('Matrix')  # 2
predicted_rating = predict_rating(user_index, movie_index)
print(f"Predicted rating for User1 for 'Matrix': {predicted_rating}")


Predicted rating for User1 for 'Matrix': 2.7411199529949077
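
Because missing ratings are stored as 0 in `user_item_matrix`, a user who has not rated a movie still contributes a rating of 0 to the weighted average and pulls the prediction down. A possible refinement, sketched here as an assumption and not as part of the original example, is to mask out users who have not rated the target movie. To keep the block dependency-light, the similarity matrix is computed with NumPy directly; on the same zero-filled matrix this should match what sklearn's `cosine_similarity` returns. The names `R`, `S`, and `predict_masked` are hypothetical.

```python
import numpy as np

movies = ['Avengers', 'Inception', 'Matrix', 'Toy Story']
# Rows are User1..User5; 0 marks a missing rating, as in the matrix above.
R = np.array([
    [2, 3, 5, 4],
    [1, 5, 3, 3],
    [3, 4, 4, 5],
    [5, 4, 0, 2],
    [4, 0, 3, 3],
], dtype=float)

# Cosine similarity between every pair of rows.
norms = np.linalg.norm(R, axis=1, keepdims=True)
S = (R @ R.T) / (norms @ norms.T)

def predict_masked(user_index, movie_index):
    """Weighted average that ignores users with no rating for the movie."""
    weights = S[user_index].copy()      # copy so S is left untouched
    weights[user_index] = 0.0           # exclude the target user
    rated = R[:, movie_index] > 0       # True where a real rating exists
    weights = weights * rated           # zero out users who did not rate it
    denom = np.abs(weights).sum()
    if denom == 0:
        return None
    return float(weights @ R[:, movie_index] / denom)

matrix_col = movies.index('Matrix')
print(f"User1, 'Matrix', masked: {predict_masked(0, matrix_col):.2f}")
print(f"User4, 'Matrix', masked: {predict_masked(3, matrix_col):.2f}")
```

For User1 and 'Matrix', masking raises the estimate from about 2.74 to about 3.36, because User4's missing rating no longer counts as a 0. The similarities themselves are still computed on the zero-filled matrix, so missing ratings continue to influence the weights; handling that properly would need similarities restricted to co-rated movies, as in the dictionary-based version earlier.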
