
# Movie Recommendation System

## Description  
The goal of this project is to build a **movie recommendation system** using the *MovieLens 100K Dataset* (obtained from Kaggle).  
The system recommends movies to users based on **user similarity**, compares this approach with item-based and SVD-based alternatives, and evaluates the quality of the recommendations.  

### Steps:  
- Load and explore the **MovieLens 100K Dataset**  
- Build a **user-item matrix** and compute similarity scores between users  
- Recommend **top-rated unseen movies** for a given user  
- Evaluate recommendation performance using **Precision@K**  
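
The first two steps can be illustrated on a toy dataset. The sketch below is a minimal example, assuming made-up ratings rather than MovieLens data; the names `toy_ratings`, `toy_matrix`, and `toy_user_sim` are hypothetical and only the column names follow the notebook.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Toy ratings in long format (hypothetical data, not from MovieLens).
toy_ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 3],
    "movieId": [10, 20, 10, 20, 30],
    "rating":  [5, 3, 5, 3, 4],
})

# Rows are users, columns are movies; unrated cells become 0.
toy_matrix = toy_ratings.pivot(index="userId", columns="movieId", values="rating").fillna(0)

# Cosine similarity between every pair of user rows.
toy_user_sim = pd.DataFrame(
    cosine_similarity(toy_matrix),
    index=toy_matrix.index,
    columns=toy_matrix.index,
)
print(toy_user_sim.round(3))
```

Users 1 and 2 rate the same movies identically, so their similarity is 1, while user 3 shares no movies with them and gets a similarity of 0.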

---

## Tools & Libraries  
- Python  
- Pandas  
- NumPy  
- scikit-learn  

---

## Covered Topics  
- Recommendation Systems  
- Similarity-based Modeling  
- User-based Collaborative Filtering  



In [None]:
from sklearn.decomposition import TruncatedSVD

svd = TruncatedSVD(n_components=50, random_state=42)
user_item_matrix_svd = svd.fit_transform(train_user_item_matrix)

display(user_item_matrix_svd[:5, :5])

array([[33.04650187, -2.71374814,  0.90867445, 12.16879492, -0.71087954],
       [ 7.69321419,  9.62122904,  9.57396918, -3.00153916, -2.06813183],
       [ 2.79902734,  5.11982133,  4.93111308, -1.91145275,  6.8383073 ],
       [ 2.89057423,  4.44516779,  2.69968603, -1.92862262,  4.23165024],
       [16.64896274, -0.44430783, -7.91063368,  2.05571069, -0.62322442]])

In [None]:
user_item_matrix = ratings_df.pivot(index='userId', columns='movieId', values='rating').fillna(0)
display(user_item_matrix.head())

userId \ movieId,1,2,3,4,5,6,7,8,9,10,...,1673,1674,1675,1676,1677,1678,1679,1680,1681,1682
1,5.0,3.0,4.0,3.0,3.0,5.0,4.0,1.0,5.0,3.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,4.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [None]:
import pandas as pd

ratings_df = pd.read_csv('/content/u.data', sep='\t', header=None, names=['userId', 'movieId', 'rating', 'timestamp'])

display(ratings_df.head())

,userId,movieId,rating,timestamp
0,196,242,3,881250949
1,186,302,3,891717742
2,22,377,1,878887116
3,244,51,2,880606923
4,166,346,1,886397596


### Subtask:
Recommend top-rated unseen movies for a given user based on the SVD latent factors.


In [None]:
def recommend_movies_svd(user_id, user_item_matrix, user_item_matrix_svd, svd_model, k=10):
    """
    Recommends top-rated unseen movies for a given user based on SVD results.

    Args:
        user_id (int): The ID of the target user.
        user_item_matrix (pd.DataFrame): The original user-item matrix.
        user_item_matrix_svd (np.ndarray): The SVD transformed user-item matrix.
        svd_model (TruncatedSVD): The fitted SVD model.
        k (int): The number of recommendations to return.

    Returns:
        list: A list of recommended item IDs.
    """
    if user_id not in user_item_matrix.index:
        return []

    user_index = user_item_matrix.index.get_loc(user_id)

    user_latent_factors = user_item_matrix_svd[user_index]

    item_latent_factors = svd_model.components_.T

    predicted_ratings = user_latent_factors.dot(item_latent_factors.T)

    predicted_ratings_series = pd.Series(predicted_ratings, index=user_item_matrix.columns)

    target_user_original_ratings = user_item_matrix.loc[user_id]

    unseen_items = target_user_original_ratings[target_user_original_ratings == 0].index

    predicted_ratings_unseen = predicted_ratings_series.loc[unseen_items]

    recommended_items = predicted_ratings_unseen.sort_values(ascending=False).index.tolist()

    return recommended_items[:k]

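user_id_to_recommend = 15
k = 10
# NOTE: user_item_matrix must be the same matrix the SVD was fitted on, so that its
# rows and columns line up with user_item_matrix_svd and svd.components_.
recommendations_svd = recommend_movies_svd(user_id_to_recommend, user_item_matrix, user_item_matrix_svd, svd, k=k)
print(f"Top {k} recommendations for user {user_id_to_recommend} (SVD): {recommendations_svd}")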

Top 10 recommendations for user 15 (SVD): [304, 284, 597, 242, 116, 100, 756, 245, 124, 126]
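
The SVD recommender above scores items by multiplying a user's latent factors with `svd.components_`. The sketch below reproduces that step on a tiny matrix; the data and every `toy_*` name are hypothetical, and two components are chosen purely for illustration.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Hypothetical 4-user x 5-item rating matrix; 0 means "not rated".
toy = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 0, 0, 1],
    [0, 1, 5, 4, 0],
    [1, 0, 4, 5, 0],
], dtype=float)

toy_svd = TruncatedSVD(n_components=2, random_state=0)
toy_user_factors = toy_svd.fit_transform(toy)   # shape (4, 2)
toy_item_factors = toy_svd.components_          # shape (2, 5)

# Low-rank scores for every (user, item) pair.
toy_scores = toy_user_factors @ toy_item_factors  # shape (4, 5)

# Rank only the items that user 0 has not rated, best score first.
unseen = np.where(toy[0] == 0)[0]
ranked_unseen = unseen[np.argsort(-toy_scores[0, unseen])]
print(ranked_unseen)
```

Because the zeros are treated as actual ratings during fitting, the scores are best read as a ranking signal rather than as predicted star ratings.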


### Subtask:
Evaluate performance using precision at K for SVD-based recommendations.

**Reasoning**:
Define the function to calculate precision at K and then use it to evaluate the SVD recommendations.

In [None]:
# Assumptions for the SVD evaluation
#
# The following names are assumed to be defined by earlier cells:
#
# - test_relevant_items_dict: dict mapping each user_id to the list of movie_ids that user rated in the test set.
# - train_user_item_matrix: the user-item matrix built from the training data.
# - test_user_item_matrix: the user-item matrix built from the test data (only its index is used here).
# - user_item_matrix_svd: the SVD-transformed training user-item matrix.
# - svd: the fitted TruncatedSVD model used to reduce the dimensionality of the user-item matrix.
# - k: the number of recommendations considered per user.


def precision_at_k(recommended_items, relevant_items, k):
    """
    Calculates Precision at K.

    Args:
        recommended_items (list): A list of recommended item IDs.
        relevant_items (list): A list of relevant item IDs.
        k (int): The number of recommendations considered.

    Returns:
        float: The precision at K score.
    """
    set_recommended = set(recommended_items[:k])
    set_relevant = set(relevant_items)
    hit_items = set_recommended.intersection(set_relevant)
    return len(hit_items) / k if k > 0 else 0.0

total_precision_svd = 0
num_users_with_recommendations_svd = 0

for user_id in test_user_item_matrix.index:
    relevant_items = test_relevant_items_dict.get(user_id, [])

    if relevant_items:
        recommendations = recommend_movies_svd(user_id, train_user_item_matrix, user_item_matrix_svd, svd, k=k)

        if recommendations:
            precision = precision_at_k(recommendations, relevant_items, k)
            total_precision_svd += precision
            num_users_with_recommendations_svd += 1

average_precision_at_k_svd = total_precision_svd / num_users_with_recommendations_svd if num_users_with_recommendations_svd > 0 else 0

print(f"Average Precision at K (SVD)@{k}: {average_precision_at_k_svd}")

Average Precision at K (SVD)@10: 0.3038297872340417
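
To make the Precision@K computation above concrete, here is a small worked example. The item IDs are hypothetical, and `toy_precision_at_k` is an illustrative stand-in that mirrors the logic of `precision_at_k`.

```python
def toy_precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that appear in `relevant`."""
    if k <= 0:
        return 0.0
    hits = set(recommended[:k]) & set(relevant)
    return len(hits) / k

recommended = [304, 284, 597, 242, 116]  # hypothetical ranked list
relevant = [284, 116, 999]               # hypothetical held-out items

print(toy_precision_at_k(recommended, relevant, k=5))  # 2 hits / 5 = 0.4
print(toy_precision_at_k(recommended, relevant, k=2))  # 1 hit / 2 = 0.5
```

The denominator is always `k`, so a user with fewer than `k` relevant items can never reach a score of 1.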


### Data Splitting for Evaluation

**Reasoning**:
Split the data into training and testing sets to evaluate the recommendation systems. Create training and testing user-item matrices and a dictionary of relevant items for the test set.

In [None]:
from sklearn.model_selection import train_test_split

train_df, test_df = train_test_split(ratings_df, test_size=0.2, random_state=42)
train_user_item_matrix = train_df.pivot(index='userId', columns='movieId', values='rating').fillna(0)
test_user_item_matrix = test_df.pivot(index='userId', columns='movieId', values='rating').fillna(0)
test_relevant_items_dict = test_df.groupby('userId')['movieId'].apply(list).to_dict()

print("Data split and matrices created.")
display(train_user_item_matrix.head())
display(test_user_item_matrix.head())

Data split and matrices created.


userId \ movieId,1,2,3,4,5,6,7,8,9,10,...,1668,1670,1671,1672,1673,1676,1678,1679,1680,1681
1,0.0,3.0,4.0,0.0,3.0,0.0,4.0,0.0,5.0,3.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


userId \ movieId,1,2,3,4,5,6,7,8,9,10,...,1648,1649,1655,1656,1658,1669,1674,1675,1677,1682
1,5.0,0.0,0.0,3.0,0.0,5.0,0.0,1.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
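
In the split above, every movie a user rated in the test set counts as relevant, whatever the rating. The sketch below contrasts that definition with a stricter variant on toy data; the threshold `MIN_RATING` and all `toy_*` and `relevant_*` names are assumptions for illustration and are not part of the notebook.

```python
import pandas as pd

# Hypothetical held-out ratings; column names follow the notebook's ratings_df.
toy_test_df = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2],
    "movieId": [10, 20, 30, 10, 40],
    "rating":  [5, 2, 4, 1, 3],
})

# Same definition as the notebook: every held-out movie counts as relevant.
relevant_all = toy_test_df.groupby("userId")["movieId"].apply(list).to_dict()

# Assumed stricter variant: only held-out ratings >= MIN_RATING are relevant.
MIN_RATING = 4  # illustrative threshold, not taken from the notebook
liked = toy_test_df[toy_test_df["rating"] >= MIN_RATING]
relevant_liked = liked.groupby("userId")["movieId"].apply(list).to_dict()

print(relevant_all)    # {1: [10, 20, 30], 2: [10, 40]}
print(relevant_liked)  # {1: [10, 30]}
```

With the stricter variant, user 2 has no relevant items and would be skipped by an evaluation loop that ignores users without relevant items.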


## Compare and Summarize

**Reasoning**:
Compare the average precision at K scores for each recommendation method and summarize the findings.

In [None]:
total_precision_item_based = 0
num_users_with_recommendations_item_based = 0

for user_id in test_user_item_matrix.index:
    relevant_items = test_relevant_items_dict.get(user_id, [])

    if relevant_items:
        recommendations = recommend_movies_item_based(user_id, train_user_item_matrix, item_similarity_df, k=k)

        if recommendations:
            precision = precision_at_k(recommendations, relevant_items, k)
            total_precision_item_based += precision
            num_users_with_recommendations_item_based += 1

average_precision_at_k_item_based = total_precision_item_based / num_users_with_recommendations_item_based if num_users_with_recommendations_item_based > 0 else 0

print(f"Average Precision at K (Item-Based Collaborative Filtering)@{k}: {average_precision_at_k_item_based}")

Average Precision at K (Item-Based Collaborative Filtering)@10: 0.0010638297872340424
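
The same evaluation loop is repeated for each recommendation method. One possible way to factor it out is sketched below; `mean_precision_at_k` and the toy recommender are hypothetical names, and the helper is only meant to mirror the loop's logic, not to replace the notebook's code.

```python
def mean_precision_at_k(recommend_fn, user_ids, relevant_items_by_user, k=10):
    """Average Precision@K over users with relevant items and non-empty recommendations."""
    scores = []
    for user_id in user_ids:
        relevant = relevant_items_by_user.get(user_id, [])
        if not relevant:
            continue  # no held-out items for this user
        recommended = recommend_fn(user_id)
        if not recommended:
            continue  # the recommender returned nothing for this user
        hits = set(recommended[:k]) & set(relevant)
        scores.append(len(hits) / k)
    return sum(scores) / len(scores) if scores else 0.0

# Toy check with a hypothetical recommender that always returns the same list.
toy_relevant = {1: [10, 20], 2: [30], 3: []}

def toy_recommender(user_id):
    return [10, 30, 50, 70]

print(mean_precision_at_k(toy_recommender, [1, 2, 3], toy_relevant, k=4))  # 0.25
```

With the notebook's objects, the SVD evaluation could then be written as `mean_precision_at_k(lambda u: recommend_movies_svd(u, train_user_item_matrix, user_item_matrix_svd, svd, k=k), test_user_item_matrix.index, test_relevant_items_dict, k=k)`.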


In [None]:
def recommend_movies_item_based(user_id, user_item_matrix, item_similarity_df, k=10):
    """
    Recommends top-rated unseen movies for a given user based on item similarity.

    Args:
        user_id (int): The ID of the target user.
        user_item_matrix (pd.DataFrame): The user-item matrix.
        item_similarity_df (pd.DataFrame): The item similarity matrix.
        k (int): The number of recommendations to return.

    Returns:
        list: A list of recommended item IDs.
    """
    if user_id not in user_item_matrix.index:
        return []

    target_user_ratings = user_item_matrix.loc[user_id]

    unseen_items = target_user_ratings[target_user_ratings == 0].index

    # Items the user has rated; computed once because it does not depend on item_id.
    rated_items = target_user_ratings[target_user_ratings > 0].index

    predicted_ratings = {}
    for item_id in unseen_items:
        item_similarities = item_similarity_df.loc[item_id]

        relevant_items = item_similarities.loc[rated_items]
        relevant_items = relevant_items[relevant_items > 0]

        if not relevant_items.empty:
            ratings_of_relevant_items = target_user_ratings.loc[relevant_items.index]

            sum_of_products = (relevant_items * ratings_of_relevant_items).sum()
            sum_of_similarities = relevant_items.sum()

            if sum_of_similarities > 0:
                predicted_ratings[item_id] = sum_of_products / sum_of_similarities

    recommended_items = sorted(predicted_ratings.items(), key=lambda item: item[1], reverse=True)

    return [item[0] for item in recommended_items[:k]]

user_id_to_recommend = 15
recommendations_item_based = recommend_movies_item_based(user_id_to_recommend, user_item_matrix, item_similarity_df, k=k)
print(f"Top {k} recommendations for user {user_id_to_recommend} (Item-Based): {recommendations_item_based}")

Top 10 recommendations for user 15 (Item-Based): [1546, 1548, 1557, 1559, 1561, 1562, 1563, 1564, 1565, 1566]


In [None]:
from sklearn.metrics.pairwise import cosine_similarity

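# NOTE: the similarities are computed from the full user_item_matrix, which also contains
# the ratings held out as test data. Use train_user_item_matrix.T for a leak-free evaluation.
item_user_matrix = user_item_matrix.T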
item_similarity_matrix = cosine_similarity(item_user_matrix)
item_similarity_df = pd.DataFrame(item_similarity_matrix, index=item_user_matrix.index, columns=item_user_matrix.index)
display(item_similarity_df.head())

movieId,1,2,3,4,5,6,7,8,9,10,...,1673,1674,1675,1676,1677,1678,1679,1680,1681,1682
1,1.0,0.402382,0.330245,0.454938,0.286714,0.116344,0.620979,0.481114,0.496288,0.273935,...,0.035387,0.0,0.0,0.0,0.035387,0.0,0.0,0.0,0.047183,0.047183
2,0.402382,1.0,0.273069,0.502571,0.318836,0.083563,0.383403,0.337002,0.255252,0.171082,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.078299,0.078299
3,0.330245,0.273069,1.0,0.324866,0.212957,0.106722,0.372921,0.200794,0.273669,0.158104,...,0.0,0.0,0.0,0.0,0.032292,0.0,0.0,0.0,0.0,0.096875
4,0.454938,0.502571,0.324866,1.0,0.334239,0.090308,0.489283,0.490236,0.419044,0.252561,...,0.0,0.0,0.094022,0.094022,0.037609,0.0,0.0,0.0,0.056413,0.075218
5,0.286714,0.318836,0.212957,0.334239,1.0,0.037299,0.334769,0.259161,0.272448,0.055453,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.094211


In [None]:
# Assumptions for the user-based evaluation
#
# The following names are assumed to be defined by earlier cells:
#
# - test_relevant_items_dict: dict mapping each user_id to the list of movie_ids that user rated in the test set.
# - train_user_item_matrix: the user-item matrix built from the training data.
# - test_user_item_matrix: the user-item matrix built from the test data (only its index is used here).
# - user_similarity_df: the user similarity matrix, where each entry is the similarity between two users.
# - recommend_movies_user_based, precision_at_k, and k: the recommender, the metric, and the cutoff.


total_precision_user_based = 0
num_users_with_recommendations_user_based = 0

for user_id in test_user_item_matrix.index:
    relevant_items = test_relevant_items_dict.get(user_id, [])

    if relevant_items:
        recommendations = recommend_movies_user_based(user_id, train_user_item_matrix, user_similarity_df, k=k)

        if recommendations:
            precision = precision_at_k(recommendations, relevant_items, k)
            total_precision_user_based += precision
            num_users_with_recommendations_user_based += 1

average_precision_at_k_user_based = total_precision_user_based / num_users_with_recommendations_user_based if num_users_with_recommendations_user_based > 0 else 0

print(f"Average Precision at K (User-Based Collaborative Filtering)@{k}: {average_precision_at_k_user_based}")

Average Precision at K (User-Based Collaborative Filtering)@10: 0.0010638297872340426


## Implement User-Based Collaborative Filtering


In [None]:
from sklearn.metrics.pairwise import cosine_similarity

user_similarity_matrix = cosine_similarity(train_user_item_matrix)
user_similarity_df = pd.DataFrame(user_similarity_matrix, index=train_user_item_matrix.index, columns=train_user_item_matrix.index)

display(user_similarity_df.head())

userId,1,2,3,4,5,6,7,8,9,10,...,934,935,936,937,938,939,940,941,942,943
1,1.0,0.136196,0.030424,0.026203,0.284613,0.331412,0.319056,0.274139,0.083486,0.281396,...,0.277459,0.084849,0.205849,0.144161,0.133679,0.092367,0.216948,0.084181,0.104599,0.329288
2,0.136196,1.0,0.114644,0.16822,0.093128,0.162165,0.095848,0.09136,0.149476,0.125701,...,0.149359,0.268977,0.320095,0.323347,0.241012,0.152655,0.230951,0.117484,0.166632,0.096719
3,0.030424,0.114644,1.0,0.346894,0.0,0.085071,0.032829,0.053875,0.060177,0.052552,...,0.021713,0.017707,0.154299,0.049358,0.107604,0.019022,0.101207,0.021959,0.127179,0.013805
4,0.026203,0.16822,0.346894,1.0,0.011848,0.051287,0.075209,0.1421,0.060465,0.035202,...,0.034908,0.04448,0.087428,0.118082,0.100612,0.0,0.151086,0.110324,0.112342,0.032367
5,0.284613,0.093128,0.0,0.011848,1.0,0.168527,0.298438,0.18529,0.039737,0.166013,...,0.276012,0.103529,0.085547,0.072429,0.104445,0.049198,0.204472,0.148028,0.099978,0.247527


### Subtask:
Recommend top-rated unseen movies for a given user based on user similarity.

**Reasoning**:
Define the function to recommend top-rated unseen movies for a given user based on user similarity.

In [None]:
def recommend_movies_user_based(user_id, user_item_matrix, user_similarity_df, k=10):
    """
    Recommends top-rated unseen movies for a given user based on user similarity.

    Args:
        user_id (int): The ID of the target user.
        user_item_matrix (pd.DataFrame): The user-item matrix.
        user_similarity_df (pd.DataFrame): The user similarity matrix.
        k (int): The number of recommendations to return.

    Returns:
        list: A list of recommended item IDs.
    """
    if user_id not in user_item_matrix.index:
        return []

    target_user_ratings = user_item_matrix.loc[user_id]

    unseen_items = target_user_ratings[target_user_ratings == 0].index

    # Similarities of the target user to all users; constant across items.
    user_similarities = user_similarity_df.loc[user_id]

    predicted_ratings = {}
    for item_id in unseen_items:
        item_ratings_by_other_users = user_item_matrix[item_id]
        relevant_users = item_ratings_by_other_users[item_ratings_by_other_users > 0].index
        relevant_users = relevant_users.intersection(user_similarities[user_similarities > 0].index)

        if not relevant_users.empty:
            ratings_by_relevant_users = item_ratings_by_other_users.loc[relevant_users]
            similarities_with_relevant_users = user_similarities.loc[relevant_users]
            sum_of_products = (similarities_with_relevant_users * ratings_by_relevant_users).sum()
            sum_of_similarities = similarities_with_relevant_users.sum()

            if sum_of_similarities > 0:
                predicted_ratings[item_id] = sum_of_products / sum_of_similarities

    recommended_items = sorted(predicted_ratings.items(), key=lambda item: item[1], reverse=True)
    return [item[0] for item in recommended_items[:k]]

user_id_to_recommend = 15
recommendations_user_based = recommend_movies_user_based(user_id_to_recommend, train_user_item_matrix, user_similarity_df, k=k)
print(f"Top {k} recommendations for user {user_id_to_recommend} (User-Based): {recommendations_user_based}")

Top 10 recommendations for user 15 (User-Based): [850, 1189, 1201, 1293, 1306, 1467, 1500, 1612, 1629, 1642]
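
The user-based predictor above computes a similarity-weighted average over the neighbours who rated each item. The toy sketch below, with hypothetical neighbours `u1` to `u3` and made-up values, shows one consequence of that normalisation: a movie rated by a single weakly similar neighbour can outrank a movie that several close neighbours rated.

```python
import pandas as pd

# Hypothetical similarities between a target user and three neighbours.
similarities = pd.Series({"u1": 0.9, "u2": 0.8, "u3": 0.1})

# Neighbour ratings for two unseen movies; 0 means "not rated".
ratings = pd.DataFrame(
    {"popular": [4, 4, 0], "obscure": [0, 0, 5]},
    index=["u1", "u2", "u3"],
)

def weighted_average(item):
    """Similarity-weighted mean rating over the neighbours who rated `item`."""
    raters = ratings.index[ratings[item] > 0]
    weights = similarities.loc[raters]
    return (weights * ratings.loc[raters, item]).sum() / weights.sum()

score_popular = weighted_average("popular")  # two close neighbours rated it 4
score_obscure = weighted_average("obscure")  # one distant neighbour rated it 5
print(round(score_popular, 3), round(score_obscure, 3))
```

Here the obscure movie scores about 5 and the popular one about 4, because dividing by the sum of similarities removes any effect of how many neighbours, or how similar ones, contributed.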


In [None]:
print(f"Average Precision at K (User-Based Collaborative Filtering)@{k}: {average_precision_at_k_user_based}")
print(f"Average Precision at K (Item-Based Collaborative Filtering)@{k}: {average_precision_at_k_item_based}")
print(f"Average Precision at K (SVD)@{k}: {average_precision_at_k_svd}")


Average Precision at K (User-Based Collaborative Filtering)@10: 0.0010638297872340426
Average Precision at K (Item-Based Collaborative Filtering)@10: 0.0010638297872340424
Average Precision at K (SVD)@10: 0.3038297872340417


## Summary and Conclusion

summary_recommendation = f"""
## Movie Recommendation System – Summary and Conclusion

Based on the average **Precision at K (P@10)** scores:

- **SVD-based Matrix Factorization:** {average_precision_at_k_svd:.4f}  
- **User-Based Collaborative Filtering:** {average_precision_at_k_user_based:.4f}  
- **Item-Based Collaborative Filtering:** {average_precision_at_k_item_based:.4f}  

In this implementation, the **SVD-based matrix factorization** model performed significantly better than both **user-based** and **item-based collaborative filtering** for recommending relevant movies within the top 10 recommendations.

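This suggests that capturing **latent factors** of users and items via SVD identifies relevant movies more effectively here than the similarity-weighted averages do. The similarity-based top-10 lists are dominated by high-numbered movie IDs, so part of the gap may come from how those methods score sparsely rated movies rather than from similarity itself.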

**Potential Improvements:**  
- Tune the number of components in SVD  
- Try other matrix factorization methods  
- Incorporate additional features (e.g., movie genres, release dates)  
- Use additional evaluation metrics (e.g., Recall@K, NDCG)
"""

from IPython.display import Markdown, display
display(Markdown(summary_recommendation))
