<a href="https://colab.research.google.com/github/shstreuber/Data-Mining/blob/master/Module3a_Recommender_System.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **What is a Recommender System?**
A recommender system is a type of information filtering system that provides suggestions for items (products, movies, music, etc.) that are likely to be of interest to a user. These systems analyze patterns of user behavior and relationships between items to predict what the user might like.

Recommender systems are widely used in e-commerce (e.g., Amazon suggesting products), streaming services (e.g., Netflix recommending movies), and social media (e.g., Facebook suggesting friends or posts).

There are different approaches to building recommender systems, including:

* Collaborative Filtering: Recommends items based on the preferences of similar users.
* Content-Based Filtering: Recommends items similar to those the user has liked in the past.
* Hybrid Approaches: Combine collaborative and content-based filtering.
<center>
<img src = "https://www.nvidia.com/content/dam/en-zz/Solutions/glossary/data-science/recommendation-system/img-2.png" height = 400>
</center>

For **MORE INFORMATION**, check out [this explanation](https://www.nvidia.com/en-us/glossary/recommendation-system/).
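
To make the contrast between the approaches concrete, here is a minimal sketch of content-based filtering. The feature columns (`is_portable`, `has_keyboard`, `makes_calls`) and the choice of a liked product are invented for illustration and are not part of the dataset used in the example that follows.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical item features (assumed for illustration only)
item_features = pd.DataFrame(
    {
        'is_portable':  [1, 1, 1],
        'has_keyboard': [1, 0, 0],
        'makes_calls':  [0, 1, 0],
    },
    index=['Laptop', 'Smartphone', 'Tablet'],
)

# Item-item similarity computed from the features alone (no ratings involved)
item_similarity = pd.DataFrame(
    cosine_similarity(item_features),
    index=item_features.index,
    columns=item_features.index,
)

# Suppose the user liked the Smartphone: rank the other items by similarity to it
liked = 'Smartphone'
content_based = item_similarity[liked].drop(liked).sort_values(ascending=False)
print(content_based)
```

With these made-up features, the Tablet ranks above the Laptop because it shares more of the Smartphone's characteristics. Collaborative filtering, by contrast, never looks at item features; it relies only on who rated what.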



# **Example: Building a Simple Recommender System in Python**
Let's build a simple user-based collaborative filtering recommender that uses cosine similarity to measure how closely two users' ratings agree.

**Dataset:**
Suppose we have a dataset where users have rated different products. Here’s an example dataset:

In [1]:
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Sample data: User-Product ratings
data = {
    'User': ['User1', 'User1', 'User1', 'User2', 'User2', 'User3', 'User3', 'User4', 'User4'],
    'Product': ['Laptop', 'Smartphone', 'Tablet', 'Laptop', 'Tablet', 'Smartphone', 'Tablet', 'Laptop', 'Smartphone'],
    'Rating': [5, 3, 4, 4, 5, 5, 3, 2, 4]
}

# Create DataFrame
df = pd.DataFrame(data)
df

    User     Product  Rating
0  User1      Laptop       5
1  User1  Smartphone       3
2  User1      Tablet       4
3  User2      Laptop       4
4  User2      Tablet       5
5  User3  Smartphone       5
6  User3      Tablet       3
7  User4      Laptop       2
8  User4  Smartphone       4


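## **1. Setting up a User-Product Matrix**

**User-Product Matrix**: We create a matrix where rows represent users and columns represent products. Each cell contains the rating a user gave to a product. If a user hasn't rated a product, the cell is filled with 0. Note that cosine similarity then treats this 0 as a very low rating rather than as a missing value, which is a simplification that is acceptable for this small example.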

In [2]:
# Create a pivot table with Users as rows and Products as columns
user_product_matrix = df.pivot_table(index='User', columns='Product', values='Rating')

# Fill NaN values with 0 (assuming unrated products are given a rating of 0)
user_product_matrix = user_product_matrix.fillna(0)

# Display the user-product matrix
print("User-Product Matrix:")
print(user_product_matrix)

User-Product Matrix:
Product  Laptop  Smartphone  Tablet
User                               
User1       5.0         3.0     4.0
User2       4.0         0.0     5.0
User3       0.0         5.0     3.0
User4       2.0         4.0     0.0
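
Filling with 0 is not the only option. As a sketch of one common alternative (not what this notebook does), each user's ratings can be mean-centered before the gaps are filled, so that 0 means "average for this user" instead of "worst possible rating". The variable names `ratings`, `user_means`, `centered`, and `centered_filled` are introduced here for illustration.

```python
import pandas as pd

data = {
    'User': ['User1', 'User1', 'User1', 'User2', 'User2', 'User3', 'User3', 'User4', 'User4'],
    'Product': ['Laptop', 'Smartphone', 'Tablet', 'Laptop', 'Tablet', 'Smartphone', 'Tablet', 'Laptop', 'Smartphone'],
    'Rating': [5, 3, 4, 4, 5, 5, 3, 2, 4]
}
df = pd.DataFrame(data)

# Keep NaN for unrated products instead of filling with 0 right away
ratings = df.pivot_table(index='User', columns='Product', values='Rating')

# Subtract each user's mean rating, computed over rated products only
user_means = ratings.mean(axis=1)
centered = ratings.sub(user_means, axis=0)

# After centering, 0 is a neutral fill value ("average for this user")
centered_filled = centered.fillna(0)
print(centered_filled.round(2))
```

Centering also removes the effect of users who rate everything high or everything low. The rest of this notebook keeps the simpler zero-filled matrix.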


## **2. Calculating Cosine Similarity**

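* **Cosine Similarity**: We use cosine similarity to measure how alike two users are. It is the cosine of the angle between two non-zero vectors in a multi-dimensional space; here, each vector holds one user's ratings across all products. Because ratings are never negative, the values range from 0 (no products in common) to 1 (ratings pointing in exactly the same direction).
* **User Similarity Matrix**: We create a similarity matrix in which each cell holds the similarity between two users. The matrix is symmetric, and its diagonal is 1 because every user is perfectly similar to themselves.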
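
To see what the metric computes, the following sketch works out the similarity between User1 and User2 by hand from their rows in the user-product matrix, using the definition cos(θ) = (u · v) / (‖u‖ ‖v‖). The helper name `cosine` is made up for this illustration; the notebook itself uses scikit-learn's `cosine_similarity`.

```python
import numpy as np

# Rating vectors in (Laptop, Smartphone, Tablet) order, with 0 for "not rated"
user1 = np.array([5.0, 3.0, 4.0])
user2 = np.array([4.0, 0.0, 5.0])

def cosine(u, v):
    """Cosine of the angle between u and v: (u . v) / (|u| * |v|)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Dot product is 5*4 + 3*0 + 4*5 = 40; the norms are sqrt(50) and sqrt(41)
print(round(cosine(user1, user2), 6))  # 0.883452
```

The result matches the User1/User2 entry that scikit-learn produces for the full matrix.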

In [3]:
# Calculate cosine similarity between users
user_similarity = cosine_similarity(user_product_matrix)

# Convert the similarity matrix to a DataFrame for better readability
user_similarity_df = pd.DataFrame(user_similarity, index=user_product_matrix.index, columns=user_product_matrix.index)

print("\nUser Similarity Matrix:")
print(user_similarity_df)


User Similarity Matrix:
User      User1     User2     User3     User4
User                                         
User1  1.000000  0.883452  0.654846  0.695701
User2  0.883452  1.000000  0.401754  0.279372
User3  0.654846  0.401754  1.000000  0.766965
User4  0.695701  0.279372  0.766965  1.000000
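
One way to read the matrix is to look up each user's nearest neighbour. The sketch below rebuilds the filled user-product matrix so that it runs on its own; the names `sim`, `off_diagonal`, and `nearest` are introduced only for this illustration.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# The filled user-product matrix from section 1, rebuilt so the snippet runs alone
user_product_matrix = pd.DataFrame(
    [[5, 3, 4], [4, 0, 5], [0, 5, 3], [2, 4, 0]],
    index=['User1', 'User2', 'User3', 'User4'],
    columns=['Laptop', 'Smartphone', 'Tablet'],
    dtype=float,
)
sim = pd.DataFrame(
    cosine_similarity(user_product_matrix),
    index=user_product_matrix.index,
    columns=user_product_matrix.index,
)

# Blank out the diagonal so a user is never reported as their own neighbour
off_diagonal = sim.mask(np.eye(len(sim), dtype=bool))

# For each user, the other user with the highest similarity
nearest = off_diagonal.idxmax(axis=1)
print(nearest)
```

User1 and User2 are each other's closest match, as are User3 and User4, which agrees with the values in the matrix above.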


## **3. Generating Recommendations for User4**

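**Recommendation Generation**: To generate recommendations for a specific user (e.g., User4):

* We rank the other users by their similarity to User4.
* We then collect the ratings these users gave and keep the products that User4 has not rated yet.
* Finally, we average these ratings per product to recommend the top-rated products to User4.

In this small example every other user is included and all ratings count equally; the similarity scores only determine the order in which the users are processed.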

In [6]:
# Example: Recommending products for User4
# Step 1: Find users similar to User4
similar_users = user_similarity_df['User4'].sort_values(ascending=False)
print("\nUsers Similar to User4:")
print(similar_users)

# Step 2: Collect the ratings given by the other users
# Gather one single-row DataFrame per user, then concatenate once at the end
rating_rows = []

# Iterate over the other users, most similar first (User4 is dropped by label)
for similar_user in similar_users.drop('User4').index:
    user_ratings = user_product_matrix.loc[similar_user]

    # Keep only the products this user actually rated (0 means "not rated")
    rating_rows.append(user_ratings[user_ratings > 0].to_frame().T)

# Stack the rows into one DataFrame: one row per user, one column per product
recommended_products = pd.concat(rating_rows, axis=0)

# Aggregate and recommend the top-rated products that User4 has not rated yet
recommended_products = recommended_products.mean(axis=0).sort_values(ascending=False)
recommended_products = recommended_products[user_product_matrix.loc['User4'] == 0]

print("\nRecommended Products for User4:")
print(recommended_products)


Users Similar to User4:
User
User4    1.000000
User3    0.766965
User1    0.695701
User2    0.279372
Name: User4, dtype: float64

Recommended Products for User4:
Product
Tablet    4.0
dtype: float64


## **Explanation:**
* Step 1: Find users similar to User4: This step sorts User4's column of the cosine similarity matrix in descending order, so the users whose preferences are closest to User4's come first.
* Step 2: Find products rated by similar users: The code collects every rating (any value greater than 0) given by the other users. It does not apply a minimum-rating threshold, and it does not weight ratings by similarity.
* pd.concat(): Used to stack the ratings of the other users into a single DataFrame, one row per user.
* Aggregation: Finally, the ratings are averaged per product (mean value), and only the products that User4 hasn't rated yet are kept as recommendations.

**Output:**

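The output lists the recommended products for User4 in descending order of average rating, excluding any products that User4 has already rated. Here, Tablet is the only product User4 has not rated, so it is the single recommendation, with an average rating of 4.0 (the mean of 3, 4, and 5).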
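
The plain mean above gives every user the same influence. A possible refinement, sketched here under the assumption that 0 means "not rated", weights each rating by the rater's similarity to the target user and optionally keeps only the `k` most similar users. The function `recommend_weighted` and its parameters are hypothetical and not part of the notebook.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

def recommend_weighted(user_product_matrix, user, k=None):
    """Similarity-weighted average rating for products `user` has not rated.

    Hypothetical helper: 0 is taken to mean "not rated".
    """
    sim = pd.DataFrame(
        cosine_similarity(user_product_matrix),
        index=user_product_matrix.index,
        columns=user_product_matrix.index,
    )
    # Similarities to the other users, most similar first; optionally keep the top k
    neighbours = sim[user].drop(user).sort_values(ascending=False)
    if k is not None:
        neighbours = neighbours.head(k)

    neighbour_ratings = user_product_matrix.loc[neighbours.index]
    rated = (neighbour_ratings > 0).astype(float)

    # Weighted sum of ratings, divided by the total weight of the users who rated each product
    weighted_sum = neighbour_ratings.mul(neighbours, axis=0).sum(axis=0)
    weight_total = rated.mul(neighbours, axis=0).sum(axis=0)
    scores = weighted_sum / weight_total

    # Keep only products the target user has not rated; drop products nobody rated
    unrated = user_product_matrix.loc[user] == 0
    return scores[unrated].dropna().sort_values(ascending=False)

# Same filled matrix as in section 1, rebuilt so the snippet runs alone
user_product_matrix = pd.DataFrame(
    [[5, 3, 4], [4, 0, 5], [0, 5, 3], [2, 4, 0]],
    index=['User1', 'User2', 'User3', 'User4'],
    columns=['Laptop', 'Smartphone', 'Tablet'],
    dtype=float,
)
weighted = recommend_weighted(user_product_matrix, 'User4')
top1 = recommend_weighted(user_product_matrix, 'User4', k=1)
print(weighted)
print(top1)
```

With weighting, the Tablet score for User4 drops from 4.0 to roughly 3.72, because User3, the most similar user, gave the Tablet the lowest rating. With `k=1`, only User3's rating of 3 is used.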