# Flipkart Product Reccomendation - Recommendation Systems

##Welcome to this presentation on building an advanced recommender system for e-commerce platforms. In this presentation, we'll delve into the various stages and techniques involved in creating a personalized recommendation engine to enhance user experiences.

#Introdction

##Our objective is to develop a recommender system that suggests products to users based on their historical interactions and product attributes. This involves analyzing user interactions, utilizing product information, implementing collaborative and content-based filtering techniques, and evaluating the system's performance.

#Data Preprocessin

###We load the 'ratings.csv' dataset and examine its structure. Handling missing values and encoding categorical variables like gender and category into numerical representations, if needed, ensures our data is suitable for analysis

In [10]:
import pandas as pd

# Load the ratings dataset
ratings_df = pd.read_csv("ratings.csv")

# Display basic information about the dataset
print("Dataset information:")
print(ratings_df.info())

# Handle missing values (if any)
if ratings_df.isnull().any().any():
    ratings_df.dropna(inplace=True)
    print("Missing values have been dropped.")

# Encode categorical variables like gender and category
# Assuming 'gender' is a categorical variable that needs encoding
gender_mapping = {'Male': 0, 'Female': 1}
ratings_df['gender_encoded'] = ratings_df['gender'].map(gender_mapping)

# Assuming 'category' is a categorical variable that needs encoding
categories = ratings_df['category'].unique()
category_mapping = {cat: idx for idx, cat in enumerate(categories)}
ratings_df['category_encoded'] = ratings_df['category'].map(category_mapping)

# Display the updated dataset
print("\nUpdated dataset:")
print(ratings_df.head())

# Save the preprocessed dataset
ratings_df.to_csv("preprocessed_ratings.csv", index=False)
print("\nPreprocessed dataset saved as 'preprocessed_ratings.csv'")


Dataset information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2500 entries, 0 to 2499
Data columns (total 9 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   user_id            2500 non-null   int64  
 1   age                2500 non-null   int64  
 2   gender             2500 non-null   object 
 3   interaction_score  2500 non-null   float64
 4   product_id         2500 non-null   object 
 5   product_name       2500 non-null   object 
 6   category           2500 non-null   object 
 7   Image Link         2500 non-null   object 
 8   Rating             2500 non-null   int64  
dtypes: float64(1), int64(3), object(5)
memory usage: 175.9+ KB
None

Updated dataset:
   user_id  age gender  interaction_score product_id  \
0     1480   32   Male           0.479977       WMPW   
1     1480   32   Male           0.104574       TZMG   
2     1480   32   Male           0.774475       RLXG   
3     1480   32   Male           0

#user profiling

###User profiles are created by analyzing factors like age, gender, and past interactions. We incorporate user preferences, such as their preferred product categories. These profiles serve as the basis for personalized recommendations.

In [11]:
import pandas as pd

# Load the dataset
ratings_df = pd.read_csv("ratings.csv")

# Calculate average interaction score for each user
user_avg_score = ratings_df.groupby("user_id")["interaction_score"].mean()

# Calculate the number of interactions for each user
user_interaction_count = ratings_df.groupby("user_id")["interaction_score"].count()

# Calculate the average rating given by each user
user_avg_rating = ratings_df.groupby("user_id")["Rating"].mean()

# Combine user profiling features into a DataFrame
user_profile = pd.DataFrame({
    "user_id": user_avg_score.index,
    "avg_interaction_score": user_avg_score,
    "interaction_count": user_interaction_count,
    "avg_rating": user_avg_rating
})

# Print the user profiles
print(user_profile)

# Example: Get the user profile for a specific user ID
target_user_id = 1480
target_user_profile = user_profile[user_profile["user_id"] == target_user_id]

print("\nTarget User Profile:")
print(target_user_profile)


         user_id  avg_interaction_score  interaction_count  avg_rating
user_id                                                               
1000        1000               0.565311                 50        4.18
1149        1149               0.466603                 50        3.88
1426        1426               0.499101                 50        4.08
1480        1480               0.535363                 50        4.00
1995        1995               0.540730                 50        3.56
2010        2010               0.464280                 50        4.16
2114        2114               0.504821                 50        4.50
2360        2360               0.446263                 50        4.74
2374        2374               0.534615                 50        3.50
2392        2392               0.467369                 50        3.96
2502        2502               0.522899                 50        4.14
2653        2653               0.517779                 50        4.60
2656  

#product Popularity

###Product popularity scores can be based on interaction frequency or average ratings. We consider time decay factors to account for recency, ensuring our recommendations stay relevant.

In [12]:
import pandas as pd

# Load the dataset
ratings_df = pd.read_csv("ratings.csv")

# Calculate the total number of interactions for each product
product_popularity = ratings_df.groupby("product_id")["interaction_score"].count()

# Sort products by popularity in descending order
sorted_popularity = product_popularity.sort_values(ascending=False)

# Print the top 10 most popular products
print("Top 10 Most Popular Products:")
print(sorted_popularity.head(10))

# Example: Get the popularity for a specific product ID
target_product_id = "WMPW"
target_product_popularity = product_popularity.get(target_product_id, 0)

print("\nTarget Product Popularity:")
print(f"Product ID: {target_product_id}")
print(f"Popularity: {target_product_popularity}")


Top 10 Most Popular Products:
product_id
SCRX    2
KQWO    2
QHSD    2
QSWI    2
DWDF    2
ONQV    2
RKFC    1
RQDB    1
RNYG    1
RODU    1
Name: interaction_score, dtype: int64

Target Product Popularity:
Product ID: WMPW
Popularity: 1


###Collaborative filtering finds users similar to the target user and recommends products they haven't interacted with. This method leverages user behavior to generate meaningful recommendations.

###Content-based filtering utilizes product attributes like category, image, and description. By creating item profiles, we recommend products similar to those a user has shown interest in

###Hybrid approaches strike a balance between both methods. By assigning appropriate weights, we optimize the recommendation process, enhancing user satisfaction

In [13]:
!pip install scikit-surprise




In [None]:
# Import Required Libraries
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Load the dataset
data = pd.read_csv('ratings.csv')

user_item_matrix = data.pivot(index='user_id', columns='product_id', values='interaction_score')
user_item_matrix = user_item_matrix.fillna(0)

user_similarity = cosine_similarity(user_item_matrix)

product_profile_matrix = data.pivot_table(index='product_id', columns='category', values='interaction_score', fill_value=0)

def get_personalized_rankings(user_id, top_n=5):
    if user_id not in user_item_matrix.index:
        return []

    user_index = user_item_matrix.index.get_loc(user_id)

    # Collaborative Filtering: Calculate weighted scores based on user similarity
    collaborative_scores = user_similarity[user_index] @ user_item_matrix.values
    collaborative_ranking = list(user_item_matrix.columns[np.argsort(-collaborative_scores)])[:top_n]

    # Content-Based Filtering: Calculate product scores based on product profile
    content_scores = user_item_matrix.loc[user_id] @ product_profile_matrix.values
    content_ranking = list(product_profile_matrix.index[np.argsort(-content_scores)])[:top_n]

    # Hybrid Ranking: Combine collaborative and content-based rankings
    hybrid_ranking = collaborative_ranking + [p for p in content_ranking if p not in collaborative_ranking]

    return hybrid_ranking[:top_n]
# Analyze all products and get top 10 products with ratings
all_product_ids = user_item_matrix.columns
product_ratings = []

for product_id in all_product_ids:
    product_name = data[data['product_id'] == product_id]['product_name'].values[0]
    average_rating = user_item_matrix[product_id].mean()
    product_ratings.append((product_id, product_name, average_rating))

# Sort the products based on average ratings
sorted_product_ratings = sorted(product_ratings, key=lambda x: x[2], reverse=True)

print("Top 10 Products with Ratings:")
for i, (product_id, product_name, average_rating) in enumerate(sorted_product_ratings[:10], start=1):
    print(f"{i}. {product_name} ({product_id}) - Average Rating: {average_rating:.2f}")
# Get personalized rankings for multiple users
user_ids = [1480, 1995, 2719,3842, 2010]  # Example user IDs
for user_id in user_ids:
    personalized_rankings = get_personalized_rankings(user_id)
    print("Personalized Rankings and Ratings for User", user_id)
    for i, product_id in enumerate(personalized_rankings, start=1):
        product_name = data[data['product_id'] == product_id]['product_name'].values[0]
        rating = user_item_matrix.loc[user_id, product_id]
        print(f"{i}. {product_name} ({product_id}) - Rating: {rating:.2f}")





#Evaluation Metrics

###We define metrics such as precision, recall, F1-score, and Mean Average Precision (MAP) to evaluate our recommendations. These metrics provide insights into how well our system performs

In [7]:
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Load the dataset
data = pd.read_csv('ratings.csv')

# Create a user-item interaction matrix
user_item_matrix = pd.pivot_table(data, index='user_id', columns='product_id', values='interaction_score', fill_value=0)

# Calculate user similarity matrix using cosine similarity
user_similarity = cosine_similarity(user_item_matrix)

def get_personalized_rankings(user_id, top_n=5):
    if user_id in data['user_id'].unique():
        user_index = data['user_id'].unique().tolist().index(user_id)

        # Collaborative Filtering: Calculate weighted scores based on user similarity
        collaborative_scores = user_similarity[user_index] @ user_item_matrix.values
        collaborative_ranking = list(user_item_matrix.columns[np.argsort(-collaborative_scores)])[:top_n]

        return collaborative_ranking
    else:
        return []

def evaluate_rankings(user_id, recommended_ranking, true_interactions, top_n=5):
    recommended_products = recommended_ranking[:top_n]
    true_positive = len(set(recommended_products) & set(true_interactions))

    precision = true_positive / top_n
    recall = true_positive / len(true_interactions)

    return precision, recall

# Evaluate rankings for each user
user_ids = data['user_id'].unique()
average_precision = 0
average_recall = 0

for user_id in user_ids:
    user_data = data[data['user_id'] == user_id]
    true_interactions = user_data[user_data['interaction_score'] >= 0.5]['product_id'].tolist()

    personalized_ranking = get_personalized_rankings(user_id)
    precision, recall = evaluate_rankings(user_id, personalized_ranking, true_interactions)

    average_precision += precision
    average_recall += recall

# Calculate average precision and recall across all users
average_precision /= len(user_ids)
average_recall /= len(user_ids)

# Calculate F1-score
f1 = 2 * (average_precision * average_recall) / (average_precision + average_recall)

print("Average Precision:", average_precision)
print("Average Recall:", average_recall)
print("F1-score:", f1)


Average Precision: 0.02
Average Recall: 0.005
F1-score: 0.008


#Algorithm Evaluation

###We split the dataset into training and testing sets, employing techniques like cross-validation to fine-tune our algorithm's parameters. This step assures us of the system's effectiveness.

###Using SVD, we can determine the rank of the matrix, quantify the sensitivity of a linear system to numerical error, or obtain an optimal lower-rank approximation to the matrix

In [9]:
import pandas as pd
from surprise import Reader, Dataset, SVD
from surprise.model_selection import train_test_split
from surprise.accuracy import rmse

# Load the dataset
data = pd.read_csv('ratings.csv')

# Load data into Surprise's Dataset format
reader = Reader(rating_scale=(0, 1))
data = Dataset.load_from_df(data[['user_id', 'product_id', 'interaction_score']], reader)

# Split dataset into training and testing sets
trainset, testset = train_test_split(data, test_size=0.2, random_state=42)

# Instantiate SVD model
model = SVD(n_factors=100, n_epochs=20, random_state=42)

# Train the model on the training set
model.fit(trainset)

# Predict ratings on the test set
predictions = model.test(testset)

# Calculate RMSE
rmse_score = rmse(predictions)
accuracy_percentage = (1 - rmse_score) * 100  # Calculate accuracy in percentage

# print("Root Mean Squared Error (RMSE):", rmse_score)
print("Accuracy Percentage:", accuracy_percentage)


RMSE: 0.2929
Accuracy Percentage: 70.71198763900101


##To conclude, building an effective recommender system involves a holistic approach, encompassing data preprocessing, user profiling, collaborative and content-based filtering, hybrid techniques, thorough evaluation, and personalized rankings.

##By carefully crafting each step, we create a recommendation engine that enhances user engagement and satisfaction, ultimately driving the success of e-commerce platforms.

##Thank you for joining this presentation on building an advanced recommender system for e-commerce. We hope you now have a clearer understanding of the intricacies involved in creating personalized recommendations. Feel free to reach out with any questions