<a href="https://colab.research.google.com/github/quinbez/Hybrid-Recommendation-System/blob/main/Personalized_Recommendation_System.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Project Description**

This project focuses on developing a personalized recommendation system using Python, leveraging the capabilities of the LightFM library to implement both collaborative and content-based filtering techniques. The system initially processes and analyzes user interaction data with various posts, constructing a refined model that can predict user preferences with high accuracy. By integrating collaborative filtering, which predicts based on user-item interactions, and content-based methods, which utilize item features, the project aims to provide tailored recommendations. These recommendations are evaluated through precision and AUC metrics to ensure their relevance and accuracy, enhancing user engagement by suggesting content that aligns closely with individual interests and behaviors.

## **Project Structure**

  * Download and Setup Dataset
  * Data Preprocessing
  * Build Interaction Matrix
  * Model Development and Training
  * Model Evaluation
  * Collaborative Filtering Recommendation
  * Content-based Recommendation
  * Hybrid Recommendation
    - Combine collaborative and content-based recommendations
  * Display and Review Recommendations

## **Introduction**


A recommender, also known as a recommendation system or recommendation engine, is a software or algorithmic system designed to suggest or recommend items, products, or content to users based on their preferences, interests, or past behavior. Recommenders are widely used in various online platforms, including e-commerce websites, streaming services, social media platforms, and content sharing platforms.

The primary goal of a recommender is to provide users with personalized and relevant recommendations, assisting them in discovering new items or content that they may find interesting or useful. Recommenders leverage various techniques and algorithms to analyze user data and generate recommendations.

## **Types of Recommendation System**

1. **Non-personalized Recommendations**: These systems suggest items based on general popularity and trends without considering individual user preferences. For example, they might recommend the best-selling books on a website to all visitors, regardless of their individual reading tastes.

2. **Semi-personalized Recommendations**: These systems use some user-specific information such as demographic data or geographic location to make recommendations. For instance, a concert ticketing platform might show upcoming events in a user's nearby city, or a fashion retailer might highlight winter clothing to users in colder regions.

3. **Personalized Recommendations**: These systems create highly tailored suggestions by analyzing a user's individual interaction history, such as previous purchases, browsing behavior, and ratings. E-commerce platforms like Amazon use personalized recommendation systems to show products that align closely with a user's specific interests and past behaviors.
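
As a minimal sketch of the non-personalized case, the snippet below ranks posts by how often they were viewed and returns the same top-k list for every user. The `views` data and the `most_popular` helper are hypothetical and exist only for illustration; they are not part of the notebook's pipeline.

```python
from collections import Counter

# Hypothetical (user, post) view events, invented for illustration.
views = [("u1", "p1"), ("u2", "p1"), ("u3", "p1"),
         ("u1", "p2"), ("u2", "p2"), ("u3", "p3")]

def most_popular(view_events, k=2):
    """Return the k most-viewed post IDs; every user receives the same list."""
    counts = Counter(post for _, post in view_events)
    return [post for post, _ in counts.most_common(k)]

top_posts = most_popular(views, k=2)
print(top_posts)
```

A popularity list like this is also a useful baseline to compare a personalized model against.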

## **Download Data**

In [None]:
!kaggle datasets download -d vatsalparsaniya/post-pecommendation

Dataset URL: https://www.kaggle.com/datasets/vatsalparsaniya/post-pecommendation
License(s): copyright-authors
Downloading post-pecommendation.zip to /content
  0% 0.00/890k [00:00<?, ?B/s]
100% 890k/890k [00:00<00:00, 94.0MB/s]


In [None]:
!pip install lightfm -q


## **Import**

In [None]:
import os
import random
import numpy as np
import pandas as pd
from scipy import sparse

import lightfm
from lightfm import LightFM, cross_validation
from lightfm.evaluation import precision_at_k, auc_score

from sklearn.metrics.pairwise import cosine_similarity
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

## **Read Data**

In [None]:
import zipfile

with zipfile.ZipFile('post-pecommendation.zip', 'r') as zip_ref:
    zip_ref.extractall('dataset')

In [None]:
# Read the view log, keeping the header row and a random sample of roughly 50% of the data rows.
df_playlist = pd.read_csv('/content/dataset/view_data.csv',
                          skiprows=lambda i: i>0 and random.random() > 0.50)
df_playlist=df_playlist.drop('time_stamp',axis=1)
df_playlist.head(10)

Unnamed: 0,user_id,post_id
0,5eece14ffc13ae66090001d4,76472880
1,5eece14ffc13ae66090001bd,104702447
2,5eece14ffc13ae660900014f,957888426
3,5eece14ffc13ae660900018c,618411064
4,5eece14efc13ae6609000006,194876200
5,5eece14ffc13ae660900010a,484235148
6,5eece14ffc13ae66090001c8,807801064
7,5eece14efc13ae6609000069,272416192
8,5eece14efc13ae6609000052,653523018
9,5eece14ffc13ae66090000b5,662340373


In [None]:
df_playlist['user_id'].value_counts()

user_id
5eece14ffc13ae660900016d    158
5eece14ffc13ae66090000ea    157
5eece14ffc13ae66090000ec    155
5eece14efc13ae6609000070    155
5eece14ffc13ae66090000fe    154
                           ... 
5eece14efc13ae6609000051      1
5eece14ffc13ae66090001dc      1
5eece14ffc13ae66090001b4      1
5eece14ffc13ae6609000115      1
5eece14ffc13ae6609000103      1
Name: count, Length: 497, dtype: int64

In [None]:
df_playlist['post_id'].value_counts()

post_id
521082798    16
514851789    15
539017536    15
925961607    15
433277737    15
             ..
959833736     1
485463470     1
615389604     1
842050928     1
129470604     1
Name: count, Length: 5986, dtype: int64

The next cell turns the view log into a popularity-based rating. Every view starts with a rating of 1. Grouping the rows by post ID and summing those ones gives the number of times each post appears in the sampled view data.

These totals are stored in a dictionary that maps each post ID to its total, so the value for any post can be looked up directly by its ID.

Finally, the `rating` column is overwritten: each row's placeholder value of 1 is replaced with the total for its post. The resulting rating therefore measures how much attention a post received overall, so every user who viewed the same post gets the same rating for it.
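
To make the rating step concrete, here is a small sketch on a made-up view log. The `toy` frame and its IDs are hypothetical; the logic mirrors the idea described here (count the rows per post, then attach that count to every row of the post) rather than reproducing the notebook's cell line by line.

```python
import pandas as pd

# Hypothetical view log: one row per (user, post) view.
toy = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u1", "u2"],
    "post_id": [101, 101, 101, 202, 303],
})

# Count views per post, then attach that count to every row of the post.
view_counts = toy.groupby("post_id").size()
toy["rating"] = toy["post_id"].map(view_counts)
print(toy)
```

Post 101 was viewed three times, so all three of its rows receive a rating of 3, while the single views of posts 202 and 303 keep a rating of 1.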



In [None]:
# Give every view a weight of 1, then sum the weights per post to get its total.
df_playlist['rating']=1
rating_mapping=df_playlist.groupby('post_id')['rating'].sum().to_dict()

# Replace the placeholder 1 with the total for each row's post.
df_playlist['rating']=df_playlist['post_id'].map(rating_mapping)
display(df_playlist)

Unnamed: 0,user_id,post_id,rating
0,5eece14ffc13ae66090001d4,76472880,7
1,5eece14ffc13ae66090001bd,104702447,5
2,5eece14ffc13ae660900014f,957888426,3
3,5eece14ffc13ae660900018c,618411064,6
4,5eece14efc13ae6609000006,194876200,5
...,...,...,...
35982,5eece14ffc13ae660900010d,110983111,5
35983,5eece14ffc13ae66090000fb,398851260,5
35984,5eece14ffc13ae660900010c,348689108,7
35985,5eece14ffc13ae6609000190,619052165,5


## **Data Preprocessing**

Each row represents one user's interaction with one post: `user_id` identifies the user, `post_id` identifies the post, and `rating` holds the post's total computed in the previous step. In the output above, for example, the first row shows that user 5eece14ffc13ae66090001d4 interacted with post 76472880, which has a rating of 7, and the second row shows that user 5eece14ffc13ae66090001bd interacted with post 104702447, which has a rating of 5. Because `view_data.csv` is randomly subsampled when it is read, the exact rows and ratings change from run to run.

The next cell groups the rows by post ID and keeps only the groups with 10 or more rows, so the remaining data contains only posts with at least 10 recorded interactions.

In [None]:
df_playlist = df_playlist.groupby('post_id').filter(lambda x: len(x)>=10)
df_playlist

Unnamed: 0,user_id,post_id,rating
5,5eece14ffc13ae660900010a,484235148,12
8,5eece14efc13ae6609000052,653523018,10
43,5eece14efc13ae660900004f,485915574,12
48,5eece14ffc13ae66090001ed,277062955,10
52,5eece14ffc13ae66090000e7,495836839,10
...,...,...,...
35962,5eece14ffc13ae66090000ac,16266483,11
35966,5eece14ffc13ae660900011c,92186453,10
35968,5eece14efc13ae6609000039,261928184,10
35976,5eece14ffc13ae6609000118,135453406,10


Next, the data is restricted to users who have interacted with at least 10 different posts. The cell counts the number of unique posts per user and keeps only the rows of users who meet that threshold.
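
Both filtering steps can be tried on a toy frame. This is only a sketch: the data is invented, and `min_count = 2` is a hypothetical stand-in for the notebook's threshold of 10 so that the effect is visible on a handful of rows.

```python
import pandas as pd

# Hypothetical interactions; min_count=2 stands in for the notebook's threshold of 10.
toy = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u2", "u3"],
    "post_id": [1, 2, 1, 2, 3],
})
min_count = 2

# Keep posts that have at least min_count interaction rows.
toy = toy.groupby("post_id").filter(lambda g: len(g) >= min_count)

# Keep users who interacted with at least min_count distinct posts.
keep = toy.groupby("user_id")["post_id"].transform("nunique") >= min_count
toy = toy[keep]
print(toy)
```

Post 3 has a single interaction and is dropped by the first filter, which also removes user u3; the remaining users each touch two distinct posts and survive the second filter.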

In [None]:
df_playlist = df_playlist[df_playlist.groupby('user_id').post_id.transform('nunique')>=10]
df_playlist

Unnamed: 0,user_id,post_id,rating
5,5eece14ffc13ae660900010a,484235148,12
8,5eece14efc13ae6609000052,653523018,10
43,5eece14efc13ae660900004f,485915574,12
48,5eece14ffc13ae66090001ed,277062955,10
52,5eece14ffc13ae66090000e7,495836839,10
...,...,...,...
35962,5eece14ffc13ae66090000ac,16266483,11
35966,5eece14ffc13ae660900011c,92186453,10
35968,5eece14efc13ae6609000039,261928184,10
35976,5eece14ffc13ae6609000118,135453406,10


In [None]:
df_title=pd.read_csv('/content/dataset/post_data.csv')
titles=df_title['title'].tolist()
post_ids=df_title['post_id'].tolist()
normal_mapping=dict(zip(titles,post_ids))
reverse_mapping=dict(zip(post_ids,titles))
print(normal_mapping)



In [None]:
df_title

Unnamed: 0,title,category,post_id
0,Find A Quick Way To GRAPHIC,graphic,10260109
1,How To Sell CRAFT,Craft,39550285
2,POLITICS An Incredibly Easy Method That Works ...,politics,935118791
3,5 Brilliant Ways To Use POLITICAL,political,151805043
4,How To Make Your MATHEMATICS Look Amazing In ...,Mathematics,995833095
...,...,...,...
5995,Who Else Wants To Be Successful With PROGRAMMING,programming,815625033
5996,Avoid The Top 10 SCIENCE Mistakes,science,870247682
5997,7 and a Half Very Simple Things You Can Do To...,drawing,856393394
5998,Why Everything You Know About ZOOLOGY Is A Lie,zoology,152219066


## **Define Functions**

This function constructs an interaction matrix from a given DataFrame. It groups the data based on specified user and item columns and aggregates ratings, allowing the creation of a matrix where rows represent users and columns represent items, with cell values indicating the strength of interaction (e.g., ratings).
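
As an illustration of the shape this produces, the sketch below builds a tiny interaction matrix from invented data. The `toy` frame is hypothetical, and the one-line pivot is a simplified stand-in for the function defined next, not a replacement for it.

```python
import pandas as pd

# Hypothetical ratings: one row per (user, post) interaction.
toy = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "post_id": [10, 20, 10],
    "rating": [3, 1, 3],
})

# Rows are users, columns are posts; pairs with no interaction become 0.
matrix = toy.groupby(["user_id", "post_id"])["rating"].sum().unstack(fill_value=0)
print(matrix)
```

User u2 never interacted with post 20, so that cell is filled with 0.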

In [None]:
def create_interaction_matrix(df, user_col, item_col, rating_col, norm=False, threshold=None):
    # Sum ratings per (user, item) pair and pivot to a users x items matrix; missing pairs become 0.
    interactions = (
        df.groupby([user_col, item_col])[rating_col]
        .sum()
        .unstack()
        .reset_index()
        .fillna(0)
        .set_index(user_col)
    )
    if norm:
        # Binarize: 1 where the summed rating exceeds the threshold, else 0.
        interactions = (interactions > threshold).astype(int)
    return interactions

This function generates a dictionary mapping each unique user ID from the interaction matrix to a sequential index. This is particularly useful for models that require numerical indices for users, such as matrix factorization algorithms. The function iterates through the list of user IDs, assigning each one a unique integer, which simplifies referencing users in the model training and evaluation process.

In [None]:
def create_user_dict(interactions):

    user_id = list(interactions.index)
    user_dict = {}
    counter = 0
    for i in user_id:
        user_dict[i] = counter
        counter += 1
    return user_dict

## **Model Development and Training**

This function sets up and trains a matrix factorization model using the LightFM library, specifically configured for collaborative filtering. It takes the interaction matrix as input, fits the model to this data, and then returns the trained model. This function is key for learning user and item embeddings that can predict missing entries in the interaction matrix, essentially recommending items to users based on their historical data.
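
The sketch below illustrates, under simplifying assumptions, how a factorization model turns embeddings into a ranking: the score for a user-item pair is the dot product of their embeddings plus bias terms. Random numbers stand in for the values that training would learn, and all names (`user_emb`, `item_emb`, `score`) are hypothetical; this shows only the scoring step, not LightFM's training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, n_components = 4, 6, 3

# Random values stand in for embeddings and biases that training would learn.
user_emb = rng.normal(size=(n_users, n_components))
item_emb = rng.normal(size=(n_items, n_components))
user_bias = np.zeros(n_users)
item_bias = np.zeros(n_items)

def score(user_idx):
    """Score every item for one user: embedding dot product plus biases."""
    return item_emb @ user_emb[user_idx] + item_bias + user_bias[user_idx]

scores = score(0)
ranking = np.argsort(-scores)  # item indices ordered from best to worst
print(ranking)
```

Recommending then amounts to taking the first few entries of `ranking` that the user has not already interacted with.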

In [None]:
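def runMF(interactions, n_components=30, loss='warp', k=15, epoch=30, n_jobs=4):
    # Accept either a scipy sparse matrix or a dense DataFrame/array of interactions.
    if not sparse.issparse(interactions):
        interactions = sparse.csr_matrix(np.asarray(interactions))
    model = LightFM(no_components=n_components, loss=loss, k=k)
    # Fit on the matrix passed in (not on a global), so a train split stays separate from the test data.
    model.fit(interactions, epochs=epoch, num_threads=n_jobs)
    return model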

In [None]:
interactions = create_interaction_matrix(df = df_playlist, user_col = "user_id", item_col = 'post_id', rating_col = 'rating', norm= False, threshold = None)
interactions.head()

user_id \ post_id,10504319,11246738,12702125,13211110,16266483,16940159,18453456,20760164,21841968,28457563,...,983850910,984554090,987429227,988396955,988703947,990495089,995518920,995833095,996411611,998486033
5eece14efc13ae6609000003,0.0,0.0,0.0,0.0,0.0,0.0,11.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5eece14efc13ae6609000006,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5eece14efc13ae6609000008,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5eece14efc13ae660900000a,0.0,0.0,10.0,11.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5eece14efc13ae660900000c,0.0,11.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [None]:
user_dict = create_user_dict(interactions=interactions)
user_dict

{'5eece14efc13ae6609000003': 0,
 '5eece14efc13ae6609000006': 1,
 '5eece14efc13ae6609000008': 2,
 '5eece14efc13ae660900000a': 3,
 '5eece14efc13ae660900000c': 4,
 '5eece14efc13ae660900000d': 5,
 '5eece14efc13ae660900000e': 6,
 '5eece14efc13ae660900000f': 7,
 '5eece14efc13ae6609000010': 8,
 '5eece14efc13ae6609000011': 9,
 '5eece14efc13ae6609000012': 10,
 '5eece14efc13ae6609000014': 11,
 '5eece14efc13ae6609000017': 12,
 '5eece14efc13ae6609000019': 13,
 '5eece14efc13ae660900001b': 14,
 '5eece14efc13ae660900001c': 15,
 '5eece14efc13ae660900001d': 16,
 '5eece14efc13ae660900001e': 17,
 '5eece14efc13ae660900001f': 18,
 '5eece14efc13ae6609000022': 19,
 '5eece14efc13ae6609000023': 20,
 '5eece14efc13ae6609000024': 21,
 '5eece14efc13ae6609000025': 22,
 '5eece14efc13ae6609000026': 23,
 '5eece14efc13ae660900002a': 24,
 '5eece14efc13ae660900002b': 25,
 '5eece14efc13ae660900002c': 26,
 '5eece14efc13ae660900002d': 27,
 '5eece14efc13ae660900002e': 28,
 '5eece14efc13ae6609000030': 29,
 '5eece14efc13ae6609

In [None]:
item_dict = reverse_mapping
# item_dict

In [None]:
x = sparse.csr_matrix(interactions.values)
train, test = lightfm.cross_validation.random_train_test_split(x, test_percentage=0.2, random_state=None)

In [None]:
%%time
model_user = runMF(interactions = train,
                 n_components = 30,
                 loss = 'warp',
                 k = 15,
                 epoch = 30,
                 n_jobs = 4)

CPU times: user 3 µs, sys: 0 ns, total: 3 µs
Wall time: 7.63 µs


In [None]:
interactions_item_based = interactions.transpose()
model_item = runMF(interactions_item_based)

In [None]:
interactions_item_based.shape

(519, 270)

In [None]:
train_auc = auc_score(model_user, train, num_threads=4).mean()
print('Train AUC: %s' % train_auc)

Train AUC: 0.99020386


In [None]:
test_auc = auc_score(model_user, test, train_interactions=train, num_threads=4).mean()
print('Test AUC: %s' % test_auc)

Test AUC: 0.9932588


In [None]:
train_precision = precision_at_k(model_user, train, k=10).mean()
test_precision = precision_at_k(model_user, test, k=10, train_interactions=train).mean()

In [None]:
print('train Precision %.2f, test Precision %.2f.' % (train_precision, test_precision))

train Precision 0.70, test Precision 0.31.
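
To show what the two metrics measure, here is a hand-rolled sketch for a single user. Precision at k is the fraction of the k highest-scored items that are known positives, and AUC is the share of (positive, negative) item pairs in which the positive item scores higher. The scores, the set of positives, and both helper functions are hypothetical; LightFM's own functions additionally handle sparse matrices and exclusion of training interactions.

```python
import numpy as np

# Hypothetical scores for six items and the items the user actually viewed.
scores = np.array([0.9, 0.8, 0.1, 0.7, 0.3, 0.2])
positives = {0, 3, 4}

def precision_at_k_single(scores, positives, k):
    """Fraction of the k highest-scored items that are known positives."""
    top_k = np.argsort(-scores)[:k]
    return sum(int(i) in positives for i in top_k) / k

def auc_single(scores, positives):
    """Share of (positive, negative) pairs where the positive scores higher."""
    pos = [s for i, s in enumerate(scores) if i in positives]
    neg = [s for i, s in enumerate(scores) if i not in positives]
    wins = sum(p > n for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

p_at_3 = precision_at_k_single(scores, positives, k=3)
auc = auc_single(scores, positives)
print(p_at_3, auc)
```

Here the top three items are 0, 1, and 3, of which two are positives, so precision at 3 is 2/3; seven of the nine positive-negative pairs are ordered correctly, so AUC is 7/9.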


## **Collaborative filtering**

Collaborative filtering analyzes the behavior and preferences of a large group of users to identify patterns and similarities among them. It assumes that users who had similar preferences in the past will have similar preferences in the future, and it recommends items to a user based on the preferences or actions of other users with similar tastes. This approach does not rely on explicit item attributes; it uses only the collective behavior of users.
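
The idea can be sketched with a neighbourhood-based recommender, which is a different and simpler technique than the factorization model trained in this notebook. Users are compared by the cosine similarity of their interaction rows, and unseen items are scored by the similarity-weighted interactions of the other users. The matrix `R` and the helper function are hypothetical.

```python
import numpy as np

# Hypothetical binary interaction matrix: rows are users, columns are items.
R = np.array([
    [1, 1, 0, 0],   # user 0
    [1, 1, 1, 0],   # user 1
    [0, 0, 0, 1],   # user 2
], dtype=float)

def recommend_from_neighbours(R, user_idx):
    """Rank unseen items by the similarity-weighted interactions of other users."""
    norms = np.linalg.norm(R, axis=1)
    sims = (R @ R[user_idx]) / (norms * norms[user_idx])  # cosine similarity
    sims[user_idx] = 0.0                                   # ignore the user themself
    item_scores = sims @ R
    item_scores[R[user_idx] > 0] = -np.inf                 # drop already-seen items
    return np.argsort(-item_scores)

ranking = recommend_from_neighbours(R, user_idx=0)
print(ranking[:1])
```

User 0 shares two items with user 1 and none with user 2, so the top suggestion is item 2, the one item user 1 has seen that user 0 has not.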

In [None]:
def sample_recommendation_user(model, interactions, user_id, user_dict,
                               item_dict, threshold=0, nrec_items=10, show=True):

    n_users, n_items = interactions.shape
    user_x = user_dict[user_id]
    # Score every item for this user and rank item IDs from best to worst.
    scores = pd.Series(model.predict(user_x, np.arange(n_items)))
    scores.index = interactions.columns
    scores = list(scores.sort_values(ascending=False).index)

    # Items the user has already interacted with, ordered by item ID (descending).
    user_row = interactions.loc[user_id, :]
    known_items = sorted(user_row[user_row > threshold].index, reverse=True)

    # Exclude all known items from the ranking; only the first 10 are displayed.
    known_set = set(known_items)
    scores = [x for x in scores if x not in known_set]
    known_items = known_items[:10]
    return_score_list = scores[:nrec_items]
    known_items = [item_dict[x] for x in known_items]
    scores = [item_dict[x] for x in return_score_list]

    if show:
        print("Known Likes:")
        counter = 1
        for i in known_items:
            print(str(counter) + '- ' + i)
            counter += 1

        print("\nRecommended Items:")
        counter = 1
        for i in scores:
            print(str(counter) + '- ' + i)
            counter += 1
    return return_score_list


In [None]:
rec_list = sample_recommendation_user(model = model_user,
                                      interactions = interactions,
                                      user_id = '5eece14ffc13ae66090001c3',
                                      user_dict = user_dict,
                                      item_dict = item_dict,
                                      threshold = 0,
                                      nrec_items = 10,
                                      show = True)

Known Likes:
1-  How To Start A Business With PROGRAMMING
2-  How To Turn PROGRAMMING Into Success
3-  Do BUSINESS Better Than Barack Obama
4-  How To Turn Your PROGRAMMING From Zero To Hero
5-  Top 3 Ways To Buy A Used PROGRAMMING
6-  Here Is What You Should Do For Your GST
7-  Find Out How I Cured My OPERATING SYSTEM In 2 Days
8-  3 POLITICS Secrets You Never Knew
9- At Last, The Secret To DRAWING Is Revealed
10-  10 Unforgivable Sins Of DANCE

Recommended Items:
1-  10 Ways To Immediately Start Selling PROGRAMMING
2- GRAPHIC: This Is What Professionals Do
3-  4 Ways You Can Grow Your Creativity Using MUSIC
4-  What Is GRAPHIC and How Does It Work?
5-  Essential MUSIC Smartphone Apps
6-  Get The Most Out of PAINTING and Facebook
7-  Are You Embarrassed By Your PROGRAMMING Skills? Here's What To Do
8-  Don't Just Sit There! Start DANCE
9-  To People That Want To Start POLITICAL But Are Affraid To Get Started
10- ZOOLOGY Your Way To Success


## **Content-based Filtering**

Content-based filtering focuses on the characteristics or attributes of items. It recommends items that are similar to the ones a user has shown interest in before. Content-based filtering typically involves analyzing item features such as keywords, genres, descriptions, or other relevant attributes. By creating profiles based on the characteristics of items and the user's historical preferences, the system can recommend items that match the user's preferences or exhibit similar attributes.
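
A minimal sketch of this idea, assuming each post carries a single category label: the categories are one-hot encoded, the user's profile is the average feature vector of the posts they viewed, and unseen posts are ranked by cosine similarity to that profile. The post IDs, categories, and `recommend_similar` helper are hypothetical and are not the function used in this notebook.

```python
import numpy as np

# Hypothetical posts with one category each, invented for illustration.
post_ids = ["p1", "p2", "p3", "p4"]
categories = ["programming", "programming", "music", "music"]

# One-hot encode categories into an items x categories feature matrix.
vocab = sorted(set(categories))
features = np.array([[float(c == v) for v in vocab] for c in categories])

def recommend_similar(viewed, k=1):
    """Rank unseen posts by cosine similarity to the user's average feature vector."""
    viewed_idx = [post_ids.index(p) for p in viewed]
    profile = features[viewed_idx].mean(axis=0)
    sims = features @ profile / (np.linalg.norm(features, axis=1) * np.linalg.norm(profile))
    order = [i for i in np.argsort(-sims) if i not in viewed_idx]
    return [post_ids[i] for i in order[:k]]

recs = recommend_similar(["p1"], k=1)
print(recs)
```

A user who viewed only the programming post p1 is pointed to p2, the other programming post, rather than to either music post.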

In [None]:
def content_based_recommendation(df, user_id, item_dict, nrec_items=10, show=True):
    # Rank the posts this user has interacted with by their summed rating.
    user_data = df[df['user_id'] == user_id]
    post_scores = user_data.groupby('post_id')['rating'].sum()
    recommended_posts = post_scores.sort_values(ascending=False).index.tolist()[:nrec_items]
    recommended_posts_titles = [item_dict.get(post_id, "Unknown Post") for post_id in recommended_posts]

    if show:
        print("\nRecommended Items:")
        for i, post_title in enumerate(recommended_posts_titles, 1):
            print(f"{i}- {post_title}")

    # Return post IDs (not titles) so the result can be merged with the collaborative output.
    return recommended_posts

In [None]:
user_id = '5eece14ffc13ae66090001c3'
recommended_items = content_based_recommendation(df_playlist, user_id, item_dict, nrec_items=10, show=True)



Recommended Items:
1-  Here Is What You Should Do For Your GST
2- At Last, The Secret To DRAWING Is Revealed
3-  10 Ways To Immediately Start Selling PROGRAMMING
4-  Do BUSINESS Better Than Barack Obama
5-  How To Start A Business With PROGRAMMING
6- GRAPHIC: This Is What Professionals Do
7-  10 Unforgivable Sins Of DANCE
8-  3 POLITICS Secrets You Never Knew
9-  Find Out How I Cured My OPERATING SYSTEM In 2 Days
10-  Top 3 Ways To Buy A Used PROGRAMMING


## **Hybrid recommendation**

Hybrid approaches combine collaborative filtering and content-based filtering. They aim to leverage the strengths of both methods to provide more accurate and diverse recommendations.
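
One common way to combine the two signals is a weighted blend of per-item scores, sketched below. This is an alternative to merging two recommendation lists, not the method used in this notebook; the score arrays, the `alpha` weight, and both helpers are hypothetical.

```python
import numpy as np

def min_max(x):
    """Rescale scores to [0, 1] so the two sources are comparable."""
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

def blend(collab_scores, content_scores, alpha=0.5):
    """Weighted sum of normalized scores; alpha is the weight on the collaborative part."""
    return alpha * min_max(collab_scores) + (1 - alpha) * min_max(content_scores)

# Hypothetical per-item scores from the two recommenders.
collab = np.array([2.0, 0.5, 1.0])
content = np.array([0.0, 1.0, 0.9])

blended = blend(collab, content, alpha=0.5)
best_item = int(np.argmax(blended))
print(blended, best_item)
```

Item 0 is favoured only by the collaborative scores and item 1 only by the content scores, so the blend picks item 2, which does reasonably well on both.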

In [None]:
def display_hybrid_recommendations(hybrid_rec_list, item_dict):
    print("Hybrid Recommendations:")
    count = 0
    for i, post_id in enumerate(hybrid_rec_list, 1):
        post_title = item_dict.get(post_id, None)
        if post_title:
            count += 1
            print(f"{count}- {post_title}")
            if count == 10:
                break

def get_hybrid_recommendations(model_user, interactions, user_id, user_dict, item_dict, df_content, top_n):

    # Collaborative filtering (User-based) recommendation
    user_recommendations = sample_recommendation_user(model_user, interactions, user_id, user_dict,
                                                      item_dict, threshold=0, nrec_items=top_n, show=False)

    # Content-based recommendation
    content_based_rec = content_based_recommendation(df_content, user_id, item_dict, top_n, show=False)

    # Combine both lists by alternating between them, so neither source is cut off by the final slice
    combined = []
    for i in range(max(len(user_recommendations), len(content_based_rec))):
        combined.extend(user_recommendations[i:i + 1])
        combined.extend(content_based_rec[i:i + 1])

    # Drop duplicates while preserving order (a plain set would shuffle the ranking)
    hybrid_recommendations = list(dict.fromkeys(combined))

    # Filter out unknown items
    hybrid_recommendations = [post_id for post_id in hybrid_recommendations if post_id in item_dict]

    return hybrid_recommendations[:top_n]

user_id = '5eece14ffc13ae66090001c3'
hybrid_rec_list = get_hybrid_recommendations(model_user, interactions, user_id, user_dict, item_dict,
                                              df_playlist, top_n=10)

display_hybrid_recommendations(hybrid_rec_list, item_dict)


Hybrid Recommendations:
1-  Get The Most Out of PAINTING and Facebook
2-  To People That Want To Start POLITICAL But Are Affraid To Get Started
3- GRAPHIC: This Is What Professionals Do
4-  Don't Just Sit There! Start DANCE
5-  Essential MUSIC Smartphone Apps
6- ZOOLOGY Your Way To Success
7-  Are You Embarrassed By Your PROGRAMMING Skills? Here's What To Do
8-  What Is GRAPHIC and How Does It Work?
9-  10 Ways To Immediately Start Selling PROGRAMMING
10-  4 Ways You Can Grow Your Creativity Using MUSIC


## **Next Steps**

* Implement item-based collaborative filtering and combine it with user-based collaborative filtering to create a robust hybrid system.
* Incorporate both explicit and implicit feedback.
* Develop a method for capturing negative feedback effectively.
* Integrate the enhanced hybrid recommendation system with Mindplex's existing recommendation framework.