
# Recommender System using Content-Based Filtering

Importing NumPy and pandas.

In [0]:
import numpy as np
import pandas as pd

Loading users.csv

In [20]:
users=pd.read_csv("users.csv")
users=users.rename(columns={"_id":"user_id"})
users.head()

|   | user_id | name | gender | academics |
|---|---------|------|--------|-----------|
| 0 | 5d60098a653a331687083238 | Nivesh Singh Chauhan | male | undergraduate |
| 1 | 5d610ae1653a331687083239 | Gaurav Sharma | male | graduate |
| 2 | 5d618359fc5fcf3bdd9a0910 | Akshay Mishra | male | undergraduate |
| 3 | 5d6d2bb87fa40e1417a49315 | Saksham Mathur | male | undergraduate |
| 4 | 5d7c994d5720533e15c3b1e9 | Varun Chowhan | male | undergraduate |


Loading posts.csv

In [21]:
post_df=pd.read_csv("posts.csv")
post_df=post_df.rename(columns={"_id":"post_id"})
post_df.head()

|   | post_id | title | category | post_type |
|---|---------|-------|----------|-----------|
| 0 | 5d62abaa65218653a132c956 | hello there | Plant Biotechnology | blog |
| 1 | 5d6d39567fa40e1417a4931c | Ml and AI | Artificial Intelligence\|Machine Learning\|Infor... | blog |
| 2 | 5d7d23315720533e15c3b1ee | What is an Operating System ? | Operating Systems | blog |
| 3 | 5d7d405e5720533e15c3b1f3 | Lord Shiva | Drawings | artwork |
| 4 | 5d80dfbc6c53455f896e600e | How Competition law evolved? | Competition Laws | blog |


Loading views.csv

In [22]:
views=pd.read_csv("views.csv")
views.head()

|   | user_id | post_id | timestamp |
|---|---------|---------|-----------|
| 0 | 5df49b32cc709107827fb3c7 | 5ec821ddec493f4a2655889e | 2020-06-01T10:46:45.131Z |
| 1 | 5ed3748576027d35905ccaab | 5ed4cbadbd514d602c1531a6 | 2020-06-01T09:39:20.021Z |
| 2 | 5ed0defa76027d35905cc2de | 5eac305f10426255a7aa9dd3 | 2020-06-01T08:12:42.682Z |
| 3 | 5ed0defa76027d35905cc2de | 5ed1ff0276027d35905cc60d | 2020-06-01T08:10:23.880Z |
| 4 | 5ed0defa76027d35905cc2de | 5ed3820f76027d35905ccac8 | 2020-06-01T08:08:54.124Z |


In [23]:
print("users.shape", users.shape)
print("posts.shape", post_df.shape)
print("views.shape", views.shape)

users.shape (118, 4)
posts.shape (493, 4)
views.shape (1449, 3)


In [24]:
post_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 493 entries, 0 to 492
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   post_id     493 non-null    object
 1   title       493 non-null    object
 2   category    465 non-null    object
 3   post_type   493 non-null    object
dtypes: object(4)
memory usage: 15.5+ KB


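Feature extraction using scikit-learn's **TfidfVectorizer**, which turns each post's category string into a vector of TF-IDF weights.
The result is returned as a sparse matrix with one row per post.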

In [0]:
from sklearn.feature_extraction.text import TfidfVectorizer

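# TF-IDF over the category strings: keep terms that occur in at least 3 posts,
# use word unigrams to trigrams and drop English stop words
tfv = TfidfVectorizer(min_df=3, max_features=None,
                      strip_accents='unicode', analyzer='word',
                      token_pattern=r'\w{1,}', ngram_range=(1, 3),
                      stop_words='english')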

# Filling NaNs with empty string
post_df['category'] = post_df['category'].fillna('')

In [0]:
tfv_matrix = tfv.fit_transform(post_df['category'])


In [27]:
tfv_matrix.shape

(493, 228)

Computing the pairwise similarity between posts by applying the sigmoid kernel to the TF-IDF matrix.


In [0]:
from sklearn.metrics.pairwise import sigmoid_kernel

sig = sigmoid_kernel(tfv_matrix, tfv_matrix)
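scikit-learn's `sigmoid_kernel` computes K(x, y) = tanh(gamma * <x, y> + coef0), with defaults gamma = 1 / n_features and coef0 = 1. The sketch below checks that formula on a small random matrix standing in for the TF-IDF matrix (the data and variable names are hypothetical). It also shows that, because tanh is strictly increasing and gamma > 0, each row's ranking under the sigmoid kernel is the same as its ranking by cosine similarity for L2-normalized rows.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity, sigmoid_kernel

# Hypothetical stand-in for a TF-IDF matrix: 5 rows scaled to unit L2 norm
rng = np.random.default_rng(0)
X = rng.random((5, 8))
X /= np.linalg.norm(X, axis=1, keepdims=True)

sig_demo = sigmoid_kernel(X, X)

# The same kernel written out: gamma defaults to 1 / n_features, coef0 to 1
gamma = 1.0 / X.shape[1]
manual = np.tanh(gamma * (X @ X.T) + 1.0)
print(np.allclose(sig_demo, manual))

# tanh is strictly increasing and gamma > 0, so every row is ranked
# in the same order as by cosine similarity
cos = cosine_similarity(X)
same_order = all(
    np.array_equal(np.argsort(-sig_demo[i]), np.argsort(-cos[i]))
    for i in range(X.shape[0])
)
print(same_order)
```

Ties aside, replacing the sigmoid kernel with `cosine_similarity` (or `linear_kernel`, which equals it on L2-normalized TF-IDF rows) would therefore not change which posts are recommended.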

Mapping each post title to its row index.

In [0]:
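# Map each title to its row position; keep only the first post for repeated titles
indices = pd.Series(post_df.index, index=post_df['title'])
indices = indices[~indices.index.duplicated(keep='first')]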

In [30]:
indices.head()

title
hello there                      0
Ml and AI                        1
What is an Operating System ?    2
Lord Shiva                       3
How Competition law evolved?     4
dtype: int64
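Calling `drop_duplicates()` on a title-indexed Series removes duplicate *values* (the row positions, which are already unique), not duplicate *titles*. A repeated title such as "Cloud Computing" would then make `indices[title]` return a Series instead of a single integer. A small sketch on hypothetical titles shows the difference from `Index.duplicated`:

```python
import pandas as pd

# Hypothetical posts table with a repeated title
toy = pd.DataFrame({"title": ["Cloud Computing", "Firewall", "Cloud Computing"]})
idx_series = pd.Series(toy.index, index=toy["title"])

# drop_duplicates() compares the values 0, 1, 2, so nothing is removed
print(len(idx_series.drop_duplicates()))

# Looking up a repeated label returns a Series with both positions
print(idx_series["Cloud Computing"])

# Dropping repeated index labels keeps the first post for each title
dedup = idx_series[~idx_series.index.duplicated(keep="first")]
print(dedup["Cloud Computing"])
```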

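The function give_rec() returns a pandas Series holding the titles of the no_of_rec + 1 posts most similar to the given title, indexed by their row positions in post_df. The extra slot is there because the query post itself usually ranks first.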

In [0]:
def give_rec(title, no_of_rec, sig=sig):
    # Get the row index of the post with this title
    idx = indices[title]

    # Pairwise similarity scores of every post against this one
    sig_scores = list(enumerate(sig[idx]))

    # Sort the posts by similarity score, highest first
    sig_scores = sorted(sig_scores, key=lambda x: x[1], reverse=True)

    # Keep no_of_rec + 1 entries, since the query post itself is usually among them
    sig_scores = sig_scores[0:no_of_rec + 1]

    # Row indices of the selected posts
    post_index = [i[0] for i in sig_scores]

    # Titles of the most similar posts
    return post_df['title'].iloc[post_index]
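give_rec() sorts a Python list of (index, score) pairs, and the query post is later dropped by title. A minimal NumPy sketch of the same top-k step, using a hypothetical helper `top_k_similar` and a made-up 4x4 similarity matrix, excludes the query post by position instead:

```python
import numpy as np

def top_k_similar(sim, idx, k):
    """Return the positions of the k rows most similar to row idx, excluding idx."""
    scores = sim[idx].astype(float)  # astype copies, so the caller's matrix is untouched
    scores[idx] = -np.inf            # never recommend the query post itself
    # Stable sort on negated scores: descending order, ties broken by position
    order = np.argsort(-scores, kind="stable")
    return order[:k]

# Hypothetical 4x4 similarity matrix
sim = np.array([
    [1.0, 0.2, 0.9, 0.5],
    [0.2, 1.0, 0.1, 0.3],
    [0.9, 0.1, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
])
print(top_k_similar(sim, 0, 2))
```

With the notebook's objects this would look like `post_df['title'].iloc[top_k_similar(sig, indices[title], 10)]`, assuming `indices` maps each title to a single row. Excluding by position also copes with posts whose category was empty: their all-zero TF-IDF rows give every post the same score, so the query post does not necessarily rank first.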

The function recommend_post() prints the titles of up to n posts recommended for the given title, skipping the query post itself.

In [0]:
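def recommend_post(title, n=10):
    # give_rec returns n + 1 titles because the query post is usually among them
    rec = give_rec(title, n).to_list()
    count = 0
    for item in rec:
        # Skip the query post and stop after n recommendations
        if item != title and count < n:
            count += 1
            print(item)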

Merging the users, views and posts DataFrames gives each user's viewing history. This history is then used to recommend further posts based on the post the user viewed most recently.

In [33]:
# Inner joins: keep only views whose user and post both exist
data_df = pd.merge(users, views, on='user_id')
data_df = pd.merge(data_df, post_df, on='post_id')
data_df = data_df.drop(columns=['user_id', 'gender', 'academics', 'post_id', 'category'])
data_df.head()

|   | name | timestamp | title | post_type |
|---|------|-----------|-------|-----------|
| 0 | Nivesh Singh Chauhan | 2020-05-31T18:01:54.308Z | Configure Docker with Django; PostgreSQL; Pg-a... | blog |
| 1 | Kanika Sharma | 2020-05-31T20:40:18.693Z | Configure Docker with Django; PostgreSQL; Pg-a... | blog |
| 2 | Asif Hossain | 2020-06-01T08:08:54.124Z | Configure Docker with Django; PostgreSQL; Pg-a... | blog |
| 3 | Parth Vijay | 2020-05-31T10:08:37.079Z | Configure Docker with Django; PostgreSQL; Pg-a... | blog |
| 4 | Nivesh Singh Chauhan | 2020-05-31T08:21:29.911Z | AWS services and how to launch OS on AWS Cloud | blog |
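`pd.merge` defaults to an inner join (`how='inner'`), so a view whose `user_id` or `post_id` has no match in users.csv or posts.csv is silently dropped from data_df. A quick sketch on hypothetical toy frames (all names and IDs are made up) shows the effect. An outer join with `indicator=True` then reveals which rows would be lost:

```python
import pandas as pd

# Hypothetical toy tables mirroring the users/views/posts columns
users_t = pd.DataFrame({"user_id": ["u1", "u2"], "name": ["Asha", "Ravi"]})
posts_t = pd.DataFrame({"post_id": ["p1", "p2"], "title": ["Firewall", "App Development"]})
views_t = pd.DataFrame({
    "user_id": ["u1", "u2", "u3"],  # u3 is not in users_t
    "post_id": ["p1", "p9", "p2"],  # p9 is not in posts_t
    "timestamp": ["2020-06-01T08:00:00.000Z"] * 3,
})

# Same two inner joins as the notebook: only the u1/p1 view survives
merged = pd.merge(users_t, views_t, on="user_id").merge(posts_t, on="post_id")
print(len(merged))

# An outer join with indicator=True labels rows present on only one side
check = pd.merge(views_t, users_t, on="user_id", how="outer", indicator=True)
print(check["_merge"].value_counts())
```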


The function recommend_user() recommends posts based on the post the user viewed most recently, i.e. the view with the latest timestamp.

In [0]:
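def recommend_user(user_name, n=10):
    # All views by this user (one row per view)
    curr = data_df.query('name == @user_name')
    if curr.empty:
        print("No viewing history found for", user_name)
        return
    # ISO 8601 timestamps in one fixed format sort chronologically as strings
    latest = curr['timestamp'].max()
    curr = curr.query('timestamp == @latest')
    for i in curr['title']:
        print("Recent Post viewed:-", i)
        print("\nRecommended for you:\n")
        recommend_post(i, n)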
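recommend_user() relies on ISO 8601 timestamps written in one fixed format (as in views.csv) sorting chronologically as plain strings. A sketch on hypothetical rows parses them explicitly with `pd.to_datetime` and then uses `groupby` with `idxmax` to find every user's most recent view in a single pass:

```python
import pandas as pd

# Hypothetical slice of data_df
views_demo = pd.DataFrame({
    "name": ["Asha", "Asha", "Ravi"],
    "timestamp": ["2020-05-31T18:01:54.308Z", "2020-06-01T08:08:54.124Z",
                  "2020-05-31T10:08:37.079Z"],
    "title": ["Firewall", "Cloud Computing", "App Development"],
})

# Parse the ISO 8601 strings into timezone-aware (UTC) datetimes
views_demo["ts"] = pd.to_datetime(views_demo["timestamp"], utc=True)

# Row label of each user's latest view, then the matching titles
latest_rows = views_demo.groupby("name")["ts"].idxmax()
latest = views_demo.loc[latest_rows, ["name", "title"]]
print(latest)
```

One difference from recommend_user(): `idxmax` keeps a single row when a user has several views with the same latest timestamp, whereas the query-based version loops over all of them.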


**Recommending similar posts for the given post**

In [35]:
recommend_post("Ml and AI",5)

6 Best + Free PLC Programming Training & Course [2020 UPDATED]
What sports will look like in the future
Artificial Intelligence
Types Of AI.
7 Best Python Data Science Courses & Certification [2020]


**Recommending posts for the given user**

In [36]:
recommend_user("Sahana B")

Recent Post viewed:- Understanding Cloud Computing(AWS)

Recommended for you:

Cloud Computing
AWS services and how to launch OS on AWS Cloud
8 Best Machine Learning Courses for 2020
Cloud Computing
Mobile Computing Technology
App Development
Stereoscopic and virtual-reality systems
Firewall
DATA AND MESSAGE SECURITY
Applications of mobile computing
