# Topic Based Recommender

1. Represent each article as a topic vector
2. Represent the user as the average topic vector of the articles they have read
3. Calculate cosine similarity between the user vector and every article
4. Get the recommended articles, excluding those already read

**Describing parameters**:

*1. PATH_ARTICLE_TOPIC_DISTRIBUTION: path to 'Article_Topic_Distribution.csv'* <br/>
*2. PATH_NEWS_ARTICLES: path to 'news_articles.csv'* <br/>
*3. NO_OF_TOPICS: number of topics specified when training the topic model; this is the dimension of each vector representing an article* <br/>
*4. ARTICLES_READ: list of Article_Ids read by the user* <br/>
*5. NO_RECOMMENDED_ARTICLES: number of articles to return as recommendations*

In [1]:
PATH_ARTICLE_TOPIC_DISTRIBUTION = "/home/phoenix/Documents/HandsOn/Final/python/Topic Model/model/Article_Topic_Distribution.csv"
PATH_NEWS_ARTICLES = "/home/phoenix/Documents/HandsOn/news_articles.csv"
NO_OF_TOPICS=150
ARTICLES_READ=[2,7]
NO_RECOMMENDED_ARTICLES=5

In [2]:
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

## 1. Represent articles in terms of Topic Vector

In [3]:
article_topic_distribution = pd.read_csv(PATH_ARTICLE_TOPIC_DISTRIBUTION)
article_topic_distribution.shape

(22186, 3)

***Generate Article-Topic Distribution matrix***

In [4]:
#Pivot the dataframe
article_topic_pivot = article_topic_distribution.pivot(index='Article_Id', columns='Topic_Id', values='Topic_Weight')
#Fill NaN with 0
article_topic_pivot.fillna(value=0, inplace=True)
#Get the values in dataframe as matrix
articles_topic_matrix = article_topic_pivot.values
articles_topic_matrix.shape

(4831, 150)
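
The pivot step above can be illustrated on a tiny frame. This is a minimal sketch with made-up weights; the column names match the notebook, while `toy`, `n_topics`, and the `reindex` step are assumptions added for illustration. Reindexing the columns guards against a topic that never appears in the CSV, which would otherwise leave the matrix with fewer than `NO_OF_TOPICS` columns.

```python
import pandas as pd

# Toy long-format data: one row per (article, topic) pair with non-zero weight.
# The values are invented for illustration only.
toy = pd.DataFrame({
    "Article_Id":   [0, 0, 1, 2, 2],
    "Topic_Id":     [0, 2, 1, 0, 1],
    "Topic_Weight": [0.7, 0.3, 1.0, 0.4, 0.6],
})
n_topics = 4  # stands in for NO_OF_TOPICS; topic 3 never occurs in the toy data

toy_pivot = (
    toy.pivot(index="Article_Id", columns="Topic_Id", values="Topic_Weight")
       .reindex(columns=range(n_topics))  # force one column per topic
       .fillna(0)                         # missing (article, topic) pairs get weight 0
)
toy_matrix = toy_pivot.to_numpy()
print(toy_matrix.shape)
```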

## 2. Represent user in terms of Topic Vector of read articles


***The user vector is the average of the topic vectors of the read articles***

In [5]:
# Select the rows of the read articles (assumes each Article_Id equals its row position in the matrix)
row_idx = np.array(ARTICLES_READ)
read_articles_topic_matrix=articles_topic_matrix[row_idx[:, None]]
# Average the topic vectors of the read articles to get the user vector
user_vector = np.mean(read_articles_topic_matrix, axis=0)
user_vector.shape

(1, 150)
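
The lookup above indexes the matrix by row position, which only matches `Article_Id` when the ids run 0..n-1 without gaps. If that does not hold, selecting by label on the pivoted frame is one alternative. A minimal sketch, where `pivot` and `articles_read` are hypothetical stand-ins for `article_topic_pivot` and `ARTICLES_READ`:

```python
import numpy as np
import pandas as pd

# Hypothetical pivot whose Article_Id labels are NOT 0..n-1.
pivot = pd.DataFrame(
    [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]],
    index=pd.Index([10, 20, 30], name="Article_Id"),
)
articles_read = [10, 30]

# .loc selects rows by Article_Id label rather than by position.
read_matrix = pivot.loc[articles_read].to_numpy()

# keepdims=True keeps the (1, n_topics) shape that cosine_similarity expects.
user_vec = read_matrix.mean(axis=0, keepdims=True)
print(user_vec.shape)
```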

## 3. Calculate cosine similarity between the user vector and all articles

In [6]:
# Cosine similarity between every article in the corpus and the user vector; shape (n_articles, 1)
cos_sim = cosine_similarity(articles_topic_matrix, user_vector)
# Row positions of all articles, sorted by descending similarity to the user
recommended_articles_id = cos_sim.ravel().argsort()[::-1]

In [7]:
recommended_articles_id

array([2843,    2, 3419, ..., 2342, 4461, 4830])

In [8]:
#Remove read articles from recommendations
final_recommended_articles_id = [article_id for article_id in recommended_articles_id if article_id not in ARTICLES_READ ][:NO_RECOMMENDED_ARTICLES]
final_recommended_articles_id

[2843, 3419, 2760, 3123, 3307]
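
The ranking and filtering steps above can also be combined by masking the read rows before sorting. The sketch below is an illustration, not the notebook's method: `top_k_unread` is a hypothetical helper, and the cosine score is computed directly with NumPy so the example needs no other dependency.

```python
import numpy as np

def top_k_unread(article_matrix, user_vector, read_rows, k):
    """Return row positions of the k unread articles most similar to the user."""
    user = np.asarray(user_vector).ravel()
    # Cosine similarity = dot product divided by the product of the norms.
    norms = np.linalg.norm(article_matrix, axis=1) * np.linalg.norm(user)
    scores = (article_matrix @ user) / np.maximum(norms, 1e-12)  # avoid division by zero
    # Push already-read articles to the bottom of the ranking.
    scores[list(read_rows)] = -np.inf
    return np.argsort(scores)[::-1][:k].tolist()

# Toy data: four articles over two topics; the user has read article 0.
toy_matrix = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.6, 0.4]])
toy_user = toy_matrix[[0]]
top_two = top_k_unread(toy_matrix, toy_user, read_rows=[0], k=2)
print(top_two)
```

Masking avoids the Python-level membership test on every ranked id, which may matter once `ARTICLES_READ` grows large.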

## 4. Get the recommended articles

In [9]:
# Titles of the read articles and of the recommended articles
news_articles = pd.read_csv(PATH_NEWS_ARTICLES)
print('Articles Read')
print(news_articles.loc[news_articles['Article_Id'].isin(ARTICLES_READ), 'Title'])
print('\n')
print('Recommender ')
print(news_articles.loc[news_articles['Article_Id'].isin(final_recommended_articles_id), 'Title'])

Articles Read
2    US  South Korea begin joint military drill ami...
7    Dialogue crucial in finding permanent solution...
Name: Title, dtype: object


Recommender 
2760    Rajnath Singh s security is Pak s responsibili...
2843    Siachen avalanche  Indian Army says missing so...
3123    Military Plane Crashes Outside Seville Airport...
3307    Europe survives  year of hell   but worse expe...
3419    Jammu   Kashmir  Army Indicts 9 Soldiers for K...
Name: Title, dtype: object
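
Note that the titles above appear in DataFrame order (2760, 2843, ...) rather than in similarity order (2843, 3419, 2760, ...), because `isin` filters rows without reordering them. If the ranking should be preserved, one option is to index by `Article_Id` and select with the ranked list. A minimal sketch on made-up data, where `toy_news` and `ranked_ids` are hypothetical stand-ins for `news_articles` and `final_recommended_articles_id`:

```python
import pandas as pd

# Toy stand-in for news_articles.csv; titles are placeholders.
toy_news = pd.DataFrame({
    "Article_Id": [0, 1, 2, 3],
    "Title": ["A", "B", "C", "D"],
})
ranked_ids = [3, 1]  # most similar first

# .loc with a list of labels returns rows in the order of that list.
titles = toy_news.set_index("Article_Id").loc[ranked_ids, "Title"]
print(titles.tolist())
```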


# Topic + NER Based Recommender

1. Represent the user as <br/>
        Alpha * [Topic Vector] + (1 - Alpha) * [NER Vector] <br/>
   where <br/>
   Alpha is a weight in [0, 1] <br/>
   [Topic Vector] => topic vector representation of the concatenated read articles <br/>
   [NER Vector]   => topic vector representation of the named entities (NERs) associated with the concatenated read articles <br/>
2. Calculate cosine similarity between the user vector and the article topic matrix
3. Get the recommended articles

In [13]:
ALPHA = 0.5
DICTIONARY_PATH = "/home/phoenix/Documents/HandsOn/Final/python/Topic Model/model/dictionary_of_vocabulary.p"
LDA_MODEL_PATH = "/home/phoenix/Documents/HandsOn/Final/python/Topic Model/model/lda.model"

In [12]:
#Select user in terms of read article topic distribution
row_idx = np.array(ARTICLES_READ)
read_articles_topic_matrix=articles_topic_matrix[row_idx[:, None]]
#Calculate the average of read articles topic vector 
user_topic_vector = np.mean(read_articles_topic_matrix, axis=0)
user_topic_vector.shape

(1, 150)
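
The notebook stops before the NER vector is computed, so the blending step is only sketched here. `blend_user_vector`, `toy_topic`, and `toy_ner` are hypothetical names and made-up values; how the NER vector would actually be derived (presumably with the saved dictionary and LDA model) is not shown in the source and is left as an assumption.

```python
import numpy as np

def blend_user_vector(topic_vector, ner_vector, alpha):
    """Weighted combination: alpha * topic_vector + (1 - alpha) * ner_vector."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * np.asarray(topic_vector) + (1.0 - alpha) * np.asarray(ner_vector)

# Toy vectors over three topics; both must have the same shape.
toy_topic = np.array([[0.8, 0.2, 0.0]])  # stands in for user_topic_vector
toy_ner = np.array([[0.2, 0.2, 0.6]])    # hypothetical NER-derived topic vector
blended = blend_user_vector(toy_topic, toy_ner, alpha=0.5)
print(blended.shape)
```

With `alpha=1` the result reduces to the plain topic-based user vector, and with `alpha=0` only the named entities drive the recommendation.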

In [None]:
# 2