
Project repo for an ALS collaborative filtering recommender system on news data


sukilau/demo-news-recommender


News Feed Recommender System

This repo demonstrates how to build a news feed collaborative filtering recommender system using Implicit, implement topic modeling using Gensim, and auto-tag news feeds using a Random Forest Classifier.

News Feed Recommender System

  • See demo-recommender.ipynb

  • Collaborative filtering is commonly used to recommend items to users based on user-item interactions. In particular, Alternating Least Squares (ALS) has been shown to perform well on implicit data, i.e. user behaviour towards items that is observed without explicit ratings or actions such as likes or dislikes (e.g. the number of times a user plays a song). Implicit provides a fast Cython implementation that speeds up ALS in deployment, as well as built-in functions for recommending items and finding similar items. This notebook demonstrates how to implement ALS for a news feed recommender system using the Implicit library.
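To make the ALS idea above concrete, here is a minimal dense-matrix sketch of ALS for implicit feedback written directly in NumPy. It is not the Implicit library's API and not the notebook's code. It assumes the confidence weighting c = 1 + alpha * r with binary preferences p = (r > 0), as in Hu, Koren and Volinsky's implicit-feedback formulation. The function names `implicit_als`, `recommend` and `similar_items` and all hyperparameter values are hypothetical, chosen only for illustration.

```python
import numpy as np


def implicit_als(R, factors=2, reg=0.1, alpha=10.0, iterations=15, seed=0):
    """Dense sketch of ALS for implicit feedback (illustrative, not Implicit's API).

    R: (n_users, n_items) array of non-negative interaction counts, e.g. clicks.
    Returns user factors X (n_users, factors) and item factors Y (n_items, factors).
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    X = rng.normal(scale=0.1, size=(n_users, factors))  # user factors
    Y = rng.normal(scale=0.1, size=(n_items, factors))  # item factors
    P = (R > 0).astype(float)  # binary preference: did the user interact at all?
    C = 1.0 + alpha * R        # confidence grows with the interaction count
    reg_eye = reg * np.eye(factors)

    def solve(fixed, C_rows, P_rows):
        # For each row i: (F^T C_i F + reg*I) x_i = F^T C_i p_i
        out = np.empty((C_rows.shape[0], factors))
        for i in range(C_rows.shape[0]):
            c_i = C_rows[i]
            A = fixed.T @ (c_i[:, None] * fixed) + reg_eye
            b = fixed.T @ (c_i * P_rows[i])
            out[i] = np.linalg.solve(A, b)
        return out

    for _ in range(iterations):
        X = solve(Y, C, P)      # fix item factors, solve every user
        Y = solve(X, C.T, P.T)  # fix user factors, solve every item
    return X, Y


def recommend(X, Y, R, user, n=2):
    """Top-n unseen items for `user`, ranked by predicted preference x_u . y_i."""
    scores = X[user] @ Y.T
    scores[R[user] > 0] = -np.inf  # never recommend items already read
    return [int(i) for i in np.argsort(-scores)[:n]]


def similar_items(Y, item, n=2):
    """Top-n items whose factor vectors have the highest cosine similarity."""
    norms = np.linalg.norm(Y, axis=1)
    sims = (Y @ Y[item]) / (norms * norms[item] + 1e-12)
    sims[item] = -np.inf  # exclude the query item itself
    return [int(i) for i in np.argsort(-sims)[:n]]


if __name__ == "__main__":
    # Toy click counts: users 0-2 read items 0-2, users 3-5 read items 3-5.
    clicks = np.array([
        [3, 1, 0, 0, 0, 0],
        [2, 0, 1, 0, 0, 0],
        [0, 2, 3, 0, 0, 0],
        [0, 0, 0, 1, 2, 0],
        [0, 0, 0, 3, 0, 1],
        [0, 0, 0, 0, 2, 2],
    ], dtype=float)
    X, Y = implicit_als(clicks)
    print(recommend(X, Y, clicks, user=0, n=2))
    print(similar_items(Y, item=3, n=2))
```

A dense per-row loop like this is only practical for toy data. For a real news corpus, a library such as Implicit that works on sparse interaction matrices is the better choice.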

Topic Modeling for News Feed

  • See demo-lda.ipynb
  • Latent Dirichlet Allocation (LDA) is a generative model commonly used for topic modeling. LDA assumes that each document is generated by a mixture of latent topics, and that each topic has its own probability distribution over words. In our example, a news feed can be viewed as a mixture of latent topics. Our goal is to count the occurrences of the top words in each news feed and, from this word-count vector, predict the topics most likely to have generated it. This notebook demonstrates how to implement LDA topic modeling for news text data using Gensim. Since we have ground-truth topic labels for the news, we have also evaluated the LDA model: the trained model achieves 76.7% accuracy.
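LDA topics are unlabeled, so scoring them against ground-truth categories requires a mapping from topics to labels. One common approach, sketched below as an assumption rather than the notebook's exact method, is to assign each document its most probable topic and then map each topic to the ground-truth category that occurs most often among its documents. The input format mirrors the list of `(topic_id, probability)` pairs returned for one document by Gensim's `LdaModel.get_document_topics`. The helper names `dominant_topic` and `topic_accuracy` are hypothetical.

```python
from collections import Counter, defaultdict


def dominant_topic(doc_topics):
    """Return the topic id with the highest probability.

    doc_topics: list of (topic_id, probability) pairs for one document.
    """
    return max(doc_topics, key=lambda tp: tp[1])[0]


def topic_accuracy(all_doc_topics, true_labels):
    """Map each topic to its majority ground-truth label, then compute accuracy."""
    predicted = [dominant_topic(dt) for dt in all_doc_topics]

    # Count which ground-truth labels fall under each predicted topic.
    labels_per_topic = defaultdict(Counter)
    for topic, label in zip(predicted, true_labels):
        labels_per_topic[topic][label] += 1

    # Each topic is named after the label it captures most often.
    topic_to_label = {t: c.most_common(1)[0][0] for t, c in labels_per_topic.items()}
    correct = sum(topic_to_label[t] == y for t, y in zip(predicted, true_labels))
    return correct / len(true_labels), topic_to_label
```

Majority mapping is simple but lets several topics share the same label. If a strict one-to-one mapping is needed, a matching step (e.g. Hungarian assignment) would replace the majority vote.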

Auto Tagging for News Feed

  • See demo-tagging.ipynb
  • This notebook demonstrates how to preprocess news feed text and train a Random Forest Classifier to auto-tag news feeds with categories such as sports and politics. To preprocess the Chinese text, we use standard NLP techniques (the same as for English), such as stopword removal and TF-IDF transformation. We also segment the Chinese text into words using Jieba (a fast Python implementation of Chinese text segmentation). We then train a Random Forest Classifier on the resulting numeric representation of the news feeds. The model achieves 99.6% accuracy on the test set.
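The stopword removal and TF-IDF steps above can be sketched in plain Python. This sketch assumes each article has already been segmented into words, for example with `jieba.lcut(text)`, and that a stopword set is available. The `tfidf` function is a hypothetical helper, not code from the notebook. It uses the smoothed IDF log((1 + N) / (1 + df)) + 1 and L2-normalises each vector, which matches scikit-learn's `TfidfVectorizer` defaults.

```python
import math
from collections import Counter


def tfidf(docs, stopwords):
    """Compute L2-normalised TF-IDF vectors for pre-segmented documents.

    docs: list of token lists, e.g. jieba.lcut(text) for each article.
    Returns (vocabulary, vectors) where each vector maps term -> weight.
    """
    # Drop stopwords and whitespace-only tokens (segmenters often emit spaces).
    cleaned = [[t for t in doc if t not in stopwords and t.strip()] for doc in docs]
    n_docs = len(cleaned)

    # Document frequency: number of articles containing each term.
    df = Counter(term for doc in cleaned for term in set(doc))
    # Smoothed IDF keeps every weight positive, even for terms in all documents.
    idf = {term: math.log((1 + n_docs) / (1 + d)) + 1 for term, d in df.items()}

    vectors = []
    for doc in cleaned:
        counts = Counter(doc)
        total = len(doc) or 1
        vec = {t: (c / total) * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return sorted(idf), vectors
```

In practice these vectors would be assembled into a document-term matrix (which `TfidfVectorizer` produces directly) and passed to a classifier such as scikit-learn's `RandomForestClassifier`.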
