Skip to content

eugenelin89/recommender_content_based

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Recommendation System with Content Based Filtering

Please see project page at:
http://eugenelin89.github.io/recommender_content_based/

SETUP

The system is built with LensKit, an open-source took kit for building recommenders.

Requires the following:

Java SE Development Kit 7 (http://www.oracle.com/technetwork/java/javase/downloads/index.html)
LensKit  (http://lenskit.grouplens.org)
Apache Maven (https://maven.apache.org)
Java IDE such as Eclipse (http://www.eclipse.org/) or IntelliJ IDEA (http://www.jetbrains.com/idea/).

BACKGROUND
This recommendation system prototype uses content-based filtering.  For detailed background, please refer to:
http://recommender-systems.org/content-based-filtering/

The algorithm implemented in this prototype is composed of the following steps: 

1. Compute item-tag vectors (the model)

Implemented in the model builder (TFIDFModelBuilder, in the get() method), compute the unit-normalized TF-IDF vector for each movie in the data set. The model contains a mapping of item IDs to TF-IDF vectors, normalized to unit vectors, for each item.

2. Build user profile for each query user

The makeUserVector(long) method of TFIDFItemScorer takes a user ID and produces a vector representing that user's profile. In the original implementation, the profile was the sum of the item-tag vectors of all items the user has rated positively (>= 3.5 stars). This approach was later improved with weighted user profile (with the older implementation commented out for reference). Weighted profile is computed with weighted sum of the item vectors for all items, with weights being based on the user's rating.

3. Generate item scores for each user

The heart of the recommendation process in many LensKit recommenders is the score method of the item scorer, in this case TFIDFItemScorer. This method scores each item by using cosine similarity: the score for an item is the cosine between that item's tag vector and the user's profile vector.

TEST DRIVE

A set of test data is provided for movie ratings, but can be easily adopted for other domains.

data/movie-tags.csv
Attributes associated with each movie. This is the basis of the model generated in step 1.

data/movie-titles.csv
Maps Movie IDs to Movie Titles.

data/ratings.csv
Users and their movie ratings. Each line of the CSV file is ordered as: User ID, Movie ID, Rating

data/users.csv
Maps User ID to User Name.

The test data is injected into the system in CBFMain.java in the method configureRecommender().

Run the recommender with command similar to the following, where the arguments are the user IDs:
runecbf 4045 144 3855 1637 2919

With the above command, the following output is produced.

recommendations for user 4045:
  807: 0.1932
  63: 0.1438
  187: 0.0947
  11: 0.0900
  641: 0.0471
recommendations for user 144:
  11: 0.1394
  585: 0.1229
  671: 0.1130
  672: 0.0878
  141: 0.0436
recommendations for user 3855:
  1892: 0.2243
  1894: 0.1465
  604: 0.1258
  462: 0.1050
  10020: 0.0898
recommendations for user 1637:
  393: 0.1976
  24: 0.1900
  2164: 0.1522
  601: 0.1334
  5503: 0.0992
recommendations for user 2919:
  180: 0.1454
  11: 0.1238
  1891: 0.1172
  424: 0.1074
  2501: 0.0973



About

A Recommender System Prototype using Content Based Filtering

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages