Skip to content

mfranco/nt-recommend

master
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
src
 
 
 
 
 
 
 
 
 
 

nt-recommend: Experimentation Framework for Learning Recommender Systems

This project is a basic recommender system experimentation framework built in python3. It covers collaborative filtering initially, but it can be extended to support other recommendation techniques. It is not compatible with python2.

Disclaimer

This framework is only for educational proposes, we use basic native python data structures, we have implemented all the algorithms by ourselves with the only aim of learning, therefore, this project is not suitable for production environments.

Installation

We recommend to install this framework inside a python virtual environment. If you don't have virtualenv in your machine, you can install it with the following command:

pip install virtualenv

Now you are ready to create a virtualenv, it is very important to make sure that is a python3 and not python2 virtualenv:

mkvirtualenv nt-recommend -p /usr/bin/python3
workon nt-recommend

After creating the virtualenv, you can install the required libraries with the following command:

pip install -r tools/requirements/dev.txt

After installing all the packges you will need to execute the following command to dowload some libraries for Natural Language Processing:

 python -m nltk.downloader all

MovieLens Dataset

The MovieLens Dataset is a collection of rating datasets suitable for research and education. It contains about 20 million ratings and 465,000 tag applications applied to 27,000 movies by 138,000 users.

We use the reduced version of this dataset that contains 100,000 ratings and 1,300 tag applications applied to 9,000 movies by 700 users.

Initial Data analysis

The following command generates basic statistics about the dataset:

python manage.py initial_diagnose

General Statistics

Total Users Total Movies Total Tags Matrix Density
671 9125 1296 0.0163

Statistics by User and Ratings

Max Number of Ratings for User Min Number of Ratings for User Median Ratings for User Mean Ratings for User Standard Deviation Ratings for User
2391 20 71 149 231

Statistics by Movie and Ratings:

Max Number of Ratings for Movie Min Number of Ratings for Movie Median Ratings for Movie Mean Ratings for Movie Standard Deviation Ratings for Movie
341 0 3 11 24

Total number of ratings for each of the ratings

0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
1101 3326 1687 7271 4449 20064 10538 28750 7723 15095

Statistics by Tags and User

Max Number of Tags by User Min Number of Tags by User Median Number of Tags by User Mean Number of Tags by User Standard Deviation Tags by User
401 0 0 1.93 18.6

Statistics by Tags and Movie

Max Number of Tags by Movie Min Number of Tags by Movie Median Number of Tags by Movie Mean Number of Tags by Movie Standard Deviation Tags by Movie
25 0 0 0.142 0.763

Executing Unit Test:

cd src
workon nt-recommend
python manage.py test

Running the Experiments

Benchmarking Mean Predictor

python3 manage.py mean_predictor command benchmarks the performance of mean predictor. The following parameters can be used to customize the execution:

  • --kn number of folds: by default will execute the benchmark in a leave-one-out style.

  • --t ratings threshold: default value is 2, it specifies the minimum number of ratings by movie required to include that movie in the computations.

The following command run a K-Fold, leave-one-out experiment for mean predictor with default parameters:

cd src
workon nt-recommend
python manage.py mean_predictor

The previous command is computational expensive. A more realistic approach is to run a 10-Fold validation experiment:

cd src
workon nt-recommend
python manage.py mean_predictor --kn 10 --t 1

Benchmarking Collaborative Filtering

python3 manage.py collaborative_filter command benchmarks performance of collaborative filtering. The following paramters can be used to mofify predictor behaviour:

  • --kn number of folds: by default will execute the benchmark in a leave-one-out style.

  • --m similarity metric: valid values are: cosine, euclidean, msd, pearson

  • --ns neighbourhood size

  • --t ratings threshold: default value is 2, it specifies the minimum number of ratings by movie required to include that movie in the computations.

The following command executes 10-fold validation for collaborative filtering with msd similarity metric and a neighbourhood size of 10:

python manage.py collaborative_filter --n 1 --m cosine --ns 10 --kn 10 

The following command executes 10-fold validation for collaborative filtering with msd similarity metric and a neighbourhood size of 10:

python manage.py collaborative_filter --n 1 --m msd --ns 10 --kn 10 

The following command executes 10-fold validation for collaborative filtering with euclidean distance similarity metric and a neighbourhood size of 10:

python manage.py collaborative_filter --n 1 --m euclidean --ns 10 --kn 10

The following command executes 10-fold validation for collaborative filtering with similarity based in Pearson Correlation and a neighbourhood size of 10:

python manage.py collaborative_filter --n 1 --m pearson --ns 10 --kn 10

The following command executes 10-fold validation for collaborative filtering with msd similarity metric and a neighbourhood size of 100:

python manage.py collaborative_filter --n 1 --m msd --ns 100 --kn 10 

Benchmarking Resnik Collaborative Filtering

python3 manage.py resnik_collaborative_filter command benchmarks performance of collaborative filtering with reskink formula.

The following parameters can be used to mofify predictor behaviour:

  • --kn number of folds: by default will execute the benchmark in a leave-one-out style.

  • --m similarity metric: valid values are: cosine, euclidean, msd, pearson

  • --ns neighbourhood size

The following command executes 10-fold validation for resnik collaborative filtering with msd similarity metric and a neighbourhood size of 10:

python manage.py resnik_collaborative_filter  --m msd --ns 10 --kn 10 

Benchmarking Frequent Item Recommender

python manage.py frequent_item_recommender command bechmarks performace of recommendations based on most frequent items.

The following parameters can be used to mofify predictor behaviour:

  • --kn number of folds: by default will execute the benchmark in a leave-one-out style.

  • --m similarity metric: valid values are: cosine, euclidean, msd, pearson

  • --ns neighbourhood size

The following command executes 10-fold validation for frequent items recommendation with a neighbourhood size of 10 and pearson correlation for knn:

python manage.py frequent_item_recommender --ns 10 --kn 10 --m pearson

Benchmarking Linked Item Recommender

python manage.py linked_item_recommender command bechmarks performace of recommendations based on mean of the ratings across neighbours

The following parameters can be used to mofify predictor behaviour:

  • --kn number of folds: by default will execute the benchmark in a leave-one-out style.

  • --m similarity metric: valid values are: cosine, euclidean, msd, pearson

  • --ns neighbourhood size

The following command executes 10-fold validation for linked items recommendation with a neighbourhood size of 10 and pearson correlation for knn:

python manage.py linked_item_recommender --ns 10 --kn 10  --m pearson

Benchmarking Mean Predictor Recommender

python manage.py mean_predictor_recommender command bechmarks performance of recommendations based on mean predictons.

The following parameters can be used to modify predictor behaviour:

  • --kn number of folds: by default will execute the benchmark in a leave-one-out style.

  • --m similarity metric: valid values are: cosine, euclidean, msd, pearson

  • --ns neighbourhood size

The following command executes 10-fold validation for mean prediction recommendation with a neighbourhood size of 10 and perason correlation:

python manage.py mean_predictor_recommender --ns 10 --kn 10 --m pearson

Benchmarking Collaborative Predictor Recommender

python manage.py collaborative_predictor_recommender command bechmarks performance of recommendations based on collaborative filtering predictons.

The following parameters can be used to modify predictor behaviour:

  • --kn number of folds: by default will execute the benchmark in a leave-one-out style.

  • --m similarity metric: valid values are: cosine, euclidean, msd, pearson

  • --ns neighbourhood size

The following command executes 10-fold validation for collaborative filtering recommendation with a neighbourhood size of 10 and pearson correlation:

python manage.py collaborative_predictor_recommender --ns 10 --kn 10 --m pearson

Benchmarking Resnik Predictor Recommender

python manage.py resnik_predictor_recommender command bechmarks performance of recommendations based on Resnik collaborative filtering predictions.

The following parameters can be used to modify predictor behaviour:

  • --kn number of folds: by default will execute the benchmark in a leave-one-out style.

  • --m similarity metric: valid values are: cosine, euclidean, msd, pearson

  • --ns neighbourhood size

The following command executes 10-fold validation for Resnik collaborative filtering recommendation with a neighbourhood size of 10 and pearson correlation:

python manage.py resnik_predictor_recommender --ns 10 --kn 10 --m pearson

Resources

About

Experimentation Framework for Learning Recommender Systems

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published