gcbrown/easey

EASEy

An implementation of Embarrassingly Shallow Autoencoders (EASE).

EASE is a state-of-the-art prediction model for collaborative filtering on implicit feedback.

When (and when not) to use EASE

EASE consistently performs near the top of recommender system benchmarks (see live benchmark). It outperforms many deep learning and graph-based approaches (see paper).

EASE is best when the number of items is relatively small, because the most computationally expensive part of training is inverting an item × item co-occurrence matrix, a cost that grows cubically with the number of items. The good news is that this complexity is independent of the number of users or interactions.
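For context, the closed-form training procedure is short enough to sketch in a few lines of numpy. This is a dense illustration of the math from the paper, not easey's implementation:

```python
import numpy as np

def ease_fit(X, lam=500.0):
    """Closed-form EASE training on a dense user x item matrix X."""
    n_items = X.shape[1]
    G = X.T @ X                    # item x item co-occurrence (Gram) matrix
    G += lam * np.eye(n_items)     # L2 regularization on the diagonal
    P = np.linalg.inv(G)           # the expensive step: cubic in n_items
    B = P / (-np.diag(P))          # closed-form item-item weight matrix
    np.fill_diagonal(B, 0.0)       # an item may not predict itself
    return B

# Toy interaction matrix: 3 users x 4 items.
X = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 1, 1]], dtype=float)
B = ease_fit(X, lam=1.0)
scores = X @ B  # predicted relevance for every user-item pair
```

Note that only the item dimension appears in the inversion; adding more users only grows the cheap Gram-matrix step.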

Unlike more complex models, EASE also doesn't take any item or user features into account - it uses interactions only.

Given these two constraints, EASE is a great tool for:

  • Standalone prediction - Raw EASE scores are highly predictive
  • Candidate generation - Limit the item space to a set of relevant candidates per user
  • Feature engineering - EASE scores can be used in downstream models (e.g., a classification GBM)

Installation

pip install easey       # works on all machines
pip install easey[mkl]  # faster training and inference on Intel CPU

EASEy depends on scipy and numpy for sparse and dense matrix operations.

sparse_dot_mkl is an optional dependency that uses the Intel MKL library for parallel computation of the gram matrix (X^TX) and for faster dot products. It is generally 2-4x faster than baseline scipy.
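As a rough sketch of the step that sparse_dot_mkl accelerates, here is the baseline Gram-matrix computation with plain scipy (the shapes and density below are arbitrary examples):

```python
import numpy as np
from scipy import sparse

# Sparse user x item interaction matrix.
X = sparse.random(10_000, 500, density=0.01, format="csr", random_state=0)

# Baseline Gram matrix (X^T X) via scipy's sparse matrix product.
# With the [mkl] extra installed, sparse_dot_mkl's dot_product_mkl(X.T, X)
# computes the same product in parallel.
G = np.asarray((X.T @ X).todense())
print(G.shape)  # (500, 500): item x item, regardless of user count
```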

It is simplest to install sparse_dot_mkl with conda because this ensures that MKL is linked properly. Otherwise, you will have to install the Intel MKL package on your system.

Usage

EASEy is compatible with both pandas and polars DataFrames. Technically, it works with any object whose array-like values are accessible with index [] syntax - even a basic dict. The EASE class has two public methods - fit and predict - for training and inference, respectively.
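Since easey's exact signatures aren't shown here, the following is a minimal self-contained sketch of what that fit/predict pattern looks like on dict-style interaction data. The class and method names are illustrative only, not easey's actual API:

```python
import numpy as np

class TinyEASE:
    """Illustrative fit/predict interface; not the real easey API."""

    def __init__(self, lam=500.0):
        self.lam = lam

    def fit(self, user_ids, item_ids):
        n_users, n_items = max(user_ids) + 1, max(item_ids) + 1
        X = np.zeros((n_users, n_items))
        X[user_ids, item_ids] = 1.0        # implicit feedback: seen = 1
        G = X.T @ X + self.lam * np.eye(n_items)
        P = np.linalg.inv(G)
        self.B = P / (-np.diag(P))
        np.fill_diagonal(self.B, 0.0)
        self.X = X
        return self

    def predict(self, user_id, k=3):
        scores = self.X[user_id] @ self.B
        scores[self.X[user_id] > 0] = -np.inf  # mask already-seen items
        return np.argsort(scores)[::-1][:k]    # top-k unseen items

# Dict with array-like values, as the text above describes.
interactions = {"user_id": [0, 0, 1, 1, 2, 2, 2],
                "item_id": [0, 1, 0, 2, 1, 2, 3]}
model = TinyEASE(lam=1.0).fit(interactions["user_id"],
                              interactions["item_id"])
recs = model.predict(user_id=0, k=2)  # recommends from items 2 and 3
```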

EASE has only one hyperparameter, lambda, which controls L2 regularization. In the original paper, values from 200 to 1,000 were found to be optimal. Lower values favor long-tail recommendations at the risk of overfitting; higher values push recommendations toward popular items.

See movielens_example.ipynb for a simple training and inference example.
