
Multi-way Interacting Regression via Factorization Machines

This is a Python 2 implementation of the MiFM algorithm for interaction discovery and prediction (M. Yurochkin, X. Nguyen, N. Vasiloglou, NIPS 2017). Code written by Mikhail Yurochkin.


This repository demonstrates MiFM on the Abalone data.

First compile the Cython code in the cython folder. On Ubuntu run:

cython g_c_sampler.pyx
python setup.py build_ext --inplace

The Cython module implements the Gibbs sampling updates and the prediction function.

prediction/ Python wrapper for the Cython code to aggregate MCMC samples for prediction

py_scripts/ Python wrappers for the Cython code: running Gibbs sampling; Gibbs sampling for hyperpriors and initialization; the MiFM class; data preprocessing; posterior analysis of interactions. A demo script downloads the Abalone dataset and shows how to use MiFM and extract interactions.

The implementation is designed to be used in interactive mode (e.g. a Python IDE such as Spyder).

Usage guide

MiFM(K=5, J=50, it=700, lin_model=True, alpha=1., verbose=False, restart=5, restart_iter=50, thr=300, rate=25, ncores=1, use_mape=False)


K: rank of matrix of coefficients V

J: number of interactions (columns) in Z

it: number of Gibbs sampling iterations

lin_model: whether to include linear effects (w_1,...,w_D)

alpha: FFM_alpha parameter. Smaller values encourage deeper interactions

verbose: whether to print intermediate RMSE train scores

restart and restart_iter: number of initializations to try, with restart_iter iterations each; the best initialization by training RMSE is then used for fitting

ncores: how many cores to use for initialization with restarts

use_mape: whether to use AMAPE instead of RMSE to select best initialization

thr: number of MCMC iterations after which samples are collected (i.e. burn-in)

rate: thinning interval; every rate-th iteration after burn-in is saved
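The interplay of it, thr, and rate can be sketched as a sample-retention schedule. This is illustrative only; the exact off-by-one conventions in the Cython sampler may differ:

```python
# Sketch of the sample-retention schedule implied by it, thr and rate.
# NOTE: the exact indexing in the actual Gibbs sampler may differ.
def saved_iterations(it=700, thr=300, rate=25):
    """Iterations whose state is kept: after burn-in (thr), every rate-th."""
    return [t for t in range(it) if t >= thr and (t - thr) % rate == 0]

kept = saved_iterations()
print(len(kept))  # number of MCMC samples available for prediction
```

With the defaults this keeps iterations 300, 325, ..., 675, i.e. 16 samples.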


fit(X, y, cat_to_v, v_to_cat)

X: training data after one-hot encoding

y: response

cat_to_v: list mapping categories to their values after one-hot encoding (see the Abalone example)

v_to_cat: dictionary mapping values to their categories before one-hot encoding (see the Abalone example)
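The exact format of these maps is best seen in the Abalone demo; one plausible construction, assuming cat_to_v lists the one-hot columns produced by each original variable and v_to_cat maps each column back to its source variable, is:

```python
# Hypothetical construction of the two encoding maps; the exact format
# expected by fit() is shown in the repository's Abalone demo.
def one_hot_maps(n_values_per_feature):
    """n_values_per_feature: number of distinct values of each original
    categorical variable, in order. Returns (cat_to_v, v_to_cat)."""
    cat_to_v = []   # original variable -> list of one-hot column indices
    v_to_cat = {}   # one-hot column index -> original variable index
    col = 0
    for cat, n in enumerate(n_values_per_feature):
        cols = list(range(col, col + n))
        cat_to_v.append(cols)
        for c in cols:
            v_to_cat[c] = cat
        col += n
    return cat_to_v, v_to_cat

cat_to_v, v_to_cat = one_hot_maps([3, 2])  # e.g. sex (3 values), size (2)
```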

fit returns a list of MCMC samples. Each sample is a list [bias, linear coefficients, V, Z].
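Assuming the MiFM regression function from the paper, y(x) = bias + sum_i w_i x_i + sum_j sum_k prod_{i in Z_j} V[i, k] x_i, a single sample can be evaluated as below. This is a sketch only; the repository's actual prediction runs in the Cython code:

```python
import numpy as np

# Sketch of evaluating one MCMC sample [bias, w, V, Z] on an input x,
# assuming the MiFM regression function
#   y(x) = bias + sum_i w_i x_i + sum_j sum_k prod_{i in Z_j} V[i, k] x_i
# where Z is a binary D x J matrix whose columns select interacting variables.
def sample_predict(bias, w, V, Z, x):
    linear = float(np.dot(w, x))
    interactions = 0.0
    D, J = Z.shape
    for j in range(J):
        idx = np.nonzero(Z[:, j])[0]  # variables in interaction j
        if len(idx) == 0:
            continue
        # product over selected variables, summed over the K latent factors
        interactions += np.prod(V[idx, :] * x[idx, None], axis=0).sum()
    return bias + linear + interactions
```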

predict(X)

Note: can only be used on a fitted object. Returns predicted values for test data X using a Monte Carlo estimator of the mean response.
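The Monte Carlo step itself is just a mean over the retained samples' predictions, sketched here with a hypothetical per-sample predictor f (not a function from the repository):

```python
import numpy as np

# Monte Carlo mean-response estimator over retained MCMC samples.
# `f` is a hypothetical per-sample predictor: f(sample, x) -> scalar.
def mc_predict(samples, X, f):
    preds = np.array([[f(s, x) for x in X] for s in samples])
    return preds.mean(axis=0)  # posterior-mean estimate per test point
```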

score(X, y)

Makes predictions for X and computes RMSE (or AMAPE, if use_mape=True) against y.
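RMSE is the standard root-mean-squared error, sketched below; the exact AMAPE formula used by score is defined in the repository's code, so it is not reproduced here:

```python
import numpy as np

# Standard RMSE between observed and predicted responses.
def rmse(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```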