# Python APIs for the SLIM package

## Installation
Before installing the Python APIs for the SLIM package, please make sure that the SLIM library has been built correctly. To install the Python APIs, follow one of the two options below.

1. Install the Python package system-wide (this requires root permission):
> sudo python setup.py install

2. Install the Python package for the current user only (no sudo privileges required):
> python setup.py install --user

## Data Interface
The SLIM package is able to load
1. ijv triplets data in the form of {userid, itemid, rating} from:
    - Pandas DataFrame (pandas.DataFrame)
    - NumPy 2d arrays (numpy.array)
    - list(list) (list[list[i, j, v]])
2. csr matrix from:
    - SciPy 2D sparse csr matrix (scipy.sparse.csr.csr_matrix)

The data is stored in a SLIMatrix object. The SLIM package **ONLY** accepts SLIMatrix objects as training and validation sets.

In [1]:
# load ijv triplets from pandas DataFrames
import pandas as pd
from SLIM import SLIMatrix

traindata = pd.read_csv('../test/AutomotiveTrain.ijv', 
                        delimiter=' ', 
                        header=None, 
                        names=['userid', 'itemid', 'rating'])
trainmat = SLIMatrix(traindata)

In [2]:
# load ijv triplets from numpy 2d arrays
import numpy as np
from SLIM import SLIMatrix

traindata = np.genfromtxt('../test/AutomotiveTrain.ijv', delimiter=' ')
trainmat = SLIMatrix(traindata)

In [3]:
# load ijv triplets from list[list]
def load_data(fname, delimiter=' ', header=False):
    data = []
    # the context manager closes the file even if parsing fails
    with open(fname) as f:
        if header:
            # skip the header line
            next(f, None)
        for line in f:
            line = line.strip()
            if not line:
                # ignore blank lines
                continue
            triplet = line.split(delimiter)
            data.append([triplet[0], triplet[1], float(triplet[2])])

    return data

from SLIM import SLIMatrix

traindata = load_data('../test/AutomotiveTrain.ijv', delimiter=' ', header=False)
trainmat = SLIMatrix(traindata)

In [4]:
# load csr matrix from scipy 2d sparse csr matrices
import scipy.sparse
import numpy as np
import pandas as pd
from SLIM import SLIMatrix

traindata = pd.read_csv('../test/AutomotiveTrain.ijv', 
                        delimiter=' ', 
                        header=None, 
                        names=['userid', 'itemid', 'rating'],
                        dtype={'userid':np.int32, 'itemid':np.int32, 'rating':np.float32})

# get the map of users and the map of items
id2user, row = np.unique(traindata['userid'], return_inverse=True)
id2item, col = np.unique(traindata['itemid'], return_inverse=True)
dat = traindata['rating'].values
num_users = len(id2user)
num_items = len(id2item)

# create a scipy csr matrix
csr_matrix = scipy.sparse.csr_matrix((dat, (row, col)), shape=(num_users, num_items))

trainmat = SLIMatrix(csr_matrix)
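
The id maps in the cell above rely on `np.unique(..., return_inverse=True)`, which returns the sorted distinct ids together with, for every entry, the position of its id in that sorted array. The following is a minimal sketch on made-up ids (the values are illustrative and not taken from the Automotive dataset):

```python
import numpy as np

# hypothetical raw ids, one entry per rating
userids = np.array([10, 42, 10, 7])
itemids = np.array([500, 300, 300, 900])

# id2user[k] is the original id of row k; row[n] is the row index of rating n
id2user, row = np.unique(userids, return_inverse=True)
id2item, col = np.unique(itemids, return_inverse=True)

print(id2user.tolist(), row.tolist())
print(id2item.tolist(), col.tolist())

# indexing a map with the indices recovers the original ids
recovered_users = id2user[row]
recovered_items = id2item[col]
```

The `row` and `col` arrays are contiguous, zero-based indices, which is what the `(dat, (row, col))` constructor of a csr matrix expects.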

## Parameter Setting
The SLIM package supports setting parameters either through a dictionary or from the command line using argparse. For a detailed description of the allowed parameters, please refer to the README. For an example of using argparse to pass parameters from the command line, please refer to main.py.

In [None]:
# setting parameters using dictionary
params = {'dbglvl':0, 'nnbrs':0, 'algo':'cd', 'nthreads':1, 'l1r':1., 'l2r':1.}

params['optTol'] = 1e-7
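
For the command-line route, a parameter dictionary of the same shape can be assembled with argparse. The sketch below is only an illustration: the flag names and defaults simply mirror the dictionary keys used in this document, and `build_parser` and `parse_params` are hypothetical helpers, so the flags actually accepted by main.py may differ.

```python
import argparse


def build_parser():
    # ASSUMPTION: flag names mirror the dictionary keys shown in this document;
    # they are not copied from main.py
    parser = argparse.ArgumentParser(description='SLIM parameters (sketch)')
    parser.add_argument('--dbglvl', type=int, default=0)
    parser.add_argument('--nnbrs', type=int, default=0)
    parser.add_argument('--algo', type=str, default='cd')
    parser.add_argument('--nthreads', type=int, default=1)
    parser.add_argument('--l1r', type=float, default=1.0)
    parser.add_argument('--l2r', type=float, default=1.0)
    parser.add_argument('--optTol', type=float, default=1e-7)
    return parser


def parse_params(argv):
    # vars() turns the argparse Namespace into a plain dictionary
    return vars(build_parser().parse_args(argv))


# simulate a command line such as: --l1r 0.5 --nthreads 4
params = parse_params(['--l1r', '0.5', '--nthreads', '4'])
print(params)
```

The resulting dictionary has the same keys as the hand-written one, so it could be passed wherever a `params` dictionary is expected.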

## Model Training
Training a model requires a set of parameters and a training rating matrix as a SLIMatrix object.

In [5]:
# training a SLIM model
import pandas as pd
from SLIM import SLIM, SLIMatrix

traindata = pd.read_csv('../test/AutomotiveTrain.ijv', delimiter=' ', header=None)
trainmat = SLIMatrix(traindata)

params = {'dbglvl':0, 
          'algo':'cd', 
          'nthreads':1, 
          'l1r':1., 
          'l2r':1.,
          'optTol':1e-7,
          'niters':100}

model = SLIM()
model.train(params, trainmat)

Learning takes 0.245 secs.


After training, the model can be saved and loaded. Note that for the model to be usable in the future, the map that projects the item ids to integers must be stored along with it.

In [6]:
# save and load a SLIM model
model.save_model(modelfname='model.csr', # filename to save the model as a csr matrix
                 mapfname='map.csr' # filename to save the item map
                )

model_new = SLIM()
model_new.load_model(modelfname='model.csr', # filename of the model
                 mapfname='map.csr' # filename of the item map
                )

## Predicting with an Existing Model
To predict with a learned model, the item ids of the input matrix must be consistent with those of the model. If the matrix to be predicted is the training matrix, pass it to the model directly:

In [7]:
# predict using the training matrix
output = model.predict(trainmat, nrcmds=10, outfile='output.txt')

If the matrix to be predicted is a new matrix with the same set of items, initialize the input matrix using the model:

In [8]:
model_new = SLIM()
model_new.load_model(modelfname='model.csr', # filename of the model
                 mapfname='map.csr' # filename of the item map
                )

inputdata = pd.read_csv('../test/AutomotiveTest.ijv', delimiter=' ', header=None)
# initialize the input matrix using the item model
inputmat = SLIMatrix(inputdata, 
                     oldmat=model_new)
output = model_new.predict(inputmat, nrcmds=10, outfile='output.txt')
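
Consistency here means that the same item id always refers to the same column of the model. The pure-Python sketch below illustrates the idea with a hypothetical item map. `build_item_map` and `remap_triplets` are illustrative helpers, not part of the SLIM API, and skipping items that the model has never seen is an assumption made for the example rather than a statement of what `SLIMatrix` does.

```python
def build_item_map(triplets):
    # assign consecutive column indices to item ids in order of first appearance
    item2col = {}
    for _, itemid, _ in triplets:
        if itemid not in item2col:
            item2col[itemid] = len(item2col)
    return item2col


def remap_triplets(triplets, item2col):
    # translate item ids into the column indices used by the model
    # ASSUMPTION: items unknown to the model are skipped
    kept = []
    for userid, itemid, rating in triplets:
        if itemid in item2col:
            kept.append((userid, item2col[itemid], rating))
    return kept


# made-up training and input data
train = [('u1', 'A', 5.0), ('u1', 'B', 3.0), ('u2', 'C', 4.0)]
new = [('u9', 'C', 2.0), ('u9', 'Z', 1.0), ('u9', 'A', 4.0)]

item2col = build_item_map(train)
remapped = remap_triplets(new, item2col)
print(remapped)
```

Without the stored map, item `'C'` in the new data could end up in a different column than the one the model was trained on, and the learned coefficients would be applied to the wrong items.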

The output of the predict function is a dictionary in which the keys are the user ids and the values are the top-$n$ recommendation lists.
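
A dictionary of this shape is easy to post-process. The sketch below flattens it into one row per recommendation; the sample dictionary is made up, `flatten_recommendations` is a hypothetical helper, and treating each list as ordered from best to worst is an assumption.

```python
def flatten_recommendations(output):
    # one (userid, rank, itemid) row per recommendation, with rank starting at 1
    rows = []
    for userid, items in output.items():
        for rank, itemid in enumerate(items, start=1):
            rows.append((userid, rank, itemid))
    return rows


# hypothetical result with the shape described above
sample_output = {'u1': ['B', 'C'], 'u2': ['A']}
rows = flatten_recommendations(sample_output)
print(rows)
```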

## Model Selection
We provide a useful tool for efficient model selection across a set of l1 and l2 pairs. The model selection procedure requires a set of parameters, a training matrix, a validation matrix, a list of l1 values, a list of l2 values, and the length of the recommendation list. Note that for the function to train the models on the training matrix and evaluate them on the validation matrix, the user ids and the item ids of the two matrices must be consistent. To achieve this, initialize the validation matrix using the training matrix. Currently, model selection only supports Hit Rate (HR) and Average Reciprocal Hit Rank (ARHR, reported as AR in the output). NDCG, Precision, and Recall will be added soon.

In [9]:
# model selection for SLIM models
import pandas as pd
from SLIM import SLIM, SLIMatrix

traindata = pd.read_csv('../test/AutomotiveTrain.ijv', delimiter=' ', header=None)
valdata = pd.read_csv('../test/AutomotiveTest.ijv', delimiter=' ', header=None)
trainmat = SLIMatrix(traindata)
# initialize the validation matrix using the training matrix
valmat = SLIMatrix(valdata, trainmat)

params = {'dbglvl':3, 
          'algo':'cd', 
          'nthreads':1, 
          'l1r':1., 
          'l2r':1.,
          'optTol':1e-7,
          'niters':100}

l1s = [0.01, 0.1, 0.5, 1, 2, 4, 5, 10, 20]
l2s = [0.1, 0.5, 1, 2, 5, 10, 20, 30, 50]

model = SLIM()
model.mselect(params, trainmat, valmat, l1s, l2s, nrcmds=10)

Model selection takes 18.898 secs.
The best HR is achieved by, l1: 20.0000, l2:0.1000, HR:0.1404, AR:0.0654.
The best AR is achieved by, l1: 20.0000, l2:50.0000, HR:0.1390, AR:0.0669.
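
To make the two reported metrics concrete, the sketch below computes them from a dictionary of top-n lists under the common leave-one-out definition: HR is the fraction of users whose held-out item appears in their list, and ARHR averages the reciprocal of the position at which it appears. This assumes a single held-out item per user and is not taken from the package's implementation; `hit_rate_and_arhr` is a hypothetical helper and the data is made up.

```python
def hit_rate_and_arhr(recommendations, heldout):
    # recommendations: {userid: ranked list of item ids, best first}
    # heldout: {userid: the single held-out item id of that user}
    hits = 0
    reciprocal_rank_sum = 0.0
    for userid, target in heldout.items():
        ranked = recommendations.get(userid, [])
        if target in ranked:
            hits += 1
            # positions are 1-based, so a hit at the top contributes 1.0
            reciprocal_rank_sum += 1.0 / (ranked.index(target) + 1)
    num_users = len(heldout)
    if num_users == 0:
        return 0.0, 0.0
    return hits / num_users, reciprocal_rank_sum / num_users


recs = {'u1': ['A', 'B', 'C'],
        'u2': ['D', 'E', 'F'],
        'u3': ['G', 'H', 'I'],
        'u4': ['J', 'K', 'L']}
held = {'u1': 'A', 'u2': 'E', 'u3': 'Z', 'u4': 'Q'}

hr, arhr = hit_rate_and_arhr(recs, held)
print(hr, arhr)
```

In this example two of the four users get a hit, one at position 1 and one at position 2, so ARHR rewards the first more than the second while HR counts them equally.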


## SLIM with Feature Selection
For training a SLIM model, the training algorithm needs to solve $n$ linear regression problems. SLIM with feature selection (fSLIM) reduces the number of independent variables using pre-defined heuristics for each of the linear regression problems. As a result, fSLIM achieves substantially higher training efficiency than SLIM with a modest compromise in recommendation accuracy.
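
For reference, there is one regression problem per item, and in the SLIM literature the problem for item $j$ is commonly written as below. Here $A$ is the user-item rating matrix, $a_j$ is its $j$-th column, and $w_j$ is the $j$-th column of the item-item coefficient matrix. Reading the *l1r* and *l2r* parameters as $\lambda$ and $\beta$ is an assumption based on their names.

$$
\min_{w_j} \; \frac{1}{2}\lVert a_j - A w_j \rVert_2^2 + \frac{\beta}{2}\lVert w_j \rVert_2^2 + \lambda \lVert w_j \rVert_1
\quad \text{subject to} \quad w_j \ge 0,\; w_{jj} = 0
$$

Under this formulation, feature selection amounts to allowing only a preselected subset of the entries of $w_j$ to be nonzero, which shrinks each problem.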

We implement fSLIM in this package. To train an fSLIM model, two parameters are used in addition to those for SLIM models: *nnbrs* and *simtype*. The parameter *nnbrs* is what distinguishes training a SLIM model from training an fSLIM model: when *nnbrs* is set to a positive integer, an fSLIM model is trained; when it is set to 0 or omitted, a SLIM model is trained. The parameter *simtype* sets the similarity measure. This package supports three similarity measures: Jaccard similarity ("jac"), cosine similarity ("cos"), and inner product ("dotp"). The default value for *simtype* is "cos". An fSLIM model can be used in the same way as a SLIM model. Note that an fSLIM model can only be trained using coordinate descent.

In [10]:
# training an fSLIM model
import pandas as pd
from SLIM import SLIM, SLIMatrix

traindata = pd.read_csv('../test/AutomotiveTrain.ijv', delimiter=' ', header=None)
trainmat = SLIMatrix(traindata)

params = {'dbglvl':0, 
          'algo':'cd', 
          'nthreads':1, 
          'l1r':1., 
          'l2r':1.,
          'optTol':1e-7,
          'niters':100}

# set the fSLIM specific parameters
params['nnbrs'] = 10
params['simtype'] = "cos"

# other options for simtype
# params['simtype'] = "jac"
# params['simtype'] = "dotp"

# an fSLIM model will be trained
model = SLIM()
model.train(params, trainmat)

# use the trained fSLIM model to recommend
output = model.predict(trainmat, nrcmds=10, outfile='output.txt')

Learning takes 0.147 secs.
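
To illustrate what the three *simtype* options measure, the sketch below computes textbook versions of each similarity between items and picks the *nnbrs* most similar items for a target item. It is a pure-Python illustration on made-up data: the helper names are hypothetical, the set-based Jaccard definition is an assumption, and the package's own implementation and candidate-selection heuristic may differ.

```python
import math


def item_vectors(triplets):
    # item id -> {user id: rating}
    vectors = {}
    for userid, itemid, rating in triplets:
        vectors.setdefault(itemid, {})[userid] = rating
    return vectors


def dotp(a, b):
    # inner product over the users who rated both items
    return sum(a[u] * b[u] for u in a.keys() & b.keys())


def cos(a, b):
    norm = math.sqrt(dotp(a, a)) * math.sqrt(dotp(b, b))
    return dotp(a, b) / norm if norm > 0 else 0.0


def jac(a, b):
    # ASSUMPTION: set-based Jaccard on the users of each item, ratings ignored
    union = len(a.keys() | b.keys())
    return len(a.keys() & b.keys()) / union if union > 0 else 0.0


def top_neighbors(vectors, target, nnbrs, sim):
    scores = [(sim(vectors[target], vec), other)
              for other, vec in vectors.items() if other != target]
    # highest similarity first; ties broken by item id for determinism
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scores[:nnbrs]]


triplets = [('u1', 'A', 1.0), ('u2', 'A', 1.0),
            ('u1', 'B', 1.0), ('u2', 'B', 1.0), ('u3', 'B', 1.0),
            ('u3', 'C', 1.0)]
vectors = item_vectors(triplets)

print(top_neighbors(vectors, 'A', 1, jac))
print(top_neighbors(vectors, 'C', 2, cos))
```

Restricting each regression to a handful of similar items is what makes the problems smaller; a larger *nnbrs* keeps more candidates at the cost of more work per item.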
