CIL: Collaborative filtering

The src folder contains different models to tackle the task of predicting ratings for the dataset at hand.

The 'src/model_.py' files can be used via their 'predict_by_' methods from clients. In addition, they provide an exemplary usage in their main methods. Hence, to test a single model with default parameters, you can simply run them by e.g. $python src/model_sf.py. Different models have different optional parameters.

'src/run_*.py' focus on one particular composition and execution of models. 'mean_predictions.py', 'bagging.py' and 'stacking.py' represent ensembling methods, making use of individual models.

Note that two different validation principles have been used. For simple model evaluation, we split the test set in 90% for learning and 10% for validation, randomized. For ensembling, we used a three-way-split, splitting it in 80% for model learning, 10% for ensemble learning and 10% for validation.

To reproduce the final kaggle submissions, follow the steps listed below. Note that stacked ensembling requires two separate runs of training, one on 80% of the data and one on 90% of the data.

Run training_validation_split.py where TRAIN_PROPORTION is set to 0.8 and 0.9 and THREE_WAY_SPLIT is set to True and False, respectively. Make sure to place the files in appropriate directories so no files are overwritten. You can also choose to use the files in the repository. The training and validation indices corresponding to a 90% - 10% split are in data/ and the training and validation indices corresponding used for stacking are in data/train_valid_80_10_10/. Make sure to correctly set the paths in utils.py depending on where you save the training data.
Run the predictors listed in Table II in the paper accompanying this repository. Specify parameters via the command line, and where no value is mentioned, use the default value specified in the code. Make sure to run them on both the 80% and 90% training sets. You should have a meta_training*, meta_validation* and submission* csv file for each method.
Once all the models have been run put all the meta*.csv files in the locations specified by ENSEMBLE_INPUT_DIR in utils.py. Then run stacking.py, setting STACKING_METHOD to 'nn'. This produces the final submission file. Again, make sure all files are placed in the correct locations.

Reach out to {hahnb, kklein, kuhnl}@student.ethz.ch if you run into any problems.

Name		Name	Last commit message	Last commit date
Latest commit History 354 Commits
analysis		analysis
data		data
report		report
src		src
.gitignore		.gitignore
.gitmodules		.gitmodules
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CIL: Collaborative filtering

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

CIL: Collaborative filtering

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages