Author: Georgios Damaskinos (georgios.damaskinos@gmail.com)
SwIFT is a recommender introduced in "Capturing the Moment: Lightweight Similarity Computations" that employs a new similarity metric, I-SIM. A secondary application of I-SIM, I-TRUST (presented in the same paper), can be found here.
The following steps evaluate SwIFT on the MovieLens100K dataset with a test set of 100 ratings.
- Setup
- Ubuntu 18.04.3 LTS
- Python 3.7
- Java 1.8.0_77
- spark-2.4.5-bin-hadoop2.7
- apache-cassandra-3.11.6
sudo apt-get install bc
- Export variables:
- JAVA_HOME
- SPARK_HOME
- CASSANDRA_HOME
- Setup cluster:
cd utils/
RUN_TESTS=1 bash deploy_cluster.sh STANDALONE /path/to/spark_localdir/
- Shutdown cluster:
cd utils/
bash stop_cluster.sh STANDALONE
- Download and extract the MovieLens100K dataset
- Parse dataset:
python parsers/movielensParser.py ml-100k/u.data dataset.csv
bash parsers/splitter.sh dataset.csv ./ 100
- Deploy SwIFT
bash local_deploy.sh trainingSet.csv testSet.csv 5 10 0 100 1 9995 /path/to/log
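The dataset preparation in the "Parse dataset" step above can be pictured with the following minimal sketch. It is not the code of `parsers/movielensParser.py` or `parsers/splitter.sh`. It relies only on the documented MovieLens100K layout of `u.data` (tab-separated user id, item id, rating, timestamp). The function names, the CSV column order, and the policy of holding out the most recent ratings as the test set are all assumptions made for illustration.

```python
import csv
import io


def parse_udata(lines):
    """Yield (user, item, rating, timestamp) tuples from u.data-style lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, item, rating, timestamp = line.split("\t")
        yield int(user), int(item), float(rating), int(timestamp)


def split_holdout(rows, test_size):
    """Sort by timestamp and hold out the last test_size rows.

    Holding out the most recent ratings is an assumed policy; the
    repository's splitter may choose the test set differently.
    """
    ordered = sorted(rows, key=lambda r: r[3])
    if test_size <= 0:
        return ordered, []
    return ordered[:-test_size], ordered[-test_size:]


def to_csv(rows):
    """Serialize rows as CSV text (the column order is an assumption)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

With `test_size=100`, `split_holdout` would produce a test set of 100 ratings, matching the evaluation described at the top of this document.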
SwIFT consists of two main components:
- Frontend
- Accumulates the ratings in microbatches and sends them to the backend
- Provides recommendations to users and computes CTR and recall
- Backend
- Bootstraps (if necessary) the database
- Incrementally updates the database given the microbatches that the frontend sends
- Logs latency measurements and computes the RMSE
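The division of labor above can be illustrated with a small, self-contained sketch: a frontend-side buffer that groups ratings into microbatches, and a backend-side running RMSE. This is not SwIFT's implementation, which runs on Spark and Cassandra. The class names `MicroBatcher` and `RunningRMSE`, the fixed batch size, and the `send` callback are hypothetical and exist only for this example.

```python
import math


class MicroBatcher:
    """Buffer ratings and hand them over in fixed-size microbatches."""

    def __init__(self, batch_size, send):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.batch_size = batch_size
        self.send = send  # hypothetical callback standing in for the backend
        self.buffer = []

    def add(self, rating):
        """Store one rating; flush once the buffer reaches batch_size."""
        self.buffer.append(rating)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send whatever is buffered (if anything) and clear the buffer."""
        if self.buffer:
            self.send(list(self.buffer))
            self.buffer.clear()


class RunningRMSE:
    """Accumulate squared errors so RMSE can be reported at any time."""

    def __init__(self):
        self.squared_error = 0.0
        self.count = 0

    def update(self, predicted, actual):
        self.squared_error += (predicted - actual) ** 2
        self.count += 1

    def value(self):
        if self.count == 0:
            return float("nan")  # no predictions evaluated yet
        return math.sqrt(self.squared_error / self.count)
```

A caller would pass each incoming rating to `MicroBatcher.add` and call `flush` at the end of the stream so a partially filled batch is not lost.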
More detailed information is available in the paper.