ACM RecSys Challenge 2018 Team vl6

Team members: Maksims Volkovs, Himanshu Rai, Zhaoyue Cheng, Yichao Lu (University of Toronto), Ga Wu (University of Toronto, Vector Institute), Scott Sanner (University of Toronto, Vector Institute)

Contact: maks@layer6.ai

Introduction

This repository contains the Java implementation of our entries for both the main and creative tracks. Our approach is a two-stage model. In the first stage, a blend of collaborative filtering methods quickly retrieves a set of candidate songs for each playlist with high recall. In the second stage, a pairwise playlist-song gradient boosting model re-ranks the retrieved candidates to maximize precision at the top of the recommended list.
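The retrieve-then-re-rank flow can be illustrated with a minimal sketch. Everything in it is hypothetical: the `Scorer` interface, the `topK` and `recommend` helpers, and the toy scoring functions are stand-ins chosen for illustration, not the collaborative filtering blend or the gradient boosting model implemented in this repository.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only. The scorers are hypothetical stand-ins and do not
// reproduce the collaborative filtering blend or the XGBoost model in this repo.
class TwoStageSketch {

    /** Hypothetical scoring interface: a higher score means a better playlist-song match. */
    interface Scorer {
        double score(int playlistId, int songId);
    }

    /** Returns the k highest-scoring songs from the pool, best first (ties keep pool order). */
    static List<Integer> topK(int playlistId, List<Integer> pool, Scorer scorer, int k) {
        List<Integer> sorted = new ArrayList<>(pool);
        sorted.sort((a, b) -> Double.compare(scorer.score(playlistId, b),
                                             scorer.score(playlistId, a)));
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    /**
     * Stage 1 keeps a candidate set using a cheap scorer over the full pool.
     * Stage 2 re-ranks only those candidates with a more expensive scorer.
     */
    static List<Integer> recommend(int playlistId, List<Integer> allSongs,
                                   Scorer retrieval, Scorer reRanker,
                                   int numCandidates, int listLength) {
        List<Integer> candidates = topK(playlistId, allSongs, retrieval, numCandidates);
        return topK(playlistId, candidates, reRanker, listLength);
    }

    public static void main(String[] args) {
        List<Integer> allSongs = new ArrayList<>();
        for (int s = 0; s < 10; s++) {
            allSongs.add(s);
        }
        // Toy scorers: stage 1 prefers song ids close to the playlist id,
        // stage 2 prefers larger song ids among the retrieved candidates.
        Scorer retrieval = (p, s) -> -(double) Math.abs(s - p);
        Scorer reRanker = (p, s) -> (double) s;
        System.out.println(recommend(3, allSongs, retrieval, reRanker, 5, 3)); // prints [5, 4, 3]
    }
}
```

The point of the split is that the expensive scorer only sees the small candidate set, so the first stage is tuned for recall and the second for precision at the top of the list.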

Environment

The model is implemented in Java and was tested in the following environment:

  • Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
  • 256GB RAM
  • Nvidia Titan V
  • Java Oracle 1.8.0_171
  • Python, Numpy 1.14.3, Sklearn 0.19.1, Scipy 1.1.0
  • Apache Maven 3.3.9
  • CUDA 8.0 and CUDNN 8.0
  • Intel MKL 2018.1.038
  • XGBoost and XGBoost4j 0.7

Executing

All models are executed from src/main/java/main/Executor.java. Its main function contains examples of model training, evaluation, and submission for both the main and creative tracks. To run the model:

  • Set all paths:
// OAuth token for the Spotify API, needed only for a creative track submission
String authToken = "";

// path to the song audio feature file, needed only for a creative track submission
String creativeTrackFile = "/home/recsys2018/data/song_audio_features.txt";

// path to the MPD directory with the JSON files
String trainPath = "/home/recsys2018/data/train/";

// path to the challenge set JSON file
String testFile = "/home/recsys2018/data/test/challenge_set.json";

// path to the Python SVD script included in the repo, default location: script/svd_py.py
String pythonScriptPath = "/home/recsys2018/script/svd_py.py";

// path to the cache folder for temporary storage; at least 20GB should be available in this folder
String cachePath = "/home/recsys2018/cache/";
  • Compile and execute with maven:
export MAVEN_OPTS="-Xms150g -Xmx150g"
mvn clean compile
mvn exec:java -Dexec.mainClass="main.Executor" 

Note that by default the code runs the model for the main track. To run the creative track model, set xgbParams.doCreative = true. For the creative track we extracted extra song features from the Spotify Audio API. We were able to match most songs from the challenge Million Playlist Dataset, and used the following fields for further feature extraction: [acousticness, danceability, energy, instrumentalness, key, liveness, loudness, mode, speechiness, tempo, time_signature, valence]. To download the data for this track, get an OAuth token from the Spotify API page and assign it to the authToken variable in the Executor.main function.
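As a rough illustration of how the twelve audio fields could be turned into model input, the sketch below maps per-song values to a fixed-order feature vector. The class name, the `toVector` helper, and the choice of NaN for songs that could not be matched are all assumptions made for this example; the layout of the audio feature file and the way this repository handles unmatched songs are not described here.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only. The class, the helper, and the NaN convention for
// missing values are assumptions, not the feature extraction used in this repo.
class AudioFeatureVectorSketch {

    // Field names exactly as listed above; the ordering is an arbitrary choice.
    static final String[] FIELDS = {
        "acousticness", "danceability", "energy", "instrumentalness",
        "key", "liveness", "loudness", "mode",
        "speechiness", "tempo", "time_signature", "valence"
    };

    /**
     * Builds a vector with one slot per field. Fields that are absent, or a
     * null map for a song with no match, are filled with NaN (an assumed
     * placeholder for "missing").
     */
    static double[] toVector(Map<String, Double> features) {
        double[] vector = new double[FIELDS.length];
        for (int i = 0; i < FIELDS.length; i++) {
            Double value = (features == null) ? null : features.get(FIELDS[i]);
            vector[i] = (value == null) ? Double.NaN : value;
        }
        return vector;
    }

    public static void main(String[] args) {
        // Made-up values for a single hypothetical song.
        Map<String, Double> song = new HashMap<>();
        song.put("energy", 0.8);
        song.put("tempo", 120.0);
        double[] vector = toVector(song);
        System.out.println("energy slot: " + vector[2] + ", tempo slot: " + vector[9]);
    }
}
```

Keeping a fixed field order means every song produces a vector of the same length, whether or not it was matched.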

We prioritized speed over memory for this project, so you'll need at least 100GB of RAM to run model training and inference. A full end-to-end run takes approximately 1.5 days.
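Because a run of this length is costly to lose to an out-of-memory error, a quick preflight check of the JVM heap limit may be useful before starting. The sketch below is not part of the repository; the class name and the `hasEnoughHeap` helper are hypothetical, and comparing the heap limit against the 100GB figure above is an assumption, since that figure refers to RAM rather than heap.

```java
// Illustrative preflight sketch, not part of this repository.
class HeapCheckSketch {

    static final long BYTES_PER_GB = 1024L * 1024L * 1024L;

    /** Returns true if the given heap limit is at least requiredGb gigabytes. */
    static boolean hasEnoughHeap(long maxHeapBytes, long requiredGb) {
        return maxHeapBytes >= requiredGb * BYTES_PER_GB;
    }

    public static void main(String[] args) {
        // Reports the maximum heap the JVM will attempt to use (set via -Xmx).
        long maxHeapBytes = Runtime.getRuntime().maxMemory();
        long requiredGb = 100; // assumed threshold, taken from the RAM figure above
        if (hasEnoughHeap(maxHeapBytes, requiredGb)) {
            System.out.println("Heap limit looks sufficient: "
                    + (maxHeapBytes / BYTES_PER_GB) + " GB");
        } else {
            System.err.println("Heap limit is only " + (maxHeapBytes / BYTES_PER_GB)
                    + " GB; consider raising -Xmx in MAVEN_OPTS");
        }
    }
}
```

When launched through Maven, the heap limit comes from the MAVEN_OPTS setting, so the check should be run the same way as the main program to see the same value.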