Solutions of the KSI CUNI.CZ team for the RecSys Challenge 2018
- Python 3, TensorFlow, NumPy, pandas, scikit-learn
Our team participated mainly in the creative track of the RecSys Challenge 2018. As an additional dataset, we used audio features of the tracks, collected through the Spotify audio features API: https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-features/ (see trackFeaturesDownload.py).
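As a rough illustration of the download step, the sketch below only builds batched request URLs and performs no network calls or authentication. It is not taken from trackFeaturesDownload.py. The endpoint URL, the batch size of 100 IDs per request, and the helper names `batched` and `build_request_urls` are assumptions made for this example.

```python
from typing import Iterable, Iterator, List

# Assumption: multi-track audio features endpoint and its per-request ID limit.
AUDIO_FEATURES_URL = "https://api.spotify.com/v1/audio-features"
MAX_IDS_PER_REQUEST = 100


def batched(ids: Iterable[str], size: int = MAX_IDS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` IDs."""
    batch: List[str] = []
    for track_id in ids:
        batch.append(track_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def build_request_urls(track_uris: Iterable[str]) -> List[str]:
    """Turn URIs such as 'spotify:track:<id>' into batched request URLs."""
    # Keep only the part after the last colon, i.e. the bare track ID.
    ids = [uri.rsplit(":", 1)[-1] for uri in track_uris]
    return [f"{AUDIO_FEATURES_URL}?ids={','.join(chunk)}" for chunk in batched(ids)]
```

Each URL would then be requested with a valid access token, and the returned feature records stored per track.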
The solution consists mainly of Word2Vec models applied at both the track level and the album level (see word2VecDataPreparation.py for sentence preparation, and word2vec_basic.py and word2vec_basic_albums.py for model creation). The output model (output_model4.py) uses the Word2Vec models in the form of pre-calculated top-k most similar tracks/albums for each challenge set track.
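The pre-calculation of top-k most similar items can be sketched as follows, assuming the trained embeddings are available as a NumPy matrix with one row per track or album. Cosine similarity and the function name `top_k_similar` are assumptions for illustration; the repository scripts may compute similarity differently.

```python
import numpy as np


def top_k_similar(embeddings: np.ndarray, query_idx: int, k: int):
    """Return (index, cosine similarity) pairs of the k rows closest to the query row."""
    # Normalize rows to unit length so that a dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sims = unit @ unit[query_idx]
    # Exclude the query item itself from its own neighbour list.
    sims[query_idx] = -np.inf
    k = min(k, len(sims) - 1)
    order = np.argsort(-sims)[:k]
    return [(int(i), float(sims[i])) for i in order]
```

Running this once per challenge set track and storing the result avoids recomputing similarities when the output model is executed.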
Audio features are standardized (AudioFeaturesVisualization.ipynb), processed by a Siamese network (SiameseNetworkAudioFeatures.py), and used according to their stability with respect to the playlist (calculated in authorAudioDistribution.py) to re-rank recommended tracks and, if the overall stability is high, to recommend additional tracks.
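A minimal sketch of the standardization and stability-based re-ranking idea is given below. It omits the Siamese network entirely. The stability definition (1 / (1 + standard deviation) of a feature within the playlist), the distance penalty, and the `weight` parameter are assumptions chosen for this example, not the formulas used in authorAudioDistribution.py or output_model4.py.

```python
import numpy as np


def standardize(features: np.ndarray) -> np.ndarray:
    """Z-score each feature column; constant columns are mapped to zero."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0
    return (features - mean) / std


def feature_stability(playlist_features: np.ndarray) -> np.ndarray:
    """Assumed stability: features varying little within a playlist score near 1."""
    return 1.0 / (1.0 + playlist_features.std(axis=0))


def rerank(candidate_scores, candidate_features, playlist_features, weight=0.5):
    """Return candidate indices, best first, after penalizing audio mismatch."""
    stability = feature_stability(playlist_features)
    centroid = playlist_features.mean(axis=0)
    # Deviations on stable features are penalized more than on unstable ones.
    dist = np.sqrt((stability * (candidate_features - centroid) ** 2).sum(axis=1))
    adjusted = np.asarray(candidate_scores, dtype=float) - weight * dist
    return np.argsort(-adjusted)
```

With this weighting, a candidate that deviates on a feature the playlist keeps nearly constant drops further in the ranking than one that deviates on a feature the playlist already varies in.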
The output model further uses tracks that share the same album, author, or title, as well as the overall popularity of tracks. The datasets for these calculations are generated by artist_album_track_ref.py.
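The kind of reference data involved can be sketched as below, assuming each playlist is a list of (track, album, artist) tuples. The function names and the data layout are hypothetical and do not reflect the actual output of artist_album_track_ref.py; title matching is left out.

```python
from collections import Counter, defaultdict


def build_indexes(playlists):
    """Count track popularity and group tracks by album and by artist."""
    popularity = Counter()
    album_tracks = defaultdict(set)
    artist_tracks = defaultdict(set)
    for playlist in playlists:
        for track, album, artist in playlist:
            popularity[track] += 1
            album_tracks[album].add(track)
            artist_tracks[artist].add(track)
    return popularity, album_tracks, artist_tracks


def related_candidates(seed, popularity, album_tracks, artist_tracks, k):
    """Suggest up to k tracks sharing an album or artist with the seed tracks."""
    seed_tracks = {track for track, _, _ in seed}
    pool = set()
    for _, album, artist in seed:
        pool |= album_tracks.get(album, set()) | artist_tracks.get(artist, set())
    pool -= seed_tracks
    # Most popular first; ties are broken by track ID for a stable order.
    return sorted(pool, key=lambda t: (-popularity[t], t))[:k]
```

Sorting the candidate pool by global popularity is one simple way to combine the two signals; other weightings are equally plausible.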
Note that Word2Vec model generation is nondeterministic due to the top-k words selection. Ask the authors for the stored models that were used to generate the solutions.