Solutions of the KSI CUNI.CZ team for the RecSys Challenge 2018
AudioFeaturesVisualization.ipynb
CleanWordToVecFromIncorrectEntries.ipynb
FeatureVarianceAnalysis.ipynb
LICENSE
README.md
SiameseNetworkAudioFeatures.py
TracksCorrection.ipynb
artist_album_track_ref.py
authorAudioDistribution.py
deeper_stats.py
interListDiversity.py
knnPopularityBasedRecommendation.py
mostPopularRecommendation.py
outputModel4.py
trackFeaturesDownload.py
word2VecDataPreparation.py
word2vec_basic.py
word2vec_basic_albums.py

RecSysChallenge2018

Dependencies

  • Python 3, TensorFlow, NumPy, pandas, scikit-learn

High-level overview

  • Our team participated mainly in the creative track of the RecSys Challenge 2018. As an additional dataset, we used audio features of the tracks, collected through the Spotify Audio Features API: https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-features/ (see trackFeaturesDownload.py).

  • The solution consists mainly of Word2Vec models applied at both the track level and the album level (see word2VecDataPreparation.py for sentence preparation, and word2vec_basic.py and word2vec_basic_albums.py for model creation). The output model (outputModel4.py) uses the Word2Vec models in the form of pre-calculated top-k most similar tracks/albums for each challenge set track.

  • Audio features are standardized (AudioFeaturesVisualization.ipynb), processed by a Siamese network (SiameseNetworkAudioFeatures.py), and used according to their stability with respect to the playlist (calculated in authorAudioDistribution.py). They re-rank the recommended tracks and, if the overall stability is high, also recommend additional tracks.

  • The output model further uses tracks from the same album or author, tracks with the same title, and the overall popularity of tracks. The datasets for these calculations are generated by artist_album_track_ref.py.

  • Note that Word2Vec model generation is nondeterministic due to the top-k words selection. Contact the authors for the stored models that were used to generate the submitted solutions.
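The "pre-calculated top-k most similar tracks/albums" step mentioned in the overview can be illustrated with a minimal NumPy sketch. It is not taken from the repository: the function name `top_k_similar`, the toy vectors, and the choice of cosine similarity as the measure are all assumptions made for illustration, and the real embeddings would come from the trained Word2Vec models.

```python
import numpy as np


def top_k_similar(embeddings, k):
    """For each row, return the indices of its k most cosine-similar other rows.

    `embeddings` is an (n_items, dim) array; this helper is hypothetical and
    only sketches the idea of pre-computing neighbour lists.
    """
    # Normalize rows to unit length so the dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    # An item should not be listed as its own neighbour.
    np.fill_diagonal(sim, -np.inf)
    # Sort each row by descending similarity and keep the first k indices.
    return np.argsort(-sim, axis=1)[:, :k]


# Toy data (assumed): four tracks embedded in two dimensions.
track_ids = ["t0", "t1", "t2", "t3"]
vectors = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
])

neighbours = top_k_similar(vectors, k=2)
for track, row in zip(track_ids, neighbours):
    print(track, [track_ids[i] for i in row])
```

The full similarity matrix is quadratic in the number of items, so for a catalogue of realistic size the neighbours would more plausibly be computed in batches or only for the challenge set tracks.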

Instructions

  • TBD