# Finding Similar Songs on Spotify - Part 2: Siamese Networks

In the first part of this tutorial I introduced the traditional distance-based approach to similarity estimation. The main idea is that features are extracted from the audio content. These features are numeric descriptions of semantically relevant information. An example of a high-level feature is the number of beats per minute, which describes the tempo of a song. Music feature-sets are more abstract and describe, for example, the spectral or rhythmical distribution of energy. They are not single values but vectors of numbers. A song is thus semantically described by such a vector, and if the set of extracted features spans various music characteristics such as rhythm, timbre, harmonics, complexity, etc., then the numerical similarity of two vectors is considered to be an approximation of music similarity: the lower the numerical distance between two vectors, the higher their acoustic similarity. For this reason these approaches are known as *distance-based* methods. They mainly depend on the selected sets of features and on the similarity metric chosen to compare their values.
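
As a brief reminder of the distance-based idea, here is a minimal sketch. The three-dimensional feature vectors and the names `features` and `query` are made up for illustration and are not taken from the Spotify data:

```python
import numpy as np

# Hypothetical feature vectors (one row per song); the values are made up.
features = np.array([
    [0.9, 0.1, 0.3],   # song 0
    [0.8, 0.2, 0.4],   # song 1 (close to song 0)
    [0.1, 0.9, 0.7],   # song 2 (far from song 0)
])

query = features[0]

# Euclidean distance between the query and every song in the collection
distances = np.sqrt(((features - query) ** 2).sum(axis=1))

# lower distance = higher similarity, so we sort in ascending order
ranking = np.argsort(distances)
```

The query itself ends up first in `ranking`, because its distance to itself is zero.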

In this second part of the tutorial we focus on an approach where the feature representation as well as the similarity function are learned from the underlying dataset.


## Tutorial Overview

1. Loading data
2. Preprocess data
3. Define Model
4. Fit Model
5. Evaluate Model



## Requirements

The requirements are the same as for the first part of the tutorial. Please follow the instructions of Part 1 if you have trouble running this one.

In [147]:
%load_ext autoreload

%autoreload 2

# visualization
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
matplotlib.style.use('ggplot')

# numeric and scientific processing
import numpy as np
import pandas as pd

# misc
import os
import progressbar

# spotify API
import spotipy
import spotipy.util as util

# local caching
from joblib import Memory

# functions from Tutorial Part 1
import tutorial_functions as tut_func



# Loading Data

We will use the same data that we downloaded from Spotify in Part 1 of the tutorial. Because we used the joblib library, the data is already cached on our hard drive and we will not have to wait long.

Update the following two variables according to the credentials you received from Spotify:

In [2]:
os.environ["SPOTIPY_CLIENT_ID"]     = "<your Spotify client id>"
os.environ["SPOTIPY_CLIENT_SECRET"] = "<your Spotify client secret>"

The same playlists as used in Part 1:

In [3]:
playlists = [
    
     {"name": "clubbeats",    "uri": "spotify:user:spotify:playlist:37i9dQZF1DXbX3zSzB4MO0"},
     {"name": "softpop",      "uri": "spotify:user:spotify:playlist:37i9dQZF1DWTwnEm1IYyoj"},
     {"name": "electropop",   "uri": "spotify:user:spotify:playlist:37i9dQZF1DX4uPi2roRUwU"},
     {"name": "rockclassics", "uri": "spotify:user:spotify:playlist:37i9dQZF1DWXRqgorJj26U"},
     {"name": "rockhymns",    "uri": "spotify:user:spotify:playlist:37i9dQZF1DX4vth7idTQch"},
     {"name": "soft_rock",    "uri": "spotify:user:spotify:playlist:37i9dQZF1DX6xOPeSOGone"},
     {"name": "metalcore",    "uri": "spotify:user:spotify:playlist:37i9dQZF1DWXIcbzpLauPS"}, 
     {"name": "metal",        "uri": "spotify:user:spotify:playlist:37i9dQZF1DWWOaP4H0w5b0"},
     {"name": "classic_metal","uri": "spotify:user:spotify:playlist:37i9dQZF1DX2LTcinqsO68"},
     {"name": "grunge",       "uri": "spotify:user:spotify:playlist:37i9dQZF1DX11ghcIxjcjE"},
     {"name": "hiphop",       "uri": "spotify:user:spotify:playlist:37i9dQZF1DWVdgXTbYm2r0"},
     {"name": "poppunk",      "uri": "spotify:user:spotify:playlist:37i9dQZF1DXa9wYJr1oMFq"},
     {"name": "classic",      "uri": "spotify:user:spotify:playlist:37i9dQZF1DXcN1fAVSf7CR"}
    
]

Connect to the Spotify API

In [4]:
token = util.prompt_for_user_token("slychief", 
                                   "playlist-modify-public", 
                                   redirect_uri="http://localhost/")

sp = spotipy.Spotify(auth=token)

Define the local cache directory. This should be the same as in Part 1 of the tutorial.

In [5]:
memory = Memory(cachedir='/home/schindler/tmp/spotify/', verbose=0)

Unfortunately I was not able to move this function to the tutorial_functions.py file, due to the @memory.cache decorator. (If you know how to solve this, please create a GitHub issue with your solution.)

In [6]:
@memory.cache
def get_spotify_data(track_id):
    
    # meta-data
    track_metadata      = sp.track(track_id)
    album_metadata      = sp.album(track_metadata["album"]["id"])
    artist_metadata     = sp.artist(track_metadata["artists"][0]["id"])
    
    # feature-data
    sequential_features = sp.audio_analysis(track_id)
    trackbased_features = sp.audio_features([track_id])
    
    return track_metadata, album_metadata, artist_metadata, sequential_features, trackbased_features

Start loading the Spotify Data

In [7]:
# Get Playlist meta-data
playlists = tut_func.get_playlist_metadata(sp, playlists)

# Get track-ids of all playlist entries
playlists = tut_func.get_track_ids(sp, playlists)

num_tracks_total = np.sum([playlist["num_tracks"] for playlist in playlists])

# Fetch data and features from Spotify
pbar = progressbar.ProgressBar(max_value=num_tracks_total)
pbar.start()

raw_track_data      = []
processed_track_ids = []

for playlist in playlists:

    for track_id in playlist["track_ids"]:

        try:
            # avoid duplicates in the data-set
            if track_id not in processed_track_ids:

                # retrieve data from Spotify
                spotify_data = get_spotify_data(track_id)

                raw_track_data.append([playlist["name"], spotify_data])
                processed_track_ids.append(track_id)

        except Exception as e:
            print e

        pbar.update(len(raw_track_data))

 97% (962 of 986) | Elapsed Time: 0:00:41 ETA: 0:00:01

# Siamese Networks

A Siamese neural network is a neural network architecture in which two inputs are fed into the same stack of network layers. This is where the name comes from: the two branches share their layers (and thus their weights) and are identical, like Siamese twins. Feeding two inputs through the shared layers generates two representations which can be compared. Training the network for a certain task requires labelled data. To learn a similarity function, these labels should indicate whether the two inputs are similar or dissimilar.
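
To make the idea of shared layers concrete, here is a minimal NumPy sketch. It is not the Keras model used later in this tutorial. The layer sizes, the `tanh` activation and the names `W`, `b` and `encode` are assumptions chosen only for illustration:

```python
import numpy as np

rng = np.random.RandomState(0)

# One set of weights, shared by both branches (hypothetical sizes).
W = rng.normal(size=(4, 3))
b = np.zeros(3)

def encode(x):
    # the same weights are applied, no matter which input is passed in
    return np.tanh(x.dot(W) + b)

x_left  = rng.normal(size=(1, 4))
x_right = rng.normal(size=(1, 4))

rep_left  = encode(x_left)
rep_right = encode(x_right)

# compare the two representations, here with the Euclidean distance
distance = np.sqrt(((rep_left - rep_right) ** 2).sum(axis=1))
```

Because both inputs pass through the same `encode` function, identical inputs always yield identical representations and therefore a distance of zero.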

This is exactly the approach initially described by [TODO cite paper].
The authors create pairs of similar and dissimilar images. These are fed into a Siamese network stack. Finally, the model calculates the Euclidean distance between the two generated representations. A contrastive loss is used to optimize the learned similarity.

To calculate the similarity between a seed image and the rest of the collection, the model is applied to predict the distance between this seed image and every other image. The result is a list of distances which has to be sorted in ascending order, so that the most similar items (smallest distances) come first.

The following code example follows this approach:

**Keras**

We use the high-level deep learning API Keras. [TODO: link]

[TODO: describe - refer to Tom's tutorial for instructions]

In [11]:
from keras.layers import Input, Lambda
from keras.models import Model
from keras.layers import Dense, Dropout
from keras.optimizers import Nadam, SGD
from keras.regularizers import l2, l1
from keras import backend as K
from keras.layers.merge import concatenate

Using Theano backend.
 https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

Using gpu device 0: GeForce GTX 1080 (CNMeM is enabled with initial size: 95.0% of memory, cuDNN 5105)


First we define a distance measure to compare the two representations. We will be using the well-known Euclidean distance:

In [130]:
def euclidean_distance(vects):
    x, y = vects
    return K.sqrt(K.maximum(K.sum(K.square(x - y), axis=1, keepdims=True), K.epsilon()))
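
The `K.maximum(..., K.epsilon())` clamp keeps the argument of the square root strictly positive, which avoids an undefined gradient when both representations are identical. A NumPy sketch of the same computation follows. The function name `euclidean_distance_np` is hypothetical, and the value `1e-7` is assumed to match the default returned by `K.epsilon()`:

```python
import numpy as np

EPSILON = 1e-7  # assumed to match the Keras default returned by K.epsilon()

def euclidean_distance_np(x, y):
    # squared differences, summed per row; keepdims gives shape (batch, 1)
    sq_sum = np.sum(np.square(x - y), axis=1, keepdims=True)
    # clamp before taking the square root, as in the Keras version above
    return np.sqrt(np.maximum(sq_sum, EPSILON))

x = np.array([[0.0, 0.0], [1.0, 2.0]])
y = np.array([[3.0, 4.0], [1.0, 2.0]])

d = euclidean_distance_np(x, y)
```

The first pair has a distance of 5. For the second pair the inputs are identical, so the result is the square root of the clamp value rather than exactly zero.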

Now we define the Siamese network architecture:

In [64]:
def create_siamese_network(input_dim):

    dense_1 = Dense(100, activation="selu")
    dense_2 = Dense(100, activation="selu")
    dense_3 = Dense(100, activation="selu")

    input_left  = Input(shape=input_dim)
    input_right = Input(shape=input_dim)
    
    layers_left  = dense_1(dense_2(dense_3(input_left)))
    layers_right = dense_1(dense_2(dense_3(input_right)))

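    # the Euclidean distance reduces each pair to a single value per sample,
    # hence the output shape (batch_size, 1)
    distance = Lambda(euclidean_distance,
                      output_shape=lambda shapes: (shapes[0][0], 1))([layers_left, layers_right])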

    model = Model([input_left, input_right], distance)
    
    return model

In [66]:
def contrastive_loss(y_true, y_pred):
    '''Contrastive loss from Hadsell-et-al.'06
    http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf
    '''
    margin = 1
    return K.mean(y_true * K.square(y_pred) + (1 - y_true) * K.square(K.maximum(margin - y_pred, 0)))
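
To see how this loss behaves, it helps to evaluate it by hand for a few cases. The following NumPy re-implementation (the name `contrastive_loss_np` and the example distances are assumptions for illustration) uses the same label convention as above, where 1 marks a similar pair and 0 a dissimilar pair:

```python
import numpy as np

def contrastive_loss_np(y_true, y_pred, margin=1.0):
    # y_true: 1 = similar pair, 0 = dissimilar pair; y_pred: predicted distance
    similar_term    = y_true * np.square(y_pred)
    dissimilar_term = (1 - y_true) * np.square(np.maximum(margin - y_pred, 0))
    return np.mean(similar_term + dissimilar_term)

close, far = 0.2, 1.5

# similar pair, small distance: 0.2^2 = 0.04
loss_similar_close = contrastive_loss_np(np.array([1.0]), np.array([close]))

# dissimilar pair, small distance: (1 - 0.2)^2 = 0.64
loss_dissimilar_close = contrastive_loss_np(np.array([0.0]), np.array([close]))

# dissimilar pair beyond the margin: no penalty
loss_dissimilar_far = contrastive_loss_np(np.array([0.0]), np.array([far]))
```

Similar pairs are pulled together, since any remaining distance is penalized. Dissimilar pairs are only pushed apart until their distance exceeds the margin.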

### Aggregate feature-data

Currently we only have a list of raw data-objects retrieved from the Spotify API. We need to transform this information to a more structured format.

In [8]:
# Aggregate Meta-data
metadata = tut_func.aggregate_metadata(raw_track_data)

# Aggregate Feature-data
feature_data = tut_func.aggregate_featuredata(raw_track_data, metadata)

# standardize feature_data (zero mean, unit variance per feature)
feature_data -= feature_data.mean(axis=0)
feature_data /= feature_data.std(axis=0)
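
One caveat of this kind of standardization is that a feature with zero variance leads to a division by zero. A possible guard is sketched below on toy data. The function name `standardize` and the choice to leave constant columns at zero are assumptions, not part of the original tutorial code:

```python
import numpy as np

def standardize(features):
    # per-feature (column-wise) mean and standard deviation
    mean = features.mean(axis=0)
    std  = features.std(axis=0)
    # assumption: constant columns are left at zero instead of dividing by 0
    std[std == 0] = 1.0
    return (features - mean) / std

toy = np.array([[1.0, 10.0, 5.0],
                [2.0, 20.0, 5.0],
                [3.0, 30.0, 5.0]])

toy_std = standardize(toy)
```

The third column of `toy` is constant. Without the guard it would turn into NaN values, which would then propagate into the network.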

### Create Data-Pairs

In [9]:
def create_pairs(feature_data, metadata, num_pairs_per_track):
    
    data_pairs = []
    labels     = []
    
    for row_id, q_track in metadata.sample(frac=1).iterrows():
        
        for _ in range(num_pairs_per_track):
            
            # search similar and dissimilar examples
            pos_example = metadata[metadata.playlist == q_track.playlist].sample(1)
            neg_example = metadata[metadata.playlist != q_track.playlist].sample(1)

            # create feature pairs
            data_pairs.append([feature_data[row_id], feature_data[pos_example.index[0]]])
            labels.append(1)

            data_pairs.append([feature_data[row_id], feature_data[neg_example.index[0]]])
            labels.append(0)

    return np.array(data_pairs), np.array(labels)

In [98]:
data_pairs, labels = create_pairs(feature_data, metadata, 10)

data_pairs.shape

(19240, 2, 69)

In [136]:
import keras
from matplotlib import pyplot as plt






In [142]:
model = create_siamese_network(data_pairs[:,0].shape[1:])

# train
rms = Nadam(lr=0.001)
model.compile(loss=contrastive_loss, optimizer=rms, metrics=["mean_squared_error", "accuracy"])

In [148]:
model.fit([data_pairs[:, 0], data_pairs[:, 1]], 
          labels, 
          batch_size       = 14, 
          verbose          = 0, 
          epochs           = 25, 
          callbacks        = [tut_func.PlotLosses()], 
          validation_split = 0.1)


In [144]:
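def similar(query_idx, ascending=False):
    # predict the model output between the query track and every track of the collection
    res = [model.predict([feature_data[[query_idx]], feature_data[[i]]]) for i in range(feature_data.shape[0])]

    res = np.array(res)
    res = res.reshape(res.shape[0])

    # NOTE: despite its name, ascending=True returns the HIGHEST values first
    # (use it for models that predict a similarity score); the default returns
    # the LOWEST values first (use it for models that predict a distance)
    if ascending:
        si = np.argsort(res)[::-1]
    else:
        si = np.argsort(res)

    display_cols = ["artist_name", "title", "album_name", "year", "playlist"]
    
    print metadata.iloc[query_idx]

    return metadata.loc[si, display_cols][:10]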
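
Calling `model.predict` once per track is slow. Since a Keras model predicts on whole batches, the query can instead be repeated and all pairs scored in a single call. The sketch below illustrates this under stated assumptions: `rank_by_model`, `predict_fn`, `higher_is_better` and the stand-in `fake_predict` are hypothetical names, and with the real model one would pass `model.predict` as `predict_fn`:

```python
import numpy as np

def rank_by_model(predict_fn, features, query_idx, higher_is_better=False):
    # repeat the query so that it is paired with every track of the collection
    query_batch = np.repeat(features[[query_idx]], features.shape[0], axis=0)
    # a single call scores all pairs at once
    scores = np.asarray(predict_fn([query_batch, features])).reshape(-1)
    order = np.argsort(scores)
    return order[::-1] if higher_is_better else order

# stand-in for model.predict: plain Euclidean distance between the two inputs
def fake_predict(inputs):
    left, right = inputs
    return np.sqrt(((left - right) ** 2).sum(axis=1, keepdims=True))

toy_features = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
order = rank_by_model(fake_predict, toy_features, query_idx=1)
```

For a model that outputs a distance, keep `higher_is_better=False`. For a model that outputs a similarity score, set it to `True`.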

In [145]:
similar(753)

track_id                                  3jJZVeExYzVYiV6Y9Fl3DX
artist_name                                  Stone Temple Pilots
title                                                      Plush
album_name                                                  Core
label                                                      Rhino
duration                                                  310346
popularity                                                    70
year                                                        1992
genres         [alternative metal, alternative rock, classic ...
playlist                                                  grunge
Name: 753, dtype: object


| | artist_name | title | album_name | year | playlist |
|---|---|---|---|---|---|
| 753 | Stone Temple Pilots | Plush | Core | 1992 | grunge |
| 768 | Pearl Jam | State of Love and Trust | Singles - Original Motion Picture Soundtrack | 1992 | grunge |
| 759 | Soundgarden | Spoonman | Superunknown (20th Anniversary) | 1994 | grunge |
| 760 | Alice In Chains | Man in the Box | Facelift | 1990 | grunge |
| 792 | Sponge | Plowed | Rotting Pinata | 1994 | grunge |
| 809 | Gruntruck | Crazy Love | Push | 1992 | grunge |
| 797 | Chris Cornell | Seasons | Singles - Original Motion Picture Soundtrack | 1992 | grunge |
| 747 | Alice In Chains | Would? | Dirt | 1992 | grunge |
| 800 | Love Battery | Between The Eyes | Between The Eyes | 1992 | grunge |
| 769 | Stone Temple Pilots | Creep | Core | 1992 | grunge |


In [97]:
similar(753, ascending=True)

track_id                                  3jJZVeExYzVYiV6Y9Fl3DX
artist_name                                  Stone Temple Pilots
title                                                      Plush
album_name                                                  Core
label                                                      Rhino
duration                                                  310346
popularity                                                    70
year                                                        1992
genres         [alternative metal, alternative rock, classic ...
playlist                                                  grunge
Name: 753, dtype: object


| | artist_name | title | album_name | year | playlist |
|---|---|---|---|---|---|
| 787 | The Presidents Of The United States Of America | Peaches | The Presidents of The United States of America... | 1995 | grunge |
| 769 | Stone Temple Pilots | Creep | Core | 1992 | grunge |
| 771 | The Presidents Of The United States Of America | Lump | The Presidents of The United States of America... | 1995 | grunge |
| 775 | Seether | Fine Again | Disclaimer | 2002 | grunge |
| 786 | Days Of The New | Shelf In The Room | The Definitive Collection | 2008 | grunge |
| 813 | Malfunkshun | Jezebel Woman | Return To Olympus | 1995 | grunge |
| 797 | Chris Cornell | Seasons | Singles - Original Motion Picture Soundtrack | 1992 | grunge |
| 795 | The Smashing Pumpkins | Quiet | Siamese Dream (2011 - Remaster) | 1993 | grunge |
| 804 | Seaweed | Losing Skin | Four | 1993 | grunge |
| 806 | Mother Love Bone | Crown Of Thorns | Mother Love Bone | 1992 | grunge |


In [93]:
def create_siamese_network(input_dim, loss="mean_squared_error"):

    input_left  = Input(shape=input_dim)
    input_right = Input(shape=input_dim)

    dense_1 = Dense(100, activation="selu")
    dense_2 = Dense(100, activation="selu")
    dense_3 = Dense(100, activation="selu")
    
    #network = dense_1(dense_2(dense_3))
    
    layers_left  = dense_1(dense_2(dense_3(input_left)))
    layers_right = dense_1(dense_2(dense_3(input_right)))

    L1_distance = lambda x: K.abs(x[0]-x[1])

    distance = Lambda(L1_distance,
                      output_shape=lambda x: x[0])([layers_left, layers_right])

    prediction = Dense(100, activation="elu")(distance)

    prediction = Dense(1, activation="sigmoid")(prediction)

    model = Model([input_left, input_right], prediction)

    # train
    rms = Nadam(lr=0.001)
    model.compile(loss=loss, optimizer=rms, metrics=["mean_squared_error", "accuracy"])
    
    return model

In [109]:
similar(753, ascending=True)

track_id                                  3jJZVeExYzVYiV6Y9Fl3DX
artist_name                                  Stone Temple Pilots
title                                                      Plush
album_name                                                  Core
label                                                      Rhino
duration                                                  310346
popularity                                                    70
year                                                        1992
genres         [alternative metal, alternative rock, classic ...
playlist                                                  grunge
Name: 753, dtype: object


| | artist_name | title | album_name | year | playlist |
|---|---|---|---|---|---|
| 803 | Veruca Salt | Volcano Girls | Eight Arms To Hold You | 1997 | grunge |
| 764 | Pearl Jam | Daughter (Remastered) | Vs. | 1993 | grunge |
| 777 | Silverchair | Suicidal Dream - Remastered | Frogstomp 20th Anniversary (Deluxe Edition [Re... | 2015 | grunge |
| 758 | Nirvana | All Apologies | In Utero - 20th Anniversary Remaster | 1993 | grunge |
| 795 | The Smashing Pumpkins | Quiet | Siamese Dream (2011 - Remaster) | 1993 | grunge |
| 772 | Hole | Celebrity Skin | Celebrity Skin | 1998 | grunge |
| 799 | L7 | Pretend We're Dead | Bricks Are Heavy | 1992 | grunge |
| 809 | Gruntruck | Crazy Love | Push | 1992 | grunge |
| 792 | Sponge | Plowed | Rotting Pinata | 1994 | grunge |
| 798 | Jerry Cantrell | Cut You In | Boggy Depot | 1998 | grunge |


In [129]:
similar(470, ascending=True)

track_id                                  6DzuDDN9q4N29QXWDuQ8sx
artist_name                                         Ugly Kid Joe
title                                         Cats In The Cradle
album_name                                America's Least Wanted
label                   Digital Distribution Trinidad and Tobago
duration                                                  242173
popularity                                                    59
year                                                        1992
genres         [funk metal, glam metal, hard rock, post-grung...
playlist                                               soft_rock
Name: 470, dtype: object


| | artist_name | title | album_name | year | playlist |
|---|---|---|---|---|---|
| 470 | Ugly Kid Joe | Cats In The Cradle | America's Least Wanted | 1992 | soft_rock |
| 484 | Hoobastank | The Reason | The Reason | 2004 | soft_rock |
| 496 | Eric Clapton | My Father's Eyes | Forever Man | 2015 | soft_rock |
| 473 | Whitesnake | Is This Love - 2007 Remastered Version | 1987 (2007 Remaster) | 1987 | soft_rock |
| 500 | Def Leppard | Hysteria 2013 (Re-Recorded Version) - Single | Hysteria 2013 (Re-Recorded Version) - Single | 2013 | soft_rock |
| 465 | Metallica | Nothing Else Matters | Metallica | 1991 | soft_rock |
| 453 | REO Speedwagon | Keep on Loving You - Remastered | Hi Infidelity (30th Anniversary Edition) | 1980 | soft_rock |
| 557 | Chris Isaak | Wicked Game - Remastered | Best Of Chris Isaak | 2006 | soft_rock |
| 526 | Keane | Somewhere Only We Know | Hopes And Fears | 2004 | soft_rock |
| 527 | Foreigner | Urgent - 2008 Remastered Version | No End In Sight: The Very Best Of Foreigner | 2008 | soft_rock |


# Evaluate

In [74]:
def evaluate(similarity_function, cut_off):
    # mean precision at `cut_off`: the fraction of the top-ranked tracks that
    # come from the same playlist as the query, averaged over all queries

    all_precisions = []

    for idx in metadata.index.values:

        dist           = similarity_function(feature_data, feature_data[[idx]])
        dist           = np.array(dist).reshape(len(dist))
        # ascending sort: assumes that lower model outputs mean higher similarity
        similar_tracks = metadata.loc[np.argsort(dist)[:cut_off]]
        same_label     = similar_tracks["playlist"] == metadata.loc[idx, "playlist"]
        precision      = same_label.sum() / float(cut_off)
        all_precisions.append(precision)

    all_precisions = np.array(all_precisions)

    return all_precisions.mean()
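
The measure computed by `evaluate` is the precision at a cut-off, averaged over all query tracks. A small worked example for a single query may help. The function name `precision_at_k` and the ranked list of playlist labels are made up for illustration:

```python
def precision_at_k(query_label, ranked_labels, k):
    # fraction of the k top-ranked items that share the query's label
    top_k = ranked_labels[:k]
    return sum(1 for label in top_k if label == query_label) / float(k)

# hypothetical playlist labels of the five top-ranked tracks for one query
ranked = ["grunge", "grunge", "metal", "grunge", "hiphop"]

p_at_5 = precision_at_k("grunge", ranked, 5)   # 3 of 5 labels match
```

Note that in the evaluation above the query track is itself part of the ranked collection, so it will usually appear among its own nearest neighbours and count as a hit.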

In [21]:
evaluate(lambda x,y: [model.predict([x[[i]], y]) for i in range(feature_data.shape[0])], 20)

0.84168399168399166

In [60]:
evaluate(lambda x,y: [model.predict([x[[i]], y]) for i in range(feature_data.shape[0])], 20)

0.91460498960498948

In [75]:
evaluate(lambda x,y: [model.predict([x[[i]], y]) for i in range(feature_data.shape[0])], 20)

0.97505197505197505

In [102]:
playlist_names = [pl["name"] for pl in playlists]

playlist_similarities = pd.DataFrame(np.zeros((len(playlist_names),len(playlist_names))), 
                                     index   = playlist_names, 
                                     columns = playlist_names)

In [103]:
sim = [[["clubbeats",     "electropop"],    0.8],
       [["clubbeats",     "softpop"],       0.4],
       [["electropop",    "hiphop"],        0.4],
       [["softpop",       "soft_rock"],     0.2],
       [["softpop",       "electropop"],    0.4],
       [["softpop",       "hiphop"],        0.1],
       [["rockclassics",  "rockhymns"],     0.7],
       [["soft_rock",     "rockclassics"],  0.3],
       [["soft_rock",     "rockhymns"],     0.3],
       [["metalcore",     "metal"],         0.7],
       [["metalcore",     "classic_metal"], 0.6],
       [["metal",         "classic_metal"], 0.8],
       [["classic_metal", "grunge"],        0.5],
       [["metal",         "grunge"],        0.5],
       [["rockhymns",     "grunge"],        0.2],
       [["poppunk",       "metal"],         0.6],
       [["poppunk",       "classic_metal"], 0.4],
       [["poppunk",       "rockhymns"],     0.5],
       [["poppunk",       "rockclassics"],  0.4]
      ]

# self-similarity
for name in playlist_names:
    playlist_similarities.loc[name, name] = 1.0

# pairwise similarities (the matrix is symmetric)
for (playlist_a, playlist_b), value in sim:
    playlist_similarities.loc[playlist_a, playlist_b] = value
    playlist_similarities.loc[playlist_b, playlist_a] = value

playlist_similarities

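| | clubbeats | softpop | electropop | rockclassics | rockhymns | soft_rock | metalcore | metal | classic_metal | grunge | hiphop | poppunk | classic |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| clubbeats | 1.0 | 0.4 | 0.8 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| softpop | 0.4 | 1.0 | 0.4 | 0.0 | 0.0 | 0.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 |
| electropop | 0.8 | 0.4 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.4 | 0.0 | 0.0 |
| rockclassics | 0.0 | 0.0 | 0.0 | 1.0 | 0.7 | 0.3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.4 | 0.0 |
| rockhymns | 0.0 | 0.0 | 0.0 | 0.7 | 1.0 | 0.3 | 0.0 | 0.0 | 0.0 | 0.2 | 0.0 | 0.5 | 0.0 |
| soft_rock | 0.0 | 0.2 | 0.0 | 0.3 | 0.3 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| metalcore | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.7 | 0.6 | 0.0 | 0.0 | 0.0 | 0.0 |
| metal | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.7 | 1.0 | 0.8 | 0.5 | 0.0 | 0.6 | 0.0 |
| classic_metal | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.6 | 0.8 | 1.0 | 0.5 | 0.0 | 0.4 | 0.0 |
| grunge | 0.0 | 0.0 | 0.0 | 0.0 | 0.2 | 0.0 | 0.0 | 0.5 | 0.5 | 1.0 | 0.0 | 0.0 | 0.0 |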


In [104]:
def create_pairs_with_sims(feature_data, metadata, num_pairs_per_track, playlist_similarities):
    
    data_pairs = []
    labels     = []
    
    for row_id, q_track in metadata.sample(frac=1).iterrows():
        
        for _ in range(num_pairs_per_track):
            
            # search similar and dissimilar examples
            pos_example = metadata[metadata.playlist == q_track.playlist].sample(1)
            neg_example = metadata[metadata.playlist != q_track.playlist].sample(1)

            # create feature pairs
            data_pairs.append([feature_data[row_id], feature_data[pos_example.index[0]]])
            # .iloc[0] extracts the playlist name of the sampled row, so the label is a scalar
            labels.append(playlist_similarities.loc[q_track.playlist, pos_example.playlist.iloc[0]])

            data_pairs.append([feature_data[row_id], feature_data[neg_example.index[0]]])
            labels.append(playlist_similarities.loc[q_track.playlist, neg_example.playlist.iloc[0]])

    return np.array(data_pairs), np.array(labels)

In [105]:
data_pairs, labels = create_pairs_with_sims(feature_data, metadata, 10, playlist_similarities)

In [106]:
data_pairs.shape

(19240, 2, 69)

In [107]:
model = create_siamese_network(data_pairs[:,0].shape[1:])

In [108]:
model.fit([data_pairs[:, 0], data_pairs[:, 1]], labels, batch_size=24, verbose=1, epochs=25)


In [92]:
evaluate(lambda x,y: [model.predict([x[[i]], y]) for i in range(feature_data.shape[0])], 20)

0.94339916839916838

In [119]:
def create_pairs_with_sims_and_identity(feature_data, metadata, num_pairs_per_track, playlist_similarities):
    
    data_pairs = []
    labels     = []
    
    for row_id, q_track in metadata.sample(frac=1).iterrows():
        
        data_pairs.append([feature_data[[row_id]][0], feature_data[[row_id]][0]])
        labels.append(1)
        
        for _ in range(num_pairs_per_track):
            
            # search similar and dissimilar examples
            pos_example = metadata[metadata.playlist == q_track.playlist].sample(1)
            neg_example = metadata[metadata.playlist != q_track.playlist].sample(1)

            # create feature pairs
            data_pairs.append([feature_data[row_id], feature_data[pos_example.index[0]]])
            # .iloc[0] extracts the playlist name of the sampled row, so the label is a scalar
            labels.append(playlist_similarities.loc[q_track.playlist, pos_example.playlist.iloc[0]] - 0.1)

            data_pairs.append([feature_data[row_id], feature_data[neg_example.index[0]]])
            labels.append(playlist_similarities.loc[q_track.playlist, neg_example.playlist.iloc[0]] - 0.1)

    return np.array(data_pairs), np.array(labels)

In [120]:
data_pairs, labels = create_pairs_with_sims_and_identity(feature_data, metadata, 10, playlist_similarities)

In [121]:
model = create_siamese_network(data_pairs[:,0].shape[1:])

In [122]:
model.fit([data_pairs[:, 0], data_pairs[:, 1]], labels, batch_size=24, verbose=1, epochs=25)


In [250]:
evaluate(lambda x,y: [model.predict([x[[i]], y]) for i in range(feature_data.shape[0])], 20)

0.84655720338983054

In [110]:
def aggregate_features_sequential(seq_data, track_data, len_segment, m_data, with_year=False, with_popularity=False):
    
    # sequential data
    segments = seq_data["segments"]
    sl       = len(segments)
    
    mfcc              = np.array([s["timbre"]            for s in segments])
    chroma            = np.array([s["pitches"]           for s in segments])
    loudness_max      = np.array([s["loudness_max"]      for s in segments]).reshape((sl,1))
    loudness_start    = np.array([s["loudness_start"]    for s in segments]).reshape((sl,1))
    loudness_max_time = np.array([s["loudness_max_time"] for s in segments]).reshape((sl,1))
    duration          = np.array([s["duration"]          for s in segments]).reshape((sl,1))
    confidence        = np.array([s["confidence"]        for s in segments]).reshape((sl,1))
    
    # concatenate sequential features
    sequential_features = np.concatenate([mfcc, chroma, loudness_max, loudness_start, 
                                          loudness_max_time, duration, confidence], axis=1)
    
    offset  = np.random.randint(0, sl - len_segment)
    segment = sequential_features[offset:(offset+len_segment),:]
        
    # track-based data
    track_features = [track_data[0]["acousticness"],     # acoustic or not?
                      track_data[0]["danceability"],     # danceable?
                      track_data[0]["energy"],           # energetic or calm?
                      track_data[0]["instrumentalness"], # is somebody singing?
                      track_data[0]["liveness"],         # live or studio?
                      track_data[0]["speechiness"],      # rap or singing?
                      track_data[0]["tempo"],            # slow or fast?
                      track_data[0]["time_signature"],   # 3/4, 4/4, 6/8, etc.
                      track_data[0]["valence"]]          # happy or sad?
    
    if with_year:
        track_features.append(int(m_data["year"]))
        
    if with_popularity:
        track_features.append(int(m_data["popularity"]))
        
    
    return segment, track_features
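
The random offset above is drawn with `np.random.randint(0, sl - len_segment)`, which raises an error for tracks that have no more than `len_segment` segments. One possible way to guard against this is sketched below. The function name `random_crop` and the decision to zero-pad short sequences are assumptions, not part of the original tutorial code:

```python
import numpy as np

def random_crop(sequence, len_segment, rng=np.random):
    n = sequence.shape[0]
    if n < len_segment:
        # assumption: short sequences are zero-padded at the end
        padding = np.zeros((len_segment - n,) + sequence.shape[1:])
        return np.concatenate([sequence, padding], axis=0)
    # the upper bound of randint is exclusive, hence the + 1
    offset = rng.randint(0, n - len_segment + 1)
    return sequence[offset:offset + len_segment]

long_seq  = np.arange(50 * 3, dtype=float).reshape(50, 3)
short_seq = np.ones((5, 3))

crop_long  = random_crop(long_seq, 20)
crop_short = random_crop(short_seq, 20)
```

Both results have the same shape, so they can be stacked into a single array regardless of the original track length.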


In [111]:
len_segment = 20

sequential_features = []
trackbased_features = []

for i, (_, spotify_data) in enumerate(raw_track_data):
    
    _, _, _, f_sequential, f_trackbased = spotify_data
    
    seq_feat, track_feat = aggregate_features_sequential(f_sequential, 
                                                         f_trackbased, 
                                                         len_segment, 
                                                         metadata.loc[i],
                                                         with_year=True,
                                                         with_popularity=True)
    
    sequential_features.append(seq_feat)
    trackbased_features.append(track_feat)
    
sequential_features = np.asarray(sequential_features)
trackbased_features = np.asarray(trackbased_features)

print "sequential_features.shape:", sequential_features.shape
print "trackbased_features.shape:", trackbased_features.shape

sequential_features.shape: (962, 20, 29)
trackbased_features.shape: (962, 11)


In [112]:
# standardize sequential_features
rows, x, y = sequential_features.shape
sequential_features = sequential_features.reshape(rows, (x * y))
sequential_features -= sequential_features.mean(axis=0)
sequential_features /= sequential_features.std(axis=0)
sequential_features = sequential_features.reshape(rows, x, y)
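
The cell above flattens each segment so that every (time step, feature) position gets its own mean and standard deviation. Since the segments are cut at random offsets, a time-step position has no fixed musical meaning, so one possible alternative is to pool the statistics per feature channel. The sketch below only illustrates this on random toy data (the array `toy_sequences` and its sizes are assumptions that mimic the shapes used here). It is not the method used for the results in this tutorial:

```python
import numpy as np

rng = np.random.RandomState(42)

# toy data with the layout (tracks, time steps, feature channels)
toy_sequences = rng.normal(loc=3.0, scale=2.0, size=(100, 20, 29))

# one mean / std per feature channel, pooled over tracks (axis 0) and time steps (axis 1)
channel_mean = toy_sequences.mean(axis=(0, 1), keepdims=True)
channel_std  = toy_sequences.std(axis=(0, 1), keepdims=True)

standardized = (toy_sequences - channel_mean) / channel_std
```

With `keepdims=True` the statistics have the shape `(1, 1, 29)` and broadcast over tracks and time steps, so no reshaping is needed.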

In [113]:
# standardize trackbased_features
trackbased_features -= trackbased_features.mean(axis=0)
trackbased_features /= trackbased_features.std(axis=0)

In [114]:
def create_pairs_with_sims_and_identity_segments(sequential_features, trackbased_features, metadata, num_pairs_per_track, playlist_similarities):
    
    data_pairs_seq   = []
    data_pairs_track = []
    labels           = []
    
    for row_id, q_track in metadata.sample(frac=1).iterrows():
        
        data_pairs_seq.append([sequential_features[[row_id]][0], sequential_features[[row_id]][0]])
        data_pairs_track.append([trackbased_features[[row_id]][0], trackbased_features[[row_id]][0]])
        labels.append(1)
        
        for _ in range(num_pairs_per_track):
            
            # search similar and dissimilar examples
            pos_example = metadata[metadata.playlist == q_track.playlist].sample(1)
            neg_example = metadata[metadata.playlist != q_track.playlist].sample(1)

            # create feature pairs
            data_pairs_seq.append([sequential_features[row_id], sequential_features[pos_example.index[0]]])
            data_pairs_track.append([trackbased_features[row_id], trackbased_features[pos_example.index[0]]])
            # .iloc[0] extracts the playlist name of the sampled row, so the label is a scalar
            labels.append(playlist_similarities.loc[q_track.playlist, pos_example.playlist.iloc[0]] - 0.1)

            data_pairs_seq.append([sequential_features[row_id], sequential_features[neg_example.index[0]]])
            data_pairs_track.append([trackbased_features[row_id], trackbased_features[neg_example.index[0]]])
            labels.append(playlist_similarities.loc[q_track.playlist, neg_example.playlist.iloc[0]] - 0.1)

    return np.array(data_pairs_seq), np.array(data_pairs_track), np.asarray(labels)

In [115]:
data_pairs_seq, data_pairs_track, labels = create_pairs_with_sims_and_identity_segments(sequential_features,
                                                                                        trackbased_features,
                                                                                        metadata, 
                                                                                        10, 
                                                                                        playlist_similarities)

In [116]:
# only the imports that are used by the following cells
from keras.layers import Input, Lambda, Dense, Bidirectional
from keras.layers.recurrent import LSTM
from keras.layers.merge import concatenate
from keras.models import Model
from keras.optimizers import Nadam
from keras import backend as K

In [117]:
input_dim = data_pairs_seq[:, 0].shape[1:]

input_a = Input(shape=data_pairs_seq[:, 0].shape[1:])
input_b = Input(shape=data_pairs_seq[:, 0].shape[1:])
input_a2 = Input(shape=data_pairs_track[:, 0].shape[1:])
input_b2 = Input(shape=data_pairs_track[:, 0].shape[1:])

bdlstm = Bidirectional(LSTM(29, return_sequences=False, activation="selu"))

processed_a = bdlstm(input_a)
processed_b = bdlstm(input_b)

dens = Dense(9, activation="selu")

processed_a2 = dens(input_a2)
processed_b2 = dens(input_b2)

left = concatenate([processed_a, processed_a2], axis=1)
right = concatenate([processed_b, processed_b2], axis=1)

L1_distance = lambda x: K.abs(x[0]-x[1])

distance = Lambda(L1_distance,
                  output_shape=lambda x: x[0])([left, right])

prediction = Dense(29 + 9, activation="elu")(distance)
#prediction = Dense(64, activation="elu")(prediction)

prediction = Dense(1, activation="sigmoid")(prediction)

model = Model([input_a, input_b, input_a2, input_b2], prediction)

# train
rms = Nadam(lr=0.001)
model.compile(loss="mean_squared_error", optimizer=rms, metrics=["mean_squared_error", "accuracy"])

In [118]:
model.fit([data_pairs_seq[:, 0], data_pairs_seq[:, 1], data_pairs_track[:,0], data_pairs_track[:,1]], labels, batch_size=24, verbose=1, epochs=25)


In [314]:
def evaluate(similarity_function, cut_off):

    all_precisions = []
    
    pbar = progressbar.ProgressBar()

    for idx in pbar(metadata.index.values):

        dist           = similarity_function(sequential_features, sequential_features[[idx]], trackbased_features, trackbased_features[[idx]])
        dist           = np.array(dist).reshape(len(dist))
        similar_tracks = metadata.loc[np.argsort(dist)[::-1][:cut_off]]
        same_label     = similar_tracks["playlist"] == metadata.loc[idx, "playlist"]
        precision      = same_label.sum() / float(cut_off)
        all_precisions.append(precision)

    all_precisions = np.array(all_precisions)

    return all_precisions.mean()

In [315]:
evaluate(lambda w,x,y,z: [model.predict([w[[i]],x,y[[i]],z]) for i in range(sequential_features.shape[0])], 20)

100% (944 of 944) | Elapsed Time: 1:20:00 Time: 1:20:00


0.85317796610169494