# Tracklist Generator: Recommendation
We can now bring our different models together to build a system which can recommend tracklists for a user, combining the recommendation model trained in the [Embeddings notebook](2.%20Embeddings.ipynb) and the tracklist model trained in the [Tracklist Model notebook](3.%20Tracklist_Model.ipynb). To see this in action and create customised tracklists, visit the [Tracklist Generator webpage](https://tracklist-generator.azurewebsites.net/).

In [1]:
import pandas as pd
import numpy as np
import time
from scipy.sparse import csr_matrix
import scipy.sparse
import pickle
from sklearn.metrics.pairwise import cosine_similarity
import tensorflow as tf
from tensorflow.keras.layers import Input, Lambda, Dense, Concatenate, \
LSTM, Activation, Embedding, Add, Dot, Multiply, Dropout, BatchNormalization,Flatten
from tensorflow.keras.models import Model,load_model
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

## Model Setup
First, we import and set up the necessary components of our two models.
### TL-MGCCF recommendation model
We start with the recommendation model trained in the [Embeddings notebook](2.%20Embeddings.ipynb). We need to import the data inputs as well as the weights of the trained model, and sample the embeddings to produce vectors that can be used in the full tracklist recommendation model.
#### Imports

In [2]:
song_embeds = pd.read_csv('gcn_song_embeddings_2021.csv',index_col = 0)
artist_embeds = pd.read_csv('gcn_artist_embeddings_2021.csv',index_col = 0)
user_embeds = pd.read_csv('gcn_user_embeddings_2021.csv',index_col = 0)

gcn_output_user = pd.read_csv('gcn_user_embeddings_2021_unnormalized.csv',index_col = 0)
gcn_output_song = pd.read_csv('gcn_song_embeddings_2021_unnormalized.csv',index_col = 0)
gcn_output_artist = pd.read_csv('gcn_artist_embeddings_2021_unnormalized.csv',index_col = 0)

In [3]:
with open('Sparse Matrices v2/song_artist_map_sparse.pkl','rb') as f:
    song_artist_map_sparse = pickle.load(f)
    
with open('Sparse Matrices v2/song_song_sparse.pkl','rb') as f:
    song_song_sparse = pickle.load(f)

with open('Sparse Matrices v2/artist_artist_sparse.pkl','rb') as f:
    artist_artist_sparse = pickle.load(f)

In [4]:
with open('Sparse Matrices v2/user_song_sparse_scaled_user.pkl','rb') as f:
    user_song_scaled_user = np.array(pickle.load(f).todense())
    
with open('Sparse Matrices v2/user_song_sparse_scaled_song.pkl','rb') as f:
    user_song_scaled_song = np.array(pickle.load(f).todense())

with open('Sparse Matrices v2/user_artist_sparse_scaled_user.pkl','rb') as f:
    user_artist_scaled_user = np.array(pickle.load(f).todense())

with open('Sparse Matrices v2/user_artist_sparse_scaled_artist.pkl','rb') as f:
    user_artist_scaled_artist = np.array(pickle.load(f).todense())

In [5]:
with open('Sparse Matrices v2/song_lst.pkl','rb') as f:
    song_lst = pickle.load(f)

with open('Sparse Matrices v2/artist_lst.pkl','rb') as f:
    artist_lst = pickle.load(f)

with open('Sparse Matrices v2/user_lst.pkl','rb') as f:
    user_lst = pickle.load(f)

In [6]:
with open('Sparse Matrices v2/user_adj.pkl','rb') as f:
    user_adj = pickle.load(f)
    
user_adj_sparse = csr_matrix(user_adj)

with open('Sparse Matrices v2/user_selection_mat_sparse.pkl','rb') as f:
    user_selection_mat_sparse = pickle.load(f)

In [7]:
with open('user_recommendation_2021/tl_mgccf_weights_expanded.pkl','rb') as f:
    tl_mgccf_weights = pickle.load(f)

In [8]:
rec_user_embeds = tl_mgccf_weights['user_embedding']
rec_song_embeds = tl_mgccf_weights['song_embedding']
rec_artist_embeds = tl_mgccf_weights['artist_embedding']

#### Setup
We need to define the function used for the main sampling step of the TL-MGCCF model, i.e. the Bipar-GCN step, and import a couple of previously used utility functions.

In [9]:
from tl_utils import convert_scipy_sparse_to_sparse_tensor,normalize_graph,categorical

We then adapt the Bipar-GCN code from our [Embeddings notebook](2.%20Embeddings.ipynb) so we can re-create the sampling process for new recommendations.

In [10]:
def mgccf_bipar_gcn(user_embed,item_embed,level,item_type,
                    user_item_adjacency_scaled_user,user_item_adjacency_scaled_item,
                    n_samples = [15,10]):
    """Runs the MGCCF Bipar-GCN step for a single layer and single item type to produce
    outputs h^u_k, h^v_k.

    Args:
        user_embed: User embedding matrix from previous layer (h^u_{k-1})
        item_embed: Item embedding matrix from previous layer (h^v_{k-1})
        level: Layer number (k)
        item_type: The type of item entity (song or artist)
        user_item_adjacency_scaled_user: user-item click matrix, scaled along user axis
        user_item_adjacency_scaled_item: user-item click matrix, scaled along item axis
        n_samples: list containing the number of samples to take at each level

    Returns:
        h_u_out: Level k user embedding
        h_v_out: Level k item embedding
    """
    item_level = item_type + '_%d'%level
    level_item = '_%d'%level + '_' + item_type

    h_u = user_embed
    h_v = item_embed


    N_u = categorical(tf.math.log(user_item_adjacency_scaled_user+1e-10),
                      n_samples[level-1])
    N_v = categorical(tf.math.log(tf.transpose(user_item_adjacency_scaled_item+1e-10)),
                      n_samples[level-1])

    N_u_vecs = tf.nn.embedding_lookup(item_embed,N_u)
    N_v_vecs = tf.nn.embedding_lookup(user_embed,N_v)

    N_u_vecs = tf.matmul(N_u_vecs,tl_mgccf_weights['Q_u'+level_item])
    N_v_vecs = tf.matmul(N_v_vecs,tl_mgccf_weights['Q_'+item_level])

    N_u_vecs_agg = tf.nn.tanh(tf.reduce_mean(N_u_vecs,axis=1))
    N_v_vecs_agg = tf.nn.tanh(tf.reduce_mean(N_v_vecs,axis=1))

    h_u_concat = tf.concat([h_u,N_u_vecs_agg],axis=-1)
    h_v_concat = tf.concat([h_v,N_v_vecs_agg],axis=-1)

    h_u_out = tf.nn.tanh(tf.matmul(h_u_concat,tl_mgccf_weights['W_u'+level_item],transpose_b=True))
    h_v_out = tf.nn.tanh(tf.matmul(h_v_concat,tl_mgccf_weights['W_'+item_level],transpose_b=True))
    return h_u_out,h_v_out 
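In `mgccf_bipar_gcn`, neighbours are drawn with `categorical(tf.math.log(adjacency + 1e-10), n)`. Assuming the `categorical` helper from `tl_utils` behaves like `tf.random.categorical` (index j is drawn with probability proportional to `exp(logit_j)`), taking the log of the scaled click weights means each neighbour is drawn in proportion to its weight, while the `1e-10` keeps never-clicked entries at about `log(1e-10) ≈ -23` rather than `-inf`, so they are effectively never chosen. The NumPy sketch below checks this with the Gumbel-max trick; this sampler is a swapped-in stand-in, not the notebook's helper, and the weights are made up.

```python
import numpy as np

def categorical_np(logits, num_samples, rng):
    """Draw num_samples indices per row with P(j) proportional to exp(logits[j]).

    Gumbel-max trick: argmax(logits + Gumbel noise) is a sample from softmax(logits).
    """
    logits = np.atleast_2d(logits)
    noise = rng.gumbel(size=(logits.shape[0], num_samples, logits.shape[1]))
    return np.argmax(logits[:, None, :] + noise, axis=-1)

rng = np.random.default_rng(0)
# Hypothetical scaled click weights for one user over four items (item 3 never clicked).
weights = np.array([[0.5, 0.3, 0.2, 0.0]])
samples = categorical_np(np.log(weights + 1e-10), 20000, rng)
freqs = np.bincount(samples[0], minlength=4) / samples.shape[1]
```

The empirical frequencies track the normalised weights, and the zero-weight item does not appear.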

In [11]:
song_artist_map_tensor = convert_scipy_sparse_to_sparse_tensor(song_artist_map_sparse)

user_song_graph_scaled_user = tf.constant(user_song_scaled_user,dtype=tf.float32)
user_song_graph_scaled_song = tf.constant(user_song_scaled_song,dtype=tf.float32)
user_artist_graph_scaled_user = tf.constant(user_artist_scaled_user,dtype=tf.float32)
user_artist_graph_scaled_artist = tf.constant(user_artist_scaled_artist,dtype=tf.float32)

Using the same process as in the [Embeddings notebook](2.%20Embeddings.ipynb), we can then generate a representative recommendation score for each song for each user by repeatedly sampling, averaging the resulting vectors, and then taking the cosine similarity. We add 1 to the cosine similarity scores so that all scores are non-negative, lying between 0 and 2.

In [12]:
user_vecs_out = np.zeros((1275,64))
song_artist_concat_vecs_out = np.zeros((31214,64))

num_repeats = 100
for j in range(num_repeats):
    h_u_0_s,h_s_0 = mgccf_bipar_gcn(rec_user_embeds,
                                    rec_song_embeds,1,'s',
                                    user_song_graph_scaled_user,
                                    user_song_graph_scaled_song)

    h_u_0_a,h_a_0 = mgccf_bipar_gcn(rec_user_embeds,
                                    rec_artist_embeds,1,'a',
                                    user_artist_graph_scaled_user,
                                    user_artist_graph_scaled_artist)


    h_u_1_s,h_s_1 = mgccf_bipar_gcn(h_u_0_s,h_s_0,2,'s',
                                    user_song_graph_scaled_user,
                                    user_song_graph_scaled_song)

    h_u_1_a,h_a_1 = mgccf_bipar_gcn(h_u_0_a,h_a_0,2,'a',
                                    user_artist_graph_scaled_user,
                                    user_artist_graph_scaled_artist)

    user_vecs_final_s = h_u_1_s.numpy() + gcn_output_user
    user_vecs_final_a = h_u_1_a.numpy() + gcn_output_user
    song_vecs_final = h_s_1.numpy() + gcn_output_song
    artist_vecs_final = h_a_1.numpy() + gcn_output_artist

    user_vecs_final = np.concatenate([user_vecs_final_s,user_vecs_final_a],axis=-1)
    song_artist_concat_vecs = np.concatenate([song_vecs_final,
                                              song_artist_map_sparse @ artist_vecs_final],
                                             axis=-1)
    
    user_vecs_out += user_vecs_final
    song_artist_concat_vecs_out += song_artist_concat_vecs
user_vecs_out /= num_repeats
song_artist_concat_vecs_out /= num_repeats

In [13]:
user_rec_scores = 1 + pd.DataFrame(cosine_similarity(
    user_vecs_out,song_artist_concat_vecs_out),
                          index=user_lst,columns=song_lst)
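Each Bipar-GCN pass samples different neighbours, so a single pass gives noisy vectors; averaging many passes approximates their expected value before the similarity is taken. The toy NumPy sketch below illustrates the idea under stated assumptions: `noisy_pass` is a hypothetical stand-in that adds Gaussian noise to fixed vectors, and cosine similarity is computed directly rather than with scikit-learn's `cosine_similarity`.

```python
import numpy as np

def shifted_cosine_scores(a, b):
    """Pairwise cosine similarity between rows of a and b, plus one (range [0, 2])."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 + a_unit @ b_unit.T

rng = np.random.default_rng(0)
true_users = rng.normal(size=(4, 8))
true_items = rng.normal(size=(6, 8))

def noisy_pass():
    """Hypothetical stand-in for one stochastic Bipar-GCN pass."""
    return (true_users + rng.normal(scale=0.5, size=true_users.shape),
            true_items + rng.normal(scale=0.5, size=true_items.shape))

num_repeats = 100
user_sum = np.zeros_like(true_users)
item_sum = np.zeros_like(true_items)
for _ in range(num_repeats):
    users, items = noisy_pass()
    user_sum += users
    item_sum += items
user_avg, item_avg = user_sum / num_repeats, item_sum / num_repeats

single_users, single_items = noisy_pass()
true_scores = shifted_cosine_scores(true_users, true_items)
avg_scores = shifted_cosine_scores(user_avg, item_avg)
single_scores = shifted_cosine_scores(single_users, single_items)
```

The averaged scores sit much closer to the noise-free scores than those from a single pass, which is the motivation for `num_repeats = 100` above.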

We can take a look at the top recommendations for a few different users:

In [14]:
user_rec_scores.loc['Dom Dolla',:].sort_values(ascending=False).iloc[:20]

Claude VonStroke & Eddy M - Getting Hot                                1.835810
Chris Lake & Lee Foss - Lies, Deception And Fantasy                    1.822132
Duke Dumont ft. Shaun Ross - Red Light, Green Light (Biscits Remix)    1.819941
Biscits - The Pressure                                                 1.819771
Billy Kenny - Just Came For The Music                                  1.813342
Sonny Fodera & Biscits ft. Sarah Kellar - Scratch My Back              1.810984
Billy Kenny & BOT - Just A Groove                                      1.809699
Billy Kenny & Wongo - 4 My PPL                                         1.809611
Eat More Cake - Heat Of The Night (Dom Dolla Remix)                    1.804149
Ocean Roulette & VNSSA - Magic (Steve Darko Remix)                     1.803498
Cloonee - The Ciggie                                                   1.799076
Dom Dolla - You                                                        1.785118
Dom Dolla & Go Freek - Define           

In [15]:
user_rec_scores.loc['Don Diablo',:].sort_values(ascending=False).iloc[:20]

Ali Gatie - What If I Told You That I Love You (Don Diablo Remix)        1.762122
Don Diablo & Matt Nash ft. Noonie Bao - Starlight (Could You Be Mine)    1.751797
Danny Olson & Henry Hartley - Halcyon                                    1.727524
CID ft. Conrad Sewell - Secrets (BROHUG Remix)                           1.721991
Don Diablo - AnyTime                                                     1.721475
Ed Sheeran - Don't (Don Diablo Remix)                                    1.720006
SIKS & Adrien Toma - Get Funky                                           1.719788
Don Diablo - On My Mind                                                  1.715682
Don Diablo - You Can't Change Me                                         1.711892
Don Diablo - Tonight                                                     1.711120
VY•DA - With You                                                         1.708976
MØ & Diplo - Sun In Our Eyes (Don Diablo Remix)                          1.708782
Rihanna - Love O

In [16]:
user_rec_scores.loc['Anna Lunoe',:].sort_values(ascending=False).iloc[:20]

Phantoms - Designs For You                                  1.763858
Disclosure - Ecstasy                                        1.750576
Anna Lunoe ft. Nakamura Minami - Ice Cream                  1.748115
BRONSON ft. lau.ra - Heart Attack                           1.745969
Diplo & Wax Motif - Love To The World                       1.742636
Disclosure ft. Aminé & Slowthai - My High                  1.741900
Julio Bashmore - Au Seve                                    1.737348
Yaeji - Raingurl                                            1.736492
Disclosure - Get Close                                      1.725920
Destructo & Gerry Gonza - Shots To The Dome                 1.724468
Anna Lunoe & Nina Las Vegas - One Thirty                    1.719790
Disclosure ft. Sam Smith - Latch                            1.716174
Lastlings - Take My Hand                                    1.712100
The Aces - Daydream (Snakehips Remix)                       1.711540
Y-DAPT - Awesome                  

Again following the [Embeddings notebook](2.%20Embeddings.ipynb), we also compute song-song recommendation scores: we add each song's embedding to the embedding(s) of its artist(s), obtained via the song-artist mapping matrix, and take the pairwise cosine similarity of these summed vectors, again shifted by 1.

In [17]:
song_artist_sum = song_embeds.values + song_artist_map_sparse @ artist_embeds.values
song_rec_scores = pd.DataFrame(cosine_similarity(song_artist_sum)+1,index=song_lst,columns=song_lst)
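To see what adding the mapped artist embedding does, here is a toy NumPy example with made-up 3-dimensional embeddings and a dense stand-in for `song_artist_map_sparse`. Songs 2 and 3 have orthogonal song embeddings but share an artist, so they become similar once the artist embedding is added.

```python
import numpy as np

# Toy data (hypothetical): 4 songs, 2 artists, 3-dimensional embeddings.
song_embeds = np.array([[1.0, 0.0, 0.0],
                        [0.9, 0.1, 0.0],
                        [0.0, 1.0, 0.0],
                        [0.0, 0.0, 1.0]])
artist_embeds = np.array([[0.0, 0.0, 2.0],
                          [0.0, 2.0, 0.0]])
# song_artist_map[i, j] = 1 if song i is by artist j (dense stand-in for the sparse map).
song_artist_map = np.array([[1, 0],
                            [1, 0],
                            [0, 1],
                            [0, 1]], dtype=float)

# Each song vector plus the sum of its artists' vectors.
song_artist_sum = song_embeds + song_artist_map @ artist_embeds
normed = song_artist_sum / np.linalg.norm(song_artist_sum, axis=1, keepdims=True)
song_scores = 1.0 + normed @ normed.T   # cosine similarity shifted into [0, 2]
```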

We save the resulting user and song representation vectors for use in future prediction and/or recommendation, along with the vectors used to calculate song similarity.

In [18]:
#with open('user_rec_vecs.pkl','wb') as f:
#    pickle.dump(user_vecs_out,f)
    
#with open('song_artist_rec_vecs.pkl','wb') as f:
#    pickle.dump(song_artist_concat_vecs_out,f)

In [19]:
#with open('song_sim_rec_vecs.pkl','wb') as f:
#    pickle.dump(song_artist_sum,f)

### Tracklist model
We now import the components of the tracklist model trained in the [Tracklist Model notebook](3.%20Tracklist_Model.ipynb), including the clustering assignments and the trained model itself.

In [20]:
assignments_df_1 = pd.read_csv(
    'graph_hierarchy_models/Sampled 2021 v2/assignments_df_1.csv',index_col=0)
assignments_df_2 = pd.read_csv(
    'graph_hierarchy_models/Sampled 2021 v2/assignments_df_2.csv',index_col=0)
assignments_df_3 = pd.read_csv(
    'graph_hierarchy_models/Sampled 2021 v2/assignments_df_3.csv',index_col=0)

assignments_dict = {1:assignments_df_1,2:assignments_df_2,3:assignments_df_3}

cluster_series_1 = pd.Series(np.argmax(assignments_df_1.values,axis=1),
                             index=song_embeds.index)
cluster_series_2 = pd.Series(np.argmax(assignments_df_2.values,axis=1),
                             index=song_embeds.index)
cluster_series_3 = pd.Series(np.argmax(assignments_df_3.values,axis=1),
                             index=song_embeds.index)


In [21]:
lstm_model = tf.keras.models.load_model(
    'graph_hierarchy_models/Sampled 2021 v2/lstm_model')

level_models = {}
for level in range(4):
    level_models[level] = tf.keras.models.load_model(
        'graph_hierarchy_models/Sampled 2021 v2/level_%d_model'%level)
    
with open('graph_hierarchy_models/Sampled 2021 v2/features_pred_all.pkl','rb') as f:
    features_pred_all = pickle.load(f)

Currently the LSTM model we imported also runs the GCN step at the beginning of our tracklist model. We can generate tracklists more efficiently if we separate this step from the LSTM itself, as the GCN output will be fixed, so we don't need to calculate it each time. We do this by pulling out the layers of the LSTM model and using them to create two new Keras models.

In [22]:
lstm_model.layers

[<tensorflow.python.keras.engine.input_layer.InputLayer at 0x7f129418dbd0>,
 <tensorflow.python.keras.saving.saved_model.load.GCNSeq at 0x7f12228cbe10>,
 <tensorflow.python.keras.layers.core.Lambda at 0x7f1222878650>,
 <tensorflow.python.keras.saving.saved_model.load.GCNSeq at 0x7f1222878e10>,
 <tensorflow.python.keras.layers.merge.Concatenate at 0x7f122281c3d0>,
 <tensorflow.python.keras.layers.core.Lambda at 0x7f122281cc10>,
 <tensorflow.python.keras.engine.input_layer.InputLayer at 0x7f1222821190>,
 <tensorflow.python.keras.engine.input_layer.InputLayer at 0x7f1222821890>,
 <tensorflow.python.keras.layers.recurrent_v2.LSTM at 0x7f1222826a50>]

In [23]:
single_tl_ind = Input((1,),dtype = tf.int32)
gcn_song_single = lstm_model.layers[3](single_tl_ind)

gcn_artist_single = lstm_model.layers[1](single_tl_ind)
gcn_artist_single_mapped = lstm_model.layers[2](gcn_artist_single)
gcn_single_concat = Concatenate(axis=-1)([gcn_artist_single_mapped,gcn_song_single])
gcn_lstm_model_input = Model(single_tl_ind,gcn_single_concat)

gcn_tl_out = gcn_lstm_model_input.predict([[0]],batch_size=1)

In [24]:
latent_dim = 160
gcn_lookup_input = Input(shape = (1,48))
state_h_input = Input(shape = (latent_dim,))
state_c_input = Input(shape = (latent_dim,))

hidden_single,state_h,state_c = lstm_model.layers[-1](gcn_lookup_input,
                                     initial_state=[state_h_input,state_c_input])

lstm_pred_model = Model([gcn_lookup_input,state_h_input,state_c_input],
                   [hidden_single,state_h,state_c])
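Exposing `state_h` and `state_c` as inputs lets the LSTM advance one song at a time, carrying its state between calls, instead of re-running the whole tracklist so far at every step. A minimal NumPy sketch, assuming a standard LSTM cell with random weights (not the trained model's), checks that resuming from a saved state gives the same result as processing the full sequence in one go.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x, h, c, W, U, b):
    """Advance a standard LSTM cell by one timestep (gates: input, forget, cell, output)."""
    z = x @ W + h @ U + b
    i, f, g, o = np.split(z, 4, axis=-1)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

def run(xs, h, c, W, U, b):
    """Feed a sequence of input vectors through the cell, starting from state (h, c)."""
    for x in xs:
        h, c = lstm_step(x[None, :], h, c, W, U, b)
    return h, c

rng = np.random.default_rng(0)
input_dim, latent_dim = 48, 16   # 48 matches the GCN vector size; 16 keeps the toy small
W = rng.normal(scale=0.1, size=(input_dim, 4 * latent_dim))
U = rng.normal(scale=0.1, size=(latent_dim, 4 * latent_dim))
b = np.zeros(4 * latent_dim)
sequence = rng.normal(size=(6, input_dim))   # stand-in for six GCN song vectors
zeros = np.zeros((1, latent_dim))

# Process the whole sequence in one call.
h_full, c_full = run(sequence, zeros, zeros, W, U, b)

# Process the first four steps, keep the state, then resume with the rest.
h_mid, c_mid = run(sequence[:4], zeros, zeros, W, U, b)
h_resumed, c_resumed = run(sequence[4:], h_mid, c_mid, W, U, b)
```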

We also want to change the activation function at the final layer of the song-level model, so that it returns logits rather than softmax probabilities. We do this so that we can adjust the probabilities based on song similarity/user preferences, as explained in more detail in the Recommendation section below.

In [25]:
level_models[0].layers[-1].activation = tf.keras.activations.linear
level_models[0].compile()


We save the separated models and the fixed GCN representations for future use.

In [26]:
#level_models[0].save('graph_hierarchy_models/Sampled 2021 v2/level_0_model_linear')
#lstm_pred_model.save('graph_hierarchy_models/Sampled 2021 v2/lstm_pred_model')
#with open('gcn_tl_model_output.pkl','wb') as f:
#    pickle.dump(gcn_tl_out,f)

## Recommendation
We now use our two models to create a process which takes a seed song and a user as input and produces a tracklist as output. We will combine the user recommendation scores from our TL-MGCCF recommendation model and the song similarity scores with the softmax probabilities of each cluster at each level of the hierarchy in our tracklist model. This way, the resulting process generates tracklists that resemble those in the corpus while being weighted towards the user's interests and towards songs similar to the seed song. We include parameters that allow the emphasis on each of these goals (i.e. user recommendation score, song similarity score, and tracklist probability) to be adjusted depending on the desired outcome. We also include a temperature control at each level of the cluster hierarchy to adjust how closely the tracklist model sticks to the probabilities it generates.
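In `generate_tl`, the temperature is applied only at the three cluster levels, and the logits there are multiplied by it rather than divided. It therefore behaves like an inverse temperature in the usual convention: larger values sharpen the distribution. A small NumPy sketch with hypothetical logits; `sample_probs` is an illustrative helper, not part of the notebook.

```python
import numpy as np

def softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

def sample_probs(logits, temperature):
    """Probabilities after multiplying logits by `temperature`, as generate_tl does.

    Multiplying (rather than dividing) means larger values sharpen the distribution.
    """
    return softmax(np.asarray(logits, dtype=float) * temperature)

logits = np.array([2.0, 1.0, 0.0])   # hypothetical cluster logits
for t in (0.0, 0.5, 1.0, 2.0):
    print(t, np.round(sample_probs(logits, t), 3))
```

A value of 0 gives a uniform distribution over clusters, and values above 1 concentrate the probability on the most likely clusters.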

We first define a few utility functions for the tracklist generation process.

In [27]:
def softmax(x):
    """Numpy version of the softmax function."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)


def rescale(logits):
    """Linear rescaling of input vector to have sum one."""
    return logits/np.sum(logits)

def sample(row,num_samples,inds=True):
    """Takes a random sample along a row of an array.
    
    Args:
        row: Vector along which sample will be taken.
        num_samples: Size of sample to return.
        inds: Indicates whether to sample indices or values of the vector
    
    Returns:
        Random sample vector of size (num_samples).
    """
    if inds:
        return np.random.choice(row.shape[0],num_samples,p=row/row.sum())
    else:
        return np.random.choice(row,num_samples,p=row/row.sum())

    
def categorical_sample(arr,num_samples,inds):
    """Uses the above sampling function to take a sample across each
        row of an array. NumPy analogue of tf.random.categorical when inds
        is True, except that rows hold unnormalised probabilities rather
        than logits.
        
    Args:
        arr: Array from which each row will be sampled.
        num_samples: Number of samples to take from each row.
        inds: Indicates whether to sample indices or values from each row.
        
    Returns:
        Array of size (arr.shape[0],num_samples) containing sampled values.
    """
    return np.apply_along_axis(sample,1,arr,num_samples,inds)

num_clusters =[48,384,1200]

def get_teacher(level_probs,orig_probs,level):
    """Takes the probabilities calculated by the tracklist model at a 
    single level and randomly samples to select the teacher for the next
    level. Since we are going to make some adjustments to the logits to 
    incorporate the user and song recommendation scores, we also return
    the original probability of our choice (pre-adjustment) so we can 
    track how far the adjusted probabilities stray from the raw probability
    calculated by the tracklist model alone.
    
    Args:
        level_probs: The probabilities used to select the teacher.
        orig_probs: The original output probabilities of the tracklist
            model, pre-adjustment to incorporate user/song recommendation 
            scores.
        level: The level of the hierarchy we are sampling at.
        
    Returns:
        teacher: One-hot vector of size num_clusters[3-level] which
            indicates the chosen teacher cluster.
        prob: The probability of the chosen teacher based on the
            original probabilities calculated by the tracklist
            model.
    """
    size = level_probs.shape[0]
    chosen = np.random.choice(range(size),p=level_probs)
    teacher = np.zeros((num_clusters[3-level]))
    teacher[chosen] = 1
    prob = orig_probs[chosen]
    return teacher,prob


def get_cluster_adj(rec_vec,level,block_vec,n=15):
    """Takes a vector containing recommendation scores for each song
    and returns a recommendation score for each cluster: the top n
    scores in each cluster are resampled n times with replacement (in
    proportion to their values) and the draws are averaged.
    
    Args:
        rec_vec: Vector of length 31,214 containing a recommendation
            score for each song.
        level: Level of the hierarchy at which cluster recommendation
            scores are being provided.
        block_vec: Vector of length 31,214 containing zeros at songs 
            which have already appeared in the tracklist, and ones
            otherwise. 
        n: Number of top scores to consider for each cluster.
    
    Returns:
        out: Cluster recommendation score vector of size 
            num_clusters[3-level].
    """
    clust_scores = assignments_dict[level].T.values*(block_vec*rec_vec)
    top_n = -np.partition(-clust_scores,n)[:,:n]
    out = categorical_sample(top_n,n,inds=False).mean(axis=-1)
    return out
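`get_cluster_adj` turns per-song scores into per-cluster scores. The NumPy sketch below follows the same steps on made-up data (six songs, two hard-assigned clusters): keep the top n scores per cluster, draw n of them with replacement in proportion to their values, and average the draws. The zero-sum guard is an addition in this sketch; the notebook version has no such guard.

```python
import numpy as np

def cluster_scores(assignments, song_scores, block_vec, n, rng):
    """Per-cluster score from a weighted resample of each cluster's top-n song scores.

    assignments: (num_songs, num_clusters) membership weights.
    song_scores, block_vec: (num_songs,) arrays; block_vec is 0 for excluded songs.
    """
    clust = assignments.T * (block_vec * song_scores)      # (num_clusters, num_songs)
    top_n = -np.partition(-clust, n - 1, axis=1)[:, :n]    # n largest per row, unordered
    out = np.zeros(clust.shape[0])
    for k, row in enumerate(top_n):
        if row.sum() > 0:  # guard added in this sketch: skip clusters with no usable songs
            draws = rng.choice(row, size=n, p=row / row.sum())
            out[k] = draws.mean()
    return out

rng = np.random.default_rng(0)
assignments = np.array([[1, 0], [1, 0], [1, 0],
                        [0, 1], [0, 1], [0, 1]], dtype=float)
song_scores = np.array([1.9, 1.8, 0.2, 0.5, 0.4, 0.3])

open_songs = np.ones(6)
scores_open = cluster_scores(assignments, song_scores, open_songs, n=2, rng=rng)

blocked = open_songs.copy()
blocked[0] = 0  # song 0 already used in the tracklist
scores_blocked = cluster_scores(assignments, song_scores, blocked, n=2, rng=rng)
```

Blocking a song that has already been played lowers its cluster's score, which is how `block_vec` feeds into cluster selection. Replacing the resampling with `row.mean()` would give a deterministic alternative.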


We can now write our function for generating tracklists.

In [28]:
#Some songs in our population are acapellas, typically used in mashups.
#We will exclude these from the generated tracklists.

acca_inds = [i for i in range(31214) if 'apella' in song_lst[i].lower()]
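At the song level, `generate_tl` applies two masks before sampling. The first is the blocking vector, which zeroes out acapellas (found with the same `'apella'` substring test as above) and songs already in the tracklist. The second is a probability threshold. The remaining probabilities are renormalised with `rescale`. A NumPy sketch with hypothetical titles and probabilities; `masked_probs` is an illustrative helper.

```python
import numpy as np

song_titles = ["Artist A - Track One",
               "Artist B - Track Two (Acapella)",
               "Artist C - Track Three",
               "Artist D - Track Four",
               "Artist E - Track Five"]

# Block acapellas using the same substring test as acca_inds.
block_vec = np.ones(len(song_titles))
acapella_inds = [i for i, t in enumerate(song_titles) if "apella" in t.lower()]
block_vec[acapella_inds] = 0
block_vec[0] = 0  # pretend song 0 has already been chosen

def masked_probs(probs, block_vec, threshold):
    """Zero out blocked songs and those at or below `threshold`, then renormalise."""
    kept = probs * block_vec * (probs > threshold)
    return kept / kept.sum()   # yields NaNs if nothing survives the masks

probs = np.array([0.40, 0.30, 0.20, 0.06, 0.04])
p = masked_probs(probs, block_vec, threshold=0.05)
```

Only songs 2 and 3 survive here, in the ratio 0.20 : 0.06. As in the notebook's version, an overly high threshold combined with heavy blocking could leave nothing to sample.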

In [29]:
def generate_tl(song,
                user_rec_vec,
                temperature = [1,1,1],
                length = 15,
                user_weighting = 1,
                level_threshold = [0.1,0.05,0.05,0.005]):
    """Function which, given a seed song and a user recommendation vector, 
        generates a tracklist driven by the trained tracklist model while 
        taking into account user preferences and similarity to the seed song.
        
    Args:
        song: Name of seed song to be used in generating tracklist.
        user_rec_vec: Vector of length 31,214 containing the recommendation
            scores of each song for a user.
        temperature: List of length 3 representing the temperature used 
            in sampling at each level of the cluster hierarchy (first element 
            corresponds to level 3). The logits are multiplied by this value,
            so a higher temperature sharpens the distribution and makes the 
            model more conservative in how it samples the output probabilities.
        length: Length of tracklist to be generated (not including seed song).
        user_weighting: Non-negative float representing the weighting of user
            preferences versus the weighting placed on the seed song similarity 
            in choosing songs. Equal weight will be placed on the two when this
            has value 1; when it is less than 1, more weight will be placed on
            the seed song, and when it is greater than 1, more will be placed on
            the user.
        level_threshold: List of length 4 giving the threshold probability at
            each level (first element corresponds to level 3, last element to
            the song level); clusters/songs whose probability does not exceed
            the threshold are excluded from selection.
            
    Returns:
        tl_output: List of length+1 song names chosen to be in the tracklist.
        level_probs_lst: List of adjusted sampling probabilities at each 
            timestep at each level of the hierarchy.
        level_teacher_lst: List of the one-hot teacher vectors at each 
            timestep at each level of the hierarchy.
        level_chosen_probs_lst: List of probabilities of the chosen clusters/songs
            at each timestep at each level of the hierarchy, based on the original
            tracklist model probabilities.
        
    """
    if user_weighting < 0:
        raise ValueError('User weighting parameter must be non-negative.')
    
    song_ind = song_lst.index(song)

    state_h = np.zeros((1,160))
    state_c = np.zeros((1,160))
    
    #We initialize the output list by including the seed song
    output_tl_inds = [song_ind]
    
    #We also initialize the lists for logging the various
    #sampling related outputs at each level
    level_probs_lst = [[],[],[],[]]
    level_teacher_lst = [[],[],[],[]]
    level_chosen_probs_lst = [[],[],[],[]]
    
    #We initialize the vector used to prevent the tracklist from
    #choosing the same song multiple times
    block_vec = np.ones((1,31214))
    block_vec[:,16210] = 0
    block_vec[:,acca_inds] = 0
    block_vec[:,song_ind] = 0
    
    #We calculate the cluster recommendation scores for the input
    #song and user at each level
    song_rec_vec = song_rec_scores.loc[[song]].values
    cluster_adj_vecs = [get_cluster_adj(
        song_rec_vec,level,block_vec) for level in reversed(range(1,4))]
    cluster_adj_vecs_user = [get_cluster_adj(
        user_rec_vec,level,block_vec) for level in reversed(range(1,4))]
    
    song_scores = song_rec_vec[0]
    #We scale up the user scores slightly, as they are generally lower
    #than the song scores.
    user_scores = (user_rec_vec[0]*1.1)
    
    #To initialize the tracklist and make the generation process more 
    #stable, we obtain the first two songs of the tracklist by sampling
    #only based on the user recommendation and song similarity scores,
    #allowing the LSTM to have a warm start. We do this by randomly sampling
    #two of the top 10 songs by the product of the song and user scores,
    #with the user scores raised to the power log2(1 + user_weighting).
    
    scores_prod = song_scores * user_scores**(np.log2(1+user_weighting))

    for _ in range(2):
        song_ind_input = np.array([[song_ind]])
        gcn_vec = gcn_tl_out[song_ind_input]
        _,state_h,state_c = lstm_pred_model.predict([gcn_vec,state_h,state_c])
        partitioned = np.argpartition(-1*scores_prod*block_vec,10)[0][:10]

        song_ind = np.random.choice(partitioned)
        output_tl_inds.append(song_ind)

        block_vec[:,song_ind] = 0
    
    #We now enter the main loop for sampling the tracklist model
    #and generating the tracklist.
    for i in range(length - 2):
        song_ind_input = np.array([[song_ind]])
        gcn_vec = gcn_tl_out[song_ind_input]
        hidden,state_h,state_c = lstm_pred_model.predict(
            [gcn_vec,state_h,state_c])
        
        #At each timestep, we step through the cluster hierarchy, sampling
        #a teacher cluster and passing it on to the next level.
        for level in reversed(range(4)):
            level_model = level_models[level]
            
            #We first code levels 1-3, the levels in the cluster hierarchy.
            if level > 0:
                #We get the predicted logits and unadjusted probabilities 
                #at the current level.
                level_features = features_pred_all[level-1]
                if level == 3:
                    level_out = level_model.predict(hidden)[0][0]
                else:
                    level_out = level_model.predict([hidden,teacher])[0][0]
                level_logits = level_out @ level_features.T
                orig_probs = softmax(level_logits)
                
                #In our adjustment process, we first multiply by the temperature.
                temp = temperature[3-level]
                level_logits *= temp
                
                #We then calculate our adjustment based on user and song scores.
                #We cube the scores to stretch out the distribution, as the cluster
                #scores are calculated by averages and can be quite close together.
                cluster_adj = cluster_adj_vecs[3-level]**3
                cluster_adj_user = cluster_adj_vecs_user[3-level]**3
                
                #We produce a weighted average adjustment score for each cluster
                #based on the user_weighting parameter
                cluster_adj_average = (user_weighting*cluster_adj_user + cluster_adj)\
                    /(1+user_weighting)
                
                #We then shift the logits by a constant so that the max logit
                #is equal to log(1000). This does not change the probabilities, as 
                #softmax is invariant to adding the same constant to every logit.
                #However, it ensures that the behaviour of the multiplicative scaling
                #performed in the next step is consistent.
                level_logits_centred = level_logits -(np.max(level_logits)-np.log(1000))
                
                #We multiply the logits by the cluster adjustments divided by their
                #maximum (with adjustments below 1 raised to 1), so the logit for the
                #cluster with the maximum score stays the same and all others shrink.
                #Since negative logits would actually increase in value through this
                #process, we take the minimum of the centred logits and the scaled
                #logits to ensure that these unlikely selections do not become more likely.
                new_logits = np.minimum(level_logits_centred,level_logits_centred*(np.maximum(
                    1,cluster_adj_average))/np.max(cluster_adj_average))
                
                #We calculate the new probabilities using these logits, and remove
                #from selection any cluster which is below the threshold probability.
                level_probs = softmax(new_logits)
                level_probs = rescale(level_probs*(level_probs>level_threshold[3-level]))
                
                #We can now get the sampled teacher cluster to pass to the next level,
                #and update our logs.
                teacher,prob = get_teacher(level_probs,orig_probs,level)
                teacher = np.reshape(teacher,(1,1,-1))
                level_probs_lst[level].append(level_probs)
                level_teacher_lst[level].append(teacher[0][0])
                level_chosen_probs_lst[level].append(prob)

            else:
                #At the song level, we can calculate the original song probabilities.
                song_logits = level_model.predict([hidden,teacher])[0][0]
                song_probs = softmax(song_logits)
        
        #We can then perform a similar process to the above, centring the logits
        #and scaling by the user and song scores.
        scores_avg = (user_weighting*user_scores**3 + song_scores**3)/(1+user_weighting)
        song_logits_centred = song_logits -(np.max(song_logits)-np.log(1000))
        new_song_logits = np.minimum(song_logits_centred,
                                     song_logits_centred*(np.maximum(
                                         1,scores_avg))/np.max(scores_avg))
        
        p = softmax(new_song_logits)
        
        #This time we also use the blocking vector to make sure previously chosen songs
        #aren't selected again.
        p = rescale(p*block_vec[0]*(p>level_threshold[3]))
        
        level_probs_lst[0].append(p)
        song_chosen = np.random.choice(range(31214),p= p)

        #We update the logs, the blocking vector, and the output song index list.
        level_chosen_probs_lst[0].append(song_probs[song_chosen])
        block_vec[:,song_chosen] = 0
                
        output_tl_inds.append(song_chosen)
        song_ind = song_chosen
    
    #Finally, we convert the song indices to song titles for the output.
    tl_output = [song_embeds.index[ind] for ind in output_tl_inds]
    return tl_output,level_probs_lst,level_teacher_lst,level_chosen_probs_lst
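The logit adjustment inside `generate_tl` can be isolated for inspection. The NumPy sketch below reproduces the song-level version on four made-up items. It blends the cubed user and seed-song scores with `user_weighting`, then shifts the logits so the maximum is `log(1000)`. Finally it scales them by the blended score divided by its maximum, keeping the smaller value so that negative logits are not pushed upwards. The helper names are hypothetical.

```python
import numpy as np

def softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

def blend_scores(user_scores, song_scores, user_weighting):
    """Weighted average of cubed user and seed-song scores, as in generate_tl."""
    return (user_weighting * user_scores**3 + song_scores**3) / (1 + user_weighting)

def adjust_logits(logits, scores):
    """Shift logits so the max is log(1000), then scale by max(1, score) / max score."""
    centred = logits - (np.max(logits) - np.log(1000))
    factor = np.maximum(1, scores) / np.max(scores)
    # Scaling a negative logit by a factor below 1 would raise it, so keep the smaller value.
    return np.minimum(centred, centred * factor)

logits = np.array([3.0, 2.5, 1.0, -4.0])        # hypothetical song logits
user_scores = np.array([1.9, 1.2, 1.5, 1.0])    # hypothetical (already x1.1) user scores
song_scores = np.array([1.8, 1.9, 1.0, 1.2])    # hypothetical seed-similarity scores
scores = blend_scores(user_scores, song_scores, user_weighting=2)
new_logits = adjust_logits(logits, scores)
```

Item 0 has both the largest logit and the largest blended score, so its logit is unchanged at `log(1000)`, while the others shrink. Its probability rises from about 0.57 to about 0.96 in this toy case.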

We can now generate tracklists and examine the results. It takes around five seconds to generate a tracklist of length 15 (the examples below took between 4.4 and 5.4 seconds). Along with the song titles themselves, we can use the logging outputs to evaluate the model's performance on user recommendation, seed song similarity, and tracklist model probability at each level. Taking the log of the chosen probabilities, we can sum these components to measure a total 'reward' for each chosen song: in the cells below, the reward is 1.1 times the user score, plus the seed song score, plus the log-probabilities at Levels 3 and 2, the Level 1 log-probability divided by 1.5, and the song-level log-probability divided by 2. Future research may involve using this approach to train a reinforcement learning model to generate tracklists in a manner which optimises this reward.
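The reward for a single generated song can be written as a small function. This is a sketch that reads the weights off the example cells (1.1 on the user score, 1 on the seed song score, and divisors of 1, 1, 1.5 and 2 on the log-probabilities at Levels 3, 2, 1 and the song level); `song_reward` is a hypothetical name, not part of the notebook.

```python
import numpy as np

# Divisors applied to the log-probabilities, ordered [Level 3, Level 2, Level 1, Song Level].
LEVEL_DIVISORS = np.array([1.0, 1.0, 1.5, 2.0])
USER_MULT = 1.1  # multiplier on the user recommendation score

def song_reward(user_score, seed_score, level_probs):
    """Reward for one generated song.

    user_score, seed_score: 1 + cosine similarity scores for the user and the seed song.
    level_probs: chosen probabilities ordered [Level 3, Level 2, Level 1, Song Level].
    """
    log_terms = np.log(np.asarray(level_probs, dtype=float)) / LEVEL_DIVISORS
    return USER_MULT * user_score + seed_score + log_terms.sum()

# Every level picks the song with certainty except the song level (p = e^-2).
print(song_reward(1.0, 2.0, [1.0, 1.0, 1.0, np.exp(-2)]))  # approximately 2.1
```

As a check against the first example table below, whose columns already include these weights, summing the components of the FISHER - Freaks row (1.81611 + 1.728146 - 1.641302 - 0.830542 - 0.880283 - 1.637177) gives its listed reward of -1.445048.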

A couple of example tracklists are below, along with the components of the 'reward'. To generate your own custom tracklists, visit the [webpage](https://tracklist-generator.azurewebsites.net/).

In [30]:
song = 'BYOR - Feel That Way'
user = 'CID'
start = time.time()
tl,probs,teachers,chosen = generate_tl(song,
                                       user_rec_scores.loc[[user]].values,
                                       temperature=[1,1,1],
                                       user_weighting = 2)
print('Time taken: ',f'{time.time()-start:.2f}','sec')
probs_df = pd.DataFrame(reversed(chosen),
                        columns = tl[3:],
                        index = ['Level 3','Level 2','Level 1','Song Level']
                       ).T.applymap(np.log)/np.array([[1,1,1.5,2]])
df = pd.concat([1.1*((user_rec_scores.loc[[user],tl])).T,
                ((song_rec_scores.loc[[song],tl])).T,
                probs_df],
               axis=1)
df['Reward'] = df.sum(axis=1)
df

Time taken:  5.44 sec


| Song | CID | BYOR - Feel That Way | Level 3 | Level 2 | Level 1 | Song Level | Reward |
|---|---|---|---|---|---|---|---|
| BYOR - Feel That Way | 1.89436 | 2.0 | | | | | 3.89436 |
| CID & Riddim Commission - ME N U | 1.93135 | 1.822293 | | | | | 3.753642 |
| BYOR - Rhyme & Reason | 1.850056 | 1.946125 | | | | | 3.79618 |
| FISHER - Freaks | 1.81611 | 1.728146 | -1.641302 | -0.830542 | -0.880283 | -1.637177 | -1.445048 |
| Diplo & SIDEPIECE - On My Mind | 1.816816 | 1.714306 | -0.94115 | -0.737371 | -0.555028 | -1.17396 | 0.123614 |
| GAWP - Moon | 1.823019 | 1.790237 | -0.916761 | -2.783687 | -0.506645 | -2.683128 | -3.276965 |
| Aazar & GODAMN - Big Beat | 1.660076 | 1.578171 | -1.599382 | -1.164677 | -0.275493 | -1.936402 | -1.737708 |
| Black V Neck - Them Girls | 1.85549 | 1.835485 | -1.158024 | -0.785483 | -0.417164 | -1.671444 | -0.34114 |
| Martin Ikin & Sammy Porter - Back To Funk | 1.814412 | 1.759811 | -0.857725 | -0.723534 | -0.53678 | -1.66477 | -0.208585 |
| Jack Wins ft. Caitlyn Scarlett - Animals (BYOR Remix) | 1.837764 | 1.939277 | -0.58649 | -2.307516 | -0.396762 | -1.657751 | -1.171478 |
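The reward-table code is repeated in every example cell, so it could be factored into a helper. Below is a sketch of one way to do that; `reward_table` is a hypothetical function, and the toy `chosen` list assumes the same layout as the notebook's logs (index 0 holds song-level probabilities and indices 1 to 3 hold Levels 1 to 3, one entry per generated song). It applies `np.log` to the DataFrame directly, which also avoids `DataFrame.applymap`, deprecated since pandas 2.1.

```python
import numpy as np
import pandas as pd

LEVEL_NAMES = ['Level 3', 'Level 2', 'Level 1', 'Song Level']
# Divisors applied to each log-probability, in the same order as LEVEL_NAMES.
LEVEL_DIVISORS = np.array([1.0, 1.0, 1.5, 2.0])

def reward_table(chosen, tl, user_scores, seed_scores, n_seed=3, user_mult=1.1):
    """Build the per-song reward table shown in the example cells.

    chosen: list of four lists of chosen probabilities, ordered
        [song level, level 1, level 2, level 3], one value per generated song.
    tl: full tracklist of song titles; the first n_seed entries are seeds.
    user_scores, seed_scores: pd.Series of scores indexed by song title.
    """
    # Rows are generated songs, columns are levels (Level 3 first).
    probs_df = pd.DataFrame(list(reversed(chosen)),
                            index=LEVEL_NAMES, columns=tl[n_seed:]).T
    probs_df = np.log(probs_df) / LEVEL_DIVISORS
    df = pd.concat([user_mult * user_scores.loc[tl].rename('User'),
                    seed_scores.loc[tl].rename('Seed song'),
                    probs_df], axis=1)
    # Seed songs have NaN log-probabilities, which sum() skips.
    df['Reward'] = df.sum(axis=1)
    return df

# Hypothetical example: three seed songs followed by two generated songs.
tl = ['A', 'B', 'C', 'D', 'E']
chosen = [[np.exp(-2), 1.0],   # song level
          [1.0, 1.0],          # level 1
          [1.0, 1.0],          # level 2
          [1.0, 1.0]]          # level 3
user_scores = pd.Series(1.0, index=tl)
seed_scores = pd.Series(2.0, index=tl)
print(reward_table(chosen, tl, user_scores, seed_scores))
```

With this helper, each example cell would reduce to a call such as `reward_table(chosen, tl, user_rec_scores.loc[user], song_rec_scores.loc[song])`.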


We can look at how choosing different users with the same seed song affects the output of the model when the user weighting is relatively high (user_weighting = 2).

In [31]:
song = 'Chris Lorenzo ft. Puppah Nas-T & Denise - Work'
user = 'Diplo'
start = time.time()
tl,probs,teachers,chosen = generate_tl(song,
                                       user_rec_scores.loc[[user]].values,
                                       temperature=[1,0.75,0.75],
                                       user_weighting = 2)
print('Time taken: ',f'{time.time()-start:.2f}','sec')
probs_df = pd.DataFrame(reversed(chosen),
                        columns = tl[3:],
                        index = ['Level 3','Level 2','Level 1','Song Level']
                       ).T.applymap(np.log)/np.array([[1,1,1.5,2]])
df = pd.concat([1.1*((user_rec_scores.loc[[user],tl])).T,
                ((song_rec_scores.loc[[song],tl])).T,
                probs_df],
               axis=1)
df['Reward'] = df.sum(axis=1)
df

Time taken:  4.67 sec


| Song | Diplo | Chris Lorenzo ft. Puppah Nas-T & Denise - Work | Level 3 | Level 2 | Level 1 | Song Level | Reward |
|---|---|---|---|---|---|---|---|
| Chris Lorenzo ft. Puppah Nas-T & Denise - Work | 1.705591 | 2.0 | | | | | 3.705591 |
| Anna Lunoe - 303(Co-Prod. by Chris Lake) | 1.786591 | 1.843603 | | | | | 3.630194 |
| Dombresky - Soul Sacrifice | 1.889843 | 1.698666 | | | | | 3.588509 |
| DJ Glen & Bruno Furlan - Another Planet | 1.531388 | 1.680635 | -0.643358 | -3.115743 | -0.298181 | -1.37163 | -2.216888 |
| Eli Brown - Searching For Someone | 1.342988 | 1.817806 | -0.723647 | -0.724417 | -0.110147 | -2.090449 | -0.487865 |
| Chris Lake & Chris Lorenzo pres. Anti Up - Concentrate | 1.688403 | 1.932156 | -0.464934 | -1.522772 | -0.135872 | -1.153567 | 0.343413 |
| Dom Dolla - Take It | 1.752309 | 1.779926 | -1.689459 | -0.952235 | -0.047623 | -1.642638 | -0.799719 |
| Chris Lorenzo - Every Morning | 1.616959 | 1.965763 | -0.737898 | -1.419065 | -0.128263 | -1.717451 | -0.419954 |
| Claude VonStroke & Eddy M - Getting Hot | 1.698032 | 1.73792 | -0.56992 | -0.948917 | -0.060238 | -2.397599 | -0.540722 |
| Chris Lake & Green Velvet - Deceiver | 1.594454 | 1.775812 | -1.545226 | -0.968087 | -0.057104 | -0.851188 | -0.051339 |


In [32]:
song = 'Chris Lorenzo ft. Puppah Nas-T & Denise - Work'
user = 'AC Slater'
start = time.time()
tl,probs,teachers,chosen = generate_tl(song,
                                       user_rec_scores.loc[[user]].values,
                                       temperature=[1,0.75,0.75],
                                       user_weighting = 2)
print('Time taken: ',f'{time.time()-start:.2f}','sec')
probs_df = pd.DataFrame(reversed(chosen),
                        columns = tl[3:],
                        index = ['Level 3','Level 2','Level 1','Song Level']
                       ).T.applymap(np.log)/np.array([[1,1,1.5,2]])
df = pd.concat([1.1*((user_rec_scores.loc[[user],tl])).T,
                ((song_rec_scores.loc[[song],tl])).T,
                probs_df],
               axis=1)
df['Reward'] = df.sum(axis=1)
df

Time taken:  4.76 sec


| Song | AC Slater | Chris Lorenzo ft. Puppah Nas-T & Denise - Work | Level 3 | Level 2 | Level 1 | Song Level | Reward |
|---|---|---|---|---|---|---|---|
| Chris Lorenzo ft. Puppah Nas-T & Denise - Work | 1.706392 | 2.0 | | | | | 3.706392 |
| Chris Lorenzo - Every Morning | 1.863491 | 1.965763 | | | | | 3.829255 |
| Shift K3Y & Taiki Nulight - Lil Mama | 1.885667 | 1.855477 | | | | | 3.741144 |
| Walker_&_Royce - The Biznes | 1.757741 | 1.754653 | -0.42712 | -2.194549 | -0.24432 | -1.017245 | -0.37084 |
| Phlegmatic Dogs - Cuatrocats (VOLAC Remix) | 1.869213 | 1.726347 | -0.331285 | -1.309376 | -1.51473 | -1.392469 | -0.9523 |
| Golf Clap & MASTERIA - Freak It Out | 1.806044 | 1.602198 | -0.435166 | -2.766159 | -0.487615 | -1.439797 | -1.720493 |
| Chris Lorenzo - Bad Bitch | 1.862189 | 1.911749 | -0.414924 | -2.056942 | -0.235867 | -1.339685 | -0.273482 |
| Chris Lake & Chris Lorenzo pres. Anti Up - Concentrate | 1.828792 | 1.932156 | -0.438845 | -1.141911 | -0.21695 | -1.835482 | 0.127761 |
| Walker_&_Royce ft. Sue Yenn - Bodies Do The Talking | 1.749988 | 1.763016 | -0.534825 | -2.038827 | -0.233795 | -1.794045 | -1.088488 |
| Redlight - Fried Eggs | 1.76852 | 1.886208 | -0.425167 | -2.11072 | -0.217795 | -1.33228 | -0.431234 |


## Adding New Users
Along with generating tracklists based on the preferences of existing users, our model can also make recommendations for new users, needing only a list of liked songs as input. We have manually created a list of such songs, along with associated preference scores, to serve as an example. You can make your own selection to generate personalised tracklists on the [Tracklist Generator webpage](https://tracklist-generator.azurewebsites.net/).

In [33]:
gm_songs = {'Chris Lake & Solardo - Free Your Body':1,
            'Illyus_&_Barrientos - Shout':0.8,
            'Chris Lorenzo ft. Puppah Nas-T & Denise - Work':1,
            'Purple Disco Machine - Body Funk (Dom Dolla Remix)':0.85,
            'FISHER - You Little Beauty': 0.9,
            'Shakedown - At Night (Purple Disco Machine Remix)':0.85,
            'Roberto Surace - Joys':0.8,
            'Chris Lake & Lee Foss - Lies, Deception And Fantasy':1,
            'Dom Dolla - San Frandisco (Walker_&_Royce Remix)':0.9,
            'Walker_&_Royce ft. VNSSA - Word': 0.95,
            'JOYRYDE - Hot Drum': 1,
            'JOYRYDE - I Ware House': 0.9,
            'Skrillex & Boys Noize ft. Ty Dolla $ign - Midnight Hour':0.8,
            'AC Slater & Chris Lorenzo - Giant Mouse':0.9,
            'Chris Lake & Chris Lorenzo pres. Anti Up - Concentrate':1,
            'Chris Lake & Chris Lorenzo pres. Anti Up - Hey Pablo': 0.85,
            'Holy Goof & Chris Lorenzo - Shutdown': 0.85,
            "Drake - God's Plan (Holy Goof Bootleg)":0.9,
            'Born Dirty & BELLECOUR - Tokyo Bill':0.85,
            'Salvatore Ganacci - Horse (Black V Neck Remix)': 0.8,
            'bbno$ & Y//2//K - Lalala (Oliver Heldens Remix)':0.75,
            'Party Favor & Salvatore Ganacci - Wasabi (J. Worra Remix)':0.7,
            'Wuki & Ship Wrek - Techno Logic':0.9,
            'Habstrakt - De La Street':0.85,
            'Habstrakt & BELLECOUR - Lasagne':0.85,
            'Ferreck Dawn & Robosonic & Nikki Ambers - In My Arms': 0.85,
            'Alex Adair - Make Me Feel Better (Don Diablo & CID Remix)':0.75,
            'Dom Dolla - Take It': 0.85,
            'AC Slater & Chris Lorenzo - Fly Kicks (Wax Motif Remix)':0.9,
            'Wax Motif & Matroda - Lose Control':0.75,
            'Matroda - Beef Stick': 0.8,
            'RÜFÜS DU SOL - Innerbloom':0.8,
            'DJ Snake & Tchami & Malaa & MERCER - Made In France': 0.85,
            'Mason Maynard - Puffy':0.75,
            'FISHER - Stop It':0.75,
            'Aazar & BELLECOUR - Bonaparte':0.7,
            'Phantoms - Designs For You':0.8,
            'Alison Wonderland - Good Enough (Valentino Khan Remix)':0.7,
            'Martin Ikin - No No':0.8,
            'Martin Ikin - Hooked': 0.8,
            'Silk City ft. Dua Lipa - Electricity':0.7,
            'Eli Brown - Another Dimension':0.65,
            'Solardo & Eli Brown - XTC':0.65,
            'Cloonee - Lose Control':0.7,
            'Dr. Fresch - Sick': 0.7,
            'Getter & Ghastly - 666!':0.65,
            'Format:B & DJ PP - In My House':0.65,
            'Throttle - Hit The Road Jack':0.6,
            'CAZZTEK - Came To Get Funky':0.65,
            'GAWP - Prime Society':0.7,
            'Basement Jaxx - Jump N Shout (Erik Hagleton Remix)':0.75,
            'GODAMN - Groovy Circle':0.8,
            'Gorgon City - Elizabeth Street':0.65,
            'Halsey ft. Big Sean & Stefflon Don - Alone (Calvin Harris Remix)':0.7,
            'Ibranovski - Symmetry':0.85,
            'Krystal Klear - Neutron Dance':0.7,
            'Kyle Watson - The Sample':0.75,
            'Detlef ft. Dajae - Deep Dip':0.75,
            'Malaa - Bling Bling (Delayers Remix)':0.75,
            'Melé & Shovell - Pasilda':0.7,
            'Moksi - Lights Down Low(Co-Prod. by GTA)':0.8,
            'Noizu - No More':0.9,
            'OFFAIAH - Trouble':0.6,
            'Pawsa - The Groovy Cat':0.6,
            'Phlegmatic Dogs - Cuatrocats':0.75,
            'Phlegmatic Dogs - Keepmastik (Taiki Nulight Remix)':0.75,
            'Prok_&_Fitch & Green Velvet ft. Shamonique - WOW':0.7,
            'Redlight - Sports Mode':0.85,
            'Shiba San & Green Velvet - Chance':0.8,
            'Ship Wrek - Need It':0.75,
            'Tchami ft. Luke James - World To Me (Dillon Nathaniel Remix)':0.7,
            'VOLAC - Funky':0.75,
            'Wiwek & Moksi - Masta':0.7,
            'YYVNG - Wynehouse':0.8,
            'rrotik & Lliam Taylor - Bounce Back (rrotik Flip)':0.65}

In [34]:
gm_inds = [song_lst.index(x) for x in gm_songs]
gm_scores = [gm_songs[x] for x in gm_songs]
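Calling `song_lst.index(x)` scans the list for every lookup; with a catalogue of 31,214 titles this is fast enough for one user, but building a title-to-index dictionary once gives constant-time lookups and lets unknown titles be reported instead of raising a `ValueError`. A sketch, assuming titles in the catalogue are unique; the function name, three-song catalogue and preferences are hypothetical. It also builds the dense 'click' vector in the same way as the first step of `create_new_user_info`.

```python
import numpy as np

def liked_songs_to_click_vector(liked, song_lst):
    """Map {title: score} to song indices, scores and a dense click vector.

    Titles not found in song_lst are returned separately instead of raising.
    Assumes every title in song_lst is unique.
    """
    title_to_ind = {title: i for i, title in enumerate(song_lst)}
    inds, scores, missing = [], [], []
    for title, score in liked.items():
        if title in title_to_ind:
            inds.append(title_to_ind[title])
            scores.append(score)
        else:
            missing.append(title)
    # One row, one column per song; preferred songs get their scores.
    click_vec = np.zeros((1, len(song_lst)))
    click_vec[0, inds] = scores
    return inds, scores, click_vec, missing

# Hypothetical catalogue and preferences.
song_lst = ['Song A', 'Song B', 'Song C']
liked = {'Song C': 1.0, 'Song A': 0.8, 'Unknown Song': 0.5}
inds, scores, click_vec, missing = liked_songs_to_click_vector(liked, song_lst)
print(inds, missing)  # [2, 0] ['Unknown Song']
```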

We now write the functions which take the preference information for our new user and produce the graphs and vectors needed to make recommendations for them. The only obstacle to feeding our user information directly into the TL-MGCCF model is that there is no trained embedding for the new user. However, we can approximate this embedding. We add the user to the user adjacency matrix, which was calculated using cosine similarity of the user-song click matrix, and perform feature pooling on the existing user embeddings: the new user's row of the adjacency matrix, normalised to sum to one, is multiplied by the existing user embeddings, giving a similarity-weighted average of those embeddings. We can then follow the same sampling steps as in the Bipar-GCN step to produce a final vector that can be used for song recommendation.
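The two approximation steps can be illustrated in plain NumPy. The three-user click matrix, the 2-dimensional embeddings and the stand-in `user_adj` below are hypothetical, cosine similarity is computed by hand rather than with `sklearn`, and `np.block` produces the same bordered matrix as the pair of `np.concatenate` calls in `create_new_user_info`.

```python
import numpy as np

def cosine_rows(a, b):
    """Cosine similarity between every row of a and every row of b."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T

# Hypothetical data: 3 existing users x 4 songs, and 2-d user embeddings.
clicks = np.array([[1.0, 0.0, 1.0, 0.0],
                   [0.0, 1.0, 0.0, 1.0],
                   [1.0, 1.0, 0.0, 0.0]])
user_embeds = np.array([[1.0, 0.0],
                        [0.0, 1.0],
                        [1.0, 1.0]])
user_adj = cosine_rows(clicks, clicks)  # stand-in for the stored user adjacency

new_clicks = np.array([[1.0, 0.0, 1.0, 0.0]])
sim = cosine_rows(new_clicks, clicks)          # shape (1, 3): [1, 0, 0.5]

# Feature pooling: similarity-weighted average of the existing user embeddings.
new_embed = (sim / sim.sum()) @ user_embeds    # shape (1, 2): [1, 1/3]

# Add the new user as an extra row and column, with zero self-similarity.
adj_new = np.block([[user_adj, sim.T],
                    [sim, np.zeros((1, 1))]])  # shape (4, 4)
```

The new user matches user 0 exactly and user 2 partially, so the pooled embedding is two thirds user 0 plus one third user 2.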

In [35]:
def create_new_user_info(ind_lst,score_lst=None):
    """Takes a list of song indices representing user preferences and 
    produces a user embedding and TL-MGCCF GCN output vector for that
    user.
    
    Args:
        ind_lst: List of song indices for the new user's preferred songs.
        score_lst: List of positive scores for each song in ind_lst.
            If not provided, all songs are assumed to be equally scored.
    Returns:
        user_selection_scaled: 'Click' vector of length 31,214 for the 
            user, containing positive values at the user's preferred
            songs, scaled to sum to one.
        user_artist_selection_scaled: 'Click' vector for the artists
            of the user's preferred songs.
        user_embed_single: Embedding vector for the user calculated by
            pooling the embedding vectors of users with similar click 
            vectors.
        output: Vector which is the output of the MGE and Skip-Connection 
            steps in TL-MGCCF for the user.
    """
    #Build the new user's click vector from the preferred songs and scores.
    user_selection_vec = np.zeros((1,31214))
    if score_lst is None:
        score_lst = [1 for _ in ind_lst]
    for ind,score in zip(ind_lst,score_lst):
        user_selection_vec[:,ind] = score
    #Cosine similarity between the new user and every existing user.
    sim = cosine_similarity(user_selection_vec,user_selection_mat_sparse)

    #Pool the existing user embeddings, weighted by similarity.
    sim_scaled = sim/np.sum(sim)
    user_embed_single = sim_scaled @ rec_user_embeds

    user_selection_scaled = user_selection_vec/np.sum(user_selection_vec)
    user_artist_selection_scaled = user_selection_scaled @ song_artist_map_sparse

    #Add the new user as an extra row and column of the user adjacency matrix.
    user_adj_concat = np.concatenate([user_adj,sim])
    user_adj_concat = np.concatenate([user_adj_concat,
                                      np.concatenate([sim.T,[[0]]],axis=0)],axis=1)
    user_adj_concat_sparse = csr_matrix(user_adj_concat)

    #Run the MGE and skip-connection steps over all users, including the new one.
    norm_adjacency = normalize_graph(user_adj_concat_sparse.copy())
    user_embeds_concat = np.concatenate([rec_user_embeds,user_embed_single])
    output = user_embeds_concat @ tl_mgccf_weights['user_gcn_kernel']
    output = output * tl_mgccf_weights['skip_weight_user'] + norm_adjacency @ output
    output = output + tl_mgccf_weights['user_gcn_bias']
    output = tf.nn.tanh(output)
    #Keep only the new user's row (the last one), as a 2-D slice.
    return user_selection_scaled,user_artist_selection_scaled,user_embed_single,output[-1:]

def mgccf_bipar_gcn_new_user(obj_embed,level,obj_type,
                             new_user_obj_adjacency_scaled_user,new_user_embed,
                             n_samples = [15,10]):
    """Performs the Bipar-GCN step of TL-MGCCF with the new user information.
    
    Args:
        obj_embed: Item embedding matrix from previous layer (h^v_{k-1})
        level: Layer number (k)
        obj_type: The type of item entity ('s' for song or 'a' for artist)
        new_user_obj_adjacency_scaled_user: New user-item click vector, 
            scaled along the user axis
        new_user_embed: New user embedding from the previous layer (h^u_{k-1})
        n_samples: List containing the number of samples to take at each level
    
    Returns:
        h_new_out: Level k embedding for the new user
    """
    level_obj = '_%d'%level + '_' + obj_type

    #Sample neighbouring items, using the new user's scaled clicks as probabilities.
    N_u_new = categorical(
        tf.math.log(new_user_obj_adjacency_scaled_user+1e-10),n_samples[level-1])
    #Look up and project the sampled item embeddings, then mean-aggregate them.
    N_u_new_vecs = tf.nn.embedding_lookup(obj_embed,N_u_new)
    N_u_new_vecs = tf.matmul(N_u_new_vecs,tl_mgccf_weights['Q_u'+level_obj])
    N_u_new_vecs_agg = tf.nn.tanh(tf.reduce_mean(N_u_new_vecs,axis=1))
    #Combine with the new user's previous-layer embedding.
    h_new_concat = tf.concat([new_user_embed,N_u_new_vecs_agg],axis=-1)
    h_new_out = tf.nn.tanh(tf.matmul(
        h_new_concat,tl_mgccf_weights['W_u'+level_obj],transpose_b=True))
    return h_new_out
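To show what one call of mgccf_bipar_gcn_new_user computes, here is a NumPy sketch with random stand-in weights. `Q` and `W` play the roles of `tl_mgccf_weights['Q_u'+level_obj]` and `tl_mgccf_weights['W_u'+level_obj]`, and neighbour sampling uses NumPy's `Generator.choice` in place of the notebook's `categorical` helper, which we assume draws indices with probabilities given by the click vector. All shapes and values are hypothetical.

```python
import numpy as np

def bipar_gcn_new_user(item_embed, new_user_embed, clicks, Q, W, n_samples, rng):
    """One sampled Bipar-GCN layer for a single new user (NumPy sketch)."""
    # Sample neighbour items with probability given by the scaled click vector.
    neighbours = rng.choice(len(clicks), size=n_samples, p=clicks)
    # Project the sampled item embeddings with Q, mean-aggregate, then apply tanh.
    agg = np.tanh((item_embed[neighbours] @ Q).mean(axis=0, keepdims=True))
    # Concatenate with the user's previous-layer embedding and apply W transposed.
    return np.tanh(np.concatenate([new_user_embed, agg], axis=-1) @ W.T)

rng = np.random.default_rng(42)
n_items, d = 6, 4
item_embed = rng.normal(size=(n_items, d))   # stand-in for h^v_{k-1}
new_user_embed = rng.normal(size=(1, d))     # stand-in for h^u_{k-1}
Q = rng.normal(size=(d, d))                  # stand-in for the Q_u weights
W = rng.normal(size=(d, 2 * d))              # stand-in for the W_u weights
clicks = np.array([0.0, 0.6, 0.0, 0.0, 0.4, 0.0])  # liked items 1 and 4

h_new = bipar_gcn_new_user(item_embed, new_user_embed, clicks, Q, W, 15, rng)
print(h_new.shape)  # (1, 4)
```

Because sampling follows the click vector, only items the new user has actually selected (items 1 and 4 here) can contribute to the aggregated neighbour vector.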

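With these functions set up, we can replicate the process used above to generate user recommendations for our new user. Each repeat still needs to run the level-1 sampling across the full data for the existing users, because the level-2 step for the new user aggregates the resulting level-1 song and artist embeddings. We average those level-1 results over 100 repeats and save them, so we don't have to recompute them in future.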

In [36]:
start = time.time()
new_user_selection_scaled,\
new_user_artist_selection_scaled,\
new_user_embed_single,\
new_user_gcn_output = create_new_user_info(gm_inds,gm_scores)

new_user_vec_out = np.zeros((1,64))
h_u_0_s_out = np.zeros_like(h_u_0_s)
h_u_0_a_out = np.zeros_like(h_u_0_a)


h_s_0_out = np.zeros_like(h_s_0)
h_a_0_out = np.zeros_like(h_a_0)

num_repeats = 100
for j in range(num_repeats):
    
    h_new_0_s = mgccf_bipar_gcn_new_user(rec_song_embeds,1,'s',
                            new_user_selection_scaled,new_user_embed_single)

    h_new_0_a = mgccf_bipar_gcn_new_user(rec_artist_embeds,1,'a',
                            new_user_artist_selection_scaled,new_user_embed_single)
    
    h_u_0_s,h_s_0 = mgccf_bipar_gcn(rec_user_embeds,
                                    rec_song_embeds,1,'s',
                                    user_song_graph_scaled_user,
                                    user_song_graph_scaled_song)

    h_u_0_a,h_a_0 = mgccf_bipar_gcn(rec_user_embeds,
                                    rec_artist_embeds,1,'a',
                                    user_artist_graph_scaled_user,
                                    user_artist_graph_scaled_artist)

    h_new_1_s = mgccf_bipar_gcn_new_user(h_s_0,2,'s',
                            new_user_selection_scaled,h_new_0_s)

    h_new_1_a = mgccf_bipar_gcn_new_user(h_a_0,2,'a',
                            new_user_artist_selection_scaled,h_new_0_a)
    
    
    new_vec_final_s = h_new_1_s.numpy() + new_user_gcn_output
    new_vec_final_a = h_new_1_a.numpy() + new_user_gcn_output
    new_vec_final = np.concatenate([new_vec_final_s,new_vec_final_a],axis=-1)
    new_user_vec_out += new_vec_final
    h_s_0_out += h_s_0
    h_a_0_out += h_a_0
    h_u_0_s_out += h_u_0_s
    h_u_0_a_out += h_u_0_a

new_user_vec_out /= num_repeats
h_s_0_out /= num_repeats
h_a_0_out /= num_repeats
h_u_0_s_out /= num_repeats
h_u_0_a_out /= num_repeats
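Because each pass samples a different set of neighbours, a single pass gives a noisy vector; averaging over num_repeats passes approximates its expected value. A toy illustration of that effect, with a hypothetical scalar value per item standing in for the real embedding vectors and a tanh of the sampled mean standing in for one aggregation step:

```python
import numpy as np

values = np.array([-1.5, -0.5, 0.5, 1.5])   # hypothetical item values
probs = np.array([0.1, 0.2, 0.3, 0.4])      # hypothetical scaled clicks

def one_pass(rng, n_samples=15):
    """One stochastic aggregation: tanh of the mean of sampled neighbours."""
    sample = rng.choice(values, size=n_samples, p=probs)
    return np.tanh(sample.mean())

def averaged(seed, num_repeats=1000):
    """Average num_repeats passes; also return the spread of single passes."""
    rng = np.random.default_rng(seed)
    draws = np.array([one_pass(rng) for _ in range(num_repeats)])
    return draws.mean(), draws.std()

mean_a, spread_a = averaged(seed=1)
mean_b, _ = averaged(seed=2)
print(f'single-pass std: {spread_a:.3f}, '
      f'difference between averages: {abs(mean_a - mean_b):.4f}')
```

Single passes vary noticeably, while two independently seeded averages agree closely, which is why the averaged level-1 outputs can be saved and reused.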

In [37]:
#The accumulators are NumPy arrays, so they can be pickled directly.
#with open('user_recommendation_2021/h_s_0.pkl','wb') as f:
#    pickle.dump(h_s_0_out,f)
#with open('user_recommendation_2021/h_a_0.pkl','wb') as f:
#    pickle.dump(h_a_0_out,f)
#with open('user_recommendation_2021/h_u_0_s.pkl','wb') as f:
#    pickle.dump(h_u_0_s_out,f)
#with open('user_recommendation_2021/h_u_0_a.pkl','wb') as f:
#    pickle.dump(h_u_0_a_out,f)

We can take a look at the top recommendations for our new user:

In [38]:
new_user_rec_scores = 1 + pd.DataFrame(cosine_similarity(
    new_user_vec_out,song_artist_concat_vecs_out),columns=song_lst,index=['New User'])
new_user_rec_scores.iloc[0,:].sort_values(ascending=False).iloc[:20]

Eat More Cake - Heat Of The Night (Dom Dolla Remix)                      1.791325
Chris Lake & Chris Lorenzo pres. Anti Up - Concentrate                   1.786469
Chris Lake & Walker_&_Royce - Dance With Me                              1.782698
Dombresky & Noizu - Rave Alarm                                           1.779401
Biscits - Real Low (Billy Kenny Remix)                                   1.778579
Louis The Child ft. Foster The People - Every Color (Dombresky Remix)    1.774779
Dombresky - Utopia                                                       1.773920
Dom Dolla - Take It                                                      1.773655
Chris Lake - Lose My Mind                                                1.773262
Dombresky - Meli-Melo                                                    1.771207
Walker_&_Royce ft. VNSSA - Word                                          1.765100
Chris Lake ft. Dances With White Girls - Operator (Ring Ring)            1.761089
Anna Lunoe - 303
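The scores above are 1 plus the cosine similarity between the new user's vector and each song vector in song_artist_concat_vecs_out, so they lie between 0 and 2, with 2 meaning the vectors point in the same direction. A minimal sketch of the scoring and top-k ranking, with hypothetical 2-dimensional vectors and titles and a hand-written cosine similarity in place of `sklearn`:

```python
import numpy as np
import pandas as pd

def rec_scores(user_vec, song_vecs, titles):
    """Return 1 + cosine similarity between one user vector and each song vector."""
    user_unit = user_vec / np.linalg.norm(user_vec)
    song_unit = song_vecs / np.linalg.norm(song_vecs, axis=1, keepdims=True)
    return pd.Series(1 + song_unit @ user_unit, index=titles)

# Hypothetical song vectors and user vector.
titles = ['Song A', 'Song B', 'Song C', 'Song D']
song_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
user_vec = np.array([1.0, 0.0])

scores = rec_scores(user_vec, song_vecs, titles)
print(scores.sort_values(ascending=False).iloc[:2])  # Song A first, then Song C
```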

Generating tracklists for our new user is now straightforward: we just pass the new user's recommendation scores to the generate_tl function built above.

In [39]:
song = 'Will Clarke - Hallelujah'
start = time.time()
tl,probs,teachers,chosen = generate_tl(song,
                                       new_user_rec_scores.values,
                                       temperature=[1,0.75,0.75],
                                       user_weighting = 1)
print('Time taken: ',f'{time.time()-start:.2f}','sec')
probs_df = pd.DataFrame(reversed(chosen),
                        columns = tl[3:],
                        index = ['Level 3','Level 2','Level 1','Song Level']
                       ).T.applymap(np.log)/np.array([[1,1,1.5,2]])
df = pd.concat([1.1*((new_user_rec_scores.loc[:,tl])).T,
                ((song_rec_scores.loc[[song],tl])).T,
                probs_df],
               axis=1)
df['Reward'] = df.sum(axis=1)
df

Time taken:  4.44 sec


| Song | New User | Will Clarke - Hallelujah | Level 3 | Level 2 | Level 1 | Song Level | Reward |
|---|---|---|---|---|---|---|---|
| Will Clarke - Hallelujah | 1.803459 | 2.0 | | | | | 3.803459 |
| PAX - Snake | 1.895982 | 1.787306 | | | | | 3.683288 |
| Green Velvet & Claude VonStroke pres. Get Real - Jolean | 1.859657 | 1.842268 | | | | | 3.701924 |
| Chris Lake & Solardo - Free Your Body | 1.866796 | 1.751482 | -0.805361 | -1.448012 | -0.139512 | -1.160711 | 0.064682 |
| Rebūke - Rattle | 1.70275 | 1.836175 | -0.669024 | -1.95276 | -0.409402 | -1.433939 | -0.926199 |
| Shadow Child - The DBG | 1.632631 | 1.749762 | -0.514334 | -1.818926 | -0.413768 | -1.706373 | -1.071009 |
| Sharam Jey & Andruss ft. Dewitt Sound - Right Back | 1.822392 | 1.607653 | -0.499635 | -1.570453 | -0.060086 | -2.066724 | -0.766852 |
| FISHER - You Little Beauty | 1.810165 | 1.516303 | -0.43994 | -1.509335 | -0.060082 | -1.510953 | -0.193843 |
| Eli Brown - Searching For Someone | 1.755079 | 1.735155 | -0.50323 | -1.052826 | -0.072271 | -1.932667 | -0.07076 |
| Roberto Surace - Joys | 1.615897 | 1.564602 | -0.40566 | -1.17472 | -0.059913 | -1.94685 | -0.406643 |
