# 5. Build recommender
In this notebook, I build the classical musician recommender using the selected Word2Vec model and prepare for the recommendation app.

#### The recommender
* Input: A list of the user's favorite artists.
* Method: Look up each input artist's Word2Vec vector and compute the average of these vectors.
* Output: The artists whose vectors are closest to that average.

The recommendation is implemented by the `recommend_artists` function in `musicians.py`.
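
The actual `recommend_artists` code lives in `musicians.py` and is not reproduced here. As a rough illustration of the averaging idea only, the sketch below ranks toy artists by cosine similarity to the mean of the favorites' vectors. The function `recommend_sketch`, the toy vectors, the choice of cosine similarity, and the exclusion of the input artists from the results are all assumptions made for this example, not the author's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def recommend_sketch(favorite_ids, vectors, id_to_name, n):
    """Hypothetical stand-in for recommend_artists.

    Averages the favorites' vectors, then returns the names of the n
    other artists whose vectors are most similar to that average.
    """
    dim = len(vectors[favorite_ids[0]])
    mean = [
        sum(vectors[i][d] for i in favorite_ids) / len(favorite_ids)
        for d in range(dim)
    ]
    # Assumption: the user's own favorites are not recommended back.
    candidates = [i for i in vectors if i not in favorite_ids]
    ranked = sorted(candidates, key=lambda i: cosine(mean, vectors[i]), reverse=True)
    return [id_to_name[i] for i in ranked[:n]]


# Toy data (made up for illustration): 2-dimensional "embeddings".
toy_vectors = {
    "a": [1.0, 0.0],
    "b": [0.8, 0.2],
    "c": [0.9, 0.1],
    "d": [0.0, 1.0],
}
toy_names = {"a": "Artist A", "b": "Artist B", "c": "Artist C", "d": "Artist D"}

print(recommend_sketch(["a", "b"], toy_vectors, toy_names, 2))
```

With gensim's `KeyedVectors`, a comparable lookup is available through `similar_by_vector` (nearest keys to an arbitrary vector) or `most_similar(positive=[...])`; whether `musicians.py` uses either of these is not stated in this notebook.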

#### App preparation
I plan to serve the recommender through a Streamlit app, using Streamlit's "multiselect" feature to let users pick their favorite artists. Because "multiselect" can be extremely slow with a long list of options, I reduce the number of selectable artists by keeping only those with a high enough degree in the musicians graph.

In [None]:
import networkx as nx
import pickle

from gensim.models import KeyedVectors
from musicians import recommend_artists

In [2]:
### Load data
# load selected Word2Vec model
word_vectors = KeyedVectors.load("graph_80000/selected_model/word2vec.wordvectors", mmap = 'r')

# load dictionaries for ID:name correspondence
with open('graph_80000/ID_name.pkl', 'rb') as f:
    ID_name = pickle.load(f)
with open('graph_80000/name_ID.pkl', 'rb') as f:
    name_ID = pickle.load(f)
    

## (a) Build & examine the recommender
The code for the recommender can be found in `recommend_artists` in `musicians.py`.

In [3]:
# example query with the selected model: 20 recommendations for two favorite pianists
favorites = ['Mitsuko Uchida', 'Krystian Zimerman']
favorites_ID = [name_ID[item] for item in favorites]
recommend_artists(favorites_ID, word_vectors, ID_name, 20)

['Andreas Haefliger',
 'Radu Lupu',
 'Hiroko Ehara',
 'Stephen Kovacevich',
 'Maurizio Pollini',
 'Murray Perahia',
 'Yvonne Lefébure',
 'Richard Goode',
 'Martha Argerich',
 'Yukio Yokoyama',
 'Hisako Kawamura',
 'Claude Frank',
 'Misha Goldstein',
 'Suske Quartett',
 'Elly Ney',
 'Leonard Shure',
 'Francesco Piemontesi',
 'Benjamin Hochman',
 'Seong-Jin Cho',
 'Friedrich Gulda']

## (b) Prepare for recommendation in app
* The artist recommender will be presented as a Streamlit app in which users select the artists they enjoy listening to.
* Unfortunately, Streamlit's "multiselect" feature can be extremely slow for a large list.
* For speed, I therefore reduce the number of artists available for selection in the app.
* I keep only artists with high degree in the artist graph, that is, artists with at least `min_degree` collaborators (set in the next cell).

In [50]:
# specify min degree
min_degree = 11

In [5]:
### load graph
G = nx.read_gml("graph_80000/graph.gml")
# convert each edge's 'albums' attribute from list to set
for _, _, edge in G.edges(data=True):
    edge['albums'] = set(edge['albums'])

In [51]:
# record each artist's degree (0 if the artist is not a node in the graph)
artists_degree = dict()
for ID in ID_name:
    deg = G.degree(ID) if ID in G else 0
    artists_degree[ID] = deg

# select artists with high degrees
high_degree = set()
for (key, item) in artists_degree.items():
    if item >= min_degree:
        high_degree.add(key)
        
# get the names of selected artists with high degree
name_ID_small = dict()
for (name, ID) in name_ID.items():
    if ID in high_degree:
        name_ID_small[name] = ID
        
# get dictionary ID:name
ID_name_small = {ID:name for (name, ID) in name_ID_small.items()}

print("number of artists: ", len(name_ID_small))

number of artists:  53945
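
Choosing `min_degree` is a trade-off between a short, responsive selection list and good coverage of artists. One way to compare candidate thresholds is to count how many artists survive each one. The sketch below does this on a made-up degree dictionary shaped like `artists_degree`; the helper `count_by_threshold` and the toy degrees are hypothetical and only illustrate the idea.

```python
def count_by_threshold(degrees, thresholds):
    """Map each threshold to the number of artists with degree >= threshold."""
    return {t: sum(1 for d in degrees.values() if d >= t) for t in thresholds}


# Toy degrees (made up); in the notebook this role is played by artists_degree.
toy_degrees = {"a": 0, "b": 5, "c": 11, "d": 21, "e": 30}

print(count_by_threshold(toy_degrees, [1, 11, 21]))
```

Running the same helper on `artists_degree` would show how quickly the selectable list shrinks as the threshold rises.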


In [49]:
# save small dictionary
#with open('app/artists.pickle', 'wb') as f:
#    pickle.dump(name_ID_small, f)
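
On the app side, the saved dictionary has to be read back and turned into a list of selectable names. The sketch below shows that round trip with a temporary file so it can run anywhere. The helpers `save_artists` and `load_artist_options`, the placeholder IDs, and the alphabetical sorting of the names are assumptions for illustration; the real app may load `app/artists.pickle` differently.

```python
import pickle
import tempfile
from pathlib import Path


def save_artists(name_to_id, path):
    """Pickle the name -> ID dictionary to the given path."""
    with open(path, "wb") as f:
        pickle.dump(name_to_id, f)


def load_artist_options(path):
    """Load the dictionary and return it with an alphabetized list of names."""
    with open(path, "rb") as f:
        name_to_id = pickle.load(f)
    return name_to_id, sorted(name_to_id)


# Round trip through a temporary file; the IDs are placeholders, not real ones.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "artists.pickle"
    save_artists({"Radu Lupu": "id2", "Martha Argerich": "id1"}, path)
    mapping, options = load_artist_options(path)

print(options)
```

In a Streamlit script, a list like `options` could be passed to `st.multiselect`, and the selected names mapped back to IDs through `mapping` before calling the recommender.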