### This notebook shows the steps to build a recommender system using the Collaborative Filtering approach. 
### The goal is to recommend artists based on user's past activity and interests of similar users.

* [Import Libraries ](#section-1)
* [Read Data](#section-2)
* [Data Prep](#section-3)
* [Recommender System](#section-4)
    - [Helper Functions](#subsection-1)
    - [Prep Model Inputs](#subsection-2)
    - [Matrix Factorization (MF) Model](#subsection-3)
    - [Evaluation Metrics](#subsection-4)
    - [Examples](#subsection-5)

<a id="section-1"></a>
# Import Libraries 

In [2]:
import numpy as np 
import pandas as pd 
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
from scipy import sparse
import random
import lightfm 
from lightfm import LightFM, cross_validation
from lightfm.evaluation import precision_at_k, auc_score
from sklearn.metrics.pairwise import cosine_similarity



<a id="section-2"></a>
# Read Data

### The original dataset is quite large. I only read 50% of rows for faster run. 

In [3]:
p = 0.50  # to randomly select 50% of the rows

In [4]:
path = "/Users/lilianvalin/.cache/kagglehub/datasets/andrewmvd/spotify-playlists/versions/1/spotify_dataset.csv"


In [5]:
df_playlist = pd.read_csv(path, skiprows=1, names=['user_id', 'artistname', 'trackname', 'playlistname'], on_bad_lines='skip')

df_playlist.head()

Unnamed: 0,user_id,artistname,trackname,playlistname
0,9cc0cfd4d7d7885102480dd99e7a90d6,Elvis Costello,(The Angels Wanna Wear My) Red Shoes,HARD ROCK 2010
1,9cc0cfd4d7d7885102480dd99e7a90d6,Elvis Costello & The Attractions,"(What's So Funny 'Bout) Peace, Love And Unders...",HARD ROCK 2010
2,9cc0cfd4d7d7885102480dd99e7a90d6,Tiffany Page,7 Years Too Late,HARD ROCK 2010
3,9cc0cfd4d7d7885102480dd99e7a90d6,Elvis Costello & The Attractions,Accidents Will Happen,HARD ROCK 2010
4,9cc0cfd4d7d7885102480dd99e7a90d6,Elvis Costello,Alison,HARD ROCK 2010


### Size of dataframe

In [6]:
df_playlist.shape

(12891680, 4)

### Clean up column names

In [7]:
df_playlist.columns = df_playlist.columns.str.replace('"', '')
df_playlist.columns = df_playlist.columns.str.replace('name', '')
df_playlist.columns = df_playlist.columns.str.replace(' ', '')
df_playlist.columns

Index(['user_id', 'artist', 'track', 'playlist'], dtype='object')

<a id="section-3"></a>
# Data Prep

### For recommender system, I'm only keeping the artists with frequency higher than 50

In [8]:
df_playlist = df_playlist.groupby('artist').filter(lambda x : len(x)>=50)

### And keeping the users with at least 10 unique artists in their playlists to lessen the impact of cold start problem

In [9]:
df_playlist = df_playlist[df_playlist.groupby('user_id').artist.transform('nunique') >= 10]

### group by to get the frequnecy count for each user and artist (# of times that an artist has appeared in playlists created by a user)

In [10]:
size = lambda x: len(x)
df_freq = df_playlist.groupby(['user_id', 'artist']).agg('size').reset_index().rename(columns={0:'freq'})[['user_id', 'artist', 'freq']].sort_values(['freq'], ascending=False)
df_freq.head()

Unnamed: 0,user_id,artist,freq
2250147,defced0ece4ce946160b0d2698142eac,Vitamin String Quartet,3346
397954,26b51e580277e131f87e4c7ee4c0887a,Vitamin String Quartet,3306
665245,414050deadb38aafd8d4ad22ca634055,Vitamin String Quartet,2587
2194367,d993ff8f2de226e2c6803e47a22e9d7e,Lata Mangeshkar,2281
17756,014e695cc6df96011b90a5beb3206012,Ilaiyaraaja,2242


### create a DF for artists and add artist id

In [11]:
df_artist = pd.DataFrame(df_freq["artist"].unique())
df_artist = df_artist.reset_index()
df_artist = df_artist.rename(columns={'index':'artist_id', 0:'artist'})
df_artist.head()

Unnamed: 0,artist_id,artist
0,0,Vitamin String Quartet
1,1,Lata Mangeshkar
2,2,Ilaiyaraaja
3,3,Peggy Lee
4,4,Wolfgang Amadeus Mozart


In [12]:
df_artist.shape

(23515, 2)

### add artist_id to the main DF

In [13]:
df_freq  = pd.merge(df_freq , df_artist, how='inner', on='artist')

<a id="section-4"></a>
# Recommendar System 

### I use the LightFM library and run a traditional MF model since the dataset doesn't include any user or artist features, 
### but the library allows you to build a hybrid model too. 

### LightFM documentation: 

https://making.lyst.com/lightfm/docs/

### You can find some examples in LightFM GitHub:
https://github.com/lyst/lightfm/blob/master/examples/

<a id="subsection-1"></a>
## Helper Functions

### Helpers functions are from the repo below: 
https://github.com/aayushmnit/cookbook/blob/master/recsys.py

In [14]:
def create_interaction_matrix(df,user_col, item_col, rating_col, norm= False, threshold = None):
    '''
    Function to create an interaction matrix dataframe from transactional type interactions
    Required Input -
        - df = Pandas DataFrame containing user-item interactions
        - user_col = column name containing user's identifier
        - item_col = column name containing item's identifier
        - rating col = column name containing user feedback on interaction with a given item
        - norm (optional) = True if a normalization of ratings is needed
        - threshold (required if norm = True) = value above which the rating is favorable
    Expected output - 
        - Pandas dataframe with user-item interactions ready to be fed in a recommendation algorithm
    '''
    interactions = df.groupby([user_col, item_col])[rating_col] \
            .sum().unstack().reset_index(). \
            fillna(0).set_index(user_col)
    if norm:
        interactions = interactions.applymap(lambda x: 1 if x > threshold else 0)
    return interactions

In [15]:
# https://github.com/aayushmnit/cookbook/blob/master/recsys.py
def create_user_dict(interactions):
    '''
    Function to create a user dictionary based on their index and number in interaction dataset
    Required Input - 
        interactions - dataset create by create_interaction_matrix
    Expected Output -
        user_dict - Dictionary type output containing interaction_index as key and user_id as value
    '''
    user_id = list(interactions.index)
    user_dict = {}
    counter = 0 
    for i in user_id:
        user_dict[i] = counter
        counter += 1
    return user_dict

In [16]:
# https://github.com/aayushmnit/cookbook/blob/master/recsys.py
def create_item_dict(df,id_col,name_col):
    '''
    Function to create an item dictionary based on their item_id and item name
    Required Input - 
        - df = Pandas dataframe with Item information
        - id_col = Column name containing unique identifier for an item
        - name_col = Column name containing name of the item
    Expected Output -
        item_dict = Dictionary type output containing item_id as key and item_name as value
    '''
    item_dict ={}
    for i in range(df.shape[0]):
        item_dict[(df.loc[i,id_col])] = df.loc[i,name_col]
    return item_dict

In [17]:
# https://github.com/aayushmnit/cookbook/blob/master/recsys.py
def runMF(interactions, n_components=30, loss='warp', k=15, epoch=30,n_jobs = 4):
    '''
    Function to run matrix-factorization algorithm
    Required Input -
        - interactions = dataset create by create_interaction_matrix
        - n_components = number of embeddings you want to create to define Item and user
        - loss = loss function other options are logistic, brp
        - epoch = number of epochs to run 
        - n_jobs = number of cores used for execution 
    Expected Output  -
        Model - Trained model
    '''
    
    #uncommented for train test split
#     x = sparse.csr_matrix(interactions.values)
    model = LightFM(no_components= n_components, loss=loss,k=k)
    model.fit(x,epochs=epoch,num_threads = n_jobs)
    return model

In [18]:
# https://github.com/aayushmnit/cookbook/blob/master/recsys.py
def sample_recommendation_user(model, interactions, user_id, user_dict, 
                               item_dict,threshold = 0,nrec_items = 10, show = True):
    '''
    Function to produce user recommendations
    Required Input - 
        - model = Trained matrix factorization model
        - interactions = dataset used for training the model
        - user_id = user ID for which we need to generate recommendation
        - user_dict = Dictionary type input containing interaction_index as key and user_id as value
        - item_dict = Dictionary type input containing item_id as key and item_name as value
        - threshold = value above which the rating is favorable in new interaction matrix
        - nrec_items = Number of output recommendation needed
    Expected Output - 
        - Prints list of items the given user has already bought
        - Prints list of N recommended items  which user hopefully will be interested in
    '''
    n_users, n_items = interactions.shape
    user_x = user_dict[user_id]
    scores = pd.Series(model.predict(user_x,np.arange(n_items)))
    scores.index = interactions.columns
    scores = list(pd.Series(scores.sort_values(ascending=False).index))
    
    known_items = list(pd.Series(interactions.loc[user_id,:] \
                                 [interactions.loc[user_id,:] > threshold].index) \
								 .sort_values(ascending=False))
    
    scores = [x for x in scores if x not in known_items]
    return_score_list = scores[0:nrec_items]
    known_items = list(pd.Series(known_items).apply(lambda x: item_dict[x]))
    scores = list(pd.Series(return_score_list).apply(lambda x: item_dict[x]))
    if show == True:
        print("Known Likes:")
        counter = 1
        for i in known_items:
            print(str(counter) + '- ' + i)
            counter+=1

        print("\n Recommended Items:")
        counter = 1
        for i in scores:
            print(str(counter) + '- ' + i)
            counter+=1
    return return_score_list

<a id="subsection-2"></a>
## Prep Model Inputs

### Create interaction matrix

In [55]:
interactions = create_interaction_matrix(df = df_freq, user_col = "user_id", item_col = 'artist_id', rating_col = 'freq', norm= False, threshold = None)
interactions.head()

artist_id,0,1,2,3,4,5,6,7,8,9,...,23505,23506,23507,23508,23509,23510,23511,23512,23513,23514
user_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
00055176fea33f6e027cd3302289378b,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
0007f3dd09c91198371454c608d47f22,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
000b0f32b5739f052b9d40fcc5c41079,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
000c11a16c89aa4b14b328080f5954ee,0.0,0.0,0.0,0.0,0.0,0.0,6.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
00123e0f544dee3ab006aa7f1e5725a7,0.0,0.0,0.0,0.0,1.0,34.0,0.0,0.0,165.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [56]:
interactions.shape

(13665, 23515)

### Create User Dict

In [52]:
user_dict = create_user_dict(interactions=interactions)

### Create Item dict

In [53]:
artists_dict = create_item_dict(df = df_artist, id_col = 'artist_id', name_col = 'artist')

In [54]:
size = len(artists_dict)
print(size)

23515


### Train-Test split

In [26]:
x = sparse.csr_matrix(interactions.values)
train, test = lightfm.cross_validation.random_train_test_split(x, test_percentage=0.2, random_state=None)

<a id="subsection-3"></a>
## Matrix Factorization (MF) Model

### How does a MF model work?

https://developers.google.com/machine-learning/recommendation/collaborative/matrix

### Train the Matrix Factorization Model

In [27]:
%time
model = runMF(interactions = train,
                 n_components = 30,
                 loss = 'warp',
                 k = 15,
                 epoch = 30,
                 n_jobs = 4)

CPU times: user 2 µs, sys: 4 µs, total: 6 µs
Wall time: 6.91 µs


#### You can do hyper-parameter tuning for better results

<a id="subsection-4"></a>
## Evaluation Metrics

 ### Compute AUC score for Train set

In [28]:
train_auc = auc_score(model, train, num_threads=4).mean()
print('Train AUC: %s' % train_auc)

Train AUC: 0.9734495


### Compute AUC score for Test set

#### The parameter train_interactions allows you to exclude known positives in training set from the predicitons and score calculations. 
#### This is to avoid re-recommending the items the user has alreardy interacted with

In [29]:
test_auc = auc_score(model, test, train_interactions=train, num_threads=4).mean()
print('Test AUC: %s' % test_auc)

Test AUC: 0.97400916


### Compute Precision scores
#### Precision score is based on the number of positives items in the K highest ranked items. 

In [30]:
train_precision = precision_at_k(model, train, k=10).mean()
test_precision = precision_at_k(model, test, k=10, train_interactions=train).mean()

In [31]:
print('train Precision %.2f, test Precision %.2f.' % (train_precision, test_precision))

train Precision 0.46, test Precision 0.25.


<a id="subsection-5"></a>
## Examples

### Let's see some examples of recommendations:

In [32]:
rec_list = sample_recommendation_user(model = model, 
                                      interactions = interactions, 
                                      user_id = '9cc0cfd4d7d7885102480dd99e7a90d6', 
                                      user_dict = user_dict,
                                      item_dict = artists_dict, 
                                      threshold = 0,
                                      nrec_items = 10,
                                      show = True)

Known Likes:
1- Thunderclap Newman
2- Spector
3- The Len Price 3
4- Miles Kane
5- Lissie
6- Crosby, Stills & Nash
7- Noel Gallagher's High Flying Birds
8- Noah And The Whale
9- Joshua Radin
10- Tom Petty
11- Elbow
12- Crowded House
13- Biffy Clyro
14- Madness
15- Tom Petty And The Heartbreakers
16- Oasis
17- Elvis Costello
18- Elvis Costello & The Attractions
19- Pearl Jam
20- Bruce Springsteen
21- Paul McCartney

 Recommended Items:
1- The Rolling Stones
2- Bob Dylan
3- Johnny Cash
4- Neil Young
5- The Who
6- R.E.M.
7- Mumford & Sons
8- David Bowie
9- Foo Fighters
10- Radiohead


In [33]:
rec_list = sample_recommendation_user(model = model, 
                                      interactions = interactions, 
                                      user_id = '9cc0cfd4d7d7885102480dd99e7a90d6', 
                                      user_dict = user_dict,
                                      item_dict = artists_dict, 
                                      threshold = 0,
                                      nrec_items = 10,
                                      show = True)

Known Likes:
1- Thunderclap Newman
2- Spector
3- The Len Price 3
4- Miles Kane
5- Lissie
6- Crosby, Stills & Nash
7- Noel Gallagher's High Flying Birds
8- Noah And The Whale
9- Joshua Radin
10- Tom Petty
11- Elbow
12- Crowded House
13- Biffy Clyro
14- Madness
15- Tom Petty And The Heartbreakers
16- Oasis
17- Elvis Costello
18- Elvis Costello & The Attractions
19- Pearl Jam
20- Bruce Springsteen
21- Paul McCartney

 Recommended Items:
1- The Rolling Stones
2- Bob Dylan
3- Johnny Cash
4- Neil Young
5- The Who
6- R.E.M.
7- Mumford & Sons
8- David Bowie
9- Foo Fighters
10- Radiohead


In [34]:
def recommend_artists(model, interactions, liked_artists, artist_dict, threshold=0, nrec_items=10, show=True):
    '''
    Fonction pour produire des recommandations basées sur une liste d'artistes aimés.
    Entrées :
        - model : Modèle de factorisation matricielle entraîné.
        - interactions : Dataset utilisé pour entraîner le modèle (matrice utilisateur-élément).
        - liked_artists : Liste d'artistes aimés par l'utilisateur.
        - artist_dict : Dictionnaire avec les identifiants des artistes comme clés et leurs noms comme valeurs.
        - threshold : Valeur au-dessus de laquelle une interaction est considérée positive.
        - nrec_items : Nombre d'artistes recommandés.
        - show : Booléen pour afficher les résultats.
    Sorties :
        - Liste des artistes recommandés (identifiants).
    '''
    n_users, n_items = interactions.shape
    
    # Conversion des noms d'artistes en leurs identifiants
    liked_artist_ids = [key for key, value in artist_dict.items() if value in liked_artists]
    transformed_items = [int(k) for k, v in item_dict.items() if v in known_items]
    
    # Création d'un vecteur utilisateur fictif basé sur les artistes aimés
    user_vector = pd.Series(0, index=interactions.columns)
    user_vector[liked_artist_ids] = 1  # Marquer les artistes aimés avec une interaction positive
    
    # Prédire les scores pour tous les artistes
    scores = pd.Series(model.predict(0, np.arange(n_items)))  # On utilise un utilisateur fictif (indice 0)
    scores.index = interactions.columns
    scores = list(pd.Series(scores.sort_values(ascending=False).index))
    
    # Exclure les artistes déjà aimés
    scores = [x for x in scores if x not in liked_artist_ids]
    
    # Sélectionner les N meilleurs artistes recommandés
    recommended_artist_ids = scores[:nrec_items]
    
    # Convertir les identifiants en noms d'artistes
    liked_artists_names = list(pd.Series(liked_artist_ids).apply(lambda x: artist_dict[x]))
    recommended_artists = list(pd.Series(recommended_artist_ids).apply(lambda x: artist_dict[x]))
    
    if show:
        print("Artistes aimés :")
        for i, artist in enumerate(liked_artists_names, 1):
            print(f"{i}- {artist}")
        
        print("\nSuggestions d'artistes :")
        for i, artist in enumerate(recommended_artists, 1):
            print(f"{i}- {artist}")
    
    return recommended_artists

In [35]:
liked_artists = ['Thunderclap Newman', 'Spector', 'The Len Price 3', 'Miles Kane', 'Lissie', 'Crosby, Stills & Nash', "Noel Gallagher's High Flying Birds", 'Noah And The Whale', 'Joshua Radin', 'Tom Petty', 'Elbow', 'Crowded House', 'Biffy Clyro', 'Madness', 'Tom Petty And The Heartbreakers', 'Oasis', 'Elvis Costello', 'Elvis Costello & The Attractions', 'Pearl Jam', 'Bruce Springsteen', 'Paul McCartney']  # Liste des artistes aimés par l'utilisateur

# Appeler la fonction de recommandation avec un nombre de 3 recommandations
recommended_artists = recommend_artists(
    model, interactions, liked_artists, artists_dict, threshold=0, nrec_items=4, show=True
)

NameError: name 'item_dict' is not defined

In [105]:
# https://github.com/aayushmnit/cookbook/blob/master/recsys.py
def sample_recommendation_user(model, interactions, user_dict, known_items,
                               item_dict,threshold = 0,nrec_items = 10, show = True):
    '''
    Function to produce user recommendations
    Required Input - 
        - model = Trained matrix factorization model
        - interactions = dataset used for training the model
        - user_id = user ID for which we need to generate recommendation
        - user_dict = Dictionary type input containing interaction_index as key and user_id as value
        - item_dict = Dictionary type input containing item_id as key and item_name as value
        - threshold = value above which the rating is favorable in new interaction matrix
        - nrec_items = Number of output recommendation needed
    Expected Output - 
        - Prints list of items the given user has already bought
        - Prints list of N recommended items  which user hopefully will be interested in
    '''
    n_users, n_items = interactions.shape
    user_x = user_dict[user_id]
    scores = pd.Series(model.predict(user_x,np.arange(n_items)))
    scores.index = interactions.columns
    scores = list(pd.Series(scores.sort_values(ascending=False).index))
    
    known_items = [int(k) for k, v in item_dict.items() if v in known_items]
    print('k', known_items)
    
    scores = [x for x in scores if x not in known_items]
    return_score_list = scores[0:nrec_items]
    known_items = list(pd.Series(known_items).apply(lambda x: item_dict[x]))
    
    scores = list(pd.Series(return_score_list).apply(lambda x: item_dict[x]))
    print(scores)
    return return_score_list

In [125]:
# Ajouter un nouvel utilisateur à user_dict
new_user_id = '003a1b2c3d4e5f67890abcde12345678'  # Nouvel ID utilisateur unique
new_user_index = len(user_dict)  # Nouveau index, basé sur la taille actuelle de user_dict
user_dict[new_user_id] = new_user_index  # Ajout dans user_dict

In [128]:
interactions

artist_id,0,1,2,3,4,5,6,7,8,9,...,23505,23506,23507,23508,23509,23510,23511,23512,23513,23514
user_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
00055176fea33f6e027cd3302289378b,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
0007f3dd09c91198371454c608d47f22,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
000b0f32b5739f052b9d40fcc5c41079,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
000c11a16c89aa4b14b328080f5954ee,0.0,0.0,0.0,0.0,0.0,0.0,6.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
00123e0f544dee3ab006aa7f1e5725a7,0.0,0.0,0.0,0.0,1.0,34.0,0.0,0.0,165.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
ffe11226cdea81a2db9262c0ec7f5d71,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
ffe32d5412269f3041c58cbf0dde3306,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
fff60baf392613ed33f745b89a9b38f7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
fff616055993498d6127f3f467cf9f2b,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [126]:
# Exemple : Créer un vecteur d'interaction pour un nouvel utilisateur
# Si l'utilisateur a aimé les articles avec les index 2, 5 et 7
new_user_interactions = [0, 1, 0, 0, 0, 1, 0, 1]  # Exemple de vecteur d'interactions

# Ajouter cette ligne dans la matrice d'interaction
interactions.loc[new_user_id] = new_user_interactions

ValueError: cannot set a row with mismatched columns

In [121]:
# Exemple de `known_items` (articles que l'utilisateur a déjà achetés ou appréciés)
known_items = [2, 5, 7]  # Index des articles connus de l'utilisateur

In [122]:
item_dict = {0: 'Item A', 1: 'Item B', 2: 'Item C', 3: 'Item D', 4: 'Item E', 
             5: 'Item F', 6: 'Item G', 7: 'Item H'}

In [123]:
# Appeler la fonction de recommandation pour le nouvel utilisateur
recommended_items = sample_recommendation_user(
    model, interactions, new_user_id, user_dict, item_dict, threshold=0, nrec_items=10, show=True
)

NameError: name 'new_user_id' is not defined

In [132]:
# https://github.com/aayushmnit/cookbook/blob/master/recsys.py
def sample_recommendation_user(model, interactions, user_id, user_dict, 
                               item_dict,threshold = 0,nrec_items = 10, show = True):
    '''
    Function to produce user recommendations
    Required Input - 
        - model = Trained matrix factorization model
        - interactions = dataset used for training the model
        - user_id = user ID for which we need to generate recommendation
        - user_dict = Dictionary type input containing interaction_index as key and user_id as value
        - item_dict = Dictionary type input containing item_id as key and item_name as value
        - threshold = value above which the rating is favorable in new interaction matrix
        - nrec_items = Number of output recommendation needed
    Expected Output - 
        - Prints list of items the given user has already bought
        - Prints list of N recommended items  which user hopefully will be interested in
    '''
    n_users, n_items = interactions.shape
    user_x = user_dict[user_id]
    scores = pd.Series(model.predict(user_x,np.arange(n_items)))
    scores.index = interactions.columns
    scores = list(pd.Series(scores.sort_values(ascending=False).index))
    
    known_items = list(pd.Series(interactions.loc[user_id,:] \
                                 [interactions.loc[user_id,:] > threshold].index) \
								 .sort_values(ascending=False))
    
    scores = [x for x in scores if x not in known_items]
    return_score_list = scores[0:nrec_items]
    print(known_items)
    known_items = list(pd.Series(known_items).apply(lambda x: item_dict[x]))
    scores = list(pd.Series(return_score_list).apply(lambda x: item_dict[x]))
    return return_score_list

In [45]:
rec_list = sample_recommendation_user(model = model, 
                                      interactions = interactions, 
                                      user_id = 'new_user', 
                                      user_dict = user_dict,
                                      item_dict = artists_dict, 
                                      threshold = 0,
                                      nrec_items = 10,
                                      show = True)

ValueError: The user feature matrix specifies more features than there are estimated feature embeddings: 13665 vs 13666.

In [51]:
# Nombre d'éléments (clés) dans le dictionnaire
print(len(artists_dict))
# Afficher les clés et les valeurs du dictionnaire
print(artists_dict)
# Afficher une clé et une valeur de l'exemple
print(list(artists_dict.items())[0])  # Affiche la première clé et sa valeur

23515
(np.int64(0), 'Vitamin String Quartet')


In [64]:
import numpy as np

# Exemple avec la matrice interactions existante (en supposant que vous avez déjà la matrice)
# interactions = pd.DataFrame(np.zeros((13665, 23515)))  # Votre matrice d'interactions avec 13665 utilisateurs et 23515 articles

# Indices où vous voulez mettre des valeurs non nulles (interaction)
indices = [10]

# Créer un vecteur avec des 0 partout, puis définir les indices donnés
new_user_interactions = np.zeros(interactions.shape[1])  # Créer un vecteur de zéros de taille égale au nombre d'articles
new_user_interactions[indices] = 1  # Définir les indices comme ayant une interaction (valeur 1, ou autre valeur si nécessaire)

# Trouver l'indice du dernier utilisateur
last_user_index = interactions.index[-1]  # Dernière ligne (dernier utilisateur)
print(f"Last user index: {last_user_index}")

# Modifier les interactions du dernier utilisateur
interactions.loc[last_user_index] = new_user_interactions  # Mettre à jour les interactions du dernier utilisateur

# Afficher la matrice mise à jour (vous pouvez afficher seulement les premières lignes si la matrice est grande)
print(interactions.head())  # Affiche les premières lignes de la matrice mise à jour

Last user index: fff77dadf8528083c920b9c018847e8b
artist_id                         0      1      2      3      4      5      \
user_id                                                                      
00055176fea33f6e027cd3302289378b    0.0    0.0    0.0    0.0    0.0    0.0   
0007f3dd09c91198371454c608d47f22    0.0    0.0    0.0    0.0    0.0    0.0   
000b0f32b5739f052b9d40fcc5c41079    0.0    0.0    0.0    0.0    0.0    0.0   
000c11a16c89aa4b14b328080f5954ee    0.0    0.0    0.0    0.0    0.0    0.0   
00123e0f544dee3ab006aa7f1e5725a7    0.0    0.0    0.0    0.0    1.0   34.0   

artist_id                         6      7      8      9      ...  23505  \
user_id                                                       ...          
00055176fea33f6e027cd3302289378b    0.0    0.0    0.0    0.0  ...    0.0   
0007f3dd09c91198371454c608d47f22    0.0    0.0    0.0    0.0  ...    0.0   
000b0f32b5739f052b9d40fcc5c41079    0.0    0.0    0.0    0.0  ...    0.0   
000c11a16c89aa4b14b3280

In [65]:
interactions

artist_id,0,1,2,3,4,5,6,7,8,9,...,23505,23506,23507,23508,23509,23510,23511,23512,23513,23514
user_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
00055176fea33f6e027cd3302289378b,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
0007f3dd09c91198371454c608d47f22,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
000b0f32b5739f052b9d40fcc5c41079,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
000c11a16c89aa4b14b328080f5954ee,0.0,0.0,0.0,0.0,0.0,0.0,6.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
00123e0f544dee3ab006aa7f1e5725a7,0.0,0.0,0.0,0.0,1.0,34.0,0.0,0.0,165.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
ffe11226cdea81a2db9262c0ec7f5d71,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
ffe32d5412269f3041c58cbf0dde3306,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
fff60baf392613ed33f745b89a9b38f7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
fff616055993498d6127f3f467cf9f2b,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [66]:
rec_list = sample_recommendation_user(model = model, 
                                      interactions = interactions, 
                                      user_id = 'fff77dadf8528083c920b9c018847e8b', 
                                      user_dict = user_dict,
                                      item_dict = artists_dict, 
                                      threshold = 0,
                                      nrec_items = 1,
                                      show = True)

Known Likes:
1- Jamey Aebersold Play-A-Long

 Recommended Items:
1- Panic! At The Disco


In [81]:
from scipy.sparse import vstack, csr_matrix

# Étape 1: Créer le vecteur d'interactions pour le nouvel utilisateur
new_user_interactions = np.zeros(interactions.shape[1])  # Initialise avec 0
new_user_interactions[[22996, 16449, 14397]] = 1  # Mettre 1 pour les artistes aimés

# Étape 2: Ajouter le vecteur à la matrice d'interactions
new_user_id = 'new_user'  # Un ID unique pour le nouvel utilisateur
interactions.loc[new_user_id] = new_user_interactions  # Ajout à la matrice

# Étape 3: Mettre à jour le dictionnaire user_dict
user_dict[new_user_id] = interactions.index.get_loc(new_user_id)  # Associer l'ID à l'indice

# Étape 5: Générer des recommandations pour le nouvel utilisateur
sample_recommendation_user(
    model=model, 
    interactions=interactions, 
    user_id=new_user_id, 
    user_dict=user_dict,
    item_dict=artists_dict, 
    threshold=0,
    nrec_items=1,
    show=True
)

ValueError: The user feature matrix specifies more features than there are estimated feature embeddings: 13665 vs 13666.

In [82]:
from scipy.sparse import csr_matrix, vstack
import numpy as np

# Étape 1: Créer un vecteur d'interactions pour le nouvel utilisateur
new_user_interactions = np.zeros(train.shape[1])  # Train = matrice d'interactions d'entraînement
new_user_interactions[[22996, 16449, 14397]] = 1  # Indices des items connus aimés par l'utilisateur

# Étape 2: Ajouter le vecteur à la matrice d'interactions (optionnel si non nécessaire pour prévision directe)
new_user_id = train.shape[0]  # ID de l'utilisateur est la prochaine ligne
train_extended = vstack([train, csr_matrix(new_user_interactions)])  # Ajouter le vecteur

# Étape 3: Si des matrices de caractéristiques sont utilisées, ajoutez les caractéristiques utilisateur
try:
    # Ajout de caractéristiques utilisateur si nécessaire
    new_user_features = csr_matrix((1, user_features.shape[1]))  # Ligne sparse
    user_features_extended = vstack([user_features, new_user_features])
except NameError:
    # Si user_features n'existe pas, créez une matrice identité
    user_features_extended = csr_matrix(np.eye(train_extended.shape[0]))

# Étape 4: Prédire les scores pour tous les items pour le nouvel utilisateur
item_ids = np.arange(train_extended.shape[1])  # Tous les items
scores = model.predict(
    user_ids=new_user_id,  # Index du nouvel utilisateur
    item_ids=item_ids,  # Tous les items
    user_features=user_features_extended,  # Matrice mise à jour
    item_features=None  # Spécifiez ici si item_features est utilisé
)

# Étape 5: Trier et recommander les items non vus
seen_items = set(np.where(new_user_interactions > 0)[0])  # Items déjà vus
recommended_items = [
    item for item in np.argsort(-scores)  # Trier par scores décroissants
    if item not in seen_items  # Filtrer les items déjà vus
]

# Résultats
top_n = 10  # Nombre de recommandations à afficher
print("Top recommandations pour le nouvel utilisateur :")
for i in range(top_n):
    print(f"Item ID: {recommended_items[i]}, Score: {scores[recommended_items[i]]}")

ValueError: The user feature matrix specifies more features than there are estimated feature embeddings: 13665 vs 13666.