**Milestone 2**


*   Now that we have explored the data, let's apply different algorithms to build recommendation systems.
*   Note: Use the shorter version of the data, i.e., the data after the cutoffs as used in Milestone 1.

**Load the dataset**

In [89]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [90]:
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

from sklearn.metrics.pairwise import cosine_similarity

from collections import defaultdict

from sklearn.metrics import mean_squared_error

!pip install scipy

from sklearn.preprocessing import LabelEncoder

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [91]:
df_final = pd.read_csv('/content/drive/MyDrive/df_final (6).csv')

**Popularity-Based Recommendation Systems**

Let's compute the average play count and the play frequency (the number of user-song interactions) for each song, and build the popularity recommendation system based on the average play count.

In [92]:
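# Average play count per song (select the column first so only numeric data is averaged)
average_count = df_final.groupby('song_id')['play_count'].mean()

# Number of user-song interactions (rows) per song
play_freq = df_final.groupby('song_id')['play_count'].count()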

In [93]:
final_play = pd.DataFrame({'avg_count': average_count, 'play_freq': play_freq})

final_play.head()

song_id,avg_count,play_freq
21,1.622642,265
22,1.492424,132
52,1.729216,421
62,1.72807,114
93,1.452174,115


Now, let's create a function that returns the top n songs ranked by average play count. It also takes a threshold: a song must have more than a minimum number of interactions (play_freq) to be considered for recommendation.

In [94]:
def top_n_songs(data, n, min_interaction = 100):

  recommendations = data[data['play_freq'] > min_interaction]

  recommendations = recommendations.sort_values(by = 'avg_count', ascending = False)

  return recommendations.index[:n]

In [95]:
list(top_n_songs(final_play, 10, 100))

[7224, 6450, 9942, 5531, 5653, 8483, 2220, 657, 614, 352]
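To see why the minimum-interaction cutoff matters, here is a minimal sketch of the same ranking logic on a toy dictionary. The song names and numbers are hypothetical (they are not taken from df_final), and `top_n_by_average` is an illustrative stand-in for `top_n_songs`, not a replacement for it.

```python
# Toy per-song statistics (hypothetical values, not from df_final).
toy_stats = {
    "song_a": {"avg_count": 2.4, "play_freq": 150},
    "song_b": {"avg_count": 3.1, "play_freq": 40},   # high average, few interactions
    "song_c": {"avg_count": 1.9, "play_freq": 300},
    "song_d": {"avg_count": 2.0, "play_freq": 100},  # exactly at the cutoff
}

def top_n_by_average(stats, n, min_interaction=100):
    """Return the n song ids with the highest average play count among
    songs with strictly more than min_interaction interactions."""
    eligible = [s for s, v in stats.items() if v["play_freq"] > min_interaction]
    eligible.sort(key=lambda s: stats[s]["avg_count"], reverse=True)
    return eligible[:n]

toy_top = top_n_by_average(toy_stats, 2)
print(toy_top)
```

In this toy data, song_b has the highest average but is excluded because too few interactions back it up, and song_d is excluded because the comparison is strict (`>`, as in `top_n_songs`).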

**User User Similarity-Based Collaborative Filtering**

To build the user-user-similarity-based and subsequent models we will use the "surprise" library.

In [96]:
!pip install surprise 

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [97]:
from surprise import accuracy

from surprise.reader import Reader

from surprise.dataset import Dataset

from surprise.model_selection import GridSearchCV

from surprise.model_selection import train_test_split

from surprise.prediction_algorithms.knns import KNNBasic

from surprise.prediction_algorithms.matrix_factorization import SVD

from surprise.model_selection import KFold

from surprise import CoClustering

**Some useful functions**

Below is the function to calculate precision@k and recall@k, RMSE and F1_Score@k to evaluate the model performance.

**Think About It:** Which metric should be used for this problem to compare different models?

*   Since several models will be compared, I think the F1 score should be used, because it is the harmonic mean of precision and recall and so balances the two in a single number. Precision and recall are still worth inspecting on their own, especially in the beginning stages, but the F1 score gives a better idea of how each model is performing when comparing models in this assessment.

In [98]:
def precision_recall_at_k(model, k = 30, threshold = 1.5):
    """Print RMSE, precision@k, recall@k and F1@k for a fitted model, evaluated on the global testset"""

    user_est_true = defaultdict(list)
    
    predictions=model.test(testset)
    
    for uid, _, true_r, est, _ in predictions:
        user_est_true[uid].append((est, true_r))

    precisions = dict()
    recalls = dict()
    for uid, user_ratings in user_est_true.items():

        user_ratings.sort(key = lambda x : x[0], reverse = True)

        n_rel = sum((true_r >= threshold) for (_, true_r) in user_ratings)

        n_rec_k = sum((est >= threshold) for (est, _) in user_ratings[ : k])

        n_rel_and_rec_k = sum(((true_r >= threshold) and (est >= threshold))
                              for (est, true_r) in user_ratings[ : k])

        precisions[uid] = n_rel_and_rec_k / n_rec_k if n_rec_k != 0 else 0

        recalls[uid] = n_rel_and_rec_k / n_rel if n_rel != 0 else 0
    
    precision = round((sum(prec for prec in precisions.values()) / len(precisions)), 3)

    recall = round((sum(rec for rec in recalls.values()) / len(recalls)), 3)
    
    accuracy.rmse(predictions)

    print('Precision: ', precision)

    print('Recall: ', recall)

    print('F_1 score: ', round((2 * precision * recall) / (precision + recall), 3))

**Think About It:** In the function precision_recall_at_k above, the threshold value used is 1.5. How are precision and recall affected by changing the threshold? What is the intuition behind using the threshold value of 1.5?

*   I think 1.5 is a fair threshold because it narrows the candidates down so the model can locate the best recommendations for the user. A song counts as relevant when its actual play count is at least the threshold, and as recommended when its predicted play count is at least the threshold, so changing the threshold changes both precision and recall; tightening it could affect them for the better. If precision or recall falls below around 0.6 or so, it could indicate that the recommendation system is not working properly and that the threshold should be changed. A value of 1.5 is a good spot because it sits within the range of the average play counts and reflects the past play-count history of the songs.
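The effect of the threshold can be checked on a single user. The sketch below repeats the per-user logic of `precision_recall_at_k` on a hypothetical list of (estimated, actual) play counts; the numbers and the helper name `precision_recall_for_user` are made up for illustration.

```python
def precision_recall_for_user(user_ratings, k, threshold):
    """Precision@k and recall@k for one user's (estimated, actual) pairs."""
    ranked = sorted(user_ratings, key=lambda x: x[0], reverse=True)
    # Relevant: actual play count at or above the threshold
    n_rel = sum(true_r >= threshold for _, true_r in ranked)
    # Recommended: estimated play count at or above the threshold, within the top k
    n_rec_k = sum(est >= threshold for est, _ in ranked[:k])
    n_rel_and_rec_k = sum(
        true_r >= threshold and est >= threshold for est, true_r in ranked[:k]
    )
    precision = n_rel_and_rec_k / n_rec_k if n_rec_k else 0
    recall = n_rel_and_rec_k / n_rel if n_rel else 0
    return precision, recall

# Hypothetical (estimated, actual) play counts for one user
toy_user = [(2.4, 3), (1.8, 1), (1.6, 2), (1.2, 1), (1.1, 2)]

for threshold in (1.0, 1.5, 2.0):
    p, r = precision_recall_for_user(toy_user, k=5, threshold=threshold)
    print(threshold, round(p, 3), round(r, 3))
```

On this toy user, raising the threshold from 1.5 to 2.0 raises precision but lowers recall, because fewer predictions clear the bar while the same songs remain relevant.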

In [99]:
reader = Reader(rating_scale= (0, 5))

data = Dataset.load_from_df(df_final[['user_id', 'song_id', 'play_count']], reader)

trainset, testset = train_test_split(data, test_size=0.4, random_state = 42)

**Think About It:** How would changing the test size change the results and outputs?

*   I think it would change them, because a larger portion of the data should go toward fitting and training the model. A higher test size takes away a larger amount of the data needed for training, which in turn could affect the outputs and results.

In [100]:
sim_options = {'name': 'msd',
               'user_based': True}

sim_user_user = KNNBasic(sim_options = sim_options, k = 30, random_state = 1, verbose = False)

sim_user_user.fit(trainset)

precision_recall_at_k(sim_user_user)

RMSE: 1.0672
Precision:  0.412
Recall:  0.598
F_1 score:  0.488
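As a quick check, the F_1 score printed above can be reproduced from the reported precision and recall. This is only the arithmetic from the last line of `precision_recall_at_k`, with the arithmetic mean shown for comparison.

```python
# Values copied from the printed output above
precision, recall = 0.412, 0.598

# Harmonic mean of precision and recall
f1 = round(2 * precision * recall / (precision + recall), 3)

# Arithmetic mean, for comparison only
arithmetic_mean = round((precision + recall) / 2, 3)

print(f1, arithmetic_mean)
```

The harmonic mean is pulled toward the lower of the two values, which is why a weak precision holds the F_1 score down even when recall is reasonable.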


**Observations and Insights:**

*   The precision (0.412) looks a little low, which at this juncture may indicate that many of the recommended songs are not very relevant.
*   The recall is right at 0.6 (0.598), which means that about 60% of the relevant songs are being recommended to the user, so the model is at a good spot on this metric.
*   The RMSE is still over 1 and could be improved so that the predicted play counts are closer to the actual ones. The overall F1 score (0.488) is still lower than it should be, so this could also be improved with a better model.




In [101]:
sim_user_user.predict(6958, 1671, r_ui = 2, verbose = True)

user: 6958       item: 1671       r_ui = 2.00   est = 1.63   {'actual_k': 30, 'was_impossible': False}


Prediction(uid=6958, iid=1671, r_ui=2, est=1.6254190211665536, details={'actual_k': 30, 'was_impossible': False})

In [102]:
sim_user_user.predict(6958, 3232, verbose = True)

user: 6958       item: 3232       r_ui = None   est = 1.36   {'actual_k': 30, 'was_impossible': False}


Prediction(uid=6958, iid=3232, r_ui=None, est=1.3609600600037504, details={'actual_k': 30, 'was_impossible': False})

**Observations and Insights:**

*   Comparing the two predictions, the estimated play count for the song the user has listened to (1.63) is noticeably higher than the estimate for the song the same user has not listened to (1.36). With an actual play count of 2 for the listened song, the estimate is not too far off but could be improved.



Now, let's try to tune the model and see if we can improve the model performance.

In [None]:
param_grid = {'k': [10, 20, 30], 'min_k': [3, 6, 9],
              'sim_options': {'name': ["cosine", 'pearson', "pearson_baseline"],
                              'user_based': [True], "min_support": [2, 4]}
              }

gs = GridSearchCV(KNNBasic, param_grid, measures = ['rmse'], cv = 3, n_jobs = -1)

gs.fit(data)

print(gs.best_score['rmse'])

1.0469980258705576


In [None]:
print(gs.best_params['rmse'])

{'k': 30, 'min_k': 9, 'sim_options': {'name': 'pearson_baseline', 'user_based': True, 'min_support': 2}}


In [None]:
sim_options = {'name': 'pearson_baseline',
               'user_based': True}

sim_user_user_optimized = KNNBasic(sim_options = sim_options, k = 30, min_k = 9, random_state = 1, verbose = False)

sim_user_user_optimized.fit(trainset)

precision_recall_at_k(sim_user_user_optimized)

RMSE: 1.0521
Precision:  0.413
Recall:  0.721
F_1 score:  0.525


**Observations and Insights:**

*   The most noticeable change was in recall, which improved from 0.598 to 0.721 after the hyperparameters were tuned. It was already at a good level before tuning, and it showed the most improvement.
*   The RMSE dropped only slightly (1.0672 to 1.0521), so the fit is a little better but should still be improved to make the predictions more accurate. The F1 score also improved (0.488 to 0.525), mainly because of the gain in recall, and it could still be moved closer to 1.
*   Precision stayed essentially the same after tuning (0.412 to 0.413), which limited the improvement in the F1 score.




In [None]:
sim_user_user_optimized.predict(6958, 1671, r_ui = 2, verbose = True)

user: 6958       item: 1671       r_ui = 2.00   est = 1.96   {'actual_k': 24, 'was_impossible': False}


Prediction(uid=6958, iid=1671, r_ui=2, est=1.962926073914969, details={'actual_k': 24, 'was_impossible': False})

In [None]:
sim_user_user_optimized.predict(6958, 3232, verbose = True)

user: 6958       item: 3232       r_ui = None   est = 1.45   {'actual_k': 10, 'was_impossible': False}


Prediction(uid=6958, iid=3232, r_ui=None, est=1.4516261428486725, details={'actual_k': 10, 'was_impossible': False})

**Observations and Insights:**

*   After tuning, the estimate for the song the user has listened to (item 1671) increased substantially, from 1.63 to 1.96, and is now very close to the actual play count of 2. For the song the user has not listened to (item 3232), the estimate also increased, but only from 1.36 to 1.45.
*   The actual_k values were lower than 30 for both predictions after tuning (24 and 10), possibly meaning that fewer similar neighbors were available for them.



**Think About It:** Along with making predictions on listened and unknown songs can we get 5 nearest neighbors (most similar) to a certain song?

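I think the KNN algorithm could be used to find the 5 closest neighbors of a particular song. It calculates the similarity between that song and every other song and returns the 5 most similar ones. Note that for a user-based model such as sim_user_user_optimized, get_neighbors returns the nearest users rather than songs, and both its argument and its results are Surprise inner ids rather than raw ids.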

In [None]:
sim_user_user_optimized.get_neighbors(0, k = 5)

[42, 1131, 17, 186, 249]
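To make the idea of "nearest neighbors of a song" concrete, here is a minimal pure-Python sketch. It assumes an MSD-style similarity of the form 1 / (msd + 1) computed over the users two songs have in common; the play counts, song names and helper functions are hypothetical, and this is not how Surprise implements its similarity matrices internally.

```python
def msd_similarity(a, b):
    """Similarity between two songs given {user: play_count} dicts,
    based on the mean squared difference over their common users."""
    common = [u for u in a if u in b]
    if not common:
        return 0.0
    msd = sum((a[u] - b[u]) ** 2 for u in common) / len(common)
    return 1 / (msd + 1)

def nearest_songs(target, plays, k):
    """Return the k songs most similar to target, most similar first."""
    others = [s for s in plays if s != target]
    others.sort(key=lambda s: msd_similarity(plays[target], plays[s]), reverse=True)
    return others[:k]

# Hypothetical play counts: song -> {user: play_count}
toy_plays = {
    "s1": {"u1": 1, "u2": 2, "u3": 5},
    "s2": {"u1": 1, "u2": 2, "u3": 4},
    "s3": {"u1": 5, "u2": 1},
    "s4": {"u3": 5},
}

toy_neighbors = nearest_songs("s1", toy_plays, k=2)
print(toy_neighbors)
```

Song s4 ranks first even though it shares only one user with s1, which is the kind of case a minimum-support setting is meant to guard against.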

Below we will be implementing a function where the input parameters are:

**data:** A song dataset

**user_id:** A user-id against which we want the recommendations

**top_n:** The number of songs we want to recommend

**algo:** The algorithm we want to use for predicting the play_count

The output of the function is a set of top_n items recommended for the given user_id based on the given algorithm

In [None]:
def get_recommendations(data, user_id, top_n, algo):
    
    recommendations = []
    
    user_item_interactions_matrix = data.pivot_table(index = 'user_id', columns = 'song_id', values = 'play_count')
    
    non_interacted_products = user_item_interactions_matrix.loc[user_id][user_item_interactions_matrix.loc[user_id].isnull()].index.tolist()
    
    for item_id in non_interacted_products:
        
        est = algo.predict(user_id, item_id).est
        
        recommendations.append((item_id, est))

    recommendations.sort(key = lambda x : x[1], reverse = True)

    return recommendations[:top_n]

In [None]:
recommendations = get_recommendations(df_final, 6958, 5, sim_user_user_optimized)

In [None]:
pd.DataFrame(recommendations, columns = ['song_id', 'predicted_ratings'])

,song_id,predicted_ratings
0,5531,2.553335
1,317,2.518269
2,4954,2.406776
3,8635,2.396606
4,5943,2.390723


**Observations and Insights:**

*   The top 5 recommended songs all have predicted play counts above 2, well above the relevance threshold of 1.5 used in precision_recall_at_k. I believe this is a good sign that the function is making strong predictions and has provided highly rated song recommendations to the user.



**Correcting the play_counts and Ranking the above songs**

In [None]:
def ranking_songs(recommendations, final_rating):
  ranked_songs = final_rating.loc[[items[0] for items in recommendations]].sort_values('play_freq', ascending = False)[['play_freq']].reset_index()

  ranked_songs = ranked_songs.merge(pd.DataFrame(recommendations, columns = ['song_id', 'predicted_ratings']), on = 'song_id', how = 'inner')

  ranked_songs['corrected_ratings'] = ranked_songs['predicted_ratings'] - 1 / np.sqrt(ranked_songs['play_freq'])

  ranked_songs = ranked_songs.sort_values('corrected_ratings', ascending = False)
  
  return ranked_songs

**Think About It:** In the above function to correct the predicted play_count a quantity 1/np.sqrt(n) is subtracted. What is the intuition behind it? Is it also possible to add this quantity instead of subtracting?

*   I think the quantity is subtracted to make the predicted play counts more conservative and realistic. Because 1/np.sqrt(n) shrinks as the play frequency n grows, songs with few interactions are penalized more than songs with many, so less reliable predictions are pushed down the ranking. Adding the quantity instead would boost the songs with the fewest interactions, which would not help find the best recommendations, so subtracting is the better choice here.
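The size of the correction can be worked out by hand. The sketch below applies the same formula as `ranking_songs` to two (predicted rating, play frequency) pairs taken from this notebook's ranking output for the user-user model; `corrected_rating` is an illustrative helper name.

```python
import math

def corrected_rating(predicted, play_freq):
    """Subtract a penalty of 1 / sqrt(play_freq) from the predicted rating."""
    return predicted - 1 / math.sqrt(play_freq)

# (predicted rating, play frequency) for songs 5531 and 4954
corrected_5531 = corrected_rating(2.553335, 618)
corrected_4954 = corrected_rating(2.406776, 183)
print(round(corrected_5531, 6), round(corrected_4954, 6))

# The penalty itself shrinks as the play frequency grows
penalty_100 = 1 / math.sqrt(100)
penalty_10000 = 1 / math.sqrt(10000)
print(penalty_100, penalty_10000)
```

A song with 100 interactions loses 0.1 from its predicted rating, while a song with 10,000 interactions loses only 0.01.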

In [None]:
ranking_songs(recommendations, final_play)

,song_id,play_freq,predicted_ratings,corrected_ratings
0,5531,618,2.553335,2.513109
2,317,411,2.518269,2.468943
1,5943,423,2.390723,2.342101
3,4954,183,2.406776,2.332854
4,8635,155,2.396606,2.316284


**Observations and Insights:**

*   The correction changes the ranking. Ordered by predicted rating, the songs are 5531, 317, 4954, 8635, 5943; ordered by corrected rating, they are 5531, 317, 5943, 4954, 8635. Song 5943 moves from fifth to third because its play frequency (423) is much higher than that of songs 4954 (183) and 8635 (155), so it receives a smaller penalty. The two songs with the highest ratings (5531 and 317) also have high play frequencies. The impact of the user-item interactions can't be understated: play frequency makes a prediction more dependable, since estimates backed by more interactions are adjusted less.

**Item Item Similarity-based collaborative filtering recommendation systems**

In [None]:
sim_options = {'name': 'cosine',
               'user_based': False}

sim_item_item = KNNBasic(sim_options = sim_options, random_state = 1, verbose = False)

sim_item_item.fit(trainset)

precision_recall_at_k(sim_item_item)

RMSE: 1.0394
Precision:  0.307
Recall:  0.562
F_1 score:  0.397


**Observations and Insights:**

*   The precision (0.307) is quite low, which keeps the F1 score (0.397) much lower than it should be. This suggests that many of the recommended songs are not relevant at this point. The recall (0.562) is a bit lower than the target of around 0.6, so improvement is needed here as well.
*   The RMSE is over 1, so the predicted play counts are still some distance from the actual ones, and this is another area that can be improved.



In [None]:
sim_item_item.predict(6958, 1671, r_ui = 2, verbose = True)

user: 6958       item: 1671       r_ui = 2.00   est = 1.36   {'actual_k': 20, 'was_impossible': False}


Prediction(uid=6958, iid=1671, r_ui=2, est=1.3614157231762556, details={'actual_k': 20, 'was_impossible': False})

In [None]:
sim_item_item.predict(3232, 1671, verbose = True)

user: 3232       item: 1671       r_ui = None   est = 1.70   {'was_impossible': True, 'reason': 'User and/or item is unknown.'}


Prediction(uid=3232, iid=1671, r_ui=None, est=1.6989607635206787, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'})

**Observations and Insights:**

*   The estimate of 1.36 for the song the user has listened to is well below the actual play count of 2, with 20 neighbors used (actual_k = 20). On this prediction alone, the song would not look like a strong recommendation.
*   The second call passes 3232 as the user ID and 1671 as the song ID, and the output reports was_impossible: True with the reason 'User and/or item is unknown.' The 1.70 shown is therefore Surprise's fallback estimate (the mean of the training ratings) rather than a neighbor-based prediction, so it cannot be compared with the 1.36 above. To predict song 3232 for user 6958, the call would be predict(6958, 3232).



In [None]:
param_grid = {'k': [10, 20, 30], 'min_k': [3, 6, 9],
              'sim_options': {'name': ["cosine", 'pearson', "pearson_baseline"],
                              'user_based': [False], "min_support": [2, 4]}
              }

gs = GridSearchCV(KNNBasic, param_grid, measures = ['rmse'], cv = 3, n_jobs = -1)

gs.fit(data)

print(gs.best_score['rmse'])


1.0233479073264584


In [None]:
print(gs.best_params['rmse'])

{'k': 30, 'min_k': 6, 'sim_options': {'name': 'pearson_baseline', 'user_based': False, 'min_support': 2}}


**Think About It:** How do the parameters affect the performance of the model? Can we improve the performance of the model further? Check the list of hyperparameters here.

*   The grid search is vital because it tries every combination of the hyperparameter values listed in the parameter grid, cross-validates each one, and reports the combination with the best score. Adjusting the values in the parameter grid could be one way of making the model better, depending on how the final output looks.
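To get a feel for how much work the grid search does, the sketch below enumerates the combinations in a flattened copy of the item-item parameter grid. It assumes the nested sim_options lists are expanded like the top-level ones; `flat_grid` is an illustrative structure, not the object passed to GridSearchCV.

```python
from itertools import product

# Flattened copy of the item-item parameter grid used above
flat_grid = {
    "k": [10, 20, 30],
    "min_k": [3, 6, 9],
    "name": ["cosine", "pearson", "pearson_baseline"],
    "user_based": [False],
    "min_support": [2, 4],
}

# Every combination of one value per hyperparameter
combinations = [dict(zip(flat_grid, values)) for values in product(*flat_grid.values())]

cv_folds = 3
n_fits = len(combinations) * cv_folds
print(len(combinations), n_fits)
print(combinations[0])
```

Each added value multiplies the number of combinations, so widening the grid quickly increases the run time.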

In [None]:
sim_options = {'name': 'msd',
               'user_based': False}

sim_item_item_optimized = KNNBasic(sim_options = sim_options, k = 30, min_k = 6, random_state = 1, verbose = False)

sim_item_item_optimized.fit(trainset)

precision_recall_at_k(sim_item_item_optimized)

RMSE: 1.0423
Precision:  0.34
Recall:  0.563
F_1 score:  0.424


**Observations and Insights:**

*   The tuned hyperparameters did not change the metrics much. The RMSE even regressed slightly (1.0394 to 1.0423). Precision improved the most (0.307 to 0.34), but not significantly, so the F1 score rose only slightly (0.397 to 0.424) because recall stayed roughly the same.
*   Overall the metrics are still well below par, so the tuning did not have much of an effect. Note that the model above uses msd similarity, while the grid search selected pearson_baseline with min_support 2; refitting with those parameters, or revising the parameter grid, could help these values improve.



In [None]:
sim_item_item_optimized.predict(6958, 1671, r_ui = 2, verbose = True)

user: 6958       item: 1671       r_ui = 2.00   est = 1.33   {'actual_k': 20, 'was_impossible': False}


Prediction(uid=6958, iid=1671, r_ui=2, est=1.3319659681769878, details={'actual_k': 20, 'was_impossible': False})

In [None]:
sim_item_item_optimized.predict(6958, 3232, verbose = True)

user: 6958       item: 3232       r_ui = None   est = 1.47   {'actual_k': 20, 'was_impossible': False}


Prediction(uid=6958, iid=3232, r_ui=None, est=1.4746440368585654, details={'actual_k': 20, 'was_impossible': False})

**Observations and Insights:**

*   After optimizing the model, the estimate for the song the user has listened to (item 1671) went down slightly, from 1.36 to 1.33, moving farther from the actual play count of 2. This is on par with the other outputs after tuning, which showed little impact and even signs of regression in some areas. The estimate for the song the user has not listened to (item 3232) is 1.47; the earlier value of 1.70 came from a call flagged was_impossible, so the two are not directly comparable. Both predictions used actual k values of 20.

In [None]:
sim_item_item_optimized.get_neighbors(0, k = 5)

[3, 10, 24, 30, 36]

In [None]:
recommendations = get_recommendations(df_final, 6958, 5, sim_item_item_optimized)

In [None]:
pd.DataFrame(recommendations, columns = ['song_id', 'predicted_play_count'])

,song_id,predicted_play_count
0,9942,2.048219
1,2842,1.949116
2,3050,1.885211
3,4939,1.77119
4,1691,1.743513


In [None]:
ranking_songs(recommendations, final_play)

,song_id,play_freq,predicted_ratings,corrected_ratings
3,9942,150,2.048219,1.96657
2,2842,232,1.949116,1.883463
1,3050,233,1.885211,1.819698
4,4939,133,1.77119,1.684479
0,1691,249,1.743513,1.68014


**Observations and Insights:**

*   The model recommended 5 songs with predicted play counts between about 1.74 and 2.05, all above the 1.5 relevance threshold. The play frequencies (133 to 249) are high enough to make these fairly reliable predictions, and the corrected ratings are all close to the predicted ratings. This suggests the item-item model is also a reasonable choice for recommending songs based on the closest neighbors.

**Model Based Collaborative Filtering - Matrix Factorization**

Model-based collaborative filtering is a personalized recommendation system: the recommendations are based on the past behavior of the user and do not depend on any additional information. We use latent features to find recommendations for each user.
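As a rough illustration of how latent features turn into a prediction, the sketch below uses the usual biased matrix-factorization form: global mean plus user bias plus item bias plus the dot product of the user and item factor vectors. All numbers are hypothetical, and `predict_play_count` is an illustrative helper, not a call into Surprise.

```python
def predict_play_count(global_mean, user_bias, item_bias, user_factors, item_factors,
                       rating_scale=(0, 5)):
    """Estimate = mean + biases + dot(user factors, item factors), clipped to the scale."""
    interaction = sum(p * q for p, q in zip(user_factors, item_factors))
    estimate = global_mean + user_bias + item_bias + interaction
    low, high = rating_scale
    return min(high, max(low, estimate))

# Hypothetical learned parameters for one user and one song
toy_estimate = predict_play_count(
    global_mean=1.7,
    user_bias=0.1,        # this user plays songs slightly more than average
    item_bias=-0.2,       # this song is played slightly less than average
    user_factors=[0.5, -0.3],
    item_factors=[0.4, 0.2],
)
print(round(toy_estimate, 2))
```

Training adjusts the biases and factor vectors to reduce the error on known play counts, which is what n_epochs, lr_all and reg_all control in the tuning step.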

In [None]:
svd = SVD(random_state = 1)

svd.fit(trainset)

precision_recall_at_k(svd)

RMSE: 1.0252
Precision:  0.41
Recall:  0.633
F_1 score:  0.498


In [None]:
svd.predict(6958, 1671, r_ui = 2, verbose = True)

user: 6958       item: 1671       r_ui = 2.00   est = 1.27   {'was_impossible': False}


Prediction(uid=6958, iid=1671, r_ui=2, est=1.267473397214638, details={'was_impossible': False})

In [None]:
svd.predict(6958, 3232, verbose = True)

user: 6958       item: 3232       r_ui = None   est = 1.56   {'was_impossible': False}


Prediction(uid=6958, iid=3232, r_ui=None, est=1.5561675084403663, details={'was_impossible': False})

**Improving matrix factorization based recommendation system by tuning its hyperparameters**

In [None]:
param_grid = {'n_epochs': [10, 20, 30], 'lr_all': [0.001, 0.005, 0.01],
              'reg_all': [0.2, 0.4, 0.6]}

gs = GridSearchCV(SVD, param_grid, measures = ['rmse'], cv = 3, n_jobs = -1)

gs.fit(data)

print(gs.best_score['rmse'])

1.0126517175612602


In [None]:
print(gs.best_params['rmse'])

{'n_epochs': 30, 'lr_all': 0.01, 'reg_all': 0.2}


**Think About It:** How do the parameters affect the performance of the model? Can we improve the performance of the model further? Check the available hyperparameters here.

*   With the parameters selected by the grid search, the metrics improve in most areas, though only modestly. The F1 score has improved but is still below where it should be, while the RMSE has been lowered only slightly. The only noticeable drop is in the estimate for the song the user has not listened to (1.56 to 1.44). None of the selected values match the default options, so comparing against the default model is a useful way to see how the tuning has affected the outputs.



In [None]:
svd_optimized = SVD(n_epochs = 30, lr_all = 0.01, reg_all = 0.2, random_state = 1)

svd_optimized = svd_optimized.fit(trainset)

precision_recall_at_k(svd_optimized)

RMSE: 1.0141
Precision:  0.415
Recall:  0.635
F_1 score:  0.502


**Observations and Insights:**

*   Using the best parameters from the grid search improved all four metrics slightly (RMSE 1.0252 to 1.0141, precision 0.41 to 0.415, recall 0.633 to 0.635, F1 0.498 to 0.502). The default options did not perform better, so of the combinations tried, this looks like the best SVD model for future recommendations. The F1 score is still short of 0.6, but the small gain makes this a slightly more reliable model.



In [None]:
svd_optimized.predict(6958, 1671, r_ui = 2, verbose = True)

user: 6958       item: 1671       r_ui = 2.00   est = 1.34   {'was_impossible': False}


Prediction(uid=6958, iid=1671, r_ui=2, est=1.3432395286125096, details={'was_impossible': False})

In [None]:
svd_optimized.predict(6958, 3232, verbose = True)

user: 6958       item: 3232       r_ui = None   est = 1.44   {'was_impossible': False}


Prediction(uid=6958, iid=3232, r_ui=None, est=1.442548446117648, details={'was_impossible': False})

**Observations and Insights:**

*   For the song the user has listened to, the optimized hyperparameters raised the estimate by a small amount (1.27 to 1.34), still well below the actual play count of 2. For the song the user has not listened to, the estimate decreased after tuning (1.56 to 1.44). I do wonder whether the latent features are driving these changes after tuning, or whether the drop is more of an aberration.





In [None]:
svd_recommendations = get_recommendations(df_final, 6958, 5, svd_optimized)

In [None]:
ranking_songs(svd_recommendations, final_play)

,song_id,play_freq,predicted_ratings,corrected_ratings
2,7224,107,2.601899,2.505225
1,5653,108,2.108728,2.012502
4,8324,96,2.014091,1.912029
0,9942,150,1.940115,1.858465
3,6450,102,1.952493,1.853478


**Observations and Insights:**

*   I think the tuned hyperparameters have changed the model for the better, with the top 5 song recommendations having high ratings in both columns. The predicted ratings range from 1.94 to 2.60, so every song is above or just below a play count of 2. The difference between the predicted and corrected ratings is small for every song (roughly 0.08 to 0.10). The play frequencies (96 to 150) are high enough that this recommendation model looks stronger.



**Cluster Based Recommendation System**

In clustering-based recommendation systems, we explore the similarities and differences in people's tastes in songs based on how they rate different songs. We cluster similar users together and recommend songs to a user based on play_counts from other users in the same cluster.
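As a rough sketch of how a co-clustering prediction is formed, the example below uses the form described for this family of models: the average of the (user cluster, item cluster) co-cluster, adjusted by how far the user's mean sits from the user cluster's average and how far the item's mean sits from the item cluster's average. The numbers and the helper name are hypothetical, and this should be read as an approximation of the idea rather than Surprise's exact implementation.

```python
def cocluster_estimate(cocluster_avg, user_mean, user_cluster_avg,
                       item_mean, item_cluster_avg):
    """Co-cluster average plus user and item deviations from their cluster averages."""
    return cocluster_avg + (user_mean - user_cluster_avg) + (item_mean - item_cluster_avg)

# Hypothetical averages for one user-song pair
toy_cluster_estimate = cocluster_estimate(
    cocluster_avg=1.8,      # mean play count in the user's and song's co-cluster
    user_mean=2.2,          # this user's mean play count
    user_cluster_avg=1.9,   # mean play count across the user's cluster
    item_mean=1.4,          # this song's mean play count
    item_cluster_avg=1.6,   # mean play count across the song's cluster
)
print(round(toy_cluster_estimate, 2))
```

Here the user plays more than their cluster on average (+0.3) and the song is played less than its cluster on average (-0.2), so the estimate lands slightly above the co-cluster average.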

In [None]:
clust_baseline = CoClustering(random_state = 1)

clust_baseline.fit(trainset)

precision_recall_at_k(clust_baseline)

RMSE: 1.0487
Precision:  0.397
Recall:  0.582
F_1 score:  0.472


In [None]:
clust_baseline.predict(6958, 1671, r_ui = 2, verbose = True)

user: 6958       item: 1671       r_ui = 2.00   est = 1.29   {'was_impossible': False}


Prediction(uid=6958, iid=1671, r_ui=2, est=1.2941824757363074, details={'was_impossible': False})

In [None]:
clust_baseline.predict(6958, 3232, verbose = True)

user: 6958       item: 3232       r_ui = None   est = 1.48   {'was_impossible': False}


Prediction(uid=6958, iid=3232, r_ui=None, est=1.4785259100797417, details={'was_impossible': False})

**Improving clustering-based recommendation system by tuning its hyper-parameters**

In [None]:
param_grid = {'n_cltr_u': [5, 6, 7, 8], 'n_cltr_i': [5, 6, 7, 8], 'n_epochs': [10, 20, 30]}

gs = GridSearchCV(CoClustering, param_grid, measures = ['rmse'], cv = 3, n_jobs = -1)

gs.fit(data)

print(gs.best_score['rmse'])

1.060435951447511


In [None]:
print(gs.best_params['rmse'])

{'n_cltr_u': 5, 'n_cltr_i': 5, 'n_epochs': 10}


**Think About It:** How do the parameters affect the performance of the model? Can we improve the performance of the model further? Check the available hyperparameters here.

*   I think that having fewer clusters for the items and users could benefit the recommendation system, since it makes it easier to find similar users and recommend items. I wonder whether having too many clusters would make things too complicated for the recommendation system and require too many iterations of the optimization loop.



In [None]:
clust_tuned = CoClustering(n_cltr_u = 3, n_cltr_i = 3, n_epochs = 30, random_state = 1)

clust_tuned.fit(trainset)

precision_recall_at_k(clust_tuned)

RMSE: 1.0487
Precision:  0.397
Recall:  0.582
F_1 score:  0.472


**Observations and Insights:**

*   With the number of user and item clusters set to the default value of 3, the metrics are identical to those of the baseline model: the F1 score (0.472) is below average, the recall (0.582) is close to average, and the precision (0.397) is relatively low. The RMSE (1.0487) is a bit high but not far off from the other models that have been implemented. Note that these are not the values the grid search selected (5 user clusters, 5 item clusters, 10 epochs), so I wonder if more clusters would help these scores at all.



In [None]:
clust_tuned.predict(6958, 1671, r_ui = 2, verbose = True)

user: 6958       item: 1671       r_ui = 2.00   est = 1.29   {'was_impossible': False}


Prediction(uid=6958, iid=1671, r_ui=2, est=1.2941824757363074, details={'was_impossible': False})

In [None]:
clust_tuned.predict(6958, 3232, verbose = True)

user: 6958       item: 3232       r_ui = None   est = 1.48   {'was_impossible': False}


Prediction(uid=6958, iid=3232, r_ui=None, est=1.4785259100797417, details={'was_impossible': False})

**Observations and Insights:**

*   Both estimates are below the actual play count of 2 and are identical to the estimates from the baseline clustering model. The estimate for the song the user has not listened to (1.48) is higher than the estimate for the song the user has listened to (1.29). This is not too much of a surprise, with the overall metrics being below average as well. There is definitely room for improvement in the tuning of the clustering hyperparameters.



**Implementing the recommendation algorithm based on optimized CoClustering model**

In [None]:
def get_recommendations(data, user_id, top_n, algo):

    recommendations = []
    
    user_item_interactions_matrix = data.pivot(index = 'user_id', columns = 'song_id', values = 'play_count')
    
    non_interacted_products = user_item_interactions_matrix.loc[user_id][user_item_interactions_matrix.loc[user_id].isnull()].index.tolist()
    
    for item_id in non_interacted_products:
        
        est = algo.predict(user_id, item_id).est
        
        recommendations.append((item_id, est))

    recommendations.sort(key = lambda x: x[1], reverse = True)

    return recommendations[:top_n]

In [None]:
clustering_recommendations = get_recommendations(df_final, 6958, 5, clust_tuned)

**Correcting the play_count and Ranking the above songs**

In [None]:
def ranking_songs(recommendations, final_rating):
  
    # Use the final_rating argument rather than the global final_play
    ranked_songs = final_rating.loc[[items[0] for items in recommendations]].sort_values('play_freq', ascending = False)[['play_freq']].reset_index()

    ranked_songs = ranked_songs.merge(pd.DataFrame(recommendations, columns = ['song_id', 'predicted_ratings']), on = 'song_id', how = 'inner')

    ranked_songs['corrected_ratings'] = ranked_songs['predicted_ratings'] - 1 / np.sqrt(ranked_songs['play_freq'])

    ranked_songs = ranked_songs.sort_values('corrected_ratings', ascending = False)

    return ranked_songs

In [None]:
ranking_songs(clustering_recommendations, final_play)

,song_id,play_freq,predicted_ratings,corrected_ratings
2,7224,107,3.094797,2.998124
4,8324,96,2.311498,2.209436
1,9942,150,2.215039,2.13339
0,5531,618,2.124563,2.084337
3,4831,97,2.123783,2.022248
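
As a quick check of the correction used in `ranking_songs`, the sketch below recomputes `corrected_ratings` by hand for two rows of the table above. The input numbers are copied from the displayed (rounded) output, so the result is expected to match only to about five decimal places.

```python
import numpy as np

# Values copied from the table above
predicted_7224, freq_7224 = 3.094797, 107
predicted_5531, freq_5531 = 2.124563, 618

# The penalty 1 / sqrt(play_freq) shrinks as a song is played more often
penalty_7224 = 1 / np.sqrt(freq_7224)
penalty_5531 = 1 / np.sqrt(freq_5531)

corrected_7224 = predicted_7224 - penalty_7224
corrected_5531 = predicted_5531 - penalty_5531

print(penalty_7224, corrected_7224)
print(penalty_5531, corrected_5531)
```

The song with 618 plays is penalized less than the song with 107 plays, so the correction favors songs whose predictions are backed by more interactions.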


**Observations and Insights:**

*   After extracting the songs that the user has not interacted with, the predicted ratings for the recommended songs are even higher than those from the previous models. This raises the question of whether the clustering method is a better way to find songs for users in this recommendation system. The top 5 recommended songs also have high play frequencies, so I feel more comfortable using this model than the other models tried so far.



**Content Based Recommendation Systems**

**Think About It:** So far we have only used the play_count of songs to find recommendations but we have other information/features on songs as well. Can we take those song features into account?

*   Yes, other attributes of these songs can be used to build a stronger recommendation system. If we know the genre of a song or the year of its album release, that can help identify a user's interests. This in turn can help find similar users and recommend songs that the user is more likely to listen to. The main limitation is that some of these features may be missing for some songs.



In [None]:
# Work on a copy so that adding columns below does not modify df_final
df_small = df_final.copy()

In [None]:
df_small['text'] = df_small['title'] + ' ' + df_small['release'] + ' ' + df_small['artist_name']

df_small.head()

Unnamed: 0.1,Unnamed: 0,user_id,song_id,play_count,title,release,artist_name,year,text
0,200,6958,447,1,Daisy And Prudence,Distillation,Erin McKeown,2000,Daisy And Prudence Distillation Erin McKeown
1,202,6958,512,1,The Ballad of Michael Valentine,Sawdust,The Killers,2004,The Ballad of Michael Valentine Sawdust The Ki...
2,203,6958,549,1,I Stand Corrected (Album),Vampire Weekend,Vampire Weekend,2007,I Stand Corrected (Album) Vampire Weekend Vamp...
3,204,6958,703,1,They Might Follow You,Tiny Vipers,Tiny Vipers,2007,They Might Follow You Tiny Vipers Tiny Vipers
4,205,6958,719,1,Monkey Man,You Know I'm No Good,Amy Winehouse,2007,Monkey Man You Know I'm No Good Amy Winehouse


In [None]:
df_small = df_small[['user_id', 'song_id', 'play_count', 'title', 'text']]

df_small = df_small.drop_duplicates(subset = ['title'])

df_small = df_small.set_index('title')

df_small.head(5)

Unnamed: 0_level_0,user_id,song_id,play_count,text
title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Daisy And Prudence,6958,447,1,Daisy And Prudence Distillation Erin McKeown
The Ballad of Michael Valentine,6958,512,1,The Ballad of Michael Valentine Sawdust The Ki...
I Stand Corrected (Album),6958,549,1,I Stand Corrected (Album) Vampire Weekend Vamp...
They Might Follow You,6958,703,1,They Might Follow You Tiny Vipers Tiny Vipers
Monkey Man,6958,719,1,Monkey Man You Know I'm No Good Amy Winehouse


In [None]:
df_small.shape

(561, 4)

In [None]:
indices = pd.Series(df_small.index)

indices[ : 5]

0                 Daisy And Prudence
1    The Ballad of Michael Valentine
2          I Stand Corrected (Album)
3              They Might Follow You
4                         Monkey Man
Name: title, dtype: object

In [None]:
import nltk

nltk.download("punkt")

nltk.download("stopwords")
 
nltk.download("wordnet")

import re

from nltk import word_tokenize

from nltk.stem import WordNetLemmatizer

from nltk.corpus import stopwords

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

nltk.download('omw-1.4')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


True

**We will create a function to pre-process the text data:**

In [None]:
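def tokenize(text):

    # Replace non-letter characters with spaces and lowercase the text
    text = re.sub(r"[^a-zA-Z]", " ", text.lower())

    tokens = word_tokenize(text)

    # Build the stop-word set once per call instead of once per token
    stop_words = set(stopwords.words("english"))

    words = [word for word in tokens if word not in stop_words]

    # Reuse a single lemmatizer for all tokens
    lemmatizer = WordNetLemmatizer()

    text_lems = [lemmatizer.lemmatize(lem).strip() for lem in words]

    return text_lems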

In [None]:
tfidf = TfidfVectorizer(tokenizer = tokenize)

song_tfidf = tfidf.fit_transform(df_small['text'].values).toarray()

In [None]:
pd.DataFrame(song_tfidf)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,1427,1428,1429,1430,1431,1432,1433,1434,1435,1436
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
557,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
558,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
559,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [None]:
similar_songs = cosine_similarity(song_tfidf, song_tfidf)

similar_songs

array([[1., 0., 0., ..., 0., 0., 0.],
       [0., 1., 0., ..., 0., 0., 0.],
       [0., 0., 1., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 0., ..., 0., 1., 0.],
       [0., 0., 0., ..., 0., 0., 1.]])
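
To make the similarity matrix easier to read, here is a minimal sketch on three toy strings. The strings are illustrative only, and the sketch uses the default `TfidfVectorizer` tokenizer rather than the `tokenize` function above, so the exact values would differ from the notebook's.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy "text" values: title plus artist, lowercased (illustrative only)
toy_texts = [
    "learn to fly foo fighters",
    "everlong foo fighters",
    "monkey man amy winehouse",
]

# Each row is a song, each column a token weight
toy_tfidf = TfidfVectorizer().fit_transform(toy_texts)

# Pairwise cosine similarity between all songs
toy_sim = cosine_similarity(toy_tfidf)

print(toy_sim.round(3))
```

The first two strings share the tokens "foo" and "fighters", so their similarity is positive. The third string shares no tokens with the others, so its off-diagonal entries are zero, which is why most entries in the matrix above are 0.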

Finally, let's create a function to find the most similar songs to recommend for a given song.

In [None]:
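def recommendations(title, similar_songs):

    recommended_songs = []

    # Row position of the query song in the similarity matrix
    idx = indices[indices == title].index[0]

    # Similarity scores against every song, highest first
    score_series = pd.Series(similar_songs[idx]).sort_values(ascending = False)

    # Skip the first entry (expected to be the query song itself) and take the next 10
    top_10_indexes = list(score_series.iloc[1 : 11].index)
    print(top_10_indexes)

    # Map row positions back to song titles
    for i in top_10_indexes:
        recommended_songs.append(df_small.index[i])

    return recommended_songs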

Recommending 10 songs similar to 'Learn To Fly':

In [None]:
recommendations('Learn To Fly', similar_songs)

[509, 234, 423, 345, 394, 370, 371, 372, 373, 375]


['Everlong',
 'The Pretender',
 'Nothing Better (Album)',
 'From Left To Right',
 'Lifespan Of A Fly',
 'Under The Gun',
 'I Need A Dollar',
 'Feel The Love',
 'All The Pretty Faces',
 'Bones']

**Observations and Insights:**

*   The first two songs in the recommendation list are from the same band, and one recommended song ('Lifespan Of A Fly') shares the word 'fly' with the song we are looking up recommendations for. That is a pretty solid indication that the text feature extraction worked to find similar songs. Many songs in the dataset appear to share similar text features, so this approach could support a more in-depth recommendation system for finding similar songs for a user.
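
One thing that may be worth checking: the last indexes printed above (370, 371, 372, 373, 375) are nearly consecutive, which could mean those songs are tied, possibly at a similarity of zero. The scores are not shown, so this is only a guess. The sketch below is a hypothetical variant, `recommend_with_scores`, that returns the scores alongside the titles and drops songs at or below a minimum score. It assumes unique titles and a default integer index, and it runs on a made-up similarity matrix.

```python
import numpy as np
import pandas as pd

def recommend_with_scores(title, titles, sim_matrix, top_n=10, min_score=0.0):
    # titles: pd.Series of unique song titles aligned with sim_matrix rows
    idx = titles[titles == title].index[0]
    scores = pd.Series(sim_matrix[idx], index=titles.values)
    scores = scores.drop(title)            # remove the query song itself
    scores = scores[scores > min_score]    # remove songs with no shared tokens
    return scores.sort_values(ascending=False).head(top_n)

# Made-up titles and similarities, for illustration only
toy_titles = pd.Series(["A", "B", "C", "D"])
toy_matrix = np.array([
    [1.0, 0.6, 0.0, 0.2],
    [0.6, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.2, 0.0, 0.0, 1.0],
])

toy_result = recommend_with_scores("A", toy_titles, toy_matrix)
print(toy_result)
```

With this variant, song "C" is not recommended for "A" because the two share nothing, so a short list signals that few genuinely similar songs exist.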

