## 4. Modeling

In this notebook, I will train machine learning models to power the recommender system.  
I will be using the [scikit-surprise library](https://surprise.readthedocs.io/en/stable/) to build the recommender system.

In [1]:
!pip install surprise

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting surprise
  Downloading surprise-0.1-py2.py3-none-any.whl (1.8 kB)
Collecting scikit-surprise
  Downloading scikit-surprise-1.1.1.tar.gz (11.8 MB)
     |████████████████████████████████| 11.8 MB 4.7 MB/s
Building wheels for collected packages: scikit-surprise
  Building wheel for scikit-surprise (setup.py) ... done
  Created wheel for scikit-surprise: filename=scikit_surprise-1.1.1-cp37-cp37m-linux_x86_64.whl size=1633714 sha256=82e5a91a970509f467283a95e8d281c920f87fb7de1071a9a4ce9a2e212ae1f1
  Stored in directory: /root/.cache/pip/wheels/76/44/74/b498c42be47b2406bd27994e16c5188e337c657025ab400c1c
Successfully built scikit-surprise
Installing collected packages: scikit-surprise, surprise
Successfully installed scikit-surprise-1.1.1 surprise-0.1


In [2]:
from collections import defaultdict
from random import sample, choice

import pandas as pd
from tqdm import tqdm
from surprise import BaselineOnly, CoClustering, Dataset, KNNBasic, Reader
from surprise.model_selection import KFold, cross_validate

In [3]:
reading_no_zero_df = pd.read_csv('reading_no_zero.csv')

### Machine Learning models in scikit-surprise

The explanation for each of the models used is taken from the [scikit-surprise documentation](https://surprise.readthedocs.io/en/stable/prediction_algorithms_package.html).

- Baseline Only: Algorithm predicting the baseline estimate for given user and item.
- KNN Basic: A basic collaborative filtering algorithm.
- Co-clustering: A collaborative filtering algorithm based on co-clustering.
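
To make the first bullet more concrete: the scikit-surprise documentation defines the baseline estimate as the global mean rating plus a user bias and an item bias, b_ui = μ + b_u + b_i. The sketch below is a simplified illustration, not the library's implementation. It estimates each bias as a plain average deviation from the global mean, whereas `BaselineOnly` fits regularized biases with ALS or SGD. The toy `ratings` list and the helper name `baseline_estimate` are made up for this example.

```python
from collections import defaultdict

# Toy (user, item, rating) triples; hypothetical data for illustration only
ratings = [
    ("u1", "a", 8), ("u1", "b", 6),
    ("u2", "a", 10), ("u2", "c", 8),
    ("u3", "b", 4), ("u3", "c", 3),
]

mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating

# Simplified biases: average deviation from the global mean.
# (BaselineOnly fits regularized biases instead; this is only a sketch.)
user_dev = defaultdict(list)
item_dev = defaultdict(list)
for user, item, r in ratings:
    user_dev[user].append(r - mu)
    item_dev[item].append(r - mu)

b_u = {u: sum(d) / len(d) for u, d in user_dev.items()}
b_i = {i: sum(d) / len(d) for i, d in item_dev.items()}


def baseline_estimate(user, item):
    """Return mu + b_u + b_i, treating unseen users or items as zero bias."""
    return mu + b_u.get(user, 0.0) + b_i.get(item, 0.0)


# u2 rates generously (+2.5) and item b is rated below average (-1.5)
print(baseline_estimate("u2", "b"))  # 6.5 + 2.5 - 1.5 = 7.5
```

Note that the prediction uses no similarity between users or items at all, only how generous the user is and how well liked the item is on average.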

In [4]:
algos = {
    'KNN Basic': KNNBasic(),
    'Baseline Only': BaselineOnly(),
    'Co-clustering': CoClustering()
}
       
reader = Reader(rating_scale=(1, 10))
data = Dataset.load_from_df(reading_no_zero_df[['user', 'item', 'rating']], reader)

In [5]:
overall_results = []
for algo_name, algo in tqdm(algos.items()): 
    algo_results = []
    results = cross_validate(algo, data, measures=['RMSE'], cv=5, verbose=False)
    algo_results.append(algo_name)
    algo_results.append(results['test_rmse'].mean())
    overall_results.append(algo_results)

  0%|          | 0/3 [00:00<?, ?it/s]

Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.


 33%|███▎      | 1/3 [03:58<07:57, 238.79s/it]

Estimating biases using als...
Estimating biases using als...
Estimating biases using als...
Estimating biases using als...
Estimating biases using als...


100%|██████████| 3/3 [05:01<00:00, 100.38s/it]


In [6]:
overall_results_df = pd.DataFrame(overall_results, columns = ['Algorithm Name', 'Mean CV RMSE Score'])
overall_results_df.sort_values('Mean CV RMSE Score')

| | Algorithm Name | Mean CV RMSE Score |
|---|---|---|
| 1 | Baseline Only | 1.215998 |
| 2 | Co-clustering | 1.259028 |
| 0 | KNN Basic | 1.336451 |


**Data Shown:** The RMSE scores shown are the mean over 5 cross-validation folds. The two best-performing models have an RMSE of roughly 1.2 to 1.3. On a rating scale with a maximum of 10, this corresponds to an error of around 12%.
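
As a rough illustration of what the RMSE figures mean, the sketch below computes RMSE by hand and expresses it as a fraction of the maximum rating, which is how the "around 12%" figure is obtained (1.216 / 10). The rating lists are hypothetical, and `rmse` is a helper written for this example rather than a scikit-surprise function.

```python
from math import sqrt


def rmse(true_ratings, predicted_ratings):
    """Root mean squared error between two equal-length rating lists."""
    squared_errors = [(t - p) ** 2 for t, p in zip(true_ratings, predicted_ratings)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical true and predicted ratings on the 1-10 scale;
# every prediction is off by exactly 1
true_ratings = [8, 6, 9, 7]
predicted_ratings = [7, 7, 8, 8]

error = rmse(true_ratings, predicted_ratings)
print(error)       # 1.0
print(error / 10)  # 0.1, i.e. 10% of the maximum rating
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones.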

**Insights:** KNN Basic relies on a similarity measure between users. By default, scikit-surprise computes it from their [mean squared difference](https://surprise.readthedocs.io/en/stable/similarities.html?highlight=msd#surprise.similarities.msd) (MSD).

![msd](../images/msd.JPG)

Notation explanation from [scikit-surprise](https://surprise.readthedocs.io/en/stable/notation_standards.html#notation-standards):
- U : the set of all users. u and v denote users.
- I : the set of all items. i and j denote items.
- r_ui : the true rating of user u for item i.
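
Using this notation, the MSD similarity between users u and v can be sketched in plain Python: average the squared rating differences over the items both users rated, then take 1 / (msd + 1) so that identical ratings give a similarity of 1. This is a minimal sketch of the formula in the documentation, not the library's optimized code. The function name, the dictionary-based input and the choice to return 0 when there is no overlap are assumptions made for this example.

```python
def msd_similarity(ratings_u, ratings_v):
    """MSD-based similarity between two users given {item: rating} dicts."""
    common_items = ratings_u.keys() & ratings_v.keys()  # items rated by both users
    if not common_items:
        return 0.0  # assumption for this sketch: no overlap means no similarity
    msd = sum((ratings_u[i] - ratings_v[i]) ** 2 for i in common_items) / len(common_items)
    return 1 / (msd + 1)  # the +1 avoids division by zero when msd is 0


# Hypothetical users: items "a" and "b" are the only ones in common
user_u = {"a": 8, "b": 6, "c": 9}
user_v = {"a": 7, "b": 8, "d": 5}

# msd = ((8 - 7)**2 + (6 - 8)**2) / 2 = 2.5, so similarity = 1 / 3.5
print(msd_similarity(user_u, user_v))
```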

In [7]:
sim_options = {'name':'cosine'}
algos = {
            'KNN Basic':KNNBasic(sim_options = sim_options)
        }

reader = Reader(rating_scale=(1, 10))
data = Dataset.load_from_df(reading_no_zero_df[['user', 'item', 'rating']], reader)

overall_results_cosine = []
for algo_name, algo in tqdm(algos.items()): 
    algo_results = []
    results = cross_validate(algo, data, measures=['RMSE'], cv=5, verbose=False)
    algo_results.append(algo_name)
    algo_results.append(results['test_rmse'].mean())
    overall_results_cosine.append(algo_results)
    
overall_results_cosine_df = pd.DataFrame(overall_results_cosine, columns = ['Algorithm Name', 'Mean CV RMSE Score (cosine)'])
overall_results_cosine_df.sort_values('Mean CV RMSE Score (cosine)')

  0%|          | 0/1 [00:00<?, ?it/s]

Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.


100%|██████████| 1/1 [06:05<00:00, 365.67s/it]


| | Algorithm Name | Mean CV RMSE Score (cosine) |
|---|---|---|
| 0 | KNN Basic | 1.451253 |


**Data Shown:** Mean RMSE over 5 cross-validation folds for KNN Basic using [cosine similarity](https://surprise.readthedocs.io/en/stable/similarities.html?highlight=msd#surprise.similarities.cosine) instead of the default MSD.

![cosine](../images/cosine.JPG)
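
The cosine formula can be sketched the same way: it treats the two users' ratings on their common items as vectors and returns the cosine of the angle between them. Again, this is only an illustration of the documented formula; the function name and inputs are hypothetical. One property worth noting is that ratings on a 1 to 10 scale are all positive, so two users who share a single item always get a cosine similarity of 1, however far apart their ratings are. That may be one reason cosine does worse here, but this notebook does not test that explanation.

```python
from math import sqrt


def cosine_similarity(ratings_u, ratings_v):
    """Cosine similarity between two users over their commonly rated items."""
    common_items = ratings_u.keys() & ratings_v.keys()
    if not common_items:
        return 0.0  # assumption for this sketch: no overlap means no similarity
    dot = sum(ratings_u[i] * ratings_v[i] for i in common_items)
    norm_u = sqrt(sum(ratings_u[i] ** 2 for i in common_items))
    norm_v = sqrt(sum(ratings_v[i] ** 2 for i in common_items))
    return dot / (norm_u * norm_v)


# Two common items: dot = 24 and both norms are 5, so the similarity is 24 / 25
print(cosine_similarity({"a": 3, "b": 4}, {"a": 4, "b": 3}))

# One common item with opposite opinions still gives a similarity of 1
print(cosine_similarity({"a": 10}, {"a": 1}))
```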

In [8]:
combined_results_df = pd.merge(left = overall_results_cosine_df, right = overall_results_df, how = 'right', on = 'Algorithm Name')
combined_results_df.sort_values('Mean CV RMSE Score')

| | Algorithm Name | Mean CV RMSE Score (cosine) | Mean CV RMSE Score |
|---|---|---|---|
| 1 | Baseline Only | NaN | 1.215998 |
| 2 | Co-clustering | NaN | 1.259028 |
| 0 | KNN Basic | 1.451253 | 1.336451 |


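**Data Shown:** Mean RMSE over 5 cross-validation folds for each model. The cosine column has no value for Baseline Only and Co-clustering because only KNN Basic was re-run with cosine similarity.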

**Insights:** For KNN Basic, cosine similarity performed worse than the default MSD similarity (RMSE of 1.451 vs. 1.336).

----
### Precision@k and Recall@k

RMSE measures how accurately a model predicts ratings, but recommender systems are more commonly evaluated with Precision@k and Recall@k.

![precision_recall](../images/precision_recall.png)

k in this case refers to the number of recommendations. A recommended item counts as relevant if the user's rating for it is equal to or higher than a chosen threshold.  
For example, Precision@10 with threshold 7 is the number of recommended items that the user rated >= 7, divided by the 10 recommendations that the model makes.

Unlike RMSE, which averages the error over all predicted ratings, these metrics focus on the quality of the top k recommendations. This also makes practical sense, as users only see the top k recommendations rather than the predicted rating of every item.
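
The definitions above can be worked through for a single user as follows. This is a minimal sketch with made-up item ids and ratings, and `precision_recall_for_user` is a hypothetical helper written for this example. It follows the definition in the text (hits divided by k), which is not exactly how the function used later in this notebook counts recommended items.

```python
def precision_recall_for_user(recommended, true_ratings, k=10, threshold=7):
    """Precision@k and recall@k for one user.

    recommended: item ids ordered from best to worst predicted rating.
    true_ratings: {item: rating} with the user's actual ratings.
    """
    top_k = recommended[:k]
    relevant = {item for item, rating in true_ratings.items() if rating >= threshold}
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0  # undefined case set to 0
    return precision, recall


# Hypothetical user: 4 relevant titles (a, c, f, g), 2 of which are recommended
recommended = ["a", "b", "c", "d", "e"]
true_ratings = {"a": 9, "b": 5, "c": 7, "f": 8, "g": 10, "h": 3}

precision, recall = precision_recall_for_user(recommended, true_ratings, k=5, threshold=7)
print(precision)  # 2 hits / 5 recommendations = 0.4
print(recall)     # 2 hits / 4 relevant titles = 0.5
```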

### Measurement metrics for this project
For this project, I am opting for top 10 recommendations, i.e. k = 10 and a threshold rating of 7/10.

### Baseline Model

The baseline model randomly picks 10 manga titles to recommend to each user. The precision@k and recall@k scores are then calculated from these 10 random recommendations.
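
As a sanity check on what to expect from such a baseline: if k titles are drawn uniformly at random from N titles and a user has R relevant titles, the expected precision@k is R / N, regardless of k. The sketch below compares that value against a small simulation. The catalogue size, the number of relevant titles and the function name are hypothetical and are not taken from the dataset used in this notebook.

```python
import random


def random_precision_at_k(n_titles, n_relevant, k, n_trials, seed=0):
    """Average precision@k of uniformly random recommendations (simulation)."""
    rng = random.Random(seed)
    titles = range(n_titles)
    relevant = set(range(n_relevant))  # which titles are relevant does not matter
    total_hits = 0
    for _ in range(n_trials):
        picks = rng.sample(titles, k)  # k distinct titles drawn at random
        total_hits += sum(1 for title in picks if title in relevant)
    return total_hits / (n_trials * k)


# Hypothetical catalogue: 1,000 titles, of which the user rated 20 at >= 7
n_titles, n_relevant, k = 1000, 20, 10
expected = n_relevant / n_titles
simulated = random_precision_at_k(n_titles, n_relevant, k, n_trials=20000)
print(expected, simulated)
```

With a large catalogue and few relevant titles per user, R / N is small, so very low scores for a random baseline are to be expected.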

In [9]:
list_of_titles = reading_no_zero_df['item'].unique().tolist()
list_of_users = reading_no_zero_df['user'].unique().tolist()
k = 10

# Precision@k and recall@k for random recommendations, computed once per user
precision = []
recall = []

for i in tqdm(range(reading_no_zero_df['user'].nunique())):
    random_user = choice(list_of_users)
    random_user_titles = reading_no_zero_df[(reading_no_zero_df['user']==random_user) & (reading_no_zero_df['rating']>=7)]['item'].unique().tolist()
    total_relevant_titles = reading_no_zero_df[(reading_no_zero_df['user']==random_user) & (reading_no_zero_df['rating']>=7)]['item'].nunique()
    
    # Since all titles can be selected, we select a random k titles
    random_10_titles = sample(list_of_titles, k)    
    num_correct_titles = []
    for title in random_10_titles:
        if title in random_user_titles:
            num_correct_titles.append(title)

    if total_relevant_titles == 0:
        recall.append(0)
    else:
        recall.append(len(num_correct_titles)/total_relevant_titles)
    
    precision.append(len(num_correct_titles)/k)
    
    list_of_users.remove(random_user)

precision_at_k = sum(precision)/len(precision)
recall_at_k = sum(recall)/len(recall)

print(f'Average precision@k over {len(precision):,} users is {precision_at_k}')
print(f'Average recall@k over {len(recall):,} users is {recall_at_k}')

100%|██████████| 14138/14138 [11:18<00:00, 20.82it/s]

Average precision@k over 14,138 users is 0.007801669260149857
Average recall@k over 14,138 users is 0.003205779093554163





In [10]:
baseline_results = pd.DataFrame(columns = ['Algorithm Name', 'Precision@k', 'Recall@k'])
baseline_results.loc[0] = ['Baseline (Random Recommendations)',precision_at_k, recall_at_k]
baseline_results

| | Algorithm Name | Precision@k | Recall@k |
|---|---|---|---|
| 0 | Baseline (Random Recommendations) | 0.007802 | 0.003206 |


In [11]:
def precision_recall_at_k(predictions, k=10, threshold=3.5):
    """Return precision and recall at k metrics for each user"""

    # First map the predictions to each user.
    user_est_true = defaultdict(list)
    for uid, _, true_r, est, _ in predictions:
        user_est_true[uid].append((est, true_r))

    precisions = dict()
    recalls = dict()
    for uid, user_ratings in user_est_true.items():

        # Sort user ratings by estimated value
        user_ratings.sort(key=lambda x: x[0], reverse=True)

        # Number of relevant items
        n_rel = sum((true_r >= threshold) for (_, true_r) in user_ratings)

        # Number of recommended items in top k
        n_rec_k = sum((est >= threshold) for (est, _) in user_ratings[:k])

        # Number of relevant and recommended items in top k
        n_rel_and_rec_k = sum(((true_r >= threshold) and (est >= threshold))
                              for (est, true_r) in user_ratings[:k])

        # Precision@K: Proportion of recommended items that are relevant
        # When n_rec_k is 0, Precision is undefined. We here set it to 0.

        precisions[uid] = n_rel_and_rec_k / n_rec_k if n_rec_k != 0 else 0

        # Recall@K: Proportion of relevant items that are recommended
        # When n_rel is 0, Recall is undefined. We here set it to 0.

        recalls[uid] = n_rel_and_rec_k / n_rel if n_rel != 0 else 0

    return precisions, recalls

In [12]:
# Load the ratings into a surprise Dataset, then define a function that
# computes the average precision@k and recall@k over cross-validation folds
reader = Reader(rating_scale=(1, 10))
data = Dataset.load_from_df(reading_no_zero_df[['user', 'item', 'rating']], reader)

def calculate_precision_recall(k, threshold, splits):
    algos = {'KNN Basic': KNNBasic(),
             'Baseline Only': BaselineOnly(),
             'Co-clustering': CoClustering()}
    kf = KFold(n_splits=splits, random_state = 42)
    all_precision_recall = []
    for algo_name, algo in tqdm(algos.items()):
        algo_precision_recall_list = []
        precision_list = []
        recall_list = []

        for trainset, testset in kf.split(data):
            algo.fit(trainset)
            predictions = algo.test(testset)
            precisions, recalls = precision_recall_at_k(predictions, k, threshold)
            precision_list.append(sum(prec for prec in precisions.values()) / len(precisions))
            recall_list.append(sum(rec for rec in recalls.values()) / len(recalls))

        precision_average = sum(precision_list)/len(precision_list)
        recall_average = sum(recall_list)/len(recall_list)
        algo_precision_recall_list.append(algo_name)
        algo_precision_recall_list.append(precision_average)
        algo_precision_recall_list.append(recall_average)
        all_precision_recall.append(algo_precision_recall_list)

    all_precision_recall_df = pd.DataFrame(all_precision_recall, columns = ['Algorithm Name', 'Average Precision@k Score', 'Average Recall@k Score'])
    return all_precision_recall_df.sort_values('Average Recall@k Score', ascending = False)


In [13]:
calculate_precision_recall(k = 10, threshold = 7, splits = 5)

  0%|          | 0/3 [00:00<?, ?it/s]

Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.


 33%|███▎      | 1/3 [04:09<08:18, 249.38s/it]

Estimating biases using als...
Estimating biases using als...
Estimating biases using als...
Estimating biases using als...
Estimating biases using als...


Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
100%|██████████| 3/3 [05:15<00:00, 105.15s/it]


| | Algorithm Name | Average Precision@k Score | Average Recall@k Score |
|---|---|---|---|
| 1 | Baseline Only | 0.909319 | 0.862302 |
| 0 | KNN Basic | 0.901041 | 0.854036 |
| 2 | Co-clustering | 0.892452 | 0.819997 |


**Observation:** Function is working as intended as the results are the same as above.

In [14]:
calculate_precision_recall(k = 10, threshold = 8, splits = 5)

  0%|          | 0/3 [00:00<?, ?it/s]

Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.


 33%|███▎      | 1/3 [04:08<08:16, 248.33s/it]

Estimating biases using als...
Estimating biases using als...
Estimating biases using als...
Estimating biases using als...
Estimating biases using als...


Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
100%|██████████| 3/3 [05:17<00:00, 105.68s/it]


| | Algorithm Name | Average Precision@k Score | Average Recall@k Score |
|---|---|---|---|
| 1 | Baseline Only | 0.735769 | 0.611244 |
| 0 | KNN Basic | 0.709247 | 0.599862 |
| 2 | Co-clustering | 0.682934 | 0.598209 |


In [15]:
calculate_precision_recall(k = 10, threshold = 9, splits = 5)

  0%|          | 0/3 [00:00<?, ?it/s]

Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.


 33%|███▎      | 1/3 [04:04<08:08, 244.44s/it]

Estimating biases using als...
Estimating biases using als...
Estimating biases using als...
Estimating biases using als...
Estimating biases using als...


Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
100%|██████████| 3/3 [05:11<00:00, 103.67s/it]


| | Algorithm Name | Average Precision@k Score | Average Recall@k Score |
|---|---|---|---|
| 2 | Co-clustering | 0.339543 | 0.276268 |
| 0 | KNN Basic | 0.262405 | 0.135293 |
| 1 | Baseline Only | 0.238302 | 0.129471 |


**Data Shown:** 3 sets of results all based on k = 10 but varying thresholds of 7, 8 and 9 respectively. 

**Insights:** As expected, the results worsen as the threshold increases. Personally, I find 7 out of 10 to be a good enough score for a recommendation. Rather than choosing the model that degrades least at a higher threshold, I want the model that is best at recommending a good enough manga title. Hence, I will use the Baseline Only model, which performed best at threshold = 7. Its precision and recall are each about 0.8 percentage points higher than those of the second-best model, KNN Basic (0.909 vs. 0.901 and 0.862 vs. 0.854).

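The ML models vastly outperform the random baseline: at threshold = 7, precision@10 is roughly 0.89 to 0.91 for the models versus about 0.008 for random recommendations. The two figures are not computed in the same way, however: the random baseline draws from all titles, whereas the models are scored only on held-out titles that each user has actually rated.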