# Model analysis

This notebook loads the candidate models and compares their metrics on a subset of the dataset in order to choose the best one. A subset is used because generating the document embeddings (`doc_embeddings`) for the full dataset takes too long.

In [24]:
import sys
sys.path.append("../") 

%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [25]:
from preprocess import load_dataset, preprocess_dataset
from evaluation import compute_similarity_scores, compute_metrics
from visualisation import visualise_model_performance

In [26]:
import pandas as pd
from sentence_transformers import SentenceTransformer

In [34]:
# top 5 models recommended on SBERT, sorted by semantic search performance (performance > 45); short names, used as row labels
models_to_test_clean = [
    'multi-qa-mpnet-base-dot-v1',
    'multi-qa-distilbert-cos-v1',
    'multi-qa-MiniLM-L6-cos-v1',
    'all-MiniLM-L12-v2',
    'all-MiniLM-L6-v2',
]

In [27]:
# top 5 models recommended on SBERT, sorted by semantic search performance (performance > 45); full identifiers, used for loading
models_to_test_strings = [
    'sentence-transformers/multi-qa-mpnet-base-dot-v1',
    'sentence-transformers/multi-qa-distilbert-cos-v1',
    'sentence-transformers/multi-qa-MiniLM-L6-cos-v1',
    'sentence-transformers/all-MiniLM-L12-v2',
    'sentence-transformers/all-MiniLM-L6-v2',
]

models_to_test_list = []
for model in models_to_test_strings:
    # wrap in SentenceTransformer
    model_wrapped = SentenceTransformer(model)
    models_to_test_list.append(model_wrapped)

INFO:sentence_transformers.SentenceTransformer:Use pytorch device_name: cpu
INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: sentence-transformers/multi-qa-mpnet-base-dot-v1
INFO:sentence_transformers.SentenceTransformer:Use pytorch device_name: cpu
INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: sentence-transformers/multi-qa-distilbert-cos-v1
INFO:sentence_transformers.SentenceTransformer:Use pytorch device_name: cpu
INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: sentence-transformers/multi-qa-MiniLM-L6-cos-v1
INFO:sentence_transformers.SentenceTransformer:Use pytorch device_name: cpu
INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: sentence-transformers/all-MiniLM-L12-v2
INFO:sentence_transformers.SentenceTransformer:Use pytorch device_name: cpu
INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: sentence-trans

In [28]:
# process data
df_examples = pd.read_parquet('c:\\Users\\ellen\\Documents\\semantic-search-eval\\src\\data/shopping_queries_dataset_examples.parquet')
df_products = pd.read_parquet('c:\\Users\\ellen\\Documents\\semantic-search-eval\\src\\data/shopping_queries_dataset_products.parquet')

# https://github.com/amazon-science/esci-data: suggested filter for task 1: Query-Product Ranking 
# Query-Product Ranking: Given a user specified query and a list of matched products, the goal of this 
# task is to rank the products so that the relevant products are ranked above the non-relevant ones.
df_examples_products = pd.merge(
    df_examples,
    df_products,
    how='left',
    # both frames share the same key names, so a single `on` is enough
    on=['product_locale', 'product_id'],
)

# take the small version for this task
df_task_1_all = df_examples_products[df_examples_products["small_version"] == 1]
df_task_1_train = df_task_1_all[df_task_1_all["split"] == "train"]

# keep only the US locale
df_task_1_train_us = df_task_1_train[df_task_1_train['product_locale'] == 'us']

In [29]:
df_train_clean = preprocess_dataset(df_task_1_train_us)

INFO:root:There are 419653 rows in this dataset.
INFO:root:There are 1619 duplicates in this dataset.
INFO:root:There are 418034 deduplicated rows in this dataset.
INFO:root:There are 208220 nan product descriptions in this dataset.
INFO:root:There are 209814 non-nan product descriptions in this dataset.
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_not_nan['relevance'] = df_not_nan['esci_label'].map(esci_weighting)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_not_nan['product_doc'] = df_not_nan['product_title'] + ' ' + df_not_nan['product_descripti

In [42]:
# get all unique query ids and sample 200 of them
unique_query_ids = df_train_clean['query_id'].unique()
random_query_ids = pd.Series(unique_query_ids).sample(n=200, random_state=42).tolist()
df_train_clean_sample = df_train_clean[df_train_clean['query_id'].isin(random_query_ids)]

In [43]:
similarity_scores_list = []
ndcg_list = []
recall_list = []
mrr_list = []

for model in models_to_test_list:
    similarity_scores = compute_similarity_scores(model, df_train_clean_sample)
    df_train_clean_sample['similarity_scores'] = similarity_scores
    ndcg10, recall10, mrr10 = compute_metrics(df_train_clean_sample)
    similarity_scores_list.append(similarity_scores)
    ndcg_list.append(ndcg10)
    recall_list.append(recall10)
    mrr_list.append(mrr10)

Batches: 100%|██████████| 68/68 [00:17<00:00,  3.94it/s]
Batches: 100%|██████████| 68/68 [10:26<00:00,  9.21s/it]
INFO:root:Embeddings complete
INFO:root:Query shape: torch.Size([2158, 768])
INFO:root:Doc shape: torch.Size([2158, 768])
INFO:root:Similarities matrix shape: torch.Size([2158])
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_train_clean_sample['similarity_scores'] = similarity_scores
INFO:root:Create qrels object
INFO:root:Create run object
INFO:root:Calculate metrics
Batches: 100%|██████████| 68/68 [00:06<00:00, 10.19it/s]
Batches: 100%|██████████| 68/68 [04:37<00:00,  4.08s/it]
INFO:root:Embeddings complete
INFO:root:Query shape: torch.Size([2158, 768])
INFO:root:Doc shape: torch.Size([2158, 768])
INFO:root:Similarities matrix shape: torch.Size([2158])
IN
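
The shapes logged above (query and doc embeddings of `[2158, 768]`, similarities of `[2158]`) indicate one score per aligned (query, product) row rather than a full query-by-product matrix. A minimal sketch of that row-wise scoring follows. It assumes cosine similarity, which may not match what `compute_similarity_scores` does internally (the `dot` model, for instance, is usually scored with a dot product), and `pairwise_cosine` and the toy vectors are hypothetical.

```python
import math


def pairwise_cosine(query_vecs, doc_vecs):
    """Return one cosine similarity per aligned (query, doc) row pair."""
    if len(query_vecs) != len(doc_vecs):
        raise ValueError("query_vecs and doc_vecs must have the same number of rows")
    scores = []
    for q, d in zip(query_vecs, doc_vecs):
        dot = sum(qi * di for qi, di in zip(q, d))
        norm = math.sqrt(sum(qi * qi for qi in q)) * math.sqrt(sum(di * di for di in d))
        # Guard against zero vectors instead of dividing by zero.
        scores.append(dot / norm if norm else 0.0)
    return scores


# Hypothetical 2-dimensional embeddings: three (query, product) rows.
toy_queries = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
toy_docs = [[2.0, 0.0], [1.0, -1.0], [3.0, 0.0]]
toy_scores = pairwise_cosine(toy_queries, toy_docs)
```

The result has the same length as the number of input rows, which is why it can be assigned directly as a new DataFrame column.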

In [44]:
df_results = pd.DataFrame(index=models_to_test_clean)
# df_results['similarity_scores'] = similarity_scores_list
df_results['ndcg@10'] = ndcg_list
df_results['recall@10'] = recall_list
df_results['mrr@10'] = mrr_list
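
For reference, the three metric columns can be understood through the per-query sketch below. It is not the implementation behind `compute_metrics`. It assumes linear gains for NDCG, treats any relevance above zero as relevant for recall and MRR, and uses hypothetical relevance values. Scores like those reported here are typically the average of such per-query values.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k positions (linear gain)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def metrics_at_k(ranked_gains, k=10):
    """Compute (ndcg@k, recall@k, mrr@k) for a single query.

    ranked_gains: graded relevance of the query's products,
    ordered by descending model similarity score.
    """
    ideal = dcg_at_k(sorted(ranked_gains, reverse=True), k)
    ndcg = dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0

    total_relevant = sum(1 for g in ranked_gains if g > 0)
    hits = sum(1 for g in ranked_gains[:k] if g > 0)
    recall = hits / total_relevant if total_relevant else 0.0

    # Reciprocal rank of the first relevant product within the top k.
    mrr = 0.0
    for rank, g in enumerate(ranked_gains[:k], start=1):
        if g > 0:
            mrr = 1.0 / rank
            break
    return ndcg, recall, mrr


# Hypothetical query: the best product is ranked second, a weak match fourth.
toy_ndcg, toy_recall, toy_mrr = metrics_at_k([0.0, 1.0, 0.0, 0.1], k=3)
perfect_ndcg, _, _ = metrics_at_k([1.0, 0.1, 0.0, 0.0], k=3)
```

In the toy query, only one of the two relevant products appears in the top 3 and the first relevant one sits at rank 2, so recall@3 and MRR@3 are both 0.5, while the perfectly ordered list reaches an NDCG of 1.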

In [47]:
df_results

| model | ndcg@10 | recall@10 | mrr@10 |
|---|---|---|---|
| multi-qa-mpnet-base-dot-v1 | 0.893274 | 0.857138 | 0.955 |
| multi-qa-distilbert-cos-v1 | 0.881114 | 0.856439 | 0.933131 |
| multi-qa-MiniLM-L6-cos-v1 | 0.887377 | 0.857062 | 0.944681 |
| all-MiniLM-L12-v2 | 0.887035 | 0.853035 | 0.927208 |
| all-MiniLM-L6-v2 | 0.884412 | 0.858878 | 0.928839 |
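
To read the table above, it can help to look at the winner per metric and at how far apart the models are. The sketch below copies the reported values into a plain dictionary; the helper names `best_per_metric` and `spread_per_metric` are hypothetical and not part of the project code.

```python
# Values copied from the results table.
results = {
    "multi-qa-mpnet-base-dot-v1": {"ndcg@10": 0.893274, "recall@10": 0.857138, "mrr@10": 0.955},
    "multi-qa-distilbert-cos-v1": {"ndcg@10": 0.881114, "recall@10": 0.856439, "mrr@10": 0.933131},
    "multi-qa-MiniLM-L6-cos-v1": {"ndcg@10": 0.887377, "recall@10": 0.857062, "mrr@10": 0.944681},
    "all-MiniLM-L12-v2": {"ndcg@10": 0.887035, "recall@10": 0.853035, "mrr@10": 0.927208},
    "all-MiniLM-L6-v2": {"ndcg@10": 0.884412, "recall@10": 0.858878, "mrr@10": 0.928839},
}

METRICS = ("ndcg@10", "recall@10", "mrr@10")


def best_per_metric(scores):
    """Map each metric to the model name with the highest value."""
    return {m: max(scores, key=lambda name: scores[name][m]) for m in METRICS}


def spread_per_metric(scores):
    """Map each metric to the gap between the best and worst model."""
    return {
        m: max(row[m] for row in scores.values()) - min(row[m] for row in scores.values())
        for m in METRICS
    }


best = best_per_metric(results)
spread = spread_per_metric(results)
```

On these numbers `multi-qa-mpnet-base-dot-v1` leads on ndcg@10 and mrr@10, while `all-MiniLM-L6-v2` leads on recall@10. Every gap between best and worst is below 0.03, so with only 200 sampled queries the ranking may not be stable across samples.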
