Here we evaluate the performance of the model on the test set.

- The documents_olx_20240823_1135 index was built using the 'xlm-r-distilroberta-base-paraphrase-v1' sentence transformer model.

In [111]:
from sentence_transformers import SentenceTransformer
import json
from tqdm.auto import tqdm
import pandas as pd
from elasticsearch import Elasticsearch
import time
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

In [2]:
with open('../data/documents-olx-2024_08_25-with-ids.json', 'rt') as f_in:
    documents = json.load(f_in)

In [3]:
model_name = 'xlm-r-distilroberta-base-paraphrase-v1'
model = SentenceTransformer(model_name)



In [6]:
es_client = Elasticsearch('http://localhost:9200') 

In [7]:
def elastic_search(query_vector, index_name = "documents_olx_20240829_5014",
                   field = 'question_text_vector'):
    
    knn = {
        "field": field,
        "query_vector": query_vector,
        "k": 5,
        "num_candidates": 10000,
    }

    search_query = {
        "knn": knn,
        "_source": ["text", "section", "question", "course", "id"],
        "min_score": 0.1,  # Optional: minimum score threshold for results
        # NOTE: "asc" returns the lowest-scoring of the k hits first; "desc" would put the best match first
        "sort": [{ "_score": "asc" }]
    }

    es_results = es_client.search(
        index=index_name,
        body=search_query
    )
    
    result_docs = []
    
    for hit in es_results['hits']['hits']:
        result_docs.append(hit['_source'])

    return result_docs


In [8]:
df_ground_truth = pd.read_csv('../data/ground-truth-data_2024_08_25.csv')

In [9]:
ground_truth = df_ground_truth.to_dict(orient='records')

In [10]:
ground_truth[0]

{'question': 'Jak mogę założyć konto na OLX przez stronę internetową?',
 'document': '2ddb5f16'}

In [11]:
def hit_rate(relevance_total):
    cnt = 0

    for line in relevance_total:
        if True in line:
            cnt = cnt + 1

    return cnt / len(relevance_total)


def mrr(relevance_total):
    total_score = 0.0

    for line in relevance_total:
        for rank in range(len(line)):
            if line[rank] == True:
                total_score = total_score + 1 / (rank + 1)

    return total_score / len(relevance_total)
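
To make the two metrics concrete, here is a small worked example on hand-made relevance lists (the lists are illustrative, not taken from the dataset). Each inner list marks, by rank, whether the returned document is the expected one. The `_sketch` functions are compact restatements for this example only and assume at most one relevant document per query.

```python
def hit_rate_sketch(relevance_total):
    # Share of queries with the expected document anywhere in the results
    return sum(any(line) for line in relevance_total) / len(relevance_total)


def mrr_sketch(relevance_total):
    # Mean of 1 / rank of the first relevant result (0 when there is none)
    total = 0.0
    for line in relevance_total:
        for rank, is_relevant in enumerate(line, start=1):
            if is_relevant:
                total += 1 / rank
                break
    return total / len(relevance_total)


toy_relevance = [
    [True, False, False],   # expected document at rank 1 -> contributes 1
    [False, False, True],   # expected document at rank 3 -> contributes 1/3
    [False, False, False],  # expected document missing   -> contributes 0
]

toy_hit_rate = hit_rate_sketch(toy_relevance)  # 2 of 3 queries hit
toy_mrr = mrr_sketch(toy_relevance)            # (1 + 1/3 + 0) / 3
print(toy_hit_rate, toy_mrr)
```

Hit rate only asks whether the expected document was returned at all, while MRR also rewards returning it near the top.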

In [12]:
def evaluate(ground_truth, search_function):
    results_total = []
    relevance_total = []

    for q in tqdm(ground_truth):
        doc_id = q['document']
        results = search_function(q)
        relevance = [d['id'] == doc_id for d in results]
        relevance_total.append(relevance)
        results_total.append(results)

    return {
        'results_total': results_total,
        'relevance_total': relevance_total,
        'hit_rate': hit_rate(relevance_total),
        'mrr': mrr(relevance_total),
    }

Baseline for comparison, Elasticsearch text-only search: hit rate 0.7395720769397017, MRR 0.6032418413658963.

In [14]:
def question_text_vector_knn(q):
    question = q['question']

    v_q = model.encode(question)

    return elastic_search(v_q)

In [15]:
res = evaluate(ground_truth, question_text_vector_knn)

100%|██████████| 461/461 [00:54<00:00,  8.39it/s]


In [16]:
print(res['hit_rate'])
print(res['mrr'])

0.702819956616052
0.2190889370932759
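
The gap between hit rate (0.70) and MRR (0.22) suggests that the expected document is often found but ranked near the bottom. One possible cause, stated here as an assumption rather than a verified diagnosis, is the ascending `_score` sort in `elastic_search`, which would put the best of the five hits last. The sketch below uses made-up relevance flags to show how reversing an otherwise good ranking leaves the hit rate unchanged and lowers MRR.

```python
def reciprocal_rank(relevance):
    # 1 / rank of the first relevant result, 0.0 if none is relevant
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            return 1 / rank
    return 0.0


def summarize(relevance_total):
    # Returns (hit rate, mean reciprocal rank) for a list of relevance lists
    n = len(relevance_total)
    hit = sum(any(r) for r in relevance_total) / n
    mean_rr = sum(reciprocal_rank(r) for r in relevance_total) / n
    return hit, mean_rr


# Hypothetical relevance flags for three queries, best match first
best_first = [
    [True, False, False, False, False],
    [False, True, False, False, False],
    [False, False, False, False, False],
]
# The same results returned lowest score first
worst_first = [list(reversed(r)) for r in best_first]

hit_best, mrr_best = summarize(best_first)
hit_worst, mrr_worst = summarize(worst_first)
print(hit_best, mrr_best)
print(hit_worst, mrr_worst)
```

If this is what happens here, switching the sort order to descending (or dropping the explicit sort) would be a cheap thing to test before changing the model.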


Let's analyze when the model fails to retrieve the correct document.

In [17]:
missed_questions = [i for i, r in enumerate(res['relevance_total']) if not any(r)]

In [18]:
import random 
random.seed(10)
for i in random.sample(missed_questions, 5):
    print("Pytanie:", ground_truth[i])
    expected_doc_id = ground_truth[i]['document']
    print("Oczekiwany dokument:", [doc['question'] for doc in documents if doc['id'] == expected_doc_id])
    print("Oczekiwany dokument:", [doc['text'] for doc in documents if doc['id'] == expected_doc_id])
    
    returned_docs = res['results_total'][i]
    print("\nZnalezione dokumenty:")
    for j, doc in enumerate(returned_docs):
        print(j)
        print(doc['question'])
        print(doc['text'])
        print("\n")
    
    print("\n\n")

Pytanie: {'question': 'Które informacje można zmienić w zakładce Profil na koncie OLX?', 'document': '6c146627'}
Oczekiwany dokument: ['Zmiana danych kontaktowych']
Oczekiwany dokument: ['Krok 1. Zaloguj się na swoje konto OLX używając komputera (w aplikacji mobilnej możesz zmienić tylko nazwę osoby kontaktowej).\nKrok 2. Przejdź do zakładki Twoje konto → Profil.\nKrok 3. Kliknij przycisk Edytuj.\nKrok 4. Zmień podstawowe informacje.\nKrok 5. Kliknij przycisk Zapisz.\nPamiętaj, że gdy zmienisz dane kontaktowe, będą one widoczne wyłącznie w ogłoszeniach, które masz zamiar dodać. Jeżeli chcesz zmienić dane w ogłoszeniach już aktywnych, musisz edytować te ogłoszenia — sprawdź instrukcję.']

Znalezione dokumenty:
0
Czy nowy System Ocen jest dostępny we wszystkich kategoriach?
Oceny mogą wystawiać wyłącznie użytkownicy, którzy kupili przedmiot z Przesyłką OLX — i dopiero po jej otrzymaniu. Oznacza to, że oceny mogą być wystawiane wyłącznie po odebraniu przez Kupującego przedmiotu kupionego 

Several takeaways.

Adjust the ground truth so that the results are meaningful. Two small adjustments are needed:
- Adjust the question-generation prompt so that questions are phrased the way a real user would ask them.
- Exclude the documents with term definitions from the ground truth.

Regarding retrieval:
- Retrieve the top 10 results and let an LLM rerank them.
- Check the performance of hybrid search.
- Try different embeddings.
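
As a rough sketch of the second ground-truth adjustment, the snippet below drops ground-truth rows whose expected document belongs to a definitions section. The helper name `drop_definition_questions`, the toy records, and the use of "Definicje" as the section name are assumptions for illustration; the real records are assumed to carry `id`, `section`, and `document` keys as in the cells above.

```python
def drop_definition_questions(ground_truth, documents, section_to_exclude="Definicje"):
    # Collect ids of documents that live in the excluded section
    excluded_ids = {d["id"] for d in documents if d.get("section") == section_to_exclude}
    # Keep only questions whose expected document is not excluded
    return [q for q in ground_truth if q["document"] not in excluded_ids]


# Toy records (hypothetical ids and sections)
toy_documents = [
    {"id": "aaa111", "section": "Konto"},
    {"id": "bbb222", "section": "Definicje"},
]
toy_ground_truth = [
    {"question": "q1", "document": "aaa111"},
    {"question": "q2", "document": "bbb222"},
]

filtered = drop_definition_questions(toy_ground_truth, toy_documents)
print(filtered)
```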

## Evaluation after adjusting ground truth

`../data/ground-truth-data_2024_08_30.csv`

In [19]:
ground_truth_updated = pd.read_csv('../data/ground-truth-data_2024_08_30.csv')

In [20]:
ground_truth_updated = ground_truth_updated.to_dict(orient='records')

In [21]:
ground_truth_updated[0]

{'question': 'Jak mogę założyć konto przez stronę internetową OLX?',
 'document': '2ddb5f16'}

In [22]:
res_updated = evaluate(ground_truth_updated, question_text_vector_knn)

100%|██████████| 388/388 [00:29<00:00, 12.99it/s]


In [23]:
print(res_updated['hit_rate'])
print(res_updated['mrr'])

0.6984536082474226
0.2290807560137462


The update did not change the numbers much (hit rate 0.698 vs. 0.703, MRR 0.229 vs. 0.219), and they are still quite low. Let's check how hybrid search performs.

In [24]:
def elastic_search_hybrid(
    query: str, 
    vector: list,
    index_name: str = "documents_olx_20240829_5014",
    field: str = 'question_text_vector', 
    section_to_exclude: str = "Definicje",
    k: int = 5,
    size: int = 10):
    
    # Exclusion Filter
    exclusion_filter = {
        "term": {"section": section_to_exclude}  # Term query for exclusion
    }
    
    knn_query = {
        "field": field,
        "query_vector": vector,
        "k": k,
        "num_candidates": 10000,
        "boost": 0.5,
        "filter": {
            "bool": {
                "must_not": [
                    exclusion_filter
                ]
            }
        }
    }

    keyword_query = {
        "bool": {
            "must": {
                "multi_match": {
                    "query": query,
                    "fields": ["question", "text", "section"],
                    "type": "best_fields",
                    "boost": 0.5,
                }
            },
            "filter": {
                "bool": {
                    "must_not": [
                        exclusion_filter
                    ]
                }
            }
        }
    }

    search_query = {
        "knn": knn_query,
        "query": keyword_query,
        # NOTE: the number of returned hits is k; the `size` argument is not used
        "size": k,
        "_source": ["text", "section", "question", "course", "id"]
    }

    es_results = es_client.search(
        index=index_name,
        body=search_query
    )
    
    result_docs = []
    
    for hit in es_results['hits']['hits']:
        result_docs.append(hit['_source'])

    return result_docs
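
When a request carries both `knn` and `query`, Elasticsearch treats the two as a disjunction and, for a document matched by both, adds the boosted scores. The sketch below mimics that arithmetic on made-up scores to show how the two `boost` values of 0.5 act as weights. The function name, document ids, and score values are hypothetical.

```python
def combine_scores(knn_scores, keyword_scores, knn_boost=0.5, keyword_boost=0.5):
    # A document found by only one of the two searches gets 0 from the other
    doc_ids = set(knn_scores) | set(keyword_scores)
    combined = {
        doc_id: knn_boost * knn_scores.get(doc_id, 0.0)
        + keyword_boost * keyword_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    # Highest combined score first
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


knn_scores = {"doc_a": 0.9, "doc_b": 0.8}      # made-up vector similarity scores
keyword_scores = {"doc_b": 4.0, "doc_c": 6.0}  # made-up keyword scores

ranking = combine_scores(knn_scores, keyword_scores)
print(ranking)
```

Keyword scores are typically on a larger scale than vector similarity scores, so equal boosts do not necessarily mean equal influence on the final ranking.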

In [43]:
def question_hybrid(q, k=5):
    question = q['question']

    v_q = model.encode(question)

    return elastic_search_hybrid(question, v_q, field='question_text_vector', k=k)

def evaluate_hybrid(ground_truth, search_function):
    relevance_total = []
    results_total = []

    for q in tqdm(ground_truth):
        doc_id = q['document']
        results = search_function(q)
        relevance = [d['id'] == doc_id for d in results]
        relevance_total.append(relevance)
        results_total.append(results)

    return {
        'results_total': results_total,
        'relevance_total': relevance_total,
        'hit_rate': hit_rate(relevance_total),
        'mrr': mrr(relevance_total),
    }

In [44]:
res_hybrid = evaluate_hybrid(ground_truth_updated, question_hybrid)

100%|██████████| 388/388 [00:25<00:00, 15.45it/s]


In [45]:
print(res_hybrid['hit_rate'])
print(res_hybrid['mrr'])

0.7835051546391752
0.6411941580756015


The numbers went up (hit rate 0.784, MRR 0.641) but are still not good enough. Let's check the performance with reranking.

## Reranking

In [46]:
# get 10 top results for each question
def question_hybrid_10(q):
    return question_hybrid(q, k=10)

In [47]:
res_hybrid_10 = evaluate_hybrid(ground_truth_updated, question_hybrid_10)

100%|██████████| 388/388 [00:34<00:00, 11.31it/s]


In [48]:
# potential improvement: the top-10 hit rate is the ceiling for a reranker that picks 5 of these 10
print(res_hybrid_10['hit_rate'])
print(res_hybrid_10['mrr'])

0.8608247422680413
0.6499468172148584


In [49]:
from openai import OpenAI
client = OpenAI()

In [50]:
prompt_template = """
Wybierz k {k} dokumentów, które najlepiej odpowiadają na poniższe pytanie:

Pytanie:

SEKCJA: {section}
PYTANIE: {question}
ODPOWIEDŹ: {text}

Dokumenty:
1. {doc1_question}
2. {doc2_question}
3. {doc3_question}
4. {doc4_question}
5. {doc5_question}
6. {doc6_question}
7. {doc7_question}
8. {doc8_question}
9. {doc9_question}
10. {doc10_question}

Odpowiedź powinna być w parsowalnym formacie JSON. Nie używaj bloków kodu:

["id_1", "id_2", ..., "id_k"]
""".strip()

In [51]:
# create a prompt for the question ground_truth_updated[0]
doc_id = ground_truth_updated[0]['document']
doc = [doc for doc in documents if doc['id'] == doc_id][0]
found_docs = question_hybrid(ground_truth_updated[0], k=10)
found_docs_questions = ['id: ' + doc['id'] + ". " + doc['question'] + " " + doc['text'] for doc in found_docs]

In [52]:
prompt = prompt_template.format(
    k=5,
    section=doc['section'],
    question=doc['question'],
    text=doc['text'],
    doc1_question=found_docs_questions[0],
    doc2_question=found_docs_questions[1],
    doc3_question=found_docs_questions[2],
    doc4_question=found_docs_questions[3],
    doc5_question=found_docs_questions[4],
    doc6_question=found_docs_questions[5],
    doc7_question=found_docs_questions[6],
    doc8_question=found_docs_questions[7],
    doc9_question=found_docs_questions[8],
    doc10_question=found_docs_questions[9],
)

In [53]:
response = client.chat.completions.create(
        model='gpt-4o',
        messages=[{"role": "user", "content": prompt}]
    )


In [54]:
json.loads(response.choices[0].message.content)

['2ddb5f16', '7c7a4947', '4dcf8b35', 'ff50559b', '37570bca']

In [57]:
# Reranking matters: the LLM's top 5 differs from the first 5 hybrid hits
# ('aa9e24ac' is dropped and '37570bca' is pulled up from position 9)
[doc['id'] for doc in found_docs]

['2ddb5f16',
 '7c7a4947',
 '4dcf8b35',
 'aa9e24ac',
 'ff50559b',
 '6a293a6d',
 'ea07084a',
 'd30e6624',
 '37570bca',
 'e8caa77b']

In [102]:
import re 

def search_reranked_retrieval(q, k=10, max_doc_candidate_length=500):
    
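    # NOTE: the prompt below is filled from the expected document (section, question, text);
    # the user question q['question'] itself is not passed to the LLM.
    doc_id = q['document']
    doc = [doc for doc in documents if doc['id'] == doc_id][0]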
    found_docs = question_hybrid(q, k=k)
    found_docs_questions = ['id: ' + doc['id'] + ". " + doc['question'] + " " + doc['text'] for doc in found_docs]

    prompt = prompt_template.format(
        k=5,
        section=doc['section'],
        question=doc['question'],
        text=doc['text'],
        doc1_question=found_docs_questions[0][:max_doc_candidate_length],
        doc2_question=found_docs_questions[1][:max_doc_candidate_length],
        doc3_question=found_docs_questions[2][:max_doc_candidate_length],
        doc4_question=found_docs_questions[3][:max_doc_candidate_length],
        doc5_question=found_docs_questions[4][:max_doc_candidate_length],
        doc6_question=found_docs_questions[5][:max_doc_candidate_length],
        doc7_question=found_docs_questions[6][:max_doc_candidate_length],
        doc8_question=found_docs_questions[7][:max_doc_candidate_length],
        doc9_question=found_docs_questions[8][:max_doc_candidate_length],
        doc10_question=found_docs_questions[9][:max_doc_candidate_length],
    )

    response = client.chat.completions.create(
        model='gpt-4o',
        messages=[{"role": "user", "content": prompt}]
    )

    json_response = response.choices[0].message.content
    
    if not json_response:
        logger.info("Empty response from GPT-4o")
        return [doc['id'] for doc in found_docs[:5]]
    
    # Quick preprocessing: strip Markdown code fences, then parse the cleaned text
    json_response_preprocessed = re.sub(r'```json', '', json_response)
    json_response_preprocessed = re.sub(r'```', '', json_response_preprocessed)
    ids = json.loads(json_response_preprocessed)
    ids = [re.sub("id_", "", str(id)) for id in ids]
    ids = [re.sub("id:", "", str(id)) for id in ids]
    ids = [id.strip() for id in ids]
    ids = [id for id in ids if len(id) > 2]
    
    if len(ids) > 5:
        logger.warning("More than 5 ids returned from GPT-4o")
        return ids[:5]
    elif len(ids) < 5:
        logger.warning("Less than 5 ids returned from GPT-4o after preprocessing")
        logger.warning(json_response)
        tmp_ids = ids + [doc['id'] for doc in found_docs if doc['id'] not in ids]
        logger.info(f"Returning {tmp_ids[:5]}")
        return tmp_ids[:5]
    else:
        return ids
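
The post-processing of the LLM answer can be isolated into a small helper that is easy to test without calling the API. This is a sketch under the assumption that the model returns a JSON list of ids, possibly wrapped in a Markdown code fence or prefixed with `id_` or `id:`. The name `parse_ranked_ids` is hypothetical and not part of the notebook.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence marker


def parse_ranked_ids(raw_response, fallback_ids, top_n=5):
    # Remove Markdown code fences, with or without the "json" language tag
    cleaned = re.sub(re.escape(FENCE) + r"(?:json)?", "", raw_response or "").strip()
    try:
        parsed = json.loads(cleaned)
    except json.JSONDecodeError:
        parsed = []
    if not isinstance(parsed, list):
        parsed = []
    # Strip "id_" / "id:" prefixes and surrounding whitespace
    ids = [re.sub(r"^\s*id[_:]\s*", "", str(item)).strip() for item in parsed]
    ids = [i for i in ids if len(i) > 2]
    # Pad with the retrieval order when the model returned too few ids
    ids += [i for i in fallback_ids if i not in ids]
    return ids[:top_n]


fenced = FENCE + 'json\n["id: 2ddb5f16", "id_7c7a4947"]\n' + FENCE
fallback = ["2ddb5f16", "4dcf8b35", "aa9e24ac", "ff50559b"]
reranked = parse_ranked_ids(fenced, fallback)
print(reranked)
```

Falling back to the retrieval order on unparsable output keeps the evaluation loop running, at the cost of silently scoring those queries as plain hybrid search.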

In [113]:
def evaluate_reranking(ground_truth, search_function):
    relevance_total = []
    result_total = []

    for q in tqdm(ground_truth):
        time.sleep(0.25)
        doc_id = q['document']
        try:
            results = search_function(q, k=10)
        except Exception as e:
            logging.error(e)
            # empty list of results of length 10
            results = ['' for _ in range(10)]
        relevance = [d == doc_id for d in results]
        relevance_total.append(relevance)
        result_total.append(results)

    return {
        'results_total': result_total,
        'relevance_total': relevance_total,
        'hit_rate': hit_rate(relevance_total),
        'mrr': mrr(relevance_total),
    }

In [None]:
res_reranking = evaluate_reranking(ground_truth_updated, search_reranked_retrieval)

In [115]:
res_reranking

{'results_total': [['2ddb5f16',
   '4dcf8b35',
   'aa9e24ac',
   'ff50559b',
   'd30e6624'],
  ['e8caa77b', '4dcf8b35', '1106b1dc', '6c146627', 'aa9e24ac'],
  ['33813c2c', '9e41cc6b', 'c95af011', '342037c3', 'b61f3a70'],
  ['2ddb5f16', '4dcf8b35', '6c146627', '943a92e0', 'ff50559b'],
  ['2ddb5f16', '943a92e0', '33813c2c', '1106b1dc', '469cc1f8'],
  ['6c146627', '4dcf8b35', '78b8c86d', '2ddb5f16', '1106b1dc'],
  ['943a92e0', '7e9de748', 'ce9623ca', '8e9de748', '9623ca'],
  ['943a92e0', '78b8c86d', 'e3610c01', '7e9de748', 'd132587e'],
  ['943a92e0', '469cc1f8', '9e41cc6b', '1106b1dc', 'ec79b98a'],
  ['943a92e0', '78b8c86d', '2ddb5f16', '47ae523b', '7e9de748'],
  ['9e41cc6b', '943a92e0', '33813c2c', 'd132587e', 'b8121ab6'],
  ['9e41cc6b', '8ecf30ba', 'a67ae3a0', '469cc1f8', '4dcf8b35'],
  ['9e41cc6b', '4dcf8b35', '943a92e0', '2ddb5f16', '78b8c86d'],
  ['9e41cc6b', '4dcf8b35', '943a92e0', '2ddb5f16', '342037c3'],
  ['9e41cc6b', '943a92e0', '0f5525bd', '342037c3', '33813c2c'],
  ['7e9de748'

In [116]:
print(res_reranking['hit_rate'])
print(res_reranking['mrr'])

0.8298969072164949
0.7781786941580756


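This is significantly better than hybrid search alone: with the same top-5 cutoff, hit rate rose from 0.784 to 0.830 and MRR from 0.641 to 0.778. Let's check a few examples where the model failed to retrieve the correct document.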

In [117]:
failed_retrieval = [i for i, r in enumerate(res_reranking['relevance_total']) if not any(r)]

In [120]:
for i in random.sample(failed_retrieval, 5):
    print("Pytanie:", ground_truth_updated[i])
    expected_doc_id = ground_truth_updated[i]['document']
    print("Oczekiwany dokument:", [doc['question'] for doc in documents if doc['id'] == expected_doc_id])
    print("Oczekiwany dokument:", [doc['text'] for doc in documents if doc['id'] == expected_doc_id])
    
    returned_docs = res_reranking['results_total'][i]
    print("\nZnalezione dokumenty:")
    for j, doc_id in enumerate(returned_docs):
        print(j)
        doc = [doc for doc in documents if doc['id'] == doc_id][0]
        print(doc['question'])
        print(doc['text'])
        print("\n")
    
    print("\n\n")

Pytanie: {'question': 'Do jakiej kategorii wstawić ogłoszenie o sprzedaży wózka dziecięcego?', 'document': 'be4ae5c8'}
Oczekiwany dokument: ['Którą kategorię wybrać?']
Oczekiwany dokument: ['Sprawdź, w której kategorii powinien znaleźć się Twój przedmiot lub usługa. Wybierz kategorię:\nAntyki i Kolekcje\nDla Dzieci\nDla Firm\nDla Ukrainy\nDom i Ogród\nElektronika\nModa\nMotoryzacja\nMuzyka i Edukacja\nNieruchomości\nNoclegi\nPraca\nRolnictwo\nSport i Hobby\nWypożyczalnia\nUsługi\nZdrowie i Uroda\nZwierzęta']

Znalezione dokumenty:
0
Jak dodać ogłoszenie?
Aby dodać ogłoszenie:
Krok 1. Zaloguj się do swojego konta w OLX.
Krok 2. Jeśli korzystasz z:
strony internetowej OLX — kliknij przycisk Dodaj ogłoszenie — znajdziesz go w prawym, górnym rogu.
aplikacji mobilnej OLX — kliknij przycisk Dodaj — znajdziesz go na dole ekranu.
Dodaj_ogłoszenie.png
Jeśli grafika jest nieczytelna, kliknij prawy przycisk myszy i wybierz „Otwórz grafikę w nowej karcie".
Krok 3. Uzupełnij szczegóły ogłoszenia.
T

## Increasing the number of retrieved documents to 15

In the example above, none of the returned documents is a good candidate. Let's check whether the results improve when the LLM has more documents to choose from.

In [122]:
import re 

def search_reranked_retrieval(q, k=15, max_doc_candidate_length=200):
    
    doc_id = q['document']
    doc = [doc for doc in documents if doc['id'] == doc_id][0]
    found_docs = question_hybrid(q, k=k)
    found_docs_questions = ['id: ' + doc['id'] + ". " + doc['question'] + " " + doc['text'] for doc in found_docs]

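    # NOTE: prompt_template needs placeholders {doc11_question} ... {doc15_question};
    # str.format ignores keyword arguments that have no matching placeholder.
    prompt = prompt_template.format(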
        k=5,
        section=doc['section'],
        question=doc['question'],
        text=doc['text'],
        doc1_question=found_docs_questions[0][:max_doc_candidate_length],
        doc2_question=found_docs_questions[1][:max_doc_candidate_length],
        doc3_question=found_docs_questions[2][:max_doc_candidate_length],
        doc4_question=found_docs_questions[3][:max_doc_candidate_length],
        doc5_question=found_docs_questions[4][:max_doc_candidate_length],
        doc6_question=found_docs_questions[5][:max_doc_candidate_length],
        doc7_question=found_docs_questions[6][:max_doc_candidate_length],
        doc8_question=found_docs_questions[7][:max_doc_candidate_length],
        doc9_question=found_docs_questions[8][:max_doc_candidate_length],
        doc10_question=found_docs_questions[9][:max_doc_candidate_length],
        doc11_question=found_docs_questions[10][:max_doc_candidate_length],
        doc12_question=found_docs_questions[11][:max_doc_candidate_length],
        doc13_question=found_docs_questions[12][:max_doc_candidate_length],
        doc14_question=found_docs_questions[13][:max_doc_candidate_length],
        doc15_question=found_docs_questions[14][:max_doc_candidate_length],
    )

    response = client.chat.completions.create(
        model='gpt-4o',
        messages=[{"role": "user", "content": prompt}]
    )

    json_response = response.choices[0].message.content
    
    if not json_response:
        logger.info("Empty response from GPT-4o")
        return [doc['id'] for doc in found_docs[:5]]
    
    # Quick preprocessing: strip Markdown code fences, then parse the cleaned text
    json_response_preprocessed = re.sub(r'```json', '', json_response)
    json_response_preprocessed = re.sub(r'```', '', json_response_preprocessed)
    ids = json.loads(json_response_preprocessed)
    ids = [re.sub("id_", "", str(id)) for id in ids]
    ids = [re.sub("id:", "", str(id)) for id in ids]
    ids = [id.strip() for id in ids]
    ids = [id for id in ids if len(id) > 2]
    
    if len(ids) > 5:
        logger.warning("More than 5 ids returned from GPT-4o")
        return ids[:5]
    elif len(ids) < 5:
        logger.warning("Less than 5 ids returned from GPT-4o after preprocessing")
        logger.warning(json_response)
        tmp_ids = ids + [doc['id'] for doc in found_docs if doc['id'] not in ids]
        logger.info(f"Returning {tmp_ids[:5]}")
        return tmp_ids[:5]
    else:
        return ids
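
Note that `str.format` silently ignores keyword arguments that have no matching placeholder, so with a template that lists ten numbered slots, candidates 11 to 15 would not appear in the prompt. One way around this, sketched below, is to build the candidate list from however many documents were retrieved. The helper name `build_candidate_block`, the shortened template, and the toy documents are hypothetical and only for illustration.

```python
def build_candidate_block(found_docs, max_len=200):
    # One numbered line per candidate, truncated to max_len characters
    lines = []
    for position, doc in enumerate(found_docs, start=1):
        candidate = f"id: {doc['id']}. {doc['question']} {doc['text']}"
        lines.append(f"{position}. {candidate[:max_len]}")
    return "\n".join(lines)


# Shortened, hypothetical template with a single slot for all candidates
template = "Dokumenty:\n{candidates}"

toy_docs = [
    {"id": f"doc{n:02d}", "question": f"Question {n}?", "text": "Answer text."}
    for n in range(1, 16)
]
prompt = template.format(candidates=build_candidate_block(toy_docs))

# Demonstration of the pitfall: the extra keyword argument is ignored
ten_slot_style = "1. {doc1_question}".format(doc1_question="a", doc11_question="b")
print(prompt)
print(ten_slot_style)
```

With a single `{candidates}` slot the same template works for 10, 15, or any other number of retrieved documents.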

In [123]:
def evaluate_reranking_more_options(ground_truth, search_function):
    relevance_total = []
    result_total = []

    for q in tqdm(ground_truth):
        time.sleep(0.25)
        doc_id = q['document']
        try:
            results = search_function(q, k=15)
        except Exception as e:
            logging.error(e)
            # empty list of results of length 15
            results = ['' for _ in range(15)]
        relevance = [d == doc_id for d in results]
        relevance_total.append(relevance)
        result_total.append(results)

    return {
        'results_total': result_total,
        'relevance_total': relevance_total,
        'hit_rate': hit_rate(relevance_total),
        'mrr': mrr(relevance_total),
    }

In [None]:
res_reranking_more_options = evaluate_reranking_more_options(ground_truth_updated, search_reranked_retrieval)

In [125]:
res_reranking_more_options

{'results_total': [['2ddb5f16',
   'aa9e24ac',
   'ff50559b',
   'd30e6624',
   'e8caa77b'],
  ['e8caa77b', '6c146627', '4dcf8b35', '1106b1dc', 'aa9e24ac'],
  ['33813c2c', '342037c3', '1106b1dc', '469cc1f8', 'b61f3a70'],
  ['2ddb5f16', '6c146627', 'e8caa77b', 'bde8bc7c', 'd4e7a30b'],
  ['2ddb5f16', '33813c2c', '943a92e0', '1106b1dc', '469cc1f8'],
  ['2ddb5f16', '9e41cc6b', 'e8caa77b', '6c146627', '38edb241'],
  ['943a92e0', '7e9de748', 'ce9623ca', 'd132587e', 'c4479dd3'],
  ['943a92e0', 'd132587e', '78b8c86d', '2e80fa3d', '15542325'],
  ['943a92e0', '9e41cc6b', '1106b1dc', 'c95af011', '469cc1f8'],
  ['943a92e0', '2ddb5f16', '7e9de748', 'fbdcafed', '59d738e5'],
  ['9e41cc6b', '943a92e0', '342037c3', '33813c2c', 'ce9623ca'],
  ['9e41cc6b', 'd30e6624', '469cc1f8', '943a92e0', 'd132587e'],
  ['9e41cc6b', '4dcf8b35', '943a92e0', '2ddb5f16', 'e8caa77b'],
  ['9e41cc6b', '4dcf8b35', '943a92e0', '2ddb5f16', '7a78f438'],
  ['9e41cc6b', '943a92e0', '0f5525bd', '342037c3', '33813c2c'],
  ['7e9de74

In [126]:
print(res_reranking_more_options['hit_rate'])
print(res_reranking_more_options['mrr'])

0.8376288659793815
0.7984106529209622


This is slightly better again: hit rate 0.838 vs. 0.830 and MRR 0.798 vs. 0.778. We will use this approach in the final application.