Import libraries and instantiate scorer

In [1]:
# Add root path
import sys, os
root_path = os.path.abspath(os.path.join(".."))
if root_path not in sys.path:
    sys.path.append(root_path)

# Instantiate evaluator
from langchain_ollama import OllamaLLM
from evaluation.test_bench.scorer import Scorer

# bleurt_checkpoint = "../../data/BLEURT-20" # Folder containing regression model and tools to run Bleurt scorer
bert_model_type = "bert-base-uncased"

model = 'mistral-small3.1:24b'
base_url = 'http://192.168.249.7:11434'
llm = OllamaLLM(base_url=base_url, model=model)

scorer = Scorer(llm=llm, bert_model_type=bert_model_type)

Load question set and reference contexts

In [2]:
questions_path = "../evaluation/test_set/question_set.json"
ref_path = "../evaluation/test_set/target_context.json"

questions = scorer.load_questions(questions_path)
refs = scorer.load_references(ref_path)

Instantiate embedding model

In [3]:
from langchain_huggingface import HuggingFaceEmbeddings

embedding_model = HuggingFaceEmbeddings(  # Instantiate the embedding model
    model_name="Alibaba-NLP/gte-multilingual-base",
    model_kwargs={"device": "cpu", "trust_remote_code": True},
    encode_kwargs={"normalize_embeddings": True},
)




Some weights of the model checkpoint at Alibaba-NLP/gte-multilingual-base were not used when initializing NewModel: ['classifier.bias', 'classifier.weight']
- This IS expected if you are initializing NewModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing NewModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
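
The `normalize_embeddings` option set above asks the encoder to return unit-length vectors. For unit-length vectors the dot product equals the cosine similarity, which the minimal sketch below illustrates in plain Python. The helper names (`l2_normalize`, `dot`, `cosine`) and the sample vectors are illustrative assumptions, not part of this project.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity computed from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Hypothetical sample vectors, chosen only for illustration
a, b = [3.0, 4.0], [1.0, 2.0]

# After normalization, the dot product matches the cosine similarity
same = math.isclose(dot(l2_normalize(a), l2_normalize(b)), cosine(a, b))
```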


Instantiate databases

In [4]:
from langchain_chroma import Chroma
from pymongo import MongoClient

community_db = Chroma(
    collection_name="community_db",
    embedding_function=embedding_model,
    persist_directory="../../data/db/rag",
)

items_db = Chroma( # All items in collection
    collection_name="items_db",
    embedding_function=embedding_model,
    persist_directory="../../data/db/testbench",
)

chunked_db = Chroma( # For fine grained similarity search
    collection_name="chunked_db",
    embedding_function=embedding_model,
    persist_directory="../../data/db/chunked",
)

client = MongoClient("mongodb://192.168.211.96:27017/")
collection = client["metadata_db"]["metadata_collection"]  # Metadata collection, used for the pre-filtering step

Instantiate retriever

In [5]:
from retrieval.retrievers.adaptative_retriever import AdaptativeRetriever

retriever = AdaptativeRetriever(llm, items_db, community_db, chunked_db, collection, top_k=20)

Query system and retrieve relevant documents

In [6]:
question_id = 10  # Avoid shadowing the built-in id()
question = questions[question_id]
target = refs[question_id]
print(question)

What are the key services required for managing commercial vehicle operations?


In [7]:
is_diagram, context, ids = retriever.retrieve(question)
print(context)

Extracted filter is : {'item type': 'service', 'community': 'commercial vehicle operations'}
Commercial Vehicle Administration needs to be able to perform electronic clearance of commercial vehicle credentials and safety records of a commercial vehicle and its driver in order to maintain the smooth flow of goods through its roadways.
	Commercial Vehicle Administration needs to be able to inform the appropriate parties of issues dealing with the clearance of a commercial vehicle or its driver in order to maintain the smooth flow of goods through its roadways..
The 'cvo03: electronic clearance' service has 5 needs, each one linked to functional objects and some of their requirements. The needs are the following : 
	 Commercial Vehicle Administration needs to be able to determine the weight and other characteristics of commercial vehicles operating on its roadways as part of the clearance process.
	Commercial Vehicle Administration needs to collect and maintain electronic records of comme
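
The retriever logs an extracted filter before returning the context. As a rough illustration of what a metadata pre-filtering step can look like, the sketch below keeps only the records whose fields match every key of the filter. This is not the `AdaptativeRetriever` implementation: the `prefilter` helper, the `id` field, and the sample records are hypothetical.

```python
def matches(record, metadata_filter):
    """Return True when the record carries every key/value pair of the filter."""
    return all(record.get(key) == value for key, value in metadata_filter.items())


def prefilter(records, metadata_filter):
    """Return the ids of the records that satisfy the filter."""
    return [record["id"] for record in records if matches(record, metadata_filter)]


# Hypothetical metadata records, shaped after the filter printed above
records = [
    {"id": "a", "item type": "service", "community": "commercial vehicle operations"},
    {"id": "b", "item type": "service", "community": "traffic management"},
    {"id": "c", "item type": "functional object", "community": "commercial vehicle operations"},
]

metadata_filter = {"item type": "service", "community": "commercial vehicle operations"}
candidate_ids = prefilter(records, metadata_filter)
```

Only the ids in `candidate_ids` would then be passed on to a similarity search, which narrows the search space before any embedding comparison takes place.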

Evaluate retrieved context by comparing it with the reference context

In [8]:
scores = scorer.evaluate(target, context, question)

In [9]:
for k, v in scores.items():
    print(f"{k} score : {v}")

bleurt score : not initialize
bert score : [0.3, 0.49, 0.39]
meteor score : 0.2
judge score : {'grade': 0.5, 'explanation': 'The answer addresses some aspects of managing commercial vehicle operations but is incomplete and unclear. It does not list all 22 key services required for managing commercial vehicle operations as specified in the reference answer. Instead, it provides detailed information on specific services (e.g., cvo03: electronic clearance, cvo07: roadside cvo safety, cvo05: commercial vehicle parking) but fails to provide a comprehensive list of all services. The answer is also unclear in its structure and focus, making it difficult to determine the full scope of services involved.'}
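
The cells above score a single question. To get a figure for the whole test set, the same retrieve-then-evaluate steps can be looped over every question, as in the sketch below. It assumes that `questions` and `refs` are index-aligned sequences and that the chosen metric is a plain number (as the `meteor` entry is in the output above). The `evaluate_all` helper is hypothetical and takes the retrieval and scoring functions as arguments so it can be tried with stubs.

```python
from statistics import mean


def evaluate_all(questions, refs, retrieve, evaluate, metric="meteor"):
    """Average one numeric metric over all (question, reference) pairs.

    retrieve(question) is expected to return (is_diagram, context, ids);
    evaluate(target, context, question) is expected to return a dict of scores.
    """
    values = []
    for question, target in zip(questions, refs):
        _, context, _ = retrieve(question)
        scores = evaluate(target, context, question)
        values.append(float(scores[metric]))
    return mean(values)
```

With the objects defined in this notebook, a call would look like `evaluate_all(questions, refs, retriever.retrieve, scorer.evaluate)`.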
