In [2]:
from IPython.display import HTML

# necessary for scrolling horizontally
HTML("""
<style>
    .output_scroll {overflow-x: auto;}
</style>
""")

#### This is the same notebook as test.ipynb, but the pipeline is run with GPT-4o instead of GPT-4o-mini.

In [3]:
from src.agent import Agent

The Agent can also be run without any reranking, but then many irrelevant documents are retrieved and sent to the LLM, which is undesirable. Therefore, only the following experiments are run:

1. Semantic retrieval and reranking with cross encoder only (without Elasticsearch).
2. Semantic retrieval and reranking with cross encoder and Elasticsearch scores. The two scores are fused with weights according to the formula: $$\text{score} = \alpha \cdot \text{el\_score} + (1 - \alpha) \cdot \text{ce\_score},$$ where $\text{el\_score}$ is the Elasticsearch score and $\text{ce\_score}$ is the cross-encoder score. Both scores are squashed into the $[0, 1]$ interval, so the fused score also lies in that interval. This experiment is run with $\alpha = 0.7$.
3. Semantic retrieval and reranking with Elasticsearch scores only. The scores are fused as in the previous experiment, but the cosine similarity of the retrieved documents is used in place of the cross-encoder score. Once again, $\alpha = 0.7$.
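
The weighted fusion in experiment 2 can be sketched as follows. This is only an illustration: the helper names `sigmoid`, `min_max` and `fuse` are hypothetical and not part of `src.agent`. The notebook does not say how each score is squashed into $[0, 1]$, so the sigmoid (for cross-encoder logits) and min-max scaling (for Elasticsearch scores) used here are assumptions.

```python
import math


def sigmoid(x: float) -> float:
    """Squash a raw cross-encoder logit into (0, 1). Assumed squashing, not confirmed by the source."""
    return 1.0 / (1.0 + math.exp(-x))


def min_max(scores: list[float]) -> list[float]:
    """Rescale a batch of scores into [0, 1]. Assumed squashing for Elasticsearch scores."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # A constant batch carries no ranking information; map it to 0.0.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse(el_score: float, ce_score: float, alpha: float = 0.7) -> float:
    """score = alpha * el_score + (1 - alpha) * ce_score, with both inputs in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * el_score + (1.0 - alpha) * ce_score


# Hypothetical raw scores for three retrieved documents.
el_scores = min_max([12.4, 3.1, 7.8])
ce_scores = [sigmoid(x) for x in (2.0, -4.0, 0.0)]
fused = [fuse(e, c) for e, c in zip(el_scores, ce_scores)]
```

Because the fused score is a convex combination of two values in $[0, 1]$, it stays in $[0, 1]$ for any $\alpha$ in that range.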

In all experiments, documents with a score between $0.2$ and $0.9$ are checked by the LLM and updated if necessary, while documents with a score above $0.9$ are updated by the LLM right away. Raising the confidence threshold (the upper bound, $0.9$ here) sends more documents to the check step and fewer to automatic update.
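
The thresholding described above can be sketched like this. The function `route` and the `"skip"` label are hypothetical; the `"check"` and `"update"` labels mirror the group names used later with `agent.display_changes`. How the Agent treats scores exactly equal to a threshold is not stated, so the boundary handling below is an assumption.

```python
def route(score: float, check_threshold: float = 0.2, update_threshold: float = 0.9) -> str:
    """Assign a document to a group based on its fused score."""
    if score > update_threshold:
        return "update"  # updated by the LLM right away
    if score >= check_threshold:
        return "check"   # checked by the LLM, updated only if necessary
    return "skip"        # assumed: low-scoring documents are left untouched


# Hypothetical scores for four documents.
groups = [route(s) for s in (0.95, 0.6, 0.2, 0.05)]
```

With `update_threshold` raised to, say, `0.95`, a document scoring `0.93` would move from the `"update"` group to the `"check"` group.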

## 1. Semantic retrieval and reranking with cross encoder only (without Elasticsearch)

In [3]:
config = {
    "embedding_model_name": "text-embedding-3-small",
    "llm_model_name": "gpt-4o",
    "dim": 1536,
    "reranker_model_name": "cross-encoder/ms-marco-MiniLM-L-6-v2",
    "use_elastic_search": False,
    "use_cross_encoder_reranking": True,
}

In [6]:
def create_agent(config_dict):
    agent = Agent(**config_dict)
    return agent

In [5]:
from IPython.display import clear_output

query = "We removed the ability to archive queries, and instead added the ability to completely delete them. Update all relevant knowledge"
agent = create_agent(config)
await agent.run(query)
clear_output(wait=True)
agent.print_tree()

As we can see, semantic search with cross-encoder reranking alone is not very effective: it produces very low scores, which makes it hard to set a meaningful threshold. Let's try combining it with Elasticsearch.

## 2. Semantic retrieval and reranking with cross encoder and Elasticsearch scores

In [6]:
config_elastic = {
    "embedding_model_name": "text-embedding-3-small",
    "llm_model_name": "gpt-4o",
    "dim": 1536,
    "reranker_model_name": "cross-encoder/ms-marco-MiniLM-L-6-v2",
    "use_elastic_search": True,
    "use_cross_encoder_reranking": True,
    "weight_elastic": 0.7
}

In [7]:
from IPython.display import clear_output
import os
from dotenv import load_dotenv
from elasticsearch import Elasticsearch

load_dotenv()

# Clear Elasticsearch index (in case it was updated in the previous run)
es = Elasticsearch(os.environ.get('ELASTICSEARCH_URL'))
index_name = "documents"
if es.indices.exists(index=index_name):
    es.indices.delete(index=index_name)
    print(f"Index '{index_name}' deleted.")
    clear_output(wait=True)

query = "We removed the ability to archive queries, and instead added the ability to completely delete them. Update all relevant knowledge"
agent = create_agent(config_elastic)
await agent.run(query)
clear_output(wait=True)
agent.print_tree()

### Updates from the 'update' group

In [8]:
agent.display_changes(group='update')

### Updates from the 'check' group

In [9]:
agent.display_changes(group='check')

As we can see, adding Elasticsearch scores to the reranking helps identify relevant documents. Now let's see what happens if we remove the cross encoder from the reranking process.

## 3. Semantic retrieval and reranking with Elasticsearch scores only

In [4]:
config_elastic_only = {
    "embedding_model_name": "text-embedding-3-small",
    "llm_model_name": "gpt-4o",
    "dim": 1536,
    "use_elastic_search": True,
    "use_cross_encoder_reranking": False,
    "weight_elastic": 0.7
}
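
With `use_cross_encoder_reranking` set to `False`, the cross-encoder term in the fusion formula is replaced by the cosine similarity between the query and document embeddings. A minimal sketch of that variant is given below. The helper names are hypothetical, embeddings are plain Python lists for simplicity, and clamping negative similarities to $0$ is an assumption, since the notebook does not say how cosine similarity is mapped into $[0, 1]$.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def fuse_with_cosine(el_score: float, query_vec: list[float], doc_vec: list[float],
                     alpha: float = 0.7) -> float:
    """score = alpha * el_score + (1 - alpha) * cos_sim, with cos_sim clamped to [0, 1] (assumed)."""
    cos_sim = max(0.0, cosine_similarity(query_vec, doc_vec))
    return alpha * el_score + (1.0 - alpha) * cos_sim


# Hypothetical 2-dimensional embeddings; real ones would have 1536 dimensions.
score = fuse_with_cosine(0.8, [1.0, 0.0], [1.0, 0.0])
```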

In [7]:
from IPython.display import clear_output
import os
from dotenv import load_dotenv
from elasticsearch import Elasticsearch

load_dotenv()

# Clear Elasticsearch index (in case it was updated in the previous run)
es = Elasticsearch(os.environ.get('ELASTICSEARCH_URL'))
index_name = "documents"
if es.indices.exists(index=index_name):
    es.indices.delete(index=index_name)
    print(f"Index '{index_name}' deleted.")
    clear_output(wait=True)

query = "We removed the ability to archive queries, and instead added the ability to completely delete them. Update all relevant knowledge"
agent = create_agent(config_elastic_only)
await agent.run(query)
clear_output(wait=True)
agent.print_tree()

### Updates from the 'update' group

In [8]:
agent.display_changes(group='update')

### Updates from the 'check' group

In [10]:
agent.display_changes(group='check')