<a href="https://colab.research.google.com/github/mertcan-basut/nlp/blob/main/retrieve_and_rerank.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Background Information

### Context recall

**recall** *(retrieval evaluation metric)*: the fraction of all relevant documents that the retriever actually returns.

`recall@K = # of relevant docs among the top K returned / # of relevant docs in the dataset`
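A minimal sketch of the formula in Python. It assumes the retriever's output is a ranked list of document IDs and that the relevant documents are known as a set of IDs. The IDs below are made up for illustration.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of all relevant documents that appear in the top-k retrieved results."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("recall@K is undefined when there are no relevant documents")
    top_k = set(retrieved_ids[:k])  # only the first K results count
    hits = len(top_k & relevant)    # relevant docs that were actually returned
    return hits / len(relevant)


# Hypothetical example: 3 relevant documents exist, 2 of them appear in the top 5.
retrieved = ["d7", "d2", "d9", "d4", "d1", "d3"]
relevant = {"d2", "d3", "d4"}
print(recall_at_k(retrieved, relevant, k=5))
```

Raising K can only keep recall the same or increase it. At `k=6` all three relevant documents are included and recall reaches 1.0.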

### LLM recall

![LLM recall](https://www.pinecone.io/_next/image/?url=https%3A%2F%2Fcdn.sanity.io%2Fimages%2Fvr8gru94%2Fproduction%2Fca206b6ada9163bffad313e0e18feee0b460c768-1212x688.png&w=1920&q=75)

**LLM recall** refers to the ability of an LLM to find information in the text placed within its context window.

Recall also depends on where the information sits. Information placed in the middle of the context window is recalled worse than information near its start or end. In some cases, the LLM does worse than if that information had not been provided at all.

### Two-stage retrieval

A **reranking model (cross-encoder)** is a model that takes a query-document pair and outputs a similarity score. Rerankers are much more accurate than embedding models (bi-encoders), but they are slow: every candidate document has to pass through the model together with the query at query time. Two-stage retrieval addresses this. A fast retriever first selects a small set of candidates from the large collection, and the reranker then rescores only those candidates.
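The two-stage flow can be sketched as a function that accepts a cheap first-stage scorer and an expensive reranker as callables. This is a minimal sketch under stated assumptions. `two_stage_search`, `word_overlap` and `overlap_ratio` are hypothetical names. The toy word-overlap scorers only stand in for a real bi-encoder and cross-encoder.

```python
from typing import Callable, List, Tuple


def two_stage_search(
    query: str,
    corpus: List[str],
    fast_score: Callable[[str, str], float],
    rerank_score: Callable[[str, str], float],
    k: int = 25,
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Retrieve k candidates with a cheap scorer, then rerank them with an expensive one."""
    # Stage 1: score the whole corpus cheaply and keep only the k best candidates.
    candidates = sorted(corpus, key=lambda doc: fast_score(query, doc), reverse=True)[:k]
    # Stage 2: run the expensive scorer on the small candidate set only.
    reranked = [(doc, rerank_score(query, doc)) for doc in candidates]
    reranked.sort(key=lambda pair: pair[1], reverse=True)
    return reranked[:top_n]


def word_overlap(query: str, doc: str) -> float:
    # Toy stage-1 scorer (stand-in for a bi-encoder): number of distinct shared words.
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def overlap_ratio(query: str, doc: str) -> float:
    # Toy stage-2 scorer (stand-in for a cross-encoder): share of document words found in the query.
    query_words = set(query.lower().split())
    doc_words = doc.lower().split()
    return sum(word in query_words for word in doc_words) / len(doc_words)


corpus = [
    "a reranker reads the query and the document and outputs a relevance score for that query and document pair",
    "reranker score per query",
    "embeddings are computed before query time",
    "sunny weather",
]
query = "reranker query document score"

results = two_stage_search(query, corpus, word_overlap, overlap_ratio, k=2, top_n=2)
for doc, score in results:
    print(f"{score:.2f}  {doc}")
```

Stage 1 ranks the long first document highest, but the second-stage scorer moves the short, focused document to the top. Only `k` documents ever reach the expensive scorer, which keeps reranking affordable on a large corpus.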

![reranker/cross-encoder](https://www.pinecone.io/_next/image/?url=https%3A%2F%2Fcdn.sanity.io%2Fimages%2Fvr8gru94%2Fproduction%2F9f0d2f75571bb58eecf2520a23d300a5fc5b1e2c-2440x1100.png&w=3840&q=75)

A reranker feeds the raw query and document text directly into one large transformer computation, so less information is lost. Rerankers run at user query time, which lets them analyze the document's meaning specifically with respect to the user's query.

![embedding model/bi-encoder](https://www.pinecone.io/_next/image/?url=https%3A%2F%2Fcdn.sanity.io%2Fimages%2Fvr8gru94%2Fproduction%2F4509817116ab72e27bae809c38cb48fbf1578b5d-2760x1420.png&w=3840&q=75)

Bi-encoders must compress all of the possible meanings of a document into a single vector, which loses information. Bi-encoders also have no context on the query, because document embeddings are created before user query time.
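The practical difference between the two architectures is where the model calls happen. The sketch below counts those calls using toy stand-in models. `ToyBiEncoder` and `ToyCrossEncoder` are hypothetical classes that do no real NLP.

- The bi-encoder encodes every document once, before any query arrives, and needs only one extra call per query.
- The cross-encoder needs one call per (query, document) pair at query time.

```python
class ToyBiEncoder:
    """Stand-in bi-encoder: maps one text to one fixed-size vector (letter counts)."""

    def __init__(self):
        self.calls = 0

    def encode(self, text):
        self.calls += 1
        vec = [0] * 26
        for ch in text.lower():
            if "a" <= ch <= "z":
                vec[ord(ch) - ord("a")] += 1
        return vec


class ToyCrossEncoder:
    """Stand-in cross-encoder: sees the query and the document together in one call."""

    def __init__(self):
        self.calls = 0

    def score(self, query, doc):
        self.calls += 1
        query_words = set(query.lower().split())
        return sum(word in query_words for word in doc.lower().split())


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


corpus = [f"document number {i}" for i in range(1000)]
queries = ["first query", "second query", "third query"]

# Bi-encoder: document vectors are computed once, before any query is seen.
bi = ToyBiEncoder()
doc_vectors = [bi.encode(doc) for doc in corpus]
for query in queries:
    q_vec = bi.encode(query)  # one model call per query
    scores = [dot(q_vec, d_vec) for d_vec in doc_vectors]  # cheap vector math only

# Cross-encoder: every (query, document) pair goes through the model at query time.
cross = ToyCrossEncoder()
for query in queries:
    scores = [cross.score(query, doc) for doc in corpus]

print("bi-encoder model calls:", bi.calls)        # 1000 + 3 = 1003
print("cross-encoder model calls:", cross.calls)  # 3 * 1000 = 3000
```

The number of cross-encoder calls grows with corpus size times query count. This is why the cross-encoder is reserved for the small candidate set in the second stage.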

### Sources
🌐https://www.pinecone.io/learn/series/rag/rerankers/

## Implementation

In [None]:
!pip install -q langchain langchain-openai langchain-community
!pip install -q chromadb
!pip install -q python-dotenv

In [2]:
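# replace each "editme" with your Azure OpenAI values; the Azure client also requires an API version
!echo "AZURE_OPENAI_API_KEY=editme" > .env
!echo "AZURE_OPENAI_ENDPOINT=editme" >> .env
!echo "OPENAI_API_VERSION=editme" >> .env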

In [3]:
from langchain_openai.embeddings import AzureOpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document as LangChainDocument

import json

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv(), override=True) # read local .env file

from google.colab import drive
drive.mount("/content/drive", force_remount=True)

Mounted at /content/drive


### Prepare data and vector store

In [4]:
with open("/content/drive/MyDrive/data/corpus_dataset.json", 'r') as f:
  data = json.load(f)

In [5]:
docs = [LangChainDocument(page_content=e['text'], metadata={'topic': e['topic']}) for e in data]

In [6]:
vectordb = Chroma.from_documents(
  documents=docs,
  embedding=AzureOpenAIEmbeddings(model="text-embedding-ada-002"),
  persist_directory="/content/drive/MyDrive/data/chroma/"
)
vectordb._collection.count()

28

In [None]:
query = ""

### Similarity metrics

In [None]:
vectordb = Chroma(persist_directory="/content/drive/MyDrive/data/chroma/")
embeddings = vectordb.get(include=["embeddings"])['embeddings']

In [None]:
# cosine
# l2
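The placeholder cell above names two metrics. Below is a minimal pure-Python sketch of both. It assumes each embedding is a sequence of floats, such as one entry of the `embeddings` list fetched above. NumPy would be the usual choice for real workloads.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: 1 = same direction, 0 = orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def l2_distance(u, v):
    """Euclidean distance: 0 = identical vectors, larger = less similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


a = [1.0, 0.0]
b = [1.0, 1.0]
print(cosine_similarity(a, b))
print(l2_distance(a, b))  # 1.0
```

Cosine is a similarity, so higher is better, while L2 is a distance, so lower is better. OpenAI embeddings are normalized to length 1. For unit-length vectors, ||u - v||² = 2 - 2·cos(u, v), so both metrics rank neighbours identically.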

### Lexical search

#### BM25
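Below is a compact sketch of Okapi BM25 scoring in pure Python. It assumes whitespace tokenization and the common defaults `k1 = 1.5` and `b = 0.75`. The IDF variant used here puts `+ 1` inside the log, as Lucene does, which keeps scores non-negative. The class name and the tiny corpus are made up for illustration.

```python
import math
from collections import Counter


class BM25:
    """Okapi BM25 over a small in-memory corpus (whitespace tokenization)."""

    def __init__(self, corpus, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [doc.lower().split() for doc in corpus]
        self.term_freqs = [Counter(tokens) for tokens in self.docs]
        self.n_docs = len(self.docs)
        self.avgdl = sum(len(tokens) for tokens in self.docs) / self.n_docs
        # Document frequency: in how many documents each term occurs.
        self.doc_freq = Counter(term for tokens in self.docs for term in set(tokens))

    def idf(self, term):
        n = self.doc_freq.get(term, 0)
        return math.log((self.n_docs - n + 0.5) / (n + 0.5) + 1)

    def score(self, query, index):
        tf = self.term_freqs[index]
        doc_len = len(self.docs[index])
        total = 0.0
        for term in query.lower().split():
            f = tf.get(term, 0)
            if f == 0:
                continue
            # k1 saturates repeated terms; b normalizes for document length.
            denom = f + self.k1 * (1 - self.b + self.b * doc_len / self.avgdl)
            total += self.idf(term) * f * (self.k1 + 1) / denom
        return total

    def search(self, query, k=3):
        scores = [(i, self.score(query, i)) for i in range(self.n_docs)]
        return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


corpus = [
    "rerankers reorder retrieved documents",
    "bm25 is a lexical ranking function",
    "embedding models map text to vectors",
]
bm25 = BM25(corpus)
print(bm25.search("lexical ranking", k=2))
```

BM25 only matches exact tokens. A query that uses different words for the same concept scores zero, which is the gap semantic search fills.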

### Semantic search

#### LangChain similarity search

In [7]:
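# text queries must be embedded with the same model that was used to build the store
vectordb = Chroma(persist_directory="/content/drive/MyDrive/data/chroma/", embedding_function=AzureOpenAIEmbeddings(model="text-embedding-ada-002"))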

In [None]:
# embedding_model = AzureOpenAIEmbeddings(model="text-embedding-ada-002")
# embedding_model.embed_query("Hello")

In [36]:
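# returns (Document, score) pairs; with Chroma's default "l2" space the score is a distance, so lower means more similar
documents = vectordb.similarity_search_with_score(query="Hello!", k=10)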

#### HuggingFace Bi-Encoder

### Reranking

#### HuggingFace Cross-Encoder

#### OpenAI Completions as Cross-Encoder
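One way to use a completion model as a cross-encoder is pointwise scoring. You put the query and one document into a single prompt, ask for a relevance grade, and parse the number from the reply. The sketch below is hypothetical throughout. The prompt wording and the 0 to 10 scale are assumptions. The `complete` callable stands in for whatever function sends the prompt to the Azure OpenAI deployment. A fake model is used so the example runs offline.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical prompt: the wording and the 0-10 scale are assumptions, not a tested recipe.
PROMPT = (
    "Rate how relevant the document is to the query on a scale from 0 to 10.\n"
    "Answer with a single integer.\n\n"
    "Query: {query}\n"
    "Document: {document}\n"
    "Relevance:"
)


def parse_score(reply: str) -> float:
    """Take the first integer in the model's reply, capped at 10; 0 if none is found."""
    match = re.search(r"\d+", reply)
    if match is None:
        return 0.0
    return float(min(int(match.group()), 10))


def llm_rerank(query: str, documents: List[str], complete: Callable[[str], str]) -> List[Tuple[str, float]]:
    # As with a cross-encoder, each call shows the model the query and one document together.
    scored = [(doc, parse_score(complete(PROMPT.format(query=query, document=doc)))) for doc in documents]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def fake_complete(prompt: str) -> str:
    # Offline stand-in for a real completion call: "relevant" if the document mentions "rerank".
    document_line = prompt.split("Document: ")[1].split("\n")[0]
    return "9" if "rerank" in document_line else "2"


docs = ["Bananas are yellow.", "A reranker scores query-document pairs."]
print(llm_rerank("What does a reranker do?", docs, fake_complete))
```

This approach makes one LLM call per candidate document, so it is even slower than a dedicated cross-encoder. It is only practical as the second stage over a short candidate list.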

### Two-stage retrieval

In [None]:
# langchain compression