## Hybrid Search

Qdrant supports hybrid search by combining search results from sparse and dense vectors.
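One common way to combine the two result lists is Reciprocal Rank Fusion (RRF), which Qdrant's Query API offers as a server-side fusion option. The sketch below is a plain-Python illustration of RRF, not Qdrant's implementation. Each document earns 1 / (k + rank) from every list it appears in. The constant k = 60 is an assumption (a value often used with RRF), and the document ids are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids with Reciprocal Rank Fusion (RRF).

    A document scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1; a higher fused score ranks higher.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # sort by fused score (descending); ties are broken by id so output is deterministic
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# hypothetical result lists from a dense query and a sparse query
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_c", "doc_a", "doc_d"]
print(reciprocal_rank_fusion([dense_hits, sparse_hits]))
# ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

`doc_a` and `doc_c` rank first because both retrievers returned them, which is the basic intuition behind hybrid search.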

Dense vectors are the ones you have probably already been using: embedding models from OpenAI, BGE, SentenceTransformers, etc. typically produce dense embeddings. They represent a piece of text as a long, fixed-length list of numbers, nearly all of them non-zero, and can capture rich semantics across the entire text.

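To make "a long list of numbers" concrete, here is a minimal sketch of how two dense vectors are usually compared, using cosine similarity. The 4-dimensional vectors are invented for illustration. A real model such as BAAI/bge-base-en-v1.5, used later in this notebook, produces 768-dimensional vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length dense vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# toy "embeddings": every dimension carries some (non-zero) value
query = [0.1, 0.3, -0.2, 0.9]
doc_similar = [0.2, 0.25, -0.1, 0.8]
doc_unrelated = [-0.7, 0.1, 0.6, -0.2]

print(round(cosine_similarity(query, doc_similar), 3))
print(round(cosine_similarity(query, doc_unrelated), 3))
```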
Sparse vectors are slightly different. They are generated by a specialized approach or model (TF-IDF, BM25, SPLADE, etc.), and most of their entries are zero, which is what makes them sparse. Sparse vectors are good at capturing specific keywords and similar fine-grained details.

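The sketch below shows the sparse layout with a toy encoder that simply counts vocabulary words. This is an assumption for illustration: SPLADE instead learns term weights with a neural model. Only the non-zero (index, weight) pairs are stored, similar in spirit to the indices/values pairs of Qdrant's sparse vectors. The dot product only visits indices that both vectors share, which is why exact keyword overlap drives the score. The vocabulary and texts are hypothetical.

```python
from collections import Counter


def to_sparse(text, vocab):
    """Toy sparse encoder: raw term counts keyed by vocabulary index.

    Only terms that occur get an entry; every other dimension is implicitly 0.
    """
    counts = Counter(text.lower().split())
    return {vocab[term]: float(count) for term, count in counts.items() if term in vocab}


def sparse_dot(a, b):
    """Dot product that only visits indices present in both sparse vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(weight * b[idx] for idx, weight in a.items() if idx in b)


vocab = {term: i for i, term in enumerate(
    ["dalbavancin", "dosage", "creatinine", "clearance", "hemodialysis", "rash"])}
doc = to_sparse("dosage adjustment when creatinine clearance is low", vocab)
query = to_sparse("creatinine clearance dosage", vocab)

print(doc)                     # {1: 1.0, 2: 1.0, 3: 1.0}
print(sparse_dot(query, doc))  # 3.0
```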
This notebook walks through setting up and customizing hybrid search with Qdrant and SPLADE sparse-model variants such as "prithvida/Splade_PP_en_v1" from Hugging Face.

In [1]:
# !pip install llama-index
# !pip install llama-index-vector-stores-qdrant llama-index-readers-file llama-index-embeddings-fastembed llama-index-llms-openai
# !pip install -U qdrant_client fastembed
# !pip install python-dotenv
# !pip install ragas
# !pip install trulens_eval

In [2]:
import logging
import sys
import os
from dotenv import load_dotenv
from IPython.display import Markdown, display

# qdrant official client
from qdrant_client import QdrantClient, AsyncQdrantClient

# LlamaIndex dependencies
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.embeddings.fastembed import FastEmbedEmbedding

# use BAAI/bge-base-en-v1.5 as the dense embedding model, run locally through FastEmbed
Settings.embed_model = FastEmbedEmbedding(model_name="BAAI/bge-base-en-v1.5")
# embed_model = FastEmbedEmbedding(model_name="BAAI/bge-base-en-v1.5", max_length=1024)

# load all environment variables
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
QDRANT_CLOUD_ENDPOINT = os.getenv("QDRANT_CLOUD_ENDPOINT")
QDRANT_API_KEY = os.getenv("QDRANT_API_KEY")


In [3]:
# load the documents with SimpleDirectoryReader
from llama_index.core import Document
reader = SimpleDirectoryReader("./data/69_markdown_test/", recursive=True)
documents = reader.load_data(show_progress=True)

# optionally combine all documents into a single document before chunking and splitting
# (kept as a list, because pipeline.run(documents=...) expects a list)
# documents = [Document(text="\n\n".join([doc.text for doc in documents]))]

Loading files: 100%|██████████| 1/1 [00:00<00:00,  5.59file/s]


## Setting up Vector Database

We will use Qdrant as the vector database.
There are four ways to initialize the Qdrant client:

1. In-memory
```python
client = qdrant_client.QdrantClient(location=":memory:")
```
2. Disk
```python
client = qdrant_client.QdrantClient(path="./data")
```
3. Self-hosted or Docker
```python
client = qdrant_client.QdrantClient(
    # alternatively: url="http://<host>:<port>"
    host="localhost",
    port=6333,
)
```

4. Qdrant cloud
```python
client = qdrant_client.QdrantClient(
    url=QDRANT_CLOUD_ENDPOINT,
    api_key=QDRANT_API_KEY,
)
```

For this notebook, we will use Qdrant Cloud.

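All four modes differ only in the keyword arguments passed to the client. One way to switch between them without editing code is therefore to derive those arguments from environment variables. The helper below is hypothetical: `QDRANT_CLOUD_ENDPOINT` and `QDRANT_API_KEY` match this notebook's `.env`, while `QDRANT_HOST`, `QDRANT_PORT` and `QDRANT_PATH` are assumed names.

```python
import os


def qdrant_client_kwargs(env=None):
    """Hypothetical helper: choose one of the four Qdrant client modes from env vars.

    Priority: Qdrant Cloud, then self-hosted host/port, then on-disk path,
    falling back to an in-memory instance.
    """
    env = os.environ if env is None else env
    if env.get("QDRANT_CLOUD_ENDPOINT"):
        return {"url": env["QDRANT_CLOUD_ENDPOINT"], "api_key": env.get("QDRANT_API_KEY")}
    if env.get("QDRANT_HOST"):
        return {"host": env["QDRANT_HOST"], "port": int(env.get("QDRANT_PORT", "6333"))}
    if env.get("QDRANT_PATH"):
        return {"path": env["QDRANT_PATH"]}
    return {"location": ":memory:"}


print(qdrant_client_kwargs({"QDRANT_HOST": "localhost"}))
```

You could then create both clients from the same settings with `QdrantClient(**qdrant_client_kwargs())` and `AsyncQdrantClient(**qdrant_client_kwargs())`. Keep in mind that in `:memory:` mode the two clients do not share data.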
In [4]:
# creating a qdrant client instance
client = QdrantClient(
    # location=":memory:" is handy for fast, lightweight experiments:
    # it needs no Qdrant deployment but requires qdrant-client >= 1.1.1
    # location=":memory:",
    # for Qdrant Cloud (used here), pass the cluster URL and API key
    url=QDRANT_CLOUD_ENDPOINT,
    api_key=QDRANT_API_KEY,
    # for a self-hosted instance, use host and port instead:
    # host="localhost",
    # port=6333,
    # for local on-disk storage:
    # path="./db/",
)

# setting up the asynchronous client with the same connection settings
aclient = AsyncQdrantClient(
    # location=":memory:",  # note: in-memory data is not shared with `client`
    url=QDRANT_CLOUD_ENDPOINT,
    api_key=QDRANT_API_KEY,
    # host="localhost",
    # port=6333,
    # path="./db/",
)

Settings.chunk_size = 512

## enable hybrid search here: the store also creates and queries sparse vectors
vector_store = QdrantVectorStore(
    client=client,
    aclient=aclient,
    collection_name="2_Hybrid_RAG",
    enable_hybrid=True,
    batch_size=20,
)

Both client and aclient are provided. If using `:memory:` mode, the data between clients is not synced.



In [5]:
## ingesting data into the vector database

## let's set up an ingestion pipeline

from llama_index.core.node_parser import TokenTextSplitter
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.node_parser import MarkdownNodeParser
from llama_index.core.ingestion import IngestionPipeline

pipeline = IngestionPipeline(
    transformations=[
        # alternative splitters to experiment with:
        # MarkdownNodeParser(include_metadata=True),
        # TokenTextSplitter(chunk_size=500, chunk_overlap=20),
        # this explicit chunk_size=1024 is what applies here, not Settings.chunk_size
        SentenceSplitter(chunk_size=1024, chunk_overlap=20),
        Settings.embed_model,
    ],
    vector_store=vector_store,
)
# ingest directly into the vector database
nodes = pipeline.run(documents=documents, show_progress=True)


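`SentenceSplitter(chunk_size=1024, chunk_overlap=20)` cuts each document into chunks of at most 1024 tokens, with 20 tokens shared between neighbouring chunks so that text near a boundary appears in both. The sketch below illustrates only the size/overlap arithmetic, using a fixed sliding window over whitespace tokens. The real splitter also prefers sentence boundaries and uses a proper tokenizer, so its chunks will differ.

```python
def sliding_window_chunks(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # stop once this window already reaches the end of the token list
        if start + chunk_size >= len(tokens):
            break
    return chunks


tokens = [f"t{i}" for i in range(10)]
for chunk in sliding_window_chunks(tokens, chunk_size=4, chunk_overlap=1):
    print(chunk)
```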
## Setting Up Retriever

In [7]:
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

## Modifying Prompts

In [8]:
qa_prompt_str = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the question: {query_str}\n"
)

refine_prompt_str = (
    "We have the opportunity to refine the original answer "
    "(only if needed) with some more context below.\n"
    "------------\n"
    "{context_msg}\n"
    "------------\n"
    "Given the new context, refine the original answer to better "
    "answer the question: {query_str}. "
    "If the context isn't useful, output the original answer again.\n"
    "Original Answer: {existing_answer}"
)

from llama_index.core import ChatPromptTemplate

# Text QA Prompt
chat_text_qa_msgs = [
    ("system", "You are an AI assistant who is well versed in medical information and only answers questions related to the medical domain."),
    ("user", qa_prompt_str),
]
text_qa_template = ChatPromptTemplate.from_messages(chat_text_qa_msgs)

# Refine Prompt
chat_refine_msgs = [
    ("system","Always answer the question, even if the context isn't helpful.",),
    ("user", refine_prompt_str),
]
refine_template = ChatPromptTemplate.from_messages(chat_refine_msgs)

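At query time, the engine fills the `{context_str}` and `{query_str}` placeholders of the QA template with the retrieved chunks and the user question. The plain `str.format` sketch below shows what the resulting prompt text looks like. The chunk texts are hypothetical, and the blank-line separator between chunks is an assumption. With LlamaIndex itself, you can inspect the real messages via `text_qa_template.format_messages(context_str=..., query_str=...)`.

```python
qa_prompt_str = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the question: {query_str}\n"
)

# hypothetical retrieved chunks; at query time the engine supplies these
retrieved_chunks = [
    "Hypothetical chunk A about dosing.",
    "Hypothetical chunk B about renal impairment.",
]
filled = qa_prompt_str.format(
    context_str="\n\n".join(retrieved_chunks),
    query_str="What is the recommended dosage?",
)
print(filled)
```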
In [9]:

from llama_index.llms.openai import OpenAI
llm = OpenAI()

RAG = index.as_query_engine(
    text_qa_template=text_qa_template,
    refine_template=refine_template,
    llm=llm,
    similarity_top_k=2,
    sparse_top_k=12,
    vector_store_query_mode="hybrid",
)


response = RAG.query("Tell me more about Dosage adjustment is required in patients whose creatinine clearance is less than 30 mL/min and who are not receiving regularly scheduled hemodialysis. (8.6) See 17 for PATIENT COUNSELING INFORMATION ")


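With `vector_store_query_mode="hybrid"`, the store runs both a dense query and a sparse query and fuses the two result lists before the top nodes reach the LLM. To our understanding, the LlamaIndex Qdrant integration's default fusion rescales each list's scores to [0, 1] and takes an alpha-weighted sum (relative score fusion). Defaults and the exact roles of `similarity_top_k` and `sparse_top_k` have changed across releases, so check your installed version. The sketch below uses hypothetical scores. In this sketch, `alpha` weights the dense side, and `top_k=2` mirrors `similarity_top_k=2`.

```python
def min_max(scores):
    """Rescale a {node_id: score} dict to [0, 1]; if all scores are equal they map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {node: 1.0 for node in scores}
    return {node: (s - lo) / (hi - lo) for node, s in scores.items()}


def relative_score_fusion(dense, sparse, alpha=0.5, top_k=2):
    """Fused score = alpha * normalized dense + (1 - alpha) * normalized sparse."""
    dense_n, sparse_n = min_max(dense), min_max(sparse)
    fused = {
        node: alpha * dense_n.get(node, 0.0) + (1 - alpha) * sparse_n.get(node, 0.0)
        for node in set(dense) | set(sparse)
    }
    # highest fused score first; ties broken by node id for deterministic output
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# hypothetical scores: dense similarities and sparse scores live on different scales
dense = {"n1": 0.82, "n2": 0.80, "n3": 0.71}
sparse = {"n3": 14.2, "n4": 9.5, "n1": 3.1}
print(relative_score_fusion(dense, sparse, alpha=0.5, top_k=2))
```

Normalizing first matters because raw sparse scores (here around 3 to 14) would otherwise swamp dense cosine similarities (around 0.7 to 0.8).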
In [10]:
display(Markdown(str(response.response)))

In patients with renal impairment whose known creatinine clearance is less than 30 mL/min and who are not receiving regularly scheduled hemodialysis, the recommended two-dose regimen for DALVANCE is 750 mg followed one week later by 375 mg. No dosage adjustment is recommended for patients receiving regularly scheduled hemodialysis, and DALVANCE can be administered without regard to the timing of hemodialysis. This information is based on the details provided in section 8.6 of the document.

## Performing Evaluation using RAGAS

### Creating Synthetic Test Set

In [12]:
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# generator with OpenAI models
generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
critic_llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings()

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings
)

# Change resulting question type distribution
distributions = {
    simple: 0.5,
    multi_context: 0.4,
    reasoning: 0.1
}

# use generator.generate_with_llamaindex_docs when the documents were loaded with LlamaIndex
# `documents` comes from the SimpleDirectoryReader cell above; 10 = number of questions to generate
testset = generator.generate_with_llamaindex_docs(documents, 10, distributions)
testset = testset.to_pandas()


Filename and doc_id are the same for all nodes.



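With 10 questions requested and the distribution above, the intended split is 5 simple, 4 multi-context and 1 reasoning question. The hypothetical helper below shows that arithmetic. It uses largest-remainder rounding so the counts always add up to the requested total. Ragas performs its own allocation internally, and its rounding may differ.

```python
def allocate_questions(test_size, distributions):
    """Split test_size questions across question types by the given proportions.

    Uses largest-remainder rounding so the counts always sum to test_size.
    """
    total = sum(distributions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"distribution weights must sum to 1, got {total}")
    raw = {name: test_size * share for name, share in distributions.items()}
    counts = {name: int(value) for name, value in raw.items()}
    leftover = test_size - sum(counts.values())
    # hand the remaining questions to the types with the largest fractional parts
    by_remainder = sorted(raw, key=lambda name: raw[name] - counts[name], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


print(allocate_questions(10, {"simple": 0.5, "multi_context": 0.4, "reasoning": 0.1}))
# {'simple': 5, 'multi_context': 4, 'reasoning': 1}
```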
In [15]:
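# the generated testset can be saved for reuse:
# testset.to_csv('./archives/eval_data3.csv', index=False)
import pandas as pd

# load a previously saved testset; this replaces the one generated above
testset = pd.read_csv('./archives/eval_data.csv')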

## Run Evaluation using TruLens

In [16]:
from trulens_eval import Tru
tru = Tru()

# tru.reset_database()

🦑 Tru initialized with db url sqlite:///default.sqlite .
🛑 Secret keys may be written to the database. See the `database_redact_keys` option of `Tru` to prevent this.


In [17]:
# alias the TruLens provider so it does not shadow the LlamaIndex OpenAI LLM class imported earlier
from trulens_eval.feedback.provider import OpenAI as TruOpenAI
from trulens_eval import Feedback
import numpy as np

# Initialize provider class
provider = TruOpenAI()

# select context to be used in feedback. The location of context is app-specific.
from trulens_eval.app import App
context = App.select_context(RAG)

# Define a groundedness feedback function
f_groundedness = (
    Feedback(provider.groundedness_measure_with_cot_reasons)
    .on(context.collect()) # collect context chunks into a list
    .on_output()
)

# Question/answer relevance between overall question and answer.
f_answer_relevance = (
    Feedback(provider.relevance)
    .on_input_output()
)
# Question/statement relevance between question and each context chunk.
f_context_relevance = (
    Feedback(provider.context_relevance_with_cot_reasons)
    .on_input()
    .on(context)
    .aggregate(np.mean)
)

✅ In groundedness_measure_with_cot_reasons, input source will be set to __record__.app.query.rets.source_nodes[:].node.text.collect() .
✅ In groundedness_measure_with_cot_reasons, input statement will be set to __record__.main_output or `Select.RecordOutput` .
✅ In relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In relevance, input response will be set to __record__.main_output or `Select.RecordOutput` .
✅ In context_relevance_with_cot_reasons, input question will be set to __record__.main_input or `Select.RecordInput` .
✅ In context_relevance_with_cot_reasons, input context will be set to __record__.app.query.rets.source_nodes[:].node.text .


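`f_context_relevance` scores each retrieved chunk against the question separately, and `.aggregate(np.mean)` reduces those per-chunk scores to one value per query. Below is a minimal standard-library sketch of that reduction, with hypothetical scores. If two chunks are retrieved and one is relevant while the other is not, the mean is 0.5. A stricter aggregator such as `min` would report 0.0.

```python
from statistics import mean


def aggregate_context_relevance(chunk_scores, how=mean):
    """Reduce per-chunk relevance scores (0 to 1) to a single score for the query."""
    if not chunk_scores:
        raise ValueError("no retrieved chunks to score")
    return how(chunk_scores)


# hypothetical per-chunk scores for one query with two retrieved chunks
scores = [1.0, 0.0]
print(aggregate_context_relevance(scores))       # 0.5 (mean)
print(aggregate_context_relevance(scores, min))  # 0.0 (strictest chunk)
```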
In [18]:
from trulens_eval import TruLlama

tru_query_engine_recorder = TruLlama(
    RAG,
    app_id="2_Hybrid_RAG",
    feedbacks=[f_groundedness, f_answer_relevance, f_context_relevance],
)

In [20]:
eval_questions = testset['question'].to_list()

with tru_query_engine_recorder as recording:
    for question in eval_questions:
        response = RAG.query(question)


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Adithya\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!



In [21]:
# app_ids=[] does not filter by app, so records from earlier runs (e.g. 0_Naive_RAG) are included
records, feedback = tru.get_records_and_feedback(app_ids=[])

In [22]:
records.head()

| | app_id | app_json | type | record_id | input | output | tags | record_json | cost_json | perf_json | ts | relevance | groundedness_measure_with_cot_reasons | context_relevance_with_cot_reasons | relevance_calls | groundedness_measure_with_cot_reasons_calls | context_relevance_with_cot_reasons_calls | latency | total_tokens | total_cost |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0_Naive_RAG | {"tru_class_info": {"name": "TruLlama", "modul... | RetrieverQueryEngine(llama_index.core.query_en... | record_hash_2a084b36e219af3247b101b8a0c3d7d0 | "What are the Gram-positive microorganisms tha... | "DALVANCE (dalbavancin) is effective against G... | - | {"record_id": "record_hash_2a084b36e219af3247b... | {"n_requests": 1, "n_successful_requests": 1, ... | {"start_time": "2024-05-25T15:22:24.419886", "... | 2024-05-25T15:22:27.628004 | 0.9 | 0.0 | 0.5 | `[{'args': {'prompt': 'What are the Gram-positi...` | `[{'args': {'source': ['15 References\n\n1. Cli...` | `[{'args': {'question': 'What are the Gram-posi...` | 3 | 1509 | 0.002285 |
| 1 | 0_Naive_RAG | {"tru_class_info": {"name": "TruLlama", "modul... | RetrieverQueryEngine(llama_index.core.query_en... | record_hash_233e2b854caa1fae1f9a5cb45caa7fae | "What were the characteristics of the patients... | "I'm sorry, but the provided context informati... | - | {"record_id": "record_hash_233e2b854caa1fae1f9... | {"n_requests": 1, "n_successful_requests": 1, ... | {"start_time": "2024-05-25T15:22:28.229274", "... | 2024-05-25T15:22:31.224825 | 1.0 | 1.0 | 0.5 | `[{'args': {'prompt': 'What were the characteri...` | `[{'args': {'source': ['Specific Populations\n\...` | `[{'args': {'question': 'What were the characte...` | 2 | 1959 | 0.002969 |
| 2 | 0_Naive_RAG | {"tru_class_info": {"name": "TruLlama", "modul... | RetrieverQueryEngine(llama_index.core.query_en... | record_hash_253dcd210a269b21a054ad3d853719e0 | "How do CYP450 substrates interact with dalbav... | "In vitro studies have shown that dalbavancin ... | - | {"record_id": "record_hash_253dcd210a269b21a05... | {"n_requests": 1, "n_successful_requests": 1, ... | {"start_time": "2024-05-25T15:22:31.783235", "... | 2024-05-25T15:22:34.618613 | 0.9 | 0.5 | 0.8 | `[{'args': {'prompt': 'How do CYP450 substrates...` | `[{'args': {'source': ['12.3 Pharmacokinetics\n...` | `[{'args': {'question': 'How do CYP450 substrat...` | 2 | 1274 | 0.001941 |
| 3 | 0_Naive_RAG | {"tru_class_info": {"name": "TruLlama", "modul... | RetrieverQueryEngine(llama_index.core.query_en... | record_hash_798023af5ef331a3e6846ff7d41e5a16 | "What are the warnings and precautions regardi... | "The warnings and precautions regarding hypers... | - | {"record_id": "record_hash_798023af5ef331a3e68... | {"n_requests": 1, "n_successful_requests": 1, ... | {"start_time": "2024-05-25T15:22:35.088900", "... | 2024-05-25T15:22:39.301764 | 0.8 | 0.983333 | 0.5 | `[{'args': {'prompt': 'What are the warnings an...` | `[{'args': {'source': ['3 Dosage Forms And Stre...` | `[{'args': {'question': 'What are the warnings ...` | 4 | 2147 | 0.003285 |
| 4 | 0_Naive_RAG | {"tru_class_info": {"name": "TruLlama", "modul... | RetrieverQueryEngine(llama_index.core.query_en... | record_hash_5dacdded6fdd9bf4f53eca8885b3dc39 | "What is the recommended dosage regimen for DA... | "The recommended dosage regimen for DALVANCE f... | - | {"record_id": "record_hash_5dacdded6fdd9bf4f53... | {"n_requests": 1, "n_successful_requests": 1, ... | {"start_time": "2024-05-25T15:22:39.807016", "... | 2024-05-25T15:22:42.558752 | 1.0 | 1.0 | 1.0 | `[{'args': {'prompt': 'What is the recommended ...` | `[{'args': {'source': ['Full Prescribing Inform...` | `[{'args': {'question': 'What is the recommende...` | 2 | 2091 | 0.003166 |

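All five rows in this preview belong to `0_Naive_RAG`, not to this notebook's `2_Hybrid_RAG` app. To compare apps, average the feedback columns per app. With pandas, that would be a `groupby("app_id")` on `records`. In the trulens_eval versions we are aware of, `tru.get_leaderboard(app_ids=[])` also reports per-app averages. The standard-library sketch below averages the scores copied from the five rows shown, so its numbers describe the naive RAG run only.

```python
from statistics import mean

# feedback scores and total_cost copied from the five 0_Naive_RAG rows above
rows = [
    {"relevance": 0.9, "groundedness": 0.0, "context_relevance": 0.5, "total_cost": 0.002285},
    {"relevance": 1.0, "groundedness": 1.0, "context_relevance": 0.5, "total_cost": 0.002969},
    {"relevance": 0.9, "groundedness": 0.5, "context_relevance": 0.8, "total_cost": 0.001941},
    {"relevance": 0.8, "groundedness": 0.983333, "context_relevance": 0.5, "total_cost": 0.003285},
    {"relevance": 1.0, "groundedness": 1.0, "context_relevance": 1.0, "total_cost": 0.003166},
]

# mean of each feedback metric across the rows, rounded for display
summary = {
    metric: round(mean(row[metric] for row in rows), 3)
    for metric in ("relevance", "groundedness", "context_relevance")
}
print(summary)  # {'relevance': 0.92, 'groundedness': 0.697, 'context_relevance': 0.66}
print("sum of total_cost:", round(sum(row["total_cost"] for row in rows), 6))
```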

In [23]:
tru.run_dashboard()

Starting dashboard ...
Config file already exists. Skipping writing process.
Credentials file already exists. Skipping writing process.



Dashboard started at http://192.168.1.5:8501 .


<Popen: returncode: None args: ['streamlit', 'run', '--server.headless=True'...>

In [None]:
tru.stop_dashboard()