<center>
    <p style="text-align:center">
        <img alt="phoenix logo" src="https://storage.googleapis.com/arize-assets/phoenix/assets/images/qdrant_arize.png" width="500"/>
        <br>
        <a href="https://docs.arize.com/phoenix/">Docs</a>
        |
        <a href="https://github.com/Arize-ai/phoenix">GitHub</a>
        |
        <a href="https://join.slack.com/t/arize-ai/shared_invite/zt-1px8dcmlf-fmThhDFD_V_48oU7ALan4Q">Community</a>
    </p>
</center>
<h1 align="center">Tuning a RAG Pipeline using Qdrant and Arize Phoenix</h1>

ℹ️ This notebook requires an OpenAI API key.

### **1. Import Relevant Packages**

In [None]:
import os

# Setup projects
SIMPLE_RAG_PROJECT = "simple-rag"
HYBRID_RAG_PROJECT = "hybrid-rag"
os.environ["PHOENIX_PROJECT_NAME"] = SIMPLE_RAG_PROJECT

In [None]:
import datetime
import json
import os
import pickle
import ssl
import time
import urllib
from getpass import getpass
from urllib.request import urlopen

import certifi
import nest_asyncio
import openai
import pandas as pd
import phoenix as px
import requests
from bs4 import BeautifulSoup
from llama_index.core import (
    ServiceContext, StorageContext, download_loader,
    load_index_from_storage, set_global_handler
)
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.graph_stores.simple import SimpleGraphStore
from llama_index.core.indices.vector_store.base import VectorStoreIndex
from llama_index.llms.openai import OpenAI
from phoenix.evals import (
    HallucinationEvaluator, OpenAIModel, QAEvaluator,
    RelevanceEvaluator, run_evals
)
from phoenix.session.evaluation import get_qa_with_reference, get_retrieved_documents
from phoenix.trace import DocumentEvaluations, SpanEvaluations
from tqdm import tqdm

import qdrant_client
from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient, models
from qdrant_client.http.models import PointStruct

nest_asyncio.apply()  # needed for concurrent evals in notebook environments
pd.set_option("display.max_colwidth", 1000)

### **2. Launch Phoenix**
You can run Phoenix in the background to collect trace data emitted by any LlamaIndex application that has been instrumented with the `OpenInferenceTraceCallbackHandler`. Phoenix supports LlamaIndex's one-click observability, which automatically instruments your LlamaIndex application. You can consult our integration guide for a more detailed explanation of how to instrument your LlamaIndex application.

Launch Phoenix and follow the instructions in the cell output to open the Phoenix UI (the UI should be empty because we have yet to run the LlamaIndex application).

In [None]:
session = px.launch_app()

Be sure to enable Phoenix as your global handler for tracing!

In [None]:
set_global_handler("arize_phoenix")

### **3. Set up your OpenAI API key**

In [None]:
from dotenv import load_dotenv
load_dotenv()

In [None]:
if not (openai_api_key := os.getenv("OPENAI_API_KEY")):
    openai_api_key = getpass("🔑 Enter your OpenAI API key: ")
openai.api_key = openai_api_key
os.environ["OPENAI_API_KEY"] = openai_api_key

### **4. Retrieve the documents / dataset to be used**

In [None]:
from datasets import load_dataset

# If the dataset is gated/private, make sure you have run huggingface-cli login
dataset = load_dataset("atitaarora/qdrant_doc", split="train")

In [None]:
dataset.info

### **5. Definition of global chunk properties and chunk processing**
Process each document with the desired **TEXT_SPLITTER_ALGO**, **CHUNK_SIZE**, **CHUNK_OVERLAP**, etc.
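
To make the effect of the two settings concrete, here is a minimal sketch of fixed-size character windows that share an overlap. It is an illustration only: the helper `split_with_overlap` is hypothetical, and the `RecursiveCharacterTextSplitter` used in this notebook additionally prefers to break on separators, so its chunk boundaries will differ.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into character windows of `chunk_size` that share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

A larger overlap repeats more context across neighbouring chunks at the cost of more chunks to embed and store.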

In [None]:
## Global config for chunk processing
CHUNK_SIZE = 512 #1000
CHUNK_OVERLAP = 50

### **6. Process the dataset as LangChain (or LlamaIndex) documents for further processing**

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.docstore.document import Document as LangchainDocument
from llama_index.core import Document

## Split and process the document chunks from the given dataset

def process_document_chunks(dataset, chunk_size, chunk_overlap):
    """Split each dataset row into overlapping chunks and return them as LlamaIndex documents."""
    langchain_docs = [
        LangchainDocument(page_content=doc["text"], metadata={"source": doc["source"]})
        for doc in tqdm(dataset)
    ]

    # could showcase another variation of processed documents
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
        add_start_index=True,
        separators=["\n\n", "\n", ".", " ", ""],
    )

    docs_processed = []
    for doc in langchain_docs:
        docs_processed += text_splitter.split_documents([doc])

    ## Converting Langchain document chunks above into Llamaindex Document for ingestion
    llama_documents = [
        Document.from_langchain_format(doc)
        for doc in docs_processed
    ]
    return llama_documents

In [None]:
documents = process_document_chunks(dataset, CHUNK_SIZE, CHUNK_OVERLAP)
len(documents)

### **7. Setting up Qdrant and Collection**

We first set up the qdrant client and then create a collection so that our data may be stored.

In [None]:
## Option 1: in-memory Qdrant (uncomment to use)
#client = QdrantClient(
#    location=":memory:",
#)

## Option 2 (active): Qdrant Cloud, configured through the QDRANT_URL and QDRANT_API_KEY environment variables
client = QdrantClient(
    os.environ.get("QDRANT_URL"),
    api_key=os.environ.get("QDRANT_API_KEY"),
)

## Option 3: local Qdrant (uncomment to use)
#client = QdrantClient("http://localhost:6333")

In [None]:
## Collection Name 
COLLECTION_NAME = "qdrant_docs_arize_dense"

In [None]:
## General Collection level operations

## Get information about existing collections 
client.get_collections()

## Get information about specific collection
#collection_info = client.get_collection(COLLECTION_NAME)
#print(collection_info)

## Deleting collection, if need be
#client.delete_collection(COLLECTION_NAME)

In [None]:
## Declaring the intended Embedding Model with Fastembed
from fastembed.embedding import TextEmbedding

pd.DataFrame(TextEmbedding.list_supported_models())

### **8. Document Embedding processing and Ingestion**

This example uses a `QdrantVectorStore` and creates a new collection in Qdrant, but you can use whatever LlamaIndex application you like.

In [None]:
import llama_index
from llama_index.core import Settings
from llama_index.vector_stores.qdrant import QdrantVectorStore
from phoenix.trace import suppress_tracing
## FastEmbed provides the embedding model used below.
## For the complete list of supported models,
## please check https://qdrant.github.io/fastembed/examples/Supported_Models/
from llama_index.embeddings.fastembed import FastEmbedEmbedding

vector_store = QdrantVectorStore(client=client, collection_name=COLLECTION_NAME)

storage_context = StorageContext.from_defaults(vector_store=vector_store)

## Embed with FastEmbed (the default for this notebook)
Settings.embed_model = FastEmbedEmbedding(model_name="BAAI/bge-small-en-v1.5")

## Uncomment to use OpenAI embeddings instead of FastEmbed
#Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")

Settings.llm = OpenAI(model="gpt-4-1106-preview", temperature=0.0)

# Ingestion is not part of the traces we want to inspect, so tracing is suppressed here
with suppress_tracing():
    index = VectorStoreIndex.from_documents(
        documents,
        storage_context=storage_context,
        show_progress=True,
    )

### **8a. Connecting to existing Collection**

This example reuses the previously generated Qdrant collection through a `QdrantVectorStore`, so the documents do not need to be ingested again.

In [None]:
## Connect to the existing collection (the imports below are also needed by the retriever in the next section)
from llama_index.core.vector_stores.types import VectorStoreQueryMode
from llama_index.core.indices.vector_store import VectorIndexRetriever
vector_store = QdrantVectorStore(client=client, collection_name=COLLECTION_NAME)
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

In [None]:
client.count(collection_name=COLLECTION_NAME)

### **9. Run an example query and print the response**

In [None]:
##Initialise retriever to interact with the Qdrant collection
retriever = VectorIndexRetriever(
    index=index,
    vector_store_query_mode=VectorStoreQueryMode.DEFAULT,
    similarity_top_k=5
)

In [None]:
response = retriever.retrieve("What is quantization?")
for i, node in enumerate(response):
    print(i + 1, node.text, end="\n\n")

In [None]:
response

In [None]:
# We can view the above data in the UI
px.active_session().view()

### **10. Run Your Query Engine and View Your Traces in Phoenix**

We've compiled a list of the baseline questions about Qdrant. Let's download the sample queries and take a look.

In [None]:
## Loading the Eval dataset
from datasets import load_dataset
qdrant_qa = load_dataset("atitaarora/qdrant_doc_qna", split="train")
qdrant_qa_question = qdrant_qa.select_columns(['question'])

In [None]:
qdrant_qa_question['question'][:10]

In [None]:
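query_engine = index.as_query_engine()
for query in tqdm(qdrant_qa_question['question'][:10]):
    try:
        query_engine.query(query)
    except Exception as e:
        # Keep going so that one failed query does not stop the run, but report it
        print(f"Query failed: {query!r} ({e})")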

Check the Phoenix UI as your queries run. Your traces should appear in real time.

Open the Phoenix UI with the link below if you haven't already and click through the queries to better understand how the query engine is performing. For each trace you will see a breakdown of the spans that make it up.

Phoenix can be used to understand and troubleshoot your application by surfacing:
 - **Application latency** - highlights slow invocations of LLMs, Retrievers, etc.
 - **Token Usage** - displays the breakdown of token usage with LLMs to surface your most expensive LLM calls
 - **Runtime Exceptions** - critical runtime exceptions such as rate-limiting are captured as exception events
 - **Retrieved Documents** - view all the documents retrieved during a retriever call and the score and order in which they were returned
 - **Embeddings** - view the embedding text used for retrieval and the underlying embedding model
 - **LLM Parameters** - view the parameters used when calling out to an LLM to debug things like temperature and the system prompts
 - **Prompt Templates** - figure out what prompt template is used during the prompting step and what variables were used
 - **Tool Descriptions** - view the description and function signature of the tools your LLM has been given access to
 - **LLM Function Calls** - if using OpenAI or another model with function calls, you can view the function selection and function messages in the input messages to the LLM

<img src="https://storage.googleapis.com/arize-assets/phoenix/assets/images/RAG_trace_details.png" alt="Trace Details View on Phoenix" style="width:100%; height:auto;">

In [None]:
print(f"🚀 Open the Phoenix UI if you haven't already: {session.url}")

### **11. Export and Evaluate Your Trace Data**
You can export your trace data as a pandas dataframe for further analysis and evaluation.

In this case, we will export our retriever spans into two separate dataframes:

 - `queries_df`, in which the retrieved documents for each query are concatenated into a single column
 - `retrieved_documents_df`, in which each retrieved document is "exploded" into its own row to enable the evaluation of each query-document pair in isolation

This will enable us to compute multiple kinds of evaluations, including:

 - **relevance**: Are the retrieved documents relevant to the query?
 - **Q&A correctness**: Are your application's responses grounded in the retrieved context?
 - **hallucinations**: Is your application making up false information?
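
The difference between the two shapes can be sketched with plain Python records. The data and the field names (`input`, `reference`, `document_position`) are illustrative assumptions, not a statement of the exact columns Phoenix returns.

```python
# Hypothetical retriever output: one record per query, each with a ranked list of documents.
retrievals = [
    {"query": "What is quantization?", "documents": ["doc A", "doc B"]},
    {"query": "What is a collection?", "documents": ["doc C"]},
]

# Query-level view: all retrieved documents joined into a single reference string.
query_rows = [
    {"input": r["query"], "reference": "\n\n".join(r["documents"])}
    for r in retrievals
]

# Document-level ("exploded") view: one row per query-document pair, keeping the rank.
document_rows = [
    {"input": r["query"], "document_position": position, "reference": doc}
    for r in retrievals
    for position, doc in enumerate(r["documents"])
]

print(len(query_rows), len(document_rows))
```

The query-level view suits evaluations of the final answer, while the document-level view suits per-document relevance judgments.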

In [None]:
queries_df = get_qa_with_reference(px.Client())
retrieved_documents_df = get_retrieved_documents(px.Client())

In [None]:
queries_df

In [None]:
retrieved_documents_df

### **12. Define your evaluation model and your evaluators**

Next, define your evaluation model and your evaluators.

Evaluators are built on top of language models and prompt the LLM to assess the quality of responses, the relevance of retrieved documents, etc., and provide a quality signal even in the absence of human-labeled data. Pick an evaluator type and instantiate it with the language model you want to use to perform evaluations using our battle-tested evaluation templates.
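
Per-document relevance labels can be rolled up into a single retrieval number. The sketch below computes precision@k and a hit rate from made-up labels; the label strings `"relevant"` and `"unrelated"`, the tuple layout, and the helper `precision_at_k` are assumptions for illustration and should be adapted to the labels your evaluator actually returns.

```python
from collections import defaultdict

# Hypothetical per-document labels: (query id, rank position, label).
labels = [
    ("q1", 0, "relevant"),
    ("q1", 1, "unrelated"),
    ("q1", 2, "relevant"),
    ("q2", 0, "unrelated"),
    ("q2", 1, "unrelated"),
]


def precision_at_k(rows, k, positive="relevant"):
    """Fraction of the top-k retrieved documents labelled `positive`, per query."""
    per_query = defaultdict(list)
    for query_id, position, label in rows:
        if position < k:
            per_query[query_id].append(label == positive)
    return {query_id: sum(hits) / len(hits) for query_id, hits in per_query.items()}


scores = precision_at_k(labels, k=2)
# Hit rate: share of queries with at least one relevant document in the top k.
hit_rate = sum(1 for score in scores.values() if score > 0) / len(scores)
print(scores, hit_rate)
```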

In [None]:
eval_model = OpenAIModel(
    model="gpt-4-turbo-preview",
)
hallucination_evaluator = HallucinationEvaluator(eval_model)
qa_correctness_evaluator = QAEvaluator(eval_model)
relevance_evaluator = RelevanceEvaluator(eval_model)

hallucination_eval_df, qa_correctness_eval_df = run_evals(
    dataframe=queries_df,
    evaluators=[hallucination_evaluator, qa_correctness_evaluator],
    provide_explanation=True,
)
relevance_eval_df = run_evals(
    dataframe=retrieved_documents_df,
    evaluators=[relevance_evaluator],
    provide_explanation=True,
)[0]

px.Client().log_evaluations(
    SpanEvaluations(eval_name="Hallucination", dataframe=hallucination_eval_df),
    SpanEvaluations(eval_name="QA Correctness", dataframe=qa_correctness_eval_df),
)
px.Client().log_evaluations(DocumentEvaluations(eval_name="Relevance", dataframe=relevance_eval_df))

Your evaluations should now appear as annotations on the appropriate spans in Phoenix.

![A view of the Phoenix UI with evaluation annotations](https://storage.googleapis.com/arize-assets/phoenix/assets/docs/notebooks/evals/traces_with_evaluation_annotations.png)

### **13. Let's try Hybrid search now**

In [None]:
## Define a new collection to store your hybrid embeddings
COLLECTION_NAME_HYBRID = "qdrant_docs_arize_hybrid"

In [None]:
##Reprocess documents with different settings if needed 
#documents = process_document_chunks(dataset , CHUNK_SIZE , CHUNK_OVERLAP)

In [None]:
##List of supported sparse vector models
from fastembed.sparse.sparse_text_embedding import SparseTextEmbedding
SparseTextEmbedding.list_supported_models()

### **14. Ingest Sparse and Dense vectors into Qdrant**

Ingest sparse and dense vectors into a Qdrant collection.
We use the Splade++ model (`prithivida/Splade_PP_en_v1`) for sparse vectors and the default FastEmbed model, `BAAI/bge-small-en-v1.5`, for dense embeddings.
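
A sparse vector stores only its non-zero entries as parallel lists of indices and values, which is the shape the ingestion code below passes to Qdrant. As a minimal sketch with made-up token ids and weights (the helper `sparse_dot` is hypothetical), a query and a document can be scored by a dot product over the indices they share:

```python
def sparse_dot(indices_a, values_a, indices_b, values_b) -> float:
    """Dot product of two sparse vectors stored as parallel index/value lists."""
    weights_b = dict(zip(indices_b, values_b))
    # Only indices present in both vectors contribute to the score.
    return sum(value * weights_b.get(index, 0.0) for index, value in zip(indices_a, values_a))


# Hypothetical token ids and weights.
doc_indices, doc_values = [3, 17, 42], [0.5, 1.0, 2.0]
query_indices, query_values = [17, 42, 99], [1.0, 0.5, 3.0]

score = sparse_dot(query_indices, query_values, doc_indices, doc_values)
print(score)
```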

In [None]:
import llama_index
from llama_index.core import Settings
from llama_index.vector_stores.qdrant import QdrantVectorStore
from fastembed.sparse.sparse_text_embedding import SparseTextEmbedding, SparseEmbedding
from llama_index.embeddings.fastembed import FastEmbedEmbedding
from typing import List, Tuple

sparse_model_name = "prithivida/Splade_PP_en_v1"

# This triggers the model download
sparse_model = SparseTextEmbedding(model_name=sparse_model_name, batch_size=32)

batch_size = 10
parallel = 0

## Computing sparse vectors
def compute_sparse_vectors(
    texts: List[str],
) -> Tuple[List[List[int]], List[List[float]]]:
    """Embed `texts` with the sparse model and return parallel lists of indices and values."""
    indices, values = [], []
    for embedding in sparse_model.embed(texts):
        indices.append(embedding.indices.tolist())
        values.append(embedding.values.tolist())
    return indices, values

## Creating a vector store with Hybrid search enabled
vector_store = QdrantVectorStore(
    client=client,
    collection_name=COLLECTION_NAME_HYBRID,
    enable_hybrid=True,
    sparse_doc_fn=compute_sparse_vectors,
    sparse_query_fn=compute_sparse_vectors)

storage_context = StorageContext.from_defaults(vector_store=vector_store)

Settings.embed_model = FastEmbedEmbedding(model_name="BAAI/bge-small-en-v1.5")

## Ingesting sparse and dense vectors into Qdrant collection
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    show_progress=True
)

In [None]:
## collection level operations
client.get_collection(COLLECTION_NAME_HYBRID)
#client.delete_collection(COLLECTION_NAME_HYBRID)

In [None]:
## Check the number of documents matches the expected number of document chunks 
client.count(collection_name=COLLECTION_NAME_HYBRID)

### **15. Hybrid Search with Qdrant**
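
Hybrid search blends the dense and sparse result lists, and the retrievers below take an `alpha` value that controls the blend. The sketch here shows one common way to do this: min-max normalise each score list, then take a weighted sum in which `alpha=1` means dense only and `alpha=0` means sparse only. This convention, the helper names, and the scores are assumptions for illustration; the fusion function actually used by `QdrantVectorStore` may differ in its details.

```python
def min_max(scores: dict) -> dict:
    """Rescale scores to [0, 1]; if all scores are equal, every entry maps to 1.0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {key: 1.0 for key in scores}
    return {key: (value - low) / (high - low) for key, value in scores.items()}


def fuse(dense: dict, sparse: dict, alpha: float, top_k: int) -> list:
    """Rank ids by alpha * dense + (1 - alpha) * sparse over normalised scores."""
    dense_n, sparse_n = min_max(dense), min_max(sparse)
    ids = set(dense_n) | set(sparse_n)
    fused = {
        i: alpha * dense_n.get(i, 0.0) + (1 - alpha) * sparse_n.get(i, 0.0)
        for i in ids
    }
    # Sort by fused score (descending), breaking ties by id for a stable order.
    return sorted(fused, key=lambda i: (-fused[i], i))[:top_k]


# Hypothetical scores on very different scales.
dense_scores = {"a": 0.9, "b": 0.8, "c": 0.1}
sparse_scores = {"b": 2.0, "c": 12.0, "d": 7.0}

mostly_sparse = fuse(dense_scores, sparse_scores, alpha=0.1, top_k=2)
mostly_dense = fuse(dense_scores, sparse_scores, alpha=0.9, top_k=2)
print(mostly_sparse, mostly_dense)
```

Normalising first matters because dense similarities and sparse scores usually live on different scales, so a raw weighted sum would be dominated by whichever list has larger numbers.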

In [None]:
## Initialise Hybrid Vector Store 
vector_store_hybrid = QdrantVectorStore(
    client=client,
    collection_name=COLLECTION_NAME_HYBRID,
    enable_hybrid=True,
    batch_size=20,  # this is important for the ingestion
)

## Followed by initializing index for interacting with the Hybrid Collection in Qdrant

hybrid_index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store_hybrid,
    storage_context=storage_context,
)

In [None]:
!pip freeze | grep transformers

In [None]:
##TODO add this to poetry
#!pip install "transformers[torch]"

In [None]:
## Before moving further, let's try a sparse vector search retriever
from llama_index.core.vector_stores.types import VectorStoreQueryMode
from llama_index.core.indices.vector_store import VectorIndexRetriever
from phoenix.trace import using_project

sparse_retriever = VectorIndexRetriever(
    index=hybrid_index,
    vector_store_query_mode=VectorStoreQueryMode.SPARSE,
    sparse_top_k=2,
)

## Pure sparse vector search
with using_project(HYBRID_RAG_PROJECT):
    nodes = sparse_retriever.retrieve("What is quantization?")
    for i, node in enumerate(nodes):
        print(i + 1, node.text, end="\n\n")

In [None]:
from phoenix.trace import using_project
## Let's try Hybrid Search Retriever now
hybrid_retriever = VectorIndexRetriever(
    index=hybrid_index,
    vector_store_query_mode=VectorStoreQueryMode.HYBRID,
    sparse_top_k=2,
    similarity_top_k=5,
    alpha=0.1,
)

with using_project(HYBRID_RAG_PROJECT):
    nodes = hybrid_retriever.retrieve("What is quantization?")
    for i, node in enumerate(nodes):
        print(i + 1, node.text, end="\n\n")

In [None]:
from phoenix.trace import using_project
# We shouldn't be modifying the alpha parameter after the retriever has been created
# but that's the easiest way to show the effect of the parameter
#hybrid_retriever._alpha = 0.1
hybrid_retriever._alpha = 0.9

with using_project(HYBRID_RAG_PROJECT):
    nodes = hybrid_retriever.retrieve("What is quantization?")
    for i, node in enumerate(nodes):
        print(i + 1, node.text, end="\n\n")

### **16. Re-Run Your Query Engine and View Your Traces in Phoenix**

Let's rerun the list of the baseline questions about Qdrant on the Hybrid Retriever. 

In [None]:
## Switching phoenix project space
from phoenix.trace import using_project

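# Switch project so that the hybrid traces are kept separate from the simple RAG run.
# All spans created within this context are associated with the `HYBRID_RAG_PROJECT` project.
with using_project(HYBRID_RAG_PROJECT):
    ## Reuse the previously loaded dataset `qdrant_qa_question`

    # Request hybrid mode explicitly; without it the query engine uses the default (dense) mode
    query_engine_hybrid = hybrid_index.as_query_engine(
        vector_store_query_mode="hybrid", similarity_top_k=5, sparse_top_k=2
    )
    for query in tqdm(qdrant_qa_question['question'][:10]):
        try:
            query_engine_hybrid.query(query)
        except Exception as e:
            # Keep going so that one failed query does not stop the run, but report it
            print(f"Query failed: {query!r} ({e})")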

In [None]:
print(f"🚀 Open the Phoenix UI if you haven't already: {session.url}")

In [None]:
## Export the traces recorded under the hybrid project
queries_df_hybrid = get_qa_with_reference(px.Client(), project_name=HYBRID_RAG_PROJECT)
retrieved_documents_df_hybrid = get_retrieved_documents(px.Client(), project_name=HYBRID_RAG_PROJECT)

In [None]:
queries_df_hybrid

In [None]:
retrieved_documents_df_hybrid

### **17. Define your evaluation model and your evaluators for Hybrid Search**

Next, define your evaluation model and your evaluators.

Evaluators are built on top of language models and prompt the LLM to assess the quality of responses, the relevance of retrieved documents, etc., and provide a quality signal even in the absence of human-labeled data. Pick an evaluator type and instantiate it with the language model you want to use to perform evaluations using our battle-tested evaluation templates.

In [None]:


# Evaluate the traces exported from the `HYBRID_RAG_PROJECT` project.
eval_model = OpenAIModel(
    model="gpt-4-turbo-preview",
)
hallucination_evaluator = HallucinationEvaluator(eval_model)
qa_correctness_evaluator = QAEvaluator(eval_model)
relevance_evaluator = RelevanceEvaluator(eval_model)

hallucination_eval_df_hybrid, qa_correctness_eval_df_hybrid = run_evals(
    dataframe=queries_df_hybrid,
    evaluators=[hallucination_evaluator, qa_correctness_evaluator],
    provide_explanation=True,
)
relevance_eval_df_hybrid = run_evals(
    dataframe=retrieved_documents_df_hybrid,
    evaluators=[relevance_evaluator],
    provide_explanation=True,
)[0]

px.Client().log_evaluations(
    SpanEvaluations(eval_name="Hallucination", dataframe=hallucination_eval_df_hybrid),
    SpanEvaluations(eval_name="QA Correctness", dataframe=qa_correctness_eval_df_hybrid),
    project_name=HYBRID_RAG_PROJECT,
)
px.Client().log_evaluations(
    DocumentEvaluations(eval_name="Relevance", dataframe=relevance_eval_df_hybrid),
    project_name=HYBRID_RAG_PROJECT,
)