# Welcome to trace-able RAG with Traceloop and Pinecone

This notebook shows how to configure tracing and monitoring for your RAG pipeline with Traceloop.

It covers:

*   Configuring Traceloop to observe your RAG pipeline
*   Creating a simple RAG pipeline using LangChain and Pinecone
*   Alerting on low relevance scores from Pinecone when there is no good context to return for a given query


# Setup

Step 1. Install dependencies (the Python packages that our program uses)

In [None]:
!pip install -qU \
    langchain==0.1.20 \
    openai \
    datasets==2.10.1 \
    pinecone-client \
    tiktoken \
    traceloop-sdk==0.19.0 \
    langchain_openai \
    langchain_pinecone

# Prepare target data

Step 2. Load a dataset that contains arXiv papers about Llama 2. This data could be anything, including your private company data.

In [None]:
from datasets import load_dataset

dataset = load_dataset(
    "jamescalam/llama-2-arxiv-papers-chunked",
    split="train"
)

dataset

# Configure an embeddings model

Step 3. Set up OpenAI embeddings using the text-embedding-3-small model, which produces 1536-dimensional vectors

In [None]:
import os
from google.colab import userdata

from langchain_openai import OpenAIEmbeddings
# Set our secret OPENAI_API_KEY to an environment variable of the same name
# Which the OpenAIEmbeddings class expects to find
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

embed_model = OpenAIEmbeddings(model="text-embedding-3-small")
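
The cell above assumes Google Colab, where `userdata.get` reads notebook secrets. Outside Colab that import is unavailable, so a small fallback can help. The following is a minimal sketch: `get_secret` is a hypothetical helper name (not part of any library), and it assumes the key has been exported as an environment variable when Colab secrets cannot be read.

```python
import os


def get_secret(name: str) -> str:
    """Read a secret from Colab userdata if possible, else from the environment.

    Hypothetical helper for illustration only.
    """
    try:
        # This module is only importable inside Google Colab.
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is not None:
        try:
            value = userdata.get(name)
            if value:
                return value
        except Exception:
            # Secret missing or access not granted; fall through to the environment.
            pass

    value = os.environ.get(name)
    if not value:
        raise KeyError(f"Secret {name!r} not found in Colab userdata or environment")
    return value
```

With such a helper, `os.environ["OPENAI_API_KEY"] = get_secret("OPENAI_API_KEY")` would behave the same in Colab and in a local Jupyter session.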

# Enable Traceloop

Step 4. Set up Traceloop for tracing and monitoring

In [None]:
from google.colab import userdata
from traceloop.sdk import Traceloop
from traceloop.sdk.instruments import Instruments

Traceloop.init(api_key=userdata.get('TRACELOOP_API_KEY'))

# Configure Pinecone

Step 5. Set up a connection to Pinecone using its Serverless offering, which means you don't need to specify your workload or storage needs upfront.

In [None]:
from pinecone import Pinecone
from google.colab import userdata
from pinecone import ServerlessSpec

# Use a Pinecone serverless index for effortless scaling
spec = ServerlessSpec(
    cloud="aws", region="us-east-1"
)

# initialize connection to pinecone (get API key at app.pinecone.io)
api_key = userdata.get('PINECONE_API_KEY')

# configure client
pc = Pinecone(api_key=api_key)

# Set up Pinecone serverless index

Step 6. Check if a Pinecone index with our desired name already exists, and create it if it does not

In [None]:
import time

index_name = 'traceloop-rag'
existing_indexes = [
    index_info["name"] for index_info in pc.list_indexes()
]

# check if index already exists (it shouldn't if this is first time)
if index_name not in existing_indexes:
    # if does not exist, create index
    pc.create_index(
        index_name,
        dimension=1536,  # dimensionality of text-embedding-3-small
        metric='dotproduct',
        spec=spec
    )
    # wait for index to be initialized
    while not pc.describe_index(index_name).status['ready']:
        time.sleep(1)

# connect to index
index = pc.Index(index_name)
time.sleep(1)
# view index stats
index.describe_index_stats()
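
The index above uses the `dotproduct` metric. OpenAI embeddings are normalized to unit length, so for such vectors the dot product equals cosine similarity. The minimal sketch below illustrates this with toy 3-dimensional vectors, which are stand-ins for the 1536-dimensional embeddings and are chosen only for readability.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))


def normalize(a):
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]


def cosine(a, b):
    """Cosine similarity of two vectors."""
    return dot(a, b) / (norm(a) * norm(b))


# Toy vectors, normalized the way the embeddings are assumed to be.
query_vec = normalize([1.0, 2.0, 3.0])
doc_vec = normalize([2.0, 1.0, 0.5])

# For unit-length vectors the two measures agree.
same = math.isclose(dot(query_vec, doc_vec), cosine(query_vec, doc_vec))
print(same)
```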

# Data ingest

Step 7. Loop through our dataset, convert each chunk to vectors, and upsert the vectors alongside metadata

In [None]:
from tqdm.auto import tqdm  # for progress bar

data = dataset.to_pandas()  # this makes it easier to iterate over the dataset

batch_size = 100

for i in tqdm(range(0, len(data), batch_size)):
    i_end = min(len(data), i+batch_size)
    # get batch of data
    batch = data.iloc[i:i_end]
    # generate unique ids for each chunk
    ids = [f"{x['doi']}-{x['chunk-id']}" for _, x in batch.iterrows()]
    # get text to embed
    texts = [x['chunk'] for _, x in batch.iterrows()]
    # embed text
    embeds = embed_model.embed_documents(texts)
    # get metadata to store in Pinecone
    metadata = [
        {'text': x['chunk'],
         'source': x['source'],
         'title': x['title']} for _, x in batch.iterrows()
    ]
    # add to Pinecone
    index.upsert(vectors=zip(ids, embeds, metadata))
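
The loop above walks the dataset in windows of `batch_size` rows, with `min` clamping the final window so it does not run past the end. A minimal sketch of that windowing logic in isolation, using a hypothetical `batch_bounds` helper and a made-up row count of 250:

```python
def batch_bounds(total, batch_size):
    """Yield (start, end) index pairs that cover range(total) in batches."""
    for start in range(0, total, batch_size):
        # The last batch may be shorter than batch_size.
        yield start, min(total, start + batch_size)


bounds = list(batch_bounds(250, 100))
print(bounds)  # [(0, 100), (100, 200), (200, 250)]
```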

# Connect to Pinecone via LangChain

Step 8. Set up a LangChain vectorstore using Pinecone

In [None]:
from langchain_pinecone import PineconeVectorStore

text_field = "text"  # the metadata field that contains our text

# initialize the vector store object
vectorstore = PineconeVectorStore(
    index, embed_model, text_field
)

# Implement a trace-able RAG pipeline with Traceloop

Step 9. Create a simple RAG chain and wrap it in Traceloop's `workflow` decorator, so that every question we ask of our knowledge base produces a trace.

In [None]:
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA
from traceloop.sdk.decorators import workflow

from google.colab import userdata
OPENAI_API_KEY = userdata.get("OPENAI_API_KEY")

@workflow(name="rag_backed_query")
def rag_backed_query(query: str):
    # completion llm
    llm = ChatOpenAI(
        openai_api_key=OPENAI_API_KEY,
        model_name='gpt-4o',
        temperature=0.0
    )
    # retrieval-augmented QA chain backed by the Pinecone vectorstore
    qa = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=vectorstore.as_retriever()
    )
    return qa.invoke(query)
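
The `"stuff"` chain type places all retrieved chunks into a single prompt alongside the question. The sketch below illustrates that idea in plain Python. The prompt wording and the `build_stuff_prompt` helper are illustrative assumptions, not LangChain's actual template or API.

```python
def build_stuff_prompt(question, chunks):
    """Concatenate retrieved chunks into one context block and append the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Made-up chunks standing in for documents returned by the retriever.
prompt = build_stuff_prompt(
    "What is Llama 2?",
    ["Chunk one text.", "Chunk two text."],
)
print(prompt)
```

Because every chunk goes into one prompt, this approach works best when the retriever returns a small number of short chunks that fit in the model's context window.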

# Demonstrate similarity search

Step 10. Ask a question of our knowledge base through the RAG chain. The retriever runs a similarity search against our LangChain vectorstore, and the LLM answers using the retrieved chunks.


In [None]:
query = "What is so special about Llama 2?"

res = rag_backed_query(query=query)
print(res["result"])

# Observability is critical to RAG pipelines

Let's now simulate a query that our RAG pipeline handles poorly, by asking about something that doesn't exist in our knowledge base / vectorstore and that the OpenAI foundation model (GPT-4o) doesn't already know well.

This will demonstrate how Traceloop is able to observe and filter on low relevance scores returned by Pinecone's vector database.

In [None]:
query = "Can you explain how Gemini works?"

res = rag_backed_query(query=query)
print(res["result"])
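
To make the idea of a low relevance score concrete, the sketch below flags a query when none of the retrieved matches reaches a threshold. The match dictionaries, the `is_low_relevance` helper, and the 0.3 threshold are all hypothetical and chosen for illustration. A suitable threshold depends on your embedding model and data, and this is not how Traceloop itself implements monitoring.

```python
def is_low_relevance(matches, threshold=0.3):
    """Return True when no match scores at or above the threshold."""
    if not matches:
        # Nothing was retrieved, so there is no usable context.
        return True
    return max(m["score"] for m in matches) < threshold


# Made-up scores: one query with good context, one without.
good = [{"id": "a", "score": 0.62}, {"id": "b", "score": 0.41}]
poor = [{"id": "c", "score": 0.12}, {"id": "d", "score": 0.08}]

print(is_low_relevance(good))  # False
print(is_low_relevance(poor))  # True
```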