# Example RAG with Opik Tracing and Evals

This simple example will get you started with using Opik, Weaviate, and the OpenAI API to build a RAG system.


# Set up your Environment

[Comet](https://www.comet.com/) provides a hosted version of the Opik platform. Simply [create a free account](https://www.comet.com/site/products/opik/) and grab your API key from the UI.

First, we need to install the opik and openai libraries with pip.

In [22]:
%pip install -U opik openai --quiet

Note: you may need to restart the kernel to use updated packages.




Now, we'll configure Opik and OpenAI with our respective API keys.

In [23]:
import opik

opik.configure(use_local=False)

OPIK: Existing Opik clients will not use updated values for "url", "api_key", "workspace".
OPIK: Opik is already configured. You can check the settings by viewing the config file at C:\Users\nicet\.opik.config


In [24]:
import os
import getpass

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")

Traces will now be automatically logged to the Opik UI, where you can inspect the inputs and outputs and configure evaluation metrics. After you run this cell, follow the link to the Comet UI to see your traces.

# Set up Weaviate Client

Weaviate is a vector database that supports billion-scale vector search with sub-50 ms query times. We'll use Weaviate to retrieve the text chunks most relevant to the user's query in this example.
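
To make "vector search" concrete, here is a toy brute-force version in plain Python. It is only an illustration of the idea: the three-dimensional vectors and document names are made up, and Weaviate itself relies on approximate indexes rather than comparing the query against every stored vector.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, docs, k=2):
    """Return the names of the k documents most similar to query_vec."""
    ranked = sorted(
        docs,
        key=lambda name: cosine_similarity(query_vec, docs[name]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical 3-dimensional "embeddings"; real embeddings are much larger.
toy_docs = {
    "passport": [0.9, 0.1, 0.0],
    "drivers_license": [0.8, 0.3, 0.1],
    "air_quality": [0.0, 0.2, 0.9],
}

nearest = top_k([1.0, 0.2, 0.0], toy_docs, k=2)
print(nearest)
```

The `limit` argument used with `near_text` later in this notebook plays the same role as `k` here.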

In [25]:
%pip install -U weaviate-client --quiet

Note: you may need to restart the kernel to use updated packages.


In [26]:
import os
import weaviate
from weaviate.classes.init import Auth
from weaviate.classes.init import AdditionalConfig, Timeout


WEAVIATE_CLUSTER_URL = os.getenv('WEAVIATE_CLUSTER_URL') or 'https://zxzyqcyksbw7ozpm5yowa.c0.us-west2.gcp.weaviate.cloud'
WEAVIATE_API_KEY = os.getenv('WEAVIATE_API_KEY') or 'n6mdfI32xrXF3DH76i8Pwc2IajzLZop2igb6' # This is a read key

weaviate_client = weaviate.connect_to_weaviate_cloud(
    cluster_url=WEAVIATE_CLUSTER_URL,
    auth_credentials=Auth.api_key(WEAVIATE_API_KEY),
    headers={"X-OpenAI-Api-Key": os.environ["OPENAI_API_KEY"]},
)

print(weaviate_client.is_connected())

website_collection = weaviate_client.collections.get(name="Hackathon")

True


# Write a RAG app with OpenAI, Weaviate and Opik Traces

Next, we will build a very simple LLM reasoning application and log the trace data to Opik, where we can apply additional evaluation metrics and debug the LLM response.

We will use Opik to collect traces to inspect the inputs and outputs of the reasoning tasks, and to create evaluation metrics for hallucinations and other common or custom issues you want to detect.

Opik integrates with OpenAI to provide a simple way to log traces for all OpenAI LLM calls. This works for all OpenAI models, including when you use the streaming API.

In [27]:
from opik.integrations.openai import track_openai
from openai import OpenAI
from dotenv import load_dotenv
load_dotenv()

# Name your project; this appears as the project name in the Opik UI
os.environ["OPIK_PROJECT_NAME"] = "rag-project"


client = OpenAI()
client = track_openai(client)

We are using the @opik.track decorator and the OpenAI logging integration to automatically log our traces and spans. Learn more at https://www.comet.com/docs/opik/tracing/log_traces#using-an-integration

In [None]:
@opik.track
def retrieve_context(user_query):
    # Semantic Search
    response = website_collection.query.near_text(
        query=user_query,
        limit=3
    )

    text_chunks = []
    for chunk in response.objects:
        text_chunks.append(chunk)
    return text_chunks
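
The function above returns the raw result objects, so whatever is interpolated into a prompt is their default string form. One option is to flatten them into a numbered context block first. The sketch below assumes each chunk is a dict with "text" and "link" keys, which matches the properties imported and printed later in this notebook; `format_context` is a hypothetical helper, not part of Weaviate or Opik.

```python
def format_context(chunks):
    """Turn retrieved chunks into a numbered, human-readable context block."""
    lines = []
    for i, chunk in enumerate(chunks, start=1):
        text = chunk.get("text", "").strip()
        link = chunk.get("link", "")
        # Append the source only when a link is present
        suffix = f" (source: {link})" if link else ""
        lines.append(f"[{i}] {text}{suffix}")
    return "\n".join(lines)

# Sample data shaped like the properties stored in the collection
sample_chunks = [
    {
        "text": "Fees are waived for licenses or IDs lost in the fires.",
        "link": "https://www.ca.gov/lafires/get-help-online/",
    },
    {"text": "Free crisis counseling is available.", "link": ""},
]

context = format_context(sample_chunks)
print(context)
```

With real results, this could be called as `format_context([obj.properties for obj in response.objects])`, assuming the properties are plain dicts as the query output later in the notebook suggests.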

In [None]:
@opik.track
def generate_response(user_query, context_chunks):
    # Build a single prompt from the user's question and the retrieved chunks
    prompt = f"""
    You're a helpful assistant replying to a chatbot message from someone asking
    for recommendations on replacing documents. The user query was: {user_query}

    These are the text chunks returned by the vector search:

    {context_chunks}

    Include links where applicable.
    """

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": prompt
            }
        ]
    )

    return response.choices[0].message.content

In [30]:
@opik.track(name="rag-example")
def llm_chain(user_query):
    context = retrieve_context(user_query)
    response = generate_response(user_query, context)
    return response

In [None]:
# Use the LLM chain
user_query = input("What types of documents do you need to replace? ")
data = llm_chain(user_query)
print(data)

If you need to replace your passport, the process generally involves submitting a completed application form, providing proof of citizenship, a recent passport photo, and paying the applicable fees. Here are a few steps to guide you through the process:

1. **Complete the Application Form**: Use Form DS-11 if you are applying for a replacement (you cannot use Form DS-82 since it's for renewals and not for lost items).

2. **Provide Proof of Citizenship**: You’ll need to present an original or certified copy of your birth certificate, or a previous U.S. passport if you have that available.

3. **Passport Photo**: Provide a recent passport photo that meets the official requirements (2x2 inches with a clear view of the full face).

4. **Submit in Person**: Since you are replacing a lost passport, you'll typically need to submit your application in person at a Passport Acceptance Facility, such as a post office or county clerk's office.

5. **Pay the Fees**: Be prepared to pay the required

In [32]:
from openai import OpenAI
load_dotenv()
apikey = os.getenv("OPENAI_API_KEY")
# Never print the key itself; only confirm that it was loaded
print("OPENAI_API_KEY is set:", apikey is not None)


In [33]:
# Optional sanity check, left disabled:
# client = OpenAI(api_key=apikey)
# completion = client.chat.completions.create(
#     model="gpt-4o",
#     store=True,
#     messages=[
#         {"role": "user", "content": "write a haiku about ai"}
#     ]
# )

In [34]:
import requests
from bs4 import BeautifulSoup
import re

def scrape_text(url, n=500):
    """Fetch a page and split its visible text into chunks of roughly n characters."""
    response = requests.get(url, timeout=30)
    if response.status_code != 200:
        raise Exception(f"Failed to fetch {url}: Status code {response.status_code}")

    # Pass raw bytes so BeautifulSoup detects the page encoding itself;
    # response.text can mis-decode UTF-8 pages and produce garbled characters
    soup = BeautifulSoup(response.content, "html.parser")

    # Remove script and style elements before extracting text
    for element in soup(["script", "style"]):
        element.extract()

    text = soup.get_text(separator=" ", strip=True)

    # Use regex to split into sentences while keeping punctuation
    sentences = re.split(r'(?<=[.!?]) +', text)

    chunks = []
    current_chunk = ""

    for sentence in sentences:
        if len(current_chunk) + len(sentence) <= n:
            current_chunk += " " + sentence if current_chunk else sentence
        else:
            # Avoid emitting an empty chunk when the first sentence exceeds n
            if current_chunk:
                chunks.append({"text": current_chunk.strip(), "link": url})
            current_chunk = sentence

    if current_chunk:
        chunks.append({"text": current_chunk.strip(), "link": url})

    return chunks

# Example usage:
# result = scrape_text("https://example.com", n=500)
# print(result)
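
The chunking step can be tried offline, without fetching a page. The sketch below isolates the same sentence-packing idea on a plain string; `chunk_sentences` is a hypothetical helper, and unlike `scrape_text` it counts the joining space toward the limit and returns bare strings rather than dicts.

```python
import re

def chunk_sentences(text, n=500):
    """Greedily pack whole sentences into chunks of at most n characters.

    A single sentence longer than n is kept whole as its own chunk.
    """
    sentences = re.split(r"(?<=[.!?]) +", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= n:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks

sample = "Fees are waived. Visit the DMV. Bring proof of identity!"
small_chunks = chunk_sentences(sample, n=35)
print(small_chunks)
```

Sentences are never split, so `n` is a soft limit: choose it with the embedding model's input size in mind.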


In [35]:
data = scrape_text("https://www.ca.gov/lafires/get-help-online/#Replacing-your-personal-documents", n=500)
print(data)

[{'text': 'Get help online | CA.gov Skip to Main Content Official California website California government websites use .ca.gov A .ca.gov website is part of California’s government.', 'link': 'https://www.ca.gov/lafires/get-help-online/#Replacing-your-personal-documents'}, {'text': "Español 한국어 Tagalog Tiếng Việt 繁體中文 Հայերեն Translate Menu Custom Google Search Submit Close Services Departments About California Get help Home 2025 Los Angeles Fires Get help online Get help online 2025 Los Angeles Fires Get help online 2025 Los Angeles fires Get help online Recovery services finder Get help in person Plan your in-person visit See real-time info Start your recovery Return to your home safely Cleanup and debris removal Help your business Volunteer Track LA's progress You can get help online with food, expenses, shelter, and more.", 'link': 'https://www.ca.gov/lafires/get-help-online/#Replacing-your-personal-documents'}, 

In [36]:
import weaviate
from weaviate.classes.init import Auth
import os

load_dotenv()
# Best practice: store your credentials in environment variables
wcd_url = os.environ["WEAVIATE_CLUSTER_URL"]
wcd_api_key = os.environ["WEAVIATE_API_KEY"]

# Note: this rebinds `client`, which earlier held the tracked OpenAI client.
# Re-create the OpenAI client before calling llm_chain again.
client = weaviate.connect_to_weaviate_cloud(
    cluster_url=wcd_url,                                    # Replace with your Weaviate Cloud URL
    auth_credentials=Auth.api_key(wcd_api_key),             # Replace with your Weaviate Cloud key
)

print(client.is_ready())  # Should print: `True`



True


In [37]:
questions = client.collections.get("Hackathon")

with questions.batch.dynamic() as batch:
    for d in data:
        batch.add_object({
            "text": d["text"],
            "link": d["link"],
        })
        if batch.number_errors > 10:
            print("Batch import stopped due to excessive errors.")
            break

failed_objects = questions.batch.failed_objects
if failed_objects:
    print(f"Number of failed imports: {len(failed_objects)}")
    print(f"First failed object: {failed_objects[0]}")

client.close()  # Free up resources
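
Running the import cell more than once adds the same chunks again, which may be why the query results further down appear to repeat a passage. One way to guard against this is to derive a stable ID from each chunk's content and drop repeats before importing. The sketch below uses only the standard library; `chunk_uuid` and `dedupe` are hypothetical helpers, and the example URLs are placeholders.

```python
import uuid

def chunk_uuid(chunk):
    """Derive a stable UUID from a chunk's link and text."""
    key = f"{chunk['link']}\n{chunk['text']}"
    return str(uuid.uuid5(uuid.NAMESPACE_URL, key))

def dedupe(chunks):
    """Keep the first occurrence of each distinct (link, text) pair."""
    seen, unique = set(), []
    for chunk in chunks:
        cid = chunk_uuid(chunk)
        if cid not in seen:
            seen.add(cid)
            unique.append(chunk)
    return unique

sample = [
    {"text": "Fees are waived.", "link": "https://example.com/a"},
    {"text": "Fees are waived.", "link": "https://example.com/a"},
    {"text": "Free crisis counseling.", "link": "https://example.com/a"},
]

unique_chunks = dedupe(sample)
print(len(sample), "->", len(unique_chunks))
```

The same ID could probably also be supplied at import time so that a re-import targets the existing object instead of creating a new one. The batch API appears to accept a `uuid` argument on `add_object`, but treat that as an assumption and check the client documentation for your version.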

In [38]:
import json
# Reconnect and run a semantic query against the collection
client = weaviate.connect_to_weaviate_cloud(
    cluster_url=wcd_url,                                    # Replace with your Weaviate Cloud URL
    auth_credentials=Auth.api_key(wcd_api_key),             # Replace with your Weaviate Cloud key
)

questions = client.collections.get("Hackathon")

response = questions.query.near_text(
    query="drivers license",
    limit=2
)

for obj in response.objects:
    print(json.dumps(obj.properties, indent=2))

client.close()

{
  "text": "Staying healthy Tips for staying healthy during a wildfire by California Department of Public Health (CDPH) Check your air quality through the South Coast Air Quality Monitoring District Get the prescriptions you need through the federal Emergency Prescription Assistance Program Worker safety tips from the Department of Industrial Relations Mental health Free crisis counseling Available online or in person Mental health resources for youth by California Health and Human Services Agency (CalHHS) Replacing your personal documents Driver\u2019s license and ID cards Fees are waived for licenses or IDs lost in the fires.",
  "link": "https://www.ca.gov/lafires/get-help-online/#Replacing-your-personal-documents"
}
{
  "text": "Staying healthy Tips for staying healthy during a wildfire by California Department of Public Health (CDPH) Check your air quality through the South Coast Air Quality Monitoring District Get the prescriptions you need through the federal Emerge