## Telemetry Using Phoenix

Phoenix is a powerful tool designed to simplify the management and visualization of telemetry data. It helps us handle complex traces, making it easier to analyze and diagnose issues in our system. With Phoenix, we can monitor our RAG system's performance, identify bottlenecks, and gain insights into how different components of our application interact.

In [1]:
import utils
import phoenix as px

### Launching Phoenix App

Run the next cell to launch the Phoenix app, which starts a local server and hosts a user interface (UI). The default URL for the app is `localhost:6006`; it is printed once the app has launched.

In [2]:
utils.make_url()
px.launch_app()

FOLLOW THIS URL TO OPEN THE UI: http://widxvlspbcig.labs.coursera.org
🌍 To view the Phoenix app in your browser, visit http://localhost:6006/
📖 For more information on how to use Phoenix, check out https://arize.com/docs/phoenix


<phoenix.session.session.ThreadSession at 0x7e4068df6aa0>

### Preparing the telemetry

Now we'll configure the telemetry to work with Phoenix.

In [3]:
from phoenix.otel import register
from opentelemetry.trace import Status, StatusCode
phoenix_project_name = "example-rag-pipeline"

# With Phoenix, calling register with the collector endpoint returns a configured tracer provider.
endpoint = "http://127.0.0.1:6006/v1/traces"
tracer_provider_phoenix = register(project_name=phoenix_project_name, endpoint=endpoint)

# Retrieve a tracer for manual instrumentation
tracer = tracer_provider_phoenix.get_tracer(__name__)

🔭 OpenTelemetry Tracing Details 🔭
|  Phoenix Project: example-rag-pipeline
|  Span Processor: SimpleSpanProcessor
|  Collector Endpoint: http://127.0.0.1:6006/v1/traces
|  Transport: HTTP + protobuf
|  Transport Headers: {}
|  
|  Using a default SpanProcessor. `add_span_processor` will overwrite this default.
|  
|  
|  `register` has set this TracerProvider as the global OpenTelemetry default.
|  To disable this behavior, call `register` with `set_global_tracer_provider=False`.



### Using the Pipeline

#### Retrieve

In [4]:
def retrieve(query, fail=False):
    # Start a span to trace the retrieval process. Now we can pass a span kind: retriever
    with tracer.start_as_current_span("retrieving_documents", openinference_span_kind = 'retriever') as span:
        # Log the event of starting retrieval
        span.add_event("Starting retrieve")
        # Record the input query as an attribute for visibility
        # Phoenix allows us to use span.set_input
        span.set_input(query)
        try:
            # Simulate a retrieval failure if 'fail' is True
            if fail:
                raise ValueError(f"Retrieve failed for query: {query}")

            # Simulated list of retrieved documents
            retrieved_docs = ['retrieved doc1', 'retrieved doc2', 'retrieved doc3']
            # Record details about each retrieved document
            for i, doc in enumerate(retrieved_docs):
                span.set_attribute(f"retrieval.documents.{i}.document.id", i)
                span.set_attribute(f"retrieval.documents.{i}.document.content", doc)
                span.set_attribute(f"retrieval.documents.{i}.document.metadata", f"Metadata for document {i}")
        except Exception as e:
            # If an exception occurs, log and set the span status to indicate an error
            span.set_status(Status(StatusCode.ERROR, str(e)))
            span.set_attribute("error.type", type(e).__name__)
            span.set_attribute("error.message", str(e))
            # Reraise the exception for handling by the caller
            raise

        # Mark the span as successful if no error was raised
        span.set_status(Status(StatusCode.OK))
        return retrieved_docs
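
The attribute keys set in `retrieve` follow a flattened, indexed naming pattern (`retrieval.documents.{i}.document.*`). As a minimal sketch, the same keys can be built in one place with a small helper. The name `document_attributes` is hypothetical and is not part of Phoenix or OpenTelemetry; the metadata string simply mirrors the placeholder used above.

```python
def document_attributes(docs):
    """Build the flattened span attributes used in `retrieve` above."""
    attributes = {}
    for i, doc in enumerate(docs):
        prefix = f"retrieval.documents.{i}.document"
        attributes[f"{prefix}.id"] = i
        attributes[f"{prefix}.content"] = doc
        attributes[f"{prefix}.metadata"] = f"Metadata for document {i}"
    return attributes

# Example with two simulated documents
attrs = document_attributes(["retrieved doc1", "retrieved doc2"])
for key, value in attrs.items():
    print(key, "=", value)
```

Inside the span, a dictionary like this could be applied in one call with `span.set_attributes(attrs)` instead of three `set_attribute` calls per document.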

### Chains

A chain is a connection point between steps in an LLM application. It links operations together, such as starting a request or passing information from a retriever to an LLM call. In the cells below, the `@tracer.chain` decorator marks each function as a chain step, so every call to it shows up as its own span in the trace.
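
To make the idea concrete, here is a plain-Python sketch of what a chain-style decorator does conceptually: it wraps a function and records its name, input, output, and status on every call. The names `chain`, `RECORDED_SPANS`, and `shout` are hypothetical, and this is not how Phoenix implements its decorator; it only imitates the observable behavior with the standard library.

```python
import functools

# Hypothetical in-memory store standing in for a trace backend
RECORDED_SPANS = []

def chain(func):
    """Record name, input, output, and status for each call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        record = {
            "name": func.__name__,
            "kind": "chain",
            "input": {"args": args, "kwargs": kwargs},
        }
        try:
            result = func(*args, **kwargs)
        except Exception as exc:
            # Store the failure, then let the caller handle the exception
            record["status"] = "ERROR"
            record["error"] = repr(exc)
            RECORDED_SPANS.append(record)
            raise
        record["status"] = "OK"
        record["output"] = result
        RECORDED_SPANS.append(record)
        return result
    return wrapper

@chain
def shout(text):
    return text.upper()

shout("hello")
print(RECORDED_SPANS[0]["name"], RECORDED_SPANS[0]["status"], RECORDED_SPANS[0]["output"])
```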

#### The remaining RAG functions

In [5]:
@tracer.chain
def format_documents(retrieved_docs):
    t = ''
    for i, doc in enumerate(retrieved_docs):
        t += f'Retrieved doc: {doc}\n'
    return t

@tracer.chain
def augment_prompt(query, formatted_documents):
    # Create a prompt that combines the query and formatted documents
    PROMPT = f"Answer the query: {query}.\nRelevant documents:\n{formatted_documents}"
    return PROMPT

@tracer.chain
def generate(prompt):
    generated_text = f"Generated text for prompt {prompt}"
    return generated_text

@tracer.chain
def rag_pipeline(query, fail=False):
    # Step 1: Retrieve documents based on the query
    retrieved_docs = retrieve(query, fail=fail)
    # Step 2: Format the retrieved documents
    formatted_docs = format_documents(retrieved_docs)
    # Step 3: Augment the query with relevant documents to form a prompt
    prompt = augment_prompt(query, formatted_docs)
    # Step 4: Generate a response from the augmented prompt
    generated_response = generate(prompt)
    return generated_response

### Using the UI to analyze the traces

In [6]:
response = rag_pipeline("This is a test query")
try:
    response = rag_pipeline("This is a test query that failed", fail=True)
except Exception:
    # Expected failure: the error is recorded on the trace, so it is safe to ignore here
    pass

In [7]:
utils.make_url()

FOLLOW THIS URL TO OPEN THE UI: http://widxvlspbcig.labs.coursera.org


In [None]:
utils.restart_kernel()

## Tracing and Evaluation with Weaviate

This section implements a small RAG pipeline that answers FAQ-related questions for a clothing store.

In [1]:
from phoenix.otel import register
from opentelemetry.trace import Status, StatusCode
import phoenix as px
import flask_app
import weaviate
import utils
import weaviate_server

 * Serving Flask app 'flask_app'
 * Debug mode: off


In [2]:
utils.make_url()
session = px.launch_app()

FOLLOW THIS URL TO OPEN THE UI: http://widxvlspbcig.labs.coursera.org
🌍 To view the Phoenix app in your browser, visit http://localhost:6006/
📖 For more information on how to use Phoenix, check out https://arize.com/docs/phoenix


Unknown project: UHJvamVjdDoy

GraphQL request:4:3
3 | ) {
4 |   node(id: $id) {
  |   ^
5 |     __typename
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/graphql/execution/execute.py", line 530, in await_result
    return_type, field_nodes, info, path, await result
  File "/usr/local/lib/python3.10/dist-packages/strawberry/schema/schema_converter.py", line 788, in _async_resolver
    return await await_maybe(
  File "/usr/local/lib/python3.10/dist-packages/strawberry/utils/await_maybe.py", line 13, in await_maybe
    return await value
  File "/usr/local/lib/python3.10/dist-packages/phoenix/server/api/queries.py", line 530, in node
    raise NotFound(f"Unknown project: {id}")
phoenix.server.api.exceptions.NotFound: Unknown project: UHJvamVjdDoy

### Configuring the tracer

In [3]:
from phoenix.otel import register
phoenix_project_name = "rag-pipeline-with-weaviate"

# With Phoenix, calling register with the collector endpoint returns a configured tracer provider.
# With auto_instrument=True, OpenAI client calls are traced automatically.
# Together AI is OpenAI-compatible, so this covers the LLM calls made below.
tracer_provider_phoenix = register(project_name=phoenix_project_name, endpoint="http://127.0.0.1:6006/v1/traces", auto_instrument=True)

# Retrieve a tracer for manual instrumentation
tracer = tracer_provider_phoenix.get_tracer(__name__)

🔭 OpenTelemetry Tracing Details 🔭
|  Phoenix Project: rag-pipeline-with-weaviate
|  Span Processor: SimpleSpanProcessor
|  Collector Endpoint: http://127.0.0.1:6006/v1/traces
|  Transport: HTTP + protobuf
|  Transport Headers: {}
|  
|  Using a default SpanProcessor. `add_span_processor` will overwrite this default.
|  
|  
|  `register` has set this TracerProvider as the global OpenTelemetry default.
|  To disable this behavior, call `register` with `set_global_tracer_provider=False`.



### Preparing the Weaviate client and collection

In [4]:
# Connecting the weaviate client
client = weaviate.connect_to_local(port=8079, grpc_port=50050)

In [5]:
import joblib
data = joblib.load("faq.joblib")

In [7]:
data[0]

{'question': 'What are your store hours?',
 'answer': 'Our online store is open 24/7. Customer service is available from 9:00 AM to 6:00 PM, Monday through Friday.',
 'type': 'general information'}

In [8]:
# Loading the collection
collection = client.collections.get("Faq")

In [9]:
len(collection)

26

### The Retriever

In [10]:
def retrieve(query_text, limit=5):
    # Start a span for the query
    with tracer.start_as_current_span(
        "query_weaviate", openinference_span_kind="retriever"
    ) as span:
        # Set the input for the span
        span.set_input(query_text)

        # Query the collection
        collection_name = "Faq"
        chunks = client.collections.get(collection_name)
        results = chunks.query.near_text(query=query_text, limit=limit)

        # Set the retrieved documents as attributes on the span
        for i, document in enumerate(results.objects):
            span.set_attribute(f"retrieval.documents.{i}.document.id", str(document.uuid))
            span.set_attribute(f"retrieval.documents.{i}.document.metadata", str(document.metadata))
            span.set_attribute(
                f"retrieval.documents.{i}.document.content", str(document.properties)
            )  

        return results

In [11]:
# Process and format the retrieved results
@tracer.chain 
def format_context(results):
    context = ""
    for item in results.objects:
        properties = item.properties
        context += f"Question: {properties['question']}\n"
        context += f"Answer: {properties['answer']}\n"
    return context

In [12]:
# Create a prompt with the retrieved information
@tracer.chain
def create_prompt(query_text, context):
    prompt = f"""
Based on the following information, please answer the FAQ related question: "{query_text}"

Relevant FAQ (ordered by relevance):
{context}
"""
    return prompt
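
To see what `format_context` and `create_prompt` produce without a running Weaviate instance, here is a sketch that feeds them stand-in objects. It assumes the functions only read `.objects` and `.properties`, so `SimpleNamespace` is enough to imitate a query result. Both functions are repeated without `@tracer.chain` so the block runs by itself, and the sample answer is a shortened version of the FAQ record shown earlier.

```python
from types import SimpleNamespace

def format_context(results):
    # Same logic as the cell above, minus the @tracer.chain decorator
    context = ""
    for item in results.objects:
        properties = item.properties
        context += f"Question: {properties['question']}\n"
        context += f"Answer: {properties['answer']}\n"
    return context

def create_prompt(query_text, context):
    # Same template as the cell above, minus the @tracer.chain decorator
    prompt = f"""
Based on the following information, please answer the FAQ related question: "{query_text}"

Relevant FAQ (ordered by relevance):
{context}
"""
    return prompt

# Stand-in for a Weaviate query result (assumption: only .objects and
# .properties are read by the two functions)
fake_results = SimpleNamespace(
    objects=[
        SimpleNamespace(
            properties={
                "question": "What are your store hours?",
                "answer": "Our online store is open 24/7.",
            }
        )
    ]
)

context = format_context(fake_results)
prompt = create_prompt("What are your working hours?", context)
print(prompt)
```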

### LLM call with `openai` library

In [13]:
import httpx
from openai import OpenAI, DefaultHttpxClient

In [14]:
# Custom transport to bypass SSL verification
transport = httpx.HTTPTransport(local_address="0.0.0.0", verify=False)

# Create a DefaultHttpxClient instance with the custom transport
http_client = DefaultHttpxClient(transport=transport)

# Any OpenAI-compatible endpoint can be used here
llm_client = OpenAI(
    api_key="",
    base_url="http://proxy.dlai.link/coursera_proxy/together/",
    http_client=http_client,
)

In [15]:
# No manual tracing is needed here because auto_instrument=True was set when registering the tracer
def query_openai(prompt):
    response = llm_client.chat.completions.create(
        model="meta-llama/Llama-3.2-3B-Instruct-Turbo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant from a customer support."},
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content

In [16]:
@tracer.chain
def rag_pipeline(query):
    # Execute the query
    retrieved_documents = retrieve(query)
    context = format_context(retrieved_documents)
    
    # Create a prompt with the retrieved information
    final_prompt = create_prompt(query, context)
    
    # Execute the OpenAI query
    final_answer = query_openai(final_prompt)

    return final_answer

In [17]:
response = rag_pipeline("Can I get a refund or exchange for another shirt?")
print(response)

Based on the provided information, the answer to your question is:

"No, you cannot get a refund or exchange for another shirt if it's a sale item, unless stated otherwise. However, if the item is not a sale item and you initiate an exchange through our Returns Center within 30 days of delivery, we can process your return and provide a replacement. Please note that return shipping costs are covered for domestic returns, but you will be responsible for the cost of international returns."


In [18]:
response = rag_pipeline("What are your working hours?")
print(response)

Based on the provided information, the answer to the FAQ question "What are your working hours?" is:

Our customer service is available from 9:00 AM to 6:00 PM, Monday through Friday.


In [19]:
# Check out the traces in the Phoenix UI!
utils.make_url()

FOLLOW THIS URL TO OPEN THE UI: http://widxvlspbcig.labs.coursera.org
