# RAG Workbench - Distributed Tracing Tutorial

In this example notebook, we showcase how to implement distributed tracing using the LastMile Tracing SDK and RAG Workbench UI.

## Notebook Outline
* [Introduction](#intro)
* [Install and Setup](#setup)
* [Step 1: Instrument Server Code](#step1)
* [Step 2: Instrument Client Code](#step2)
* [Step 3: View Results in RAG Workbench UI](#step3)

<a name="intro"></a>
# Introduction
**Distributed tracing** allows you to track and analyze the flow of requests across multiple services or components in a distributed system.

In this example, a client sends a request to a server, which uses OpenAI's `gpt-3.5-turbo` to generate a riddle and returns it to the client. The LastMile Tracing SDK instruments both sides and captures the trace information.
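
To make the idea concrete, here is a minimal sketch of the data model behind a distributed trace, using only the standard library. The `Span` class, its field names, and the `render_trace` helper are hypothetical illustrations, not part of the LastMile SDK. Each span records the trace it belongs to and its parent span, which is what lets spans reported by different services be stitched into one tree.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Span:
    """Hypothetical, simplified span record (not the LastMile SDK's type)."""
    trace_id: str             # shared by every span in the same request
    span_id: str              # unique per span
    parent_id: Optional[str]  # None for the root span
    name: str
    service: str


def render_trace(spans: List[Span], trace_id: str) -> List[str]:
    """Return an indented outline of one trace, with children under parents."""
    in_trace = [s for s in spans if s.trace_id == trace_id]
    lines: List[str] = []

    def visit(parent_id: Optional[str], depth: int) -> None:
        for span in in_trace:
            if span.parent_id == parent_id:
                lines.append("  " * depth + f"{span.name} [{span.service}]")
                visit(span.span_id, depth + 1)

    visit(None, 0)
    return lines


# Spans reported independently by two services for the same request
spans = [
    Span("t1", "a", None, "client", "client"),
    Span("t1", "b", "a", "generate_endpoint", "server"),
]
print("\n".join(render_trace(spans, "t1")))
```

The span names mirror the ones used later in this notebook; the IDs are placeholders.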

<a name="setup"></a>

# Install and Setup

First install the required packages.

In [1]:
!pip install lastmile-eval --upgrade
!pip install multiprocess
!pip install flask
!pip install openai python-dotenv requests



We also need the following API tokens/keys:

* **LastMile AI API Token:** Go to the [LastMile Settings page](https://lastmileai.dev/settings?page=tokens). You will need to first create a LastMile AI account.
* **OpenAI API Key:** Go to the [OpenAI API Keys page](https://platform.openai.com/account/api-keys) to create and access your OpenAI API Key.

Set both keys in **Google Colab Secrets** or in a `.env` file in your working directory, then run the code cell below. Avoid pasting keys directly into the notebook.

In [2]:
import os

try:
    # If running on Google Colab, use userdata to securely input keys
    from google.colab import userdata
    OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
    LASTMILE_API_TOKEN = userdata.get('LASTMILE_API_TOKEN')
except ModuleNotFoundError:
    # If running locally, load keys from .env file
    from dotenv import load_dotenv
    load_dotenv()
    OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
    LASTMILE_API_TOKEN = os.getenv('LASTMILE_API_TOKEN')

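# Fail early with a clear message if either key is missing
if not OPENAI_API_KEY or not LASTMILE_API_TOKEN:
    raise ValueError("Set OPENAI_API_KEY and LASTMILE_API_TOKEN in Colab Secrets or .env")

os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY
os.environ['LASTMILE_API_TOKEN'] = LASTMILE_API_TOKEN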

<a name="step1"></a>
# Step 1: Instrument Server Code
In this step, we start a Flask server that exposes an endpoint for generating riddles. The server uses a `LastMileTracer` so that its spans can join the trace started by the client.


In [3]:
import json

import multiprocess
from flask import Flask, request
from openai import OpenAI

from lastmile_eval.rag.debugger.api import LastMileTracer
from lastmile_eval.rag.debugger.tracing import get_lastmile_tracer


def server():
    # Instantiate LastMile Tracer
    tracer: LastMileTracer = get_lastmile_tracer(
        tracer_name="my-tracer",
        project_name="Distributed-Tracing",
    )

    app = Flask(__name__)

    @app.route("/generate")
    def generate() -> str:
        """
        Endpoint that generates a riddle using OpenAI's GPT-3.5-turbo model.
        """
        span_context = request.headers.get("span") # Expect a span to be passed to this endpoint

        # Trace the span
        with tracer.start_as_current_span("generate_endpoint", context=span_context):
            response = OpenAI().chat.completions.create(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": "tell me a riddle"}],
            )
            riddle = response.choices[0].message.content
            return json.dumps(riddle)

    app.run(port=1234, debug=False)

# Start Server in a Subprocess to avoid blocking execution of succeeding cells
process = multiprocess.Process(target=server)
process.start()
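
Because the server starts in a separate process, the client cell may run before Flask is listening. One way to guard against that is to poll the port until it accepts connections. The sketch below uses only the standard library; `wait_for_port` is a hypothetical helper, not part of Flask or the LastMile SDK.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0) -> bool:
    """Return True once a TCP connection succeeds, or False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=0.5):
                return True
        except OSError:
            time.sleep(0.1)  # Server not up yet; retry shortly
    return False
```

For this notebook, you could call `wait_for_port("127.0.0.1", 1234)` before running the client cell and continue only if it returns `True`.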

<a name="step2"></a>
# Step 2: Instrument Client Code
To distribute the trace context, the client exports its current span context to a string and passes it to the server in a request header.

In [4]:
import requests
from opentelemetry import trace

from lastmile_eval.rag.debugger.api import LastMileTracer
from lastmile_eval.rag.debugger.tracing import export_span, get_lastmile_tracer

# Instantiate Tracer2
tracer2: LastMileTracer = get_lastmile_tracer(
    tracer_name="my-tracer",
    project_name="Distributed-Tracing"
)

@tracer2.start_as_current_span("client")
def client_say_riddle():
    try:
        print("Message: Sending request to subprocess server, with span context.")

        # Export Span
        response = requests.get('http://127.0.0.1:1234/generate', headers={"span": export_span(trace.get_current_span())})
        return response.text

    except Exception as e:
        print(e)

client_say_riddle()

Message: Sending request to subprocess server, with span context.


'"I speak without a mouth and hear without ears. I have no body, but I come alive with wind. What am I?"'
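
The string produced by `export_span` is specific to the LastMile SDK, and its format is not shown here. As an illustration of the same idea, the sketch below serializes a span context into a header value and parses it back on the receiving side, using the W3C Trace Context `traceparent` layout (`version-traceid-parentid-flags`) as a stand-in. The `SpanContext` type and the helpers `to_traceparent` and `from_traceparent` are hypothetical names.

```python
import re
import secrets
from typing import NamedTuple


class SpanContext(NamedTuple):
    trace_id: str  # 32 lowercase hex characters
    span_id: str   # 16 lowercase hex characters
    sampled: bool


_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def to_traceparent(ctx: SpanContext) -> str:
    """Serialize a span context as a W3C-style `traceparent` header value."""
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def from_traceparent(header: str) -> SpanContext:
    """Parse a `traceparent` header value; raise ValueError if malformed."""
    match = _TRACEPARENT.fullmatch(header)
    if match is None:
        raise ValueError(f"malformed traceparent header: {header!r}")
    trace_id, span_id, flags = match.groups()
    return SpanContext(trace_id, span_id, sampled=bool(int(flags, 16) & 1))


# Client side: create a context and put it in the outgoing request headers
client_ctx = SpanContext(secrets.token_hex(16), secrets.token_hex(8), sampled=True)
headers = {"traceparent": to_traceparent(client_ctx)}

# Server side: recover the caller's context so new spans join the same trace
server_parent = from_traceparent(headers["traceparent"])
print(server_parent == client_ctx)
```

Whatever the wire format, the pattern is the same as in the cells above: the caller serializes its context into a header, and the callee uses it as the parent for its own spans.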

<a name="step3"></a>

# Step 3: View Results in RAG Workbench UI
Now we can view our distributed trace in the RAG Workbench UI!

From your terminal, export your `LASTMILE_API_TOKEN`:

```bash
export LASTMILE_API_TOKEN="<your-api-token>"
```

Next, run the following command in your terminal to launch the UI:

```bash
rag-debug launch
```

Navigate to the URL provided by RAG Workbench, which opens in your web browser. It will look like http://localhost:8080/.

1. Click the **Traces Tab**.
2. Select Project **'Distributed-Tracing'**.


<img width="850" alt="Screen Shot 2024-05-28 at 1 28 58 AM" src="https://github.com/lastmile-ai/aiconfig/assets/81494782/35393b82-4d1e-4346-afc3-e4f6d99cd765"/>

3. Click into the trace.

<img width="850" alt="Screen Shot 2024-05-28 at 1 30 29 AM" src="https://github.com/lastmile-ai/aiconfig/assets/81494782/d25b9b96-2e30-4dfe-8245-076ae860c50c"/>

Here we can see the server's `generate_endpoint` span in the same trace as the `client` span, which means the span context exported by the client was successfully propagated to the server.

This shows that we're able to trace across multiple services or components in a distributed system.