# LLM Tracing

With the Parea SDK, you can gain visibility into **any LLM application**. Together with the web application, Parea speeds up your debugging, evaluating, and monitoring workflows.
Parea is also framework- and provider-agnostic. It traces your prompts and chains, whether they are deployed from Parea or defined within your codebase.

For this example, we will create a simple chat app and instrument it with Parea to log traces and feedback. We will also add tags and other metadata to enrich our traces. The chat app uses three 'chained' components to generate a text argument on a provided subject:

1. An argument generation function
2. A critique function
3. A refine function

Each function will call an LLM provider; in our case, we'll use OpenAI, but you could easily call any other provider. Parea's log dashboard provides a detailed trace of your LLM calls, so you can drill into each step for further analysis and investigation.

![DashboardDetailedView](img/dashboard_detailed_view.png)

Let's go!

## Prerequisites

First, install the parea-ai SDK package. If you have an account with Parea, your LLM API keys are used automatically, so you won't need to redefine them here.
All you need is your Parea API key. Follow the instructions in the [docs](https://docs.parea.ai/api-reference/authentication) to get your API key.

In [None]:
%pip install -U parea-ai > /dev/null
%pip install attrs > /dev/null

Next, set the API key in the environment so that traces are logged to your account.

In [3]:
import os

os.environ["PAREA_API_KEY"] = "<your-api-key>"

## Using the SDK

Next, define your chat application. The trace decorator automatically generates a trace_id for each of your LLM calls, and nested functions that also carry a trace decorator are automatically associated with the parent trace. Each decorated function's inputs, outputs, name, and other information are recorded and visible on Parea's log dashboard. Note: Logging is executed on a background thread to avoid blocking your app's execution.
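To build intuition for how a decorator can link nested calls to a parent trace, here is a minimal, self-contained sketch. It is not Parea's implementation: `toy_trace`, `toy_current_trace_id`, and `TRACE_LOG` are hypothetical names invented for this illustration, and the sketch assumes a context variable is used to remember which trace is currently active.

```python
import contextvars
import functools
import uuid

# Holds the trace_id of the traced function that is currently executing, if any.
_current_trace_id = contextvars.ContextVar("current_trace_id", default=None)

# In-memory stand-in for a logging backend: one record per traced call.
TRACE_LOG = []


def toy_trace(func):
    """Hypothetical decorator: record name, inputs, output, and parent link."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        parent_id = _current_trace_id.get()  # None when called at the top level
        trace_id = str(uuid.uuid4())
        token = _current_trace_id.set(trace_id)
        record = {
            "trace_id": trace_id,
            "parent_trace_id": parent_id,
            "name": func.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
        }
        try:
            record["output"] = func(*args, **kwargs)
            return record["output"]
        except Exception as exc:
            record["error"] = repr(exc)
            raise
        finally:
            # Restore the caller's trace_id and store the finished record.
            _current_trace_id.reset(token)
            TRACE_LOG.append(record)

    return wrapper


def toy_current_trace_id():
    """Hypothetical helper: trace_id of the innermost active traced call."""
    return _current_trace_id.get()


@toy_trace
def inner(text):
    return text.upper()


@toy_trace
def outer(text):
    return inner(text) + "!"


outer("hello")
child, parent = TRACE_LOG  # inner finishes first, so it is logged first
print(child["parent_trace_id"] == parent["trace_id"])
```

A real SDK would send each record to a backend instead of appending it to a list, but the parent/child bookkeeping can follow the same pattern.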

We've created three prompts on Parea and have deployed them. Learn how to deploy a prompt [here](https://docs.parea.ai/deployments/deployments).

![Deployed_Prompts](img/deployed_prompts.png)

Now we only need the deployment id for each prompt to get started. You can also do this without a deployed prompt for the same experience (example [here](https://github.com/parea-ai/parea-sdk-py/blob/fc506a8fa9b5a118b15918cc00cdc5e323dcf9bb/parea/cookbook/tracing_without_deployed_prompt.py)).

In [12]:
from datetime import datetime
from parea.schemas import Completion
from parea import Parea, trace

p = Parea(api_key=os.getenv("PAREA_API_KEY"))


def argument_generator(query: str, additional_description: str = "") -> str:
    return p.completion(
        Completion(
            deployment_id="p-tbFUZ5rRaXshj8o5Opfyr",
            llm_inputs={
                "additional_description": additional_description,
                "date": f"{datetime.now()}",
                "query": query,
            },
        )
    ).content


def critic(argument: str) -> str:
    return p.completion(
        Completion(
            deployment_id="p-iAuVLFHy6VypfGZxwAWW0",
            llm_inputs={"argument": argument},
        )
    ).content


def refiner(query: str, additional_description: str, current_arg: str, criticism: str) -> str:
    return p.completion(
        Completion(
            deployment_id="p-rEjM4X10rJomOD8Rj9gzJ",
            llm_inputs={
                "additional_description": additional_description,
                "date": f"{datetime.now()}",
                "query": query,
                "argument": current_arg,
                "criticism": criticism,
            },
        )
    ).content


# Non deployed version
# from parea.schemas import LLMInputs, Message, ModelParams, Role


# def argument_generator(query: str, additional_description: str = "") -> str:
#   return p.completion(
#     Completion(llm_configuration=LLMInputs(model="gpt-3.5-turbo", model_params=ModelParams(temp=0),
#         messages=[
#           Message(role=Role.system,
#                   content=f"You are a debater making an argument on a topic." f"{additional_description}" f" The current time is {datetime.now()}"),
#           Message(role=Role.user, content=f"The discussion topic is {query}"),
#         ],
#       )
#     )
#   ).content


# def critic(argument: str) -> str:
#   return p.completion(
#     Completion(llm_configuration=LLMInputs(model="gpt-3.5-turbo", model_params=ModelParams(temp=0),
#         messages=[
#           Message(
#             role=Role.system,
#             content=f"You are a critic."
#                     "\nWhat unresolved questions or criticism do you have after reading the following argument?"
#                     "Provide a concise summary of your feedback.",
#           ),
#           Message(role=Role.system, content=argument),
#         ],
#       )
#     )
#   ).content


# def refiner(query: str, additional_description: str, current_arg: str, criticism: str) -> str:
#   return p.completion(
#     Completion(llm_configuration=LLMInputs(model="gpt-3.5-turbo", model_params=ModelParams(temp=0),
#         messages=[
#           Message(
#             role=Role.system,
#             content=f"You are a debater making an argument on a topic. {additional_description}."
#                     f"The current time is {datetime.now()}",
#           ),
#           Message(role=Role.user, content=f"The discussion topic is {query}"),
#           Message(role=Role.assistant, content=current_arg),
#           Message(role=Role.user, content=criticism),
#           Message(role=Role.system, content="Please generate a new argument that incorporates the feedback "
#                                             "from the user."),
#         ],
#       )
#     )
#   ).content


@trace
def argument_chain(query: str, additional_description: str = "") -> str:
    argument = argument_generator(query, additional_description)
    criticism = critic(argument)
    return refiner(query, additional_description, argument, criticism)

Now call the chain. If you set up your API key correctly at the start of this notebook, all the results should be traced to [Parea](https://www.app.parea.ai/logs). We will prompt the app to generate an argument that coffee is good for you.

In [13]:
result1 = argument_chain(
    "Whether coffee is good for you.",
    additional_description="Provide a concise, few sentence argument on why coffee is good for you.",
)
print(result1)

Coffee is good for you because it contains several bioactive compounds, such as caffeine and chlorogenic acids, which have been extensively studied and shown to have various health benefits. Caffeine, in moderate amounts, can enhance alertness, improve cognitive function, and boost physical performance. Chlorogenic acids have antioxidant and anti-inflammatory properties, which can help protect against chronic diseases like heart disease and certain types of cancer. Numerous studies have also linked coffee consumption to a reduced risk of developing conditions such as type 2 diabetes, Parkinson's disease, and liver disease. However, it's important to note that individual responses to coffee can vary, and excessive consumption or added sugars and unhealthy additives can negate the potential benefits. As with any dietary choice, moderation and mindful consumption are key.


![Logs](./img/logs.png)

## Recording feedback

The above is all you need to save your app's traces to Parea! Try changing the functions or raising errors in the above code to see how they are visualized in [Parea](https://www.app.parea.ai/logs).

The trace_id is also useful for other things, such as monitoring user feedback. To get the trace_id from within the function context, use the get_current_trace_id() helper function.

Below, our `argument_chain2` function is identical to the previous one except that we return the trace_id for use outside the function context.

In [6]:
from parea import get_current_trace_id
from typing import Tuple


@trace
def argument_chain2(query: str, additional_description: str = "") -> Tuple[str, str]:
    trace_id = get_current_trace_id()  # get parent's trace_id
    argument = argument_generator(query, additional_description)
    criticism = critic(argument)
    return refiner(query, additional_description, argument, criticism), trace_id

In [7]:
result, trace_id = argument_chain2(
    "Whether coffee is good for you.",
    additional_description="Provide a concise, few sentence argument on why coffee is good for you.",
)
print(trace_id)

749c63be-74b4-4134-9a0e-a499c9d58f59


With the trace_id, you can now log feedback from a user after the run is completed. Feedback scores range from 0.0 (bad) to 1.0 (good).

In [8]:
from parea.schemas import FeedbackRequest

p.record_feedback(FeedbackRequest(trace_id=trace_id, score=0.7))
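User feedback rarely arrives as a float between 0.0 and 1.0, so it usually has to be mapped onto that range first. The sketch below shows one possible mapping for star ratings and thumbs up/down. Both helpers (`stars_to_score`, `thumbs_to_score`) are hypothetical and not part of the Parea SDK, and the linear scaling is an assumption you may want to change.

```python
def stars_to_score(stars: int, max_stars: int = 5) -> float:
    """Linearly map a 1..max_stars rating onto the 0.0-1.0 feedback range."""
    if max_stars < 2:
        raise ValueError("max_stars must be at least 2")
    if not 1 <= stars <= max_stars:
        raise ValueError(f"stars must be between 1 and {max_stars}")
    return (stars - 1) / (max_stars - 1)


def thumbs_to_score(thumbs_up: bool) -> float:
    """Map a thumbs up/down signal onto the two ends of the range."""
    return 1.0 if thumbs_up else 0.0


print(stars_to_score(4))       # 0.75
print(thumbs_to_score(False))  # 0.0
```

The resulting value can then be passed as the `score` of a `FeedbackRequest`, as in the cell above.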

![Feedback](./img/feedback.png)

The completion response from the SDK also contains other useful information. You can get statistics such as token counts, latency, whether the call was cached, and more.
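As a rough illustration of what these statistics make possible, the sketch below derives two simple metrics from a plain dict. The field names and sample values mirror the response printed further down in this notebook; `summarize_response` is a hypothetical helper, and treating `latency` as seconds is an assumption.

```python
def summarize_response(response: dict) -> dict:
    """Derive throughput and cost-per-1k-tokens from response statistics."""
    latency = response["latency"]  # assumed to be in seconds
    total_tokens = response["total_tokens"]
    return {
        "output_tokens_per_second": (
            response["output_tokens"] / latency if latency > 0 else None
        ),
        "cost_per_1k_tokens": (
            1000 * response["cost"] / total_tokens if total_tokens > 0 else None
        ),
    }


# Values copied from the sample output shown later in this notebook.
example = {
    "latency": 2.87,
    "input_tokens": 234,
    "output_tokens": 102,
    "total_tokens": 336,
    "cost": 0.0008,
}
summary = summarize_response(example)
print(summary)
```

With the real SDK, a dict of this shape can be obtained from the response via `asdict`, as the next cell does before printing.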

Let's return the CompletionResponse object and examine the response.

In [9]:
import json
from parea.schemas import CompletionResponse
from attrs import asdict


# let's return the full CompletionResponse to see what other information is returned
def refiner2(query: str, additional_description: str, current_arg: str, criticism: str) -> CompletionResponse:
    return p.completion(
        Completion(
            deployment_id="p-rEjM4X10rJomOD8Rj9gzJ",
            llm_inputs={
                "additional_description": additional_description,
                "date": f"{datetime.now()}",
                "query": query,
                "argument": current_arg,
                "criticism": criticism,
            },
        )
    )


@trace
def argument_chain3(query: str, additional_description: str = "") -> CompletionResponse:
    argument = argument_generator(query, additional_description)
    criticism = critic(argument)
    return refiner2(query, additional_description, argument, criticism)


result2 = argument_chain3(
    "Whether coffee is good for you.",
    additional_description="Provide a concise, few sentence argument on why coffee is good for you.",
)
print(json.dumps(asdict(result2), indent=2))

{
  "inference_id": "4a518e40-410c-4fc6-9521-d7df9295a50f",
  "content": "Coffee is good for you because numerous studies have shown its potential health benefits. For example, research has indicated that coffee consumption may lower the risk of certain diseases, including Parkinson's disease, liver disease, and certain types of cancer. Additionally, coffee contains caffeine, which can improve focus, alertness, and cognitive performance. Moreover, the antioxidants in coffee can help protect against oxidative stress and inflammation, which are linked to various health conditions. Therefore, incorporating coffee into a balanced diet can contribute to overall well-being.",
  "latency": 2.87,
  "input_tokens": 234,
  "output_tokens": 102,
  "total_tokens": 336,
  "cost": 0.0008,
  "model": "gpt-3.5-turbo-0613",
  "provider": "OpenAIProvider('gpt-3.5-turbo-0613')",
  "cache_hit": false,
  "status": "success",
  "start_timestamp": "2023-10-18 19:23:32 UTC",
  "end_timestamp": "2023-10-18 19:

## Enriching traces

One way to make your application traces more useful or actionable is to add tags or metadata to the logs. The trace decorator accepts additional properties such as:

- tags: List[str]
- metadata: Dict[str, str] - arbitrary key-value metadata
- target: str - a gold standard/expected output
- end_user_identifier: str - unique identifier for your end user

Below is an example. Note: you can also define these properties on the Completion object itself!

In [10]:
from typing import Tuple


# you can also add metadata and tags via the decorator
@trace(
    tags=["cookbook-example-deployed", "feedback_tracked-deployed"],
    metadata={"source": "python-sdk", "deployed": "True"},
)
def argument_chain_tags_metadata(query: str, additional_description: str = "") -> Tuple[CompletionResponse, str]:
    trace_id = get_current_trace_id()  # get parent's trace_id
    argument = argument_generator(query, additional_description)
    criticism = critic(argument)
    return refiner2(query, additional_description, argument, criticism), trace_id
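The metadata property is listed above as Dict[str, str], which is presumably why the example passes "True" as a string rather than a boolean. If your own metadata starts out with mixed value types, a small conversion step keeps it in that shape. The helper below is a hypothetical sketch, not part of the Parea SDK, and dropping `None` values is a design choice made for this illustration.

```python
from typing import Any, Dict


def stringify_metadata(raw: Dict[str, Any]) -> Dict[str, str]:
    """Coerce keys and values to str, skipping entries whose value is None."""
    return {str(key): str(value) for key, value in raw.items() if value is not None}


metadata = stringify_metadata(
    {"source": "python-sdk", "deployed": True, "retries": 3, "note": None}
)
print(metadata)
```

The result can be passed as the `metadata` argument of the decorator in place of a hand-written dict.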

In [11]:
import json
from attrs import asdict

result2, trace_id = argument_chain_tags_metadata(
    "Whether coffee is good for you.",
    additional_description="Provide a concise, few sentence argument on why coffee is good for you.",
)
print(json.dumps(asdict(result2), indent=2))

p.record_feedback(
    FeedbackRequest(
        trace_id=trace_id,
        score=0.7,  # 0.0 (bad) to 1.0 (good)
        target="Coffee is wonderful. End of story.",
    )
)

{
  "inference_id": "fb7910bd-0438-4f80-b128-a81b81f6e5a3",
  "content": "Coffee is good for you because it contains a high amount of antioxidants, which have been linked to various health benefits. While the argument acknowledges that causation cannot be definitively proven, numerous studies have shown a consistent association between coffee consumption and a reduced risk of chronic diseases, such as heart disease, certain cancers, and neurodegenerative disorders like Parkinson's disease. Additionally, moderate coffee consumption has been associated with improved cognitive function, increased metabolism, and a lower risk of developing conditions like type 2 diabetes. While individual differences and potential negative effects should be considered, the overall body of research suggests that coffee can be a beneficial addition to a healthy lifestyle.",
  "latency": 3.98,
  "input_tokens": 469,
  "output_tokens": 131,
  "total_tokens": 600,
  "cost": 0.0012,
  "model": "gpt-3.5-turbo-061

Now you can navigate to the detailed logs with the trace_id to see the additional data.

![MetaData](./img/meta_data.png)

You can see all of your logs on the main dashboard and filter, search, and sort by various criteria.

![Dashboard](./img/dashboard.png)

## Recap
In this walkthrough, you built an example LLM application and instrumented it using Parea's SDK.

You also added tags and metadata and even logged feedback to the logs. The SDK integrates wonderfully with your deployed prompts on Parea, keeping your code flexible and lightweight. Now you can iterate, debug, and monitor your application with ease.
