# Quickstart

This tutorial demonstrates how to use the `Baseten` model class in async mode to run language model-based evaluations with the Flow-Judge-v0.1 model deployed on Baseten. For detailed instructions on using Baseten, see the [Baseten readme](https://github.com/flowaicom/flow-judge/blob/main/flow_judge/models/adapters/baseten/README.md).

## Setup

Let's instantiate the `Baseten` model class in async mode. The async implementation uses Baseten's async inference approach; see the [Baseten async inference documentation](https://docs.baseten.co/invoke/async).

You can think of this as *fire-and-forget* functionality. Completion requests are sent to the deployed model. Once the data is processed and inference is complete, the output is sent to a predefined webhook, whose URL is part of the original request. The `Flow-Judge` library then connects to the webhook and *listens* for the response. This approach lets the library offer configurable concurrent execution.
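
The request/webhook flow described above can be pictured with a small, self-contained sketch. It does not call Baseten or the `Flow-Judge` library: `fake_model_endpoint`, `submit_and_listen`, and the in-memory `asyncio.Queue` standing in for the webhook are all hypothetical names chosen for illustration, and the real implementation may differ.

```python
import asyncio


async def fake_model_endpoint(request_id: str, prompt: str, webhook: asyncio.Queue) -> None:
    """Hypothetical stand-in for the deployed model.

    It "runs inference" and then delivers the output to the webhook
    instead of returning it to the caller.
    """
    await asyncio.sleep(0.01)  # simulated inference time
    await webhook.put({"request_id": request_id, "output": prompt.upper()})


async def submit_and_listen(prompts: list[str]) -> dict[str, str]:
    # In-memory queue used here as a stand-in for the webhook (assumption).
    webhook: asyncio.Queue = asyncio.Queue()

    # Fire: submit every request without waiting for its result.
    tasks = [
        asyncio.create_task(fake_model_endpoint(f"req-{i}", prompt, webhook))
        for i, prompt in enumerate(prompts)
    ]

    # Listen: collect one webhook delivery per submitted request.
    results: dict[str, str] = {}
    for _ in prompts:
        payload = await webhook.get()
        results[payload["request_id"]] = payload["output"]

    await asyncio.gather(*tasks)  # make sure every task has finished cleanly
    return results


def run_demo() -> dict[str, str]:
    return asyncio.run(submit_and_listen(["hello", "world"]))
```

In a plain Python script, `run_demo()` drives the sketch. Inside a notebook, where an event loop is already running, `await submit_and_listen(["hello", "world"])` would be used instead.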

Optionally, you can use the webhook proxy that Flow AI has deployed. It accepts this request signature and forwards the response to the client. The proxy is available at `https://proxy.flow-ai.dev`.

### Prerequisites

1. Sign up for [Baseten](https://www.baseten.co/).
2. Generate a Baseten API key in the [API keys settings](https://app.baseten.co/settings/api_keys).
3. Generate a webhook secret in the [secrets settings](https://app.baseten.co/settings/secrets).

### Additional Requirements

Set your Baseten API key, webhook secret, and GPU option as environment variables.

In [1]:
import os

os.environ["BASETEN_WEBHOOK_SECRET"] = "your_baseten_webhook_secret"
os.environ["BASETEN_API_KEY"] = "your_baseten_api_key"

# You can optionally switch the GPU to H100.
# This will deploy the FlowJudge model on H100 40GB
# A10G deployment is Flow-Judge-v0.1-AWQ
# H100 deployment is Flow-Judge-v0.1-FP8
# !! Manually changing the hardware on Baseten's UI may cause compatibility issues !!
os.environ["BASETEN_GPU"] = "A10G"
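
Before instantiating the model, it can help to confirm that the required variables are actually present. The following is a minimal sketch, not part of the `Flow-Judge` library: `missing_env_vars` and `REQUIRED_VARS` are hypothetical names, and only the two variable names set in the cell above are assumed to be required.

```python
import os

# Variable names taken from the cell above; treating both as required is an assumption.
REQUIRED_VARS = ("BASETEN_API_KEY", "BASETEN_WEBHOOK_SECRET")


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_env_vars()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

The optional `env` argument accepts any mapping, which makes the check easy to try out without touching the real environment.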

### Instantiate the Baseten model

Set the following required options to run the `Baseten` model class in async execution mode:
1. `exec_async=True`
2. `webhook_proxy_url="https://proxy.flow-ai.dev"` (or [run the proxy locally](https://github.com/flowaicom/flow-judge/blob/main/flow_judge/models/adapters/baseten/README.md))

Optionally, set `async_batch_size` to a value greater than 0 (default: `128`). This is the number of concurrent requests sent to the deployed model. It should match the concurrency you want to achieve, which can be configured in Baseten's UI; see the [Baseten concurrency documentation](https://docs.baseten.co/performance/concurrency). By default, our deployment configuration sets a concurrency target of `128` and a maximum of `1` replica, so a single replica accepts up to `128` concurrent requests. The batch size you set on the `Baseten` model class should equal `concurrency_target * number_of_replicas`.
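
The sizing rule above is simple arithmetic, sketched here as a small helper. `recommended_batch_size` is a hypothetical name used only for illustration; the library does not necessarily expose such a function.

```python
def recommended_batch_size(concurrency_target: int, number_of_replicas: int) -> int:
    """Batch size matching the deployment: concurrency_target * number_of_replicas."""
    if concurrency_target < 1 or number_of_replicas < 1:
        raise ValueError("concurrency_target and number_of_replicas must both be >= 1")
    return concurrency_target * number_of_replicas


# Default deployment described above: concurrency target 128, one replica.
default_batch_size = recommended_batch_size(128, 1)

# If the deployment were scaled to two replicas, the batch size would double.
scaled_batch_size = recommended_batch_size(128, 2)
```

The resulting value is what you would pass as `async_batch_size` when constructing the `Baseten` model.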

In [None]:
from flow_judge import Baseten, AsyncFlowJudge
from flow_judge.metrics import RESPONSE_FAITHFULNESS_5POINT

# Async model execution
model = Baseten(
    webhook_proxy_url="https://proxy.flow-ai.dev",
    exec_async=True,
)

# Instantiate the Async Judge with the model and a metric
# The library includes multiple default metrics and you can implement your own.
faithfulness_judge = AsyncFlowJudge(
    metric=RESPONSE_FAITHFULNESS_5POINT,
    model=model,
)

## Running Evaluations

Let's run batched evaluations on our example CSR data using the 5-point Likert faithfulness metric.

We use the `async_batch_evaluate` method of the `AsyncFlowJudge` class. Under the hood, it processes the inputs in batches whose size is set by the `async_batch_size` argument of the `Baseten` model class. If some requests fail, for example because of networking issues, the batch still completes: the errors are written to the logs, and the output contains the successful responses.
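
The behaviour described above (process in batches, log failures, keep the successes) can be illustrated with a generic `asyncio` sketch. This is not the library's implementation: `process_in_batches` and `flaky_worker` are hypothetical names, and the simulated failure stands in for a real networking error.

```python
import asyncio
import logging

logger = logging.getLogger("batch_sketch")


async def process_in_batches(items, worker, batch_size):
    """Run `worker` over `items` in concurrent batches, keeping only successes."""
    successes = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        # return_exceptions=True makes gather return exceptions as values
        # instead of aborting the whole batch on the first failure.
        outcomes = await asyncio.gather(
            *(worker(item) for item in batch), return_exceptions=True
        )
        for item, outcome in zip(batch, outcomes):
            if isinstance(outcome, Exception):
                logger.error("item %r failed: %s", item, outcome)
            else:
                successes.append(outcome)
    return successes


async def flaky_worker(n: int) -> int:
    """Hypothetical worker that fails for every multiple of 3."""
    await asyncio.sleep(0)
    if n % 3 == 0:
        raise ConnectionError(f"simulated network failure for {n}")
    return n * 10


def run_demo():
    return asyncio.run(process_in_batches(list(range(1, 8)), flaky_worker, batch_size=3))
```

In this sketch, failed items are logged and dropped, so the returned list can be shorter than the input list. Whether the real method preserves positions for failed inputs is not something this example asserts.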

In [None]:
import json

from flow_judge import EvalInput

# Read the sample data
with open("sample_data/csr_assistant.json", "r") as f:
    data = json.load(f)

# Create a list of inputs and outputs
inputs_batch = [
    [
        {"query": sample["query"]},
        {"context": sample["context"]},
    ]
    for sample in data
]
outputs_batch = [{"response": sample["response"]} for sample in data]

# Create a list of EvalInput objects, pairing each input with its output
eval_inputs_batch = [
    EvalInput(inputs=inputs, output=output)
    for inputs, output in zip(inputs_batch, outputs_batch)
]

# Run the batch evaluation
results = await faithfulness_judge.async_batch_evaluate(eval_inputs_batch, save_results=False)

In [None]:
from IPython.display import Markdown, display

# Visualizing the results
for i, result in enumerate(results):
    display(Markdown(f"__Sample {i+1}:__"))
    display(Markdown(f"__Feedback:__\n{result.feedback}\n\n__Score:__\n{result.score}"))
    display(Markdown("---"))

Similarly, you can run a single evaluation with the `async_evaluate` method of the `AsyncFlowJudge` class. Under the hood, this sends a single async request and attaches a listener to the webhook for the response.

In [None]:
result = await faithfulness_judge.async_evaluate(eval_inputs_batch[0], save_results=False)

In [None]:
# Display the result
display(Markdown(f"__Feedback:__\n{result.feedback}\n\n__Score:__\n{result.score}"))