# 🏃‍♀️ Quickstart

Get started using Weave to:
- Log and debug language model inputs, outputs, and traces
- Build rigorous, apples-to-apples evaluations for language model use cases
- Organize all the information generated across the LLM workflow, from experimentation to evaluations to production

See the full Weave documentation [here](https://wandb.me/weave).


## 🪄 Install the `weave` library and log in


Start by installing the library and logging in to your account.

In this example, we're using OpenAI, so you should [add an OpenAI API key](https://platform.openai.com/docs/quickstart/step-2-setup-your-api-key).
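
If you rely on environment variables rather than an interactive login, a small pre-flight check can catch a missing key early. This is a minimal sketch: `missing_env_vars` is a hypothetical helper, not part of `wandb` or `weave`, and the variable names are the ones that appear commented out in this section's setup cell.

```python
import os

# Names taken from the commented-out lines in this section's setup cell.
REQUIRED_VARS = ("OPENAI_API_KEY", "WANDB_API_KEY")


def missing_env_vars(env=os.environ, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Set these before running the notebook:", ", ".join(missing))
```

If you have already logged in with `wandb login`, the W&B key may not need to be set as an environment variable.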


In [1]:
import os
import wandb

#========================================
# Set up your environment variables properly
#========================================
# os.environ["WANDB_BASE_URL"] = "https://api.wandb.ai"
# os.environ["OPENAI_API_KEY"] = ""
# os.environ["WANDB_API_KEY"] = ""

wandb.login()
PROJECT = "keisuke-kamata/weave-handson-kamata"

wandb: Currently logged in as: keisuke-kamata to https://api.wandb.ai. Use `wandb login --relogin` to force relogin


# Track inputs & outputs of functions

Weave allows users to track function calls: the code, inputs, outputs, and even LLM tokens & costs! In the following sections we will cover:

* Custom Functions
* Vendor Integrations
* Nested Function Calling
* Error Tracking

Note: in all cases, we will:

```python
import weave                    # import the weave library
weave.init('project-name')      # initialize tracking for a specific W&B project
```

## Track custom functions

Add the `@weave.op` decorator to the functions you want to track.

In [3]:
import weave

weave.init(PROJECT)

@weave.op()
def strip_user_input(user_input):
    return user_input.strip()


result = strip_user_input("    hello    ")
print(result)

weave version 0.51.54 is available!  To upgrade, please run:
 $ pip install weave --upgrade
Logged in as Weights & Biases user: keisuke-kamata.
View Weave data at https://wandb.ai/keisuke-kamata/weave-handson-kamata/weave
hello
🍩 https://wandb.ai/keisuke-kamata/weave-handson-kamata/r/call/0197864f-a1e1-7a72-b9dc-a14d7c003593


You can find your interactive dashboard by clicking any of the 👆 wandb links above.

After adding `weave.op` and calling the function, visit the link and see it tracked within your project.

💡 Weave automatically tracks your code. Have a look at the code tab!
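
To build intuition for what tracking a function call involves, here is a toy decorator that records each call's inputs and output in a list. It is an illustration only: `toy_op` and `CALL_LOG` are hypothetical names, and this is not how `weave.op` is implemented (Weave also captures the code and sends the data to your project).

```python
import functools
import inspect

CALL_LOG = []  # hypothetical in-memory stand-in for a tracing backend


def toy_op(fn):
    """Record the inputs and output of each call (illustration only)."""
    signature = inspect.signature(fn)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # Map positional and keyword arguments to parameter names.
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        output = fn(*args, **kwargs)
        CALL_LOG.append(
            {"op": fn.__name__, "inputs": dict(bound.arguments), "output": output}
        )
        return output

    return wrapper


@toy_op
def strip_user_input(user_input):
    return user_input.strip()


strip_user_input("    hello    ")
print(CALL_LOG[0])
```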

## Vendor Integrations (OpenAI, Anthropic, Mistral, etc...)

Here, Weave automatically tracks all calls to `openai`. Weave tracks many LLM libraries out of the box, and it's easy to add support for whatever LLM you're using, as you'll see below.

In [4]:
from openai import OpenAI
import weave

weave.init(PROJECT)

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    messages=[
        {
            "role": "system",
            "content": "You are a grammar checker, correct the following user input.",
        },
        {"role": "user", "content": "That was so easy, it was a piece of pie!"},
    ],
    temperature=0,
)
generation = response.choices[0].message.content
print(generation)

🍩 https://wandb.ai/keisuke-kamata/weave-handson-kamata/r/call/0197864f-b0a3-7c43-9eb2-4b0e2aedfd0a
That was so easy, it was a piece of cake!


## Track nested functions

Now that you've seen the basics, let's combine all of the above and track some deeply nested functions alongside LLM calls.



In [5]:
from openai import OpenAI

import weave

weave.init(PROJECT)


@weave.op()
def strip_user_input(user_input):
    return user_input.strip()


@weave.op()
def correct_grammar(user_input):
    client = OpenAI()

    stripped = strip_user_input(user_input)
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        messages=[
            {
                "role": "system",
                "content": "You are a grammar checker, correct the following user input.",
            },
            {"role": "user", "content": stripped},
        ],
        temperature=0,
    )
    return response.choices[0].message.content


result = correct_grammar("   That was so easy, it was a piece of pie!    ")
print(result)

🍩 https://wandb.ai/keisuke-kamata/weave-handson-kamata/r/call/0197864f-be17-7030-a122-8aa3a1584369
That was so easy, it was a piece of cake!


## Track Errors

Whenever your code crashes, Weave highlights what caused the issue. This is especially useful for finding problems such as the JSON parsing errors that occasionally occur when parsing data from LLM responses.

In [6]:
import json
from openai import OpenAI
import weave

weave.init(PROJECT)

@weave.op()
def strip_user_input(user_input):
    return user_input.strip()


@weave.op()
def correct_grammar(user_input):
    client = OpenAI()

    stripped = strip_user_input(user_input)
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        messages=[
            {
                "role": "system",
                "content": "You are a grammar checker, correct the following user input.",
            },
            {"role": "user", "content": stripped},
        ],
        temperature=0,
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)


result = correct_grammar("   That was so easy, it was a piece of pie!    ")
print(result)

🍩 https://wandb.ai/keisuke-kamata/weave-handson-kamata/r/call/0197864f-c96a-7c02-a424-864425cfc430


BadRequestError: Error code: 400 - {'error': {'message': "'messages' must contain the word 'json' in some form, to use 'response_format' of type 'json_object'.", 'type': 'invalid_request_error', 'param': 'messages', 'code': None}}
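
The 400 error above is raised by the API before any parsing happens. As the message says, the prompt must mention the word "json" when `response_format` is `json_object`. Once a reply does come back, parsing can still fail, so a defensive wrapper is a common pattern. The sketch below uses only the standard library, and `parse_json_reply` is a hypothetical helper name.

```python
import json


def parse_json_reply(raw_reply):
    """Return (parsed, error): a dict on success, an error message on failure."""
    try:
        parsed = json.loads(raw_reply)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    if not isinstance(parsed, dict):
        return None, f"expected a JSON object, got {type(parsed).__name__}"
    return parsed, None


print(parse_json_reply('{"corrected": "a piece of cake"}'))
print(parse_json_reply("Sure! Here is the corrected text."))
```

Returning the error instead of raising keeps the failure visible in the function's output, which is also what a tracked call would record.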

# Advanced Tracing Tips

* Customize logged inputs and outputs
* Control sampling rate
* Set a call display name
* Redact PII

### Customize logged inputs and outputs

In [None]:
from dataclasses import dataclass
from typing import Any
import weave

@dataclass
class CustomObject:
    x: int
    secret_password: str

def postprocess_inputs(inputs: dict[str, Any]) -> dict[str, Any]:
    return {k: v for k, v in inputs.items() if k != "hide_me"}

def postprocess_output(output: CustomObject) -> CustomObject:
    return CustomObject(x=output.x, secret_password="REDACTED")

@weave.op(
    postprocess_inputs=postprocess_inputs,
    postprocess_output=postprocess_output,
)
def func(a: int, hide_me: str) -> CustomObject:
    return CustomObject(x=a, secret_password=hide_me)

weave.init(PROJECT)
func(a=1, hide_me="password123")

### Control sampling rate

You can control how frequently an op's calls are traced by setting the `tracing_sample_rate` parameter in the `@weave.op` decorator. This is useful for high-frequency ops where you only need to trace a subset of calls.

Note that sampling rates are only applied to root calls. If an op has a sample rate, but is called by another op first, then that sampling rate will be ignored.

In [None]:
@weave.op(tracing_sample_rate=0.1)  # Only trace ~10% of calls
def high_frequency_op(x: int) -> int:
    return x + 1

@weave.op(tracing_sample_rate=1.0)  # Always trace (default)
def always_traced_op(x: int) -> int:
    return x + 1
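
To see what a sample rate means in practice, the sketch below simulates the sampling decision with a seeded random number generator. It assumes each root call is sampled independently with probability equal to the rate; `should_trace` is a hypothetical helper, not part of Weave.

```python
import random


def should_trace(sample_rate, rng):
    """Return True with probability sample_rate (0.0 to 1.0)."""
    return rng.random() < sample_rate


rng = random.Random(0)  # fixed seed so the sketch is repeatable
calls = 10_000
traced = sum(should_trace(0.1, rng) for _ in range(calls))
print(f"traced {traced} of {calls} calls")
```

The traced count lands near 10% of the calls rather than exactly on it, which is why the cell above describes the rate as approximate.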

### Call Display Name

In [None]:
import weave

weave.init(PROJECT)
# Decorate your function
@weave.op
def my_function(name: str):
    return f"Hello, {name}!"

# Call your function -- Weave will automatically track inputs and outputs
print(my_function("World"))

In [None]:
# 1st method
weave.init(PROJECT)
result = my_function("World", __weave={"display_name": "My Custom Display Name"})

In [None]:
# 2nd method
weave.init(PROJECT)
result, call = my_function.call("World")
call.set_display_name("My Custom Display Name")


In [None]:
# 3rd method
@weave.op(call_display_name="My Custom Display Name")
def my_function(name: str):
    return f"Hello, {name}!"
weave.init(PROJECT)
my_function("World")


### Redacting PII

Some organizations process Personally Identifiable Information (PII) such as names, phone numbers, and email addresses in their Large Language Model (LLM) workflows. Storing this data in Weights & Biases (W&B) Weave poses compliance and security risks.

The Sensitive Data Protection feature lets you automatically redact PII from a trace before it is sent to Weave servers. The feature integrates Microsoft Presidio into the Weave Python SDK, so you control redaction settings at the SDK level.

[Detailed Documentation](https://weave-docs.wandb.ai/guides/tracking/redact-pii)
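
To illustrate the idea of redacting text before it leaves your process, here is a deliberately simple regex-based sketch. It is not Presidio and not Weave's Sensitive Data Protection feature: `redact_pii` and both patterns are hypothetical, and they only catch email addresses and US-style phone numbers.

```python
import re

# Deliberately simple patterns for illustration; real PII detection needs far more care.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact_pii(text):
    """Replace email addresses and US-style phone numbers with placeholders."""
    text = EMAIL_RE.sub("<EMAIL>", text)
    return PHONE_RE.sub("<PHONE>", text)


print(redact_pii("Reach Jane at jane.doe@example.com or 555-123-4567."))
```

Note that the name "Jane" passes through untouched. Catching names and other free-form entities is the kind of task that calls for a dedicated detection library rather than hand-written patterns.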

# Tracking Objects

Organizing experimentation is difficult when there are many moving pieces. You can capture and organize the experimental details of your app, such as your system prompt or the model you're using, within `weave.Object`s. This helps you organize and compare different iterations of your app. In this section, we will cover:

* Prompt Tracking
* Model Tracking
* Dataset Tracking
* Retrieving Published Objects & Ops

## Prompt Tracking


Weave is unopinionated about how a prompt is constructed. If your needs are simple, you can use the built-in `weave.StringPrompt` or `weave.MessagesPrompt` classes. If your needs are more complex, you can subclass those or the base class `weave.Prompt` and override the `format` method.

When you publish one of these objects with `weave.publish`, it appears in your Weave project on the "Prompts" page.


In [7]:
# StringPrompt1
weave.init(PROJECT)

system_prompt = weave.StringPrompt("You are a pirate")
weave.publish(system_prompt, name="pirate_prompt")

from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    {
      "role": "system",
      "content": system_prompt.format()
    },
    {
      "role": "user",
      "content": "Explain general relativity in one paragraph."
    }
  ],
)

📦 Published to https://wandb.ai/keisuke-kamata/weave-handson-kamata/weave/objects/pirate_prompt/versions/vw0B516qfp9QzLXdYuVMyg2qcQOpAAyO5NxcDYN9By0
🍩 https://wandb.ai/keisuke-kamata/weave-handson-kamata/r/call/01978651-66de-77e0-a978-54c14060ce7c


In [8]:
# StringPrompt2
weave.init(PROJECT)

system_prompt = weave.StringPrompt("Talk like a pirate. I need to know I'm listening to a pirate.")
weave.publish(system_prompt, name="pirate_prompt")

from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    {
      "role": "system",
      "content": system_prompt.format()
    },
    {
      "role": "user",
      "content": "Explain general relativity in one paragraph."
    }
  ],
)


📦 Published to https://wandb.ai/keisuke-kamata/weave-handson-kamata/weave/objects/pirate_prompt/versions/k7ha9GuhcoHmXkWTDskhVrh5cBlGakVl4Y6Cj6UJz0o
🍩 https://wandb.ai/keisuke-kamata/weave-handson-kamata/r/call/01978651-7ad0-7ae0-8ec3-8082b5946f5a


In [9]:
# MessagesPrompt1
weave.init(PROJECT)


prompt = weave.MessagesPrompt([
    {
        "role": "system",
        "content": "You are a stegosaurus, but don't be too obvious about it."
    },
    {
        "role": "user",
        "content": "What's good to eat around here?"
    }
])
weave.publish(prompt, name="dino_prompt")

from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
  model="gpt-4o",
  messages=prompt.format(),
)

📦 Published to https://wandb.ai/keisuke-kamata/weave-handson-kamata/weave/objects/dino_prompt/versions/mhaLGywDiHMP5fPncCOVCVLByRKXbOVGwVbBBqwM2d0
🍩 https://wandb.ai/keisuke-kamata/weave-handson-kamata/r/call/01978651-919e-7e42-9e41-57469ab5db3a


In [10]:
# parameterizing prompts

weave.init(PROJECT)

prompt = weave.StringPrompt("Solve the equation {equation}")
weave.publish(prompt, name="calculator_prompt")

from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    {
      "role": "user",
      "content": prompt.format(equation="1 + 1 = ?")
    }
  ],
)

📦 Published to https://wandb.ai/keisuke-kamata/weave-handson-kamata/weave/objects/calculator_prompt/versions/IlIaTulvlYDYMNJ8Z3ho6jwEl7Q5uFXgXylunHDc2tg
🍩 https://wandb.ai/keisuke-kamata/weave-handson-kamata/r/call/01978651-a708-7d62-a538-cba3d139fc7e
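
The `{equation}` placeholder in the cell above looks like Python's `str.format` syntax. Assuming the prompt classes follow that convention (an assumption, not confirmed here), the sketch below uses a plain string to show how to list the placeholders a template expects and what happens when a value is missing.

```python
import string

template = "Solve the equation {equation}"

# List the placeholder names the template expects before formatting it.
fields = [name for _, name, _, _ in string.Formatter().parse(template) if name]
print(fields)

print(template.format(equation="1 + 1 = ?"))

# With plain str.format, a missing placeholder value raises KeyError.
try:
    template.format()
except KeyError as exc:
    print("missing value for:", exc)
```

Checking the placeholder names before formatting can make it easier to spot a mismatch between a published prompt version and the arguments your code passes.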


## Model Tracking

Models are such a common object type that Weave has a dedicated class for them: `weave.Model`. The only requirement is that you define a `predict` method.

In [13]:
from openai import OpenAI

import weave

weave.init(PROJECT)

class OpenAIGrammarCorrector(weave.Model):
    # Properties are entirely user-defined
    openai_model_name: str
    system_message: str

    @weave.op()
    def predict(self, user_input):
        client = OpenAI()

        response = client.chat.completions.create(
            model=self.openai_model_name,
            messages=[
                {"role": "system", "content": self.system_message},
                {"role": "user", "content": user_input},
            ],
        )
        return response.choices[0].message.content

corrector = OpenAIGrammarCorrector(
    openai_model_name="o3-mini-2025-01-31",
    system_message="You are a grammar checker, correct the following user input.",
)


result = corrector.predict("     That was so easy, it was a piece of pie!       ")
print(result)

🍩 https://wandb.ai/keisuke-kamata/weave-handson-kamata/r/call/01978652-56c8-7f61-b0da-976c663b38f4
That was so easy; it was a piece of pie!


## Dataset Tracking

Similar to models, a `weave.Dataset` object exists to help you track, organize, and operate on datasets.

In [14]:
dataset = weave.Dataset(
    name="grammar-correction",
    rows=[
        {
            "user_input": "   That was so easy, it was a piece of pie!   ",
            "expected": "That was so easy, it was a piece of cake!",
        },
        {"user_input": "  I write good   ",
         "expected": "I write well"},
        {
            "user_input": "  GPT-3 is smartest AI model.   ",
            "expected": "GPT-3 is the smartest AI model.",
        },
    ],
)

weave.publish(dataset)

📦 Published to https://wandb.ai/keisuke-kamata/weave-handson-kamata/weave/objects/grammar-correction/versions/ufehp2WGGKN38xlpCxYy9Zv4wgVw0GWWleYBvZDjOf4


ObjectRef(entity='keisuke-kamata', project='weave-handson-kamata', name='grammar-correction', _digest='ufehp2WGGKN38xlpCxYy9Zv4wgVw0GWWleYBvZDjOf4', _extra=())

Notice that `weave.publish` saved a versioned `grammar-correction` dataset object, as the `ObjectRef` above shows. Published models are versioned the same way, capturing the configurations you're experimenting with.

## Retrieve Published Objects & Ops

You can publish objects and then retrieve them in your code. You can even call functions from your retrieved objects!

In [None]:
import weave

weave.init(PROJECT)
ref_url = ""
prompt = weave.ref(ref_url).get()

print(prompt)

# Offline Evaluation



## Method 1: [Standard Method](https://weave-docs.wandb.ai/guides/core-types/evaluations)
This method requires you to run prediction and evaluation together, sample by sample.

In [15]:
# Method1
import weave
from weave import Evaluation

# Our dataset has "input_text" but our model expects "question"
examples = [
    {"input_text": "What is the capital of France?", "expected": "Paris"},
    {"input_text": "Who wrote 'To Kill a Mockingbird'?", "expected": "Harper Lee"},
    {"input_text": "What is the square root of 64?", "expected": "8"},
]

@weave.op()
def preprocess_example(example):
    # Rename input_text to question
    return {
        "question": example["input_text"]
    }

@weave.op()
def match_score(expected: str, output: dict) -> dict:
    return {'match': expected == output['generated_text']}

@weave.op()
def function_to_evaluate(question: str):
    return {'generated_text': f'Answer to: {question}'}

# Create evaluation with preprocessing
evaluation = weave.Evaluation(
    dataset=examples,
    scorers=[match_score],
    preprocess_model_input=preprocess_example
)

# Initialize the Weave project
weave.init(PROJECT)

# In Jupyter/Colab, use 'await' instead of 'asyncio.run'
async def run_eval():
    await evaluation.evaluate(function_to_evaluate)

await run_eval()




🍩 https://wandb.ai/keisuke-kamata/weave-handson-kamata/r/call/01978652-ca87-7b60-b920-190fd8609a7f
🍩 https://wandb.ai/keisuke-kamata/weave-handson-kamata/r/call/01978652-d0bc-70a2-8ff7-c15b49dc1870
🍩 https://wandb.ai/keisuke-kamata/weave-handson-kamata/r/call/01978652-f376-7bb0-9bea-66b6bb35fd9b
🍩 https://wandb.ai/keisuke-kamata/weave-handson-kamata/r/call/01978653-09e4-7db3-b292-db9c2715b13c
🍩 https://wandb.ai/keisuke-kamata/weave-handson-kamata/r/call/01978653-0da5-72d1-95ae-550aaa4aada4
🍩 https://wandb.ai/keisuke-kamata/weave-handson-kamata/r/call/01978653-0fa6-7e31-847b-c539a31f6f45
🍩 https://wandb.ai/keisuke-kamata/weave-handson-kamata/r/call/01978653-0e1c-77b1-95ce-4ab917fd7c38
🍩 https://wandb.ai/keisuke-kamata/weave-handson-kamata/r/call/01978653-1ce7-76b2-b3ca-c0ec70e73424
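
To make the data flow of the evaluation above concrete, here is a plain-Python walk-through of the same steps: preprocess, predict, score, then aggregate. It reuses the functions from the cell without the `weave.op` decorators. It is a sketch of the logic only, not of how Weave schedules or logs the work.

```python
examples = [
    {"input_text": "What is the capital of France?", "expected": "Paris"},
    {"input_text": "Who wrote 'To Kill a Mockingbird'?", "expected": "Harper Lee"},
    {"input_text": "What is the square root of 64?", "expected": "8"},
]


def preprocess_example(example):
    # Rename input_text to question, as in the cell above.
    return {"question": example["input_text"]}


def function_to_evaluate(question):
    return {"generated_text": f"Answer to: {question}"}


def match_score(expected, output):
    return {"match": expected == output["generated_text"]}


# Per example: preprocess -> predict -> score, then aggregate.
scores = []
for example in examples:
    model_input = preprocess_example(example)
    output = function_to_evaluate(**model_input)
    scores.append(match_score(example["expected"], output)["match"])

match_rate = sum(scores) / len(scores)
print(f"match rate: {match_rate:.2f}")
```

Because the placeholder model only echoes the question, no output equals its expected answer and the match rate is zero. Swapping in a real model is what makes the scorer informative.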


## Method 2: [EvaluationLogger](https://weave-docs.wandb.ai/guides/evaluation/evaluation_logger)
This method lets you run batch predictions first!

In [16]:
import weave
from weave.flow.eval_imperative import EvaluationLogger

# Initialize the logger with optional metadata
eval_logger = EvaluationLogger(
    model="my_local_model",
    dataset="my_dataset"
)

# Example input data
eval_samples = [
    {'inputs': {'a': 1, 'b': 2}, 'expected': 3},
    {'inputs': {'a': 2, 'b': 3}, 'expected': 5},
    {'inputs': {'a': 3, 'b': 4}, 'expected': 7},
]

# Local model logic: simply add the numbers
@weave.op
def user_model(a: int, b: int) -> int:
    return a + b

# Evaluate each sample
for sample in eval_samples:
    inputs = sample["inputs"]
    model_output = user_model(**inputs)  # Call model with unpacked input

    # Log the prediction
    pred_logger = eval_logger.log_prediction(
        inputs=inputs,
        output=model_output
    )

    # Compare output with expected value
    expected = sample["expected"]
    correctness_score = model_output == expected
    pred_logger.log_score(
        scorer="correctness",
        score=correctness_score
    )

    # Finalize log for this prediction
    pred_logger.finish()

# Log overall evaluation summary
summary_stats = {"subjective_overall_score": 1.0}
eval_logger.log_summary(summary_stats)

print("Evaluation logging complete. View results in the Weave UI.")


Evaluation logging complete. View results in the Weave UI.
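
The cell above logs a hardcoded summary value. One alternative is to compute the summary from the per-sample results. The sketch below shows that computation in plain Python, reusing the sample data and model logic from the cell. `summarize` is a hypothetical helper, and the metric names are illustrative.

```python
eval_samples = [
    {"inputs": {"a": 1, "b": 2}, "expected": 3},
    {"inputs": {"a": 2, "b": 3}, "expected": 5},
    {"inputs": {"a": 3, "b": 4}, "expected": 7},
]


def user_model(a, b):
    return a + b


def summarize(samples):
    """Return accuracy and sample count from exact-match comparisons."""
    correct = [user_model(**s["inputs"]) == s["expected"] for s in samples]
    return {"accuracy": sum(correct) / len(correct), "num_samples": len(correct)}


summary_stats = summarize(eval_samples)
print(summary_stats)
```

A dictionary like this could then be passed to `eval_logger.log_summary` in place of the fixed score.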


# Online Evaluation

# Feedback

In [None]:
import weave
client = weave.init(PROJECT)

call = client.get_call("") #@param

# Adding an emoji reaction
call.feedback.add_reaction("👍")

# Adding a note
call.feedback.add_note("this is a note")

# Adding custom key/value pairs.
# The first argument is a user-defined "type" string.
# Feedback must be JSON serializable and less than 1 KB when serialized.
call.feedback.add("correctness", { "value": 5 })
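
The comment in the cell above states two constraints on custom feedback: it must be JSON serializable and under 1 KB when serialized. A local check before sending can surface violations early. This sketch assumes the size is measured as UTF-8 bytes of the `json.dumps` output, which may differ from how the server measures it. `check_feedback_payload` is a hypothetical helper.

```python
import json

MAX_FEEDBACK_BYTES = 1024  # "less than 1 KB", per the comment in the cell above


def check_feedback_payload(payload):
    """Return the serialized size in bytes, or raise if the payload is unusable."""
    try:
        serialized = json.dumps(payload)
    except (TypeError, ValueError) as exc:
        raise ValueError(f"payload is not JSON serializable: {exc}") from exc
    size = len(serialized.encode("utf-8"))
    if size >= MAX_FEEDBACK_BYTES:
        raise ValueError(f"payload is {size} bytes; limit is {MAX_FEEDBACK_BYTES}")
    return size


print(check_feedback_payload({"value": 5}))
```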

# Scorers

## Getting started

In [17]:
import weave
from weave import Scorer

weave.init(PROJECT)

class LengthScorer(Scorer):
    @weave.op
    def score(self, output: str) -> dict:
        """A simple scorer that checks output length."""
        return {
            "length": len(output),
            "is_short": len(output) < 100
        }

@weave.op
def generate_text(prompt: str) -> str:
    return "Hello, world!"

# Get both result and Call object
result, call = generate_text.call("Say hello")

# Now you can apply scorers
await call.apply_scorer(LengthScorer())

🍩 https://wandb.ai/keisuke-kamata/weave-handson-kamata/r/call/01978652-f374-7551-be08-fb85796672cd


ApplyScorerSuccess(result={'length': 13, 'is_short': True}, score_call=Call(_op_name=<Future at 0x118fe91d0 state=finished returned str>, trace_id='01978652-f375-7430-bfe7-2531320fa563', project_id='keisuke-kamata/weave-handson-kamata', parent_id=None, inputs={'self': ObjectRef(entity='keisuke-kamata', project='weave-handson-kamata', name='LengthScorer', _digest='XVVxYVRSCw5MDmba2zOIvw2Qaisk0ElPyOZMw1n3WJE', _extra=()), 'output': 'Hello, world!'}, id='01978652-f376-7bb0-9bea-66b6bb35fd9b', output={'length': 13, 'is_short': True}, exception=None, summary={}, _display_name=None, attributes=AttributesDict({'weave': {'client_version': '0.51.44', 'source': 'python-sdk', 'sys_version': '3.13.2 (main, Feb  4 2025, 14:51:09) [Clang 16.0.0 (clang-1600.0.26.6)]', 'os_name': 'Darwin', 'os_version': 'Darwin Kernel Version 24.5.0: Tue Apr 22 19:54:25 PDT 2025; root:xnu-11417.121.6~2/RELEASE_ARM64_T6020', 'os_release': '24.5.0'}}), started_at=None, ended_at=datetime.datetime(2025, 6, 19, 3, 54, 32, 

🍩 https://wandb.ai/keisuke-kamata/weave-handson-kamata/r/call/01978653-09e2-7691-b64a-f0a4f2db3c95
🍩 https://wandb.ai/keisuke-kamata/weave-handson-kamata/r/call/01978653-0fa8-7a50-9e9b-ac754328daad
🍩 https://wandb.ai/keisuke-kamata/weave-handson-kamata/r/call/01978653-1cee-7190-899d-a247e2513aa2
🍩 https://wandb.ai/keisuke-kamata/weave-handson-kamata/r/call/01978653-1ced-79f2-bcab-a6ff740ca040


## Using Scorers as Guardrails
Guardrails act as safety checks that run before allowing LLM output to reach users. Here's a practical example:

In [18]:
import weave
from weave import Scorer
import asyncio
import nest_asyncio  # Required for running asyncio in Google Colab

# Apply nest_asyncio to avoid event loop issues in Google Colab
nest_asyncio.apply()

# ==== 1. Define text generation function ====
@weave.op
def generate_text(prompt: str) -> str:
    """Simulated LLM text generation (basic logic)."""
    responses = {
        "hello": "Hello! How can I help you?",
        "bad": "You are terrible!",  # Example of toxic response
        "good": "You are wonderful!",
    }
    return responses.get(prompt.lower(), "I don't understand your request.")

# ==== 2. Define the Toxicity Scorer ====
class ToxicityScorer(Scorer):
    @weave.op
    def score(self, output: str) -> dict:
        """
        Evaluate the generated content for toxic language.
        """
        toxic_words = {"terrible", "hate", "stupid"}  # Simple keyword-based detection
        flagged = any(word in output.lower() for word in toxic_words)

        return {
            "flagged": flagged,
            "reason": "Detected toxic language" if flagged else None
        }

# ==== 3. Function to generate safe responses ====
async def generate_safe_response(prompt: str) -> str:
    # Generate text using LLM
    result, call = generate_text.call(prompt)

    # Apply toxicity scoring
    safety = await call.apply_scorer(ToxicityScorer())

    # If flagged as toxic, return a warning message
    if safety.result["flagged"]:
        return f"I cannot generate that content: {safety.result['reason']}"

    return result

# ==== 4. Run test cases ====
async def main():
    prompts = ["hello", "bad", "good"]

    for prompt in prompts:
        response = await generate_safe_response(prompt)
        print(f"Prompt: {prompt}\nResponse: {response}\n")

# Run the async function in Google Colab
loop = asyncio.get_event_loop()
loop.run_until_complete(main())


Prompt: hello
Response: Hello! How can I help you?

Prompt: bad
Response: I cannot generate that content: Detected toxic language

Prompt: good
Response: You are wonderful!
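
One caveat with the keyword check in `ToxicityScorer` is that `word in output.lower()` matches substrings, so harmless words that happen to contain a flagged word are caught too. The sketch below contrasts that with a whole-word check. Both function names are hypothetical, and either version is far simpler than a production toxicity detector.

```python
import re

TOXIC_WORDS = {"terrible", "hate", "stupid"}


def flag_substring(output):
    """Substring check, as in the scorer above."""
    return any(word in output.lower() for word in TOXIC_WORDS)


def flag_whole_word(output):
    """Whole-word check: tokenize first, then compare tokens."""
    tokens = set(re.findall(r"[a-z']+", output.lower()))
    return not TOXIC_WORDS.isdisjoint(tokens)


text = "Whatever works for you!"
print(flag_substring(text), flag_whole_word(text))
```

The substring version flags this sentence because "whatever" contains "hate", while the whole-word version does not.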



## Using Scorers as Monitors

Monitors help track quality metrics over time without blocking operations. This is useful for:

* Identifying quality trends
* Detecting model drift
* Gathering data for model improvements





In [19]:
import weave
from weave import Scorer
import json
import xml.etree.ElementTree as ET
import random
import asyncio
import nest_asyncio  # Required for Google Colab

# Apply nest_asyncio to avoid event loop issues in Google Colab
nest_asyncio.apply()

# ==== 1. Define Text Generation Function ====
@weave.op
def generate_text(prompt: str) -> str:
    """Simulated LLM response generation."""
    if prompt.lower() == "json":
        return '{"message": "Hello, world!"}'  # Valid JSON
    elif prompt.lower() == "xml":
        return "<message>Hello, world!</message>"  # Valid XML
    else:
        return "Generated response..."

# ==== 2. Custom Scorer for JSON Validation ====
class CustomJSONScorer(Scorer):
    @weave.op
    def score(self, output: str) -> dict:
        """Check if the output is valid JSON."""
        try:
            json.loads(output)
            return {"valid_json": True}
        except json.JSONDecodeError:
            return {"valid_json": False}

# ==== 3. Custom Scorer for XML Validation ====
class CustomXMLScorer(Scorer):
    @weave.op
    def score(self, output: str) -> dict:
        """Check if the output is valid XML."""
        try:
            ET.fromstring(output)
            return {"valid_xml": True}
        except ET.ParseError:
            return {"valid_xml": False}

# ==== 4. Function to Generate Response with Monitoring ====
async def generate_with_monitoring(prompt: str) -> str:
    """
    Generates a response and applies monitoring (randomly 10% of the time).
    """
    # Generate text and capture tracking info
    result, call = generate_text.call(prompt)

    # Sample monitoring (only apply scorers to 10% of calls)
    if random.random() < 0.1:
        # Apply manual scorers asynchronously
        json_score = await call.apply_scorer(CustomJSONScorer())
        xml_score = await call.apply_scorer(CustomXMLScorer())

        print(f"Monitoring Applied - JSON Valid: {json_score.result['valid_json']}, XML Valid: {xml_score.result['valid_xml']}")

    return result

# ==== 5. Run Test Cases ====
async def main():
    prompts = ["json", "xml", "text"]

    for prompt in prompts:
        response = await generate_with_monitoring(prompt)
        print(f"Prompt: {prompt}\nResponse: {response}\n")

# Run the async function in Google Colab
loop = asyncio.get_event_loop()
loop.run_until_complete(main())



Prompt: json
Response: {"message": "Hello, world!"}

Prompt: xml
Response: <message>Hello, world!</message>

Prompt: text
Response: Generated response...
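
The output above contains no "Monitoring Applied" line, which is the expected outcome most of the time with a 10% sample rate and only three calls. Assuming the sampling decisions are independent, the chance that at least one call is scored can be computed directly:

```python
def prob_at_least_one_sampled(num_calls, sample_rate):
    """Probability that independent sampling picks at least one of num_calls."""
    return 1 - (1 - sample_rate) ** num_calls


print(f"{prob_at_least_one_sampled(3, 0.1):.3f}")
```

While testing, temporarily raising the rate (for example to 1.0) makes it easier to confirm that the scorers run at all.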



# Media

## Image

In [30]:
import weave
from openai import OpenAI
import requests
from PIL import Image

weave.init(project_name=PROJECT)
client = OpenAI()

@weave.op()
def generate_image(prompt: str) -> Image.Image:
    response = client.images.generate(
        model="dall-e-3",
        prompt=prompt,
        size="1024x1024",
        quality="standard",
        n=1,
    )
    image_url = response.data[0].url
    image_response = requests.get(image_url, stream=True)
    image = Image.open(image_response.raw)

    # return a PIL.Image.Image object to be logged as an image
    return image

image = generate_image("a cat with a pumpkin hat")

🍩 https://wandb.ai/keisuke-kamata/weave-handson-kamata/r/call/019786ac-f857-70a3-b417-d0383fec63b5


## Audio

In [23]:
import weave
from openai import OpenAI
import wave

weave.init(project_name=PROJECT)
client = OpenAI()

@weave.op
def make_audio_file_streaming(text: str) -> wave.Wave_read:
    with client.audio.speech.with_streaming_response.create(
        model="tts-1",
        voice="alloy",
        input=text,
        response_format="wav",
    ) as res:
        res.stream_to_file("output.wav")

    # return a wave.Wave_read object to be logged as audio
    return wave.open("output.wav")

make_audio_file_streaming("Hello, how are you? What did you do yesterday?")

🍩 https://wandb.ai/keisuke-kamata/weave-handson-kamata/r/call/019786ca-2ab0-7a20-89eb-c0d8828c2372


<wave.Wave_read at 0x169c1d450>

## Video

In [None]:
import time
from google import genai
from google.genai import types
from moviepy.editor import VideoFileClip, ColorClip, VideoClip
import weave

weave.init(PROJECT)

@weave.op()
def store_videos(name, client, generated_video):
  client.files.download(file=generated_video.video)
  generated_video.video.save(f"new_video{name}.mp4")  # save the video

@weave.op()
def save_videos_in_weave(video_path):
    clip = VideoFileClip(video_path, has_mask=False, audio=True)  # assign so subclip() below has a clip to trim
    new_clip = clip.subclip(0, 1)
    return new_clip

@weave.op()
def generate_videos(prompt):
    client = genai.Client()  # read API key from GOOGLE_API_KEY
    operation = client.models.generate_videos(
        model="veo-2.0-generate-001",
        prompt=prompt,  # use the function argument instead of a hardcoded string
        config=types.GenerateVideosConfig(
            person_generation="dont_allow",  # "dont_allow" or "allow_adult"
            aspect_ratio="16:9",  # "16:9" or "9:16"
            ),
        )
    
    while not operation.done:
        time.sleep(10)
        operation = client.operations.get(operation)
    
    for n, generated_video in enumerate(operation.response.generated_videos):
        client.files.download(file=generated_video.video)
        generated_video.video.save(f"video{n}.mp4")  # save the video
        save_videos_in_weave(f"video{n}.mp4")
        
 
generate_videos("Panning wide shot of a calico kitten sleeping in the sunshine")
