[bug] Async streaming validation duplicates output in the presence of multiple validators #1090

@JosephCatrambone


Quick thank-you to the new user in the Discord. Link to the thread: https://discord.com/channels/1085077079697150023/1288085320805388298/1288085864521666591

Describe the bug
When AsyncGuard.use_many is given multiple validators (in this case, DetectPII and ToxicLanguage) with on_fail="fix", the streamed output contains duplicated fragments.

To Reproduce
Steps to reproduce the behavior:

import asyncio
import os
from dotenv import load_dotenv
import litellm
import openai
import guardrails

from guardrails.hub import DetectPII, ToxicLanguage

# Load environment variables
load_dotenv()
openai.api_base = os.getenv("OPENAI_API_BASE")
openai.api_key = os.getenv("OPENAI_API_KEY")


# VERSION 1: single validator
guard = guardrails.AsyncGuard().use_many(
    DetectPII(pii_entities="pii", on_fail="fix")
)

# VERSION 2: multiple validators (reproduces the duplicated output described above)
# guard = guardrails.AsyncGuard().use_many(
#     DetectPII(pii_entities="pii", on_fail="fix"), ToxicLanguage(on_fail="fix")
# )

async def generate_text():
    fragment_generator = await guard(
        litellm.acompletion,
        api_key=openai.api_key,
        api_base=openai.api_base,
        model="openai/mistralai/Mistral-Nemo-Instruct-2407",
        messages=[
            {"role": "system", "content": "Only write my sentences provided please and nothing else please."},
            {
                "role": "user",
                "content": """Peter is funny and lives in New York. My name is Peter. Who are you Brian ?""",
            },
        ],
        max_tokens=1024,
        temperature=0,
        stream=True,
    )

    text = ""
    async for op in fragment_generator:
        print(op)
        await asyncio.sleep(0)
        text += op.validated_output

    print(text)


# Run the async function to generate text
asyncio.run(generate_text())
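For clarity, the configuration that exhibits the duplicated stream is the multi-validator one (VERSION 2 above); with the comment markers removed it reads:

guard = guardrails.AsyncGuard().use_many(
    DetectPII(pii_entities="pii", on_fail="fix"),
    ToxicLanguage(on_fail="fix"),
)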

Expected behavior
Example model output: "My friend Alex is a researcher at Purdue University. (Numerous Obscenities)"
Expected cleaned output: "My friend is a researcher at ."
Observed output: "My friend My friend friend is a researcher at friend is a researcher at ."
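As a rough illustration (not part of the original report; the helper name and chunk boundaries below are hypothetical), the duplication can be spotted by checking whether a streamed chunk repeats text that was already accumulated:

def find_repeated_chunks(chunks):
    """Return the chunks whose text already appeared earlier in the stream."""
    seen = ""
    repeated = []
    for chunk in chunks:
        if chunk and chunk in seen:
            repeated.append(chunk)
        seen += chunk
    return repeated

# With chunk boundaries guessed from the observed output above:
# find_repeated_chunks(["My friend ", "My friend ", "friend is a researcher at ",
#                       "friend is a researcher at ", "."])
# -> ["My friend ", "friend is a researcher at "]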

Library version:
Guardrails 0.5.10

Additional context
Happens in a notebook and in a terminal.
