# Validation

## Introduction

**What is validation?**

Validation is the backbone of reliable software. In essence, validation means checking that the output of a function matches what we expect from it. Traditionally, all validation was deterministic and rule-based (e.g. 'output must not be larger than') but with the advent of LLMs we can also incorporate probabilistic validation into our stack (e.g. 'answer cannot contain aggressive language'). This newer type of validation is often referred to by the fancy name 'guardrail', although if we dig a bit we find it's just another type of validation, one that uses LLMs instead of rules to flag the response.

In instructor, we treat all validation as... validation. You can do rule-based validation, probabilistic validation or both - all under one and the same framework. For this we will make extensive use of Pydantic's powerful [validators](https://docs.pydantic.dev/latest/concepts/validators/#field-validators) functionality.

Essentially a validator looks like this:

In [None]:
def validation_function(value):
    if condition(value):
        raise ValueError("Value is not valid")
    return mutation(value)

It consists of three basic steps:
1. We check whether the value meets a failure condition
2. If it does, we raise an error, which can optionally trigger a retry
3. If it does not, we return the value or a mutation of the value
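
As a concrete illustration of these three steps, here is a minimal sketch in plain Python. The function name `validate_username` and its rules are hypothetical, chosen only for this example.

```python
def validate_username(value: str) -> str:
    """Hypothetical validator: reject blank usernames, normalise the rest."""
    cleaned = value.strip()
    # Steps 1 and 2: check the condition and raise if the value is not acceptable.
    if not cleaned:
        raise ValueError("Username must not be empty")
    # Step 3: return the value, here a mutated (lower-cased) version of it.
    return cleaned.lower()


print(validate_username("  Alice "))  # prints "alice"
```

The same function body could be dropped into a Pydantic validator unchanged, since raising `ValueError` is how a validator signals failure.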

**Validation applications**

Validators give us tighter control over how our applications behave. Fundamentally, we are targeting the main disadvantage of LLMs compared to their more traditional counterparts: their stochasticity and unpredictability.

Some straightforward examples of this are:
- Flagging outputs that contain forbidden words that are present in a blacklist
- Flagging outputs that have an undesired tone (racism, violence etc.)

But if we get creative we can move into complex, non-trivial applications:
- Making sure that a citation actually is obtained directly from the provided content
- Ensuring that the model's answer follows after a given context has been provided
- Checking that the syntax of an SQL query is valid before running it
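
To make the last item more tangible, here is one possible sketch of an SQL syntax check. It assumes the query targets the SQLite dialect and uses the standard library's `sqlite3` module with a throwaway in-memory database; the function name `validate_sql_syntax` and the `users` schema are hypothetical.

```python
import sqlite3


def validate_sql_syntax(query: str, schema: str = "") -> str:
    """Raise ValueError if SQLite cannot compile the query; otherwise return it."""
    # Use a throwaway in-memory database so nothing real is touched.
    conn = sqlite3.connect(":memory:")
    try:
        if schema:
            conn.executescript(schema)  # create the tables the query refers to
        # EXPLAIN compiles the statement without executing it.
        conn.execute(f"EXPLAIN {query}")
    except sqlite3.Error as exc:
        raise ValueError(f"Invalid SQL: {exc}") from exc
    finally:
        conn.close()
    return query


schema = "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);"
validate_sql_syntax("SELECT name FROM users WHERE id = 1", schema)  # passes

try:
    validate_sql_syntax("SELEC name FROM users", schema)
except ValueError as e:
    print(e)
```

Because the check compiles the query against the supplied schema, it also rejects references to tables or columns that do not exist, which may or may not be what you want.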

## Setup and Dependencies

As always, we're going to use the [`instructor`](https://github.com/jxnl/instructor) library to help us in integrating these powerful validators. `instructor` will handle all the output parsing and validation plus the automatic retries to get a compliant response. This will make it very easy for us devs to add new validation logic, without paying unnecessary overhead.

In [5]:
# !pip install instructor -U

In [6]:
import instructor 
from openai import OpenAI

client = instructor.patch(OpenAI())

## Rule-based examples

Let's see how we can use some deterministic validation examples. These examples are deterministic because the logic for this validation is entirely rule based and the same input will always result in the same output.

### Keyword blacklist
#### Example: flagging for violence

For starters, we don't want the assistant to engage with topics that involve explicit violence, so we will reject any message containing words from a violence blacklist.

For this we can use a `field_validator`, which checks whether a given field complies with our validation logic - in this case the user message.

In [13]:
blacklist = {
    "rob",
    "steal",
    "hurt",
    "kill",
    "attack",
}

In [38]:
from pydantic import BaseModel, ValidationError, field_validator
from pydantic.fields import Field


class UserMessage(BaseModel):
    message: str

    @field_validator('message')
    @classmethod
    def message_cannot_have_blacklisted_words(cls, v: str) -> str:
        # compare each whitespace-separated word against the blacklist, ignoring case
        for word in v.split():
            if word.lower() in blacklist:
                raise ValueError(f"`{word}` was found in the message `{v}`")
        return v

try:
    UserMessage(message="Hey, what should I do with someone that bullies me about my health practices?")
    UserMessage(message="I want to hurt him")
except ValidationError as e:
    print(e)

1 validation error for UserMessage
message
  Value error, `hurt` was found in the message `I want to hurt him` [type=value_error, input_value='I want to hurt him', input_type=str]
    For further information visit https://errors.pydantic.dev/2.4/v/value_error
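
One limitation of splitting on whitespace with `v.split()` is that punctuation stays attached to the words, so a message such as "Don't hurt!" would slip through. A possible refinement, sketched here with the standard library's `re` module, is to extract word tokens first. The helper name `find_blacklisted_words` is hypothetical.

```python
import re

# Same blacklist as in the cell above.
blacklist = {"rob", "steal", "hurt", "kill", "attack"}


def find_blacklisted_words(text: str) -> list[str]:
    """Return blacklisted words in `text`, ignoring case and punctuation."""
    # \w+ extracts runs of letters/digits/underscore, so "hurt!" yields "hurt".
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token in blacklist]


print(find_blacklisted_words("I want to HURT him!"))  # prints ['hurt']
print(find_blacklisted_words("Robots are fun."))      # prints []
```

Matching whole tokens rather than substrings avoids flagging harmless words such as "robots" that merely contain a blacklisted word.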


### Outsourcing validation
#### Example: filter using OpenAI Moderation

We would probably want to take this further and flag any answer that is hateful, contains harassment, etc. Thankfully, OpenAI provides a moderation endpoint which covers all these use cases and is free to use when using OpenAI models.

Let's see how we can implement this with instructor.

In [40]:
class UserMessage(BaseModel):
    message: str

    @field_validator('message')
    @classmethod
    def message_must_comply_with_openai_mod(cls, v: str) -> str:
        response = client.moderations.create(input=v)
        out = response.results[0]
        # `categories` maps each moderation category to a boolean flag
        cats = dict(out.categories)
        if out.flagged:
            flagged_cats = [name for name, hit in cats.items() if hit]
            raise ValueError(f"`{v}` was flagged for {flagged_cats}")

        return v

Now we have a more comprehensive flagging for violence and we can actually outsource the moderation of our messages.

In [41]:
try:
    UserMessage(message="What should I do with someone that bullies me about my health practices?")
    UserMessage(message="I want to make him suffer the consequences")
except ValidationError as e:
    print(e)

1 validation error for UserMessage
message
  Value error, `I want to make him suffer the consequences` was flagged for ['harassment', 'violence'] [type=value_error, input_value='I want to make him suffer the consequences', input_type=str]
    For further information visit https://errors.pydantic.dev/2.4/v/value_error


We also get flagging for other topics such as religion, race, etc.

In [37]:
try:
    UserMessage(message="What should I do with someone that bullies me about my health practices?")
    UserMessage(message="I will mock his religion")
except ValidationError as e:
    print(e)

1 validation error for UserMessage
message
  Value error, `I will mock his religion` was flagged for ['harassment'] [type=value_error, input_value='I will mock his religion', input_type=str]
    For further information visit https://errors.pydantic.dev/2.4/v/value_error


The point here is how easily we could switch from keywords to an external API for moderation: with a `field_validator`, it's a function edit away.

### Text structure filtering
#### Example: filtering very long messages

We can also flag based on other aspects of the input text. Let's say we don't want the assistant to return very long texts because the user might lose interest and disengage from the conversation.

In [51]:
from pydantic import BaseModel, ValidationError, field_validator
from pydantic.fields import Field


class AssistantMessage(BaseModel):
    message: str

    @field_validator('message')
    @classmethod
    def message_must_be_short(cls, v: str) -> str:
        # count whitespace-separated words; strictly more than 100 is too long
        if len(v.split()) > 100:
            raise ValueError("Text was flagged for being longer than 100 words.")

        return v

In [52]:
try:
    AssistantMessage(message="""
    Certainly! Lorem ipsum is a placeholder text commonly used in the printing and typesetting industry. Here's a sample of Lorem ipsum text:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam euismod velit vel tellus tempor, non viverra eros iaculis. Sed vel nisl nec mauris bibendum tincidunt. Vestibulum sed libero euismod, eleifend tellus id, laoreet elit. Donec auctor arcu ac mi feugiat, vel lobortis justo efficitur. Fusce vel odio vitae justo varius dignissim. Integer sollicitudin mi a justo bibendum ultrices. Quisque id nisl a lectus venenatis luctus.

Please note that Lorem ipsum text is a nonsensical Latin-like text used as a placeholder for content, and it has no specific meaning. It's often used in design and publishing to demonstrate the visual aspects of a document without focusing on the actual content."""
                    )
except ValidationError as e:
    print(e)

1 validation error for AssistantMessage
message
  Value error, Text was flagged for being longer than 100 words. [type=value_error, input_value="\n    Certainly! Lorem i... on the actual content.", input_type=str]
    For further information visit https://errors.pydantic.dev/2.4/v/value_error


### Validating answer from context
#### Example: avoiding hallucination

When using external knowledge bases we want to make sure that the agent uses the provided context to answer and does not invent the answer itself. This is easy to do with a field validator combined with Pydantic's validation context, as the following example shows: we check whether the provided citation actually appears in the text chunk.

In [12]:
from pydantic import BaseModel, ValidationError, ValidationInfo, field_validator

class AnswerWithCitation(BaseModel):
    answer: str
    citation: str

    @field_validator('citation')
    @classmethod
    def citation_exists(cls, v: str, info: ValidationInfo) -> str:
        context = info.context
        if context:
            # fall back to an empty string if no text chunk was supplied
            text_chunk = context.get('text_chunk', '')
            if v not in text_chunk:
                raise ValueError(f"Citation `{v}` not found in text chunks")
        return v

In [13]:
try:
    AnswerWithCitation.model_validate(
        {"answer": "Blueberries are packed with protein", "citation": "Blueberries contain high levels of protein"},
        context={"text_chunk": "Blueberries are very rich in antioxidants"}, 
    )
except ValidationError as e:
    print(e)

1 validation error for AnswerWithCitation
citation
  Value error, Citation `Blueberries contain high levels of protein` not found in text chunks [type=value_error, input_value='Blueberries contain high levels of protein', input_type=str]
    For further information visit https://errors.pydantic.dev/2.4/v/value_error
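
Exact substring matching is strict: a citation that differs from the source only in letter case or line breaks would be rejected. If that is too strict for your use case, one possible relaxation is to normalise both strings before comparing. The sketch below uses only the standard library; the helper names `normalize` and `citation_in_context` are hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Lower-case the text and collapse runs of whitespace to single spaces."""
    return re.sub(r"\s+", " ", text).strip().lower()


def citation_in_context(citation: str, text_chunk: str) -> bool:
    """Check that the citation appears in the chunk, ignoring case and spacing."""
    return normalize(citation) in normalize(text_chunk)


chunk = "Blueberries are very rich\nin   antioxidants."
print(citation_in_context("blueberries are very rich in antioxidants", chunk))  # prints True
print(citation_in_context("Blueberries contain high levels of protein", chunk))  # prints False
```

A check like this could replace the `v not in ...` comparison inside the validator, at the cost of accepting citations that are not character-for-character quotes.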


## LLM-based examples

But in some cases we need more complex validation than rule-based validation allows. What we need is probabilistic validation and we will handle this using LLMs as an integral part of our validation workflow. 

Thankfully, `instructor` provides a handy `llm_validator` utility which we can use by simply specifying the directive we want it to follow.

Let's see a few interesting use cases that LLMs enable.

### Content-based filtering
#### Example: always keeping agent on topic

Let's say we want an agent that helps us improve our health by answering questions and suggesting specific practices daily. We want to make sure the agent does not answer questions about any other topic, because the knowledge base contains no information about other topics and the agent would be prone to hallucinate. The process is very similar to the one above, but this time we will add an LLM to our validator.

In [40]:
from instructor import llm_validator
from pydantic import BaseModel, ValidationError
from typing import Annotated, Optional
from pydantic import Field
from pydantic.functional_validators import AfterValidator

In [None]:
class AssistantMessage(BaseModel):
    message: Annotated[str, AfterValidator(llm_validator("don't talk about any other topic except health best practices and topics"))]

try:
    AssistantMessage(message="I would suggest you to visit Sicily as they say it is very nice in winter.")
except ValidationError as e:
    print(e)

Great! Now the validator will flag messages that stray from the topics our model knows about.
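
To build some intuition for what happens behind a helper like `llm_validator`, it can be pictured as a factory that turns a plain-language directive into a validator function. The sketch below shows that closure pattern only. It is not `instructor`'s implementation: `make_validator`, `Judge` and `fake_judge` are hypothetical names, and the judge is a simple keyword check standing in for a real LLM call so that the example runs offline.

```python
from typing import Callable, Optional

# Hypothetical judge signature: takes (statement, value) and returns an error
# message when the value violates the statement, or None when it complies.
Judge = Callable[[str, str], Optional[str]]


def make_validator(statement: str, judge: Judge) -> Callable[[str], str]:
    """Build a validator function that enforces `statement` using `judge`."""

    def validator(value: str) -> str:
        error = judge(statement, value)
        if error is not None:
            raise ValueError(error)
        return value

    return validator


def fake_judge(statement: str, value: str) -> Optional[str]:
    """Stand-in for an LLM call: a keyword check so the sketch runs offline."""
    if "health" in value.lower():
        return None
    return f"The value does not comply with: {statement}"


on_topic = make_validator("only talk about health topics", fake_judge)
print(on_topic("Regular sleep is good for your health."))
```

Because the result is an ordinary function that either returns the value or raises `ValueError`, it fits the same validator shape used throughout this notebook.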

### Content-based filtering with more than one variable
#### Example: checking agent thinking logic with CoT

Another nice use case for probabilistic validation is to validate the agent's thinking process and check that it makes sense before returning. When using [chain of thought](https://learnprompting.org/docs/intermediate/chain_of_thought#:~:text=Chain%20of%20Thought%20(CoT)%20prompting,CoT%20(Wei%20et%20al.)), we want the model to think in steps and answer only after following through on its reasoning. If there are any errors in the logic, the result will not be correct. Thankfully, it's easy to check this in a similar way to the previous example.

Here we will use Pydantic's [model_validator](https://docs.pydantic.dev/latest/concepts/validators/#model-validators) which allows us to apply validation over all the properties of the `AIResponse` at once.

For this, we will define a `Validation` class which describes the desired output format of our LLM call (this was handled by `llm_validator` in the previous example).

This class is very straightforward: we ask the model to tell us whether the chain of thought is valid and, if not, why.

In [54]:
class Validation(BaseModel):
    is_valid: bool = Field(..., description="Whether the value is valid based on the rules or contradictory")
    error_message: Optional[str] = Field(..., description="The error message if the value is not valid, to be used for re-asking the model")

In [55]:
def validate_chain_of_thought(values):
    chain_of_thought = values["chain_of_thought"]
    answer = values["answer"]
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "system",
                "content": "You are a validator. Determine if the value is valid for the statement. If it is not, explain why.",
            },
            {
                "role": "user",
                "content": f"Verify that `{answer}` follows the chain of thought: {chain_of_thought}",
            },
        ],
        # this comes from client = instructor.patch(OpenAI())
        response_model=Validation,
    )
    print(resp)
    if not resp.is_valid:
        raise ValueError(resp.error_message)
    return values

In [56]:
from pydantic import BaseModel, model_validator
from typing import Any

class AIResponse(BaseModel):
    chain_of_thought: str
    answer: str

    @model_validator(mode='before')
    @classmethod
    def chain_of_thought_makes_sense(cls, data: Any) -> Any:
        # here we assume data is the dict representation of the model
        # since we use 'before' mode.
        return validate_chain_of_thought(data)

In [57]:
try:
    resp = AIResponse(
        chain_of_thought="The user is diabetic.", answer="The user shouldn't need to care about sugar blood levels."
        # chain_of_thought="2+2=4", answer="grass is green."
)
except ValidationError as e:
    print(e)

is_valid=True error_message=None

Note that in this run the validator returned `is_valid=True`, so no error was raised, even though the answer arguably contradicts the chain of thought. LLM-based validation is probabilistic, so results like this can vary between runs and models.


## Putting validation to use

How can we integrate any of these examples when using the OpenAI API? It is very easy with `instructor`: after patching the OpenAI client, we just need to specify a `response_model` and all the validation will happen under the hood.

We can even set a maximum number of retries so that if the LLM returns an invalid result, we give it a few more chances to get it right by sending back the original answer plus the reason it was rejected. We do this by adding `max_retries` when calling the patched client.

In [13]:
class HealthAnswer(BaseModel):
    answer: Annotated[str, AfterValidator(llm_validator("don't talk about any other topic except health best practices and topics"))]

In [18]:
model = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Which is the best headphone brand for producing music?"},
    ],
    response_model=HealthAnswer,
    max_retries=2,
)

ValidationError: 1 validation error for HealthAnswer
answer
  Assertion failed, The statement is not related to health best practices or topics. [type=assertion_error, input_value="Unfortunately, I don't h... that suits your needs.", input_type=str]
    For further information visit https://errors.pydantic.dev/2.4/v/assertion_error
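
Conceptually, the retry behaviour described above amounts to a loop that feeds the rejected answer and the validation error back to the model. The following sketch illustrates that idea only. It is not how `instructor` implements `max_retries`, and `generate_with_retries`, `fake_generate` and `must_mention_water` are hypothetical names. It also assumes that `max_retries` counts the attempts made after the first one.

```python
from typing import Callable, Optional


def generate_with_retries(
    generate: Callable[[Optional[str]], str],
    validate: Callable[[str], str],
    max_retries: int = 2,
) -> str:
    """Call `generate` until `validate` accepts the output or retries run out."""
    feedback: Optional[str] = None
    for _ in range(max_retries + 1):  # first attempt plus the retries
        answer = generate(feedback)
        try:
            return validate(answer)
        except ValueError as exc:
            # Feed the rejected answer and the reason back into the next attempt.
            feedback = f"Previous answer: {answer!r}. Rejected because: {exc}"
    raise ValueError(f"No valid answer after {max_retries} retries. {feedback}")


def fake_generate(feedback: Optional[str]) -> str:
    # Hypothetical stand-in for the model: it corrects itself once it sees feedback.
    return "Drink enough water." if feedback else "Buy studio headphones."


def must_mention_water(answer: str) -> str:
    if "water" not in answer.lower():
        raise ValueError("Answer is not about health")
    return answer


print(generate_with_retries(fake_generate, must_mention_water))  # prints "Drink enough water."
```

If every attempt fails, the loop gives up and raises, which mirrors the `ValidationError` shown in the output above.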

## Conclusion