---
title: "Making Large Language Models Reliable using Guardrails AI"
jupyter: python3
format:
  html:
    code-overflow: wrap
filters:
  - line-highlight
---

In [63]:
import openai

## Language Models are not reliable

I asked GPT-3 to define what "reliable software" is. Here's what it said:

In [70]:
import textwrap

wrapper = textwrap.TextWrapper(width=70, break_long_words=False, replace_whitespace=False)

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": "You are a helpful assistant.",
            "role": "user",
            "content": "How do you define reliable software?",
        }
    ],
    temperature=0,
)

print("\n".join(wrapper.wrap(response["choices"][0]["message"]["content"])))

Reliable software can be defined as software that consistently
performs its intended functions accurately and efficiently, without
any unexpected failures or errors. It is dependable, trustworthy, and
can be relied upon to deliver consistent results under various
conditions and user interactions. Reliable software is robust, stable,
and resilient, ensuring that it operates as expected even in the
presence of unforeseen circumstances or changes in the environment. It
is also maintainable, allowing for easy updates, bug fixes, and
enhancements without compromising its reliability.


I think it is safe to say that Language models like GPT-3 don't meet the criteria of reliable software, at least when prompted and used naively. For instance, let's take this very simple prompt of adding two numbers. In this case, I would like gpt-3 to act as a calculator and return back to me the result. Perhaps this is too contrived but for many tasks, we will require the model to consistently use a given output format.

In [71]:
#| echo: TRUE
#| eval: FALSE
#| source-line-numbers: 6
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user", "content": "You are a helpful assistant.",
            "role": "user", "content": "Return only the integer answer, 1+1=",
        }
    ],
    temperature=0,
)

response['choices'][0]["message"]['content']

'The integer answer to 1+1 is 2.'

You can see that even though I explicitly asked "Return only the integer answer" expecting only `2` to be returned but the model returned a full sentence string instead.

When designing a system where a language model is one component, how can we adapt it to make it more reliable?

How can we enforce an output interface for the LLM model to adhere to (without having to muck around with different prompting strategies manually)?

There seems to be several patterns that are emerging:

- Use a tool like [guardrails AI](https://www.guardrails.ai/) where one can specify the format using the RAIL spec, prompting the model with a RAIL spec and taking corrective action if the model doesn't adhere to the spec.
- Use "function calling" capabilities of a few closed-source chat APIs like [OpenAI's API](https://openai.com/blog/function-calling-and-other-api-updates) along with integrations with pydantic (see [askmarvinai](https://www.askmarvin.ai/welcome/overview/) and [instructor](https://jxnl.github.io/instructor/))
-  Use a tool like [outlines](https://github.com/outlines-dev/outlines) where one can specify the output format as a regex, JSON schema or pydantic model and outlines will perform a regex-guided generation of the output by modifying the generated model probabilities of tokens so as to adhere to the regex. 

I am going to examine the guardrails AI approach in more detail.

## Guardrails AI overview

### Overview
Let's start with [guardrails AI](https://docs.guardrailsai.com/). Using the same query we used above

In [72]:
query = """1+1=?"""
print(query)

1+1=?


We define the desired answer format/schema using a popular python library pydantic

In [73]:
from pydantic import BaseModel, Field


class IntegerAnswer(BaseModel):
    """The answer to a question."""

    value: int = Field(description="The answer to the question.")

We then write this guardrails code.

In [74]:
import guardrails as gd

instructions = """
You are a helpful assistant only capable of communicating with valid JSON, and no other text.
"""

prompt = """
${query}

${gr.complete_json_suffix_v2}
"""

guard = gd.Guard.from_pydantic(
    instructions=instructions,
    prompt=prompt,
    output_class=IntegerAnswer,
)

Guardrails will build the prompt for us given a prompt template that we had to craft.

For crafting the prompt-template, we make use of
- variables like `query` which we pass in like so `${query}`
- constants like `complete_json_suffix_v2` which reference pre-defined prompt templates which we can find in [constants.xml](https://github.com/guardrails-ai/guardrails/blob/main/guardrails/constants.xml) file

Let's inspect the prompt that guardrails generated for us:

In [75]:
print(guard.instructions.source)
print(guard.prompt.source)


You are a helpful assistant only capable of communicating with valid JSON, and no other text.


${query}


Given below is XML that describes the information to extract from this document and the tags to extract it into.

<output>
    <integer name="value" description="The answer to the question."/>
</output>


ONLY return a valid JSON object (no other text is necessary), where the key of the field in JSON is the `name` attribute of the corresponding XML, and the value is of the type specified by the corresponding XML's tag. The JSON MUST conform to the XML format, including any types and format requests e.g. requests for lists, objects and specific types. Be correct and concise.

Here are examples of simple (XML, JSON) pairs that show the expected behavior:
- `<string name='foo' format='two-words lower-case' />` => `{'foo': 'example one'}`
- `<list name='bar'><string format='upper-case' /></list>` => `{"bar": ['STRING ONE', 'STRING TWO', etc.]}`
- `<object name='baz'><string name="foo

Next we use the guard object to call our language model. 

The guard object is a wrapper around the language model that will perform the following steps:
- Prepare the prompt using the template and variables
- Call the language model
- Parse the output using the schema
- If the output doesn't match the schema
    - it will proceed to perfrom a corrective action
        - By default the corrective action is to re-prompt the model asking it to resolve the issue 
    - it will repeat this process until the output matches the schema or until a maximum number of attempts is reached.
- The result is returned as both a string and a structured object

Ok now let's try it out! 

In [76]:
import warnings

with warnings.catch_warnings():
    # ignore the UserWarning about Instructions do not have any variables
    warnings.filterwarnings("ignore", category=UserWarning)
    raw_llm_output, validated_output = guard(
        llm_api=openai.Completion.create,
        prompt_params={"query": query},
        num_reasks=0,
        engine="text-davinci-003",
        max_tokens=1024,
        temperature=0,
    )

validated_output

{'value': 2}

Now let's mock the case when our language model will return a non-JSON response.

In [79]:
from unittest.mock import MagicMock, patch

magic_mock = MagicMock()
magic_mock.return_value = {
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "text": '{"value": "the answer is 2"}',
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 18, "completion_tokens": 12, "total_tokens": 30},
}

with patch("openai.Completion.create", magic_mock):
    raw_llm_output, validated_output = guard(
        llm_api=openai.Completion.create,
        prompt_params={"query": query},
        num_reasks=1,
        engine="text-davinci-003",
        max_tokens=1024,
        temperature=0,
    )

print(f"{raw_llm_output=}")

print("\n".join(wrapper.wrap(repr(validated_output))))

raw_llm_output='{"value": "the answer is 2"}'
SkeletonReAsk(incorrect_value={'value': 'the answer is 2'},
fail_results=[FailResult(outcome='fail', metadata=None,
error_message='JSON does not match schema', fix_value=None)])


The returned response now indicates a failure due to an incorrect value. What happened is the first prompt looks exactly like the one we used above, but the second prompt is different. It is a prompt that is asking the model to resolve the issue by returning a JSON response.

Given we mocked the model to return a non-JSON response, the model will fail to resolve the issue and will return a failure response.

In [57]:
print("\n".join(wrapper.wrap(magic_mock.call_args_list[0].kwargs["prompt"])))


You are a helpful assistant only capable of communicating with valid JSON, and no other text.

ONLY return a valid JSON object (no other text is necessary), where the key of the field in JSON is the `name` attribute of the corresponding XML, and the value is of the type specified by the corresponding XML's tag. The JSON MUST conform to the XML format, including any types and format requests e.g. requests for lists, objects and specific types. Be correct and concise. If you are unsure anywhere, enter `null`.

Here are examples of simple (XML, JSON) pairs that show the expected behavior:
- `<string name='foo' format='two-words lower-case' />` => `{'foo': 'example one'}`
- `<list name='bar'><string format='upper-case' /></list>` => `{"bar": ['STRING ONE', 'STRING TWO', etc.]}`
- `<object name='baz'><string name="foo" format="capitalize two-words" /><integer name="index" format="1-indexed" /></object>` => `{'baz': {'foo': 'Some String', 'index': 1}}`



I was given the following JSON resp

### Advanced features in guardrails

#### 

### Introduce validators and corrective action



For more involved validation than the return type, you can use the validators that guardrails provides out of the box, or we can define our own validators. To show how this works, let's define a validator that checks that the sum of the two numbers is greater than 0.

In [61]:
from guardrails.validators import ValidRange

class Answer(BaseModel):
    value: int = Field(
        description="The answer to the question.",
        validators=[ValidRange(min=0, on_fail="exception")],
    )


guard = gd.Guard.from_pydantic(
    output_class=Answer, prompt=prompt, instructions=instructions
)

If the language model returns a value that is not a positive integer, an exception will be raised. Let's mock the model to return a negative integer.

In [62]:
magic_mock = MagicMock()
magic_mock.return_value = {
    "id": "chatcmpl-8CazZKUCp8KbiUt49J5x7eINiMlvl",
    "choices": [
        {
            "index": 0,
            "text": '{"value": "-2"}',
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 18, "completion_tokens": 12, "total_tokens": 30},
}

with patch("openai.Completion.create", magic_mock):
    try:
        raw_llm_output, validated_output = guard(
            llm_api=openai.Completion.create,
            prompt_params={"query": query},
            num_reasks=1,
            engine="text-davinci-003",
            max_tokens=1024,
            temperature=0,
        )
    except Exception as e:
        print(type(e))

<class 'guardrails.validators.ValidatorError'>


### Complex validators out of the box

For certain cases like checking if the returned output is valid SQL or valid Python, you can use the built-in guardrail validators for these cases.

see the guardrails [validators page](https://docs.guardrailsai.com/api_reference/validators/) for more details.

### Custom validators
Earlier this year there was a [popular video of how ChatGPT couldn't stick to performing legal chess moves](https://www.youtube.com/watch?v=iWhlrkfJrCQ&ab_channel=GothamChess). Guardrails AI has an [example in progress](https://docs.guardrailsai.com/examples/valid_chess_moves/) of how to use custom validators to enforce a legal chess game.

### Routing between two possible response schemas

For instance if you have a language model that can return more than one possible schema, you can use a choice validator to route between the schemas.


### Flexibility of the guardrails approach
The guardrails approach is very flexible and can be used to validate any kind of language model. 

### Using guardrails to validate a gpt2 model loaded locally


### Using guardrails against the anyscale API

### Weaknesses of Guardrails
- Given that guardrails relies on re-prompts to correct the model, it is not suitable for use cases where the model is expensive to call. 
- Default prompts provided by guardrails might not be optimal for your use case.

### Areas of improvement

- Inheriting validators from pydantic models would be nice but support for it is still lacking.
- Using different models to perform correction


In [17]:
gd.

SyntaxError: invalid syntax (2764988694.py, line 1)

In [None]:
response

<OpenAIObject chat.completion id=chatcmpl-8CazZKUCp8KbiUt49J5x7eINiMlvl at 0x111287ce0> JSON: {
  "id": "chatcmpl-8CazZKUCp8KbiUt49J5x7eINiMlvl",
  "object": "chat.completion",
  "created": 1698012825,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The integer answer to 1+1 is 2."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 12,
    "total_tokens": 30
  }
}

In [None]:
raw_llm_output, validated_output = guard(
    openai.Completion.create,
    prompt_params={"doctors_notes": doctors_notes},
    engine="text-davinci-003",
    max_tokens=1024,
    temperature=0,
)

In [None]:
PatientInfo(gender="a random string", age=100241)

In [None]:
PatientInfo.parse_raw(raw_llm_output)

PatientInfo(gender='Male', age=49)

In [None]:
PatientInfo.parse_obj(validated_output)

PatientInfo(gender='Male', age=49)

In [None]:
response = openai.ChatCompletion.create(
    model="meta-llama/Llama-2-70b-chat-hf",
    messages=[
        {
            "role": "user", "content": "You are a helpful assistant.",
            "role": "user", "content": "Return an integer, 1+1=.",
        }
    ]
)
response

<OpenAIObject text_completion id=meta-llama/Llama-2-70b-chat-hf-5a7926c5dee748e5b83600f28f3b116c at 0x116d00e00> JSON: {
  "id": "meta-llama/Llama-2-70b-chat-hf-5a7926c5dee748e5b83600f28f3b116c",
  "object": "text_completion",
  "created": 1697433235,
  "model": "meta-llama/Llama-2-70b-chat-hf",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": " Sure! 1 + 1 = 2."
      },
      "index": 0,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 12,
    "total_tokens": 30
  }
}

In [None]:
print(response["choices"][0]["message"]["content"])

 Sure! 1 + 1 = 2.


In [None]:
openai.api_base = "https://api.openai.com/v1/"
openai.api_key = "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"

In [None]:
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user", "content": "You are a helpful assistant.",
            "role": "user", "content": "Return an integer, 1+1=.",
        }
    ]
)
response

<OpenAIObject chat.completion id=chatcmpl-8AAGsj9aV2yuw2frp9YLGGZKycWqH at 0x116d65b70> JSON: {
  "id": "chatcmpl-8AAGsj9aV2yuw2frp9YLGGZKycWqH",
  "object": "chat.completion",
  "created": 1697433454,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The integer expression 1+1 is equal to 2."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 16,
    "completion_tokens": 13,
    "total_tokens": 29
  }
}

In [None]:
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {
            "role": "user", "content": "You are a helpful assistant.",
            "role": "user", "content": "Return an integer, 1+1=.",
        }
    ]
)
response

<OpenAIObject chat.completion id=chatcmpl-8AAGZGYbMM6OOHTjdTIrqKIKyJlBs at 0x116c9a930> JSON: {
  "id": "chatcmpl-8AAGZGYbMM6OOHTjdTIrqKIKyJlBs",
  "object": "chat.completion",
  "created": 1697433435,
  "model": "gpt-4-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "2"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 16,
    "completion_tokens": 1,
    "total_tokens": 17
  }
}

In [None]:
print(response["choices"][0]["message"]["content"])

2


In [None]:
import outlines.text.generate as generate
import outlines.models as models
from huggingface_hub import login

login()

Token is valid (permission: read).
Your token has been saved in your configured git credential helpers (osxkeychain).
Your token has been saved to /Users/marwansarieddine/.cache/huggingface/token
Login successful


In [None]:
# model = models.transformers("meta-llama/Llama-2-7b-chat-hf", device="mps")

# prompt = """1+1="""
# answer = generate.integer(model, max_tokens=20)(prompt)
# print(answer)


model = models.transformers("gpt2", device="mps")

prompt = """1+1="""
answer = generate.integer(model, max_tokens=20)(prompt)
print(answer)

02633031351812286487554313921229487190791897


In [None]:
prompt = """1+1="""
answer = generate.integer(model, max_tokens=1)(prompt)
print(answer)

0


In [None]:
model = models.transformers("gpt2", device="mps")

prompt = """1+1="""
answer = generate.integer(model, max_tokens=20)(prompt)
print(answer)