In [1]:
!guardrails hub install hub://guardrails/valid_length --quiet
!guardrails hub install hub://guardrails/two_words --quiet
!guardrails hub install hub://guardrails/valid_range --quiet
!guardrails hub install hub://guardrails/lowercase --quiet
!guardrails hub install hub://guardrails/one_line --quiet

%pip install pypdfium2

Installing hub:[35m/[0m[35m/guardrails/[0m[95mvalid_length...[0m
✅Successfully installed guardrails/valid_length!


Installing hub:[35m/[0m[35m/guardrails/[0m[95mtwo_words...[0m
✅Successfully installed guardrails/two_words!


Installing hub:[35m/[0m[35m/guardrails/[0m[95mvalid_range...[0m
✅Successfully installed guardrails/valid_range!


Installing hub:[35m/[0m[35m/guardrails/[0m[95mlowercase...[0m
✅Successfully installed guardrails/lowercase!


Installing hub:[35m/[0m[35m/guardrails/[0m[95mone_line...[0m
✅Successfully installed guardrails/one_line!


Collecting pypdfium2
  Using cached pypdfium2-4.30.0-py3-none-macosx_11_0_arm64.whl.metadata (48 kB)
Using cached pypdfium2-4.30.0-py3-none-macosx_11_0_arm64.whl (2.7 MB)
Installing collected packages: pypdfium2
Successfully installed pypdfium2-4.30.0
Note: you may need to restart the kernel to use updated packages.


# Extracting entities from a Terms of Service document

!!! note
    To download this example as a Jupyter notebook, click [here](https://github.com/ShreyaR/guardrails/blob/main/docs/examples/extracting_entities.ipynb).

In this example, we will use Guardrails to extract key information from a Terms-of-Service document.

## Objective

We want to extract structured information about all fees and interest rates associated with the Chase credit card.

## Step 0: Download the PDF and load it as a string

To get started, download the document from [here](https://github.com/ShreyaR/guardrails/blob/main/docs/examples/data/chase_card_agreement.pdf) and save it in `data/chase_card_agreement.pdf`.

Guardrails has some built-in functions to help with common tasks. Here, we will use the `read_pdf` function to load the PDF as a string.

In [2]:
import guardrails as gd

from rich import print

content = gd.docs_utils.read_pdf("data/chase_card_agreement.pdf")

print(f"Chase Credit Card Document:\n\n{content[:275]}\n...")



## Step 1: Create the RAIL Spec

Ordinarily, we would create a RAIL spec in a separate file. For the purposes of this example, we will create the spec in this notebook as a string following the RAIL syntax. For more information on RAIL, see the [RAIL documentation](/docs/how_to_guides/rail). We will also show the same spec in a code-first format using a Pydantic model.

Here, we request:

1. A list of the fees associated with the card. For each fee we ask for several fields, each with its own quality criteria and corrective action.
2. An object (i.e., key-value pairs) for the interest rates.
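For orientation, the target structure can be written down as `TypedDict`s. This is a minimal sketch that only mirrors the field names used in the spec; `FeeShape` and `AgreementShape` are hypothetical names rather than Guardrails types, and nothing here is enforced at runtime.

```python
from typing import Any, Dict, List, TypedDict


class FeeShape(TypedDict):
    """One entry of the `fees` list (illustrative only, not a Guardrails type)."""
    index: int        # 1-based position of the fee in the list
    name: str         # short name; the spec asks for lowercase, two words
    explanation: str  # description; the spec asks for a single line
    value: float      # numeric value; the spec hints it is a percentage


class AgreementShape(TypedDict):
    """Top-level output: a list of fees plus a free-form object."""
    fees: List[FeeShape]
    interest_rates: Dict[str, Any]  # arbitrary key-value pairs
```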

XML option:

In [3]:
rail_str = """
<rail version="0.1">

<output>

    <list name="fees" description="What fees and charges are associated with my account?">
        <object>
            <integer name="index" format="1-indexed" />
            <string name="name" format="lower-case; two-words" on-fail-lower-case="noop" on-fail-two-words="reask"/>
            <string name="explanation" format="one-line" on-fail-one-line="noop" />
            <float name="value" format="percentage"/>
        </object>
    </list>
    <object name="interest_rates" description="What are the interest rates offered by the bank on savings and checking accounts, loans, and credit products?" />
</output>


<prompt>
Given the following document, answer the following questions. If the answer doesn't exist in the document, enter 'None'.

${document}

${gr.xml_prefix_prompt}

${output_schema}

${gr.json_suffix_prompt_v2_wo_none}</prompt>

</rail>
"""

Pydantic model option:

In [4]:
from guardrails.hub import LowerCase, TwoWords, OneLine
from pydantic import BaseModel, Field
from typing import List

prompt = """
Given the following document, answer the following questions. If the answer doesn't exist in the document, enter 'None'.

${document}

${gr.xml_prefix_prompt}

${output_schema}

${gr.json_suffix_prompt_v2_wo_none}"""

class Fee(BaseModel):
    index: int = Field(validators=[("1-indexed", "noop")])
    name: str = Field(validators=[LowerCase(on_fail="fix"), TwoWords(on_fail="reask")])
    explanation: str = Field(validators=[OneLine()])
    value: float = Field(validators=[("percentage", "noop")])

class CreditCardAgreement(BaseModel):
    fees: List[Fee] = Field(description="What fees and charges are associated with my account?")
    interest_rates: dict = Field(description="What are the interest rates offered by the bank on savings and checking accounts, loans, and credit products?")
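Each validator in the model is paired with an on-fail action such as `noop`, `fix`, or `reask`. The sketch below is a simplified mental model of how these three actions differ, assuming a check that may optionally propose a corrected value. `CheckResult`, `apply_on_fail`, and the two check functions are hypothetical helpers, not Guardrails' internal API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CheckResult:
    passed: bool
    fix_value: Optional[str] = None  # correction the check can propose, if any


def apply_on_fail(value: str, result: CheckResult, on_fail: str) -> Tuple[str, bool]:
    """Return (value_to_keep, needs_reask) under a simplified on-fail policy."""
    if result.passed:
        return value, False
    if on_fail == "noop":
        # Keep the original value; the failure is not acted on.
        return value, False
    if on_fail == "fix":
        if result.fix_value is None:
            raise ValueError("check did not propose a fix value")
        return result.fix_value, False
    if on_fail == "reask":
        # Keep the value for now and flag that the LLM should be asked again.
        return value, True
    raise ValueError(f"unsupported on_fail action: {on_fail!r}")


def lower_case_check(value: str) -> CheckResult:
    # A lowercase check can always propose value.lower() as its fix.
    return CheckResult(passed=value == value.lower(), fix_value=value.lower())


def two_words_check(value: str) -> CheckResult:
    # No obvious automatic fix exists, so no fix_value is proposed.
    return CheckResult(passed=len(value.split()) == 2)


print(apply_on_fail("Late Fee", lower_case_check("Late Fee"), "fix"))    # ('late fee', False)
print(apply_on_fail("Late Fee", lower_case_check("Late Fee"), "noop"))   # ('Late Fee', False)
print(apply_on_fail("annual membership fee",
                    two_words_check("annual membership fee"), "reask"))  # ('annual membership fee', True)
```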



## Step 2: Create a `Guard` object with the RAIL Spec

We create a `gd.Guard` object that will check, validate and correct the output of the LLM. This object:

1. Enforces the quality criteria specified in the RAIL spec.
2. Takes corrective action when the quality criteria are not met.
3. Compiles the schema and type info from the RAIL spec and adds it to the prompt.
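The first two responsibilities above (enforcing criteria and taking corrective action) can be pictured as a validate-then-re-ask loop. This is a minimal sketch under simplifying assumptions: the LLM is any function from prompt to JSON text, validation returns a list of error messages, and a re-ask appends those errors to the prompt. `guarded_call`, `fake_llm`, and `validate_name` are hypothetical and do not reproduce Guardrails' implementation.

```python
import json
from typing import Callable, Dict, List


def guarded_call(
    llm: Callable[[str], str],
    prompt: str,
    validate: Callable[[dict], List[str]],
    max_reasks: int = 1,
) -> Dict:
    """Call `llm`, parse its JSON, validate it, and re-ask with the errors if needed."""
    current_prompt = prompt
    errors: List[str] = []
    for _ in range(max_reasks + 1):
        output = json.loads(llm(current_prompt))
        errors = validate(output)
        if not errors:
            return output
        # Re-ask: append the validation errors so the model can correct itself.
        current_prompt = prompt + "\n\nFix these problems:\n" + "\n".join(errors)
    raise RuntimeError(f"validation still failing after {max_reasks} re-ask(s): {errors}")


# Usage with a fake LLM that answers badly once, then correctly.
responses = iter(['{"name": "Late Payment Fee"}', '{"name": "late payment"}'])


def fake_llm(prompt: str) -> str:
    return next(responses)


def validate_name(output: dict) -> List[str]:
    name = output["name"]
    problems = []
    if name != name.lower():
        problems.append("name must be lower case")
    if len(name.split()) != 2:
        problems.append("name must be exactly two words")
    return problems


result = guarded_call(fake_llm, "Extract the fee name as JSON.", validate_name)
print(result)
```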

From XML:

In [5]:
guard = gd.Guard.from_rail_string(rail_str)



From Pydantic:

In [6]:
guard = gd.Guard.from_pydantic(output_class=CreditCardAgreement, prompt=prompt)

A few formatters in the spec, such as `1-indexed` and `percentage`, are not supported validators. These formatters won't be enforced on the output, but the information is still used to generate the prompt.
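A RAIL `format` attribute can hold several names separated by `;` (as in `format="lower-case; two-words"`). The sketch below sorts such names into recognized and unrecognized ones, assuming a hypothetical `KNOWN_FORMATS` set built from the hub validators installed at the top of this notebook; it does not reproduce Guardrails' own name resolution.

```python
from typing import List, Tuple

# Hypothetical set of format names backed by installed validators.
KNOWN_FORMATS = {"lower-case", "two-words", "one-line"}


def split_formats(format_attr: str) -> Tuple[List[str], List[str]]:
    """Split a RAIL-style `format` attribute into known and unknown names."""
    names = [part.strip() for part in format_attr.split(";") if part.strip()]
    known = [n for n in names if n in KNOWN_FORMATS]
    unknown = [n for n in names if n not in KNOWN_FORMATS]
    return known, unknown


print(split_formats("lower-case; two-words"))  # (['lower-case', 'two-words'], [])
print(split_formats("percentage"))             # ([], ['percentage'])
```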

The prompt that will be sent to the LLM is available as `guard.base_prompt`. Its `${document}` placeholder is substituted with the user-provided value at runtime.
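Python's standard `string.Template` uses the same `${name}` placeholder syntax, so it gives a reasonable mental model of this substitution. This is an illustration only; Guardrails' own prompt formatting may differ. Dotted names such as `${gr.xml_prefix_prompt}` are not valid `Template` identifiers, so `safe_substitute` leaves them untouched instead of raising.

```python
from string import Template

prompt_template = """Given the following document, answer the following questions.

${document}

${gr.xml_prefix_prompt}"""

# safe_substitute fills ${document} and leaves anything it cannot resolve as-is.
filled = Template(prompt_template).safe_substitute(document="<document text here>")
print(filled)
```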

In [7]:
print(guard.base_prompt)



## Step 3: Wrap the LLM API call with `Guard`

In [8]:
import openai

raw_llm_response, validated_response, *rest = guard(
    openai.completions.create,
    prompt_params={"document": content[:6000]},
    model="gpt-3.5-turbo-instruct",
    max_tokens=2048,
    temperature=0,
)

The `guard` wrapper returns the raw LLM response (`raw_llm_response`, a plain string) and the validated and corrected output (`validated_response`, a dictionary).
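The three-target assignment in Step 3 relies on Python's extended iterable unpacking: the first two items are bound by name and any remaining items are collected into a list. A minimal sketch using a hypothetical `OutcomeSketch` named tuple as a stand-in for the object returned by `guard(...)`; the real return type and its extra fields may differ, and the placeholder values are made up.

```python
import json
from typing import NamedTuple, Optional


class OutcomeSketch(NamedTuple):
    """Hypothetical stand-in for what `guard(...)` returns (not the real class)."""
    raw_llm_output: str
    validated_output: Optional[dict]
    validation_passed: bool


outcome = OutcomeSketch(
    raw_llm_output='{"fees": [], "interest_rates": {}}',
    validated_output={"fees": [], "interest_rates": {}},
    validation_passed=True,
)

# Extended unpacking: first two items by name, the rest collected in a list.
raw_text, validated, *rest = outcome
print(type(raw_text).__name__)   # str
print(type(validated).__name__)  # dict
print(rest)                      # [True]
# Here no corrections were applied, so parsing the raw text gives the same dict.
print(json.loads(raw_text) == validated)  # True
```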

We can see that the output is a dictionary with the correct schema and types.
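To double-check the shape of a response yourself, a small standard-library walker can report schema problems. This is a hypothetical helper, assuming the field names from the spec in Step 1; in the notebook it could be called as `check_agreement(validated_response)`. The `example` values below are made-up placeholders, not data from the Chase document.

```python
from typing import Any, List

# Expected Python types for each fee field; numeric values may be int or float.
FEE_FIELDS = {"index": int, "name": str, "explanation": str, "value": (int, float)}


def check_agreement(output: Any) -> List[str]:
    """Return a list of schema problems found in an extracted agreement."""
    if not isinstance(output, dict):
        return ["output is not a dict"]
    problems: List[str] = []
    fees = output.get("fees")
    if not isinstance(fees, list):
        problems.append("'fees' is missing or not a list")
        fees = []
    for i, fee in enumerate(fees):
        if not isinstance(fee, dict):
            problems.append(f"fees[{i}] is not an object")
            continue
        for key, expected_type in FEE_FIELDS.items():
            if not isinstance(fee.get(key), expected_type):
                problems.append(f"fees[{i}].{key} is missing or has the wrong type")
    if not isinstance(output.get("interest_rates"), dict):
        problems.append("'interest_rates' is missing or not an object")
    return problems


example = {
    "fees": [{"index": 1, "name": "example fee", "explanation": "placeholder text", "value": 1.5}],
    "interest_rates": {"example_rate": 0.0},
}
print(check_agreement(example))  # []
```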

In [9]:
print(validated_response)

In [10]:
guard.history.last.tree