In [1]:
import guardrails as gd

The ideas in this tutorial are drawn from [From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting](https://arxiv.org/pdf/2309.04269.pdf).

In this tutorial, we will try to get a high-quality summary of this news article: https://www.nytimes.com/2024/02/17/nyregion/trump-civil-cases-millions.ht

In [2]:
article = '''On Friday, the judge overseeing Mr. Trump’s civil fraud case issued a final ruling that inflicted a staggering financial penalty. With interest, the former president has been ordered to pay New York State about $450 million, a sum that threatens to wipe out a stockpile of cash, stocks and bonds that he amassed since leaving the White House, according to a New York Times review of Mr. Trump’s financial records. He will have only 30 days or so to either come up with the money or persuade an outside company to post a bond.

The judge, Arthur F. Engoron, also imposed several new restrictions on Mr. Trump and his family business. For three years, Mr. Trump cannot run any New York company, including portions of his own, nor can he obtain a loan from a New York bank. The same restrictions apply to his adult sons for a two-year period. And the family business will be under the thumb of a watchful outsider, a court-appointed monitor who can hamstring the company if she does not like what she sees.

All told, the judge’s decision poses unprecedented threats to Mr. Trump’s finances, his family business and his ego at a critical time for the former president. Although Mr. Trump will not go bankrupt and the Trump Organization will not go out of business, the company’s loudest hype man could for now become a silent partner in his hometown properties. The organization will be another real estate company in a city full of them — this one facing unusual new constraints that could impede its ability to compete.

“Justice Engoron’s order could impose several years of paralysis at the Trump Organization,” said Jim Wheaton, a professor at William & Mary Law School who focuses on legal issues involving corporate entities and has studied Mr. Trump’s finances. The ruling, he added, could even “freeze the ability to make legitimate business decisions.”

One of Mr. Trump’s lawyers, Christopher M. Kise, called the financial penalty “draconian and unconstitutional,” and said that the decision “will cause irreparable damage to both the business community and the rule of law in our country.”

The punishments facing Mr. Trump and his company, some hard-hitting and some symbolic, could serve as a harrowing prelude to his criminal trials, the first of which is scheduled to begin next month. In those trials, for the first time, he faces the threat of prison.
'''

We want an LLM to summarize the news article above.

Let's create a prompt for our summarization LLM.

We'll give this LLM a creative name: the Summarizer.

In [3]:
summarizer_prompt_tmplt = '''Here is a portion of a news article:

### START NEWS ARTICLE ###
{article}
### END NEWS ARTICLE ###

Please provide a summary of the news article above
'''
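As a quick check, the template can be filled with `str.format`. The sketch below is self-contained: it uses a copy of the template and a short stand-in string (`sample_article`, a hypothetical placeholder) instead of the full article.

```python
# Stand-alone sketch: fill the Summarizer template with a short stand-in article.
summarizer_prompt_tmplt = '''Here is a portion of a news article:

### START NEWS ARTICLE ###
{article}
### END NEWS ARTICLE ###

Please provide a summary of the news article above
'''

# Hypothetical placeholder text, used only to keep the example short.
sample_article = "A judge ordered a former president to pay a large penalty."

# str.format replaces the {article} placeholder with the article text.
summarizer_prompt = summarizer_prompt_tmplt.format(article=sample_article)
print(summarizer_prompt)
```

The template stays a plain string until `format` is called, so the same template can be reused for any article.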

Let's establish criteria for what makes a high-quality summary.
We will represent these criteria in a Pydantic class, so that we can use them in a Guard.

For each criterion, we must establish a valid range of scores.

I have added a convenience function, `to_prompt`. It formats the rating criteria as natural language, so that they can be inserted into a prompt.

In [6]:
from pydantic import BaseModel, Field
from guardrails.validators import ValidRange


# Each criterion is scored from 1 to 5; an out-of-range score triggers a reask.
class SummaryRatings(BaseModel):
    informative: int = Field(description="An informative summary captures the important information in the article and presents it accurately",
                             validators=[ValidRange(min=1, max=5, on_fail='reask')])
    coherent: int = Field(description="A coherent summary is well-structured and well-organized",
                          validators=[ValidRange(min=1, max=5, on_fail='reask')])
    attributable: int = Field(description="Is all the information in the summary fully attributable to the Article?",
                              validators=[ValidRange(min=1, max=5, on_fail='reask')])
    concise: int = Field(description="A good summary should convey the main ideas in the Article in fewer words.",
                         validators=[ValidRange(min=1, max=5, on_fail='reask')])

    @classmethod
    def to_prompt(cls):
        '''Returns rating criteria formatted as a bulleted list, for use in prompts.'''
        return "\n- " + "\n- ".join([f"{str(k.title())}: {v.description}" for k, v in cls.model_fields.items()])
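To see what `to_prompt` produces without installing Pydantic or Guardrails, here is a minimal sketch that applies the same formatting to a plain dictionary. The `criteria` dict and the `criteria_to_prompt` helper are hypothetical stand-ins for `SummaryRatings.model_fields` and `to_prompt`.

```python
# Plain-dict stand-in for SummaryRatings.model_fields (assumption: name -> description).
criteria = {
    "informative": "An informative summary captures the important information in the article and presents it accurately",
    "coherent": "A coherent summary is well-structured and well-organized",
}

def criteria_to_prompt(criteria):
    '''Format criteria as a bulleted list, one "- Name: description" line each.'''
    return "\n- " + "\n- ".join(f"{name.title()}: {desc}" for name, desc in criteria.items())

bullets = criteria_to_prompt(criteria)
print(bullets)
```

Each field name is title-cased and paired with its description, so the Critic sees one bullet per criterion.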

We need to explain our criteria in a prompt, so that an LLM can understand them.

This LLM (which we will call the Critic) will use our criteria to evaluate the Summarizer's output.

In [9]:
critic_prompt_tmplt = f'''Here is a portion of a news article:

### BEGIN ARTICLE ###
{article}
### END ARTICLE ###

Here is a summary of the article above:

### BEGIN SUMMARY ###
{{summary}}
### END SUMMARY ###

Please rate the quality of the summary, according to the following criteria:
{SummaryRatings.to_prompt()}
'''
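The Critic template is filled in two stages: the f-string substitutes the article and the criteria immediately, while the doubled braces around `summary` leave a literal `{summary}` placeholder to be filled later. The sketch below illustrates this with short stand-in values (`article` and `criteria` here are hypothetical placeholders, not the real article or schema).

```python
# Stand-in values (hypothetical), so the sketch runs without the full article or schema.
article = "A judge issued a ruling."
criteria = "\n- Concise: Conveys the main ideas in fewer words."

# Stage 1: the f-string fills {article} and {criteria} immediately;
# the doubled braces {{summary}} are emitted as a literal {summary}.
critic_tmplt = f'''### BEGIN ARTICLE ###
{article}
### END ARTICLE ###

### BEGIN SUMMARY ###
{{summary}}
### END SUMMARY ###

Criteria:{criteria}
'''

# Stage 2: the remaining placeholder is filled later, once a summary exists.
critic_prompt = critic_tmplt.format(summary="The judge ruled.")
print(critic_prompt)
```

One caveat of this pattern: after stage 1 the article text is part of the template, so any literal braces in the article would be interpreted by the later `format` call.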

Let's create some functions that allow us to conveniently query the Summarizer and the Critic.

In [11]:
from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

def get_completion(prompt, model):
    '''A generic function to get chat completions from an OpenAI model.'''
    return client.chat.completions.create(
        model=model,
        messages=[{"role": "system", "content": "You are a helpful assistant."}, {"role":"user", "content":prompt}]
    ).choices[0].message.content

# We will use GPT-4 as the Critic
def invoke_critic(prompt, *args, **kwargs):
    return get_completion(prompt, "gpt-4")

# We will use GPT-3.5-Turbo as the Summarizer
# (Azure OpenAI deployments typically name this model "gpt-35-turbo" instead.)
def invoke_summarizer(prompt, *args, **kwargs):
    return get_completion(prompt, "gpt-3.5-turbo")
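Both helpers share the same shape: they take a prompt (plus any extra arguments) and return a string. When experimenting without API access, a stub with that shape can stand in for either one. The `make_stub_llm` helper below is a hypothetical testing aid, not part of OpenAI or Guardrails.

```python
def make_stub_llm(canned_response):
    '''Return a callable with the same shape as invoke_summarizer / invoke_critic.'''
    def stub_llm(prompt, *args, **kwargs):
        # Ignore the prompt and return the canned text; no network call is made.
        return canned_response
    return stub_llm

# Hypothetical canned summary, for offline experiments only.
fake_summarizer = make_stub_llm("Trump was ordered to pay about $450 million.")
print(fake_summarizer("any prompt"))
```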

Now we can create a Guard that uses the Critic.

The Critic will act as a validator. It produces ratings, according to our SummaryRatings schema from above.

If the Critic's ratings fall short of a threshold (which we set with the `thresh` parameter), then our validator fails its input.

In [12]:
from guardrails import Guard
from guardrails.validators.llm_critic import LLMCritic
from guardrails.validators import ToxicLanguage

#Create a validator from the Critic
critic = LLMCritic(
    rating_schema=SummaryRatings,
    critic_prompt_tmplt=critic_prompt_tmplt,
    critic_llm_api=invoke_critic,
    on_fail="reask", 
    thresh=5,
    critic_guard_kwargs={
        "num_reasks":5
    }
)
# Create a Guard that uses the validator
guard = Guard.from_string(validators=[critic])
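The exact rule that `LLMCritic` applies to `thresh` is not shown in this tutorial. As a rough illustration, the sketch below assumes that a summary passes only when every rating meets the threshold; the `passes_threshold` helper and the sample ratings are hypothetical.

```python
def passes_threshold(ratings, thresh):
    '''Assumed rule: pass only if every criterion scores at least thresh.'''
    return all(score >= thresh for score in ratings.values())

# Hypothetical ratings, in the 1-5 range used by SummaryRatings.
ratings = {"informative": 5, "coherent": 5, "attributable": 5, "concise": 4}

print(passes_threshold(ratings, thresh=5))  # False: "concise" is below 5
print(passes_threshold(ratings, thresh=4))  # True
```

Under this assumption, `thresh=5` with a maximum score of 5 is a strict setting: only a summary rated perfect on every criterion would pass.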

Now, let's use our critic to validate a piece of text.

The example below serves as a sanity check of our validator.

However, it also illustrates an interesting use case.

The LLMCritic validator can be used to validate results from much weaker algorithms. This can help determine whether data needs to be sent to manual review, for example.



In [None]:
raw_output, validated_output, *rest = Guard().use(critic).validate("Donald Trump was fined.")

In [4]:
print(validated_output)

None


As you can see above, there is no valid output. The critic simply fails the poor summary.
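The manual-review use case can be sketched as a small routing step. This assumes, as in the cell above, that a failed validation yields `None`; the `route` helper and its labels are hypothetical.

```python
def route(validated_output):
    '''Send failed validations (None) to manual review; accept everything else.'''
    return "manual_review" if validated_output is None else "accepted"

print(route(None))                  # manual_review
print(route("A passing summary."))  # accepted
```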

In our next example, we use the Critic to validate GPT-3.5-Turbo as a Summarizer.

In [None]:
guard = Guard.from_string(validators=[critic])

raw_output, validated_output, *rest = guard(
    prompt=summarizer_prompt_tmplt.format(article=article),
    llm_api=invoke_summarizer
)

In [19]:
print(validated_output)

We can see above that GPT-3.5-Turbo was able to produce a summary that satisfied the Critic.

If we inspect the Guard's history, we see that the Critic was able to teach the Summarizer how to improve.

It took a follow-up call, with modified instructions, to prompt the Summarizer to produce a satisfactory response.

In [18]:
from rich import print


print(guard.history.last.tree)
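The reask behavior described above can be pictured as a critique-and-retry loop. The sketch below is a simplified illustration with stubbed LLM calls, not how Guardrails implements reasks internally; `summarize_with_reasks`, `stub_summarizer`, and `stub_critic` are all hypothetical names.

```python
def summarize_with_reasks(summarizer, critic, prompt, thresh, num_reasks):
    '''Call summarizer, score with critic, and retry with feedback until it passes.'''
    current_prompt = prompt
    for attempt in range(num_reasks + 1):
        summary = summarizer(current_prompt)
        ratings = critic(summary)
        failing = [name for name, score in ratings.items() if score < thresh]
        if not failing:
            return summary, attempt
        # Append feedback so the next attempt can address the weak criteria.
        current_prompt = (
            f"{prompt}\nYour previous summary scored low on: "
            f"{', '.join(failing)}. Please improve it."
        )
    # Every attempt failed: report no valid output.
    return None, num_reasks

# Hypothetical stubs standing in for the real LLM calls.
def stub_summarizer(prompt):
    return "Detailed summary." if "scored low" in prompt else "Short."

def stub_critic(summary):
    score = 5 if summary == "Detailed summary." else 2
    return {"informative": score, "concise": 5}

summary, reasks_used = summarize_with_reasks(
    stub_summarizer, stub_critic, "Summarize the article.", thresh=5, num_reasks=5
)
print(summary, reasks_used)  # Detailed summary. 1
```

In this toy run the first attempt fails on one criterion, the feedback is appended to the prompt, and the second attempt passes, which mirrors the single follow-up call seen in the Guard's history.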