# Pydantic (JSON) parser
This output parser allows users to specify an arbitrary JSON schema and query LLMs for JSON outputs that conform to that schema.

Keep in mind that large language models are leaky abstractions! You'll have to use an LLM with sufficient capacity to generate well-formed JSON. In the OpenAI family, DaVinci can do this reliably, but Curie's ability already drops off dramatically.

Use Pydantic to declare your data model. Pydantic's BaseModel is like a Python dataclass, but with actual type checking and coercion.
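
As a minimal sketch of what type checking and coercion mean in practice, the hypothetical `ItemRecord` and `Item` classes below (neither is part of this notebook) receive the same inputs. The dataclass stores a numeric string unchanged, while the Pydantic model converts it to an `int` and rejects a value it cannot convert.

```python
from dataclasses import dataclass

from pydantic import BaseModel, ValidationError


@dataclass
class ItemRecord:
    # Hypothetical dataclass: annotations are not enforced at runtime.
    name: str
    quantity: int


class Item(BaseModel):
    # Hypothetical Pydantic model with the same fields.
    name: str
    quantity: int


# The dataclass keeps the string exactly as passed in.
record = ItemRecord(name="widget", quantity="3")

# Pydantic coerces the numeric string to an int.
item = Item(name="widget", quantity="3")

# A value that cannot be coerced raises ValidationError.
try:
    Item(name="widget", quantity="three")
    rejected = False
except ValidationError:
    rejected = True
```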

In [1]:
import os
os.environ['OPENAI_API_KEY'] = 'sk-'


In [2]:
from langchain.prompts import (
    PromptTemplate,
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI

In [3]:
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, validator
from typing import List

In [4]:
model_name = "text-davinci-003"
temperature = 0.0
model = OpenAI(model_name=model_name, temperature=temperature)

In [5]:
# Define your desired data structure.
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

    # You can add custom validation logic easily with Pydantic.
    @validator("setup")
    def question_ends_with_question_mark(cls, field):
        if field[-1] != "?":
            raise ValueError("Badly formed question!")
        return field


# And a query intended to prompt a language model to populate the data structure.
joke_query = "Tell me a joke."

# Set up a parser + inject instructions into the prompt template.
parser = PydanticOutputParser(pydantic_object=Joke)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

_input = prompt.format_prompt(query=joke_query)

output = model(_input.to_string())

parser.parse(output)

Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')
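
Conceptually, parsing a completion comes down to locating the JSON object in the raw text and then validating it against the model. The sketch below illustrates only the first half using the standard library. `extract_json_object` is a hypothetical helper written for this example, not LangChain's implementation, and the sample completion is made up.

```python
import json
import re


def extract_json_object(completion: str) -> dict:
    """Return the JSON object embedded in a completion string."""
    # Greedy match from the first "{" to the last "}" so nested objects survive.
    match = re.search(r"\{.*\}", completion, flags=re.DOTALL)
    if match is None:
        raise ValueError("No JSON object found in completion")
    return json.loads(match.group(0))


# Made-up completion with some chatter around the JSON payload.
completion = (
    "Here is the joke you asked for:\n"
    '{"setup": "Why did the chicken cross the road?", '
    '"punchline": "To get to the other side!"}'
)
joke_data = extract_json_object(completion)
```

The resulting dict could then be passed to `Joke(**joke_data)`, at which point the field types and the custom validator run.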

In [21]:
# Here's another example, but with a compound typed field.
class Actor(BaseModel):
    name: str = Field(description="name of an actor")
    film_names: List[str] = Field(description="list of names of films they starred in")


actor_query = "Generate the filmography for a random actor."

parser = PydanticOutputParser(pydantic_object=Actor)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

_input = prompt.format_prompt(query=actor_query)

output = model(_input.to_string())

parser.parse(output)

Actor(name='Tom Hanks', film_names=['Forrest Gump', 'Saving Private Ryan', 'The Green Mile', 'Cast Away', 'Toy Story'])

In [44]:
news_query = '''Additionally, please extract the following information from the article:

summary: <Brief summary of the news article>
crisis_assessment: <Your judgement about the crisis potential>
crisis_ranking: <Rank the crisis on a scale of 1(low) to 10(high)>
locations_affected:
      name: <Location name>
      category: <Location category such as country, state, region, city, or unknown>
people_affected:
      specific_count: <Total count of people affected>
      by_location: <Count of people affected in each mentioned location>
      estimate: <Rough estimate if an exact count isn't provided>
crisis_category: <Identified crisis type>
key_stakeholders: <List of key involved parties>
causes_and_triggers: <Causes or triggers of the crisis>
response_efforts: <Efforts in response to the crisis>
timeline_and_progress: <Description of the crisis timeline and progress>
resource_requirements: <Required resources to manage the crisis>

###
News Article:

Two monkeys taken from the Dallas Zoo were found Tuesday in an abandoned home after going missing the day before from their enclosure, which had been cut. But no arrests have been made, deepening the mystery at the zoo that has included other cut fences, the escape of a small leopard and the suspicious death of an endangered vulture.

Dallas police said they found the two emperor tamarin monkeys after getting a tip that they could be in an abandoned home in Lancaster, located just south of the zoo. The animals were located, safe, in a closet, and then returned to zoo for veterinary evaluation.

Police said earlier Tuesday that they were still working to determine whether or not the incidents over the last few weeks are related.

Meanwhile, in Louisiana, officials were investigating after 12 squirrel monkeys were taken from a zoo there on Sunday and considering whether there could be a connection.

Here's what is known so far about the incidents:

WHAT HAS HAPPENED AT THE DALLAS ZOO?

The zoo closed Jan. 13 after workers arriving that morning found that the clouded leopard, named Nova, was missing. After a search that included police, the leopard weighing 20-25 pounds (9-11 kilograms) was found later that day near her habitat.

Police said a cutting tool was intentionally used to make the opening in her enclosure. A similar gash also was found in an enclosure for langur monkeys, though none got out or appeared harmed, police said.

On Jan. 21, an endangered lappet-faced vulture named Pin was found dead by arriving workers. Gregg Hudson, the zoo's president and CEO, called the death “very suspicious” and said the vulture had “a wound,"" but declined to give further details.

Hudson said in a news conference following Pin's death that the vulture enclosure didn't appear to be tampered with.

On Monday police said the two emperor tamarin monkeys — which have long whiskers that look like a mustache — were believed to have been taken after someone cut an opening in their enclosure.

The following day police released a photo and video of a man they said they wanted to talk to about the monkeys. The photo shows a man eating Doritos chips while walking, and in the video clip he is seen walking down a path.

WHAT COULD BE THE MOTIVE IN TAKING THE MONKEYS?

Lynn Cuny, founder and president of Wildlife Rescue & Rehabilitation in Kendalia, Texas, said she wouldn’t be surprised if it turns out the monkeys were taken to be sold. Depending on the buyer, she said, a monkey like those could be sold for “several thousands” of dollars.

“Primates are high-dollar animals in the wildlife pet trade in this country,” Cuny said. “Everybody that wants one wants one for all the wrong reasons — there’s never any good reason to have any wild animal as a pet.”

She said there were a variety of ways the taken monkeys could have been in danger, from an improper diet to exposure to cold. Temperatures in Dallas dipped into the 20s on Tuesday during a winter storm.

WHAT IS KNOWN ABOUT THE VULTURE?

Pin's death has been hard on the staff, a zoo official said.

The vulture was “a beloved member of the bird department,” according to Harrison Edell, the zoo’s executive vice president for animal care and conservation.

Speaking at a news conference, Edell said Pin was at least 35 years old and had been at the zoo for 33 years. “A lot of our teams have worked closely with him for all of that time,” Edell said.

Pin, one of four lappet-faced vultures at the zoo, was said to have sired 11 offspring, and his first grandchild hatched in early 2020.

Edell said Pin's death was not only a personal loss but also a loss for the species, which “could potentially go extinct in our lifetime.”

WHAT IS KNOWN ABOUT SECURITY?

Hudson, the zoo's CEO, said in a news conference following Pin’s death that normal operating procedures included over 100 cameras to monitor public, staff and exhibit areas, and that number had been increased. Overnight presence of security and staff was also raised.

Where possible, he said, zoo officials limited the ability of animals to go outside overnight.

After Nova went missing, officials said they had reviewed surveillance video but not what it showed.

The zoo was closed Tuesday and Wednesday due to the storm.

WHAT HAPPENED IN LOUISIANA?

The 12 squirrel monkeys were discovered missing Sunday from their enclosure at a zoo in the state's southeast.

Their habitat at Zoosiana in Broussard, about 60 miles (96 kilometers) west of Baton Rouge, had been “compromised” and some damage was done to get in, city Police Chief Vance Olivier said Tuesday. He declined to provide further details on the damage, citing the ongoing investigation.

He said police did not have any suspects yet but were still searching through video files.

Zoosiana said in a Facebook post that the remaining monkeys have been assessed and appear unharmed.

HAVE THERE BEEN OTHER INCIDENTS BEFORE AT THE DALLAS ZOO?

In 2004, a 340-pound (154-kilogram) gorilla named Jabari jumped over a wall and went on a 40-minute rampage that injured three people before police shot and killed the animal.###'''


In [7]:
news_query = '''

###
News Article:

Two monkeys taken from the Dallas Zoo were found Tuesday in an abandoned home after going missing the day before from their enclosure, which had been cut. But no arrests have been made, deepening the mystery at the zoo that has included other cut fences, the escape of a small leopard and the suspicious death of an endangered vulture.

Dallas police said they found the two emperor tamarin monkeys after getting a tip that they could be in an abandoned home in Lancaster, located just south of the zoo. The animals were located, safe, in a closet, and then returned to zoo for veterinary evaluation.

Police said earlier Tuesday that they were still working to determine whether or not the incidents over the last few weeks are related.

Meanwhile, in Louisiana, officials were investigating after 12 squirrel monkeys were taken from a zoo there on Sunday and considering whether there could be a connection.

Here's what is known so far about the incidents:

WHAT HAS HAPPENED AT THE DALLAS ZOO?

The zoo closed Jan. 13 after workers arriving that morning found that the clouded leopard, named Nova, was missing. After a search that included police, the leopard weighing 20-25 pounds (9-11 kilograms) was found later that day near her habitat.

Police said a cutting tool was intentionally used to make the opening in her enclosure. A similar gash also was found in an enclosure for langur monkeys, though none got out or appeared harmed, police said.

On Jan. 21, an endangered lappet-faced vulture named Pin was found dead by arriving workers. Gregg Hudson, the zoo's president and CEO, called the death “very suspicious” and said the vulture had “a wound,"" but declined to give further details.

Hudson said in a news conference following Pin's death that the vulture enclosure didn't appear to be tampered with.

On Monday police said the two emperor tamarin monkeys — which have long whiskers that look like a mustache — were believed to have been taken after someone cut an opening in their enclosure.

The following day police released a photo and video of a man they said they wanted to talk to about the monkeys. The photo shows a man eating Doritos chips while walking, and in the video clip he is seen walking down a path.

WHAT COULD BE THE MOTIVE IN TAKING THE MONKEYS?

Lynn Cuny, founder and president of Wildlife Rescue & Rehabilitation in Kendalia, Texas, said she wouldn’t be surprised if it turns out the monkeys were taken to be sold. Depending on the buyer, she said, a monkey like those could be sold for “several thousands” of dollars.

“Primates are high-dollar animals in the wildlife pet trade in this country,” Cuny said. “Everybody that wants one wants one for all the wrong reasons — there’s never any good reason to have any wild animal as a pet.”

She said there were a variety of ways the taken monkeys could have been in danger, from an improper diet to exposure to cold. Temperatures in Dallas dipped into the 20s on Tuesday during a winter storm.

WHAT IS KNOWN ABOUT THE VULTURE?

Pin's death has been hard on the staff, a zoo official said.

The vulture was “a beloved member of the bird department,” according to Harrison Edell, the zoo’s executive vice president for animal care and conservation.

Speaking at a news conference, Edell said Pin was at least 35 years old and had been at the zoo for 33 years. “A lot of our teams have worked closely with him for all of that time,” Edell said.

Pin, one of four lappet-faced vultures at the zoo, was said to have sired 11 offspring, and his first grandchild hatched in early 2020.

Edell said Pin's death was not only a personal loss but also a loss for the species, which “could potentially go extinct in our lifetime.”

WHAT IS KNOWN ABOUT SECURITY?

Hudson, the zoo's CEO, said in a news conference following Pin’s death that normal operating procedures included over 100 cameras to monitor public, staff and exhibit areas, and that number had been increased. Overnight presence of security and staff was also raised.

Where possible, he said, zoo officials limited the ability of animals to go outside overnight.

After Nova went missing, officials said they had reviewed surveillance video but not what it showed.

The zoo was closed Tuesday and Wednesday due to the storm.

WHAT HAPPENED IN LOUISIANA?

The 12 squirrel monkeys were discovered missing Sunday from their enclosure at a zoo in the state's southeast.

Their habitat at Zoosiana in Broussard, about 60 miles (96 kilometers) west of Baton Rouge, had been “compromised” and some damage was done to get in, city Police Chief Vance Olivier said Tuesday. He declined to provide further details on the damage, citing the ongoing investigation.

He said police did not have any suspects yet but were still searching through video files.

Zoosiana said in a Facebook post that the remaining monkeys have been assessed and appear unharmed.

HAVE THERE BEEN OTHER INCIDENTS BEFORE AT THE DALLAS ZOO?

In 2004, a 340-pound (154-kilogram) gorilla named Jabari jumped over a wall and went on a 40-minute rampage that injured three people before police shot and killed the animal.###'''


In [8]:
import re
import string
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Note: the NLTK stop-word list and tokenizer data must be downloaded once via nltk.download().

def remove_punctuation_except(text, allowed_chars=('.',)):
    """Remove ASCII punctuation except `allowed_chars`, then collapse whitespace."""
    cleaned_text = ''.join(c for c in text if c not in string.punctuation or c in allowed_chars)
    cleaned_text = re.sub(r'\s+', ' ', cleaned_text).strip()
    return cleaned_text

# Strip punctuation (keeping periods) and lowercase the article.
transcript = remove_punctuation_except(news_query, allowed_chars=('.',))
transcript = transcript.lower()

# Drop English stop words to shorten the prompt.
stop_words = set(stopwords.words('english'))
word_tokens = word_tokenize(transcript)
filtered_text = [word for word in word_tokens if word not in stop_words]
transcript = ' '.join(filtered_text)

# Normalize whitespace, period spacing, and repeated commas.
transcript = re.sub(r'\s+', ' ', transcript).strip()
transcript = re.sub(r"\s*\.\s*", ". ", transcript)
transcript = re.sub(r",{2,}", ",", transcript)

# Collapse any remaining runs of whitespace and immediately repeated words.
transcript = ' '.join(transcript.split())
transcript = re.sub(r'\b(\w+)(\s+\1)+\b', r'\1', transcript, flags=re.IGNORECASE)

# Compare the character counts before and after cleaning.
print(len(news_query))
print(len(transcript))
#print(transcript)
cleaned_news_query = transcript


5148
3524
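
The character counts above show the cleaning step shrinking the prompt from 5148 to 3524 characters, a reduction of roughly 31.5%. The stdlib-only sketch below reproduces two of the steps (punctuation stripping and repeated-word collapsing) on a made-up sentence. `strip_punctuation` and `collapse_repeats` are illustrative names, and the NLTK stop-word filtering is left out. Note that `string.punctuation` covers only ASCII, so typographic quotes and dashes in the article pass through untouched.

```python
import re
import string


def strip_punctuation(text, keep="."):
    """Drop ASCII punctuation except characters in `keep`, then collapse whitespace."""
    kept = "".join(c for c in text if c not in string.punctuation or c in keep)
    return re.sub(r"\s+", " ", kept).strip()


def collapse_repeats(text):
    """Replace runs of the same word ("zoo zoo") with a single occurrence."""
    return re.sub(r"\b(\w+)(\s+\1)+\b", r"\1", text, flags=re.IGNORECASE)


# Made-up sample sentence, not taken from the article.
sample = "Police said, earlier Tuesday: the zoo zoo was closed."
cleaned = collapse_repeats(strip_punctuation(sample).lower())

# Relative size reduction reported by the two len() calls in the notebook.
reduction = 1 - 3524 / 5148
print(cleaned)
print(f"{reduction:.1%}")
```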


In [9]:
class NewsCrisis(BaseModel):
    summary: str = Field(description="Brief summary of the news article")
    crisis_assessment: str = Field(description="Your judgement about the crisis potential")
    crisis_ranking: int = Field(description="Rank the crisis on a scale of 1(low) to 10(high)")
    locations_affected: List[str] = Field(description="Location names")
    people_affected: List[str] = Field(description="Information about the people affected")
    # Candidate sub-fields for people_affected (not modeled here):
    #   specific_count: <Total count of people affected>
    #   by_location: <Count of people affected in each mentioned location>
    #   estimate: <Rough estimate if an exact count isn't provided>
    crisis_category: str = Field(description="Identified crisis type")
    key_stakeholders: str = Field(description="List of key involved parties")
    causes_and_triggers: str = Field(description="Causes or triggers of the crisis")
    response_efforts: str = Field(description="Efforts in response to the crisis")
    timeline_and_progress: str = Field(description="Description of the crisis timeline and progress")
    resource_requirements: str = Field(description="Required resources to manage the crisis")







In [13]:
from typing import List, Dict
from pydantic import BaseModel, Field

class Location(BaseModel):
    name: str = Field(description="Location name")
    category: str = Field(description="Location category such as country, state, region, city, or unknown")

class PeopleAffected(BaseModel):
    specific_count: int = Field(description="Total count of people affected")
    by_location: Dict[str, int] = Field(description="Count of people affected in each mentioned location")
    estimate: str = Field(description="Rough estimate if an exact count isn't provided")

class NewsCrisis(BaseModel):
    summary: str = Field(description="Brief summary of the news article")
    crisis_assessment: str = Field(description="Your judgement about the crisis potential")
    crisis_ranking: int = Field(description="Rank the crisis on a scale of 1(low) to 10(high)")
    locations_affected: List[Location] = Field(description="List of locations affected")
    people_affected: PeopleAffected = Field(description="Information about the people affected by the crisis")


In [14]:
parser = PydanticOutputParser(pydantic_object=NewsCrisis)

prompt = PromptTemplate(
    template="As an expert evaluator, I need you to assess whether a given news article indicates a potential humanitarian crisis. Please read the article and provide your judgment based on the content, considering factors such as the magnitude of humanitarian impact.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

_input = prompt.format_prompt(query=cleaned_news_query)
print(_input.to_string()[:10000])
print(len(_input.to_string()))
output = model(_input.to_string())
print(output)

As an expert evaluator, I need you to assess whether a given news article indicates a potential humanitarian crisis. Please read the article and provide your judgment based on the content, considering factors such as the magnitude of humanitarian impact.
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"summary": {"title": "Summary", "description": "Brief summary of the news article", "type": "string"}, "crisis_assessment": {"title": "Crisis Assessment", "description": "Your judgement about the crisis potential", "type": "string"}, "crisis_ranking": {"title": "Crisis Ranking
```

In [10]:
import json

# Print the length of the raw completion.
print(len(output))
# Strip the "Output:" prefix the model sometimes adds before the JSON.
new_output = output.replace('Output:', '')
# Parse the cleaned JSON string.
data = json.loads(new_output)
# Optionally flatten "locations_affected" to a list of names:
#data['locations_affected'] = [location['name'] for location in data['locations_affected']]
# Pretty-print the parsed JSON.
print(json.dumps(data, indent=2))


1052


JSONDecodeError: Unterminated string starting at: line 12 column 3 (char 1028)
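
The `Unterminated string` error near the end of a 1052-character completion suggests the output was cut off mid-value rather than malformed throughout, most likely because the completion hit its token limit (the `max_tokens` setting, which LangChain's `OpenAI` wrapper lets you raise). As a hedged sketch, the hypothetical `parse_completion` helper below strips the `Output:` prefix and reports a parse failure together with its position, which makes truncation easier to spot. The sample strings are made up.

```python
import json
from typing import Optional, Tuple


def parse_completion(text: str, prefix: str = "Output:") -> Tuple[Optional[dict], Optional[str]]:
    """Parse a JSON completion; return (data, None) or (None, error message)."""
    cleaned = text.replace(prefix, "", 1).strip()
    try:
        return json.loads(cleaned), None
    except json.JSONDecodeError as exc:
        # An error position close to len(cleaned) hints at a truncated completion.
        return None, f"{exc.msg} (char {exc.pos} of {len(cleaned)})"


# A complete completion parses cleanly.
ok_data, ok_error = parse_completion('Output: {"summary": "Two monkeys were found."}')

# A completion cut off mid-string yields an error message instead of raising.
bad_data, bad_error = parse_completion('Output: {"summary": "Two monkeys were')
print(bad_error)
```

If the reported position sits within a few characters of the end of the text, rerunning the model with a larger completion budget or a smaller schema is a reasonable first thing to try.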