<a href = "https://www.pieriantraining.com"><img src="../PT Centered Purple.png"> </a>

<em style="text-align:center">Copyrighted by Pierian Training</em>

# Parsing Output

Let's set up a Chat Model:

In [3]:
import os
from dotenv import load_dotenv

from langchain.prompts import PromptTemplate, SystemMessagePromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain_openai import ChatOpenAI

load_dotenv()

api_key = os.getenv("OPENAI_API_KEY")

In [4]:
model = ChatOpenAI(openai_api_key=api_key)

## Example: List Parsing 

In [6]:
# List parsing: we want the model's answer as a Python list
# TAB completion on langchain.output_parsers shows all the available parsers
# https://python.langchain.com/v0.1/docs/modules/model_io/output_parsers/
# - JSON
# - XML
# - CSV: CommaSeparatedListOutputParser
# - YAML
# - PandasDataFrame
# - Enum
# - Pydantic
# - Datetime
# - ...
from langchain.output_parsers import CommaSeparatedListOutputParser

In [7]:
output_parser = CommaSeparatedListOutputParser()

In [10]:
# The format instructions are just a plain string
format_instructions = output_parser.get_format_instructions()
print(format_instructions)

Your response should be a list of comma separated values, eg: `foo, bar, baz` or `foo,bar,baz`


In [11]:
# The parser also has a parse() method, which converts a correctly formatted string into a list
reply = "one, two, three"
output_parser.parse(reply)

['one', 'two', 'three']
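
To make clear what kind of work `parse()` does here, the following is a minimal standard-library sketch of comma-separated parsing. The function name `parse_comma_separated` is hypothetical, and this is only an approximation for illustration, not LangChain's actual implementation.

```python
from typing import List


def parse_comma_separated(text: str) -> List[str]:
    """Split on commas and trim the whitespace around each item."""
    # Empty items (e.g. from a trailing comma) are dropped in this sketch.
    return [item.strip() for item in text.split(",") if item.strip()]


print(parse_comma_separated("one, two, three"))
print(parse_comma_separated("foo,bar,baz"))
```

Both spellings shown in the format instructions (`foo, bar, baz` and `foo,bar,baz`) yield the same list, which is why the instructions can offer either.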

In [18]:
# The prompt is a string with two placeholders: {request} and {format_instructions}
human_template = '{request}\n{format_instructions}' # the newline keeps the request and the instructions on separate lines
human_prompt = HumanMessagePromptTemplate.from_template(human_template)

In [19]:
# Now, we can create a chat prompt
chat_prompt = ChatPromptTemplate.from_messages([human_prompt])
chat_prompt.format_prompt(request="give me 5 characteristics of dogs",
                   format_instructions = output_parser.get_format_instructions())

ChatPromptValue(messages=[HumanMessage(content='give me 5 characteristics of dogs\nYour response should be a list of comma separated values, eg: `foo, bar, baz` or `foo,bar,baz`')])
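
The placeholder mechanism can be pictured with plain Python string formatting. This is only a rough analogy, assuming the default f-string-style template syntax; the real prompt classes additionally wrap the text in message objects.

```python
# Same template text as above, filled with str.format instead of LangChain.
human_template = "{request}\n{format_instructions}"

# Copied from the parser's printed format instructions.
format_instructions = (
    "Your response should be a list of comma separated values, "
    "eg: `foo, bar, baz` or `foo,bar,baz`"
)

filled = human_template.format(
    request="give me 5 characteristics of dogs",
    format_instructions=format_instructions,
)
print(filled)
```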

In [20]:
# Note: the request must be compatible with the format instructions (here, something that can be answered as a list)!
request = chat_prompt.format_prompt(
    request="give me 5 characteristics of dogs",
    format_instructions = output_parser.get_format_instructions()).to_messages()

In [21]:
result = model.invoke(request)

In [22]:
# We get back a string but it should follow the instructions
result.content # 'Loyal, playful, protective, social, obedient'

'Loyal, playful, protective, social, obedient'

In [23]:
# Convert the string into the desired output: a Python list
output_parser.parse(result.content) # ['Loyal', 'playful', 'protective', 'social', 'obedient']

['Loyal', 'playful', 'protective', 'social', 'obedient']

## Example: Datetime Parser 

In [35]:
from langchain.output_parsers import DatetimeOutputParser

In [36]:
output_parser = DatetimeOutputParser()

In [37]:
print(output_parser.get_format_instructions())

Write a datetime string that matches the following pattern: '%Y-%m-%dT%H:%M:%S.%fZ'.

Examples: 0959-04-28T05:29:30.612263Z, 0123-07-31T09:18:02.715416Z, 0137-03-31T05:16:12.249875Z

Return ONLY this string, no other words!


In [38]:
template_text = "{request}\n{format_instructions}"
human_prompt=HumanMessagePromptTemplate.from_template(template_text)

In [39]:
chat_prompt = ChatPromptTemplate.from_messages([human_prompt])

In [40]:
print(chat_prompt.format(request="When was the 13th Amendment ratified in the US?",
                   format_instructions=output_parser.get_format_instructions()
                   ))

Human: When was the 13th Amendment ratified in the US?
Write a datetime string that matches the following pattern: '%Y-%m-%dT%H:%M:%S.%fZ'.

Examples: 1713-07-27T03:03:40.898689Z, 1718-03-10T13:56:20.049095Z, 0634-05-03T10:46:19.469959Z

Return ONLY this string, no other words!


In [41]:
request = chat_prompt.format_prompt(request="What date was the 13th Amendment ratified in the US?",
                   format_instructions=output_parser.get_format_instructions()
                   ).to_messages()

In [42]:
result = model.invoke(request,temperature=0)

In [43]:
# Careful: the model sometimes adds extra text around the datetime, which breaks the parsing step
result.content

'1865-12-06T00:00:00.000000Z'

In [45]:
output_parser.parse(result.content)

datetime.datetime(1865, 12, 6, 0, 0)
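
The strictness of the pattern is easy to reproduce with the standard library. The sketch below assumes the parser behaves roughly like `datetime.strptime` with the pattern printed in the format instructions; `parse_datetime` is a hypothetical helper, not part of LangChain.

```python
from datetime import datetime

# Pattern taken from the printed format instructions.
PATTERN = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_datetime(text: str) -> datetime:
    """Parse a string that must match PATTERN exactly (after trimming whitespace)."""
    return datetime.strptime(text.strip(), PATTERN)


parsed = parse_datetime("1865-12-06T00:00:00.000000Z")
print(parsed)

# A bare date or a reply with extra words does not match the pattern.
rejected = []
for bad in ("1865-12-06", "The date is 1865-12-06T00:00:00.000000Z"):
    try:
        parse_datetime(bad)
    except ValueError as err:
        rejected.append(bad)
        print(f"rejected {bad!r}: {err}")
```

This is why a reply with extra information around the datetime is a problem: the string no longer matches the pattern, so parsing fails.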

---

# Methods to Fix Parsing Issues

## Auto-Fix Parser

In [81]:
# If we expect that the model may return misformatted data, we can use the OutputFixingParser
# Misformatted data is, for instance, an answer that does not follow the format instructions
# Example: '1865-12-06' instead of '1865-12-06T00:00:00.000000Z' in the case of datetimes
from langchain.output_parsers import OutputFixingParser

output_parser = DatetimeOutputParser()

misformatted = result.content # '1865-12-06T00:00:00.000000Z'
#misformatted = "the day is 1865-12-06T00:00:00.000000Z"

In [82]:
misformatted

'1865-12-06T00:00:00.000000Z'

In [83]:
# To create a new parser which will fix the misformatted data
# we need to take the previous parser and the same model we used
new_parser = OutputFixingParser.from_llm(parser=output_parser, llm=model)

In [84]:
# Then, we pass the potentially misformatted data to it
# If the data is already correctly formatted, it is simply parsed as usual
# NOTE: when a fix is needed, this incurs higher costs, because the model is called again
new_parser.parse(misformatted)

datetime.datetime(1865, 12, 6, 0, 0)
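
The general idea behind an auto-fix parser can be sketched as a try/repair/retry loop. Everything below is an illustration under assumptions: `parse_with_fix` and `fake_fixer` are hypothetical names, and `fake_fixer` stands in for the extra model call with a hard-coded repair; it is not how `OutputFixingParser` is implemented internally.

```python
from datetime import datetime
from typing import Callable, List

PATTERN = "%Y-%m-%dT%H:%M:%S.%fZ"
fixer_calls: List[str] = []  # records every input that needed a repair


def strict_parse(text: str) -> datetime:
    """Parse a string that must match PATTERN exactly."""
    return datetime.strptime(text.strip(), PATTERN)


def fake_fixer(bad_text: str, error: str) -> str:
    """Hypothetical stand-in for the LLM: pads a bare date to the full pattern."""
    fixer_calls.append(bad_text)
    return bad_text.strip() + "T00:00:00.000000Z"


def parse_with_fix(text: str, fixer: Callable[[str, str], str]) -> datetime:
    """Try the strict parser first; only on failure ask the fixer and retry once."""
    try:
        return strict_parse(text)
    except ValueError as err:
        return strict_parse(fixer(text, str(err)))


already_valid = parse_with_fix("1865-12-06T00:00:00.000000Z", fake_fixer)
repaired = parse_with_fix("1865-12-06", fake_fixer)
print(already_valid, repaired, len(fixer_calls))
```

In this sketch the fixer runs only for the misformatted input, which mirrors the cost note above: the extra call is paid only when the first parse fails.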

____
## Fixing via System Prompt

In [29]:
system_prompt = SystemMessagePromptTemplate.from_template("You always reply to questions only in datetime patterns.")
template_text = "{request}\n{format_instructions}"
human_prompt=HumanMessagePromptTemplate.from_template(template_text)

In [30]:
chat_prompt = ChatPromptTemplate.from_messages([system_prompt,human_prompt])

In [31]:
print(chat_prompt.format(request="When was the 13th Amendment ratified in the US?",
                   format_instructions=output_parser.get_format_instructions()
                   ))

System: You always reply to questions only in datetime patterns.
Human: When was the 13th Amendment ratified in the US?
Write a datetime string that matches the following pattern: '%Y-%m-%dT%H:%M:%S.%fZ'.

Examples: 0878-01-27T13:27:08.832493Z, 2028-04-22T18:12:45.217964Z, 0771-11-18T02:53:24.751667Z

Return ONLY this string, no other words!


In [32]:
request = chat_prompt.format_prompt(request="What date was the 13th Amendment ratified in the US?",
                   format_instructions=output_parser.get_format_instructions()
                   ).to_messages()

In [33]:
result = model.invoke(request,temperature=0)

In [34]:
result.content

'1865-12-06T00:00:00.000000Z'

In [35]:
output_parser.parse(result.content)

datetime.datetime(1865, 12, 6, 0, 0)

Be careful! The date returned by the model is debatable. The full details from Wikipedia:

    27th state to ratify was Georgia: December 6, 1865

    Having been ratified by the legislatures of three-fourths of the states (27 of the 36 states, including those that had been in rebellion), Secretary of State Seward, on December 18, 1865, certified that the Thirteenth Amendment had become valid, to all intents and purposes, as a part of the Constitution.

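There is also the issue of states that had left the Union, which complicates what full ratification meant at that time. The right answer depends on what is meant by the word "ratified": the date the 27th state ratified (December 6, 1865) or the date the amendment was certified (December 18, 1865).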

## Pydantic JSON Parser
OpenAI also provides its own way of getting JSON output through function calling (still quite new at the time of writing): https://platform.openai.com/docs/guides/gpt/function-calling


In [36]:
#pip install pydantic



In [85]:
from langchain.output_parsers import PydanticOutputParser

In [86]:
from pydantic import BaseModel, Field

In [87]:
# First, we define the Pydantic class
# We want to get objects that line up with these fields
class Scientist(BaseModel):
    name: str = Field(description="Name of a Scientist")
    discoveries: list = Field(description="Python list of discoveries")

In [88]:
# This is our NL query
query = 'Name a famous scientist and a list of their discoveries' 

In [89]:
# Pydantic parser
parser = PydanticOutputParser(pydantic_object=Scientist)

In [90]:
print(parser.get_format_instructions()) # The output should be formatted as a JSON instance...

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"name": {"description": "Name of a Scientist", "title": "Name", "type": "string"}, "discoveries": {"description": "Python list of discoveries", "items": {}, "title": "Discoveries", "type": "array"}}, "required": ["name", "discoveries"]}
```


In [92]:
# Prompt template
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

# Fill the template with the query
input_prompt = prompt.format_prompt(query=query)
# Run the model; lowering the temperature sometimes helps it follow the format
output = model.invoke(input_prompt.to_string())
# Parse the reply: a Scientist object is returned
parser.parse(output.content)

Scientist(name='Isaac Newton', discoveries=['Law of Universal Gravitation', 'Three Laws of Motion', 'Calculus'])
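
Conceptually, the final step turns a JSON string into a typed object and rejects replies that do not fit the schema. The sketch below imitates that with the standard library only; `ScientistRecord` and `parse_scientist` are hypothetical stand-ins, the JSON string is a made-up reply, and Pydantic's real validation is considerably richer.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class ScientistRecord:
    name: str
    discoveries: List[str]


def parse_scientist(raw: str) -> ScientistRecord:
    """Load JSON and check the two required fields before building the object."""
    data = json.loads(raw)  # raises a ValueError subclass on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = {"name", "discoveries"} - set(data)
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    if not isinstance(data["name"], str) or not isinstance(data["discoveries"], list):
        raise ValueError("wrong field types")
    return ScientistRecord(name=data["name"], discoveries=data["discoveries"])


# Made-up reply in the shape the format instructions ask for.
raw_reply = '{"name": "Isaac Newton", "discoveries": ["Three Laws of Motion", "Calculus"]}'
scientist = parse_scientist(raw_reply)
print(scientist)
```

A reply that echoes the schema instead of filling it in, such as `{"properties": {...}}`, fails the required-fields check, which is the mistake the format instructions explicitly warn the model about.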