# Parsing Output

Often we need the LLM output in a specific format, e.g. a datetime object or a JSON object.

LangChain makes it easy to convert LLM output into precise data types, or even into custom class instances using Pydantic.

In [2]:
import os

from langchain.prompts import PromptTemplate, SystemMessagePromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI

In [19]:
# Read the API key from the environment instead of hard-coding it
openai_api_key = os.getenv("OPENAI_API_KEY")
chat = ChatOpenAI(openai_api_key=openai_api_key)

Parsers consist of two key elements:

1) format_instructions: An extra string, supplied by the parser, that you add to the prompt to tell the model how to format its reply.
2) parse() method: A method that converts the model's string reply into the exact Python object needed (for example a list, a datetime, or a Pydantic model instance).
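
To make these two pieces concrete, here is a minimal sketch of a parser with the same shape, written in plain Python. `SimpleListParser` is a hypothetical class used only for illustration, not LangChain's implementation; it just mirrors the `get_format_instructions()` / `parse()` interface described above.

```python
class SimpleListParser:
    """Hypothetical stand-in that mimics the parser interface (not LangChain code)."""

    def get_format_instructions(self) -> str:
        # Text to include in the prompt so the model knows the expected format
        return "Your response should be a list of comma separated values, eg: `foo, bar, baz`"

    def parse(self, text: str) -> list[str]:
        # Split the raw reply on commas and strip surrounding whitespace
        return [item.strip() for item in text.split(",")]


parser_sketch = SimpleListParser()
parsed = parser_sketch.parse("one, two, three")
print(parsed)
```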

## List Parsing 

In [6]:
# Step 1: Import the parser and create the parser instance
from langchain.output_parsers import CommaSeparatedListOutputParser
output_parser = CommaSeparatedListOutputParser()

In [11]:
# Step 2: Format instructions
format_instructions = output_parser.get_format_instructions()
print(format_instructions)

# These instructions tell the model how its response should be formatted.

Your response should be a list of comma separated values, eg: `foo, bar, baz`


In [13]:
# Test the parser
reply = "one, two, three"
output_parser.parse(reply)

['one', 'two', 'three']

In [14]:
# The human template combines the request with the format_instructions at the end, separated by a space.
human_template = '{request} {format_instructions}'
human_prompt = HumanMessagePromptTemplate.from_template(human_template)

In [15]:
chat_prompt = ChatPromptTemplate.from_messages([human_prompt])

chat_prompt.format_prompt(request="give me 5 characteristics of dogs", format_instructions=output_parser.get_format_instructions())

ChatPromptValue(messages=[HumanMessage(content='give me 5 characteristics of dogs Your response should be a list of comma separated values, eg: `foo, bar, baz`')])
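
Conceptually, filling the template is plain string substitution. The sketch below reproduces the message content shown above using only `str.format`. It illustrates the idea and is not how LangChain builds prompts internally; the names `template_sketch` and `instructions_sketch` are made up for this example.

```python
# Same template and instructions as in the cells above, under hypothetical names
template_sketch = "{request} {format_instructions}"
instructions_sketch = "Your response should be a list of comma separated values, eg: `foo, bar, baz`"

# Substitute both placeholders to get the final message text
content = template_sketch.format(
    request="give me 5 characteristics of dogs",
    format_instructions=instructions_sketch,
)
print(content)
```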

In [16]:
request = chat_prompt.format_prompt(request="give me 5 characteristics of dogs", format_instructions=output_parser.get_format_instructions()).to_messages()

In [20]:
result = chat(request)

In [21]:
result.content

'Loyal, playful, protective, social, intelligent'

In [22]:
# Convert to desired output:
output_parser.parse(result.content)

['Loyal', 'playful', 'protective', 'social', 'intelligent']

## Datetime Parser 

In [23]:
from langchain.output_parsers import DatetimeOutputParser

In [24]:
output_parser = DatetimeOutputParser()

In [25]:
print(output_parser.get_format_instructions())

Write a datetime string that matches the following pattern: '%Y-%m-%dT%H:%M:%S.%fZ'.

Examples: 0343-02-24T16:31:50.909073Z, 0254-02-18T22:20:23.816815Z, 0712-09-22T16:05:33.529778Z

Return ONLY this string, no other words!


In [26]:
template_text = "{request}\n{format_instructions}"
human_prompt=HumanMessagePromptTemplate.from_template(template_text)

In [27]:
chat_prompt = ChatPromptTemplate.from_messages([human_prompt])

In [28]:
print(chat_prompt.format(request="When was the 13th Amendment ratified in the US?",
                   format_instructions=output_parser.get_format_instructions()
                   ))

Human: When was the 13th Amendment ratified in the US?
Write a datetime string that matches the following pattern: '%Y-%m-%dT%H:%M:%S.%fZ'.

Examples: 1665-08-11T06:16:51.355942Z, 1767-11-20T15:39:12.343483Z, 0068-08-04T05:09:05.387194Z

Return ONLY this string, no other words!


In [29]:
request = chat_prompt.format_prompt(request="What date was the 13th Amendment ratified in the US?",
                   format_instructions=output_parser.get_format_instructions()
                   ).to_messages()

In [34]:
result = chat(request,temperature=0)

In [37]:
result.content

'1865-12-06T00:00:00.000000Z'

In [38]:
output_parser.parse(result.content)

datetime.datetime(1865, 12, 6, 0, 0)
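
The pattern in the format instructions is a standard `strptime` format string, so the same conversion can be sketched with the standard library alone. This only shows what the parsing step amounts to; it assumes nothing about LangChain's internals beyond the pattern printed above.

```python
from datetime import datetime

PATTERN = "%Y-%m-%dT%H:%M:%S.%fZ"  # same pattern as in the format instructions

reply_text = "1865-12-06T00:00:00.000000Z"
parsed_dt = datetime.strptime(reply_text.strip(), PATTERN)
print(repr(parsed_dt))

# Any extra text around the timestamp makes strptime raise ValueError
try:
    datetime.strptime("The date is 1865-12-06T00:00:00.000000Z", PATTERN)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

This strictness is why the prompt insists on returning only the datetime string.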

---

# Methods to Fix Parsing Issues

## Auto-Fix Parser

Sometimes the LLM response is not in the expected format. For example, it could have been:

The 13th Amendment was ratified in the US on December 6, 1865.

The datetime string that matches the given pattern is: "1865-12-06T00:00:00.000000Z"

In this case, we would need to fix the parsing issues.

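An OutputFixingParser wraps another parser. When parsing fails, it sends the misformatted reply, together with the format instructions, to an LLM and asks it to fix the output, then parses the corrected reply.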
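
The try, repair, and re-parse flow can be sketched without any LLM call. In the example below, `fake_fixer` is a hypothetical stand-in for the model (it just extracts the timestamp with a regular expression), and `parse_with_fix` is an illustrative helper, not part of LangChain.

```python
import re
from datetime import datetime
from typing import Callable

PATTERN = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_datetime(text: str) -> datetime:
    # Strict parse: raises ValueError if the text is not exactly a timestamp
    return datetime.strptime(text.strip(), PATTERN)


def fake_fixer(bad_reply: str, error: str) -> str:
    # Hypothetical stand-in for the LLM call. A real fixer would send the bad
    # reply, the format instructions, and the error message to a model.
    match = re.search(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z", bad_reply)
    return match.group(0) if match else bad_reply


def parse_with_fix(text: str, fixer: Callable[[str, str], str]) -> datetime:
    try:
        return parse_datetime(text)
    except ValueError as err:
        # One repair attempt, then parse again (the error propagates if it still fails)
        return parse_datetime(fixer(text, str(err)))


misformatted_reply = (
    "The 13th Amendment was ratified in the US on December 6, 1865. "
    'The datetime string that matches the given pattern is: "1865-12-06T00:00:00.000000Z"'
)
fixed = parse_with_fix(misformatted_reply, fake_fixer)
print(fixed)
```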

In [39]:
from langchain.output_parsers import OutputFixingParser

output_parser = DatetimeOutputParser()

misformatted = 'The 13th Amendment was ratified in the US on December 6, 1865. The datetime string that matches the given pattern is: "1865-12-06T00:00:00.000000Z"'

In [45]:
new_parser = OutputFixingParser.from_llm(parser=output_parser, llm=chat)

In [46]:
new_parser.parse(misformatted)

datetime.datetime(1865, 12, 6, 0, 0)

---
## Fixing via a Strong System Prompt

In [48]:
human_template = "{request}\n{format_instructions}"
human_prompt = HumanMessagePromptTemplate.from_template(human_template)

system_template = "You always reply to questions only in datetime pattern and no more text."
# Use the system message template so this is sent as a system message, not a human one
system_prompt = SystemMessagePromptTemplate.from_template(system_template)

chat_prompt = ChatPromptTemplate.from_messages([system_prompt, human_prompt])

In [49]:
model_request = chat_prompt.format_prompt(
    request="What date was the 13th Amendment ratified in the US?",
    format_instructions=output_parser.get_format_instructions(),
).to_messages()

In [50]:
result = chat(model_request, temperature=0)
print(result.content)

1865-12-06T00:00:00.000000Z


In [51]:
output_parser.parse(result.content)

datetime.datetime(1865, 12, 6, 0, 0)

## Pydantic JSON Parser

In [77]:
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field

In [59]:
class Scientist(BaseModel):
    name: str = Field(description="Name of a Scientist")
    discoveries: list[str] = Field(description="Python list of discoveries")

In [None]:
query = 'Name a famous scientist and a list of their discoveries' 

In [None]:
parser = PydanticOutputParser(pydantic_object=Scientist)
print(parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"name": {"title": "Name", "description": "Name of a Scientist", "type": "string"}, "discoveries": {"title": "Discoveries", "description": "Python list of discoveries", "type": "array", "items": {}}}, "required": ["name", "discoveries"]}
```


In [75]:
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

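_input = prompt.format_prompt(query="Tell me about a famous scientist")

# `model` was never defined above; this assumes the completion-style OpenAI LLM imported at the top
model = OpenAI(openai_api_key=openai_api_key, temperature=0)
output = model(_input.to_string())

parser.parse(output)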

Scientist(name='Albert Einstein', discoveries=['Theory of Relativity', 'Photoelectric Effect', 'Brownian Motion'])
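
Under the hood, this kind of parser has to decode the JSON reply and check it against the declared fields. The sketch below shows that idea with the standard library only. `ScientistSketch` and `parse_scientist` are hypothetical names, a dataclass stands in for the Pydantic model, and `raw_reply` is a made-up model reply, so this is an approximation of the technique rather than what `PydanticOutputParser` actually runs.

```python
import json
from dataclasses import dataclass


@dataclass
class ScientistSketch:
    name: str
    discoveries: list[str]


def parse_scientist(raw: str) -> ScientistSketch:
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    # Minimal manual validation; Pydantic performs this kind of check for you
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("name"), str):
        raise ValueError("'name' must be a string")
    discoveries = data.get("discoveries")
    if not isinstance(discoveries, list) or not all(isinstance(d, str) for d in discoveries):
        raise ValueError("'discoveries' must be a list of strings")
    return ScientistSketch(name=data["name"], discoveries=discoveries)


# Hypothetical model reply used only for this illustration
raw_reply = '{"name": "Albert Einstein", "discoveries": ["Theory of Relativity", "Photoelectric Effect", "Brownian Motion"]}'
scientist = parse_scientist(raw_reply)
print(scientist)
```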