# Output Parsing
Output parsers are classes that turn a language model's raw text response into structured data. An output parser must implement two main methods:

- **Get format instructions**: A method that returns a string containing instructions for how the output of a language model should be formatted.
- **Parse**: A method that takes in a string (assumed to be the response from a language model) and parses it into some structure.
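
To make the two methods concrete, here is a minimal sketch written in plain Python. The class name `YesNoParser` and its behaviour are hypothetical and chosen only for illustration; it does not subclass any LangChain base class, it just mirrors the two-method shape described above.

```python
class YesNoParser:
    """Hypothetical parser that turns a yes/no reply into a bool."""

    def get_format_instructions(self) -> str:
        # Text meant to be placed in the prompt so the model knows the expected shape.
        return "Answer with a single word: YES or NO."

    def parse(self, text: str) -> bool:
        # Normalise the raw model reply before interpreting it.
        cleaned = text.strip().strip(".!").upper()
        if cleaned == "YES":
            return True
        if cleaned == "NO":
            return False
        raise ValueError(f"Expected YES or NO, got: {text!r}")


parser_sketch = YesNoParser()
print(parser_sketch.get_format_instructions())
print(parser_sketch.parse(" yes. "))
```

The parsers in the rest of this notebook follow the same pattern: the instructions go into the prompt, and `parse` runs on whatever the model returns.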

## Pydantic Output Parser

In [17]:
import os
from dotenv import load_dotenv

load_dotenv()
os.environ["LANGSMITH_ENDPOINT"]

'https://api.smith.langchain.com'

In [18]:
from langchain_ollama import ChatOllama
from langchain_core.prompts import (SystemMessagePromptTemplate, 
                                    HumanMessagePromptTemplate, 
                                    ChatPromptTemplate,
                                    PromptTemplate)

base_url = "http://localhost:11434/"
model_name = "llama3.2"

llm = ChatOllama(
    base_url = base_url,
    model = model_name
)

In [19]:
from typing import Optional
from pydantic import BaseModel, Field
from langchain_core.output_parsers import PydanticOutputParser

In [20]:
class Joke(BaseModel):
  """Joke class to tell to the user."""
  
  setup: str = Field(description="The setup of the joke.")
  punchline: str = Field(description="The punchline of the joke.")
  # Optional[int] lets the value be null; with no default, the field is still required.
  rating: Optional[int] = Field(description="The rating of the joke in the range of 1 to 10.")

In [21]:
parser = PydanticOutputParser(pydantic_object=Joke)
parser

PydanticOutputParser(pydantic_object=<class '__main__.Joke'>)

In [22]:
instruction = parser.get_format_instructions()
print(instruction)

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"description": "Joke class to tell to the user.", "properties": {"setup": {"description": "The setup of the joke.", "title": "Setup", "type": "string"}, "punchline": {"description": "The punchline of the joke.", "title": "Punchline", "type": "string"}, "rating": {"anyOf": [{"type": "integer"}, {"type": "null"}], "description": "The rating of the joke in the range of 1 to 10.", "title": "Rating"}}, "required": ["setup", "punchline", "rating"]}
```


In [23]:
prompt = PromptTemplate(
  template="""
  Answer the user query with a joke with this formatting instruction.
  {format_instruction}
  
  Query: {query}
  Answer:""",
  input_variables=['query'],
  partial_variables={'format_instruction': parser.get_format_instructions()}
)

In [24]:
chain = prompt | llm
chain

PromptTemplate(input_variables=['query'], input_types={}, partial_variables={'format_instruction': 'The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"description": "Joke class to tell to the user.", "properties": {"setup": {"description": "The setup of the joke.", "title": "Setup", "type": "string"}, "punchline": {"description": "The punchline of the joke.", "title": "Punchline", "type": "string"}, "rating": {"anyOf": [{"type": "integer"}, {"type": "null"}], "description": "The rating of the joke in the range of 1 to 10.", "title": "Rating"}}, "required": ["setup", "punchline", "rati

In [25]:
response = chain.invoke({"query": "Tell me a joke about a cat."})
print(response.content)

{"setup": "Why did the cat join a band?", "punchline": "Because it wanted to be the purr-cussionist!", "rating": null}


In [26]:
chain = prompt | llm | parser
response = chain.invoke({"query": "Tell me a joke about a cat."})
print(response)

setup='Why did the cat join a band?' punchline='Because it wanted to be the purr-cussionist!' rating=None


## Parsing with `.with_structured_output()`
- This method takes a schema that specifies the names, types, and descriptions of the desired output attributes.
- The schema can be specified as a TypedDict class, a JSON Schema, or a Pydantic class.
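
As a sketch of the TypedDict option, the joke schema could be written as shown here. `JokeDict` and `build_structured_llm` are hypothetical names, and the `Annotated[type, default, description]` form follows the pattern used in LangChain's structured-output documentation. The helper is only defined, not called, because it needs a running chat model such as the `llm` created earlier. With a TypedDict schema the result is expected to be a plain `dict` rather than a Pydantic object.

```python
from typing import Optional

try:
    # Pydantic asks for the typing_extensions version on Python < 3.12.
    from typing_extensions import Annotated, TypedDict
except ImportError:
    from typing import Annotated, TypedDict


class JokeDict(TypedDict):
    """Joke to tell to the user."""

    # Annotated[type, default, description]; "..." marks the field as required.
    setup: Annotated[str, ..., "The setup of the joke."]
    punchline: Annotated[str, ..., "The punchline of the joke."]
    rating: Annotated[Optional[int], None, "The rating of the joke in the range of 1 to 10."]


def build_structured_llm(chat_model):
    # Assumes chat_model is a LangChain chat model that supports structured output.
    return chat_model.with_structured_output(JokeDict)


print(list(JokeDict.__annotations__))
```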

In [27]:
response = llm.invoke("Tell me a joke about a cat.")
print(response.content)

Why did the cat join a band?

Because it wanted to be the purr-cussionist! (get it?)


In [28]:
structured_llm = llm.with_structured_output(Joke)
structured_llm

RunnableBinding(bound=ChatOllama(model='llama3.2', base_url='http://localhost:11434/'), kwargs={'tools': [{'type': 'function', 'function': {'name': 'Joke', 'description': 'Joke class to tell to the user.', 'parameters': {'properties': {'setup': {'description': 'The setup of the joke.', 'type': 'string'}, 'punchline': {'description': 'The punchline of the joke.', 'type': 'string'}, 'rating': {'anyOf': [{'type': 'integer'}, {'type': 'null'}], 'description': 'The rating of the joke in the range of 1 to 10.'}}, 'required': ['setup', 'punchline', 'rating'], 'type': 'object'}}}], 'tool_choice': 'any'}, config={}, config_factories=[])
| PydanticToolsParser(first_tool_only=True, tools=[<class '__main__.Joke'>])

In [29]:
response = structured_llm.invoke("Tell me a joke about a cat.")
print(response)

setup='Why did the cat join a band?' punchline='Because it wanted to be the purr-cussionist.' rating=8


## JSON Output Parser

In [30]:
from langchain_core.output_parsers import JsonOutputParser

In [33]:
json_parser = JsonOutputParser(pydantic_object=Joke)
print(json_parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"description": "Joke class to tell to the user.", "properties": {"setup": {"description": "The setup of the joke.", "title": "Setup", "type": "string"}, "punchline": {"description": "The punchline of the joke.", "title": "Punchline", "type": "string"}, "rating": {"anyOf": [{"type": "integer"}, {"type": "null"}], "description": "The rating of the joke in the range of 1 to 10.", "title": "Rating"}}, "required": ["setup", "punchline", "rating"]}
```


In [34]:
prompt = PromptTemplate(
  template="""
  Answer the user query with a joke with this formatting instruction.
  {format_instruction}
  
  Query: {query}
  Answer:""",
  input_variables=['query'],
  partial_variables={'format_instruction': json_parser.get_format_instructions()}
)

In [35]:
chain = prompt | llm | json_parser
response = chain.invoke({"query": "Tell me a joke about a cat."})
print(response)

{'setup': 'Why did the cat join a band?', 'punchline': 'Because it wanted to be the purr-cussionist!', 'rating': 8}
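
Note that the result here is a plain `dict` rather than a `Joke` instance. Models also sometimes wrap their JSON in a Markdown code fence. The following is a rough, standard-library-only sketch of how such a reply could be unwrapped and loaded; `extract_json` is a hypothetical helper and is not LangChain's implementation.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example easy to embed

# Matches an optional "json" language tag and captures the fenced payload.
FENCED_JSON = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def extract_json(text: str) -> dict:
    """Hypothetical helper: load JSON from a reply, with or without a code fence."""
    match = FENCED_JSON.search(text)
    payload = match.group(1) if match else text
    return json.loads(payload.strip())


reply = FENCE + 'json\n{"setup": "Why did the cat join a band?", "rating": 8}\n' + FENCE
print(extract_json(reply))
```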


## CSV Output Parser

In [36]:
from langchain_core.output_parsers import CommaSeparatedListOutputParser

In [38]:
csv_parser = CommaSeparatedListOutputParser()
print(csv_parser.get_format_instructions())

Your response should be a list of comma separated values, eg: `foo, bar, baz` or `foo,bar,baz`


In [39]:
prompt = PromptTemplate(
  template="""
  Answer the user query with a list of values. Here is the formatting instruction.
  {format_instruction}
  
  Query: {query}
  Answer:""",
  input_variables=['query'],
  partial_variables={'format_instruction': csv_parser.get_format_instructions()}
)

In [40]:
chain = prompt | llm | csv_parser
response = chain.invoke({"query": "Generate my website SEO keywords. I have content about NLP and LLM."})
print(response)

['natural language processing', 'large language models', 'machine learning', 'artificial intelligence', 'language understanding', 'text analysis', 'deep learning', 'neural networks', 'semantic search', 'intent identification']
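
Conceptually, the parsing step here amounts to splitting the reply on commas and trimming whitespace. A minimal standard-library sketch of that idea follows; the function name `split_csv_reply` is hypothetical, and LangChain's actual parser may handle quoting and other edge cases differently.

```python
def split_csv_reply(text: str) -> list[str]:
    # Split on commas, trim surrounding whitespace, and drop empty entries.
    return [item.strip() for item in text.split(",") if item.strip()]


print(split_csv_reply("foo, bar,baz"))  # ['foo', 'bar', 'baz']
```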


## Datetime Output Parser

In [42]:
from langchain.output_parsers import DatetimeOutputParser

In [43]:
date_parser = DatetimeOutputParser()
print(date_parser.get_format_instructions())

Write a datetime string that matches the following pattern: '%Y-%m-%dT%H:%M:%S.%fZ'.

Examples: 1523-04-22T01:52:54.564926Z, 1226-01-22T04:48:49.224632Z, 0781-11-09T04:28:47.769173Z

Return ONLY this string, no other words!


In [46]:
prompt = PromptTemplate(
  template='''
  Answer the user query with a datetime format. Here is your formatting instruction.
  {format_instruction}

  Query: {query}
  Answer:''',
  input_variables=['query'],
  partial_variables={'format_instruction': date_parser.get_format_instructions()}
)   

In [47]:
chain = prompt | llm | date_parser
response = chain.invoke({"query": "When was America discovered?"})
print(response)

1492-10-12 05:00:00
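
The pattern in the format instructions is a standard `strptime` format string, so a conforming reply can be checked with the standard library alone. A small sketch, using one of the example strings from the instructions printed earlier (the variable names are illustrative):

```python
from datetime import datetime

PATTERN = "%Y-%m-%dT%H:%M:%S.%fZ"  # same pattern as in the format instructions

# The trailing "Z" is matched literally, so the result is a naive datetime.
parsed = datetime.strptime("1523-04-22T01:52:54.564926Z", PATTERN)
print(parsed)

# A reply that does not match the pattern raises ValueError.
try:
    datetime.strptime("October 12, 1492", PATTERN)
except ValueError as exc:
    print(f"Could not parse: {exc}")
```

This is why the instructions tell the model to return only the datetime string: any extra words would make the reply fail to match the pattern.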
