## Output Parsing

Language models output text, but sometimes you want more structured information back than plain text.

Output parsers are classes that help structure language model responses. There are two main methods an output parser must implement:

- **Get format instructions**: A method which returns a string containing instructions for how the output of a language model should be formatted.
- **Parse**: A method which takes in a string (assumed to be the response from a language model) and parses it into some structure.
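
To make the two-method contract concrete, here is a minimal standard-library sketch of a parser with the same shape. The class name `YesNoOutputParser` and its behavior are hypothetical and not part of LangChain; real parsers subclass LangChain's own base classes.

```python
class YesNoOutputParser:
    """Hypothetical parser: turns a yes/no reply into a bool."""

    def get_format_instructions(self) -> str:
        # Text to embed in the prompt so the model knows the expected shape.
        return "Answer with a single word: YES or NO."

    def parse(self, text: str) -> bool:
        # Normalize the raw model response, then map it onto a Python value.
        cleaned = text.strip().strip(".!").upper()
        if cleaned == "YES":
            return True
        if cleaned == "NO":
            return False
        raise ValueError(f"Expected YES or NO, got: {text!r}")


parser = YesNoOutputParser()
print(parser.get_format_instructions())
print(parser.parse(" yes. "))
```

The format instructions go into the prompt, and `parse` is applied to whatever text the model sends back.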

- Output Parsing
    - StrOutputParser
    - JsonOutputParser
    - CSV Output Parser
    - Datetime Output Parser
    - Structured Output Parser (Pydantic or JSON)


### `Pydantic` Output Parser

In [1]:
from dotenv import load_dotenv

load_dotenv('../env')

# langfuse or opik

True

In [2]:
from langchain_ollama import ChatOllama
from langchain_core.prompts import (
                                        SystemMessagePromptTemplate,
                                        HumanMessagePromptTemplate,
                                        ChatPromptTemplate,
                                        PromptTemplate
                                        )

base_url = "http://localhost:11434"
model = 'deepseek-r1:1.5b'

llm = ChatOllama(base_url=base_url, model=model)

In [3]:
from typing import Optional
from pydantic import BaseModel, Field
from langchain_core.output_parsers import PydanticOutputParser


In [4]:
class Joke(BaseModel):
    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline of the joke")
    rating: Optional[int] = Field(description="The rating of the joke is from 1 to 10", default=None)

In [5]:
parser = PydanticOutputParser(pydantic_object=Joke)

In [6]:
instruction = parser.get_format_instructions()

In [7]:
print(instruction)

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"setup": {"description": "The setup of the joke", "title": "Setup", "type": "string"}, "punchline": {"description": "The punchline of the joke", "title": "Punchline", "type": "string"}, "rating": {"anyOf": [{"type": "integer"}, {"type": "null"}], "default": null, "description": "The rating of the joke is from 1 to 10", "title": "Rating"}}, "required": ["setup", "punchline"]}
```


In [8]:
prompt = PromptTemplate(
    template='''You are an expert joke teller. Create a joke about dogs and respond STRICTLY in the following JSON format:

{format_instruction}

Ensure your response is a valid JSON object that can be directly parsed. Do not include any additional text or explanation.

Query: {query}
Joke:''',
    input_variables=['query'],
    partial_variables={'format_instruction': parser.get_format_instructions()}
)


chain = prompt | llm
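
The prompt above fixes `format_instruction` up front through `partial_variables` and leaves only `query` to be supplied at call time. The same idea can be sketched with plain string formatting. This is an illustration using the standard library, not LangChain's implementation, and the template text and the name `fill_query` are made up for the example.

```python
from functools import partial

# Hypothetical template with the same two placeholders as the prompt above.
template = "Instructions: {format_instruction}\nQuery: {query}\nJoke:"

# Pre-fill the placeholder that never changes between calls.
fill_query = partial(template.format, format_instruction="Reply with JSON only.")

# At call time only the remaining variable has to be supplied.
rendered = fill_query(query="Tell me a programming joke")
print(rendered)
```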

In [9]:
print(prompt)

input_variables=['query'] input_types={} partial_variables={'format_instruction': 'The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"setup": {"description": "The setup of the joke", "title": "Setup", "type": "string"}, "punchline": {"description": "The punchline of the joke", "title": "Punchline", "type": "string"}, "rating": {"anyOf": [{"type": "integer"}, {"type": "null"}], "default": null, "description": "The rating of the joke is from 1 to 10", "title": "Rating"}}, "required": ["setup", "punchline"]}\n```'} template='You are an expert joke teller. Create a joke abo

In [10]:
output = chain.invoke({'query': 'Tell me a programming joke'})

In [11]:
print(output.content)

<think>
Alright, I need to create a programming joke based on the user's request. Let me start by understanding what they're asking for. The user provided an example JSON response and specified the exact format they want in their output. So, my task is to follow that structure closely.

First, I'll look at the setup section. It should describe the joke clearly. A common programming joke involves a function with unrelated code but a humorous twist. I'm thinking of something like adding 1 to all elements in an array or multiplying by zero. Those are simple enough and easy to explain.

Next is the punchline. This needs to be concise and encapsulate the humor. It should tie back to the setup in a way that the audience can understand why it's funny. For example, after the code, you could say, "Did you notice that adding 1 to every element of an array just made everything multiply by zero?"

Now, I'll add the rating section. Since this joke is pretty light-hearted, I should give it a low rat
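
The raw response above opens with a `<think>` reasoning block, so it is not a bare JSON object. As a rough illustration of the cleanup needed before such text can be loaded as JSON, the sketch below strips the reasoning block and parses what remains. It uses only the standard library, the function name `extract_json_after_reasoning` is hypothetical, and it does not describe how LangChain's parsers work internally.

```python
import json
import re


def extract_json_after_reasoning(text: str) -> dict:
    """Drop <think>...</think> blocks, then parse the first JSON object found."""
    # Remove the reasoning section; DOTALL lets '.' span newlines.
    without_reasoning = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    # Take everything from the first '{' to the last '}' as the JSON candidate.
    start = without_reasoning.find("{")
    end = without_reasoning.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in model output")
    return json.loads(without_reasoning[start : end + 1])


# Made-up response in the same shape as the output above.
raw = '<think>\nPlan the joke first.\n</think>\n{"setup": "s", "punchline": "p", "rating": 3}'
print(extract_json_after_reasoning(raw))
```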

In [12]:
prompt = PromptTemplate(
    template='''You are an expert joke teller. Create a joke about dogs and respond STRICTLY in the following JSON format:

{format_instruction}

Ensure your response is a valid JSON object that can be directly parsed. Do not include any additional text or explanation.

Query: {query}
Joke:''',
    input_variables=['query'],
    partial_variables={'format_instruction': parser.get_format_instructions()}
)


In [13]:
chain = prompt | llm | parser
output = chain.invoke({'query': 'Tell me a programming joke'})
print(output)

setup='Why did the dog eat all the apples?' punchline='Because it was data-ing them!' rating=8


### Parsing with `.with_structured_output()` method
- This method takes a schema as input, which specifies the names, types, and descriptions of the desired output attributes.
- The schema can be specified as a TypedDict class, a JSON Schema, or a Pydantic class.
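
For comparison with the Pydantic `Joke` class, here is a sketch of the same schema in the other two forms. The names `JokeDict` and `joke_json_schema` are hypothetical, and the `title` and `description` values are assumptions chosen for the example. LangChain's documentation imports `TypedDict` from `typing_extensions`; the standard-library import is used here only to keep the sketch dependency-free.

```python
from typing import Optional, TypedDict


class JokeDict(TypedDict):
    """Joke to tell the user."""

    setup: str
    punchline: str
    rating: Optional[int]


joke_json_schema = {
    "title": "Joke",
    "description": "Joke to tell the user.",
    "type": "object",
    "properties": {
        "setup": {"type": "string", "description": "The setup of the joke"},
        "punchline": {"type": "string", "description": "The punchline of the joke"},
        "rating": {"type": "integer", "description": "The rating of the joke is from 1 to 10"},
    },
    "required": ["setup", "punchline"],
}

# Either object would be passed in place of the Pydantic class, for example:
# structured_llm = llm.with_structured_output(JokeDict)
# structured_llm = llm.with_structured_output(joke_json_schema)
print(sorted(JokeDict.__annotations__))
```

With a Pydantic class the result is a model instance, as in the `Joke` output in this section; with the other two forms, expect a plain dictionary.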


In [17]:
# registry.ollama.ai/library/deepseek-r1:1.5b does not support tools, so switch to llama3.2:3b
model = 'llama3.2:3b'

llm = ChatOllama(base_url=base_url, model=model)
output = llm.invoke('Tell me a joke about the cat')
print(output.content)

Why did the cat join a band?

Because it wanted to be the purr-cussionist.


In [18]:
structured_llm = llm.with_structured_output(Joke)

In [19]:
output = structured_llm.invoke('Tell me a joke about the cat')
print(output)

setup='Because it wanted to be the purr-cussionist!' punchline='Why did the cat join a band?' rating=8


### `JSON` Output Parser

- Output parsers accept a string or BaseMessage as input and can return an arbitrary type.



In [20]:
from langchain_core.output_parsers import JsonOutputParser

In [21]:
parser = JsonOutputParser(pydantic_object=Joke)
print(parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"setup": {"description": "The setup of the joke", "title": "Setup", "type": "string"}, "punchline": {"description": "The punchline of the joke", "title": "Punchline", "type": "string"}, "rating": {"anyOf": [{"type": "integer"}, {"type": "null"}], "default": null, "description": "The rating of the joke is from 1 to 10", "title": "Rating"}}, "required": ["setup", "punchline"]}
```


In [22]:
prompt = PromptTemplate(
    template='''You are an expert joke teller. Create a joke about dogs and respond STRICTLY in the following JSON format:

{format_instruction}

Ensure your response is a valid JSON object that can be directly parsed. Do not include any additional text or explanation.

Query: {query}
Joke:''',
    input_variables=['query'],
    partial_variables={'format_instruction': parser.get_format_instructions()}
)

chain = prompt | llm
output = chain.invoke({'query': 'Tell me a programming joke'})
print(output.content)

{"properties": {"setup": "Why do programmers prefer dark mode?", "punchline": "Because light attracts bugs.", "rating": null}}


In [23]:
chain = prompt | llm | parser
output = chain.invoke({'query': 'Tell me a programming joke'})
print(output)

{'properties': {'setup': {'description': 'A programmer walks into a bar', 'title': 'Why did the programmer walk into a bar?', 'type': 'string'}, 'punchline': {'description': 'Because he heard it had functioning toilets', 'title': 'He just wanted to find a paws-itive review', 'type': 'string'}, 'rating': {}}}
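
The parsed dictionary above is wrapped in a `properties` key, which suggests the model echoed the schema instead of producing an instance of it. The JSON parser returns a dictionary without checking it against the `Joke` fields, so a small post-check can catch this case. The sketch below is standard-library only; `REQUIRED_KEYS` and `check_joke_dict` are hypothetical names.

```python
REQUIRED_KEYS = ("setup", "punchline")  # taken from the "required" list in the schema


def check_joke_dict(data: dict) -> dict:
    """Return data if it has the required string fields, else raise ValueError."""
    missing = [key for key in REQUIRED_KEYS if not isinstance(data.get(key), str)]
    if missing:
        raise ValueError(f"Missing or non-string fields: {missing}")
    return data


good = {
    "setup": "Why do programmers prefer dark mode?",
    "punchline": "Because light attracts bugs.",
    "rating": None,
}
print(check_joke_dict(good)["punchline"])

# A schema-shaped dict like the one printed above is rejected.
bad = {"properties": {"setup": {"type": "string"}}}
try:
    check_joke_dict(bad)
except ValueError as err:
    print(err)
```

When strict validation matters, the Pydantic parser shown earlier is the more direct choice, since it validates the fields itself.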


### CSV Output Parser

- This output parser can be used when you want to return a list of comma-separated items.



In [24]:
# value1, value2, value3, and so on

from langchain_core.output_parsers import CommaSeparatedListOutputParser

parser = CommaSeparatedListOutputParser()

print(parser.get_format_instructions())

Your response should be a list of comma separated values, eg: `foo, bar, baz` or `foo,bar,baz`


In [25]:
format_instruction = parser.get_format_instructions()

prompt = PromptTemplate(
    template='''
    Answer the user query with a list of values. Here is your formatting instruction.
    {format_instruction}

    Query: {query}
    Answer:''',
    input_variables=['query'],
    partial_variables={'format_instruction': format_instruction}
)

In [28]:
chain = prompt | llm | parser

output = chain.invoke({'query': 'list the tools similar to langchain'})
print(output)

['Hugging Face', 'Meta AI', 'Google Cloud AI Platform', 'Microsoft Azure Cognitive Services', 'Amazon SageMaker']
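
The list above is the result of splitting the model's comma-separated reply into items. The sketch below shows one way to do that with the standard `csv` module, so that quoted items containing commas stay intact. The function name `parse_comma_separated` is hypothetical, and the code illustrates the idea rather than reproducing LangChain's parser.

```python
import csv
from io import StringIO


def parse_comma_separated(text: str) -> list:
    """Split a comma-separated reply into a list of stripped items."""
    # skipinitialspace ignores the blank that usually follows each comma.
    reader = csv.reader(StringIO(text), delimiter=",", quotechar='"', skipinitialspace=True)
    return [item.strip() for row in reader for item in row]


print(parse_comma_separated("foo, bar, baz"))
print(parse_comma_separated('"Hello, world", second item'))
```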


### Datetime Output Parser

- Parses the LLM output into a Python `datetime` object. The parser raises an error when the LLM output does not match the expected datetime format.

In [29]:
from langchain.output_parsers import DatetimeOutputParser

In [30]:
parser = DatetimeOutputParser()

format_instruction = parser.get_format_instructions()
print(format_instruction)

Write a datetime string that matches the following pattern: '%Y-%m-%dT%H:%M:%S.%fZ'.

Examples: 1634-02-08T02:44:16.703278Z, 1860-07-04T19:31:19.422386Z, 269-04-13T00:44:57.323073Z

Return ONLY this string, no other words!


In [31]:
prompt = PromptTemplate(
    template='''
    Answer the user query with a datetime. Here is your formatting instruction.
    {format_instruction}

    Query: {query}
    Answer:''',
    input_variables=['query'],
    partial_variables={'format_instruction': format_instruction}
)

In [49]:
chain = prompt | llm

In [50]:
output = chain.invoke({'query': "What are the key dates in LangChain's history"})

print(output.content)

2022-02-08T17:41:52.121Z


In [51]:
chain = prompt | llm | parser

In [52]:
output = chain.invoke({'query': "What are the key dates in LangChain's history"})

print(output)

2021-02-22 10:24:01.392395
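
The conversion from the formatted string to a `datetime` object, and the failure on malformed output, can be reproduced with `datetime.strptime` and the pattern from the format instructions. This is a standard-library sketch; the names `PATTERN` and `parse_datetime` are hypothetical, and the second input is a made-up reply that does not follow the pattern.

```python
from datetime import datetime

PATTERN = "%Y-%m-%dT%H:%M:%S.%fZ"  # the pattern from the format instructions


def parse_datetime(text: str) -> datetime:
    """Parse model output against PATTERN; strptime raises ValueError on mismatch."""
    return datetime.strptime(text.strip(), PATTERN)


print(parse_datetime("2022-02-08T17:41:52.121Z"))

try:
    parse_datetime("I am not sure about the exact date.")
except ValueError as err:
    print(f"Could not parse: {err}")
```

Wrapping the chain call in a similar `try`/`except` is one way to handle replies that ignore the format instructions.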
