## Output Parsing

Language models output text, but sometimes you want more structured information back than plain text.

Output parsers are classes that help structure language model responses. There are two main methods an output parser must implement:

- **Get format instructions**: A method which returns a string containing instructions for how the output of a language model should be formatted.
- **Parse**: A method which takes in a string (assumed to be the response from a language model) and parses it into some structure.
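To make the two-method contract concrete, here is a minimal sketch in plain Python. `KeyValueParser` is a hypothetical class invented for illustration; it is not part of LangChain and does not inherit from any LangChain base class. It only mirrors the idea of "tell the model how to format, then parse what comes back".

```python
class KeyValueParser:
    """Hypothetical parser (not a LangChain class) for 'key: value' lines."""

    def get_format_instructions(self) -> str:
        # Text that would be placed in the prompt to steer the model's output.
        return "Respond with one 'key: value' pair per line and nothing else."

    def parse(self, text: str) -> dict:
        # Turn the raw model response into a dictionary.
        result = {}
        for line in text.strip().splitlines():
            if not line.strip():
                continue  # ignore blank lines
            if ":" not in line:
                raise ValueError(f"Could not parse line: {line!r}")
            key, value = line.split(":", 1)
            result[key.strip()] = value.strip()
        return result


parser = KeyValueParser()
print(parser.get_format_instructions())
print(parser.parse("setup: Why did the cat sit on the laptop?\npunchline: To watch the mouse."))
```

The built-in parsers used in the rest of this notebook follow the same pattern, with more robust parsing and schema-aware format instructions.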

- Output Parsing
    - StrOutputParser
    - JsonOutputParser
    - CSV Output Parser
    - Datetime Output Parser
    - Structured Output Parser (Pydantic or JSON)


### `Pydantic` Output Parser

In [1]:
from dotenv import load_dotenv

load_dotenv('./../env')

# langfuse or opik

True

In [2]:
from langchain_ollama import ChatOllama
from langchain_core.prompts import (
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
    ChatPromptTemplate,
    PromptTemplate,
)

base_url = "http://localhost:11434"
model = 'llama3.2:3b'

llm = ChatOllama(base_url=base_url, model=model)

In [3]:
from typing import Optional
from pydantic import BaseModel, Field
from langchain_core.output_parsers import PydanticOutputParser


In [4]:
class Joke(BaseModel):
    """Joke to tell user"""

    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline of the joke")
    rating: Optional[int] = Field(description="The rating of the joke is from 1 to 10", default=None)

In [5]:
parser = PydanticOutputParser(pydantic_object=Joke)

In [6]:
instruction = parser.get_format_instructions()

In [7]:
print(instruction)

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"description": "Joke to tell user", "properties": {"setup": {"description": "The setup of the joke", "title": "Setup", "type": "string"}, "punchline": {"description": "The punchline of the joke", "title": "Punchline", "type": "string"}, "rating": {"anyOf": [{"type": "integer"}, {"type": "null"}], "default": null, "description": "The rating of the joke is from 1 to 10", "title": "Rating"}}, "required": ["setup", "punchline"]}
```


In [8]:
prompt = PromptTemplate(
    template='''
    Answer the user query with a joke. Here is your formatting instruction.
    {format_instruction}

    Query: {query}
    Answer:''',
    input_variables=['query'],
    partial_variables={'format_instruction': parser.get_format_instructions()}
)

chain = prompt | llm

In [9]:
output = chain.invoke({'query': 'Tell me a joke about the cat'})

In [10]:
print(output.content)

{"setup": "Why did the cat join a band?", "punchline": "Because it wanted to be a purr-cussionist!", "rating": 7}


In [11]:
chain = prompt | llm | parser
output = chain.invoke({'query': 'Tell me a joke about the dogs'})
print(output)

setup='Why did the dog go to the vet?' punchline='Because he was feeling ruff!' rating=8


### Parsing with `.with_structured_output()` method
- This method takes a schema as input which specifies the names, types, and descriptions of the desired output attributes.
- The schema can be specified as a TypedDict class, a JSON Schema, or a Pydantic class.
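As an illustration of the JSON Schema option, the sketch below writes the `Joke` schema as a plain dictionary. The dictionary itself and the helper `missing_required` are assumptions made for this example; the helper only checks required keys and is not a full JSON Schema validator. The commented-out lines show how the dictionary would be passed to `with_structured_output`, which needs the `llm` object from earlier and a running Ollama server.

```python
# JSON Schema counterpart of the Joke Pydantic model (written by hand for illustration).
joke_schema = {
    "title": "Joke",
    "description": "Joke to tell user",
    "type": "object",
    "properties": {
        "setup": {"type": "string", "description": "The setup of the joke"},
        "punchline": {"type": "string", "description": "The punchline of the joke"},
        "rating": {
            "type": "integer",
            "description": "The rating of the joke is from 1 to 10",
        },
    },
    "required": ["setup", "punchline"],
}


def missing_required(candidate: dict, schema: dict) -> list:
    """Hypothetical helper: list required keys absent from a candidate output."""
    return [key for key in schema["required"] if key not in candidate]


print(missing_required({"setup": "Why did the cat join a band?"}, joke_schema))

# Assumed usage with the `llm` defined earlier (not executed here):
# structured_llm = llm.with_structured_output(joke_schema)
# output = structured_llm.invoke('Tell me a joke about the cat')
```

With a dictionary schema the result is expected to be a plain dictionary rather than a `Joke` instance, so the Pydantic form is the better choice when you want validated objects.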


In [12]:
output = llm.invoke('Tell me a joke about the cat')
print(output.content)

Why did the cat join a band?

Because it wanted to be the purr-cussionist! (get it?)


In [13]:
structured_llm = llm.with_structured_output(Joke)

In [14]:
output = structured_llm.invoke('Tell me a joke about the cat')
print(output)

setup='Why did the cat join a band?' punchline='Because it wanted to be the purr-cussionist!' rating=8


### `JSON` Output Parser

- Output parsers accept a string or BaseMessage as input and can return an arbitrary type.



In [15]:
from langchain_core.output_parsers import JsonOutputParser

In [16]:
parser = JsonOutputParser(pydantic_object=Joke)
print(parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"description": "Joke to tell user", "properties": {"setup": {"description": "The setup of the joke", "title": "Setup", "type": "string"}, "punchline": {"description": "The punchline of the joke", "title": "Punchline", "type": "string"}, "rating": {"anyOf": [{"type": "integer"}, {"type": "null"}], "default": null, "description": "The rating of the joke is from 1 to 10", "title": "Rating"}}, "required": ["setup", "punchline"]}
```


In [17]:
prompt = PromptTemplate(
    template='''
    Answer the user query with a joke. Here is your formatting instruction.
    {format_instruction}

    Query: {query}
    Answer:''',
    input_variables=['query'],
    partial_variables={'format_instruction': parser.get_format_instructions()}
)

chain = prompt | llm
output = chain.invoke({'query': 'Tell me a joke about the cat'})
print(output.content)

{"setup": "Why did the cat join a band?", "punchline": "Because it wanted to be the purr-cussionist!", "rating": 8}


In [18]:
chain = prompt | llm | parser
output = chain.invoke({'query': 'Tell me a joke about the cat'})
print(output)

{'setup': 'Why did the cat join a band?', 'punchline': 'Because it wanted to be the purr-cussionist.', 'rating': 8}
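Unlike the Pydantic parser, the result here is a plain dictionary, so no field validation is applied. The sketch below approximates the core step with only the standard library: strip an optional Markdown code fence, then call `json.loads`. The function name `parse_json_reply` is made up for this example, and the real `JsonOutputParser` handles more cases (such as partial output while streaming).

```python
import json

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable


def parse_json_reply(text: str) -> dict:
    """Hypothetical helper: parse a model reply that may wrap JSON in a code fence."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (it may carry a language tag such as "json").
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        # Drop the closing fence and anything after it.
        cleaned = cleaned.rsplit(FENCE, 1)[0]
    return json.loads(cleaned)


reply = FENCE + 'json\n{"setup": "Why did the cat join a band?", "rating": 8}\n' + FENCE
joke = parse_json_reply(reply)
print(joke["setup"], joke["rating"])
```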


### CSV Output Parser

- This output parser can be used when you want to return a list of comma-separated items.



In [19]:
# e.g. value1, value2, value3, and so on

from langchain_core.output_parsers import CommaSeparatedListOutputParser

parser = CommaSeparatedListOutputParser()

print(parser.get_format_instructions())

Your response should be a list of comma separated values, eg: `foo, bar, baz` or `foo,bar,baz`


In [20]:
format_instruction = parser.get_format_instructions()

prompt = PromptTemplate(
    template='''
    Answer the user query with a list of values. Here is your formatting instruction.
    {format_instruction}

    Query: {query}
    Answer:''',
    input_variables=['query'],
    partial_variables={'format_instruction': format_instruction}
)   

In [21]:
chain = prompt | llm | parser

output = chain.invoke({'query': 'generate my website seo keywords. I have content about the NLP and LLM.'})
print(output)

['natural language processing', 'large language models', 'machine learning', 'artificial intelligence', 'deep learning', 'neural networks', 'text analysis', 'sentiment analysis', 'language understanding', 'chatbots', 'voice assistants', 'semantic search', 'entity recognition', 'topic modeling', 'language translation', 'natural language generation']
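The conversion from a comma-separated reply to a Python list can be approximated with the standard `csv` module. This is a sketch of the idea under that assumption, not the library's exact implementation; `parse_comma_separated` is a name chosen for this example.

```python
import csv
from io import StringIO


def parse_comma_separated(text: str) -> list:
    """Hypothetical helper: split a comma-separated reply into trimmed items."""
    reader = csv.reader(StringIO(text.strip()), skipinitialspace=True)
    # A reply may span several lines, so flatten all rows into one list.
    return [item.strip() for row in reader for item in row if item.strip()]


print(parse_comma_separated("natural language processing, large language models,chatbots"))
```

Using `csv.reader` instead of `str.split(",")` keeps quoted items that contain commas, such as `"New York, NY"`, in one piece.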


### Datetime Output Parser

- Parses the LLM output into a Python `datetime` object. It raises an error if the output does not match the expected datetime format.

In [22]:
from langchain.output_parsers import DatetimeOutputParser

In [23]:
parser = DatetimeOutputParser()

format_instruction = parser.get_format_instructions()
print(format_instruction)

Write a datetime string that matches the following pattern: '%Y-%m-%dT%H:%M:%S.%fZ'.

Examples: 64-01-14T20:57:26.434617Z, 1976-02-14T12:42:33.088562Z, 1170-04-05T08:52:07.024696Z

Return ONLY this string, no other words!


In [24]:
prompt = PromptTemplate(
    template='''
    Answer the user query with a datetime. Here is your formatting instruction.
    {format_instruction}

    Query: {query}
    Answer:''',
    input_variables=['query'],
    partial_variables={'format_instruction': format_instruction}
)

In [25]:
chain = prompt | llm | parser

In [26]:
output = chain.invoke({'query': 'when the America got discovered?'})

print(output)

1502-10-12 09:32:00
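Because the parser fails when the model's reply does not match the pattern, it helps to see the strictness of the format in isolation. The sketch below uses `datetime.strptime` with the same pattern shown in the format instructions; `try_parse_datetime` is a hypothetical helper that returns `None` instead of raising, and it is not part of LangChain.

```python
from datetime import datetime
from typing import Optional

# Same pattern as printed by get_format_instructions() above.
PATTERN = "%Y-%m-%dT%H:%M:%S.%fZ"


def try_parse_datetime(text: str) -> Optional[datetime]:
    """Hypothetical helper: return a datetime, or None if the text does not match."""
    try:
        return datetime.strptime(text.strip(), PATTERN)
    except ValueError:
        return None


print(try_parse_datetime("1976-02-14T12:42:33.088562Z"))  # matches the pattern
print(try_parse_datetime("October 12, 1492"))             # does not match
```

Note also that the parsed value is only as reliable as the model's answer: a well-formatted string can still contain a wrong date, so factual results should be checked separately.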
