## Output Parsing

Language models output text, but sometimes you want more structured information back than plain text.

Output parsers are classes that help structure language model responses. There are two main methods an output parser must implement:

- **Get format instructions**: A method which returns a string containing instructions for how the output of a language model should be formatted.
- **Parse**: A method which takes in a string (assumed to be the response from a language model) and parses it into some structure.
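
As a rough illustration of this two-method contract, here is a standard-library-only sketch. `KeyValueParser` and its `key: value` reply format are hypothetical, invented for this example; this is not LangChain's base class, which carries more machinery.

```python
class KeyValueParser:
    """Hypothetical parser that expects one `key: value` pair per line."""

    def get_format_instructions(self) -> str:
        # Text meant to be embedded in the prompt sent to the model.
        return "Respond with one `key: value` pair per line and nothing else."

    def parse(self, text: str) -> dict[str, str]:
        # Turn the raw model reply into a dictionary.
        result = {}
        for line in text.strip().splitlines():
            if not line.strip():
                continue  # skip blank lines
            key, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"Line is not a key: value pair: {line!r}")
            result[key.strip()] = value.strip()
        return result


kv_parser = KeyValueParser()
reply = "setup: Why did the cat sit on the laptop?\npunchline: To keep an eye on the mouse."
parsed = kv_parser.parse(reply)
print(parsed["punchline"])
```

The instructions string goes into the prompt, and `parse` is applied to whatever text comes back, which is the same division of labour the parsers below follow.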

- Output Parsing
    - StrOutputParser
    - JsonOutputParser
    - CSV Output Parser
    - Datetime Output Parser
    - Structured Output Parser (Pydantic or JSON)


### `Pydantic` Output Parser

In [None]:
from dotenv import load_dotenv

load_dotenv('./../env')

# langfuse or opik

In [1]:
from langchain_ollama import ChatOllama
from langchain_core.prompts import (
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
    ChatPromptTemplate,
    PromptTemplate,
)

base_url = "http://localhost:11434"
model = 'llama3.2:1b'

llm = ChatOllama(base_url=base_url, model=model)

In [2]:
from typing import Optional
from pydantic import BaseModel, Field
from langchain_core.output_parsers import PydanticOutputParser


In [3]:
class Joke(BaseModel):
    """Joke to tell user"""

    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline of the joke")
    rating: Optional[int] = Field(description="The rating of the joke is from 1 to 10", default=None)

In [4]:
parser = PydanticOutputParser(pydantic_object=Joke)

In [5]:
instruction = parser.get_format_instructions()

In [6]:
print(instruction)

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"description": "Joke to tell user", "properties": {"setup": {"description": "The setup of the joke", "title": "Setup", "type": "string"}, "punchline": {"description": "The punchline of the joke", "title": "Punchline", "type": "string"}, "rating": {"anyOf": [{"type": "integer"}, {"type": "null"}], "default": null, "description": "The rating of the joke is from 1 to 10", "title": "Rating"}}, "required": ["setup", "punchline"]}
```


In [7]:
prompt = PromptTemplate(
    template='''
    Answer the user query with a joke. Here is your formatting instruction.
    {format_instruction}

    Query: {query}
    Answer:''',
    input_variables=['query'],
    partial_variables={'format_instruction': parser.get_format_instructions()}
)

chain = prompt | llm

In [8]:
output = chain.invoke({'query': 'Tell me a joke about the cat'})

In [9]:
print(output.content)

{
  "description": "Why did the cat join a band? Because it wanted to be the purr-cussionist!",
  "properties": {
    "setup": "The cat went to the music store and asked for a new guitar.",
    "punchline": "And then it learned how to play 'Whisker Waltz'.",
    "rating": "9"
  },
  "required": ["setup", "punchline"]
}


In [10]:
chain = prompt | llm | parser
output = chain.invoke({'query': 'Tell me a joke about the dogs'})
print(output)

OutputParserException: Failed to parse Joke from completion {"properties": {"setup": "Why did the dog go to the vet?", "punchline": "Because he was feeling ruff!", "rating": {"anyOf": [{"type": "integer"}, {"type": "null"}], "default": null, "description": "The rating of the joke is from 1 to 10", "title": "Rating"}}, "required": ["setup", "punchline"]}. Got: 2 validation errors for Joke
setup
  Field required [type=missing, input_value={'properties': {'setup': ... ['setup', 'punchline']}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing
punchline
  Field required [type=missing, input_value={'properties': {'setup': ... ['setup', 'punchline']}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing
For troubleshooting, visit: https://python.langchain.com/docs/troubleshooting/errors/OUTPUT_PARSING_FAILURE 
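
The traceback above shows why validation failed: the small model echoed the shape of the JSON schema and nested `setup` and `punchline` under a `properties` key, so neither required field exists at the top level. The sketch below reproduces that with the standard library only. `unwrap_schema_echo` is a hypothetical workaround written for this note, not a LangChain feature, and the sample reply is a shortened version of the one in the error message.

```python
import json


def unwrap_schema_echo(raw: str) -> dict:
    """Hypothetical helper: lift fields nested under "properties" to the top level."""
    data = json.loads(raw)
    if isinstance(data, dict) and isinstance(data.get("properties"), dict):
        return data["properties"]
    return data


raw_reply = (
    '{"properties": {"setup": "Why did the dog go to the vet?", '
    '"punchline": "Because he was feeling ruff!"}, '
    '"required": ["setup", "punchline"]}'
)

# The top level only has "properties" and "required", hence "Field required".
print(sorted(json.loads(raw_reply)))

fields = unwrap_schema_echo(raw_reply)
print(sorted(fields))
```

Patching replies like this is brittle; a larger model or a clearer prompt is usually the more dependable fix.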

### Parsing with `.with_structured_output()` method
- This method takes a schema that specifies the names, types, and descriptions of the desired output attributes.
- The schema can be specified as a TypedDict class, a JSON Schema, or a Pydantic class.
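
For the TypedDict option, a schema equivalent to the `Joke` model might look like the sketch below. `JokeDict` is a name chosen for this example, and the `Annotated[type, default, description]` layout follows the pattern used in LangChain's documentation. Depending on your Python version, you may need to import `TypedDict` and `Annotated` from `typing_extensions` instead.

```python
from typing import Annotated, Optional, TypedDict


class JokeDict(TypedDict):
    """Joke to tell user"""

    # Annotated[type, default, description]; `...` marks a required field.
    setup: Annotated[str, ..., "The setup of the joke"]
    punchline: Annotated[str, ..., "The punchline of the joke"]
    rating: Annotated[Optional[int], None, "The rating of the joke is from 1 to 10"]


# Assumed usage, mirroring the Pydantic variant (needs a running model):
# structured_llm = llm.with_structured_output(JokeDict)

print(list(JokeDict.__annotations__))
```

With a TypedDict or JSON Schema the structured call returns a plain dictionary, whereas a Pydantic class yields a validated model instance.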


In [None]:
output = llm.invoke('Tell me a joke about the cat')
print(output.content)

In [None]:
structured_llm = llm.with_structured_output(Joke)

In [None]:
output = structured_llm.invoke('Tell me a joke about the cat')
print(output)

### `JSON` Output Parser

- Output parsers accept a string or BaseMessage as input and can return an arbitrary type.
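
Here the returned type is a plain `dict` rather than a Pydantic object. Chat models often wrap JSON in a Markdown code fence, so a JSON parser has to cope with that. The sketch below shows the idea with the standard library only; `parse_json_reply` is a hypothetical helper and is not how `JsonOutputParser` is implemented.

```python
import json
import re

FENCE = "`" * 3  # built at runtime so no literal fence appears in this example


def parse_json_reply(text: str) -> dict:
    """Hypothetical helper: parse JSON that may be wrapped in a Markdown code fence."""
    pattern = FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE
    match = re.search(pattern, text, re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload)


reply = (
    f'{FENCE}json\n'
    '{"setup": "Why did the cat nap on the keyboard?", '
    '"punchline": "It wanted to be near the mouse."}\n'
    f'{FENCE}'
)
joke = parse_json_reply(reply)
print(joke["setup"])
```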



In [None]:
from langchain_core.output_parsers import JsonOutputParser

In [None]:
parser = JsonOutputParser(pydantic_object=Joke)
print(parser.get_format_instructions())

In [None]:
prompt = PromptTemplate(
    template='''
    Answer the user query with a joke. Here is your formatting instruction.
    {format_instruction}

    Query: {query}
    Answer:''',
    input_variables=['query'],
    partial_variables={'format_instruction': parser.get_format_instructions()}
)

chain = prompt | llm
output = chain.invoke({'query': 'Tell me a joke about the cat'})
print(output.content)

In [None]:
chain = prompt | llm | parser
output = chain.invoke({'query': 'Tell me a joke about the cat'})
print(output)

### CSV Output Parser

- This output parser can be used when you want to return a list of comma-separated items.
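
To make the parsing step concrete, here is a minimal standard-library sketch of turning a comma-separated reply into a Python list. `parse_comma_separated` is a hypothetical stand-in, and the real parser may differ in how it handles quoting and whitespace.

```python
import csv
from io import StringIO


def parse_comma_separated(text: str) -> list[str]:
    """Hypothetical helper: split a comma-separated reply into clean items."""
    # csv.reader keeps quoted items that contain commas as a single entry.
    reader = csv.reader(StringIO(text.strip()), skipinitialspace=True)
    return [item.strip() for row in reader for item in row if item.strip()]


keywords = parse_comma_separated(
    'NLP, large language models, "prompts, templates", tokenization'
)
print(keywords)
```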



In [None]:
# Expected output shape: value1, value2, value3, and so on

from langchain_core.output_parsers import CommaSeparatedListOutputParser

parser = CommaSeparatedListOutputParser()

print(parser.get_format_instructions())

In [None]:
format_instruction = parser.get_format_instructions()

prompt = PromptTemplate(
    template='''
    Answer the user query with a list of values. Here is your formatting instruction.
    {format_instruction}

    Query: {query}
    Answer:''',
    input_variables=['query'],
    partial_variables={'format_instruction': format_instruction}
)   

In [None]:
chain = prompt | llm | parser

output = chain.invoke({'query': 'Generate SEO keywords for my website. I have content about NLP and LLMs.'})
print(output)

### Datetime Output Parser

- Parses the model's reply into a Python `datetime` object. Parsing raises an error if the LLM output does not match the expected datetime format.
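
The failure mode is easy to see with plain `datetime.strptime`. The sketch below assumes the format string `%Y-%m-%dT%H:%M:%S.%fZ`, which I believe is the parser's default; check the output of `get_format_instructions()` for your installed version. `parse_datetime_reply` is a hypothetical helper, and the sample replies are illustrative.

```python
from datetime import datetime

# Assumed default format; confirm against parser.get_format_instructions().
ASSUMED_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_datetime_reply(text: str, fmt: str = ASSUMED_FORMAT) -> datetime:
    """Hypothetical helper: strict parse of a model reply into a datetime."""
    try:
        return datetime.strptime(text.strip(), fmt)
    except ValueError as exc:
        raise ValueError(f"Could not parse datetime from reply: {text!r}") from exc


print(parse_datetime_reply("1492-10-12T00:00:00.000000Z"))

# A chatty reply does not match the format and is rejected.
try:
    parse_datetime_reply("It was on October 12, 1492.")
except ValueError as err:
    print(err)
```

Small models often add explanatory text around the date, which is the usual cause of this error.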

In [None]:
from langchain.output_parsers import DatetimeOutputParser

In [None]:
parser = DatetimeOutputParser()

format_instruction = parser.get_format_instructions()
print(format_instruction)

In [None]:
prompt = PromptTemplate(
    template='''
    Answer the user query with a datetime. Here is your formatting instruction.
    {format_instruction}

    Query: {query}
    Answer:''',
    input_variables=['query'],
    partial_variables={'format_instruction': format_instruction}
)

In [None]:
chain = prompt | llm | parser

In [None]:
output = chain.invoke({'query': 'When was America discovered?'})

print(output)