## Output Parsing

Language models output text, but sometimes you want more structured information back than plain text.

Output parsers are classes that help structure language model responses. There are two main methods an output parser must implement:

- **Get format instructions**: A method which returns a string containing instructions for how the output of a language model should be formatted.
- **Parse**: A method which takes in a string (assumed to be the response from a language model) and parses it into some structure.
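
To make these two methods concrete, here is a minimal standard-library sketch. `KeyValueParser` is a hypothetical class written for illustration and is not part of LangChain; it only mirrors the two-method shape described above.

```python
class KeyValueParser:
    """Hypothetical parser (not a LangChain class) for 'key: value' lines."""

    def get_format_instructions(self) -> str:
        # Text that would be placed in the prompt so the model knows the format.
        return "Respond with one 'key: value' pair per line and nothing else."

    def parse(self, text: str) -> dict:
        # Turn the raw model response into a dictionary.
        result = {}
        for line in text.strip().splitlines():
            if ":" not in line:
                continue  # skip lines that do not follow the format
            key, value = line.split(":", 1)
            result[key.strip()] = value.strip()
        return result


parser = KeyValueParser()
response = "setup: Why did the chicken cross the road?\npunchline: To get to the other side!"
parsed = parser.parse(response)
print(parsed)
```

The instructions go into the prompt and `parse` runs on the reply, which is the same division of labour the built-in parsers below follow.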

- Output Parsing
    - StrOutputParser
    - JsonOutputParser
    - CSV Output Parser
    - Datetime Output Parser
    - Structured Output Parser (Pydantic or JSON)


### `Pydantic` Output Parser

In [52]:
from dotenv import load_dotenv

load_dotenv('./../.env')

# langfuse or opik

True

In [53]:
from langchain_ollama import ChatOllama
from langchain_core.prompts import (
                                        SystemMessagePromptTemplate,
                                        HumanMessagePromptTemplate,
                                        ChatPromptTemplate,
                                        PromptTemplate
                                        )

base_url = "http://localhost:11434"
model = 'llama3.2:3b'

llm = ChatOllama(base_url=base_url, model=model)

In [54]:
from typing import Optional
from pydantic import BaseModel, Field
from langchain_core.output_parsers import PydanticOutputParser

In [55]:
class Joke(BaseModel):
  """Joke to tell user"""
  setup: str = Field(description="Setup of the joke")
  punchline: str = Field(description="Punchline of the joke")
  rating: Optional[int] = Field(description="Rating of the joke is from 1 to 10")

In [56]:
joke_data = {
    "setup": "Why did the chicken cross the road?",
    "punchline": "To get to the other side!",
    "rating": 8
}
joke = Joke(**joke_data)  # This creates a Joke instance and validates the data.

In [57]:
joke

Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!', rating=8)

In [58]:
parser = PydanticOutputParser(pydantic_object=Joke)

parser

PydanticOutputParser(pydantic_object=<class '__main__.Joke'>)

In [59]:
instructions = parser.get_format_instructions()
print(instructions)

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"description": "Joke to tell user", "properties": {"setup": {"description": "Setup of the joke", "title": "Setup", "type": "string"}, "punchline": {"description": "Punchline of the joke", "title": "Punchline", "type": "string"}, "rating": {"anyOf": [{"type": "integer"}, {"type": "null"}], "description": "Rating of the joke is from 1 to 10", "title": "Rating"}}, "required": ["setup", "punchline", "rating"]}
```


In [60]:
prompt = PromptTemplate(
  template = '''
  Answer the query with a joke. Here are the formatting instructions: {format_instructions}
  
  Query: {query}
  Answer:''',
  input_variables = ['query'],
  partial_variables = {'format_instructions': parser.get_format_instructions()})

chain = prompt | llm

In [61]:
output = chain.invoke({'query': "Tell me a joke about developers"})

print(output.content)

{"setup": "Why do developers love stack Overflow?", "punchline": "Because it's the only place where they can always find the issue!", "rating": null}


In [70]:
chain = prompt | llm | parser
output = chain.invoke({'query': "Tell me a joke about dogs"})
print(output)

setup='Why did the dog go to the vet?' punchline='Because he was feeling ruff!' rating=8
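
Conceptually, the parsing step decodes the JSON text and then checks it against the declared fields. The sketch below approximates that idea with the standard library only. `JokeRecord` and `parse_joke` are hypothetical names, and this is not how `PydanticOutputParser` is implemented; Pydantic performs far more thorough validation.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class JokeRecord:
    # Hypothetical stand-in for the Joke model, standard library only.
    setup: str
    punchline: str
    rating: Optional[int] = None


def parse_joke(text: str) -> JokeRecord:
    data = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    for name in ("setup", "punchline"):
        if not isinstance(data.get(name), str):
            raise ValueError(f"field {name!r} must be a string")
    rating = data.get("rating")
    if rating is not None and not isinstance(rating, int):
        raise ValueError("field 'rating' must be an integer or null")
    return JokeRecord(data["setup"], data["punchline"], rating)


raw = '{"setup": "Why did the dog go to the vet?", "punchline": "Because he was feeling ruff!", "rating": 8}'
joke_record = parse_joke(raw)
print(joke_record)
```

The benefit of parsing into a typed object is that a malformed reply fails loudly at the parsing step instead of surfacing later as a missing key.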


### Parsing with the `.with_structured_output()` method

In [73]:
output = llm.invoke("Tell me a joke about dogs")
print(output.content)

Why did the dog go to the vet?

Because he was feeling ruff!

(I hope that one made you howl with laughter!)


In [74]:
structured_llm = llm.with_structured_output(Joke)

In [75]:
output = structured_llm.invoke("Tell me a joke about dogs")
print(output)

setup='Why did the dog go to the vet?' punchline='Because he was feeling ruff!' rating=8


### `JSON` Output Parser

In [76]:
from langchain_core.output_parsers import JsonOutputParser

In [79]:
parser = JsonOutputParser(pydantic_object=Joke)
print(parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"description": "Joke to tell user", "properties": {"setup": {"description": "Setup of the joke", "title": "Setup", "type": "string"}, "punchline": {"description": "Punchline of the joke", "title": "Punchline", "type": "string"}, "rating": {"anyOf": [{"type": "integer"}, {"type": "null"}], "description": "Rating of the joke is from 1 to 10", "title": "Rating"}}, "required": ["setup", "punchline", "rating"]}
```


In [80]:
prompt = PromptTemplate(
  template = '''
  Answer the query with a joke. Here are the formatting instructions: {format_instructions}
  
  Query: {query}
  Answer:''',
  input_variables = ['query'],
  partial_variables = {'format_instructions': parser.get_format_instructions()})

chain = prompt | llm
output = chain.invoke({'query': "Tell me a joke about dogs"})
print(output.content)

{"setup": "Why did the dog go to the vet?", "punchline": "Because he was feeling ruff!", "rating": 8}


In [81]:
chain = prompt | llm | parser
output = chain.invoke({'query': "Tell me a joke about dogs"})
print(output)

{'setup': 'Why did the dog go to the vet?', 'punchline': 'Because he was feeling ruff!', 'rating': 8}
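
Models can wrap their JSON in a Markdown code fence or add a sentence before it, so a JSON parsing step usually has to locate the payload before decoding it. The following is a standard-library sketch of that idea. `extract_json` is a hypothetical helper and is not meant to reproduce the internals of `JsonOutputParser`.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def extract_json(text: str) -> dict:
    # Prefer the contents of a fenced block; fall back to the whole response.
    pattern = FENCE + r"(?:json)?\s*(.*?)" + FENCE
    match = re.search(pattern, text, re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload.strip())


reply = (
    "Here is your joke:\n"
    + FENCE + "json\n"
    + '{"setup": "Why did the dog go to the vet?", "rating": 8}\n'
    + FENCE
)
print(extract_json(reply))
print(extract_json('{"rating": 8}'))
```

The result is a plain dictionary, matching the dictionary output shown above, rather than an instance of `Joke`.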


### CSV Output Parser

In [82]:
# Expected response shape: value1, value2, value3, and so on
from langchain_core.output_parsers import CommaSeparatedListOutputParser

parser = CommaSeparatedListOutputParser()

print(parser.get_format_instructions())

Your response should be a list of comma separated values, eg: `foo, bar, baz` or `foo,bar,baz`


In [85]:
format_instructions = parser.get_format_instructions()
prompt = PromptTemplate(
  template = '''
  Answer the query with a list of items separated by commas. Here are the formatting instructions: {format_instructions}
  
  Query: {query}
  Answer:''',
  input_variables = ['query'],
  partial_variables = {'format_instructions': format_instructions})

In [86]:
chain = prompt | llm | parser

output = chain.invoke({'query': "Tell me a list of animals"})

print(output)

['dog', 'cat', 'elephant', 'lion', 'tiger', 'bear', 'monkey', 'giraffe', 'zebra', 'kangaroo']
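
The step from a comma-separated string to a Python list can be sketched with the standard library. `parse_comma_separated` is a hypothetical helper written for illustration; it is an assumption about a reasonable approach, not a copy of what `CommaSeparatedListOutputParser` does internally.

```python
import csv
import io
from typing import List


def parse_comma_separated(text: str) -> List[str]:
    # csv.reader copes with quoted items that themselves contain commas.
    reader = csv.reader(io.StringIO(text.strip()), skipinitialspace=True)
    items: List[str] = []
    for row in reader:
        items.extend(cell.strip() for cell in row if cell.strip())
    return items


print(parse_comma_separated("dog, cat, elephant"))
print(parse_comma_separated('"New York, NY", Boston'))
```

Using `csv.reader` instead of `str.split(",")` keeps a quoted item such as `"New York, NY"` together as a single entry.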


### Datetime Output Parser

In [94]:
from langchain.output_parsers import DatetimeOutputParser

parser = DatetimeOutputParser()
format_instructions = parser.get_format_instructions()

print(format_instructions)

Write a datetime string that matches the following pattern: '%Y-%m-%dT%H:%M:%S.%fZ'.

Examples: 0567-05-09T14:25:04.494270Z, 1751-10-06T11:00:09.319664Z, 1825-04-18T07:43:36.735641Z

Return ONLY this string, no other words!


In [106]:
format_instructions = parser.get_format_instructions()
prompt = PromptTemplate(
  template = '''
  Answer the query with a date. Here are the formatting instructions: {format_instructions}
  
  Query: {query}
  Answer:''',
  input_variables = ['query'],
  partial_variables = {'format_instructions': format_instructions})


In [107]:
chain = prompt | llm | parser

In [111]:
output = chain.invoke({'query': "When was America discovered?"})
print(output)

1492-08-10 09:00:00
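
The printed value is a `datetime` object, which is why it no longer has the `T` and `Z` from the requested pattern. As a rough illustration, the same pattern from the format instructions can be applied with the standard library. `parse_datetime` is a hypothetical helper, and the input string is an assumed model reply, not one captured from the run above.

```python
from datetime import datetime

PATTERN = "%Y-%m-%dT%H:%M:%S.%fZ"  # pattern shown in the format instructions


def parse_datetime(text: str) -> datetime:
    # strptime raises ValueError if the text does not match the pattern.
    return datetime.strptime(text.strip(), PATTERN)


parsed = parse_datetime("1492-08-10T09:00:00.000000Z")
print(parsed)  # 1492-08-10 09:00:00
```

Because the `Z` is matched as a literal character, the resulting `datetime` is naive (it carries no timezone information).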
