## Output Parsing

Language models output text, but sometimes you want more structured information back than plain text.

Output parsers are classes that help structure language model responses. There are two main methods an output parser must implement:

- **Get format instructions**: A method which returns a string containing instructions for how the output of a language model should be formatted.
- **Parse**: A method which takes in a string (assumed to be the response from a language model) and parses it into some structure.
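The two-method contract can be illustrated in plain Python, without LangChain. This is a minimal sketch: `YesNoParser` is a hypothetical class invented for illustration, not one of LangChain's parsers. It only shows how format instructions and parsing pair up.

```python
class YesNoParser:
    """Hypothetical parser showing the two-method contract (not a LangChain class)."""

    def get_format_instructions(self) -> str:
        # Text to paste into the prompt so the model knows what shape to produce.
        return "Answer with exactly one word: YES or NO."

    def parse(self, text: str) -> bool:
        # Normalise the raw model text, then map it to a Python bool.
        cleaned = text.strip().rstrip(".!").upper()
        if cleaned == "YES":
            return True
        if cleaned == "NO":
            return False
        # Fail loudly instead of guessing when the model ignores the instructions.
        raise ValueError(f"Expected YES or NO, got: {text!r}")


parser = YesNoParser()
print(parser.get_format_instructions())
print(parser.parse(" yes. "))  # True
```

Raising on unexpected text, rather than returning a default, makes prompt failures visible instead of silently wrong.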

- Output Parsing
    - StrOutputParser
    - JsonOutputParser
    - CSV Output Parser
    - Datetime Output Parser
    - Structured Output Parser (Pydantic or JSON)


### `Pydantic` Output Parser

In [1]:
# Load environment variables from the nearest .env file, if one exists
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())

True

In [6]:
from langchain_ollama import ChatOllama
from langchain_core.prompts import (
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
    ChatPromptTemplate,
    PromptTemplate
)

In [7]:
base_url = "http://localhost:11434"
model = "llama3.2"

In [8]:
llm = ChatOllama(
    base_url=base_url,
    model=model
)

In [9]:
from typing import Optional
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field

In [27]:
class Joke(BaseModel):
    """Joke to tell user"""

    setup: str = Field(description="The setup of the joke.")
    punchline: str = Field(description="The punchline of the joke.")
    rating: Optional[int] = Field(description="The rating of the joke from 1 to 10.", default=None)

In [12]:
parser = PydanticOutputParser(pydantic_object=Joke)

In [13]:
parser

PydanticOutputParser(pydantic_object=<class '__main__.Joke'>)

In [14]:
print(parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"description": "Joke to tell user", "properties": {"setup": {"description": "The setup of the joke.", "title": "Setup", "type": "string"}, "punchline": {"description": "The punchline of the joke.", "title": "Punchline", "type": "string"}, "rating": {"anyOf": [{"type": "integer"}, {"type": "null"}], "description": "The rating of the joke from 1 to 10.", "title": "Rating"}}, "required": ["setup", "punchline", "rating"]}
```


In [16]:
prompt = PromptTemplate(
    template="""
    Answer the user query with a joke. Here are your formatting instructions.
    {format_instruction}

    Query: {query}
    Answer:
    """,
    input_variables=['query'],
    partial_variables={
        'format_instruction': parser.get_format_instructions()
    }
)
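`partial_variables` pre-fills `{format_instruction}` once, so each call only supplies `{query}`. A rough plain-Python analogy using `functools.partial` over `str.format`; the instruction string here is a hypothetical stand-in for `parser.get_format_instructions()`, and this is not how `PromptTemplate` is implemented internally.

```python
from functools import partial

template = """
Answer the user query with a joke. Here are your formatting instructions.
{format_instruction}

Query: {query}
Answer:
"""

# Hypothetical stand-in for parser.get_format_instructions(); like the real one, it contains braces.
format_instruction = 'Return JSON like {"setup": "...", "punchline": "..."}'

# Pre-fill the fixed slot once, leaving only {query} for each call.
render = partial(template.format, format_instruction=format_instruction)

prompt_text = render(query="Tell me a joke about cats.")
print(prompt_text)
```

Because the instructions are substituted as a value, the JSON braces inside them are not treated as template placeholders. Pasting the same JSON directly into the template string would require escaping every brace.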

In [21]:
chain = prompt | llm

output = chain.invoke(
    {'query': 'Tell me a joke about cats.'}
)

print(output.content)

{"setup": "Why did the cat join a band?", "punchline": "Because it wanted to be the purr-cussionist.", "rating": 8}
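Without a parser, `output.content` is still just a string. A stdlib-only sketch of roughly what the parsing step adds (decode the JSON, check the fields, build a typed object). `JokeRecord` and `parse_joke` are hypothetical names; a `dataclass` stands in for the Pydantic model, and the checks are stricter than Pydantic's default mode, which would coerce a numeric string such as `"8"` to an integer.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class JokeRecord:
    # Hypothetical stand-in for the Pydantic Joke model.
    setup: str
    punchline: str
    rating: Optional[int] = None


def parse_joke(text: str) -> JokeRecord:
    data = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    for key in ("setup", "punchline"):
        if not isinstance(data.get(key), str):
            raise ValueError(f"missing or non-string field: {key}")
    rating = data.get("rating")
    if rating is not None and not isinstance(rating, int):
        raise ValueError("rating must be an integer or null")
    return JokeRecord(data["setup"], data["punchline"], rating)


raw = '{"setup": "Why did the cat join a band?", "punchline": "Because it wanted to be the purr-cussionist.", "rating": 8}'
print(parse_joke(raw))
```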


In [22]:
chain = prompt | llm | parser

output = chain.invoke(
    {'query': 'Tell me a joke about cats.'}
)

print(output)

setup='Why did the cat join a band?' punchline='Because it wanted to be the purr-cussionist!' rating=8


In [24]:
output

Joke(setup='Why did the cat join a band?', punchline='Because it wanted to be the purr-cussionist!', rating=8)
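The `|` operator in `prompt | llm | parser` composes steps left to right. A toy sketch of that idea in plain Python follows; the `Step` class and the fake prompt, model, and parser are hypothetical and only mimic how runnables chain, not how LangChain implements them.

```python
import json
from types import SimpleNamespace


class Step:
    """Hypothetical toy runnable: wraps a function and supports `a | b` chaining."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # Compose left to right: this step's output becomes the next step's input.
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x):
        return self.fn(x)


# Hypothetical stand-ins for the notebook's prompt, llm, and parser.
fake_prompt = Step(lambda d: f"Tell me a joke about {d['query']}.")
# The fake model ignores its input and returns a message-like object with .content.
fake_llm = Step(lambda text: SimpleNamespace(content='{"setup": "S", "punchline": "P"}'))
# The parser reads .content, since it receives the model's message, not a bare string.
fake_parser = Step(lambda message: json.loads(message.content))

chain = fake_prompt | fake_llm | fake_parser
print(chain.invoke({"query": "cats"}))  # {'setup': 'S', 'punchline': 'P'}
```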

***

### Parsing with `.with_structured_output()` method
- This method takes a schema that specifies the names, types, and descriptions of the desired output attributes.
- The schema can be a TypedDict class, a JSON Schema dict, or a Pydantic class. A Pydantic class yields a model instance; a TypedDict or JSON Schema yields a plain `dict`.
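As an alternative to the Pydantic class, the same shape can be written as a JSON Schema dict. A sketch under these assumptions: the schema below is hand-written to mirror `Joke`, and the top-level `title` and `description` are included because tool-calling backends generally use them as the tool's name and description. The LangChain call is commented out because it needs the local Ollama server.

```python
import json

# Hand-written JSON Schema mirroring the Joke model (illustrative).
joke_schema = {
    "title": "Joke",
    "description": "Joke to tell user",
    "type": "object",
    "properties": {
        "setup": {"type": "string", "description": "The setup of the joke."},
        "punchline": {"type": "string", "description": "The punchline of the joke."},
        "rating": {"type": "integer", "description": "The rating of the joke from 1 to 10."},
    },
    "required": ["setup", "punchline"],
}

# With a dict schema the result is a plain dict, not a Joke instance:
# structured_llm = llm.with_structured_output(joke_schema)
# result = structured_llm.invoke("Tell me a joke about the Programmers.")

print(json.dumps(joke_schema, indent=2))
```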


In [25]:
output = llm.invoke('Tell me a joke about the Programmers.')
print(output.content)

Why do programmers prefer dark mode?

Because light attracts bugs.


In [28]:
structured_llm = llm.with_structured_output(Joke)

output = structured_llm.invoke("Tell me a joke about the Programmers.")

print(output)

setup='Why do programmers prefer dark mode?' punchline='Because light attracts bugs!' rating=None


***

### `JSON` Output Parser

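- `JsonOutputParser` accepts a string or a `BaseMessage` and parses the model's reply into Python objects, typically a `dict`. Passing `pydantic_object` only shapes the format instructions; unlike `PydanticOutputParser`, the result is a plain `dict`, not a validated model instance.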

In [29]:
from langchain_core.output_parsers import JsonOutputParser

In [30]:
parser = JsonOutputParser(pydantic_object=Joke)

In [32]:
print(parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"description": "Joke to tell user", "properties": {"setup": {"description": "The setup of the joke.", "title": "Setup", "type": "string"}, "punchline": {"description": "The punchline of the joke.", "title": "Punchline", "type": "string"}, "rating": {"anyOf": [{"type": "integer"}, {"type": "null"}], "default": null, "description": "The rating of the joke from 1 to 10.", "title": "Rating"}}, "required": ["setup", "punchline"]}
```


In [33]:
prompt = PromptTemplate(
    template="""
    Answer the user query with a joke. Here are your formatting instructions.
    {format_instruction}

    Query: {query}
    Answer:
    """,
    input_variables=['query'],
    partial_variables={
        'format_instruction': parser.get_format_instructions()
    }
)

chain = prompt | llm

output = chain.invoke(
    {'query': 'Tell me a joke about cats.'}
)

print(output.content)

{"setup": "Why did the cat join a band?", "punchline": "Because it wanted to be the purr-cussionist!", "rating": 8}


In [34]:
chain = prompt | llm | parser

output = chain.invoke(
    {'query': 'Tell me a joke about cats.'}
)

print(output)

{'setup': 'Why did the cat join a band?', 'punchline': 'Because it wanted to be the purr-cussionist!', 'rating': 8}
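Models often wrap JSON in a Markdown code fence, so a JSON parser has to look inside one when present. A simplified, stdlib-only sketch of that step; `parse_json_output` is a hypothetical helper, not LangChain's implementation, which also handles partial JSON while streaming.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built indirectly to keep this block readable


def parse_json_output(text: str):
    """Simplified stand-in for a JSON output parser (not LangChain's implementation)."""
    # If the JSON is wrapped in a fenced block (optionally tagged json), extract the inside.
    match = re.search(rf"{FENCE}(?:json)?\s*(.*?)\s*{FENCE}", text, re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload.strip())


fenced = FENCE + 'json\n{"setup": "Why?", "punchline": "Because.", "rating": 8}\n' + FENCE
print(parse_json_output(fenced))
print(parse_json_output('{"setup": "Why?", "punchline": "Because."}'))
```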


***

### CSV Output Parser

- This output parser turns a comma-separated reply into a Python list of strings.

In [35]:
from langchain_core.output_parsers import CommaSeparatedListOutputParser

In [36]:
parser = CommaSeparatedListOutputParser()

print(parser.get_format_instructions())

Your response should be a list of comma separated values, eg: `foo, bar, baz` or `foo,bar,baz`


In [39]:
prompt = PromptTemplate(
    template='''
    Answer the user query with a list of values. Here is your formatting instruction.
    {format_instruction}

    Query: {query}
    Answer:''',
    input_variables=['query'],
    partial_variables={'format_instruction': parser.get_format_instructions()}
)

In [40]:
chain = prompt | llm

output = chain.invoke({'query': 'generate my website seo keywords. I have content about the NLP and LLM.'})

print(output.content)

nlp, llm, artificial intelligence, machine learning, natural language processing, deep learning, language model, text analysis, sentiment analysis, chatbots, human-computer interaction, semantic search, information retrieval, content generation


In [41]:
chain = prompt | llm | parser

output = chain.invoke({'query': 'generate my website seo keywords. I have content about the NLP and LLM.'})

print(output)

['nlp', 'llm', 'artificial intelligence', 'natural language processing', 'machine learning', 'deep learning', 'language models', 'sentiment analysis', 'text analysis', 'language understanding', 'conversational ai', 'chatbots', 'language generation', 'content optimization', 'search engine marketing', 'seo', 'digital marketing', 'innovation', 'technology']
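The step from the raw string to the list is essentially a split on commas. A minimal sketch, assuming plain unquoted items; `parse_comma_list` is a hypothetical helper, not LangChain's code.

```python
def parse_comma_list(text: str) -> list:
    """Simplified comma-separated list parser (illustrative, not LangChain's code)."""
    # Split on commas, trim whitespace, and drop empty items left by stray commas.
    return [item.strip() for item in text.strip().split(",") if item.strip()]


raw = "nlp, llm, artificial intelligence, machine learning"
print(parse_comma_list(raw))  # ['nlp', 'llm', 'artificial intelligence', 'machine learning']
```

A naive split breaks items that themselves contain commas; quoting such items and reading them with Python's `csv` module is one way around that.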


***

### Datetime Output Parser

- Parses the model's reply into a Python `datetime` object. If the reply does not match the expected format, parsing fails with an `OutputParserException`.

In [42]:
from langchain.output_parsers import DatetimeOutputParser

In [43]:
parser = DatetimeOutputParser()

format_instruction = parser.get_format_instructions()
print(format_instruction)

Write a datetime string that matches the following pattern: '%Y-%m-%dT%H:%M:%S.%fZ'.

Examples: 1093-11-11T08:50:23.962357Z, 1479-03-13T06:14:21.312531Z, 1255-07-30T14:18:15.282860Z

Return ONLY this string, no other words!
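Parsing against the pattern printed above is a `datetime.strptime` call. A stdlib sketch; `parse_llm_datetime` is a hypothetical helper, not LangChain's code, and it raises a plain `ValueError` rather than LangChain's exception type.

```python
from datetime import datetime

# Pattern taken from the format instructions printed above.
PATTERN = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_llm_datetime(text: str) -> datetime:
    """Illustrative stand-in for a datetime output parser (not LangChain's code)."""
    try:
        return datetime.strptime(text.strip(), PATTERN)
    except ValueError as exc:
        # Surface the offending text so the prompt or model can be adjusted.
        raise ValueError(f"Could not parse {text!r} with {PATTERN}") from exc


print(parse_llm_datetime("1947-08-15T18:00:03.000000Z"))  # 1947-08-15 18:00:03
```

The trailing `Z` is matched as a literal character, so the result carries no timezone. The check is on format only: the two runs below return different times of day for the same date, and the parser accepts both.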


In [44]:
prompt = PromptTemplate(
    template='''
    Answer the user query with a datetime. Here is your formatting instruction.
    {format_instruction}

    Query: {query}
    Answer:''',
    input_variables=['query'],
    partial_variables={'format_instruction': format_instruction}
)

In [46]:
chain = prompt | llm

output = chain.invoke({'query': 'When India got independence ?'})

print(output.content)

1947-08-15T18:00:03.000000Z


In [48]:
chain = prompt | llm | parser

output = chain.invoke({'query': 'When India got independence ?'})

print(output)

1947-08-15 18:43:00
