# Output Parsers in Detail
Language models generate only text, but production applications usually need a predictable data structure. Output parsers let you define precisely what you expect from the model's output: a list of words for a word-suggestion application, for example, or a combination of variables such as a word plus an explanation of why it fits. The parser then extracts that information from the response for you.

Throughout this notebook, we will work on a thesaurus application that generates a list of possible substitute words based on the context.

---

## 1. Output Parsers

### 1.1 `PydanticOutputParser`
This class instructs the model to produce its output as JSON and then extracts the information from the response. It relies on the Pydantic library, which defines and validates data structures in Python, so each expected field can be described with a name, a type, and a description.

In the thesaurus example, we need a field that can store multiple suggestions. We can do this by defining a class that inherits from Pydantic's `BaseModel`.

```python
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, validator
from typing import List

# Define your desired data structure.
class Suggestions(BaseModel):
    words: List[str] = Field(description="list of substitute words based on context")

    # Reject numbered-list items (e.g. "1. conduct") returned by the model.
    @validator("words")
    def not_start_with_number(cls, field):
        for item in field:
            # Slicing avoids an IndexError on empty strings.
            if item[:1].isnumeric():
                raise ValueError("Words cannot start with a number!")
        return field


parser = PydanticOutputParser(pydantic_object=Suggestions)
```



```python
from langchain.prompts import PromptTemplate

template = """
Offer a list of suggestions to substitute the specified target_word based on the presented context.
{format_instructions}
target_word={target_word}
context={context}
"""

prompt = PromptTemplate(
    template=template,
    input_variables=["target_word", "context"],  # filled in when the prompt is formatted
    partial_variables={
        "format_instructions": parser.get_format_instructions()
    },  # filled in now
)

# .format_prompt() returns a PromptValue; call .to_string() on it before sending it.
model_input = prompt.format_prompt(
    target_word="behaviour",
    context="The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson.",
)
```

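```python
from langchain.chat_models import AzureChatOpenAI

# Assumes an Azure OpenAI deployment named "gpt4"; credentials and the endpoint
# are expected in environment variables. temperature=0.0 keeps outputs repeatable.
model = AzureChatOpenAI(deployment_name="gpt4", temperature=0.0)

output = model.predict(model_input.to_string())

parser.parse(output)
```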

```
Suggestions(words=['conduct', 'actions', 'attitude', 'manner', 'demeanor'])
```
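Conceptually, `parse()` does two things: it pulls the JSON out of the model's text and validates it against the schema. The sketch below mimics those two steps with the standard library only, so it can run without LangChain or Pydantic. It is an approximation, not LangChain's implementation; `SuggestionsSketch` and `parse_suggestions` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class SuggestionsSketch:
    """Hypothetical stand-in for the Pydantic `Suggestions` model above."""

    words: List[str]


def parse_suggestions(text: str) -> SuggestionsSketch:
    """Pull the JSON object out of the model's text, then validate it by hand."""
    # Models sometimes wrap the JSON in extra prose; keep the outermost {...} span.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in the response")
    data = json.loads(text[start : end + 1])

    # Schema check: `words` is required and must be a list of strings.
    words = data.get("words")
    if not isinstance(words, list) or not all(isinstance(w, str) for w in words):
        raise ValueError("`words` must be a list of strings")

    # Custom validator: reject numbered-list items such as "1. conduct".
    if any(w[:1].isnumeric() for w in words):
        raise ValueError("Words cannot start with a number!")
    return SuggestionsSketch(words=words)


result = parse_suggestions('Sure! {"words": ["conduct", "manner"]}')
print(result)  # SuggestionsSketch(words=['conduct', 'manner'])
```

Pydantic does the type checking declaratively from the annotations; writing it out by hand shows why a typed schema saves code.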

#### Multiple Outputs Example
Here is a sample Pydantic class that handles multiple outputs: it asks the model to suggest a list of words and to explain the reasoning behind each one.

```python
class Suggestions(BaseModel):
    words: List[str] = Field(description="list of substitute words based on context")
    reasons: List[str] = Field(
        description="the reasoning of why this word fits the context"
    )

    @validator("words")
    def not_start_with_number(cls, field):
        for item in field:
            if item[:1].isnumeric():
                raise ValueError("Words cannot start with a number!")
        return field

    # Normalizes the output so that every reason ends with a period.
    @validator("reasons")
    def end_with_dot(cls, field):
        for idx, item in enumerate(field):
            if not item.endswith("."):
                field[idx] += "."
        return field


parser = PydanticOutputParser(pydantic_object=Suggestions)
```

```python
template = """
Offer a list of suggestions to substitute the specified target_word based \
on the presented context and the reasoning for each word.
{format_instructions}
target_word={target_word}
context={context}
"""

prompt = PromptTemplate(
    template=template,
    input_variables=["target_word", "context"],  # filled in when the prompt is formatted
    partial_variables={
        "format_instructions": parser.get_format_instructions()
    },  # filled in now
)

# .format_prompt() returns a PromptValue; call .to_string() to convert it into a string.
model_input = prompt.format_prompt(
    target_word="behaviour",
    context="The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson.",
)
```

```python
output = model.predict(model_input.to_string())

parser.parse(output)
```

```
Suggestions(words=['conduct', 'actions', 'attitude', 'manner', 'demeanor'], reasons=['Conduct refers to the way people behave in a particular situation, which fits the context of students in a classroom.', 'Actions can describe the specific things the students were doing that were disruptive.', 'Attitude refers to the general disposition or mindset of the students, which could be causing the disruption.', 'Manner describes the way in which the students were behaving, which is relevant to the context.', 'Demeanor refers to the outward behavior or bearing of the students, which can be disruptive in a classroom setting.'])
```
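Because `words` and `reasons` come back as two separate lists, nothing in the schema ties them together or guarantees that they have the same length. A small helper can pair them up and check the lengths; `pair_words_with_reasons` is a hypothetical name, not part of LangChain, and the sample strings are illustrative. With the parsed object above stored in a variable such as `suggestions` (also hypothetical), you would call it as `pair_words_with_reasons(suggestions.words, suggestions.reasons)`.

```python
from typing import Dict, List


def pair_words_with_reasons(words: List[str], reasons: List[str]) -> Dict[str, str]:
    """Map each suggested word to its reason, assuming both lists share one order."""
    # zip() would silently drop unmatched items, so check the lengths first.
    if len(words) != len(reasons):
        raise ValueError(f"got {len(words)} words but {len(reasons)} reasons")
    return dict(zip(words, reasons))


pairs = pair_words_with_reasons(
    ["conduct", "manner"],
    ["Conduct refers to the way people behave.", "Manner describes how they behave."],
)
for word, reason in pairs.items():
    print(f"{word}: {reason}")
```

The same length check could also live in a Pydantic validator on the model, so that a mismatched response fails at parse time.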

```python
# Check the prompt sent to the model
model_input.to_string()
```

```
'\nOffer a list of suggestions to substitute the specified target_word based on the presented context and the reasoning for each word.\nThe output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"words": {"title": "Words", "description": "list of substitute words based on context", "type": "array", "items": {"type": "string"}}, "reasons": {"title": "Reasons", "description": "the reasoning of why this word fits the context", "type": "array", "items": {"type": "string"}}}, "required": ["words", "reasons"]}\n```\ntarget_word=behaviour\ncontext=The behaviour of the students in the c
```

### 1.2 `CommaSeparatedListOutputParser`
As its name suggests, this class handles comma-separated outputs. It covers one specific case: **whenever you want to receive a list of items from the model**. It cannot, however, request additional information such as the reasoning behind each word.
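To make the "list only" limitation concrete, here is a rough standard-library approximation of comma-separated parsing. It is a sketch, not LangChain's code; the real parser may differ in details such as quoting, and `parse_comma_separated` is a hypothetical name.

```python
from typing import List


def parse_comma_separated(text: str) -> List[str]:
    """Rough approximation of a comma-separated list parser."""
    # Split on commas, trim surrounding whitespace, and drop empty items.
    return [item.strip() for item in text.strip().split(",") if item.strip()]


print(parse_comma_separated("conduct, demeanor, actions, attitude"))
# ['conduct', 'demeanor', 'actions', 'attitude']

# A reason squeezed into the reply is not a separate field; it just becomes another item.
print(parse_comma_separated("conduct, because it describes how students act, manner"))
# ['conduct', 'because it describes how students act', 'manner']
```

Every item is a bare string, so there is no place to attach a reason to a word; that is what the JSON-based parsers add.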

```python
from langchain.output_parsers import CommaSeparatedListOutputParser

parser = CommaSeparatedListOutputParser()
```

```python
# Prepare the prompt
template = """
Offer a list of suggestions to substitute the word '{target_word}' \
based on the presented following text: {context}.
{format_instructions}
"""

prompt = PromptTemplate(
    template=template,
    input_variables=["target_word", "context"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

# .format() returns a string, so .to_string() is not needed.
model_input = prompt.format(
    target_word="behaviour",
    context="The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson.",
)

# Send the request
output = model.predict(model_input)
parser.parse(output)
```

```
['conduct',
 'demeanor',
 'actions',
 'attitude',
 'manners',
 'performance',
 'deportment',
 'bearing']
```

### 1.3 `StructuredOutputParser`
This class can handle multiple outputs, but it is not as rich as `PydanticOutputParser`: it offers no custom validators, and the parsed result is a plain Python dictionary built from the model's JSON response.

The code below shows how to define a schema. This class sits between `PydanticOutputParser`, which provides validation and more flexibility for complex tasks, and `CommaSeparatedListOutputParser`, which covers simpler applications.
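As far as its format instructions suggest, `StructuredOutputParser` asks the model for a markdown code snippet containing JSON, then extracts that JSON and checks that every schema name is present. The sketch below reproduces that idea with the standard library; it is an approximation, and `parse_fenced_json` is a hypothetical name. The three backticks are built at runtime only so the snippet stays readable inside this notebook.

```python
import json
import re
from typing import Any, Dict, List

# Three backticks, built at runtime so this snippet does not contain a literal fence.
FENCE = "`" * 3


def parse_fenced_json(text: str, expected_keys: List[str]) -> Dict[str, Any]:
    """Extract JSON from a markdown code fence (if any) and check the expected keys."""
    pattern = FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE
    match = re.search(pattern, text, re.DOTALL)
    # Fall back to the whole text when the model omitted the fence.
    payload = match.group(1) if match else text
    data = json.loads(payload)
    missing = [key for key in expected_keys if key not in data]
    if missing:
        raise ValueError(f"response is missing expected keys: {missing}")
    return data


reply = (
    "Here are my suggestions:\n"
    + FENCE + "json\n"
    + '{"word": ["conduct", "manner"], "reason": ["fits", "also fits"]}\n'
    + FENCE
)
print(parse_fenced_json(reply, ["word", "reason"]))
```

Only the presence of the keys is checked; the values are returned as-is, which is the practical difference from the typed Pydantic approach.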

```python
from langchain.output_parsers import StructuredOutputParser, ResponseSchema

response_schemas = [
    ResponseSchema(
        name="word", description="A list of substitute words based on context"
    ),
    ResponseSchema(
        name="reason",
        description="A list of reasoning about why each word fits the context.",
    ),
]

parser = StructuredOutputParser.from_response_schemas(response_schemas)
```

```python
template = """
Offer a list of suggestions to substitute the specified target_word based \
on the presented context and the reasoning for each word in the following format:

word: list the substitute words here as a comma-separated Python list.
reason: list the reasoning for each word here as a comma-separated Python list.

target_word: {target_word}
context: {context}

{format_instructions}
"""

prompt = PromptTemplate(
    template=template,
    input_variables=["target_word", "context"],  # filled in when the prompt is formatted
    partial_variables={
        "format_instructions": parser.get_format_instructions()
    },  # filled in now
)

# .format_prompt() returns a PromptValue; call .to_string() to convert it into a string.
model_input = prompt.format_prompt(
    target_word="behaviour",
    context="The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson.",
)

# Send the request
output = model.predict(model_input.to_string())
parser.parse(output)
```

```
{'word': ['conduct', 'actions', 'attitude', 'manner', 'demeanor'],
 'reason': ["similar meaning in the context of students' actions",
  'refers to the way students act',
  "describes the students' mindset and actions",
  'refers to the way students behave',
  "describes the students' outward appearance and actions"]}
```

## 2. Fixing Errors
Parsers are powerful tools for extracting information from the model's response and validating it to some extent. Still, they do not guarantee a well-formed response. Imagine you have deployed your application and the model's response to a user's request is incomplete, causing the parser to throw an error. That is far from ideal! The following subsections introduce two classes that act as fail-safes: they add a layer on top of the model's response to help fix such errors.

> Note: In production, the best practice is to catch the parsing error with a `try: ... except: ...` block and only attempt a fix with one of these classes inside the `except` branch. Because both classes call the LLM again, this limits the number of extra API calls and the costs associated with them.
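A minimal sketch of that pattern, using only the standard library so it runs on its own. It assumes the parser signals failure with a `ValueError` subclass (LangChain's `OutputParserException` is one, as is `json.JSONDecodeError`). `parse_or_repair` and `stub_repair` are hypothetical names; in a real application `parse` would be `parser.parse` and `repair` would hand the text to a fixing parser.

```python
import json
from typing import Callable, Dict


def parse_or_repair(
    raw: str,
    parse: Callable[[str], Dict],
    repair: Callable[[str, Exception], str],
) -> Dict:
    """Try the cheap parse first; call the costly repair step only on failure."""
    try:
        return parse(raw)
    except ValueError as err:  # json.JSONDecodeError is a ValueError subclass
        return parse(repair(raw, err))


repair_calls = []


def stub_repair(raw: str, err: Exception) -> str:
    """Hypothetical stand-in for an LLM-backed fixer; records each call."""
    repair_calls.append(str(err))
    return raw + "}"  # naive fix: append the missing closing brace


print(parse_or_repair('{"words": ["conduct"]}', json.loads, stub_repair))  # no repair needed
print(parse_or_repair('{"words": ["manner"]', json.loads, stub_repair))  # repaired once
print(len(repair_calls))  # 1
```

The well-formed response never reaches the repair step, which is exactly where the savings come from.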

### 2.1 `OutputFixingParser`
This class tries to fix the parsing error by sending the malformed response, the parsing error, and the original parser's format instructions to a Large Language Model (LLM), which is asked to correct the output.

#### Create the error
The code below raises an error on the last line because `missformatted_output` uses the key `reasoning` instead of the `reasons` key expected by the `Suggestions` class.

```python
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field
from typing import List

# Define your desired data structure.
class Suggestions(BaseModel):
    words: List[str] = Field(description="list of substitute words based on context")
    reasons: List[str] = Field(
        description="the reasoning of why this word fits the context"
    )


parser = PydanticOutputParser(pydantic_object=Suggestions)

missformatted_output = '{"words": ["conduct", "manner"], "reasoning": ["refers to the way someone acts in a particular situation.", "refers to the way someone behaves in a particular situation."]}'

# Raises an OutputParserException: the required `reasons` field is missing.
parser.parse(missformatted_output)
```

#### Fix the error

```python
from langchain.output_parsers import OutputFixingParser

outputfixing_parser = OutputFixingParser.from_llm(parser=parser, llm=model)
outputfixing_parser.parse(missformatted_output)
```

```
Suggestions(words=['conduct', 'manner'], reasons=['refers to the way someone acts in a particular situation.', 'refers to the way someone behaves in a particular situation.'])
```

The `from_llm()` method takes the original parser and a language model as inputs and returns a new parser that can fix output errors. In this case, it identified the misnamed key and renamed it to the `reasons` key we defined.
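The mechanism can be sketched without LangChain: wrap a strict parse function and, on failure, send the format instructions, the bad completion, and the error to an LLM. Everything below is hypothetical: `FixingParserSketch`, the prompt wording (loosely modeled on the idea, not copied from LangChain), and `fake_llm`, a stub that plays the model's role so the example runs offline.

```python
import json
from typing import Callable, Dict

# Hypothetical fixing prompt, loosely modeled on the idea behind OutputFixingParser.
FIX_TEMPLATE = (
    "Instructions:\n{instructions}\n\n"
    "Completion:\n{completion}\n\n"
    "Above, the completion did not satisfy the instructions.\n"
    "Error:\n{error}\n\n"
    "Please only respond with a completion that satisfies the instructions."
)


class FixingParserSketch:
    """Wrap a strict parse function; on failure, ask an LLM callable to rewrite the output."""

    def __init__(self, parse_fn: Callable[[str], Dict], instructions: str,
                 llm: Callable[[str], str]) -> None:
        self.parse_fn = parse_fn
        self.instructions = instructions
        self.llm = llm

    def parse(self, completion: str) -> Dict:
        try:
            return self.parse_fn(completion)
        except ValueError as err:
            # The fixer sees the format instructions, the bad output and the error,
            # but not the original user prompt.
            fix_prompt = FIX_TEMPLATE.format(
                instructions=self.instructions, completion=completion, error=err
            )
            return self.parse_fn(self.llm(fix_prompt))


def strict_parse(text: str) -> Dict:
    """Require both keys of the `Suggestions` schema."""
    data = json.loads(text)
    for key in ("words", "reasons"):
        if key not in data:
            raise ValueError(f"missing required key: {key!r}")
    return data


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: rename the misnamed key found in the prompt."""
    completion = prompt.split("Completion:\n", 1)[1].split("\n\n", 1)[0]
    data = json.loads(completion)
    data["reasons"] = data.pop("reasoning")
    return json.dumps(data)


fixer = FixingParserSketch(strict_parse, "Return JSON with 'words' and 'reasons'.", fake_llm)
print(fixer.parse('{"words": ["conduct"], "reasoning": ["fits the context."]}'))
```

Since the fixer only sees the instructions and the faulty output, it can rename or reshape what is there but has no context for recreating content that is missing.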

However, this class cannot always fix the problem, for instance when part of the expected information is missing from the response rather than misformatted. In such cases, use the `RetryOutputParser` class instead.

### 2.2 `RetryOutputParser`
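When the response is incomplete, the output alone is not enough: the model needs both the original prompt and its faulty output to regenerate the missing parts. `RetryOutputParser` passes both back to the LLM. The example below uses its variant, `RetryWithErrorOutputParser`, which also includes the parsing error message.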
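To see what the extra context buys, here is a standard-library sketch of the retry idea. `retry_parse`, the prompt wording, and `fake_llm` are all hypothetical; the `include_error` flag only loosely mirrors the difference between `RetryWithErrorOutputParser` (error included) and `RetryOutputParser` (error omitted).

```python
import json
from typing import Callable, Dict

# Hypothetical retry prompt: unlike a fixing prompt, it includes the original request.
RETRY_TEMPLATE = (
    "Prompt:\n{prompt}\n\n"
    "Completion:\n{completion}\n\n"
    "Above, the completion did not satisfy the constraints given in the prompt.\n"
    "{details}"
    "Please try again:"
)


def retry_parse(prompt: str, completion: str, parse_fn: Callable[[str], Dict],
                llm: Callable[[str], str], include_error: bool = True) -> Dict:
    """Parse `completion`; on failure, re-ask the LLM with the original prompt attached."""
    try:
        return parse_fn(completion)
    except ValueError as err:
        details = f"Details: {err}\n" if include_error else ""
        retry_prompt = RETRY_TEMPLATE.format(
            prompt=prompt, completion=completion, details=details
        )
        return parse_fn(llm(retry_prompt))


def require_words_and_reasons(text: str) -> Dict:
    data = json.loads(text)
    for key in ("words", "reasons"):
        if key not in data:
            raise ValueError(f"missing required key: {key!r}")
    return data


seen_prompts = []


def fake_llm(retry_prompt: str) -> str:
    """Stand-in model: records the prompt and returns a complete answer."""
    seen_prompts.append(retry_prompt)
    return json.dumps({"words": ["conduct", "manner"],
                       "reasons": ["how people act", "how people behave"]})


original_prompt = (
    "Suggest substitutes and reasons.\n"
    "target_word=behaviour\n"
    "context=The behaviour of the students was disruptive."
)
result = retry_parse(original_prompt, '{"words": ["conduct", "manner"]}',
                     require_words_and_reasons, fake_llm)
print(result["reasons"])  # ['how people act', 'how people behave']
```

A real model can only produce sensible reasons here because `target_word` and `context` travel with the retry prompt; a fixing step that sees just the bare completion would have nothing to base them on.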

```python
# `parser` is the PydanticOutputParser for `Suggestions` (keys: words, reasons),
# so the prompt relies on its format instructions instead of a separate layout.
template = """
Offer a list of suggestions to substitute the specified target_word based \
on the presented context and the reasoning for each word.
{format_instructions}
target_word={target_word}
context={context}
"""

prompt = PromptTemplate(
    template=template,
    input_variables=["target_word", "context"],  # filled in when the prompt is formatted
    partial_variables={
        "format_instructions": parser.get_format_instructions()
    },  # filled in now
)

# .format_prompt() returns a PromptValue, which parse_with_prompt() expects.
model_input = prompt.format_prompt(
    target_word="behaviour",
    context="The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson.",
)
```

```python
from langchain.output_parsers import RetryWithErrorOutputParser

# The `reasons` field is missing entirely, so renaming keys would not help.
missformatted_output = '{"words": ["conduct", "manner"]}'

retry_parser = RetryWithErrorOutputParser.from_llm(parser=parser, llm=model)

retry_parser.parse_with_prompt(missformatted_output, model_input)
```

```
Suggestions(words=['conduct', 'manner', 'attitude'], reasons=["'conduct' refers to the way people behave in a particular situation", "'manner' refers to the way people act or behave", "'attitude' refers to the way people display their feelings or thoughts through their actions"])
```