### 1. Output Parsers

#### 1-1. PydanticOutputParser

- Generates output in JSON format

- Treats the parser's output as a list, so the results can be indexed
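
The two points above can be sketched with the standard library alone. This is only an illustration of the idea (JSON in, typed object with an indexable list out), not LangChain's or Pydantic's implementation. The `parse_suggestions` helper and the sample reply are hypothetical, and the leading-digit check mirrors the kind of rule a validator can enforce.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Suggestions:
    words: List[str]


def parse_suggestions(raw: str) -> Suggestions:
    """Parse a JSON reply and reject words that start with a digit."""
    data = json.loads(raw)
    words = data["words"]
    for word in words:
        if word[0].isnumeric():
            raise ValueError("The word can not start with numbers!")
    return Suggestions(words=words)


# Hypothetical model reply, used only for illustration.
reply = '{"words": ["conduct", "manner", "demeanor"]}'
result = parse_suggestions(reply)
first_word = result.words[0]  # the parsed field is a regular Python list
```

Because `words` is a plain list, slicing and iteration work the same way as indexing.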

In [15]:
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, field_validator
from typing import List

from dotenv import load_dotenv
load_dotenv(".env", override=True)


# Define your desired data structure.
class Suggestions(BaseModel):
    words: List[str] = Field(description="list of substitute words based on context")

    # Throw error in case of receiving a numbered-list from API
    @field_validator('words')
    @classmethod
    def not_start_with_number(cls, field):
        for item in field:
            if item[0].isnumeric():
                raise ValueError("The word can not start with numbers!")
        return field

parser = PydanticOutputParser(pydantic_object=Suggestions)

In [17]:
from langchain.prompts import PromptTemplate

template = """
Offer a list of suggestions to substitute the specified target_word based on the presented context.
{format_instructions}
target_word={target_word}
context={context}
"""

prompt = PromptTemplate(
    template=template,
    input_variables=["target_word", "context"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

model_input = prompt.format_prompt(
    target_word="behaviour",
    context="The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson."
)

In [18]:
from langchain.llms import OpenAI

# Before executing the following code, make sure your OpenAI key
# is saved in the "OPENAI_API_KEY" environment variable.
model = OpenAI(model_name='gpt-3.5-turbo-instruct', temperature=0.0)

output = model(model_input.to_string())

parser.parse(output)

Suggestions(words=['conduct', 'manage', 'handle', 'oversee', 'facilitate'])

**Multiple Outputs Example**

In [19]:
template = """
Offer a list of suggestions to substitute the specified target_word based on the presented context and the reasoning for each word.
{format_instructions}
target_word={target_word}
context={context}
"""

In [23]:
class Suggestions(BaseModel):
    words: List[str] = Field(description="list of substitute words based on context")
    reasons: List[str] = Field(description="the reasoning of why this word fits the context")

    # Reject numbered-list items such as "1. conduct"
    @field_validator('words')
    @classmethod
    def not_start_with_number(cls, field):
        for item in field:
            if item[0].isnumeric():
                raise ValueError("The word can not start with numbers!")
        return field

    # Make sure every reason ends with a period
    @field_validator('reasons')
    @classmethod
    def end_with_dot(cls, field):
        for idx, item in enumerate(field):
            if item[-1] != ".":
                field[idx] += "."
        return field

parser = PydanticOutputParser(pydantic_object=Suggestions)

In [25]:
prompt = PromptTemplate(
    template=template,
    input_variables=["target_word", "context"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

model_input = prompt.format_prompt(
    target_word="behaviour",
    context="The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson."
)

In [28]:
model = OpenAI(model_name='gpt-3.5-turbo-instruct', temperature=0.0)

output = model(model_input.to_string())

parser.parse(output)

Suggestions(words=['conduct', 'manage', 'handle', 'oversee'], reasons=["These words all imply a sense of control and authority, which is lacking in the original context. They also suggest a more active role in guiding the students' actions."])

#### 1-2. CommaSeparatedListOutputParser

- It manages comma-separated outputs.

- It handles one specific case: whenever you want to receive a list of outputs from the model.

In [30]:
from langchain.output_parsers import CommaSeparatedListOutputParser

parser = CommaSeparatedListOutputParser()

In [31]:
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

# Prepare the Prompt
template = """
Offer a list of suggestions to substitute the word '{target_word}' based on the following text: {context}.
{format_instructions}
"""

prompt = PromptTemplate(
    template=template,
    input_variables=["target_word", "context"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

model_input = prompt.format(
  target_word="behaviour",
  context="The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson."
)

# Loading OpenAI API
model = OpenAI(model_name='gpt-3.5-turbo-instruct', temperature=0.0)

# Send the Request
output = model(model_input)
parser.parse(output)

['1. Conduct\n2. Manner\n3. Demeanor\n4. Conducting\n5. Attitude\n6. Conductance\n7. Deportment\n8. Etiquette\n9. Performance\n10. Actions']
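
The single-element result above is consistent with a parser that only splits on commas: the model returned a numbered list separated by newlines, so there was nothing to split on. Here is a minimal sketch of that behavior, assuming a plain comma split. The `split_on_commas` helper is hypothetical and is not LangChain's code.

```python
def split_on_commas(text: str) -> list:
    """Split a model reply on commas and trim whitespace around each item."""
    return [item.strip() for item in text.strip().split(",")]


comma_reply = "conduct, manner, demeanor"
numbered_reply = "1. Conduct\n2. Manner\n3. Demeanor"

well_formed = split_on_commas(comma_reply)    # three separate items
not_split = split_on_commas(numbered_reply)   # one item: no commas to split on
```

Under this assumption, the parser can only help when the model actually follows the comma-separated format instructions.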

#### 1-3. StructuredOutputParser

This is the first output parser implemented by the LangChain team. It can process multiple output fields, but it only supports text and offers no options for other data types, such as lists or integers. Use it when you want a single response from the model, for example, only one substitute word in the thesaurus application.

In [None]:
from langchain.output_parsers import StructuredOutputParser, ResponseSchema

response_schemas = [
    ResponseSchema(name="words", description="A substitute word based on context"),
    ResponseSchema(name="reasons", description="the reasoning of why this word fits the context.")
]

parser = StructuredOutputParser.from_response_schemas(response_schemas)
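
To illustrate the text-only, one-response idea described in this section, here is a standard-library sketch. It assumes the model replies with a JSON object whose keys match the schema names. The `parse_text_fields` helper, the sample reply, and the explicit string check are all assumptions made for illustration; the real parser may behave differently.

```python
import json

# Field names mirror the ResponseSchema entries defined in this section.
EXPECTED_KEYS = ["words", "reasons"]


def parse_text_fields(raw: str) -> dict:
    """Parse a JSON reply and require every expected key to hold a string."""
    data = json.loads(raw)
    for key in EXPECTED_KEYS:
        if key not in data:
            raise KeyError(f"missing key: {key}")
        if not isinstance(data[key], str):
            raise TypeError(f"{key} must be text, got {type(data[key]).__name__}")
    return data


# Hypothetical model reply with exactly one substitute word.
reply = '{"words": "conduct", "reasons": "It refers to how someone acts."}'
parsed = parse_text_fields(reply)
```

Each field holds a single string here, which is why this style suits one suggestion per request rather than a list.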

### 2. Fixing Errors

#### 2-1. OutputFixingParser

This parser wraps an existing parser and tries to fix a parsing error by looking at the model's response and the original parser's requirements. It uses a Large Language Model (LLM) to produce a corrected response.
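
The retry idea can be sketched without any LLM call. The example below is a simplified illustration, not LangChain's implementation: `strict_parse`, `parse_with_fix`, and `fake_llm_fixer` are hypothetical names, and the "fixer" is a plain function standing in for a model. A real model may fail to repair the output, in which case the second parse raises again.

```python
import json
from typing import Callable

REQUIRED_KEYS = ("words", "reasons")


def strict_parse(raw: str) -> dict:
    """Parse JSON and require every expected key to be present."""
    data = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data


def parse_with_fix(raw: str, fixer: Callable[[str, str], str]) -> dict:
    """Try to parse; on failure, ask the fixer for a repaired reply and retry once."""
    try:
        return strict_parse(raw)
    except ValueError as err:
        # Hand the bad output and the error message to the fixer.
        repaired = fixer(raw, str(err))
        return strict_parse(repaired)


def fake_llm_fixer(bad_output: str, error: str) -> str:
    """Stand-in for an LLM call: renames the wrong key."""
    data = json.loads(bad_output)
    data["reasons"] = data.pop("reasoning")
    return json.dumps(data)


bad = '{"words": ["conduct"], "reasoning": ["refers to the way someone acts."]}'
fixed = parse_with_fix(bad, fake_llm_fixer)
```

Passing the error message to the fixer matters in practice, since it tells the model which field was missing or malformed.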

In [38]:
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field
from typing import List

# Define your desired data structure.
class Suggestions(BaseModel):
    words: List[str] = Field(description="list of substitute words based on context")
    reasons: List[str] = Field(description="the reasoning of why this word fits the context")

parser = PydanticOutputParser(pydantic_object=Suggestions)

missformatted_output = '{"words": ["conduct", "manner"], "reasoning": ["refers to the way someone acts in a particular situation.", "refers to the way someone behaves in a particular situation."]}'

parser.parse(missformatted_output)

ValidationError: 1 validation error for Suggestions
reasons
  Field required [type=missing, input_value={'words': ['conduct', 'ma...particular situation.']}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing

As you can see in the error message, the parser correctly identified an error in our sample response (missformatted_output) since we used the word **reasoning** instead of the expected **reasons** key.

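The OutputFixingParser class is designed to fix this kind of error by asking an LLM to rewrite the response. The fix is not guaranteed: in the run recorded below, the same ValidationError is raised again.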

In [41]:
from langchain.llms import OpenAI
from langchain.output_parsers import OutputFixingParser

# Define your desired data structure.
class Suggestions(BaseModel):
    words: List[str] = Field(description="list of substitute words based on context")
    reasons: List[str] = Field(description="the reasoning of why this word fits the context")

parser = PydanticOutputParser(pydantic_object=Suggestions)

missformatted_output = '{"words": ["conduct", "manner"], "reasoning": ["refers to the way someone acts in a particular situation.", "refers to the way someone behaves in a particular situation."]}'


model = OpenAI(model_name='gpt-3.5-turbo-instruct', temperature=0.0)

outputfixing_parser = OutputFixingParser.from_llm(parser=parser, llm=model)
outputfixing_parser.parse(missformatted_output)

ValidationError: 1 validation error for Suggestions
reasons
  Field required [type=missing, input_value={'words': ['conduct', 'ma...particular situation.']}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing