# The Complete LangChain and LLM Guide
## Module 5: LangChain Parsers

LangChain parsers, specifically **output parsers**, are tools for extracting structured information from the plain text output of a language model. They convert the raw string response from a large language model (LLM) into a more useful data format, such as a JSON object, a list, or a Pydantic object. Essentially, they serve as a bridge between the unstructured text generated by an LLM and the structured data needed by other components of your application.

-----

## Why are They Necessary?

An LLM's output is typically a single, unstructured string. While this is fine for basic text generation, it's problematic when you need the LLM to return data that your program can manipulate, such as a list of items, a set of key-value pairs, or a boolean value. For example, if you ask an LLM to list the top three best-selling books of all time, the output might be:

`1. Don Quixote`
`2. A Tale of Two Cities`
`3. The Lord of the Rings`

A program can't easily iterate over this list or retrieve a specific title. An output parser solves this by taking that string and converting it into a structured format like `["Don Quixote", "A Tale of Two Cities", "The Lord of the Rings"]`, which your code can easily work with.

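To make the conversion concrete, here is a minimal hand-written sketch of what a list parser does with the numbered output above. The helper name `parse_numbered_list` is hypothetical and uses only the standard library; it is not LangChain's implementation.

```python
import re

raw_output = """1. Don Quixote
2. A Tale of Two Cities
3. The Lord of the Rings"""


def parse_numbered_list(text: str) -> list[str]:
    """Return the items of a '1. item' style list as plain strings."""
    items = []
    for line in text.splitlines():
        # Match an optional indent, a number, a dot, then capture the item text.
        match = re.match(r"^\s*\d+\.\s+(.*\S)\s*$", line)
        if match:
            items.append(match.group(1))
    return items


books = parse_numbered_list(raw_output)
print(books)     # ['Don Quixote', 'A Tale of Two Cities', 'The Lord of the Rings']
print(books[1])  # A Tale of Two Cities
```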
-----

## Common Types of Parsers

LangChain provides several built-in output parsers to handle different data structures:

  * **`StrOutputParser`**: The simplest parser. It returns the model's response as a plain string (for a chat model, the text content of the returned message). It's often used as a default or for basic chaining.
  * **`CommaSeparatedListOutputParser`**: This parser takes a comma-separated string and converts it into a list of strings.
  * **`JsonOutputParser`**: This is a powerful parser that extracts a valid JSON object from the LLM's output. It's used for scenarios where you need structured data with keys and values.
  * **`PydanticOutputParser`**: This parser is a more robust alternative to the JSON parser. It works with a Pydantic model you define. It not only extracts the data but also validates it against the Pydantic schema, ensuring the data types are correct and the structure is as expected. This is very useful for building production-grade applications.

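As a rough illustration of the work these parsers take off your hands, the sketch below approximates the comma-separated and JSON cases with the standard library only. The helpers `split_commas` and `extract_json` are hypothetical names, and the logic is a simplification, not the code LangChain ships.

```python
import json
import re


def split_commas(text: str) -> list[str]:
    """Turn 'a, b, c' into ['a', 'b', 'c'], trimming whitespace around items."""
    return [item.strip() for item in text.split(",") if item.strip()]


def extract_json(text: str) -> dict:
    """Parse a JSON object from model output, tolerating a Markdown code fence."""
    fenced = re.search(r"```(?:json)?\s*(.*?)\s*```", text, re.DOTALL)
    payload = fenced.group(1) if fenced else text.strip()
    return json.loads(payload)


print(split_commas("Python, JavaScript , Java"))  # ['Python', 'JavaScript', 'Java']

# Models often wrap JSON in a fence and add a sentence before it.
reply = 'Sure, here it is:\n```json\n{"name": "Ada", "age": 36}\n```'
data = extract_json(reply)
print(data["name"], data["age"])  # Ada 36
```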
-----

## How They are Used

Output parsers are typically integrated into a LangChain Expression Language (LCEL) chain. The general flow is:

1.  A **prompt** is defined to instruct the LLM on the desired output format (e.g., "Respond in a comma-separated list" or "Provide the answer as a JSON object with 'name' and 'age' fields").
2.  The **language model** (LLM) generates a raw string response based on the prompt.
3.  The **output parser** is chained after the LLM. It takes the LLM's raw string and transforms it into the desired structured object.

This chaining mechanism ensures that the final output of the entire process is a structured object, ready for further use in your application.

**Example Chain:**

```python
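from langchain_core.output_parsers import CommaSeparatedListOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# The prompt asks for a format that the parser knows how to split.
prompt = ChatPromptTemplate.from_template("List three programming languages as a comma-separated list.")
model = ChatOpenAI()  # reads OPENAI_API_KEY from the environment
parser = CommaSeparatedListOutputParser()

# LCEL pipes each step's output into the next: prompt -> model -> parser.
chain = prompt | model | parser

result = chain.invoke({})  # Example output: ['Python', 'JavaScript', 'Java']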
```
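If the `|` syntax looks unfamiliar, the toy sketch below shows the idea behind it without calling any API. The `Step` class and the three stand-in stages are hypothetical and only mimic the prompt, model, parser flow described above; LangChain's real runnables do considerably more.

```python
class Step:
    """Toy stand-in for an LCEL runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that runs a, then feeds its result to b.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)


# Hypothetical stand-ins for the three stages of the chain.
fake_prompt = Step(lambda inputs: "List three programming languages as a comma-separated list.")
fake_model = Step(lambda prompt_text: "Python, JavaScript, Java")  # canned reply, no API call
list_parser = Step(lambda text: [item.strip() for item in text.split(",")])

toy_chain = fake_prompt | fake_model | list_parser
print(toy_chain.invoke({}))  # ['Python', 'JavaScript', 'Java']
```

The point of the sketch is only that each stage receives the previous stage's output, so the last stage decides the type of the final result.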


In [1]:
import os
from dotenv import find_dotenv, load_dotenv

In [2]:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

In [3]:
def get_openai_key():
    load_dotenv(find_dotenv())
    key = os.getenv("OPENAI_API_KEY")
    if not key:
        raise EnvironmentError("OPENAI_API_KEY is missing.")
    return key

In [4]:
openai_api_key = get_openai_key()

In [5]:
llm_model = "gpt-5-nano"

In [6]:
chat_llm = ChatOpenAI(api_key=openai_api_key, model=llm_model)

In [7]:
email_response = """
Here's our itinerary for our upcoming trip to Europe.
We leave from Denver, Colorado airport at 8:45 pm, and arrive in Amsterdam 10 hours later
at Schipol Airport.
We'll grab a ride to our airbnb and maybe stop somewhere for breakfast before 
taking a nap.

Some sightseeing will follow for a couple of hours. 
We will then go shop for gifts 
to bring back to our children and friends.  

The next morning, at 7:45am we'll drive to to Belgium, Brussels - it should only take aroud 3 hours.
While in Brussels we want to explore the city to its fullest - no rock left unturned!

"""

In [8]:
email_template = """
From the following email, extract the following information:

leave_time: when are they leaving for vacation to Europe. If there's an actual
time written, use it, if not write unknown.

leave_from: where are they leaving from, the airport or city name and state if
available.

cities_to_visit: extract the cities they are going to visit. 
If there are more than one, put them in square brackets like '["cityone", "citytwo"].

Format the output as JSON with the following keys:
leave_time
leave_from
cities_to_visit

email: {email}
"""


In [9]:
# Create the ChatPromptTemplate using from_template
prompt = ChatPromptTemplate.from_template(email_template)
print(prompt)

input_variables=['email'] input_types={} partial_variables={} messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['email'], input_types={}, partial_variables={}, template='\nFrom the following email, extract the following information:\n\nleave_time: when are they leaving for vacation to Europe. If there\'s an actual\ntime written, use it, if not write unknown.\n\nleave_from: where are they leaving from, the airport or city name and state if\navailable.\n\ncities_to_visit: extract the cities they are going to visit. \nIf there are more than one, put them in square brackets like \'["cityone", "citytwo"].\n\nFormat the output as JSON with the following keys:\nleave_time\nleave_from\ncities_to_visit\n\nemail: {email}\n'), additional_kwargs={})]


In [10]:
formatted_prompt = prompt.format(email=email_response)
formatted_prompt

'Human: \nFrom the following email, extract the following information:\n\nleave_time: when are they leaving for vacation to Europe. If there\'s an actual\ntime written, use it, if not write unknown.\n\nleave_from: where are they leaving from, the airport or city name and state if\navailable.\n\ncities_to_visit: extract the cities they are going to visit. \nIf there are more than one, put them in square brackets like \'["cityone", "citytwo"].\n\nFormat the output as JSON with the following keys:\nleave_time\nleave_from\ncities_to_visit\n\nemail: \nHere\'s our itinerary for our upcoming trip to Europe.\nWe leave from Denver, Colorado airport at 8:45 pm, and arrive in Amsterdam 10 hours later\nat Schipol Airport.\nWe\'ll grab a ride to our airbnb and maybe stop somewhere for breakfast before \ntaking a nap.\n\nSome sightseeing will follow for a couple of hours. \nWe will then go shop for gifts \nto bring back to our children and friends.  \n\nThe next morning, at 7:45am we\'ll drive to to

In [11]:
# Directly invoke the language model with the formatted prompt
response = chat_llm.invoke(formatted_prompt)

In [12]:
print(response)
print("-----------------------------")
print(response.content)

content='{\n  "leave_time": "8:45 pm",\n  "leave_from": "Denver, Colorado",\n  "cities_to_visit": ["Amsterdam", "Brussels"]\n}' additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 813, 'prompt_tokens': 261, 'total_tokens': 1074, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 768, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-5-nano-2025-08-07', 'system_fingerprint': None, 'id': 'chatcmpl-C6fIvq87ZJQGEEqL0pJ1oXXP247Bn', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None} id='run--3d2f9daf-2fd4-4e17-aeda-dcfddae51f23-0' usage_metadata={'input_tokens': 261, 'output_tokens': 813, 'total_tokens': 1074, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 768}}
-----------------------------
{
  "leave_time": "8:45 pm",
  "leave_from": "Denver, Colorado",
  "

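Note that `response.content` above is still a plain string that merely looks like JSON. A minimal sketch of turning it into a dictionary by hand, using the content string copied from the printed response, could look like this. It works only when the model returns bare, valid JSON, which is not guaranteed, and that gap is what the output parsers in the next examples address.

```python
import json

# The `content` string copied from the response printed above.
content = '{\n  "leave_time": "8:45 pm",\n  "leave_from": "Denver, Colorado",\n  "cities_to_visit": ["Amsterdam", "Brussels"]\n}'

print(type(content).__name__)        # str
travel = json.loads(content)         # raises json.JSONDecodeError on malformed text
print(type(travel).__name__)         # dict
print(travel["cities_to_visit"][0])  # Amsterdam
```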
### Example 1 - with LangChain PydanticOutputParser

In [41]:
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field
from typing import List

In [42]:
# First, you'll create a Pydantic model to define the structure of the data you want the language model to generate.
class TravelInfo(BaseModel):
    leave_time: str = Field(
        description="When they are leaving. It's usually a numerical time of the day. If not available write n/a"
    )
    leave_from: str = Field(
        description="Where are they leaving from. It's a city, airport or state, or province"
    )
    cities_to_visit: List[str] = Field(
        description="The cities, towns they will be visiting on their trip. This needs to be in a list"
    )

In [54]:
# Setup the output parser
output_parser = PydanticOutputParser(pydantic_object=TravelInfo)
output_parser

PydanticOutputParser(pydantic_object=<class '__main__.TravelInfo'>)

In [55]:
format_instructions = output_parser.get_format_instructions()
format_instructions

'The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"leave_time": {"description": "When they are leaving. It\'s usually a numerical time of the day. If not available write n/a", "title": "Leave Time", "type": "string"}, "leave_from": {"description": "Where are they leaving from. It\'s a city, airport or state, or province", "title": "Leave From", "type": "string"}, "cities_to_visit": {"description": "The cities, towns they will be visiting on their trip. This needs to be in a list", "items": {"type": "string"}, "title": "Cities To Visit", "type": "array"}}, "required": [

In [45]:
email_response = """
Here's our itinerary for our upcoming trip to Europe.
We leave from Denver, Colorado airport at 8:45 pm, and arrive in Amsterdam 10 hours later
at Schipol Airport.
We'll grab a ride to our airbnb and maybe stop somewhere for breakfast before 
taking a nap.

Some sightseeing will follow for a couple of hours. 
We will then go shop for gifts 
to bring back to our children and friends.  

The next morning, at 7:45am we'll drive to to Belgium, Brussels - it should only take aroud 3 hours.
While in Brussels we want to explore the city to its fullest - no rock left unturned!

"""

In [46]:
email_template = """
From the following email, extract the following information:
leave_time: when are they leaving for vacation to Europe. If there's an actual
time written, use it, if not write unknown.
leave_from: where are they leaving from, the airport or city name and state if
available.
cities_to_visit: extract the cities they are going to visit as a list of strings.

email: {email}

{format_instructions}
"""


In [48]:
prompt = ChatPromptTemplate.from_template(template=email_template)
prompt

ChatPromptTemplate(input_variables=['email', 'format_instructions'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['email', 'format_instructions'], input_types={}, partial_variables={}, template="\nFrom the following email, extract the following information:\nleave_time: when are they leaving for vacation to Europe. If there's an actual\ntime written, use it, if not write unknown.\nleave_from: where are they leaving from, the airport or city name and state if\navailable.\ncities_to_visit: extract the cities they are going to visit as a list of strings.\n\nemail: {email}\n\n{format_instructions}\n"), additional_kwargs={})])

In [49]:
messages = prompt.format_messages(email=email_response,
                                  format_instructions=format_instructions)
messages

[HumanMessage(content='\nFrom the following email, extract the following information:\nleave_time: when are they leaving for vacation to Europe. If there\'s an actual\ntime written, use it, if not write unknown.\nleave_from: where are they leaving from, the airport or city name and state if\navailable.\ncities_to_visit: extract the cities they are going to visit as a list of strings.\n\nemail: \nHere\'s our itinerary for our upcoming trip to Europe.\nWe leave from Denver, Colorado airport at 8:45 pm, and arrive in Amsterdam 10 hours later\nat Schipol Airport.\nWe\'ll grab a ride to our airbnb and maybe stop somewhere for breakfast before \ntaking a nap.\n\nSome sightseeing will follow for a couple of hours. \nWe will then go shop for gifts \nto bring back to our children and friends.  \n\nThe next morning, at 7:45am we\'ll drive to to Belgium, Brussels - it should only take aroud 3 hours.\nWhile in Brussels we want to explore the city to its fullest - no rock left unturned!\n\n\n\nThe 

In [50]:
response = chat_llm.invoke(messages)

In [51]:
output_dict = output_parser.parse(response.content)  # parse into Pydantic object
output_dict

TravelInfo(leave_time='8:45 pm', leave_from='Denver, Colorado airport', cities_to_visit=['Amsterdam', 'Brussels'])

In [53]:
print(type(output_dict))
print(f"Cities: {output_dict.cities_to_visit}")

<class '__main__.TravelInfo'>
Cities: ['Amsterdam', 'Brussels']

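Conceptually, the parser reads the JSON text and validates it against the model. The sketch below approximates that step with Pydantic v2 alone, assuming a trimmed-down `TravelInfo` without field descriptions and a hand-written JSON string in place of a real model reply. It is an illustration, not the parser's actual implementation.

```python
from typing import List

from pydantic import BaseModel


class TravelInfo(BaseModel):
    leave_time: str
    leave_from: str
    cities_to_visit: List[str]


# Hand-written stand-in for the text a model might return.
raw = '{"leave_time": "8:45 pm", "leave_from": "Denver, Colorado airport", "cities_to_visit": ["Amsterdam", "Brussels"]}'

info = TravelInfo.model_validate_json(raw)  # parse the JSON text and validate it
print(info.cities_to_visit)                 # ['Amsterdam', 'Brussels']
print(info.model_dump()["leave_from"])      # Denver, Colorado airport
```

Because the result is a typed object, attribute access such as `info.cities_to_visit` replaces string handling, and `model_dump()` converts back to a plain dictionary when one is needed.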

### Example 2 - PydanticOutputParser with field validators

In [57]:
from pydantic import field_validator

In [58]:
email_response = """
Here's our itinerary for our upcoming trip to Europe.
There will be 5 of us on this vacation trip.
We leave from Denver, Colorado airport at 8:45 pm, and arrive in Amsterdam 10 hours later
at Schipol Airport.
We'll grab a ride to our airbnb and maybe stop somewhere for breakfast before 
taking a nap.

Some sightseeing will follow for a couple of hours. 
We will then go shop for gifts 
to bring back to our children and friends.  

The next morning, at 7:45am we'll drive to to Belgium, Brussels - it should only take aroud 3 hours.
While in Brussels we want to explore the city to its fullest - no rock left unturned!

"""

In [59]:
# Redefine the class TravelInfo including the new field
class TravelInfo(BaseModel):
    leave_time: str = Field(
        description="When they are leaving. It's usually a numerical time of the day. If not available write n/a"
    )
    leave_from: str = Field(
        description="Where are they leaving from. It's a city, airport or state, or province"
    )
    cities_to_visit: List[str] = Field(
        description="The cities, towns they will be visiting on their trip. This needs to be in a list"
    )
    num_people: int = Field(description="this is an integer for a number of people on this trip")
    
    # Modern Pydantic v2 validation syntax
    @field_validator('num_people')
    @classmethod
    def check_num_people(cls, field: int) -> int:
        if field <= 0:
            raise ValueError("Number of people must be greater than 0")
        return field
    
    @field_validator('cities_to_visit')
    @classmethod
    def check_cities_not_empty(cls, field: List[str]) -> List[str]:
        if not field:
            raise ValueError("At least one city must be specified")
        return field

#### Explanation: why do Pydantic validators use a class method?
`cls` stands for "class" and refers to the class itself, not to an instance of it. In the `TravelInfo` model above, `cls` is `TravelInfo`; the snippet below uses a smaller `VacationInfo` class to show the same idea.

`cls` vs `self`:

  * `self` refers to a specific instance of the class (an object).
  * `cls` refers to the class itself (the blueprint).

```python
from pydantic import BaseModel, field_validator


class VacationInfo(BaseModel):
    num_people: int

    @field_validator("num_people")
    @classmethod  # Pydantic calls this with the class, not an instance
    def check_num_people(cls, field: int) -> int:
        print(f"cls is: {cls}")  # e.g. <class '__main__.VacationInfo'>
        print(f"cls.__name__ is: {cls.__name__}")  # VacationInfo
        if field <= 0:
            raise ValueError("Number of people must be greater than 0")
        return field

    def regular_method(self):  # This is an instance method
        print(f"self is: {self}")  # e.g. num_people=3
        print(f"type(self) is: {type(self)}")  # e.g. <class '__main__.VacationInfo'>
```
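Why `@classmethod` is used:

  * `@classmethod` tells Python that the method belongs to the class, not to individual instances.
  * The first parameter is automatically the class itself (conventionally named `cls`).

Why Pydantic v2 field validators are class methods:
Pydantic registers field validators when the class is defined, but it runs them every time data is validated, for example when you call `TravelInfo(...)` or when `PydanticOutputParser.parse()` builds the object. At that point the instance has not been created yet, therefore:

  * The validator receives the class (`cls`) and the raw value to check.
  * It has no access to an instance (`self`), because the instance only exists once validation succeeds.
  * Pydantic treats field validators as class methods, and its documentation recommends placing an explicit `@classmethod` below `@field_validator` so that type checkers understand the signature.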

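A quick way to see when a field validator runs is to construct a model with good and bad data. The sketch below assumes Pydantic v2 and uses a hypothetical one-field `Trip` model rather than the full `TravelInfo` class.

```python
from pydantic import BaseModel, ValidationError, field_validator


class Trip(BaseModel):  # hypothetical minimal model for this demo
    num_people: int

    @field_validator("num_people")
    @classmethod
    def check_num_people(cls, value: int) -> int:
        if value <= 0:
            raise ValueError("Number of people must be greater than 0")
        return value


# Defining the class above ran no validation. It runs here, on each construction.
ok = Trip(num_people=5)
print(ok.num_people)  # 5

try:
    Trip(num_people=0)
except ValidationError as exc:
    print(exc.error_count())  # 1
    rejected = True
else:
    rejected = False
```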
In [60]:
# Setup parser and inject the instructions
pydantic_parser = PydanticOutputParser(pydantic_object=TravelInfo)

In [61]:
format_instructions = pydantic_parser.get_format_instructions()

In [62]:
email_template = """
From the following email, extract the following information:

leave_time: when are they leaving for vacation to Europe. If there's an actual
time written, use it, if not write unknown.

leave_from: where are they leaving from, the airport or city name and state if
available.

cities_to_visit: extract the cities they are going to visit. If there are more than
one, put them in square brackets like ["cityone", "citytwo"].

num_people: how many people are going on the trip, as an integer.

Format the output as JSON with the following keys:
leave_time
leave_from
cities_to_visit
num_people

email: {email}
{format_instructions}
"""

In [63]:
prompt = ChatPromptTemplate.from_template(template=email_template)

In [64]:
messages = prompt.format_messages(
    email=email_response,
    format_instructions=format_instructions,
)

In [65]:
response = chat_llm.invoke(messages)  # Modern invoke() method

In [66]:
vacation = pydantic_parser.parse(response.content)

In [68]:
vacation

TravelInfo(leave_time='8:45 pm', leave_from='Denver, Colorado airport', cities_to_visit=['Amsterdam', 'Brussels'], num_people=5)

In [69]:
type(vacation)

__main__.TravelInfo

In [67]:
print(type(vacation))
for item in vacation.cities_to_visit:
    print(f"Cities: {item}")

<class '__main__.TravelInfo'>
Cities: Amsterdam
Cities: Brussels
