# SESSION 8 : Output Parsers in LangChain | Generative AI using LangChain | Video 6

https://youtu.be/Op6PbJZ5b2Q?list=PLKnIA16_RmvaTbihpo4MtzVm4XOQa0ER0

### Many open-source models don't support native structured output, so we need output parsers for them. Output parsers work with both kinds of models: those that can return structured output natively and those that can't.

__Output Parsers__ in LangChain help convert raw LLM responses into structured formats like JSON, CSV, Pydantic models, and more. They ensure consistency, validation, and ease of use in applications.



In **LangChain**, an **Output Parser** is a utility that:

* **Takes the raw text output** from an LLM.


* **Converts/parses it into structured Python objects** (dicts, lists, Pydantic models, JSON, etc.).


* Optionally provides **formatting instructions** back to the LLM, so the LLM knows how to respond.

👉 Think of it as the “translator” between **free-text LLM output** and **structured data your code can use**.

### 🔹 Why do we need Output Parsers?

LLMs naturally return unstructured text. But in real-world apps, you usually need structured results:

* ✅ Extracting entities (name, date, amount).
* ✅ Returning valid JSON (not "almost JSON").
* ✅ Parsing classification labels (positive/negative).
* ✅ Turning outputs into **Python objects** that your pipeline can consume.

Without parsers, you’d have to do brittle post-processing with regex, JSON parsing, etc.
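To make "brittle post-processing" concrete, here is a minimal sketch of the hand-rolled approach using only the standard library. The reply text and the helper name `extract_json_by_hand` are hypothetical, not part of LangChain.

```python
import json
import re

# Hypothetical raw reply: the JSON is buried inside chatty text.
raw_reply = 'Sure! Here is the data: {"name": "Alice", "age": 30} Let me know if you need more.'


def extract_json_by_hand(text: str) -> dict:
    """Grab everything from the first '{' to the last '}' and hope it is valid JSON."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in the reply")
    return json.loads(match.group(0))


person = extract_json_by_hand(raw_reply)
print(person)
```

This works for the reply above, but it breaks as soon as the model returns two objects, puts braces in its prose, or emits slightly invalid JSON. Output parsers package this kind of handling so you don't rewrite it in every project.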

### 🔹 Types of Output Parsers in LangChain


### 1. **`StrOutputParser`** : String output parser

* The StrOutputParser is the simplest output parser in LangChain. It is used to parse the output of a Language Model (LLM) and return it as a plain string.


* Returns the raw string as-is.


* Use when you don’t care about structure.


```python
from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()
parser.parse("Hello world!")   # → "Hello world!"
```

### 2. **`JsonOutputParser`**

* Parses the model's output as JSON and returns it as a Python object (typically a dict).

```python
# Note the class name: JsonOutputParser (not JSONOutputParser).
from langchain_core.output_parsers import JsonOutputParser

parser = JsonOutputParser()
parser.parse('{"city": "New York", "country": "USA"}')
# → {'city': 'New York', 'country': 'USA'}
```

#### You need to tell the model what the output should look like: include `parser.get_format_instructions()` in the prompt.

#### On its own, it does not let you enforce your own output schema, which is why `StructuredOutputParser` came into the picture.

### 3. **`StructuredOutputParser` (JSON Schema)**


* `StructuredOutputParser` extracts structured JSON data from LLM responses based on predefined field schemas. You define a __list of fields (`ResponseSchema`)__ that the model should return, so the output follows a fixed structure.


* __Lets you define expected fields & schema.__



* LLM is prompted to follow that schema.


```python
from langchain.output_parsers import StructuredOutputParser, ResponseSchema

schemas = [
    ResponseSchema(name="name", description="The person's name"),
    ResponseSchema(name="age", description="The person's age")
]

parser = StructuredOutputParser.from_response_schemas(schemas)
format_instructions = parser.get_format_instructions()

print(format_instructions)  
# Instructs LLM to return valid JSON with "name" and "age"
```

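### Disadvantage of StructuredOutputParser:

**We cannot perform data validation.** The parser checks that the expected keys are present, but it does not check the type or value of each field.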

### 4. **`PydanticOutputParser`**

![image.png](attachment:image.png)

* Converts output into a **Pydantic model**.


* When defining the schema, we pass a Pydantic model class instead of a list of `ResponseSchema` objects.


* Very useful for **structured extraction**.


* Also performs data validation.



```python
from pydantic import BaseModel
from langchain.output_parsers import PydanticOutputParser

class Person(BaseModel):
    name: str
    age: int

parser = PydanticOutputParser(pydantic_object=Person)
output = parser.parse('{"name": "Alice", "age": 30}')
# → Person(name='Alice', age=30)
```

### 5. **`CommaSeparatedListOutputParser`**

* Parses output into a Python list, assuming items are comma-separated.

```python
from langchain.output_parsers import CommaSeparatedListOutputParser

parser = CommaSeparatedListOutputParser()
parser.parse("apples, bananas, cherries")  
# → ["apples", "bananas", "cherries"]
```

---

### 6. **`RegexParser`**

* Extracts values using regex patterns.

```python
from langchain.output_parsers import RegexParser

parser = RegexParser(
    regex=r"Name: (.*), Age: (.*)", 
    output_keys=["name", "age"]
)
parser.parse("Name: Alice, Age: 30")  
# → {"name": "Alice", "age": "30"}
```
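Note that `"age"` comes back as the string `"30"`: regex capture groups are always text, so typed values need a conversion step afterwards. The plain-`re` sketch below shows the same extraction with that step added. It is an illustration rather than LangChain's implementation, and `parse_person` is a hypothetical helper name.

```python
import re

# Same pattern as in the RegexParser example above.
PATTERN = re.compile(r"Name: (.*), Age: (.*)")


def parse_person(text: str) -> dict:
    """Extract name and age, converting age from str to int."""
    match = PATTERN.search(text)
    if match is None:
        raise ValueError(f"text does not match the pattern: {text!r}")
    name, age = match.groups()
    return {"name": name, "age": int(age)}


person = parse_person("Name: Alice, Age: 30")
print(person)
```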

---

### 🔹 When to Use Which?

* **Just need raw text** → `StrOutputParser`.


* **Lists** → `CommaSeparatedListOutputParser`.


* **Extract specific patterns** → `RegexParser`.


* **Guaranteed schema / typed objects** → `PydanticOutputParser`.


* **Generic JSON output** → `JsonOutputParser`.


* **Schema-defined responses** (API-like) → `StructuredOutputParser`.

### 🔹 Key Point

* **Output Parsers** make sure your LLM responses are **predictable, validated, and machine-usable**.


* They are especially useful in **pipelines/agents**, where downstream steps need structured inputs.

⚡ If you’re working on production-grade apps:

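* Use **`PydanticOutputParser`** when you need validated, typed objects, or **`StructuredOutputParser`** when a fixed set of named fields is enough (it checks field names, not values).


* If your chat model supports native structured output, call the model's **`with_structured_output`** method with your schema instead of relying on prompt-based format instructions.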