```{contents}
```

## Output Parser

In **LangChain**, an **Output Parser** is:

> A component that **converts raw LLM output into a structured, validated Python object**.

LLMs always return **text**.
Applications need **data**.

Output parsers bridge that gap.

---

### Why Output Parsers Are Necessary

Without parsers:

* Free-form text
* Fragile string parsing
* Runtime bugs
* No guarantees

With parsers:

* Typed outputs
* Schema validation
* Automatic retries (when wrapped in a fixing or retry parser)
* Production safety
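
To make the contrast concrete, here is a minimal stdlib-only sketch (not LangChain code). The `Ticket` dataclass, `fragile_parse`, and `validated_parse` are hypothetical names chosen for illustration, and the allowed priorities are an assumption.

```python
import json
from dataclasses import dataclass

@dataclass
class Ticket:
    category: str
    priority: str

# Assumed set of valid labels for this illustration.
ALLOWED_PRIORITIES = {"High", "Medium", "Low"}

def fragile_parse(raw: str) -> dict:
    """Naive 'key: value' splitting; silently misbehaves when wording shifts."""
    fields = {}
    for line in raw.splitlines():
        key, _, value = line.partition(":")
        fields[key.strip().lower()] = value.strip()
    return fields

def validated_parse(raw: str) -> Ticket:
    """Parse JSON and fail loudly if the shape or the values are wrong."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = {"category", "priority"} - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    priority = data["priority"]
    if not isinstance(priority, str) or priority not in ALLOWED_PRIORITIES:
        raise ValueError(f"invalid priority: {priority!r}")
    return Ticket(category=str(data["category"]), priority=priority)
```

The fragile version never raises, so a chatty reply produces garbage keys that surface as bugs later. The validated version rejects bad output at the boundary.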

---

### Where Output Parsers Sit in the Pipeline

```
Prompt
  ↓
LLM
  ↓
Raw Text
  ↓
Output Parser
  ↓
Typed Python Object
```

In LCEL:

```python
prompt | llm | output_parser
```
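
The `|` operator chains steps so that each one receives the previous step's output. A rough plain-Python sketch of that idea follows. The `Step` class is hypothetical and is not how LangChain implements runnables; the lambdas stand in for a real prompt, model, and parser.

```python
from typing import Any, Callable

class Step:
    """Wraps a function and supports `a | b` composition (illustrative only)."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # Run this step first, then feed its result into the next step.
        return Step(lambda value: other.fn(self.fn(value)))

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

# Stand-ins for prompt, llm, and output_parser.
prompt = Step(lambda inputs: f"Classify: {inputs['input']}")
llm = Step(lambda text: "  HIGH \n")  # fake model reply
output_parser = Step(lambda raw: raw.strip().lower())

chain = prompt | llm | output_parser
print(chain.invoke({"input": "Database is down"}))  # prints "high"
```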

---

### Core Output Parser Types in LangChain

| Parser                           | Purpose                 |
| -------------------------------- | ----------------------- |
| `StrOutputParser`                | Plain text              |
| `JsonOutputParser`               | JSON output             |
| `PydanticOutputParser`           | Typed schema            |
| `EnumOutputParser`               | Controlled labels       |
| `CommaSeparatedListOutputParser` | Lists                   |
| `OutputFixingParser`             | Auto-repair bad outputs |
| `StructuredOutputParser`         | JSON schema (legacy)    |

---

### `StrOutputParser` (Default / Simplest)

Used when you just want text.



```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# The prompt must declare the same variable name that invoke() receives.
prompt = ChatPromptTemplate.from_template("{input}")
llm = ChatOpenAI(temperature=0.2)

parser = StrOutputParser()  # returns the message content as a plain str

chain = prompt | llm | parser

result = chain.invoke({"input": "Explain RAG"})
print(result)
```

The keys passed to `invoke()` must match the prompt's input variables. Passing `input` to a prompt that declares `{text}` raises `KeyError: "Input to PromptTemplate is missing variables {'text'}. Expected: ['text'] Received: ['input']"`.



Output:

```text
"Retrieval-Augmented Generation combines search with generation."
```

---

### `PydanticOutputParser` (Most Important)

#### Define a schema



```python
from pydantic import BaseModel, Field

class Ticket(BaseModel):
    category: str = Field(description="Issue category")
    priority: str = Field(description="High, Medium, or Low")
```


#### Create parser



```python
from langchain_core.output_parsers import PydanticOutputParser

parser = PydanticOutputParser(pydantic_object=Ticket)
```



#### Use with prompt



```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_messages([
    ("system", "Extract structured ticket info."),
    ("human", "{input}\n\n{format_instructions}")
])

chain = prompt | ChatOpenAI(temperature=0.2) | parser

result = chain.invoke({
    "input": "CEO cannot access VPN",
    "format_instructions": parser.get_format_instructions()
})

print(result)
```

Output:

```text
category='VPN' priority='High'
```


---

### `with_structured_output()` (Modern Shortcut)

This is the **recommended approach**.



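```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0.2)
# Ticket is the Pydantic model defined in the previous section.
structured_llm = llm.with_structured_output(Ticket)

result = structured_llm.invoke(
    "Email service down for finance team"
)

print(result)
```

Output:

```text
category='Email' priority='High'
```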


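What LangChain does internally:

* Converts the schema into a JSON Schema or tool definition
* Passes it to the model through the provider's tool-calling or JSON mode (where supported), rather than pasting format instructions into the prompt
* Parses and validates the response into a `Ticket`
* Raises on invalid output; retries are not automatic, so add `.with_retry()` or a fallback if you need them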
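
As a rough mental model of the schema-then-validate flow, the stdlib-only sketch below derives a small JSON-Schema-style dict from a dataclass and validates a tool-call-style payload against it. Everything here (`TicketSchema`, `to_json_schema`, `from_tool_call`, and the payload shape) is a hypothetical illustration, not LangChain's internal code.

```python
from dataclasses import dataclass, fields

@dataclass
class TicketSchema:
    category: str
    priority: str

_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def to_json_schema(cls) -> dict:
    """Build a minimal JSON-Schema-style description of a dataclass."""
    return {
        "title": cls.__name__,
        "type": "object",
        "properties": {f.name: {"type": _JSON_TYPES[f.type]} for f in fields(cls)},
        "required": [f.name for f in fields(cls)],
    }

def from_tool_call(cls, tool_call: dict):
    """Validate the arguments of a tool-call-like dict and build the object."""
    args = tool_call["args"]
    for f in fields(cls):
        if f.name not in args:
            raise ValueError(f"missing field: {f.name}")
        if not isinstance(args[f.name], f.type):
            raise ValueError(f"{f.name} must be {f.type.__name__}")
    return cls(**{f.name: args[f.name] for f in fields(cls)})

# Assumed payload shape: the model "calls" a tool named after the schema.
fake_tool_call = {
    "name": "TicketSchema",
    "args": {"category": "Email", "priority": "High"},
}
ticket = from_tool_call(TicketSchema, fake_tool_call)
print(to_json_schema(TicketSchema)["required"], ticket)
```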

---

### JSON Output Parser



```python
from langchain_core.output_parsers import JsonOutputParser

parser = JsonOutputParser()

chain = prompt | llm | parser

chain.invoke({
    "input": "Explain RAG",
    "format_instructions": parser.get_format_instructions()
})
```

Output (truncated):

```text
{'acronym': 'RAG',
 'meaning': 'Red, Amber, Green',
```



Use when:

* Schema is dynamic
* No strict typing needed

---

### Enum Output Parser (Classification)



```python
from enum import Enum

from langchain_classic.output_parsers.enum import EnumOutputParser
from langchain_core.prompts import PromptTemplate

class Severity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

parser = EnumOutputParser(enum=Severity)

prompt = PromptTemplate(
    template="Classify severity: {text}\n{format_instructions}",
    input_variables=["text"],
    partial_variables={
        "format_instructions": parser.get_format_instructions()
    }
)

chain = prompt | llm | parser

chain.invoke({"text": "Database is down"})
```

Output:

```text
<Severity.HIGH: 'high'>
```
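
Conceptually, enum parsing is a normalize-then-look-up step that rejects anything outside the allowed labels. A minimal stdlib sketch follows. `parse_severity` is a hypothetical helper, `Level` mirrors the `Severity` enum above, and the lowercasing is an assumption of this sketch rather than documented library behavior.

```python
from enum import Enum

class Level(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

def parse_severity(raw: str) -> Level:
    """Map a model reply onto a Level member or raise ValueError."""
    cleaned = raw.strip().lower()
    try:
        return Level(cleaned)  # lookup by value
    except ValueError:
        allowed = ", ".join(member.value for member in Level)
        raise ValueError(f"{raw!r} is not one of: {allowed}") from None

print(parse_severity(" HIGH\n"))  # prints "Level.HIGH"
```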


---

### OutputFixingParser (Auto-Repair)

Used when models sometimes return invalid JSON.



```python
from langchain_classic.output_parsers.fix import OutputFixingParser

# Wraps an existing parser; `llm` is the model used to repair bad output.
fixing_parser = OutputFixingParser.from_llm(
    llm=llm,
    parser=parser
)
```



When the wrapped parser raises, `OutputFixingParser` will:

1. Catch the parse failure
2. Re-prompt the LLM with the bad output and the error message
3. Parse the corrected output again
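
The detect, re-prompt, re-parse loop can be sketched without any LLM. In this stdlib-only sketch, `parse_with_fix` and `fake_fixer` are hypothetical names; a real fixer would send the bad output and the error message back to a model.

```python
import json
from typing import Callable

def parse_with_fix(
    raw: str,
    parse: Callable[[str], dict],
    fix: Callable[[str, str], str],
    max_retries: int = 1,
) -> dict:
    """Try to parse; on failure ask `fix` for a corrected string and retry."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    attempt = raw
    for retries_left in range(max_retries, -1, -1):
        try:
            return parse(attempt)
        except ValueError as exc:  # json.JSONDecodeError is a ValueError
            if retries_left == 0:
                raise  # out of retries: surface the last parse error
            attempt = fix(attempt, str(exc))
    raise AssertionError("unreachable")

def fake_fixer(bad_output: str, error: str) -> str:
    # Stand-in for an LLM call: swaps single quotes for double quotes.
    return bad_output.replace("'", '"')

result = parse_with_fix("{'severity': 'high'}", json.loads, fake_fixer)
print(result)  # prints {'severity': 'high'}
```

Capping retries matters in practice, since each fix attempt is another model call with its own latency and cost.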

---

### Output Parsers in Agents

Agents **do not expose raw parsers** directly.

Instead:

* Tool outputs are parsed
* Final agent output is text
* Structured parsing is applied **after agent execution**

Example:

```python
result = executor.invoke(...)
result["output"]  # final agent answer as plain text
```

If you need structured agent output, wrap the model with `with_structured_output()` or post-parse the agent's final text.

---

### Common Mistakes (Critical)

| Mistake                             | Result          |
| ----------------------------------- | --------------- |
| Forgetting `format_instructions`    | Invalid output  |
| Manual JSON parsing                 | Fragile         |
| Parsing chain-of-thought            | Security risk   |
| Using legacy StructuredOutputParser | Not recommended |
| Skipping validation                 | Runtime bugs    |

---

### Security & Safety Note

Output parsers operate on the model's final message content; they do **not** surface hidden chain-of-thought.
Parse **answers only**, and treat parsed values as untrusted input until they pass validation.

---

### When to Use Which Parser

| Use Case              | Parser                     |
| --------------------- | -------------------------- |
| Free-form chat        | `StrOutputParser`          |
| Typed API response    | `PydanticOutputParser`     |
| Classification        | `EnumOutputParser`         |
| Unstable model output | `OutputFixingParser`       |
| Modern systems        | `with_structured_output()` |

---

**Interview-Ready Summary**

> “Output parsers in LangChain convert raw LLM text into validated, typed Python objects. They are essential for production systems because they enforce schemas, catch malformed output early, and enable predictable downstream processing.”

---

**Rule of Thumb**

* **UI → text**
* **API / agents / automation → structured output**
* **Production → always parse**
