## Section 2, Part 5: Structuring LLM Outputs with Parsers & Pydantic

#### **1. What Are Output Parsers and Why Do We Need Them?**

Imagine you ask a Large Language Model (LLM) a question. The LLM, at its core, gives you back a string of text.

* **You ask:** "Who were the three astronauts on the Apollo 11 mission?"
* **LLM returns (as a string):** "The three astronauts on the Apollo 11 mission were Neil Armstrong, Buzz Aldrin, and Michael Collins."

This is great for a human to read. But what if your application needs that information in a structured format, like a Python list, to do something with it?

```python
# What your application needs:
["Neil Armstrong", "Buzz Aldrin", "Michael Collins"]
```

This is where **Output Parsers** come in. They are special classes in LangChain whose job is to take the raw text output from an LLM and convert it into a more useful, structured format that your code can easily work with.

Think of them as the bridge between the LLM's free-form text and your application's structured data requirements.
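
To make the idea concrete, here is a minimal plain-Python sketch of what a parser does. It assumes the model was asked to reply with a comma-separated list of names; `parse_comma_separated` is a hypothetical helper written for illustration, not a LangChain class.

```python
def parse_comma_separated(text: str) -> list[str]:
    """Split a comma-separated reply into a list of trimmed, non-empty items."""
    return [item.strip() for item in text.split(",") if item.strip()]


# Assumed raw reply from a model that was told to answer with names only.
raw_reply = "Neil Armstrong, Buzz Aldrin, Michael Collins"

names = parse_comma_separated(raw_reply)
print(names)
print(type(names))
```

Every parser in this part follows the same pattern: text goes in, a structured Python value comes out.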

---

#### **2. Common Types of Output Parsers (with Code)**

Let's explore the most common parsers you'll use.

##### **a) `StrOutputParser` - The Simplest**

This is the most basic parser and does very little. It takes the model's output (for a chat model, a message object rather than a bare string) and returns the text content as a standard Python `str`. It is a common final step in simple chains.

**Code Example:**

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# 1. Define our components
model = ChatOpenAI(model="gpt-3.5-turbo")
prompt = ChatPromptTemplate.from_template("Tell me a short, one-sentence joke about {topic}.")
output_parser = StrOutputParser()

# 2. Create the chain
chain = prompt | model | output_parser

# 3. Invoke the chain
result = chain.invoke({"topic": "computers"})

# 4. Print the result and its type
print(result)
print(f"\nType of result: {type(result)}")
```

**Example Output** (the joke itself will vary from run to run):

```
Why did the computer keep sneezing? It had a virus!

Type of result: <class 'str'>
```
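
Why use a parser if the model already produces text? A chat model returns a message object that wraps the text together with metadata, and the parser unwraps it. The sketch below imitates that step with stand-ins: `FakeChatMessage` and `to_plain_string` are hypothetical names invented here, not part of LangChain.

```python
from dataclasses import dataclass, field


@dataclass
class FakeChatMessage:
    """Stand-in for a chat model's reply: text plus some metadata."""
    content: str
    metadata: dict = field(default_factory=dict)


def to_plain_string(output) -> str:
    """Return the text whether given a bare string or a message-like object."""
    if isinstance(output, str):
        return output
    return output.content


reply = FakeChatMessage(
    content="Why did the computer keep sneezing? It had a virus!",
    metadata={"model": "example-model"},  # placeholder value
)

text = to_plain_string(reply)
print(text)
print(type(text))
```

Without this step, the chain's result would be the message object, and string operations such as `.upper()` or `.split()` would have to go through `.content` first.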

##### **b) `JsonOutputParser` - For Dictionaries and JSON**

This is incredibly useful when you need the LLM to return multiple pieces of information. You instruct the model to format its response as a JSON object, and this parser will automatically convert that JSON string into a Python dictionary.

**Code Example:**

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import JsonOutputParser

# 1. Instantiate the parser
parser = JsonOutputParser()

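# 2. Create a prompt that instructs the model to output JSON.
#    The parser's .get_format_instructions() text is filled in at invoke time.
#    The prompt also names the keys, so the lookups in step 5 are predictable.
prompt = ChatPromptTemplate.from_template(
    "Extract information from the following text.\n"
    "{format_instructions}\n"
    "Use exactly these keys: landmark, year_built, location, height_meters.\n"
    "Text: {text}"
)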

# 3. Create the model and chain
model = ChatOpenAI(model="gpt-3.5-turbo")
chain = prompt | model | parser

# 4. Invoke the chain
text_to_parse = "The Eiffel Tower was built in 1889 and is located in Paris, France. It is 330 meters tall."
result = chain.invoke({
    "text": text_to_parse,
    "format_instructions": parser.get_format_instructions()
})

# 5. Print the result and its type
print(result)
print(f"\nType of result: {type(result)}")
print(f"The landmark is {result['landmark']} located in {result['location']}.")
```

**Example Output** (a Python `dict`, so it prints with single quotes; exact values depend on the model):

```
{'landmark': 'Eiffel Tower', 'year_built': 1889, 'location': 'Paris, France', 'height_meters': 330}

Type of result: <class 'dict'>
The landmark is Eiffel Tower located in Paris, France.
```
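
Models often wrap their JSON in a Markdown code fence, so turning the reply into a dictionary takes slightly more than a bare `json.loads`. The following is a rough plain-Python illustration of that conversion step, not LangChain's implementation; `parse_json_reply` is a hypothetical helper and the sample replies are made up.

```python
import json
import re

# Three backticks, built indirectly so this example stays easy to read.
FENCE = "`" * 3


def parse_json_reply(text: str) -> dict:
    """Strip an optional Markdown code fence, then decode the JSON payload."""
    pattern = FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE
    match = re.search(pattern, text, re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload.strip())


# Two assumed model replies carrying the same data.
fenced_reply = f'{FENCE}json\n{{"landmark": "Eiffel Tower", "year_built": 1889}}\n{FENCE}'
plain_reply = '{"landmark": "Eiffel Tower", "year_built": 1889}'

result = parse_json_reply(fenced_reply)
print(result["landmark"])
print(parse_json_reply(plain_reply) == result)
```

Note that nothing here checks which keys are present or what types the values have. A dictionary-based parser hands back whatever the model produced.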

##### **c) `PydanticOutputParser` - The Most Robust Method**

This is the strictest of the three parsers and a good choice whenever you need typed, validated data. It uses the `Pydantic` library to let you define a data schema with types. The parser then does two things:

1.  It generates specific formatting instructions for the LLM based on your schema.
2.  It parses the LLM's output into a Pydantic object, which gives you type hints, validation, and attribute access to your data.

**Code Example:**

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field
from typing import List

# 1. Define your desired data structure with Pydantic
class Recipe(BaseModel):
    name: str = Field(description="The name of the recipe")
    ingredients: List[str] = Field(description="A list of ingredients for the recipe")
    servings: int = Field(description="The number of people the recipe serves")

# 2. Set up a parser with your Pydantic object
parser = PydanticOutputParser(pydantic_object=Recipe)

# 3. Create a prompt with the format instructions
prompt = ChatPromptTemplate.from_template(
    "You are a helpful assistant that generates recipes based on a user's request.\n"
    "{format_instructions}\n"
    "User request: {request}\n"
)

# 4. Create the model and chain
model = ChatOpenAI(model="gpt-3.5-turbo")
chain = prompt | model | parser

# 5. Invoke the chain
request = "Give me a simple recipe for a classic spaghetti bolognese."
result = chain.invoke({
    "request": request,
    "format_instructions": parser.get_format_instructions()
})

# 6. Print the result and its type
print(result)
print(f"\nType of result: {type(result)}")
print(f"\nRecipe for: {result.name}")
print(f"It serves: {result.servings}")
print("Ingredients:")
for ingredient in result.ingredients:
    print(f"- {ingredient}")
```

**Example Output** (the recipe details will vary from run to run):

```
name='Classic Spaghetti Bolognese' ingredients=['Ground Beef', 'Onion', 'Garlic', 'Canned Tomatoes', 'Tomato Paste', 'Spaghetti', 'Olive Oil', 'Salt', 'Pepper'] servings=4

Type of result: <class '__main__.Recipe'>

Recipe for: Classic Spaghetti Bolognese
It serves: 4
Ingredients:
- Ground Beef
- Onion
- Garlic
- Canned Tomatoes
- Tomato Paste
- Spaghetti
- Olive Oil
- Salt
- Pepper
```
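
The parser's two jobs, deriving format instructions from a schema and validating the reply against it, can be illustrated without LangChain or Pydantic. The sketch below swaps in a standard-library `dataclass` as the schema; `format_instructions` and `parse_reply` are hypothetical helpers, and the checks are far simpler than what Pydantic performs.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Recipe:
    name: str
    ingredients: list
    servings: int


def format_instructions(schema) -> str:
    """Describe the expected JSON shape, derived from the schema's fields."""
    lines = [f'  "{f.name}": <{f.type.__name__}>' for f in fields(schema)]
    return "Reply with a JSON object shaped like:\n{\n" + ",\n".join(lines) + "\n}"


def parse_reply(schema, text: str):
    """Decode the reply and check every field's presence and type."""
    data = json.loads(text)
    for f in fields(schema):
        if f.name not in data:
            raise ValueError(f"missing field: {f.name}")
        if not isinstance(data[f.name], f.type):
            raise ValueError(f"{f.name} should be {f.type.__name__}")
    return schema(**{f.name: data[f.name] for f in fields(schema)})


print(format_instructions(Recipe))

# A well-formed reply becomes a typed object.
good = parse_reply(
    Recipe,
    '{"name": "Bolognese", "ingredients": ["beef", "pasta"], "servings": 4}',
)
print(good.servings * 2)  # servings is an int, so arithmetic works

# A reply with the wrong type is rejected instead of slipping through.
try:
    parse_reply(Recipe, '{"name": "Bolognese", "ingredients": [], "servings": "four"}')
except ValueError as err:
    print(f"rejected: {err}")
```

The design point is that a bad reply fails loudly at the parsing step rather than later, deep inside application code that assumed `servings` was a number.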

---

#### **Summary**

* **LLMs output strings.** Your application often needs structured data (lists, JSON, objects).
* **Output Parsers** are the components that convert the LLM's string output into the structured format you need.
* **`StrOutputParser`** is the simplest, giving you a plain string.
* **`JsonOutputParser`** is great for getting back Python dictionaries.
* **`PydanticOutputParser`** is the most robust, giving you validated data objects with types, making your application more reliable.
