<a href="https://colab.research.google.com/github/nsk-ai/RAG-Bootcamp-2025/blob/main/Output_Parsers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
pip install langchain



In [None]:
pip install -qU langchain-groq

In [None]:
from google.colab import userdata
import os

# Read the API key from Colab's secret store
groq_api_key = userdata.get('GROQ_API_KEY')

# Expose it as an environment variable so ChatGroq can pick it up
os.environ['GROQ_API_KEY'] = groq_api_key

### **1. StrOutputParser**

  * **Definition:** The most basic parser. It extracts the text content of the model's response message and returns it as a plain Python string.
  * **Use Case:** Any task where you only need the raw text response, such as straightforward Q&A, summarization, or simple content generation.


In [None]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_groq import ChatGroq

# The chain will produce a standard string output
chain = (
    ChatPromptTemplate.from_template("Tell me a one-sentence summary of the book '{book_title}'.")
    | ChatGroq(model="llama3-8b-8192")  # Groq-hosted chat model; reads GROQ_API_KEY from the environment
    | StrOutputParser()
)

# Execute the chain
result = chain.invoke({"book_title": "1984"})

print(result)
print(f"\nType of output: {type(result)}")

In the dystopian novel "1984" by George Orwell, Winston Smith, a low-ranking member of the ruling Party, begins to question the official ideology of the totalitarian government and rebels against it, leading him to confront the harsh realities of a surveillance state that seeks to control every aspect of its citizens' lives.

Type of output: <class 'str'>
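
The `|` operator in the chain above pipes each component's output into the next. The following is a minimal sketch of that idea using only the standard library. `Step` and `FakeMessage` are hypothetical stand-ins invented for illustration, not LangChain classes, and the canned reply replaces a real model call. The final stage plays the role of `StrOutputParser`: it pulls the text out of the message object.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Step:
    """Hypothetical stand-in for a chain component; not a LangChain class."""
    func: Callable[[Any], Any]

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that feeds a's output into b.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value: Any) -> Any:
        return self.func(value)


@dataclass
class FakeMessage:
    """Mimics a chat message, which carries text plus metadata."""
    content: str
    model_name: str = "fake-model"


# Three toy stages mirroring prompt -> model -> parser.
prompt = Step(lambda d: f"Tell me a one-sentence summary of the book '{d['book_title']}'.")
model = Step(lambda text: FakeMessage(content=f"(canned reply to: {text})"))
to_str = Step(lambda message: message.content)  # the role StrOutputParser plays

chain = prompt | model | to_str
result = chain.invoke({"book_title": "1984"})
print(result)
print(type(result).__name__)
```

Without the last stage, the chain would hand back the whole message object rather than a string.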


### **2. JsonOutputParser**

  * **Definition:** Parses a JSON string from the model's output into a Python dictionary.
  * **Use Case:** When you need to extract multiple, distinct pieces of information from a prompt and want them returned in a structured key-value format.
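
Models often wrap JSON in a Markdown code fence even when asked not to, so a JSON parser for model output needs to tolerate that. The sketch below shows the idea with the standard library only. `parse_json_reply` is a hypothetical helper written for this example, not LangChain's implementation, and the sample replies are made up.

```python
import json
import re

FENCE = "`" * 3  # built at runtime so this example contains no literal fence


def parse_json_reply(text: str):
    """Parse JSON from a model reply, tolerating an optional Markdown fence."""
    pattern = FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE
    match = re.search(pattern, text, flags=re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload.strip())


bare_reply = '{"title": "Hancock", "year": 2008}'
fenced_reply = FENCE + "json\n" + bare_reply + "\n" + FENCE

print(parse_json_reply(fenced_reply))
print(parse_json_reply(bare_reply))
```

Both calls yield the same dictionary; text that is not valid JSON still raises `json.JSONDecodeError`.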

In [None]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import JsonOutputParser
from langchain_groq import ChatGroq

# This parser expects the model to output a JSON string
parser = JsonOutputParser()

# We include instructions in the prompt to guide the model
prompt = ChatPromptTemplate.from_template(
    "Extract the main character and the name of the actor who plays them from this movie title: {movie_title}.\n\n{format_instructions}\n\nONLY return the JSON object and nothing else."
)

chain = (
    prompt
    | ChatGroq(model="llama3-8b-8192")
    | parser
)

# Execute the chain
result = chain.invoke({
    "movie_title": "Hancock 2008",
    "format_instructions": parser.get_format_instructions()
})

print(result)
print(f"\nType of output: {type(result)}")

{'mainCharacter': 'John Hancock', 'actor': 'Will Smith'}

Type of output: <class 'dict'>
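
Because no schema was supplied, key names such as `mainCharacter` were chosen by the model and may differ between runs. A small guard before using the dictionary can catch that early. This is a sketch under that assumption; `require_keys` is a hypothetical helper, and `reply` simply copies the output shown above.

```python
def require_keys(data: dict, required: set) -> dict:
    """Return data unchanged, or raise if any expected key is absent."""
    missing = required - set(data)
    if missing:
        raise KeyError(f"Model reply is missing keys: {sorted(missing)}")
    return data


reply = {"mainCharacter": "John Hancock", "actor": "Will Smith"}  # output shown above

checked = require_keys(reply, {"mainCharacter", "actor"})
print(checked["actor"])

# A differently spelled key is reported instead of failing later in the program.
try:
    require_keys(reply, {"main_character", "actor"})
except KeyError as err:
    print(err)
```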


### **3. PydanticOutputParser**

  * **Definition:** Parses the model's output into a Pydantic model, giving you a validated, typed Python object.
  * **Use Case:** The most robust of the three parsers for structured output. It suits complex data extraction where the result must match a known schema, including data types (strings, integers, lists, etc.): output that fails validation raises an error instead of flowing silently into your application.

In [None]:
from typing import List
from pydantic import BaseModel, Field
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import PydanticOutputParser
from langchain_groq import ChatGroq

# 1. Define the desired data structure using Pydantic
class Person(BaseModel):
    name: str = Field(description="The full name of the person.")
    skills: List[str] = Field(description="A list of the person's skills.")
    years_of_experience: int = Field(description="The person's total years of professional experience.")

# 2. Set up the parser
parser = PydanticOutputParser(pydantic_object=Person)

# 3. Print the formatting instructions
print(parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"name": {"description": "The full name of the person.", "title": "Name", "type": "string"}, "skills": {"description": "A list of the person's skills.", "items": {"type": "string"}, "title": "Skills", "type": "array"}, "years_of_experience": {"description": "The person's total years of professional experience.", "title": "Years Of Experience", "type": "integer"}}, "required": ["name", "skills", "years_of_experience"]}
```


In [None]:
# 4. Create the prompt and chain
prompt = ChatPromptTemplate.from_template(
    "Analyze the following job description summary and extract the candidate's information.\n\n{format_instructions}\n\nSummary: {summary}\n"
)

chain = (
    prompt
    | ChatGroq(model="llama3-8b-8192")  # Groq-hosted chat model; reads GROQ_API_KEY from the environment
    | parser
)

In [None]:
# 5. Execute the chain
result = chain.invoke({
    "summary": "John Doe is a software engineer with 8 years of experience in Python, React, and SQL.",
    "format_instructions": parser.get_format_instructions()
})

print(result)
print(f"\nType of output: {type(result)}")

name='John Doe' skills=['Python', 'React', 'SQL'] years_of_experience=8

Type of output: <class '__main__.Person'>


In [None]:
result.name

'John Doe'
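
If the model's reply does not match the schema, parsing fails with an exception rather than returning a partial object. The sketch below illustrates the kind of check involved using only the standard library. `PersonRecord` and `parse_person` are hypothetical stand-ins for the `Person` model and the parser; Pydantic itself does considerably more (type coercion, detailed error reports), so treat this as an illustration of the idea, not of its implementation.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class PersonRecord:
    """Stdlib stand-in for the Person model; the name is hypothetical."""
    name: str
    skills: List[str]
    years_of_experience: int


def parse_person(raw: str) -> PersonRecord:
    """Parse a JSON reply and reject values whose types do not match."""
    data = json.loads(raw)
    record = PersonRecord(**data)  # TypeError on missing or unexpected keys
    if not isinstance(record.name, str):
        raise TypeError("name must be a string")
    if not (isinstance(record.skills, list)
            and all(isinstance(s, str) for s in record.skills)):
        raise TypeError("skills must be a list of strings")
    years = record.years_of_experience
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(years, bool) or not isinstance(years, int):
        raise TypeError("years_of_experience must be an integer")
    return record


good = '{"name": "John Doe", "skills": ["Python", "React", "SQL"], "years_of_experience": 8}'
bad = '{"name": "John Doe", "skills": ["Python"], "years_of_experience": "eight"}'

print(parse_person(good))
try:
    parse_person(bad)
except TypeError as err:
    print(f"Rejected: {err}")
```

In a real chain, wrapping `chain.invoke(...)` in a `try`/`except` block is the usual way to handle a reply that fails validation.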