
In [12]:
# !pip install langchain
# !pip install langchain_openai
# !pip install langchain_huggingface

In [21]:
from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field

# Model Initialisation

In [22]:
# HuggingFaceEndpoint calls Hugging Face's hosted inference API, which provides free credits
llm = HuggingFaceEndpoint(
    repo_id="google/gemma-2-2b-it",
    task="text-generation",
    huggingfacehub_api_token="your_api_key"  # placeholder: replace with your own token
)

model = ChatHuggingFace(llm=llm)
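
Hardcoding the token in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of reading it from the environment instead is shown below; the helper `get_hf_token` is hypothetical, and the variable name `HUGGINGFACEHUB_API_TOKEN` is an assumption about how the token is stored in your environment.

```python
import os

# Assumption: the token is stored in an environment variable with this name.
TOKEN_ENV_VAR = "HUGGINGFACEHUB_API_TOKEN"


def get_hf_token(env=os.environ):
    """Return the API token from the environment, or raise a clear error."""
    token = env.get(TOKEN_ENV_VAR, "").strip()
    if not token:
        raise RuntimeError(f"Set {TOKEN_ENV_VAR} before creating the endpoint")
    return token
```

The result could then be passed as `huggingfacehub_api_token=get_hf_token()` in place of the literal string.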

# Output Schema

In [23]:
class Person(BaseModel):
    name: str = Field(description='Name of the person')
    age: int = Field(gt=18, description="Age of the person")
    city: str = Field(description="City name of the city person resides in")
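
To see what the schema enforces, it can help to validate a few JSON strings by hand, without calling any model. The sketch below repeats the `Person` class so that it runs on its own; `try_parse` is a hypothetical helper, not part of Pydantic or LangChain.

```python
import json

from pydantic import BaseModel, Field, ValidationError


class Person(BaseModel):
    name: str = Field(description="Name of the person")
    age: int = Field(gt=18, description="Age of the person")
    city: str = Field(description="City name of the city person resides in")


def try_parse(raw_json):
    """Return a Person, or None if the data violates the schema."""
    try:
        return Person(**json.loads(raw_json))
    except ValidationError:
        return None


valid = try_parse('{"name": "Hiral Verma", "age": 25, "city": "Rohtak"}')
too_young = try_parse('{"name": "Test", "age": 18, "city": "Rohtak"}')  # gt=18 excludes 18 itself
missing = try_parse('{"name": "Test", "age": 30}')  # city is a required field
```

Here only `valid` is a `Person`; the other two are `None` because `gt=18` is a strict bound and every field is required.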

In [24]:
pydantic_parser = PydanticOutputParser(pydantic_object=Person)
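
Conceptually, an output parser has to pull JSON out of the model's reply before validating it, and chat models often wrap JSON in a Markdown code fence. The sketch below illustrates that extraction step with the standard library only. It is a simplified illustration under that assumption, not LangChain's actual implementation, and `extract_json` is a hypothetical name.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence marker, built here to avoid writing it literally

# Matches a fenced block, with or without a "json" language tag.
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json(reply):
    """Parse JSON from a reply that may or may not wrap it in a code fence."""
    match = FENCE_RE.search(reply)
    payload = match.group(1) if match else reply
    return json.loads(payload.strip())


fenced_reply = (
    "Here you go:\n" + FENCE + "json\n"
    '{"name": "Hiral Verma", "age": 25, "city": "Rohtak"}\n' + FENCE
)
plain_reply = '{"name": "Hiral Verma", "age": 25, "city": "Rohtak"}'
```

Both `extract_json(fenced_reply)` and `extract_json(plain_reply)` yield the same dictionary, which could then be handed to `Person` for validation.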

In [25]:
template1 = PromptTemplate(
    template="Generate the name, age and city of a person residing in the state {state} and country {country}. {format_instruction}",
    input_variables=['state', 'country'],
    # The format instructions are filled in once here, so callers only pass state and country
    partial_variables={'format_instruction': pydantic_parser.get_format_instructions()}
)

# pydantic_parser.get_format_instructions()

The format instructions ask the model to return a JSON object that conforms to the JSON schema generated from `Person`. They are injected through `partial_variables`, so the rendered prompt below contains them in full.

In [27]:
prompt = template1.invoke({'state': 'Haryana', 'country': 'INDIA'})

In [28]:
print(prompt)

text='Generate the name, age and city of a person residing in the state Haryana and country INDIA. The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"name": {"description": "Name of the person", "title": "Name", "type": "string"}, "age": {"description": "Age of the person", "exclusiveMinimum": 18, "title": "Age", "type": "integer"}, "city": {"description": "City name of the city person resides in", "title": "City", "type": "string"}}, "required": ["name", "age", "city"]}\n```'


# What goes to the LLM

**Prompt**: the rendered text printed above is exactly what is sent to the model: the instruction with `state` and `country` filled in, followed by the format instructions and the `Person` schema.

# Invoking model using chaining

In [18]:
chain = template1 | model | pydantic_parser

result = chain.invoke({'state': 'Haryana', 'country': 'INDIA'})


In [19]:
print(result)


name='Hiral Verma' age=25 city='Rohtak'
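
The `|` operator in the chain composes steps so that each step's output becomes the next step's input. The sketch below imitates that idea with plain Python and a canned model reply. `Step` is a hypothetical class written for illustration; it is not LangChain's `Runnable` and omits everything except the composition itself.

```python
import json


class Step:
    """Wraps a function so that steps can be chained with `|` (hypothetical helper)."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # a | b builds a new Step that runs a, then feeds its result to b
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)


# Stand-ins for the template, the model, and the parser
render = Step(lambda v: "Generate a person from {state}, {country}".format(**v))
fake_model = Step(lambda prompt: '{"name": "Hiral Verma", "age": 25, "city": "Rohtak"}')
parse = Step(json.loads)

chain = render | fake_model | parse
result = chain.invoke({"state": "Haryana", "country": "INDIA"})
```

Calling `invoke` once on the composed chain runs all three stand-in steps in order, which mirrors the shape of `template1 | model | pydantic_parser`.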


# PARSER VS PROMPT VS PROMPT_TEMPLATE





| Aspect      | Parser (OutputParser)                          | Prompt                                | PromptTemplate                                       |
|-------------|------------------------------------------------|----------------------------------------|------------------------------------------------------|
| Purpose     | Turns LLM output into structured format        | Instructions/questions for the LLM     | Reusable blueprint for creating prompts              |
| Input       | LLM raw output (text)                          | User/system/contextual input           | User/system/context + variables                      |
| Output      | Structured object (dict/list/class)            | Input string for LLM                   | Formatted prompt string                              |
| When Used   | After model response                           | Before LLM call                        | Before LLM call, when the prompt must be dynamic     |


# FLOW DIAGRAM

**[User Input / Context]**

        |
        v

**[PromptTemplate]**

- Blueprint with variables (e.g., {topic}, {format_instruction})
- Uses: .format(topic="AI") → generates a Prompt

        |
        v

**[Prompt]**

- Fully rendered string with instructions for the LLM

        |
        v

**[LLM Call]**

- LLM processes the prompt and returns raw text

        |
        v

**[OutputParser]**

- StructuredOutputParser / PydanticOutputParser / Custom
- Uses: .parse(raw LLM output)

        |
        v

**[Structured Output]**

- dict / list / class instance
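
The stages of the diagram can be traced one value at a time with the standard library alone. In this sketch the model call is replaced by a canned reply, and the names `Fact`, `TEMPLATE`, and `fake_llm` are hypothetical, chosen only to label each stage.

```python
import json
from dataclasses import dataclass


@dataclass
class Fact:
    topic: str
    summary: str


# [PromptTemplate]: blueprint with a {topic} placeholder
TEMPLATE = 'Give one fact about {topic} as JSON with keys "topic" and "summary".'


def fake_llm(prompt):
    # Stand-in for the [LLM Call]; always returns the same raw text
    return '{"topic": "AI", "summary": "AI systems learn patterns from data."}'


prompt = TEMPLATE.format(topic="AI")  # [Prompt]: fully rendered string
raw = fake_llm(prompt)                # raw text returned by the model
fact = Fact(**json.loads(raw))        # [OutputParser] producing the [Structured Output]
```

Each variable corresponds to one box in the diagram: `prompt` is a plain string, `raw` is unparsed text, and `fact` is a class instance.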
