<a href="https://colab.research.google.com/github/jasreman8/OOPs-for-Intelligent-Agentic-Systems/blob/main/Inheritance.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Learning Objectives

- Understand how Pydantic's `BaseModel` serves as a parent class for defining structured data schemas in Python.
- See how these Pydantic model "children" inherit data validation and serialization capabilities from `BaseModel`.
- Illustrate how Pydantic models are crucial for defining structured output schemas when working with LangChain LLMs, particularly with `llm.with_structured_output()`.

# Understanding Inheritance

We've seen classes and objects, and how `TypedDict` can define the shape of dictionaries. Now, we'll explore inheritance more deeply, but specifically in the context that's most vital for LangChain: using Pydantic's `BaseModel`.

When you create a Pydantic model to define a data structure (like we did for `UserProfile` previously), you are actually using inheritance!

```python
class MyDataStructure(BaseModel):
    ...  # field declarations go here
```

Here, `MyDataStructure` is a child class (or derived class), and `BaseModel` (from the Pydantic library) is the parent class (or base class). Your `MyDataStructure` inherits a lot of powerful capabilities from `BaseModel`.


Pydantic's `BaseModel` is a specially designed class that provides a foundation for creating your own data schemas with built-in features. When your class inherits from `BaseModel`, it automatically gets:
- Data Validation: Pydantic checks that the data you provide to create an instance of your model matches the types and constraints you defined (e.g., `name: str, age: int, email: EmailStr`). This happens at runtime.
- Type Coercion: In its default (non-strict) mode, Pydantic converts compatible input to the declared type (e.g., if `age` is declared as `int` but you provide `"25"`, it is converted to `25`).
- Clear Error Reporting: If validation fails, Pydantic gives detailed error messages that name the failing field and the reason.
- Serialization/Deserialization: Easy methods to convert your model instances to dictionaries (`.model_dump()`) or JSON strings (`.model_dump_json()`) and back again (`.model_validate()`, `.model_validate_json()`). In Pydantic v1 the first two were called `.dict()` and `.json()`; those names are deprecated in v2.
- IDE Support: Great autocompletion and type checking in your editor.

You don't have to write the code for these features yourself; you inherit them from `BaseModel`.
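
A minimal sketch of these inherited features, assuming Pydantic v2 (the `model_*` method names). The `Person` model and its values are hypothetical and used only for illustration.

```python
from pydantic import BaseModel, ValidationError

class Person(BaseModel):  # hypothetical model, for illustration only
    name: str
    age: int

# Type coercion: the numeric string "25" becomes the int 25 (default, non-strict mode).
person = Person(name="Ada", age="25")
print(person.age, type(person.age).__name__)

# Serialization and deserialization: all inherited, nothing is written in Person itself.
as_dict = person.model_dump()
as_json = person.model_dump_json()
restored = Person.model_validate_json(as_json)

# Validation: a value that cannot be coerced raises ValidationError.
try:
    Person(name="Ada", age="twenty-five")
    error_count = 0
except ValidationError as exc:
    error_count = exc.error_count()
    print(exc)
```

Note that `Person` contains only two field declarations; every method called above comes from `BaseModel`.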

Let's look at an example in action where we define a schema for extracting key information from a product support ticket.

In [1]:
! pip install -q "pydantic[email]"


In [2]:
from pydantic import BaseModel, Field, EmailStr
from typing import List, Optional

In [3]:
# Here, the new class (SupportTicketInfo) inherits from the parent class (BaseModel) and declares its fields along with their types and constraints.
class SupportTicketInfo(BaseModel): # Our class SupportTicketInfo INHERITS from BaseModel
    """Structured information extracted from a customer support ticket."""
    ticket_id: Optional[str] = Field(None, description="The unique identifier for the ticket, if mentioned.")
    customer_name: str = Field(description="The full name of the customer.")
    customer_email: EmailStr = Field(description="The customer's email address for contact.")
    product_name: str = Field(description="The name of the product the ticket is about.")
    issue_summary: str = Field(min_length=10, description="A brief summary of the customer's issue (at least 10 chars).")
    priority: str = Field(description="The assessed priority of the ticket (Low, Medium, High, Urgent).")
    tags: List[str] = Field(default_factory=list, description="Relevant tags or keywords for the issue (e.g., 'login', 'billing', 'feature_request').")

In this example, `SupportTicketInfo` is a child class that inherits directly from `pydantic.BaseModel`. It automatically gets all the validation logic, type checking, and utility methods from the parent.
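
One way to confirm that these capabilities come from the parent rather than from your own class is to inspect the class itself. The sketch below uses a hypothetical two-field model, `MiniTicket`, as a trimmed-down stand-in for `SupportTicketInfo`, and assumes Pydantic v2.

```python
from pydantic import BaseModel, Field

class MiniTicket(BaseModel):  # hypothetical stand-in for SupportTicketInfo
    customer_name: str
    issue_summary: str = Field(min_length=10)

# The child is a subclass of BaseModel.
is_child = issubclass(MiniTicket, BaseModel)

# model_dump is usable on MiniTicket, yet it is not defined in MiniTicket's own body:
# Python finds it on the parent through the method resolution order.
has_method = hasattr(MiniTicket, "model_dump")
defined_on_child = "model_dump" in vars(MiniTicket)

# What MiniTicket declares itself is its fields.
field_names = list(MiniTicket.model_fields)

print(is_child, has_method, defined_on_child, field_names)
```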

Let us look at our child class in action.

In [4]:
ticket_data_dict = {
        "ticket_id": "TICK-00123",
        "customer_name": "Alice Wonderland",
        "customer_email": "alice.w@example.com", # this will be validated as an email
        "product_name": "Premium Widget X1000",
        "issue_summary": "The main power button is unresponsive after the latest firmware update.",
        "priority": "High",
        "tags": ["power_button", "firmware_update", "unresponsive"]
    }

In [5]:
# ticket_data_dict is unpacked into keyword arguments (**kwargs)
ticket_object = SupportTicketInfo(**ticket_data_dict)

In [6]:
ticket_object.ticket_id

'TICK-00123'

In [7]:
type(ticket_object)

If the dict contains data that violates the schema, Pydantic raises a validation error.

In [8]:
ticket_data_dict = {
    "ticket_id": "TICK-00123",
    "customer_name": "Alice Wonderland",
    "customer_email": "alice.w", # not a valid email address, so validation will fail
    "product_name": "Premium Widget X1000",
    "issue_summary": "The main power button is unresponsive after the latest firmware update.",
    "priority": "High",
    "tags": ["power_button", "firmware_update", "unresponsive"]
}

In [9]:
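from pydantic import ValidationError

try:
    ticket_object = SupportTicketInfo(**ticket_data_dict)
except ValidationError as e:  # raised by Pydantic when the data violates the schema
    print("Validation error occurred:")
    print(e.json())  # .json() is specific to ValidationError, so catch that type rather than Exception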

Validation error occurred:
[{"type":"value_error","loc":["customer_email"],"msg":"value is not a valid email address: An email address must have an @-sign.","input":"alice.w","ctx":{"reason":"An email address must have an @-sign."},"url":"https://errors.pydantic.dev/2.11/v/value_error"}]


In [10]:
ticket_data_dict = {
    "ticket_id": "TICK-00123",
    "customer_name": "Alice Wonderland",
    "customer_email": "alice.w@gmail.com",
    "product_name": "Premium Widget X1000",
    "issue_summary": "The", # shorter than min_length=10, so validation will fail
    "priority": "High",
    "tags": ["power_button", "firmware_update", "unresponsive"]
}

In [11]:
from pydantic import ValidationError

try:
    ticket_object = SupportTicketInfo(**ticket_data_dict)
except ValidationError as e:  # raised by Pydantic when the data violates the schema
    print("Validation error occurred:")
    print(e.json())

Validation error occurred:
[{"type":"string_too_short","loc":["issue_summary"],"msg":"String should have at least 10 characters","input":"The","ctx":{"min_length":10},"url":"https://errors.pydantic.dev/2.11/v/string_too_short"}]
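
Besides `e.json()`, the `ValidationError` raised by Pydantic v2 exposes the same details as Python data through `e.errors()`, and it reports every failing field at once rather than stopping at the first. A minimal sketch, using a hypothetical `MiniTicket` model rather than the full `SupportTicketInfo`:

```python
from pydantic import BaseModel, Field, ValidationError

class MiniTicket(BaseModel):  # hypothetical model for illustration
    customer_name: str
    issue_summary: str = Field(min_length=10)

bad_data = {"issue_summary": "The"}  # customer_name is missing, summary is too short

problems = {}
try:
    MiniTicket(**bad_data)
except ValidationError as e:
    for err in e.errors():
        # err["loc"] is a tuple path to the field; err["type"] is a machine-readable code
        field = ".".join(str(part) for part in err["loc"])
        problems[field] = err["type"]

print(problems)
```

Working with `e.errors()` is convenient when you want to branch on the error type or show field-level messages instead of printing raw JSON.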


# Benefits of Inheritance

- **Code Reuse**:
As the example in the previous section shows, you don't write the code that validates email formats, enforces minimum string lengths, or converts a model to JSON. `BaseModel` provides this. You just declare your fields and their types/constraints.
- **Standardization and Interoperability (Polymorphism)**: All Pydantic models (children of `BaseModel`) share common behaviors like `.model_dump()` and `.model_dump_json()` (called `.dict()` and `.json()` in Pydantic v1), and the way they are initialized and validated. Code written against `BaseModel` therefore works with any of your models.
- **Extensibility (Creating More Specific Schemas)**:
You can create a new Pydantic model that inherits from one of your existing Pydantic models to add more fields or specialize it.

In [13]:
# This new class inherits from SupportTicketInfo and adds two fields. Both are optional and default to None.
class InternalSupportTicketInfo(SupportTicketInfo): # Inherits from our SupportTicketInfo
    """Extends SupportTicketInfo with internal tracking fields."""
    assigned_agent_id: Optional[str] = Field(None, description="ID of the agent assigned to this ticket.")
    resolution_notes: Optional[str] = Field(None, description="Notes on how the ticket was resolved.")

internal_data = {
    **ticket_object.model_dump(), # Get all data from the parent SupportTicketInfo instance
    "assigned_agent_id": "AGENT_007"
}
internal_ticket = InternalSupportTicketInfo(**internal_data)
print("\n--- Internal Ticket ---")
print(f"Internal Ticket, Customer: {internal_ticket.customer_name}, Agent: {internal_ticket.assigned_agent_id}")


--- Internal Ticket ---
Internal Ticket, Customer: Alice Wonderland, Agent: AGENT_007


In the above example, `InternalSupportTicketInfo` gets all fields and validations from `SupportTicketInfo` and adds its own.
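
The standardization benefit can be sketched in the same way: a helper written against `BaseModel` works for a parent schema and its child alike, and the child still enforces the parent's constraints. The models and the `to_log_line` helper below are hypothetical, and the code assumes Pydantic v2.

```python
from typing import Optional
from pydantic import BaseModel, Field, ValidationError

class MiniTicket(BaseModel):  # hypothetical parent schema
    customer_name: str
    issue_summary: str = Field(min_length=10)

class InternalMiniTicket(MiniTicket):  # hypothetical child schema
    assigned_agent_id: Optional[str] = None

def to_log_line(record: BaseModel) -> str:
    """Works for any BaseModel child because model_dump_json is inherited."""
    return f"{type(record).__name__}: {record.model_dump_json()}"

public = MiniTicket(customer_name="Alice", issue_summary="Power button is unresponsive.")
internal = InternalMiniTicket(**public.model_dump(), assigned_agent_id="AGENT_007")

print(to_log_line(public))
print(to_log_line(internal))

# An InternalMiniTicket is also a MiniTicket, so it can be used wherever the parent is expected.
is_parent_type = isinstance(internal, MiniTicket)

# The min_length constraint declared on the parent still applies to the child.
try:
    InternalMiniTicket(customer_name="Alice", issue_summary="The")
    child_rejected_short_summary = False
except ValidationError:
    child_rejected_short_summary = True
```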

# Inheritance in Action: LangChain

In [17]:
! pip install -q langchain-openai==0.3.24


In [18]:
import os
from langchain_openai import ChatOpenAI
from google.colab import userdata

In [19]:
llm = ChatOpenAI(
    api_key=userdata.get('OPEN_API_KEY'),
    base_url="https://aibe.mygreatlearning.com/openai/v1",
    model="gpt-4o-mini",
    temperature=0.2
)

In [20]:
class SupportTicketInfo(BaseModel): # Our class SupportTicketInfo INHERITS from BaseModel
    """Structured information extracted from a customer support ticket."""

    ticket_id: Optional[str] = Field(None, description="The unique identifier for the ticket, if mentioned.")
    customer_name: str = Field(description="The full name of the customer.")
    customer_email: EmailStr = Field(description="The customer's email address for contact.")
    product_name: str = Field(description="The name of the product the ticket is about.")
    issue_summary: str = Field(min_length=10, description="A brief summary of the customer's issue (at least 10 chars).")
    priority: str = Field(description="The assessed priority of the ticket (Low, Medium, High, Urgent).")
    tags: List[str] = Field(default_factory=list, description="Relevant tags or keywords for the issue (e.g., 'login', 'billing', 'feature_request').")

In [21]:
structured_llm = llm.with_structured_output(SupportTicketInfo)

In [22]:
prompt = [
    ('system', 'You are an expert at extracting information from customer support texts. Please extract the required fields and structure your output according to the provided schema.'),
    ('user', 'Ticket #TICK-00789 from John Doe (john.doe@email.provider.com) regarding our AlphaRouter Pro. He says "My internet keeps dropping every few hours, and the setup was a nightmare! This is super urgent as I work from home." He also mentioned "login issues" and "slow speed."')
]

In [23]:
extracted_customer_info = structured_llm.invoke(prompt)

In [25]:
print(extracted_customer_info)

ticket_id='TICK-00789' customer_name='John Doe' customer_email='john.doe@email.provider.com' product_name='AlphaRouter Pro' issue_summary='My internet keeps dropping every few hours, and the setup was a nightmare! This is super urgent as I work from home.' priority='Urgent' tags=['login issues', 'slow speed']


In [26]:
print(extracted_customer_info.customer_name)

John Doe


**How Inheritance from `BaseModel` is Key Here:**

1. Schema Definition: `SupportTicketInfo(BaseModel)` clearly defines the "contract" for the LLM's output.
2. LangChain Integration: `llm.with_structured_output()` accepts a Pydantic `BaseModel` class as its schema (a `TypedDict` also works, but then the result is a plain dict that is not validated). LangChain uses the schema information from `SupportTicketInfo` (which it gets because `SupportTicketInfo` is a `BaseModel`) to:
    - Instruct the LLM (often via "function calling" or "tool calling" modes) on the exact JSON format to produce.
    - Parse the LLM's JSON response.
    - Validate the parsed data against the `SupportTicketInfo` schema, creating an instance of your `SupportTicketInfo` class. If the LLM output doesn't match, Pydantic's inherited validation logic will raise an error.
3. Developer Experience: You get a clean, typed Python object (`extracted_customer_info`) back from the LLM call, not just a raw string or dictionary. You can access its fields with confidence (e.g., `extracted_customer_info.customer_name`).
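
The schema, parse, and validate steps in point 2 can be approximated without calling an LLM. The sketch below is not LangChain's internal code: it only uses Pydantic v2 methods that every `BaseModel` child inherits, with a hypothetical `MiniTicket` schema and a hard-coded string standing in for the model's JSON reply.

```python
import json
from typing import List
from pydantic import BaseModel, Field, ValidationError

class MiniTicket(BaseModel):  # hypothetical, trimmed-down schema
    """Structured information extracted from a support ticket."""
    customer_name: str = Field(description="The full name of the customer.")
    priority: str = Field(description="Low, Medium, High, or Urgent.")
    tags: List[str] = Field(default_factory=list)

# 1. A JSON Schema derived from the class. A schema like this can be handed
#    to a model so that it knows which fields to produce.
schema = MiniTicket.model_json_schema()
print(json.dumps(schema["properties"]["customer_name"], indent=2))

# 2. Parse and validate a reply. This string is a stand-in for an LLM response.
fake_reply = '{"customer_name": "John Doe", "priority": "Urgent", "tags": ["login issues"]}'
ticket = MiniTicket.model_validate_json(fake_reply)
print(ticket.customer_name, ticket.tags)

# 3. A reply that breaks the schema is rejected by the inherited validation.
try:
    MiniTicket.model_validate_json('{"priority": "Urgent"}')
    rejected = False
except ValidationError:
    rejected = True
```

The field descriptions passed to `Field(...)` end up in the generated schema, which is one reason to write them carefully.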