# Model Composition

Model composition means defining Pydantic models in which one or more fields are themselves Pydantic models.

This is ordinary class composition, since Pydantic classes are Python classes. The main difference is that Pydantic knows how to serialize and deserialize these fields, because they are Pydantic models.

Let's take a look at a few examples.

Let's say we want a `Circle` class that has a `center` field and a `radius` field. Previously we just defined the `center` field as a tuple.

In this case, we want a 2D circle, and we want to explicitly name the center coordinates `x` and `y`:

In [1]:
from pydantic import BaseModel, ConfigDict, Field, ValidationError


class Point2D(BaseModel):
    x: float = 0
    y: float = 0

class Circle2D(BaseModel):
    center: Point2D
    radius: float = Field(default=1, gt=0)

To create a `Circle2D` instance using the class-based approach, we specify the center point explicitly using the `Point2D` model:

In [2]:
c = Circle2D(center=Point2D(x=1, y=1), radius=2)

Now let's look at the representation:

In [3]:
c

Circle2D(center=Point2D(x=1.0, y=1.0), radius=2.0)

As you can see, Pydantic shows us the nested model. The same happens when serializing the data: it will create nested serialized data structures.

Serializing to Python objects:

In [4]:
c.model_dump()

{'center': {'x': 1.0, 'y': 1.0}, 'radius': 2.0}

Notice how `Point2D` was serialized to a dictionary, so our entire serialized data is a dictionary with `center` being another dictionary.

Same with JSON serialization:

In [5]:
c.model_dump_json()

'{"center":{"x":1.0,"y":1.0},"radius":2.0}'

This should give you an idea that deserialization from dictionaries or JSON works the same way:

In [6]:
data = {
    "center": {
        "x": 5, 
        "y": -5
    },
    "radius": 10
}

c = Circle2D.model_validate(data)
c

Circle2D(center=Point2D(x=5.0, y=-5.0), radius=10.0)

And similarly with JSON data:

In [7]:
data = """
{
    "center": {
        "x": 5, 
        "y": -5
    },
    "radius": 10
}
"""

c = Circle2D.model_validate_json(data)
c

Circle2D(center=Point2D(x=5.0, y=-5.0), radius=10.0)
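Validation applies to the nested model too. As a minimal sketch (the `bad_data` dictionary is made up for illustration, and the two models are repeated so the cell runs on its own), the error locations reported by `ValidationError.errors()` include the path into the nested model:

```python
from pydantic import BaseModel, Field, ValidationError


class Point2D(BaseModel):
    x: float = 0
    y: float = 0

class Circle2D(BaseModel):
    center: Point2D
    radius: float = Field(default=1, gt=0)

# hypothetical bad input: a non-numeric x and a non-positive radius
bad_data = {
    "center": {"x": "left", "y": -5},
    "radius": -1
}

error_locations = []
try:
    Circle2D.model_validate(bad_data)
except ValidationError as ex:
    # each error carries a `loc` tuple that walks down into the nested model
    error_locations = [(err["loc"], err["type"]) for err in ex.errors()]

for loc, err_type in error_locations:
    print(loc, err_type)
```

The first location should be `('center', 'x')`, which tells us exactly which nested field failed, not just that `center` was invalid.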

And of course we can access our fields using dot notation, just like with regular classes:

In [8]:
c.center

Point2D(x=5.0, y=-5.0)

In [9]:
c.center.x

5.0
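Although the class-based example earlier passes a `Point2D` instance, at runtime Pydantic will also accept a plain dictionary for a nested model field and validate it into that model (a static type checker may still complain about it). The sketch below, which repeats the two models so it is self-contained, shows that, and also that dumping and re-validating gives back equivalent data:

```python
from pydantic import BaseModel, Field


class Point2D(BaseModel):
    x: float = 0
    y: float = 0

class Circle2D(BaseModel):
    center: Point2D
    radius: float = Field(default=1, gt=0)

# a plain dict for `center` is validated into a Point2D instance
c1 = Circle2D(center={"x": 1, "y": 1}, radius=2)
print(type(c1.center).__name__)

# round trip through a Python dict and through JSON
c2 = Circle2D.model_validate(c1.model_dump())
c3 = Circle2D.model_validate_json(c1.model_dump_json())
print(c1.model_dump() == c2.model_dump() == c3.model_dump())
```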

One of the useful things about model composition is that each model can be configured independently.

For example, suppose we have this piece of JSON we are receiving from some API call.

We are interested in only a subset of the information, so we can easily create models that can ignore these extra pieces of data.

In [10]:
json_data = """
{
    "firstName": "David",
    "lastName": "Hilbert",
    "contactInfo": {
        "email": "d.hilbert@spectral-theory.com",
        "homePhone": {
            "countryCode": 49,
            "areaCode": 551,
            "localPhoneNumber": 123456789
        }
    },
    "personalInfo": {
        "nationality": "German",
        "born": {
            "date": "1862-01-23",
            "place": {
                "city": "Konigsberg",
                "country": "Prussia"
            }
        },
        "died": {
            "date": "1943-02-14",
            "place": {
                "city": "Gottingen",
                "country": "Germany"
            }
        }
    },
    "awards": ["Lobachevsky Prize", "Bolyai Prize", "ForMemRS"],
    "notableStudents": ["von Neumann", "Weyl", "Courant", "Zermelo"]
}
"""

Suppose that we are only interested in the first name, last name, email, nationality, birth information and notable students.

To model this using Pydantic, we break the structure down into smaller models and compose them:

In [11]:
from pydantic import EmailStr, PastDate
from pydantic.alias_generators import to_camel


class ContactInfo(BaseModel):
    model_config = ConfigDict(extra="ignore")

    email: EmailStr | None = None

class PlaceInfo(BaseModel):
    city: str
    country: str
    
class PlaceDateInfo(BaseModel):
    model_config = ConfigDict(populate_by_name=True)
    
    date_: PastDate = Field(alias="date")
    place: PlaceInfo
    
class PersonalInfo(BaseModel):
    model_config = ConfigDict(extra="ignore")

    nationality: str
    born: PlaceDateInfo

class Person(BaseModel):
    model_config = ConfigDict(alias_generator=to_camel, populate_by_name=True, extra="ignore")
    
    first_name: str
    last_name: str
    contact_info: ContactInfo
    personal_info: PersonalInfo
    notable_students: list[str] = []

In [12]:
p = Person.model_validate_json(json_data)
p

Person(first_name='David', last_name='Hilbert', contact_info=ContactInfo(email='d.hilbert@spectral-theory.com'), personal_info=PersonalInfo(nationality='German', born=PlaceDateInfo(date_=datetime.date(1862, 1, 23), place=PlaceInfo(city='Konigsberg', country='Prussia'))), notable_students=['von Neumann', 'Weyl', 'Courant', 'Zermelo'])

And when we serialize our model, you'll see that not only does everything get nested (like the input data), but the extra fields in the JSON object that our app did not need have also been dropped:

In [13]:
print(p.model_dump_json(by_alias=True, indent=2))

{
  "firstName": "David",
  "lastName": "Hilbert",
  "contactInfo": {
    "email": "d.hilbert@spectral-theory.com"
  },
  "personalInfo": {
    "nationality": "German",
    "born": {
      "date": "1862-01-23",
      "place": {
        "city": "Konigsberg",
        "country": "Prussia"
      }
    }
  },
  "notableStudents": [
    "von Neumann",
    "Weyl",
    "Courant",
    "Zermelo"
  ]
}


Just as an exercise, let's add some code so that our `notableStudents` list is always sorted alphabetically.

We do that by adding a custom validator - and I'll use an annotated type, since I might want to re-use a sorted list of strings in other places in my app.

In [14]:
from typing import Annotated
from pydantic import AfterValidator


# AfterValidator runs after the list[str] validation, so value is guaranteed to be a list of strings (possibly empty)
SortedStringList = Annotated[list[str], AfterValidator(lambda value: sorted(value, key=str.casefold))]

class Person(BaseModel):
    model_config = ConfigDict(alias_generator=to_camel, populate_by_name=True, extra="ignore")
    
    first_name: str
    last_name: str
    contact_info: ContactInfo
    personal_info: PersonalInfo
    notable_students: SortedStringList = []

In [15]:
p = Person.model_validate_json(json_data)
p

Person(first_name='David', last_name='Hilbert', contact_info=ContactInfo(email='d.hilbert@spectral-theory.com'), personal_info=PersonalInfo(nationality='German', born=PlaceDateInfo(date_=datetime.date(1862, 1, 23), place=PlaceInfo(city='Konigsberg', country='Prussia'))), notable_students=['Courant', 'von Neumann', 'Weyl', 'Zermelo'])

In [16]:
print(p.model_dump_json(by_alias=True, indent=2))

{
  "firstName": "David",
  "lastName": "Hilbert",
  "contactInfo": {
    "email": "d.hilbert@spectral-theory.com"
  },
  "personalInfo": {
    "nationality": "German",
    "born": {
      "date": "1862-01-23",
      "place": {
        "city": "Konigsberg",
        "country": "Prussia"
      }
    }
  },
  "notableStudents": [
    "Courant",
    "von Neumann",
    "Weyl",
    "Zermelo"
  ]
}
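The reason for using an annotated type is re-use, so here is a sketch of the same `SortedStringList` applied to fields in two other models. The `Seminar` and `Department` models and their data are hypothetical, made up just for this illustration:

```python
from typing import Annotated
from pydantic import AfterValidator, BaseModel


SortedStringList = Annotated[list[str], AfterValidator(lambda value: sorted(value, key=str.casefold))]

# hypothetical models, both re-using the same annotated type
class Seminar(BaseModel):
    topic: str
    attendees: SortedStringList = []

class Department(BaseModel):
    name: str
    faculty: SortedStringList = []
    seminars: list[Seminar] = []

dept = Department.model_validate({
    "name": "Mathematics",
    "faculty": ["noether", "Hilbert", "Minkowski"],
    "seminars": [
        {"topic": "Integral Equations", "attendees": ["Weyl", "von Neumann", "Courant"]}
    ]
})

print(dept.faculty)
print(dept.seminars[0].attendees)
```

The sorting happens wherever the type is used, including inside the nested `Seminar` models, without writing a validator on each class.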


This is a "trick" I find myself using often when an API query returns far more information than I need: I can drop all the extra data by setting `extra="ignore"` on any of the models used in the main model.
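Keep in mind that the configuration really is per model: the `extra` setting on one model does not carry over to the models nested inside it. (Also, `"ignore"` is Pydantic's default for `extra`, so setting it explicitly mostly documents intent.) The sketch below uses two hypothetical models, `Address` and `Customer`, to show a lenient outer model with a strict nested one:

```python
from pydantic import BaseModel, ConfigDict, ValidationError


# hypothetical models: the nested model is strict, the outer one is lenient
class Address(BaseModel):
    model_config = ConfigDict(extra="forbid")

    city: str

class Customer(BaseModel):
    model_config = ConfigDict(extra="ignore")

    name: str
    address: Address

# an extra key at the top level is silently dropped
ok = Customer.model_validate({
    "name": "Ada",
    "nickname": "Countess",
    "address": {"city": "London"}
})
print(ok)

# an extra key inside the nested model is rejected by that model's own config
nested_errors = []
try:
    Customer.model_validate({
        "name": "Ada",
        "address": {"city": "London", "zip": "N1"}
    })
except ValidationError as ex:
    nested_errors = [(err["loc"], err["type"]) for err in ex.errors()]

print(nested_errors)
```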