# Validation Aliases

So far we have seen how we can use aliases (or field names) for deserializing data, and how to specify different aliases when serializing.

Although this is not common, you can also specify different validation aliases.

Essentially this allows you to specify a separate alias for deserialization, and then use either the plain alias or a serialization alias for serialization.

This can get a bit confusing: what happens if you have all three defined - a validation alias, a plain alias, and a serialization alias?

Think of a plain alias as the default for both deserialization and serialization, and of validation aliases and serialization aliases as overrides for one direction each.

Let's take a look at a simple example with just a validation alias:

In [1]:
from pydantic import BaseModel, Field, ConfigDict, ValidationError

In [2]:
class Model(BaseModel):
    model_config = ConfigDict(populate_by_name=True)
    
    first_name: str = Field(validation_alias="FirstName")

Now this validation alias is used when deserializing data:

In [3]:
m = Model(FirstName="Isaac")
m

Model(first_name='Isaac')

In [4]:
data = {"FirstName": "Isaac"}
m = Model.model_validate(data)
m

Model(first_name='Isaac')

When we serialize, the validation alias will not be used - in this example we don't have a plain alias or a serialization alias defined, so it will use the field name:

In [5]:
m.model_dump(by_alias=True)

{'first_name': 'Isaac'}
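To make the role of `populate_by_name` here a bit more concrete, the following is a minimal sketch comparing two models that differ only in that setting. The class names `Strict` and `Lenient` are hypothetical and used purely for illustration.

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError


class Strict(BaseModel):
    # Hypothetical model: validation alias only, no populate_by_name
    first_name: str = Field(validation_alias="FirstName")


class Lenient(BaseModel):
    # Same field, but the field name is also accepted on input
    model_config = ConfigDict(populate_by_name=True)

    first_name: str = Field(validation_alias="FirstName")


try:
    Strict(first_name="Isaac")
    strict_accepts_field_name = True
except ValidationError:
    # The required key "FirstName" is missing from the input
    strict_accepts_field_name = False

# With populate_by_name=True, the field name works as well as the alias
lenient = Lenient(first_name="Isaac")

# The validation alias plays no part in serialization
dumped = Lenient(FirstName="Isaac").model_dump(by_alias=True)
```

In this sketch, `Strict` rejects the field name, `Lenient` accepts both spellings, and serialization falls back to the field name in either case.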

We can specify a plain alias for our field:

In [6]:
class Model(BaseModel):
    model_config = ConfigDict(populate_by_name=True)
    
    first_name: str = Field(validation_alias="FirstName", alias="firstName")

In [7]:
m = Model.model_validate(data)
m

Model(first_name='Isaac')

In [8]:
m.model_dump(by_alias=True)

{'firstName': 'Isaac'}

If we add a serialization alias, we'll get a different serialization:

In [9]:
class Model(BaseModel):
    model_config = ConfigDict(populate_by_name=True)
    
    first_name: str = Field(
        validation_alias="FirstName", 
        alias="firstName", 
        serialization_alias="givenName"
    )

In [10]:
m = Model.model_validate(data)
m

Model(first_name='Isaac')

In [11]:
m.model_dump(by_alias=True)

{'givenName': 'Isaac'}

As you can see, the serialization alias essentially overrides the plain alias.

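So, in most cases you do not need to use all three - in fact, if you specify all three, the plain alias has no effect: the validation alias takes over for deserialization, and the serialization alias takes over for serialization.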
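As a quick check of that claim, here is a small sketch (the class name `Person` and the flag `plain_alias_accepted` are hypothetical) that tries to validate data keyed by the plain alias when all three aliases are defined.

```python
from pydantic import BaseModel, Field, ValidationError


class Person(BaseModel):
    # Hypothetical model with all three alias types defined
    first_name: str = Field(
        validation_alias="FirstName",
        alias="firstName",
        serialization_alias="givenName",
    )


try:
    # Input keyed by the plain alias, not the validation alias
    Person.model_validate({"firstName": "Isaac"})
    plain_alias_accepted = True
except ValidationError:
    plain_alias_accepted = False

# Input keyed by the validation alias, output keyed by the serialization alias
dumped = Person.model_validate({"FirstName": "Isaac"}).model_dump(by_alias=True)
```

Here the plain alias is neither accepted on input nor used on output, which is why defining all three is rarely useful.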

So, why have those three alias types?

Consider this example, where we use auto-generated aliases, but one field requires special aliasing for validation, serialization, or both:

In [12]:
from pydantic.alias_generators import to_camel

class Model(BaseModel):
    model_config = ConfigDict(populate_by_name=True, alias_generator=to_camel)
    
    first_name: str
    last_name: str

In this case, `first_name` and `last_name` will have autogenerated aliases `firstName` and `lastName` respectively.

In [13]:
data = {
    "firstName": "Isaac",
    "lastName": "Newton"
}
m = Model.model_validate(data)

In [14]:
m.model_dump()

{'first_name': 'Isaac', 'last_name': 'Newton'}
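Note that the dump above uses the field names because `by_alias` was not set. As a small sketch (the class name `CamelModel` is hypothetical), this is what the same model produces when we ask for aliases:

```python
from pydantic import BaseModel, ConfigDict
from pydantic.alias_generators import to_camel


class CamelModel(BaseModel):
    # Hypothetical copy of the model above, with generated camelCase aliases
    model_config = ConfigDict(populate_by_name=True, alias_generator=to_camel)

    first_name: str
    last_name: str


m = CamelModel.model_validate({"firstName": "Isaac", "lastName": "Newton"})

# by_alias=True switches the output keys to the generated aliases
by_alias = m.model_dump(by_alias=True)
print(by_alias)
```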

In [15]:
class Model(BaseModel):
    model_config = ConfigDict(populate_by_name=True, alias_generator=to_camel)
    
    first_name: str = Field(
        validation_alias="FirstName", serialization_alias="givenName"
    )
    last_name: str

In [16]:
data = {
    "FirstName": "Isaac",
    "lastName": "Newton"
}
m = Model.model_validate(data)
m

Model(first_name='Isaac', last_name='Newton')

In [17]:
m.model_dump(by_alias=True)

{'givenName': 'Isaac', 'lastName': 'Newton'}

Hopefully this clears up some of the possible confusion around `alias`, `validation_alias` and `serialization_alias`.

Now, where things get interesting is that when specifying a validation alias you can actually define several of them - this way Pydantic will look in the input data for a key matching any of the specified validation aliases, and use it to populate the field.

In [18]:
from pydantic import AliasChoices

In [19]:
class Model(BaseModel):
    model_config = ConfigDict(populate_by_name=True, alias_generator=to_camel)
    
    first_name: str = Field(
        validation_alias=AliasChoices("FirstName", "GivenName"), 
        serialization_alias="givenName"
    )
    last_name: str

And now, Pydantic can handle either `FirstName`, or `GivenName` when deserializing the data.

In [20]:
data = {
    "FirstName": "Isaac",
    "lastName": "Newton"
}
m = Model.model_validate(data)
m

Model(first_name='Isaac', last_name='Newton')

In [21]:
m.model_dump(by_alias=True)

{'givenName': 'Isaac', 'lastName': 'Newton'}

In [22]:
data = {
    "GivenName": "Isaac",
    "lastName": "Newton"
}
m = Model.model_validate(data)
m

Model(first_name='Isaac', last_name='Newton')

In [23]:
m.model_dump(by_alias=True)

{'givenName': 'Isaac', 'lastName': 'Newton'}

What happens if your data contains both?

Actually Pydantic will not complain about it, and one of the values silently wins. In this example it is the value under `FirstName`, which is the first alias listed in `AliasChoices`:

In [24]:
data = {
    "GivenName": "Isaac",
    "FirstName": "Isaac2",
    "lastName": "Newton"
}
m = Model.model_validate(data)
m

Model(first_name='Isaac2', last_name='Newton')
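One way to probe which value wins is to swap the order of the keys in the input. The sketch below uses a hypothetical `Person` model. As far as I can tell, the result follows the order of the aliases in `AliasChoices` rather than the order of the keys in the data, but treat that as an observation, not a documented guarantee.

```python
from pydantic import AliasChoices, BaseModel, Field


class Person(BaseModel):
    # Hypothetical model: "FirstName" is listed before "GivenName"
    first_name: str = Field(
        validation_alias=AliasChoices("FirstName", "GivenName")
    )


# Same two keys, supplied in opposite orders
a = Person.model_validate({"GivenName": "Isaac", "FirstName": "Isaac2"})
b = Person.model_validate({"FirstName": "Isaac2", "GivenName": "Isaac"})

print(a.first_name, b.first_name)
```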

As of the time of recording this video, I am not aware of any setting which would force Pydantic to raise a validation exception when this happens - so just be a bit careful if you ever use this.
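Absent such a setting, one possible workaround is a `mode="before"` model validator that inspects the raw input and rejects it when more than one of the aliases is present. This is only a sketch: the names `StrictPerson`, `FIRST_NAME_ALIASES` and `reject_conflicting_aliases` are hypothetical, and it assumes the input is a plain dictionary.

```python
from pydantic import AliasChoices, BaseModel, Field, ValidationError, model_validator

# Hypothetical constant, shared by the field definition and the validator
FIRST_NAME_ALIASES = ("FirstName", "GivenName")


class StrictPerson(BaseModel):
    first_name: str = Field(validation_alias=AliasChoices(*FIRST_NAME_ALIASES))

    @model_validator(mode="before")
    @classmethod
    def reject_conflicting_aliases(cls, data):
        # Only dict input is inspected; anything else is passed through untouched
        if isinstance(data, dict):
            present = [key for key in FIRST_NAME_ALIASES if key in data]
            if len(present) > 1:
                raise ValueError(f"conflicting keys for first_name: {present}")
        return data


try:
    StrictPerson.model_validate({"GivenName": "Isaac", "FirstName": "Isaac2"})
    conflict_rejected = False
except ValidationError:
    # The ValueError raised in the validator surfaces as a ValidationError
    conflict_rejected = True
```

Input containing just one of the aliases still validates as before.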

Which leads to the question, where *would* you use something like this?

The most common use case I can think of is when dealing with settings files. You may have settings that define connection strings to various resources, but each resource uses a different name for that setting.

Something like this:

In [25]:
data = {
    "databases": {
        "redis": {
            "name": "Local Redis",
            "redis_conn": "redis://secret@localhost:9000/1"
        },
        "pgsql": {
            "name": "Local Postgres",
            "pgsql_conn": "postgresql://user:secret@localhost"
        },
        "nosql": {
            "name": "Local MongoDB",
            "mongo_conn": "mongodb://USERNAME:PASSWORD@HOST/DATABASE"
        }
    }
}

We want a model that can be used for all three, so the different connection field names are an issue that we can work around using `AliasChoices`:

In [26]:
class Database(BaseModel):
    name: str
    connection: str = Field(
        validation_alias=AliasChoices("redis_conn", "pgsql_conn", "mongo_conn")
    )

In [27]:
databases = {}

for key, value in data["databases"].items():
    m = Database.model_validate(value)
    databases[key] = m

databases

{'redis': Database(name='Local Redis', connection='redis://secret@localhost:9000/1'),
 'pgsql': Database(name='Local Postgres', connection='postgresql://user:secret@localhost'),
 'nosql': Database(name='Local MongoDB', connection='mongodb://USERNAME:PASSWORD@HOST/DATABASE')}
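It is worth keeping in mind that the list of choices is closed: an entry that uses none of the listed keys fails validation. A minimal sketch, where the `mysql_conn` key and the `unknown` entry are made up for illustration:

```python
from pydantic import AliasChoices, BaseModel, Field, ValidationError


class Database(BaseModel):
    name: str
    connection: str = Field(
        validation_alias=AliasChoices("redis_conn", "pgsql_conn", "mongo_conn")
    )


# Hypothetical entry whose connection key is not among the alias choices
unknown = {"name": "Local MySQL", "mysql_conn": "mysql://user:secret@localhost"}

try:
    Database.model_validate(unknown)
    error_types = []
except ValidationError as exc:
    # Collect the error type reported for each failing field
    error_types = [err["type"] for err in exc.errors()]

print(error_types)
```

Supporting a new resource type then means adding its key to `AliasChoices`.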

Later I'll cover how to make more complex models that would handle something like this without us having to manually loop through the data.

Here's a quick preview:

In [28]:
class Databases(BaseModel):
    databases: dict[str, Database]

In [29]:
databases = Databases.model_validate(data)
databases

Databases(databases={'redis': Database(name='Local Redis', connection='redis://secret@localhost:9000/1'), 'pgsql': Database(name='Local Postgres', connection='postgresql://user:secret@localhost'), 'nosql': Database(name='Local MongoDB', connection='mongodb://USERNAME:PASSWORD@HOST/DATABASE')})

In [30]:
print(databases.model_dump_json(indent=2))

{
  "databases": {
    "redis": {
      "name": "Local Redis",
      "connection": "redis://secret@localhost:9000/1"
    },
    "pgsql": {
      "name": "Local Postgres",
      "connection": "postgresql://user:secret@localhost"
    },
    "nosql": {
      "name": "Local MongoDB",
      "connection": "mongodb://USERNAME:PASSWORD@HOST/DATABASE"
    }
  }
}
