# Project with Solution (Section 8)

## Project Specs

This is where we left off in the previous section's project:

In [1]:
from datetime import date
from enum import Enum
from typing import Annotated, TypeVar
from uuid import uuid4
from pydantic import BaseModel, ConfigDict, Field, field_serializer
from pydantic.alias_generators import to_camel
from pydantic import UUID4


class AutomobileType(Enum):
    sedan = "Sedan"
    coupe = "Coupe"
    convertible = "Convertible"
    suv = "SUV"
    truck = "Truck"


T = TypeVar('T')
BoundedString = Annotated[str, Field(min_length=2, max_length=50)]
BoundedList = Annotated[list[T], Field(min_length=1, max_length=5)]


class Automobile(BaseModel):
    model_config = ConfigDict(
        extra="forbid",
        str_strip_whitespace=True,
        validate_default=True,
        validate_assignment=True,
        alias_generator=to_camel,
    )

    id_: UUID4 | None = Field(alias="id", default_factory=uuid4) 
    manufacturer: BoundedString
    series_name: BoundedString
    type_: AutomobileType = Field(alias="type")
    is_electric: bool = False
    manufactured_date: date = Field(validation_alias="completionDate", ge=date(1980, 1, 1))
    base_msrp_usd: float = Field(
        validation_alias="msrpUSD", 
        serialization_alias="baseMSRPUSD"
    )
    top_features: BoundedList[BoundedString] | None = None
    vin: BoundedString
    number_of_doors: int = Field(
        default=4, 
        validation_alias="doors",
        ge=2,
        le=4,
        multiple_of=2,
    )
    registration_country: BoundedString | None = None
    license_plate: BoundedString | None = None

    @field_serializer("manufactured_date", when_used="json-unless-none")
    def serialize_date(self, value: date) -> str:
        return value.strftime("%Y/%m/%d")

There are two main changes we are going to make to our model.

First, we want to add a field to capture when an automobile was registered. To do so, add a field named `registration_date` that is implemented as follows:
- place it right after `registration_country` in the model
- it should be a `date` object
- it should be optional and default to `None`
- it should deserialize from and serialize to the camel case version of the field name
- it cannot be earlier than the `manufactured_date`
- just like `manufactured_date`, it should serialize the data to a `YYYY/MM/DD` format for JSON serialization. (Hint: you do not need to define a second serializer for that field! The syntax is the same as what I showed you when applying the same validator decorator to multiple fields.)


Secondly, we want to ensure that the `registration_country` only allows values from a pre-determined list of countries.

We are not going to use an enum for this, as there would simply be too many values. Instead we are going to validate the country name against a "database".

For this exercise we are not going to use an actual database; instead, you can use the dictionary provided below. The dictionary keys are the accepted "input" values for country names, and each key's value is a tuple consisting of the properly formatted country name and the 3-character country code (we won't use the country code right now, but we will later).

Create a custom validator for `registration_country` that checks that the value being deserialized is one of the keys in that dictionary, and replaces it with the properly formatted country name (the first element of the tuple).

For example, if the input data contains:
```
{
    ...,
    "registrationCountry": "UK",
    ...
}
```
then, since our "database" gives us this info:
```
"uk": ("United Kingdom", "GBR")
```
the deserialized value in our model should become `United Kingdom`. 

Your validator should validate a country name based on the lower-cased and stripped version of the string - i.e. input data such as `"UK"`, `"Uk"`, `"uk "` should all end up being matched with the key `"uk"` in the database.
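A minimal sketch of that normalization step, assuming a hypothetical helper named `normalize`. `str.casefold()` behaves like `str.lower()` for inputs like these, but it is the more thorough choice for case-insensitive matching:

```python
def normalize(name: str) -> str:
    # Strip surrounding whitespace, then casefold to get a case-insensitive key
    return name.strip().casefold()


for raw in ["UK", "Uk", "uk ", "  uK\t"]:
    assert normalize(raw) == "uk"

# casefold() also folds characters that lower() leaves alone,
# such as the German sharp s
print("Straße".lower())     # straße
print("Straße".casefold())  # strasse
```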

Use an annotated type to do this - name your new annotated type `Country`.

In [2]:
countries = {
    "australia": ("Australia", "AUS"),
    "canada": ("Canada", "CAN"),
    "china": ("China", "CHN"),
    "france": ("France", "FRA"),
    "germany": ("Germany", "DEU"),
    "india": ("India", "IND"),
    "mexico": ("Mexico", "MEX"),
    "norway": ("Norway", "NOR"),
    "pakistan": ("Pakistan", "PAK"),
    "san marino": ("San Marino", "SMR"),
    "sanmarino": ("San Marino", "SMR"),
    "spain": ("Spain", "ESP"),
    "sweden": ("Sweden", "SWE"),
    "united kingdom": ("United Kingdom", "GBR"),
    "uk": ("United Kingdom", "GBR"),
    "great britain": ("United Kingdom", "GBR"),
    "britain": ("United Kingdom", "GBR"),
    "us": ("United States of America", "USA"),
    "united states": ("United States of America", "USA"),
    "usa": ("United States of America", "USA"),
}

I obtained this list by selecting a small portion of the data available 
here: [https://www.iban.com/country-codes](https://www.iban.com/country-codes).

Feel free to add yourself in the list if you're not already!

Here is a sample data input and serialized output (dict and JSON):

In [3]:
from uuid import UUID

data = {
    "id": "c4e60f4a-3c7f-4da5-9b3f-07aee50b23e7",
    "manufacturer": "BMW",
    "seriesName": "M4 Competition xDrive",
    "type": "Convertible",
    "isElectric": False,
    "completionDate": "2023-01-01",
    "msrpUSD": 93_300,
    "topFeatures": ["6 cylinders", "all-wheel drive", "convertible"],
    "vin": "1234567890",
    "doors": 2,
    "registrationCountry": "us",
    "registrationDate": "2023-06-01",
    "licensePlate": "AAA-BBB"
}

expected_by_alias = {
    'id': UUID('c4e60f4a-3c7f-4da5-9b3f-07aee50b23e7'),
    'manufacturer': 'BMW',
    'seriesName': 'M4 Competition xDrive',
    'type': AutomobileType.convertible,
    'isElectric': False,
    'manufacturedDate': date(2023, 1, 1),
    'baseMSRPUSD': 93300.0,
    'topFeatures': ['6 cylinders', 'all-wheel drive', 'convertible'],
    'vin': '1234567890',
    'numberOfDoors': 2,
    'registrationCountry': 'United States of America',
    'registrationDate': date(2023, 6, 1),
    'licensePlate': 'AAA-BBB'
}

expected_json_by_alias = '{"id":"c4e60f4a-3c7f-4da5-9b3f-07aee50b23e7","manufacturer":"BMW","seriesName":"M4 Competition xDrive","type":"Convertible","isElectric":false,"manufacturedDate":"2023/01/01","baseMSRPUSD":93300.0,"topFeatures":["6 cylinders","all-wheel drive","convertible"],"vin":"1234567890","numberOfDoors":2,"registrationCountry":"United States of America","registrationDate":"2023/06/01","licensePlate":"AAA-BBB"}'

Don't forget to also test with invalid data for registration date and country name!

## Solution

We're going to write a lookup function that will return the country name and code given an input - we'll also perform a case-insensitive search, and strip our string (in case it's not already).

I am going to pre-create a list of valid country names (based on the `countries` dictionary), so that I can include that information in the error message when country validation fails. I am not putting this code into my validator, since I do not want to incur the cost of regenerating that list every time the validator runs.

In [4]:
valid_country_names = sorted(countries.keys())
valid_country_names

['australia',
 'britain',
 'canada',
 'china',
 'france',
 'germany',
 'great britain',
 'india',
 'mexico',
 'norway',
 'pakistan',
 'san marino',
 'sanmarino',
 'spain',
 'sweden',
 'uk',
 'united kingdom',
 'united states',
 'us',
 'usa']

In [5]:
def lookup_country(name: str) -> tuple[str, str]:
    name = name.strip().casefold()
    
    try:
        return countries[name]
    except KeyError:
        raise ValueError(
            "Unknown country name. "
            f"Country name must be one of: {','.join(valid_country_names)}"
        )

Now let's create an annotated type for countries:

In [6]:
from pydantic import AfterValidator

Country = Annotated[str, AfterValidator(lambda name: lookup_country(name)[0])]
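To check the `Country` type on its own, without building a full `Automobile`, one option is Pydantic's `TypeAdapter`. The sketch below repeats a trimmed-down version of the `countries` dictionary and `lookup_country` so it runs standalone:

```python
from typing import Annotated

from pydantic import AfterValidator, TypeAdapter, ValidationError

# Trimmed-down stand-in for the `countries` "database" above
countries = {
    "uk": ("United Kingdom", "GBR"),
    "united kingdom": ("United Kingdom", "GBR"),
    "us": ("United States of America", "USA"),
}
valid_country_names = sorted(countries.keys())


def lookup_country(name: str) -> tuple[str, str]:
    name = name.strip().casefold()
    try:
        return countries[name]
    except KeyError:
        raise ValueError(
            "Unknown country name. "
            f"Country name must be one of: {','.join(valid_country_names)}"
        )


Country = Annotated[str, AfterValidator(lambda name: lookup_country(name)[0])]

# TypeAdapter validates data against a type without needing a model
country_adapter = TypeAdapter(Country)
print(country_adapter.validate_python(" UK "))  # United Kingdom

try:
    country_adapter.validate_python("Lunar Colony")
except ValidationError as ex:
    # The ValueError raised in lookup_country surfaces as a "value_error"
    errors = ex.errors()
    print(errors[0]["type"])  # value_error
```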

And now, let's use it in our model:

In [7]:
class Automobile(BaseModel):
    model_config = ConfigDict(
        extra="forbid",
        str_strip_whitespace=True,
        validate_default=True,
        validate_assignment=True,
        alias_generator=to_camel,
    )

    id_: UUID4 | None = Field(alias="id", default_factory=uuid4) 
    manufacturer: BoundedString
    series_name: BoundedString
    type_: AutomobileType = Field(alias="type")
    is_electric: bool = False
    manufactured_date: date = Field(validation_alias="completionDate", ge=date(1980, 1, 1))
    base_msrp_usd: float = Field(
        validation_alias="msrpUSD", 
        serialization_alias="baseMSRPUSD"
    )
    top_features: BoundedList[BoundedString] | None = None
    vin: BoundedString
    number_of_doors: int = Field(
        default=4, 
        validation_alias="doors",
        ge=2,
        le=4,
        multiple_of=2,
    )
    registration_country: Country | None = None
    registration_date: date | None = None
    license_plate: BoundedString | None = None

    @field_serializer("manufactured_date", "registration_date", when_used="json-unless-none")
    def serialize_date(self, value: date) -> str:
        return value.strftime("%Y/%m/%d")

We still have to add a custom validator for `registration_date` that will check that this date is not earlier than the `manufactured_date`.

Let's add that directly in the model now:

In [8]:
from pydantic import field_validator, ValidationInfo


class Automobile(BaseModel):
    model_config = ConfigDict(
        extra="forbid",
        str_strip_whitespace=True,
        validate_default=True,
        validate_assignment=True,
        alias_generator=to_camel,
    )

    id_: UUID4 | None = Field(alias="id", default_factory=uuid4) 
    manufacturer: BoundedString
    series_name: BoundedString
    type_: AutomobileType = Field(alias="type")
    is_electric: bool = False
    manufactured_date: date = Field(validation_alias="completionDate", ge=date(1980, 1, 1))
    base_msrp_usd: float = Field(
        validation_alias="msrpUSD", 
        serialization_alias="baseMSRPUSD"
    )
    top_features: BoundedList[BoundedString] | None = None
    vin: BoundedString
    number_of_doors: int = Field(
        default=4, 
        validation_alias="doors",
        ge=2,
        le=4,
        multiple_of=2,
    )
    registration_country: Country | None = None
    registration_date: date | None = None
    license_plate: BoundedString | None = None

    @field_serializer("manufactured_date", "registration_date", when_used="json-unless-none")
    def serialize_date(self, value: date) -> str:
        return value.strftime("%Y/%m/%d")
        
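    @field_validator("registration_date")
    @classmethod
    def validate_registration_date(cls, value: date | None, info: ValidationInfo):
        # validate_default=True also runs this on the None default, so skip the check when there is no date;
        # manufactured_date is missing from info.data if it failed its own validation
        data = info.data
        if value is not None and "manufactured_date" in data and data["manufactured_date"] > value:
            raise ValueError("Automobile cannot be registered prior to manufacture date.")
        return value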

Let's try deserializing and serializing:

In [9]:
car = Automobile.model_validate(data)
car

Automobile(id_=UUID('c4e60f4a-3c7f-4da5-9b3f-07aee50b23e7'), manufacturer='BMW', series_name='M4 Competition xDrive', type_=<AutomobileType.convertible: 'Convertible'>, is_electric=False, manufactured_date=datetime.date(2023, 1, 1), base_msrp_usd=93300.0, top_features=['6 cylinders', 'all-wheel drive', 'convertible'], vin='1234567890', number_of_doors=2, registration_country='United States of America', registration_date=datetime.date(2023, 6, 1), license_plate='AAA-BBB')

In [10]:
assert car.model_dump(by_alias=True) == expected_by_alias

In [11]:
assert car.model_dump_json(by_alias=True) == expected_json_by_alias

Let's test out the validation features: a registration date that's too early, and an unsupported country name:

In [12]:
bad_data = {
    "id": "c4e60f4a-3c7f-4da5-9b3f-07aee50b23e7",
    "manufacturer": "BMW",
    "seriesName": "M4 Competition xDrive",
    "type": "Convertible",
    "isElectric": False,
    "completionDate": "2023-01-01",
    "msrpUSD": 93_300,
    "topFeatures": ["6 cylinders", "all-wheel drive", "convertible"],
    "vin": "1234567890",
    "doors": 2,
    "registrationCountry": "Lunar Colony",
    "registrationDate": "2022-06-01",
    "licensePlate": "AAA-BBB"
}

In [13]:
from pydantic import ValidationError

try:
    Automobile.model_validate(bad_data)
except ValidationError as ex:
    print(ex)

2 validation errors for Automobile
registrationCountry
  Value error, Unknown country name. Country name must be one of: australia,britain,canada,china,france,germany,great britain,india,mexico,norway,pakistan,san marino,sanmarino,spain,sweden,uk,united kingdom,united states,us,usa [type=value_error, input_value='Lunar Colony', input_type=str]
    For further information visit https://errors.pydantic.dev/2.5/v/value_error
registrationDate
  Value error, Automobile cannot be registered prior to manufacture date. [type=value_error, input_value='2022-06-01', input_type=str]
    For further information visit https://errors.pydantic.dev/2.5/v/value_error


As a side note, so far we've always just printed out Pydantic's validation exceptions. But a Pydantic `ValidationError` also has a `.json()` method that returns all the validation errors as a JSON string (a JSON array containing one object per error). Structured error information like this is what frameworks such as FastAPI send back to API callers when validation fails.

This method also supports an `indent` parameter. Although we normally don't use it when returning JSON from an API (we try to keep the payload as compact as possible), here we want it so we can print a legible JSON object.
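A small sketch, using a hypothetical `Item` model, showing that `.json()` returns a string that can be parsed back with the standard `json` module; passing `include_url=False` drops the documentation link from each error:

```python
import json

from pydantic import BaseModel, ValidationError


# Hypothetical minimal model, just to produce a validation error
class Item(BaseModel):
    quantity: int


try:
    Item.model_validate({"quantity": "many"})
except ValidationError as ex:
    payload = ex.json(include_url=False)  # compact JSON text, no docs URL

print(type(payload).__name__)  # str
errors = json.loads(payload)
print(errors[0]["loc"], errors[0]["type"])  # ['quantity'] int_parsing
```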

In [14]:
try:
    Automobile.model_validate(bad_data)
except ValidationError as ex:
    exceptions = ex.json(indent=2)

In [15]:
print(exceptions)

[
  {
    "type": "value_error",
    "loc": [
      "registrationCountry"
    ],
    "msg": "Value error, Unknown country name. Country name must be one of: australia,britain,canada,china,france,germany,great britain,india,mexico,norway,pakistan,san marino,sanmarino,spain,sweden,uk,united kingdom,united states,us,usa",
    "input": "Lunar Colony",
    "ctx": {
      "error": "Unknown country name. Country name must be one of: australia,britain,canada,china,france,germany,great britain,india,mexico,norway,pakistan,san marino,sanmarino,spain,sweden,uk,united kingdom,united states,us,usa"
    },
    "url": "https://errors.pydantic.dev/2.5/v/value_error"
  },
  {
    "type": "value_error",
    "loc": [
      "registrationDate"
    ],
    "msg": "Value error, Automobile cannot be registered prior to manufacture date.",
    "input": "2022-06-01",
    "ctx": {
      "error": "Automobile cannot be registered prior to manufacture date."
    },
    "url": "https://errors.pydantic.dev/2.5/v/value

Of course, you can also get it as a Python object instead if you prefer:

In [16]:
try:
    Automobile.model_validate(bad_data)
except ValidationError as ex:
    exceptions = ex.errors()

In [17]:
type(exceptions)

list

In [18]:
from pprint import pprint

pprint(exceptions)

[{'ctx': {'error': ValueError('Unknown country name. Country name must be one of: australia,britain,canada,china,france,germany,great britain,india,mexico,norway,pakistan,san marino,sanmarino,spain,sweden,uk,united kingdom,united states,us,usa')},
  'input': 'Lunar Colony',
  'loc': ('registrationCountry',),
  'msg': 'Value error, Unknown country name. Country name must be one of: '
         'australia,britain,canada,china,france,germany,great '
         'britain,india,mexico,norway,pakistan,san '
         'marino,sanmarino,spain,sweden,uk,united kingdom,united states,us,usa',
  'type': 'value_error',
  'url': 'https://errors.pydantic.dev/2.5/v/value_error'},
 {'ctx': {'error': ValueError('Automobile cannot be registered prior to manufacture date.')},
  'input': '2022-06-01',
  'loc': ('registrationDate',),
  'msg': 'Value error, Automobile cannot be registered prior to manufacture '
         'date.',
  'type': 'value_error',
  'url': 'https://errors.pydantic.dev/2.5/v/value_error'}]
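When returning errors to an API caller you often want something more compact than the full `errors()` list. One possible approach, sketched here with a hypothetical `Item` model and a hypothetical `summarize` helper, maps each error's location to its message:

```python
from pydantic import BaseModel, Field, ValidationError


# Hypothetical model with two constrained fields
class Item(BaseModel):
    name: str = Field(min_length=2)
    quantity: int = Field(ge=1)


def summarize(ex: ValidationError) -> dict[str, str]:
    # Hypothetical helper: join each loc tuple into a dotted path, keep only the message
    return {
        ".".join(str(part) for part in err["loc"]): err["msg"]
        for err in ex.errors()
    }


try:
    Item.model_validate({"name": "x", "quantity": 0})
except ValidationError as ex:
    summary = summarize(ex)

print(sorted(summary))  # ['name', 'quantity']
```

This is a simplification: two errors reported at the same location would overwrite each other in the resulting dictionary.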
