# Model Code Generators

As I mentioned earlier in the course, there are tools available that can generate Pydantic models for us. One of these tools is called `datamodel-code-generator`, and it is primarily a command-line tool.

This code generator is usually run just once: the generated Pydantic model code is then "pasted" into our application, and we don't run it again and again (of course, this depends on what exactly your app does).

The docs for this code generator can be found [here](https://github.com/koxudaxi/datamodel-code-generator/tree/main).

In this folder, you'll find a few additional files:
- a simple JSON schema file (`schema_1.json`)
- a more complicated JSON schema file (`schema_2.json`)
- an even more complicated JSON schema file (`schema_3.json`)
- a file containing some sample JSON formatted data (no schema, just data) (`rectangle_data.json`)
- a file containing some CSV data (`csv_data.csv`)

## JSON Schema - Example 1

The file `schema_1.json` is a pretty simple schema:

```JSON
{
  "$id": "https://example.com/person.schema.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Person",
  "type": "object",
  "properties": {
    "firstName": {
      "type": "string",
      "description": "The person's first name."
    },
    "lastName": {
      "type": "string",
      "description": "The person's last name."
    },
    "age": {
      "description": "Age in years which must be equal to or greater than zero.",
      "type": "integer",
      "minimum": 0
    }
  }
}
```

To generate a Pydantic model from this schema, we can use the following command line:
```sh
datamodel-codegen --input schema_1.json --input-file-type jsonschema --output-model-type pydantic_v2.BaseModel --target-python-version "3.12" --allow-population-by-field-name --field-constraints --snake-case-field
```


The resulting output is:

In [1]:
# generated by datamodel-codegen:
#   filename:  schema_1.json
#   timestamp: 2023-11-21T04:54:05+00:00

from __future__ import annotations

from typing import Optional

from pydantic import BaseModel, ConfigDict, Field


class Person(BaseModel):
    model_config = ConfigDict(
        populate_by_name=True,
    )
    first_name: Optional[str] = Field(
        None, alias='firstName', description="The person's first name."
    )
    last_name: Optional[str] = Field(
        None, alias='lastName', description="The person's last name."
    )
    age: Optional[int] = Field(
        None,
        description='Age in years which must be equal to or greater than zero.',
        ge=0,
    )

The `__future__` import of `annotations` is not actually needed in this particular case. It has to do with type hints supporting forward (or "postponed") references; more info is available in [PEP 563](https://peps.python.org/pep-0563/). It does not hurt to import it, though.
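
To see how the generated options behave in practice, here is a minimal sketch that reuses the `Person` model (descriptions dropped for brevity). The sample payload is made up for illustration; it is not part of the course files.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class Person(BaseModel):
    model_config = ConfigDict(populate_by_name=True)

    first_name: Optional[str] = Field(None, alias="firstName")
    last_name: Optional[str] = Field(None, alias="lastName")
    age: Optional[int] = Field(None, ge=0)


# Hypothetical sample payload using the schema's camelCase names.
payload = '{"firstName": "Isaac", "lastName": "Newton", "age": 84}'
from_json = Person.model_validate_json(payload)

# populate_by_name=True also allows the snake_case field names.
from_kwargs = Person(first_name="Isaac", last_name="Newton", age=84)
print(from_json == from_kwargs)

# by_alias=True restores the camelCase keys on the way out.
print(from_json.model_dump(by_alias=True))

# The schema's "minimum": 0 became ge=0, so a negative age is rejected.
error_type = None
try:
    Person(age=-1)
except ValidationError as ex:
    error_type = ex.errors()[0]["type"]
print(error_type)
```

So `--snake-case-field` gives us Pythonic field names, the aliases keep the model compatible with the original JSON, and `--allow-population-by-field-name` lets us use either naming style when creating instances.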

## JSON Schema - Example 2

In this second example, the JSON schema is slightly more complicated:

```JSON
{
  "$id": "https://example.com/complex-object.schema.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Complex Object",
  "type": "object",
  "properties": {
    "name": {
      "type": "string"
    },
    "age": {
      "type": "integer",
      "minimum": 0
    },
    "address": {
      "type": "object",
      "properties": {
        "street": {
          "type": "string"
        },
        "city": {
          "type": "string"
        },
        "state": {
          "type": "string"
        },
        "postalCode": {
          "type": "string",
          "pattern": "\\d{5}"
        }
      },
      "required": [
        "street",
        "city",
        "state",
        "postalCode"
      ]
    },
    "hobbies": {
      "type": "array",
      "items": {
        "type": "string"
      }
    }
  },
  "required": [
    "name",
    "age"
  ]
}
```

Let's run the code generator on it:

```shell
datamodel-codegen --input schema_2.json --input-file-type jsonschema --output-model-type pydantic_v2.BaseModel --target-python-version "3.12" --allow-population-by-field-name --field-constraints --snake-case-field
```

And this is the output:

In [2]:
# generated by datamodel-codegen:
#   filename:  schema_2.json
#   timestamp: 2023-11-21T04:56:59+00:00

from __future__ import annotations

from typing import List, Optional

from pydantic import BaseModel, ConfigDict, Field


class Address(BaseModel):
    model_config = ConfigDict(
        populate_by_name=True,
    )
    street: str
    city: str
    state: str
    postal_code: str = Field(..., alias='postalCode', pattern='\\d{5}')


class ComplexObject(BaseModel):
    model_config = ConfigDict(
        populate_by_name=True,
    )
    name: str
    age: int = Field(..., ge=0)
    address: Optional[Address] = None
    hobbies: Optional[List[str]] = None

And again, the code generator worked really well.
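
As a quick sanity check, the sketch below feeds some data through the generated models. The sample values are invented for illustration; the models are the same as the generated ones, just with a more compact `model_config`.

```python
from typing import List, Optional

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class Address(BaseModel):
    model_config = ConfigDict(populate_by_name=True)

    street: str
    city: str
    state: str
    postal_code: str = Field(..., alias="postalCode", pattern=r"\d{5}")


class ComplexObject(BaseModel):
    model_config = ConfigDict(populate_by_name=True)

    name: str
    age: int = Field(..., ge=0)
    address: Optional[Address] = None
    hobbies: Optional[List[str]] = None


# Hypothetical sample data; the nested dict is validated into an Address.
data = {
    "name": "Ada",
    "age": 36,
    "address": {
        "street": "12 Main St",
        "city": "Springfield",
        "state": "IL",
        "postalCode": "62704",
    },
}
obj = ComplexObject.model_validate(data)
print(type(obj.address).__name__, obj.address.postal_code, obj.hobbies)

# A postal code with fewer than five digits fails the pattern constraint.
bad = {**data, "address": {**data["address"], "postalCode": "1234"}}
error = None
try:
    ComplexObject.model_validate(bad)
except ValidationError as ex:
    error = ex.errors()[0]
print(error["loc"], error["type"])
```

The nested object became its own `Address` model, and the `required` arrays in the schema decided which fields have no default.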

By the way, if you are wondering what that `...` (ellipsis) is doing in the `Field` objects, it just means that the field is required and has no default defined. This is kind of a holdover from Pydantic V1, where optional fields were automatically assigned a default value (of `None`) unless that ellipsis was provided. This is no longer the case in Pydantic V2, as we saw earlier in the course, so we don't need to preserve the ellipsis:

In [3]:
class Address(BaseModel):
    model_config = ConfigDict(
        populate_by_name=True,
    )
    street: str
    city: str
    state: str
    postal_code: str = Field(alias='postalCode', pattern='\\d{5}')


class ComplexObject(BaseModel):
    model_config = ConfigDict(
        populate_by_name=True,
    )
    name: str
    age: int = Field(ge=0)
    address: Optional[Address] = None
    hobbies: Optional[List[str]] = None
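
To double-check that dropping the ellipsis changes nothing, here is a small sketch that asks Pydantic V2 whether a field is required. The three model names are made up purely for this comparison.

```python
from typing import Optional

from pydantic import BaseModel, Field


class WithEllipsis(BaseModel):
    age: int = Field(..., ge=0)


class WithoutEllipsis(BaseModel):
    age: int = Field(ge=0)


class WithDefault(BaseModel):
    age: Optional[int] = Field(None, ge=0)


# FieldInfo.is_required() reports whether a value must be supplied.
for model in (WithEllipsis, WithoutEllipsis, WithDefault):
    print(model.__name__, model.model_fields["age"].is_required())
# WithEllipsis True
# WithoutEllipsis True
# WithDefault False
```

With or without the ellipsis, the field is required; only an explicit default (such as `None`) makes it optional.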

## JSON Schema - Example 3

But even with a JSON schema, there are certain things that the code generator simply cannot handle.

In this third example (`schema_3.json`), we have this JSON schema:

```json
{
  "$id": "https://example.com/conditional-validation-if-else.schema.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Conditional Validation with If-Else",
  "type": "object",
  "properties": {
    "isMember": {
      "type": "boolean"
    },
    "membershipNumber": {
      "type": "string"
    }
  },
  "required": [
    "isMember"
  ],
  "if": {
    "properties": {
      "isMember": {
        "const": true
      }
    }
  },
  "then": {
    "properties": {
      "membershipNumber": {
        "type": "string",
        "minLength": 10,
        "maxLength": 10
      }
    }
  },
  "else": {
    "properties": {
      "membershipNumber": {
        "type": "string",
        "minLength": 5
      }
    }
  }
}
```

```shell
datamodel-codegen --input schema_3.json --input-file-type jsonschema --output-model-type pydantic_v2.BaseModel --target-python-version "3.12" --allow-population-by-field-name --field-constraints --snake-case-field
```

Running that command, we get this model:

In [4]:
# generated by datamodel-codegen:
#   filename:  schema_3.json
#   timestamp: 2023-11-21T05:32:20+00:00

from __future__ import annotations

from typing import Optional

from pydantic import BaseModel, ConfigDict, Field


class ConditionalValidationWithIfElse(BaseModel):
    model_config = ConfigDict(
        populate_by_name=True,
    )
    is_member: bool = Field(..., alias='isMember')
    membership_number: Optional[str] = Field(None, alias='membershipNumber')

The generated model obviously does not capture the entirety of what the schema defines.

Can we do it ourselves? How?

We would have to use custom validators to handle the conditions set out by the `if`/`then`/`else` part of the schema.

By now, you should be able to do this by yourself, but let's go through it together anyway.

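If we look at the conditional in the schema, the intent is clearly that a membership number is always provided, so I'll treat `membership_number` as a required field that cannot be `None`. (Strictly speaking, the schema never lists `membershipNumber` under `required`, so this is a slightly stricter reading than the schema itself.) So we'll need to fix that first (plus we don't need the ellipsis, and I'll choose a shorter model name):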

In [5]:
class MemberInfo(BaseModel):
    model_config = ConfigDict(
        populate_by_name=True,
    )
    is_member: bool = Field(alias='isMember')
    membership_number: str = Field(alias='membershipNumber')

Next, we need to add a custom validator to check that `membership_number` is exactly 10 characters if `is_member` is True, or at least `5` characters otherwise. So, this means that in any event, the membership number must be at least `5` characters - we'll add that to the field definition, and only check for `10` characters when `is_member` is `True`:

In [6]:
from pydantic import field_validator, ValidationInfo


class MemberInfo(BaseModel):
    model_config = ConfigDict(
        populate_by_name=True,
    )
    is_member: bool = Field(alias='isMember')
    membership_number: str = Field(alias='membershipNumber', min_length=5)

    @field_validator("membership_number")  # after validator
    @classmethod
    def validate_membership_number(cls, value: str, validation_info: ValidationInfo):
        print("validating membership_number...")
        # data only holds fields that were declared earlier and validated OK
        values = validation_info.data
        if "is_member" in values:
            # is_member validated OK, so we have a bool value for it
            is_member = values["is_member"]
            if is_member:
                if len(value) != 10:
                    raise ValueError("must be exactly 10 characters long when is_member is set to True")
        # either the check passed, or is_member failed validation (in which
        # case there is not much we can do), so just return value
        return value

Let's try our model out now:

In [7]:
MemberInfo(is_member=False, membership_number="12345")

validating membership_number...


MemberInfo(is_member=False, membership_number='12345')

In [8]:
MemberInfo(is_member=True, membership_number = "0123456789")

validating membership_number...


MemberInfo(is_member=True, membership_number='0123456789')

In [9]:
from pydantic import ValidationError

In [10]:
try:
    MemberInfo(is_member=False)
except ValidationError as ex:
    print(ex)

1 validation error for MemberInfo
membershipNumber
  Field required [type=missing, input_value={'is_member': False}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.5/v/missing


In [11]:
try:
    MemberInfo(is_member=False, membershipNumber="123")
except ValidationError as ex:
    print(ex)

1 validation error for MemberInfo
membershipNumber
  String should have at least 5 characters [type=string_too_short, input_value='123', input_type=str]
    For further information visit https://errors.pydantic.dev/2.5/v/string_too_short


In [12]:
try:
    MemberInfo(is_member=True, membershipNumber="123456789012345")
except ValidationError as ex:
    print(ex)

validating membership_number...
1 validation error for MemberInfo
membershipNumber
  Value error, must be exactly 10 characters long when is_member is set to True [type=value_error, input_value='123456789012345', input_type=str]
    For further information visit https://errors.pydantic.dev/2.5/v/value_error
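
One caveat with the field validator approach: `validation_info.data` only contains fields that were declared before the one being validated, so it relies on `is_member` coming first in the model. As an alternative sketch (a different technique from the one used above), a `model_validator` in `"after"` mode runs once all fields have validated and does not depend on field order:

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError, model_validator


class MemberInfo(BaseModel):
    model_config = ConfigDict(populate_by_name=True)

    is_member: bool = Field(alias="isMember")
    membership_number: str = Field(alias="membershipNumber", min_length=5)

    @model_validator(mode="after")
    def check_membership_number(self):
        # Runs only once every field has validated successfully.
        if self.is_member and len(self.membership_number) != 10:
            raise ValueError(
                "membership_number must be exactly 10 characters long "
                "when is_member is True"
            )
        return self


ok = MemberInfo(isMember=True, membershipNumber="0123456789")
print(ok)

error_type = None
try:
    MemberInfo(isMember=True, membershipNumber="123456")
except ValidationError as ex:
    error_type = ex.errors()[0]["type"]
print(error_type)
```

The trade-off is that the error is reported against the model as a whole rather than against the `membershipNumber` field.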


## Generating a Model from Actual JSON Data

Sometimes (actually more often than we would like), API developers don't bother specifying a JSON schema for their data models.

(Why, you ask? I don't know: laziness maybe, being uninformed, a couldn't-care-less attitude, etc. My take is that if you are creating a REST API, you specify JSON schemas for all the JSON that goes back and forth in your API. But that's just me...)

So, in those cases, what to do? We could do what we did earlier in the course: visually inspect the JSON data and try to come up with an appropriate Pydantic model.

And that is certainly the way I would do it. Let's see why I don't like the code generator for this particular use case (and it's not the generator's fault, it does its best, but sometimes there are subtleties in the data that, as humans, we can deduce, whereas the code generator cannot. At least for now, while we wait for our AI overlords to take over!)

In this folder you'll find a JSON file (`rectangle_data.json`) that contains this data:

```json
{
    "rectangles": [
        {
            "upperLeft": {"x": 10, "y": 20},
            "lowerRight": {"x": 100, "y": 50}
        },
        {
            "upperLeft": {"x": 10.5, "y": 20.5},
            "lowerRight": {"x": 100.5, "y": 50.5}
        }
    ]
}
```

Let's run the generator on it, this time with `json` as the input file type, and look at the output:
```shell
datamodel-codegen --input rectangle_data.json --input-file-type json --output-model-type pydantic_v2.BaseModel --target-python-version "3.12" --allow-population-by-field-name --field-constraints --snake-case-field
```

In [13]:
# generated by datamodel-codegen:
#   filename:  rectangle_data.json
#   timestamp: 2023-11-21T05:48:25+00:00

from __future__ import annotations

from typing import List

from pydantic import BaseModel, ConfigDict, Field


class UpperLeft(BaseModel):
    model_config = ConfigDict(
        populate_by_name=True,
    )
    x: float
    y: float


class LowerRight(BaseModel):
    model_config = ConfigDict(
        populate_by_name=True,
    )
    x: float
    y: float


class Rectangle(BaseModel):
    model_config = ConfigDict(
        populate_by_name=True,
    )
    upper_left: UpperLeft = Field(..., alias='upperLeft')
    lower_right: LowerRight = Field(..., alias='lowerRight')


class Model(BaseModel):
    model_config = ConfigDict(
        populate_by_name=True,
    )
    rectangles: List[Rectangle]

As you can see, the models are not that bad - and in fact, they would work just fine.

But look at the `UpperLeft` and `LowerRight` models - we can see that they are in fact exactly the same model - so we don't need to create two separate models, and can instead reference one common model for both the `upper_left` and `lower_right` fields (plus I'll remove the unneeded ellipses):

In [14]:
class Coordinates(BaseModel):
    model_config = ConfigDict(
        populate_by_name=True,
    )
    x: float
    y: float


class Rectangle(BaseModel):
    model_config = ConfigDict(
        populate_by_name=True,
    )
    upper_left: Coordinates = Field(alias='upperLeft')
    lower_right: Coordinates = Field(alias='lowerRight')


class Model(BaseModel):
    model_config = ConfigDict(
        populate_by_name=True,
    )
    rectangles: List[Rectangle]
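
To confirm the consolidated models still accept the original data, here is a minimal sketch that validates the same JSON shown above. The JSON is inlined so the example is standalone, and `model_config` is omitted on the models that have no aliases (a simplification on my part, not something the generator does).

```python
from typing import List

from pydantic import BaseModel, ConfigDict, Field


class Coordinates(BaseModel):
    x: float
    y: float


class Rectangle(BaseModel):
    model_config = ConfigDict(populate_by_name=True)

    upper_left: Coordinates = Field(alias="upperLeft")
    lower_right: Coordinates = Field(alias="lowerRight")


class Model(BaseModel):
    rectangles: List[Rectangle]


# Same content as rectangle_data.json, inlined to keep the example standalone.
raw = """
{
    "rectangles": [
        {"upperLeft": {"x": 10, "y": 20}, "lowerRight": {"x": 100, "y": 50}},
        {"upperLeft": {"x": 10.5, "y": 20.5}, "lowerRight": {"x": 100.5, "y": 50.5}}
    ]
}
"""

model = Model.model_validate_json(raw)
first = model.rectangles[0]
print(first.upper_left, first.lower_right)

# The integer coordinates in the first rectangle are converted to floats.
print(type(first.upper_left.x).__name__)
```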

## Generating a Model from a CSV File

In a prior video, we manually generated a model for the `pop_estimates.csv` file.

We can also generate a model using the code generator. But, as is often the case, the generated model is going to be relatively simplistic: none of the more specialized validators we wrote manually will be generated, because we wrote those based on an understanding of what the data means, something the code generator does not have.

```shell
datamodel-codegen --input pop_estimates.csv --input-file-type csv --output-model-type pydantic_v2.BaseModel --target-python-version "3.12" --allow-population-by-field-name --field-constraints --snake-case-field
```

In [15]:
# generated by datamodel-codegen:
#   filename:  pop_estimates.csv
#   timestamp: 2023-11-21T04:32:37+00:00

from __future__ import annotations

from pydantic import BaseModel, ConfigDict, Field


class Model(BaseModel):
    model_config = ConfigDict(
        populate_by_name=True,
    )
    geographic_area: str = Field(..., alias='Geographic Area')
    july_1__2001_estimate: str = Field(..., alias='July 1, 2001 Estimate')
    july_1__2000_estimate: str = Field(..., alias='July 1, 2000 Estimate')
    april_1__2000_population_estimates_base: str = Field(
        ..., alias='April 1, 2000 Population Estimates Base'
    )

As you can see, the generator just assigns all the columns to strings. Although the model is technically correct for the data file, it is not suitable for our purpose. We already covered how to create a model that was better suited for the data in this file.

And just to confirm that the code generator does not try to infer data types from the data in the CSV file, we can use the file `csv_data.csv`, which has completely clean data with various types.

After running the code generator:
```shell
datamodel-codegen --input csv_data.csv --input-file-type csv --output-model-type pydantic_v2.BaseModel --target-python-version "3.12" --allow-population-by-field-name --field-constraints --snake-case-field
```

we get this model:

In [16]:
# generated by datamodel-codegen:
#   filename:  csv_data.csv
#   timestamp: 2023-11-21T06:03:37+00:00

from __future__ import annotations

from pydantic import BaseModel, ConfigDict, Field


class Model(BaseModel):
    model_config = ConfigDict(
        populate_by_name=True,
    )
    field_bool: str = Field(..., alias='fieldBool')
    field_str: str = Field(..., alias='fieldStr')
    field_int: str = Field(..., alias='fieldInt')
    field_float: str = Field(..., alias='fieldFloat')
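
Since the generator types every column as `str`, the types have to be fixed by hand. The sketch below shows one way to do that, assuming the columns hold what their names suggest. The sample rows and the `Row` model name are hypothetical; the real `csv_data.csv` may contain different values.

```python
import csv
import io

from pydantic import BaseModel, ConfigDict, Field

# Hypothetical rows; the real csv_data.csv may contain different values.
SAMPLE_CSV = """fieldBool,fieldStr,fieldInt,fieldFloat
true,abc,10,1.5
false,def,20,2.5
"""


class Row(BaseModel):
    model_config = ConfigDict(populate_by_name=True)

    field_bool: bool = Field(alias="fieldBool")
    field_str: str = Field(alias="fieldStr")
    field_int: int = Field(alias="fieldInt")
    field_float: float = Field(alias="fieldFloat")


# csv.DictReader yields every value as a string; Pydantic's default (lax)
# mode converts those strings to the annotated types.
reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
rows = [Row.model_validate(record) for record in reader]

for row in rows:
    print(row)
```

The generated model is still a useful starting point here: the field names and aliases are kept, and only the type annotations change.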

## Conclusion

So, as with all code generators, your mileage will vary. The best advice I have, if you are not inclined to write your models by hand, is to try it out and see if what's generated is a good enough starting point.

Personally, I don't use the code generator much - far too often I end up having to rework the models too much, and find it easier to just start from scratch.

There are many, many options you can specify when running this code generator - I only covered a few here.

The best way I have found to see what those options are is to run this command:
```shell
datamodel-codegen --help
```

which gives us this:

```
usage: 
  datamodel-codegen [options]

Generate Python data models from schema definitions or structured data

Options:
  --http-headers HTTP_HEADER [HTTP_HEADER ...]
                        Set headers in HTTP requests to the remote host. (example:
                        "Authorization: Basic dXNlcjpwYXNz")
  --http-ignore-tls     Disable verification of the remote host's TLS certificate
  --input INPUT         Input file/directory (default: stdin)
  --input-file-type {auto,openapi,jsonschema,json,yaml,dict,csv}
                        Input file type (default: auto)
  --output OUTPUT       Output file (default: stdout)
  --output-model-type {pydantic.BaseModel,pydantic_v2.BaseModel,dataclasses.dataclass,typing.TypedDict,msgspec.Struct}
                        Output model type (default: pydantic.BaseModel)
  --url URL             Input file URL. `--input` is ignored when `--url` is used

Typing customization:
  --base-class BASE_CLASS
                        Base Class (default: pydantic.BaseModel)
  --enum-field-as-literal {all,one}
                        Parse enum field as literal. all: all enum field type are Literal. one:
                        field type is Literal when an enum has only one possible value
  --field-constraints   Use field constraints and not con* annotations
  --set-default-enum-member
                        Set enum members as default values for enum field
  --strict-types {str,bytes,int,float,bool} [{str,bytes,int,float,bool} ...]
                        Use strict types
  --use-annotated       Use typing.Annotated for Field(). Also, `--field-constraints` option will
                        be enabled.
  --use-generic-container-types
                        Use generic container types for type hinting (typing.Sequence,
                        typing.Mapping). If `--use-standard-collections` option is set, then
                        import from collections.abc instead of typing
  --use-non-positive-negative-number-constrained-types
                        Use the Non{Positive,Negative}{FloatInt} types instead of the
                        corresponding con* constrained types.
  --use-one-literal-as-default
                        Use one literal as default value for one literal field
  --use-standard-collections
                        Use standard collections for type hinting (list, dict)
  --use-subclass-enum   Define Enum class as subclass with field type when enum has type (int,
                        float, bytes, str)
  --use-union-operator  Use | operator for Union type (PEP 604).
  --use-unique-items-as-set
                        define field type as `set` when the field attribute has `uniqueItems`

Field customization:
  --capitalise-enum-members, --capitalize-enum-members
                        Capitalize field names on enum
  --empty-enum-field-name EMPTY_ENUM_FIELD_NAME
                        Set field name when enum value is empty (default: `_`)
  --field-extra-keys FIELD_EXTRA_KEYS [FIELD_EXTRA_KEYS ...]
                        Add extra keys to field parameters
  --field-extra-keys-without-x-prefix FIELD_EXTRA_KEYS_WITHOUT_X_PREFIX [FIELD_EXTRA_KEYS_WITHOUT_X_PREFIX ...]
                        Add extra keys with `x-` prefix to field parameters. The extra keys are
                        stripped of the `x-` prefix.
  --field-include-all-keys
                        Add all keys to field parameters
  --force-optional      Force optional for required fields
  --original-field-name-delimiter ORIGINAL_FIELD_NAME_DELIMITER
                        Set delimiter to convert to snake case. This option only can be used with
                        --snake-case-field (default: `_` )
  --remove-special-field-name-prefix
                        Remove field name prefix if it has a special meaning e.g. underscores
  --snake-case-field    Change camel-case field name to snake-case
  --special-field-name-prefix SPECIAL_FIELD_NAME_PREFIX
                        Set field name prefix when first character can't be used as Python field
                        name (default: `field`)
  --strip-default-none  Strip default None on fields
  --use-default         Use default value even if a field is required
  --use-default-kwarg   Use `default=` instead of a positional argument for Fields that have
                        default values.
  --use-field-description
                        Use schema description to populate field docstring

Model customization:
  --allow-extra-fields  Allow to pass extra fields, if this flag is not passed, extra fields are
                        forbidden.
  --allow-population-by-field-name
                        Allow population by field name
  --class-name CLASS_NAME
                        Set class name of root model
  --collapse-root-models
                        Models generated with a root-type field will be mergedinto the models
                        using that root-type model
  --disable-appending-item-suffix
                        Disable appending `Item` suffix to model name in an array
  --disable-timestamp   Disable timestamp on file headers
  --enable-faux-immutability
                        Enable faux immutability
  --enable-version-header
                        Enable package version on file headers
  --keep-model-order    Keep generated models' order
  --reuse-model         Re-use models on the field when a module has the model with the same
                        content
  --target-python-version {3.6,3.7,3.8,3.9,3.10,3.11,3.12}
                        target python version (default: 3.7)
  --use-schema-description
                        Use schema description to populate class docstring
  --use-title-as-name   use titles as class names of models

Template customization:
  --aliases ALIASES     Alias mapping file
  --custom-file-header CUSTOM_FILE_HEADER
                        Custom file header
  --custom-file-header-path CUSTOM_FILE_HEADER_PATH
                        Custom file header file path
  --custom-template-dir CUSTOM_TEMPLATE_DIR
                        Custom template directory
  --encoding ENCODING   The encoding of input and output (default: UTF-8)
  --extra-template-data EXTRA_TEMPLATE_DATA
                        Extra template data
  --use-double-quotes   Model generated with double quotes. Single quotes or your black config
                        skip_string_normalization value will be used without this option.
  --wrap-string-literal
                        Wrap string literal by using black `experimental-string-processing`
                        option (require black 20.8b0 or later)

OpenAPI-only options:
  --openapi-scopes {schemas,paths,tags,parameters} [{schemas,paths,tags,parameters} ...]
                        Scopes of OpenAPI model generation (default: schemas)
  --strict-nullable     Treat default field as a non-nullable field (Only OpenAPI)
  --use-operation-id-as-name
                        use operation id of OpenAPI as class names of models
  --validation          Deprecated: Enable validation (Only OpenAPI). this option is deprecated.
                        it will be removed in future releases

General options:
  --debug               show debug message (require "debug". `$ pip install 'datamodel-code-
                        generator[debug]'`)
  --disable-warnings    disable warnings
  --no-color            disable colorized output
  --version             show version
  -h, --help            show this help message and exit
```
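
Every command in this section used the same set of flags, so if you do end up running the generator more than once, a small wrapper can save some typing. This is only a sketch: the helper names and the `person_model.py` output file are made up, and it uses only flags that appear in the help output above.

```python
import shutil
import subprocess
import sys

# Flags shared by every command in this section.
COMMON_FLAGS = [
    "--output-model-type", "pydantic_v2.BaseModel",
    "--target-python-version", "3.12",
    "--allow-population-by-field-name",
    "--field-constraints",
    "--snake-case-field",
]


def build_command(input_path: str, input_file_type: str, output_path: str) -> list[str]:
    """Assemble the datamodel-codegen command line for one input file."""
    return [
        "datamodel-codegen",
        "--input", input_path,
        "--input-file-type", input_file_type,
        "--output", output_path,
        *COMMON_FLAGS,
    ]


def generate_model(input_path: str, input_file_type: str, output_path: str) -> bool:
    """Run the generator; return False if the tool is not installed."""
    if shutil.which("datamodel-codegen") is None:
        print("datamodel-codegen not found on PATH", file=sys.stderr)
        return False
    subprocess.run(build_command(input_path, input_file_type, output_path), check=True)
    return True


# Show the command that would be run for the first schema
# (person_model.py is a hypothetical output file name).
command = build_command("schema_1.json", "jsonschema", "person_model.py")
print(" ".join(command))
```

Calling `generate_model("schema_1.json", "jsonschema", "person_model.py")` would then write the generated code to a file instead of printing it to the console.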