To see Pydantic at work, let's start with a simple example, creating a custom class that inherits from `BaseModel`:

In [24]:
from datetime import datetime
from pydantic import BaseModel, ValidationError, constr
import pandas as pd

class Business(BaseModel):
    id: int
    business_name: str = 'Unregistered Business'
    registration_ts: datetime | None
    business_number: constr(pattern=r'^\d{9}$')
    naics_code: constr(pattern=r'^\d{6}$')

### Understanding Our Business Model Schema
Our Pydantic model defines a business entity with the following fields:

| Field | Type | Required | Validation |
|-------|------|----------|------------|
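| id | int | Yes | Coerces numeric strings and whole-number floats to int |
| business_name | str | No | Default: "Unregistered Business" |
| registration_ts | datetime \| None | Yes (may be `None`) | Accepts a datetime, a datetime string, or a Unix timestamp; no default is set, so the key must be present |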
| business_number | constr | Yes | Must be 9 digits |
| naics_code | constr | Yes | Must be 6 digits |


This schema ensures data consistency by:
- Converting input types where possible (e.g., string to int for ID)
- Enforcing specific formats for business and NAICS codes
- Providing flexible datetime handling
- Making certain fields optional with sensible defaults

Now create an instance of `Business` by unpacking our external data into its constructor:

In [17]:
# Example 1: Valid business data
external_data = {
    'id': 123,
    'business_name': 'Hanan Ather Pharmacy',
    'registration_ts': '2023-01-15 09:30',
    'business_number': '123456789',
    'naics_code': '446110'  # Actual NAICS code for pharmacies
}
business = Business(**external_data) 
print(business.model_dump()) # convert the model to dictionary with `model_dump`
print('='*100)
pd.DataFrame([Business(**external_data).model_dump()])

{'id': 123, 'business_name': 'Hanan Ather Pharmacy', 'registration_ts': datetime.datetime(2023, 1, 15, 9, 30), 'business_number': '123456789', 'naics_code': '446110'}


Unnamed: 0,id,business_name,registration_ts,business_number,naics_code
0,123,Hanan Ather Pharmacy,2023-01-15 09:30:00,123456789,446110
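
Note that `model_dump` keeps `registration_ts` as a `datetime` object. When the result has to be JSON-safe, Pydantic v2 also offers `model_dump(mode='json')` and `model_dump_json()`. The sketch below uses a trimmed-down, hypothetical `Registration` model to keep the example short:

```python
import json
from datetime import datetime
from typing import Optional

from pydantic import BaseModel


class Registration(BaseModel):  # hypothetical two-field stand-in
    id: int
    registration_ts: Optional[datetime] = None


reg = Registration(id=123, registration_ts='2023-01-15 09:30')

python_dump = reg.model_dump()           # datetime object is preserved
json_dump = reg.model_dump(mode='json')  # datetime becomes an ISO 8601 string
json_text = reg.model_dump_json()        # the whole model as a JSON string

# Round trip: parse the JSON text back into a validated model.
restored = Registration.model_validate_json(json_text)
```

The JSON-mode dump is usually the safer choice when the data is headed for a file or an API rather than a DataFrame.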


Continuing the example above, the next record omits `business_name`, so the default value is used:

In [25]:
optional_field_example = {
    'id': 456,  # Required field is present
    'registration_ts': '2023-06-01 12:22',
    'business_number': '987654321',
    'naics_code': '484110'
    # Notice: business_name is not provided, but that's OK because it has a default
}

try:
    business = Business(**optional_field_example)
except ValidationError as e:
    print(e.errors())
else:
    # Only report the model when validation succeeded
    print(business.model_dump())  # convert the model to a dictionary with `model_dump`
    print('='*100)
pd.DataFrame([Business(**optional_field_example).model_dump()])

{'id': 456, 'business_name': 'Unregistered Business', 'registration_ts': datetime.datetime(2023, 6, 1, 12, 22), 'business_number': '987654321', 'naics_code': '484110'}


Unnamed: 0,id,business_name,registration_ts,business_number,naics_code
0,456,Unregistered Business,2023-06-01 12:22:00,987654321,484110


In [26]:
naics_example = {
    'id': 123,
    'registration_ts': '2023-01-15 09:30',
    'business_number': '123456789',
    'naics_code': '4841'  # Invalid: NAICS must be exactly 6 digits
}
try:
    business = Business(**naics_example)  
except ValidationError as e:
    print(e.errors())

[{'type': 'string_pattern_mismatch', 'loc': ('naics_code',), 'msg': "String should match pattern '^\\d{6}$'", 'input': '4841', 'ctx': {'pattern': '^\\d{6}$'}, 'url': 'https://errors.pydantic.dev/2.10/v/string_pattern_mismatch'}]


In [28]:
required_fields_example = {
    'business_name': 'Serge Godbout Trucking', 
    'registration_ts': '2023-06-01 12:22',
    'business_number': '987654321',
    'naics_code': '484110'
    # Missing 'id' field which is required
}
try:
    business = Business(**required_fields_example)  
except ValidationError as e:
    print(e.errors())

[{'type': 'missing', 'loc': ('id',), 'msg': 'Field required', 'input': {'business_name': 'Serge Godbout Trucking', 'registration_ts': '2023-06-01 12:22', 'business_number': '987654321', 'naics_code': '484110'}, 'url': 'https://errors.pydantic.dev/2.10/v/missing'}]


# Advanced Pydantic Model Features

#### 1. The `Field` Function
`Field` is Pydantic's way of attaching extra validation rules and metadata to a model field. Passing `...` (Ellipsis) as the first argument marks the field as required, and the remaining keyword arguments add constraints or documentation:
```python
id: int = Field(..., description="Business identifier")
```
- `...` (Ellipsis) marks the field as required
- `description` documents the field and is included in the generated JSON schema
- This validates exactly like a bare `id: int` but is more self-documenting
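
To see where the `description` ends up, the generated JSON schema can be inspected. This is a small sketch assuming Pydantic v2; `BusinessId` is a hypothetical one-field model used only for illustration:

```python
from pydantic import BaseModel, Field


class BusinessId(BaseModel):  # hypothetical one-field model
    id: int = Field(..., description="Business identifier")


schema = BusinessId.model_json_schema()
id_schema = schema['properties']['id']  # schema fragment for the `id` field
```

The fragment carries the description alongside the inferred type, and `id` appears in the schema's `required` list.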

#### 2. Field Constraints
Fields can have multiple validation rules:

```python
business_name: str = Field(
    default='Unregistered Business',  # Default value if none provided
    min_length=1,                     # Can't be empty
    max_length=100,                   # Maximum length
    description="Name of the business"
)
```
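
A quick way to confirm these constraints is to feed the field values that break them. The sketch below assumes Pydantic v2; `NamedBusiness` and `first_error_type` are hypothetical names introduced for the demonstration:

```python
from pydantic import BaseModel, Field, ValidationError


class NamedBusiness(BaseModel):  # hypothetical one-field model
    business_name: str = Field(
        default='Unregistered Business',
        min_length=1,
        max_length=100,
        description="Name of the business",
    )


def first_error_type(**data):
    """Return the type of the first validation error, or None if valid."""
    try:
        NamedBusiness(**data)
    except ValidationError as e:
        return e.errors()[0]['type']
    return None


empty_name = first_error_type(business_name='')        # violates min_length
long_name = first_error_type(business_name='x' * 101)  # violates max_length
default_name = NamedBusiness().business_name           # default is applied
```

Each constraint produces its own error type, which makes it easy to tell a too-short name from a too-long one when handling errors programmatically.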

#### 3. Optional Fields
For optional fields, we combine Python's typing with Field:

```python
registration_ts: Optional[datetime] = Field(
    None,  # Default value is None
    description="Registration timestamp in YYYY-MM-DD HH:MM format"
)
```
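
The explicit `None` default matters: in Pydantic v2, `Optional[...]` on its own only means the value may be `None`, not that the field may be left out. Here is a minimal sketch of the difference, using two hypothetical models:

```python
from datetime import datetime
from typing import Optional

from pydantic import BaseModel, ValidationError


class NullableButRequired(BaseModel):
    registration_ts: Optional[datetime]          # no default: key must be present


class NullableWithDefault(BaseModel):
    registration_ts: Optional[datetime] = None   # default: key may be omitted


explicit_none = NullableButRequired(registration_ts=None)  # accepted
omitted = NullableWithDefault()                            # accepted

# Omitting the key entirely fails when no default is declared.
try:
    NullableButRequired()
except ValidationError as e:
    missing_error = e.errors()[0]['type']
```

Only the model with a declared default accepts input that omits the field altogether.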

#### 4. Custom Validators
The `@field_validator` decorator (the Pydantic v2 replacement for the deprecated v1 `@validator`) allows custom validation logic:
```python
    @field_validator('business_number')
    @classmethod
    def validate_business_number(cls, v: str) -> str:
        if not v.isdigit() or len(v) != 9:
            raise ValueError("Business number must be exactly 9 digits")
        return v
```
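
Custom validators can also run before Pydantic's own type check, which is useful for normalizing raw input. The following is a sketch assuming Pydantic v2's `field_validator` with `mode='before'`. The model `BusinessNumberOnly` and the normalization rules (accepting integers, stripping spaces and hyphens) are illustrative assumptions rather than requirements from the notebook:

```python
from pydantic import BaseModel, ValidationError, field_validator


class BusinessNumberOnly(BaseModel):  # hypothetical one-field model
    business_number: str

    @field_validator('business_number', mode='before')
    @classmethod
    def normalize_business_number(cls, v):
        # Runs before the str type check: accept ints, drop spaces and hyphens.
        if isinstance(v, int) and not isinstance(v, bool):
            v = str(v)
        if isinstance(v, str):
            v = v.replace(' ', '').replace('-', '')
        return v

    @field_validator('business_number')
    @classmethod
    def validate_business_number(cls, v: str) -> str:
        # Runs after the str type check, on the normalized value.
        if not v.isdigit() or len(v) != 9:
            raise ValueError("Business number must be exactly 9 digits")
        return v


from_int = BusinessNumberOnly(business_number=123456789)
from_hyphenated = BusinessNumberOnly(business_number='123-456-789')

try:
    BusinessNumberOnly(business_number='12345')
    too_short_rejected = False
except ValidationError:
    too_short_rejected = True
```

Keep in mind that integers cannot carry leading zeros, so a number such as `012345678` stored as an integer would still fail the length check.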



In [39]:
from pydantic import BaseModel, Field, ValidationError, field_validator
from typing import Optional
from datetime import datetime

class Business(BaseModel):
    id: int = Field(..., description="Business identifier")
    business_name: str = Field(
        default='Unregistered Business',
        min_length=1,
        max_length=100,
        description="Name of the business"
    )
    registration_ts: Optional[datetime] = Field(
        None,
        description="Registration timestamp in YYYY-MM-DD HH:MM format"
    )
    business_number: str = Field(
        ...,
        description="9-digit business registration number"
    )
    naics_code: str = Field(
        ...,
        description="6-digit NAICS industry code"
    )

    @field_validator('business_number')
    @classmethod
    def validate_business_number(cls, v: str) -> str:
        if not isinstance(v, str):
            raise ValueError("Business number must be a string")
        if not v.isdigit():
            raise ValueError("Business number must contain only digits")
        if len(v) != 9:
            raise ValueError("Business number must be exactly 9 digits")
        return v

    @field_validator('naics_code')
    @classmethod
    def validate_naics_code(cls, v: str) -> str:
        if not isinstance(v, str):
            raise ValueError("NAICS code must be a string")
        if not v.isdigit():
            raise ValueError("NAICS code must contain only digits")
        if len(v) != 6:
            raise ValueError("NAICS code must be exactly 6 digits")
        
        valid_sectors = {'11', '21', '22', '23', '31', '32', '33', '42', 
                        '44', '45', '48', '49', '51', '52', '53', '54', 
                        '55', '56', '61', '62', '71', '72', '81', '92'}
        sector = v[:2]
        if sector not in valid_sectors:
            raise ValueError(f"Invalid NAICS sector code '{sector}'. Must start with one of: {', '.join(sorted(valid_sectors))}")
        return v

In [40]:
# Example 1: Valid business data
external_data = {
    'id': 123,
    'business_name': 'Hanan Ather Pharmacy',
    'registration_ts': '2023-01-15 09:30',
    'business_number': '123456789',
    'naics_code': '446110'  # Actual NAICS code for pharmacies
}
business = Business(**external_data) 
print(business.model_dump()) # convert the model to dictionary with `model_dump`
print('='*100)
pd.DataFrame([Business(**external_data).model_dump()])

{'id': 123, 'business_name': 'Hanan Ather Pharmacy', 'registration_ts': datetime.datetime(2023, 1, 15, 9, 30), 'business_number': '123456789', 'naics_code': '446110'}


Unnamed: 0,id,business_name,registration_ts,business_number,naics_code
0,123,Hanan Ather Pharmacy,2023-01-15 09:30:00,123456789,446110


In [32]:
# Test 2: Optional field omitted
print("\nTest 2: Optional field omitted")
test_business_data(optional_field_example, "Business with optional field omitted")



Test 2: Optional field omitted

Testing Business with optional field omitted:
Success! Model dump:
{'id': 456, 'business_name': 'Unregistered Business', 'registration_ts': datetime.datetime(2023, 6, 1, 12, 22), 'business_number': '987654321', 'naics_code': '484110'}

As DataFrame:
    id          business_name     registration_ts business_number naics_code
0  456  Unregistered Business 2023-06-01 12:22:00       987654321     484110


In [33]:
# Test 3: Invalid NAICS code
print("\nTest 3: Invalid NAICS code")
test_business_data(naics_example, "Business with invalid NAICS code")



Test 3: Invalid NAICS code

Testing Business with invalid NAICS code:
Validation Error:
- Field 'naics_code': String should match pattern '^\d{6}$'


In [34]:
# Test 4: Missing required field
print("\nTest 4: Missing required field")
test_business_data(required_fields_example, "Business with missing required field")


Test 4: Missing required field

Testing Business with missing required field:
Validation Error:
- Field 'id': Field required


In [41]:
optional_field_example = {
    'id': 456,  # Required field is present
    'registration_ts': '2023-06-01 12:22',
    'business_number': '987654321',
    'naics_code': '484110'
    # Notice: business_name is not provided, but that's OK because it has a default
}

try:
    business = Business(**optional_field_example)
except ValidationError as e:
    print(e.errors())
else:
    # Only report the model when validation succeeded
    print(business.model_dump())  # convert the model to a dictionary with `model_dump`
    print('='*100)
pd.DataFrame([Business(**optional_field_example).model_dump()])

{'id': 456, 'business_name': 'Unregistered Business', 'registration_ts': datetime.datetime(2023, 6, 1, 12, 22), 'business_number': '987654321', 'naics_code': '484110'}


Unnamed: 0,id,business_name,registration_ts,business_number,naics_code
0,456,Unregistered Business,2023-06-01 12:22:00,987654321,484110


In [42]:
naics_example = {
    'id': 123,
    'registration_ts': '2023-01-15 09:30',
    'business_number': '123456789',
    'naics_code': '4841'  # Invalid: NAICS must be exactly 6 digits
}
try:
    business = Business(**naics_example)  
except ValidationError as e:
    print(e.errors())

[{'type': 'value_error', 'loc': ('naics_code',), 'msg': 'Value error, NAICS code must be exactly 6 digits', 'input': '4841', 'ctx': {'error': ValueError('NAICS code must be exactly 6 digits')}, 'url': 'https://errors.pydantic.dev/2.10/v/value_error'}]


In [44]:
# Business number too long (11 digits)
test1 = {
    'id': 123,
    'business_name': 'Test Business',
    'business_number': '12345678901',  # 11 digits
    'naics_code': '446110'
}

try:
    business = Business(**test1)
except ValidationError as e:
    print(e)


1 validation error for Business
business_number
  Value error, Business number must be exactly 9 digits [type=value_error, input_value='12345678901', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/value_error


In [45]:
# Business number with non-digits
test2 = {
    'id': 123,
    'business_name': 'Test Business',
    'business_number': '123abc456',  # contains letters
    'naics_code': '446110'
}

try:
    business = Business(**test2)
except ValidationError as e:
    print(e)

1 validation error for Business
business_number
  Value error, Business number must contain only digits [type=value_error, input_value='123abc456', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/value_error


In [46]:
# Test case 3: Invalid NAICS sector
test3 = {
    'id': 123,
    'business_name': 'Test Business',
    'business_number': '123456789',
    'naics_code': '996110'  # starts with 99, which isn't a valid sector
}

try:
    business = Business(**test3)
except ValidationError as e:
    print(e)

1 validation error for Business
naics_code
  Value error, Invalid NAICS sector code '99'. Must start with one of: 11, 21, 22, 23, 31, 32, 33, 42, 44, 45, 48, 49, 51, 52, 53, 54, 55, 56, 61, 62, 71, 72, 81, 92 [type=value_error, input_value='996110', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/value_error
