In [1]:
!python --version

Python 3.12.6


# Pydantic: Simplifying Data Validation in Python

Pydantic provides four ways to create schemas and perform validation and serialization:

* **BaseModel**: Pydantic's own base class, with many common utilities available as instance methods.
* **Pydantic dataclasses**: a wrapper around standard dataclasses that adds validation.
* **TypeAdapter**: a general way to adapt any type for validation and serialization. This lets you validate types such as TypedDict and NamedTuple, as well as simple types like int or timedelta; every type Pydantic supports can be used with TypeAdapter.
* **validate_call**: a decorator that validates a function's arguments when the function is called.
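A minimal sketch of the three entry points other than BaseModel, assuming Pydantic v2 (this notebook runs 2.9.2). The names `Point`, `Movie`, and `double` are made up for illustration.

```python
from datetime import timedelta
from typing import NamedTuple

from pydantic import TypeAdapter, validate_call
from pydantic.dataclasses import dataclass


@dataclass
class Point:
    # A Pydantic dataclass: standard dataclass syntax plus validation
    x: int
    y: int


class Movie(NamedTuple):
    title: str
    year: int


# TypeAdapter validates types that are not BaseModel subclasses
movie = TypeAdapter(Movie).validate_python(("Alien", "1979"))
delta = TypeAdapter(timedelta).validate_python(90)  # a plain number is read as seconds


@validate_call
def double(n: int) -> int:
    # Arguments are validated (and coerced) before the body runs
    return n * 2


point = Point(x="1", y=2)  # "1" is coerced to 1 in the default (lax) mode
print(point, movie, delta, double("3"))
```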

In [9]:
import pydantic

print(pydantic.__version__)

2.9.2


# Example

In [16]:
# Needed because EmailStr relies on the optional email-validator package; without it the next cell fails
!pip install "pydantic[email]" -q

In [189]:
from datetime import date
from uuid import UUID, uuid4
from enum import Enum
from pydantic import BaseModel, EmailStr


class Department(Enum):
    HR = "HR"
    SALES = "SALES"
    IT = "IT"
    ENGINEERING = "ENGINEERING"


class Employee(BaseModel):
    employee_id: UUID = uuid4()
    name: str
    email: EmailStr
    date_of_birth: date
    salary: float
    department: Department
    elected_benefits: bool

* **employee_id**: The UUID of the employee whose information you want to store. Because of the UUID annotation, __Pydantic ensures this field is always a valid UUID__. Note that the default `uuid4()` is evaluated only once, when the class is defined, so every Employee created without an explicit employee_id gets the *same* UUID (the same value shows up in several outputs below). To generate a fresh UUID per instance, use `Field(default_factory=uuid4)`, as in the Fields section.
* **name**: The employee’s name, which Pydantic expects to be a string.
* **email**: __Pydantic validates each employee email using the email-validator package under the hood__ (installed with the `pydantic[email]` extra).
* **date_of_birth**: Each employee’s date of birth must be a valid date, as annotated by date from Python’s datetime module. If you pass a string into date_of_birth, **Pydantic will attempt to parse and convert it to a date object**.
* **salary**: The employee’s salary, which is expected to be a float.
* **department**: Each employee’s department must be one of HR, SALES, IT, or ENGINEERING, as defined in your Department enum.
* **elected_benefits**: Whether the employee has elected benefits; Pydantic expects a Boolean.
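The default-value pitfall can be reproduced in isolation. A minimal sketch with hypothetical class names: a plain default is evaluated once when the class body runs, while `default_factory` is called for every new instance.

```python
from uuid import UUID, uuid4

from pydantic import BaseModel, Field


class SharedDefault(BaseModel):
    # uuid4() runs once, when the class body executes
    item_id: UUID = uuid4()


class FreshDefault(BaseModel):
    # uuid4 is called again for every new instance
    item_id: UUID = Field(default_factory=uuid4)


a, b = SharedDefault(), SharedDefault()
c, d = FreshDefault(), FreshDefault()
print(a.item_id == b.item_id)  # True: both reuse the class-level UUID
print(c.item_id == d.item_id)  # False: each instance gets its own UUID
```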

### Pydantic valid data

In [190]:
# Pass in exactly the data the model expects

new_entity = Employee(
    name="Sasha Gau",
    email="cdetuma@example.com",
    date_of_birth="1998-04-02",
    salary=123_000.00,
    department="IT",
    elected_benefits=True,
)

print(type(new_entity), new_entity, sep="\n")

<class '__main__.Employee'>
employee_id=UUID('975af7cd-9063-4ce2-8bcd-cbe5e6a79152') name='Sasha Gau' email='cdetuma@example.com' date_of_birth=datetime.date(1998, 4, 2) salary=123000.0 department=<Department.IT: 'IT'> elected_benefits=True


In [191]:
print(new_entity.name)

Sasha Gau


It ran without complaint. Now let's feed the model some garbage, the way a frontend might.

### Pydantic invalid data

In [192]:
try:
    Employee(
        employee_id="123",
        name=("Sasha not gau" == True),
        email="cdetumaexamplecom",
        date_of_birth="1939804-02",
        salary="high paying",
        department="PRODUCT",
        elected_benefits=300,
    )
except Exception as e:
    print(f"Error processing new employee. Trace ID: __ \nError: {e}")

    raise

Error processing new employee. Trace ID: __ 
Error: 7 validation errors for Employee
employee_id
  Input should be a valid UUID, invalid length: expected length 32 for simple format, found 3 [type=uuid_parsing, input_value='123', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/uuid_parsing
name
  Input should be a valid string [type=string_type, input_value=False, input_type=bool]
    For further information visit https://errors.pydantic.dev/2.9/v/string_type
email
  value is not a valid email address: An email address must have an @-sign. [type=value_error, input_value='cdetumaexamplecom', input_type=str]
date_of_birth
  Input should be a valid date or datetime, invalid date separator, expected `-` [type=date_from_datetime_parsing, input_value='1939804-02', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/date_from_datetime_parsing
salary
  Input should be a valid number, unable to parse string as a number [type=float

ValidationError: 7 validation errors for Employee
employee_id
  Input should be a valid UUID, invalid length: expected length 32 for simple format, found 3 [type=uuid_parsing, input_value='123', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/uuid_parsing
name
  Input should be a valid string [type=string_type, input_value=False, input_type=bool]
    For further information visit https://errors.pydantic.dev/2.9/v/string_type
email
  value is not a valid email address: An email address must have an @-sign. [type=value_error, input_value='cdetumaexamplecom', input_type=str]
date_of_birth
  Input should be a valid date or datetime, invalid date separator, expected `-` [type=date_from_datetime_parsing, input_value='1939804-02', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/date_from_datetime_parsing
salary
  Input should be a valid number, unable to parse string as a number [type=float_parsing, input_value='high paying', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/float_parsing
department
  Input should be 'HR', 'SALES', 'IT' or 'ENGINEERING' [type=enum, input_value='PRODUCT', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/enum
elected_benefits
  Input should be a valid boolean, unable to interpret input [type=bool_parsing, input_value=300, input_type=int]
    For further information visit https://errors.pydantic.dev/2.9/v/bool_parsing

**As you can see, by default the model complains whenever it receives data it doesn't expect, and it reports an error for every invalid field.**

### Error Handling

**The errors Pydantic raises can be inspected and customized per field; see the [error handling docs](https://docs.pydantic.dev/latest/errors/errors/).**

In [193]:
from pydantic import ValidationError


try:
    Employee(
        employee_id="123",
        name=("Sasha not gau" == True),
        email="cdetumaexamplecom",
        date_of_birth="1939804-02",
        salary="high paying",
        department="PRODUCT",
        elected_benefits=300,
    )
except ValidationError as e:
    for er in e.errors():
        print(er)

{'type': 'uuid_parsing', 'loc': ('employee_id',), 'msg': 'Input should be a valid UUID, invalid length: expected length 32 for simple format, found 3', 'input': '123', 'ctx': {'error': 'invalid length: expected length 32 for simple format, found 3'}, 'url': 'https://errors.pydantic.dev/2.9/v/uuid_parsing'}
{'type': 'string_type', 'loc': ('name',), 'msg': 'Input should be a valid string', 'input': False, 'url': 'https://errors.pydantic.dev/2.9/v/string_type'}
{'type': 'value_error', 'loc': ('email',), 'msg': 'value is not a valid email address: An email address must have an @-sign.', 'input': 'cdetumaexamplecom', 'ctx': {'reason': 'An email address must have an @-sign.'}}
{'type': 'date_from_datetime_parsing', 'loc': ('date_of_birth',), 'msg': 'Input should be a valid date or datetime, invalid date separator, expected `-`', 'input': '1939804-02', 'ctx': {'error': 'invalid date separator, expected `-`'}, 'url': 'https://errors.pydantic.dev/2.9/v/date_from_datetime_parsing'}
{'type': 'flo
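One way to customize messages is to post-process `e.errors()` and swap the `msg` of each error by its `type`. A sketch of that pattern, assuming a hypothetical `OrderLine` model and message table; the error types `int_parsing` and `float_parsing` are the ones Pydantic reports for unparseable numbers.

```python
from pydantic import BaseModel, ValidationError

# Hypothetical mapping from Pydantic error types to friendlier messages
CUSTOM_MESSAGES = {
    "int_parsing": "Quantity must be a whole number",
    "float_parsing": "Price must be a number",
}


class OrderLine(BaseModel):
    qty: int
    price: float


def humanize_errors(exc: ValidationError, messages: dict) -> list:
    """Return exc.errors() with 'msg' replaced wherever a custom message exists."""
    errors = []
    for error in exc.errors():
        custom = messages.get(error["type"])
        if custom:
            ctx = error.get("ctx")
            # Messages may reference context values, e.g. "{min_length}"
            error["msg"] = custom.format(**ctx) if ctx else custom
        errors.append(error)
    return errors


friendly = []
try:
    OrderLine(qty="many", price="cheap")
except ValidationError as exc:
    friendly = humanize_errors(exc, CUSTOM_MESSAGES)

for err in friendly:
    print(err["loc"], err["msg"])
```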

### Custom Error

In [194]:
from pydantic import BaseModel, ValidationError, field_validator


class Model(BaseModel):
    foo: str

    @field_validator("foo")
    @classmethod
    def value_must_equal_bar(cls, v: str) -> str:
        if v != "bar":
            raise ValueError('value must be "bar"')
        return v

In [195]:
try:
    Model(foo="ber")
except ValidationError as e:
    print(e)
    print(e.errors())
    print(e.errors()[0]["ctx"]["error"])

1 validation error for Model
foo
  Value error, value must be "bar" [type=value_error, input_value='ber', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/value_error
[{'type': 'value_error', 'loc': ('foo',), 'msg': 'Value error, value must be "bar"', 'input': 'ber', 'ctx': {'error': ValueError('value must be "bar"')}, 'url': 'https://errors.pydantic.dev/2.9/v/value_error'}]
value must be "bar"
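Raising `ValueError` always yields the generic `value_error` type. If you want your own error type (for example, to map it to a message or HTTP code later), `pydantic_core.PydanticCustomError` takes a type name, a message template, and a context dict. A sketch with a hypothetical `StrictBar` model:

```python
from pydantic import BaseModel, ValidationError, field_validator
from pydantic_core import PydanticCustomError


class StrictBar(BaseModel):
    foo: str

    @field_validator("foo")
    @classmethod
    def value_must_equal_bar(cls, v: str) -> str:
        if v != "bar":
            # Custom error type plus a message template filled from the context dict
            raise PydanticCustomError(
                "not_bar", 'value must be "bar", got {value}', {"value": v}
            )
        return v


error = None
try:
    StrictBar(foo="ber")
except ValidationError as e:
    error = e.errors()[0]

print(error["type"], error["msg"])
```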


### Get data from dictionary

Pydantic’s BaseModel is equipped with a suite of methods that make it easy to create models from other objects, such as dictionaries and JSON.
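For plain field names, keyword unpacking, `.model_validate()`, and `.model_validate_json()` should all produce equal models from the same data. A small sketch with a hypothetical `Candidate` model (no `EmailStr`, so it runs without the email extra). One practical difference: `Candidate(**raw)` raises a plain `TypeError` if `raw` is not a mapping, whereas `.model_validate()` reports that as a `ValidationError`.

```python
import json
from datetime import date

from pydantic import BaseModel


class Candidate(BaseModel):
    name: str
    date_of_birth: date
    salary: float


raw = {"name": "Clyde", "date_of_birth": "2000-06-12", "salary": 100_000}

from_kwargs = Candidate(**raw)                  # unpack the dict into keyword arguments
from_dict = Candidate.model_validate(raw)       # pass the dict as a single object
from_json = Candidate.model_validate_json(json.dumps(raw))  # parse a JSON string directly

print(from_kwargs == from_dict == from_json)
```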

In [204]:
"""
Reminder, our class is:

class Employee(BaseModel):
    employee_id: UUID = uuid4()
    name: str
    email: EmailStr
    date_of_birth: date
    salary: float
    department: Department
    elected_benefits: bool
"""

try:
    employee_data = {
        "name": "Clyde Harwell",
        "email": "charwell@example.com",
        "date_of_birth": "2000-06-12",
        "salary": 100_000,
        "department": "ENGINEERING",
        "elected_benefits": True,
    }

    # employee = Employee.model_validate(employee_data)
    employee = Employee(**employee_data)
except ValidationError as e:
    for er in e.errors():
        print(er)
else:
    # Same UUID as new_entity above: the default uuid4() was evaluated once, at class definition
    print("employee_id was filled in automatically:", employee.employee_id)
    print("salary was coerced to float:", employee.salary)

employee_id was filled in automatically: 975af7cd-9063-4ce2-8bcd-cbe5e6a79152
salary was coerced to float: 100000.0


### Get data from JSON Statham using .model_validate_json()

In [205]:
new_employee_json = """
{"employee_id":"d2e7b773-926b-49df-939a-5e98cbb9c9eb",
"name":"Eric Slogrenta",
"email":"eslogrenta@example.com",
"date_of_birth":"1990-01-02",
"salary":125000.0,
"department":"HR",
"elected_benefits":false}
"""

try:
    new_employee = Employee.model_validate_json(new_employee_json)
except ValidationError as e:
    for er in e.errors():
        print(er)

new_employee

Employee(employee_id=UUID('d2e7b773-926b-49df-939a-5e98cbb9c9eb'), name='Eric Slogrenta', email='eslogrenta@example.com', date_of_birth=datetime.date(1990, 1, 2), salary=125000.0, department=<Department.HR: 'HR'>, elected_benefits=False)

### Serialize Pydantic models as dictionaries and JSON

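A fair question is why you would need this at all. Typically you serialize a validated model to send it somewhere else: returning it from an API, writing it to storage, or passing it to another service as JSON.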

In [206]:
new_employee.model_dump()

{'employee_id': UUID('d2e7b773-926b-49df-939a-5e98cbb9c9eb'),
 'name': 'Eric Slogrenta',
 'email': 'eslogrenta@example.com',
 'date_of_birth': datetime.date(1990, 1, 2),
 'salary': 125000.0,
 'department': <Department.HR: 'HR'>,
 'elected_benefits': False}

In [207]:
new_employee.model_dump_json()

'{"employee_id":"d2e7b773-926b-49df-939a-5e98cbb9c9eb","name":"Eric Slogrenta","email":"eslogrenta@example.com","date_of_birth":"1990-01-02","salary":125000.0,"department":"HR","elected_benefits":false}'

Here, you use .model_dump() and .model_dump_json() to convert your new_employee model to a dictionary and a JSON string, respectively. Notice that .model_dump() keeps Python objects (UUID, date, Department), whereas .model_dump_json() returns a JSON string in which employee_id, date_of_birth, and department are stored as plain strings.
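`.model_dump()` also accepts a few options worth knowing: `mode="json"` gives a dict containing only JSON-compatible values, and `include`/`exclude` select fields. A sketch with hypothetical `Team` and `Profile` classes, ending with a JSON round trip:

```python
from datetime import date
from enum import Enum

from pydantic import BaseModel


class Team(Enum):
    HR = "HR"
    IT = "IT"


class Profile(BaseModel):
    name: str
    date_of_birth: date
    team: Team


p = Profile(name="Eric", date_of_birth="1990-01-02", team="HR")

python_dict = p.model_dump()             # keeps date and Team objects
json_dict = p.model_dump(mode="json")    # JSON-compatible: date and enum become strings
only_name = p.model_dump(include={"name"})
without_dob = p.model_dump(exclude={"date_of_birth"})

# Round trip: the JSON string validates back into an equal model
restored = Profile.model_validate_json(p.model_dump_json())
print(json_dict, restored == p)
```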

In [208]:
Employee.model_json_schema()

{'$defs': {'Department': {'enum': ['HR', 'SALES', 'IT', 'ENGINEERING'],
   'title': 'Department',
   'type': 'string'}},
 'properties': {'employee_id': {'default': '975af7cd-9063-4ce2-8bcd-cbe5e6a79152',
   'format': 'uuid',
   'title': 'Employee Id',
   'type': 'string'},
  'name': {'title': 'Name', 'type': 'string'},
  'email': {'format': 'email', 'title': 'Email', 'type': 'string'},
  'date_of_birth': {'format': 'date',
   'title': 'Date Of Birth',
   'type': 'string'},
  'salary': {'title': 'Salary', 'type': 'number'},
  'department': {'$ref': '#/$defs/Department'},
  'elected_benefits': {'title': 'Elected Benefits', 'type': 'boolean'}},
 'required': ['name',
  'email',
  'date_of_birth',
  'salary',
  'department',
  'elected_benefits'],
 'title': 'Employee',
 'type': 'object'}

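When you call .model_json_schema(), you get a dictionary representing your model’s JSON schema. Note that the default for employee_id is one fixed UUID, the one generated when the class was defined, which is another symptom of using `uuid4()` as a plain default.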
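Field metadata such as descriptions, defaults, and numeric constraints is carried into the schema. A sketch with a hypothetical `CatalogItem` model:

```python
from pydantic import BaseModel, Field


class CatalogItem(BaseModel):
    name: str = Field(description="Human-readable item name")
    qty: int = Field(default=1, ge=1)


schema = CatalogItem.model_json_schema()
print(schema["required"])            # only fields without a default
print(schema["properties"]["name"])  # the description shows up here
print(schema["properties"]["qty"])   # the default and the ge constraint ("minimum")
```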

## Using Fields for Customization and Metadata

You can read about fields [here](https://docs.pydantic.dev/latest/api/fields/).

You also asked what `...` means in `Field`: see [this discussion](https://github.com/pydantic/pydantic/discussions/8188).

The Field class allows you to customize and add metadata to your model’s fields. To see how this works, take a look at this example:

In [78]:
from datetime import date
from uuid import UUID, uuid4
from enum import Enum
from pydantic import BaseModel, EmailStr, Field


class Department(Enum):
    HR = "HR"
    SALES = "SALES"
    IT = "IT"
    ENGINEERING = "ENGINEERING"


class Employee_field(BaseModel):
    employee_id: UUID = Field(default_factory=uuid4, frozen=True)
    name: str = Field(min_length=1, frozen=True)
    email: EmailStr = Field(pattern=r".+@example\.com$")
    date_of_birth: date = Field(alias="birth_date", repr=False, frozen=True)
    salary: float = Field(alias="compensation", gt=0, repr=False)
    department: Department
    elected_benefits: bool

* **default_factory**: You use this to define a callable that generates default values. In the example above, you set __default_factory to uuid4. This calls uuid4() to generate a random UUID for employee_id when needed__. You can also use a lambda function for more flexibility.
* **frozen**: This is a Boolean parameter you can set to make your fields immutable. This means, __when frozen is set to True, the corresponding field can’t be changed after your model is instantiated__. In this example, employee_id, name, and date_of_birth are made immutable using the frozen parameter.
* **min_length**: You can __control the length of string fields with min_length and max_length__. In the example above, you ensure that name is at least one character long.
* **pattern**: For string fields, you can __set pattern to a regular expression that the value must match__. For instance, with the pattern used for email above, Pydantic ensures that every email ends with @example.com.
* **alias**: Use this parameter to give a field a different external name. Here, __date_of_birth is read from birth_date and salary from compensation__. By default, input must use the alias (the field name itself is not accepted unless `populate_by_name=True` is set in the model config), and serialization uses the field name unless you pass `by_alias=True` to .model_dump() or .model_dump_json().
* **gt**: Short for __“greater than”__, this sets an __exclusive lower bound__ for numeric fields. In this example, gt=0 ensures salary is always strictly positive. Related constraints are ge (greater than or equal), lt (less than), and le (less than or equal).
* **repr**: This Boolean parameter controls whether a field is __included in the model’s repr__. __In this example, you won’t see date_of_birth or salary when you print an Employee_field instance.__
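The alias and `gt` rules above can be checked on a tiny model. A sketch with hypothetical `StrictAlias` and `FlexibleAlias` classes, assuming Pydantic v2's `ConfigDict(populate_by_name=True)` to also accept the field name on input:

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError


class StrictAlias(BaseModel):
    salary: float = Field(alias="compensation", gt=0)


class FlexibleAlias(BaseModel):
    # Also accept the Python field name on input
    model_config = ConfigDict(populate_by_name=True)
    salary: float = Field(alias="compensation", gt=0)


a = StrictAlias(compensation=100)
b = FlexibleAlias(salary=100)

name_errors = []
try:
    StrictAlias(salary=100)  # field name not accepted, so 'compensation' is missing
except ValidationError as e:
    name_errors = [err["type"] for err in e.errors()]

zero_errors = []
try:
    StrictAlias(compensation=0)  # gt is exclusive, so 0 is rejected
except ValidationError as e:
    zero_errors = [err["type"] for err in e.errors()]

print(a.model_dump(), a.model_dump(by_alias=True), name_errors, zero_errors)
```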

**Let's see how this works in practice.**

In [81]:
incorrect_employee_data = {
    "name": "",  # empty name: fails min_length | open question: is a check like this useful for PDF files, or can the frontend handle it?
    "email": "cdetuma@fakedomain.com",  # wrong domain: fails the pattern
    "birth_date": "1998-04-02",
    "salary": -10,  # not the alias, so this key is ignored and gt=0 never runs
    "department": "IT",
    "elected_benefits": True,
    # "compensation": 10_0  # forgot to pass compensation, so it is reported as missing
}

try:
    Employee_field.model_validate(incorrect_employee_data)
except ValidationError as e:
    for er in e.errors():
        print(er)

{'type': 'string_too_short', 'loc': ('name',), 'msg': 'String should have at least 1 character', 'input': '', 'ctx': {'min_length': 1}, 'url': 'https://errors.pydantic.dev/2.9/v/string_too_short'}
{'type': 'string_pattern_mismatch', 'loc': ('email',), 'msg': "String should match pattern '.+@example\\.com$'", 'input': 'cdetuma@fakedomain.com', 'ctx': {'pattern': '.+@example\\.com$'}, 'url': 'https://errors.pydantic.dev/2.9/v/string_pattern_mismatch'}
{'type': 'missing', 'loc': ('compensation',), 'msg': 'Field required', 'input': {'name': '', 'email': 'cdetuma@fakedomain.com', 'birth_date': '1998-04-02', 'salary': -10, 'department': 'IT', 'elected_benefits': True}, 'url': 'https://errors.pydantic.dev/2.9/v/missing'}
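Notice that the output above has no error for `"salary": -10`: because the field's alias is `compensation`, the `salary` key is treated as an unknown extra key and silently dropped (Pydantic's default is `extra="ignore"`). A sketch, assuming you would rather have such typos reported, using `ConfigDict(extra="forbid")` on a hypothetical `Payroll` model:

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError


class Payroll(BaseModel):
    # Reject keys the model does not know instead of silently dropping them
    model_config = ConfigDict(extra="forbid")
    salary: float = Field(alias="compensation", gt=0)


error_types = []
try:
    Payroll.model_validate({"salary": -10})
except ValidationError as e:
    error_types = [(err["type"], err["loc"]) for err in e.errors()]

print(error_types)
```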


In [84]:
employee_data = {
    "name": "Clyde Harwell",
    "email": "charwell@example.com",
    "birth_date": "2000-06-12",
    "compensation": 100_000,
    "department": "ENGINEERING",
    "elected_benefits": True,
}

employee = Employee_field.model_validate(employee_data)
employee

Employee_field(employee_id=UUID('1ecfb843-c692-49b1-8dbe-2e628128e5aa'), name='Clyde Harwell', email='charwell@example.com', department=<Department.ENGINEERING: 'ENGINEERING'>, elected_benefits=True)

Note that neither salary (passed in as **compensation**) nor date_of_birth appears in the Employee_field repr, because both use repr=False.

In [90]:
print(f"Alias in action (compensation -> salary): {employee.salary}")
print(f"birth_date -> date_of_birth: {employee.date_of_birth}")

Alias in action (compensation -> salary): 100000.0
birth_date -> date_of_birth: 2000-06-12


In [91]:
# Frozen: department can be reassigned freely, but name cannot
# (without validate_assignment, department now holds the plain string "HR", not Department.HR)

employee.department = "HR"
employee.name = "Andrew TuGrendele"

ValidationError: 1 validation error for Employee_field
name
  Field is frozen [type=frozen_field, input_value='Andrew TuGrendele', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/frozen_field
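The reassignment of department in the cell above raised no error: by default Pydantic does not validate assignments, so the field simply stores whatever you assign. A sketch with hypothetical `Loose` and `Checked` models showing `ConfigDict(validate_assignment=True)`:

```python
from enum import Enum

from pydantic import BaseModel, ConfigDict


class Team(Enum):
    HR = "HR"
    IT = "IT"


class Loose(BaseModel):
    team: Team


class Checked(BaseModel):
    model_config = ConfigDict(validate_assignment=True)
    team: Team


loose = Loose(team="IT")
loose.team = "HR"      # stored as the plain string "HR", no validation

checked = Checked(team="IT")
checked.team = "HR"    # validated and converted to Team.HR

print(type(loose.team), type(checked.team))
```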

#### `...` in Field

In [96]:
from pydantic import BaseModel, Field


class Okay(BaseModel):
    x: int = Field(default=4, title="X")


class Wrong(BaseModel):
    x: int = Field(4, title="X")


class Gau(BaseModel):
    x: int = Field(..., title="X")


class NotGau(BaseModel):
    x: int = Field(title="X")

In [92]:
Okay()

Okay(x=4)

In [94]:
Wrong()

Wrong(x=4)

In [97]:
NotGau()

ValidationError: 1 validation error for NotGau
x
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing

In [95]:
Gau()

ValidationError: 1 validation error for Gau
x
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing

In short: in Pydantic v2, `...` is redundant. A field without a default is already required, so just leave the default out instead of writing `...`.
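You can confirm this without triggering errors by asking each field whether it is required. A sketch with hypothetical class names, using `FieldInfo.is_required()` from `model_fields`:

```python
from pydantic import BaseModel, Field


class WithEllipsis(BaseModel):
    x: int = Field(..., title="X")


class WithoutDefault(BaseModel):
    x: int = Field(title="X")


class WithDefault(BaseModel):
    x: int = Field(default=4, title="X")


for model in (WithEllipsis, WithoutDefault, WithDefault):
    print(model.__name__, model.model_fields["x"].is_required())
```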

## Working With Validators

### Validating Models and Fields

In [209]:
from datetime import date
from uuid import UUID, uuid4
from enum import Enum
from pydantic import BaseModel, EmailStr, Field, field_validator


class Department(Enum):
    HR = "HR"
    SALES = "SALES"
    IT = "IT"
    ENGINEERING = "ENGINEERING"


class Employee_validation(BaseModel):
    employee_id: UUID = Field(default_factory=uuid4, frozen=True)
    name: str = Field(min_length=1, frozen=True)
    email: EmailStr = Field(pattern=r".+@example\.com$")
    date_of_birth: date = Field(alias="birth_date", repr=False, frozen=True)
    salary: float = Field(alias="compensation", gt=0, repr=False)
    department: Department
    elected_benefits: bool

    @field_validator("date_of_birth")
    @classmethod
    def check_valid_age(cls, date_of_birth: date) -> date:
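        today = date.today()
        try:
            eighteen_years_ago = today.replace(year=today.year - 18)
        except ValueError:
            # Today is Feb 29 and the year 18 years ago was not a leap year
            eighteen_years_ago = today.replace(year=today.year - 18, day=28)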

        if date_of_birth > eighteen_years_ago:
            raise ValueError("Employees must be at least 18 years old.")

        return date_of_birth

In this block, you import **field_validator** and use it to decorate a class method in Employee_validation called __.check_valid_age()__. Field validators are class methods, so stack @classmethod under @field_validator. In .check_valid_age(), you compute the date exactly eighteen years before today. If the employee’s date_of_birth is after that date, a ValueError is raised, which Pydantic reports as a validation error.
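By default `@field_validator` runs in `mode="after"`: Pydantic first validates the value against the annotation, so `check_valid_age` receives a real `date`. With `mode="before"` the validator sees the raw input instead, which helps normalize formats Pydantic does not parse itself. A sketch assuming you also want to accept dates written as DD.MM.YYYY; `Applicant` and `parse_dotted_date` are hypothetical.

```python
from datetime import date, datetime
from typing import Any

from pydantic import BaseModel, field_validator


class Applicant(BaseModel):
    date_of_birth: date

    @field_validator("date_of_birth", mode="before")
    @classmethod
    def parse_dotted_date(cls, value: Any) -> Any:
        # Runs before the built-in date parsing: convert "DD.MM.YYYY" strings
        if isinstance(value, str) and value.count(".") == 2:
            # An impossible date raises ValueError, which becomes a validation error
            return datetime.strptime(value, "%d.%m.%Y").date()
        return value  # anything else goes to the standard date validation


print(Applicant(date_of_birth="02.04.1998"))
print(Applicant(date_of_birth="1998-04-02"))
```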

To see how this validator works, check out this example:

In [211]:
from datetime import date, timedelta


young_employee_data = {
    "name": "Jake Bar",
    "email": "jbar@example.com",
    "birth_date": date.today() - timedelta(days=365 * 17),
    "compensation": 90_000,
    "department": "SALES",
    "elected_benefits": True,
}

# We wrote our own age check; now let's feed it invalid data

try:
    Employee_validation.model_validate(young_employee_data)
except ValidationError as e:
    for er in e.errors():
        print(er)

    print(e.errors()[0]["msg"])

{'type': 'value_error', 'loc': ('birth_date',), 'msg': 'Value error, Employees must be at least 18 years old.', 'input': datetime.date(2007, 10, 6), 'ctx': {'error': ValueError('Employees must be at least 18 years old.')}, 'url': 'https://errors.pydantic.dev/2.9/v/value_error'}
Value error, Employees must be at least 18 years old.


### Validating Models With @model_validator

In [112]:
from typing import Self
from datetime import date
from uuid import UUID, uuid4
from enum import Enum
from pydantic import (
    BaseModel,
    EmailStr,
    Field,
    field_validator,
    model_validator,
)


class Department(Enum):
    HR = "HR"
    SALES = "SALES"
    IT = "IT"
    ENGINEERING = "ENGINEERING"


class Employee_valid_2(BaseModel):
    employee_id: UUID = Field(default_factory=uuid4, frozen=True)
    name: str = Field(min_length=1, frozen=True)
    email: EmailStr = Field(pattern=r".+@example\.com$")
    date_of_birth: date = Field(alias="birth_date", repr=False, frozen=True)
    salary: float = Field(alias="compensation", gt=0, repr=False)
    department: Department
    elected_benefits: bool

    @field_validator("date_of_birth")
    @classmethod
    def check_valid_age(cls, date_of_birth: date) -> date:
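        today = date.today()
        try:
            eighteen_years_ago = today.replace(year=today.year - 18)
        except ValueError:
            # Today is Feb 29 and the year 18 years ago was not a leap year
            eighteen_years_ago = today.replace(year=today.year - 18, day=28)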

        if date_of_birth > eighteen_years_ago:
            raise ValueError("Employees must be at least 18 years old.")

        return date_of_birth

    @model_validator(mode="after")
    def check_it_benefits(self) -> Self:
        """
        A second kind of decorator. @field_validator only sees one field at a time,
        so a rule that combines several fields needs @model_validator.
        The rule here: the IT department does not get benefits, yet we may receive
        data like:
            department: IT
            elected_benefits: True
        Each field is valid on its own, but together we treat them as an error.

        When you set mode to after in @model_validator, Pydantic waits until after
        you’ve instantiated your model to run .check_it_benefits().
        """
        department = self.department
        elected_benefits = self.elected_benefits

        if department == Department.IT and elected_benefits:
            raise ValueError(
                "IT employees are contractors and don't qualify for benefits"
            )
        return self

In [114]:
new_employee = {
    "name": "Alexis Tau",
    "email": "ataue@example.com",
    "birth_date": "2001-04-12",
    "compensation": 100_000,
    "department": "IT",
    "elected_benefits": True,
}

try:
    Employee_valid_2.model_validate(new_employee)
except ValidationError as e:
    for er in e.errors():
        print(er)

    print(e.errors()[0]["msg"])

{'type': 'value_error', 'loc': (), 'msg': "Value error, IT employees are contractors and don't qualify for benefits", 'input': {'name': 'Alexis Tau', 'email': 'ataue@example.com', 'birth_date': '2001-04-12', 'compensation': 100000, 'department': 'IT', 'elected_benefits': True}, 'ctx': {'error': ValueError("IT employees are contractors and don't qualify for benefits")}, 'url': 'https://errors.pydantic.dev/2.9/v/value_error'}
Value error, IT employees are contractors and don't qualify for benefits
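Two details are visible above: a model-level error has an empty `loc`, and an `after` model validator only runs once every field has validated. A sketch with a hypothetical `Period` model checking that `end` is not before `start`:

```python
from datetime import date

from pydantic import BaseModel, ValidationError, model_validator


class Period(BaseModel):
    start: date
    end: date

    @model_validator(mode="after")
    def check_order(self) -> "Period":
        # Runs on the fully built instance, so it can compare two fields
        if self.end < self.start:
            raise ValueError("end must not be before start")
        return self


def error_summary(data: dict) -> list:
    """Validate data and return (type, loc) for each error; empty if valid."""
    try:
        Period.model_validate(data)
    except ValidationError as e:
        return [(err["type"], err["loc"]) for err in e.errors()]
    return []


print(error_summary({"start": "2024-05-01", "end": "2024-04-01"}))  # model-level error
print(error_summary({"start": "not a date", "end": "2024-04-01"}))  # field error only
```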


### Using Validation Decorators to Validate Functions

In [115]:
import time
from typing import Annotated
from pydantic import PositiveFloat, Field, EmailStr, validate_call


@validate_call
def send_invoice(
    client_name: Annotated[str, Field(min_length=1)],
    client_email: EmailStr,
    items_purchased: list[str],
    amount_owed: PositiveFloat,
) -> str:

    email_str = f"""
    Dear {client_name}, \n
    Thank you for choosing xyz inc! You
    owe ${amount_owed:,.2f} for the following items: \n
    {items_purchased}
    """

    print(f"Sending email to {client_email}...")
    time.sleep(2)

    return email_str

In [116]:
try:
    send_invoice(
        client_name="",
        client_email="ajolawsonfakedomain.com",
        items_purchased=["pie", "cookie", 17],
        amount_owed=0,
    )
except ValidationError as e:
    for er in e.errors():
        print(er)

{'type': 'string_too_short', 'loc': ('client_name',), 'msg': 'String should have at least 1 character', 'input': '', 'ctx': {'min_length': 1}, 'url': 'https://errors.pydantic.dev/2.9/v/string_too_short'}
{'type': 'value_error', 'loc': ('client_email',), 'msg': 'value is not a valid email address: An email address must have an @-sign.', 'input': 'ajolawsonfakedomain.com', 'ctx': {'reason': 'An email address must have an @-sign.'}}
{'type': 'string_type', 'loc': ('items_purchased', 2), 'msg': 'Input should be a valid string', 'input': 17, 'url': 'https://errors.pydantic.dev/2.9/v/string_type'}
{'type': 'greater_than', 'loc': ('amount_owed',), 'msg': 'Input should be greater than 0', 'input': 0, 'ctx': {'gt': 0.0}, 'url': 'https://errors.pydantic.dev/2.9/v/greater_than'}
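`@validate_call` checks only the arguments unless you ask for more; passing `validate_return=True` validates the return value against the annotation as well. A sketch with a hypothetical `checkout_total` function:

```python
from pydantic import PositiveFloat, ValidationError, validate_call


@validate_call(validate_return=True)
def checkout_total(prices: list[PositiveFloat]) -> int:
    # The return value is validated against `int` as well
    return sum(prices)


print(checkout_total(["1.5", "2.5"]))  # strings are coerced to floats; 4.0 is accepted as an int

return_errors = []
try:
    checkout_total([1.5, 2.0])  # returns 3.5, which has a fractional part
except ValidationError as e:
    return_errors = [err["type"] for err in e.errors()]

print(return_errors)
```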


# RAG Document Entity

Note: Pydantic V1-style `@validator` validators are deprecated (`PydanticDeprecatedSince20`); use `@field_validator` instead.

In [184]:
from uuid import UUID, uuid4

from pydantic import BaseModel, Field, ValidationError, field_validator


class Document(BaseModel):
    trace_id: UUID = Field(
        default_factory=uuid4,
        frozen=True,
        description="Unique identifier from the frontend (generated with uuid4 if not provided)",
    )
    file_name: str = Field(description="Name of the uploaded file")  # pattern=r".+pdf$"
    file_bytes: bytes = Field(
        description="File contents as bytes"
    )  # more checks could go here

    @field_validator("file_name")
    @classmethod
    def validate_file_extension(cls, v: str) -> str:
        if not v.lower().endswith(".pdf"):
            raise ValueError("File must have a .pdf extension")
        return v

    @field_validator("file_bytes")
    @classmethod
    def validate_pdf_content(cls, v: bytes) -> bytes:
        # PDF files begin with the "%PDF" header
        if not v.startswith(b"%PDF"):
            raise ValueError("File content is not a valid PDF")
        return v

    @field_validator("file_bytes")
    @classmethod
    def validate_file_size(cls, v: bytes) -> bytes:
        max_size = 1 * 1024 * 1024  # 1 MB
        min_size = 15 * 1024  # 15 KB
        if len(v) > max_size:
            raise ValueError("File size exceeds 1 MB")
        elif len(v) < min_size:
            raise ValueError("File is smaller than 15 KB")
        return v

Let's simulate files arriving from the frontend into our harmless DAO.

In [185]:
import os


def read_files_as_bytes(directory_path: str) -> list[dict]:
    files_data = []
    supported_extensions = [
        ".pdf",
        ".csv",
        ".doc",
        ".docx",
        ".ipynb",
        ".txt",
    ]  # other formats can be added

    for filename in os.listdir(directory_path):
        file_path = os.path.join(directory_path, filename)

        # Skip anything that is not a regular file
        if not os.path.isfile(file_path):
            print(f"Skipping {file_path}: not a file")
            continue

        # Check the file extension
        _, file_extension = os.path.splitext(filename)
        if file_extension.lower() not in supported_extensions:
            print(f"Skipping {filename}: unsupported type {file_extension}")
            continue

        # Read the file contents as bytes
        try:
            with open(file_path, "rb") as f:
                file_bytes = f.read()
            files_data.append({"file_name": filename, "file_bytes": file_bytes})
            print(f"Read {filename} successfully")
        except OSError as e:
            print(f"Error reading {filename}: {e}")

    return files_data

In [186]:
front_data_examples = read_files_as_bytes("../data")

Read data_example.csv successfully
Read empty.txt successfully
Read empty_pdf.pdf successfully
Read example_pdf.pdf successfully
Read LSTM_notebook.ipynb successfully
Read too_big_pdf.pdf successfully
Read word_doc.docx successfully


In [187]:
# Validate every file
documents = []

for file_data in front_data_examples:
    file_name = file_data["file_name"]
    file_bytes = file_data["file_bytes"]

    print(f"Working with: {file_name}")
    try:
        document = Document(file_name=file_name, file_bytes=file_bytes)

        documents.append(document)
    except ValidationError as e:
        for er in e.errors():
            print(f'File name: {file_name} | {er["msg"]}')

Working with: data_example.csv
File name: data_example.csv | Value error, File must have a .pdf extension
File name: data_example.csv | Value error, File content is not a valid PDF
Working with: empty.txt
File name: empty.txt | Value error, File must have a .pdf extension
File name: empty.txt | Value error, File content is not a valid PDF
Working with: empty_pdf.pdf
File name: empty_pdf.pdf | Value error, File is smaller than 15 KB
Working with: example_pdf.pdf
Working with: LSTM_notebook.ipynb
File name: LSTM_notebook.ipynb | Value error, File must have a .pdf extension
File name: LSTM_notebook.ipynb | Value error, File content is not a valid PDF
Working with: too_big_pdf.pdf
File name: too_big_pdf.pdf | Value error, File size exceeds 1 MB
Working with: word_doc.docx
File name: word_doc.docx | Value error, File must have a .pdf extension
File name: word_doc.docx | Value error, File content is not a valid PDF
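The size check could alternatively be declared with `Field` constraints, since Pydantic v2 supports `min_length`/`max_length` on `bytes`. A sketch with a hypothetical `PdfUpload` model and the same 15 KB / 1 MB limits; the trade-off is that you get Pydantic's built-in error types (`bytes_too_short`, `bytes_too_long`) instead of your own messages.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError

MIN_SIZE = 15 * 1024        # 15 KB
MAX_SIZE = 1 * 1024 * 1024  # 1 MB


class PdfUpload(BaseModel):
    # Size limits expressed as constraints instead of a custom validator
    file_bytes: bytes = Field(min_length=MIN_SIZE, max_length=MAX_SIZE)


def size_error_type(payload: bytes) -> Optional[str]:
    """Return the first error type for payload, or None if it passes."""
    try:
        PdfUpload(file_bytes=payload)
    except ValidationError as e:
        return e.errors()[0]["type"]
    return None


print(size_error_type(b"%PDF" + b"0" * 10))
print(size_error_type(b"0" * (MAX_SIZE + 1)))
print(size_error_type(b"0" * MIN_SIZE))
```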
