Supported data types

.. seealso::

   Official Avro schema specification: https://avro.apache.org/docs/current/spec.html#schemas


py-avro-schema supports the following Python types:

Compound types/structures

Supports Python classes decorated with :func:`dataclasses.dataclass`.

Avro schema: record

The Avro record type is a named schema. py-avro-schema uses the Python class name as the schema name.

Dataclass fields with types supported by py-avro-schema are output as expected, including population of default values.

Example:

# File shipping/models.py

import dataclasses
from typing import Optional

@dataclasses.dataclass
class Ship:
    """A beautiful ship"""

    name: str
    year_launched: Optional[int] = None

Is output as:

{
  "type": "record",
  "name": "Ship",
  "namespace": "shipping",
  "doc": "A beautiful ship",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "year_launched",
      "type": ["null", "long"],
      "default": null
    }
  ]
}

Field default values may improve Avro schema evolution and resolution. To validate that all dataclass fields are specified with a default value, use option :attr:`py_avro_schema.Option.DEFAULTS_MANDATORY`.
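The check that such an option performs can be approximated with the standard library alone. The following is a minimal sketch, not py-avro-schema's implementation; the helper name `fields_without_default` is hypothetical and only illustrates how dataclass fields lacking a default can be detected.

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass
class Ship:
    """A beautiful ship"""

    name: str
    year_launched: Optional[int] = None


def fields_without_default(cls):
    """Return the names of dataclass fields with neither a default nor a default factory.

    Hypothetical helper for illustration; not part of py-avro-schema.
    """
    return [
        field.name
        for field in dataclasses.fields(cls)
        if field.default is dataclasses.MISSING
        and field.default_factory is dataclasses.MISSING
    ]


print(fields_without_default(Ship))  # ['name']
```

Here ``name`` is reported because it has no default, which is the kind of field a mandatory-defaults check would reject.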

The Avro record schema's doc field is populated from the Python class's docstring. To disable this, pass the option :attr:`py_avro_schema.Option.NO_DOC`.

Recursive or repeated reference to the same Python dataclass is supported. After the first time the schema is output, any subsequent references are by name only.

Supports Python classes inheriting from pydantic.BaseModel. Requires Pydantic version 2 or greater. For Pydantic 1 support, use py-avro-schema version 2.

Avro schema: record

The Avro record type is a named schema. py-avro-schema uses the Python class name as the schema name.

Pydantic model fields with types supported by py-avro-schema are output as expected, including population of default values and descriptions.

Example:

# File shipping/models.py

import pydantic
from typing import Optional

class Ship(pydantic.BaseModel):
    """A beautiful ship"""

    name: str
    year_launched: Optional[int] = pydantic.Field(None, description="When we hit the water")

Is output as:

{
  "type": "record",
  "name": "Ship",
  "namespace": "shipping",
  "doc": "A beautiful ship",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "year_launched",
      "type": ["null", "long"],
      "default": null,
      "doc": "When we hit the water"
    }
  ]
}

Field default values may improve Avro schema evolution and resolution. To validate that all model fields are specified with a default value, use option :attr:`py_avro_schema.Option.DEFAULTS_MANDATORY`.

The Avro record schema's doc attribute is populated from the Python class's docstring. For individual model fields, the doc attribute is taken from the Pydantic field's :attr:`description` attribute. To disable this, pass the option :attr:`py_avro_schema.Option.NO_DOC`.

Recursive or repeated reference to the same Pydantic class is supported. After the first time the schema is output, any subsequent references are by name only.

Warning

When using a hierarchy of Pydantic model classes, recursive type references are supported in the final class only and not in any inherited/base class.

Plain Python classes

Supports Python classes with a :meth:`__init__` where all arguments have type hints and fully define all schema fields.

Avro schema: record

The Avro record type is a named schema. py-avro-schema uses the Python class name as the schema name.

Example:

# File shipping/models.py

class Port:
    """A port you can sail to"""

    def __init__(self, name: str, country: str = "NLD"):
        self.name = name
        self.country = country.upper()

Is output as:

{
  "type": "record",
  "name": "Port",
  "namespace": "shipping",
  "doc": "A port you can sail to",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "country",
      "type": "string",
      "default": "NLD"
    }
  ]
}
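As a rough illustration of how record fields could be derived from an :meth:`__init__` signature, the sketch below reads the type hints and default values with the standard library. The helper `init_fields` is hypothetical and is not how py-avro-schema is necessarily implemented.

```python
import inspect
import typing


class Port:
    """A port you can sail to"""

    def __init__(self, name: str, country: str = "NLD"):
        self.name = name
        self.country = country.upper()


def init_fields(cls):
    """Return (name, type hint, default) for every __init__ argument except self.

    Hypothetical helper for illustration. A missing type hint raises KeyError,
    mirroring the requirement that all arguments are annotated.
    """
    hints = typing.get_type_hints(cls.__init__)
    # Skip the first parameter, which is `self`.
    parameters = list(inspect.signature(cls.__init__).parameters.values())[1:]
    return [(p.name, hints[p.name], p.default) for p in parameters]


for name, hint, default in init_fields(Port):
    has_default = default is not inspect.Parameter.empty
    print(name, hint.__name__, default if has_default else "(no default)")
```

Arguments without a default carry the sentinel ``inspect.Parameter.empty``, which is why ``name`` has no ``default`` entry in the schema above.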

Avro schema: JSON array of multiple Avro schemas

Union members can be any other type supported by py-avro-schema.

When defined as a class field with a default value, the union members may be re-ordered to ensure that the first member matches the type of the default value.
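The re-ordering matters because the Avro specification requires a union field's default value to match the first schema in the union. A minimal sketch of the idea, operating on Python types rather than Avro schemas, could look like this. The function `reorder_union_members` is hypothetical and matches on the exact type of the default only.

```python
from typing import Optional, Union, get_args


def reorder_union_members(union_type, default):
    """Return the union's member types with the default value's type moved first.

    Hypothetical helper for illustration; not part of py-avro-schema.
    """
    members = list(get_args(union_type))
    # list.sort is stable, so the other members keep their relative order.
    members.sort(key=lambda member: member is not type(default))
    return members


print(reorder_union_members(Union[str, int], 42))
print(reorder_union_members(Optional[int], None))
```

For ``Optional[int]`` with a default of ``None``, the ``NoneType`` member moves to the front, which corresponds to the ``["null", "long"]`` ordering shown in the record examples.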

Forward references

Avro schema: any named schema

py-avro-schema generally supports "forward" or recursive references, for example when a class attribute has the same type as the class itself.

Example:

import dataclasses

@dataclasses.dataclass
class PyType:
    field_a: "PyType"

Is output as:

{
  "type": "record",
  "name": "PyType",
  "fields": [
    {
      "name": "field_a",
      "type": "PyType"
    }
  ]
}

Warning

When using a hierarchy of Pydantic model classes, recursive type references are supported in the final class only and not in any inherited/base class.

Collections

.. seealso::

   For a "normal" Avro ``map`` schema using fully typed Python dictionaries, see :ref:`types::class:`typing.mapping``.


Avro schema: bytes
Avro logical type: json

Arbitrary Python dictionaries could be serialized as a bytes Avro schema by first serializing the data as JSON. py-avro-schema supports this "JSON-in-Avro" approach by adding the custom logical type json to a bytes schema.

To support JSON serialization as strings instead of bytes, use :attr:`py_avro_schema.Option.LOGICAL_JSON_STRING`.
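The "JSON-in-Avro" round trip itself happens in application code. A minimal sketch, assuming UTF-8 encoded JSON and an illustrative ``cargo`` dictionary, might look as follows.

```python
import json

# Schema described above: a bytes schema carrying the custom logical type "json".
schema = {"type": "bytes", "logicalType": "json"}

# Illustrative data; any JSON-serializable dictionary works.
cargo = {"containers": 120, "hazardous": False, "notes": ["fragile"]}

# Writing: serialize to JSON text, then encode to bytes for the Avro bytes schema.
payload = json.dumps(cargo).encode("utf-8")

# Reading: decode the bytes and parse the JSON text again.
restored = json.loads(payload.decode("utf-8"))
print(restored == cargo)
```

With the string variant of the logical type, the ``encode`` and ``decode`` steps would be skipped and the JSON text stored directly.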

.. seealso::

   For a "normal" Avro ``array`` schema using fully typed Python lists of dictionaries, see :ref:`types::class:`typing.sequence``.


Avro schema: bytes
Avro logical type: json

Arbitrary lists of Python dictionaries could be serialized as a bytes Avro schema by first serializing the data as JSON. py-avro-schema supports this "JSON-in-Avro" approach by adding the custom logical type json to a bytes schema.

To support JSON serialization as strings instead of bytes, use :attr:`py_avro_schema.Option.LOGICAL_JSON_STRING`.

Avro schema: map

This supports other "generic type" versions of :class:`collections.abc.Mapping`, including :class:`typing.Dict`.

Avro map schemas support string keys only. Map values can be any other Python type supported by py-avro-schema. For example, Dict[str, int] is output as:

{
  "type": "map",
  "values": "long"
}

Avro schema: array

This supports other "generic type" versions of :class:`collections.abc.Sequence`, including :class:`typing.List`.

Sequence values can be any Python type supported by py-avro-schema. For example, List[int] is output as:

{
  "type": "array",
  "items": "long"
}

Simple types

:class:`bool` (and subclasses)

Avro schema: boolean

:class:`bytes` (and subclasses)

Avro schema: bytes

Avro schema: int
Avro logical type: date

Avro schema: long
Avro logical type: timestamp-micros

To output with millisecond precision instead (logical type timestamp-millis), use :attr:`py_avro_schema.Option.MILLISECONDS`.
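The two precisions differ only in the unit counted since the Unix epoch. The sketch below shows the value an Avro serializer would store for each logical type, using integer arithmetic to avoid floating point rounding. The function names are hypothetical, and timezone-aware datetimes are assumed.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def to_timestamp_micros(value: datetime.datetime) -> int:
    """Microseconds since the Unix epoch (logical type timestamp-micros)."""
    # Dividing two timedeltas with // yields an exact integer.
    return (value - EPOCH) // datetime.timedelta(microseconds=1)


def to_timestamp_millis(value: datetime.datetime) -> int:
    """Milliseconds since the Unix epoch (logical type timestamp-millis)."""
    return (value - EPOCH) // datetime.timedelta(milliseconds=1)


launched = datetime.datetime(
    2000, 1, 1, 12, 0, 0, 123456, tzinfo=datetime.timezone.utc
)
print(to_timestamp_micros(launched))  # 946728000123456
print(to_timestamp_millis(launched))  # 946728000123
```

Millisecond precision drops the last three digits of the microsecond value, so sub-millisecond detail is lost.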

Avro schema: long
Avro logical type: time-micros

To output with millisecond precision instead (logical type time-millis), use :attr:`py_avro_schema.Option.MILLISECONDS`. In that case, the Avro schema is int.

Avro schema: fixed
Avro logical type: duration

The Avro fixed type is a named schema. Here, py-avro-schema uses the name datetime.timedelta. The full generated schema looks like this:

{
  "type": "fixed",
  "name": "datetime.timedelta",
  "size": 12,
  "logicalType": "duration"
}
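The size of 12 follows from the Avro specification, which stores a duration as three little-endian unsigned 32-bit integers: months, days and milliseconds. A minimal encoding sketch for non-negative :class:`datetime.timedelta` values is shown below. The function `encode_duration` is hypothetical, and setting months to zero is an assumption made here because ``timedelta`` has no month component.

```python
import datetime
import struct


def encode_duration(value: datetime.timedelta) -> bytes:
    """Pack a non-negative timedelta into the 12-byte Avro duration layout.

    Hypothetical helper for illustration; sub-millisecond precision is truncated.
    """
    months = 0  # assumption: timedelta cannot express months
    days = value.days
    millis = value.seconds * 1000 + value.microseconds // 1000
    # Three little-endian unsigned 32-bit integers.
    return struct.pack("<III", months, days, millis)


encoded = encode_duration(datetime.timedelta(days=2, seconds=3, microseconds=4500))
print(len(encoded))  # 12
print(struct.unpack("<III", encoded))  # (0, 2, 3004)
```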

Avro schema: enum

The Avro enum type is a named schema. py-avro-schema uses the Python class name as the schema name. Avro enum symbols must be strings.

Example:

# File shipping/models.py

import enum

class ShipType(enum.Enum):
    SAILING_VESSEL = "SAILING_VESSEL"
    MOTOR_VESSEL = "MOTOR_VESSEL"

Outputs as:

{
  "type": "enum",
  "name": "ShipType",
  "namespace": "shipping",
  "symbols": ["SAILING_VESSEL", "MOTOR_VESSEL"],
  "default": "SAILING_VESSEL"
}

The default value is taken from the first defined enum symbol and is used to support writer/reader schema resolution.
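The mapping from an enum class to the schema above can be sketched in a few lines. This is an illustration only: the helper `enum_schema` is hypothetical, the namespace is passed in explicitly, and taking the symbols from the member values is an assumption (in the example, names and values are identical).

```python
import enum


class ShipType(enum.Enum):
    SAILING_VESSEL = "SAILING_VESSEL"
    MOTOR_VESSEL = "MOTOR_VESSEL"


def enum_schema(cls, namespace):
    """Build an Avro enum schema dictionary from a Python enum class.

    Hypothetical helper for illustration; not part of py-avro-schema.
    """
    # Iterating an Enum class yields members in definition order.
    symbols = [member.value for member in cls]
    return {
        "type": "enum",
        "name": cls.__name__,
        "namespace": namespace,
        "symbols": symbols,
        "default": symbols[0],  # first defined symbol
    }


print(enum_schema(ShipType, "shipping"))
```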

:class:`float` (and subclasses)

Avro schema: double

To output as the 32-bit Avro schema float instead, use :attr:`py_avro_schema.Option.FLOAT_32`.

:class:`int` (and subclasses)

Avro schema: long

To output as the 32-bit Avro schema int instead, use :attr:`py_avro_schema.Option.INT_32`.

Avro schema: null

This schema is typically used as a "unioned" type where the default value is None.

Avro schema: bytes
Avro logical type: decimal

:class:`py_avro_schema.DecimalType` is a generic type for standard library :class:`decimal.Decimal` values. The generic type is used to define the scale and precision of a field.

For example, a decimal field with precision 4 and scale 2 is defined like this:

import py_avro_schema as pas

construction_costs: pas.DecimalType[4, 2]

Values can be assigned as normal, e.g. construction_costs = decimal.Decimal("12.34").

The Avro schema for the above type is:

{
  "type": "bytes",
  "logicalType": "decimal",
  "precision": 4,
  "scale": 2
}
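Per the Avro specification, a ``decimal`` on a ``bytes`` schema is stored as the big-endian two's-complement representation of the unscaled integer. The sketch below shows that encoding for the scale used above. The function `encode_decimal` is hypothetical and is not part of py-avro-schema.

```python
import decimal


def encode_decimal(value: decimal.Decimal, scale: int) -> bytes:
    """Encode a decimal as big-endian two's-complement bytes of its unscaled integer.

    Hypothetical helper for illustration. Raises ValueError if the value has
    more fractional digits than the scale allows.
    """
    scaled = value.scaleb(scale)  # 12.34 with scale 2 becomes 1234
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{value} does not fit scale {scale}")
    unscaled = int(scaled)
    # Reserve one extra bit for the sign when computing the byte length.
    length = (unscaled.bit_length() + 8) // 8
    return unscaled.to_bytes(length, byteorder="big", signed=True)


print(encode_decimal(decimal.Decimal("12.34"), 2))  # b'\x04\xd2'
```

The precision is not needed for encoding itself; it bounds how many digits the unscaled integer may have.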

Avro schema: string

:class:`str` subclasses ("named strings")

Avro schema: string

Python classes inheriting from :class:`str` are converted to Avro string schemas to support serialization of any arbitrary Python types "as a string value".

Primarily to support deserialization of Avro data, a custom property namedString is added and populated with the schema's namespace followed by the class name. A custom property is used because the Avro string schema is not a "named" schema. py-avro-schema uses the same namespace logic as for real named Avro schemas.

Example:

# file shipping/models.py

class PortName(str):
     ...

Outputs as:

{
  "type": "string",
  "namedString": "shipping.PortName"
}
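How the ``namedString`` value is composed can be sketched as follows. This assumes, based only on the example above, that the namespace is the top-level package of the module defining the class. The helper `named_string_schema` is hypothetical, and ``__module__`` is set by hand so the snippet runs outside a real ``shipping`` package.

```python
class PortName(str):
    # Set explicitly for illustration; normally this reflects the defining module.
    __module__ = "shipping.models"


def named_string_schema(cls):
    """Build a string schema carrying the namedString custom property.

    Hypothetical helper for illustration; the namespace rule is an assumption.
    """
    namespace = cls.__module__.split(".")[0]
    return {"type": "string", "namedString": f"{namespace}.{cls.__name__}"}


print(named_string_schema(PortName))
```

A reader of the data could use the property to look up the class and wrap the decoded string value in it.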

Avro schema: string
Avro logical type: uuid