# A minimal standard for interoperable Schemas

The **minimal standard** is defined as a Protocol, with some Abstract Base Classes. The protocol is minimal in that it provides for the most common [Use Cases](https://github.com/mcgfeller/py-schemas/blob/master/UseCases.md):
* Serialization to an external representation
* Deserialization from an external representation
* Validation
* Attach a Schema to an object class, and vice versa.
* Obtaining Schema elements
* Static type checking
  * Get the Python type of a schema element 
* Minimal Schema transformation:
  * Get a Type Annotation [PEP 593](https://www.python.org/dev/peps/pep-0593) from a Schema Element. If provided, the Type Annotation is standardized, ensuring interoperability of a schema element.
  * Construct a Schema element from a Type Annotation.
  * Construct a Schema from another Schema (from another Schema solution) by going through the Type Annotation for each element. 
* Associate data with Schema

*The protocol doesn't provide a standard representation for Schemas or Schema Elements; it only provides standard access and use.* It does provide minimal conversion of arbitrary Schema features between schema libraries, as it provides conversion to Python static types and Type Annotation [PEP 593](https://www.python.org/dev/peps/pep-0593). See [Alternatives considered](alternatives.md).

Using the Protocol for a single schema solution, such as **[Marshmallow](https://marshmallow.readthedocs.io/en/latest/)**, does not provide facilities superior to native usage. However, if the protocol is implemented by several libraries, integrating libraries that use different schema facilities becomes much easier. The [Marshmallow ecosystem](https://github.com/marshmallow-code/marshmallow/wiki/Ecosystem) lists a few integrations (adaptations) to other libraries using Schemas (JSON, dataclasses, SQLAlchemy), each being a one-off, ad hoc solution. If a solution conforms to the standard, no adaptation to other solutions is required.


In [0]:
import abc
import typing
import typing_extensions
import collections.abc
import enum

First, an entirely optional class that provides a protocol to associate a Schema with a class, without constraining too much how things are composed. We only mandate that if a Schema is assigned to a subclass of `SchemedObject`, the Schema can be retrieved by `.__get_schema__()`.

In [4]:
class SchemedObject(metaclass=abc.ABCMeta):
    """ An object with a Schema, supporting the __get_schema__ method.
        It is completely optional, but very convenient to have the schema
        accessible from each schemed object. 
    """

    @classmethod
    @abc.abstractmethod
    def __get_schema__(cls) -> "AbstractSchema":
        pass
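
As a minimal sketch of how a class might satisfy this contract: the names `ToySchema` and `Point` are hypothetical, `SchemedObject` is repeated only so the snippet runs on its own, and caching the schema per class is an assumption, not something the protocol requires.

```python
import abc


class SchemedObject(metaclass=abc.ABCMeta):
    """Stand-in for the class above, repeated so the sketch is self-contained."""

    @classmethod
    @abc.abstractmethod
    def __get_schema__(cls):
        ...


class ToySchema:
    """Hypothetical placeholder; a real schema would subclass AbstractSchema."""

    def __init__(self, objclass):
        # Back reference from the schema to the class it describes.
        self.__objclass__ = objclass


class Point(SchemedObject):
    @classmethod
    def __get_schema__(cls):
        # Create the schema on first use and cache it on this very class.
        if "_schema" not in cls.__dict__:
            cls._schema = ToySchema(cls)
        return cls._schema


schema = Point().__get_schema__()
```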

To standardize the representations into which we serialize, we enumerate a few well-known ones as an `enum`.

In [36]:
class WellknownRepresentation(enum.Enum):
    """ Standard way to designate external representations. The members below are prescribed
        by the protocol. Note that a Python Enum which defines members cannot be subclassed,
        so additional representations need a separate Enum.
        The member value should be the MIME type, where one exists. 
    """

    python  = "__python__"  # internal python structures
    pickle  = "application/python-pickle"
    json    = "application/json"
    xml     = "application/xml"
    sql     = "application/sql"
    html    = "text/html"
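
Because the member values are MIME types, a representation can be looked up from a content type by value. The sketch below repeats a shortened copy of the enum so that it runs on its own; the helper `representation_for` is hypothetical and not part of the protocol.

```python
import enum


class WellknownRepresentation(enum.Enum):
    """Shortened copy of the enum above, for a self-contained example."""

    python = "__python__"
    json = "application/json"
    xml = "application/xml"


def representation_for(content_type: str) -> WellknownRepresentation:
    """Map a Content-Type header value to a representation (hypothetical helper)."""
    # Drop parameters such as "; charset=utf-8" before the lookup.
    mime = content_type.split(";", 1)[0].strip().lower()
    # Enum lookup by value; raises ValueError for an unknown MIME type.
    return WellknownRepresentation(mime)


rep = representation_for("application/json; charset=utf-8")
```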

The minimal protocol for a Schema defines it as an Iterable, yielding Schema Elements, having a set of representations it supports, and methods to convert to and from external representations. 

We want the Schema to be useful for type declarations, so `.get_annotations()` returns a dictionary usable as `__annotations__` in a class. If `include_extras` (following [PEP 593](https://www.python.org/dev/peps/pep-0593)) is set, annotated types are returned. Unfortunately, the Python `typing` module assumes `__annotations__` to be a dictionary, instead of allowing a callable returning annotations, so I don't think this can be done more elegantly. 

We also have a `.get_metadata()` method defined both on the Schema and the Schema Element. Metadata is not used at all by the Schema, and is provided as a third-party extension mechanism. Multiple third-parties can each have their own key, to use as a namespace in the metadata. This is similar to and taken from [dataclasses.Field](https://docs.python.org/3/library/dataclasses.html#dataclasses.field).
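
The namespacing convention is the one `dataclasses` already uses for field metadata. As a small illustration, assuming two hypothetical third parties `myorm` and `myui`, and a hypothetical helper `metadata_for`:

```python
import dataclasses


@dataclasses.dataclass
class Person:
    # Each third party uses its own top-level key as a namespace.
    name: str = dataclasses.field(
        metadata={"myorm": {"column": "full_name"}, "myui": {"label": "Name"}}
    )


def metadata_for(cls, field_name: str, namespace: str) -> dict:
    """Return what one third party stored on a field, or {} if it stored nothing."""
    for f in dataclasses.fields(cls):
        if f.name == field_name:
            return dict(f.metadata.get(namespace, {}))
    raise KeyError(field_name)


ui_meta = metadata_for(Person, "name", "myui")
```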

The `writer_callback` argument in `.to_external()` and the `external` in `.from_external()` are inspired by the consumer API of [PEP-574](https://www.python.org/dev/peps/pep-0574/#consumer-api). They support arbitrary callables to receive or source the external representation, respectively; if not supplied, the external representation is returned or input as a string argument, respectively.
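
A rough sketch of the callable convention, using JSON only: the functions `to_json` and `from_json` are hypothetical, and the assumption that a source callable takes no arguments and returns an empty string when exhausted is mine; the protocol leaves the call arguments open.

```python
import json
import typing


def to_json(
    data: dict,
    writer_callback: typing.Optional[typing.Callable[[str], typing.Any]] = None,
) -> typing.Optional[str]:
    """Return the JSON text, or feed it chunk by chunk to writer_callback."""
    encoder = json.JSONEncoder()
    if writer_callback is None:
        return encoder.encode(data)
    for chunk in encoder.iterencode(data):
        writer_callback(chunk)
    return None  # nothing is returned when a callback receives the output


def from_json(external: typing.Union[str, typing.Callable[[], str]]) -> dict:
    """Accept the JSON text itself, or a callable that yields it in parts."""
    if callable(external):
        parts = []
        while True:
            chunk = external()
            if not chunk:  # assumed end-of-input marker
                break
            parts.append(chunk)
        external = "".join(parts)
    return json.loads(external)


data = {"name": "Martin", "dob": None}
chunks: typing.List[str] = []
to_json(data, chunks.append)      # callback receives the output
source = iter(chunks)
restored = from_json(lambda: next(source, ""))  # callable sources the input
```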

The optional methods `.from_schema()` and `.add_element()` support the creation of a Schema and adding elements from another Schema (from another solution). They must not assume any representation of the source schema, apart from the one exposed in this protocol. 

In [37]:
class AbstractSchema(collections.abc.Iterable):
    """ The AbstractSchema does not prescribe how the Schema is organized, and
        only prescribes that the AbstractSchemaElement may be obtained by iterating
        over the Schema.

        It defines methods to transform a SchemedObject to an external representation,
        and vice versa. 

        It also defines methods to get annotations, and to create a Schema from
        another AbstractSchema. 

        The __objclass__ optionally contains the class this is the Schema for,
        so schema.__objclass__.__get_schema__() is schema. This may be used to
        recreate objects when transforming from external representations.  
    """

    SupportedRepresentations: typing.ClassVar[typing.Set["WellknownRepresentation"]] = {
        WellknownRepresentation.python
    }

    SupportsCallableIO: typing.ClassVar[
        bool
    ] = False  # must be overwritten if callable input / output is supported

    __objclass__: typing.Optional[
        typing.Type
    ] = None  # class of the corresponding object

    def get_name(self) -> typing.Optional[str]:
        """ get name of Schema or None """
        return getattr(self, "name", None)

    @abc.abstractmethod
    def to_external(
        self,
        obj: SchemedObject,
        destination: WellknownRepresentation,
        writer_callback: typing.Optional[typing.Callable] = None,
        **params,
    ) -> typing.Optional[typing.Any]:
        """ Method to convert a SchemedObject (or Python structure in general)
            to the external representation according to destination. 

            If *writer_callback* is None (the default), the external representation
            is returned as result.

            If *writer_callback* is not None, then it can be called any number
            of times with some arguments. No result is returned.

            (inspired by PEP-574 https://www.python.org/dev/peps/pep-0574/#producer-api)
        """
        self.check_supported_output(destination, writer_callback)
        pass

    @abc.abstractmethod
    def from_external(
        self,
        external: typing.Union[typing.Any, typing.Callable],
        source: WellknownRepresentation,
        **params,
    ) -> SchemedObject:
        """ Method to convert an external representation according to source
            to a SchemedObject (or Python structure in general).

            If *external* is bytes, they are consumed as source representation.

            If *external* is a Callable, then it can be called any number
            of times with some arguments to obtain parts of the source representation.

        """
        self.check_supported_input(source, external)
        pass

    @abc.abstractmethod
    def validate_internal(self, obj: SchemedObject, **params) -> SchemedObject:
        """ Method to validate that SchemedObject (or Python structure in general)
            conforms to this schema. Either returns the (possibly converted) SchemedObject
            or raises a ValidationError.
        """
        return obj

    @classmethod
    def check_supported_input(
        cls,
        source: WellknownRepresentation,
        external: typing.Union[typing.Any, typing.Callable],
    ):
        """ check whether representation is supported, raise an error otherwise """
        if source not in cls.SupportedRepresentations:
            raise NotImplementedError(
                f"Input representation {source} not supported; choose one of {', '.join([str(r) for r in cls.SupportedRepresentations])}"
            )
        if not cls.SupportsCallableIO and callable(external):
            raise NotImplementedError(f"Callable input not supported by {cls.__name__}")

        return

    @classmethod
    def check_supported_output(
        cls,
        destination: WellknownRepresentation,
        writer_callback: typing.Optional[typing.Callable] = None,
    ):
        """ check whether representation is supported, raise an error otherwise """
        if destination not in cls.SupportedRepresentations:
            raise NotImplementedError(
                f"Output representation {destination} not supported; choose one of {', '.join([str(r) for r in cls.SupportedRepresentations])}"
            )
        if not cls.SupportsCallableIO and writer_callback is not None:
            raise NotImplementedError(
                f"Callable output not supported by {cls.__name__}"
            )

        return

    @abc.abstractmethod
    def __iter__(self) -> typing.Iterator["AbstractSchemaElement"]:
        """ iterator through SchemaElements in this Schema """
        pass

    def get_annotations(
        self, include_extras: bool = False
    ) -> typing.Dict[str, typing.Type]:
        """ return Schema Elements in annotation format.
            If include_extras (PEP-593) is True, the types returned are typing.Annotated types.

            Use as class.__annotations__ = schema.get_annotations()
            I would wish that __annotations__ is a protocol that can be provided,
            instead of simply assuming it is a mapping. 
        """
        return {
            se.get_name(): se.get_annotated()
            if include_extras
            else se.get_python_type()
            for se in self
        }

    def get_metadata(self) -> typing.Mapping[str, typing.Any]:
        """ return metadata (aka payload data) for this Schema.

            Meta data is not used at all by the Schema, and is provided as a third-party 
            extension mechanism. Multiple third-parties can each have their own key, 
            to use as a namespace in the metadata.
            (similar to and taken from dataclasses.Field)

            Can be refined; by default an empty dict is returned.

            There is a similar method defined on the AbstractSchemaElement for
            metadata attached to a schema element. 
        """
        return {}

    @classmethod
    @abc.abstractmethod
    def from_schema(cls, schema: "AbstractSchema") -> "AbstractSchema":
        """ Create a new Schema (in the Schema dialect of the cls) from
            a schema in any Schema Dialect.
        """
        pass

    def add_element(self, element: "AbstractSchemaElement") -> "AbstractSchemaElement":
        """ Optional API: Add a Schema element (in any Schema Dialect) to this Schema
            and returns the created element. 
        """
        raise NotImplementedError  # optional API; NotAvailable is not defined in this module
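
To illustrate what `.get_annotations()` produces, here is a self-contained sketch. `ToyElement` is a hypothetical class that offers only the three accessors the comprehension relies on, and `typing.Annotated` (Python 3.9+) is assumed in place of `typing_extensions.Annotated`.

```python
import typing


class ToyElement:
    """Hypothetical element exposing only what get_annotations() needs."""

    def __init__(self, name: str, python_type: type, annotation: typing.Any = None):
        self._name, self._type, self._annotation = name, python_type, annotation

    def get_name(self) -> str:
        return self._name

    def get_python_type(self) -> type:
        return self._type

    def get_annotated(self):
        return typing.Annotated[self._type, self._annotation]


def get_annotations(elements, include_extras: bool = False) -> dict:
    """Same comprehension as AbstractSchema.get_annotations(), over any iterable."""
    return {
        se.get_name(): se.get_annotated() if include_extras else se.get_python_type()
        for se in elements
    }


elements = [ToyElement("name", str, "required"), ToyElement("age", int, "optional")]


class Person:
    # Plain Python types, usable by static tooling that reads __annotations__.
    __annotations__ = get_annotations(elements)


extras = get_annotations(elements, include_extras=True)
```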


We finish the minimal protocol by defining a Schema Element. It supports a back reference to its Schema, and methods to get the Schema Element's name, its Python type, and its type annotation.

The optional method `.from_schema_element()` supports the creation of a SchemaElement from another SchemaElement (from another solution). It must not assume any representation of the source SchemaElement, apart from the one exposed in this protocol.

*Implementation hints:* 
 - Use `.get_python_field()` to retrieve the data from a SchemaElement, and construct a SchemaElement from the Field.
 - If argument `schema_element` is from your Schema solution, you may return it unchanged. 


In [8]:
class AbstractSchemaElement(metaclass=abc.ABCMeta):
    """ Holds one SchemaElement of a Schema. No representation is prescribed, hence there is no constructor.
        The SchemaTypeAnnotation, however, prescribes a representation. It can either be attached to the 
        SchemaElement, or generated from it when queried by .get_annotation(). 
    """

    @abc.abstractmethod
    def get_schema(self) -> typing.Optional[AbstractSchema]:
        """ get associated schema or None """
        pass

    @abc.abstractmethod
    def get_name(self) -> str:
        """ get name useable as variable name """
        pass

    @abc.abstractmethod
    def get_python_type(self) -> type:
        """ get Python type of this AbstractSchemaElement """
        pass

    @abc.abstractmethod
    def get_annotation(self) -> typing.Optional["SchemaTypeAnnotation"]:
        """ Optional: get SchemaTypeAnnotation of this AbstractSchemaElement """
        pass

    def get_metadata(self) -> typing.Mapping[str, typing.Any]:
        """ return metadata (aka payload data) for this SchemaElement.

            Metadata is not used at all by the Schema, and is provided as a third-party 
            extension mechanism. Multiple third-parties can each have their own key, 
            to use as a namespace in the metadata.
            (similar to and taken from dataclasses.Field)

            Can be refined; by default an empty dict is returned.
        """
        return {}

    @classmethod
    @abc.abstractmethod
    def from_schema_element(
        cls,
        schema_element: "AbstractSchemaElement",
        parent_schema: typing.Optional[AbstractSchema] = None,
    ) -> "AbstractSchemaElement":
        """ Optional API: create a new AbstractSchemaElement (in the Schema dialect of the cls) from
            an AbstractSchemaElement in any Schema Dialect.
            parent_schema is the new schema parent of the element. 
        """
        pass

    def get_annotated(self) -> type:
        """ get PEP-593 typing.Annotated type """
        return typing_extensions.Annotated[
            self.get_python_type(), self.get_annotation()
        ]
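
Going the other way, a consumer can split an annotated type back into the Python type and the schema annotation. The sketch below assumes Python 3.9+ (`typing.Annotated`, `typing.get_origin`, `typing.get_args`); `ToyAnnotation` and `split_annotated` are hypothetical names standing in for `SchemaTypeAnnotation` and whatever helper an implementation would provide.

```python
import typing


class ToyAnnotation:
    """Hypothetical stand-in for SchemaTypeAnnotation (only `required` is kept)."""

    def __init__(self, required: bool = False):
        self.required = required


def split_annotated(tp) -> typing.Tuple[typing.Any, typing.Optional[ToyAnnotation]]:
    """Return (python_type, schema_annotation or None) for a plain or Annotated type."""
    if typing.get_origin(tp) is typing.Annotated:
        python_type, *extras = typing.get_args(tp)
        # Annotated may carry metadata from several parties; pick ours by class.
        for extra in extras:
            if isinstance(extra, ToyAnnotation):
                return python_type, extra
        return python_type, None
    return tp, None


ann = ToyAnnotation(required=True)
python_type, found = split_annotated(typing.Annotated[int, "unrelated", ann])
```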


It is convenient to have a standardized object representing a missing value, which is distinct from `None`, which may be a valid value (in particular, a valid default).

In [34]:
class _MISSING_TYPE:
    """ A sentinel object to detect if a parameter is supplied or not.  Use a class to give it a better repr. """

    def __repr__(self):
        return "MISSING"

MISSING = _MISSING_TYPE()
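
A short illustration of why the sentinel is needed: with `None` as the marker, an explicit default of `None` could not be told apart from no default at all. The function `describe_default` is hypothetical.

```python
class _MISSING_TYPE:
    """Copy of the sentinel class above, so the example runs on its own."""

    def __repr__(self):
        return "MISSING"


MISSING = _MISSING_TYPE()


def describe_default(default=MISSING) -> str:
    # Identity comparison: only the sentinel itself means "not supplied".
    if default is MISSING:
        return "no default"
    return f"default={default!r}"


without_default = describe_default()
with_none = describe_default(None)
```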

The Type Annotation is the annotation in PEP-593's `typing_extensions.Annotated[type,annotation]`. It is defined concretely, and serves as 
the interoperability mechanism between elements of different Schema solutions. The functionality is designed to be extensible, but to fulfill the
main use cases:

In [32]:
class SchemaTypeAnnotation:
    """ Annotation holding SchemaElement typing information to go as 2nd parameter into typing_extensions.Annotated.
        Unlike the AbstractSchemaElement, the SchemaTypeAnnotation is concrete and prescribes a minimal representation.

        A subclass of AbstractSchema may or may not use the to_external, from_external or validate_internal
        methods, at its convenience. It may have its own transformation approach, for example working on the whole object
        at once instead of at the element level. 

        
    """

    def __init__(
        self,
        required: bool = False,
        default: typing.Any = MISSING,
        to_external: typing.Optional[typing.Callable] = None,
        from_external: typing.Optional[typing.Callable] = None,
        validate_internal: typing.Optional[typing.Callable] = None,
        metadata: typing.Mapping[str, typing.Any] = {},
    ):
        """ SchemaTypeAnnotation 
            default is the internal form of the default value, or MISSING
            from_external, to_external and validate_internal are callables with signature of the methods below, which they overwrite.
        """
        self.required = required
        self.default = default
        if to_external is not None:
            self.to_external = to_external
        if from_external is not None:
            self.from_external = from_external
        if validate_internal is not None:
            self.validate_internal = validate_internal
        self.metadata = metadata

    def __repr__(self):
        return f"{self.__class__.__name__}(required={self.required}, default={self.default})"

    @staticmethod  # can be overridden by passing a function to the constructor
    def to_external(
        annotation: "SchemaTypeAnnotation",
        schemaElement: AbstractSchemaElement,
        value: typing.Any,
        destination: WellknownRepresentation,
        writer_callback: typing.Optional[typing.Callable] = None,
        **params,
    ) -> typing.Optional[typing.Any]:
        """ Externalize value in schemaElement to destination.
            The arguments are the same as in AbstractSchema.to_external(). Params must be passed down. 
        """
        pass

    @staticmethod  # can be overridden by passing a function to the constructor
    def from_external(
        annotation: "SchemaTypeAnnotation",
        schemaElement: AbstractSchemaElement,
        external: typing.Union[typing.Any, typing.Callable],
        source: WellknownRepresentation,
        **params,
    ) -> typing.Any:
        """ Internalize and validate data from external source. Returns internal form, or raises error.
            The arguments are the same as in AbstractSchema.from_external(). Params must be passed down. 
        """
        pass

    @staticmethod  # can be overridden by passing a function to the constructor
    def validate_internal(
        annotation: "SchemaTypeAnnotation",
        schemaElement: AbstractSchemaElement,
        value: typing.Any,
        **params,
    ) -> typing.Any:
        """ Validation method, to transform value and validate it. Returns internal form (possibly unchanged value), or raises error.
            The arguments mirror those of AbstractSchema.validate_internal(). Params must be passed down. 

            Default implementation:
            
            If value is falsy, the default is substituted, or a ValidationError is raised for a required element.
            Otherwise, value is passed to the class behind schemaElement.get_python_type().
        """
        if not value:
            if annotation.default is not MISSING:
                value = annotation.default
            elif annotation.required:
                raise ValidationError(
                    f"required element {schemaElement.get_name()} must be supplied"
                )
        else:
            pt = schemaElement.get_python_type()
            basetype = (
                typing.get_origin(pt) or pt
            )  # typing type.__origin__ is Python class
            try:
                value = basetype(value)
            except Exception as e:
                raise ValidationError(str(e), original_error=e)
        return value
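
The reason these methods are static is that a callable passed to the constructor is stored on the instance and is therefore not bound; both the default and the replacement are called with the annotation as explicit first argument. A trimmed-down sketch, in which `ToyAnnotation` and `strip_and_lower` are hypothetical and the Python type is passed directly instead of a schema element to keep it short:

```python
import typing


class ToyAnnotation:
    """Hypothetical, reduced stand-in for SchemaTypeAnnotation."""

    def __init__(self, validate_internal: typing.Optional[typing.Callable] = None):
        if validate_internal is not None:
            # Stored on the instance, so Python does not bind it as a method.
            self.validate_internal = validate_internal

    @staticmethod
    def validate_internal(annotation, python_type, value, **params):
        # Default: coerce with the origin class, e.g. list for typing.List[int].
        basetype = typing.get_origin(python_type) or python_type
        return basetype(value)


def strip_and_lower(annotation, python_type, value, **params):
    """Replacement validator with the same signature as the default."""
    return str(value).strip().lower()


default_ann = ToyAnnotation()
custom_ann = ToyAnnotation(validate_internal=strip_and_lower)

coerced = default_ann.validate_internal(default_ann, int, "42")
as_list = default_ann.validate_internal(default_ann, typing.List[int], ("1", "2"))
cleaned = custom_ann.validate_internal(custom_ann, str, "  Martin ")
```

Note that the coercion through the origin class builds the container only; the element types inside `typing.List[int]` are not checked.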


Finally, standardizing some errors helps to use several Schema solutions in one code base:

In [33]:
class SchemaError(Exception):
    """ Base class for all schema-related errors. 
        The original_error can be used to pass in the error raised by an underlying implementation.
    """

    def __init__(self, message, original_error=None):
        self.message = message
        self.original_error = original_error
        return


class ValidationError(SchemaError, ValueError):
    """ Denotes invalid data """

    ...
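
Since `ValidationError` also derives from `ValueError`, code that knows nothing about the protocol can still catch it. A small sketch with the two classes repeated for self-containment; `parse_age` is a hypothetical function:

```python
class SchemaError(Exception):
    """Copy of the base class above."""

    def __init__(self, message, original_error=None):
        self.message = message
        self.original_error = original_error


class ValidationError(SchemaError, ValueError):
    """Denotes invalid data."""


def parse_age(text: str) -> int:
    try:
        return int(text)
    except ValueError as e:
        # Wrap the underlying error so callers can still inspect it.
        raise ValidationError(f"invalid age: {text!r}", original_error=e)


try:
    parse_age("abc")
except ValueError as err:  # protocol-agnostic handler still catches it
    caught = err
```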

# Example Schema solutions 

I have developed adapters to two schema solutions and will show some examples:
- Dataclasses - a minimal Schema solution: [dataclasses_schema.py](dataclasses_schema.py). 
- [Marshmallow](https://marshmallow.readthedocs.io/en/latest/) - a full-featured, well-engineered, stand-alone library supporting serialization to native Python types and JSON: [marshmallow_schema.py](marshmallow_schema.py)

## Marshmallow 

### Schema adapter

I don't change the source of Marshmallow; I use subclassing for the Schema and a monkey-patched superclass for the fields.

The adaptation to _Nested_ fields would be trickier, but would follow the same principles (personally, I prefer using a field _Object_, which refers to another schemed class, and works with `post_load()` to re-create the object).  

Now adapt Marshmallow fields by monkey patching:

The SchemedObject can be used as a superclass for Python objects with an associated Schema. By convention of this implementation, the Schema is obtained as an inner class named Schema.

The Schema is instantiated and cached as `.__schema`. It also sets `__objclass__` in the schema of a class to that class, so `MMSchema.object_factory()` can recreate the actual class instance. I prefer this over a Marshmallow `post_load()` decorator, because that needs to be defined on the Schema (however, this is not part of the protocol, but of my adaption of Marshmallow). 

Now for the Schema itself, and its transformation methods:

### Examples

Using the above code, we declare a Person class with a Marshmallow Schema:

In [0]:
from marshmallow_schema import SchemedObject, MMSchema
import abc_schema
import marshmallow as mm  # type: ignore
import dataclasses
import datetime
import typing

In [0]:
class Person(SchemedObject):
    class Schema(MMSchema):
        name = mm.fields.Str(required=True)
        email = mm.fields.Email(missing=None)
        dob = mm.fields.Date(required=False,missing=None)
        sex = mm.fields.Str(
            validate=mm.fields.validate.OneOf(("m", "f", "o", "?")), missing="?"
        )
        education = mm.fields.Dict(
            keys=mm.fields.Str(), values=mm.fields.Date(), payload="field metadata"
        )

    __annotations__ = Schema().get_annotations()


p = Person()

In [0]:
{se.get_name(): se.get_python_type() for se in p.__get_schema__()}

In [0]:
s = p.__get_schema__()
assert s.fields["name"].get_schema() is s
assert s.fields["education"].get_metadata() == {"payload": "field metadata"}

`education` is a Marshmallow field:

In [0]:
field = s.fields["education"]
field

Use it to create another Marshmallow field without using field's Marshmallow internals:

In [0]:
mm.fields.Field.from_schema_element(field)

Create another Marshmallow schema from s, only using the protocol API to access s (round trip): 

In [0]:
scopy = MMSchema.from_schema(s)

We can also define a data class:

In [0]:
@dataclasses.dataclass
class DCPerson(Person):
    __annotations__ = Person.Schema().as_field_annotations()

Note we need to set dob=None, as dataclasses cannot handle optional fields, but since the type is optional, it's still a valid value:

In [0]:
dcp = DCPerson(
    name="Martin",
    email="mgf@acm.org",
    sex="m",
    dob=None,
    education={"Gymnasium Raemibuehl": datetime.date(1981, 9, 1)},
)

In [0]:
dcp

In [0]:
dcp_s = dcp.__get_schema__()

In [0]:
j = dcp_s.to_external(dcp,abc_schema.WellknownRepresentation.json)

In [0]:
j

In [0]:
o = dcp_s.from_external(j,abc_schema.WellknownRepresentation.json)

In [0]:
dcp_s.validate_internal(dcp)

Validate with the "copy" of the Schema created above using only the protocol API:

In [0]:
scopy.validate_internal(dcp)

### MyPy static type checking

Note that MyPy currently cannot check type definitions returned by functions and methods (definitions, not the return type itself). However, MyPy has a [plugin mechanism](http://mypy-lang.blogspot.com/2019/03/extending-mypy-with-plugins.html) that should support such calculated types. 

In [0]:
!mypy marshmallow_example.py

### Conclusion

Using Marshmallow through the protocol is not easier than using the standard Marshmallow API. However, if a Marshmallow Schema needs to refer to Schemas in Django or SQLAlchemy, interoperability becomes key.