# A minimal standard for interoperable Schemas

The minimal standard is defined as a Protocol, with some Abstract Base Classes. The protocol is minimal in that it provides for the most common [Use Cases](https://github.com/mcgfeller/py-schemas/blob/master/UseCases.md):
* Serialization to an external representation
* Deserialization from an external representation
* Validation
* Obtaining Schema elements
* Static type checking
  * Get the Python type of a schema element 
* Associate data with Schema

The protocol doesn't provide a standard representation for Schemas or Schema Elements; it only provides standard access and use. It does not provide conversion of arbitrary Schema features between schema libraries, but it provides conversion to Python static types. See [Alternatives considered](https://github.com/mcgfeller/py-schemas/blob/master/alternatives.md).

Using the Protocol for a single schema library, such as Marshmallow, offers no advantage over using that library's native API. However, if the protocol is implemented by several libraries, integrating libraries that use different schema facilities becomes much easier.
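To illustrate the interoperability argument, here is a sketch of a library-agnostic helper that relies only on the element methods the protocol standardizes. It uses `typing.Protocol` (PEP 544) for structural typing, which is my assumption for this sketch; the protocol below uses ABCs with `register()` instead. `LibAColumn` and `LibBField` are hypothetical stand-ins for the element classes of two unrelated libraries.

```python
import typing

@typing.runtime_checkable
class SchemaElementLike(typing.Protocol):
    """ structural view of a schema element: only the methods used below """
    def get_name(self) -> str: ...
    def get_python_type(self) -> type: ...

def annotations_of(schema: typing.Iterable[SchemaElementLike]) -> typing.Dict[str, type]:
    """ library-agnostic: works with any iterable of protocol-conforming elements """
    return {se.get_name(): se.get_python_type() for se in schema}

class LibAColumn:
    """ hypothetical element class of one library """
    def __init__(self, name: str, tp: type):
        self._name, self._tp = name, tp
    def get_name(self) -> str:
        return self._name
    def get_python_type(self) -> type:
        return self._tp

class LibBField:
    """ hypothetical element class of another, unrelated library """
    def __init__(self, key: str, kind: type):
        self.key, self.kind = key, kind
    def get_name(self) -> str:
        return self.key
    def get_python_type(self) -> type:
        return self.kind

# elements from both "libraries" can be mixed freely
mixed = [LibAColumn('id', int), LibBField('title', str)]
```

Neither element class knows about the other, yet `annotations_of(mixed)` treats them uniformly.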


```python
import abc
import typing
import collections.abc
import enum
```

First, an entirely optional class providing a protocol to associate a Schema with a class, without constraining too much how things are composed. We only mandate that if a Schema is assigned to a subclass of `SchemedObject`, the Schema can be retrieved by `.__get_schema__()`.

```python
class SchemedObject(metaclass=abc.ABCMeta):
    """ An object with a Schema, supporting the __get_schema__ method.
    """

    @classmethod
    @abc.abstractmethod
    def __get_schema__(cls) -> 'AbstractSchema':
        pass
```
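A sketch of the two ways a class can satisfy `SchemedObject`: subclassing it and implementing `__get_schema__`, or, for a class that cannot change its bases, registering it as a virtual subclass. The ABC is repeated so the block runs on its own; `Point`, `Unschemed` and `Legacy` are hypothetical, and a plain dict stands in for a real Schema.

```python
import abc

class SchemedObject(metaclass=abc.ABCMeta):
    """ repeated from above so that this block runs on its own """
    @classmethod
    @abc.abstractmethod
    def __get_schema__(cls):
        pass

class Point(SchemedObject):
    # hypothetical: a plain dict stands in for a real schema object
    @classmethod
    def __get_schema__(cls):
        return {'x': int, 'y': int}

class Unschemed(SchemedObject):
    pass  # __get_schema__ not implemented, so the class stays abstract

class Legacy:
    # hypothetical class from another library whose bases cannot change
    @classmethod
    def __get_schema__(cls):
        return {'id': int}

SchemedObject.register(Legacy)  # virtual subclass, as done later for the Marshmallow adapter
```

Instantiating `Unschemed()` raises `TypeError`, because the abstract `__get_schema__` was never implemented.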

To standardize the representations into which we serialize, we enumerate a few well-known ones as an enum.

```python
class WellknownRepresentation(enum.Enum):

    python  = '__python__' # internal python structures
    pickle  = 'application/python-pickle'
    json    = 'application/json'
    xml     = 'application/xml'
    sql     = 'application/sql'
    html    = 'text/html'
```
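Because the enum values are media types, a representation can be looked up directly from a string such as an HTTP `Content-Type` header. A minimal sketch; `representation_for` is a hypothetical helper, not part of the protocol.

```python
import enum

class WellknownRepresentation(enum.Enum):
    python  = '__python__' # internal python structures
    pickle  = 'application/python-pickle'
    json    = 'application/json'
    xml     = 'application/xml'
    sql     = 'application/sql'
    html    = 'text/html'

def representation_for(media_type: str) -> WellknownRepresentation:
    """ hypothetical helper: map a media type (e.g. a Content-Type header) to the enum """
    base = media_type.split(';', 1)[0].strip().lower()  # drop parameters such as charset
    return WellknownRepresentation(base)  # lookup by value; raises ValueError if unknown
```

Lookup by member name also works, e.g. `WellknownRepresentation['xml']`.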

The minimal protocol defines a Schema as an Iterable yielding Schema Elements, with a set of representations it supports and methods to convert to and from external representations.

We want the Schema to be useful for type declarations, so `.as_annotations()` and `.as_field_annotations()` return dictionaries usable as `__annotations__` in a class. Unfortunately, the Python `typing` module assumes `__annotations__` to be a dictionary, instead of allowing a callable returning annotations, so I don't think this can be done elegantly.

We also have a `get_metadata()` method defined both on the Schema and the Schema Element. Metadata is not used at all by the Schema, and is provided as a third-party extension mechanism. Multiple third-parties can each have their own key, to use as a namespace in the metadata. This is similar to and taken from [dataclasses.Field](https://docs.python.org/3/library/dataclasses.html#dataclasses.field).
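The namespacing convention can be seen in `dataclasses.field` itself: each third party keeps its data under its own top-level key in `metadata`, which `dataclasses.fields()` exposes as a read-only mapping. The keys `units_lib` and `form_lib` are hypothetical.

```python
import dataclasses

@dataclasses.dataclass
class Reading:
    # two hypothetical third parties, each using its own top-level key as namespace
    value: float = dataclasses.field(
        default=0.0,
        metadata={'units_lib': {'unit': 'kelvin'},
                  'form_lib': {'label': 'Temperature'}})

meta = dataclasses.fields(Reading)[0].metadata  # read-only mapping proxy
```

Each party only reads its own key, so `meta['units_lib']` and `meta['form_lib']` never interfere; assigning to `meta` raises `TypeError`.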

```python
class AbstractSchema(collections.abc.Iterable,metaclass=abc.ABCMeta):
    """ The AbstractSchema does not prescribe how the Schema is organized, and
        only prescribes that the AbstractSchemaElements may be obtained by iterating
        over the Schema.
    """
    SupportedRepresentations: typing.ClassVar[typing.Set['WellknownRepresentation']] = \
        {WellknownRepresentation.python}

    @abc.abstractmethod
    def to_external(self, obj : SchemedObject, destination : WellknownRepresentation, 
                    writer_callback : typing.Optional[typing.Callable]=None, 
                    **params) -> typing.Optional[typing.Any]:
        """
            If *writer_callback* is None (the default), the external representation
            is returned as result.

            If *writer_callback* is not None, then it can be called any number
            of times with some arguments. No result is returned.

            (inspired by PEP-574 https://www.python.org/dev/peps/pep-0574/#producer-api)
        """
        pass


    @abc.abstractmethod
    def from_external(self,external : typing.Union[typing.Any,typing.Callable], 
                      source : WellknownRepresentation, **params ) -> SchemedObject:

        """
            If *external* is bytes, they are consumed as source representation.

            If *external* is a Callable, then it can be called any number
            of times with some arguments to obtain parts of the source representation.

        """
        pass

    @abc.abstractmethod
    def validate_internal(self, obj : SchemedObject, **params) -> SchemedObject:
        pass

    @abc.abstractmethod
    def __iter__(self) -> typing.Iterator['AbstractSchemaElement'] :
        """ iterator through SchemaElements in this Schema """
        pass

    def as_annotations(self) -> typing.Dict[str,type]:
        """ return Schema Elements in annotation format.
            Use as class.__annotations__ = schema.as_annotations()
            I would wish that __annotations__ is a protocol that can be provided,
            instead of simply assuming it is a mapping. 
        """
        return {se.get_name() : se.get_python_type() for se in self}

    def as_field_annotations(self) -> typing.Dict[str,type]:
        """ return Schema Elements in DataClass field annotation format.
            Use as class.__annotations__ = schema.as_field_annotations().

            Equivalent to as_annotations unless refined in a subclass.
        """
        return self.as_annotations()  # calling as_field_annotations() here would recurse forever

    def get_metadata(self) -> typing.Dict[str,typing.Any]:
        """ return metadata (aka payload data) for this Schema.

            Meta data is not used at all by the Schema, and is provided as a third-party 
            extension mechanism. Multiple third-parties can each have their own key, 
            to use as a namespace in the metadata.
            (similar to and taken from dataclasses.Field)

            Can be refined; by default an empty dict is returned.

            There is a similar method defined on the AbstractSchemaElement for
            metadata attached to a schema element. 
        """
        return {}
```
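The callback conventions of `to_external()` and `from_external()` can be sketched with the standard `json` module alone. `to_json` and `from_json` are hypothetical helpers, not part of the protocol. In callback mode, the writer receives the output in several chunks (via `json.JSONEncoder.iterencode`) and nothing is returned; a reader callable, assumed here to behave like a text file's `read(size)` method, is called until it returns an empty string.

```python
import io
import json
import typing

def to_json(data: typing.Any,
            writer_callback: typing.Optional[typing.Callable[[str], typing.Any]] = None,
            **params) -> typing.Optional[str]:
    """ hypothetical helper following the to_external() contract; params go to json.JSONEncoder """
    if writer_callback is None:
        return json.dumps(data, **params)  # whole representation returned as result
    for chunk in json.JSONEncoder(**params).iterencode(data):
        writer_callback(chunk)             # callback called any number of times
    return None                            # no result in callback mode

def from_json(external: typing.Union[str, bytes, typing.Callable[[int], str]],
              **params) -> typing.Any:
    """ hypothetical helper following the from_external() contract """
    if callable(external):
        parts = []
        while True:
            part = external(4096)  # assumed to behave like a text file's read(size)
            if not part:           # an empty string signals the end of the source
                break
            parts.append(part)
        external = ''.join(parts)
    return json.loads(external, **params)  # str or bytes consumed as source representation
```

For example, `from_json(io.StringIO('{"a": 1}').read)` reads the source through the callable, while `from_json(b'{"a": 1}')` consumes bytes directly.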

We finish the minimal protocol by defining a Schema Element. It supports a back reference to its Schema, and two methods to get the Schema Element's name and its Python type (for interoperability with Python static typing). 

```python
class AbstractSchemaElement(metaclass=abc.ABCMeta):

    @abc.abstractmethod
    def get_schema(self) -> typing.Optional[AbstractSchema]:
        """ get associated schema or None """
        pass

    @abc.abstractmethod
    def get_python_type(self) -> type:
        """ get Python type of this AbstractSchemaElement """
        pass

    @abc.abstractmethod
    def get_name(self) -> str:
        """ get name usable as variable name """
        pass

    def get_metadata(self) -> typing.Dict[str,typing.Any]:
        """ return metadata (aka payload data) for this SchemaElement.

            Meta data is not used at all by the Schema, and is provided as a third-party 
            extension mechanism. Multiple third-parties can each have their own key, 
            to use as a namespace in the metadata.
            (similar to and taken from dataclasses.Field)

            Can be refined; by default an empty dict is returned.
        """
        return {}
```
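A minimal concrete Schema Element, assuming the simplest possible storage (a dataclass holding name, type, schema and metadata), together with the dictionary that `as_annotations()` computes from such elements. `SimpleElement` is hypothetical; it satisfies the protocol by duck typing rather than by registering with `AbstractSchemaElement`.

```python
import dataclasses
import typing

@dataclasses.dataclass
class SimpleElement:
    """ hypothetical Schema Element: stores what the four protocol methods return """
    name: str
    python_type: type
    schema: typing.Optional[object] = None
    metadata: typing.Dict[str, typing.Any] = dataclasses.field(default_factory=dict)

    def get_schema(self) -> typing.Optional[object]:
        return self.schema

    def get_python_type(self) -> type:
        return self.python_type

    def get_name(self) -> str:
        return self.name

    def get_metadata(self) -> typing.Dict[str, typing.Any]:
        return self.metadata

elements = [SimpleElement('name', str),
            SimpleElement('age', int, metadata={'ui_lib': {'label': 'Age'}})]

# what AbstractSchema.as_annotations() computes from the elements
annotations = {se.get_name(): se.get_python_type() for se in elements}

class Person:
    __annotations__ = annotations  # used as in the examples further below
```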

We now continue by adapting a well-known Schema library.



## Marshmallow 

I use [Marshmallow](https://marshmallow.readthedocs.io/en/latest/) as the Schema library. Marshmallow is a full-featured, well-engineered, stand-alone library supporting serialization to native Python types and JSON.

### Adapting the Schema for Marshmallow

I assume the above code is available as module `abc_schema`. I don't change the source of Marshmallow, I use subclassing for the Schema and monkey-patching for the fields.

Adapting _Nested_ fields would be trickier, but would follow the same principles (personally, I prefer using a field _Object_ that refers to another schemed class and works with `post_load()` to re-create the object).

```python
""" Marshmallow based conformant Schema.
    Marshmallow fields are monkey-patched.
    Marshmallow schema is subclassed.
"""

import typing
import marshmallow as mm # type: ignore
import decimal
import datetime
import abc_schema
import dataclasses
```

Now adapt Marshmallow fields by monkey patching:

```python
def get_schema(self) -> typing.Optional['MMSchema']:
    """ return the Schema or None """
    return self.root

def get_python_type(self) -> type:
    """ get native class of field, can be overridden """
    return self.python_type

def get_name(self) -> str:
    return self.name

def get_metadata(self) -> typing.Dict[str,typing.Any]:
    """ return metadata (aka payload data) for this SchemaElement.
    """
    return self.metadata

# monkey-patch all Fields:
mm.fields.Field.get_schema = get_schema
mm.fields.Field.get_python_type = get_python_type
mm.fields.Field.get_name = get_name
mm.fields.Field.get_metadata = get_metadata
# monkey-patch specific type Fields:
mm.fields.FieldABC.python_type = typing.Any
mm.fields.Str.python_type = str
mm.fields.Integer.python_type = int
mm.fields.Float.python_type = float
mm.fields.Decimal.python_type = decimal.Decimal
mm.fields.Boolean.python_type = bool
mm.fields.FormattedString.python_type = str
mm.fields.DateTime.python_type = datetime.datetime
mm.fields.Time.python_type = datetime.time
mm.fields.Date.python_type = datetime.date
mm.fields.TimeDelta.python_type = datetime.timedelta

def _dict_get_python_type(self) -> type:
    """ get native classes of containers and build Dict type
        Simplified - either container is a Field, or we use Any.
    """
    kt = self.key_container.get_python_type() if isinstance(self.key_container,mm.fields.FieldABC) else typing.Any
    vt = self.value_container.get_python_type() if isinstance(self.value_container,mm.fields.FieldABC) else typing.Any
    return typing.Dict[kt,vt]

mm.fields.Dict.get_python_type = _dict_get_python_type
```
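The monkey-patching above relies on ordinary attribute lookup along the method resolution order: a method patched onto the base class is inherited by every field class, and a `python_type` class attribute set on a subclass shadows the fallback on the base. The sketch below reproduces this with hypothetical stand-in classes instead of Marshmallow. It also shows that `typing.Dict[kt, vt]` built from runtime values is valid Python, which is exactly what mypy rejects in the output at the end (`Invalid type "kt"`).

```python
import datetime
import typing

# Hypothetical stand-ins for a schema library's field classes (not Marshmallow itself)
class Field:
    pass

class Str(Field):
    pass

class Date(Field):
    pass

class DictField(Field):
    def __init__(self, keys=None, values=None):
        self.key_field = keys      # element field for dict keys, or None
        self.value_field = values  # element field for dict values, or None

def get_python_type(self) -> typing.Any:
    """ generic implementation, patched onto the base class """
    return self.python_type

# Patch the base class once: every subclass inherits the method via the MRO
Field.get_python_type = get_python_type
Field.python_type = typing.Any   # fallback for fields without a specific mapping
Str.python_type = str            # subclass attributes shadow the fallback
Date.python_type = datetime.date

def _dict_get_python_type(self) -> typing.Any:
    """ build Dict[key type, value type] at runtime; Any where no field is given """
    kt = self.key_field.get_python_type() if isinstance(self.key_field, Field) else typing.Any
    vt = self.value_field.get_python_type() if isinstance(self.value_field, Field) else typing.Any
    return typing.Dict[kt, vt]

DictField.get_python_type = _dict_get_python_type  # more specific override on one subclass
```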



The SchemedObject can be used as a superclass for Python objects with an associated Schema. By convention of this implementation, the Schema is obtained as an inner class named Schema.

The Schema is instantiated and cached as `.__schema`. It also sets `__objclass__` in the schema of a class to that class, so `MMSchema.object_factory()` can recreate the actual class instance. I prefer this over a Marshmallow `post_load()` decorator, because that needs to be defined on the Schema (however, this is not part of the protocol, but of my adaption of Marshmallow). 

```python
class SchemedObject:
    """ SchemedObject is the - entirely optional - superclass that can be used for classes
        that have an associated Schema. It defines one class method .__get_schema__,
        to return that Schema.

        By convention of this implementation, the Schema is obtained as an inner class named Schema.
        The Schema is instantiated and cached as .__schema. __objclass__ is set in the Schema so
        that .object_factory() can create an instance.
    """

    @classmethod
    def __get_schema__(cls):
        """ get schema attached to class, and cached in cls.__schema. If not cached, instantiate .Schema """
        # cls.__schema is name-mangled to cls._SchemedObject__schema; look it up in the class's
        # own __dict__, so that each subclass gets its own Schema instance and __objclass__
        s = cls.__dict__.get('_SchemedObject__schema')
        if s is None:
            sclass = getattr(cls,'Schema',None)
            if sclass is None:
                raise ValueError('Class must have Schema inner class')
            else:
                s = cls.__schema = sclass() # instantiate
                s.__objclass__ = cls # assign this class to schema.__objclass__
        return s

abc_schema.SchemedObject.register(SchemedObject)
```
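A self-contained sketch of the caching and factory convention, with a hypothetical `ToySchema` in place of `MMSchema`. A string passed to `getattr()` is not name-mangled, so `getattr(cls, '__schema')` would never see the value stored by `cls.__schema = ...`; looking up the mangled name `_SchemedObject__schema` in the class's own `__dict__` instead caches one Schema per class, so `object_factory()` builds instances of the class that asked for the Schema.

```python
class SchemedObject:
    """ caching/factory convention of the Marshmallow adapter, without Marshmallow """

    @classmethod
    def __get_schema__(cls):
        # inside this class, cls.__schema is mangled to cls._SchemedObject__schema;
        # looking only in cls.__dict__ keeps one cached Schema per class
        s = cls.__dict__.get('_SchemedObject__schema')
        if s is None:
            sclass = getattr(cls, 'Schema', None)
            if sclass is None:
                raise ValueError('Class must have Schema inner class')
            s = cls.__schema = sclass()  # instantiate and cache
            s.__objclass__ = cls         # names ending in __ are not mangled
        return s

class ToySchema:
    """ hypothetical schema with the object_factory() of MMSchema """
    def object_factory(self, d: dict):
        objclass = getattr(self, '__objclass__', None)
        return objclass(**d) if objclass else d

class Point(SchemedObject):
    Schema = ToySchema  # a plain class attribute also satisfies the "inner class" convention
    def __init__(self, x=0, y=0):
        self.x, self.y = x, y

class Point3D(Point):  # inherits Schema, but gets its own cached instance
    def __init__(self, x=0, y=0, z=0):
        super().__init__(x, y)
        self.z = z
```

`Point3D.__get_schema__().object_factory({'x': 1, 'z': 2})` therefore returns a `Point3D`, not a `Point`.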

Now for the Schema itself, and its transformation methods:

```python
class MMSchema(mm.Schema):

    SupportedRepresentations = {abc_schema.WellknownRepresentation.python,
                                abc_schema.WellknownRepresentation.json,}

    def to_external(self, obj : SchemedObject, destination : abc_schema.WellknownRepresentation,
                    writer_callback : typing.Optional[typing.Callable]=None, **params) -> typing.Optional[typing.Any]:
        """
            If *writer_callback* is None (the default), the external representation
            is returned as result.

            If *writer_callback* is not None, then it can be called any number
            of times with some arguments. No result is returned.

            (inspired by PEP-574 https://www.python.org/dev/peps/pep-0574/#producer-api)
        """
        supported = {
            abc_schema.WellknownRepresentation.json : self.dumps,
            abc_schema.WellknownRepresentation.python : self.dump,
          }
        method = supported.get(destination)
        if not method:
            raise ValueError(f'destination {destination} not supported.')
        e = method(obj,**params)
        if writer_callback:
            writer_callback(e) # this adapter hands over the whole representation in one call
            return None # as documented, no result is returned in callback mode
        else:
            return e

    def from_external(self, external : typing.Union[typing.Any,typing.Callable],
                      source : abc_schema.WellknownRepresentation,
                      **params ) -> typing.Union[SchemedObject, typing.Dict[typing.Any, typing.Any]]:

        """
            If *external* is bytes, they are consumed as source representation.

            If *external* is a Callable, then it can be called any number
            of times with some arguments to obtain parts of the source representation.

        """
        supported = {
            abc_schema.WellknownRepresentation.json : self.loads,
            abc_schema.WellknownRepresentation.python : self.load,
          }
        method = supported.get(source)
        if not method:
            raise ValueError(f'source {source} not supported.')
        if callable(external):
            external = external(None) # this adapter reads the whole source in a single call
        d = method(external, **params)
        o = self.object_factory(d)

        return o


    def validate_internal(self, obj : SchemedObject, **params) -> SchemedObject:
        """ Marshmallow doesn't provide validation on the object - we need to dump it.
            As Schema.validate returns a dict, but we want an error raised, we call .load() instead.
            However, if the validation doesn't raise an error, we return the argument obj unchanged.
        """
        self.load(self.dump(obj)) # may raise an error
        return obj


    def __iter__(self):
        """ iterator through SchemaElements in this Schema, setting each field's name """
        for name,field in self._declared_fields.items():
            field.name = name
            yield field

    def as_annotations(self):
        """ return Schema Elements in annotation format.
            Use as class.__annotations__ = schema.as_annotations()
            I would wish that __annotations__ is a protocol that can be provided,
            instead of simply assuming it is a mapping.
        """
        r = {}
        for name,field in self._declared_fields.items():
            nclass = field.get_python_type()
            if not field.required:
                if field.missing is not mm.missing: # simplistic: widen by the default's type, e.g. Optional for missing=None
                    nclass = typing.Union[nclass,type(field.missing)]
            r[name] = nclass
        return r


    def as_field_annotations(self):
        """ return Schema Elements in dataclass field annotation format.
            Use as class.__annotations__ = schema.as_field_annotations()
        """
        r = {}
        for name,field in self._declared_fields.items():
            nclass = field.get_python_type()
            default = field.missing if field.missing is not mm.missing else None # declared default, if any
            metadata = None
            dcfield = dataclasses.field(default=default,metadata=metadata)
            dcfield.type = nclass
            r[name] = dcfield
        return r

    def object_factory(self,d : dict) -> typing.Union[SchemedObject,dict]:
        """ return an object from dict, according to the Schema's __objclass__ """
        objclass = getattr(self,'__objclass__',None)
        if objclass:
            o = objclass(**d) # factory!
        else:
            o = d
        return o

    def get_metadata(self) -> typing.Dict[str,typing.Any]:
        """ return metadata (aka payload data) for this Schema.
            Meta data is not used at all by the Schema, and is provided as a third-party
            extension mechanism. Multiple third-parties can each have their own key,
            to use as a namespace in the metadata (similar to and taken from dataclasses.Field)
        """
        return self.context

abc_schema.AbstractSchema.register(MMSchema)
```
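The widening rule in `MMSchema.as_annotations()` can be checked in isolation: a non-required field with a default gets `Union[type, type(default)]`, which is `Optional[...]` for a `None` default and collapses to the plain type when the default already has that type. `widen` is a hypothetical helper, and `MISSING` is a hypothetical stand-in for `marshmallow.missing`.

```python
import typing

MISSING = object()  # hypothetical sentinel standing in for marshmallow.missing

def widen(python_type: typing.Any, required: bool, default: typing.Any = MISSING) -> typing.Any:
    """ annotation for a field: optional fields with a default also admit the default's type """
    if not required and default is not MISSING:
        return typing.Union[python_type, type(default)]
    return python_type
```

Applied to the `Person` schema in the examples, this rule annotates `email` (`missing=None`) as `Optional[str]` and leaves `sex` (`missing='?'`) as `str`.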

### Examples

Using the above code, we declare a Person class with a Marshmallow Schema:

```python
from marshmallow_schema import SchemedObject,MMSchema
import abc_schema
import marshmallow as mm # type: ignore
import dataclasses
import datetime
```

```python
class Person(SchemedObject):

    class Schema(MMSchema):
        name = mm.fields.Str(required=True)
        email = mm.fields.Email(missing=None)
        sex = mm.fields.Str(validate=mm.validate.OneOf(('m','f','o','?')),missing='?')
        education = mm.fields.Dict(values=mm.fields.Date(), keys=mm.fields.Str(),payload='field metadata')

    __annotations__ = Schema().as_annotations()

p = Person()
```

```python
{se.get_name() : se.get_python_type() for se in p.__get_schema__()}
```

```
{'name': str,
 'email': str,
 'sex': str,
 'education': typing.Dict[str, datetime.date]}
```

```python
s = p.__get_schema__()
assert s.fields['name'].get_schema() is s # get schema from element
assert s.fields['education'].get_metadata() == {'payload': 'field metadata'} # get metadata
```

We can also define a data class:

```python
@dataclasses.dataclass
class DCPerson(Person):
    __annotations__ = Person.Schema().as_field_annotations()
```
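Note that `@dataclass` takes field defaults from class attributes, not from the values stored in `__annotations__`. With the class above, the `dataclasses.field` objects become the fields' types and their defaults are not applied, so `DCPerson` requires every argument. As an alternative sketch (not the approach used above), `dataclasses.make_dataclass` accepts `(name, type, field)` triples and does apply the defaults. The hard-coded `spec` list is a hypothetical stand-in for what a Schema would provide.

```python
import dataclasses
import typing

# Hypothetical (name, type, default) triples, standing in for what a Schema provides;
# dataclasses.MISSING marks a field without a default
spec = [
    ('name', str, dataclasses.MISSING),
    ('email', typing.Optional[str], None),
    ('sex', str, '?'),
]

fields = []
for name, tp, default in spec:
    if default is dataclasses.MISSING:
        fields.append((name, tp))  # required field
    else:
        fields.append((name, tp, dataclasses.field(default=default)))  # field with a default

DCPersonSketch = dataclasses.make_dataclass('DCPersonSketch', fields)
```

Here `DCPersonSketch(name='Martin')` is valid and fills in `email=None` and `sex='?'`.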

```python
dcp = DCPerson(name='Martin',email='mgf@acm.org',sex='m',education={"Gymnasium Raemibuehl": datetime.date(1981, 9, 1)})
dcp
```

```
DCPerson(name='Martin', email='mgf@acm.org', sex='m', education={'Gymnasium Raemibuehl': datetime.date(1981, 9, 1)})
```

```python
dcp_s = dcp.__get_schema__()
j = dcp_s.to_external(dcp,abc_schema.WellknownRepresentation.json)
j
```

```
'{"education": {"Gymnasium Raemibuehl": "1981-09-01"}, "email": "mgf@acm.org", "name": "Martin", "sex": "m"}'
```

```python
o = dcp_s.from_external(j,abc_schema.WellknownRepresentation.json)
dcp_s.validate_internal(dcp)
```

```
DCPerson(name='Martin', email='mgf@acm.org', sex='m', education={'Gymnasium Raemibuehl': datetime.date(1981, 9, 1)})
```

### MyPy static type checking

Note that MyPy currently cannot use types that are computed at runtime by functions and methods, such as the `__annotations__` returned by `as_annotations()` (as opposed to a function's declared return type, which it does check). However, MyPy has a [plugin mechanism](http://mypy-lang.blogspot.com/2019/03/extending-mypy-with-plugins.html) that should support such calculated types.

```
$ mypy marshmallow_example.py
marshmallow_schema.py:56: error: Invalid type "kt"
marshmallow_schema.py:56: error: Invalid type "vt"
marshmallow_example.py:31: error: Unexpected keyword argument "name" for "DCPerson"
marshmallow_example.py:31: error: Unexpected keyword argument "email" for "DCPerson"
marshmallow_example.py:31: error: Unexpected keyword argument "sex" for "DCPerson"
marshmallow_example.py:31: error: Unexpected keyword argument "education" for "DCPerson"
```


### Conclusion

Using Marshmallow through the protocol is no easier than using the standard Marshmallow API. However, when a Marshmallow Schema needs to refer to Schemas defined in Django or SQLAlchemy, interoperability becomes key.