Skip to content

Add Serialization Module#9

Merged
lukpueh merged 8 commits intoin-toto:masterfrom
PradyumnaKrishna:serialization
Aug 25, 2022
Merged

Add Serialization Module#9
lukpueh merged 8 commits intoin-toto:masterfrom
PradyumnaKrishna:serialization

Conversation

@PradyumnaKrishna
Copy link

Description of the changes being introduced by the pull request:

Metadata classes and other similar classes requires a method to
serialize and deserialize their objects into json, bytes, and
other forms. Hence, to provide a common interface to serialize
and deserialize is a better approach.

BaseSerializer and BaseDeserializer are introduced having
method serialize and deserialize respectively. These methods will
serialize class instance to raw bytes and vice versa.

SerializationError and DeserializationError will be raised on error
occurrences.

Please verify and check that the pull request fulfils the following requirements:

  • The code follows the Code Style Guidelines
  • Tests have been added for the bug fix or new feature
  • Docs have been added for the bug fix or new feature

Metadata classes and other similar classes requires a method to
serialize and deserialize their objects into json, bytes, and
other forms. Hence, to provide a common interface to serialize
and deserialize is a better approach.

GenericSerializer and GenericDeserializer are introduced having
method serialize and deserialize respectively. These methods will
serialize class instance to raw bytes and vice versa.

SerializationError and DeserializationError will be raised on error
occurrences.

Signed-off-by: Pradyumna Krishna <git@onpy.in>
Serialization and Desrialization to and from utf-8 encoded json
bytes.

These JSON method requires an object or class type with to_dict
or from_dict methods respectively. Which will result in reuse of
this serialization technique between multiple different types of
Metadata or other serializable entities.

Signed-off-by: Pradyumna Krishna <git@onpy.in>
Serializable class will provide methods to serialize class into
bytes or write the bytes into file and vice versa using common
methods. These methods include `from_bytes`, `to_bytes`,
`from_file` and `to_file`.

This will provide functionality to use serialization by classes
other than Metadata and reduce duplication.

Signed-off-by: Pradyumna Krishna <git@onpy.in>
Envelope is now made a serializable entity to support operations
like: from_bytes, to_bytes, from_file and to_file.

Payload deserialization method is added to convert payload into a
required form, currently from json to a required class_type.
Maybe its not best idea to reuse BaseDeserializer here because
payload may not always result as an instance of a class.

Signed-off-by: Pradyumna Krishna <git@onpy.in>
Tests are added for serialization considering all possible cases.

Signed-off-by: Pradyumna Krishna <git@onpy.in>
@PradyumnaKrishna PradyumnaKrishna marked this pull request as ready for review August 10, 2022 08:03
@lukpueh
Copy link
Member

lukpueh commented Aug 11, 2022

This is extremely cool! I like how you provide common serialization-related methods in the Serializable class.

Do you think it's worth to define (in code) the expectations with have for classes that use JSONSerializer (must implement to_dict) and JSONDeserializer (must implement from_dict)?

It would be cool, if we could use typing.Protocol. It seems like the "right" way to define those expectations, without requiring classes like Metadata or Envelope to subclass e.g. a JSONSerializable, or JSONDeserializable abstract class.

JSONSerializable class provide abstract method `from_dict` and
`to_dict`, required by JSONDeserializer and JSONSerializer to
serialize object to json bytes and vice versa.
Inheriting the JSONSerializable class, these two method need to
be defined by the subclass in order to use JSON serialization.

Serializable is renamed to SerializationMixin, to remove confusion
about naming and keep it serparate from JSONSerializable.

Option for default serializer and deserializer are added to
SerializatioMixin. The method default_serializer will be defined
to provide a default serializer for serialization, if a serializer
is not provided (similar for default_deserializer).

By using this method, Metadata or other container can define a
default option in case of multiple serializer (or deserialzer)
other than JSON one, which makes it simpler. Otherwise, have to
redefine all methods of Mixin for each Serialization method.

Signed-off-by: Pradyumna Krishna <git@onpy.in>
Copy link
Member

@lukpueh lukpueh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation-wise this looks impeccable! I really only have a couple of high-level thoughts to share. IMO the following classes of this PR would be better candidates for typing.Protocol than abc:

  • BaseSerializer
  • BaseDeserializer
  • JSONSerializable

Mind adding a TODO comment or creating a ticket for post-Python 3.7 times?

I also hope that this does not turn out too "engineered". It really looks smart and we definitely want to make the code more robust and enforce expectations, such as "json deserialize requires the used class to implement a from_dict method", but we also don't want the code to look like java.

I'd appreciate an assessment from one of the theupdateframework/python-tuf#1279-reviewers - cc @joshuagl, @jku. FYI: This is @PradyumnaKrishna's take on python-tuf's serialization subpackage, making it more re-usable/flexible in terms of which classes it can be used with, e.g. an Envelope class, which is the DSSE equivalent of python-tuf's Metadata class. The goal is to use this implementation in python-tuf too at some point.

"""Parse DSSE payload.

Arguments:
class_type: A class having a from_dict method.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only has to be from_dict if we use default JSONDeserializer. For other deserializers class_type may not even be needed. Should we make it optional?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BaseDeserializer requires a class type to store the bytes into a containerized form. That's why the class type will be required by every deserializer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment about from_dict ist still true. Not every deserializer might require the class to have a from_dict method. What about something like...

Suggested change
class_type: A class having a from_dict method.
class_type: The class to be deserialized. If the default deserializer is
used, it must implement ``JSONSerializable``.

TODO is added to use typing.Protocol after dropping python 3.7

Method return type SerializationMixin doesn't gives type hints
for Envelope, Metablock and will not provide hints to other
wrappers too. Any type is much more flexible in this case.

Signed-off-by: Pradyumna Krishna <git@onpy.in>
Documentation for class_type argument corrected in Envelope's
deserialize_payload method.

Signed-off-by: Pradyumna Krishna <git@onpy.in>
Copy link
Member

@lukpueh lukpueh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

@lukpueh lukpueh merged commit c011e6c into in-toto:master Aug 25, 2022
@lukpueh lukpueh mentioned this pull request Aug 25, 2022
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants