Schema API #466

Merged · 54 commits · Mar 15, 2017

Conversation

@lkraider (Contributor) commented Dec 9, 2016

Implements #446

This is the foundational work to restore compatibility with SQLAlchemy. To do that, the concept of Serializable setters is reintroduced, along with a lower-level abstraction of the Model named Schema.

The functional API can now rely on the Schema declaration directly, which simplifies integration by exposing a smaller surface. More work will need to be done to introduce a CompoundType with pure Schema.

Additionally, the concept of data validation stages is introduced (see contrib/machine.py) and initial work has been done in the Model to apply that, but it's not fully realized yet. Discussion for a more granular approach will be presented in another PR.
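
To make the serializable-setter idea concrete, here is a rough sketch of what it enables at the Model level. This is illustrative only: the `Person` model and the `full_name.setter` usage are modeled on the 0.9-era API that this PR reintroduces, not taken from the diff itself.

```python
from schematics.models import Model
from schematics.types import StringType
from schematics.types.serializable import serializable


class Person(Model):
    first_name = StringType(required=True)
    last_name = StringType(required=True)

    @serializable
    def full_name(self):
        # Computed from the stored fields on serialization.
        return '%s %s' % (self.first_name, self.last_name)

    # Setter sketch: writing to the serializable feeds data back into the
    # instance, which is what SQLAlchemy-backed models need.
    @full_name.setter
    def full_name(self, value):
        self.first_name, self.last_name = value.split(' ', 1)
```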

Ideally _data will be an immutable dict and all writes happen in
_raw_data. Data access will return _raw_data first, then _data if set.
Serialization will always attempt to validate and output only if valid.

Another idea is to move the data state into a container, so that each
field can be individually queried, which may reduce the amount of `context`
that needs to be passed around during the transforms.
A Model's data needs to follow the specified data state flow; conversion
will always be implicit. Calling the classmethod defeats the expected use,
and the functional API is preferred in this case.
The idea is to move to a common type for fields and avoid mixing
descriptors with base types, but a common abstraction hasn't been worked
out yet. The slots will help reduce memory usage for this intermediary
wrapper for now.
This is the old-style functional test, which relies on a model instance,
but it should not rely on model attribute access to test state.

Also removed the use of `_initial`, since it's not a source of valid data,
and changed the assumption to be in line with the data state flow.
This keeps the interface closer to the previous Model, while extracting
the data state handling into a single concern class.

This structure can be simplified if the data state is stored with
the data itself, but that will need more changes, especially to the types.
It is something we should investigate further.
It seems the loops could be optimized by moving the granularity of data
state into each field.
The data mutation step is done implicitly before validation.
Still missing is the aggregation of mutation errors to halt validation.
Required for compatibility with SQLAlchemy; this might be wasteful if convert
is not a no-op when the data is already converted.
@lkraider (Contributor, Author)

This is ready for merging. Fixes SQLAlchemy integration.

I am leaving the deprecated API compatibility layer enabled by default; work will need to be done to refactor it out later.

Unless there are objections, I'll merge it this Friday.

Validation will now always clear invalid data from the Model, leaving
accessible only what each field considers valid.

A versioned Model could be devised to keep the different states
available for inspection, or at least the invalid data could be returned
by the exception.
Update parameters/docs to reference schema.
To avoid losing validation information, previous input data will be
restored after serialization into the Model.
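
Seen from the caller's side, the validation-clearing behaviour described above looks roughly like the sketch below. It assumes the 2.x Model API and `schematics.exceptions.DataError`; the comment about the cleared value simply restates the commit note rather than asserting anything beyond it.

```python
from schematics.models import Model
from schematics.types import StringType
from schematics.exceptions import DataError


class Ticket(Model):
    code = StringType(required=True, max_length=4)


ticket = Ticket({'code': 'TOO-LONG'})  # conversion succeeds; not validated yet

try:
    ticket.validate()
except DataError as exc:
    # Per the commit note above, the invalid value is cleared from the Model
    # during validation, so only data each field considers valid remains
    # accessible afterwards.
    print(exc)
```
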
@bintoro (Member) commented Dec 23, 2016

@lkraider Could you maybe illustrate what's going on here in an easy-to-understand fashion, if possible? :)

@lkraider (Contributor, Author) commented Dec 23, 2016

@bintoro The motivation for this work was to bring serializable setters (from 0.9) back into mainline. This feature is required for SQLAlchemy integration (supported by https://github.com/schematics/schemalchemy).

To make this possible, the transform functions had to be changed back to accepting and operating on instance data, since setters in the model have to modify instance state (this also enables instance validators again).

My approach was to avoid making the Model instance a requirement and instead have the transforms accept a "mutable" mapping, so that non-model data can be validated too. To make this explicit, I opted to separate the Model from the actual "Schema" specification that is really needed during the transforms. I took inspiration from the SQLAlchemy architecture, which splits the Table description from the declarative Base model mapping. This split helps because it separates the concerns of the structure itself from its workflow (the Schema encodes the structure, the Model encodes the workflow).

As for the setters: reintroducing them means they can generate "raw data" during a transform (separated into its own _mutate step), but for the Model this means valid data needs to be kept apart from raw data, so the Model dict now has that segregation built in. There is another way to achieve this that I want to explore further, namely attaching metadata to the values to indicate their state, but that requires more discussion.

For this PR, I kept Schema and Model interchangeable by providing some "deprecated" API mixins. The intent is to drop them completely eventually.

Below is an overview of the encoded states now:
[image: schematics-state-machine diagram]

As you can see, each step will only operate with data that is in the correct state from the previous step (at least that is the idea), and mutation resets the data state.
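
For readers who want the idea without the diff, here is a purely illustrative toy version of the split (plain Python, not the actual classes or signatures from this PR): the Schema only describes structure, while the Model owns the data and walks it through the raw → converted → valid states.

```python
# Toy stand-ins for the Schema/Model split and the data states described above.
RAW, CONVERTED, VALID = 'raw', 'converted', 'valid'


class ToySchema(object):
    """Structure only: field names mapped to converter callables."""
    def __init__(self, **fields):
        self.fields = fields            # e.g. {'age': int}


class ToyModel(object):
    """Workflow only: holds the data and tracks its state."""
    def __init__(self, schema, data):
        self.schema = schema
        self._raw_data = dict(data)     # setters / mutation write here
        self._data = {}                 # only converted data lands here
        self.state = RAW

    def convert(self):
        self._data = {name: conv(self._raw_data[name])
                      for name, conv in self.schema.fields.items()}
        self.state = CONVERTED

    def validate(self):
        if self.state != CONVERTED:     # each step needs the previous state
            self.convert()
        self.state = VALID
        return self._data
```

Mutating `_raw_data` in this toy version would drop the state back to raw, which corresponds to the "mutation resets the data state" transition in the diagram.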

About 2.5 times faster using explicit dict construction instead of the namedtuple `_asdict` method.
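
For context, the comparison behind that commit can be reproduced with a quick measurement like the one below (numbers vary by interpreter; this is only a measurement sketch, not the code path changed in the PR):

```python
from collections import namedtuple
from timeit import timeit

Pair = namedtuple('Pair', 'value state')
p = Pair('foo', 'raw')

# namedtuple's generic _asdict vs. spelling the dict out explicitly
t_asdict = timeit(lambda: p._asdict(), number=1000000)
t_explicit = timeit(lambda: {'value': p.value, 'state': p.state}, number=1000000)
print(t_asdict / t_explicit)  # the ratio the commit message refers to
```
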
@bintoro (Member) commented Dec 23, 2016

At this point I barely understand any of the diffs. This is going to take a long time to review. Would someone else like to give it a shot?

@lkraider (Contributor, Author)

Sorry for the long delay; I am ready to merge it this week.

@bintoro (Member) commented Jan 29, 2017

First question: why not have Model inherit from Schema?

@lkraider (Contributor, Author) commented Jan 31, 2017

I preferred composition in this case, to make the split clearer. Also, because the Model is already complicated enough with metaclass behaviour and properties, I opted not to overload it with more attributes (which could conflict with user-defined properties). It also makes the interaction with the SQLAlchemy Base easier to understand.

The Model's responsibility now is to encapsulate API usage, the schema, and the data state; a rough sketch of the composition follows the list below.

Conceptually:

  • Model 'has a' Schema
  • Schema is inferred from the Model description (by means of ModelMeta)
  • Model encodes the data transitions in the state machine (schematics API + data state)
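
The sketch below focuses on the "Schema is inferred from the Model description" point. `ToyModelMeta` and `ToySchema` are hypothetical stand-ins for illustration only, not the real ModelMeta; the only real schematics pieces used are the field types.

```python
from schematics.types import BaseType, StringType


class ToySchema(object):
    """Structure only: the collected field descriptors."""
    def __init__(self, name, fields):
        self.name = name
        self.fields = fields


class ToyModelMeta(type):
    """Collects the declared field types into a schema, roughly like ModelMeta."""
    def __new__(mcs, name, bases, attrs):
        fields = {k: v for k, v in attrs.items() if isinstance(v, BaseType)}
        cls = super(ToyModelMeta, mcs).__new__(mcs, name, bases, attrs)
        cls._schema = ToySchema(name, fields)   # Model 'has a' Schema
        return cls


class ToyModel(metaclass=ToyModelMeta):
    pass


class Person(ToyModel):
    name = StringType(required=True)


print(Person._schema.fields)   # e.g. {'name': <StringType object ...>}
```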

@lkraider (Contributor, Author) commented Mar 1, 2017

I will move forward and merge this patch into the develop branch this weekend; there are fixes to other issues that depend on this, and it should not break high-level compatibility with the current 2.0.0a code. People can propose changes in other PRs.
