Validation refactoring + exception redesign #374

bintoro · 2015-12-11T02:50:17Z

This is a refactoring of the validation/conversion machinery that aims to

further eliminate redundant code
make the import recursion loop more generic and reusable
make the exception flow and error aggregation easier to understand.

It introduces some new features made possible by the redesign and a few bugfixes as well.

Since the diff is pretty big, I'm going to explain most changes below.

List of changes

Combines the validation and conversion code paths. Recursive validation of nested items and submodels is now provided by MultiType.convert() (formerly to_native()), obviating the need for parallel, validation-specific logic.
- The validate_items() methods on ListType and DictType are removed.
- The validate_model() method on ModelType is removed.
Enables simultaneous conversion and validation during data import.

The statement FooModel(<raw data>, validate=True) now runs conversion and validation in one go, collecting both types of errors. If the data is likely to clear at least the conversion step, this is more efficient than sequential processing, which requires traversing the entire dataset twice.

Here are some timing results from a simple import+validation testcase:

Sequential Single-pass Speedup

Success 2.855 1.691 41%

Fail on validation 3.164 2.003 37%

Fail on conversion 1.533 2.077 −35%
Improves the exception design by implementing a bit more structure (see Redesign the conversion/validation exceptions #369).
- Clear separation between field-level and compound errors and their representations
- Support for more detailed error data within the atomic messages
Fixes the conversion of ModelType fields when the input is already a model instance. Previously, the instance would be returned unchanged even if it contained unconverted fields or submodels.

If the process originates from Model.__init__(), a new instance is created for each submodel, and the input instance is simply used as the raw data, as would happen with a dict input. Otherwise, such as during validation, the existing instance is updated in place to preserve object identity.
Makes the valid part of a failing dataset available to application code as DataError.partial_data.

This information used to exist as ModelConversionError.partial_data but was only accessible to functions calling import_loop directly. It also wasn't aggregated to include submodels. Both of these issues are fixed, so the entire tree of valid data is now included with the final error raised from Model.validate() or similar.
Fixes a bug where the first failing field-level validator would prevent the rest of a compound field's validators from running.
Fixes a bug(?) that caused model-level validation to be skipped entirely if any errors occurred during field-level validation.

From now on, only those fields that failed the initial validation are skipped, while others will have their model-level validators run normally.
Removes MultiType.validate(). The BaseType implementation now works universally.
Removes an obsolete validator _check_for_unknown_fields() from the validate module.
Fixes a number of tests that were broken due to unreachable assertions inside with pytest.raises() statements.
Changes Context back to a custom class from namedtuple. Turns out strict immutability is highly impractical because sometimes you have to establish a partial or empty context outside of import_loop to retain a reference to it. The new Context class implements a __setattr__() that allows setting each attribute just once, which provides the required degree of immutability.
The instant conversion of ModelType fields when populated through assignment (as in model_instance.submodel = {'x': 1}) is now accomplished through a hook that types themselves may implement. This gets rid of the isinstance stuff inside FieldDescriptor.__set__() and eliminates the bidirectional module dependency.

Compatibility

This is mostly just a reorganization that should not cause any huge trouble in a typical setting.

Potential issues are likely to be related to the redesigned exceptions:
- The switch from list to dict as the representation of nested ListType errors (Return index of invalid objects #307) is clearly a backward-incompatible change, but it's done for a good reason.
- Since ValidationError is a field-specific construct, aggregate error classes are no longer its subclasses. This will not work:
```
try:
    model_instance.validate()
except ValidationError:
    ...
```
  Using ModelValidationError here will continue to work, as it's declared a synonym of DataError, as is ModelConversionError.
If a derivative of ListType or DictType has overridden validate_items(), it would now cause validation to occur twice, since the original types no longer need this method. The same applies to ModelType and validate_model().
If a Model subclass has overridden the validate() method in the anticipation that it will be called for every submodel throughout a data tree, it will not work as expected. Model.validate() only initiates the validation process and is no longer invoked for submodels.
Previously, the data dictionary passed to model-level validators would be guaranteed to contain entries for all fields because
- the validators weren't called at all unless field-level validation was a complete success
- Undefined objects would substitute for missing values (since Add support for undefined values #372).
As both of these issues have now been remedied, validators wanting to look at other fields on the model will have to take care not to blindly access keys that might not exist.

Introduces a 'pre_setattr' method that types may implement. It will be called from FieldDescriptor.__set__() before updating the instance. This is done to get rid of the special-casing of ModelType fields during FieldDescriptor.__set__() and to eliminate the two-way dependency between the 'models' module and 'ModelType'.

* Merges the conversion and validation algorithms. * Enables simultaneous conversion and validation during data import. * Introduces redesigned exception classes.

Validation refactoring + exception redesign

bintoro force-pushed the validation-refactor branch 2 times, most recently from e035228 to 48e4921 Compare December 11, 2015 13:39

bintoro mentioned this pull request Dec 28, 2015

RFC: Model API #375

Open

bintoro force-pushed the validation-refactor branch 2 times, most recently from 763ca1b to 4cc9564 Compare December 29, 2015 02:20

bintoro force-pushed the validation-refactor branch from 4cc9564 to 6a75f05 Compare January 20, 2016 01:54

bintoro force-pushed the development branch from e78be4e to 493f2f0 Compare January 20, 2016 12:00

bintoro force-pushed the validation-refactor branch 2 times, most recently from 96b37ab to bd138e7 Compare January 20, 2016 20:30

bintoro mentioned this pull request Jan 23, 2016

Return index of invalid objects #307

Closed

bintoro force-pushed the validation-refactor branch 3 times, most recently from dcb38b2 to 129b6f6 Compare January 25, 2016 10:54

bintoro added 4 commits January 25, 2016 19:04

Fix submodel conversion when already an instance

7cf8868

Conversion and validation refactoring + new exceptions

deea537

* Merges the conversion and validation algorithms. * Enables simultaneous conversion and validation during data import. * Introduces redesigned exception classes.

Add tests

832b22a

bintoro force-pushed the validation-refactor branch from 129b6f6 to 832b22a Compare January 25, 2016 17:39

bintoro added a commit that referenced this pull request Jan 25, 2016

Merge pull request #374 from bintoro/validation-refactor

46d7445

Validation refactoring + exception redesign

bintoro merged commit 46d7445 into schematics:development Jan 25, 2016

This was referenced Jan 26, 2016

Split input data from valid data #124

Closed

Conversion vs. validation #137

Closed

bintoro deleted the validation-refactor branch February 15, 2016 14:13

bintoro mentioned this pull request Feb 23, 2016

Introduce MixedType #380

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Validation refactoring + exception redesign #374

Validation refactoring + exception redesign #374

bintoro commented Dec 11, 2015

	Sequential	Single-pass	Speedup
Success	2.855	1.691	41%
Fail on validation	3.164	2.003	37%
Fail on conversion	1.533	2.077	−35%

Validation refactoring + exception redesign #374

Validation refactoring + exception redesign #374

Conversation

bintoro commented Dec 11, 2015

List of changes

Compatibility