Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validation refactoring + exception redesign #374

Merged
merged 4 commits into from
Jan 25, 2016

Conversation

bintoro
Copy link
Member

@bintoro bintoro commented Dec 11, 2015

This is a refactoring of the validation/conversion machinery that aims to

  • further eliminate redundant code
  • make the import recursion loop more generic and reusable
  • make the exception flow and error aggregation easier to understand.

It introduces some new features made possible by the redesign and a few bugfixes as well.

Since the diff is pretty big, I'm going to explain most changes below.

List of changes

  • Combines the validation and conversion code paths. Recursive validation of nested items and submodels is now provided by MultiType.convert() (formerly to_native()), obviating the need for parallel, validation-specific logic.

    • The validate_items() methods on ListType and DictType are removed.
    • The validate_model() method on ModelType is removed.
  • Enables simultaneous conversion and validation during data import.

    The statement FooModel(<raw data>, validate=True) now runs conversion and validation in one go, collecting both types of errors. If the data is likely to clear at least the conversion step, this is more efficient than sequential processing, which requires traversing the entire dataset twice.

    Here are some timing results from a simple import+validation testcase:

    Sequential Single-pass Speedup
    Success 2.855 1.691 41%
    Fail on validation 3.164 2.003 37%
    Fail on conversion 1.533 2.077 −35%
  • Improves the exception design by implementing a bit more structure (see Redesign the conversion/validation exceptions #369).

    • Clear separation between field-level and compound errors and their representations
    • Support for more detailed error data within the atomic messages
  • Fixes the conversion of ModelType fields when the input is already a model instance. Previously, the instance would be returned unchanged even if it contained unconverted fields or submodels.

    If the process originates from Model.__init__(), a new instance is created for each submodel, and the input instance is simply used as the raw data, as would happen with a dict input. Otherwise, such as during validation, the existing instance is updated in place to preserve object identity.

  • Makes the valid part of a failing dataset available to application code as DataError.partial_data.

    This information used to exist as ModelConversionError.partial_data but was only accessible to functions calling import_loop directly. It also wasn't aggregated to include submodels. Both of these issues are fixed, so the entire tree of valid data is now included with the final error raised from Model.validate() or similar.

  • Fixes a bug where the first failing field-level validator would prevent the rest of a compound field's validators from running.

  • Fixes a bug(?) that caused model-level validation to be skipped entirely if any errors occurred during field-level validation.

    From now on, only those fields that failed the initial validation are skipped, while others will have their model-level validators run normally.

  • Removes MultiType.validate(). The BaseType implementation now works universally.

  • Removes an obsolete validator _check_for_unknown_fields() from the validate module.

  • Fixes a number of tests that were broken due to unreachable assertions inside with pytest.raises() statements.

  • Changes Context back to a custom class from namedtuple. Turns out strict immutability is highly impractical because sometimes you have to establish a partial or empty context outside of import_loop to retain a reference to it. The new Context class implements a __setattr__() that allows setting each attribute just once, which provides the required degree of immutability.

  • The instant conversion of ModelType fields when populated through assignment (as in model_instance.submodel = {'x': 1}) is now accomplished through a hook that types themselves may implement. This gets rid of the isinstance stuff inside FieldDescriptor.__set__() and eliminates the bidirectional module dependency.

Compatibility

This is mostly just a reorganization that should not cause any huge trouble in a typical setting.

  • Potential issues are likely to be related to the redesigned exceptions:

    • The switch from list to dict as the representation of nested ListType errors (Return index of invalid objects #307) is clearly a backward-incompatible change, but it's done for a good reason.

    • Since ValidationError is a field-specific construct, aggregate error classes are no longer its subclasses. This will not work:

      try:
          model_instance.validate()
      except ValidationError:
          ...

      Using ModelValidationError here will continue to work, as it's declared a synonym of DataError, as is ModelConversionError.

  • If a derivative of ListType or DictType has overridden validate_items(), it would now cause validation to occur twice, since the original types no longer need this method. The same applies to ModelType and validate_model().

  • If a Model subclass has overridden the validate() method in the anticipation that it will be called for every submodel throughout a data tree, it will not work as expected. Model.validate() only initiates the validation process and is no longer invoked for submodels.

  • Previously, the data dictionary passed to model-level validators would be guaranteed to contain entries for all fields because

    • the validators weren't called at all unless field-level validation was a complete success
    • Undefined objects would substitute for missing values (since Add support for undefined values #372).

    As both of these issues have now been remedied, validators wanting to look at other fields on the model will have to take care not to blindly access keys that might not exist.

@bintoro bintoro force-pushed the validation-refactor branch 2 times, most recently from e035228 to 48e4921 Compare December 11, 2015 13:39
@bintoro bintoro mentioned this pull request Dec 28, 2015
@bintoro bintoro force-pushed the validation-refactor branch 2 times, most recently from 763ca1b to 4cc9564 Compare December 29, 2015 02:20
@bintoro bintoro force-pushed the validation-refactor branch 2 times, most recently from 96b37ab to bd138e7 Compare January 20, 2016 20:30
@bintoro bintoro force-pushed the validation-refactor branch 3 times, most recently from dcb38b2 to 129b6f6 Compare January 25, 2016 10:54
Introduces a 'pre_setattr' method that types may implement.
It will be called from FieldDescriptor.__set__() before updating
the instance.

This is done to get rid of the special-casing of ModelType fields
during FieldDescriptor.__set__() and to eliminate the two-way
dependency between the 'models' module and 'ModelType'.
* Merges the conversion and validation algorithms.
* Enables simultaneous conversion and validation during data import.
* Introduces redesigned exception classes.
bintoro added a commit that referenced this pull request Jan 25, 2016
Validation refactoring + exception redesign
@bintoro bintoro merged commit 46d7445 into schematics:development Jan 25, 2016
@bintoro bintoro deleted the validation-refactor branch February 15, 2016 14:13
@bintoro bintoro mentioned this pull request Feb 23, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant