Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propagate environment during recursive import/export/validation #359

Merged
merged 6 commits into from Nov 24, 2015

Conversation

bintoro
Copy link
Member

@bintoro bintoro commented Nov 16, 2015

A rewrite of #266.

This is a huge PR, but it's split into meaningful commits, each of which represents a working state with all tests passing.

env is now always passed to all to_primitive, to_native, and export_loop methods on types. Consequently, the standard field_converter interface also has env as a mandatory component: field_converter(field, value, env).

The env object will be set up by either import_loop or export_loop during the first iteration if the application hasn't supplied it by then.

context

The overloading of "context" with a number of uses has been resolved as follows:

  • context in import_loop() and validate() renamed to trusted_data
  • context as an argument to to_native(), to_primitive(), validate(), and mock() superseded by env
  • context as an argument to expand in transforms.py renamed to expanded_data
  • context when accessing the private context replaced by env.context.

Basically, env.context is just a recommendation for establishing a private namespace. Apparently the only thing in the library that accesses env.context is MultiLingualString that looks for a locale there.

export_loop inspection gone

[DELETED]

Miscellaneous changes

  • ModelType.strict option removed for now. To support something like this, we would need to differentiate between at least two flavors of settings:

    • static settings (specified as function defaults, model options, or ModelType definitions), where later (as in deeper level of recursion) overrides earlier
    • explicit settings (specified by application at runtime) that would override everything else.

    I suspect ModelType.strict may have been a band-aid to provide some degree of control where the runtime strict setting hasn't been doing anything.

  • BaseType.validate_required() removed. import_loop enforces required=True for fields directly on a model, so this was only needed for nested fields. They now get this validator in MultiType.__init__() as needed.

Validators

env is now passed to all validators, including model-level validators. Although the object is absolutely needed only in compound type validators, it's probably best to make the change everywhere at once, since appending env to the argument list is all that is needed to adapt existing code.

env = ConfigObject()
elif not isinstance(env, ConfigObject):
env = ConfigObject(env)
env._setdefault('partial', partial)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes more sense to have values passed as arguments overwrite those in env.

Having two ways to specify options is confusing. But I understand you don't want to break backward compatibility.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. There may be some misunderstanding here. For the most part, options are supposed to be specified just like before:

foo.import_data({...}, strict=True, partial=True)

This PR does not intend to change that.

An application may also pass an env object, but there are only two reasons to do so:

  • you want to put some proprietary data in it
  • your custom type or Model class wants to change the parameters in the middle of the recursive import/export process and propagate the new settings from there on.

What this code does is it sets up the env container on the first invocation, and then the options will be frozen thereafter. Letting arguments overwrite the prior state on every call is precisely the problem this PR aims to fix.

Here's an example of how a call stack might look like before:

  1. Model.__init__({...}, strict=True)
  2. Model.convert({...}, strict=True)
  3. convert({...}, strict=True)
  4. import_loop({...}, strict=True)
  5. ModelType.to_native(...)
  6. Model.import_data(...)
  7. Model.convert(...)
  8. convert(...)
  9. import_loop(...)

Okay, back at import_loop. What's the value of strict? It's False, set by convert() at step 7. But obviously it should be True, since that's what was originally specified.

There were two ways to fix this:

  • make sure every function throughout the chain propagates every possible parameter as an argument, or
  • encapsulate the settings in a single object in the beginning and pass that around instead.

I opted for the latter approach because it's way cleaner and you can now add new options without touching everything.

If there's something that was easy to do before but becomes awkward due to this change, I'm sure we can address it somehow.

@bintoro
Copy link
Member Author

bintoro commented Nov 17, 2015

Okay, I think I get it now about to_native(). You might want to export a structure that can be fed to another library that understands standard types like datetime.datetime but does not understand models.

Fair enough, but in that case "native" is not a well-defined concept. Alternatively, Model.to_native() is a misnomer, since the end result is a mixed native/primitive representation.

Resolving what "native" means is out of scope for this PR, so I'm going to shelve the fifth commit.

@lkraider
Copy link
Contributor

I like the cleanup idea proposed and the patch as is seems good.

Do you have an idea how this could affect execution performance? What's the tradeoff in overhead/ease of use of passing ConfigObjects instead of dicts or maybe namedtuples? (this is just a question, don't block this patch acceptance based on it - I think performance should be a next goal for schematics as a whole).

@bintoro
Copy link
Member Author

bintoro commented Nov 17, 2015

Great idea about namedtuple. When I wrote #266, I assumed the container should be mutable, but I don't think that's actually the case at all, particularly now that we're designating env.context for arbitrary data.

Aside from being faster, namedtuple would also prevent bugs arising from accidental modification of the settings.

(Whenever env is modified locally, such as in import_loop where mapping is replaced by its subtree, the object must be copied — otherwise the changes would apply in parent scopes too. A namedtuple would enforce this.)

However, using namedtuple means that passing a partial env as an argument won't be so practical. So, ideally, context should get its own argument after all, and that brings us back to question of whether it's too outrageous to change the meaning of context in import_loop or if it should be renamed.

In general I totally agree about the performance concerns. It would be great if someone put together a testcase for trying out some optimization ideas.

@lkraider
Copy link
Contributor

I see no problem in re-purposing the context keyword if we are breaking compatibility, it was de-facto an internal API anyway (the new trusted_data). But calling it user_data might be clearer.

@bintoro
Copy link
Member Author

bintoro commented Nov 19, 2015

Now using namedtuple with a modified constructor that allows omitting values to make testing easier:

>>> from schematics.transforms import ExportContext
>>> ExportContext(role='public')
ExportContext(role='public', raise_error_on_role=None, print_none=None, app_data=None)

Added _branch method to simplify copying env during the recursion. It's like _replace but ignores None values and does not create a copy unless needed.

In a hypothetical ModelType class that has support for strict and partial settings,

params = {}
if self.strict is not None:
    params['strict'] = self.strict
if self.partial is not None:
    params['partial'] = self.partial

branched_env = env._replace(**params) if params else env

return model.import_data(value, env=branched_env)

can now be expressed as

params = {
  'strict': self.strict,
  'partial': self.partial
}

return model.import_data(value, env=env._branch(**params))

@bintoro
Copy link
Member Author

bintoro commented Nov 23, 2015

Although the diff is all over the place, this is a rather straightforward PR, so I'm preparing to merge.

Last call regarding nomenclature:

(1) After @lkraider alleviated my concerns about reusing the retired context parameter, I'm now considering just using that instead of env because:

  • env can sound like an application-wide or library-wide thing, while context has all the right connotations
  • it would reduce the need for mechanical adaptation of old code, as the old context in most places is simply about to be replaced with the new env
  • context is no longer used in any conflicting meaning.

(2) The old context data bag is now called app_data, which I felt was more neutral and unambiguous than either private_data or user_data, but I'm open to being overruled on this too.

Thoughts?


If we make the switch from env to context, migrating custom types would go like this:

  • Add context as a third parameter to validator methods.
  • No need to touch method signatures that already include context.
  • In actual method code, replace context with context.app_data if you're using the data for something.

@lkraider
Copy link
Contributor

Can you list the flow of options/context now? For example, how will something like validate(partial=True, context={partial=False}) behave? (I suppose the context will be overridden by the parameter?)

@bintoro
Copy link
Member Author

bintoro commented Nov 24, 2015

For example, how will something like validate(partial=True, context={partial=False}) behave? (I suppose the context will be overridden by the parameter?)

Taken literally, that form (a user-provided dict as context) no longer exists, since with the addition of the app_data parameter everything can now be specified as regular arguments. In fact, the context parameter should be removed altogether from the likes of Model.validate() that make up the main API.

(EDIT: Actually no, it can't be removed because the same methods are also called from ModelType which needs to pass on the context object.)

Other than that, context always overrides any individual arguments, since they could just be random defaults.

This now makes things very simple: up until the first invocation of import_loop or export_loop there is no context object — everything's in the arguments. Once we get to the loop function, it creates the context object if there was none, and from there on only context matters.

The only case where a context object needs to be created from scratch is when writing tests for a method that needs one:

  mls = MultilingualStringType(default_locale='en_US')

  assert mls.to_primitive(
      {'en_US': 'snake', 'fr_FR': 'serpent'},
      context=ExportContext(app_data={'locale': 'fr_FR'})) == 'serpent'
              ^^^^^^^^^^^^^

@lkraider
Copy link
Contributor

Thanks for clearing it up, sounds great for me.

This patch introduces a metadata object that is propagated through
every level of a recursive import/validation process. It is used to
carry options and other data that need to be available throughout
the entire process, not just during the outermost loop.
* Add BaseType.parent_field
* Remove BaseType.validate_required() (see MultiType.__init__())
* Remove ModelType.strict
* Move ModelType.validate_model()
To simplify the creation of dummy env objects, use a modified
namedtuple whose constructor sets missing fields to None.
@bintoro
Copy link
Member Author

bintoro commented Nov 24, 2015

I believe this PR is now mature enough to be merged.

It resolves #316 and facilitates the addition of new conversion options in the future without causing disruption to existing methods.

bintoro added a commit that referenced this pull request Nov 24, 2015
Propagate environment during recursive import/export/validation
@chekunkov
Copy link

@bintoro hi, when is this change going to be released on PyPI?

@bintoro
Copy link
Member Author

bintoro commented Jan 6, 2016

@chekunkov I'm not sure. The patch is technically backward-incompatible, since all to_* methods and custom validators are now required to accept the context parameter.

If we want to be really strict about semantic versioning, this would have to wait until v2.0-alpha. If we accept mildly breaking changes in a minor release, this could be shipped pretty quickly in v1.2.0. I haven't decided.

@bintoro
Copy link
Member Author

bintoro commented Jan 22, 2016

Just FYI, I've decided to include this patch in v1.2.0 that will be released shortly.

I'm going to alleviate upgrading pains by making the context parameter optional on validator functions. If a validator doesn't declare the parameter, the function will be placed in a wrapper that drops the extra argument at runtime.

@bintoro bintoro deleted the env branch April 30, 2016 17:19
@jacobstr
Copy link

jacobstr commented Nov 4, 2016

I'm curious what the state of the 1.2 release is?

@ryanhiebert
Copy link

I know that I'm using a 2.0 pre-release personally. You might wish to try it yourself, to see if this feature is in that, and if it works for your cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants