
Create shared environment across nested import/validation loops via a metadata object #266

Closed
wants to merge 5 commits

Conversation

@bintoro (Member) commented Dec 18, 2014

Summary

This patch introduces a metadata object that is propagated through every level of a recursive import/validation process. It is used to carry options and other data that need to be available throughout the entire process, not just during the outermost loop.

Background

One thing currently blocking the Undefined feature is the fact that import/validation parameters (importantly, partial=True) only affect the first level of processing. If you have nested models, your settings no longer apply.

Apparently someone has noticed this before and created this construct in import_loop to pass the deserialization mapping down to nested models:

try:
    ...
    # Try passing the nested deserialization mapping down to the converter.
    raw_value = field_converter(field, raw_value, mapping=model_mapping)
except Exception:
    # The converter doesn't accept a mapping argument; call it without one.
    raw_value = field_converter(field, raw_value)

And there was a similar try/except inside the field_converter in question.

The patch

EDIT: Changed param name from meta to env.

Instead of adding yet another parameter here, I have future-proofed this by turning the third parameter into a multipurpose object representing the current environment. Import converters and validators can now propagate options and whatever data they please in the env parameter. One practical application might be bookkeeping during validation; currently it's possible to have a ModelType structure that gets into an infinite loop.

The above code now looks like this:

    sub_env = env.copy() # Make a copy
    sub_env.mapping = model_mapping  # Put the relevant mapping in there
    raw_value = field_converter(field, raw_value, env=sub_env) # Pass it on

The way this works is that any function (including the initial caller) can decide to set up the Environment object. Once it enters the processing chain, its contents override any overlapping parameters passed as keyword args (EDIT: not by necessity, of course, but because import_loop chooses to treat it that way).

Right now, the setup is the responsibility of the outermost import_loop. It takes the strict and partial arguments it receives and locks them in for the rest of the import recursion.
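To illustrate the mechanism, here is a minimal sketch of how the outermost call could lock the options in, assuming a simple attribute-bag Environment class and a simplified import_loop signature (the names and defaults below are illustrative, not the exact code in this patch):

class Environment(object):
    """Attribute bag carried through the whole import/validation recursion."""

    def __init__(self, **options):
        self.__dict__.update(options)

    def copy(self):
        # Shallow copy so a nested call can override, say, the mapping
        # without affecting its siblings.
        duplicate = Environment()
        duplicate.__dict__.update(self.__dict__)
        return duplicate


def import_loop(cls, instance_or_dict, field_converter, env=None,
                strict=False, partial=True):
    if env is None:
        # Outermost call: collate the keyword options and lock them in
        # for the rest of the recursion.
        env = Environment(strict=strict, partial=partial, mapping=None)
    # ... recurse over the fields, handing env to every field_converter call ...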

As an aside, if all options were collated into an env object from the start, the parameter signature of import_loop, convert, and friends could look like this:

def ...(cls, instance_or_dict, env)

Although incorporating many options into a single object may be considered a bit opaque, it's clearly preferable to always passing a dozen values to every method and their aunts:

model.validate(...)
=> validate(...)
=> import_loop(...)
=> field_converter(...)
=> field.validate(...)
=> submodel.validate(...)

That's five calls for each level of recursion.

BREAKING CHANGE WARNING

I've gotten rid of the try/except constructs that otherwise would have been all over the place.

Considering that the env container is primarily needed in a recursive process, the method signature dilemma has been solved by requiring compound types (MultiType descendants) to accept the env argument in their to_native and validate_ methods. Simple types are called without env.

Since custom compound types need a (trivial) change as a result, this would be a great opportunity also to get rid of the context parameter that appears in a ton of places. It would be better placed inside env as well.
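To make the required change concrete, a custom compound type might end up looking roughly like this after the patch; the class, its wrapped field, and the error message are purely illustrative and not taken from the Schematics code base:

from schematics.exceptions import ValidationError
from schematics.types.compound import MultiType


class PairType(MultiType):
    """Illustrative compound type: wraps another field and holds exactly two values."""

    def __init__(self, field, **kwargs):
        self.field = field
        super(PairType, self).__init__(**kwargs)

    def to_native(self, value, env=None):
        # Compound types now receive the shared env and hand it on when
        # delegating to nested fields or models.
        return [self.field.to_native(item) for item in value]

    def validate_length(self, value, env=None):
        # validate_* methods on compound types likewise accept env.
        if len(value) != 2:
            raise ValidationError('Expected exactly two items.')

Simple types, such as the wrapped field here, keep their existing signatures and are still called without env.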

Other changes incorporated here

  • Model.__init__ now accepts the partial parameter too (default=True because it was the de facto default already, specified in the main convert func).
  • Fixed a bug where ListType.validate_items() was added to the list of validators twice: once explicitly and once via collation by TypeMeta.

@lkraider (Contributor) commented Apr 9, 2015

I like the idea here: having the environment accessible during the loop helps keep any options inspectable during the recursion. Do we need the pass-by-copy, though? Couldn't we devise a structure similar to namedtuple that would be immutable during the loop?
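(For illustration, such an immutable environment could be built on namedtuple; the field names below are assumptions, not anything from the patch:)

from collections import namedtuple

# An immutable environment: nested calls derive new instances instead of mutating.
Environment = namedtuple('Environment', ['strict', 'partial', 'mapping', 'context'])

env = Environment(strict=True, partial=True, mapping=None, context=None)
model_mapping = {'n': 'count'}  # example mapping for a nested model
sub_env = env._replace(mapping=model_mapping)  # env itself is left untouched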

Also, the context usage should probably be cleaned up to pass only opaque user data, as requested in #274, while adding another dict to store the already validated data (this is what import_loop uses the context for now).

@kstrauser (Contributor)

I like this, but:

Simple types will still need to receive env if we're going to move context into it.

I agree with @lkraider about context being a user-generated data bag. The intent was that a user could pass in information telling types how to export (or import) data within the context of the current situation. The canonical example is MultiLingualString, which can export to many different languages but won't know which one at instantiation time. The same could be true for imports. Say you don't know what locale you'll be parsing data from at model creation time, but you don't want a hypothetical StrictlyLocalizedFloatType to accept "123,45" in an en_US context, because that's not a valid representation in America, and "123.45" isn't valid in de_AT. We should support context in the import loop symmetrically to how it's supported in the export loop.
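To make that symmetry argument concrete, here is a rough sketch of the hypothetical type, assuming context arrives as a plain dict with a 'locale' key (none of these names exist in the library, and locale handling is reduced to a decimal-separator check):

from schematics.types import BaseType
from schematics.exceptions import ConversionError


class StrictlyLocalizedFloatType(BaseType):
    """Hypothetical type: only accepts the decimal separator of the given locale."""

    SEPARATORS = {'en_US': '.', 'de_AT': ','}

    def to_native(self, value, context=None):
        locale = (context or {}).get('locale', 'en_US')
        separator = self.SEPARATORS[locale]
        if separator == ',':
            if '.' in value:
                raise ConversionError('Not a valid %s number: %s' % (locale, value))
            value = value.replace(',', '.')
        elif ',' in value:
            raise ConversionError('Not a valid %s number: %s' % (locale, value))
        return float(value)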

Since this is a (mildly) breaking change, we'd need to bump the minor version at least.

@jmsdnns (Member) commented Oct 20, 2015

Sorry, it is not clear to me what problem this solves.

The reason the arguments from the outermost loop are locked in is to keep them consistent throughout the entire process. If you want a different result, export with different arguments.

Could you describe the problem this change solves?

@jmsdnns (Member) commented Oct 20, 2015

One thing currently blocking the Undefined feature is the fact that import/validation parameters (importantly, partial=True) only affect the first level of processing. If you have nested models, your settings no longer apply.

OK. I am thinking on this here.

Are you saying the value for partial is not passed throughout the rest of the export loop?

@bintoro (Member, Author) commented Oct 20, 2015

Are you saying the value for partial is not passed throughout the rest of the export loop?

Yes, and neither is strict; see #316. Part of the problem is that changing the argument lists is a breaking change. That's why I propose we set up this container, where parameters can be added in the future without breaking things.

@jmsdnns (Member) commented Oct 20, 2015

ohhhhh well THAT is a bug that should be fixed.

I don't think we need to pass around an environment variable as much as we need subsequent calls to use the same keyword arguments.

I am not sure what the cause is yet, but it's supposed to copy the initial arguments the whole way down.

@bintoro (Member, Author) commented Oct 20, 2015

The reason I prefer a single environment object is that, once it has been implemented, it's zero maintenance going forward. The same cannot be said about repeating this huge argument list in 100 methods around the codebase, particularly since authors of custom types also need to keep their code in sync with the full range of available arguments.

Reliance on individual keyword args is probably what has led to the kind of conditional code shown in the original post.

@jmsdnns (Member) commented Oct 20, 2015

Hmm. OK. I better understand what you're trying to do now. I see the point about every type having to support the same keyword arguments now too.

What we'd get with this approach is a dict that has all the values we care about, but rogue fields would be ignored by types that don't know how to use particular values. This is the primary strength as I see it.

I think I am on board with this but wanna think on it a little longer.

Thanks for explaining that as I worked my way through what you're thinking about.

@kstrauser (Contributor)

Agreeing with @bintoro. I wrote #318 and it's an ugly mess that touches everything. I'd rather rip the band-aid off and be done with it so we can add future features without breaking all the things.

@jmsdnns (Member) commented Oct 20, 2015

srsly blowing my mind that the arguments weren't preserved... what's even the point of a strict flag if it's ignored by compound types?!

we'll get this worked out properly.


@kstrauser (Contributor)

Hey, we were all surprised by that one. 😄 This is off-topic for this particular PR, but I'll mention it anyway: even if we merged this today we'd still need to reimplement #318 on top of it to get strict working. I'd be so happy to throw away all the special-case code and start over with a clean foundation, though.

@jmsdnns (Member) commented Oct 20, 2015

I've been away from schematics for a while, so my memory on things is rusty, but let it be known that my goal is to always play like a champion.

if we know what the ideal solution to something is, let's always do that. if not, we can talk until it seems obvious what to do. :)

we'll probably do this one, and then i guess that will require a new implementation of #318.

i'll think on this one and hopefully merge it in really soon.


@kstrauser (Contributor)

I haven't had the opportunity to be as active lately as in the past, but I haven't forgotten about it. Also, #324 is still open. One day it may be appropriate to have a new major version with a new API. Or not. But we can discuss it to see what makes the most sense!

@bintoro (Member, Author) commented Oct 21, 2015

I've also been inactive for most of this year due to vision problems that kept me away from coding. In the past few weeks I've been looking at Schematics again, and it's an interesting coincidence that @jmsdnns is now back as well.

Another reason this PR was left pending is that it's a kind of breaking change. And if we're going to break things, then it would make the most sense to try to identify as many legacy pain points as possible and target a new (major?) release that would bundle all the backward-incompatible changes together. Then, the API could again remain constant for the foreseeable future.

Also, something to figure out is what to do with the arguments (such as context) that already exist throughout the call chain. In #316, I floated the idea that context should go inside env too, which got tentative support from @kstrauser.

The way I envisioned this, individual arguments (existing and future) should remain available in methods that are typically directly called by application code using Schematics. For example, Model.__init__ would still take strict and partial arguments. But in the majority of the other ("internal") methods, particularly those inside type classes, these would be absent and env would propagate the values.

@bintoro (Member, Author) commented Nov 15, 2015

I think it's time to move on this, mostly because the Undefined feature needs this, but also because it's been nearly a year and there's no point in waiting any longer.

I'm currently rewriting the PR, and I'd like some input on what to do with context.

Based on prior discussion, the context argument in import_loop means "trusted data" and should be renamed to reflect that. That much is clear.

The other usage ("arbitrary user data") could otherwise continue to be called "context", but I think we'd like all conversion functions to accept such arbitrary data, including import_loop. Well, we can't just suddenly change the meaning of context in the import_loop signature from trusted data to arbitrary user data.

That said, once env is available, the ability to pass options as individual arguments is just a convenience:

instance.import_data({'name': 'bintoro'}, partial=True, strict=True)
instance.import_data({'name': 'bintoro'}, env=ConfigObject(partial=True, strict=True))

These two statements are equivalent. In the former, import_loop just takes care of constructing the env object.
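Sketching that boundary, assuming a ConfigObject that simply stores options as attributes (a stand-in, not the final class), the public methods would only need a thin shim along these lines:

class ConfigObject(object):
    """Stand-in for the env container: stores options as attributes."""

    def __init__(self, **options):
        self.__dict__.update(options)


def make_env(env=None, **options):
    # Use the caller's env if one was supplied; otherwise fold the individual
    # keyword options (partial, strict, ...) into a fresh container.
    return env if env is not None else ConfigObject(**options)

import_data and friends would call something like make_env(env, partial=partial, strict=strict) at the public boundary and then pass only env down into import_loop.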

If we forgo this convenience for context, then I think it's OK to continue to use that name, since it won't appear in the import_loop (or any other) signature any longer.

The other option is to eliminate "context" throughout the library and replace it with something like "private", "app", "user", ...

(Basically, env already is a context container where applications could put anything. But I think it's a good idea to have a dedicated object to keep the user namespace separate from reserved parameters.)

My current thinking is that we could just drop context from function signatures because

  • it's not the most commonly-used feature
  • when you need it, creating the env object is very easy
  • "context" is a pretty good name for user data.

EDIT:

And just to be clear, env can appear together with individual arguments as well. Let's say I just want to add context to the first statement above:

instance.import_data({...}, partial=True, strict=True, env=ConfigObject(context={...}))

Also, the code that wants to use the context data is going to access it via env.context anyway, so it kind of makes sense to set it through the same route.

EDIT 2:

I'll probably add support for supplying env as a dictionary (to avoid having to import ConfigObject), which will simplify the above to:

instance.import_data({...}, partial=True, strict=True, env={'context': {...}})

@bintoro (Member, Author) commented Nov 16, 2015

Closing in favor of #359.

@bintoro closed this Nov 16, 2015
@bintoro deleted the meta branch November 16, 2015 19:57