Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep a reference to the current document, 'rename' and 'remove_unknown' options #113

Closed
wants to merge 23 commits into from
Closed

Conversation

misja
Copy link
Contributor

@misja misja commented Jun 16, 2015

This relates to #95 , where no reference is kept to the document being validated and hence not being able to change or remove fields. This commit maintains references and adds a current property to the validator, pointing to the current document and making it available to custom validators.

@@ -701,6 +722,13 @@ def _validate_allof(self, allof, field, value):
if not valid:
self._error(field, errorstack)

def _validate_rename(self, rename, field, value):
if isinstance(rename, _str_type):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be caught upon schema-validation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think schema validation will check the field value type only, the rename field is just a configuration option which should be a string since since its value will replace the original field name in the document.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't get your point.
what hinders you from checking if the provided value in the schema is a string in validate_schema?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I'll add the check there and will change the type to collections.Hashable.

@funkyfuture
Copy link
Member

there should be a remove_unknown-rule, similar to allow_unknown.

@@ -298,6 +316,9 @@ def _validate_definition(self, definition, field, value):
if validator:
validator(definition[rule], field, value)

if 'rename' in definition:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this happen before further validation like coercion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've purposely added it last since a rename will change a document field which could break other (custom/user) validators looking for the original field name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i mean the possibility to override this option via a rule, that can be placed in a schema-defintion. since allow_unknown (as Validator-property) is inherited by derived validators, there is also the constraint allow_unknown in order to override this. please see the documentation about it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've purposely added it last since a rename will change a document field which could break other (custom/user) validators looking for the original field name.

intuitively i'd expect that first a normalization is applied and the result is then validated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I'll put the rename first. As for adding a remove_unknown rule to the schema, will do.

@funkyfuture
Copy link
Member

as @misja points out this actually solves #107. which is a very cool incident, so i can leave town w/o that issue open. i'm going to investigate and review more tomorrow regarding this issue.

an extension to the schema like the following works fine to check the correct execution order and result:

'propertyschema': {'type': 'string', 'regex': '^bar$'}

imo, the form a_dict[…] is more suitable in the tests than a_dict.get(…). functionally there's nothing wrong about it, it's that cultural thing.

oh, what happens in a case like this?:

schema = {'foo': {'type': 'string'},
          'bar': {'rename': 'foo'}}
document = {'bar': 42}

@misja
Copy link
Contributor Author

misja commented Jun 16, 2015

Right, I just did a final commit, just a tweak really. I did leave the rename at the end since changing the field name also requires modifying the schema during the validation, and frankly don't have the time for it anymore ... Same goes with adding remove_unknown to the schema - I tried but I keep losing the reference to the current document there. It will probably have to do with the field and values getting split everywhere to pass field and value to rules where it might probably better to pass the current document (self.current now) and field. It's essentially the difference between the following, where the latter will fail:

x = {'root': {'child': 'my value'}}
y = x['root']
y['child'] = 'other value'
assert x == {'root': {'child': 'other value'}}
x = {'root': {'child': 'my value'}}
y = x['root']['child']
y = 'other value'
assert x == {'root': {'child': 'other value'}}

@nicolaiarocci
Copy link
Member

Cool stuff.

One thing I'd do however, is make current a property, like errors, with proper self-documenting docstring. This way when docs are generated, devs take notice of this new feature.

@nicolaiarocci
Copy link
Member

Also, for next time maybe, make sure you split every new feature or fix into individual pull requests. Makes it easier on the maintainer and more importantly on project history and therefore, for others to review.

@nicolaiarocci nicolaiarocci changed the title Keep a reference to the current document Keep a reference to the current document, 'rename' and 'remove_unknown' options Jun 17, 2015
@CD3
Copy link
Contributor

CD3 commented Jun 17, 2015

I have been thinking about the 'reference to current document' feature recently. My though was to keep a current "path", probably just a member named _path that contains the keys required to get from the top of the document down to the current document.

What do you think about doing it this way instead. For my own applications it would be useful to know where in the document validation is occurring, but I think it would have the added benefit of not having to keep the main and current documents in sync.

@funkyfuture
Copy link
Member

i was also thinking to refactor some stuff. especially a segregation of normalization of a document as a whole before validation.

what you propose as path may well be what i intend as trail with regards to #93.

but i can look into things at earliest as July.

@CD3
Copy link
Contributor

CD3 commented Jun 17, 2015

I think you are right. Perhaps the Validator class keeps a trail that is just deep copied into _errors?

@funkyfuture
Copy link
Member

that's the plan.

@CD3
Copy link
Contributor

CD3 commented Jun 17, 2015

nice. then I think it would be better to have a __get_current_document function that gives you a reference to the current document using the main document and trail.

@nicolaiarocci
Copy link
Member

I want/need to wrap up a 0.9 release soon, ideally as soon as #107 is solved.

I can merge this so it gets included with next release, which I suspect would be ideal for @misja, or keep it on hold until further development is accomplished. Given @funkyfuture and @CD3 planned developments around this, it might be advisable to hold for now. Thoughts?

@misja
Copy link
Contributor Author

misja commented Jun 18, 2015

@funkyfuture @CD3 as I see it, adding a layer to keep a trail is routing around a more persistent problem. One can perfectly iterate through a dict and keep pointers to elements within the document, it's just the current splitting up and passing of field and value to rules which messes things up.

@misja
Copy link
Contributor Author

misja commented Jun 18, 2015

@nicolaiarocci #107 could be solved now by cherry picking my commits related to setting current. As for future developments and reference handling, that's up to @funkyfuture and @CD3 :)

@nicolaiarocci
Copy link
Member

@misja I know you're going to hate me for asking but, would you please resubmit as separate PRs? remove_unknown; rename, #107 fix and/or current property? I'm a lazy bastard.

@CD3
Copy link
Contributor

CD3 commented Jun 18, 2015

Having a trail would allow validators to consider things like the current field's parents or values of fields at the level above the current field, which would be useful to me, but this is probably a rare need. Perhaps having copy of current would be more useful to more people. If it fixes #107 I like it.

@funkyfuture
Copy link
Member

unfortunately stress grew more than anticipated the last days and i won't be in town for two weeks.

i'm not very fond of a release atm. here are some reasons:

as i understand such thing would not fail respectively the behaviour is not determinate depending on order in which fields are validated, which is not inuitive imo:

schema = {'foo': {'type': 'string'},
          'bar': {'rename': 'foo'}}
document = {'bar': 42}

there's no way to change the behaviour concerning remove_unknown in subschemas.

in my impression the code is meanwhile wildly grown and some consolidating refactoring should be done before further features are added. several aspects have been spotted and noted in the recent discussions. few i haven't publicly articulated yet. unfortuantely i don't have the time now to comprehend all that.

this is still unclear to me. as the list as argument for types hasn't been released yet, i'd rather remove it as with anyof there's now a more generic solution.

also, @CD3 points out, that *of-rules don't work as expected in any constellation. or is this solved? if not, it seems that some refactoring is a key part in tackling that.

if a release is necessary soon, i'd pledge to make it a 0.9-rc0.

@CD3
Copy link
Contributor

CD3 commented Jun 18, 2015

This issue has been fixed. The unit tests added in this commit demonstrates that both organizations will work.

@funkyfuture
Copy link
Member

thanks for the feedback.

@misja
Copy link
Contributor Author

misja commented Jun 18, 2015

@funkyfuture To be honest I'm fine if this pull request is put on hold, I'm at least glad part of it has helped squash a related issue (serendipity ftw!). Whilst working on it doubt has crept in though if this library is suitable enough for my use, so am leaving it for now.

@funkyfuture
Copy link
Member

that leaves the question about the removal of checking a list of types from my list. that's not too hard, once it's decided. we can also postpone an 'agglutinating' syntax for *of-rules onto a later release.

@CD3
Copy link
Contributor

CD3 commented Jun 18, 2015

I agree (but I realize I've only been contributing for about a week). You should remove the list of types feature before the release so that you/we can implement a generic solution without having to worry about breaking backward compatibility. If you don't remove it now, you won't be able to remove it for a while.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants