[#134] Version schema check for ["object","null"] #738
Conversation
```diff
@@ -67,7 +67,7 @@ def add_versions(schema, location=''):
             version_properties["value"] = new_value
             schema['properties'][key] = version

-        elif prop_type == "object":
+        elif (prop_type == "object" or prop_type == ["object", "null"]) and "properties" in value:
```
Can we do a more robust test, where `prop_type` is coerced to a list, and we check for the inclusion of `object` in that list? We can also raise `NotImplementedError` for cases/combinations for which we don't have defined behavior.
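The coercion suggested above might look like the following sketch. The helper names (`coerce_to_list`, `is_object_type`) are illustrative, not part of the actual script:

```python
def coerce_to_list(prop_type):
    # A schema "type" may be a single string or a list of strings;
    # normalize both forms to a list for uniform checks.
    if isinstance(prop_type, str):
        return [prop_type]
    return list(prop_type or [])


def is_object_type(prop_type):
    # Treat "object" and ["object", "null"] as object-typed; raise for
    # any other combination that mentions "object", since we have no
    # defined behavior for it yet.
    types = coerce_to_list(prop_type)
    if "object" in types and not set(types) <= {"object", "null"}:
        raise NotImplementedError("unexpected type combination: {}".format(types))
    return "object" in types
```

This makes the "object or nullable object" case robust while still failing loudly for types like `former_value`'s `["string", "number", "integer", "array", "object", "null"]`.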
Checking for `object` in the list is not good enough, because `former_value` has type `["string", "number", "integer", "array", "object", "null"]`. There is also `Organization:details`, which is an `["object", "null"]` but does not have any properties. Both these cases probably want to be treated as a whole, which is why I chose to be so specific about this case. I am not sure this tool is the correct place for more in-depth schema analysis, or for making sure we are consistent with our type usage.
> Checking for object in the list is not good enough due to…

Indeed, that was my meaning when writing "We can also have `NotImplementedError` for cases/combinations for which we don't have defined behavior." If the defined behavior is to run the default case (for `former_value`, etc.), then the code should obviously continue to do that.

My use case is: "As a standard editor, I should not have to keep track of the assumptions various tools are making about the structure of the JSON Schema that I am editing."

As with ocdsmerge, it would be good to better document this code, as it has several specific conditions that would benefit from a comment. In the absence of comments, I'm curious whether there should be a special case for an array of strings (which ocdsmerge treats as `wholeListMerge` even if `wholeListMerge` is not set: open-contracting/ocds-merge#14).

> I am not sure this tool is the correct place for more in depth schema analysis and make sure we are consistent with our type usage.

I don't understand this sentence. Can you expand?
What I meant was that we should probably have a tool/test of some kind, separate from these tools, that checks the schema for new or different uses of JSON Schema that we have not yet used or documented, and that would fall into undefined behaviour. I do not think that tool/test should exist in this script or in ocdsmerge; they are not necessarily the best place to comment on the edge cases.

Any new behaviour that falls into default behaviour (without explicit commands) should be documented in http://standard.open-contracting.org/latest/en/schema/merging/

For example, if a schema editor writes a `oneOf`, `anyOf`, etc., then neither the merge tool nor this tool will know what to do with them. There are huge numbers of other edge cases that we probably have not covered. It will be difficult to write the `NotImplementedError`s without guessing all the possible usages a schema editor could write before they write them.
I don't think the number of edge cases is that large (the JSON Schema spec is small). If we broke long if-statements into single-condition statements, and then added `else: raise NotImplementedError` to each, I think we'd be most of the way there.
I could see some sense in having one thing that identifies new cases by e.g. raising NotImplementedError for unexpected cases, instead of making that every tool's responsibility. However, if the number of cases is small, then I might prefer doing it everywhere anyway, as we have no assurance that we'll always remember to run this extra check. Furthermore, even if we have one thing that identifies new cases / lists all cases to implement, that doesn't really help us check that those cases are in fact implemented in all tools. If all tools raised errors for cases in the schema that they can't handle, we'd be guaranteed to notice that they aren't handling those cases.
If we are concerned about duplication of effort, we have some options, including:
- Write a schema parser with a simple API that emits parts of the schema to tools. If the tool doesn't recognize a part, it errors. This adds an intermediary between tools and schema, which has pros (centralizing some logic and case-handling) and cons.
- Have all tools call a JSON Schema auditor library during initialization, which – I don't know – returns the 'features' the schema uses, which the tool then compares to its known list of implemented features, and fails if a feature isn't implemented.
There may be other options. However, I wouldn't be confident in a new tool that is entirely divorced from other tools.
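The "auditor library" option could be as simple as the following sketch. All names here (`collect_keywords`, `audit`, the `KNOWN_KEYWORDS` set) are hypothetical, chosen for illustration:

```python
# Keywords this hypothetical tool claims to implement.
KNOWN_KEYWORDS = {"type", "properties", "items", "title", "description", "format"}


def collect_keywords(schema, found=None):
    # Recursively gather every JSON Schema keyword the schema uses.
    if found is None:
        found = set()
    if isinstance(schema, dict):
        for key, value in schema.items():
            found.add(key)
            if key == "properties" and isinstance(value, dict):
                # Keys under "properties" are field names, not keywords.
                for subschemama in ():
                    pass
                for subschema in value.values():
                    collect_keywords(subschema, found)
            else:
                collect_keywords(value, found)
    elif isinstance(schema, list):
        for item in schema:
            collect_keywords(item, found)
    return found


def audit(schema, implemented=frozenset(KNOWN_KEYWORDS)):
    # Fail at initialization if the schema uses any feature the tool
    # does not claim to implement.
    unknown = collect_keywords(schema) - set(implemented)
    if unknown:
        raise NotImplementedError("unhandled keywords: {}".format(sorted(unknown)))
```

A tool calling `audit(schema)` at startup would then fail fast on, say, a schema that introduces `oneOf`, rather than silently applying default behaviour.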
> I don't think the number of edge cases is that large (the JSON Schema spec is small). If we broke long if-statements into single-condition statements, and then added `else: raise NotImplementedError` to each, I think we'd be most of the way there.
As I said, I do think the number of edge cases is large. There are 6 different types in JSON Schema and we have 3 different merge rules, and theoretically a schema could use any subset of these. That is 2^9 = 512 different ways to express what can go into a field that could affect merging behaviour. Of course, each if/else could cover a large subset of these, but I would not be confident of covering all combinations. There are also other cases, like not including `properties` or `items`, which types are allowed in a list, and constructs like `anyOf` and `oneOf`.
When writing the original `make_validation_schema.py` and ocds-merge, I wrote a script to analyse all the different combinations of types and rules that were actually used, and made sure there was logic to cover those in use, as that seemed like the practical option.
This issue, and the ocds-merge one (open-contracting/ocds-merge#14), have come up because new combinations of these appeared that had not appeared before those tools were written.

My point about the test was to check that schema writers are not adding any new combinations of these things without considering the consequences for merging behaviour. The only way to do that would be to fix the combinations to the ones we currently use; if any new ones appear, then the tests against the standard or an extension should fail. We could add combinations, but we would have to do that at the same time as modifying the tools.
Only then, with a list of allowed combinations and their expected behaviour, can the tools be confident that they are not accidentally doing some undefined behaviour, and can raise a `NotImplementedError`. In that case, a long list of very specific if/else statements would be workable in all tools.
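The "fixed list of allowed combinations" test described above could be sketched roughly as follows. The `ALLOWED` set and the `check_combinations` helper are hypothetical; a real test would enumerate the type/merge-rule pairs actually used by the standard:

```python
# Hypothetical whitelist of (type tuple, merge rule) pairs the tools implement.
ALLOWED = {
    (("string",), None),
    (("object", "null"), None),
}


def check_combinations(schema, allowed, path=""):
    # Walk every field and fail if it uses a combination of type and
    # merge rule that the tools have no defined behaviour for.
    for name, field in schema.get("properties", {}).items():
        prop_type = field.get("type")
        types = tuple(prop_type) if isinstance(prop_type, list) else (prop_type,)
        rule = "wholeListMerge" if field.get("wholeListMerge") else None
        if (types, rule) not in allowed:
            raise NotImplementedError(
                "{}/{}: {!r} with merge rule {!r}".format(path, name, types, rule))
        check_combinations(field, allowed, path + "/" + name)
```

Run against the standard and extensions in CI, a check like this would fail as soon as a schema editor introduces a new combination, forcing the tools to be updated at the same time.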
I think having a test makes sense. Can you share the script you wrote to analyze the combination of types and rules, which can perhaps be reused when authoring that test? I imagine we can add the test to https://github.com/open-contracting/standard-maintenance-scripts/blob/master/tests/test_json.py
I started a page documenting what would need to change if we were to change the layout of codelist CSV files, extension.json files, etc. I also have a short list of repos that would need to be changed if extensions were changed. We can consider making a list of things that would need to change if a new pattern is used in the JSON Schema. (I'm also happy to reorganize the above content – I just wrote those to help myself.)
From my perspective, this PR can be merged, though we should add a note to the changelog first. I propose (please feel free to suggest corrections or edits!):

```markdown
## [1.X]

### Advisories

1.1.3 changed the merging and versioning behavior of the `unit` field of the `Item` object.

* If you are using compiled releases, `unit` information can now be removed entirely in later releases by setting the field to `null`; previously, only its subfields could be set to `null` and removed.
* If you are using versioned releases, `unit` information is now versioned as a whole; previously, its subfields were versioned individually.
```
I've added the note to the changelog.
@jpmckinney sorry for taking so long to get round to this. I do not think this pull request should be merged, apart from the changelog. There is some confusion: the code changes in this pull request change the behaviour back to what was in 1.1.2, contrary to what is outlined in the changelog. What I think needs to happen in this branch is:

Later we can get to making sure that all current uses of types and merge rules are consistent, and that there are checks in place to make sure nobody is using any unexpected combinations of these rules. The script I used when getting the current type/merge rules was a slightly modified version of this function (removing the conditional parts):
@kindly Aha, right! Please go ahead with your proposed changes. You can create a new branch if that would be simpler or cleaner. Also, please create the follow-up issues we discussed, e.g. adding a check about unexpected combinations.
@kindly @robredpath Just pinging on this issue, in case it's not being tracked in the backlog.
@jpmckinney What is the status of the changes you have made? On the whole (after looking at them for a while), the gist of the changes looks good, but I have not got round to checking the logic thoroughly or comparing the output against the previous output. Would it be beneficial for me to do that?
I had to drop off and work on other things for a bit. Looking back at my work, I had a few things I wanted to follow up on. At the OCDS level, I'm not sure making `unit` nullable was the right decision, given the impact on versioning (which wasn't well understood when we worked on #630). Reverting to

What do you think? That would avoid 1.1.3 introducing a change in how `Unit` is versioned. If it sounds reasonable to make this a bug fix, I've staged those changes locally (to see what would happen). At a technical level:

Once we resolve this, we can close #737, and we should fix open-contracting/ocds-merge#13
@jpmckinney As to the technical points:
Great. I suggest I make the OCDS-level changes in this PR (updating the changelog), which you can then review, and we can move the technical-level changes into new issue(s). I'll compare the new schema to the 1.1.2 schema and note any changes. When I last did this, the new script had not changed much.
So, interesting… Organization/details has allowed null since it was first introduced. I think that means I leave it as-is.
@kindly Ready for review. Differences (ignoring whitespace) from 1.1.2 in the generated schema are:
So, as expected. |
This is a great improvement over the previous script, much cleaner and more readable.
I also agree with the backward-compatible stance on `unit` and with keeping `organization.details`.

I cannot approve, as I originally started this pull request.
```python
warnings.formatwarning = custom_warning_formatter

versioned_template = OrderedDict([
```
To make this more readable, how about putting a multi-line string here (with triple quotes) of the template as JSON, and then loading it with ordered pairs, i.e. something like:

```python
versioned_template_json = '''
{
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "releaseDate": {
                "format": "date-time",
                "type": "string"
            },
            "releaseID": {
                "type": "string"
            },
            "value": {},
            "releaseTag": {
                "type": "array",
                "items": {
                    "type": "string"
                }
            }
        }
    }
}
'''
versioned_template = json.loads(versioned_template_json, object_pairs_hook=OrderedDict)
```
Great! I've made that change, and will merge once tests complete.