[#134] Version schema check for ["object","null"] #738

kindly · 2018-07-30T10:26:16Z

No description provided.

jpmckinney · 2018-07-30T13:41:01Z

standard/schema/utils/make_validation_schema.py

@@ -67,7 +67,7 @@ def add_versions(schema, location=''):
                version_properties["value"] = new_value
                schema['properties'][key] = version

-        elif prop_type == "object":
+        elif (prop_type == "object" or prop_type == ["object", "null"]) and "properties" in value:


Can we do a more robust test, where prop_type is coerced to a list, and we check for the inclusion of object in that list? We can also have NotImplementedError for cases/combinations for which we don't have defined behavior.

Checking for object in the list is not good enough due to former_value which has type ["string", "number", "integer", "array", "object", "null"].
There is also Organization:details that is an ["object", "null"] but does not have any properties.
Both of these cases should probably be treated as a whole, which is why I chose to be so specific about this case.
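
To make the two counterexamples concrete, here is a minimal sketch of coercing `type` to a list while still being specific about the object case. The helper names `as_type_list` and `is_object_with_properties` are hypothetical and are not taken from make_validation_schema.py; the three sample fragments are simplified stand-ins for `Item.unit`, `Organization.details` and `former_value`.

```python
def as_type_list(prop_type):
    """Coerce a JSON Schema "type" value (a string or a list) to a list."""
    if isinstance(prop_type, str):
        return [prop_type]
    return list(prop_type)


def is_object_with_properties(value):
    """True for "object" or ["object", "null"] (in any order) that declares properties."""
    types = set(as_type_list(value.get("type", [])))
    return "object" in types and types <= {"object", "null"} and "properties" in value


# Simplified, hypothetical fragments modelled on the cases named above.
unit = {"type": ["object", "null"], "properties": {"name": {"type": "string"}}}
details = {"type": ["object", "null"]}  # nullable object without properties
former_value = {"type": ["string", "number", "integer", "array", "object", "null"]}

for name, fragment in [("unit", unit), ("details", details), ("former_value", former_value)]:
    print(name, "object" in as_type_list(fragment["type"]), is_object_with_properties(fragment))
```

A plain inclusion test answers True for all three fragments, while the stricter check only accepts the first.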

I am not sure this tool is the correct place for more in depth schema analysis and make sure we are consistent with our type usage.

> Checking for object in the list is not good enough due to…

Indeed - that was my meaning when writing "We can also have NotImplementedError for cases/combinations for which we don't have defined behavior." If the defined behavior is to run the default case (for former_value, etc.) then the code should obviously continue to do that.

My use case is: "As standard editor, I should not have to keep track of the assumptions various tools are making about the structure of the JSON Schema that I am editing."

As with ocdsmerge, it would be good to better document this code, as it has several specific conditions that would benefit from a comment. In the absence of comments, I'm curious as to whether there should be a special case for an array of strings (which in ocdsmerge is treated as wholeListMerge even if wholeListMerge is not set; see open-contracting/ocds-merge#14).

> I am not sure this tool is the correct place for more in depth schema analysis and make sure we are consistent with our type usage.

I don't understand this sentence. Can you expand?

What I meant was that we should probably have a tool or test of some kind, separate from these tools, that checks the schema for uses of JSON Schema that we have not previously used or documented, and that therefore fall into undefined behaviour.
I do not think that tool/test should live in this script or in ocdsmerge, and they are not necessarily the best place to comment on the edge cases.

Any new behaviour that falls into default behaviour (without explicit commands) should be documented in http://standard.open-contracting.org/latest/en/schema/merging/

For example, if a schema editor writes a oneOf, anyOf, etc., then neither the merge tool nor this tool will know what to do with it. There are a huge number of other edge cases that we have probably not covered.

It will be difficult to write the NotImplementedErrors without guessing every usage a schema editor might write before they write it.

I don't think the number of edge cases is that large (the JSON Schema spec is small). If we broke long if-statements into single condition statements, and then added else: raise NotImplementedError to each, I think we'd be most of the way there.
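
As a rough sketch of that shape (one condition per statement, with an `else` that raises), assuming a hypothetical function `versioning_strategy` and placeholder return values "subfields" and "whole" that are not the script's actual names or final behaviour:

```python
def versioning_strategy(value):
    """Return a placeholder strategy name for a schema fragment, or raise."""
    types = value.get("type", [])
    if isinstance(types, str):
        types = [types]

    if "object" in types:
        if "properties" in value:
            if set(types) <= {"object", "null"}:
                return "subfields"
            else:
                # An object with properties that may also be, say, a string:
                # no behaviour has been defined for this, so fail loudly.
                raise NotImplementedError(f"object with properties and extra types: {types}")
        else:
            return "whole"
    elif types:
        return "whole"
    else:
        raise NotImplementedError("fragment declares no type")
```

Each branch tests a single thing, so an unexpected combination ends in an error instead of silently falling through to a default.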

I could see some sense in having one thing that identifies new cases by e.g. raising NotImplementedError for unexpected cases, instead of making that every tool's responsibility. However, if the number of cases is small, then I might prefer doing it everywhere anyway, as we have no assurance that we'll always remember to run this extra check. Furthermore, even if we have one thing that identifies new cases / lists all cases to implement, that doesn't really help us check that those cases are in fact implemented in all tools. If all tools raised errors for cases in the schema that they can't handle, we'd be guaranteed to notice that they aren't handling those cases.

If we are concerned about duplication of effort, we have some options, including:

- Write a schema parser with a simple API that emits parts of the schema to tools. If a tool doesn't recognize a part, it errors. This adds an intermediary between tools and schema, which has pros (centralizing some logic and case-handling) and cons.
- Have all tools call a JSON Schema auditor library during initialization, which – I don't know – returns the 'features' the schema uses, which the tool then compares to its known list of implemented features, and fails if a feature isn't implemented.
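
The second option could look roughly like the following. This is only a sketch: no such auditor library exists in this thread, "features" is reduced here to the set of JSON Schema keywords in use, and the names `schema_features` and `require_supported` are made up for illustration.

```python
NAME_MAPS = {"properties", "definitions", "patternProperties"}  # keys are names, not keywords
SCHEMA_LISTS = {"anyOf", "oneOf", "allOf"}
SINGLE_SCHEMAS = {"items", "additionalProperties", "not"}


def schema_features(schema):
    """Collect the JSON Schema keywords used anywhere in a schema."""
    features = set()
    for keyword, value in schema.items():
        features.add(keyword)
        if keyword in NAME_MAPS:
            subschemas = list(value.values())
        elif keyword in SCHEMA_LISTS:
            subschemas = list(value)
        elif keyword in SINGLE_SCHEMAS:
            subschemas = value if isinstance(value, list) else [value]
        else:
            subschemas = []
        for subschema in subschemas:
            if isinstance(subschema, dict):  # skip booleans, e.g. additionalProperties: false
                features |= schema_features(subschema)
    return features


def require_supported(schema, implemented):
    """Raise if the schema uses a keyword the calling tool has not implemented."""
    unsupported = schema_features(schema) - set(implemented)
    if unsupported:
        raise NotImplementedError("unsupported JSON Schema keywords: " + ", ".join(sorted(unsupported)))
```

A tool would call `require_supported(schema, {"type", "properties", ...})` once during initialization, with its own list of implemented keywords.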

There may be other options. However, I wouldn't be confident in a new tool that is entirely divorced from other tools.

> I don't think the number of edge cases is that large (the JSON Schema spec is small). If we broke long if-statements into single condition statements, and then added else: raise NotImplementedError to each, I think we'd be most of the way there.

As I said, I do think the number of edge cases is large. There are 6 different types in jsonschema, we have 3 different merge rules, and theoretically a schema could have any subset of these. That is 2^9 = 512 different ways to express what can go into a field that could affect merging behaviour. Of course each if/else could cover a large subset of these, but I would not be confident of covering all combinations. There are also other cases, like not putting in properties or items, which types are allowed in a list, and constructs like anyOf and oneOf.

When writing the original make_validation_schema.py and ocds-merge, I wrote a script to analyse all the different combinations of types and rules that were actually used, and made sure there was logic to cover those in use, as that seemed like the practical option.
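
The original analysis script is not included in this thread. The following is an independent sketch of what such an analysis might look like, assuming the three merge rules are the schema keywords `omitWhenMerged`, `wholeListMerge` and `versionId`, and that subschemas are reached only through `properties`, `definitions` and `items`.

```python
from collections import Counter

# Assumed names of the three merge rules; adjust to the keywords actually in use.
MERGE_RULES = ("omitWhenMerged", "wholeListMerge", "versionId")


def walk(schema):
    """Yield the schema and every subschema under properties, definitions and items."""
    yield schema
    for section in ("properties", "definitions"):
        for subschema in schema.get(section, {}).values():
            yield from walk(subschema)
    items = schema.get("items")
    if isinstance(items, dict):
        yield from walk(items)


def combinations_in_use(schema):
    """Count each (types, merge rules) combination that appears in the schema."""
    counter = Counter()
    for subschema in walk(schema):
        if "type" not in subschema:
            continue  # e.g. a bare $ref
        declared = subschema["type"]
        types = (declared,) if isinstance(declared, str) else tuple(sorted(declared))
        rules = tuple(rule for rule in MERGE_RULES if subschema.get(rule))
        counter[(types, rules)] += 1
    return counter
```

Running `combinations_in_use` over the release schema and each extension would give the list of combinations that tools actually need to handle.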

This issue, and the ocds-merge one (open-contracting/ocds-merge#14) have come up due to new combinations of these appearing that had not appeared before those were written.

My point about the test, was to check that schema writers are not adding any new combinations of these things, without considering the consequences to merging behaviour.

The only way to do that would be to fix the combinations to the ones we currently use, so that if there are any new ones then the tests against the standard or an extension fail. We could add combinations, but we would have to do that at the same time as modifying the tools.

Only then, with a list of allowed combinations and their expected behaviour, can the tools be confident that they are not doing some undefined behaviour by accident, and can raise a NotImplementedError. In that case a long list of very specific if/else statements would be workable in all tools.
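
A minimal sketch of such a test, assuming combinations are represented as `(types, merge_rules)` tuples. The contents of `ALLOWED_COMBINATIONS` below are invented for illustration and are not the real list of combinations in use; `unexpected_combinations` is likewise a hypothetical helper.

```python
# Hypothetical allow-list: each entry is (sorted types, merge rules set on the field).
ALLOWED_COMBINATIONS = {
    (("object",), ()),
    (("null", "string"), ()),
    (("array",), ("wholeListMerge",)),
}


def unexpected_combinations(found, allowed=ALLOWED_COMBINATIONS):
    """Return the combinations present in a schema but missing from the allow-list."""
    return sorted(set(found) - set(allowed))


def check_schema_combinations(found):
    """Fail with a readable message when a schema introduces a new combination."""
    unexpected = unexpected_combinations(found)
    assert not unexpected, f"new type/merge-rule combinations need tool support: {unexpected}"
```

Adding a combination to the allow-list would then be a deliberate change, made in the same step as updating the tools.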

I think having a test makes sense. Can you share the script you wrote to analyze the combination of types and rules, which can perhaps be reused when authoring that test? I imagine we can add the test to https://github.com/open-contracting/standard-maintenance-scripts/blob/master/tests/test_json.py

I started a page documenting what would need to change if we were to change the layout of codelist CSV files, extension.json files, etc. I also have a short list of repos that would need to be changed if extensions were changed. We can consider making a list of things that would need to change if a new pattern is used in the JSON Schema. (I'm also happy to reorganize the above content – I just wrote those to help myself.)

jpmckinney · 2018-08-01T16:44:12Z

From my perspective, this PR can be merged, though we should add a note to the changelog first. I propose (please feel free to suggest corrections or edits!):

## [1.X]

### Advisories

1.1.3 changed the merging and versioning behavior of the `unit` field of the `Item` object.

* If you are using compiled releases, `unit` information can now be removed entirely in later releases by setting the field to `null`; previously, only its subfields could be set to `null` and removed.
* If you are using versioned releases, `unit` information is now versioned as a whole; previously, its subfields were versioned individually.

jpmckinney · 2018-08-07T01:49:10Z

I've added the note to the changelog.

kindly · 2018-08-07T10:49:26Z

@jpmckinney sorry for taking so long to get round to this.

I do not think this pull request should be merged apart from the changelog.

There is some confusion: the code changes in this pull request change the behaviour back to what it was in 1.1.2, contrary to what is outlined in the changelog.

What I think needs to happen in this branch is:

1. Change the code so that the logic is the same but is more explicit about the behaviour, i.e. that ["object", "null"] is treated differently from ["object"]. I think the logic should say that any property whose type includes null is versioned as a whole, as otherwise merging/versioning behaviour is undefined.
2. Add a comment to the docs saying that any property whose type includes null is versioned as a whole, which I think is the desired behaviour now.
3. Make sure prop_type is coerced to a list, as outlined above by @jpmckinney.
4. Add the changelog (done).
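
For readers less familiar with the distinction, here is a simplified illustration of the two shapes for a single release. It is not the merge library's implementation: the helper names are hypothetical, and only the entry fields (`releaseDate`, `releaseID`, `releaseTag`, `value`) follow the versioned template used in this pull request.

```python
def versioned_entry(release, value):
    """Build one history entry for a value as it appeared in a release."""
    return {
        "releaseDate": release["date"],
        "releaseID": release["id"],
        "releaseTag": release["tag"],
        "value": value,
    }


def version_as_whole(release, value):
    """The object itself is the versioned value: one history for the whole object."""
    return [versioned_entry(release, value)]


def version_subfields(release, value):
    """Each subfield gets its own history; the object is only a container."""
    return {key: [versioned_entry(release, subvalue)] for key, subvalue in value.items()}


release = {"id": "r1", "date": "2018-08-01T00:00:00Z", "tag": ["tender"]}
unit = {"name": "kilogram", "id": "KGM"}

whole = version_as_whole(release, unit)      # [{..., "value": {"name": ..., "id": ...}}]
parts = version_subfields(release, unit)     # {"name": [{..., "value": "kilogram"}], ...}
```

In the first shape a later release can replace or null the object in one step; in the second, only individual subfields have histories.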

Later we can get to making sure that all current uses of types and merge rules are consistent and that there are checks in place to make sure nobody is using any unexpected combinations of these rules.

The script I used when getting the current type/merge-rules used was a slightly modified version of this function (removing the conditional parts):
https://github.com/open-contracting/ocds-merge/blob/master/ocdsmerge/merge.py#L32
However, I think I could write a more thorough one now.

jpmckinney · 2018-08-07T15:25:27Z

@kindly Aha, right! Please go ahead with your proposed changes. You can create a new branch if that would be simpler or cleaner.

Also, please create the follow-up issues we discussed, e.g. adding a check about unexpected combinations.

jpmckinney · 2018-09-13T02:44:27Z

@kindly @robredpath Just pinging on this issue, in case it's not being tracked in the backlog.

kindly · 2018-11-20T14:48:25Z

@jpmckinney
I am finally getting round to looking at this.

What is the status of the changes you have made?
Would you like me to review them?

On the whole (after looking at them for a while), the gist of the changes looks good, but I have not got round to checking the logic thoroughly or comparing the output to the previous output.

Would it be beneficial for me to do that?

jpmckinney · 2018-11-20T22:53:18Z

I had to drop off and work on other things for a bit. Looking back at my work, I had a few things I wanted to follow-up on.

At the OCDS level, I'm not sure if making unit nullable was the right decision, given the impact on versioning (which wasn't well understood when we worked on #630). Reverting to null not being an option if the type includes object would mean removing null from:

Core
- /definitions/Organization/properties/details
- /definitions/Item/properties/unit
Extensions
- ocds_lots_extension: /definitions/LotDetails
- ocds_location_extension: /definitions/Location/properties/geometry, /definitions/Location/properties/gazetteer
- ocds_finance_extension: /definitions/Finance/properties/interestRate
- ocds-shareholders-extension: /definitions/Organization/properties/beneficialOwnership
- ocds_performance_failures: /definitions/PerformanceFailure/properties/period
- ocds_metrics_extension: /definitions/Observation/properties/unit, /definitions/Observation/properties/dimensions
- ocds_tariffs_extension: /definitions/Observation/properties/unit, /definitions/Observation/properties/dimensions

What do you think? That would avoid 1.1.3 introducing a change in how Unit is versioned. If it sounds reasonable to make this a bug fix, I've staged those changes locally (to see what'd happen).

At a technical level:

1. The old logic didn't account for the presence of minLength when replacing sections with e.g. StringNullVersioned. In the new code, I continue to drop minLength to maintain the same behavior. However, should this keyword instead be preserved? I assume so.
2. There's an exceptional case made for /Amendment/changes in the new code, whose versioning behavior seems to have changed in "Make make_validation_schema simpler and useful for extensions" (#412). I don't know what the correct behavior ought to be, but I suppose it doesn't matter as it's deprecated.
3. title, description, merging properties, etc. are removed from most of the schema, but not all of it. We should probably make those removals in a single, final pass, to avoid this inconsistency.
4. We can make more patterns like StringNullVersioned, e.g. StringIntegerMinLength1.
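
A single final pass for removing metadata could be sketched as below. This is an assumption-laden illustration, not the script's code: the function name `strip_metadata` is hypothetical, and the list of merging properties (`omitWhenMerged`, `wholeListMerge`, `versionId`) is assumed. Note that keys inside `properties` and `definitions` are field names, so a field that happens to be called `title` or `description` must survive.

```python
# Assumed metadata keywords to drop; validation keywords such as minLength are kept.
METADATA = ("title", "description", "omitWhenMerged", "wholeListMerge", "versionId")
NAME_MAPS = ("properties", "definitions")  # keys here are field names, not keywords


def strip_metadata(schema):
    """Return a copy of the schema without metadata keywords, at every depth."""
    cleaned = {}
    for keyword, value in schema.items():
        if keyword in METADATA:
            continue
        if keyword in NAME_MAPS:
            cleaned[keyword] = {name: strip_metadata(sub) for name, sub in value.items()}
        elif keyword == "items" and isinstance(value, dict):
            cleaned[keyword] = strip_metadata(value)
        else:
            cleaned[keyword] = value
    return cleaned
```

Running this once on the finished schema, instead of deleting keys in several places along the way, keeps the removals consistent.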

Once we resolve this, we can close #737, and we should fix:

open-contracting/ocds-merge#13
open-contracting/sample-data#82

kindly · 2018-11-21T16:26:11Z

@jpmckinney
I originally argued for reverting to null not being an option for type object, but mostly I am undecided on this. I think it's the correct thing to do for backwards compatibility in terms of organization.details and unit in the core standard, even though it broke 1.1.3, so I agree a bugfix makes sense. Nonetheless, as @timgdavies argued, in terms of usability they are probably better versioned as a whole.

As to the technical points:

1. Yes, that was an oversight. I think the minLength changes came in around the same time as, or after, this script was worked on.
2. I was aware of amendment.changes but decided to put my head in the sand about it as it was being deprecated.
3. Yes, I noticed some title and description not being dropped and forgot to do anything about it.
4. Sounds like a good idea.

jpmckinney · 2018-11-21T17:16:57Z

Great. I suggest I make the OCDS-level changes in this PR (updating the changelog), which you can then review, and we can move the technical-level changes into new issue(s). I'll compare the new schema to the 1.1.2 schema and note any changes. When I last did this, the new script had not changed much.

jpmckinney · 2018-11-21T23:35:26Z

So, interesting… Organization/details allowed null since it was first introduced. I think that means I leave it as-is. details would be versioned as a whole, regardless.

jpmckinney · 2018-11-21T23:38:48Z

@kindly Ready for review. Differences (ignoring whitespace) from 1.1.2 in the generated schema are:

- add a top-level description
- add "type": "object" to versioned template
- 1.1.3 changes: "Buyer name required in OrganizationReference" (#639); "Remove "null" "type" from "items" under "additionalProcurementCategories"" (#646)
- 1.1.4 changes: "Typo fixes" (#726); "Updates to the currency codelist, including addition of VES code" (#746)

So, as expected.
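
One way to make this kind of whitespace-insensitive comparison, shown only as a sketch (the thread does not say how the comparison was actually done, and `schema_diff` is a hypothetical helper): parse both files, re-serialize them with identical formatting, and diff the result.

```python
import difflib
import json


def schema_diff(old_text, new_text):
    """Return unified-diff lines between two JSON documents, ignoring whitespace."""

    def normalize(text):
        # Parsing and re-dumping removes formatting differences but keeps key order.
        return json.dumps(json.loads(text), indent=2, ensure_ascii=False).splitlines()

    return list(
        difflib.unified_diff(
            normalize(old_text), normalize(new_text), "1.1.2", "generated", lineterm=""
        )
    )
```

An empty list means the two schemas differ only in whitespace; key order is preserved, so a reordered key would still show up as a change.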

kindly

This is a great improvement over the previous script, much cleaner and more readable.

I also agree with the backward compatible stance of unit and keeping organization.details

I can not approve as I originally started this pull request.

kindly · 2018-11-22T16:16:37Z

standard/schema/utils/make_versioned_release_schema.py

+
+warnings.formatwarning = custom_warning_formatter
+
+versioned_template = OrderedDict([


To make this more readable, how about putting a multi-line string here (with triple quotes) containing the template as JSON, and then loading it ordered, i.e. something like:

    versioned_template_json = '''
    {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "releaseDate": {
            "format": "date-time",
            "type": "string"
          },
          "releaseID": {
            "type": "string"
          },
          "value": {},
          "releaseTag": {
            "type": "array",
            "items": {
              "type": "string"
            }
          }
        }
      }
    }
    '''
    versioned_template = json.loads(versioned_template_json, object_pairs_hook=OrderedDict)

Great! I've made that change, and will merge once tests complete.

[#134] Validation schema check for ["object","null"]

0fa5a3e

jpmckinney reviewed Jul 30, 2018

jpmckinney mentioned this pull request Aug 1, 2018

make_validation_schema.py creating different data for 1.1.2 vs 1.1.3 #737

Closed

kindly mentioned this pull request Aug 1, 2018

Update unit_example.py open-contracting/ocds-merge#13

Merged

Update changelog

ec11ef7

jpmckinney and others added 14 commits September 17, 2018 13:27

Merge branch '1.1-dev' into 134-version-schema

045ecd6

Use OrderedDict consistently, to avoid errors if keys are added at a …

7e92fb0

…later date

Update some variables names for clarity

801e08c

Fold add_string_definitions into get_versioned_release_schema

fe49f30

Update 'version' variable names for clarity

e9c3f64

$schema doesn't need to be overwritten

6b96ff0

Make make_validation_schema.py robust to additional required fields

0c777f8

Add and update comments, and change code order, in get_versioned_rele…

607f0ff

…ase_schema()

Add and update comments, and update code for readability, in add_vers…

0fea98b

…ioned()

Merge branch '1.1-dev' into 134-version-schema

bde8908

Change quoting style

53e265d

Update changelog to be more accurate

7b7c2d5

Rework add_versioned() for the logic to be easier to follow, and add …

2821d08

…comments

Merge upstream

ba4e7c1

jpmckinney mentioned this pull request Oct 6, 2018

Add script to test for use of novel JSON Schema features #757

Closed

James McKinney added 3 commits October 5, 2018 22:54

If type includes null, version as a whole

c621c34

Clarify merging rules when null is an allowed type, and correct t…

7c6026c

…ypos in merging rules

Correct changelog change in 7b7c2d5

5950777

jpmckinney mentioned this pull request Nov 9, 2018

Record package sample data validation issues open-contracting/sample-data#82

Closed

James McKinney added 2 commits November 20, 2018 16:16

Merge branch '1.1-dev' into 134-version-schema

6641a7c

Rename make_validation_schema.py to make_versioned_release_schema.py

dd216d0

Restore merging and versioning behavior of Item.unit from 1.1.2

8490d67

jpmckinney assigned kindly Nov 21, 2018

Simplify Unit.type declaration

661d816

jpmckinney mentioned this pull request Nov 22, 2018

Follow-up to versioned release schema updates #768

Closed

9 tasks

kindly commented Nov 22, 2018

Improve readability of JSON fragment

fca6279

jpmckinney merged commit 3bdae83 into 1.1-dev Nov 23, 2018

jpmckinney deleted the 134-version-schema branch November 23, 2018 20:27

jpmckinney added this to the 1.1.4 milestone May 9, 2019
