New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update to draft 06 #24

Open
binarylogic opened this Issue Oct 29, 2017 · 25 comments

Comments

Projects
None yet
6 participants
@binarylogic

binarylogic commented Oct 29, 2017

Draft 06 of the JSON schema has been released, here's a summary of the changes.

I might work on a pull request (time permitting), but wanted to get an issue up.

@TreyE

This comment has been minimized.

TreyE commented Nov 3, 2017

@binarylogic
I've begun work in this direction in https://github.com/TreyE/ex_json_schema/tree/draft06. It also includes my already existing PR that moves to the official schema testing repository for draft04 as well.

I've got it down to about ~60 failures, which is a good improvement over nothing, however the code still needs cleanup to remove some duplication, and I haven't yet decided if we should split the validators between the draft versions. For now, I've implemented much of the newer functionality as transforms to the draft04 schema (which allows me to leave the old code intact, but makes the error messages slightly confusing).

I haven't seen any updates by @jonasschmidt in almost a year now, so do you know if this project is dead and seeking a maintainer?

@jonasschmidt

This comment has been minimized.

Owner

jonasschmidt commented Nov 6, 2017

Thanks a lot for starting to work on this. I will take a look at the new draft and your branch to get a sense of the scope of those changes.

To be honest, I just haven't found the time to work on the project recently in between work, family and moving cities this year.

On the other hand, the only bigger improvement that was requested multiple times and is still missing is changing the way that errors are returned from strings to a data type that makes it easier to process the errors further. I already did most of the work in https://github.com/jonasschmidt/ex_json_schema/tree/validation-errors but haven't even found the time to take care of the last few %. I will most likely get a chance to finish that work in the following weeks. And also look into the rest of the smaller bugs that were reported.

So in a nutshell, I will put in some time until the end of the year and will decide then whether I can find the necessary time to put int the project. Or if I have to find someone to take over as maintainer.

Thanks a lot for taking an interest and moving the project forward.

@Zeeker

This comment has been minimized.

Zeeker commented Nov 23, 2017

How is the progress regarding this issue? Is there anything one can do to help?

@handrews

This comment has been minimized.

handrews commented Mar 12, 2018

Note that draft-07 is out, and we have published some release migration notes explaining the differences.

draft-07 is backwards compatible with draft-06

@asummers

This comment has been minimized.

asummers commented Apr 20, 2018

Just a quick note -- I'm working on resolving the differences for version 6 then eventually version 7. It's likely diverged enough from master to be its own project, unless @jonasschmidt is willing to accept some rather large PRs. I'm down to 4 failures against the official suite for v6 with no skipped tests thanks to the work by @TreyE. I will update here when publicly available and update on further progress on v7.

@jonasschmidt

This comment has been minimized.

Owner

jonasschmidt commented Apr 20, 2018

I'm definitely open to accepting any work done on supporting draft06/07. Still haven't found the time to look into it more deeply, but as soon as someone comes up with a working version, I can help with getting it merged and released.

@asummers

This comment has been minimized.

asummers commented Apr 20, 2018

The tl;dr of what I'm doing is following along with the pattern established by moving the Properties, Format, etc. logic out to their own modules, and then going a little further and giving them control over their internals of even accepting the validation, versus matching in the outer scope (likely introducing a behaviour). From there, since they'll all be given the root schema, they can switch on behavior relative to specific versions. That should dovetail into the work for making better error messages, as they can individually start producing structs or whatever we go with for consumption by the main validation chain in validate.ex. The internals are a little hard to parse with remote refs in schema.ex, but that's the last blocker to getting 6 working. After that, backporting and switching on v4 behavior will happen, then based on the patterns established there, adding in v7 support.

I imagine the internals will look a lot like:

@behaviour ExJsonSchema.Validator

@impl ExJsonSchema.Validator
def validate(root = %{version: 4}, schema, {"$ref" => ref}, data) do
  validate_v4(root, ref, data)
end

Then the outer module can just reduce over a list of Validator modules provided explicitly, removing intimate internal knowledge from the dispatcher.

@asummers

This comment has been minimized.

asummers commented Apr 20, 2018

That also allows non official extensions to be added, such as with the uuid format supported in AJV, as someone can just make a new module with that behaviour, and attach to a list of custom_validators or something.

@handrews

This comment has been minimized.

handrews commented Apr 20, 2018

@asummers supporting non-official extensions is great as a lot of draft-08 will be about making extensibility easier. And easier to detect than the current situation of "well, if it's not in the spec it's an extension, but who knows what it means?"

@asummers

This comment has been minimized.

asummers commented Apr 20, 2018

@handrews do you have a ballpark timeline of when you expect draft-08 to be released? No real reason behind the question other than curiosity and knowing when the underlying tooling would need to be adapted.

@asummers

This comment has been minimized.

asummers commented Apr 21, 2018

@jonasschmidt master...asummers:v6-upgrades This is the current diff of my branch. I was wrong about only having four failures, there were several false test positives due to a bug I had in the test setup. It now has 16 failures for v6 that I'm working through, mostly in the ref resolution. It has the general shape of what I'd expect it to look like now, so please give it a look over and let me know if you have any fundamental problems with the approach. Otherwise I'll keep on keeping on.

Interesting notes:

  • https://github.com/jonasschmidt/ex_json_schema/compare/master...asummers:v6-upgrades?expand=1#diff-7b11e4a3423b62253a7fd16d8c4b1e1fR2 This file is significantly simpler now. I rather like how that turned out

  • I added the Validator behaviour in that module, but I have no problems moving that to not pollute that file if that makes sense.

  • Test cases are now in a CaseTemplate that you simply give the URL for the tests, and it will go and do the right thing. I removed the ignore tests behavior because I wanted to just truck through it all, and that functionality made that code a lot more complicated, but this could be added back in a simpler fashion if we need to.

  • All JSON Schema concepts now have their own validators so it suggests an entry point for all future spec compliant checks, and since the validator.ex is just given a list, we can augment it with any module that expresses a validate/4 function per the behaviour. Unsure what that API should ultimately look like, but it allows that flexibility.

  • I have not added moduledocs anywhere but will eventually.

  • I removed existing tests as the behavior being checked did not seem to match v6 official tests. I imagine I'll have to fix that behavior when I back in v4.

  • https://github.com/jonasschmidt/ex_json_schema/compare/master...asummers:v6-upgrades?expand=1#diff-f5b1af2dd6c9a42d56541733adb15260R32 This is kind of fancy. I was getting errors with my other approaches and this came out cleanly, I think.

  • I tag all tests now. I found myself needing to run either an explicit test or a whole suite of them, so I adjusted the runner to allow mix test --only json_schema_multipleOf. I needed the prefix because "type" is reserved, apparently. Additionally, I allow you to opt into a test you're looking at individually by expressing its name. This allows more targeted debugging for failures.

  • master...asummers:v6-upgradesdiff-6445d87a5a54d579bdb0848cf521ec3eR10 Shows how some backporting could be done. Not sure how much I love that exact implementation, but it illustrates the idea and passes the test suite.

This is very much a WIP, so I'll be sure to remove all debug statements and get Dialyzer happy before PRing.

@asummers

This comment has been minimized.

asummers commented Apr 22, 2018

Okay further updates:

master...asummers:v6-upgrades

I have added back in ignoring of tests. You can either ignore by suite ("if-then-else" as in v7), or by test name and description ("validation of regular expressions: a regular expression with unclosed parens is invalid"). With this, we have full ability to iterate on v4, v6 and v7 failures, and addition of v3 if we choose to add it. I have no such plans to add v3 to be honest, but the pattern is clear for if that were to be necessary for someone.

There are currently 79 skipped tests among the 3 suites being tested, but many are the same underlying errors around refs and some dependency tests.

With the ignoring of tests, all 3 test suites are passing, and I think this warrants PRing into some feature branch somewhere. It can be iterated on, and the remaining few failures can be moved to zero.

I began introducing moduledocs for the various validators, linking to their docs in the specs. This pattern is clear as to how to proceed forward. Don't know if that documentation is sufficient, but it begins the conversation there.

There is currently one failing Dialyzer error that I'll hopefully figure out in the next day or so.

I did a bunch of cleanup all around, and tried to simplify concepts where I saw them. Please let me know how I should proceed with getting this into the mainline branch.

@asummers

This comment has been minimized.

asummers commented May 6, 2018

@jonasschmidt Hey there, just a ping, wondering if you've had a chance to read through any of the above.

@jonasschmidt

This comment has been minimized.

Owner

jonasschmidt commented May 8, 2018

Sorry for losing track of this. I had this filed under "you're still working on it" and it completely slipped my mind.

The work you did so far is really impressive, thanks a lot! I will have some time today to look into the details and give you feedback, and then we can come up with a plan to move forward and get it merged.

@asummers

This comment has been minimized.

asummers commented May 8, 2018

No worries whatsoever! It's still WIP as far as ref management, but the lion's share of features are working as expected. We're using it in production for non-ref'd schemas and it works well. I'm still a bit unclear on the ref semantics, as they're not mentioned in the spec explicitly, but those seem discrete enough that they make sense as a separate PR to the branch, probably.

@jonasschmidt

This comment has been minimized.

Owner

jonasschmidt commented May 8, 2018

Yes, the whole ref handling business is a pain to wrap your head around. Especially when it comes to altering the resolution scope inside a schema with the id: http://json-schema.org/draft-04/json-schema-core.html#rfc.section.7.2

It also used to be a bit easier in an earlier implementation because I just used closures to resolve references, which were already bringing their scope with them. Unfortunately that did not work with Erlang releases though, so the whole thing became a bit more involved.

I will also have to look into what the later drafts have to say about refs. I seem to remember some discussions about dropping support for all kinds of complex and mind-boggling ref behaviour.

@asummers

This comment has been minimized.

asummers commented May 8, 2018

My understanding (and maybe @handrews can chime in here) is that Draft-08 is giving them proper treatment. There were several open issues on the spec repo around cleaning up their semantics.

@jonasschmidt

This comment has been minimized.

Owner

jonasschmidt commented May 8, 2018

I created a branch that you can send a PR to (https://github.com/jonasschmidt/ex_json_schema/tree/multi-draft-support).

It seems like a bunch of the test failures regarding refs are about those scope changes that I was referring to before. I think most of them fail against the current implementation as well, so they could definitely be handled separately.

If it's not too much effort, it would be great to fix them as part of this overhaul, to get to a clean slate. Otherwise there's the added difficulty of merging/rebasing if those fixes are made on master. I can definitely help with that.

@handrews

This comment has been minimized.

handrews commented May 10, 2018

@asummers Not quite sure of the draft-08 timeline, although I'm still actively working on it. I also just quit my job and am taking the summer off from the tech industry. So it depends on how much tech-related work I feel like doing (but here I am commenting on things so obviously I'm doing some :-) )

I'll comment separately on the other points you and @jonasschmidt raise above.

@handrews

This comment has been minimized.

handrews commented May 10, 2018

@asummers

I'm still a bit unclear on the ref semantics, as they're not mentioned in the spec explicitly

@jonasschmidt

I will also have to look into what the later drafts have to say about refs. I seem to remember some discussions about dropping support for all kinds of complex and mind-boggling ref behaviour.

While I don't like to speak ill of work done in good faith, the approach that draft-04 took to the concepts was very complicated and seemed to confuse many people. Austin Wright rewrote all of that for the following draft (reframing "canonical" vs "inline" in terms of "internal" and "external", although see below for further changes), and we've tweaked the wording every time since then.

In draft-handrews-json-schema-01 (which is still "draft-07" as it's just a bugfix/clarification of handrews-00) we reframed the whole mess again in terms of how you load schemas and dereference $ref for previously loaded vs unknown ref targets. This seems much more straightforward than earlier approaches.

We also added a lot of examples to clarify how $id and $ref work.

As far as I know, this is all essentially the same as draft-04, just disentangled and with the emphasis placed on different ways of thinking about things. Any odd cases that were "changed" between draft-04 and draft-07 we're pretty sure were bugs in the first place. It just boils down to totally normal RFC 3986 URI reference resolution, plus being able to use a previously-loaded schema if available.

Feedback on the latest text for this area is very much appreciated. The concepts aren't really that complicated, they've just been made complicated for... um... idek. Reasons.


$ref changes slightly in draft-08 to clarify its semantics. We're nailing down the language in PRs right now, actually.

Specifically, instead of treating $ref as (conceptually) being replaced by the reference target, we are changing it (again, conceptually) so that the $ref keyword simply has the same result as the reference target. This means that other schema keywords can be used alongside of $ref and combined just as any other keyword result is combined. TL;DR: it does what you'd expect, and what many people try to do already.

By "conceptually" I mean that implementations may actually do something different internally, like make a copy of the reference target, treat it as if it's stuffed into an allOf with the adjacent keywords, etc. As I said, we're sorting through this in PRs right now if anyone wants to comment. It's a significant change, but we think it gets rid of some seriously counter-intuitive behavior.

@handrews

This comment has been minimized.

handrews commented May 10, 2018

Also, modeling $ref as replacement produces some logical inconsistencies when the target and the source use different meta-schemas. I'll spare you the details as it's pretty mind-bending. Modeling it as "has the same result" means that you don't care what meta-schema the target is using. The source and target can be processed independently, with different meta-schemas if necessary.

@jonasschmidt

This comment has been minimized.

Owner

jonasschmidt commented May 14, 2018

Thanks @handrews for that in-depth explanation of what's happening with refs. That was really helpful. I have to say that the additional tests around the different ref behaviours in the official test suite help a lot as well. They weren't there when I came up with the original implementation so I had to make sense of the spec (and it turns out I got it slightly wrong). I always find it easier to implement against a set of tests than against a written spec (that official test suite is such an awesome idea in general, every standard should have that).

@jonasschmidt

This comment has been minimized.

Owner

jonasschmidt commented May 14, 2018

@jonasschmidt

This comment has been minimized.

Owner

jonasschmidt commented Jun 1, 2018

Huh, just found out that the script that's run on Travis didn't actually run the tests but only Dialyzer 🤦‍♂️ . Turns out that mix dialyzer --halt-exit-status actually halts and the coveralls.travis task that runs the tests is never executed. That behaviour must have changed recently.

Just found out that the build was actually broken since this commit here: 27d4020.

I reverted that commit and fixed the Dialyzer errors only. I hope that doesn't uncover other tests that were broken for an older Elixir version somewhere in between :)

@asummers

This comment has been minimized.

asummers commented Jun 1, 2018

Interesting. Fingers crossed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment