Inconsistent behaviour of validator_for depending on http vs https #1182

Closed
berislavlopac opened this issue Oct 27, 2023 · 2 comments
Labels
Invalid (Not a bug, PEBKAC, or an unsupported setup)

Comments

@berislavlopac

There seems to be a problem when selecting a validator based on the $schema URL using the validator_for function: specifically, some schemas can't be located depending on whether the URL is http or https.

After some research, it looks like the current implementation assumes the following:

  • Schemas up to and including Draft 07 use http.
  • Schemas starting with Draft 2019-09 use https.

This assumption is incorrect: in practice, all http URLs are redirected (with a 301 response code) to their https counterparts, and https works for all of them. This short script shows what happens when calling validator_for and when retrieving the schema from the URL, with either protocol:

from jsonschema.validators import validator_for
import httpx

metaschemas = [
    "//json-schema.org/draft-04/schema#",
    "//json-schema.org/draft-06/schema#",
    "//json-schema.org/draft-07/schema#",
    "//json-schema.org/draft/2019-09/schema#",
    "//json-schema.org/draft/2020-12/schema#",
]

print("== schemas with http:")
for metaschema in metaschemas:
    url = f"http:{metaschema}"
    validator_http = validator_for({"$schema": url})
    remote_schema = httpx.get(url)
    print(url, remote_schema.status_code, validator_http)

print()

print("== schemas with https:")
for metaschema in metaschemas:
    url = f"https:{metaschema}"
    validator_https = validator_for({"$schema": url})
    remote_schema = httpx.get(url)
    print(url, remote_schema.status_code, validator_https)

This is the output of that script:

== schemas with http:
http://json-schema.org/draft-04/schema# 301 <class 'jsonschema.validators.Draft4Validator'>
http://json-schema.org/draft-06/schema# 301 <class 'jsonschema.validators.Draft6Validator'>
http://json-schema.org/draft-07/schema# 301 <class 'jsonschema.validators.Draft7Validator'>
/Users/berislavlopac/Documents/Development/personal/schematalog/jstest.py:15: DeprecationWarning: The metaschema specified by $schema was not found. Using the latest draft to validate, but this will raise an error in the future.
  validator_http = validator_for({"$schema": url})
http://json-schema.org/draft/2019-09/schema# 301 <class 'jsonschema.validators.Draft202012Validator'>
http://json-schema.org/draft/2020-12/schema# 301 <class 'jsonschema.validators.Draft202012Validator'>

== schemas with https:
/Users/berislavlopac/Documents/Development/personal/schematalog/jstest.py:24: DeprecationWarning: The metaschema specified by $schema was not found. Using the latest draft to validate, but this will raise an error in the future.
  validator_https = validator_for({"$schema": url})
https://json-schema.org/draft-04/schema# 200 <class 'jsonschema.validators.Draft202012Validator'>
https://json-schema.org/draft-06/schema# 200 <class 'jsonschema.validators.Draft202012Validator'>
https://json-schema.org/draft-07/schema# 200 <class 'jsonschema.validators.Draft202012Validator'>
https://json-schema.org/draft/2019-09/schema# 200 <class 'jsonschema.validators.Draft201909Validator'>
https://json-schema.org/draft/2020-12/schema# 200 <class 'jsonschema.validators.Draft202012Validator'>

This behaviour means that a schema with the "wrong" HTTP(S) protocol in the $schema URL will be treated as the default metaschema, potentially failing validation.

@Julian
Member

Julian commented Oct 27, 2023

The current behavior is correct.

The URIs in $schema are just that -- URIs. They are identifiers for the JSON Schema versions. Regardless of the current website behavior (which has to do more with convenience), up until draft 7 the identifiers were indeed HTTP (even if they were retrievable over HTTPS). And the current ones are HTTPS. The same is true about whether they contain fragments or not.

Essentially, you are supposed to use the exact identifier, and it's irrelevant whether the meta schema is even retrievable at all from that URL.
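To illustrate the point about identifiers, here is a minimal sketch that treats the $schema value as an opaque string and never fetches it. It is a simplified model, not the library's implementation: the names DIALECTS and dialect_for are hypothetical, and the real lookup may normalize details such as an empty trailing fragment.

```python
# Simplified model of dialect lookup: the "$schema" value is an opaque
# identifier compared as a string, never fetched over the network.
# DIALECTS and dialect_for are hypothetical names used only in this sketch.
DIALECTS = {
    "http://json-schema.org/draft-04/schema#": "Draft 4",
    "http://json-schema.org/draft-06/schema#": "Draft 6",
    "http://json-schema.org/draft-07/schema#": "Draft 7",
    "https://json-schema.org/draft/2019-09/schema": "Draft 2019-09",
    "https://json-schema.org/draft/2020-12/schema": "Draft 2020-12",
}


def dialect_for(schema):
    """Return the dialect name for a schema, or None if the identifier is unknown."""
    return DIALECTS.get(schema.get("$schema"))


# The registered Draft 7 identifier uses http, so only that spelling matches.
print(dialect_for({"$schema": "http://json-schema.org/draft-07/schema#"}))   # Draft 7
print(dialect_for({"$schema": "https://json-schema.org/draft-07/schema#"}))  # None
```

Whether the website redirects http to https plays no part in this lookup, which is why the 301 and 200 status codes in the script output are unrelated to which validator gets selected.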

@Julian closed this as not planned on Oct 27, 2023
@karenetheridge

a schema with the "wrong" HTTP(S) protocol in the $schema URL will be treated as the default metaschema

This bit seems wrong -- if the $schema URI does not match one of the known metaschemas (whether a mismatch of http/https or something else), the implementation shouldn't fall back to the default, but rather it should error out entirely.
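Callers who want the stricter behaviour today could wrap the lookup themselves. The sketch below relies on the default argument of validator_for, which is returned when $schema is absent or unrecognized; passing a private sentinel makes that case detectable. UnknownMetaschemaError and strict_validator_for are hypothetical names, not part of jsonschema, and the lookup parameter exists only so the sketch can be exercised without the library installed.

```python
class UnknownMetaschemaError(ValueError):
    """Hypothetical error: "$schema" is missing or names no known metaschema."""


def strict_validator_for(schema, lookup=None):
    """Return the validator class for the schema's "$schema", or raise.

    `lookup` must behave like jsonschema.validators.validator_for(schema, default=...).
    """
    if lookup is None:
        # Imported lazily so the wrapper can be tested with a stand-in lookup.
        from jsonschema.validators import validator_for as lookup

    missing = object()  # sentinel that no real validator class can be
    cls = lookup(schema, default=missing)
    if cls is missing:
        declared = schema.get("$schema") if isinstance(schema, dict) else None
        raise UnknownMetaschemaError(
            f"no known metaschema for $schema={declared!r}"
        )
    return cls
```

Note that this sketch also raises for schemas that declare no $schema at all, since the fallback is returned in that case too; a caller who wants to allow those would need to check for the key before calling it.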

@Julian added the Invalid label on Feb 16, 2024