Error: schemafile could not be parsed as JSON #183

blink1073 · 2022-11-14T22:26:42Z

We occasionally see this error in CI, and it usually works when we kick the build.

Error: schemafile could not be parsed as JSON
SchemaParseError: https://json.schemastore.org/github-workflow
  in "/home/runner/.cache/pre-commit/repo72xvbo31/py_env-python3.10/lib/python3.10/site-packages/check_jsonschema/checker.py", line 50
  >>> return self._schema_loader.get_validator(path, doc, self._format_opts)

  caused by

  JSONDecodeError: Expecting value: line 1 column 1 (char 0)
    in "/home/runner/.cache/pre-commit/repo72xvbo31/py_env-python3.10/lib/python3.10/site-packages/check_jsonschema/schema_loader/readers.py", line 18
    >>> schema = callback()

    caused by

    StopIteration: 0
      in "/opt/hostedtoolcache/Python/3.10.8/x64/lib/python3.10/json/decoder.py", line 353
      >>> obj, end = self.scan_once(s, idx)
ok -- validation done

Error: Process completed with exit code 1.

Perhaps adding retry logic would address the issue.

sirosen · 2022-11-14T23:05:13Z

(Aside: It's not related to the core issue, but it's weird that the ok message was printed on a failure. I'll have to look into that as a separate matter.)

To understand your usage, I took a look at your config. Just to ensure we're on the same page, here's what you're running in jupyter_server:

  - repo: https://github.com/sirosen/check-jsonschema
    rev: 0.18.4
    hooks:
      - id: check-jsonschema
        name: "Check GitHub Workflows"
        files: ^\.github/workflows/
        types: [yaml]
        args: ["--schemafile", "https://json.schemastore.org/github-workflow"]
        stages: [manual]

That's a bit different from what I'm recommending these days, so I first want to make sure we're on the same page about the behaviors and that you're getting what you want and expect. Here's what I document for checking github workflows:

- repo: https://github.com/python-jsonschema/check-jsonschema
  rev: 0.19.1
  hooks:
    - id: check-github-workflows

I don't mean to suggest that your config is wrong; it's just different. You'll always get the latest schema from schemastore (well, you should, the bug report is that you don't! 😅 ), whereas the check-github-workflows hook uses a vendored copy of the schemastore schema. On the one hand, always getting the latest from schemastore means that you're not dependent on check-jsonschema releases to ship updates. On the other hand, it means that the behavior of the hook can change between two runs on the same version number, which could be confusing or surprising.

All that aside, I can definitely do something to improve the download behavior.

This is just a guess, but I've at least once seen schemastore respond with an empty 200.
I'm thinking that I need to make the following adjustments:

- add support to the downloader piece to do a validation callback
- if the validation callback fails, the download retries once or twice (2 retries seems like a reasonable start to me)
- pass a parse function, e.g. `json.load`, as the validation callback
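
For readers following along, the adjustments listed above could look roughly like the sketch below. This is not check-jsonschema's actual code: `fetch_with_validation` and `fake_fetch` are hypothetical names, the download is simulated rather than performed over the network, and `json.loads` is used instead of `json.load` on the assumption that the downloaded body is held in memory as bytes.

```python
import json
from typing import Callable


def fetch_with_validation(
    fetch: Callable[[], bytes],
    validate: Callable[[bytes], object],
    retries: int = 2,
) -> bytes:
    """Call fetch() until validate() accepts the data or retries run out.

    Hypothetical helper: validate() signals bad data by raising ValueError.
    """
    attempts = max(retries, 0) + 1  # one initial try plus the retries
    last_error = None
    for _ in range(attempts):
        data = fetch()
        try:
            validate(data)
        except ValueError as err:
            # Bad payload (e.g. an empty 200 response): remember and retry.
            last_error = err
            continue
        return data
    # Every attempt produced invalid data; surface the last failure.
    raise last_error


# Simulated server: the first response is empty, the second is valid JSON.
responses = iter([b"", b'{"type": "object"}'])
calls = []


def fake_fetch() -> bytes:
    calls.append(1)
    return next(responses)


schema_bytes = fetch_with_validation(fake_fetch, json.loads)
```

Here the empty first response fails validation and the second attempt succeeds, so the caller never sees the bad payload. If every attempt fails, the last parse error is re-raised instead of being swallowed.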

blink1073 · 2022-11-14T23:07:57Z

Thanks @sirosen! I'm happy to update to the new recommended workflow.

sirosen · 2022-11-14T23:09:08Z

Awesome, glad I could offer a helping hand!

I'm keeping this open though, since I still think that download behavior has room for improvement.

blink1073 · 2022-11-14T23:10:51Z

Yep, sounds good. Here's the PR for anyone interested: jupyter-server/jupyter_server#1071. A nice benefit is that it can run in the pre-commit.ci job since it doesn't need access to the internet.

First, add support to the cachedownloader for validation. A validation callback method can be passed to the cachedownloader on init, and will be used to test data and potentially retry before returning a result. The basic premise is that the validator should be a function which can raise a ValueError or a subclass thereof if the data downloaded doesn't meet our expectations (e.g. a partial download or empty data returned as a 200).

Then apply this to the schema reader by passing `json.loads` as a validation callback. This will raise a JSONDecodeError on malformed data, triggering the retry.

Add the appropriate note to the changelog.

resolves #183
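
As a small illustration of why a `ValueError`-based contract works with `json.loads`: in the Python standard library, `json.JSONDecodeError` is a subclass of `ValueError`, so a retry wrapper that catches `ValueError` will also catch parse failures. The helper name `is_valid_json` below is hypothetical and only serves the example.

```python
import json

# JSONDecodeError derives from ValueError in the standard library.
is_value_error = issubclass(json.JSONDecodeError, ValueError)


def is_valid_json(data: str) -> bool:
    """Hypothetical helper: True if data parses as JSON, False otherwise."""
    try:
        json.loads(data)
    except ValueError:
        return False
    return True


# An empty body reproduces the message seen in the traceback in this issue.
try:
    json.loads("")
except ValueError as err:
    empty_body_message = str(err)

empty_ok = is_valid_json("")                   # empty 200 response
partial_ok = is_valid_json('{"type": "obj')    # truncated download
complete_ok = is_valid_json('{"type": "object"}')
```

Both the empty body and the truncated body are rejected, which are the two failure cases the description names.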

sirosen added the bug Something isn't working label Nov 14, 2022

blink1073 mentioned this issue Nov 14, 2022

use recommended github-workflows checker jupyter-server/jupyter_server#1071

Merged

sirosen mentioned this issue Nov 15, 2022

Setup json validation for remote schemafiles #184

Merged

sirosen closed this as completed in #184 Nov 15, 2022

sirosen mentioned this issue Nov 25, 2022

Update check-jsonschema usage to latest style jupyter-server/jupyter_server_fileid#50

Merged
