Error: schemafile could not be parsed as JSON #183

Closed
blink1073 opened this issue Nov 14, 2022 · 4 comments · Fixed by #184
Labels
bug Something isn't working

Comments

@blink1073

We occasionally see this error in CI, and the job usually passes when we re-run the build.

Error: schemafile could not be parsed as JSON
SchemaParseError: https://json.schemastore.org/github-workflow
  in "/home/runner/.cache/pre-commit/repo72xvbo31/py_env-python3.10/lib/python3.10/site-packages/check_jsonschema/checker.py", line 50
  >>> return self._schema_loader.get_validator(path, doc, self._format_opts)

  caused by

  JSONDecodeError: Expecting value: line 1 column 1 (char 0)
    in "/home/runner/.cache/pre-commit/repo72xvbo31/py_env-python3.10/lib/python3.10/site-packages/check_jsonschema/schema_loader/readers.py", line 18
    >>> schema = callback()

    caused by

    StopIteration: 0
      in "/opt/hostedtoolcache/Python/3.10.8/x64/lib/python3.10/json/decoder.py", line 353
      >>> obj, end = self.scan_once(s, idx)
ok -- validation done

Error: Process completed with exit code 1.

Perhaps adding retry logic would address the issue.

sirosen added the bug label Nov 14, 2022
@sirosen
Member

sirosen commented Nov 14, 2022

(Aside: It's not related to the core issue, but it's weird that the ok message was printed on a failure. I'll have to look into that as a separate matter.)

To understand your usage, I took a look at your config. Just to ensure we're on the same page, here's what you're running in jupyter_server:

  - repo: https://github.com/sirosen/check-jsonschema
    rev: 0.18.4
    hooks:
      - id: check-jsonschema
        name: "Check GitHub Workflows"
        files: ^\.github/workflows/
        types: [yaml]
        args: ["--schemafile", "https://json.schemastore.org/github-workflow"]
        stages: [manual]

That's a bit different from what I'm recommending these days, so I first want to make sure we agree on the behaviors and that you're getting what you want and expect. Here's what I document for checking GitHub workflows:

- repo: https://github.com/python-jsonschema/check-jsonschema
  rev: 0.19.1
  hooks:
    - id: check-github-workflows

I don't mean to suggest that your config is wrong; it's just different. You'll always get the latest schema from schemastore (well, you should, the bug report is that you don't! 😅 ), whereas the check-github-workflows hook uses a vendored copy of the schemastore schema. On the one hand, always getting the latest from schemastore means that you're not dependent on check-jsonschema releases to ship updates. On the other hand, it means that the behavior of the hook can change between two runs on the same version number, which could be confusing or surprising.


All that aside, I can definitely do something to improve the download behavior.

This is just a guess, but I've at least once seen schemastore respond with an empty 200.
I'm thinking that I need to make the following adjustments:

  • add a validation callback hook to the downloader
  • if the validation callback fails, retry the download once or twice (2 retries seems like a reasonable start to me)
  • pass a parse function, e.g. json.load, as the validation callback
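
The adjustments listed above could look roughly like the following. This is a minimal sketch written as a standalone function, not check-jsonschema's actual downloader code. The names `fetch_with_validation`, `fetch`, `validate`, and `retries` are hypothetical, and the download is simulated with canned responses (an empty body followed by a valid one) so the example runs offline.

```python
import json
from typing import Callable, Optional


def fetch_with_validation(
    fetch: Callable[[], bytes],
    validate: Optional[Callable[[bytes], object]] = None,
    retries: int = 2,
) -> bytes:
    """Call fetch(), retrying when validate() rejects the data with ValueError.

    Hypothetical helper for illustration only. `retries` is the number of
    extra attempts made after the first one.
    """
    if retries < 0:
        raise ValueError("retries must be >= 0")

    last_error: Optional[ValueError] = None
    for _attempt in range(retries + 1):
        data = fetch()
        if validate is None:
            return data
        try:
            validate(data)
        except ValueError as err:
            # Covers json.JSONDecodeError, which subclasses ValueError.
            last_error = err
            continue
        return data

    # Every attempt produced data that failed validation.
    assert last_error is not None
    raise last_error


# Simulated server: an empty 200 response first, then a real schema body.
responses = iter([b"", b'{"type": "object"}'])
attempts = []


def fake_fetch() -> bytes:
    body = next(responses)
    attempts.append(body)
    return body


schema_bytes = fetch_with_validation(fake_fetch, validate=json.loads)
print(json.loads(schema_bytes))
print(f"attempts made: {len(attempts)}")
```

Passing the parser itself as the validator keeps the downloader unaware of the data format: any callable that raises `ValueError` on bad data would work the same way.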

@blink1073
Author

Thanks @sirosen! I'm happy to update to the new recommended workflow.

@sirosen
Member

sirosen commented Nov 14, 2022

Awesome, glad I could offer a helping hand!

I'm keeping this open though, since I still think that download behavior has room for improvement.

@blink1073
Author

Yep, sounds good. Here's the PR for anyone interested: jupyter-server/jupyter_server#1071. A nice benefit is that it can run in the pre-commit.ci job since it doesn't need access to the internet.

sirosen added a commit that referenced this issue Nov 15, 2022
First, add support to the cachedownloader for validation.
A validation callback method can be passed to the cachedownloader on
init, and will be used to test data and potentially retry before
returning a result. The basic premise is that the validator should be
a function which can raise a ValueError or a subclass thereof if the
data downloaded doesn't meet our expectations (e.g. a partial download
or empty data returned as a 200).

Then apply this to the schema reader by passing `json.loads` as a
validation callback. This will raise a JSONDecodeError on malformed
data, triggering the retry.

Add the appropriate note to the changelog.
resolves #183
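
The commit message relies on `json.JSONDecodeError` being a subclass of `ValueError`, so a validator contract of "raise ValueError on bad data" is satisfied by `json.loads` directly. The short check below illustrates this with an empty string standing in for an empty response body; that the failing CI download was actually empty remains a guess, as noted in the thread.

```python
import json

# An empty document cannot be parsed; catch it via the ValueError base class.
try:
    json.loads("")
except ValueError as err:
    caught = err

print(type(caught).__name__)  # JSONDecodeError
print(caught)                 # Expecting value: line 1 column 1 (char 0)
```

The message matches the `JSONDecodeError` line in the traceback reported at the top of the issue.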