
Add first set of schemas and a simple Python package for distributing schemas #1

Merged (2 commits) on Feb 26, 2024


@Zsailer (Member) commented Jan 4, 2024

I took a crack at getting things started here. Let me know what you think.

This follows the pattern set by Vega: https://github.com/vega/schema

This adds a simple README with info about authoring a schema.

The schemas should render under the correct URI, e.g. schema.jupyter.org/jupyter_server/events/content_service/v1.json.
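A minimal sketch of that path-to-URI convention, assuming each schema file's path under the repository's schema root maps one-to-one onto the URL path and that the site is served over HTTPS. The `schema_uri` helper and `BASE_URL` constant are hypothetical, not part of this PR:

```python
from pathlib import PurePosixPath

# Assumption: schemas are published at https://schema.jupyter.org/<relative path>.
BASE_URL = "https://schema.jupyter.org"


def schema_uri(relative_path: str) -> str:
    """Map a schema file path (relative to the schema root) to its public URI."""
    path = PurePosixPath(relative_path)
    # Reject paths that would escape the schema root or point at non-JSON files.
    if path.is_absolute() or ".." in path.parts:
        raise ValueError(f"expected a relative path inside the schema root: {relative_path}")
    if path.suffix != ".json":
        raise ValueError(f"deployed schemas are .json files: {relative_path}")
    return f"{BASE_URL}/{path.as_posix()}"
```

For example, `schema_uri("jupyter_server/events/content_service/v1.json")` yields the URI quoted above with an `https://` prefix.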

I've included a super simple Python package that installs the schemas under Jupyter's data directory. I've also included some simple utility functions to list and load these schemas.
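A rough sketch of what "install under Jupyter's data directory, then list and load" could look like. The subdirectory name, function names, and signatures are hypothetical and not the package's actual API; the only real library call is `jupyter_core.paths.jupyter_path`, which returns the Jupyter data directories in priority order (highest first):

```python
from __future__ import annotations

import json
from collections.abc import Iterable
from pathlib import Path
from typing import Any

# Hypothetical subdirectory the package would install into inside each
# Jupyter data directory.
SCHEMA_SUBDIR = "schemas"


def default_search_paths() -> list[Path]:
    """Jupyter data directories, highest priority first (empty if jupyter_core is missing)."""
    try:
        from jupyter_core.paths import jupyter_path
    except ImportError:
        return []
    return [Path(p) for p in jupyter_path(SCHEMA_SUBDIR)]


def list_schemas(search_paths: Iterable[Path] | None = None) -> list[str]:
    """Relative paths of every installed .json schema, de-duplicated across directories."""
    roots = default_search_paths() if search_paths is None else [Path(p) for p in search_paths]
    found: set[str] = set()
    for root in roots:
        if root.is_dir():
            found.update(f.relative_to(root).as_posix() for f in root.rglob("*.json"))
    return sorted(found)


def load_schema(relative_path: str, search_paths: Iterable[Path] | None = None) -> dict[str, Any]:
    """Load a schema from the first (highest-priority) directory that contains it."""
    roots = default_search_paths() if search_paths is None else [Path(p) for p in search_paths]
    for root in roots:
        candidate = root / relative_path
        if candidate.is_file():
            return json.loads(candidate.read_text(encoding="utf-8"))
    raise FileNotFoundError(f"schema not found in any search path: {relative_path}")
```

Under this assumed layout, `load_schema("jupyter_server/events/content_service/v1.json")` would return the parsed schema, with a user-level copy shadowing a system-level one.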

@fcollonval @afshin @bollwyvl

@Zsailer (Member Author) commented Jan 4, 2024

I started by adding Jupyter Server's event schemas, since they are a relatively low-risk set of schemas to begin with.

nbformat would be a good one to add next.

Once this is merged and released, Jupyter Server can depend on this package to install and locate its event schemas.

@bollwyvl commented Jan 5, 2024

Thanks for starting.

I think nbformat is a... big bite of the elephant to swallow.

In my crazy fever dream, I'm seeing something grounded in a known set of mimetypes (mostly our home-grown, non-IANA-registered ones), which build all the way up to the client-facing APIs.

```mermaid
flowchart TD
subgraph schema
    mimetypes --> kernelspec & contents & nbformat
    kernelspec --> nbformat
    nbformat --> contents
    mimetypes --> messages
end

subgraph openapi
    jupyter_server
end

subgraph asyncapi
    kernel_messaging
    widgets
end

nbformat & kernelspec & contents --> jupyter_server
messages --> kernel_messaging & widgets
```
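Reading each arrow in the flowchart as "the target builds on the source", one way to derive an authoring or generation order is a topological sort, sketched here with the standard library's `graphlib` (Python 3.9+). The dependency table is transcribed from the chart; `generation_order` is an illustrative name, not proposed tooling:

```python
from graphlib import TopologicalSorter

# Transcribed from the flowchart: each node maps to the nodes that point at it,
# i.e. the schemas/specs it builds on.
BUILDS_ON: dict[str, set[str]] = {
    "kernelspec": {"mimetypes"},
    "nbformat": {"mimetypes", "kernelspec"},
    "contents": {"mimetypes", "nbformat"},
    "messages": {"mimetypes"},
    "jupyter_server": {"nbformat", "kernelspec", "contents"},  # OpenAPI
    "kernel_messaging": {"messages"},  # AsyncAPI
    "widgets": {"messages"},  # AsyncAPI
}


def generation_order(graph: dict[str, set[str]]) -> list[str]:
    """Order nodes so each dependency precedes its dependents (raises graphlib.CycleError on a cycle)."""
    return list(TopologicalSorter(graph).static_order())
```

`mimetypes` always comes first because it is the only node with no inbound arrows; ties elsewhere may be ordered arbitrarily.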

The other deal is the efficient and humane authoring and review of these things. To that end, I'm imagining having "source of truth" schema in toml or yaml (both have (dis)advantages) which generate the canonical, formatted, validated .json files that actually get deployed, and have a per-PR ReadTheDocs setup.

Finally, in addition to the raw schema, we probably want to look at low-dependency (maybe `annotated-types`) ways of generating well-typed, importable files, initially for python and js, even if they don't apply validation (e.g. avoid `traitlets`, `pydantic`, `msgspec` for now).

For js, there are a lot of great tools out there for generating `.d.ts` or similar boilerplate.

For py, I've had some success with jsonschema-gentypes.

End of the day, I think this repo will end up being a monorepo in the language (and language-specific-dependency) space...

@ianthomas23 commented

A couple of comments on the three example schemas:

  1. For full conformity I would add `"$schema": "https://json-schema.org/draft/2020-12/schema"` to the top of each.
  2. The `\n` characters in string fields make them harder for humans to read. I think they are fine where they are needed, but the one-line descriptions probably don't need them.
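Both points are easy to check mechanically. A minimal sketch, assuming the schemas are already loaded as Python dicts; the function names and message wording are illustrative, and "one-line description" is taken to mean text whose only newlines are trailing ones:

```python
from typing import Any

DRAFT_2020_12 = "https://json-schema.org/draft/2020-12/schema"


def review_schema(schema: dict[str, Any]) -> list[str]:
    """Flag a missing/other $schema and one-line descriptions that end in a stray newline."""
    problems: list[str] = []
    if schema.get("$schema") != DRAFT_2020_12:
        problems.append("$schema: missing or not draft 2020-12")
    _check_descriptions(schema, "", problems)
    return problems


def _check_descriptions(node: Any, path: str, problems: list[str]) -> None:
    # Walk nested objects/arrays, building a JSON-Pointer-like path for messages.
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}/{key}"
            if key == "description" and isinstance(value, str):
                body = value.rstrip("\n")
                # Multi-line text is left alone; only trailing newlines on one-liners are flagged.
                if "\n" not in body and body != value:
                    problems.append(f"{child}: one-line description ends with a newline")
            _check_descriptions(value, child, problems)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            _check_descriptions(item, f"{path}/{index}", problems)
```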

Other than that, I would be inclined to merge this soon so we can build on top of it. I've had a look at writing schemas that are reused in other schemas (e.g. the kernel message header) and some tooling as discussed in https://github.com/jupyter/enhancement-proposals/blob/master/108-jupyter-subdomain-for-schemas/jupyter-subdomain-for-schemas.md. I think it would be useful to add some of those to this repo to try out possibilities and discuss the way forward before we commit to putting the schemas on https://schema.jupyter.org.

@ianthomas23 commented

It looks like there are two different repos for this work:

  1. https://github.com/jupyter/schema (this)
  2. https://github.com/jupyter-standards/schemas (as mentioned in JEP 108)

@Zsailer (Member Author) commented Feb 22, 2024

> For full conformity I would add `"$schema": "https://json-schema.org/draft/2020-12/schema"` to the top of each.

👍

> The `\n` characters in string fields make them harder for humans to read. I think they are fine where they are needed, but the one-line descriptions probably don't need them.

This is a side effect of using JSON instead of e.g. YAML for authoring schemas. In fact, they are probably there because I took the YAML schemas from here and converted them to JSON using PyYAML. We can remove them in this PR, but I empathize with @bollwyvl's comment here:

> The other deal is the efficient and humane authoring and review of these things. To that end, I'm imagining having "source of truth" schema in toml or yaml (both have (dis)advantages) which generate the canonical, formatted, validated .json files that actually get deployed, and have a per-PR ReadTheDocs setup.

If not in this PR, the next PR should focus on a better authoring/reviewing flow. It's much more pleasant to read and review schemas in YAML or TOML than in JSON, but we'll probably need to set up CI or pre-commit to do the conversion.

> Other than that, I would be inclined to merge this soon so we can build on top of it.

👍

@Zsailer (Member Author) commented on the package metadata:

```toml
# ... earlier classifiers elided ...
"Programming Language :: Python :: Implementation :: CPython",
"Programming Language :: Python :: Implementation :: PyPy",
]
dependencies = ["jupyter_core"]
```

It's important to note here that we are making jupyter_core a dependency of jupyter_schemas. This means we (ideally) won't have any schemas in jupyter_core, which would create a circular dependency.

I think this is fine, but I'm flagging it here in case anyone else has concerns.

@Zsailer (Member Author) commented Feb 22, 2024

> It looks like there are two different repos for this work:

It looks like this was resolved in the Jupyter Server meeting: this repo will remain, and the jupyter-standards repo has been archived.

@ianthomas23 commented

> The `\n` characters in string fields make them harder for humans to read. I think they are fine where they are needed, but the one-line descriptions probably don't need them.

> This is a side effect of using JSON instead of e.g. YAML for authoring schemas. In fact, they are probably there because I took the YAML schemas from here and converted them to JSON using PyYAML. We can remove them in this PR, but I empathize with @bollwyvl's comment here:

> > The other deal is the efficient and humane authoring and review of these things. To that end, I'm imagining having "source of truth" schema in toml or yaml (both have (dis)advantages) which generate the canonical, formatted, validated .json files that actually get deployed, and have a per-PR ReadTheDocs setup.

Yes, @bollwyvl explained in the Jupyter Server meeting that these descriptions already exist elsewhere in this form and we should use precisely the same text.

> If not in this PR, the next PR should focus on a better authoring/reviewing flow. It's much more pleasant to read and review schemas in YAML or TOML than in JSON, but we'll probably need to set up CI or pre-commit to do the conversion.

I have some demo code for conversion between JSON, YAML and TOML, and also for (semi-)automatic generation of Sphinx docs in all the various formats. PR to follow shortly.
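Purely as an illustration (not the demo code mentioned above), a schema's top-level properties could be rendered into a reStructuredText `list-table` for Sphinx roughly like this; the function names and column choices are assumptions:

```python
from typing import Any


def _type_label(spec: dict[str, Any]) -> str:
    json_type = spec.get("type", "any")
    if isinstance(json_type, list):  # e.g. ["integer", "null"]
        return " | ".join(json_type)
    return str(json_type)


def properties_table(schema: dict[str, Any]) -> str:
    """Render a schema's top-level properties as a reST list-table."""
    required = set(schema.get("required", []))
    lines = [
        f".. list-table:: {schema.get('title', 'Properties')}",
        "   :header-rows: 1",
        "",
        "   * - Property",
        "     - Type",
        "     - Required",
        "     - Description",
    ]
    for name, spec in schema.get("properties", {}).items():
        # Collapse newlines (e.g. from YAML block scalars) so each cell stays on one line.
        description = " ".join(spec.get("description", "").split())
        lines += [
            f"   * - ``{name}``",
            f"     - {_type_label(spec)}",
            f"     - {'yes' if name in required else 'no'}",
            f"     - {description}",
        ]
    return "\n".join(lines) + "\n"
```

The same function works whether the schema was loaded from JSON, YAML, or TOML, since it only sees the parsed dict.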

@Zsailer (Member Author) commented Feb 26, 2024

In the SSC meeting today, we moved the GitHub Pages pointer to a new "stable" branch. This allows us to merge this PR without it going "live". I think (and others, such as @JohanMabille and @ianthomas23, agreed in the meeting) that we should merge this and iterate in #4.
