Add first set of schemas and a simple Python package for distributing schemas #1

Zsailer · 2024-01-04T19:58:50Z

I took a crack at getting things started here. Let me know what you think.

Following the pattern set by Vega here: https://github.com/vega/schema

This adds a simple README with info about authoring a schema.

The schemas should render under the correct URI, e.g. schema.jupyter.org/jupyter_server/events/content_service/v1.json.
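As a rough illustration of the URI convention above, the sketch below maps a schema URI to the repo-relative path that would need to serve it. The helper name `schema_relpath` and the `SCHEMA_HOST` constant are hypothetical and not part of this PR; the only thing taken from the thread is the example URI.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumption: schemas are published under this host, as in the example URI above.
SCHEMA_HOST = "schema.jupyter.org"


def schema_relpath(schema_id: str) -> PurePosixPath:
    """Map a schema URI to the site-relative path that should serve it (hypothetical helper)."""
    parsed = urlparse(schema_id)
    if parsed.netloc != SCHEMA_HOST:
        raise ValueError(f"unexpected host in schema id: {schema_id!r}")
    # Drop the leading "/" so the result is relative to the site root.
    return PurePosixPath(parsed.path.lstrip("/"))


print(schema_relpath("https://schema.jupyter.org/jupyter_server/events/content_service/v1.json"))
```

A check like this could run in CI to confirm that each file's location matches the URI it is expected to render under.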

I've included a super simple Python package that installs the schemas under Jupyter's data directory. I've also included some simple utility functions to list and load these schemas.
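A minimal sketch of what such list/load utilities could look like, using only the standard library. The function names `list_schemas` and `load_schema`, and the `schemas` subdirectory layout, are assumptions for illustration and may differ from what the package actually ships. In practice the search directories would presumably come from jupyter_core (for example `jupyter_core.paths.jupyter_path()`); here they are passed in explicitly so the example stays self-contained.

```python
import json
from pathlib import Path


def list_schemas(data_dirs: list[Path]) -> list[str]:
    """Return sorted relative names of schema files under `<data_dir>/schemas` (assumed layout)."""
    found = set()
    for data_dir in data_dirs:
        root = data_dir / "schemas"
        if root.is_dir():
            found.update(p.relative_to(root).as_posix() for p in root.rglob("*.json"))
    return sorted(found)


def load_schema(name: str, data_dirs: list[Path]) -> dict:
    """Load the first schema called `name`, searching the data dirs in order."""
    for data_dir in data_dirs:
        path = data_dir / "schemas" / name
        if path.is_file():
            return json.loads(path.read_text(encoding="utf-8"))
    raise FileNotFoundError(name)
```

Searching the directories in order mirrors how Jupyter data paths are usually resolved, with earlier directories taking precedence.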

@fcollonval @afshin @bollwyvl

Zsailer · 2024-01-04T22:05:34Z

I started by adding Jupyter Server's event schemas, since these are a relatively low-risk set of schemas to begin with.

NBFormat would be a good one to add next.

Once this is merged and released, Jupyter Server can depend on this package to install and locate its event schemas.

bollwyvl · 2024-01-05T18:13:28Z

Thanks for starting.

I think nbformat is a... big bite of the elephant to swallow.

In my crazy fever dream, I'm seeing something grounded in a known set of mimetypes (mostly our home-grown non-IANA-registered ones), which builds all the way up to the client-facing APIs.

flowchart TD
subgraph schema
    mimetypes --> kernelspec & contents & nbformat 
    kernelspec --> nbformat
    nbformat --> contents
    mimetypes --> messages
end

subgraph openapi
    jupyter_server
end

subgraph asyncapi
    kernel_messaging
    widgets
end

nbformat & kernelspec & contents --> jupyter_server
messages --> kernel_messaging & widgets

The other deal is the efficient and humane authoring and review of these things. To that end, I'm imagining having "source of truth" schema in toml or yaml (both have (dis)advantages) which generate the canonical, formatted, validated .json files that actually get deployed, and have a per-PR ReadTheDocs setup.

Finally, in addition to the raw schema, we probably want to look at low-dependency (maybe annotated-types) ways of generating well-typed, importable files, initially for Python and JS, even if they don't apply validation (e.g. avoid traitlets, pydantic, msgspec for now).

For JS, there are a lot of great tools out there for generating .d.ts or similar boilerplate.

For py, I've had some success with jsonschema-gentypes.
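To make "well-typed, importable, no validation" concrete, here is a hand-written sketch of roughly the kind of module a generator might emit: a plain `TypedDict` with no runtime dependencies. The class name `ContentsEvent` and its fields are hypothetical, and this is not claimed to be the exact output of jsonschema-gentypes.

```python
from typing import Literal, TypedDict


class ContentsEvent(TypedDict):
    """Hypothetical shape of an event payload; field names are illustrative."""

    action: Literal["get", "save", "delete"]
    path: str


def describe(event: ContentsEvent) -> str:
    """Format an event for logging; type checkers verify the keys statically."""
    return f"{event['action']} {event['path']}"


# At runtime this is an ordinary dict: nothing is validated.
event: ContentsEvent = {"action": "save", "path": "notebooks/demo.ipynb"}
print(describe(event))
```

The trade-off is that mistakes are caught by a type checker rather than at runtime, which keeps the import cost and dependency footprint low.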

End of the day, I think this repo will end up being a monorepo in the language (and language-specific-dependency) space...

ianthomas23 · 2024-02-22T14:45:45Z

A couple of comments on the three example schemas:

For full conformity I would add "$schema": "https://json-schema.org/draft/2020-12/schema" to the top of each.
The \n in string fields make them harder for humans to parse. I think they are fine where they are needed, but the one-line descriptions probably don't need them.
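The first suggestion could be applied mechanically. The sketch below returns a copy of a schema with "$schema" as its first key, leaving an existing declaration untouched; the helper name `with_dialect` is hypothetical.

```python
DRAFT_2020_12 = "https://json-schema.org/draft/2020-12/schema"


def with_dialect(schema: dict, dialect: str = DRAFT_2020_12) -> dict:
    """Return a copy of `schema` with "$schema" as its first key."""
    rest = {k: v for k, v in schema.items() if k != "$schema"}
    # Keep an existing declaration; otherwise fall back to the given dialect.
    return {"$schema": schema.get("$schema", dialect), **rest}


print(with_dialect({"title": "Example", "type": "object"}))
```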

Other than that I would be inclined to merge this soon and we can build on top of it. I've had a look at writing schemas that are reused in other schemas (e.g. kernel message header) and some tooling as discussed in https://github.com/jupyter/enhancement-proposals/blob/master/108-jupyter-subdomain-for-schemas/jupyter-subdomain-for-schemas.md. I think it would be useful to add some of those to this repo to try out possibilities and discuss the way forward before we commit to putting the schemas on https://schema.jupyter.org.

ianthomas23 · 2024-02-22T14:49:19Z

It looks like there are two different repos for this work:

https://github.com/jupyter/schema (this)
https://github.com/jupyter-standards/schemas (as mentioned in JEP 108)

Zsailer · 2024-02-22T18:09:37Z

> For full conformity I would add "$schema": "https://json-schema.org/draft/2020-12/schema" to the top of each.

👍

> The \n in string fields make them harder for humans to parse. I think they are fine where they are needed, but the one-line descriptions probably don't need them.

This is a side effect of using JSON instead of e.g. YAML for authoring schemas. In fact, they are probably there because I took the YAML schemas from here and converted them to JSON using PyYAML. We can remove this from this PR, but I empathize with @bollwyvl's comment here:

> The other deal is the efficient and humane authoring and review of these things. To that end, I'm imagining having "source of truth" schema in toml or yaml (both have (dis)advantages) which generate the canonical, formatted, validated .json files that actually get deployed, and have a per-PR ReadTheDocs setup.

If not in this PR, the next PR should be focused on a better authoring/reviewing flow. It's much more pleasurable to read and review schemas in YAML or TOML than JSON, but we'll probably need to set up CI or pre-commit to do the conversion.

> Other than that I would be inclined to merge this soon and we can build on top of it.

👍

Zsailer · 2024-02-22T18:11:52Z

pyproject.toml

+  "Programming Language :: Python :: Implementation :: CPython",
+  "Programming Language :: Python :: Implementation :: PyPy",
+]
+dependencies = ["jupyter_core"]


It's important to note here that we are making jupyter_core a dependency of jupyter_schemas. This means we (ideally) won't have any schemas in jupyter_core, which would create a circular dependency.

I think this is fine, but I'm flagging here in case anyone else has concerns.

Zsailer · 2024-02-22T18:12:45Z

> It looks like there are two different repos for this work:

It looks like this was resolved in the Jupyter Server meeting. This repo will remain, while the jupyter-standards repo was archived.

ianthomas23 · 2024-02-23T09:27:11Z

> > The \n in string fields make them harder for humans to parse. I think they are fine where they are needed, but the one-line descriptions probably don't need them.

> This is a side effect of using JSON instead of e.g. YAML for authoring schemas. In fact, they are probably there because I took the YAML schemas from here and converted them to JSON using PyYAML. We can remove this from this PR, but I empathize with @bollwyvl's comment here:

> > The other deal is the efficient and humane authoring and review of these things. To that end, I'm imagining having "source of truth" schema in toml or yaml (both have (dis)advantages) which generate the canonical, formatted, validated .json files that actually get deployed, and have a per-PR ReadTheDocs setup.

Yes, @bollwyvl explained in the Jupyter Server meeting that these descriptions already exist elsewhere in this form and we should use precisely the same text.

> If not in this PR, the next PR should be focused on a better authoring/reviewing flow. It's much more pleasurable to read and review schemas in YAML or TOML than JSON, but we'll probably need to set up CI or pre-commit to do the conversion.

I have some demo code for conversion between JSON, YAML and TOML, and also for (semi-)automatic generation of Sphinx docs in all the various formats. PR to follow shortly.

Zsailer · 2024-02-26T16:54:50Z

In the SSC meeting today, we moved the GitHub Pages pointer to a new "stable" branch. This allows us to merge this PR without it going "live". I think (and others like @JohanMabille and @ianthomas23 agreed in the meeting) that we should merge this and iterate in #4.

Zsailer added 2 commits January 4, 2024 11:47

Add first set of schemas a simple Python package to put schemas on disk

1ad7b18

Update Python example

0b48a34

Zsailer mentioned this pull request Jan 18, 2024

Meeting Notes 2024 jupyter-server/team-compass#57

Closed

bollwyvl mentioned this pull request Feb 22, 2024

Plan concrete structure for schemas-as-rest #2

Open

ianthomas23 mentioned this pull request Feb 23, 2024

Some tooling experiments #4

Open

Zsailer merged commit 71f57d9 into jupyter:main Feb 26, 2024

gabalafou mentioned this pull request May 16, 2024

SSC meeting minutes 2024 jupyter/software-steering-council-team-compass#22

Open
