Pre-proposal: subdomain for published schemas under jupyter.org #107

Closed
Zsailer opened this issue Apr 24, 2023 · 6 comments

Comments

@Zsailer
Member

Zsailer commented Apr 24, 2023

(Following the JEP submission workflow described here, but there's been a substantial amount of discussion already around this idea.)

I'm drafting a JEP to propose a new subdomain under jupyter.org, e.g. schema.jupyter.org, where we can publish JSON schemas describing key components, APIs, and data files defined across Project Jupyter. Examples include event schemas, the kernel specification, the connection file specification, the kernel messaging specification, the notebook file format specification, and the Jupyter Server and JupyterHub REST APIs.

Publishing these as JSON schemas ensures that we explicitly publish our "contracts" to the community and enables us to write tooling that helps others validate their code against our schemas.

Key pieces that might be included in this JEP:

  • where should schemas be stored?
  • URI rules... e.g. every URI should be versioned; namespace by project; (maybe) namespace by current state (e.g. experimental/stable)
  • governance: who publishes to this repo?
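To make the URI rules above a bit more concrete, here is a minimal sketch of how a versioned, project-namespaced schema URI might be built and parsed. The path layout `<project>/<name>/v<version>.json` and the helper names `SchemaId`, `build_schema_uri`, and `parse_schema_uri` are assumptions for illustration only; nothing in this thread has settled on a layout, and `schema.jupyter.org` is just the example subdomain floated above.

```python
from typing import NamedTuple
from urllib.parse import urlparse

# Assumed base URL; the issue only mentions schema.jupyter.org as an example.
BASE = "https://schema.jupyter.org"


class SchemaId(NamedTuple):
    project: str  # namespace, e.g. the project that owns the schema
    name: str     # schema name within that project
    version: int  # every URI carries an explicit version


def build_schema_uri(schema_id: SchemaId) -> str:
    """Build a URI using the hypothetical <project>/<name>/v<version>.json layout."""
    return f"{BASE}/{schema_id.project}/{schema_id.name}/v{schema_id.version}.json"


def parse_schema_uri(uri: str) -> SchemaId:
    """Invert build_schema_uri, rejecting unversioned or malformed URIs."""
    parts = urlparse(uri).path.strip("/").split("/")
    if len(parts) != 3:
        raise ValueError(f"expected <project>/<name>/v<version>.json, got {uri!r}")
    project, name, filename = parts
    if not (filename.startswith("v") and filename.endswith(".json")):
        raise ValueError(f"missing version segment in {uri!r}")
    version_text = filename[1:-len(".json")]
    if not version_text.isdigit():
        raise ValueError(f"version must be an integer in {uri!r}")
    return SchemaId(project, name, int(version_text))
```

A stability segment (experimental/stable) could be added as one more path component in the same way, if the JEP ends up namespacing by state.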

I'm planning to submit this JEP this week, but starting the pre-proposal process here.

@bollwyvl

Very much in favor of moving this forward.

As mentioned elsewhere: I feel like the repo that drives this should be one repo, even if it pulls in "canonical" sources of schema as submodules, and that it should also drive the publishing of installable packages, with a selection of:

  • the schemas-at-rest in JSON
    • probably somewhere in {sys.prefix}/share/jupyter
  • type-only packages with no dependencies for any willing languages
  • and even ones for specific, de facto validator implementations (e.g. Python's jsonschema, JavaScript's ajv)
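As a rough sketch of what consuming the schemas-at-rest could look like from Python, the snippet below resolves a schema file under `{sys.prefix}/share/jupyter`. The `schemas/<project>/<name>.json` sub-layout and the function names `schema_dir` and `load_schema` are hypothetical; only the `{sys.prefix}/share/jupyter` root comes from the suggestion above.

```python
import json
import sys
from pathlib import Path
from typing import Optional


def schema_dir(prefix: Optional[str] = None) -> Path:
    """Return the assumed install location for packaged schemas."""
    root = Path(prefix if prefix is not None else sys.prefix)
    # "schemas" is an assumed subdirectory name, not an agreed convention.
    return root / "share" / "jupyter" / "schemas"


def load_schema(project: str, name: str, prefix: Optional[str] = None) -> dict:
    """Load <project>/<name>.json from the schema directory as a dict."""
    path = schema_dir(prefix) / project / f"{name}.json"
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```

Taking `prefix` as an optional argument keeps the lookup testable against a temporary directory instead of the live environment.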

The above should also generate (and validate) per-PR interlinked human-readable docs, which can then expose something that other docs sites can reference (e.g. objects.inv), such as provided by:

By keeping them together, we can maintain:

  • consistent linting and formatting
  • cross-language build toolchains
  • meta(-meta) schema to validate these things
  • normalized, more humane alternative authoring tools (e.g. YAML, TOML, some notebook workflow)

@Zsailer
Member Author

Zsailer commented Apr 24, 2023

@bollwyvl do you mind if I copy these points to the JEP and add you as a co-author? (I want to make sure I give credit where credit is due!)

@Zsailer
Member Author

Zsailer commented Apr 24, 2023

I've got a working draft here: https://hackmd.io/@eesn0xIhTWeoLTbVbbhpdA/ByfCRmV72/edit

@Zsailer
Member Author

Zsailer commented Apr 26, 2023

Woohoo! Thank you, @bollwyvl and @tonyfast for all the amazing work on developing a first draft of this proposal!! I'm opening the PR now 😎

@bollwyvl

Thanks! We started noodling on some of those ideas in greater anger in the "Future of Notebooks" workshop and the attendant "Cells from the Future" weekly. As the terminating output of that call would be rather... sweeping... having a more structured place than this narrative-based repo would be super helpful.

There's a lot of opportunity/risk for scope creep on this. In the end that's probably for the best, but I didn't want to put all of the ideas into that JEP. Once we've got our feet under us building packages (which I kind of feel is really the "final exam" of that JEP), there's probably a whole raft of things that can be further formalized but were out of scope for that document.

  • directory structures of well-known paths
    • while I don't know of a solid, cross-platform schema for describing file trees, this would be a boon for complex deployments to .d folders, and confections like lab/extensions
  • binary structures
    • kaitaistruct provides a way to document these outside of magic numbers in reference implementations, with limited codegen
  • grammars
    • it's looking like lark might end up being a real possibility for portable grammar descriptions
    • examples
  • non-purely-declarative inputs
    • schema-as-code
      • pydantic, dataclasses, etc. provide opinionated ways to ergonomically write schema beyond TOML/YAML
      • construct similarly allows for defining
      • more generally, a .schema.ipynb, where the outputs are the schema, might be quite interesting
    • migrations-as-data
      • with tools like json-e, we could describe closed-form, declarative migrations between schema versions
  • well-known mimetype
    • while the "high road" is IANA registration, that's... a long road, and can leave things in a chicken-and-egg situation
    • this would show up all the way at the bottom of the stack in kernelspecs (Kernelspec JSON schema #105) and all the way at the end
  • DOM validation
    • XSLT could be used to verify that a given HTML-like output (either a client, static, or... some other thing) has agreed-upon characteristics
    • appropriate aria-* annotation would be huge, and should likely be the abstract design language for Jupyter clients and documents
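To illustrate the schema-as-code idea from the list above, here is a small standard-library sketch that derives a JSON Schema document from a Python dataclass. The `dataclass_to_schema` helper and the `KernelInfo` class are hypothetical and only handle a few primitive field types; libraries such as pydantic offer far more complete versions of this, and this sketch is not how any Jupyter project currently generates schemas.

```python
import dataclasses
from dataclasses import dataclass
from typing import get_type_hints

# Only a handful of primitive annotations are mapped in this sketch;
# any other annotation raises a KeyError.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def dataclass_to_schema(cls) -> dict:
    """Derive a minimal JSON Schema (object with typed properties) from a dataclass."""
    hints = get_type_hints(cls)
    properties = {}
    required = []
    for field in dataclasses.fields(cls):
        properties[field.name] = {"type": _JSON_TYPES[hints[field.name]]}
        has_default = (
            field.default is not dataclasses.MISSING
            or field.default_factory is not dataclasses.MISSING
        )
        # Fields without a default are treated as required.
        if not has_default:
            required.append(field.name)
    return {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "title": cls.__name__,
        "type": "object",
        "properties": properties,
        "required": required,
    }


@dataclass
class KernelInfo:  # hypothetical example, not a real Jupyter schema
    display_name: str
    language: str
    interrupt_timeout: float = 5.0
```

Calling `dataclass_to_schema(KernelInfo)` yields a plain dict that can be dumped with `json.dumps` and published like any hand-written schema, which is roughly the workflow a `.schema.ipynb` could automate.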

@Zsailer
Member Author

Zsailer commented Jun 12, 2023

We merged this in #108. Closing here.

@Zsailer Zsailer closed this as completed Jun 12, 2023