This repository has been archived by the owner on Jan 27, 2022. It is now read-only.



📑 Schema

Extensions to schema.org to support semantic, composable, parameterize-able and executable documents

⚠️ Moved

This project has been moved to the schema folder of our main stencila/stencila repository. Please go there for the latest version.

👋 Introduction

This is the Stencila Schema, an extension to schema.org to support semantic, composable, parameterize-able and executable documents (we call them stencils for short). It also provides implementations of schema.org types (and our extensions) for several languages including JSON Schema, TypeScript, Python and R. It is a central part of our platform that is used widely throughout our open-source tools as the data model for executable documents.

Why an extension to schema.org?

Schema.org is "a collaborative, community activity with a mission to create, maintain, and promote schemas for structured data on the Internet, on web pages, in email messages, and beyond." It is used by most major search engines to provide richer, more semantic search results. More and more websites are using the schema.org vocabulary, and there is increasing uptake in the research community.

The schema.org vocabulary encompasses many varied concepts and topics. Of particular relevance to Stencila are types for research outputs such as ScholarlyArticle, Dataset and SoftwareSourceCode, and their associated metadata, e.g. Person and Organization.

However, schema.org does not provide types for the content of research articles. This is where our extensions come in. This schema adds types (and some properties to existing types) to be able to represent a complete, executable research article. These extension types include "static" nodes such as Paragraph, Heading and Figure, and "dynamic" nodes involved in execution such as CodeChunk and Parameter.

It's about names, not formats

An important aspect of schema.org and similar vocabularies is that they really just define a shared way of naming things. They are format agnostic. As schema.org says, it can be used with "many different encodings, including RDFa, Microdata and JSON-LD".

We extend this philosophy to the encoding of executable articles, allowing them to be encoded in several existing document formats. For example, the following very small Article, containing only one Paragraph, and with no metadata, can be represented in Markdown:

Hello world!

as YAML,

type: Article
content:
  - type: Paragraph
    content:
      - Hello world!

as a Jupyter Notebook,

{
  "nbformat": 4,
  "nbformat_minor": 4,
  "metadata": {
    "title": ""
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": ["Hello world!"]
    }
  ]
}

as JSON-LD,

{
  "@context": "",
  "type": "Article",
  "content": [
    {
      "type": "Paragraph",
      "content": ["Hello world!"]
    }
  ]
}

or as HTML with Microdata,

<article itemscope="" itemtype="">
  <p itemscope="" itemtype="">Hello world!</p>
</article>

This repository does not deal with format conversion per se. Please see Encoda for that. However, when developing our extensions, we aimed not to reinvent the wheel and to maintain consistency and compatibility with existing schemas for representing document content. Those include:

But, sometimes (often) we need more than just names

Despite its name, schema.org does not define strong rules around the shape of data, as, say, a database schema or XML schema does. All the properties of schema.org types are optional, and although they have "expected types", this is not enforced. In addition, properties can be singular values or arrays, but always have a singular name. For example, an Article has an author property which could be undefined, a string, a Person or an Organization, or an array of Person or Organization items.

This flexibility makes a lot of sense for the primary purpose of semantic annotation of other content. However, for use as an internal data model, as in Stencila, it can result in a lot of defensive code to check exactly which of these alternatives a property value is. And writing more code than you need to is A Bad Thing™.
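To make the cost of that flexibility concrete, here is a minimal Python sketch of the kind of defensive code a consumer might need for a loosely typed author value. The function name `normalize_authors` and the choice to treat a bare string as a person's name are assumptions made for illustration; they are not part of this schema or of its bindings.

```python
from typing import Any, Dict, List


def normalize_authors(article: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Coerce a loosely typed `author` value into a list of nodes."""
    value = article.get("author")
    if value is None:
        return []  # the property is undefined
    # A singular value and an array are both allowed, so wrap singular values
    items = value if isinstance(value, list) else [value]
    authors = []
    for item in items:
        if isinstance(item, str):
            # Assumption for illustration: a bare string is a person's name
            authors.append({"type": "Person", "name": item})
        elif isinstance(item, dict) and item.get("type") in ("Person", "Organization"):
            authors.append(item)
        else:
            raise TypeError(f"Unexpected author value: {item!r}")
    return authors
```

Every consumer of the data would need a check like this for every such property, which is the overhead a stricter schema is meant to remove.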

Instead, we wanted a schema that placed some restrictions on the shape of executable documents. This has flow-on benefits for developer experience such as type inference and checking. To achieve this, the Stencila Schema defines types using JSON Schema. Yes, that's a lot of "schemas", but bear with us...

Using JSON Schema for validation and type safety

JSON Schema is "a vocabulary that allows you to annotate and validate JSON documents". It is a draft internet standard which, like schema.org, has growing adoption.

In Stencila Schema, when we define a type of document node, either a schema.org type or an extension, we define it:

  • as a JSON Schema document, with restrictions on the cardinality, type and shape of its properties
  • using schema.org type and property names, pluralized as appropriate to avoid confusion

For example, an Article is defined to have an optional authors property (note the s this time) which is always an array whose items are either a Person or Organization.

{
  "title": "Article",
  "@id": "schema:Article",
  "description": "An article, including news and scholarly articles.",
  "properties": {
    "authors": {
      "@id": "schema:author",
      "description": "The authors of this creative work.",
      "type": "array",
      "items": {
        "anyOf": [
          {
            "$ref": "Person.schema.json"
          },
          {
            "$ref": "Organization.schema.json"
          }
        ]
      }
    }
  }
}

To keep things simpler, this is a stripped-down version of the actual Article.schema.json.

With a JSON Schema, we are able to:

  • use a JSON Schema validator to check that content meets the schema
  • generate types (i.e. interface and class elements) matching the schema in other languages.
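As a rough illustration of the first point, the sketch below hand-writes the checks that the stripped-down Article schema above implies for the authors property. It is not a general JSON Schema validator, and in practice an existing validator library would do this work. The function name `validate_article` and the assumption that every node carries a `type` key (as in the YAML and JSON examples earlier) are ours.

```python
from typing import Any, Dict, List

ALLOWED_AUTHOR_TYPES = {"Person", "Organization"}


def validate_article(node: Dict[str, Any]) -> List[str]:
    """Return a list of error messages; an empty list means the node passed."""
    errors = []
    if node.get("type") != "Article":
        errors.append("type must be 'Article'")
    if "authors" in node:  # the property is optional
        authors = node["authors"]
        if not isinstance(authors, list):
            # Mirrors `"type": "array"` in the schema
            errors.append("authors must be an array")
        else:
            # Mirrors the `anyOf` restriction on the array items
            for index, author in enumerate(authors):
                kind = author.get("type") if isinstance(author, dict) else None
                if kind not in ALLOWED_AUTHOR_TYPES:
                    errors.append(f"authors[{index}] must be a Person or Organization")
    return errors
```

Once content has passed a check like this, downstream code can rely on authors being a list of Person or Organization nodes without re-testing the alternatives.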

But, JSON Schema can be a pain to write

JSON can be quite fiddly to write by hand. And JSON Schema lacks a way to easily express parent-child relationships between types. For these reasons, we define types using YAML with custom keywords such as extends and generate JSON Schema and ultimately bindings for each language from those.
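The idea behind a keyword like extends can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's actual generator: the `DEFINITIONS` dict stands in for parsed YAML files, its property lists are heavily abbreviated, and the `resolve` function is hypothetical.

```python
from typing import Any, Dict

# Hypothetical, abbreviated definitions, written as dicts in place of parsed YAML
DEFINITIONS: Dict[str, Dict[str, Any]] = {
    "Thing": {"properties": {"name": {"type": "string"}}},
    "CreativeWork": {
        "extends": "Thing",
        "properties": {"authors": {"type": "array"}},
    },
    "Article": {
        "extends": "CreativeWork",
        "properties": {"content": {"type": "array"}},
    },
}


def resolve(title: str, definitions: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Flatten an `extends` chain into a standalone, JSON Schema-like dict."""
    definition = definitions[title]
    parent = definition.get("extends")
    # Start from the parent's resolved properties, if there is a parent
    properties = dict(resolve(parent, definitions)["properties"]) if parent else {}
    # Properties declared on the child are added to, or override, inherited ones
    properties.update(definition.get("properties", {}))
    return {"title": title, "type": "object", "properties": properties}
```

Here `resolve("Article", DEFINITIONS)` yields a schema whose properties include those inherited from CreativeWork and Thing, so each child definition only has to declare what it adds.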

📜 Documentation

Documentation is available at

Alternatively, you may want to directly consult the type definitions (*.yaml files) and documentation (*.md files) in the schema directory.

🚀 Usage

JSON-LD context

A JSON-LD @context is generated from the JSON Schema sources and published at

Individual files are published for each extension type e.g. and extension property e.g.

Programming language bindings

Bindings for this schema, in the form of installable packages, are currently generated for:

Depending on the capabilities of the host language, these packages expose type definitions as well as utility functions for constructing valid Stencila Schema nodes. Each package has its own documentation auto-generated from the code.
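To give a feel for what such utility functions might look like, here is a small Python sketch. The function names `paragraph`, `person` and `article` and their signatures are hypothetical; consult each package's own documentation for the API it actually exposes.

```python
from typing import Any, Dict, List, Optional


def paragraph(content: List[Any]) -> Dict[str, Any]:
    """Create a Paragraph node (hypothetical helper)."""
    return {"type": "Paragraph", "content": list(content)}


def person(name: str) -> Dict[str, Any]:
    """Create a Person node (hypothetical helper)."""
    return {"type": "Person", "name": name}


def article(
    content: List[Dict[str, Any]],
    authors: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Create an Article node, rejecting authors of the wrong type."""
    node: Dict[str, Any] = {"type": "Article"}
    if authors is not None:
        for author in authors:
            if author.get("type") not in ("Person", "Organization"):
                raise ValueError("authors must be Person or Organization nodes")
        node["authors"] = list(authors)
    node["content"] = list(content)
    return node
```

With these helpers, `article([paragraph(["Hello world!"])])` builds the same small Article used in the format examples earlier.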

🛠 Contributing

We 💕 contributions! All contributions: ideas 🤔, examples 💡, bug reports 🐛, documentation 📖, code 💻, questions 💬.

Please see for a guide on how to contribute to the schema definitions. See the files of each language sub-folder e.g. python for advice on development of language bindings and issue for how to add you or others to the following important table:

Mac Cowell

💻 🤔


💻 📖 🤔

Ben Shaw

💻 🤔 🚇 📖

Alex Ketch

💻 📖 🎨

Nokome Bentley

💻 📖 🤔


💻 🤔

Aleksandra Pawlik

💻 📖 🤔


🤔 💻

Robert Gieseke

🤔 💻 📖

🙏 Acknowledgments

Thanks to the developers of the existing schemas and open source tools we use, or have been inspired by, including: