
Dar stands for (Reproducible) Document Archive. It specifies a virtual file format that holds multiple digital documents, complete with images and other assets. A Dar contains a manifest file (manifest.xml) that describes its contents.

<!DOCTYPE manifest PUBLIC "DarManifest 0.1.0" "">
<dar>
  <documents>
    <document id="manuscript" name="Reproducible Document Stack" type="article" path="manuscript.xml" />
    <document id="sheet" name="Sheet 1" type="sheet" path="sheet.xml" />
  </documents>
  <assets>
    <asset id="234o23489237498234798" mime-type="image/png" name="Picture 1" path="234o23489237498234798.png"/>
  </assets>
</dar>
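A manifest like this can be read with Python's standard library. The sketch below is a minimal illustration, not a reference implementation. It assumes the entries sit under a single root element (here `<dar>` with `<documents>` and `<assets>` children). It collects `document` and `asset` elements wherever they appear, so it does not depend on the exact nesting. The `Entry` class and `parse_manifest` function are hypothetical names chosen for this example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Sample manifest; the <dar>/<documents>/<assets> wrapper is an assumption
# that gives the XML a single root element.
MANIFEST = """<!DOCTYPE manifest PUBLIC "DarManifest 0.1.0" "">
<dar>
  <documents>
    <document id="manuscript" name="Reproducible Document Stack" type="article" path="manuscript.xml" />
    <document id="sheet" name="Sheet 1" type="sheet" path="sheet.xml" />
  </documents>
  <assets>
    <asset id="234o23489237498234798" mime-type="image/png" name="Picture 1" path="234o23489237498234798.png"/>
  </assets>
</dar>
"""


@dataclass
class Entry:
    id: str
    name: str
    path: str
    kind: str  # document type (e.g. "article") or asset MIME type


def parse_manifest(xml_text):
    """Return (documents, assets) found anywhere in the manifest (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    documents = [
        Entry(el.get("id"), el.get("name"), el.get("path"), el.get("type"))
        for el in root.iter("document")
    ]
    assets = [
        Entry(el.get("id"), el.get("name"), el.get("path"), el.get("mime-type"))
        for el in root.iter("asset")
    ]
    return documents, assets


docs, assets = parse_manifest(MANIFEST)
for d in docs:
    print("document", d.id, d.kind, d.path)
for a in assets:
    print("asset", a.id, a.kind, a.path)
```

Iterating with `root.iter(...)` keeps the reader tolerant of small changes in how entries are grouped. A stricter tool might instead validate against the DTD named in the `DOCTYPE`.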

There are two types of contents:

  • Documents: These are meant to be edited in a visual editor and are typically stored as XML/HTML or JSON.
  • Assets: Regular files which can be used from any document. For instance, two documents could embed the same image.
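Because assets are shared, a tool that writes a manifest can list each asset once, even when several documents use it. The text above does not specify how a document refers to an asset. This sketch therefore covers only the manifest side and de-duplicates assets by `id`. The function name `build_manifest` is hypothetical, and so are the `<dar>`, `<documents>` and `<assets>` container names.

```python
import xml.etree.ElementTree as ET


def build_manifest(documents, assets):
    """Build manifest XML text from attribute dicts (hypothetical helper).

    Assets are de-duplicated by id, so an image used by two documents
    is listed only once.
    """
    root = ET.Element("dar")  # root and container names are assumptions
    docs_el = ET.SubElement(root, "documents")
    for doc in documents:
        ET.SubElement(docs_el, "document", doc)
    assets_el = ET.SubElement(root, "assets")
    seen = set()
    for asset in assets:
        if asset["id"] in seen:
            continue  # already listed for another document
        seen.add(asset["id"])
        ET.SubElement(assets_el, "asset", asset)
    body = ET.tostring(root, encoding="unicode")
    return '<!DOCTYPE manifest PUBLIC "DarManifest 0.1.0" "">\n' + body


# Two documents that both embed the same image.
image = {"id": "fig1", "mime-type": "image/png", "name": "Figure 1", "path": "fig1.png"}
xml_text = build_manifest(
    [
        {"id": "a", "name": "A", "type": "article", "path": "a.xml"},
        {"id": "b", "name": "B", "type": "article", "path": "b.xml"},
    ],
    [image, image],
)
print(xml_text)
```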

Designed for research and scientific publishing

Dar is being designed for storing reproducible research publications, but the underlying concepts suit any kind of digital publication that can be bundled together with its assets.


  • Establish standardised research publications
  • Self-contained archive (includes manuscript, images, source code and data)
  • Machine-friendly format to ease development of tools
  • Long-term preservation
  • Stand-alone, offline execution of reproducible elements
  • Language agnostic (e.g. run Python, R, Jupyter kernels, etc.)
  • Tool agnostic (use Jupyter, RMarkdown or Stencila for authoring)
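The "self-contained archive" goal can be checked mechanically. Every path listed in the manifest should name a file that exists inside the archive folder. The sketch below assumes a Dar is unpacked as a plain directory with `manifest.xml` at its top level. It also treats any path that resolves outside that directory (e.g. via `..`) as a violation, which is our interpretation rather than a stated rule. `missing_files` is a hypothetical helper.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def missing_files(archive_dir):
    """Return manifest paths that do not point at a file inside archive_dir."""
    base = Path(archive_dir).resolve()
    root = ET.parse(base / "manifest.xml").getroot()
    missing = []
    for el in list(root.iter("document")) + list(root.iter("asset")):
        rel = el.get("path", "")
        target = (base / rel).resolve()
        # A self-contained archive must not reference files outside itself.
        inside = base in target.parents
        if not rel or not inside or not target.is_file():
            missing.append(rel)
    return missing


# Usage: an archive whose image asset was never copied in.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "manifest.xml").write_text(
        '<dar><documents>'
        '<document id="m" name="M" type="article" path="manuscript.xml"/>'
        '</documents><assets>'
        '<asset id="p" mime-type="image/png" name="P" path="p.png"/>'
        '</assets></dar>'
    )
    Path(tmp, "manuscript.xml").write_text("<article/>")
    result = missing_files(tmp)
    print(result)  # ['p.png']
```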


The following specifications define a markup language (XML) for research articles and spreadsheets:

  • Texture Article: An XML format based on JATS, the de facto standard for archiving and interchanging scientific open-access content in XML
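Because Texture Article builds on JATS, standard JATS element paths are a reasonable starting point for tools. The sketch below reads the title from `front/article-meta/title-group/article-title`, which is the JATS location for an article's title. It assumes Texture keeps that path unchanged. The sample document is a toy example far smaller than a real article, and `article_title` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# A tiny JATS-style article; real Texture articles carry far more metadata.
ARTICLE = """<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>Reproducible Document Stack</article-title>
      </title-group>
    </article-meta>
  </front>
  <body><p>Hello.</p></body>
</article>"""


def article_title(xml_text):
    """Return the JATS article title, or None when it is missing."""
    root = ET.fromstring(xml_text)
    el = root.find("./front/article-meta/title-group/article-title")
    # itertext() also collects text inside inline markup such as <italic>.
    return "".join(el.itertext()).strip() if el is not None else None


title = article_title(ARTICLE)
print(title)  # Reproducible Document Stack
```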


The following editors are being developed to edit document archives of research projects:

  • Stencila: an office suite for reproducible research
  • Texture: an open source manuscript editor designed for publishers and authors


These two examples are continuously updated, to reflect the latest versions of the related specifications.


This is an early-stage (alpha) proposal that will be continuously advanced. We use existing standards where possible (such as JATS-XML for representing articles) and seek consensus in the research community to offer the most flexible and concise tagging guidelines.


The JATS Standard is copyrighted by NISO, but all of the non-normative information in this repository is licensed under CC BY-SA 4.0.

More info at


Dar is developed by the Substance Consortium, an open community formed by the Public Knowledge Project (PKP), the Collaborative Knowledge Foundation (CoKo), SciELO, Érudit, eLife and Stencila.

