Article metadata

Edward Abraham edited this page Jul 22, 2014 · 6 revisions

Citation information

Papers, reports and scholarly documents need basic metadata: title, authors, institutions, and so on. The metadata has two goals: to allow the paper to be formatted, and to supply information to machine readers of the document. Some of the requirements for Scholarly Markdown are discussed in a blog post by Martin Fenner, and they are elaborated further below.

The standard way of adding metadata to markdown is through a YAML metadata block at the start of the file. The core information is the title and authors, and ideally the metadata should specify enough information to recreate the citation of the article or report. The recommendation is that the accepted YAML should follow the Citation Style Language (CSL) JSON schema, which covers a wide range of requirements.

Here is a paper on dolphin bycatch published in PLoS. A simple YAML header for a scholarly markdown version of this document might be:

 ---
 type: article
 title: Common dolphin *Delphinus delphis* bycatch in New Zealand commercial trawl fisheries
 author:
  - family: Thompson
    given: Finlay N.
  - family: Abraham
    given: Edward R.
  - family: Berkenbusch
    given: Katrin
 ---

Note that the title of this article needs formatting, to handle the italics of the species name. When the article is typeset, these fields are treated as markdown. This is close to the format used in the blog post; however, we use the type field rather than the blog post's layout field, as type is in the CSL JSON specification.

As this particular article has now been published, the full citation could be added to the YAML, to give a record like the one below. The CSL specification is verbose, and some of the fields read strangely (for example, there is a container-title field rather than a common word such as journal, and the publication date is a structured list), so at this point the header starts to become harder to write by hand. However, a single file now holds both the publication metadata and the publication source. From structured data such as this, we would be able to build a publication database, as well as generate our documents.

 ---
 type: article-journal
 title: Common dolphin *Delphinus delphis* bycatch in New Zealand commercial trawl fisheries
 author:
  - family: Thompson
    given: Finlay N.
  - family: Abraham
    given: Edward R.
  - family: Berkenbusch
    given: Katrin
 container-title: PLoS ONE
 volume: 8
 number: 5
 page: e64438
 doi: 10.1371/journal.pone.0064438
 published:
  date-parts:
   - - 2013
 keyword:
  - dolphin
  - bycatch
 ---
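As a rough illustration of generating a citation from this kind of record, the sketch below formats the metadata as a plain-text reference. It assumes the YAML header has already been parsed into a Python dictionary (the parsing step is not shown), and the function names `format_author` and `format_citation` and the citation layout are illustrative choices, not part of the proposal or of CSL.

```python
def format_author(author):
    """Render one CSL-style name as 'Family Initials', e.g. 'Thompson FN'."""
    given = author.get("given", "").replace(".", " ")
    initials = "".join(part[0] for part in given.split())
    return f"{author['family']} {initials}".strip()


def format_citation(meta):
    """Build a plain-text citation from the metadata (illustrative layout only)."""
    authors = ", ".join(format_author(a) for a in meta.get("author", []))
    # date-parts is a list of [year, month, day] lists; only the year is used here.
    year = meta["published"]["date-parts"][0][0]
    # Drop the markdown emphasis markers, since the output is plain text.
    title = meta["title"].replace("*", "")
    return (
        f"{authors} ({year}) {title}. {meta['container-title']} "
        f"{meta['volume']}({meta['number']}): {meta['page']}. doi:{meta['doi']}"
    )


# The published record from the example, written out as a parsed dictionary.
metadata = {
    "type": "article-journal",
    "title": "Common dolphin *Delphinus delphis* bycatch in "
             "New Zealand commercial trawl fisheries",
    "author": [
        {"family": "Thompson", "given": "Finlay N."},
        {"family": "Abraham", "given": "Edward R."},
        {"family": "Berkenbusch", "given": "Katrin"},
    ],
    "container-title": "PLoS ONE",
    "volume": 8,
    "number": 5,
    "page": "e64438",
    "doi": "10.1371/journal.pone.0064438",
    "published": {"date-parts": [[2013]]},
    "keyword": ["dolphin", "bycatch"],
}

print(format_citation(metadata))
```

In practice the layout would come from a CSL style file rather than a hard-coded format string; the point here is only that the record carries enough information to rebuild the citation.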

Extended information

The author information may be extended to include fields that are typically used in publications:

  • email: email address
  • tel: primary telephone number (the name is taken from the hCard format)
  • url: a url giving further information on the author
  • orcid: the ORCID of the author. If specified, this would provide a canonical source for all the other author information. Implementing this would require querying the ORCID API.
  • affiliation: the id of the author's organization

The proposal here is that the organizations are listed in their own block of the metadata, with the id field used to link each author to the corresponding affiliation. When formatting the article, these may be turned into footnotes (depending on the journal style). The organization fields are:

  • id (required): the id of the organization
  • name: the name of the organization
  • address: a text address
  • url: website

There are more structured approaches to the address data in particular (the hCard microformat could be used as a guide here). However, more structure would make entering the data more cumbersome.
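The id-based link between authors and organizations can be sketched as follows. This is a minimal illustration, assuming the header has already been parsed into a dictionary with `author` and `organization` lists; the function name `resolve_affiliations` and the footnote numbering scheme are assumptions made for the example, not part of the proposal.

```python
def resolve_affiliations(meta):
    """Pair each author with a footnote number and the organization it refers to.

    Returns a list of (author, footnote_number, organization) tuples. Authors
    without an affiliation get (author, None, None).
    """
    # Compare ids as strings, since YAML may read `1` as an integer and
    # `dragonfly` as a string.
    orgs = {str(org["id"]): org for org in meta.get("organization", [])}
    footnotes = {}  # organization id -> footnote number, in order of first use
    resolved = []
    for author in meta.get("author", []):
        org_id = author.get("affiliation")
        if org_id is None:
            resolved.append((author, None, None))
            continue
        key = str(org_id)
        if key not in orgs:
            raise KeyError(
                f"author {author['family']!r} refers to unknown organization id {key!r}"
            )
        number = footnotes.setdefault(key, len(footnotes) + 1)
        resolved.append((author, number, orgs[key]))
    return resolved


example = {
    "author": [
        {"family": "Thompson", "given": "Finlay N.", "affiliation": 1},
        {"family": "Abraham", "given": "Edward R.", "affiliation": 1},
    ],
    "organization": [
        {"id": 1, "name": "Dragonfly Science"},
    ],
}

for author, number, org in resolve_affiliations(example):
    print(f"{author['given']} {author['family']} [{number}]: {org['name']}")
```

Raising an error for an unknown id is one possible design choice; a formatter could instead warn and leave the author without a footnote.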

Abstract

  • abstract: CSL JSON has a field for the abstract. The simple approach is to put the entire markdown for the abstract into the YAML. This has the disadvantage of bulking up the metadata with multiline YAML, which may contain paragraph breaks. A suggestion is that this field instead specifies a section title, with the contents of that section taken to be the abstract. By default, this section is called 'abstract', and in that case the abstract field does not need to be specified.
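The section-title lookup suggested above might work along these lines. This is a sketch under stated assumptions: sections are marked with ATX-style `#` headings, the title match is case-insensitive, and the section ends at the next heading of the same or a higher level. The function name `extract_abstract` is hypothetical.

```python
import re

# Matches ATX headings such as "# Abstract" or "## Summary ##".
HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*#*\s*$")


def extract_abstract(body, meta):
    """Return the text of the section named by meta['abstract'].

    Falls back to a section titled 'abstract' when the field is absent, and
    returns None when no matching section is found.
    """
    wanted = str(meta.get("abstract", "abstract")).strip().lower()
    collected = None  # stays None until the wanted heading has been seen
    level = 0
    for line in body.splitlines():
        match = HEADING.match(line)
        if match:
            if collected is not None and len(match.group(1)) <= level:
                break  # a heading at the same or a higher level ends the section
            if collected is None and match.group(2).strip().lower() == wanted:
                level = len(match.group(1))
                collected = []
                continue
        if collected is not None:
            collected.append(line)
    return None if collected is None else "\n".join(collected).strip()


document = "# Abstract\n\nFirst paragraph.\n\nSecond paragraph.\n\n# Introduction\n\nMore text."
print(extract_abstract(document, {}))
```

Keeping the abstract in the body this way means paragraph breaks and inline markdown need no special YAML quoting.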

Bibliography information

  • bibliography: filename of the bibliography (BibTeX, YAML, or another understood format), or else a YAML block that contains all the references
  • csl: filename of the Citation Style Language file that is used to specify the layout of the references

Putting this together, a YAML header suitable for a manuscript now looks as follows:

 ---
 type: article
 title: Common dolphin *Delphinus delphis* bycatch in New Zealand commercial trawl fisheries
 author:
  - family: Thompson
    given: Finlay N.
    email: finlay@dragonfly.co.nz
    tel: +64 4 385 9285
    affiliation: 1
  - family: Abraham
    given: Edward R.
    affiliation: 1
  - family: Berkenbusch
    given: Katrin
    affiliation: 1
 organization:
  - id: 1
    name: Dragonfly Science
    address: PO Box 27535, Wellington 6141, New Zealand
    url: http://www.dragonfly.co.nz
 bibliography: dolphins.bib
 csl: plos.csl
 keyword:
  - dolphin
  - bycatch
 ---