# Metanorma-standoc

Warning: This gem is still under development.

Gem for serialising the Metanorma Standoc model.

## Functionality

This gem processes Metanorma documents following a template for generating standards documents, according to a range of standards classes. This gem provides underlying generic functionality; behaviour specific to each standards class is refined in the gem specific to that standards class (e.g. https://github.com/riboseinc/metanorma-iso). The following outputs are generated.

• Metanorma XML representation of the document, intended as a document model for Standards. The Metanorma XML representation is processed in turn, to generate one or more of the following outputs as end deliverables, depending on each standards class gem:

  • Microsoft Word output (.doc)

  • HTML output (.html)

  • PDF (.pdf)

The following input formats are supported:

• Asciidoctor

This README provides an overview of the functionality of the gem; see also Guidance for authoring. The Quickstart guide gives a summary overview.

Note: AsciiMathML is used for mathematical formatting. The gem uses the Ruby AsciiMath parser, which is syntactically stricter than the common MathJax processor; if you do not get the expected results, try bracketing terms in your AsciiMathML expressions.

### Installation

If you are using a Mac, the https://github.com/riboseinc/metanorma-macos-setup repository has instructions on setting up your machine to run Metanorma scripts such as this one. You need only run the following in a Terminal console:

$ bash <(curl -s https://raw.githubusercontent.com/riboseinc/metanorma-macos-setup/master/metanorma-setup)
$ gem install metanorma-standoc
$ gem install metanorma-cli

The metanorma-cli gem is the command-line interface for the Metanorma tool suite (incorporating the metanorma executable seen above).

## Approach

### Document model

The Metanorma document model used in document generation intends to introduce rigour into the standards authoring process; existing document templates do not support such rigour down to the element level. It also introduces flexibility by decoupling the document structure from its presentation.

Formal definitions of standards prescribe the contents of standards to a level amenable to an explicit document model. The ISO International Standard format, as prescribed in ISO/IEC DIR 2 "Principles and rules for the structure and drafting of ISO and IEC documents", is one of the more detailed such prescriptions available. A formal document model would allow checking for consistency in format and content, and expedite authoring and quality control of ISO standards. Authoring standards through a more abstract formal model also permits enhanced functionality such as cross-reference link checking and auto-numbering of sections, figures, tables and formulas. Outputting a document in different languages also becomes straightforward.

### Asciidoctor

Asciidoctor has been selected as the authoring tool to generate the document model representation of standards. It is a document formatting tool like Markdown and DocBook, which combines the relative ease of use of the former (using relatively lightweight markup) with the rigour and expressiveness of the latter (it has a well-defined syntax, and was in fact initially developed as a DocBook document authoring tool). Asciidoctor has built-in capability to output text and HTML, so it can be used to preview the file as it is being authored. However, the gem natively generates HTML and Word output, so there should not be much need for this.

Asciidoctor has some formatting constraints because of its own document model, which users need to be aware of. For example, Asciidoc has a strict division between inline and block elements, which disallows certain kinds of nesting; so a list cannot be embedded within a paragraph, it can only constitute its own paragraph (though lists themselves can be nested within each other). Asciidoctor also disallows multiple paragraphs in footnotes, by design. (The document model does not impose this constraint, so you could edit the generated XML to break up paragraphs within a footnote.)

## Different behaviour from native Asciidoctor

### Autonumbering

Autonumbering in Metanorma extends to formulas (which are encoded as "stem" blocks) and notes. Autonumbering is applied in the conversion from Metanorma XML to output formats (isodoc); by default it restarts for each annex, but is continuous for the main body of text.
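The restart rule described above can be sketched in Python. This is an illustration only, not the isodoc implementation: the function name, the input shape (a list of `(container, kind)` pairs) and the `A.1`-style annex labels are all assumptions made for the example.

```python
from collections import defaultdict

def autonumber(blocks):
    """Label numbered blocks in document order.

    `blocks` is a list of (container, kind) pairs, where container is
    "body" or an annex letter such as "A" (hypothetical input shape),
    and kind is e.g. "formula", "note", "figure" or "table".
    """
    counters = defaultdict(int)  # one counter per (container, kind)
    labels = []
    for container, kind in blocks:
        # The whole main body shares one counter per kind, so numbering
        # is continuous there; each annex has its own counters, so
        # numbering restarts at the start of every annex.
        counters[(container, kind)] += 1
        n = counters[(container, kind)]
        labels.append(str(n) if container == "body" else f"{container}.{n}")
    return labels
```

Formulas and figures are counted independently here, which is why each kind has its own counter per container.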

### Unsupported blocks

Sidebars (aside) are not supported, and have been repurposed for reviewer comments. Page breaks (thematic break) are not supported; ASCII art/preformatted text (literal) is not supported in most standards classes.

### Footnotes

Table and figure footnotes are treated differently from all other footnotes: they are rendered at the bottom of the table or figure, and they are numbered separately.

### References

References to well-defined standards codes use the document identifiers for citations (e.g. ISO 20483:2013); generic references in bibliographies use bracketed numbers [1].

### Section titles

Metanorma standards have special section types: "Scope", "Normative References", "Terms and Definitions", "Symbols and Abbreviated Terms", "Bibliography". By default, these are identified in Asciidoc by using those titles. The gem allows you to override the title by using a heading attribute on the node, so that the actual title in your Asciidoc can be something different; that is useful, for example, if you are translating the document into different languages. For example:

[heading=scope]
== 范围

Note that both the XML population and the isodoc gem will overwrite any supplied title. If you are translating Metanorma documents into other languages, you will still need access to versions of the metanorma-standoc and isodoc gems in those languages.

### Obligation

The obligation of sections (whether they are normative or informative) is indicated with the attribute "obligation". For most sections, this is fixed; for annexes and clauses, the default value of the obligation is "normative", and users need to set the obligation to "informative" as a section attribute.

[[AnnexA]]
[appendix,obligation=informative]
== Determination of defects

### Term markup

To ensure the structure of Terms and Definitions is captured accurately, the following macros are defined, and must be used to mark up their respective content:

• alt:[TERM] for alternative terms

• deprecated:[TERM] for deprecated terms

• domain:[TERM] for term domains

The macro contents can contain their own markup.

=== paddy
deprecated:[#[smallcap]#cargo# rice]
domain:[rice]

_paddy_ (<<paddy>>) from which the husk only has been removed

### Terms and Definitions markup

If the Terms and Definitions of a standard are partly or fully sourced from another standard, that standard is cited in a source attribute to the section, which is set to the reference anchor of the standard (given under the Normative References). Any boilerplate of the Terms and Definitions section is adjusted accordingly.

[source=ISO712]
== Terms and Definitions

Multiple sources are allowed, and need to be quoted and comma-delimited:

[source="ISO712,ISO24333"]
== Terms and Definitions

### Paragraph alignment

Alignment is defined as an attribute for paragraphs:

[align=left]
This paragraph is aligned left

[align=right]
This paragraph is aligned right

[align=center]
This paragraph is aligned center

[align=justified]
This paragraph is justified, which is the default

### Reviewer notes

Reviewer notes are encoded as sidebars, and can be separated at a distance from the text they are annotating; the text they are annotating is indicated through anchors. Reviewer notes are only rendered if the document has a :draft: attribute.

The following attributes on reviewer notes are mandatory:

• reviewer attribute (naming the reviewer)

• the starting target anchor of the note (from attribute)

The following attributes are optional:

• date attribute, optionally including the time (as xs:date or xs:datetime)

• the ending target anchor of the note (to attribute)

The span of text covered by the reviewer note is from the start of the text encompassed by the from element, to the end of the text encompassed by the to element. If only the from element is supplied, the reviewer note covers the from element. The from and to elements can be bookmarks, which cover no space.

[[clause_address_profile_definition]]

[[para1]]
This is a clause address [[A]]profile[[B]] definition

****
I do not agree with this statement.
****

[reviewer="Nick Nicholas",date=20180125T0121,from=A,to=B]
****
Profile?!
****
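The attribute rules for reviewer notes can be checked mechanically. The following Python sketch is not part of the gem: the function name and the dictionary input are hypothetical, and it encodes only the rules stated above (reviewer and from are mandatory, date and to are optional, and a missing to means the note covers just the from element).

```python
MANDATORY = ("reviewer", "from")

def check_reviewer_note(attrs):
    """Return ((start_anchor, end_anchor), problems) for one reviewer note.

    `attrs` is a dict of the block attributes, e.g.
    {"reviewer": "Nick Nicholas", "from": "A", "to": "B"}.
    """
    problems = [f"missing mandatory attribute: {name}"
                for name in MANDATORY if not attrs.get(name)]
    start = attrs.get("from")
    # With no `to` anchor, the note spans only the `from` element.
    end = attrs.get("to") or start
    return (start, end), problems
```

For the second note in the example above, the span would be `("A", "B")` with no problems reported.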

### Strikethrough and Small Caps

The following formatting macros are used for strikethrough and small caps text:

[strike]#strike through text#
[smallcap]#small caps text#

### Multiple header rows in tables

In Asciidoc, a table can have at most one header row or footer row. In Metanorma, a nominal single header row is routinely broken up into multiple rows to accommodate units or symbols that line up against each other, though they are displayed as merged cells with no grid between them. To address this, tables can be marked up with an optional headerrows attribute:

[headerrows=2]
|===
.2+|Defect 4+^| Maximum permissible mass fraction of defects in husked rice +
stem:[w_max]
| in husked rice | in milled rice (non-glutinous) | in husked parboiled rice | in milled parboiled rice

| Extraneous matter: organic footnote:[Organic extraneous matter includes foreign seeds, husks, bran, parts of straw, etc.] | 1,0 | 0,5 | 1,0 | 0,5
|===

### Inline clause numbers

For some clauses (notably test methods), the clause heading appears inline with the clause, instead of being separated on a different line. This is indicated in Asciidoc by the option attribute inline-header:

[%inline-header]
[[AnnexA-2-1]]
==== Sample divider,

consisting of a conical sample divider

### Bibliographic details

Citations can include details of where in the document the citation is located; these are entered by suffixing the type of locality, then an equals sign, then the reference. The word "whole" on its own is also treated as a locality. Multiple instances of locality and reference can be provided, delimited by comma or colon. Any trailing text after the sequence of locality=reference (or locality, space, reference) is treated as substitute text, as would occur normally in an Asciidoctor cross-reference. For example:

<<ISO712,the foregoing reference>>     # renders as: the foregoing reference
<<ISO712,section=5, page 8-10>>         # renders as: ISO 712, Section 5, Page 8-10
<<ISO712,section=5, page=8-10: 5:8-10>> # renders as ISO 712, 5:8-10 ("5:8-10" treated as replacement text for all the foregoing)
<<ISO712,whole>>                        # renders as: ISO 712, Whole of text

The references cannot contain spaces. Any text following the sequence of localities will be displayed instead of the localities.

A custom locality can be entered by prefixing it with locality: (including the colon):

<<ISO712,locality:frontispiece=5, page=8-10>>         # renders as: ISO 712, Frontispiece 5, Page 8-10

Custom localities may not contain commas, colons, or spaces. Localities with the locality: prefix are recognised in internationalisation configuration files.
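The locality rules in this section can be sketched as a small parser. This Python sketch is an illustration under stated assumptions, not the gem's parser: the function name, the short list of built-in locality types in `KNOWN_TYPES`, and the choice to make the comma/colon delimiter optional before trailing text are all assumptions.

```python
import re

# Assumed subset of built-in locality types, for illustration only.
KNOWN_TYPES = ("section", "clause", "page", "table", "annex", "figure")

_LOCALITY = re.compile(
    r"\s*(?:(?P<whole>whole)\b"
    r"|(?P<type>locality:[^,:=\s]+|" + "|".join(KNOWN_TYPES) + r")"
    r"(?:=|\s+)(?P<ref>[^,:\s]+))"   # references cannot contain spaces
    r"\s*[,:]?\s*"                   # delimiter: comma or colon
)

def parse_citation(body):
    """Split the inside of <<...>> into (anchor, localities, text)."""
    anchor, _, rest = body.partition(",")
    localities, pos = [], 0
    while pos < len(rest):
        m = _LOCALITY.match(rest, pos)
        if m is None:
            break  # everything from here on is replacement text
        if m.group("whole"):
            localities.append(("whole", None))
        else:
            kind = m.group("type").split(":", 1)[-1]  # drop "locality:" prefix
            localities.append((kind, m.group("ref")))
        pos = m.end()
    text = rest[pos:].strip() or None
    return anchor.strip(), localities, text
```

When `text` is not `None`, it is what would be displayed instead of the localities, as in the third example above.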

### Block Quotes

As in normal Asciidoctor, block quotes are preceded with an author and a citation; but the citation is expected to be in the same format as all other citations, a cross-reference optionally followed by text, which may include the bibliographic sections referenced:

[quote, ISO, "ISO7301,section 1"]
_____
This International Standard gives the minimum specifications for rice (_Oryza sativa_ L.)
which is subject to international trade. It is applicable to the following types: husked rice
and milled rice, parboiled or not, intended for direct human consumption. It is neither
applicable to other products derived from rice, nor to waxy rice (glutinous rice).
_____

### Image size

The value auto is accepted for image width and height attributes. It is only passed on to HTML output; if the output is to Word, both the width and height attributes are stripped from the image.

[height=90,width=auto]
image::logo.jpg

### Subclauses in Terms & Definitions sections

Normally any terminal subclause in a Terms & Definitions section is treated as a term definition. Exceptionally, an introductory section can be tagged to be treated as a clause, instead of a term, by prefixing it with the style attribute [.nonterm].

== Terms and definitions

[.nonterm]
=== Introduction
The following terms have non-normative effect, and should be ignored by the ametrical.

=== Anapaest

metrical foot consisting of a short, a long, and a short

Any clause within a Terms & Definitions section which is a nonterminal subclause (has child nodes) is automatically itself a terms (or definitions) section. On the other hand, any descendant of a nonterm clause is also a nonterm clause.
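These classification rules amount to a short recursive walk over the section tree. The Python sketch below is illustrative only: the node shape (dicts with `title`, `role` and `children` keys) and the function name are assumptions, and it is meant to be called on the Terms & Definitions section itself.

```python
def classify(node, nonterm=False):
    """Return a flat list of (title, kind) for a Terms & Definitions subtree.

    kind is "clause" for a nonterm clause or any of its descendants,
    "terms" for a subclause with children, and "term" for a terminal
    subclause.
    """
    nonterm = nonterm or node.get("role") == "nonterm"
    children = node.get("children", [])
    if nonterm:
        kind = "clause"
    elif children:
        kind = "terms"
    else:
        kind = "term"
    result = [(node["title"], kind)]
    for child in children:
        # nonterm status is inherited by every descendant
        result.extend(classify(child, nonterm))
    return result
```

Applied to the example above, "Introduction" is classified as a clause and "Anapaest" as a term.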

### Cross-references to external documents

Metanorma Asciidoctor, like normal Asciidoctor, will process cross-references to anchors within external documents. So <<document1.adoc#b>> will be processed as a link to anchor #b in document document1.adoc. The .adoc suffix is presupposed for Asciidoctor documents (as in normal Asciidoctor): it is stripped in Metanorma XML, and substituted with the extension of the current document type when rendered. So <<document1.adoc#b>> is rendered in Metanorma XML as <xref target="document1#b">, in HTML as <a href="document1.html#b">, and in PDF as <a href="document1.pdf#b">.
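The suffix handling can be sketched as a single function. This Python sketch is not the gem's code; the function name and its `output_ext` parameter are hypothetical, and it covers only the .adoc case described above.

```python
def render_external_target(target, output_ext=None):
    """Rewrite an external cross-reference target such as "document1.adoc#b".

    With output_ext=None the .adoc suffix is simply stripped (the Metanorma
    XML form); otherwise it is replaced by the given extension ("html", "pdf").
    """
    doc, sep, anchor = target.partition("#")
    if doc.endswith(".adoc"):
        doc = doc[: -len(".adoc")]
        if output_ext:
            doc = f"{doc}.{output_ext}"
    return doc + sep + anchor
```

For instance, `render_external_target("document1.adoc#b", "html")` yields the href used in the HTML example above.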

### Sections embedded more than 5 levels

Asciidoctor permits only 5 levels of section embedding (not counting the document title). Standards do contain more levels of embedding; ISO/IEC DIR 2 only considers it a problem if there are more than 7 levels of embedding. To realise higher levels of embedding, prefix a 5-level section title with the attribute level=:

====== Clause 5

[level=6]
====== Clause 6

[level=7]
====== Clause 7A

[level=7]
====== Clause 7B

[level=6]
====== Clause 6B

====== Clause 5B

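This generates the following ISO XML:

<clause id="_" inline-header="false" obligation="normative">
  <title>Clause 5</title>
  <clause id="_" inline-header="false" obligation="normative">
    <title>Clause 6</title>
    <clause id="_" inline-header="false" obligation="normative">
      <title>Clause 7A</title>
    </clause>
    <clause id="_" inline-header="false" obligation="normative">
      <title>Clause 7B</title>
    </clause>
  </clause>
  <clause id="_" inline-header="false" obligation="normative">
    <title>Clause 6B</title>
  </clause>
</clause>
<clause id="_" inline-header="false" obligation="normative">
  <title>Clause 5B</title>
</clause>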
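The way a flat sequence of headings with level numbers turns into nested clauses can be sketched with a stack. This Python sketch is an illustration of the nesting logic only, not the gem's implementation; the function name and the `(level, title)` input shape are assumptions.

```python
def nest_clauses(headings):
    """Turn [(level, title), ...] in document order into a nested tree."""
    root = {"title": None, "children": []}
    stack = [(0, root)]  # (level, node) for each currently open ancestor
    for level, title in headings:
        node = {"title": title, "children": []}
        # Close every open clause at the same or a deeper level.
        while stack[-1][0] >= level:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root["children"]
```

Feeding it the levels from the example above (5, 6, 7, 7, 6, 5) gives two top-level clauses, with Clause 6 and Clause 6B nested under Clause 5, and Clause 7A and Clause 7B nested under Clause 6.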

### PlantUML

The PlantUML diagramming tool is integrated with Asciidoctor in this gem, as a literal block with the style attribute plantuml:

[plantuml]
....
@startuml
Alice -> Bob: Authentication Request
Bob --> Alice: Authentication Response

Alice -> Bob: Another authentication Request
Alice <-- Bob: another authentication Response
@enduml
....

The integration runs PlantUML for each such block, generating a PNG image. The images are stored in the plantuml directory, and linked into the output document in place of the PlantUML.

PlantUML needs to be installed by users separately, and must be accessible from the command line:

• brew install plantuml on macOS.

• For Linux, link the PlantUML jar file into a command line executable; see .travis.yml for an example.

If PlantUML is not installed locally, the source PlantUML is incorporated into the output document as sourcecode.

## Bibliography integration

Bibliographic entries for standards are expected to use the standard document identifier as the item label; e.g.

* [[[ref1,ISO 712]]], _Cereals and cereal products -- Determination of moisture content -- Reference method_

By default, the relaton gem is used to look up the reference details for standards known to have online bibliographies. For bibliographic standards to be looked up via relaton, the standard document identifier needs to be encoded in a format recognised by relaton as a key:

• For ISO: ISO(identifier), or any identifier prefixed with ISO

• For IEC: IEC(identifier), or any identifier prefixed with IEC

• For IETF: IETF(identifier) (e.g. IETF(I-D.-burger-xcon-mmodels)), or any identifier prefixed with RFC

• For GB: CN(identifier) (e.g. CN(JB/T 13368-2018))
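The key formats listed above can be recognised with a few string checks. The following Python sketch only mirrors that list; it is not relaton's own logic, and the function name and return shape are hypothetical.

```python
import re

_WRAPPED = re.compile(r"^(ISO|IEC|IETF|CN)\((.+)\)$")

def lookup_source(label):
    """Return (source, identifier); source is None if no lookup applies."""
    m = _WRAPPED.match(label)
    if m:
        prefix, identifier = m.groups()
        # CN(...) is the key format listed for GB standards.
        return ("GB" if prefix == "CN" else prefix), identifier
    if label.startswith("ISO"):
        return "ISO", label
    if label.startswith("IEC"):
        return "IEC", label
    if label.startswith("RFC"):
        return "IETF", label
    return None, label
```

A label that matches none of the formats, such as a bracketed-number reference, is simply not looked up in this sketch.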

The full bibliographic details of the item are screenscraped from the online bibliography and inserted into the XML file (although only the title of the reference is used in rendering).

In addition, if any entries in Terms and Definitions cite the International Electrotechnical Vocabulary (IEV), the IEV Electropedia termbank is queried during validation, to confirm that the cited entries are the same as what is cited online; those queries are routed through the iev gem.

The results of all relaton searches done to date, across all documents, are cached in the global cache file ~/.relaton/cache, so they do not need to be re-fetched each time a document is processed. (The web query takes a few seconds per reference.)

The results of all relaton searches done to date in a given directory are stored in the same directory as the current document, by default to the file relaton/cache. (The filename can be overridden in document attributes.) The local cache overrides entries in the global cache, and can be manually edited. The local cache is only used if the :local-cache: or :local-cache-only: document attribute is set.

If the document attribute :no-isobib: is set, the reference details for items are not looked up via relaton, and the relaton caches are not used. If the document attribute :no-isobib-cache: is set, the reference details for items are still looked up via relaton, but the relaton caches are not used.

Any entry in the cache that corresponds to an undated ISO reference fetches its details from the latest available entry on the ISO web site. If the entry is more than 60 days old, it is refetched.
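The cache rules above (local entries override global ones, and undated references older than 60 days are refetched) can be sketched in Python. This is an illustration only, not the relaton implementation: the entry shape (a dict with a `fetched` date), the function names, and the test for "undated" (no `:YYYY` suffix in the key) are assumptions, as is applying the 60-day rule to both caches.

```python
import re
from datetime import date, timedelta

MAX_AGE = timedelta(days=60)

def is_undated(key):
    """Assume a dated reference carries a ":YYYY" suffix, e.g. "ISO 712:2009"."""
    return re.search(r":\d{4}", key) is None

def lookup(key, global_cache, local_cache=None, today=None, local_only=False):
    """Return a usable cache entry, or None if the caller should fetch online."""
    today = today or date.today()
    caches = []
    if local_cache is not None:
        caches.append(local_cache)   # local entries take precedence
    if not local_only:
        caches.append(global_cache)
    for cache in caches:
        entry = cache.get(key)
        if entry is None:
            continue
        if is_undated(key) and today - entry["fetched"] > MAX_AGE:
            continue                 # stale undated entry: do not use it
        return entry
    return None
```

Passing `local_cache=None` corresponds to neither :local-cache: nor :local-cache-only: being set, and `local_only=True` corresponds to :local-cache-only:.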

The results of all iev searches done to date across all documents are cached in the global cache file ~/iev.pstore, and the results of all iev searches done to date for the current document are stored in the same directory as the current document, in the file (filename).iev.pstore.

## Document Attributes

The gem relies on Asciidoctor document attributes to provide necessary metadata about the document. These include:

:nodoc:

Do not generate Word and HTML output, only generate XML output. Can be used as a command-line option (like all other document attributes): asciidoctor -a nodoc -b iso -r "metanorma-iso" a.adoc

:novalid:

Suppress validation.

:flush-caches:

If set, delete and reinitialise the cache of relaton searches.

:no-isobib:

If set, do not use the relaton or iev gem functionality to look up ISO and IEV references online, nor the cache of relaton and iev searches.

:no-isobib-cache:

If set, use the relaton and iev gem functionality to look up ISO and IEV references online, but do not use the cache of relaton and iev searches.

:local-cache:

Use the local relaton and iev search caches to override the global relaton and iev search caches. If a directory name is given for the attribute, that name overrides relaton as the cache name.

:local-cache-only:

Use the local relaton and iev search caches to the exclusion of the global relaton and iev search caches. If a directory name is given for the attribute, that name overrides relaton as the cache name.

:i18nyaml:

Name of YAML file of internationalisation text, to use instead of the built-in English, French or Chinese text used to label parts of the document (e.g. "Table", "Foreword", boilerplate text for Normative References, etc.) Use if you wish to output a standard in a language other than those three. A sample YAML file for English, with "Foreword" replaced with "Frontispiece", is available at https://github.com/riboseinc/metanorma-iso/blob/master/spec/examples/english.yaml

:docnumber:

The numeric component of the document identifier (mandatory). The full identifier is formed by prefixing and suffixing this element with other strings derived from metadata.

:edition:

The document edition

:revdate:

The date the document was last updated

:copyright-year:

The year which will be claimed as when the copyright for the document was issued

:library-ics:

The ICS (International Categorization for Standards) number for the standard. There may be more than one ICS for a document; if so, they should be comma-delimited. (The ics identifier is added to the document metadata, but is not output to the current document templates.)

:title:

The title of the document. If not supplied, the built-in Asciidoctor title (first line of document header) is used instead.

:title-XX:

The title of the document in the language XX (presumed to be an ISO 639-1 code).

:doctype:

The document type; e.g. "standard", "guide", "report".

:status:

The status of the document; e.g. "draft", "published".

:technical-committee:

The name of the relevant technical committee

:fullname{_i}:

The full name of a person who is a contributor to the document. A second person is indicated by using a numeric suffix: :fullname:, :fullname_2:, :fullname_3:, etc. (This and the other personal name attributes are not displayed in all standards.)

:surname{_i}:

The surname of a person who is a contributor to the document.

:givenname{_i}:

The given name(s) of a person who is a contributor to the document.

:initials{_i}:

The initial(s) of a person who is a contributor to the document.

:role{_i}:

The role of a person who is a contributor to the document. By default, they are coded as an editor; they can also be represented as an author.

:affiliation{_i}:

The organisational affiliation of a person who is a contributor to the document.

:address{_i}:

The organisational address of a person who is a contributor to the document.

:contributor-uri{_i}:

The URI of a person who is a contributor to the document.

:email{_i}:

The email of a person who is a contributor to the document.

:draft:

The document draft (used in addition to document stage, for multiple iterations: expected format X.Y)

:issued-date:

The date on which the standard was issued (authorised for publication by the issuing authority).

:published-date:

The date on which the standard was published (distributed by the publisher).

:implemented-date:

The date on which the standard became active.

:created-date:

The date on which the first version of the standard was created.

:updated-date:

The date on which the current version of the standard was updated.

:obsoleted-date:

The date on which the standard was obsoleted/revoked.

:confirmed-date:

The date on which the standard was reviewed and approved by the issuing authority.

:unchanged-date:

The date on which the standard was last renewed without any changes in content.

:circulated-date:

The date on which the unpublished standard was last circulated officially as a preprint. For standards, this is associated with the latest transition to a formally defined preparation stage, such as Working Draft or Committee Draft.

:date:

An arbitrary date in the production of the standard. Content of the attribute should be a token, giving the type of date, then space, then the date itself. Multiple dates can be added as :date_2:, :date_3:, etc.

:uri:

The URI to which this standard is published.

:xml-uri:

The URI to which the (Metanorma) XML representation of this standard is published.

:html-uri:

The URI to which the HTML representation of this standard is published.

:pdf-uri:

The URI to which the PDF representation of this standard is published.

:doc-uri:

The URI to which the DOC representation of this standard is published.

:relaton-uri:

The URI to which the Relaton XML representation of this standard is published.

:language:

The language of the document (en or fr). Defaults to en.

:script:

The script of the document (defaults to Latn). Must be supplied as Hans for Simplified Chinese.

:publisher:

The standards agency publishing the standard; can be multiple (comma-delimited). Defaults to ISO.

:body-font:

Font for body text; will be inserted into CSS. Defaults to Cambria for Latin script, SimSun for Simplified Chinese.

:header-font:

Font for headers; will be inserted into CSS. Defaults to Cambria for Latin script, SimHei for Simplified Chinese.

:monospace-font:

Font for monospace; will be inserted into CSS. Defaults to Courier New.

:htmlstylesheet:

SCSS stylesheet to use for HTML output. Defaults to the built-in stylesheet, which adheres to ISO formatting requirements. Recommend against overriding this.

:htmlcoverpage:

HTML template for cover page. Defaults to the built-in template. Recommend against overriding this.

:htmlintropage:

HTML template for introductory section. Defaults to the built-in template. Recommend against overriding this.

:scripts:

Javascript scripts for HTML output. Defaults to the built-in scripts. Recommend against overriding this.

:scripts-pdf:

Javascript scripts for HTML > PDF output. Defaults to the built-in scripts. Recommend against overriding this.

:wordstylesheet:

Primary SCSS stylesheet to use for Word output. Defaults to the built-in stylesheet, which adheres to ISO formatting requirements. Recommend against overriding this.

:standardstylesheet:

Secondary SCSS stylesheet to use for Word output. Defaults to the built-in stylesheet, which adheres to ISO formatting requirements. Recommend against overriding this.

:header:

Header and footer file for Word output. Defaults to the built-in template. Recommend against overriding this.

:wordcoverpage:

Word template for cover page. Defaults to the built-in template. Recommend against overriding this.

:wordintropage:

Word template for introductory section. Defaults to the built-in template. Recommend against overriding this.

:ulstyle:

Word CSS selector for unordered lists in supplied stylesheets. Defaults to the value for the built-in stylesheet. Recommend against overriding this.

:olstyle:

Word CSS selector for ordered lists in supplied stylesheets. Defaults to the value for the built-in stylesheet. Recommend against overriding this.

:data-uri-image:

Encode all images in HTML output as inline data-URIs. Defaults to true.

:smartquotes:

Apply smartquotes and other autoformatting to the XML output (and hence the downstream outputs) (default true). The rules for smart formatting follow the sterile gem, and are given in https://github.com/pbhogan/sterile/blob/master/lib/sterile/data/smart_format_rules.rb. If :smartquotes: is set to false, then the Asciidoctor default is used to generate smart quotes: " ", ' '.
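The numeric-suffix convention used by the personal attributes above (:fullname:, :fullname_2:, and so on) can be sketched as a grouping step. This Python sketch is illustrative, not the gem's code: the function name and the plain-dict input are assumptions; the default role of editor follows the description of :role{_i}: above.

```python
import re

PERSON_FIELDS = ("fullname", "surname", "givenname", "initials", "role",
                 "affiliation", "address", "contributor-uri", "email")

def contributors(attrs):
    """Group personal attributes into one dict per contributor, in order."""
    people = {}
    for name, value in attrs.items():
        m = re.fullmatch(r"(.+?)(?:_(\d+))?", name)
        if m is None:
            continue
        field, index = m.group(1), int(m.group(2) or 1)  # no suffix = person 1
        if field in PERSON_FIELDS:
            # Contributors are coded as editors unless a role is supplied.
            people.setdefault(index, {"role": "editor"})[field] = value
    return [people[i] for i in sorted(people)]
```

Attributes that are not personal attributes, such as docnumber, are ignored by the grouping.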

The attribute :draft:, if present, includes review notes in the XML output; these are otherwise suppressed.

The document proper can reference the values of document attributes, which is convenient for reusability. For example,

This document was prepared by Technical Committee ISO/TC {technical-committee-number}, _{technical-committee}_, Subcommittee SC {subcommittee-number}, _{subcommittee}_.

If the corresponding document attributes are not populated in the header, then the references themselves will not be populated.
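The substitution behaviour can be imitated in a few lines of Python. This is a sketch rather than Asciidoctor's own substitution code: the function name is hypothetical, and it assumes that a reference to an unset attribute is left in the text unchanged.

```python
import re

_REFERENCE = re.compile(r"\{([A-Za-z0-9_][A-Za-z0-9_-]*)\}")

def substitute(text, attrs):
    """Replace {name} with attrs[name]; leave unknown references as they are."""
    def replace(match):
        return attrs.get(match.group(1), match.group(0))
    return _REFERENCE.sub(replace, text)
```

With only technical-committee-number set, the {technical-committee} reference in the sentence above would remain unpopulated in the output.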