Brian Stafford edited this page Oct 22, 2020 · 2 revisions

Thoughts on the ID API

Enforce uniqueness

cmark_node_set_id() should fail if the id already exists in the document. However, an application might combine multiple CommonMark sources to form a single output document, so enforcing uniqueness within one document is necessary but not sufficient. A linked list of ids is probably adequate for the check: a document is expected to contain relatively few ids, so a linear search should be fast enough.
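
As a rough illustration of the linked-list idea, the sketch below keeps a per-document registry of ids and rejects duplicates with a linear search. The names id_registry, id_registry_add() and friends are hypothetical and not part of cmark; a real cmark_node_set_id() would presumably consult something like this before storing the id on the node.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical registry of ids in use within one document (not a cmark type). */
typedef struct id_entry {
    char *id;
    struct id_entry *next;
} id_entry;

typedef struct {
    id_entry *head;
} id_registry;

/* Linear search: returns 1 if the id is already registered, 0 otherwise. */
int id_registry_contains(const id_registry *reg, const char *id) {
    assert(reg != NULL && id != NULL);
    for (const id_entry *e = reg->head; e != NULL; e = e->next) {
        if (strcmp(e->id, id) == 0)
            return 1;
    }
    return 0;
}

/* Returns 1 on success, 0 if the id is a duplicate or allocation fails. */
int id_registry_add(id_registry *reg, const char *id) {
    if (id_registry_contains(reg, id))
        return 0;
    id_entry *e = malloc(sizeof *e);
    if (e == NULL)
        return 0;
    size_t len = strlen(id) + 1;
    e->id = malloc(len);
    if (e->id == NULL) {
        free(e);
        return 0;
    }
    memcpy(e->id, id, len);  /* keep a private copy of the id */
    e->next = reg->head;     /* push onto the front of the list */
    reg->head = e;
    return 1;
}

/* Releases every entry and leaves the registry empty. */
void id_registry_free(id_registry *reg) {
    assert(reg != NULL);
    id_entry *e = reg->head;
    while (e != NULL) {
        id_entry *next = e->next;
        free(e->id);
        free(e);
        e = next;
    }
    reg->head = NULL;
}
```

An application combining several sources could share one registry across all of them, which would extend the same check from a single document to the whole output.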

Limit Nodes Where ID is Valid?

An id would make no sense for text nodes. For XML output it would make sense to allow soft and hard breaks to carry an id, since these are rendered as XML elements, but not for HTML output. It may be easiest to allow an id on every node type, since an unused id is harmless.

Use Case

  • parse document
  • traverse tree to generate TOC - set ids during traversal
  • render document

The application can generate a TOC that references nodes in the document without any special markup in the source document. If an application makes multiple traversals, say for different cross-referencing purposes, it could encounter a node whose id is already set. It should therefore check for an existing id and use that before attempting to set a new one.
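
The "check first, then set" pattern might look like the sketch below. It uses a stand-in demo_node type rather than a real cmark node, and demo_node_get_id(), demo_node_set_id() and demo_node_ensure_id() are hypothetical names. The existence of a getter alongside cmark_node_set_id() is an assumption, and the "<prefix>-<n>" id scheme is purely illustrative.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ID_MAX 32

/* Stand-in for a cmark node; only the id slot is modelled. */
typedef struct {
    char id[ID_MAX];  /* empty string means "no id set" */
} demo_node;

/* Returns the node's id, or NULL if none has been set. */
const char *demo_node_get_id(const demo_node *node) {
    assert(node != NULL);
    return node->id[0] != '\0' ? node->id : NULL;
}

/* Returns 1 on success, 0 if an id is already set or the new id is unusable. */
int demo_node_set_id(demo_node *node, const char *id) {
    assert(node != NULL && id != NULL);
    if (node->id[0] != '\0' || id[0] == '\0' || strlen(id) >= ID_MAX)
        return 0;
    strcpy(node->id, id);
    return 1;
}

/* Reuses an existing id; otherwise assigns "<prefix>-<n>" and bumps the
 * caller's counter. Returns the node's id, or NULL on failure. */
const char *demo_node_ensure_id(demo_node *node, const char *prefix,
                                unsigned *counter) {
    assert(prefix != NULL && counter != NULL);
    const char *existing = demo_node_get_id(node);
    if (existing != NULL)
        return existing;  /* an earlier traversal already tagged this node */

    char buf[ID_MAX];
    int n = snprintf(buf, sizeof buf, "%s-%u", prefix, *counter + 1);
    if (n < 0 || (size_t)n >= sizeof buf)
        return NULL;      /* generated id would not fit */
    if (!demo_node_set_id(node, buf))
        return NULL;
    ++*counter;
    return node->id;
}
```

With this helper, a TOC pass calling demo_node_ensure_id(&heading, "toc", &n) and a later cross-reference pass calling it with a different prefix would both end up referring to the id chosen by the first pass.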

This gives greater flexibility than automatically assigning ids to heading elements. For example, an application might prefer to tag the first paragraph following a heading, and/or skip the heading if it is not immediately followed by a paragraph. Say it is putting XML documents in an archive and wants to retrieve text for a summary later.

Immutable

cmark_node_set_id() should fail when attempting to change the id of a node (this is automatic if the above is implemented). This is problematic when combining two documents in which the same id exists. An application should probably ensure that it never generates the same id more than once across all the documents it processes. Ids taken from user input should be verified for uniqueness.
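
One possible way for an application to avoid generating the same id twice across documents is to scope each generated id with a per-document tag. The sketch below is only an illustration of that idea. make_scoped_id() is a hypothetical helper, and it assumes the application assigns each document a distinct tag.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "<doc_tag>-<serial>" into out.
 * Returns 1 on success, 0 if the tag is empty or the result does not fit;
 * on failure out is left as an empty string where possible. */
int make_scoped_id(char *out, size_t out_size, const char *doc_tag,
                   unsigned serial) {
    assert(out != NULL && doc_tag != NULL);
    if (out_size == 0)
        return 0;
    if (strlen(doc_tag) == 0) {
        out[0] = '\0';
        return 0;
    }
    int n = snprintf(out, out_size, "%s-%u", doc_tag, serial);
    if (n < 0 || (size_t)n >= out_size) {
        out[0] = '\0';  /* do not hand back a truncated id */
        return 0;
    }
    return 1;
}
```

Because the serial is always the run of digits after the last hyphen, two ids built this way can only collide if both the tag and the serial match. Ids that come from user input would still need an explicit uniqueness check.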

Updated Spec.

If CommonMark is later modified so that the source document can set ids explicitly, the above strategy remains safe: application traversals will find the existing id, and the modified parser would presumably use the cmark_node_set_id() API internally. When combining CommonMark documents to produce a single output document, immutability of ids could create conflicts; however, the same applies if an author explicitly marks up the same id twice in one document.
