Commit 4cc07f1

✨ Modular Metadata
1 parent 898d817 commit 4cc07f1

2 files changed

Lines changed: 296 additions & 0 deletions

content/RFC0006/index.md

Lines changed: 286 additions & 0 deletions
@@ -0,0 +1,286 @@
1+
---
title: Modular Metadata
abstract: |
  Proposes a reserved `metadata` property on every OXA node that carries structured information about the node's content (title, authors, affiliations, funding, licenses, identifiers, and other descriptive data). Metadata is modular, referenceable, and composable: it propagates from parent to child, can be overridden at any level of the tree, and supports cross-references between metadata entries. This RFC establishes the principles and structural conventions for metadata; the specific field definitions are deferred to future RFCs.
---
6+
7+
This RFC introduces a `metadata` property available on every OXA content node. The property provides a structured place for descriptive information — authorship, licensing, identifiers, titles, affiliations, funding, and similar concerns — that applies to the node and, by default, to all of its descendants.
8+
9+
The design is motivated by modular scientific publishing, where individual components of a document (a figure panel, an embedded image, a chapter) may have distinct authorship, licensing, or provenance from the containing document. Rather than requiring all metadata to live at the document root, OXA treats metadata as a contextual property that flows down the tree and can be narrowed or replaced at any node.
10+
11+
This RFC lays out the principles of the approach. It does not define the specific metadata fields (e.g. the shape of an author object or the license vocabulary) — those will be specified in subsequent RFCs that can draw on this structural foundation.
12+
13+
## Motivation
14+
15+
Scientific documents are not monolithic. A single article may contain:
16+
17+
- **Figures** contributed by a collaborator who is not a document author
18+
- **Panels within a figure** created by a subset of the figure's authors
19+
- **Embedded images** sourced from external works with different licenses
20+
- **Chapters** in a collection, each written by different author groups
21+
- **Datasets** with their own DOIs, funders, and data-availability statements
22+
23+
Current document formats handle this poorly. JATS places all metadata in a single `<front>` section at the document level; there is no standard mechanism for per-component authorship or licensing. LaTeX has no native metadata model at all. Pandoc's YAML frontmatter is document-level only.
24+
25+
OXA needs metadata that is **modular** — it must be possible to describe _any_ node in the tree with its own metadata context, independent of the document root.
26+
27+
## Proposal
28+
29+
### The `metadata` Property
30+
31+
Every OXA node MAY include a `metadata` property. This is a reserved key, distinct from `data` (the general extension bucket defined in RFC0002). Where `data` is an unstructured escape hatch for tool-specific or experimental fields, `metadata` is a structured, well-defined space for descriptive information about the node's content.
32+
33+
```typescript
interface Node {
  type: string;
  children?: Node[];
  value?: string;
  data?: Record<string, unknown>;
  metadata?: Metadata;
}
```
42+
43+
The `Metadata` type will be defined in detail by subsequent RFCs. For the purposes of this RFC, it is an object that may contain fields such as:
44+
45+
```typescript
interface Metadata {
  title?: InlineContent[];
  subtitle?: InlineContent[];
  authors?: (AuthorData | MetadataReference)[];
  license?: LicenseData | MetadataReference;
  identifiers?: Record<string, string>;
  affiliations?: (AffiliationData | MetadataReference)[];
  funding?: (FundingData | MetadataReference)[];
  // ... additional fields defined by future RFCs
}
```
57+
58+
### Metadata Context and Propagation
59+
60+
The top-level node in a tree establishes the **metadata context** for all of its descendants. Children inherit the parent's metadata unless they provide their own `metadata` property, in which case the child's metadata becomes the new context for that subtree.
61+
62+
This is analogous to how the JATS `<front>` section describes the document — but generalized to any node in the tree.
63+
64+
```yaml
type: Document
metadata:
  title:
    - type: Text
      value: 'Seismic Observations of the 2024 Noto Peninsula Earthquake'
  authors:
    - identifier: rowan
      name: Rowan Cockett
      orcid: 0000-0002-7859-8394
    - identifier: tracy
      name: Tracy K. Teal
      orcid: 0000-0002-9180-9598
  license:
    id: CC-BY-4.0
children:
  - type: Heading
    level: 1
    children:
      - type: Text
        value: 'Introduction'
  - type: Paragraph
    children:
      - type: Text
        value: 'This document demonstrates modular metadata...'
  - type: Image
    src: 'https://example.com/seismic-map.png'
    metadata:
      authors:
        - xref: '@rowan'
          roles:
            - Visualization
      license:
        id: CC-BY-4.0
```
99+
100+
In this example:
101+
102+
- The `Document` node establishes authorship and licensing for the entire tree.
103+
- The `Image` node overrides the metadata context: it credits a specific author with a specific role, and declares its own license. The `Heading` and `Paragraph` nodes inherit the document-level metadata.
104+
- The image's author entry uses a cross-reference (`xref: '@rowan'`) to point back to the full author definition in the document metadata, rather than duplicating the data.
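The propagation rule described above can be sketched as a small tree walk. This is an illustrative sketch only: `OxaNode`, `OxaMetadata`, and `collectContexts` are hypothetical names, the metadata shape is deliberately loose because the RFC defers field definitions, and the walk assumes the full-replacement behaviour proposed in this RFC.

```typescript
// Hypothetical, simplified shapes; the RFC defers the real metadata fields.
type OxaMetadata = Record<string, unknown>;

interface OxaNode {
  type: string;
  children?: OxaNode[];
  metadata?: OxaMetadata;
}

// Record the metadata context that applies to every node in the tree.
// A node with its own `metadata` replaces the inherited context for its subtree.
function collectContexts(
  node: OxaNode,
  inherited?: OxaMetadata,
  out: Map<OxaNode, OxaMetadata | undefined> = new Map(),
): Map<OxaNode, OxaMetadata | undefined> {
  const context = node.metadata ?? inherited;
  out.set(node, context);
  for (const child of node.children ?? []) {
    collectContexts(child, context, out);
  }
  return out;
}

const paragraph: OxaNode = { type: 'Paragraph' };
const image: OxaNode = {
  type: 'Image',
  metadata: { authors: [{ xref: '@rowan' }], license: { id: 'CC-BY-4.0' } },
};
const doc: OxaNode = {
  type: 'Document',
  metadata: { authors: [{ identifier: 'rowan' }], license: { id: 'CC-BY-4.0' } },
  children: [paragraph, image],
};

const contexts = collectContexts(doc);
console.log(contexts.get(paragraph) === doc.metadata); // true: inherited from Document
console.log(contexts.get(image) === image.metadata); // true: the Image's own context
```

Because the walk passes the current context down by reference, a consumer can tell inherited metadata from overridden metadata with a simple identity check.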
105+
106+
### Principles
107+
108+
#### 1. Modular
109+
110+
Metadata can be attached to any node. A figure, a panel within a figure, a chapter, an embedded dataset — any node that needs its own descriptive context can carry `metadata`. This supports modular science, where components are authored, licensed, and identified independently.
111+
112+
**Example:** A figure composed of four panels, where panel (b) was created by a different research group:
113+
114+
```yaml
type: Figure
metadata:
  authors:
    - xref: '@rowan'
    - xref: '@tracy'
children:
  - type: Image
    src: 'panel-a.png'
  - type: Image
    src: 'panel-b.png'
    metadata:
      authors:
        - identifier: external-collab
          name: J. Martinez
          orcid: 0000-0001-2345-6789
          affiliations:
            - name: Universidad Nacional
      license:
        id: CC-BY-SA-4.0
  - type: Image
    src: 'panel-c.png'
  - type: Image
    src: 'panel-d.png'
```
139+
140+
Panels (a), (c), and (d) inherit the figure-level metadata. Panel (b) declares its own authorship and its own license.
141+
142+
#### 2. Referenceable
143+
144+
Metadata entries can be **defined once and referenced elsewhere** in the document. Authors, affiliations, funders, and other entities are given identifiers within the metadata and can be referenced using cross-reference (`xref`) syntax from other metadata sections or from inline content.
145+
146+
**Example:** An author defined in the document metadata and referenced in an acknowledgements section:
147+
148+
```yaml
type: Document
metadata:
  authors:
    - identifier: rowan
      name: Rowan Cockett
      orcid: 0000-0002-7859-8394
children:
  # ... document content ...
  - type: Paragraph
    children:
      - type: Text
        value: 'In this manuscript, '
      - type: CrossReference
        xref: '@rowan'
        kind: Person
        children:
          - type: Text
            value: 'R. C.'
      - type: Text
        value: ' conceived the study and wrote the initial draft.'
```
170+
171+
The `@` prefix distinguishes metadata references from content references: `@rowan` points to a metadata entry such as an author, whereas `#fig1` points to a content node such as a figure. The exact cross-reference mechanics will be defined in a future RFC on cross-references.
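As a rough illustration of the prefix convention, a consumer could classify an `xref` string by its first character. The `XrefTarget` type and `parseXref` function are hypothetical, and treating a missing `#` as a content reference is an assumption, since the cross-reference syntax is left to a future RFC.

```typescript
// Assumed classification, based only on the `@` and `#` prefixes used in this RFC's examples.
type XrefTarget =
  | { namespace: 'metadata'; identifier: string }
  | { namespace: 'content'; identifier: string };

function parseXref(xref: string): XrefTarget {
  if (xref.startsWith('@')) {
    return { namespace: 'metadata', identifier: xref.slice(1) };
  }
  // Assumption: a leading `#` is optional for content references.
  const identifier = xref.startsWith('#') ? xref.slice(1) : xref;
  return { namespace: 'content', identifier };
}

console.log(parseXref('@rowan')); // namespace 'metadata', identifier 'rowan'
console.log(parseXref('#fig1')); // namespace 'content', identifier 'fig1'
```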
172+
173+
#### 3. Composable
174+
175+
Metadata references can be **composed** — a new node can reference existing metadata entries while adding or overriding specific fields. This avoids duplication and keeps the source of truth in one place.
176+
177+
**Example:** An image that references an existing author but adds a role specific to this context:
178+
179+
```yaml
type: Image
src: 'visualization.png'
metadata:
  authors:
    - xref: '@rowan'
      roles:
        - Visualization
        - Software
```
189+
190+
The image does not redefine Rowan's name, ORCID, or affiliations — it references the canonical entry and layers on context-specific roles. This composition pattern means that updating the author's ORCID in the document metadata automatically propagates to all references.
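One way a tool might resolve such a composed entry is to look up the canonical definition and layer the local fields on top. The sketch below assumes hypothetical `AuthorEntry` and `AuthorReference` shapes and a pre-built `registry` keyed by identifier; none of these are defined by this RFC.

```typescript
// Hypothetical author shapes; the RFC defers the real field definitions.
interface AuthorEntry {
  identifier: string;
  name?: string;
  orcid?: string;
  roles?: string[];
}

interface AuthorReference {
  xref: string; // e.g. '@rowan'
  roles?: string[];
}

// Resolve a reference against the canonical entry, layering local roles on top.
function resolveAuthor(
  entry: AuthorEntry | AuthorReference,
  registry: Map<string, AuthorEntry>,
): AuthorEntry {
  if (!('xref' in entry)) return entry; // already a full definition
  const canonical = registry.get(entry.xref.replace(/^@/, ''));
  if (!canonical) {
    throw new Error(`Unresolved metadata reference: ${entry.xref}`);
  }
  return entry.roles ? { ...canonical, roles: entry.roles } : { ...canonical };
}

const registry = new Map<string, AuthorEntry>([
  ['rowan', { identifier: 'rowan', name: 'Rowan Cockett', orcid: '0000-0002-7859-8394' }],
]);

const resolved = resolveAuthor(
  { xref: '@rowan', roles: ['Visualization', 'Software'] },
  registry,
);
console.log(resolved.name, resolved.roles);
```

Because the canonical entry is read at resolution time, a change to the ORCID in the registry shows up in every resolved reference without touching the referencing nodes.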
191+
192+
### Metadata Identifiers
193+
194+
All identifiable entries within metadata (authors, affiliations, funders, grants, venues, etc.) carry an `identifier` field. These identifiers:
195+
196+
- MUST be unique across all metadata in the document (i.e. you cannot have an author and an affiliation with the same identifier)
197+
- Need NOT be unique across the content of the document — a section with identifier `csf` and a metadata affiliation with identifier `csf` occupy different namespaces (content and metadata respectively)
198+
- Are referenced using the `@` prefix in cross-references (e.g. `@rowan`, `@csf`)
199+
200+
The `@` prefix is a convention proposed by this RFC to distinguish metadata references from content references. A future cross-reference RFC will formalize the full syntax, including how to disambiguate when content and metadata identifiers overlap.
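The uniqueness rule for metadata identifiers could be checked with a single pass over every metadata block in a document. The sketch below is an assumption-laden illustration: `MetadataEntry`, `MetadataLists`, and `findDuplicateIdentifiers` are hypothetical names, and it only considers list-valued fields such as `authors` and `affiliations`.

```typescript
// Hypothetical shape: each metadata list holds entries that may define an `identifier`.
// Reference-only entries (those with just `xref`) define nothing and are skipped.
type MetadataEntry = { identifier?: string; xref?: string };
type MetadataLists = Record<string, MetadataEntry[]>;

// Return every identifier that is defined more than once across all metadata blocks.
function findDuplicateIdentifiers(blocks: MetadataLists[]): string[] {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  for (const block of blocks) {
    for (const key of Object.keys(block)) {
      for (const entry of block[key] ?? []) {
        if (entry.identifier === undefined) continue;
        if (seen.has(entry.identifier)) duplicates.add(entry.identifier);
        seen.add(entry.identifier);
      }
    }
  }
  return Array.from(duplicates);
}

const documentMetadata: MetadataLists = {
  authors: [{ identifier: 'rowan' }, { identifier: 'csf' }],
  affiliations: [{ identifier: 'csf' }],
};

console.log(findDuplicateIdentifiers([documentMetadata])); // [ 'csf' ]
```

Content identifiers are deliberately not part of this check, since they live in a separate namespace.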
201+
202+
### Title and Subtitle
203+
204+
Titles and subtitles are included in `metadata` as inline content arrays (`InlineContent[]`), allowing rich formatting (e.g. math, emphasis, superscripts in titles).
205+
206+
A node's metadata title and its content are distinct concepts. A figure may have a caption (in its `children`) that differs from the metadata title inherited from the image's original source. Both can coexist:
207+
208+
- The **metadata title** describes the node for indexing, citation, and metadata propagation purposes
209+
- The **content title** (e.g. a caption, heading) is what appears in the rendered document
210+
211+
This distinction is useful when embedding components from external sources. An image sourced from a different publication carries its original metadata title, but the containing figure may present it with a different caption in context.
212+
213+
Metadata titles must remain traversable by tree algorithms so that transformations (e.g. resolving cross-references within a title) can reach them. Because `metadata.title` is an array of inline nodes (the same types that appear in `children`), existing tree walkers can be extended to traverse metadata content with minimal additional complexity.
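A minimal sketch of such an extended walker is shown below. The `WalkableNode` type, the `visit` function, the `includeMetadata` flag, and the `Caption` node type are all hypothetical; whether traversal of metadata should be on by default is one of the open questions in this RFC.

```typescript
// Hypothetical, simplified node shape for illustration.
interface WalkableNode {
  type: string;
  value?: string;
  children?: WalkableNode[];
  metadata?: { title?: WalkableNode[]; subtitle?: WalkableNode[] };
}

// Depth-first visit; `includeMetadata` is a hypothetical opt-out flag.
function visit(
  node: WalkableNode,
  callback: (node: WalkableNode) => void,
  includeMetadata = true,
): void {
  callback(node);
  if (includeMetadata) {
    for (const inline of node.metadata?.title ?? []) visit(inline, callback, includeMetadata);
    for (const inline of node.metadata?.subtitle ?? []) visit(inline, callback, includeMetadata);
  }
  for (const child of node.children ?? []) visit(child, callback, includeMetadata);
}

const figure: WalkableNode = {
  type: 'Figure',
  metadata: { title: [{ type: 'Text', value: 'Original title' }] },
  children: [
    { type: 'Caption', children: [{ type: 'Text', value: 'Caption in context' }] },
  ],
};

const texts: string[] = [];
visit(figure, (n) => {
  if (n.type === 'Text' && n.value) texts.push(n.value);
});
console.log(texts); // [ 'Original title', 'Caption in context' ]
```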
214+
215+
### Unknown and Experimental Metadata
216+
217+
Metadata that does not fit a defined field SHOULD be placed in the node's `data` property (RFC0002), not in `metadata`. The `metadata` property is reserved for structured, well-defined fields specified by RFCs. This keeps `metadata` predictable for tooling while preserving `data` as the extension point for experimental or tool-specific information.
218+
219+
## Relationship to JATS
220+
221+
The document-level `metadata` is analogous to the JATS `<front>` element, which contains `<article-meta>` with title, authors, affiliations, funding, licenses, and identifiers. Future RFCs that define the specific metadata fields SHOULD aim for mostly lossless mapping to and from JATS `<front>`, with the understanding that some JATS elements may be omitted where open alternatives exist (e.g. preferring ROR over Ringold for organization identifiers, or ORCID over proprietary author IDs)[^jats-lossless].
222+
223+
[^jats-lossless]: There are elements of JATS that we may choose to not include in this metadata, for example, support for non-open identifiers that have open alternatives (e.g. Ringold).
224+
225+
The key difference from JATS is that OXA metadata is not restricted to the document root. Any node can carry `metadata`, enabling per-component attribution and licensing that JATS does not natively support.
226+
227+
| Concern           | JATS                                                | OXA                                |
| ----------------- | --------------------------------------------------- | ---------------------------------- |
| Metadata scope    | Document-level only (`<front>`)                     | Any node in the tree               |
| Author per-figure | Not natively supported                              | `metadata.authors` on any node     |
| License per-asset | `<license>` in `<permissions>`, document-level only | `metadata.license` on any node     |
| Identifiers       | `<article-id>`, fixed vocabulary                    | `metadata.identifiers`, extensible |
| Extension         | Custom XML elements or processing instructions      | `data` property (RFC0002)          |
234+
235+
## Alternatives Considered
236+
237+
### Backmatter Node
238+
239+
We considered a `Backmatter` block-level node that would live exactly once as the last child of any tree and contain contributor definitions, affiliations, funding information, and supporting sections (data availability, acknowledgements, etc.).
240+
241+
This approach was rejected because:
242+
243+
- It conflates **metadata** (descriptive information about the content) with **content** (sections like acknowledgements that are part of the narrative). Acknowledgements are content that happens to appear at the end; they belong in the tree as regular nodes, not in a special container.
244+
- It does not support per-component metadata. A `Backmatter` on the document root cannot express that a specific image has different authorship.
245+
- It raises awkward questions about where to define new metadata entries that are first introduced mid-document (e.g. an author who only contributed one figure). With the `metadata` property approach, the author can be defined where they are first relevant — either on the document node (if they should be discoverable at the top level) or on the specific component.
246+
247+
### Metadata on CrossReference Nodes
248+
249+
An alternative for mid-document author definitions would be to allow `CrossReference` nodes to carry `metadata` that defines new entries inline:
250+
251+
```yaml
type: CrossReference
metadata:
  authors:
    - identifier: someone
      name: 'A. Helpful Person'
xref: '@someone'
children:
  - type: Text
    value: 'Person'
```
262+
263+
While this works mechanically, it adds complexity to cross-reference semantics — a `CrossReference` would sometimes _define_ metadata rather than just _reference_ it. The simpler approach is to define all metadata entries on the appropriate container node (typically the document root) and reference them from content. This keeps the definition site predictable and avoids special-casing `CrossReference` for metadata propagation.
264+
265+
## Open Questions
266+
267+
- **Identifier scoping:** Should metadata identifiers be required to start with a special prefix (e.g. `person:rowan`, `org:csf`), or is the `@` reference prefix sufficient to prevent conflicts with content identifiers? Starting without type prefixes keeps the syntax lighter, but may need revisiting if collision patterns emerge.
268+
- **Propagation semantics:** When a child node provides `metadata`, does it _replace_ the parent context entirely, or _merge_ with it? Full replacement is simpler and more predictable; merging risks ambiguity about which fields are inherited vs. overridden. This RFC proposes full replacement as the default — a child with `metadata` establishes a new, complete context for its subtree.
269+
- **Metadata field definitions:** The specific shapes of author, affiliation, funding, license, and identifier objects are intentionally deferred. Future RFCs should define these, drawing on JATS, schema.org, DataCite, and CRediT for established vocabularies.
270+
- **Tree traversal of metadata content:** Metadata fields like `title` contain inline node arrays. Should tree-walking algorithms traverse `metadata` by default, or require explicit opt-in? Traversing by default ensures transformations (e.g. resolving cross-references in titles) work transparently, but increases the surface area that algorithms must handle.
271+
- **Inline metadata references:** Can metadata entries be referenced freely from inline content (e.g. an author callout in the acknowledgements)? This RFC proposes yes — a `CrossReference` node with `xref: '@rowan'` can appear anywhere in the document. The rendering of such references (e.g. expanding to the author's full name, linking to their ORCID) is a renderer concern.
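To make the propagation-semantics question above concrete, the two candidate behaviours can be compared side by side. This is a sketch under assumptions: `Context`, `replaceContext`, and `mergeContext` are hypothetical names, the merge shown is a shallow one, and the field values are simplified to strings.

```typescript
type Context = Record<string, unknown>;

// Full replacement (the default this RFC proposes): the child's metadata is the whole context.
function replaceContext(parent: Context | undefined, child: Context | undefined): Context | undefined {
  return child ?? parent;
}

// Shallow merge (the alternative under discussion): child fields override, the rest is inherited.
function mergeContext(parent: Context | undefined, child: Context | undefined): Context | undefined {
  if (!parent || !child) return child ?? parent;
  return { ...parent, ...child };
}

const parentContext: Context = { authors: ['@rowan', '@tracy'], license: 'CC-BY-4.0' };
const childContext: Context = { authors: ['@external-collab'] };

console.log(replaceContext(parentContext, childContext)); // no license: the child context is complete
console.log(mergeContext(parentContext, childContext)); // license inherited from the parent
```

Under full replacement a child that wants to keep the parent's license has to restate it, which is the cost of the simpler and more predictable rule.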
272+
273+
## Implications
274+
275+
If accepted, this RFC:
276+
277+
- Reserves `metadata` as a property on all OXA nodes, alongside `type`, `children`, `value`, and `data`
278+
- Establishes metadata propagation as a core tree semantic: parent metadata applies to children unless overridden
279+
- Introduces the `@` prefix convention for metadata cross-references, to be formalized in a future cross-reference RFC
280+
- Provides the structural foundation for future RFCs to define specific metadata fields (authors, licenses, identifiers, etc.)
281+
- Enables per-component attribution and licensing, supporting modular scientific publishing
282+
- Maintains a clear separation between structured metadata (`metadata`) and unstructured extensions (`data`)
283+
284+
## Decision
285+
286+
Acceptance of this RFC establishes the `metadata` property as a reserved, structured extension point on every OXA node, enabling modular, referenceable, and composable metadata throughout the document tree. Subsequent RFCs will define the specific metadata vocabularies (authorship, licensing, funding, identifiers) within this framework.

content/RFC0006/myst.yml

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
1+
# See docs at: https://mystmd.org/guide/frontmatter
version: 1
extends:
  - ../rfc.yml
project:
  id: fcefbe87-7299-4877-8685-e16ff2862e74
  short_title: Metadata
  date: 2026-04-22
  authors:
    - rowanc1
