| 1 | +--- |
| 2 | +title: Images and Media |
| 3 | +abstract: | |
| 4 | + Defines four OXA node types for visual and media content: `Image` (a block-level still image), `InlineImage` (an inline still image), `Video` (a block-level video or animation), and `InlineVideo` (an inline video). These nodes provide a minimal, URL-based representation of media objects aligned with Markdown, HTML, JATS, and schema.org conventions. |
| 5 | +--- |
| 6 | + |
| 7 | +This RFC introduces four node types (`Image`, `InlineImage`, `Video`, and `InlineVideo`) for representing images and video in OXA documents. Media objects are fundamental to scientific and technical writing — figures, diagrams, plots, animations, and video supplements are integral to how research is communicated and understood. |
| 8 | + |
| 9 | +The design follows the naming convention established in RFC0003 (block-level default, `Inline` prefix for inline variants) and keeps the initial property set deliberately minimal: a URL, an encoding format, and alternative text. Future RFCs may introduce richer media containers (e.g. `Figure` with captions, labels, and numbering) that wrap these primitive media nodes. |
| 10 | + |
| 11 | +## Motivation & Background |
| 12 | + |
| 13 | +Every document format supports embedded media, but the abstraction level varies: |
| 14 | + |
| 15 | +- **Markdown** uses `![alt](url)` for images: simple, inline-capable, no video support |
| 16 | +- **HTML** separates `<img>` (images) from `<video>` (video/animation), with attributes such as `src`, `alt`, `width`, and `height`, and a `type` hint carried by child `<source>` elements |
| 17 | +- **JATS** distinguishes `<graphic>` / `<inline-graphic>` (still images) from `<media>` / `<inline-media>` (video, audio, animations), with `@mimetype`, `@mime-subtype`, and `@xlink:href` |
| 18 | +- **schema.org** models these as [`ImageObject`](https://schema.org/ImageObject) and [`VideoObject`](https://schema.org/VideoObject), subtypes of `MediaObject`, with properties like `contentUrl`, `encodingFormat`, and `caption` |
| 19 | + |
| 20 | +Across these systems, a consistent pattern emerges: |
| 21 | + |
| 22 | +1. **Still images and video/animation are distinct** — they have different rendering requirements, accessibility concerns, and player semantics |
| 23 | +2. **Block and inline placement matter** — a full-width figure image behaves differently from an inline icon or equation graphic |
| 24 | +3. **The core data is a URL and a format** — everything else (captions, labels, sizing, positioning) belongs to the containing structure |
| 25 | + |
| 26 | +OXA follows this pattern by defining four nodes that serve as the primitive media references, separate from the higher-level containers (like `Figure`) that will provide captions, labels, and layout semantics in a future RFC. |
| 27 | + |
| 28 | +### Why Separate Image and Video Types |
| 29 | + |
| 30 | +JATS uses distinct elements for still images (`<graphic>`) and time-based media (`<media>`) because they have fundamentally different rendering and accessibility requirements: |
| 31 | + |
| 32 | +- Images are rendered immediately and completely; videos require player controls, buffering, and temporal navigation |
| 33 | +- Images have a single visual representation; videos have duration, frame rate, and potentially audio tracks |
| 34 | +- Screen readers describe images with alt text; video accessibility involves captions, transcripts, and audio descriptions |
| 35 | + |
| 36 | +While a single "media" node with a MIME type could theoretically cover both, this conflates presentation semantics that tooling needs to distinguish. Separate types make the tree self-describing — a walker can find all images or all videos without inspecting MIME types. |
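As a rough illustration of the last point, the sketch below collects nodes by `type` alone. The `OxaNode` shape is a simplified stand-in assumed for this example (the RFC only states that media nodes extend `Node`), and `collectByType` is a hypothetical helper name, not part of any OXA tooling.

```typescript
// Simplified stand-in for the OXA base node; the real `Node` type may differ.
interface OxaNode {
  type: string;
  url?: string;
  children?: OxaNode[];
}

// Depth-first walk that gathers every node whose `type` matches.
// No MIME type is inspected at any point.
function collectByType(
  root: OxaNode,
  type: string,
  found: OxaNode[] = [],
): OxaNode[] {
  if (root.type === type) found.push(root);
  for (const child of root.children ?? []) {
    collectByType(child, type, found);
  }
  return found;
}
```

With a single generic media node, the same query would need an extra predicate on `encodingFormat`, which is optional and may be absent.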
| 37 | + |
| 38 | +### Why Not `Graphic` and `Media` |
| 39 | + |
| 40 | +JATS uses `<graphic>` and `<media>` — names inherited from SGML-era publishing workflows. OXA prefers `Image` and `Video` because: |
| 41 | + |
| 42 | +- They align with HTML (`<img>`, `<video>`), the dominant rendering target |
| 43 | +- They align with schema.org (`ImageObject`, `VideoObject`), the dominant structured data vocabulary |
| 44 | +- They are immediately understood by developers and authors — `Image` unambiguously means a still picture; `Graphic` could mean a vector illustration, a chart, or a design asset |
| 45 | +- `Media` is overly broad — in JATS it covers video, audio, datasets, and arbitrary binary objects. OXA benefits from precise types — `Image`, `Video`, and in the future `Audio`, as well as computational media types (e.g. interactive visualizations, notebooks, executable figures) that have their own distinct rendering, execution, and accessibility requirements |
| 46 | + |
| 47 | +## Proposed Node Types |
| 48 | + |
| 49 | +### Image |
| 50 | + |
| 51 | +A **block-level** node representing a still image (photograph, diagram, chart, illustration, etc.). |
| 52 | + |
| 53 | +```typescript |
| 54 | +interface Image extends Node { |
| 55 | + type: 'Image'; |
| 56 | + url: string; |
| 57 | + alt?: string; |
| 58 | + encodingFormat?: string; |
| 59 | +} |
| 60 | +``` |
| 61 | + |
| 62 | +**Fields:** |
| 63 | + |
| 64 | +- `url` — the URL or path to the image file. This corresponds to `contentUrl` in schema.org, `@xlink:href` in JATS, and `src` in HTML. URLs may be fully qualified (`https://cdn.example.com/images/fig1.png`) or relative to the document (`figures/scatter.png`). Relative URLs are preferred for portability — they allow the same document tree to be served through different URL resolution strategies at render time. For example, a deployment pipeline may resolve relative paths through a CDN function (e.g. a Cloudflare Worker that maps `figures/scatter.png` to a versioned object in a storage bucket), while a local preview tool resolves them against the filesystem. The document should not embed deployment-specific URL schemes; resolution is a rendering concern. |
| 65 | +- `alt` — alternative text describing the image for accessibility (screen readers) and fallback display. Corresponds to the `alt` attribute in HTML `<img>` and `<alt-text>` in JATS. Alt text should convey the _meaning_ or _purpose_ of the image, not merely describe its visual appearance. |
| 66 | +- `encodingFormat` — the MIME type of the image file (e.g. `"image/png"`, `"image/svg+xml"`, `"image/jpeg"`). Corresponds to `encodingFormat` in schema.org and the combination of `@mimetype` / `@mime-subtype` in JATS. When omitted, the format may be inferred from the URL file extension or HTTP response headers. |
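One way to keep resolution a rendering concern is to hand the renderer a base location and leave `url` untouched in the tree. The sketch below relies on the standard WHATWG `URL` constructor; the helper name `resolveMediaUrl` and both base locations are invented for illustration.

```typescript
// Hypothetical helper: resolve a node's `url` against a base chosen at render time.
// Fully qualified URLs pass through unchanged; relative ones are joined to `base`.
function resolveMediaUrl(url: string, base: string): string {
  return new URL(url, base).href;
}

// Two assumed deployment targets that share the same document tree.
const cdnBase = 'https://cdn.example.com/v42/';
const localBase = 'file:///home/user/project/';

const fromCdn = resolveMediaUrl('figures/scatter.png', cdnBase);
// 'https://cdn.example.com/v42/figures/scatter.png'
const fromDisk = resolveMediaUrl('figures/scatter.png', localBase);
// 'file:///home/user/project/figures/scatter.png'
```

The document stores only `figures/scatter.png`; the choice between CDN and filesystem lives entirely in the caller.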
| 67 | + |
| 68 | +`Image` is a leaf node — it has no `children` or `value`. The image content is external, referenced by `url`. This follows the same pattern as JATS `<graphic>`, where the element is a pointer to external content, not a container for it. |
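The extension-based inference mentioned for `encodingFormat` could look roughly like the following. The extension table is an illustrative subset chosen for this sketch, `inferEncodingFormat` is a hypothetical name, and inference from HTTP response headers is not covered.

```typescript
// Illustrative subset of extensions; a real tool would likely use a fuller registry.
const FORMAT_BY_EXTENSION = new Map<string, string>([
  ['png', 'image/png'],
  ['jpg', 'image/jpeg'],
  ['jpeg', 'image/jpeg'],
  ['svg', 'image/svg+xml'],
  ['mp4', 'video/mp4'],
  ['webm', 'video/webm'],
  ['ogv', 'video/ogg'],
]);

// Guess a MIME type from the file extension of `url`, or return undefined.
function inferEncodingFormat(url: string): string | undefined {
  // Drop any query string or fragment before looking at the extension.
  const path = url.split(/[?#]/)[0] ?? '';
  const ext = /\.([A-Za-z0-9]+)$/.exec(path)?.[1];
  if (!ext) return undefined;
  return FORMAT_BY_EXTENSION.get(ext.toLowerCase());
}
```

An explicit `encodingFormat` on the node should take precedence over any guess of this kind, since extensions can be missing or misleading.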
| 69 | + |
| 70 | +In most documents, `Image` will appear inside a higher-level container such as `Figure` (to be defined in a future RFC) that provides captions, labels, and positioning. A bare `Image` node, one without a containing `Figure`, represents an unlabeled image embedded directly in the document flow, analogous to a Markdown `![alt](url)` not wrapped in a figure directive, or a JATS `<graphic>` appearing directly in `<body>` or `<p>`. |
| 71 | + |
| 72 | +### InlineImage |
| 73 | + |
| 74 | +An **inline** node representing a still image that participates in inline text flow. |
| 75 | + |
| 76 | +```typescript |
| 77 | +interface InlineImage extends Node { |
| 78 | + type: 'InlineImage'; |
| 79 | + url: string; |
| 80 | + alt?: string; |
| 81 | + encodingFormat?: string; |
| 82 | +} |
| 83 | +``` |
| 84 | + |
| 85 | +**Fields** are identical to `Image`. |
| 86 | + |
| 87 | +`InlineImage` is used for small images that appear within prose — icons, inline equations rendered as images, small logos, or decorative glyphs. It corresponds to JATS `<inline-graphic>` and an HTML `<img>` used within a `<p>` or `<span>`. |
| 88 | + |
| 89 | +The distinction between `Image` and `InlineImage` is structural, not visual: `Image` is a block-level node that occupies its own position in the document tree (a sibling of `Paragraph`, `Heading`, etc.), while `InlineImage` is an inline node that appears within the `children` array of a `Paragraph` or other inline container. |
| 90 | + |
| 91 | +:::{tip .dropdown} Why Both `Image` and `InlineImage` |
| 92 | + |
| 93 | +A single `Image` node used in both block and inline positions would be simpler, but it creates real problems for tooling and round-tripping: |
| 94 | + |
| 95 | +1. **Tree validation becomes context-dependent.** With a single type, whether an `Image` is valid depends on _where_ it appears — is it a direct child of the document body (block) or nested inside a `Paragraph` (inline)? Separate types make validity checkable locally: an `InlineImage` inside a `Paragraph` is correct by construction; an `Image` there is a type error. This is the same reason HTML has both block and inline elements rather than making all elements context-dependent. |
| 96 | + |
| 97 | +2. **JATS requires the distinction.** JATS uses `<graphic>` (block) and `<inline-graphic>` (inline) as separate elements. Round-tripping through JATS without losing the block/inline distinction requires that OXA preserve it structurally. A single node with a "placement hint" would need to be inferred during JATS export — fragile and lossy. |
| 98 | + |
| 99 | +3. **Markdown parsing produces the distinction naturally.** CommonMark itself always parses `![alt](url)` as inline content, but when the image is the sole content of a paragraph, Markdown toolchains commonly unwrap the paragraph and treat the image as block-level, while the same syntax mid-sentence stays inline. Parsers already know which case they are in, so encoding that knowledge in the node type is cheaper and more reliable than reconstructing it later. |
| 100 | + |
| 101 | +4. **Renderers need to know without inspecting parents.** A block image may be rendered as a standalone `<figure>` or full-width `<img>` with margin handling. An inline image is rendered as an `<img>` inside a `<span>` with `vertical-align` and constrained sizing. These are different code paths. A renderer visiting an `InlineImage` knows immediately what to do; a renderer visiting a generic `Image` would need to walk up the tree to determine context. |
| 102 | + |
| 103 | +5. **Consistent with the OXA naming convention.** RFC0003 established the `Code` / `InlineCode` pattern precisely for this reason — block and inline variants are structurally different nodes even when they share the same properties. `Image` / `InlineImage` follows the same precedent. |
| 104 | + |
| 105 | +Markdown gets away with a single syntax because it delegates the block/inline distinction to context and renderer heuristics. OXA, as a structured schema, cannot afford that ambiguity — the tree must be self-describing. |
| 106 | + |
| 107 | +::: |
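The local validity check described in point 1 can be sketched as follows. This assumes, for illustration only, that `Paragraph` is the sole inline container and that the base node has the simplified `OxaNode` shape shown; `findMisplacedMedia` is a hypothetical helper.

```typescript
// Simplified stand-in for the OXA base node; the real `Node` type may differ.
interface OxaNode {
  type: string;
  children?: OxaNode[];
}

// Block-level media types defined in this RFC.
const BLOCK_MEDIA = new Set(['Image', 'Video']);

// Report block-level media nodes that sit directly inside a Paragraph.
// Each check looks only at a parent type and a child type, never further up the tree.
function findMisplacedMedia(node: OxaNode, errors: string[] = []): string[] {
  for (const child of node.children ?? []) {
    if (node.type === 'Paragraph' && BLOCK_MEDIA.has(child.type)) {
      errors.push(
        `${child.type} is block-level; use Inline${child.type} inside Paragraph`,
      );
    }
    findMisplacedMedia(child, errors);
  }
  return errors;
}
```

With a single `Image` type there would be nothing for this check to flag, and the block/inline decision would have to be recomputed from context by every consumer.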
| 108 | + |
| 109 | +### Video |
| 110 | + |
| 111 | +A **block-level** node representing a video or animation. |
| 112 | + |
| 113 | +```typescript |
| 114 | +interface Video extends Node { |
| 115 | + type: 'Video'; |
| 116 | + url: string; |
| 117 | + alt?: string; |
| 118 | + encodingFormat?: string; |
| 119 | +} |
| 120 | +``` |
| 121 | + |
| 122 | +**Fields:** |
| 123 | + |
| 124 | +- `url` — the URL or path to the video file. Corresponds to `contentUrl` in schema.org, `@xlink:href` in JATS `<media>`, and `src` in HTML `<video>`. |
| 125 | +- `alt` — alternative text describing the video content for accessibility. For video, alt text should describe what the video shows or demonstrates. Richer video accessibility (captions, transcripts, audio descriptions) is out of scope for this RFC and may be addressed alongside a `Figure` container or dedicated accessibility RFC. |
| 126 | +- `encodingFormat` — the MIME type of the video file (e.g. `"video/mp4"`, `"video/webm"`, `"video/ogg"`). Corresponds to `encodingFormat` in schema.org and `@mimetype` / `@mime-subtype` in JATS. |
| 127 | + |
| 128 | +Like `Image`, `Video` is a leaf node — a pointer to external content. It corresponds to JATS `<media>` with a video MIME type, and schema.org `VideoObject`. |
| 129 | + |
| 130 | +### InlineVideo |
| 131 | + |
| 132 | +An **inline** node representing a video or animation that participates in inline text flow. |
| 133 | + |
| 134 | +```typescript |
| 135 | +interface InlineVideo extends Node { |
| 136 | + type: 'InlineVideo'; |
| 137 | + url: string; |
| 138 | + alt?: string; |
| 139 | + encodingFormat?: string; |
| 140 | +} |
| 141 | +``` |
| 142 | + |
| 143 | +**Fields** are identical to `Video`. |
| 144 | + |
| 145 | +`InlineVideo` is used for small, inline video content — animated icons, short looping demonstrations, or GIF-like animations embedded within prose. It corresponds to JATS `<inline-media>` with a video MIME type. |
| 146 | + |
| 147 | +## Explicitly Deferred |
| 148 | + |
| 149 | +The following concerns are intentionally out of scope for this RFC: |
| 150 | + |
| 151 | +- **Figures** — a container node (`Figure`) that wraps media nodes with captions, labels, numbering, and positioning semantics. This is a separate structural concern and will be addressed in a dedicated RFC. |
| 152 | +- **Width, height, and sizing** — dimensions, aspect ratios, and responsive sizing are rendering concerns that may be addressed as optional properties in a future RFC or handled by the containing `Figure`. |
| 153 | +- **Alternative formats** — JATS supports `<alternatives>` to provide the same content in multiple formats (e.g. a PNG and an SVG of the same diagram, or AVI and MP4 of the same video). This is a valid concern but adds complexity that should be addressed alongside `Figure`. |
| 154 | +- **Audio** — audio content (podcasts, sound clips, narration) has distinct rendering and accessibility requirements. A future `Audio` / `InlineAudio` node pair may be introduced following the same pattern. |
| 155 | +- **Supplementary material** — JATS distinguishes "integral" media (`<graphic>`, `<media>`) from "supplementary" material (`<supplementary-material>`). This distinction is better handled at the container or document-section level. |
| 156 | +- **Thumbnails and poster images** — `VideoObject` in schema.org supports `thumbnail`; HTML `<video>` supports `poster`. These are rendering hints that may be added as optional properties later. |
| 157 | +- **Embedding and streaming** — `embedUrl` (schema.org) and streaming protocols are out of scope; `url` points to a file, not a player. |
| 158 | +- **Licensing and attribution** — media objects frequently carry their own licenses (e.g. a CC-BY photograph in an otherwise CC-BY-SA document) and authorship distinct from the document's authors. JATS handles this with `<permissions>` and `<attrib>` children on `<graphic>` and `<media>`; schema.org uses `license`, `creator`, and `copyrightHolder` on `MediaObject`. A future RFC will define how licensing, attribution, and provenance metadata attach to nodes — these properties will be designed consistently across all node types that need them (images, videos, figures, code, tables, etc.), not as media-specific fields. |
| 159 | + |
| 160 | +## Examples |
| 161 | + |
| 162 | +### Block-Level Image |
| 163 | + |
| 164 | +> A simple image in the document flow. |
| 165 | + |
| 166 | +Markdown: `![A scatter plot showing correlation between variables X and Y](figures/scatter.png)` |
| 167 | + |
| 168 | +```yaml |
| 169 | +{ |
| 170 | + type: 'Image', |
| 171 | + url: 'figures/scatter.png', |
| 172 | + alt: 'A scatter plot showing correlation between variables X and Y', |
| 173 | +} |
| 174 | +``` |
| 175 | + |
| 176 | +### Inline Image (Icon in Prose) |
| 177 | + |
| 178 | +> Click the settings icon {icon} to configure. |
| 179 | + |
| 180 | +```yaml |
| 181 | +{ |
| 182 | + type: 'Paragraph', |
| 183 | + children: |
| 184 | + [ |
| 185 | + { type: 'Text', value: 'Click the settings icon ' }, |
| 186 | + { |
| 187 | + type: 'InlineImage', |
| 188 | + url: 'icons/settings.svg', |
| 189 | + alt: 'settings icon', |
| 190 | + encodingFormat: 'image/svg+xml', |
| 191 | + }, |
| 192 | + { type: 'Text', value: ' to configure.' }, |
| 193 | + ], |
| 194 | +} |
| 195 | +``` |
| 196 | + |
| 197 | +### Block-Level Video |
| 198 | + |
| 199 | +> A video showing the experimental procedure. |
| 200 | + |
| 201 | +```yaml |
| 202 | +{ |
| 203 | + type: 'Video', |
| 204 | + url: 'supplementary/experiment-v1.mp4', |
| 205 | + alt: 'Video of the droplet formation process under varying pressure conditions', |
| 206 | + encodingFormat: 'video/mp4', |
| 207 | +} |
| 208 | +``` |
| 209 | + |
| 210 | +### Inline Video (Animated Demonstration) |
| 211 | + |
| 212 | +> The particle follows a helical path {animation} under the applied field. |
| 213 | + |
| 214 | +```yaml |
| 215 | +{ |
| 216 | + type: 'Paragraph', |
| 217 | + children: |
| 218 | + [ |
| 219 | + { type: 'Text', value: 'The particle follows a helical path ' }, |
| 220 | + { |
| 221 | + type: 'InlineVideo', |
| 222 | + url: 'animations/helix.webm', |
| 223 | + alt: 'particle tracing a helical path', |
| 224 | + encodingFormat: 'video/webm', |
| 225 | + }, |
| 226 | + { type: 'Text', value: ' under the applied field.' }, |
| 227 | + ], |
| 228 | +} |
| 229 | +``` |
| 230 | + |
| 231 | +### Image with Encoding Format |
| 232 | + |
| 233 | +```yaml |
| 234 | +{ |
| 235 | + type: 'Image', |
| 236 | + url: 'https://example.com/diagram.svg', |
| 237 | + alt: 'System architecture diagram', |
| 238 | + encodingFormat: 'image/svg+xml', |
| 239 | +} |
| 240 | +``` |
| 241 | + |
| 242 | +## Mapping to Existing Formats |
| 243 | + |
| 244 | +| OXA Node | Markdown | HTML | JATS | schema.org | |
| 245 | +| ------------- | ------------------ | ------------------- | ------------------------ | ------------- | |
| 246 | +| `Image`       | `![alt](url)`      | `<img>`             | `<graphic>`              | `ImageObject` | |
| 247 | +| `InlineImage` | `![alt](url)` [^1] | `<img>` (in flow)   | `<inline-graphic>`       | `ImageObject` | |
| 248 | +| `Video` | — | `<video>` | `<media>` (video) | `VideoObject` | |
| 249 | +| `InlineVideo` | — | `<video>` (in flow) | `<inline-media>` (video) | `VideoObject` | |
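The JATS column of this table could be exercised by an exporter along the following lines. This is a minimal sketch, not a complete JATS serializer: `toJats` and `escapeXml` are hypothetical helpers, and namespace declarations for `xlink` are assumed to be handled elsewhere.

```typescript
interface MediaNode {
  type: 'Image' | 'InlineImage' | 'Video' | 'InlineVideo';
  url: string;
  alt?: string;
  encodingFormat?: string;
}

// Element names taken from the node mapping table above.
const JATS_ELEMENT: Record<MediaNode['type'], string> = {
  Image: 'graphic',
  InlineImage: 'inline-graphic',
  Video: 'media',
  InlineVideo: 'inline-media',
};

function escapeXml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Serialize one media node to a JATS element string.
function toJats(node: MediaNode): string {
  const tag = JATS_ELEMENT[node.type];
  let attrs = ` xlink:href="${escapeXml(node.url)}"`;
  // JATS splits the MIME type into @mimetype and @mime-subtype.
  const [mimetype, subtype] = (node.encodingFormat ?? '').split('/');
  if (mimetype && subtype) {
    attrs += ` mimetype="${escapeXml(mimetype)}" mime-subtype="${escapeXml(subtype)}"`;
  }
  const alt = node.alt ? `<alt-text>${escapeXml(node.alt)}</alt-text>` : '';
  return `<${tag}${attrs}>${alt}</${tag}>`;
}
```

Because the block/inline distinction is already in `type`, the element name is a direct lookup with no inspection of the parent node.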
| 250 | + |
| 251 | +### Property Mapping |
| 252 | + |
| 253 | +| OXA Property | Markdown | HTML | JATS | schema.org | |
| 254 | +| ---------------- | -------- | ------ | ----------------------------- | ----------------- | |
| 255 | +| `url` | `(url)` | `src` | `@xlink:href` | `contentUrl` | |
| 256 | +| `alt` | `[alt]` | `alt` | `<alt-text>` | `description`[^2] | |
| 257 | +| `encodingFormat` | — | `type` | `@mimetype` + `@mime-subtype` | `encodingFormat` | |
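For the HTML column, a renderer might apply the property mapping roughly as below. Several choices here are assumptions made for the sketch rather than anything this RFC specifies: `toHtml` is a hypothetical helper, the `type` hint is placed on a child `<source>` element, `controls` is always emitted, and because `<video>` has no `alt` attribute the alt text is emitted as fallback content.

```typescript
interface MediaNode {
  type: 'Image' | 'InlineImage' | 'Video' | 'InlineVideo';
  url: string;
  alt?: string;
  encodingFormat?: string;
}

function escapeHtml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Render one media node to an HTML string.
function toHtml(node: MediaNode): string {
  const src = escapeHtml(node.url);
  if (node.type === 'Image' || node.type === 'InlineImage') {
    // `alt` maps directly onto the <img> attribute of the same name.
    const alt = node.alt !== undefined ? ` alt="${escapeHtml(node.alt)}"` : '';
    return `<img src="${src}"${alt}>`;
  }
  // `encodingFormat` becomes the `type` hint on <source>.
  const type = node.encodingFormat
    ? ` type="${escapeHtml(node.encodingFormat)}"`
    : '';
  const fallback = node.alt ? escapeHtml(node.alt) : '';
  return `<video controls><source src="${src}"${type}>${fallback}</video>`;
}
```

Block versus inline layout (margins, `vertical-align`, sizing) is left to the caller, which can branch on `node.type` without looking at the surrounding tree.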
| 258 | + |
| 259 | +## Implications |
| 260 | + |
| 261 | +If accepted, this RFC: |
| 262 | + |
| 263 | +- Introduces `Image`, `InlineImage`, `Video`, and `InlineVideo` as standard OXA node types |
| 264 | +- Establishes the minimal property set (`url`, `alt`, `encodingFormat`) for media references |
| 265 | +- Provides a clear mapping path from Markdown, HTML, JATS, and schema.org |
| 266 | +- Creates the primitive media nodes that a future `Figure` RFC can wrap with captions, labels, and layout semantics |
| 267 | +- Follows the block/inline naming convention from RFC0003 |
| 268 | + |
| 269 | +## Decision |
| 270 | + |
| 271 | +Acceptance of this RFC establishes the media vocabulary for OXA schemas, providing the building blocks for representing visual and video content in a structured, interoperable way. |
| 272 | + |
| 273 | +## References |
| 274 | + |
| 275 | +- **JATS `<graphic>`** — <https://jats.nlm.nih.gov/archiving/tag-library/1.3/element/graphic.html> |
| 276 | +- **JATS `<inline-graphic>`** — <https://jats.nlm.nih.gov/archiving/tag-library/1.3/element/inline-graphic.html> |
| 277 | +- **JATS `<media>`** — <https://jats.nlm.nih.gov/archiving/tag-library/1.3/element/media.html> |
| 278 | +- **JATS `<inline-media>`** — <https://jats.nlm.nih.gov/archiving/tag-library/1.3/element/inline-media.html> |
| 279 | +- **schema.org `ImageObject`** — <https://schema.org/ImageObject> |
| 280 | +- **schema.org `VideoObject`** — <https://schema.org/VideoObject> |
| 281 | +- **HTML `<img>`** — <https://developer.mozilla.org/en-US/docs/Web/HTML/Element/img> |
| 282 | +- **HTML `<video>`** — <https://developer.mozilla.org/en-US/docs/Web/HTML/Element/video> |
| 283 | +- **CommonMark Images** — <https://spec.commonmark.org/0.31.2/#images> |
| 284 | + |
| 285 | +[^1]: Markdown does not syntactically distinguish block and inline images; the same `![alt](url)` syntax is used in both contexts. The block vs. inline distinction is determined by the parser based on whether the image is the sole content of a paragraph. |
| 286 | + |
| 287 | +[^2]: schema.org `ImageObject` does not have a dedicated `alt` property. The closest mapping is `description` (from `Thing`). The `caption` property on `ImageObject` serves a different purpose — it is a visible caption, not accessibility alt text. |