Commit f002407

🌠 Images and Videos
1 parent 898d817 commit f002407

2 files changed: +297 -0 lines changed

content/RFC0006/index.md

Lines changed: 287 additions & 0 deletions
@@ -0,0 +1,287 @@
---
title: Images and Media
abstract: |
  Defines four OXA node types for visual and media content: `Image` (a block-level still image), `InlineImage` (an inline still image), `Video` (a block-level video or animation), and `InlineVideo` (an inline video). These nodes provide a minimal, URL-based representation of media objects aligned with Markdown, HTML, JATS, and schema.org conventions.
---

This RFC introduces four node types (`Image`, `InlineImage`, `Video`, and `InlineVideo`) for representing images and video in OXA documents. Media objects are fundamental to scientific and technical writing: figures, diagrams, plots, animations, and video supplements are integral to how research is communicated and understood.

The design follows the naming convention established in RFC0003 (block-level default, `Inline` prefix for inline variants) and keeps the initial property set deliberately minimal: a URL, an encoding format, and alternative text. Future RFCs may introduce richer media containers (e.g. `Figure` with captions, labels, and numbering) that wrap these primitive media nodes.

## Motivation & Background

Most document formats support embedded media, but the abstraction level varies:

- **Markdown** uses `![alt](url)` for images: simple, inline-capable, no video support
- **HTML** separates `<img>` (images) from `<video>` (video/animation), with attributes for `src`, `alt`, `type`, `width`, `height`
- **JATS** distinguishes `<graphic>` / `<inline-graphic>` (still images) from `<media>` / `<inline-media>` (video, audio, animations), with `@mimetype`, `@mime-subtype`, and `@xlink:href`
- **schema.org** models these as [`ImageObject`](https://schema.org/ImageObject) and [`VideoObject`](https://schema.org/VideoObject), subtypes of `MediaObject`, with properties like `contentUrl`, `encodingFormat`, and `caption`

Across these systems, a consistent pattern emerges:

1. **Still images and video/animation are distinct**: they have different rendering requirements, accessibility concerns, and player semantics
2. **Block and inline placement matter**: a full-width figure image behaves differently from an inline icon or equation graphic
3. **The core data is a URL and a format**: everything else (captions, labels, sizing, positioning) belongs to the containing structure

OXA follows this pattern by defining four nodes that serve as the primitive media references, separate from the higher-level containers (like `Figure`) that will provide captions, labels, and layout semantics in a future RFC.

### Why Separate Image and Video Types

JATS uses distinct elements for still images (`<graphic>`) and time-based media (`<media>`) because they have fundamentally different rendering and accessibility requirements:

- Images are rendered immediately and completely; videos require player controls, buffering, and temporal navigation
- Images have a single visual representation; videos have duration, frame rate, and potentially audio tracks
- Screen readers describe images with alt text; video accessibility involves captions, transcripts, and audio descriptions

While a single "media" node with a MIME type could theoretically cover both, it would conflate presentation semantics that tooling needs to distinguish. Separate types make the tree self-describing: a walker can find all images or all videos without inspecting MIME types.

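The "walker" claim above can be illustrated with a short sketch. The `OxaNode` shape, the `Document` root type, and the `collectByType` helper are assumptions made for this example only; the real OXA `Node` base type is defined elsewhere.

```typescript
// Minimal node shape assumed for illustration (not the normative OXA `Node`).
interface OxaNode {
  type: string;
  children?: OxaNode[];
  [key: string]: unknown;
}

// Depth-first walk that collects every node whose `type` is listed in `types`.
// No MIME type is inspected: the node type alone identifies the media kind.
function collectByType(root: OxaNode, types: string[]): OxaNode[] {
  const found: OxaNode[] = [];
  const visit = (node: OxaNode): void => {
    if (types.indexOf(node.type) !== -1) found.push(node);
    for (const child of node.children || []) visit(child);
  };
  visit(root);
  return found;
}

// Hypothetical document tree; `Document` is an assumed root type.
const doc: OxaNode = {
  type: 'Document',
  children: [
    { type: 'Image', url: 'figures/scatter.png' },
    {
      type: 'Paragraph',
      children: [
        { type: 'Text', value: 'See ' },
        { type: 'InlineVideo', url: 'animations/helix.webm' },
      ],
    },
  ],
};

const videos = collectByType(doc, ['Video', 'InlineVideo']);
const images = collectByType(doc, ['Image', 'InlineImage']);
```

With a single generic media node, the same query would need a MIME-type lookup (and a fallback when `encodingFormat` is omitted) at every node.
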
### Why Not `Graphic` and `Media`

JATS uses `<graphic>` and `<media>`, names inherited from SGML-era publishing workflows. OXA prefers `Image` and `Video` because:

- They align with HTML (`<img>`, `<video>`), the dominant rendering target
- They align with schema.org (`ImageObject`, `VideoObject`), the dominant structured data vocabulary
- They are immediately understood by developers and authors: `Image` unambiguously means a still picture, whereas `Graphic` could mean a vector illustration, a chart, or a design asset
- `Media` is overly broad: in JATS it covers video, audio, datasets, and arbitrary binary objects. OXA benefits from precise types: `Image`, `Video`, and in the future `Audio`, as well as computational media types (e.g. interactive visualizations, notebooks, executable figures) that have their own distinct rendering, execution, and accessibility requirements

## Proposed Node Types

### Image

A **block-level** node representing a still image (photograph, diagram, chart, illustration, etc.).

```typescript
interface Image extends Node {
  type: 'Image';
  url: string;
  alt?: string;
  encodingFormat?: string;
}
```

**Fields:**

- `url`: the URL or path to the image file. This corresponds to `contentUrl` in schema.org, `@xlink:href` in JATS, and `src` in HTML. URLs may be fully qualified (`https://cdn.example.com/images/fig1.png`) or relative to the document (`figures/scatter.png`). Relative URLs are preferred for portability because they allow the same document tree to be served through different URL resolution strategies at render time. For example, a deployment pipeline may resolve relative paths through a CDN function (e.g. a Cloudflare Worker that maps `figures/scatter.png` to a versioned object in a storage bucket), while a local preview tool resolves them against the filesystem. The document should not embed deployment-specific URL schemes; resolution is a rendering concern.
- `alt`: alternative text describing the image for accessibility (screen readers) and fallback display. Corresponds to the `alt` attribute in HTML `<img>` and `<alt-text>` in JATS. Alt text should convey the _meaning_ or _purpose_ of the image, not merely describe its visual appearance.
- `encodingFormat`: the MIME type of the image file (e.g. `"image/png"`, `"image/svg+xml"`, `"image/jpeg"`). Corresponds to `encodingFormat` in schema.org and the combination of `@mimetype` / `@mime-subtype` in JATS. When omitted, the format may be inferred from the URL file extension or HTTP response headers.

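The render-time resolution described for `url` could look roughly like the following. This is a minimal sketch, assuming the renderer picks a base URL and relies on the standard WHATWG `URL` constructor; the `resolveMediaUrl` helper and both base URLs are hypothetical.

```typescript
// Resolve a node's `url` against a base chosen by the renderer.
// Fully qualified URLs are returned unchanged; relative ones are joined to the base.
// Note: the base should end with "/" or its last path segment is replaced.
function resolveMediaUrl(url: string, base: string): string {
  return new URL(url, base).href;
}

// Hypothetical bases: a versioned CDN prefix and a local project directory.
const cdnBase = 'https://cdn.example.com/v42/';
const localBase = 'file:///home/user/paper/';

const onCdn = resolveMediaUrl('figures/scatter.png', cdnBase);
const onDisk = resolveMediaUrl('figures/scatter.png', localBase);
const absolute = resolveMediaUrl('https://cdn.example.com/images/fig1.png', cdnBase);
```

The document tree stores only `figures/scatter.png` in all three cases; the deployment-specific prefix lives entirely in the renderer's configuration.
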
`Image` is a leaf node: it has no `children` or `value`. The image content is external, referenced by `url`. This follows the same pattern as JATS `<graphic>`, where the element is a pointer to external content, not a container for it.

In most documents, `Image` will appear inside a higher-level container such as `Figure` (to be defined in a future RFC) that provides captions, labels, and positioning. A bare `Image` node (one without a containing `Figure`) represents an unlabeled image embedded directly in the document flow, analogous to a Markdown `![alt](url)` not wrapped in a figure directive, or a JATS `<graphic>` appearing directly in `<body>` or `<p>`.

### InlineImage

An **inline** node representing a still image that participates in inline text flow.

```typescript
interface InlineImage extends Node {
  type: 'InlineImage';
  url: string;
  alt?: string;
  encodingFormat?: string;
}
```

**Fields** are identical to `Image`.

`InlineImage` is used for small images that appear within prose, such as icons, inline equations rendered as images, small logos, or decorative glyphs. It corresponds to JATS `<inline-graphic>` and an HTML `<img>` used within a `<p>` or `<span>`.

The distinction between `Image` and `InlineImage` is structural, not visual: `Image` is a block-level node that occupies its own position in the document tree (a sibling of `Paragraph`, `Heading`, etc.), while `InlineImage` is an inline node that appears within the `children` array of a `Paragraph` or other inline container.

:::{tip .dropdown} Why Both `Image` and `InlineImage`

A single `Image` node used in both block and inline positions would be simpler, but it creates real problems for tooling and round-tripping:

1. **Tree validation becomes context-dependent.** With a single type, whether an `Image` is valid depends on _where_ it appears: is it a direct child of the document body (block) or nested inside a `Paragraph` (inline)? Separate types make validity checkable locally: an `InlineImage` inside a `Paragraph` is correct by construction; an `Image` there is a type error. This is the same reason HTML has both block and inline elements rather than making all elements context-dependent.

2. **JATS requires the distinction.** JATS uses `<graphic>` (block) and `<inline-graphic>` (inline) as separate elements. Round-tripping through JATS without losing the block/inline distinction requires that OXA preserve it structurally. With a single node type, placement would have to be carried as a "placement hint" or inferred during JATS export, which is fragile and lossy.

3. **Markdown parsing produces the distinction naturally.** CommonMark itself always parses `![alt](url)` as inline content, but an image that is the sole content of a paragraph is conventionally treated as a block-level image (the paragraph is typically unwrapped by renderers), while the same syntax mid-sentence stays inline. Parsers already know which case they are in, so encoding that knowledge in the node type is cheaper and more reliable than reconstructing it later.

4. **Renderers need to know without inspecting parents.** A block image may be rendered as a standalone `<figure>` or full-width `<img>` with margin handling. An inline image is rendered as an `<img>` inside a `<span>` with `vertical-align` and constrained sizing. These are different code paths. A renderer visiting an `InlineImage` knows immediately what to do; a renderer visiting a generic `Image` would need to walk up the tree to determine context.

5. **Consistent with the OXA naming convention.** RFC0003 established the `Code` / `InlineCode` pattern precisely for this reason: block and inline variants are structurally different nodes even when they share the same properties. `Image` / `InlineImage` follows the same precedent.

Markdown gets away with a single syntax because it delegates the block/inline distinction to context and renderer heuristics. OXA, as a structured schema, cannot afford that ambiguity: the tree must be self-describing.

:::

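The "checkable locally" argument can be sketched as a small validator. The `OxaNode` shape and the `findMisplacedMedia` helper are assumptions for illustration, and the check covers only `Paragraph` parents, not every possible inline container.

```typescript
// Minimal node shape assumed for illustration (not the normative OXA `Node`).
interface OxaNode {
  type: string;
  children?: OxaNode[];
  [key: string]: unknown;
}

// Block-level media types from this RFC; they should not sit inside a Paragraph.
const BLOCK_MEDIA = ['Image', 'Video'];

// Returns one message per block-level media node found directly inside a Paragraph.
// Each check looks only at a parent/child pair, never further up the tree.
function findMisplacedMedia(node: OxaNode, errors: string[] = []): string[] {
  for (const child of node.children || []) {
    if (node.type === 'Paragraph' && BLOCK_MEDIA.indexOf(child.type) !== -1) {
      errors.push(`${child.type} is block-level; use Inline${child.type} inside Paragraph`);
    }
    findMisplacedMedia(child, errors);
  }
  return errors;
}

const misplaced = findMisplacedMedia({
  type: 'Paragraph',
  children: [
    { type: 'Text', value: 'Click the settings icon ' },
    { type: 'Image', url: 'icons/settings.svg' },
  ],
});
```

Here `misplaced` holds a single message for the `Image` child; replacing it with `InlineImage` yields an empty list.
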
### Video

A **block-level** node representing a video or animation.

```typescript
interface Video extends Node {
  type: 'Video';
  url: string;
  alt?: string;
  encodingFormat?: string;
}
```

**Fields:**

- `url`: the URL or path to the video file. Corresponds to `contentUrl` in schema.org, `@xlink:href` in JATS `<media>`, and `src` in HTML `<video>`.
- `alt`: alternative text describing the video content for accessibility. For video, alt text should describe what the video shows or demonstrates. Richer video accessibility (captions, transcripts, audio descriptions) is out of scope for this RFC and may be addressed alongside a `Figure` container or dedicated accessibility RFC.
- `encodingFormat`: the MIME type of the video file (e.g. `"video/mp4"`, `"video/webm"`, `"video/ogg"`). Corresponds to `encodingFormat` in schema.org and `@mimetype` / `@mime-subtype` in JATS.

Like `Image`, `Video` is a leaf node, a pointer to external content. It corresponds to JATS `<media>` with a video MIME type, and schema.org `VideoObject`.

### InlineVideo

An **inline** node representing a video or animation that participates in inline text flow.

```typescript
interface InlineVideo extends Node {
  type: 'InlineVideo';
  url: string;
  alt?: string;
  encodingFormat?: string;
}
```

**Fields** are identical to `Video`.

`InlineVideo` is used for small, inline video content, such as animated icons, short looping demonstrations, or GIF-like animations embedded within prose. It corresponds to JATS `<inline-media>` with a video MIME type.

## Explicitly Deferred

The following concerns are intentionally out of scope for this RFC:

- **Figures**: a container node (`Figure`) that wraps media nodes with captions, labels, numbering, and positioning semantics. This is a separate structural concern and will be addressed in a dedicated RFC.
- **Width, height, and sizing**: dimensions, aspect ratios, and responsive sizing are rendering concerns that may be addressed as optional properties in a future RFC or handled by the containing `Figure`.
- **Alternative formats**: JATS supports `<alternatives>` to provide the same content in multiple formats (e.g. a PNG and an SVG of the same diagram, or AVI and MP4 of the same video). This is a valid concern but adds complexity that should be addressed alongside `Figure`.
- **Audio**: audio content (podcasts, sound clips, narration) has distinct rendering and accessibility requirements. A future `Audio` / `InlineAudio` node pair may be introduced following the same pattern.
- **Supplementary material**: JATS distinguishes "integral" media (`<graphic>`, `<media>`) from "supplementary" material (`<supplementary-material>`). This distinction is better handled at the container or document-section level.
- **Thumbnails and poster images**: `VideoObject` in schema.org supports `thumbnail`; HTML `<video>` supports `poster`. These are rendering hints that may be added as optional properties later.
- **Embedding and streaming**: `embedUrl` (schema.org) and streaming protocols are out of scope; `url` points to a file, not a player.
- **Licensing and attribution**: media objects frequently carry their own licenses (e.g. a CC-BY photograph in an otherwise CC-BY-SA document) and authorship distinct from the document's authors. JATS handles this with `<permissions>` and `<attrib>` children on `<graphic>` and `<media>`; schema.org uses `license`, `creator`, and `copyrightHolder` on `MediaObject`. A future RFC will define how licensing, attribution, and provenance metadata attach to nodes. These properties will be designed consistently across all node types that need them (images, videos, figures, code, tables, etc.), not as media-specific fields.

## Examples

### Block-Level Image

> A simple image in the document flow.

Markdown: `![A scatter plot showing correlation between variables X and Y](figures/scatter.png)`

```yaml
{
  type: 'Image',
  url: 'figures/scatter.png',
  alt: 'A scatter plot showing correlation between variables X and Y',
}
```

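How a Markdown front end might arrive at a block `Image` rather than an `InlineImage` can be sketched as follows. The input types are a simplified, mdast-like shape assumed for this example, and `convertParagraph` is a hypothetical helper, not part of any existing parser.

```typescript
// Simplified mdast-like input (assumed shape; real mdast has more fields).
type MdInline =
  | { type: 'text'; value: string }
  | { type: 'image'; url: string; alt?: string };
interface MdParagraph {
  type: 'paragraph';
  children: MdInline[];
}

// OXA output nodes, limited to what this sketch needs.
type OxaInline =
  | { type: 'Text'; value: string }
  | { type: 'InlineImage'; url: string; alt?: string };
type OxaBlock =
  | { type: 'Image'; url: string; alt?: string }
  | { type: 'Paragraph'; children: OxaInline[] };

// A paragraph whose sole child is an image becomes a block-level `Image`;
// any other paragraph keeps its images as `InlineImage` children.
function convertParagraph(p: MdParagraph): OxaBlock {
  const only = p.children.length === 1 ? p.children[0] : undefined;
  if (only && only.type === 'image') {
    return { type: 'Image', url: only.url, alt: only.alt };
  }
  const children = p.children.map((c): OxaInline =>
    c.type === 'image'
      ? { type: 'InlineImage', url: c.url, alt: c.alt }
      : { type: 'Text', value: c.value },
  );
  return { type: 'Paragraph', children };
}

const block = convertParagraph({
  type: 'paragraph',
  children: [{ type: 'image', url: 'figures/scatter.png', alt: 'A scatter plot' }],
});
```

The decision is made once, at parse time, so later stages never need to look at the parent to tell the two cases apart.
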
### Inline Image (Icon in Prose)

> Click the settings icon {icon} to configure.

```yaml
{
  type: 'Paragraph',
  children:
    [
      { type: 'Text', value: 'Click the settings icon ' },
      {
        type: 'InlineImage',
        url: 'icons/settings.svg',
        alt: 'settings icon',
        encodingFormat: 'image/svg+xml',
      },
      { type: 'Text', value: ' to configure.' },
    ],
}
```

### Block-Level Video

> A video showing the experimental procedure.

```yaml
{
  type: 'Video',
  url: 'supplementary/experiment-v1.mp4',
  alt: 'Video of the droplet formation process under varying pressure conditions',
  encodingFormat: 'video/mp4',
}
```

### Inline Video (Animated Demonstration)

> The particle follows a helical path {animation} under the applied field.

```yaml
{
  type: 'Paragraph',
  children:
    [
      { type: 'Text', value: 'The particle follows a helical path ' },
      {
        type: 'InlineVideo',
        url: 'animations/helix.webm',
        alt: 'particle tracing a helical path',
        encodingFormat: 'video/webm',
      },
      { type: 'Text', value: ' under the applied field.' },
    ],
}
```

### Image with Encoding Format

```yaml
{
  type: 'Image',
  url: 'https://example.com/diagram.svg',
  alt: 'System architecture diagram',
  encodingFormat: 'image/svg+xml',
}
```

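When `encodingFormat` is omitted, as in the first example, a consumer may fall back to the file extension. The following is a minimal sketch of that fallback; the `inferEncodingFormat` helper and its small extension table are assumptions for illustration and cover only a handful of formats.

```typescript
// Small, non-exhaustive extension table (assumed for this example).
const EXTENSION_TO_MIME: Record<string, string> = {
  png: 'image/png',
  jpg: 'image/jpeg',
  jpeg: 'image/jpeg',
  svg: 'image/svg+xml',
  mp4: 'video/mp4',
  webm: 'video/webm',
  ogv: 'video/ogg',
};

// Infer a MIME type from the URL's file extension, or return undefined
// when the extension is missing or unknown.
function inferEncodingFormat(url: string): string | undefined {
  // Drop any query string or fragment before looking at the extension.
  const path = url.replace(/[?#].*$/, '');
  const match = /\.([A-Za-z0-9]+)$/.exec(path);
  return match ? EXTENSION_TO_MIME[match[1].toLowerCase()] : undefined;
}

const inferredSvg = inferEncodingFormat('https://example.com/diagram.svg');
const inferredPng = inferEncodingFormat('figures/scatter.png');
```

An explicit `encodingFormat` on the node should take precedence over this guess, since an extension can be missing or misleading.
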
## Mapping to Existing Formats

| OXA Node      | Markdown           | HTML                | JATS                     | schema.org    |
| ------------- | ------------------ | ------------------- | ------------------------ | ------------- |
| `Image`       | `![alt](url)`      | `<img>`             | `<graphic>`              | `ImageObject` |
| `InlineImage` | `![alt](url)` [^1] | `<img>` (in flow)   | `<inline-graphic>`       | `ImageObject` |
| `Video`       | none               | `<video>`           | `<media>` (video)        | `VideoObject` |
| `InlineVideo` | none               | `<video>` (in flow) | `<inline-media>` (video) | `VideoObject` |

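The HTML column of the table can be exercised with a small renderer sketch. The `renderMedia` and `escapeAttr` helpers are hypothetical, and two choices here are assumptions rather than part of this RFC: video `alt` text is emitted as `aria-label`, and `encodingFormat` is emitted as `type` on a nested `<source>` element.

```typescript
type MediaNode = {
  type: 'Image' | 'InlineImage' | 'Video' | 'InlineVideo';
  url: string;
  alt?: string;
  encodingFormat?: string;
};

// Escape characters that are unsafe inside a double-quoted HTML attribute.
function escapeAttr(value: string): string {
  return value.replace(/&/g, '&amp;').replace(/"/g, '&quot;').replace(/</g, '&lt;');
}

// Map an OXA media node to an HTML string: images to <img>, videos to <video>.
function renderMedia(node: MediaNode): string {
  const src = escapeAttr(node.url);
  if (node.type === 'Image' || node.type === 'InlineImage') {
    return `<img src="${src}" alt="${escapeAttr(node.alt || '')}">`;
  }
  // Assumption: MIME type goes on <source type>, alt text on aria-label.
  const type = node.encodingFormat ? ` type="${escapeAttr(node.encodingFormat)}"` : '';
  const label = node.alt ? ` aria-label="${escapeAttr(node.alt)}"` : '';
  return `<video controls${label}><source src="${src}"${type}></video>`;
}

const imgHtml = renderMedia({
  type: 'Image',
  url: 'figures/scatter.png',
  alt: 'A scatter plot showing correlation between variables X and Y',
});
```

Block versus inline placement does not change the element chosen here; a fuller renderer would differ in the wrapper and sizing it applies around the element.
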
### Property Mapping

| OXA Property     | Markdown | HTML                   | JATS                          | schema.org        |
| ---------------- | -------- | ---------------------- | ----------------------------- | ----------------- |
| `url`            | `(url)`  | `src`                  | `@xlink:href`                 | `contentUrl`      |
| `alt`            | `[alt]`  | `alt`                  | `<alt-text>`                  | `description`[^2] |
| `encodingFormat` | none     | `type` (on `<source>`) | `@mimetype` + `@mime-subtype` | `encodingFormat`  |

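The `encodingFormat` row implies a split when exporting to JATS, since JATS stores the two halves of the MIME type in separate attributes. A minimal sketch, assuming a hypothetical `toJatsMime` helper:

```typescript
interface JatsMimeAttributes {
  mimetype: string; // value for JATS @mimetype
  mimeSubtype: string; // value for JATS @mime-subtype
}

// Split "type/subtype" into the two JATS attribute values.
// Returns undefined when the input is not of the form "type/subtype".
function toJatsMime(encodingFormat: string): JatsMimeAttributes | undefined {
  // Ignore any parameters such as "; charset=utf-8".
  const essence = encodingFormat.split(';')[0].trim();
  const slash = essence.indexOf('/');
  if (slash <= 0 || slash === essence.length - 1) return undefined;
  return {
    mimetype: essence.slice(0, slash),
    mimeSubtype: essence.slice(slash + 1),
  };
}

const svgMime = toJatsMime('image/svg+xml');
```

For `image/svg+xml` this yields `mimetype` `image` and `mimeSubtype` `svg+xml`; the reverse direction is a simple join with `/`.
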
## Implications

If accepted, this RFC:

- Introduces `Image`, `InlineImage`, `Video`, and `InlineVideo` as standard OXA node types
- Establishes the minimal property set (`url`, `alt`, `encodingFormat`) for media references
- Provides a clear mapping path from Markdown, HTML, JATS, and schema.org
- Creates the primitive media nodes that a future `Figure` RFC can wrap with captions, labels, and layout semantics
- Follows the block/inline naming convention from RFC0003

## Decision

Acceptance of this RFC establishes the media vocabulary for OXA schemas, providing the building blocks for representing visual and video content in a structured, interoperable way.

## References

- **JATS `<graphic>`**: <https://jats.nlm.nih.gov/archiving/tag-library/1.3/element/graphic.html>
- **JATS `<inline-graphic>`**: <https://jats.nlm.nih.gov/archiving/tag-library/1.3/element/inline-graphic.html>
- **JATS `<media>`**: <https://jats.nlm.nih.gov/archiving/tag-library/1.3/element/media.html>
- **JATS `<inline-media>`**: <https://jats.nlm.nih.gov/archiving/tag-library/1.3/element/inline-media.html>
- **schema.org `ImageObject`**: <https://schema.org/ImageObject>
- **schema.org `VideoObject`**: <https://schema.org/VideoObject>
- **HTML `<img>`**: <https://developer.mozilla.org/en-US/docs/Web/HTML/Element/img>
- **HTML `<video>`**: <https://developer.mozilla.org/en-US/docs/Web/HTML/Element/video>
- **CommonMark Images**: <https://spec.commonmark.org/0.31.2/#images>

[^1]: Markdown does not syntactically distinguish block and inline images; the same `![alt](url)` syntax is used in both contexts. The block vs. inline distinction is determined by the parser based on whether the image is the sole content of a paragraph.

[^2]: schema.org `ImageObject` does not have a dedicated `alt` property. The closest mapping is `description` (from `Thing`). The `caption` property on `ImageObject` serves a different purpose: it is a visible caption, not accessibility alt text.

content/RFC0006/myst.yml

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
# See docs at: https://mystmd.org/guide/frontmatter
version: 1
extends:
  - ../rfc.yml
project:
  id: 85aeebe9-1e26-4293-bfa0-b463c0ce7589
  short_title: Images and Media
  date: 2026-03-26
  authors:
    - rowanc1
