Rewrite Hyperaudio JSON export as flat words/paragraphs/sections by maboa · Pull Request #297 · hyperaudio/hyperaudio-lite-editor

maboa · 2026-05-27T14:41:14Z

Summary

Replaces the nested article.section.paragraphs[].spans[] export shape with three flat arrays that are easier for downstream consumers to read, slice, and stream:

{
  "words": [
    { "start": 34.76, "end": 35.4, "text": "Test" }
  ],
  "paragraphs": [
    { "speaker": "SPEAKER_S1", "start": 34.76, "end": 44.68 }
  ],
  "sections": [
    { "start": 0, "end": 239.02, "mediaUrl": "https://example.com/audio.mp3" }
  ]
}

Times are in seconds, and every entry carries start and end.
Speakers live on paragraphs (turns), not on every word. A word's speaker is recoverable by finding the paragraph that contains the word's start.
sections[] captures the source-media context. mediaUrl was previously a top-level jsonData.url field; it is now per-section.
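
A consumer could recover a word's speaker along these lines. This is only a sketch: speakerForWord is a hypothetical helper, not a function in this PR, and it assumes paragraphs[] is sorted by start so that the last paragraph starting at or before the word is the one that contains it.

```javascript
// Hypothetical helper (not part of the PR): look up the speaker for a word.
// Assumes `paragraphs` is sorted by `start`, as the export is described.
function speakerForWord(word, paragraphs) {
  let speaker = null;
  for (const p of paragraphs) {
    // Once a paragraph starts after the word, no later paragraph can match.
    if (p.start > word.start) break;
    speaker = p.speaker;
  }
  return speaker; // null if the word starts before the first paragraph
}

// Illustrative data in the shape shown above; SPEAKER_S2 is made up.
const paragraphs = [
  { speaker: "SPEAKER_S1", start: 34.76, end: 44.68 },
  { speaker: "SPEAKER_S2", start: 44.68, end: 50.0 },
];

console.log(speakerForWord({ start: 34.76, end: 35.4, text: "Test" }, paragraphs));
// → SPEAKER_S1
console.log(speakerForWord({ start: 44.68, end: 45.1, text: "Reply" }, paragraphs));
// → SPEAKER_S2 (a start on a shared boundary goes to the later paragraph)
```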

Implementation notes

htmlToJson walks <section> → <p> → <span data-m>, splits speaker spans (class="speaker") out into paragraph.speaker with brackets stripped, and inherits speaker across paragraphs that have no explicit tag (matching the editor's convention).
jsonToHtml rebuilds the editor HTML using a two-pointer walk to bucket paragraphs into sections and words into paragraphs. Both inputs are sorted by start, and entries whose start falls exactly on a boundary are handled cleanly. Speaker tags are emitted only on speaker change.
Legacy exports (jsonData.article present) still import via a legacyJsonToHtml fallback so older saved files keep working.
exportJson() no longer writes the top-level jsonData.url. importJson() reads from sections[0].mediaUrl, falling back to legacy jsonData.url if present.
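
The two-pointer bucketing described above might look roughly like the following. This is a sketch rather than the code in this PR: bucketWords is a hypothetical name, and it assumes both arrays are sorted by start and that a word starting exactly on a paragraph boundary belongs to the later paragraph.

```javascript
// Hypothetical sketch of a two-pointer walk (not the PR's implementation).
// Assumes `words` and `paragraphs` are each sorted by `start`.
function bucketWords(words, paragraphs) {
  const buckets = paragraphs.map(() => []);
  if (paragraphs.length === 0) return buckets;

  let p = 0; // index of the paragraph currently being filled
  for (const w of words) {
    // Advance while the next paragraph starts at or before this word.
    while (p + 1 < paragraphs.length && paragraphs[p + 1].start <= w.start) {
      p++;
    }
    // Words that start before the first paragraph fall into bucket 0.
    buckets[p].push(w);
  }
  return buckets;
}

// Illustrative data; only the first word comes from the example above.
const words = [
  { start: 34.76, end: 35.4, text: "Test" },
  { start: 35.4, end: 36.0, text: "one" },
  { start: 44.68, end: 45.1, text: "Reply" },
];
const turns = [
  { speaker: "SPEAKER_S1", start: 34.76, end: 44.68 },
  { speaker: "SPEAKER_S2", start: 44.68, end: 50.0 },
];

const buckets = bucketWords(words, turns);
console.log(JSON.stringify(buckets.map((b) => b.map((w) => w.text))));
// → [["Test","one"],["Reply"]]
```

Because neither pointer ever moves backwards, the walk is linear in the combined length of the two arrays. The same pattern would apply to bucketing paragraphs into sections.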

Closes #296.

Test plan

Load the default transcript, click Export Hyperaudio JSON → confirm the file contains three flat arrays with the expected shape.
Import that file with Import Hyperaudio JSON → transcript renders identically (words, speaker labels, and paragraph breaks all preserved).
Edit a word, re-export, re-import → edits round-trip.
Import a legacy JSON export (if available) → still imports via the fallback.
Multi-paragraph transcript with multiple speaker turns → speaker labels appear at each turn boundary on import, inherited correctly within a turn.
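
The first check in the plan (three flat arrays with the expected shape) could be scripted with something like the sketch below. checkExportShape is a hypothetical helper and the rules it applies (numeric start and end, end not before start) are assumptions drawn from the summary, not a schema shipped with this PR.

```javascript
// Hypothetical shape check for an exported file (not part of the PR).
// Returns a list of human-readable problems; an empty list means it passed.
function checkExportShape(data) {
  const problems = [];
  for (const key of ["words", "paragraphs", "sections"]) {
    if (!Array.isArray(data[key])) {
      problems.push(`${key} is not an array`);
      continue;
    }
    data[key].forEach((entry, i) => {
      if (entry === null || typeof entry !== "object") {
        problems.push(`${key}[${i}] is not an object`);
      } else if (typeof entry.start !== "number" || typeof entry.end !== "number") {
        problems.push(`${key}[${i}] lacks numeric start/end`);
      } else if (entry.end < entry.start) {
        problems.push(`${key}[${i}] ends before it starts`);
      }
    });
  }
  return problems;
}

// The example export from the summary.
const exported = {
  words: [{ start: 34.76, end: 35.4, text: "Test" }],
  paragraphs: [{ speaker: "SPEAKER_S1", start: 34.76, end: 44.68 }],
  sections: [{ start: 0, end: 239.02, mediaUrl: "https://example.com/audio.mp3" }],
};

console.log(checkExportShape(exported).length); // → 0
```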

The previous JSON shape mirrored the editor's HTML (article.section.paragraphs[].spans[]), which round-tripped but was awkward for downstream consumers. Replace with three flat arrays:

words[]: { start, end, text } in seconds, in document order.
paragraphs[]: { speaker, start, end } speaker turns. A paragraph without an explicit speaker tag in the HTML inherits the previous paragraph's speaker.
sections[]: { start, end, mediaUrl }, one per <section> in the transcript. mediaUrl comes from the current #hyperplayer src.

jsonToHtml accepts the new shape and rebuilds the editor HTML, emitting speaker tags only on speaker change (matching the editor's convention). Legacy exports (with jsonData.article) still import via a legacyJsonToHtml fallback. The top-level jsonData.url field is no longer written but is still read on import for back-compat.

Closes #296.
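
The speaker convention can be illustrated without the DOM. The sketch below is not the PR's code: resolveSpeakers and tagsToEmit are hypothetical names, the speakerTag field stands in for the text of a class="speaker" span, and the square-bracket tag format is an assumption.

```javascript
// Hypothetical helpers (not part of the PR) illustrating the speaker rules.

// Assumes speaker tags look like "[SPEAKER_S1]"; strips the brackets.
function stripBrackets(tag) {
  return tag.trim().replace(/^\[/, "").replace(/\]$/, "").trim();
}

// Export direction: a paragraph with no tag inherits the previous speaker.
function resolveSpeakers(rawParagraphs) {
  let current = null;
  return rawParagraphs.map(({ speakerTag, start, end }) => {
    if (speakerTag) current = stripBrackets(speakerTag);
    return { speaker: current, start, end };
  });
}

// Import direction: emit a tag only when the speaker changes.
function tagsToEmit(paragraphs) {
  let previous = null;
  return paragraphs.map((p) => {
    const changed = p.speaker !== null && p.speaker !== previous;
    previous = p.speaker;
    return changed ? `[${p.speaker}]` : null;
  });
}

// Illustrative input: the middle paragraph has no explicit tag.
const raw = [
  { speakerTag: "[SPEAKER_S1] ", start: 0, end: 5 },
  { speakerTag: null, start: 5, end: 9 },
  { speakerTag: "[SPEAKER_S2]", start: 9, end: 12 },
];

const resolved = resolveSpeakers(raw);
console.log(JSON.stringify(resolved.map((p) => p.speaker)));
// → ["SPEAKER_S1","SPEAKER_S1","SPEAKER_S2"]
console.log(JSON.stringify(tagsToEmit(resolved)));
// → ["[SPEAKER_S1]",null,"[SPEAKER_S2]"]
```

The two functions are inverses in the sense that matters for round-tripping: tags dropped by inheritance on export are exactly the ones not re-emitted on import.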

maboa merged commit d34fcf4 into main May 27, 2026

This was referenced May 27, 2026

Pretty-print Hyperaudio JSON and drop legacy import path #299

Merged

Pretty-print Hyperaudio JSON and drop legacy import path #300

Open

296 improved json format #301

Merged

maboa merged 1 commit into main from 296-improved-json-format
