First proposal for a finalized interlinear model #25
Conversation
c1ca763 to 4592089 · Compare
This duplicates data that exists in the sense and adds a secondary reference that would need to be checked (a sense could be moved to a new entry in edge cases). I think the entryId should be excluded from this ref. I think this applies, though probably even less commonly, to the allomorph and grammatical info refs as well. One consideration in however we do cross-extension linking is the need to check for references which are removed.
alex-rawlings-yyc
left a comment
@alex-rawlings-yyc made 1 comment.
Reviewable status: 0 of 6 files reviewed, all discussions resolved (waiting on jasonleenaylor).
src/types/interlinearizer.d.ts line 156 at r2 (raw file):
Previously, jasonleenaylor (Jason Naylor) wrote…
This duplicates data that exists in the sense and adds a secondary reference that would need to be checked (a sense could be moved to a new entry in edge cases). I think the entryId should be excluded from this ref. I think this applies, though probably even less commonly, to the allomorph and grammatical info refs as well.
One consideration in however we do cross-extension linking is the need to check for references which are removed.
Removed the entryIds. I'll add that to up next
🧹 Nitpick comments (3)
src/types/interlinearizer.d.ts (3)
584-773: Consider modeling the `gloss` / sense-ref mutual exclusion in the type.
The doc comments for `TokenAnalysis` (lines 555-558) and `Phrase` (lines 716-718) describe `gloss` and `glossSenseRef`/`senseRef` as alternatives ("alternatively resolves the gloss through…"), but the type allows both to be set simultaneously, and nothing in the invariant list addresses what a consumer should render if both are present. A discriminated union (`{ gloss: MultiString } | { glossSenseRef: SenseRef }` on the relevant slice) would make the either/or constraint explicit and catch ambiguous records at compile time. Left as-is this is a minor ergonomic nit rather than a correctness issue; flagging as optional.
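The discriminated-union suggestion can be sketched as follows. This is a minimal illustration only; `MultiString` and `SenseRef` here are simplified stand-ins for the real declarations in `src/types/interlinearizer.d.ts`.

```typescript
// Simplified stand-ins for the real types.
type MultiString = Record<string, string>;
interface SenseRef {
  senseId: string;
}

// Exactly one of the two gloss sources may be present; setting both is a
// compile-time error because each branch marks the other field as `never`.
type GlossSource =
  | { gloss: MultiString; glossSenseRef?: never }
  | { gloss?: never; glossSenseRef: SenseRef };

const freeForm: GlossSource = { gloss: { en: 'water' } };
const lexiconBacked: GlossSource = { glossSenseRef: { senseId: 'sense-1' } };

// const invalid: GlossSource = {
//   gloss: { en: 'water' },
//   glossSenseRef: { senseId: 'sense-1' },
// }; // compile error: the branches forbid setting both fields
```

Existing field names are kept, so consumers reading only one of the two fields are unaffected; only records that set both stop type-checking.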
452-818: Inconsistent optionality of `status` across analysis/link records undermines the "at most one Approved" invariants.
`AlignmentLink.status` (line 805) is required, but `SegmentAnalysis.status` (line 533), `TokenAnalysis.status` (line 615), and `Phrase.status` (line 747) are all optional. The documented invariants in `TextAnalysis` (lines 461-463, 477-479, 491-496) all key off of `status === Approved`, so absence of `status` is semantically ambiguous: is a `SegmentAnalysis` with no `status` implicitly `Approved`, `Suggested`, or something else entirely? Consumers enforcing the single-Approved invariant will have to invent a default, and different consumers are likely to pick differently. Consider making `status` required on all four shapes (it already is on `AlignmentLink`), or explicitly documenting the default when omitted. Given the invariants, required is probably the safer contract.
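With `status` required, the single-Approved invariant becomes mechanically checkable. A minimal sketch, using the lower-case string-union form of `AssignmentStatus` noted elsewhere in this review and a simplified record shape that stands in for the real interfaces:

```typescript
// Simplified stand-ins; the real AssignmentStatus is a lower-case string union.
type AssignmentStatus = 'approved' | 'suggested' | 'stale';

interface AnalysisRecord {
  id: string;
  status: AssignmentStatus; // required: no ambiguous "missing status" case
}

// Invariant check: at most one approved analysis per record scope.
function hasAtMostOneApproved(records: AnalysisRecord[]): boolean {
  return records.filter((r) => r.status === 'approved').length <= 1;
}

const ok = hasAtMostOneApproved([
  { id: 'a', status: 'approved' },
  { id: 'b', status: 'suggested' },
]); // true

const bad = hasAtMostOneApproved([
  { id: 'a', status: 'approved' },
  { id: 'b', status: 'approved' },
]); // false
```

If `status` stayed optional, this check would need a policy for `undefined`, which is exactly the default different consumers would pick differently.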
767-773: `Phrase.tokenSnapshots` parallel-array invariant is not type-enforceable; consider documenting the expectation more prominently.
When set, `tokenSnapshots` must be the same length as `tokenIds` and index-aligned (lines 768-770). This is invisible to the type system and easy to get wrong during construction (e.g., filtering tokens out of `tokenIds` without filtering `tokenSnapshots`). Worth either (a) calling this out as an explicit "Invariant:" in the TSDoc (matching the style used in `TextAnalysis`), or (b) modeling snapshots alongside ids as `{ tokenId: string; snapshot?: string }[]` so the parallelism is expressed in the type. Minor.
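Option (b) can be sketched like this; `TokenMembership` and `PhraseSketch` are hypothetical names for illustration, not types from the actual declarations.

```typescript
// Hypothetical shape: each membership entry carries its own optional snapshot,
// so ids and snapshots cannot drift out of alignment.
interface TokenMembership {
  tokenId: string;
  snapshot?: string;
}

interface PhraseSketch {
  id: string;
  tokens: TokenMembership[];
}

const phrase: PhraseSketch = {
  id: 'p1',
  tokens: [
    { tokenId: 't1', snapshot: 'yala' },
    { tokenId: 't2', snapshot: 'ku' },
  ],
};

// Filtering keeps each id and its snapshot together automatically; there is
// no second array to forget to filter.
const withoutT2 = phrase.tokens.filter((t) => t.tokenId !== 't2');
```

The trade-off is a slightly heavier wire format than two flat arrays, in exchange for making the misalignment bug unrepresentable.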
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c6264ba8-61c9-449b-9e94-dd5e927f1164
📒 Files selected for processing (2)
cspell.json
src/types/interlinearizer.d.ts
✅ Files skipped from review due to trivial changes (1)
- cspell.json
(Merge of upstream paranext-extension-template history: initial template setup, Vite → webpack build changes, papi API updates, linting/formatting configuration, release workflows, and many dependency bumps, with the usual dependabot and co-author attributions.)
@coderabbitai review
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/types/interlinearizer.d.ts`:
- Around line 48-51: The doc comments in src/types/interlinearizer.d.ts use
enum-style capitalized names (e.g., Approved, Suggested, Stale) but the public
type AssignmentStatus is a lower-case string union; update the comments to use
the exact lower-case string literals (e.g., 'approved', 'suggested', 'stale')
wherever AssignmentStatus or status is mentioned (including the shown block and
the other occurrences around lines 447-465, 611-613, 836-838), and ensure
references to Token.surfaceText and status comparison mention the exact literal
strings to avoid misleading consumers.
- Around line 459-464: The current model allows multiple TokenAnalysis entries
per tokenId but AlignmentEndpoint only references tokenId + morphemeId, making
morpheme ownership ambiguous; update the types so AlignmentEndpoint (and any
other alignment structures) include tokenAnalysisId whenever morphemeId is
present (i.e., require { tokenId, tokenAnalysisId, morphemeId } for
morpheme-level endpoints) or alternatively enforce globally unique Morpheme.id
and document that invariant in TextAnalysis/TokenAnalysis; modify the
declarations around TextAnalysis, TokenAnalysis, AlignmentEndpoint and Morpheme
(and corresponding comments at the mentioned regions) to reflect the chosen
approach so morpheme-level links unambiguously identify the owning
TokenAnalysis.
- Line 336: The current id: string (used by flat analysis/phrase/alignment
references) is ambiguous because segmentId/tokenId may be reused per
book/verse/segment; update the type definitions in
src/types/interlinearizer.d.ts (where id appears around the flat analysis,
phrase and alignment layer types at the spots corresponding to the current ids
near lines ~336 and ~391) to either (a) document and enforce that these ids are
globally unique within the owning InterlinearText (e.g., change the
comment/description to state “unique within InterlinearText”) or (b) expand the
reference shapes to include their parent scope (e.g., include
parentBookId/verseId/segmentId fields alongside tokenId/segmentId) so references
are unambiguous; pick one approach and update the related type declarations and
inline JSDoc comments for InterlinearText, segmentId, and tokenId accordingly.
- Around line 722-755: The tokenIds array currently allows an empty phrase;
change the type of tokenIds in the Phrase-ish interface to a non-empty tuple
(e.g. [string, ...string[]]) to enforce at least one member, and mirror that
change for tokenSnapshots (tokenSnapshots?: [string, ...string[]]) so the
compile-time invariant about matching lengths is preserved; update any code that
constructs or validates phrases (places referencing tokenIds, tokenSnapshots,
and the interface name in src/types/interlinearizer.d.ts) to handle the tuple
shape instead of a plain string[].
- Around line 91-100: Update the documentation to explicitly state that all
character offsets are measured in JavaScript UTF-16 code units: update the
ScriptureRef interface comment for charIndex, the Token documentation that
mentions charStart/charEnd, and the charStart/charEnd property comments
(referencing the Token and Segment usages) to say "UTF-16 code units (JavaScript
string index/ slice semantics)" so the invariant
Segment.baselineText.slice(charStart, charEnd) === surfaceText is accurate and
callers won't assume code points or grapheme clusters.
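The non-empty-tuple suggestion for `tokenIds` can be illustrated in isolation (sketch only; the type alias name is invented for the example):

```typescript
// A tuple with at least one element; the empty array no longer type-checks.
type NonEmptyIds = [string, ...string[]];

const one: NonEmptyIds = ['t1'];
const many: NonEmptyIds = ['t1', 't2', 't3'];
// const none: NonEmptyIds = []; // compile error: target requires 1 element
```

At runtime the value is still an ordinary array, so existing consumers that iterate `tokenIds` need no changes; only construction sites that could produce an empty phrase are affected.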
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c6db2165-cc1c-4883-9c27-b169837cdd53
📒 Files selected for processing (1)
src/types/interlinearizer.d.ts
alex-rawlings-yyc
left a comment
@alex-rawlings-yyc resolved 5 discussions.
Reviewable status: 0 of 6 files reviewed, all discussions resolved (waiting on alex-rawlings-yyc).
…ntinua - Both are non-optional because computation is trivial and keeps our options open for user-defined token boundaries in whitespace languages if ever desired - Updated cspell
…lly exclusive with gloss/sense from lexicon extension, clarify invariance requirement for tokenSnapshots
- `enum` values are inlined by `tsc` at compile time. The Platform.Bible build pipeline uses SWC, which processes files in isolation and cannot inline `enum` values from `.d.ts` declarations.
… it clear that we're working with UTF-16 offset units, define id scope used by flat analysis references, disambiguate morpheme-level alignment endpoints, prevent empty Phrase membership
cdd418b to 9cb9fb2 · Compare
I think specifying UTF-16 for the character offset may be problematic. We will need some logic to prevent splitting surrogate pairs, and saying UTF-32 may not be correct either. We should probably just leave it saying "character" in the model and see how it works out in practice before encoding a solution in the comments.
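The concern is easy to demonstrate: JavaScript string indices count UTF-16 code units, so a naïve offset can land in the middle of a surrogate pair.

```typescript
// '𝄞' (U+1D11E, MUSICAL SYMBOL G CLEF) is outside the BMP, so JavaScript
// stores it as a surrogate pair occupying two UTF-16 code units.
const text = 'a𝄞b';

const codeUnits = text.length;       // 4, not 3
const codePoints = [...text].length; // 3: string iteration is by code point
const splitsPair = text.slice(1, 2) === '\ud834'; // true: a lone high surrogate
```

Whichever unit the model ultimately settles on, any offset-validation logic will need a rule like "never place a boundary between a high and low surrogate."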
alex-rawlings-yyc
left a comment
Sounds good. Reverting
@alex-rawlings-yyc made 1 comment.
Reviewable status: 0 of 6 files reviewed, all discussions resolved (waiting on alex-rawlings-yyc).
alex-rawlings-yyc
left a comment
@alex-rawlings-yyc reviewed 6 files and all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on alex-rawlings-yyc).
imnasnainaec
left a comment
@imnasnainaec reviewed 5 files and all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on alex-rawlings-yyc).
@jasonleenaylor this PR is marked as ready to merge, but I want to make sure that we're actually in agreement about the model's current form before merging.
alex-rawlings-yyc
left a comment
@alex-rawlings-yyc reviewed 3 files and all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on alex-rawlings-yyc).
jasonleenaylor
left a comment
@jasonleenaylor reviewed 6 files and all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on alex-rawlings-yyc).
Interlinear Model — Current vs. Proposed
A proposal for reworking the interlinear model. The current model merges baseline text and analysis into a single tree and references LCM-specific lexical types (allomorphs, grammar / MSA) via raw GUIDs. The proposed model would split text from analysis, move lexical references to the Lexicon extension (`lexicon`) via structured ref types, promote phrases to a first-class entity, and name a single canonical analysis per record scope (token, phrase, segment) while still allowing alternates.
1. Structural shift: one tree → two parallel layers
Current

`Occurrence` carries both the surface text and the link to its analysis. Analyses are reusable objects shared between occurrences. Phrase grouping is a `groupId` field on `AnalysisAssignment`.

Proposed

Text and analysis are peers. The text side is nested (book → segment → token) because that's the natural shape of scripture. The analysis side is flat: each record carries an id reference back to its text-layer counterpart (`segmentId`, `tokenId`). This avoids container types that would exist solely to mirror a parent. Consumers that need segment-local views build `Map<tokenId, …>` and `Map<segmentId, …>` at load time. A `Token` is just surface text; a `TokenAnalysis` carries the gloss / parse and references its token by id. Phrases are peers of token analyses, not a grouping flag.
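The split described above can be sketched in TypeScript. The shapes below follow this document's field names but are illustrative only, not the extension's actual declarations:

```typescript
// Illustrative sketch of the proposed text/analysis split.
interface Token {
  id: string;
  surfaceText: string;
  charStart: number; // zero-based offset into Segment.baselineText
  charEnd: number;   // exclusive
}

interface TokenAnalysis {
  id: string;
  tokenId: string; // id reference back to the text layer
  gloss?: string;
  status: 'approved' | 'suggested' | 'stale';
}

const tokens: Token[] = [
  { id: 't1', surfaceText: 'λόγος', charStart: 0, charEnd: 5 },
];
const tokenAnalyses: TokenAnalysis[] = [
  { id: 'a1', tokenId: 't1', gloss: 'word', status: 'approved' },
];

// Consumers index the flat analysis list by token id at load time.
const byTokenId = new Map<string, TokenAnalysis[]>();
for (const a of tokenAnalyses) {
  const list = byTokenId.get(a.tokenId) ?? [];
  list.push(a);
  byTokenId.set(a.tokenId, list);
}

console.log(byTokenId.get('t1')?.[0].gloss); // → word
```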
Why the `TextAnalysis` wrapper?

The analysis layer is three sibling lists: `segmentAnalyses`, `tokenAnalyses`, `phrases`. They could sit directly on `InterlinearText` — the wrapper adds an indirection hop (`text.analysis.tokenAnalyses` rather than `text.tokenAnalyses`). It earns that hop by making "the analysis" a first-class noun in the model:

- APIs that produce or consume analysis — external analyzers, import / export routines, re-analysis pipelines — can take and return a single `TextAnalysis`. Without the wrapper, those same APIs have to move three parallel arrays around as a tuple or an ad-hoc object.
- Attaching an analysis, wiping analysis for a re-run, or swapping in the result of an external analyzer is `text.analysis = …` or `text.analysis = undefined`. Without the wrapper these become three coordinated assignments, and "no analysis yet" has to be inferred from every array being empty.
- `InterlinearText` reads as `books` (text) and `analysis` (analysis) — two peers. Hoisting the three arrays would blur that distinction: the type would list one nested field plus three sibling arrays with no visual grouping.

(Our current importers — LCM, Paratext, BT Extension — receive text and analysis together at import time, so the "import text first, analyze later" flow isn't load-bearing for imports. It is load-bearing for in-session re-analysis and external-analyzer round-trips.)

What the wrapper intentionally does not buy: multiple competing analyses of the same text. Competing analyses are handled at the record level (multiple `TokenAnalysis` entries per `tokenId`, distinguished by `status`), not by holding multiple `TextAnalysis` objects per `InterlinearText`.

2. Top-level container
| Current | Proposed | Notes |
| --- | --- | --- |
| `Interlinearization` (single-language container) | `InterlinearText` | Renamed; each side of an `InterlinearAlignment` would be an `InterlinearText`. |
| `InterlinearAlignment { source, target, links[] }` | `InterlinearAlignment { source, target, links[] }` | `source` / `target` would become `InterlinearText` rather than `Interlinearization`. |
| — | `TextAnalysis` | New; the analysis-layer wrapper held by `InterlinearText`. |

3. Text layer
| Current | Proposed | Notes |
| --- | --- | --- |
| `AnalyzedBook` (mixed text + analysis) | `Book` | Text-only. |
| `Segment { occurrences: Occurrence[] }` | `Segment { tokens: Token[], baselineText: string }` | `Segment` would be text-only. `baselineText` is required (not optional) — token character offsets are expressed relative to it, so it must be present for the text layer to be interpretable. |
| `Occurrence { surfaceText, writingSystem, type, assignment? }` | `Token { surfaceText, writingSystem, type, charStart, charEnd }` | `charStart` / `charEnd` are zero-based character offsets within the owning `Segment.baselineText` (`charEnd` exclusive). Required for all scripts; essential for scriptio continua languages (Chinese, Thai, Tibetan, Lao, Burmese, …) where token boundaries are not derivable from whitespace. Invariant: `baselineText.slice(charStart, charEnd) === surfaceText`. |
| `OccurrenceType` enum | `TokenType` string literal union | Converted from enum to string literal union (`'word'`, `'punctuation'`). |

4. Analysis layer
| Current | Proposed | Notes |
| --- | --- | --- |
| `Analysis` (reusable, shared) + `AnalysisAssignment` (join) | `TokenAnalysis` | Each token would own its `TokenAnalysis` records (approved plus any competing alternates) — no shared-analysis indirection via a join entity. |
| `AnalysisAssignment.groupId` (string, phrases encoded as shared ids across assignments) | `Phrase` (first-class entity) | Carries `tokenIds`, `gloss`, `senseRef`, `status`, `confidence`. |
| `MorphemeBundle { form, allomorphRef, lexemeRef, senseRef, grammarRef }` | `Morpheme { form, entryRef?, senseRef?, allomorphRef?, grammarRef? }` | `lexemeRef` would be renamed to `entryRef`; allomorph and MSA refs retained — the Lexicon extension would be expected to surface the required types. |
| `AnalysisType` enum (wordform/morph/gloss/punctuation) | (dropped) | Presence of `morphemes` ⇒ morph-level; absence of analysis ⇒ unanalyzed; punctuation tokens would simply be omitted from `TextAnalysis.tokenAnalyses` (they would still live in the text layer's `Segment.tokens`). |
| `Analysis.producer` + `Analysis.sourceUser` | `producer` + `sourceUser` on `TokenAnalysis`, `Phrase`, and `SegmentAnalysis` | Provenance (`producer`) and human identifier (`sourceUser`) are distinct from the trust score (`confidence`), and would be carried by every analysis-layer record type. |

Under the proposal, the analysis layer is flat — no `AnalyzedBook`, no tokens-nested-inside-segment. `TextAnalysis` would hold three sibling lists: `segmentAnalyses`, `tokenAnalyses`, `phrases`. Each record would reference its text-layer counterpart by id.
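A minimal sketch of the wrapper shape and the single-assignment reset it enables (field names from this document; illustrative only):

```typescript
// Illustrative shapes only; not the extension's actual declarations.
interface SegmentAnalysis { segmentId: string; freeTranslation?: string; }
interface TokenAnalysis { tokenId: string; gloss?: string; }
interface Phrase { tokenIds: string[]; gloss?: string; }

interface TextAnalysis {
  segmentAnalyses: SegmentAnalysis[];
  tokenAnalyses: TokenAnalysis[];
  phrases: Phrase[];
}

// Minimal stand-in for the text side; the real type would also hold books.
interface InterlinearText { analysis?: TextAnalysis; }

const text: InterlinearText = {
  analysis: { segmentAnalyses: [], tokenAnalyses: [], phrases: [] },
};

// Wiping analysis for a re-run is one assignment, not three:
text.analysis = undefined;
console.log(text.analysis === undefined); // → true
```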
Proposed new fields worth noting:
- `SegmentAnalysis.freeTranslation` / `.literalTranslation` — would move off `Segment` (where they live in the current model) onto the analysis-side `SegmentAnalysis`, since they are analysis artifacts, not baseline text.
- `TokenAnalysis.glossSenseRef` — a lexicon-backed alternative to the free-form `gloss` field.
- `TokenAnalysis.tokenSnapshot`, `Phrase.tokenSnapshots`, `AlignmentEndpoint.tokenSnapshot` — surface-text snapshots taken at analysis/link-creation time. Would enable drift detection: compare snapshot vs. current `Token.surfaceText` and flip `status` to a new `Stale` value on mismatch.
- `Token.charStart` / `.charEnd` — zero-based character offsets within the owning `Segment.baselineText` (`charEnd` exclusive). Required for all scripts; critical for scriptio continua languages where word boundaries are not whitespace-delimited. Without these fields, the tokenization decision is irrecoverable once the token list is reconstructed from surface text alone.
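The offset invariant can be checked mechanically. A sketch using a scriptio continua baseline (names from this document; illustrative only — note that JavaScript's `slice` operates on UTF-16 code units):

```typescript
// Illustrative check of the proposed token-offset invariant:
// baselineText.slice(charStart, charEnd) === surfaceText
interface Token {
  surfaceText: string;
  charStart: number; // zero-based, within the owning Segment.baselineText
  charEnd: number;   // exclusive
}

const baselineText = '神愛世人'; // no whitespace between words
const tokens: Token[] = [
  { surfaceText: '神', charStart: 0, charEnd: 1 },
  { surfaceText: '愛', charStart: 1, charEnd: 2 },
  { surfaceText: '世人', charStart: 2, charEnd: 4 },
];

const allValid = tokens.every(
  (t) => baselineText.slice(t.charStart, t.charEnd) === t.surfaceText,
);
console.log(allValid); // → true
```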
5. Lexicon coupling: LCM-typed refs → Lexicon extension refs
The current model stores four GUID-style refs per `MorphemeBundle` pointing at LCM lexical objects. All four would survive under the proposal, but each would become a structured reference into the Lexicon extension (`lexicon`) rather than a bare GUID string:

| Current field | LCM target | Proposed |
| --- | --- | --- |
| `allomorphRef` | `IMoForm` (specific allomorph) | `Morpheme.allomorphRef` → `AllomorphRef` |
| `lexemeRef` | `ILexEntry` | `Morpheme.entryRef` → `EntryRef` → `IEntry` |
| `senseRef` | `ILexSense` | `Morpheme.senseRef` → `SenseRef` → `ISense` |
| `grammarRef` | `IMoMorphSynAnalysis` (MSA) | `Morpheme.grammarRef` → `GrammarRef` |

Four new ref types are proposed: `AllomorphRef`, `EntryRef`, `SenseRef`, and `GrammarRef`. These would be pure identifiers — consumers would resolve them through the Lexicon extension's `lexicon.entryService` network object (typed `lexicon.IEntryService`, defined in `platform.bible-extension`'s `lexicon.d.ts`).

There are a few places where the proposal reaches for detail the Lexicon extension's current public surface doesn't expose directly. The proposed model would be usable today with workarounds, but each of the following would make resolution more direct and is worth coordinating with the Lexicon team on when priorities allow:
| Area | Proposed addition |
| --- | --- |
| `IEntryService` | a `getEntry(projectId, entryId): Promise<IEntry>` method and a `getSense(projectId, entryId, senseId): Promise<ISense>` method |
| `IMoForm` not exported; no allomorph service | export `IMoForm` and a `getAllomorph(projectId, entryId, allomorphId)` method |
| `IMoMorphSynAnalysis` not exported; no MSA service | export `IMoMorphSynAnalysis` and a `getMsa(projectId, entryId, msaId)` method |

In the meantime, consumers can work around these: query-and-filter for entries by id, walk `entry.senses[]` for senses, inspect `IEntry.components` / `lexemeForm` for allomorphs, and defer MSA data until it's surfaced.
Consequence: under the proposal, the interlinear model would no longer duplicate lexical data. Edits to an `IEntry` / `ISense` in the Lexicon extension would propagate automatically to every token that references them.
6. Phrases: groupId → first-class entity
Current

Phrases are reconstructed by grouping `AnalysisAssignment`s with the same `groupId` and dereferencing the shared `Analysis`.

Proposed

Benefits:

- A phrase would be a single record with explicit `tokenIds`: no shared-`groupId` reconstruction.
- A parser or AI assistant can propose a `Phrase` with `status: suggested`, `confidence: medium`; the user approves or rejects via `status`.
- `senseRef` would point at an `IEntry` with `morphType: Phrase` (contiguous) or `DiscontiguousPhrase` (disjoint) — already supported by the Lexicon extension's `MorphType` enum (`MiniLcm/Models/MorphType.ts`).

7. Alignment endpoints
| Current | Proposed |
| --- | --- |
| `AlignmentEndpoint { occurrenceId, bundleId? }` | `AlignmentEndpoint { tokenId, tokenAnalysisId?, morphemeId? }` |

Same two-level concept (token-level vs. morpheme-level endpoint). Field names track the proposed entity renames. When `morphemeId` is set, `tokenAnalysisId` is required — because a token may have multiple competing `TokenAnalysis` entries, the specific `TokenAnalysis` that owns the referenced morpheme must be identified explicitly.

8. Proposed invariants and machinery
Not present in the current model:

- Competing records: a single `segmentId` would be allowed multiple `SegmentAnalysis` entries; a single `tokenId` would be allowed multiple `TokenAnalysis` entries and appearances in multiple `Phrase` records, distinguished by `status` / `confidence` / `producer`.
- Invariant: at most one `SegmentAnalysis` per `segmentId`, at most one `TokenAnalysis` per `tokenId`, and at most one `Phrase` containing a given `tokenId` may have `status: 'approved'`. The approved record would be canonical for rendering; alternates would live alongside for review workflows (AI-drafted back translation vs. human edit, parser suggestion vs. human choice, etc.). The current model allows competing `AnalysisAssignment`s with no such "one canonical" rule; the proposal keeps the freedom but names the winner.
- Orthogonal levels: a per-token parse would not be considered a competing analysis to a phrase-level gloss; they record different things and both would render simultaneously in the UI.
- Snapshots: analysis-layer records would snapshot their token's surface text at creation time (`TokenAnalysis.tokenSnapshot`, `Phrase.tokenSnapshots`, `AlignmentEndpoint.tokenSnapshot`). When the baseline text drifts, consumers would compare snapshot vs. current `Token.surfaceText` and flip `status` to `'stale'`. `Book.textVersion` would provide a coarse-grained "something changed in this book" signal for batching the per-token comparisons.
- Punctuation: tokens with `type = punctuation` would be stored in `Segment.tokens` on both source and target so baseline text reconstructs exactly. They would be omitted only from the analysis layer's `tokenAnalyses`.
- Flat analysis layer: `TextAnalysis` would have no per-book or per-segment containers on the analysis side — it would hold `segmentAnalyses`, `tokenAnalyses`, and `phrases` as sibling lists keyed by id back to the text layer. Consumers would index by id at load time for segment-local rendering.
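The "one approved record per `tokenId`" invariant above can be validated mechanically. A sketch with field names from this document (illustrative only):

```typescript
// Checks the proposed invariant: at most one approved TokenAnalysis
// per tokenId; alternates with other statuses are allowed.
interface TokenAnalysis {
  tokenId: string;
  status: 'approved' | 'suggested' | 'stale';
}

function violatesOneApproved(analyses: TokenAnalysis[]): boolean {
  const approvedCount = new Map<string, number>();
  for (const a of analyses) {
    if (a.status !== 'approved') continue;
    const n = (approvedCount.get(a.tokenId) ?? 0) + 1;
    if (n > 1) return true;
    approvedCount.set(a.tokenId, n);
  }
  return false;
}

console.log(
  violatesOneApproved([
    { tokenId: 't1', status: 'approved' },
    { tokenId: 't1', status: 'suggested' }, // alternate is fine
  ]),
); // → false
console.log(
  violatesOneApproved([
    { tokenId: 't1', status: 'approved' },
    { tokenId: 't1', status: 'approved' }, // two canonical records
  ]),
); // → true
```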
9. Entity-by-entity summary
| Current | Proposed | Notes |
| --- | --- | --- |
| `Interlinearization` | `InterlinearText` | Renamed; would carry the optional `analysis?` field. |
| `AnalyzedBook` (text+analysis) | `Book` (text-only) | On the analysis side, `TextAnalysis` would replace it. |
| `Segment` | `Segment` + `SegmentAnalysis` | Free / literal translations would move onto `SegmentAnalysis`, which would also carry `status` / `confidence` / `producer` so competing segment-level analyses are permitted under the same "one Approved per `segmentId`" invariant used for tokens and phrases. `Segment.baselineText` becomes required (was optional) — token character offsets depend on it. |
| `Occurrence` | `Token` + `TokenAnalysis` | `Token` gains required `charStart` / `charEnd` fields (zero-based offsets within `Segment.baselineText`, `charEnd` exclusive) to record token boundaries for all scripts, especially scriptio continua languages. |
| `Analysis` | (folded into `TokenAnalysis`) | Each token would own its `TokenAnalysis` records (approved plus any competing alternates distinguished by `status`). |
| `AnalysisAssignment` | (folded into `TokenAnalysis`) | — |
| `MorphemeBundle` | `Morpheme` | All four refs survive (`entryRef`, `senseRef`, `allomorphRef`, `grammarRef`), but each would become a structured ref into the Lexicon extension rather than a bare GUID. |
| `groupId` (field on `AnalysisAssignment`) | `Phrase` | Promoted to a first-class entity. |
| `AnalysisType` enum | (dropped) | Analysis kind would be inferred: presence of `morphemes` ⇒ morph-level; absence of analysis ⇒ unanalyzed. |
| `OccurrenceType` enum | `TokenType` string literal union | (`'word'`, `'punctuation'`). |
| `Confidence` enum | `Confidence` string literal union | (`'high'`, `'medium'`, `'low'`, `'guess'`). |
| `AssignmentStatus` enum | `AssignmentStatus` string literal union + `'stale'` | New `'stale'` value for drift-detection workflow; converted from enum to string literal union. |
| `ScriptureRef`, `MultiString` | unchanged | — |
| `InterlinearAlignment`, `AlignmentLink` | unchanged | Endpoint fields track the `Token` / `Morpheme` renames. |