
First proposal for a finalized interlinear model #25

Merged
alex-rawlings-yyc merged 9 commits into main from interlinear-model
Apr 23, 2026

Conversation

@alex-rawlings-yyc
Contributor

@alex-rawlings-yyc commented Apr 21, 2026

Interlinear Model — Current vs. Proposed

A proposal for reworking the interlinear model. The current model
merges baseline text and analysis into a single tree and references
LCM-specific lexical types (allomorphs, grammar / MSA) via raw
GUIDs. The proposed model would split text from analysis, move
lexical references to the Lexicon extension (lexicon)
via structured ref types, promote phrases to a first-class entity,
and name a single canonical analysis per record-scope (token,
phrase, segment) while still allowing alternates.


1. Structural shift: one tree → two parallel layers

Current

Interlinearization
  └─ AnalyzedBook
       └─ Segment
            └─ Occurrence       (text + analysis mixed)
                 └─ AnalysisAssignment
                      └─ Analysis
                           └─ MorphemeBundle

Occurrence carries both the surface text and the link to its
analysis. Analyses are reusable objects shared between occurrences.
Phrase grouping is a groupId field on AnalysisAssignment.

Proposed

InterlinearAlignment
  ├─ source : InterlinearText
  └─ target : InterlinearText

InterlinearText
  ├─ books : Book[]                     ← text layer (baseline, nested)
  │    └─ Segment[]
  │         └─ Token[]
  └─ analysis : TextAnalysis            ← analysis layer (flat)
       ├─ segmentAnalyses : SegmentAnalysis[]
       ├─ tokenAnalyses   : TokenAnalysis[]
       └─ phrases         : Phrase[]

Text and analysis are peers. The text side is nested (book → segment
→ token) because that's the natural shape of scripture. The analysis
side is flat: each record carries an id reference back to its
text-layer counterpart (segmentId, tokenId). This avoids
container types that would exist solely to mirror a parent.
Consumers that need segment-local views build Map<tokenId, …> and
Map<segmentId, …> at load time.

A Token is just surface text; a TokenAnalysis carries the gloss /
parse and references its token by id. Phrases are peers of token
analyses, not a grouping flag.
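For instance, the load-time index over the flat token-analysis list might be built like this (the types and sample data are illustrative stand-ins, not the actual declarations):

```typescript
// Simplified stand-ins for the proposed shapes.
interface TokenAnalysis {
  id: string;
  tokenId: string; // id reference back to the text-layer Token
  gloss: string;
  status: string;
}

// Hypothetical flat analysis list, keyed back to the text layer by tokenId.
const tokenAnalyses: TokenAnalysis[] = [
  { id: "ta1", tokenId: "t1", gloss: "dog", status: "approved" },
  { id: "ta2", tokenId: "t2", gloss: "runs", status: "approved" },
];

// Build the segment-local view once at load time: Map<tokenId, TokenAnalysis[]>.
const byTokenId = new Map<string, TokenAnalysis[]>();
for (const ta of tokenAnalyses) {
  const list = byTokenId.get(ta.tokenId) ?? [];
  list.push(ta);
  byTokenId.set(ta.tokenId, list);
}
```

The same pattern would apply for Map<segmentId, …> over segmentAnalyses.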

Why the TextAnalysis wrapper?

The analysis layer is three sibling lists: segmentAnalyses,
tokenAnalyses, phrases. They could sit directly on
InterlinearText — the wrapper adds an indirection hop
(text.analysis.tokenAnalyses rather than text.tokenAnalyses). It
earns that hop by making "the analysis" a first-class noun in the
model:

  1. Analysis is a passable value. Functions that produce or
    consume analysis — external analyzers, import / export routines,
    re-analysis pipelines — can take and return a single
    TextAnalysis. Without the wrapper, those same APIs have to
    move three parallel arrays around as a tuple or an ad-hoc object.
  2. Attach / detach / replace is one move. Running a fresh
    analysis, wiping analysis for a re-run, or swapping in the result
    of an external analyzer is text.analysis = … or
    text.analysis = undefined. Without the wrapper these become
    three coordinated assignments, and "no analysis yet" has to be
    inferred from every array being empty.
  3. The text/analysis dichotomy is visible in the shape.
    InterlinearText reads as books (text) and analysis
    (analysis) — two peers. Hoisting the three arrays would blur
    that distinction: the type would list one nested field plus three
    sibling arrays with no visual grouping.

(Our current importers — LCM, Paratext, BT Extension — receive text
and analysis together at import time, so the "import text first,
analyze later" flow isn't load-bearing for imports. It is
load-bearing for in-session re-analysis and external-analyzer
round-trips.)

What the wrapper intentionally does not buy: multiple competing
analyses of the same text. Competing analyses are handled at the
record level (multiple TokenAnalysis entries per tokenId,
distinguished by status), not by holding multiple TextAnalysis
objects per InterlinearText.
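A sketch of point 2 above, using simplified stand-ins for the proposed types: attaching or wiping an analysis is a single assignment, and "no analysis yet" is simply the absence of the field.

```typescript
// Simplified stand-ins for the proposed shapes (not the actual declarations).
interface TextAnalysis {
  segmentAnalyses: { segmentId: string; freeTranslation?: string }[];
  tokenAnalyses: { tokenId: string; gloss?: string }[];
  phrases: { tokenIds: string[] }[];
}

interface InterlinearText {
  books: unknown[]; // text layer, elided here
  analysis?: TextAnalysis; // absent ⇒ "no analysis yet"
}

const text: InterlinearText = { books: [] };

// Attaching the output of an external analyzer is one move…
text.analysis = {
  segmentAnalyses: [],
  tokenAnalyses: [{ tokenId: "t1", gloss: "dog" }],
  phrases: [],
};
const hasAnalysis = text.analysis !== undefined;

// …and wiping it for a re-run is one move too.
text.analysis = undefined;
```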


2. Top-level container

| Current | Proposed | Notes |
|---|---|---|
| Interlinearization (single-language container) | InterlinearText | Each side of an InterlinearAlignment would be an InterlinearText. |
| InterlinearAlignment { source, target, links[] } | InterlinearAlignment { source, target, links[] } | Shape preserved, but source / target would become InterlinearText rather than Interlinearization. |
| (no separate analysis container) | TextAnalysis | New — would root the parallel analysis layer inside each InterlinearText. |

3. Text layer

| Current | Proposed | Notes |
|---|---|---|
| AnalyzedBook (mixed text + analysis) | Book | Text-only — just baseline segments. |
| Segment { occurrences: Occurrence[] } | Segment { tokens: Token[], baselineText: string } | Segment would be text-only. baselineText is required (not optional) — token character offsets are expressed relative to it, so it must be present for the text layer to be interpretable. |
| Occurrence { surfaceText, writingSystem, type, assignment? } | Token { surfaceText, writingSystem, type, charStart, charEnd } | Rename; drop the analysis field — analysis would live in the parallel layer. charStart / charEnd are zero-based character offsets within the owning Segment.baselineText (charEnd exclusive). Required for all scripts; essential for scriptio continua languages (Chinese, Thai, Tibetan, Lao, Burmese, …) where token boundaries are not derivable from whitespace. Invariant: baselineText.slice(charStart, charEnd) === surfaceText. |
| OccurrenceType enum | TokenType string literal union | Rename; converted from enum to string literal union. Values unchanged ('word', 'punctuation'). |
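A minimal sketch of the proposed Token offsets, including the stated invariant (the Thai sample segment and field shapes here are invented for illustration):

```typescript
interface Token {
  surfaceText: string;
  type: "word" | "punctuation";
  charStart: number; // zero-based offset into the owning Segment.baselineText
  charEnd: number; // exclusive
}

interface Segment {
  baselineText: string; // required: offsets are relative to it
  tokens: Token[];
}

// Thai example: token boundaries are not recoverable from whitespace,
// so charStart / charEnd record the tokenization decision.
const segment: Segment = {
  baselineText: "แมวกินปลา", // "cat eats fish", written with no spaces
  tokens: [
    { surfaceText: "แมว", type: "word", charStart: 0, charEnd: 3 },
    { surfaceText: "กิน", type: "word", charStart: 3, charEnd: 6 },
    { surfaceText: "ปลา", type: "word", charStart: 6, charEnd: 9 },
  ],
};

// Invariant check: baselineText.slice(charStart, charEnd) === surfaceText.
const valid = segment.tokens.every(
  (t) => segment.baselineText.slice(t.charStart, t.charEnd) === t.surfaceText,
);
```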

4. Analysis layer

| Current | Proposed | Notes |
|---|---|---|
| Analysis (reusable, shared) + AnalysisAssignment (join) | TokenAnalysis | Collapse. Each token would carry its own TokenAnalysis records (approved plus any competing alternates) — no shared-analysis indirection via a join entity. |
| AnalysisAssignment.groupId (string, phrases encoded as shared ids across assignments) | Phrase (first-class entity) | Phrases would no longer be a flag on an assignment — they would be a standalone record with their own tokenIds, gloss, senseRef, status, confidence. |
| MorphemeBundle { form, allomorphRef, lexemeRef, senseRef, grammarRef } | Morpheme { form, entryRef?, senseRef?, allomorphRef?, grammarRef? } | Same four refs, retyped. lexemeRef would be renamed to entryRef; allomorph and MSA refs retained — the Lexicon extension would be expected to surface the required types. |
| AnalysisType enum (wordform / morph / gloss / punctuation) | (removed) | The "type" would be implicit: presence of morphemes ⇒ morph-level; absence of analysis ⇒ unanalyzed; punctuation tokens would simply be omitted from TextAnalysis.tokenAnalyses (they would still live in the text layer's Segment.tokens). |
| Analysis.producer + Analysis.sourceUser | producer + sourceUser on TokenAnalysis, Phrase, and SegmentAnalysis | Kept. Provenance tag (producer) + human identifier (sourceUser) are distinct from the trust score (confidence), and would be carried by every analysis-layer record type. |

Under the proposal, the analysis layer is flat — no
AnalyzedBook, no tokens-nested-inside-segment. TextAnalysis
would hold three sibling lists: segmentAnalyses, tokenAnalyses,
phrases. Each record would reference its text-layer counterpart
by id.

Proposed new fields worth noting:

  • SegmentAnalysis.freeTranslation / .literalTranslation — would
    move off Segment (where they live in the current model) onto the
    analysis-side SegmentAnalysis since they are analysis artifacts,
    not baseline text.
  • TokenAnalysis.glossSenseRef — a lexicon-backed alternative to
    the free-form gloss field.
  • TokenAnalysis.tokenSnapshot, Phrase.tokenSnapshots,
    AlignmentEndpoint.tokenSnapshot — surface-text snapshots taken
    at analysis/link-creation time. Would enable drift detection:
    compare snapshot vs. current Token.surfaceText and flip status
    to a new Stale value on mismatch.
  • Token.charStart / .charEnd — zero-based character offsets
    within the owning Segment.baselineText (charEnd exclusive).
    Required for all scripts; critical for scriptio continua languages
    where word boundaries are not whitespace-delimited. Without these
    fields, the tokenization decision is irrecoverable once the token
    list is reconstructed from surface text alone.
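The snapshot-based drift check described in the third bullet might look like this (a sketch with simplified, hypothetical types):

```typescript
type Status = "approved" | "suggested" | "stale";

interface Token {
  id: string;
  surfaceText: string;
}

interface TokenAnalysis {
  tokenId: string;
  tokenSnapshot?: string; // surface text captured at analysis time
  status: Status;
}

// Hypothetical drift check: compare the snapshot taken at analysis time
// against the current surface text and flip status to 'stale' on mismatch.
function markStale(tokens: Map<string, Token>, analyses: TokenAnalysis[]): void {
  for (const ta of analyses) {
    const current = tokens.get(ta.tokenId)?.surfaceText;
    if (ta.tokenSnapshot !== undefined && current !== ta.tokenSnapshot) {
      ta.status = "stale";
    }
  }
}

// The baseline drifted: "run" was edited to "ran" after analysis.
const tokens = new Map<string, Token>([["t1", { id: "t1", surfaceText: "ran" }]]);
const analyses: TokenAnalysis[] = [
  { tokenId: "t1", tokenSnapshot: "run", status: "approved" },
];
markStale(tokens, analyses);
```

Book.textVersion (described in section 8) would gate when this per-token pass needs to run at all.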

5. Lexicon coupling: LCM-typed refs → Lexicon extension refs

The current model stores four GUID-style refs per MorphemeBundle
pointing at LCM lexical objects. All four would survive under the
proposal, but each would become a structured reference into the
Lexicon extension (lexicon) rather than a bare GUID
string:

| Current ref | Points at | Proposed ref |
|---|---|---|
| allomorphRef | IMoForm (specific allomorph) | Morpheme.allomorphRef : AllomorphRef |
| lexemeRef | ILexEntry | Morpheme.entryRef : EntryRef → IEntry |
| senseRef | ILexSense | Morpheme.senseRef : SenseRef → ISense |
| grammarRef | IMoMorphSynAnalysis (MSA) | Morpheme.grammarRef : GrammarRef |

Four new ref types are proposed:

```typescript
interface EntryRef {
  entryId: string;
  projectId?: string;
}
interface SenseRef {
  senseId: string;
  projectId?: string;
}
interface AllomorphRef {
  allomorphId: string;
  projectId?: string;
}
interface GrammarRef {
  msaId: string;
  projectId?: string;
}
```

These would be pure identifiers — consumers would resolve them
through the Lexicon extension's lexicon.entryService network
object (typed lexicon.IEntryService, defined in
platform.bible-extension's lexicon.d.ts).

There are a few places where the proposal reaches for detail the
Lexicon extension's current public surface doesn't expose directly.
The proposed model would be usable today with workarounds, but each
of the following would make resolution more direct and is worth
coordinating with the Lexicon team on when priorities allow:

| Current gap | Would benefit from |
|---|---|
| No by-id lookup on IEntryService | A getEntry(projectId, entryId): Promise&lt;IEntry&gt; method |
| No sense-level service method | A getSense(projectId, entryId, senseId): Promise&lt;ISense&gt; method |
| IMoForm not exported; no allomorph service | Exporting IMoForm and a getAllomorph(projectId, entryId, allomorphId) method |
| IMoMorphSynAnalysis not exported; no MSA service | Exporting IMoMorphSynAnalysis and a getMsa(projectId, entryId, msaId) method |

In the meantime, consumers can work around these: query-and-filter
for entries by id, walk entry.senses[] for senses, inspect
IEntry.components / lexemeForm for allomorphs, and defer MSA
data until it's surfaced.
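The sense workaround can be sketched as follows. Entry and Sense here are simplified local stand-ins, not the Lexicon extension's real IEntry / ISense, and the query step that produced the entries array is elided:

```typescript
// Hypothetical shapes standing in for the Lexicon extension's types.
interface Sense {
  id: string;
  gloss: string;
}
interface Entry {
  id: string;
  senses: Sense[];
}
interface SenseRef {
  senseId: string;
  projectId?: string;
}

// Workaround until a getSense service method exists: given entries a query
// returned, walk entry.senses[] to find the referenced sense.
function resolveSense(entries: Entry[], ref: SenseRef): Sense | undefined {
  for (const entry of entries) {
    const sense = entry.senses.find((s) => s.id === ref.senseId);
    if (sense) return sense;
  }
  return undefined;
}

const entries: Entry[] = [{ id: "e1", senses: [{ id: "s1", gloss: "water" }] }];
const sense = resolveSense(entries, { senseId: "s1" });
```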

Consequence: under the proposal, the interlinear model would no
longer duplicate lexical data. Edits to an IEntry / ISense in
the Lexicon extension would propagate automatically to every token
that references them.


6. Phrases: groupId → first-class entity

Current

```typescript
interface AnalysisAssignment {
  occurrenceId: string;
  analysisId: string;
  groupId?: string; // assignments sharing this id form a phrase
}
```

Phrases are reconstructed by grouping AnalysisAssignments with
the same groupId and dereferencing the shared Analysis.

Proposed

```typescript
type Phrase = {
  id: string;
  tokenIds: [string, ...string[]]; // non-empty, ordered, may be disjoint, may cross segments
  status: AssignmentStatus; // required
  confidence?: Confidence;
  producer?: string;
  sourceUser?: string;
  tokenSnapshots?: [string, ...string[]]; // parallel to tokenIds; same length invariant
} & (
  | { gloss: MultiString; senseRef?: never } // → free-form phrase gloss
  | { senseRef: SenseRef; gloss?: never } // → IEntry with morphType Phrase | DiscontiguousPhrase
  | { gloss?: never; senseRef?: never }
);
```

Benefits:

  • Disjoint phrases like "ne … pas" would be a direct shape, not a
    reconstruction.
  • A suggestion engine could write a Phrase with
    status: suggested, confidence: medium; the user approves or
    rejects via status.
  • senseRef would point at an IEntry with
    morphType: Phrase (contiguous) or DiscontiguousPhrase
    (disjoint) — already supported by the Lexicon extension's
    MorphType enum (MiniLcm/Models/MorphType.ts).
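As a concrete illustration (the ids and gloss values are invented, and the types are simplified stand-ins for the declarations above), French negation "ne … pas" as a direct Phrase record:

```typescript
type AssignmentStatus = "approved" | "suggested" | "rejected" | "stale";
type Confidence = "high" | "medium" | "low" | "guess";
type MultiString = Record<string, string>; // writing-system tag → text (simplified stand-in)

interface Phrase {
  id: string;
  tokenIds: [string, ...string[]];
  status: AssignmentStatus;
  confidence?: Confidence;
  gloss?: MultiString;
}

// "ne … pas" spans two non-adjacent tokens — a direct shape, not a
// reconstruction from shared groupIds. A suggestion engine could write this
// record; the user approves or rejects via status.
const negation: Phrase = {
  id: "ph1",
  tokenIds: ["tok-ne", "tok-pas"], // disjoint: other tokens sit between them
  status: "suggested",
  confidence: "medium",
  gloss: { en: "not" },
};
```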

7. Alignment endpoints

| Current | Proposed |
|---|---|
| AlignmentEndpoint { occurrenceId, bundleId? } | AlignmentEndpoint { tokenId, tokenAnalysisId?, morphemeId? } |

Same two-level concept (token-level vs. morpheme-level endpoint). Field names track the proposed entity renames. When morphemeId is set, tokenAnalysisId is required — because a token may have multiple competing TokenAnalysis entries, the specific TokenAnalysis that owns the referenced morpheme must be identified explicitly.
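The "morphemeId requires tokenAnalysisId" rule could even be encoded in the type itself, sketched here with simplified fields (this union is an illustration, not the proposed declaration):

```typescript
// Token-level endpoints carry only tokenId; morpheme-level endpoints must also
// name the owning TokenAnalysis, since a token may have several competing ones.
type AlignmentEndpoint =
  | { tokenId: string } // token-level
  | { tokenId: string; tokenAnalysisId: string; morphemeId: string }; // morpheme-level

const tokenLevel: AlignmentEndpoint = { tokenId: "t1" };

const morphemeLevel: AlignmentEndpoint = {
  tokenId: "t1",
  tokenAnalysisId: "ta1", // identifies which competing TokenAnalysis owns the morpheme
  morphemeId: "m1",
};
```

With this shape, a morpheme-level endpoint that omits tokenAnalysisId fails to compile rather than surfacing as an ambiguous record at runtime.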


8. Proposed invariants and machinery

Not present in the current model:

  1. Competing analyses are permitted, one is canonical. A single
    segmentId would be allowed multiple SegmentAnalysis entries;
    a single tokenId would be allowed multiple TokenAnalysis
    entries and appearances in multiple Phrase records,
    distinguished by status / confidence / producer.
    Invariant: at most one SegmentAnalysis per segmentId, at
    most one TokenAnalysis per tokenId, and at most one Phrase
    containing a given tokenId may have status: 'approved'. The
    approved record would be canonical for rendering; alternates
    would live alongside for review workflows (AI-drafted back
    translation vs. human edit, parser suggestion vs. human choice,
    etc.). The current model allows competing AnalysisAssignments
    with no such "one canonical" rule; the proposal keeps the freedom
    but names the winner.
  2. TokenAnalysis + Phrase may coexist on the same token. A
    per-token parse would not be considered a competing analysis
    to a phrase-level gloss; they record different things and both
    would render simultaneously in the UI.
  3. Staleness detection. Analysis records would snapshot the
    token's surface text at creation time
    (TokenAnalysis.tokenSnapshot, Phrase.tokenSnapshots,
    AlignmentEndpoint.tokenSnapshot). When the baseline text
    drifts, consumers would compare snapshot vs. current
    Token.surfaceText and flip status to 'stale'. Book.textVersion
    would provide a coarse-grained "something changed in this book"
    signal for batching the per-token comparisons.
  4. Punctuation is first-class in the text layer. Tokens with
    type = punctuation would be stored in Segment.tokens on both
    source and target so baseline text reconstructs exactly. They
    would be omitted only from the analysis layer's tokenAnalyses.
  5. Analysis layer is flat. TextAnalysis would have no
    per-book or per-segment containers on the analysis side — it
    would hold segmentAnalyses, tokenAnalyses, and phrases as
    sibling lists keyed by id back to the text layer. Consumers
    would index by id at load time for segment-local rendering.
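Canonical selection under invariant 1 can be sketched like this (types simplified; the approved record wins, alternates remain available for review):

```typescript
type Status = "approved" | "suggested" | "stale";

interface TokenAnalysis {
  id: string;
  tokenId: string;
  status: Status;
  gloss: string;
}

// Rendering sketch: the single approved record per tokenId is canonical.
function canonical(analyses: TokenAnalysis[], tokenId: string): TokenAnalysis | undefined {
  return analyses.find((ta) => ta.tokenId === tokenId && ta.status === "approved");
}

// Parser suggestion vs. human choice, living side by side.
const analyses: TokenAnalysis[] = [
  { id: "a1", tokenId: "t1", status: "suggested", gloss: "walk" },
  { id: "a2", tokenId: "t1", status: "approved", gloss: "go" },
];
const shown = canonical(analyses, "t1");
```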

9. Entity-by-entity summary

| Current entity | Proposed entity | Change |
|---|---|---|
| Interlinearization | InterlinearText | Rename; add analysis? field. |
| AnalyzedBook (text + analysis) | Book (text-only) | Text side only. No analysis-side per-book container — the flat TextAnalysis would replace it. |
| Segment | Segment + SegmentAnalysis | Split. Free / literal translation would move to the analysis-side SegmentAnalysis, which would also carry status / confidence / producer so competing segment-level analyses are permitted under the same "one Approved per segmentId" invariant used for tokens and phrases. Segment.baselineText becomes required (was optional) — token character offsets depend on it. |
| Occurrence | Token + TokenAnalysis | Split. Rename. Token gains required charStart / charEnd fields (zero-based offsets within Segment.baselineText, charEnd exclusive) to record token boundaries for all scripts, especially scriptio continua languages. |
| Analysis | (merged into TokenAnalysis) | Remove reusability — each token would carry its own TokenAnalysis records (approved plus any competing alternates distinguished by status). |
| AnalysisAssignment | (merged into TokenAnalysis) | Collapse the join entity. |
| MorphemeBundle | Morpheme | Four refs preserved (entryRef, senseRef, allomorphRef, grammarRef), but each would become a structured ref into the Lexicon extension rather than a bare GUID. |
| (groupId field) | Phrase | Promote to first-class. |
| AnalysisType enum | (removed) | Would be implicit in field presence. |
| OccurrenceType enum | TokenType string literal union | Rename; converted from enum to string literal union. Values unchanged ('word', 'punctuation'). |
| Confidence enum | Confidence string literal union | Converted from enum to string literal union. Values unchanged ('high', 'medium', 'low', 'guess'). |
| AssignmentStatus enum | AssignmentStatus string literal union + 'stale' | Add a 'stale' value for the drift-detection workflow; converted from enum to string literal union. |
| ScriptureRef, MultiString | (unchanged) | Keep as-is. |
| InterlinearAlignment, AlignmentLink | (unchanged shape) | Preserve; endpoint field names would track the Token / Morpheme renames. |


Summary by CodeRabbit

  • Refactor

    • Migrated public interlinear types from a verse/cluster model to a token-first interlinear representation with richer alignment and analysis structures
    • Added alignment/linking and enhanced lexicon reference support
    • Reorganized internal type usage for consistency
  • Chores

    • Expanded the spell-check dictionary with additional accepted terms

@alex-rawlings-yyc alex-rawlings-yyc linked an issue Apr 21, 2026 that may be closed by this pull request
coderabbitai Bot commented Apr 21, 2026


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9edcdfc3-64c3-481f-8c19-56771a3b1e80

📥 Commits

Reviewing files that changed from the base of the PR and between 546bd06 and 2b462bc.

📒 Files selected for processing (2)
  • cspell.json
  • src/types/interlinearizer.d.ts
📝 Walkthrough

Consolidates interlinear types: interlinearXmlParser now declares and exports local XML-aligned interfaces and InterlinearData; interlinearizer.d.ts replaces the legacy XML model with a new token-based InterlinearAlignment model; interlinearizer.web-view.tsx import adjusted; cspell.json wordlist extended.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Parser + View import (src/parsers/interlinearXmlParser.ts, src/interlinearizer.web-view.tsx) | interlinearXmlParser.ts now declares and exports local interfaces (StringRange, LexemeData, ClusterData, PunctuationData, VerseData, InterlinearData) and updates parse() to return the local InterlinearData. interlinearizer.web-view.tsx import updated to import InterlinearData from the parser. |
| Public type surface (src/types/interlinearizer.d.ts) | Replaces legacy XML-aligned exported types with a new token-first interlinear model: InterlinearAlignment, InterlinearText, Book, Segment, Token, TextAnalysis, SegmentAnalysis, TokenAnalysis, Morpheme, Phrase, alignment/link types, lifecycle enums/unions, and lexicon reference types; legacy XML types removed. |
| Spellcheck config (cspell.json) | Extended words list with additional accepted terms (e.g., BBCCCVVVV, Discontiguous, eflomal, morphosyntactic, Wordform) and adjusted trailing punctuation to append entries. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes


🚥 Pre-merge checks: 5 passed

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: a redesigned interlinear data model with new top-level shapes and organizational principles. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



Comment thread: src/types/interlinearizer.d.ts (Outdated)
@jasonleenaylor
Contributor

src/types/interlinearizer.d.ts line 156 at r2 (raw file):

    /** `ISense.entryId` — the owning `IEntry`. */
    entryId: string;

This duplicates data that exists in the sense and adds a secondary reference that would need to be checked (a sense could be moved to a new entry in edge cases), so I think the entryId should be excluded from this ref. This probably also applies, though even less commonly, to the allomorph and grammatical-info refs.

However we do cross-extension linking, one consideration is the need to check for references which have been removed.

Contributor Author

@alex-rawlings-yyc left a comment


@alex-rawlings-yyc made 1 comment.
Reviewable status: 0 of 6 files reviewed, all discussions resolved (waiting on jasonleenaylor).


src/types/interlinearizer.d.ts line 156 at r2 (raw file):

Previously, jasonleenaylor (Jason Naylor) wrote…

This duplicates data that exists in the sense and adds a secondary reference that would need to be checked (a sense could be moved to a new entry in edge cases), so I think the entryId should be excluded from this ref. This probably also applies, though even less commonly, to the allomorph and grammatical-info refs.

However we do cross-extension linking, one consideration is the need to check for references which have been removed.

Removed the entryIds. I'll add that to the up-next list.


coderabbitai Bot left a comment


🧹 Nitpick comments (3)
src/types/interlinearizer.d.ts (3)

584-773: Consider modeling the gloss / sense-ref mutual exclusion in the type.

The doc comments for TokenAnalysis (Lines 555-558) and Phrase (Lines 716-718) describe gloss and glossSenseRef / senseRef as alternatives ("alternatively resolves the gloss through…"), but the type allows both to be set simultaneously — and nothing in the invariant list addresses what a consumer should render if both are present. A discriminated union ({ gloss: MultiString } | { glossSenseRef: SenseRef } on the relevant slice) would make the either/or constraint explicit and catch ambiguous records at compile time. Left as-is this is a minor ergonomic nit rather than a correctness issue — flagging as optional.


452-818: Inconsistent optionality of status across analysis/link records undermines the "at most one Approved" invariants.

AlignmentLink.status (Line 805) is required, but SegmentAnalysis.status (Line 533), TokenAnalysis.status (Line 615), and Phrase.status (Line 747) are all optional. The documented invariants in TextAnalysis (Lines 461-463, 477-479, 491-496) all key off of status === Approved, so absence of status is semantically ambiguous — is a SegmentAnalysis with no status implicitly Approved, Suggested, or something else entirely? Consumers enforcing the single-Approved invariant will have to invent a default, and different consumers are likely to pick differently.

Consider making status required on all four shapes (and on AlignmentLink — where it already is), or explicitly documenting the default when omitted. Given the invariants, required is probably the safer contract.


767-773: Phrase.tokenSnapshots parallel-array invariant is not type-enforceable; consider documenting the expectation more prominently.

When set, tokenSnapshots must be the same length as tokenIds and index-aligned (Lines 768-770). This is invisible to the type system and easy to get wrong during construction (e.g., filtering tokens out of tokenIds without filtering tokenSnapshots). Worth either (a) calling this out as an explicit "Invariant:" in the TSDoc (matching the style used in TextAnalysis), or (b) modeling snapshots alongside ids as { tokenId: string; snapshot?: string }[] so the parallelism is expressed in the type. Minor.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c6264ba8-61c9-449b-9e94-dd5e927f1164

📥 Commits

Reviewing files that changed from the base of the PR and between 3cb8a50 and 0c1a762.

📒 Files selected for processing (2)
  • cspell.json
  • src/types/interlinearizer.d.ts
✅ Files skipped from review due to trivial changes (1)
  • cspell.json


alex-rawlings-yyc pushed a commit that referenced this pull request Apr 23, 2026
@alex-rawlings-yyc
Contributor Author

@coderabbitai review



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/types/interlinearizer.d.ts`:
- Around line 48-51: The doc comments in src/types/interlinearizer.d.ts use
enum-style capitalized names (e.g., Approved, Suggested, Stale) but the public
type AssignmentStatus is a lower-case string union; update the comments to use
the exact lower-case string literals (e.g., 'approved', 'suggested', 'stale')
wherever AssignmentStatus or status is mentioned (including the shown block and
the other occurrences around lines 447-465, 611-613, 836-838), and ensure
references to Token.surfaceText and status comparison mention the exact literal
strings to avoid misleading consumers.
- Around line 459-464: The current model allows multiple TokenAnalysis entries
per tokenId but AlignmentEndpoint only references tokenId + morphemeId, making
morpheme ownership ambiguous; update the types so AlignmentEndpoint (and any
other alignment structures) include tokenAnalysisId whenever morphemeId is
present (i.e., require { tokenId, tokenAnalysisId, morphemeId } for
morpheme-level endpoints) or alternatively enforce globally unique Morpheme.id
and document that invariant in TextAnalysis/TokenAnalysis; modify the
declarations around TextAnalysis, TokenAnalysis, AlignmentEndpoint and Morpheme
(and corresponding comments at the mentioned regions) to reflect the chosen
approach so morpheme-level links unambiguously identify the owning
TokenAnalysis.
- Line 336: The current id: string (used by flat analysis/phrase/alignment
references) is ambiguous because segmentId/tokenId may be reused per
book/verse/segment; update the type definitions in
src/types/interlinearizer.d.ts (where id appears around the flat analysis,
phrase and alignment layer types at the spots corresponding to the current ids
near lines ~336 and ~391) to either (a) document and enforce that these ids are
globally unique within the owning InterlinearText (e.g., change the
comment/description to state “unique within InterlinearText”) or (b) expand the
reference shapes to include their parent scope (e.g., include
parentBookId/verseId/segmentId fields alongside tokenId/segmentId) so references
are unambiguous; pick one approach and update the related type declarations and
inline JSDoc comments for InterlinearText, segmentId, and tokenId accordingly.
- Around line 722-755: The tokenIds array currently allows an empty phrase;
change the type of tokenIds in the Phrase-ish interface to a non-empty tuple
(e.g. [string, ...string[]]) to enforce at least one member, and mirror that
change for tokenSnapshots (tokenSnapshots?: [string, ...string[]]) so the
compile-time invariant about matching lengths is preserved; update any code that
constructs or validates phrases (places referencing tokenIds, tokenSnapshots,
and the interface name in src/types/interlinearizer.d.ts) to handle the tuple
shape instead of a plain string[].
- Around line 91-100: Update the documentation to explicitly state that all
character offsets are measured in JavaScript UTF-16 code units: update the
ScriptureRef interface comment for charIndex, the Token documentation that
mentions charStart/charEnd, and the charStart/charEnd property comments
(referencing the Token and Segment usages) to say "UTF-16 code units (JavaScript
string index/ slice semantics)" so the invariant
Segment.baselineText.slice(charStart, charEnd) === surfaceText is accurate and
callers won't assume code points or grapheme clusters.
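The five findings are all type-level. A minimal sketch of how a few of them could be encoded (every name here is an assumption drawn from the comments above, not the actual interlinearizer.d.ts declarations):

```typescript
// Hypothetical sketch combining several of the suggested fixes; the real
// declarations in src/types/interlinearizer.d.ts may differ.

/** Lower-case string literals, matching the union rather than enum-style names. */
type AssignmentStatus = "approved" | "suggested" | "stale";

/**
 * Discriminated shape: a morpheme-level endpoint must also carry
 * tokenAnalysisId, so the owning TokenAnalysis is unambiguous even when a
 * token has several analyses.
 */
type AlignmentEndpoint =
  | { tokenId: string } // token-level endpoint
  | { tokenId: string; tokenAnalysisId: string; morphemeId: string }; // morpheme-level

/** Non-empty tuples: a Phrase must contain at least one token, and the
 *  snapshots tuple mirrors the tokenIds tuple. */
interface Phrase {
  /** Unique within the owning InterlinearText. */
  id: string;
  tokenIds: [string, ...string[]];
  tokenSnapshots?: [string, ...string[]];
}

// Compiles: at least one token present, morpheme endpoint names its analysis.
const phrase: Phrase = { id: "p1", tokenIds: ["t1", "t2"] };
const endpoint: AlignmentEndpoint = {
  tokenId: "t1",
  tokenAnalysisId: "ta1",
  morphemeId: "m1",
};
const status: AssignmentStatus = "approved";
// const bad: Phrase = { id: "p2", tokenIds: [] }; // type error: empty phrase
```

The non-empty-tuple and discriminated-union constraints move two of the invariants from prose into the compiler; the remaining ones (global id uniqueness, UTF-16 offsets) still need documentation.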

ℹ️ Review info
Run ID: c6db2165-cc1c-4883-9c27-b169837cdd53

📥 Commits

Reviewing files that changed from the base of the PR and between 0c1a762 and 546bd06.

📒 Files selected for processing (1)
  • src/types/interlinearizer.d.ts

Contributor Author

@alex-rawlings-yyc alex-rawlings-yyc left a comment


@alex-rawlings-yyc resolved 5 discussions.
Reviewable status: 0 of 6 files reviewed, all discussions resolved (waiting on alex-rawlings-yyc).

…ntinua

- Both are non-optional because computation is trivial and keeps our options open for user-defined token boundaries in whitespace languages if ever desired
- Updated cspell
…lly exclusive with gloss/sense from lexicon extension, clarify invariance requirement for tokenSnapshots
- `enum` values are inlined by `tsc` at compile time. The Platform.Bible build pipeline uses SWC, which processes files in isolation and cannot inline `enum` values from `.d.ts` declarations.
… it clear that we're working with UTF-16 offset units, define id scope used by flat analysis references, disambiguate morpheme-level alignment endpoints, prevent empty Phrase membership
@jasonleenaylor
Copy link
Copy Markdown
Contributor

I think specifying UTF-16 for the character offset may be problematic.

We will need some logic to prevent splitting surrogate pairs, and saying UTF-32 may not be correct either. We should probably just leave the model saying "character" and see how it works out in practice before encoding a solution in the comments.

Contributor Author

@alex-rawlings-yyc alex-rawlings-yyc left a comment


Sounds good. Reverting

@alex-rawlings-yyc made 1 comment.
Reviewable status: 0 of 6 files reviewed, all discussions resolved (waiting on alex-rawlings-yyc).

Contributor Author

@alex-rawlings-yyc alex-rawlings-yyc left a comment


@alex-rawlings-yyc reviewed 6 files and all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on alex-rawlings-yyc).

Contributor

@imnasnainaec imnasnainaec left a comment


@imnasnainaec reviewed 5 files and all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on alex-rawlings-yyc).

@alex-rawlings-yyc
Contributor Author

@jasonleenaylor this PR is marked as ready to merge, but I want to make sure that we're actually in agreement about the model's current form before merging.

Contributor Author

@alex-rawlings-yyc alex-rawlings-yyc left a comment


@alex-rawlings-yyc reviewed 3 files and all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on alex-rawlings-yyc).

Contributor

@jasonleenaylor jasonleenaylor left a comment


@jasonleenaylor reviewed 6 files and all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on alex-rawlings-yyc).

@alex-rawlings-yyc alex-rawlings-yyc merged commit 3340605 into main Apr 23, 2026
8 checks passed
@alex-rawlings-yyc alex-rawlings-yyc deleted the interlinear-model branch April 23, 2026 21:06


Development

Successfully merging this pull request may close these issues.

Finalize interlinear model 🚧

3 participants