First proposal for a finalized interlinear model #25
Conversation
c1ca763 to 4592089 · Compare
This duplicates data that exists in the sense and adds a secondary reference that would need to be checked (a sense could be moved to a new entry in edge cases). I think the entryId should be excluded from this ref. I think this applies, though probably even less commonly, to the allomorph and grammatical info refs as well. One consideration in however we do cross-extension linking is the need to check for references which are removed.
alex-rawlings-yyc
left a comment
@alex-rawlings-yyc made 1 comment.
Reviewable status: 0 of 6 files reviewed, all discussions resolved (waiting on jasonleenaylor).
src/types/interlinearizer.d.ts line 156 at r2 (raw file):
Previously, jasonleenaylor (Jason Naylor) wrote…
This duplicates data that exists in the sense and adds a secondary reference that would need to be checked (a sense could be moved to a new entry in edge cases). I think the entryId should be excluded from this ref. I think this applies, though probably even less commonly, to the allomorph and grammatical info refs as well.
One consideration in however we do cross-extension linking is the need to check for references which are removed.
Removed the entryIds. I'll add that to up next
🧹 Nitpick comments (3)
src/types/interlinearizer.d.ts (3)
584-773: Consider modeling the `gloss` / sense-ref mutual exclusion in the type.
The doc comments for `TokenAnalysis` (lines 555-558) and `Phrase` (lines 716-718) describe `gloss` and `glossSenseRef`/`senseRef` as alternatives ("alternatively resolves the gloss through…"), but the type allows both to be set simultaneously, and nothing in the invariant list addresses what a consumer should render if both are present. A discriminated union (`{ gloss: MultiString } | { glossSenseRef: SenseRef }` on the relevant slice) would make the either/or constraint explicit and catch ambiguous records at compile time. Left as-is this is a minor ergonomic nit rather than a correctness issue; flagging as optional.
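The discriminated-union suggestion can be sketched as follows. This is a minimal illustration only; `MultiString` and `SenseRef` here are simplified stand-ins for the real declarations in `src/types/interlinearizer.d.ts`.

```typescript
// Simplified stand-ins for the real types.
type MultiString = Record<string, string>;
interface SenseRef {
  senseId: string;
}

// Exactly one of the two gloss sources may be present; setting both is a
// compile-time error because each branch marks the other field as `never`.
type GlossSource =
  | { gloss: MultiString; glossSenseRef?: never }
  | { gloss?: never; glossSenseRef: SenseRef };

const freeForm: GlossSource = { gloss: { en: 'water' } };
const lexiconBacked: GlossSource = { glossSenseRef: { senseId: 'sense-1' } };

// const invalid: GlossSource = {
//   gloss: { en: 'water' },
//   glossSenseRef: { senseId: 'sense-1' },
// }; // compile error: the branches forbid setting both fields
```

Existing field names are kept, so consumers reading only one of the two fields are unaffected; only records that set both stop type-checking.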
452-818: Inconsistent optionality of `status` across analysis/link records undermines the "at most one Approved" invariants.
`AlignmentLink.status` (line 805) is required, but `SegmentAnalysis.status` (line 533), `TokenAnalysis.status` (line 615), and `Phrase.status` (line 747) are all optional. The documented invariants in `TextAnalysis` (lines 461-463, 477-479, 491-496) all key off of `status === Approved`, so absence of `status` is semantically ambiguous: is a `SegmentAnalysis` with no `status` implicitly `Approved`, `Suggested`, or something else entirely? Consumers enforcing the single-Approved invariant will have to invent a default, and different consumers are likely to pick differently. Consider making `status` required on all four shapes (it already is on `AlignmentLink`), or explicitly documenting the default when omitted. Given the invariants, required is probably the safer contract.
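With `status` required, the single-Approved invariant becomes mechanically checkable. A minimal sketch, using the lower-case string-union form of `AssignmentStatus` noted elsewhere in this review and a simplified record shape that stands in for the real interfaces:

```typescript
// Simplified stand-ins; the real AssignmentStatus is a lower-case string union.
type AssignmentStatus = 'approved' | 'suggested' | 'stale';

interface AnalysisRecord {
  id: string;
  status: AssignmentStatus; // required: no ambiguous "missing status" case
}

// Invariant check: at most one approved analysis per record scope.
function hasAtMostOneApproved(records: AnalysisRecord[]): boolean {
  return records.filter((r) => r.status === 'approved').length <= 1;
}

const ok = hasAtMostOneApproved([
  { id: 'a', status: 'approved' },
  { id: 'b', status: 'suggested' },
]); // true

const bad = hasAtMostOneApproved([
  { id: 'a', status: 'approved' },
  { id: 'b', status: 'approved' },
]); // false
```

If `status` stayed optional, this check would need a policy for `undefined`, which is exactly the default different consumers would pick differently.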
767-773: `Phrase.tokenSnapshots` parallel-array invariant is not type-enforceable; consider documenting the expectation more prominently.
When set, `tokenSnapshots` must be the same length as `tokenIds` and index-aligned (lines 768-770). This is invisible to the type system and easy to get wrong during construction (e.g., filtering tokens out of `tokenIds` without filtering `tokenSnapshots`). Worth either (a) calling this out as an explicit "Invariant:" in the TSDoc (matching the style used in `TextAnalysis`), or (b) modeling snapshots alongside ids as `{ tokenId: string; snapshot?: string }[]` so the parallelism is expressed in the type. Minor.
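Option (b) can be sketched like this; `TokenMembership` and `PhraseSketch` are hypothetical names for illustration, not types from the actual declarations.

```typescript
// Hypothetical shape: each membership entry carries its own optional snapshot,
// so ids and snapshots cannot drift out of alignment.
interface TokenMembership {
  tokenId: string;
  snapshot?: string;
}

interface PhraseSketch {
  id: string;
  tokens: TokenMembership[];
}

const phrase: PhraseSketch = {
  id: 'p1',
  tokens: [
    { tokenId: 't1', snapshot: 'yala' },
    { tokenId: 't2', snapshot: 'ku' },
  ],
};

// Filtering keeps each id and its snapshot together automatically; there is
// no second array to forget to filter.
const withoutT2 = phrase.tokens.filter((t) => t.tokenId !== 't2');
```

The trade-off is a slightly heavier wire format than two flat arrays, in exchange for making the misalignment bug unrepresentable.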
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c6264ba8-61c9-449b-9e94-dd5e927f1164
📒 Files selected for processing (2)
cspell.json
src/types/interlinearizer.d.ts
✅ Files skipped from review due to trivial changes (1)
- cspell.json
(Merge of upstream paranext-extension-template history: initial template setup, Vite → webpack build changes, papi API updates, linting/formatting configuration, release workflows, and many dependency bumps, with the usual dependabot and co-author attributions.)
@coderabbitai review
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/types/interlinearizer.d.ts`:
- Around line 48-51: The doc comments in src/types/interlinearizer.d.ts use
enum-style capitalized names (e.g., Approved, Suggested, Stale) but the public
type AssignmentStatus is a lower-case string union; update the comments to use
the exact lower-case string literals (e.g., 'approved', 'suggested', 'stale')
wherever AssignmentStatus or status is mentioned (including the shown block and
the other occurrences around lines 447-465, 611-613, 836-838), and ensure
references to Token.surfaceText and status comparison mention the exact literal
strings to avoid misleading consumers.
- Around line 459-464: The current model allows multiple TokenAnalysis entries
per tokenId but AlignmentEndpoint only references tokenId + morphemeId, making
morpheme ownership ambiguous; update the types so AlignmentEndpoint (and any
other alignment structures) include tokenAnalysisId whenever morphemeId is
present (i.e., require { tokenId, tokenAnalysisId, morphemeId } for
morpheme-level endpoints) or alternatively enforce globally unique Morpheme.id
and document that invariant in TextAnalysis/TokenAnalysis; modify the
declarations around TextAnalysis, TokenAnalysis, AlignmentEndpoint and Morpheme
(and corresponding comments at the mentioned regions) to reflect the chosen
approach so morpheme-level links unambiguously identify the owning
TokenAnalysis.
- Line 336: The current id: string (used by flat analysis/phrase/alignment
references) is ambiguous because segmentId/tokenId may be reused per
book/verse/segment; update the type definitions in
src/types/interlinearizer.d.ts (where id appears around the flat analysis,
phrase and alignment layer types at the spots corresponding to the current ids
near lines ~336 and ~391) to either (a) document and enforce that these ids are
globally unique within the owning InterlinearText (e.g., change the
comment/description to state “unique within InterlinearText”) or (b) expand the
reference shapes to include their parent scope (e.g., include
parentBookId/verseId/segmentId fields alongside tokenId/segmentId) so references
are unambiguous; pick one approach and update the related type declarations and
inline JSDoc comments for InterlinearText, segmentId, and tokenId accordingly.
- Around line 722-755: The tokenIds array currently allows an empty phrase;
change the type of tokenIds in the Phrase-ish interface to a non-empty tuple
(e.g. [string, ...string[]]) to enforce at least one member, and mirror that
change for tokenSnapshots (tokenSnapshots?: [string, ...string[]]) so the
compile-time invariant about matching lengths is preserved; update any code that
constructs or validates phrases (places referencing tokenIds, tokenSnapshots,
and the interface name in src/types/interlinearizer.d.ts) to handle the tuple
shape instead of a plain string[].
- Around line 91-100: Update the documentation to explicitly state that all
character offsets are measured in JavaScript UTF-16 code units: update the
ScriptureRef interface comment for charIndex, the Token documentation that
mentions charStart/charEnd, and the charStart/charEnd property comments
(referencing the Token and Segment usages) to say "UTF-16 code units (JavaScript
string index/ slice semantics)" so the invariant
Segment.baselineText.slice(charStart, charEnd) === surfaceText is accurate and
callers won't assume code points or grapheme clusters.
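The non-empty-tuple suggestion for `tokenIds` can be illustrated in isolation (sketch only; the type alias name is invented for the example):

```typescript
// A tuple with at least one element; the empty array no longer type-checks.
type NonEmptyIds = [string, ...string[]];

const one: NonEmptyIds = ['t1'];
const many: NonEmptyIds = ['t1', 't2', 't3'];
// const none: NonEmptyIds = []; // compile error: target requires 1 element
```

At runtime the value is still an ordinary array, so existing consumers that iterate `tokenIds` need no changes; only construction sites that could produce an empty phrase are affected.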
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c6db2165-cc1c-4883-9c27-b169837cdd53
📒 Files selected for processing (1)
src/types/interlinearizer.d.ts
alex-rawlings-yyc
left a comment
@alex-rawlings-yyc resolved 5 discussions.
Reviewable status: 0 of 6 files reviewed, all discussions resolved (waiting on alex-rawlings-yyc).
…ntinua - Both are non-optional because computation is trivial and keeps our options open for user-defined token boundaries in whitespace languages if ever desired - Updated cspell
…lly exclusive with gloss/sense from lexicon extension, clarify invariance requirement for tokenSnapshots
- `enum` values are inlined by `tsc` at compile time. The Platform.Bible build pipeline uses SWC, which processes files in isolation and cannot inline `enum` values from `.d.ts` declarations.
… it clear that we're working with UTF-16 offset units, define id scope used by flat analysis references, disambiguate morpheme-level alignment endpoints, prevent empty Phrase membership
cdd418b to 9cb9fb2 · Compare
I think specifying UTF-16 for the character offset may be problematic. We will need some logic to prevent splitting surrogate pairs, and saying UTF-32 may not be correct either. We should probably just leave it saying "character" in the model and see how it works out in practice before encoding a solution in the comments.
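The concern is easy to demonstrate: JavaScript string indices count UTF-16 code units, so a naïve offset can land in the middle of a surrogate pair.

```typescript
// '𝄞' (U+1D11E, MUSICAL SYMBOL G CLEF) is outside the BMP, so JavaScript
// stores it as a surrogate pair occupying two UTF-16 code units.
const text = 'a𝄞b';

const codeUnits = text.length;       // 4, not 3
const codePoints = [...text].length; // 3: string iteration is by code point
const splitsPair = text.slice(1, 2) === '\ud834'; // true: a lone high surrogate
```

Whichever unit the model ultimately settles on, any offset-validation logic will need a rule like "never place a boundary between a high and low surrogate."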
alex-rawlings-yyc
left a comment
Sounds good. Reverting
@alex-rawlings-yyc made 1 comment.
Reviewable status: 0 of 6 files reviewed, all discussions resolved (waiting on alex-rawlings-yyc).
alex-rawlings-yyc
left a comment
@alex-rawlings-yyc reviewed 6 files and all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on alex-rawlings-yyc).
imnasnainaec
left a comment
@imnasnainaec reviewed 5 files and all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on alex-rawlings-yyc).
@jasonleenaylor this PR is marked as ready to merge, but I want to make sure that we're actually in agreement about the model's current form before merging.
alex-rawlings-yyc
left a comment
@alex-rawlings-yyc reviewed 3 files and all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on alex-rawlings-yyc).
jasonleenaylor
left a comment
@jasonleenaylor reviewed 6 files and all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on alex-rawlings-yyc).
Interlinear Model — Current vs. Proposed
A proposal for reworking the interlinear model. The current model merges baseline text and analysis into a single tree and references LCM-specific lexical types (allomorphs, grammar / MSA) via raw GUIDs. The proposed model would split text from analysis, move lexical references to the Lexicon extension (`lexicon`) via structured ref types, promote phrases to a first-class entity, and name a single canonical analysis per record scope (token, phrase, segment) while still allowing alternates.
1. Structural shift: one tree → two parallel layers
Current

`Occurrence` carries both the surface text and the link to its analysis. Analyses are reusable objects shared between occurrences. Phrase grouping is a `groupId` field on `AnalysisAssignment`.

Proposed

Text and analysis are peers. The text side is nested (book → segment → token) because that's the natural shape of scripture. The analysis side is flat: each record carries an id reference back to its text-layer counterpart (`segmentId`, `tokenId`). This avoids container types that would exist solely to mirror a parent. Consumers that need segment-local views build `Map<tokenId, …>` and `Map<segmentId, …>` at load time. A `Token` is just surface text; a `TokenAnalysis` carries the gloss / parse and references its token by id. Phrases are peers of token analyses, not a grouping flag.
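The split described above can be sketched in TypeScript. The shapes below follow this document's field names but are illustrative only, not the extension's actual declarations:

```typescript
// Illustrative sketch of the proposed text/analysis split.
interface Token {
  id: string;
  surfaceText: string;
  charStart: number; // zero-based offset into Segment.baselineText
  charEnd: number;   // exclusive
}

interface TokenAnalysis {
  id: string;
  tokenId: string; // id reference back to the text layer
  gloss?: string;
  status: 'approved' | 'suggested' | 'stale';
}

const tokens: Token[] = [
  { id: 't1', surfaceText: 'λόγος', charStart: 0, charEnd: 5 },
];
const tokenAnalyses: TokenAnalysis[] = [
  { id: 'a1', tokenId: 't1', gloss: 'word', status: 'approved' },
];

// Consumers index the flat analysis list by token id at load time.
const byTokenId = new Map<string, TokenAnalysis[]>();
for (const a of tokenAnalyses) {
  const list = byTokenId.get(a.tokenId) ?? [];
  list.push(a);
  byTokenId.set(a.tokenId, list);
}

console.log(byTokenId.get('t1')?.[0].gloss); // → word
```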
Why the `TextAnalysis` wrapper?

The analysis layer is three sibling lists: `segmentAnalyses`, `tokenAnalyses`, `phrases`. They could sit directly on `InterlinearText` — the wrapper adds an indirection hop (`text.analysis.tokenAnalyses` rather than `text.tokenAnalyses`). It earns that hop by making "the analysis" a first-class noun in the model:

- APIs that produce or consume analysis — external analyzers, import / export routines, re-analysis pipelines — can take and return a single `TextAnalysis`. Without the wrapper, those same APIs have to move three parallel arrays around as a tuple or an ad-hoc object.
- Attaching an analysis, wiping analysis for a re-run, or swapping in the result of an external analyzer is `text.analysis = …` or `text.analysis = undefined`. Without the wrapper these become three coordinated assignments, and "no analysis yet" has to be inferred from every array being empty.
- `InterlinearText` reads as `books` (text) and `analysis` (analysis) — two peers. Hoisting the three arrays would blur that distinction: the type would list one nested field plus three sibling arrays with no visual grouping.

(Our current importers — LCM, Paratext, BT Extension — receive text and analysis together at import time, so the "import text first, analyze later" flow isn't load-bearing for imports. It is load-bearing for in-session re-analysis and external-analyzer round-trips.)

What the wrapper intentionally does not buy: multiple competing analyses of the same text. Competing analyses are handled at the record level (multiple `TokenAnalysis` entries per `tokenId`, distinguished by `status`), not by holding multiple `TextAnalysis` objects per `InterlinearText`.

2. Top-level container
| Current | Proposed | Notes |
| --- | --- | --- |
| `Interlinearization` (single-language container) | `InterlinearText` | Renamed; each side of an `InterlinearAlignment` would be an `InterlinearText`. |
| `InterlinearAlignment { source, target, links[] }` | `InterlinearAlignment { source, target, links[] }` | `source` / `target` would become `InterlinearText` rather than `Interlinearization`. |
| — | `TextAnalysis` | New; the analysis-layer wrapper held by `InterlinearText`. |

3. Text layer
| Current | Proposed | Notes |
| --- | --- | --- |
| `AnalyzedBook` (mixed text + analysis) | `Book` | Text-only. |
| `Segment { occurrences: Occurrence[] }` | `Segment { tokens: Token[], baselineText: string }` | `Segment` would be text-only. `baselineText` is required (not optional) — token character offsets are expressed relative to it, so it must be present for the text layer to be interpretable. |
| `Occurrence { surfaceText, writingSystem, type, assignment? }` | `Token { surfaceText, writingSystem, type, charStart, charEnd }` | `charStart` / `charEnd` are zero-based character offsets within the owning `Segment.baselineText` (`charEnd` exclusive). Required for all scripts; essential for scriptio continua languages (Chinese, Thai, Tibetan, Lao, Burmese, …) where token boundaries are not derivable from whitespace. Invariant: `baselineText.slice(charStart, charEnd) === surfaceText`. |
| `OccurrenceType` enum | `TokenType` string literal union | Converted from enum to string literal union (`'word'`, `'punctuation'`). |

4. Analysis layer
| Current | Proposed | Notes |
| --- | --- | --- |
| `Analysis` (reusable, shared) + `AnalysisAssignment` (join) | `TokenAnalysis` | Each token would own its `TokenAnalysis` records (approved plus any competing alternates) — no shared-analysis indirection via a join entity. |
| `AnalysisAssignment.groupId` (string, phrases encoded as shared ids across assignments) | `Phrase` (first-class entity) | Carries `tokenIds`, `gloss`, `senseRef`, `status`, `confidence`. |
| `MorphemeBundle { form, allomorphRef, lexemeRef, senseRef, grammarRef }` | `Morpheme { form, entryRef?, senseRef?, allomorphRef?, grammarRef? }` | `lexemeRef` would be renamed to `entryRef`; allomorph and MSA refs retained — the Lexicon extension would be expected to surface the required types. |
| `AnalysisType` enum (wordform/morph/gloss/punctuation) | (dropped) | Presence of `morphemes` ⇒ morph-level; absence of analysis ⇒ unanalyzed; punctuation tokens would simply be omitted from `TextAnalysis.tokenAnalyses` (they would still live in the text layer's `Segment.tokens`). |
| `Analysis.producer` + `Analysis.sourceUser` | `producer` + `sourceUser` on `TokenAnalysis`, `Phrase`, and `SegmentAnalysis` | Provenance (`producer`) and human identifier (`sourceUser`) are distinct from the trust score (`confidence`), and would be carried by every analysis-layer record type. |

Under the proposal, the analysis layer is flat — no `AnalyzedBook`, no tokens-nested-inside-segment. `TextAnalysis` would hold three sibling lists: `segmentAnalyses`, `tokenAnalyses`, `phrases`. Each record would reference its text-layer counterpart by id.
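A minimal sketch of the wrapper shape and the single-assignment reset it enables (field names from this document; illustrative only):

```typescript
// Illustrative shapes only; not the extension's actual declarations.
interface SegmentAnalysis { segmentId: string; freeTranslation?: string; }
interface TokenAnalysis { tokenId: string; gloss?: string; }
interface Phrase { tokenIds: string[]; gloss?: string; }

interface TextAnalysis {
  segmentAnalyses: SegmentAnalysis[];
  tokenAnalyses: TokenAnalysis[];
  phrases: Phrase[];
}

// Minimal stand-in for the text side; the real type would also hold books.
interface InterlinearText { analysis?: TextAnalysis; }

const text: InterlinearText = {
  analysis: { segmentAnalyses: [], tokenAnalyses: [], phrases: [] },
};

// Wiping analysis for a re-run is one assignment, not three:
text.analysis = undefined;
console.log(text.analysis === undefined); // → true
```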
Proposed new fields worth noting:
- `SegmentAnalysis.freeTranslation` / `.literalTranslation` — would move off `Segment` (where they live in the current model) onto the analysis-side `SegmentAnalysis`, since they are analysis artifacts, not baseline text.
- `TokenAnalysis.glossSenseRef` — a lexicon-backed alternative to the free-form `gloss` field.
- `TokenAnalysis.tokenSnapshot`, `Phrase.tokenSnapshots`, `AlignmentEndpoint.tokenSnapshot` — surface-text snapshots taken at analysis/link-creation time. Would enable drift detection: compare snapshot vs. current `Token.surfaceText` and flip `status` to a new `Stale` value on mismatch.
- `Token.charStart` / `.charEnd` — zero-based character offsets within the owning `Segment.baselineText` (`charEnd` exclusive). Required for all scripts; critical for scriptio continua languages where word boundaries are not whitespace-delimited. Without these fields, the tokenization decision is irrecoverable once the token list is reconstructed from surface text alone.
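The offset invariant can be checked mechanically. A sketch using a scriptio continua baseline (names from this document; illustrative only — note that JavaScript's `slice` operates on UTF-16 code units):

```typescript
// Illustrative check of the proposed token-offset invariant:
// baselineText.slice(charStart, charEnd) === surfaceText
interface Token {
  surfaceText: string;
  charStart: number; // zero-based, within the owning Segment.baselineText
  charEnd: number;   // exclusive
}

const baselineText = '神愛世人'; // no whitespace between words
const tokens: Token[] = [
  { surfaceText: '神', charStart: 0, charEnd: 1 },
  { surfaceText: '愛', charStart: 1, charEnd: 2 },
  { surfaceText: '世人', charStart: 2, charEnd: 4 },
];

const allValid = tokens.every(
  (t) => baselineText.slice(t.charStart, t.charEnd) === t.surfaceText,
);
console.log(allValid); // → true
```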
5. Lexicon coupling: LCM-typed refs → Lexicon extension refs
The current model stores four GUID-style refs per `MorphemeBundle` pointing at LCM lexical objects. All four would survive under the proposal, but each would become a structured reference into the Lexicon extension (`lexicon`) rather than a bare GUID string:

| Current field | LCM target | Proposed |
| --- | --- | --- |
| `allomorphRef` | `IMoForm` (specific allomorph) | `Morpheme.allomorphRef` → `AllomorphRef` |
| `lexemeRef` | `ILexEntry` | `Morpheme.entryRef` → `EntryRef` → `IEntry` |
| `senseRef` | `ILexSense` | `Morpheme.senseRef` → `SenseRef` → `ISense` |
| `grammarRef` | `IMoMorphSynAnalysis` (MSA) | `Morpheme.grammarRef` → `GrammarRef` |

Four new ref types are proposed: `AllomorphRef`, `EntryRef`, `SenseRef`, and `GrammarRef`. These would be pure identifiers — consumers would resolve them through the Lexicon extension's `lexicon.entryService` network object (typed `lexicon.IEntryService`, defined in `platform.bible-extension`'s `lexicon.d.ts`).

There are a few places where the proposal reaches for detail the Lexicon extension's current public surface doesn't expose directly. The proposed model would be usable today with workarounds, but each of the following would make resolution more direct and is worth coordinating with the Lexicon team on when priorities allow:
| Area | Proposed addition |
| --- | --- |
| `IEntryService` | a `getEntry(projectId, entryId): Promise<IEntry>` method and a `getSense(projectId, entryId, senseId): Promise<ISense>` method |
| `IMoForm` not exported; no allomorph service | export `IMoForm` and a `getAllomorph(projectId, entryId, allomorphId)` method |
| `IMoMorphSynAnalysis` not exported; no MSA service | export `IMoMorphSynAnalysis` and a `getMsa(projectId, entryId, msaId)` method |

In the meantime, consumers can work around these: query-and-filter for entries by id, walk `entry.senses[]` for senses, inspect `IEntry.components` / `lexemeForm` for allomorphs, and defer MSA data until it's surfaced.
Consequence: under the proposal, the interlinear model would no longer duplicate lexical data. Edits to an `IEntry` / `ISense` in the Lexicon extension would propagate automatically to every token that references them.
6. Phrases: groupId → first-class entity
Current

Phrases are reconstructed by grouping `AnalysisAssignment`s with the same `groupId` and dereferencing the shared `Analysis`.

Proposed

Benefits:

- A phrase would be a single record with explicit `tokenIds`: no shared-`groupId` reconstruction.
- A parser or AI assistant can propose a `Phrase` with `status: suggested`, `confidence: medium`; the user approves or rejects via `status`.
- `senseRef` would point at an `IEntry` with `morphType: Phrase` (contiguous) or `DiscontiguousPhrase` (disjoint) — already supported by the Lexicon extension's `MorphType` enum (`MiniLcm/Models/MorphType.ts`).

7. Alignment endpoints
| Current | Proposed |
| --- | --- |
| `AlignmentEndpoint { occurrenceId, bundleId? }` | `AlignmentEndpoint { tokenId, tokenAnalysisId?, morphemeId? }` |

Same two-level concept (token-level vs. morpheme-level endpoint). Field names track the proposed entity renames. When `morphemeId` is set, `tokenAnalysisId` is required — because a token may have multiple competing `TokenAnalysis` entries, the specific `TokenAnalysis` that owns the referenced morpheme must be identified explicitly.

8. Proposed invariants and machinery
Not present in the current model:

- Competing records: a single `segmentId` would be allowed multiple `SegmentAnalysis` entries; a single `tokenId` would be allowed multiple `TokenAnalysis` entries and appearances in multiple `Phrase` records, distinguished by `status` / `confidence` / `producer`.
- Invariant: at most one `SegmentAnalysis` per `segmentId`, at most one `TokenAnalysis` per `tokenId`, and at most one `Phrase` containing a given `tokenId` may have `status: 'approved'`. The approved record would be canonical for rendering; alternates would live alongside for review workflows (AI-drafted back translation vs. human edit, parser suggestion vs. human choice, etc.). The current model allows competing `AnalysisAssignment`s with no such "one canonical" rule; the proposal keeps the freedom but names the winner.
- Orthogonal levels: a per-token parse would not be considered a competing analysis to a phrase-level gloss; they record different things and both would render simultaneously in the UI.
- Snapshots: analysis-layer records would snapshot their token's surface text at creation time (`TokenAnalysis.tokenSnapshot`, `Phrase.tokenSnapshots`, `AlignmentEndpoint.tokenSnapshot`). When the baseline text drifts, consumers would compare snapshot vs. current `Token.surfaceText` and flip `status` to `'stale'`. `Book.textVersion` would provide a coarse-grained "something changed in this book" signal for batching the per-token comparisons.
- Punctuation: tokens with `type = punctuation` would be stored in `Segment.tokens` on both source and target so baseline text reconstructs exactly. They would be omitted only from the analysis layer's `tokenAnalyses`.
- Flat analysis layer: `TextAnalysis` would have no per-book or per-segment containers on the analysis side — it would hold `segmentAnalyses`, `tokenAnalyses`, and `phrases` as sibling lists keyed by id back to the text layer. Consumers would index by id at load time for segment-local rendering.
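The "one approved record per `tokenId`" invariant above can be validated mechanically. A sketch with field names from this document (illustrative only):

```typescript
// Checks the proposed invariant: at most one approved TokenAnalysis
// per tokenId; alternates with other statuses are allowed.
interface TokenAnalysis {
  tokenId: string;
  status: 'approved' | 'suggested' | 'stale';
}

function violatesOneApproved(analyses: TokenAnalysis[]): boolean {
  const approvedCount = new Map<string, number>();
  for (const a of analyses) {
    if (a.status !== 'approved') continue;
    const n = (approvedCount.get(a.tokenId) ?? 0) + 1;
    if (n > 1) return true;
    approvedCount.set(a.tokenId, n);
  }
  return false;
}

console.log(
  violatesOneApproved([
    { tokenId: 't1', status: 'approved' },
    { tokenId: 't1', status: 'suggested' }, // alternate is fine
  ]),
); // → false
console.log(
  violatesOneApproved([
    { tokenId: 't1', status: 'approved' },
    { tokenId: 't1', status: 'approved' }, // two canonical records
  ]),
); // → true
```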
9. Entity-by-entity summary
| Current | Proposed | Notes |
| --- | --- | --- |
| `Interlinearization` | `InterlinearText` | Renamed; would carry the optional `analysis?` field. |
| `AnalyzedBook` (text+analysis) | `Book` (text-only) | On the analysis side, `TextAnalysis` would replace it. |
| `Segment` | `Segment` + `SegmentAnalysis` | Free / literal translations would move onto `SegmentAnalysis`, which would also carry `status` / `confidence` / `producer` so competing segment-level analyses are permitted under the same "one Approved per `segmentId`" invariant used for tokens and phrases. `Segment.baselineText` becomes required (was optional) — token character offsets depend on it. |
| `Occurrence` | `Token` + `TokenAnalysis` | `Token` gains required `charStart` / `charEnd` fields (zero-based offsets within `Segment.baselineText`, `charEnd` exclusive) to record token boundaries for all scripts, especially scriptio continua languages. |
| `Analysis` | (folded into `TokenAnalysis`) | Each token would own its `TokenAnalysis` records (approved plus any competing alternates distinguished by `status`). |
| `AnalysisAssignment` | (folded into `TokenAnalysis`) | — |
| `MorphemeBundle` | `Morpheme` | All four refs survive (`entryRef`, `senseRef`, `allomorphRef`, `grammarRef`), but each would become a structured ref into the Lexicon extension rather than a bare GUID. |
| `groupId` (field on `AnalysisAssignment`) | `Phrase` | Promoted to a first-class entity. |
| `AnalysisType` enum | (dropped) | Analysis kind would be inferred: presence of `morphemes` ⇒ morph-level; absence of analysis ⇒ unanalyzed. |
| `OccurrenceType` enum | `TokenType` string literal union | (`'word'`, `'punctuation'`). |
| `Confidence` enum | `Confidence` string literal union | (`'high'`, `'medium'`, `'low'`, `'guess'`). |
| `AssignmentStatus` enum | `AssignmentStatus` string literal union + `'stale'` | New `'stale'` value for drift-detection workflow; converted from enum to string literal union. |
| `ScriptureRef`, `MultiString` | unchanged | — |
| `InterlinearAlignment`, `AlignmentLink` | unchanged | Endpoint fields track the `Token` / `Morpheme` renames. |