
Add USJ book parsing pipeline and interlinearizer display WebView#30

Merged
alex-rawlings-yyc merged 32 commits into main from tokenize-book on Apr 30, 2026
Conversation

@alex-rawlings-yyc
Contributor

@alex-rawlings-yyc alex-rawlings-yyc commented Apr 23, 2026


Summary by CodeRabbit

  • New Features

    • Tab toolbar with book/chapter control and scroll-group selector; submitting a reference records recent scripture entries.
    • Scripture rendered as tokenized, clickable segment "chips" that update the current passage.
  • Improvements

    • Deterministic, Unicode-aware tokenization and stable book extraction; clearer "Error loading book" vs "Error processing book"; reduced unnecessary re-fetching.
  • Tests

    • Large expansion of deterministic unit and view tests for extraction, tokenization, XML parsing, and interlinear UI.
  • Chores

    • Updated test mocks and Jest config, broader coverage collection, added testing utilities, and spell-check word list updates.
  • Documentation

    • Parser docs moved to PT9 directory and enhanced JSDoc for public models.

- Add usjBookExtractor and bookTokenizer parsers that transform a Platform.Bible USJ book into the interlinear Book/Segment/Token model
- Replace placeholder WebView with SegmentView showing tokenized word and punctuation chips
- Add BookChapterControl verse picker wired to the scroll-group scrRef
- Add unit tests for both parsers and the full WebView component (100% coverage); fix mock signatures for useLocalizedStrings (tuple) and useRecentScriptureRefs (object)
- Add Jest mocks for lucide-react and platform-bible-react to unblock ESM imports in test environment
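The Book/Segment/Token model mentioned above can be sketched as follows. This is an illustrative outline only: field names are inferred from the review discussion (surfaceText, charStart, baselineText, bookCode) and may not match the real type definitions in this repository.

```typescript
// Hypothetical sketch of the interlinear model the two parsers produce.
type TokenType = 'word' | 'punctuation';

interface Token {
  id: string;          // e.g. "<verse sid>:<charStart>"
  surfaceText: string;
  type: TokenType;
  charStart: number;   // offset into the segment's baselineText
  charEnd: number;
}

interface Segment {
  id: string;          // the verse SID
  baselineText: string;
  tokens: Token[];
}

interface Book {
  bookCode: string;
  segments: Segment[]; // one Segment per verse
}

// A toy instance showing how the pieces nest.
const demo: Book = {
  bookCode: 'GEN',
  segments: [
    {
      id: 'GEN 1:1',
      baselineText: 'In the beginning',
      tokens: [{ id: 'GEN 1:1:0', surfaceText: 'In', type: 'word', charStart: 0, charEnd: 2 }],
    },
  ],
};
```

The WebView then filters `segments` by the current book/chapter and renders each one as a row of clickable chips.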
@alex-rawlings-yyc alex-rawlings-yyc linked an issue Apr 23, 2026 that may be closed by this pull request
@coderabbitai

coderabbitai Bot commented Apr 23, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Refactors Jest mocks and mappings; adds USJ→RawBook extractor and Unicode-aware tokenizer; updates web view to render tokenized segments and controls with stable scrRef handling; expands parser/tokenizer tests; changes PT9 parser validation to throw SyntaxError; updates docs, config, and JSDoc.

Changes

  • Jest mocks & mappings — __mocks__/papi-frontend-react.ts, __mocks__/papi-frontend.ts, __mocks__/papi-backend.ts, __mocks__/interlinearXmlContent.ts, jest.config.ts: Trim and reshape mocks (simplify papi react/backend mocks, remove raw-XML mock export), add module mapping for platform-bible-react, broaden coverage globs, and adjust mock exports.
  • Platform-bible mock & utils — __mocks__/platform-bible-react.tsx, __mocks__/platform-bible-utils.ts: Add TSX test components (TabToolbar, ScrollGroupSelector, BookChapterControl, BOOK_CHAPTER_CONTROL_STRING_KEYS); change the PlatformError discriminator to platformErrorVersion and update isPlatformError.
  • PAPI USJ parsers / tokenizer — src/parsers/papi/usjBookExtractor.ts, src/parsers/papi/bookTokenizer.ts: Add extractBookFromUsj (USJ traversal, deterministic FNV-1a contentHash, verse accumulation/validation) and tokenizeBook (Unicode-aware tokenization, per-token char offsets, token ids, one Segment per verse).
  • Web view & UI — src/interlinearizer.web-view.tsx: Refactor the web view to fetch project USJ, call extractor + tokenizer, filter segments by book/chapter, render clickable SegmentViews, wire up BookChapterControl and ScrollGroupSelector, pass setScrRef, and separate loading vs. processing error states.
  • Tests — src/__tests__/interlinearizer.web-view.test.tsx, src/__tests__/parsers/papi/usjBookExtractor.test.ts, src/__tests__/parsers/papi/bookTokenizer.test.ts, src/__tests__/parsers/pt9/interlinearXmlParser.test.ts, src/__tests__/main.test.ts: Add extensive tests for USJ extraction and tokenization (IDs, Unicode, invariants, errors); update web-view interaction tests; convert PT9 expectations to SyntaxError; move shared test helpers to file scope.
  • PT9 parser & types — src/parsers/pt9/interlinearXmlParser.ts, src/types/interlinearizer.d.ts, src/parsers/pt9/pt9-xml.md: Make parser validation throw SyntaxError, add parseStrictNumber, use Set-based duplicate detection; add and clarify JSDoc for IDs/conventions; update the docs path to src/parsers/pt9/.
  • Config / tooling / docs — cspell.json, README.md, package.json: Add cspell words ("eten", "imte", "Punct", "recalc"); update the README PT9 parser path; add @testing-library/user-event as a devDependency.
  • Other — src/main.ts: Update JSDoc (optional projectId, clarified return behavior) and change one mismatch-case throw to TypeError.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant BookCtrl as BookChapterControl
    participant WebView as InterlinearizerWebView
    participant Hooks as ProjectDataHooks
    participant Extractor as extractBookFromUsj
    participant Tokenizer as tokenizeBook
    participant SegmentView

    User->>BookCtrl: submit scrRef
    BookCtrl->>WebView: setScrRef(newRef)
    WebView->>Hooks: fetch USJ & writingSystem
    Hooks-->>WebView: usj, writingSystem
    WebView->>Extractor: extractBookFromUsj(usj, writingSystem)
    Extractor-->>WebView: RawBook
    WebView->>Tokenizer: tokenizeBook(RawBook)
    Tokenizer-->>WebView: Book (segments & tokens)
    WebView->>WebView: filter segments by book/chapter
    WebView->>SegmentView: render segments (clickable)
    SegmentView-->>WebView: onSegmentClick(segmentRef)
    WebView->>Hooks: setScrRef(segmentRef) / addRecentScriptureRef

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

🟨Medium

Suggested reviewers

  • imnasnainaec
  • jasonleenaylor

Poem

🐇 I nibble SIDs into tiny seeds,

Tokens sprout where every baseline leads.
Segments hop out, chip-rows neat and bright,
Click a chip — it bounds the scrRef into sight.
Hooray — the interlinear garden leaps to light!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 70.97%, below the required 80.00% threshold. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (4 passed)

  • Description Check ✅ — Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ — The title accurately and concisely describes the main changes: adding a USJ book parsing pipeline and a WebView for displaying interlinear content, which aligns with the substantial new modules (usjBookExtractor.ts, bookTokenizer.ts) and WebView component updates throughout the changeset.
  • Linked Issues Check ✅ — Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check ✅ — Check skipped because no linked issues were found for this pull request.



Comment thread src/__tests__/parsers/papi/usjBookExtractor.test.ts Fixed
@alex-rawlings-yyc alex-rawlings-yyc marked this pull request as ready for review April 24, 2026 17:56

Comment thread src/interlinearizer.web-view.tsx
…o deps

- Flush open verse at chapter boundaries and traverse chapter/verse inline content
- Add inter-paragraph spacing in usjBookExtractor
- Use stableStringify for deterministic content hashing
- Fix bookTokenizer to test full surfaceText (not first code unit) so astral-plane letters tokenize as words
- Narrow scrRef memo deps to primitives to avoid stale re-renders
- Add aria-current to active segment button
- Upgrade cn mock to handle object-form class arguments
…le-dependence

- bookTokenizer.test: extend Unicode test to assert '.' is punctuation; add combining-mark (n\u0303) test covering the \p{M} TOKEN_RE branch
- usjBookExtractor: guard para space-append with endsWith(' ') to prevent double spaces across consecutive empty paragraphs
- usjBookExtractor: replace localeCompare with code-point sort in stableStringify so FNV-1a input is deterministic regardless of engine locale
- interlinearizer.web-view: log parse/tokenize errors via papi logger instead of silently swallowing them
- Expand JSDoc on stableStringify and fnv1a32
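The endsWith(' ') guard described in the second bullet can be sketched as below. The function name and exact call shape are hypothetical; the real usjBookExtractor applies this inside its paragraph traversal.

```typescript
// Hypothetical sketch of the double-space guard: only append a separating
// space when the accumulated text is non-empty and doesn't already end with
// one, so consecutive empty paragraphs don't inject "  " into verse text.
function appendParagraphText(accumulated: string, paragraph: string): string {
  const needsSpace = accumulated.length > 0 && !accumulated.endsWith(' ');
  return accumulated + (needsSpace ? ' ' : '') + paragraph;
}
```

With this guard, appending an empty paragraph and then a non-empty one still yields a single space between words rather than two.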
@sillsdev sillsdev deleted a comment from coderabbitai Bot Apr 24, 2026

Comment thread src/__tests__/parsers/papi/bookTokenizer.test.ts
Contributor Author

@alex-rawlings-yyc alex-rawlings-yyc left a comment


@alex-rawlings-yyc partially reviewed 8 files and all commit messages, and made 2 comments.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on imnasnainaec).


src/parsers/papi/bookTokenizer.ts line 12 at r21 (raw file):

Previously, imnasnainaec (D. Ror.) wrote…

I asked Gemini about how to only write the LNM set once, and it also suggested including Join_Control (for the sake of Arabic, Farsi, and Indic scripts) and replacing u with v, a la

const CHAR_SET = '\\p{L}\\p{N}\\p{M}\\p{Join_Control}';
const TOKEN_RE = new RegExp(`[${CHAR_SET}]+|[^${CHAR_SET}\\s]`, 'gv');
const WORD_START_RE = new RegExp(`[${CHAR_SET}]`, 'v');

Made the change. Devin suggested that apostrophes and dashes be included in tokens rather than splitting them, and I tend to agree, so I made that change too. Let me know if you can think of any reason to split words along those characters


src/parsers/papi/usjBookExtractor.ts line 227 at r21 (raw file):

Previously, imnasnainaec (D. Ror.) wrote…

⛏️ How about

  .sort(([a], [b]) => (a.localeCompare(b)))

Using localeCompare() runs the risk of producing different sort orders (and thus different serialized strings) for people in different regions. Hash generation and comparison is done client-side, so I'm concerned that a translator and a consultant may end up getting different hashes for the same source text. I'm not totally sure what sorts of locale differences the method produces. For now, I'm replacing it with the boolean-subtraction pattern, which returns 0 for equal keys without a nested ternary.
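The determinism concern can be sketched as below. These are hypothetical minimal versions of stableStringify and fnv1a32; the real implementations in usjBookExtractor may differ in shape, but the key idea is the same: sort keys by code point, never by locale, before hashing.

```typescript
// Canonical JSON-ish serialization with code-point key ordering, which is
// deterministic across engines and locales (unlike localeCompare).
function stableStringify(value: unknown): string {
  if (value === null || typeof value !== 'object') return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(',')}]`;
  const entries = Object.entries(value as Record<string, unknown>)
    // Boolean subtraction: returns 0 for equal keys without a nested ternary.
    .sort(([a], [b]) => Number(a > b) - Number(a < b));
  return `{${entries.map(([k, v]) => `${JSON.stringify(k)}:${stableStringify(v)}`).join(',')}}`;
}

// Standard 32-bit FNV-1a over the UTF-8 bytes of the input.
function fnv1a32(input: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (const byte of new TextEncoder().encode(input)) {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193) >>> 0; // FNV prime, kept as uint32
  }
  return hash.toString(16).padStart(8, '0');
}
```

Two clients hashing the same object, regardless of key insertion order or locale, now produce identical digests.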

…act`, add clarifying comment to method result to ensure it isn't flagged as problematic in the future
- Previously only covered parsers, main.ts, and the WebView component.
- Now includes any new source files without manual config updates.


@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/main.ts (1)

90-96: ⚠️ Potential issue | 🟡 Minor

Avoid prompting project picker when webViewId is stale/invalid.

If Line 95 returns no definition (or no projectId), Line 96 currently falls back to openInterlinearizer(undefined), which opens the picker unexpectedly in a context-menu flow.

Suggested fix
 async function openInterlinearizerForWebView(webViewId?: string): Promise<string | undefined> {
   if (!webViewId) return openInterlinearizer();
   const webViewDefinition = await papi.webViews.getOpenWebViewDefinition(webViewId);
-  return openInterlinearizer(webViewDefinition?.projectId);
+  if (!webViewDefinition?.projectId) {
+    logger.warn(
+      `Could not resolve projectId from webViewId "${webViewId}" while opening Interlinearizer`,
+    );
+    return undefined;
+  }
+  return openInterlinearizer(webViewDefinition.projectId);
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main.ts` around lines 90 - 96, openInterlinearizerForWebView currently
calls openInterlinearizer(undefined) when a provided webViewId resolves to no
definition or missing projectId, which triggers the project picker unexpectedly;
update the function so that after awaiting
papi.webViews.getOpenWebViewDefinition(webViewId) it checks that
webViewDefinition and webViewDefinition.projectId exist and if not returns
undefined (do not call openInterlinearizer), otherwise call
openInterlinearizer(webViewDefinition.projectId); the change should be applied
within openInterlinearizerForWebView around the
papi.webViews.getOpenWebViewDefinition and openInterlinearizer call sites.
🧹 Nitpick comments (1)
src/parsers/pt9/interlinearXmlParser.ts (1)

224-227: Clarify error text for invalid numeric attributes.

At Line 227, the thrown message says attributes are “missing”, but this branch also triggers for present-but-invalid values (e.g., non-integer strings). Making the message explicit will reduce debugging ambiguity.

Suggested patch
-    if (index === undefined || length === undefined) {
-      throw new SyntaxError('Invalid XML: Range missing required Index or Length attributes');
-    }
+    if (index === undefined || length === undefined) {
+      throw new SyntaxError(
+        'Invalid XML: Range missing or invalid Index/Length attributes (must be non-negative integers)',
+      );
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/parsers/pt9/interlinearXmlParser.ts` around lines 224 - 227, Update the
thrown SyntaxError in the range parsing branch to indicate invalid numeric
values (not just missing attributes): when parseStrictNumber returns undefined
for rangeElement['@_Index'] or rangeElement['@_Length'] in
interlinearXmlParser.ts (the logic using parseStrictNumber for Index and
Length), throw a clearer message such as "Invalid XML: Range has missing or
non-numeric Index/Length attributes" so callers know the attributes may be
present but malformed; keep this change localized to the code that validates
index and length in the range parsing logic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 32ea5924-5506-42c5-b24b-4f325a424b0e

📥 Commits

Reviewing files that changed from the base of the PR and between 4c7bb09 and 5fc175c.

📒 Files selected for processing (4)
  • __mocks__/papi-frontend-react.ts
  • jest.config.ts
  • src/main.ts
  • src/parsers/pt9/interlinearXmlParser.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • jest.config.ts

Co-authored-by: alex-rawlings-yyc <alex.rawlings@wycliffe.ca>

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
src/parsers/papi/bookTokenizer.ts (2)

116-124: Consider enforcing sid/bookCode consistency here.

Right now a bad RawBook can yield bookRef: rawBook.bookCode while startRef.book / endRef.book come from a different SID. Guarding that invariant would keep the returned Book internally consistent.

Diff
 export function tokenizeBook(rawBook: RawBook): Book {
   const segments: Segment[] = rawBook.verses.map(({ sid, text }) => {
     const ref = parseSid(sid);
+    if (ref.book !== rawBook.bookCode) {
+      throw new SyntaxError(
+        `Verse SID "${sid}" does not match book code "${rawBook.bookCode}"`,
+      );
+    }
     return {
       id: sid,
       startRef: { ...ref },
       endRef: { ...ref },
       baselineText: text,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/parsers/papi/bookTokenizer.ts` around lines 116 - 124, The segments
construction can produce inconsistent Book data when a verse's parsed SID book
(from parseSid(sid)) differs from rawBook.bookCode; update the logic in the
segments creation (where rawBook.verses are mapped and parseSid, sid, startRef,
endRef are used) to enforce consistency by either validating that parsedRef.book
=== rawBook.bookCode and throwing/logging an error on mismatch, or by overriding
parsedRef.book to rawBook.bookCode when building startRef/endRef so the returned
Book and its segments share the same book code; pick one approach and apply it
consistently in the map that builds segments.

91-98: Use Array.from(..., mapper) to skip the extra allocation.

This path runs for every verse, so it’s worth avoiding the intermediate array of matches.

Diff
 function tokenizeVerse(text: string, sid: string, writingSystem: string): Token[] {
-  return Array.from(text.matchAll(TOKEN_RE)).map((match) => {
+  return Array.from(text.matchAll(TOKEN_RE), (match) => {
     const surfaceText = match[0];
     const charStart = match.index;
     const charEnd = charStart + surfaceText.length;
     const type: TokenType = WORD_CONTAIN_RE.test(surfaceText) ? 'word' : 'punctuation';
     return { id: `${sid}:${charStart}`, surfaceText, writingSystem, type, charStart, charEnd };
   });
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/parsers/papi/bookTokenizer.ts` around lines 91 - 98, tokenizeVerse
currently calls Array.from(text.matchAll(TOKEN_RE)).map(...) which creates an
intermediate array; change it to use Array.from with the optional mapper:
Array.from(text.matchAll(TOKEN_RE), match => ...) to produce tokens in one
allocation, keeping the same fields (surfaceText from match[0], charStart from
match.index, charEnd = charStart + surfaceText.length, type using
WORD_CONTAIN_RE.test(surfaceText), id `${sid}:${charStart}`, and include
writingSystem). Ensure you handle the possibility of match.index being null (use
a non-null assertion or safe guard) when computing charStart and charEnd.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 915905cb-3c0c-4f9d-89a2-99ffc8bcf11b

📥 Commits

Reviewing files that changed from the base of the PR and between 5fc175c and c6d91cb.

📒 Files selected for processing (1)
  • src/parsers/papi/bookTokenizer.ts

… error handling when attempting to open Interlinearizer WebView when `projectId` can't be resolved from `webViewId`

…ct picker for when `undefined` is returned, add `user-event` package for better react testing, other minor adjustments
Contributor Author

@alex-rawlings-yyc alex-rawlings-yyc left a comment


@alex-rawlings-yyc partially reviewed 9 files and all commit messages, made 2 comments, and resolved 1 discussion.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on imnasnainaec).


src/__tests__/interlinearizer.web-view.test.tsx line 352 at r21 (raw file):

Previously, imnasnainaec (D. Ror.) wrote…

⛏️ I'm under the impression that using

import userEvent from '@testing-library/user-event';
[...]
await userEvent.click(

is generally recommended as more closely mimicking user interaction than

fireEvent.click(

Though doing so would require

  • npm i -D @testing-library/user-event
  • Making these two test functions async

I think this is a worthy change to make.


src/parsers/pt9/interlinearXmlParser.ts line 170 at r27 (raw file):

Previously, imnasnainaec (D. Ror.) wrote…

❓ Does this intentionally exclude numbers like 1. and 2.0?

Yes and no. I hadn't considered numbers like that, but PT9 shouldn't generate XML with numbers like that anyway, afaik. Do we want to account for edge cases like 1. and 2.0?
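The strict parsing being discussed can be sketched as below. This is a hypothetical minimal version of parseStrictNumber; the real implementation in interlinearXmlParser may differ, but the point is that it intentionally rejects forms like "1." and "2.0" that Number() would accept.

```typescript
// Accept only plain non-negative decimal integers; anything else (including
// "", "1.", "2.0", "1e3", and " 1") yields undefined so the caller can throw
// a clear SyntaxError for the attribute.
function parseStrictNumber(value: string | undefined): number | undefined {
  if (value === undefined || !/^\d+$/.test(value)) return undefined;
  return Number(value);
}
```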


Contributor Author

@alex-rawlings-yyc alex-rawlings-yyc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alex-rawlings-yyc made 1 comment.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on imnasnainaec).


src/parsers/papi/bookTokenizer.ts line 26 at r27 (raw file):

Previously, imnasnainaec (D. Ror.) wrote…

On reflection, why is this only for word-initial? E.g., Hebrew words can end with aleph.

Adjusted so these characters are always captured as part of tokens. Though do we want to account for single quotes being used simply as quotation marks? I think this is a question that can be resolved later, unless we already know how glottal characters and quotation marks are generally handled in Platform.Bible projects.


Contributor Author

@alex-rawlings-yyc alex-rawlings-yyc left a comment


@alex-rawlings-yyc reviewed 1 file and all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on imnasnainaec).


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/__tests__/interlinearizer.web-view.test.tsx`:
- Around line 177-192: Tests leak mock call history because module mocks are
only re-mocked, not cleared; add a call to jest.clearAllMocks() at the start of
the beforeEach so call history for the long-lived mocks (mockBookData,
mockWritingSystem, jest.mocked(useLocalizedStrings),
jest.mocked(useRecentScriptureRefs), jest.mocked(extractBookFromUsj),
jest.mocked(tokenizeBook)) is reset before each test run.

In `@src/parsers/papi/bookTokenizer.ts`:
- Around line 17-18: CHAR_SET currently includes \p{Join_Control}, which causes
isolated joiner characters to be tokenized as words; remove \p{Join_Control}
from CHAR_SET and instead allow join controls only as internal separators in the
word regex: update WORD_CONTAIN_RE to require at least one letter/number/mark
(from CHAR_SET without join controls) and permit \u200C/\u200D between those
characters (e.g., allow join-control sequences between valid codepoints), and
apply the same change where CHAR_SET/WORD_CONTAIN_RE are reused (the other
occurrence around the second reference).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 88618db8-ffbe-4841-b1fb-4624d976f46c

📥 Commits

Reviewing files that changed from the base of the PR and between c6d91cb and 13acc79.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (7)
  • package.json
  • src/__tests__/interlinearizer.web-view.test.tsx
  • src/__tests__/parsers/papi/bookTokenizer.test.ts
  • src/__tests__/parsers/pt9/interlinearXmlParser.test.ts
  • src/interlinearizer.web-view.tsx
  • src/parsers/papi/bookTokenizer.ts
  • src/parsers/pt9/interlinearXmlParser.ts
✅ Files skipped from review due to trivial changes (1)
  • package.json
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/__tests__/parsers/papi/bookTokenizer.test.ts
  • src/parsers/pt9/interlinearXmlParser.ts
  • src/__tests__/parsers/pt9/interlinearXmlParser.test.ts

Comment thread src/__tests__/interlinearizer.web-view.test.tsx
Comment thread src/parsers/papi/bookTokenizer.ts
Contributor

@imnasnainaec imnasnainaec left a comment


@imnasnainaec partially reviewed 9 files and all commit messages, made 1 comment, and resolved 3 discussions.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on alex-rawlings-yyc).


src/parsers/pt9/interlinearXmlParser.ts line 170 at r27 (raw file):

Previously, alex-rawlings-yyc (Alex Rawlings) wrote…

Yes and no. I hadn't considered numbers like that, but PT9 shouldn't generate XML with numbers like that anyway, afaik. Do we want to account for edge cases like 1. and 2.0?

Not necessary until it is.

Contributor Author

@alex-rawlings-yyc alex-rawlings-yyc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alex-rawlings-yyc reviewed all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on alex-rawlings-yyc).

Contributor Author

@alex-rawlings-yyc alex-rawlings-yyc left a comment


@alex-rawlings-yyc reviewed 1 file.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on alex-rawlings-yyc).

Contributor

@jasonleenaylor jasonleenaylor left a comment


:lgtm:

@jasonleenaylor partially reviewed 4 files and made 1 comment.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on alex-rawlings-yyc).

@alex-rawlings-yyc alex-rawlings-yyc merged commit c238852 into main Apr 30, 2026
8 checks passed
@alex-rawlings-yyc alex-rawlings-yyc deleted the tokenize-book branch April 30, 2026 14:49


Development

Successfully merging this pull request may close these issues.

  • Scroll to selected PT reference
  • Display tokenized book
  • Tokenize book
  • Read in current PT source book and scroll to specific reference

3 participants