feat(web): implement tokenization convergence and associated unit test 🚂#15834

Open
jahorton wants to merge 1 commit into refactor/web/split-analyze-transition from feat/web/transition-tokenization-clustering
Conversation

@jahorton
Contributor

@jahorton jahorton commented Apr 9, 2026

Now that the SearchQuotientCluster type exists, and as we now have a way to directly test conditions in which multiple search paths converge in the quotient-graph perspective, it's a good time to actually implement the code needed to build SearchQuotientClusters during context transitions... and to add at least one unit test while we're at it.

Build-bot: skip build:web
Test-bot: skip

@keymanapp-test-bot

keymanapp-test-bot bot commented Apr 9, 2026

User Test Results

Test specification and instructions

User tests are not required

Test Artifacts

  • Web
    • KeymanWeb Test Home - build : all tests passed (no artifacts on BuildLevel "build")

@keymanapp-test-bot keymanapp-test-bot bot changed the title from "feat(web): implement tokenization convergence and associated unit test" to "feat(web): implement tokenization convergence and associated unit test 🚂" Apr 9, 2026
@keymanapp-test-bot keymanapp-test-bot bot added this to the A19S26 milestone Apr 9, 2026
@jahorton jahorton marked this pull request as ready for review April 9, 2026 21:06
Member

@mcdurdin mcdurdin left a comment

I am a little lost by some of the terminology, but otherwise LGTM

  finalizedTokens.push(bucket[0]);
} else {
  const constituentSpurs = bucket.flatMap((token) => {
    const quotientNode = token.searchModule;
Member

I don't really understand all the terminology here: searchModule doesn't seem to have anything to do with quotientNode?

Even this function refers to a lot of concepts which I am struggling to put together into a coherent model: tokens (vs tokenizations), buckets, nodes, spurs, [search?]modules, clusters, quotients.

Contributor Author

I probably should do a quality pass for consistency on many of the properties, class names, and such once all is said and done. For one, some things were put in place before #15161 was developed, and I haven't enforced the new terms in all of the pre-existing code.

(In regard to this specific case: quotient graphs are graphs based on "modules" of a more detailed graph, to visualize the higher-level patterns found within.)

Not aiming to dismiss your concerns here whatsoever; it's been a journey working out the terms as work proceeds, and I hope there will be sufficient time to "polish things up" in this regard before release.
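To make the quotient-graph idea concrete, here is a minimal sketch of the general graph-theory construction: each group ("module") of detailed nodes collapses into one node, and edges are kept only between different groups. The `Edge` type, the `quotientGraph` function, and the node names are hypothetical illustrations and are not taken from this PR's code.

```typescript
// Hypothetical types for illustration; not the PR's actual classes.
type Edge = [from: string, to: string];

/**
 * Builds the quotient graph of `edges` under `moduleOf`, which maps every
 * detailed node to the name of the module (group) that contains it.
 */
function quotientGraph(edges: Edge[], moduleOf: Map<string, string>): Edge[] {
  const seen = new Set<string>();
  const result: Edge[] = [];
  for (const [from, to] of edges) {
    const a = moduleOf.get(from);
    const b = moduleOf.get(to);
    if (a === undefined || b === undefined) {
      throw new Error(`Node without a module: ${a === undefined ? from : to}`);
    }
    // Edges inside one module vanish; parallel edges between modules merge.
    if (a === b) continue;
    const key = JSON.stringify([a, b]);
    if (!seen.has(key)) {
      seen.add(key);
      result.push([a, b]);
    }
  }
  return result;
}

// Five detailed nodes grouped into three modules.
const moduleOf = new Map<string, string>([
  ["a1", "A"], ["a2", "A"],
  ["b1", "B"], ["b2", "B"],
  ["c1", "C"],
]);
const detailed: Edge[] = [
  ["a1", "a2"], ["a1", "b1"], ["a2", "b2"], ["b1", "c1"], ["b2", "c1"],
];
const quotient = quotientGraph(detailed, moduleOf);
console.log(quotient);
```

In this toy graph, two distinct detailed routes lead from module A through module B to module C, but the quotient view shows only a single A to B to C chain. That is the sense in which several search paths can "converge" at the higher level.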

Comment on lines +138 to +139
* An error will be thrown if the instances do not sufficiently converge to the
* same tokenization pattern.
Member

I am not seeing any error?

Contributor Author

See SearchQuotientCluster's constructor. This comment is regarding the error that will be generated within that constructor for such cases.

Contributor Author

if(path.inputCount != inputCount || path.codepointLength != codepointLength) {
  throw new Error(`SearchQuotientNode does not share same properties as others in the cluster: inputCount ${path.inputCount} vs ${inputCount}, codepointLength ${path.codepointLength} vs ${codepointLength}`);
}
// If there's a source-range key mismatch - via mismatch in count or in actual ID, we have an error.
if(path.sourceRangeKey != sourceRangeKey) {
  throw new Error(`SearchQuotientNode does not share the same source identifiers as others in the cluster`);
}
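For readers following the "I am not seeing any error" thread, the sketch below shows how a function with no `throw` of its own can still surface an error raised by a constructor it calls. Only the property names `inputCount`, `codepointLength`, and `sourceRangeKey` come from the snippet above. `PathLike`, `ClusterSketch`, `mergeBucket`, and the assumption that `sourceRangeKey` is a string are hypothetical stand-ins, not the real `SearchQuotientCluster` implementation.

```typescript
// Hypothetical stand-in for the properties compared in the quoted checks.
// Assumption: sourceRangeKey is a string; the real type is not shown in the PR excerpt.
interface PathLike {
  inputCount: number;
  codepointLength: number;
  sourceRangeKey: string;
}

// Hypothetical cluster class; only the validation step is sketched here.
class ClusterSketch {
  readonly paths: readonly PathLike[];

  constructor(paths: PathLike[]) {
    if (paths.length === 0) {
      throw new Error("A cluster needs at least one path");
    }
    // Every path is compared against the first path's properties.
    const { inputCount, codepointLength, sourceRangeKey } = paths[0];
    for (const path of paths) {
      if (path.inputCount !== inputCount || path.codepointLength !== codepointLength) {
        throw new Error("Path does not share inputCount/codepointLength with the cluster");
      }
      if (path.sourceRangeKey !== sourceRangeKey) {
        throw new Error("Path does not share source identifiers with the cluster");
      }
    }
    this.paths = paths;
  }
}

// The caller contains no `throw`, yet constructor errors propagate through it.
function mergeBucket(bucket: PathLike[]): PathLike | ClusterSketch {
  return bucket.length === 1 ? bucket[0] : new ClusterSketch(bucket);
}

// Two paths that agree on all three properties form a cluster.
const matching = mergeBucket([
  { inputCount: 2, codepointLength: 3, sourceRangeKey: "0-1" },
  { inputCount: 2, codepointLength: 3, sourceRangeKey: "0-1" },
]);

// A mismatch in inputCount makes the constructor throw through mergeBucket.
let caught = "";
try {
  mergeBucket([
    { inputCount: 2, codepointLength: 3, sourceRangeKey: "0-1" },
    { inputCount: 1, codepointLength: 3, sourceRangeKey: "0-1" },
  ]);
} catch (err) {
  caught = (err as Error).message;
}
console.log(caught);
```

Under this arrangement, a doc comment on the calling function that says "an error will be thrown" is accurate even though the `throw` statements sit in the constructor. Naming the constructor in that doc comment may help reviewers find them.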

Comment on lines +138 to +139
* An error will be thrown if the instances do not sufficiently converge to the
* same tokenization pattern.
Member

I am not seeing any error thrown in this function
