feat: enhance state normalization and implement hard cap for counters#64
feat: enhance state normalization and implement hard cap for counters#64purushpsm147 merged 7 commits intomainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughRemoves probability quantization from MarkovGraph, adds a live-state prune guard and local-entropy calculation, introduces an optional stateNormalizer and bot-protection/counter-limit guards in IntentManager, extends route normalization to collapse numeric IDs, updates cross-tab counter sync cap, and expands tests and benchmark values. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant IntentManager
participant MarkovGraph
participant BroadcastSync
Client->>IntentManager: track(state)
IntentManager->>IntentManager: builtin normalize (UUID/ObjectID/slug)
IntentManager->>IntentManager: stateNormalizer? -> custom normalize (may return "")
alt normalizer returned empty
IntentManager-->>Client: drop (no further action)
else proceed
IntentManager->>IntentManager: check botProtection flag
alt session flagged as bot
IntentManager-->>MarkovGraph: skip graph update
else
IntentManager->>MarkovGraph: runGraphAndSignalStage(normalizedState)
MarkovGraph->>MarkovGraph: ensureState (prune if liveStates > 1.5× maxStates)
MarkovGraph-->>IntentManager: graph updated
IntentManager->>BroadcastSync: applyLocalCounter/update counters
BroadcastSync->>BroadcastSync: applyRemoteCounter (ignore new keys if >= 50)
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/core/src/core/markov.ts (1)
222-231:⚠️ Potential issue | 🟠 Major
normalizedEntropyForStatemixes local denominator with global-smoothed entropy numerator.With
smoothingAlpha > 0,entropyForState()includes global unseen-state mass, but Line 226 normalizes by local fan-out only. That can saturate normalized entropy to1for low-branching states as the global graph grows, causing false positives.💡 Suggested fix (keep numerator/denominator on the same local support)
- const supportSize = Math.max(2, row.toCounts.size); - if (supportSize <= 1) return 0; - - const entropy = this.entropyForState(state); + const supportSize = Math.max(2, row.toCounts.size); + + let entropy = 0; + if (this.smoothingAlpha === 0) { + row.toCounts.forEach((count) => { + const p = count / row.total; + if (p > 0) entropy -= p * Math.log(p); + }); + } else { + const denominator = row.total + this.smoothingAlpha * supportSize; + row.toCounts.forEach((count) => { + const p = (count + this.smoothingAlpha) / denominator; + if (p > 0) entropy -= p * Math.log(p); + }); + const unobserved = supportSize - row.toCounts.size; + if (unobserved > 0) { + const pUnobserved = this.smoothingAlpha / denominator; + if (pUnobserved > 0) entropy -= unobserved * pUnobserved * Math.log(pUnobserved); + } + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/core/src/core/markov.ts` around lines 222 - 231, The normalizedEntropyForState function currently divides entropyForState(state) (which includes global unseen-state mass when smoothingAlpha > 0) by maxEntropy computed from row.toCounts.size, mixing global-smoothed numerator with a local-only denominator; fix by computing the entropy over the same local support used for the denominator (i.e., compute entropy directly from row.toCounts probabilities or add an option to entropyForState to exclude global/unseen mass) so the numerator and denominator match; reference symbols: normalizedEntropyForState, entropyForState, smoothingAlpha, row.toCounts, supportSize/maxEntropy.
🧹 Nitpick comments (1)
packages/core/tests/unit-fast.test.mjs (1)
4969-5021: Add one regression test forstateNormalizerfailure paths.Given the new hook, include a test where
stateNormalizerthrows (and one returning non-string) to lock in non-crashing behavior andonErrorhandling.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/core/tests/unit-fast.test.mjs` around lines 4969 - 5021, Add regression tests for IntentManager's stateNormalizer failure paths: create two tests using IntentManager with a failing stateNormalizer (one that throws and one that returns a non-string) and verify manager.track(...) does not throw, the 'state_change' handler still receives a sane value (e.g., the original incoming state or a fallback normalized string) and that any provided onError handler (or lifecycleAdapter) is invoked; reference IntentManager, stateNormalizer, track, manager.on('state_change') and onError in the tests and assert both no crash and correct error handling behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/core/src/engine/intent-manager.ts`:
- Line 90: stateNormalizer is called without guards and can throw or return
non-strings, which can crash track(); wrap all invocations of stateNormalizer
(notably in the track() flow and the other usages around the referenced blocks)
in a try/catch, coerce its result to a string (e.g., String(result)) and treat
an empty string as a silent drop of the tracking event (do not treat it as a
validation error); on exception or non-string return default to null/undefined
to drop or continue safely and log/debug the error via the existing logger
instead of letting it bubble.
---
Outside diff comments:
In `@packages/core/src/core/markov.ts`:
- Around line 222-231: The normalizedEntropyForState function currently divides
entropyForState(state) (which includes global unseen-state mass when
smoothingAlpha > 0) by maxEntropy computed from row.toCounts.size, mixing
global-smoothed numerator with a local-only denominator; fix by computing the
entropy over the same local support used for the denominator (i.e., compute
entropy directly from row.toCounts probabilities or add an option to
entropyForState to exclude global/unseen mass) so the numerator and denominator
match; reference symbols: normalizedEntropyForState, entropyForState,
smoothingAlpha, row.toCounts, supportSize/maxEntropy.
---
Nitpick comments:
In `@packages/core/tests/unit-fast.test.mjs`:
- Around line 4969-5021: Add regression tests for IntentManager's
stateNormalizer failure paths: create two tests using IntentManager with a
failing stateNormalizer (one that throws and one that returns a non-string) and
verify manager.track(...) does not throw, the 'state_change' handler still
receives a sane value (e.g., the original incoming state or a fallback
normalized string) and that any provided onError handler (or lifecycleAdapter)
is invoked; reference IntentManager, stateNormalizer, track,
manager.on('state_change') and onError in the tests and assert both no crash and
correct error handling behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 61056748-fd1d-430f-86cc-32cbbe8538d6
📒 Files selected for processing (7)
packages/core/benchmarks/latest.jsonpackages/core/src/core/markov.tspackages/core/src/engine/intent-manager.tspackages/core/src/sync/broadcast-sync.tspackages/core/src/types/events.tspackages/core/src/utils/route-normalizer.tspackages/core/tests/unit-fast.test.mjs
There was a problem hiding this comment.
Pull request overview
This PR enhances the @passiveintent/core package with several improvements to route normalization, bot protection behavior, resource limits, and entropy calculation, along with code cleanup of unused quantized probability methods.
Changes:
- Built-in route normalization now replaces 4+ digit numeric path segments with
:id, and a newstateNormalizerconfig option lets consumers apply custom normalization after the built-in pass. - The Markov graph stops recording transitions when a session is flagged as a bot, and the normalized entropy formula uses local branching factor instead of global state count (fixing the "entropy crush" bug).
- Both
IntentManagerandBroadcastSyncenforce a hard cap of 50 unique counter keys (withLIMIT_EXCEEDEDerror code), andMarkovGraph.ensureStatetriggers a synchronous prune at 1.5× maxStates to prevent unbounded growth.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
packages/core/src/utils/route-normalizer.ts |
Adds NUMERIC_ID_RE regex for 4+ digit path segments, updates JSDoc |
packages/core/src/core/markov.ts |
Removes unused quantized probability methods, adds hard-cap guard in ensureState, changes normalized entropy to use local fan-out |
packages/core/src/engine/intent-manager.ts |
Adds stateNormalizer config support, bot guard in graph stage, counter hard cap with LIMIT_EXCEEDED error |
packages/core/src/sync/broadcast-sync.ts |
Adds counter hard cap (50 keys) in applyRemoteCounter |
packages/core/src/types/events.ts |
Adds LIMIT_EXCEEDED error code and stateNormalizer config option with documentation |
packages/core/tests/unit-fast.test.mjs |
Extensive new tests for numeric ID normalization, stateNormalizer, entropy crush fix, counter caps, MarkovGraph hard cap, updated bot protection tests |
packages/core/benchmarks/latest.json |
Updated benchmark snapshot reflecting performance improvements |
Comments suppressed due to low confidence (2)
packages/core/src/engine/intent-manager.ts:611
- The hard cap of 50 unique counter keys is used as a magic number in two places (
IntentManager.incrementCounterhere andBroadcastSync.applyRemoteCounter). If these caps ever need to diverge or change, having the value inline makes it easy to update one and forget the other. Consider extracting this into a named constant (e.g.MAX_COUNTER_KEYS = 50) shared between both files.
code: 'VALIDATION',
message: 'IntentManager.incrementCounter(): key must not be an empty string',
});
}
return 0;
}
if (!Number.isFinite(by)) {
if (this.onError) {
this.onError({
packages/core/src/engine/intent-manager.ts:377
- The early return when
botProtection && signalEngine.suspectedalso skips theif (ctx.isNewToBloom) { this.persistenceCoordinator.markDirty(); }block at lines 404-406. This means that new Bloom filter entries added during a bot session (viarunBloomStage, which runs unconditionally) will not trigger persistence, and could be lost on page refresh. If the intention is to not persist any bot data (including Bloom state), this is fine — but the test at line 394 ("Bloom filter still updates") only verifies in-memory state, not persistence. Consider adding a code comment here clarifying that skipping Bloom persistence for bots is intentional.
// Dwell-time measurement — delegated to DwellTimePolicy when enabled.
for (let i = 0; i < this.policies.length; i += 1) this.policies[i].onTrackContext?.(ctx);
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/core/tests/unit-fast.test.mjs (1)
417-421: Make this bot-suppression assertion less brittle to detector tuning.At Line 419,
totalTransitions < 10is tightly coupled to internal bot-threshold constants. This can fail on threshold tuning even when the intended behavior (“not all transitions recorded”) still holds.🧪 Suggested assertion tweak
assert.ok( - totalTransitions < 10, - `Expected graph to have < 10 transitions after bot guard; got ${totalTransitions}`, + totalTransitions < 29, + `Expected graph suppression after bot flag (fewer than full 29 transitions), got ${totalTransitions}`, );🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/core/tests/unit-fast.test.mjs` around lines 417 - 421, The current brittle assertion compares totalTransitions against a hard-coded 10; instead capture a baseline total before bot-suppression and assert that the post-guard totalTransitions is strictly less than that baseline (e.g., compute totalBefore = graphBefore.rows.reduce((sum, [, total]) => sum + total, 0) then assert totalTransitions < totalBefore). Update the test to compute and use totalBefore (or alternatively assert that at least one row in graph.rows has a total of 0) rather than relying on the magic number, referencing totalTransitions and graph.rows in the updated assertion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/core/src/engine/intent-manager.ts`:
- Around line 389-393: The early return when this.botProtection &&
this.signalEngine.suspected skips persisting Bloom-only additions from
runBloomStage; to fix, ensure Bloom updates are still recorded before returning
by invoking the Bloom persistence path (e.g., run the runBloomStage result
through the same dirty/flush mechanism) and marking the Bloom state dirty (call
the existing mark/flush helper such as markBloomDirty or persistBloomUpdates)
before returning; locate the check using this.botProtection and
this.signalEngine.suspected and change it so Bloom updates produced by
runBloomStage are persisted even when graph writes are skipped.
---
Nitpick comments:
In `@packages/core/tests/unit-fast.test.mjs`:
- Around line 417-421: The current brittle assertion compares totalTransitions
against a hard-coded 10; instead capture a baseline total before bot-suppression
and assert that the post-guard totalTransitions is strictly less than that
baseline (e.g., compute totalBefore = graphBefore.rows.reduce((sum, [, total])
=> sum + total, 0) then assert totalTransitions < totalBefore). Update the test
to compute and use totalBefore (or alternatively assert that at least one row in
graph.rows has a total of 0) rather than relying on the magic number,
referencing totalTransitions and graph.rows in the updated assertion.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 24effbeb-b2ee-4cd4-bed2-dddc72badf29
📒 Files selected for processing (2)
packages/core/src/engine/intent-manager.tspackages/core/tests/unit-fast.test.mjs
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/core/src/core/markov.ts`:
- Around line 222-225: entropy normalization is inconsistent:
entropyForState(state) uses global smoothing with k = stateToIndex.size while
maxEntropy is computed from local supportSize, causing scores to be inflated;
fix by making both numerator and denominator use the same basis — either (A)
compute maxEntropy using global k (Math.log(this.stateToIndex.size)) to match
entropyForState's smoothing, or (B) change entropyForState to
accept/supportSize/local k and compute smoothed entropy with that local
supportSize; update references to entropyForState(state) and the maxEntropy
calculation so both use the same symbol (either this.stateToIndex.size or
supportSize) consistently.
- Around line 100-107: The prune() call inside ensureState can remove or recycle
indices while a caller (incrementTransition) still holds a resolved fromState
index, causing writes to land on recycled slots; to fix, avoid synchronous
pruning during ensureState: move the hard-cap check out of ensureState (remove
or disable the prune() call there) and instead trigger pruning asynchronously or
under a global lock so ensureState / incrementTransition always operate on
stable indices; specifically update ensureState (and callers) to rely only on
stateToIndex/indexToState stability, relocate pruning logic elsewhere (e.g.,
background task or guarded pruneWithLock), and ensure functions
incrementTransition and any use of fromState validate or re-resolve indices if a
prune could have run.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 42f7ecb4-1903-4fc3-bf59-3b8988ddb186
📒 Files selected for processing (1)
packages/core/src/core/markov.ts
There was a problem hiding this comment.
♻️ Duplicate comments (1)
packages/core/src/engine/intent-manager.ts (1)
389-393:⚠️ Potential issue | 🟠 MajorPersist Bloom-only updates before bot short-circuit.
Line 392 returns before dirty-marking, so new Bloom bits added in
runBloomStageare not persisted once the session is suspected. This dropshasSeencontinuity after reload.💡 Proposed fix
private runGraphAndSignalStage = (ctx: TrackContext): void => { // Skip graph updates when the session is flagged as a bot. // This prevents automation / scrapers from poisoning the Markov // transition probabilities that drive real-user predictions. - if (this.botProtection && this.signalEngine.suspected) return; + if (this.botProtection && this.signalEngine.suspected) { + if (ctx.isNewToBloom) this.persistenceCoordinator.markDirty(); + return; + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/core/src/engine/intent-manager.ts` around lines 389 - 393, The early return on suspected bot sessions (the if checking this.botProtection && this.signalEngine.suspected) short-circuits before persisting Bloom-only updates added by runBloomStage; update the logic so runBloomStage (or whatever code that sets Bloom bits / hasSeen state) is executed and its resulting state is marked dirty/persisted before returning for bot sessions — i.e., call runBloomStage and perform the existing dirty-mark/persist step for Bloom updates, or move the dirty-marking code to run unconditionally before the short-circuit in the method that contains this.botProtection, signalEngine.suspected, and runBloomStage so Bloom bits survive reloads.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@packages/core/src/engine/intent-manager.ts`:
- Around line 389-393: The early return on suspected bot sessions (the if
checking this.botProtection && this.signalEngine.suspected) short-circuits
before persisting Bloom-only updates added by runBloomStage; update the logic so
runBloomStage (or whatever code that sets Bloom bits / hasSeen state) is
executed and its resulting state is marked dirty/persisted before returning for
bot sessions — i.e., call runBloomStage and perform the existing
dirty-mark/persist step for Bloom updates, or move the dirty-marking code to run
unconditionally before the short-circuit in the method that contains
this.botProtection, signalEngine.suspected, and runBloomStage so Bloom bits
survive reloads.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 190301c5-2a7b-449a-9abc-2216881431c8
📒 Files selected for processing (3)
packages/core/src/core/markov.tspackages/core/src/engine/intent-manager.tspackages/core/tests/unit-fast.test.mjs
…ain hasSeen() consistency
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| if (this.onError) { | ||
| this.onError({ | ||
| code: 'VALIDATION', | ||
| message: `IntentManager.incrementCounter(): 'by' must be a finite number, got ${by}`, | ||
| }); | ||
| } | ||
| return this.counters.get(key) ?? 0; | ||
| } |
There was a problem hiding this comment.
The counter hard cap of 50 is used as an inline magic number here and in broadcast-sync.ts:223. Given that constants.ts already defines similar thresholds (e.g., MIN_SAMPLE_TRANSITIONS, MAX_WINDOW_LENGTH), consider extracting a named constant like MAX_COUNTER_KEYS = 50 so the limit is defined in one place and the error message can reference the constant rather than a hard-coded literal.
…and memoryUsageEstimate; add property-based tests for normalizedEntropyForState
This pull request introduces several improvements and fixes to the core intent engine, with a focus on better route normalization, enhanced bot protection, stricter resource limits, and the removal of unused code. The most significant changes are grouped below by theme.
Route normalization improvements:
normalizeRouteStatenow replaces numeric path segments of 4 or more digits (e.g.,/user/12345) with:id, in addition to UUIDs and MongoDB ObjectIDs, to better generalize dynamic routes. This avoids polluting the Markov graph with unique states for user IDs, order numbers, etc. Extensive new tests cover this behavior. [1] [2] [3] [4] [5]IntentManagernow supports an optionalstateNormalizercallback in its config, allowing consumers to further customize or collapse route segments (e.g., for SEO slugs) after the built-in normalization. [1] [2] [3] [4]Bot protection and Markov graph updates:
Resource limits and error handling:
IntentManagerandBroadcastSyncnow enforce a hard cap of 50 unique counter keys to prevent unbounded memory growth. If the limit is exceeded, a newLIMIT_EXCEEDEDerror code is reported via the error handler. [1] [2] [3]Markov graph entropy calculation:
Code cleanup and performance:
quantizeProbability,dequantizeProbability,getQuantizedRow,getQuantizedProbability) have been removed fromMarkovGraphas they were unused, simplifying the codebase and reducing potential confusion. Related tests were updated accordingly. [1] [2] [3]Performance benchmark update:
latest.jsonbenchmark reflects improved performance and reduced memory usage, likely as a result of the above optimizations.## DescriptionType of change
Checklist
npm run lintandnpm run typechecklocallynpm run test:unit -w @passiveintent/core)Summary by CodeRabbit
New Features
Public API
Bug Fixes
Performance
Tests