Release v3.71.0: feat(v3.71): MEASURED ACCURACY — the metric that forbids self-deception · patsa2561-art/mneme-ai

v3.71.0
b1b5a5f

v3.71.0: feat(v3.71): MEASURED ACCURACY — the metric that forbids self-deception

patsa2561-art tagged this 09 Jun 08:12

User request: 'there must be a metric proving everything is maximally accurate, no lying.' Fair point:
deterministic does NOT mean accurate, and accuracy had never been measured.

core/src/accuracy benchmarks the suite's extractors against a LABELED corpus with known ground truth
plus deliberately tricky NEGATIVE cases (CREATE TABLE IF NOT EXISTS must not yield a table named 'IF';
an axios call is not an endpoint; a 'user' string with no DB access is not a data edge; console.log is
not a CALLS edge; a .css fetch is not a consumer; a guarded or non-sensitive route is not an authz gap;
a 2-writer table is not a keystone). The output is real precision/recall/F1 per dimension
(tables/endpoints/functions/data-edges/calls/consumers/authz-gaps/keystones), plus macro-F1 and
micro-precision, each checked against a committed floor.

The benchmark already CAUGHT a real issue (the keystone fixture), which was fixed honestly rather than
papered over. MEASURED: macro-F1 1.000, micro-precision 1.000 across 8 dimensions on the corpus.
accuracyGauntlet=100 (harness calibrated + suite clears floor), 5 tests.

Wiring: MCP tool mneme.accuracy.report (auto-bridged to matrix gRPC via buildToolMap); CLI command
mneme accuracy (exits with code 2 when below the floor); gateway in EN and Thai; boot.

HONEST: this is a reproducible lower bound on THIS visible corpus, and it is only as strong as the
corpus is representative. It is a number you can audit, not a boast. ALL gems are now wired into
CLI+MCP+matrix+gateway+boot.
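
The scoring described above can be sketched roughly as follows. This is an illustrative Python
sketch, not the code in core/src/accuracy: the names Score, score, report and exit_code, the floor
values, and the two-dimension toy corpus are all assumptions made for the example. It shows
per-dimension precision/recall/F1, macro-F1 (the unweighted mean of per-dimension F1),
micro-precision (pooled over all predictions), and an exit code of 2 when either aggregate falls
below the floor.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    tp: int  # predicted and in ground truth
    fp: int  # predicted but not in ground truth
    fn: int  # in ground truth but not predicted

    @property
    def precision(self) -> float:
        # Assumption: an extractor that predicts nothing has made no false claim.
        return self.tp / (self.tp + self.fp) if (self.tp + self.fp) else 1.0

    @property
    def recall(self) -> float:
        # Assumption: a pure negative case (empty ground truth) has nothing to miss.
        return self.tp / (self.tp + self.fn) if (self.tp + self.fn) else 1.0

    @property
    def f1(self) -> float:
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if (p + r) else 0.0


def score(predicted: set[str], expected: set[str]) -> Score:
    """Compare one dimension's extracted items with its labeled ground truth."""
    return Score(
        tp=len(predicted & expected),
        fp=len(predicted - expected),
        fn=len(expected - predicted),
    )


def report(predictions: dict[str, set[str]],
           truth: dict[str, set[str]],
           floor: dict[str, float]) -> dict:
    """Score every labeled dimension, then aggregate and compare with the floor."""
    per_dim = {dim: score(predictions.get(dim, set()), expected)
               for dim, expected in truth.items()}
    macro_f1 = sum(s.f1 for s in per_dim.values()) / len(per_dim)
    tp = sum(s.tp for s in per_dim.values())
    fp = sum(s.fp for s in per_dim.values())
    micro_precision = tp / (tp + fp) if (tp + fp) else 1.0
    passed = (macro_f1 >= floor["macro_f1"]
              and micro_precision >= floor["micro_precision"])
    return {"per_dimension": per_dim, "macro_f1": macro_f1,
            "micro_precision": micro_precision, "passed": passed}


def exit_code(result: dict) -> int:
    """Mirror the 'exit 2 below floor' convention from the release note."""
    return 0 if result["passed"] else 2


# Hypothetical toy corpus and floor, for illustration only.
TRUTH = {"tables": {"users", "orders"}, "endpoints": {"GET /users"}}
FLOOR = {"macro_f1": 0.95, "micro_precision": 0.95}

# A correct extractor, and a buggy one that reads 'IF' out of
# CREATE TABLE IF NOT EXISTS as if it were a table name.
clean = report({"tables": {"users", "orders"}, "endpoints": {"GET /users"}},
               TRUTH, FLOOR)
buggy = report({"tables": {"users", "orders", "IF"}, "endpoints": {"GET /users"}},
               TRUTH, FLOOR)
```

In this sketch the single false positive 'IF' lowers both aggregates enough to fail the
hypothetical floor, so exit_code(buggy) is 2 while exit_code(clean) is 0. Scoring a dimension whose
ground truth is empty as perfect when nothing is predicted is one possible way to let negative
cases reward correct abstention; the real harness may handle this differently.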
