
feat(plans): add Auto-Cleanup Architecture Plan for resolver record normalization#1664

Open
djstrong wants to merge 10 commits into `main` from `1061-enable-auto-cleanup-of-resolver-record-fields`

Conversation

@djstrong
Contributor

Lite PR

Tip: Review docs on the ENSNode PR process

Summary

  • What changed (1-3 bullets, no essays).

Why

  • Why this change exists. Link to related GitHub issues where relevant.

Testing

  • How this was tested.
  • If you didn't test it, say why.

Notes for Reviewer (Optional)

  • Anything non-obvious or worth a heads-up.

Pre-Review Checklist (Blocking)

  • This PR does not introduce significant changes and is low-risk to review quickly.
  • Relevant changesets are included (or are not required)

Copilot AI review requested due to automatic review settings February 23, 2026 15:17
@vercel
Contributor

vercel bot commented Feb 23, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| admin.ensnode.io | Ready | Preview, Comment | Mar 2, 2026 11:55pm |
| ensnode.io | Ready | Preview, Comment | Mar 2, 2026 11:55pm |
| ensrainbow.io | Ready | Preview, Comment | Mar 2, 2026 11:55pm |

@changeset-bot

changeset-bot bot commented Feb 23, 2026

⚠️ No Changeset found

Latest commit: e1e70f1

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


Contributor

Copilot AI left a comment


Pull request overview

Adds a new architecture plan document describing a phased “resolver record normalization” feature, including proposed in-memory normalization types/ops (Phase 1) and planned API integration parameters/response shape (Phase 3).

Changes:

  • Introduces a detailed spec for key/value normalization operations, record-set consolidation, and key expansion.
  • Defines an initial registry of recognized text record keys and per-key validation/normalization rules.
  • Outlines API query params and response metadata strategy for integration.


Comment on lines +167 to +169

```typescript
/** Key recognized, value attempted but could not be normalized — excluded from clean output */
| "UnnormalizableValue"
/** Another record already claimed this normalized key (lower priority variant) */
```

Copilot AI Feb 23, 2026


The spec marks UnnormalizableValue as "excluded from clean output", and later states records.texts always uses the stripped output when normalize=true. That effectively answers Open Question #3 (keys with no normalizable candidate will be omitted), but the question remains listed as unresolved. Either update the stripped-output behavior to match the desired semantics (omit vs null) or remove the question and document the chosen behavior explicitly.
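The omit-vs-null tension the reviewer describes can be made concrete with a small sketch. The types and function names below are illustrative only, not the plan's actual API:

```typescript
// Hypothetical, simplified shape of a normalized record (not the plan's real types).
type NormalizedRecord =
  | { op: "Normalized"; normalizedKey: string; normalizedValue: string }
  | { op: "UnnormalizableValue"; rawKey: string };

// Strategy A (current spec wording): unnormalizable records are excluded entirely.
function stripOmitting(records: NormalizedRecord[]): Record<string, string> {
  const out: Record<string, string> = {};
  for (const r of records) {
    if (r.op === "Normalized") out[r.normalizedKey] = r.normalizedValue;
  }
  return out;
}

// Strategy B (the alternative in Open Question #3): unnormalizable keys appear with null.
function stripWithNulls(records: NormalizedRecord[]): Record<string, string | null> {
  const out: Record<string, string | null> = {};
  for (const r of records) {
    if (r.op === "Normalized") out[r.normalizedKey] = r.normalizedValue;
    else out[r.rawKey] = null;
  }
  return out;
}
```

Whichever strategy the spec picks, documenting it explicitly (as the comment asks) removes the ambiguity.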

- Updated `url` type to allow null values when no URL can be derived.
- Improved clarity in `Unnormalizable` and `Unrecognized` operations to include handling of null values and reasons for normalization failures.
- Refined comments and invariants for better understanding of normalization logic.
@coderabbitai
Contributor

coderabbitai bot commented Feb 23, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Adds an Architecture Specification for Resolver Record Normalization: Phase 1 in-memory data models, a three-layer normalization pipeline, pre-resolution key expansion, resolver integration notes, Phase 3 API considerations, a nine-key transformation set, normalization invariants, algorithms, and open questions.

Changes

Cohort / File(s) Summary
Architecture Specification
.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md
New architecture plan (~554 lines) introducing Resolver Record Normalization: Phase 1 pure in-memory data model (TextRecordNormalizationDefs, TextRecordKeyDef), detailed per-key/per-value transformation types and results, three-layer normalization pipeline (per-record normalization, set-level winner selection, stripped output), pre-resolution key expansion rules, resolver integration (indexed vs RPC), Phase 3 API/response design (optional normalizationMetadata), initial nine-key transformation set, normalization invariants and two-pass set-level algorithm, and an open-questions list. Notes prototype code will be removed before implementation.
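The three layers named in this summary can be sketched roughly as follows. This is a toy illustration with made-up rules and names, not the plan's real normalization logic:

```typescript
type RawRecord = { rawKey: string; rawValue: string | null };

// Layer 1: per-record normalization (here, a toy rule: lowercase the key, trim the value).
function normalizeRecord(r: RawRecord): { key: string; value: string } | null {
  if (r.rawValue === null || r.rawValue.trim() === "") return null;
  return { key: r.rawKey.toLowerCase(), value: r.rawValue.trim() };
}

// Layer 2: set-level winner selection (here, first normalizable record per key wins).
// Layer 3: stripped output, a plain key/value map with all metadata removed.
function normalizeRecordSet(records: RawRecord[]): Record<string, string> {
  const out: Record<string, string> = {};
  for (const r of records) {
    const n = normalizeRecord(r); // layer 1
    if (n !== null && !(n.key in out)) out[n.key] = n.value; // layer 2
  }
  return out; // layer 3: already stripped in this toy version
}
```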

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant Client as Client
  participant Resolver as ResolverService
  participant Normalizer as Normalizer
  participant Registry as Registry
  participant Index as IndexStore

  Client->>Resolver: resolve(name, normalize=true)
  Resolver->>Registry: lookup(resolverPath)
  Registry-->>Resolver: resolverConfig
  Resolver->>Index: fetchRawRecords(resolverConfig)
  Index-->>Resolver: rawRecordSets
  Resolver->>Normalizer: normalize(rawRecordSets, normalizationDefs)
  Normalizer-->>Resolver: normalizedRecords (layer1->layer2->layer3)
  Resolver-->>Client: response { records, normalizationMetadata? }
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I hopped through specs with whiskers bright,
Nine keys shimmer in morning light.
Layers tidy, winners take their place,
Records settle, each in its space.
A rabbit’s cheer for normalized grace!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR description contains only the template structure with no substantive content filled in—all sections (Summary, Why, Testing, Notes) remain as template placeholders without actual implementation details, rationale, or testing information. | Fill in all required sections with concrete details: summarize the architecture plan changes, explain the business rationale and link to related issues, describe how the plan was validated, and note any important review considerations. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding an Architecture Plan document for resolver record normalization using the Auto-Cleanup approach. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 10

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md:
- Around line 226-231: The docstring for stripNormalizationMetadata currently
omits what to do when an UnrecognizedKeyAndValue record has a null rawValue;
update the spec and implementation to explicitly include such keys with a null
value (i.e., produce { [rawKey]: null }) rather than omitting them. Modify the
description for stripNormalizationMetadata (and any mentions of
NormalizedRecordSet / UnrecognizedKeyAndValue) to state this behavior, and
ensure the function logic that iterates NormalizedRecordSet (referencing
stripNormalizationMetadata and ValueNormalizationResult) adds entries for
unrecognized rawKey even when rawValue === null, consistent with the declared
return type Record<string, string | null>.
- Around line 155-157: Clarify that AlreadyNormalized is determined by comparing
the normalized output to the original raw input: update the sentence around the
`rawValue` string path so it explicitly says "AlreadyNormalized iff
`normalizedValue === rawValue` (i.e., `validate` + `normalize` produced a value
identical to the original `rawValue`); otherwise treat it as `Normalized` when
the normalized value differs; on failure return `Unnormalizable` with the
failing reason." Reference `ValueNormalizationResult`, `rawValue`,
`normalizedValue`, `validate`, `normalize`, `AlreadyNormalized`, `Normalized`,
and `Unnormalizable` to make the baseline unambiguous.
- Around line 272-277: Add a formal TypeScript type for
TextRecordNormalizationRegistry matching the prose (two maps) so the three
functions normalizeRecord, normalizeRecordSet, and expandNormalizedKeys have a
concrete type to reference; define it as an exported type named
TextRecordNormalizationRegistry with byNormalizedKey: Map<string,
TextRecordKeyDef> and byUnnormalizedKey: Map<string, TextRecordKeyDef>, place it
alongside the other type declarations (near TextRecordKeyDef), and update any
function signatures/imports to use this new type.
- Line 86: The spec currently conflicts when normalizeRecord produces an
Unnormalizable (e.g., rawValue === null or invalid) and the rawKey ===
normalizedKey rule that “wins unconditionally” — change the priority algorithm
to a two-pass ordering: first consider only records where validate+normalize
succeeds (i.e., not Unnormalizable), ordering those by rawKey === normalizedKey
first then by unnormalizedKeys order; only if no normalizable records exist
consider the Unnormalizable records, ordered by the same key-priority rule, so
that null/invalid values cannot beat a usable normalized value; update the logic
in normalizeRecord selection and any code paths that implement the old
unconditional rawKey === normalizedKey rule (references: normalizeRecord,
rawKey, normalizedKey, rawValue, Unnormalizable, validate+normalize,
unnormalizedKeys) to implement this two-pass behavior.
- Around line 485-488: Change the default for the `normalize` parameter from
true to false to avoid silently changing response shape for existing callers of
GET /records/:name; update the parameter table so `normalize` default is
`false`, keep `normalizationMetadata` documented as only meaningful when
`normalize=true`, and add a short migration note advising a rollout plan (e.g.,
versioned endpoint or opt-in header and deprecation window) so integrators can
explicitly opt in before normalized responses are enabled.
- Line 334: The Farcaster/Warpcast entry currently requires scheme-prefixed URLs
("https://warpcast.com/alice", "http://warpcast.com/alice") while Twitter and
Telegram accept bare-domain formats; update the Warpcast handling so the
Warpcast URL rule accepts bare-domain forms like "warpcast.com/alice" (in
addition to scheme-prefixed) by normalizing/validating inputs without a scheme
in the same way as Twitter/Telegram, or if intentional, add an explicit note in
the plan next to the "Warpcast URL" entry stating this is a deliberate design
decision and why.
- Line 522: Fix the malformed Markdown in open question `#4` by moving the
backtick so the bold+code label renders correctly: change the segment that
currently shows `**displayValue` to either ``**`displayValue`**`` or
`**displayValue**` (depending on whether you want code formatting inside bold)
and ensure `url` is formatted similarly; confirm the sentence references to
records.texts and normalizationMetadata remain plain text (no stray backticks)
so the question reads: "Should these enrichment fields be part of records.texts
(when normalize=true) or only inside normalizationMetadata?" and update the open
question label to correctly show "#4." with proper Markdown syntax.
- Around line 181-190: Replace the broad RecordNormalizationResult object type
with a discriminated union keyed on the op field (RecordNormalizationResult) so
consumers get compile-time guarantees: define one variant where op: "Normalized"
(include individual: IndividualRecordNormalizationResult plus normalizedKey,
normalizedValue, displayKey, displayValue, and url), and one or more variants
for other op values (e.g., op: "Unnormalizable" | "Unchanged") that only include
individual and the op literal; keep RecordNormalizationOp as the discriminant
source and update code to narrow by result.op (switch/if) when accessing
normalized* or display* fields.
- Line 316: The username validation regex currently used (the pattern starting
with ^[a-zA-Z0-9]([a-zA-Z0-9-]{0,37}[a-zA-Z0-9])?$) allows consecutive hyphens
like "alice--bob"; update the validation to disallow consecutive hyphens by
adding a negative lookahead for "--" (e.g., ensure the pattern starts with a
check that no two hyphens appear adjacently) while preserving the existing
length and start/end character rules so the "extracted username" validation
enforces 1–39 chars, alphanumeric and hyphens, no leading/trailing hyphen, and
no consecutive hyphens.
- Around line 354-360: Update the "New format" and "Legacy format" regexes:
replace the New format regex `^[a-z0-9_.]{2,32}$` with one that forbids
consecutive periods, e.g. `^(?!.*\.\.)[a-z0-9_.]{2,32}$` (reference: "New
format" line), and replace the Legacy format regex `^.{2,32}#[0-9]{4}$` with one
that restricts the username to printable characters and anchors properly, e.g.
`^[^\x00-\x1F\x7F]{2,32}#[0-9]{4}$` (reference: "Legacy format" line).
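The two-pass ordering proposed in the comments above can be sketched as follows. All types and helper names here are hypothetical, not the plan's actual API; the sketch assumes every variant rawKey appears in the registry's `unnormalizedKeys` list:

```typescript
type Candidate = {
  rawKey: string;
  normalizedKey: string;
  normalizedValue: string | null; // null means validate+normalize failed (Unnormalizable)
};

// rawKey === normalizedKey wins, then the order of unnormalizedKeys decides.
function keyPriority(c: Candidate, unnormalizedKeys: string[]): number {
  if (c.rawKey === c.normalizedKey) return -1;
  return unnormalizedKeys.indexOf(c.rawKey);
}

function pickWinner(
  candidates: Candidate[],
  unnormalizedKeys: string[],
): Candidate | undefined {
  const byPriority = (a: Candidate, b: Candidate) =>
    keyPriority(a, unnormalizedKeys) - keyPriority(b, unnormalizedKeys);
  // Pass 1: only normalizable candidates compete; a usable value always beats
  // an unnormalizable one, even if the latter has rawKey === normalizedKey.
  const normalizable = candidates.filter((c) => c.normalizedValue !== null);
  if (normalizable.length > 0) return [...normalizable].sort(byPriority)[0];
  // Pass 2: only unnormalizable candidates remain; same key-priority ordering.
  return [...candidates].sort(byPriority)[0];
}
```

The point of the two-pass split is visible in the first test case below: a null value on the canonical key no longer beats a valid value on a variant key.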

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9bffd55 and cd22b13.

📒 Files selected for processing (1)
  • .cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md

Comment on lines +485 to +488
| Parameter | Type | Default | Description |
| ----------------------- | ------- | ------- | -------------------------------------------------------------------------------------------------------------- |
| `normalize` | boolean | `true` | Normalize keys and values. If true, expand keys pre-resolution and run normalization pipeline post-resolution. |
| `normalizationMetadata` | boolean | `false` | Include the full `NormalizedRecordSet` metadata in the response. Only meaningful when `normalize=true`. |
Contributor


🧹 Nitpick | 🔵 Trivial

normalize=true default is a silent breaking change for existing API consumers.

Defaulting normalize to true means any existing caller of GET /records/:name that doesn't opt in or out explicitly will suddenly receive normalized keys and values — a different shape than what they were getting before. Consider defaulting to false for the initial rollout, or explicitly describe a versioning/migration strategy (e.g., a new endpoint version, a Content-Type negotiation header, or a deprecation window) so existing integrators aren't broken silently.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md around lines
485 - 488, Change the default for the `normalize` parameter from true to false
to avoid silently changing response shape for existing callers of GET
/records/:name; update the parameter table so `normalize` default is `false`,
keep `normalizationMetadata` documented as only meaningful when
`normalize=true`, and add a short migration note advising a rollout plan (e.g.,
versioned endpoint or opt-in header and deprecation window) so integrators can
explicitly opt in before normalized responses are enabled.

- Introduced `TextRecordNormalizationDefs` to streamline key lookups with two maps for normalized and unnormalized keys.
- Updated normalization functions to utilize the new definitions, improving clarity and efficiency.
- Enhanced comments and invariants to better explain normalization operations and priority rules for handling records.
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (2)
.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md (2)

507-510: normalize=true default is a silent breaking change for existing API consumers.

Existing callers of GET /records/:name who don't explicitly pass normalize will silently receive a different response shape (normalized keys, normalized values). Consider defaulting to false until a versioning or migration strategy (e.g., new endpoint version, opt-in header, or explicit deprecation window) is in place.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md around lines
507 - 510, The default of normalize=true causes a silent breaking change for
existing API consumers of GET /records/:name; change the default for the
`normalize` parameter to false (and keep `normalizationMetadata` behavior tied
to `normalize=true`) or alternatively implement an explicit opt-in (e.g., a new
API version, query param/opt-in header) and add deprecation notes; update any
validation/default logic that sets `normalize` (and related docs/comments) so
existing callers keep the current response shape unless they explicitly opt in
to normalization.

91-91: ⚠️ Potential issue | 🟡 Minor

Inaccurate invariant: "loses to any record with a non-null rawValue" contradicts the two-pass algorithm.

A non-null rawValue that fails validation is also Unnormalizable and therefore also falls into Pass 2 — competing on equal footing with a null-rawValue record, not beating it. The statement should say "always loses to any record with a normalizable value (op AlreadyNormalized or Normalized)."

📝 Suggested fix

```diff
-The key is still recognized; the record participates in the set-level priority contest but always loses to any record with a non-null rawValue.
+The key is still recognized; the record participates in the set-level priority contest but always loses to any record with a normalizable value (op `AlreadyNormalized` or `Normalized`). A non-null rawValue that fails validation is also Unnormalizable and competes in Pass 2 with null-rawValue records, not ahead of them.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md at line 91,
Update the inaccurate invariant text: change the claim that a null rawValue
record "always loses to any record with a non-null rawValue" to state it "always
loses to any record with a normalizable value (op `AlreadyNormalized` or
`Normalized`)". Reference the `normalizeRecord` behavior and the two-pass
algorithm (Pass 1 vs Pass 2) and clarify that non-null rawValues that validate
as `Unnormalizable` participate in Pass 2 and therefore compete equally with
null rawValue records (both `Unnormalizable`), rather than automatically losing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md:
- Around line 455-457: Add a brief note next to the email validation and
canonicalization lines explaining that lowercasing the full email (the
"Canonical form: lowercased email address." statement) is an intentional
deviation from RFC 5321 which treats the local-part as case-sensitive; reference
the validation pattern `^[^\s@]+@[^\s@]+\.[^\s@]+$` and state succinctly that
this normalization is pragmatic because virtually all real-world mail servers
treat the local-part as case-insensitive, but it may theoretically mismatch
strictly case-sensitive mailboxes.
- Around line 260-266: The function expandNormalizedKeys has an implicit
precondition that every entry in the normalizedKeys parameter must be a
canonical key present in defs; make this explicit and either enforce it: update
the expandNormalizedKeys docstring to state "Precondition: every element of
normalizedKeys must be a valid normalized key defined in defs (caller error
otherwise)", and add a runtime sanity check at the start of expandNormalizedKeys
that validates each normalizedKey against defs (collect invalid keys and throw a
clear TypeError listing them), or if you prefer no runtime cost, at minimum add
the docstring/JSdoc note and consider adding an assert helper; refer to the
expandNormalizedKeys function and its normalizedKeys/defs parameters when making
the change.
- Around line 399-401: The spec currently accepts bare-domain "t.me/alice" but
not the equivalent bare-domain "telegram.me/alice"; update the
accepted-inputs/validation for Telegram links so that "telegram.me/alice" (no
scheme) is treated the same as "t.me/alice". Locate the section listing "- t.me
URL: `https://t.me/alice`, `http://t.me/alice`, `t.me/alice`" and the line "-
telegram.me URL: `https://telegram.me/alice`" and modify the pattern/allowed
list or regex used there to include the bare-domain form "telegram.me/alice"
(and ensure the validation accepts optional http/https variants as already done
for t.me).
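The t.me/telegram.me parity asked for above can be sketched as a single pattern. This is illustrative only; the 5–32 character, alphanumeric-plus-underscore username range is an assumption about Telegram handles, not something taken from the plan:

```typescript
// Accepts t.me and telegram.me, with or without an http/https scheme,
// and extracts the username; returns null for anything else.
function extractTelegramUsername(raw: string): string | null {
  const m = raw.match(/^(?:https?:\/\/)?(?:t\.me|telegram\.me)\/([a-zA-Z0-9_]{5,32})$/);
  return m ? m[1] : null;
}
```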

---

Duplicate comments:
In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md:
- Around line 507-510: The default of normalize=true causes a silent breaking
change for existing API consumers of GET /records/:name; change the default for
the `normalize` parameter to false (and keep `normalizationMetadata` behavior
tied to `normalize=true`) or alternatively implement an explicit opt-in (e.g., a
new API version, query param/opt-in header) and add deprecation notes; update
any validation/default logic that sets `normalize` (and related docs/comments)
so existing callers keep the current response shape unless they explicitly opt
in to normalization.
- Line 91: Update the inaccurate invariant text: change the claim that a null
rawValue record "always loses to any record with a non-null rawValue" to state
it "always loses to any record with a normalizable value (op `AlreadyNormalized`
or `Normalized`)". Reference the `normalizeRecord` behavior and the two-pass
algorithm (Pass 1 vs Pass 2) and clarify that non-null rawValues that validate
as `Unnormalizable` participate in Pass 2 and therefore compete equally with
null rawValue records (both `Unnormalizable`), rather than automatically losing.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cd22b13 and c9b1d62.

📒 Files selected for processing (1)
  • .cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md

Comment on lines +455 to +457
**Validation**: must match `^[^\s@]+@[^\s@]+\.[^\s@]+$` (basic structural check; full RFC 5322 compliance is not required).

**Canonical form**: lowercased email address.
Contributor


⚠️ Potential issue | 🟡 Minor

Document the deliberate RFC deviation for email local-part lowercasing.

RFC 5321 treats the local part (before @) as case-sensitive. Lowercasing the full email address is a pragmatic normalization choice but could theoretically mismatch a server that distinguishes Alice@example.com from alice@example.com. The spec already disclaims full RFC 5322 compliance; add a brief note here that local-part case normalization is an intentional deviation and that real-world mail servers virtually universally treat it as case-insensitive.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md around lines
455 - 457, Add a brief note next to the email validation and canonicalization
lines explaining that lowercasing the full email (the "Canonical form:
lowercased email address." statement) is an intentional deviation from RFC 5321
which treats the local-part as case-sensitive; reference the validation pattern
`^[^\s@]+@[^\s@]+\.[^\s@]+$` and state succinctly that this normalization is
pragmatic because virtually all real-world mail servers treat the local-part as
case-insensitive, but it may theoretically mismatch strictly case-sensitive
mailboxes.
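A minimal sketch of the email rule as described here, using the validation regex quoted above. The full-address lowercasing is the deliberate RFC 5321 deviation under discussion:

```typescript
// Structural validation, then lowercasing of the entire address.
// Lowercasing the local part deviates from RFC 5321, which permits
// case-sensitive local parts; in practice mail servers treat it as
// case-insensitive, which is the tradeoff the review asks to document.
function normalizeEmail(raw: string): string | null {
  if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(raw)) return null;
  return raw.toLowerCase();
}
```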

- Updated `normalizeRecord` to return an `IndividualRecordNormalizationResult` for null values, clarifying its treatment in the normalization process.
- Refined the `normalizedRecords` and `unnormalizedRecords` types to ensure only relevant operation values are included.
- Enhanced comments to provide clearer explanations of the priority rules and preconditions for key normalization.
@djstrong djstrong marked this pull request as ready for review February 23, 2026 18:07
@djstrong djstrong requested a review from a team as a code owner February 23, 2026 18:07
Copilot AI review requested due to automatic review settings February 23, 2026 18:07
@djstrong djstrong marked this pull request as draft February 23, 2026 18:08
@djstrong
Copy link
Contributor Author

@lightwalker-eth The plan is ready for review. There are a few open questions:

  1. Normalized key for Twitter: com.twitter or com.x (reflecting the platform rebrand)? What unnormalized variants should be included?
  2. Parameter name for metadata field: normalizationMetadata, includeNormalizationMetadata, or another name?
  3. Unnormalizable behavior: When a key is recognized but no candidate produces a valid value, should records.texts contain null for that key, or should the key be omitted from the response entirely?
  4. displayValue and url placement: Should these enrichment fields be part of records.texts (when normalize=true) or only inside normalizationMetadata? Including them in the main response is more convenient for UI clients but changes the primary response shape significantly for all callers.
  5. Client requesting an unnormalized key directly: If a client passes texts=vnd.twitter (an unnormalized variant) instead of texts=com.twitter, should expandNormalizedKeys throw immediately (current spec — fail fast, caller error), or silently map it to the canonical key and expand from there (more forgiving for legacy integrations)? The issue does not address this case.
  6. Placement of UnrecognizedKeyAndValue records: The current spec places them in normalizedRecords (keyed by rawKey) so that stripNormalizationMetadata only needs to iterate one map to produce the full output. The issue's own description says normalizedRecords maps normalized keys and unnormalizedRecords holds "all records that were unnormalized for one reason or another" — which by that framing would put unrecognized records in unnormalizedRecords. Decision: should unnormalizedRecords mean "records that don't appear in output" (current spec) or "records that weren't normalized" (issue's framing)?
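Open question 5 can be illustrated with a sketch of both candidate behaviors. The maps, keys, and function names below are hypothetical examples, not the plan's actual registry:

```typescript
// Hypothetical registry fragment: canonical key -> all variants to resolve,
// and variant -> canonical key.
const byNormalizedKey = new Map<string, string[]>([
  ["com.twitter", ["com.twitter", "vnd.twitter", "twitter"]],
]);
const byUnnormalizedKey = new Map<string, string>([
  ["vnd.twitter", "com.twitter"],
  ["twitter", "com.twitter"],
]);

// Fail-fast (current spec): passing an unnormalized variant is a caller error.
function expandStrict(key: string): string[] {
  const variants = byNormalizedKey.get(key);
  if (variants === undefined) throw new TypeError(`not a normalized key: ${key}`);
  return variants;
}

// Forgiving alternative: silently map the variant to its canonical key first.
function expandForgiving(key: string): string[] {
  const canonical = byNormalizedKey.has(key) ? key : byUnnormalizedKey.get(key);
  if (canonical === undefined) throw new TypeError(`unrecognized key: ${key}`);
  return byNormalizedKey.get(canonical)!;
}
```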

@djstrong djstrong marked this pull request as ready for review February 25, 2026 12:18
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.



Contributor

@greptile-apps greptile-apps bot left a comment


1 file reviewed, 6 comments


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (6)
.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md (6)

513-516: ⚠️ Potential issue | 🟠 Major

Breaking default behavior change remains risky (normalize defaults to true).

Line 515 still defaults normalization on for all callers, which can silently change records.texts semantics for existing clients. Keep this opt-in for rollout safety.

Proposed change

```diff
-| `normalize`             | boolean | `true`  | Normalize keys and values. If true, expand keys pre-resolution and run normalization pipeline post-resolution. |
+| `normalize`             | boolean | `false` | Normalize keys and values. If true, expand keys pre-resolution and run normalization pipeline post-resolution. |
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md around lines
513 - 516, The plan currently sets the normalize parameter default to true which
can silently change existing clients' behavior (e.g., semantics of
records.texts); change the default for `normalize` to false (keep
`normalizationMetadata` default false) and update any related docs/examples
mentioning `normalize` or `NormalizedRecordSet` to state that normalization is
opt-in to avoid breaking existing callers. Ensure references to `normalize` in
the plan and API descriptions explicitly note the default is false and that
enabling it will expand keys pre-resolution and run the normalization pipeline
post-resolution.

550-550: ⚠️ Potential issue | 🟡 Minor

Malformed Markdown in open question #4 is still present.

Line 550 has broken backtick/bold ordering and renders incorrectly.

Proposed change

```diff
-4. `**displayValue` and `url` placement**: Should these enrichment fields be part of `records.texts` (when `normalize=true`) or only inside `normalizationMetadata`? Including them in the main response is more convenient for UI clients but changes the primary response shape significantly for all callers.
+4. **`displayValue` and `url` placement**: Should these enrichment fields be part of `records.texts` (when `normalize=true`) or only inside `normalizationMetadata`? Including them in the main response is more convenient for UI clients but changes the primary response shape significantly for all callers.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md at line 550,
Fix the malformed Markdown in the open question line that currently reads "4.
`**displayValue` and `url` placement**: ...": remove the mismatched
backtick/bold markers and reformat the phrase so inline code and bold are
correctly nested (for example use **displayValue and url placement** or
`displayValue` and `url` inside normal text), updating the sentence in the
paragraph that begins with "4." to render properly in Markdown; ensure the open
question wording (the text mentioning records.texts, normalize=true, and
normalizationMetadata) remains unchanged aside from correcting the backtick/bold
markup.

469-470: ⚠️ Potential issue | 🟡 Minor

Email lowercasing policy needs explicit RFC-deviation note.

Line 469 defines full-address lowercasing but does not document the local-part case-sensitivity tradeoff (RFC 5321). The behavior is reasonable, but it should be explicit in the spec.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md around lines
469 - 470, Update the "Canonical form: lowercased email address" section to
explicitly note this is a deliberate deviation from RFC 5321: state that
although the RFC allows local-part case-sensitivity, this system normalizes by
lowercasing the entire address (local-part and domain) for matching and storage,
and document the tradeoff and any user-visible implications (e.g., mailbox
uniqueness, authentication comparisons, and migration considerations). Reference
the canonical-form wording directly so readers see the connection and cite RFC
5321 in the note.

496-496: ⚠️ Potential issue | 🟡 Minor

IPFS url conversion remains underspecified for suffix components.

Line 496 does not state whether /path, ?query, and #fragment are preserved when mapping ipfs:// to gateway URLs. This can cause inconsistent client behavior.

Proposed wording
-**url**: for `https://`/`http://` — same as value; for `ipfs://` — convert to `https://ipfs.io/ipfs/{cid}`; for `eip155:` and `data:` URIs — `null` (requires off-chain resolution beyond this layer).
+**url**: for `https://`/`http://` — same as value; for `ipfs://` — convert to `https://ipfs.io/ipfs/{cid}` and preserve any subsequent path/query/fragment exactly; for `eip155:` and `data:` URIs — `null` (requires off-chain resolution beyond this layer).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md at line 496,
Clarify the ipfs:// → https://ipfs.io/ipfs/{cid} mapping by explicitly stating
that any /path, ?query and #fragment suffixes on the original ipfs:// URI must
be preserved and appended to the gateway URL (resulting in
https://ipfs.io/ipfs/{cid}{/path}{?query}{#fragment}); update the wording around
the "ipfs:// to https://ipfs.io/ipfs/{cid}" conversion so it unambiguously
instructs preservation of those components and how to handle percent-encoding or
empty components during the transformation.

394-394: ⚠️ Potential issue | 🟡 Minor

Discord legacy regex still permits # inside the username segment.

Line 394 can match alice#bob#1234 due to the current class. Exclude # from the username portion.

Proposed change
-- Legacy format: must match `^[^\x00-\x1F\x7F]{2,32}#[0-9]{4}$` (printable characters only before the `#` discriminator).
+- Legacy format: must match `^[^\x00-\x1F\x7F#]{2,32}#[0-9]{4}$` (printable, non-`#` characters only before the `#` discriminator).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md at line 394,
The legacy Discord username regex currently uses the class
`^[^\x00-\x1F\x7F]{2,32}#[0-9]{4}$`, which still allows `#` inside the username;
update the character class to explicitly exclude `#` (e.g. change
`[^\x00-\x1F\x7F]` to `[^\x00-\x1F\x7F#]`) so usernames like `alice#bob#1234` no
longer match while preserving the 2–32 printable-character constraint and the
`#[0-9]{4}` discriminator.

415-415: ⚠️ Potential issue | 🟡 Minor

Telegram username regex is too permissive.

Line 415 still allows leading/trailing/consecutive underscores (_alice, alice_, alice__bob), which should be rejected.

Proposed change
-**Validation**: extracted username must match `^[a-zA-Z0-9_]{5,32}$`.
+**Validation**: extracted username must match `^(?!_)(?!.*__)[A-Za-z0-9_]{5,32}(?<!_)$`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md at line 415,
The current Validation rule that checks the extracted username using a simple
"letters/digits/underscore, 5–32 chars" pattern allows leading, trailing, and
consecutive underscores; update the Validation for the extracted username to
enforce three constraints: total length 5–32, disallow a leading underscore,
disallow a trailing underscore, and disallow consecutive underscores while still
allowing only letters, digits and underscores; implement this by replacing the
existing permissive pattern on the "Validation" line with a regex that uses a
length check and negative lookarounds to forbid starting/ending underscores and
double underscores so usernames like "_alice", "alice_", and "alice__bob" are
rejected.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md:
- Around line 549-553: Summary: The spec leaves core response-shape decisions
ambiguous (null vs omit, enrichment placement, unrecognized-record placement,
handling unnormalized keys), while other sections (Layers 2/3, response-shape)
assume concrete behavior. Fix: update the plan so that for each item listed (1.
unnormalizable behavior for records.texts, 2. placement of displayValue and url
between records.texts vs normalizationMetadata, 3. behavior of
expandNormalizedKeys when given unnormalized keys like texts=vnd.twitter, 4.
placement semantics for UnrecognizedKeyAndValue between normalizedRecords and
unnormalizedRecords, and 5. validation regex verification) you either (A) choose
and document the canonical behavior explicitly in the earlier response-shape and
Layers 2/3 sections, referencing symbols records.texts, normalizationMetadata,
expandNormalizedKeys, normalizedRecords, unnormalizedRecords,
stripNormalizationMetadata, or (B) mark those earlier sections as
provisional/TBD with a clear non-normative flag and add a TODO linking to this
decision list; ensure the doc includes the verification task for
service-specific regexes as a required pre-implementation step.

---

Duplicate comments:
In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md:
- Around line 513-516: The plan currently sets the normalize parameter default
to true which can silently change existing clients' behavior (e.g., semantics of
records.texts); change the default for `normalize` to false (keep
`normalizationMetadata` default false) and update any related docs/examples
mentioning `normalize` or `NormalizedRecordSet` to state that normalization is
opt-in to avoid breaking existing callers. Ensure references to `normalize` in
the plan and API descriptions explicitly note the default is false and that
enabling it will expand keys pre-resolution and run the normalization pipeline
post-resolution.
- Line 550: Fix the malformed Markdown in the open question line that currently
reads "4. `**displayValue` and `url` placement**: ...": remove the mismatched
backtick/bold markers and reformat the phrase so inline code and bold are
correctly nested (for example use **displayValue and url placement** or
`displayValue` and `url` inside normal text), updating the sentence in the
paragraph that begins with "4." to render properly in Markdown; ensure the open
question wording (the text mentioning records.texts, normalize=true, and
normalizationMetadata) remains unchanged aside from correcting the backtick/bold
markup.
- Around line 469-470: Update the "Canonical form: lowercased email address"
section to explicitly note this is a deliberate deviation from RFC 5321: state
that although the RFC allows local-part case-sensitivity, this system normalizes
by lowercasing the entire address (local-part and domain) for matching and
storage, and document the tradeoff and any user-visible implications (e.g.,
mailbox uniqueness, authentication comparisons, and migration considerations).
Reference the canonical-form wording directly so readers see the connection and
cite RFC 5321 in the note.
- Line 496: Clarify the ipfs:// → https://ipfs.io/ipfs/{cid} mapping by
explicitly stating that any /path, ?query and #fragment suffixes on the original
ipfs:// URI must be preserved and appended to the gateway URL (resulting in
https://ipfs.io/ipfs/{cid}{/path}{?query}{#fragment}); update the wording around
the "ipfs:// to https://ipfs.io/ipfs/{cid}" conversion so it unambiguously
instructs preservation of those components and how to handle percent-encoding or
empty components during the transformation.
- Line 394: The legacy Discord username regex currently uses the class
`^[^\x00-\x1F\x7F]{2,32}#[0-9]{4}$`, which still allows `#` inside the username;
update the character class to explicitly exclude `#` (e.g. change
`[^\x00-\x1F\x7F]` to `[^\x00-\x1F\x7F#]`) so usernames like `alice#bob#1234` no
longer match while preserving the 2–32 printable-character constraint and the
`#[0-9]{4}` discriminator.
- Line 415: The current Validation rule that checks the extracted username using
a simple "letters/digits/underscore, 5–32 chars" pattern allows leading,
trailing, and consecutive underscores; update the Validation for the extracted
username to enforce three constraints: total length 5–32, disallow a leading
underscore, disallow a trailing underscore, and disallow consecutive underscores
while still allowing only letters, digits and underscores; implement this by
replacing the existing permissive pattern on the "Validation" line with a regex
that uses a length check and negative lookarounds to forbid starting/ending
underscores and double underscores so usernames like "_alice", "alice_", and
"alice__bob" are rejected.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 463048d and c3c1572.

📒 Files selected for processing (1)
  • .cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (5)
.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md (5)

496-496: ⚠️ Potential issue | 🟡 Minor

IPFS url() conversion is silent on path/query/fragment after the CID.

ipfs://bafybei.../metadata.json → should it produce https://ipfs.io/ipfs/bafybei.../metadata.json (path preserved) or https://ipfs.io/ipfs/bafybei... (path dropped)? The current spec is ambiguous. Implementors need an explicit rule.

📝 Proposed clarification
-for `ipfs://` — convert to `https://ipfs.io/ipfs/{cid}`
+for `ipfs://` — convert to `https://ipfs.io/ipfs/{cid}` preserving any path, query, and fragment components from the original URI (e.g. `ipfs://bafybei.../metadata.json` → `https://ipfs.io/ipfs/bafybei.../metadata.json`); percent-encoding is preserved as-is.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md at line 496,
Clarify the IPFS url() conversion rule: update the specification for the IPFS
branch (the description that currently reads "for `ipfs://` — convert to
`https://ipfs.io/ipfs/{cid}`") to explicitly preserve the entire hierarchical
suffix (path), query string, and fragment — e.g.,
`ipfs://{cid}/path/file?query=1#frag` ->
`https://ipfs.io/ipfs/{cid}/path/file?query=1#frag`; reference the `url()`
conversion rule and the `ipfs://` handling text so implementors know to append
the original path/query/fragment after the CID rather than dropping them.

467-469: ⚠️ Potential issue | 🟡 Minor

Email local-part lowercasing deviates from RFC 5321 — add a note.

RFC 5321 §2.4 treats the local part (before @) as case-sensitive; lowercasing the full address is a pragmatic decision that could theoretically mismatch a server that honours case in the local part. The spec disclaims full RFC 5322 compliance for parsing but does not document this normalization choice as an intentional deviation.

📝 Proposed addition
 **Canonical form**: lowercased email address.
+> **Note**: lowercasing the local part (before `@`) is an intentional deviation from RFC 5321, which treats it as case-sensitive. This is pragmatic because virtually all real-world mail servers are case-insensitive for the local part, but it may theoretically mismatch a strictly case-sensitive mailbox.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md around lines
467 - 469, Add a short note under the "Canonical form: lowercased email
address." section clarifying that lowercasing the full address intentionally
deviates from RFC 5321's treatment of the local-part as potentially
case-sensitive; reference RFC 5321 §2.4, explain the pragmatic rationale
(compatibility and normalization benefits) and warn about the small risk of
mismatch with servers that honor local-part case, and mark this as an explicit
design decision tied to the "Validation" and "Canonical form" statements.

550-550: ⚠️ Potential issue | 🟡 Minor

Malformed markdown in open question #4 — stray backtick before bold marker.

The current text starts with `**displayValue, which opens a code span rather than bold. The intended rendering is bold+code for both field names.

📝 Fix
-4. `**displayValue` and `url` placement`: Should these enrichment fields...
+4. **`displayValue` and `url` placement**: Should these enrichment fields...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md at line 550,
The markdown has a stray backtick before the bold marker in the open question so
"`**displayValue`" renders as a code span instead of bold+code for the field
names; fix the sentence to render both field names as bold+code (e.g.,
**`displayValue`** and **`url`**) and ensure the surrounding text references the
correct symbols (`records.texts`, `normalizationMetadata`, and `normalize=true`)
so the question reads clearly about whether these enrichment fields should be in
records.texts (when normalize=true) or only in normalizationMetadata.

513-516: ⚠️ Potential issue | 🟠 Major

normalize=true default silently breaks existing API consumers.

Any existing caller of GET /records/:name that doesn't explicitly pass normalize=false will immediately receive a different response shape (normalized keys, normalized values) after deployment — with no opt-in window. The safe default for any additive API feature is false, with a documented migration path (e.g. versioned endpoint, deprecation window, or Content-Type negotiation).

📝 Proposed fix
-| `normalize`             | boolean | `true`  | Normalize keys and values. If true, expand keys pre-resolution and run normalization pipeline post-resolution. |
+| `normalize`             | boolean | `false` | Normalize keys and values. If true, expand keys pre-resolution and run normalization pipeline post-resolution. |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md around lines
513 - 516, The default for the new normalization feature is unsafe — change the
`normalize` parameter default from true to false in the API docs/table and
update the GET /records/:name implementation to respect that default (ensure
code paths in the request handler that reference `normalize` and
`normalizationMetadata` treat absent/undefined as false), keep
`normalizationMetadata` behavior gated on `normalize===true`, and add a short
migration note/changelog entry describing how consumers can opt in (pass
normalize=true or use the new versioned endpoint) and that NormalizedRecordSet
will only appear when normalize is true.
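The opt-in gating described above amounts to treating an absent param as false at the handler boundary. A sketch under the assumption that query params arrive as strings (`read_normalize_param` is a hypothetical helper, not part of the plan's API):

```python
def read_normalize_param(query: dict[str, str]) -> bool:
    """Opt-in parsing for the `normalize` query param.

    An absent or undefined value is treated as False, so existing
    callers of GET /records/:name keep the original response shape;
    only explicit normalize=true runs the normalization pipeline.
    """
    return query.get("normalize", "false").strip().lower() == "true"
```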

415-415: ⚠️ Potential issue | 🟡 Minor

Telegram regex still permits leading/trailing and consecutive underscores.

^[a-zA-Z0-9_]{5,32}$ accepts _alice, alice_, and alice__bob. Telegram Desktop bug #569 confirms that usernames like @abcde_ or @_abcde are invalid. The fix proposed in prior review rounds has not yet been applied.

🐛 Proposed fix
-**Validation**: extracted username must match `^[a-zA-Z0-9_]{5,32}$`.
+**Validation**: extracted username must match `^(?!_)(?!.*__)[A-Za-z0-9_]{5,32}(?<!_)$` (5–32 chars, no leading/trailing underscore, no consecutive underscores).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md at line 415,
The current validation pattern `^[a-zA-Z0-9_]{5,32}$` (the "extracted username
must match" rule) allows leading, trailing and consecutive underscores; update
that validation to disallow usernames that start or end with '_' and to forbid
consecutive '__' while still enforcing the 5–32 length and allowed characters.
Replace the existing regex/pattern with one that (1) asserts total length 5–32,
(2) rejects any occurrence of "__", (3) rejects a leading "_" and (4) rejects a
trailing "_" so usernames like `_alice`, `alice_`, and `alice__bob` are invalid.
Ensure the new pattern is used wherever the original `^[a-zA-Z0-9_]{5,32}$` rule
is referenced.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md:
- Around line 150-165: The spec incorrectly places set-level deduplication
behavior inside the per-record function description for normalizeRecord (which
returns IndividualRecordNormalizationResult and has no access to set state);
remove the paragraph beginning "At the set level, the first record with a given
unrecognized `rawKey`..." from the normalizeRecord logic and instead insert that
entire sentence into the normalizeRecordSet section (alongside the existing
two-pass algorithm) so that deduplication is described where the full candidate
list and set-level decisions are available.
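The set-level deduplication being moved into normalizeRecordSet can be sketched as a single pass over the candidate list. This is an illustrative sketch only: the function name, record shape, and field names (`rawKey`, `op`, `kind`) are drawn from the review text, not a confirmed implementation.

```python
def dedupe_unrecognized(records: list[dict]) -> dict:
    """Set-level pass: the first record with a given unrecognized rawKey
    is kept as UnrecognizedKeyAndValue; any later record with the same
    rawKey is excluded with op=DuplicateNormalizedKey, mirroring how
    recognized-key duplicates are handled."""
    kept: dict[str, dict] = {}
    excluded: list[dict] = []
    for rec in records:
        key = rec["rawKey"]
        if key in kept:
            excluded.append({**rec, "op": "DuplicateNormalizedKey"})
        else:
            kept[key] = {**rec, "kind": "UnrecognizedKeyAndValue"}
    return {"records": kept, "excludedRecords": excluded}
```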

---

Duplicate comments:
In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md:
- Line 496: Clarify the IPFS url() conversion rule: update the specification for
the IPFS branch (the description that currently reads "for `ipfs://` — convert
to `https://ipfs.io/ipfs/{cid}`") to explicitly preserve the entire hierarchical
suffix (path), query string, and fragment — e.g.,
`ipfs://{cid}/path/file?query=1#frag` ->
`https://ipfs.io/ipfs/{cid}/path/file?query=1#frag`; reference the `url()`
conversion rule and the `ipfs://` handling text so implementors know to append
the original path/query/fragment after the CID rather than dropping them.
- Around line 467-469: Add a short note under the "Canonical form: lowercased
email address." section clarifying that lowercasing the full address
intentionally deviates from RFC 5321's treatment of the local-part as
potentially case-sensitive; reference RFC 5321 §2.4, explain the pragmatic
rationale (compatibility and normalization benefits) and warn about the small
risk of mismatch with servers that honor local-part case, and mark this as an
explicit design decision tied to the "Validation" and "Canonical form"
statements.
- Line 550: The markdown has a stray backtick before the bold marker in the open
question so "`**displayValue`" renders as a code span instead of bold+code for
the field names; fix the sentence to render both field names as bold+code (e.g.,
**`displayValue`** and **`url`**) and ensure the surrounding text references the
correct symbols (`records.texts`, `normalizationMetadata`, and `normalize=true`)
so the question reads clearly about whether these enrichment fields should be in
records.texts (when normalize=true) or only in normalizationMetadata.
- Around line 513-516: The default for the new normalization feature is unsafe —
change the `normalize` parameter default from true to false in the API
docs/table and update the GET /records/:name implementation to respect that
default (ensure code paths in the request handler that reference `normalize` and
`normalizationMetadata` treat absent/undefined as false), keep
`normalizationMetadata` behavior gated on `normalize===true`, and add a short
migration note/changelog entry describing how consumers can opt in (pass
normalize=true or use the new versioned endpoint) and that NormalizedRecordSet
will only appear when normalize is true.
- Line 415: The current validation pattern `^[a-zA-Z0-9_]{5,32}$` (the
"extracted username must match" rule) allows leading, trailing and consecutive
underscores; update that validation to disallow usernames that start or end with
'_' and to forbid consecutive '__' while still enforcing the 5–32 length and
allowed characters. Replace the existing regex/pattern with one that (1) asserts
total length 5–32, (2) rejects any occurrence of "__", (3) rejects a leading "_"
and (4) rejects a trailing "_" so usernames like `_alice`, `alice_`, and
`alice__bob` are invalid. Ensure the new pattern is used wherever the original
`^[a-zA-Z0-9_]{5,32}$` rule is referenced.


@lightwalker-eth
Copy link
Member

@djstrong Thanks for your questions 😄

  1. Suggest we make it com.x reflecting the platform rebrand.
  2. Not sure I understand this question. You're asking about "parameter names"? Are you referring to the name of a param for an API? We shouldn't be working on any APIs now in relation to this goal. We should only be working on this goal as a purely in-memory library.
  3. I'm struggling to answer this question because in my mind the answer depends on which step in the data flow you're asking about. Probably best to set up a call to clarify more efficiently?
  4. I think it would be good to expose these values so that you don't still need the full metadata to access them.
  5. Clients requesting keys should be out of scope for our goals now. Our goals now aren't about requesting keys; they're about taking the raw result from some resolution operation and then normalizing it.
  6. I think it creates problems if we name a field normalizedRecords because then it creates confusion if it contains both normalized records AND unrecognized records. Therefore, better to just call it something like records and have it include both normalized and unrecognized records.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (4)
.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md (4)

167-167: ⚠️ Potential issue | 🟡 Minor

Move set-level deduplication rules out of normalizeRecord section.

Line 167 mixes set-level behavior into a per-record function contract. This makes the normalizeRecord spec misleading because duplicate detection requires set context and belongs in normalizeRecordSet.

✂️ Suggested doc move
-- If not found: `keyResult` is Unrecognized; `valueResult` is Unrecognized. At the set level, the first record with a given unrecognized `rawKey` is written into `records` as `UnrecognizedKeyAndValue`; any subsequent record with the same `rawKey` gets `op: "DuplicateNormalizedKey"` and goes into `excludedRecords` (consistent with how recognized-key duplicates are handled).
+- If not found: `keyResult` is Unrecognized; `valueResult` is Unrecognized.

And add that paragraph under Section 1.5 (normalizeRecordSet) instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md at line 167,
The paragraph describing set-level deduplication currently placed under
normalizeRecord should be moved into the normalizeRecordSet section because
duplicate detection requires set context; update the spec by removing the
sentence beginning "If not found: `keyResult` is Unrecognized..." from the
normalizeRecord contract and add it (verbatim) to the normalizeRecordSet section
(Section 1.5), clarifying that at the set level the first record with an
unrecognized `rawKey` is written as `UnrecognizedKeyAndValue` and subsequent
records with the same `rawKey` are marked `op: "DuplicateNormalizedKey"` and
placed into `excludedRecords`.

531-534: ⚠️ Potential issue | 🟠 Major

normalize=true default is a breaking-response-shape default.

Line 533 defaults normalization on, which changes records.texts behavior for existing callers that omit the param. This should be opt-in first or explicitly versioned/migrated.

🔁 Safer rollout diff
-| `normalize`             | boolean | `true`  | Normalize keys and values. If true, expand keys pre-resolution and run normalization pipeline post-resolution. |
+| `normalize`             | boolean | `false` | Normalize keys and values. If true, expand keys pre-resolution and run normalization pipeline post-resolution. |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md around lines
531 - 534, The defaulting of normalize to true is a breaking change because it
alters response shape (e.g., records.texts) for callers that omit the param;
change the default for the parameter named `normalize` to false (or make it
required/explicit) and keep `normalizationMetadata` behavior tied to
`normalize=true`, or introduce a new version flag to opt-in normalization;
update any documentation and the `NormalizedRecordSet`-producing code paths to
only run the expansion/normalization pipeline when `normalize` is explicitly
true so existing callers keep the original `records.texts` shape.

412-412: ⚠️ Potential issue | 🟡 Minor

Discord legacy regex still allows # inside username segment.

Line 412 pattern allows cases like alice#bob#1234. Exclude # from the username character class.

✅ Regex correction
-- Legacy format: must match `^[^\x00-\x1F\x7F]{2,32}#[0-9]{4}$` (printable characters only before the `#` discriminator).
+- Legacy format: must match `^[^\x00-\x1F\x7F#]{2,32}#[0-9]{4}$` (printable, non-`#` characters before the `#` discriminator).
#!/bin/bash
python3 - << 'PY'
import re
old = re.compile(r'^[^\x00-\x1F\x7F]{2,32}#[0-9]{4}$')
new = re.compile(r'^[^\x00-\x1F\x7F#]{2,32}#[0-9]{4}$')
samples = ["alice#1234", "alice#bob#1234"]
for s in samples:
    print(s, "old=", bool(old.fullmatch(s)), "new=", bool(new.fullmatch(s)))
PY

Expected: alice#bob#1234 should be rejected by the new regex.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md at line 412,
The legacy Discord username regex currently allows '#' inside the username
portion; update the pattern string '^[^\x00-\x1F\x7F]{2,32}#[0-9]{4}$' to
explicitly exclude '#' from the username character class by changing it to
'^[^\x00-\x1F\x7F#]{2,32}#[0-9]{4}$' so inputs like "alice#bob#1234" are
rejected.

433-433: ⚠️ Potential issue | 🟡 Minor

Telegram username regex is too permissive for underscore placement.

Line 433 allows _alice, alice_, and alice__bob, which are invalid by Telegram username constraints.

✅ Regex hardening
-**Validation**: extracted username must match `^[a-zA-Z0-9_]{5,32}$`.
+**Validation**: extracted username must match `^(?!_)(?!.*__)[A-Za-z0-9_]{5,32}(?<!_)$` (5–32 chars, no leading/trailing underscore, no consecutive underscores).
#!/bin/bash
python3 - << 'PY'
import re
old = re.compile(r'^[a-zA-Z0-9_]{5,32}$')
new = re.compile(r'^(?!_)(?!.*__)[A-Za-z0-9_]{5,32}(?<!_)$')
samples = ["alice", "_alice", "alice_", "alice__bob", "alice_bob"]
for s in samples:
    print(s, "old=", bool(old.fullmatch(s)), "new=", bool(new.fullmatch(s)))
PY

Expected: _alice, alice_, alice__bob rejected by new regex; valid usernames still accepted.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md at line 433,
The current username validation uses the permissive regex
`^[a-zA-Z0-9_]{5,32}$`; replace it with a stricter pattern that forbids a
leading underscore, trailing underscore, and consecutive underscores (so
usernames like `_alice`, `alice_`, and `alice__bob` are rejected) — e.g., use a
pattern that asserts not starting with `_`, not ending with `_`, and no `__`
inside while preserving the 5–32 allowed chars constraint; update the validation
location where the `^[a-zA-Z0-9_]{5,32}$` pattern is defined or referenced.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md:
- Line 296: The example uses the wrong canonical key; update the sentence so the
illustrative pair matches the spec's canonical form: replace references to
com.twitter/vnd.twitter with com.x/vnd.x in the precondition text that mentions
normalizedKeys and the defs.byUnnormalizedKey / defs.byNormalizedKey check so
callers and tests are consistent with the spec.

---

Duplicate comments:
In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md:
- Line 167: The paragraph describing set-level deduplication currently placed
under normalizeRecord should be moved into the normalizeRecordSet section
because duplicate detection requires set context; update the spec by removing
the sentence beginning "If not found: `keyResult` is Unrecognized..." from the
normalizeRecord contract and add it (verbatim) to the normalizeRecordSet section
(Section 1.5), clarifying that at the set level the first record with an
unrecognized `rawKey` is written as `UnrecognizedKeyAndValue` and subsequent
records with the same `rawKey` are marked `op: "DuplicateNormalizedKey"` and
placed into `excludedRecords`.
- Around line 531-534: The defaulting of normalize to true is a breaking change
because it alters response shape (e.g., records.texts) for callers that omit the
param; change the default for the parameter named `normalize` to false (or make
it required/explicit) and keep `normalizationMetadata` behavior tied to
`normalize=true`, or introduce a new version flag to opt-in normalization;
update any documentation and the `NormalizedRecordSet`-producing code paths to
only run the expansion/normalization pipeline when `normalize` is explicitly
true so existing callers keep the original `records.texts` shape.
- Line 412: The legacy Discord username regex currently allows '#' inside the
username portion; update the pattern string '^[^\x00-\x1F\x7F]{2,32}#[0-9]{4}$'
to explicitly exclude '#' from the username character class by changing it to
'^[^\x00-\x1F\x7F#]{2,32}#[0-9]{4}$' so inputs like "alice#bob#1234" are
rejected.
- Line 433: The current username validation uses the permissive regex `^[a-zA-Z0-9_]{5,32}$`. Replace it with a stricter pattern that forbids a leading underscore, a trailing underscore, and consecutive underscores (so usernames like `_alice`, `alice_`, and `alice__bob` are rejected) while preserving the 5-32 allowed-character constraint, and update the location where the `^[a-zA-Z0-9_]{5,32}$` pattern is defined or referenced.
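
A minimal sketch of the two tightened patterns from the comments above (the function names are assumptions; only the regexes themselves come from this review):

```typescript
// Legacy Discord tag: excluding '#' from the username character class means
// "alice#bob#1234" no longer matches, while "alice#1234" still does.
const LEGACY_DISCORD_TAG = /^[^\x00-\x1F\x7F#]{2,32}#[0-9]{4}$/;

// Username: the lookarounds forbid a leading '_', a trailing '_', and any
// '__' run, while keeping the original 5-32 [a-zA-Z0-9_] constraint.
const USERNAME = /^(?!_)(?!.*__)[a-zA-Z0-9_]{5,32}(?<!_)$/;

function isLegacyDiscordTag(value: string): boolean {
  return LEGACY_DISCORD_TAG.test(value);
}

function isValidUsername(value: string): boolean {
  return USERNAME.test(value);
}
```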

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c3c1572 and 67f17c4.

📒 Files selected for processing (1)
  • .cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.



Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.



Comment on lines +512 to +515
- HTTPS/HTTP URL: `https://example.com/avatar.png`
- IPFS URI: `ipfs://Qm...`, `ipfs://bafy...`
- EIP-155 NFT reference: `eip155:1/erc721:0x.../1`, `eip155:1/erc1155:0x.../1`
- Data URI: `data:image/png;base64,...`
Copilot AI Mar 2, 2026

Avatar IPFS handling is underspecified: the accepted input formats list only CID-only forms (e.g. ipfs://Qm...), but does not clarify whether ipfs://<cid>/<path> is supported. Please explicitly specify the supported ipfs:// URI shapes (CID-only vs CID+path) so implementors handle the same cases consistently.

Copilot uses AI. Check for mistakes.
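
The four accepted input shapes quoted above can be sketched as a classifier (the function name and the exact `eip155:` pattern are assumptions; the plan only lists example values, and real validation of CIDs, chain ids, and token ids is out of scope here):

```typescript
type AvatarKind = "https" | "ipfs" | "eip155-nft" | "data-uri" | null;

// Rough shape checks only, mirroring the accepted-formats list above.
function classifyAvatar(value: string): AvatarKind {
  if (/^https?:\/\//.test(value)) return "https";
  if (/^ipfs:\/\//.test(value)) return "ipfs";
  if (/^eip155:[0-9]+\/erc(721|1155):0x[0-9a-fA-F]+\/[0-9]+$/.test(value)) return "eip155-nft";
  if (/^data:/.test(value)) return "data-uri";
  return null;
}
```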

**displayValue**: same as canonical form.

**url**: for `https://`/`http://` — same as value; for `ipfs://` — convert to `https://ipfs.io/ipfs/{cid}`; for `eip155:` and `data:` URIs — `null` (requires off-chain resolution beyond this layer).
Copilot AI Mar 2, 2026

The Avatar url enrichment rule for ipfs:// says to convert to https://ipfs.io/ipfs/{cid}, but it doesn't define how {cid} is extracted and whether any path segments should be preserved. Define the exact conversion rule (e.g. ipfs://<cid>[/<path>]https://ipfs.io/ipfs/<cid>[/<path>]) or explicitly state only CID-only URIs are accepted.

Suggested change
**url**: for `https://`/`http://` — same as value; for `ipfs://` — convert to `https://ipfs.io/ipfs/{cid}`; for `eip155:` and `data:` URIs — `null` (requires off-chain resolution beyond this layer).
**url**: for `https://`/`http://` — same as value; for `ipfs://` — treat the value as `ipfs://<cid>[/<path>]` and convert to `https://ipfs.io/ipfs/<cid>[/<path>]` (i.e. everything after `ipfs://` is appended after `/ipfs/`); for `eip155:` and `data:` URIs — `null` (requires off-chain resolution beyond this layer).
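
The suggested rule can be sketched as follows (the function name and the ipfs.io gateway choice follow this review thread; both are assumptions rather than settled spec):

```typescript
// Everything after "ipfs://" (CID plus optional path) is appended verbatim
// after /ipfs/ on the gateway; non-ipfs values return null here.
function ipfsToGatewayUrl(value: string): string | null {
  const match = /^ipfs:\/\/(.+)$/.exec(value);
  if (match === null) return null;
  return `https://ipfs.io/ipfs/${match[1]}`;
}
```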
