Skip to content

Fix URI parser: IPv6 structural validation and pct-encoded case sensitivity#2354

Merged
jviotti merged 1 commit into
sourcemeta:mainfrom
AcEKaycgR:fix/uri-ipv6-validation
Apr 24, 2026
Merged

Fix URI parser: IPv6 structural validation and pct-encoded case sensitivity#2354
jviotti merged 1 commit into
sourcemeta:mainfrom
AcEKaycgR:fix/uri-ipv6-validation

Conversation

@AcEKaycgR
Copy link
Copy Markdown
Contributor

@AcEKaycgR AcEKaycgR commented Apr 24, 2026

Fixes #2353

What changed

IPv6 false positives (6 cases)

parse_ipv6() previously accepted any sequence of hex digits, colons, and dots inside brackets without validating RFC 3986 §3.2.2 / RFC 4291 structural rules. After the character-level scan, the extracted address is now passed to is_ipv6() which already enforces all the rules correctly:

  • http://[] - empty brackets
  • http://[2001:db8::00000] - h16 group exceeds 4 hex digits
  • http://[2001::db8::1] - more than one ::
  • http://[1:2:3:4:5:6:7] - 7 groups without ::
  • http://[1:2:3:4:5:6:7:8:9] - 9 groups
  • http://[::ffff:1.2.3.256] - embedded IPv4 octet out of range

Percent-encoding false negative (1 case)

validate_percent_encoded_utf8() was treating bytes like 0xAF as bare UTF-8 continuation bytes and throwing. RFC 3986 §2.1 only requires two valid HEXDIG after %; UTF-8 validity of the encoded octet is not required. The function now returns after confirming two valid hex digits.

  • http://a.com/%aF - mixed-case hex rejected incorrectly

Tests

7 regression tests added to uri_parse_test.cc, one per fixed case.
All 628 existing URI unit tests continue to pass.

Its ready for review @jviotti

@augmentcode
Copy link
Copy Markdown

augmentcode Bot commented Apr 24, 2026

🤖 Augment PR Summary

Summary: This PR tightens URI parsing around IP-literals and percent-encoding to better match RFC 3986.

Changes:

  • Guarded UNC path detection in URI::from_path so //-prefixed paths are treated as UNC only on Windows.
  • Simplified percent-encoding validation to require only % + two HEXDIG (no UTF-8 byte-sequence validation), fixing mixed-case hex handling.
  • Improved IPv6 parsing by validating bracket contents structurally via sourcemeta::core::is_ipv6() after the character-level scan.
  • Adjusted path/query/fragment scanners to advance cleanly after consuming a percent-encoded triplet.
  • Updated existing percent-encoding negative tests to use truly invalid hex and added regression tests for mixed-case percent-encoding and IPv6 structural false positives.

Technical Notes: IPv6 validation now leverages the existing core IP validator, while IPvFuture handling remains parser-local per RFC 3986.

🤖 Was this summary useful? React with 👍 or 👎

Copy link
Copy Markdown

@augmentcode augmentcode Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. No suggestions at this time.

Comment augment review to trigger a new review at any time.

@AcEKaycgR AcEKaycgR force-pushed the fix/uri-ipv6-validation branch from 0e33b86 to 8141fbd Compare April 24, 2026 11:21
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 3 files

@AcEKaycgR AcEKaycgR force-pushed the fix/uri-ipv6-validation branch from 8141fbd to 322ce03 Compare April 24, 2026 11:24
…tivity

Signed-off-by: AcE <kintan0108@gmail.com>
@AcEKaycgR AcEKaycgR force-pushed the fix/uri-ipv6-validation branch from 322ce03 to ca16483 Compare April 24, 2026 11:37
Comment thread src/core/uri/parse.cc
}

const auto literal{input.substr(start + 1, position - start - 1)};
if (!is_ipvfuture && !sourcemeta::core::is_ipv6(literal)) [[unlikely]] {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh good catch. We developed the ip module very recently. Makes sense using here now! 💯

Copy link
Copy Markdown
Member

@jviotti jviotti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Impeccable PR. Great job. Thanks!

@jviotti jviotti merged commit f92bfdd into sourcemeta:main Apr 24, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

URI parser: 7 mismatches found against RFC 3986 test suite

2 participants