Fix fragment parsing for relative URI in RFC URI parser #36762

Open
daguimu wants to merge 1 commit into spring-projects:main from daguimu:fix/rfc-uri-parser-relative-fragment-36759

daguimu commented May 7, 2026

Problem

UriComponentsBuilder.fromUriString("path2#foo") parses a relative URI with a fragment incorrectly:

path     = "path2"      (correct)
fragment = "path2#foo"  (should be "foo")

As a result, toUriString() produces path2#path2#foo instead of path2#foo.
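
As a reference point for the expected split, the same input can be run through the JDK's own java.net.URI. This is only a cross-check against a different parser, not Spring code, and the class name ExpectedSplit is made up for this sketch.

```java
import java.net.URI;

final class ExpectedSplit {

    // Returns {path, fragment} as parsed by the JDK's java.net.URI (not by Spring).
    static String[] split(String input) {
        URI uri = URI.create(input);
        return new String[] { uri.getPath(), uri.getFragment() };
    }

    public static void main(String[] args) {
        String[] parts = split("path2#foo");
        System.out.println("path     = " + parts[0]); // prints: path     = path2
        System.out.println("fragment = " + parts[1]); // prints: fragment = foo
    }
}
```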

Root Cause

In RfcUriParser, every state that transitions into FRAGMENT advances the component index to i + 1 so the fragment starts after the # character — except SCHEME_OR_PATH, which calls advanceTo(FRAGMENT) without the index argument. For a schemeless single-segment input the parser enters SCHEME_OR_PATH at index 0, the component index stays at 0, and captureFragmentIfNotEmpty() ends up returning uri.substring(0, length) — the entire input.

URIs with / in the path don't hit this code path: they leave SCHEME_OR_PATH for PATH on the first /, and PATH's # case correctly passes i + 1. The WhatWG parser is also unaffected (it produces the expected output for this input today).
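
The index mistake described above can be modeled in a few lines. The following is a minimal sketch, not the real RfcUriParser: the class FragmentIndexSketch, its methods, and the advancePastHash flag are hypothetical, and it handles only the # transition for a schemeless, single-segment input.

```java
final class FragmentIndexSketch {

    // Hypothetical stand-in for the parser's fragment capture: take everything
    // from componentIndex to the end of the input, or null when nothing is left.
    static String captureFragmentIfNotEmpty(String uri, int componentIndex) {
        return (componentIndex < uri.length()) ? uri.substring(componentIndex) : null;
    }

    // Models only the '#' transition out of a SCHEME_OR_PATH-like state.
    // advancePastHash = false mimics the reported behavior (index left at 0);
    // advancePastHash = true mimics moving the component index to i + 1.
    static String[] parse(String uri, boolean advancePastHash) {
        int componentIndex = 0;      // start of the component currently being read
        String path = uri;           // no '#' found: the whole input is the path
        String fragment = null;
        for (int i = 0; i < uri.length(); i++) {
            if (uri.charAt(i) == '#') {
                path = uri.substring(componentIndex, i);   // path ends before '#'
                if (advancePastHash) {
                    componentIndex = i + 1;                // fragment starts after '#'
                }
                fragment = captureFragmentIfNotEmpty(uri, componentIndex);
                break;
            }
        }
        return new String[] { path, fragment };
    }

    // Renders path and optional fragment back into a string.
    static String render(String[] parts) {
        return (parts[1] == null) ? parts[0] : parts[0] + "#" + parts[1];
    }

    public static void main(String[] args) {
        System.out.println(render(parse("path2#foo", false))); // prints path2#path2#foo
        System.out.println(render(parse("path2#foo", true)));  // prints path2#foo
    }
}
```

With the index left at 0, the captured fragment is the whole input, which is why the rendered string repeats the path. With the index moved past the #, a trailing # with nothing after it yields a null fragment in this model.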

Fix

In RfcUriParser.SCHEME_OR_PATH, change advanceTo(FRAGMENT) to advanceTo(FRAGMENT, i + 1), mirroring the sibling ? → QUERY transition two lines above and every other → FRAGMENT transition in the parser.

Tests Added

Change point: SCHEME_OR_PATH # advances component index past #

Tests:

  • fromUriStringRelativeUriWithFragment: verifies path2#foo splits as path = "path2", fragment = "foo" and round-trips via toUriString()
  • fromUriStringRelativeUriWithEmptyFragment: boundary case, a trailing # with no fragment yields path = "path2", fragment = null

Both tests are parameterized over ParserType (RFC and WHAT_WG). Without the fix they fail for RFC and pass for WHAT_WG (which already handled this case correctly); with the fix both parsers agree.

The pre-existing assertion for docs/guide/collections/designfaq.html#28 in fromUriString(ParserType) (which goes through the PATH state, not SCHEME_OR_PATH) continues to pass, providing regression coverage for multi-segment relative URIs.

Impact

  • Scope: spring-web only — one-line behavioral change in a private parser, no public API surface added.
  • Affects: schemeless single-segment URIs with a fragment, when using the default ParserType.RFC parser.
  • Unaffected: URIs with a scheme, authority, or / in the path (different parser states); ParserType.WHAT_WG.

Closes gh-36759

When parsing a relative URI such as `path2#foo`, RfcUriParser's
SCHEME_OR_PATH state advanced to FRAGMENT without moving the component
index past the `#` character, so the captured fragment included the
entire input (`path2#foo`) instead of just `foo`. As a result,
`toUriString()` produced `path2#path2#foo`.

Update the SCHEME_OR_PATH `#` transition to advance the component
index to `i + 1`, matching the sibling `?` -> QUERY transition in the
same state and every other `-> FRAGMENT` transition in the parser.
URIs with a `/` in the path are unaffected because they leave
SCHEME_OR_PATH for PATH on the first slash; the WhatWG parser already
handled this case correctly.

Closes spring-projectsgh-36759

Signed-off-by: daguimu <daguimu.geek@gmail.com>
spring-projects-issues added the "status: waiting-for-triage" label (An issue we've not yet triaged or decided on) on May 7, 2026

Development

Successfully merging this pull request may close these issues.

UriComponentsBuilder.fromUriString incorrectly captures fragment for relative URI with fragment
