net/url: unescape() logic doesn't copy invalid bytes following % as expected by most recent spec #56732

@dgryski

(This is a reopened version of #11249, filed because the WHATWG spec has changed.)

What version of Go are you using (go version)?

Go 1.19

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

All.

What did you do?

Tried to parse a URL with an "invalid" percent escape (in this case, %u0041).

	u, err := url.ParseRequestURI("http://localhost/%u0041")

Playground link: https://go.dev/play/p/SqA0_6oLuo3

What did you expect to see?

Printing the URL path as /%u0041

What did you see instead?

That there was an error parsing the URL.

I'm reopening this bug because the current version of the URL spec for "Percent Encoded Bytes" ( https://url.spec.whatwg.org/#percent-encoded-bytes ) now says:

Otherwise, if byte is 0x25 (%) and the next two bytes after byte in input are not in the ranges 0x30 (0) to 0x39 (9), 0x41 (A) to 0x46 (F), and 0x61 (a) to 0x66 (f), all inclusive, append byte to output.
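To make the quoted step concrete, here is a rough sketch of a decoder that follows it. This is only an illustration of the spec's behavior, not net/url's implementation and not a proposed patch; the names `percentDecode`, `isHex`, and `unhex` are assumptions made up for this example.

```go
package main

import "fmt"

// isHex reports whether b is an ASCII hex digit (0-9, A-F, a-f).
func isHex(b byte) bool {
	return (b >= '0' && b <= '9') || (b >= 'A' && b <= 'F') || (b >= 'a' && b <= 'f')
}

// unhex converts an ASCII hex digit to its value. It assumes isHex(b).
func unhex(b byte) byte {
	switch {
	case b >= '0' && b <= '9':
		return b - '0'
	case b >= 'a' && b <= 'f':
		return b - 'a' + 10
	default:
		return b - 'A' + 10
	}
}

// percentDecode is a hypothetical helper sketching the quoted rule:
// a '%' that is not followed by two hex digits is copied to the output
// unchanged instead of being treated as an error.
func percentDecode(in []byte) []byte {
	out := make([]byte, 0, len(in))
	for i := 0; i < len(in); i++ {
		b := in[i]
		if b == '%' && i+2 < len(in) && isHex(in[i+1]) && isHex(in[i+2]) {
			// Valid escape: decode the two hex digits into one byte.
			out = append(out, unhex(in[i+1])<<4|unhex(in[i+2]))
			i += 2
			continue
		}
		// Any other byte, including a stray '%', passes through as-is.
		out = append(out, b)
	}
	return out
}

func main() {
	fmt.Printf("%s\n", percentDecode([]byte("/%u0041")))
	fmt.Printf("%s\n", percentDecode([]byte("/%41")))
}
```

Under this rule `/%u0041` comes back unchanged, while `/%41` decodes to `/A`.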

JavaScript's URL parser and the popular Rust parsers follow this model, so Go rejecting these URLs is making interoperability tricky.

JavaScript:

> new URL("http://localhost/%u0041").pathname
< '/%u0041'

Rust:

use url::Url;
use hyper::Uri;

fn main() {
    let uri = "https://localhost/%u0041".parse::<Uri>().unwrap();
    println!("path: {}", uri.path());

    let u = Url::parse("https://localhost/%u0041").unwrap();
    println!("path: {}", u.path());
}

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=f1134169d2851ebe07da4a45e03ff68e

Labels: NeedsInvestigation (Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.)