Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/url: unescape() logic doesn't copy invalid bytes following % as expected by most recent spec #56732

Open
dgryski opened this issue Nov 14, 2022 · 11 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@dgryski
Copy link
Contributor

dgryski commented Nov 14, 2022

(This is a reopened version of #11249 because the whatwg spec has changed.)

What version of Go are you using (go version)?

Go 1.19

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

All.

What did you do?

Tried to parse a URL with an "invalid" percent escape. (In this case, %u0041).

	u, err := url.ParseRequestURI("http://localhost/%u0041")

Playground link: https://go.dev/play/p/SqA0_6oLuo3

What did you expect to see?

Printing the URL path as /%u0041

What did you see instead?

That there was an error parsing the URL.

I'm reopening this bug because the current version of the URL spec for "Percent Encoded Bytes" ( https://url.spec.whatwg.org/#percent-encoded-bytes )

Otherwise, if byte is 0x25 (%) and the next two bytes after byte in input are not in the ranges 0x30 (0) to 0x39 (9), 0x41 (A) to 0x46 (F), and 0x61 (a) to 0x66 (f), all inclusive, append byte to output.

The JavaScript and popular Rust parsers follow this model, which is making interoperability tricky.

JavaScript:

> new URL("http://localhost/%u0041").pathname
< '/%u0041'

Rust:

use url::Url;
use hyper::Uri;

fn main() {
    let uri = "https://localhost/%u0041".parse::<Uri>().unwrap();
    println!("path: {}", uri.path());

    let u = Url::parse("https://localhost/%u0041").unwrap();
    println!("path: {}", u.path());
}

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=f1134169d2851ebe07da4a45e03ff68e

@dgryski dgryski changed the title net/url: esacpe() logic doesn't copy invalid bytes following % as expected by most recent spec net/url: unescape() logic doesn't copy invalid bytes following % as expected by most recent spec Nov 14, 2022
@gopherbot
Copy link

gopherbot commented Nov 14, 2022

Change https://go.dev/cl/450375 mentions this issue: net/url: accept invalid percent encodings

@dgryski
Copy link
Contributor Author

dgryski commented Nov 15, 2022

Thanks Ian.

shuLhan added a commit to shuLhan/share that referenced this issue Nov 19, 2022
In the next Go release (1.20), parsing URL with invalid percent escape
will not throw error anymore [1].

[1] golang/go#56732
@liggitt
Copy link
Contributor

liggitt commented Nov 19, 2022

Doesn't loosening of URI validation risk propagating invalid %-encodings to other systems that will choke on the invalid URIs (like, for example, something built with go1.19)? This seems opposite to the logic used to justify tightening of ParseIP logic in go1.17 (#30999).

I would not have expected the stdlib method to relax validation unconditionally like this

@liggitt
Copy link
Contributor

liggitt commented Nov 19, 2022

aside from the question about whether allowing these invalid sequences is going to cause problems with other systems, stdlib methods are widely used to validate things like IPs and URIs which are persisted and have to interoperate with multiple go versions. If this is going to update go1.20 to accept URIs as valid which go1.19 rejected as invalid, can this also include guidance for consumers to roll out this change?

  1. For go1.20 users that want to keep validating things strictly to disallow invalid %-encodings, how should they change their code?
  2. For go1.19 users updating to build with go1.20 who want to preserve existing runtime behavior with no code changes, how can they accomplish that? xref extending Go backward compatibility #55090 (cc @rsc)

@aojea
Copy link

aojea commented Nov 19, 2022

if my "git archeology" is correct, the whatwg spec has changed onJan 14, 2014,

whatwg/url@3cfaa17

that predates #11249 , and makes arguable to change the behavior on the parsers at this point, of course, I'm biased due to the implications of this change in kubernetes/kubernetes :)

@dgryski
Copy link
Contributor Author

dgryski commented Nov 19, 2022

I guess there are two issues conflated in the earlier bug. The first is "Please support decoding %u0000 as Unicode" (which I am not in favour of) and the second is "%u shouldn't appear in URLs" (which the current spec allows and provides handling logic for).

Given that 1) there are sites in the wild with %u in their URLs and 2) other languages support the spec and thus those URLs, I believe that net/url should too.

@dims
Copy link

dims commented Nov 19, 2022

Do we want to open another issue as this is closed?

@liggitt
Copy link
Contributor

liggitt commented Nov 20, 2022

Do we want to open another issue as this is closed?

Yeah, I'll open one on Monday

@liggitt
Copy link
Contributor

liggitt commented Nov 21, 2022

opened #56884

@gopherbot
Copy link

gopherbot commented Nov 22, 2022

Change https://go.dev/cl/452795 mentions this issue: Revert "net/url, net/http/httputil: accept invalid percent encodings"

@ianlancetaylor
Copy link
Contributor

ianlancetaylor commented Nov 22, 2022

CL is being reverted.

@ianlancetaylor ianlancetaylor added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 22, 2022
@ianlancetaylor ianlancetaylor added this to the Go1.21 milestone Nov 22, 2022
gopherbot pushed a commit that referenced this issue Nov 22, 2022
This reverts CL 450375.

Reason for revert: This change causes test failures (and possibly other
problems) for users depending on the existing validation behavior.
Rolling back the change for now to give us more time to consider its
impact. This landed late in the cycle and isn't urgent; it can wait
for 1.21 if we do want to make the change.

Fixes #56884
For #56732

Change-Id: I082023c67f1bbb933a617453ab92b67abba876ef
Reviewed-on: https://go-review.googlesource.com/c/go/+/452795
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Damien Neil <dneil@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

6 participants