Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/url: PathEscape treatment of reserved characters is inconsistent #27559

Open
gibson042 opened this issue Sep 7, 2018 · 4 comments

Comments

Projects
None yet
4 participants
@gibson042
Copy link
Contributor

commented Sep 7, 2018

; and , are permitted by RFC 3986 in path segments, but PathEscape percent-encodes them because "URI producing applications often use the reserved characters allowed in a segment to delimit scheme-specific or dereference-handler-specific subcomponents". However, it does not escape other reserved characters potentially subject to the same kind of special treatment:

  • = is "often used to delimit parameters and parameter values", particularly in conjunction with ; (e.g., "name;v=1.1")
  • . is used in . and .. "dot-segments" to "indicate relative position within the hierarchical tree of names… interpreted within the URI path hierarchy… removed as part of the resolution process".
  • : is not allowed in the first path segment of a relative-path reference, "as it would be mistaken for a scheme name" (e.g., a URI reference of "foo:bar/baz/quux" is very different from "foo-bar/baz/quux").

What version of Go are you using (go version)?

1.11

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

linux amd64

What did you do?

https://play.golang.org/p/f-byVa3k2gi

pathSegments := []string{"non-scheme:non-authority", "foo=bar;bar=baz", ".."}
for i, segment := range pathSegments {
	pathSegments[i] = url.PathEscape(segment)
}
fmt.Println(strings.Join(pathSegments, "/"))

What did you expect to see?

non-scheme%3Anon-authority/foo%3Dbar%3Bbar%3Dbaz/%2E%2E (all potentially special characters percent-encoded), or maybe non-scheme:non-authority/foo=bar;bar=baz/.. (no characters percent-encoded unless required by RFC).

What did you see instead?

non-scheme:non-authority/foo=bar%3Bbar=baz/..

@andybons andybons added this to the Unplanned milestone Sep 7, 2018

@andybons

This comment has been minimized.

Copy link
Member

commented Sep 7, 2018

@andriisoldatenko

This comment has been minimized.

Copy link
Contributor

commented Sep 8, 2018

@bradfitz @andybons
I tried to debug latest go1.11 version shouldEscape function from src/net/url/url.go:101:
and if you can confirm that bug exists and we need to fix it I think I can try it:

   138:			case encodePathSegment: // §3.3
   139:				// The RFC allows : @ & = + $ but saves / ; , for assigning
   140:				// meaning to individual path segments.
=> 141:				return c == '/' || c == ';' || c == ',' || c == '?'
@agnivade

This comment has been minimized.

Copy link
Member

commented Dec 23, 2018

An actual implementation is always a mix of RFC and practical usage. And trying to follow RFC to the letter has bit us in the past (#27302). I believe what we are trying to do here is only escape those characters which are intended to be used as delimiters in a path segment.

I am curious to know whether you are actually facing issues due to this, or you have just stumbled across this.

On a different note, reading the RFC and the code, I have a different question -

@bradfitz - Regarding this comment -

// The RFC allows : @ & = + $ but saves / ; , for assigning
// meaning to individual path segments.

The RFC mentions -

For example, the semicolon (";") and equals ("=") reserved characters are
often used to delimit parameters and parameter values applicable to
that segment. The comma (",") reserved character is often used for
similar purposes.

So, following this logic, shouldn't = be escaped too, since we are already escaping , and ; ? Or possibly I missed something ?

@gibson042

This comment has been minimized.

Copy link
Contributor Author

commented Dec 23, 2018

I opened this three months ago, and don't remember the details of the problems it caused (a workaround was simple and implemented without fuss). And if you see compelling reason to leave the inconsistent behavior in PathEscape (e.g., because net/url changes are "always problematic compatibility-wise"), then so be it... but even in that case, documentation should be updated for accuracy, e.g.

PathEscape escapes the string so it can be safely placed inside a URL path segment. Most characters reserved for delimiting data within URIs are percent-encoded, but for compatibility reasons the following characters are not: $ & + = : @

Though I still think that it would make sense to percent-encode =, matching ; and ,.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.