-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
net/http: RawPath shows inconsistent, case-sensitive behaviour with percentage encoded Unicode URI strings #33596
Comments
Lines 673 to 686 in 61bb56a
I think this is happening in |
This is testcase to show case-sensitive behavior. When unescaped character is uppercase in raw form func TestURL_ParseEscapingHexValues(t *testing.T) {
var testCases = []struct {
name string
when string
expectPath string
expectRawPath string
}{
{
name: "uppercase F in %3f",
when: "/ab%3Fde",
expectPath: "/ab?de",
expectRawPath: `/ab%3Fde`, // actual RawPath is empty and test fails
},
{
name: "lowercase f in %3f",
when: "/ab%3fde",
expectPath: "/ab?de",
expectRawPath: `/ab%3fde`,
},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
u, _ := url.Parse(tc.when)
if u.Path != tc.expectPath {
t.Errorf("path `%s` is not `%s`", u.Path, tc.expectPath)
}
if u.RawPath != tc.expectRawPath {
t.Errorf("RawPath `%s` is not `%s`", u.RawPath, tc.expectRawPath)
}
})
}
} |
Change https://go.dev/cl/417395 mentions this issue: |
Use
See, for example: https://go.dev/play/p/TCNnJEl77Zb |
This is not that useful when you compare performance difference between using |
I agree that RFC 3986, Section 2.1 states that -
So, according to RFC3986, |
The way to get the escaped path of a |
IMO, the fact that
IMO, it shouldn't. It should return |
Or maybe we should keep |
From its documentation, But note that the RFC uses the word |
For a URL constructed by For a URL constructed manually, such as |
Guys, you do understand that we are dealing with things that clients send over the (inter)net.
https://go.dev/play/p/566KcdtC18- func main() {
lowerCase, _ := url.Parse("/ab%3fde")
fmt.Printf("RawPath for /ab%%3fde is \"%s\" and Path is \"%s\"\n", lowerCase.RawPath, lowerCase.Path)
upperCase, _ := url.Parse("/ab%3Fde")
fmt.Printf("RawPath for /ab%%3Fde is \"%s\" and Path is \"%s\"\n", upperCase.RawPath, upperCase.Path)
} creates output:
how is this consistent behavior? |
@aldas I think you are giving The documentation says @neild Would it be better if the |
From docs I see and agree that Lines 662 to 669 in 2aa473c
even docs have lower case hex value and no mentions that So it should be?
this |
The documentation of the
where with This documentation is wrong, it should be
that is consistent with the @aldas In light of this, your previous example
behaves correctly and explains the "bizarre" behavior. Also this example is correct
If the original path is |
Could someone explain me if I develop HTTP server related code - request router or "middleware" or http handler and I need to get actual path part that client sent, where can I get it without doing extra work? Real world example - I want to serve file downloads where part of path is the file name and therefore I can not use func handler(w http.ResponseWriter, r *http.Request) {
io.WriteString(w, fmt.Sprintf("raw path part: %s\n", r.URL.RawPath))
}
func main() {
http.HandleFunc("/", handler)
if err := http.ListenAndServe(":8080", nil); err != http.ErrServerClosed {
log.Print(err)
}
} "most" of the time so for Why do we need to call I am not native English speaker - so to summarize: |
|
This is going nowhere. I assume So at least docs should explictly state in public API that these For example Lines 662 to 669 in 2aa473c
It would be useful if URL comment would explicitly state that case matters for escaped values. Currently this is not obvious from Lines 351 to 358 in 2aa473c
this part does not take into consideration that for
Line 365 in 2aa473c
and Lines 685 to 693 in 2aa473c
says that
does this sentence means that path that contains Now take these strings for example and see what happens when you send escaped value https://go.dev/play/p/LNKg1upv9ZP func main() {
for _, p := range []string{
"report.csv%3fpart1?a=1", // lowercase ?
"report.csv%3Fpart1?a=1", // uppercase ? <-- RawPath does not exists
"%5e", // lowercase ^
"%5E", // uppercase ^ <-- RawPath does not exists
"%7ebashrc", // lowercase ~
"%7Ebashrc", // uppercase ~ <-- RawPath exists
} {
u, err := url.Parse(p)
if err != nil {
fmt.Println(err)
continue
}
fmt.Printf("original: %s\t, Path: %s, EscapedPath(): %s\t,RawPath: %s\n", p, u.Path, u.EscapedPath(), u.RawPath)
}
} Output
somehow now This is not intuitive at all. |
This is commit that added In that commit message @rsc wrote
Does "many valid encodings" mean that
would it mean that At the moment cases like and this question still remains: Why do upper-case hex values like |
Change https://go.dev/cl/418035 mentions this issue: |
The original implementation of Go 1.15 changed the I think there's an argument that
The |
Consistently recommend using EscapedPath rather than RawPath directly. For #33596. Change-Id: Ibe5c2dfa7fe6b1fbc540efed6db1291fc6532726 Reviewed-on: https://go-review.googlesource.com/c/go/+/418035 Run-TryBot: Damien Neil <dneil@google.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Than McIntosh <thanm@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
Consistently recommend using EscapedPath rather than RawPath directly. For golang#33596. Change-Id: Ibe5c2dfa7fe6b1fbc540efed6db1291fc6532726 Reviewed-on: https://go-review.googlesource.com/c/go/+/418035 Run-TryBot: Damien Neil <dneil@google.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Than McIntosh <thanm@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (
go env
)?Any environment
go env
OutputWhat did you do?
What did you expect to see?
url.PathEscape("😋") => %F0%9F%98%8B
/link/%F0%9F%98%8B
(A) and/link/%f0%9f%98%8b
(B) (upper and lower case respectively) are equivalent as per RFC 3986. Anhttp.HandlerFunc()
handling either of the URLs is expected to show consistent behaviour.What did you see instead?
An http handler that processes the identical URIs A and B behaves differently. B, which has uppercase characters, produces an empty
http.Request.URL.RawPath
where as A that has lowercase characters produces anhttp.Request.URL.RawPath
with unescaped characters. This breaks Unicode URL handling in popular HTTP routers like chi and httprouter.Discovered this inconsistency when using
html/template
that encodes Unicode strings in<a>
to have lowercase characters as opposed tourl.PathEscape
that produces uppercase characters.The text was updated successfully, but these errors were encountered: