Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: RawPath shows inconsistent, case-sensitive behaviour with percentage encoded Unicode URI strings #33596

Open
knadh opened this issue Aug 12, 2019 · 2 comments
Labels
NeedsInvestigation
Milestone

Comments

@knadh
Copy link

@knadh knadh commented Aug 12, 2019

What version of Go are you using (go version)?

$ go version
go version go1.12.7 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

Any environment

go env Output
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/user/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/user/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build606111517=/tmp/go-build -gno-record-gcc-switches"

What did you do?

package main

import (
	"fmt"
	"html/template"
	"net/http"
	"net/url"
)

type content struct {
	TplEncoded      string
	ManuallyEncoded template.URL

	ShowPaths bool
	RawPath   string
	Path      string
}

func main() {
	tpl, _ := template.New("test").Parse(`<!doctype html>
	<head>
		<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
		<meta charset="utf-8" />
	</head>
	<body>
		{{ if .ShowPaths }}
			<p>RawPath = {{ .RawPath }}</p>
			<p>Path = {{ .Path }}</p>
		{{ else }}
			<a href="/link/{{ .TplEncoded }}">Template encoded link</a><br />
			<a href="/link/{{ .ManuallyEncoded }}">Manually encoded link</a>
			<br />
			<p>View this page's source to see the (lower/upper)case difference
			in the links</p>
		{{ end }}
	</body>
	</html>`)

	// Renders the root with good and bad links.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		s := "😋" // Unicode emoji.
		tpl.Execute(w, content{
			// html/template encodes into lowercase characters.
			TplEncoded: s,

			// url.PathEscape encodes into uppercase characters.
			ManuallyEncoded: template.URL(url.PathEscape(s)),
		})
	})

	// This handler produces inconsistent RawPath based on (upper/lower)case encoding in the URI.
	http.HandleFunc("/link/", func(w http.ResponseWriter, r *http.Request) {
		tpl.Execute(w, content{
			ShowPaths: true,
			RawPath:   r.URL.RawPath,
			Path:      r.URL.Path,
		})
	})

	fmt.Println("Go to http://127.0.0.1:8080")
	http.ListenAndServe(":8080", nil)
}

What did you expect to see?

url.PathEscape("😋") => %F0%9F%98%8B

/link/%F0%9F%98%8B (A) and /link/%f0%9f%98%8b (B) (upper and lower case respectively) are equivalent as per RFC 3986. An http.HandlerFunc() handling either of the URLs is expected to show consistent behaviour.

What did you see instead?

An http handler that processes the identical URIs A and B behaves differently. B, which has uppercase characters, produces an empty http.Request.URL.RawPath where as A that has lowercase characters produces an http.Request.URL.RawPath with unescaped characters. This breaks Unicode URL handling in popular HTTP routers like chi and httprouter.

Discovered this inconsistency when using html/template that encodes Unicode strings in <a> to have lowercase characters as opposed to url.PathEscape that produces uppercase characters.

@andybons andybons added the NeedsInvestigation label Aug 12, 2019
@andybons andybons added this to the Unplanned milestone Aug 12, 2019
@knadh
Copy link
Author

@knadh knadh commented Aug 13, 2019

go/src/net/url/url.go

Lines 673 to 686 in 61bb56a

func (u *URL) setPath(p string) error {
path, err := unescape(p, encodePath)
if err != nil {
return err
}
u.Path = path
if escp := escape(path, encodePath); p == escp {
// Default encoding is fine.
u.RawPath = ""
} else {
u.RawPath = p
}
return nil
}

I think this is happening in net/url.URL.setPath(). Line 674 unescapes lowercased encoding successfully, and line 679 escapes the unescaped value again which now produces uppercased encoding. Thus, the original lowercase encoded value, and the re-encoded uppercase value, although equivalent semantically, don't match (p == escp), resulting in RawPath being set.

@aldas
Copy link

@aldas aldas commented Apr 30, 2022

This is testcase to show case-sensitive behavior. When unescaped character is uppercase in raw form RawPath will not be filled as escape will provide same value as p at if escp := escape(path, encodePath); p == escp {

func TestURL_ParseEscapingHexValues(t *testing.T) {
	var testCases = []struct {
		name          string
		when          string
		expectPath    string
		expectRawPath string
	}{
		{
			name:          "uppercase F in %3f",
			when:          "/ab%3Fde",
			expectPath:    "/ab?de",
			expectRawPath: `/ab%3Fde`, // actual RawPath is empty and test fails
		},
		{
			name:          "lowercase f in %3f",
			when:          "/ab%3fde",
			expectPath:    "/ab?de",
			expectRawPath: `/ab%3fde`,
		},
	}

	for _, tc := range testCases {
		t.Run(tc.name, func(t *testing.T) {
			u, _ := url.Parse(tc.when)

			if u.Path != tc.expectPath {
				t.Errorf("path `%s` is not `%s`", u.Path, tc.expectPath)
			}
			if u.RawPath != tc.expectRawPath {
				t.Errorf("RawPath `%s` is not `%s`", u.RawPath, tc.expectRawPath)
			}
		})
	}
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation
Projects
None yet
Development

No branches or pull requests

3 participants