Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: ServeMux forcibly cleans double forward slash in URLs even when behaving as a gateway #42244

Open
iiiusky opened this issue Oct 28, 2020 · 4 comments
Milestone

Comments

@iiiusky
Copy link

@iiiusky iiiusky commented Oct 28, 2020

What version of Go are you using (go version)?

$ go version 
go version go1.14 darwin/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env

GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/x1x1/Library/Caches/go-build"
GOENV="/Users/x1x1/Library/Application Support/go/env"
GOEXE=""
GOFLAGS="-trimpath"
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOOS="darwin"
GOPATH="/Users/x1x1/code/go"
GOPROXY="direct"
GOROOT="/usr/local/Cellar/go/1.14/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.14/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Volumes/Data/code/hello/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/ms/07nblg552xl5wjdc54hjxdch0000gn/T/go-build080994454=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I use golang's ReverseProxy to reverse proxy the target website. The target website has a URL address: /tools/_ajax//forgetPwdSeting, the request method is post, but after the proxy, golang will automatically set /tools/_ajax// ForgetPwdSeting jumps to /tools/_ajax/forgetPwdSeting, and there is no real access and response to the website after the jump, which is completely invalid internally, causing some functions to be unavailable after the proxy.

What did you expect to see?

image
image

What did you see instead?

From the above figure and description, I found that the specific location of the intranet 301 redirection is
src/net/http/server.go:2347

func (mux *ServeMux) Handler(r *Request) (h Handler, pattern string) {

	// CONNECT requests are not canonicalized.
	if r.Method == "CONNECT" {
		// If r.URL.Path is /tree and its handler is not registered,
		// the /tree -> /tree/ redirect applies to CONNECT requests
		// but the path canonicalization does not.
		if u, ok := mux.redirectToPathSlash(r.URL.Host, r.URL.Path, r.URL); ok {
			return RedirectHandler(u.String(), StatusMovedPermanently), u.Path
		}

		return mux.handler(r.Host, r.URL.Path)
	}

	// All other requests have any port stripped and path cleaned
	// before passing to mux.handler.
	host := stripHostPort(r.Host)
	path := cleanPath(r.URL.Path)

	// If the given path is /tree and its handler is not registered,
	// redirect for /tree/.
	if u, ok := mux.redirectToPathSlash(host, path, r.URL); ok {
		return RedirectHandler(u.String(), StatusMovedPermanently), u.Path
	}

       // here
	if path != r.URL.Path {
		_, pattern = mux.handler(host, path)
		url := *r.URL
		url.Path = path
		return RedirectHandler(url.String(), StatusMovedPermanently), pattern
	}

	return mux.handler(host, r.URL.Path)
}
@dmitshur dmitshur changed the title In the golang ReverseProxy function, how to turn off the internal forced 301 redirect? net/http/httputil: ReverseProxy forcibly cleans double forward slash in URLs Oct 28, 2020
@dmitshur dmitshur added this to the Backlog milestone Oct 28, 2020
@dmitshur
Copy link
Member

@dmitshur dmitshur commented Oct 28, 2020

This is either necessary and correct per HTTP spec, or a bug/missing feature. This issue needs investigation to determine which.

CC @bradfitz, @empijei.

@empijei
Copy link
Contributor

@empijei empijei commented Nov 11, 2020

I fear this might be neither. RFC 2396 allows for the single slash to be a path separator. This means that a URL with two consecutive slashes implies an empty path segment.

Now, while that spec is clear on the sintactic meaning of an empty path segment, it's not very clear what the semantics of that should be. That is left to the implementation. Filesystem implementations all ignore empty path segments, while other implementations differ.

So this is the point about URL specifications and server implementations: the server (origin) can decide what meaning to attribute to an empty path segment. Go aligns with the Unix interpretation and ignores empty path components on server multiplexers and routers.

So now we have to look into the proxy side of specs: is a hop in HTTP a receiving end for the statement above or not?

In this case we are looking at gateways (aka reverse proxies). This paragraph says that requirements for a gateway are the same for origin servers on one side, and user agents for the other side and adds no restrictions. I think the best approach here would be to make a proxy rely a message as transparently as possible, so not cleaning empty path segments.

So the question is not on what the spec says, but on what we decide might be better to do.

That said Go's ReverseProxy does not strip empty path components. The logic linked here is that of a ServeMux. If you run a ReverseProxy as Handler on an http.Server the paths will be forwarded as-is.

I don't think there is any change needed here as ReverseProxy already behaves in the best possible way. Your issue here is that you are using a ServeMux and when you do so you are not asserting that you are just a gateway, you also become an authoritative origin and, as such, you have to attribute meaning to paths.

Long story short

  • httputil.ReverseProxy does not strip empty path components, and this is in line with the best possible interpretation of the spec
  • http.ServeMux behaves as an origin, and as such it attributes meaning to path components. In this case empty ones are discarded a-la-UNIX
  • Combining a reverse proxy with a server multiplexer makes the behavior of a server multiplexer prevail, which seems reasonable to me (for consistency with other paths that might be registered on the mux).

So, before we close this I have a couple of questions for @iiiusky:

  1. The code you linked is in ServeMux, not in ReverseProxy. Are you using a ReverseProxy inside a ServeMux for another service or are those separate things? If you just need to proxy you can do so without a ServeMux and maybe get around the issue? (Note: you can set any http handler as Handler in a Server, not just muxes).
  2. What is the technology used for the other server? Go should accept empty path segments without issues.
@empijei empijei changed the title net/http/httputil: ReverseProxy forcibly cleans double forward slash in URLs net/http: ServeMux forcibly cleans double forward slash in URLs even when behaving as a gateway Nov 11, 2020
@iiiusky
Copy link
Author

@iiiusky iiiusky commented Nov 24, 2020

@empijei
My code is roughly as follows:

func handleRequestAndRedirect(res http.ResponseWriter, req *http.Request) {
	var xDefaultTransport http.RoundTripper = &http.Transport{
		DialContext: (&net.Dialer{
			Timeout:   30 * time.Second,
			KeepAlive: 30 * time.Second,
			DualStack: true,
		}).DialContext,
		ForceAttemptHTTP2:     true,
		MaxIdleConns:          100,
		IdleConnTimeout:       90 * time.Second,
		TLSHandshakeTimeout:   10 * time.Second,
		ExpectContinueTimeout: 1 * time.Second,
		TLSClientConfig: &tls.Config{
			InsecureSkipVerify: true, // test server certificate is not trusted.
		},
	}

	url, _ := url.Parse(target)

	proxy := httputil.NewSingleHostReverseProxy(url)
	proxy.Transport = xDefaultTransport
	proxy.ModifyResponse = ModifyResponse

	req.URL.Host = url.Host
	req.URL.Scheme = url.Scheme
	req.Header.Set("X-Forwarded-Host", req.Header.Get("Host"))
	req.Header.Set("Referer", strings.ReplaceAll(req.Header.Get("Referer"),
		domain, target))
	proxy.ServeHTTP(res, req)
}

http.HandleFunc("/", handleRequestAndRedirect)
if err := http.ListenAndServe(fmt.Sprintf("%s:%d", bindIP, bindPort), nil); err != nil {
		panic(err)
}

The main realization is that I use http.ListenAndServe to open a web service, and then add a handle to it, this handle uses anti-generation, and can modify the response content.

@empijei
Copy link
Contributor

@empijei empijei commented Nov 25, 2020

http.HandleFunc registers your handler on DefaultServeMux, which is a Mux as I suspected.

Can you please change your last 4 lines to be:

s := http.Server{
  Addr: fmt.Sprintf("%s:%d", bindIP, bindPort),
  Handler: http.HandlerFunc(handleRequestAndRedirect),
}
if err := s.ListenAndServe(fmt.Sprintf("%s:%d", bindIP, bindPort), nil); err != nil {
		panic(err)
}

and check if it works?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants
You can’t perform that action at this time.