net/url: add OmitHost bool to URL #46059

@TimothyGu

What version of Go are you using (go version)?

$ go version
go version go1.16.3 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/timothy-gu/.cache/go-build"
GOENV="/home/timothy-gu/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/timothy-gu/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/timothy-gu/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.16.3"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/timothy-gu/dev/go/go/src/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1221289449=/tmp/go-build -gno-record-gcc-switches"

What did you do?

https://play.golang.org/p/OFrp4cOVW8j

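package main

import (
	"fmt"
	"net/url"
)

func main() {
	// Parse each input and print it next to its reserialized form.
	// Parse errors are ignored because every input here is well-formed.
	u, _ := url.Parse("myscheme:path")
	fmt.Printf("myscheme:path\t%s\n", u)

	u, _ = url.Parse("myscheme:/path")
	fmt.Printf("myscheme:/path\t%s\n", u)

	u, _ = url.Parse("myscheme://path")
	fmt.Printf("myscheme://path\t%s\n", u)

	u, _ = url.Parse("myscheme:///path")
	fmt.Printf("myscheme:///path\t%s\n", u)
}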

What did you expect to see?

All the parsed URLs roundtrip.

myscheme:path	myscheme:path
myscheme:/path	myscheme:/path
myscheme://path	myscheme://path
myscheme:///path	myscheme:///path

What did you see instead?

myscheme:path	myscheme:path
myscheme:/path	myscheme:///path    <-- three slashes
myscheme://path	myscheme://path
myscheme:///path	myscheme:///path

The issue here is that myscheme:/path and myscheme:///path are treated as the same URL, both parsing to

&url.URL{
	Scheme: "myscheme",
	Host: "",
	Path: "/path",
}

Yet they are materially different URLs. According to RFC 3986, myscheme:/path has a path-absolute and therefore no authority defined at all. myscheme:///path, on the other hand, does have an authority, albeit an empty one.

Whether authority is undefined or empty is important for the recomposition algorithm in the RFC (i.e., URL.String):

      if defined(authority) then
         append "//" to result;
         append authority to result;
      endif;
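
The distinction can be made concrete with a small sketch of the RFC's recomposition step. The components type, the recompose function, and the str helper below are hypothetical names invented for illustration and are not part of net/url; a nil pointer stands for "undefined" and a pointer to "" stands for "defined but empty".

```go
package main

import (
	"fmt"
	"strings"
)

// components is a hypothetical holder for the five RFC 3986 components.
// A nil pointer means the component is undefined; a pointer to "" means
// it is defined but empty. The path is always defined.
type components struct {
	scheme    *string
	authority *string
	path      string
	query     *string
	fragment  *string
}

// recompose follows the pseudocode of RFC 3986 section 5.3.
func recompose(c components) string {
	var b strings.Builder
	if c.scheme != nil {
		b.WriteString(*c.scheme)
		b.WriteString(":")
	}
	if c.authority != nil {
		b.WriteString("//")
		b.WriteString(*c.authority)
	}
	b.WriteString(c.path)
	if c.query != nil {
		b.WriteString("?")
		b.WriteString(*c.query)
	}
	if c.fragment != nil {
		b.WriteString("#")
		b.WriteString(*c.fragment)
	}
	return b.String()
}

// str returns a pointer to s, for building components literals.
func str(s string) *string { return &s }

func main() {
	// Undefined authority: a single slash survives.
	fmt.Println(recompose(components{scheme: str("myscheme"), path: "/path"}))
	// Defined but empty authority: "//" is emitted before the path.
	fmt.Println(recompose(components{scheme: str("myscheme"), authority: str(""), path: "/path"}))
}
```

url.URL stores Host as a plain string, so it has no way to tell these two inputs apart.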

I can imagine two different ways to solve this problem:

  1. Add a ForceAuthority boolean to url.URL, such that a true value implies an authority that is present even if Host is "" and User is nil. This has the advantage of being analogous to ForceQuery. However, it can run into compatibility problems: existing code that creates a file URL from scratch would now see its URL serialized as file:/home/... rather than the expected file:///home/....

  2. Add a NoAuthority boolean to url.URL, such that a false value implies an authority is present. url.Parse will set this field if a / (but not //) is present right after the scheme. This maintains the previous behavior for any manually created URLs, but fixes the Parse/String roundtrip for URLs with a single slash.

I propose approach 2.
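
As a rough illustration of approach 2, here is a minimal sketch that does not touch net/url. The sketchURL type, parseSketch, and its String method are hypothetical stand-ins; they handle only scheme, host, and path (no userinfo, query, fragment, or escaping) and keep a rootless path in Path rather than in Opaque as net/url does.

```go
package main

import (
	"fmt"
	"strings"
)

// sketchURL is a hypothetical, cut-down stand-in for url.URL that
// carries the proposed NoAuthority field.
type sketchURL struct {
	Scheme      string
	Host        string
	Path        string
	NoAuthority bool // true: do not emit "//" before an absolute path
}

// parseSketch splits "scheme:rest" and sets NoAuthority only when a
// single "/" (not "//") directly follows the scheme.
func parseSketch(raw string) (sketchURL, error) {
	i := strings.Index(raw, ":")
	if i <= 0 {
		return sketchURL{}, fmt.Errorf("missing scheme in %q", raw)
	}
	u := sketchURL{Scheme: raw[:i]}
	rest := raw[i+1:]
	switch {
	case strings.HasPrefix(rest, "//"):
		// Authority is present, possibly empty; it ends at the next "/".
		rest = rest[2:]
		if j := strings.Index(rest, "/"); j >= 0 {
			u.Host, u.Path = rest[:j], rest[j:]
		} else {
			u.Host = rest
		}
	case strings.HasPrefix(rest, "/"):
		// Absolute path with no authority at all.
		u.Path = rest
		u.NoAuthority = true
	default:
		// Rootless path.
		u.Path = rest
	}
	return u, nil
}

// String emits "//" when a host is set, or when the path is absolute
// and NoAuthority is false (the behavior for hand-built values today).
func (u sketchURL) String() string {
	var b strings.Builder
	b.WriteString(u.Scheme)
	b.WriteString(":")
	if u.Host != "" || (strings.HasPrefix(u.Path, "/") && !u.NoAuthority) {
		b.WriteString("//")
		b.WriteString(u.Host)
	}
	b.WriteString(u.Path)
	return b.String()
}

func main() {
	for _, in := range []string{"myscheme:path", "myscheme:/path", "myscheme://path", "myscheme:///path"} {
		u, err := parseSketch(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s\t%s\n", in, u)
	}
	// A value built by hand leaves NoAuthority false and keeps "//".
	fmt.Println(sketchURL{Scheme: "file", Path: "/home/user"})
}
```

Because the zero value of NoAuthority preserves the current serialization, only values produced by the parser change behavior, which is the compatibility argument for approach 2 over approach 1.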

As a historical note, this limitation was known when introduced in cdd6ae1 (Go 1.1) to fix #4189:

go/src/net/url/url_test.go

Lines 166 to 174 in b38b1b2

// non-authority with path
{
	"mailto:/webmaster@golang.org",
	&URL{
		Scheme: "mailto",
		Path:   "/webmaster@golang.org",
	},
	"mailto:///webmaster@golang.org", // unfortunate compromise
},

At the time, the argument against fixing this bug was a desire to avoid introducing more fields to url.URL. Since Go 1.1, we have added quite a few new fields to url.URL: RawPath (1.5), ForceQuery (1.7), and RawFragment (1.15). I think we should still maintain a high bar when introducing new fields, but the memory cost for adding a new boolean is low (zero in fact, if we pack the structure properly).

/cc @rsc @bradfitz
