Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/url: getting the host[name] (or port) of a URL #16142

Closed
lgarron opened this issue Jun 22, 2016 · 16 comments
Closed

net/url: getting the host[name] (or port) of a URL #16142

lgarron opened this issue Jun 22, 2016 · 16 comments
Milestone

Comments

@lgarron
Copy link

lgarron commented Jun 22, 2016

For the hstspreload.appspot.com code, I have found myself needing to do two things:

  • Tell if a request is from localhost.
  • Tell if the Origin header of a CORS request matches a certain hostname.

I was disappointed to find that url.URL does not actually provide a direct way to get the host, as used in the definition of web origins. This meaning of host sometimes goes by different names (e.g. hostname, domain), but it is simply called "host" for security-critical definitions in mechanisms like HSTS, HPKP, and CSP.

In addition, it is also not straightforward to get the port of a URL in Go.

Proposal

I know that changing the meaning of url.URL.Host would break Go's compatibility guarantee. Therefore, would it be possible to do one of the following?

  • Add a HostName field to url.URL
  • Add one of the following to the standard library, possibly based on http.canonicalAddr():
    • A method in the url package: url.URL.HostName()
    • A function in one of the packages under net: HostName(u *url.URL) (string, error)

(Same ideas for Port.)

I used the name HostName here as a parallel with Javascript (see below), but the exact name doesn't matter.

Status Quo

This is the current behaviour:

u, err := url.Parse("http://localhost:8080")
fmt.Printf("%s", u.Host) // "localhost:8080"
fmt.Printf("%s", u.Port) // ERROR: doesn't exist.

I understand that grouping the host and port can be convenient for some developers, and someone has pointed out to me that this matches the behaviour of Javascript, e.g. location.host. However, Javascript has solved this in a backwards-compatible way by offering hostname and port as additional fields:

var a = document.createElement("a");
a.href = "https://localhost:8080";
console.log(a.host);     // "localhost:8080"
console.log(a.hostname); // "localhost"
console.log(a.port);     // "8080"

a.href = "https://github.com";
console.log(a.host);     // "github.com"
console.log(a.hostname); // "github.com"
console.log(a.port);     // ""

If I understand correctly, there is no way to get the host of a URL in a straightforward and semantic way in Go. By "semantic", I mean "consistent with the behaviour of the net and url packages, treated opaquely".

A simple approach

If a developer wants to compare the host of a URL against an expected value, a simple solution that they might try is:

wantHost = "localhost"
u, err := url.Parse("http://localhost:8080")
if strings.HasPrefix(u.Host, wantHost) {
  fmt.Printf("okay!")
}

However, this is completely insecure. google.com.phishing.evil.com will match wantHost == "google.com". Now, we can fix this by doing by forcing the comparison to include the colon iff there is one:

wantHost = "localhost"
u, err := url.Parse("http://localhost:8080")
if u.Host == wantHost || strings.HasPrefix(u.Host, wantHost+":") {
  fmt.Printf("okay!")
}

However, this requires an uncomfortable plunge into the semantics of u.Host when we just need a foolproof way to do a security check. I find that very uncomfortable.

In addition, I don't expect that developers will always follow this chain of reasoning to the end. Either out of accident or laziness, they may make assumptions that only one of u.Host == wantHost or strings.HasPrefix(u.Host, wantHost+":") is necessary for their use case. This is safe, but could introduce a bug. If their test conditions only ever have a case with a port (localhost:8080), or only without a port (say, from a form that accepts pasted URLs from users), the bug might linger for a long while.

A better approach?

Now, let's say that the simple solution doesn't cut it for you. For example:

  1. You need to calculate the host of a URL rather than comparing it against something.
  2. You want a solution that does not make any assumptions about URLs that are not made by the core Go packages (URLs are complicated, so this is a good goal).

Once you have a parsed URL, you can try to use net.SplitHostPort():

u, err := url.Parse("http://localhost:8080")
host, port, err := net.SplitHostPort(u.Host)

However, this will fail if you pass in a URL without a port:

u, err := url.Parse("http://github.com")
host, port, err := net.SplitHostPort(u.Host)

// err is: "missing port in address github.com"

Now, you could detect whether u.Host contains a :, and conditionally call net.SplitHostPort(u.Host) iff it doesn't, but I firmly believe that this is outside the scope of what a Go developer should be expected to do. It requires implementing handling two errors + a branch, and still requires a "semantic plunge", to use my terminology from before.

To me, it is also counterintuitive that one part of the core library (url.Parse) outputs a host+port field with an optional port, while another one (net.SplitHostPort) requires the port. I certainly made the mistake of assuming that it was okay to pass the the value from url.Parse to net.SplitHostPort – I didn't catch the mistake in local testing because the input always had a port (localhost:8080), and it didn't show up in production because it was a rare place where I fell back to the safe path without surfacing an error. Note that this concern also applies in cases where someone is trying to determine the port of a URL.

At this point, I've run out of good ideas for getting the host of a URL safely, using the standard libraries. Please do let me know if I've missed something. It might be the case that there is a great way to do this, and a simple update to the url.URL comments would help someone like me figure out what to do if I want to get the actual host of a URL.

@lgarron lgarron changed the title Getting the hostname (or port) of a URL Getting the host[name] (or port) of a URL Jun 22, 2016
@ianlancetaylor ianlancetaylor changed the title Getting the host[name] (or port) of a URL net/url: getting the host[name] (or port) of a URL Jun 22, 2016
@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Jun 22, 2016
lgarron added a commit to chromium/hstspreload.org that referenced this issue Jun 23, 2016
This works around golang/go#16142
However, it is fragile because it duplicates some private code from `net.http`, which could theoretically change its behaviour without notice.
@bradfitz
Copy link
Contributor

I too have felt this pain many times. Let's see if we can address it in Go 1.8.

/cc @mpl

@bradfitz bradfitz modified the milestones: Go1.8, Unplanned Jun 27, 2016
@danp
Copy link
Contributor

danp commented Sep 3, 2016

Is adding URL.Hostname (and possibly URL.Port) the way to go here? Should it/they factor into URL.String?

@lgarron
Copy link
Author

lgarron commented Sep 3, 2016

Is adding URL.Hostname (and possibly URL.Port) the way to go here?

I think that would be simple and sufficient.

Should it/they factor into URL.String?

I don't think so.
One problem is that certain URLs don't have a specified port, but an unambiguous default port. I think sticking to existing behaviour for String makes sense, although it might be a good idea to expose the default port mechanism so people can get an actual number.

@bradfitz
Copy link
Contributor

bradfitz commented Sep 9, 2016

A few thoughts:

var portMap = map[string]string{
        "http":  "80",
        "https": "443",
}
  • We can't change existing behavior of anything. Adding new accessor methods is fine.
  • A (*url.URL).Hostname() string method returning the port-less u.Host sounds good, especially if it matches Javascript.
  • So then (*url.URL).Port() string too I guess? The question is what to do on missing/implicit ports. Since not all schemes have a known implicit port (and sometimes we don't even have a scheme), it would have to return an empty string sometimes, so we might as well always return an empty string when it's not present. This is less surprising, too.
  • My pet peeve: how do create a "host:port" string (as from net.JoinHostPort) from a *url.URL? This is where we need a default port. New method on URL alongside Port() string, like PortOrDefault() string? Or just a top-level function like:
func DefaultPort(scheme string) string

But then caller code has to do:

     u, ... := url.Parse(urlString)
     port := u.Port()
     if port == "" {
         port = net.DefaultPort(u.Scheme)
     }
     hostPort := net.JoinHostPort(u.HostName(), port)

Maybe that's acceptable? With PortOrDefault() or even HostPort() string it'd be:

     u, ... := url.Parse(urlString)
     c, err := net.Dial("tcp" , u.HostPort())
     u, ... := url.Parse(urlString)
     c, err := net.Dial("tcp" , net.JoinHostPort(u.Hostname(), u.DefaultPort())

But neither of those are great examples because anybody diving into calling net.Dial or otherwise using the "host:port" will want to do some validation on the scheme, host, and port anyway. So maybe just having a (*url.URL).Port() string + Hostname() string + func DefaultPort(scheme string) is sufficient.

Thoughts?

@bradfitz bradfitz self-assigned this Sep 9, 2016
@danp
Copy link
Contributor

danp commented Sep 9, 2016

Should DefaultPort live in net or net/url? If it takes a scheme, maybe net/url?

I'm a bit torn on URL.Port returning an empty string even in the case where Scheme=http and Host=www.google.com (with no port). Seems like having it return 80 there would satisfy the pretty common case and if someone cares to further inspect Host they can.

@mpl
Copy link
Contributor

mpl commented Sep 9, 2016

Agreed on u.HostName() and u.Port().

I prefer DefaultPort to PortOrDefault.

In the example:

c, err := net.Dial("tcp" , u.HostPort())

what does u.HostPort do? Does it do the full dance that you gave as the first example? If yes, there's still the possibility that u.HostPort() returns something with an empty port, since, as you said, there might not be any scheme, isn't there? If yes, I don't think it's worth providing. if there's still some checking to do on top of it, then I don't think the method is bringing much to the table.

Also, I think a helper like camlistore.org/pkg/netutil.HasPort might be nice. I suppose, with the new functions, it would become something like:

return u.Port() != ""

which is actually simple enough as it is, so maybe no need for HasPort.

@bradfitz
Copy link
Contributor

bradfitz commented Sep 9, 2016

@dpiddy,

Should DefaultPort live in net or net/url? If it takes a scheme, maybe net/url?

Yeah, I meant net/url, not net.

@bradfitz
Copy link
Contributor

bradfitz commented Sep 9, 2016

what does u.HostPort do? Does it do the full dance that you gave as the first example? If yes, there's still the possibility that u.HostPort() returns something with an empty port, since, as you said, there might not be any scheme, isn't there? If yes, I don't think it's worth providing. if there's still some checking to do on top of it, then I don't think the method is bringing much to the table.

Yeah, it was a hypothetical go-all-the-way example. I agree it's too much, since you'd need to do your own checking anyway.

@bradfitz
Copy link
Contributor

bradfitz commented Sep 9, 2016

Actually, I think https://golang.org/pkg/net/#LookupPort is probably sufficient instead of adding func DefaultPort. Try https://play.golang.org/p/zBvrVFM6M7 on a non-playground machine. I filed #17045 to make it less dependent on the host machine and work in the playground.

@bradfitz
Copy link
Contributor

bradfitz commented Sep 9, 2016

@lgarron
Copy link
Author

lgarron commented Sep 9, 2016

@lgarron, @dpiddy, @mpl, see https://go-review.googlesource.com/28933

Thoughts?

Looks excellent!

(And I totally forgot about IPv6 addresses. Good catch. :-)

@danp
Copy link
Contributor

danp commented Sep 9, 2016

What if instead of Port() string it was Port() (string, bool) or similar to indicate whether the value came from something specific in Host or the default for Scheme?

@gopherbot
Copy link

CL https://golang.org/cl/28933 mentions this issue.

@bradfitz
Copy link
Contributor

bradfitz commented Sep 9, 2016

But then you have 3 cases to reason about instead of 2,

"80", true
"80", false
"", false

and you can't use it as an expression, and you still need to do the portString != "" check, since the boolean doesn't tell you whether it's usable. I don't like that it looks like a "comma-ok" style method, but it's not.

And then people will expect it to work on an http.ServerHTTP.Request, but incoming server requests don't know their listener necessarily and what scheme they even arrived on, so the port will usually be empty and it can't be inferred since you can't always infer the scheme of an incoming HTTP request, especially with proxies.

Mostly I don't like the that I can't use it as a string expression. If we did implicit ports from the scheme, I'd almost prefer two methods, like Port() string (with default) and HasPort() bool (whether u.Host had an explicit one). But really... how often are you going to care whether the port was explicit or not? The only time I can think it'd matter is when creating a URL again and you didn't want it to stringify to something verbose. But then you could just clone URL with its Host field instead.

Also, if we do default ports for a scheme, that either means including a static map of scheme name to default port number in the net/url package, or using net.LookupPort (which return an error too!) from the net/url package, but we can't ... that's not an existing dependency, and I don't think it would be acceptable to add. The net package is way larger.

If we do implicit ports, that means two copies of the implicit port map: one in the net package, and one in net/url. Maybe that's fine. Or maybe we share them via a third internal package if it actually mattered.

I don't know. I'm undecided.

Data might be compelling. I wonder if the "http" => 80, "https" => 443 logic exists near lots of code parsing URLs and URL.Host. (Paging @campoy :))

@danp
Copy link
Contributor

danp commented Sep 9, 2016

Good points, all. Having it return a single value would be best and in that case not doing any inferring about default ports is safest. It would apparently match the mentioned javascript behavior, too.

Data would be great to consider if it can be had.

@lgarron
Copy link
Author

lgarron commented Sep 9, 2016

🎉

Thanks for implementing! 😃

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants