net/url: getting the host[name] (or port) of a URL #16142

@lgarron

For the hstspreload.appspot.com code, I have found myself needing to do two things:

  • Tell if a request is from localhost.
  • Tell if the Origin header of a CORS request matches a certain hostname.

I was disappointed to find that url.URL does not actually provide a direct way to get the host, as used in the definition of web origins. This meaning of host sometimes goes by different names (e.g. hostname, domain), but it is simply called "host" for security-critical definitions in mechanisms like HSTS, HPKP, and CSP.

In addition, it is also not straightforward to get the port of a URL in Go.

Proposal

I know that changing the meaning of url.URL.Host would break Go's compatibility guarantee. Therefore, would it be possible to do one of the following?

  • Add a HostName field to url.URL
  • Add one of the following to the standard library, possibly based on http.canonicalAddr():
    • A method in the url package: url.URL.HostName()
    • A function in one of the packages under net: HostName(u *url.URL) (string, error)

(Same ideas for Port.)

I used the name HostName here as a parallel with Javascript (see below), but the exact name doesn't matter.

Status Quo

This is the current behaviour:

u, err := url.Parse("http://localhost:8080")
fmt.Printf("%s", u.Host) // "localhost:8080"
fmt.Printf("%s", u.Port) // ERROR: doesn't exist.

I understand that grouping the host and port can be convenient for some developers, and someone has pointed out to me that this matches the behaviour of Javascript, e.g. location.host. However, Javascript has solved this in a backwards-compatible way by offering hostname and port as additional fields:

var a = document.createElement("a");
a.href = "https://localhost:8080";
console.log(a.host);     // "localhost:8080"
console.log(a.hostname); // "localhost"
console.log(a.port);     // "8080"

a.href = "https://github.com";
console.log(a.host);     // "github.com"
console.log(a.hostname); // "github.com"
console.log(a.port);     // ""

If I understand correctly, there is no way to get the host of a URL in a straightforward and semantic way in Go. By "semantic", I mean "consistent with the behaviour of the net and url packages, treated opaquely".

A simple approach

If a developer wants to compare the host of a URL against an expected value, a simple solution that they might try is:

wantHost := "localhost"
u, err := url.Parse("http://localhost:8080")
if strings.HasPrefix(u.Host, wantHost) {
  fmt.Printf("okay!")
}

However, this is completely insecure: google.com.phishing.evil.com will match wantHost == "google.com". We can fix this by forcing the comparison to include the colon iff there is one:

wantHost := "localhost"
u, err := url.Parse("http://localhost:8080")
if u.Host == wantHost || strings.HasPrefix(u.Host, wantHost+":") {
  fmt.Printf("okay!")
}

However, this requires a plunge into the semantics of u.Host when we just need a foolproof way to do a security check. I find that very uncomfortable.

In addition, I don't expect that developers will always follow this chain of reasoning to the end. Whether by accident or out of laziness, they may assume that only one of u.Host == wantHost or strings.HasPrefix(u.Host, wantHost+":") is necessary for their use case. This fails safe (it rejects hosts rather than wrongly accepting them), but it could still introduce a bug. If their test conditions only ever have a case with a port (localhost:8080), or only without a port (say, from a form that accepts pasted URLs from users), the bug might linger for a long while.

A better approach?

Now, let's say that the simple solution doesn't cut it for you. For example:

  1. You need to calculate the host of a URL rather than comparing it against something.
  2. You want a solution that does not make any assumptions about URLs that are not made by the core Go packages (URLs are complicated, so this is a good goal).

Once you have a parsed URL, you can try to use net.SplitHostPort():

u, err := url.Parse("http://localhost:8080")
host, port, err := net.SplitHostPort(u.Host)

However, this will fail if you pass in a URL without a port:

u, err := url.Parse("http://github.com")
host, port, err := net.SplitHostPort(u.Host)

// err is: "missing port in address github.com"

Now, you could detect whether u.Host contains a :, and conditionally call net.SplitHostPort(u.Host) iff it does, but I firmly believe that this is outside the scope of what a Go developer should be expected to do. It requires handling two errors plus a branch, and still requires a "semantic plunge", to use my terminology from before.

To me, it is also counterintuitive that one part of the core library (url.Parse) outputs a host+port field with an optional port, while another one (net.SplitHostPort) requires the port. I certainly made the mistake of assuming that it was okay to pass the value from url.Parse to net.SplitHostPort. I didn't catch the mistake in local testing because the input always had a port (localhost:8080), and it didn't show up in production because it was a rare place where I fell back to the safe path without surfacing an error. Note that this concern also applies in cases where someone is trying to determine the port of a URL.

At this point, I've run out of good ideas for getting the host of a URL safely, using the standard libraries. Please do let me know if I've missed something. It might be the case that there is a great way to do this, and a simple update to the url.URL comments would help someone like me figure out what to do if I want to get the actual host of a URL.
