Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/url: misleading error message when url has a leading space #29261

Open
stefanb opened this Issue Dec 14, 2018 · 14 comments

Comments

Projects
None yet
6 participants
@stefanb
Copy link

commented Dec 14, 2018

What version of Go are you using (go version)?

$ go version
go version go1.11.3 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/stefanb/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/stefanb/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/Cellar/go/1.11.3/libexec"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.11.3/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/z2/3bdl__bs0xxd3kmf5c1pj0gm0000gn/T/go-build212726590=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

package main

import (
	"net/http"
)

func main() {
	_, err := http.Get(" http://example.org")
	if err != nil {
		panic(err)
	}
}

https://play.golang.org/p/jkcYSD6ZRcO

What did you expect to see?

Either

  • a generic error message of invalid URL or
  • a detailed error of invalid URL scheme or
  • a detailed error of leading space in URL

What did you see instead?

A misleading detailed error message:

parse  http://example.org: first path segment in URL cannot contain colon

In #24246 the proposed solution was to trim the URLs before using them, but from the given error message it is very hard to see what the real problem is.

I propose to either:

  • adjust the error message to eg:
parse  http://example.org: URL scheme cannot contain spaces

or

  • quote the problematic URL in the error message, so that it is more obvious even if the message itself remains misleading:
parse " http://example.org": first path segment in URL cannot contain colon

or

  • ideally both adjust the error message + quote the URL:
parse " http://example.org": URL scheme cannot contain spaces

The cannot contain spaces in error message can be changed for cannot contain whitespace or even contains invalid characters, depending how the check is implemented.

Current implementation:

go/src/net/url/url.go

Lines 540 to 562 in b50210f

if !strings.HasPrefix(rest, "/") {
if url.Scheme != "" {
// We consider rootless paths per RFC 3986 as opaque.
url.Opaque = rest
return url, nil
}
if viaRequest {
return nil, errors.New("invalid URI for request")
}
// Avoid confusion with malformed schemes, like cache_object:foo/bar.
// See golang.org/issue/16822.
//
// RFC 3986, §3.3:
// In addition, a URI reference (Section 4.1) may be a relative-path reference,
// in which case the first path segment cannot contain a colon (":") character.
colon := strings.Index(rest, ":")
slash := strings.Index(rest, "/")
if colon >= 0 && (slash < 0 || colon < slash) {
// First path segment has colon. Not allowed in relative URL.
return nil, errors.New("first path segment in URL cannot contain colon")
}
}

@dsnet dsnet pinned this issue Dec 14, 2018

@dsnet dsnet unpinned this issue Dec 14, 2018

@bradfitz bradfitz added this to the Go1.13 milestone Dec 14, 2018

@bradfitz bradfitz added the NeedsFix label Dec 14, 2018

@bradfitz bradfitz changed the title url.parse - misleading error message when url has a leading space net/url: misleading error message when url has a leading space Dec 14, 2018

@cgopalan

This comment has been minimized.

Copy link

commented Dec 15, 2018

@bradfitz @dsnet Wouldnt a TrimSpace before getscheme on this line solve the issue? I can raise a PR for this if thats agreed upon as the fix. Would also write tests of course.

@agnivade

This comment has been minimized.

Copy link
Member

commented Dec 21, 2018

Technically, according to RFC3986-Sec3.1 -

Scheme names consist of a sequence of characters beginning with a
letter and followed by any combination of letters, digits, plus
("+"), period ("."), or hyphen ("-").

So therefore, anything not starting with a letter should be rejected and that would be the right behavior. However, I had a look at the implementations out there in the wild.

NodeJS accepts it-

> require("url").parse('  https://example.org')
Url {
  protocol: 'https:',
  slashes: true,
  auth: null,
  host: 'example.org',
  port: null,
  hostname: 'example.org',
  hash: null,
  search: null,
  query: null,
  pathname: '/',
  path: '/',
  href: 'https://example.org/' }

Curl does not -

curl '   https://example.org' 
curl: (1) Protocol "   https" not supported or disabled in libcurl

Same with wget -

wget '   https://example.org' 
   https://example.org: Scheme missing.

Python also fails to parse-

from urllib.parse import urlparse
>>> o = urlparse('  https://example.org')
>>> o
ParseResult(scheme='', netloc='', path='  https://example.org', params='', query='', fragment='')

Note that scheme and netloc are empty.

In light of the above, let us just return a more descriptive error (with quotes around the url) if we see a space. How about -

parse " http://example.org": URL scheme does not begin with an alpha-numeric character.

@robaho

This comment has been minimized.

Copy link

commented Dec 21, 2018

@agnivade

This comment has been minimized.

Copy link
Member

commented Dec 21, 2018

That's right, sorry. Then we can simplify it to ".. does not begin with a letter".

@stefanb

This comment has been minimized.

Copy link
Author

commented Dec 21, 2018

simplify it to ".. does not begin with a letter".

Perhaps ".. scheme does not begin with a letter" would be more accurate as "//example.org" is valid to my understanding.

Interestingly, i found

	u, err := url.Parse("    //example.org")
	if err != nil {
		panic(err)
	}
	fmt.Println(u)

is NOT failing, but outputs:

%20%20%20%20//example.org

https://play.golang.org/p/jRk63PWsKG5

@bradfitz

This comment has been minimized.

Copy link
Member

commented Dec 21, 2018

Yes, we should return an error on leading bogus chars.

@cgopalan

This comment has been minimized.

Copy link

commented Dec 21, 2018

ok I will make a PR for this code change with associated tests.

stefanb added a commit to stefanb/go that referenced this issue Dec 21, 2018

net/url: improve url parsing error messages by quoting
Current implementation doesn't always make it obvious what the exact
problem with the URL is, so this makes it clearer by consistently quoting
the invalid URL, as is the norm in other parsing implementations, eg.:
strconv.Atoi(" 123") returns an error: parsing " 123": invalid syntax

Updates golang#29261
@stefanb

This comment has been minimized.

Copy link
Author

commented Dec 21, 2018

I have implemented the generic improvement of error messages by consistently quoting the URLs in #29384, but more detailed messages and stricter validation are welcome of course.

@cgopalan

This comment has been minimized.

Copy link

commented Dec 22, 2018

@bradfitz what does "bogus chars" translate to besides a space?

@agnivade

This comment has been minimized.

Copy link
Member

commented Dec 22, 2018

Any non-letter, according to the spec.

@cgopalan

This comment has been minimized.

Copy link

commented Dec 23, 2018

On further analysis, it seems its not straightforward to implement this because of 2 reasons:

  • the characters that are invalid in a scheme could be valid in other parts of the url (like the period and plus characters) so we will need to remember what we processed before. Currently getscheme does not do that, and am wondering if adding that functionality would increase complexity.
  • the url package says it should be backward compatible.

To account for the space-specific error, we could add a separate case branch and raise the error message like so:

case c == ' ':   // Add other invalid prefix chars here
	if i == 0 {
		return "", "", errors.New("URL scheme has invalid characters")
	} 

However this seems like its a bandaid fix to me, but I am not sure, so would appreciate some input. Any thoughts on whether this is good, or if there's any other approach?

@stefanb @bradfitz @agnivade ?

@stefanb

This comment has been minimized.

Copy link
Author

commented Dec 28, 2018

I assume the proposed change would go just before default line 450 in the switch block here:

go/src/net/url/url.go

Lines 435 to 457 in b50210f

func getscheme(rawurl string) (scheme, path string, err error) {
for i := 0; i < len(rawurl); i++ {
c := rawurl[i]
switch {
case 'a' <= c && c <= 'z' || 'A' <= c && c <= 'Z':
// do nothing
case '0' <= c && c <= '9' || c == '+' || c == '-' || c == '.':
if i == 0 {
return "", rawurl, nil
}
case c == ':':
if i == 0 {
return "", "", errors.New("missing protocol scheme")
}
return rawurl[:i], rawurl[i+1:], nil
default:
// we have encountered an invalid character,
// so there is no valid scheme
return "", rawurl, nil
}
}
return "", rawurl, nil
}

This would make it inconsistent with other validation error cases in the same switch block:

  1. the first character validation:

    go/src/net/url/url.go

    Lines 441 to 444 in b50210f

    case '0' <= c && c <= '9' || c == '+' || c == '-' || c == '.':
    if i == 0 {
    return "", rawurl, nil
    }
  2. the invalid character validation

    go/src/net/url/url.go

    Lines 450 to 453 in b50210f

    default:
    // we have encountered an invalid character,
    // so there is no valid scheme
    return "", rawurl, nil

They both just returns scheme as "" and no error whatsoever. It should be consistent in a way so that all reasons for invalid scheme result either in error or "", but not mixed.

@agnivade

This comment has been minimized.

Copy link
Member

commented Dec 28, 2018

It's hard to discuss code changes here. Maybe just send a CL and we can continue it over there.

We probably need to simplify the switch-case a bit. Have an if condition first, which checks for i==0 and non-letter, and return immediately. And integrate the rest of the switch-case accordingly.

We can keep the special c == ':' and i == 0 check, which will preserve the "missing protocol scheme" error, for backward compatibility reasons. But I will let @bradfitz decide that.

@gopherbot

This comment has been minimized.

Copy link

commented Feb 26, 2019

Change https://golang.org/cl/155922 mentions this issue: net/url: give a proper error message on invalid character in scheme

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.