New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

go/doc: link detection in ToHTML is misbehaving #22285

Closed
Bo0mer opened this Issue Oct 16, 2017 · 5 comments

Comments

Projects
None yet
5 participants
@Bo0mer

Bo0mer commented Oct 16, 2017

Please answer these questions before submitting your issue. Thanks!

What did you do?

Run the following commands on my machine (running macOS):

$ godoc -http :6060

navigated to http://localhost:6060/pkg/golang.org/x/tools/present and did a search for renee.

What did you expect to see?

Valid link to https://www.instagram.com/reneefrench named Renée French.

What did you see instead?

Invalid link pointing to https://www.instagram.com/reneefrench/][Ren.

Does this issue reproduce with the latest release (go1.9.1)?

Yes, even on https://godoc.org/golang.org/x/tools/present.

System details

go version go1.9 darwin/amd64
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/ivan/workspace/Go"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/ps/1g_475591cv3rf9lkqnyp9qh0000gn/T/go-build145680146=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOROOT/bin/go version: go version go1.9 darwin/amd64
GOROOT/bin/go tool compile -V: compile version go1.9
uname -v: Darwin Kernel Version 16.7.0: Thu Jun 15 17:36:27 PDT 2017; root:xnu-3789.70.16~2/RELEASE_X86_64
ProductName:	Mac OS X
ProductVersion:	10.12.6
BuildVersion:	16G29
lldb --version: lldb-900.0.45
  Swift-4.0
@dmitshur

This comment has been minimized.

Member

dmitshur commented Oct 16, 2017

What did you expect to see?

Valid link to https://www.instagram.com/reneefrench named Renée French.

I would not expect to see that. It's a <pre> block containing the following text:

...
.caption _Gopher_ by [[https://www.instagram.com/reneefrench/][Renée French]]
...

It's just text. It happens to be in present format. But godoc can't know that it should parse the square brackets and text inside as a link.

The best it could do is link all of https://www.instagram.com/reneefrench/][Renée instead of just https://www.instagram.com/reneefrench/][Ren, but whether that's better or not needs to be determined.

Also see https://golang.org/pkg/go/doc/#ToHTML:

A span of indented lines is converted into a <pre> block, with the common indent prefix removed.

URLs in the comment text are converted into links; if the URL also appears in the words map, the link is taken from the map (if the corresponding map value is the empty string, the URL is not converted into a link).

I think this is working as intended (except perhaps the URL matching stopping at the accented letter).

@dmitshur dmitshur changed the title from godoc: Link rendering is misbehaving to x/tools/cmd/godoc: Link rendering is misbehaving Oct 16, 2017

@gopherbot gopherbot added this to the Unreleased milestone Oct 16, 2017

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Oct 16, 2017

CC @dsnet who has been looking at godoc links. But I tend to agree that this is working as intended.

@dsnet

This comment has been minimized.

Member

dsnet commented Oct 16, 2017

The logic for URL detection is in go/doc.ToHTML. It uses a combination of regexps and special-case logic as a heuristic for what is a URL. In #5043, the logic was adjusted so that URLs can contain parenthesis. It was further adjusted in #18139 to include square brackets.

A reasonable adjustment to the URL heuristic is to forbid the URL from containing an closing parenthesis as the first parenthesis of that form.

Thus, http://example.com/some_resource[foo] is valid, but http://example.com/some_resource]blah is not valid.

@dsnet dsnet self-assigned this Oct 16, 2017

@dsnet dsnet changed the title from x/tools/cmd/godoc: Link rendering is misbehaving to go/doc: link detection in ToHTML is misbehaving Oct 16, 2017

@dsnet dsnet modified the milestones: Unreleased, Go1.10 Oct 17, 2017

@dsnet dsnet added the NeedsFix label Nov 8, 2017

@dsnet dsnet modified the milestones: Go1.10, Go1.11 Nov 13, 2017

@gopherbot

This comment has been minimized.

gopherbot commented Feb 16, 2018

Change https://golang.org/cl/94876 mentions this issue: go/doc: simplify and robustify link detection logic

@dmitshur

This comment has been minimized.

Member

dmitshur commented Feb 17, 2018

An update on what I said earlier. ] is not one of the safe characters that can be included in a URL without encoding (according to https://www.ietf.org/rfc/rfc1738.txt). RFC 1738 says it must to be encoded. So, it should be safe to stop detecting a URL at ] and not consider it to be a part of the URL.

Following the same logic, I'm not sure if http://example.com/some_resource[foo] should be detected with the [foo] suffix. Perhaps it'd be better to stop before the first [? If that was a correctly encoded URL, wouldn't it be written as http://example.com/some_resource%5Bfoo%5D?

@ianlancetaylor ianlancetaylor modified the milestones: Go1.11, Go1.12 Jun 29, 2018

@gopherbot gopherbot closed this in 190a5f8 Dec 14, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment