New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

go/doc: link detection in ToHTML is misbehaving #22285

Bo0mer opened this Issue Oct 16, 2017 · 5 comments


None yet
5 participants

Bo0mer commented Oct 16, 2017

Please answer these questions before submitting your issue. Thanks!

What did you do?

Run the following commands on my machine (running macOS):

$ godoc -http :6060

navigated to http://localhost:6060/pkg/ and did a search for renee.

What did you expect to see?

Valid link to named Renée French.

What did you see instead?

Invalid link pointing to][Ren.

Does this issue reproduce with the latest release (go1.9.1)?

Yes, even on

System details

go version go1.9 darwin/amd64
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/ps/1g_475591cv3rf9lkqnyp9qh0000gn/T/go-build145680146=/tmp/go-build -gno-record-gcc-switches -fno-common"
GOROOT/bin/go version: go version go1.9 darwin/amd64
GOROOT/bin/go tool compile -V: compile version go1.9
uname -v: Darwin Kernel Version 16.7.0: Thu Jun 15 17:36:27 PDT 2017; root:xnu-3789.70.16~2/RELEASE_X86_64
ProductName:	Mac OS X
ProductVersion:	10.12.6
BuildVersion:	16G29
lldb --version: lldb-900.0.45

This comment has been minimized.


dmitshur commented Oct 16, 2017

What did you expect to see?

Valid link to named Renée French.

I would not expect to see that. It's a <pre> block containing the following text:

.caption _Gopher_ by [[][Renée French]]

It's just text. It happens to be in present format. But godoc can't know that it should parse the square brackets and text inside as a link.

The best it could do is link all of][Renée instead of just][Ren, but whether that's better or not needs to be determined.

Also see

A span of indented lines is converted into a <pre> block, with the common indent prefix removed.

URLs in the comment text are converted into links; if the URL also appears in the words map, the link is taken from the map (if the corresponding map value is the empty string, the URL is not converted into a link).

I think this is working as intended (except perhaps the URL matching stopping at the accented letter).

@dmitshur dmitshur changed the title from godoc: Link rendering is misbehaving to x/tools/cmd/godoc: Link rendering is misbehaving Oct 16, 2017

@gopherbot gopherbot added this to the Unreleased milestone Oct 16, 2017


This comment has been minimized.


ianlancetaylor commented Oct 16, 2017

CC @dsnet who has been looking at godoc links. But I tend to agree that this is working as intended.


This comment has been minimized.


dsnet commented Oct 16, 2017

The logic for URL detection is in go/doc.ToHTML. It uses a combination of regexps and special-case logic as a heuristic for what is a URL. In #5043, the logic was adjusted so that URLs can contain parenthesis. It was further adjusted in #18139 to include square brackets.

A reasonable adjustment to the URL heuristic is to forbid the URL from containing an closing parenthesis as the first parenthesis of that form.

Thus,[foo] is valid, but]blah is not valid.

@dsnet dsnet self-assigned this Oct 16, 2017

@dsnet dsnet changed the title from x/tools/cmd/godoc: Link rendering is misbehaving to go/doc: link detection in ToHTML is misbehaving Oct 16, 2017

@dsnet dsnet modified the milestones: Unreleased, Go1.10 Oct 17, 2017

@dsnet dsnet added the NeedsFix label Nov 8, 2017

@dsnet dsnet modified the milestones: Go1.10, Go1.11 Nov 13, 2017


This comment has been minimized.

gopherbot commented Feb 16, 2018

Change mentions this issue: go/doc: simplify and robustify link detection logic


This comment has been minimized.


dmitshur commented Feb 17, 2018

An update on what I said earlier. ] is not one of the safe characters that can be included in a URL without encoding (according to RFC 1738 says it must to be encoded. So, it should be safe to stop detecting a URL at ] and not consider it to be a part of the URL.

Following the same logic, I'm not sure if[foo] should be detected with the [foo] suffix. Perhaps it'd be better to stop before the first [? If that was a correctly encoded URL, wouldn't it be written as

@ianlancetaylor ianlancetaylor modified the milestones: Go1.11, Go1.12 Jun 29, 2018

@gopherbot gopherbot closed this in 190a5f8 Dec 14, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment