Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

all: builders and TryBots should check documentation for broken links #37047

Open
bcmills opened this issue Feb 5, 2020 · 6 comments
Open

all: builders and TryBots should check documentation for broken links #37047

bcmills opened this issue Feb 5, 2020 · 6 comments

Comments

@bcmills
Copy link
Member

@bcmills bcmills commented Feb 5, 2020

We occasionally end up with broken or outdated links in documentation (see #37042 (comment), #27860, #21951, #19244).

Testing for broken links would not be entirely trivial, but nonetheless pretty straightforward to implement: we have an HTML parser in golang.org/x/net/html, and it is easy enough to issue a HEAD request for link targets to see if they resolve. (For links with anchors, we would probably want to also check the Content-Type header and then parse the actual linked HTML to ensure that the anchor exists.)

CC @golang/osp-team

@dmitshur
Copy link
Member

@dmitshur dmitshur commented Feb 5, 2020

Thanks for opening an issue about this. I agree it's not entirely trivial, but if implemented in a way that doesn't cause too much noise or false positives, I also agree that this would be helpful.

I believe @jayconrod wanted this too. I can't find an existing issue; it may have been a comment in a CL.

Edit: Found it, it was this comment in a CL.

@bcmills
Copy link
Member Author

@bcmills bcmills commented Feb 6, 2020

Thinking about this some more: we have control over golang.org, go.dev, and github.com/golang URLs, but not over other URLs. Probably the TryBot test should only fail if the broken link is on one of those sites, but at least the longtest builders should also check external links.

@alain91
Copy link

@alain91 alain91 commented Feb 7, 2020

Hello, I'm newbee in golang but want to dive in the core code.
I have experience in developping code but not in go.
I would have a look into this issue (as a use case) but my investigations could take time.
Is my contribution acceptable ?

@bcmills
Copy link
Member Author

@bcmills bcmills commented Feb 7, 2020

@alain91, the tricky part of this issue will likely be getting the rendered documentation (in order to obtain the links it contains), and indexing that documentation (in order to be able to detect broken relative links). I'm not familiar with that part of the codebase myself, so I don't know how accessible it would be to a newcomer.

@alain91
Copy link

@alain91 alain91 commented Feb 7, 2020

I'm only newcomer in go. I have experience with parser, html decoder...

@ShawnROGrady
Copy link

@ShawnROGrady ShawnROGrady commented Mar 3, 2020

@bcmills I took a crack at implementing something like this. This just walks the filepath, generates documentation html, then calls HEAD on any tag values with an href key. Running against the src directory outputs:

$ godoc_link_validator -check '^golang\.org|go\.dev|github.com/golang$' ./src
2020/03/03 15:05:59 src/cmd/go/internal/lockedfile: http HEAD 'https://golang.org/issue/20803.' status code: 404

Without ignoring external links (just ignoring localhost and example URLs) I get:

$ godoc_link_validator -ignore '^localhost|example|code' ./src 
2020/03/03 15:08:21 src/cmd/go: error retrieving 'https://proxy.golang.org,direct': Head "https://proxy.golang.org,direct": dial tcp: lookup proxy.golang.org,direct: no such host
src/cmd/go/internal/lockedfile: http HEAD 'https://golang.org/issue/20803.' status code: 404
src/cmd/internal/obj/arm: http HEAD 'http://infocenter.arm.com/help/topic/com.arm.doc.ihi0040b/IHI0040B_aadwarf.pdf' status code: 404
src/cmd/internal/obj/arm64: http HEAD 'http://infocenter.arm.com/help/topic/com.arm.doc.ecm0665627/abi_sve_aadwarf_100985_0000_00_en.pdf' status code: 404

That proxy link represents the default GOPROXY value (found in docs here), and I'm not sure if its better to just ignore it or to update the doc generator to not include the ",direct" portion in the clickable link.

That issue link (found in docs here) returns a 404 because of a trailing period (created issue #37640 to track this).

Both of the ARM links are actually broken (here and here in docs).

I am not sure if/how this should be included in the trybots since I am not familiar with how exactly they are configured.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants
You can’t perform that action at this time.