crawl https://git-scm.com looking for broken links #986

Closed
wants to merge 1 commit into from

sxlijin
Contributor

@sxlijin sxlijin commented Mar 16, 2017

Adds a build matrix entry that uses the broken-link-checker node module to crawl
https://git-scm.com recursively, attempt every link it finds, and report whether
each one succeeds or fails. This should make it easier to identify broken links
on the site. (Closes #957.)

Also moves the sudo: line to the top of the file for style (it's a global build
matrix configuration, so it only seems right that it belongs with the other
global config settings up top).
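
The crawl described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the broken-link-checker module the PR actually uses; the names `LinkExtractor`, `fetch`, `crawl`, and `broken` are hypothetical. It assumes that pages on the starting host are parsed for further links, while external links are attempted once and not followed.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Return (status, body); status is None if the request failed outright."""
    try:
        req = Request(url, headers={"User-Agent": "link-check-sketch"})
        with urlopen(req, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        return err.code, ""
    except (URLError, OSError):
        return None, ""


def crawl(start_url, fetcher=fetch):
    """Breadth-first crawl; returns {url: status} for every link attempted."""
    site = urlparse(start_url).netloc
    results = {}
    queue = [start_url]
    while queue:
        url = queue.pop(0)
        if url in results:
            continue
        status, body = fetcher(url)
        results[url] = status
        # Only pages on the starting host are parsed for further links;
        # external links are attempted once but not followed.
        if status == 200 and urlparse(url).netloc == site:
            parser = LinkExtractor()
            parser.feed(body)
            for href in parser.links:
                target, _fragment = urldefrag(urljoin(url, href))
                if urlparse(target).scheme in ("http", "https") and target not in results:
                    queue.append(target)
    return results


def broken(results):
    """Links whose request failed or returned a 4xx/5xx status."""
    return sorted(u for u, s in results.items() if s is None or s >= 400)
```

Calling `broken(crawl("https://git-scm.com/"))` would hit the live site, so the `fetcher` parameter lets the crawl logic be exercised against canned pages instead.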

@peff
Member

peff commented Mar 17, 2017

Hmm. This tests the live site. But when will it get kicked off? I assume whenever we update any PR. But those two things aren't really linked. Ideally you'd check the PR itself to make sure it doesn't contain or cause any broken links. But it's hard to even test a single state anyway, because so much of the content is imported content in the database (that's pre-processed, but with unknown vintages of the ruby code; it depends on what was deployed when a particular version of the manpages got imported, or when I kick off a manual rebuild).

So I'm not sure this really matches a Travis build. I think Travis does do periodic jobs, and this seems like it would be a better match for that.

@jnavila
Contributor

jnavila commented Nov 21, 2017

To check at PR or push time, we could try spawning the web site locally, but that would mean importing all the additional data from git and progit, which is quite heavy work and prone to failure.

It would be a good idea to periodically run a test on the site, but I'm definitely against adding a dependency on npm for that. There are surely such tools available natively.

@peff
Member

peff commented Nov 21, 2017

Another complication is that there are known broken links in older versions of the git manpages. We don't fix those, but preserve them in their broken state. So any link-checking would want to avoid digging into old versions at all, I'd think.
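
One way to keep a checker out of old manpage versions is to filter URLs before they are crawled. The sketch below is an illustration only: it assumes versioned manpages live at paths shaped like `/docs/<name>/<version>` (for example `/docs/git-commit/2.10.0`), which is an assumption about the site's URL layout and is not stated in this thread. The name `is_old_manpage` is hypothetical.

```python
import re
from urllib.parse import urlparse

# Assumed layout: /docs/<page>/<version>, where the version starts with
# digits, a dot, and more digits (e.g. 2.10.0 or 2.13.0-rc1).
VERSIONED_DOC = re.compile(r"^/docs/[^/]+/\d+\.\d+[^/]*/?$")


def is_old_manpage(url):
    """True if the URL path looks like a version-pinned manpage."""
    return bool(VERSIONED_DOC.match(urlparse(url).path))
```

A crawler would then skip any link for which `is_old_manpage(url)` is true, so the known-broken links preserved in old versions never get reported.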

@sxlijin
Contributor Author

sxlijin commented Nov 22, 2017

@jnavila - what do you mean by "natively"? A Ruby gem, like this one? https://github.com/endymion/link-checker

Also, I just checked and it should be fairly straightforward to set this up as a cron-only job: https://docs.travis-ci.com/user/cron-jobs/#detecting-builds-triggered-by-cron
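
As a rough sketch of the cron-only idea: Travis sets the `TRAVIS_EVENT_TYPE` environment variable to `cron` for builds started by a cron job, so a wrapper can skip the link check on every other build. The helper names `triggered_by_cron` and `run_if_cron` are hypothetical, and the `check` callable stands in for whichever link checker ends up being used.

```python
import os


def triggered_by_cron(environ=os.environ):
    """True when Travis reports that this build was started by a cron job."""
    return environ.get("TRAVIS_EVENT_TYPE") == "cron"


def run_if_cron(check, environ=os.environ):
    """Call check() only for cron builds; return True if it ran."""
    if not triggered_by_cron(environ):
        return False
    check()
    return True
```

With this shape, push and pull request builds return immediately, and only the scheduled build pays the cost of crawling the live site.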

@jnavila
Contributor

jnavila commented Nov 22, 2017

@sxlijin a Ruby gem, for instance, or maybe even a simple, correctly crafted wget or curl command.

@pedrorijo91
Member

Just found out about a solution using the awesome_bot gem: https://github.com/marmelo/tech-companies-in-portugal/blob/master/.travis.yml

@sxlijin sxlijin closed this Sep 1, 2021
@sxlijin sxlijin deleted the test-broken-links branch September 1, 2021 01:06
Successfully merging this pull request may close these issues.

Eliminate all issues with broken links
5 participants