Ensure there are no broken links, and automate the check #50

Open
ctrueden opened this issue Apr 20, 2021 · 5 comments

@ctrueden
Member

ctrueden commented Apr 20, 2021

The script _bin/broken-links.sh prints out links that it detects as broken. But it needs updating to handle additional cases:

  • /ij/* – proxied mirror.imagej.net content – needs update to serve from /ij and redirect old links from the root where feasible (e.g. /macros)
  • Other repos in this org: /presentations, /workshops, /tutorials, /list-of-update-sites, others?

List of known observed weirdness so far:

  • BigDataServer: INFO__ are linkish, but are surrounded in backticks. These should be fixed, and other instances of backtick-mangling should be checked for.
  • Some MediaWiki-style links got escaped with backslashes, e.g. \[link title\]. I fixed many of them, but it would be good to double-check that none remain.
  • (Category_Segmentation) (and similar) links are broken, and _bin/broken-links.sh does not find them.
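
The backslash-escaped and backtick-wrapped cases above could be hunted down with plain grep searches. This is a minimal sketch, not part of _bin/broken-links.sh: the function names are hypothetical, it assumes the pages are *.md files under the directory passed as the first argument, and it assumes the backtick-mangled links look like `http(s)://...` URLs.

```shell
#!/bin/sh
# Sketch only: list suspicious link markup in Markdown sources.
# Assumption: page sources are *.md files somewhere under "$1".

# Print file:line:match for backslash-escaped MediaWiki-style links,
# e.g. \[link title\]
find_escaped_links() {
  find "$1" -name '*.md' -exec grep -nHoE '\\\[[^]]*\\]' {} +
}

# Print file:line:match for URLs wrapped in backticks (assumed form of the
# backtick-mangled links), e.g. `https://example.org`
find_backticked_urls() {
  find "$1" -name '*.md' -exec grep -nHoE '`https?://[^`]*`' {} +
}
```

Running `find_escaped_links .` from the repository root would print one line per remaining escaped link, so an empty result means that case is cleaned up.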

Once we have a robust dead link checker, we also need to hook it up to an action to check when links break.

See also #55, #63 (IJ1 page renames), #66

@ctrueden ctrueden added this to the production milestone Apr 20, 2021
@ctrueden ctrueden self-assigned this Apr 20, 2021
@ctrueden ctrueden added this to To do in Road to Production Apr 20, 2021
@ctrueden ctrueden moved this from To do to In progress in Road to Production Apr 20, 2021
@ctrueden ctrueden moved this from In progress to To do - content in Road to Production May 5, 2021
@hinerm
Member

hinerm commented May 6, 2021

Things that may be broken:

  • Links to /media/[subfolder]/.. (currently everything should just go to /media/)
  • Double-encoded ampersands (e.g. &amp;amp;)

I looked at the _bin/broken-links.sh script but don't really understand how to modify it to add these cases.
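
One rough way to surface both cases without modifying the existing script is a pair of grep searches over the built site. This is only a sketch under assumptions: the function names are hypothetical, the generated HTML is assumed to live under the directory passed as the first argument (e.g. _site), and "double-encoded" is taken to mean the literal text &amp;amp; in the HTML.

```shell
#!/bin/sh
# Sketch only: scan generated HTML for two suspected problem patterns.
# Assumption: the built site lives under "$1" (e.g. _site).

# Print file:line:match for double-encoded ampersands.
find_double_encoded_amps() {
  find "$1" -name '*.html' -exec grep -nHoE '&amp;amp;' {} +
}

# Print file:line:match for href/src values that point into a subfolder
# of /media/ rather than directly at /media/<file>.
find_media_subfolder_links() {
  find "$1" -name '*.html' -exec grep -nHoE '(href|src)="/media/[^/"]+/[^"]*"' {} +
}
```

Both functions print nothing when the pattern does not occur, so their output can be reviewed by hand or counted.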

hinerm added a commit that referenced this issue May 6, 2021
hinerm added a commit that referenced this issue May 6, 2021
hinerm added a commit that referenced this issue May 7, 2021
htmlproofer can be used to scan the site for broken links.

Recommended command-line options:
 --allow_hash_href --empty_alt_ignore --assume_extension --disable_external

This is still not perfect and is returning a lot of false positives, but
it's a start.

See also #50
@hinerm
Member

hinerm commented May 7, 2021

I've been using htmlproofer, which seems great, except that I can't get it to understand relative paths from the site root. For example:

$ htmlproofer update-sites/index.html --disable_external --assume_extension --allow-hash-href --url-ignore "///list-of-update-sites/"

produces hundreds of failures of the form:

     <a href="/update-sites/tos">ToS for personal update sites</a>
  *  internally linking to /update-sites/tos, which does not exist (line 90)
     <a href="/update-sites/tos">ToS for personal update sites</a>
  *  internally linking to /update-sites/tos, which does not exist (line 234)
     <a href="/update-sites/tos">ToS for personal update sites</a>
  *  internally linking to /update-sites/tos, which does not exist (line 234)

Running the same command on the root index.html works fine, even though the links are the same. "True" relative paths, e.g. ../update-sites/tos, would also work.

This led me down a horrible path:

  • Jekyll 4.2 does something different in generating relative paths. When I build with 4.2 and go to my local /update-sites/ page, the Automatic Uploads sidebar link breaks because it adds a second /update-sites/ to the URL. Building on 3.9 doesn't have this issue.
  • Should we even have relative links?

I am highly tempted to just copy all the pages to the base directory and run the check against them there.

@ctrueden
Member Author

ctrueden commented May 7, 2021

@hinerm If you add --root-dir=_site, those bogus errors should go away.

As for whether we should have relative links: no, we shouldn't. But /update-sites/tos is not a relative link, it's an absolute one—just to make sure we have our shared terminology straight. I am a fan of all internal links starting with /, and eschewing the relative_url Liquid filter completely.
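
To make the terminology concrete, here is a small sketch of how the two kinds of href map to files on disk. It is a simplified model for illustration, not htmlproofer's actual implementation; `resolve_href` is a hypothetical helper, and _site is assumed to be the build output directory.

```shell
#!/bin/sh
# Sketch only: map an href found on a page to the path a local checker
# would need to look up.
#   $1 = site root (e.g. _site)
#   $2 = page path relative to the site root
#   $3 = href as written in the page
resolve_href() {
  case "$3" in
    # Absolute link: anchored at the site root, wherever the page lives.
    /*) printf '%s%s\n' "$1" "$3" ;;
    # Relative link: anchored at the directory containing the page.
    *)  printf '%s/%s/%s\n' "$1" "$(dirname "$2")" "$3" ;;
  esac
}

resolve_href _site update-sites/index.html /update-sites/tos
# prints: _site/update-sites/tos
resolve_href _site update-sites/index.html ../update-sites/tos
# prints: _site/update-sites/../update-sites/tos
```

The absolute form gives the same target regardless of which page contains it, but only if the checker is told where the site root is. The relative form depends on the location of the page itself.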

@hinerm
Member

hinerm commented May 10, 2021

> @hinerm If you add --root-dir=_site, those bogus errors should go away.

Thank you @ctrueden!

@ctrueden
Member Author

The script _bin/check-site-html.sh now checks for broken links using htmlproofer (thanks @hinerm!). This issue is now "half done" in that the automation is robust. We just need to:

  1. Actually fix all outstanding broken links; and
  2. Hook up the _bin checker scripts to CI to ensure nothing breaks again in future.
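
For step 2, a CI job mostly needs a single entry point that runs each checker and fails the build if any of them fails. A minimal sketch follows, assuming each checker script exits non-zero when it finds problems (not confirmed in this thread); `run_checks` is a hypothetical helper, and _bin/check-site-html.sh is the only checker named here.

```shell
#!/bin/sh
# Sketch only: run every checker given as an argument. Keep going after a
# failure so that one CI run reports all problems, then return non-zero if
# any checker failed.
run_checks() {
  failed=0
  for check in "$@"; do
    echo "==> $check"
    if ! "$check"; then
      echo "FAILED: $check" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Example invocation from a CI step, after the site has been built:
# run_checks _bin/check-site-html.sh
```

Collecting failures instead of stopping at the first one means a contributor sees every broken check in a single run.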

@ctrueden ctrueden removed their assignment May 14, 2021
@hinerm hinerm self-assigned this May 21, 2021
hinerm added a commit that referenced this issue May 21, 2021
@ctrueden ctrueden moved this from To do - content + style to In progress in Road to Production May 23, 2021
@ctrueden ctrueden modified the milestones: production, post-production Jun 4, 2021
@ctrueden ctrueden removed this from In progress in Road to Production Jun 7, 2021