
Useful HTML validators

What counts here: how many useful errors are reported, how many false positives are reported, and how much time is needed to use these tools.

Detecting dead links

I want to find all broken links on my website. I would also like to detect orphaned pages (pages not reachable from my main page by following links).

Note that unlinked orphaned pages will not be checked by these tools, unless noted explicitly.

I want a script that

  • runs locally
  • can validate locally present files - not only online websites (I want to check for dead links before publishing)
  • but also can validate published websites
  • works without hangups/crashes/mishandling UTF8
  • can be used to detect orphaned pages
  • detects dead links in <a href=, <img src=, linked CSS/HTML files and more
  • can be used as is or is easily modifiable by me

I made a project full of test cases for easy testing of potential tools.

Checking local HTML files

Note that local HTML files can be served on localhost in a relatively simple way; then any link checker running on your computer can check them, without needing any extra support for reading files.

One of the solutions is below. I am not entirely happy about it, but it works:

http-server

Install

sudo npm install http-server -g

BTW, is there a way to install node modules without sudo and have them within PATH? If yes, please open an issue in this project or let me know in some other way.
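One commonly suggested approach (an assumption that it fits your setup; bash is assumed, adjust the rc file for your shell) is to point npm's global prefix at a user-owned directory:

```shell
# Make npm install "global" packages into a directory you own, so sudo is not needed.
npm config set prefix "$HOME/.npm-global"

# Put its bin directory on PATH (bash assumed; use your shell's rc file).
echo 'export PATH="$HOME/.npm-global/bin:$PATH"' >> "$HOME/.bashrc"

# After re-sourcing the rc file, this should work without sudo:
# npm install -g http-server
```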

Use

Run http-server in the directory with the HTML files,

then use, for example, the site-graph tool:

cd site-graph
python site_graph.py http://127.0.0.1:8080/ --visit-external --force

Found candidates

  • html-proofer by gjtorikian
    • Link check fails when example is linked instead of example.html, while it works at GitHub Pages. Requires an extra parameter to stop requiring an explicit .html:
      • htmlproofer /home/path_to_entire_folder/ --assume-extension --check-html --check-favicon --log-level warn
      • htmlproofer /home/mateusz/Desktop/kolejka/portfolio/test_cases_for_detecting_link_rot/ --assume-extension --check-html --check-favicon --log-level warn
      • htmlproofer ../test_cases_for_detecting_link_rot/ --assume-extension --check-html --check-favicon --log-level warn
  • this site-graph tool is promising as a base; I am contributing to it
    • remember to use --visit-external - it is disabled by default!

Not tested yet

  • link-checker
  • linkchecker works very nicely
    • linkchecker https://matkoniecz.github.io/dead_links_testing_site/
    • outputting site graph is one of listed features! So detecting orphaned pages should be feasible...
    • linkchecker https://matkoniecz.github.io/dead_links_testing_site/ --verbose -o csv seems parsable to detect orphaned pages
    • it had problems with UTF-8 support, but this should be fixed now!

Problematic

  • another option is wget and parsing its log. Mentioning it for completeness, but it looks like a nasty quagmire to me.
    • wget --spider -o wget.log -e robots=off --wait 1 -r -p https://matkoniecz.github.io/dead_links_testing_site/
    • cat wget.log | grep 404
  • https://github.com/LukasHechenberger/broken-link-checker-local - but it is a dead, buggy project: the last commit was in 2021, and it is known to hang randomly (reported in 2017, remains unfixed as of 2024)
    • blcl -ro . --filter-level 3
    • blcl -ro . --filter-level 3 | grep 'BROKEN'
    • UTF-8 support has some issues - see the upstream issue, reported in 2021; as of 2024 it still has the "needs confirmation" label and remains unfixed
  • the W3C link checker may look promising

Mobile-Friendly Test by Google

A test made by Google. It is especially important as, hopefully, what it reports is similar to the factors Google considers when ranking mobile-friendly websites higher.

Following its suggestions (like using viewport) may save time that would otherwise be wasted on unneeded debugging.

Grammarly

A grammar and language checker. Not very smart and has plenty of false positives, but it sometimes catches real problems. I consider it worth using to avoid wasting the time and attention of a human proofreader on obvious things. Accepts Markdown and raw HTML as input. Requires registration to work properly and strongly pushes a paid version.

It is not an HTML-focused tool (it checks any text) and has plenty of problems, but it still turned out to be more useful than most automatic validators.

I use a simple script to check all text at once.

PageSpeed Insights

Another useful Google tool.

Manually check versions of dependencies

For example, remember to update your Leaflet .js and .css files. (Is there a way to automate that?)

html-proofer

html-proofer by gjtorikian

htmlproofer folder_to_validate --check-html --check-favicon is the only automatic validator I have found so far that reminds about favicons.

Link check fails when example is linked instead of example.html, while it works at GitHub Pages.

On the other hand it found some actual dead links...

webpagetest.org

Page speed test.

html5validator by svenkreiss

A scriptability-friendly validator. So far it has reported no user-visible problems, but installation (pip install html5validator) and running (html5validator --show-warnings --root folder_to_validate) are easy, so it may be worth using.

Nu Html Checker

https://github.com/validator/validator via a Java .jar file - relatively easy to install (npm install --save vnu-jar, then move the .jar file to a known location) and use. It reported some minor but user-visible problems (pages with text but without any <h1> tags) that helped to improve the site.

I use it as follows (command executed in main folder of .html and .css files):

find . -name "*.html" -exec java -jar /path_to_vnu_jar/vnu.jar --also-check-css --also-check-svg --verbose {} \;
find . -name "*.css" -exec java -jar /path_to_vnu_jar/vnu.jar --also-check-css --also-check-svg --verbose {} \;

It also runs online at https://validator.w3.org/nu/

HTMLProofer

https://github.com/gjtorikian/html-proofer

I use it like this: /usr/local/bin/htmlproofer . --check-html --check-favicon --log-level warn, from the root folder of a project.

Rawler

https://github.com/oscardelben/rawler

Appears to be able to check only live websites.

/usr/local/bin/rawler https://mapsaregreat.com | /bin/grep -v "] INFO -- : 200 - "

Stylelint

https://github.com/stylelint/stylelint/blob/master/docs/user-guide/cli.md

Looks potentially useful, but not worth the configuration effort for me at this moment.

W3C CSS Validation

https://github.com/w3c/css-validator + http://jigsaw.w3.org/css-validator/

Not investigated for now, but it looks useful.