Sometimes results deviate from the live W3C Validator #4

Closed
avikshee opened this issue Mar 9, 2015 · 5 comments

Comments

@avikshee

avikshee commented Mar 9, 2015

Hello,

Thanks a ton for this great self-contained W3C Markup validator. I have been checking the results after building this container, and they sometimes differ from those of the live W3C Validator (http://validator.w3.org/).

These are the names of some sites for which you would see that the results deviate from the hosted one:

grantwatch.com
merruk.com
tarnegarco.com

It would be helpful if the validator container could be updated to return the same results as the live one, to avoid ambiguity. I thought this info would be of some help to you, and thanks once more for this great repo.

Thanks,
Avik

@magnetikonline
Owner

Hello @avikshee - thanks for the issue and kind words.

Are you able to provide a summary of where the results vary? I'm assuming it's because the HTML5 validation engine that I'm using here (the excellent vnu validator) is not what is being used at http://validator.w3.org/.

If you can provide some examples (text is fine - here) - I'll see if I can reproduce.

@avikshee
Author

Hello,

Thanks for replying. I too assumed that the results vary because the HTML5 validation engine used here might be different from the one used by validator.w3.org.

Here are a few examples:

Test case 1:

Evaluation of merruk . com

Live validator:
http://validator.w3.org/check?uri=merruk.com
Passed, 1 warning(s)

Hosted Validator:
13 Errors, 1 warning(s)

Line 7, Column 926: A charset attribute on a meta element found after the first 512 bytes.
…/www.merruk.com/"><!-- Start Merruk Corporations Managem…

Line 12, Column 1260: Attribute datatype not allowed on element meta at this point.
…type="dcterms:rfc4646" content="en-US"><meta property="dcterms:date" datatype=…

Line 12, Column 1352: Attribute datatype not allowed on element meta at this point.
…f" content="2015-01-01T08:00:00+00:00"><meta property="dcterms:date.issued" co…

Line 12, Column 1498: Attribute datatype not allowed on element meta at this point.
…type="dcterms:dcmitype" content="Text"><meta property="dcterms:format" content…

Line 12, Column 1819: Attribute datatype not allowed on element meta at this point.
…f" content="2015-01-01T08:00:00+00:00"><meta property="dcterms:modified" datat…

Line 12, Column 1921: Attribute datatype not allowed on element meta at this point.
…tent="Tue, 10 Mar 2015 09:15:32 +0000"><meta property="dcterms:modified" datat…

Line 12, Column 1996: Attribute datatype not allowed on element meta at this point.
…tatype="xsd:date" content="2015-03-10"><link rel="dcterms:subject" href="http:…

Line 12, Column 2140: Bad value dcterms:subject for attribute rel on element link: The string dcterms:subject is not a registered keyword.
… Covering Hardware & Accessories."><link rel="dc:source" href="urn:ISBN:97…

Line 12, Column 2196: Bad value dc:source for attribute rel on element link: The string dc:source is not a registered keyword.
…rce" href="urn:ISBN:978-1-2345-6789-X"><link rel="dc:relation" href="http://ww…

Line 12, Column 2250: Bad value dc:relation for attribute rel on element link: The string dc:relation is not a registered keyword.
…elation" href="http://www.merruk.com/"><link rel="dcterms:references" href="ht…

Line 12, Column 2323: Bad value dcterms:references for attribute rel on element link: The string dcterms:references is not a registered keyword.
…f="http://www.merruk.com/Docs/M_T.pdf"><meta name="geo.placename" content="Cas…

Line 13, Column 2184: Bad value version for attribute name on element meta: Keyword version is not registered.
…ta name="version" content="MT1.0 Beta"><meta name="revisit-after" content="3 d…

Test case 2:

Evaluation of grantwatch . com

Live Validator:
http://validator.w3.org/check?uri=grantwatch.com
Passed, 2 warning(s)

Hosted Validator:
1 Error, 2 warning(s)

Line 11, Column 55: Bad value X-UA-Compatible for attribute http-equiv on element meta.

I can provide you with more test cases, if you need.
Thanks once more for following up on this.

Thanks,
Avik
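
When comparing two validators on reports like the ones above, it can help to collapse the pasted findings into distinct message types first. The following is only a minimal sketch, assuming the plain-text layout shown in this thread ("Line N, Column M: message", followed by an optional source excerpt). The function names `parse_report` and `group_by_message` are hypothetical and not part of any validator tooling.

```python
import re
from collections import Counter

# Matches report lines such as
# "Line 12, Column 1260: Attribute datatype not allowed on element meta at this point."
REPORT_LINE = re.compile(r"^Line (\d+), Column (\d+): (.+)$")


def parse_report(text):
    """Return (line, column, message) tuples found in a pasted report.

    Lines that do not match the pattern (for example source excerpts) are skipped.
    """
    findings = []
    for raw in text.splitlines():
        match = REPORT_LINE.match(raw.strip())
        if match:
            line, column, message = match.groups()
            findings.append((int(line), int(column), message))
    return findings


def group_by_message(findings):
    """Count how often each distinct message occurs."""
    return Counter(message for _, _, message in findings)


# A short excerpt of the findings quoted in this thread.
sample = """\
Line 12, Column 1260: Attribute datatype not allowed on element meta at this point.
…type="dcterms:rfc4646" content="en-US"><meta property="dcterms:date" datatype=…
Line 12, Column 1352: Attribute datatype not allowed on element meta at this point.
Line 11, Column 55: Bad value X-UA-Compatible for attribute http-equiv on element meta.
"""

for message, count in group_by_message(parse_report(sample)).most_common():
    print(count, message)
```

Grouping this way makes it easier to see that a long error list may come down to only a few distinct rules that the two validators treat differently.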

@jaimeiniesta

I've tried this and I can confirm that I get the same results as @avikshee

I've noticed that on the Dockerfile, the sgml-lib file that is being used is not the most up-to-date one.

This guide explains how to use a more recent sgml-lib file.

Please note, however, that the W3C Validator is legacy. It's not actively maintained; you should instead be using http://validator.w3.org/nu/

You can install the Nu HTML Checker without installing the legacy validator. Here's a Dockerfile for it.

@magnetikonline
Owner

Thanks @avikshee @jaimeiniesta for the replies.

Given time, I'll try building with this updated sgml-lib release to see if that corrects the issue. Still, I was under the impression the W3 validator handed off the HTML body wholesale to the VNU/NU validator, so I'm not sure if this will fix the issue. Would love to know where the live W3 validator drifts away from what I am building here.

@jaimeiniesta, the main reason I'm sticking with the W3 validator is that it allows me to use the excellent Validity Chrome extension.

I'll keep this issue open.

@magnetikonline
Owner

Hello @avikshee,

finally found some time to look into this - since we last looked at this issue, quite a bit has changed with the W3C validator landscape (for the better!):

  • The W3C validator source is now hosted on GitHub at https://github.com/w3c/markup-validator - rather than in tar files on their servers. In fact, these files no longer exist - meaning that this Dockerfile has been broken for a little while!
  • The Validator.nu checker is now called directly by the W3C validator at https://validator.w3.org/ - that is, if your "site to validate" is HTML5/etc. the W3C validator will simply redirect you to the Validator.nu instance they are running.
  • The Validator.nu JAR itself has had quite a few source version bumps/fixes since my last update.
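
Since HTML5 documents end up at a Validator.nu instance either way, one way to cross-check a locally built container against the live service is to send the same markup to both checkers and compare the error and warning counts. The sketch below is a rough illustration, not part of this repo. It assumes the checker's JSON output (`out=json`), where the response carries a `messages` list whose entries have a `type` ("error" or "info") and, for warnings, a `subType` of "warning". The helper names `check_html` and `summarize` are hypothetical.

```python
import json
import urllib.request


def check_html(html, checker_url="https://validator.w3.org/nu/?out=json"):
    """POST an HTML document to a Nu checker endpoint and return its messages.

    Pass the URL of a locally running checker as checker_url to test a
    container instead of the live service (the local URL depends on your setup).
    """
    request = urllib.request.Request(
        checker_url,
        data=html.encode("utf-8"),
        headers={
            "Content-Type": "text/html; charset=utf-8",
            "User-Agent": "validator-compare-sketch",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response).get("messages", [])


def summarize(messages):
    """Return (error_count, warning_count) for a list of checker messages."""
    errors = sum(1 for m in messages if m.get("type") == "error")
    warnings = sum(
        1
        for m in messages
        if m.get("type") == "info" and m.get("subType") == "warning"
    )
    return errors, warnings


# Offline demonstration with hand-written messages in the assumed shape.
sample_messages = [
    {"type": "error", "message": "Bad value X-UA-Compatible for attribute http-equiv on element meta."},
    {"type": "info", "subType": "warning", "message": "Example warning."},
    {"type": "info", "message": "Example informational note."},
]
print(summarize(sample_messages))  # prints (1, 1)
```

Calling `summarize(check_html(page_source))` once per endpoint gives two count pairs that can be compared directly. Any difference would point to a version mismatch between the two checker builds.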

So - I have updated this repo to reflect these changes with a rather large commit/update: 5334c97.

Thought I should outline all this in greater detail - in the case that someone else finds this information of use.

I'm going to close this issue out, as (at least for now) results from this Dockerfile should be pretty much spot-on with those of the live https://validator.w3.org/.
