Skip to content
This repository has been archived by the owner on Aug 26, 2023. It is now read-only.

html-proofer failed to pick up html validation errors. #141

Closed
3 tasks
Raza403 opened this issue Feb 4, 2020 · 5 comments
Closed
3 tasks

html-proofer failed to pick up html validation errors. #141

Raza403 opened this issue Feb 4, 2020 · 5 comments

Comments

@Raza403
Copy link

Raza403 commented Feb 4, 2020

**Nokogumbo is an upstream dependency of htmlproofer
Test Case
https://github.com/Raza403/test

In index.html there are

  • Couple of divs without closing tags
  • A <p> element without closing tag
  • One stray <hr> tag, which is after </html>

And build tested with html-proofer is passing here
https://travis-ci.com/Raza403/test/builds/120699310

The same code was run through https://validator.nu and following errors are found.
1

The maintainer of htmlproofer said "The parser relies on nokogumbo, so a report should be filed for that repo/project.
"

@fulldecent
Copy link
Contributor

To the Nokogumbo team: please advise if there is a better format we can use for this and future bug reports.

@stevecheckoway
Copy link
Collaborator

When I run that HTML through Nokogumbo, I get three errors.

irb(main):029:0> doc = Nokogiri.HTML5(html, max_errors: 100)
irb(main):030:0> doc.errors.each { |e| puts(e) }
12:1: ERROR: That tag isn't allowed here  Currently open tags: html, body, div, div, p.
</body>
^
15:1: ERROR: That tag isn't allowed here  Currently open tags: html, body, div, div, p.
<hr>
^
16:1: ERROR: Premature end of file  Currently open tags: html, body, div, div.

^

The HTML standard specifies what must be a parse error. This is an area that is changing and the conformance tests aren't completely correct here. See the following PRs I filed.

At some point, I went through and tried to ensure that Nokogumbo produces exactly one error for each error the standard specifies. If Nokogumbo (really our included Gumbo) has diverged from the HTML living standard, then I'm happy to fix it.

Looking at the errors produced by validator.nu, the first three should be a single error (I think), the fourth is correct, and the fifth is bogus. It's not possible to write HTML in such a way that the parser cannot recover and produce more error messages. In particular, there should be an error message for the end of file with open tags.

Without checking the standard, my inclination here is that Nokogumbo is correct, validator.nu is incorrect, and whatever is causing html-proofer to not emit errors isn't the fault of Nokogumbo.

@stevecheckoway
Copy link
Collaborator

Also, https://travis-ci.com/Raza403/test/builds/120699310 doesn't show anything since the bundle install failed for some reason.

@fulldecent
Copy link
Contributor

@Raza403 Please minimize test case to show how htmlproofer is broken, and let's see exactly what htmlproofer is sending to Nokogumbo.

@flavorjones
Copy link
Collaborator

Closing. If you think this is still an issue, please open a new Issue at https://github.com/sparklemotion/nokogiri

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants