-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
further thoughts about pip as an HTML5 reader #10880
Comments
The relevant part of the spec is: https://html.spec.whatwg.org/#after-doctype-name-state Specifically these lines:
So this would add an addition of optionally "PUBLIC" or "SYSTEM" after "html" as part of the "doctype name state". This doesn't look too complicated to add if someone wants to make a PR. However PEP 503 only says that the simple index "MUST be a valid HTML5 page" and this is not part of the HTML5 writing spec: https://html.spec.whatwg.org/#the-doctype . PEP 503 does not say the tool parsing the HTML5 must conform to the HTML5 parser spec, which was designed to handle non-valid HTML5 docs and presumably has a complicated history to do with what browsers accept). So actually I think it would be at best optional and at worst incorrect for Pip to check against this, as Pip is not trying to be an HTML5 parser here and doesn't need to deal with non-valid HTML5 documents. Pip's motivation is checking for a valid PEP 503 index, otherwise it would also have to be handling things like the bogus doctype state and many other parts of the HTML5 spec that was designed for browser makers to handle non-valid documents in a standard way. |
Hiya! Thanks for filing this. I don't really see much point in arguing at length about fine print here -- things work at the moment, albeit being slightly annoying for a subset of our users due to use of tools/platforms that don't have doctypes (and correspondingly, a valid document structure). I'm not interested in spending more time or energy on this topic anytime soon. If someone wants to file PRs to tweak the logic or remove things, I'm not going to block them (unless they're doing something really bad/stupid, which... is unlikely) and likewise, I'm also unlikely to spend time reviewing such PRs. There's other maintainers though, and I trust their judgement on what the right thing to do here would be. Please don't @-mention me here, since I'm unsubscribing for now. |
Description
As Donald Stufft pointed out, the HTML5 spec has different requirements for HTML5 writers versus parsers, and in fact, the 13.1 Writing HTML documents part of the spec explicitly says that the section "does not apply to conformance checkers", so we were looking at the wrong section when we added a DOCTYPE regex to
handle_decl()
. If pip is going to continue to want to validate the HTML5 (i.e. be a conformance checker, which some of us have said should not be PIP's job, and which @dstufft said his PEP 503 is ambiguous about), then the 13.2 Parsing HTML documents section is what needs to be followed.It is a complicated spec, so it will take some careful reading to figure out what exactly is required for HTML5 validation, but one thing I found is that the parser also allows for "PUBLIC" to be in the
<!DOCTYPE ...>
, which is not mentioned for the writer, so that is definitely a problem in the regex.Expected behavior
When using a Simple Repository API repo that outputs a
<!DOCTYPE ...>
that has "PUBLIC" in it. Expected behavior is not to get a warning saying "does not have a proper HTML doctype declaration," butpip
would give that warning.pip version
22.0.3
Code of Conduct
The text was updated successfully, but these errors were encountered: