sgmllib fail to parse html containing <!- .... -> #54244
When parsing HTML containing a tag of the form <!- .... ->, sgmllib fails. Looking into sgmllib.py, I found the statement below:

    if rawdata.startswith("<!", i):
        # This is some sort of declaration; in "HTML as
        # deployed," this should only be the document type
        # declaration ("<!DOCTYPE html...>").

I think that's why something goes wrong here. |
Are you sure you got the comment syntax right? e.g. SGMLParser should handle that. |
Well, <!-- ... --> is OK since it's a comment. <!- ... -> is probably an IE hack. See http://www.google.com/dictionary?langpair=en|zh-CN&q=vague&hl=en&aq=f |
Is that URL really what you wanted to show me? Also, I'm not intimate with all of SGML's syntax, but ISTM that what you show here is invalid SGML, and as such SGMLParser is not required to parse it. |
Sorry, the URL on the page is sort of broken. The URL contains the "<!- ... ->" stuff. I think you're right, the <!- is probably just a mistake which is not in the SGML standard. But I'm wondering if the SGMLParser can SKIP such an invalid statement? My browser does this. |
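One way to get the skipping behaviour asked about here, without touching the parser itself, is to strip the malformed markup before parsing. What follows is only a minimal sketch, assuming the bad constructs always look like <!- ... -> with a single hyphen on each side; `strip_malformed_comments` is a hypothetical helper and not part of sgmllib.

```python
import re

# Matches "<!-" that is NOT followed by a second "-", up to the first "->".
# Assumption: the malformed construct never nests inside a real comment.
_MALFORMED = re.compile(r"<!-(?!-).*?->", re.DOTALL)


def strip_malformed_comments(html):
    """Return html with <!- ... -> runs removed; <!-- ... --> is kept."""
    return _MALFORMED.sub("", html)


page = "<p>a</p><!- hack -><!-- real --><p>b</p>"
print(strip_malformed_comments(page))
```

The cleaned string can then be fed to the parser as usual. The negative lookahead is what keeps well-formed <!-- ... --> comments out of the match.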
The browser needs to be very liberal in what it accepts, since nobody wants their page view to break because of such a technicality. This is different for a tool like SGMLParser. In light of this, and because sgmllib is removed anyway in Python 3, I'm closing this. |
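For anyone landing here from Python 3: as far as I know, html.parser in the standard library is more forgiving about this input and reports an unrecognised <!...> construct through handle_comment instead of failing. The sketch below assumes that behaviour; `CommentCollector` and `parse` are illustrative names, and the exact text passed to handle_comment may differ between Python versions.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Record start tags and comment-like constructs seen while parsing."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.comments = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_comment(self, data):
        # Assumption: malformed "<!- ... ->" input arrives here as well.
        self.comments.append(data)


def parse(html):
    parser = CommentCollector()
    parser.feed(html)
    parser.close()
    return parser


result = parse("<p>a</p><!- hack -><p>b</p>")
print(result.tags, result.comments)
```

Both surrounding <p> tags are still reported, so the malformed construct does not stop the rest of the page from being parsed.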