Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Soup hyphen in comment #251

Merged
merged 4 commits into from
Jul 29, 2017
Merged

Conversation

mozbugbox
Copy link
Contributor

Fix a couple of issues with soupparser

@scoder
Copy link
Member

scoder commented Jul 28, 2017

Thanks. Could you come up with a test that shows why the "--" special handling is necessary? At first glance, it seems wrong to me to add a character escape in text content (which is not serialised XML).

@mozbugbox
Copy link
Contributor Author

Something like

<html>
<body>
<div> Test </div>
<!-- This is -- a comment with double hyphen -->
</body>
</html>

When parsing with soupparser, exception raised

 raise ValueError("Comment may not contain '--' or end with '-'")

The lxml code that raise the exception is in lxml.etree.pyx, Comment().

@mozbugbox
Copy link
Contributor Author

hmm... It seems lxml.html can parse comment with 2 hyphens with no problem.

from lxml.html import soupparser
from lxml import html

content = "<html><!-- something with -- 2 hyphen --></html>"

tree = html.fromstring(content)
print(html.tostring(tree))

tree = soupparser.fromstring(content)
print(html.tostring(tree))

@scoder
Copy link
Member

scoder commented Jul 28, 2017

I think the soup parser is simply using the wrong way to instantiate an HTML comment:

>>> lxml.html.HtmlComment("something with -- 2 hyphen")
<!--something with -- 2 hyphen-->

@mozbugbox
Copy link
Contributor Author

aha, thanks for the advice. Pull updated

@scoder
Copy link
Member

scoder commented Jul 29, 2017

Could you please also add tests to test_elementsoup.py ?

@scoder scoder merged commit e70dad6 into lxml:master Jul 29, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants