lxml serializer goes "past" individual elements #338
Comments
Also,
Outputs:
|
I'm guessing this issue has to do with the fact that
So the wrapping could be tripping up that logic. |
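To make the guess above concrete, here is a minimal pure-Python sketch of how wrapping the start node can defeat an identity-based stop condition in a non-recursive walker. It is not html5lib's actual code: `Node`, `Wrapper`, and `walk` are hypothetical names invented for illustration, and the assumption is only that the walker climbs via parent and sibling links and stops when it is back at the object it was given.

```python
class Node:
    """Hypothetical stand-in for an lxml element: it knows its parent and siblings."""

    def __init__(self, name, children=()):
        self.name = name
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self

    def next_sibling(self):
        if self.parent is None:
            return None
        siblings = self.parent.children
        index = next(i for i, s in enumerate(siblings) if s is self)
        return siblings[index + 1] if index + 1 < len(siblings) else None


class Wrapper:
    """Hypothetical wrapper, loosely playing the role of the Root wrapper discussed here."""

    def __init__(self, node):
        self.node = node


def walk(start, wrap=False):
    """Non-recursive walk returning the names of visited nodes.

    The walk is bounded by an identity check against `tree`. When the start
    node is wrapped, `tree` is the wrapper, so the check never matches.
    """
    tree = Wrapper(start) if wrap else start
    current = start
    visited = []
    while current is not None:
        visited.append(current.name)
        if current.children:
            current = current.children[0]
            continue
        # Leaf reached: climb until a sibling is found or we are back at `tree`.
        while current is not None:
            if current is tree:
                current = None
                break
            sibling = current.next_sibling()
            if sibling is not None:
                current = sibling
                break
            current = current.parent
    return visited


# <body><p><b/></p><div/></body>, walking only the <p> element
b = Node("b")
p = Node("p", [b])
div = Node("div")
body = Node("body", [p, div])

print(walk(p))             # ['p', 'b']
print(walk(p, wrap=True))  # ['p', 'b', 'div']  (continues "past" <p>)
```

In the unwrapped case the climb stops as soon as it returns to `p`; in the wrapped case it moves on to the sibling `div`, which matches the symptom reported in this issue.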
You're right, removing the Root wrapper line solves this problem and some others too. The wrapper also causes problems when you try to walk over an element that has siblings. What is it even supposed to accomplish? The walker is also broken for lists of elements (such as those returned by parseFragment when there are multiple root elements). In its current state the lxml walker is broken and only works in some edge cases. |
IIRC, it's meant to ensure that the DOCTYPE gets included when we have a whole tree. We probably should just special-case getting an
As far as I can tell (from what we have tests for), it works fine for whole trees and for whole fragments. What it doesn't work for is arbitrary elements. |
For fragments, I'm getting that Here's code to reproduce:
which outputs:
Should I file a different issue? |
Actually, now that I've looked at things more closely, wouldn't it be better to scrap the lxml-specific code and replace it with the etree-API code? If I use Edit: actually better make that |
Perhaps, but note that (at least as it stands now) the etree serializer has some of its own issues, like the one I mention in the comment before this one, and also this issue: #341 |
Also, I think the original idea behind the lxml serializer is that it is optimized / faster. E.g. see here: https://html5lib.readthedocs.io/en/latest/html5lib.html#html5lib.__init__.getTreeWalker |
#341 doesn't look like a bug: the Entity class in lxml.etree is an lxml exclusive extension to the API. |
Okay, well then all the more reason not to scrap the lxml serializer. Something should be capable of serializing lxml.etree.Entity elements. |
AFAIK, it was mostly down to API differences around things like |
I think it's important to keep around for the reason I mentioned in my comment just above -- namely, otherwise, there would be no way to serialize trees containing lxml.etree.Entity objects, for example (since #341 was closed). |
I've been looking into it, and the problems I've run into so far are:
Still, some small changes enable the etree builder/walker to work. Adding Entity support to it wouldn't be hard either. |
I noticed that when serializing an individual element, the lxml serializer continues "past" the element. This causes inconsistent behavior between the "etree" and "lxml" tree types.
A minimal way to reproduce:
Outputs: