Fix attribute misquoting #219

Merged
merged 4 commits into from Jan 8, 2017

Conversation

Projects
None yet
2 participants
@plq
Contributor

plq commented Dec 27, 2016

Behold:

>>> from lxml import html
>>> from lxml.etree import htmlfile
>>> from lxml.html.builder import E
>>> from StringIO import StringIO
>>> out = StringIO()
>>> with htmlfile(out) as f:
... with f.element("tagname", attrib={"attr": '"misquoted"'}):
... f.write("foo")
...
>>> out.getvalue()
'<tagname attr=""misquoted"">foo</tagname>'

Expected output:

'<tagname attr="&quot;misquoted&quot;">foo</tagname>'

https://bugs.launchpad.net/lxml/+bug/1594155

Not fixed yet but we're closer than we were yesterday :)

@plq

This comment has been minimized.

Show comment
Hide comment
@plq

plq Dec 27, 2016

Contributor

So I did some research on this. This bug exists because here:

tree.xmlOutputBufferWriteEscape(self._c_out, _xcstr(value), NULL)

the wrong string output function is used. xmlOutputBufferWriteEscape is for writing text into tag bodies, not attribute values.

I'm inclined to say that the function that we rather need to use is xmlBufAttrSerializeTxtContent, here:

https://git.gnome.org/browse/libxml2/tree/xmlsave.c?id=229d1f93ceaa5e29a1e8455e73a7f346993d20b3#n2039

As can be seen, this is a non-trivial function so I'd be reluctant to copy it to lxml source.

However, it takes an xmlAttrPtr, which we don't seem to have at that point in lxml source code.

I'm not sure how to proceed from here. Any ideas?

Contributor

plq commented Dec 27, 2016

So I did some research on this. This bug exists because here:

tree.xmlOutputBufferWriteEscape(self._c_out, _xcstr(value), NULL)

the wrong string output function is used. xmlOutputBufferWriteEscape is for writing text into tag bodies, not attribute values.

I'm inclined to say that the function that we rather need to use is xmlBufAttrSerializeTxtContent, here:

https://git.gnome.org/browse/libxml2/tree/xmlsave.c?id=229d1f93ceaa5e29a1e8455e73a7f346993d20b3#n2039

As can be seen, this is a non-trivial function so I'd be reluctant to copy it to lxml source.

However, it takes an xmlAttrPtr, which we don't seem to have at that point in lxml source code.

I'm not sure how to proceed from here. Any ideas?

@plq

This comment has been minimized.

Show comment
Hide comment
@plq

plq Jan 6, 2017

Contributor

We can't use xmlEscapeEntities from xmlsave.c because 1) it's static 2) it doesn't escape anything between 0x20 and 0x80, where '"' is 0x22.

Contributor

plq commented Jan 6, 2017

We can't use xmlEscapeEntities from xmlsave.c because 1) it's static 2) it doesn't escape anything between 0x20 and 0x80, where '"' is 0x22.

@plq

This comment has been minimized.

Show comment
Hide comment
@plq

plq Jan 6, 2017

Contributor

Okay, I bit the bullet and ported the whole thing. I hope I did it the right way!

Contributor

plq commented Jan 6, 2017

Okay, I bit the bullet and ported the whole thing. I hope I did it the right way!

@scoder

This comment has been minimized.

Show comment
Hide comment
@scoder

scoder Jan 8, 2017

Member

Thanks! I know, this feels verbose and redundant, but given that this code is unlikely to ever change (well, functionality-wise, I mean), I also think that it's the best way to do this.

Member

scoder commented Jan 8, 2017

Thanks! I know, this feels verbose and redundant, but given that this code is unlikely to ever change (well, functionality-wise, I mean), I also think that it's the best way to do this.

@scoder scoder merged commit c45c661 into lxml:master Jan 8, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment