Fix inheritance in lxml.html #340
Also looking at […]: I've not tested it, but if my reading is correct the `*`-mixins will only apply to the […]. But I don't know what the exact contract is supposed to be, or if that matters.
Also didn't find a contributing guide, so not sure how forward- or backports might work (if they're done / supported for minor issues).
Thanks. That looks correct. Can't say why it wasn't done like this at the time. It's probably also missing tests for comments, PIs and entities, that's why it didn't show so far. Could you please add some?
Not sure what exactly you're referring to here. Do you mean elements created by instantiating element classes (rather than parsing)? That behaviour is documented on the element classes page. Regarding backports, lxml tends to be mostly backwards compatible, so that rarely happens.
No, I mean that if we set up a parser with the lookup `HtmlElementClassLookup(mixins=[('*', A), ('foo', B)])`, then I think parsed […]
I also read it this way, that it's exclusive. It either matches a specific class or the fallback class. While it wouldn't necessarily have to be this way, one advantage is that it makes it possible to exclude certain tags from inheriting the default mixin. If the default was always applied, that would be difficult.
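To make the "exclusive" reading concrete, here is a toy model in plain Python. It is not lxml code: `Element`, `A`, `B`, `resolve_mixin` and `make_class` are hypothetical names, and the real lookup may well behave differently in detail. It only illustrates the idea that a tag-specific entry replaces the `*` fallback rather than being stacked on top of it.

```python
class Element:
    """Hypothetical base class standing in for an element type."""


class A:
    """Hypothetical default mixin, registered under '*'."""


class B:
    """Hypothetical mixin registered for the 'foo' tag only."""


def resolve_mixin(tag, mixins):
    """Exclusive matching: a tag-specific entry wins outright, else '*'."""
    table = dict(mixins)
    return table.get(tag, table.get("*"))


def make_class(tag, mixins):
    """Build a class for `tag` with the resolved mixin placed first."""
    mixin = resolve_mixin(tag, mixins)
    bases = (mixin, Element) if mixin is not None else (Element,)
    return type("Element_" + tag, bases, {})


mixins = [("*", A), ("foo", B)]
foo_cls = make_class("foo", mixins)  # gets B only, not the default A
bar_cls = make_class("bar", mixins)  # no specific entry, falls back to A

print([c.__name__ for c in foo_cls.__mro__])
print([c.__name__ for c in bar_cls.__mro__])
```

Under this model, opting a tag out of the default mixin is just a matter of registering something else for it, which is the advantage mentioned above.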
That interpretation could make some sense, but it seems somewhat inconsistent, as it seems to behave differently if the class being augmented is passed through […]. On the other hand, the mixins and classes features don't seem to be used by the module itself when instantiating its parsers, so if the object is not considered part of the public API (is just an internal feature), another option would be to deprecate those things and leave […]. Also coming back to […]
Sure but there doesn't seem to be much to test aside from testing that Python's inheritance works the way it's specced?
Maybe time or couldn't be arsed, that happens.
Well, if it's not used much, then ISTM that we don't have a problem. :) At least not a problem that couldn't be solved with documentation rather than deprecation.
Or rather that Comment, PI and Entity classes are correctly inserted and still behave as expected when found in the tree, and that XML attribute setting on elements does what it should (there might be tests for that, at least). I mean, the mere fact that your changes broke no tests shows that there were tests missing. I actually wonder if the […]
It might also be that the change doesn't break anything because it manipulates the MRO in ways which were not intentionally leveraged (and the old behaviour had to be worked around hence the attributes being re-set). But I see your point.
It is still useful for the docstring, as normally […]
Right, but without the override that's not visible through the API doc, because you get the information from […]
Which one? Or is it in an unreleased revision? For the currently published stuff all I see is […]
@scoder I've one more question: I'm trying to write a test to ensure […]
However, there seems to be a snag: […] Also, more of a propriety / cleanliness question: […]
Yes, a fresh commit in 3bd8db7
As the old comment / FIXME from 8132c75 notes, the mixin should come first for the inheritance to be correct (the left-most class is the first in the MRO, at least if no diamond inheritance is involved). Also fix the odd `super` in `HtmlMixin` likely stemming from the incorrect MRO. Fixes the inheritance order of all `HTML*` base classes though it probably doesn't matter for other than `HtmlElement`.
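The ordering point can be checked with plain Python classes. This is a minimal sketch using hypothetical stand-ins (`ElementBase`, `Mixin`, `BaseFirst`, `MixinFirst`), not the actual lxml classes: when the mixin is listed first it leads the MRO and its `super()` call reaches the base class; when it is listed last, its override is never consulted.

```python
class ElementBase:
    """Hypothetical stand-in for a base element class."""

    def describe(self):
        return "base"


class Mixin:
    """Hypothetical mixin that extends the base behaviour via super()."""

    def describe(self):
        return "mixin+" + super().describe()


class BaseFirst(ElementBase, Mixin):
    """Base class listed first: the mixin's override is shadowed."""


class MixinFirst(Mixin, ElementBase):
    """Mixin listed first: its override runs, then defers to the base."""


base_first_names = [c.__name__ for c in BaseFirst.__mro__]
mixin_first_names = [c.__name__ for c in MixinFirst.__mro__]

print(base_first_names, BaseFirst().describe())
print(mixin_first_names, MixinFirst().describe())
```

With the base listed first, a mixin can only take effect through workarounds such as re-assigning attributes on the resulting class, which fits the observation that the old ordering had to be worked around.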
Added a pair of tests to check for mixin features (the only one which seems really applicable is […]). Also updated the tox file to specifically allow […]
Thanks!