This branch removes most of lxml.cssselect and uses the new cssselect project to provide the same functionality.
The new project distinguishes "generic" XML (no assumption is made on the meaning of element names) and HTML (has a more useful implementation of some pseudo-classes like :link or :checked.) This, the CSSSelect class as well as the cssselect method now have a translator argument that can be 'xml', 'html, 'xhtml', or a Translator object. (See the documentation.) The translator default to 'html' on HtmlElement.cssselect, 'xml' everywhere else.
The dependency is optional, so importing lxml.cssselect or calling the cssselect() method of elements can cause an ImportError at runtime.
I believe this fixes issues #34, #40, and #41.
#45 is partially addressed: all elements now have a cssselect method. However, html5parser still does not make HtmlElement objects so the user may need to pass translator='html' explicitly.
#20 is not resolved yet, but cssselect’s #3 is open about the same issue.
cssselect still has a few unresolved issues (filed on github), but the API should be stable now.
Use the external cssselect
Includes ugly hack to support Python 2.4
Make the cssselect dependency optional.
Instead of the sys.modules stuff below, you could simply have base_cssselect = __import__('cssselect') here instead.
base_cssselect = __import__('cssselect')
For sys.modules['cssselect'] yes, but what about sys.modules['cssselect.xpath'] for example?
Assuming the cssselect package triggers those imports then you can just use base_cssselect.xpath, otherwise use base_cssselect = __import__('cssselect.xpath') to ensure that module is imported. See the example here.
base_cssselect = __import__('cssselect.xpath')
__import__('cssselect.xpath') ensures that the submodule is imported, but I think not that it is reachable as an attribute of the parent package. Currently, cssutils/__init__.py does not have from cssselect import xpath but from cssselect.xpath import foo, bar.
from cssselect import xpath
from cssselect.xpath import foo, bar
What can avoid the use of sys.modules according to the doc is something like this:
xpath = __import__('cssselect.xpath', globals(), locals(), ['xpath_literal'], 0)
... but I’m not sure it is better.
The submodule is set as an attribute on the parent package when it is imported, consider the following:
>>> import lxml
Traceback (most recent call last):
File "<console>", line 1, in <module>
AttributeError: 'module' object has no attribute 'cssselect'
<module 'lxml' from '/data/buildout/eggs/lxml-2.3.4-py2.7-macosx-10.6-intel.egg/lxml/__init__.pyc'>
<module 'lxml.cssselect' from '/data/buildout/eggs/lxml-2.3.4-py2.7-macosx-10.6-intel.egg/lxml/cssselect.pyc'>
(The same thing happens if you instead import lxml.cssselect)
Oh, I didn’t know that. I’ll try and see, thanks.
Update for cssselect 0.3
Change to a single 'translator' param that accepts some string values.
Add a cssselect method to all elements, not just HtmlElement
translator defaults to 'xml' in _Element and 'html' in HtmlElement
Shrink the css tests
No need to duplicate the tests that are now in cssselect.
Now test_css.py only tests what is specific to lxml.
Update the doc/website for cssselect.
By the way, cssselect v0.5+ is required for translator='xhtml'. I’ll make sure that future versions will stay compatible so that lxml does not need to be patched further.
Ah, sorry, my bad. I should have checked. I had thought this method was already there and needed fixing, but it was only in lxml.html, not in lxml.etree. I didn't mean to add it to lxml.etree.
Do you mean that it should not be added there?
That's what I'm saying, yes. In any case, it's an addition to the API that is independent of the intention of this pull request.
Is there a real advantage of having it on all Elements, instead of just HtmlElements? I mean, it's convenient, ok, but it's not efficient and also won't work on all systems because it requires an external module to be installed. Making it a method feels too tightly integrated for an external module.
I'm really not sure, but I'm leaning towards considering the method on HtmlElements a legacy issue rather than a sparkling good idea. And if that's the case, we shouldn't add to it.
Ok, I’ll make two independent pull requests so they can be judged separately.
Revert "Add a cssselect method to all elements, not just HtmlElement"
This is an addition to the API that should be considered separatly
from using the new cssselect.
This reverts commit 7820b8d.
Fix the test suite for _Element.cssselect removal.
Looks good. Thanks for all the work!