External cssselect #46

Merged
merged 10 commits into from Apr 21, 2012

Projects

None yet

3 participants

@SimonSapin

This branch removes most of lxml.cssselect and uses the new cssselect project to provide the same functionality.

The new project distinguishes "generic" XML (no assumption is made on the meaning of element names) and HTML (has a more useful implementation of some pseudo-classes like :link or :checked.) This, the CSSSelect class as well as the cssselect method now have a translator argument that can be 'xml', 'html, 'xhtml', or a Translator object. (See the documentation.) The translator default to 'html' on HtmlElement.cssselect, 'xml' everywhere else.

The dependency is optional, so importing lxml.cssselect or calling the cssselect() method of elements can cause an ImportError at runtime.

I believe this fixes issues #34, #40, and #41.
#45 is partially addressed: all elements now have a cssselect method. However, html5parser still does not make HtmlElement objects so the user may need to pass translator='html' explicitly.
#20 is not resolved yet, but cssselect’s #3 is open about the same issue.

cssselect still has a few unresolved issues (filed on github), but the API should be stable now.

@lrowe

Instead of the sys.modules stuff below, you could simply have base_cssselect = __import__('cssselect') here instead.

For sys.modules['cssselect'] yes, but what about sys.modules['cssselect.xpath'] for example?

Assuming the cssselect package triggers those imports then you can just use base_cssselect.xpath, otherwise use base_cssselect = __import__('cssselect.xpath') to ensure that module is imported. See the example here.

__import__('cssselect.xpath') ensures that the submodule is imported, but I think not that it is reachable as an attribute of the parent package. Currently, cssutils/__init__.py does not have from cssselect import xpath but from cssselect.xpath import foo, bar.

What can avoid the use of sys.modules according to the doc is something like this:

xpath = __import__('cssselect.xpath', globals(), locals(), ['xpath_literal'], 0)

... but I’m not sure it is better.

The submodule is set as an attribute on the parent package when it is imported, consider the following:

>>> import lxml
>>> lxml.cssselect
Traceback (most recent call last):
  File "<console>", line 1, in <module>
AttributeError: 'module' object has no attribute 'cssselect'
>>> __import__('lxml.cssselect')
<module 'lxml' from '/data/buildout/eggs/lxml-2.3.4-py2.7-macosx-10.6-intel.egg/lxml/__init__.pyc'>
>>> lxml.cssselect
<module 'lxml.cssselect' from '/data/buildout/eggs/lxml-2.3.4-py2.7-macosx-10.6-intel.egg/lxml/cssselect.pyc'>

(The same thing happens if you instead import lxml.cssselect)

Oh, I didn’t know that. I’ll try and see, thanks.

@SimonSapin

By the way, cssselect v0.5+ is required for translator='xhtml'. I’ll make sure that future versions will stay compatible so that lxml does not need to be patched further.

@scoder
Member
scoder commented on 7820b8d Apr 20, 2012

Ah, sorry, my bad. I should have checked. I had thought this method was already there and needed fixing, but it was only in lxml.html, not in lxml.etree. I didn't mean to add it to lxml.etree.

Do you mean that it should not be added there?

Member

That's what I'm saying, yes. In any case, it's an addition to the API that is independent of the intention of this pull request.

Is there a real advantage of having it on all Elements, instead of just HtmlElements? I mean, it's convenient, ok, but it's not efficient and also won't work on all systems because it requires an external module to be installed. Making it a method feels too tightly integrated for an external module.

I'm really not sure, but I'm leaning towards considering the method on HtmlElements a legacy issue rather than a sparkling good idea. And if that's the case, we shouldn't add to it.

Ok, I’ll make two independent pull requests so they can be judged separately.

Member

Thanks

SimonSapin added some commits Apr 20, 2012
@SimonSapin SimonSapin Docstring fix. da3d1d3
@SimonSapin SimonSapin Revert "Add a cssselect method to all elements, not just HtmlElement"
This is an addition to the API that should be considered separatly
from using the new cssselect.

This reverts commit 7820b8d.
83f7f5b
@SimonSapin SimonSapin Fix the test suite for _Element.cssselect removal. 5acdb84
@SimonSapin SimonSapin referenced this pull request Apr 20, 2012
Closed

Cssselect method #47

@scoder
Member
scoder commented Apr 21, 2012

Looks good. Thanks for all the work!

@scoder scoder merged commit 651d012 into lxml:master Apr 21, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment