New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSS selectors #176

Closed
wants to merge 25 commits into
from

Conversation

Projects
None yet
@barraponto
Contributor

barraponto commented Sep 30, 2012

I'm a web developer and designer turned into a web scraper. Python is easy and I love it. Scrapy is wonderful. But XPath... it's a foreign language that mostly does what CSS selectors do — but those are second language to me. Anyone coming from jQuery or CSS will already be used to CSS selectors.

So I decided to improve scrapy to support CSS selectors, using the cssselect package that got extracted from the lxml package. I've created two Selector classes, HtmlCSSSelector and XmlCSSSelector, that basically translate the CSS selectors into XPath selectors and run the parent class select method. Easy Peasy.

I'm looking into how to provide tests for these new classes, but would love some guidance. This is my first contribution to a Python package.

@barraponto

This comment has been minimized.

Show comment
Hide comment
@barraponto

barraponto Sep 30, 2012

Contributor

Here is the tutorial spider re-implemented using HtmlCSSSelector: https://gist.github.com/3808079

Contributor

barraponto commented Sep 30, 2012

Here is the tutorial spider re-implemented using HtmlCSSSelector: https://gist.github.com/3808079

@brianDoherty

This comment has been minimized.

Show comment
Hide comment
@brianDoherty

brianDoherty Dec 4, 2012

I just used this feature to make scrapy download, parse, and test which css is unused on my company site. It's good stuff. If I could merge it, I would 👍

I just used this feature to make scrapy download, parse, and test which css is unused on my company site. It's good stuff. If I could merge it, I would 👍

@pablohoffman

This comment has been minimized.

Show comment
Hide comment
@pablohoffman

pablohoffman Dec 18, 2012

Member

Very nice patch!. It looks quite simple and useful, so it's definitely a +1 for me. Sorry for not being able to review it earlier.

Could we add some basic tests before merging it?

It would also be great to add a "CSS selectors" section to the Scrapy Selectors documentation although that is not a blocker and can be done after merging.

Member

pablohoffman commented Dec 18, 2012

Very nice patch!. It looks quite simple and useful, so it's definitely a +1 for me. Sorry for not being able to review it earlier.

Could we add some basic tests before merging it?

It would also be great to add a "CSS selectors" section to the Scrapy Selectors documentation although that is not a blocker and can be done after merging.

@barraponto

This comment has been minimized.

Show comment
Hide comment
@barraponto

barraponto Dec 18, 2012

Contributor

@pablohoffman thanks. This is my first contribution to a Python project, so I'm not sure my code is thoroughly pythonic, particularly 0da6be1 where I duplicate some methods for my CSSMixin. The method names mimic the original methods, although there might be a better approach.

As for the documentation, I'll get down to it tomorrow.

Contributor

barraponto commented Dec 18, 2012

@pablohoffman thanks. This is my first contribution to a Python project, so I'm not sure my code is thoroughly pythonic, particularly 0da6be1 where I duplicate some methods for my CSSMixin. The method names mimic the original methods, although there might be a better approach.

As for the documentation, I'll get down to it tomorrow.

@pablohoffman

This comment has been minimized.

Show comment
Hide comment
@pablohoffman

pablohoffman Dec 18, 2012

Member

Great @barraponto, I look forward to those changes to merge this PR.

The CSSMixin code looks OK to me.

One thing I'm not so sure is about using extras_require instead of (the more standard) install_requires. On one hand, it's more elegant/correct to do it that way, but on the other it makes installation slightly more complex with a somewhat obscure setuptools feature.

My reasons in favor of using install_requires:

  • CSS selectors would be a new nice core feature (anything not in scrapy.contrib is)
  • simplifies setup by giving you a fully functional (batteries included) installation of Scrapy, regardless of how you install it. As a policy, all stuff not in scrapy.contrib should run with the default dependencies.
  • cssselect is pure python, so multi-platform support shouldn't be an issue (because pip/easy_install should pull all dependencies automatically). If it were a binary package, I would worry much more about adding it as a mandatory dependency because of cross-platform issues.
  • we did the same with PyOpenSSL (in fact, we should probably remove the 'ssl' from optional features) and it's even a binary package.
  • cssselect seems like a reasonably well maintained project and not likely to go away soon. But, even if it goes away, we could replace it by another (better maintained) library like we did with libxml2 -> lxml, since the scrapy selectors API won't expose any internal cssselect API.
Member

pablohoffman commented Dec 18, 2012

Great @barraponto, I look forward to those changes to merge this PR.

The CSSMixin code looks OK to me.

One thing I'm not so sure is about using extras_require instead of (the more standard) install_requires. On one hand, it's more elegant/correct to do it that way, but on the other it makes installation slightly more complex with a somewhat obscure setuptools feature.

My reasons in favor of using install_requires:

  • CSS selectors would be a new nice core feature (anything not in scrapy.contrib is)
  • simplifies setup by giving you a fully functional (batteries included) installation of Scrapy, regardless of how you install it. As a policy, all stuff not in scrapy.contrib should run with the default dependencies.
  • cssselect is pure python, so multi-platform support shouldn't be an issue (because pip/easy_install should pull all dependencies automatically). If it were a binary package, I would worry much more about adding it as a mandatory dependency because of cross-platform issues.
  • we did the same with PyOpenSSL (in fact, we should probably remove the 'ssl' from optional features) and it's even a binary package.
  • cssselect seems like a reasonably well maintained project and not likely to go away soon. But, even if it goes away, we could replace it by another (better maintained) library like we did with libxml2 -> lxml, since the scrapy selectors API won't expose any internal cssselect API.
@barraponto

This comment has been minimized.

Show comment
Hide comment
@barraponto

barraponto Dec 19, 2012

Contributor

Adding CSS selectors to scrapy shell already assuming cssselect will be a default dependency, not an optional one.

Contributor

barraponto commented Dec 19, 2012

Adding CSS selectors to scrapy shell already assuming cssselect will be a default dependency, not an optional one.

@jcswart

This comment has been minimized.

Show comment
Hide comment
@jcswart

jcswart Dec 19, 2012

As a frequent user, I think this is an awesome addition. I was about to begin a new project and figure out how to integrate PyQuery with scrapy. Having something baked in, or nearly so would be wonderful!

Is there any I currently need to do to scrapy to get this functionality other than 'import' HtmlCSSSelector ?

jcswart commented Dec 19, 2012

As a frequent user, I think this is an awesome addition. I was about to begin a new project and figure out how to integrate PyQuery with scrapy. Having something baked in, or nearly so would be wonderful!

Is there any I currently need to do to scrapy to get this functionality other than 'import' HtmlCSSSelector ?

@barraponto

This comment has been minimized.

Show comment
Hide comment
@barraponto

barraponto Dec 19, 2012

Contributor

@jcswart i guess someone already integrated pyquery, check this thread https://groups.google.com/forum/?fromgroups=#!topic/scrapy-developers/OnQ5eOvGz5k

Contributor

barraponto commented Dec 19, 2012

@jcswart i guess someone already integrated pyquery, check this thread https://groups.google.com/forum/?fromgroups=#!topic/scrapy-developers/OnQ5eOvGz5k

@pablohoffman

This comment has been minimized.

Show comment
Hide comment
@pablohoffman

pablohoffman Dec 28, 2012

Member

Hey @barraponto, I hope you have a chance to make those minor changes soon, so we can integrate this new functionality into master branch and have some time to test it before the 0.18 release. AFAIK, the only two changes blocking this merge are:

  • move csselect dependency to install_requires
  • add some basic test
Member

pablohoffman commented Dec 28, 2012

Hey @barraponto, I hope you have a chance to make those minor changes soon, so we can integrate this new functionality into master branch and have some time to test it before the 0.18 release. AFAIK, the only two changes blocking this merge are:

  • move csselect dependency to install_requires
  • add some basic test
@barraponto

This comment has been minimized.

Show comment
Hide comment
@barraponto

barraponto Dec 29, 2012

Contributor

@pablohoffman I'm currently working on documentation (slowly, blame the holidays). I thought someone in #scrapy@freenode was already working on making cssselect a hard dependency. That's something I can do on my own, though. But I have no experience in writing tests for scrapy, I don't think I'll be able to deliver that any soon, so I'd be glad if someone from scrapinghub stepped up either to write it or mentor me.

Contributor

barraponto commented Dec 29, 2012

@pablohoffman I'm currently working on documentation (slowly, blame the holidays). I thought someone in #scrapy@freenode was already working on making cssselect a hard dependency. That's something I can do on my own, though. But I have no experience in writing tests for scrapy, I don't think I'll be able to deliver that any soon, so I'd be glad if someone from scrapinghub stepped up either to write it or mentor me.

@pablohoffman

This comment has been minimized.

Show comment
Hide comment
@pablohoffman

pablohoffman Dec 31, 2012

Member

@barraponto let's merge the PR once you finish the documentation, and we can add the tests afterwards (very simple, if you know how). The change to hard dependency is also a very simple, so I don't mind doing it once the PR is merged.

Member

pablohoffman commented Dec 31, 2012

@barraponto let's merge the PR once you finish the documentation, and we can add the tests afterwards (very simple, if you know how). The change to hard dependency is also a very simple, so I don't mind doing it once the PR is merged.

@barraponto

This comment has been minimized.

Show comment
Hide comment
@barraponto

barraponto Jan 4, 2013

Contributor

Ok, I guess I'm done here.

Contributor

barraponto commented Jan 4, 2013

Ok, I guess I'm done here.

@pablohoffman

View changes

scrapy/selector/csssel.py
+ def get(self, attr):
+ return self.__class__(flatten([x.get(attr) for x in self]))
+
+ def text(self, recursive=False):

This comment has been minimized.

@pablohoffman

pablohoffman Jan 4, 2013

Member

Here it uses recursive but below it uses all, should we normalize the argument names?. Also, the documentation only refers to all.

@pablohoffman

pablohoffman Jan 4, 2013

Member

Here it uses recursive but below it uses all, should we normalize the argument names?. Also, the documentation only refers to all.

@barraponto

This comment has been minimized.

Show comment
Hide comment
@barraponto

barraponto Jan 4, 2013

Contributor

Good catch, I should have updated it when I changed lxml.selector.CSSSelectorMixin.text. Rebased for a linear merge.

Contributor

barraponto commented Jan 4, 2013

Good catch, I should have updated it when I changed lxml.selector.CSSSelectorMixin.text. Rebased for a linear merge.

@barraponto

This comment has been minimized.

Show comment
Hide comment
@barraponto

barraponto Jan 10, 2013

Contributor

I don't really like :text or :attribute() pseudo-classes because in CSS Selectors, pseudo-classes are meant to select element nodes. ::text and ::attr() pseudo-elements would be better, but cssselect package doesn't support pseudo-elements (and with Shadow DOM coming in CSS4, we might conflict with new pseudo-elements).

If this change is meant only for ItemLoaders support, I'd rather create a CSSItemLoader class.

Contributor

barraponto commented Jan 10, 2013

I don't really like :text or :attribute() pseudo-classes because in CSS Selectors, pseudo-classes are meant to select element nodes. ::text and ::attr() pseudo-elements would be better, but cssselect package doesn't support pseudo-elements (and with Shadow DOM coming in CSS4, we might conflict with new pseudo-elements).

If this change is meant only for ItemLoaders support, I'd rather create a CSSItemLoader class.

@barraponto

This comment has been minimized.

Show comment
Hide comment
@barraponto

barraponto Jan 10, 2013

Contributor

Actually, when it comes to frontend-turned-scrapers (like myself), I think pseudo-classes trump new methods. I mean, I see no harm in actually leaving the methods in, but I appreciate the pseudo-classes. Made-up pseudo-classes is something jQuery used too, and they sure target novice frontend developers.

Contributor

barraponto commented Jan 10, 2013

Actually, when it comes to frontend-turned-scrapers (like myself), I think pseudo-classes trump new methods. I mean, I see no harm in actually leaving the methods in, but I appreciate the pseudo-classes. Made-up pseudo-classes is something jQuery used too, and they sure target novice frontend developers.

@dangra

This comment has been minimized.

Show comment
Hide comment
@dangra

dangra Jan 10, 2013

Member

I was looking at jquery css selectors and found lot of non-standard pseudo-classes.

One that surprised me was :text but it is used in a very different way. It doesn't select the text nodes in xpath sense, but instead input elements of type text.

I think adding pseudo-classes is convenient and avoids the extra methods in selector's api which we try to keep minimal and common across all backends.

what do you think about renaming :text to :text-node and :attribute() to :attribute-node() to avoid future clashing and make it clear that it references text nodes and attribute nodes in xpath sense.

Member

dangra commented Jan 10, 2013

I was looking at jquery css selectors and found lot of non-standard pseudo-classes.

One that surprised me was :text but it is used in a very different way. It doesn't select the text nodes in xpath sense, but instead input elements of type text.

I think adding pseudo-classes is convenient and avoids the extra methods in selector's api which we try to keep minimal and common across all backends.

what do you think about renaming :text to :text-node and :attribute() to :attribute-node() to avoid future clashing and make it clear that it references text nodes and attribute nodes in xpath sense.

@dangra

This comment has been minimized.

Show comment
Hide comment
@dangra

dangra Jan 10, 2013

Member

the reason to avoid :text is that JQuery CSS selectors are the facto reference for frontend developers.
Most probaby will assume/expect some if not all JQuery extensions to CSS selectors to be available in Scrapy

Member

dangra commented Jan 10, 2013

the reason to avoid :text is that JQuery CSS selectors are the facto reference for frontend developers.
Most probaby will assume/expect some if not all JQuery extensions to CSS selectors to be available in Scrapy

@dangra

This comment has been minimized.

Show comment
Hide comment
@dangra

dangra Jan 10, 2013

Member

btw, I agree pseudo-elements are a better option and looking at cssselect source code I see adding support is fairly simple

Member

dangra commented Jan 10, 2013

btw, I agree pseudo-elements are a better option and looking at cssselect source code I see adding support is fairly simple

@dangra

This comment has been minimized.

Show comment
Hide comment
@dangra

dangra Jan 10, 2013

Member

I have confirmed with cssselect developer than extending translators is a feature he wants to keep in the future, but he doesn't commit to an stable API yet.

Also sent a pull request scrapy/cssselect#22 to address one of the workarounds I used for XPathExpr

Member

dangra commented Jan 10, 2013

I have confirmed with cssselect developer than extending translators is a feature he wants to keep in the future, but he doesn't commit to an stable API yet.

Also sent a pull request scrapy/cssselect#22 to address one of the workarounds I used for XPathExpr

@barraponto

This comment has been minimized.

Show comment
Hide comment
@barraponto

barraponto Jan 10, 2013

Contributor

A quick excerpt from CSS Selectors Module Level 3:

Pseudo-elements create abstractions about the document tree beyond those specified by the document language. For instance, document languages do not offer mechanisms to access the first letter or first line of an element's content. Pseudo-elements allow authors to refer to this otherwise inaccessible information. Pseudo-elements may also provide authors a way to refer to content that does not exist in the source document (e.g., the ::before and ::after pseudo-elements give access to generated content).

Only one pseudo-element may appear per selector, and if present it must appear after the sequence of simple selectors that represents the subjects of the selector.

Mind the last paragraph. I started porting a scraper to the pseudo-class syntax introduced earlier, and it feels weird: h3:text shouldn't select anything, because there's no h3 element that is a text node, so I had to write h3 :text which implies h3 *:text. Just weird, and opens up to weird stuff like h3 > :text or h3 + :attribute.

Pseudo-elements, on the other hand, are anchored to the last selector, like h3::text (getting any text content inside all of h3 elements) or h3::string getting all of the text contents inside h3 elements (hello world in <h3>hello </span>world</span></h3>).

By the way, attr() is already used in CSS as a value for the content property (see specs). So I guess ::attr() is better than :attribute.

Contributor

barraponto commented Jan 10, 2013

A quick excerpt from CSS Selectors Module Level 3:

Pseudo-elements create abstractions about the document tree beyond those specified by the document language. For instance, document languages do not offer mechanisms to access the first letter or first line of an element's content. Pseudo-elements allow authors to refer to this otherwise inaccessible information. Pseudo-elements may also provide authors a way to refer to content that does not exist in the source document (e.g., the ::before and ::after pseudo-elements give access to generated content).

Only one pseudo-element may appear per selector, and if present it must appear after the sequence of simple selectors that represents the subjects of the selector.

Mind the last paragraph. I started porting a scraper to the pseudo-class syntax introduced earlier, and it feels weird: h3:text shouldn't select anything, because there's no h3 element that is a text node, so I had to write h3 :text which implies h3 *:text. Just weird, and opens up to weird stuff like h3 > :text or h3 + :attribute.

Pseudo-elements, on the other hand, are anchored to the last selector, like h3::text (getting any text content inside all of h3 elements) or h3::string getting all of the text contents inside h3 elements (hello world in <h3>hello </span>world</span></h3>).

By the way, attr() is already used in CSS as a value for the content property (see specs). So I guess ::attr() is better than :attribute.

@dangra

This comment has been minimized.

Show comment
Hide comment
@dangra

dangra Jan 10, 2013

Member

hi @SimonSapin, We would like to add pseudo-elements support to cssselect

The goal is to extend GenericTranslator and add ::text and ::attribute() pseudo-elements

The approach I am willing to take is very similar to how other types and its hooks are handled but without any pseudo-element implemented by default in cssselect.

what do you think? is this something you want to see merged anytime soon?

Member

dangra commented Jan 10, 2013

hi @SimonSapin, We would like to add pseudo-elements support to cssselect

The goal is to extend GenericTranslator and add ::text and ::attribute() pseudo-elements

The approach I am willing to take is very similar to how other types and its hooks are handled but without any pseudo-element implemented by default in cssselect.

what do you think? is this something you want to see merged anytime soon?

@SimonSapin

This comment has been minimized.

Show comment
Hide comment
@SimonSapin

SimonSapin Jan 10, 2013

No. cssselect implements Selectors, nothing else. Feel free to extend it to implement a super-set or, well, anything; but this is not the same project. Pyquery sounds more like the project you’re looking for. cssselect might at some point implement next levels of Selectors, or proposals for features meant to go into the Selectors spec eventually. But Selectors will only ever apply to elements, not attribute or text nodes.

(For example, WeasyPrint uses cssselect to apply CSS stylesheets to a tree of elements. ::text does not make sense in that context.)

By the way, the way pseudo-elements are handled in cssselect is completely different from pseudo-classes. You’ll need to work around that if you want to use the pseudo-element syntax for eg. ::text.

No. cssselect implements Selectors, nothing else. Feel free to extend it to implement a super-set or, well, anything; but this is not the same project. Pyquery sounds more like the project you’re looking for. cssselect might at some point implement next levels of Selectors, or proposals for features meant to go into the Selectors spec eventually. But Selectors will only ever apply to elements, not attribute or text nodes.

(For example, WeasyPrint uses cssselect to apply CSS stylesheets to a tree of elements. ::text does not make sense in that context.)

By the way, the way pseudo-elements are handled in cssselect is completely different from pseudo-classes. You’ll need to work around that if you want to use the pseudo-element syntax for eg. ::text.

@dangra

This comment has been minimized.

Show comment
Hide comment
@dangra

dangra Jan 10, 2013

Member

@SimonSapin: sorry, I probably messed my words

I know pseudo-elements are part of Selectors, in fact there are 4 defined in the W3C recommendation for selectors but I don't plan/want to add them to cssselect.

I was asking to remove the constraint imposed by cssselect/xpath.py#L183, and leaving GenericTranslator hooks the chance to convert them to xpath, and fail by default as it does now if the pseudo-element hook handler is not defined as a translator method.

::text and ::attribute() are going to be part of Scrapy extensions only, and not implemented/merged into cssselect project.

Member

dangra commented Jan 10, 2013

@SimonSapin: sorry, I probably messed my words

I know pseudo-elements are part of Selectors, in fact there are 4 defined in the W3C recommendation for selectors but I don't plan/want to add them to cssselect.

I was asking to remove the constraint imposed by cssselect/xpath.py#L183, and leaving GenericTranslator hooks the chance to convert them to xpath, and fail by default as it does now if the pseudo-element hook handler is not defined as a translator method.

::text and ::attribute() are going to be part of Scrapy extensions only, and not implemented/merged into cssselect project.

@SimonSapin

This comment has been minimized.

Show comment
Hide comment
@SimonSapin

SimonSapin Jan 10, 2013

Just bypass translator.css_to_xpath(string) in Scrapy and call translator.selector_to_xpath(selector) or even translator.xpath(selector.parsed_tree) directly, or override one of them.

Just bypass translator.css_to_xpath(string) in Scrapy and call translator.selector_to_xpath(selector) or even translator.xpath(selector.parsed_tree) directly, or override one of them.

@dangra

This comment has been minimized.

Show comment
Hide comment
@dangra

dangra Jan 10, 2013

Member

I am aware of that, but it's always better to have the change made into upstream than extending myself :)

I will play a bit more with the idea and send you a PR if something worthwhile comes out.

thanks

Member

dangra commented Jan 10, 2013

I am aware of that, but it's always better to have the change made into upstream than extending myself :)

I will play a bit more with the idea and send you a PR if something worthwhile comes out.

thanks

@SimonSapin

This comment has been minimized.

Show comment
Hide comment
@SimonSapin

SimonSapin Jan 10, 2013

Well, it’ll depend on what change exactly you want. But this is the point of having multiple points of entry like .css_to_xpath() vs. .selector_to_xpath().

Well, it’ll depend on what change exactly you want. But this is the point of having multiple points of entry like .css_to_xpath() vs. .selector_to_xpath().

@barraponto

This comment has been minimized.

Show comment
Hide comment
@barraponto

barraponto Jan 10, 2013

Contributor

cssselect.parser.parse already parses pseudo elements, saving us a lot of work. cssselect.xpath.css_to_xpath is a pretty small method, pretty easy to overwrite. I guess it's the best approach right now, if we develop a pluggable way to support pseudo-elements we can contribute it upstream.

Pseudo-elements are, in a way, very specific to its environment and use case. There's no reason XPath would support ::before or ::-webkit-slider-thumb. But scrapy has good use case for it and any other project using cssselect might have as well.

Contributor

barraponto commented Jan 10, 2013

cssselect.parser.parse already parses pseudo elements, saving us a lot of work. cssselect.xpath.css_to_xpath is a pretty small method, pretty easy to overwrite. I guess it's the best approach right now, if we develop a pluggable way to support pseudo-elements we can contribute it upstream.

Pseudo-elements are, in a way, very specific to its environment and use case. There's no reason XPath would support ::before or ::-webkit-slider-thumb. But scrapy has good use case for it and any other project using cssselect might have as well.

@dangra

This comment has been minimized.

Show comment
Hide comment
@dangra

dangra Jan 30, 2013

Member

trying to recap on this issue

We want to:

  • Select text nodes and attributes using CSS selectors
  • Use pseudo-element ::text to extract text nodes similar to JQuery .text()
  • Use pseudo-element ::attr(name) to extract attribute values simlar to .attr()
  • Use pseudo-element ::html to extract node content similar to .html() ??

Problems:

  • cssselect can parse pseudo-elements but doesn't provide hooks translating them to xpath
  • function-like pseudo-elements (i.e.: ::attr() ) are not parseable by cssselect and it is not defined by http://www.w3.org/TR/selectors/#pseudo-elements
  • JQuery defines a :text pseudo class that conflicts with ::text pseudo element

(Re)proposal

I think we better settle to use pseudo-classes instead:

  • Function-like pseudo-classes just works (aka :attr(name))
  • No need to extend cssselect
  • We have a working implmentation right now
  • Only left adapting content extraction to JQuery behaviors
  • Use :text-content and :html-content to avoid collisions with selectors defined by JQuery

what others think?

Member

dangra commented Jan 30, 2013

trying to recap on this issue

We want to:

  • Select text nodes and attributes using CSS selectors
  • Use pseudo-element ::text to extract text nodes similar to JQuery .text()
  • Use pseudo-element ::attr(name) to extract attribute values simlar to .attr()
  • Use pseudo-element ::html to extract node content similar to .html() ??

Problems:

  • cssselect can parse pseudo-elements but doesn't provide hooks translating them to xpath
  • function-like pseudo-elements (i.e.: ::attr() ) are not parseable by cssselect and it is not defined by http://www.w3.org/TR/selectors/#pseudo-elements
  • JQuery defines a :text pseudo class that conflicts with ::text pseudo element

(Re)proposal

I think we better settle to use pseudo-classes instead:

  • Function-like pseudo-classes just works (aka :attr(name))
  • No need to extend cssselect
  • We have a working implmentation right now
  • Only left adapting content extraction to JQuery behaviors
  • Use :text-content and :html-content to avoid collisions with selectors defined by JQuery

what others think?

@artemdevel

This comment has been minimized.

Show comment
Hide comment
@artemdevel

artemdevel Jan 30, 2013

Contributor

to be honest I don't really like the motivation for this feature, xpath is a great thing and it definitely worth to learn. also I'm not sure if css selectors will improve scraping greatly. and yes, I'm a bit conservative so this is just my humble opinion :)

Contributor

artemdevel commented Jan 30, 2013

to be honest I don't really like the motivation for this feature, xpath is a great thing and it definitely worth to learn. also I'm not sure if css selectors will improve scraping greatly. and yes, I'm a bit conservative so this is just my humble opinion :)

@barraponto

This comment has been minimized.

Show comment
Hide comment
@barraponto

barraponto Jan 30, 2013

Contributor

XPath is great. By now I'm writing a lot of XPath, no issues so far. It is able to select stuff CSS can't. However, CSS selectors are far more popular, particularly with web developers. I thought this could be a great help for novice scrapers.

@dangra I agree with all of the proposed CSS pseudo-elements. Keep in mind that while pseudo-classes select elements, pseudo-elements don't. They select stuff that is "imagined" by the environment running the CSS selector (such as ::before pseudo-elements). It fits our use case precisely. jQuery use of :text pseudo-class makes a lot of sense: it selects input elements by its type (a pseudo class). Our use of ::text also makes sense: we're selecting the text content of the element (a pseudo element). Is it clear?

Now comes the issue of implementing it. cssselect already parses pseudo-classes, it just throws it away. There's room to implement our pseudo-elements, right?

Contributor

barraponto commented Jan 30, 2013

XPath is great. By now I'm writing a lot of XPath, no issues so far. It is able to select stuff CSS can't. However, CSS selectors are far more popular, particularly with web developers. I thought this could be a great help for novice scrapers.

@dangra I agree with all of the proposed CSS pseudo-elements. Keep in mind that while pseudo-classes select elements, pseudo-elements don't. They select stuff that is "imagined" by the environment running the CSS selector (such as ::before pseudo-elements). It fits our use case precisely. jQuery use of :text pseudo-class makes a lot of sense: it selects input elements by its type (a pseudo class). Our use of ::text also makes sense: we're selecting the text content of the element (a pseudo element). Is it clear?

Now comes the issue of implementing it. cssselect already parses pseudo-classes, it just throws it away. There's room to implement our pseudo-elements, right?

@dangra

This comment has been minimized.

Show comment
Hide comment
@dangra

dangra Jan 30, 2013

Member

On Wed, Jan 30, 2013 at 7:04 PM, Capi Etheriel notifications@github.comwrote:

XPath is great. By now I'm writing a lot of XPath, no issues so far. It is
able to select stuff CSS can't. However, CSS selectors are far more
popular
, particularly with web developers. I thought this could be a
great help for novice scrapers.

I agree, XPath is more expressive but CSS is more friendly for newcomers.
Worth saying xpath is not going anywhere, it will be supported by Scrapy
selectors forever :)

@dangra https://github.com/dangra I agree with all of the proposed CSS
pseudo-elements.

Ok.

Keep in mind that while pseudo-classes select elements, pseudo-elements
don't. They select stuff that is "imagined" by the environment running the
CSS selector (such as ::before pseudo-elements). It fits our use case
precisely.

It does except pseudo-elements doesn't allow functions like the ones we
need to select attributes with ::attr(name)

I can be missing something from the spec, but cssselect doesn't support
them and spec doesn't have an example like this.
We will need to extend cssselect parser (not just translator) to support
function pseudo elements.

jQuery use of :text pseudo-class makes a lot of sense: it selects input
elements by its type (a pseudo class). Our use of ::text also makes
sense: we're selecting the text content of the element (a pseudo element).
Is it clear?

I understand :text and ::text are different things, but they
are undoubtedly similar.
Anyway, I have no problem with ::text If we can keep everything as
pseudo-elements and that includes ::attr()

Now comes the issue of implementing it. cssselect already parses
pseudo-classes, it just throws it away. There's room to implement our
pseudo-elements, right?

I think you mean it already parse and throw pseudo-elements.

It does, but as said above, it can't parse function-like pseudo elements
and adding support will require cssselect upstream approval, or
maintaining a fork which doesn't worth it IMHO.

I am OK to continue with pseudo-elements only If we can get upstream
support to parse functions even if they are throw away like it happens now.

@barraponto: can you say if CSS selectors spec allow function-like psuedo
elements?

Member

dangra commented Jan 30, 2013

On Wed, Jan 30, 2013 at 7:04 PM, Capi Etheriel notifications@github.comwrote:

XPath is great. By now I'm writing a lot of XPath, no issues so far. It is
able to select stuff CSS can't. However, CSS selectors are far more
popular
, particularly with web developers. I thought this could be a
great help for novice scrapers.

I agree, XPath is more expressive but CSS is more friendly for newcomers.
Worth saying xpath is not going anywhere, it will be supported by Scrapy
selectors forever :)

@dangra https://github.com/dangra I agree with all of the proposed CSS
pseudo-elements.

Ok.

Keep in mind that while pseudo-classes select elements, pseudo-elements
don't. They select stuff that is "imagined" by the environment running the
CSS selector (such as ::before pseudo-elements). It fits our use case
precisely.

It does except pseudo-elements doesn't allow functions like the ones we
need to select attributes with ::attr(name)

I can be missing something from the spec, but cssselect doesn't support
them and spec doesn't have an example like this.
We will need to extend cssselect parser (not just translator) to support
function pseudo elements.

jQuery use of :text pseudo-class makes a lot of sense: it selects input
elements by its type (a pseudo class). Our use of ::text also makes
sense: we're selecting the text content of the element (a pseudo element).
Is it clear?

I understand :text and ::text are different things, but they
are undoubtedly similar.
Anyway, I have no problem with ::text If we can keep everything as
pseudo-elements and that includes ::attr()

Now comes the issue of implementing it. cssselect already parses
pseudo-classes, it just throws it away. There's room to implement our
pseudo-elements, right?

I think you mean it already parse and throw pseudo-elements.

It does, but as said above, it can't parse function-like pseudo elements
and adding support will require cssselect upstream approval, or
maintaining a fork which doesn't worth it IMHO.

I am OK to continue with pseudo-elements only If we can get upstream
support to parse functions even if they are throw away like it happens now.

@barraponto: can you say if CSS selectors spec allow function-like psuedo
elements?

@brianDoherty

This comment has been minimized.

Show comment
Hide comment
@brianDoherty

brianDoherty Jan 30, 2013

I can atleast attest to the CSS selectors working alright for me using @barraponto new additional the code base. It worked well for my use case of checking css code coverage on a site.

Does it perhaps make sense to get started with just this implementation for now to get some basic CSS selector support in scrapy, and than revisit things like pseudo selectors later?

I can atleast attest to the CSS selectors working alright for me using @barraponto new additional the code base. It worked well for my use case of checking css code coverage on a site.

Does it perhaps make sense to get started with just this implementation for now to get some basic CSS selector support in scrapy, and than revisit things like pseudo selectors later?

@SimonSapin

This comment has been minimized.

Show comment
Hide comment
@SimonSapin

SimonSapin Jan 30, 2013

Hi all,

As you have noted, the cssselect parser already supports simple (non-functional) pseudo-elements. They’re available in the public API for you to use however you want. You could append to the generated XPath or use the lxml API or something: root.xpath(expr + '/text()'); [el.text for el in root.xpath(foo)]

While pseudo-elements select something other then element themselves, pseudo-classes are more like boolean tests that filter elements. You could use pseudo-class syntax for pseudo-elements if you like, but conceptually they are very different. Right now you can override a cssselect translator to add pseudo-classes, but this is not a public API (ie. I don’t promise not to break it in the next version.) Too many internal details that I would still like to change.

If there is more demand for custom pseudo-classes, cssselect could have a public API where you register pseudo-classes and how to translate them to a boolean XPath expression. This is something well-delimited that I could maintain when I change internal details. I would gladly review and merge a pull request adding this. I might even add it myself at some point, but no promise on when that would happen.

Functional pseudo-classes or pseudo-elements are more tricky because you have to represent the arguments somehow. The most general is to provide a list of unparsed tokens, but frankly cssselect’s current tokenizer sucks and I wouldn’t want to set it in stone. But there is hope: we are working hard on a new css3-syntax draft and it’s definitely in my long term plans to have this tokenizer in cssselect.

In the meantime, there could a set of pre-defined argument formats. For example, name is a single identifier in ::attr(name). Selectors Level 3 has no functional pseudo-elements, but there is a proposal to add some. They’re definitely possible to add in the parser. (But see above for arguments.)

Hi all,

As you have noted, the cssselect parser already supports simple (non-functional) pseudo-elements. They’re available in the public API for you to use however you want. You could append to the generated XPath or use the lxml API or something: root.xpath(expr + '/text()'); [el.text for el in root.xpath(foo)]

While pseudo-elements select something other then element themselves, pseudo-classes are more like boolean tests that filter elements. You could use pseudo-class syntax for pseudo-elements if you like, but conceptually they are very different. Right now you can override a cssselect translator to add pseudo-classes, but this is not a public API (ie. I don’t promise not to break it in the next version.) Too many internal details that I would still like to change.

If there is more demand for custom pseudo-classes, cssselect could have a public API where you register pseudo-classes and how to translate them to a boolean XPath expression. This is something well-delimited that I could maintain when I change internal details. I would gladly review and merge a pull request adding this. I might even add it myself at some point, but no promise on when that would happen.

Functional pseudo-classes or pseudo-elements are more tricky because you have to represent the arguments somehow. The most general is to provide a list of unparsed tokens, but frankly cssselect’s current tokenizer sucks and I wouldn’t want to set it in stone. But there is hope: we are working hard on a new css3-syntax draft and it’s definitely in my long term plans to have this tokenizer in cssselect.

In the meantime, there could a set of pre-defined argument formats. For example, name is a single identifier in ::attr(name). Selectors Level 3 has no functional pseudo-elements, but there is a proposal to add some. They’re definitely possible to add in the parser. (But see above for arguments.)

@barraponto

This comment has been minimized.

Show comment
Hide comment
@barraponto

barraponto Jan 31, 2013

Contributor

I agree with @brianDoherty — while pseudoelements are welcome and add syntactic sugar to CSS Selectors in Scrapy, they're not release blockers. We should make sure that CSS Selectors are available in the next release of Scrapy and add those pseudo-elements in a follow-up issue.

Also, I believe they should be extendable with XPath, something I suggested earlier as the CSSSelector.xpath() method, otherwise they're very limited. Keep in mind that hs.select('.menu a').xpath('@href') is still very simple and readable.

Thanks for the input @SimonSapin, it's always welcome. Would you accept a pull request to cssselect that adds API to register pseudo-classes and pseudo-elements?

Contributor

barraponto commented Jan 31, 2013

I agree with @brianDoherty — while pseudoelements are welcome and add syntactic sugar to CSS Selectors in Scrapy, they're not release blockers. We should make sure that CSS Selectors are available in the next release of Scrapy and add those pseudo-elements in a follow-up issue.

Also, I believe they should be extendable with XPath, something I suggested earlier as the CSSSelector.xpath() method, otherwise they're very limited. Keep in mind that hs.select('.menu a').xpath('@href') is still very simple and readable.

Thanks for the input @SimonSapin, it's always welcome. Would you accept a pull request to cssselect that adds API to register pseudo-classes and pseudo-elements?

@mquandalle

This comment has been minimized.

Show comment
Hide comment
@mquandalle

mquandalle Jun 17, 2013

Any news on this?

Any news on this?

@pablohoffman

This comment has been minimized.

Show comment
Hide comment
@pablohoffman

pablohoffman Jun 17, 2013

Member

@barraponto agreed to wrap this up a couple of weeks ago. If he's not able to, maybe someone else can pick it up. @barraponto ?

Member

pablohoffman commented Jun 17, 2013

@barraponto agreed to wrap this up a couple of weeks ago. If he's not able to, maybe someone else can pick it up. @barraponto ?

@barraponto

This comment has been minimized.

Show comment
Hide comment
@barraponto

barraponto Jun 17, 2013

Contributor

Yeah, I did stumble to implement CSS pseudo-element syntax in cssselect. I'd love some help. RIght now my issue is adding them to the parser.

Contributor

barraponto commented Jun 17, 2013

Yeah, I did stumble to implement CSS pseudo-element syntax in cssselect. I'd love some help. RIght now my issue is adding them to the parser.

@Blender3D

This comment has been minimized.

Show comment
Hide comment
@Blender3D

Blender3D Jul 10, 2013

Contributor

@barraponto What part are you stuck on? I looked at cssselect's source and it doesn't seem make any assumptions about the names of the pseudo-elements.

Contributor

Blender3D commented Jul 10, 2013

@barraponto What part are you stuck on? I looked at cssselect's source and it doesn't seem make any assumptions about the names of the pseudo-elements.

@redapple

This comment has been minimized.

Show comment
Hide comment
@redapple

redapple Jul 12, 2013

Contributor

@barraponto could you post your latest code so that we can help?
(you could create a new branch on your fork, distinct of this pull request if you want to show us your current status and dont want to mess with this pull request)

Contributor

redapple commented Jul 12, 2013

@barraponto could you post your latest code so that we can help?
(you could create a new branch on your fork, distinct of this pull request if you want to show us your current status and dont want to mess with this pull request)

@paulproteus paulproteus referenced this pull request in paulproteus/scrapy-slides-sphinx Jul 23, 2013

Closed

Emphasize CSS selectors over XPath #4

@mquandalle

This comment has been minimized.

Show comment
Hide comment

ping

@pablohoffman

This comment has been minimized.

Show comment
Hide comment
@pablohoffman

pablohoffman Sep 8, 2013

Member

Looks like @barraponto went cold, can someone else take over this work?. I'd love to see CSS selectors merged soon.

Member

pablohoffman commented Sep 8, 2013

Looks like @barraponto went cold, can someone else take over this work?. I'd love to see CSS selectors merged soon.

@abevoelker

This comment has been minimized.

Show comment
Hide comment
@abevoelker

abevoelker Sep 14, 2013

First, thanks so much for this awesome project. There is nothing even close to it in Ruby-land, so I am picking up Python just to be able to use scrapy! Well done.

However, it's a major bummer to see that scrapy doesn't natively support CSS selectors. I'm trying to replace a few scrapers that I cobbled together in Ruby, and they all used CSS selectors due to my familiarity with them and because the excellent Nokogiri HTML/XML parsing (Ruby) library has great support for CSS3 selectors. I do know a little bit of XPath, but I have some complicated CSS selectors for some sites that are a challenge to scrape due to their mutable structure, so I will have to learn XPath more in-depth in order to properly convert them. I understand XPath is a valuable thing to know and it will broaden my toolset, but it's yet another impediment to me using scrapy, and I'm already anxious! :-)

Just my $0.02 as someone new to both scrapy and Python.

First, thanks so much for this awesome project. There is nothing even close to it in Ruby-land, so I am picking up Python just to be able to use scrapy! Well done.

However, it's a major bummer to see that scrapy doesn't natively support CSS selectors. I'm trying to replace a few scrapers that I cobbled together in Ruby, and they all used CSS selectors due to my familiarity with them and because the excellent Nokogiri HTML/XML parsing (Ruby) library has great support for CSS3 selectors. I do know a little bit of XPath, but I have some complicated CSS selectors for some sites that are a challenge to scrape due to their mutable structure, so I will have to learn XPath more in-depth in order to properly convert them. I understand XPath is a valuable thing to know and it will broaden my toolset, but it's yet another impediment to me using scrapy, and I'm already anxious! :-)

Just my $0.02 as someone new to both scrapy and Python.

@redapple redapple referenced this pull request in scrapy/cssselect Sep 14, 2013

Closed

Support for functional pseudo-elements #29

@redapple

This comment has been minimized.

Show comment
Hide comment
@redapple

redapple Sep 14, 2013

Contributor

@abevoelker, for CSS selectors support, I have been using lxml.cssselect directly, which is based on this very cssselect package that is referred throughout this PR.

With XPath and CSS selectors support, I consider lxml(+cssselect) as the Python cousin of Nokogiri. And you can very well use lxml within Scrapy callbacks (that's what I do quite often).

But then of course using cssselect with Scrapy Item Loaders is not so straightforward.

Contributor

redapple commented Sep 14, 2013

@abevoelker, for CSS selectors support, I have been using lxml.cssselect directly, which is based on this very cssselect package that is referred throughout this PR.

With XPath and CSS selectors support, I consider lxml(+cssselect) as the Python cousin of Nokogiri. And you can very well use lxml within Scrapy callbacks (that's what I do quite often).

But then of course using cssselect with Scrapy Item Loaders is not so straightforward.

@redapple

This comment has been minimized.

Show comment
Hide comment
@redapple

redapple Sep 14, 2013

Contributor

Not sure who won between pseudo-classes and (functional) pseudo-elements fo this Scrapy PR but I opened a PR to cssselect with an attempt to support the latter: scrapy/cssselect#29

I added a test on how to extend cssselect Translator with custom functions: see test_custom_translation in https://github.com/redapple/cssselect/blob/186bfdafd660a32cf8b25d7839ae50392963839b/cssselect/tests.py

Comments welcome there.

Contributor

redapple commented Sep 14, 2013

Not sure who won between pseudo-classes and (functional) pseudo-elements fo this Scrapy PR but I opened a PR to cssselect with an attempt to support the latter: scrapy/cssselect#29

I added a test on how to extend cssselect Translator with custom functions: see test_custom_translation in https://github.com/redapple/cssselect/blob/186bfdafd660a32cf8b25d7839ae50392963839b/cssselect/tests.py

Comments welcome there.

@dangra

This comment has been minimized.

Show comment
Hide comment
@dangra

dangra Sep 14, 2013

Member

@redapple: this PR stalled because it needed someone to reimplement using pseudo-elements, you work for scrapy/cssselect#29 is very valuable!

Member

dangra commented Sep 14, 2013

@redapple: this PR stalled because it needed someone to reimplement using pseudo-elements, you work for scrapy/cssselect#29 is very valuable!

@SimonSapin

This comment has been minimized.

Show comment
Hide comment
@SimonSapin

SimonSapin Sep 15, 2013

Thanks to @redapple ’s work (though not exactly as in the PR), cssselect not has parser-only support for functional pseudo-elements. See the tests for an usage example.

Play around with it, feel free to send new issues or PRs to cssselect, and let me know when you think this is stable and would like it in a PyPI release.

Happy hacking!

Thanks to @redapple ’s work (though not exactly as in the PR), cssselect not has parser-only support for functional pseudo-elements. See the tests for an usage example.

Play around with it, feel free to send new issues or PRs to cssselect, and let me know when you think this is stable and would like it in a PyPI release.

Happy hacking!

@barraponto

This comment has been minimized.

Show comment
Hide comment
@barraponto

barraponto Sep 16, 2013

Contributor

Ok, I'm back from Neverland :)

I'm really really happy that @redapple's work motivated @SimonSapin to provide us the API needed for pseudo-elements (including functional pseudo-elements!). I don't think I can push this to the end right now, but I can commit to writing the documentation for CSS selectors once it's fixed (or even sooner).

Contributor

barraponto commented Sep 16, 2013

Ok, I'm back from Neverland :)

I'm really really happy that @redapple's work motivated @SimonSapin to provide us the API needed for pseudo-elements (including functional pseudo-elements!). I don't think I can push this to the end right now, but I can commit to writing the documentation for CSS selectors once it's fixed (or even sooner).

@redapple

This comment has been minimized.

Show comment
Hide comment
@redapple

redapple Sep 16, 2013

Contributor

Here's how it could look like:
redapple@3903b34

Contributor

redapple commented Sep 16, 2013

Here's how it could look like:
redapple@3903b34

@dangra

This comment has been minimized.

Show comment
Hide comment
@dangra

dangra Sep 16, 2013

Member

thanks @redapple and @SimonSapin!

@redapple: adapt PR to option 3 of this comment #176 (comment)

  • Use pseudo-element ::text to extract text nodes similar to JQuery .text()
  • Use pseudo-element ::attr(name) to extract attribute values simlar to .attr()
Member

dangra commented Sep 16, 2013

thanks @redapple and @SimonSapin!

@redapple: adapt PR to option 3 of this comment #176 (comment)

  • Use pseudo-element ::text to extract text nodes similar to JQuery .text()
  • Use pseudo-element ::attr(name) to extract attribute values simlar to .attr()
@dangra

This comment has been minimized.

Show comment
Hide comment
@dangra

dangra Sep 16, 2013

Member

@barraponto welcome back, as this PR started in your branch, can you pull @redapple changes once it is ready and add a short documentation on how to use cssselectors.

keep in mind that this PR must not change tutorial/docs examples to CSS selectors, and doesn't integrate with Loaders yet.

Member

dangra commented Sep 16, 2013

@barraponto welcome back, as this PR started in your branch, can you pull @redapple changes once it is ready and add a short documentation on how to use cssselectors.

keep in mind that this PR must not change tutorial/docs examples to CSS selectors, and doesn't integrate with Loaders yet.

@redapple

This comment has been minimized.

Show comment
Hide comment
@redapple

redapple Sep 16, 2013

Contributor

so @dangra , remove :text and :attribute() pseudo-classes (and tests) and adapt to process ::attr(name) instead of ::attribute(name)?

Contributor

redapple commented Sep 16, 2013

so @dangra , remove :text and :attribute() pseudo-classes (and tests) and adapt to process ::attr(name) instead of ::attribute(name)?

@dangra

This comment has been minimized.

Show comment
Hide comment
Member

dangra commented Sep 16, 2013

@redapple, yes.

@redapple

This comment has been minimized.

Show comment
Hide comment
Contributor

redapple commented Sep 16, 2013

@redapple

This comment has been minimized.

Show comment
Hide comment
@redapple

redapple Sep 17, 2013

Contributor

Among the things to test still: namespaces in CSS selectors
http://lxml.de/cssselect.html#namespaces

Scrapy CSS selectors should probably support things like "atom|link::attr(atom|href)"
that should become something like "//atom:link/@atom:href"

See for example a test in parslepy on CSS selectors + namespaces
https://github.com/redapple/parslepy/blob/master/tests/test_parslepy_parse.py#L165
for iTunes feed

parselet_script = {
        "--(atom|feed atom|entry)": {
            "title": "atom|title",
            "name": "im|name",
            "id": "atom|id @im|id",
            "images(im|image)": [{
                "height": "@height",
                "url": ".",
            }],
            "releasedate": "im|releaseDate",
        }
    }
dsh = parslepy.selectors.DefaultSelectorHandler(
                namespaces={
                    'atom': 'http://www.w3.org/2005/Atom',
                    'im': 'http://itunes.apple.com/rss',
                }
            )

(it uses the "... @attrname" Parsley syntax, that I plan to complement also with ::attr(name) :) )

Contributor

redapple commented Sep 17, 2013

Among the things to test still: namespaces in CSS selectors
http://lxml.de/cssselect.html#namespaces

Scrapy CSS selectors should probably support things like "atom|link::attr(atom|href)"
that should become something like "//atom:link/@atom:href"

See for example a test in parslepy on CSS selectors + namespaces
https://github.com/redapple/parslepy/blob/master/tests/test_parslepy_parse.py#L165
for iTunes feed

parselet_script = {
        "--(atom|feed atom|entry)": {
            "title": "atom|title",
            "name": "im|name",
            "id": "atom|id @im|id",
            "images(im|image)": [{
                "height": "@height",
                "url": ".",
            }],
            "releasedate": "im|releaseDate",
        }
    }
dsh = parslepy.selectors.DefaultSelectorHandler(
                namespaces={
                    'atom': 'http://www.w3.org/2005/Atom',
                    'im': 'http://itunes.apple.com/rss',
                }
            )

(it uses the "... @attrname" Parsley syntax, that I plan to complement also with ::attr(name) :) )

@SimonSapin

This comment has been minimized.

Show comment
Hide comment
@SimonSapin

SimonSapin Sep 17, 2013

FYI, cssselect does basically everything about namespaces wrong. What it should do is detailed in the spec:

http://www.w3.org/TR/selectors/#typenmsp
http://www.w3.org/TR/selectors/#univnmsp
http://www.w3.org/TR/selectors/#attrnmsp

FYI, cssselect does basically everything about namespaces wrong. What it should do is detailed in the spec:

http://www.w3.org/TR/selectors/#typenmsp
http://www.w3.org/TR/selectors/#univnmsp
http://www.w3.org/TR/selectors/#attrnmsp

@redapple

This comment has been minimized.

Show comment
Hide comment
@redapple

redapple Sep 17, 2013

Contributor

hehe ok @SimonSapin
but "ns|E" works, right?

Contributor

redapple commented Sep 17, 2013

hehe ok @SimonSapin
but "ns|E" works, right?

@SimonSapin

This comment has been minimized.

Show comment
Hide comment
@SimonSapin

SimonSapin Sep 17, 2013

If I remember correctly, ns|E does the right thing, but |E and *|E are not supported, and E is (incorrectly) equivalent to |E. Double check, though. And don’t rely and the incorrect behavior, it might get fixed some day.

If I remember correctly, ns|E does the right thing, but |E and *|E are not supported, and E is (incorrectly) equivalent to |E. Double check, though. And don’t rely and the incorrect behavior, it might get fixed some day.

@redapple

This comment has been minimized.

Show comment
Hide comment
@redapple

redapple Sep 17, 2013

Contributor

Duly noted.

Contributor

redapple commented Sep 17, 2013

Duly noted.

@SimonSapin

This comment has been minimized.

Show comment
Hide comment
@SimonSapin

SimonSapin Sep 17, 2013

Or, you can also send PRs to fix it :]

Or, you can also send PRs to fix it :]

@redapple

This comment has been minimized.

Show comment
Hide comment
@redapple

redapple Sep 17, 2013

Contributor

Why not! Thanks again @SimonSapin

Contributor

redapple commented Sep 17, 2013

Why not! Thanks again @SimonSapin

@pablohoffman

This comment has been minimized.

Show comment
Hide comment
@pablohoffman

pablohoffman Sep 20, 2013

Member

We should add cssselect to requirements.txt and .travis/requirements-*.txt

Other than that, great work and +1 to merge. @dangra will you do the honors? (looks like it'll need a rebase though)

Member

pablohoffman commented Sep 20, 2013

We should add cssselect to requirements.txt and .travis/requirements-*.txt

Other than that, great work and +1 to merge. @dangra will you do the honors? (looks like it'll need a rebase though)

dangra added a commit to dangra/scrapy that referenced this pull request Sep 24, 2013

dangra added a commit to dangra/scrapy that referenced this pull request Sep 24, 2013

@dangra dangra referenced this pull request Sep 24, 2013

Closed

Selectors unified API #395

dangra added a commit to dangra/scrapy that referenced this pull request Oct 10, 2013

dangra added a commit to dangra/scrapy that referenced this pull request Oct 10, 2013

@dangra dangra referenced this pull request Oct 11, 2013

Merged

[MRG] Selectors unified API #426

@dangra

This comment has been minimized.

Show comment
Hide comment
@dangra

dangra Oct 11, 2013

Member

CSS selectors will be merged as part of #395.

Member

dangra commented Oct 11, 2013

CSS selectors will be merged as part of #395.

@dangra dangra closed this Oct 11, 2013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment