Make "xpath" and "css" Selector methods available on response #554
Comments
Hmm, let's push further. If we remove the requirement to inherit from
No need to import scrapy at all. I'm scared about this path, but it's worth looking into, as the API clearly lowers the barrier.
Quick note: the internal selector should be public; it will be used to access the complete (current and future) Selector API like
could it simply be a
@redapple: I thought that was the implicit idea, and the ticket was more about adding the shortcuts attached to the response.
Ah ok. Well then I don't feel a strong need for the
Indeed.
What if I don't want to parse the responses because I'm just storing them raw?
@nramirezuy, would that be handled by a
We definitely want it to be lazily evaluated.
Fixed by #690.
I understand that this is a compromise: an HTTP response shouldn't be tied to lxml. But combined with #548 and #494, it would let us reduce the ceremony even further. Instead of this:
we'll be able to write this:
The latter variant is 2x shorter, and readability is still fine (I'd say it is improved).
I think the increased internal complexity is justified by the better API here.
In browser JS, the document knows about CSS and XPath selectors, and that works just fine. The distinction between "response" and "document" is not very clear: the document is the response that JS code works on. It doesn't provide raw HTTP data, though. But our TextResponse doesn't work only on raw HTTP data either: it checks some headers and provides the decoded body.
Also check the ItemLoader class: it can be initialized with either a response or a selector, which tells us Selector and Response can already act similarly.
The implementation may involve renaming the existing Response to something like RawResponse or HttpResponse, or adding xpath and css methods only to TextResponse. The Selector could be created internally, only on demand. We may also ditch the LxmlDocument cache and store the parsed tree as a response attribute.
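To make the "created only on demand" idea concrete, here is a minimal sketch of a response that exposes a public selector attribute plus an xpath shortcut, and parses its body only on first use. It is not Scrapy's implementation: the Selector and TextResponse classes below are toy stand-ins, parse_count is a hypothetical counter added only to show when parsing happens, and the standard library's xml.etree.ElementTree (which supports only a limited XPath subset and requires well-formed markup) is swapped in for lxml. A css shortcut would follow the same pattern but is left out because the standard library has no CSS selector engine.

```python
import xml.etree.ElementTree as ET
from functools import cached_property


class Selector:
    """Toy stand-in for a selector; wraps a parsed ElementTree node."""

    def __init__(self, root):
        self.root = root

    def xpath(self, query):
        # ElementTree understands only a limited XPath subset.
        return [element.text for element in self.root.findall(query)]


class TextResponse:
    """Hypothetical response whose body is parsed lazily."""

    def __init__(self, url, body):
        self.url = url
        self.body = body
        self.parse_count = 0  # demonstration only: counts how often we parse

    @cached_property
    def selector(self):
        # Runs once, on first access; the result is then stored on the
        # instance, so the parsed tree effectively lives on the response.
        self.parse_count += 1
        return Selector(ET.fromstring(self.body))

    def xpath(self, query):
        # Shortcut that delegates to the public, lazily built selector.
        return self.selector.xpath(query)


response = TextResponse(
    "http://example.com/",
    "<html><body><p>one</p><p>two</p></body></html>",
)
print(response.parse_count)               # nothing parsed yet
print(response.xpath(".//p"))             # shortcut on the response
print(response.selector.xpath(".//p"))    # full selector API stays reachable
print(response.parse_count)               # parsed a single time
```

With this shape, someone who only stores raw responses never touches selector and so never pays for parsing, while the public selector attribute keeps the rest of the selector API reachable beyond the shortcuts.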