[MRG] Support kwargs for response.xpath() #2457

Merged
merged 4 commits into from Feb 2, 2017
34 changes: 34 additions & 0 deletions docs/topics/selectors.rst
@@ -283,6 +283,40 @@ XPath specification.

.. _Location Paths: https://www.w3.org/TR/xpath#location-paths

.. _topics-selectors-xpath-variables:

Variables in XPath expressions
------------------------------

XPath allows you to reference variables in your XPath expressions, using
the ``$somevariable`` syntax. This is somewhat similar to parameterized
queries or prepared statements in the SQL world where you replace
some arguments in your queries with placeholders like ``?``,
which are then substituted with values passed with the query.

Here's an example that matches an element based on its "id" attribute value
without hard-coding the value in the expression (as was done previously)::

    >>> # `$val` used in the expression, a `val` argument needs to be passed
    >>> response.xpath('//div[@id=$val]/a/text()', val='images').extract_first()
    u'Name: My image 1 '

Here's another example, to find the "id" attribute of a ``<div>`` tag containing
five ``<a>`` children (here we pass the value ``5`` as an integer)::

    >>> response.xpath('//div[count(a)=$cnt]/@id', cnt=5).extract_first()
    u'images'

Every variable referenced in the expression must be bound to a value when
calling ``.xpath()``; otherwise you'll get a ``ValueError: XPath error:``
exception. Bind the values by passing one named argument per variable.

`parsel`_, the library powering Scrapy selectors, has more details and examples
on `XPath variables`_.

.. _parsel: https://parsel.readthedocs.io/
.. _XPath variables: https://parsel.readthedocs.io/en/latest/usage.html#variables-in-xpath-expressions

Using EXSLT extensions
----------------------

2 changes: 1 addition & 1 deletion requirements.txt
@@ -7,4 +7,4 @@ queuelib
 six>=1.5.2
 PyDispatcher>=2.0.5
 service_identity
-parsel>=0.9.5
+parsel>=1.1
4 changes: 2 additions & 2 deletions scrapy/http/response/text.py
@@ -111,8 +111,8 @@ def selector(self):
             self._cached_selector = Selector(self)
         return self._cached_selector

-    def xpath(self, query):
-        return self.selector.xpath(query)
+    def xpath(self, query, **kwargs):
+        return self.selector.xpath(query, **kwargs)

     def css(self, query):
         return self.selector.css(query)
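
The effect of this hunk can be sketched without Scrapy installed. `StubSelector` and `StubResponse` below are hypothetical stand-ins, not Scrapy classes; they only mirror the shape of the shortcut so the forwarding of keyword arguments is visible.

```python
class StubSelector:
    """Hypothetical stand-in for the real selector; it only records calls."""

    def __init__(self):
        self.calls = []

    def xpath(self, query, **kwargs):
        # Record the query and whatever keyword arguments arrived.
        self.calls.append((query, kwargs))
        return []


class StubResponse:
    """Hypothetical response with the same shortcut shape as the diff."""

    def __init__(self):
        self.selector = StubSelector()

    def xpath(self, query, **kwargs):
        # Forward every keyword argument untouched to the selector.
        return self.selector.xpath(query, **kwargs)


response = StubResponse()
response.xpath('//div[@id=$val]', val='images')
response.xpath('//s:elem/text()', namespaces={'s': 'http://scrapy.org'})
recorded = response.selector.calls
```

Since the shortcut does not inspect the keyword arguments, both XPath variables and `namespaces` reach the selector unchanged, which is what the two new tests in this PR exercise.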
2 changes: 1 addition & 1 deletion setup.py
@@ -49,7 +49,7 @@
         'pyOpenSSL',
         'cssselect>=0.9',
         'six>=1.5.2',
-        'parsel>=0.9.5',
+        'parsel>=1.1',
         'PyDispatcher>=2.0.5',
         'service_identity',
     ],
32 changes: 32 additions & 0 deletions tests/test_http_response.py
@@ -320,6 +320,20 @@ def test_selector_shortcuts(self):
            response.selector.css("title::text").extract(),
        )

    def test_selector_shortcuts_kwargs(self):
        body = b"<html><head><title>Some page</title><body><p class=\"content\">A nice paragraph.</p></body></html>"
        response = self.response_class("http://www.example.com", body=body)

        self.assertEqual(
            response.xpath("normalize-space(//p[@class=$pclass])", pclass="content").extract(),
            response.xpath("normalize-space(//p[@class=\"content\"])").extract(),
        )
        self.assertEqual(
            response.xpath("//title[count(following::p[@class=$pclass])=$pcount]/text()",
                pclass="content", pcount=1).extract(),
            response.xpath("//title[count(following::p[@class=\"content\"])=1]/text()").extract(),
        )

    def test_urljoin_with_base_url(self):
        """Test urljoin shortcut which also evaluates base-url through get_base_url()."""
        body = b'<html><body><base href="https://example.net"></body></html>'
@@ -428,3 +442,21 @@ def test_selector_shortcuts(self):
            response.xpath("//elem/text()").extract(),
            response.selector.xpath("//elem/text()").extract(),
        )

    def test_selector_shortcuts_kwargs(self):
        body = b'''<?xml version="1.0" encoding="utf-8"?>
        <xml xmlns:somens="http://scrapy.org">
        <somens:elem>value</somens:elem>
        </xml>'''
        response = self.response_class("http://www.example.com", body=body)

        self.assertEqual(
            response.xpath("//s:elem/text()", namespaces={'s': 'http://scrapy.org'}).extract(),
            response.selector.xpath("//s:elem/text()", namespaces={'s': 'http://scrapy.org'}).extract(),
        )

        response.selector.register_namespace('s2', 'http://scrapy.org')
        self.assertEqual(
            response.xpath("//s1:elem/text()", namespaces={'s1': 'http://scrapy.org'}).extract(),
            response.selector.xpath("//s2:elem/text()").extract(),
        )