ValueError: Cannot use xpath on a Selector of type 'json' #5923

damodharheadrun · 2023-05-05T12:13:51Z

Description

.local/lib/python3.8/site-packages/parsel/selector.py", line 621, in xpath
Scrapy version 2.8.0


Steps to Reproduce

The URL's response is JSON.
Sometimes we used to get an HTML response instead.
We have written XPath expressions for the HTML response, but now I'm getting ValueError: Cannot use xpath on a Selector of type 'json'.
Earlier we were able to get the response; now it is throwing an exception.
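Because the endpoint returns JSON at some times and HTML at others, one way to avoid the error is to check the body before applying XPath. This is a minimal standard-library sketch. The helper name `classify_body` is hypothetical, and it assumes that any body which parses as JSON should be handled as data rather than with XPath/CSS selectors.

```python
import json
from typing import Any, Tuple


def classify_body(text: str) -> Tuple[str, Any]:
    """Hypothetical helper: decide how a response body should be handled.

    Returns ("json", parsed_data) when the body is valid JSON, otherwise
    ("html", text) so the caller can keep using XPath/CSS selectors on it.
    """
    try:
        # json.JSONDecodeError is a subclass of ValueError
        return "json", json.loads(text)
    except ValueError:
        return "html", text


# Usage sketch inside a Scrapy callback (assumes `response` is a TextResponse):
#   kind, payload = classify_body(response.text)
#   if kind == "html":
#       titles = response.xpath("//title/text()").getall()
#   else:
#       ...  # work with the decoded JSON in `payload` instead of XPath
```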

Expected behavior: The error should not occur.

Actual behavior: "json" is not among the self.type values accepted by the selector's xpath() method; for example, the check would need to include it:

if self.type not in ("html", "xml", "text", "json"):
    raise ValueError(
        f"Cannot use xpath on a Selector of type {self.type!r}"
    )
if self.type in ("html", "xml"):
    try:
        xpathev = self.root.xpath
    except AttributeError:
        return typing.cast(
            SelectorList[_SelectorType], self.selectorlist_cls([])
        )
else:
    try:
        xpathev = self._get_root(self._text or "", type="html").xpath
    except AttributeError:
        return typing.cast(
            SelectorList[_SelectorType], self.selectorlist_cls([])
        )

Reproduces how often: 100%


Gallaecio · 2023-05-05T12:30:04Z

I find it interesting that you were able to use XPath to parse HTML within a JSON structure before.

If you are looking for a workaround, downgrade parsel to <1.8.

As for a long-term solution, I am inclined to say that this is how things should work.

If you have {"html": "<html><title>foo</title></html>"}, you do not use response.xpath("//title"), you use response.selector.jmespath("html").xpath("//title") (or, starting with the upcoming Scrapy 2.9, response.jmespath("html").xpath("//title")).

damodharheadrun · 2023-05-05T12:46:18Z

Thanks for the quick update @Gallaecio

Is this the same for 2.5.0 as well?

Gallaecio · 2023-05-05T12:57:00Z

It affects any version of Scrapy if your version of Parsel is 1.8.0 or later.
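To check whether an environment is affected, a small sketch that reads the installed parsel version with the standard library's `importlib.metadata`. The helper names are hypothetical, and comparing only the leading major.minor numbers is a simplifying assumption (the `packaging` library would handle version strings more robustly).

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def is_affected(parsel_version: str) -> bool:
    """Return True if a parsel version string is 1.8 or later.

    Only the leading major.minor numbers are compared, which is enough
    for release strings such as "1.7.0" or "1.8.1".
    """
    match = re.match(r"(\d+)\.(\d+)", parsel_version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {parsel_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (1, 8)


def installed_parsel_version() -> Optional[str]:
    """Return the installed parsel version, or None if parsel is not installed."""
    try:
        return version("parsel")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    installed = installed_parsel_version()
    if installed is None:
        print("parsel is not installed")
    else:
        print(installed, "affected" if is_affected(installed) else "not affected")
```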

damodharheadrun · 2023-05-05T17:00:01Z

Oh, OK @Gallaecio, I will try the method mentioned above and update here.

GeorgeA92 · 2023-05-05T17:55:59Z

> I find it interesting that you were able to use XPath to parse HTML within a JSON structure before.

On parsel 1.6.0 and older, the text gets wrapped in extra <html><body><p> tags, which "makes" it possible to use CSS/XPath selectors on it (the parser attempts to produce valid HTML, as a browser does).

from parsel import Selector
selector_json = Selector(text='{"a":"1"}')

print(selector_json.getall())
# parsel 1.6: ['<html><body><p>{"a":"1"}</p></body></html>']
# parsel 1.8: [{'a': '1'}]  # converted to a dict: note ' instead of the original "; a str was most likely expected here

#print(selector_json.css('*::text').getall())
print(selector_json.css('p::text').getall())

# parsel 1.6: ['{"a":"1"}']

# parsel 1.8:
'''
    print(selector_json.css('p::text').getall())
  File "<redacted>\parsel\parsel\selector.py", line 680, in css
    raise ValueError(
ValueError: Cannot use css on a Selector of type 'json'

Process finished with exit code 1

'''

> If you have {"html": "<title>foo</title>"}, you do not use response.xpath("//title"), you use response.selector.jmespath("html").xpath("//title") (or, starting with the upcoming Scrapy 2.9, response.jmespath("html").xpath("//title")).

If the server returns a JSON response with HTML inside its values, Selector ~~will be~~ was able to parse that HTML content with XPath/CSS selectors without any additional data transformation:

text2 = '''{
"prod_1": "<div class=product><div class=price>1$</div></div>",
"prod_2": "<div class=product><div class=price>2$</div></div>",
"prod_3": "<div class=product><div class=price>3$</div></div>"}
'''

selector_html = Selector(text=text2)
print(selector_html.css('div.price::text').getall())
# parsel 1.6: ['1$', '2$', '3$']
# parsel 1.8:
'''
    print(selector_html.css('div.price::text').getall())
  File "<redacted>\parsel\parsel\selector.py", line 680, in css
    raise ValueError(
ValueError: Cannot use css on a Selector of type 'json'
'''

damodharheadrun · 2023-05-05T18:37:36Z

Thanks for the detailed information @GeorgeA92

damodharheadrun · 2023-07-04T05:34:06Z

from scrapy.selector import Selector
response = Selector(text=str(data))  # data: the JSON payload received from the server
response.xpath("//title")

The above solution also works to avoid the ValueError:

>>> data = '{"test": "verify_xpath", "data": "test"}'
>>> from scrapy.selector import Selector
>>> sel = Selector(text=data)
>>> sel.xpath('.//text()').extract()
['{"test": "verify_xpath", "data": "test"}']

- fix: fixed hidden ValueError when trying to use 'LrmiBase.getLRMI()' on Response objects of type 'json' -- Scrapy's older version used the 'parsel'-package <1.8 which (somehow) was less strict when erroneously trying to navigate a 'json'-object with XPath-selectors -- as of Scrapy v2.9+ trying to use 'response.xpath()' on a response object other than of type "html" will throw an Error which needs to be handled -- a bare except previously hid this problem from us, causing digitallearninglab_spider.py to throw warnings which obfuscated the real problem -- see: scrapy/scrapy#5923 - fix: fixed weak warnings (ambiguous variable names) - fix: fixed weak warning regarding comparison with None (PEP8:E711) - optimized imports Signed-off-by: Andreas Schnäpp <981166+Criamos@users.noreply.github.com>

wRAR closed this as not planned on Jun 21, 2023
