ValueError: Cannot use xpath on a Selector of type 'json' #5923
Comments
I find it interesting that you were able to use XPath to parse HTML within a JSON structure before. If you are looking for a workaround, downgrade parsel to <1.8. As for a long-term solution, I am inclined to say that this is how things should work. If you have
Thanks for the quick update, @Gallaecio. Is this the same for 2.5.0 as well?
It affects any version of Scrapy if your version of Parsel is 1.8.0 or later.
Oh, OK, @Gallaecio. I will try the method mentioned above and update here.
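The version boundary mentioned in this exchange (Parsel 1.8.0 or later is affected, whatever the Scrapy version) can be checked at runtime. This is a minimal sketch using only the standard library; the helper name `parsel_is_strict` is hypothetical and is not part of Parsel or Scrapy.

```python
import re
from importlib import metadata


def parsel_is_strict(version: str) -> bool:
    """Return True for Parsel 1.8.0 or later (hypothetical helper)."""
    numbers = []
    # Keep only the leading digits of the major and minor components.
    for piece in version.split(".")[:2]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 2:
        numbers.append(0)
    return tuple(numbers) >= (1, 8)


try:
    installed = metadata.version("parsel")
except metadata.PackageNotFoundError:
    print("parsel is not installed")
else:
    print(f"parsel {installed}: affected = {parsel_is_strict(installed)}")
```

If the check reports an affected version, the workaround suggested above would be to pin the dependency, for example with `pip install "parsel<1.8"`.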
On parsel 1.6.0 and older, the application adds extra HTML tags around the input:
from parsel import Selector
selector_json = Selector(text='{"a":"1"}')
print(selector_json.getall())
# parsel 1.6: ['<html><body><p>{"a":"1"}</p></body></html>']
# parsel 1.8: [{'a': '1'}]  # <- converted to dict; we see ' instead of the original ", most likely a str was expected here
# print(selector_json.css('*::text').getall())
print(selector_json.css('p::text').getall())
# parsel 1.6: ['{"a":"1"}']
# parsel 1.8:
'''
    print(selector_json.css('p::text').getall())
  File "<redacted>\parsel\parsel\selector.py", line 680, in css
    raise ValueError(
ValueError: Cannot use css on a Selector of type 'json'
Process finished with exit code 1
'''
If the server returns a JSON response with HTML inside its values, Selector behaves like this:
text2 = '''{
"prod_1": "<div class=product><div class=price>1$</div></div>",
"prod_2": "<div class=product><div class=price>2$</div></div>",
"prod_3": "<div class=product><div class=price>3$</div></div>"}
'''
selector_html = Selector(text=text2)
print(selector_html.css('div.price::text').getall())
# parsel 1.6: ['1$', '2$', '3$']
# parsel 1.8:
'''
    print(selector_html.css('div.price::text').getall())
  File "<redacted>\parsel\parsel\selector.py", line 680, in css
    raise ValueError(
ValueError: Cannot use css on a Selector of type 'json'
'''
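For the second case (JSON whose values contain HTML), one approach that does not depend on the older lenient behavior is to decode the JSON first and then build an HTML selector per fragment. This is a hedged sketch: `html_fragments` is a hypothetical helper, and the Parsel step is guarded so the snippet still runs when Parsel is not installed.

```python
import json
from typing import List

text2 = """{
"prod_1": "<div class=product><div class=price>1$</div></div>",
"prod_2": "<div class=product><div class=price>2$</div></div>",
"prod_3": "<div class=product><div class=price>3$</div></div>"}
"""


def html_fragments(json_text: str) -> List[str]:
    """Decode a flat JSON object and return its string values (hypothetical helper)."""
    data = json.loads(json_text)
    return [value for value in data.values() if isinstance(value, str)]


fragments = html_fragments(text2)

try:
    from parsel import Selector
except ImportError:
    Selector = None  # Parsel is not installed; skip the selector step.

if Selector is not None:
    prices = []
    for fragment in fragments:
        # type="html" states the selector type explicitly for each fragment.
        selector = Selector(text=fragment, type="html")
        prices.extend(selector.css("div.price::text").getall())
    print(prices)
```

Decoding first keeps the JSON structure (the `prod_*` keys) available, which is lost when the whole payload is treated as one HTML document.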
Thanks for the detailed information, @GeorgeA92.
from scrapy.selector import Selector
The above solution also works to avoid the ValueError.
- fix: fixed hidden ValueError when trying to use 'LrmiBase.getLRMI()' on Response objects of type 'json'
-- Scrapy's older version used the 'parsel'-package <1.8 which (somehow) was less strict when erroneously trying to navigate a 'json'-object with XPath-selectors
-- as of Scrapy v2.9+ trying to use 'response.xpath()' on a response object other than of type "html" will throw an Error which needs to be handled
-- a bare except previously hid this problem from us, causing digitallearninglab_spider.py to throw warnings which obfuscated the real problem
-- see: scrapy/scrapy#5923
- fix: fixed weak warnings (ambiguous variable names)
- fix: fixed weak warning regarding comparison with None (PEP8:E711)
- optimized imports
Signed-off-by: Andreas Schnäpp <981166+Criamos@users.noreply.github.com>
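The point about the bare `except` can be illustrated without Scrapy installed. This is a minimal sketch that assumes a stand-in class, `JsonLikeSelector`, written only to mimic the error message from this issue; it is not Parsel code, and `xpath_or_empty` is likewise a hypothetical helper.

```python
import logging

logger = logging.getLogger(__name__)


class JsonLikeSelector:
    """Stand-in that mimics the error reported in this issue; not Parsel code."""

    type = "json"

    def xpath(self, query):
        raise ValueError(f"Cannot use xpath on a Selector of type {self.type!r}")


def xpath_or_empty(selector, query):
    """Run an XPath query, returning [] when the selector rejects it."""
    try:
        return selector.xpath(query)
    except ValueError as error:
        # Catch only ValueError and log it, so the cause stays visible
        # instead of being swallowed by a bare `except:`.
        logger.warning("XPath skipped: %s", error)
        return []


result = xpath_or_empty(JsonLikeSelector(), "//script/text()")
print(result)
```

Catching the specific exception and logging its message keeps the real cause in the spider's output, which is what the bare `except` had been hiding.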
Description
.local/lib/python3.8/site-packages/parsel/selector.py", line 621, in xpath
Scrapy version 2.8.0
Steps to Reproduce
Expected behavior: The error should not occur.
Actual behavior: "json" has been removed from the accepted self.type values in the selector.
Example:
    if self.type not in ("html", "xml", "text", "json"):
        raise ValueError(
            f"Cannot use xpath on a Selector of type {self.type!r}"
        )
    if self.type in ("html", "xml"):
        try:
            xpathev = self.root.xpath
        except AttributeError:
            return typing.cast(
                SelectorList[_SelectorType], self.selectorlist_cls([])
            )
    else:
        try:
            xpathev = self._get_root(self._text or "", type="html").xpath
        except AttributeError:
            return typing.cast(
                SelectorList[_SelectorType], self.selectorlist_cls([])
            )
Reproduces how often: 100%