
Request.from_curl() with $-prefixed string literals #5899


Closed
wRAR opened this issue Apr 18, 2023 · 0 comments · Fixed by #5901

wRAR (Member) commented Apr 18, 2023

Chrome (and probably other tools) sometimes generates curl commands with a $-prefixed data string ($'...', the shell's ANSI-C quoting), probably when that is the easier way to represent the string or when the string includes non-ASCII characters. For example, the DiscoverQueryRendererQuery XHR on https://500px.com/popular is copied as

curl 'https://api.500px.com/graphql' \
<headers omitted>
  --data-raw $'{"operationName":"DiscoverQueryRendererQuery",<omitted> "query":"query DiscoverQueryRendererQuery($filters: [PhotoDiscoverSearchFilter\u0021], <the rest omitted>' \
  --compressed

Most likely this is because of the \u0021 in this payload.

scrapy.utils.curl.curl_to_request_kwargs() doesn't understand this kind of shell escaping, so it puts the $ into the request body, which is incorrect. Ideally we should support this, though I don't know whether there are existing libraries that can unescape it.
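To illustrate the problem and one possible direction, here is a minimal sketch using only the standard library. It assumes the command is tokenized with shlex, and the helper unescape_ansi_c() is a hypothetical name, not part of Scrapy. It handles only a subset of the $'...' escapes (no \cX control characters), treats \xHH as a code point rather than a raw byte, and does not validate \U values. The URL and payload are made up for the example.

```python
import re
import shlex

# Single-character escapes recognized inside $'...' (subset; assumption of this sketch).
_SIMPLE = {
    "a": "\a", "b": "\b", "e": "\x1b", "E": "\x1b", "f": "\f", "n": "\n",
    "r": "\r", "t": "\t", "v": "\v", "\\": "\\", "'": "'", '"': '"', "?": "?",
}

_ESCAPE = re.compile(
    r"\\(?:"
    r"u([0-9a-fA-F]{1,4})"    # \uHHHH
    r"|U([0-9a-fA-F]{1,8})"   # \UHHHHHHHH
    r"|x([0-9a-fA-F]{1,2})"   # \xHH (treated as a code point here, not a byte)
    r"|([0-7]{1,3})"          # \NNN octal
    r"|(.)"                   # any other escaped character
    r")",
    re.DOTALL,
)


def unescape_ansi_c(text):
    """Hypothetical helper: decode the contents of a $'...' string."""
    def repl(match):
        u4, u8, hx, octal, char = match.groups()
        if u4 or u8 or hx:
            return chr(int(u4 or u8 or hx, 16))
        if octal:
            return chr(int(octal, 8))
        # Unknown escapes are left untouched, backslash included.
        return _SIMPLE.get(char, "\\" + char)
    return _ESCAPE.sub(repl, text)


command = r"""curl 'https://example.com/api' --data-raw $'{"q":"Filter\u0021"}'"""
tokens = shlex.split(command)

# shlex keeps the "$" and leaves the escape undecoded.
raw = tokens[-1]

# Strip the "$" prefix and decode the escapes.
body = unescape_ansi_c(raw[1:])
```

Note that once shlex has split the command, a $'...' string can no longer be told apart from a plain argument that happens to start with $, and an escaped single quote (\') inside $'...' would confuse shlex itself. A real fix would probably have to detect the $' prefix before or during tokenization.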
