Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid URL Encoding as an option #3542

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

maramsumanth
Copy link
Contributor

@maramsumanth maramsumanth commented Dec 21, 2018

Fixes #833.
Case (i):-


In [4]: Request('http://google.com/"hello"',True)
Out[4]: <GET http://google.com/%22hello%22>

In [5]: Request('http://google.com/"hello"',False)
Out[5]: <GET http://google.com/"hello">

In [6]: Request('http://google.com/"hello"')
Out[6]: <GET http://google.com/%22hello%22>

In the above case I am able to add quote_path as an argument in the Request constructor. This is set to True (default) if the user wants to quote the URL.

Case (ii):-

I want to add an option to the command scrapy shell "http://google.com/\"hello\"" which when set to True, we get <GET http://google.com/%22hello%22> , if it is set to False, we get<GET http://google.com/"hello">

I dont have idea how to add option for that (Case-(ii)) command. Any suggestions?
Mentors please guide me @kmike @cathalgarvey @lopuhin :)

(PS:- I used \ for each " of hello as escape sequence in order to include it in the URL)
At present I had opened a PR for Case (i), ideas for Case (ii) are welcome :)

I opened a PR for w3lib to modify safe_url_string method. Please look into this scrapy/w3lib#119

Thanks :)

@maramsumanth maramsumanth changed the title Update __init__.py Avoid URL Encoding as an option Dec 21, 2018
@maramsumanth
Copy link
Contributor Author

maramsumanth commented Dec 21, 2018

I can't get this correct unless scrapy/w3lib#119 is merged because this commit uses the modified method of safe_url_string of w3lib.

@@ -16,12 +16,13 @@

class Request(object_ref):

def __init__(self, url, callback=None, method='GET', headers=None, body=None,
def __init__(self, url, quote_path=True, callback=None, method='GET', headers=None, body=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently the second positional argument is callback, putting quote_path in its place would be backwards incompatible. I'd sugest you to move it to the end of the arguments list, I suspect that's why the tests are failing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elacuesta , I did it tests are failing because , my PR scrapy/w3lib#119 didn't merge yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@felipeboffnunes
Copy link
Member

@Gallaecio not really sure about the current CI failing but this seems pretty straightforward. Are the changes on this PR currently worth to be merged or has anything significant changed in master that would make this deprecated somehow? If not, I am willing to do the sync and also add some unit tests if it is worth it.

@felipeboffnunes
Copy link
Member

From another angle, the integration is seamless as far as it looks, so the unit tests at scrapy/w3lib#119 should suffice. Nevertheless, it may be worth adding a small doc update over the added argument on Request init?

@Gallaecio
Copy link
Member

I think it may be best not to try and address the issue at all, for the time being.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Prevent URL encoding option
5 participants