Issue in safe_url_encoding #188

Closed
azierariel opened this issue Aug 16, 2022 · 8 comments


Hi,

I hope this finds you well.

I found the following issue. I'm not sure if it's a bug, but it didn't behave like this in the previous version (1.22).

Before on 1.22.0:

safe_url_string("foo://BAR/foobar") == "foo://BAR/foobar"

But now on 2.0.1:

safe_url_string("foo://BAR/foobar") == "foo://bar/foobar"

I understand that the domain is not case-sensitive, but is this the desired behavior of the function?
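
For reference, a minimal reproduction sketch (assuming w3lib is installed; the commented outputs are the values reported above):

```python
from w3lib.url import safe_url_string

url = "foo://BAR/foobar"

# w3lib 1.22.0 returned "foo://BAR/foobar" (netloc case preserved)
# w3lib 2.0.1 returns  "foo://bar/foobar" (host lowercased)
print(safe_url_string(url))
```
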

Cheers,

azierariel changed the title from "bug in safe_url_encoding" to "Issue in safe_url_encoding" on Aug 16, 2022
wRAR (Member) commented Aug 20, 2022

This is caused by #174, but I have no idea which line would do this.

azierariel (Author) commented

@wRAR I found out why this is happening. The hostname is lowercased directly by urllib's urlsplit (here), when that function is applied in the code here.
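
For context, a small sketch of the standard-library behavior referred to here: urlsplit() keeps the original case in netloc, while its hostname attribute is normalized to lowercase, which is presumably the value that ends up in the rebuilt URL:

```python
from urllib.parse import urlsplit

parts = urlsplit("foo://BAR/foobar")
print(parts.netloc)    # "BAR" - the raw netloc keeps its case
print(parts.hostname)  # "bar" - the hostname attribute is normalized to lowercase
```
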

I could make a workaround, I guess, but it would be an awful hack IMO. Do you think it's worth changing the behavior?
I pass parameters to download_handlers in Scrapy, and for now I have had to freeze w3lib at the previous version.

cc @Gallaecio @Laerte

wRAR (Member) commented Aug 20, 2022

Hmm, I was afraid urllib did that, but I only tested urlunsplit, not urlsplit.
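
For what it's worth, urlunsplit() on its own does not change the case; a quick check:

```python
from urllib.parse import urlsplit, urlunsplit

# Splitting and re-joining preserves the netloc case; the lowercasing
# only appears when reading the SplitResult.hostname attribute.
print(urlunsplit(urlsplit("foo://BAR/foobar")))  # "foo://BAR/foobar"
```
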

To be honest, I don't think we should work around this in the w3lib code, but I'm open to suggestions.

kmike (Member) commented Aug 21, 2022

@azierariel I'm curious how you found this issue: what does this change break for you? Why do you need to freeze w3lib to a previous version?

azierariel (Author) commented

@kmike I pass parameters to the download_handlers in Scrapy. For example: captcha://file_path.png. Here, if the file path contains any uppercase characters, it will be converted to lowercase and my code will fail.
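
A sketch of how that URI shape is parsed (hypothetical file name; the safe_url_string output follows the 2.x behavior reported above):

```python
from urllib.parse import urlsplit
from w3lib.url import safe_url_string

uri = "captcha://Some_File.PNG"  # hypothetical example
print(urlsplit(uri).netloc)      # "Some_File.PNG" - the file name is parsed as the host
print(safe_url_string(uri))      # "captcha://some_file.png" with w3lib 2.x, per the report
```
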

wRAR (Member) commented Aug 28, 2022

That just looks like an invalid URI. You should put paths into the path section, not the host one.

Gallaecio (Member) commented Aug 29, 2022

@azierariel Does it work if you use an extra / after the protocol (i.e. captcha:///file_path.png)? That is what the file:// scheme does to omit the host part, which in the case of file:// implies localhost.
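
A sketch of what the extra slash changes (hypothetical file name): the host becomes empty and the name lands in the path, which is not subject to the host lowercasing:

```python
from urllib.parse import urlsplit
from w3lib.url import safe_url_string

uri = "captcha:///Some_File.PNG"  # hypothetical example, extra / after the scheme
print(urlsplit(uri).netloc)       # "" - the host part is empty
print(urlsplit(uri).path)         # "/Some_File.PNG" - the name is in the path
print(safe_url_string(uri))       # "captcha:///Some_File.PNG" - case preserved
```
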

azierariel (Author) commented

@Gallaecio It works perfectly, thanks!

I think I can close the issue, right?
