Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure that canonicalize_url is not different from that of w3lib #30

Closed
malloxpb opened this issue Jul 9, 2018 · 2 comments
Closed
Labels
enhancement New feature or request question Further information is requested

Comments

@malloxpb
Copy link
Member

malloxpb commented Jul 9, 2018

Right now, canonicalize_url lowercase all the letters in the path of the canonicalized urls. Therefore, we will need to work on keeping all the letters the way the urls were before canonicalizing them. We can change GURL source code for this

@malloxpb malloxpb added enhancement New feature or request question Further information is requested labels Jul 9, 2018
@lopuhin
Copy link
Member

lopuhin commented Jul 9, 2018

The goal here is to make it 100% compatible with canonicalize_url from w3lib: this is required for scrapy integration, or else this will be a backwards incompatible change. Besides making it compatible now, it's important that we know if this breaks. We discussed this with @kmike and it seems that this can be achieved in the following way:

  • make w3lib import canonicalize_url from scurl if it's available
  • add a new test env to w3lib that is run using scurl: this will make sure that if canonicalize_url in w3lib is changed, scurl version is updated accordingly
  • run w3lib tests in the scurl travis build: probably this means cloning w3lib repo and running w3lib's canonicalize_url tests (which will pick up scurl) - this will ensure that if scurl version is changed and no longer passes w3lib tests, we know this.

@malloxpb
Copy link
Member Author

malloxpb commented Oct 1, 2018

This has been resolved in scrapy/w3lib#110

@malloxpb malloxpb closed this as completed Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants