Make sure that canonicalize_url is not different from that of w3lib #30

malloxpb · 2018-07-09T15:09:32Z

Right now, canonicalize_url lowercases all the letters in the path of the canonicalized URLs. We need to preserve the original case of the path when canonicalizing. We can change the GURL source code to do this.
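To illustrate the behavior being asked for, here is a minimal sketch using only the standard library. It is not the scurl, GURL, or w3lib implementation; the helper name `lowercase_scheme_and_host` is hypothetical. It lowercases only the scheme and host, which are case-insensitive, and leaves the path and query untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_scheme_and_host(url: str) -> str:
    """Hypothetical helper: normalize case only where URLs are case-insensitive."""
    parts = urlsplit(url)
    # Keep any "user:password@" prefix as-is; only the host[:port] part is lowercased.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    # Path, query and fragment are passed through unchanged.
    return urlunsplit(
        (parts.scheme.lower(), netloc, parts.path, parts.query, parts.fragment)
    )


original = "HTTP://Example.COM/Some/Path?Q=Va"
print(lowercase_scheme_and_host(original))  # http://example.com/Some/Path?Q=Va
print(original.lower())                     # http://example.com/some/path?q=va (path case lost)
```

The second `print` shows the kind of result this issue describes as a problem: the path no longer matches the original URL.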

lopuhin · 2018-07-09T18:18:41Z

The goal here is to make it 100% compatible with canonicalize_url from w3lib: this is required for scrapy integration, or else this will be a backwards-incompatible change. Besides making it compatible now, it's important that we find out if compatibility breaks later. We discussed this with @kmike and it seems that this can be achieved in the following way:

- Make w3lib import canonicalize_url from scurl if it's available.
- Add a new test env to w3lib that is run using scurl: this will make sure that if canonicalize_url in w3lib is changed, the scurl version is updated accordingly.
- Run w3lib tests in the scurl Travis build: this probably means cloning the w3lib repo and running w3lib's canonicalize_url tests (which will pick up scurl). This ensures that if the scurl version changes and no longer passes w3lib's tests, we know about it.
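The first step above could look roughly like the following optional-import pattern. This is a sketch, not the code that was merged into w3lib: it assumes scurl exposes `canonicalize_url` at the package top level, and `_py_canonicalize_url` and `BACKEND` are hypothetical names standing in for w3lib's existing pure-Python implementation and a marker of which one is active.

```python
def _py_canonicalize_url(url):
    """Hypothetical placeholder for the existing pure-Python implementation."""
    raise NotImplementedError("stand-in for the pure-Python canonicalize_url")


try:
    # Assumption: scurl provides a drop-in canonicalize_url at top level.
    from scurl import canonicalize_url
    BACKEND = "scurl"
except ImportError:
    # scurl is not installed (or lacks the name): keep the pure-Python version.
    canonicalize_url = _py_canonicalize_url
    BACKEND = "pure-python"

print("canonicalize_url backend:", BACKEND)
```

With this pattern, the same test suite exercises whichever implementation is importable, which is what lets a scurl-enabled test env catch divergence between the two.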

malloxpb · 2018-10-01T20:20:12Z

This has been resolved in scrapy/w3lib#110

malloxpb added the enhancement and question labels Jul 9, 2018

malloxpb closed this as completed Oct 1, 2018
