$ scrapy shell "https://uat.payleap.com/transactservices.svc"
> from scrapy.linkextractors import LinkExtractor
> [ l.url for l in LinkExtractor(allow=()).extract_links(response) ]
['https://uat.payleap.com/TransactServices.svc?wsdl=']
gmargari changed the title from 'Extra "=" added at end of urls from LinkExtractor' to 'Extra "=" added at end of urls extracted from LinkExtractor' on May 11, 2016
gmargari changed the title from 'Extra "=" added at end of urls extracted from LinkExtractor' to ''=' added at end of urls extracted from LinkExtractor' on May 11, 2016
@gmargari, this is the (unfortunate) behavior of canonicalize_url, which LinkExtractor applies by default.
You can disable it though:
In [1]: from scrapy.linkextractors import LinkExtractor
In [2]: [ l.url for l in LinkExtractor(allow=()).extract_links(response) ]
Out[2]: ['https://uat.payleap.com/TransactServices.svc?wsdl=']
In [3]: [ l.url for l in LinkExtractor(allow=(), canonicalize=False).extract_links(response) ]
Out[3]: ['https://uat.payleap.com/TransactServices.svc?wsdl']
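For anyone wondering where the trailing "=" comes from, here is a minimal standard-library sketch of the kind of query-string round trip that produces it. This is an illustration only, not the actual canonicalize_url code: the helper name roundtrip_query is hypothetical, and I am assuming the canonicalization parses the query into key/value pairs (keeping blank values), sorts them, and re-encodes them.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def roundtrip_query(url):
    """Hypothetical helper: parse, sort and re-encode the query string of a URL."""
    parts = urlsplit(url)
    # A bare key such as "wsdl" is parsed as the pair ("wsdl", "").
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    pairs.sort()
    # urlencode always writes "key=value", so a blank value becomes "wsdl=".
    return urlunsplit(parts._replace(query=urlencode(pairs)))


result = roundtrip_query("https://uat.payleap.com/TransactServices.svc?wsdl")
print(result)
```

Once the query has been parsed into pairs, "?wsdl" and "?wsdl=" can no longer be told apart, so re-encoding yields the "=" form. Passing canonicalize=False skips that step and leaves the URL as it appears in the page.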
Scrapy shell info: