-
Notifications
You must be signed in to change notification settings - Fork 10.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Error 302 redirection with headers location starts with 3 slash #4032
Comments
We aim to behave as close to browsers as possible, so let’s have a look at it. |
I was able to replicate this issue, @Gallaecio may I work on this? |
@amardeep24 Please, go ahead! 🙂 |
@Gallaecio The difference between Scrapy and a browser is that, when the This is happening because we have How can we know whether the URL should be relative w.r.t base URL or a completely new URL? should we make two calls one for the |
First we should find out if the URL sanitizer should be updated to handle this case differently. Otherwise, maybe we need to replace |
@Gallaecio This fixes the issue:
Here I checked whether the location header contains a URL starting with multiple |
Please, open a pull request. |
…y#4032 -- fix: checked for multiple / when re-directing
Raised PR#4042 @Gallaecio please review :) |
Description
when the 302 response return a headers's location startswith 3 slash, the scrapy redirect to a url different from what the browser do.
Steps to Reproduce
Expected behavior:
redirect to
https://fr.hujiang.com/new/p1285798/
as browserMozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36
do.Actual behavior:
redirct to
https://www.hjenglish.com/fr.hujiang.com/new/p1285798
Reproduces how often:
everytime
Versions
Scrapy : 1.7.3
lxml : 4.3.2.0
libxml2 : 2.9.9
cssselect : 1.1.0
parsel : 1.5.2
w3lib : 1.20.0
Twisted : 19.7.0
Python : 3.7.3 (default, Mar 27 2019, 17:13:21) [MSC v.1915 64 bit (AMD64)]
pyOpenSSL : 19.0.0 (OpenSSL 1.1.1c 28 May 2019)
cryptography : 2.6.1
Platform : Windows-10-10.0.17134-SP0
Additional context
I check the defination of Location in rfc and end with reference resolution. But I fail to findout how to resolve the Location startswith
///
. So I don't know why Chrome did so.The behavior of scrapy is determined by redirect.py#L73, which will truncate
///
to/
。I'm wandering the differents betweent scarpy and browser...
The text was updated successfully, but these errors were encountered: