Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scrapy should follow redirects on scrapy shell by default #2290

Closed
eliasdorneles opened this issue Sep 28, 2016 · 5 comments
Closed

Scrapy should follow redirects on scrapy shell by default #2290

eliasdorneles opened this issue Sep 28, 2016 · 5 comments

Comments

@eliasdorneles
Copy link
Member

@eliasdorneles eliasdorneles commented Sep 28, 2016

Scrapy shell doesn't follow redirects when you do fetch('http://google.com'), but it does if you do fetch(scrapy.Request('http://google.com'))`.

I realize this is a historic behavior, but I'd argue that it breaks the expectations for most users, since the most common expectation is that Scrapy would try to do what a browser would do, so I think we should change it.

In my opinion, not following the redirect should be the exceptional behavior, and so that's what should require the user to build a scrapy.Request object.

I think ideally, we would be able to pass a keyword argument to fetch, like:

fetch('http://google.com', redirect=False)

And only for that case it would set handle_httpstatus_all to True as explained by @redapple here: #2177 (comment)

@eliasdorneles eliasdorneles changed the title Scrapy should follow redirects on scrapy shell Scrapy should follow redirects on scrapy shell by default Sep 28, 2016
@elacuesta
Copy link
Member

@elacuesta elacuesta commented Nov 24, 2016

Hey guys, I would like to submit a patch for this issue, but first I have a few questions. Let's say we add a redirect parameter to fetch:

  • is redirect the best name for a parameter which sets handle_httpstatus_all=True on the request? I think redirect would be more suitable if we only wanted the request not to handle 3xx status codes. How about a name closer to handle_httpstatus_all?
  • what if we pass a request with meta['handle_httpstatus_all']=True and redirect=True? (other colliding values are possible too) Which one should take precedence?

Looking forward to reading your comments, thanks for your time!

@redapple
Copy link
Contributor

@redapple redapple commented Nov 24, 2016

Hey @elacuesta !
Do you have that patch ready? because I worked on this this morning actually. And was about to submit a WIP PR
I opted for an "handle statuses" argument and not a "redirect on or off" one.

@redapple
Copy link
Contributor

@redapple redapple commented Nov 24, 2016

@elacuesta , this is the patch I was referring to #2410

@elacuesta
Copy link
Member

@elacuesta elacuesta commented Nov 28, 2016

Hello Paul, thanks for your reply. I was just asking to better understand the situation before starting, so let's not duplicate efforts and focus on that PR.

@redapple
Copy link
Contributor

@redapple redapple commented Dec 8, 2016

@elacuesta , I updated my PR for this one at #2410. If you want to have a look and discuss

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants