New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid getting stuck in a loop #2244

Merged
merged 1 commit into from Sep 25, 2014

Conversation

Projects
None yet
5 participants
@sigmavirus24
Member

sigmavirus24 commented Sep 23, 2014

This prevents a case where we make a request to URL A, which 301s to B which
would then 301 back to A. Alternatively, for less simple schemes, this will
also prevent us from getting stuck in a loop, e.g., it will prevent the
following from causing an endless loop:

    A -> B -> C -> D -> E -> F --
    ^                             \
    |                             /
    ---<------------<----------<-

Fixes #2231.

I tested this by cloning httpbin and hard coding a permanent redirect loop from /relative-redirect/1 to /relative-redirect/2 so that we could trigger this. This pull request fixes it.

Avoid getting stuck in a loop
This prevents a case where we make a request to URL A, which 301s to B which
would then 301 back to A. Alternatively, for less simple schemes, this will
also prevent us from getting stuck in a loop, e.g., it will prevent the
following from causing an endless loop:

    A -> B -> C -> D -> E -> F --
    ^                             \
    |                             /
    ---<------------<----------<-
@Lukasa

This comment has been minimized.

Show comment
Hide comment
@Lukasa

Lukasa Sep 23, 2014

Member

This no longer gets us stuck in a loop by causing us to actually hit the URLs that we're being forcibly redirected to, starting with the end of the redirect chain.

Is this what we want? Or do we want to throw an exception, e.g. RedirectLoopError?

Member

Lukasa commented Sep 23, 2014

This no longer gets us stuck in a loop by causing us to actually hit the URLs that we're being forcibly redirected to, starting with the end of the redirect chain.

Is this what we want? Or do we want to throw an exception, e.g. RedirectLoopError?

@sigmavirus24

This comment has been minimized.

Show comment
Hide comment
@sigmavirus24

sigmavirus24 Sep 23, 2014

Member

I don't think we should be adding a new exception for this. This will preserve the previous behavior of having hotting a Max Redirects exception thrown. It's no worse than the behavior we had before the redirect cache was introduced.

Sent from my Android device with K-9 Mail. Please excuse my brevity.

Member

sigmavirus24 commented Sep 23, 2014

I don't think we should be adding a new exception for this. This will preserve the previous behavior of having hotting a Max Redirects exception thrown. It's no worse than the behavior we had before the redirect cache was introduced.

Sent from my Android device with K-9 Mail. Please excuse my brevity.

@sigmavirus24

This comment has been minimized.

Show comment
Hide comment
@sigmavirus24

sigmavirus24 Sep 23, 2014

Member

But yes, I suspect we could subclass RedirectLoopError from TooManyRedirects and raise that so as to save the calls to the network that would otherwise cause us to exceed the max number of retries. I'm not opposed to that. I'd like to hear other opinions though from @fcosantos and @RuudBurger (and anyone else that has one).

@kennethreitz thoughts on saving people from hitting the network more than necessary when we can detect an endless redirect loop?

Member

sigmavirus24 commented Sep 23, 2014

But yes, I suspect we could subclass RedirectLoopError from TooManyRedirects and raise that so as to save the calls to the network that would otherwise cause us to exceed the max number of retries. I'm not opposed to that. I'd like to hear other opinions though from @fcosantos and @RuudBurger (and anyone else that has one).

@kennethreitz thoughts on saving people from hitting the network more than necessary when we can detect an endless redirect loop?

@sigmavirus24

This comment has been minimized.

Show comment
Hide comment
@sigmavirus24

sigmavirus24 Sep 23, 2014

Member

I should also have mentioned that this saves us from loops like this too:

    A -> B -> C -> D -> E -> F --
        ^                         \
        |                         /
        ------------<----------<-

Or any other case that essentially results in an endless loop

Member

sigmavirus24 commented Sep 23, 2014

I should also have mentioned that this saves us from loops like this too:

    A -> B -> C -> D -> E -> F --
        ^                         \
        |                         /
        ------------<----------<-

Or any other case that essentially results in an endless loop

@RuudBurger

This comment has been minimized.

Show comment
Hide comment
@RuudBurger

RuudBurger Sep 23, 2014

The RedirectLoopError seems like a good idea.

In my case, the url works (and redirects properly) in the browser. So maybe there is another issue on how the cache works.
I can provide you with a test url, if you have an email address I can send it to. As it is a url containing an API key.

RuudBurger commented Sep 23, 2014

The RedirectLoopError seems like a good idea.

In my case, the url works (and redirects properly) in the browser. So maybe there is another issue on how the cache works.
I can provide you with a test url, if you have an email address I can send it to. As it is a url containing an API key.

@sigmavirus24

This comment has been minimized.

Show comment
Hide comment
@sigmavirus24

sigmavirus24 Sep 23, 2014

Member

@RuudBurger both @Lukasa and I have our emails available on our GitHub profiles. If you'd rather use PGP, you can find my PGP key (and associated email address) by searching https://pgp.mit.edu/ for my real name.

Member

sigmavirus24 commented Sep 23, 2014

@RuudBurger both @Lukasa and I have our emails available on our GitHub profiles. If you'd rather use PGP, you can find my PGP key (and associated email address) by searching https://pgp.mit.edu/ for my real name.

@Lukasa

This comment has been minimized.

Show comment
Hide comment
@Lukasa

Lukasa Sep 23, 2014

Member

I see absolutely no reason why permanent redirects in the redirect cache should not count against the max redirects limit. They are still redirects, we're just not hitting the wire to do them. That behaviour will also fix our infinite redirects problem.

Member

Lukasa commented Sep 23, 2014

I see absolutely no reason why permanent redirects in the redirect cache should not count against the max redirects limit. They are still redirects, we're just not hitting the wire to do them. That behaviour will also fix our infinite redirects problem.

@fcosantos

This comment has been minimized.

Show comment
Hide comment
@fcosantos

fcosantos Sep 24, 2014

In my opinion an exception needs to be raised and RedirectLoopError is always going to be a TooManyRedirects at the end, if you want to subclass it is fine but maybe unnecessary.

I like @sigmavirus24 set() solution, maybe you want to pack it a bit more:

checked_urls = set()
while request.url in redirect_cache:
    checked_urls.add(request.url)
    request.url = self.redirect_cache.get(request.url)
    if request.url in checked_urls:
        raise *whatever*

fcosantos commented Sep 24, 2014

In my opinion an exception needs to be raised and RedirectLoopError is always going to be a TooManyRedirects at the end, if you want to subclass it is fine but maybe unnecessary.

I like @sigmavirus24 set() solution, maybe you want to pack it a bit more:

checked_urls = set()
while request.url in redirect_cache:
    checked_urls.add(request.url)
    request.url = self.redirect_cache.get(request.url)
    if request.url in checked_urls:
        raise *whatever*
@sigmavirus24

This comment has been minimized.

Show comment
Hide comment
@sigmavirus24

sigmavirus24 Sep 24, 2014

Member

RedirectLoopError is always going to be a TooManyRedirects at the end

@fcosantos I don't quite understand what you mean. Could you clarify this for me?

Member

sigmavirus24 commented Sep 24, 2014

RedirectLoopError is always going to be a TooManyRedirects at the end

@fcosantos I don't quite understand what you mean. Could you clarify this for me?

@Lukasa

This comment has been minimized.

Show comment
Hide comment
@Lukasa

Lukasa Sep 24, 2014

Member

I think the key point is that there's no need for a RedirectLoopError, we can just have TooManyRedirects. I still think permanent redirects should count against the redirect limit though, with a loop being a special case where we immediately know that we'll hit TooManyRedirects.

Member

Lukasa commented Sep 24, 2014

I think the key point is that there's no need for a RedirectLoopError, we can just have TooManyRedirects. I still think permanent redirects should count against the redirect limit though, with a loop being a special case where we immediately know that we'll hit TooManyRedirects.

@sigmavirus24

This comment has been minimized.

Show comment
Hide comment
@sigmavirus24

sigmavirus24 Sep 24, 2014

Member

@Lukasa I can think of a couple ways to make it influence the max_redirects count.

  1. Move the logic into Session#resolve_redirects. This unfortunately would mean actually using the network for the first (possibly permanent) redirect.
  2. Keep count along with the set of how many times we go through that loop and add a new optional parameter to Session#resolve_redirects along the lines of redirects_already_followed=0. We can then initialize i in Session#resolve_redirects with that and that will affect the max number of redirects possible (including using the cache).

Option 2 seems most practical, but I just loathe adding more arguments (that could confuse a user) to a public API like this.

Also, I think there's value in using a subclass of TooManyRedirects. I can see an instance where this might cause confusion because of a case like @RuudBurger has. In a browser, it might very well work just fine, but because of oddities in the usage of requests the loop is caused by the redirect cache. I think providing users a way to disambiguate where the error is actually coming from is very useful (and a better experience).

Member

sigmavirus24 commented Sep 24, 2014

@Lukasa I can think of a couple ways to make it influence the max_redirects count.

  1. Move the logic into Session#resolve_redirects. This unfortunately would mean actually using the network for the first (possibly permanent) redirect.
  2. Keep count along with the set of how many times we go through that loop and add a new optional parameter to Session#resolve_redirects along the lines of redirects_already_followed=0. We can then initialize i in Session#resolve_redirects with that and that will affect the max number of redirects possible (including using the cache).

Option 2 seems most practical, but I just loathe adding more arguments (that could confuse a user) to a public API like this.

Also, I think there's value in using a subclass of TooManyRedirects. I can see an instance where this might cause confusion because of a case like @RuudBurger has. In a browser, it might very well work just fine, but because of oddities in the usage of requests the loop is caused by the redirect cache. I think providing users a way to disambiguate where the error is actually coming from is very useful (and a better experience).

@Lukasa

This comment has been minimized.

Show comment
Hide comment
@Lukasa

Lukasa Sep 24, 2014

Member

I wonder if my concern about the redirect cache leading to reproducible behaviour is just my specific problem. I just realised that the other thing this redirect cache changes is the behaviour of Response.history, which is now not guaranteed to be the same for each request (the redirect cache doesn't populate it).

Member

Lukasa commented Sep 24, 2014

I wonder if my concern about the redirect cache leading to reproducible behaviour is just my specific problem. I just realised that the other thing this redirect cache changes is the behaviour of Response.history, which is now not guaranteed to be the same for each request (the redirect cache doesn't populate it).

@sigmavirus24

This comment has been minimized.

Show comment
Hide comment
@sigmavirus24

sigmavirus24 Sep 24, 2014

Member

@Lukasa good point. I'm not sure we can actually reconstruct the history accurately unless we also cache responses in the redirect cache. In other words, we'd have a cache something like:

{'http://example.com/': ('http://www.example.com', <Response [301]>)}

And a big problem with that would be timestamps in headers and such (e.g., cookies). All of this which makes me wonder exactly how good an idea it is to keep the redirect cache around.

Member

sigmavirus24 commented Sep 24, 2014

@Lukasa good point. I'm not sure we can actually reconstruct the history accurately unless we also cache responses in the redirect cache. In other words, we'd have a cache something like:

{'http://example.com/': ('http://www.example.com', <Response [301]>)}

And a big problem with that would be timestamps in headers and such (e.g., cookies). All of this which makes me wonder exactly how good an idea it is to keep the redirect cache around.

@Lukasa

This comment has been minimized.

Show comment
Hide comment
@Lukasa

Lukasa Sep 24, 2014

Member

I don't know that we should necessarily throw out the redirect cache, but we should at the very least document the hell out of how it is going to behave.

Member

Lukasa commented Sep 24, 2014

I don't know that we should necessarily throw out the redirect cache, but we should at the very least document the hell out of how it is going to behave.

kennethreitz added a commit that referenced this pull request Sep 25, 2014

Merge pull request #2244 from sigmavirus24/bug/2231
Avoid getting stuck in a loop

@kennethreitz kennethreitz merged commit de76fbe into requests:master Sep 25, 2014

1 check passed

default Merged build finished.
Details
@kennethreitz

This comment has been minimized.

Show comment
Hide comment
@kennethreitz
Member

kennethreitz commented Sep 25, 2014

+1

@sigmavirus24

This comment has been minimized.

Show comment
Hide comment
@sigmavirus24

sigmavirus24 Sep 25, 2014

Member

Uh... this wasn't exactly ready to merge. (Not that it breaks anything, but we were discussing alternative solutions.)

Member

sigmavirus24 commented Sep 25, 2014

Uh... this wasn't exactly ready to merge. (Not that it breaks anything, but we were discussing alternative solutions.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment