Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add get_ident method to BaseThrottle class #1222

Closed
wants to merge 13 commits into from

Conversation

Projects
None yet
4 participants
@kahnvex
Copy link
Contributor

commented Nov 11, 2013

This pull adds a get_ident method to the BaseThrottle class. This function is used to determine the identity of a request and will use X_FORWARDED_FOR and REMOTE_ADDR in that order to determine the identity of the machine making the request.

This fixes an issue where UserRateThrottle and ScopedRateThrottle use REMOTE_ADDR over X_FORWARDED_FOR to determine machine identity. This will not work if you are behind a proxy (like a load balancer). Using X_FORWARDED_FOR, if available, will fix this. This pull also solves a vulnerability, see the edit below.

get_ident will:

  • Try to use X_FORWARDED_FOR first
  • Fall back to REMOTE_ADDR

Edit:
This also fixes a possible vulnerability caused by using the entire HTTP_X_FORWARDED_FOR string. If an attacker is using one or more anonymous proxies when making requests against a DRF2 API he is able to change the HTTP_X_FORWARDED_FOR string in one of the proxies.

In the following case:

HTTP_X_FORWARDED_FOR = "client1, anon_proxy1, anon_proxy2, application_proxy, application_server"

Only anon_proxy2 should be trusted, because it is added by application_proxy (which we trust because it is a part of our application). If we trust client1 or anon_proxy1 we open up a vulnerability in which anon_proxy2 changes the HTTP_X_FORWARDED_FOR value and it now looks unique to the server, and will bypass the throttle.

This vulnerability can be found here: 2d5e14a

kahnjw added some commits Nov 11, 2013

kahnjw
Add get_ident method to BaseThrottle class
* Tries to use X_FORWARDED_FOR first
* Falls back to REMOTE_ADDR
@kahnvex

This comment has been minimized.

Copy link
Contributor Author

commented Nov 11, 2013

The implementation here trusts X_FORWARDED_FOR implicitly. An attacker could change the first X_FORWARDED_FOR value on every request and bypass the throttle. To fix this we should only trust the ip directly before the developers first proxy.

Something like this will fix it:

ident = xff.split(',')[-min(num_proxies, len(xff))].strip()

But this requires that the user configure the number of application proxies, probably in settings.py. I am working on adding code to do this.

@kahnvex

This comment has been minimized.

Copy link
Contributor Author

commented Nov 12, 2013

a1b4695 should make the number of proxies user configurable. Will add this to the documentation next.

Also, if NUM_PROXIES = 0 should get_ident just use REMOTE_ADDR?

kahnjw added some commits Nov 12, 2013

@kahnvex

This comment has been minimized.

Copy link
Contributor Author

commented Nov 12, 2013

I know I did this in the wrong order, but adding tests to this is next.

@kahnvex

This comment has been minimized.

Copy link
Contributor Author

commented Nov 13, 2013

Those tests should cover using XFF to identify uniqueness of the machine making the request.

@kahnvex

This comment has been minimized.

Copy link
Contributor Author

commented Nov 22, 2013

Bueller...

@tomchristie

This comment has been minimized.

Copy link
Member

commented Nov 22, 2013

Heya :)

Yup need to consider this. Even with this I don't know enough to know if IP address spoofing is still possible or not?

I'm not super keen on forcing dev to set this - I'd consider making NUM_PROXES=None be the default, and still use HTTP_X_FORWARDED_FOR in that case. That doesn't protect against deliberate spoofing via a proxy, but it does at least ensure that the standard cases are properly throttled. Folks who need something more targeted than that could then also set NUM_PROXIES. Any thoughts?

@kahnvex

This comment has been minimized.

Copy link
Contributor Author

commented Nov 22, 2013

Defaulting NUM_PROXIES=None is definitely the way to go. But ideally we'll use a combination of HTTP_X_FORWARDED_FOR and REMOTE_ADDR in the following circumstances:

1: The user does not set NUM_PROXIES or sets it to 0: use REMOTE_ADDR to set the identifier for that client. In this case we cannot trust anything in HTTP_X_FORWARDED_FOR because a client can do anything they want with HTTP_X_FORWARDED_FOR, they can change it's values on any given request, bypassing the throttle. Because there is no proxy in our app, no trusted source will be adding anything to HTTP_X_FORWARDED_FOR so it should all together not be trusted. REMOTE_ADDR is added by our application server, so we trust it. Also in this case HTTP_X_FORWARDED_FOR will probably not be set.

2: The user sets NUM_PROXES to > 0: We have to parse HTTP_X_FORWARDED_FOR.
If NUM_PROXIES = 1 trust the last IP entry in HTTP_X_FORWARDED_FOR and use it to ID the client
If NUM_PROXIES = 2 trust the last 2 IP entry in HTTP_X_FORWARDED_FOR and use the 2nd to last IP to ID the client.
In these cases we trust the IPs in HTTP_X_FORWARDED_FOR which are added by trusted sources - proxies that we have configured in our application.

This gets a little weird because an attacker could use multiple proxies to get past the throttle limit and we would detect them as unique, but at least this way it is cost bound not network bound. Each proxy costs something to run, whereas changing the contents of HTTP_X_FORWARDED_FOR on every request is virtually free.

There is a way to take care of this in Apache configuration, but it would be ideal to make this application configurable as well. Here is an Apache mod that basically achieves the same effect: http://www.stderr.net/apache/rpaf/

@kahnvex

This comment has been minimized.

Copy link
Contributor Author

commented Nov 22, 2013

The code in this pull adheres to all of this except it should default NUM_PROXIES to None as @tomchristie suggests.

@kahnvex

This comment has been minimized.

Copy link
Contributor Author

commented Nov 22, 2013

Here is a drawing of two different scenarios that illustrate how this works: https://docs.google.com/drawings/d/1XnGlJ_q8z7tudN742ndSO_nity224Db8HdLoQiJaeSA/pub?w=960&h=720

@kahnvex

This comment has been minimized.

Copy link
Contributor Author

commented Nov 22, 2013

I guess in the end I don't see the point of using HTTP_X_FORWARDED_FOR unless you have a proxy sitting in front of your web server. Unless I am totally missing something.

@tomchristie

This comment has been minimized.

Copy link
Member

commented Dec 5, 2013

kahnjw: The deal here is that the default setup shouldn't require the dev to configure anything in order to get a sensible behaviour. Using HTTP_X_FORWARDED_FOR is still useful in that context in that it will correctly identify and throttle non-maliciously crafted requests. Sure it does guard against some using proxies for masking or crafting the requests to look like that, but it will still work for the more common case of someone just blindly trying to hammer the API.

@tomchristie

This comment has been minimized.

Copy link
Member

commented Dec 5, 2013

So yes, this is looking good, and it'll be useful to be able to explicitly configure the # of proxies, but I think the default case shouldn't change the current behaviour.

@tomchristie

This comment has been minimized.

Copy link
Member

commented Dec 5, 2013

Once that update is made we can look at into merging this.

@kahnvex

This comment has been minimized.

Copy link
Contributor Author

commented Dec 5, 2013

@tomchristie Totally agree that default behavior changes should be avoided. When I was writing this I looked at the current behavior:

  1. AnonThrottle defaults to HTTP_X_FORWARDED_FOR https://github.com/kahnjw/django-rest-framework/blob/2d5e14a8d39a53c8a2e6d28fb8ae7debb5fbd388/rest_framework/throttling.py#L155
  2. Both UserRateThrottle and ScopedRateThrottle use REMOTE_ADDR.

So my logic was to use REMOTE_ADDR in the default case to follow current behavior as closely as possible. Also, most users do not have proxies, and in that case you'd want to use REMOTE_ADDR. That being said I'll switch it to HTTP_X_FORWARDED_FOR, but in the default setting, it will change current behavior for UserRateThrottle and ScopedRateThrottle. So that change may break expected behavior for some users.

@kahnvex

This comment has been minimized.

Copy link
Contributor Author

commented Dec 5, 2013

I could give get_ident a default_to option, and that way we can preserve current behavior.

@kahnvex

This comment has been minimized.

Copy link
Contributor Author

commented Dec 5, 2013

I think this would allow preservation of current behavior: https://gist.github.com/kahnjw/7810936

kahnjw
@kahnvex

This comment has been minimized.

Copy link
Contributor Author

commented Dec 5, 2013

7019236 Should preserve current default behavior.

@tomchristie

This comment has been minimized.

Copy link
Member

commented Dec 6, 2013

Looking good. I'm not sure if we need the default_to or not. Ideally I'd like to do without, tho it might mean this'd be deferred to 2.4.0. I'll probably also expand the documentation to explain how the identification works a bit. There's also subtleties to keep in mind such as users trying to access the service who are legitimately behind a firewall (eg corp. or university setups)

@kahnvex

This comment has been minimized.

Copy link
Contributor Author

commented Dec 6, 2013

Your call either way. Let me know if I should remove default_to and hard code HTTP_X_FORWARDED_FOR as the default and switch the pull to merge to 2.4.0.

@kahnvex

This comment has been minimized.

Copy link
Contributor Author

commented Dec 6, 2013

Also I'll expand on the documentation and note that if a client's network is NAT'd they should understand the consequences of parsing HTTP_X_FORWARDED_FOR and set an appropriate limit.

@tomchristie

This comment has been minimized.

Copy link
Member

commented Dec 6, 2013

Let me know if I should remove default_to and hard code HTTP_X_FORWARDED_FOR as the default and switch the pull to merge to 2.4.0.

That'd be great, yeah. HTTP_X_FORWARDED_FOR, falling back to REMOTE_ADDR as the default.

Nice work :)

@kahnvex

This comment has been minimized.

Copy link
Contributor Author

commented Dec 6, 2013

Switching to 2.4.0

@kahnvex kahnvex closed this Dec 6, 2013

header to uniquely identify client machines for throttling. If
HTTP_X_FORWARDED_FOR is not present `REMOTE_ADDR` header value will be used.

To help Django REST Framework identify unique clients the number of application proxies can be set using `NUM_PROXIES`. This setting will allow the throttle to correctly identify unique requests whenthere are multiple application side proxies in front of the server. `NUM_PROXIES` should be set to an integer. It is important to understand that if you configure `NUM_PROXIES > 0` all clients behind a unique [NAT'd](http://en.wikipedia.org/wiki/Network_address_translation) gateway will be treated as a single client.

This comment has been minimized.

Copy link
@thomasw

thomasw Dec 6, 2013

whenthere => when there

@kahnvex kahnvex deleted the kahnvex:get_ident_from_xff branch Dec 12, 2013

@tomchristie

This comment has been minimized.

Copy link
Member

commented Dec 13, 2013

Nice work! I've tweaked the docs slightly, and also changed the behaviour so that NUM_PROXIES=0 is no longer the same as NUM_PROXIES=None ie. if set to 0 then REMOTE_ADDR should always be used.

@kahnvex

This comment has been minimized.

Copy link
Contributor Author

commented Dec 13, 2013

Cool! Would love to continue contributing. I've been looking at the roadmap. Should I comment there for other tasks?

@tomchristie

This comment has been minimized.

Copy link
Member

commented Dec 13, 2013

Would love to continue contributing.

⚡️ 😄 ⚡️ Please, yeah.

Ideally against tickets as it's easier to maintain a conversation that way, although also against the wiki or on the discussion group also fine.

@sanchaz

This comment has been minimized.

Copy link

commented Mar 31, 2019

kahnjw: The deal here is that the default setup shouldn't require the dev to configure anything in order to get a sensible behaviour. Using HTTP_X_FORWARDED_FOR is still useful in that context in that it will correctly identify and throttle non-maliciously crafted requests. Sure it does guard against some using proxies for masking or crafting the requests to look like that, but it will still work for the more common case of someone just blindly trying to hammer the API.

@tomchristie Sorry to be picking up a very old PR, don't know if there's somewhere better to address this, let me know if there is.
Just wanted to understand better the reasoning for NUM_PROXIES=None to use HTTP_X_FORWARDED_FOR if it is available, it sounded very odd to me and made me dig into the source code and end up at this pull request.
Like @kahnvex originally suggested, wouldn't assuming HTTP_X_FORWARDED_FOR is unsafe if NUM_PROXIES is not set be safer?
Since the result would be anyone attempting to bypass the throttle would likely throttle themselves and any other users, since REMOTE_ADDR is more likely to be the same for all requests?
Equally someone hammering the API will end up throttling all requests.
The current default behaviour is a malicious user is able to bypass the throttle and DOS the server(s).
I'd argue, that for a default sensible behaviour, the premature throttling is a better outcome than a successful DOS on the server.
Let me know your thoughts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.