Proxies must treat target signals independently from client requests #114

Closed
chris-wood opened this issue Mar 28, 2022 · 17 comments · Fixed by #130

Comments

@chris-wood
Collaborator

Something that came up in Vienna during the rate limiting discussion is how these signals are consumed and acted upon by proxies. We know that a proxy sending additional bits per client might allow the target to partition the anonymity set to detrimental effect. But we haven't fully explored how target->proxy signals, like rate limits, are consumed and applied.

Generally speaking, I think that proxies need to act on any target->proxy signals independently of client requests; otherwise a target can use such signals to try to partition the anonymity set. That means, for example, that a proxy should not apply rate limits to just one misbehaving client, but should apply limits uniformly across all client requests.
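As a rough illustration of what "uniform" could mean in practice, here is a minimal sketch (not from any draft; the class name and parameters are hypothetical) of a proxy-side limiter that keeps a single shared budget for all forwarded requests, so a target signal about one request cannot translate into treatment of one specific client:

```python
import time
from collections import deque

class AggregateRateLimiter:
    """Hypothetical proxy-side limiter: one shared budget for all clients,
    so a target-supplied signal cannot single out an individual client."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.timestamps = deque()  # forwarding times, pooled across all clients

    def allow(self) -> bool:
        now = time.monotonic()
        # Drop timestamps that have fallen outside the sliding window.
        while self.timestamps and now - self.timestamps[0] > self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_requests:
            return False  # the limit applies regardless of which client sent this
        self.timestamps.append(now)
        return True

# The proxy consults the same limiter for every client connection.
limiter = AggregateRateLimiter(max_requests=1000, window_seconds=60.0)
if limiter.allow():
    pass  # forward the encapsulated request to the target
```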

@martinthomson
Collaborator

Do you mean the oblivious request resource -> oblivious proxy resource hop? The target can't communicate with the proxy; anything it sends is encapsulated (which is why Tiru wanted a new header field that breaks out of that encapsulation).

The most obvious example (see Erik's comment) is that a particular request is identified as abusive somehow. Letting the proxy know, so that it might treat that client differently, is the entire goal of that process. And I don't think that there is any inherent risk of that being tied to an individual request. Though I think that this is unlike the CDN reputation case where a set of attributes is forwarded along with a request; the goal here is to have the proxy do the extra work of offloading requests by blocking or slowing them. I guess that slowing requests might have an observable effect through the timestamps.

There is a second piece, which is not request-related. A server might become overloaded and need to tell the proxy to slow down in the aggregate. A holistic signal makes sense there.

@chris-wood
Collaborator Author

Do you mean the oblivious request resource -> oblivious proxy resource hop? The target can't communicate with the proxy; anything it sends is encapsulated (which is why Tiru wanted a new header field that breaks out of that encapsulation).

Surely the target can communicate with the proxy -- it just adds a header, just like Tiru did.

The most obvious example (see #66 (comment)) is that a particular request is identified as abusive somehow. Letting the proxy know, so that it might treat that client differently, is the entire goal of that process. And I don't think that there is any inherent risk of that being tied to an individual request.

Yes, I understand the goal, but I'm saying that goal runs contrary to the competing privacy goal of OHTTP, which is to maintain client<>request unlinkability.

As a simple example, consider the following scenario. There are three clients C1, C2, and C3 sending requests to a target through a proxy in rounds. In each round, the target can guess which request corresponds to which client with some probability. Absent any information about the client, the target guesses correctly with probability 1/3.

Now let's assume the proxy flags one request from one client Ci in round j with the shadowban bit. The target, in response, decides it's going to apply some rate limit or otherwise ask the proxy to treat the client differently. And let's say the proxy does that by banning Ci from sending a request in round j+1.

Now, in round j+1, the target sees only two requests. If all requests were unlinkable, the probability that the target could correctly guess the client for each request in j+1 and j would be 1/9 (=1/3 * 1/3). But that's not the case anymore, since the target knows the requests in j+1 correspond to the unflagged requests in round j.
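A quick check of the arithmetic, as a sketch under the assumptions of the example above (uniform guessing per round; the post-ban figure is my own illustration, not a number from the comment):

```python
from fractions import Fraction

clients = 3

# Unlinkable baseline: the target guesses independently in rounds j and j+1.
p_unlinkable = Fraction(1, clients) * Fraction(1, clients)
print(p_unlinkable)  # 1/9

# After Ci is banned from round j+1: the two remaining requests must belong
# to the two clients the target did not flag, so the round j+1 guess is over
# 2 candidates instead of 3.
p_after_ban = Fraction(1, clients) * Fraction(1, clients - 1)
print(p_after_ban)  # 1/6, i.e. strictly better odds for the target
```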

This is all pretty farfetched, but I think the core idea is simply that the proxy allowed the target to partition the anonymity set by applying per-client actions at the request of the target. Balancing abuse mitigation and privacy here seems pretty challenging.

@martinthomson
Collaborator

it just adds a header

The response from the target will be encapsulated, so new headers will be hidden from the proxy. What Tiru was suggesting was that we teach the response resource to recognize a header that is not encapsulated, but copied to the outer envelope instead.

the probability that the target could correctly guess the client for each request in j+1 and j would be 1/9 (=1/3 * 1/3). But that's not the case anymore

Isn't this exactly the sort of thing that we're warning about when we say that every bit can split the client population in half? That speaks to a similar restriction on per-response entropy as exists on requests. The question is whether this is orthogonal (and additive) or correlated (and therefore not able to divide the client population again).
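For reference, the "every bit splits the population" heuristic can be written down directly (a sketch; it assumes the bits are roughly uniform and independent of what the observer already knows):

```latex
% N clients, b bits of per-message signal visible to the observer.
% If the bits are independent of prior knowledge (the "orthogonal" case),
% the expected anonymity-set size shrinks exponentially in b:
\[
  \mathbb{E}\bigl[\lvert A \rvert\bigr] \approx \frac{N}{2^{b}}
\]
% In the "correlated" case the bits repeat information the observer
% already has, so they cannot divide the population again.
```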

@chris-wood
Collaborator Author

chris-wood commented Mar 29, 2022

The response from the target will be encapsulated, so new headers will be hidden from the proxy. What Tiru was suggesting was that we teach the response resource to recognize a header that is not encapsulated, but copied to the outer envelope instead.

Yes, I'm saying this header is not encapsulated. 👍

Isn't this exactly the sort of thing that we're warning about when we say that every bit can split the client population in half? That speaks to a similar restriction on per-response entropy as exists on requests. The question is whether this is orthogonal (and additive) or correlated (and therefore not able to divide the client population again).

Well, sort of. Perhaps restricting per-response entropy will help, but I think this needs more analysis. In any case, if the proxy does not treat client requests independently of what information it learns from the target, then things start becoming correlated. I'd prefer we not just overlook this relationship as we reason about (a) what information proxies (and targets) can send to targets (and proxies, respectively), and (b) how the recipient uses that information.

As of now, I think applying per-client restrictions to requests when signaled by the target is harmful (see the scenario described above).

@tireddy2

We updated draft-rdb-ohai-feedback-to-proxy to handle both the server-overload scenario and the scenario of malicious clients attacking the server. The latter case is tricky, as it can potentially be abused to identify a client. We proposed the following changes to address this attack:

1: Indicates that RateLimit fields are applicable to all the clients
that are serviced by the same Oblivious proxy.

2: Indicates that RateLimit fields are applicable only to the
offending client. For example, this value is used if the client
is attacking the server (e.g., the client is using an abnormal
header that matches an attack pattern). The Oblivious proxy can
shadowban requests from the offending client for a certain
duration instead of rate-limiting the requests when the client has
a high ratio of malicious requests to legitimate requests.

@chris-wood
Collaborator Author

The second value of the "ohttp-target" parameter seems to be the problematic case here, as described earlier in this issue. @tireddy2, do you think it's safe (from a client privacy perspective) for the proxy to change its behavior on a per-client basis?

@tireddy2

Yes, the second value can potentially be abused by the target. However, the proxy does not immediately act on the second value to rate-limit the traffic from the client; it starts maintaining a count of responses to the client with the "ohttp-target" parameter set to 2 (potential malicious requests) and responses without the parameter (legitimate requests). If the client has a high ratio of malicious requests to legitimate requests, the proxy can shadowban requests from the offending client for a certain duration.
A typical botnet attacking the server would not send a single malicious request but multiple requests, from the same client and from multiple clients, to overload the server. The target has no way to tell whether the requests are coming from the same client or different clients, and the proxy uses the second value only as a hint to determine the client's reputation score and act accordingly at a later time.
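A sketch of the bookkeeping described above, to make the flow concrete (the "ohttp-target" parameter comes from the quoted draft text; the ratio threshold, minimum sample size, and ban duration are invented placeholders). Note that this state is necessarily keyed per client, which is exactly the per-client treatment being debated in this issue:

```python
import time
from dataclasses import dataclass

@dataclass
class ClientReputation:
    flagged: int = 0      # responses carrying ohttp-target=2 for this client
    legitimate: int = 0   # responses without the parameter
    banned_until: float = 0.0

# Illustrative policy knobs -- the draft leaves these choices to the proxy.
FLAG_RATIO_THRESHOLD = 0.8
MIN_RESPONSES = 10
SHADOWBAN_SECONDS = 300.0

reputations: dict[str, ClientReputation] = {}

def record_response(client_id: str, ohttp_target: int | None) -> None:
    """Update the client's reputation based on one target response."""
    rep = reputations.setdefault(client_id, ClientReputation())
    if ohttp_target == 2:
        rep.flagged += 1
    else:
        rep.legitimate += 1
    total = rep.flagged + rep.legitimate
    # Act only after enough responses and a high flagged-to-legitimate ratio.
    if total >= MIN_RESPONSES and rep.flagged / total >= FLAG_RATIO_THRESHOLD:
        rep.banned_until = time.monotonic() + SHADOWBAN_SECONDS

def is_shadowbanned(client_id: str) -> bool:
    rep = reputations.get(client_id)
    return rep is not None and time.monotonic() < rep.banned_until
```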

Do you think the proposed mechanism is sufficient to protect client privacy, or can it still be abused by the target?

@martinthomson
Collaborator

So I'm seeing some fairly positive things in the new version of the feedback draft. What I think we should do is first take this discussion to the mailing list, but then talk about whether just providing rate limiting signals is the right approach or whether a more generic communication path needs to be established.

Rate limiting does allow us to address a good number of the use cases we have identified so far, but it could be constraining if we find that request resources need to be updated piecemeal to enable new use cases later. A more generic signaling scheme might relieve some of that pressure.

It might be that all we need is a negotiation system whereby the request resource signals what it is willing to keep outside of encapsulation and the target then uses those. Given likely deployment scenarios, that might work even without signaling in some cases, but a more explicit scheme might be better for interoperability.

@chris-wood
Collaborator Author

Do you think the proposed mechanism is sufficient to protect client privacy, or can it still be abused by the target?

No, I don't, as demonstrated above.

So I'm seeing some fairly positive things in the new version of the feedback draft. What I think we should do is first take this discussion to the mailing list, but then talk about whether just providing rate limiting signals is the right approach or whether a more generic communication path needs to be established.

Rate limiting does allow us to address a good number of the use cases we have identified so far, but it could be constraining if we find that request resources need to be updated piecemeal to enable new use cases later. A more generic signaling scheme might relieve some of that pressure.

It might be that all we need is a negotiation system whereby the request resource signals what it is willing to keep outside of encapsulation and the target then uses those. Given likely deployment scenarios, that might work even without signaling in some cases, but a more explicit scheme might be better for interoperability.

I don't think the question is whether rate limits are sufficient for the use cases we care about. As the example above illustrates, the relevant question here seems to be whether the signals from target to proxy -- whatever they may be, and however they may be sent -- can be used to further partition the client anonymity set.

@martinthomson
Collaborator

Fair. So what would a good system look like? Do we have to frame differential treatment in terms of things that the client accepts, just like we do for added information? Because we might frame differential treatment as being roughly equivalent to adding information to requests.

@chris-wood
Collaborator Author

Do we have to frame differential treatment in terms of things that the client accepts, just like we do for added information? Because we might frame differential treatment as being roughly equivalent to adding information to requests.

This would be a reasonable framing, yeah. Something something "if you misuse this proxy, you run the risk of revealing information to the target"?

@tireddy2

Do you think the proposed mechanism is sufficient to protect client privacy, or can it still be abused by the target?

No, I don't, as demonstrated above.

In your example of three clients, the proxy does not rate-limit the requests from client C1 based on a single feedback signal from the target. The proxy will block traffic from C1 only after it sees such a feedback signal for multiple requests from C1. Assuming the threshold is 10 requests, the probability of 10 successive feedback signals being sent for C1 is about 0.000016. The mechanism only works if the target sees an attack pattern or garbage data in multiple requests from C1. A legitimate client will never send requests that are linkable, unlike a malicious client, which will send malformed requests.
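For what it's worth, the consecutive-flag probability can be sketched as below; the per-round misattribution probability of 1/3 is taken from the three-client example, and the exact model behind the 0.000016 figure isn't spelled out in the comment, so treat this as illustrative only:

```python
def consecutive_flag_probability(p_per_round: float, threshold: int) -> float:
    """Chance that a legitimate client is flagged `threshold` rounds in a row,
    assuming independent misattribution with probability p_per_round each round."""
    return p_per_round ** threshold

print(consecutive_flag_probability(1 / 3, 10))  # ~1.7e-05
```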

@chris-wood
Collaborator Author

@tireddy2 Well, this is different from the example I proposed. Nevertheless, I'm not convinced this changes the situation in any meaningful way. I think we need to be clear about the risks here, and stating them in a way like @martinthomson proposed seems like a good way to do that.

@tireddy2

@tireddy2 Well, this is different from the example I proposed. Nevertheless, I'm not convinced this changes the situation in any meaningful way. I think we need to be clear about the risks here, and stating them in a way like @martinthomson proposed seems like a good way to do that.

Agreed, I was trying to show the proposed mechanism does not adversely impact the privacy of legitimate clients.

@tireddy2

So I'm seeing some fairly positive things in the new version of the feedback draft. What I think we should do is first take this discussion to the mailing list, but then talk about whether just providing rate limiting signals is the right approach or whether a more generic communication path needs to be established.

Rate limiting does allow us to address a good number of the use cases we have identified so far, but it could be constraining if we find that request resources need to be updated piecemeal to enable new use cases later. A more generic signaling scheme might relieve some of that pressure.

It might be that all we need is a negotiation system whereby the request resource signals what it is willing to keep outside of encapsulation and the target then uses those. Given likely deployment scenarios, that might work even without signaling in some cases, but a more explicit scheme might be better for interoperability.

Sounds good. We can introduce a new header (e.g., h=header1:header2:header3) for the request resource to signal that it will keep header1, header2, and header3 outside of the encapsulation, so that the target can decide to use those headers. RateLimit-Limit can be one of the headers to start with.
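A sketch of how the target side might consume such a header (the "h" name and colon-separated syntax come from the comment above; everything else, including the allow-list, is hypothetical):

```python
# Header names the target is prepared to emit outside the encapsulation.
TARGET_SUPPORTED = {"ratelimit-limit", "ratelimit-remaining"}

def parse_unencapsulated_offer(header_value: str) -> list[str]:
    """Parse a value like "RateLimit-Limit:Retry-After" into header names."""
    return [name.strip().lower() for name in header_value.split(":") if name.strip()]

def usable_headers(header_value: str) -> list[str]:
    # Use only headers that both the request resource offered and the target supports.
    return [name for name in parse_unencapsulated_offer(header_value)
            if name in TARGET_SUPPORTED]

print(usable_headers("RateLimit-Limit:Retry-After"))  # ['ratelimit-limit']
```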

@chris-wood
Collaborator Author

Agreed, I was trying to show the proposed mechanism does not adversely impact the privacy of legitimate clients.

Sorry, but I don't think this is true. I'm saying that we've not demonstrated this to be the case, and the burden is on us to do so if we are to recommend it in the spec.

@tireddy2

Agreed, I was trying to show the proposed mechanism does not adversely impact the privacy of legitimate clients.

Sorry, but I don't think this is true. I'm saying that we've not demonstrated this to be the case, and the burden is on us to do so if we are to recommend it in the spec.

Sure, the onus will in any case be on draft-rdb-ohai-feedback-to-proxy to prove that it is privacy-preserving.

tireddy2 added a commit to tireddy2/ohai that referenced this issue Jun 29, 2022
Updated draft based on the discussion with the authors of the Oblivious HTTP protocol draft.