From 500675bd7361933341711ab452c188222afda19f Mon Sep 17 00:00:00 2001 From: Christopher Wood Date: Wed, 24 Feb 2021 11:33:56 -0800 Subject: [PATCH] Start system overview and requirements. --- design-document.md | 90 ++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 87 insertions(+), 3 deletions(-) diff --git a/design-document.md b/design-document.md index 5755de3e..5765fddc 100644 --- a/design-document.md +++ b/design-document.md @@ -1,8 +1,89 @@ # Prio v3 Design Document -## Sample user stories +## Architecture overview -## Threat model +Prio is a system and protocol for privately computing aggregation functions over private +input. An aggregation function F is one that computes an output y = F(x1,x2,...) for inputs +xi. In general, Prio supports any aggregation function whose inputs can be encoded in a +particular way. However, not all aggregation functions admit an efficient encoding, rendering +them impractical to implement. Thus, Prio supports a limited set of aggregation functions, +some of which we highlight below: + +- Simple statistics, including sum, mean, min, max, variance, and standard deviation; +- Bit vector OR and AND operations; and +- Count-min sketch (approximated frequency counts) over a closed universe of strings. + +The applications for such aggregations functions are large, including, though not limited to: +counting the number of times a sensitive or private event occurs and approximating the frequency +that sensitive tokens or strings occur. + +Client applications hold private inputs to the aggregation function, server processors, +or custodians, interact in a multi-party computation to compute the output, and a final +collector obtains the output of the aggregator. At a high level, the flow of data through +these entities works roughly as follows: + +~~~ + +------------+ + (1) Batch submission | | (3) Collection + +-----------------------> Processor +------------------+ + | | | | + | +-^-------^--+ | + | | | | + | | | | + | | | (2) MPC | ++--------+ +--------v--+ | eval +----v------+ +| | | | | | | +| Client +-----------> Processor | | | Collector | +| | | | | | | ++--------+ +--------^--+ | +----^------+ + | | | | + | | | | + | | | | + | +-v-------v--+ | + | | | | + +-----------------------> Processor +------------------+ + | | + +------------+ +~~~ + +1. Applications split inputs into multiple (at least two) anonymized and encrypted shares, + and upload each share to different processors that do not collude or otherwise share + data with one another. Applications continue this process until a "batch" of data is + collected. Upon receipt of a share, each processor verifies it for correctness. + (Details about input validation and how it pertains to system security properties is + in {{CITE}}.) +2. Each processor aggregates its shares into a partial sum. The processors then engage + in a multi-party protocol to combine these sums into a final, aggregation output. +3. The aggregation output is sent to the collector. + +The output of a single batch aggregation reveals little to nothing beyond the value itself. + +## Security overview + +Prio assumes a powerful adversary with the ability to compromise an unbounded number of +clients. In doing so, the adversary can input malicious (yet truthful) to the aggregation +function. Prio also assumes that all but one server operates honestly, where a dishonest +server does not execute the protocol faithfully as specified. The system also assumes +that servers communicate over secure and mutually authenticated channels. In practice, +this can be done by TLS or some other form of application-layer authentication. + +In the presence of this adversary, Prio provides two important properties for computing +an aggergation function F: + +1. Privacy. The adversary learns only the output of F computed over all client inputs, + and nothing else. +1. Robustness. The adversary can influence the output of F only by reporting false + (untruthful) data. The output cannot be influenced in any other way. + +There are several additional constraints that a Prio deployment must satisfy in order +to achieve these goals: + +1. Minimum batch size. The aggregation batch size has an obvious impact on privacy. + (A batch size of one hides nothing of the input.) {{questions-and-params}} discusses + appropriate batch sizes and how it pertains to privacy in more detail. +2. Aggregation function choice. Some aggregation functions leak slightly more than the + function output itself. {{questions-and-params}} discusses the leakage profiles of + various aggregation functions in more detail. ## System requirements @@ -12,6 +93,9 @@ ## System design -## Open questions and system parameters +## Open questions and system parameters {#questions-and-params} + +[[OPEN ISSUE: discuss batch size parameter and thresholds]] +[[OPEN ISSUE: discuss f^ leakage differences from HCG's paper]] ## Cryptographic dependencies