Skip to content

Commit

Permalink
Start system overview and requirements.
Browse files Browse the repository at this point in the history
  • Loading branch information
chris-wood committed Feb 24, 2021
1 parent ca9cebc commit 500675b
Showing 1 changed file with 87 additions and 3 deletions.
90 changes: 87 additions & 3 deletions design-document.md
@@ -1,8 +1,89 @@
# Prio v3 Design Document

## Sample user stories
## Architecture overview

## Threat model
Prio is a system and protocol for privately computing aggregation functions over private
input. An aggregation function F is one that computes an output y = F(x1,x2,...) for inputs
xi. In general, Prio supports any aggregation function whose inputs can be encoded in a
particular way. However, not all aggregation functions admit an efficient encoding, rendering
them impractical to implement. Thus, Prio supports a limited set of aggregation functions,
some of which we highlight below:

- Simple statistics, including sum, mean, min, max, variance, and standard deviation;
- Bit vector OR and AND operations; and
- Count-min sketch (approximated frequency counts) over a closed universe of strings.

The applications for such aggregations functions are large, including, though not limited to:
counting the number of times a sensitive or private event occurs and approximating the frequency
that sensitive tokens or strings occur.

Client applications hold private inputs to the aggregation function, server processors,
or custodians, interact in a multi-party computation to compute the output, and a final
collector obtains the output of the aggregator. At a high level, the flow of data through
these entities works roughly as follows:

~~~
+------------+
(1) Batch submission | | (3) Collection
+-----------------------> Processor +------------------+
| | | |
| +-^-------^--+ |
| | | |
| | | |
| | | (2) MPC |
+--------+ +--------v--+ | eval +----v------+
| | | | | | |
| Client +-----------> Processor | | | Collector |
| | | | | | |
+--------+ +--------^--+ | +----^------+
| | | |
| | | |
| | | |
| +-v-------v--+ |
| | | |
+-----------------------> Processor +------------------+
| |
+------------+
~~~

1. Applications split inputs into multiple (at least two) anonymized and encrypted shares,
and upload each share to different processors that do not collude or otherwise share
data with one another. Applications continue this process until a "batch" of data is
collected. Upon receipt of a share, each processor verifies it for correctness.
(Details about input validation and how it pertains to system security properties is
in {{CITE}}.)
2. Each processor aggregates its shares into a partial sum. The processors then engage
in a multi-party protocol to combine these sums into a final, aggregation output.
3. The aggregation output is sent to the collector.

The output of a single batch aggregation reveals little to nothing beyond the value itself.

## Security overview

Prio assumes a powerful adversary with the ability to compromise an unbounded number of
clients. In doing so, the adversary can input malicious (yet truthful) to the aggregation
function. Prio also assumes that all but one server operates honestly, where a dishonest
server does not execute the protocol faithfully as specified. The system also assumes
that servers communicate over secure and mutually authenticated channels. In practice,
this can be done by TLS or some other form of application-layer authentication.

In the presence of this adversary, Prio provides two important properties for computing
an aggergation function F:

1. Privacy. The adversary learns only the output of F computed over all client inputs,
and nothing else.
1. Robustness. The adversary can influence the output of F only by reporting false
(untruthful) data. The output cannot be influenced in any other way.

There are several additional constraints that a Prio deployment must satisfy in order
to achieve these goals:

1. Minimum batch size. The aggregation batch size has an obvious impact on privacy.
(A batch size of one hides nothing of the input.) {{questions-and-params}} discusses
appropriate batch sizes and how it pertains to privacy in more detail.
2. Aggregation function choice. Some aggregation functions leak slightly more than the
function output itself. {{questions-and-params}} discusses the leakage profiles of
various aggregation functions in more detail.

## System requirements

Expand All @@ -12,6 +93,9 @@

## System design

## Open questions and system parameters
## Open questions and system parameters {#questions-and-params}

[[OPEN ISSUE: discuss batch size parameter and thresholds]]
[[OPEN ISSUE: discuss f^ leakage differences from HCG's paper]]

## Cryptographic dependencies

0 comments on commit 500675b

Please sign in to comment.