## Higher-Level API

The Context API preserves the core elements of the OpenDP framework, but simplifies the construction of DP analyses by taking a more data-oriented approach, familiar to users of libraries like Pandas and NumPy.

A `Context` is a privacy accountant that mediates query access to your sensitive dataset. Although we anticipate adding more constructors in an upcoming release, at this time a `Context` can only be constructed via the `compositor` constructor. Broadly speaking, constructors require the following parameters:
- `data`: The data to be analyzed.
- `privacy_unit`: A tuple consisting of a metric and a dataset distance.
- `privacy_loss`: A tuple consisting of a privacy measure and a privacy loss parameter.

In addition, `compositor` requires:
- `split_evenly_over` or `split_by_weights`: Exactly one of these parameters must be specified. They define how the privacy budget is split across multiple DP queries on the same dataset.

The `unit_of` and `loss_of` helper functions can be used to construct the `privacy_unit` and `privacy_loss` parameters, respectively. The `unit_of` wrapper lets you specify the dataset distance, for example through `contributions` (the number of records a single individual may contribute to the dataset).

By default, passing a value for the `contributions` parameter gives a symmetric distance, while passing a value for the `changes` parameter gives a change-one distance.

In [15]:
from typing import List
import opendp.prelude as dp
dp.enable_features("contrib")

print(dp.unit_of(contributions=3))
print(dp.unit_of(changes=3))

(SymmetricDistance(), 3)
(ChangeOneDistance(), 3)
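
To make the two metrics concrete, here is a minimal pure-Python sketch (not OpenDP's implementation) that measures how far apart two small datasets are. The helper names `symmetric_distance` and `change_one_distance` are illustrative only. Symmetric distance counts the records that must be added or removed to turn one dataset into the other, while change-one distance counts record replacements between datasets of equal size.

```python
from collections import Counter

def symmetric_distance(a, b):
    """Size of the multiset symmetric difference: additions plus removals."""
    ca, cb = Counter(a), Counter(b)
    return sum(((ca - cb) + (cb - ca)).values())

def change_one_distance(a, b):
    """Number of replaced records; only defined for equal-sized datasets."""
    if len(a) != len(b):
        raise ValueError("change-one distance requires datasets of equal size")
    # Each replacement is one removal plus one addition.
    return symmetric_distance(a, b) // 2

x = [1, 2, 3]
x_changed = [1, 2, 4]   # one record replaced
x_removed = [1, 2]      # one record removed

print(symmetric_distance(x, x_changed))   # 2
print(change_one_distance(x, x_changed))  # 1
print(symmetric_distance(x, x_removed))   # 1
```

On datasets of the same size, a bound of `changes=3` therefore corresponds to a symmetric distance of at most 6.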


Absolute, L1, and L2 distances can be defined in the same way.

In [16]:
print(dp.unit_of(absolute=5))
print(dp.unit_of(l1=5))
print(dp.unit_of(l2=5))

(AbsoluteDistance(i32), 5)
(L1Distance(i32), 5)
(L2Distance(i32), 5)


The `loss_of` wrapper lets you specify the privacy measure and the privacy loss parameters for different variants of DP.

In [17]:
# Pure DP
measure, distance = dp.loss_of(epsilon=1.0)
# Approximate DP
measure, distance = dp.loss_of(epsilon=1.0, delta=1e-9)
# Zero-Concentrated DP
measure, distance = dp.loss_of(rho=1.0)
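
The three measures are related by standard conversion bounds from the zCDP literature. The sketch below shows two classical bounds in plain Python so the parameters can be compared on a common scale. The function names are hypothetical, these are not the tightest known conversions, and OpenDP's own conversions may differ.

```python
import math

def pure_to_zcdp(epsilon):
    """epsilon-DP implies (epsilon^2 / 2)-zCDP."""
    return epsilon ** 2 / 2

def zcdp_to_approx(rho, delta):
    """rho-zCDP implies (epsilon, delta)-DP with the epsilon returned here."""
    return rho + 2 * math.sqrt(rho * math.log(1 / delta))

print(pure_to_zcdp(1.0))          # 0.5
print(zcdp_to_approx(1.0, 1e-9))  # epsilon implied by rho=1.0 at delta=1e-9
```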

Now, let's create a `Context` object via the `compositor` constructor. If the `domain` parameter is left unspecified, the structure of the data is assumed to be public knowledge. In some cases, specifying the domain explicitly can improve utility (e.g., by setting the dataset size).

In [20]:
context = dp.Context.compositor(
    data=[1, 2, 3],
    privacy_unit=dp.unit_of(contributions=1),
    privacy_loss=dp.loss_of(epsilon=3.0),
    domain=dp.domain_of(List[int]),
    # Spend the whole budget on a single query
    split_evenly_over=1,
)

Once you have created the `Context` object, you can submit DP queries to it. This example clamps the data, computes the sum, and adds Laplace noise calibrated to the privacy budget of epsilon=3.0. The query is not applied to the data until `.release()` is called.

In [21]:
dp_sum = context.query().clamp((0, 5)).sum().laplace()
print(dp_sum.release())

7
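
As a rough illustration of what "calibrated to the privacy budget" means, the following pure-Python sketch reproduces the clamp, sum, and noise steps by hand. It is a simplified stand-in and not how OpenDP implements the query: the helper names are hypothetical, the noise is continuous rather than integer-valued, and the sampler ignores the floating-point and randomness subtleties that a real DP library must handle.

```python
import random

def clamp(values, bounds):
    lower, upper = bounds
    return [min(max(v, lower), upper) for v in values]

def laplace_scale(bounds, contributions, epsilon):
    """Noise scale = sensitivity / epsilon for a clamped sum."""
    lower, upper = bounds
    # One added or removed record moves the sum by at most max(|lower|, |upper|).
    sensitivity = contributions * max(abs(lower), abs(upper))
    return sensitivity / epsilon

def sample_laplace(scale, rng):
    # The difference of two i.i.d. exponentials is Laplace-distributed.
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

data = [1, 2, 3]
bounds = (0, 5)
scale = laplace_scale(bounds, contributions=1, epsilon=3.0)
exact_sum = sum(clamp(data, bounds))
noisy_sum = exact_sum + sample_laplace(scale, random.Random())

print(exact_sum)  # 6
print(scale)      # 5 / 3
print(noisy_sum)  # varies from run to run
```

A larger epsilon or tighter clamping bounds both shrink the noise scale, which is why the choice of bounds matters for utility.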


Because the budget was split over a single query, attempting another release raises an exception stating that we have exhausted our number of queries.

In [24]:
from opendp.mod import OpenDPException

try:
    print(dp_sum.release())
except OpenDPException as err:
    print(err.message)

out of queries
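
The accounting behavior can be pictured with a toy model. The class below is a hypothetical illustration, not OpenDP's compositor: it holds one budget share per permitted query and refuses further releases once the shares are used up.

```python
class ToyAccountant:
    """Hands out a fixed list of per-query budgets, one per release."""

    def __init__(self, budgets):
        self._remaining = list(budgets)

    def spend(self):
        if not self._remaining:
            raise RuntimeError("out of queries")
        return self._remaining.pop(0)

accountant = ToyAccountant([3.0])  # like split_evenly_over=1 with epsilon=3.0
print(accountant.spend())          # 3.0

try:
    accountant.spend()
except RuntimeError as err:
    print(err)                     # out of queries
```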


You can allow for more queries by changing the `split_evenly_over` parameter.

In [25]:
# Assign to `context` so that the queries below use this new budget
context = dp.Context.compositor(
    data=[1, 2, 3],
    privacy_unit=dp.unit_of(contributions=1),
    # Approximate DP this time
    privacy_loss=dp.loss_of(epsilon=3.0, delta=1e-6),
    domain=dp.domain_of(List[int]),
    split_evenly_over=3,
)

Now, we can split our privacy budget evenly over 3 queries.

In [79]:
# Release a DP sum
dp_sum = context.query().clamp((0, 5)).sum().gaussian()
print(dp_sum.release())

# Release a DP mean
dp_mean = (
        context.query()
        .cast_default(float)
        .clamp((0.0, 5.0))
        .resize(3, constant=2.5)
        .mean()
        .gaussian()
    )
print(dp_mean.release())

# Release a DP count
dp_count = context.query().clamp((0, 5)).count().gaussian()
print(dp_count.release())

-1
-9.134325408962367
4


The `split_evenly_over` parameter splits the privacy loss evenly across the queries. If you want to give more of your privacy budget to one of the queries, specify the `split_by_weights` parameter instead.

In [29]:
context = dp.Context.compositor(
    data=[1, 2, 3],
    privacy_unit=dp.unit_of(contributions=1),
    # Zero-concentrated DP (rho) this time
    privacy_loss=dp.loss_of(rho=1.0),
    domain=dp.domain_of(List[int]),
    # Give more privacy loss to the second query
    split_by_weights=[1, 2],
)

In [30]:
# Release a DP sum, using 1/3 of the privacy loss
dp_sum = context.query().clamp((0, 5)).sum().laplace()
print(dp_sum.release())

# Release a DP count, using 2/3 of the privacy loss
dp_count = context.query().clamp((0, 5)).count().laplace()
print(dp_count.release())

14
3
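
The arithmetic behind both splitting options can be sketched in a few lines. This assumes basic sequential composition, under which the per-query parameters add up to the total. `split_budget` is a hypothetical helper written for illustration and is not part of OpenDP.

```python
def split_budget(total, weights):
    """Divide a privacy-loss parameter in proportion to the given weights."""
    weight_sum = sum(weights)
    return [total * w / weight_sum for w in weights]

# split_by_weights=[1, 2] with rho=1.0: one third, then two thirds
print(split_budget(1.0, [1, 2]))

# split_evenly_over=3 behaves like three equal weights; for approximate DP,
# epsilon and delta are each divided in this sketch.
print(split_budget(3.0, [1, 1, 1]))   # [1.0, 1.0, 1.0]
print(split_budget(1e-6, [1, 1, 1]))
```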
