### OpenDP

The OpenDP Project is a community effort to build trustworthy, open source software tools for analysis of private data. The core software of the OpenDP Project is the OpenDP Library.

The OpenDP Library is a modular collection of statistical algorithms that adhere to differential privacy. The library is based on a conceptual framework described in [A Programming Framework for OpenDP](https://projects.iq.harvard.edu/files/opendp/files/opendp_programming_framework_11may2020_1_01.pdf).

The OpenDP Library can be found on GitHub: https://github.com/opendp/opendp/

In [None]:
from opendp.meas import *
from opendp.trans import *
from opendp.typing import *
from opendp.mod import enable_features, Transformation, Measurement
enable_features("contrib", "floating-point")

OpenDP represents computations with Transformations and Measurements.

In [None]:
# create a measurement that simply adds laplace noise
scale = 0.5
base_laplace: Measurement = make_base_laplace(scale=scale)

# call the measurement like a function
base_laplace(arg=23.)

23.1796400888379

In [None]:
# create a mean transformation
sized_bounded_mean: Transformation = make_sized_bounded_mean(size=n, bounds=educ_bounds)

# call the transformation like a function
sized_bounded_mean(arg=educ)

10.608553908251183

We can also _chain_ transformations and measurements. Chaining is ordinary function composition: the output of one step becomes the input of the next. We call it "chaining" rather than "composition" because composition has a special meaning in differential privacy, where it refers to the joint release of more than one measurement.

In [None]:
# chain with the base_laplace measurement
dp_mean: Measurement = sized_bounded_mean >> base_laplace

# release a dp mean
dp_mean(arg=educ)

11.061892449178133
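
To make the idea of chaining concrete, here is a minimal pure-Python sketch of a `>>` operator that feeds one step's output into the next. The `Step` class is a hypothetical stand-in written for illustration; it is not how OpenDP implements Transformations or Measurements, and it adds no noise.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    """Hypothetical stand-in for a Transformation or Measurement (not the OpenDP class)."""
    function: Callable

    def __call__(self, arg):
        return self.function(arg)

    def __rshift__(self, other: "Step") -> "Step":
        # `first >> second` runs `first`, then feeds its output to `second`
        return Step(lambda arg: other(self(arg)))


mean = Step(lambda data: sum(data) / len(data))
add_one = Step(lambda x: x + 1.0)  # deterministic placeholder where noise would be added

chained = mean >> add_one
result = chained([1.0, 2.0, 3.0])
print(result)  # 3.0
```

The chained object is itself a `Step`, so it can be called like a function or chained again.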

Now that we've shown how to build up computations and execute them, let's talk about distances.

Transformations and Measurements relate distances. There are three kinds of distances:

1. Dataset distances  
    (greatest distance between neighboring datasets)
1. Sensitivities  
    (greatest distance between queries on neighboring datasets)
1. Privacy budget  
    (greatest distance between the output probability distributions on neighboring datasets)

The following shows how the `base_laplace` measurement relates a `sensitivity` to a privacy budget, `epsilon`.


In [None]:
# Check that when sensitivity is 0.15, the privacy usage is .30
base_laplace.check(d_in=.15, d_out=.30)

True

We can interpret this as follows: if the sensitivity is .15, then we can release the query answer with `laplace(scale=.5)` noise at a privacy expenditure of `.3 epsilon`, because the Laplace mechanism spends `sensitivity / scale = .15 / .5 = .3`. Equivalently, this data release is `.3 differentially private`, or `.3-DP`, where the privacy units are implicitly in terms of epsilon.

If we were to increase the sensitivity, the same relation would fail, and the release would not be `.3-DP`.

In [None]:
base_laplace.check(d_in=.16, d_out=.3)

False

As you might expect, the relation will also pass for any sensitivity that is smaller, or any privacy expenditure that is larger. 

In [None]:
print(base_laplace.check(d_in=.15, d_out=.31))
print(base_laplace.check(d_in=.14, d_out=.30))

True
True
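
The checks above are consistent with the standard Laplace mechanism guarantee, where epsilon equals sensitivity divided by scale. The following is a minimal sketch under that assumption; `laplace_relation` is a hypothetical helper written for this example, not an OpenDP function.

```python
def laplace_relation(scale: float, d_in: float, d_out: float) -> bool:
    """Hypothetical helper: True if Laplace noise at `scale` spends at most
    `d_out` epsilon whenever the sensitivity is at most `d_in`."""
    return d_in / scale <= d_out


scale = 0.5
checks = [
    laplace_relation(scale, d_in=0.15, d_out=0.30),  # exactly on the boundary
    laplace_relation(scale, d_in=0.16, d_out=0.30),  # sensitivity too large
    laplace_relation(scale, d_in=0.15, d_out=0.31),  # larger budget still passes
    laplace_relation(scale, d_in=0.14, d_out=0.30),  # smaller sensitivity still passes
]
print(checks)  # [True, False, True, True]
```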


Similarly, the `sized_bounded_mean` transformation relates a dataset distance `max_influence` to a `sensitivity`.

In [None]:
max_influence = 1 # the greatest number of records that an individual can influence
sized_bounded_mean.check(d_in=max_influence, d_out=.15)

True
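
One way to see where such a sensitivity comes from: if every record lies in `[lower, upper]` and the dataset size `n` is fixed, replacing one record moves the mean by at most `(upper - lower) / n`. The sketch below checks this by brute force under a change-one-record assumption; OpenDP's own dataset metric may count distances differently. The bounds and toy data are hypothetical and are not taken from the `educ` variable.

```python
def mean(data):
    return sum(data) / len(data)


lower, upper = 0.0, 20.0        # hypothetical bounds
data = [12.0, 16.0, 9.0, 14.0]  # hypothetical toy dataset, already within bounds
n = len(data)

bound = (upper - lower) / n     # claimed worst-case change in the mean

# replace each record with each extreme value and keep the largest shift
worst = max(
    abs(mean(data[:i] + [extreme] + data[i + 1:]) - mean(data))
    for i in range(n)
    for extreme in (lower, upper)
)
print(worst <= bound)  # True
```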

You can also relate distances on chained computations. The units for the input and output distances come from the constituent transformations and measurements. 

For example, when we chain the `sized_bounded_mean` transformation and `base_laplace` measurement together, the input distance is a dataset distance, `max_influence`, and the output distance is measured in terms of a privacy budget, `epsilon`.

In [None]:
# Check that when neighboring datasets differ by at most one record, the privacy usage is .3
dp_mean.check(d_in=1, d_out=.3)

True
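
The chained check can be read as two relations applied back to back: the dataset distance bounds a sensitivity, and that sensitivity bounds epsilon. Below is a minimal sketch assuming a change-one-record mean sensitivity of `d_in * (upper - lower) / n` and the Laplace guarantee `epsilon = sensitivity / scale`. All function names are hypothetical, and `n`, `lower`, and `upper` are illustrative values chosen so the numbers line up with the text; they are not the notebook's actual `n` or `educ_bounds`.

```python
def laplace_relation(d_in: float, d_out: float, scale: float) -> bool:
    """Hypothetical: sensitivity d_in costs at most d_out epsilon at this scale."""
    return d_in / scale <= d_out


def chained_relation(d_in, d_out, n, lower, upper, scale) -> bool:
    """Hypothetical: relate a dataset distance directly to epsilon."""
    # tightest sensitivity the mean can promise for this dataset distance
    sensitivity = d_in * (upper - lower) / n
    # hand that sensitivity to the noise-adding step
    return laplace_relation(sensitivity, d_out, scale)


one_record = chained_relation(d_in=1, d_out=0.3, n=100, lower=0.0, upper=15.0, scale=0.5)
two_records = chained_relation(d_in=2, d_out=0.3, n=100, lower=0.0, upper=15.0, scale=0.5)
print(one_record, two_records)  # True False
```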

The `release_histogram` function behaves similarly to the `make_count_by_categories` function in OpenDP.

In [None]:
help(make_count_by_categories)

Help on function make_count_by_categories in module opendp.trans:

make_count_by_categories(categories: Any, MO: opendp.typing.SensitivityMetric = 'L1Distance<i32>', TIA: Union[ForwardRef('RuntimeType'), _GenericAlias, str, Type[Union[List, Tuple, int, float, str, bool]], tuple] = None, TOA: Union[ForwardRef('RuntimeType'), _GenericAlias, str, Type[Union[List, Tuple, int, float, str, bool]], tuple] = 'i32') -> opendp.mod.Transformation
    Make a Transformation that computes the number of times each category appears in the data. 
    This assumes that the category set is known.
    
    :param categories: The set of categories to compute counts for.
    :type categories: Any
    :param MO: output sensitivity metric
    :type MO: SensitivityMetric
    :param TIA: categorical/hashable input type. Input data must be Vec<TIA>.
    :type TIA: RuntimeTypeDescriptor
    :param TOA: express counts in terms of this numeric type
    :type TOA: RuntimeTypeDescriptor
    :return: A count_by_catego

In [None]:
# release a histogram with laplace noise
dp_histogram_laplace = (
    make_count_by_categories(categories=educ_categories, TOA=float, MO=L1Distance[float]) >>
    make_base_laplace(scale=1., D=VectorDomain[AllDomain[float]])
)
dp_histogram_laplace(educ.astype(np.int32))


[270.94759328916456,
 143.0943752629504,
 356.34345153044376,
 467.5500935317367,
 497.80218487415516,
 755.5910417303638,
 893.9578708926002,
 1058.0116826037568,
 5144.003182629482,
 1395.3386035925798,
 3968.4715173089767,
 1154.9484668847808,
 6283.353918674489,
 2271.227934866608,
 791.6482866565484,
 308.2543824770269,
 -1.059484723088854]

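Because counts are integers, in practice it is safer to chain with the geometric mechanism, which adds integer-valued noise and so avoids the floating-point vulnerabilities that can affect Laplace noise on floats: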

In [None]:
# release a histogram with geometric noise
dp_histogram_geometric = (
    make_count_by_categories(categories=educ_categories) >>
    make_base_geometric(scale=1., D=VectorDomain[AllDomain[int]])
)
dp_histogram_geometric(educ.astype(np.int32))

[272,
 140,
 357,
 470,
 495,
 759,
 892,
 1059,
 5147,
 1393,
 3963,
 1155,
 6282,
 2265,
 797,
 309,
 -1]
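
For intuition about why the geometric release stays integer-valued, the sketch below samples two-sided geometric noise as the difference of two geometric draws. It is illustrative only: the `two_sided_geometric` helper and the example counts are hypothetical, and the sampler uses floating-point arithmetic internally, so it is not a hardened implementation like the one in OpenDP.

```python
import math
import random


def two_sided_geometric(scale: float, rng: random.Random) -> int:
    """Sample integer noise with P(k) proportional to exp(-|k| / scale)."""
    alpha = math.exp(-1.0 / scale)

    def geometric() -> int:
        # failures before the first success, with success probability 1 - alpha
        u = 1.0 - rng.random()  # uniform on (0, 1]
        return int(math.log(u) / math.log(alpha))

    # the difference of two iid geometric draws is two-sided geometric
    return geometric() - geometric()


rng = random.Random(0)
counts = [270, 143, 356]  # hypothetical exact counts
noisy = [c + two_sided_geometric(1.0, rng) for c in counts]
print(all(isinstance(x, int) for x in noisy))  # True
```

Because both the counts and the noise are integers, the released values are integers as well, matching the shape of the output above.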
