Material for this section was adapted from: https://docs.opendp.org/

# OpenDP Installation

To install the Python3 version of the OpenDP library, run the following command:
```
pip3 install opendp
```

As discussed in class, the code in the OpenDP library is still undergoing the vetting process. Code that has not yet passed through that process is marked as `contrib`. To enable such code, call `enable_features`:

In [None]:
from opendp.mod import binary_search, binary_search_param, enable_features
from opendp.trans import *
from opendp.meas import *
from opendp.typing import *
from opendp.accuracy import *
enable_features("contrib", "floating-point")

Without the `contrib` flag, you will get the following error:
```
AssertionError: Attempted to use function that requires contrib, but contrib is not enabled. See https://github.com/opendp/opendp/discussions/304, then call enable_features("contrib")
```

## Transformations

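A **transformation** is a non-DP abstraction defined by a function and a stability relation.
The function maps data from an input domain to an output domain. The stability relation relates a distance `d_in` under the input metric to a distance `d_out` under the output metric: it tells us whether inputs that are at most `d_in` apart are guaranteed to produce outputs that are at most `d_out` apart. We can view transformations as a basic unit of computation.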
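
To make the two ingredients concrete, here is a minimal plain-Python sketch of a transformation. It does not use OpenDP; `ToyTransformation` and `make_toy_bounded_sum` are hypothetical names chosen for illustration, and the stability relation assumes distances are measured as the number of rows added or removed.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ToyTransformation:
    """Hypothetical stand-in for a transformation: a function plus a stability relation."""
    function: Callable[[Sequence[float]], float]
    stability_relation: Callable[[float, float], bool]

    def __call__(self, data):
        return self.function(data)

    def check(self, d_in, d_out):
        return self.stability_relation(d_in, d_out)


def make_toy_bounded_sum(lower, upper):
    def function(data):
        # clamp each value into [lower, upper], then sum
        return sum(min(max(x, lower), upper) for x in data)

    def relation(d_in, d_out):
        # assumption: adding or removing one row changes the clamped sum
        # by at most max(|lower|, |upper|)
        return d_out >= d_in * max(abs(lower), abs(upper))

    return ToyTransformation(function, relation)


bounded_sum = make_toy_bounded_sum(0., 10.)
print(bounded_sum([1., 12., -3.]))   # 1 + 10 + 0
print(bounded_sum.check(1, 10.))     # one row can move the sum by up to 10
print(bounded_sum.check(1, 5.))      # 5 is too small a bound
```

The function and the relation are kept separate on purpose: the function touches the data, while the relation only reasons about distances and never looks at the data.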


In [None]:
# Exercise:
# Write a transformation to compute the variance of a dataset.
# Test on a bunch of data.

import numpy as np
from opendp.trans import make_sized_bounded_variance
from opendp.mod import Transformation

# reference implementation: sample variance (divides by n - 1)
compute_variance = lambda x : np.sum((x - np.mean(x))**2)/(len(x)-1)

for data in [[1., 2., 3., 4., 5., 5., 5., 5.],
             [1., 1., 3., 4.],
             [1., 10.]]:
    n = len(data)

    # create an instance of a Transformation using a constructor from the trans module
    var: Transformation = make_sized_bounded_variance(n, (0., 10.))

    # invoke the transformation (invoke and __call__ are equivalent)
    print(var(data))

    # compare with a tolerance, since exact float equality is fragile
    print("Variance Equal? ", np.isclose(compute_variance(data), var.invoke(data)))

2.5
Variance Equal?  True
2.25
Variance Equal?  True
40.5
Variance Equal?  True


In [None]:
# Exercise: 
# Find the L2 sensitivity of a histogram query, when individuals may influence up to three rows.

histogram = make_count_by_categories(categories=["a"], MO=L2Distance[float])

binary_search(
    lambda d_out: histogram.check(3, d_out), 
    bounds = (0., 100.))

3.0000000027939677
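
One way to see why the search lands just above 3 is to enumerate the worst case by hand. The sketch below is plain Python, not OpenDP; `worst_case_l2` is a hypothetical helper, and it assumes the individual's rows are all added to the dataset (removals only flip signs, which does not change the norm) and that each row may fall in any of `n_bins` bins.

```python
from itertools import product
from math import sqrt


def worst_case_l2(rows_changed, n_bins):
    """Largest L2 change in the bin counts when `rows_changed` rows are added."""
    worst = 0.0
    # try every way of assigning the changed rows to bins
    for assignment in product(range(n_bins), repeat=rows_changed):
        diff = [0] * n_bins
        for b in assignment:
            diff[b] += 1
        worst = max(worst, sqrt(sum(d * d for d in diff)))
    return worst


# all three rows in the same bin gives the vector (3, 0), with L2 norm 3;
# spreading them out can only make the norm smaller
print(worst_case_l2(3, n_bins=2))
```

Under these assumptions the L2 sensitivity equals the number of rows an individual can influence, so a value slightly above 3 is consistent with a numerical search that converges to 3 from above.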

## Measurements

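A **measurement** is a DP abstraction defined by a function and a privacy relation.
The function is responsible for performing the DP release. The privacy relation relates a distance `d_in` under the input metric to a privacy-loss bound `d_out` under the output measure: it tells us whether a release on inputs that are at most `d_in` apart is `d_out`-private.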

In [None]:
# check(d_in, d_out, *, debug=False)
#
# Also works on non-private transformations and relations.
#
# Check if the measurement satisfies the privacy relation at d_in, d_out.
# d_in – Distance in terms of the input metric.
# d_out – Distance in terms of the output measure.
# Returns True iff a release is differentially private at d_in, d_out.

from opendp.mod import Measurement, enable_features, binary_search_param
from opendp.meas import make_base_geometric, make_base_laplace
from opendp.trans import make_count

enable_features("contrib")
enable_features("floating-point")

# create an instance of Measurement using a constructor from the meas module
base_geometric: Measurement = make_base_geometric(scale=2.)

# invoke the measurement (invoke and __call__ are equivalent)
print(base_geometric.invoke(100))  
print(base_geometric(100))        

# check the measurement's relation at
# (1, 0.5): (AbsoluteDistance<u32>, MaxDivergence)
assert base_geometric.check(1, 0.5)

# chain with a transformation from the trans module
chained = (
    make_count(TIA=int) >>
    base_geometric
)

# the resulting measurement has the same features
print(chained([1, 2, 3]))
# check the chained measurement's relation at
#     (1, 0.5): (SubstituteDistance, MaxDivergence)
assert chained.check(1, 0.5)

# Exercise:
# What is the noise scale of the Laplace mechanism with an input sensitivity of 4
# (measured in absolute distance) and privacy utilization of epsilon=0.1?
scale = binary_search_param(make_base_laplace, d_in=4., d_out=0.1)
print(scale)

102
102
3
40.00000000745057
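
The last value can be sanity-checked by hand. For the Laplace mechanism with scale `b` on a query with sensitivity `d_in`, the privacy loss is `d_in / b`, so the smallest scale that achieves `epsilon` is `d_in / epsilon = 4 / 0.1 = 40`. The following is a minimal plain-Python sketch of the same kind of search; `laplace_relation` and `search_scale` are hypothetical helpers, and the search bounds and tolerance are arbitrary choices, not what OpenDP uses internally.

```python
def laplace_relation(scale, d_in, d_out):
    # Laplace noise with this scale on a sensitivity-d_in query costs epsilon = d_in / scale
    return d_in / scale <= d_out


def search_scale(d_in, d_out, lo=0.0, hi=1e6, tol=1e-8):
    """Bisect for the smallest scale that satisfies the privacy relation."""
    # invariant: hi satisfies the relation, lo does not
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if laplace_relation(mid, d_in, d_out):
            hi = mid
        else:
            lo = mid
    return hi


print(search_scale(d_in=4., d_out=0.1))  # numerically close to 40
print(4. / 0.1)                          # closed form
```

Because the search returns the upper end of its final interval, the result sits at or slightly above the closed-form answer, which matches the pattern seen in the output above.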


In [None]:
# Exercise:
# Create a transformation and chain it so it can handle the following case:
# the data arrives as newline-separated text in which some lines may be empty
# (in which case, impute 10), and we want to count the number of atomic items.

# chain with more transformations from the trans module
from opendp.trans import make_split_lines, make_cast, make_clamp, make_count_distinct, make_count, make_impute_constant, make_bounded_resize

count: Transformation = make_count(TIA=int)

chained_count = (
    make_split_lines() >>
    make_cast(TIA=str, TOA=int) >>
    make_impute_constant(constant=10) >>
    count
)

print(chained_count("1.0\n\n2.0\n\n15.0\n10.0\n1.0\n\n1.0\n3.0"))

print("Count == 10?", chained_count("1.0\n\n2.0\n\n15.0\n10.0\n1.0\n\n1.0\n3.0") == 10)

10
Count == 10? True
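
To see what each stage of such a chain contributes, here is a rough plain-Python analogue of the pipeline. It is only a sketch: the helper names are hypothetical, it casts with `float` rather than `int` so that only the empty lines are imputed, and it assumes a failed cast becomes a missing value, which may not match OpenDP's behavior in every detail.

```python
def split_lines(text):
    return text.split("\n")


def cast_or_none(items, cast=float):
    # a value that cannot be cast becomes None (treated as missing)
    out = []
    for item in items:
        try:
            out.append(cast(item))
        except ValueError:
            out.append(None)
    return out


def impute_constant(items, constant):
    # replace every missing value with the constant
    return [constant if item is None else item for item in items]


def count(items):
    return len(items)


text = "1.0\n\n2.0\n\n15.0\n10.0\n1.0\n\n1.0\n3.0"
values = impute_constant(cast_or_none(split_lines(text)), constant=10)
print(values)         # empty lines show up as the imputed 10
print(count(values))  # every line is counted, imputed or not
```

Note that the count is the same whether or not a line held data, because imputation replaces missing values instead of dropping them.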


Also review material from:   
https://github.com/opendp/cs208/blob/main/spring2022/examples/wk8_opendp.ipynb

## Errors

Example 1: Suppose the system returns an overflow error if, at any point in a summation, the running sum exceeds `max_val`. Using this, you can recover a (nearly) exact count of the elements in a dataset that satisfy some predicate:

1. Let q(x) return the 'value' associated with elements in x that satisfy some predicate p. Let sum_{C, q}(x) = sum(clamp(q(x), C)) for some constant C.

2. Perform a binary search to find the smallest value of C such that sum_{C,q}(x) returns an overflow error.

3. Because we know the `max_val` at which overflow occurs, we can compute `max_val/C` to get the approximate number of elements in x that satisfy the predicate p. This works because, as long as the matching values are at least C, each one contributes exactly C after clamping, so the sum at the overflow threshold is roughly (number of matching elements) × C ≈ `max_val`.
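
The steps above can be simulated in a few lines. This is a toy sketch, not any real system's API: `MAX_VAL`, `BIG`, `clamped_sum`, and `recover_count` are hypothetical names, the overflow threshold is simulated (Python integers do not overflow), and the attacker is assumed to choose q so that every matching row has a value larger than any clamp bound C.

```python
MAX_VAL = 2**31 - 1   # hypothetical overflow threshold, known to the analyst
BIG = 2 * MAX_VAL     # attacker-chosen value q assigns to every matching row


def clamped_sum(data, predicate, c):
    """Toy system query: sum of clamp(q(x), 0, c); errors if the running sum exceeds MAX_VAL."""
    total = 0
    for row in data:
        value = BIG if predicate(row) else 0
        total += min(max(value, 0), c)
        if total > MAX_VAL:
            raise OverflowError("sum exceeded max_val")
    return total  # a DP system would add noise here; the attack only uses the error


def overflows(data, predicate, c):
    try:
        clamped_sum(data, predicate, c)
        return False
    except OverflowError:
        return True


def recover_count(data, predicate):
    lo, hi = 1, MAX_VAL + 1
    if not overflows(data, predicate, hi):
        return 0  # no matching rows: nothing ever overflows
    # binary search for the smallest C that triggers the error
    while lo < hi:
        mid = (lo + hi) // 2
        if overflows(data, predicate, mid):
            hi = mid
        else:
            lo = mid + 1
    # count * (C - 1) <= MAX_VAL < count * C pins down the count when it is
    # small relative to MAX_VAL (an assumption of this sketch)
    return MAX_VAL // (lo - 1)


ages = [23, 45, 67, 34, 52, 41, 19, 73]
print(recover_count(ages, lambda age: age > 40))
print(sum(age > 40 for age in ages))  # true count, for comparison
```

The attacker never looks at a query answer, only at whether an error occurred, which is why adding noise to the returned sum would not help.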

Example 2: The system returns an error if the user-entered upper bound < lower bound. Does this preserve DP? Why is this different from the previous example?

Errors become a privacy problem when whether they occur depends on the dataset, rather than only on values that are already known to the data analyst.
