## Install OpenDP

In [None]:
pip install opendp

Collecting opendp
  Downloading opendp-0.12.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.1 kB)
Downloading opendp-0.12.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 25.0/25.0 MB 24.3 MB/s eta 0:00:00
Installing collected packages: opendp
Successfully installed opendp-0.12.1


## OpenDP Programming Framework Demo

Import the OpenDP library and enable the `"honest-but-curious"` and `"contrib"` features:
- **Honest-but-Curious**: OpenDP cannot verify the privacy or stability properties of user-defined functions (Exercise 2), so we must opt in to this looser trust model.
- **Contrib**: Enables components that have not yet been fully vetted.

In [None]:
import opendp.prelude as dp
import pandas as pd
import numpy as np
dp.enable_features("honest-but-curious", "contrib")

# Read in the dataset
# We will look at income data from the California PUMS dataset
data = dp.examples.get_california_pums_path().read_text()

# the greatest number of records that any one individual can influence in the dataset
max_influence = 1

# establish public information
col_names = ["age", "sex", "educ", "race", "income", "married"]

# without looking at the data, we can also reasonably assume that age and income
# are numeric and lie within the following bounds
age_bounds = (0, 100)
income_bounds = (0, 150_000)

## Exercise 1: Computing a private variance

In this exercise, you will compute a DP variance of the age column. See [`then_variance()`](https://docs.opendp.org/en/stable/api/python/opendp.transformations.html#opendp.transformations.then_variance) and [`make_variance()`](https://docs.opendp.org/en/stable/api/python/opendp.transformations.html#opendp.transformations.make_variance) in the OpenDP documentation. The code for releasing a private count is provided, because the variance transformation requires an input domain with a known dataset size.

In [None]:
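income_preprocessor = (
    # Note: despite its name, this preprocessor selects and clamps the age column
    # Split the CSV text into a dataframe whose columns are of type Vec<str>
    dp.t.make_split_dataframe(separator=",", col_names=col_names) >>
    # Select the age column as a Vec<str>
    dp.t.make_select_column(key="age", TOA=str) >>
    # Parse each entry as a float; entries that fail to parse become 0.0
    dp.t.then_cast_default(TOA=float) >>
    # Clamp age values to the public age bounds
    dp.t.then_clamp(bounds=tuple(map(float, age_bounds)))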
)

dp_count_measurement = income_preprocessor >> dp.t.then_count() >> dp.m.then_laplace(1.)
count_release = dp_count_measurement(data)
print(count_release)

Use the DP count above as the dataset size for a DP variance measurement. You can use `dp.binary_search_chain` (see [OpenDP documentation](https://docs.opendp.org/en/stable/api/user-guide/utilities/parameter-search.html)) to find the Laplace scale at which the variance release is $\varepsilon = 1$ differentially private.

In [None]:
# TODO:
#  (1) apply a 'then_resize' transformation so that the dataset has size count_release
#      (then_resize also needs a padding `constant` that lies within the age bounds)
#  (2) apply the 'then_variance' transformation to compute the variance
variance_transformation = income_preprocessor  # >> TODO (1) >> TODO (2)

# TODO: (3) use dp.binary_search_chain to find the scale to pass into 'then_laplace'
#  so that the DP variance is epsilon = 1 differentially private
dp_variance = dp.binary_search_chain(
    #TODO
)

#print(dp_variance(data))

## Exercise 2: Create a user-defined transformation

Use the Plugins API to create a user-defined transformation `make_trimmed` (see [`make_user_transformation`](https://docs.opendp.org/en/stable/api/python/opendp.transformations.html#opendp.transformations.make_user_transformation)) that removes the smallest $\alpha$ fraction of elements and the largest $\alpha$ fraction of elements from the dataset, based on their positions after sorting.


In [None]:
def make_trimmed(alpha, n):
    """Constructs a Transformation that sorts the dataset and removes the smallest
    alpha fraction and the largest alpha fraction of its elements"""
    def function(arg: list[float]) -> list[float]:
        # TODO: return the trimmed dataset; it must contain exactly
        # int((1 - 2*alpha)*n) elements to match the output_domain below
        pass

    def stability_map(d_in: int) -> int:
        # TODO: fill in the stability map function
        # The trimming transformation should map d_in-close inputs to (c * d_in)-close outputs for some constant c
        # In section, we showed that this transformation is 1-stable when the input and output metrics are both the symmetric distance
        pass

    return dp.t.make_user_transformation(
        input_domain=dp.vector_domain(dp.atom_domain(T=float), size=n),
        input_metric=dp.symmetric_distance(),
        output_domain=dp.vector_domain(dp.atom_domain(T=float), size=int((1 - 2 * alpha) * n)),
        output_metric=dp.symmetric_distance(),
        function=function,
        stability_map=stability_map,
    )

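# `data` is the raw CSV text, so extract the income column as a list of strings,
# one per record (the record count n is treated as public)
income_col = col_names.index("income")
incomes = [line.split(",")[income_col] for line in data.splitlines()]
n = len(incomes)

trim_transformation = (
    (dp.vector_domain(dp.atom_domain(T=str), size=n), dp.symmetric_distance())
    >> dp.t.then_cast_default(TOA=float)
    >> make_trimmed(alpha=0.05, n=n)
)

# Test your transformation
#print(trim_transformation(incomes))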
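Because `make_trimmed` declares its output size as `int((1 - 2*alpha)*n)`, the `function` you write must return exactly that many elements. The pure-Python sketch below (the helper names are hypothetical) shows that a naive rule of dropping `int(alpha * n)` elements from each end does not always agree with the declared size, for example when `n = 10`.

```python
def declared_size(alpha: float, n: int) -> int:
    # Output size used in make_trimmed's output_domain
    return int((1 - 2 * alpha) * n)

def naive_trimmed_size(alpha: float, n: int) -> int:
    # Hypothetical rule: drop int(alpha * n) elements from each end
    return n - 2 * int(alpha * n)

for n in (10, 20, 1000):
    print(n, declared_size(0.05, n), naive_trimmed_size(0.05, n))
# 10 9 10
# 20 18 18
# 1000 900 900
```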

## Exercise 3: Create a DP Trimmed Mean Measurement

Using your `make_trimmed` transformation, compute a DP trimmed mean. Don't forget to clamp the data after trimming: the mean transformation requires bounds on each element, and trimming alone does not provide them.

In [None]:
alpha = 0.05
epsilon = 1.0
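# `data` is the raw CSV text, so extract the income column as a list of strings,
# one per record (the record count n is treated as public)
income_col = col_names.index("income")
incomes = [line.split(",")[income_col] for line in data.splitlines()]
n = len(incomes)
lower, upper = (0.0, 1_000_000.0)
scale = (upper - lower) / (2 * alpha * n * epsilon)
trim_transformation = (
    (dp.vector_domain(dp.atom_domain(T=str), size=n), dp.symmetric_distance())
    >> dp.t.then_cast_default(TOA=float)
    >> make_trimmed(alpha, n)
)
# TODO: extend the chain with (1) a clamping transformation to (lower, upper),
#  (2) a mean transformation, and (3) a Laplace measurement with the scale above
dp_trimmed_mean = trim_transformation  # >> ...
#print(dp_trimmed_mean(incomes))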