# OpenDP Library Audience Exercise

Welcome to PPAI-23!


## Exercise 1: DP Sum

### 1.1. Enable the "contrib" Flag

Constructors that have not yet completed the proof-writing and vetting process can still be used if you opt in to "contrib".
Please contact us if you are interested in proof-writing. Thank you!

[[Documentation]](https://github.com/opendp/opendp/discussions/304)

### 1.2. Data Preprocessing

Write a transformation to cast the data to a vector of floating-point numbers, 
and then clamp the data.

[[Example]](https://docs.opendp.org/en/stable/user/getting-started.html#Transformation-Example:-Clamp)
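
As a plain-Python illustration of what this preprocessing step computes (this is not OpenDP code, and the helper names `cast_default` and `clamp` are hypothetical), the sketch below assumes that a failed cast falls back to a default value and that clamping forces every value into the chosen bounds:

```python
def cast_default(values, default=0.0):
    """Parse each string as a float, falling back to `default` on failure."""
    out = []
    for v in values:
        try:
            out.append(float(v))
        except ValueError:
            out.append(default)
    return out


def clamp(values, bounds):
    """Force every value into the closed interval [lower, upper]."""
    lower, upper = bounds
    return [min(max(v, lower), upper) for v in values]


# illustrative bounds only; choose your own for the exercise
example_bounds = (0.0, 2.5)
preprocessed = clamp(cast_default(["1.", "2.", "3.", "oops"]), example_bounds)
print(preprocessed)  # [1.0, 2.0, 2.5, 0.0]
```

Clamping matters because it bounds how much any single record can contribute to later aggregates.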

In [6]:
from opendp.transformations import make_cast_default, make_clamp

mock_dataset = ["1.", "2.", "3."]
# cast to float, since the exercise asks for a vector of floating-point numbers
caster = make_cast_default(str, float)

# set bounds to be used for the rest of our computation:
bounds = ("TODO", "TODO")

# construct a clamper!
clamper = "TODO"

# chain the caster and clamper together!
bounded_data_trans = "TODO"

# bounded_data_trans(mock_dataset)

### 1.3. Construct a non-DP Sum Transformation

This sum transformation should be able to chain with the clamp transformation in the previous section.

[[Documentation]](https://docs.opendp.org/en/stable/user/getting-started.html#Transformation-Example:-Sum)
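
The reason the sum must follow a clamp is sensitivity: once every value lies within known bounds, adding or removing a record can only move the sum by a bounded amount. A minimal plain-Python sketch of that reasoning follows (the function name `sum_sensitivity` is hypothetical, and the stability map reported by OpenDP for floating-point sums may be slightly larger to account for rounding):

```python
def sum_sensitivity(d_in, bounds):
    """Worst-case change in a clamped sum when d_in records are added or removed."""
    lower, upper = bounds
    return d_in * max(abs(lower), abs(upper))


example_bounds = (0.0, 10.0)  # illustrative bounds, not the exercise answer
data = [1.0, 2.0, 3.0]
neighbor = data + [10.0]      # one extra record sitting at the upper bound

# the neighboring dataset's sum differs by no more than the sensitivity
assert abs(sum(neighbor) - sum(data)) <= sum_sensitivity(1, example_bounds)
print(sum_sensitivity(1, example_bounds))  # 10.0
```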

In [7]:
from opendp.transformations import make_bounded_sum

sum_trans = "TODO"

# chain the two transformations you've made!
# bounded_sum_trans = bounded_data_trans >> sum_trans

### 1.4. Apply a Differentially Private Mechanism

Build a DP Sum measurement by chaining the exact sum with a Laplace noise measurement.

[[Documentation]](https://docs.opendp.org/en/stable/user/getting-started.html#The-Laplace-Mechanism)
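
For intuition, the relationship between noise scale and privacy loss can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the helper names are hypothetical, and naive floating-point sampling like this is not safe for real releases.

```python
import random


def laplace_noise(scale):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)


def laplace_epsilon(sensitivity, scale):
    """Privacy loss of additive Laplace noise: epsilon = sensitivity / scale."""
    return sensitivity / scale


exact_sum = 6.0
scale = 5.0          # illustrative noise scale
sensitivity = 10.0   # e.g. one record, with values clamped to [0, 10]

noisy_sum = exact_sum + laplace_noise(scale)
print(laplace_epsilon(sensitivity, scale))  # 2.0
```

A larger scale gives a smaller ε (more privacy) at the cost of a noisier answer.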

In [8]:
from opendp.measurements import make_base_laplace

base_laplace = "TODO"

# chain the bounded sum transformation
dp_sum_meas = "TODO"

# if you use laplace noise, you should be able to pass in a bound on contributions and get back an ε
# dp_sum_meas.map(1)

## Exercise 2: Privatizing a Poll
This code snippet downloads the dataset used in this exercise. Don't forget to comment out the lines marked with `TODO`!

In [1]:
# download a dataset
dataset_url = "https://raw.githubusercontent.com/opendp/dp-test-datasets/main/poll_data.csv"
dataset_path = "poll_data.csv"

# TODO: comment out the next two lines to use the poll data instead
dataset_url = "https://raw.githubusercontent.com/opendp/dp-test-datasets/master/data/PUMS_california_demographics_1000/data.csv"
dataset_path = "pums_1000.csv"

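# download the data (skipped if the file already exists locally)
import os, requests
if not os.path.exists(dataset_path):
    response = requests.get(dataset_url)
    response.raise_for_status()  # fail loudly rather than saving an error page
    with open(dataset_path, "wb") as f:
        f.write(response.content)

with open(dataset_path) as f:
    csv_dataset = f.read()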

### 2.1. Load data from a CSV

Construct a transformation that loads data from a CSV and selects a column.

[[Documentation]](https://docs.opendp.org/en/stable/user/transformations.html#dataframe)
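
Conceptually, the two steps are "split the text into named columns of strings" and "pick one column". A rough plain-Python sketch of that idea follows (the helpers `split_dataframe` and `select_column` are hypothetical stand-ins, and this sketch treats every line as data, including any header row):

```python
def split_dataframe(csv_text, separator, col_names):
    """Split CSV text into a dict mapping each column name to a list of strings."""
    columns = {name: [] for name in col_names}
    for line in csv_text.splitlines():
        if not line:
            continue  # skip empty lines
        fields = line.split(separator)
        for name, field in zip(col_names, fields):
            columns[name].append(field)
    return columns


def select_column(dataframe, key):
    """Pick one column out of the dict of columns."""
    return dataframe[key]


toy_csv = "59,1,9\n31,0,1\n36,1,11"
toy_df = split_dataframe(toy_csv, ",", ["age", "sex", "educ"])
print(select_column(toy_df, "age"))  # ['59', '31', '36']
```

The selected column is still a vector of strings, which is why a cast is needed before any numeric aggregation.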

In [10]:
from opendp.transformations import make_split_dataframe, make_select_column

# TODO: replace the column names with the names from your dataset
col_names = ["age", "sex", "educ", "race", "income", "married"]
df_loader = make_split_dataframe(",", col_names=col_names)

# TODO: replace "age" with the column of interest
column_selector = make_select_column("age", TOA=str)

# chain the transformations together!
csv_loader_trans = "TODO"

# you should be able to pass data through and get a vector of strings
# csv_loader_trans(csv_dataset)

### 2.2. Compute non-DP Histogram
Look for `make_count_by_categories` in the documentation:

[[Documentation]](https://docs.opendp.org/en/stable/user/transformations.html#aggregators)
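
The assertion in the cell below expects seven counts for six categories. That is consistent with a histogram that appends one extra bucket for values outside the category list. A plain-Python sketch of that behavior, with a hypothetical helper name and under that assumption:

```python
def count_by_categories(values, categories, null_category=True):
    """Count occurrences per category; optionally append a bucket for unknown values."""
    counts = {c: 0 for c in categories}
    unknown = 0
    for v in values:
        if v in counts:
            counts[v] += 1
        else:
            unknown += 1
    result = [counts[c] for c in categories]
    if null_category:
        result.append(unknown)
    return result


continents = ["North America", "South America", "Asia", "Europe", "Africa", "Australia"]
toy_counts = count_by_categories(["North America"] * 2 + ["South America"], continents)
print(toy_counts)  # [2, 1, 0, 0, 0, 0, 0]
```

Fixing the category list in advance, rather than reading it off the data, keeps the output shape from leaking which values are present.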


In [12]:
from opendp.transformations import make_count_by_categories
categories = ["North America", "South America", "Asia", "Europe", "Africa", "Australia"]
histogram_trans = "TODO"

# TODO: This assertion should pass:
# assert histogram_trans(["North America"] * 2 + ["South America"]) == [2, 1, 0, 0, 0, 0, 0]

### 2.3. Apply a Differentially Private Mechanism

The additive noise notebook explains how to add vector-valued noise:
[[Documentation]](https://docs.opendp.org/en/stable/user/measurements/additive-noise-mechanisms.html#Domain:-Scalar-vs.%C2%A0Vector)
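
For a histogram, adding or removing one record changes exactly one bin by one, so the L1 distance between neighboring histograms is at most the number of changed records. The sketch below illustrates this in plain Python (hypothetical helper names, naive sampling that is for intuition only):

```python
import random


def add_vector_laplace(counts, scale):
    """Add independent Laplace(0, scale) noise to every bin."""
    return [
        c + random.expovariate(1 / scale) - random.expovariate(1 / scale)
        for c in counts
    ]


def l1_distance(a, b):
    """Sum of absolute per-bin differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


hist = [2, 1, 0]
neighbor_hist = [2, 2, 0]  # one record added to the second bin
scale = 2.0                # illustrative noise scale

sensitivity = l1_distance(hist, neighbor_hist)
noisy_hist = add_vector_laplace(hist, scale)
print(sensitivity / scale)  # 0.5
```

The same ε = sensitivity / scale relationship as in the scalar case applies, with sensitivity measured in L1 distance.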

In [None]:
base_vector_laplace = "TODO"

# chain the csv_loader_trans, histogram_trans and vector Laplace mechanism, as usual:
dp_histogram = "TODO"

# dp_histogram(csv_dataset)
# dp_histogram.map(1)

## Exercise 3: Combinators

### 3.1. Non-Interactive Composition

More than one measurement can be composed together into a single measurement:
[[Documentation]](https://docs.opendp.org/en/stable/user/combinators.html#composition)

You can reuse measurements we've previously constructed as arguments to the basic composition combinator.
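
Under pure ε-DP, basic composition runs each measurement on the same data and adds up the privacy losses. A plain-Python sketch of that bookkeeping follows (the helper names are hypothetical, and `len` and `sum` are deterministic stand-ins that are not themselves differentially private):

```python
def basic_composition(measurements):
    """Return a function that applies every measurement to the same dataset."""
    def composed(data):
        return [m(data) for m in measurements]
    return composed


def compose_epsilons(epsilons):
    """Basic composition: the total privacy loss is the sum of the parts."""
    return sum(epsilons)


# stand-ins for two measurements with illustrative privacy losses
toy_releases = basic_composition([len, sum])([1, 2, 3])
total_epsilon = compose_epsilons([0.5, 0.25])

print(toy_releases)   # [3, 6]
print(total_epsilon)  # 0.75
```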

In [None]:
from opendp.combinators import make_basic_composition

basic_composition_meas = "TODO"

# basic_composition_meas(csv_dataset)

### 3.2. Measure Casting

You can use a measure-cast combinator to wrap a measurement in a new measurement that gives a privacy guarantee under a different definition of privacy.

[[Documentation]](https://docs.opendp.org/en/stable/user/combinators.html#measure-casting)

The Gaussian measurement gives a guarantee in terms of zero-Concentrated Differential Privacy.
Can you write the equivalent measurement in terms of ε(δ)-DP?

In [None]:
from opendp.measurements import make_base_gaussian
base_zCDP_gaussian = "TODO"

from opendp.combinators import make_zCDP_to_approxDP
base_approxDP_gaussian = "TODO"

Can you determine $\epsilon$ when $\text{scale} = 4$, $\delta = 10^{-7}$ and the sensitivity $\Delta = 0.5$?

In [None]:
# epsilon_curve = base_approxDP_gaussian.map(d_in=0.5)
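
As a rough cross-check for these parameters, the Gaussian mechanism satisfies ρ-zCDP with ρ = Δ² / (2σ²), and the classic Bun and Steinke bound converts ρ-zCDP to (ρ + 2√(ρ ln(1/δ)), δ)-DP. The sketch below applies that classic bound in plain Python. The helper names are hypothetical, and the library's combinator may use a tighter conversion, so the ε it reports may be smaller than this estimate.

```python
import math


def gaussian_rho(sensitivity, scale):
    """zCDP parameter of the Gaussian mechanism: rho = sensitivity^2 / (2 * scale^2)."""
    return (sensitivity / scale) ** 2 / 2


def zcdp_to_epsilon(rho, delta):
    """Classic conversion: rho-zCDP implies (rho + 2*sqrt(rho*ln(1/delta)), delta)-DP."""
    return rho + 2 * math.sqrt(rho * math.log(1 / delta))


rho = gaussian_rho(sensitivity=0.5, scale=4.0)
epsilon = zcdp_to_epsilon(rho, delta=1e-7)

print(rho)                # 0.0078125
print(round(epsilon, 2))  # 0.72
```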

### 3.3. User-Defined Callbacks

The OpenDP Library can incorporate measurements defined in other libraries.
In this exercise, you will write your own measurement that wraps a mechanism from PyDP.

This code snippet uses [PyDP](https://github.com/OpenMined/PyDP) to release a DP Count:

In [None]:
# import the Count algorithm from PyDP
from pydp.algorithms.laplacian import Count

x = Count(0.6)

x.quick_result(list(range(10)))

9

Use the documentation provided here to wrap this primitive from PyDP in a measurement:
[[Documentation]](https://docs.opendp.org/en/stable/user/combinators.html#user-defined-callbacks)
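
A user-defined measurement boils down to two callbacks: a function that produces the randomized release, and a privacy map that bounds the privacy loss for a given input distance. The plain-Python sketch below shows that pairing without OpenDP or PyDP. `ToyMeasurement` and `toy_dp_count` are hypothetical names, and the noise sampling is for intuition only.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyMeasurement:
    """Hypothetical stand-in for a measurement: a function plus a privacy map."""
    function: Callable
    privacy_map: Callable

    def __call__(self, data):
        return self.function(data)

    def map(self, d_in):
        return self.privacy_map(d_in)


def toy_dp_count(epsilon):
    def function(data):
        # a count has sensitivity 1, so Laplace noise with scale 1/epsilon is used
        noise = random.expovariate(epsilon) - random.expovariate(epsilon)
        return round(len(data) + noise)

    def privacy_map(d_in):
        # group privacy: d_in changed records cost at most d_in * epsilon
        return d_in * epsilon

    return ToyMeasurement(function, privacy_map)


toy_meas = toy_dp_count(epsilon=1.0)
print(toy_meas.map(2))  # 2.0
```

The privacy map is a claim the library cannot verify for foreign code, which is why wrapping an external mechanism requires an explicit opt-in.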

In [None]:
from opendp.typing import *
from opendp.combinators import make_default_user_measurement

In [None]:
from opendp.mod import enable_features
enable_features("honest-but-curious")


def pyDP_count(epsilon):

    from pydp.algorithms.laplacian import Count
    # use the epsilon passed in, not a hard-coded value
    x = Count(epsilon)

    def function(data):
        return "TODO"

    def privacy_map(d_in):
        return "TODO"

    return make_default_user_measurement(
        function,
        privacy_map,
        DI=VectorDomain[AllDomain[i32]],
        DO=AllDomain[i32],
        MI=SymmetricDistance,
        MO=MaxDivergence[f64]
    )


meas = pyDP_count(epsilon=1.)

meas([1, 2, 3])

2