Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Approximate Bounds #659

Open
raprasad opened this issue Mar 20, 2023 · 1 comment
Open

Approximate Bounds #659

raprasad opened this issue Mar 20, 2023 · 1 comment

Comments

@raprasad
Copy link
Member

raprasad commented Mar 20, 2023

from @andrewvyrros:

Seems like a nice usability thing. Mike shared with MS half a year ago. Stumbling block for ppl new to dp. Like an easy way to use some budget to let the system figure out min max bounds in DP Creator. Different from min estimator and max estimator. Not complicated to implement, but choose bin edges exponentially distance (?) and post process.

@raprasad raprasad added this to the SmartNoise milestone Apr 4, 2023
@Shoeboxam
Copy link
Member

I wrote a prototype here:

import opendp.prelude as dp
import numpy as np
dp.enable_features("contrib", "honest-but-curious")

def make_logarithmic_binning():
    """bin each datum into a floating-point band"""
    input_domain = dp.vector_domain(dp.atom_domain(T=float))
    output_domain = dp.vector_domain(dp.atom_domain(T=int))
    metric = dp.symmetric_distance()

    def function(arg):
        arg = np.array(arg)
        band = np.log2(abs(arg)) # bands grow exponentially
        band[~np.isfinite(arg)] = 1025 # infinities get their own band
        return ((arg >= 0) * 2 - 1) * (band.astype(int) + 1074) # sign * band

    return dp.t.make_user_transformation(
        input_domain, metric, output_domain, metric, 
        function, lambda d_in: d_in)  # 1-stable

def make_find_bounds(scale, alpha=1e-9):
    """makes a postprocessor that finds bounds from a vector of noisy counts"""
    n_bands = 2099 + 1 + 2099 # negative bands, zero, positive bands
    threshold = -scale * np.log(2 - 2 * (1 - alpha) ** (1 / (n_bands - 1)))

    def function(counts):
        assert len(counts) == n_bands, "expected one count per-band"
        lower_idx = (counts > threshold).argmax() - 1
        upper_idx = n_bands - (counts > threshold)[::-1].argmax()
        idx = np.array([lower_idx, upper_idx]) - 2099

        with np.errstate(over="ignore"):
            return ((idx >= 0) * 2 - 1) * 2. ** (abs(idx) - 1075)

    return function


def make_private_bounds_via_histogram(scale, alpha=1e-9):
    return (
        make_logarithmic_binning() >> 
        dp.t.then_count_by_categories(np.arange(-2099, 2100), False) >>
        dp.m.then_laplace(scale) >>
        make_find_bounds(scale, alpha)
    )

def test_binning():
    binner = make_logarithmic_binning()
    print(binner([np.nextafter(0, 1), 2.2250738585072014e-308, 0.5, 1., 2., 1.7976931348623157e+308, np.inf]))
    print(binner([np.nextafter(0, -1), -2.2250738585072014e-308, -0.5, -1., -2., -1.7976931348623157e+308, -np.inf]))


# meas = make_private_bounds_via_histogram(scale)
# print(meas(np.random.normal(size=1000)))

def test_make_find_bounds():
    post = make_find_bounds(scale=1.)
    zeros = np.zeros(2099 + 1 + 2099)
    # zeros[0] = 1000
    # zeros[1] = 1000
    zeros[2100 + 1074] = 1000
    print(post(zeros))

meas = make_private_bounds_via_histogram(scale=3., alpha=1e-9)
print(meas(np.random.normal(size=1000, scale=10))) # ~> [-8.0, 8.0]
print(meas.map(d_in=1)) # -> .333 = ε

@raprasad raprasad modified the milestones: SmartNoise, 999 - New Constructors, 999: New Mechanisms Mar 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

No branches or pull requests

2 participants