Context API #750

Merged (27 commits) on Aug 4, 2023

@Shoeboxam (Member) commented May 26, 2023

Closes #625

This is an initial implementation of the oft-discussed higher-level API.
Expect many API changes, and please provide feedback to help us evolve it!

See the tests for example usage.

  • Adds a Query class to help build measurements
    • dot-chaining automatically fills in the input domain and metric from the previous transformation
      • make_a(input_domain, input_metric, ...) becomes .a(...)
    • allows a single free parameter anywhere in the Query, which is solved for automatically
      • (you can leave scale unspecified and retrieve the discovered scale parameter later)
    • uses __getattr__ magic to dispatch to constructors in the transformations and measurements modules
      • constructors in combinators have hand-written APIs that do more than can be done automatically
  • Adds an Analysis class to manage interactive compositors/odometers
    • for compositors, automatically distributes the privacy loss over queries according to a given set of weights:
      • Analysis.sequential_composition(data, unit_of(contributions=1), loss_of(epsilon=1.), weights)
    • queries spawned with a free parameter calibrate themselves
      • for example, in my_analysis.query().dp_sum(bounds, scale=None), the query's d_in and d_out are filled in from the current state of my_analysis
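
To make the dot-chaining idea concrete, here is a minimal, self-contained sketch of `__getattr__` dispatch. It is not the code from this PR: `ToyQuery`, `CONSTRUCTORS`, and the two toy constructors are hypothetical stand-ins, and domains and metrics are modelled as plain strings.

```python
from dataclasses import dataclass


# Hypothetical constructors (not the library's): each takes
# (input_domain, input_metric, ...) and returns
# (step_description, output_domain, output_metric).
def make_clamp(input_domain, input_metric, bounds):
    return f"clamp{bounds}", input_domain, input_metric


def make_sum(input_domain, input_metric):
    return "sum", "scalar", "absolute_distance"


CONSTRUCTORS = {"clamp": make_clamp, "sum": make_sum}


@dataclass(frozen=True)
class ToyQuery:
    domain: str
    metric: str
    steps: tuple = ()

    def __getattr__(self, name):
        # Reached only when normal attribute lookup fails, e.g. for `.clamp`.
        constructor = CONSTRUCTORS.get(name)
        if constructor is None:
            raise AttributeError(name)

        def chain(*args, **kwargs):
            # Fill in the domain and metric left by the previous step.
            step, out_domain, out_metric = constructor(
                self.domain, self.metric, *args, **kwargs
            )
            return ToyQuery(out_domain, out_metric, self.steps + (step,))

        return chain


query = ToyQuery("vector<int>", "symmetric_distance").clamp((1, 10)).sum()
print(query.steps, query.domain, query.metric)
```

Each chained call returns a new immutable `ToyQuery`, so a partially built query can be reused as the starting point for several different chains.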

Example:

analysis = Analysis.sequential_composition(
    data=[1, 2, 3],
    privacy_unit=unit_of(contributions=1),
    privacy_loss=loss_of(epsilon=3.0),
    split_evenly_over=3, # distribute 3ε over 3 queries
    domain=dp.vector_domain(dp.atom_domain(T=int)),
)

dp_sum = analysis.query().clamp((1, 10)).sum().laplace()  # scale is not set, so it is calibrated automatically

print(dp_sum.release())
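
The arithmetic behind the example can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `split_budget`, `laplace_epsilon`, and `solve_scale` are hypothetical helpers, the sensitivity of the clamped sum is assumed to be 10 (the largest magnitude a clamped value can take when one record is added or removed), and the Laplace privacy relation is taken to be ε = sensitivity / scale.

```python
def split_budget(total_epsilon, weights):
    """Distribute a total privacy loss over queries in proportion to weights."""
    total_weight = sum(weights)
    return [total_epsilon * w / total_weight for w in weights]


def laplace_epsilon(sensitivity, scale):
    """Privacy loss of the Laplace mechanism (assumed relation)."""
    return sensitivity / scale


def solve_scale(sensitivity, epsilon_budget, lo=1e-9, hi=1e9, iters=200):
    """Bisect for the smallest scale whose privacy loss fits the budget.

    Relies on epsilon decreasing as scale grows; `hi` always satisfies the
    budget and `lo` never does.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        if laplace_epsilon(sensitivity, mid) <= epsilon_budget:
            hi = mid
        else:
            lo = mid
    return hi


# epsilon=3.0 split evenly over 3 queries, as in the example above.
per_query = split_budget(3.0, [1, 1, 1])
# Assumed sensitivity of the clamped sum: 10.
scale = solve_scale(sensitivity=10, epsilon_budget=per_query[0])
print(per_query, scale)
```

A search is used here instead of solving ε = sensitivity / scale algebraically because a free parameter "anywhere in the Query" generally has no closed-form inverse; bisection only needs the privacy loss to be monotone in the parameter.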

Each query requires an explicit call to release(), because:

  • sub-queries spawned by measure casters need eager=False, and having top-level queries behave differently would be confusing
  • running postprocessors, such as sink_csv or sink_parquet, will become more common
  • accuracy estimation helpers on the query, and on the discovered parameter, are easier to access if eager=False
  • implicit measure casting is easier to do
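
A rough sketch of why laziness helps with postprocessors: if nothing runs until release(), extra steps can still be attached to a query after it is built. `ToyLazyQuery`, `then`, and `fake_measurement` are hypothetical names for illustration only, and the measurement is a deterministic stand-in with no noise.

```python
class ToyLazyQuery:
    """Hypothetical lazy query: nothing runs until release() is called."""

    def __init__(self, measurement, postprocessors=()):
        self._measurement = measurement
        self._postprocessors = tuple(postprocessors)

    def then(self, postprocessor):
        # Attaching a postprocessor builds a new query and runs nothing.
        return ToyLazyQuery(
            self._measurement, self._postprocessors + (postprocessor,)
        )

    def release(self):
        # Only here is the measurement invoked (and budget would be spent).
        value = self._measurement()
        for postprocess in self._postprocessors:
            value = postprocess(value)
        return value


calls = []


def fake_measurement():
    # Deterministic stand-in for a private release; records each invocation.
    calls.append("ran")
    return 42


query = ToyLazyQuery(fake_measurement).then(lambda v: {"dp_sum": v})
calls_before_release = len(calls)  # still 0: building the query ran nothing
result = query.release()
print(calls_before_release, result)
```

With an eager design, the measurement would already have run by the time `then` is called, so the postprocessor could not be folded into the same release.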

TODO: Consider renaming to Session.


@Shoeboxam changed the title from "Analysis api" to "Context API" on Aug 3, 2023

@andrewvyrros (Member) left a comment:

This is such a cool way to leverage our core APIs! Will be interesting to see what users think of this approach.

@Shoeboxam merged commit 216dd00 into main on Aug 4, 2023
5 checks passed
@Shoeboxam deleted the 625-hlapi branch on August 4, 2023 05:00

Successfully merging this pull request may close these issues.

High-level API Layer