
Agenda Request - (CPA?) Composable privacy-preserving architecture #65

Closed
rmirisola opened this issue Jun 23, 2022 · 5 comments

Agenda+: (CPA?) Composable privacy-preserving architecture to support more flexible x-platform use cases

We'd like to propose that this group consider a scalable and practical approach to decentralized privacy-preserving architecture, one that supports composing methods across device platforms and data collection channels for a variety of use cases.

Moreover, this approach would be compatible with existing privacy-centric industry initiatives (like WFA CMM r/f) and with proposals previously raised in this forum, such as IPA.

This proposal assumes a few things:

  1. A device platform is willing to allow encrypted data to flow from a device to a (set of) trusted privacy-preserving processing networks, which guarantee that privacy requirements are met before releasing privacy-preserving results for a particular use case.

  2. We want to address use cases that combine data from multiple device platforms and channels (i.e., x-media and x-platform measurement), as these carry a lot of value for advertisers. This is a requirement of the WFA CMM r/f system, demanded by advertisers.

  3. We want to do (1) and (2) while maximizing utility for the use cases. This generally means introducing (or having the ability to introduce) privacy-preserving mechanisms as late as possible in the data processing graph; the sketch below illustrates why this helps.
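
As a toy illustration of assumption 3 (an editorial sketch with hypothetical numbers, not part of the proposal, and deliberately glossing over the different trust models of local vs. central noising): adding Laplace noise to each device's report before aggregation inflates the standard deviation of a sum of k reports by a factor of sqrt(k) compared to noising the final aggregate once inside a trusted network.

import numpy as np

rng = np.random.default_rng(0)
k, scale, trials = 100, 1.0, 10_000
reports = rng.integers(0, 2, size=k).astype(float)  # toy per-device bits
true_sum = reports.sum()

# Early: every device noises its own report, then the reports are summed.
early = (reports + rng.laplace(0.0, scale, size=(trials, k))).sum(axis=1)

# Late: the trusted network noises the aggregate once, just before release.
late = true_sum + rng.laplace(0.0, scale, size=trials)

print(f"early-noising std: {early.std():.2f}")  # ~ sqrt(2 * k) * scale, about 14.1
print(f"late-noising  std: {late.std():.2f}")   # ~ sqrt(2) * scale, about 1.41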

We propose that a result is privacy-preserving if it (roughly) conforms to one of the following:

  • It meets a privacy-preserving definition that is TBD (e.g., differentially private under a certain budget policy). Note that who defines these criteria is out of scope, but we argue the architecture should be flexible enough to support many variations of this setup.

  • Results are encrypted and can only be processed by further trusted privacy-preserving processing networks.

This allows compatibility and composition of secure/private operations, which would enable support for more advanced and comprehensive advertising use cases.

This means the privacy principles applied to data that leaves the device are equivalent to those applied to data that leaves any further processing service, which in our opinion also allows for a more scalable and consistent architecture.
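
To make the release rule above concrete, here is a minimal sketch (all names hypothetical, not part of the proposal) of how a processing network might decide between the two outcomes: release a differentially private result while a budget policy admits it, or keep the result encrypted for the next trusted network.

import random
from dataclasses import dataclass

def add_laplace_noise(value: float, epsilon: float, sensitivity: float = 1.0) -> float:
    # Laplace mechanism: the difference of two exponentials with rate
    # epsilon / sensitivity is Laplace-distributed with scale sensitivity / epsilon.
    rate = epsilon / sensitivity
    return value + random.expovariate(rate) - random.expovariate(rate)

@dataclass
class BudgetPolicy:
    total_epsilon: float
    spent_epsilon: float = 0.0

    def admit(self, query_epsilon: float) -> bool:
        # Admit the query only if it fits within the remaining budget.
        if self.spent_epsilon + query_epsilon > self.total_epsilon:
            return False
        self.spent_epsilon += query_epsilon
        return True

def release(result, query_epsilon, policy, encrypt_for_next_network):
    if policy.admit(query_epsilon):
        return add_laplace_noise(result, query_epsilon)  # leaves in the clear
    return encrypt_for_next_network(result)              # stays encrypted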

We can talk about the general architectural proposal and a specific example application, x-media r/f measurement via the WFA CMM framework (including an overview of said framework).

Links

Slides and supporting material are forthcoming.

rmirisola added the agenda+ label (Request to add this issue to the agenda of our next telcon or F2F) on Jun 23, 2022
AramZS (Contributor) commented Jul 19, 2022

@rmirisola This sounds like a very interesting topic. How long were you thinking you needed, and are you prepared to present at the upcoming meeting time?

sashidhar-jakkamsetti commented Jul 20, 2022

@rmirisola Thanks for proposing this architecture. This is interesting and certainly looks like a scalable and long-term approach for handling complex advertising scenarios. I’ve been working with @eriktaubeneck on a related problem for IPA, and wanted to share some ideas that might be helpful.

With respect to the first assumption, we anticipate that it will be critical for user-agent vendors to be able to specify the set of purposes (i.e., use cases) for a specific piece of encrypted data.

We’ve been exploring an architecture that makes all the data (whether coming from a user device or a private computation network) purpose-constrained. In particular, the data must always remain encrypted, with attached annotations giving context on which use cases are allowed to decrypt and use it. At the origin of this data, i.e., at the user device, the browser or mobile OS encrypts the user-generated data along with the annotations using AEAD encryption, keeping the annotations public. (This could also happen on the server side.)

Later, when this data is input into a private computation network for a given use case, the network first verifies whether the annotations allow using the data for that use case. If they do, the network decrypts and computes; otherwise, it drops the data. After computation, the network encrypts the resulting data (which can be either the final result or some intermediate result) and further annotates it with the use cases that can reuse it. For a clearer description, consider the following interface:

def private_computation(raw_input_rows, current_usecase, key_current_usecase,
                        next_usecase, key_next_usecase):
    # Keep only the rows whose public annotations allow this use case.
    input_rows = [row for row in raw_input_rows
                  if current_usecase in row.annotation.allowed_usecases]

    decrypted_rows = decryption(input_rows, key_current_usecase)

    # Privately compute the use case; in addition, add DP noise at the end
    # if this is the final result.
    output_rows = compute_usecase(decrypted_rows, current_usecase)

    # Annotate the output with the use cases allowed to consume it next,
    # then re-encrypt it under the next use case's key.
    annotated_output_rows = annotate(output_rows, next_usecase)
    encrypted_output_rows = encryption(annotated_output_rows, key_next_usecase)
    # We could also generate a new key and encrypt with that; in that case,
    # the new key has to be communicated to the next network.

    return encrypted_output_rows
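
As a concrete (editorial, hypothetical) instantiation of the device-side step described above, the annotations can travel as the associated data of an AEAD such as AES-GCM: they remain public and readable by every network on the path, yet any tampering with them makes decryption fail. A minimal sketch, assuming the Python `cryptography` package and illustrative field names:

import json
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_with_annotations(payload: bytes, allowed_usecases: list, key: bytes):
    # Annotations are serialized and passed as associated data: authenticated
    # by the AEAD tag but left in the clear for routing and policy checks.
    annotations = json.dumps({"allowed_usecases": allowed_usecases}).encode()
    nonce = os.urandom(12)
    ciphertext = AESGCM(key).encrypt(nonce, payload, annotations)
    return nonce, ciphertext, annotations

key = AESGCM.generate_key(bit_length=128)
nonce, ct, ann = encrypt_with_annotations(b"user event", ["reach_frequency"], key)
# Decryption raises InvalidTag if either the ciphertext or the public
# annotations were modified in transit.
payload = AESGCM(key).decrypt(nonce, ct, ann)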

We consider two scenarios in such an architecture: (1) we trust the private computation nodes to adhere to the annotations, or (2) we pick specific keys for each use case and cryptographically tie the data so it is decryptable only by the allowed downstream use cases (as shown in the pseudocode above). The former is simpler and suits MPC-style private computation, where trust is distributed across t-out-of-n MPC nodes. The latter is useful for TEE-style computation, where we can attest a specific TEE instance and then delegate the keys to it; however, in that case we need to assume a (set of) coordinator(s) who store and release the keys to verified TEE instances.
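
One way to realize scenario (2), offered as an assumption rather than something specified above, is to derive each use case's key from a master secret with HKDF, so a coordinator that attests a TEE for use case X can hand out exactly key_X and nothing else. A sketch using the Python `cryptography` package:

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def key_for_usecase(master_secret: bytes, usecase: str) -> bytes:
    # The use-case name goes into HKDF's `info` input, binding the derived
    # key to that single use case; a fresh HKDF object is needed per call.
    return HKDF(
        algorithm=hashes.SHA256(),
        length=32,
        salt=None,
        info=usecase.encode(),
    ).derive(master_secret)

# Data annotated for "reach_frequency" is encrypted under that use case's key,
# so a TEE attested only for some other use case cannot decrypt it.
master = b"\x00" * 32  # placeholder; a real deployment uses a random secret
rf_key = key_for_usecase(master, "reach_frequency")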

With this approach, we keep the data, whether input or output, secure and consistent across the networks irrespective of which use case they solve, thus preserving end-to-end privacy goals.

rmirisola (Author) commented

@sashidhar-jakkamsetti Very interesting proposal! It sounds like a good additional feature to make the architecture more robust. I would love to understand a bit more about how to generalize the use-case allowance check, in particular for various kinds of MPC protocols.

@AramZS I think I'll need at least 1 hour, and yes, I can be prepared for the meeting. If we want to go deeper on the WFA stack, I'll probably need more time.

AramZS (Contributor) commented Jul 25, 2022

@rmirisola OK, I see from the emoji reactions that there's general support for this topic, so I'll assign 1.5 hours to give you time for questions.

AramZS (Contributor) commented Aug 9, 2022

@rmirisola Please send us the slides or add them as a PR. Thanks!

AramZS removed the agenda+ label (Request to add this issue to the agenda of our next telcon or F2F) on Aug 29, 2022
AramZS closed this as completed on Sep 25, 2023