
Add context manager to toggle on/off privacy in training loop #634

Open
bnestor opened this issue Feb 26, 2024 · 0 comments
Labels
enhancement New feature or request

bnestor commented Feb 26, 2024

🚀 Feature

Context manager to optionally disable privacy for mixtures of public and private data.

Motivation

Similar to how torch.no_grad() or torch.autocast(enabled=True) work, it would be nice to have a context manager that disables privacy. The main motivation is to train concurrently on public and private data, without the public data eating into the privacy budget.

Pitch

# define your components as usual
import torch
from torch import nn
from torch.optim import SGD
from opacus import PrivacyEngine

model = Net()  # any nn.Module
criterion = nn.CrossEntropyLoss()
optimizer = SGD(model.parameters(), lr=0.05)
data_loader = torch.utils.data.DataLoader(dataset, batch_size=1024)  # your Dataset

# enter PrivacyEngine
privacy_engine = PrivacyEngine()
model, optimizer, data_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    noise_multiplier=1.1,
    max_grad_norm=1.0,
)

inputs, targets = next(iter(data_loader))


# case 1
output = model(inputs)
loss = criterion(output, targets)
loss.backward()  # standard privacy engine with privacy applied

# case 2
with privacy_context(enabled=True):
    output = model(inputs)
    loss = criterion(output, targets)
    loss.backward()  # standard privacy engine with privacy applied

# case 3
with privacy_context(enabled=False):
    output = model(inputs)
    loss = criterion(output, targets)
    loss.backward()  # differential privacy is not applied; the gradient is computed as if the privacy engine had not been initialized
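
One possible shape for this is sketched below: a small contextlib-based wrapper that switches per-sample gradient computation off for the public branch. Unlike the pitch, the sketch takes the wrapped model explicitly. It assumes the module returned by make_private exposes disable_hooks()/enable_hooks() to pause per-sample gradient hooks, and that the wrapped optimizer keeps the underlying optimizer reachable as original_optimizer; this is only an illustration of the requested behaviour, not an existing Opacus feature, and privacy accounting for the public steps would still need to be handled.

from contextlib import contextmanager

@contextmanager
def privacy_context(model, enabled=True):
    # Sketch only: `model` is the wrapped module returned by make_private.
    # With enabled=False, per-sample gradient hooks are switched off so that
    # backward() fills plain .grad tensors, as if Opacus were not attached.
    if enabled:
        yield
        return
    model.disable_hooks()      # assumed hook toggle on the Opacus wrapper
    try:
        yield
    finally:
        model.enable_hooks()   # restore per-sample gradient computation

# usage for the public branch (case 3 above):
with privacy_context(model, enabled=False):
    output = model(inputs)
    loss = criterion(output, targets)
    loss.backward()
optimizer.original_optimizer.step()  # step without clipping/noise (assumed attribute)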

Alternatives

Alternatively, you could keep two copies of each model/optimizer/dataloader and load the state dict whenever switching between them. In this case, only one copy would be initialized through the privacy engine.
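
A rough sketch of that workaround, building on the components from the pitch above; the "_module." key handling reflects an assumption about how the Opacus wrapper prefixes parameter names, and the public batch names are illustrative.

# Two copies: `model`/`optimizer` are the make_private-wrapped versions from the
# pitch above; `public_model`/`public_optimizer` are plain, non-private copies.
public_model = Net()
public_optimizer = SGD(public_model.parameters(), lr=0.05)

def to_public(private_state):
    # assumption: the Opacus wrapper prefixes parameter names with "_module."
    return {k.replace("_module.", "", 1): v for k, v in private_state.items()}

# switch to the public copy: carry over the current weights, train without DP
public_model.load_state_dict(to_public(model.state_dict()))
output = public_model(public_inputs)
loss = criterion(output, public_targets)
loss.backward()
public_optimizer.step()
public_optimizer.zero_grad()

# switch back: copy the updated weights into the private copy
model.load_state_dict(
    {f"_module.{k}": v for k, v in public_model.state_dict().items()}
)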

Additional context

Based on current research showing that public pre-training followed by private fine-tuning improves performance: https://aclanthology.org/2020.privatenlp-1.5.pdf, https://arxiv.org/pdf/2302.09483.pdf

It would be interesting to test whether including public data during fine-tuning also improves performance: https://arxiv.org/abs/2111.12292

HuanyuZhang added the enhancement (New feature or request) label on May 14, 2024