Using dataloader with fixed batch size #7
Hi, thanks for the interest and for raising a good point!

Yes — strictly speaking, there's a mismatch between the batch selection rule used in our code (fixed batch size and non-overlapping batches within one epoch) and the privacy accounting procedure. To actually attain the prescribed privacy guarantee, one should use Poisson sampling under the current privacy accounting procedure.

However, Poisson sampling yields batches of non-uniform sizes, some of which could be large enough to cause memory issues. Gradient accumulation (e.g., the example here) partially addresses this problem but not entirely: even with a smaller sampling rate for the micro-batches, there's still a non-zero probability of picking extremely large micro-batches.

Past works have therefore used the usual batch selection rule (as for non-private learning) as a proxy for the true Poisson-sampled performance; see Appendix D.4 of this paper, which also shows that the difference in performance is minor. We follow this convention here.
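To make the batch-size variability concrete, here is a toy simulation of Poisson sampling (the dataset size and sampling rate are made-up numbers, and the snippet stands in for no particular library's sampler):

```python
import random

rng = random.Random(0)
n = 10_000          # dataset size (made up for illustration)
sample_rate = 0.01  # expected batch size = n * sample_rate = 100

# Poisson sampling: each example is included in the batch independently
# with probability `sample_rate`, so the batch size varies per step.
batch_sizes = []
for _ in range(5):
    batch = [i for i in range(n) if rng.random() < sample_rate]
    batch_sizes.append(len(batch))

print(batch_sizes)  # sizes fluctuate around 100, never exactly fixed
```

The tail of the batch-size distribution is exactly the OOM risk discussed above: there is a small but non-zero probability of drawing a batch far larger than the expected size.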
One should still recall that there's a non-zero probability of selecting huge micro-batches, which would cause OOM issues (with the approach given in the Opacus example). The alternative option here would be to manually break each Poisson-sampled batch into micro-batches of fixed size.

I've actually been working on refining this part of the codebase, and there is an alternative solution: one could still use fixed-size batches, but make each an independent and uniform sample over all possible batches. Note that the usual loop-over-batches-across-the-dataset approach doesn't satisfy this, since two consecutive batches within one epoch can't be independent due to sampling without replacement. The privacy accounting procedure, however, needs to be slightly modified (see this for code, and Theorem 27 in this paper for the theory).

Hope this addresses your concerns and helps with whatever you're working on!

Chen
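A minimal sketch of that alternative batch selection rule, using a plain index-level sampler rather than any particular library's API (the dataset size and batch size are illustrative):

```python
import random

rng = random.Random(0)
n, batch_size = 10_000, 100  # illustrative dataset and batch sizes

def sample_fixed_batch():
    # Draw a fixed-size batch uniformly over all size-`batch_size`
    # subsets of the dataset, independently at every step. Unlike an
    # epoch loop, consecutive batches are allowed to overlap, which is
    # what makes the draws independent.
    return rng.sample(range(n), batch_size)

b1 = sample_fixed_batch()
b2 = sample_fixed_batch()
```

Every batch has exactly `batch_size` distinct examples (so memory usage is fixed), while independence across steps is what the modified accounting procedure relies on.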
Thank you so much for this extensive reply! The OOM issues you've mentioned are exactly what I've encountered when using the Poisson sampler, which is why I had to train with batch size 1 and a huge number of gradient accumulation steps (which is obviously much slower). I wasn't aware of that Appendix D.4 in particular, so this is going to save me a lot of time in future experiments, and it definitely cleared up my confusion around this topic.

Phillip
Thanks for the question, and I'm glad that it helped! I'm closing this issue for now, but feel free to re-open if there are other questions. The refinement I mentioned (using fixed-size batches with the alternative accounting procedure) will be checked in soon.
Sorry for having to reopen this, but I do have two more (perhaps related) questions after all, and would really appreciate it if you could help clarify them.
Hi, thanks for providing this codebase!

For a while I've been using Opacus to experiment with DP-SGD and RoBERTa, but I wanted to check out your `PrivacyEngine`, mainly because of the training speed and memory optimizations. With Opacus, I always trained with their `UniformWithReplacementSampler` for accurate RDP accounting, and as far as I can tell, you're training with fixed-size batches in your examples. I'm wondering if there's a reason the `UniformWithReplacementSampler` isn't needed in your codebase anymore, and whether the uniform sampler is even compatible with your modified `PrivacyEngine`, given that the optimizer would need to handle variations in batch size?