
Increasing batch size #82

Open
knightron0 opened this issue Feb 6, 2024 · 1 comment

@knightron0

I'm trying to run pretraining with ResNet-50 on my own data and am running into out-of-memory issues.

Initially, I was using two V100s (32 GB each), and the maximum batch size I could reach was 256. However, I can't go any higher even with larger-memory GPUs: I tried A100s with both 40 GB and 80 GB, and the maximum batch size I could use without running out of memory was still 256.

I'm a bit confused and was wondering if there's a gap in my understanding; let me know if I'm missing anything!
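
For reference, one way to push the per-GPU batch size higher on the same hardware is mixed-precision training, which roughly halves activation memory. A minimal PyTorch sketch, not taken from this repo; the `model`, `loader`, and `criterion` names are placeholders:

```python
import torch
from torch.cuda.amp import autocast, GradScaler

def train_one_epoch(model, loader, optimizer, criterion, device="cuda"):
    # GradScaler rescales the loss so fp16 gradients don't underflow.
    # (In a real run you would create one scaler for the whole training job.)
    scaler = GradScaler()
    model.train()
    for images, targets in loader:
        images, targets = images.to(device), targets.to(device)
        optimizer.zero_grad()
        # autocast runs the forward pass in mixed fp16/fp32 precision,
        # cutting activation memory and allowing a larger batch size.
        with autocast():
            loss = criterion(model(images), targets)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
```

Gradient accumulation is another option if the goal is only a larger effective batch rather than a larger per-GPU batch.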

@keyu-tian (Owner)

Hi @knightron0, if a batch size of 256 maxes out a 32 GB V100, then a 40 GB A100 hitting a similar limit is expected.
FYI: we used 32 × 80 GB A100s for ResNet-50 pretraining with a per-GPU batch size of 128, and that was OK.
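
For context, under distributed data parallelism the effective (global) batch size is the per-GPU batch size times the number of GPUs, so the setup described above trains with a global batch of 128 × 32 = 4096. A small illustrative snippet, assuming PyTorch DDP and using the numbers quoted in this thread:

```python
import torch.distributed as dist

# Under DistributedDataParallel, each process (one per GPU) sees its own
# micro-batch and gradients are averaged across processes, so:
#   global_batch = per_gpu_batch * world_size
per_gpu_batch = 128                       # batch size given to each DataLoader
world_size = 32                           # e.g. dist.get_world_size() after init
global_batch = per_gpu_batch * world_size
print(global_batch)                       # 4096 for the setup quoted above
```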
