Warning while training model with DDP #177
I think this error was occurring because I was not putting the tensors on the GPU in the image and label pipelines; instead, I was moving them to the GPU in the training and validation loops.
Resolved: the PyTorch dataset class should be given an array as input, and I was giving it a list for the labels.
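Here is a minimal sketch of the fix described above. It assumes a map-style dataset whose items are (image, label) pairs. The labels are converted once to a float32 NumPy array of shape (N, 5), so each item returns an `ndarray` instead of a Python list. The class name `MultiLabelDataset` and its arguments are hypothetical and are not taken from the poster's code.

```python
import numpy as np


class MultiLabelDataset:
    """Hypothetical map-style dataset used as the source for a .beton file.

    Each item is (image, label), where label is a float32 NumPy array of
    shape (num_labels,) rather than a plain Python list.
    """

    def __init__(self, images, labels, num_labels=5):
        # Convert the whole label table once: shape (N, num_labels), float32.
        self.labels = np.asarray(labels, dtype=np.float32).reshape(-1, num_labels)
        self.images = images
        if len(self.images) != len(self.labels):
            raise ValueError("images and labels must have the same length")

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # Return the label row as an ndarray view, not a list.
        return self.images[idx], self.labels[idx]
```

When writing the .beton file with FFCV's `DatasetWriter`, a label like this would typically be declared as an `NDArrayField` with matching shape and dtype, e.g. `NDArrayField(shape=(5,), dtype=np.dtype('float32'))`. Treat that field choice as an assumption, because the thread doesn't show the writer code.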
Hi @AmmaraRazzaq
Even after successfully moving the tensors to the GPU, the warning still persists,
Finally figured it out. This warning occurs because I have found that the contiguous memory format is much faster; the channels_last memory format makes training slower than with the PyTorch data loader.
@AmmaraRazzaq What GPU are you using? Newer GPUs should be at least 10% faster with channels_last.
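To compare the two memory formats fairly, the model's weights and every input batch should use the same format. The sketch below switches a model between `torch.channels_last` and `torch.contiguous_format` with `nn.Module.to(memory_format=...)`, then converts NCHW batches with `Tensor.contiguous(memory_format=...)`. The helper names `apply_memory_format` and `prepare_batch` are hypothetical. On the FFCV side, the image layout is controlled by the `channels_last` argument of `ffcv.transforms.ToTorchImage`. As far as I know, that argument defaults to `True`, but check it against your installed FFCV version.

```python
import torch
import torch.nn as nn


def apply_memory_format(model: nn.Module, channels_last: bool) -> torch.memory_format:
    """Convert the model's 4-D parameters/buffers to one memory format.

    Returns the chosen format so callers can apply it to every input batch,
    keeping model weights and data in the same layout.
    """
    fmt = torch.channels_last if channels_last else torch.contiguous_format
    model.to(memory_format=fmt)  # Module.to modifies the module in place
    return fmt


def prepare_batch(images: torch.Tensor, fmt: torch.memory_format) -> torch.Tensor:
    # images is NCHW; only the stride order changes, the shape stays the same.
    return images.contiguous(memory_format=fmt)
```

Because the shapes remain NCHW, the rest of the training loop does not change. Timing the same number of iterations with each setting on the same GPU is a simple way to see which format is faster for a given model.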
Hi @GuillaumeLeclerc, I am using a Tesla V100-SXM2-32GB.
I have a V100 handy. Do you mind sharing a sample of your code that is faster with
Hi @GuillaumeLeclerc, thank you for offering to help.
Sorry for the delay. Can you give me the exact parameters you are using (and which dataset)? Thank you!
Hi @GuillaumeLeclerc, I can't share many details, as this is a research project that is still in the development phase and has not been made open source yet.
There are many very important factors, including:
Hi @GuillaumeLeclerc, apologies for the late reply. I am sharing the dataset files and sample code. I am working with the CheXpert dataset, and the .beton file for all the images is 165GB, so I have created a .beton file with 1,000 images (~1.5GB). Images are resized to 512x512, normalized to the range [-1, 1], and written to the .beton file in 'raw' format. It's a multilabel classification problem with 5 labels per image. Dataset files: https://github.com/AmmaraRazzaq/image_classification/tree/master/betonfiles
I am using the resnet101 architecture with lr=2e-3, bs=24, gpus=4 (DDP training), the SGD optimizer with weight_decay=0 and momentum=0.9, and num_workers=6 in the dataloader.
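Here is a sketch of the training setup described above. It rests on a few assumptions. First, pixels arrive as uint8 values in [0, 255]; the thread doesn't say whether normalization happens before or after writing the .beton file. Second, `bs=24` is the per-GPU batch size. Third, the multilabel loss is `BCEWithLogitsLoss`, which the poster didn't specify. The function names are hypothetical.

```python
import torch
import torch.nn as nn


def normalize_to_unit_range(images_uint8: torch.Tensor) -> torch.Tensor:
    # Map uint8 pixels in [0, 255] to floats in [-1, 1].
    return images_uint8.float() / 127.5 - 1.0


def build_training_setup(model: nn.Module, lr: float = 2e-3):
    # SGD with the hyperparameters listed in the comment: momentum 0.9, no weight decay.
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=0.0)
    # Assumed multilabel loss: an independent sigmoid + BCE term for each of the 5 labels.
    criterion = nn.BCEWithLogitsLoss()
    return optimizer, criterion


def global_batch_size(per_gpu_batch: int = 24, num_gpus: int = 4) -> int:
    # Under DDP each process loads its own batch, so the effective batch is the product
    # (assuming bs=24 is per GPU, not already the global batch).
    return per_gpu_batch * num_gpus
```

For the actual architecture, `torchvision.models.resnet101(num_classes=5)` would give a ResNet-101 with a 5-way output head, which could be passed to `build_training_setup`.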
Hi
I am getting the following warning when training the model with the FFCV dataloader + DDP.
The same code works fine with the PyTorch dataloader + DDP.