
Conversation

Contributor

@YosuaMichael commented Apr 14, 2022

resolve #4730

NOTE:

  • For detection and optical_flow, we don't add a warning when the number of processed samples differs from len(dataset), because this case already seems to be handled (let me know if this is actually wrong!). A sketch of the check the other scripts gain is shown after this list.
  • For similarity, the script does not seem to support distributed mode, so we don't add that warning either (following the discussion in "Evaluation code of references is slightly off" #4559 (comment), the warning is only needed in a distributed setting).
  • We have tested classification, detection, optical-flow, segmentation, similarity, and video classification, running the evaluation twice for each. The resulting variance is small enough (<0.02% difference).
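
For reference, this is a minimal sketch of the kind of check added to the evaluation loops. The helper name maybe_warn_on_sample_mismatch and the way the per-process counter is reduced are illustrative assumptions; the actual code in the reference scripts may differ.

```python
import warnings

import torch
import torch.distributed as dist


def maybe_warn_on_sample_mismatch(num_processed_samples, dataset):
    """Warn when the evaluation did not cover exactly len(dataset) samples."""
    # In distributed mode, sum the per-process counters first so the
    # comparison is made against the full dataset.
    if dist.is_available() and dist.is_initialized():
        device = "cuda" if torch.cuda.is_available() else "cpu"
        counter = torch.tensor(num_processed_samples, device=device)
        dist.all_reduce(counter)
        num_processed_samples = int(counter.item())

    if hasattr(dataset, "__len__") and len(dataset) != num_processed_samples:
        warnings.warn(
            f"It looks like the dataset has {len(dataset)} samples, but "
            f"{num_processed_samples} samples were used for the validation, "
            "which might bias the results. Try adjusting the batch size and / or "
            "the world size. Setting the world size to 1 is always a safe bet."
        )
```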

@YosuaMichael changed the title from "[WIP] Reduce variance of evaluation in reference" to "Reduce variance of evaluation in reference" Apr 21, 2022
@YosuaMichael marked this pull request as ready for review April 21, 2022 09:48
Contributor Author

@YosuaMichael commented Apr 21, 2022

[RESOLVED]
Note: the video_classification evaluation triggers a warning when run with 4 GPUs and batch_size=16:

UserWarning: It looks like the dataset has 2167123 samples, but 94932 samples were used for the validation, which might bias the results. Try adjusting the batch size and / or the world size. Setting the world size to 1 is always a safe bet.

I have run the eval 3 times with 3 different combinations of number of GPUs and batch size:

  • 4 GPU, batch_size=16, result: Clip Acc@1 53.500 Clip Acc@5 75.991
  • 1 GPU, batch_size=16, result: Clip Acc@1 53.502 Clip Acc@5 75.991
  • 1 GPU, batch_size=1, result: Clip Acc@1 53.513 Clip Acc@5 75.995

From this, the variance seems relatively small. We also notice that the warning is slightly different when we use 1 GPU (regardless of batch size):

UserWarning: It looks like the dataset has 2167123 samples, but 94930 samples were used for the validation, which might bias the results. Try adjusting the batch size and / or the world size. Setting the world size to 1 is always a safe bet.

Note that the 94932 from before becomes 94930 here. However, both are significantly different from the 2167123 we get from len(dataloader.dataset). I think this implies that len(dataloader.dataset) might be the wrong reference value. Will investigate more on this.

Update:
It seems the cause is that the dataset has a variable number of clips per video: len(dataloader.dataset) returns the total number of clips in the whole dataset, while the UniformClipSampler we use for testing only takes a fixed number of clips per video (the default is 5). Hence our num_processed_samples ends up being roughly number of videos * clips per video rather than the total number of clips. To fix this, I will take the length of the UniformClipSampler instead of the dataset.
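
A minimal sketch of that fix, assuming a Kinetics-style dataset_test with a video_clips attribute (the variable names are illustrative; the actual change in the reference script may differ):

```python
from torchvision.datasets.samplers import UniformClipSampler

clips_per_video = 5  # default used for evaluation in the references
test_sampler = UniformClipSampler(dataset_test.video_clips, clips_per_video)

# Compare the processed-sample counter against the number of clips the sampler
# will actually yield, not against len(dataset_test), which counts every clip
# in every video.
expected_num_samples = len(test_sampler)
```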

Member

@NicolasHug left a comment


Thanks a lot @YosuaMichael! Nice work. I only have a minor comment regarding a potential simplification, but other than that, LGTM. I'll approve now; feel free to merge once it's addressed.

raise ValueError("The device must be cuda if we want to run in distributed mode using torchrun")
device = torch.device(args.device)

if args.use_deterministic_algorithms:
Member

To avoid duplicating code with the args.test_only branch, I would suggest something like:

Suggested change:
- if args.use_deterministic_algorithms:
+ if args.use_deterministic_algorithms or args.test_only:

and to remove these two lines:

torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True

I would also suggest doing that for every reference script, as they seem to follow the same pattern.
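
For context, a sketch of what the proposed consolidation would look like (args comes from the script's argparse; as the follow-up below shows, this turned out to be too strict for --test-only):

```python
if args.use_deterministic_algorithms or args.test_only:
    torch.backends.cudnn.benchmark = False
    torch.use_deterministic_algorithms(True)
else:
    torch.backends.cudnn.benchmark = True
```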

Contributor Author

@YosuaMichael May 3, 2022


Hi @NicolasHug, I have just tried what you suggested and got an error when using --test-only:

  File "/fsx/users/yosuamichael/conda/envs/vision-c113/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: Deterministic behavior was enabled with either `torch.use_deterministic_algorithms(True)` or `at::Context::setDeterministicAlgorithms(true)`, but this operation is not deterministic because it uses CuBLAS and you have CUDA >= 10.2. To enable deterministic behavior in this case, you must set an environment variable before running your PyTorch application: CUBLAS_WORKSPACE_CONFIG=:4096:8 or CUBLAS_WORKSPACE_CONFIG=:16:8. For more information, go to https://docs.nvidia.com/cuda/cublas/index.html#cublasApi_reproducibility

Following the error message, setting the env var CUBLAS_WORKSPACE_CONFIG=:4096:8 makes it work, but I think this would create friction for users.

I think the main issue is that we call torch.use_deterministic_algorithms(True) when args.use_deterministic_algorithms is set, and this is much stricter than the torch.backends.cudnn.deterministic = True we use for args.test_only (see here).

Hence, as of now we can only deduplicate the torch.backends.cudnn.benchmark = False line, and I think having that one line duplicated is still okay for now. What do you think?
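
For clarity, here is a sketch of how the two flags end up being handled separately (argument names follow the reference scripts; the exact final code may differ):

```python
if args.use_deterministic_algorithms:
    torch.backends.cudnn.benchmark = False
    # Strict: fails on ops without a deterministic implementation and, with
    # CUDA >= 10.2, requires CUBLAS_WORKSPACE_CONFIG to be set.
    torch.use_deterministic_algorithms(True)
else:
    torch.backends.cudnn.benchmark = True

# ... later, before evaluation ...
if args.test_only:
    # Looser: only pins cuDNN to deterministic kernels, no CuBLAS env var needed.
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
```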

Member

Ah, fair point. I forgot that one is stricter than the other. I think the way you did it is fine then. Sorry for the noise!

@YosuaMichael merged commit e556640 into pytorch:main May 3, 2022
facebook-github-bot pushed a commit that referenced this pull request May 7, 2022
Summary:
* Change code to reduce variance in eval

* Remove unnecessary new line

* Fix missing import warnings

* Fix the warning on video_classification

* Fix bug to get len of UniformClipSampler

Reviewed By: YosuaMichael

Differential Revision: D36204389

fbshipit-source-id: dfa0dbad60cf2a236f61e3c5d4a459731f07557b