
Re-producing issue #5

Closed · youngwanLEE opened this issue May 17, 2021 · 12 comments

@youngwanLEE commented May 17, 2021

Hi,

To check reproducibility, I trained the coat_lite_mini model (reported 79.1/94.5) and got 78.85/94.42 using this command:

bash scripts/train.sh coat_lite_mini coat_lite_mini

with the default settings, i.e., a batch size of 256 on 8 GPUs (TITAN RTX).

Is such a small difference (79.1 vs. 78.9) negligible?

My environment:


sys.platform linux
Python 3.7.9 (default, Aug 31 2020, 12:42:55) [GCC 7.3.0]
numpy 1.19.2
Compiler GCC 7.5
CUDA compiler CUDA 10.1
detectron2 arch flags 7.5
DETECTRON2_ENV_MODULE
PyTorch 1.7.0
PyTorch debug build True
GPU available True
GPU 0,1,2,3,4,5,6,7 TITAN RTX (arch=7.5)
CUDA_HOME /usr/local/cuda-10.1
Pillow 8.0.1
torchvision 0.8.0
torchvision arch flags 3.5, 5.0, 6.0, 7.0, 7.5
fvcore 0.1.2.post20201218
cv2 Not found


PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 10.2
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.5
  • Magma 2.5.2
  • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
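
For reference, this environment dump appears to be the output of detectron2's environment collector, which can be regenerated with:

```
python -m detectron2.utils.collect_env
```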
@xwjabc (Contributor) commented May 17, 2021

Hi @youngwanLEE, thank you for running this reproducibility check! I think a 0.1~0.2% gap is acceptable (our reported result is actually rounded from 79.090%), since we have seen similar variation in some of our own trials. So your results look reasonable.

@youngwanLEE (Author):

@xwjabc Thanks for your quick reply :)

@xwjabc (Contributor) commented May 17, 2021

> @xwjabc Thanks for your quick reply :)

@youngwanLEE Besides, we sometimes found that using 16 GPUs instead of 8 (with all other settings the same) can slightly improve performance (around 0.1%), but we have not validated this extensively. You may give it a try :)
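
For anyone who wants to try the 16-GPU setting across two 8-GPU nodes: assuming a DeiT-style distributed entry point (not confirmed against the repo), a launch might look like the sketch below. The main.py name and --model flag are assumptions; the torch.distributed.launch flags themselves are standard PyTorch 1.7.

```
# Hypothetical two-node, 16-GPU launch; run once per node with the
# appropriate --node_rank. The entry point (main.py) and --model flag
# are assumed; torch.distributed.launch and its flags are standard.
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=2 \
    --node_rank=0 --master_addr=<master-ip> --master_port=29500 \
    --use_env main.py --model coat_lite_mini
```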

@youngwanLEE (Author):

@xwjabc,
By the way, did you decrease the batch size from 256 to 128?
When I tried to train the coat_mini model with the default settings (batch size of 256) on an 8-GPU machine with V100s (32 GB), I ran into out-of-memory errors, so I had no choice but to reduce the batch size from 256 to 128.

@xwjabc (Contributor) commented May 17, 2021

@youngwanLEE Yes, we also reduce the per-GPU batch size and use more GPUs for the coat_mini model (since we use 24 GB GPUs such as TITAN RTX or RTX 3090 for training).
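
For anyone hitting the same out-of-memory error, a reduced-batch run might look like the sketch below, assuming scripts/train.sh forwards extra arguments to the underlying trainer and that a DeiT-style --batch-size flag exists (both are assumptions, not confirmed against the repo):

```
# Hypothetical: halve the per-GPU batch size. The flag name and argument
# forwarding are assumed; check scripts/train.sh for the real interface.
bash scripts/train.sh coat_mini coat_mini --batch-size 128
```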

@youngwanLEE (Author):

@xwjabc Hi,
I have another question.

How do you compute the computational cost (i.e., FLOPs)?

@xwjabc (Contributor) commented May 20, 2021

> @xwjabc Hi,
> I have another question.
>
> How do you compute the computational cost (i.e., FLOPs)?

Hi @youngwanLEE, for the arXiv paper we used a modified version of the FLOPs calculation script from mmcv, following PVT (the calculation for the attention part is modified accordingly). We will add the script to the repo soon!
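
To illustrate the kind of correction involved (a generic sketch, not the authors' modified script): module-hook counters such as mmcv's only see standard layers, so the two matrix products inside softmax attention (Q·Kᵀ and attn·V) have to be added by hand. A minimal sketch for standard, non-factorized attention over n tokens with embedding dimension d:

```python
# Generic sketch of per-block attention FLOPs; NOT the authors' modified
# mmcv script. Counts multiply-accumulates for standard softmax attention
# over n tokens with embedding dim d (the head count cancels out).
def softmax_attention_flops(n: int, d: int) -> int:
    flops = 3 * n * d * d   # Q, K, V linear projections
    flops += n * n * d      # attention scores: Q @ K^T (summed over heads)
    flops += n * n * d      # weighted sum: attn @ V
    flops += n * d * d      # output projection
    return flops

# Illustrative numbers only, not taken from a CoaT config:
# 196 tokens (a 14x14 feature map) with embedding dim 216.
print(f"{softmax_attention_flops(196, 216) / 1e6:.1f} MFLOPs per block")
```

CoaT's factorized attention is designed to scale linearly in the token count, which is why the attention part in particular needs a custom count rather than the stock one.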

@youngwanLEE (Author):

@xwjabc oh, good news!!
Thanks :)

@youngwanLEE (Author) commented May 24, 2021

@xwjabc Hi,

I want to share my reproduced result for CoaT-Mini: 81.494 / 95.568, which is higher than your reported number :)

Cool!!

My environment:

pytorch: 1.7
torchvision: 0.8.1
GPUs: RTX8000 x 8
batch-size-per-gpu: 256

Training time: 6 days, 15:51:55

@xwjabc (Contributor) commented May 24, 2021

@youngwanLEE Cool! Currently we are still exploring ways to improve the efficiency of CoaT models. We hope that we can obtain faster and better models in the end.

@youngwanLEE (Author):

> @xwjabc Hi,
> I have another question.
> How do you compute the computational cost (i.e., FLOPs)?
>
> Hi @youngwanLEE, for the arXiv paper we used a modified version of the FLOPs calculation script from mmcv, following PVT (the calculation for the attention part is modified accordingly). We will add the script to the repo soon!

Hi @xwjabc,

I'd like to know when the FLOPs calculation script will be released.

Thanks in advance :)

@xwjabc (Contributor) commented Jun 18, 2021

@youngwanLEE Sorry for the late reply! We were busy preparing the paper rebuttal. We will release the FLOPs calculation script soon, along with some larger models (CoaT Small, ~20M params, and CoaT-Lite Medium, ~40M params). Thanks!
