Why is the speed slower than PVTv2-b1? #18
Comments
Hi @jinfagang, PoolFormer here is just a tool to demonstrate MetaFormer. The implementation may not be efficient for industrial use. For example, …
@yuweihao Hi, does that mean I would have to train the whole model again if I change the implementation?
@jinfagang You don't have to. In our experiments, replacing GN with BN and reimplementing the pooling layer as a fixed, predefined depthwise conv gave us about a 30% speed-up, at the cost of roughly a 1% accuracy drop on ImageNet.
@chuong98 Can you share your fixed depthwise conv? How do you set its weights?
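A minimal sketch (my own, in PyTorch, not the authors' code) of what such a fixed depthwise conv could look like: a k×k depthwise conv whose weights are frozen at 1/k² reproduces k×k average pooling exactly, so no retraining is needed.

```python
import torch
import torch.nn as nn

def avgpool_as_depthwise(channels: int, k: int = 3) -> nn.Conv2d:
    """Depthwise conv with fixed uniform weights, equivalent to k x k average pooling."""
    conv = nn.Conv2d(channels, channels, kernel_size=k, stride=1,
                     padding=k // 2, groups=channels, bias=False)
    with torch.no_grad():
        conv.weight.fill_(1.0 / (k * k))  # uniform averaging kernel
    conv.weight.requires_grad = False     # weights stay fixed, nothing to train
    return conv

# Sanity check against AvgPool2d (count_include_pad=True matches zero-padded conv)
x = torch.randn(1, 8, 16, 16)
pool = nn.AvgPool2d(kernel_size=3, stride=1, padding=1, count_include_pad=True)
dw = avgpool_as_depthwise(8)
assert torch.allclose(pool(x), dw(x), atol=1e-6)
```

Note that PoolFormer's actual `Pooling` module uses `count_include_pad=False` and subtracts the input from the pooled result, so a drop-in replacement would need to account for those details as well.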
Hi @jinfagang, @chuong98, I just found that CUDA strongly prefers the NHWC layout over NCHW [1]. However, PyTorch uses NCHW by default, and PoolFormer follows that default, so switching layouts may speed it up further [2]. [1] https://docs.nvidia.com/deeplearning/performance/dl-performance-convolutional/index.html#tensor-layout
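In PyTorch, NHWC is exposed as the `channels_last` memory format; a small sketch (my own example, not from the thread) of opting into it:

```python
import torch
import torch.nn as nn

# Convert both the module's weights and the input tensor to channels_last (NHWC).
# On recent GPUs with cuDNN this often speeds up convolutions; the logical
# tensor shape stays NCHW, only the memory layout changes.
model = nn.Conv2d(64, 64, kernel_size=3, padding=1)
model = model.to(memory_format=torch.channels_last)

x = torch.randn(8, 64, 32, 32).contiguous(memory_format=torch.channels_last)
y = model(x)

# Convolutions propagate the memory format to their output.
assert y.is_contiguous(memory_format=torch.channels_last)
```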
Recently I trained a transformer-based instance segmentation model and tested it with different backbones. Here are the results and speed measurements:
"batchsize" is the training batch size. Why is PoolFormer the slowest one? Is that normal?
It is slower than PVTv2-b1 and its precision is lower too...