Bottleneck Transformers for Visual Recognition

BoTNet is a conceptually simple yet powerful backbone architecture that incorporates self-attention for multiple computer vision tasks, including image classification, object detection, and instance segmentation. By replacing the spatial convolutions with global self-attention in the final three bottleneck blocks of a ResNet, and making no other changes, the approach significantly improves on the baselines for instance segmentation and object detection while also reducing the parameter count, with minimal overhead in latency.
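To make the parameter reduction concrete, the sketch below compares the weight count of one 3x3 spatial convolution with that of a global self-attention layer of the same width. The numbers are illustrative assumptions, not values taken from this repository: a bottleneck width of 512 channels, a 14x14 feature map, bias terms ignored, and relative-position embeddings factorised into one vector per row and one per column. The helper names `conv3x3_params` and `mhsa_params` are hypothetical.

```python
def conv3x3_params(channels: int) -> int:
    """Weights of a bias-free 3x3 convolution with equal input and output channels."""
    return 3 * 3 * channels * channels


def mhsa_params(channels: int, height: int, width: int) -> int:
    """Weights of a bias-free self-attention layer (assumed parameterisation).

    Three 1x1 projections (query, key, value) plus one relative-position
    vector per feature-map row and one per column.
    """
    projections = 3 * channels * channels
    relative_position = channels * (height + width)
    return projections + relative_position


# Assumed sizes: 512 channels on a 14x14 feature map.
conv = conv3x3_params(512)
attn = mhsa_params(512, 14, 14)

print(f"3x3 convolution: {conv:,} weights")         # 2,359,296
print(f"self-attention:  {attn:,} weights")         # 800,768
print(f"saved per block: {conv - attn:,} weights")  # 1,558,528
```

Under these assumptions the attention layer needs roughly a third of the weights of the convolution it replaces, and the saving applies once for each of the three replaced blocks.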

Architecture
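The block structure described above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the code in `model.py`: the class names `MHSA` and `BoTBlock`, the default of 4 heads, the stride of 1, and the fixed feature-map size passed to the constructor are all choices made for this example.

```python
import torch
import torch.nn as nn


class MHSA(nn.Module):
    """Global multi-head self-attention over a 2D feature map (illustrative sketch)."""

    def __init__(self, channels: int, height: int, width: int, heads: int = 4):
        super().__init__()
        assert channels % heads == 0, "channels must divide evenly across heads"
        self.heads = heads
        # 1x1 convolutions act as per-pixel linear projections.
        self.query = nn.Conv2d(channels, channels, kernel_size=1)
        self.key = nn.Conv2d(channels, channels, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        # Learned relative-position terms, factorised into rows and columns.
        d = channels // heads
        self.rel_h = nn.Parameter(torch.randn(1, heads, d, height, 1))
        self.rel_w = nn.Parameter(torch.randn(1, heads, d, 1, width))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        d, n = c // self.heads, h * w
        q = self.query(x).view(b, self.heads, d, n)
        k = self.key(x).view(b, self.heads, d, n)
        v = self.value(x).view(b, self.heads, d, n)
        # Content logits: every pixel's query against every pixel's key.
        content = torch.matmul(q.transpose(-2, -1), k)      # (b, heads, n, n)
        # Position logits: every pixel's query against every position embedding.
        pos = (self.rel_h + self.rel_w).view(1, self.heads, d, n)
        position = torch.matmul(q.transpose(-2, -1), pos)   # (b, heads, n, n)
        attn = torch.softmax(content + position, dim=-1)
        out = torch.matmul(v, attn.transpose(-2, -1))       # (b, heads, d, n)
        return out.reshape(b, c, h, w)


class BoTBlock(nn.Module):
    """ResNet bottleneck with the 3x3 convolution swapped for MHSA (stride 1 only)."""

    def __init__(self, in_channels: int, mid_channels: int,
                 height: int, width: int, heads: int = 4):
        super().__init__()
        out_channels = mid_channels * 4  # standard ResNet bottleneck expansion
        self.reduce = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(mid_channels), nn.ReLU(inplace=True))
        self.attend = nn.Sequential(
            MHSA(mid_channels, height, width, heads),
            nn.BatchNorm2d(mid_channels), nn.ReLU(inplace=True))
        self.expand = nn.Sequential(
            nn.Conv2d(mid_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels))
        # Project the shortcut only when the channel counts differ.
        self.shortcut = nn.Identity() if in_channels == out_channels else nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(self.expand(self.attend(self.reduce(x))) + self.shortcut(x))


# Small sizes keep the example quick; a final ResNet-50 stage would be far wider.
block = BoTBlock(in_channels=64, mid_channels=16, height=8, width=8)
x = torch.randn(2, 64, 8, 8)
y = block(x)
print(y.shape)
```

Because the position embeddings are sized at construction time, this sketch only accepts inputs whose spatial size matches the `height` and `width` given to the block.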

Reference

Paper link

