Vision Transformer (ViT)

ref. lucidrains/vit-pytorch
ref. huggingface/pytorch-image-models/blob/main/timm/models/vision_transformer.py

A. Vaswani+, Attention Is All You Need, NeurIPS 2017
A. Dosovitskiy+, An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ICLR 2021