julia_class_project

Implementing a Vision Transformer (ViT) in Julia!

Until recently, the best-performing models for image classification were convolutional neural networks (CNNs), introduced in LeCun et al. (1998). More recently, transformer architectures have been shown to achieve comparable or better performance. One such model, the Vision Transformer of Dosovitskiy et al. (2020), splits an image into regularly sized patches. The patches are treated as a sequence, and attention weights are learned as in a standard transformer model.
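To make the patch-splitting step concrete, here is a minimal sketch in plain Julia. The function name `patchify`, the height × width × channels array layout, and the patch size are illustrative assumptions, not taken from this project's code.

```julia
# Hypothetical helper: split an H×W×C image into non-overlapping P×P patches
# and flatten each patch into one column (one "token" per column).
function patchify(img::AbstractArray{T,3}, P::Integer) where {T}
    H, W, C = size(img)
    (H % P == 0 && W % P == 0) ||
        throw(ArgumentError("image height and width must be divisible by the patch size"))
    nh, nw = H ÷ P, W ÷ P
    patches = Matrix{T}(undef, P * P * C, nh * nw)
    n = 0
    for j in 1:nw, i in 1:nh              # walk patches column by column
        n += 1
        rows = (i - 1) * P + 1 : i * P
        cols = (j - 1) * P + 1 : j * P
        patches[:, n] = vec(img[rows, cols, :])
    end
    return patches
end

img = rand(Float32, 32, 32, 3)            # assumed example: a 32×32 RGB image
tokens = patchify(img, 4)
size(tokens)                              # (48, 64): 64 patches, each of length 4*4*3
```

In a full model, each column would typically be passed through a learned linear projection and combined with a position embedding before entering the transformer; those steps are omitted here.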

The Transformer architecture, introduced in the paper Attention Is All You Need by Vaswani et al. (2017), is arguably the most widely used neural network architecture in modern machine learning. Its parallelism and scalability to large problems have seen it adopted in domains beyond the sequential data it was originally designed for.
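The core operation of a Transformer is scaled dot-product attention. The sketch below shows it in plain Julia for a single head, assuming tokens are stored as matrix columns. The function names are hypothetical, and the learned query/key/value projections of a real layer are left out.

```julia
# Numerically stable softmax applied to each column independently.
function softmax_cols(A::AbstractMatrix)
    E = exp.(A .- maximum(A; dims=1))
    return E ./ sum(E; dims=1)
end

# Single-head scaled dot-product attention.
# Q, K are d×N and V is dv×N, with one token per column (an assumed layout).
function attention(Q::AbstractMatrix, K::AbstractMatrix, V::AbstractMatrix)
    d = size(Q, 1)
    scores = (K' * Q) ./ sqrt(d)    # N×N; scores[i, j] compares key i with query j
    weights = softmax_cols(scores)  # each column sums to 1
    return V * weights              # dv×N; column j is a weighted mix of the values
end

Q, K, V = randn(8, 5), randn(8, 5), randn(8, 5)
out = attention(Q, K, V)
size(out)                           # (8, 5)
```

Every output column is computed from the same matrix products, which is why the operation parallelizes well across tokens.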

NOTE: We adapt much of the material and many of the concepts from Torralba, A., Isola, P., & Freeman, W. T. (2021, December 1). Foundations of Computer Vision. The MIT Press, Massachusetts Institute of Technology.

Potentially useful packages:

