qsimeon/julia_class_project

julia_class_project

Implementing a Vision Transformer (ViT) in Julia!

Until recently, the best-performing models for image classification were convolutional neural networks (CNNs), introduced in LeCun et al. (1998). Since then, transformer architectures have been shown to achieve comparable or better performance. One such model, the Vision Transformer (ViT) of Dosovitskiy et al. (2020), splits an image into regularly sized patches. The patches are treated as a sequence, and attention weights are learned as in a standard transformer model.
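To make the patch-splitting step concrete, here is a minimal sketch in plain Julia (Base only). It is not taken from this repository's code: the function name `patchify`, the height × width × channels layout, and the 32×32 image with 8×8 patches are all assumptions chosen for illustration.

```julia
# Hypothetical helper (not from this repo): split an image stored as
# height × width × channels into non-overlapping P×P patches and flatten
# each patch into one column of the output matrix.
function patchify(img::AbstractArray{T,3}, P::Int) where {T}
    H, W, C = size(img)
    @assert H % P == 0 && W % P == 0 "image size must be divisible by patch size"
    nh, nw = H ÷ P, W ÷ P                      # patches per column / per row
    patches = Matrix{T}(undef, P * P * C, nh * nw)
    n = 1
    for j in 1:nw, i in 1:nh                   # i varies fastest (column-major order)
        rows = (i - 1) * P + 1 : i * P
        cols = (j - 1) * P + 1 : j * P
        patches[:, n] = vec(img[rows, cols, :])
        n += 1
    end
    return patches
end

img = rand(Float32, 32, 32, 3)                 # assumed toy input
tokens = patchify(img, 8)
size(tokens)                                   # (192, 16): 16 patches of 8*8*3 values
```

Each column of `tokens` is one element of the sequence. In a full ViT these flattened patches would next pass through a learned linear projection and receive positional embeddings, which this sketch leaves out.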

ViT Model


The Transformer architecture, introduced in the paper Attention Is All You Need by Vaswani et al. (2017), is among the most widely used neural network architectures in modern machine learning. Its parallelism and scalability to large problems have seen it adopted in domains beyond the one it was originally designed for (sequential data).
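The core operation of the Transformer is scaled dot-product attention, which can be sketched in a few lines of plain Julia. This is an illustrative sketch, not this repository's implementation: the function names `softmax_cols` and `attention` are hypothetical, tokens are assumed to be stored as matrix columns, and the learned query/key/value projections and multiple heads are omitted.

```julia
# Column-wise softmax, stabilised by subtracting each column's maximum.
function softmax_cols(X::AbstractMatrix)
    E = exp.(X .- maximum(X; dims=1))
    return E ./ sum(E; dims=1)
end

# Scaled dot-product attention (hypothetical helper).
# Q, K are d × N and V is dv × N, with one token per column.
function attention(Q::AbstractMatrix, K::AbstractMatrix, V::AbstractMatrix)
    d = size(K, 1)
    scores = (K' * Q) ./ sqrt(d)   # N × N; column j holds query j's scores over all keys
    A = softmax_cols(scores)       # attention weights; each column sums to 1
    return V * A, A                # output column j is a weighted sum of value columns
end

X = randn(4, 5)                    # assumed toy input: 5 tokens of dimension 4
out, A = attention(X, X, X)        # self-attention with identity projections
size(out)                          # (4, 5)
```

Every output column is computed from the same two matrix products, so all tokens are processed at once rather than one step at a time, which is the source of the parallelism mentioned above.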

Transformer Model

NOTE: We adapt/borrow a lot of material/concepts from Torralba, A., Isola, P., & Freeman, W. T. (2021, December 1). Foundations of Computer Vision. MIT Press; The MIT Press, Massachusetts Institute of Technology.


Potentially useful packages:
