A beginner's attempt to understand and implement the Vision Transformer paper from scratch.
The current implementation classifies paintings into two art styles, Rococo and Expressionism. The model has not yet been trained to its full capability because of the limited computational resources available at the moment.
Planned next steps:
- use a pre-trained ViT model from PyTorch and observe how well it performs on the same task.
- run the same classification task on pre-trained CNN models and compare the results.
Kaggle notebook: https://www.kaggle.com/code/vrindakohli/art-vit
Dataset: https://www.kaggle.com/datasets/sivarazadi/wikiart-art-movementsstyles