Visual Transformers: Cats vs. Dogs Classification

A machine learning project utilizing Visual Transformers (ViTs) to classify images from the Cats vs. Dogs dataset.

[Example image]

Table of Contents

  - Introduction
  - Installation
  - Usage
  - Results
  - Acknowledgements

Introduction

The Cats vs. Dogs dataset is a standard computer vision benchmark containing roughly 25K annotated images of cats and dogs. In this project, instead of using conventional CNNs, we use Visual Transformers (ViTs). Because the dataset is small, the purpose of this project is to see whether pretraining the model with a masked autoencoder (MAE) achieves a better result than simply training from random initialisation.
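To make the idea concrete, below is a minimal, self-contained sketch of MAE-style pretraining for a tiny ViT in PyTorch: mask most patches, encode only the visible ones, and reconstruct the masked patch pixels. It is illustrative only; the module names, dimensions, and hyperparameters (patch size 16, 192-dim tokens, 75% mask ratio) are assumptions and do not reflect the actual code in tinyVIT.py.

    # Illustrative MAE-style pretraining sketch (assumed names/sizes, not the repo's code).
    import torch
    import torch.nn as nn

    class PatchEmbed(nn.Module):
        def __init__(self, img_size=224, patch=16, dim=192):
            super().__init__()
            self.patch = patch
            self.num_patches = (img_size // patch) ** 2
            self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)

        def forward(self, x):                        # (B, 3, H, W) -> (B, N, D)
            return self.proj(x).flatten(2).transpose(1, 2)

    def transformer(dim, depth, heads):
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4,
                                           batch_first=True, norm_first=True)
        return nn.TransformerEncoder(layer, depth)

    class TinyMAE(nn.Module):
        def __init__(self, dim=192, mask_ratio=0.75):
            super().__init__()
            self.embed = PatchEmbed(dim=dim)
            self.pos = nn.Parameter(torch.zeros(1, self.embed.num_patches, dim))
            self.encoder = transformer(dim, depth=6, heads=3)
            self.decoder = transformer(dim, depth=2, heads=3)
            self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
            self.to_pixels = nn.Linear(dim, 3 * self.embed.patch ** 2)
            self.mask_ratio = mask_ratio

        def forward(self, imgs):
            tokens = self.embed(imgs) + self.pos     # (B, N, D)
            B, N, D = tokens.shape
            keep = int(N * (1 - self.mask_ratio))
            shuffle = torch.rand(B, N, device=imgs.device).argsort(1)
            visible_idx, masked_idx = shuffle[:, :keep], shuffle[:, keep:]
            visible = torch.gather(tokens, 1, visible_idx[..., None].expand(-1, -1, D))
            latent = self.encoder(visible)           # encode visible patches only

            # Learnable mask tokens (plus positions) stand in for masked patches.
            mask_tok = self.mask_token.expand(B, N - keep, D) + torch.gather(
                self.pos.expand(B, -1, -1), 1, masked_idx[..., None].expand(-1, -1, D))
            decoded = self.decoder(torch.cat([latent, mask_tok], dim=1))
            pred = self.to_pixels(decoded[:, keep:])  # pixels for masked patches only

            # Target: the original pixels of the masked patches.
            p = self.embed.patch
            patches = imgs.unfold(2, p, p).unfold(3, p, p)      # (B, 3, h, w, p, p)
            patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, N, -1)
            target = torch.gather(
                patches, 1, masked_idx[..., None].expand(-1, -1, patches.shape[-1]))
            return nn.functional.mse_loss(pred, target)

    # One pretraining step on a random batch (stand-in for real Cats vs. Dogs images).
    model = TinyMAE()
    loss = model(torch.randn(4, 3, 224, 224))
    loss.backward()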

[Output GIF]

Installation

Requirements: Python 3.8+

  1. Clone the repository:

    git clone https://github.com/yourusername/cats-vs-dogs-vit.git
    cd cats-vs-dogs-vit
  2. Install the required packages:

    pip install -r requirements.txt

Usage

To prepare the data:

python tinyVIT.py prepare-data

To train the model using MAE:

python tinyVIT.py train-mae

To train the model using supervision:

python tinyVIT.py train
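For intuition on what the supervised stage does relative to the MAE stage, here is a hedged sketch (building on the TinyMAE sketch in the Introduction) of how a pretrained encoder could be reused with a two-class head for cats vs. dogs. The class name, mean pooling, and weight handling are assumptions, not the repository's actual implementation.

    # Illustrative fine-tuning sketch reusing the TinyMAE encoder defined above.
    import torch
    import torch.nn as nn

    class ViTClassifier(nn.Module):
        def __init__(self, mae: TinyMAE, num_classes=2):
            super().__init__()
            # Reuse the (ideally MAE-pretrained) patch embedding, positions and encoder.
            self.embed, self.pos, self.encoder = mae.embed, mae.pos, mae.encoder
            self.head = nn.Linear(self.pos.shape[-1], num_classes)

        def forward(self, imgs):
            tokens = self.embed(imgs) + self.pos     # no masking at fine-tune time
            latent = self.encoder(tokens)
            return self.head(latent.mean(dim=1))     # mean-pool patch tokens -> logits

    clf = ViTClassifier(TinyMAE())                   # load MAE-pretrained weights here instead
    logits = clf(torch.randn(4, 3, 224, 224))        # (4, 2)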

Results

We achieved an accuracy of 83.46% on a randomly sampled validation set of 2,500 images using a ViT with random weight initialisation. With a network pretrained using MAE, the final accuracy on the same validation set was 93.29%. It is also worth noting that the MAE-pretrained model reached this accuracy after around 80 epochs, whereas the best performance from random initialisation was reached only after 158 epochs.

Model                      Accuracy (%)
Visual Transformer         83.46
Visual Transformer + MAE   93.29

Acknowledgements


© 2023 Mathew Salvaris. All Rights Reserved.
