AI Club DC Mini Project
This project is a step-by-step journey into modern deep learning architectures, with a special focus on understanding and implementing Vision Transformers (ViT) using PyTorch. The tasks below are designed to build foundational knowledge and practical coding skills.
- Objective: Understand and implement a basic CNN.
- What to do:
- Study CNN architecture fundamentals (convolutional layers, pooling, activation functions).
- Implement a simple CNN from scratch in PyTorch (e.g., on MNIST or CIFAR-10).
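A minimal sketch of what such a model could look like, assuming MNIST-style 28x28 grayscale inputs (the layer sizes here are illustrative, not required):

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """A minimal CNN sketch for 28x28 grayscale inputs (e.g. MNIST)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # 28x28 -> 28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # 14x14 -> 14x14
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = x.flatten(1)  # flatten all but the batch dimension
        return self.classifier(x)

model = SimpleCNN()
logits = model(torch.randn(4, 1, 28, 28))  # batch of 4 fake images
print(logits.shape)  # torch.Size([4, 10])
```

Swap the first `Conv2d` to 3 input channels (and recompute the flattened size) for CIFAR-10's 32x32 RGB images.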
Paper Reading:
- Read and understand the landmark paper: Attention Is All You Need
Practical Exploration:
- Go through a blog/tutorial where an encoder-decoder transformer model is implemented from scratch.
- Suggested blog: The Annotated Transformer (or choose your preferred one).
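The core building block from the paper is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A short sketch of it (tensor shapes are illustrative):

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, as in the paper."""
    d_k = q.size(-1)
    # Similarity of every query with every key, scaled to keep softmax stable
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Blocked positions get -inf so their softmax weight is zero
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights

q = torch.randn(2, 5, 8)           # (batch, sequence, d_k)
out, weights = scaled_dot_product_attention(q, q, q)  # self-attention
print(out.shape)  # torch.Size([2, 5, 8])
```

The attention weights along the last dimension sum to 1, so each output is a convex combination of the value vectors.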
Paper Reading:
- Dive into the original Vision Transformer paper: An Image is Worth 16x16 Words
Implementation:
- Code the ViT architecture from scratch using only PyTorch (no high-level transformer libraries like Hugging Face).
- Understand and build:
- Patch embeddings
- Positional encodings
- Transformer encoder blocks
- Classification head
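The pieces above can be sketched as follows. Note this is only a shape-checking skeleton with illustrative sizes: it borrows `nn.TransformerEncoder` for the encoder blocks, whereas the task asks you to write those blocks yourself from `Linear`, `LayerNorm`, and your own attention:

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into patches and linearly project each one.
    A Conv2d with stride == kernel_size is equivalent to cutting
    non-overlapping patches and applying a shared linear layer."""
    def __init__(self, img_size=32, patch_size=4, in_ch=3, dim=64):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch_size, stride=patch_size)
        self.num_patches = (img_size // patch_size) ** 2

    def forward(self, x):
        x = self.proj(x)                     # (B, dim, H/P, W/P)
        return x.flatten(2).transpose(1, 2)  # (B, num_patches, dim)

class MiniViT(nn.Module):
    def __init__(self, img_size=32, patch_size=4, in_ch=3, dim=64,
                 depth=2, heads=4, num_classes=10):
        super().__init__()
        self.patches = PatchEmbedding(img_size, patch_size, in_ch, dim)
        n = self.patches.num_patches
        # Learnable [CLS] token and learned positional embeddings, as in the ViT paper
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, n + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=dim * 4,
                                           batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)  # classification head

    def forward(self, x):
        x = self.patches(x)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.encoder(x)
        return self.head(x[:, 0])  # classify from the [CLS] token

model = MiniViT()
print(model(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])
```

The original paper uses 16x16 patches on 224x224 images; the 4x4 patches on 32x32 inputs here keep the example small enough to run on a laptop.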
ViT-Implementation/
├── task1_cnn/
│   └── cnn_model.ipynb
├── task3_vit/
│   └── vit_from_scratch.ipynb
└── README.md