AI Club DC Mini Project
This project is a step-by-step journey into modern deep learning architectures, with a special focus on understanding and implementing Vision Transformers (ViT) using PyTorch. The tasks below are designed to build foundational knowledge and practical coding skills.
- Objective: Understand and implement a basic CNN.
- What to do:
- Study CNN architecture fundamentals (convolutional layers, pooling, activation functions).
- Implement a simple CNN from scratch in PyTorch (e.g., on MNIST or CIFAR-10).
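A minimal sketch of what such a model could look like, assuming MNIST-style 28x28 grayscale inputs (the layer sizes here are illustrative, not required):

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """A minimal CNN sketch for 28x28 grayscale inputs (e.g. MNIST)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # 28x28 -> 28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # 14x14 -> 14x14
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = x.flatten(1)  # flatten all but the batch dimension
        return self.classifier(x)

model = SimpleCNN()
logits = model(torch.randn(4, 1, 28, 28))  # batch of 4 fake images
print(logits.shape)  # torch.Size([4, 10])
```

Swap the first `Conv2d` to 3 input channels (and recompute the flattened size) for CIFAR-10's 32x32 RGB images.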
Paper Reading:
- Read and understand the landmark paper: Attention Is All You Need
Practical Exploration:
- Go through a blog/tutorial where an encoder-decoder transformer model is implemented from scratch.
- Suggested blog: The Annotated Transformer (or choose your preferred one).
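The core building block from the paper is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A short sketch of it (tensor shapes are illustrative):

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, as in the paper."""
    d_k = q.size(-1)
    # Similarity of every query with every key, scaled to keep softmax stable
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Blocked positions get -inf so their softmax weight is zero
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights

q = torch.randn(2, 5, 8)           # (batch, sequence, d_k)
out, weights = scaled_dot_product_attention(q, q, q)  # self-attention
print(out.shape)  # torch.Size([2, 5, 8])
```

The attention weights along the last dimension sum to 1, so each output is a convex combination of the value vectors.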
Paper Reading:
- Dive into the original Vision Transformer paper: An Image is Worth 16x16 Words
Implementation:
- Code the ViT architecture from scratch using only PyTorch (no high-level transformer libraries like Hugging Face).
- Understand and build:
- Patch embeddings
- Positional encodings
- Transformer encoder blocks
- Classification head
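The pieces above can be sketched as follows. Note this is only a shape-checking skeleton with illustrative sizes: it borrows `nn.TransformerEncoder` for the encoder blocks, whereas the task asks you to write those blocks yourself from `Linear`, `LayerNorm`, and your own attention:

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into patches and linearly project each one.
    A Conv2d with stride == kernel_size is equivalent to cutting
    non-overlapping patches and applying a shared linear layer."""
    def __init__(self, img_size=32, patch_size=4, in_ch=3, dim=64):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch_size, stride=patch_size)
        self.num_patches = (img_size // patch_size) ** 2

    def forward(self, x):
        x = self.proj(x)                     # (B, dim, H/P, W/P)
        return x.flatten(2).transpose(1, 2)  # (B, num_patches, dim)

class MiniViT(nn.Module):
    def __init__(self, img_size=32, patch_size=4, in_ch=3, dim=64,
                 depth=2, heads=4, num_classes=10):
        super().__init__()
        self.patches = PatchEmbedding(img_size, patch_size, in_ch, dim)
        n = self.patches.num_patches
        # Learnable [CLS] token and learned positional embeddings, as in the ViT paper
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, n + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=dim * 4,
                                           batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)  # classification head

    def forward(self, x):
        x = self.patches(x)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.encoder(x)
        return self.head(x[:, 0])  # classify from the [CLS] token

model = MiniViT()
print(model(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])
```

The original paper uses 16x16 patches on 224x224 images; the 4x4 patches on 32x32 inputs here keep the example small enough to run on a laptop.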
ViT-Implementation/
├── task1_cnn/
│   └── cnn_model.ipynb
├── task3_vit/
│   └── vit_from_scratch.ipynb
└── README.md