One of the greatest aspects of Artificial Intelligence today is its open-source nature. Rarely in history have key technologies, especially ones as powerful as the models being developed today, been made completely available to the public. Without the incredible online resources created by all of the contributing scientists and engineers, I never would have learned as much as I know today. This repository documents my own exploration and is an attempt to teach everything I learn to others. I will do my best to cite everyone and everything I reference that helped me learn along the way!
The main limitation for researchers in this field today is that we normally don't have buckets of GPUs just sitting around to train with! Every example I do will be a proof of concept, but to the best of my ability I will attempt to reproduce any model that is feasible!
I am often more wrong than I am right! If you find any errors in my work, that means there is a gap in my knowledge. Please let me know, as I want this to be as accurate as possible, but also, I want to learn as much as I can! If you want to contribute anything yourself, just submit a PR and I will review it!
We will be using a couple of datasets in our Deep Learning Adventures!!
Ensure you have a /data folder in the root directory of the repo, then run the following to download all of the datasets:

```bash
bash download_data.sh
```
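If you want to sanity-check your setup before training, a minimal sketch like the following can help (the `check_data_dir` helper is just an illustration, not part of the repo; it only assumes the /data folder convention described above):

```python
from pathlib import Path

def check_data_dir(root: str = ".") -> list[str]:
    """Return the dataset names found in <root>/data, or raise if the folder is missing."""
    data_dir = Path(root) / "data"
    if not data_dir.is_dir():
        raise FileNotFoundError(
            "Create a /data folder in the repo root and run `bash download_data.sh` first"
        )
    # Each downloaded dataset lives as a file or subfolder inside /data
    return sorted(p.name for p in data_dir.iterdir())
```

Calling `check_data_dir()` from the repo root prints nothing itself; it simply returns the names of whatever datasets are present, so you can confirm a download finished before launching a long training run.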
There are a few other datasets we will use that are unreliable to download automatically; they appear in the more advanced architectures. Just download them from the provided links and save them in the /data folder! These datasets may also be too large to store in Google Drive, so keep that in mind!
- Going Deeper with ResNet
- UNet for Image Segmentation
- Moving from Convolutions: Vision Transformer
- Masked Image Modeling with Masked Autoencoders
- Self-Supervised Learning with DINO
- Hierarchical Vision Transformers with Swin Transformer
- Causal Language Modeling: GPT
- Masked Language Modeling: RoBERTa
- MLP to Mixture of Experts
- Intro to Audio Processing in PyTorch
- Connectionist Temporal Classification Loss
- Intro to Automatic Speech Recognition
- Quantized Audio Pre-Training: Wav2Vec2
- RNN Transducer as an Alternative to CTC
- PixelCNN
- WaveNet
- Intro to Generative Adversarial Networks
- SuperResolution with SRGAN
- Image2Image Translation with CycleGAN
- Intro to Diffusion
- Text-Conditional Diffusion with Classifier Free Guidance
- Latent-Space Diffusion
- Building Vision/Language Representations: CLIP
- Automatic Image Captioning
- Visual Question Answering
- Attention is All You Need
- Sparse Windowed Attention
- Linear Attention
- Seq2Seq for Language Translation
- CNN/RNN for Image Captioning
- Attention is All You Need for Language Translation
- Q-Learning
- Deep-Q Learning