shikhr/nn_fundamentals

Makemore models

Building and training autoregressive language models from scratch, following Andrej Karpathy's Neural Networks: Zero to Hero series. A minimal count-based bigram sketch is shown after the list below.

  1. Bigram
  2. Trigram
  3. Trigram Alternative
  4. MLP
  5. MLP 2
  6. MLP with Modules
  7. MLP
  8. RNN
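
The earliest makemore models are count-based; as a rough illustration of the bigram approach, the sketch below builds a character-level bigram table and samples from it. The tiny word list and the random seed are placeholders, not the dataset or settings used in the notebooks.

```python
import torch

# Placeholder word list; the notebooks train on a much larger names dataset.
words = ["emma", "olivia", "ava"]

# Character vocabulary with '.' as the start/end token.
chars = sorted(set("".join(words)))
stoi = {s: i + 1 for i, s in enumerate(chars)}
stoi["."] = 0
itos = {i: s for s, i in stoi.items()}

# Count bigram occurrences.
N = torch.zeros((len(stoi), len(stoi)), dtype=torch.int32)
for w in words:
    chs = ["."] + list(w) + ["."]
    for ch1, ch2 in zip(chs, chs[1:]):
        N[stoi[ch1], stoi[ch2]] += 1

# Normalize rows (with add-one smoothing) to get next-character probabilities.
P = (N + 1).float()
P /= P.sum(1, keepdim=True)

# Sample a new name from the bigram distribution.
g = torch.Generator().manual_seed(2147483647)
ix, out = 0, []
while True:
    ix = torch.multinomial(P[ix], num_samples=1, replacement=True, generator=g).item()
    if ix == 0:
        break
    out.append(itos[ix])
print("".join(out))
```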

Word2Vec with Negative Sampling

Word2Vec is a technique in natural language processing (NLP) for obtaining vector representations of words. These vectors capture information about the meaning of a word based on its surrounding (context) words.
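
A minimal sketch of how skip-gram with negative sampling can be set up: each (center, context) pair is scored with a dot product and trained as a binary classification against sampled negatives. The module name and tensor shapes here are illustrative, not the notebook's exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipGramNS(nn.Module):
    """Skip-gram with negative sampling: score (center, context) pairs
    with a dot product and train with a logistic (sigmoid) loss."""
    def __init__(self, vocab_size, embed_dim):
        super().__init__()
        self.in_embed = nn.Embedding(vocab_size, embed_dim)   # center-word vectors
        self.out_embed = nn.Embedding(vocab_size, embed_dim)  # context-word vectors

    def forward(self, center, pos_context, neg_context):
        # center: (B,), pos_context: (B,), neg_context: (B, K) sampled negatives
        v = self.in_embed(center)                              # (B, D)
        u_pos = self.out_embed(pos_context)                    # (B, D)
        u_neg = self.out_embed(neg_context)                    # (B, K, D)

        pos_score = (v * u_pos).sum(dim=-1)                    # (B,)
        neg_score = torch.bmm(u_neg, v.unsqueeze(-1)).squeeze(-1)  # (B, K)

        # Push positive pairs together and sampled negatives apart.
        loss = -(F.logsigmoid(pos_score).mean() + F.logsigmoid(-neg_score).mean())
        return loss
```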

Word2Vec SkipGram with Negative Sampling

GPT

GPT Tokenizer(BPE)

Tokenization is the process of splitting text into smaller units, called tokens, that can be fed into a language model.

  • character level (vocabulary too small, sequences become very long)
  • word level (vocabulary too large, cannot handle unseen words)
  • subword level (a balance between the two)
    • BPE (repeatedly merges the adjacent pair with the highest frequency, argmax P(A, B); works well for whitespace-separated languages; sketched after this list)
    • WordPiece (merges the pair maximising P(A, B) / (P(A) * P(B)); works well for whitespace-separated languages)
    • SentencePiece (library with optimized BPE and Unigram implementations; operates on raw text, so it also handles non-whitespace-separated languages)
    • Unigram (starts from a large set of candidate substrings and iteratively removes the tokens whose removal least reduces the corpus likelihood)
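
A minimal sketch of the BPE training loop on raw bytes: repeatedly find the most frequent adjacent pair of tokens and merge it into a new token id. The toy corpus and the number of merges are placeholders.

```python
def get_stats(ids):
    """Count frequencies of adjacent token pairs."""
    counts = {}
    for a, b in zip(ids, ids[1:]):
        counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

text = "hello hello world"          # toy corpus (placeholder)
ids = list(text.encode("utf-8"))    # start from raw bytes (0..255)
num_merges = 10
merges = {}
for k in range(num_merges):
    stats = get_stats(ids)
    if not stats:
        break
    pair = max(stats, key=stats.get)   # most frequent adjacent pair
    new_id = 256 + k                   # new token ids start after the byte range
    ids = merge(ids, pair, new_id)
    merges[pair] = new_id
```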

GPT Tokenizer(BPE) Notebook

NanoGPT

Decoder-only Transformer for autoregressive sequence modelling (a GPT-style language model).
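
A minimal sketch of the causal self-attention block at the heart of a decoder-only Transformer: a lower-triangular mask ensures each position only attends to earlier positions. Layer sizes and naming are illustrative, not necessarily those used in the notebook.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a causal (lower-triangular) mask."""
    def __init__(self, n_embd, n_head, block_size):
        super().__init__()
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)   # joint projection for q, k, v
        self.proj = nn.Linear(n_embd, n_embd)
        mask = torch.tril(torch.ones(block_size, block_size))
        self.register_buffer("mask", mask.view(1, 1, block_size, block_size))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # Reshape into heads: (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # Scaled dot-product attention with the causal mask applied.
        att = (q @ k.transpose(-2, -1)) / (k.size(-1) ** 0.5)
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        y = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)
```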

gpt2

NanoGPT Notebook

Vision

CNN-based architectures (a minimal LeNet-style forward pass is sketched after the list):

  • LeNet
  • AlexNet
  • ResNet
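
As an illustration of the conv-pool-linear pattern these architectures build on, here is a minimal LeNet-style network. The channel and layer sizes follow the classic LeNet-5 layout and the 1x32x32 input shape is assumed, not taken from the notebook.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LeNet(nn.Module):
    """LeNet-style CNN for 1x32x32 inputs (e.g. padded MNIST):
    two conv+pool stages followed by three fully connected layers."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5)    # 1x32x32 -> 6x28x28
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)   # 6x14x14 -> 16x10x10
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, num_classes)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)     # -> 6x14x14
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)     # -> 16x5x5
        x = x.flatten(1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)

logits = LeNet()(torch.randn(4, 1, 32, 32))   # (4, 10)
```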

VisionModels Notebook

Vision Transformer

ViT

Encoder-only Transformer applied to sequences of image patches for classification.
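
A minimal sketch of the encoder-only ViT pipeline: split the image into patches, linearly embed them, prepend a class token, add position embeddings, run a Transformer encoder, and classify from the class token. All sizes below are illustrative defaults, not the notebook's configuration.

```python
import torch
import torch.nn as nn

class ViT(nn.Module):
    """Minimal Vision Transformer: patchify, embed, add [CLS] token and
    position embeddings, encode, then classify from the [CLS] output."""
    def __init__(self, img_size=32, patch=4, dim=192, depth=6, heads=3, num_classes=10):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        # A strided conv is a convenient way to extract and embed patches.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        x = self.patch_embed(x).flatten(2).transpose(1, 2)    # (B, N, dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.encoder(x)
        return self.head(x[:, 0])                             # classify from [CLS]

logits = ViT()(torch.randn(2, 3, 32, 32))   # (2, 10)
```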

ViT Notebook

Image generation

DCGAN

DCGAN (Deep Convolutional GAN) is a generative adversarial network architecture that uses strided convolutions in the discriminator and transposed convolutions in the generator, with batch normalization in both networks.
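
A minimal sketch of a DCGAN generator, following the standard DCGAN recipe: a latent vector is upsampled to an image through strided transposed convolutions with BatchNorm and ReLU, ending in a Tanh. The latent size, channel widths, and 64x64 output resolution are assumed here, not taken from the notebook.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """DCGAN generator: map a latent vector to a 64x64 image using
    strided transposed convolutions with BatchNorm and ReLU."""
    def __init__(self, nz=100, ngf=64, nc=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),       # 1x1 -> 4x4
            nn.BatchNorm2d(ngf * 8), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),  # -> 8x8
            nn.BatchNorm2d(ngf * 4), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),  # -> 16x16
            nn.BatchNorm2d(ngf * 2), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),      # -> 32x32
            nn.BatchNorm2d(ngf), nn.ReLU(True),
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),           # -> 64x64
            nn.Tanh(),  # outputs in [-1, 1]
        )

    def forward(self, z):
        return self.net(z)

fake = Generator()(torch.randn(8, 100, 1, 1))   # (8, 3, 64, 64)
```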

DCGAN Notebook

About

Implementing models all the way from scratch!
