| Section | Sub-Section | Notebook | Paper Link |
|---|---|---|---|
| String Matching | Regular Expressions | Regex | |
| Language Tokenization | Sub-Word Tokenization | WordPiece Tokenization | Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation |
| | Word Tokenization | Word Tokenization | |
| | Sentence Tokenization | | |
| Language Normalization | Lemmatization/Stemming | Lemmatization/Stemming | |

| Section | Sub-Section | Notebook | Paper Link |
|---|---|---|---|
| Sparse Representation | Bag-of-Words | BOW | |
| | TF-IDF | TF-IDF | |
| Word-Level Dense Representation | Word2Vec | Word2Vec | Efficient Estimation of Word Representations |
| | GloVe | GloVe | GloVe: Global Vectors for Word Representation |
| | FastText | FastText | Enriching Word Vectors with Subword Information |
| Sentence-Level Dense Representation | Doc2Vec | | |

| Section | Sub-Section | Notebook | Paper Link |
|---|---|---|---|
| N-Gram Models | Tri-Gram LM | | |
| Neural Architectures | RNNs/LSTMs/GRUs | RNNs | |
| | Seq2Seq | Seq2Seq (With Attention) | Sequence to Sequence Learning with Neural Networks |
| | Transformers | Transformer | Attention Is All You Need |
| Pre-trained Models | BERT | BERT | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding |

| Section | Sub-Section | Notebook | Model Architecture |
|---|---|---|---|
| Classification | Language Identification | Language Identification | Embedding and Dense Layers |
| | Sentiment Analysis | IMDb NB | BoW Vectorization with Naive Bayes |
| Sequence Labelling | Parts-of-Speech Tagging | POS Tagging | Stacked Bidirectional LSTM |
| | Named-Entity Recognition | NER Tagging | BERT Transformer |
| Machine Translation | English-Italian Translation | EN-IT MT | Seq2Seq LSTM |
| Question Answering | Extractive QA | | |
| | Abstractive QA | Abstractive QA on CoQA | Encoder-Decoder Transformer |
| Summarization | Extractive Summarization | | |
| | Abstractive Summarization | | |
| | Hybrid Summarization | | |
| Conditional Generation | Clickbait Content Spoiling | Clickbait Content Spoiling | FLAN-T5 |
| Image Classification | Music Genre Classifier | ViT | |

| Section | Notebook |
|---|---|
| Basic Pipeline | 0. 🤗Hugging Face Pipeline |
| Fine-Tuning | 1. Fine-tuning a Pretrained 🤗Hugging Face Model |
| Token Classification | 2. Fine-tuning BERT on NER Tagging |
| Question Answering | |
| Translation | |
| Summarization | |