Concept Hierarchy in Neural Networks for NLP

Below is a list of important concepts in neural networks for NLP. In the annotations/ directory in this repository, we have examples of papers annotated with these concepts that you can peruse.

Annotation Criteria: For a particular paper, a concept should be annotated if it is important to understanding the proposed method. It should also be annotated if it is important to understanding the evaluation. For example, if a proposed self-attention model is compared to a baseline that uses an LSTM, and the difference between these two methods is important to understanding the experimental results, then the LSTM concept should also be annotated. Concepts do not need to be annotated if they are simply mentioned in passing, or only in the related work section.

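Implication: Some tags are listed as "XXX (implies YYY)", which means you need to understand concept YYY in order to understand concept XXX. If XXX applies to a paper, you do not need to annotate YYY separately. For example, arch-bilstm implies arch-birnn and arch-lstm, so a paper tagged arch-bilstm does not also need those two tags.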
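The implication rule can be sketched in a few lines of Python. This is only an illustration, not a tool shipped with this repository: the `IMPLIES` table holds a small subset of the implications from the tag lists below, and the function names `expand_tags` and `minimal_tags` are hypothetical. It reads "arch-bilstm (implies arch-birnn, arch-lstm)" the way the lists use it, that is, annotating arch-bilstm makes arch-birnn and arch-lstm redundant.

```python
# Illustrative subset of the implication table, copied from the tag lists
# in this document. A real table would cover every "(implies ...)" entry.
IMPLIES = {
    "arch-birnn": ["arch-rnn"],
    "arch-lstm": ["arch-rnn"],
    "arch-bilstm": ["arch-birnn", "arch-lstm"],
    "arch-selfatt": ["arch-att"],
    "pre-elmo": ["arch-bilstm", "task-lm"],
}


def expand_tags(tags):
    """Return the given tags plus every tag they imply, directly or transitively."""
    seen = set()
    stack = list(tags)
    while stack:
        tag = stack.pop()
        if tag in seen:
            continue
        seen.add(tag)
        stack.extend(IMPLIES.get(tag, []))
    return seen


def minimal_tags(tags):
    """Drop every tag that is already implied by another tag in the set."""
    tags = set(tags)
    implied = set()
    for tag in tags:
        implied |= expand_tags([tag]) - {tag}
    return tags - implied
```

For instance, `minimal_tags(["pre-elmo", "arch-lstm", "arch-rnn", "arch-att"])` keeps only pre-elmo and arch-att, because arch-lstm and arch-rnn are both reachable from pre-elmo.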

Non-neural Papers: This conceptual hierarchy is for tagging papers that are about neural network models for NLP. If a paper is not fundamentally about some application of neural networks to NLP, it should be tagged with not-neural, and no other tags need to be applied.
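As a small sketch of how the not-neural rule might be applied when cleaning up annotations: the helper below collapses a tag set to just not-neural whenever that tag is present. The function name `normalize_tags` is hypothetical, and dropping the other tags (rather than merely allowing their absence) is an assumption made for this example.

```python
def normalize_tags(tags):
    """Collapse a paper's tag set to {"not-neural"} if that tag is present.

    Assumption: other tags on a not-neural paper are dropped, since the
    guideline says no other tags need to be applied.
    """
    tags = set(tags)
    if "not-neural" in tags:
        return {"not-neural"}
    return tags
```

A paper tagged `["not-neural", "task-textclass"]` would come back as just not-neural, while any other tag set is returned unchanged.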


Optimizers and Optimization Techniques




Loss Functions (other than cross-entropy)

Training Paradigms

Sequence Modeling Architectures

Activation Functions

Pooling Operations

Recurrent Architectures

  • Recurrent Neural Network (RNN): arch-rnn
  • Bi-directional Recurrent Neural Network (Bi-RNN): arch-birnn (implies arch-rnn)
  • Long Short-term Memory (LSTM): arch-lstm (implies arch-rnn)
  • Bi-directional Long Short-term Memory (Bi-LSTM): arch-bilstm (implies arch-birnn, arch-lstm)
  • Gated Recurrent Units (GRU): arch-gru (implies arch-rnn)
  • Bi-directional Gated Recurrent Units (Bi-GRU): arch-bigru (implies arch-birnn, arch-gru)

Other Sequential/Structured Architectures

  • Bag-of-words, Bag-of-embeddings, Continuous Bag-of-words (BOW): arch-bow
  • Convolutional Neural Networks (CNN): arch-cnn
  • Attention: arch-att
  • Self Attention: arch-selfatt (implies arch-att)
  • Recursive Neural Network (RecNN): arch-recnn
  • Tree-structured Long Short-term Memory (TreeLSTM): arch-treelstm (implies arch-recnn)
  • Graph Neural Network (GNN): arch-gnn
  • Graph Convolutional Neural Network (GCNN): arch-gcnn (implies arch-gnn)

Architectural Techniques

Standard Composite Architectures

  • Transformer: arch-transformer (implies arch-selfatt, arch-residual, arch-layernorm, optim-noam)

Model Combination

Search Algorithms

Prediction Tasks

  • Text Classification (text -> label): task-textclass
  • Text Pair Classification (two texts -> label): task-textpair
  • Sequence Labeling (text -> one label per token): task-seqlab
  • Extractive Summarization (text -> subset of text): task-extractive (implies task-seqlab)
  • Span Labeling (text -> labels on spans): task-spanlab
  • Language Modeling (predict probability of text): task-lm
  • Conditioned Language Modeling (some input -> text): task-condlm (implies task-lm)
  • Sequence-to-sequence Tasks (text -> text, including MT): task-seq2seq (implies task-condlm)
  • Cloze-style Prediction, Masked Language Modeling (right and left context -> word): task-cloze
  • Context Prediction (as in word2vec) (word -> right and left context): task-context
  • Relation Prediction (text -> graph of relations between words, including dependency parsing): task-relation
  • Tree Prediction (text -> tree, including syntactic and some semantic parsing): task-tree
  • Graph Prediction (text -> graph, not necessarily between words): task-graph
  • Lexicon Induction/Embedding Alignment (text/embeddings -> bi- or multi-lingual lexicon): task-lexicon
  • Word Alignment (parallel text -> alignment between words): task-alignment

Composite Pre-trained Embedding Techniques

  • word2vec: pre-word2vec (implies arch-bow, task-cloze, task-context)
  • fasttext: pre-fasttext (implies arch-bow, arch-subword, task-cloze, task-context)
  • GloVe: pre-glove
  • Paragraph Vector (ParaVec): pre-paravec
  • Skip-thought: pre-skipthought (implies arch-lstm, task-seq2seq)
  • ELMo: pre-elmo (implies arch-bilstm, task-lm)
  • BERT: pre-bert (implies arch-transformer, task-cloze, task-textpair)
  • Universal Sentence Encoder (USE): pre-use (implies arch-transformer, task-seq2seq)

Structured Models/Algorithms

Relaxation/Training Methods for Non-differentiable Functions

Adversarial Methods

  • Generative Adversarial Networks (GAN): adv-gan
  • Adversarial Feature Learning: adv-feat
  • Adversarial Examples: adv-examp
  • Adversarial Training: adv-train (implies adv-examp)

Latent Variable Models

Meta Learning