- Efficient Estimation of Word Representations in Vector Space
- Distributed Representations of Words and Phrases and their Compositionality
- word2vec Explained: Deriving Mikolov et al.'s Negative-Sampling Word-Embedding Method
- Not All Neural Embeddings are Born Equal
- word2vec Parameter Learning Explained
- Retrofitting Word Vectors to Semantic Lexicons
- Problems With Evaluation of Word Embeddings Using Word Similarity Tasks
- Bag of Tricks for Efficient Text Classification
- Enriching Word Vectors with Subword Information
- Semi-supervised sequence tagging with bidirectional language models
- Deep contextualized word representations
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Learning Semantic Sentence Embeddings using Pair-wise Discriminator
- Contextual String Embeddings for Sequence Labeling
- Linguistic Regularities in Sparse and Explicit Word Representations
- Evaluation methods for unsupervised word embeddings
- Dependency-Based Word Embeddings
- GloVe: Global Vectors for Word Representation
- Two/Too Simple Adaptations of Word2Vec for Syntax Problems
- Improving Word Representations via Global Context and Multiple Word Prototypes
- Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors
- Improving Distributional Similarity with Lessons Learned from Word Embeddings
- Intrinsic Evaluation of Word Vectors Fails to Predict Extrinsic Performance
- Improving Reliability of Word Similarity Evaluation by Redesigning Annotation Task and Performance Measure
- Evaluating Word Embeddings Using a Representative Suite of Practical Tasks