- From Recognition to Cognition: Visual Commonsense Reasoning
- ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
- Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
- VL-BERT: Pre-training of Generic Visual-Linguistic Representations
- UNITER: UNiversal Image-Text Representation Learning
- Heterogeneous Graph Learning for Visual Commonsense Reasoning
- TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines
- Visual Commonsense for Scene Understanding Using Perception, Semantic Parsing and Reasoning
- A Simple Baseline for Visual Commonsense Reasoning
- Connective Cognition Network for Directional Visual Commonsense Reasoning
- Project on Visual Commonsense Reasoning