Track trends in representation learning for MultiModal Machine Learning (MMML).
- [CVPR oral] Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning paper code
- [ICML] ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision paper code
- [NeurIPS] Large-Scale Adversarial Training for Vision-and-Language Representation Learning paper code
- [NeurIPS] ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks paper code
- [EMNLP] LXMERT: Learning Cross-Modality Encoder Representations from Transformers paper code
- [arXiv] VisualBERT: A Simple and Performant Baseline for Vision and Language paper code
- [TPAMI] Multimodal Machine Learning: A Survey and Taxonomy paper
- website
- Reading list for research topics in multimodal machine learning