- ECCV-2022
- CVPR-2022
- AAAI-2022
- IJCAI-2022
- NeurIPS-2021
- ACMMM-2021
- ICCV-2021
- ACL-2021
- CVPR-2021
- AAAI-2021
- ACMMM-2020
- NeurIPS-2020
- ECCV-2020
- CVPR-2020
- ACL-2020
- AAAI-2020
- ACL-2019
- NeurIPS-2019
- ACMMM-2019
- ICCV-2019
- CVPR-2019
- AAAI-2019
- ECCV Caption: Correcting False Negatives by Collecting Machine-and-Human-Verified Image-Caption Associations for MS-COCO
- Object-Centric Unsupervised Image Captioning
- D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding
- StyleBabel: Artistic Style Tagging and Captioning
- MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes
- GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval
- Explicit Image Caption Editing
- GRIT: Faster and Better Image Captioning Transformer Using Dual Visual Features
- Unifying Event Detection and Captioning as Sequence Generation via Pre-training
Image Captioning
- DeeCap: Dynamic Early Exiting for Efficient Image Captioning
- Injecting Visual Concepts into End-to-End Image Captioning
- DIFNet: Boosting Visual Information Flow for Image Captioning
- Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning
- Quantifying Societal Bias Amplification in Image Captioning
- Show, Deconfound and Tell: Image Captioning with Causal Inference
- Scaling Up Vision-Language Pretraining for Image Captioning
- VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning
- Comprehending and Ordering Semantics for Image Captioning
- Alleviating Emotional bias in Affective Image Captioning by Contrastive Data Collection
- NOC-REK: Novel Object Captioning with Retrieved Vocabulary from External Knowledge
- NICGSlowDown: Evaluating the Efficiency Robustness of Neural Image Caption Generation Models
Video Captioning
- End-to-end Generative Pretraining for Multimodal Video Captioning
- SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning
- Hierarchical Modular Network for Video Captioning
Image Captioning
- Image Difference Captioning with Pre-Training and Contrastive Learning
- Attention-Aligned Transformer for Image Captioning
- Retrieve, Caption, Generate: Visual Grounding for Enhancing Commonsense in Text Generation Models
- MAGIC: Multimodal relAtional Graph adversarIal inferenCe for Diverse and Unpaired Text-Based Image Captioning
- UNISON: Unpaired Cross-Lingual Image Captioning
- End-to-End Transformer Based Model for Image Captioning
Image Captioning
- ER-SAN: Enhanced-Adaptive Relation Self-Attention Network for Image Captioning
- S2 Transformer for Image Captioning
Video Captioning
- GL-RG: Global-Local Representation Granularity for Video Captioning
Video Captioning
- Multi-modal Dependency Tree for Video Captioning [paper]
Image Captioning
- Distributed Attention for Grounded Image Captioning
- Dual Graph Convolutional Networks with Transformer and Curriculum Learning for Image Captioning
- Semi-Autoregressive Image Captioning
- Question-controlled Text-aware Image Captioning
- Triangle-Reward Reinforcement Learning: A Visual-Linguistic Semantic Alignment for Image Captioning
- Group-based Distinctive Image Captioning with Memory Attention
- Direction Relation Transformer for Image Captioning
- Scene Graph with 3D Information for Change Captioning
- Similar Scenes Arouse Similar Emotions: Parallel Data Augmentation for Stylized Image Captioning
Video Captioning
- State-aware Video Procedural Captioning
- Discriminative Latent Semantic Graph for Video Captioning
- Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention
- Multi-Perspective Video Captioning
- Hybrid Reasoning Network for Video-based Commonsense Captioning
Image Captioning
- Partial Off-Policy Learning: Balance Accuracy and Diversity for Human-Oriented Image Captioning [paper]
- Viewpoint-Agnostic Change Captioning With Cycle Consistency [paper]
- Understanding and Evaluating Racial Biases in Image Captioning [paper]
- Auto-Parsing Network for Image Captioning and Visual Question Answering [paper]
- In Defense of Scene Graphs for Image Captioning [paper]
- Describing and Localizing Multiple Changes With Transformers [paper]
- Explain Me the Painting: Multi-Topic Knowledgeable Art Description Generation [paper]
Video Captioning
- Motion Guided Region Message Passing for Video Captioning [paper]
- End-to-End Dense Video Captioning With Parallel Decoding [paper]
Image Captioning
- Control Image Captioning Spatially and Temporally [paper]
- SMURF: SeMantic and linguistic UndeRstanding Fusion for Caption Evaluation via Typicality Analysis [paper] [code]
- Enhancing Descriptive Image Captioning with Natural Language Inference [paper]
- UMIC: An Unreferenced Metric for Image Captioning via Contrastive Learning [paper]
- Semantic Relation-aware Difference Representation Learning for Change Captioning [paper]
Video Captioning
- Hierarchical Context-aware Network for Dense Video Event Captioning [paper]
- Video Paragraph Captioning as a Text Summarization Task [paper]
- O2NA: An Object-Oriented Non-Autoregressive Approach for Controllable Video Captioning [paper]
Image Captioning
- Connecting What to Say With Where to Look by Modeling Human Attention Traces. [paper] [code]
- Multiple Instance Captioning: Learning Representations from Histopathology Textbooks and Articles. [paper]
- Improving OCR-Based Image Captioning by Incorporating Geometrical Relationship. [paper]
- Image Change Captioning by Learning From an Auxiliary Task. [paper]
- Scan2Cap: Context-aware Dense Captioning in RGB-D Scans. [paper] [code]
- Towards Bridging Event Captioner and Sentence Localizer for Weakly Supervised Dense Event Captioning. [paper]
- TAP: Text-Aware Pre-Training for Text-VQA and Text-Caption. [paper]
- Towards Accurate Text-Based Image Captioning With Content Diversity Exploration. [paper]
- FAIEr: Fidelity and Adequacy Ensured Image Caption Evaluation. [paper]
- RSTNet: Captioning With Adaptive Attention on Visual and Non-Visual Words. [paper]
- Human-Like Controllable Image Captioning With Verb-Specific Semantic Roles. [paper]
Video Captioning
- Open-Book Video Captioning With Retrieve-Copy-Generate Network. [paper]
- Towards Diverse Paragraph Captioning for Untrimmed Videos. [paper]
Image Captioning
- Partially Non-Autoregressive Image Captioning. [code]
- Improving Image Captioning by Leveraging Intra- and Inter-layer Global Representation in Transformer Network. [paper]
- Object Relation Attention for Image Paragraph Captioning [paper]
- Dual-Level Collaborative Transformer for Image Captioning. [paper] [code]
- Memory-Augmented Image Captioning [paper]
- Image Captioning with Context-Aware Auxiliary Guidance. [paper]
- Consensus Graph Representation Learning for Better Grounded Image Captioning. [paper]
- FixMyPose: Pose Correctional Captioning and Retrieval. [paper] [code] [website]
- VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning [paper]
Video Captioning
- Non-Autoregressive Coarse-to-Fine Video Captioning. [paper]
- Semantic Grouping Network for Video Captioning. [paper] [code]
- Augmented Partial Mutual Learning with Frame Masking for Video Captioning. [paper]
Image Captioning
- Structural Semantic Adversarial Active Learning for Image Captioning. [paper]
oral
- Iterative Back Modification for Faster Image Captioning. [paper]
- Bridging the Gap between Vision and Language Domains for Improved Image Captioning. [paper]
- Hierarchical Scene Graph Encoder-Decoder for Image Paragraph Captioning. [paper]
- Improving Intra- and Inter-Modality Visual Relation for Image Captioning. [paper]
- ICECAP: Information Concentrated Entity-aware Image Captioning. [paper]
- Attacking Image Captioning Towards Accuracy-Preserving Target Words Removal. [paper]
- Multimodal Attention with Image Text Spatial Relationship for OCR-Based Image Captioning. [paper]
Video Captioning
- Controllable Video Captioning with an Exemplar Sentence. [paper]
oral
- Poet: Product-oriented Video Captioner for E-commerce. [paper]
oral
- Learning Semantic Concepts and Temporal Alignment for Narrated Video Procedural Captioning. [paper]
- Relational Graph Learning for Grounded Video Description Generation. [paper]
- Prophet Attention: Predicting Attention with Future Attention for Improved Image Captioning. [paper]
- RATT: Recurrent Attention to Transient Tasks for Continual Image Captioning. [paper]
- Diverse Image Captioning with Context-Object Split Latent Spaces. [paper]
Image Captioning
- Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets. [paper]
oral
- In-Home Daily-Life Captioning Using Radio Signals. [paper] [website]
oral
- TextCaps: a Dataset for Image Captioning with Reading Comprehension. [paper] [website] [code]
oral
- SODA: Story Oriented Dense Video Captioning Evaluation Framework. [paper]
- Towards Unique and Informative Captioning of Images. [paper]
- Learning Visual Representations with Caption Annotations. [paper] [website]
- Fashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards. [paper]
- Length Controllable Image Captioning. [paper] [code]
- Comprehensive Image Captioning via Scene Graph Decomposition. [paper] [website]
- Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning. [paper]
- Captioning Images Taken by People Who Are Blind. [paper]
- Learning to Generate Grounded Visual Captions without Localization Supervision. [paper] [code]
Video Captioning
- Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos. [paper] [code]
Spotlight
- Character Grounding and Re-Identification in Story of Videos and Text Descriptions. [paper] [code]
Spotlight
- Identity-Aware Multi-Sentence Video Description. [paper]
Image Captioning
- Context-Aware Group Captioning via Self-Attention and Contrastive Features [paper]
Zhuowan Li, Quan Tran, Long Mai, Zhe Lin, Alan L. Yuille
- More Grounded Image Captioning by Distilling Image-Text Matching Model [paper] [code]
Yuanen Zhou, Meng Wang, Daqing Liu, Zhenzhen Hu, Hanwang Zhang
- Show, Edit and Tell: A Framework for Editing Image Captions [paper] [code]
Fawaz Sammani, Luke Melas-Kyriazi
- Say As You Wish: Fine-Grained Control of Image Caption Generation With Abstract Scene Graphs [paper] [code]
Shizhe Chen, Qin Jin, Peng Wang, Qi Wu
- Normalized and Geometry-Aware Self-Attention Network for Image Captioning [paper]
Longteng Guo, Jing Liu, Xinxin Zhu, Peng Yao, Shichen Lu, Hanqing Lu
- Meshed-Memory Transformer for Image Captioning [paper] [code]
Marcella Cornia, Matteo Stefanini, Lorenzo Baraldi, Rita Cucchiara
- X-Linear Attention Networks for Image Captioning [paper] [code]
Yingwei Pan, Ting Yao, Yehao Li, Tao Mei
- Transform and Tell: Entity-Aware News Image Captioning [paper] [code] [website]
Alasdair Tran, Alexander Mathews, Lexing Xie
Video Captioning
- Object Relational Graph With Teacher-Recommended Learning for Video Captioning [paper]
Ziqi Zhang, Yaya Shi, Chunfeng Yuan, Bing Li, Peijin Wang, Weiming Hu, Zheng-Jun Zha
- Spatio-Temporal Graph for Video Captioning With Knowledge Distillation [paper] [code]
Boxiao Pan, Haoye Cai, De-An Huang, Kuan-Hui Lee, Adrien Gaidon, Ehsan Adeli, Juan Carlos Niebles
- Better Captioning With Sequence-Level Exploration [paper]
Jia Chen, Qin Jin
- Syntax-Aware Action Targeting for Video Captioning [code]
Qi Zheng, Chaoyue Wang, Dacheng Tao
Image Captioning
- Clue: Cross-modal Coherence Modeling for Caption Generation [paper]
Malihe Alikhani, Piyush Sharma, Shengjie Li, Radu Soricut and Matthew Stone
- Improving Image Captioning Evaluation by Considering Inter References Variance [paper]
Yanzhi Yi, Hangyu Deng and Jinglu Hu
- Improving Image Captioning with Better Use of Caption [paper] [code]
Zhan Shi, Xu Zhou, Xipeng Qiu and Xiaodan Zhu
Video Captioning
- MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning [paper] [code]
Jie Lei, Liwei Wang, Yelong Shen, Dong Yu, Tamara Berg and Mohit Bansal
Image Captioning
- Unified VLP: Unified Vision-Language Pre-Training for Image Captioning and VQA [paper]
Luowei Zhou (University of Michigan); Hamid Palangi (Microsoft Research); Lei Zhang (Microsoft); Houdong Hu (Microsoft AI and Research); Jason Corso (University of Michigan); Jianfeng Gao (Microsoft Research)
- OffPG: Reinforcing an Image Caption Generator using Off-line Human Feedback [paper]
Paul Hongsuck Seo (POSTECH); Piyush Sharma (Google Research); Tomer Levinboim (Google); Bohyung Han (Seoul National University); Radu Soricut (Google)
- MemCap: Memorizing Style Knowledge for Image Captioning [paper]
Wentian Zhao (Beijing Institute of Technology); Xinxiao Wu (Beijing Institute of Technology); Xiaoxun Zhang (Alibaba Group)
- C-R Reasoning: Joint Commonsense and Relation Reasoning for Image and Video Captioning [paper]
Jingyi Hou (Beijing Institute of Technology); Xinxiao Wu (Beijing Institute of Technology); Xiaoxun Zhang (Alibaba Group); Yayun Qi (Beijing Institute of Technology); Yunde Jia (Beijing Institute of Technology); Jiebo Luo (University of Rochester)
- MHTN: Learning Long- and Short-Term User Literal-Preference with Multimodal Hierarchical Transformer Network for Personalized Image Caption [paper]
Wei Zhang (East China Normal University); Yue Ying (East China Normal University); Pan Lu (The University of California, Los Angeles); Hongyuan Zha (Georgia Tech)
- Show, Recall, and Tell: Image Captioning with Recall Mechanism [paper]
Li Wang (MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, China); Zechen Bai (Institute of Software, Chinese Academy of Science, China); Yonghua Zhang (Bytedance); Hongtao Lu (Shanghai Jiao Tong University)
- Interactive Dual Generative Adversarial Networks for Image Captioning
Junhao Liu (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences); Kai Wang (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences); Chunpu Xu (Huazhong University of Science and Technology); Zhou Zhao (Zhejiang University); Ruifeng Xu (Harbin Institute of Technology (Shenzhen)); Ying Shen (Peking University Shenzhen Graduate School); Min Yang (Chinese Academy of Sciences)
- FDM-net: Feature Deformation Meta-Networks in Image Captioning of Novel Objects [paper]
Tingjia Cao (Fudan University); Ke Han (Fudan University); Xiaomei Wang (Fudan University); Lin Ma (Tencent AI Lab); Yanwei Fu (Fudan University); Yu-Gang Jiang (Fudan University); Xiangyang Xue (Fudan University)
Video Captioning
- An Efficient Framework for Dense Video Captioning
Maitreya Suin (Indian Institute of Technology Madras); Rajagopalan Ambasamudram (Indian Institute of Technology Madras)
Image Captioning
- Aligning Linguistic Words and Visual Semantic Units for Image Captioning
- Unpaired Cross-lingual Image Caption Generation with Self-Supervised Rewards
- MUCH: Mutual Coupling Enhancement of Scene Recognition and Dense Captioning
- Generating Captions for Images of Ancient Artworks
Video Captioning
- Hierarchical Global-Local Temporal Modeling for Video Captioning [paper]
- Attention-based Densely Connected LSTM for Video Captioning [paper]
- Critic-based Attention Network for Event-based Video Captioning [paper]
- Watch It Twice: Video Captioning with a Refocused Video Encoder [paper]
- Informative Image Captioning with External Sources of Information [paper]
Sanqiang Zhao, Piyush Sharma, Tomer Levinboim and Radu Soricut
- Dense Procedure Captioning in Narrated Instructional Videos [paper]
Botian Shi, Lei Ji, Yaobo Liang, Nan Duan, Peng Chen, Zhendong Niu and Ming Zhou
- Bridging by Word: Image Grounded Vocabulary Construction for Visual Captioning [paper]
Zhihao Fan, Zhongyu Wei, Siyuan Wang and Xuanjing Huang
- Generating Question Relevant Captions to Aid Visual Question Answering [paper]
Jialin Wu, Zeyuan Hu and Raymond Mooney
Image Captioning
- AAT: Adaptively Aligned Image Captioning via Adaptive Attention Time [paper] [code]
Lun Huang, Wenmin Wang, Yaxian Xia, Jie Chen
- ObjRel Transf: Image Captioning: Transforming Objects into Words [paper] [code]
Simao Herdade, Armin Kappeler, Kofi Boakye, Joao Soares
- VSSI-cap: Variational Structured Semantic Inference for Diverse Image Captioning [paper]
Fuhai Chen, Rongrong Ji, Jiayi Ji, Xiaoshuai Sun, Baochang Zhang, Xuri Ge, Yongjian Wu, Feiyue Huang
Video Captioning
- VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research [paper] [challenge]
Xin Wang, Jiawei Wu, Junkun Chen, Lei Li, Yuan-Fang Wang, William Yang Wang
ICCV 2019 Oral
- POS+CG: Controllable Video Captioning With POS Sequence Guidance Based on Gated Fusion Network [paper]
Bairui Wang, Lin Ma, Wei Zhang, Wenhao Jiang, Jingwen Wang, Wei Liu
- POS: Joint Syntax Representation Learning and Visual Cue Translation for Video Captioning [paper]
Jingyi Hou, Xinxiao Wu, Wentian Zhao, Jiebo Luo, Yunde Jia
Image Captioning
- DUDA: Robust Change Captioning [paper]
Dong Huk Park, Trevor Darrell, Anna Rohrbach
ICCV 2019 Oral
- AoANet: Attention on Attention for Image Captioning [paper]
Lun Huang, Wenmin Wang, Jie Chen, Xiao-Yong Wei
ICCV 2019 Oral
- MaBi-LSTMs: Exploring Overall Contextual Information for Image Captioning in Human-Like Cognitive Style [paper]
Hongwei Ge, Zehang Yan, Kai Zhang, Mingde Zhao, Liang Sun
- Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment [paper]
Samyak Datta, Karan Sikka, Anirban Roy, Karuna Ahuja, Devi Parikh, Ajay Divakaran
- GCN-LSTM+HIP: Hierarchy Parsing for Image Captioning [paper]
Ting Yao, Yingwei Pan, Yehao Li, Tao Mei
- IR+Tdiv: Generating Diverse and Descriptive Image Captions Using Visual Paraphrases [paper]
Lixin Liu, Jiajun Tang, Xiaojun Wan, Zongming Guo
- CNM+SGAE: Learning to Collocate Neural Modules for Image Captioning [paper]
Xu Yang, Hanwang Zhang, Jianfei Cai
- Seq-CVAE: Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning [paper]
Jyoti Aneja, Harsh Agrawal, Dhruv Batra, Alexander Schwing
- Towards Unsupervised Image Captioning With Shared Multimodal Embeddings [paper]
Iro Laina, Christian Rupprecht, Nassir Navab
- Human Attention in Image Captioning: Dataset and Analysis [paper]
Sen He, Hamed R. Tavakoli, Ali Borji, Nicolas Pugeault
- RDN: Reflective Decoding Network for Image Captioning [paper]
Lei Ke, Wenjie Pei, Ruiyu Li, Xiaoyong Shen, Yu-Wing Tai
- PSST: Joint Optimization for Cooperative Image Captioning [paper]
Gilad Vered, Gal Oren, Yuval Atzmon, Gal Chechik
- MUTAN: Watch, Listen and Tell: Multi-Modal Weakly Supervised Dense Event Captioning [paper]
Tanzila Rahman, Bicheng Xu, Leonid Sigal
- ETA: Entangled Transformer for Image Captioning [paper]
Guang Li, Linchao Zhu, Ping Liu, Yi Yang
- nocaps: novel object captioning at scale [paper]
Harsh Agrawal, Karan Desai, Yufei Wang, Xinlei Chen, Rishabh Jain, Mark Johnson, Dhruv Batra, Devi Parikh, Stefan Lee, Peter Anderson
- Cap2Det: Learning to Amplify Weak Caption Supervision for Object Detection [paper]
Keren Ye, Mingda Zhang, Adriana Kovashka, Wei Li, Danfeng Qin, Jesse Berent
- Graph-Align: Unpaired Image Captioning via Scene Graph Alignments [paper]
Jiuxiang Gu, Shafiq Joty, Jianfei Cai, Handong Zhao, Xu Yang, Gang Wang
- Learning to Caption Images Through a Lifetime by Asking Questions [paper]
Tingke Shen, Amlan Kar, Sanja Fidler
Image Captioning
- SGAE: Auto-Encoding Scene Graphs for Image Captioning [paper] [code]
Xu Yang (Nanyang Technological University); Kaihua Tang (Nanyang Technological University); Hanwang Zhang (Nanyang Technological University); Jianfei Cai (Nanyang Technological University)
CVPR 2019 Oral
- POS: Fast, Diverse and Accurate Image Captioning Guided by Part-Of-Speech [paper]
Aditya Deshpande (University of Illinois at UC); Jyoti Aneja (University of Illinois, Urbana-Champaign); Liwei Wang (Tencent AI Lab); Alexander Schwing (UIUC); David Forsyth (University of Illinois at Urbana-Champaign)
CVPR 2019 Oral
- Unsupervised Image Captioning [paper] [code]
Yang Feng (University of Rochester); Lin Ma (Tencent AI Lab); Wei Liu (Tencent); Jiebo Luo (U. Rochester)
- Adversarial Attack to Image Captioning via Structured Output Learning With Latent Variables [paper]
Yan Xu (UESTC); Baoyuan Wu (Tencent AI Lab); Fumin Shen (UESTC); Yanbo Fan (Tencent AI Lab); Yong Zhang (Tencent AI Lab); Heng Tao Shen (University of Electronic Science and Technology of China (UESTC)); Wei Liu (Tencent)
- Describing like Humans: On Diversity in Image Captioning [paper]
Qingzhong Wang (Department of Computer Science, City University of Hong Kong); Antoni Chan (City University of Hong Kong, Hong Kong)
- MSCap: Multi-Style Image Captioning With Unpaired Stylized Text [paper]
Longteng Guo (Institute of Automation, Chinese Academy of Sciences); Jing Liu (National Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences); Peng Yao (University of Science and Technology Beijing); Jiangwei Li (Huawei); Hanqing Lu (NLPR, Institute of Automation, CAS)
- CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detection [paper] [code]
Lu Zhang (Dalian University of Technology); Huchuan Lu (Dalian University of Technology); Zhe Lin (Adobe Research); Jianming Zhang (Adobe Research); You He (Naval Aviation University)
- Context and Attribute Grounded Dense Captioning [paper]
Guojun Yin (University of Science and Technology of China); Lu Sheng (The Chinese University of Hong Kong); Bin Liu (University of Science and Technology of China); Nenghai Yu (University of Science and Technology of China); Xiaogang Wang (Chinese University of Hong Kong, Hong Kong); Jing Shao (Sensetime)
- Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning [paper]
Dong-Jin Kim (KAIST); Jinsoo Choi (KAIST); Tae-Hyun Oh (MIT CSAIL); In So Kweon (KAIST)
- Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions [paper]
Marcella Cornia (University of Modena and Reggio Emilia); Lorenzo Baraldi (University of Modena and Reggio Emilia); Rita Cucchiara (Universita Di Modena E Reggio Emilia)
- Self-Critical N-step Training for Image Captioning [paper]
Junlong Gao (Peking University Shenzhen Graduate School); Shiqi Wang (CityU); Shanshe Wang (Peking University); Siwei Ma (Peking University, China); Wen Gao (PKU)
- Look Back and Predict Forward in Image Captioning [paper]
Yu Qin (Shanghai Jiao Tong University); Jiajun Du (Shanghai Jiao Tong University); Hongtao Lu (Shanghai Jiao Tong University); Yonghua Zhang (Bytedance)
- Intention Oriented Image Captions with Guiding Objects [paper]
Yue Zheng (Tsinghua University); Ya-Li Li (THU); Shengjin Wang (Tsinghua University)
- Adversarial Semantic Alignment for Improved Image Captions [paper]
Pierre Dognin (IBM); Igor Melnyk (IBM); Youssef Mroueh (IBM Research); Jarret Ross (IBM); Tom Sercu (IBM Research AI)
- Good News, Everyone! Context driven entity-aware captioning for news images [paper] [code]
Ali Furkan Biten (Computer Vision Center); Lluis Gomez (Universitat Autónoma de Barcelona); Marçal Rusiñol (Computer Vision Center, UAB); Dimosthenis Karatzas (Computer Vision Centre)
- Pointing Novel Objects in Image Captioning [paper]
Yehao Li (Sun Yat-Sen University); Ting Yao (JD AI Research); Yingwei Pan (JD AI Research); Hongyang Chao (Sun Yat-sen University); Tao Mei (AI Research of JD.com)
- Engaging Image Captioning via Personality [paper]
Kurt Shuster (Facebook); Samuel Humeau (Facebook); Hexiang Hu (USC); Antoine Bordes (Facebook); Jason Weston (FAIR)
Video Captioning
- SDVC: Streamlined Dense Video Captioning [paper]
Jonghwan Mun (POSTECH); Linjie Yang (ByteDance AI Lab); Zhou Ren (Snap Inc.); Ning Xu (Snap); Bohyung Han (Seoul National University)
CVPR 2019 Oral
- GVD: Grounded Video Description [paper]
Luowei Zhou (University of Michigan); Yannis Kalantidis (Facebook Research); Xinlei Chen (Facebook AI Research); Jason J Corso (University of Michigan); Marcus Rohrbach (Facebook AI Research)
CVPR 2019 Oral
- HybridDis: Adversarial Inference for Multi-Sentence Video Description [paper]
Jae Sung Park (UC Berkeley); Marcus Rohrbach (Facebook AI Research); Trevor Darrell (UC Berkeley); Anna Rohrbach (UC Berkeley)
CVPR 2019 Oral
- OA-BTG: Object-aware Aggregation with Bidirectional Temporal Graph for Video Captioning [paper]
Junchao Zhang (Peking University); Yuxin Peng (Peking University)
- MARN: Memory-Attended Recurrent Network for Video Captioning [paper]
Wenjie Pei (Tencent); Jiyuan Zhang (Tencent YouTu); Xiangrong Wang (Delft University of Technology); Lei Ke (Tencent); Xiaoyong Shen (Tencent); Yu-Wing Tai (Tencent)
- GRU-EVE: Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning [paper]
Nayyer Aafaq (The University of Western Australia); Naveed Akhtar (The University of Western Australia); Wei Liu (University of Western Australia); Syed Zulqarnain Gilani (The University of Western Australia); Ajmal Mian (University of Western Australia)
Image Captioning
- Improving Image Captioning with Conditional Generative Adversarial Nets [paper]
Chen Chen (Tencent); Shuai Mu (Tencent); Wanpeng Xiao (Tencent); Zexiong Ye (Tencent); Liesi Wu (Tencent); Qi Ju (Tencent)
AAAI 2019 Oral
- PAGNet: Connecting Language to Images: A Progressive Attention-Guided Network for Simultaneous Image Captioning and Language Grounding [paper]
Lingyun Song (Xi'an Jiaotong University); Jun Liu (Xi'an Jiaotong University); Buyue Qian (Xi'an Jiaotong University); Yihe Chen (University of Toronto)
AAAI 2019 Oral
- Meta Learning for Image Captioning [paper]
Nannan Li (Wuhan University); Zhenzhong Chen (WHU); Shan Liu (Tencent America)
- DA: Deliberate Residual based Attention Network for Image Captioning [paper]
Lianli Gao (The University of Electronic Science and Technology of China); Kaixuan Fan (University of Electronic Science and Technology of China); Jingkuan Song (UESTC); Xianglong Liu (Beihang University); Xing Xu (University of Electronic Science and Technology of China); Heng Tao Shen (University of Electronic Science and Technology of China (UESTC))
- HAN: Hierarchical Attention Network for Image Captioning [paper]
Weixuan Wang (School of Electronic and Information Engineering, Sun Yat-sen University); Zhihong Chen (School of Electronic and Information Engineering, Sun Yat-sen University); Haifeng Hu (School of Electronic and Information Engineering, Sun Yat-sen University)
- COCG: Learning Object Context for Dense Captioning [paper]
Xiangyang Li (Institute of Computing Technology, Chinese Academy of Sciences); Shuqiang Jiang (ICT, China Academy of Science); Jungong Han (Lancaster University)
Video Captioning
- TAMoE: Learning to Compose Topic-Aware Mixture of Experts for Zero-Shot Video Captioning [code] [paper]
Xin Wang (University of California, Santa Barbara); Jiawei Wu (University of California, Santa Barbara); Da Zhang (UC Santa Barbara); Yu Su (OSU); William Wang (UC Santa Barbara)
AAAI 2019 Oral
- TDConvED: Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning [paper]
Jingwen Chen (Sun Yat-sen University); Yingwei Pan (JD AI Research); Yehao Li (Sun Yat-Sen University); Ting Yao (JD AI Research); Hongyang Chao (Sun Yat-sen University); Tao Mei (AI Research of JD.com)
AAAI 2019 Oral
- FCVC-CF&IA: Fully Convolutional Video Captioning with Coarse-to-Fine and Inherited Attention [paper]
Kuncheng Fang (Fudan University); Lian Zhou (Fudan University); Cheng Jin (Fudan University); Yuejie Zhang (Fudan University); Kangnian Weng (Shanghai University of Finance and Economics); Tao Zhang (Shanghai University of Finance and Economics); Weiguo Fan (University of Iowa)
- MGSA: Motion Guided Spatial Attention for Video Captioning [paper]
Shaoxiang Chen (Fudan University); Yu-Gang Jiang (Fudan University)