A Survey on Hallucination in Large Vision-Language Models

updated 2024/04/29

This is the first released survey paper in the field of hallucinations in large vision-language models (LVLMs). Our paper focuses on the benchmarks, evaluation methods, causes, and mitigation methods of hallucinations in LVLMs.

Paper link: [A Survey on Hallucination in Large Vision-Language Models](https://arxiv.org/abs/2402.00253)

Examples of hallucinations in LVLMs

The recent development of Large Vision-Language Models (LVLMs) has attracted growing attention within the AI landscape for its practical implementation potential. However, "hallucination", or more specifically, the misalignment between factual visual content and the corresponding textual generation, poses a significant challenge to utilizing LVLMs. In this comprehensive survey, we dissect LVLM-related hallucinations in an attempt to establish an overview and facilitate future mitigation. Our scrutiny starts with a clarification of the concept of hallucinations in LVLMs, presenting a variety of hallucination symptoms and highlighting the unique challenges inherent in LVLM hallucinations. Subsequently, we outline the benchmarks and methodologies tailored specifically for evaluating hallucinations unique to LVLMs. Additionally, we delve into an investigation of the root causes of these hallucinations, encompassing insights from the training data and model components. We also critically review existing methods for mitigating hallucinations. The open questions and future directions pertaining to hallucinations within LVLMs are discussed to conclude this survey.



📑 If you find our projects helpful to your research, please consider citing:

@article{liu2024survey,
  title={A survey on hallucination in large vision-language models},
  author={Liu, Hanchao and Xue, Wenyuan and Chen, Yifei and Chen, Dapeng and Zhao, Xiutian and Wang, Ke and Hou, Liping and Li, Rongjun and Peng, Wei},
  journal={arXiv preprint arXiv:2402.00253},
  year={2024}
}

Evaluation Methods

| Title | Name | Venue | Date | Code |
|-------|------|-------|------|------|
| VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models | VALOR-Eval | arXiv | 2024-04-22 | Github |
| ALOHa: A New Measure for Hallucination in Captioning Models | ALOHa | arXiv | 2024-04-03 | |
| Cartoon Hallucinations Detection: Pose-aware In Context Visual Learning | PA-ICVL (for hallucinations in generated images) | arXiv | 2024-03-22 | |
| Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models | AFHA | arXiv | 2024-02-24 | |
| Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrites | FGHE | arXiv | 2023-12-04 | Github |
| FAITHSCORE: Evaluating Hallucinations in Large Vision-Language Models | FAITHSCORE | arXiv | 2023-11-02 | Github |
| Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models | NOPE | arXiv | 2023-10-09 | |
| HallE-Control: Controlling Object Hallucination in Large Multimodal Models | CCEval | arXiv | 2023-10-03 | Github |
| Aligning Large Multimodal Models with Factually Augmented RLHF | MMHal-Bench | arXiv | 2023-09-25 | Github |
| CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning | CIEM | NeurIPS 2023 | 2023-09-05 | |
| Evaluation and analysis of hallucination in large vision-language models | HaELM | arXiv | 2023-08-29 | Github |
| Detecting and preventing hallucinations in large vision language models | M-HalDetect | AAAI 2024 | 2023-08-11 | Github |
| Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning | GAVIE | ICLR 2024 | 2023-06-26 | Github |
| Evaluating object hallucination in large vision-language models | POPE | EMNLP 2023 | 2023-05-17 | Github |
| Object hallucination in image captioning | CHAIR | EMNLP 2019 | 2019-03-19 | |
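
To make the object-level metrics concrete, here is a minimal Python sketch of CHAIR-style scoring (the last entry in the table above). It assumes object mentions have already been extracted from each generated caption and matched against the ground-truth objects of the corresponding image; the function and variable names are illustrative and not taken from the official implementation.

```python
def chair_scores(mentioned_per_caption, present_per_image):
    """CHAIR-style scoring (illustrative sketch).

    mentioned_per_caption: list of sets of object words mentioned in each generated caption
    present_per_image: list of sets of objects actually present in the corresponding image
    """
    hallucinated_mentions = 0
    total_mentions = 0
    captions_with_hallucination = 0
    for mentioned, present in zip(mentioned_per_caption, present_per_image):
        fake = mentioned - present  # objects mentioned but not in the image
        hallucinated_mentions += len(fake)
        total_mentions += len(mentioned)
        captions_with_hallucination += int(bool(fake))
    chair_i = hallucinated_mentions / max(total_mentions, 1)                      # instance level
    chair_s = captions_with_hallucination / max(len(mentioned_per_caption), 1)    # sentence level
    return chair_i, chair_s


# Example: one caption mentioning {"dog", "frisbee"} for an image that only
# contains a dog -> CHAIR_i = 0.5, CHAIR_s = 1.0
print(chair_scores([{"dog", "frisbee"}], [{"dog"}]))
```

CHAIR_i measures the fraction of mentioned object instances that are hallucinated, while CHAIR_s measures the fraction of captions containing at least one hallucinated object.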

Benchmarks

| Title | Name | Venue | Date | Code |
|-------|------|-------|------|------|
| VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models | VALOR-BENCH | arXiv | 2024-04-22 | Github |
| PhD: A Prompted Visual Hallucination Evaluation Dataset | PhD | arXiv | 2024-03-17 | Github |
| Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models | AFHA | arXiv | 2024-02-24 | |
| Visual Hallucinations of Multi-modal Large Language Models | VHTest | arXiv | 2024-02-22 | Github |
| Visually Dehallucinative Instruction Generation: Know What You Don't Know | VQAv2-IDK | arXiv | 2024-02-15 | |
| The Instinctive Bias: Spurious Images lead to Hallucination in MLLMs | CorrelationQA | arXiv | 2024-02-06 | Github |
| Hallucination Benchmark in Medical Visual Question Answering | | ICLR 2024 | 2024-01-11 | Github |
| An llm-free multi-dimensional benchmark for mllms hallucination evaluation | AMBER | arXiv | 2023-11-13 | Github |
| FAITHSCORE: Evaluating Hallucinations in Large Vision-Language Models | FAITHSCORE | arXiv | 2023-11-02 | Github |
| Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models | NOPE | arXiv | 2023-10-09 | |
| Aligning Large Multimodal Models with Factually Augmented RLHF | MMHal-Bench | arXiv | 2023-09-25 | Github |
| CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning | CIEM | NeurIPS 2023 | 2023-09-05 | |
| Evaluation and analysis of hallucination in large vision-language models | HaELM | arXiv | 2023-08-29 | Github |
| Detecting and preventing hallucinations in large vision language models | M-HalDetect | AAAI 2024 | 2023-08-11 | Github |
| Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning | GAVIE | ICLR 2024 | 2023-06-26 | Github |
| Evaluating object hallucination in large vision-language models | POPE | EMNLP 2023 | 2023-05-17 | Github |
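
Several of the benchmarks above (e.g., POPE and CIEM) probe object hallucination with yes/no questions such as "Is there a \<object\> in the image?". The Python sketch below illustrates that polling-style setup under simplifying assumptions: only random negative sampling is shown, and the model's answers are assumed to be given as a list of booleans; it is not the official POPE code.

```python
import random


def build_pope_queries(present_objects, vocabulary, num_negatives=3, seed=0):
    """Pair each image's present objects with randomly sampled absent objects."""
    rng = random.Random(seed)
    queries = [(obj, True) for obj in present_objects]          # positive queries
    absent_pool = [o for o in vocabulary if o not in present_objects]
    for obj in rng.sample(absent_pool, min(num_negatives, len(absent_pool))):
        queries.append((obj, False))                             # negative queries
    return queries


def pope_metrics(predictions, labels):
    """predictions: booleans (model answered 'yes'); labels: ground-truth presence."""
    tp = sum(p and l for p, l in zip(predictions, labels))
    fp = sum(p and not l for p, l in zip(predictions, labels))
    fn = sum((not p) and l for p, l in zip(predictions, labels))
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-9)
    accuracy = sum(p == l for p, l in zip(predictions, labels)) / max(len(labels), 1)
    yes_ratio = sum(predictions) / max(len(predictions), 1)      # tendency to answer "yes"
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "yes_ratio": yes_ratio}
```

Because a model that hallucinates tends to answer "yes" regardless of the image, precision, F1, and the "yes" ratio are the most informative numbers in this setup.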

Mitigation Methods

| Title | Name | Venue | Date | Code |
|-------|------|-------|------|------|
| Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback | HSA-DPO | arXiv | 2024-04-22 | |
| Fact: Teaching MLLMs with Faithful, Concise and Transferable Rationales | Fact | arXiv | 2024-04-17 | |
| Prescribing the Right Remedy: Mitigating Hallucinations in Large Vision-Language Models via Targeted Instruction Tuning | DFTG | arXiv | 2024-04-16 | |
| Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs | | arXiv | 2024-04-11 | |
| BRAVE: Broadening the visual encoding of vision-language models | BRAVE | arXiv | 2024-04-10 | |
| FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback | FGAIF | arXiv | 2024-04-07 | |
| H2RSVLM: Towards Helpful and Honest Remote Sensing Large Vision Language Model | RSSA | arXiv | 2024-03-29 | Github |
| Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding | ICD | arXiv | 2024-03-27 | |
| Visual Hallucination: Definition, Quantification, and Prescriptive Remediations | VHILT | arXiv | 2024-03-26 | |
| Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models | ESREAL | arXiv | 2024-03-24 | |
| Pensieve: Retrospect-then-Compare Mitigates Visual Hallucination | Pensieve | arXiv | 2024-03-21 | Github |
| Multi-Modal Hallucination Control by Visual Information Grounding | M3ID | arXiv | 2024-03-20 | |
| What if...?: Counterfactual Inception to Mitigate Hallucination Effects in Large Multimodal Models | Counterfactual Inception | arXiv | 2024-03-20 | Github |
| Debiasing Multimodal Large Language Models | LLaVA-Align | arXiv | 2024-03-08 | Github |
| Quantity Matters: Towards Assessing and Mitigating Number Hallucination in Large Vision-Language Models | | arXiv | 2024-03-03 | |
| HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding | HALC | arXiv | 2024-03-01 | Github |
| The All-Seeing Project V2: Towards General Relation Comprehension of the Open World | ASMv2 | arXiv | 2024-02-29 | Github |
| IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding | IBD | arXiv | 2024-02-28 | |
| GROUNDHOG: Grounding Large Language Models to Holistic Segmentation | GROUNDHOG | arXiv | 2024-02-26 | Coming Soon |
| Seeing is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding | CGD | arXiv | 2024-02-23 | Github |
| DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models | DualFocus | arXiv | 2024-02-22 | Github |
| Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective | | arXiv | 2024-02-22 | Github |
| Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models | LogicCheckGPT | arXiv | 2024-02-18 | Github |
| Aligning Modalities in Vision Large Language Models via Preference Fine-tuning | POVID | arXiv | 2024-02-18 | Github |
| EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models | EFUF | arXiv | 2024-02-15 | |
| Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free Guidance | MARINE | arXiv | 2024-02-13 | |
| Visually Dehallucinative Instruction Generation | CAP2QA | ICASSP 2024 | 2024-02-13 | |
| ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling | | arXiv | 2024-02-09 | |
| Skip \n: A Simple Method to Reduce Hallucination in Large Vision-Language Models | Skip \n | arXiv | 2024-02-02 | Github |
| Temporal Insight Enhancement: Mitigating Temporal Hallucination in Multimodal Large Language Models | | arXiv | 2024-01-18 | |
| Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs | MoF | arXiv | 2024-01-11 | Github |
| InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | InternVL | arXiv | 2023-12-21 | Github |
| VCoder: Versatile Vision Encoders for Multimodal Large Language Models | VCoder | arXiv | 2023-12-21 | Github |
| Silkie: Preference Distillation for Large Visual Language Models | Silkie | arXiv | 2023-12-17 | Github |
| Hallucination augmented contrastive learning for multimodal large language model | HACL | arXiv | 2023-12-12 | Github |
| Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrites | ReCaption | arXiv | 2023-12-04 | Github |
| RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback | RLHF-V | arXiv | 2023-12-01 | Github |
| OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation | OPERA | arXiv | 2023-11-29 | Github |
| Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding | VCD | arXiv | 2023-11-28 | Github |
| Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization | HA-DPO | arXiv | 2023-11-28 | Github |
| Mitigating Hallucination in Visual Language Models with Visual Supervision | | arXiv | 2023-11-27 | |
| HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data | HalluciDoctor | arXiv | 2023-11-22 | Github |
| Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models | Monkey | arXiv | 2023-11-11 | Github |
| Enhancing the Spatial Awareness Capability of Multi-Modal Large Language Model | Spatial Awareness Enhancing | arXiv | 2023-10-31 | |
| Woodpecker: Hallucination Correction for Multimodal Large Language Models | Woodpecker | arXiv | 2023-10-24 | Github |
| Ferret: Refer and Ground Anything Anywhere at Any Granularity | Ferret | ICLR 2024 | 2023-10-11 | Github |
| Improved Baselines with Visual Instruction Tuning | LLaVA-1.5 | NeurIPS 2023 | 2023-10-05 | Github |
| HallE-Switch: Rethinking and Controlling Object Existence Hallucinations in Large Vision Language Models for Detailed Caption | HallE-Switch | arXiv | 2023-10-03 | Github |
| Analyzing and Mitigating Object Hallucination in Large Vision-Language Models | LURE | arXiv | 2023-10-01 | Github |
| Aligning Large Multimodal Models with Factually Augmented RLHF | MMHal-Bench | arXiv | 2023-09-25 | Github |
| Evaluation and Mitigation of Agnosia in Multimodal Large Language Models | EMMA | ICLR 2024 | 2023-09-07 | |
| CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning | CIEM | NeurIPS 2023 | 2023-09-05 | |
| Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | Qwen-VL | arXiv | 2023-08-24 | Github |
| Detecting and preventing hallucinations in large vision language models | FDPO | AAAI 2024 | 2023-08-11 | Github |
| Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning | LRV-Instruction | ICLR 2024 | 2023-06-26 | Github |
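
Many of the training-free entries above (e.g., VCD, ICD, IBD, M3ID, HALC) mitigate hallucination by adjusting the decoding distribution rather than the model weights. As a rough illustration, here is a minimal PyTorch sketch of visual-contrastive decoding in the spirit of VCD; `logits_fn` is a hypothetical callable returning next-token logits for a given image and prompt, and the noise construction and plausibility constraint are simplified relative to the paper.

```python
import torch


def contrastive_next_token(logits_fn, image, noisy_image, prompt_ids,
                           alpha=1.0, beta=0.1):
    """Pick the next token by contrasting clean-image and distorted-image logits."""
    logits_clean = logits_fn(image, prompt_ids)        # conditioned on the real image
    logits_noisy = logits_fn(noisy_image, prompt_ids)  # conditioned on a distorted image
    # Contrast the two distributions to down-weight tokens driven by language priors.
    contrast = (1 + alpha) * logits_clean - alpha * logits_noisy
    # Adaptive plausibility constraint: keep only tokens that remain reasonably
    # probable under the clean-image distribution.
    probs_clean = torch.softmax(logits_clean, dim=-1)
    cutoff = beta * probs_clean.max()
    contrast[probs_clean < cutoff] = float("-inf")
    return torch.argmax(contrast).item()
```

The intuition is that tokens whose probability stays high even when the visual evidence is corrupted are likely driven by language priors or statistical biases, so contrasting the two distributions suppresses them during generation.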
