A Survey on Hallucination in Large Vision-Language Models

updated 2024/04/29

This is the first released survey paper in the field of hallucinations in large vision-language models (LVLMs). Our paper focuses on the benchmarks, evaluation methods, causes, and mitigation methods of hallucinations in LVLMs.

Paper link: [A Survey on Hallucination in Large Vision-Language Models](https://arxiv.org/abs/2402.00253)

Examples of hallucinations in LVLMs

The recent development of Large Vision-Language Models (LVLMs) has attracted growing attention within the AI landscape for its practical implementation potential. However, "hallucination", or more specifically, the misalignment between factual visual content and the corresponding textual generation, poses a significant challenge to utilizing LVLMs. In this comprehensive survey, we dissect LVLM-related hallucinations in an attempt to establish an overview and facilitate future mitigation. Our scrutiny starts with a clarification of the concept of hallucinations in LVLMs, presenting a variety of hallucination symptoms and highlighting the unique challenges inherent in LVLM hallucinations. Subsequently, we outline the benchmarks and methodologies tailored specifically for evaluating hallucinations unique to LVLMs. Additionally, we delve into an investigation of the root causes of these hallucinations, encompassing insights from the training data and model components. We also critically review existing methods for mitigating hallucinations. The open questions and future directions pertaining to hallucinations within LVLMs are discussed to conclude this survey.



📑 If you find our projects helpful to your research, please consider citing:

@article{liu2024survey,
  title={A survey on hallucination in large vision-language models},
  author={Liu, Hanchao and Xue, Wenyuan and Chen, Yifei and Chen, Dapeng and Zhao, Xiutian and Wang, Ke and Hou, Liping and Li, Rongjun and Peng, Wei},
  journal={arXiv preprint arXiv:2402.00253},
  year={2024}
}

Evaluation Methods

| Title | Name | Venue | Date | Code |
|-------|------|-------|------|------|
| VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models | VALOR-Eval | arXiv | 2024-04-22 | Github |
| ALOHa: A New Measure for Hallucination in Captioning Models | ALOHa | arXiv | 2024-04-03 | |
| Cartoon Hallucinations Detection: Pose-aware In Context Visual Learning | PA-ICVL (for hallucinations in generated images) | arXiv | 2024-03-22 | |
| Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models | AFHA | arXiv | 2024-02-24 | |
| Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrites | FGHE | arXiv | 2023-12-04 | Github |
| FAITHSCORE: Evaluating Hallucinations in Large Vision-Language Models | FAITHSCORE | arXiv | 2023-11-02 | Github |
| Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models | NOPE | arXiv | 2023-10-09 | |
| HallE-Control: Controlling Object Hallucination in Large Multimodal Models | CCEval | arXiv | 2023-10-03 | Github |
| Aligning Large Multimodal Models with Factually Augmented RLHF | MMHal-Bench | arXiv | 2023-09-25 | Github |
| CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning | CIEM | NeurIPS 2023 | 2023-09-05 | |
| Evaluation and analysis of hallucination in large vision-language models | HaELM | arXiv | 2023-08-29 | Github |
| Detecting and preventing hallucinations in large vision language models | M-HalDetect | AAAI 2024 | 2023-08-11 | Github |
| Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning | GAVIE | ICLR 2024 | 2023-06-26 | Github |
| Evaluating object hallucination in large vision-language models | POPE | EMNLP 2023 | 2023-05-17 | Github |
| Object hallucination in image captioning | CHAIR | EMNLP 2019 | 2019-03-19 | |
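
To make the object-level metrics concrete, here is a minimal Python sketch of CHAIR-style scoring (the last entry in the table above). It assumes object mentions have already been extracted from each generated caption and matched against the ground-truth objects of the corresponding image; the function and variable names are illustrative and not taken from the official implementation.

```python
def chair_scores(mentioned_per_caption, present_per_image):
    """CHAIR-style scoring (illustrative sketch).

    mentioned_per_caption: list of sets of object words mentioned in each generated caption
    present_per_image: list of sets of objects actually present in the corresponding image
    """
    hallucinated_mentions = 0
    total_mentions = 0
    captions_with_hallucination = 0
    for mentioned, present in zip(mentioned_per_caption, present_per_image):
        fake = mentioned - present  # objects mentioned but not in the image
        hallucinated_mentions += len(fake)
        total_mentions += len(mentioned)
        captions_with_hallucination += int(bool(fake))
    chair_i = hallucinated_mentions / max(total_mentions, 1)                      # instance level
    chair_s = captions_with_hallucination / max(len(mentioned_per_caption), 1)    # sentence level
    return chair_i, chair_s


# Example: one caption mentioning {"dog", "frisbee"} for an image that only
# contains a dog -> CHAIR_i = 0.5, CHAIR_s = 1.0
print(chair_scores([{"dog", "frisbee"}], [{"dog"}]))
```

CHAIR_i measures the fraction of mentioned object instances that are hallucinated, while CHAIR_s measures the fraction of captions containing at least one hallucinated object.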

Benchmarks

| Title | Name | Venue | Date | Code |
|-------|------|-------|------|------|
| VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models | VALOR-BENCH | arXiv | 2024-04-22 | Github |
| PhD: A Prompted Visual Hallucination Evaluation Dataset | PhD | arXiv | 2024-03-17 | Github |
| Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models | AFHA | arXiv | 2024-02-24 | |
| Visual Hallucinations of Multi-modal Large Language Models | VHTest | arXiv | 2024-02-22 | Github |
| Visually Dehallucinative Instruction Generation: Know What You Don't Know | VQAv2-IDK | arXiv | 2024-02-15 | |
| The Instinctive Bias: Spurious Images lead to Hallucination in MLLMs | CorrelationQA | arXiv | 2024-02-06 | Github |
| Hallucination Benchmark in Medical Visual Question Answering | | ICLR 2024 | 2024-01-11 | Github |
| An llm-free multi-dimensional benchmark for mllms hallucination evaluation | AMBER | arXiv | 2023-11-13 | Github |
| FAITHSCORE: Evaluating Hallucinations in Large Vision-Language Models | FAITHSCORE | arXiv | 2023-11-02 | Github |
| Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models | NOPE | arXiv | 2023-10-09 | |
| Aligning Large Multimodal Models with Factually Augmented RLHF | MMHal-Bench | arXiv | 2023-09-25 | Github |
| CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning | CIEM | NeurIPS 2023 | 2023-09-05 | |
| Evaluation and analysis of hallucination in large vision-language models | HaELM | arXiv | 2023-08-29 | Github |
| Detecting and preventing hallucinations in large vision language models | M-HalDetect | AAAI 2024 | 2023-08-11 | Github |
| Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning | GAVIE | ICLR 2024 | 2023-06-26 | Github |
| Evaluating object hallucination in large vision-language models | POPE | EMNLP 2023 | 2023-05-17 | Github |
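
Several of the benchmarks above (e.g., POPE and CIEM) probe object hallucination with yes/no questions such as "Is there a \<object\> in the image?". The Python sketch below illustrates that polling-style setup under simplifying assumptions: only random negative sampling is shown, and the model's answers are assumed to be given as a list of booleans; it is not the official POPE code.

```python
import random


def build_pope_queries(present_objects, vocabulary, num_negatives=3, seed=0):
    """Pair each image's present objects with randomly sampled absent objects."""
    rng = random.Random(seed)
    queries = [(obj, True) for obj in present_objects]          # positive queries
    absent_pool = [o for o in vocabulary if o not in present_objects]
    for obj in rng.sample(absent_pool, min(num_negatives, len(absent_pool))):
        queries.append((obj, False))                             # negative queries
    return queries


def pope_metrics(predictions, labels):
    """predictions: booleans (model answered 'yes'); labels: ground-truth presence."""
    tp = sum(p and l for p, l in zip(predictions, labels))
    fp = sum(p and not l for p, l in zip(predictions, labels))
    fn = sum((not p) and l for p, l in zip(predictions, labels))
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-9)
    accuracy = sum(p == l for p, l in zip(predictions, labels)) / max(len(labels), 1)
    yes_ratio = sum(predictions) / max(len(predictions), 1)      # tendency to answer "yes"
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "yes_ratio": yes_ratio}
```

Because a model that hallucinates tends to answer "yes" regardless of the image, precision, F1, and the "yes" ratio are the most informative numbers in this setup.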

Mitigation Methods

| Title | Name | Venue | Date | Code |
|-------|------|-------|------|------|
| Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback | HSA-DPO | arXiv | 2024-04-22 | |
| Fact: Teaching MLLMs with Faithful, Concise and Transferable Rationales | Fact | arXiv | 2024-04-17 | |
| Prescribing the Right Remedy: Mitigating Hallucinations in Large Vision-Language Models via Targeted Instruction Tuning | DFTG | arXiv | 2024-04-16 | |
| Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs | | arXiv | 2024-04-11 | |
| BRAVE: Broadening the visual encoding of vision-language models | BRAVE | arXiv | 2024-04-10 | |
| FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback | FGAIF | arXiv | 2024-04-07 | |
| H2RSVLM: Towards Helpful and Honest Remote Sensing Large Vision Language Model | RSSA | arXiv | 2024-03-29 | Github |
| Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding | ICD | arXiv | 2024-03-27 | |
| Visual Hallucination: Definition, Quantification, and Prescriptive Remediations | VHILT | arXiv | 2024-03-26 | |
| Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models | ESREAL | arXiv | 2024-03-24 | |
| Pensieve: Retrospect-then-Compare Mitigates Visual Hallucination | Pensieve | arXiv | 2024-03-21 | Github |
| Multi-Modal Hallucination Control by Visual Information Grounding | M3ID | arXiv | 2024-03-20 | |
| What if...?: Counterfactual Inception to Mitigate Hallucination Effects in Large Multimodal Models | Counterfactual Inception | arXiv | 2024-03-20 | Github |
| Debiasing Multimodal Large Language Models | LLaVA-Align | arXiv | 2024-03-08 | Github |
| Quantity Matters: Towards Assessing and Mitigating Number Hallucination in Large Vision-Language Models | | arXiv | 2024-03-03 | |
| HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding | HALC | arXiv | 2024-03-01 | Github |
| The All-Seeing Project V2: Towards General Relation Comprehension of the Open World | ASMv2 | arXiv | 2024-02-29 | Github |
| IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding | IBD | arXiv | 2024-02-28 | |
| GROUNDHOG: Grounding Large Language Models to Holistic Segmentation | GROUNDHOG | arXiv | 2024-02-26 | Coming Soon |
| Seeing is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding | CGD | arXiv | 2024-02-23 | Github |
| DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models | DualFocus | arXiv | 2024-02-22 | Github |
| Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective | | arXiv | 2024-02-22 | Github |
| Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models | LogicCheckGPT | arXiv | 2024-02-18 | Github |
| Aligning Modalities in Vision Large Language Models via Preference Fine-tuning | POVID | arXiv | 2024-02-18 | Github |
| EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models | EFUF | arXiv | 2024-02-15 | |
| Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free Guidance | MARINE | arXiv | 2024-02-13 | |
| Visually Dehallucinative Instruction Generation | CAP2QA | ICASSP 2024 | 2024-02-13 | |
| ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling | | arXiv | 2024-02-09 | |
| Skip \n: A Simple Method to Reduce Hallucination in Large Vision-Language Models | Skip \n | arXiv | 2024-02-02 | Github |
| Temporal Insight Enhancement: Mitigating Temporal Hallucination in Multimodal Large Language Models | | arXiv | 2024-01-18 | |
| Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs | MoF | arXiv | 2024-01-11 | Github |
| InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | InternVL | arXiv | 2023-12-21 | Github |
| VCoder: Versatile Vision Encoders for Multimodal Large Language Models | VCoder | arXiv | 2023-12-21 | Github |
| Silkie: Preference Distillation for Large Visual Language Models | Silkie | arXiv | 2023-12-17 | Github |
| Hallucination augmented contrastive learning for multimodal large language model | HACL | arXiv | 2023-12-12 | Github |
| Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrites | ReCaption | arXiv | 2023-12-04 | Github |
| RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback | RLHF-V | arXiv | 2023-12-01 | Github |
| OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation | OPERA | arXiv | 2023-11-29 | Github |
| Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding | VCD | arXiv | 2023-11-28 | Github |
| Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization | HA-DPO | arXiv | 2023-11-28 | Github |
| Mitigating Hallucination in Visual Language Models with Visual Supervision | | arXiv | 2023-11-27 | |
| HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data | HalluciDoctor | arXiv | 2023-11-22 | Github |
| Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models | Monkey | arXiv | 2023-11-11 | Github |
| Enhancing the Spatial Awareness Capability of Multi-Modal Large Language Model | Spatial Awareness Enhancing | arXiv | 2023-10-31 | |
| Woodpecker: Hallucination Correction for Multimodal Large Language Models | Woodpecker | arXiv | 2023-10-24 | Github |
| Ferret: Refer and Ground Anything Anywhere at Any Granularity | Ferret | ICLR 2024 | 2023-10-11 | Github |
| Improved Baselines with Visual Instruction Tuning | LLaVA-1.5 | NeurIPS 2023 | 2023-10-05 | Github |
| HallE-Switch: Rethinking and Controlling Object Existence Hallucinations in Large Vision Language Models for Detailed Caption | HallE-Switch | arXiv | 2023-10-03 | Github |
| Analyzing and Mitigating Object Hallucination in Large Vision-Language Models | LURE | arXiv | 2023-10-01 | Github |
| Aligning Large Multimodal Models with Factually Augmented RLHF | MMHal-Bench | arXiv | 2023-09-25 | Github |
| Evaluation and Mitigation of Agnosia in Multimodal Large Language Models | EMMA | ICLR 2024 | 2023-09-07 | |
| CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning | CIEM | NeurIPS 2023 | 2023-09-05 | |
| Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | Qwen-VL | arXiv | 2023-08-24 | Github |
| Detecting and preventing hallucinations in large vision language models | FDPO | AAAI 2024 | 2023-08-11 | Github |
| Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning | LRV-Instruction | ICLR 2024 | 2023-06-26 | Github |
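
Many of the training-free entries above (e.g., VCD, ICD, IBD, M3ID, HALC) mitigate hallucination by adjusting the decoding distribution rather than the model weights. As a rough illustration, here is a minimal PyTorch sketch of visual-contrastive decoding in the spirit of VCD; `logits_fn` is a hypothetical callable returning next-token logits for a given image and prompt, and the noise construction and plausibility constraint are simplified relative to the paper.

```python
import torch


def contrastive_next_token(logits_fn, image, noisy_image, prompt_ids,
                           alpha=1.0, beta=0.1):
    """Pick the next token by contrasting clean-image and distorted-image logits."""
    logits_clean = logits_fn(image, prompt_ids)        # conditioned on the real image
    logits_noisy = logits_fn(noisy_image, prompt_ids)  # conditioned on a distorted image
    # Contrast the two distributions to down-weight tokens driven by language priors.
    contrast = (1 + alpha) * logits_clean - alpha * logits_noisy
    # Adaptive plausibility constraint: keep only tokens that remain reasonably
    # probable under the clean-image distribution.
    probs_clean = torch.softmax(logits_clean, dim=-1)
    cutoff = beta * probs_clean.max()
    contrast[probs_clean < cutoff] = float("-inf")
    return torch.argmax(contrast).item()
```

The intuition is that tokens whose probability stays high even when the visual evidence is corrupted are likely driven by language priors or statistical biases, so contrasting the two distributions suppresses them during generation.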
