This repository now provides a reproducible pipeline for analysing spatial attention clusters and confidence metrics
for LLaVA models. The workflow automates the ideas outlined in Ideas.pdf, including weighted clustering sweeps,
language-only and visual ablations, confidence calibration, and rich structured reporting.
- **Environment setup**

  ```bash
  bash setup.sh
  ```

  This creates a virtual environment and installs all required packages (PyTorch, Transformers, HDBSCAN, etc.).
- **Generate prompts (optional)**

  Use `tester.py` to refresh `results.csv` if you need new prompt/image pairs.
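If you want to assemble `results.csv` by hand instead, the sketch below shows a plausible layout. The column names (`prompt`, `image_path`, `answer`) are assumptions for illustration; the real columns written by `tester.py` may differ.

```python
import csv

# Hypothetical results.csv layout; the real columns produced by tester.py may differ.
rows = [
    {"prompt": "What colour is the bus?", "image_path": "images/bus.jpg", "answer": "red"},
    {"prompt": "Is there a dog in the picture?", "image_path": "images/park.jpg", "answer": "no"},
]

with open("results.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["prompt", "image_path", "answer"])
    writer.writeheader()
    writer.writerows(rows)
```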
- **Run the analysis pipeline**

  ```bash
  python analysis/pipeline_runner.py --prompts results.csv --output-dir analysis_outputs
  ```

  Key options:

  - `--quantization {none,4bit,8bit}`: load LLaVA with optional bitsandbytes quantisation.
  - `--log-level`: adjust verbosity (default `INFO`).
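The documented flags behave like the following `argparse` sketch. This is a simplified stand-in, not the actual parser in `pipeline_runner.py`; which flags are required and what their defaults are is assumed here for illustration.

```python
import argparse

# Simplified stand-in for the pipeline_runner.py CLI; only the documented flags are shown,
# and the required/default settings are assumptions.
parser = argparse.ArgumentParser(description="LLaVA attention-cluster analysis pipeline")
parser.add_argument("--prompts", required=True, help="CSV of prompt/image pairs")
parser.add_argument("--output-dir", default="analysis_outputs", help="where artefacts are written")
parser.add_argument("--quantization", choices=["none", "4bit", "8bit"], default="none",
                    help="optional bitsandbytes quantisation for the LLaVA weights")
parser.add_argument("--log-level", default="INFO", help="logging verbosity")

# Parse an example invocation rather than the real command line.
args = parser.parse_args(["--prompts", "results.csv", "--quantization", "4bit"])
print(args.quantization)  # → 4bit
```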
- **Outputs**

  The pipeline produces the following artefacts inside the chosen output directory:

  - `analysis_records.json` / `analysis_records.csv`: per-example metrics including token probabilities, margins, entropies, clustering stats, head ablation deltas, and ablation summaries.
  - `clustering_summary.json`: correlation statistics for every DBSCAN/HDBSCAN/GMM configuration plus null-model baselines.
  - `pipeline_execution.log`: full execution log (timestamps, errors, timing).
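For downstream work, the JSON artefacts can be loaded with the standard library. A minimal sketch, assuming `analysis_records.json` is a list of per-example dicts (the field names in the demo record are invented for illustration):

```python
import json
import tempfile
from pathlib import Path

def load_records(output_dir):
    """Load per-example metrics; assumes analysis_records.json is a JSON list of dicts."""
    return json.loads((Path(output_dir) / "analysis_records.json").read_text())

# Demo with a stand-in output directory; real records carry many more fields.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "analysis_records.json").write_text(json.dumps([
    {"example_id": 0, "token_entropy": 1.42, "n_clusters": 3},
]))

records = load_records(demo_dir)
print(records[0]["n_clusters"])  # → 3
```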
The pipeline implements:

- Weighted 3D DBSCAN sweeps with sample-weight exponents and sensitivity analyses.
- Null-model permutation tests, HDBSCAN, and Gaussian Mixture alternatives.
- Normalised attention entropy and token entropy fixes.
- Ensemble confidence metrics (top-k margin, logit margin, per-token log-probs, sequence probability).
- Confidence calibration (ECE & Brier) for yes/no and short-answer subsets.
- Answer-token vs ground-truth-token head deltas.
- Language-only, visual dropout, head-ablation, and prefix-control experiments.
- Image resolution sweeps (224, 336, 448) with structured exports for downstream analysis.
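As a reference for the calibration metric above, expected calibration error (ECE) can be sketched in a few lines. This uses equal-width confidence bins; the binning scheme the pipeline itself uses may differ.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: weighted average of |accuracy - mean confidence| over equal-width bins."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Half-open bins (lo, hi], with 0.0 folded into the first bin.
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == lo)]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(acc - conf)
    return ece

# Well-calibrated toy data: 80% confidence, 4 of 5 correct → ECE of 0.
print(expected_calibration_error([0.8] * 5, [1, 1, 1, 1, 0]))  # → 0.0
```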
`main.py` and `main_batching.py` now delegate to the new pipeline but keep their previous implementations below the entry-point guard (`if __name__ == "__main__":`) for reference.