
SEM: Sparse Embedding Modulation for Post-Hoc Debiasing of Vision-Language Models

Official PyTorch Implementation

Quentin Guimard¹* · Federico Bartsch¹* · Simone Caldarella¹ · Rahaf Aljundi² · Elisa Ricci¹,³ · Massimiliano Mancini¹

¹University of Trento  |  ²Toyota Motor Europe  |  ³Fondazione Bruno Kessler
*Equal contribution




(Figure: SEM teaser)



Abstract

Models that bridge vision and language, such as CLIP, are key components of multimodal AI. Yet, their large-scale, uncurated training data introduces severe social and spurious biases. Existing post-hoc debiasing methods often operate directly in the dense CLIP embedding space, where bias and task-relevant information are highly entangled—making it difficult to remove bias without degrading semantic fidelity.

In this work, we propose Sparse Embedding Modulation (SEM) — a post-hoc, zero-shot debiasing framework that operates in a Sparse Autoencoder (SAE) latent space. By decomposing CLIP text embeddings into disentangled features, SEM can pinpoint and modulate bias-relevant neurons while safely preserving query-relevant ones.

This approach enables more precise, non-linear interventions. Across four benchmark datasets and two CLIP backbones, SEM achieves substantial fairness gains in retrieval and zero-shot classification, demonstrating that sparse latent representations provide a highly effective foundation for debiasing vision-language models.
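The core mechanism can be illustrated with a toy sketch (plain Python, not the actual implementation): a tiny tied-weight autoencoder with a ReLU encoder stands in for the trained Matryoshka SAE, and `bias_idx` marks the latents assumed to be bias-relevant. All names, the dictionary, and the selection of bias latents here are illustrative assumptions.

```python
def relu(x):
    return [max(0.0, v) for v in x]

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

# Toy overcomplete dictionary: 2-D "CLIP" embedding, 3 SAE latents.
# (Illustrative only -- the real model is a trained Matryoshka SAE.)
W_enc = [[1.0, 0.0],
         [0.0, 1.0],
         [1.0, 1.0]]
W_dec = [list(col) for col in zip(*W_enc)]  # tied decoder

def sem_debias(embedding, bias_idx, scale=0.0):
    """Encode into the sparse latent space, down-weight the latents
    flagged as bias-relevant, and decode back to the embedding space."""
    z = relu(matvec(W_enc, embedding))                                # sparse code
    z = [scale * v if i in bias_idx else v for i, v in enumerate(z)]  # modulate
    return matvec(W_dec, z)                                           # reconstruct
```

Suppressing a single latent removes its contribution to the reconstruction while leaving the other coordinates untouched; the paper's method determines which latents are bias-relevant versus query-relevant for a given text query.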


🛠️ Software Requirements

This codebase is written in Python. We recommend using uv for fast and reliable dependency management.

1. Create and activate a virtual environment:

```shell
uv venv
source .venv/bin/activate
```

2. Install PyTorch (note: adjust the command, e.g. via `--index-url`, to match your CUDA version):

```shell
uv pip install torch torchvision
```

3. Install the remaining dependencies:

```shell
uv pip install -r requirements.txt
```
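Once the environment is set up, a quick stdlib-only sanity check can confirm that the core packages are importable (the helper name below is ours, not part of the repo):

```python
import importlib.util

def missing_packages(names):
    """Return the packages from `names` that are not importable."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# torch/torchvision come from step 2; requirements.txt may add more names.
print(missing_packages(["torch", "torchvision"]))  # [] means you are good to go
```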

📂 Dataset Preparation

Please download the following datasets and organize them into the ./data/ directory.

| Dataset | Reference | Link | Note |
|---|---|---|---|
| FairFace | Karkkainen et al., WACV 2021 | GitHub repository | Mandatory for evaluation (padding = 0.25) |
| UTKFace | Zhang et al., CVPR 2017 | Kaggle dataset | Mandatory for evaluation (aligned & cropped faces) |
| CelebA | Liu et al., ICCV 2015 | Dataset website | Mandatory for evaluation (aligned & cropped faces) |
| Waterbirds | Sagawa et al., ICLR 2020 | GitHub repository | Mandatory for evaluation (official tarball) |
| FACET | Gustafson et al., ICCV 2023 | Dataset website | ZSDebias training only |
| COCO | Lin et al., ECCV 2014 | Dataset website | ZSDebias training only (2017 train set) |
| ImageNet-100 | Tian et al., ECCV 2020 | GitHub repository | ZSDebias training only (original imagenet100.txt) |

Expected Directory Structure:

```
./
├── ...
├── data/
│   ├── CelebA/
│   │   ├── Anno/
│   │   ├── Eval/
│   │   ├── Img/
│   │   │   └── img_align_celeba/
│   │   └── ...
│   ├── COCO/
│   │   ├── train2017/
│   │   ├── instances_train2017.json
│   │   └── ...
│   ├── FACET/
│   │   ├── images/
│   │   ├── annotations.csv
│   │   └── ...
│   ├── FairFace/
│   │   ├── train/
│   │   ├── val/
│   │   ├── fairface_label_train.csv
│   │   ├── fairface_label_val.csv
│   │   └── ...
│   ├── ImageNet-100/
│   │   ├── train/
│   │   ├── imagenet100.txt
│   │   └── ...
│   ├── LFW/
│   │   ├── images/
│   │   └── ...
│   ├── UTKFace/
│   │   ├── images/
│   │   └── ...
│   └── Waterbirds/
│       ├── waterbirds_complete95_forest2water2/
│       └── ...
└── ...
```

Alternatively, you can use the precomputed CLIP embeddings for each dataset (4 backbones: ViT-B/16, ViT-L/14@336px, RN50, RN101) available in the GitHub releases.


🚀 Usage & Reproduction

1. Training-Based Methods & Weights

SEM (Ours)

We rely on a general-purpose Matryoshka Sparse Autoencoder (MSAE) (Zaigrajew et al., ICML 2025).

- Pre-trained weights: one model for each tested CLIP backbone (4 files). You can download and extract them directly from the GitHub releases:

  ```shell
  wget https://github.com/mardgui/SEM/releases/download/v1.0/sae_weights.zip
  unzip sae_weights.zip -d ./debiasing_methods/sae_weights/
  ```

- Training the model: please refer to the original MSAE repository if you wish to train a new SAE model.

PRISM (Baseline)

PRISM requires building a small text dataset and training a linear layer for every backbone/target/bias combination (324 configurations × 3 seeds = 972 models total).

- Pre-trained weights: one model for each configuration (972 files). You can download and extract them directly from the GitHub releases:

  ```shell
  wget https://github.com/mardgui/SEM/releases/download/v1.0/prism_weights.zip
  unzip prism_weights.zip -d ./debiasing_methods/prism_weights/
  ```

- Reproduction: we provide the training script and a shell script to generate all necessary models:

  ```shell
  # Train all 972 configuration models
  bash ./train_all_prism.sh
  ```

  (Individual training script location: ./debiasing_methods/prism_training.py)

ZSDebias (Baseline)

ZSDebias requires building a bias-specific vision dataset and a bias text corpus for each type of bias. For demographic bias on faces, a combination of FACET, LFW, and UTKFace is used for training; for background bias on animals, a combination of ImageNet-100 and MS-COCO (animal subset) is used. The bias text corpus is built from combinations of pre-defined text prompts. VAE-based "adaptors" are trained for each bias type / backbone pair (8 configurations × 3 seeds = 24 models total).

- Pre-trained weights: one model for each configuration (24 files). You can download and extract them directly from the GitHub releases:

  ```shell
  wget https://github.com/mardgui/SEM/releases/download/v1.0/zsdebias_weights.zip
  unzip zsdebias_weights.zip -d ./debiasing_methods/zsdebias_weights/
  ```

- Reproduction: we provide the training script and a shell script to generate all necessary models:

  ```shell
  # Make sure all CLIP embeddings are pre-computed (see precompute_all_clip_embeddings.sh)
  # Train all 24 configuration models
  bash ./train_zsdebias.sh
  ```

  (Individual training script location: ./debiasing_methods/zsdebias_training.py)

2. Pre-computation

Before running the evaluation, pre-compute the CLIP embeddings for the datasets and generate the k-fold cross-validation indices.

```shell
# Compute embeddings for all datasets/backbones
bash ./precompute_all_clip_embeddings.sh

# Generate k-fold indices for all evaluation datasets
bash ./generate_all_splits.sh
```

(Individual script locations: ./precompute_embeddings.py and ./generate_splits.py)

Alternatively, to save on computation time, you can use the precomputed CLIP embeddings available in the GitHub releases. We also provide the k-fold indices that were used in the paper for all evaluation datasets to ensure reproducibility.
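For intuition, seeded k-fold index generation along the lines of the splits step can be sketched as follows. This is a stdlib illustration under our own assumptions; ./generate_splits.py defines the authoritative format and seeds:

```python
import random

def kfold_indices(n, k=5, seed=0):
    """Shuffle sample indices 0..n-1 with a fixed seed and deal them into
    k disjoint folds (illustrative; see ./generate_splits.py for the real one)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[i::k]) for i in range(k)]
```

Fixing the seed makes the folds deterministic, which is what allows the published k-fold indices to be reused for reproducibility.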

3. Evaluation

To run the complete evaluation pipeline (Retrieval and Zero-Shot Classification) using the configurations described in the paper (all datasets / all backbones / all baselines):

```shell
# Run full evaluation suite
bash ./run_eval.sh

# Process the JSON eval outputs
python process_json_outputs.py

# Reproduce the tables from the paper
python reproduce_table_2.py
python reproduce_table_3.py
python reproduce_table_9.py
python reproduce_table_13.py
python reproduce_table_14.py
python reproduce_table_15.py
```

Individual script location: ./main_eval.py. Specific configurations for different datasets and methods can be found in the ./config/ directory.

To save on computation time and ensure reproducibility, we also include the precomputed JSON outputs for the entire run_eval.sh run in ./main_eval_output.


📊 Analysis & Visualization

We provide the exact code used to generate the analytical figures in the paper.

Disentanglement Study (Section 3.1)

This study quantifies the entanglement of bias and content in SAE latent space versus CLIP embedding space.

Location: ./disentanglement_study/

Steps to reproduce:

1. Move into the directory:

   ```shell
   cd disentanglement_study
   ```

2. Precompute the probing datasets:

   ```shell
   bash ./precompute_all.sh
   ```

   (Individual script location: ./disentanglement_study/precompute_dataset.py)

3. Run the probing experiments (linear probe training):

   ```shell
   bash ./run_all_disentanglement_exps.sh
   ```

   (Individual script location: ./disentanglement_study/disentanglement_experiment.py)

4. Generate the plot (Figure 2 in the main paper):

   ```shell
   python ./generate_disentanglement_plot.py
   ```

   Output: ./disentanglement_study/disentanglement_score_plot.pdf

Qualitative PCA Study (Section 4.2)

This study visualizes the geometric structure of gendered profession embeddings before and after debiasing.

Location: ./qualitative_study/

Steps to reproduce:

1. Move into the directory:

   ```shell
   cd qualitative_study
   ```

2. Run the PCA analysis and plotting script:

   ```shell
   python ./pca.py
   ```

   Output: ./qualitative_study/pca_profession_gender.pdf

Citation

If you find this work useful, please consider citing:

@inproceedings{guimardbartsch2026sem,
  title={{SEM: Sparse Embedding Modulation for Post-Hoc Debiasing of Vision-Language Models}},
  author={Guimard, Quentin and Bartsch, Federico and Caldarella, Simone and Aljundi, Rahaf and Ricci, Elisa and Mancini, Massimiliano},
  booktitle={Findings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}

About

[CVPR Findings 2026] SEM: Sparse Embedding Modulation for Post-Hoc Debiasing of Vision-Language Models
