
jamespark3922/localized-skd

Localized Symbolic Knowledge Distillation for Visual Commonsense Models [NeurIPS 2023]

Repo for LSKD: distilling localized (e.g., bounding-box-level) visual commonsense knowledge into visual language models, using ChatGPT-generated data and critic-based filtering.

[paper] [dataset]


The Localized Commonsense Knowledge (LCK) Dataset

The dataset with localized reasoning is provided here. A sample record:

>>> pprint(df.iloc[1])
image                                       VG_100K/2348412.jpg
source                             chatgpt_region_any_v4_people
split                                                     train
index                       chatgpt_region_any_v4_people-858297
region                                                      [4]
region_all                                            [0, 2, 4]
references    [{'name': '4', 'boxes': [[379.6391601562, 152....
question      [What, is, the, significance, of, the, gold, l...
answer        [The, gold, lion, on, the, woman's, shirt, in,...
rationale     [Lions, are, often, used, as, symbols, of, str...
prediction                                             0.940065
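As the record above shows, the `question`, `answer`, and `rationale` fields are stored as token lists, and `prediction` holds the critic's relevance score. A minimal sketch of reading a record back into plain text (the in-memory DataFrame below mirrors the schema shown above; the actual file format and loading call depend on how you obtain the dataset):

```python
import pandas as pd

# Toy record mirroring the LCK schema shown above; in practice you would
# load the released dataset (e.g., via pandas) instead of building it inline.
df = pd.DataFrame([{
    "image": "VG_100K/2348412.jpg",
    "region": [4],
    "question": ["What", "is", "the", "significance", "of", "the", "gold", "lion?"],
    "answer": ["The", "gold", "lion", "symbolizes", "strength."],
    "prediction": 0.940065,
}])

def detok(tokens):
    """Join a token list back into a readable string."""
    return " ".join(tokens)

row = df.iloc[0]
print(detok(row["question"]))   # readable question text
print(row["prediction"])        # critic relevance score in [0, 1]
```

The `region` field indexes into the per-image `references` boxes, so each QA pair can be grounded to specific bounding boxes.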

Distillation Results


Model Training and Evaluation

We use the Salesforce LAVIS repo to train and evaluate the knowledge distillation pipeline.

Installation

pip install -e .

Downstream Task Evaluation

You can download the BLIP2 + LSKD model [here].

To run the evaluation on localized datasets, adjust $CHECKPOINT_DIR and run the script:

bash run_scripts/blip2/eval/eval_unified_common_sense.sh
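A usage sketch for the step above: export `CHECKPOINT_DIR` so the evaluation script can find the downloaded weights (the path below is a placeholder, not a real location):

```shell
# Point CHECKPOINT_DIR at the downloaded BLIP2 + LSKD checkpoint directory
# (placeholder path), then launch the evaluation script from the repo root.
export CHECKPOINT_DIR="/path/to/blip2_lskd_checkpoint"
echo "evaluating checkpoint in $CHECKPOINT_DIR"
# bash run_scripts/blip2/eval/eval_unified_common_sense.sh
```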

Critic Model for Data Filtering

We also release the critic model used to filter out irrelevant generated data. You can download the finetuned critic model [here].

Run the following command to launch the finetuned critic model in a distributed setting. The output JSON file is saved under run.output_dir:

torchrun --nproc_per_node=4 evaluate.py --cfg-path lavis/projects/blip2/eval/laion/laion_sample_critic_ft_filtering.yaml \
  --options run.output_dir=output/BLIP2/laion_samples/filtering/critic_ft/
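Once the critic scores are written out, filtering reduces to thresholding. A hedged sketch of that post-processing step (the JSON layout and the `prediction` key are assumptions based on the dataset sample above, not the repo's exact output schema; the threshold is illustrative):

```python
import json
import os
import tempfile

# Assumed output layout: a JSON list of samples, each carrying the
# critic's relevance score under "prediction" (matching the LCK record).
samples = [
    {"index": "chatgpt_region-001", "prediction": 0.94},
    {"index": "chatgpt_region-002", "prediction": 0.31},
]
path = os.path.join(tempfile.mkdtemp(), "critic_scores.json")
with open(path, "w") as f:
    json.dump(samples, f)

# Load the critic output and keep only samples above the cutoff.
with open(path) as f:
    scored = json.load(f)

THRESHOLD = 0.5  # assumed cutoff; tune on held-out data
kept = [s for s in scored if s["prediction"] >= THRESHOLD]
print(len(kept))  # number of generations passing the critic filter
```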

References

@inproceedings{Park2023LocalizedSK,
  title={Localized Symbolic Knowledge Distillation for Visual Commonsense Models},
  author={Jae Sung Park and Jack Hessel and Khyathi Raghavi Chandu and Paul Pu Liang and Ximing Lu and Peter West and Youngjae Yu and Qiuyuan Huang and Jianfeng Gao and Ali Farhadi and Yejin Choi},
  year={2023},
  url={https://api.semanticscholar.org/CorpusID:266149843}
}
