Skip to content

seonwoo-min/GVRT

Repository files navigation

Grounding Visual Representations with Texts (GVRT)

Grounding Visual Representations with Texts for Domain Generalization
Seonwoo Min, Nokyung Park, Siwon Kim, Seunghyun Park, Jinkyu Kim
ECCV 2022 | Official Pytorch implementation

We advocate for leveraging the vision-and-language cross-modality supervision for the DG task.

  • Two modules to ground visual representations with texts containing typical reasoning of humans.
  • Visual and Textual Joint Embedder aligns visual representations with the pivot sentence embedding.
  • Textual Explanation Generator generates explanations justifying the rationale behind its decision.
  • Our method achieves state-of-the-art results both in CUB-DG and DomainBed benchmarks!

image-gvrt

Installation

We recommend creating a conda environment and installing the necessary python packages as:

git clone https://github.com/mswzeus/GVRT.git
cd GVRT
ln -s ../src DomainBed_GVRT/src
conda create -n GVRT python=3.8
conda activate GVRT
conda install pytorch==1.10.2 torchvision==0.11.3 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install -r requirements.txt

CUB-DG Benchmark Dataset

We created CUB-DG to investigate the cross-modality supervision in the DG task (Download Link).
CUB is an image dataset with photos of 200 bird species. For more information, please see the original repo.
We used pre-trained style transfer models to obtain images from three other domains, i.e. Art, Paint, and Cartoon.

image-cub-dg

Pre-trained Models

We provide the following pre-trained models for three independent runs (Download Link).

  • Ours trained with PTE (pre-trained textual encoder)
  • Ours trained with STE (self-supervised textual encoder)

How to Run

Training a GVRT model

You can use the train_model.py script with the necessary configurations as:

CUDA_VISIBLE_DEVICES=0 python train_model.py --algorithm GVRT --test-env 0 --seed 0 --output-path results/PTE_test0_seed0 

Evaluating a GVRT model

You can use the evaluate_model.py script with the necessary configurations as:

CUDA_VISIBLE_DEVICES=0 python evaluate_model.py --algorithm GVRT --test-env 0 --seed 0 --output-path results/PTE_test0_seed0 --checkpoint pretrained_models/PTE_test0_seed0.pt

Experimental Results on CUB-DG

We report averaged results across three independent runs.

Citation

If you find our work useful, please kindly cite this paper:

@article{min2022grounding,
  author    = {Seonwoo Min and Nokyung Park and Siwon Kim and Seunghyun Park and Jinkyu Kim},
  title     = {Grounding Visual Representations with Texts for Domain Generalization},
  journal   = {arXiv},
  volume    = {abs/2207.10285},
  year      = {2022},
  url       = {https://arxiv.org/abs/2207.10285}
}

About

[ECCV-2022]Grounding Visual Representations with Texts for Domain Generalization

Resources

Stars

Watchers

Forks