
Attribute Phrases

This repository contains the dataset and the TensorFlow training code used in the paper:

Jong-Chyi Su*, Chenyun Wu*, Huaizu Jiang, Subhransu Maji, "Reasoning about Fine-grained Attribute Phrases using Reference Games", International Conference on Computer Vision (ICCV), 2017

    Author = {Jong-Chyi Su and Chenyun Wu and Huaizu Jiang and Subhransu Maji},
    Title = {Reasoning about Fine-grained Attribute Phrases using Reference Games},
    Booktitle = {International Conference on Computer Vision (ICCV)},
    Year = {2017}

[Project page] [Paper]


Each annotation consists of one pair of images and five pairs of corresponding attribute phrases. For example:

Image 1 vs Image 2
commercial plane vs private plane
large plane vs small plane
white and grey vs white with blue and red stripes
twin engines vs single engine
more windows on body vs less windows on body

Stats about the dataset

  • Training set: 4700 pairs
  • Val set: 2350 pairs
  • Test set: 2350 pairs
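As a quick sanity check on the numbers above, the split sizes can be added up as follows. This is only an arithmetic sketch; it assumes that the trainval split is the union of train and val, which the file list suggests but does not state.

```python
# Split sizes as listed above (image pairs per split).
SPLITS = {"train": 4700, "val": 2350, "test": 2350}
PHRASE_PAIRS_PER_IMAGE_PAIR = 5  # from the dataset description

total_pairs = sum(SPLITS.values())
# Assumption: trainval is simply train + val.
trainval_pairs = SPLITS["train"] + SPLITS["val"]
total_phrase_pairs = total_pairs * PHRASE_PAIRS_PER_IMAGE_PAIR

for name in ("train", "val", "test"):
    share = 100.0 * SPLITS[name] / total_pairs
    print("{}: {} pairs ({:.0f}%)".format(name, SPLITS[name], share))
print("total image pairs: {}".format(total_pairs))
print("total phrase pairs: {}".format(total_phrase_pairs))
```

The split is therefore 50% / 25% / 25%, with 9400 image pairs and 47000 phrase pairs overall.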


  • Python 2.7
  • TensorFlow v1.0+

Download Dataset

  • User descriptions are included in dataset/visdiff_SET.json, where SET={train, val, test, trainval}
  • Download images from OID dataset (
  • Move images from oid-aircraft-beta-1/data/images/aeroplane/*.jpg to the folder dataset/images/*.jpg
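The move step above can also be scripted. The helper below is a minimal sketch and not part of the repository; the function name collect_images is hypothetical, and only the two directory paths come from the instructions above.

```python
import glob
import os
import shutil


def collect_images(src_dir, dst_dir):
    """Move every .jpg from src_dir into dst_dir and return the moved file names."""
    if not os.path.isdir(dst_dir):
        os.makedirs(dst_dir)
    moved = []
    for src in sorted(glob.glob(os.path.join(src_dir, "*.jpg"))):
        name = os.path.basename(src)
        shutil.move(src, os.path.join(dst_dir, name))
        moved.append(name)
    return moved
```

For the layout described above, the call would be collect_images("oid-aircraft-beta-1/data/images/aeroplane", "dataset/images").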

Download ImageNet Pre-trained Model

Place the pretrained model (e.g. vgg_16.ckpt) in models/checkpoints/.

Extract image features to a NumPy file to accelerate training

Go to utils/ and run: python --dataset train

The NumPy file will be saved in img_feat/vgg_16/train.npy.
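Before training, it can help to check which feature files have already been extracted. The sketch below assumes that every split and image model follows the same img_feat/MODEL/SET.npy pattern as the train example above. That pattern is inferred from the single path given here, and both function names are hypothetical.

```python
import os


def feature_path(dataset, img_model="vgg_16", root="img_feat"):
    """Return where the cached features for one split are expected to live."""
    return os.path.join(root, img_model, dataset + ".npy")


def missing_features(datasets, img_model="vgg_16", root="img_feat"):
    """List the splits whose cached feature file has not been extracted yet."""
    return [d for d in datasets
            if not os.path.isfile(feature_path(d, img_model, root))]
```

For example, missing_features(["train", "val", "test"]) returns the splits that still need an extraction run.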

Train Listener Model

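Each listener is trained in two steps. Step 1 trains with the image features fixed. Step 2 fine-tunes the image features, starting from the Step 1 checkpoint.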

SL (Simple Listener)

  1. python --mode train --log_dir result/SL --pairwise 0 --train_img_model 0 --max_steps 2000 --batch_size 128
  2. python --mode train --log_dir result/SL --pairwise 0 --train_img_model 1 --max_steps 7500 --load_model_path model-fixed-2000 --learn_rate 0.00001

SLr (Simple Listener trained w/o contrastive data)

  1. python --mode train --log_dir result/SLr --pairwise 0 --ran_neg_sample 1 --train_img_model 0 --max_steps 5000 --batch_size 128
  2. python --mode train --log_dir result/SLr --pairwise 0 --ran_neg_sample 1 --train_img_model 1 --max_steps 10000 --load_model_path model-fixed-5000 --learn_rate 0.00001

DL (Discerning Listener)

  1. python --mode train --log_dir result/DL --pairwise 1 --train_img_model 0 --max_steps 2000 --max_sent_length 17 --batch_size 128
  2. python --mode train --log_dir result/DL --pairwise 1 --train_img_model 1 --max_steps 7000 --load_model_path model-fixed-2000 --max_sent_length 17 --learn_rate 0.00001
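The two-step recipes above follow a common pattern: Step 2 loads the checkpoint that Step 1 produced. The sketch below reproduces that pattern from the step counts in the commands. The naming rule model-fixed-STEPS / model-finetune-STEPS is inferred from the checkpoint names used in this README rather than taken from the code, and checkpoint_name and two_step_plan are hypothetical helpers.

```python
# Step counts copied from the training commands above.
LISTENER_STEPS = {
    "SL": {"fixed": 2000, "finetune": 7500},
    "SLr": {"fixed": 5000, "finetune": 10000},
    "DL": {"fixed": 2000, "finetune": 7000},
}


def checkpoint_name(train_img_model, max_steps):
    """Assumed naming rule: model-fixed-N for step 1, model-finetune-N for step 2."""
    stage = "finetune" if train_img_model else "fixed"
    return "model-{}-{}".format(stage, max_steps)


def two_step_plan(listener):
    """Describe both training steps for one listener, including which checkpoint to load."""
    steps = LISTENER_STEPS[listener]
    fixed = checkpoint_name(0, steps["fixed"])
    finetuned = checkpoint_name(1, steps["finetune"])
    return [
        {"train_img_model": 0, "max_steps": steps["fixed"],
         "load": None, "save": fixed},
        {"train_img_model": 1, "max_steps": steps["finetune"],
         "load": fixed, "save": finetuned},
    ]


for name in ("SL", "SLr", "DL"):
    for step in two_step_plan(name):
        print(name, step)
```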

Evaluate Listener Model


SL (Simple Listener)

  1. python --mode eval --log_dir result/SL --pairwise 0 --train_img_model 0 --load_model_path model-fixed-2000 --dataset val
  2. python --mode eval --log_dir result/SL --pairwise 0 --train_img_model 1 --load_model_path model-finetune-7500 --dataset val


SLr (Simple Listener trained w/o contrastive data)

  1. python --mode eval --log_dir result/SLr --pairwise 0 --train_img_model 0 --load_model_path model-fixed-5000 --dataset val
  2. python --mode eval --log_dir result/SLr --pairwise 0 --train_img_model 1 --load_model_path model-finetune-10000 --dataset val


DL (Discerning Listener)

  1. python --mode eval --log_dir result/DL --pairwise 1 --train_img_model 0 --load_model_path model-fixed-2000 --dataset val
  2. python --mode eval --log_dir result/DL --pairwise 1 --train_img_model 1 --load_model_path model-finetune-7000 --dataset val
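To evaluate all six checkpoints in one pass, the argument lists above can be generated rather than typed by hand. This sketch only builds the flag strings; the flags and values are taken from the commands above, eval_args is a hypothetical helper, and <script> stands in for the script name, which is not given here. Passing a split other than val to --dataset is an assumption.

```python
def eval_args(log_dir, pairwise, train_img_model, load_model_path, dataset="val"):
    """Build the evaluation flags for one checkpoint, in the order used above."""
    return [
        "--mode", "eval",
        "--log_dir", log_dir,
        "--pairwise", str(pairwise),
        "--train_img_model", str(train_img_model),
        "--load_model_path", load_model_path,
        "--dataset", dataset,
    ]


# (log_dir, pairwise, train_img_model, checkpoint), copied from the commands above.
RUNS = [
    ("result/SL", 0, 0, "model-fixed-2000"),
    ("result/SL", 0, 1, "model-finetune-7500"),
    ("result/SLr", 0, 0, "model-fixed-5000"),
    ("result/SLr", 0, 1, "model-finetune-10000"),
    ("result/DL", 1, 0, "model-fixed-2000"),
    ("result/DL", 1, 1, "model-finetune-7000"),
]

for run in RUNS:
    print("python <script> " + " ".join(eval_args(*run)))
```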

Train Speaker Model

  • Example: python --speaker_mode=S --img_model=vgg_16 --train_img_model=1 --experiment_path=result/speaker/temp
  • Options:
    • --speaker_mode: S or DS
    • --img_model: alexnet, inception_v3, or vgg_16
    • --train_img_model: Fine-tune image model or not (0 as False, 1 as True)
    • --experiment_path: where to output and save the trained model
    • --load_model_dir: path to the pre-trained model. If not set, train from scratch
    • --load_model_name: model name (model-%steps) in load_model_dir
    • See more options in
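For readers who want to wrap the speaker training call in their own tooling, the documented options can be mirrored with argparse as below. This is a stand-in sketch, not the repository's own option parsing: the option names and allowed values come from the list above, while the defaults and the build_parser name are assumptions.

```python
import argparse


def build_parser():
    """Mirror the documented speaker options; defaults here are illustrative guesses."""
    parser = argparse.ArgumentParser(description="Speaker training options (illustrative)")
    parser.add_argument("--speaker_mode", choices=["S", "DS"], default="S")
    parser.add_argument("--img_model",
                        choices=["alexnet", "inception_v3", "vgg_16"],
                        default="vgg_16")
    parser.add_argument("--train_img_model", type=int, choices=[0, 1], default=0)
    parser.add_argument("--experiment_path", required=True)
    parser.add_argument("--load_model_dir", default=None)
    parser.add_argument("--load_model_name", default=None)
    return parser


# Parse the flags from the example command above.
args = build_parser().parse_args([
    "--speaker_mode=S", "--img_model=vgg_16",
    "--train_img_model=1", "--experiment_path=result/speaker/temp",
])
# Per the option list, an unset --load_model_dir means training from scratch.
train_from_scratch = args.load_model_dir is None
print(args.speaker_mode, args.img_model, args.train_img_model, train_from_scratch)
```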

Use Speaker to Generate Attribute Phrases

  • Example: python --input_path=result/speaker/temp --model_step=model-5000 --dataset_name=val
  • Options:
    • --input_path: path to the trained speaker model that you want to use
    • --model_step: model name (model-%steps) in input_path
    • --dataset_name: which sub-dataset to use (train / val / test)
    • See more options in

Discerning Speaker Model

Here we use the listener model to re-rank the attribute phrases generated by the speaker model. To run this step, you need a trained listener model and the phrases generated by a speaker model.
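The re-ranking idea can be sketched in a few lines: score each candidate phrase with the listener and sort by that score. The code below is a toy illustration under that assumption, not the repository's implementation; rerank is a hypothetical helper and the scores are made up for the example.

```python
def rerank(phrases, score_fn):
    """Sort candidate phrases so the highest-scoring one comes first."""
    return sorted(phrases, key=score_fn, reverse=True)


# Made-up listener scores standing in for a real listener model.
toy_scores = {
    "white plane": 0.2,
    "twin engines": 0.9,
    "large plane": 0.6,
}

ranked = rerank(list(toy_scores), toy_scores.get)
print(ranked)
```

In a real run, score_fn would call the trained listener on the phrase and the image pair.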

  • Example: python --listener_path=result/SL --listener_model=model-fixed-2000 --speaker_result_path=result/speaker/temp/infer_annotations_val_model-5000_case0_beam10_sent10.json --infer_dataset=val
  • Options:
    • --listener_path: path to the listener model used for reranking
    • --listener_model: model name (model-%steps) in listener_path
    • --speaker_result_path: the file that saves the phrases generated by a speaker model
    • --infer_dataset: which dataset to work on (train / val / test)
    • See more options in
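The --speaker_result_path in the example above appears to encode the dataset, the model step, and three generation settings. The helper below rebuilds that file name from its parts. The pattern is inferred from this one example, so treat the meaning of case, beam, and sent as a guess; speaker_result_path is a hypothetical function.

```python
def speaker_result_path(experiment_path, dataset, model_step, case=0, beam=10, sent=10):
    """Rebuild the speaker output file name seen in the example (inferred pattern)."""
    name = "infer_annotations_{}_{}_case{}_beam{}_sent{}.json".format(
        dataset, model_step, case, beam, sent)
    return experiment_path.rstrip("/") + "/" + name


print(speaker_result_path("result/speaker/temp", "val", "model-5000"))
```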

Generate Set-wise Attribute Phrases

  • In "", set "speaker_path" as the path to the trained speaker model you want to use
  • run python


Please contact the authors listed below if you have any questions.

  • Jong-Chyi Su (UMass Amherst)
  • Chenyun Wu (UMass Amherst)
  • Huaizu Jiang (UMass Amherst)

