Quantifying visual and linguistic complexity in MSCOCO image captioning dataset

emlinking/visual-complexity

visual-complexity

This repository contains scripts for quantifying visual and linguistic complexity in the MSCOCO image captioning dataset. Our approach consists of (1) applying a novel metric to measure the visual complexity of the images, (2) fine-tuning BERT to predict images' visual complexity from their captions, and (3) probing and mitigating content-related biases in the resulting models.

Citation: Lin, E., Yang, Z., & Ordóñez, V. Text-Based Prediction of Visual Complexity: How Does What We See Influence What We Say? Columbia University Undergraduate Research Symposium, New York, NY, October 2022 (poster). [Report] [Slides] [Poster]

Table of Contents

data: Pages displaying validation set images

  • Pages displaying validation set image-caption pairs for complexity classifier have filenames in the format coco--val2017-classification.html. Below each image, the number of distinct regions in the image and image caption are shown.
  • Pages displaying validation set image-caption pairs for complexity score prediction model have filenames in the format coco--val2017-regression.html. Below each image, the complexity score and image caption are shown.
  • cleaned_coco_category_stats.csv, cleaned_coco_supercategory_stats.csv: Numbers of complex/noncomplex image captions per COCO (super)category in classification sets
  • cleaned_regression_tanh_norm_dataset_stats.csv: Data on number of image captions per COCO (super)category in regression training/val/test sets
  • data_by_categories_extended.csv: detailed breakdown of dataset statistics for classification task, including analysis of class imbalances
  • regression-train-times.txt: Training times for complexity score prediction models.
  • {test, val, train}_dataset_stats_by_category.{png, svg}: visualization of how many complex and noncomplex captions there are for images containing different COCO object classes
  • regression_dataset.pdf: shows distributions of (normalized) complexity scores for train/val sets.

predictions: Pages displaying predicted complexity scores and complexity classifications by different fine-tuned models

  • predicted_(non)complex_best_classifier_trained_on___val2017.html displays images from that BERT predicted to be "(non)complex" when fine-tuned on .
  • predicted_(non)complex_best_regression_model_tanh___val2017_regression.html displays images from and their predicted complexity scores (normalized using tanh) from BERT fine-tuned on images.
  • cleaned_predictions_by_category_classification_val_set_classifier: Data on numbers of image captions predicted to be complex/noncomplex per COCO (super)category by classifier trained on full dataset

training: Plots of loss and accuracy on train/val sets during fine-tuning.

evaluation:

  • classification_accuracy_and_loss.txt: Losses and accuracies on validation sets from cross-domain evaluation of complexity classifiers.
  • classification_and_regression_average_precisions.txt: Precision and recall curve image filepaths and values for complexity score prediction and complexity classification models.
  • regression_correlations.{csv, xlsx}: Correlations between true visual complexity scores and predicted complexity scores
  • regression_loss.{csv, xlsx}: Average MSE losses of complexity score prediction models from cross-domain evaluation.
  • regression_accuracy_and_loss_on_classification_sets.txt: Accuracy and loss when predicted complexity scores are used for binary classification with 0.5 as the threshold value.
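The thresholding described in the last item above can be sketched as follows. This is a minimal illustration, not the repository's code: the function name, the example values, and the choice to treat a score exactly equal to the threshold as "complex" are all assumptions.

```python
def threshold_accuracy(scores, labels, threshold=0.5):
    """Fraction of samples whose thresholded score matches the binary label.

    Assumption: scores >= threshold count as complex (label 1).
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical normalized scores and ground-truth labels (1 = complex).
example_scores = [0.91, 0.12, 0.55, 0.49]
example_labels = [1, 0, 0, 0]
example_accuracy = threshold_accuracy(example_scores, example_labels)
```

Here three of the four thresholded predictions match their labels.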

scripts: Python scripts for building datasets, training and evaluating models. See sections below for use details.

Computing Complexity Data

python meanshift-filtered.py [path-to-image-directory] [path-to-output-directory] [cdiff_thresh] [sdiff_thresh]

Correlating Ground Truth Complexity and Distinct # of Regions - SAVOIAS Dataset

python correlate_scores.py [path-to-objects-data] [path-to-scenes-data] [path-to-interior-design-data] [objects-plot-output-file] [scenes-plot-output-file] [interior-design-plot-output-file] [combined-plot-output-file] [correlations-output-file]

COCO Dataset Stats - Data Collection

  • [log_directory] should be a directory containing dataset logs generated by filter_coco_class.py (see step 1 in BERT Classification/Regression Model Training below)

python get_coco_dataset_stats.py [log_directory] [out_file] ['classification' or 'regression']

COCO Dataset Stats - Visualization

For plotting bar charts of complex/noncomplex samples per COCO category in classification/regression datasets. Requires csv with columns for # complex captions per category, # noncomplex captions per category.

python visualize_coco_datasets.py [name of input csv file with dataset stats] [name of output png file to save plotted data] [title of plot] [complex_idx of dataframe] [noncomplex_idx of dataframe]

Preprocess Regression Data

python get_regression_data.py [path-to-data] [whether-to-filter-grayscale: 0 or 1] [output-file]

  • For fairest evaluation, do not exclude black-and-white images from dataset.
  • [path-to-data] should contain .txt files, one for each image in the COCO dataset, with the first line of the file recording # of mean-shift-segmented regions per image and the second line recording the number of distinct regions. Some files have a third line saying 'grayscale 2d' indicating the corresponding image was single-channel.
  • [output-file] should be a .json file
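As a rough sketch of the input format described above, one per-image .txt file could be read like this. The function name, the dictionary keys, the sample file name, and the assumption that both counts are written as integers are hypothetical; get_regression_data.py may parse the files differently.

```python
from pathlib import Path
import tempfile


def parse_complexity_file(path):
    """Parse one per-image .txt file into a small record.

    Line 1: number of mean-shift-segmented regions (assumed integer).
    Line 2: number of distinct regions (assumed integer).
    Optional line 3: 'grayscale 2d' for single-channel images.
    """
    lines = [ln.strip() for ln in Path(path).read_text().splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError(f"{path}: expected at least two lines")
    return {
        "meanshift_regions": int(lines[0]),
        "distinct_regions": int(lines[1]),
        "grayscale": len(lines) > 2 and lines[2].lower().startswith("grayscale"),
    }


# Write a made-up file and parse it back.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "example_image.txt"  # hypothetical file name
    sample.write_text("412\n37\ngrayscale 2d\n")
    record = parse_complexity_file(sample)
```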

python normalize_regression_data.py [raw-complexity-data-file] [file-to-save-normalized-data] 'tanh' [reduction]

python normalize_regression_data.py [raw-complexity-data-file] [file-to-save-normalized-data] 'min-max' [min] [max]

  • [raw-complexity-data-file] is a .json generated by get_regression_data.py (see above)
  • normalization modes: 'min-max' normalization uses the formula (x-min)/(max-min) to normalize complexity scores, where x is the number of distinct regions. tanh normalization uses the formula tanh(x/reduction)
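The two formulas above translate directly into code. The sketch below only illustrates them; the function names and example values are assumptions, not the interface of normalize_regression_data.py.

```python
import math


def min_max_normalize(x, lo, hi):
    """(x - min) / (max - min), where x is the number of distinct regions."""
    if hi == lo:
        raise ValueError("max and min must differ")
    return (x - lo) / (hi - lo)


def tanh_normalize(x, reduction):
    """tanh(x / reduction); maps non-negative counts into [0, 1)."""
    return math.tanh(x / reduction)


# Hypothetical region count and parameters.
regions = 30
min_max_score = min_max_normalize(regions, lo=0, hi=120)
tanh_score = tanh_normalize(regions, reduction=50)
```

A practical difference between the two modes: min-max needs the dataset's minimum and maximum up front, while tanh needs only a scale parameter and saturates smoothly for images with very many regions.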

BERT Classification/Regression Model Training

  0. Run dataset_builder.py to build the full (not filtered by object class) dataset.
  • This script creates JSON files ready to use for building PyTorch datasets.
  • [path-to-coco-trainset-complexity-data], [path-to-coco-valset-complexity-data]: these are JSON files if the mode is 'regression'; if the mode is 'classification', they are paths to the directories {train2017, val2017}. We resplit train/val/test from the original COCO train/val split by randomly sampling from COCO train2017 to get our validation set and treating COCO val2017 as the test set (leftover train2017 samples not included in the val set remain part of the train set). Hence, even though we have already converted the regression data to JSON format with normalize_regression_data.py, it still needs to be resplit.
  • In general, filter_grayscale should be False (0).

python dataset_builder.py [mode - 'classification' or 'regression'] [path-to-coco-trainset-complexity-data] [path-to-coco-valset-complexity-data] [path to save train set] [path to save val set] [path to save test set] [filter_grayscale - 0 or 1]
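The resplitting scheme described in this step (validation set sampled from train2017, val2017 reused as the test set) might look roughly like the sketch below. The function name, val_size, and seed are hypothetical; the source does not state how many samples are drawn or how the sampling is seeded.

```python
import random


def resplit(train2017_ids, val2017_ids, val_size, seed=0):
    """Return (train, val, test) ID lists.

    val  = random sample of train2017
    train = remaining train2017 samples
    test = all of val2017
    """
    rng = random.Random(seed)
    val_ids = set(rng.sample(sorted(train2017_ids), val_size))
    train_ids = [i for i in train2017_ids if i not in val_ids]
    return train_ids, sorted(val_ids), list(val2017_ids)


# Toy IDs standing in for COCO image IDs.
train_split, val_split, test_split = resplit(range(10), range(100, 105), val_size=3)
```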

  1. Run filter_coco_class.py to build dataset (optionally) filtered by object class and (optionally) mask nouns, verbs, adjectives and/or adverbs in captions.
  • This step still needs to be performed to build the full dataset, as the script converts JSON > PyTorch Dataset even without filtering.
  • Expects name of COCO instance class(es) as first argument(s) after script name. Note that to filter for images containing instances of a supercategory, you must indicate all object classes that are members of that supercategory (e.g., "car," "boat," etc. for supercategory "vehicle").
  • train-data, val-data, test-data are the JSON files generated in the previous step
  • dataset-log-file.txt will save information about train/val/test splits, e.g., how many samples, filtering used
  • [noun_mask] [verb_mask] [adjective_mask] [adverb_mask]: See table below for substitutions introduced into captions:
| Tag | Description | Example | Substitute with |
| --- | --- | --- | --- |
| NN, NNP | noun, common, singular or mass; noun, proper, singular | thermostat, Liverpool | object |
| NNPS, NNS | noun, proper, plural; noun, common, plural | Americans, undergraduates | objects |
| VB, VBP | verb, base form; verb, present tense, not 3rd person singular | ask, predominate | act |
| VBD, VBN | verb, past tense; verb, past participle | | acted |
| VBG | verb, present participle or gerund | telegraphing | acting |
| VBZ | verb, present tense, 3rd person singular | bases | acts |
| JJ | adjective or numeral, ordinal | third, ill-mannered | plain |
| RB | adverb | occasionally | plainly |
| JJR, RBR | adjective, comparative; adverb, comparative | bleaker, further | plainer |
| JJS, RBS | adjective, superlative; adverb, superlative | calmest, best | plainest |

python filter_coco_class.py [mode - 'classification' or 'regression'] [class1] [class2] . . . [train-data] [val-data] [test-data] [name-of-dataset-log-file.txt] [noun_mask: 0 = False, 1 = True] [verb_mask] [adjective_mask] [adverb_mask]
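To make the masking options concrete, here is a small sketch of how the substitution table could be applied to a caption that has already been tagged with Penn Treebank part-of-speech tags. The function name and the (token, tag) input format are assumptions; the input matches what a tagger such as NLTK's pos_tag returns, but the source does not say which tagger filter_coco_class.py uses.

```python
# Substitutions taken from the table above, grouped by mask flag.
NOUN_SUBS = {"NN": "object", "NNP": "object", "NNS": "objects", "NNPS": "objects"}
VERB_SUBS = {"VB": "act", "VBP": "act", "VBD": "acted", "VBN": "acted",
             "VBG": "acting", "VBZ": "acts"}
ADJ_SUBS = {"JJ": "plain", "JJR": "plainer", "JJS": "plainest"}
ADV_SUBS = {"RB": "plainly", "RBR": "plainer", "RBS": "plainest"}


def mask_caption(tagged_tokens, noun_mask=False, verb_mask=False,
                 adjective_mask=False, adverb_mask=False):
    """Replace tokens whose tag belongs to an enabled mask; keep the rest."""
    subs = {}
    if noun_mask:
        subs.update(NOUN_SUBS)
    if verb_mask:
        subs.update(VERB_SUBS)
    if adjective_mask:
        subs.update(ADJ_SUBS)
    if adverb_mask:
        subs.update(ADV_SUBS)
    return " ".join(subs.get(tag, token) for token, tag in tagged_tokens)


# Hand-tagged example caption (tags written by hand, not produced by a tagger).
tagged = [("A", "DT"), ("man", "NN"), ("rides", "VBZ"),
          ("a", "DT"), ("red", "JJ"), ("bike", "NN")]
masked = mask_caption(tagged, noun_mask=True, verb_mask=True)
```

With the noun and verb masks enabled, the caption keeps its syntactic shape but loses the content words that name objects and actions, while "red" survives because the adjective mask is off.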

  2. Run train_coco_class.py to train BERT on the dataset created in (1).
  • Replace trainset-file and valset-file with the names of the datasets created in the previous step, and replace model-weights-file and log-file with names of files to save model checkpoint and training progress logs to.
  • Optional last argument: 'normalize' applies sigmoid normalization to the model outputs before computing the loss (for regression models only). This option is needed because classification models use BCEWithLogitsLoss, which applies sigmoid() internally before computing the loss, whereas regression models use MSELoss, which applies no activation and therefore needs outputs that are already squashed into the same range as the normalized target scores.
  • Automatically applies PyTorch WeightedRandomSampler to correct for complex/noncomplex class imbalance when training classifier.
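The idea behind the class-imbalance correction can be sketched without PyTorch: each sample gets a weight inversely proportional to the size of its class, and samples are then drawn with those weights. In PyTorch, such a per-sample weight list is what torch.utils.data.WeightedRandomSampler takes as its weights argument. The exact weighting used by train_coco_class.py is not stated in this README, so the inverse-frequency scheme and all names below are assumptions.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-sample weights so that every class has the same total weight."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


# Hypothetical imbalanced label list: 8 complex (1), 2 noncomplex (0).
labels = [1] * 8 + [0] * 2
weights = inverse_frequency_weights(labels)

# Stand-in for a weighted sampler: draw with replacement using the weights.
rng = random.Random(0)
resampled = rng.choices(labels, weights=weights, k=1000)
complex_fraction = sum(resampled) / len(resampled)
```

Although only 20% of the original labels are noncomplex, the resampled stream is roughly balanced between the two classes.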

CUDA_VISIBLE_DEVICES={0, 1, ..., 7} python train_coco_class.py {mode - 'classification' or 'regression'} {trainset-file} {valset-file} {model-weights-file} {log-file} {train-time-log} {normalize}
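The reason the 'normalize' option matters for regression can be shown with plain-Python versions of the two losses. This is an illustrative sketch, not the training code: the function names and example numbers are made up, and the real script uses PyTorch's BCEWithLogitsLoss and MSELoss.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mse_loss(outputs, targets, normalize=False):
    """Mean squared error; optionally squash raw outputs with sigmoid first."""
    if normalize:
        outputs = [sigmoid(z) for z in outputs]
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(targets)


def bce_with_logits(logits, targets):
    """Binary cross-entropy on raw logits (sigmoid is folded into the formula)."""
    total = 0.0
    for z, t in zip(logits, targets):
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(targets)


# Hypothetical raw model outputs and normalized target scores in [0, 1].
raw_outputs = [2.0, -1.0]
targets = [0.8, 0.3]
loss_raw = mse_loss(raw_outputs, targets)
loss_normalized = mse_loss(raw_outputs, targets, normalize=True)
```

Without the sigmoid, the raw outputs live on an unbounded scale and are penalized heavily against targets in [0, 1]; with it, outputs and targets share the same range.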

  3. Run analyze_training.py to generate plots of training progress from log-file and see min/max train/val set accuracies.

python analyze_training.py {log-file} {plot-title} {file-to-save-plot} {mode - 'classification' or 'regression'}

  4. Run visualize_model_predictions.py to build html pages of the model's most confident predicted complex/noncomplex images, as well as save a .p file of model predictions and compute accuracy/loss on the val set.
  • 'normalize' option is necessary unless evaluating model that predicts raw number of regions; in general, it is advisable to always include this option, as model does poorly at predicting raw number of regions
  • Note that regression model can actually be run on classification model val set for purpose of comparing precision-recall of classifier v. regression model (see section 'Plot Precision-recall' below)

CUDA_VISIBLE_DEVICES={0,1,...7} python visualize_model_predictions.py [val-dataset] [model_weights] [complex image html file name] [noncomplex image html file name] [images to display per page] [COCO category of images] [predictions-dict-filepath-to-save] [mode: 'classification' or 'regression'] [file to print accuracy and loss] [optional - 'normalize' means normalize model outputs to make interpretable as probabilities in output predictions file]

Plot Precision-recall (regression v. classification model)

  • [classification-model-predictions] [regression-model-predictions] are generated in step 4 above

python plot_precision_recall.py [classification-model-predictions] [regression-model-predictions] [path-to-save-plot] [path-to-save-AP]
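For readers unfamiliar with the AP value saved by this script, the sketch below shows one common way to compute average precision from scores and binary labels: rank samples by score and average the precision at each rank where a positive occurs. The function name and the example data are hypothetical, ties between scores are not treated specially, and plot_precision_recall.py may compute AP differently (for example through a library routine).

```python
def average_precision(scores, labels):
    """Mean of precision@k over the ranks k at which a positive sample appears."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Hypothetical predictions from two models on the same labels (1 = complex).
shared_labels = [1, 0, 1, 0]
classifier_scores = [0.9, 0.8, 0.7, 0.6]
regression_scores = [0.9, 0.2, 0.8, 0.1]
ap_classifier = average_precision(classifier_scores, shared_labels)
ap_regression = average_precision(regression_scores, shared_labels)
```

Because AP depends only on the ranking of the scores, a classifier's probabilities and a regression model's complexity scores can be compared on the same labels without putting them on a common scale.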

Correlate Regression Predictions with True Complexity Scores

python correlate_regression_scores.py [path to pickled predictions - see step 4] [path to save scatterplot] [scatterplot title] [xlabel] [ylabel] [path to text file to save r, p-value] [reduction - 'average' or 'none' - whether to average scores per image]
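The 'average' reduction and the correlation itself can be sketched as follows. Predictions are made per caption and an image can have several captions, so averaging first collapses them to one score per image. All names and data below are hypothetical, and the p-value that the script also saves is not computed here (a routine such as scipy.stats.pearsonr returns both r and the p-value).

```python
import math
from collections import defaultdict


def average_per_image(image_ids, predictions):
    """Collapse per-caption predictions to one mean score per image."""
    grouped = defaultdict(list)
    for image_id, prediction in zip(image_ids, predictions):
        grouped[image_id].append(prediction)
    return {image_id: sum(v) / len(v) for image_id, v in grouped.items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-caption predictions and per-image true scores.
caption_image_ids = ["a", "a", "b", "b", "c"]
caption_predictions = [0.2, 0.4, 0.6, 0.8, 0.9]
true_scores = {"a": 0.3, "b": 0.7, "c": 0.95}

averaged = average_per_image(caption_image_ids, caption_predictions)
image_order = sorted(true_scores)
r = pearson_r([averaged[i] for i in image_order], [true_scores[i] for i in image_order])
```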

Additional Scripts

  • bias_amplification.py: Convert category_counts dictionary generated by visualize_model_predictions.count_prediction_categories to csv that can be graphed by visualize_coco_datasets.
  • classify.py: functions used for training in train_coco_class.py
  • train_progress.py: functions used for analyzing training logs in analyze_training.py
  • feature_congestion.py: calculate feature congestion for SAVOIAS images
  • feature_congestion_coco.py: calculate feature congestion for MS-COCO images
  • get_coco_category_list.py: generates text file with COCO (super)categories from annotations file
  • get_val_accuracy.py: Get val accuracy from predictions dictionary generated by visualize_model_predictions.py
  • html_builder.py: build pages to display images by complexity score
  • meanshift-filtered.py: Functions to conduct mean-shift segmentation followed by region filtering
  • measuring_savoias_complexity_v2.py: calculate number of regions and feature congestion for images in the Scenes, Objects, and Interiors categories of the SAVOIAS dataset as proxies for image complexity; compute the Pearson correlation coefficients between number of regions and feature congestion, and the human-annotated visual complexity scores provided by SAVOIAS for each image.
  • selective_search_savoias.py: A script to perform selective search for object locations on SAVOIAS images
  • visualize_coco_color.py: build html pages of COCO complexity dataset color images
  • visualize_coco_datasets.py: plot number of (non)complex images per COCO (super)category
