visual-complexity

This repository contains scripts for quantifying visual and linguistic complexity in the MS-COCO image captioning dataset. Our approach consists of (1) applying a novel metric to measure the visual complexity of the images, (2) fine-tuning BERT to predict images' visual complexity from their captions, and (3) probing and mitigating content-related biases in the resulting models.

Citation: Lin, E., Yang, Z., & Ordóñez, V. Text-Based Prediction of Visual Complexity: How Does What We See Influence What We Say? Columbia University Undergraduate Research Symposium, New York, NY, October 2022 (poster). [Report] [Slides] [Poster]

Pages displaying validation set image-caption pairs for the complexity classifier have filenames in the format coco--val2017-classification.html. Below each image, the number of distinct regions in the image and the image caption are shown.
Pages displaying validation set image-caption pairs for the complexity score prediction model have filenames in the format coco--val2017-regression.html. Below each image, the complexity score and the image caption are shown.
cleaned_coco_category_stats.csv, cleaned_coco_supercategory_stats.csv: Numbers of complex/noncomplex image captions per COCO (super)category in classification sets
cleaned_regression_tanh_norm_dataset_stats.csv: Data on number of image captions per COCO (super)category in regression training/val/test sets
data_by_categories_extended.csv: detailed breakdown of dataset statistics for classification task, including analysis of class imbalances
regression-train-times.txt: Training times for complexity score prediction models.
{test, val, train}_dataset_stats_by_category.{png, svg}: visualization of how many complex and noncomplex captions there are for images containing different COCO object classes
regression_dataset.pdf: shows distributions of (normalized) complexity scores for train/val sets.

predictions: Pages displaying predicted complexity scores and complexity classifications by different fine-tuned models

predicted_(non)complex_best_classifier_trained_on___val2017.html displays images from that BERT predicted to be "(non)complex" when fine-tuned on .
predicted_(non)complex_best_regression_model_tanh___val2017_regression.html displays images from and their predicted complexity scores (normalized using tanh) from BERT fine-tuned on images.
cleaned_predictions_by_category_classification_val_set_classifier: Data on numbers of image captions predicted to be complex/noncomplex per COCO (super)category by classifier trained on full dataset

training: Plots of loss and accuracy on train/val sets during finetuning.

evaluation:

classification_accuracy_and_loss.txt: Losses and accuracies on validation sets from cross-domain evaluation of complexity classifiers.
classification_and_regression_average_precisions.txt: Precision and recall curve image filepaths and values for complexity score prediction and complexity classification models.
regression_correlations.{csv, xlsx}: Correlations between true visual complexity scores and predicted complexity scores
regression_loss.{csv, xlsx}: Average MSE losses of complexity score prediction models from cross-domain evaluation.
regression_accuracy_and_loss_on_classification_sets.txt: Accuracy and loss when predicted complexity scores are used for binary classification with 0.5 as the threshold value.

scripts: Python scripts for building datasets, training and evaluating models. See sections below for use details.

Computing Complexity Data

python meanshift-filtered.py [path-to-image-directory] [path-to-output-directory] [cdiff_thresh] [sdiff_thresh]

Correlating Ground Truth Complexity and Distinct # of Regions - SAVOIAS Dataset

python correlate_scores.py [path-to-objects-data] [path-to-scenes-data] [path-to-interior-design-data] [objects-plot-output-file] [scenes-plot-output-file] [interior-design-plot-output-file] [combined-plot-output-file] [correlations-output-file]

COCO Dataset Stats - Data Collection

[log_directory] should be a directory containing dataset logs generated by filter_coco_class.py (see step 1 in BERT Classification/Regression Model Training below).

python get_coco_dataset_stats.py [log_directory] [out_file] ['classification' or 'regression']

COCO Dataset Stats - Visualization

For plotting bar charts of complex/noncomplex samples per COCO category in classification/regression datasets. Requires csv with columns for # complex captions per category, # noncomplex captions per category.

python visualize_coco_datasets.py [name of input csv file with dataset stats] [name of output png file to save plotted data] [title of plot] [complex_idx of dataframe] [noncomplex_idx of dataframe]

Preprocess Regression Data

python get_regression_data.py [path-to-data] [whether-to-filter-grayscale: 0 or 1] [output-file]

For fairest evaluation, do not exclude black-and-white images from dataset.
[path-to-data] should contain .txt files, one for each image in the COCO dataset, with the first line of the file recording # of mean-shift-segmented regions per image and the second line recording the number of distinct regions. Some files have a third line saying 'grayscale 2d' indicating the corresponding image was single-channel.
[output-file] should be a .json file
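
The per-image file layout described above can be read with a few lines of Python. This is a minimal sketch, not the code in get_regression_data.py: it assumes the first two lines hold bare integers, and the names parse_complexity_text and collect_distinct_regions, along with the dictionary keys, are hypothetical.

```python
from pathlib import Path


def parse_complexity_text(text):
    """Parse the contents of one per-image .txt file into a dict."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("expected at least two lines of region counts")
    return {
        # Line 1: number of mean-shift-segmented regions (assumed bare integer).
        "meanshift_regions": int(lines[0]),
        # Line 2: number of distinct regions (assumed bare integer).
        "distinct_regions": int(lines[1]),
        # Optional line 3 marks single-channel images.
        "grayscale": len(lines) > 2 and lines[2].lower() == "grayscale 2d",
    }


def collect_distinct_regions(data_dir, filter_grayscale=False):
    """Map each file stem to its distinct-region count, optionally skipping grayscale."""
    records = {}
    for txt in sorted(Path(data_dir).glob("*.txt")):
        rec = parse_complexity_text(txt.read_text())
        if filter_grayscale and rec["grayscale"]:
            continue
        records[txt.stem] = rec["distinct_regions"]
    return records
```

Leaving filter_grayscale at False keeps black-and-white images in the result, which matches the recommendation above.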

python normalize_regression_data.py [raw-complexity-data-file] [file-to-save-normalized-data] 'tanh' [reduction]

python normalize_regression_data.py [raw-complexity-data-file] [file-to-save-normalized-data] 'min-max' [min] [max]

[raw-complexity-data-file] is a .json generated by get_regression_data.py (see above)
normalization modes: 'min-max' normalization uses the formula (x-min)/(max-min) to normalize complexity scores, where x is the number of distinct regions. tanh normalization uses the formula tanh(x/reduction)
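
The two formulas above can be written out directly. The sketch below only illustrates the arithmetic; the function names are hypothetical and it is not the code in normalize_regression_data.py.

```python
import math


def min_max_normalize(x, lo, hi):
    """(x - min) / (max - min), where x is the number of distinct regions."""
    if hi == lo:
        raise ValueError("max and min must differ")
    return (x - lo) / (hi - lo)


def tanh_normalize(x, reduction):
    """tanh(x / reduction); a larger reduction spreads scores over a wider range of x."""
    if reduction == 0:
        raise ValueError("reduction must be nonzero")
    return math.tanh(x / reduction)
```

For non-negative region counts, tanh normalization stays within [0, 1) without needing the dataset minimum and maximum, whereas min-max maps values outside [min, max] to scores outside [0, 1].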

BERT Classification/Regression Model Training

Run dataset_builder.py to build full (not filtered by object class) dataset.

This script creates JSON files ready to use for building PyTorch datasets.
[path-to-coco-trainset-complexity-data], [path-to-coco-valset-complexity-data]: these are JSON files if the mode is 'regression'; if the mode is 'classification', they are paths to the directories {train2017, val2017}. We resplit train/val/test from the original COCO train/val split by randomly sampling from COCO train2017 to get our validation set and treating COCO val2017 as the test set (leftover train2017 samples not included in the val set remain part of the train set). Hence, even though we have already converted the regression data to JSON format with normalize_regression_data.py, it still needs to be resplit.
In general, filter_grayscale should be False (0).

python dataset_builder.py [mode - 'classification' or 'regression'] [path-to-coco-trainset-complexity-data] [path-to-coco-valset-complexity-data] [path to save train set] [path to save val set] [path to save test set] [filter_grayscale - 0 or 1]
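
The resplitting scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the logic of dataset_builder.py: the function name resplit, the val_size parameter, and the fixed seed are all hypothetical, since the validation set size and seeding used by the script are not given here.

```python
import random


def resplit(train2017_ids, val2017_ids, val_size, seed=0):
    """Sample a validation set from train2017 and use val2017 as the test set."""
    train2017_ids = list(train2017_ids)
    if val_size > len(train2017_ids):
        raise ValueError("val_size exceeds the number of train2017 samples")
    rng = random.Random(seed)  # assumption: seeded for reproducibility
    val_set = set(rng.sample(train2017_ids, val_size))
    # Leftover train2017 samples stay in the train set.
    train = [i for i in train2017_ids if i not in val_set]
    val = [i for i in train2017_ids if i in val_set]
    test = list(val2017_ids)
    return train, val, test
```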

Run filter_coco_class.py to build dataset (optionally) filtered by object class and (optionally) mask nouns, verbs, adjectives and/or adverbs in captions.

This step still needs to be performed to build the full dataset, as the script converts JSON > PyTorch Dataset even without filtering.
Expects name of COCO instance class(es) as first argument(s) after script name. Note that to filter for images containing instances of a supercategory, you must indicate all object classes that are members of that supercategory (e.g., "car," "boat," etc. for supercategory "vehicle").
train-data, val-data, test-data are JSON files generated in the previous step
dataset-log-file.txt will save information about train/val/test splits, e.g., how many samples, filtering used
[noun_mask] [verb_mask] [adjective_mask] [adverb_mask]: See table below for substitutions introduced into captions:

Tag	Description	Example	Substitute with
NN, NNP	noun, common, singular or mass; noun, proper, singular	thermostat, Liverpool	object
NNPS, NNS	noun, proper, plural; noun, common, plural	Americans, undergraduates	objects
VB, VBP	verb, base form; verb, present tense, not 3rd person singular	ask, predominate	act
VBD, VBN	verb, past tense; verb, past participle	acted
VBG	verb, present participle or gerund	telegraphing	acting
VBZ	verb, present tense, 3rd person singular	bases	acts
JJ	adjective or numeral, ordinal	third, ill-mannered	plain
RB	adverb	occasionally	plainly
JJR, RBR	adjective, comparative; adverb, comparative	bleaker, further	plainer
JJS, RBS	adjective, superlative; adverb, superlative	calmest, best	plainest

python filter_coco_class.py [mode - 'classification' or 'regression'] [class1] [class2] . . . [train-data] [val-data] [test-data] [name-of-dataset-log-file.txt] [noun_mask: 0 = False, 1 = True] [verb_mask] [adjective_mask] [adverb_mask]
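
The substitutions in the table can be expressed as a lookup from Penn Treebank tag to replacement word. The sketch below is not the masking code in filter_coco_class.py. It takes captions that are already tagged as (word, tag) pairs, for example by a tagger such as NLTK's pos_tag; which tagger the repository uses is not stated here. The function name mask_caption is hypothetical, and the "acted" substitute for VBD/VBN is an assumption read from the table row, which lists only one value.

```python
SUBSTITUTES = {
    "NN": "object", "NNP": "object",
    "NNS": "objects", "NNPS": "objects",
    "VB": "act", "VBP": "act",
    "VBD": "acted", "VBN": "acted",  # assumption: the table row gives only "acted"
    "VBG": "acting", "VBZ": "acts",
    "JJ": "plain", "RB": "plainly",
    "JJR": "plainer", "RBR": "plainer",
    "JJS": "plainest", "RBS": "plainest",
}

NOUN_TAGS = {"NN", "NNP", "NNS", "NNPS"}
VERB_TAGS = {"VB", "VBP", "VBD", "VBN", "VBG", "VBZ"}
ADJECTIVE_TAGS = {"JJ", "JJR", "JJS"}
ADVERB_TAGS = {"RB", "RBR", "RBS"}


def mask_caption(tagged_tokens, noun_mask=False, verb_mask=False,
                 adjective_mask=False, adverb_mask=False):
    """Replace words whose tag belongs to an enabled mask with a fixed substitute."""
    active = set()
    if noun_mask:
        active |= NOUN_TAGS
    if verb_mask:
        active |= VERB_TAGS
    if adjective_mask:
        active |= ADJECTIVE_TAGS
    if adverb_mask:
        active |= ADVERB_TAGS
    return " ".join(
        SUBSTITUTES[tag] if tag in active else word
        for word, tag in tagged_tokens
    )
```

With only noun_mask enabled, a caption tagged as "a/DT large/JJ dog/NN chases/VBZ cats/NNS quickly/RB" becomes "a large object chases objects quickly".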

Run train_coco_class.py to train BERT on dataset created in (1).

Replace trainset-file and valset-file with the names of the datasets created in the previous step, and replace model-weights-file and log-file with names of files to save model checkpoint and training progress logs to.
Optional arg to add after log-file: 'normalize' applies a sigmoid to the model outputs before computing the loss (for regression models only). This option is needed because classification models use BCEWithLogitsLoss, which applies the sigmoid internally before computing the loss, whereas regression models use MSELoss, which applies no activation of its own and therefore needs outputs already on the same normalized scale as the target scores.
Automatically applies PyTorch WeightedRandomSampler to correct for complex/noncomplex class imbalance when training classifier.

CUDA_VISIBLE_DEVICES={0, 1, ..., 7} python train_coco_class.py {mode - 'classification' or 'regression'} {trainset-file} {valset-file} {model-weights-file} {log-file} {train-time-log} {normalize}
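
The two training details above, sigmoid normalization before an MSE loss and per-sample weights for class balancing, can be illustrated without PyTorch. This is a plain-Python sketch with hypothetical function names, not the code in train_coco_class.py or classify.py. Weights of the kind computed here are what torch.utils.data.WeightedRandomSampler takes as its weights argument; the exact weighting the script uses is an assumption.

```python
import math
from collections import Counter


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mse_on_normalized_outputs(raw_outputs, targets):
    """Mean squared error after squashing raw model outputs into (0, 1)."""
    if len(raw_outputs) != len(targets) or not targets:
        raise ValueError("need equally sized, non-empty inputs")
    preds = [sigmoid(z) for z in raw_outputs]
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def balanced_sample_weights(labels):
    """One weight per sample, inversely proportional to its class frequency."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]
```

With these weights, each class carries the same total sampling mass, so a sampler drawing with replacement sees complex and noncomplex captions at roughly equal rates.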

Run analyze_training.py to generate plots of training progress from log-file and see min/max train/val set accuracies.

python analyze_training.py {log-file} {plot-title} {file-to-save-plot} {mode - 'classification' or 'regression'}

Run visualize_model_predictions.py to build html pages of model's most confident predicted complex/noncomplex images, as well as save .p file of model predictions and compute accuracy/loss on val set.

'normalize' option is necessary unless evaluating model that predicts raw number of regions; in general, it is advisable to always include this option, as model does poorly at predicting raw number of regions
Note that regression model can actually be run on classification model val set for purpose of comparing precision-recall of classifier v. regression model (see section 'Plot Precision-recall' below)

CUDA_VISIBLE_DEVICES={0, 1, ..., 7} python visualize_model_predictions.py [val-dataset] [model_weights] [complex image html file name] [noncomplex image html file name] [images to display per page] [COCO category of images] [predictions-dict-filepath-to-save] [mode: 'classification' or 'regression'] [file to print accuracy and loss] [optional - 'normalize' means normalize model outputs to make interpretable as probabilities in output predictions file]

Plot Precision-recall (regression v. classification model)

[classification-model-predictions] [regression-model-predictions] are generated in step 4 above

python plot_precision_recall.py [classification-model-predictions] [regression-model-predictions] [path-to-save-plot] [path-to-save-AP]
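
For reference, average precision can be computed from ranked scores as the sum of precision at each threshold weighted by the increase in recall. The sketch below is a plain-Python illustration of that standard definition, not the code in plot_precision_recall.py; the function name and the input layout (parallel lists of scores and 0/1 labels) are assumptions, since the format of the saved predictions files is not described here.

```python
def average_precision(scores, labels):
    """AP = sum over thresholds of (recall_k - recall_{k-1}) * precision_k."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    ap = 0.0
    true_pos = 0
    prev_recall = 0.0
    i = 0
    while i < len(ranked):
        # Consume every sample tied at this score so they share one threshold.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            true_pos += ranked[j][1]
            j += 1
        precision = true_pos / j
        recall = true_pos / total_pos
        ap += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return ap
```

Because only the ranking of scores matters, the same function applies to classifier probabilities and to regression model complexity scores evaluated on the classification val set.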

Correlate Regression Predictions with True Complexity Scores

python correlate_regression_scores.py [path to pickled predictions - see step 4] [path to save scatterplot] [scatterplot title] [xlabel] [ylabel] [path to text file to save r, p-value] [reduction - 'average' or 'none' - whether to average scores per image]
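
The effect of the reduction argument can be sketched as follows: with 'average', the predictions for all captions of one image are averaged before correlating with the true score, and with 'none', every caption counts as its own point. This is an illustration with hypothetical names and a hypothetical record layout of (image_id, true_score, predicted_score), not the code in correlate_regression_scores.py. It computes Pearson's r only; scipy.stats.pearsonr is one way to also obtain a p-value.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def correlate(records, reduction="none"):
    """records: (image_id, true_score, predicted_score) tuples (assumed layout)."""
    records = list(records)
    if reduction == "average":
        preds = defaultdict(list)
        truth = {}
        for image_id, true_score, predicted in records:
            preds[image_id].append(predicted)
            truth[image_id] = true_score
        xs = [truth[i] for i in preds]
        ys = [sum(v) / len(v) for v in preds.values()]
    elif reduction == "none":
        xs = [t for _, t, _ in records]
        ys = [p for _, _, p in records]
    else:
        raise ValueError("reduction must be 'average' or 'none'")
    return pearson_r(xs, ys)
```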

Additional Scripts

bias_amplification.py: Convert category_counts dictionary generated by visualize_model_predictions.count_prediction_categories to csv that can be graphed by visualize_coco_datasets.
classify.py: functions used for training in train_coco_class.py
train_progress.py: functions used for analyzing training logs in analyze_training.py
feature_congestion.py: calculate feature congestion for SAVOIAS images
feature_congestion_coco.py: calculate feature congestion for MS-COCO images
get_coco_category_list.py: generates text file with COCO (super)categories from annotations file
get_val_accuracy.py: Get val accuracy from predictions dictionary generated by visualize_model_predictions.py
html_builder.py: build pages to display images by complexity score
meanshift-filtered.py: Functions to conduct mean-shift segmentation followed by region filtering
measuring_savoias_complexity_v2.py: calculate number of regions and feature congestion for images in the Scenes, Objects, and Interiors categories of the SAVOIAS dataset as proxies for image complexity; compute the Pearson correlation coefficients between number of regions and feature congestion, and the human-annotated visual complexity scores provided by SAVOIAS for each image.
selective_search_savoias.py: A script to perform selective search for object locations on SAVOIAS images
visualize_coco_color.py: build html pages of COCO complexity dataset color images
visualize_coco_datasets.py: plot number of (non)complex images per COCO (super)category
