This repository provides implementations of the analyses in the paper *Quantifying the Roles of Visual, Linguistic, and Visual-Linguistic Complexity in Verb Acquisition*.
```bash
git clone git@github.com:FlamingoZh/verb_learning_quantification.git
```
- Python >= 3.8
- Other dependencies: numpy, scipy, scikit-learn, pandas, pytorch, torchvision, transformers, jupyterlab, matplotlib, seaborn, statsmodels, opencv-python, pytest-shutil, xgboost, shap
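To confirm that an environment meets the requirements above, a quick check like the following may help. It is a sketch and not part of the repository. The mapping from package names to import names (for example `scikit-learn` to `sklearn`, `pytorch` to `torch`, `opencv-python` to `cv2`) reflects the usual module names and is our assumption, so adjust it if a check fails unexpectedly.

```python
import importlib.util
import sys

# Package name (as listed above) -> module name used in `import` statements.
REQUIRED_MODULES = {
    "numpy": "numpy",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "pandas": "pandas",
    "pytorch": "torch",
    "torchvision": "torchvision",
    "transformers": "transformers",
    "jupyterlab": "jupyterlab",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "statsmodels": "statsmodels",
    "opencv-python": "cv2",
    "pytest-shutil": "pytest_shutil",
    "xgboost": "xgboost",
    "shap": "shap",
}


def missing_dependencies(required=REQUIRED_MODULES):
    """Return the package names whose top-level module cannot be found."""
    return [pkg for pkg, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 8):
        print("Python >= 3.8 is required")
    missing = missing_dependencies()
    print("Missing:", ", ".join(missing) if missing else "none")
```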
We conducted analyses on the Visual Genome, Visual Relationship Detection, and Moments in Time datasets. After downloading the data, specify the path to each dataset by modifying `base_path` in `python/utils/data_generation_library.py`.
The vision model is Swapping Assignments between Views (SwAV), an unsupervised model with a ResNet-50 architecture. Pretrained weights are available on the SwAV homepage; our analyses use the model trained for 800 epochs. After downloading the pretrained model, place it under `pretrained_models/` (create this folder if it does not exist).
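The repository's own loading code is not reproduced here. As a rough sketch, assuming the downloaded file is SwAV's `swav_800ep_pretrain.pth.tar` checkpoint (a state dict whose keys carry a `module.` prefix from distributed training, plus SwAV-only `projection_head` and `prototypes` entries), the ResNet-50 trunk could be loaded into a torchvision `resnet50` as follows. Both function names are hypothetical.

```python
def clean_swav_state_dict(checkpoint):
    """Turn a SwAV checkpoint into a state dict for a torchvision ResNet-50 trunk.

    Assumption: keys look like 'module.layer1.0.conv1.weight', and the
    SwAV-only 'projection_head' / 'prototypes' entries should be dropped.
    """
    state_dict = checkpoint.get("state_dict", checkpoint)
    cleaned = {}
    for key, value in state_dict.items():
        if key.startswith("module."):
            key = key[len("module."):]  # strip the DistributedDataParallel prefix
        if key.startswith(("projection_head", "prototypes")):
            continue  # not part of the ResNet-50 backbone
        cleaned[key] = value
    return cleaned


def load_swav_resnet50(path="pretrained_models/swav_800ep_pretrain.pth.tar"):
    """Hypothetical loader returning a ResNet-50 that outputs 2048-d pooled features."""
    import torch
    import torchvision

    model = torchvision.models.resnet50()
    model.fc = torch.nn.Identity()  # drop the classifier; keep the pooled features
    checkpoint = torch.load(path, map_location="cpu")
    result = model.load_state_dict(clean_swav_state_dict(checkpoint), strict=False)
    # Report any mismatch instead of failing silently.
    if result.missing_keys or result.unexpected_keys:
        print("missing:", result.missing_keys, "unexpected:", result.unexpected_keys)
    return model.eval()
```

Loading with `strict=False` and then inspecting the reported keys is a deliberate trade-off: it tolerates the extra SwAV heads while still surfacing any backbone weights that failed to load.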
We also used the uncased Bidirectional Encoder Representations from Transformers (BERT) model from the Hugging Face Transformers library to generate linguistic representations of words. The model is downloaded automatically the first time you run the script for sampling learning exemplars.
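How the scripts turn BERT's subword outputs into one vector per word is not specified here. One common choice, assumed in this sketch, is to average the last-layer vectors of a word's subword tokens while skipping `[CLS]` and `[SEP]`. The model name `bert-base-uncased` and both helper names are assumptions, and the repository may instead embed words in sentence context.

```python
import numpy as np


def mean_pool_word_tokens(hidden_states, attention_mask):
    """Average the vectors of a word's subword tokens.

    hidden_states: (seq_len, dim) last-layer outputs for ONE tokenized word.
    attention_mask: (seq_len,) with 1 for real tokens and 0 for padding.
    Skips [CLS] (first real token) and [SEP] (last real token).
    """
    hidden_states = np.asarray(hidden_states, dtype=float)
    keep = np.asarray(attention_mask).astype(bool)  # astype copies, so the input is untouched
    n_real = int(keep.sum())
    keep[0] = False           # [CLS]
    keep[n_real - 1] = False  # [SEP]
    return hidden_states[keep].mean(axis=0)


def embed_word(word, model_name="bert-base-uncased"):
    """Hypothetical helper: one context-free BERT vector for a single word."""
    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained(model_name)  # downloaded on first use
    model = BertModel.from_pretrained(model_name).eval()
    inputs = tokenizer(word, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return mean_pool_word_tokens(outputs.last_hidden_state[0].numpy(),
                                 inputs["attention_mask"][0].numpy())
```

For many words, load the tokenizer and model once and reuse them rather than calling `embed_word` in a loop, which reloads the model on every call.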
The first step is to sample visual and linguistic representations of words and store them on disk to speed up later computation. For example:
```bash
python python/gen_data.py vg_noun vg_noun_concept_least20.txt bert swav --n_sample 20 --cuda
```
The embeddings will be stored in data/dumped_embeddings/.
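The internal layout of the dumped pickle files is not documented here, so before writing analysis code against them it can help to inspect one. The following is a generic sketch; the helper names are hypothetical and it makes no assumption about what the file contains beyond it being a pickle. Only unpickle files you generated yourself.

```python
import pickle
from pathlib import Path


def load_pickle(path):
    """Load a pickle file produced by the scripts above."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


def summarize(obj, max_items=5):
    """Describe the top level of a loaded object without printing large arrays."""
    if isinstance(obj, dict):
        return {key: type(value).__name__ for key, value in list(obj.items())[:max_items]}
    if isinstance(obj, (list, tuple)):
        return [type(value).__name__ for value in obj[:max_items]]
    return type(obj).__name__


if __name__ == "__main__":
    data = load_pickle("data/dumped_embeddings/vg_noun_least20_swav_bert_20.pkl")
    print(summarize(data))
```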
One-dimensional (visual or linguistic) aggregation is performed in python/aggregate_exemplars.py. An example is as follows:
```bash
python python/aggregate_exemplars.py vg_noun "../data/dumped_embeddings/vg_noun_least20_swav_bert_20.pkl" visual \
    --n_exemplar_max 20 \
    --n_sample 1000
```
This script creates a pickle file in `data/dumped_plot_data/`, which you can load in `notebooks/1D_exemplar_aggregation.ipynb` to make plots.
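The authoritative procedure is in `python/aggregate_exemplars.py`. The sketch below illustrates one plausible reading of one-dimensional aggregation under assumptions we have not checked against that script: each concept's exemplars in one modality are averaged into a prototype, the other modality is held fixed at one vector per concept, and alignment is the Spearman correlation between the upper triangles of the two modalities' cosine-similarity matrices. `n_exemplar_max` and `n_sample` mirror the command-line flags above; all function names are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr


def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X (n_concepts, dim)."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    return X @ X.T


def alignment_strength(A, B):
    """Spearman correlation between the upper triangles of two similarity matrices."""
    iu = np.triu_indices(A.shape[0], k=1)  # each unordered concept pair once
    rho, _ = spearmanr(cosine_similarity_matrix(A)[iu], cosine_similarity_matrix(B)[iu])
    return rho


def sample_prototypes(exemplars, n, rng):
    """Average n exemplars per concept, drawn without replacement.

    exemplars: array of shape (n_concepts, n_exemplars, dim).
    """
    return np.stack([
        concept[rng.choice(concept.shape[0], size=n, replace=False)].mean(axis=0)
        for concept in exemplars
    ])


def alignment_curve(varied, fixed, n_exemplar_max, n_sample, seed=0):
    """Alignment as a function of how many exemplars of one modality are averaged.

    varied: (n_concepts, n_exemplars, dim) exemplars of the aggregated modality.
    fixed:  (n_concepts, dim) one vector per concept for the other modality.
    Returns an array of shape (n_exemplar_max, n_sample).
    """
    rng = np.random.default_rng(seed)
    curve = np.empty((n_exemplar_max, n_sample))
    for n in range(1, n_exemplar_max + 1):
        for s in range(n_sample):
            curve[n - 1, s] = alignment_strength(sample_prototypes(varied, n, rng), fixed)
    return curve


if __name__ == "__main__":
    # Synthetic stand-in data: 30 concepts, 20 exemplars each.
    rng = np.random.default_rng(1)
    visual = rng.normal(size=(30, 20, 64))
    language = rng.normal(size=(30, 20, 32)).mean(axis=1)
    print(alignment_curve(visual, language, n_exemplar_max=20, n_sample=50).mean(axis=1))
```

Because the two modalities only meet through concept-by-concept similarity matrices, they need not share a dimensionality. In this sketch `n_exemplar_max` cannot exceed the number of stored exemplars per concept, which is consistent with sampling 20 exemplars in `gen_data.py` and aggregating up to 20 here.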
Two-dimensional aggregation is performed in python/aggregate_exemplars_2D.py. An example is as follows:
```bash
python python/aggregate_exemplars_2D.py vg_noun "../data/dumped_embeddings/vg_noun_least20_swav_bert_20.pkl" visual_language \
    --n_l_exemplar_max 8 \
    --n_v_exemplar_max 8 \
    --n_sample 500
```
Similarly, this script creates a pickle file in `data/dumped_plot_data/`; run `notebooks/2D_aggregation_VG.ipynb` to make plots.
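`python/aggregate_exemplars_2D.py` is the authoritative implementation. As a sketch under the same unverified assumptions as a one-dimensional reading (averaged prototypes, alignment as the Spearman correlation of cosine-similarity structures), two-dimensional aggregation can be pictured as a grid: entry `[i, j]` is the mean alignment when `i + 1` visual and `j + 1` linguistic exemplars are averaged per concept. The parameters mirror `--n_v_exemplar_max`, `--n_l_exemplar_max`, and `--n_sample`; the function names are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr


def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X (n_concepts, dim)."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    return X @ X.T


def alignment_strength(A, B):
    """Spearman correlation between the upper triangles of two similarity matrices."""
    iu = np.triu_indices(A.shape[0], k=1)
    rho, _ = spearmanr(cosine_similarity_matrix(A)[iu], cosine_similarity_matrix(B)[iu])
    return rho


def sample_prototypes(exemplars, n, rng):
    """Average n exemplars per concept (exemplars: n_concepts x n_exemplars x dim)."""
    return np.stack([
        concept[rng.choice(concept.shape[0], size=n, replace=False)].mean(axis=0)
        for concept in exemplars
    ])


def alignment_grid(visual, language, n_v_exemplar_max, n_l_exemplar_max, n_sample, seed=0):
    """Mean alignment for every (visual count, linguistic count) combination.

    visual:   (n_concepts, n_exemplars, dim_v)
    language: (n_concepts, n_exemplars, dim_l)
    Returns an array of shape (n_v_exemplar_max, n_l_exemplar_max).
    """
    rng = np.random.default_rng(seed)
    grid = np.empty((n_v_exemplar_max, n_l_exemplar_max))
    for i in range(n_v_exemplar_max):
        for j in range(n_l_exemplar_max):
            scores = [
                alignment_strength(sample_prototypes(visual, i + 1, rng),
                                   sample_prototypes(language, j + 1, rng))
                for _ in range(n_sample)
            ]
            grid[i, j] = np.mean(scores)
    return grid
```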
Word frequencies from CHILDES and age-of-acquisition (AoA) data from Wordbank are provided in `data/processed` (you can also fetch them yourself by running `R/get_freq_and_aoa_vg.rmd`). Code for the regression analysis is in `notebooks/xgboost_VG.ipynb`.