Verb Learning Quantification

This repository provides implementations of the analyses in the paper "Quantifying the Roles of Visual, Linguistic, and Visual-Linguistic Complexity in Verb Acquisition".

Getting Started

Clone the repository from GitHub:

git clone git@github.com:FlamingoZh/verb_learning_quantification.git

Requirements

Python >= 3.8
Other dependencies: numpy, scipy, scikit-learn, pandas, pytorch, torchvision, transformers, jupyterlab, matplotlib, seaborn, statsmodels, opencv-python, pytest-shutil, xgboost, shap

Datasets

We conducted analyses on the Visual Genome, Visual Relationship Detection, and Moments in Time datasets. After downloading the data, specify the path to the datasets by modifying base_path in python/utils/data_generation_library.py.
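Before running the scripts, it can help to confirm that the location you configured actually contains the datasets. The following is a minimal sketch and not part of the repository: the helper name missing_dataset_dirs and the subdirectory names are hypothetical, so substitute whatever layout your downloads use.

```python
from pathlib import Path
import tempfile


def missing_dataset_dirs(base_path, expected_dirs):
    """Return the names in expected_dirs that are not directories under base_path."""
    base = Path(base_path)
    return [name for name in expected_dirs if not (base / name).is_dir()]


if __name__ == "__main__":
    # Hypothetical folder names; the real layout depends on how you unpack the data.
    expected = ["visual_genome", "vrd", "moments_in_time"]
    with tempfile.TemporaryDirectory() as tmp:
        # Simulate a base_path where only one dataset has been downloaded.
        (Path(tmp) / "visual_genome").mkdir()
        print(missing_dataset_dirs(tmp, expected))
```

An empty list means every expected folder was found under the given path.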

Pretrained Models

The vision model we used is Swapping Assignments between Views (SwAV), an unsupervised model with a ResNet-50 architecture. Pretrained model weights are available on the SwAV homepage. Our analyses used the model trained for 800 epochs. After downloading the pretrained model, put it under pretrained_models/ (create this folder if it does not exist).

We also used the uncased Bidirectional Encoder Representations from Transformers (BERT) model from the Hugging Face Transformers library to generate linguistic representations of words. The model should be downloaded automatically the first time you run the script for sampling learning exemplars.

Sample Learning Exemplars

The first step is to generate samples of the visual and language representations of words and store them on disk to speed up later computation. For example:

python python/gen_data.py vg_noun vg_noun_concept_least20.txt bert swav --n_sample 20 --cuda

The embeddings will be stored in data/dumped_embeddings/.
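This README does not describe the internal layout of the dumped pickle, so it is worth inspecting a file before relying on it. The sketch below is illustrative only: summarize_pickle is a hypothetical helper, and the toy dictionary (a word mapped to "visual" and "language" lists of vectors) is an assumed layout standing in for a real dump, not the repository's actual format.

```python
import pickle
import tempfile
from pathlib import Path


def summarize_pickle(path):
    """Load a pickle and describe its top-level structure without assuming a schema."""
    with open(path, "rb") as f:
        obj = pickle.load(f)
    summary = {"type": type(obj).__name__}
    if isinstance(obj, dict):
        summary["keys"] = sorted(str(k) for k in obj)
    elif isinstance(obj, (list, tuple)):
        summary["length"] = len(obj)
    return summary


if __name__ == "__main__":
    # Toy stand-in for a dumped embedding file (assumed layout, not the real one).
    toy = {
        "dog": {"visual": [[0.1, 0.2]], "language": [[0.3, 0.4]]},
        "cup": {"visual": [[0.5, 0.6]], "language": [[0.7, 0.8]]},
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "toy_embeddings.pkl"
        with open(path, "wb") as f:
            pickle.dump(toy, f)
        print(summarize_pickle(path))
```

Only unpickle files you generated yourself, since loading a pickle can execute arbitrary code.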

Aggregate Exemplars

One-dimensional (visual or linguistic) aggregation is performed in python/aggregate_exemplars.py. An example is as follows:

python python/aggregate_exemplars.py vg_noun "../data/dumped_embeddings/vg_noun_least20_swav_bert_20.pkl" visual \
  --n_exemplar_max 20 \
  --n_sample 1000

This script will create a pickle file in data/dumped_plot_data/ and you can load the pickle file in notebooks/1D_exemplar_aggregation.ipynb to make plots.
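To give an intuition for the role of --n_exemplar_max and --n_sample, here is a rough sketch of one way exemplar aggregation could work. It assumes that aggregating means averaging n randomly drawn exemplar embeddings of a word and repeating the draw many times; the script in this repository may do something different. The function name, the synthetic data, and the printed spread statistic are all illustrative assumptions.

```python
import numpy as np


def aggregate_exemplars(exemplars, n_exemplar, n_sample, rng):
    """Average n_exemplar randomly drawn rows, repeated n_sample times.

    exemplars: array of shape (n_available, dim) holding one word's embeddings.
    Returns an array of shape (n_sample, dim) of aggregated vectors.
    """
    n_available = exemplars.shape[0]
    if not 1 <= n_exemplar <= n_available:
        raise ValueError("n_exemplar must be between 1 and the number of exemplars")
    aggregated = np.empty((n_sample, exemplars.shape[1]))
    for i in range(n_sample):
        # Draw without replacement so no exemplar is counted twice in one mean.
        idx = rng.choice(n_available, size=n_exemplar, replace=False)
        aggregated[i] = exemplars[idx].mean(axis=0)
    return aggregated


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for 20 exemplar embeddings of one word (not real data).
    exemplars = rng.normal(size=(20, 16))
    full_mean = exemplars.mean(axis=0)
    for n in (1, 5, 20):
        agg = aggregate_exemplars(exemplars, n, n_sample=1000, rng=rng)
        # Illustrative statistic: mean distance from each aggregate to the full mean.
        spread = np.linalg.norm(agg - full_mean, axis=1).mean()
        print(n, round(float(spread), 3))
```

In this toy setup the spread shrinks as more exemplars are averaged, and with all 20 exemplars every draw yields the same mean, so the spread collapses to zero up to floating-point error.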

Two-dimensional aggregation is performed in python/aggregate_exemplars_2D.py. An example is as follows:

python python/aggregate_exemplars_2D.py vg_noun "../data/dumped_embeddings/vg_noun_least20_swav_bert_20.pkl" visual_language \
  --n_l_exemplar_max 8 \
  --n_v_exemplar_max 8 \
  --n_sample 500

Similarly, this script will create a pickle file in data/dumped_plot_data/ and you can make plots by running notebooks/2D_aggregation_VG.ipynb.

Regression Analysis

Word frequency from CHILDES and Age of Acquisition from Wordbank can be found in data/processed (you can also fetch them on your own by running R/get_freq_and_aoa_vg.rmd). Code for regression analysis can be found in notebooks/xgboost_VG.ipynb.
