Generic COG (gCOG): A Compositional Generalization Dataset to Evaluate Multimodal Reasoning
Contact: takuya.ito@ibm.com
Reference: Ito T, Dan S, Rigotti M, Kozloski J, Campbell M (2024). On the generalization capacity of neural networks during generic multimodal reasoning. International Conference on Learning Representations (ICLR). http://arxiv.org/abs/2401.15030.
Dependencies: environment.yml (conda) and requirements.txt (pip)
- Clone the GitHub repository
- Set up the environment and dependencies, if needed (conda env create -f environment.yml and pip install -r requirements.txt)
- Install the package from the base directory: pip install -e .
A classification task paradigm to evaluate compositional generalization. The purpose of this task is to measure how models compositionally generalize over precise task features, such as:
- Systematic compositional generalization: Task operations (e.g., re-using previously seen task operations in novel settings)
- Productive compositional generalization: Task complexity (e.g., depth / complexity of a task tree).
- Distractor generalization: Stimulus/noise distractors (inclusion of irrelevant task information in the stimulus image).
This task was originally derived from the COG dataset and its predecessors (Yang et al., 2018), but includes different task operators and dedicated dataloaders with explicit training/testing splits. The dataset and splits are provided in two categories:
- Abstract/categorical tokens (as evaluated in the paper)
- Images in pixel and instructions in language representations (see demos/PixelWordDataset/ for examples)
The primary code associated with the task is housed in gcog/task/.
There are four demo notebooks, one for each compositional split presented in the paper (distractor, systematicity (depth 1), systematicity (depth 3), productive generalization), as well as a generic notebook that demos the underlying task framework. The demos provide sample code showing how to generate train/test splits.
1) Task framework: demos/Demo_TaskGeneration.ipynb
2) Distractor generalization demo: demos/Demo_DistractorGeneralization_Fig3split.ipynb
3) Systematic generalization (task tree depth 1) demo: demos/Demo_Systematicity_OpSys_Fig4Asplit.ipynb
4) Systematic generalization (task tree depth 3) demo: demos/Demo_Systematicity_CompTreeSubsets_Fig4Dsplit.ipynb
5) Productive generalization demo: demos/Demo_Productivity_CompTree_Fig5split.ipynb
In addition, there is a separate set of demo notebooks that show how to import and use dataloaders that generate task samples as image pixels and language instructions (i.e., not categorical tokens): demos/PixelWordDataset/
Exist: Asks whether a specific object exists. The object is specified by a color and a shape (letter).
- Example: "Is there a 'red c'?"
- Returns: True or False.
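As a minimal sketch of the Exist operator described above, assume a stimulus is stored as a mapping from grid location to a (color, shape) pair. This representation and the function name `exist` are hypothetical and chosen for illustration only; they are not the data structures used in gcog/task/.

```python
# Hypothetical stimulus: grid location (x, y) -> (color, shape).
# This is an illustrative representation, not the gcog data format.
stimulus = {(4, 6): ("red", "c"), (1, 4): ("blue", "k")}


def exist(stimulus, color, shape):
    """Return True if an object with this color and shape is in the stimulus."""
    return (color, shape) in stimulus.values()


print(exist(stimulus, "red", "c"))    # True
print(exist(stimulus, "green", "a"))  # False
```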
GetColor/GetShape: Asks the agent to return either the color or the shape of an object with a specified attribute.
- Example: "Get the shape of the green object"
- Example: "Get the color of the letter a".
- Returns: A string attribute (e.g., "a" or "red").
- N.B.: There are checks in the task program to ensure that this question is not ill-posed (i.e., that there is exactly one correct answer).
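The well-posedness requirement can be illustrated with a small sketch. It assumes the same hypothetical location-to-(color, shape) mapping as a stimulus, and the helper names `get_shape` and `get_color` are illustrative. The actual checks in the task program may be implemented differently.

```python
# Hypothetical stimulus: grid location (x, y) -> (color, shape).
stimulus = {
    (4, 6): ("red", "c"),
    (1, 4): ("blue", "k"),
    (2, 2): ("green", "a"),
}


def get_shape(stimulus, color):
    """Return the shape of the single object with the given color."""
    shapes = [s for (c, s) in stimulus.values() if c == color]
    if len(shapes) != 1:
        # Zero or several matches: the question has no unique answer.
        raise ValueError(f"ill-posed query: {len(shapes)} objects are {color}")
    return shapes[0]


def get_color(stimulus, shape):
    """Return the color of the single object with the given shape."""
    colors = [c for (c, s) in stimulus.values() if s == shape]
    if len(colors) != 1:
        raise ValueError(f"ill-posed query: {len(colors)} objects have shape {shape}")
    return colors[0]


print(get_shape(stimulus, "green"))  # a
print(get_color(stimulus, "a"))      # green
```

In this sketch an ill-posed query raises an error at query time. A task generator could equally reject such stimulus/query pairs before they are ever emitted.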
Go: Asks the agent to return the location of a specified object.
- Example: "Get the location of the 'red c'"
- Returns: A tuple of (x, y) coordinates.
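A brief sketch of the Go operator, again assuming a hypothetical location-to-(color, shape) mapping rather than the representation used in gcog/task/:

```python
# Hypothetical stimulus: grid location (x, y) -> (color, shape).
stimulus = {(4, 6): ("red", "c"), (1, 4): ("blue", "k")}


def go(stimulus, color, shape):
    """Return the (x, y) location of the object with this color and shape."""
    matches = [loc for loc, obj in stimulus.items() if obj == (color, shape)]
    if len(matches) != 1:
        raise ValueError("object is missing or ambiguous")
    return matches[0]


print(go(stimulus, "red", "c"))  # (4, 6)
```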
AddEven: Asks the agent to add the location values of one or more objects, and asks if the sum is even.
- Example: "Is the sum of the coordinate values of the 'red c' even?"
- Answer: If the 'red c' is on coordinate (4, 6), then the correct answer is True (4 + 6 = 10 is even)
- Example: "Is the sum of the coordinate values of the 'red c' and the 'blue k' even?"
- Answer: If the 'red c' is on coordinate (4, 6), and the 'blue k' is on (1, 4), then the correct answer is False (4 + 6 + 1 + 4 = 15 is odd)
AddOdd: Asks the agent to add the location values of one or more objects, and asks if the sum is odd.
- Example: "Is the sum of the coordinate values of the 'red c' odd?"
- Answer: If the 'red c' is on coordinate (4, 6), then the correct answer is False (4 + 6 = 10 is even)
- Example: "Is the sum of the coordinate values of the 'red c' and the 'blue k' odd?"
- Answer: If the 'red c' is on coordinate (4, 6), and the 'blue k' is on (1, 4), then the correct answer is True (4 + 6 + 1 + 4 = 15 is odd)
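The AddEven and AddOdd answers above can be reproduced with a few lines of arithmetic. The function names here are hypothetical, and the inputs are simply the object locations taken from the examples.

```python
def coordinate_sum(locations):
    """Sum every x and y value over a list of (x, y) locations."""
    return sum(x + y for x, y in locations)


def add_even(locations):
    return coordinate_sum(locations) % 2 == 0


def add_odd(locations):
    return coordinate_sum(locations) % 2 == 1


red_c, blue_k = (4, 6), (1, 4)

print(add_even([red_c]))          # True  (4 + 6 = 10)
print(add_even([red_c, blue_k]))  # False (4 + 6 + 1 + 4 = 15)
print(add_odd([red_c]))           # False
print(add_odd([red_c, blue_k]))   # True
```

For any given set of objects, AddEven and AddOdd always return opposite answers.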
MultiplyEven: Asks the agent to multiply the location values of one or more objects, and asks if the product is even.
- Example: "Is the product of the coordinate values of the 'red c' even?"
- Answer: If the 'red c' is on coordinate (4, 6), then the correct answer is True (4 * 6 = 24 is even)
- Example: "Is the product of the coordinate values of the 'red c' and the 'blue k' even?"
- Answer: If the 'red c' is on coordinate (4, 6), and the 'blue k' is on (1, 3), then the correct answer is True (4 * 6 * 1 * 3 = 72 is even)
MultiplyOdd: Asks the agent to multiply the location values of one or more objects, and asks if the product is odd.
- Example: "Is the product of the coordinate values of the 'red c' odd?"
- Answer: If the 'red c' is on coordinate (4, 6), then the correct answer is False (4 * 6 = 24 is even)
- Example: "Is the product of the coordinate values of the 'red c' and the 'blue k' odd?"
- Answer: If the 'red c' is on coordinate (4, 6), and the 'blue k' is on (1, 3), then the correct answer is False (4 * 6 * 1 * 3 = 72 is even)
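The MultiplyEven and MultiplyOdd answers can be checked the same way. This sketch uses hypothetical function names and the object locations from the examples above; `math.prod` requires Python 3.8 or later.

```python
import math


def coordinate_product(locations):
    """Multiply every x and y value over a list of (x, y) locations."""
    return math.prod(value for loc in locations for value in loc)


def multiply_even(locations):
    return coordinate_product(locations) % 2 == 0


def multiply_odd(locations):
    return coordinate_product(locations) % 2 == 1


red_c, blue_k = (4, 6), (1, 3)

print(multiply_even([red_c]))          # True
print(multiply_even([red_c, blue_k]))  # True
print(multiply_odd([red_c]))           # False
print(multiply_odd([red_c, blue_k]))   # False
```

A product is odd only when every coordinate value is odd, so a single even coordinate is enough to make MultiplyEven return True.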
If-Else: A connector operator that joins two subtrees. It takes a boolean as input (e.g., the output of an Exist operator) and branches on it: if the boolean is True, the agent follows the left branch; if it is False, the agent follows the right branch. Because connected subtrees can themselves contain further If-Else operators, queries can be arbitrarily long.
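The branching behavior can be sketched as follows. Everything here is an illustrative assumption: the stimulus mapping, the helper names, and the use of zero-argument callables for branches are not taken from gcog/task/, which represents queries as task trees.

```python
# Hypothetical stimulus: grid location (x, y) -> (color, shape).
stimulus = {(4, 6): ("red", "c"), (1, 4): ("blue", "k")}


def exist(color, shape):
    return (color, shape) in stimulus.values()


def location_of(color, shape):
    return next(loc for loc, obj in stimulus.items() if obj == (color, shape))


def color_of(shape):
    return next(c for c, s in stimulus.values() if s == shape)


def if_else(condition, left, right):
    """Follow the left branch if condition is True, otherwise the right branch.

    Branches are zero-argument callables, so only the selected one is evaluated.
    """
    return left() if condition else right()


# "If a 'red c' exists, get its location; otherwise get the color of the letter k."
answer = if_else(
    exist("red", "c"),
    lambda: location_of("red", "c"),
    lambda: color_of("k"),
)
print(answer)  # (4, 6)

# Nesting an If-Else inside a branch yields a deeper task tree.
nested = if_else(
    exist("green", "a"),
    lambda: color_of("a"),
    lambda: if_else(
        exist("blue", "k"),
        lambda: location_of("blue", "k"),
        lambda: color_of("c"),
    ),
)
print(nested)  # (1, 4)
```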