Skip to content

IBM/gcog

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

gCOG

Generic COG (gCOG): A Compositional Generalization Dataset to Evaluate Multimodal Reasoning

Reference: Ito T, Dan S, Rigotti M, Kozloski J, Campbell M (2024). On the generalization capacity of neural networks during generic multimodal reasoning. International Conference on Learning Representations (ICLR). http://arxiv.org/abs/2401.15030.

Last update: 2/12/2024

Dependencies

Dependencies: environment.yml (conda) and requirements.txt (pip)

Installation and setup

  1. Clone the github repository
  2. Set up the environment and dependencies, if needed (conda env create -f environment.yml and pip -r install requirements.txt)
  3. Install package from base directory: pip install -e .

Overview

A classification task paradigm to evaluate compositional generalization. The purpose of this task is to measure how models compositionally generalize over precise task features, such as:

  1. Systematic compositional generalization: Task operations (e.g., re-using previously seen task operations in novel settings)
  2. Productive compositional generalization: Task complexity (e.g., depth / complexity of a task tree).
  3. Distractor generalization: Stimulus/noise distractors (inclusion of irrelevant task information in the stimulus image).

This task was originally derived from the COG dataset and its predecessors (Yang et al. (2018)), but includes different task operators and specific dataloaders with explicit training/testing splits. Dataset and splits are provided in two categories:

  • Abstract/categorical tokens (as evaluated in the paper)
  • Images in pixel and instructions in language representations (see demos/PixelWordDataset/ for examples)
  • The primary code associated with the task is housed in gcog/task/

Demo notebooks with details on specific splits

There are 4 demo notebooks corresponding to each compositional split presented in the paper (distractor, systematicity (depth 1), systematicity (depth 3), productive generalization), as well as a generic notebook that demos the underlying task framework. The demos provide sample code for how to generate train/test splits.

1) Task framework: demos/Demo_TaskGeneration.ipynb

2) Distractor generalization demo: demos/Demo_DistractorGeneralization_Fig3split.ipynb. image

3) Systematic generalization (task tree depth 1) demo: demos/Demo_Systematicity_OpSys_Fig4Asplit.ipynb
image

4) Systematic generalization (task tree depth 3) demo: demos/Demo_Systematicity_CompTreeSubsets_Fig4Dsplit.ipynb
image

5) Productive generalization demo: demos/Demo_Productivity_CompTree_Fig5split.ipynb
image

In addition, there are a separate set of demo notebooks that provide instructions for how to import and/use dataloaders that generate task samples in the image pixel and language instructions (i.e., not categorical tokens): demos/PixelWordDataset/

Description of task operators

Exist: Asks if a specific object exists. Specified with a color and shape (letter).

  • Example: "Is there a 'red c'?"
  • Returns: True or False.

GetColor/GetShape: Asks agent to return either a color or shape of an object with a specified attribute.

  • Example: "Get the shape of the green object"
  • Example: "Get the color of the letter a".
  • Returns: A string attribute (e.g., "a" or "red").
  • N.B.: There are checks in the task program to ensure that this question is not ill-posed (i.e., only a single correct answer).

Go: Asks the agent to return the location of a specified object.

  • Example: "Get the location of the 'red c'"
  • Returns: A tuple (x, y) coordinates.

AddEven: Asks the agent to add the location values of an object(s), and asks if the sum is even.

  • Example: "Is the sum of the coordinate values of the 'red c' even?"
  • Answer: If the 'red c' is on coordinate (4, 6), then the correct answer is True (4 + 6 = 10 is even)
  • Example: "Is the sum of the coordinate values of the 'red c' and the 'blue k' even?"
  • Answer: If the 'red c' is on coordinate (4, 6), and the 'blue k' is on (1, 4), then the correct answer is False (4 + 6 + 1 + 4 = 15 is odd)

AddOdd: Asks the agent to add the location values of an object(s), and asks if the sum is odd.

  • Example: "Is the sum of the coordinate values of the 'red c' odd?"
  • Answer: If the 'red c' is on coordinate (4, 6), then the correct answer is False (4 + 6 = 10 is even)
  • Example: "Is the sum of the coordinate values of the 'red c' and the 'blue k' odd?"
  • Answer: If the 'red c' is on coordinate (4, 6), and the 'blue k' is on (1, 4), then the correct answer is True (4 + 6 + 1 + 4 = 15 is odd)

MultiplyEven: Asks the agent to multiply the location values of an object(s), and asks if the product is even.

  • Example: "Is the product of the coordinate values of the 'red c' even?"
  • Answer: If the 'red c' is on coordinate (4, 6), then the correct answer is True (4 * 6 = 24 is even)
  • Example: "Is the product of the coordinate values of the 'red c' and the 'blue k' even"
  • Answer: If the 'red c' is on coordinate (4, 6), and the 'blue k' is on (1, 3), then the correct answer is True (4 * 6 * 1 * 4 = 96 is even)

MultiplyOdd: Asks the agent to multiply the location values of an object(s), and asks if the product is odd.

  • Example: "Is the product of the coordinate values of the 'red c' odd?"
  • Answer: If the 'red c' is on coordinate (4, 6), then the correct answer is False (4 * 6 = 24 is even)
  • Example: "Is the product of the coordinate values of the 'red c' and the 'blue k' odd"
  • Answer: If the 'red c' is on coordinate (4, 6), and the 'blue k' is on (1, 3), then the correct answer is False (4 * 6 * 1 * 4 = 96 is even)

If-Else This is a connector operator, which connects two subtrees together by inputting a boolean, and then branching into two different directions. If the boolean is True (e.g., the output of an Exist operator), then the agent should follow the left branch. If False, the agent should follow the right branch. This connector operator enables queries to be arbitrarily long.

About

Generic COG (gCOG): A Compositional Generalization Dataset to Evaluate Multimodal Reasoning

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages