In [1]:
 cd /data/p300488/lang2prog


/data/p300488/lang2prog


# Exploring the CLEVR questions dataset

In [21]:
import os
clevr_path = '/data/p300488/datasets/clevr/CLEVR_v1.0'
train_questions_path = os.path.join(clevr_path, 'questions/CLEVR_train_questions.json')

Read the CLEVR questions dataset. How large is the training set?

In [3]:
import json

# The questions are stored under the top-level 'questions' key.
with open(train_questions_path) as f:
    ds = json.load(f)['questions']
print(len(ds))

699989


Inspect the structure of a sample. Each one contains a program annotation for its question.

In [6]:
from pprint import pprint
pprint(ds[0])

{'answer': 'yes',
 'image_filename': 'CLEVR_train_000000.png',
 'image_index': 0,
 'program': [{'function': 'scene', 'inputs': [], 'value_inputs': []},
             {'function': 'filter_size',
              'inputs': [0],
              'value_inputs': ['large']},
             {'function': 'filter_color',
              'inputs': [1],
              'value_inputs': ['green']},
             {'function': 'count', 'inputs': [2], 'value_inputs': []},
             {'function': 'scene', 'inputs': [], 'value_inputs': []},
             {'function': 'filter_size',
              'inputs': [4],
              'value_inputs': ['large']},
             {'function': 'filter_color',
              'inputs': [5],
              'value_inputs': ['purple']},
             {'function': 'filter_material',
              'inputs': [6],
              'value_inputs': ['metal']},
             {'function': 'filter_shape',
              'inputs': [7],
              'value_inputs': ['cube']},
             {'function': 'c
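
Each program node refers to earlier nodes through the indices in its ``inputs`` list, so the flat list encodes a tree. A minimal sketch of how such a list could be rendered as a nested expression is given here; ``program_to_expression`` is a hypothetical helper, ``toy_program`` copies only the first four nodes of the sample above, and treating the last node as the root is an assumption.

```python
def program_to_expression(program):
    """Render a program (list of nodes) as a nested expression string."""
    def render(idx):
        node = program[idx]
        name = node['function']
        # Attach concept values in brackets, e.g. filter_size[large].
        if node['value_inputs']:
            name += '[' + ','.join(node['value_inputs']) + ']'
        # Recursively render the nodes this node takes as inputs.
        args = ', '.join(render(i) for i in node['inputs'])
        return f'{name}({args})'

    # Assumption: the last node is the root of the program.
    return render(len(program) - 1)


# First four nodes of the sample shown above.
toy_program = [
    {'function': 'scene', 'inputs': [], 'value_inputs': []},
    {'function': 'filter_size', 'inputs': [0], 'value_inputs': ['large']},
    {'function': 'filter_color', 'inputs': [1], 'value_inputs': ['green']},
    {'function': 'count', 'inputs': [2], 'value_inputs': []},
]

print(program_to_expression(toy_program))
# count(filter_color[green](filter_size[large](scene())))
```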

Let's see all the different reasoning primitives and their related concept values:

In [7]:
all_primitives = set()
for sample in ds:
    for node in sample['program']:
        _fn = node['function']
        # Append the concept value in brackets, if any, e.g. 'filter_color[red]'.
        _side_input = '[' + node['value_inputs'][0] + ']' if node['value_inputs'] else ''
        all_primitives.add(_fn + _side_input)

pprint(all_primitives)

{'count',
 'equal_color',
 'equal_integer',
 'equal_material',
 'equal_shape',
 'equal_size',
 'exist',
 'filter_color[blue]',
 'filter_color[brown]',
 'filter_color[cyan]',
 'filter_color[gray]',
 'filter_color[green]',
 'filter_color[purple]',
 'filter_color[red]',
 'filter_color[yellow]',
 'filter_material[metal]',
 'filter_material[rubber]',
 'filter_shape[cube]',
 'filter_shape[cylinder]',
 'filter_shape[sphere]',
 'filter_size[large]',
 'filter_size[small]',
 'greater_than',
 'intersect',
 'less_than',
 'query_color',
 'query_material',
 'query_shape',
 'query_size',
 'relate[behind]',
 'relate[front]',
 'relate[left]',
 'relate[right]',
 'same_color',
 'same_material',
 'same_shape',
 'same_size',
 'scene',
 'union',
 'unique'}


In this formalism, the primitives are both concept-aware (``filter_color``, ``filter_size``, etc.) and vocabulary-aware (``filter_color[red]``, ``filter_color[blue]``, etc.). Let's create a version that decouples specific concept values from the primitives (*vocabulary-agnostic*):

In [19]:
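vocab_agnostic_primitives = set()
concept_agnostic_primitives = set()
for fn in all_primitives:
    # Drop the bracketed concept value, e.g. 'filter_color[red]' -> 'filter_color'.
    f = fn.split('[')[0]
    vocab_agnostic_primitives.add(f)
    # Drop the concept suffix, e.g. 'filter_color' -> 'filter', but keep
    # 'less_than', 'greater_than' and 'equal_integer' whole.
    parts = f.split('_')
    if len(parts) > 1 and parts[1] not in ['than', 'integer']:
        f = parts[0]
    concept_agnostic_primitives.add(f)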

pprint(vocab_agnostic_primitives)

{'count',
 'equal_color',
 'equal_integer',
 'equal_material',
 'equal_shape',
 'equal_size',
 'exist',
 'filter_color',
 'filter_material',
 'filter_shape',
 'filter_size',
 'greater_than',
 'intersect',
 'less_than',
 'query_color',
 'query_material',
 'query_shape',
 'query_size',
 'relate',
 'same_color',
 'same_material',
 'same_shape',
 'same_size',
 'scene',
 'union',
 'unique'}


And the most general formalism, without concept-awareness (*concept-agnostic*):

In [20]:
pprint(concept_agnostic_primitives)

{'count',
 'equal',
 'equal_integer',
 'exist',
 'filter',
 'greater_than',
 'intersect',
 'less_than',
 'query',
 'relate',
 'same',
 'scene',
 'union',
 'unique'}
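
The three levels of the formalism can be related by splitting a fully specified primitive into its parts. The following is a rough sketch, not part of the original notebook: ``decompose`` and ``NON_CONCEPT_SUFFIXES`` are hypothetical names, and the rule that ``than`` and ``integer`` belong to the operation name mirrors the exception used in the cell above.

```python
import re

# Suffixes that belong to the operation name rather than naming a concept.
NON_CONCEPT_SUFFIXES = {'than', 'integer'}


def decompose(primitive):
    """Split e.g. 'filter_color[red]' into (operation, concept, value)."""
    match = re.fullmatch(r'([a-z_]+)(?:\[(.+)\])?', primitive)
    if match is None:
        raise ValueError(f'unrecognized primitive: {primitive!r}')
    name, value = match.group(1), match.group(2)
    head, _, tail = name.partition('_')
    if tail and tail not in NON_CONCEPT_SUFFIXES:
        return head, tail, value      # concept-aware primitive
    return name, None, value          # no concept attached


for p in ['filter_color[red]', 'relate[left]', 'equal_integer', 'query_shape']:
    print(p, '->', decompose(p))
```

The first element of each triple is the concept-agnostic primitive; joining the first two with an underscore gives back the vocabulary-agnostic one.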


Let's give some context on different primitive types:

   - **Operational**: ``scene``: initializes the set of objects from the RGB image; ``unique``: extracts the single object from a one-element set ({n} -> n)
   
   - **Logical**: ``union`` / ``intersect``: union / intersection of two sets (the outputs of two reasoning branches)
   
   - **Enumeration**: ``exist``: is a set non-empty?; ``count``: size of a set; ``less_than`` / ``greater_than`` / ``equal_integer``: compare two integers
   
   - **Visual**: ``filter``: keeps the objects of a set that have a given attribute value; ``query``: returns an attribute value of an object; ``same``: the set of objects that share an attribute value with a given object; ``equal``: whether two objects have an equal attribute value
   
   - **Spatial**: ``relate``: the set of objects that stand in a given spatial relation to a given object
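
To make these descriptions concrete, here is a small illustrative interpreter over a hand-written scene. Everything in it is an assumption made for the sketch: ``TOY_SCENE``, ``execute`` and the two example programs are hypothetical, object sets are represented as lists of indices into the scene, ``equal`` is taken to compare two already queried values, and ``relate`` is left out because the toy scene carries no spatial information.

```python
# Hypothetical toy scene: one dict of attribute values per object.
TOY_SCENE = [
    {'size': 'large', 'color': 'green', 'material': 'metal', 'shape': 'cube'},
    {'size': 'small', 'color': 'green', 'material': 'rubber', 'shape': 'sphere'},
    {'size': 'large', 'color': 'purple', 'material': 'metal', 'shape': 'cube'},
]


def execute(program, scene):
    """Run a program on a toy scene and return the output of its last node."""
    results = []
    for node in program:
        # 'filter_color' -> op='filter', concept='color'; 'count' -> op='count'
        op, _, concept = node['function'].partition('_')
        args = [results[i] for i in node['inputs']]
        value = node['value_inputs'][0] if node['value_inputs'] else None

        if op == 'scene':
            out = list(range(len(scene)))          # all object indices
        elif op == 'filter':
            out = [i for i in args[0] if scene[i][concept] == value]
        elif op == 'unique':
            if len(args[0]) != 1:
                raise ValueError('unique expects a one-element set')
            out = args[0][0]                       # {n} -> n
        elif op == 'query':
            out = scene[args[0]][concept]
        elif op == 'same':
            ref = scene[args[0]][concept]
            out = [i for i in range(len(scene))
                   if i != args[0] and scene[i][concept] == ref]
        elif op == 'count':
            out = len(args[0])
        elif op == 'exist':
            out = len(args[0]) > 0
        elif op == 'union':
            out = sorted(set(args[0]) | set(args[1]))
        elif op == 'intersect':
            out = sorted(set(args[0]) & set(args[1]))
        elif op == 'equal':                        # equal_integer, equal_<attribute>
            out = args[0] == args[1]
        elif op == 'greater':                      # greater_than
            out = args[0] > args[1]
        elif op == 'less':                         # less_than
            out = args[0] < args[1]
        else:
            # e.g. 'relate': the toy scene has no spatial information
            raise NotImplementedError(node['function'])
        results.append(out)
    return results[-1]


count_large_green = [
    {'function': 'scene', 'inputs': [], 'value_inputs': []},
    {'function': 'filter_size', 'inputs': [0], 'value_inputs': ['large']},
    {'function': 'filter_color', 'inputs': [1], 'value_inputs': ['green']},
    {'function': 'count', 'inputs': [2], 'value_inputs': []},
]
shape_of_purple = [
    {'function': 'scene', 'inputs': [], 'value_inputs': []},
    {'function': 'filter_color', 'inputs': [0], 'value_inputs': ['purple']},
    {'function': 'unique', 'inputs': [1], 'value_inputs': []},
    {'function': 'query_shape', 'inputs': [2], 'value_inputs': []},
]

print(execute(count_large_green, TOY_SCENE))  # 1
print(execute(shape_of_purple, TOY_SCENE))    # cube
```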