# Formatting Evaluation Datasets as `.jsonl`

This notebook helps format datasets as `.jsonl` files, following the standardized format described in `evaluate.ipynb`. Each line of the `.jsonl` file is a JSON object (a dictionary) with the following fields:
 - `image`: A (potentially relative) path to the target image. Relative paths are used for standard benchmarks, since absolute paths vary between users.
 - `width`, `height`: The dimensions of the image, in pixels.
 - `references`: A list of dictionaries, each with the following fields:
    - `caption`: A caption describing the object.
    - `xyxy`: A bounding box in xyxy format (`[x_min, y_min, x_max, y_max]`), in pixels.
 - `known_absent_captions`: A list of text descriptions of objects that are known to be absent from the image.
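
As a rough illustration of the fields listed above, a single line of the file could be built and serialized as sketched below. The `width`, `height`, and `known_absent_captions` values are made-up placeholders; the path, caption, and box are borrowed from the OCID-Ref sample shown later in this notebook.

```python
import json

# Illustrative record only. The width, height, and known_absent_captions
# values are assumptions made for this example, not values read from a dataset.
record = {
    "image": "ARID20/floor/top/seq11/rgb/result_2018-08-20-14-51-22.png",
    "width": 640,   # placeholder
    "height": 480,  # placeholder
    "references": [
        {
            "caption": "The towel on the front right of the cup.",
            "xyxy": [465, 127, 590, 235],  # [x_min, y_min, x_max, y_max]
        },
    ],
    "known_absent_captions": ["a red bicycle"],  # placeholder
}

# json.dumps emits no newlines by default, so each record fits on one line.
line = json.dumps(record)

# Reading a line back recovers the same dictionary.
parsed = json.loads(line)
```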

## Formatting the OCID-Ref Validation Dataset


In [14]:
import os
import json

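# Local copies of the OCID-Ref annotations and the OCID images; adjust these paths for your machine.
OCID_ref_root = '/scratch/gsk6me/WORLDMODELS/OCID-Ref'
OCID_ref_data_root = '/scratch/gsk6me/WORLDMODELS/OCID-dataset'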

with open(os.path.join(OCID_ref_root, "val_expressions.json")) as f:
    val_expressions = json.load(f)


In [3]:
val_expressions['100031']

{'seq_id': 88,
 'scene_id': 997,
 'take_id': 17,
 'scene_path': 'ARID20/floor/top/seq11/rgb/result_2018-08-20-14-51-22.png',
 'sequence_path': 'ARID20/floor/top/seq11',
 'sub_dataset': 'ARID20',
 'instance_id': 6556,
 'scene_instance_id': 15,
 'class': 'hand_towel',
 'class_instance': 'hand_towel_1',
 'new_class': 'towel',
 'sentence': 'The towel on the front right of the cup.',
 'tokens': ['The', 'towel', 'on', 'the', 'front', 'right', 'of', 'the', 'cup'],
 'bbox': '[465, 127, 590, 235]',
 'sentence_id': '100031'}
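
Note that `bbox` in the entry above is a string, not a list, so it has to be parsed before use. A minimal sketch, assuming (as the conversion code in this notebook does) that the four values are already in xyxy order:

```python
import json

# The 'bbox' value copied from the sample entry above.
bbox_str = '[465, 127, 590, 235]'

# Parse the JSON-encoded string into a list of four integers.
x1, y1, x2, y2 = json.loads(bbox_str)

# Under the xyxy assumption, the box size is the difference of the corners.
box_width, box_height = x2 - x1, y2 - y1
print(box_width, box_height)  # 125 108
```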

In [16]:
import tqdm
import PIL.Image

# Group the expressions by scene so that each image gets a single row.
batched_by_image = {}

for value in val_expressions.values():
    batched_by_image.setdefault(value['scene_path'], []).append(value)

print(len(batched_by_image), "unique scenes.")

# Create a row for each scene.
rows = []
for (scene_path, values) in tqdm.tqdm(batched_by_image.items(), desc='Generating .jsonl file...'):
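    # Open the image only to read its dimensions; the context manager closes the file afterwards.
    with PIL.Image.open(os.path.join(OCID_ref_data_root, scene_path)) as image:
        width, height = image.size
    rows.append({
        'image': scene_path,
        'width': width,
        'height': height,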
        'references': [
            {
                'caption': value['sentence'].replace("_", " "),
                # 'bbox' is stored as a JSON-encoded string, so parse it into a list.
                'xyxy': json.loads(value['bbox']),
            }
            for value in values
        ]
    })

# Write the .jsonl.
with open("ocid-ref-val.eval.jsonl", "w") as f:
    for row in rows:
        json.dump(row, f)
        f.write("\n")


1963 unique scenes.


Generating .jsonl file...: 100%|██████████| 1963/1963 [00:00<00:00, 1967.35it/s]
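
Once the file is written, it can be worth reading it back to confirm that every row has the expected keys and that every box lies inside its image. The helper below is a hypothetical sketch (`check_eval_jsonl` and `REQUIRED_KEYS` are not part of `evaluate.ipynb`), and it treats `known_absent_captions` as optional because the rows generated above do not include it.

```python
import json

# Keys every row is assumed to need; 'known_absent_captions' is treated as optional here.
REQUIRED_KEYS = ("image", "width", "height", "references")


def check_eval_jsonl(path):
    """Return a list of (line_number, message) problems found in an eval .jsonl file."""
    problems = []
    with open(path) as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            row = json.loads(line)
            missing = [key for key in REQUIRED_KEYS if key not in row]
            if missing:
                problems.append((line_number, f"missing keys: {missing}"))
                continue
            for ref in row["references"]:
                x1, y1, x2, y2 = ref["xyxy"]
                # A valid xyxy box has positive area and stays within the image.
                in_bounds = (0 <= x1 < x2 <= row["width"]
                             and 0 <= y1 < y2 <= row["height"])
                if not in_bounds:
                    problems.append((line_number, f"box out of bounds: {ref['xyxy']}"))
    return problems
```

Calling `check_eval_jsonl("ocid-ref-val.eval.jsonl")` should return an empty list if all rows pass these checks.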
