# Separate ground-truth data for each subset

To use this notebook, you must download the secret ground truth file for the test set from the RRC platform.
This requires logging in as a MapText competition administrator.

The platform provides a single ground truth file, `test.json`.
This file contains the targets for both subsets (Rumsey and IGN). For convenience, this notebook splits it into one file per subset.

Ideally, the data would be split further into one file per image, but this is not done here.
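The per-image split mentioned above is not performed by this notebook. If it were needed, it could look like the following minimal sketch. The function name `split_gt_per_image` and the output layout are hypothetical choices: the layout mirrors each `"image"` path under an output directory, replacing the extension with `.json`, so `rumsey/test/3081001_h6_w18.png` becomes `rumsey/test/3081001_h6_w18.json`.

```python
import json
from pathlib import Path


def split_gt_per_image(gt, out_dir):
    """Write each ground-truth item to its own JSON file under out_dir.

    Hypothetical layout: the "image" path is mirrored under out_dir with its
    extension replaced by ".json". Each file holds a one-element list so that
    it keeps the same top-level structure as the full test.json.
    """
    written = []
    for item in gt:
        target = Path(out_dir) / Path(item["image"]).with_suffix(".json")
        target.parent.mkdir(parents=True, exist_ok=True)
        with open(target, "w") as f:
            json.dump([item], f, indent=2)
        written.append(target)
    return written
```

Wrapping each item in a list keeps every per-image file readable by any code that expects the original list structure. Note that two images differing only by extension (e.g. `a.png` and `a.jpg`) would map to the same file under this scheme.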

In [1]:
# Secret file downloaded from the online platform
PATH_TO_SINGLE_GT = "data/00-input/gt/test.json"

In [2]:
!md5sum {PATH_TO_SINGLE_GT}

5dc1b2f33419fd9d32be7cb76ecab7b4  data/00-input/gt/test.json
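The `!md5sum` cell relies on IPython's shell escape and on the `md5sum` command being installed, which is not the case on every system (macOS, for instance, ships `md5` instead). A portable alternative is a small helper built on the standard library's `hashlib`; the name `md5_of_file` is a hypothetical choice for this sketch.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until f.read returns b"" (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

With this helper, the check above becomes `assert md5_of_file(PATH_TO_SINGLE_GT) == "5dc1b2f33419fd9d32be7cb76ecab7b4"`, which fails loudly instead of requiring the reader to compare hashes by eye.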


In [3]:
# Split the list into two lists depending on the value of the "image" field for each item
# the file has the following structure:
# [
#     {
#         "image": "rumsey/test/3081001_h6_w18.png",
#         "groups": [ ... ]
#     },
#     {
#         "image": "rumsey/test/....png",
#         "groups": [ ... ]
#     },
#     ...
#     {
#         "image": "ign/test/000001.jpg",
#         "groups": [ ... ]
#     },
#     ...
# ]
# the first list contains all the items with the "image" field starting with "rumsey"
# the second list contains all the items with the "image" field starting with "ign"
# items matching neither prefix are silently dropped
def split_gt_by_image(gt):
    rumsey = []
    ign = []
    for item in gt:
        if item["image"].startswith("rumsey"):
            rumsey.append(item)
        elif item["image"].startswith("ign"):
            ign.append(item)
    return rumsey, ign
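`split_gt_by_image` silently discards any item whose `"image"` starts with neither `"rumsey"` nor `"ign"`, and `startswith("rumsey")` would also accept a hypothetical `"rumsey2/..."` path. A slightly stricter variant, sketched below under the assumption that the subset name is always the first path component, groups items by that exact component and returns the leftovers so they can be inspected. The name `split_gt_by_prefix` is hypothetical.

```python
def split_gt_by_prefix(gt, prefixes=("rumsey", "ign")):
    """Group ground-truth items by the first component of their "image" path.

    Returns (groups, unmatched): groups maps each known prefix to its list of
    items (original order preserved); unmatched collects the items whose first
    path component is not in `prefixes`, instead of dropping them silently.
    """
    groups = {p: [] for p in prefixes}
    unmatched = []
    for item in gt:
        prefix = item["image"].split("/", 1)[0]
        if prefix in groups:
            groups[prefix].append(item)
        else:
            unmatched.append(item)
    return groups, unmatched
```

A caller can then `assert not unmatched` before writing the subset files, so an unexpected subset name in `test.json` is caught rather than lost.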

In [4]:
import json
import os

# Create output directories
os.makedirs("data/00-input/gt/rumsey", exist_ok=True)
os.makedirs("data/00-input/gt/ign", exist_ok=True)

# Load the single ground-truth file, then close it before writing anything
with open(PATH_TO_SINGLE_GT) as f:
    gt = json.load(f)

# Generate the two lists
rumsey, ign = split_gt_by_image(gt)

# Save the two lists to two separate files
with open("data/00-input/gt/rumsey/test.json", "w") as f:
    json.dump(rumsey, f, indent=2)
with open("data/00-input/gt/ign/test.json", "w") as f:
    json.dump(ign, f, indent=2)

In [5]:
!md5sum data/00-input/gt/rumsey/test.json
!md5sum data/00-input/gt/ign/test.json

07c4bcfa3c1d93f8074451c744e971d9  data/00-input/gt/rumsey/test.json
2423e77747c961daa0f72087af7f4333  data/00-input/gt/ign/test.json
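The checksums confirm that the output files are reproducible, but not that the split is lossless. A minimal sketch of such a check (the name `split_is_lossless` is hypothetical) reloads the original and the subset files and compares their items after sorting by `"image"`:

```python
import json


def split_is_lossless(original_path, subset_paths):
    """Return True if the subset files together contain exactly the original items.

    Items are compared after a stable sort on their "image" field, so the
    order in which subsets are concatenated does not matter.
    """
    with open(original_path) as f:
        original = json.load(f)
    recombined = []
    for path in subset_paths:
        with open(path) as f:
            recombined.extend(json.load(f))

    def by_image(item):
        return item["image"]

    return sorted(original, key=by_image) == sorted(recombined, key=by_image)
```

For this notebook, `split_is_lossless(PATH_TO_SINGLE_GT, ["data/00-input/gt/rumsey/test.json", "data/00-input/gt/ign/test.json"])` would return `False` if, for example, some item matched neither prefix and was dropped by `split_gt_by_image`.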
