## Python Notebook to Create Configurations 001 to 008
This notebook processes the Cityscapes panoptic training annotations by removing selected categories (identified by their IDs) from both the per-image segment lists and the category list. Each run writes the modified annotations to a new JSON file, which can then be used to train a model on a reduced set of categories.

Conf 001 - remove labels [24]
Conf 002 - remove labels [24, 25]
Conf 003 - remove labels [24, 25, 26]
Conf 004 - remove labels [24, 25, 26, 27]
Conf 005 - remove labels [24, 25, 26, 27, 28]
Conf 006 - remove labels [24, 25, 26, 27, 28, 31]
Conf 007 - remove labels [24, 25, 26, 27, 28, 31, 32]
Conf 008 - remove labels [24, 25, 26, 27, 28, 31, 32, 33]

### 01 Import Libraries

In [66]:
import os
import json
import random
import shutil


### 02 Setting Paths to the Cityscapes Data and JSON Files

In [67]:
path_dataset=os.path.join("/Users/gaetanbrison/Documents/GitHub/hi-paris/Image-Segmentation/fc-clip/datasets/", "cityscapes")
path_getFine=os.path.join(path_dataset, "gtFine")
path_leftImg8bit=os.path.join(path_dataset, "leftImg8bit")

In [68]:
path_getFine_train=os.path.join(path_getFine, "train")
path_getFine_val=os.path.join(path_getFine, "val")
path_getFine_test=os.path.join(path_getFine, "test")

path_getFine_cityscapes_panoptic_train=os.path.join(path_getFine, "cityscapes_panoptic_train")
path_getFine_cityscapes_panoptic_val=os.path.join(path_getFine, "cityscapes_panoptic_val")
path_getFine_cityscapes_panoptic_test=os.path.join(path_getFine, "cityscapes_panoptic_test")

cityscapes_panoptic_train=os.path.join(path_getFine, "cityscapes_panoptic_train.json")
cityscapes_panoptic_val=os.path.join(path_getFine, "cityscapes_panoptic_val.json")
cityscapes_panoptic_test=os.path.join(path_getFine, "cityscapes_panoptic_test.json")


In [69]:
cityscapes_panoptic_train

'/Users/gaetanbrison/Documents/GitHub/hi-paris/Image-Segmentation/fc-clip/datasets/cityscapes/gtFine/cityscapes_panoptic_train.json'
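Because the paths above are built by string joining, a typo only surfaces later when a file is opened. A minimal sketch of an early existence check, assuming the same directory layout; the helper name `missing_paths` is hypothetical and not part of the original notebook:

```python
import os

def missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]

# Hypothetical usage with the variables defined in the cells above:
# print(missing_paths([cityscapes_panoptic_train, cityscapes_panoptic_val]))
```

An empty list means every path was found; anything else lists the paths to fix before continuing.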

### 03 Get the Classes to Be Removed from the Cityscapes Training Set

In [70]:
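# Zero-based configuration index: per = 0 gives Conf 001, per = 7 gives Conf 008.
per = 7

def get_per_lab(ind):
    """Return the first ind + 1 label IDs, i.e. the labels removed by configuration ind + 1."""
    list_of_labels = [24, 25, 26, 27, 28, 31, 32, 33]
    return list_of_labels[:ind + 1]

labels_to_be_mv = get_per_lab(per)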

In [71]:
labels_to_be_mv

[24, 25, 26, 27, 28, 31, 32, 33]
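To make the relationship between the configuration number and the removed labels explicit, here is a small sketch that lists all eight configurations. The function name `labels_for_config` and the `LABEL_NAMES` table are assumptions for illustration; the names follow the standard Cityscapes label table and are worth checking against `data['categories']` in your copy of the dataset.

```python
# Label IDs in the removal order used by this notebook.
# The names are assumed from the standard Cityscapes label table.
LABEL_NAMES = {
    24: "person", 25: "rider", 26: "car", 27: "truck",
    28: "bus", 31: "train", 32: "motorcycle", 33: "bicycle",
}

def labels_for_config(conf_number):
    """Return the label IDs removed by configuration `conf_number` (1 to 8)."""
    ordered_ids = list(LABEL_NAMES)  # dicts keep insertion order in Python 3.7+
    if not 1 <= conf_number <= len(ordered_ids):
        raise ValueError(f"conf_number must be between 1 and {len(ordered_ids)}")
    return ordered_ids[:conf_number]

for conf in range(1, 9):
    ids = labels_for_config(conf)
    names = ", ".join(LABEL_NAMES[i] for i in ids)
    print(f"Conf {conf:03d}: {ids} ({names})")
```

With this numbering, configuration `n` corresponds to `per = n - 1` in the cell above.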

### 04 Read the Panoptic Annotations and Remove Unneeded Labels


In [72]:
with open(cityscapes_panoptic_train, 'r') as json_file:
    data = json.load(json_file)
print(data.keys())
print(len(data['categories']),len(data['images']),len(data['annotations']))
print(data['images'][0])
print(data['categories'][0])

dict_keys(['annotations', 'categories', 'images'])
19 2975 2975
{'file_name': 'aachen_000000_000019_gtFine_leftImg8bit.png', 'height': 1024, 'id': 'aachen_000000_000019', 'width': 2048}
{'color': [128, 64, 128], 'id': 7, 'isthing': 0, 'name': 'road', 'supercategory': 'flat'}
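Before removing anything, it can be useful to know how many segments each category contributes. The sketch below counts segments per `category_id`, assuming the structure used in this notebook (each annotation carries a `segments_info` list whose entries have a `category_id`). The function name and the toy data are illustrative only.

```python
from collections import Counter

def count_segments_per_category(data):
    """Count how many segments each category_id has across all annotations."""
    counts = Counter()
    for annotation in data["annotations"]:
        for segment in annotation["segments_info"]:
            counts[segment["category_id"]] += 1
    return counts

# Toy data with the same shape as the loaded JSON (values are made up).
toy_data = {
    "annotations": [
        {"segments_info": [{"category_id": 7}, {"category_id": 26}, {"category_id": 26}]},
        {"segments_info": [{"category_id": 7}, {"category_id": 24}]},
    ]
}
toy_counts = count_segments_per_category(toy_data)
print(toy_counts)
```

Running the same function on the real `data` before and after filtering gives a quick view of how many segments each configuration discards.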


This loop iterates over all the annotations in the training data. For each annotation, it keeps a segment in segments_info only if its category_id is not in the list of labels to be removed (labels_to_be_mv); every other segment is dropped.

In [73]:
for annotation in data['annotations']:
    # Keep only the segments whose category is not being removed.
    annotation["segments_info"] = [
        segment for segment in annotation["segments_info"]
        if segment['category_id'] not in labels_to_be_mv
    ]

This loop removes categories from the data['categories'] list if their id is in labels_to_be_mv.

In [74]:
# Keep only the categories that are not being removed.
data['categories'] = [
    category for category in data['categories']
    if category['id'] not in labels_to_be_mv
]
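The two filtering steps can also be wrapped in one function and tried on toy data. This is a minimal sketch rather than the notebook's own code: the name `remove_labels` is hypothetical, and unlike the cells above it works on a deep copy so the input is left unchanged. As in the notebook, only the JSON structure is edited; the panoptic PNG masks are not touched.

```python
import copy

def remove_labels(data, labels_to_remove):
    """Return a copy of `data` without segments or categories in `labels_to_remove`."""
    removed = set(labels_to_remove)  # set membership is faster than list membership
    result = copy.deepcopy(data)
    for annotation in result["annotations"]:
        annotation["segments_info"] = [
            segment for segment in annotation["segments_info"]
            if segment["category_id"] not in removed
        ]
    result["categories"] = [
        category for category in result["categories"]
        if category["id"] not in removed
    ]
    return result

# Toy data with the same shape as the Cityscapes panoptic JSON (values are made up).
toy_data = {
    "annotations": [
        {"segments_info": [{"category_id": 7}, {"category_id": 24}, {"category_id": 26}]},
        {"segments_info": [{"category_id": 25}]},
    ],
    "categories": [{"id": 7}, {"id": 24}, {"id": 25}, {"id": 26}],
    "images": [],
}
filtered = remove_labels(toy_data, [24, 25])
print(filtered["categories"])
print([a["segments_info"] for a in filtered["annotations"]])
```

An annotation whose segments are all removed keeps an empty `segments_info` list, which matches the behavior of the loops above.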

### 05 Saving the Modified Data to a New JSON File

In [76]:
# Build the output path next to the original annotations; the suffix is the configuration number (per + 1)
file_path = os.path.join(path_getFine, 'cityscapes_panoptic_train_' + str(per + 1) + '.json')

# Write dictionary to JSON file
with open(file_path, 'w') as json_file:
    json.dump(data, json_file)
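After writing the file, a quick reload can confirm that none of the removed IDs survived. The sketch below is an illustration under the same JSON structure as above; `find_leftover_labels` is a hypothetical helper, and the demo uses a temporary file with toy data instead of the real dataset.

```python
import json
import os
import tempfile

def find_leftover_labels(json_path, removed_ids):
    """Return the removed IDs that still appear in the saved categories or segments."""
    removed = set(removed_ids)
    with open(json_path, "r") as json_file:
        saved = json.load(json_file)
    leftover = {c["id"] for c in saved["categories"] if c["id"] in removed}
    for annotation in saved["annotations"]:
        leftover.update(
            s["category_id"] for s in annotation["segments_info"]
            if s["category_id"] in removed
        )
    return leftover

# Demo on a temporary file holding toy data (values are made up).
toy_data = {
    "annotations": [{"segments_info": [{"category_id": 7}, {"category_id": 26}]}],
    "categories": [{"id": 7}, {"id": 26}],
    "images": [],
}
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "toy_panoptic.json")
    with open(demo_path, "w") as json_file:
        json.dump(toy_data, json_file)
    clean_check = find_leftover_labels(demo_path, [24, 25])
    dirty_check = find_leftover_labels(demo_path, [26, 33])

print(clean_check, dirty_check)
```

On the real output, calling `find_leftover_labels(file_path, labels_to_be_mv)` should return an empty set.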