Evaluation of ConcatDataset #10822

Open · stwerner97 opened this issue Aug 22, 2023 · 5 comments

stwerner97 commented Aug 22, 2023

I want to train and evaluate a detection model on multiple datasets at once. To prepare the datasets, I map them to a joint label space with the same class labels. How do I set up the evaluator when using multiple concatenated datasets? Looking at pull request #3522, it seems that this use case should be supported. I would be fine with either aggregated reports or dataset-specific evaluations.

I've tried using a separate evaluation metric for each dataset:

val_evaluator = [
    dict(
        _scope_="mmdet",
        type="CocoMetric",
        ann_file=f"{objects365_data_root}/annotations/val.json",
        metric='bbox',
        proposal_nums=(1, 10, 100)
    ),
    dict(
        _scope_="mmdet",
        type="CocoMetric",
        ann_file=f"{coco2017_data_root}/annotations/val.json",
        metric='bbox',
        proposal_nums=(1, 10, 100)
    ),
]

This does not work and throws AssertionError: Results do not correspond to current coco set. I also tried an evaluator similar to this:

val_evaluator = [
    dict(
        _scope_="mmdet",
        type="CocoMetric",
        metric='bbox',
        proposal_nums=(1, 10, 100)
    )
]

which raises an error that the ann_file is missing: AssertionError: ground truth is required for evaluation when `ann_file` is not provided.

Below are the relevant parts of my configuration file.

<... some more code ...>

# -------------------- Dataset Definition -------------------- #
base_train_dataset = dict(
    type = "CocoDataset",
    ann_file = "annotations/train.json",
    data_prefix = dict(img='train/'),
    metainfo = metainfo,
    filter_cfg=dict(filter_empty_gt=True, min_size=32),
    backend_args = {{_base_.backend_args}},
    pipeline = {{_base_.train_pipeline}},
)

base_val_dataset = dict(
    type = "CocoDataset",
    ann_file = "annotations/val.json",
    data_prefix = dict(img='val/'),
    metainfo = metainfo,    
    filter_cfg=dict(filter_empty_gt=True, min_size=32),
    backend_args = {{_base_.backend_args}},
    pipeline= {{_base_.test_pipeline}},
)

objects365_train_dataset = dict(
    data_root=objects365_data_root,
    **base_train_dataset
)

coco2017_train_dataset = dict(
    data_root=coco2017_data_root,
    **base_train_dataset
)

objects365_val_dataset = dict(
    data_root=objects365_data_root,
    **base_val_dataset
)

coco2017_val_dataset = dict(
    data_root=coco2017_data_root,
    **base_val_dataset
)

combined_train_dataset = dict(
    type = "ConcatDataset",
    datasets = [objects365_train_dataset, coco2017_train_dataset]
)

combined_val_dataset = dict(
    type = "ConcatDataset",
    datasets = [objects365_val_dataset, coco2017_val_dataset]
)

# -------------------- Data Definition -------------------- #
data = dict(
    train = combined_train_dataset,
    val = combined_val_dataset,
)

# -------------------- Dataloader Definition -------------------- #
train_dataloader = dict(
    _delete_ = True,
    batch_size=1,
    num_workers=1,
    dataset=combined_train_dataset,
)

val_dataloader = dict(
    _delete_ = True,
    batch_size=1,
    num_workers=1,
    dataset=combined_val_dataset,
)

<... some more code ...>

The setup trains successfully for the first epoch, but then throws an error during evaluation.

Thanks for the great project! 😊

hhaAndroid (Collaborator) commented with a link to an example.

stwerner97 (Author) commented Aug 23, 2023

Thanks for the quick response @hhaAndroid! 😊

The example you've linked does not work for me and does not use a ConcatDataset. I've also double-checked that the evaluation of both coco2017_val_dataset and objects365_val_dataset works fine if I don't use ConcatDataset and instead train and evaluate on a single dataset. I've also verified that both datasets use consistent annotations, i.e., a single label with the same ID.

stwerner97 (Author) commented

I've checked what issues are raised when I use the following evaluator

val_evaluator = [
    dict(
        _scope_="mmdet",
        type="CocoMetric",
        metric='bbox',
        proposal_nums=(1, 10, 100)
    )
]

and noticed that, although the CocoMetric class raises AssertionError: ground truth is required for evaluation when `ann_file` is not provided, the ground-truth labels are available under another key. The lines of the CocoMetric class that raise the error are:

assert 'instances' in data_sample, \
    'ground truth is required for evaluation when ' \
    '`ann_file` is not provided'

While instances is not set in data_sample, gt_instances is. If I change the key (and make some changes to fit the expected downstream format of the ground truth), the evaluation works for me.

assert 'gt_instances' in data_sample, \
    'ground truth is required for evaluation when ' \
    '`ann_file` is not provided'

# collect the ground truth from gt_instances and convert it to the
# per-annotation dict format expected downstream
gt['anns'] = []

boxes = data_sample['gt_instances']['bboxes'].detach().cpu().numpy()
labels = data_sample['gt_instances']['labels'].detach().cpu().numpy()

for bbox, label in zip(boxes, labels):
    gt['anns'].append({'bbox': bbox, 'bbox_label': label})

@hhaAndroid could you confirm that the key gt_instances indeed holds the ground-truth bboxes and class labels? I'll check later whether this also works when a single dataset is used but ann_file isn't set in the evaluator.
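
As a quick sanity check, a sketch like the following (hypothetical; it subclasses CocoMetric only to print what each data_sample contains, and would still need to be registered to be used from a config) could confirm which keys are actually present:

from mmdet.evaluation import CocoMetric

class DebugCocoMetric(CocoMetric):
    """Hypothetical helper: print which keys each data_sample carries."""

    def process(self, data_batch, data_samples):
        # e.g. ['gt_instances', 'pred_instances', ...] vs. 'instances'
        for data_sample in data_samples:
            print(sorted(data_sample.keys()))
        super().process(data_batch, data_samples)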

stwerner97 (Author) commented Aug 24, 2023

Unfortunately, I don't think this works as expected, as the datasets (stored in COCO format) could have overlapping image IDs, which might give wrong results later on when the scores are aggregated.

In essence, I think one would either need to ensure that the image IDs are unique across datasets, or the dataloader would need to indicate which source dataset each sample belongs to.
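
A rough sketch of what I mean by making the image IDs unique across datasets (assuming plain COCO-format JSON files; the helper name and renumbering scheme are just for illustration):

import json

def merge_coco_files(paths, out_path):
    """Merge COCO-format JSON files, renumbering image and annotation ids
    so they stay unique across the source datasets."""
    merged = {"images": [], "annotations": [], "categories": None}
    next_img_id, next_ann_id = 1, 1
    for path in paths:
        with open(path) as f:
            data = json.load(f)
        # categories are assumed identical across datasets (joint label space)
        merged["categories"] = data["categories"]
        id_map = {}
        for img in data["images"]:
            id_map[img["id"]] = next_img_id
            img["id"] = next_img_id
            merged["images"].append(img)
            next_img_id += 1
        for ann in data["annotations"]:
            ann["image_id"] = id_map[ann["image_id"]]
            ann["id"] = next_ann_id
            merged["annotations"].append(ann)
            next_ann_id += 1
    with open(out_path, "w") as f:
        json.dump(merged, f)

# hypothetical usage:
# merge_coco_files(
#     [f"{objects365_data_root}/annotations/val.json",
#      f"{coco2017_data_root}/annotations/val.json"],
#     "merged_val.json")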

oomq commented Sep 1, 2023

> I've checked what issues are raised when I use the following evaluator [...] While instances is not set in data_sample, gt_instances is. [...] @hhaAndroid could you confirm that the key gt_instances indeed holds the ground-truth bboxes and class labels?

@stwerner97 I encountered the same problem as you.

AssertionError: Results do not correspond to current coco set

So I modified the config file like this:

val_evaluator = [
    dict(
        _scope_="mmdet",
        type='CocoMetric',
        metric='bbox',
        ann_file=[
            "{}/test.json".format(data_rootsrsdd),
            "{}/test.json".format(data_rootssdd),
            "{}/test.json".format(data_rootrsdd),
            "{}/test.json".format(data_rootdssdd)],
        format_only=False,
        backend_args=backend_args),

]

I then found that the error was caused by self._coco_api (see {root}\mmdetection\mmdet\evaluation\metrics\coco_metric.py): when the same CocoMetric type is used, it only keeps the path of the last ann_file. So I modified how self._coco_api = COCO(local_path) loads the annotations, so that it can merge several JSON files at once. The code is in pycocotools\coco.py.

# in pycocotools/coco.py (json, time and defaultdict are already imported there)
import json
import time
from collections import defaultdict


class COCO:
    def __init__(self, annotation_file=None):
        # load dataset
        self.dataset, self.anns, self.cats, self.imgs = dict(), dict(), dict(), dict()
        self.imgToAnns, self.catToImgs = defaultdict(list), defaultdict(list)
        if annotation_file is not None:
            if isinstance(annotation_file, list):
                print('loading multiple annotation files into memory...')
                json_contents = []
                for ann in annotation_file:
                    with open(ann, 'r') as f:
                        json_contents.append(json.load(f))

                # merge images and annotations from all files; info and
                # categories are taken from the last file (assumed to be
                # identical across datasets)
                merged_images = []
                merged_annotations = []
                merged_info = []
                merged_class = []
                for _json in json_contents:
                    merged_images += _json['images']
                    merged_annotations += _json['annotations']
                    merged_info = _json['info']
                    merged_class = _json['categories']

                self.dataset = {
                    'info': merged_info,
                    'categories': merged_class,
                    'images': merged_images,
                    'annotations': merged_annotations,
                }
                self.createIndex()

            else:
                print('loading annotations into memory...')
                tic = time.time()
                with open(annotation_file, 'r') as f:
                    dataset = json.load(f)
                assert type(dataset) == dict, 'annotation file format {} not supported'.format(type(dataset))
                print('Done (t={:0.2f}s)'.format(time.time() - tic))
                self.dataset = dataset
                self.createIndex()
It can work, but the recall was very low and the precision was zero. I don't know whether it's a problem with the model or with the code I modified, and I haven't had time to investigate yet. By the way, I took your advice and made sure that each image corresponds to a unique "img_id" across the different JSON files. I'm a beginner, but I hope this helps you.
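
For anyone hitting the same low scores, a small check like the following sketch (assuming plain COCO-format JSON files; the function name and paths are made up) can reveal colliding image ids or differing category definitions between the files:

import json
from itertools import combinations

def check_coco_files(paths):
    """Report overlapping image ids and mismatched categories between COCO files."""
    data = {}
    for path in paths:
        with open(path) as f:
            data[path] = json.load(f)
    for a, b in combinations(paths, 2):
        ids_a = {img["id"] for img in data[a]["images"]}
        ids_b = {img["id"] for img in data[b]["images"]}
        overlap = ids_a & ids_b
        if overlap:
            print(f"{a} and {b} share {len(overlap)} image ids")
        if data[a]["categories"] != data[b]["categories"]:
            print(f"{a} and {b} have different category definitions")

# hypothetical usage:
# check_coco_files(["dataset_a/test.json", "dataset_b/test.json"])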
