
CocoDetection dataset incompatible with Faster R-CNN model #2720

Closed
sheromon opened this issue Sep 28, 2020 · 7 comments
sheromon commented Sep 28, 2020

🐛 Bug

Before I report my issue, I'd like to say that the TorchVision Object Detection Finetuning tutorial is excellent! I've found the code to be easy to work with, but the tutorial made it even more accessible -- it got me training on my custom dataset in just a few hours (including making my own Docker image with GPU support).

The CocoDetection dataset appears to be incompatible with the Faster R-CNN model. The TorchVision Object Detection Finetuning tutorial specifies the format a dataset must follow to be compatible with the Mask R-CNN model: the dataset's __getitem__ method should return an image and a target dict with fields boxes, labels, area, etc. The CocoDetection dataset instead returns the raw COCO annotations as the target, which does not match that specification.

The evaluation code included with the tutorial (engine.evaluate) also appears to be incompatible with the built-in CocoDetection dataset format.
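For reference, here is a minimal sketch contrasting the two formats. The values are placeholders, not from the tutorial; in the real pipeline each field of the expected target is a torch tensor rather than a plain list:

```python
# Per-image target dict the tutorial's training/evaluation code expects.
# Placeholder values; in practice each field is a torch tensor
# (boxes: float32 [N, 4] as [xmin, ymin, xmax, ymax], labels: int64 [N], ...).
expected_target = {
    "boxes": [[10.0, 20.0, 40.0, 60.0]],
    "labels": [1],
    "image_id": [0],
    "area": [1200.0],
    "iscrowd": [0],
}

# What torchvision.datasets.CocoDetection returns as `target`:
# a list of raw COCO annotation dicts, one per object.
raw_target = [
    {"bbox": [10.0, 20.0, 30.0, 40.0],  # COCO box format: [x, y, width, height]
     "category_id": 1, "image_id": 0, "area": 1200.0, "iscrowd": 0},
]
```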

To Reproduce

Steps to reproduce the behavior:

Follow the steps in the TorchVision Object Detection Finetuning tutorial, substituting a dataset with COCO annotations and using torchvision.datasets.CocoDetection as the dataset class instead of the custom dataset class defined in the tutorial. I hit an error within the train_one_epoch function in engine.py.

The error message is below.
File "references/detection/engine.py", line 28, in
targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
AttributeError: 'list' object has no attribute 'items'
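The failure can be reproduced without any model or data: engine.py assumes each target is a dict and calls .items() on it, while CocoDetection hands back a list of annotation dicts (hypothetical annotation values below):

```python
# What CocoDetection.__getitem__ returns as `target`: a list, not a dict.
raw_target = [{"bbox": [10.0, 20.0, 30.0, 40.0], "category_id": 1}]

try:
    # What engine.py effectively does for each target in the batch:
    {k: v for k, v in raw_target.items()}
except AttributeError as exc:
    print(exc)  # 'list' object has no attribute 'items'
```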

Expected behavior

I expected training and evaluation to run successfully when using torchvision.datasets.CocoDetection. I was able to run training and evaluation by making my own custom COCO dataset class and manipulating the target output to match the specified format.

Environment

Collecting environment information...
PyTorch version: 1.6.0+cu101
Is debug build: False
CUDA used to build PyTorch: 10.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: Could not collect

Python version: 3.7 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: Quadro T2000
Nvidia driver version: 430.64
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] torch==1.6.0+cu101
[pip3] torchvision==0.7.0+cu101
[conda] Could not collect

Additional context

None

cc @pmeier


fmassa commented Sep 28, 2020

Hi,

Thanks for opening this issue. Your assessment is correct: the current CocoDetection implementation does not return targets in the format that the training code expects.
In our reference scripts, we perform the conversion needed so that you can use it, see

class CocoDetection(torchvision.datasets.CocoDetection):
    def __init__(self, img_folder, ann_file, transforms):
        super(CocoDetection, self).__init__(img_folder, ann_file)
        self._transforms = transforms

    def __getitem__(self, idx):
        img, target = super(CocoDetection, self).__getitem__(idx)
        image_id = self.ids[idx]
        target = dict(image_id=image_id, annotations=target)
        if self._transforms is not None:
            img, target = self._transforms(img, target)
        return img, target

where we add an extra image_id to the targets.

Apart from this image_id, the other metadata can easily be obtained via a transform; the reference scripts install it with

t = [ConvertCocoPolysToMask()]

and define it as:
class ConvertCocoPolysToMask(object):
    def __call__(self, image, target):
        w, h = image.size

        image_id = target["image_id"]
        image_id = torch.tensor([image_id])

        anno = target["annotations"]
        anno = [obj for obj in anno if obj['iscrowd'] == 0]

        boxes = [obj["bbox"] for obj in anno]
        # guard against no boxes via resizing
        boxes = torch.as_tensor(boxes, dtype=torch.float32).reshape(-1, 4)
        boxes[:, 2:] += boxes[:, :2]
        boxes[:, 0::2].clamp_(min=0, max=w)
        boxes[:, 1::2].clamp_(min=0, max=h)

        classes = [obj["category_id"] for obj in anno]
        classes = torch.tensor(classes, dtype=torch.int64)

        segmentations = [obj["segmentation"] for obj in anno]
        masks = convert_coco_poly_to_mask(segmentations, h, w)

        keypoints = None
        if anno and "keypoints" in anno[0]:
            keypoints = [obj["keypoints"] for obj in anno]
            keypoints = torch.as_tensor(keypoints, dtype=torch.float32)
            num_keypoints = keypoints.shape[0]
            if num_keypoints:
                keypoints = keypoints.view(num_keypoints, -1, 3)

        keep = (boxes[:, 3] > boxes[:, 1]) & (boxes[:, 2] > boxes[:, 0])
        boxes = boxes[keep]
        classes = classes[keep]
        masks = masks[keep]
        if keypoints is not None:
            keypoints = keypoints[keep]

        target = {}
        target["boxes"] = boxes
        target["labels"] = classes
        target["masks"] = masks
        target["image_id"] = image_id
        if keypoints is not None:
            target["keypoints"] = keypoints

        # for conversion to coco api
        area = torch.tensor([obj["area"] for obj in anno])
        iscrowd = torch.tensor([obj["iscrowd"] for obj in anno])
        target["area"] = area
        target["iscrowd"] = iscrowd

        return image, target
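The key step in that transform is the box conversion (boxes[:, 2:] += boxes[:, :2]): COCO stores boxes as [x, y, width, height], while the models expect [xmin, ymin, xmax, ymax]. A torch-free sketch of the same conversion, with hypothetical box values:

```python
def coco_xywh_to_xyxy(bbox):
    """Convert a COCO [x, y, width, height] box to [xmin, ymin, xmax, ymax]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]

print(coco_xywh_to_xyxy([10.0, 20.0, 30.0, 40.0]))  # [10.0, 20.0, 40.0, 60.0]
```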

There are a few reasons why this is the case:

  • the CocoDetection dataset was introduced before the reference training scripts were implemented
  • there was no way of obtaining the image_id from the raw annotations, which is needed for evaluation.
  • we didn't want to break backwards-compatibility on the return type of the CocoDetection

Overall, my thinking is that the return type of a dataset is tightly coupled with the training / evaluation script for a particular model. Unfortunately there is now this mismatch between the two, and I'm not sure how to fix it without a fairly drastic BC-breaking change (short of introducing another dataset class for the new COCO-style dataset).

I'd be glad to hear your thoughts on this.

@sheromon (Author)

Thanks for your quick response! Your explanation makes sense, and the additional code that you referenced is very helpful.

I think you'd agree that the intuitive, natural behavior would be for the output of CocoDetection to match what the object detection training code currently expects. I agree that making the change would break backward compatibility. Could the current behavior be deprecated and later changed so the two line up? Transforming the target is a valid workaround, so it's not a big problem, but it would be unexpected for new users.


fmassa commented Sep 29, 2020

We could potentially change the output format (with a deprecation cycle over a few releases to let users fix their code), but I'm always wary of breaking changes, as there are many tutorials and much code out there that use the current behavior.

We have been very careful in the past with breaking changes like this one, so this is something that deserves further discussion.

cc @dongreenberg let's discuss this at our next meeting

@sheromon (Author)

Thanks for your consideration.

At the very least, if someone is confused like I was, they can find this issue!

rikkudo commented Apr 25, 2022


Hi, I have this problem too. Did you manage to transform the target? How did you do it?

@sheromon (Author)

Hi, @rikkudo, this was a while ago, so I can't guarantee this code, but I ended up doing something like this.

from collections import defaultdict
import json
from pathlib import Path

from PIL import Image
import torch
from torch.utils.data import Dataset

class CocoDataset(Dataset):
    """PyTorch dataset for COCO annotations."""

    def __init__(self, data_dir, transforms=None):
        """Load COCO annotation data."""
        self.data_dir = Path(data_dir)
        self.transforms = transforms

        # load the COCO annotations json
        anno_file_path = self.data_dir/'coco_annotations.json'
        with open(str(anno_file_path)) as file_obj:
            self.coco_data = json.load(file_obj)
        # put all of the annos into a dict where keys are image IDs to speed up retrieval
        self.image_id_to_annos = defaultdict(list)
        for anno in self.coco_data['annotations']:
            image_id = anno['image_id']
            self.image_id_to_annos[image_id] += [anno]

    def __len__(self):
        return len(self.coco_data['images'])

    def __getitem__(self, index):
        """Return tuple of image and labels as torch tensors."""
        image_data = self.coco_data['images'][index]
        image_id = image_data['id']
        image_path = self.data_dir/'images'/image_data['file_name']
        image = Image.open(image_path)

        annos = self.image_id_to_annos[image_id]
        anno_data = {
            'boxes': [],
            'labels': [],
            'area': [],
            'iscrowd': [],
        }
        for anno in annos:
            coco_bbox = anno['bbox']
            left = coco_bbox[0]
            top = coco_bbox[1]
            right = coco_bbox[0] + coco_bbox[2]
            bottom = coco_bbox[1] + coco_bbox[3]
            area = coco_bbox[2] * coco_bbox[3]
            anno_data['boxes'].append([left, top, right, bottom])
            anno_data['labels'].append(anno['category_id'])
            anno_data['area'].append(area)
            anno_data['iscrowd'].append(anno['iscrowd'])

        target = {
            'boxes': torch.as_tensor(anno_data['boxes'], dtype=torch.float32),
            'labels': torch.as_tensor(anno_data['labels'], dtype=torch.int64),
            'image_id': torch.tensor([image_id]),  # pylint: disable=not-callable (false alarm)
            'area': torch.as_tensor(anno_data['area'], dtype=torch.float32),
            'iscrowd': torch.as_tensor(anno_data['iscrowd'], dtype=torch.int64),
        }

        if self.transforms is not None:
            image, target = self.transforms(image, target)

        return image, target
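One detail the snippet above doesn't show: the detection models take a list of images and a list of targets, so a dataset like this is normally paired with a collate function like the one in references/detection/utils.py, which simply transposes the batch. A minimal sketch with placeholder values standing in for the (image, target) pairs:

```python
def collate_fn(batch):
    """Transpose a batch of (image, target) pairs into (images, targets)."""
    return tuple(zip(*batch))

# Toy batch; in practice each pair is (image tensor, target dict).
batch = [("img0", {"labels": [1]}), ("img1", {"labels": [2]})]
images, targets = collate_fn(batch)
print(images)   # ('img0', 'img1')
print(targets)  # ({'labels': [1]}, {'labels': [2]})
```

It is passed to the loader as DataLoader(dataset, batch_size=..., collate_fn=collate_fn), which keeps variable-sized images and per-image target dicts from being stacked into a single tensor.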

rikkudo commented Apr 26, 2022


Thank you for the help, I really appreciate it!
