
CocoDetection dataset incompatible with Faster R-CNN model #2720

Closed
sheromon opened this issue Sep 28, 2020 · 7 comments
sheromon commented Sep 28, 2020

🐛 Bug

Before I report my issue, I'd like to say that the TorchVision Object Detection Finetuning tutorial is excellent! I've found the code to be easy to work with, but the tutorial made it even more accessible -- it got me training on my custom dataset in just a few hours (including making my own Docker image with GPU support).

The CocoDetection dataset appears to be incompatible with the Faster R-CNN model. The TorchVision Object Detection Finetuning tutorial specifies the format a dataset must follow to be compatible with the Mask R-CNN model: the dataset's __getitem__ method should return an image and a target dict with fields boxes, labels, area, etc. The CocoDetection dataset instead returns the raw COCO annotations as the target, which does not match that specification.

The evaluation code included with the tutorial (engine.evaluate) also appears to be incompatible with the built-in CocoDetection dataset format.
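For reference, here is a minimal sketch contrasting the two formats. The values are placeholders, not from the tutorial; in the real pipeline each field of the expected target is a torch tensor rather than a plain list:

```python
# Per-image target dict the tutorial's training/evaluation code expects.
# Placeholder values; in practice each field is a torch tensor
# (boxes: float32 [N, 4] as [xmin, ymin, xmax, ymax], labels: int64 [N], ...).
expected_target = {
    "boxes": [[10.0, 20.0, 40.0, 60.0]],
    "labels": [1],
    "image_id": [0],
    "area": [1200.0],
    "iscrowd": [0],
}

# What torchvision.datasets.CocoDetection returns as `target`:
# a list of raw COCO annotation dicts, one per object.
raw_target = [
    {"bbox": [10.0, 20.0, 30.0, 40.0],  # COCO box format: [x, y, width, height]
     "category_id": 1, "image_id": 0, "area": 1200.0, "iscrowd": 0},
]
```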

To Reproduce

Steps to reproduce the behavior:

Follow the steps in the TorchVision Object Detection Finetuning tutorial, substituting a dataset with COCO annotations and using torchvision.datasets.CocoDetection as the dataset class instead of the custom dataset class defined in the tutorial. I hit an error within the train_one_epoch function in engine.py.

The error message is below.
File "references/detection/engine.py", line 28, in
targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
AttributeError: 'list' object has no attribute 'items'
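The failure can be reproduced without any model or data: engine.py assumes each target is a dict and calls .items() on it, while CocoDetection hands back a list of annotation dicts (hypothetical annotation values below):

```python
# What CocoDetection.__getitem__ returns as `target`: a list, not a dict.
raw_target = [{"bbox": [10.0, 20.0, 30.0, 40.0], "category_id": 1}]

try:
    # What engine.py effectively does for each target in the batch:
    {k: v for k, v in raw_target.items()}
except AttributeError as exc:
    print(exc)  # 'list' object has no attribute 'items'
```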

Expected behavior

I expected training and evaluation to run successfully when using torchvision.datasets.CocoDetection. I was able to run training and evaluation by making my own custom COCO dataset class and manipulating the target output to match the specified format.

Environment

Collecting environment information...
PyTorch version: 1.6.0+cu101
Is debug build: False
CUDA used to build PyTorch: 10.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: Could not collect

Python version: 3.7 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: Quadro T2000
Nvidia driver version: 430.64
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] torch==1.6.0+cu101
[pip3] torchvision==0.7.0+cu101
[conda] Could not collect

Additional context

None

cc @pmeier


fmassa commented Sep 28, 2020

Hi,

Thanks for opening this issue. Your assessment is correct: the current CocoDetection implementation does not return targets in the format that the training code expects.
In our reference scripts, we perform the conversion needed so that you can use it, see

class CocoDetection(torchvision.datasets.CocoDetection):
    def __init__(self, img_folder, ann_file, transforms):
        super(CocoDetection, self).__init__(img_folder, ann_file)
        self._transforms = transforms

    def __getitem__(self, idx):
        img, target = super(CocoDetection, self).__getitem__(idx)
        image_id = self.ids[idx]
        target = dict(image_id=image_id, annotations=target)
        if self._transforms is not None:
            img, target = self._transforms(img, target)
        return img, target

where we add an extra image_id to the targets.

Apart from this image_id, the other metadata can easily be obtained via a transform; the reference scripts install it with

t = [ConvertCocoPolysToMask()]

and define it as:
class ConvertCocoPolysToMask(object):
    def __call__(self, image, target):
        w, h = image.size

        image_id = target["image_id"]
        image_id = torch.tensor([image_id])

        anno = target["annotations"]
        anno = [obj for obj in anno if obj['iscrowd'] == 0]

        boxes = [obj["bbox"] for obj in anno]
        # guard against no boxes via resizing
        boxes = torch.as_tensor(boxes, dtype=torch.float32).reshape(-1, 4)
        boxes[:, 2:] += boxes[:, :2]
        boxes[:, 0::2].clamp_(min=0, max=w)
        boxes[:, 1::2].clamp_(min=0, max=h)

        classes = [obj["category_id"] for obj in anno]
        classes = torch.tensor(classes, dtype=torch.int64)

        segmentations = [obj["segmentation"] for obj in anno]
        masks = convert_coco_poly_to_mask(segmentations, h, w)

        keypoints = None
        if anno and "keypoints" in anno[0]:
            keypoints = [obj["keypoints"] for obj in anno]
            keypoints = torch.as_tensor(keypoints, dtype=torch.float32)
            num_keypoints = keypoints.shape[0]
            if num_keypoints:
                keypoints = keypoints.view(num_keypoints, -1, 3)

        keep = (boxes[:, 3] > boxes[:, 1]) & (boxes[:, 2] > boxes[:, 0])
        boxes = boxes[keep]
        classes = classes[keep]
        masks = masks[keep]
        if keypoints is not None:
            keypoints = keypoints[keep]

        target = {}
        target["boxes"] = boxes
        target["labels"] = classes
        target["masks"] = masks
        target["image_id"] = image_id
        if keypoints is not None:
            target["keypoints"] = keypoints

        # for conversion to coco api
        area = torch.tensor([obj["area"] for obj in anno])
        iscrowd = torch.tensor([obj["iscrowd"] for obj in anno])
        target["area"] = area
        target["iscrowd"] = iscrowd

        return image, target
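The key step in that transform is the box conversion (boxes[:, 2:] += boxes[:, :2]): COCO stores boxes as [x, y, width, height], while the models expect [xmin, ymin, xmax, ymax]. A torch-free sketch of the same conversion, with hypothetical box values:

```python
def coco_xywh_to_xyxy(bbox):
    """Convert a COCO [x, y, width, height] box to [xmin, ymin, xmax, ymax]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]

print(coco_xywh_to_xyxy([10.0, 20.0, 30.0, 40.0]))  # [10.0, 20.0, 40.0, 60.0]
```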

There are a few reasons why this is the case:

  • the CocoDetection dataset was introduced before the reference training scripts were implemented
  • there was no way of obtaining the image_id from the raw annotations, which is needed for evaluation.
  • we didn't want to break backwards-compatibility on the return type of the CocoDetection

Overall, my thinking is that the return type of a dataset is tightly coupled with the training / evaluation script for a particular model. Unfortunately there is now this mismatch between the two, and I'm not sure how to fix it without a fairly drastic BC-breaking change (short of introducing another dataset class for the new COCO-style dataset).

I'd be glad to hear your thoughts on this.

@sheromon (Author)

Thanks for your quick response! Your explanation makes sense, and the additional code that you referenced is very helpful.

I think you'd agree that the intuitive, natural behavior would be for the output of CocoDetection to match what the object detection training code currently expects. I agree that making the change would break backward compatibility. Could the current behavior be deprecated and later changed so the two line up? Transforming the target is a valid workaround, so it's not a big problem, but it would be unexpected for new users.


fmassa commented Sep 29, 2020

We could potentially change the output format (with a deprecation cycle over a few releases to let users fix their code), but I'm always wary of breaking changes, as there are many tutorials and much code out there that use the current behavior.

We have been very careful in the past with breaking changes like this one, so this is something that deserves further discussion.

cc @dongreenberg let's discuss this at our next meeting

@sheromon (Author)

Thanks for your consideration.

At the very least, if someone is confused like I was, they can find this issue!

rikkudo commented Apr 25, 2022


Hi, I have this problem too. Did you manage to transform the target? How did you do it?

@sheromon (Author)

Hi, @rikkudo, this was a while ago, so I can't guarantee this code, but I ended up doing something like this.

from collections import defaultdict
import json
from pathlib import Path

from PIL import Image
import torch
from torch.utils.data import Dataset

class CocoDataset(Dataset):
    """PyTorch dataset for COCO annotations."""

    def __init__(self, data_dir, transforms=None):
        """Load COCO annotation data."""
        self.data_dir = Path(data_dir)
        self.transforms = transforms

        # load the COCO annotations json
        anno_file_path = self.data_dir/'coco_annotations.json'
        with open(str(anno_file_path)) as file_obj:
            self.coco_data = json.load(file_obj)
        # put all of the annos into a dict where keys are image IDs to speed up retrieval
        self.image_id_to_annos = defaultdict(list)
        for anno in self.coco_data['annotations']:
            image_id = anno['image_id']
            self.image_id_to_annos[image_id] += [anno]

    def __len__(self):
        return len(self.coco_data['images'])

    def __getitem__(self, index):
        """Return tuple of image and labels as torch tensors."""
        image_data = self.coco_data['images'][index]
        image_id = image_data['id']
        image_path = self.data_dir/'images'/image_data['file_name']
        image = Image.open(image_path)

        annos = self.image_id_to_annos[image_id]
        anno_data = {
            'boxes': [],
            'labels': [],
            'area': [],
            'iscrowd': [],
        }
        for anno in annos:
            coco_bbox = anno['bbox']
            left = coco_bbox[0]
            top = coco_bbox[1]
            right = coco_bbox[0] + coco_bbox[2]
            bottom = coco_bbox[1] + coco_bbox[3]
            area = coco_bbox[2] * coco_bbox[3]
            anno_data['boxes'].append([left, top, right, bottom])
            anno_data['labels'].append(anno['category_id'])
            anno_data['area'].append(area)
            anno_data['iscrowd'].append(anno['iscrowd'])

        target = {
            'boxes': torch.as_tensor(anno_data['boxes'], dtype=torch.float32),
            'labels': torch.as_tensor(anno_data['labels'], dtype=torch.int64),
            'image_id': torch.tensor([image_id]),  # pylint: disable=not-callable (false alarm)
            'area': torch.as_tensor(anno_data['area'], dtype=torch.float32),
            'iscrowd': torch.as_tensor(anno_data['iscrowd'], dtype=torch.int64),
        }

        if self.transforms is not None:
            image, target = self.transforms(image, target)

        return image, target
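One detail the snippet above doesn't show: the detection models take a list of images and a list of targets, so a dataset like this is normally paired with a collate function like the one in references/detection/utils.py, which simply transposes the batch. A minimal sketch with placeholder values standing in for the (image, target) pairs:

```python
def collate_fn(batch):
    """Transpose a batch of (image, target) pairs into (images, targets)."""
    return tuple(zip(*batch))

# Toy batch; in practice each pair is (image tensor, target dict).
batch = [("img0", {"labels": [1]}), ("img1", {"labels": [2]})]
images, targets = collate_fn(batch)
print(images)   # ('img0', 'img1')
print(targets)  # ({'labels': [1]}, {'labels': [2]})
```

It is passed to the loader as DataLoader(dataset, batch_size=..., collate_fn=collate_fn), which keeps variable-sized images and per-image target dicts from being stacked into a single tensor.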

rikkudo commented Apr 26, 2022


Thank you for the help, I really appreciate it!
