
COCO Dataset improvement #171

@cregouby

Description

Current Situation

coco_detection_dataset currently handles both object detection and object segmentation at the same time. This comes with:

  • the need to keep a huge 500 MB+ $annotation object, holding the annotations for both tasks, in memory for the whole lifetime of the dataset (i.e. forever)
  • complex dataset documentation that makes it confusing to run one task or the other, even though the two tasks will never run together (they target different model architectures)
  • the downloaded archive and ann_zip are stored at the root of rappdirs::user_cache_dir("torch") without a prefix =, and their filenames do not clearly identify them as COCO files, so end users end up with a hard-to-identify 6 GB val2014.zip on disk when they hit the "disk full" exception...

Suggestion

As the two tasks share no common DNN implementation, we could easily (at instantiation)
split coco_detection_dataset() into

  • coco_detection_dataset()
  • coco_segmentation_dataset()

On the website (see pkgdown.yml) we should also separate classification datasets from non-classification datasets.

That would allow much more straightforward examples and lower the memory footprint of $annotation. It would also ease #170.

  • The archive and ann_zip should be moved into, or prefixed by, a /coco folder for easier identification in the cache folder.
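To make the proposal concrete, a sketch of what the split could look like for the end user. Note that coco_segmentation_dataset() does not exist yet, and the argument names (root, train, download) are illustrative assumptions, not the actual package API:

```r
library(torchvision)

# Detection-only dataset: would load only the bounding-box annotations,
# keeping the in-memory $annotation object much smaller.
# (Hypothetical API sketch for this proposal.)
ds_det <- coco_detection_dataset(
  root = file.path(rappdirs::user_cache_dir("torch"), "coco"),
  train = FALSE,
  download = TRUE
)

# Segmentation-only dataset: would load only the mask annotations.
# (Proposed new function, not yet implemented.)
ds_seg <- coco_segmentation_dataset(
  root = file.path(rappdirs::user_cache_dir("torch"), "coco"),
  train = FALSE,
  download = TRUE
)
```

With the coco/ prefix, the cached val2014.zip and the annotation archive would live under a clearly named subfolder of the torch cache directory instead of its root.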
