Is DetectionDataset intended behavior to store all images? #1270

David-rn · 2024-06-07T06:43:15Z

Search before asking

I have searched the Supervision issues and found no similar feature requests.

Question

Hi! 👋

I found that supervision was using more than 40 GB of RAM on my system when I used DetectionDataset to export a relatively large dataset with as_yolo. I realized that DetectionDataset uses a dictionary that maps each image name to the image itself, so every image is held in memory.

I wanted to ask whether this is the expected behavior, since it could consume all the available RAM with some datasets.
If it is not, maybe it could store a path to the original image and only open the file when exporting the dataset.

As an example of this behavior I used this skeleton script:

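from os.path import join

import cv2
import supervision as sv

# list_images, images_path, gt_file_path, classes, the output paths and
# get_detection_from_file are defined elsewhere in the script.
images = {}
annotations = {}

for image_name in list_images:
    # Each decoded image stays in the dict, so memory grows with the dataset
    image = cv2.imread(join(images_path, image_name))
    detection = get_detection_from_file(gt_file_path)

    images[image_name] = image
    annotations[image_name] = detection

dataset = sv.DetectionDataset(
    classes=classes,
    images=images,
    annotations=annotations,
)
dataset.as_yolo(
    annotations_directory_path=output_gt_path,
    images_directory_path=output_images_path,
)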
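
The path-based idea suggested above could look roughly like the sketch below. It is only an illustration: `LazyImages` and `read_bytes` are hypothetical names, not part of supervision, and the loader is passed in so the example runs without OpenCV (in a real script it would presumably be `cv2.imread`).

```python
from collections.abc import Mapping
from pathlib import Path
from typing import Callable, Dict, Iterator


class LazyImages(Mapping):
    """Hypothetical mapping of image name -> image that reads files on access."""

    def __init__(self, paths: Dict[str, Path], loader: Callable[[str], object]):
        self._paths = dict(paths)  # only the paths are kept in memory
        self._loader = loader      # assumption: e.g. cv2.imread in a real pipeline

    def __getitem__(self, name: str):
        # Load on demand; nothing is cached, so at most one image is held at a time
        return self._loader(str(self._paths[name]))

    def __iter__(self) -> Iterator[str]:
        return iter(self._paths)

    def __len__(self) -> int:
        return len(self._paths)


def read_bytes(path: str) -> bytes:
    # Stand-in loader so the sketch runs without OpenCV installed
    return Path(path).read_bytes()
```

With this shape, `images[name]` reads the file every time it is called, trading repeated disk access for a flat memory footprint. Whether `sv.DetectionDataset` accepts such a mapping in place of a plain dict is not something this thread confirms.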

Additional

No response

LinasKo · 2024-06-07T08:18:31Z

Hi @David-rn 👋

You've hit on something very high on our priority list.
In the next supervision release, we'd like to start implementing a more efficient data loader.

You can find some context here if it's something that catches your interest: #316

SkalskiP · 2024-06-07T08:49:09Z

@LinasKo and @David-rn, I'm closing this issue and marking it as duplicate.

David-rn · 2024-06-07T15:57:11Z

Perfect! I didn't find that issue, thanks @LinasKo for the clarification. If you need anything, glad to help!

SkalskiP · 2024-06-07T18:23:09Z

For now, we will work with @LinasKo to design the proper API. Once we have created the first dataset loader (probably for the YOLO format), you can pick the next one (COCO or PASCAL).

David-rn added the "question" label (Further information is requested) on Jun 7, 2024

SkalskiP closed this as completed on Jun 7, 2024

SkalskiP added the "duplicate" label (This issue or pull request already exists) on Jun 7, 2024
