Is DetectionDataset intended behavior to store all images? #1270

Closed
1 task done
David-rn opened this issue Jun 7, 2024 · 4 comments
Labels
duplicate This issue or pull request already exists question Further information is requested

Comments

@David-rn
Contributor

David-rn commented Jun 7, 2024

Search before asking

  • I have searched the Supervision issues and found no similar feature requests.

Question

Hi! 👋

I found that supervision was using more than 40 GB of RAM on my system when I used DetectionDataset to save a relatively large dataset with as_yolo. I realized that DetectionDataset uses a dictionary to map each image name to its image, so every image is held in memory.
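
To see how a dictionary of decoded images can reach this scale, here is a rough back-of-the-envelope estimate. The image count and resolution are illustrative assumptions, not figures from the dataset in this report; the only fact relied on is that cv2.imread returns an 8-bit, 3-channel array by default, so each decoded image costs height × width × 3 bytes regardless of its compressed size on disk.

```python
def decoded_size_bytes(height: int, width: int, channels: int = 3) -> int:
    """Bytes used by one decoded 8-bit image array of the given shape."""
    return height * width * channels


# Hypothetical dataset: 7,000 Full HD images (illustrative numbers only).
num_images = 7_000
per_image = decoded_size_bytes(1080, 1920)
total_gib = num_images * per_image / 1024**3

print(f"{per_image / 1024**2:.1f} MiB per image, {total_gib:.1f} GiB in total")
```

Under these assumptions the decoded arrays alone need a little over 40 GiB, even though the same images may take only a few gigabytes as JPEG files.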

I wanted to ask whether this is the expected behavior, since it can consume all available RAM with some datasets.
If it is not, maybe the dataset could store a path reference to each original image and open the image only when exporting.

As an example of this behavior I used this skeleton script:

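import cv2
import supervision as sv
from os.path import join

# list_images, images_path, gt_file_path, classes, the output paths, and
# get_detection_from_file are placeholders defined elsewhere.
images = {}
annotations = {}

for image_name in list_images:
    # Every decoded image stays in the `images` dict until export.
    image = cv2.imread(join(images_path, image_name))
    detection = get_detection_from_file(gt_file_path)

    images[image_name] = image
    annotations[image_name] = detection

dataset = sv.DetectionDataset(
    classes=classes,
    images=images,
    annotations=annotations,
)
dataset.as_yolo(annotations_directory_path=output_gt_path, images_directory_path=output_images_path)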
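
The idea of keeping only a path per image and opening the file on demand could be sketched roughly as follows. This is a minimal illustration, not supervision's API: LazyImages is a hypothetical name, the loader is passed in by the caller, and whether DetectionDataset accepts such a mapping in place of a plain dict is an assumption that has not been verified here.

```python
from collections.abc import Mapping
from pathlib import Path
from typing import Any, Callable, Dict, Iterator


class LazyImages(Mapping):
    """Hypothetical read-only mapping: image name -> image, loaded on access.

    Only the paths are kept in memory; the loader runs each time an
    image is requested, and nothing is cached.
    """

    def __init__(self, paths: Dict[str, Path], loader: Callable[[str], Any]):
        self._paths = dict(paths)
        self._loader = loader

    def __getitem__(self, name: str) -> Any:
        # Decode the file only now, when the image is actually needed.
        return self._loader(str(self._paths[name]))

    def __iter__(self) -> Iterator[str]:
        return iter(self._paths)

    def __len__(self) -> int:
        return len(self._paths)


# Possible usage with OpenCV (assumes cv2 is installed):
#   images = LazyImages({name: Path(images_path) / name for name in list_images},
#                       loader=cv2.imread)
```

Because Mapping derives items() and values() from __getitem__, iterating over the mapping loads one image at a time, so peak memory stays close to the size of a single decoded image. The trade-off is that every access re-reads the file from disk.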

Additional

No response

@David-rn David-rn added the question Further information is requested label Jun 7, 2024
@LinasKo
Collaborator

LinasKo commented Jun 7, 2024

Hi @David-rn 👋

You've hit on something very high on our priority list.
In the next supervision release, we'd like to start implementing a more efficient data loader.

You can find some context here if it's something that catches your interest: #316

@SkalskiP
Collaborator

SkalskiP commented Jun 7, 2024

@LinasKo and @David-rn, I'm closing this issue and marking it as duplicate.

@SkalskiP SkalskiP closed this as completed Jun 7, 2024
@SkalskiP SkalskiP added the duplicate This issue or pull request already exists label Jun 7, 2024
@David-rn
Contributor Author

David-rn commented Jun 7, 2024

Perfect! I didn't find that issue, thanks @LinasKo for the clarification. If you need anything, glad to help!

@SkalskiP
Collaborator

SkalskiP commented Jun 7, 2024

For now, we will work with @LinasKo to design the proper API. Once we have created the first dataset loader (probably for the YOLO format), you can pick the next one (COCO or PASCAL).


3 participants