
Performance issues while exporting a project ~5000 images #1134

Closed
R-Peleg opened this issue Jul 5, 2021 · 7 comments
Labels: bounding boxes (Image Object Detection with Bounding Boxes annotation scenario), images (Image annotation cases), performance (Performance & high load-related issues), problem (bug or something isn't working)

Comments

R-Peleg commented Jul 5, 2021

Describe the bug
When I tried to export a Bounding Box labeling project containing 4,982 images, a gateway timeout occurred.
Raising the Nginx timeout did not resolve the problem; more than 10 minutes passed without a response.

To Reproduce
Steps to reproduce the behavior:

  1. Create a project
  2. Import several thousand annotated images
  3. Click "Export", select "COCO", and click "Export"
  4. See the error

Expected behavior
A COCO dataset should be downloaded.


Environment:

  • OS: macOS; Label Studio run with docker-compose up
  • Label Studio version: master (commit 185ed8e) with the fix/export-performance branch merged

Additional context
Using py-spy, I collected profiling information during the request. Much of the time appears to be spent downloading the image files, so a solution might involve handling those downloads better (caching, parallel downloading). The generated flame graph SVG reports are attached.
profiles.zip
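A minimal sketch of the caching + parallel-download idea, using only the Python standard library. download_all and http_fetch are hypothetical helpers, not Label Studio functions; the fetch callable is passed in so the deduplication and caching logic can be exercised without network access, and max_workers=8 is an arbitrary assumed default.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional
from urllib.request import urlopen


def http_fetch(url: str) -> bytes:
    """Hypothetical fetcher: download one image over HTTP(S)."""
    with urlopen(url, timeout=30) as resp:
        return resp.read()


def download_all(
    urls: Iterable[str],
    fetch: Callable[[str], bytes] = http_fetch,
    cache: Optional[Dict[str, bytes]] = None,
    max_workers: int = 8,
) -> Dict[str, bytes]:
    """Fetch each unique URL at most once, in parallel, reusing cached results.

    Returns the cache dict, which then holds every requested URL
    (plus anything it already contained).
    """
    cache = {} if cache is None else cache
    # Deduplicate while preserving order, and skip URLs already in the cache.
    pending = [u for u in dict.fromkeys(urls) if u not in cache]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs URL and bytes.
        for url, data in zip(pending, pool.map(fetch, pending)):
            cache[url] = data
    return cache
```

Threads suit this workload because it is network-bound; a persistent cache (e.g. on disk) would also let a repeated export skip images it has already fetched.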

niklub added this to the Label Studio 1.1.1 milestone Jul 5, 2021
niklub added the problem (bug or something isn't working) and performance (Performance & high load-related issues) labels Jul 5, 2021
makseq (Member) commented Jul 7, 2021

@R-Peleg Could you try the export from the latest master branch?

R-Peleg (Author) commented Jul 12, 2021

Tried on commit 1a3b1677; I still got a timeout.

makseq (Member) commented Jul 15, 2021

We made some improvements here: #1151
Could you try the latest master, please? @R-Peleg

R-Peleg (Author) commented Jul 15, 2021

Still timing out.
I doubt it can be made much faster, since the images are referenced from an external server and downloading 4,000 images simply takes time. Maybe the solution is to make the export asynchronous, so it isn't limited by the HTTP timeout and doesn't block the GUI in the meantime.
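A rough sketch of the asynchronous pattern proposed above: the HTTP handler starts a job and returns an ID immediately, and the client polls for status instead of holding one request open. ExportJobs, start, status, and result are hypothetical names, not Label Studio APIs. This in-process thread pool is only an illustration; jobs are lost on restart, so a real deployment would more likely use a persistent task queue.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict


class ExportJobs:
    """Hypothetical in-process registry of background export jobs."""

    def __init__(self, max_workers: int = 2) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs: Dict[str, Future] = {}

    def start(self, run_export: Callable[[], Any]) -> str:
        """Schedule the export and return a job ID without waiting for it."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(run_export)
        return job_id

    def status(self, job_id: str) -> str:
        """Cheap, non-blocking check suitable for a polling endpoint."""
        fut = self._jobs[job_id]
        if not fut.done():
            return "in_progress"
        return "failed" if fut.exception() is not None else "completed"

    def result(self, job_id: str) -> Any:
        """Block until done; re-raises the export's exception if it failed."""
        return self._jobs[job_id].result()
```

A client would call start() from the "Export" button, poll status() every few seconds, and download the file once it reports "completed", so no single request has to outlive the Nginx timeout.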

twsl (Contributor) commented Jul 27, 2021

I had the same issue when exporting to COCO format.
There should be an option to export only the label file and keep the images in the cloud.

smoreface (Contributor) commented:

This should be addressed with the version 1.3.0 release.

makseq added the bounding boxes (Image Object Detection with Bounding Boxes annotation scenario) and images (Image annotation cases) labels Oct 11, 2021
makseq (Member) commented Jan 29, 2022

You can set download_resources in the API request /api/projects//export?download_resources=0 to skip downloading images. More info: https://labelstud.io/api#operation/api_projects_export_read
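A minimal client-side sketch of that request using only the standard library. The base URL, project ID, and token are placeholders you supply; the "Authorization: Token ..." header is assumed to match your Label Studio instance's token auth, and build_export_request / export_project are hypothetical helper names. Only the download_resources parameter comes from the comment above; see the linked API docs for the other export options.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_export_request(base_url: str, project_id: int, token: str) -> Request:
    """Build a GET request for a project export that skips downloading images."""
    query = urlencode({"download_resources": 0})
    url = f"{base_url.rstrip('/')}/api/projects/{project_id}/export?{query}"
    # Assumption: token auth via the Authorization header.
    return Request(url, headers={"Authorization": f"Token {token}"})


def export_project(base_url: str, project_id: int, token: str,
                   timeout: float = 600) -> bytes:
    """Send the export request and return the raw response body."""
    with urlopen(build_export_request(base_url, project_id, token),
                 timeout=timeout) as resp:
        return resp.read()
```

With images left on the external server, the response only contains the annotations, which is the "label file only" behavior requested earlier in the thread.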

makseq closed this as completed Jan 29, 2022
Projects: Label Studio Roadmap (Awaiting triage)
Development: No branches or pull requests
5 participants