
Limit number of concurrent preview generations #18210

Merged: 1 commit, Nov 5, 2022

Conversation

@dbw9580 commented Dec 3, 2019

Fixes #15075

I added two semaphores to limit concurrent access to CPU resources. One guards the CPU-intensive calls to GeneratorHelper::getThumbnail and several image-manipulation operations in Generator::generatePreview. The other wraps PreviewManager::getPreview.

The second semaphore is necessary because without it, response delivery is significantly delayed. This is the waterfall graph of a page load of 30 pictures without the second semaphore guarding PreviewManager::getPreview, and with preview_concurrent_new set to 4:
[Screenshot: network waterfall of the page load with only the first semaphore]
You can see most thumbnails are loaded within the last 6 seconds.

This is how it looks with preview_concurrent_new set to 4, and preview_concurrent_all set to 8:
[Screenshot: network waterfall with both limits applied]
The loading process is much more gradual and smooth.
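For readers who want the shape of the change: the guard is essentially a counting semaphore around the expensive code path. Below is a minimal sketch of that pattern using PHP's SysV semaphores; it is illustrative only (the function name, key, and limit are examples, not the actual diff), and it assumes the sysvsem extension is available.

```php
<?php
// Illustrative sketch only -- not the actual diff from this PR.
// Assumes the PHP sysvsem extension; key and limit values are examples.

const SEM_KEY_NEW = 0x6e637031; // arbitrary SysV IPC key for "new previews"

/**
 * Acquire one of $limit semaphore slots, run $callback, then release,
 * so that at most $limit callbacks run concurrently on this host.
 */
function guardWithSemaphore(int $key, int $limit, callable $callback) {
    $sem = sem_get($key, $limit);
    if ($sem === false || !sem_acquire($sem)) {
        return $callback(); // could not guard; fall back to running unguarded
    }
    try {
        return $callback();
    } finally {
        sem_release($sem);
    }
}

// Example usage: wrap the CPU-intensive part of preview generation.
$preview = guardWithSemaphore(SEM_KEY_NEW, 4, function (): string {
    // In the real code this would be the call to GeneratorHelper::getThumbnail
    // and the image manipulation in Generator::generatePreview.
    return 'preview bytes';
});
```

The second semaphore applies the same pattern with a separate key and a higher limit around PreviewManager::getPreview.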

@dbw9580 marked this pull request as ready for review December 4, 2019 12:23
@dbw9580 (Author) commented Dec 4, 2019

I installed my Nextcloud instance inside a Linux container with 3 cores and 2 GB RAM allocated. The host has 4 cores @ 1.6 GHz and 8 GB RAM in total. Without a proper restriction on how many preview generations can happen at a time, a scroll load of 50 pictures eats up all 2 GB of memory and causes the kernel OOM killer to kick in. Sometimes the MySQL process is also killed, leaving the website out of service for a while.

After applying limits on concurrent preview generations (preview_concurrent_new set to 3 and preview_concurrent_all set to 8), the same page load typically uses 800-900 MB of RAM, seldom over 1 GB, and there has been no outage due to OOM since.
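For reference, a config/config.php excerpt matching the settings described above; this is a sketch using the option names from this PR, which could still change before release.

```php
<?php
// config/config.php excerpt -- a sketch using the option names from this PR.
$CONFIG = [
    // at most 3 new previews are generated at the same time
    'preview_concurrent_new' => 3,
    // at most 8 preview requests (new and existing) are handled at the same time
    'preview_concurrent_all' => 8,
];
```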

@szaimen (Contributor) commented Dec 28, 2019

@kesselb why wasn't this PR considered for Nextcloud 18?
Sounds really awesome!

@kesselb (Contributor) commented Dec 28, 2019

No one had time to look into it, I guess. I like the idea in general. The code style is a bit off (we use camelCase here). We could probably deduplicate the code a bit. It needs some testing, of course.

Some feedback and questions:

  • I agree with limiting generation, but the all case will also limit/block the delivery of existing previews. Someone has to look at this (probably with xhprof and Xdebug).
  • What happens if a lock is never released (probably by a crashed process)? By default all locks are released on request_shutdown; see the sketch after this list.
  • Not sure if it's possible, but some tests would be cool.
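On the second point above: if the guard is built on SysV semaphores, PHP's sem_get() has an auto_release flag (enabled by default) that hands the acquired slot back at request shutdown, which should cover the common "lock never released" cases. A minimal sketch of the acquire/release pattern, with an illustrative key and limit:

```php
<?php
// Illustrative only; key and limit are examples. With the default
// $auto_release = true, PHP hands the acquired slot back at request
// shutdown even if the normal release path below is never reached.
$sem = sem_get(0x6e637032, /* max_acquire */ 4, 0666, /* auto_release */ true);
if ($sem !== false && sem_acquire($sem)) {
    try {
        // ... generate the preview ...
    } finally {
        sem_release($sem); // explicit release on the normal path
    }
}
```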

I think using a lock is a good approach. Most of these "improve preview generation" pull requests have had in common that they improve the situation for only one use case (I remember a pull request where the author announced a 200% performance boost; some tests later, it boiled down to NFS-based filesystems only).

As always: Feel free to patch your instance and report back your experiences.

@ariselseng (Member):

@kesselb Maybe I am off here, but couldn't this be mostly solved by limiting how many concurrent previews are requested in the web UI? Like lazy loading and at most 2-3 concurrent previews at a time? Of course, that would not help when many clients connect at the same time.

@skjnldsv (Member):

@dbw9580 please rebase

@dbw9580 (Author) commented Nov 4, 2020

@skjnldsv done. Also deduplicated the code.

@rullzer mentioned this pull request Dec 14, 2020
@rullzer removed this from the Nextcloud 21 milestone Dec 14, 2020
@szaimen (Contributor) commented Jan 5, 2021

Any update here?

@dbw9580 (Author) commented Jan 7, 2021

Hi devs, is there anything I can do to help get this PR merged?

@skjnldsv (Member) commented Jan 7, 2021

> Hi devs, is there anything I can do to help get this PR merged?

Not much! :)
Let's ping @ChristophWurst @juliushaertl @rullzer and @MorrisJobke

@ChristophWurst (Member):

I'm not sure how that would work in clustered environments, e.g. when there is more than one application server. I assume it wouldn't lock across servers but would always be scoped to each server, right?

@dbw9580 (Author) commented Jan 7, 2021 via email

@skjnldsv added the "4. to release (Ready to be released and/or waiting for tests to finish)" label Nov 4, 2022
@skjnldsv (Member) commented Nov 4, 2022

Restarted drone

@skjnldsv (Member) commented Nov 4, 2022

Seems like the unit tests are not OK with this.

@skjnldsv added the "2. developing (Work in progress)" label and removed the "4. to release (Ready to be released and/or waiting for tests to finish)" label Nov 4, 2022
@dbw9580 (Author) commented Nov 4, 2022

Warning: Transient problem: connection refused. Will retry in 10 seconds. 1 retries left.
curl: (7) Failed to connect to localhost port 9000: Connection refused
Error: Process completed with exit code 7.

https://github.com/nextcloud/server/actions/runs/3387563288/jobs/5628546406

I see the error is a network connection issue. How is this related to my change?

@szaimen (Contributor) commented Nov 4, 2022

> I see the error is a network connection issue. How is this related to my change?

@dbw9580 it is not this test that is the problem. The problem is the nodb Drone test, which runs into a timeout. See e.g. https://drone.nextcloud.com/nextcloud/server/25183/9/4

If you have any idea how to fix that, help would be really appreciated! :)

@dbw9580 (Author) commented Nov 4, 2022

I see the nodb test is stuck at 61%. How can I find out exactly which test is running into a problem?

@szaimen (Contributor) commented Nov 4, 2022

Seems like it hangs at Test 'Test\Preview\GeneratorTest::testGetNewPreview' started

@szaimen (Contributor) commented Nov 4, 2022

BTW, @dbw9580 can you please fix the DCO? Then we can merge this once the tests run through :)

@szaimen (Contributor) commented Nov 5, 2022

@dbw9580 so the tests finally pass now. Please fix the DCO and then I will merge this. Thank you! :)

Signed-off-by: Bowen Ding <dbw9580@live.com>
Signed-off-by: szaimen <szaimen@e.mail.de>
@dbw9580 (Author) commented Nov 5, 2022

@szaimen rebased and signed off the commit, PTAL.

@szaimen (Contributor) commented Nov 5, 2022

Thank you very much! The tests pass now and everything looks good! Merging! :)

@szaimen merged commit 779fedd into nextcloud:master Nov 5, 2022
welcome bot commented Nov 5, 2022

Thanks for your first pull request and welcome to the community! Feel free to keep them coming! If you are looking for issues to tackle then have a look at this selection: https://github.com/nextcloud/server/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22

@dbw9580 (Author) commented Nov 5, 2022

@szaimen thank you for the review and all the effort to make the merge happen! Looking forward to seeing this in the next release!

@jospoortvliet (Member):

@dbw9580 thanks for your incredible patience with this! It took a long time, but I'm also super happy it got in!

Successfully merging this pull request may close these issues.

We need to limit the amount of previews generated at once when listing images in Files