Huge memory use #428
Seemed stuck again, so I reset the tasks again. Memory quickly climbed to my new limit of 6G and it died again:
After this it eventually restarted:
After it finished, the memory was almost bang on 100%: Additionally, my other 2 channels don't show tasks in the schedule, only "markipiler". Even though all are scheduled to index at some point:
How many workers do you have?
Sorry I didn't see this. I've since removed my copy of Tubesync while I plan on migrating it to a new system, so I don't recall exactly what I had configured there. I suspect it was 2 workers, but could have been 4 🤷♂️. Imo there shouldn't be any need to use so much memory - playlists and channels can be huge, so if pulling this data is what's causing the huge use of memory, perhaps it should be written to disk instead of expecting the host to have enough memory to handle any channel size. Databases like MySQL do this - not sure what storage mechanism Tubesync uses under the hood, but if it were a DB like this it should roll over onto disk just fine.
There isn't any need for that much memory and it's absolutely a bug. The data collected by the worker can be huge in some cases and that's stored in RAM, however it's not being effectively garbage collected, which is related to how the worker library is or isn't working properly. As for temporary spooling of large data to disk, that would be a yt-dlp feature and not a tubesync feature.
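The retention problem described above can be illustrated with a stdlib-only sketch (all names here are hypothetical, not tubesync's actual code): CPython only frees a large task result once nothing references it, so a worker that keeps the blob parked on a long-lived object never returns the memory, while a worker that drops the reference gets it back.

```python
import gc

def index_channel(num_entries):
    """Simulate yt-dlp returning a large metadata blob for a channel."""
    metadata = [{"id": i, "title": f"video {i}"} for i in range(num_entries)]
    return len(metadata)  # only a small summary needs to outlive the task

def run_task():
    count = index_channel(100_000)
    # The blob is unreferenced once index_channel returns; an explicit
    # collect() prompts CPython to release it promptly. A worker library
    # that instead keeps the blob referenced (e.g. on a task record that
    # is never cleaned up) holds that memory for the life of the process.
    gc.collect()
    return count

print(run_task())
```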
Thanks for clarifying! To be clear, is the garbage collection bug on tubesync's side or yt-dlp's? I'll be setting up a new node probably in the next week or so. If there's more debugging information I can provide, in case this happens again, please let me know and I'll do my best to share it here. I'll be running it in an LXC next though, instead of just Docker on Unraid, so the environment will be subtly different.
It's likely in the (now generally unsupported and legacy) background worker library used in current tubesync builds, so neither directly tubesync nor yt-dlp. yt-dlp does allocate a lot of memory when asked to index a massive channel, but that's likely not avoidable given the sheer size of the metadata that needs to be processed. The worker library in tubesync has been on the cards to be replaced for... some time now, it's just a lot of work and I've not had time.
Understandable. Anyway, I really appreciate the project so I'll keep an eye on this going forward :)
I can reproduce this issue also. I'm working on a branch this weekend which I'm hoping will be a short-term fix. I only saw the issue when I was running some VMs, which reduced the memory available on my server. I'm hoping setting max_requests_jitter and max_requests on the gunicorn workers might improve this and force each worker to give up its memory allocations when a large job is being processed by them.
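For reference, max_requests and max_requests_jitter are real gunicorn settings that recycle each worker process after a bounded number of requests, returning its accumulated memory to the OS. A sketch of the kind of invocation described (the module path and values are illustrative, and as noted in the reply below this doesn't address the background workers):

```shell
# Hypothetical invocation; tubesync's actual entrypoint may differ.
# Each gunicorn worker is restarted after roughly 450-550 requests
# (500 +/- jitter), capping how much memory a single worker can hoard.
gunicorn tubesync.wsgi:application \
    --workers 3 \
    --max-requests 500 \
    --max-requests-jitter 50
```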
@locke4 this isn't going to be anything to do with gunicorn, it's the background workers which are far more complex to replace.
I've also been having this issue, though I use TubeSync a little differently than most. I have a single Plex Watch Later playlist that I manually sync in TubeSync when I add new items, I use TubeSync with MariaDB, and I have 64 GB of RAM. Eventually it gets to the YouTube playlist limit of 4,000 videos and becomes unmanageable in TubeSync. When I max out or nearly max out the YouTube playlist, TubeSync will use an ungodly amount of system RAM, easily over 16 GB, and eventually all of it until the system becomes unresponsive.

I tried creating a new empty playlist and pointing TubeSync at that, and while that did allow the indexing to go quicker, Docker would still kill the python3 process for using over 8 GB of RAM, the new limit I imposed on the container. Even though the playlist is empty, all the old playlist items are still stored in sync_media. While this was expected, what I noticed was that django is still loading all of these items and taking a ton of RAM to do so. I also saw this when loading the /admin panel. I emptied the tables sync_media and background_task_completedtask and TubeSync became usable again, taking only a few hundred MB of RAM.
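Loading an entire table into memory at once, as described above, is avoidable by iterating in fixed-size chunks; Django exposes this directly as `QuerySet.iterator(chunk_size=...)`. A stdlib-only sketch of the same idea (the names and data are illustrative, not tubesync's actual models):

```python
from itertools import islice

def in_chunks(iterable, chunk_size=500):
    """Yield lists of at most chunk_size items, so only one chunk is
    resident in memory at a time (mirrors QuerySet.iterator)."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk

# Peak memory is one chunk, not the whole table.
rows = range(1_250)  # stand-in for a large sync_media table
total = sum(len(chunk) for chunk in in_chunks(rows, 500))
print(total)
```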
Fair enough, you're talking about Django Background Tasks as the background workers? I'm running a test on a channel which I know breaks tubesync:latest, but I'm assuming it won't show any improvement if the issue isn't related to the gunicorn or celery workers. I have a fix relating to the filter_text behaviour for this PR anyway, so it's good to try and get that completed this weekend regardless. I want to see if I can look into a fix short of replacing the workers, as I appreciate replacing them is a much larger task. For extra info, RAM allocated during indexing also doesn't seem to be dropped, which exacerbates this (below is an indexing task which completed around 8am, but the memory was never released; deleting the channel immediately crashes the docker container with SIGKILL out-of-memory errors). 11am is me updating to my branch and repeating the test.
As expected, this doesn't fix the excessive memory usage during a delete operation. I was however able to complete the delete task (though the web browser timed out, the container didn't crash out with a SIGKILL error), which I don't really understand. I found arteria/django-background-tasks#186 which might be promising, but forking django-background-tasks isn't the best solution long-term.
Migrated my setup from Unraid to Proxmox, now starting off with a smaller 300 video channel. Seems ok so far: Once this is done I'll try again with another 5K video channel, see how it likes that.
EDIT: Going for the big one haha. Looks fine so far though. I'm only sync'ing the metadata so far:
EDIT 2: It's ballooned quite a bit, so I've put it up to 4GB overnight, as it scrapes:
EDIT 3: Still seems to be using a crazy amount of RAM, though it's just enqueuing thumbnail downloads, nothing else:
EDIT 4: Memory grows by a gigabyte when I refresh the tasks page, too..
@meeb, I want to take a look at this task thing again this weekend. Am I right in thinking your desired replacement is for tasks to be offloaded to a redis queue for celery workers to pick from? I want to make sure I go in the direction that you think is best, so any pointers before I restart my feature branch would be really appreciated!
@locke4 Yes, that's the plan. However I should warn you that this is a significant amount of work. General-rewrite-and-refactoring big. Porting the tasks to celery tasks is the easy bit. Migrating all existing tasks (and task management and UI updates on tasks) without breaking anyone's installs is going to be much harder.
Putting my hand up now to be a nightly build tester ✋😅
Just to mention here that even with rewriting the worker and tasks system, this won't really reduce the peak memory requirement much, it'll just mean the memory gets freed again after use.
I've got a couple of thoughts that I'd appreciate your input on, @meeb, after prototyping my feature branch on this issue yesterday.
I've got a couple of weird interactions I need to solve before I commit this feature branch to my fork repo, but generally I think this approach is a good candidate fix. I should have a PR ready in a week, provided you're happy with this approach.
Thanks for the effort, @locke4! Specifically for your questions:
Generally I think you're on the right approach, but I would suspect there's going to be a lot of edge cases, weirdness and other issues that will need to be worked out with this transition. I've had two attempts at this myself and never fully finished it enough to merge. If this has any bugs in it at all it's going to annoy a lot of people.
@meeb Not to derail, and perhaps this could be another issue elsewhere, but has anyone reached out to the yt-dlp project regarding this metadata? Surely a JSON stream or something incremental is a better design than a massive blob of data? Even if it was written to disk incrementally so tubesync could stream it into the DB, this'd be far superior. It's not sustainable for either project, imo, to function like that. It'll still prevent anyone from scanning popular channels unless they're willing to dedicate 3-6GB of RAM for the instance. I'd have hoped to run such a container at 1-2GB max in a perfect world. Just my thoughts 😊
I have not, and quite honestly it might be how I've implemented it rather than an upstream issue. Worth checking both ends though! I hacked up the original TubeSync in a weekend to scratch a personal itch so... there may well be legacy silliness that worked fine for the smaller channels I sync but not for people trying to parse gigabytes of metadata. I've not stumbled over it yet but there may be a streaming metadata feature buried in the yt-dlp API.
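The incremental-on-disk idea floated above can be sketched with the stdlib alone, assuming per-video metadata can be obtained one entry at a time (all names here are illustrative, not part of tubesync or yt-dlp): spool entries to disk as NDJSON (one JSON object per line) while indexing, then stream them back one at a time, so peak memory is one entry rather than the whole channel.

```python
import json
import os
import tempfile

def spool(entries, path):
    """Write each metadata entry as one JSON line, never holding
    more than a single entry in memory."""
    with open(path, "w", encoding="utf-8") as fh:
        for entry in entries:
            fh.write(json.dumps(entry) + "\n")

def stream(path):
    """Yield entries back one at a time for insertion into the DB."""
    with open(path, "r", encoding="utf-8") as fh:
        for line in fh:
            yield json.loads(line)

path = os.path.join(tempfile.mkdtemp(), "channel.ndjson")
spool(({"id": i} for i in range(5_000)), path)  # generator: nothing buffered
print(sum(1 for _ in stream(path)))
```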
Thanks for the comments! On 1., agreed that it's not ideal to have both. At the moment I've been focussed on just fixing the two breaking bugs with large channels, which I believe is now complete bar writing test cases and a couple of edge cases. I'm going to complete this feature branch fully as a fix branch. I'll then work on fully removing background-tasks on a new feature branch, so if I'm unsuccessful at least the fixes can be merged from my current branch and a full refactor completed later building on this. The main issue I haven't solved yet is how to ensure tasks are migrated or recreated in celery on update. I was thinking that one way of ensuring fully removing background-tasks isn't a breaking change is to force-reset tasks on boot in some intelligent way (to avoid an unnecessary task reset).
OK, also remember that if you're not tackling the concurrency and race condition issues initially, lock the celery workers to 1 (and disable the ENV var for worker count) as well! Happy to review what you submit as a PR.
I have no idea why this is the way it is, but this might be related. It's been heavily sanitized, so hopefully it still contains what's necessary.
Okay I reassessed my settings and in my case I just increased the number of workers from 1 up to 10 because why not. |
You should definitely keep your workers at 1. There's no technical reason locally you can't run more workers, however YouTube quickly gets angry at you and blocks your IP or imposes other similar restrictions if you run more than 1 worker.
And thanks for the log, that just shows the worker using a load of memory for some reason and getting killed, however every bit helps. |
This is what it seemed to be doing as soon as I added other workers. Thanks for the fast reply @meeb, also thanks for the hard work! More context: it asked me to do a I had added these envvars to my deployment as well:

environment:
  TUBESYNC_RESET_DOWNLOAD_DIR: "false"
  TZ: America/Toronto
  PUID: "1000"
  PGID: "1000"
  TUBESYNC_WORKERS: 4
  TUBESYNC_DEBUG: "true"
  GUNICORN_WORKERS: "3"
  DJANGO_SECRET_KEY: "REDACTED"
You can ignore the message to run . As for your errors, the first one is that the playlist wasn't indexable for some reason, try using . The second error is self-explanatory: tubesync doesn't have permissions to create that directory, so check the local directory permissions.
I have the playlist as unlisted, but it shouldn't be private. Will that cause a problem? I just like telling Tubesync to slurp up videos I think could be useful in the future, which may get taken down for various reasons.
Try a
Hello,
I've been debugging my Unraid instance lately, as it's been crashing since I added the TubeSync container to it. I've added some small channels that work fine (~150-200 videos), but adding a 5K+ channel like Markiplier or LTT causes huge amounts of memory to be chewed up.
I found out my system was probably running out of memory, due to it being unconstrained:
So I followed the directions in this issue, and locked the memory to 4G. But now when I try to index any channel, big or small, I get:
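For reference, a compose-style memory cap like the one described looks roughly like this (image name and values are illustrative, and per the maintainer's advice elsewhere in this thread, workers should stay at 1):

```yaml
services:
  tubesync:
    image: ghcr.io/meeb/tubesync:latest  # assumed image path
    mem_limit: 4g        # hard cap; the kernel OOM-kills processes above this
    environment:
      TUBESYNC_WORKERS: "1"
```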
Mind you, all channels have been scraped already.. so it shouldn't be doing as big a job again. I'm also using an external MariaDB instance.
Any ideas what I can test next?
EDIT: Extra context:
EDIT 2: I can see the kernel killing the processes at the unraid/OS level: