Some way to run in parallel? #103

Closed
sam0x17 opened this issue May 13, 2022 · 9 comments
Labels
enhancement New feature or request

Comments

@sam0x17

sam0x17 commented May 13, 2022

I've noticed that the ffmpeg options won't utilize all cores. Is there a setting that allows configuring a target number of cores and/or running multiple threads when processing a large number of files? I'm on a 5950X, so I'd like to max it out as it processes 7k files rather than wait 3x as long.

Would it work if I run several instances of PlexCleaner at the same time, or would that cause synchronization issues as the instances try to tackle the same files at the same time?

@ptr727
Owner

ptr727 commented May 13, 2022

As far as I'm aware, ffmpeg and HandBrake will use as many cores and as much memory as they can, though they could be limited by the number of frames being encoded at a time.

I did consider writing code to process multiple files in parallel, but on my systems (fewer cores, and it runs in the background) the overhead of job synchronization was not warranted.

If there is a command-line option for ffmpeg that helps in your case, I could add it. You could copy the debug output while encoding, experiment with your own custom values, and let me know.

@ptr727 ptr727 added the question Further information is requested label May 13, 2022
@sam0x17
Author

sam0x17 commented May 14, 2022

I'm not aware of a command line option that makes ffmpeg use any more than it does (and as you said, oftentimes it's not parallelizable beyond the number of audio+video tracks). The way I resolved this with my own script was to run on 5-6 video files at a time, which resulted in always using ~98% CPU usage. Would be cool if there was a way to specify some degree of parallelism with PlexCleaner, or ensure that if two PlexCleaner processes are running, they don't trip each other up / don't try to process the same file if one is already processing it (haven't tested this, but I imagine there may be some issues there).
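
For readers who want to reproduce the "5-6 files at a time" approach, here is a minimal sketch of a bounded worker pool. It is not the script mentioned above and not PlexCleaner code; `process_in_parallel` and the `process_file` callback are hypothetical names, and the callback is assumed to launch one encoder process per file and return its exit code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def process_in_parallel(
    files: Iterable[str],
    process_file: Callable[[str], int],
    max_workers: int = 5,
) -> Dict[str, int]:
    """Run process_file on every file, with at most max_workers running at once.

    process_file is a hypothetical callback. In a real script it would
    typically start one encoder child process (for example via
    subprocess.run) and return its exit code. Threads are enough here
    because the heavy work happens in the child processes.
    """
    file_list = list(files)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the inputs.
        exit_codes = list(pool.map(process_file, file_list))
    return dict(zip(file_list, exit_codes))
```

Each file is handed to exactly one worker, so two workers never touch the same file. That guarantee only holds within a single process; two separate instances pointed at the same directory would need some other form of coordination.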

@ptr727
Owner

ptr727 commented May 14, 2022

It would be possible to process individual files in parallel up to a configurable limit, but as you mention, the bookkeeping and synchronization may be tricky, though not impossible. I'll keep it in mind for future refactoring.

I would suggest doing as you did: process different directories in parallel by launching an instance for each directory, and be sure to use different log files.
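
A rough sketch of that workaround, one instance per directory with its own log file, could look like the following. The `--settingsfile`, `--logfile`, `process` and `--mediafiles` arguments and their order are taken from the docker example in this thread; the helper names, the `PlexCleaner-<directory>.log` naming scheme and the default executable name are assumptions made for illustration.

```python
import subprocess
from pathlib import PurePosixPath
from typing import List, Sequence


def build_commands(
    directories: Sequence[str],
    settings_file: str,
    log_dir: str,
    executable: str = "PlexCleaner",  # assumed to be on PATH
) -> List[List[str]]:
    """Build one command line per media directory, each with its own log file."""
    commands = []
    for directory in directories:
        # Hypothetical naming scheme: the log is named after the last path
        # component. Directories sharing a basename would collide.
        name = PurePosixPath(directory).name
        log_file = str(PurePosixPath(log_dir) / f"PlexCleaner-{name}.log")
        commands.append([
            executable,
            "--settingsfile", settings_file,
            "--logfile", log_file,
            "process",
            "--mediafiles", directory,
        ])
    return commands


def launch_all(commands: Sequence[Sequence[str]]) -> List[int]:
    """Start every instance at once, then wait for all and return exit codes."""
    processes = [subprocess.Popen(list(cmd)) for cmd in commands]
    return [proc.wait() for proc in processes]
```

Keeping the directories disjoint is what prevents two instances from working on the same file; the sketch does not check for overlapping paths.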

I did see some Google references to an ffmpeg -threads parameter, but it may be deprecated, as it is no longer mentioned in the current documentation.
The docs do mention a -filter_threads parameter, but it already defaults to the number of processors. I suspect the codec and the number of frames being encoded are the limiting factor, which would only improve by encoding multiple files in parallel, or multiple parts of the same file in parallel.

-filter_threads nb_threads (global)
Defines how many threads are used to process a filter pipeline. Each pipeline will produce a thread pool with this many threads available for parallel processing. The default is the number of available CPUs.

@ptr727 ptr727 added enhancement New feature or request and removed question Further information is requested labels May 14, 2022
@ptr727
Owner

ptr727 commented May 15, 2022

Well, instead of doing my weekend home and work chores, I implemented parallel processing.
Was easier than anticipated using PLINQ, as almost all iterations were over enumerable items, but could probably be optimized.

Please give the pre-release / develop builds a try, and let me know.

From the little testing I did, processing of already processed files is quite a bit faster. I have not tested parallel encoding, as I don't have anything queued up just yet.

I do notice that the default, which uses the processor count as the thread count, really bogs my system down. That may be good if it is what you want; otherwise I'd suggest using half, or experimenting with the thread count.

Troubleshooting will be quite a bit more complicated as the logged events are time based and not in logical order. I added the thread id to output to help, will look for open source tooling that can help analyze / structure the logs in logical order.

E.g.

docker run \
  -it \
  --rm \
  --name PlexCleaner-Develop-All \
  --user nobody:users \
  --env TZ=America/Los_Angeles \
  --volume /data/media:/media:rw \
  ptr727/plexcleaner:develop \
  /PlexCleaner/PlexCleaner \
    --parallel \
    --settingsfile /media/PlexCleaner/PlexCleaner-Develop.json \
    --logfile /media/PlexCleaner/PlexCleaner-Develop-All.log \
    process \
    --mediafiles /media/Series \
    --mediafiles /media/Movies \
    --mediafiles /media/Movies-4K

Using latest (cold ZFS cache):

[22:44:39 INF] Completed "Process"
[22:44:39 INF] Processing time : 00:07:15.3534166
[22:44:39 INF] Total files : 46111

Using latest (hot ZFS cache):

[22:51:31 INF] Completed "Process"
[22:51:31 INF] Processing time : 00:00:50.8615600
[22:51:31 INF] Total files : 46111

Using develop with parallel (hot ZFS cache):

22:48:27 [INF] <1> Completed "Process"
22:48:27 [INF] <1> Processing time : 00:00:12.6582025
22:48:27 [INF] <1> Total files : 46111

i.e. 13s vs. 51s to process 46111 files, roughly a 4x speedup.
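
Since parallel log events are interleaved by time, the thread id (the `<1>` field in the develop output above) can be used to regroup lines per thread. The sketch below is one possible approach and not a tool from the project; it assumes every line follows the `HH:MM:SS [LEVEL] <thread> message` layout seen in the develop output, and `group_by_thread` is a hypothetical helper name.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed layout, based on the develop output shown in this thread:
#   22:48:27 [INF] <1> Completed "Process"
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}) \[(?P<level>\w+)\] <(?P<thread>\d+)> (?P<message>.*)$"
)


def group_by_thread(lines: Iterable[str]) -> Dict[int, List[Tuple[str, str, str]]]:
    """Group log lines by thread id, keeping each thread's lines in file order."""
    groups: Dict[int, List[Tuple[str, str, str]]] = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        groups[int(match["thread"])].append(
            (match["time"], match["level"], match["message"])
        )
    return dict(groups)
```

Lines from the older log layout (`[22:51:31 INF] ...`), which carry no thread id, do not match the pattern and are skipped.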

@sam0x17
Author

sam0x17 commented May 15, 2022 via email

@ptr727
Owner

ptr727 commented May 20, 2022

I've been testing the develop docker build for a couple of days and it seems to work as expected.
Any feedback or issues?

@sam0x17
Author

sam0x17 commented May 20, 2022 via email

@ptr727
Owner

ptr727 commented May 20, 2022

The default thread count is cores / 2. Configure it manually using --threadcount, e.g. PlexCleaner --parallel --threadcount 2
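
As a small illustration of the "cores / 2" default, the sketch below computes such a value and builds the matching arguments. It is not taken from the PlexCleaner source: whether "cores" means physical or logical processors, and how odd counts are rounded, is not stated in this thread, so integer division with a floor of 1 is an assumption, as are the helper names.

```python
import os
from typing import List, Optional


def default_thread_count(cores: Optional[int] = None) -> int:
    """Return half the processor count, but never less than 1.

    Assumption: 'cores' is whatever the OS reports as the processor count
    (os.cpu_count() here), and odd counts are rounded down.
    """
    if cores is None:
        cores = os.cpu_count() or 1
    return max(1, cores // 2)


def parallel_args(thread_count: Optional[int] = None) -> List[str]:
    """Build the --parallel arguments, adding --threadcount only when set."""
    args = ["--parallel"]
    if thread_count is not None:
        args += ["--threadcount", str(thread_count)]
    return args
```

For example, `parallel_args(2)` yields the arguments used in the comment above, while `parallel_args()` leaves the thread count to the tool's own default.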

@ptr727
Owner

ptr727 commented May 20, 2022

Implemented with #110

@ptr727 ptr727 closed this as completed May 20, 2022