This repository has been archived by the owner on Jul 11, 2019. It is now read-only.

"command error: Resource temporarily unavailable (os error 11)" when processing a lot of files #19

Closed
Shnatsel opened this issue Sep 10, 2016 · 16 comments


@Shnatsel

When invoking Parallel on 70,000 files with the filenames piped in on stdin, its memory usage grows continuously up to 0.5 GB of RAM, and once it stops growing Parallel starts spamming the following line:

parallel: command error: Resource temporarily unavailable (os error 11)

Happens both with and without the --no-shell argument. Command line to reproduce the issue:

find '/some/folder/with/tens/of/thousands/of/files' -type f | 'target/release/parallel' -j 6 cat '{}' > /dev/null

Parallel was built from git with cargo build --release on Linux x86_64, rustc 1.11.0 (9b21dcd6a 2016-08-15).

@Shnatsel
Author

For comparison, GNU parallel stays within 15 MB of RAM on my system on the same workload.

@mmstick
Owner

mmstick commented Sep 10, 2016

I'll be landing a large set of changes soon that refactors a decent portion of the source code and adds a quiet mode; once that is done, I'll look into how I can improve the resource consumption of inputs. I may need to remove the feature of counting the total number of jobs and use an iterator for reading inputs from standard input.

@mmstick
Owner

mmstick commented Sep 11, 2016

Spent a whole day trying to fix a bug that was actually caused by the beta and nightly compilers. Anyway, I've landed the changes that I meant to land yesterday, so now I will begin working on handling standard input in a more efficient manner.

@Shnatsel
Author

Ah, I should have warned you about the nightly compiler - I've already figured out that it's buggy with your parallel.

I've already tried the chars() iterator on BufReader in my Rust version of tr, and in addition to being an unstable language feature, it's also very slow. lines() is probably a better idea.
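For reference, a minimal sketch of the lines()-based approach (the println! body is just a stand-in for whatever parallel would do with each argument):

    use std::io::{self, BufRead};

    fn main() {
        // Lock stdin once and iterate line by line; each item is an owned
        // String with the trailing newline already stripped, avoiding the
        // per-character overhead of chars().
        let stdin = io::stdin();
        for line in stdin.lock().lines() {
            let arg = line.expect("failed to read line from stdin");
            println!("argument: {}", arg);
        }
    }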

Also, in 0.4.1 your parallel was 2x slower than GNU parallel when reading arguments from stdin.

@mmstick
Owner

mmstick commented Sep 13, 2016

Updating the issue to say that Nightly builds are now fixed by eliminating the unsafe { mem::uninitialized::<Child>() } usage. Additionally, I have been working on fixing this issue, but it will take some time as I want to implement it as efficiently as possible the first time, such as by avoiding Vectors.

I plan to solve the issue by buffering 64K worth of arguments at a time and writing them to disk in an unprocessed file in reverse-newline-delimited order, then creating an iterator that buffers 64K worth of arguments at a time and truncates the unprocessed file after reading them. As arguments are completed, they will be written to a processed file. This should let me retain the ability to determine the total number of jobs and to get the Nth job, whether it's currently in memory or in either the processed or unprocessed file, while keeping memory usage very low. Once everything is working, I'll benchmark the program with perf stat and time to measure memory consumption and cycles/time spent, and tune the size of the buffer to reduce the number of syscalls.
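To make that concrete, here's a rough sketch of the spill-to-disk half of the design, with details of my own invention (the path, the plain newline order, and the missing truncation/processed-file bookkeeping are all simplifications):

    use std::fs::File;
    use std::io::{self, BufRead, BufReader, BufWriter, Lines, Write};

    // Hypothetical sketch: spill stdin arguments into an "unprocessed" file
    // through a 64K write buffer, then hand back a lazy line iterator over
    // that file with a 64K read buffer, so memory stays bounded no matter
    // how many arguments arrive.
    fn spill_args_to_disk(path: &str) -> io::Result<Lines<BufReader<File>>> {
        let mut writer = BufWriter::with_capacity(64 * 1024, File::create(path)?);
        let stdin = io::stdin();
        for line in stdin.lock().lines() {
            writeln!(writer, "{}", line?)?;
        }
        writer.flush()?;
        // Re-open the spill file and iterate over it lazily.
        Ok(BufReader::with_capacity(64 * 1024, File::open(path)?).lines())
    }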

@mmstick
Owner

mmstick commented Sep 21, 2016

I'm pretty close to resolving this problem. The memory issue is fixed with my local changes, now that inputs are buffered to and from the disk as byte arrays. However, I've yet to resolve OS Error 11, which is caused by Rust failing to close child processes for some reason. I'm not sure how to ensure that child processes are closed, so I'm asking the community for help with this issue.

@Shnatsel
Author

Processes that you're done working with stick around as zombie processes; this means that they have terminated, and the only thing left of them is an entry in the process table and the exit code. As soon as the exit code is read by parallel, the process-table entry will be gone.

This is done via the waitpid syscall, and I believe the appropriate function to call from Rust is https://doc.rust-lang.org/std/process/struct.Child.html#method.wait
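A minimal example of reaping a child that way (the spawned command is just a placeholder):

    use std::process::Command;

    fn main() {
        let mut child = Command::new("true").spawn().expect("failed to spawn");
        // wait() blocks until the child exits and reaps it (waitpid under
        // the hood), removing its process-table entry so it does not linger
        // as a zombie.
        let status = child.wait().expect("failed to wait on child");
        println!("child exited with: {}", status);
    }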

@mmstick
Owner

mmstick commented Sep 21, 2016

I've been able to fix it by borrowing the Child process as a mutable reference and then borrowing the child's fields with the as_mut() methods. Previously, I was not able to use the wait() method because it caused a borrow checker conflict with the child's fields being borrowed.
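For illustration, here is a sketch of that borrow pattern as I understand it: the as_mut() borrow of the stdout field is scoped so that wait() can later take its own mutable borrow (the echo command is a stand-in):

    use std::io::Read;
    use std::process::{Command, Stdio};

    fn main() {
        let mut child = Command::new("echo")
            .arg("hello")
            .stdout(Stdio::piped())
            .spawn()
            .expect("failed to spawn");

        // Borrow the stdout handle mutably instead of moving it out of the
        // Child; the borrow ends with this block, so wait() below does not
        // conflict with it.
        {
            let stdout = child.stdout.as_mut().expect("child had no stdout");
            let mut output = String::new();
            stdout.read_to_string(&mut output).expect("failed to read stdout");
            print!("{}", output);
        }

        // No field borrow is live at this point, so the child can be reaped.
        child.wait().expect("failed to wait on child");
    }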

It will still be a while before I push the fixes though. I'm in the middle of refactoring a large portion of the code I've written so far, which has caused some bugs that I'm having to track down.

@mmstick
Owner

mmstick commented Sep 21, 2016

The good news is that I just successfully processed 100,000 inputs (seq 1 100000) using only 13 MB, according to the maximum resident set size reported by time.

@mmstick
Owner

mmstick commented Sep 24, 2016

Later today I'll have the changes landed for you to test out. It's going to be quite the update.

20 files changed, 5265 insertions(+), 612 deletions(-)

And some benchmarks:

Rust Parallel

    ~/D/parallel (master) $ seq 1 10000 | time -v target/release/parallel echo > /dev/null
        Command being timed: "target/release/parallel echo"
        User time (seconds): 0.48
        System time (seconds): 2.48
        Percent of CPU this job got: 59%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:04.93
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 12928
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 2198164
        Voluntary context switches: 73174
        Involuntary context switches: 36678
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

GNU Parallel

    ~/D/parallel (master) $ seq 1 10000 | time -v parallel echo > /dev/null
        Command being timed: "parallel echo"
        User time (seconds): 97.04
        System time (seconds): 29.17
        Percent of CPU this job got: 232%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:54.17
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 66848
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 15070207
        Voluntary context switches: 250452
        Involuntary context switches: 113320
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

@mmstick
Owner

mmstick commented Sep 24, 2016

The new release has been made, so you can try it out to see if it's working as you like.

@Shnatsel
Author

It doesn't seem to leave a lot of zombie processes around anymore. Huzzah!

Tried it on 10,000 files so far, and it's now significantly slower than GNU parallel on a simple cat workload. The command line is:

find '/folder/with/lots/of/text/files/' -type f | head -n 10000 | parallel -j 6 cat '{}' > /dev/null

Runtime and peak memory usage for each:
rust: 1:40, 36.8 MiB
rust, --no-shell: 1:36, 58.9 MiB
gnu: 0:31, 14.6 MiB

Additionally, the memory usage for Rust parallel grows over time while GNU parallel uses a fixed amount of memory.

For the record, the regular use case for this is piping all that stuff to grep instead of /dev/null to get aggregate statistics for the entire dataset.

@Shnatsel
Author

This issue is resolved by the 0.5.0 release.

@Shnatsel
Author

Shall I open another issue for the lack of performance parity with GNU?

@mmstick
Owner

mmstick commented Sep 24, 2016

It should be opened as a bug. I'm guessing that memory consumption rises because the standard output and error of each task are temporarily buffered in memory, and only dropped after that process has had its turn being printed. The solution will be to modify the piping to use the DiskBuffer mechanism I created for inputs.
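As a sketch of what that could look like (not the actual DiskBuffer code; the path scheme and function shape are hypothetical):

    use std::fs::File;
    use std::io;
    use std::process::{Command, Stdio};

    // Hypothetical sketch: let the child write its stdout directly to a
    // per-task file on disk instead of an in-memory buffer; the file can
    // then be streamed to the real stdout when this task's turn comes.
    fn run_task_buffered(job_id: usize, cmd: &str, arg: &str) -> io::Result<()> {
        let out_path = format!("/tmp/parallel_task_{}.out", job_id);
        let out_file = File::create(&out_path)?;
        let mut child = Command::new(cmd)
            .arg(arg)
            .stdout(Stdio::from(out_file)) // stdout goes straight to disk
            .spawn()?;
        child.wait()?;
        Ok(())
    }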

@mmstick
Owner

mmstick commented Dec 27, 2016

I think you'll find with the latest version, 0.7.0, the issue of memory consumption has been thoroughly resolved.
