Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Piping is extremely slow #10763

Closed
V0ldek opened this issue Oct 18, 2023 · 4 comments · Fixed by #12774
Closed

Piping is extremely slow #10763

V0ldek opened this issue Oct 18, 2023 · 4 comments · Fixed by #12774
Labels
performance Work to make nushell quicker and use less resources redirection-pipe All related to redirection to files or more complex pipelines with STDERR
Milestone

Comments

@V0ldek
Copy link

V0ldek commented Oct 18, 2023

Describe the bug

Piping when in nushell can be an order of magnitude slower than an equivalent piped command in bash. High-throughput commands on my machine suffer a slowdown of up to 50x when compared to similar redirects in bash or raw IO (for example redirecting to file vs. a binary that writes to the file itself).

How to reproduce

I used this Rust code to perform the measurements:

use std::io::{stdout, Write};
use std::time::{Duration, Instant};

const BUF_SIZE: usize = 8 * 1024;
const WRITE_CNT: usize = 8 * 1024;
const BYTES_LEN: usize = BUF_SIZE * WRITE_CNT;
const SAMPLES: usize = 15;

const BUF: [u8; BUF_SIZE] = [0xFF; BUF_SIZE];

fn main() {
    let mut stdout = stdout().lock();

    let mut samples: Vec<_> = (0..SAMPLES).map(|_| run(&mut stdout)).collect();
    samples.sort();

    let min = samples[0];
    let max = samples[SAMPLES - 1];
    let med = samples[SAMPLES / 2];

    eprintln!("Elapsed: [{min:?}, {med:?}, {max:?}]");
    eprintln!(
        "Throughput: [{:.2} MB/s, {:.2} MB/s, {:.2} MB/s]",
        thpt(min),
        thpt(med),
        thpt(max)
    );
}

fn run<W: Write>(out: &mut W) -> Duration {
    let start = Instant::now();
    for _ in 0..WRITE_CNT {
        out.write_all(&BUF).unwrap();
    }

    start.elapsed()
}

fn thpt(duration: Duration) -> f64 {
    (BYTES_LEN as f64 / 1000000.0) / duration.as_secs_f64()
}

It writes 64MBs of bytes in buffers of 8KB into its standard output and measures the time it takes. We measure 15 samples to remove spurious results, and print min, median, and max results/

The overhead is most visible if the receiver of the output has high throughput – /dev/null is hard to beat for that.

nushell

$ ./target/release/out-thpt out> /dev/null
Elapsed: [275.3485ms, 364.661123ms, 416.359778ms]
Throughput: [243.72 MB/s, 184.03 MB/s, 161.18 MB/s]

bash

$ ./target/release/out-thpt > /dev/null
Elapsed: [3.679879ms, 5.63634ms, 11.30741ms]
Throughput: [18236.70 MB/s, 11906.46 MB/s, 5934.95 MB/s]

When writing to files the same effect is seen, but with a smaller difference:

nushell

$ ./target/release/out-thpt | save out.txt -f
Elapsed: [280.598264ms, 298.941421ms, 538.216298ms]
Throughput: [239.16 MB/s, 224.49 MB/s, 124.69 MB/s]

bash

$ ./target/release/out-thpt > out.txt
Elapsed: [81.644023ms, 82.946274ms, 85.552982ms]
Throughput: [821.97 MB/s, 809.06 MB/s, 784.41 MB/s]

Another, more useful example: piping the results to a high-throughput tool like ripgrep:

nushell

$ ./target/release/out-thpt | rg 'x'
Elapsed: [190.961775ms, 278.648481ms, 652.648382ms]
Throughput: [351.43 MB/s, 240.84 MB/s, 102.83 MB/s]

bash

$ ./target/release/out-thpt | rg 'x'
Elapsed: [24.366165ms, 62.707143ms, 381.635499ms]
Throughput: [2754.18 MB/s, 1070.19 MB/s, 175.85 MB/s]

Expected behavior

I would expect nu to perhaps have some overhead on piping, but the level of 5x, 10x, 100x slowdown is very harsh when working with large datasets.

It also creates a very unexpected slowdown when writing to file. It's suddenly much worse to use the shell's redirect to a file than a commands builtin file output (if such exists).

Screenshots

No response

Configuration

version | transpose key value | to md --pretty

key value
version 0.83.1
branch
commit_hash
build_os linux-x86_64
build_target x86_64-unknown-linux-gnu
rust_version rustc 1.71.0 (8ede3aae2 2023-07-12)
rust_channel stable-x86_64-unknown-linux-gnu
cargo_version cargo 1.71.0 (cfd3bbd8f 2023-06-08)
build_time 2023-08-03 13:53:50 +01:00
build_rust_channel release
allocator standard
features default, sqlite, trash, which, zip
installed_plugins

Additional context

No response

@V0ldek V0ldek added the needs-triage An issue that hasn't had any proper look label Oct 18, 2023
@sholderbach sholderbach added performance Work to make nushell quicker and use less resources redirection-pipe All related to redirection to files or more complex pipelines with STDERR labels Oct 18, 2023
@sophiajt
Copy link
Contributor

Would be nice to find when this regression happened. When we first added piping, we tested against systems like bash and external-to-external performance was comparable. It's been a few years since we did that test, though.

@amtoine
Copy link
Member

amtoine commented Oct 19, 2023

not sure if anything has changed on that front but could you try with Nushell 0.86.0 or 0.86.1 @V0ldek ? 😇

@sholderbach sholderbach removed the needs-triage An issue that hasn't had any proper look label Oct 19, 2023
@V0ldek
Copy link
Author

V0ldek commented Oct 21, 2023

not sure if anything has changed on that front but could you try with Nushell 0.86.0 or 0.86.1 @V0ldek ? 😇

Sure thing. Unfortunately, life happened and due to boring reasons I don't have access to the machine I ran the original tests on. But here's the comparison on a new (faster) one. I had v0.84.0 there, so I'm including that and the newest nushell I could get from crates.io (v0.86.0):

/dev/null

nushell (v0.84.0)

$ ./target/release/out-thpt out> /dev/null
Elapsed: [104.189532ms, 106.332611ms, 109.440512ms]
Throughput: [644.10 MB/s, 631.12 MB/s, 613.20 MB/s]

nushell (v0.86.0)

$ ./target/release/out-thpt out> /dev/null
Elapsed: [74.913097ms, 75.719376ms, 97.910004ms]
Throughput: [895.82 MB/s, 886.28 MB/s, 685.41 MB/s]

bash

$ ./target/release/out-thpt > /dev/null
Elapsed: [2.177211ms, 2.244449ms, 3.80449ms]
Throughput: [30823.32 MB/s, 29899.93 MB/s, 17639.39 MB/s]

File

nushell (v0.84.0)

$ ./target/release/out-thpt | save out.txt -f
Elapsed: [180.060344ms, 217.293486ms, 345.291572ms]
Throughput: [372.70 MB/s, 308.84 MB/s, 194.35 MB/s]

nushell (v0.86.0)

$ ./target/release/out-thpt | save out.txt -f
Elapsed: [157.707875ms, 159.822502ms, 287.39845ms]
Throughput: [425.53 MB/s, 419.90 MB/s, 233.50 MB/s]

bash

$ ./target/release/out-thpt > out.txt
Elapsed: [56.437486ms, 56.747906ms, 59.088127ms]
Throughput: [1189.08 MB/s, 1182.58 MB/s, 1135.74 MB/s]

ripgrep

nushell (v0.84.0)

$ ./target/release/out-thpt | rg 'x'
Elapsed: [72.738512ms, 94.279384ms, 317.48761ms]
Throughput: [922.60 MB/s, 711.81 MB/s, 211.37 MB/s]

nushell (v0.86.0)

$ ./target/release/out-thpt | rg 'x'
Elapsed: [73.575086ms, 80.786581ms, 321.982243ms]
Throughput: [912.11 MB/s, 830.69 MB/s, 208.42 MB/s]

bash

$ ./target/release/out-thpt | rg 'x'
Elapsed: [11.343015ms, 13.020561ms, 227.359305ms]
Throughput: [5916.32 MB/s, 5154.07 MB/s, 295.17 MB/s]

Looks like there's been some improvement between v0.84.0 and v0.86.0, but it's still way below the "baseline" of bash.

The new version | transpose key value | to md --pretty:

key value
version 0.86.0
branch
commit_hash
build_os linux-x86_64
build_target x86_64-unknown-linux-gnu
rust_version rustc 1.73.0 (cc66ad468 2023-10-03)
rust_channel stable-x86_64-unknown-linux-gnu
cargo_version cargo 1.73.0 (9c4383fb5 2023-08-26)
build_time 2023-10-21 17:18:13 +01:00
build_rust_channel release
allocator mimalloc
features default, sqlite, trash, which, zip
installed_plugins

@yaksher
Copy link

yaksher commented Nov 9, 2023

I ran a reduced version of this benchmark (using the same code I just copy-pasted) on my new M3 MacBook.

The difference I observed is "only" 10x and not the 30x observed by the original poster, but nonetheless.

key value
version 0.86.0
branch
commit_hash
build_os macos-aarch64
build_target aarch64-apple-darwin
rust_version rustc 1.73.0 (cc66ad468 2023-10-03)
rust_channel stable-aarch64-apple-darwin
cargo_version cargo 1.73.0 (9c4383fb5 2023-08-26)
build_time 2023-10-26 00:25:34 -07:00
build_rust_channel release
allocator mimalloc
features dataframe, default, sqlite, trash, which, zip
installed_plugins

Nushell:

$ ./target/release/throughput out> /dev/null
Elapsed: [100.3025ms, 101.858708ms, 116.751542ms]
Throughput: [669.06 MB/s, 658.84 MB/s, 574.80 MB/s]

Zsh:

% ./target/release/throughput > /dev/null
Elapsed: [5.073042ms, 5.839875ms, 10.926416ms]
Throughput: [13228.53 MB/s, 11491.49 MB/s, 6141.89 MB/s]

fdncred pushed a commit that referenced this issue Mar 14, 2024
# Description
The PR overhauls how IO redirection is handled, allowing more explicit
and fine-grain control over `stdout` and `stderr` output as well as more
efficient IO and piping.

To summarize the changes in this PR:
- Added a new `IoStream` type to indicate the intended destination for a
pipeline element's `stdout` and `stderr`.
- The `stdout` and `stderr` `IoStream`s are stored in the `Stack` and to
avoid adding 6 additional arguments to every eval function and
`Command::run`. The `stdout` and `stderr` streams can be temporarily
overwritten through functions on `Stack` and these functions will return
a guard that restores the original `stdout` and `stderr` when dropped.
- In the AST, redirections are now directly part of a `PipelineElement`
as a `Option<Redirection>` field instead of having multiple different
`PipelineElement` enum variants for each kind of redirection. This
required changes to the parser, mainly in `lite_parser.rs`.
- `Command`s can also set a `IoStream` override/redirection which will
apply to the previous command in the pipeline. This is used, for
example, in `ignore` to allow the previous external command to have its
stdout redirected to `Stdio::null()` at spawn time. In contrast, the
current implementation has to create an os pipe and manually consume the
output on nushell's side. File and pipe redirections (`o>`, `e>`, `e>|`,
etc.) have precedence over overrides from commands.

This PR improves piping and IO speed, partially addressing #10763. Using
the `throughput` command from that issue, this PR gives the following
speedup on my setup for the commands below:
| Command | Before (MB/s) | After (MB/s) | Bash (MB/s) |
| --------------------------- | -------------:| ------------:|
-----------:|
| `throughput o> /dev/null` | 1169 | 52938 | 54305 |
| `throughput \| ignore` | 840 | 55438 | N/A |
| `throughput \| null` | Error | 53617 | N/A |
| `throughput \| rg 'x'` | 1165 | 3049 | 3736 |
| `(throughput) \| rg 'x'` | 810 | 3085 | 3815 |

(Numbers above are the median samples for throughput)

This PR also paves the way to refactor our `ExternalStream` handling in
the various commands. For example, this PR already fixes the following
code:
```nushell
^sh -c 'echo -n "hello "; sleep 0; echo "world"' | find "hello world"
```
This returns an empty list on 0.90.1 and returns a highlighted "hello
world" on this PR.

Since the `stdout` and `stderr` `IoStream`s are available to commands
when they are run, then this unlocks the potential for more convenient
behavior. E.g., the `find` command can disable its ansi highlighting if
it detects that the output `IoStream` is not the terminal. Knowing the
output streams will also allow background job output to be redirected
more easily and efficiently.

# User-Facing Changes
- External commands returned from closures will be collected (in most
cases):
  ```nushell
  1..2 | each {|_| nu -c "print a" }
  ```
This gives `["a", "a"]` on this PR, whereas this used to print "a\na\n"
and then return an empty list.

  ```nushell
  1..2 | each {|_| nu -c "print -e a" }
  ```
This gives `["", ""]` and prints "a\na\n" to stderr, whereas this used
to return an empty list and print "a\na\n" to stderr.

- Trailing new lines are always trimmed for external commands when
piping into internal commands or collecting it as a value. (Failure to
decode the output as utf-8 will keep the trailing newline for the last
binary value.) In the current nushell version, the following three code
snippets differ only in parenthesis placement, but they all also have
different outputs:

  1. `1..2 | each { ^echo a }`
     ```
     a
     a
     ╭────────────╮
     │ empty list │
     ╰────────────╯
     ```
  2. `1..2 | each { (^echo a) }`
     ```
     ╭───┬───╮
     │ 0 │ a │
     │ 1 │ a │
     ╰───┴───╯
     ```
  3. `1..2 | (each { ^echo a })`
     ```
     ╭───┬───╮
     │ 0 │ a │
     │   │   │
     │ 1 │ a │
     │   │   │
     ╰───┴───╯
     ```

  But in this PR, the above snippets will all have the same output:
  ```
  ╭───┬───╮
  │ 0 │ a │
  │ 1 │ a │
  ╰───┴───╯
  ```

- All existing flags on `run-external` are now deprecated.

- File redirections now apply to all commands inside a code block:
  ```nushell
  (nu -c "print -e a"; nu -c "print -e b") e> test.out
  ```
This gives "a\nb\n" in `test.out` and prints nothing. The same result
would happen when printing to stdout and using a `o>` file redirection.

- External command output will (almost) never be ignored, and ignoring
output must be explicit now:
  ```nushell
  (^echo a; ^echo b)
  ```
This prints "a\nb\n", whereas this used to print only "b\n". This only
applies to external commands; values and internal commands not in return
position will not print anything (e.g., `(echo a; echo b)` still only
prints "b").

- `complete` now always captures stderr (`do` is not necessary).

# After Submitting
The language guide and other documentation will need to be updated.
@hustcer hustcer added this to the v0.94.0 milestone May 17, 2024
FilipAndersson245 pushed a commit to FilipAndersson245/nushell that referenced this issue May 18, 2024
# Description
This PR introduces a `ByteStream` type which is a `Read`-able stream of
bytes. Internally, it has an enum over three different byte stream
sources:
```rust
pub enum ByteStreamSource {
    Read(Box<dyn Read + Send + 'static>),
    File(File),
    Child(ChildProcess),
}
```

This is in comparison to the current `RawStream` type, which is an
`Iterator<Item = Vec<u8>>` and has to allocate for each read chunk.

Currently, `PipelineData::ExternalStream` serves a weird dual role where
it is either external command output or a wrapper around `RawStream`.
`ByteStream` makes this distinction more clear (via `ByteStreamSource`)
and replaces `PipelineData::ExternalStream` in this PR:
```rust
pub enum PipelineData {
    Empty,
    Value(Value, Option<PipelineMetadata>),
    ListStream(ListStream, Option<PipelineMetadata>),
    ByteStream(ByteStream, Option<PipelineMetadata>),
}
```

The PR is relatively large, but a decent amount of it is just repetitive
changes.

This PR fixes nushell#7017, fixes nushell#10763, and fixes nushell#12369.

This PR also improves performance when piping external commands. Nushell
should, in most cases, have competitive pipeline throughput compared to,
e.g., bash.
| Command | Before (MB/s) | After (MB/s) | Bash (MB/s) |
| -------------------------------------------------- | -------------:|
------------:| -----------:|
| `throughput \| rg 'x'` | 3059 | 3744 | 3739 |
| `throughput \| nu --testbin relay o> /dev/null` | 3508 | 8087 | 8136 |

# User-Facing Changes
- This is a breaking change for the plugin communication protocol,
because the `ExternalStreamInfo` was replaced with `ByteStreamInfo`.
Plugins now only have to deal with a single input stream, as opposed to
the previous three streams: stdout, stderr, and exit code.
- The output of `describe` has been changed for external/byte streams.
- Temporary breaking change: `bytes starts-with` no longer works with
byte streams. This is to keep the PR smaller, and `bytes ends-with`
already does not work on byte streams.
- If a process core dumped, then instead of having a `Value::Error` in
the `exit_code` column of the output returned from `complete`, it now is
a `Value::Int` with the negation of the signal number.

# After Submitting
- Update docs and book as necessary
- Release notes (e.g., plugin protocol changes)
- Adapt/convert commands to work with byte streams (high priority is
`str length`, `bytes starts-with`, and maybe `bytes ends-with`).
- Refactor the `tee` code, Devyn has already done some work on this.

---------

Co-authored-by: Devyn Cairns <devyn.cairns@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Work to make nushell quicker and use less resources redirection-pipe All related to redirection to files or more complex pipelines with STDERR
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants