Add buffering to stdout when it's not a terminal. #885

tmccombs · 2021-11-15T07:09:00Z

Follow-up of #736
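For readers skimming the thread, here is a minimal sketch of the idea in the title, not the actual diff. It assumes `std::io::IsTerminal` (stable since Rust 1.70) for the terminal check; `wrap_output`, `stdout_writer` and `print_paths` are hypothetical names that do not come from `src/output.rs`.

```rust
use std::io::{self, BufWriter, IsTerminal, Write};

/// Wrap `out` depending on whether a human is watching.
///
/// Interactive: pass the writer through unchanged, so each result shows up
/// as soon as the underlying writer emits it.
/// Non-interactive (pipe or file): add a block buffer to reduce write calls.
fn wrap_output<W: Write + 'static>(out: W, interactive: bool) -> Box<dyn Write> {
    if interactive {
        Box::new(out)
    } else {
        Box::new(BufWriter::new(out))
    }
}

/// Hypothetical helper: choose the wrapper for this process's stdout.
fn stdout_writer() -> Box<dyn Write> {
    let stdout = io::stdout();
    let interactive = stdout.is_terminal();
    wrap_output(stdout, interactive)
}

/// Print one path per line, then flush once at the end so buffered
/// output is not lost.
fn print_paths(out: &mut dyn Write, paths: &[&str]) -> io::Result<()> {
    for p in paths {
        writeln!(out, "{}", p)?;
    }
    out.flush()
}
```

A caller would do something like `print_paths(&mut *stdout_writer(), &paths)`. The explicit `flush` at the end matters in the buffered case, because errors during an implicit flush on drop are silently ignored.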

sharkdp · 2021-11-15T08:30:04Z

Thank you for bringing this up to date! It would be good to see some benchmark results (regression.sh tests and maybe some benchmarks with output activated for TTY behavior) in order to gain confidence in the changes. Let me know if you need help.

tavianator · 2021-11-15T15:39:49Z

By the way, you could add

Co-authored-by: sourlemon207 <jw1756@protonmail.com>

to the commit message to attribute it to both you and the original author.

tmccombs · 2021-11-16T05:50:11Z

I tried running the regression benchmarks on the rust-lang repository and got:

`fd` regression benchmark

No pattern

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master --hidden --no-ignore '' '/home/thayne/bulk-home/devel/rust-lang'` | 245.2 ± 2.7 | 241.6 | 249.7 | 1.00 |
| `./fd-feature --hidden --no-ignore '' '/home/thayne/bulk-home/devel/rust-lang'` | 269.5 ± 4.4 | 263.3 | 278.1 | 1.10 ± 0.02 |

Simple pattern

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master '.*[0-9]\.jpg$' '/home/thayne/bulk-home/devel/rust-lang'` | 343.1 ± 4.6 | 338.4 | 355.0 | 1.00 ± 0.01 |
| `./fd-feature '.*[0-9]\.jpg$' '/home/thayne/bulk-home/devel/rust-lang'` | 342.6 ± 1.8 | 340.6 | 346.4 | 1.00 |

Simple pattern (-HI)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master -HI '.*[0-9]\.jpg$' '/home/thayne/bulk-home/devel/rust-lang'` | 242.9 ± 3.7 | 238.7 | 251.2 | 1.00 |
| `./fd-feature -HI '.*[0-9]\.jpg$' '/home/thayne/bulk-home/devel/rust-lang'` | 248.9 ± 5.3 | 242.5 | 258.1 | 1.02 ± 0.03 |

File extension

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master -HI --extension jpg '' '/home/thayne/bulk-home/devel/rust-lang'` | 250.9 ± 4.4 | 246.0 | 258.4 | 1.00 |
| `./fd-feature -HI --extension jpg '' '/home/thayne/bulk-home/devel/rust-lang'` | 253.3 ± 2.9 | 249.3 | 258.6 | 1.01 ± 0.02 |

File type

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master -HI --type l '' '/home/thayne/bulk-home/devel/rust-lang'` | 245.2 ± 4.6 | 240.8 | 255.7 | 1.00 |
| `./fd-feature -HI --type l '' '/home/thayne/bulk-home/devel/rust-lang'` | 251.1 ± 4.9 | 246.5 | 261.0 | 1.02 ± 0.03 |

Cold cache

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master -HI '.*[0-9]\.jpg$' '/home/thayne/bulk-home/devel/rust-lang'` | 28.951 ± 0.725 | 28.224 | 29.674 | 1.01 ± 0.03 |
| `./fd-feature -HI '.*[0-9]\.jpg$' '/home/thayne/bulk-home/devel/rust-lang'` | 28.654 ± 0.318 | 28.339 | 28.975 | 1.00 |

Interestingly, it looks like this branch is actually a little bit slower. I'm not really sure why.

tmccombs · 2021-11-16T06:41:09Z

If I switch back to using `channel` instead of `sync_channel`, then it is at least not any slower:

`fd` regression benchmark

No pattern

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master --hidden --no-ignore '' '/home/thayne/bulk-home/devel/rust-lang'` | 250.0 ± 2.4 | 245.9 | 253.7 | 1.00 |
| `./fd-feature --hidden --no-ignore '' '/home/thayne/bulk-home/devel/rust-lang'` | 254.4 ± 1.4 | 251.7 | 255.9 | 1.02 ± 0.01 |

Simple pattern

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master '.*[0-9]\.jpg$' '/home/thayne/bulk-home/devel/rust-lang'` | 342.2 ± 3.1 | 336.3 | 347.9 | 1.00 |
| `./fd-feature '.*[0-9]\.jpg$' '/home/thayne/bulk-home/devel/rust-lang'` | 342.7 ± 2.7 | 340.3 | 349.0 | 1.00 ± 0.01 |

Simple pattern (-HI)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master -HI '.*[0-9]\.jpg$' '/home/thayne/bulk-home/devel/rust-lang'` | 242.6 ± 3.1 | 239.2 | 249.0 | 1.00 |
| `./fd-feature -HI '.*[0-9]\.jpg$' '/home/thayne/bulk-home/devel/rust-lang'` | 244.9 ± 3.8 | 239.2 | 253.7 | 1.01 ± 0.02 |

File extension

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master -HI --extension jpg '' '/home/thayne/bulk-home/devel/rust-lang'` | 253.8 ± 3.0 | 249.2 | 259.3 | 1.00 |
| `./fd-feature -HI --extension jpg '' '/home/thayne/bulk-home/devel/rust-lang'` | 254.7 ± 3.2 | 250.6 | 260.4 | 1.00 ± 0.02 |

File type

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master -HI --type l '' '/home/thayne/bulk-home/devel/rust-lang'` | 243.4 ± 4.5 | 238.9 | 255.0 | 1.00 |
| `./fd-feature -HI --type l '' '/home/thayne/bulk-home/devel/rust-lang'` | 244.8 ± 3.0 | 240.9 | 249.1 | 1.01 ± 0.02 |

Cold cache

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master -HI '.*[0-9]\.jpg$' '/home/thayne/bulk-home/devel/rust-lang'` | 29.176 ± 0.388 | 28.729 | 29.432 | 1.00 ± 0.02 |
| `./fd-feature -HI '.*[0-9]\.jpg$' '/home/thayne/bulk-home/devel/rust-lang'` | 29.120 ± 0.375 | 28.754 | 29.504 | 1.00 |
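For context on the `channel` versus `sync_channel` switch mentioned at the top of this comment, the sketch below shows the behavioural difference between the two constructors in `std::sync::mpsc`. It is an illustration only: the function names are hypothetical, and it does not reproduce how `src/walk.rs` uses its channel.

```rust
use std::sync::mpsc::{channel, sync_channel, TrySendError};

/// Count how many of `n` items fit into a bounded channel of the given
/// capacity while nothing is draining it.
fn queued_bounded(capacity: usize, n: usize) -> usize {
    // `_rx` keeps the receiving end alive without ever reading from it.
    let (tx, _rx) = sync_channel::<usize>(capacity);
    let mut sent = 0;
    for i in 0..n {
        match tx.try_send(i) {
            Ok(()) => sent += 1,
            // A full queue is the point where a plain `send` would block
            // the sending thread until the receiver catches up.
            Err(TrySendError::Full(_)) => break,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    sent
}

/// Same experiment with an unbounded channel: `send` never blocks, and the
/// queue simply grows for as long as the receiver does not drain it.
fn queued_unbounded(n: usize) -> usize {
    let (tx, _rx) = channel::<usize>();
    let mut sent = 0;
    for i in 0..n {
        if tx.send(i).is_ok() {
            sent += 1;
        }
    }
    sent
}
```

With a capacity of 4, `queued_bounded(4, 10)` stops at 4 while `queued_unbounded(10)` accepts all 10. The usual trade-off is that the unbounded variant never stalls senders but can use more memory when the receiver is slow.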

sharkdp · 2021-11-16T19:31:21Z

Ok, thank you. And do we have a significant improvement for searches that need to write a lot of results to stdout? Something like fd -HI?

sharkdp · 2021-11-16T19:31:54Z

Have you played around with different buffer sizes? https://doc.rust-lang.org/stable/std/io/struct.BufWriter.html#method.with_capacity

tmccombs · 2021-11-18T07:16:35Z

> Have you played around with different buffer sizes?

I have not

I'm wondering if it might be that the bottleneck, at least on my system, is reading from disk, so even if it writes to stdout faster, it doesn't impact the overall run time.

So I tried changing it to write the output to a file, instead of whatever hyperfine does with the output by default, and got much better results:

❯ hyperfine --warmup 5 './fd-master -HI "" ~/bulk-home/devel/rust-lang/ > test1' './fd-feature -HI "" ~/bulk-home/devel/rust-lang/ > test2' --shell bash
Benchmark 1: ./fd-master -HI "" ~/bulk-home/devel/rust-lang/ > test1
  Time (mean ± σ):     559.2 ms ±  91.6 ms    [User: 813.3 ms, System: 878.3 ms]
  Range (min … max):   511.7 ms … 817.3 ms    10 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: ./fd-feature -HI "" ~/bulk-home/devel/rust-lang/ > test2
  Time (mean ± σ):     326.5 ms ±  17.4 ms    [User: 781.8 ms, System: 681.6 ms]
  Range (min … max):   301.9 ms … 347.3 ms    10 runs
 
Summary
  './fd-feature -HI "" ~/bulk-home/devel/rust-lang/ > test2' ran
    1.71 ± 0.30 times faster than './fd-master -HI "" ~/bulk-home/devel/rust-lang/ > test1'

This is based on the work of sharkdp#736 by @sourlemon207. I've added the suggestion I recommended on that PR.

Co-authored-by: sourlemon207 <jw1756@protonmail.com>

sharkdp · 2021-11-26T18:46:12Z

> So I tried changing it to write the output to a file, instead of whatever hyperfine does with the output by default, and got much better results:

Using hyperfine, it should write to /dev/null by default (related: sharkdp/hyperfine#377). I would have naively assumed that this wouldn't change performance!?

sharkdp · 2021-11-26T19:05:06Z

I ran some benchmarks myself. I see quite significant improvements when a lot of results (1.6 million) are printed:

Benchmark 1: ./fd-master --hidden --no-ignore '' '/home/shark'
  Time (mean ± σ):      1.392 s ±  0.025 s    [User: 4.129 s, System: 4.597 s]
  Range (min … max):    1.372 s …  1.454 s    10 runs
 
Benchmark 2: ./fd-feature --hidden --no-ignore '' '/home/shark'
  Time (mean ± σ):      1.094 s ±  0.037 s    [User: 3.769 s, System: 4.273 s]
  Range (min … max):    1.072 s …  1.193 s    10 runs
 
Summary
  './fd-feature --hidden --no-ignore '' '/home/shark'' ran
    1.27 ± 0.05 times faster than './fd-master --hidden --no-ignore '' '/home/shark''

In combination with the changes by @tavianator, this will make colorized searches almost 5x faster (compared to 8.2.1) 🥳 Very cool!

Benchmark 1: ./fd-8.2.1 --color=always '' '/home/shark'
  Time (mean ± σ):      2.736 s ±  0.034 s    [User: 2.001 s, System: 2.149 s]
  Range (min … max):    2.706 s …  2.820 s    10 runs
 
Benchmark 2: ./fd-feature --color=always '' '/home/shark'
  Time (mean ± σ):     595.9 ms ±   5.8 ms    [User: 1085.6 ms, System: 916.4 ms]
  Range (min … max):   584.9 ms … 606.8 ms    10 runs
 
Summary
  './fd-feature --color=always '' '/home/shark'' ran
    4.59 ± 0.07 times faster than './fd-8.2.1 --color=always '' '/home/shark''

tavianator · 2021-11-26T19:13:46Z

A write() to /dev/null doesn't even look at the data, so reducing the number of write()s just reduces syscall overhead which may be hidden by parallelism anyway.
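To make the "number of write()s" point concrete, here is a small sketch that counts calls instead of timing them. `CountingSink` is a hypothetical stand-in for a file descriptor: it discards the data (roughly what `/dev/null` does) and only records how often `write` is invoked.

```rust
use std::io::{self, BufWriter, Write};

/// Discards all data and counts how often `write` is called.
struct CountingSink {
    writes: usize,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.writes += 1;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Write `lines` short lines straight to the sink: one `write` per line.
fn unbuffered_writes(lines: usize) -> usize {
    let mut sink = CountingSink { writes: 0 };
    for _ in 0..lines {
        sink.write_all(b"some/path/to/file.rs\n").unwrap();
    }
    sink.writes
}

/// Write the same lines through a `BufWriter` with its default capacity:
/// the sink only sees a `write` when the buffer fills up or is flushed.
fn buffered_writes(lines: usize) -> usize {
    let mut out = BufWriter::new(CountingSink { writes: 0 });
    for _ in 0..lines {
        out.write_all(b"some/path/to/file.rs\n").unwrap();
    }
    out.flush().unwrap();
    out.get_ref().writes
}
```

For 1000 lines the unbuffered path issues 1000 writes, while the buffered path issues only a handful. How much of that saving shows up in wall-clock time depends on what sits behind the file descriptor, which is the point being made here.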

sharkdp · 2021-11-26T19:18:25Z

> A write() to /dev/null doesn't even look at the data, so reducing the number of write()s just reduces syscall overhead which may be hidden by parallelism anyway.

Interesting. I was secretly hoping that you would explain what's going on 😄 - thanks!

So I guess this means that we would get slightly more realistic benchmarking results if we pipe the output to a file?

sharkdp · 2021-11-26T19:36:40Z

> > Have you played around with different buffer sizes?
>
> I have not

I tried out various buffer sizes and it doesn't make a huge difference as long as the buffer size is larger than a few hundred bytes. The default is 8 KiB, which seems reasonable.
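A rough way to see why the size stops mattering: the number of writes reaching the OS scales with total bytes divided by buffer capacity, so most of the reduction already happens at small sizes. The sketch below counts writes for a given `BufWriter::with_capacity` value. `CountingSink` and `writes_for_capacity` are hypothetical names, and this counts calls only; it is not the timing experiment described above.

```rust
use std::io::{self, BufWriter, Write};

/// Discards all data and counts how often `write` is called.
struct CountingSink {
    writes: usize,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.writes += 1;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Push `lines` 21-byte lines through a buffer of `capacity` bytes and
/// report how many writes reached the underlying sink.
fn writes_for_capacity(capacity: usize, lines: usize) -> usize {
    let mut out = BufWriter::with_capacity(capacity, CountingSink { writes: 0 });
    for _ in 0..lines {
        out.write_all(b"some/path/to/file.rs\n").unwrap();
    }
    out.flush().unwrap();
    out.get_ref().writes
}
```

Comparing, say, `writes_for_capacity(64, 10_000)`, `writes_for_capacity(512, 10_000)` and `writes_for_capacity(8192, 10_000)` shows the count falling steeply at first, after which each further doubling removes fewer calls in absolute terms.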

sharkdp · 2021-11-26T19:56:13Z

> So I guess this means that we would get slightly more realistic benchmarking results if we pipe the output to a file?

If the output goes to an actual file (on tmpfs), the advantage is even greater:

Benchmark 1: ./fd-master --hidden --no-ignore '' '/home/shark' > /tmp/test.out
  Time (mean ± σ):      1.863 s ±  0.005 s    [User: 4.058 s, System: 5.137 s]
  Range (min … max):    1.857 s …  1.875 s    10 runs
 
Benchmark 2: fd --hidden --no-ignore '' '/home/shark' > /tmp/test.out
  Time (mean ± σ):      1.087 s ±  0.005 s    [User: 3.716 s, System: 4.357 s]
  Range (min … max):    1.078 s …  1.098 s    10 runs
 
Summary
  'fd --hidden --no-ignore '' '/home/shark' > /tmp/test.out' ran
    1.71 ± 0.01 times faster than './fd-master --hidden --no-ignore '' '/home/shark' > /tmp/test.out'

Even more so, if we are writing to an SSD:

Benchmark 1: ./fd-master --hidden --no-ignore '' '/home/shark' > /home/shark/test.out
  Time (mean ± σ):      3.376 s ±  0.064 s    [User: 4.138 s, System: 6.629 s]
  Range (min … max):    3.318 s …  3.525 s    10 runs
 
Benchmark 2: fd --hidden --no-ignore '' '/home/shark' > /home/shark/test.out
  Time (mean ± σ):      1.480 s ±  0.299 s    [User: 3.748 s, System: 4.469 s]
  Range (min … max):    1.184 s …  2.137 s    10 runs
 
Summary
  'fd --hidden --no-ignore '' '/home/shark' > /home/shark/test.out' ran
    2.28 ± 0.46 times faster than './fd-master --hidden --no-ignore '' '/home/shark' > /home/shark/test.out'

pv confirms the same:

▶ ./fd-master --hidden --no-ignore '' '/home/shark' | pv -a > /dev/null
[70,7MiB/s]

▶ ./fd-feature-885 --hidden --no-ignore '' '/home/shark' | pv -a > /dev/null
[ 145MiB/s]

sharkdp · 2021-11-26T20:21:50Z

I see no reason to delay merging this.

tavianator · 2021-11-26T20:22:12Z

> So I guess this means that we would get slightly more realistic benchmarking results if we pipe the output to a file?

Depends what you're trying to be realistic about :). Writing to /dev/null is a pretty good proxy for the "maximum possible" throughput of a program, which is a reasonable thing to benchmark.

You might also try writing to a file, but that complicates things. Depending on how much you write, the data might only hit the page cache, or the flash cache of a hard drive, etc. The filesystem, hardware, etc. will all affect the performance and make it harder to interpret the results.

The right thing to do depends on what you actually want to measure. I think mostly people care about interactive use, where writing to the TTY is probably the biggest bottleneck. The current benchmarks don't really capture this well.

Another common use is probably piping fd to some other program. For this case it might be better to do something like fd | cat >/dev/null instead of fd >/dev/null so that the writes actually go through a pipe.

sharkdp · 2021-11-26T20:24:10Z

I see no reason to delay merging this. Thank you @sourlemon207 @tmccombs @tavianator.

I'll create a ticket to address the shortcomings of the current benchmark set.

tmccombs force-pushed the tty-buffer branch from cb4e57f to 48810a0 Compare November 15, 2021 08:04

tmccombs changed the title ~~Tty buffer~~ Add buffering to stdout when it's not a terminal. Nov 15, 2021

tavianator reviewed Nov 15, 2021

sharkdp mentioned this pull request Nov 23, 2021

Improved benchmark suite sharkdp/bat#1953

Merged

tmccombs and others added 4 commits November 26, 2021 19:40

Add buffering to stdout when it's not a terminal

c58cb51

This is based on the work of sharkdp#736 by @sourlemon207. I've added the suggestion I recommended on that PR.

Add entry for buffering to CHANGELOG

3e1453b

squash! Add buffering to stdout when it's not a terminal

b78b319

Co-authored-by: sourlemon207 <jw1756@protonmail.com>

Use non-sync channel

e45181b

sharkdp force-pushed the tty-buffer branch from 7fb4ae9 to e45181b Compare November 26, 2021 19:06

sharkdp approved these changes Nov 26, 2021

tavianator approved these changes Nov 26, 2021

sharkdp merged commit f219da4 into sharkdp:master Nov 26, 2021

sharkdp mentioned this pull request Nov 26, 2021

Extended benchmark suite #893

Open

tmccombs mentioned this pull request Jan 4, 2022

excessive memory usage on huge trees #918

Closed

tmccombs mentioned this pull request Dec 13, 2023

Add a streaming mode when running under pipe #1313

Closed
