Moving to the end of 6 Mb file takes forever #304

mfrw · 2018-09-11T13:17:31Z

I am not really sure if this is an issue or intended behaviour.
It took me forever to move to the end of a 6 MB file.

Steps to reproduce:

$ seq 1 1000000 > test
$ bat test
Press G to move to end of the file.
This step took a long time. It seems bat was going through the file sequentially.

Apologies, if this is not the place to ask this question or this is a known bug/feature.


sharkdp · 2018-09-11T20:37:37Z

Thank you for reporting this. Yes, this is the right place to ask questions like this 👍

Let's try to untangle this a little bit:

When you call bat test, bat runs a pager (presumably less) and pipes all its output (the whole file contents) to the pager. Moving to the end of the file is slow because bat takes some time to output the full file contents.
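
The one-way pipe described above can be sketched roughly as follows. This is only an illustration, not bat's actual code: `pipe_to_pager` is a hypothetical helper, and the pager name and arguments are whatever the caller supplies.

```rust
use std::io::{self, Write};
use std::process::{Command, ExitStatus, Stdio};

/// Hypothetical helper: spawn `pager` and feed it `content` through a pipe.
/// The parent only writes; it never learns which part of the output the
/// pager is currently showing.
fn pipe_to_pager(pager: &str, args: &[&str], content: &[u8]) -> io::Result<ExitStatus> {
    let mut child = Command::new(pager)
        .args(args)
        .stdin(Stdio::piped())
        .spawn()?;
    {
        let mut stdin = child.stdin.take().expect("stdin was requested as piped");
        // If the user quits the pager early, the pipe closes and the write
        // fails with BrokenPipe. That is treated as a normal end here.
        if let Err(e) = stdin.write_all(content) {
            if e.kind() != io::ErrorKind::BrokenPipe {
                return Err(e);
            }
        }
    } // `stdin` is dropped here, which signals end-of-input to the pager.
    child.wait()
}
```

Calling something like `pipe_to_pager("less", &["-R"], &rendered)` would hand the complete rendered output to the pager, so jumping to the end cannot finish before all of it has been produced and written.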

Next, we can do some benchmarks. I am going to use my hyperfine tool to perform the measurements.

First, let's compare cat and bat when both are printing to /dev/null. Note that this is not the benchmark we are looking for, because in the interactive use (with or without the pager), we need to print (part of) the output to the terminal. This can cost a significant amount of time, especially if ANSI escape sequences are involved. Also, bat will use a faster loop-through-mode if it detects a non-interactive terminal.
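
As a rough sketch of that kind of mode selection (not bat's real logic, which takes more options into account), the standard library's `IsTerminal` trait (Rust 1.70 or later) can tell whether stdout is a terminal. `OutputMode`, `choose_mode` and `detect_mode` are names made up for this example.

```rust
use std::io::{self, IsTerminal};

/// Hypothetical modes: full decorations versus a plain pass-through.
#[derive(Debug, PartialEq)]
enum OutputMode {
    Interactive,
    LoopThrough,
}

/// Pick the cheap pass-through mode whenever stdout is not a terminal.
fn choose_mode(stdout_is_terminal: bool) -> OutputMode {
    if stdout_is_terminal {
        OutputMode::Interactive
    } else {
        OutputMode::LoopThrough
    }
}

/// Ask the standard library whether stdout is attached to a terminal.
fn detect_mode() -> OutputMode {
    choose_mode(io::stdout().is_terminal())
}
```

When the output is redirected to /dev/null, as in the benchmark that follows, `detect_mode()` would select `LoopThrough` under these assumptions.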

> hyperfine 'cat test' 'bat test'
Benchmark #1: cat test

  Time (mean ± σ):       2.1 ms ±   0.9 ms    [User: 1.2 ms, System: 1.7 ms]
 
  Range (min … max):     1.2 ms …   6.6 ms
 
  Warning: Command took less than 5 ms to complete. Results might be inaccurate.
 
Benchmark #2: bat test

  Time (mean ± σ):     422.1 ms ±   6.8 ms    [User: 207.3 ms, System: 214.4 ms]
 
  Range (min … max):   415.7 ms … 439.5 ms
 
Summary

  'cat test' ran
  199.43x faster than 'bat test'

So even with bat's loop-through mode, it is two orders of magnitude slower than cat. This is unfortunate, but not too surprising. We have never optimized for speed and bat is reading and printing files line-by-line instead of using a larger buffer. Still, this is definitely something we could work on.
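
To make the difference between the two copying styles concrete, here is a minimal sketch. Neither function is taken from bat or cat, and the 64 KiB block size is an arbitrary assumption.

```rust
use std::io::{self, BufRead, Read, Write};

/// Copies the input one line at a time, issuing one write per line.
fn copy_line_by_line<R: BufRead, W: Write>(mut input: R, mut output: W) -> io::Result<u64> {
    let mut line = Vec::new();
    let mut total = 0u64;
    loop {
        line.clear();
        let n = input.read_until(b'\n', &mut line)?;
        if n == 0 {
            return Ok(total);
        }
        output.write_all(&line)?;
        total += n as u64;
    }
}

/// Copies the input in large blocks, ignoring line boundaries entirely.
fn copy_in_blocks<R: Read, W: Write>(mut input: R, mut output: W) -> io::Result<u64> {
    let mut block = vec![0u8; 64 * 1024]; // assumed block size
    let mut total = 0u64;
    loop {
        let n = match input.read(&mut block) {
            Ok(0) => return Ok(total),
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        output.write_all(&block[..n])?;
        total += n as u64;
    }
}
```

Both produce identical output. For the test file from this issue, the first performs one write per line (a million of them), while the second needs only one read and one write per 64 KiB block.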

Next, we can make bat even slower by enabling all components that would be printed if we were printing to an interactive terminal:

> hyperfine --warmup 3 'bat --style=full --decorations=always --color=always test'
Benchmark #1: bat --style=full --decorations=always --color=always test

  Time (mean ± σ):      2.298 s ±  0.035 s    [User: 2.069 s, System: 0.226 s]
 
  Range (min … max):    2.257 s …  2.378 s

With everything enabled (decorations such as the line numbers, the grid and ANSI colors), bat is another order of magnitude slower and takes about 2 seconds to print the whole file.

However, this still doesn't quite explain why it takes around 8-10 seconds (on my machine) for the pager to scroll to the end of the output when just using bat test.

To simulate this behavior in the benchmarks, we can use the --show-output option of hyperfine which will print the whole output to the terminal (instead of piping to /dev/null). Using this option, the benchmark will now include the rendering time of the terminal emulator (which might or might not be comparable to what less needs to do). Let's compare both cat and bat with this option enabled:

> hyperfine --show-output 'cat test' 'bat --paging=never test'

[...]

  Time (mean ± σ):      1.054 s ±  0.044 s    [User: 0.9 ms, System: 376.8 ms]
 
  Range (min … max):    0.984 s …  1.115 s

[...]

  Time (mean ± σ):      9.254 s ±  0.105 s    [User: 3.065 s, System: 1.594 s]
 
  Range (min … max):    9.107 s …  9.423 s

We can see that both cat and bat are significantly slowed down when having to actually print the output in a terminal.

So if my interpretation is correct, most of the time (around 75%) is actually caused by the terminal emulator or pager that needs to interpret the output of bat (which includes the ANSI color sequences, for example). I don't think there is anything that we can do about this, except to disable decorations and colors (--decorations=never --color=never).
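
One plausible way to arrive at a figure near 75% (the comment does not spell out the calculation, so this is an assumption) is to compare the 9.254 s measured with terminal output against the 2.298 s measured for fully decorated output to /dev/null:

```rust
/// Fraction of the total time that is not spent inside bat itself,
/// assuming the /dev/null measurement approximates bat's own cost.
fn terminal_share(total_secs: f64, bat_only_secs: f64) -> f64 {
    (total_secs - bat_only_secs) / total_secs
}
```

With the numbers above, `terminal_share(9.254, 2.298)` is (9.254 - 2.298) / 9.254, or roughly 0.75.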

That being said, we also saw that performance is not bat's strength 😄

I don't see this as a really big problem as I usually don't want to syntax-highlight files with 6 MB of contents, but it might still be fun to work on optimization here.

mfrw · 2018-09-12T10:33:30Z

Thank you very much @sharkdp for the explanation :)

gsar · 2018-11-16T20:10:07Z

@sharkdp would it be possible to have a mode where syntax highlighting is only attempted for the visible region of the text, not for the whole file?

Assuming that is doable, it would make bat a lot faster on large files, and that would let me view log files with bat and keep bat as an alias for less.

sharkdp · 2018-11-18T11:02:00Z

@gsar I don't think that this is possible.

bat only pipes its output to less. There is no two-way communication between bat and the pager which would be needed to access the current location in the file.
For proper syntax highlighting, we need to parse the file from the very beginning. There is no way you can consistently highlight just a part of a file (think of a block comment like <!-- ... --> in an XML file that would start somewhere before that part of the file).
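
The block-comment problem can be shown with a toy scanner. It is far simpler than a real syntax highlighter and only tracks a single bit of state: whether a line begins inside an XML comment. The function name is made up for this sketch.

```rust
/// For each line, reports whether the line *starts* inside an XML comment.
/// The answer for line N depends on every line before it.
fn lines_starting_in_comment(text: &str) -> Vec<bool> {
    let mut in_comment = false;
    let mut result = Vec::new();
    for line in text.lines() {
        result.push(in_comment);
        let mut rest = line;
        loop {
            // Look for whichever marker would flip the current state.
            let marker = if in_comment { "-->" } else { "<!--" };
            match rest.find(marker) {
                Some(pos) => {
                    in_comment = !in_comment;
                    rest = &rest[pos + marker.len()..];
                }
                None => break,
            }
        }
    }
    result
}
```

Scanning a five-line document whose comment opens on line 2 and closes on line 4 marks lines 3 and 4 as starting inside the comment. Scanning only lines 3 to 5 marks none of them, because the scanner never saw the opening marker.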

georgmu · 2019-03-13T11:02:59Z

Is there a reason that there is no output buffering?

The --help section states that "-u" for unbuffered is ignored since everything is always unbuffered.

To test the speed without testing the terminal, I use the following command:
time bat --color always --decorations always test | tail

Without buffering, this takes 4.4 seconds on my laptop
With buffering, this takes 2 seconds.

diff --git a/src/controller.rs b/src/controller.rs
index ac39abb..51daa27 100644
--- a/src/controller.rs
+++ b/src/controller.rs
@@ -1,4 +1,4 @@
-use std::io::{self, Write};
+use std::io::{self, Write, BufWriter,};
 use std::path::Path;
 
 use crate::app::{Config, PagingMode};
@@ -36,7 +36,7 @@ impl<'b> Controller<'b> {
         }
 
         let mut output_type = OutputType::from_mode(paging_mode, self.config.pager)?;
-        let writer = output_type.handle()?;
+        let mut writer = BufWriter::new(output_type.handle()?);
         let mut no_errors: bool = true;
 
         let stdin = io::stdin();
@@ -50,7 +50,7 @@ impl<'b> Controller<'b> {
                 Ok(mut reader) => {
                     let result = if self.config.loop_through {
                         let mut printer = SimplePrinter::new();
-                        self.print_file(reader, &mut printer, writer, *input_file)
+                        self.print_file(reader, &mut printer, &mut writer, *input_file)
                     } else {
                         let mut printer = InteractivePrinter::new(
                             &self.config,
@@ -58,7 +58,7 @@ impl<'b> Controller<'b> {
                             *input_file,
                             &mut reader,
                         );
-                        self.print_file(reader, &mut printer, writer, *input_file)
+                        self.print_file(reader, &mut printer, &mut writer, *input_file)
                     };
 
                     if let Err(error) = result {

Without the tail pipe, I can also test the pager speed by quickly pressing ">", then "q" to scroll to the end and then quit (since the scroll to the end takes so long here, the speed should be more or less accurate).

The current version takes ~9 seconds.
The patched version takes ~3 seconds.
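
To illustrate why wrapping the writer in a BufWriter helps, the following sketch counts how many writes reach the underlying sink with and without buffering. `CountingSink`, `write_seq` and `compare_write_calls` are names invented for this example, and the 8 KiB capacity is an assumption chosen for the demonstration.

```rust
use std::io::{self, BufWriter, Write};

/// Sink that counts how many times `write` is called on it.
#[derive(Default)]
struct CountingSink {
    write_calls: usize,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.write_calls += 1;
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes the numbers 1..=n, one per line, similar to `seq 1 n`.
fn write_seq<W: Write>(out: &mut W, n: u32) -> io::Result<()> {
    for i in 1..=n {
        out.write_all(format!("{}\n", i).as_bytes())?;
    }
    Ok(())
}

/// Returns (unbuffered write calls, buffered write calls) for `n` lines.
fn compare_write_calls(n: u32) -> io::Result<(usize, usize)> {
    let mut direct = CountingSink::default();
    write_seq(&mut direct, n)?;

    let mut sink = CountingSink::default();
    {
        let mut buffered = BufWriter::with_capacity(8 * 1024, &mut sink);
        write_seq(&mut buffered, n)?;
        buffered.flush()?;
    }
    Ok((direct.write_calls, sink.write_calls))
}
```

For 1000 lines (under 4 KiB in total), the unbuffered path issues 1000 writes, while the buffered path passes everything to the sink in a single write when it is flushed. When the sink is a pipe or terminal, each of those writes is a system call.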

My proposal would be to implement the "-u" command-line option and use buffered output when it is not specified.

sharkdp · 2019-05-31T19:19:38Z

@georgmu That sounds interesting, thank you!

Would the buffering cause any observable effects? If not, would you be interested in opening a PR?

georgmu · 2019-06-03T13:40:55Z

I will prepare a PR, but I am currently fighting the borrow checker to have writer either &mut Write or BufWriter

sharkdp · 2019-06-04T05:07:35Z

> I will prepare a PR, but I am currently fighting the borrow checker to have writer either &mut Write or BufWriter

Great, thank you. Let us know if you need help (perhaps just open a PR with the failing version).

georgmu · 2019-06-17T19:42:07Z

Sorry for the delay. I have managed to build it and then compared bat to cat.

cat has a special implementation: it reads in larger blocks, but uses some special handling so that it only buffers while input is available.

As an example: If I just open cat and start to type and press return, the line is printed (as expected). With my patch, this is not the case, so the behavior differs.

To make it better, the output should always be buffered, but only be flushed if there is no input pending.
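
A minimal sketch of that idea follows. It is not the code from the PR, and it approximates "no input pending" by checking whether the reader's own in-memory buffer is empty, which says nothing about data still waiting in the kernel. The function name is made up.

```rust
use std::io::{self, BufRead, BufReader, BufWriter, Read, Write};

/// Copies `input` to `output` through a `BufWriter`, flushing only when the
/// reader's internal buffer has been drained, i.e. when the next read would
/// have to go back to the underlying source and might block.
fn copy_flush_when_idle<R: Read, W: Write>(input: R, output: W) -> io::Result<()> {
    let mut reader = BufReader::new(input);
    let mut writer = BufWriter::new(output);
    let mut line = Vec::new();
    loop {
        line.clear();
        if reader.read_until(b'\n', &mut line)? == 0 {
            break; // end of input
        }
        writer.write_all(&line)?;
        // Nothing else is buffered on the input side: make the output visible
        // now instead of holding it back while waiting for more input.
        if reader.buffer().is_empty() {
            writer.flush()?;
        }
    }
    writer.flush()
}
```

With interactive input, each typed line arrives in its own read, so the buffer is empty afterwards and the line is printed immediately. With a large file, each read fills the buffer with many lines, so flushes are rare.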

I will create a PR so you can have a look at the changes

georgmu · 2019-06-17T19:48:38Z

Draft PR is #596.

sharkdp · 2019-06-17T20:20:14Z

> As an example: If I just open cat and start to type and press return, the line is printed (as expected). With my patch, this is not the case, so the behavior differs.
>
> To make it better, the output should always be buffered, but only be flushed if there is no input pending.

Yes, that's definitely a behavior that should be preserved. Otherwise, we cannot do things like in this tail -f example: https://github.com/sharkdp/bat#tail--f

(I guess you did that already, but: make sure that you turn off paging if you perform experiments with this: bat --paging=never. Otherwise, the pager might buffer part of the output.)

antoyo · 2020-07-05T19:42:56Z

> @gsar I don't think that this is possible.
>
> For proper syntax highlighting, we need to parse the file from the very beginning. There is no way you can consistently highlight just a part of a file (think of a block comment like <!-- ... --> in an XML file that would start somewhere before that part of the file).

You could just do what vim does: it only looks back a certain number of lines (maybe 500-1000, I don't know) before the view.

But, that might require having a pager built in bat for that to work.

sharkdp · 2022-09-06T21:22:17Z

See recent benchmark results in #2244 (comment)

Emilv2 · 2022-10-03T19:27:12Z

It seems like there is still room for some easy improvement regarding speed on very large files. bat --color=never --decorations=never --style=plain is fairly fast, but bat --color=never --decorations=always --style=plain --highlight-line=x is still very slow on files with a known file extension. Removing the file extension makes bat much faster while the output stays the same. It seems bat still does the syntax parsing with --color=never, even though this option means that nothing is done with that information.

sharkdp added the help wanted and question labels Sep 11, 2018

sharkdp mentioned this issue Jun 17, 2019

Improve SimplePrinter throughput #594

Closed

sharkdp mentioned this issue Aug 15, 2019

Add cat to alternatives.md #630

Closed

rgreenblatt mentioned this issue Aug 23, 2019

slower than cat #635

Closed

sharkdp mentioned this issue Aug 28, 2019

use buffered output by default and implement -u option for unbuffered output #596

Closed

sharkdp mentioned this issue Feb 2, 2020

Potential issue piping binary file #816

Closed

desbma mentioned this issue Apr 18, 2020

bat is extremely slow (>100x slower than cat) for 4 char file #925

Closed

sharkdp mentioned this issue Jul 25, 2020

Use of Linux splice command in plain-text mode for performance increase #1112

Closed

sharkdp added the performance label and removed the question label Jan 9, 2021

sharkdp mentioned this issue Jul 25, 2021

bat release focus: performance #1751

Closed

Enselic mentioned this issue Aug 6, 2021

Slow to scroll to bottom for large files #1784

Closed

Enselic mentioned this issue Jul 14, 2022

30x slower than cat when redirecting? #2244

Closed

sharkdp mentioned this issue Aug 9, 2022

Detect binary file takes too long #2262

Open

Emilv2 mentioned this issue Nov 5, 2022

Bat is slow with --color=never --decorations=always on some files #2397

Open
