Moving to the end of a 6 MB file takes forever #304

Open
mfrw opened this issue Sep 11, 2018 · 14 comments
Labels: help wanted, performance

Comments

@mfrw

mfrw commented Sep 11, 2018

I am not really sure if this is an issue or intended behaviour.
It took me forever to move to the end of a 6 MB file.

Steps to reproduce:

  • $ seq 1 1000000 > test
  • $ bat test
  • Press G to move to the end of the file.
    This step took a long time. It seems bat was going through the file sequentially.

Apologies if this is not the place to ask this question, or if this is a known bug/feature.

@sharkdp
Owner

sharkdp commented Sep 11, 2018

Thank you for reporting this. Yes, this is the right place to ask questions like this 👍

Let's try to untangle this a little bit:

When you call bat test, bat runs a pager (presumably less) and pipes all its output (the whole file contents) to the pager. Moving to the end of the file is slow because bat takes some time to output the full file contents.
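A minimal sketch of that pipeline, assuming a pager that reads its input from standard input (as less does). The helper `pipe_through_pager` is hypothetical, not bat's actual code: it spawns the pager with a piped stdin and writes every line into the pipe, so the pager can only reach the last line after everything before it has been produced.

```rust
use std::io::{self, Write};
use std::process::{Command, ExitStatus, Stdio};

/// Hypothetical helper, not bat's actual code: run `pager` and feed `content`
/// into its standard input one line at a time.
fn pipe_through_pager(mut pager: Command, content: &str) -> io::Result<ExitStatus> {
    let mut child = pager.stdin(Stdio::piped()).spawn()?;
    {
        // Move the pipe out of `child` so it is closed (signalling EOF)
        // when this scope ends.
        let mut stdin = child.stdin.take().expect("stdin was configured as piped");
        for line in content.lines() {
            // Every line must pass through the pipe before the pager can
            // display the last one. A user who quits early causes a
            // broken-pipe error here, which this sketch simply returns.
            writeln!(stdin, "{}", line)?;
        }
    }
    // Wait for the pager to exit (e.g. the user pressing `q` in less).
    child.wait()
}
```

For example, `pipe_through_pager(Command::new("less"), &contents)` pages a string. Passing `cat` with its stdout redirected is a convenient way to exercise the helper without a terminal.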

Next, we can do some benchmarks. I am going to use my hyperfine tool to perform the measurements.

First, let's compare cat and bat when both are printing to /dev/null. Note that this is not the benchmark we are looking for, because in interactive use (with or without the pager), we need to print (part of) the output to the terminal. This can cost a significant amount of time, especially if ANSI escape sequences are involved. Also, bat will use a faster loop-through mode if it detects that it is not writing to an interactive terminal.
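The mode switch can be sketched as a pure function. All names here are hypothetical, and the rule that forcing colors or decorations disables loop-through mode is an assumption, modeled on the benchmark further down that passes --decorations=always --color=always to get the full output while writing to /dev/null. `std::io::IsTerminal` requires Rust 1.70 or later.

```rust
use std::io::{self, IsTerminal};

/// Hypothetical output modes.
#[derive(Debug, PartialEq)]
enum OutputMode {
    /// Full pipeline: syntax highlighting, decorations, colors.
    Interactive,
    /// Fast path: copy the input through without highlighting.
    LoopThrough,
}

/// Assumption: loop-through is used only when stdout is not a terminal and
/// neither colors nor decorations are forced on.
fn choose_mode(stdout_is_terminal: bool, color_always: bool, decorations_always: bool) -> OutputMode {
    if stdout_is_terminal || color_always || decorations_always {
        OutputMode::Interactive
    } else {
        OutputMode::LoopThrough
    }
}

/// Queries the real stdout to decide the mode.
fn detect_mode(color_always: bool, decorations_always: bool) -> OutputMode {
    choose_mode(io::stdout().is_terminal(), color_always, decorations_always)
}
```

Under this rule, hyperfine's default redirection to /dev/null selects the fast path unless colors or decorations are forced.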

> hyperfine 'cat test' 'bat test'
Benchmark #1: cat test

  Time (mean ± σ):       2.1 ms ±   0.9 ms    [User: 1.2 ms, System: 1.7 ms]
 
  Range (min … max):     1.2 ms …   6.6 ms
 
  Warning: Command took less than 5 ms to complete. Results might be inaccurate.
 
Benchmark #2: bat test

  Time (mean ± σ):     422.1 ms ±   6.8 ms    [User: 207.3 ms, System: 214.4 ms]
 
  Range (min … max):   415.7 ms … 439.5 ms
 
Summary

  'cat test' ran
  199.43x faster than 'bat test'

So even in bat's loop-through mode, it is two orders of magnitude slower than cat. This is unfortunate, but not too surprising: we have never optimized for speed, and bat reads and prints files line by line instead of using a larger buffer. Still, this is definitely something we could work on.
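To illustrate the two strategies, here are two hypothetical copy functions that produce identical output: one reads and writes one line at a time, the other hands the work to `std::io::copy`, which moves data through a buffer in larger chunks. Neither is bat's actual code.

```rust
use std::io::{self, BufRead, Read, Write};

/// Line-by-line copy (hypothetical): one `read_until` and one `write_all`
/// per line, so an unbuffered `output` sees at least one write per line.
fn copy_line_by_line<R: BufRead, W: Write>(mut input: R, output: &mut W) -> io::Result<u64> {
    let mut line = Vec::new();
    let mut total = 0u64;
    loop {
        line.clear();
        let n = input.read_until(b'\n', &mut line)?;
        if n == 0 {
            break; // end of input
        }
        output.write_all(&line)?;
        total += n as u64;
    }
    Ok(total)
}

/// Block copy (hypothetical): `io::copy` moves data in chunks,
/// independent of where the line breaks are.
fn copy_in_blocks<R: Read, W: Write>(mut input: R, output: &mut W) -> io::Result<u64> {
    io::copy(&mut input, output)
}
```

Both produce the same bytes; the difference is only in how many read and write operations it takes, which starts to matter once each write turns into a system call (1,000,000 lines means at least 1,000,000 writes for the line-by-line version).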

Next, we can make bat even slower by enabling all components that would be printed if we were printing to an interactive terminal:

> hyperfine --warmup 3 'bat --style=full --decorations=always --color=always test'
Benchmark #1: bat --style=full --decorations=always --color=always test

  Time (mean ± σ):      2.298 s ±  0.035 s    [User: 2.069 s, System: 0.226 s]
 
  Range (min … max):    2.257 s …  2.378 s

With everything enabled (decorations such as the line numbers and the grid, plus ANSI colors), bat is roughly another five times slower and takes about 2.3 seconds to print the whole file.
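To get a feel for how much extra output decorations add, here is a hypothetical formatter for one decorated line: a gray, right-aligned line number, a grid separator, and the text wrapped in a 24-bit color escape. The exact escape codes are an assumption; real syntax highlighting usually emits escape sequences per token rather than per line, so the overhead is typically larger.

```rust
/// Hypothetical decoration of one output line. Each `\x1b[...m` sequence is
/// an ANSI escape that the terminal (or pager) has to parse.
fn decorate(line_number: usize, line: &str) -> String {
    format!(
        "\x1b[38;5;246m{:>6}\x1b[0m │ \x1b[38;2;248;248;242m{}\x1b[0m",
        line_number, line
    )
}
```

For a line like `42` from the test file, this turns 2 bytes of text into 51 bytes of output, most of which are escape sequences.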

However, this still doesn't quite explain why it takes around 8-10 seconds (on my machine) for the pager to scroll to the end of the output when just using bat test.

To simulate this behavior in the benchmarks, we can use the --show-output option of hyperfine which will print the whole output to the terminal (instead of piping to /dev/null). Using this option, the benchmark will now include the rendering time of the terminal emulator (which might or might not be comparable to what less needs to do). Let's compare both cat and bat with this option enabled:

> hyperfine --show-output 'cat test' 'bat --paging=never test'

[...]

  Time (mean ± σ):      1.054 s ±  0.044 s    [User: 0.9 ms, System: 376.8 ms]
 
  Range (min … max):    0.984 s …  1.115 s

[...]

  Time (mean ± σ):      9.254 s ±  0.105 s    [User: 3.065 s, System: 1.594 s]
 
  Range (min … max):    9.107 s …  9.423 s

We can see that both cat and bat are significantly slowed down when having to actually print the output in a terminal.

So if my interpretation is correct, most of the time (around 75%) is actually caused by the terminal emulator or pager that needs to interpret the output of bat (which includes the ANSI color sequences, for example). I don't think there is anything that we can do about this, except to disable decorations and colors (--decorations=never --color=never).
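The 75% estimate can be reproduced from the two bat measurements above, assuming that the 2.298 s run (all decorations, output to /dev/null) approximates bat's own share of the 9.254 s bat --paging=never run in the terminal:

```latex
\frac{9.254\,\mathrm{s} - 2.298\,\mathrm{s}}{9.254\,\mathrm{s}}
  = \frac{6.956\,\mathrm{s}}{9.254\,\mathrm{s}} \approx 0.75
```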

That being said, we also saw that performance is not bat's strength 😄

I don't see this as a really big problem as I usually don't want to syntax-highlight files with 6 MB of contents, but it might still be fun to work on optimization here.

@sharkdp sharkdp added the labels help wanted and question on Sep 11, 2018
@mfrw
Author

mfrw commented Sep 12, 2018

Thank you very much @sharkdp for the explanation :)

@gsar

gsar commented Nov 16, 2018

@sharkdp would it be possible to have a mode where syntax highlighting is only attempted for the visible region of the text, not for the whole file?

Assuming that is doable, it would make bat a lot faster on large files, and that would let me view log files with bat and keep bat as an alias for less.

@sharkdp
Owner

sharkdp commented Nov 18, 2018

@gsar I don't think that this is possible.

  1. bat only pipes its output to less. There is no two-way communication between bat and the pager which would be needed to access the current location in the file.
  2. For proper syntax highlighting, we need to parse the file from the very beginning. There is no way you can consistently highlight just a part of a file (think of a block comment like <!-- .. --> in an XML file that would start somewhere before that part of the file).
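The block-comment problem can be demonstrated with a tiny, hypothetical scanner that only understands XML comments (a real highlighter tracks far more state). For each line it records whether that line starts inside a comment, given the state assumed before the first line.

```rust
/// Hypothetical scanner: for each line, record whether it *starts* inside a
/// `<!-- ... -->` comment, given the state `inside` before the first line.
fn comment_state(lines: &[&str], mut inside: bool) -> Vec<bool> {
    let mut states = Vec::with_capacity(lines.len());
    for &line in lines {
        states.push(inside);
        let mut rest = line;
        // Toggle the state at each opening/closing marker on this line.
        loop {
            let marker = if inside { "-->" } else { "<!--" };
            match rest.find(marker) {
                Some(i) => {
                    rest = &rest[i + marker.len()..];
                    inside = !inside;
                }
                None => break,
            }
        }
    }
    states
}
```

For a document whose comment opens on line 2, scanning from line 3 with the default state marks the comment body as ordinary markup, which is exactly the kind of mis-highlighting described above.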

@georgmu

georgmu commented Mar 13, 2019

Is there a reason that there is no output buffering?

The --help section states that "-u" for unbuffered is ignored since everything is always unbuffered.

To test the speed without involving the terminal, I use the following command:
time bat --color always --decorations always test | tail

Without buffering, this takes 4.4 seconds on my laptop.
With buffering (the patch below), this takes 2 seconds.

diff --git a/src/controller.rs b/src/controller.rs
index ac39abb..51daa27 100644
--- a/src/controller.rs
+++ b/src/controller.rs
@@ -1,4 +1,4 @@
-use std::io::{self, Write};
+use std::io::{self, Write, BufWriter,};
 use std::path::Path;
 
 use crate::app::{Config, PagingMode};
@@ -36,7 +36,7 @@ impl<'b> Controller<'b> {
         }
 
         let mut output_type = OutputType::from_mode(paging_mode, self.config.pager)?;
-        let writer = output_type.handle()?;
+        let mut writer = BufWriter::new(output_type.handle()?);
         let mut no_errors: bool = true;
 
         let stdin = io::stdin();
@@ -50,7 +50,7 @@ impl<'b> Controller<'b> {
                 Ok(mut reader) => {
                     let result = if self.config.loop_through {
                         let mut printer = SimplePrinter::new();
-                        self.print_file(reader, &mut printer, writer, *input_file)
+                        self.print_file(reader, &mut printer, &mut writer, *input_file)
                     } else {
                         let mut printer = InteractivePrinter::new(
                             &self.config,
@@ -58,7 +58,7 @@ impl<'b> Controller<'b> {
                             *input_file,
                             &mut reader,
                         );
-                        self.print_file(reader, &mut printer, writer, *input_file)
+                        self.print_file(reader, &mut printer, &mut writer, *input_file)
                     };
 
                     if let Err(error) = result {

Without the tail pipe, I can also test the pager speed by quickly pressing ">" and then "q" to scroll to the end and quit (since scrolling to the end takes so long here, the measurement should be more or less accurate).

The current version takes ~9 seconds.
The patched version takes ~3 seconds.

My proposal would be to implement the "-u" command-line option and to use buffered output unless it is specified.
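Why buffering helps so much can be illustrated by counting write calls. `CountingSink` and `print_lines` are hypothetical test helpers, and a write call on the sink only approximates a write syscall on a real file descriptor. Note that Rust's `std::io::Stdout` is line-buffered and a pager's `ChildStdin` pipe is not buffered at all, so line-by-line printing ends up issuing roughly one system call per line either way; an explicit `BufWriter` batches them into a handful.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical stand-in for a file descriptor: counts how many `write`
/// calls reach it and keeps the bytes for comparison.
#[derive(Default)]
struct CountingSink {
    writes: usize,
    bytes: Vec<u8>,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.writes += 1;
        self.bytes.extend_from_slice(buf);
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Prints the numbers 1..=n, one per line, like `seq 1 n`.
fn print_lines<W: Write>(out: &mut W, n: usize) -> io::Result<()> {
    for i in 1..=n {
        writeln!(out, "{}", i)?;
    }
    out.flush()
}
```

Writing 10,000 lines straight into the sink takes at least 10,000 write calls, while the same output through a `BufWriter` reaches the sink in only a few large writes.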

@sharkdp
Owner

sharkdp commented May 31, 2019

@georgmu That sounds interesting, thank you!

Would the buffering cause any observable effects? If not, would you be interested in opening a PR?

@georgmu

georgmu commented Jun 3, 2019

I will prepare a PR, but I am currently fighting the borrow checker to make writer either a &mut Write or a BufWriter.
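One way around the borrow-checker trouble, sketched under the assumption that the caller already holds a `&mut dyn Write`: return a boxed trait object so both branches share one type. `make_writer` and its `unbuffered` flag (standing in for `-u`) are hypothetical. Dropping a `BufWriter` flushes it but silently discards any error, so the caller should flush explicitly.

```rust
use std::io::{BufWriter, Write};

/// Hypothetical helper: pass writes straight through when `unbuffered` is set,
/// otherwise collect them in a BufWriter. Boxing gives both branches one type.
fn make_writer<'a>(inner: &'a mut dyn Write, unbuffered: bool) -> Box<dyn Write + 'a> {
    if unbuffered {
        // `&mut dyn Write` itself implements `Write`.
        Box::new(inner)
    } else {
        Box::new(BufWriter::new(inner))
    }
}
```

Without `-u`, output is collected in `BufWriter`'s default buffer (currently 8 KiB) and reaches the underlying writer only when that buffer fills up or `flush` is called.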

@sharkdp
Owner

sharkdp commented Jun 4, 2019

> I will prepare a PR, but I am currently fighting the borrow checker to have writer either &mut Write or BufWriter

Great, thank you. Let us know if you need help (perhaps just open a PR with the failing version).

@georgmu

georgmu commented Jun 17, 2019

Sorry for the delay. I have managed to build it and then compared bat to cat.

cat has a special implementation that reads in bigger blocks, but uses some special handling so that it only buffers output while more input is available.

As an example: If I just open cat and start to type and press return, the line is printed (as expected). With my patch, this is not the case, so the behavior differs.

To make it better, the output should always be buffered, but only be flushed if there is no input pending.

I will create a PR so you can have a look at the changes.
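A portable approximation of "buffer, but flush before waiting for more input" is sketched below; it is a swapped-in technique, not cat's exact logic, and it models only the I/O policy, not bat's per-line highlighting. Checking for pending input directly is platform-specific (on Unix, for instance, via the FIONREAD ioctl), so this version simply flushes after every read: a read on a terminal or pipe returns as soon as some data is available, so typed lines and tail -f output still appear promptly, while a large file is copied in big chunks. The function name and the 64 KiB buffer size are assumptions.

```rust
use std::io::{self, Read, Write};

/// Hypothetical cat-like copy loop: read whatever is currently available into
/// a large buffer, write it, and flush before blocking on the next read.
fn copy_flushing_per_read<R: Read, W: Write>(input: &mut R, output: &mut W) -> io::Result<u64> {
    let mut buf = [0u8; 64 * 1024];
    let mut total = 0u64;
    loop {
        let n = match input.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        output.write_all(&buf[..n])?;
        // Interactive input arrives in small reads, so it is flushed right
        // away; a regular file is flushed only once per full buffer.
        output.flush()?;
        total += n as u64;
    }
    output.flush()?;
    Ok(total)
}
```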

@georgmu

georgmu commented Jun 17, 2019

Draft PR is #596.

@sharkdp
Owner

sharkdp commented Jun 17, 2019

> As an example: If I just open cat and start to type and press return, the line is printed (as expected). With my patch, this is not the case, so the behavior differs.
>
> To make it better, the output should always be buffered, but only be flushed if there is no input pending.

Yes, that's definitely a behavior that should be preserved. Otherwise, we cannot do things like in this tail -f example: https://github.com/sharkdp/bat#tail--f

(I guess you did that already, but: make sure that you turn off paging if you perform experiments with this: bat --paging=never. Otherwise, the pager might buffer part of the output.)

@antoyo

antoyo commented Jul 5, 2020

> @gsar I don't think that this is possible.
>
> 2. For proper syntax highlighting, we need to parse the file from the very beginning. There is no way you can consistently highlight just a part of a file (think of a block comment like <!-- .. --> in an XML file that would start somewhere before that part of the file).

You could do what vim does: it only looks back a certain number of lines (maybe 500-1000, I don't know) before the view.

But that might require building a pager into bat.
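This suggestion can be sketched as a "sync window": to highlight from the top of the visible region, re-scan only a fixed number of preceding lines, starting from the default state, instead of parsing from the top of the file. vim exposes a similar trade-off through its `:syntax sync` settings (e.g. `minlines`). The helper names and the `lookback` parameter are hypothetical, and only XML comments are tracked. The window trades correctness for speed: a comment that opened more than `lookback` lines earlier is missed.

```rust
/// Hypothetical: the comment state after scanning one line, starting from
/// `inside`. Only XML comments (`<!-- ... -->`) are tracked.
fn ends_inside_comment(line: &str, mut inside: bool) -> bool {
    let mut rest = line;
    loop {
        let marker = if inside { "-->" } else { "<!--" };
        match rest.find(marker) {
            Some(i) => {
                rest = &rest[i + marker.len()..];
                inside = !inside;
            }
            None => return inside,
        }
    }
}

/// Hypothetical sync window: estimate the state at `view_start` by scanning
/// at most `lookback` preceding lines, assuming the default (non-comment)
/// state at the start of the window instead of parsing from line 0.
fn state_at_view(lines: &[&str], view_start: usize, lookback: usize) -> bool {
    let scan_from = view_start.saturating_sub(lookback);
    lines[scan_from..view_start]
        .iter()
        .fold(false, |inside, &line| ends_inside_comment(line, inside))
}
```

With a large enough window the state at the view is correct; with a window that does not reach back to the opening `<!--`, the comment body is treated as markup.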

@sharkdp
Owner

sharkdp commented Sep 6, 2022

See recent benchmark results in #2244 (comment)

@Emilv2

Emilv2 commented Oct 3, 2022

It seems like there is still room for some easy improvement in speed on very large files. bat --color=never --decorations=never --style=plain is fairly fast, but bat --color=never --decorations=always --style=plain --highlight-line=x is still very slow on files with a known file extension. Removing the file extension makes bat much faster while the output stays the same. It seems bat still does the syntax parsing even with --color=never, although with that option nothing is done with the result.
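A minimal sketch of the suggested shortcut, assuming colors are the only consumer of the parse results (with --color=never nothing is colored, so skipping the parser should not change the output). `Highlighter` and `render` are hypothetical stand-ins, not bat's types; the counter only makes the skipped work observable.

```rust
/// Hypothetical stand-in for a syntax highlighter; `parse_calls` counts how
/// often the (expensive) parser would run.
struct Highlighter {
    parse_calls: usize,
}

impl Highlighter {
    fn highlight(&mut self, line: &str) -> String {
        self.parse_calls += 1;
        // Placeholder for real highlighting: wrap the line in a color escape.
        format!("\x1b[32m{}\x1b[0m", line)
    }
}

/// Render lines, invoking the highlighter only when colored output is requested.
fn render(lines: &[&str], colored: bool, hl: &mut Highlighter) -> Vec<String> {
    lines
        .iter()
        .map(|&line| if colored { hl.highlight(line) } else { line.to_string() })
        .collect()
}
```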


6 participants