Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added text wrapping. (Fixes #54) #102

Merged
merged 17 commits into from
May 16, 2018
Merged

Added text wrapping. (Fixes #54) #102

merged 17 commits into from
May 16, 2018

Conversation

eth-p
Copy link
Collaborator

@eth-p eth-p commented May 12, 2018

I use bat pretty frequently, so I wanted to give back and make it even better by adding text wrapping (issue #54).

I've never touched Rust before this, and I apologize if the code style is weird or if any of my changes have less-than-ideal performance.

Changes:

  • Text wrapping.
    Lines will now automatically wrap at a character boundary.
    Syntax coloring will also wrap.
    image

  • Resizing gutter (panel).
    The gutter (area that shows line numbers and changes) will now automatically resize.
    This cuts back on the amount of wasted space when not showing everything.
    image
    It will also disappear if not needed.
    image

@eth-p eth-p changed the title Added line wrapping. (Fixes #54) Added text wrapping. (Fixes #54) May 12, 2018
@sharkdp
Copy link
Owner

sharkdp commented May 12, 2018

This looks super cool - thank you very much for your contribution!

I will only leave a few quick comments for now and add a proper review later when all the checks pass (and the conflicts are resolved).

src/printer.rs Outdated
}
};

// Generate the panel (gutter) width.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be computed first and then simply passed to the Printer constructor? This way, we don't need the (mutable) instance variable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that it would be a fair tradeoff, making the code easier to maintain by only having a single function (gen_decorations) for determining which components to display and their sizes.

I could change it with a couple of ifs and consts, if you prefer to avoid the mutable instance variable though.

src/printer.rs Outdated
let decorations = instance.gen_decorations(0);
instance.panel_width = decorations.len()
+ decorations.iter().fold(0, |a, x| a + x.size)
+ if config.output_components.grid() {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 is added in both cases here? This can't be right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used to count the grid line as part of the gutter's width, but ended up changing it later on to simplify calculations. My bad, it looks like I forgot to remove this leftover code.

src/printer.rs Outdated

// Finished.
write!(self.handle, "\n")?;
return Ok(());
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok(()) is equivalent to return Ok(()); here.

src/printer.rs Outdated
fn print_git_marker(&self, line_number: usize) -> Option<String> {


#[doc = "
Copy link
Owner

@sharkdp sharkdp May 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can simply use triple slashes for doc strings:

/// Generates all the line decorations
fn gen_decorations(...)

In general, doc strings are fine but I prefer to use them only if things are not clear from the function name. Things like this Arguments list here are hardly useful in my opinion and have to be maintained.

src/terminal.rs Outdated
@@ -26,26 +26,46 @@ fn rgb2ansi(r: u8, g: u8, b: u8) -> u8 {
}
}

//pub fn as_terminal_escaped(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happened here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I changed the code, I kept comments of the original functions as a reference for how to do certain things. It looks like I forgot to remove that one, sorry.

@sharkdp
Copy link
Owner

sharkdp commented May 12, 2018

Also, congratulations on your first Rust code!

(could you please run cargo fmt to auto-format your changes? see https://github.com/rust-lang-nursery/rustfmt)

eth-p added 4 commits May 12, 2018 06:32
- Reformatted code.
- Removed leftover code.
- Removed leftover comments.
- Fixed compiling on Rust 1.24.0
Now bat(1) can be used like cat(1) again!
Copy link
Collaborator Author

@eth-p eth-p left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cd26d40: Wow... Uh, you ok git? My local copy definitely didn't look like this, sorry.

src/printer.rs Outdated
} else {
for &(style, text) in regions.iter() {
let mut chars = text.chars().filter(|c| *c != '\n');
let mut remaining = text.chars().filter(|c| *c != '\n').count();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just clone the chars iterator

Copy link
Collaborator Author

@eth-p eth-p May 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did that in the initial PR, but it failed to build on Rust 1.24.0.

The chars.clone() method worked on the on the latest version, however, but I'm not quite sure how to make it work for 1.24.0. If you have any suggestions, I would happily replace it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh weird

src/terminal.rs Outdated
s
let mut s: String = String::new();
write!(s, "{}", style.paint(text)).unwrap();
return s;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can remove all of this and just return style.paint(text).to_string(). also, the return statement is not necessary. the last expression in a function will be returned

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Fixed.

───────┬────────────────────────────────────────────────────────────────────────
│ File: sample.rs
───────┼────────────────────────────────────────────────────────────────────────
│ struct Rectangle {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the vertical line is gone?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be part of the resizing gutter change. When there aren't any line decorations, the vertical grid border is removed to save space (since there's nothing to separate).

I could add it back if the old behaviour is preferred, however.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think that the vertical line is part of the grid and should be there if it's specified, but maybe @sharkdp can clarify

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this change. If there is nothing to display in the side-panel, I don't think there is any need for a vertical grid line.

@@ -1,22 +1,22 @@
1 │ struct Rectangle {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

src/printer.rs Outdated
}

// Grid border.
let border = if gutter_width > 0 && self.config.output_components.grid() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can go inside gen_border right?

src/printer.rs Outdated
}

// It wraps.
if self.config.output_wrap == OutputWrap::Character {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the else part, so we already know that this will evaluate to true every time. or am I missing something?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it evaluates to true every time. I left it in there in case different wrapping styles were added in the future, but on second thought, it's unnecessary right now.

src/printer.rs Outdated
Ok(())
}

fn print_line_number(&self, line_number: usize) -> Option<String> {
/// Generates all the line decorations.
fn gen_decorations(&self, line_number: usize) -> Vec<PrintSegment> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line_decorations?

src/printer.rs Outdated
/// Generates the decoration for displaying the line number.
fn gen_deco_line_number(&self, line_number: usize) -> PrintSegment {
let plain: String = format!("{:width$}", line_number, width = LINE_NUMBER_WIDTH);
let color = self.colors.line_number.paint(plain.to_owned());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plain is already a string, so to_owned is unnecessary

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried removing to_owned(), and ended up with this:

Error[E0382]: use of moved value: `plain`
   --> src/printer.rs:217:19
    |
213 |         let color = self.colors.line_number.paint(plain);
    |                                                   ----- value moved here
...
217 |             size: plain.len(),
    |                   ^^^^^ value used here after move
    |
    = note: move occurs because `plain` has type `std::string::String`, which does not implement the `Copy` trait

Replacing it with plain.clone() works, though. Would that be better to use?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think paint can take a &str, so you can pass &plain. also I didn't notice before, but the variable name color is probably not correct

src/printer.rs Outdated
}

/// Generates the decoration for displaying the git changes.
fn gen_deco_line_changes(&self, line_number: usize) -> PrintSegment {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can it just be line_changes? two abbreviations in a method name make it confusing. same for the number one. that way, you can remove the doc comments

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I'll rename it.

@@ -13,6 +13,12 @@ pub enum OutputComponent {
Plain,
}

#[derive(Debug, Eq, PartialEq, Copy, Clone, Hash)]
pub enum OutputWrap {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since there are only two variants, each with no data, maybe this can be represented with a bool instead

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used an enum in case anybody wanted to add different wrapping styles (e.g. word wrap) in the future, but I could change it to a bool if that's preferable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, I guess you can just go ahead and add that variant then

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I prefer the enum here, even if there are just two variants for now ("Boolean blindness"). We shouldn't add the third variant if it is not yet used.

src/printer.rs Outdated
border.text.to_owned()
)?;

continue;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this continue statement needed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears not. Removing it in the next commit.

src/printer.rs Outdated
write!(self.handle, "\n")?;
}

// Finished.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be safely removed 😃

@sharkdp
Copy link
Owner

sharkdp commented May 13, 2018

I just performed some simple benchmarks. As probably expected, the text-wrapping has some performance overhead (up to ~30%):

▶ hyperfine 'bat --style=full --color=always src/*.rs' \
            'target/release/bat-wrapping --style=full --color=always --wrap=character src/*.rs'
Benchmark #1: bat --style=full --color=always src/*.rs

  Time (mean ± σ):      81.6 ms ±   2.1 ms    [User: 76.4 ms, System: 4.9 ms]
 
  Range (min … max):    78.7 ms …  86.8 ms
 
Benchmark #2: target/release/bat-wrapping --style=full --color=always --wrap=character src/*.rs

  Time (mean ± σ):      97.3 ms ±   2.2 ms    [User: 92.7 ms, System: 4.4 ms]
 
  Range (min … max):    94.0 ms … 101.5 ms
 
Summary

'bat --style=full --color=always src/*.rs' ran
    1.19x faster than 'target/release/bat-wrapping --style=full --color=always --wrap=character src/*.rs'
▶ hyperfine 'bat --style=full --color=always large.rs' \
            'target/release/bat-wrapping --style=full --color=always --wrap=character large.rs'
Benchmark #1: bat --style=full --color=always large.rs

  Time (mean ± σ):     770.8 ms ±   6.0 ms    [User: 762.6 ms, System: 7.2 ms]
 
  Range (min … max):   762.0 ms … 779.1 ms
 
Benchmark #2: target/release/bat-wrapping --style=full --color=always --wrap=character large.rs

  Time (mean ± σ):      1.016 s ±  0.013 s    [User: 1.006 s, System: 0.008 s]
 
  Range (min … max):    1.001 s …  1.041 s
 
Summary

'bat --style=full --color=always large.rs' ran
    1.32x faster than 'target/release/bat-wrapping --style=full --color=always --wrap=character large.rs'

Unfortunately, the performance drops by the same amount even when calling with --wrap=never.

@eth-p
Copy link
Collaborator Author

eth-p commented May 13, 2018

That would probably be the consequence of the way I rewrote the line decorations. I'll see what I can do to reduce the performance impact.

Would you mind if I split the line decorations off into separate objects? There's a lot of caching that could be done if so.

@eth-p
Copy link
Collaborator Author

eth-p commented May 14, 2018

@sharkdp I split up the decorations into their own file and added caching to them. Here's the results:

$ ls -lha | grep larger
-rw-r--r--   1 eth-p  staff   271K 13 May 18:25 larger.rs

Character wrapping

$ bench 'target/release/bat --style full --paging never --wrap character larger.rs'
benchmarking target/release/bat --style full --paging never --wrap character larger.rs
time                 869.2 ms   (724.6 ms .. 1.021 s)
                     0.996 R²   (0.986 R² .. 1.000 R²)
mean                 837.7 ms   (811.1 ms .. 861.4 ms)
std dev              29.52 ms   (24.26 ms .. 32.64 ms)
variance introduced by outliers: 19% (moderately inflated)

No wrapping

$ bench 'target/release/bat --style full --paging never --wrap never larger.rs'
benchmarking target/release/bat --style full --paging never --wrap never larger.rs
time                 791.4 ms   (782.2 ms .. 810.2 ms)
                     1.000 R²   (NaN R² .. 1.000 R²)
mean                 796.2 ms   (791.7 ms .. 802.4 ms)
std dev              6.350 ms   (1.870 ms .. 8.513 ms)
variance introduced by outliers: 19% (moderately inflated)

Bat 0.3

$ bench 'bat --style full --paging never larger.rs'
benchmarking bat --style full --paging never larger.rs
time                 812.2 ms   (769.5 ms .. 836.7 ms)
                     1.000 R²   (NaN R² .. 1.000 R²)
mean                 818.7 ms   (813.1 ms .. 823.6 ms)
std dev              5.796 ms   (3.063 ms .. 8.088 ms)
variance introduced by outliers: 19% (moderately inflated)

@sharkdp
Copy link
Owner

sharkdp commented May 14, 2018

@eth-p Thank you for looking into the performance-side of this!

Is the caching really the important change here? It seems like it would only help for files with really long lines where the line number would not have to be displayed?

Sorry that the review process has been dragging on for some time now, but there are (still) a few things that I would like to verify/update before I merge this.

To be honest, while I really like the outcome (see screenshots) and the implemented features, I am not quite confident about the changes in the code. Part of the reason might also be that I wasn't really happy with that part of the code (generating the different output elements) to begin with 😄.

While it's hard to pinpoint any "bad" changes, here are a few things that we might want to discuss:

  • If I understand correctly, the DecorationText struct basically holds some (formatted) text and its display width (~ length without ANSI codes). While possibly good for performance reasons, I don't really like that we have to pass around this redundant information (and keep it in sync!). An alternative could be to compute the display width (https://docs.rs/console/0.6.1/console/fn.measure_text_width.html)
  • The split into the for_line and for_wrap methods in the Decoration trait doesn't feel very natural to me. It leads to quite some code duplication. Another option would be to have a single method ("to_string"?), that receives a continuation: bool or wrapped_line: bool parameter.
  • LineChangesDecoration is the only decoration that uses the printer argument. Maybe we should pass the LineChanges to the LineChangesDecoration object on construction and remove the printer argument from the for_line function.
  • Given the points above, we should maybe reconsider if a trait is the right abstraction here (the answer can be "yes" 😄).

Concerning the actual wrapping:

  • There is a small problem with windows-style line-breaks (\r\n) which can be reproduced by converting some file via unix2dos or by simply calling echo -e "windows\r\nnewline" | bat. With wrapping enabled, this outputs ^M characters which is probably due to the wrap-algorithm treating only the \n as a line break.
  • Instead of implementing character-level wrapping, it might be worth to think about region-based wrapping (regions being defined by syntect through the highlighting). This might actually be easier than the character-level wrapping, but it's probably not too helpful if regions can be larger than a single line.

@eth-p
Copy link
Collaborator Author

eth-p commented May 14, 2018

@sharkdp

Is the caching really the important change here? It seems like it would only help for files with really long lines where the line number would not have to be displayed?

The way I implemented the caching, it should increase the performance whether or not the lines are wrapped. Consider the non-cached versions of the git changes and grid border decorations:

    fn line_changes(&self, line_number: usize) -> PrintSegment {
        let color = if let Some(ref changes) = self.line_changes {
            match changes.get(&(line_number as u32)) {
                Some(&LineChange::Added) => self.colors.git_added.paint("+"),
                Some(&LineChange::RemovedAbove) => self.colors.git_removed.paint("‾"),
                Some(&LineChange::RemovedBelow) => self.colors.git_removed.paint("_"),
                Some(&LineChange::Modified) => self.colors.git_modified.paint("~"),
                _ => Style::default().paint(" "),
            }
        } else {
            Style::default().paint(" ")
        };

        return PrintSegment {
            text: color.to_string(),
            size: 1,
        };
    }

    fn line_border(&self) -> PrintSegment {
        return PrintSegment {
            text: self.colors.grid.paint("│ ").to_string(),
            size: 2,
        };
    }

In both of them, we were calling Style.paint on constant strings every line. By generating those strings once and caching the result and cloning it when necessary, we're cutting back on the number of calls to Style.paint. While I don't know exactly how expensive a call to Style.paint is, it's probably more than String.clone.

If I understand correctly, the DecorationText struct basically holds some (formatted) text and its display width (~ length without ANSI codes). While possibly good for performance reasons, I don't really like that we have to pass around this redundant information (and keep it in sync!). An alternative could be to compute the display width (https://docs.rs/console/0.6.1/console/fn.measure_text_width.html)

I could definitely change it to use measure_text_width, but it's your call.

The split into the for_line and for_wrap methods in the Decoration trait doesn't feel very natural to me. It leads to quite some code duplication. Another option would be to have a single method ("to_string"?), that receives a continuation: bool or wrapped_line: bool parameter.

I don't think to_string would fit, since Decoration is more similar to a factory than an object. I could rename it to generate and add continuation: bool to its parameter list, if that works?

LineChangesDecoration is the only decoration that uses the printer argument. Maybe we should pass the LineChanges to the LineChangesDecoration object on construction and remove the printer argument from the for_line function.

That was my original plan, but I couldn't figure out a sane way to do it since the Printer.line_changes field is modified before each file is printed. I would've preferred to create a PrintContext struct that contained file information like the LineChanges object, but I thought that might be over-engineering it a little bit and pushing my understanding of Rust lifetimes.

Given the points above, we should maybe reconsider if a trait is the right abstraction here (the answer can be "yes" 😄).

I suppose that would depend on whether or not the decorations should be considered as individual modules which are used by the printer, or as a part of the printer itself. I was thinking the former, but I'm likely biased having come from an OOP background where monolithic functions/classes are discouraged.

I'll leave it up to you to decide whether or not I should move the decorations back into Printer, but it might start to look messy if you want me to keep the caching.

There is a small problem with windows-style line-breaks.

Fixed!

Instead of implementing character-level wrapping, it might be worth to think about region-based wrapping (regions being defined by syntect throught the highlighting). This might actually be easier than the character-level wrapping, but it's probably not too helpful if regions can be larger than a single line.

That was actually my original idea! For the reason you just mentioned, it ended up not working out too well. I would've had to implement character wrapping on top of region wrapping, and it still would've resulted in funny-looking wrapping like this:

1 | This is a single region.                  |
  | [This is an even longer single region, an |
  | d it ended up wasting spacing and was sti |
  | ll broken up.                             |

@sharkdp
Copy link
Owner

sharkdp commented May 15, 2018

Thank you very much for the clarifications!

I could definitely change it to use measure_text_width, but it's your call.

Ok. Let's stay with your solution for now.

I don't think to_string would fit, since Decoration is more similar to a factory than an object. I could rename it to generate and add continuation: bool to its parameter list, if that works?

generate with a bool parameter sounds good to me 👍

That was my original plan, but I couldn't figure out a sane way to do it since the Printer.line_changes field is modified before each file is printed. I would've preferred to create a PrintContext struct that contained file information like the LineChanges object, but I thought that might be over-engineering it a little bit and pushing my understanding of Rust lifetimes.

Ok, we can look into this in the future.

I'll leave it up to you to decide whether or not I should move the decorations back into Printer, but it might start to look messy if you want me to keep the caching.

Ok, let's keep it for now.

There is a small problem with windows-style line-breaks.

Fixed!

Great, thank you.

That was actually my original idea! For the reason you just mentioned, it ended up not working out too well. I would've had to implement character wrapping on top of region wrapping, and it still would've resulted in funny-looking wrapping like this:

(..)

Yeah, right 😔

@eth-p
Copy link
Collaborator Author

eth-p commented May 15, 2018

Merged the for_line and for_wrap methods. I also fixed #117 by disabling the panel when there isn't enough space to show the actual file contents.

It should pass all the checks now.
@sharkdp sharkdp merged commit 2eee685 into sharkdp:master May 16, 2018
@sharkdp sharkdp mentioned this pull request May 16, 2018
@sharkdp
Copy link
Owner

sharkdp commented May 16, 2018

Thank you very much!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants