Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

colors: consistent color parsing and configuration #423

Closed
wants to merge 23 commits into from

Conversation

rwe
Copy link
Contributor

@rwe rwe commented Oct 22, 2021

This replaces both of the diff-so-fancy's hand-rolled color reconstruction mechanisms with cached invocations of git config --get-color, which is fast and uses git's own parsing to construct the appropriate ANSI sequences.

Those invocations are cached on the config key (if there is one) or the color spec.

This does not introduce noticeable performance impact: timing git -c pager.log="${dsf}" log -p on this repo, between this branch and 1.4.2, actually shows a slight speed-up.

The commits in this PR progress as follows:

  • Naïve work fixing the color subroutine; then realizing git_ansi_color is used alongside it, and attempting to fix that too.
  • (^ Those commits are left for reference, but were made redundant).
  • Realizing that they should be consolidated, and should be using git config --get-color anyway instead of just mimicking it; and then proceeding toward some refactors to allow those to be consolidated.

Along the way, there are a couple fixes to the default/fallback handling. The last couple commits use some of the newly-exposed features (e.g. italic/strike/dim) for some improved defaults.

Background

diff-so-fancy introduced three mutually incompatible syntaxes for parsing color specs:

  • git's parsing, which is used to drive the original colored git diff output, and by the contrib/diff-highlight pager bundled with git. That syntax is like white brightred bold ul reverse and supports the 8 standard ANSI names (black/red/green/yellow/blue/magenta/cyan/white) + bright… variants; ANSI colors like 124, and hex like #ffcccc. It also supports bold, dim, italic, ul, blink, reverse, and strike attributes and their negations (nobold etc).
  • colors() which interpreted syntax like white_bold_on_black; but used inverse instead of reverse, used underline instead of ul; did not support the dim or strike attributes; did not support attribute negation; and did not understand bright… names. It also arbitrarily maintained two additional non-standard color names (orange and purple).
  • git_ansi_colors() which attempted to more-closely mimic git's parsing, but only supported bold and reverse. (No dim, italic, underline, blink, or strike; no negations).

This was frustrating, because there are overlapping config keys between diff-so-fancy, diff-highlight, and git diff, and the lowest-common-denominator configuration was a little tricky and too limited.

While trying to fix a crash related to the reverse ↔︎ inverse mismatch—which in retrospect was a red herring, since color() doesn't appear to be invoked on user config…but whatever!—it became clear that this complexity was unnecessary and likely less performant than letting git itself—which is already a dependency—do the work.

rwe added 23 commits October 22, 2021 13:59
This matches the name that git uses (see: color.c:parse_attr).
Used by DiffHighlight when `normal` is blank or undefined.
This makes the initial values of DiffHighlight::OLD_HIGHLIGHT explicit
in order to simplify reasoning about them.

Since DiffHighlight::OLD_HIGHLIGHT etc. are initialized in
`DiffHighlight.pm` from the same values at load time, a missing config
would previously always fall back to the fallback values defined in that
module, which are undef/reverse as declared here.
Avoids doubt about depending on the mutations to
DiffHighlight::NEW_HIGHLIGHT etc. by init_diff_highlight_colors.

It also helps consolidate the usage of those colors with other
configured colors, so that as far as we're concerned there's a single
way to read those.
It's always reached by color('reset') or the cached $reset_color var;
nothing ever called get_config_color('reset'). It's useful to remove it
because it's different than all the other names in get_config_color(),
which map to git config keys.
Uses `git config --get-color … …` to load the ANSI sequence of a named
config like `color.diff.new` directly from git. This completely avoids
the need to attempt to reproduce that entire color-spec parsing logic.
The loaded sequences are cached by name.
git will parse and return the fallback color when given an empty config
name, e.g. `git config --get-color '' 'blue'`. This is documented in the
git config man pages.

This means *none* of that color spec parsing needs to be reproduced
by diff-so-fancy. This is much simpler, and also means that the color
specs can be consistently represented in the git config, avoiding
mismatches between the accepted colors and or modifiers.
This extends `git_ansi_color` to include the fallback value as the
second parameter, so that configured colors (e.g. `color.diff.new`) can
be interpreted and cached in a single command even falling back to the
default.

For direct interpretation of color specs ('reset', 'yellow bold
reverse') without an associated config key, this just uses the empty
string key like `git config` does and caches based on the spec string
itself.
They are not accepted by git and so cannot appear in normal color specs
like `color.diff.new`. The user can still refer to `214` directly or hex
colors.
git's default outputs for `color.diff.old`/`.new` are (non-bold) red and
green. If no such config exists, falling back to the same value makes
"manually recolored" lines consistent with pass-through lines.
The default DiffHighlight fallback colors were just "reverse" for both
added/removed text. Now, added words are reversed + bold, and removed
text is dim + italic + strike, in order to distinguish them.

These fallbacks only include attributes and don't refer to colors, and
so are safely compatible with existing user colors.

Unlike `color.diff.new` (which should match the colors we're receiving
from git), we _give_ these `color.diff-highlight.*` configs to
DiffHighlight. That means we can decide the fallbacks here arbitrarily
and don't need to match the hardcoded defaults already in
DiffHighlight.m.
This updates the recommended (N.B.: not _fallback_) highlight config to
better emphasize changes. Previously *all* non-contextual diff text was
bolded.

Text is now non-bold by default, but added words are bolded for
emphasis, and removed words are italic+strike.
@scottchiefbaker
Copy link
Contributor

Whoa... there is a LOT going on here. Lots of files and lines changed. My first reaction is that these are some pretty hefty changes that put a decent developmental burden on developers to update/maintain.

What exact problem is this solving?

We purposely kept the color parsing simple (no more than three octets), and it has generated little or no bug reports. All color parsing is done in Perl save the one call to git --config, so it should be speedy. See QuickSilver release. We purposely did not implement every ANSI feature that Git supports. If there is one you'd like to add I would be open that, but I believe we already support the configs of 99% of users.

Also, please see the third_party/cli_bench/cli_bench.pl script I wrote to test for speed changes between next and your branch. Perhaps that can illuminate and potential speed changes.

@scottchiefbaker
Copy link
Contributor

A quick comparison of next versus this PR.

next

:perl third_party/cli_bench/cli_bench.pl /tmp/diff-so-fancy-next < /tmp/raw.diff
...................................................

23 ms: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% (35.0%)
24 ms: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% (52.5%)
25 ms: %%%%%%%%%%%%%%%%%%%% (12.5%)

Ran '/tmp/diff-so-fancy-next' 50 times with average completion time of 23 ms

rwe

:perl third_party/cli_bench/cli_bench.pl /tmp/diff-so-fancy-rwe < /tmp/raw.diff
...................................................

39 ms: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% (20.0%)
40 ms: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% (40.0%)
41 ms: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% (40.0%)

Ran '/tmp/diff-so-fancy-rwe' 50 times with average completion time of 40 ms

23 ms vs 40 ms is a slowdown of about 74%. One of our primary goals has been to keep the startup (and ultimately runtime) of this script as low as possible.

@rwe
Copy link
Contributor Author

rwe commented Oct 25, 2021

@scottchiefbaker — Thanks for the input and additional measurements.

The timing isn't close to what I'd measured, so I think something else must be going on. Note that the benchmark you ran is only consuming the diff during the first run; after that stdin will be empty and no diff is run at all.

But…you're right: running the above with quoted args ('./diff-so-fancy < raw.diff') shows a slowdown between next 1a0ad54 at 425ms to this branch (currently cb73fa0) at 476ms, so I'll dig into what can be done about it.

Re maintenance: the intention was the opposite, actually. The change removes the need to maintain that parsing logic at all, while at the same time ensuring that the capabilities track to at least what git supports. The commit stream might be a bit too fine-grained, but if you look at the diff, there are broadly two simple changes:

  • The default values got moved into maps — just a simplification to maintain single-source-of truth to make caching easier.
  • The color parsing is accomplished with git config --get-color $config $color.

@scottchiefbaker
Copy link
Contributor

@rwe oh good catch. I forgot to put quotes around my commands.

next

:perl third_party/cli_bench/cli_bench.pl '/tmp/diff-so-fancy-next < /tmp/raw.diff'
...................................................

29 ms: %%%%%%%%%%%%%%%%%%%%%%%% (15.0%)
30 ms: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% (32.5%)
31 ms: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% (52.5%)

Ran '/tmp/diff-so-fancy-next < /tmp/raw.diff' 50 times with average completion time of 30 ms

rwe

:perl third_party/cli_bench/cli_bench.pl '/tmp/diff-so-fancy-rwe < /tmp/raw.diff'
...................................................

53 ms: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% (22.5%)
54 ms: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% (50.0%)
55 ms: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% (27.5%)

Ran '/tmp/diff-so-fancy-rwe < /tmp/raw.diff' 50 times with average completion time of 54 ms

So we're looking at 30 ms vs 54 ms... still about 80% slower. I'm not sure how keen I am to mimic every nuance of
what Git allows for color schemes at the expense of the technical debt for this code, and the slowness incurred.

Do you know what version of Git git config --get-color was added in? Is this a newer Git thing, or has it been around
forever?

@rwe
Copy link
Contributor Author

rwe commented Oct 25, 2021

Do you know what version of Git git config --get-color was added in? Is this a newer Git thing, or has it been around forever?

Forever—some digging shows it was introduced in 2007, git 1.5.4: git/git@9ce0352

The overhead can be removed entirely by bulk-fetching with --type=color --get-regexp, but would need git version to 2.18 (3.5 years ago): git/git@63e2a0f.

@carlfriedrich
Copy link

I came here after I stumbled upon the fact that diff-so-fancy neither supports hex colors nor git's default color attributes dim or ul. This feels quite unintuitive, since the colors are configured in gitconfig and basically use the same syntax as git's standard color configuration options, so I would expect this to work in the same way every other color-related git feature is configured.

@rwe Thanks for your work here! This makes the color configuration for diff-so-fancy exactly like I would it expect to work.

@scottchiefbaker

So we're looking at 30 ms vs 54 ms... still about 80% slower. I'm not sure how keen I am to mimic every nuance of what Git allows for color schemes at the expense of the technical debt for this code, and the slowness incurred.

I understand that speed is a main goal. 80% slower sounds huge when expressed in relative numbers. In absolute numbers, however, we're talking about a few milliseconds here. I would strongly prefer being able to configure colors just like in git over saving 24ms (which I won't notice, ever).

And concerning the technical debt: the patch in this PR actually removes technical debt. It does not try to mimic nuances of Git, it simply uses its features that are already there. Just look at the patchset, it's 415 lines deleted against 71 lines added.

So apart from the speed issue (which is a theoretical but not a real-world issue imo) I think there's no decent reason to not accept this change.

Is there a chance for this to get merged? Would really appreciate it!

carlfriedrich added a commit to carlfriedrich/dotfiles that referenced this pull request Jul 22, 2022
Switch to custom version from this PR:
so-fancy/diff-so-fancy#423

This makes it possible to use hex color codes. Configure these so that
the changes portions of a line get highlighted.

Also remove the disabling of the markEmptyLines settings so that changed
empty lines get visible.
carlfriedrich added a commit to carlfriedrich/dotfiles that referenced this pull request Aug 1, 2022
Switch to custom version from this PR:
so-fancy/diff-so-fancy#423

This makes it possible to use hex color codes. Configure these so that
the changes portions of a line get highlighted.

Also remove the disabling of the markEmptyLines settings so that changed
empty lines get visible.
carlfriedrich added a commit to carlfriedrich/dotfiles that referenced this pull request Aug 1, 2022
Switch to custom version from this PR:
so-fancy/diff-so-fancy#423

This makes it possible to use hex color codes. Configure these so that
the changes portions of a line get highlighted.

Also remove the disabling of the markEmptyLines settings so that changed
empty lines get visible.
carlfriedrich added a commit to carlfriedrich/dotfiles that referenced this pull request Aug 1, 2022
Switch to custom version from this PR:
so-fancy/diff-so-fancy#423

This makes it possible to use hex color codes. Configure these so that
the changes portions of a line get highlighted.

Also remove the disabling of the markEmptyLines settings so that changed
empty lines get visible.
@scottchiefbaker
Copy link
Contributor

We just landed several other commits that address ul, bright, and dim. We cover 95% of the use cases for Git color codes. At risk of making the code much more complex I think I'm OK with that percent.

If there is something specific people want to see added please open another issue specific to that. Hex colors (I didn't even know Git supported that)?

At this time, shelling out to git to get the colors affects start up time, and that's a non-starter for me right now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants