[RFC] man.vim: highlight bold and underlined text #7623
This is cool! We certainly aren't opposed to using Lua for all runtime code that Nvim project maintains (need to make a list of the exact files somewhere ...). And we definitely forked man.vim so forking it even more is fine :)
Works good on macOS! EDIT: I think as long as there aren't any cases where SGR is used to enable bold mode (as opposed to CSI 1m), leaving it out is fine.
Can this be merged?
I'd like to review it first.
runtime/syntax/man.vim
Outdated

augroup man_init_highlight_groups
  autocmd!
  autocmd ColorScheme * call s:init_highlight_groups()
Why is this needed? `highlight default` is a one-time thing.
I needed it in order to keep the highlighting across colorscheme changes. AFAICT `highlight default link`s are preserved after a `highlight clear`, but groups (other than the builtin ones) with defined default attributes get cleared. Am I missing something?
Does this happen on startup, or only if the user changes colorscheme at runtime? The latter is not important. And why wouldn't this also be needed for the other `hi default`'s, e.g. on line 19:
highlight default link manSubHeading Function
More generally, syntax needs to be re-initialized if a colorscheme or whatever nukes it. Every plugin cannot be expected to deal with that ad-hoc.
Rebased, added escape sequence handling (for bold, italics, and underlines), plus some test cases (just for highlighting).
runtime/autoload/man.vim
Outdated
@@ -388,4 +387,14 @@ function! man#init_pager() abort
  execute 'silent file man://'.fnameescape(ref)
endfunction

function! man#highlight_formatted_text() abort
  let l:modifiable = &modifiable
  set modifiable
This could be done in the Lua code via `vim.api.nvim_command()` or whatever. Then we can eliminate this "public" VimL function.
I found this really hard to read but that may just be because I have very little experience with Lua.
Would appreciate a review from someone else more familiar with Lua.
  chars[#chars + 1] = char
elseif escape then
  -- Use prev_char to store the escape sequence
  prev_char = prev_char .. char
Why do this? Why not just use `char` directly?
Because it's matching against the entire escape sequence, not just the last character.
runtime/lua/man.lua
Outdated
    add_attr_hl(match + 0) -- coerce to number
  end
  escape = false
elseif not prev_char:match("^%[[\020-\063]*$") then
Why is this needed?
Since it's iterating one character at a time, that's checking for a partial CSI sequence, and throwing away the sequence if it ends up being something else.
runtime/lua/man.lua
Outdated
if sgr then
  local match = ''
  while sgr and #sgr > 0 do
    match, sgr = sgr:match("^(%d*);?(.*)")
Why the `?` before `(.*)`?
It makes the semicolon optional, to match sequences like `\e[4;22m` as well as `\e[0m`.
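As a rough illustration of the parameter loop under review, the sketch below splits the parameter part of an SGR sequence into numeric codes using the same `^(%d*);?(.*)` pattern. The function name `parse_sgr_params`, the treatment of an empty parameter as 0, and the no-progress guard are assumptions added for this example; they are not taken from the PR.

```lua
-- Sketch only: `parse_sgr_params` is a hypothetical helper, not the PR's code.
-- Input is the text between "\e[" and the final "m", e.g. "4;22".
local function parse_sgr_params(sgr)
  local codes = {}
  while sgr and #sgr > 0 do
    -- Take leading digits, skip one optional semicolon, keep the rest.
    local param, rest = sgr:match("^(%d*);?(.*)")
    if rest == sgr then
      -- Nothing was consumed (unexpected byte); stop instead of looping forever.
      break
    end
    -- Assumption: an empty parameter counts as 0.
    codes[#codes + 1] = tonumber(param) or 0
    sgr = rest
  end
  return codes
end

print(table.concat(parse_sgr_params("4;22"), ","))  -- 4,22
print(table.concat(parse_sgr_params("0"), ","))     -- 0
```

Because the semicolon is optional, the last parameter (which has no trailing separator) is consumed by the same pattern as the earlier ones.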
runtime/lua/man.lua
Outdated
end

-- Break input into UTF8 characters
for char in line:gmatch("[^\128-\191][\128-\191]*") do
Why do we need that pattern? Why not just `"."` as the pattern?
`"."` will just match a single byte. But the input is encoded as UTF-8, so matching one byte at a time won't work if any characters fall outside of the ASCII range. More details here.
Side note: I've only ever seen UTF-8, but I wonder if other encodings would be possible based on terminal/locale settings? If so, would likely need some sort of Lua string library to deal with.
As long as all Ctrl chars are ASCII, which I strongly suspect in man format, segmenting UTF-8 codepoints (which are not unambiguously "chars" anyway) shouldn't be needed. The highlight API only cares about bytes anyway.
Depending on locale settings, there could be a lot of overstruck text outside of ASCII range. Handling those multibyte text characters correctly requires splitting into codepoints.
Is it well-defined somewhere that overstrike works on a codepoint at a time, and neither byte nor grapheme cluster (which is closer to user-perceived character)?
Based on my testing & digging in the `grotty` sources, it certainly looks that way.
Regardless of precedent, wouldn't it be more correct to handle overstrikes as applying to a grapheme and not a codepoint?
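To make the byte, codepoint, and grapheme distinction in this thread concrete, here is a small standalone sketch using the same pattern as the diff. The helper name `split_utf8` and the sample strings are made up for illustration, and the sketch assumes the input is valid UTF-8.

```lua
-- Sketch only: `split_utf8` is a hypothetical name; input is assumed to be valid UTF-8.
-- A lead byte is anything outside 0x80-0xBF; continuation bytes are 0x80-0xBF.
local function split_utf8(s)
  local chars = {}
  for char in s:gmatch("[^\128-\191][\128-\191]*") do
    chars[#chars + 1] = char
  end
  return chars
end

local word = "na\195\175ve"       -- "naïve": the ï is encoded as two bytes
print(#word)                      -- 6 (bytes)
print(#split_utf8(word))          -- 5 (codepoints)

local combined = "e\204\129"      -- "e" followed by U+0301 COMBINING ACUTE ACCENT
print(#split_utf8(combined))      -- 2 (codepoints, though it renders as one grapheme)
```

The second example shows why the grapheme question is not settled by this pattern: it yields codepoints, so a base letter and its combining mark still come out as two separate items.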
runtime/lua/man.lua
Outdated
  -- bullet (overstrike text '+^Ho')
  attr = BOLD
  char = [[·]]
elseif prev_char == [[·]] and char == 'o' then
why a double square bracket?
runtime/lua/man.lua
Outdated
elseif escape then
  -- Use prev_char to store the escape sequence
  prev_char = prev_char .. char
  local sgr = prev_char:match("^%[([\020-\063]*)m$")
Could you link http://invisible-island.net/xterm/ctlseqs/ctlseqs.html as a reference, to help readers understand the reasoning behind the structure of the regular expression?
Hmm, I forget why I chose that particular range, but `[\020-\063]` is wrong. For a valid CSI sequence, the bytes in between `[` and the final byte must be in the ASCII range 0x20-0x3f, so that should be `[\032-\063]`. I don't think it's mentioned in the xterm page, but this is part of the ECMA-48 standard, sect. 5.4. I'll add an explanatory comment.
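A minimal sketch of the corrected range in use, assuming the text accumulated after the ESC byte is checked one character at a time as in the diff. The names `CSI_PARTIAL`, `SGR_COMPLETE`, and `classify` are hypothetical. Lua's `\ddd` escapes are decimal, so `\032-\063` covers 0x20-0x3F.

```lua
-- Sketch only: these names are illustrative, not from the PR.
-- `seq` is everything seen so far after the ESC byte, e.g. "[4;22m".
local CSI_PARTIAL  = "^%[[\032-\063]*$"    -- "[" plus parameter/intermediate bytes so far
local SGR_COMPLETE = "^%[([\032-\063]*)m$" -- same, terminated by the final byte "m"

local function classify(seq)
  local params = seq:match(SGR_COMPLETE)
  if params then
    return "sgr", params
  end
  if seq:match(CSI_PARTIAL) then
    return "partial"   -- keep accumulating characters
  end
  return "other"       -- not an SGR sequence; the caller can discard it
end

print(classify("[4;"))     -- partial
print(classify("[4;22m"))  -- sgr     4;22
print(classify("[2J"))     -- other
```

With the original `\020-\063` range, control bytes from decimal 20 to 31 would also have been accepted as part of a sequence, which is what the correction rules out.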
runtime/lua/man.lua
Outdated
  on = false
elseif code == 1 then
  attr = BOLD
elseif code == 21 or code == 22 then
`21` is described as a double underline at http://invisible-island.net/xterm/ctlseqs/ctlseqs.html
Good point. I was looking at the Wikipedia page which indicates "Bold off or Double Underline". I never saw 21 used anywhere, so I'll drop it.
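For reference, the SGR codes discussed in these hunks could be kept in a lookup table rather than an `elseif` chain. This is only a sketch: the table name `SGR_ATTRS`, the function `apply_sgr`, and the string attribute names are assumptions for the example. Code 21 is deliberately left out, following the decision above.

```lua
-- Sketch only: `SGR_ATTRS` and `apply_sgr` are hypothetical names.
local SGR_ATTRS = {
  [1]  = { attr = "bold",      enable = true  },
  [3]  = { attr = "italic",    enable = true  },
  [4]  = { attr = "underline", enable = true  },
  [22] = { attr = "bold",      enable = false },
  [23] = { attr = "italic",    enable = false },
  [24] = { attr = "underline", enable = false },
}

-- Update the set of active attributes in `state` for one SGR code.
local function apply_sgr(state, code)
  if code == 0 then
    -- Reset: clear every active attribute.
    for name in pairs(state) do
      state[name] = nil
    end
    return state
  end
  local entry = SGR_ATTRS[code]
  if entry then
    state[entry.attr] = entry.enable or nil
  end
  -- Unknown codes (including 21) are ignored.
  return state
end

local state = {}
apply_sgr(state, 1)
apply_sgr(state, 4)
apply_sgr(state, 22)
print(state.bold, state.underline)  -- nil     true
```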
runtime/lua/man.lua
Outdated
  attr = ITALIC
elseif code == 23 then
  attr = ITALIC
  on = false
Instead of `on`, I think `end` is more clear, as in whether or not to end the attribute.
runtime/lua/man.lua
Outdated
local chars = {}
local prev_char = ''
local overstrike, escape = false, false
local hls = {} -- Store highlight groups as { attr, start, end }
This is really janky imo to not name the fields stored in that tuple but I'm not sure if there is a better way. I have very little lua experience.
Yeah, I'll name them
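One way the positional tuple could be given named fields is sketched below. The helper `add_hl` and the field names are hypothetical, not what the PR ended up using. Since `end` is a reserved word in Lua, it cannot be written as `hl.end`, so the sketch uses `final` for the end offset.

```lua
-- Sketch only: `add_hl` and the field names are illustrative.
local hls = {}

local function add_hl(attr, start_byte, final_byte)
  -- Named fields instead of { attr, start, end } addressed by index.
  hls[#hls + 1] = { attr = attr, start = start_byte, final = final_byte }
end

add_hl("bold", 0, 4)
add_hl("underline", 4, 9)

local last_hl = hls[#hls]
print(last_hl.attr, last_hl.start, last_hl.final)  -- underline  4  9
```

A check such as `last_hl[3] == byte` would then read `last_hl.final == byte`, which documents itself at the call site.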
runtime/lua/man.lua
Outdated
local last_hl = hls[#hls]
if char == prev_char then
  if char == '_' and attr == UNDERLINE and last_hl and last_hl[3] == byte then
    -- This underscore is in the middle of an underlined word
When would this occur? Could you provide some sample input to make the comment more clear?
And also, is there any precedent for this behaviour on when to underline vs bold in regards to an overstruck underscore? To me, it seems intuitive that it would always become bolded.
This doesn't occur too often, but one example would be the man page for `mawk`. View it with `less`, search for `opt_expr`, and you'll see an underlined word with an underscore. The behavior of checking the previous overstrike and defaulting to bold is essentially what `less` does too.
Very odd, I'd expect that to be a space, followed by a backspace and then an underline, not an underline, backspace and then underline. LGTM but do we need the `last_hl and last_hl[3] == byte` check?
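The ambiguity in this thread can be shown with a simplified decoder. This is a sketch under several assumptions: it handles single-byte ASCII only, it recognises just the `_\bX` (underline) and `X\bX` (bold) forms, and the name `decode_overstrikes` and the string attribute names are invented for the example. It is not the PR's implementation.

```lua
-- Sketch only: ASCII input, hypothetical names, simplified rules.
-- Returns a list of { char = ..., attr = "bold" | "underline" | nil }.
local function decode_overstrikes(line)
  local out = {}
  local i = 1
  while i <= #line do
    local c = line:sub(i, i)
    if line:sub(i + 1, i + 1) == "\b" and i + 2 <= #line then
      local over = line:sub(i + 2, i + 2)
      local attr
      if c == "_" and over == "_" then
        -- Ambiguous "_\b_": continue an underlined run, otherwise treat as bold.
        local prev = out[#out]
        attr = (prev and prev.attr == "underline") and "underline" or "bold"
      elseif c == "_" then
        attr = "underline"
      elseif c == over then
        attr = "bold"
      end
      out[#out + 1] = { char = over, attr = attr }
      i = i + 3
    else
      out[#out + 1] = { char = c }
      i = i + 1
    end
  end
  return out
end

-- "_\b_" on its own is read as a bold underscore...
print(decode_overstrikes("_\b_")[1].attr)       -- bold
-- ...but right after an underlined character it continues the underline.
print(decode_overstrikes("_\bt_\b_")[2].attr)   -- underline
```

The sketch looks only at the previous decoded character. The `last_hl[3] == byte` check in the diff additionally requires the previous highlight to end exactly where the underscore begins, which a list of adjacent characters gives for free.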
test/functional/plugin/man_spec.lua
Outdated
  {{ bold = true, foreground = Screen.colors.Blue }})
end

local function expect_without_highlights(string)
This is an anti-pattern. Just use `screen:snapshot_util()` and it will capture highlighting with not much extra work.
test/functional/plugin/man_spec.lua
Outdated
  screen:detach()
end)

local function expect(string)
screen:set_default_attr_ids
test/functional/plugin/man_spec.lua
Outdated
  screen:expect(string, nil, true)
end

local function insert_lines(...)
helpers.insert
I added this because the text contains a lot of control characters, and I found `insert_lines("i\bi")` more readable than `helpers.insert("i<C-v><C-h>i")`. Would you prefer the second form?
@keidax The second form is preferred, it's the prevailing convention in our tests and throughout Vim-land.
cb7531e to eb44519
Thanks for the feedback! I rebased and addressed all the comments.
end)

describe('In autoload/man.vim', function()
  describe('function man#highlight_formatted_text', function()
Function name needs update
LGTM thanks for reviews!
Made some comments on outdated issues.
This works well, and performance seems good. Thanks everyone!
This doesn't work for me, I'm guessing because of
Still investigating.
On second thought, it may just be because the
Confirmed, the following makes it work:
(My install prefix is
I was inspired by #5847, but wanted to keep the original syntax highlighting colors:
As much as I optimized the Vimscript version it felt a little too slow. Working in Lua speeds things up nicely. My (very informal) testing with LuaJIT shows a roughly 10% overall increase in the time to start `nvim` and render a man page.
It doesn't look like there's any other Lua code included in the runtime yet, but I hope this can still be considered.
I also didn't handle any ANSI SGR sequences, since `man` on my system just uses backspaces, but if there's interest I can try to include those too.