Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] man.vim: highlight bold and underlined text #7623

Merged
merged 5 commits into from Jan 9, 2018

Conversation

keidax
Copy link
Contributor

@keidax keidax commented Nov 23, 2017

I was inspired by #5847, but wanted to keep the original syntax highlighting colors:
screenshot 2017-11-22 at 8 39 29 pm
As much as I optimized the Vimscript version it felt a little too slow. Working in Lua speeds things up nicely. My (very informal) testing with LuaJIT shows a roughly 10% overall increase in the time to start nvim and render a man page.

It doesn't look like there's any other Lua code included in the runtime yet, but I hope this can still be considered.

I also didn't handle any ANSI SGR sequences, since man on my system just uses backspaces, but if there's interest I can try to include those too.

@justinmk
Copy link
Member

This is cool! We certainly aren't opposed to using Lua for all runtime code that Nvim project maintains (need to make a list of the exact files somewhere ...). And we definitely forked man.vim so forking it even more is fine :)

@mhinz
Copy link
Member

mhinz commented Nov 23, 2017

Works good on macOS!

EDIT:

I think as long as there aren't any cases where SGR is used to enable bold mode (as opposed to CSI 1m), leaving it out is fine.

@mhinz
Copy link
Member

mhinz commented Nov 25, 2017

Can this be merged?

@nhooyr
Copy link
Contributor

nhooyr commented Nov 25, 2017

I'd like to review it first.


augroup man_init_highlight_groups
autocmd!
autocmd ColorScheme * call s:init_highlight_groups()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed? highlight default is a one-time thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed it in order to keep the highlighting across colorscheme changes. AFAICT highlight default links are preserved after a highlight clear, but groups (other than the builtin ones) with defined default attributes get cleared. Am I missing something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this happen on startup, or only if user changes colorscheme at runtime? The latter is not important. And why wouldn't this also be needed for the other hi default's, e.g. on line 19:

highlight default link manSubHeading     Function

More generally, syntax needs to be re-initialized if a colorscheme or whatever nukes it. Every plugin cannot be expected to deal with that ad-hoc.

@keidax
Copy link
Contributor Author

keidax commented Dec 6, 2017

Rebased, added escape sequence handling (for bold, italics, and underlines), plus some test cases (just for highlighting).

@@ -388,4 +387,14 @@ function! man#init_pager() abort
execute 'silent file man://'.fnameescape(ref)
endfunction

function! man#highlight_formatted_text() abort
let l:modifiable = &modifiable
set modifiable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be done in the Lua code via vim.api.nvim_command() or whatever. Then we can eliminate this "public" VimL function.

Copy link
Contributor

@nhooyr nhooyr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this really hard to read but that may just be because I have very little experience with Lua.

Would appreciate a review from someone else more familiar with Lua.

chars[#chars + 1] = char
elseif escape then
-- Use prev_char to store the escape sequence
prev_char = prev_char .. char
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do this? Why not just use char directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it's matching against the entire escape sequence, not just the last character.

add_attr_hl(match + 0) -- coerce to number
end
escape = false
elseif not prev_char:match("^%[[\020-\063]*$") then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's iterating one character at a time, that's checking for a partial CSI sequence, and throwing away the sequence if it ends up being something else.

if sgr then
local match = ''
while sgr and #sgr > 0 do
match, sgr = sgr:match("^(%d*);?(.*)")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the ? before (.*)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes the semicolon optional, to match sequences like \e[4;22m as well as \e[0m

end

-- Break input into UTF8 characters
for char in line:gmatch("[^\128-\191][\128-\191]*") do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need that pattern? Why not just "." as the pattern?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"." will just match a single byte. But the input is encoded as UTF-8, so matching one byte at a time won't work if any characters fall outside of the ASCII range. More details here.

Side note: I've only ever seen UTF-8, but I wonder if other encodings would be possible based on terminal/locale settings? If so, would likely need some sort of Lua string library to deal with.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as all Ctrl chars are ASCII, which I strongly suspect in man format, segmenting UTF-8 codepoints (which not unambiguously is "chars" anyway) shouldn't be needed. The highlight API only cares about bytes anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on locale settings, there could be a lot of overstruck text outside of ASCII range. Handling those multibyte text characters correctly requires splitting into codepoints.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it well-defined somewhere that overstrike works on a codepoint at a time, and neither byte nor grapheme cluster (which is closer to user-perceived character)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my testing & digging in the grotty sources, it certainly looks that way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless of precedence, wouldn't it be more correct to handle overstrikes as applying to a grapheme and not a codepoint?

-- bullet (overstrike text '+^Ho')
attr = BOLD
char = [[·]]
elseif prev_char == [[·]] and char == 'o' then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why a double square bracket?

elseif escape then
-- Use prev_char to store the escape sequence
prev_char = prev_char .. char
local sgr = prev_char:match("^%[([\020-\063]*)m$")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you link http://invisible-island.net/xterm/ctlseqs/ctlseqs.html as a reference to understand the reasoning behind the structure of regular expression.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I forget why I chose that particular range, but [\020-\063] is wrong. For a valid CSI sequence, the bytes in between [ and the final byte must be in the ASCII range 0x20-0x3f, so that should be [\032-\063]. I don't think it's mentioned in the xterm page, but this is part of the ECMA-48 standard, sect. 5.4. I'll add an explaining comment.

on = false
elseif code == 1 then
attr = BOLD
elseif code == 21 or code == 22 then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

21 is described as a double underline at http://invisible-island.net/xterm/ctlseqs/ctlseqs.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I was looking at the Wikipedia page which indicates "Bold off or Double Underline". I never saw 21 used anywhere, so I'll drop it.

attr = ITALIC
elseif code == 23 then
attr = ITALIC
on = false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of on, I think end is more clear, as in whether or not to end the attribute.

local chars = {}
local prev_char = ''
local overstrike, escape = false, false
local hls = {} -- Store highlight groups as { attr, start, end }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really janky imo to not name the fields stored in that tuple but I'm not sure if there is a better way. I have very little lua experience.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'll name them

local last_hl = hls[#hls]
if char == prev_char then
if char == '_' and attr == UNDERLINE and last_hl and last_hl[3] == byte then
-- This underscore is in the middle of an underlined word
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would this occur? Could you provide some sample input to make the comment more clear?

And also, is there any precedence for this behaviour on when to underline vs bold in regards to an overstriken underscore? To me, it seems intuitive that it would always become bolded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't occur too often, but one example would be the man page for mawk. View it with less, search for opt_expr, and you'll see an underlined word with an underscore. The behavior of checking the previous overstrike and defaulting to bold is essentially what less does too.

Copy link
Contributor

@nhooyr nhooyr Dec 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very odd, I'd expect that to be a space, followed by a backspace and then an underline, not an underline, backspace and then underline. LGTM but do we need the last_hl and last_hl[3] == byte check?

@justinmk justinmk added this to the 0.2.4 milestone Dec 24, 2017
{{ bold = true, foreground = Screen.colors.Blue }})
end

local function expect_without_highlights(string)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an anti-pattern. Just use screen:snapshot_util() and it will capture highlighting with not much extra work

screen:detach()
end)

local function expect(string)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

screen:set_default_attr_ids

screen:expect(string, nil, true)
end

local function insert_lines(...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

helpers.insert

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this because the text contains a lot of control characters, and I found

insert_lines("i\bi")

more readable than

helpers.insert("i<C-v><C-h>i")

Would you prefer the second form?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keidax The second form is preferred, it's the prevailing convention in our tests and throughout Vim-land.

@keidax
Copy link
Contributor Author

keidax commented Dec 28, 2017

Thanks for the feedback! I rebased and addressed all the comments.

end)

describe('In autoload/man.vim', function()
describe('function man#highlight_formatted_text', function()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function name needs update

@justinmk
Copy link
Member

LGTM thanks for reviews!

Copy link
Contributor

@nhooyr nhooyr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justinmk justinmk merged commit eb44519 into neovim:master Jan 9, 2018
@justinmk justinmk removed the RFC label Jan 9, 2018
@justinmk
Copy link
Member

justinmk commented Jan 9, 2018

This works well, and performance seems good. Thanks everyone!

@aktau
Copy link
Contributor

aktau commented Jan 9, 2018

This doesn't work for me, I'm guessing because of LUA_PATH issues I was afraid of a while ago:

E5105: Error while calling lua chunk: [string "<VimL compiled string>"]:1: module 'man' not found

Still investigating.

@justinmk justinmk removed this from the 0.2.4 milestone Jan 9, 2018
@justinmk justinmk added this to the 0.2.3 milestone Jan 9, 2018
@aktau
Copy link
Contributor

aktau commented Jan 9, 2018

On second thought, it may just be because the man.lua file is not installed:

11:20:01 ~/neovim » fname lua
./share/nvim/runtime/syntax/lua.vim
./share/nvim/runtime/doc/if_lua.txt
./share/nvim/runtime/indent/lua.vim
./share/nvim/runtime/ftplugin/lua.vim

@aktau
Copy link
Contributor

aktau commented Jan 9, 2018

Confirmed, the following makes it work:

11:25:38 neovim/runtime git:(master) » cp -r lua $HOME/neovim/share/nvim/runtime

(My install prefix is -DCMAKE_INSTALL_PREFIX:PATH=$(HOME)/neovim).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants