structured/"rich" text, text annotations/overlay #1767

Closed
phmarek opened this issue Jan 3, 2015 · 51 comments
Labels
enhancement feature request extmarks extmarks, decorations, virtual text, namespaces syntax regex syntax or non-regex parsing, lpeg, grammars treesitter

Comments

@phmarek

phmarek commented Jan 3, 2015

With the new plugin structure and things like the MessagePack API, it becomes much easier to have external plugins; one thing that is still awkward is the coloring of information.

I'm thinking about a plugin here (slimv, to be exact, although many others would have the same issue) that wants to have its own buffer to display arbitrary information. To get some highlighting (of "active" fields and other parts), there have to be syntax rules that operate via string matching - and that is

  • cumbersome (needs separators that don't exist in the text)
  • not nice re cursor movement (think concealcursor, conceallevel)
  • slow (for big buffers)
  • unnecessary.

How about being able to specify "classes" for (parts of) text, so that the syntax coloring rules can be applied directly without needing the RE engine in between? I imagine sending not a plain string for a line, but something like ["a string ", { class: "Error", text: "some error string"}, "more plain text"].

If that could be stored in Neovim directly it should make lots of things easier - especially for the plugins if they could store (some) arbitrary information in the dictionary as well! (Currently such things have to be put into the line, concealed in some way, and then matched and parsed out again, which is awful.)

To give a specific example - currently a line looks like this:

{[10] "types" []} = {[11] #<HASH-TABLE :TEST EQUALP :COUNT 3 {1009D95AD3}> []} {<3> [remove entry] <>}

while the visual display is

"types"  = #<HASH-TABLE :TEST EQUALP :COUNT 3 {1009D95AD3}>  [remove entry]

(and includes colors, of course).
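The structured-line idea above can be sketched as follows. This is only an illustration in Python, not an existing Neovim API: the `flatten` function and the span tuple layout are assumptions made for the example, and columns are counted in characters rather than bytes.

```python
def flatten(segments):
    """Split a structured line into display text and annotated spans.

    Each segment is either a plain string or a dict with a "text" key,
    an optional "class" key, and any extra plugin data.
    Returns (display_text, spans); each span is
    (start_col, end_col_exclusive, class_name, extra_data).
    """
    parts = []
    spans = []
    col = 0
    for seg in segments:
        if isinstance(seg, str):
            text = seg
        else:
            text = seg["text"]
            # Everything except "class" and "text" is arbitrary plugin data.
            extra = {k: v for k, v in seg.items() if k not in ("class", "text")}
            spans.append((col, col + len(text), seg.get("class"), extra))
        parts.append(text)
        col += len(text)
    return "".join(parts), spans


line = ["a string ", {"class": "Error", "text": "some error string", "id": 10}, " more plain text"]
text, spans = flatten(line)
print(text)   # a string some error string more plain text
print(spans)  # [(9, 26, 'Error', {'id': 10})]
```

The plugin data (`id` here) travels with the span instead of being concealed inside the line text and parsed out again.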

@justinmk
Member

justinmk commented Jan 3, 2015

> How about being able to specify "classes" for (parts of) text, so that the syntax coloring rules can be applied directly without needing the RE engine

#719 (comment) mentions scintillua, which looks like a very good alternative to regex for describing syntax.

It doesn't solve your more general request, which I interpret as "text properties". Text properties are very important but will require careful design, and won't be possible until we achieve more fundamental changes such as abstracting the buffer structure. Currently in the n/vim core, a buffer is basically a character array (memline.c); trying to bolt-on text properties will result in terrible performance.

@fmoralesc
Contributor

vis's basic data structure (a piece chain) seems good for this kind of thing, but moving to that could require a major rewrite of the regexp engine.

@aktau
Contributor

aktau commented Jan 3, 2015

> vis's basic data structure (a piece chain) seems good for this kind of thing, but moving to that could require a major rewrite of the regexp engine.

I saw vis appear on HN too, and it intrigues me a lot. Though at some points in the description I thought: we can't omit that (feature) as he has done, (n)vim needs it.

Being able to mmap files into the memory space is really really cool, but already breaks down when input conversion needs to be done, as said by @martanne (plus the fact that vim currently scans over the entire file to determine the encoding, which is cool since it has to read it into allocated memory anyway).

@justinmk
Member

justinmk commented Jan 3, 2015

@aktau If a buffer is opened with :e ++enc=utf8 we could avoid conversion and scanning. And we could provide a user option that says "always assume utf8".

@aktau
Contributor

aktau commented Jan 3, 2015

> @aktau If a buffer is opened with :e ++enc=utf8 we could avoid conversion and scanning. And we could provide a user option that says "always assume utf8".

Yes, if the fenc (forced or not) is the same as enc, then a read-only mmap could be done. Otherwise not so much.

Most likely this is the majority case. But for example this would break down when opening binary files which are likely to fail the utf-8 test, which is usually when people really need large file support.

I'm also not sure of the impact of such a split-up on the manageability of the code. As I understand it, the current mem{line,file} combo is actually quite clean.

@phmarek
Author

phmarek commented Jan 3, 2015

@aktau as soon as there's a NUL byte in the first few kBytes, the encoding should be seen as "binary", so the UTF8 test shouldn't matter here.
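A minimal sketch of the NUL-byte heuristic described here, in Python. This is not how Vim detects encodings; the function name and the 8 KiB probe size are assumptions chosen for illustration. Note that UTF-16 text also contains NUL bytes, so a check like this would classify it as binary.

```python
def looks_binary(data: bytes, probe_size: int = 8192) -> bool:
    """Guess "binary" if a NUL byte appears in the first probe_size bytes."""
    return b"\x00" in data[:probe_size]


print(looks_binary(b"plain text\n"))        # False
print(looks_binary(b"\x7fELF\x02\x01\x00"))  # True
```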

@phmarek
Author

phmarek commented Jan 3, 2015

Well, it's equally possible to keep that memory layout, and "just" have an additional map (array or hash or tree or whatever) that can deliver additional details for parts of a line... don't know whether that's the same kind of rewrite, though.

@aktau
Contributor

aktau commented Jan 3, 2015

> @aktau as soon as there's a NUL byte in the first few kBytes, the encoding should be seen as "binary", so the UTF8 test shouldn't matter here.

As far as I can remember (got an unfinished blog post about this), if enc is utf-8 (which it most likely is), and fenc is something else, conversion will take place. I'm not even sure if binary is a vim encoding, I don't think so.

@aktau
Contributor

aktau commented Jan 3, 2015

> Well, it's equally possible to keep that memory layout, and "just" have an additional map (array or hash or tree or whatever) that can deliver additional details for parts of a line... don't know whether that's the same kind of rewrite, though.

Like some sort of conversion overlay, you mean. This would work well for latin-X to utf-8 or the reverse, but more distinct encodings would probably suffer.

@aktau
Contributor

aktau commented Jan 3, 2015

All that said, I think perhaps an on-the-fly conversion overlay could work, one that is only generated when a certain piece of the buffer is actually requested. I shudder at the thought of implementing this without any crazy bugs though. The thought-stuff sure is enticing.

@justinmk
Member

justinmk commented Jan 3, 2015

On the other hand, large files support seems like yet another concept that could be added on as a plugin. In the case of binary files, syntax/highlighting is obviously not needed, so a "view" of a file could be fed to nvim and the usual motions and non-syntax plugins could work on the partial buffer.

In the case of a large log file, for which syntax/highlighting would be needed, nothing is lost because vim already has maxlines and synmaxcol values which limit the lines evaluated by the syntax engine.

Some problems I can think of with this "view" approach:

  • in-file search (/) and :vimgrep won't work. We would need to fall back to an external search tool (which likely wouldn't support vim-style regex).
  • we would need to modify nvim core to understand the concept of "deferred content". E.g., we only send the current view, but let nvim know the actual line/column count (and other parameters I haven't thought of)
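The "deferred content" idea in the second point could look roughly like this on the plugin side. It is a sketch in Python under stated assumptions: the `FileView` class and its methods are hypothetical, and nothing here corresponds to an existing nvim interface. It streams the file once to index line offsets, then serves only the requested window while still reporting the real line count.

```python
class FileView:
    """Serve windows of lines from a file without holding it all in memory."""

    def __init__(self, path):
        self.path = path
        self.offsets = []  # byte offset at which each line starts
        pos = 0
        with open(path, "rb") as f:
            for raw in f:  # streams the file once to build the index
                self.offsets.append(pos)
                pos += len(raw)

    @property
    def line_count(self):
        """Actual number of lines, which the editor would need to know."""
        return len(self.offsets)

    def window(self, first, count):
        """Return up to `count` lines (as bytes) from 0-based line `first`."""
        if first >= self.line_count:
            return []
        last = min(first + count, self.line_count)
        with open(self.path, "rb") as f:
            f.seek(self.offsets[first])
            return [f.readline().rstrip(b"\n") for _ in range(last - first)]
```

In-file search would still have to run outside the editor, as the first point notes, because only the current window is ever handed over.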

Personally I really prefer trying to leverage robust external solutions and only enhancing the core by adding hooks.

@justinmk
Member

justinmk commented Jan 3, 2015

Another reason I like the plugin approach is that in the common case, loading the file in memory is not really a problem and avoids complication. When people load large files they are unhappy about one of two things:

  • too slow
  • not enough features

If you load a giant C# (10 MB) file in Visual Studio, it will buckle (I know this for a fact). Add ReSharper and you might as well get some coffee. So you must either choose fast or good, and that means it is reasonable to disable some features (vim regex, whole-file analysis) on very large files.

I am interested to hear other cases that I am missing which would break with the "partial view" approach.

@aktau
Contributor

aktau commented Jan 3, 2015

> If you load a giant C# (10 MB) file in Visual Studio, it will buckle (I know this for a fact). Add ReSharper and you might as well get some coffee. So you must either choose fast or good, and that means it is reasonable to disable some features (vim regex, whole-file analysis) on very large files.

I wasn't actually thinking about 10MB files as large. It sounds like peanuts. I was thinking more of files in the 1-50GB range, which would have trouble fitting in main memory. Off the top of my head, I don't know how well (n)vim does with a 10MB source file (syntax highlighted and all), but I would consider it a failure if we don't solve that (in case it has issues).

@justinmk
Member

justinmk commented Jan 3, 2015

n/vim (with syntax highlighting and neocomplete) has no problem at all on the same 10 MB C# file (obviously VS/ReSharper are doing a lot more work on that file, so I don't mean to compare the two). I only raised that example to point out that one cannot expect all features in all scenarios.

Migrating the buffer data structure of n/vim is pretty close to a total rewrite. I find it much more interesting to see how far we can get with alternative solutions.

@phmarek
Author

phmarek commented Jan 4, 2015

Hmm, to get back to my original request ... how about providing some kind of rich text buffer with some restrictions?

  • readonly, i.e. only modifiable via replacing whole lines
  • highlighting only valid within a line, so it needs to be repeated for each line in a paragraph
  • cannot be saved or loaded

@justinmk
Member

justinmk commented Jan 4, 2015

Is text properties not a correct interpretation of your original request? I don't understand what is new in the rich text buffer you describe.

@phmarek
Author

phmarek commented Jan 4, 2015

Yeah, text properties might be a good name for it, too.

I'd need not only the highlighting class name, though - storing arbitrary data as well would be nice.

@justinmk
Member

justinmk commented Jan 4, 2015

Storing arbitrary data in association with a piece of text, and that association follows the text as edits are made. I believe the existing marks logic could be extended to do this, though it may not be scalable.
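To make "the association follows the text as edits are made" concrete, here is a small single-line model in Python. It is a sketch only: `Mark` and `AnnotatedLine` are hypothetical names, not the n/vim marks implementation. Every insert scans all marks, which is the kind of linear cost that raises the scalability concern.

```python
from dataclasses import dataclass, field


@dataclass
class Mark:
    start: int  # first column covered
    end: int    # one past the last column covered
    data: dict = field(default_factory=dict)


class AnnotatedLine:
    def __init__(self, text):
        self.text = text
        self.marks = []

    def annotate(self, start, end, **data):
        """Attach arbitrary data to the columns [start, end)."""
        mark = Mark(start, end, data)
        self.marks.append(mark)
        return mark

    def insert(self, col, new_text):
        """Insert text and shift every mark at or after the insert point."""
        self.text = self.text[:col] + new_text + self.text[col:]
        n = len(new_text)
        for m in self.marks:  # O(number of marks) per edit
            if m.start >= col:
                m.start += n
            if m.end > col:
                m.end += n

    def text_of(self, mark):
        return self.text[mark.start:mark.end]


line = AnnotatedLine('"types" = value')
key = line.annotate(0, 7, id=10)
line.insert(0, ">> ")
print(line.text_of(key), key.data)  # "types" {'id': 10}
```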

@phmarek
Author

phmarek commented Jan 4, 2015

> Storing arbitrary data in association with a piece of text, and that association follows the text as edits are made.

Sounds right, although for my use case no (user-)edits are needed. I'd just replace whole lines via RPC.

As for an easy example, think about netrw directory listings with coloring, like ls does. Perhaps with optional other highlighting, e.g. files newer than an hour, files bigger than X, or something like that, to get more colorized items in a line.

@tarruda
Member

tarruda commented Jan 4, 2015

How about allowing arbitrary key/value pairs in the :highlight command (e.g. :highlight SomeGroup rgba=#e1e1e1cc) and simplifying the association of highlight groups with arbitrary ranges? These arbitrary key/value pairs are consumed only by UIs that are interested.

The advantage is that we reuse the existing mechanism for decorating text.

@fmoralesc
Contributor

@phmarek If you can compute the position of the text to highlight since it is static, shouldn't it be possible to use matchaddpos()? That said, it seems to me that would be even more cumbersome than what we currently have; I use the concealed tags method in vim-pad and I know what you mean about it being not as clean as one would want.

Perhaps introducing a virtual key to tag separators (let's say <Hsep>, like <SNR>) would help that, so instead of

 {[10] "types" []} = {[11] #<HASH-TABLE :TEST EQUALP :COUNT 3 {1009D95AD3}> []} {<3> [remove entry] <>}

you could have

<Hsep>10 "types" 10<Hsep> = <Hsep>11 #<HASH-TABLE :TEST EQUALP :COUNT 3 {1009D95AD3}> 11<Hsep> <Hsep>3 [remove entry] 3<Hsep>

@tarruda That would be quite helpful, and not only for UIs.

@phmarek
Author

phmarek commented Jan 4, 2015

@fmoralesc That might be an option, too, but not much cleaner IMO.

And, in the long run, I'd like to shoot for having a "text" property named img, to have inline images, and this here would be an "easy" first step ;P

@bfredl
Member

bfredl commented Jan 4, 2015

matchaddpos works well for e.g. a read-only output buffer, but it's a little bit inconvenient since it only modifies the current window. In a plugin I want to dynamically highlight an output buffer that need not be in the current window. Switching the current window back and forth kind-of works, but is not entirely reliable. Also it would be more convenient if the added highlighting were associated with a buffer and not a window (if the window is closed and the buffer then reopened, the matches need not be re-added). A bufferwise matchaddpos is perhaps something to consider?
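The difference between window-scoped and buffer-scoped matches can be modelled in a few lines of Python. This is a toy model, not Vim's implementation; `BufferHighlights` and its methods are made-up names for illustration.

```python
class BufferHighlights:
    """Keep position highlights per buffer, independent of any window."""

    def __init__(self):
        self._by_buffer = {}

    def add(self, buf, group, line, col, length):
        """Record a highlight on a buffer; no window has to be current."""
        self._by_buffer.setdefault(buf, []).append((group, line, col, length))

    def for_buffer(self, buf):
        """What any window showing this buffer would draw."""
        return list(self._by_buffer.get(buf, []))


store = BufferHighlights()
store.add(3, "Error", 1, 10, 17)

windows = {1000: 3, 1001: 3}  # two windows showing buffer 3
del windows[1000]             # closing a window loses nothing
print(store.for_buffer(windows[1001]))  # [('Error', 1, 10, 17)]
```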

@fmoralesc
Contributor

@phmarek Sure, I was only thinking of the issue of having to specify different separators depending on the contents of the region, which that would solve.

I think @tarruda's suggestion could help for implementing img. Actually, nothing stops a UI today from interpreting text like

  ![image](path)

as an image to display; it's just that all UIs currently assume a grid of text. Expanding the :hi command would allow providing hints to UIs about this:

 syn match cmImage /!\[.\+\](.\+)/
 hi cmImage type=img

I've thought a mechanism like this could allow plugins like NerdTree to be displayed as native lists (like this), and special buffers to be displayed using non-fixed-width fonts.

@bfredl 👍

@justinmk
Member

justinmk commented Jan 4, 2015

Not sure highlight is the right mechanism, rather :syntax. But extending vimscript seems unnecessary to me in this case. We should only reuse the internal structures, but expose the functionality via the API only.

@fmoralesc
Contributor

@justinmk Probably. Extending both would be helpful.

(As to extending :syntax, I was thinking of adding a concealhl attr to it, to allow highlighting different conceals differently, which has been a pain for me for a while at vim-pandoc-syntax).

Extending vimscript can also benefit vanilla vim, if the code could be proposed to vim_dev (I know...)

@justinmk
Member

justinmk commented Jan 4, 2015

Sure if vim_dev accepts, but otherwise extending vimscript really complicates the burden of compatibility (or managing, documenting, and providing solutions for incompatibility) and always requires difficult, time-hungry decisions. Incompatibility is always an option (and sometimes we will choose it), but we can also work on making our API really nice to work with (from vimscript, too, using rpcnotify() and friends).

@tarruda
Member

tarruda commented Jan 5, 2015

> But extending vimscript seems unnecessary to me in this case. We should only reuse the internal structures, but expose the functionality via the API only.

The vimscript changes are minimal: all I'm proposing is to allow arbitrary key/value pairs after the :highlight GROUP command (as opposed to allowing only fixed keys such as guifg, ctermfg, etc.). Highlight group data would be stored and passed as dictionaries, and UIs would only extract the information they support.

For example, if a TUI and a GUI are connected to the same instance and need to display a highlight group, the TUI would check for ctermfg/ctermbg while the GUI would check for guifg/guibg and even richer information such as alpha level or images, as suggested by @fmoralesc.
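A rough Python sketch of that flow, assuming the proposed free-form syntax. The `rgba` key is the hypothetical example from earlier in the thread, and `parse_highlight` and `pick` are illustrative helpers, not part of any nvim API.

```python
def parse_highlight(command):
    """Parse ':highlight GROUP key=value ...' into (group, attrs dict)."""
    words = command.split()
    if len(words) < 2 or words[0].lstrip(":") not in ("hi", "highlight"):
        raise ValueError("not a highlight command: %r" % command)
    group = words[1]
    # Any key is accepted; a word without '=' raises ValueError.
    attrs = dict(word.split("=", 1) for word in words[2:])
    return group, attrs


def pick(attrs, supported):
    """Keep only the keys a given UI understands."""
    return {k: v for k, v in attrs.items() if k in supported}


group, attrs = parse_highlight(
    ":highlight SomeGroup ctermfg=9 guifg=#ff0000 rgba=#e1e1e1cc")
print(pick(attrs, {"ctermfg", "ctermbg"}))        # {'ctermfg': '9'}
print(pick(attrs, {"guifg", "guibg", "rgba"}))    # {'guifg': '#ff0000', 'rgba': '#e1e1e1cc'}
```

Each UI ignores keys it does not know, so adding a new key does not break existing UIs.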

For associating the highlight information with arbitrary positions, I vote for @bfredl's suggestion: a buffer-aware matchaddpos().

Besides being backwards compatible and allowing arbitrary formatting to be associated with text, this has the advantage of simplifying code (I estimate about 40% of syntax.c could be removed).

@felipesere

@bfredl how is your tree-sitter work coming along? I had to open some multi-megabyte HTML files (terms and conditions) and had to switch highlighting off to move the cursor. That is how I stumbled here!

@bfredl
Member

bfredl commented Sep 5, 2019

@felipesere It will be one of my priorities after 0.4 is released (very soon, hopefully).

@adaszko

adaszko commented Oct 24, 2019

I'm not sure if this is off-topic but #1767 (comment) got me thinking: Will it be possible, once the tree-sitter branch is merged, to use language-specific syntax objects like identifier, function_definition directly from VimScript? I'm thinking of cases where, for instance, the iskeyword option isn't precise enough (think e.g. Rust with its foo!() and foo! being a keyword if foo is a macro, but !foo not being a keyword in if !foo {}). I see a huge potential in this. It always struck me as odd that movements like ( don't make much sense in almost any programming language, only when writing prose. With tree-sitter merged in, there could be language-specific, precise text objects available under a single key in Vim! Another example is targets.vim implementing a function argument text object. It easily gets confused when there are commas within a function argument. I believe tree-sitter has the potential to fix that shortcoming.

@justinmk
Member

> Will it be possible, once the tree-sitter branch is merged, to use language-specific syntax objects like identifier, function_definition directly from VimScript

Yes. Directly from Lua, which is accessible from Vimscript. Initially however, we will provide only a query API. We will document patterns for using the API to query syntax, which can be used to create mappings. Later, we will think about adding first-class Normal-mode commands for common cases like "around Function" (af, if), etc.

@bfredl
Member

bfredl commented Mar 22, 2024

A lot of work has been done in this area. More focused issues and drafts are tracking the pieces that are still missing.

@bfredl bfredl closed this as completed Mar 22, 2024
@neovim neovim locked as resolved and limited conversation to collaborators Mar 28, 2024