
[WIP] experiment: stream buffer modifications to external process #3160

Closed
wants to merge 4 commits into from

Conversation

rhizoome
Contributor

@rhizoome rhizoome commented Oct 24, 2019

The goal is to extract information from the buffers interactively, allowing:

  • tree-sitter based highlighting

  • better kak-lsp/tree integration

This issue communicates that I am working on this. Currently everything I have to do is straightforward: write a journal of modifications since the last save, then create a test program that uses the journal, and then we'll see...

In my opinion the only obstacle to fluid integration is the need to send the full buffer to the external process on every update. Kakoune reacts very fast to external commands; I have scripts that issue quite a lot of commands in the background, and Kakoune handles them smoothly.

Disclaimer: don't get your hopes up too high; it uses shm_open() (POSIX) and a public-domain MD5 implementation. My C++ isn't too good and I haven't mastered the Kakoune code style yet. I will improve this, but first I want a running PoC. I don't know if the patch can reach acceptance.

  • create and clean up journal-files (shm_open)
  • write journal to files
  • create simple client
  • create syntax highlighter

@lenormf
Contributor

lenormf commented Oct 24, 2019

There's a Murmur3 implementation already available in hash.cc; I don't think adding MD5 is necessary.

@rhizoome
Contributor Author

There's a Murmur3 implementation already available in hash.cc; I don't think adding MD5 is necessary.

Thanks for the hint.

The problem MD5 solves is that PATH_MAX (4096 on my system) is longer than NAME_MAX (255), and I have to map an absolute path to a name for shm_open(), so I need something collision-free. I don't know how good MurmurHash3 is for that; usually a cryptographic hash is used for this kind of thing. Switching to another hash for the mapping, or to another method altogether (a counter?), won't be a problem later. I only mentioned it because it's currently the biggest change; I guess there will be more difficult things to resolve before the patch can be accepted. If it even works... 🤔
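
To illustrate the mapping, here is a minimal sketch of hashing an absolute buffer path down to a short shm_open() name. The hash choice (std::hash) and the name prefix are placeholders, not the patch's actual code:

```cpp
// Sketch only: map an arbitrarily long absolute path to a short, fixed-length
// name acceptable to shm_open(), which has to fit well under NAME_MAX.
#include <cstdio>
#include <functional>
#include <string>

#include <fcntl.h>     // O_CREAT, O_RDWR
#include <sys/mman.h>  // shm_open
#include <sys/stat.h>

int open_journal_for(const std::string& buffer_path)
{
    // Any stable hash works as long as collisions are unlikely enough;
    // std::hash is used purely for illustration (Murmur3 or MD5 would do, too).
    std::size_t h = std::hash<std::string>{}(buffer_path);

    char name[64];
    std::snprintf(name, sizeof name, "/kak-journal-%zx", h);

    // The server creates the journal; sync clients would open it read-only.
    return shm_open(name, O_CREAT | O_RDWR, 0600);
}
```

A counter handed out by the server would sidestep the collision question entirely, at the cost of clients needing an extra lookup to find their journal.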

Jean-Louis Fuchs added 2 commits October 24, 2019 10:59
@rhizoome rhizoome changed the title from "wip/experiment: buffer journal for sync with external processes" to "[WIP] experiment: buffer journal for sync with external processes" on Oct 24, 2019
@eraserhd
Contributor

@ganwell This is really interesting. I'm not sure what I think about it... It definitely is connected to a bunch of things I'm doing or pondering.

parinfer-rust currently reads the whole buffer (twice, and diffs the last version against the current), but the algorithm's designer wants it to use a change API of some kind to work completely correctly. It's on my to-do list. See #2673, and tangentially, #1130. I learned on #2673 that Kakoune already has an append-only buffer change list.

Also, I want to add buffer access to my 9p branch, #3116. It sure would be cool to come up with a mechanism that would let us pool efforts on these three or four fronts. (This branch also has the problem of unique names for buffers.)

I have some skepticism about whether using a shared-memory database could improve performance, but I guess I'm also not sure exactly how it would work. Do client programs still stream the buffer contents? Is there generally one cat-buffer program that reconstructs the contents from the shared-memory database, and that is read by client programs? Or is it intended that tool programs fork and only process change deltas? If the latter, how are they notified?

@rhizoome
Contributor Author

rhizoome commented Oct 24, 2019

I have some skepticism about whether using a shared-memory database could improve performance, but I guess I'm also not sure exactly how it would work. Do client programs still stream the buffer contents? Is there generally one cat-buffer program that reconstructs the contents from the shared-memory database, and that is read by client programs? Or is it intended that tool programs fork and only process change deltas? If the latter, how are they notified?

Shared memory is the wrong term: shm_open creates a file on /dev/shm; other than that it's just a normal file, and the idea would work over the unix socket, too. I see you extended the socket protocol, so maybe I'll use the unix socket later. Currently shm_open is more about a simpler implementation for the prototype. With shm_open we get multiple readers for free; with a socket we would have to send the modifications to each client.

The journal protocol will improve performance and memory usage. A sync client opens the original file using mmap, so pages are shared with other clients and the kakoune server (copy on write). Kakoune streams serialized Buffer::Modification objects to the journal. The sync client follows the journal (think tail -f) and applies all the modifications to its copy of the buffer (MAP_PRIVATE). The sync client then triggers a callback, which for example updates a tree-sitter AST. When the user saves the file, the clients update the buffer from the file system (I don't know if mmap has functionality for that, or whether it means remapping the file) and the journal is truncated. New modifications are then streamed in.
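
For concreteness, a rough sketch of the sync-client side. The record layout, the journal path and the coordinate handling are made up; the real serialization of Buffer::Modification would look different:

```cpp
// Sketch only: a sync client maps the last saved file copy-on-write, then
// replays journal records on top of its private copy.
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

struct JournalRecord               // assumed wire format of one modification
{
    std::uint8_t  type;            // 0 = insert, 1 = erase
    std::uint32_t line, column;    // coordinate of the change
    std::uint32_t length;          // bytes of inserted text / bytes to erase
    // an insert record is followed by `length` bytes of inserted text
};

int main()
{
    // Map the saved state MAP_PRIVATE: pages stay shared (copy on write) with
    // the kakoune server and other sync clients until this client touches them.
    int fd = open("example.txt", O_RDONLY);
    struct stat st;
    fstat(fd, &st);
    char* base = static_cast<char*>(
        mmap(nullptr, st.st_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0));
    std::string buffer(base, st.st_size);

    // Follow the journal like `tail -f` and replay each record onto `buffer`.
    std::FILE* journal = std::fopen("/dev/shm/kak-journal-example", "rb");
    JournalRecord rec;
    while (journal && std::fread(&rec, sizeof rec, 1, journal) == 1)
    {
        std::vector<char> text(rec.length);
        if (rec.type == 0)
            std::fread(text.data(), 1, rec.length, journal);
        // ...translate (line, column) into a byte offset, splice into `buffer`,
        // then notify the consumer (e.g. feed the edit to a tree-sitter AST).
    }

    munmap(base, st.st_size);
    close(fd);
}
```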

If the idea works, I would create a Rust library that implements the buffer sync and the tree-sitter updates. Other projects could either use the library, running one binary per project (i.e. one for kak-lsp, one for kak-tree), or an umbrella project could integrate these into one big binary. I'd rather have multiple binaries: Unix philosophy.

A tree-sitter-based highlighter can read the journal and send range commands into the unix socket, no shell calls needed. Most other use cases still need shell calls, but they have an up-to-date buffer ready (a persistent process is needed).

I am not sure whether the async nature of the journal means that data races would make any plugin using it buggy; that's why I want to do a prototype with highlighting. I guess the timestamps should solve the synchronization problems, but I don't understand the implications yet.

@rhizoome
Contributor Author

rhizoome commented Nov 5, 2019

Streaming is implemented 🎉

asciicast

Please note that none of this code is intended for merging; it is all just prototype quality.

What I've learned

  • Streaming works with low delay, even though the client code refreshes the whole file on every modification

  • Following a shm_open file (a file on /dev/shm) needs ugly follow code (tail -f style) or inotify (see the sketch after this list)

  • A socket or fifo would not need follow code, but then the multiple readers aren't free anymore

  • I guess I have to switch to the unix socket, which has its own problems, for example multiplexing and identifying the buffers

  • Ropes are nice
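
For reference, the inotify variant of the follow code looks roughly like this (the journal path is made up):

```cpp
// Sketch only: instead of polling like `tail -f`, block until the journal
// file is modified, then read the newly appended records.
#include <cstdio>
#include <sys/inotify.h>
#include <unistd.h>

int main()
{
    int in = inotify_init1(0);
    inotify_add_watch(in, "/dev/shm/kak-journal-example", IN_MODIFY);

    char events[4096];
    while (read(in, events, sizeof events) > 0)
    {
        // The journal grew: read and apply the new records here.
        std::puts("journal modified");
    }
    close(in);
}
```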

Next

I will try whether actual syntax highlighting works. I guess I need to serialize the timestamp, too.

@lenormf
Contributor

lenormf commented Nov 5, 2019

Why don't you write a client instead of having the server write everything to a file? You could create a new UI type that sends serialized information on the socket.

@rhizoome
Contributor Author

rhizoome commented Nov 5, 2019

Why don't you write a client instead of having the server write everything to a file? You could create a new UI type that sends serialized information on the socket.

I guess that is what I mean by "switch to the unix socket": extend the client protocol with another type of client. In the end it is the best solution and has the best chance of being accepted.

@mawww
Owner

mawww commented Nov 10, 2019

Hello,

It is still unclear to me what you are trying to solve with this. Do we have use cases where transmitting the whole buffer has unreasonable overhead?

Assuming that is the case: this seems to stream writes to the undo tree. I think a better approach would be to finalize the work on exposing the undo tree; then, once you have an apply_modification function (which is still necessary in your approach), you can get from one buffer state to another by implementing the Buffer::move_to algorithm client side (apply reverse modifications up to the common parent, then apply modifications down to the new state).
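
A rough client-side sketch of that walk, with stand-in types (Kakoune's real undo tree and Modification types look different):

```cpp
// Sketch only: move a client-side copy of the buffer from history node `from`
// to node `to` by undoing up to the common parent and redoing down to `to`.
#include <algorithm>
#include <string>
#include <vector>

struct Modification
{
    enum class Type { Insert, Erase } type;
    std::size_t offset;
    std::string text;
};

struct HistoryNode
{
    HistoryNode* parent = nullptr;
    std::vector<Modification> undo_group;  // modifications from parent to this node
};

// Apply one modification, optionally reversed (an insert becomes an erase and
// vice versa).
static void apply(std::string& buffer, const Modification& m, bool reverse)
{
    bool insert = (m.type == Modification::Type::Insert) != reverse;
    if (insert)
        buffer.insert(m.offset, m.text);
    else
        buffer.erase(m.offset, m.text.size());
}

void move_to(std::string& buffer, HistoryNode* from, HistoryNode* to)
{
    // Find the common parent: the first ancestor of `from` that is also an
    // ancestor of `to` (naive search, fine for a sketch).
    std::vector<HistoryNode*> to_path;
    for (HistoryNode* n = to; n; n = n->parent)
        to_path.push_back(n);
    HistoryNode* common = from;
    while (std::find(to_path.begin(), to_path.end(), common) == to_path.end())
        common = common->parent;

    // 1. Undo: apply reverse modifications from `from` up to the common parent.
    for (HistoryNode* n = from; n != common; n = n->parent)
        for (auto it = n->undo_group.rbegin(); it != n->undo_group.rend(); ++it)
            apply(buffer, *it, true);

    // 2. Redo: apply forward modifications from the common parent down to `to`.
    std::vector<HistoryNode*> down;
    for (HistoryNode* n = to; n != common; n = n->parent)
        down.push_back(n);
    for (auto it = down.rbegin(); it != down.rend(); ++it)
        for (const auto& m : (*it)->undo_group)
            apply(buffer, m, false);
}
```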

If this proves a bit too complex client side, we could expose say `$kak_modifications_from_ID_to_ID`, which would provide the list of modifications to apply to get from one given history id to another.

@eraserhd @ul What are your requirements for transferring buffer data and buffer updates? Is transferring the full buffer too costly? Is there an approach you expect to be easier to work with?

@ul
Contributor

ul commented Nov 10, 2019

LSP requires sending the entire buffer content to the language server on any change. Therefore streaming changes might improve only the Kakoune -> kak-lsp communication, as less data needs to be transferred via stdio and then the socket. It would come at the additional cost of applying changes on the kak-lsp side before sending the content to the language server. It wouldn't add the cost of keeping the old buffer in kak-lsp memory, as we already (unfortunately) have to do that to deal with position translation (https://github.com/ul/kak-lsp/#positioncharacter-interpretation). I guess for buffers smaller than hundreds of megabytes the benefit would be marginal, if any. But it's hard to say without benchmarks.

kak-tree might benefit from streaming changes as tree-sitter supports incremental parsing. But this comes at the cost of making kak-tree stateful. Right now any kak-tree command is just a one-shot executable invocation.

In both cases, it's easier to work with whole-buffer transmission. I didn't notice much lag when using this strategy, but I must admit that I used Kakoune and kak-{lsp,tree} only on a high-end laptop.

@rhizoome rhizoome changed the title from "[WIP] experiment: buffer journal for sync with external processes" to "[WIP] experiment: stream buffer changes to external process" on Nov 11, 2019
@rhizoome rhizoome changed the title from "[WIP] experiment: stream buffer changes to external process" to "[WIP] experiment: stream buffer modifications to external process" on Nov 11, 2019
@rhizoome
Contributor Author

rhizoome commented Nov 11, 2019

External realtime highlighting

My use case is external, realtime highlighting using tree-sitter. Tree-sitter is an incremental parser. It has better highlighting quality than regexes because it knows the context and what symbols refer to. It does incremental reparsing, which is much faster than parsing the whole file on every change.
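
For reference, this is roughly the incremental flow tree-sitter expects through its C API; the grammar (tree_sitter_c) and the edit values are just placeholders:

```cpp
// Sketch only: describe an edit to the old tree, then reparse; tree-sitter
// reuses the unchanged parts of the old tree instead of reparsing everything.
#include <cstring>
#include <tree_sitter/api.h>

extern "C" const TSLanguage* tree_sitter_c();  // provided by a grammar library

int main()
{
    TSParser* parser = ts_parser_new();
    ts_parser_set_language(parser, tree_sitter_c());

    const char* old_source = "int x;";
    TSTree* old_tree = ts_parser_parse_string(parser, nullptr, old_source,
                                              std::strlen(old_source));

    // The buffer changed: "int x;" -> "int xy;". Tell the old tree where the
    // edit happened so node positions stay consistent, then reparse.
    const char* new_source = "int xy;";
    TSInputEdit edit = {};
    edit.start_byte   = 5;  edit.start_point   = {0, 5};
    edit.old_end_byte = 5;  edit.old_end_point = {0, 5};
    edit.new_end_byte = 6;  edit.new_end_point = {0, 6};
    ts_tree_edit(old_tree, &edit);

    TSTree* new_tree = ts_parser_parse_string(parser, old_tree, new_source,
                                              std::strlen(new_source));

    ts_tree_delete(old_tree);
    ts_tree_delete(new_tree);
    ts_parser_delete(parser);
}
```

A modification journal (or an exposed modifications list) maps naturally onto these edits, instead of diffing two full buffer snapshots.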

When I researched the possibilities, I read in several issues that external highlighting causes too much load. Also, tree-sitter wants edits (modifications). I didn't think about exposing the modifications to the environment when I started this. [1]

So I had the idea to stream the modifications: The modifications stream into the external highlighter and it sends highlighting commands back.

Experiment

However, I don't think this is a must-have feature. Kakoune's current highlighting is good.

It is interesting that LSP requires sending the whole buffer on every change. I assumed the way tree-sitter works would be common practice. This means there are a lot of implementations out there that work fine by sending the whole buffer. Also TextDocumentSyncKind.Incremental suggests that LSP supports sending edits? Was it added later? @ul

I am prepared to drop the idea at any stage if it is clear that the benefit is not worth the cost.

But I was even more skeptical before I ran the simple buffer client. It was surprisingly easy, and even the current crude implementation did not cause any noticeable additional load.

@mawww Assuming you wanted external, realtime highlighting and you wanted to send edits as tree-sitter's interface suggests, how would you implement it? Do you think it would be worth the effort and cost?

[1] I will test InsertChar/Delete plus exposing the modifications. If it really causes too much load, maybe a rate-limited InsertChar/Delete would help?

@eraserhd
Contributor

eraserhd commented Nov 11, 2019

@ganwell In which issues did you read that external highlighting caused too much load? I haven't seen any here, and Kakoune gets away with a lot of stuff that is conventionally "too slow".

Sending the entire buffer twice on each keystroke (the two versions are diffed) has caused no performance issues for me. One of my machines is a 3 GHz AMD and the other is a modern MacBook. If there are machines or environments that don't perform well, we should look at whether we can improve the performance first.

It's possible that parsing highlighter ranges pauses somehow. I've seen a pause that I can't explain in kak-ansi for large files, although it also modifies the buffer with one large change, so it could be Kakoune's internal diffing. If this is the issue, streaming changes won't help, and we should fix the performance of the highlighter instead.

I'm personally excited about tree-sitter and external highlighters, and I'm pretty sure they could be run with the existing mechanisms. At least I haven't seen any counter-evidence yet.

Also, I'll bet it would be best to run external highlighters on NormalIdle. I imagine that in Kakoune it would use a ranges highlighter referring to an option that is populated asynchronously, which means Kakoune would do a decent job of updating the highlighting while a person types; I'll bet the highlighter would run at least every few words for a really fast typist, since coding has a lot of pauses.

@rhizoome
Contributor Author

rhizoome commented Nov 11, 2019

@ganwell In which issues did you read that external highlighting caused too much load? I haven't seen any here, and Kakoune gets away with a lot of stuff that is conventionally "too slow".

I can't find them anymore. There was one where @mawww said that calling a shell on InsertChar would be a bad idea, but it was a long time ago (around issue 500). The others were issues where people wanted to implement this or that in Kakoune instead of using hooks.

Also, I'll bet it would be best to run external highlighters on NormalIdle. I imagine that in Kakoune it would use a ranges highlighter referring to an option that is populated asynchronously, which means Kakoune would do a decent job of updating the highlighting while a person types; I'll bet the highlighter would run at least every few words for a really fast typist, since coding has a lot of pauses.

I agree with you: if you use NormalIdle, it takes a very large file for any performance issues to appear. But I don't understand how this can be realtime. With NormalIdle you type something, then you have to exit insert mode and wait a tiny moment before the text gets highlighted.

On the other hand, I just tried InsertIdle, and you have to type very fast to block updates. I think InsertIdle plus NormalIdle is a good solution. But I would still like to feed edits into tree-sitter, and I want to generate highlights only for the text that has changed. #2673 solves that, right?

@mawww What do you think about InsertIdle plus NormalIdle plus exposing modifications? I am now completely convinced that streaming is not needed, and I am glad: I wanted to focus on the tree-sitter integration and got distracted by this.

  1. On InsertIdle I can send the $kak_uncommitted_modifications from #2673 (Add 'history' expansion) to the highlighter

  2. On NormalIdle I can send the whole buffer and diff it in the highlighter

@eraserhd
Contributor

I was hoping to get back to #2673 before Clojure/conj this year, but that's only 1.5 weeks away, so I think it isn't going to happen :(

#2673 isn't ready yet, and it was (for me) motivated by parinfer - but not for performance reasons. There are cases where parinfer wants to know exactly what's changed in order to know how to respond. For example, if you have (((, deleting the middle parenthesis should behave differently from deleting the last parenthesis, but we can't figure out which was deleted by diffing.

@rhizoome
Contributor Author

@eraserhd I guess I will just use #2673 and follow the development.

@ul
Contributor

ul commented Nov 13, 2019

Also TextDocumentSyncKind.Incremental suggests that LSP supports sending edits? Was it added later? @ul

Yes, you are right. I was about to answer that it's not widely supported (as was the case when I last checked), but I went and double-checked a few "popular" language servers, and most of them advertise incremental sync in their server capabilities response.

@rhizoome
Contributor Author

rhizoome commented Dec 5, 2019

@mawww

to finalize the work on exposing the undo tree

You mean this issue right? #2673

@mawww
Owner

mawww commented Dec 5, 2019

Correct.

@rhizoome
Contributor Author

rhizoome commented Dec 7, 2019

I started working on sync based on the history expansion.

https://github.com/ganwell/hitch

This is super WIP and the names are just working titles.

@rhizoome rhizoome closed this Dec 7, 2019