
buffer improvement with piece tree#41172

Merged
rebornix merged 55 commits into master from rebornix/buffer-pt
Jan 19, 2018
Conversation

@rebornix
Member

@rebornix rebornix commented Jan 5, 2018

This is an early version of adopting a piece table with a red-black tree as the new text buffer. It still contains quite a few hacks (model builder, textsource), and applyEdits does not yet return good change info, which leads to whole-page re-tokenization, but it still gives an idea of how a chunk-based, tree-structured buffer works in vscode.

Under the hood it's a piece table with two string buffers: one holds the original content and the other the newly added content. Instead of using a doubly linked list to store the pieces, I use a red-black tree here in order to get good average time complexity for most operations.

In addition to the basic red-black tree node properties, each node stores:

  1. a reference to a Piece (a mapping into one of the buffers)
  2. the total content size of its left subtree
  3. the line-break count of its left subtree

I didn't use a piece's offset in the document as the key because that would require an in-order traversal of many nodes on every insert/delete. The latter two properties above let us find the position at which to insert/delete a node, and ensure that after each operation we only need to update metadata along the path from the modified node to the tree root, at most.
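To make the left-subtree metadata concrete, here is a minimal sketch of such a node and an offset lookup. The names (`Piece`, `TreeNode`, `size_left`, `lf_left`, `nodeAt`) are illustrative assumptions for this sketch, not the actual vscode implementation:

```typescript
// Illustrative sketch of a piece-tree node; not the actual vscode code.
interface Piece {
  bufferIndex: number; // 0 = original buffer, 1 = added buffer
  start: number;       // offset into that buffer
  length: number;      // length of the piece in characters
  lineFeedCnt: number; // number of line breaks inside the piece
}

class TreeNode {
  piece: Piece;
  size_left: number;   // total character count of the left subtree
  lf_left: number;     // total line-break count of the left subtree
  left: TreeNode | null = null;
  right: TreeNode | null = null;

  constructor(piece: Piece, size_left = 0, lf_left = 0) {
    this.piece = piece;
    this.size_left = size_left;
    this.lf_left = lf_left;
  }
}

// Find the node containing a document offset without visiting most of the
// tree: compare against size_left and descend, O(log n) on a balanced tree.
function nodeAt(
  root: TreeNode | null,
  offset: number
): { node: TreeNode; remainder: number } | null {
  let node = root;
  while (node) {
    if (offset < node.size_left) {
      node = node.left;
    } else if (offset < node.size_left + node.piece.length) {
      return { node, remainder: offset - node.size_left };
    } else {
      offset -= node.size_left + node.piece.length;
      node = node.right;
    }
  }
  return null;
}
```

Because each node only caches aggregates of its *left* subtree, an insert or delete only needs to adjust `size_left`/`lf_left` on the ancestors of the modified node, which is the point made above.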


Todo:

  • Clean up hacks
  • Fix tokenization
  • Add test cases and benchmarks
  • Optimize for bulk edits
  • Optimize for Replace All

Right now the bottleneck of this implementation is CRLF handling. Since we don't split chunks by /\r\n|\r|\n/ and the system relies heavily on line-break counts, each operation needs some validation, because an edit may split a CRLF pair into a separate CR and LF. When the changed buffer is large, this validation is costly.
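A minimal sketch of why this validation is needed (illustrative helpers, not the actual vscode code): if per-piece line-break counts are simply summed, a `\r\n` pair split across two adjacent pieces gets counted twice, so the boundary has to be checked and corrected:

```typescript
// Count line breaks in a chunk, treating \r\n as a single break.
function countLineBreaks(text: string): number {
  const matches = text.match(/\r\n|\r|\n/g);
  return matches ? matches.length : 0;
}

// Validation step for two adjacent chunks: a CR ending the left chunk
// followed by an LF starting the right chunk is ONE break, not two.
function joinedLineBreaks(left: string, right: string): number {
  let total = countLineBreaks(left) + countLineBreaks(right);
  if (left.endsWith("\r") && right.startsWith("\n")) {
    total -= 1; // merge the split CRLF pair back into a single break
  }
  return total;
}
```

Doing this boundary check on every edit is cheap for one pair of pieces, but the cost adds up when many pieces are touched, which is the bottleneck described above.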

Recording of opening some crazy files (large.c is 50 MB, 1.4 million lines; Heap-xyz is 50 MB, 3.3 million lines)

Left: Piece Table, Right: Insider
(GIF: file opening comparison)

@alexdima
Member

alexdima commented Jan 5, 2018

@rebornix I wonder if all of the initial loading speed advantage is due to the memory optimization (avoiding sliced strings) that I pushed a while ago. How does it look when setting AVOID_SLICED_STRINGS to false for the current implementation?
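For context, a sketch of the idea behind such an optimization (my reading of the flag's name; the helper below is hypothetical, not the actual vscode code): in V8, `substring()` can return a "sliced string" that keeps the entire parent string alive, so splitting a huge file into line substrings may retain the whole file in memory. Forcing a flat copy trades CPU at load time for lower retained memory:

```typescript
// Hypothetical helper: force a flat, independent copy of a string so it
// does not retain its (possibly huge) parent via V8's sliced strings.
function flattenString(s: string): string {
  // Concatenating and re-slicing defeats the sliced-string optimization;
  // the result no longer references the original parent string.
  return (" " + s).substring(1);
}
```

The extra copy is exactly the kind of per-line cost that could slow down file opening, which is the trade-off being asked about.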

@rebornix
Member Author

rebornix commented Jan 5, 2018

@alexandrudima you are right: the memory optimization is the major thing that makes the line buffer slow at file opening.

(screenshot)

Disabling AVOID_SLICED_STRINGS for files with a lot of lines improves opening speed. After disabling it, the major cost becomes split: splitting the text into lines is more costly than simply constructing line starts.
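To illustrate the cheaper alternative, here is a sketch of "constructing line starts": instead of materializing a string per line, record only the offset at which each line begins. The helper is a hypothetical illustration, not the actual vscode implementation:

```typescript
// Compute the start offset of every line in a chunk without splitting it.
// Handles \r\n, \r, and \n; \r\n counts as a single line break.
function computeLineStarts(chunk: string): number[] {
  const starts = [0];
  for (let i = 0; i < chunk.length; i++) {
    const code = chunk.charCodeAt(i);
    if (code === 13 /* \r */) {
      if (i + 1 < chunk.length && chunk.charCodeAt(i + 1) === 10 /* \n */) {
        i++; // consume the \n of a \r\n pair
      }
      starts.push(i + 1);
    } else if (code === 10 /* \n */) {
      starts.push(i + 1);
    }
  }
  return starts;
}
```

A single pass over the chunk with one small number array avoids both the per-line string allocations of splitting and the sliced-string retention problem discussed above.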

Line Buffer
(screenshot)

PT
(screenshot)

They are close, so from a performance perspective the current implementation is good; the cost shifts to memory.

@rebornix rebornix changed the title [WIP] buffer improvement with new model buffer improvement with piece tree Jan 18, 2018
@rebornix rebornix merged commit e54a781 into master Jan 19, 2018
@github-actions github-actions bot locked and limited conversation to collaborators Mar 27, 2020