[RDY] Data structure improvements #2650

tarruda · 2015-05-14T07:09:00Z

Some changes taken from a private branch where I refactor most libuv code to run in a background IO thread(ref #2371). Pushing as a separate PR to simplify reviewing.

ZyX-I · 2015-05-14T13:24:51Z

About rbuffer: can you consider implementing it on top of ringbuf.h from shada branch? I have a generic (though currently untested) ring buffer implementation there that uses macros and it is currently used for holding an array of structures (so I can’t reuse yours which holds pointers).

Though it looks like I am missing something regarding what is a ring buffer: (size_t)(buf->end_ptr - buf->start_ptr) cannot possibly be ring buffer size because it is OK for ring buffer to have start after the end. E.g. from wiki:

This image shows a full buffer with two elements having been overwritten:

(end - start) here will return -1.

tarruda · 2015-05-14T13:54:31Z

About rbuffer: can you consider implementing it on top of ringbuf.h from shada branch? I have a generic (though currently untested) ring buffer implementation there that uses macros and it is currently used for holding an array of structures (so I can’t reuse yours which holds pointers).

I'm gonna give it a shot

tarruda · 2015-05-14T13:58:06Z

Though it looks like I am missing something regarding what is a ring buffer: (size_t)(buf->end_ptr - buf->start_ptr) cannot possibly be ring buffer size because it is OK for ring buffer to have start after the end. E.g. from wiki:

In this implementation, end_ptr is a pointer to the end of the array. The last element is represented by write_ptr("end" in your diagram)

ZyX-I · 2015-05-14T14:15:11Z

@tarruda I have just pushed some fixes to my implementation. Manual test run OK after that.

tarruda · 2015-05-14T14:18:39Z

@ZyX-I I'd like to keep the possibility of passing the ring buffer pointer to functions that write or read directly to memory locations, which is why I have implemented the RBUFFER_WHILE_NOT_{FULL,EMPTY} macros.

If the library I'm interfacing with exposes it's own internal buffer, I can just use rbuffer_read/rbuffer_write which is the case for msgpack(see channel.c/parse_msgpack)

But if the library expects a pointer(libvterm/vterm_input_write for example), I'd have to copy the memory to a temporary buffer and then pass it to the library which is a bit inefficient. With terminal_receive/vterm_input_write I usee RBUFFER_WHILE_NOT_EMPTY, which will feed the data by passing the pointer directly(possibly in 2 passes since it may be necessary to wrap around the buffer).

Could you implement something similar to RBUFFER_WHILE_NOT_{FULL,EMPTY} on your generic ring buffer?

ZyX-I · 2015-05-14T14:32:46Z

@tarruda I guess here are the differences (yours → mine):

rbuffer_capacity() → None (using rb->buf_end - rb->buf directly when needed).
rbuffer_space() → None
rb->length → rb_length()
rbuffer_free() → rb_dealloc()
rbuffer_write_ptr(), rbuffer_read_ptr() → None (not generic)
rbuffer_produced() → None (I assume that pushing to ringbuffer is done only via rb_push or rb_insert)
rbuffer_consumed() → None (it is not assumed that reading consumes ring buffer)
rbuffer_read() → None
rbuffer_write() → a sequence of rb_push() calls
None → rb_free() (simple characters need not be freed)
None → rb_insert() (you do not need to insert into the middle of the ring buffer)
None → RINGBUF_ITER_BACK (you do not need to process ring buffer from the end to the start)
None → rb_remove() (you do not need to remove random elements)

Very different ring buffer implementations for different purposes.

tarruda · 2015-05-14T14:33:00Z

Just found an interesting ring buffer implementation: https://github.com/michaeltyson/TPCircularBuffer

This ones uses a virtual memory technique that avoids the need for wrapping logic, I wonder if we can do something similar without introducing compatibility issues?

ZyX-I · 2015-05-14T14:57:31Z

With that big amount of differences it will be easier to keep your buffer implementation, but rename it to “string ring buffer” (because mine is generic). I can add callbacks, but your variant is better suited for your purposes.

tarruda · 2015-05-15T02:36:30Z

With that big amount of differences it will be easier to keep your buffer implementation, but rename it to “string ring buffer” (because mine is generic). I can add callbacks, but your variant is better suited for your purposes.

👍

But later I might spend some time refactoring both into a single generic data structure. If you think about it, rb_insert/rb_remove are simply special cases of writing/reading from the buffer(could be implemented on top of rbuffer_{read,write} passing 1 as the size).

ZyX-I · 2015-05-15T04:40:37Z

@tarruda Nope. My rb_insert/rb_remove support inserting at/removing from any position.

By the way, what is the best variant for writing tests for it? For “manual” tests I used a standalone C file with a single nvim include (ringbuf.h). Main ideas that come to mind: use this file and base test on io.open and (I guess, better): create a helper C file which will have wrapper for all tested functions (they are static inline) and use regular unit testing infrastructure, but where this is to be located?

oni-link · 2015-05-15T18:04:42Z

src/nvim/lib/rbuffer.h

+    size_t copy_count = MIN(src_size, wcnt);
+    memcpy(wptr, src, copy_count);
+    rbuffer_produced(buf, copy_count);
+    src += copy_count;


Early return with

if (src_size == copy_count) { return size; }

?

tarruda · 2015-05-16T05:02:28Z

By the way, what is the best variant for writing tests for it? For “manual” tests I used a standalone C file with a single nvim include (ringbuf.h). Main ideas that come to mind: use this file and base test on io.open and (I guess, better): create a helper C file which will have wrapper for all tested functions (they are static inline) and use regular unit testing infrastructure, but where this is to be located?

I've added a basic infrastructure for writing helpers that are only compiled in the unit tests. Basically any C file you add to test/unit/helpers will be built into the test library, so you can use it to add wrappers for iinline functions

oni-link · 2015-05-16T08:28:56Z

Using clang-format on the whole file eval.c (:'<,'>pyf ~/git/llvm/tools/clang/tools/clang-format/clang-format.py) gives the following error:

Error detected while processing function provider#python#Call:
line   12:
Channel was closed by the client

No error on master.

oni-link · 2015-05-16T09:52:19Z

src/nvim/os/input.c

-  (void)rbuffer_write(input_buffer, buf, buf_size);
-  rbuffer_consumed(read_buffer, buf_size);
+  RBUFFER_UNTIL_READ_OR_EMPTY(read_buffer, ptr, len) {
+    (void)rbuffer_write(input_buffer, ptr, len);


If input_buffer is not read for a while and is (nearly) full, we could loose some bytes.

This bug is already present(#1813) right? I intend to fix before #2371, which will be done by allocating memory for new input.

jszakmeister · 2015-05-16T11:35:52Z

@tarruda Nope. My rb_insert/rb_remove support inserting at/removing from any position.

I'm just curious @ZyX-I, but that doesn't sound like a typical ring buffer behavior--it seems more list-like. Why do you want this in a ring buffer? Isn't it inefficient too--O(n) instead of O(1)--when not at the "end" (or "beginning" for remove) of the buffer? I'm sure you have your reasons, I'm just curious about the motivations.

oni-link · 2015-05-16T12:04:21Z

src/nvim/os/shell.c

-                                rbuffer_pending(rbuffer), false, eof);
-  rbuffer_consumed(rbuffer, written);
+  RBUFFER_UNTIL_READ_OR_EMPTY(buf, ptr, len) {
+    rbuffer_consumed(buf, write_output(ptr, len, false, eof));


Does not work for the wrap around case:
If in the first half no NL and NUL are found, write_output() returns 0. So in the second (and last) iteration of the loop the first half is used again, but the second half is not used (possible NL here).
If a NUL was found in the first iteration, it would be changed to a NL and the next iteration would
output "something".

Is write_output() working correctly for eof == true, if the two buffer halfs are written not as one?

ZyX-I · 2015-05-16T12:26:59Z

@jszakmeister It is O(n), but these are simple memmoves. The reasoning is that I a) avoid a huge amount of malloc() calls and b) it should happen rather rarely. Another options considered:

If I will use a double linked list I will have to do one additional malloc() per element.
If I will use a simple array then I will have to memmove each time element is added at the end once buffer is filled.

Though one thing looks like being worth trying: double linked list based on a pre-allocated array. This is two malloc() calls per list (list itself + array of pointers to “free” list items) (can be refactored to one malloc() if absolutely needed) and O(1) for inserting/removing at any position. Did not came up with this idea until your question and ring buffer seemed more natural for history list.

ShaDa code needs iteration, insert/append, remove; NeoVim history needs iteration, append, remove, index (take element with given index). So for ShaDa linked list is completely fine (insert and remove positions are determined using iteration, so there is no difference between ring buffer and array here), for NeoVim it is O(n) index, but most of time nobody cares though: index is AFAIK for histget(, -x) only (histget(, +x) needs iteration). Maybe it is worth refactoring in both places.

oni-link · 2015-05-16T12:38:49Z

src/nvim/tui/term_input.inl

-      if (ptr[i] == '\x1b') {
-        break;
+
+    RBUFFER_UNTIL_READ_OR_EMPTY(input->read_buffer, ptr, len) {


Can't we use rbuffer_read_ptr() instead of the RBUFFER_UNTIL_READ_OR_EMPTY macro?
handle_bracket_*() is otherwise not checked after something was consumed.

oni-link · 2015-05-24T16:49:06Z

src/nvim/tui/term_input.inl

+    }
+
+    RBUFFER_UNTIL_EMPTY(input->read_buffer, ptr, len) {
+      size_t consumed = termkey_push_bytes(input->tk, ptr, MIN(count, len));


termkey_push_bytes() could return a value bigger than input->read_buffer->size, so the following call to rbuffer_consumed() would fail.

How can that happen? We are passing MIN(count, len) to termkey_push_bytes which should always be <= input->read_buffer->size

If the termkey buffer gets full it will return (size_t)-1.

size_t termkey_push_bytes(TermKey *tk, const char *bytes, size_t len) { if(tk->buffstart) { memmove(tk->buffer, tk->buffer + tk->buffstart, tk->buffcount); tk->buffstart = 0; } /* Not expecting it ever to be greater but doesn't hurt to handle that */ if(tk->buffcount >= tk->buffsize) { errno = ENOMEM; return (size_t)-1; } if(len > tk->buffsize - tk->buffcount) len = tk->buffsize - tk->buffcount; // memcpy(), not strncpy() in case of null bytes in input memcpy(tk->buffer + tk->buffcount, bytes, len); tk->buffcount += len; return len; }

I'm not sure on how to handle this error, my best guess is to treat as an out of memory condition and exit the program. What do you think?

@leonerd, is this tk->buffcount >= tk->buffsize condition related to libtermkey failing to allocate memory at some point?

Not so much failing, as simply trying to keep a bounded size. I felt it best not to grow the buffer arbitrarily.

Not so much failing, as simply trying to keep a bounded size. I felt it best not to grow the buffer arbitrarily.

ok, but if the caller consumes all parsed keys right after every termkey_push_bytes call, then this condition is not possible right?

Remove helpers.vim_init and simply perform the required initialization in helpers.lua.

The python3 emulation layer doesn't work well enough to run that test. Also add notes to test86/test87 explaining why.

This event loop is just a stub instance used in synchronous libuv function calls, it needs to be decoupled from the main event loop in order to run it from another thread.

- Add `kl_shift_at` macro and backing function. This can be used to shift elements at arbitrary positions. `kl_shift` is now defined on top of the new macro. - Change shift/push API, now `kl_push` accepts an object as parameter and `kl_shift` returns the object instead of a status. An assertion against shifting at the end of a list(or empty lists) was added. - Add `kl_iter` and `kl_iter_at` macros. `kl_iter_at` is for starting the iteration at arbitrary positions.

Libuv will return 0 to signal that the buffer allocated by `alloc_cb` wasn't used, and in this case the read_cb should simply be ignored.

Extract the RBuffer class from rstream.c and reimplement it as a ring buffer, a more efficient version that doesn't need to relocate memory. The old rbuffer_read/rbuffer_write interfaces are kept for simple reading/writing, and the RBUFFER_UNTIL_{FULL,EMPTY} macros are introduced to hide wrapping logic when more control is required(such as passing the buffer pointer to a library function that writes directly to the pointer) Also add a basic infrastructure for writing helper C files that are only compiled in the unit test library, and use this to write unit tests for RBuffer which contains some macros that can't be accessed directly by luajit. Helped-by: oni-link <knil.ino@gmail.com> Reviewed-by: oni-link <knil.ino@gmail.com> Reviewed-by: Scott Prager <splinterofchaos@gmail.com> Reviewed-by: Justin M. Keyes <justinkz@gmail.com> Reviewed-by: Michael Reed <m.reed@mykolab.com>

tarruda force-pushed the data-structure-improvements branch from 34034a6 to 2428ee8 Compare May 14, 2015 07:11

marvim added the RFC label May 14, 2015

tarruda force-pushed the data-structure-improvements branch from 2428ee8 to d30cfd0 Compare May 14, 2015 07:48

tarruda self-assigned this May 14, 2015

tarruda added enhancement feature request refactor changes that are not features or bugfixes labels May 14, 2015

oni-link reviewed May 15, 2015
View reviewed changes

tarruda force-pushed the data-structure-improvements branch 2 times, most recently from 99f40a9 to 7e58a6a Compare May 16, 2015 05:43

oni-link reviewed May 16, 2015
View reviewed changes

tarruda force-pushed the data-structure-improvements branch from fcbea1f to 0da99a5 Compare May 21, 2015 11:42

tarruda changed the title ~~[RDY] Data structure improvements~~ [WIP] Data structure improvements May 21, 2015

marvim added WIP and removed RDY labels May 21, 2015

oni-link reviewed May 24, 2015
View reviewed changes

tarruda force-pushed the data-structure-improvements branch 9 times, most recently from 77d46e6 to 0786ef9 Compare July 1, 2015 08:40

tarruda and others added 6 commits July 1, 2015 05:40

test: Ensure proper initialization in unit/helpers.lua

c8c5af5

Remove helpers.vim_init and simply perform the required initialization in helpers.lua.

test: Don't run legacy test 87

67e45e1

The python3 emulation layer doesn't work well enough to run that test. Also add notes to test86/test87 explaining why.

os/fs: Use module-local uv_loop_t instance

d6ed2b3

This event loop is just a stub instance used in synchronous libuv function calls, it needs to be decoupled from the main event loop in order to run it from another thread.

rstream: Fix bug triggered when libuv doesn't use the allocated buffer

dcaf9c6

Libuv will return 0 to signal that the buffer allocated by `alloc_cb` wasn't used, and in this case the read_cb should simply be ignored.

tarruda force-pushed the data-structure-improvements branch from 0786ef9 to 0ef80b9 Compare July 1, 2015 08:40

tarruda changed the title ~~[WIP] Data structure improvements~~ [RDY] Data structure improvements Jul 1, 2015

marvim added RDY and removed WIP labels Jul 1, 2015

tarruda merged commit 0ef80b9 into neovim:master Jul 1, 2015

jszakmeister removed the RDY label Jul 1, 2015

tarruda added a commit that referenced this pull request Jul 1, 2015

Merge PR #2650 'Data structure improvements'

b656a95

bfredl mentioned this pull request Jul 1, 2015

[RDY] Don't allow changing encoding after startup scripts #2929

Merged

fwalch mentioned this pull request Jul 5, 2015

Mouse wheel scroll inserts text into the command window #2936

Closed

tarruda mentioned this pull request Jul 7, 2015

[RDY] Refactor event loop layer #2980

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RDY] Data structure improvements #2650

[RDY] Data structure improvements #2650

tarruda commented May 14, 2015

ZyX-I commented May 14, 2015

tarruda commented May 14, 2015

tarruda commented May 14, 2015

ZyX-I commented May 14, 2015

tarruda commented May 14, 2015

ZyX-I commented May 14, 2015

tarruda commented May 14, 2015

ZyX-I commented May 14, 2015

tarruda commented May 15, 2015

ZyX-I commented May 15, 2015

oni-link May 15, 2015

tarruda May 15, 2015

tarruda commented May 16, 2015

oni-link commented May 16, 2015

oni-link May 16, 2015

tarruda May 17, 2015

jszakmeister commented May 16, 2015

oni-link May 16, 2015

ZyX-I commented May 16, 2015

oni-link May 16, 2015

oni-link May 24, 2015

tarruda May 24, 2015

oni-link May 24, 2015

tarruda Jun 30, 2015

leonerd Jun 30, 2015

tarruda Jun 30, 2015

[RDY] Data structure improvements #2650

[RDY] Data structure improvements #2650

Conversation

tarruda commented May 14, 2015

ZyX-I commented May 14, 2015

tarruda commented May 14, 2015

tarruda commented May 14, 2015

ZyX-I commented May 14, 2015

tarruda commented May 14, 2015

ZyX-I commented May 14, 2015

tarruda commented May 14, 2015

ZyX-I commented May 14, 2015

tarruda commented May 15, 2015

ZyX-I commented May 15, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tarruda commented May 16, 2015

oni-link commented May 16, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jszakmeister commented May 16, 2015

Choose a reason for hiding this comment

ZyX-I commented May 16, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment