Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RDY] Data structure improvements #2650

Merged
merged 6 commits into from Jul 1, 2015

Conversation

tarruda
Copy link
Member

@tarruda tarruda commented May 14, 2015

Some changes taken from a private branch where I refactor most libuv code to run in a background IO thread(ref #2371). Pushing as a separate PR to simplify reviewing.

@tarruda tarruda force-pushed the data-structure-improvements branch from 34034a6 to 2428ee8 Compare May 14, 2015 07:11
@marvim marvim added the RFC label May 14, 2015
@tarruda tarruda force-pushed the data-structure-improvements branch from 2428ee8 to d30cfd0 Compare May 14, 2015 07:48
@tarruda tarruda self-assigned this May 14, 2015
@tarruda tarruda added enhancement feature request refactor changes that are not features or bugfixes labels May 14, 2015
@ZyX-I
Copy link
Contributor

ZyX-I commented May 14, 2015

About rbuffer: can you consider implementing it on top of ringbuf.h from shada branch? I have a generic (though currently untested) ring buffer implementation there that uses macros and it is currently used for holding an array of structures (so I can’t reuse yours which holds pointers).

Though it looks like I am missing something regarding what is a ring buffer: (size_t)(buf->end_ptr - buf->start_ptr) cannot possibly be ring buffer size because it is OK for ring buffer to have start after the end. E.g. from wiki:

This image shows a full buffer with two elements having been overwritten:

(end - start) here will return -1.

@tarruda
Copy link
Member Author

tarruda commented May 14, 2015

About rbuffer: can you consider implementing it on top of ringbuf.h from shada branch? I have a generic (though currently untested) ring buffer implementation there that uses macros and it is currently used for holding an array of structures (so I can’t reuse yours which holds pointers).

I'm gonna give it a shot

@tarruda
Copy link
Member Author

tarruda commented May 14, 2015

Though it looks like I am missing something regarding what is a ring buffer: (size_t)(buf->end_ptr - buf->start_ptr) cannot possibly be ring buffer size because it is OK for ring buffer to have start after the end. E.g. from wiki:

In this implementation, end_ptr is a pointer to the end of the array. The last element is represented by write_ptr("end" in your diagram)

@ZyX-I
Copy link
Contributor

ZyX-I commented May 14, 2015

@tarruda I have just pushed some fixes to my implementation. Manual test run OK after that.

@tarruda
Copy link
Member Author

tarruda commented May 14, 2015

@ZyX-I I'd like to keep the possibility of passing the ring buffer pointer to functions that write or read directly to memory locations, which is why I have implemented the RBUFFER_WHILE_NOT_{FULL,EMPTY} macros.

If the library I'm interfacing with exposes it's own internal buffer, I can just use rbuffer_read/rbuffer_write which is the case for msgpack(see channel.c/parse_msgpack)

But if the library expects a pointer(libvterm/vterm_input_write for example), I'd have to copy the memory to a temporary buffer and then pass it to the library which is a bit inefficient. With terminal_receive/vterm_input_write I usee RBUFFER_WHILE_NOT_EMPTY, which will feed the data by passing the pointer directly(possibly in 2 passes since it may be necessary to wrap around the buffer).

Could you implement something similar to RBUFFER_WHILE_NOT_{FULL,EMPTY} on your generic ring buffer?

@ZyX-I
Copy link
Contributor

ZyX-I commented May 14, 2015

@tarruda I guess here are the differences (yours → mine):

rbuffer_capacity() → None (using rb->buf_end - rb->buf directly when needed).
rbuffer_space() → None
rb->length → rb_length()
rbuffer_free() → rb_dealloc()
rbuffer_write_ptr(), rbuffer_read_ptr() → None (not generic)
rbuffer_produced() → None (I assume that pushing to ringbuffer is done only via rb_push or rb_insert)
rbuffer_consumed() → None (it is not assumed that reading consumes ring buffer)
rbuffer_read() → None
rbuffer_write() → a sequence of rb_push() calls
None → rb_free() (simple characters need not be freed)
None → rb_insert() (you do not need to insert into the middle of the ring buffer)
None → RINGBUF_ITER_BACK (you do not need to process ring buffer from the end to the start)
None → rb_remove() (you do not need to remove random elements)

Very different ring buffer implementations for different purposes.

@tarruda
Copy link
Member Author

tarruda commented May 14, 2015

Just found an interesting ring buffer implementation: https://github.com/michaeltyson/TPCircularBuffer

This ones uses a virtual memory technique that avoids the need for wrapping logic, I wonder if we can do something similar without introducing compatibility issues?

@ZyX-I
Copy link
Contributor

ZyX-I commented May 14, 2015

With that big amount of differences it will be easier to keep your buffer implementation, but rename it to “string ring buffer” (because mine is generic). I can add callbacks, but your variant is better suited for your purposes.

@tarruda
Copy link
Member Author

tarruda commented May 15, 2015

With that big amount of differences it will be easier to keep your buffer implementation, but rename it to “string ring buffer” (because mine is generic). I can add callbacks, but your variant is better suited for your purposes.

👍

But later I might spend some time refactoring both into a single generic data structure. If you think about it, rb_insert/rb_remove are simply special cases of writing/reading from the buffer(could be implemented on top of rbuffer_{read,write} passing 1 as the size).

@ZyX-I
Copy link
Contributor

ZyX-I commented May 15, 2015

@tarruda Nope. My rb_insert/rb_remove support inserting at/removing from any position.

By the way, what is the best variant for writing tests for it? For “manual” tests I used a standalone C file with a single nvim include (ringbuf.h). Main ideas that come to mind: use this file and base test on io.open and (I guess, better): create a helper C file which will have wrapper for all tested functions (they are static inline) and use regular unit testing infrastructure, but where this is to be located?

size_t copy_count = MIN(src_size, wcnt);
memcpy(wptr, src, copy_count);
rbuffer_produced(buf, copy_count);
src += copy_count;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Early return with

if (src_size == copy_count) {
   return size;
}

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@tarruda
Copy link
Member Author

tarruda commented May 16, 2015

By the way, what is the best variant for writing tests for it? For “manual” tests I used a standalone C file with a single nvim include (ringbuf.h). Main ideas that come to mind: use this file and base test on io.open and (I guess, better): create a helper C file which will have wrapper for all tested functions (they are static inline) and use regular unit testing infrastructure, but where this is to be located?

I've added a basic infrastructure for writing helpers that are only compiled in the unit tests. Basically any C file you add to test/unit/helpers will be built into the test library, so you can use it to add wrappers for iinline functions

@tarruda tarruda force-pushed the data-structure-improvements branch 2 times, most recently from 99f40a9 to 7e58a6a Compare May 16, 2015 05:43
@oni-link
Copy link
Contributor

Using clang-format on the whole file eval.c (:'<,'>pyf ~/git/llvm/tools/clang/tools/clang-format/clang-format.py) gives the following error:

Error detected while processing function provider#python#Call:
line   12:
Channel was closed by the client

No error on master.

(void)rbuffer_write(input_buffer, buf, buf_size);
rbuffer_consumed(read_buffer, buf_size);
RBUFFER_UNTIL_READ_OR_EMPTY(read_buffer, ptr, len) {
(void)rbuffer_write(input_buffer, ptr, len);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If input_buffer is not read for a while and is (nearly) full, we could loose some bytes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bug is already present(#1813) right? I intend to fix before #2371, which will be done by allocating memory for new input.

@jszakmeister
Copy link
Contributor

@tarruda Nope. My rb_insert/rb_remove support inserting at/removing from any position.

I'm just curious @ZyX-I, but that doesn't sound like a typical ring buffer behavior--it seems more list-like. Why do you want this in a ring buffer? Isn't it inefficient too--O(n) instead of O(1)--when not at the "end" (or "beginning" for remove) of the buffer? I'm sure you have your reasons, I'm just curious about the motivations.

rbuffer_pending(rbuffer), false, eof);
rbuffer_consumed(rbuffer, written);
RBUFFER_UNTIL_READ_OR_EMPTY(buf, ptr, len) {
rbuffer_consumed(buf, write_output(ptr, len, false, eof));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does not work for the wrap around case:
If in the first half no NL and NUL are found, write_output() returns 0. So in the second (and last) iteration of the loop the first half is used again, but the second half is not used (possible NL here).
If a NUL was found in the first iteration, it would be changed to a NL and the next iteration would
output "something".

Is write_output() working correctly for eof == true, if the two buffer halfs are written not as one?

@ZyX-I
Copy link
Contributor

ZyX-I commented May 16, 2015

@jszakmeister It is O(n), but these are simple memmoves. The reasoning is that I a) avoid a huge amount of malloc() calls and b) it should happen rather rarely. Another options considered:

  • If I will use a double linked list I will have to do one additional malloc() per element.
  • If I will use a simple array then I will have to memmove each time element is added at the end once buffer is filled.

Though one thing looks like being worth trying: double linked list based on a pre-allocated array. This is two malloc() calls per list (list itself + array of pointers to “free” list items) (can be refactored to one malloc() if absolutely needed) and O(1) for inserting/removing at any position. Did not came up with this idea until your question and ring buffer seemed more natural for history list.

ShaDa code needs iteration, insert/append, remove; NeoVim history needs iteration, append, remove, index (take element with given index). So for ShaDa linked list is completely fine (insert and remove positions are determined using iteration, so there is no difference between ring buffer and array here), for NeoVim it is O(n) index, but most of time nobody cares though: index is AFAIK for histget(, -x) only (histget(, +x) needs iteration). Maybe it is worth refactoring in both places.

if (ptr[i] == '\x1b') {
break;

RBUFFER_UNTIL_READ_OR_EMPTY(input->read_buffer, ptr, len) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we use rbuffer_read_ptr() instead of the RBUFFER_UNTIL_READ_OR_EMPTY macro?
handle_bracket_*() is otherwise not checked after something was consumed.

@tarruda tarruda force-pushed the data-structure-improvements branch from fcbea1f to 0da99a5 Compare May 21, 2015 11:42
@tarruda tarruda changed the title [RDY] Data structure improvements [WIP] Data structure improvements May 21, 2015
@marvim marvim added WIP and removed RDY labels May 21, 2015
}

RBUFFER_UNTIL_EMPTY(input->read_buffer, ptr, len) {
size_t consumed = termkey_push_bytes(input->tk, ptr, MIN(count, len));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

termkey_push_bytes() could return a value bigger than input->read_buffer->size, so the following call to rbuffer_consumed() would fail.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can that happen? We are passing MIN(count, len) to termkey_push_bytes which should always be <= input->read_buffer->size

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the termkey buffer gets full it will return (size_t)-1.

size_t termkey_push_bytes(TermKey *tk, const char *bytes, size_t len)
{
  if(tk->buffstart) {
    memmove(tk->buffer, tk->buffer + tk->buffstart, tk->buffcount);
    tk->buffstart = 0;
  }

  /* Not expecting it ever to be greater but doesn't hurt to handle that */
  if(tk->buffcount >= tk->buffsize) {
    errno = ENOMEM;
    return (size_t)-1;
  }

  if(len > tk->buffsize - tk->buffcount)
    len = tk->buffsize - tk->buffcount;

  // memcpy(), not strncpy() in case of null bytes in input
  memcpy(tk->buffer + tk->buffcount, bytes, len);
  tk->buffcount += len;

  return len;
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure on how to handle this error, my best guess is to treat as an out of memory condition and exit the program. What do you think?

@leonerd, is this tk->buffcount >= tk->buffsize condition related to libtermkey failing to allocate memory at some point?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not so much failing, as simply trying to keep a bounded size. I felt it best not to grow the buffer arbitrarily.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not so much failing, as simply trying to keep a bounded size. I felt it best not to grow the buffer arbitrarily.

ok, but if the caller consumes all parsed keys right after every termkey_push_bytes call, then this condition is not possible right?

@tarruda tarruda force-pushed the data-structure-improvements branch 9 times, most recently from 77d46e6 to 0786ef9 Compare July 1, 2015 08:40
tarruda and others added 6 commits July 1, 2015 05:40
Remove helpers.vim_init and simply perform the required initialization in
helpers.lua.
The python3 emulation layer doesn't work well enough to run that test. Also add
notes to test86/test87 explaining why.
This event loop is just a stub instance used in synchronous libuv function
calls, it needs to be decoupled from the main event loop in order to run it from
another thread.
- Add `kl_shift_at` macro and backing function. This can be used to shift
  elements at arbitrary positions. `kl_shift` is now defined on top of the new
  macro.
- Change shift/push API, now `kl_push` accepts an object as parameter and
  `kl_shift` returns the object instead of a status. An assertion against
  shifting at the end of a list(or empty lists) was added.
- Add `kl_iter` and `kl_iter_at` macros. `kl_iter_at` is for starting the
  iteration at arbitrary positions.
Libuv will return 0 to signal that the buffer allocated by `alloc_cb` wasn't
used, and in this case the read_cb should simply be ignored.
Extract the RBuffer class from rstream.c and reimplement it as a ring buffer,
a more efficient version that doesn't need to relocate memory.

The old rbuffer_read/rbuffer_write interfaces are kept for simple
reading/writing, and the RBUFFER_UNTIL_{FULL,EMPTY} macros are introduced to
hide wrapping logic when more control is required(such as passing the buffer
pointer to a library function that writes directly to the pointer)

Also add a basic infrastructure for writing helper C files that are only
compiled in the unit test library, and use this to write unit tests for RBuffer
which contains some macros that can't be accessed directly by luajit.

Helped-by: oni-link <knil.ino@gmail.com>
Reviewed-by: oni-link <knil.ino@gmail.com>
Reviewed-by: Scott Prager <splinterofchaos@gmail.com>
Reviewed-by: Justin M. Keyes <justinkz@gmail.com>
Reviewed-by: Michael Reed <m.reed@mykolab.com>
@tarruda tarruda force-pushed the data-structure-improvements branch from 0786ef9 to 0ef80b9 Compare July 1, 2015 08:40
@tarruda tarruda changed the title [WIP] Data structure improvements [RDY] Data structure improvements Jul 1, 2015
@marvim marvim added RDY and removed WIP labels Jul 1, 2015
@tarruda tarruda merged commit 0ef80b9 into neovim:master Jul 1, 2015
@jszakmeister jszakmeister removed the RDY label Jul 1, 2015
tarruda added a commit that referenced this pull request Jul 1, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement feature request refactor changes that are not features or bugfixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants