[RFC] Extended Marks #5031

Open
wants to merge 103 commits into
from

Projects

None yet

7 participants

@timeyyy
timeyyy commented Jul 10, 2016 edited

These are marks designed for developers who want to build plugins or embed neovim. (they follow all column and line changes)
issue #4816

  • line insert
  • line delete
  • line join
  • line break
  • char inserts (tag #gdbgood, test failing but works in gdb..)
  • blockwise char inserts
  • char delete
  • delete with x
  • blockwise char deletes
  • char paste
  • blockwise char paste
  • replace
  • blockwise replace
  • undo of moved marks (undo of deleted marks is deferred)
  • redo
  • :move
  • insert tab
  • shiftwidth (tag #check, some tests passing but not working within gdb)
  • merge undo info where possible

In app visual testing can be done with

" Extended marks

highlight extmark ctermbg=Blue guibg=Blue
call nvim_init_mark_ns('myplugin')
function! Testextmark(timer_id)
    let a:ns = 1
    let a:all_marks = nvim_buf_get_mark_ids(0, a:ns)

    call clearmatches()

    for mark in a:all_marks
        let a:bytes = col([mark[1], mark[2]])
        let a:ma = matchaddpos('extmark', [[mark[1], mark[2]]])
    endfor
    call timer_start(50, 'Testextmark')
endfunction

nnoremap <leader>tt :call timer_start(1, 'Testextmark')<cr>
@ZyX-I ZyX-I commented on an outdated diff Jul 10, 2016
src/nvim/eval.c
@@ -12334,6 +12336,39 @@ static void f_mapcheck(typval_T *argvars, typval_T *rettv)
get_maparg(argvars, rettv, FALSE);
}
+static void f_arbmark_set(typval_T *argvars, typval_T *rettv)
+{
+ int fnum = 0;
+ buf_T *buf;
+ if (argvars[0].v_type != VAR_STRING){
@ZyX-I
ZyX-I Jul 10, 2016 Contributor

You should not check VAR_STRING, use one of get_tv_string* functions.

@ZyX-I ZyX-I commented on an outdated diff Jul 10, 2016
src/nvim/eval.c
@@ -12334,6 +12336,39 @@ static void f_mapcheck(typval_T *argvars, typval_T *rettv)
get_maparg(argvars, rettv, FALSE);
}
+static void f_arbmark_set(typval_T *argvars, typval_T *rettv)
+{
+ int fnum = 0;
+ buf_T *buf;
+ if (argvars[0].v_type != VAR_STRING){
+ EMSG(_("mark name must be a string")); return;
+ }
+ if (STRLEN(argvars[0].vval.v_string) > ARBMARK_MAXLEN){
+ EMSG(_("mark name is to large")); return;
+ }
+ if (argvars[1].v_type != VAR_NUMBER){
@ZyX-I
ZyX-I Jul 10, 2016 Contributor

get_tv_number*

@ZyX-I ZyX-I and 2 others commented on an outdated diff Jul 10, 2016
src/nvim/editor/arbmark.c
+#include "nvim/lib/kvec.h"
+#include "nvim/editor/arbmark.h"
+
+# define FOR_ALL_ARBMARKS(buf) \
+ kbitr_t = itr; \
+ arbmark_T = *arbmark; \
+ for (;kb_itr_valid(&itr) \
+ ;kb_itr_next(arbmark_T, buf->b_arbmarks_tree; &itr)){
+
+static arbmark_T *_find_pos(pos_t pos, bool FORWARD);
+static int _arbmark_create(buf_T *buf, cstr_t *name, pos_T *pos);
+static int _arbmark_update(buf_T *buf, arbmark_T *arbmark, pos_T *pos);
+static int _arbmark_delete(buf_T *buf, cstr_t *name);
+static arbmark_T *get_arbmark(buf_T *buf, cstr_t *name);
+static int pos_lt(pos_T *pos, pos_T *pos2);
+static int pos_eq(pos_T *pos, pos_T *pos2);
@ZyX-I
ZyX-I Jul 10, 2016 Contributor

Everything here should be generated. Just if you add a new directory you need to add it to the directory list in src/nvim/CMakeLists.txt.

@justinmk
justinmk Jul 10, 2016 Member

@timeyyy IOW you can just delete these lines.

@ZyX-I
ZyX-I Jul 10, 2016 Contributor

@justinmk He is missing #include with generated functions. And he is missing it because until editor/ subdirectory is added to the list nothing is going to be generated.

@timeyyy
timeyyy Jul 11, 2016

@justinmk the define should stay? do i need any includes in the c file? what does the auto generation.

@timeyyy
timeyyy Jul 11, 2016 edited

after deleting all these declarations i get implicit declaration of function errors. Am i suppose to silence these or does this mean generation some how failed?

@ZyX-I
ZyX-I Jul 11, 2016 edited Contributor

@timeyyy Check other files, they have include guarded by INCLUDE_GENERATED_DECLARATIONS, usually right after typedef’s. But as I said, as long as you have this in new editor/ subdirectory src/nvim/CMakeLists.txt needs to be edited for declarations generator to generate anything. This also applies to *.h file, and it also should not have function declarations.

@justinmk
justinmk Jul 11, 2016 Member

@timeyyy To be clear, CMakeLists.txt does not need to be edited if you move the file(s) out of editor/...

@ZyX-I ZyX-I commented on an outdated diff Jul 10, 2016
src/nvim/editor/arbmark.c
+static int pos_eq(pos_T *pos, pos_T *pos2);
+
+/* Create or update an arbmark, */
+int arbmark_set(buf_T *buf, char_u *name, pos_T *pos)
+{
+ arbmark_T *arbmark = get_arbmark(buf, name);
+ if (!arbmark){
+ return _arbmark_create(buf, name, pos);
+ }
+ else {
+ return _arbmark_update(buf, arbmark, name, pos);
+ }
+}
+
+/* Will fail silently on missing name */
+int arbmark_unset(buf_T *buf, cstr_t *name){
@ZyX-I
ZyX-I Jul 10, 2016 Contributor

{ should be on its own line. Everywhere { should be preceded by either space or newline.

@justinmk justinmk and 3 others commented on an outdated diff Jul 10, 2016
src/nvim/editor/arbmark.c
@@ -0,0 +1,156 @@
+/*
+ * Same function but names are not limited to one char
+ * There is also no ex_command, just viml functions
+ */
+
+#include "nvim/vim.h"
+#include "nvim/globals.h"
+#include "nvim/mark.h"
+#include "nvim/memory.h"
+#include "nvim/map.h"
+#include "nvim/lib/kvec.h"
+#include "nvim/editor/arbmark.h"
@justinmk
justinmk Jul 10, 2016 edited Member

I would suggest not introducing editor/ namespace--it is redundant. We have api/ to separate non-"editor" functionality, all top-level modules are implicitly part of the core editor.

Also consider renaming the file mark_extended.{h,c} so that it sorts next to mark.{h,c}.

@ZyX-I
ZyX-I Jul 10, 2016 edited Contributor

@justinmk I also do not like name arbmark.c. But with mark_extended what should be the function prefix, mark_extended itself is too long? I can suggest extmark.

@justinmk
justinmk Jul 10, 2016 Member

extmark is fine with me for internal identifiers.

@justinmk
justinmk Jul 10, 2016 Member

Also I am wondering if this feature should be API-only (and exposed to VimL as nvim_extmark or whatever). We should perhaps make that our default choice.

@timeyyy
timeyyy Jul 11, 2016

Do i need to do something to make it API only?

@ZyX-I
ZyX-I Jul 11, 2016 Contributor

@timeyyy “API only” means that you need public interface in src/nvim/api/(I guess)buffer.c, and this should be the only public interface: no Ex commands, no VimL functions. Tests in test/functional/api.

Note: public interface. Implementation in mark_extended.c.

@bfredl
bfredl Jul 11, 2016 Member

It might be worth pointing out here that API functions will soon be accessible from VimL, though with different naming and type-checking conventions than native VimL builtins. The benefit is that the generated wrappers will make (most of) the type-checking and conversion for you.

@ZyX-I ZyX-I commented on an outdated diff Jul 10, 2016
src/nvim/editor/arbmark.c
+#include "nvim/mark.h"
+#include "nvim/memory.h"
+#include "nvim/map.h"
+#include "nvim/lib/kvec.h"
+#include "nvim/editor/arbmark.h"
+
+# define FOR_ALL_ARBMARKS(buf) \
+ kbitr_t = itr; \
+ arbmark_T = *arbmark; \
+ for (;kb_itr_valid(&itr) \
+ ;kb_itr_next(arbmark_T, buf->b_arbmarks_tree; &itr)){
+
+static arbmark_T *_find_pos(pos_t pos, bool FORWARD);
+static int _arbmark_create(buf_T *buf, cstr_t *name, pos_T *pos);
+static int _arbmark_update(buf_T *buf, arbmark_T *arbmark, pos_T *pos);
+static int _arbmark_delete(buf_T *buf, cstr_t *name);
@ZyX-I
ZyX-I Jul 10, 2016 Contributor

Starting static functions with underscore is a useless action, making them static is enough.

@marvim marvim added the WIP label Jul 10, 2016
@ZyX-I ZyX-I and 1 other commented on an outdated diff Jul 10, 2016
test/functional/api/arbmark_spec.lua
@@ -0,0 +1,18 @@
+-- Sanity checks for buffer_* API calls via msgpack-rpc
@ZyX-I
ZyX-I Jul 10, 2016 Contributor

api/ is for RPC calls or events, this should probably be directly in functional/. Also new tests are normally written with one variable per line, not local clear, nvim, buffer = …. And I would suggest to never use nvim, curbuf, curwin and avoid eval: nvim('meth', …) is meths.meth(…), same for curbuf and curwin (curbufmeths, curwinmeths), nvim('eval', 'arbmark_index(haha)') is funcs.arbmark_index('haha') (and note that you are trying to refer to variable haha in your code, I guess you actually wanted string 'haha').

@justinmk
justinmk Jul 10, 2016 Member

Tests shouldn't live in functional/ top level, how about functional/viml.

(dict_notifications_spec.lua should be moved to functional/viml ...)

@ZyX-I
ZyX-I Jul 10, 2016 Contributor

I have no idea what is supposed to be placed into functional/viml. dict_notifications_spec should be probably in eval. viml/function_spec should be in eval (and I have no idea what api_info is doing in this file, I would place it separately because function_spec is too generic; this is fine if you are testing generic function-related features like MAX_FUNC_ARGS handling, but this is no place for testing specific functions), as well as viml/errorlist_spec: other functions are tested there. Not sure about viml/lang_spec and viml/completion_spec: first may be in eval, second is such a feature that can have its own directory.

@justinmk
justinmk Jul 11, 2016 Member

Moving viml/* to eval/ is fine with me. I think they were created by coincidence long ago.

@ZyX-I ZyX-I commented on an outdated diff Jul 10, 2016
src/nvim/editor/arbmark.c
+ * There is also no ex_command, just viml functions
+ */
+
+#include "nvim/vim.h"
+#include "nvim/globals.h"
+#include "nvim/mark.h"
+#include "nvim/memory.h"
+#include "nvim/map.h"
+#include "nvim/lib/kvec.h"
+#include "nvim/editor/arbmark.h"
+
+# define FOR_ALL_ARBMARKS(buf) \
+ kbitr_t = itr; \
+ arbmark_T = *arbmark; \
+ for (;kb_itr_valid(&itr) \
+ ;kb_itr_next(arbmark_T, buf->b_arbmarks_tree; &itr)){
@ZyX-I
ZyX-I Jul 10, 2016 Contributor

; &itr? Code looks like you meant comma there.

@mhinz mhinz referenced this pull request Jul 10, 2016
Open

extended marks #4816

@ZyX-I ZyX-I commented on an outdated diff Jul 11, 2016
src/nvim/editor/arbmark.c
+# define FOR_ALL_ARBMARKS(buf) \
+ kbitr_t = itr; \
+ arbmark_T = *arbmark; \
+ for (;kb_itr_valid(&itr) \
+ ;kb_itr_next(arbmark_T, buf->b_arbmarks_tree; &itr)){
+
+static arbmark_T *_find_pos(pos_t pos, bool FORWARD);
+static int _arbmark_create(buf_T *buf, cstr_t *name, pos_T *pos);
+static int _arbmark_update(buf_T *buf, arbmark_T *arbmark, pos_T *pos);
+static int _arbmark_delete(buf_T *buf, cstr_t *name);
+static arbmark_T *get_arbmark(buf_T *buf, cstr_t *name);
+static int pos_lt(pos_T *pos, pos_T *pos2);
+static int pos_eq(pos_T *pos, pos_T *pos2);
+
+/* Create or update an arbmark, */
+int arbmark_set(buf_T *buf, char_u *name, pos_T *pos)
@ZyX-I
ZyX-I Jul 11, 2016 Contributor

BTW, this ought to be char *. char_u * is deprecated, for strings you should use char *, for sequence of numbers uint8_t * (though mostly need to cast individual characters to uint8_t when appropriate). And, I guess, this should be const char *.

@ZyX-I ZyX-I commented on an outdated diff Jul 11, 2016
src/nvim/editor/arbmark.h
@@ -0,0 +1,27 @@
+#ifndef NVIM_ARBMARK_H
+#define NVIM_ARBMARK_H
+
+/* #include "nvim/vim.h" */
+/* #include "nvim/lib/kvec.h" */
+
+#define ARBMARK_MAXLEN 24
+#define pos_cmp(a, b) (_pos_cmp((a).fmark.mark, (b).fmark.mark))
+
+typedef struct {
+ fmark_T fmark;
+ fmark_T *next;
+ fmark_T *prev;
+} arbmark_T;
@ZyX-I
ZyX-I Jul 11, 2016 Contributor

CamelCase, _T suffix is legacy. Given the rename I guess this should be ExtendedMark.

@timeyyy timeyyy changed the title from [WIP] arbitrary marks, to [WIP] Extended Marks Jul 11, 2016
@bfredl bfredl commented on an outdated diff Jul 11, 2016
src/nvim/api/mark_extended.c
+static int extmark_update(ExtendedMark *extmark, pos_T *pos)
+{
+ extmark->mark.lnum = pos->lnum;
+ extmark->mark.col = pos->col;
+ return OK;
+}
+
+static int extmark_delete(buf_T *buf, cstr_t *name)
+{
+ pmap_del(ptr_t)(buf->b_extmarks, name);
+ return OK;
+}
+
+// TODO start with current buffer
+// TODO use a lru cache,
+buf_T *extmark_buf_from_fnum(int fnum)
@bfredl
bfredl Jul 11, 2016 Member

after #4934 find_buffer_by_handle can be used for this

@ZyX-I ZyX-I and 2 others commented on an outdated diff Jul 11, 2016
src/nvim/api/mark_extended.c
+#include "nvim/memory.h"
+#include "nvim/map.h" // pmap ...
+#include "nvim/lib/kbtree.h" // kbitr ...
+#include "nvim/api/mark_extended.h"
+
+#ifdef INCLUDE_GENERATED_DECLARATIONS
+# include "api/extmark.c.generated.h"
+#endif
+
+#define FOR_ALL_EXTMARKS(buf) \
+ kbitr_t itr; \
+ ExtendedMark *extmark; \
+ for (;kb_itr_valid(&itr); kb_itr_next(ExtendedMark, buf->b_extmarks_tree, &itr)){
+
+/* Create or update an extmark, */
+int extmark_set(buf_T *buf, cstr_t *name, pos_T *pos)
@ZyX-I
ZyX-I Jul 11, 2016 Contributor

This is rather strange to see cstr_t anywhere, but where it is absolutely required because char * cannot be part of the function/etc name. It should be char ** if you really meant cstr_t * (cstr_t is an alias to char *, not char), just char * otherwise.

@timeyyy
timeyyy Jul 11, 2016

this name is going to be used as the key for the hash table, would you prefer i cast it to cstr just before using it as a key? Is it prefered that the function accept char * or cstr_t ?

@ZyX-I
ZyX-I Jul 11, 2016 Contributor

You don’t need to cast char * to cstr_t. Function should accept char *, cstr_t exists purely for technical reasons.

@ZyX-I
ZyX-I Jul 11, 2016 Contributor

I actually have no idea why *Map interface exists, khash.h is already fine and it does not require you to query hash twice to do “if KEY is in hash return VAL, otherwise put (KEY, VAL) to hash” . Also one map_new is two small mallocs alone, while my small changes to khash.h allowed to put hash on stack (or e.g. directly as a part of some structure, not as a pointer).

My code never uses map.h.

@bfredl
bfredl Jul 11, 2016 Member

“if KEY is in hash return VAL, otherwise put (KEY, VAL) to hash”

map_ref(map, key, true) supports this in most cases. (more precisely: it won't work when one wants to distinguish a nonexisting key from the initializer value, but one could easily add a bool return value to signal this if necessary)

@ZyX-I
ZyX-I Jul 11, 2016 Contributor

@bfredl Last part (nonexisting vs initializer) is exactly what I mean. Also this does not remove issue with double allocations vs no allocations and useless indirection (existing kh API is already fine, also allocated struct containing a single pointer value which is also allocated?!).

@ZyX-I
ZyX-I Jul 11, 2016 Contributor

I also have an impression that what I already said was not the only issue with map interface, but cannot remember anything else. Perhaps one of the issues was that public API is too complex: when you declare a new hash using KHASH_… you get nice names like kh_init_foo. When you declare a new hash using map interface you need to use map_new(Type1, Type2). Worse, you need to typedef Type1 and Type2 in some cases I would not typedef and this is absolutely not needed when using khash.

@bfredl
bfredl Jul 11, 2016 Member

We could refactor the Map wrapper functions (I think some of them are common enough to justify a wrapper, like putting a value in a single function (or macro) call) to operate directly on the khash_t types, not on the superfluous wrapper types.

@ZyX-I
ZyX-I Jul 11, 2016 Contributor

@bfredl There is no need in wrapper functions, just use khash directly. When needed khash.h can be adjusted, ability to work with values allocated on stack was my addition when I asked myself why should I need to allocate a hash struct itself.

Though there is additionally one thing I do not understand in khash.h: why keys/flags/vals and not one struct{key, flag, val} items? My best guess is that this has something to do with CPU cache, three allocations are definitely not going to be faster then one.

@bfredl
bfredl Jul 12, 2016 Member

I did't say wrappers must be used - just that some operations are common enough to justify it, ie a wrapper kh_putval(type, somestruct->map, key, val) instead of kh_val(somestruct->map, kh_put(type, somestruct->map, key, &ret) = val. This can just be inline functions or macros, no need for a separate compilation unit map.c.

My best guess is that this has something to do with CPU cache, three allocations are definitely not going to be faster then one.

The vectors are going to be used many times for each (re)allocation (though in principle a struct-of-arrays could also be allocated using a single alloc just fine, if one just places the most aligned type first) and as flags and keys will be accessed a lot more than values in lookup it is often beneficial to keep them close together. (though an another option is to keep the keys and values array compact, and have the sparse hash table just have indicies to them, like pypy:s implementation, though the approaches could also be combined )

@timeyyy
timeyyy commented Jul 11, 2016

having the file name mark_extended but everything else extmark and the struct ExtendedMark, is a bit annoying, because it messes up my fuzzy finding habits. maybe markext for internal references?

@justinmk
Member

markext is fine with me. But as a guideline, tooling quirks should not drive decisions about the codebase.

@timeyyy
timeyyy commented Jul 12, 2016

in buffer_defs.h the macro KBTREE_INIT line gives me a whole bunch of warnings. How can i fix/block/stop this.

../src/nvim/buffer_defs.h: In function ‘kb_init_cstr_t’:
../src/nvim/lib/kbtree.h:66:17: warning: conversion to ‘unsigned int’ from ‘int’ may change the sign of the result [-Wsign-conversion]
   b->t = ((size - 4 - sizeof(void*)) / (sizeof(void*) + sizeof(key_t)) + 1) >> 1; \
                 ^
../src/nvim/lib/kbtree.h:366:2: note: in expansion of macro ‘__KB_INIT’
  __KB_INIT(name, key_t)      \
  ^
../src/nvim/buffer_defs.h:122:1: note: in expansion of macro ‘KBTREE_INIT’
 KBTREE_INIT(cstr_t, ExtendedMark, pos_cmp);
 ^
../src/nvim/lib/kbtree.h:66:77: warning: conversion to ‘int’ from ‘unsigned int’ may change the sign of the result [-Wsign-conversion]
   b->t = ((size - 4 - sizeof(void*)) / (sizeof(void*) + sizeof(key_t)) + 1) >> 1; \
                                                                             ^
@ZyX-I
Contributor
ZyX-I commented Jul 12, 2016

@timeyyy All such issues are resolved by casting, you need to cast something to the type of b->t.

@ZyX-I
Contributor
ZyX-I commented Jul 12, 2016 edited

@timeyyy You should use -Wconversion -Werror, this way your simple example will not work. And you in any case need to modify it, it is not passing lint checks.

@ZyX-I
Contributor
ZyX-I commented Jul 12, 2016

@timeyyy -Wconversion forces you to write casts in code which compiler would normally guess by itself. It is enabled because compiler sometimes guesses wrong.

@ZyX-I ZyX-I commented on the diff Jul 12, 2016
src/nvim/buffer_defs.h
@@ -33,6 +33,7 @@ typedef struct file_buffer buf_T; // Forward declaration
#define MODIFIABLE(buf) (!buf->terminal && buf->b_p_ma)
+
@ZyX-I
ZyX-I Jul 12, 2016 Contributor

Too many blank lines.

@ZyX-I ZyX-I commented on an outdated diff Jul 12, 2016
src/nvim/mark.h
@@ -8,6 +8,7 @@
#include "nvim/memory.h"
#include "nvim/pos.h"
#include "nvim/os/time.h"
+/* #include "nvim/ex_cmds_defs.h" // for exarg_T */
@ZyX-I
ZyX-I Jul 12, 2016 Contributor

?

@ZyX-I ZyX-I commented on an outdated diff Jul 12, 2016
src/nvim/mark_extended.c
+ buf_T *buf;
+ FOR_ALL_BUFFERS(buf){
+ if (fnum == buf->fnum){
+ return buf;
+ }
+ }
+ return buf;
+}
+
+/* returns an extmark from it's buffer*/
+static ExtendedMark *get_extmark(buf_T *buf, char *name)
+{
+ return pmap_get(ExtendedMark)(buf->b_extmarks, name);
+}
+
+int _pos_cmp(pos_T a, pos_T b)
@ZyX-I
ZyX-I Jul 12, 2016 Contributor

This is FUNC_ATTR_CONST FUNC_ATTR_WARN_UNUSED_RESULT.

@ZyX-I ZyX-I commented on an outdated diff Jul 12, 2016
src/nvim/mark_extended.c
+}
+
+int _pos_cmp(pos_T a, pos_T b)
+{
+ if (pos_lt(a, b) == OK){
+ return -1;
+ }
+ else if (pos_eq(a, b) == OK){
+ return 0;
+ }
+ else {
+ return 1;
+ }
+}
+
+static int pos_lt(pos_T *pos, pos_T *pos2){
@ZyX-I
ZyX-I Jul 12, 2016 Contributor

And this should be bool and return true/false, not int with OK/FAIL. OK/FAIL is for statuses, it is not a failure if e.g. two positions are equal.

@ZyX-I
ZyX-I Jul 12, 2016 edited Contributor

Also I see no reason to have this and pos_eq as a separate functions since they are not used anywhere, pos_cmp is enough.

@timeyyy
timeyyy commented Jul 12, 2016

after changing all sizeof to (int)sizeof, i'm getting this one, not sure how to handle it

                b->root = (kbnode_t*)calloc(1, b->ilen);                                                \
                                     ~~~~~~    ~~~^~~~
test.c:10:1: error: operand of ? changes signedness: 'int' to 'unsigned int' [-Werror,-Wsign-conversion]
@ZyX-I
Contributor
ZyX-I commented Jul 12, 2016

@timeyyy You should really use clang, it reports correct types. The unsigned int in question is size_t and clang would report “warning: implicit conversion changes signedness: 'int' to 'size_t' (aka 'unsigned long') [-Wsign-conversion]” (unsigned int vs unsigned long depends on your system, with -m32 (compiling for 32-bit systems) it would be “(aka 'unsigned int')”). Also code should not use calloc or malloc, it should use xcalloc and xmalloc (do not forget to include nvim/memory.h): normally Neovim is using jemalloc in place of libc malloc which is faster. Casting [x]calloc/[x]malloc return value is not needed. __restrict should be restrict.

@timeyyy
timeyyy commented Jul 12, 2016

@ZyX-I i am using clang mm, i am testing kbtree.h just running clang -Wconversion -Werror test.c with the example included with kbtree.

@ZyX-I
Contributor
ZyX-I commented Jul 12, 2016

@timeyyy Strange, which clang version? I have 3.8.0, but I do not remember it ever reporting something other then size_t (aka 'unsigned long').

Though this may be a header problem: clang reports size_t here because glibc header declares calloc with size_t. If glibc header used unsigned int or happened to use #define where normal people use typedef, it would report unsigned int.

@timeyyy
timeyyy commented Jul 16, 2016 edited

I'm not sure why this is happening...

-- Build files have been written to: /home/alfa/Documents/git/neovim/build
[100/103] Linking C executable bin/nvim
FAILED: : && /usr/bin/cc  -Wconversion -g  -Wl,--no-undefined src/nvim/CMakeFiles/nvim.dir/__/__/config/auto/pathdef.c.o src/nvim/CMakeFiles/n

...

src/nvim/CMakeFiles/nvim.dir/auto/msgpack_dispatch.c.o: In function `kb_init_extmarks':
/home/alfa/Documents/git/neovim/build/../src/nvim/mark_extended.h:19: multiple definition of `kb_init_extmarks'
src/nvim/CMakeFiles/nvim.dir/__/__/config/auto/pathdef.c.o:/home/alfa/Documents/git/neovim/build/../src/nvim/mark_extended.h:19: first defined
 here
src/nvim/CMakeFiles/nvim.dir/api/buffer.c.o: In function `kb_init_extmarks':
/home/alfa/Documents/git/neovim/build/../src/nvim/mark_extended.h:19: multiple definition of `kb_init_extmarks'
src/nvim/CMakeFiles/nvim.dir/__/__/config/auto/pathdef.c.o:/home/alfa/Documents/git/neovim/build/../src/nvim/mark_extended.h:19: first defined
 here

...
@bfredl bfredl commented on an outdated diff Jul 16, 2016
src/nvim/api/buffer.c
@@ -636,6 +637,185 @@ ArrayOf(Integer, 2) buffer_get_mark(Buffer buffer, String name, Error *err)
return rv;
}
+/// Return a tuple (row,col) representing the position of the named mark
+///
+/// @param buffer The buffer handle
+/// @param name The mark's name
+/// @param[out] err Details of an error that may have occurred
+/// @return The (row, col) tuple
+ArrayOf(Integer, 2) mark_index(Buffer buffer, String name, Error *err)
@bfredl
bfredl Jul 16, 2016 Member

Please prefix api functions in buffer.c with buffer_ (Or, as I expect #4934 to be merged before this, you can just use nvim_buf_ upfront, but then you will need to write tests with helpers.request for the moment until #4934 gets merged)

@bfredl bfredl commented on an outdated diff Jul 17, 2016
src/nvim/api/buffer.c
+
+ return rv;
+}
+
+//TODO allos position index or string index
+// If the index is in numeric form, the method returns the first mark at that position. If the index is a mark, the method returns the next mark following that mark, which may be at the same numerical position.
+
+/// Returns the position of the mark following the given mark
+/// name; if there are no following marks, the method returns
+/// an empty list.
+///
+/// @param buffer The buffer handle
+/// @param name The mark's name
+/// @param[out] err Details of an error that may have occurred
+/// @return The (row, col) tuple
+ArrayOf(Integer, 2) mark_next(Buffer buffer, String name, Error *err)
@bfredl
bfredl Jul 17, 2016 Member

While this is a good api for consumers (vimscript, lua) within the nvim process, for remote plugins it could be too slow as it requires a roundtrip (two context switches) per mark it passes by. I would imagine a function nvim_buf_get_extmarks(buffer, start, end, which) which would return marks in some range [start,end] where the boundaries ideally could be either explicit positions (including the start/end of the entire buffer), or the name of marks, and one could request to either get the N first or N last or all marks in that range. (which will cover the ability of these functions)

@bfredl
Member
bfredl commented Jul 17, 2016

Also, while out of scope of this PR, some time later I would imagine all api functions that take a line number or a position (line,col) pair to be polymorphic and also take

  • the name of existing marks 'a and named positions like the cursor position .
  • the name of an extended mark
  • the numerical id of an unnamed extended mark*

* I would imagine many plugins have no reason to name their private marks and would be forced to come up with dummy myfancyplugin0/myfancyplugin1 etc names. Ideally they will be able to pass the empty string to mark_set and get a unique numerical ID in return which could be passed in everywhere the name of an extended mark is expected.

It would also be useful to reuse the binary tree defined here for other kind of marks, like bufhl, manual folds etc. As I said likely out of scope of this PR (which could focus on delivering a useful MVP), but I'm pointing it out here as it might (or might not) influence the design.

@timeyyy
timeyyy commented Jul 17, 2016

At the moment I am following the api for tk. Text tags have more functionality than text marks. Theyvprivide the functionalit you mentioned and more. The reason youbwould use tags for this is because they live in there own namespace, you can get a tag range for a specific tag type, while with marks you are limited to the one namespace. If later on one wanted ranges of different types of marks you have to switch to the tag api or handle some mapping yourself. That might be a reason the api doesnt exist, if people want I can add the range functionality to marks as well.

@timeyyy
timeyyy commented Jul 17, 2016

The btree is from klib and is therefore already reusable. On that note I am a bit stuck on initiliazing. Im not very good at macros... (see error above)

Yes I added a todo for the input of row/col, polymorphic sounds good

@bfredl
Member
bfredl commented Jul 17, 2016

The btree is from klib and is therefore already reusable.

I meant to reuse one single single btree per buffer for marks in the wider sense (anything that moves with line insertions). In the end, one should be able to ask "what kinds of marks exist on this line" without traversing 3 or 4 different btrees.

Tags would be interesting, but for now I just meant to be able to get all marks within a range, instead of needing to iterate one mark at a time using next_mark.

@bfredl
Member
bfredl commented Jul 17, 2016

The linker error seems to be that kbtree, unlike khash, is not split up into a (static inline) "declare" part to include in a .h file an a non-static "init" part to include in a .c file. The quick fix is to add a "static inline" in the beginning of line lib/kbtree.h:37 (kb_init), which makes everything static.

@timeyyy
timeyyy commented Jul 18, 2016

Stuck yet again :S,

In mark_extended_spec.lua, after using helpers.insert to insert text into the buffer. (even after removing all code for moving the extmarks).

The next call to get a pointer to the mark will return NULL.

I have made a buffer_mark_test in src/api/buffer.c that throws the error when called @ line35 in the test.

@bfredl
Member
bfredl commented Jul 18, 2016

ASAN says use-after-free of the map key, you need to allocate memory for the key:

--- a/src/nvim/mark_extended.c
+++ b/src/nvim/mark_extended.c
@@ -106,7 +106,7 @@ static int extmark_create(buf_T *buf, char *name, int row, int col)
   extmark->fmark.mark.lnum = row;
   extmark->fmark.mark.col = col;
   kb_put(extmarks, buf->b_extmarks_tree,  *extmark);
-  pmap_put(cstr_t)(buf->b_extmarks, (cstr_t)name, extmark);
+  pmap_put(cstr_t)(buf->b_extmarks, xstrdup(name), extmark);
   // TODO do we need the timestamp and additional_data ??, also pos_t has 3 fields
   SET_FMARK(&extmark->fmark, extmark->fmark.mark, buf->b_fnum);

If you can't run ASAN locally I would recommend fixing all warnings in the ASAN travis build so it can run the test and give useful information.

@timeyyy
timeyyy commented Jul 19, 2016

I was thinking to introduce a namespace for the marks. This would mean bufhl and tags/ plugins create one mark that needs to be moved per position.

when doing mark_next / mark_names etc,
The results could also be filtered namespaces as well.

If we did not do this, queryibg marks will always give all marks, and not just those that a particular user of the api cares about.

@bfredl
Member
bfredl commented Jul 19, 2016

Yes, that would be very useful. That would in fact improve performance of "numerical marks". an rplugin could then increment a private counter itself, and doadd_extmark("myFancyPlugin",0,...), add_extmark("myFancyPlugin",1,...) etc without needing to wait for it to return an unique id.

@timeyyy
timeyyy commented Jul 19, 2016

@bfredl, is there a reason why i could not run ASAN locally? The linker fails with some mf_read mf_write`

Do you mean the travis build for ASAN is broken in general or for my pr i need to fix all the warnings?

@bfredl
Member
bfredl commented Jul 19, 2016

Not in general, you just need to fix these warnings:

/home/travis/build/neovim/neovim/src/nvim/api/buffer.c:684:1: error: control may
      reach end of non-void function [-Werror,-Wreturn-type]
}
^
/home/travis/build/neovim/neovim/src/nvim/api/buffer.c:805:45: error: implicit
      conversion loses integer precision: 'Integer' (aka 'long') to 'int'
      [-Werror,-Wshorten-64-to-32]
  rv = (Integer)extmark_set(buf, name.data, row,  col);
                ~~~~~~~~~~~                 ^~~
/home/travis/build/neovim/neovim/src/nvim/api/buffer.c:805:51: error: implicit
      conversion loses integer precision: 'Integer' (aka 'long') to 'int'
      [-Werror,-Wshorten-64-to-32]
  rv = (Integer)extmark_set(buf, name.data, row,  col);
@bfredl
Member
bfredl commented Jul 19, 2016

weird, mf_write is static in memfile.c which you have not changed. could you post the entire error message?

@timeyyy
timeyyy commented Jul 19, 2016

@brefdl, i sure can, what i have done is
rm build
CC=clang ...
are there any other pre make stuff i have to do?

@bfredl
Member
bfredl commented Jul 19, 2016

Should be fine as Sanitizers have been enabled; don't use jemalloc. is in the cmake output. Clang 3.4 sounds a bit old, but as travis is using it... As __mulodi4 is an emulation of a 64-bit instruction I guess you are on 32-bit? (unlike the asan build on travis which is 64-bit) If possible I would still have tried with a later clang version.

@timeyyy
timeyyy commented Jul 19, 2016

ubuntu 14.04 32bit. That is the latest clang in th repos.. I will try on a 64bit os then..

@timeyyy
timeyyy commented Jul 22, 2016

what is the recommended way to check if a map is empty? Current implementation is using map.c

@mhinz
Member
mhinz commented Jul 22, 2016 edited

There's seems to be a kh_size(hash) in lib/khash.h. It returns a khint_t which is either an int or a long.

@timeyyy
timeyyy commented Jul 22, 2016 edited

ah because of map.h i had to dokh_size(map.table)

@timeyyy
timeyyy commented Jul 22, 2016

i am trying to declare a map to a map in map.h

...
MAP_DECLS(linenr_T, bufhl_vec_T)

typedef PMap(cstr_t) StringMap;
MAP_DECLS(cstr_t, StringMap)

however i get warning on make.
i have included map.h

/home/alfa/Documents/git/neovim/build/../src/nvim/mark_extended.c:158: undefined reference to `map_cstr_t_StringMap_new'
/home/alfa/Documents/git/neovim/build/../src/nvim/mark_extended.c:163: undefined reference to `map_cstr_t_StringMap_get'
/home/alfa/Documents/git/neovim/build/../src/nvim/mark_extended.c:168: undefined reference to `map_cstr_t_StringMap_put'
src/nvim/CMakeFiles/nvim.dir/mark_extended.c.o: In function `extmark_delete':
/home/alfa/Documents/git/neovim/build/../src/nvim/mark_extended.c:198: undefined reference to `map_cstr_t_StringMap_get'
src/nvim/CMakeFiles/nvim.dir/mark_extended.c.o: In function `extmark_get':
/home/alfa/Documents/git/neovim/build/../src/nvim/mark_extended.c:219: undefined reference to `map_cstr_t_StringMap_get'
src/nvim/CMakeFiles/nvim.dir/mark_extended.c.o: In function `extmark_free_all':
/home/alfa/Documents/git/neovim/build/../src/nvim/mark_extended.c:247: undefined reference to `map_cstr_t_StringMap_free'
@bfredl
Member
bfredl commented Jul 22, 2016

You need to make the corresponding definition at the end of map.c.

@timeyyy
timeyyy commented Jul 26, 2016 edited

in mark_extended.c line 180 wiill never get called when i compile with ASAN.

The test will work properly and get quite a bit further if i don't use ASAN.

The problem is on the second call to kb_put things just go sideways

I wasn't having this problem before i introduced the namespace stuff (ns)

a call to mark_unset will raise the same error when kb_del is triggered

test/functional/api/mark_extended_spec.lua:40: Expected objects to be the same.
Passed in:
(nil)
Expected:
(number) 1

stack traceback:
        test/functional/api/mark_extended_spec.lua:40: in function <test/functional/api/mark_extended_spec.lua:20>

a call to `mark_unset` will trigger `kb_del` which also blows up with the same alignment error
-- Output to stderr:
../src/nvim/mark_extended.h:60:1: runtime error: member access within misaligned address 0x61400000f444 for type 'ExtendedMark' (aka 'struct
 ExtendedMark'), which requires 8 byte alignment
0x61400000f444: note: pointer points here
  02 00 00 00 90 61 00 00  20 60 00 00 00 00 00 00  00 00 00 00 02 00 00 00  be be be be be be be be
              ^
SUMMARY: AddressSanitizer: undefined-behavior ../src/nvim/mark_extended.h:60:1 in
@bfredl bfredl and 1 other commented on an outdated diff Jul 26, 2016
src/nvim/mark_extended.h
@@ -0,0 +1,65 @@
+#ifndef NVIM_EXTMARK_H
+#define NVIM_EXTMARK_H
+
+#include "nvim/mark_extended_defs.h"
+#include "nvim/lib/kbtree.h"
+#include "nvim/lib/kvec.h"
+#include "nvim/map.h"
+
+typedef PMap(cstr_t) StringMap2; // TODO do these have to be seperate?
+
+typedef struct ExtendedMark ExtendedMark;
+struct ExtendedMark {
+ StringMap2 *names; // key = namespace, value = id
+ fmark_T fmark;
@bfredl
bfredl Jul 26, 2016 Member

why not just pos_T mark ?

@bfredl bfredl and 1 other commented on an outdated diff Jul 26, 2016
src/nvim/mark_extended.h
@@ -0,0 +1,65 @@
+#ifndef NVIM_EXTMARK_H
+#define NVIM_EXTMARK_H
+
+#include "nvim/mark_extended_defs.h"
+#include "nvim/lib/kbtree.h"
+#include "nvim/lib/kvec.h"
+#include "nvim/map.h"
+
+typedef PMap(cstr_t) StringMap2; // TODO do these have to be seperate?
+
+typedef struct ExtendedMark ExtendedMark;
+struct ExtendedMark {
+ StringMap2 *names; // key = namespace, value = id
@bfredl
bfredl Jul 26, 2016 Member

not sure what this map is for?

@timeyyy
timeyyy Jul 26, 2016

the map is used for holding all of the names that a mark might have.

mark_set('fancyplugin', 'mark1' ..)
mark_set('morefancyplugin', 1 ..)

the two calls above will create only one mark (less memory, less stuff to update)
where mark->names = {fancyplugin: mark1, morefancyplugin: 1}
On querying all the marks in a range, we just have to check if the namespace exists for that mark.
afaik maps are the fastest for doing this check?

@bfredl
bfredl Jul 26, 2016 Member

Not sure, the common case I assume would be a position having just one or two marks, it seems wrong optimize specifically for the case of a position having plenty of marks. Also it seems semantically wrong: what if you have "text[ns1:mark1]moretext[ns1:mark2]yetmore"and delete moretext? now "ns1" would expect to find both mark1 and 2 at the same position, which this representation forbids.
For now, I would suggest just storing these as separate objects. If it later turns out that multiple marks at the same position are common, we can try the solution: store the first (and maybe second) name directly in the same struct, allocate an array as soon as two(three) or more marks appears on the same position.

@timeyyy
timeyyy Jul 26, 2016

preoptimization at work i guess :(, the representation doesn't forbit it, if two marks are on the same position they can still be found/deleted/used correctly. Seperate objects it is.

@bfredl
bfredl Jul 26, 2016 Member

the representation doesn't forbit it,

Technically no, but it would just be needless complexity to manage both levels (multiple "physical" marks per position, multiple "logical" marks per "physical" mark)

@bfredl bfredl and 1 other commented on an outdated diff Jul 26, 2016
src/nvim/mark_extended.c
+ /* if (!ns_map || !kh_size(ns_map->table)) { */
+ ns = xstrdup(ns);
+ if (!ns_map) {
+ ns_map = pmap_new(cstr_t)();
+ pmap_put(cstr_t)(buf->b_extmarks, (cstr_t)ns, ns_map);
+ }
+
+ name = xstrdup(name);
+ ExtendedMark *extmark = (ExtendedMark *)xmalloc(sizeof(ExtendedMark));
+ /* extmark->fmark.mark = gen_relative(buf, ns, row, col); */
+ extmark->fmark.mark.lnum = row;
+ extmark->fmark.mark.col = col;
+ extmark->names = pmap_new(cstr_t)();
+ pmap_put(cstr_t)(extmark->names, (cstr_t)ns, &name);
+
+ kb_put(extmarks, buf->b_extmarks_tree, *extmark);
@bfredl
bfredl Jul 26, 2016 Member

It seems strange to put extmark by value here. the extmark stored in the tree will be a separate object than the one referred to in the name map. (though if the kb_tree storage of keys is stable we could use pointers into the structure, but for now let's use a separate heap allocation)

@timeyyy
timeyyy Jul 26, 2016

i have tried to use pointers with kbtree but it seems the current implementation requires a value, can we "work" that?

@bfredl
bfredl Jul 26, 2016 Member

Here is the thing: a pointer is also a value. The kb_tree api is not very convenient for this use case, but it works. Here is a sketch: (I have not tested it)

typedef struct {
    pos_T key;
    ... more data ..
} elem_t;

#define elem_cmp(a, b) (pos_cmp(a->key, b->key))
KBTREE_INIT(test, elem_t, elem_cmp)

static elem_t *insert_elem(bktree_t(test) *b, pos_T pos, bool* is_new) {
    elem_t t, *p , **pp;
    t.key = pos;
    pp = kb_get(test, b, &t);
    // IMPORTANT: put() only works if key is absent
    if (pp) {
        *is_new = false;
        return *pp;
    }
    p = xmalloc(sizeof(*p));
    p->key = pos;
    kb_put(test, b, p);
    *is_new = true;
    return p;
}
@bfredl
Member
bfredl commented Jul 26, 2016 edited

Also I'm not sure we want two nested string maps. I would suggest, while namespaces having string names, marks in a namespace will just be identified by an integer id (managed by the plugin that defined the namespace). Then the plugin can associate with the mark id whatever it wants, a string or some other arbitrary data.

@bfredl bfredl commented on an outdated diff Jul 28, 2016
src/nvim/mark_extended.c
+ if (!ns_obj) {
+ ns_obj = (ExtmarkNamespace *)xmalloc(sizeof(ExtmarkNamespace));
+ ns_obj->map = pmap_new(cstr_t)();
+ ns_obj->tree = kb_init(extmarks, KB_DEFAULT_SIZE);
+ pmap_put(cstr_t)(buf->b_extmark_ns, (cstr_t)ns, ns_obj);
+ }
+
+ name = xstrdup(name);
+ ExtendedMark *extmark = (ExtendedMark *)xmalloc(sizeof(ExtendedMark));
+ /* extmark->mark = gen_relative(buf, ns, row, col); */
+ extmark->mark.lnum = row;
+ extmark->mark.col = col;
+ extmark->name = name;
+
+ kb_put(extmarks, buf->b_extmarks, extmark);
+ kb_put(extmarks, ns_obj->tree, extmark);
@bfredl
bfredl Jul 28, 2016 Member

not sure what the second put is for. Seems redundant to a have a tree per namespace.

@timeyyy
timeyyy commented Jul 28, 2016

The btree lets us peform range look ups efficently.

When we had the namespace to id map stored on the mark , I was doing a range lookup in all the marks and then checking if the ns existed for that mark.

@bfredl
Member
bfredl commented Jul 28, 2016

I'm not sure that would be worth it. Cache locality is as important as "algoritmic" complexity. spreading infromation about a buffer section over many trees will need to update all trees when doing position adjustment, spread out over many memory locations. If we only have one tree, the entire part of that one tree corresponding to the part of the buffer being edited will be hot in cache, and so namespaced lookups in the common case will be efficient even if it needs to loop through marks not in that namespace.

@bfredl
Member
bfredl commented Jul 28, 2016

I'm not sure if it is related to the error you get, but kbtree (unlike khash) requires you to check whether the key is present with get before attempting a put, as sketched in the example before, otherwise the btree will break.

Also to learn more about kbtree I tried to use it for bufhl. It is not working fully yet; adjustment is broken in some cases, as I think is not safe to delete keys in the middle of an iteration. What would be needed is a kb_del_itr which would delete the key at the iterator and correctly update the iterator to point at the key after the deleted key. Also kbtree doesn't seem to allow you to start an iteration at some position, you have to iterate from the beginning. Anyway the code is here. extmark could use the same structure to store multiple marks at the same position (or even the same line like bufhl does) in an array (ideally namespaced marks would share the same tree with bufhl (and signs etc) by making the array entry a tagged union between the different kinds.)

@timeyyy
timeyyy commented Jul 28, 2016 edited

Thanks, I will take a look.

How would you suggest we perform mark_nextrange('myplugin', 12, [1, 0], [5, 0])

if we can only have one mark per position in the btree. We will have to do the logic for insert/moving anyway, Should i bring back the map on the marks which stores which namespaces and names?

@bfredl
Member
bfredl commented Jul 29, 2016

Should i bring back the map on the marks which stores which namespaces and names?

Just store them in a kvec like bufhl. This is both more performant in the common case and allows multiple marks per namespace at the same position. We should intern the namespace names (we can use the keys in the namespace map for this) so strcmp is never done in a loop, just a pointer compare per element. (it also saves needless memory allocation).

How would you suggest we perform mark_nextrange('myplugin', 12, [1, 0], [5, 0])

iterate over the region (we need to able to start the iterator at an arbritary position, the latest commit of my branch adds kb_get_itr which does that), emit all marks belonging to to 'myplugin', ignore the rest. I suppose a plugin in the common case will either extract marks from a quite small region (say the visible screen or a visually selected region), or in some cases the entire buffer, and in this case we can just iterate over the namespace map.

@bfredl bfredl commented on an outdated diff Jul 29, 2016
src/nvim/mark_extended.h
+#include "nvim/lib/kbtree.h"
+#include "nvim/lib/kvec.h"
+#include "nvim/map.h"
+
+/* typedef struct ExtendedMark ExtendedMark; */
+typedef struct ExtendedMark {
+ char *name;
+ pos_T mark;
+} ExtendedMark, *ExtendedMarkPtr;
+
+#define extmark_pos_cmp(a, b) (pos_cmp((a)->mark, (b)->mark))
+
+/* prev pointer is used for accessing the previous extmark iterated over */
+/* TODO there should be a better way to get this.. maybe build it in to kbtree */
+/* exists checks if the extmark is there for a particular buffer namespace */
+#define FOR_ALL_EXTMARKS_WITH_PREV(buf) \
@bfredl
bfredl Jul 29, 2016 Member

it seems that we want kb_itr_prev, shouldn't be too hard to extrapolate from kb_itr_next.

@bfredl
Member
bfredl commented Jul 30, 2016

Oh and I forgot to say, the alignment error could also be because kbtree does place keys at alignment 4 even if 8 is needed. A modern cpu will not mind this but it is UB and ASAN will complain about it. You could try to fix by changing the magic 4:s in kb_init and KB_KEY to 8, otherwise I will fix it when I get back to a computer in a few days.

@jamessan
Member

The x86 family of processors handle unaligned memory access for the programmer, but some other architectures don't so you'll get a SIGBUS when performing unaligned memory access.

@justinmk
Member

Is kb_tree stable enough for the platforms we target? Some klib modules are not very battle tested...

@timeyyy
timeyyy commented Jul 30, 2016 edited

@bfredl

Just store them in a kvec like bufhl. This is both more performant in the common case and allows multiple marks per namespace at the same position. We should intern the namespace names

So each mark will have a kvec of tuples? [(myfancy, 1), (myfancy, 2), (superfancy, 1)]

This is a summary of the implemtation so far, i still haven't got my head around how to do moves efficiently, Can that be baked into this implementaion without alot of refactoring?

Also we were thinking to use these marks internally for bufhl etc, are the current interfaces of suitable form?

Implementation Summary

  • to allow users to only have to deal with there own marks a namespace is provided
  • indexes that are queried are returned as: [name, [row, col]] so additional queries over rpc are minimized
  • for each namespace we have a map with all the marks in that namespace, this is for fast deleting and inserting
  • The btree is for range lookups and only supports one mark per position (i think). We need a
    way to tell if a mark belongs to a certain namespace, this is done by having a list of tuples on the mark, [(namespace, name)]
@timeyyy
timeyyy commented Jul 30, 2016

@justinmk,

Is kb_tree stable enough for the platforms we target? Some klib modules are not very battle tested...

Hard to tell, codesearch turned up 60x more results for khash than for kbtree. Kbtree has really nice performance as benchmarked. There are alot of alternatives to choose from on that page as well. I can seperate the logic as much as possible so we can switch it out if need be.

@bfredl
Member
bfredl commented Jul 31, 2016

IIUC we should be able to remove the pointer manipulating shenanigans from
kbtree quite easily, this would ensure it only ever uses valid c structs
(and as a side effect not be a major PITA to debug) at the "prize" of only
supporting one node width per typedef, which we only would use anyway.

Den 30 juli 2016 10:52 em skrev "timothy eichler" <notifications@github.com

:

@justinmk https://github.com/justinmk,

Is kb_tree stable enough for the platforms we target? Some klib modules
are not very battle tested...

Hard to tell, codesearch https://searchcode.com/?q=kbtree turned up 60x
more results for khash than for kbtree. Kbtree has really nice
performance as benchmarked
https://attractivechaos.wordpress.com/2008/10/07/another-look-at-my-old-benchmark/.
There are alot of alternatives to choose from on that page as well. I can
seperate the logic as much as possible so we can switch it out if need be.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#5031 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/ABTMoBVR9ZOXeop2GSDZ-_1mkQS33yQyks5qa7mFgaJpZM4JI74g
.

@bfredl
Member
bfredl commented Jul 31, 2016

Generally agree with this. We can reuse the same structure for bufhl, signs
etc, by making the vector items a tagged union. I can explain/prototype the
details when I get home to a computer and non-roaming internet connection.

Den 30 juli 2016 10:46 em skrev "timothy eichler" <notifications@github.com

:

@bfredl https://github.com/bfredl

Just store them in a kvec like bufhl. This is both more performant in the
common case and allows multiple marks per namespace at the same position.
We should intern the namespace names

So each mark will have a kvec of tuples? [(myfancy, 1), (myfancy, 2),
(superfancy, 1)]

This is a summary of the implemtation so far, i still haven't got my head
around how to do moves efficiently, Can that be baked into this
implementaion without alot of refactoring?

Also we were thinking to use these marks internally for bufhl etc, are the
current interfaces of suitable form?

** Implementation Summary **

to allow users to only have to deal with there own marks a namespace
is provided

indexes that are queried are returned as: [name, [row, col]] so
additional queries over rpc are minimized

for each namespace we have a map with all the marks in that namespace,
this is for fasst deleting and inserting

The btree is for range lookups and only supports one mark per position
(i think). We need a
way to tell if a mark belongs to a certain namespace, this is done by
having a list of tuples on the mark, [(namespace, name)]


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#5031 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/ABTMoGTVDCIt9EQ-nxoGusbj6Yh4aailks5qa7gKgaJpZM4JI74g
.

@bfredl
Member
bfredl commented Jul 31, 2016

And I think we can implement moves efficiently enough with a quite small change to the structure and with the change to kbtree I have in mind (basically we will be able to (re)interpret the tree of marks as a tree of lines containing trees of column positions at a small additional memory cost). Later I will show how it all fits together.

@timeyyy
timeyyy commented Aug 1, 2016

Just an idea, you might consider another btree library, there alot listed on the page above, maybe one is maintained and already does what we need

@justinmk
Member
justinmk commented Aug 1, 2016

indexes that are queried are returned as: [name, [row, col]] so additional queries over rpc are minimized

Should that be a [name, row, col] tuple? Or perhaps a {"name":name, "row":row, "col":col} map. Multidimensional arrays may be uncomfortable for client authors to work with.

@justinmk
Member
justinmk commented Aug 1, 2016

I don't think it should be part of this PR, but has it been considered whether it makes sense to store extended marks in shada?

@timeyyy
timeyyy commented Aug 1, 2016 edited

@justinmk i updated above to use a flat array

some other things to clarify

  • mark names should only be integers? (clients can then map to strings if desired)
  • as is mark_set will create or update marks and namespaces, do we want to make this behavior explicit?
@bfredl
Member
bfredl commented Aug 1, 2016

@justinmk whether it makes sense to store marks and the semantics thereof depends entirely on the plugin using them, and so I think the responsibility of doing so should be with the plugin. That being said we should consider offer primitives to the plugins to store and restore marks in shada or shada-like storage efficiently , but as you say that is better another PR.

@timeyyy I like to ask that question the other way around: is the a good reason to assume a vast majority of plugins would like to identify its marks with unique, immutable strings? (For internal bookkeeping that is, how the plugin presents marks to its users is always up to the plugin)

Not sure I follow? What would it mean to have this behavior implicit?

@bfredl
Member
bfredl commented Aug 1, 2016

another btree library

I don't think so, of the 3-4 pure c libraries I looked btree was by far the most feature complete. The change to use proper c structs will be very small, no change to the actual algorithms needed. It is almost as if the btree developer has considered that someone would want to do that.

@timeyyy
timeyyy commented Aug 1, 2016 edited

@bfredl, i saw we keep it simple for now :), just ints. Also for the namespaces, it might be easier just to uses ints, users who want a name can simply define a constant or global.

implicit means first you would do mark_create, then afterwards mark_update must be called. Both calls would return 1 if the mark already exists or 0 if it does not exists respectively.

maybe a mark_namespace_create could be used as well and return a 1 or 0 if the namespace is taken, that will stop plugins accidentally messing up other marks. calls using a namespace that isn't yet created would error out.

@timeyyy
timeyyy commented Aug 1, 2016 edited

@justinmk , should neovim/you fork kbtree/khash ? maybe other people will find these improvements useful and even contribute to maintaining these for us as well.

@ZyX-I ZyX-I commented on an outdated diff Aug 2, 2016
src/nvim/api/buffer.c
+ name_obj.size = strlen(name);
+ ADD(pos_obj, INTEGER_OBJ(pos.lnum));
+ ADD(pos_obj, INTEGER_OBJ(pos.col));
+
+ ADD(rv, STRING_OBJ(name_obj));
+ ADD(rv, ARRAY_OBJ(pos_obj));
+ }
+ return rv;
+}
+
+
+/// at the giving position. If the mark already exists, it is
+/// moved to the new location.
+//
+/// @param buffer The buffer handle
+/// @param name The mark's name
@ZyX-I
ZyX-I Aug 2, 2016 Contributor

Missing namespace argument description.

@bfredl
Member
bfredl commented Aug 2, 2016

Well the iterator improvements (get_itr and itr_prev) I will simply send upstream (after fixing the style o be identical with the existing library) , they are pure additions of static functions and won't add any code or data cost to existing users of the library.

@timeyyy Namespace ids is probably also good enough.A way could be to have mark_namespace_create take a name and return ing an id. This id will then be used in most api and internal storage, but the name can be looked up for debugging purposes (what plugin is creating a million marks in my buffer?). Or just ids or no names, but we should still require a create function so ids will be unique.

I'd say one function that both creates and updates marks, no need for one api function per special case. We simply return a flag whether the mark was created or update, the plugin can simply add an assert round the call to debug or check internal consistency.

@bfredl
Member
bfredl commented Aug 4, 2016

Actually with numerical namespace ids (technically with interned namespace names as well, but this way it is more obvious) we can get rid of the array inside of positions object, as we can order marks completely as the tuple (line, col, namespace-id, mark-id) It will get a bit more complicated with bufhl and other mark kinds in the structure, but it should be doable: bufhl src_id:s should be just be identified with namespace ids, signs will be its own hidden namespace etc. I will detail my full suggestion how to do that part tomorrow. (we don't need to implement that in this PR, but it should at least be forward compatible with it)

@timeyyy
timeyyy commented Aug 5, 2016

I came across this the other day, the first answer data oriented design

I think the way tk implemented there code they took a similar approach.

@bfredl
Member
bfredl commented Aug 5, 2016

To make movements fast I would change the structure as follows


KBTREE_INIT(extmark, ExtMarkLine*, extline_cmp)
KBTREE_INIT(extline, ExtendedMark*, extmark_cmp)

typedef struct {
    linenr_T lnum;
    kbtree_t(extline) marks;
} ExtMarkLine;

typedef struct {
    // back-reference needed to get the linenr
    ExtMarkLine *line;
    colnr_T col;
    int ns_id;
    int mark_id;
} ExtendedMark;

// symbolically, really PMap
typedef Map(int, ExtendedMark*) ExtMarkNS;

This has the aforementioned performance characteristics: the cost of adjusting line numbers scales with the number of lines with marks, and the cost of adjusting columns only depends on the number of marks on that line. Though one could easily imagine variations. One could avoid having a required heap allocation per mark and only require it for the line:


KBTREE_INIT(extline, ExtendedMark, extline_cmp)
typedef struct {
    colnr_T col;
    int ns_id;
    int mark_id;
} ExtendedMark;

typedef Map(int, ExtMarkLine*) ExtMarkNS;

i e store the marks inline the nested tree like before. This is probably better for cache locality as well, but then marks will not have stable adresses, so the ns map should instead point to the line object where they are stored (and possibly also the column number for faster lookup, as lookup happens more often than column adjustment).

This would require constructing a kbtree_t inplace, which would be a quite small change on top of the proper-struct refactor (only kb_init and kb_destroy need to be changed for this).

@bfredl
Member
bfredl commented Aug 5, 2016

Different kinds of marks could then be supported using a tagged union like so

typedef enum {
    kMarkExtended,
    kMarkBufhl,
    kMarkSign,
} MarkKind;

typedef struct {
    colnr_T col;
    int ns_id;
    int mark_id;
    MarkKind kind;
    union {
        struct {
            int hl_id;
            colnr_T endcol;
        } bufhl;
        struct {
            int typenr;
        } sign;
    }

} ExtendedMark;

(where one of course could argue for just unifying bufhl with extmark (i e every extmark has optional highlighting added to it), but just to show the general principle)

@bfredl
Member
bfredl commented Aug 6, 2016

Hmm, kb_itr_get already exists upstream (though it looks not to be precise in the not-found case, so a patch will still be needed). @timeyyy Was not the latest version (sep 2015) imported?

@timeyyy
timeyyy commented Aug 6, 2016 edited

ahh, sorry, i think i just google kbtree and assumed i had landed on the latest revision...

@bfredl i just used a diff tool on it, seems the kb_itr_get was the only difference. I also have added some casts but to be honest i was just silencing errors i had no idea what i was doing :/

@bfredl
Member
bfredl commented Aug 7, 2016

No worries, it looks like kb_itr_get is completely broken anyway unless I missed something...

@bfredl
Member
bfredl commented Aug 7, 2016

For reference: attractivechaos/klib#73

The changes are also on my branch here plus work on the struct refactor. It is a little bit bigger than I thought, for instance iterators need to be typed.

@bfredl
Member
bfredl commented Aug 8, 2016
@bfredl bfredl referenced this pull request in arakashic/chromatica.nvim Aug 16, 2016
Open

Performance improvement #18

@timeyyy
timeyyy commented Aug 23, 2016 edited

is it cool if i squash this? or maybe start a new pr, there is alot of noise here :(

@justinmk
Member
justinmk commented Aug 23, 2016 edited

It's fine (to start a new PR)

@justinmk justinmk commented on an outdated diff Aug 23, 2016
- os: linux
compiler: clang
- env: CLANG_SANITIZER=ASAN_UBSAN
- - os: linux
- compiler: clang
- env: CLANG_SANITIZER=TSAN
- - os: osx
- compiler: clang
- - os: osx
- compiler: gcc-4.9
+ env: GCOV=llvm-cov CLANG_SANITIZER=ASAN_UBSAN CMAKE_FLAGS="$CMAKE_FLAGS -DUSE_GCOV=ON"
@justinmk
justinmk Aug 23, 2016 Member

Should we use llvm-cov on master instead of gcov? Does it work with coveralls?

@justinmk justinmk and 2 others commented on an outdated diff Aug 23, 2016
src/nvim/api/buffer.c
@@ -636,6 +634,297 @@ ArrayOf(Integer, 2) buffer_get_mark(Buffer buffer, String name, Error *err)
return rv;
}
+/// Returns mark info at the given position or mark index
+///
+/// @param buffer The buffer handle
+/// @param namespace a identifier returned previously with namespace_create
+/// @param id [row, col] or mark_id of mark
+/// @param[out] err Details of an error that may have occurred
+/// @return [mark_id, row, col]
+ArrayOf(Object) buffer_mark_index(Buffer buffer, Integer namespace, Object id, Error *err)
@justinmk
justinmk Aug 23, 2016 Member

API function names should be namespace-verb-thing. So buf_get_mark_index.

@timeyyy
timeyyy Aug 23, 2016 edited

i agree, but there exists already buf_get_mark people will probably get confused until they read up and find out they have to use buf_get_mark_index for extmarks...

that was the reason behind prefixing with the noun..

@justinmk
justinmk Aug 23, 2016 Member

They will be confused in either case. It's confusing to have inconsistent convention.

buf_get_mark_index is at least a bit more "discoverable" since it will appear immediately next to buf_get_mark in documentation, code-completion, API function list, etc.

@timeyyy
timeyyy Aug 23, 2016 edited

when the nvim api is callable from viml, we will lose nvim_mark_ though :(, small loss for the disoverable factor

@justinmk
justinmk Aug 23, 2016 Member

when the nvim api is callable from viml, we will lose nvim_mark_

@bfredl Is that correct? I thought we decided that the API function names would be identical in all contexts.

@bfredl
bfredl Aug 23, 2016 Member

Not sure I understand the question here. A C API function called nvim_buf_mark_whatever will be called the same in vimL and rpc in #4934 . What would be the alternative?

@justinmk
justinmk Aug 23, 2016 Member

@bfredl That's what I thought. I guess I don't understand @timeyyy's comment:

when the nvim api is callable from viml, we will lose nvim_mark_

@timeyyy
timeyyy Aug 23, 2016 edited

beforewith mark_set, mark_xyz, I can hit tab and see a list of all functions that are mark_

but now that it's set_mark etc that's not possible.

It's nothing really i'm just grumbling :)

@justinmk
justinmk Aug 23, 2016 edited Member

@timeyyy I personally prefer that convention for the same reason you mention, but I'm not sure if anyone else does. If others are ok with that, then we should do it before #4934 so that we can rename all API methods follow that convention:

nvim_{namespace}_[{noun}_]{verb}_{arbitrary qualifiers}

Note that {noun} is optional because e.g. in the case of the buffer "namespace", verbs may operate on buffers and nvim_buf_buf_get would be redundant.

@bfredl
bfredl Aug 24, 2016 Member

Hmm, surely you mean after (or in) #4934. We don't want to have both vim_old_name, vim_new_name and nvim_new_name, only the first and the last.

But then again, I don't see why we are forced to mass rename existing API names (if that was what you meant) in order for extmark ( one specific feature that happens to have many API functions) to have have a "sub-"namespace. We can just do the latter if we deem it warranted.

@justinmk justinmk commented on the diff Aug 23, 2016
src/nvim/api/buffer.c
+ }
+ ExtmarkArray *extmarks_in_range = extmark_nextrange(buf, (uint64_t)namespace,
+ l_extmark->line->lnum, l_extmark->col,
+ u_extmark->line->lnum, u_extmark->col);
+
+ if (!extmarks_in_range) {
+ ADD(rv, INTEGER_OBJ(-9));
+ return rv;
+ }
+
+ Array mark = ARRAY_DICT_INIT;
+ ExtendedMark *extmark;
+ for (size_t i = 0; i < kv_size(*extmarks_in_range); i++) {
+ mark.size = 0;
+ mark.capacity = 0;
+ mark.items = 0;
@justinmk
justinmk Aug 23, 2016 edited Member
Array mark = ARRAY_DICT_INIT;
@justinmk justinmk commented on an outdated diff Aug 23, 2016
src/nvim/api/buffer.c
+/// Setup a new namepsace for holding you marks
+///
+/// @param namespace String name for the namespace
+/// @param[out] err Details of an error that may have occurred
+/// @return integer id to be used with future mark_ calls, or 0 if name exists
+Integer buffer_mark_ns_init(String namespace, Error *err)
+{
+ uint64_t ns_id = extmark_ns_create(namespace.data);
+ if (!ns_id) {
+ api_set_error(err, Validation, _("Namespace already exists"));
+ return 0;
+ }
+ return ns_id;
+}
+
+/// Returns a list of mark namespaces in nvim
@justinmk
justinmk Aug 23, 2016 Member

"in nvim" need not be mentioned in documentation. It is implied.

timeyyy and others added some commits Jul 16, 2016
@timeyyy timeyyy remove crud 8be64a4
@timeyyy timeyyy mm trying to get kb_tree working 9947463
@timeyyy timeyyy required for kbtree to work :/ cfec799
@timeyyy timeyyy some kbtree stuff 020679c
@timeyyy timeyyy fixed up some map stuff 2674462
@timeyyy timeyyy lined up slash ee420e3
@timeyyy timeyyy fix call to SET_XFMARK 5119af3
@timeyyy timeyyy hide errors XD 0d00696
@timeyyy timeyyy improve return types c5d2dc5
@timeyyy timeyyy add index function 674e484
@timeyyy timeyyy revert eval.c 3eef668
@timeyyy timeyyy Add api to buffer.c 207da65
@timeyyy timeyyy some more stuff to do.. 48a5f74
@timeyyy timeyyy quick fix static inline 95bd073
@timeyyy timeyyy add unit test to test pos_cmp function 72a20fc
@timeyyy timeyyy fix comment f32bd01
@timeyyy timeyyy prefix names with buffer_ 5b21229
@timeyyy timeyyy added required header ea1f8cb
@timeyyy timeyyy mm working on tests 6d6fbf5
@timeyyy timeyyy simple test of adding mark works 08dd077
@timeyyy timeyyy can update marks 976c46f
@timeyyy timeyyy should follow line edits, but not col edits 8a7a27d
@timeyyy timeyyy wconversion 7574dd3
@timeyyy timeyyy free stuff when buffer closes, mark_adjust, cb6037e
@timeyyy timeyyy TRAVIS ASAN 519a851
@timeyyy timeyyy allocate memory for key
names can now be gotten
can now delete
next works
f61f71c
@timeyyy timeyyy now storing pointers, kb_put still blows up.. 913d72e
@bfredl @timeyyy bfredl bufhl: use kbtree for bufhl c7fb0c3
@bfredl @timeyyy bfredl kbtree: allow iterators to start at arbritary position b13ca26
@timeyyy timeyyy checkpoint 49bab0c
@timeyyy timeyyy trying to cleanup tests b118a90
@bfredl @timeyyy bfredl kbtree: use proper structs a88ca2e
@timeyyy timeyyy mm 536b07e
@bfredl @timeyyy bfredl test fix b5bab9f
@bfredl @timeyyy bfredl More work 01d5fd8
@timeyyy timeyyy tidy up tests 0c1930c
@timeyyy timeyyy added bfredl kbtree 86e9ec1
@timeyyy timeyyy :) cf6b3f7
@timeyyy timeyyy marks now live in a line structure (one broken test , tagged #broken) 168cc57
@timeyyy timeyyy Docs c50905c
@timeyyy timeyyy marks move with:
line splits, modified open_line
deleting charwise
put charwise
put blockwise

namespace required bfore marks are created
error is now raised if invalid rows or columns are passed in
9f5a0b3
@timeyyy timeyyy renamed api functions a2bf7da
@timeyyy timeyyy rename functions for nvim_api in tests
replace works
blockwise insert
shift line and a few fixes to tests
tabs...
open_line works i guess
0f46e78
@bfredl @timeyyy bfredl kbtree: eliminate unneccesary heap allocation b15ff88
@bfredl @timeyyy bfredl kbtree: make warning free 6d3e423
@bfredl @timeyyy bfredl some style cleanup 050153a
@bfredl @timeyyy bfredl bufhl: fix move 4faf8c7
@timeyyy timeyyy undo works but only for one level 2f1e05e
@timeyyy timeyyy fixup after rebase ed4201b
@timeyyy timeyyy undo should work more than only one time a1bb38b
@timeyyy timeyyy undo redo for most things working 5130a15
@timeyyy timeyyy change kv list to store enumerated array 7444e5a
@timeyyy timeyyy explictily calling extmark_adjust (no longer from within mark_adjust)
was required as mark_adjust is called from within undo.c
5313e3e
@timeyyy timeyyy multijoin working 71d3d95
@timeyyy timeyyy made a start at fixing the delete off by one error... 31ff26d
@timeyyy timeyyy fixed up code after cherry picking e21957d
@timeyyy timeyyy fixed backspace and using x to the right of a mark cbe0d2b
@timeyyy timeyyy fix open_line when opening to the right of the mark bb55eb9
@timeyyy timeyyy extmarks support :move , undo and redo ffdfaeb
@timeyyy timeyyy keep the extended mark btree ordered when :move 37ec50a
@timeyyy timeyyy fixed redo, and some cleanup.. 8b7f90e
@timeyyy timeyyy extmark prev functions work dcd9019
@timeyyy timeyyy fix bug with open_line eb16cc0
@timeyyy timeyyy mm afa5e94
@timeyyy timeyyy compacting the extmark undo array 3733587
@timeyyy timeyyy extmark undos only get added when required c1577e2
@timeyyy timeyyy extmark undo bug for opening lines fixed 91dbad3
@timeyyy timeyyy update function arguments 5d737c3
@timeyyy timeyyy remove some compiler pragmas ac7a040
@timeyyy timeyyy rever .travis.yml 8751d3a
@timeyyy timeyyy casting f4dca88
@timeyyy timeyyy linter 6ca1a73
@timeyyy timeyyy fix undo bug in del_bytes 3aca903
@timeyyy timeyyy linting 03c4baa
@timeyyy timeyyy compile error fixes 2d4895b
@timeyyy timeyyy lint 5cd2f7e
@timeyyy timeyyy linting 8d2b31d
@timeyyy timeyyy linting 576bfc4
@timeyyy timeyyy fixup documentation
a54bfd0
@timeyyy timeyyy switched to storing lines in a kbtree, nsmap points to the line
5afbc0b
@timeyyy timeyyy fixups for prev commit, mark_cmp works correctly
d6e28e0
@timeyyy timeyyy next and next range working again
91fe270
@timeyyy timeyyy fix test for get_prev_mark af212f1
@timeyyy timeyyy add some tests that need to be completed 640a865
@timeyyy timeyyy lint
5c7a507
@timeyyy timeyyy changed the title from [WIP] Extended Marks to [RFC] Extended Marks Jan 1, 2017
@marvim marvim added RFC and removed WIP labels Jan 1, 2017
timeyyy added some commits Jan 2, 2017
@timeyyy timeyyy deleting and freeing marks improved
25b595d
@timeyyy timeyyy undo and redo of extmark_set 66f1046
@timeyyy timeyyy undo and redo of extmark_unset and extmark_set with update fe1a713
@timeyyy timeyyy cleanup old asserts
421e753
@timeyyy timeyyy undo and redo of marks deleted during edits 875917d
@timeyyy timeyyy Merge branch 'extmarks_undo_set_and_undset' into extended_marks 779d08f
@hardenedapple hardenedapple referenced this pull request in hardenedapple/vsh Jan 4, 2017
Open

Don't use a user modifiable mark internally #2

timeyyy added some commits Jan 7, 2017
@timeyyy timeyyy extmark namespace tests added 303af9f
@timeyyy timeyyy remove unrequired function 5c4367a
@timeyyy timeyyy extmarks range searching fixed
126c07c
@timeyyy timeyyy remove useless test bbe0175
@timeyyy timeyyy extmark free improved
3a75927
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment