Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve multiple neovim instances support (viminfo) #999

Closed
ZyX-I opened this issue Jul 27, 2014 · 25 comments
Closed

Improve multiple neovim instances support (viminfo) #999

ZyX-I opened this issue Jul 27, 2014 · 25 comments
Labels
enhancement feature request needs:decision discussion has run its course, need decision how to proceed needs:design needs a clear design proposal
Milestone

Comments

@ZyX-I
Copy link
Contributor

ZyX-I commented Jul 27, 2014

There is one known problem with Vim: if you launched two instances, did some things in both of them and quit only data from last Vim is saved into viminfo: Vim does not bother reading viminfo before writing it (except for marks). Thus my suggestion:

  1. Tag all saved values with the time they were modified. Specifically
    • Variables (:h viminfo-!), registers, uppercase marks should be tagged per-element (one tag for one variable, register and mark). Default tag (for unset or converted values) is -inf.
    • Buffer list should not be tagged at all (last one always wins).
    • Each previous edited file (:h viminfo-') should be tagged on its own, and each saved lowercase mark should be tagged as well. Tags on edited files are updated whenever file is closed (if file is not closed then it should use exit time).
    • Each history item should be tagged on its own.
  2. When saving read this back and
    • Update variables, registers and uppercase marks only in case they were modified in this vim instance after those in viminfo or at the same time. Decision is to be made element-wise.
    • Buffer list is just saved.
    • When deciding on previously edited file first decide on the list (if it is too lengthy drop the oldest elements), then retag remaining files using the most recent time. After this merge lowercase marks just like uppercase marks were merged before.
    • History items are ordered according to their tag (if times are equal quiting neovim instance’s history comes last) and drop some oldest elements if list became too lengthy.
  3. Use some kind of synchronization before saving file in order to not mess up with viminfo. I think documenting officially that viminfo must not be a symlink, writing viminfo to some adjacent location and overwriting viminfo with new one iff inode did not change should work (if inode did change merging should be repeated) (some sort of CAS) nearly always, even though between reading inode and moving saved viminfo it may be overwritten: there is not much sense in having 100% bullet-proof solution here. I also know about flock(), but never saw it working. And using lock files is slow and requires detection of the situation “other neovim crashed before releasing lock”.

Before implementing any of the above it is good to look at zsh: it has such interesting options as INC_APPEND_HISTORY (update history file as soon as history item appears), APPEND_HISTORY (the most simple way to avoid problem Vim has) and SHARE_HISTORY (history file is read each time it is updated, so all zsh instances have the same history list always).

It may be better to take alternative approach: write viminfo as a log and when it grows too large clean it. This makes it possible to directly take code from zsh and support all of the above options. Log should look like this: timestamp:action:arguments:

1406449750:set_previously_edited_file:/home/zyx/foo
1406449750:set_file_mark:\0/home/zyx/foo\0:.:18:0

(Note: using NUL to terminate file name. Newline can also be present in the filename and I do not want to bother with escaping. First NUL is there to indicate that newline may be in the argument.)

Though I am not sure whether we should make this log human-readable: using uint64_t casted to char[8] (need to ensure endianness though), one character for action (with switch) and special parser for each of the actions should work much faster and reqire much less on-disk space. Will need a tool to show this log in a human-readable fashion, but this is really simple.

By cleaning I mean first removing successive elements that obsolete each other (e.g. two elements setting one uppercase mark). Then remove oldest duplicate elements from history (e.g. two successive appends of identical history lines). Last: remove oldest items that makes some list (of old files, history lines, etc) grow too large.

This should be decided before first release because we do not want to add compatibility code, do we?


Want to back this issue? Place a bounty on it! We accept bounties via Bountysource.

@ZyX-I ZyX-I added the decision label Jul 27, 2014
@ZyX-I ZyX-I added this to the first release milestone Jul 27, 2014
@dumblob
Copy link

dumblob commented Jan 26, 2015

If the viminfo file format should change (i.e. get incompatible with vim), we should rename it and also ensure future extensibility without breaking the old formatting. Plain TLV approach (that is e.g. "these 256 bytes have a type and/or version field I don't understand/support and thus I can skip them while issuing some warning") might be a viable concept e.g. for using msgpack directly for the data.

@ZyX-I
Copy link
Contributor Author

ZyX-I commented Apr 7, 2015

I think I have a good option:

  1. Rename viminfo to shada. Everything, but &viminfo option name must be also renamed. Default file name will become ~/.nvim/shada/main.shada
  2. Move all viminfo-related code to src/nvim/shada.c. Currently main code lives in ex_cmds.c for whatever reason and local code for writing history, registers, variables, etc lives in local files.
  3. Format of the ShaDa file:
    1. Concat of msgpack objects. Two shada files can be safely concatenated together and this action will not make the result wrong.

    2. Each object in the file is a concat of three msgpack values on its own: (integer) type of the item, (integer) timestamp (whether the last one is useful is rather subjectable and depends on how shada file merging is supposed to be implemented), (integer) data length. May be followed by another msgpack object with length exactly equal to the previous integer.

      Last integer and the whole reasoning behind using exactly concat lists is allowing faster skipping because not all entries are needed in some exact situations. It may be wiser to put it before the timestamp and include timestamp length in data length.

      Every string in the file is in UTF-8 encoding unless it is binary. This means that if &encoding is not UTF-8 then conversion needs to be performed before register values, history strings, strings in variables, etc. File names are always considered binary.

    3. After these two values actual data goes:

      1. Header item: most useless thing, is used to contain information about who have written shada file useful for debugging. Data is a mapping, keys: version, pid. Unknown keys are ignored.
      2. Last search pattern. Data: mapping, keys: magic (false or absent (default: true)), smartcase (true or absent (default: false)), offtype (integer: enum, may be absent), off (integer, absent when offtype is absent (default: 0)), islast (false or absent (default: true)), pattern (string, required). See wvsp_one for details. Unknown keys are ignored.
      3. Last :s replacement string. Data: same as above, except that it is string in place of pattern and smartcase is not dumped for it being useless. Unknown keys are ignored.
      4. History item. Data: array with one element: string. Second and further array elements are attached to the history item as Array (holds only second and further elements) and preserved, but cannot be queried from VimL.
      5. Register value. Same as above (regarding outer type and how unknown elements are dealt with), but required elements are four: register name (integer), register type (integer), width (integer) (for block registers) and register value (string).
      6. Variable value. Array of two elements: variable name (string), variable value (integer, float, string, array, map). Third and further elements are ignored.
      7. Global mark definition. Map with the following keys: name (string), line (integer), column (integer), file (binary string). All other keys are saved and preserved until mark redifinition. All marks which have weird names are saved separately and then redumped as-is, including timestamps.
      8. Jump list item definition. Map like above, but without name key.
      9. Buffer list definition. Data is an array of maps, each map contains file (binary string), lnum (integer), col (integer) keys, other keys are saved within the buffer structure and dumped back unchanged. Dumped timestamp is always time of the exit, even if since starting nvim buffer list did not change.
      10. Local mark definition. Exactly like global mark, except that different set of names is considered to be normal. Different rule of merging applies.

      Note: “string” means “binary string which was converted from &encoding to UTF-8”. It is expected to be UTF-8, but it may be not a UTF-8 string. “binary string” means that no conversions were performed.

      Note 2: Unless said otherwise timestamps are saved and updated only on value redifinition (in case of history: when history line is used again).

      Note 3: Rules for saving and preserving unexpected keys or list items are there for compatibility reasons.

If this variant is acceptable I can write a PR for this. Note: I am going to change format, not implement additional options for sharing data between nvim instances. Some simple merging algorythm will be added, but duplicating full set of zsh features is not needed before the release unlike format changes and merging function will not care a tiny bit about locks. Not sure whether or not I will implement additional information saving (i.e. information for compatibility and timestamps): depends on the complexity. Key change is new format, not new features.

Ping @tarruda.

@tarruda
Copy link
Member

tarruda commented Apr 9, 2015

If this variant is acceptable I can write a PR for this. Note: I am going to change format, not implement additional options for sharing data between nvim instances. Some simple merging algorythm will be added, but duplicating full set of zsh features is not needed before the release unlike format changes and merging function will not care a tiny bit about locks. Not sure whether or not I will implement additional information saving (i.e. information for compatibility and timestamps): depends on the complexity. Key change is new format, not new features.

I have nothing against what you proposed, it's not like changing the format will have serious implications. In fact, using a standard serialization format such as msgpack means it can be parsed more easily by other programs

Have you thought about how to synchronize access to the file?

BTW where does the "ShaDa" name comes from?

@ZyX-I
Copy link
Contributor Author

ZyX-I commented Apr 9, 2015

Have you thought about how to synchronize access to the file?

Since this is merely a log I was originally going to just take code from zsh. (Though just taking code means that you now only control shada length in term of top-level item concats: can’t say “1000 lines of history” (for each history kind), only “100 000” elements in shada file whatever elements will be.)

At the first stage this will be simple “read shada -> merge -> write shada.tmp -> rename”. Without any kind of synchronization. I guess though that when I will start looking at how merging stage should be implemented I may deduce slightly different format.

BTW where does the "ShaDa" name comes from?

SHAred DAta. It is really great way of constructing easily memorable identifiers if you don’t have enough letters for good abbreviation.

@dumblob
Copy link

dumblob commented Apr 9, 2015

At the first stage this will be simple “read shada -> merge -> write shada.tmp -> rename”. Without any kind of synchronization.

I have a positive experience with this simple approach (and it should be pretty fast in this case as the whole file won't need to be read through due to time stamps).

Btw nice format specification, I looking forward to using it :)

@justinmk
Copy link
Member

justinmk commented Apr 9, 2015

After these two values actual data goes:

@ZyX-I Your shared-data spec looks something like a subset of the editor "state". What do you think about generalizing this concept as the main (only) serialization format for the Neovim editor state? This would be useful for spawning child nvim processes for parallel execution (e.g. for a :vimgrep search). And, the C data structures that arise from this general concept could potentially be passed around instead of globals.

@tarruda
Copy link
Member

tarruda commented Apr 10, 2015

@justinmk it should already be possible to run vimscript in parallel after #1446 is merged, there's no need to restore editor state. In fact, I think spawning with -u NONE is preferable when you only want to run a script in parallel.

Besides, saving the editor state is much more complex than what @ZyX-I is proposing, it would involve not only refactoring every global variable, but also static variables in each module(which there are plenty). Not to mention that there things out of our control(file descriptors, child processes...).

Still, being able to serialize most editor state is a worthly long term goal. I might open an issue to discuss a plan for distributing work in entry-level tasks.

@ZyX-I
Copy link
Contributor Author

ZyX-I commented Apr 10, 2015

I have a positive experience with this simple approach (and it should be pretty fast in this case as the whole file won't need to be read through due to time stamps).

Timestamps come into the action when it needs to be deduced which of the items need to be kept during merging. Items in the file are not going to appear sorted because I have no intention in keeping the whole file contents in memory, twice (three times considering that shada file needs to be read back).

For this reasons I will either have to add file size and some offsets into the header (works only as long as used approach is that simple one, but currently used format is supposed to be more zsh-history-like with possible support of all its features in the future) or always process the whole file should I need only part of it (note: current code for some reasons defines flags which tell NeoVim to read only marks or only data for v:oldfiles or everything else; did not yet check where such capabilities are actually used so cannot say whether new format will add overhead to some actions or will remove it from startup).

@ZyX-I
Copy link
Contributor Author

ZyX-I commented Apr 10, 2015

@tarruda About static variables: I usually omit them as long as possible, keeping only const ones. This refactoring is also going to get rid of at least one other global. I hope that during all the refactorings many of them will vanish.

@dumblob
Copy link

dumblob commented Apr 10, 2015

Timestamps come into the action when it needs to be deduced which of the items need to be kept during merging. Items in the file are not going to appear sorted because I have no intention in keeping the whole file contents in memory, twice (three times considering that shada file needs to be read back).

Still, external merge sort should be easy enough to implement to avoid this issue of keeping the whole file in memory ;)

@justinmk
Copy link
Member

it should already be possible to run vimscript in parallel after #1446 is merged, there's no need to restore editor state. In fact, I think spawning with -u NONE is preferable when you only want to run a script in parallel.

@tarruda If some state is not sent, the results of :vimgrep and other potential use cases will not be correct. E.g., 'path' and 'magic' affect vimgrep.

@ZyX-I
Copy link
Contributor Author

ZyX-I commented Apr 10, 2015

Still, external merge sort should be easy enough to implement to avoid this issue of keeping the whole file in memory ;)

I am wondering how is this supposed to help?

  1. I want to have temporary files anywhere even less then having file in memory trice.
  2. NeoVim instances run in parallel. But shada files are written at exit. So new items will appear between already read ones and the whole file will still be needed to read.
  3. Viminfo files are read only once at startup as well.
  4. It goes directly against the feature “concat of shada files is a correct shada file”. I.e. you either can concat and must be prepared that timestamps are out of order or can’t do this and need to merge always. I have added it to the list because I was planning to make running nvim instances write new items to something like .nvim/shada/{host}.{pid}.shada (without synchronization as it will not be needed) and write to main file only at exit. In case of crashes simple shell script at startup is all what is needed in order not to have anything lost.

@dumblob
Copy link

dumblob commented Apr 10, 2015

@ZyX-I right, it seems I completely misunderstood your intent. E.g. I didn't assume shada won't be shared among parallel neovim instances (I thought, that after adding a single new item, the one shared shada file will get updated and the new parts automatically re-read by other neovim instances as long as monotonic time source is used for time stamps).

Regarding the concat, this shouldn't be an issue - when starting a new instance, you anyway need to read the whole shada file to get the full history and therefore it's a perfect time to sort the file if any time-incompatibility is encountered.

@ZyX-I
Copy link
Contributor Author

ZyX-I commented Apr 10, 2015

(I thought, that after adding a single new item, the one shared shada file will get updated and the new parts automatically re-read by other neovim instances as long as monotonic time source is used for time stamps).

It is never going to happen by default. Even zsh with sharehistory option is not that intrusive: shada not only contains history, but a jumplist and file-local and global marks as well. So even if option like SHARE_HISTORY will be added this will be a list showing what exactly will be read back.

And I explicitly said that I am not going to do this in first PR regarding the issue.

Regarding the concat, this shouldn't be an issue - when starting a new instance, you anyway need to read the whole shada file to get the full history and therefore it's a perfect time to sort the file if any time-incompatibility is encountered.

I am not going to write file at startup. And especially I am not going to do such possibly time-consuming task as sorting at startup as well.

@dumblob
Copy link

dumblob commented Apr 10, 2015

It is never going to happen by default. Even zsh with sharehistory option is not that intrusive: shada not only contains history, but a jumplist and file-local and global marks as well. So even if option like SHARE_HISTORY will be added this will be a list showing what exactly will be read back.

And I explicitly said that I am not going to do this in first PR regarding the issue.

Let me clarify some bits. By history I meant everything done/saved in the past. I'm not much familiar with the way how zhs does it under the hood, but I'm using my own setup for shells having the non-POSIX history command and thanks to it, I'm doing it exactly the "intrusive" way and it works well (even on different systems with different shells and configurations). But of course, such central sharing is rather subjective.

I am not going to write file at startup. And especially I am not going to do such possibly time-consuming task as sorting at startup as well.

I meant something like

shada_file = open_file_and_return_stream()
shada_history = {}  // empty list

while (r = read_next_record(shada_file)) {
  x = shada_history.last_item_index()
  while (r.timestamp() < shada_history.at(x)) --x;  // this might be a binary search
  shada_history.insert_after(x, r)
}

I.e. no temporary file, no more-than-once reading of the shada file, logarithmic performance in the worst case. And that's already pretty fast.

@ZyX-I
Copy link
Contributor Author

ZyX-I commented Apr 10, 2015

Let me clarify some bits. By history I meant everything done/saved in the past. I'm not much familiar with the way how zhs does it under the hood, but I'm using my own setup for shells having the non-POSIX history command and thanks to it, I'm doing it exactly the "intrusive" way and it works well (even on different systems with different shells and configurations). But of course, such central sharing is rather subjective.

By history I mean only lines (commands, searches, etc) typed in command mode. Zsh history files have only this and thus are not intrusive with SHARE_HISTORY. Viminfo (and thus shada) contains things like jumplist. I do not think any person is going to be happy in case he had done “go to tag” in one NeoVim instance in one project, same in another (instance and project) and found out that in first NeoVim instance <C-o> reopens the file opened in second NeoVim instance because jump was saved and read back. Also there are file-local and global marks overwriting which from shada file from different instance (should it happen to open just the same file in case of file-local marks) may be confusing, but much less likely to actually occur then jumplist mess.

I.e. no temporary file, no more-than-once reading of the shada file, logarithmic performance in the worst case. And that's already pretty fast.

This is not external merge sort. And this is exactly what will leave three instances of shada file: one shada_history from current nvim instance, one is shada_history from current nvim instance expressed in other structures (this one is source of the data), one shada_history from shada file and one is the result of the merge. The result of the merge can be put in either one of shada_histories that are represented as such (in-place list merging), so three.

At shutdown I was actually going to pretend that both history in shada file and history in current NeoVim instance are already sorted and thus keep minimum amount of additional memory by using much simpler sorted lists merging (additionally needs an explicit test that this behaves well enough when lists are not actually sorted). (Note: things like marks occupy less memory and are not history lists, so they are not going to cause any problems.) I think this will make code both not allocate much additional memory and use only one iteration to process shada file which needs to be merged with current instance data.

Also I thought you meant I need to sort the whole file and then write it. Amount of things that needs to be sorted is less then amount of things which are stored in the shada file and in any case you don’t need to sort the whole file at startup because two shada file entries may contain different type of data: it is useless to first position command history line before search history line and just then put these lines into separate lists without keeping information about their ordering anywhere. Only entries with identical type (for history specifically: and identical history type) need sorting.

@dumblob
Copy link

dumblob commented Apr 10, 2015

Thanks for explanation of the terms you use. Now I see your point and agree with your conclusions.

And yes, the last pseudo code had nothing to do with external merge sort as it was a different situation than I originally understood.

@ZyX-I
Copy link
Contributor Author

ZyX-I commented Apr 18, 2015

Found a problem with marks: I added timestamp field to mark definition (including local marks) and modified corresponding code in undo.c which saves marks within undo tree, but the result can’t be saved in undo file. Currently code is saving nothing and restoring zero timestamp, but I do not like the idea.

Persistent undo file is the next think which needs attention because it can’t be extended at all.

@ZyX-I
Copy link
Contributor Author

ZyX-I commented Apr 25, 2015

There is a question: &viminfo option contains such an interesting suboption s described as

  Maximum size of an item in Kbyte.  If zero then registers are
  not saved.  Currently only applies to registers.  The default
  "s10" will exclude registers with more than 10 Kbyte of text.
  Also see the '<' item above: line count limit.

I am going to extend this option according to its description (“Maximum size of an item in Kbyte”), except that it is ignored for buffer list + description is off by 3 unsigned integers (type, timestamp, length):

  Maximum size of an item in KiB.  If zero then nothing is 
  saved.  Unlike Vim this applies to all items, except for the 
  buffer list.  Actual item size is off by three unsigned 
  integers: maximum with `s10` item size may be 1 byte (type: 
  7-bit integer) + 9 bytes (timestamp: up to 64-bit integer) 
  + 3 bytes (item size: up to 16-bit integer because 2^8 < 10240 
  < 2^16) + 10240 bytes (requested maximum item size) = 10253 
  bytes.

Another possible option is narrow it to registers only like it was in Vim, but make this behaviour official. Note that 10240 bytes of an item ≠ register text 10240 bytes long: register is saved as {"name": "a", "type": 1, "contents": ["line1", "line2"]} (in msgpack, of course) and this adds additional overhead.

@ZyX-I
Copy link
Contributor Author

ZyX-I commented Apr 25, 2015

Note that one can still limit register size by using < (“Maximum number of lines saved for each register.”), but this limit is not that precise.

@dumblob
Copy link

dumblob commented Apr 25, 2015

I would prefer the first solution with a minor change in wording - from Maximum size of an item in Kbyte. to Maximum size of an item content in Kbyte.

@ZyX-I
Copy link
Contributor Author

ZyX-I commented Apr 25, 2015

@dumblob I use KiB, not Kbyte (not sure what Kbyte is supposed to mean, but in current Vim and NeoVim sources it means 1024 bytes).

@dumblob
Copy link

dumblob commented Apr 25, 2015

@ZyX-I correct, but the point with the word content stays the same.

@ZyX-I
Copy link
Contributor Author

ZyX-I commented Apr 25, 2015

@dumblob I have no objections to the word “content”, just to Kbyte.

ZyX-I added a commit to ZyX-I/neovim that referenced this issue Apr 25, 2015
What works:

1. ShaDa file dumping: header, jump list, history, search patterns, substitute 
   strings.

What does not:

1. ShaDa file dumping: registers.
2. ShaDa file reading.

Most was not tested.

TODO:

1. Merging.
2. Reading (ShaDa file reading is present, it just does not work) history, local 
   marks, jump and buffer lists.
3. Documentation update.
4. Converting some data from &encoding.
5. Safer variant of dumping viminfo (dump to temporary file then rename).
6. Removing old viminfo code (currently masked with `#if 0` in a ShaDa file for 
   reference).

Ref neovim#999.
ZyX-I added a commit to ZyX-I/neovim that referenced this issue Apr 25, 2015
What works:

1. ShaDa file dumping: header, jump list, history, search patterns, substitute 
   strings.

What does not:

1. ShaDa file dumping: registers.
2. ShaDa file reading.

Most was not tested.

TODO:

1. Merging.
2. Reading (ShaDa file reading is present, it just does not work) history, local 
   marks, jump and buffer lists.
3. Documentation update.
4. Converting some data from &encoding.
5. Safer variant of dumping viminfo (dump to temporary file then rename).
6. Removing old viminfo code (currently masked with `#if 0` in a ShaDa file for 
   reference).

Ref neovim#999.
ZyX-I added a commit to ZyX-I/neovim that referenced this issue Apr 25, 2015
What works:

1. ShaDa file dumping: header, jump list, history, search patterns, substitute 
   strings.

What does not:

1. ShaDa file dumping: registers.
2. ShaDa file reading.

Most was not tested.

TODO:

1. Merging.
2. Reading (ShaDa file reading is present, it just does not work) history, local 
   marks, jump and buffer lists.
3. Documentation update.
4. Converting some data from &encoding.
5. Safer variant of dumping viminfo (dump to temporary file then rename).
6. Removing old viminfo code (currently masked with `#if 0` in a ShaDa file for 
   reference).

Ref neovim#999.
ZyX-I added a commit to ZyX-I/neovim that referenced this issue Apr 25, 2015
What works:

1. ShaDa file dumping: header, jump list, history, search patterns, substitute
   strings.

What does not:

1. ShaDa file dumping: registers.
2. ShaDa file reading.

Most was not tested.

TODO:

1. Merging.
2. Reading (ShaDa file reading is present, it just does not work) history, local
   marks, jump and buffer lists.
3. Documentation update.
4. Converting some data from &encoding.
5. Safer variant of dumping viminfo (dump to temporary file then rename).
6. Removing old viminfo code (currently masked with `#if 0` in a ShaDa file for
   reference).

Ref neovim#999.
@justinmk justinmk added the needs:design needs a clear design proposal label Jul 16, 2015
@justinmk
Copy link
Member

justinmk commented Jan 8, 2018

closed by #2506

@justinmk justinmk closed this as completed Jan 8, 2018
@justinmk justinmk added the enhancement feature request label Jan 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement feature request needs:decision discussion has run its course, need decision how to proceed needs:design needs a clear design proposal
Projects
None yet
Development

No branches or pull requests

4 participants