[RFC] system() prefer pipes over tmpfiles #807

Closed
wants to merge 8 commits


4 participants

@aktau
Neovim member

As mentioned by @tarruda, it's a bit nasty that system() has to use tempfiles; we can remove that. This is a proof of concept.

@Shougo, if you feel up for it, you can benchmark this branch to see if this gains us anything. I might have to turn off the debugging messages first, though; they might be slowing everything down (I suspect so).

ref #473

aktau added some commits Jun 3, 2014
@aktau aktau os/shell: constify arguments
Minor fixes
4a65208
@aktau aktau os/shell: implement os_system and os_run_sync
Trying to separate the concerns that os_call_shell had, with the goal that
os_call_shell will be slimmer in the end and we can support pipe-only
system() calls.
051e46d
@aktau aktau eval: reimplement f_system on top of os_system
This avoids the tempfile problem (unless of course one manually adds
redirects to the shell command line, which some plugins seem to do,
e.g. vim-easytags).
c03d305
@justinmk
Neovim member

Really looking forward to this, awesome work @aktau .

@Shougo

Thanks. I will test it.
But it fails on Travis CI. Please check it.

@justinmk justinmk and 1 other commented on an outdated diff Jun 4, 2014
src/nvim/eval.c
- res = get_cmd_output(get_tv_string(&argvars[0]), infile,
- kShellOptSilent | kShellOptCooked);
+ set_vim_var_nr(VV_SHELL_ERROR, (long) status);
+
+ if (res) {
+ DLOG("os_system <- %.*s\n", nread, res);
+ } else {
+ DLOG("os_system <- (null)");
+ }
#ifdef USE_CR
@justinmk
justinmk Jun 4, 2014

I think we can remove this.

couldn't help myself: #808

@justinmk justinmk commented on the diff Jun 4, 2014
src/nvim/os/shell.c
+ (*rd->exited)++;
+ uv_read_stop(stream);
+ uv_close((uv_handle_t *) stream, NULL);
+ }
+
+ return;
+ }
+
+ // check if libuv is done reading for now
+ if (nread == 0) {
+ return;
+ }
+
+ // grow the dynamic buffer if necessary
+ // TODO(aktau): extract into an ensure() function
+ if (rd->buf.cap < rd->buf.len + (size_t) nread) {
@justinmk
justinmk Jun 4, 2014

where is cap initialized?

@aktau
aktau Jun 4, 2014

It's initialized by memset(&rd, 0, sizeof(rd)). Unfortunately, clang likes to warn when I assign = { 0 }, which is a real bummer as it's faster and cleaner.

This dyn_buffer_t should be folded into kvec I think (we already forked kvec anyway). As you can see, it uses the same growing algorithm. The reason I didn't use kvec is because:

  1. There's no macro for getting the underlying array (so you'd have to know what its name is, which is not very clean)
  2. There's no macro for "ensuring" a bigger size (there's kv_resize but that one doesn't do array doubling afaik)
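For illustration only, an "ensure" macro with the same doubling strategy might look like this. The names kv_ensure and bytevec_t are invented for this sketch; they are not part of nvim's forked kvec.h:

```c
#include <stdlib.h>

// Hypothetical kv_ensure(): grow capacity by doubling until at least
// `need` elements fit. This mirrors kv_push's doubling strategy, unlike
// kv_resize, which sets the capacity to an exact value.
#define kv_ensure(type, v, need) \
  do { \
    if ((v).m < (need)) { \
      size_t kv_new_cap__ = (v).m ? (v).m : 8; \
      while (kv_new_cap__ < (need)) { \
        kv_new_cap__ <<= 1; \
      } \
      (v).a = (type *) realloc((v).a, sizeof(type) * kv_new_cap__); \
      (v).m = kv_new_cap__; \
    } \
  } while (0)

// A kvec-style growable buffer: n = length, m = capacity, a = array.
typedef struct { size_t n, m; char *a; } bytevec_t;
```

With something like this, the read callback's "grow the dynamic buffer if necessary" branch would reduce to a single kv_ensure(char, rd->buf, rd->buf.len + nread), and the underlying array is simply buf.a.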
@justinmk justinmk commented on the diff Jun 4, 2014
src/nvim/os/shell.c
+/// @param[out] nread the number of bytes in the returned buffer (if the
+/// returned buffer is not NULL)
+/// @return the exit status code, traditionally 0 for success on UNIX.
+int os_run_sync(char *const argv[],
+ const char *input,
+ size_t len,
+ char **output,
+ size_t *nread)
+{
+ assert(argv);
+
+ if (len == 0) {
+ input = NULL;
+ }
+
+ uv_loop_t *loop = uv_default_loop();
@justinmk
justinmk Jun 4, 2014

I think it's ok to just call uv_default_loop() whenever you need it. Can't remember if there was a reason we stopped assigning it like this.

@aktau
aktau Jun 4, 2014

I don't think it matters much either way. Most of the code is taken from @tarruda's work on os_call_shell() and what I learned from the (outdated) libuv book. It functions as a proof of concept. What do you think of the API and the fact that it isn't async as first proposed?

@justinmk
justinmk Jun 4, 2014

The API looks natural. 👍 to hiding the libuv plumbing.

Where was the suggestion for async? I don't see how it could be async without breaking legacy plugins. If scripts need async system() they should use jobstart().

Looking at the old get_cmd_output and make_filter_cmd, it was literally building a command string to send to the shell, and input was sent via a < shell redirect. :help shelltemp says Vim uses pipes if shelltemp=0, but that doesn't appear to affect system().

@aktau
aktau Jun 4, 2014

Yes, after grokking get_cmd_output and make_filter_cmd all I could think was: what a mess.

I think system() can't benefit from async (at least not without explicit callback support in VimL, thus possibly adding a system_async() call as we can't change system()). For filtering !range it could be nice to provide continuous updates for very long running commands though. That should probably be another function: os_run_async(), which would conform more to @tarruda's suggested interface. The optimal case for filtering huge files is actually async both ways: with a write callback and a read callback. Perhaps I can layer that in somewhere.

Note that in my opinion, when one requires async, one can't inherit the neovim FDs, so that will hopefully allow us to separate some concerns and hide the complexity of the implementation, providing more code reuse.

@aktau
aktau Jun 4, 2014

Also, only slightly related, but: it bothers me to no end that files such as eval.c have whitespace at the end of some lines. I have to use git addnw (which doesn't always work correctly, because sometimes I do want whitespace changes) or worse, git add -p, after every file that has this. Can we just get rid of that?

@justinmk
justinmk Jun 4, 2014

adding a system_async() call

How would that differ from jobstart() + autocmd JobActivity (added in #475)?

files such as eval.c have whitespace at the end of some lines

Depends on how much conflicts it would cause...

@aktau
aktau Jun 4, 2014

How would that differ from jobstart() + autocmd JobActivity (added in #475)?

It wouldn't (really) for the VimL system_async() I was talking about, so what I said is moot. It could be useful for faster and/or more responsive filtering, though.

Depends on how much conflicts it would cause...

Conflicts with? Are there many PRs changing eval.c in the pipeline? (save for mine and yours).

@aktau
aktau Jun 4, 2014

After reading it over, I think we could reuse some code from the jobs implementation. We would especially need a way to wait synchronously for jobs.

@aktau
Neovim member

@Shougo it should compile fine now. It's possible that there's still a leak somewhere, but most of the Travis steps are working.

It appears I had some problems with clint (of course), an unused variable (clang doesn't warn for that, gcc does), and on_exit already being declared in a standard header on Linux. All fixed. You can test now :).

@aktau aktau changed the title from System prefer pipes over tmpfiles to [rfc] system() prefer pipes over tmpfiles Jun 4, 2014
@aktau aktau changed the title from [rfc] system() prefer pipes over tmpfiles to [RFC] system() prefer pipes over tmpfiles Jun 4, 2014
@aktau
Neovim member

All tests pass. Just removed debugging output so real benchmarking can be done.

@aktau
Neovim member

One thing I've been wondering about: this just ignores stderr completely. The user can work around that (on UNIX at least) with 2>&1, but does anyone think we should be returning a separate buffer for stderr or something? Are there tools that always return significant info on stderr? What happens in old-vim (afaik the same: the user can redirect if (s)he wants)?

@justinmk
Neovim member

What happens in old-vim (afaik the same: user can redirect if (s)he wants)?

Yes: http://stackoverflow.com/a/2575803/152142

does anyone think we should be returning a separate buffer for stderr

The job control feature returns an array of stdout/stderr, if I recall. It could be useful to provide a synchronous alternative to job control that returns a similarly structured array. When job control gets the "job info" feature that also returns the pid, etc., then the system() counterpart should also include that type of information.

@aktau
Neovim member

Yes, or via temp files: http://stackoverflow.com/a/2575803/152142

That question indicates that bang :! captures both stdout and stderr by default. Doesn't seem like very nice behaviour.

The job control feature returns an array of stdout/stderr, if I recall. It could be useful to provide a synchronous alternative to job control that returns a similarly structured array. When job control gets the "job info" feature that also returns the pid, etc., then the system() counterpart should also include that type of information.

I've been thinking of making system() return richer information. At first I wasn't sure if the pid would be helpful. There are two cases:

  1. regular shell pipeline: ... | grep | ...
  2. quasi-async run a process: ... | grep | ... &

The latter one could perhaps benefit from returning a pid, but it would be the pid of the running shell process. Don't know if that would be... useful? Maybe if people started using exec? Maybe we should create our own pipeline fallback so we can get at the raw process? Seems like duplication of effort though; we're not a shell.

@aktau
Neovim member

@tarruda do you think something like this could work for synchronously waiting for a job to exit?

static void job_wait_timer_cb(uv_timer_t *handle)
{
  handle->data = (void *) (intptr_t) true;
  uv_close((uv_handle_t *) handle, NULL);
}

/// job_wait - synchronously waits for a job to finish
///
/// @param id The job id
/// @param timeout Number of milliseconds to wait, 0 for not waiting, -1 for
///        waiting until the job quits.
/// @return returns the status code of the exited job, -1 if the job was not
///         found. -2 if the job is still running and the `timeout` has
///         expired. -3 if an interrupt was caught.
int job_wait(int id, int timeout)
{
  // find the job
  Job *job = find_job(id);

  if (job == NULL) {
    return -1;
  }

  // if the job is still busy but the caller doesn't want to wait
  if (timeout == 0 && job->pending_closes > 0) {
    return -2;
  }

  // the caller wants to wait a set amount of time, set a timer
  uv_timer_t timer;
  if (timeout > 0) {
    uv_timer_init(uv_default_loop(), &timer);
    timer.data = (void *) (intptr_t) false;
    uv_timer_start(&timer, job_wait_timer_cb, (uint64_t) timeout, 0);
  }

  // keep running the loop until everything is completely closed or the
  // timer expires
  while (job->pending_closes > 0) {
    if (timeout > 0 && (bool) (intptr_t) timer.data) {
      // the timer expired (and already closed itself in job_wait_timer_cb)
      return -2;
    }

    uv_run(uv_default_loop(), UV_RUN_ONCE);

    if (got_int) {
      // forward SIGINT to the job, stop waiting
      job_stop(id);
      if (timeout > 0 && !(bool) (intptr_t) timer.data) {
        uv_close((uv_handle_t *) &timer, NULL);
      }
      return -3;
    }
  }

  // the job finished in time, dispose of the timer so the handle isn't leaked
  if (timeout > 0 && !(bool) (intptr_t) timer.data) {
    uv_close((uv_handle_t *) &timer, NULL);
  }

  return (int) job->status;
}

That way we might be able to repurpose some of the job api to do everything we want (and eliminate the bulk of shell.c), if I'm not mistaken.

Sadly, the got_int flag is a bit of a conundrum. We don't know which event-loop it was for, and absorbing it in one causes the other one not to "get it" anymore.

@tarruda
Neovim member

@aktau great work here

@tarruda do you think something like this could work for synchronously waiting for a job to exit?

Yes, we can reuse job control/stream abstractions for this, but some work needs to be done first. I was going to focus on this when I started implementing the python ex command, because a similar mechanism will be required (we need to simulate the blocking nature of the current implementation), but since you started working on system I'm going to describe the problem so you can have a shot at it.

What we need is a way to 'lock' a set of event sources by ignoring all other event sources except the locked ones. There are two ways I can see to achieve this:

  • By writing a libuv helper function (probably in uv_helpers.c) that receives a set of libuv handles to lock its 'attention' on. What this helper does is relatively simple: it iterates through all registered handles, temporarily disabling those that are not contained in the set.
  • We already defer the real processing of events to vim's main loop by pushing them to the queue in event.c, so another approach is to reuse this in some way to ignore all event sources except those we are interested in locking on.

I think the first approach would require more libuv knowledge, because each handle type is 'disabled' in a different way. We need to call uv_read_stop on a uv_stream_t, for example. Perhaps @saghul can offer some extra insight on this.

For the second approach, some refactoring would be needed: Events that come from the locked sources would need to be put in a separate temporary queue so the main loop wouldn't accidentally process events from other sources that are being ignored.
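To make the second approach a bit more concrete, here is a minimal model (all names invented for this sketch, not nvim's actual event.c API): each queued event carries a tag for the source that produced it, and while "locked" on one source we process only its events, leaving the rest queued for the main loop:

```c
#include <stddef.h>

// An event tagged with the source (channel/job/stream) that produced it.
typedef struct { int source; int payload; } Event;

enum { EVENT_QUEUE_MAX = 32 };
static Event event_queue[EVENT_QUEUE_MAX];
static size_t event_queue_len = 0;

// Called from libuv callbacks: defer the event instead of handling it.
static void event_push(int source, int payload)
{
  if (event_queue_len < EVENT_QUEUE_MAX) {
    event_queue[event_queue_len++] = (Event) { source, payload };
  }
}

// Handle only events from `locked_source`, copying their payloads to `out`;
// events from every other source are compacted back and stay deferred.
static size_t event_process_locked(int locked_source, int *out)
{
  size_t kept = 0;
  size_t processed = 0;
  for (size_t i = 0; i < event_queue_len; i++) {
    if (event_queue[i].source == locked_source) {
      out[processed++] = event_queue[i].payload;
    } else {
      event_queue[kept++] = event_queue[i];
    }
  }
  event_queue_len = kept;
  return processed;
}
```

So while (say) a python snippet holds the lock on its channel, events from other plugins survive in the queue untouched and are processed once the lock is released.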

I'm not sure which approach is better for this, or if there are problems I can't see yet, but unfortunately I can't work on this right now because I'm focused on getting the redraw events right. In any case I will be here to answer questions in case you want to give it a shot.

If you decide to try implementing this lock mechanism, make sure it also receives a 'lock timeout' so we won't be blocked forever if the 'python' process hangs while running the python ex command, for example (we can report something like 'the python co-process stopped responding and was killed'). For 'system' a timeout isn't needed, but we need to take into account interrupts sent by the user (this is already done by placing the terminal in cooked mode and checking for 'got_int').

@aktau
Neovim member

What we need is a way to 'lock' a set of event sources by ignoring all other event sources except the locked ones.

When you find the time, could you explain the rationale behind this a bit more? Maybe with a specific example, like why wouldn't something like the job_wait I proposed work. I think I know a bit too little about neovim's new evented architecture to see all the dangers (or even libuv, which I'm just discovering).

The way I reasoned about it was thus:

  • In os_call_shell, os_run_sync and job_wait we are running the default loop recursively (I assume we always come from a callback called by the uv_run in event_poll). This would be bad if it caused the loop in event_poll to miss things, but as far as I can see this is not the case: it's just a simple uv_run with no processing. This indicates to me that all the real work is done in the callbacks, which is a good thing. The big problem I see is when one of the recursive loops absorbs all of the events and then there are no more events for a while (the user doesn't type anything; all reads, writes and closes are done). Then it could stop the other loop dead in its tracks. I'm not even sure if this can happen. It boggles the mind a little bit.

Another thing I thought about was: what if we run things like one-off jobs (like os_system and jobs destined for job_wait) in their own event loop? Could that work?

@Shougo

I tested it.

Vim: 0.277ms
neovim - current: 0.151ms
neovim - this: 0.140ms

This branch is faster than the current implementation.

@tarruda
Neovim member

When you find the time, could you explain the rationale behind this a bit more?

We would need the lock mechanism anyway, because the python/ruby/perl... ex commands will be running code in a co-process. Consider this scenario:

python << EOF
vim.current.line = 'line'
... some work
current_line = vim.current.line
EOF

Plugins that have snippets like this assume nothing can change vim's state between the first and last line, but that is clearly not the case in the new architecture, where everything is asynchronous (another plugin could very well change the current line, or the contents of the whole buffer, while the above snippet is running). We need the lock mechanism to tell neovim this: "While this snippet of python code is being executed, ignore events that are not related to the channel of the python process". Here's how the ex command is translated in terms of RPC messages:

neovim -> python host: {event: 'eval', arg: '...python code...'}
python host -> neovim: {method: 'vim_set_current_line'...}
python host -> neovim: {method: 'vim_get_current_line'...}
python host -> neovim: {method: 'unlock'} // an 'eval' event implicitly locks neovim, and must be unlocked by the host after evaluation is complete

This lock/unlock mechanism would also be exposed as API functions, so plugins could synchronize code running in a background thread, for example.

Maybe with a specific example, like why wouldn't something like the job_wait I proposed work

That wouldn't work, because asynchronous events are queued in event.c and processed by vim's main loop (including events emitted by jobs). So the job callback wouldn't be executed until we yield back to vim.

In os_call_shell, os_run_sync and job_wait we are running the default loop recursively (I assume we always come from a callback called by the uv_run in event_poll).

Libuv does not support recursion in the event loop (it might work in some cases, but it's not supported), so no, it's not running recursively: when events arrive we simply yield to vim's main loop (events are simulated by returning a special key to the main loop).

Another thing I thought about was: what if we run things like one-off jobs (like os_system and jobs destined for job_wait) in their own event loop? Could that work?

The problem with this is that even when running one-off synchronous jobs, we would still be interested in some other events, such as interrupts sent by the user. It could work, but we would have to re-add all the watchers whose events we care about to this new loop. I think it would be simpler to temporarily 'silence' all other event sources with the lock helper I mentioned.

@tarruda
Neovim member

Now that I think about it more, it might be possible to use your job_wait function if you add an 'async' flag to job_start that is passed to the RStream constructor. Passing 'false' to this flag would allow rstream events to be processed immediately: https://github.com/neovim/neovim/blob/master/src/nvim/os/rstream.c#L41-L44

Update: yes, your job_wait function is the right way to go. The lock mechanism will only be used for synchronizing access to the RPC API and wouldn't work for this case.

@aktau
Neovim member

This branch is faster than current implementation.

Good to know. I expected more of a speedup from using only pipes, as temporary files do have some overhead. Where did you test this, on Windows? Also, which commands did you pass to system()?

EDIT: very funny of me, neovim can't even be compiled on Windows yet...

Now that I think about it more, it might be possible to use your job_wait function if you add an 'async' flag to job_start that is passed to the RStream constructor. Passing 'false' to this flag would allow rstream events to be processed immediately: https://github.com/neovim/neovim/blob/master/src/nvim/os/rstream.c#L41-L44

I'm going to need to look at that a little bit harder next time I get some more time to tinker, because I need to revise my mental model of the neovim architecture; as you said, there are no recursive calls.

@Shougo
let cnt = 25

let start = reltime()

for i in range(1, cnt)
  call system('ls')
endfor

echomsg 'system()         = ' . reltimestr(reltime(start))

This is the test script. I tested it on Ubuntu 14.04.

@tarruda
Neovim member

I'm going to need to look at that a little bit harder next time I get some more time to tinker, because I need to revise my mental model of the neovim architecture; as you said, there are no recursive calls.

This issue made me think harder about how to simulate the synchronous nature of the python ex command, and the answer is indeed a function like job_wait, even for the RPC calls. I can't simply 'silence' all events; the call stack itself needs to be locked on another event-processing loop. I still need to think more about the subject before implementing the python ex command, but for now this will be enough:

  • Add an 'async' parameter to the job_start function
  • Pass this flag to the RStream constructors for stdout and stderr created in job_start
  • When starting the job you will call job_wait on, pass 'false' as the async argument (pass 'true' in all other occurrences of job_start)
  • Take the 'async' flag into consideration in the job's exit_cb (don't push the event to the queue; invoke the callback immediately if async is false)

I think 'async' is a bad name for the argument; 'defer' is a better choice (since we are deferring event processing to vim's main loop).

@aktau
Neovim member

@Shougo thanks!

@tarruda: Alright. As far as I can follow, the only thing async = true really does is defer the callback for the read stream another time (by pushing it on the event queue). Now I'm not so sure how the event queue and the uv loop interact anymore. It's getting pretty tangly. I'm a bit more sure though, that the behaviour of os_run_sync as I had implemented it, can be emulated with async = false.

@aktau
Neovim member

What I'm trying to say is: I don't immediately see why this double deferral is necessary for the async case?

if async == true:
  push_event -> (next event loop) poll_event (calls: rstream->cb())
if async == false:
  rstream->cb()

Why is it like this?

@tarruda
Neovim member

What I'm trying to say is: I don't immediately see why this double deferral is necessary for the async case?

It's because uv_run doesn't support recursive calls, which could happen if we processed a callback immediately (since callbacks are invoked from uv_run). To be safe, we exit the uv_run call before processing events.

In the case of a function like os_call_shell, this recursion would not be possible, since it does not call vimscript (see the confirm function for an example).
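A tiny sketch of that deferral, with invented names (this is not the real event.c): callbacks fired inside uv_run only enqueue work, and the queue is drained after uv_run returns, so a callback that itself polls can never re-enter uv_run:

```c
#include <stddef.h>

typedef void (*deferred_cb)(void *data);
typedef struct { deferred_cb cb; void *data; } Deferred;

enum { DEFERRED_MAX = 32 };
static Deferred deferred[DEFERRED_MAX];
static size_t deferred_len = 0;

// Called from inside a libuv callback (i.e. from within uv_run):
// just record the work, don't do it yet.
static void defer(deferred_cb cb, void *data)
{
  if (deferred_len < DEFERRED_MAX) {
    deferred[deferred_len++] = (Deferred) { cb, data };
  }
}

// Called from vim's main loop, after uv_run has returned.
static void run_deferred(void)
{
  for (size_t i = 0; i < deferred_len; i++) {
    deferred[i].cb(deferred[i].data);
  }
  deferred_len = 0;
}

// Example callback: increment an int through its address.
static void bump(void *data)
{
  ++*(int *) data;
}
```

In this model, passing async/defer = false would correspond to calling the callback directly instead of going through defer(), which is safe only when the callback can't poll again.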

@aktau
Neovim member

That should probably be enough information for my next analysis. Thanks!

@tarruda
Neovim member

And in case you are wondering why we have to yield the KE_EVENT key to vim's main loop instead of simply processing the events here (after the uv_run call): it's because redraw wasn't working properly when doing so.

As we increase our test coverage and gain more knowledge of how vim main loop works, I hope to get rid of KE_EVENT and process asynchronous events without yielding.

@aktau
Neovim member

The events + uv-loop + vim loop interaction is dark and full of terrors, I'm surprised you got it working this well. Perhaps a diagram or even a dedicated wiki page wouldn't be bad...

By the way, I like the defer parameter name better too.

@tarruda
Neovim member

The events + uv-loop + vim loop interaction is dark and full of terrors, I'm surprised you got it working this well. Perhaps a diagram or even a dedicated wiki page wouldn't be bad...

It was the only way I found to implement support for asynchronous events in the short term. The reason is that vim has its own event loops (one loop for each mode, where the only event is "keypress") that are fully entangled with the rest of the code.

Eventually when nvim state is decoupled from IO, we'll be able to use libuv in a more natural way.

@aktau
Neovim member

@tarruda I haven't been keeping up with the job system code. If time permits, I'll have a look at this again this week so we have something good for the next newsletter (users love performance improvements that fix issues at the same time). I know you sometimes change things up in the event and jobs code, so I'd like to ask whether your recommendations on implementing this still stand.

@tarruda
Neovim member

@aktau this should be much simpler now since it's possible to choose a set of event sources when calling event_poll.

I recommend looking at the channel_send_call implementation, since the problem is similar, but here's an idea of how the event_poll bits will look:

EventSource sources[] = {job_event_source(job), signal_event_source(), NULL};
do {
  event_poll(-1, sources);
} while (condition);

condition can be some flag that will be set in the job_exit callback (maybe pass its address as the job data). signal_event_source() is required to handle signals while polling for events (if the terminal was set to cooked mode, got_int will be set on SIGINT).

@aktau
Neovim member

@aktau this should be much simpler now since it's possible to choose a set of event sources when calling event_poll.

Alright, I've been looking at it a bit; it took a little while to wrap my head around. BTW: can we get in the habit of prefixing globals with g_ or something? It always trips me up when reading code: "huh, where did that variable come from?". Also, we can use C99 variable declarations (loop() in event.c declares size_t count up front like in C89, for example).

@aktau
Neovim member

#978 was merged, closing this.

@aktau aktau closed this Jul 27, 2014
@aktau aktau deleted the aktau:system-prefer-pipes-over-tmpfiles branch Jul 27, 2014