[RFC] Reimplement job control patch with libuv #475

Merged
merged 1 commit into from Apr 7, 2014

5 participants

@tarruda
Member
tarruda commented Apr 5, 2014

This is an improved version of the job control patch I sent to the vim mailing list a few months ago, and it will be the base for the new plugin architecture. The job module implemented here will be reused for spawning plugins.

The basic difference between plugins and plain jobs started with jobstart is that, instead of invoking autocommands that pass the raw data to the vimscript programmer, plugins will have direct access to the Neovim msgpack API.

This is still a WIP, so I'm still going to fix documentation, formatting, etc. (this was basically a copy and paste from the patch).

@schmee
Contributor
schmee commented Apr 5, 2014

Great to see the new plugin architecture taking form! 🎉

@oni-link oni-link commented on an outdated diff Apr 5, 2014
+static void write_cb(uv_write_t *req, int status)
+{
+ free(req->data);
+}
+
+static void exit_cb(uv_process_t *proc, int64_t status, int term_signal)
+{
+ Job *job = proc->data;
+
+ table[job->id - 1] = NULL;
+ uv_close((uv_handle_t *)&job->proc_stdout, NULL);
+ uv_close((uv_handle_t *)&job->proc_stdin, NULL);
+ uv_close((uv_handle_t *)&job->proc_stderr, NULL);
+ shell_free_argv(job->proc_opts.args);
+ free(job);
+}
@oni-link
oni-link Apr 5, 2014 Contributor

Looks like the process watcher is not closed.

@justinmk
Member
justinmk commented Apr 6, 2014

Couldn't wait, so I just tested it. It almost works :)

In case it's of any value, I get Abort trap: 6 with the following steps:

Create a simple bash script:

$ echo "sleep 3 && exit 42" > test.sh && chmod u+x test.sh

Do the following in nvim:

:au JobActivity * echom string(v:job_data)
:call jobstart('foo', '/Users/justin/test.sh') 
@tarruda
Member
tarruda commented Apr 6, 2014

@justinmk this last commit should have fixed it. Here's something else you can use to play with it:

let g:nc = jobstart('netcat', 'nc', ['-l', '1234'])
au JobActivity netcat call append(line('$'), v:job_data[1])

Open another terminal and type the following:

nc localhost 1234

and start entering lines. Also, calling jobwrite(g:nc, str) should write str to the terminal running the netcat client.

Note that I'm still not handling the redraw issue correctly: for now it's simply calling shell_resized(), which is sub-optimal. (I still need to figure out the right functions to call for redrawing only the invalidated columns.)

@oni-link oni-link commented on an outdated diff Apr 6, 2014
+
+typedef void (*job_read_cb)(int id, void *data, char *buffer, uint32_t len, ReadType type);
+
+/// Initializes job control resources
+void job_init(void);
+
+/// Releases job control resources and terminates running jobs
+void job_teardown(void);
+
+/// Starts a new job
+///
+/// @param argv Argument vector for the process
+/// @param data Caller data that will be associated with the job
+/// @param cb Callback that will be invoked everytime data is available in
+/// the job's stdout/stderr
+/// @return The job id
@oni-link
oni-link Apr 6, 2014 Contributor

job_start can fail. Returns 0 in case of error.
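For reference, the eval.c hunk quoted further down treats the return value as three cases: a positive job id on success, 0 when the job table is full, and a negative value when the spawn fails. A minimal caller-side sketch assuming that convention; `job_start_error` is a hypothetical helper, not part of the patch, and the message strings are copied from the globals.h hunk (with the \"%s\" quoting suggested in review).

```c
#include <stddef.h>

/// Hypothetical helper: maps a job_start() return value to the message the
/// caller should report, or NULL on success.
/// Assumed convention (from the eval.c hunk): > 0 is a job id, 0 means the
/// job table is full, < 0 means the program could not be spawned.
static const char *job_start_error(int ret)
{
  if (ret > 0) {
    return NULL;                                // success: ret is the job id
  }
  if (ret == 0) {
    return "E901: Job table is full";
  }
  return "E902: \"%s\" is not an executable";   // caller formats in argv[0]
}
```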

@tarruda
Member
tarruda commented Apr 6, 2014

I'm curious: is there some "rule" against the use of goto statements? Many say it's always bad practice, but personally I think they are a simple way to perform cleanup in a function (I'm currently using one in f_job_start). One can also extract cleanup code into a separate function like I have done here, but I think it's a bit of overkill. Here is an interesting article about the subject.

I think this is something we might want to establish in our code style; as I said, I'm OK with gotos for cleanup code.
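A minimal sketch of the goto-for-cleanup pattern under discussion, on a toy function rather than f_job_start (whose full body isn't shown here): every failure path jumps to a single label that releases whatever was acquired so far. `dup_pair` and `dup_str` are illustrative names, not code from the patch.

```c
#include <stdlib.h>
#include <string.h>

/// Heap copy of a C string (illustrative stand-in for xstrdup()).
static char *dup_str(const char *s)
{
  size_t n = strlen(s) + 1;
  char *p = malloc(n);
  if (p != NULL) {
    memcpy(p, s, n);
  }
  return p;
}

/// Copies two strings. Returns 0 on success (caller owns *out_a/*out_b),
/// or -1 on failure, in which case nothing is leaked.
static int dup_pair(const char *a, const char *b, char **out_a, char **out_b)
{
  char *copy_a = NULL;
  char *copy_b = NULL;

  if (a == NULL || b == NULL) {
    goto cleanup;
  }
  if ((copy_a = dup_str(a)) == NULL) {
    goto cleanup;
  }
  if ((copy_b = dup_str(b)) == NULL) {
    goto cleanup;
  }

  // Success: ownership passes to the caller
  *out_a = copy_a;
  *out_b = copy_b;
  return 0;

cleanup:
  // free(NULL) is a no-op, so this handles every partial-progress state
  free(copy_a);
  free(copy_b);
  return -1;
}
```

The single label keeps the release logic in one place, which is the main argument for goto in C where there is no RAII or garbage collection to do it automatically.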

@justinmk
Member
justinmk commented Apr 6, 2014

@tarruda The dogmatism surrounding anti-goto sentiment is cargo-cult adherence to Dijkstra's famous "goto considered harmful" essay. It's still the right tool for C occasionally.

The essay was originally written in the context of trying to convince assembly programmers of the 1970s that function dispatch was not going to kill performance.

One can also extract cleanup code into a separate function like I have done here but I think it's a bit of overkill

Exactly. In languages with private functions (and no forward declaration requirement) and RAII or garbage collection, goto is a code smell. But C doesn't have those things, so AFAIK goto is acceptable in some limited cases.

@tarruda
Member
tarruda commented Apr 7, 2014

@justinmk thanks for the link, interesting read :)

@tarruda tarruda changed the title from [WIP] Reimplement job control patch with libuv to [RFC] Reimplement job control patch with libuv Apr 7, 2014
@tarruda
Member
tarruda commented Apr 7, 2014

I'm not satisfied with having to call shell_resized after handling each job event. The problem seems to be that the whole redrawing logic is scattered across edit.c/normal.c (not sure though; I have tried playing with the functions in screen.c but couldn't find anything obvious) and is executed whenever keys are pressed. The shell_resized call can be removed by using the 'events as keys' hack I proposed in #395, which will make vim redraw after asynchronous events more 'naturally'. In any case, this is something to be fixed in another PR.

Some may be wondering why I'm using job ids instead of opaque pointers (I could have made the fields of Job private to the job module, for example), and the reason is simple: I couldn't find a way to pass arbitrary pointers to vimscript, so the simplest solution was to use integer ids.
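A sketch of how integer ids can stand in for pointers, based on the patch's own description that the job id is "the index in the job table plus one" and on its find_job() bounds check. Only the id field of Job is modeled; `table_insert` and `table_remove` are hypothetical helpers. Reserving 0 lets it double as the "table full" result.

```c
#include <stddef.h>

#define MAX_RUNNING_JOBS 100

typedef struct job {
  int id;  // index in the job table plus one; 0 is never a valid id
} Job;

static Job *table[MAX_RUNNING_JOBS];

/// Stores `job` in the first free slot and returns its id, or 0 when the
/// table is full.
static int table_insert(Job *job)
{
  for (int i = 0; i < MAX_RUNNING_JOBS; i++) {
    if (table[i] == NULL) {
      table[i] = job;
      job->id = i + 1;
      return job->id;
    }
  }
  return 0;
}

/// Maps an id coming from vimscript back to a job, rejecting ids that are
/// out of range or refer to an empty slot.
static Job *find_job(int id)
{
  if (id <= 0 || id > MAX_RUNNING_JOBS) {
    return NULL;
  }
  return table[id - 1];
}

/// Frees the slot so the id can be reused by a later job.
static void table_remove(Job *job)
{
  table[job->id - 1] = NULL;
}
```

Because vimscript only ever sees the number, a stale or forged id degrades into a NULL lookup instead of a dangling pointer.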

I think this is ready for review, but I will only merge it after #476. Once this is merged I can implement the msgpack API, and hopefully we'll be able to use python plugins.

@justinmk
Member
justinmk commented Apr 7, 2014

@tarruda Playing around with netcat worked beautifully. One question so far: there doesn't seem to be any protection from overwriting an existing job with the same job name:

:call jobstart('netcat', 'nc', ['-l', '1234'])
:call jobstart('netcat', 'nc', ['-l', '1234'])

For a plugin to avoid collisions like this, it would be useful if jobstart() with, say, an empty string as the first parameter returned the next available job id and registered the job without overwriting an existing job. Would it be a bad idea for au JobActivity 1 to match job id 1? That would probably require disallowing job names starting with a number.

@tarruda
Member
tarruda commented Apr 7, 2014

@justinmk they won't overwrite each other, I think what happened is that the second call to netcat failed because it tried listening on the same port as the first.

When you pass a name to jobstart, you are actually saying to Neovim: 'When that job sends you some data, emit an "event" with that name'. This is similar to the event emitter pattern in JavaScript, except that I tried to use the autocommand infrastructure for it. You can also listen for a namespace of events like this:

:let srv1_id = jobstart('netcat-server-1', 'nc', ['-l', '9991'])
:let srv2_id = jobstart('netcat-server-2', 'nc', ['-l', '9992'])

au JobActivity netcat-server-* call append(line('$'), 'message from server '.v:job_data[0].': '. v:job_data[1])
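To make the event-emitter analogy concrete, here is a toy dispatcher in C: listeners register a glob pattern and each emitted event name is delivered to every matching listener. It is only an analogy; Neovim routes these through autocommands and Vim's own pattern matching, for which POSIX fnmatch() is a stand-in here. `on_event`, `emit` and `count_listener` are hypothetical names.

```c
#include <fnmatch.h>
#include <stddef.h>

typedef void (*listener_fn)(const char *event_name, const char *data,
                            void *ctx);

typedef struct {
  const char *pattern;  // e.g. "netcat-server-*", like the :au pattern
  listener_fn fn;
  void *ctx;
} Listener;

#define MAX_LISTENERS 16

static Listener listeners[MAX_LISTENERS];
static size_t listener_count;

/// Registers a listener; returns -1 when the fixed-size table is full.
static int on_event(const char *pattern, listener_fn fn, void *ctx)
{
  if (listener_count == MAX_LISTENERS) {
    return -1;
  }
  listeners[listener_count++] = (Listener){ pattern, fn, ctx };
  return 0;
}

/// Delivers `data` to every listener whose pattern matches `event_name`
/// and returns how many listeners were invoked.
static size_t emit(const char *event_name, const char *data)
{
  size_t called = 0;
  for (size_t i = 0; i < listener_count; i++) {
    if (fnmatch(listeners[i].pattern, event_name, 0) == 0) {
      listeners[i].fn(event_name, data, listeners[i].ctx);
      called++;
    }
  }
  return called;
}

/// Example listener: counts deliveries in the int pointed to by ctx.
static void count_listener(const char *event_name, const char *data,
                           void *ctx)
{
  (void)event_name;
  (void)data;
  (*(int *)ctx)++;
}
```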
@tarruda
Member
tarruda commented Apr 7, 2014

I've updated this to use the modifications sent by @philix in #476, but I don't like the way we have to forward declare the Event/Job types to avoid a circular dependency between the headers. Perhaps we should move typedefs/structs to *_def.h files?
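One way the *_def.h idea could look, sketched in a single file with comments marking where each piece would live (the file names are hypothetical): the job side only forward-declares `struct job`, which is enough for the event side to hold a `Job *`, and only job.c sees the full definition. It also uses the `struct job` tag spelling the style guide asks for later in this thread.

```c
/* --- job_defs.h (hypothetical): forward declaration only --- */
typedef struct job Job;

/* --- event_defs.h (hypothetical): needs only a pointer to Job --- */
typedef enum {
  kEventJobActivity
} EventType;

typedef struct event {
  EventType type;
  union {
    Job *job;  // an incomplete type is fine behind a pointer
  } data;
} Event;

/* --- job.c: the only place that needs the complete definition --- */
struct job {
  int id;
};

static int event_job_id(Event ev)
{
  return ev.data.job->id;  // legal here because struct job is complete
}
```

Neither defs header includes the other, so the cycle between the full event and job headers disappears.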

@justinmk
Member
justinmk commented Apr 7, 2014

similar to the event emitter pattern in javascript, except that I tried to use the autocommand infrastructure

From a user interface perspective, I think the design works well, so far.

@tarruda
Member
tarruda commented Apr 7, 2014

From a user interface perspective, I think the design works well, so far.

There are still some things missing:

  • Get a list of jobs
  • Get detailed information about a job, such as its pid
  • Notification when a job exits

But I think it's better to worry about those after the plugin infrastructure is implemented because there might still be some modifications to the job module.

@tarruda
Member
tarruda commented Apr 7, 2014

Rebased/squashed; if no one objects within an hour, I will merge it.

@justinmk justinmk commented on an outdated diff Apr 7, 2014
src/globals.h
@@ -1009,6 +1009,9 @@ EXTERN size_t (*iconv)(iconv_t cd, const char **inbuf, size_t *inbytesleft,
EXTERN char_u e_invrange[] INIT(= N_("E16: Invalid range"));
EXTERN char_u e_invcmd[] INIT(= N_("E476: Invalid command"));
EXTERN char_u e_isadir2[] INIT(= N_("E17: \"%s\" is a directory"));
+EXTERN char_u e_invjob[] INIT(= N_("E900: Invalid job id"));
+EXTERN char_u e_jobtblfull[] INIT(= N_("E901: Job table is full"));
+EXTERN char_u e_jobexe[] INIT(= N_("E902: '%s' is not an executable"));
@justinmk
justinmk Apr 7, 2014 Member

Probably should use \"%s\" to be consistent with the existing error messages.

@justinmk justinmk commented on an outdated diff Apr 7, 2014
+static void f_job_start(typval_T *argvars, typval_T *rettv)
+{
+ list_T *args = NULL;
+ listitem_T *arg;
+ int i, argvl, argsl;
+ char **argv = NULL;
+
+ rettv->v_type = VAR_NUMBER;
+ rettv->vval.v_number = 0;
+
+ if (check_restricted() || check_secure()) {
+ goto cleanup;
+ }
+
+ if (argvars[0].v_type != VAR_STRING ||
+ argvars[1].v_type != VAR_STRING ||
@justinmk
justinmk Apr 7, 2014 Member

The style guide advises that || and && go on the left side (at the start of the continuation line).

@justinmk justinmk commented on an outdated diff Apr 7, 2014
+
+#define EXIT_TIMEOUT 25
+#define MAX_RUNNING_JOBS 100
+#define JOB_BUFFER_SIZE 1024
+
+/// Possible lock states of the job buffer
+typedef enum {
+ kBufferLockNone = 0, ///< No data was read
+ kBufferLockStdout, ///< Data read from stdout
+ kBufferLockStderr ///< Data read from stderr
+} BufferLock;
+
+struct _Job {
+ // Job id the index in the job table plus one.
+ int id;
+ // Number of times the job will can be sent SIGTERM before a SIGKILL
@justinmk
justinmk Apr 7, 2014 Member

typo: "will can"

@oni-link oni-link and 1 other commented on an outdated diff Apr 7, 2014
+
+ uv_read_stop((uv_stream_t *)&job->proc_stdout);
+ uv_read_stop((uv_stream_t *)&job->proc_stderr);
+ job->stopped = true;
+
+ return true;
+}
+
+bool job_write(int id, char *data, uint32_t len)
+{
+ uv_buf_t uvbuf;
+ uv_write_t *req;
+ Job *job = find_job(id);
+
+ if (job == NULL || job->stopped) {
+ return false;
@oni-link
oni-link Apr 7, 2014 Contributor

Memory leak, free(data) is missing.
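A libuv-free model of the ownership rule behind this comment: write_cb frees req->data, so job_write effectively takes ownership of `data`, and every early return must free it too. `FakeJob`, `buffer_new`/`buffer_free` and the `live_buffers` counter are hypothetical scaffolding that makes leaks observable; the single `in_flight` slot assumes one pending write at a time.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static int live_buffers;  // buffers allocated but not yet freed

static char *buffer_new(size_t n)
{
  char *p = malloc(n);
  if (p != NULL) {
    live_buffers++;
  }
  return p;
}

static void buffer_free(char *p)
{
  if (p != NULL) {
    live_buffers--;
    free(p);
  }
}

typedef struct {
  bool stopped;
  char *in_flight;  // buffer held by the pending write (one at a time here)
} FakeJob;

/// Takes ownership of `data` on every path.
static bool job_write(FakeJob *job, char *data, uint32_t len)
{
  (void)len;  // would be the length placed in the uv_buf_t
  if (job == NULL || job->stopped) {
    buffer_free(data);  // the free(data) the review found missing
    return false;
  }
  job->in_flight = data;  // uv_write() keeps the buffer until write_cb
  return true;
}

/// Stand-in for write_cb, which frees req->data once the write completes.
static void fake_write_cb(FakeJob *job)
{
  buffer_free(job->in_flight);
  job->in_flight = NULL;
}
```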

@justinmk justinmk commented on the diff Apr 7, 2014
src/os_unix.c
@@ -589,6 +590,7 @@ void mch_exit(int r)
{
exiting = TRUE;
+ job_teardown();
@justinmk
justinmk Apr 7, 2014 Member

is this also needed in preserve_exit() or some part of that call chain?

@tarruda
tarruda Apr 7, 2014 Member

preserve_exit() calls getout(), which calls mch_exit().

@justinmk justinmk and 1 other commented on an outdated diff Apr 7, 2014
+ if (remaining_tries--) {
+ // Since this is the first time we're checking, wait 300ms so
+ // every job has a chance to exit normally
+ os_delay(50, 0);
+ }
+ uv_process_kill(&job->proc, SIGKILL);
+ }
+ }
+}
+
+int job_start(char **argv, void *data, job_read_cb cb)
+{
+ int i;
+ Job *job;
+
+ // Search for a free flot in the table
@tarruda
tarruda Apr 7, 2014 Member

I gotta learn to use the spell checker :)

@justinmk justinmk commented on the diff Apr 7, 2014
src/os/job.c
+ Job *job = event.data.job;
+
+ // Invoke the job callback
+ job->read_cb(job->id,
+ job->data,
+ job->buffer,
+ job->length,
+ job->lock == kBufferLockStdout);
+ shell_resized();
+ // restart reading
+ job->lock = kBufferLockNone;
+ uv_read_start((uv_stream_t *)&job->proc_stdout, alloc_cb, read_cb);
+ uv_read_start((uv_stream_t *)&job->proc_stderr, alloc_cb, read_cb);
+}
+
+static bool is_alive(Job *job)
@justinmk
justinmk Apr 7, 2014 Member

is_foo() functions with side effects are questionable. Maybe rename to try_kill() or something.

@tarruda
tarruda Apr 7, 2014 Member

But it's not trying to kill the process or send it a real signal; sending signal 0 is just a way to check whether the process is alive.
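For context, signal 0 is the POSIX idiom being described: kill() performs its existence and permission checks without delivering anything. A minimal sketch on a raw pid (the patch works on a libuv process handle instead). An exited child that hasn't been reaped yet is a zombie and still counts as alive, which is why the teardown code quoted later runs the loop once more so libuv can reap it.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/// Returns true if a process with this pid exists. Signal 0 is never
/// delivered; kill() only performs the existence/permission checks.
static bool is_alive(pid_t pid)
{
  if (kill(pid, 0) == 0) {
    return true;
  }
  // EPERM: the process exists but we are not allowed to signal it.
  // ESRCH: no such process (or it has already been reaped).
  return errno == EPERM;
}
```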

@justinmk justinmk commented on the diff Apr 7, 2014
src/os/job.c
+ if (cnt <= 0) {
+ if (cnt != UV_ENOBUFS) {
+ // Assume it's EOF and exit the job. Doesn't harm sending a SIGTERM
+ // at this point
+ uv_process_kill(&job->proc, SIGTERM);
+ }
+ return;
+ }
+
+ job->length = cnt;
+ event.type = kEventJobActivity;
+ event.data.job = job;
+ event_push(event);
+}
+
+static void write_cb(uv_write_t *req, int status)
@justinmk
justinmk Apr 7, 2014 Member

status intentionally unused?

@tarruda
tarruda Apr 7, 2014 Member

status might indicate a failure; we have two choices here:

  • Notify the user in some way
  • Kill the process (not necessary if the write failed because the process was already killed)

I think the sane choice here is to kill the process: we don't want a job that we can't send data to. Either way, the user will be notified (I think calling JobActivity with 0 or an empty string is a reasonable way of notifying that the job has exited).
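A model of the policy proposed here, assuming a failed write should mark the job as stopped so that the existing prepare callback terminates it; this is not code from the patch. libuv reports write errors to the callback as a negative status. `FakeJob` and `FakeWriteReq` are hypothetical stand-ins; with real libuv the job would be reachable through req->handle->data.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct {
  bool stopped;  // when set, the prepare callback sends SIGTERM/SIGKILL
} FakeJob;

typedef struct {
  void *data;    // buffer owned by the request, like uv_write_t.data
  FakeJob *job;  // real code: ((uv_stream_t *)req->handle)->data
} FakeWriteReq;

static void write_cb(FakeWriteReq *req, int status)
{
  free(req->data);  // the buffer is released whether or not the write worked
  if (status < 0) {
    // We can no longer feed this job, so schedule it for termination
    req->job->stopped = true;
  }
}
```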

@justinmk justinmk commented on an outdated diff Apr 7, 2014
+ if ((job = table[i]) == NULL || !job->stopped) {
+ continue;
+ }
+
+ if ((job->exit_timeout--) == EXIT_TIMEOUT) {
+ // Job was just stopped, close all stdio handles and send SIGTERM
+ uv_process_kill(&job->proc, SIGTERM);
+ } else if (job->exit_timeout == 0) {
+ // We've waited long enough, send SIGKILL
+ uv_process_kill(&job->proc, SIGKILL);
+ }
+ }
+}
+
+/// Puts the job into a 'reading state' which 'locks' the job buffer for
+/// until the data is consumed
@justinmk
justinmk Apr 7, 2014 Member

typo: "for until"

@oni-link oni-link and 1 other commented on an outdated diff Apr 7, 2014
+ job->stdio[0].flags = UV_CREATE_PIPE | UV_READABLE_PIPE;
+ job->stdio[0].data.stream = (uv_stream_t *)&job->proc_stdin;
+
+ uv_pipe_init(uv_default_loop(), &job->proc_stdout, 0);
+ job->proc_stdout.data = job;
+ job->stdio[1].flags = UV_CREATE_PIPE | UV_WRITABLE_PIPE;
+ job->stdio[1].data.stream = (uv_stream_t *)&job->proc_stdout;
+
+ uv_pipe_init(uv_default_loop(), &job->proc_stderr, 0);
+ job->proc_stderr.data = job;
+ job->stdio[2].flags = UV_CREATE_PIPE | UV_WRITABLE_PIPE;
+ job->stdio[2].data.stream = (uv_stream_t *)&job->proc_stderr;
+
+ // Spawn the job
+ if (uv_spawn(uv_default_loop(), &job->proc, &job->proc_opts) != 0) {
+ return -1;
@oni-link
oni-link Apr 7, 2014 Contributor

Is the process callback always called?

  • If not, memory leak for job.
  • If it is always called, then you can get a segfault, because argv is freed in the callback and in f_job_start.
@tarruda
tarruda Apr 7, 2014 Member

👍 memory leak
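A sketch of the error path implied by the "memory leak" answer: if the exit callback never runs for a process that failed to spawn, job_start has to undo its own work, clearing the table slot and freeing the Job. The spawn step is injected as a function pointer so the failure is testable; `spawn_fn`, `spawn_ok` and `spawn_fail` are hypothetical. In the real code, the pipe handles initialized before uv_spawn() would also have to be closed before the Job memory is released.

```c
#include <stdlib.h>

#define MAX_RUNNING_JOBS 100

typedef struct {
  int id;
} Job;

static Job *table[MAX_RUNNING_JOBS];

/// Hypothetical stand-in for uv_spawn(): returns 0 or a negative error.
typedef int (*spawn_fn)(Job *job);

/// Returns the job id (> 0), 0 if the table is full, or -1 on failure.
static int job_start(spawn_fn spawn)
{
  int i = 0;
  while (i < MAX_RUNNING_JOBS && table[i] != NULL) {
    i++;
  }
  if (i == MAX_RUNNING_JOBS) {
    return 0;
  }

  Job *job = calloc(1, sizeof(Job));
  if (job == NULL) {
    return -1;
  }
  job->id = i + 1;
  table[i] = job;

  if (spawn(job) != 0) {
    // No exit callback will ever run for this job, so clean up here
    table[i] = NULL;
    free(job);
    return -1;
  }
  return job->id;
}

static int spawn_ok(Job *job) { (void)job; return 0; }
static int spawn_fail(Job *job) { (void)job; return -2; }
```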

@tarruda
Member
tarruda commented Apr 7, 2014

@justinmk / @oni-link Thanks for the feedback, these last commits should have fixed the issues reported

@oni-link oni-link and 1 other commented on an outdated diff Apr 7, 2014
+
+ os_delay(10, 0);
+ // Right now any exited process are zombies waiting for us to acknowledge
+ // their status with `wait` or handling SIGCHLD. libuv does that
+ // automatically (and then calls `exit_cb`) but we have to give it a chance
+ // by running the loop one more time
+ uv_run(uv_default_loop(), UV_RUN_NOWAIT);
+
+ // Prepare to start shooting
+ for (i = 0; i < MAX_RUNNING_JOBS; i++) {
+ if ((job = table[i]) == NULL) {
+ continue;
+ }
+
+ // Still alive
+ while (remaining_tries-- && is_alive(job)) {
@oni-link
oni-link Apr 7, 2014 Contributor

Change the order of the operands so that we really wait 1s. Otherwise, 20 consecutive jobs that are no longer alive at this point decrease remaining_tries to zero without any delay.
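A small model of why the operand order matters, with each consumed try standing for one 50 ms delay (20 tries, about 1 s). Liveness is fixed per job, so a live job here is one that ignores SIGTERM; the function names are illustrative. With 20 dead jobs followed by one live job, checking the counter first spends the whole budget without waiting at all, while checking liveness first spends all of it on the job that needs it.

```c
#include <stdbool.h>

/// Models `while (remaining_tries-- && is_alive(job))`: a try is consumed
/// before liveness is known, so every dead job burns one try.
static int delays_check_tries_first(const bool *alive, int n, int tries)
{
  int delays = 0;
  for (int i = 0; i < n; i++) {
    while (tries > 0) {
      tries--;
      if (!alive[i]) {
        break;
      }
      delays++;  // os_delay(50, 0) would run here
    }
  }
  return delays;
}

/// Models `while (is_alive(job) && remaining_tries--)`: only live jobs
/// consume the shared budget.
static int delays_check_alive_first(const bool *alive, int n, int tries)
{
  int delays = 0;
  for (int i = 0; i < n; i++) {
    while (alive[i] && tries > 0) {
      tries--;
      delays++;  // os_delay(50, 0) would run here
    }
  }
  return delays;
}
```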

@oni-link oni-link commented on the diff Apr 7, 2014
src/os/job.c
+ free(job);
+}
+
+/// Iterates the table, sending SIGTERM to stopped jobs and SIGKILL to those
+/// that didn't die from SIGTERM after a while(exit_timeout is 0).
+static void job_prepare_cb(uv_prepare_t *handle, int status)
+{
+ Job *job;
+ int i;
+
+ for (i = 0; i < MAX_RUNNING_JOBS; i++) {
+ if ((job = table[i]) == NULL || !job->stopped) {
+ continue;
+ }
+
+ if ((job->exit_timeout--) == EXIT_TIMEOUT) {
@oni-link
oni-link Apr 7, 2014 Contributor

Here only one SIGTERM is ever sent, but description of exit_timeout says up to 25.

Did you mean something like this?

int signal = job->exit_timeout-- > 0 ? SIGTERM : SIGKILL;
uv_process_kill(&job->proc, signal);
@tarruda
tarruda Apr 7, 2014 Member

Theoretically, one SIGTERM should be enough, so the comment was wrong. The timeout represents the number of polls that will be done before actually sending SIGKILL.
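A model of the per-poll decision as clarified here: the first poll of a stopped job sends SIGTERM, polls in between only wait, and the poll that brings exit_timeout to zero sends SIGKILL. The logic follows the job_prepare_cb hunk; the returned action is an enum instead of a signal number so the sketch compiles without POSIX signal names. `StopAction` and `poll_stopped_job` are hypothetical.

```c
#define EXIT_TIMEOUT 25

typedef enum {
  kWait,
  kSendTerm,
  kSendKill
} StopAction;

/// One iteration of the prepare callback for a stopped job.
static StopAction poll_stopped_job(int *exit_timeout)
{
  if ((*exit_timeout)-- == EXIT_TIMEOUT) {
    return kSendTerm;  // first poll after the job was stopped
  }
  if (*exit_timeout == 0) {
    return kSendKill;  // SIGTERM was ignored for EXIT_TIMEOUT polls
  }
  return kWait;
}
```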

@tarruda tarruda Implement job control
- Add a job control module for spawning and controlling co-processes
- Add three vimscript functions for interfacing with the module
- Use dedicated header files for typedefs/structs in event/job modules
4b063ea
@oni-link oni-link commented on the diff Apr 7, 2014
src/os/job.c
+static Job * find_job(int id)
+{
+ if (id <= 0 || id > MAX_RUNNING_JOBS) {
+ return NULL;
+ }
+
+ return table[id - 1];
+}
+
+static void free_job(Job *job)
+{
+ uv_close((uv_handle_t *)&job->proc_stdout, NULL);
+ uv_close((uv_handle_t *)&job->proc_stdin, NULL);
+ uv_close((uv_handle_t *)&job->proc_stderr, NULL);
+ uv_close((uv_handle_t *)&job->proc, NULL);
+ free(job);
@oni-link
oni-link Apr 7, 2014 Contributor

Is it safe to free job at this point? Are all callbacks that use job canceled (synchronously), or are they waiting for one last run of the event loop?

@tarruda
tarruda Apr 7, 2014 Member

As far as I know it is safe. Calling uv_close should remove any references from the event loop so no callbacks will be run after this.
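For reference, libuv's documentation for uv_close() says a handle's memory may only be released in its close callback or after that callback has returned, and Job embeds its handles. A libuv-free model of the usual remedy: count outstanding closes (including the process handle, as the earlier review comment asked) and free the Job in the last close callback. The fake close queue and all names are hypothetical.

```c
#include <stdlib.h>

typedef struct job Job;

typedef struct {
  Job *owner;  // like handle->data in the patch
} FakeHandle;

struct job {
  FakeHandle proc, in, out, err;
  int pending_closes;
};

static int jobs_freed;

/// Close callback shared by all four handles: the last one frees the Job.
static void close_cb(FakeHandle *handle)
{
  Job *job = handle->owner;
  if (--job->pending_closes == 0) {
    free(job);
    jobs_freed++;
  }
}

/// Simulated loop: like uv_close(), fake_uv_close() only queues the
/// callback; it runs on the next loop iteration. The fixed-size queue is
/// enough for this model.
#define MAX_PENDING 16
static FakeHandle *close_queue[MAX_PENDING];
static int queued;

static void fake_uv_close(FakeHandle *handle)
{
  close_queue[queued++] = handle;
}

static void run_loop_once(void)
{
  for (int i = 0; i < queued; i++) {
    close_cb(close_queue[i]);
  }
  queued = 0;
}

static Job *job_new(void)
{
  Job *job = calloc(1, sizeof(Job));
  if (job != NULL) {
    job->proc.owner = job->in.owner = job->out.owner = job->err.owner = job;
  }
  return job;
}

static void free_job(Job *job)
{
  job->pending_closes = 4;
  fake_uv_close(&job->proc);
  fake_uv_close(&job->in);
  fake_uv_close(&job->out);
  fake_uv_close(&job->err);
  // No free(job) here: the loop still references the embedded handles
}
```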

@tarruda tarruda merged commit 4b063ea into neovim:master Apr 7, 2014

1 check passed

continuous-integration/travis-ci The Travis CI build passed
@philix philix commented on the diff Apr 8, 2014
src/os/job.c
+#include "vim.h"
+#include "memory.h"
+#include "term.h"
+
+#define EXIT_TIMEOUT 25
+#define MAX_RUNNING_JOBS 100
+#define JOB_BUFFER_SIZE 1024
+
+/// Possible lock states of the job buffer
+typedef enum {
+ kBufferLockNone = 0, ///< No data was read
+ kBufferLockStdout, ///< Data read from stdout
+ kBufferLockStderr ///< Data read from stderr
+} BufferLock;
+
+struct _Job {
@philix
philix Apr 8, 2014 Contributor

Per the style guide, it should be struct job. If job conflicts with something, then you could call it job_s or job_struct. Consistent rules for type names are the most useful thing that comes from a coding style guide IMO.

@tarruda
tarruda Apr 8, 2014 Member

Missed that, sorry

@oni-link oni-link commented on the diff Apr 18, 2014
src/os/job.c
+static bool is_alive(Job *job);
+static Job * find_job(int id);
+static void free_job(Job *job);
+
+// Callbacks for libuv
+static void job_prepare_cb(uv_prepare_t *handle, int status);
+static void alloc_cb(uv_handle_t *handle, size_t suggested, uv_buf_t *buf);
+static void read_cb(uv_stream_t *stream, ssize_t cnt, const uv_buf_t *buf);
+static void write_cb(uv_write_t *req, int status);
+static void exit_cb(uv_process_t *proc, int64_t status, int term_signal);
+
+void job_init()
+{
+ uv_disable_stdio_inheritance();
+ uv_prepare_init(uv_default_loop(), &job_prepare);
+ uv_prepare_start(&job_prepare, job_prepare_cb);
@oni-link
oni-link Apr 18, 2014 Contributor

@tarruda Why not call uv_prepare_start when the first job is started and call uv_prepare_stop after the job table is empty? Otherwise the job_prepare_cb is called every iteration of the uv loop.
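A sketch of this suggestion with the libuv calls reduced to comments: a counter of live jobs starts the prepare watcher on the 0 to 1 transition and stops it on 1 to 0. `prepare_active`, `on_job_added` and `on_job_removed` are hypothetical; uv_prepare_start() and uv_prepare_stop() are the real libuv calls they stand in for.

```c
#include <stdbool.h>

static bool prepare_active;  // models whether the prepare watcher runs
static int running_jobs;     // jobs currently in the table

/// Call after a job is inserted into the table.
static void on_job_added(void)
{
  if (running_jobs++ == 0) {
    prepare_active = true;   // uv_prepare_start(&job_prepare, job_prepare_cb)
  }
}

/// Call after a job is removed from the table (e.g. in the exit callback).
static void on_job_removed(void)
{
  if (--running_jobs == 0) {
    prepare_active = false;  // uv_prepare_stop(&job_prepare)
  }
}
```

With no jobs running, the loop then does no per-iteration work on behalf of the job module.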

@oni-link oni-link commented on the diff Apr 18, 2014
src/eval.c
+
+ if (rettv->vval.v_number <= 0) {
+ if (rettv->vval.v_number == 0) {
+ EMSG(_(e_jobtblfull));
+ } else {
+ EMSG(_(e_jobexe));
+ }
+ }
+
+cleanup:
+ if (rettv->vval.v_number > 0) {
+ // Success
+ return;
+ }
+ // Cleanup argv memory in case the `job_start` call failed
+ shell_free_argv(argv);
@oni-link
oni-link Apr 18, 2014 Contributor

@tarruda Why not move this call, and the shell_free_argv call in job_exit_event, to close_cb in job.c? Then the job module can always take care of the freeing, and the whole cleanup: block can be removed (along with the goto cleanup; statements, see above).

@tarruda
tarruda Apr 18, 2014 Member

Not sure about this; the cleanup label exists for handling errors detected in this function, before job_start is even called.

@oni-link
oni-link Apr 18, 2014 Contributor

Not sure if I have understood your concerns correctly:

There is no real error handling; the function just returns if any goto cleanup; is reached:
When we jump to this label, rettv->vval.v_number is 0 and argv is NULL, always. A call to shell_free_argv returns immediately without freeing anything, so the cleanup does nothing.

The only purpose of this label is to free argv if job_start failed, which could be done more consistently in the job module.

@tarruda
tarruda Apr 18, 2014 Member

You are right, the label is useless. I will push the fix to #556

@oni-link oni-link commented on the diff Apr 18, 2014
src/eval.c
+ // Copy program name
+ argv[0] = xstrdup((char *)argvars[1].vval.v_string);
+
+ i = 1;
+ // Copy arguments to the vector
+ if (argsl > 0) {
+ for (arg = args->lv_first; arg != NULL; arg = arg->li_next) {
+ argv[i++] = xstrdup((char *)arg->li_tv.vval.v_string);
+ }
+ }
+
+ // The last item of argv must be NULL
+ argv[i] = NULL;
+
+ rettv->vval.v_number = job_start(argv,
+ xstrdup((char *)argvars[0].vval.v_string),
@oni-link
oni-link Apr 18, 2014 Contributor

@tarruda Could not find a free for this pointer. Looks like a job for close_cb in job.c.

@tarruda
tarruda Apr 18, 2014 Member

👍

thanks for having the patience to read the code and point errors/improvements. I'm pushing fixes to #556
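A model of the ownership rule this exchange converges on: once job_start() is called, the job module owns argv and the caller data on every path, with a single release function playing the role proposed for close_cb, so f_job_start needs no cleanup label. All names are illustrative, and `spawn_ok` stands in for the result of uv_spawn().

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
  char **argv;  // NULL-terminated, each string heap-allocated
  void *data;   // e.g. the xstrdup()'d job name
} Job;

/// Heap copy of a C string (stand-in for xstrdup()).
static char *copy_string(const char *s)
{
  size_t n = strlen(s) + 1;
  char *p = malloc(n);
  if (p != NULL) {
    memcpy(p, s, n);
  }
  return p;
}

static void free_argv(char **argv)
{
  if (argv == NULL) {
    return;
  }
  for (char **p = argv; *p != NULL; p++) {
    free(*p);
  }
  free(argv);
}

/// Single release point, the role proposed for close_cb in job.c.
static void job_release(Job *job)
{
  free_argv(job->argv);
  free(job->data);
  free(job);
}

/// Consumes argv and data on every path, so the caller never frees them.
static Job *job_start(char **argv, void *data, int spawn_ok)
{
  Job *job = malloc(sizeof(*job));
  if (job == NULL) {
    free_argv(argv);
    free(data);
    return NULL;
  }
  job->argv = argv;
  job->data = data;
  if (!spawn_ok) {
    job_release(job);
    return NULL;
  }
  return job;
}
```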

@LeifW LeifW referenced this pull request in idris-hackers/idris-vim Mar 3, 2015
Closed

Idris not started? #34
