New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RDY] Refactor event loop layer #2980

Merged
merged 6 commits into from Jul 17, 2015

Conversation

Projects
None yet
7 participants
@tarruda
Copy link
Member

tarruda commented Jul 7, 2015

Like #2650, this extracts some changes from my private #2371/fix branch. This PR separates the event loop layer from the rest of the code and builds a small abstraction around libuv/watcher API, which is done to keep some sanity for when the event loop runs in a separate thread. The goal is to contain multithreaded code under src/nvim/event and leave the rest as untouched as possible, with minimum amount of data/modules shared between threads.

@marvim marvim added the WIP label Jul 7, 2015

// be processed between reads.
uv_idle_init(UV(loop), &rstream->uv.idle);
rstream->uv.idle.data = NULL;
rstream->uv.idle.data = rstream;

This comment has been minimized.

@oni-link

oni-link Jul 7, 2015

Contributor

Previous line could be removed.

This comment has been minimized.

@tarruda

@tarruda tarruda force-pushed the tarruda:separate-event-loop-layer branch 3 times, most recently from fae6396 to 30e020c Jul 7, 2015

@tarruda tarruda changed the title [WIP] Separate event loop layer [RDY] Separate event loop layer Jul 8, 2015

@marvim marvim added RDY and removed WIP labels Jul 8, 2015

@tarruda tarruda force-pushed the tarruda:separate-event-loop-layer branch from 2f3da97 to 27c66af Jul 8, 2015

@@ -262,7 +263,7 @@ static void read_cb(RStream *rstream, RBuffer *buf, void *data, bool eof)
break;
}
}
} while (rbuffer_size(input->read_buffer));
} while (rbuffer_size(input->read_stream.buffer));

// Make sure the next input escape sequence fits into the ring buffer
// without wrap around, otherwise it could be misinterpreted.

This comment has been minimized.

@oni-link

oni-link Jul 8, 2015

Contributor

rbuffer_reset(input->read_buffer); needs to be replaced with rbuffer_reset(input->read_stream.buffer); in the following line.

This comment has been minimized.

@tarruda

@tarruda tarruda changed the title [RDY] Separate event loop layer [WIP] Separate event loop layer Jul 8, 2015

@marvim marvim added WIP and removed RDY labels Jul 8, 2015

@tarruda tarruda force-pushed the tarruda:separate-event-loop-layer branch 5 times, most recently from 26efec9 to 1b1682f Jul 8, 2015

@justinmk justinmk added this to the 0.1-first-public-release milestone Jul 11, 2015


void loop_run(Loop *loop)
{
uv_run(&loop->uv, UV_RUN_DEFAULT);

This comment has been minimized.

@oni-link

oni-link Jul 11, 2015

Contributor

Is the return value intentionally dropped?

This comment has been minimized.

@tarruda

tarruda Jul 11, 2015

Member

Not intentionally, for now this is just a stub module that I plan to finish later

uv_close((uv_handle_t *)job->proc_std##stream, close_cb); \
} \
if (!job->stream.closed) { \
stream_close(&job->stream, on_##stream_close); \

This comment has been minimized.

@oni-link

oni-link Jul 12, 2015

Contributor

on_##stream_close -> on_stream_close

This comment has been minimized.

@tarruda
if (opts.stderr_cb) {
uv_close((uv_handle_t *)job->proc_stderr, close_cb);
if (job->opts.stderr_cb) {
uv_close((uv_handle_t *)job->proc_stderr, NULL);
}
process_close(job);
event_poll(0);

This comment has been minimized.

@oni-link

oni-link Jul 12, 2015

Contributor

event_poll(0) should be called before process_close(job), to prevent freeing the job structure too soon.
Without this change the following patch (simulates an error while spawning a pty) and usage of :term should produce an crash:

diff --git a/src/nvim/os/pty_process.c b/src/nvim/os/pty_process.c
index cb59355..ebffac1 100644
--- a/src/nvim/os/pty_process.c
+++ b/src/nvim/os/pty_process.c
@@ -72,6 +72,7 @@ bool pty_process_spawn(Job *job) FUNC_ATTR_NONNULL_ALL
     init_child(job);
     abort();
   }
+  goto error;

   // make sure the master file descriptor is non blocking
   int master_status_flags = fcntl(master, F_GETFL);

This comment has been minimized.

@tarruda

tarruda Jul 12, 2015

Member

I'm currently refactoring the job module in a similar way that the server/rstream/wstream modules were refactored in this PR. I will take this into consideration for the refactored version, thanks

This comment has been minimized.

@tarruda

tarruda Jul 13, 2015

Member

This should no longer be a problem, the process memory is now managed by the caller

@tarruda tarruda force-pushed the tarruda:separate-event-loop-layer branch 2 times, most recently from 4a6cf49 to 06af880 Jul 13, 2015

return true;
}

static void chld_handler(uv_signal_t *handle, int signum)

This comment has been minimized.

@oni-link

oni-link Jul 13, 2015

Contributor

Are UvProcesses on a different loop than PtyProcesses? Otherwise an UvProcess could be handled here and in its exit callback (also two calls to internal_exit_cb()).

This comment has been minimized.

@tarruda

tarruda Jul 13, 2015

Member

They are handled in the same loop, but it's only possible to have waitpid return a pid > 0 once(at least in my tests). So if libuv SIGCHLD handler receives it first, this one will short circuit because the process will have already exited. The same will happen for libuv if we catch it first: https://github.com/libuv/libuv/blob/v1.x/src/unix/process.c#L78-L80

@@ -156,31 +98,36 @@ bool pty_process_spawn(Job *job) FUNC_ATTR_NONNULL_ALL
return false;

This comment has been minimized.

@oni-link

oni-link Jul 13, 2015

Contributor

After kill(pid, SIGKILL) we should call waitpid(pid,NULL,0).

This comment has been minimized.

@tarruda
if (argvars[1].v_type == VAR_STRING) {
char *stream = (char *)argvars[1].vval.v_string;
if (!strcmp(stream, "stdin")) {
job_close_in(job);
((TerminalJobData *)job_data(job))->stdin_closed = true;
stream_close(proc->in, NULL);

This comment has been minimized.

@oni-link

oni-link Jul 13, 2015

Contributor

jobclose() could be called for the same stream multiple times. stream_close() would assert(), but process_close_{in,out,err,streams} should be safe to use here and in the following cases.

This comment has been minimized.

@tarruda
return;
}
TerminalJobData *data = opts.data;
TerminalOptions topts = TERMINAL_OPTIONS_INIT;

This comment has been minimized.

@oni-link

oni-link Jul 13, 2015

Contributor

Initialization should not be necessary. All fields are set in the following lines.

This comment has been minimized.

@tarruda
return false;
} else if (pid == 0) {
init_child(job);
init_child(ptyproc);
abort();
}

This comment has been minimized.

@oni-link

oni-link Jul 14, 2015

Contributor

Adding here a goto error; leads to a seg fault when using :term, because later on the uninitialized callback pointer internal_close_cb is used in pty_process_close().

This comment has been minimized.

@tarruda

@tarruda tarruda force-pushed the tarruda:separate-event-loop-layer branch from a59a9dd to e4aef83 Jul 15, 2015

test: lower sleep value in job test
Since sleep is a grandchild of nvim, it is not killed after the test ends.
Using a low sleep value allows it to exit automatically after a small interval.

@tarruda tarruda force-pushed the tarruda:separate-event-loop-layer branch 2 times, most recently from 395a2ec to faf3b22 Jul 17, 2015

@tarruda tarruda changed the title [WIP] Separate event loop layer [RDY] Refactor event loop layer Jul 17, 2015

@marvim marvim added RDY and removed WIP labels Jul 17, 2015

tarruda added some commits Jul 17, 2015

event loop: New abstraction layer with refactored time/signal API
- Add event loop abstraction module under src/nvim/event. The
  src/nvim/event/loop module replaces src/nvim/os/event
- Remove direct dependency on libuv signal/timer API and use the new abstraction
  instead.
- Replace all references to uv_default_loop() by &loop.uv, a new global variable
  that wraps libuv main event loop but allows the event loop functions to be
  reused in other contexts.
rstream/wstream: Unify structures and simplify API
- Simplify RStream/WStream API and make it more consistent with libuv.
- Move into the event loop layer(event subdirectory)
- Remove uv_helpers module.
- Simplify job/process internal modules/API.
- Unify RStream and WStream into a single structure. This is necessary because
  libuv streams can be readable and writable at the same time(and because the
  uv_helpers.c hack to associate multiple streams with libuv handle was removed)
- Make struct definition public, allowing more flexible/simple memory
  management by users of the module.
- Adapt channel/job modules to cope with the changes.
server: Extract most logic into the new socket abstraction
- Move event loop code into event/socket
- Reimplement server.c on top of the new SocketWatcher class
- Adapt msgpack_rpc/channel.c

@tarruda tarruda force-pushed the tarruda:separate-event-loop-layer branch from faf3b22 to ea7f079 Jul 17, 2015

job: Replace by a better process abstraction layer
- New libuv/pty process abstraction with simplified API and no globals.
- Remove nvim/os/job*. Jobs are now a concept that apply only to programs
  spawned by vimscript job* functions.
- Refactor shell.c/channel.c to use the new module, which brings a number of
  advantages:
  - Simplified API, less code
  - No slots in the user job table are used
  - Not possible to acidentally receive data from vimscript
- Implement job table in eval.c, which is now a hash table with unilimited job
  slots and unique job ids.

@tarruda tarruda force-pushed the tarruda:separate-event-loop-layer branch from ea7f079 to aa9cb48 Jul 17, 2015

@tarruda

This comment has been minimized.

Copy link
Member

tarruda commented Jul 17, 2015

Merging now

The quickbuild seems to have failed because it couldn't publish the test results, so I'm going to disregard: http://neovim-qb.szakmeister.net/build/2089/step_status

Mingw link seems to be broken, so I will ignore it too.

@tarruda tarruda merged commit aa9cb48 into neovim:master Jul 17, 2015

2 of 4 checks passed

continuous-integration/travis-ci/pr The Travis CI build could not complete due to an error
Details
default Build pr-2980 finished with status FAILED
Details
continuous-integration/appveyor AppVeyor build succeeded
Details
coverage/coveralls Coverage decreased (-0.007%) to 66.901%
Details

tarruda added a commit that referenced this pull request Jul 17, 2015

Merge PR #2980 'Refactor event loop layer'
Helped-by: oni-link <knil.ino@gmail.com>
Reviewed-by: oni-link <knil.ino@gmail.com>
Reviewed-by: Scott Prager <splinterofchaos@gmail.com>

@jszakmeister jszakmeister removed the RDY label Jul 17, 2015

@tarruda

This comment has been minimized.

Copy link
Member

tarruda commented Jul 17, 2015

Done, thanks for the help.

Tomorrow I start integrating the threading code. With this clearer event loop separation, the changes will be more contained.

// Read error or EOF, either way stop the stream and invoke the callback
// with eof == true
uv_read_stop(uvstream);
stream->read_cb(stream, stream->buffer, stream->data, true);

This comment has been minimized.

@oni-link

oni-link Jul 17, 2015

Contributor

stream->read_cb needs a NULL check before making the call. The default value for the variable is NULL.

This comment has been minimized.

@tarruda

tarruda Jul 18, 2015

Member

Is there any reason for allowing NULL read_cb? How about putting an assertion or FUNC_NONNULL_ATTR_ALL in rstream_start?

This comment has been minimized.

@oni-link

oni-link Jul 18, 2015

Contributor

In do_os_system() the output rstreams could be started with callback NULL.

This comment has been minimized.

@tarruda

@oni-link oni-link referenced this pull request Jul 18, 2015

Closed

core dump in centos #3040

goto error;
}

ptyproc->tty_fd = master;
job->pid = pid;
proc->pid = pid;
return true;

error:

This comment has been minimized.

@oni-link

oni-link Jul 18, 2015

Contributor

When this label is reached, the child ignores the signal SIGTERM and we have to wait for 5 seconds until we can send signal SIGKILL.
If that happens, nvim hangs with

Vim: Caught deadly signal 'SIGTERM'

Vim: Finished.

Can this error handling be simplified to

close(master);
kill(pid,SIGKILL);
waitpid(pid,NULL,0);
return false;

?

This comment has been minimized.

@tarruda

tarruda Jul 18, 2015

Member

I suppose it can. This cleanup code was created in @a2b45357 so I opted to not mess with it right now.

@elmart, is there any specific reason you added this timeout?

This comment has been minimized.

@elmart

elmart Jul 18, 2015

Member

Well, intention was just giving some time for spawned child to quit orderly, responding to SIGTERM, and only use SIGKILL if no response after some time (KILL_RETRIES * KILL_TIMEOUT).

data->proc.pty.width = curwin->w_width;
data->proc.pty.height = curwin->w_height;
data->proc.pty.term_name = xstrdup("xterm-256color");
if (!common_job_start(data, rettv)) {
return;

This comment has been minimized.

@oni-link

oni-link Jul 18, 2015

Contributor

TerminalJobData and term_name are not freed:
Cleanup could be:

    free(data->proc.pty.term_name);
    free_term_job_data(data);

This comment has been minimized.

@tarruda

This comment has been minimized.

@tarruda

tarruda Jul 18, 2015

Member

Added this and some other fixes to the beginning of #3029

@jszakmeister

This comment has been minimized.

Copy link
Member

jszakmeister commented Jul 18, 2015

Quickbuild checks the outcome of the tests on the publish step, so it's not a good idea to ignore it. One thing I really dislike about busted is that it doesn't have a nice setup for the junit style testing, and it mixes in other stuff into the XML output--which corrupts the document. In this case, there was an error, and there was this emitted into the beginning of the document:

Warning: Screen changes have been received after the expected state was seen.
This is probably due to an indeterminism in the test. Try adding
`wait()` (or even a separate `screen:expect(...)`) at a point of possible
indeterminism, typically in between a `feed()` or `execute()` which is non-
synchronous, and a synchronous api call.

Note that sometimes a `wait` can trigger redraws and consequently generate more
indeterminism. If adding `wait` calls seems to increase the frequency of these
messages, try removing every `wait` call in the test.

If everything else fails, use Screen:redraw_debug to help investigate what is
  causing the problem.

stack traceback:
    ./test/functional/ui/screen.lua:270: in function 'wait'
    ./test/functional/ui/screen.lua:219: in function 'expect'
    test/functional/ui/screen_basic_spec.lua:492: in function <test/functional/ui/screen_basic_spec.lua:490>

The test case that failed was this one:

        <testcase name='Screen resize has minimum width/height values' classname='test/functional/ui/screen_basic_spec.lua:490'>
            <error>./test/functional/ui/screen.lua:276: Row 1 didn&apos;t match.
Expected: &quot;-- INS^ERT --&quot;
Actual:   &quot;resize^      &quot;
stack traceback:
    ./test/functional/ui/screen.lua:276: in function &apos;wait&apos;
    ./test/functional/ui/screen.lua:219: in function &apos;expect&apos;
    test/functional/ui/screen_basic_spec.lua:492: in function &lt;test/functional/ui/screen_basic_spec.lua:490&gt;
</error>
        </testcase>
@tarruda

This comment has been minimized.

Copy link
Member

tarruda commented Jul 18, 2015

@jszakmeister I admit that sometimes I have trouble parsing quickbuild output. This particular error was already present though, it's a indeterminism I failed to fix in this PR.


if (!is_user_job(job)) {
TerminalJobData *data = pmap_get(uint64_t)(jobs, argvars[0].vval.v_number);
if (!data || data->stopped) {

This comment has been minimized.

@oni-link

oni-link Jul 18, 2015

Contributor

Most of the other checks for a valid "job" do not check for data->stopped. Should probably also be added there.

This comment has been minimized.

@tarruda
@jszakmeister

This comment has been minimized.

Copy link
Member

jszakmeister commented Jul 18, 2015

@jszakmeister I admit that sometimes I have trouble parsing quickbuild output.

Me too. I wish it QuickBuild was better in that regard.

@@ -133,6 +137,41 @@ static const char *err_extra_cmd =
N_("Too many \"+command\", \"-c command\" or \"--cmd command\" arguments");


void event_init(void)
{
loop_init(&loop, NULL);

This comment has been minimized.

@oni-link

oni-link Jul 21, 2015

Contributor

This call to loop_init() is too late. command_line_scan() in main() could already need the loop initialized.
See #3045 (comment).

This comment has been minimized.

@tarruda

This comment has been minimized.

@mhinz

mhinz Jul 21, 2015

Member

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment