Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep plugins persistently running in the background #12064

Merged
merged 15 commits into from Mar 9, 2024

Conversation

devyn
Copy link
Contributor

@devyn devyn commented Mar 4, 2024

Description

This PR uses the new plugin protocol to intelligently keep plugin processes running in the background for further plugin calls.

Running plugins can be seen by running the new plugin list command, and stopped by running the new plugin stop command.

This is an enhancement for the performance of plugins, as starting new plugin processes has overhead, especially for plugins in languages that take a significant amount of time on startup. It also enables plugins that have persistent state between commands, making the migration of features like dataframes and stor to plugins possible.

Plugins are automatically stopped by the new plugin garbage collector, configurable with $env.config.plugin_gc:

  $env.config.plugin_gc = {
      # Configuration for plugin garbage collection
      default: {
          enabled: true # true to enable stopping of inactive plugins
          stop_after: 10sec # how long to wait after a plugin is inactive to stop it
      }
      plugins: {
          # alternate configuration for specific plugins, by name, for example:
          #
          # gstat: {
          #     enabled: false
          # }
      }
  }

If garbage collection is enabled, plugins will be stopped after stop_after passes after they were last active. Plugins are counted as inactive if they have no running plugin calls. Reading the stream from the response of a plugin call is still considered to be activity, but if a plugin holds on to a stream but the call ends without an active streaming response, it is not counted as active even if it is reading it. Plugins can explicitly disable the GC as appropriate with engine.set_gc_disabled(true).

The version command now lists plugin names rather than plugin commands. The list of plugin commands is accessible via plugin list.

Recommend doing this together with #12029, because it will likely force plugin developers to do the right thing with mutability and lead to less unexpected behavior when running plugins nested / in parallel.

User-Facing Changes

  • new command: plugin list
  • new command: plugin stop
  • changed command: version (now lists plugin names, rather than commands)
  • new config: $env.config.plugin_gc
  • Plugins will keep running and be reused, at least for the configured GC period
  • Plugins that used mutable state in weird ways like inc did might misbehave until fixed
  • Plugins can disable GC if they need to
  • Had to change plugin signature to accept &EngineInterface so that the GC disable feature works. Add support for engine calls from plugins #12029 does this anyway, and I'm expecting (resolvable) conflicts with that

Tests + Formatting

  • 🟢 toolkit fmt
  • 🟢 toolkit clippy
  • 🟢 toolkit test
  • 🟢 toolkit test stdlib

Because there is some specific OS behavior required for plugins to not respond to Ctrl-C directly, I've developed against and tested on both Linux and Windows to ensure that works properly.

After Submitting

I think this probably needs to be in the book somewhere

@fdncred
Copy link
Collaborator

fdncred commented Mar 4, 2024

This is very cool! Excited to play with it.

I'm wondering if there should be some type of flag or mode that means "run once and exit" vs "keep running in the background". Not all plugins need to hang around. Some just need to process their data and exit.

On one hand I think about gstat. It's meant to get the git status and stop. But then again, maybe it would benefit from the performance improvement of not having to restart at every prompt? Or maybe gstat could be redesigned with persistent running in mind where it's continually evaluating the git status and updating the prompt even when there isn't a new prompt?

@devyn
Copy link
Contributor Author

devyn commented Mar 4, 2024

@fdncred My current thinking on this is to make it user-configurable. So, for example, you might have plugin-specific config of when plugins should be stopped if they haven't been used in long enough:

$env.config.plugin_gc = {
    default: {
        enabled: true
        remove_after: 10sec
    }
    plugins: {
        dataframe: {
            enabled: false
        }
        slow_python_thing: {
            remove_after: 5min
        }
    }
}

The one thing I'm not quite sure about with this is whether and how a plugin should communicate to the engine that it doesn't want to be stopped automatically, even if the time is up. There are a few things I've thought about:

  1. Tie it to custom values: if a plugin has custom values in engine memory, it doesn't get stopped automatically. This would work ideally for the hypothetical dataframe plugin and other plugins that work on a similar handle model.
  2. Let plugins send a control message to turn GC on/off - though this could lead to redundant code for plugins that really just want to have this track resources
  3. Let plugins notify the engine of some sort of resource count increment/decrement - if it's above zero, the plugin doesn't get GC'd. This means less work for the plugin to track, and less likely that a plugin is going to implement it in a bug-prone way, but feels like a bit too specific of a design
  4. Rather than having GC stop the plugin, just have GC send a message to the plugin to suggest that it stop. It's up to the plugin whether it actually does

I kind of like the last one personally. Not all custom values are going to be handles, some will be perfectly fine if the plugin is restarted because they contain all of the data they need

@fdncred
Copy link
Collaborator

fdncred commented Mar 4, 2024

Just shooting from the hip, but I could see 2 or 4 working pretty well. (i enumerated your bullets for easier discussion). I kind of lean towards 2 because it gives plugins the power to say, "keep me aliver", or, "ok, i'm ready to close".

@fdncred
Copy link
Collaborator

fdncred commented Mar 4, 2024

For another idea that is dataframe-like, I've always wanted to use this in nushell somehow https://github.com/vincev/dply-rs.

@devyn
Copy link
Contributor Author

devyn commented Mar 4, 2024

@fdncred

For another idea that is dataframe-like, I've always wanted to use this in nushell somehow https://github.com/vincev/dply-rs.

That's cool, though I think dfr commands cover most of that, and probably more could be added for things that it doesn't do. There's a lot of overlap for sure

@devyn
Copy link
Contributor Author

devyn commented Mar 4, 2024

I think I'll incorporate GC into this as there are probably users who won't like this feature very much if we don't have it.

@fdncred you've convinced me to go with (2) I think, with just an explicit on/off control. It's a little more work for a plugin developer if they're doing something obvious like giving out handles in custom values, but they're probably tracking those all somewhere anyway - they just need to then make sure that when they tell the engine to turn that on/off, they're synchronizing it so they don't end up with a race condition where thread A is in the condition to turn it off, thread B is in the condition to keep it on, and then the off message gets sent without turning it back on again. But it does have the most flexibility for plugin developers to decide what makes the most sense for them - if it's something that really just always needs to be persistent, they can just turn GC off when they run a command for the first time and leave it that way.

Outside of that though I will definitely make it user configurable, as there will probably be different feelings about GC even on a per-plugin basis.

@rgwood
Copy link
Contributor

rgwood commented Mar 5, 2024

Wow! This is very cool.

The one thing I'm not quite sure about with this is whether and how a plugin should communicate to the engine that it doesn't want to be stopped automatically

Would a heartbeat system make sense? i.e. plugins would have to tell Nu "keep me alive" every X seconds to avoid being killed.

@devyn
Copy link
Contributor Author

devyn commented Mar 5, 2024

@rgwood

Would a heartbeat system make sense? i.e. plugins would have to tell Nu "keep me alive" every X seconds to avoid being killed.

Hmm... it wouldn't really fit the current API very well, because usually the only way you get to tell the engine anything is from within some kind of call of the plugin; you don't generally just get to communicate with the engine whenever you want. But that's not impossible to support by any means - it can be done.

I think what I don't really love about it though is that it would require plugins to basically all do the same thing, spawning a thread just to deliver heartbeats every so often, reading some kind of internal state to decide whether to stop sending heartbeats. It would make more sense in a network context, where you don't always really know for sure if the other side is gone unless you send something and expect a reply - but this over stdio, and if a plugin crashes, we should know immediately. It's also not really a good indication of whether the plugin is hung and not responding to input, as a thread that's just sending out heartbeats won't necessarily be reflective of that.

I think it's better to keep it simple. Worst case, if a plugin is keeping itself alive and the user doesn't want that (or thinks the plugin is messed up), there is the plugin stop command, and it's also possible to get the process ID from plugin list and kill it if it's really badly messed up.

@devyn
Copy link
Contributor Author

devyn commented Mar 5, 2024

I believe adding the GC feature with plugins being able to opt out of it will require #12029, because plugins don't currently get &EngineInterface to be able to interact with the engine, so I'll probably wait for that to be merged first before adding to this PR. I can still prepare it, but it won't be able to be included in this PR (or a subsequent one, perhaps) until that's in.

@WindSoilder
Copy link
Collaborator

The one thing I'm not quite sure about with this is whether and how a plugin should communicate to the engine that it doesn't want to be stopped automatically

Sorry I have a question: is there a case that a plugin needs to tell nushell engine that it doesn't want to stop? I think setting remove_after, then engine close the plugin process is good enough.
If we enable such feature, there might be a case that user create a evil plugin, and it always want to stay in the system.

@devyn
Copy link
Contributor Author

devyn commented Mar 5, 2024

@WindSoilder

Sorry I have a question: is there a case that a plugin needs to tell nushell engine that it doesn't want to stop? I think setting remove_after, then engine close the plugin process is good enough.
If we enable such feature, there might be a case that user create a evil plugin, and it always want to stay in the system.

Yes: there are some plugins that won't work correctly with the default settings otherwise, since it should be on by default. For example, for dataframes, they'd basically just disappear a short time later. The plugin author would have to let their users know to disable the GC, and they'd probably still get a lot of complaints about it.

It isn't really a concern that an evil plugin would be created - the user can always explicitly plugin stop if they want to, and since a plugin is a binary running on the system it can really do whatever it wants anyway, including refusing to exit even after nushell asks it to stop. Nushell does not take responsibility for process management of plugins that behave badly.

@rgwood
Copy link
Contributor

rgwood commented Mar 5, 2024

I was (am?) a little unclear on the intended plugin lifecycle after this PR. Is the following correct?

  1. The goal is to keep plugin processes alive as long as the parent Nushell process is alive, but we do not attempt to keep plugins alive across multiple Nu invocations
  2. Killing plugin processes is done on a best-effort basis; if Nushell exits unexpectedly, it's possible that plugin processes may not be cleaned up

@devyn
Copy link
Contributor Author

devyn commented Mar 6, 2024

@rgwood

  1. The goal is to keep plugin processes alive as long as the parent Nushell process is alive, but we do not attempt to keep plugins alive across multiple Nu invocations

This is correct - plugins belong to a specific Nu invocation, and if you launch Nu from Nu, if that Nu uses plugins it will have to launch its own copies of them.

  1. Killing plugin processes is done on a best-effort basis; if Nushell exits unexpectedly, it's possible that plugin processes may not be cleaned up

Yes, kind of. OS process management like kill is not used to terminate the plugins, but their stdio will be closed when Nushell exits, and plugins using nu-plugin will automatically exit at that point. This does mean that plugin processes built specifically to avoid that could keep running after Nushell exits, or plugin processes that are misbehaving might stick around. But we're not really concerned with that - plugins are expected to exit gracefully, whatever that means to them, when their I/O is closed.

Anyway, plugins that are behaving properly will always exit even if Nushell crashes. There isn't anything that Nushell has to explicitly send to the plugins to make them stop.

@devyn devyn marked this pull request as draft March 8, 2024 00:35
@devyn
Copy link
Contributor Author

devyn commented Mar 8, 2024

Taking this back to draft while I work on the plugin GC feature

@devyn devyn marked this pull request as ready for review March 8, 2024 06:35
@devyn
Copy link
Contributor Author

devyn commented Mar 8, 2024

Okay, I'm happy with the plugin GC feature now. See the edited description above.

@fdncred
Copy link
Collaborator

fdncred commented Mar 9, 2024

@devyn as predicted, conflicts. LOL. I think we can land this once they're resolved.

This PR uses the new plugin protocol to intelligently keep plugin
processes running in the background for further plugin calls.

Running plugins can be seen by running the new `plugin list` command,
and stopped by running the new `plugin stop` command.

This is an enhancement for the performance of plugins, as starting new
plugin processes has overhead, especially for plugins in languages
that take a significant amount of time on startup.

It also enables plugins that have persistent state between commands,
making the migration of features like dataframes and `stor` to plugins
possible.

The `version` command now lists plugin names rather than plugin
commands. The list of plugin commands is accessible via `plugin list`.
@devyn
Copy link
Contributor Author

devyn commented Mar 9, 2024

@fdncred all green now!

I also improved the handling of critical errors (such as version mismatch, protocol error, unexpected exit, etc.) for plugins so that

  1. they won't just stay "running" even though they're not - it will attempt a new run the next time, so an explicit plugin stop isn't needed
  2. the error is always available from the interface, and even if a thread holds onto it and tries a plugin call after the error was received, it's still going to get it. This means users get the actual 'version mismatch' error instead of just a vague 'stopped unexpectedly' error, for sure, without a race condition making that uncertain

@fdncred fdncred added pr:breaking-change This PR implies a change affecting users and has to be noted in the release notes pr:release-note-mention Addition/Improvement to be mentioned in the release notes pr:language This PR makes some language changes labels Mar 9, 2024
@fdncred
Copy link
Collaborator

fdncred commented Mar 9, 2024

oh boy, here we go! thanks again!

@fdncred fdncred merged commit bc19be2 into nushell:main Mar 9, 2024
20 checks passed
@fdncred
Copy link
Collaborator

fdncred commented Mar 9, 2024

@devyn ugh, can't compile now.

error[E0308]: mismatched types
   --> /Users/fdncred/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ctrlc-3.4.3/src/platform/unix/mod.rs:50:33
    |
50  |             .and_then(|_| fcntl(pipe.0, FcntlArg::F_SETFD(FdFlag::FD_CLOEXEC)))
    |                           ----- ^^^^^^ expected `i32`, found `OwnedFd`
    |                           |
    |                           arguments to this function are incorrect
    |
note: function defined here
   --> /Users/fdncred/.cargo/registry/src/index.crates.io-6f17d22bba15001f/nix-0.28.0/src/fcntl.rs:600:8
    |
600 | pub fn fcntl(fd: RawFd, arg: FcntlArg) -> Result<c_int> {
    |        ^^^^^

error[E0308]: mismatched types
   --> /Users/fdncred/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ctrlc-3.4.3/src/platform/unix/mod.rs:51:33
    |
51  |             .and_then(|_| fcntl(pipe.1, FcntlArg::F_SETFD(FdFlag::FD_CLOEXEC)));
    |                           ----- ^^^^^^ expected `i32`, found `OwnedFd`
    |                           |
    |                           arguments to this function are incorrect
    |
note: function defined here
   --> /Users/fdncred/.cargo/registry/src/index.crates.io-6f17d22bba15001f/nix-0.28.0/src/fcntl.rs:600:8
    |
600 | pub fn fcntl(fd: RawFd, arg: FcntlArg) -> Result<c_int> {
    |        ^^^^^

error[E0308]: mismatched types
   --> /Users/fdncred/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ctrlc-3.4.3/src/platform/unix/mod.rs:56:33
    |
56  |             .and_then(|_| fcntl(pipe.0, FcntlArg::F_SETFL(OFlag::O_NONBLOCK)))
    |                           ----- ^^^^^^ expected `i32`, found `OwnedFd`
    |                           |
    |                           arguments to this function are incorrect
    |
note: function defined here
   --> /Users/fdncred/.cargo/registry/src/index.crates.io-6f17d22bba15001f/nix-0.28.0/src/fcntl.rs:600:8
    |
600 | pub fn fcntl(fd: RawFd, arg: FcntlArg) -> Result<c_int> {
    |        ^^^^^

error[E0308]: mismatched types
   --> /Users/fdncred/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ctrlc-3.4.3/src/platform/unix/mod.rs:57:33
    |
57  |             .and_then(|_| fcntl(pipe.1, FcntlArg::F_SETFL(OFlag::O_NONBLOCK)));
    |                           ----- ^^^^^^ expected `i32`, found `OwnedFd`
    |                           |
    |                           arguments to this function are incorrect
    |
note: function defined here
   --> /Users/fdncred/.cargo/registry/src/index.crates.io-6f17d22bba15001f/nix-0.28.0/src/fcntl.rs:600:8
    |
600 | pub fn fcntl(fd: RawFd, arg: FcntlArg) -> Result<c_int> {
    |        ^^^^^

error[E0308]: mismatched types
   --> /Users/fdncred/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ctrlc-3.4.3/src/platform/unix/mod.rs:61:21
    |
61  |         Ok(_) => Ok(pipe),
    |                  -- ^^^^ expected `(i32, i32)`, found `(OwnedFd, OwnedFd)`
    |                  |
    |                  arguments to this enum variant are incorrect
    |
    = note: expected tuple `(i32, i32)`
               found tuple `(OwnedFd, OwnedFd)`
help: the type constructed contains `(OwnedFd, OwnedFd)` due to the type of the argument passed
   --> /Users/fdncred/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ctrlc-3.4.3/src/platform/unix/mod.rs:61:18
    |
61  |         Ok(_) => Ok(pipe),
    |                  ^^^----^
    |                     |
    |                     this argument influences the type of `Ok`
note: tuple variant defined here
   --> /Users/fdncred/.rustup/toolchains/1.74.1-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/result.rs:506:5
    |
506 |     Ok(#[stable(feature = "rust1", since = "1.0.0")] T),
    |     ^^

error[E0308]: mismatched types
    --> /Users/fdncred/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ctrlc-3.4.3/src/platform/unix/mod.rs:63:35
     |
63   |             let _ = unistd::close(pipe.0);
     |                     ------------- ^^^^^^ expected `i32`, found `OwnedFd`
     |                     |
     |                     arguments to this function are incorrect
     |
note: function defined here
    --> /Users/fdncred/.cargo/registry/src/index.crates.io-6f17d22bba15001f/nix-0.28.0/src/unistd.rs:1087:8
     |
1087 | pub fn close(fd: RawFd) -> Result<()> {
     |        ^^^^^

error[E0308]: mismatched types
    --> /Users/fdncred/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ctrlc-3.4.3/src/platform/unix/mod.rs:64:35
     |
64   |             let _ = unistd::close(pipe.1);
     |                     ------------- ^^^^^^ expected `i32`, found `OwnedFd`
     |                     |
     |                     arguments to this function are incorrect
     |
note: function defined here
    --> /Users/fdncred/.cargo/registry/src/index.crates.io-6f17d22bba15001f/nix-0.28.0/src/unistd.rs:1087:8
     |
1087 | pub fn close(fd: RawFd) -> Result<()> {
     |        ^^^^^

For more information about this error, try `rustc --explain E0308`.
error: could not compile `ctrlc` (lib) due to 7 previous errors
warning: build failed, waiting for other jobs to finish...
error: failed to compile `nu v0.91.1 .

@fdncred
Copy link
Collaborator

fdncred commented Mar 9, 2024

FYI - adding --locked fixes the build issue so cargo is upgrading something that's breaking things. i'm on macos if that makes any difference.

@WindSoilder
Copy link
Collaborator

I've tried to do cargo clean; cargo build on linux, it works fine for me

@devyn
Copy link
Contributor Author

devyn commented Mar 9, 2024

Looks to be Detegr/rust-ctrlc#115

@fdncred
Copy link
Collaborator

fdncred commented Mar 9, 2024

hopefully the ctrlc people are responsive and fix it quick. i don't like having nushell main compiling problems. at least --locked still works.

WindSoilder pushed a commit that referenced this pull request Mar 9, 2024
…gin' feature not enabled (#12144)

# Description

There is a warning about unused `IntoSpanned` currently when running
`cargo check -p nu-parser`, introduced accidentally by #12064. This
fixes that.

# User-Facing Changes
None

# Tests + Formatting
- 🟢 `toolkit fmt`
- 🟢 `toolkit clippy`
- 🟢 `toolkit test`
- 🟢 `toolkit test stdlib`
@hustcer hustcer added this to the v0.92.0 milestone Mar 10, 2024
devyn added a commit to devyn/nushell.github.io that referenced this pull request Mar 10, 2024
sholderbach pushed a commit to nushell/nushell.github.io that referenced this pull request Mar 10, 2024
@fdncred
Copy link
Collaborator

fdncred commented Mar 10, 2024

ctrlc 3.4.4 was released and fixed this build error.

@devyn devyn deleted the plugin-daemons branch March 10, 2024 21:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr:breaking-change This PR implies a change affecting users and has to be noted in the release notes pr:language This PR makes some language changes pr:release-note-mention Addition/Improvement to be mentioned in the release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants