New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Share the stack amongst CLI plugins instead of cloning it #11288
Conversation
This looks like some pretty deep surgery. I'm interested in trying it out after the CI is green. Thanks for taking a look at this. I'm always interested in these type of performance improvements as long as the performance is all that is changed. |
So there are definitely changes across many files but in practice the main change here is in the The biggest sort of trickiness of course is that when using a mutex you have to be carefaul about locking. I tried to use explicit scopes when possible but for the big "main loop" of the REPL there were some places where |
Stack clones end up copying all the values inside of enviornment variables, which can be super costly if we have very large variables inside of the environment. Instead, we can pay some small costs to lock the environment (at least that's the theory). How to experience a huge slowdown on your machine: `let $stuff = lsof | from ssv -m 1` (This is routinely a 6-digit-length table)
961a718
to
3d8c331
Compare
I did one force push here to get tests back up to snuff, and CI green. There are a lot of mutex unwraps of the "if this unwrap fails there's a deadlock" variety. The test code got a bit less ergonomic here, I think really the best thing might be to rework a helper for the completion tests, but I'm trying to avoid getting too much in the weeds at first. |
I'm not sure this will fly with the rest of the core-team with all the expects in place of unwraps, but at least it should be good enough to test it. We'll have to wait and see what others say. |
I've trying to test this on my mac and I'm not noticing a benefit with 5912ms with pr 12,519 items in this list |
@IanManske so I was thinking COW would cause a problem if, say, your prompt command had stack-affecting side effects. But I didn't realize that captures of mutable variables weren't allowed! Is it possible that maybe a prompt command couldn't have side effects (for the purposes of a stack anyways)? If those existed, then the prompt generator would not need a mutable reference. I don't believe that the completion engine needs one either (except for the fact that the prompt generator needs one, so we need to EDIT: just realized you were talking more about |
@fdncred it's not the command itself that is slow. It's that after running the command, every time you get your prompt it's slow. So the process is
notice how ( |
gotcha! |
I think @kubouch is our resident expert for the stack and scope intricacies. My guess is that we can not mutably share the whole stack between the primary execution and all the reading utilities like completions/prompt etc. if a mutation to the stored values would occur that would affect the main state of the program. If a Our mutable capture prevention logic I think is implemented in the scoping logic elsewhere. |
Eliminating Stack clones, or any big clones, is highly desirable, but I'm not entirely sure about the impact. We might need to elevate to Sophia. A few thoughts, anyway. I noticed a potential semantic problem with the PR. Completer, prompt update, and running any closure in general shouldn't have any side effects. The purpose of Maybe a potential solution could be to have a data structure like struct StackSnapshot {
parent: &Stack,
stack: Stack
} with an API identical to Regarding locking, Nushell is essentially a single-threaded application, so we shouldn't need to worry about locking and use just references/Rc? One notable exception here is |
I think if this is the rule, then there really should be some abstraction that codifies running without side effects (potentially through a cheap COW clone?). Right now the prompt updater can call some prompt command. That prompt command having side effects that just magically don't appear later seems way more confusing than, say, side effects actually blowing things up. If they really aren't supposed to have side effects, is there a way to enforce that? I mentioned this before though, but is there actually a way for a function to even have side effects on the global state? Closures aren't allowed to capture mutable variables according to my tests, so is it possible that "a function call in itself will not affect global state" is an actual property of the system here? Can I write a prompt updater that would theoretically bump a value on each call? I couldn't find such a thing. |
FWIW I think the layering sounds good in theory, but I'm worried that that would require rewriting a bunch of |
Functions (=commands) and closures cannot have a direct impact on the global state. The environment preserving appears on a high level that it changes the parent environment (i.e., a global state), but in fact, only a local environment is modified inside the function which is applied on the parent environment after the function is done running. I think the only way to alter a global state is via writing to files or, recently, the The StackSnapshot was just an immediate idea. Before doing it, a thorough design should be made, and I think it should be adapted into the entire runtime model to avoid having two different models. |
I would like to be able to move forward here in one way or another, given the real nature of this performance issue. There seems to be three ways forward here (which do not conflict at all, in my opinion, but some might see as a side-step/step back in code quality):
I don't mind putting some effort into exploring alternatives, but I don't really want to throw myself into any of these if other people have a stronger idea of what is right. At the very least I hope that this PR can at least highlight that there are three ways of tackling the specific performance issue I was facing. |
As explained above, I believe that sharing a mutable reference of Stack would have different semantics than clone and would be incorrect. These are all important points, but they probably require rather deep changes. I'll try to bring it up in one of our core team meetings which we hold on Discord every Wednesday, we likely need Sophia to give her input. The meetings are open to anyone, or you can just drop by our design discussion channel. |
Hi @rtpg, in this week's meeting we decided that the "stack layer" / "split stack" approach that you and @kubouch mentioned looks the most promising. Of course, there still might be some things to work out and explore. One thing that I will point out for everybody is that [Edit] One crazy idea I just had: |
Working on a new changeset, hope to have another PR up by the end of the day.... but a couple things I noticed:
I believe I have a nice idea to get us across the finish line here, though. Will post details with some code in a bit. |
Alright the PR isn't happening just yet (getting late where I'm at), the short version of what I did though:
The end result is that the changes are limited to Other attempts that fell apart:
|
I think having a parent stack might work (we'd create a linked list of stacks, basically). I'm just not sure if it needs to be Mutex since we don't need to (and shouldn't) mutate the parent stack. The concern here is performance: If we fetch something from the parent stack in a hot loop, we'd be spamming the lock. But I think having just |
While working on #11288 I was having some trouble tracking the main REPL loop, so I've sent in a bunch of tiny refactorings on this branch. These are almost all of the "move code from one place to another" variety, and each commit is meant to be independent, _except for the last one_, which is trying to be a bit more clever to handle the decision of autocd'ing vs running a command. Feel free to just go through each commit and cherry pick the ones that look good. This leads to `evaluate_repl` going from ending on line 715 to ending on line 395. Again, this is mostly just moving code around, but I think this set of changes will make other changes around juggling the stack to avoid cloning easier to review.
I have opened a new PR, #11654, to tackle this. It implements a notion of stack parents through |
While working on nushell#11288 I was having some trouble tracking the main REPL loop, so I've sent in a bunch of tiny refactorings on this branch. These are almost all of the "move code from one place to another" variety, and each commit is meant to be independent, _except for the last one_, which is trying to be a bit more clever to handle the decision of autocd'ing vs running a command. Feel free to just go through each commit and cherry pick the ones that look good. This leads to `evaluate_repl` going from ending on line 715 to ending on line 395. Again, this is mostly just moving code around, but I think this set of changes will make other changes around juggling the stack to avoid cloning easier to review.
This is another attempt on #11288 This allows for a `Stack` to have a parent stack (behind an `Arc`). This is being added to avoid constant stack copying in REPL code. Concretely the following changes are included here: - `Stack` can now have a `parent_stack`, pointing to another stack - variable lookups can fallback to this parent stack (env vars and everything else is still copied) - REPL code has been reworked so that we use parenting rather than cloning. A REPL-code-specific trait helps to ensure that we do not accidentally trigger a full clone of the main stack - A property test has been added to make sure that parenting "looks the same" as cloning for consumers of `Stack` objects --------- Co-authored-by: Raphael Gaschignard <rtpg@rokkenjima.local> Co-authored-by: Ian Manske <ian.manske@pm.me>
How to experience a huge slowdown on your machine:
(This is routinely a 6-digit-length table)
After doing this, at least on my pretty beefy machine prompts take about 3 or 4 seconds to load instead of ~instant.
This shows up in
--log-level info
. This shows up inperf
as a lot of activity onValue::Clone
in particualr.Stack clones end up copying all the values inside of enviornment variables, which can be super costly if we have very large variables inside of the environment. Instead, we can pay some small costs to lock the environment (at least that's the theory), and share all of this.
I believe there's probably some subtle bugs that were present in
main
do to cloning of the stack instead of sharing (given that we were cloning to get a mutable copy, after all). I did not really try to find such bugs though, as here I mainly wanted to grasp where the initial performance issue was.After this patch, the performance issue goes away entirely for me. To be seen whether there are other issues though.