New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make EngineState clone cheaper with Arc on all of the heavy objects #12229
Make EngineState clone cheaper with Arc on all of the heavy objects #12229
Conversation
This makes many of the larger objects in `EngineState` into `Arc`, and uses `Arc::make_mut` to do clone-on-write if the reference is not unique. This is generally very cheap, giving us the best of both worlds - allowing us to mutate without cloning if we have an exclusive reference, and cloning if we don't. The exact objects that were wrapped in `Arc`: - `files`, `file_contents` - the strings and byte buffers - `decls` - the whole `Vec`, but mostly to avoid lots of individual `malloc()` calls on Clone rather than for memory usage - `blocks` - the blocks themselves, rather than the outer Vec - `modules` - the modules themselves, rather than the outer Vec - `env_vars`, `previous_env_vars` - the entire maps - `config` The changes required were relatively minimal, but this is a breaking API change. In particular, blocks are added as Arcs, to allow the parser cache functionality to work. With my normal nu config, running on Linux, this saves me about 15 MiB of process memory usage when running interactively (65 MiB β 50 MiB). This also makes quick command executions cheaper, particularly since every REPL loop now involves a clone of the engine state so that we can recover from a panic. It also reduces memory usage where engine state needs to be cloned and sent to another thread or kept within an iterator.
Interesting. I'd be willing to try it. Did you notice any runtime difference? Since this is intended as a perf optimization, it would be good to have some benchmarks. Also to make sure we don't regress, but it doesn't seem like we're introducing any overhead to the hot path. |
Yeah, I totally forgot to paste them π€¦ Before change
After change
|
Cool, you can see from the To me, it looks good, one thing I'd add is a short comment to EngineState explaining why there are suddenly Arcs everywhere. Tagging @sholderbach to get a second pair of eyes on it. Also, @rgwood, this should make creating lazy records much cheaper now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @devyn. Amazing that we get this benefit, without deep viral surgery (would have expected more ripple effects for all those fields).
Convinced myself that this is currently sound.
I think there are a few things to tweak here in the future (Arc<Vec<Module>>
instead of Vec<Arc<Module>>
) but overall I don't have anything to block to land this.
files: Vec<(Arc<String>, usize, usize)>, | ||
file_contents: Vec<(Arc<Vec<u8>>, usize, usize)>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting those beauties of a type on my future TODO list π
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha, yup. Very self-describing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's potentially a bit of a benefit that could be had here by switching from Arc<String>
to Arc<str>
, and Arc<Vec<u8>>
to Arc<[u8]>
fyi, I just thought it would be more pain than it's worth. But it would remove one indirect pointer reference and we really never need to modify these after the fact
@@ -325,26 +327,26 @@ impl Expression { | |||
expr.replace_span(working_set, replaced, new_span); | |||
} | |||
Expr::Block(block_id) => { | |||
let mut block = working_set.get_block(*block_id).clone(); | |||
let mut block = (**working_set.get_block(*block_id)).clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's maybe a little subtle in the syntax. Do we want to modify a clone of the contained value (or would we actually like to modify the underlying block at this block id if we could sanely?)
If it is the former, maybe worth pointing out the intention.
let mut block = (**working_set.get_block(*block_id)).clone(); | |
// here we want to obtain a clone of the underlying `Block` to add to the `working_set` after mutation | |
let mut block = (**working_set.get_block(*block_id)).clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if I can make it look nicer with like Block::clone
or something like that
let range = ..block.pipelines.len() - 1; | ||
Arc::make_mut(&mut block).pipelines.drain(range); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be written on a single line like this?
In case strong_count > 1
we would get the copy in the return value and mutate there.
(OK I checked https://doc.rust-lang.org/src/alloc/sync.rs.html#2131 will in fact reassign the in &mut
to the created copy)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably not on a single line, but the make_mut
can be relocated to be on the other line instead
for pipeline in output.pipelines.iter() { | ||
for pipeline_element in &pipeline.elements { | ||
let flattened = flatten_pipeline_element(&working_set, pipeline_element); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good bycatch
pub(super) blocks: Vec<Arc<Block>>, | ||
pub(super) modules: Vec<Arc<Module>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The inner types are already large to probably benefit from an Arc
themselves (see the cloning of Block
s).
Future direction may be to Arc
the Vec
. There will be quite a number of elements.
(looking at it further, Module
itself doesn't get cloned elsewhere)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, agree - I realized this later too, the list could get quite large and we don't want to refcount++ a thousand Arcs every time. I think I'll patch this
Yeah, that would be a good idea. Thinking about it again, it's probably also a good idea to wrap the block list in another |
Get rid of two parallel `Vec`s in `StateDelta` and `EngineState`, that also duplicated span information. Use a struct with documenting fields. Also use `Arc<str>` and `Arc<[u8]>` for the allocations as they are never modified and cloned often (see nushell#12229 for the first improvement). This also makes the representation more compact as no capacity is necessary.
@sholderbach left a very helpful review and this just implements the suggestions he made. Didn't notice any difference in performance, but there could potentially be for a long running Nushell session or one that loads a lot of stuff.
@devyn Awesome work, I was confused why |
# Description Get rid of two parallel `Vec`s in `StateDelta` and `EngineState`, that also duplicated span information. Use a struct with documenting fields. Also use `Arc<str>` and `Arc<[u8]>` for the allocations as they are never modified and cloned often (see #12229 for the first improvement). This also makes the representation more compact as no capacity is necessary. # User-Facing Changes API breakage on `EngineState`/`StateWorkingSet`/`StateDelta` that should not really affect plugin authors.
# Description @sholderbach left a very helpful review and this just implements the suggestions he made. Didn't notice any difference in performance, but there could potentially be for a long running Nushell session or one that loads a lot of stuff. I also caught a bug where nu-protocol won't build without `plugin` because of the previous conditional import. Oops. Fixed. # User-Facing Changes `blocks` and `modules` type in `EngineState` changed again. Shouldn't affect plugins or anything else though really # Tests + Formatting - π’ `toolkit fmt` - π’ `toolkit clippy` - π’ `toolkit test` - π’ `toolkit test stdlib` # After Submitting --------- Co-authored-by: sholderbach <sholderbach@users.noreply.github.com>
I'm not sure, |
This commit also seems to lower the initial cost of doing an |
Description
This makes many of the larger objects in
EngineState
intoArc
, and usesArc::make_mut
to do clone-on-write if the reference is not unique. This is generally very cheap, giving us the best of both worlds - allowing us to mutate without cloning if we have an exclusive reference, and cloning if we don't.This started as more of a curiosity for me after remembering that
Arc::make_mut
exists and can make usingArc
for mostly immutable data that sometimes needs to be changed very convenient, and also after hearing someone complain about memory usage on Discord - this is a somewhat significant win for that.The exact objects that were wrapped in
Arc
:files
,file_contents
- the strings and byte buffersdecls
- the wholeVec
, but mostly to avoid lots of individualmalloc()
calls on Clone rather than for memory usageblocks
- the blocks themselves, rather than the outer Vecmodules
- the modules themselves, rather than the outer Vecenv_vars
,previous_env_vars
- the entire mapsconfig
The changes required were relatively minimal, but this is a breaking API change. In particular, blocks are added as Arcs, to allow the parser cache functionality to work.
With my normal nu config, running on Linux, this saves me about 15 MiB of process memory usage when running interactively (65 MiB β 50 MiB).
This also makes quick command executions cheaper, particularly since every REPL loop now involves a clone of the engine state so that we can recover from a panic. It also reduces memory usage where engine state needs to be cloned and sent to another thread or kept within an iterator.
User-Facing Changes
Shouldn't be any, since it's all internal stuff, but it does change some public interfaces so it's a breaking change
Tests + Formatting
toolkit fmt
toolkit clippy
toolkit test
toolkit test stdlib
After Submitting