LazyRecord #7619
|
Ah, good catch - the order of Rust hashmaps is (understandably) not well-defined. That can be fixed up.
How are you measuring that? |
|
Gotcha. I'm seeing |
On my mac, main is 9-10ms and 11-12ms on this PR. I'm wondering why mine is 3x yours. I'm just guessing that I have more stuff that I'm sourcing and you're just using defaults? I'm also not convinced the PR is loading everything, e.g. 516 commands on main vs 397 commands on this PR. |
Probably that and the fact that I'm running a desktop CPU that consumes more electricity than a small town.
Weird! I don't see any difference, let me know if you can figure out a pattern. |
Ugh. I found it and it's my mistake. I built your PR without dataframes and my regular nu is with dataframes. At least it's not your code 🤣 Now my perf is ~14ms without the PR and ~13ms with this PR - so it's faster-ish. So sorry for all the misleading perf metrics. So, I think you can ignore all my comments except the out of order results. |
|
I'll try redoing this with a update: ugh that gets into some unpleasant territory; to make a long story short I am fighting with the Rust type system and haven't found a nice way to do this. |
    #[serde(skip)]
    pub engine_state: EngineState,
    #[serde(skip)]
    pub stack: Stack,
I'm not sure we need ownership of EngineState and Stack. What about just taking references to them?
That gets difficult with lifetimes... might be nice to get that working eventually though
You might check the ScopeData struct, which stores references to EngineState and Stack: https://github.com/nushell/nushell/blob/main/crates/nu-engine/src/scope.rs. But yeah, it might be a pickle to store the references inside Value without spilling lifetime annotations everywhere.
An alternative: Don't store the EngineState and Stack at all and instead have get_column_map(&self, engine_state: &EngineState, stack: &Stack). Could we do that?
> An alternative: Don't store the EngineState and Stack at all and instead have get_column_map(&self, engine_state: &EngineState, stack: &Stack). Could we do that?
I suspect that it will be difficult to thread the EngineState and Stack through everywhere cell paths are evaluated. But it's worth a try.
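The two designs discussed in this thread can be contrasted in a small sketch. Everything here is hypothetical: `EngineState` and `Stack` are tiny stand-ins for Nushell's real types, and `OwningLazy`, `StatelessLazy`, and `lookup` are illustrative names, not code from the PR. The sketch only shows the behavioral difference: a value that stores copies sees the state as it was at creation time, while one that takes references sees whatever the caller passes in.

```rust
// Hypothetical stand-ins for Nushell's EngineState and Stack (the real types are much larger).
#[derive(Clone, Debug, Default)]
pub struct EngineState {
    pub command_count: usize,
}

#[derive(Clone, Debug, Default)]
pub struct Stack {
    pub depth: usize,
}

// Shared lookup logic used by both designs.
fn lookup(column: &str, engine_state: &EngineState, stack: &Stack) -> Option<usize> {
    match column {
        "commands" => Some(engine_state.command_count),
        "depth" => Some(stack.depth),
        _ => None,
    }
}

// Design A: the lazy value owns copies of its context, taken when it is created.
pub struct OwningLazy {
    engine_state: EngineState,
    stack: Stack,
}

impl OwningLazy {
    pub fn new(engine_state: &EngineState, stack: &Stack) -> Self {
        Self {
            engine_state: engine_state.clone(),
            stack: stack.clone(),
        }
    }

    pub fn get_column(&self, column: &str) -> Option<usize> {
        lookup(column, &self.engine_state, &self.stack)
    }
}

// Design B: the lazy value stores nothing; every caller must pass the context in.
pub struct StatelessLazy;

impl StatelessLazy {
    pub fn get_column(
        &self,
        column: &str,
        engine_state: &EngineState,
        stack: &Stack,
    ) -> Option<usize> {
        lookup(column, engine_state, stack)
    }
}
```

Design A keeps call sites unchanged at the cost of cloning state into the value; Design B keeps the value stateless but requires threading both arguments through every place a column is read.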
|
Very cool! I'd like to see that working. My main concern is storing the engine state and stack inside the value. Somehow I feel like it goes a bit against Nushell's design where we always tag these objects along wherever we're doing something with Values so that the Values do not need to store any sort of state inside of them. For the HashMap order problem, see IndexMap (used in https://github.com/nushell/nushell/blob/main/crates/nu-protocol/src/module.rs). For some reason JT wasn't keen on using that but it might be the easiest way to solve the ordering problem. |
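To illustrate the ordering problem without pulling in an external crate, here is a std-only sketch of the property an insertion-ordered map provides. It is not IndexMap and not Nushell code: `OrderedColumns` is a hypothetical type that pairs a `HashMap` (for lookup) with a `Vec` of names (for stable column order).

```rust
use std::collections::HashMap;

/// Hypothetical insertion-ordered column store (a stand-in for what IndexMap offers).
#[derive(Default)]
pub struct OrderedColumns {
    names: Vec<String>,            // column names in insertion order
    values: HashMap<String, i64>,  // lookup by column name
}

impl OrderedColumns {
    pub fn new() -> Self {
        Self::default()
    }

    /// Insert or overwrite a column; a new name is appended, an existing one keeps its position.
    pub fn insert(&mut self, name: &str, value: i64) {
        if self.values.insert(name.to_string(), value).is_none() {
            self.names.push(name.to_string());
        }
    }

    pub fn get(&self, name: &str) -> Option<i64> {
        self.values.get(name).copied()
    }

    /// Column names in the order they were first inserted, independent of hash order.
    pub fn columns(&self) -> Vec<&str> {
        self.names.iter().map(String::as_str).collect()
    }
}
```

Iterating a plain `HashMap` gives no such guarantee, which is why record columns built from one can come out in a different order between runs.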
|
Spent a bit more time on this today. I haven't found a great way to avoid storing engine state and stack inside the value. I think Jakub's suggestion is the most promising:
That might be doable but it involves threading EngineState and Stack through a lot of code, and the cell path code is actively being worked on for error handling. I might wait until the cell path stuff settles down before picking this up again. |
|
Passing EngineState and Stack everywhere that I have also been trying to rewrite this to use I'm going to drop this for now. In the short term, if we want to speed up |
|
OK, I can't stay away from this. I've simplified the I think the
|
|
You're totally right, we need to access the engine state / stack from the point when the variable was created. I can't think of a different way to do it than storing a copy. |
|
This PR is ready for review. |
|
land me already 😄 |
|
Alright, let's give it a try. |
This is an attempt to implement a new `Value::LazyRecord` variant for performance reasons. `LazyRecord` is like a regular `Record`, but it's possible to access individual columns without evaluating other columns. I've implemented `LazyRecord` for the special `$nu` variable; accessing `$nu` is relatively slow because of all the information in `scope`, and `$nu` accounts for about 2/3 of Nu's startup time on Linux (nushell#6677 (comment)).

### Benchmarks

I ran some benchmarks on my desktop (Linux, 12900K) and the results are very pleasing.

Nu's time to start up and run a command (`cargo build --release; hyperfine 'target/release/nu -c "echo \"Hello, world!\""' --shell=none --warmup 10`) goes from **8.8ms to 3.2ms, about 2.8x faster**.

Tests are also much faster! Running `cargo nextest` (with our very slow `proptest` tests disabled) goes from **7.2s to 4.4s (1.6x faster)**, because most tests involve launching a new instance of Nu.

### Design (updated)

I've added a new `LazyRecord` trait and added a `Value` variant wrapping those trait objects, much like `CustomValue`. `LazyRecord` implementations must implement these 2 functions:

```rust
// All column names
fn column_names(&self) -> Vec<&'static str>;

// Get 1 specific column value
fn get_column_value(&self, column: &str) -> Result<Value, ShellError>;
```

### Serializability

`Value` variants must implement `Serialize` and `Deserialize`, which poses some problems because I want to use unserializable things like `EngineState` in `LazyRecord`s. To work around this, I basically lie to the type system:

1. Add `#[typetag::serde(tag = "type")]` to `LazyRecord` to make it serializable
2. Any unserializable fields in `LazyRecord` implementations get marked with `#[serde(skip)]`
3. At the point where a `LazyRecord` normally would get serialized and sent to a plugin, I instead collect it into a regular `Value::Record` (which can be serialized)
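As a rough illustration of the design described above, here is a self-contained sketch of a trait with those two methods plus an eager fallback. The `Value` and `ShellError` enums are simplified stand-ins rather than Nushell's real types, `DemoVariable` is a hypothetical implementor, and the default `collect` method is an assumption about how the "collect into a regular `Value::Record`" step could look, not the PR's actual code.

```rust
use std::cell::Cell;

// Simplified stand-ins for Nushell's Value and ShellError.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    String(String),
    Record { cols: Vec<String>, vals: Vec<Value> },
}

#[derive(Debug, PartialEq)]
pub enum ShellError {
    ColumnNotFound(String),
}

pub trait LazyRecord {
    // All column names
    fn column_names(&self) -> Vec<&'static str>;

    // Get 1 specific column value
    fn get_column_value(&self, column: &str) -> Result<Value, ShellError>;

    // Assumed helper: evaluate every column and build an ordinary record,
    // e.g. before handing the value to something that needs serializable data.
    fn collect(&self) -> Result<Value, ShellError> {
        let mut cols = Vec::new();
        let mut vals = Vec::new();
        for name in self.column_names() {
            vals.push(self.get_column_value(name)?);
            cols.push(name.to_string());
        }
        Ok(Value::Record { cols, vals })
    }
}

// Hypothetical implementor that counts how often its costly column is computed.
pub struct DemoVariable {
    pub expensive_calls: Cell<u32>,
}

impl LazyRecord for DemoVariable {
    fn column_names(&self) -> Vec<&'static str> {
        vec!["cheap", "expensive"]
    }

    fn get_column_value(&self, column: &str) -> Result<Value, ShellError> {
        match column {
            "cheap" => Ok(Value::String("ready".to_string())),
            "expensive" => {
                // Only runs when this column is actually requested.
                self.expensive_calls.set(self.expensive_calls.get() + 1);
                Ok(Value::Int(42))
            }
            other => Err(ShellError::ColumnNotFound(other.to_string())),
        }
    }
}
```

Reading only the `cheap` column never touches the `expensive` branch, which is the source of the startup savings described in the benchmarks; calling `collect` evaluates every column once.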

