Skip to content

Conversation

@rgwood
Copy link
Contributor

@rgwood rgwood commented Dec 27, 2022

This is an attempt to implement a new Value::LazyRecord variant for performance reasons.

LazyRecord is like a regular Record, but it's possible to access individual columns without evaluating other columns. I've implemented LazyRecord for the special $nu variable; accessing $nu is relatively slow because of all the information in scope, and $nu accounts for about 2/3 of Nu's startup time on Linux.

Benchmarks

I ran some benchmarks on my desktop (Linux, 12900K) and the results are very pleasing.

Nu's time to start up and run a command (cargo build --release; hyperfine 'target/release/nu -c "echo \"Hello, world!\""' --shell=none --warmup 10) goes from 8.8ms to 3.2ms, about 2.8x faster.

Tests are also much faster! Running cargo nextest (with our very slow proptest tests disabled) goes from 7.2s to 4.4s (1.6x faster), because most tests involve launching a new instance of Nu.

Design (updated)

I've added a new LazyRecord trait and added a Value variant wrapping those trait objects, much like CustomValue. LazyRecord implementations must implement these 2 functions:

// All column names
fn column_names(&self) -> Vec<&'static str>;

// Get 1 specific column value
fn get_column_value(&self, column: &str) -> Result<Value, ShellError>;

Serializability

Value variants must implement Serializable and Deserializable, which poses some problems because I want to use unserializable things like EngineState in LazyRecords. To work around this, I basically lie to the type system:

  1. Add #[typetag::serde(tag = "type")] to LazyRecord to make it serializable
  2. Any unserializable fields in LazyRecord implementations get marked with #[serde(skip)]
  3. At the point where a LazyRecord normally would get serialized and sent to a plugin, I instead collect it into a regular Value::Record (which can be serialized)

@rgwood rgwood marked this pull request as draft December 27, 2022 21:47
@fdncred
Copy link
Contributor

fdncred commented Dec 28, 2022

It's interesting that running $nu consecutively displays that results in different orders.
Screenshot 2022-12-27 at 6 49 40 PM

Not trying to be critical but it seems a tiny bit slower when doing $nu by a few ms than without this PR, which probably isn't significant but was surprising to me. On the other hand, doing $nu.scope seems a tiny bit faster that without this PR.

Without this PR

$nu.scope 
╭──────────────┬───────────────────╮
│ vars         │ [table 39 rows]   │
│ commands     │ [table 516 rows]  │
│ aliases      │ [table 12 rows]   │
│ modules      │ [list 2 items]    │
│ engine_state │ {record 7 fields} │
╰──────────────┴───────────────────╯

With this PR

$nu.scope
╭──────────────┬───────────────────╮
│ vars         │ [table 39 rows]   │
│ commands     │ [table 397 rows]  │
│ aliases      │ [table 12 rows]   │
│ modules      │ [list 2 items]    │
│ engine_state │ {record 7 fields} │
╰──────────────┴───────────────────╯

Note the difference in command counts.

@rgwood
Copy link
Contributor Author

rgwood commented Dec 28, 2022

Ah, good catch - the order of Rust hashmaps is (understandably) not well-defined. That can be fixed up.

Not trying to be critical but it seems a tiny bit slower when doing $nu by a few ms

How are you measuring that? benchmark?

@fdncred
Copy link
Contributor

fdncred commented Dec 28, 2022

How are you measuring that? benchmark?

Like this. In my prompt on the right side (to the left of the time) I have the $env.CMD_DURATION_MS. Just using that.
Screenshot 2022-12-27 at 7 11 45 PM

@rgwood
Copy link
Contributor Author

rgwood commented Dec 28, 2022

Gotcha. I'm seeing $nu take roughly 3-4ms on main and 4-5ms on this PR. The clone of engine_state and stack might be responsible for a bit of the slowdown relative to the default.

@fdncred
Copy link
Contributor

fdncred commented Dec 28, 2022

3-4ms on main and 4-5ms on this PR

On my mac, main is 9-10ms and 11-12ms on this PR.

I'm wondering why mine is 3x yours. I'm just guessing that I have more stuff that I'm sourcing and you're just using defaults?

I'm also not convinced the PR is loading everything, e.g. 516 commands on main vs 397 commands on this PR.

@rgwood
Copy link
Contributor Author

rgwood commented Dec 28, 2022

I'm wondering why mine is 3x yours. I'm just guessing that I have more stuff that I'm sourcing and you're just using defaults?

Probably that and the fact that I'm running a desktop CPU that consumes more electricity than a small town.

I'm also not convinced the PR is loading everything, e.g. 516 commands on main vs 397 commands on this PR.

Weird! I don't see any difference, let me know if you can figure out a pattern.

@fdncred
Copy link
Contributor

fdncred commented Dec 28, 2022

Weird! I don't see any difference, let me know if you can figure out a pattern.

Ugh. I found it and it's my mistake. I built your PR without dataframes and my regular nu is with dataframes. At least it's not your code 🤣

Now my perf is ~14ms without the PR and ~13ms with this PR - so it's faster-ish. So sorry for all the misleading perf metrics.

So, I think you can ignore all my comments except the out of order results.

@rgwood
Copy link
Contributor Author

rgwood commented Dec 29, 2022

I'll try redoing this with a vec of OnceCell items; that should address the ordering issue and avoid evaluating columns twice.

update: ugh that gets into some unpleasant territory; to make a long story short I am fighting with the Rust type system and haven't found a nice way to do this.

#[serde(skip)]
pub engine_state: EngineState,
#[serde(skip)]
pub stack: Stack,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure do we need the ownership of EngineState and Stack? What about just take reference of it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That gets difficult with lifetimes... might be nice to get that working eventually though

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might check the ScopeData struct that stores reference to EngineState and Stack https://github.com/nushell/nushell/blob/main/crates/nu-engine/src/scope.rs . But yeah, it might be a pickle having the references stored inside Value without spilling the lifetime annotations everywhere.

An alternative: Don't store the EngineState and Stack at all and instead have get_column_map(&self, engine_state: &EngineState, stack: &Stack). Could we do that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative: Don't store the EngineState and Stack at all and instead have get_column_map(&self, engine_state: &EngineState, stack: &Stack). Could we do that?

I suspect that it will be difficult to thread the EngineState and Stack through everywhere cell paths are evaluated. But it's worth a try.

@kubouch
Copy link
Contributor

kubouch commented Dec 31, 2022

Very cool! I'd like to see that working.

My main concern is storing the engine state and stack inside the value. Somehow I feel like it goes a bit against Nushell's design where we always tag these objects along wherever we're doing something with Values so that the Values do not need to store any sort of state inside of them.

For the HashMap order problem, see IndexMap (used in https://github.com/nushell/nushell/blob/main/crates/nu-protocol/src/module.rs). For some reason JT wasn't keen on using that but it might be the easiest way to solve the ordering problem.

@rgwood
Copy link
Contributor Author

rgwood commented Jan 2, 2023

Spent a bit more time on this today. IndexMap seems like a good way to solve the ordering problem.

I haven't found a great way to avoid storing engine state and stack inside the value. I think Jakub's suggestion is the most promising:

An alternative: Don't store the EngineState and Stack at all and instead have get_column_map(&self, engine_state: &EngineState, stack: &Stack). Could we do that?

That might be doable but it involves threading EngineState and Stack through a lot of code, and the cell path code is actively being worked on for error handling. I might wait until the cell path stuff settles down before picking this up again.

@rgwood
Copy link
Contributor Author

rgwood commented Jan 6, 2023

Passing EngineState and Stack everywhere that LazyRecord might be evaluated is unfortunately not very practical. Cell path access sometimes happens in code without access to EngineState and Stack (I should have written down exactly where, but I forgot).

I have also been trying to rewrite this to use OnceCell; it would be nice if column values were only evaluated once. That's probably possible, but it gets very tricky with lifetimes (the cell initializer functions need to keep a reference to NuValue) and it requires more fighting with rustc than I have the energy for right now.

I'm going to drop this for now. In the short term, if we want to speed up $nu I would suggest just moving the scope information elsewhere.

@rgwood rgwood closed this Jan 6, 2023
@rgwood
Copy link
Contributor Author

rgwood commented Jan 11, 2023

OK, I can't stay away from this. I've simplified the LazyRecord trait and fixed up most issues. If we can get tentative agreement on the way forward, this is nearly ready for code review. One point that I would like to seek consent on:

I think the $nu implementation of LazyRecord needs to take ownership of EngineState and Stack.

IMO this is important for ease of implementation and correctness. Consider a situation where a user assigns $nu to a variable and then accesses it later:

let foo = $nu
alias some_alias = some_command
let bar = $foo.scope

Before LazyRecord, bar would not contain some_alias. But if LazyRecord references the "living" EngineState, bar will contain some_alias.

I don't think we should change the existing behaviour; as much as possible, laziness should be an implementation detail that is unobservable to users. Taking ownership of EngineState solves this nicely by using a snapshot of EngineState from the time the LazyRecord was created.

@rgwood rgwood reopened this Jan 11, 2023
@kubouch
Copy link
Contributor

kubouch commented Jan 11, 2023

You're totally, right, we need to access the engine state / stack from the point when the variable was created. I can't think of a different way to do it it than storing a copy.

@rgwood rgwood changed the title Proof of concept: LazyRecord LazyRecord Jan 11, 2023
@rgwood rgwood marked this pull request as ready for review January 11, 2023 20:04
@rgwood
Copy link
Contributor Author

rgwood commented Jan 11, 2023

This PR is ready for review.

@fdncred
Copy link
Contributor

fdncred commented Jan 19, 2023

land me already 😄

@rgwood
Copy link
Contributor Author

rgwood commented Jan 19, 2023

Alright, let's give it a try.

@rgwood rgwood merged commit 3b5172a into nushell:main Jan 19, 2023
Xoffio pushed a commit to Xoffio/nushell that referenced this pull request Feb 7, 2023
This is an attempt to implement a new `Value::LazyRecord` variant for
performance reasons.

`LazyRecord` is like a regular `Record`, but it's possible to access
individual columns without evaluating other columns. I've implemented
`LazyRecord` for the special `$nu` variable; accessing `$nu` is
relatively slow because of all the information in `scope`, and [`$nu`
accounts for about 2/3 of Nu's startup time on
Linux](nushell#6677 (comment)).

### Benchmarks

I ran some benchmarks on my desktop (Linux, 12900K) and the results are
very pleasing.

Nu's time to start up and run a command (`cargo build --release;
hyperfine 'target/release/nu -c "echo \"Hello, world!\""' --shell=none
--warmup 10`) goes from **8.8ms to 3.2ms, about 2.8x faster**.

Tests are also much faster! Running `cargo nextest` (with our very slow
`proptest` tests disabled) goes from **7.2s to 4.4s (1.6x faster)**,
because most tests involve launching a new instance of Nu.

### Design (updated)

I've added a new `LazyRecord` trait and added a `Value` variant wrapping
those trait objects, much like `CustomValue`. `LazyRecord`
implementations must implement these 2 functions:

```rust
// All column names
fn column_names(&self) -> Vec<&'static str>;

// Get 1 specific column value
fn get_column_value(&self, column: &str) -> Result<Value, ShellError>;
 ```

### Serializability

`Value` variants must implement `Serializable` and `Deserializable`, which poses some problems because I want to use unserializable things like `EngineState` in `LazyRecord`s. To work around this, I basically lie to the type system:

1. Add `#[typetag::serde(tag = "type")]` to `LazyRecord` to make it serializable
2. Any unserializable fields in `LazyRecord` implementations get marked with `#[serde(skip)]`
3. At the point where a `LazyRecord` normally would get serialized and sent to a plugin, I instead collect it into a regular `Value::Record` (which can be serialized)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants