
Make EngineState clone cheaper with Arc on all of the heavy objects #12229

Merged: 2 commits merged into nushell:main from devyn:make-enginestate-clone-cheaper on Mar 19, 2024

Conversation

devyn (Contributor) commented Mar 18, 2024

Description

This makes many of the larger objects in EngineState into Arc, and uses Arc::make_mut to do clone-on-write if the reference is not unique. This is generally very cheap, giving us the best of both worlds - allowing us to mutate without cloning if we have an exclusive reference, and cloning if we don't.
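The Arc::make_mut clone-on-write behavior described here can be sketched in isolation (standalone Rust, not Nushell code):

```rust
use std::sync::Arc;

fn main() {
    let mut a: Arc<Vec<i32>> = Arc::new(vec![1, 2, 3]);

    // Unique reference: make_mut mutates in place, no clone happens.
    Arc::make_mut(&mut a).push(4);
    assert_eq!(*a, vec![1, 2, 3, 4]);

    // Shared reference: make_mut clones the Vec first (clone-on-write)
    // and points `a` at the copy, leaving the other handle untouched.
    let b = Arc::clone(&a);
    Arc::make_mut(&mut a).push(5);
    assert_eq!(*a, vec![1, 2, 3, 4, 5]);
    assert_eq!(*b, vec![1, 2, 3, 4]);
}
```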

This started as more of a curiosity for me after remembering that Arc::make_mut exists and can make using Arc very convenient for mostly immutable data that occasionally needs to be changed. It also came after hearing someone complain about memory usage on Discord; this is a somewhat significant win for that.

The exact objects that were wrapped in Arc:

  • files, file_contents - the strings and byte buffers
  • decls - the whole Vec, but mostly to avoid lots of individual malloc() calls on Clone rather than for memory usage
  • blocks - the blocks themselves, rather than the outer Vec
  • modules - the modules themselves, rather than the outer Vec
  • env_vars, previous_env_vars - the entire maps
  • config

The changes required were relatively minimal, but this is a breaking API change. In particular, blocks are added as Arcs, to allow the parser cache functionality to work.
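A heavily abbreviated sketch of the resulting shape (the types below are hypothetical stand-ins for the real nu-protocol definitions, which have many more fields; the point is that a derived Clone now only bumps refcounts on the heavy allocations):

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Simplified stand-ins for the real Block and Module types.
#[allow(dead_code)]
#[derive(Clone)]
struct Block { pipelines: Vec<String> }
#[allow(dead_code)]
#[derive(Clone)]
struct Module { decls: Vec<String> }

// The heavy fields hold Arcs, so cloning EngineState is cheap.
#[derive(Clone)]
struct EngineState {
    files: Vec<(Arc<String>, usize, usize)>,
    file_contents: Vec<(Arc<Vec<u8>>, usize, usize)>,
    blocks: Vec<Arc<Block>>,
    modules: Vec<Arc<Module>>,
    env_vars: Arc<HashMap<String, String>>,
}

fn main() {
    let state = EngineState {
        files: vec![(Arc::new("script.nu".to_string()), 0, 9)],
        file_contents: vec![(Arc::new(b"ls".to_vec()), 0, 2)],
        blocks: vec![Arc::new(Block { pipelines: vec!["ls".into()] })],
        modules: vec![Arc::new(Module { decls: vec![] })],
        env_vars: Arc::new(HashMap::from([("PWD".to_string(), "/".to_string())])),
    };

    // The clone shares the allocations instead of duplicating them:
    let copy = state.clone();
    assert_eq!(Arc::strong_count(&state.files[0].0), 2);
    assert_eq!(Arc::strong_count(&state.env_vars), 2);
    drop(copy);
}
```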

With my normal nu config, running on Linux, this saves me about 15 MiB of process memory usage when running interactively (65 MiB β†’ 50 MiB).

This also makes quick command executions cheaper, particularly since every REPL loop now involves a clone of the engine state so that we can recover from a panic. It also reduces memory usage where engine state needs to be cloned and sent to another thread or kept within an iterator.
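The per-loop clone pattern can be sketched like this (a hypothetical simplification of the real REPL loop; only the clone-then-catch_unwind shape is the point):

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

// Hypothetical stand-in for the real EngineState; only Clone matters here.
#[derive(Clone)]
struct EngineState {
    env: Vec<(String, String)>,
}

// One REPL iteration: evaluate against a clone so that a panic during
// evaluation cannot leave the canonical state half-mutated.
fn run_line(engine_state: &EngineState) -> Option<usize> {
    let snapshot = engine_state.clone(); // cheap now: mostly Arc refcount bumps
    catch_unwind(AssertUnwindSafe(move || {
        // ... parse and evaluate the user's input against `snapshot` ...
        snapshot.env.len()
    }))
    .ok()
}

fn main() {
    let state = EngineState { env: vec![("PWD".into(), "/".into())] };
    assert_eq!(run_line(&state), Some(1));
    // The canonical state is intact regardless of what happened inside.
    assert_eq!(state.env.len(), 1);
}
```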

User-Facing Changes

Shouldn't be any, since it's all internal stuff, but it does change some public interfaces, so it's technically a breaking change.

Tests + Formatting

  • 🟒 toolkit fmt
  • 🟒 toolkit clippy
  • 🟒 toolkit test
  • 🟒 toolkit test stdlib

After Submitting

kubouch (Contributor) commented Mar 18, 2024

Interesting. I'd be willing to try it. Did you notice any runtime difference? Since this is intended as a perf optimization, it would be good to have some benchmarks. Also to make sure we don't regress, but it doesn't seem like we're introducing any overhead to the hot path.

devyn (Contributor, Author) commented Mar 18, 2024

@kubouch

Yeah, I totally forgot to paste them 🀦

Before change

Timer precision: 30 ns
benchmarks                       fastest       β”‚ slowest       β”‚ median        β”‚ mean          β”‚ samples β”‚ iters
β”œβ”€ load_standard_lib             17.87 ms      β”‚ 22.92 ms      β”‚ 18.08 ms      β”‚ 18.18 ms      β”‚ 100     β”‚ 100
β”œβ”€ decoding_benchmarks                         β”‚               β”‚               β”‚               β”‚         β”‚
β”‚  β”œβ”€ json_decode                              β”‚               β”‚               β”‚               β”‚         β”‚
β”‚  β”‚  β”œβ”€ (100, 5)                816.4 Β΅s      β”‚ 939.1 Β΅s      β”‚ 837.9 Β΅s      β”‚ 845.3 Β΅s      β”‚ 100     β”‚ 100
β”‚  β”‚  ╰─ (10000, 15)             276.8 ms      β”‚ 337.5 ms      β”‚ 279.7 ms      β”‚ 280.6 ms      β”‚ 100     β”‚ 100
β”‚  ╰─ msgpack_decode                           β”‚               β”‚               β”‚               β”‚         β”‚
β”‚     β”œβ”€ (100, 5)                340.1 Β΅s      β”‚ 379.4 Β΅s      β”‚ 348.8 Β΅s      β”‚ 349.9 Β΅s      β”‚ 100     β”‚ 100
β”‚     ╰─ (10000, 15)             105 ms        β”‚ 116.8 ms      β”‚ 106.9 ms      β”‚ 107.1 ms      β”‚ 100     β”‚ 100
β”œβ”€ encoding_benchmarks                         β”‚               β”‚               β”‚               β”‚         β”‚
β”‚  β”œβ”€ json_encode                              β”‚               β”‚               β”‚               β”‚         β”‚
β”‚  β”‚  β”œβ”€ (100, 5)                125.6 Β΅s      β”‚ 262.7 Β΅s      β”‚ 132.2 Β΅s      β”‚ 136.4 Β΅s      β”‚ 100     β”‚ 100
β”‚  β”‚  ╰─ (10000, 15)             31.7 ms       β”‚ 33.95 ms      β”‚ 32.92 ms      β”‚ 32.9 ms       β”‚ 100     β”‚ 100
β”‚  ╰─ msgpack_encode                           β”‚               β”‚               β”‚               β”‚         β”‚
β”‚     β”œβ”€ (100, 5)                58.04 Β΅s      β”‚ 114.9 Β΅s      β”‚ 60.11 Β΅s      β”‚ 70.18 Β΅s      β”‚ 100     β”‚ 100
β”‚     ╰─ (10000, 15)             14.3 ms       β”‚ 18.17 ms      β”‚ 14.67 ms      β”‚ 14.72 ms      β”‚ 100     β”‚ 100
β”œβ”€ eval_benchmarks                             β”‚               β”‚               β”‚               β”‚         β”‚
β”‚  β”œβ”€ eval_default_config        4.343 ms      β”‚ 15.27 ms      β”‚ 4.51 ms       β”‚ 4.85 ms       β”‚ 100     β”‚ 100
β”‚  ╰─ eval_default_env           689.4 Β΅s      β”‚ 1.006 ms      β”‚ 697.6 Β΅s      β”‚ 710.5 Β΅s      β”‚ 100     β”‚ 100
β”œβ”€ eval_commands                               β”‚               β”‚               β”‚               β”‚         β”‚
β”‚  β”œβ”€ each                                     β”‚               β”‚               β”‚               β”‚         β”‚
β”‚  β”‚  β”œβ”€ 1                       3.106 ms      β”‚ 5.146 ms      β”‚ 3.164 ms      β”‚ 3.214 ms      β”‚ 100     β”‚ 100
β”‚  β”‚  β”œβ”€ 5                       3.328 ms      β”‚ 3.602 ms      β”‚ 3.383 ms      β”‚ 3.39 ms       β”‚ 100     β”‚ 100
β”‚  β”‚  β”œβ”€ 10                      3.59 ms       β”‚ 4.222 ms      β”‚ 3.676 ms      β”‚ 3.708 ms      β”‚ 100     β”‚ 100
β”‚  β”‚  β”œβ”€ 100                     8.423 ms      β”‚ 9.15 ms       β”‚ 8.53 ms       β”‚ 8.585 ms      β”‚ 100     β”‚ 100
β”‚  β”‚  ╰─ 1000                    56.87 ms      β”‚ 64.93 ms      β”‚ 62.56 ms      β”‚ 62.22 ms      β”‚ 100     β”‚ 100
β”‚  β”œβ”€ for_range                                β”‚               β”‚               β”‚               β”‚         β”‚
β”‚  β”‚  β”œβ”€ 1                       3.053 ms      β”‚ 5.401 ms      β”‚ 3.181 ms      β”‚ 3.275 ms      β”‚ 100     β”‚ 100
β”‚  β”‚  β”œβ”€ 5                       3.278 ms      β”‚ 3.965 ms      β”‚ 3.401 ms      β”‚ 3.419 ms      β”‚ 100     β”‚ 100
β”‚  β”‚  β”œβ”€ 10                      3.533 ms      β”‚ 4.02 ms       β”‚ 3.655 ms      β”‚ 3.658 ms      β”‚ 100     β”‚ 100
β”‚  β”‚  β”œβ”€ 100                     8.384 ms      β”‚ 9.031 ms      β”‚ 8.524 ms      β”‚ 8.553 ms      β”‚ 100     β”‚ 100
β”‚  β”‚  ╰─ 1000                    56.99 ms      β”‚ 66.22 ms      β”‚ 62.42 ms      β”‚ 61.99 ms      β”‚ 100     β”‚ 100
β”‚  β”œβ”€ interleave                               β”‚               β”‚               β”‚               β”‚         β”‚
β”‚  β”‚  β”œβ”€ 100                     1.989 ms      β”‚ 3.294 ms      β”‚ 2.103 ms      β”‚ 2.168 ms      β”‚ 100     β”‚ 100
β”‚  β”‚  β”œβ”€ 1000                    11.26 ms      β”‚ 15.39 ms      β”‚ 12.66 ms      β”‚ 13.2 ms       β”‚ 100     β”‚ 100
β”‚  β”‚  ╰─ 10000                   92.83 ms      β”‚ 141.7 ms      β”‚ 117.2 ms      β”‚ 120.3 ms      β”‚ 100     β”‚ 100
β”‚  β”œβ”€ interleave_with_ctrlc                    β”‚               β”‚               β”‚               β”‚         β”‚
β”‚  β”‚  β”œβ”€ 100                     1.777 ms      β”‚ 2.656 ms      β”‚ 2.102 ms      β”‚ 2.086 ms      β”‚ 100     β”‚ 100
β”‚  β”‚  β”œβ”€ 1000                    8.579 ms      β”‚ 15.53 ms      β”‚ 12.81 ms      β”‚ 13.06 ms      β”‚ 100     β”‚ 100
β”‚  β”‚  ╰─ 10000                   108 ms        β”‚ 145.6 ms      β”‚ 130.3 ms      β”‚ 129.3 ms      β”‚ 100     β”‚ 100
β”‚  β”œβ”€ par_each_1t                              β”‚               β”‚               β”‚               β”‚         β”‚
β”‚  β”‚  β”œβ”€ 1                       1.02 ms       β”‚ 1.664 ms      β”‚ 1.085 ms      β”‚ 1.102 ms      β”‚ 100     β”‚ 100
β”‚  β”‚  β”œβ”€ 5                       1.254 ms      β”‚ 1.416 ms      β”‚ 1.318 ms      β”‚ 1.319 ms      β”‚ 100     β”‚ 100
β”‚  β”‚  β”œβ”€ 10                      1.503 ms      β”‚ 1.84 ms       β”‚ 1.587 ms      β”‚ 1.594 ms      β”‚ 100     β”‚ 100
β”‚  β”‚  β”œβ”€ 100                     6.758 ms      β”‚ 8.777 ms      β”‚ 7.431 ms      β”‚ 7.42 ms       β”‚ 100     β”‚ 100
β”‚  β”‚  ╰─ 1000                    55.41 ms      β”‚ 62.91 ms      β”‚ 60.05 ms      β”‚ 59.39 ms      β”‚ 100     β”‚ 100
β”‚  ╰─ par_each_2t                              β”‚               β”‚               β”‚               β”‚         β”‚
β”‚     β”œβ”€ 1                       1.034 ms      β”‚ 1.854 ms      β”‚ 1.084 ms      β”‚ 1.147 ms      β”‚ 100     β”‚ 100
β”‚     β”œβ”€ 5                       1.162 ms      β”‚ 1.404 ms      β”‚ 1.209 ms      β”‚ 1.221 ms      β”‚ 100     β”‚ 100
β”‚     β”œβ”€ 10                      1.254 ms      β”‚ 1.665 ms      β”‚ 1.323 ms      β”‚ 1.361 ms      β”‚ 100     β”‚ 100
β”‚     β”œβ”€ 100                     4.005 ms      β”‚ 5.285 ms      β”‚ 4.573 ms      β”‚ 4.535 ms      β”‚ 100     β”‚ 100
β”‚     ╰─ 1000                    28.74 ms      β”‚ 31.87 ms      β”‚ 30.79 ms      β”‚ 30.58 ms      β”‚ 100     β”‚ 100
╰─ parser_benchmarks                           β”‚               β”‚               β”‚               β”‚         β”‚
   β”œβ”€ parse_default_config_file  3.208 ms      β”‚ 6.518 ms      β”‚ 3.476 ms      β”‚ 3.662 ms      β”‚ 100     β”‚ 100
   ╰─ parse_default_env_file     558.5 Β΅s      β”‚ 791.4 Β΅s      β”‚ 567.5 Β΅s      β”‚ 582.7 Β΅s      β”‚ 100     β”‚ 100

After change

benchmarks                       fastest       β”‚ slowest       β”‚ median        β”‚ mean          β”‚ samples β”‚ iters
β”œβ”€ load_standard_lib             17.93 ms      β”‚ 22.86 ms      β”‚ 18.12 ms      β”‚ 18.29 ms      β”‚ 100     β”‚ 100
β”œβ”€ decoding_benchmarks                         β”‚               β”‚               β”‚               β”‚         β”‚
β”‚  β”œβ”€ json_decode                              β”‚               β”‚               β”‚               β”‚         β”‚
β”‚  β”‚  β”œβ”€ (100, 5)                851.1 Β΅s      β”‚ 967 Β΅s        β”‚ 859.8 Β΅s      β”‚ 863.6 Β΅s      β”‚ 100     β”‚ 100
β”‚  β”‚  ╰─ (10000, 15)             282.9 ms      β”‚ 352.9 ms      β”‚ 289.2 ms      β”‚ 290.2 ms      β”‚ 100     β”‚ 100
β”‚  ╰─ msgpack_decode                           β”‚               β”‚               β”‚               β”‚         β”‚
β”‚     β”œβ”€ (100, 5)                321.1 Β΅s      β”‚ 365.5 Β΅s      β”‚ 322 Β΅s        β”‚ 323.7 Β΅s      β”‚ 100     β”‚ 100
β”‚     ╰─ (10000, 15)             101.1 ms      β”‚ 106.1 ms      β”‚ 104.1 ms      β”‚ 104 ms        β”‚ 100     β”‚ 100
β”œβ”€ encoding_benchmarks                         β”‚               β”‚               β”‚               β”‚         β”‚
β”‚  β”œβ”€ json_encode                              β”‚               β”‚               β”‚               β”‚         β”‚
β”‚  β”‚  β”œβ”€ (100, 5)                128.9 Β΅s      β”‚ 144.7 Β΅s      β”‚ 129.3 Β΅s      β”‚ 130.5 Β΅s      β”‚ 100     β”‚ 100
β”‚  β”‚  ╰─ (10000, 15)             29.9 ms       β”‚ 34.82 ms      β”‚ 30.93 ms      β”‚ 31.3 ms       β”‚ 100     β”‚ 100
β”‚  ╰─ msgpack_encode                           β”‚               β”‚               β”‚               β”‚         β”‚
β”‚     β”œβ”€ (100, 5)                55.82 Β΅s      β”‚ 67.32 Β΅s      β”‚ 56.43 Β΅s      β”‚ 57.42 Β΅s      β”‚ 100     β”‚ 100
β”‚     ╰─ (10000, 15)             13.57 ms      β”‚ 14.66 ms      β”‚ 13.7 ms       β”‚ 13.74 ms      β”‚ 100     β”‚ 100
β”œβ”€ eval_benchmarks                             β”‚               β”‚               β”‚               β”‚         β”‚
β”‚  β”œβ”€ eval_default_config        4.209 ms      β”‚ 15.78 ms      β”‚ 4.229 ms      β”‚ 4.456 ms      β”‚ 100     β”‚ 100
β”‚  ╰─ eval_default_env           688.8 Β΅s      β”‚ 785.8 Β΅s      β”‚ 696.8 Β΅s      β”‚ 699.3 Β΅s      β”‚ 100     β”‚ 100
β”œβ”€ eval_commands                               β”‚               β”‚               β”‚               β”‚         β”‚
β”‚  β”œβ”€ each                                     β”‚               β”‚               β”‚               β”‚         β”‚
β”‚  β”‚  β”œβ”€ 1                       166.3 Β΅s      β”‚ 272.1 Β΅s      β”‚ 167.9 Β΅s      β”‚ 170.3 Β΅s      β”‚ 100     β”‚ 100
β”‚  β”‚  β”œβ”€ 5                       345.4 Β΅s      β”‚ 418.4 Β΅s      β”‚ 384.9 Β΅s      β”‚ 385 Β΅s        β”‚ 100     β”‚ 100
β”‚  β”‚  β”œβ”€ 10                      635.9 Β΅s      β”‚ 719.9 Β΅s      β”‚ 655.9 Β΅s      β”‚ 658.9 Β΅s      β”‚ 100     β”‚ 100
β”‚  β”‚  β”œβ”€ 100                     5.475 ms      β”‚ 7.33 ms       β”‚ 5.513 ms      β”‚ 5.544 ms      β”‚ 100     β”‚ 100
β”‚  β”‚  ╰─ 1000                    53.89 ms      β”‚ 61.87 ms      β”‚ 58.48 ms      β”‚ 57.15 ms      β”‚ 100     β”‚ 100
β”‚  β”œβ”€ for_range                                β”‚               β”‚               β”‚               β”‚         β”‚
β”‚  β”‚  β”œβ”€ 1                       147.1 Β΅s      β”‚ 307.4 Β΅s      β”‚ 176.8 Β΅s      β”‚ 216 Β΅s        β”‚ 100     β”‚ 100
β”‚  β”‚  β”œβ”€ 5                       342.5 Β΅s      β”‚ 539 Β΅s        β”‚ 448.1 Β΅s      β”‚ 444.3 Β΅s      β”‚ 100     β”‚ 100
β”‚  β”‚  β”œβ”€ 10                      617.8 Β΅s      β”‚ 831.1 Β΅s      β”‚ 657.2 Β΅s      β”‚ 684.9 Β΅s      β”‚ 100     β”‚ 100
β”‚  β”‚  β”œβ”€ 100                     5.494 ms      β”‚ 6.477 ms      β”‚ 5.966 ms      β”‚ 5.92 ms       β”‚ 100     β”‚ 100
β”‚  β”‚  ╰─ 1000                    53.84 ms      β”‚ 59.47 ms      β”‚ 57.75 ms      β”‚ 57.59 ms      β”‚ 100     β”‚ 100
β”‚  β”œβ”€ interleave                               β”‚               β”‚               β”‚               β”‚         β”‚
β”‚  β”‚  β”œβ”€ 100                     1.224 ms      β”‚ 1.637 ms      β”‚ 1.339 ms      β”‚ 1.38 ms       β”‚ 100     β”‚ 100
β”‚  β”‚  β”œβ”€ 1000                    10.54 ms      β”‚ 14.29 ms      β”‚ 12.28 ms      β”‚ 12.37 ms      β”‚ 100     β”‚ 100
β”‚  β”‚  ╰─ 10000                   90.11 ms      β”‚ 135.1 ms      β”‚ 120.9 ms      β”‚ 121.2 ms      β”‚ 100     β”‚ 100
β”‚  β”œβ”€ interleave_with_ctrlc                    β”‚               β”‚               β”‚               β”‚         β”‚
β”‚  β”‚  β”œβ”€ 100                     1.151 ms      β”‚ 1.622 ms      β”‚ 1.273 ms      β”‚ 1.301 ms      β”‚ 100     β”‚ 100
β”‚  β”‚  β”œβ”€ 1000                    10.43 ms      β”‚ 12.44 ms      β”‚ 11.59 ms      β”‚ 11.54 ms      β”‚ 100     β”‚ 100
β”‚  β”‚  ╰─ 10000                   101 ms        β”‚ 138.2 ms      β”‚ 121.2 ms      β”‚ 122.4 ms      β”‚ 100     β”‚ 100
β”‚  β”œβ”€ par_each_1t                              β”‚               β”‚               β”‚               β”‚         β”‚
β”‚  β”‚  β”œβ”€ 1                       219.6 Β΅s      β”‚ 405.8 Β΅s      β”‚ 246.7 Β΅s      β”‚ 250.2 Β΅s      β”‚ 100     β”‚ 100
β”‚  β”‚  β”œβ”€ 5                       446.5 Β΅s      β”‚ 939.9 Β΅s      β”‚ 512.7 Β΅s      β”‚ 539.8 Β΅s      β”‚ 100     β”‚ 100
β”‚  β”‚  β”œβ”€ 10                      831.1 Β΅s      β”‚ 1.136 ms      β”‚ 892.7 Β΅s      β”‚ 897.7 Β΅s      β”‚ 100     β”‚ 100
β”‚  β”‚  β”œβ”€ 100                     5.819 ms      β”‚ 6.832 ms      β”‚ 6.221 ms      β”‚ 6.208 ms      β”‚ 100     β”‚ 100
β”‚  β”‚  ╰─ 1000                    54.59 ms      β”‚ 61.37 ms      β”‚ 58.94 ms      β”‚ 58.85 ms      β”‚ 100     β”‚ 100
β”‚  ╰─ par_each_2t                              β”‚               β”‚               β”‚               β”‚         β”‚
β”‚     β”œβ”€ 1                       346.5 Β΅s      β”‚ 471.2 Β΅s      β”‚ 370.6 Β΅s      β”‚ 375.9 Β΅s      β”‚ 100     β”‚ 100
β”‚     β”œβ”€ 5                       342.9 Β΅s      β”‚ 641.1 Β΅s      β”‚ 383.4 Β΅s      β”‚ 389.2 Β΅s      β”‚ 100     β”‚ 100
β”‚     β”œβ”€ 10                      457.5 Β΅s      β”‚ 771.5 Β΅s      β”‚ 519.9 Β΅s      β”‚ 537.8 Β΅s      β”‚ 100     β”‚ 100
β”‚     β”œβ”€ 100                     3.193 ms      β”‚ 3.506 ms      β”‚ 3.367 ms      β”‚ 3.359 ms      β”‚ 100     β”‚ 100
β”‚     ╰─ 1000                    28.26 ms      β”‚ 30.14 ms      β”‚ 29.73 ms      β”‚ 29.62 ms      β”‚ 100     β”‚ 100
╰─ parser_benchmarks                           β”‚               β”‚               β”‚               β”‚         β”‚
   β”œβ”€ parse_default_config_file  3.098 ms      β”‚ 6.874 ms      β”‚ 3.124 ms      β”‚ 3.532 ms      β”‚ 100     β”‚ 100
   ╰─ parse_default_env_file     549 Β΅s        β”‚ 643.6 Β΅s      β”‚ 560.1 Β΅s      β”‚ 563.9 Β΅s      β”‚ 100     β”‚ 100

kubouch (Contributor) commented Mar 18, 2024

Cool. You can see from the each benchmark that cloning the full EngineState took about 3 ms; that's significant.

To me it looks good; one thing I'd add is a short comment on EngineState explaining why there are suddenly Arcs everywhere.

Tagging @sholderbach to get a second pair of eyes on it. Also, @rgwood, this should make creating lazy records much cheaper now.

sholderbach (Member) left a review comment

Great work @devyn. Amazing that we get this benefit without deep viral surgery (I would have expected more ripple effects for all those fields).

Convinced myself that this is currently sound.

I think there are a few things to tweak here in the future (Arc<Vec<Module>> instead of Vec<Arc<Module>>), but overall I don't see anything blocking landing this.

Comment on lines +77 to +78
files: Vec<(Arc<String>, usize, usize)>,
file_contents: Vec<(Arc<Vec<u8>>, usize, usize)>,
sholderbach (Member):

Putting those beauties of a type on my future TODO list πŸ˜„

devyn (Contributor, Author):

Haha, yup. Very self-describing

devyn (Contributor, Author):

There's potentially a bit of a benefit to be had here by switching from Arc<String> to Arc<str>, and Arc<Vec<u8>> to Arc<[u8]>, FYI; I just thought it would be more pain than it's worth. But it would remove one indirect pointer reference, and we really never need to modify these after the fact.
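The difference between the two representations can be shown standalone (illustrative only; the data is shared in both cases, but Arc<str> drops one pointer hop and the capacity field):

```rust
use std::mem::size_of;
use std::sync::Arc;

fn main() {
    // Arc<String>: thin pointer -> (refcounts, String { ptr, len, cap }) -> bytes.
    let owned: Arc<String> = Arc::new("def greet [] {}".to_string());
    // Arc<str>: fat pointer -> (refcounts, bytes); one less indirection,
    // and no capacity field since the contents never change after creation.
    let slice: Arc<str> = Arc::from("def greet [] {}");

    assert_eq!(owned.as_str(), &*slice);
    // For Arc<str> the length rides in the handle itself (ptr + len):
    assert_eq!(size_of::<Arc<str>>(), 2 * size_of::<usize>());
    assert_eq!(size_of::<Arc<String>>(), size_of::<usize>());
}
```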

@@ -325,26 +327,26 @@ impl Expression {
                expr.replace_span(working_set, replaced, new_span);
            }
            Expr::Block(block_id) => {
-               let mut block = working_set.get_block(*block_id).clone();
+               let mut block = (**working_set.get_block(*block_id)).clone();
sholderbach (Member):

That's maybe a little subtle in the syntax. Do we want to modify a clone of the contained value, or would we actually like to modify the underlying block at this block ID if we could sanely?

If it is the former, maybe worth pointing out the intention.

Suggested change:

-    let mut block = (**working_set.get_block(*block_id)).clone();
+    // here we want to obtain a clone of the underlying `Block` to add to the `working_set` after mutation
+    let mut block = (**working_set.get_block(*block_id)).clone();

devyn (Contributor, Author):

I wonder if I can make it look nicer with like Block::clone or something like that
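For what it's worth, the fully qualified form does work via deref coercion; a standalone sketch (Block here is a hypothetical minimal stand-in for the nu-protocol type):

```rust
use std::sync::Arc;

// Hypothetical minimal stand-in for nu-protocol's Block.
#[derive(Clone, Debug, PartialEq)]
struct Block {
    pipelines: Vec<String>,
}

fn main() {
    let stored: Arc<Block> = Arc::new(Block { pipelines: vec!["ls".into()] });
    let fetched: &Arc<Block> = &stored; // like working_set.get_block(..)

    // Double deref: &Arc<Block> -> Block, then clone the inner Block.
    let a: Block = (**fetched).clone();
    // Fully qualified form: deref coercion turns &Arc<Block> into &Block,
    // making it explicit that the Block (not the Arc) is what gets cloned.
    let b: Block = Block::clone(fetched);

    assert_eq!(a, b);
    assert_eq!(Arc::strong_count(&stored), 1); // no new Arc handles were made
}
```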

Comment on lines +93 to +94
let range = ..block.pipelines.len() - 1;
Arc::make_mut(&mut block).pipelines.drain(range);
sholderbach (Member):

Can this be written on a single line like this?

In case strong_count > 1, we would get the copy in the return value and mutate there.

(OK, I checked https://doc.rust-lang.org/src/alloc/sync.rs.html#2131: make_mut does in fact reassign the Arc behind the &mut to the created copy.)

devyn (Contributor, Author):

Probably not on a single line, but the make_mut call can be relocated to the other line instead.
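The snippet under discussion can be exercised in isolation (Block is a simplified stand-in; this just demonstrates make_mut's copy-then-mutate behavior on the drain when another handle exists):

```rust
use std::sync::Arc;

#[derive(Clone)]
struct Block {
    pipelines: Vec<String>,
}

fn main() {
    let mut block = Arc::new(Block {
        pipelines: vec!["a".into(), "b".into(), "c".into()],
    });
    // Simulate the parser cache holding another handle to the same Block.
    let cached = Arc::clone(&block);

    // Keep only the last pipeline. Because `cached` exists, make_mut first
    // copies the Block and points `block` at the copy, then drains that.
    let range = ..block.pipelines.len() - 1;
    Arc::make_mut(&mut block).pipelines.drain(range);

    assert_eq!(block.pipelines, vec!["c".to_string()]);
    assert_eq!(cached.pipelines.len(), 3); // the cached copy is untouched
}
```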

Comment on lines +127 to +129
for pipeline in output.pipelines.iter() {
for pipeline_element in &pipeline.elements {
let flattened = flatten_pipeline_element(&working_set, pipeline_element);
sholderbach (Member):

Good bycatch

Comment on lines +82 to +83
pub(super) blocks: Vec<Arc<Block>>,
pub(super) modules: Vec<Arc<Module>>,
sholderbach (Member):

The inner types are already large enough to probably benefit from an Arc themselves (see the cloning of Blocks).

Future direction may be to Arc the Vec. There will be quite a number of elements.

(looking at it further, Module itself doesn't get cloned elsewhere)

devyn (Contributor, Author):

Yeah, agree - I realized this later too, the list could get quite large and we don't want to refcount++ a thousand Arcs every time. I think I'll patch this

devyn (Contributor, Author) commented Mar 19, 2024

I think there are a few things to tweak here in the future (Arc<Vec<Module>> instead of Vec<Arc<Module>>), but overall I don't see anything blocking landing this.

Yeah, that would be a good idea. Thinking about it again, it's probably also a good idea to wrap the block list in another Arc on the outside so we don't need to refcount++ each one of them every time, only on non-exclusive-modify
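The saving from the outer Arc can be sketched with plain integers (illustrative only; the counts are what matter):

```rust
use std::sync::Arc;

fn main() {
    // Vec<Arc<T>>: a clone bumps one atomic refcount per element (N ops).
    let per_element: Vec<Arc<u32>> = (0..1000).map(Arc::new).collect();
    let copy = per_element.clone();
    assert_eq!(Arc::strong_count(&per_element[0]), 2); // every element bumped

    // Arc<Vec<Arc<T>>>: a clone bumps exactly one refcount, however long the
    // list is; inner counts only change on a clone-on-write modification.
    let outer: Arc<Vec<Arc<u32>>> = Arc::new((0..1000).map(Arc::new).collect());
    let shared = Arc::clone(&outer);
    assert_eq!(Arc::strong_count(&outer), 2);
    assert_eq!(Arc::strong_count(&shared[0]), 1); // inner handles untouched
    drop(copy);
}
```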

@sholderbach sholderbach merged commit cf321ab into nushell:main Mar 19, 2024
15 checks passed
@kubouch kubouch added this to the v0.92.0 milestone Mar 19, 2024
@kubouch kubouch added the pr:release-note-mention Addition/Improvement to be mentioned in the release notes label Mar 19, 2024
@devyn devyn deleted the make-enginestate-clone-cheaper branch March 19, 2024 20:02
sholderbach added a commit to sholderbach/nushell that referenced this pull request Mar 19, 2024
Get rid of two parallel `Vec`s in `StateDelta` and `EngineState`, that
also duplicated span information. Use a struct with documenting fields.

Also use `Arc<str>` and `Arc<[u8]>` for the allocations as they are
never modified and cloned often (see nushell#12229 for the first improvement).
This also makes the representation more compact as no capacity is
necessary.
devyn added a commit to devyn/nushell that referenced this pull request Mar 19, 2024
@sholderbach left a very helpful review and this just implements the
suggestions he made.

Didn't notice any difference in performance, but there could potentially
be for a long running Nushell session or one that loads a lot of stuff.
FilipAndersson245 (Contributor):
@devyn Awesome work. I was confused why each was slower than par-each; that seems fixed after this PR. I guess each was getting crippled by the cloning.

sholderbach added a commit that referenced this pull request Mar 20, 2024
# Description
Get rid of two parallel `Vec`s in `StateDelta` and `EngineState`, that
also duplicated span information. Use a struct with documenting fields.

Also use `Arc<str>` and `Arc<[u8]>` for the allocations as they are
never modified and cloned often (see #12229 for the first improvement).
This also makes the representation more compact as no capacity is
necessary.

# User-Facing Changes
API breakage on `EngineState`/`StateWorkingSet`/`StateDelta` that should
not really affect plugin authors.
sholderbach added a commit that referenced this pull request Mar 20, 2024
# Description
@sholderbach left a very helpful review and this just implements the
suggestions he made.

Didn't notice any difference in performance, but there could potentially
be for a long running Nushell session or one that loads a lot of stuff.

I also caught a bug where nu-protocol won't build without `plugin`
because of the previous conditional import. Oops. Fixed.

# User-Facing Changes
`blocks` and `modules` type in `EngineState` changed again. Shouldn't
affect plugins or anything else though really

# Tests + Formatting
- 🟒 `toolkit fmt`
- 🟒 `toolkit clippy`
- 🟒 `toolkit test`
- 🟒 `toolkit test stdlib`

# After Submitting

---------

Co-authored-by: sholderbach <sholderbach@users.noreply.github.com>
devyn (Contributor, Author) commented Mar 21, 2024

@devyn Awesome work. I was confused why each was slower than par-each; that seems fixed after this PR. I guess each was getting crippled by the cloning.

I'm not sure, par-each also has to do it. Not clear why it would be faster.

Dorumin (Contributor) commented Mar 26, 2024

This commit also seems to lower the initial cost of doing an each, and likely lots of other commands like tee. 0..0 | each { ignore } went from 4 ms to ~100 Β΅s. I haven't bisected, but it's a big difference between my installed version and main. Thanks!
