Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Box ShellError in Value::Error #8375

Merged
merged 1 commit into from Mar 12, 2023
Merged

Conversation

sholderbach
Copy link
Member

@sholderbach sholderbach commented Mar 9, 2023

Description

Our ShellError at the moment has a std::mem::size_of<ShellError> of 136 bytes (on AMD64). As a result Value directly storing the struct also required 136 bytes (thanks to alignment requirements).

This change stores the Value::Error ShellError on the heap.

Pro:

  • Value now needs just 80 bytes
  • Should be 1 cacheline less (still at least 2 cachelines)

Con:

  • More small heap allocations when dealing with Value::Error
    • More heap fragmentation
    • Potential for additional required memcopies

Further code changes

Includes a small refactor of try due to a type mismatch in its large match.

User-Facing Changes

None for regular users.

Plugin authors may have to update their matches on Value if they use nu-protocol

Needs benchmarking to see if there is a benefit in real world workloads.
Update small improvements in runtime for workloads with high volume of values. Significant reduction in maximum resident set size, when many values are held in memory.

Tests + Formatting

Our `ShellError` at the moment has a `std::mem::size_of<ShellError>` of
`136` bytes (on AMD64).
As a result `Value` directly storing the struct also required `136`
bytes (thanks to alignment requirements).

This change stores the `Value::Error` `ShellError` on the heap.

Pro:
- Value now needs just 80 bytes
- Should be 1 cacheline less (still at least 2 cachelines)

Con:
- More small heap allocations when dealing with `Value::Error`
  - More heap fragmentation
  - Potential for additional required memcopies
@codecov
Copy link

codecov bot commented Mar 9, 2023

Codecov Report

Merging #8375 (15bb32e) into main (ccd72fa) will decrease coverage by 0.08%.
The diff coverage is 17.82%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #8375      +/-   ##
==========================================
- Coverage   68.49%   68.41%   -0.08%     
==========================================
  Files         620      620              
  Lines       99480    99604     +124     
==========================================
+ Hits        68139    68149      +10     
- Misses      31341    31455     +114     
Impacted Files Coverage Δ
crates/nu-cli/src/nu_highlight.rs 94.64% <0.00%> (-3.51%) ⬇️
crates/nu-command/src/bits/and.rs 87.01% <0.00%> (ø)
crates/nu-command/src/bits/not.rs 78.51% <0.00%> (ø)
crates/nu-command/src/bits/or.rs 87.01% <0.00%> (ø)
crates/nu-command/src/bits/rotate_left.rs 72.44% <0.00%> (ø)
crates/nu-command/src/bits/rotate_right.rs 74.04% <0.00%> (ø)
crates/nu-command/src/bits/shift_left.rs 70.94% <0.00%> (ø)
crates/nu-command/src/bits/shift_right.rs 68.11% <0.00%> (ø)
crates/nu-command/src/bits/xor.rs 87.01% <0.00%> (ø)
crates/nu-command/src/bytes/add.rs 92.70% <0.00%> (ø)
... and 142 more

... and 6 files with indirect coverage changes

@sholderbach
Copy link
Member Author

/usr/bin/time -v of the sg.nu synthetic benchmark

main branch:

        Command being timed: "./stuffed_nu ../nu_scripts/benchmarks/sg.nu"
        User time (seconds): 965.46
        System time (seconds): 2.62
        Percent of CPU this job got: 172%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 9:21.26
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 827816
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 206138
        Voluntary context switches: 121252
        Involuntary context switches: 2155
   ...
        Page size (bytes): 4096
        Exit status: 0

This PR:

        Command being timed: "./boxed_nu ../nu_scripts/benchmarks/sg.nu"
        User time (seconds): 953.02
        System time (seconds): 3.30
        Percent of CPU this job got: 177%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 8:58.74
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 827608
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 205820
        Voluntary context switches: 121116
        Involuntary context switches: 1109
   ...
        Page size (bytes): 4096
        Exit status: 0

(some background tasks were running so some potential bias/noise)

Maximum RSS is less affected than I expected (but might be result of the large working set due to the large number of source code loads.

@fdncred
Copy link
Collaborator

fdncred commented Mar 9, 2023

wow. i was looking at this earlier dreading the poor soul who was going to make the change. LOL! great job, as usual!

@sholderbach
Copy link
Member Author

wow. i was looking at this earlier dreading the poor soul who was going to make the change. LOL! great job, as usual!

Sadly this does not address all the clippy::result_large_err yet. So there are a few tedious lines to change in some places. (or yak-shaving on ShellError)

@sholderbach
Copy link
Member Author

Benchmarking experiment - focus data processing

To test with a bit more data processing I grabbed a tsv file of a few megabytes I had laying around with comments and some sparsely reoccuring value to group by and ran:

open my_workstuff.tsv --raw | lines | skip while {|line| $line | str starts-with '#'} | 
    to text | from tsv | 
    group-by some_loosely_duplicate_key

Results including display output

Including the output to the screen, the runtime improvement was less than 5% on wall time but the maximum resident size dropped by ~30%

main branch:

 Command being timed: "./stuffed_nu testscript.nu"
        User time (seconds): 2.41
        System time (seconds): 0.74
        Percent of CPU this job got: 43%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.26
       ...
        Maximum resident set size (kbytes): 1559736
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 1
        Minor (reclaiming a frame) page faults: 353648
        Voluntary context switches: 6476
        Involuntary context switches: 5
       ...
        Page size (bytes): 4096
        Exit status: 0

this branch:

 Command being timed: "./boxed_nu testscript.nu"
        User time (seconds): 2.35
        System time (seconds): 0.59
        Percent of CPU this job got: 41%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.03
       ...
        Maximum resident set size (kbytes): 1122868
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 258953
        Voluntary context switches: 6481
        Involuntary context switches: 5
       ...
        Page size (bytes): 4096
        Exit status: 0

Results without costly display output

Hiding the effect of output by appending | length gave an improvement to the runtime greater 10%:

main branch:

Command being timed: "./stuffed_nu testscript.nu"
        User time (seconds): 0.89
        System time (seconds): 0.41
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.32
        ...
        Maximum resident set size (kbytes): 1558272
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 350768
        Voluntary context switches: 3
        Involuntary context switches: 1
        ...
        Page size (bytes): 4096
        Exit status: 0

this branch:

Command being timed: "./boxed_nu testscript.nu"
        User time (seconds): 0.85
        System time (seconds): 0.24
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.10
        ...
        Maximum resident set size (kbytes): 1120500
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 259771
        Voluntary context switches: 3
        Involuntary context switches: 3
        ...
        Page size (bytes): 4096
        Exit status: 0

@sholderbach sholderbach added the performance Work to make nushell quicker and use less resources label Mar 10, 2023
@sholderbach sholderbach marked this pull request as ready for review March 10, 2023 12:18
@sholderbach sholderbach added the pr:breaking-change This PR implies a change affecting users and has to be noted in the release notes label Mar 10, 2023
@sholderbach
Copy link
Member Author

Marking as a breaking change as it may require changes to plugins to follow the change to Value

@sophiajt
Copy link
Member

Are we going to try to land this before the release? Seems big enough to want to wait until after imho

@sholderbach
Copy link
Member Author

Are we going to try to land this before the release? Seems big enough to want to wait until after imho

The diff is certainly big due to the type-error driven development ripple effect.

Active changes are intended to be non-functional and affect the crates/nu-protocol/src/value/mod.rs and the implementation of try only as I had to refactor things a bit to get a match statement to work.

The rest is basic fixing of the changed type.

Keeping this PR open for long will cause headaches when I want to continue making changes to ShellError as I have been doing in #8229 and friends
The conflicts in ShellError usage locations that would arise would most likely require manual intervention.

@sholderbach sholderbach merged commit a52386e into nushell:main Mar 12, 2023
18 checks passed
@sholderbach sholderbach deleted the box-shell-error branch March 13, 2023 23:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Work to make nushell quicker and use less resources pr:breaking-change This PR implies a change affecting users and has to be noted in the release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants