Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Boxes record for smaller Value enum. #12252

Merged
merged 8 commits into from Mar 26, 2024
Merged

Conversation

FilipAndersson245
Copy link
Contributor

@FilipAndersson245 FilipAndersson245 commented Mar 20, 2024

Description

Boxes Record inside Value to reduce memory usage, Value goes from 72 -> 56 bytes after this change.

User-Facing Changes

Tests + Formatting

After Submitting

@FilipAndersson245
Copy link
Contributor Author

I ran the benchmark suit and saw no noticeable slowdown by the indirection of boxing records.

@sholderbach
Copy link
Member

Mhh we currently don't have any benchmarks for heavy cellpath access or records in general. Probably a gap worth filling.

Personally I would like to see the boxing route compared to a move of Record to a Vec<(String, Value)> which would also knock the size down to the same ceiling.
(tradeoffs all around: small vs big column counts, pure lookup, mutation)

@kubouch
Copy link
Contributor

kubouch commented Mar 21, 2024

Since this is a performance optimization, it would be great to have a benchmark demonstrating some performance benefit.

@devyn
Copy link
Contributor

devyn commented Mar 22, 2024

I think if this isn't a big performance regression and it improves memory usage, that would also be a win

@FilipAndersson245
Copy link
Contributor Author

@kubouch I was thinking of trying to add some benchmark for this by generating and accessing records, but I overall think it won't be any significant regression, I think the Mayer win would be that Value goes below 64 bytes and thereby could be loaded in one cache lane, compared to 2.

@FilipAndersson245
Copy link
Contributor Author

FilipAndersson245 commented Mar 22, 2024

I added a small tests creation time, and access time with a simple record of 1-1000 columns
there seem to be no mayor difference between the two

Before

Timer precision: 10 ns
Timer precision: 10 ns
benchmarks            fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ record                           │               │               │               │         │
   ├─ access                        │               │               │               │         │
   │  ├─ 1            14.65 µs      │ 92.82 µs      │ 15 µs         │ 16.63 µs      │ 100     │ 100
   │  ├─ 10           15.77 µs      │ 34.09 µs      │ 16.14 µs      │ 17.2 µs       │ 100     │ 100
   │  ├─ 100          23.94 µs      │ 239.5 µs      │ 24.54 µs      │ 28 µs         │ 100     │ 100
   │  ╰─ 1000         99.09 µs      │ 206.9 µs      │ 102.5 µs      │ 106.2 µs      │ 100     │ 100
   ╰─ create                        │               │               │               │         │
      ├─ 1            21.12 µs      │ 51.25 µs      │ 21.45 µs      │ 22.25 µs      │ 100     │ 100
      ├─ 10           29.98 µs      │ 60.01 µs      │ 30.7 µs       │ 32.42 µs      │ 100     │ 100
      ├─ 100          130 µs        │ 293.7 µs      │ 132.5 µs      │ 143.4 µs      │ 100     │ 100
      ╰─ 1000         1.296 ms      │ 1.844 ms      │ 1.353 ms      │ 1.382 ms      │ 100     │ 100

After

Timer precision: 10 ns
benchmarks            fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ record                           │               │               │               │         │
   ├─ access                        │               │               │               │         │
   │  ├─ 1            16.81 µs      │ 30.32 µs      │ 17.19 µs      │ 17.64 µs      │ 100     │ 100
   │  ├─ 10           17.69 µs      │ 31.13 µs      │ 18.11 µs      │ 18.4 µs       │ 100     │ 100
   │  ├─ 100          24.96 µs      │ 41.66 µs      │ 25.41 µs      │ 26.26 µs      │ 100     │ 100
   │  ╰─ 1000         88.65 µs      │ 192.7 µs      │ 105.8 µs      │ 107.6 µs      │ 100     │ 100
   ╰─ create                        │               │               │               │         │
      ├─ 1            20.93 µs      │ 41.1 µs       │ 21.37 µs      │ 21.86 µs      │ 100     │ 100
      ├─ 10           30.25 µs      │ 58.25 µs      │ 30.79 µs      │ 31.64 µs      │ 100     │ 100
      ├─ 100          134.7 µs      │ 219.3 µs      │ 139.3 µs      │ 141 µs        │ 100     │ 100
      ╰─ 1000         1.206 ms      │ 1.448 ms      │ 1.315 ms      │ 1.301 ms      │ 100     │ 100

benches/benchmarks.rs Outdated Show resolved Hide resolved
benches/benchmarks.rs Show resolved Hide resolved
@FilipAndersson245
Copy link
Contributor Author

FilipAndersson245 commented Mar 22, 2024

@sholderbach I added benchmark that expand to something like this

{col: {col: {col: {col: {col: {col: {col: {col: {col: {col: {col: {col: {col: {col: {col: {col: {col: {col: {col: {col: {col: {col: {col: {col: {col: {col: {col: {col: {col: {col: {col: {col: {col_final: 0}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}

And access it with $record.col.col.....
This should hit the indirection I would expect?

Before

Timer precision: 10 ns
benchmarks         fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ record                        │               │               │               │         │
   ├─ create                     │               │               │               │         │
   │  ├─ 1         17.78 µs      │ 89.86 µs      │ 18.29 µs      │ 21.76 µs      │ 100     │ 100
   │  ├─ 10        28.29 µs      │ 184.9 µs      │ 30.24 µs      │ 37 µs         │ 100     │ 100
   │  ├─ 100       134.4 µs      │ 265.6 µs      │ 140.6 µs      │ 153.5 µs      │ 100     │ 100
   │  ╰─ 1000      1.22 ms       │ 2.476 ms      │ 1.358 ms      │ 1.413 ms      │ 100     │ 100
   ├─ flat_access                │               │               │               │         │
   │  ├─ 1         16.64 µs      │ 734.5 µs      │ 18.56 µs      │ 29.6 µs       │ 100     │ 100
   │  ├─ 10        15.09 µs      │ 66.04 µs      │ 16 µs         │ 20.6 µs       │ 100     │ 100
   │  ├─ 100       22.16 µs      │ 75.01 µs      │ 22.58 µs      │ 24.79 µs      │ 100     │ 100
   │  ╰─ 1000      100.3 µs      │ 223.8 µs      │ 107.3 µs      │ 117.4 µs      │ 100     │ 100
   ╰─ nest_access                │               │               │               │         │
      ├─ 1         16.77 µs      │ 40.89 µs      │ 17.16 µs      │ 17.61 µs      │ 100     │ 100
      ├─ 2         15.36 µs      │ 43.98 µs      │ 15.82 µs      │ 16.99 µs      │ 100     │ 100
      ├─ 4         19.01 µs      │ 275.6 µs      │ 19.52 µs      │ 23.84 µs      │ 100     │ 100
      ├─ 8         19.26 µs      │ 69.84 µs      │ 20.32 µs      │ 25.32 µs      │ 100     │ 100
      ├─ 16        30.37 µs      │ 73.14 µs      │ 31.15 µs      │ 32.95 µs      │ 100     │ 100
      ├─ 32        53.08 µs      │ 180.1 µs      │ 54.37 µs      │ 60.28 µs      │ 100     │ 100
      ├─ 64        137 µs        │ 824.4 µs      │ 141.8 µs      │ 156.9 µs      │ 100     │ 100
      ╰─ 128       483.8 µs      │ 625.7 µs      │ 514.2 µs      │ 522.8 µs      │ 100     │ 100

After

Timer precision: 10 ns
benchmarks         fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ record                        │               │               │               │         │
   ├─ create                     │               │               │               │         │
   │  ├─ 1         20.38 µs      │ 88.34 µs      │ 20.77 µs      │ 23.22 µs      │ 100     │ 100
   │  ├─ 10        28.52 µs      │ 96.63 µs      │ 29.19 µs      │ 32.78 µs      │ 100     │ 100
   │  ├─ 100       137.7 µs      │ 257.1 µs      │ 140.3 µs      │ 148.2 µs      │ 100     │ 100
   │  ╰─ 1000      1.161 ms      │ 2.345 ms      │ 1.245 ms      │ 1.297 ms      │ 100     │ 100
   ├─ flat_access                │               │               │               │         │
   │  ├─ 1         16.32 µs      │ 53.61 µs      │ 16.68 µs      │ 17.73 µs      │ 100     │ 100
   │  ├─ 10        15.05 µs      │ 30.96 µs      │ 15.36 µs      │ 15.63 µs      │ 100     │ 100
   │  ├─ 100       23.75 µs      │ 129.5 µs      │ 33.42 µs      │ 36.74 µs      │ 100     │ 100
   │  ╰─ 1000      97.98 µs      │ 293.5 µs      │ 99.66 µs      │ 105.5 µs      │ 100     │ 100
   ╰─ nest_access                │               │               │               │         │
      ├─ 1         14.82 µs      │ 43.37 µs      │ 15.26 µs      │ 16.5 µs       │ 100     │ 100
      ├─ 2         15.37 µs      │ 39.79 µs      │ 15.75 µs      │ 16.8 µs       │ 100     │ 100
      ├─ 4         16.73 µs      │ 36.34 µs      │ 17.05 µs      │ 18.46 µs      │ 100     │ 100
      ├─ 8         19.79 µs      │ 45.33 µs      │ 20.4 µs       │ 21.61 µs      │ 100     │ 100
      ├─ 16        28.81 µs      │ 55.02 µs      │ 29.36 µs      │ 29.8 µs       │ 100     │ 100
      ├─ 32        60.1 µs       │ 854.1 µs      │ 64.71 µs      │ 90.8 µs       │ 100     │ 100
      ├─ 64        164.1 µs      │ 1.034 ms      │ 190.6 µs      │ 224.2 µs      │ 100     │ 100
      ╰─ 128       621.3 µs      │ 1.22 ms       │ 645.5 µs      │ 662.4 µs      │ 100     │ 100

It seems the penalty of indirection is hit roughly at 32 nested records but no significant regression bellow that.

@FilipAndersson245
Copy link
Contributor Author

Is this satisfactory, or is additional benchmarks required?

@kubouch
Copy link
Contributor

kubouch commented Mar 26, 2024

Seems good to me, there doesn't seem to be many regressions in the cases where it matters (I don't think >32 indirection is that common), and some benchmarks are even slightly faster.

@kubouch kubouch merged commit b70766e into nushell:main Mar 26, 2024
15 checks passed
@sholderbach
Copy link
Member

Thanks for bringing the receipts!

@FilipAndersson245 FilipAndersson245 deleted the BoxRecord branch March 26, 2024 17:54
@hustcer hustcer added this to the v0.92.0 milestone Mar 27, 2024
sholderbach added a commit to sholderbach/nushell that referenced this pull request Mar 29, 2024
sholderbach added a commit to sholderbach/nushell that referenced this pull request Mar 30, 2024
Only short-term solution

Competes with nushell#12305
sholderbach added a commit to sholderbach/nushell that referenced this pull request Mar 31, 2024
Only short-term solution

Competes with nushell#12305
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants