Skip to content

Conversation

@mkeeter
Copy link
Contributor

@mkeeter mkeeter commented Jan 5, 2024

Analogous to #1066 , this PR moves read decryption into the rayon thread pool. It uses exactly the same infrastructure (particularly DeferredQueue), and is implemented in a very similar way.

I need to do some benchmarking before this goes in, but wanted to open it for review early.

@mkeeter mkeeter requested review from jmpesp and leftwo January 5, 2024 22:30
@mkeeter
Copy link
Contributor Author

mkeeter commented Jan 8, 2024

I did some benchmarking on c12, using the following fio script:

[global]
filename=/dev/nvme0n1
iodepth=25
ioengine=aio
time_based
runtime=60
numjobs=1
direct=1
stonewall=1

[randwrite-4M]
bs=4M
rw=randwrite

[randread-4K]
bs=4K
rw=randread

[randread-1M]
bs=1M
rw=randread

[randread-4M]
bs=4M
rw=randread

(the 4M writes are to put some data on the virtual disk, which is otherwise blank)

I see the following performance (in MiB/s)

Read size main offload-read-decryption
4K 50.5 42.1
1M 164 216
4M 156 218

It looks like the overhead of moving decryption off-thread isn't worth it for small reads, which isn't totally surprising.

Doing a more detailed sweep, I see this:
Screenshot 2024-01-08 at 11 02 05 AM

The crossover point looks like about 8 KiB, so I'm going to tweak the code to decrypt immediately below that size.

@mkeeter
Copy link
Contributor Author

mkeeter commented Jan 8, 2024

Running with the change above (8f3ba46) to only defer > 8 KiB read decryption, I'm only seeing a small improvement (42.7 -> 44.2 MiB/s). This is still below the performance on main (50.6 MiB/s), so I think more investigation is needed.

@mkeeter
Copy link
Contributor Author

mkeeter commented Jan 9, 2024

Re-running on Crucible main (099c201), I see 44.5 MiB/s, so I suspect the 50.6 MiB/s reading was a fluke. This means that there's no performance regression, so we can move forward with the PR.

@faithanalog
Copy link
Contributor

I am not getting the exact same numbers you are on my test setup, but I am getting a similar ratio of improvement. about 40-45% better for 16k reads, 75-80% better for 4M reads.

@mkeeter mkeeter mentioned this pull request Jan 16, 2024
@mkeeter mkeeter force-pushed the offload-read-decryption branch from 264cbef to f2f67ff Compare January 22, 2024 15:08
@mkeeter mkeeter force-pushed the offload-read-decryption branch from 9d9f15f to a448bda Compare January 23, 2024 16:56
@mkeeter mkeeter merged commit df343e7 into oxidecomputer:main Jan 23, 2024
@mkeeter mkeeter deleted the offload-read-decryption branch January 23, 2024 18:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants