Refactor string encoding to be consistent and more memory efficient #223
Merged: badrishc merged 23 commits into microsoft:main from PaulusParssinen:reduce-intermediate-str-allocations on Apr 5, 2024
* Let's not encourage allocations
* They're the samething aka "TryCopyTo" if it was BCL naming
* Update ASCII encoding logic to increment the curr pointer with the "actual" encoded byte count, just to be sure.
* Makes it clear what was the assumption for the input string
* Either they should all assume the input to be ASCII, or none of them, or there should be explicit overloads for "unsafe" assumptions.
* Now using the WriterDirectAscii
* Removed unnecessary explicit ROSpan<byte> casts from arrays
* Skip encoding of constant strings by making them u8-literals
* Remove couple allocations from WriteSimpleString calls
* Skip encoding of constant strings by making them u8-literals
badrishc approved these changes on Apr 3, 2024
vazois requested changes on Apr 4, 2024 (libs/cluster/Server/Replication/PrimaryOps/PrimarySendCheckpoint.cs)
vazois approved these changes on Apr 4, 2024
altall pushed a commit to altall/garnet that referenced this pull request on Apr 5, 2024:
…icrosoft#223)
* Remove intermediate array allocations from ASCII decoding
* Update RespWriteUtils to take spans instead of arrays
* Let's not encourage allocations
* Merge WriteResponse with WriteDirect
* They're the samething aka "TryCopyTo" if it was BCL naming
* Remove intermediate array allocations from ASCII string encoding
* Add WriteAsciiDirect to RespWriteUtils
* Update ASCII encoding logic to increment the curr pointer with the "actual" encoded byte count, just to be sure.
* Rename WriteBulkString to WriteAsciiBulkString
* Makes it clear what was the assumption for the input string
* Make all simple string encoding variants behave consistently
* Either they should all assume the input to be ASCII, or none of them, or there should be explicit overloads for "unsafe" assumptions.
* Remove intermediate array allocations from ASCII string encoding
* Now using the WriterDirectAscii
* Remove more intermediate array allocations from ASCII string encoding
* Removed unnecessary explicit ROSpan<byte> casts from arrays
* Remove more intermediate array allocations from ASCII string encoding
* Skip encoding of constant strings by making them u8-literals
* Remove more intermediate array allocations from ASCII string encoding
* Remove couple allocations from WriteSimpleString calls
* Remove more intermediate array allocations from ASCII string encoding
* Skip encoding of constant strings by making them u8-literals
* Fix merge
* Add WriteUtf8BulkString overload to RespWriteUtils
* Add simple regression test for unicode SET value
* and watch as tests hang
* Fix GarnetClient encoding of unicode bulk strings as keys or values
* Add missing newline to generic error response in ReplicaOfCommand
* Do not repeat the error message in PrimarySendCheckpoint
In this PR:
* Merge `WriteResponse` with `WriteDirect`. Same logic.
* Make all simple string encoding variants behave consistently: `CustomFunctions` called the underlying `WriteSimpleString` with the buffer length calculated using `Encoding.ASCII.GetByteCount`, while in `RespWriteUtils` the parameters were assumed to be well-formed ASCII. In my opinion, there are a couple of options here:
* […] the `Encoding.ASCII.GetByteCount` calculation (it's a tiny bit of overhead: a fast path doing a vectorized scan for any invalid ASCII and, depending on the result, falling back to the slow path)
* Rename `WriteBulkString` to `WriteAsciiBulkString` to make the underlying assumptions of the API clear to the caller.
* […] `Encoding.UTF8.GetBytes` (which can expand a `string` of `x` characters up to `3*(x+1)` bytes). So I also changed the logic to use `Encoding.ASCII.GetBytes`. If we want to format UTF-8 strings as bulk strings, we can add a `WriteUtf8BulkString` overload with correct length checks.
* Add a `WriteUtf8BulkString` overload to fix "Char that occupy multiple bytes in string as key or value would make GarnetClient.ExecuteForStringResultAsync() unable to continue execution" (#236).
* Remove intermediate array allocations from ASCII string encoding by using `WriteAsciiDirect` or `WriteAsciiBulkString`.
* Place constant strings in the `data` section using `u8`-literals to avoid repeated re-encoding.

Fixes #236