Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor string encoding to be consistent and more memory efficient #223

Merged

Conversation

PaulusParssinen
Copy link
Contributor

@PaulusParssinen PaulusParssinen commented Apr 2, 2024

In this PR:

  • Merged WriteResponse with WriteDirect. Same logic.
  • Made ASCII encoding consistent across the overloads.
    • More specifically, there were couple overloads in CustomFunctions that called underlying WriteSimpleString with the buffer length calculated using Encoding.ASCII.GetByteCount, while in RespWriteUtils the parameters were assumed to be well formed ASCII. In my opinion, there's couple options here;
      1. either all public methods that feed input into ASCII encoding should be changed to do the "proper" Encoding.ASCII.GetByteCount calculation (it's tiny bit overhead: a fast path doing vectorized scan for any invalid ASCII and depending on the output falling on slow path)
      2. keep current behaviour and trust the input
      3. offer two set of overloads, other maybe internal and explicitly documented to expect well formed ASCII
  • Renamed overloads such as WriteBulkString to WriteAsciiBulkString to make it clear to caller the underlying assumptions of the API.
    • The previous logic checked the available buffer space using the assumption that the input was well formed ASCII, and then formatted the input using Encoding.UTF8.GetBytes (which can expand string of x characters up to 3*(x+1) bytes). So I also changed the logic to use Encoding.ASCII.GetBytes. If we want to format UTF8 strings as bulk strings, we can add WriteUtf8BulkString overload with correct length checks.
  • Added WriteUtf8BulkString overload to fix Char that occupy multiple bytes in string as key or value would make GarnetClient.ExecuteForStringResultAsync() unable to continue execution. #236
    • note: This logic can be optimized by common pattern used by serializers where calculating maximum buffer sizes is often cheapr than calculating exact sizes. However because we need to encode the exact size as ASCII digits, there's isn't straightforward way to do that here. (as always, variable length elements are annoying in serialization).
  • Now thanks to earlier renaming, safely skip intermediate array allocations and instead encode & copy in WriteAsciiDirect or WriteAsciiBulkString
  • Also move more constant strings in .data section using the u8-literals to avoid repeated re-encoding

Fixes #236

* They're the samething aka "TryCopyTo" if it was BCL naming
* Update ASCII encoding logic to increment the curr pointer with the "actual" encoded byte count, just to be sure.
* Makes it clear what was the assumption for the input string
* Either they should all assume the input to be ASCII, or none of them, or there should be explicit overloads for "unsafe" assumptions.
* Removed unnecessary explicit ROSpan<byte> casts from arrays
* Skip encoding of constant strings by making them u8-literals
* Remove couple allocations from WriteSimpleString calls
* Skip encoding of constant strings by making them u8-literals
@PaulusParssinen PaulusParssinen marked this pull request as ready for review April 2, 2024 15:59
@TalZaccai TalZaccai requested a review from vazois April 2, 2024 18:43
@badrishc badrishc requested a review from TedHartMS April 3, 2024 21:56
@PaulusParssinen PaulusParssinen marked this pull request as ready for review April 4, 2024 17:42
@PaulusParssinen PaulusParssinen changed the title Reduce intermediate allocations from RESP writing Refactor bulk string encoding to be consistent and more memory efficient Apr 4, 2024
@PaulusParssinen PaulusParssinen changed the title Refactor bulk string encoding to be consistent and more memory efficient Refactor string encoding to be consistent and more memory efficient Apr 4, 2024
@badrishc badrishc merged commit f2ee39a into microsoft:main Apr 5, 2024
21 checks passed
altall pushed a commit to altall/garnet that referenced this pull request Apr 5, 2024
…icrosoft#223)

* Remove intermediate array allocations from ASCII decoding

* Update RespWriteUtils to take spans instead of arrays

* Let's not encourage allocations

* Merge WriteResponse with WriteDirect

* They're the samething aka "TryCopyTo" if it was BCL naming

* Remove intermediate array allocations from ASCII string encoding

* Add WriteAsciiDirect to RespWriteUtils

* Update ASCII encoding logic to increment the curr pointer with the "actual" encoded byte count, just to be sure.

* Rename WriteBulkString to WriteAsciiBulkString

* Makes it clear what was the assumption for the input string

* Make all simple string encoding variants behave consistently

* Either they should all assume the input to be ASCII, or none of them, or there should be explicit overloads for "unsafe" assumptions.

* Remove intermediate array allocations from ASCII string encoding

* Now using the WriterDirectAscii

* Remove more intermediate array allocations from ASCII string encoding

* Removed unnecessary explicit ROSpan<byte> casts from arrays

* Remove more intermediate array allocations from ASCII string encoding

* Skip encoding of constant strings by making them u8-literals

* Remove more intermediate array allocations from ASCII string encoding

* Remove couple allocations from WriteSimpleString calls

* Remove more intermediate array allocations from ASCII string encoding

* Skip encoding of constant strings by making them u8-literals

* Fix merge

* Add WriteUtf8BulkString overload to RespWriteUtils

* Add simple regression test for unicode SET value

* and watch as tests hang

* Fix GarnetClient encoding of unicode bulk strings as keys or values

* Add missing newline to generic error response in ReplicaOfCommand

* Do not repeat the error message in PrimarySendCheckpoint
@PaulusParssinen PaulusParssinen deleted the reduce-intermediate-str-allocations branch April 5, 2024 10:38
@github-actions github-actions bot locked and limited conversation to collaborators Jun 6, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
3 participants