Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Commands can contain non-string values, but Redis server supports only strings. Consider enforcing this with the type system. #27

Open
jwoertink opened this issue Sep 19, 2022 · 5 comments

Comments

@jwoertink
Copy link
Contributor

While adding #26 I get this error from running the specs

ERR Protocol error: expected '$', got ':' (Redis::Error)
         from src/parser.cr:54:9 in 'read'
         from src/connection.cr:300:7 in 'read'
         from src/connection.cr:251:18 in 'run'
         from src/connection.cr:245:5 in 'run'
         from src/commands/hash.cr:39:5 in 'hincrby'
         from src/client.cr:30:7 in 'Redis::Client#hincrby<String, String, Int32>:(Array(Redis::Value) | Int64 | String | Nil)'
         from spec/redis_spec.cr:237:7 in '->'
         from /usr/share/crystal/src/spec/example.cr:45:13 in 'internal_run'
         from /usr/share/crystal/src/spec/example.cr:33:16 in 'run'
         from /usr/share/crystal/src/spec/context.cr:18:23 in 'internal_run'
         from /usr/share/crystal/src/spec/context.cr:339:7 in 'run'
         from /usr/share/crystal/src/spec/context.cr:18:23 in 'internal_run'
         from /usr/share/crystal/src/spec/context.cr:339:7 in 'run'
         from /usr/share/crystal/src/spec/context.cr:18:23 in 'internal_run'
         from /usr/share/crystal/src/spec/context.cr:156:7 in 'run'
         from /usr/share/crystal/src/spec/dsl.cr:220:7 in '->'
         from /usr/share/crystal/src/crystal/at_exit_handlers.cr:14:19 in 'run'
         from /usr/share/crystal/src/crystal/main.cr:50:5 in 'exit'
         from /usr/share/crystal/src/crystal/main.cr:45:5 in 'main'
         from /usr/share/crystal/src/crystal/main.cr:127:3 in 'main'
         from /lib/x86_64-linux-gnu/libc.so.6 in '??'
         from /lib/x86_64-linux-gnu/libc.so.6 in '__libc_start_main'
         from /home/jeremy/.cache/crystal/crystal-run-spec.tmp in '_start'
         from ???

@jgaskins
Copy link
Owner

I've been meaning to come up with a way to enforce this in the type system rather than at runtime such that encoding a command is meaningfully different from encoding an Enumerable containing arbitrary types (including other Enumerables) as an array over the wire.

In fact, I'm not actually sure if there's ever a reason for the client to encode arbitrary types by the client (everything I've looked at only uses strings), so maybe we remove support for encoding anything but Enumerable(String) and String over the wire?

struct Writer
  def encode(values : Enumerable(String))
    io << '*' << values.size << CRLF
    values.each { |part| encode part }
  end

  # :nodoc:
  def encode(string : String)
    io << '$' << string.bytesize << CRLF
    io << string << CRLF
  end
end

I'm a little uncertain of that idea, but in all the things I've been doing I haven't seen anything not use arrays of strings coming from the client. Maybe I should try monkey patching one of my apps to emit metrics if the other encode overloads are ever called. 😂

@jgaskins jgaskins changed the title Error expected $ got : Commands can contain non-string values, but Redis server supports only strings. Consider enforcing this with the type system. Sep 19, 2022
@z64
Copy link

z64 commented Apr 30, 2023

Hi,

It is perfectly legal to store binary data in Redis. It is one of the considerations in the protocol that this shard already uses: $ bulk strings are intended to store binary-safe strings that may include CRLF internally, as opposed to simple strings (+) that may also be binary data, but may not contain CRLF. This is safe because it is length-prefixed.

The following change will allow you to store Bytes:

--- a/src/commands.cr
+++ b/src/commands.cr
@@ -54,7 +54,7 @@ module Redis
     # sleep 1.second
     # redis.get("foo") # => nil
     # ```
-    def set(key : String, value : String, ex : (String | Int)? = nil, px : String | Int | Nil = nil, nx = false, xx = false, keepttl = false)
+    def set(key : String, value : String | Bytes, ex : (String | Int)? = nil, px : String | Int | Nil = nil, nx = false, xx = false, keepttl = false)
       command = {"set", key, value}
       command += {"ex", ex.to_s} if ex
       command += {"px", px.to_s} if px
--- a/src/writer.cr
+++ b/src/writer.cr
@@ -21,6 +21,13 @@ module Redis
       io << string << CRLF
     end

+    # :nodoc:
+    def encode(bytes : Bytes)
+      io << '$' << bytes.size << CRLF
+      io.write(bytes)
+      io << CRLF
+    end

Reading will of course still yield a crystal String, which will likely be (silently) invalid utf8. You can use to_slice to get bytes back.

Or, a "hacky" approach is something like:

--- a/src/connection.cr
+++ b/src/connection.cr
@@ -244,7 +244,7 @@ module Redis
         append:      Int64,
         decr:        Int64,
         decrby:      Int64,
-        get:         String?,
+        get:         String | Bytes?,
--- a/src/parser.cr
+++ b/src/parser.cr
@@ -45,7 +45,11 @@ module Redis
         if length >= 0
           value = @io.read_string length
           @io.skip 2 # Skip CRLF
-          value
+          if value.valid_encoding?
+            value
+          else
+            value.to_slice
+          end

Of course, it would be even better if you could tell the shard you want binary data back, and you can skip the extra processing of io.read_string. But I don't know how one would work that into the API as written.

@jgaskins
Copy link
Owner

jgaskins commented May 1, 2023

Indeed, and what I said above about strings being the only data type allowed in commands was misleading, I guess. That's technically Redis strings (this shard only ever emits RESP bulk strings, never simple strings) rather than Crystal's String type.

I actually store binary data in Redis with this shard pretty regularly (armature's Redis cache store serializes data into the cache using MessagePack). It's not technically correct with String because it's encoded as binary, but it does work completely transparently.

I'd love to add first-class support for Bytes to this shard. Adding it to the set command is easy enough, but I feel like add Bytes to the return types for get adds confusion. Feels like there needs to be something to distinguish it from the one that returns String | Nil. Do any of these seem reasonable?

  • get_bytes
  • get_binary
  • get(key, as: Bytes)

I don't know what makes the most sense here. I don't think I want to use the valid_encoding? trick to detect binary vs UTF-8 data, though. Automating it is great, but it operates in O(n) time, and while binary values will be plenty fast since it short-circuits when it detects an invalid UTF-8 codepoint, if large strings are being stored in Redis (for example, someone is serializing as JSON instead of MessagePack), it gets a lot slower. Here is a quick benchmark of running valid_encoding? on the same data structure serialized as JSON vs MessagePack:

   json  49.11k ( 20.36µs) (± 0.48%)  0.0B/op  1177.75× slower
msgpack  57.84M ( 17.29ns) (± 1.46%)  0.0B/op          fastest

Letting the caller tell us whether they want the data as a String vs Bytes should be fast enough for the general case to justify adding a whole new method for it.

@jwoertink
Copy link
Contributor Author

I didn't realize you can use msgpack with redis 🤯

I like those names though get_bytes or get(key, as: Bytes) I think makes sense.

@jgaskins
Copy link
Owner

jgaskins commented May 1, 2023

I posted PR #42 to discuss how to get binary data in and out of Redis without using improperly encoded Strings.

I didn't realize you can use msgpack with redis 🤯

Yep! I love it. 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants