-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bundle protocol parser #38 #39
Conversation
Those results certainly look encouraging. |
Would it be easier to just leave the encoding part out of the protocol parser since it only applies to bulk-string replies anyway? |
lib/Mojo/Redis/Connection.pm
Outdated
my ($protocol, $msg) = @_; | ||
my $p = shift @{$self->{waiting} || []}; | ||
|
||
if ($msg->{type} eq '-') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't emit an error if there is an error contained within a multi-bulk reply somewhere; it seems like the current unwrapping method does this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure the best way to handle it, should a sub-error (like in ->exec response) cause the whole command to error? But we have no way to differentiate errors within the returned arrayref from strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this work for array types? Doesn't it need to change the data from hashrefs to their contents (and so on if it contains an array...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how to write a test for error inside arrays. Do you know how to provoke such an error? If so, then I think I want to return it as a single error, since that's what I support in Connection.pm:
mojo-redis/lib/Mojo/Redis/Connection.pm
Line 216 in 3901fef
return $p ? $p->reject($err) : $self->emit(error => $err) unless $res; |
6f803fb
to
d764919
Compare
my $type = $ref eq 'ARRAY' ? '*' : !$ref ? '$' : $message->{type}; | ||
|
||
if ($type eq '+' or $type eq '-' or $type eq ':') { | ||
$data = $encoder->encode($data, 0) if $encoder; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of these types should be encoded, because they are not binary safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of specifying 0 for encode and decode here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that if you specify any second argument for encode or decode, you should copy the string
(like https://metacpan.org/source/SRI/Mojolicious-8.12/lib/Mojo/Util.pm#L134) otherwise Encode will modify it in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of these types should be encoded, because they are not binary safe.
What does "binary safe" mean? What will be the consequence of doing what I do here? Why should we not encode "+", but encode "$"?
What is the purpose of specifying 0 for encode and decode here?
To make sure it returns the value instead of replacing in place.
Note that if you specify any second argument for encode or decode, you should copy the string
I don't agree. I think that is only correct for the functions, not the methods. https://metacpan.org/pod/Encode::Encoding#-%3Eencode($string-[,$check])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Binary safe means that bulk strings can contain any byte. Namely, simple strings, errors, and integers cannot contain the bytes \r\n
, and there is no guarantee that encoding values in some encoding will result in this. I added a test for this.
The encode specification is a mess. What you linked is the documentation for how Encode encoders maybe should work. But it's not how they work.
} | ||
elsif ($type eq '$') { | ||
if (defined $data) { | ||
$data = $encoder->encode($data, 0) if $encoder and $type ne ':'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already only for $ type so the type check here is extraneous
$curr->{data} += 0; | ||
} | ||
elsif ($encoder ||= $self->{encoder}) { | ||
$curr->{data} = $encoder->decode($curr->{data}, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise these types should not be decoded, they will always be ascii bytes.
@@ -46,8 +46,12 @@ sub write_q { | |||
|
|||
sub _encode { | |||
my $self = shift; | |||
my $protocol = $self->protocol; | |||
|
|||
return $protocol->encoding($self->encoding)->encode(\@_) if $protocol->can('encoding'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is too fragile, it should check ->isa('Mojo::Redis::Protocol') specifically because the encode method has different behavior, if someone makes a subclass of Protocol::Redis that accidentally has a sub named encoding it would be bad.
|
||
Scalar::Util::weaken($self); | ||
|
||
if ($self->protocol->can('encoding')) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here for ->isa check
The protocol spec (https://redis.io/topics/protocol) unfortunately does not make any recommendation as to how to handle multi-bulk replies that contain errors, but it does include an example of one. This issue also discusses a similar problem. redis/node-redis#689 My opinion is that it should not cause an error condition, because there are still potentially other responses that the client needs to have returned (redis does not roll back successes). However, that would require some way to distinguish an error response within a returned arrayref from a regular string. |
Perhaps the embedded error response could be turned into an exception object like Mojo::Exception? |
Here is a simple way to get an error in multi-bulk response:
The response to the EXEC in the redis-cli is:
And the foo key remains successfully created. |
last CHUNK if length($$buf) - 2 < $self->{len}; # Wait for more data | ||
|
||
$curr->{data} = $self->{len} < 0 ? undef : substr $$buf, 2, $self->{len}, ''; | ||
$curr->{data} = $encoder->decode($curr->{data}, 0) if $encoder ||= $self->{encoder}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a defined check so it does not try to decode undef
if ($curr->{type} eq ':') { | ||
$curr->{data} += 0; | ||
} | ||
elsif ($encoder ||= $self->{encoder}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the $encoder variable really does anything useful in this sub
last CHUNK if $self->{stop} < 0; # Wait for more data | ||
|
||
$self->{len} = substr $$buf, 0, $self->{stop}, ''; | ||
last CHUNK if length($$buf) - 2 < $self->{len}; # Wait for more data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be - 4, because the \r\n before and after the string data are both still there
last CHUNK if $self->{stop} < 0; # Wait for more data | ||
|
||
$curr->{len} = substr $$buf, 0, $self->{stop}, ''; | ||
last CHUNK if length($$buf) - 2 < $curr->{len}; # Wait for more data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this line should be here. The size of the array does not indicate anything about how much data is required.
print $protocol->encode({type => "-", data => "foo"}); | ||
print $protocol->encode(["foo", "bar", "baz"]); | ||
|
||
while (my $msg = $protocol->parse($bytes)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This synopsis is wrong, since it will keep passing the same $bytes into the parser forever...
|
||
$protocol = Mojo::Redis::Protocol->new; | ||
|
||
=head2 parse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also needs to be updated to match the current API (returns a list of messages)
I wrote a test file based on Protocol::Redis::Test plus some protocol error and encoding tests, it does not currently pass mostly because messages are all coming back with a 'level' key, and it blows up on the first split bulk string message. https://gist.github.com/Grinnz/7b5322e8dcf9b50e8ae2ac7cab0c6709 |
Sorry for wasting your time @Grinnz, but there seems to be too many pitfalls for me to finish this, so I've converted most of the code over to Protocol::Redis instead: I might revisit later on if I find the time. (I don't mind anyone else to finish it) |
I'm not quite sure how stable this protocol parser is, but it looks a lot faster 馃槃
What do you think?
Update 8. feb:
One of the big reasons why this is so much faster is that we don't have to loop over the data twice:
mojo-redis/lib/Mojo/Redis/Connection.pm
Line 178 in 3901fef
_on_message_cb()
is not in use at all when using Mojo::Redis::Protocol, which again saves at least one method call.In addition, there's a lot less state and method calls inside the actual parser.