Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bundle protocol parser #38 #39

Closed
wants to merge 4 commits into from
Closed

Bundle protocol parser #38 #39

wants to merge 4 commits into from

Conversation

jhthorsen
Copy link
Owner

@jhthorsen jhthorsen commented Feb 7, 2019

I'm not quite sure how stable this protocol parser is, but it looks a lot faster 馃槃

What do you think?

TEST_BENCHMARK=1000 TEST_ONLINE=redis://localhost prove -vl t/benchmark.t
t/benchmark.t ..
ok 1 - lrange Mojo::Redis::Protocol 2.55848 wallclock secs ( 2.39 usr +  0.05 sys =  2.44 CPU) @ 409.84/s (n=1000)
ok 2 - lrange Protocol::Redis 6.05703 wallclock secs ( 5.86 usr +  0.07 sys =  5.93 CPU) @ 168.63/s (n=1000)
ok 3 - lrange Protocol::Redis::XS 1.85635 wallclock secs ( 1.70 usr +  0.05 sys =  1.75 CPU) @ 571.43/s (n=1000)
# Cannot compare Redis/PP and Redis2/PP
                       Rate Protocol::Redis Mojo::Redis::Protocol Protocol::Redis::XS
Protocol::Redis       169/s              --                  -59%                -70%
Mojo::Redis::Protocol 410/s            143%                    --                -28%
Protocol::Redis::XS   571/s            239%                   39%                  --
1..3
ok
All tests successful.
Files=1, Tests=3, 11 wallclock secs ( 0.02 usr  0.01 sys + 10.23 cusr  0.19 csys = 10.45 CPU)
Result: PASS

Update 8. feb:

One of the big reasons why this is so much faster is that we don't have to loop over the data twice:

sub _parse_message_cb {

_on_message_cb() is not in use at all when using Mojo::Redis::Protocol, which again saves at least one method call.

In addition, there's a lot less state and method calls inside the actual parser.

Protocol::Redis       167/s              --                  -60%                -71%
Mojo::Redis::Protocol 422/s            153%                    --                -26%
Protocol::Redis::XS   571/s            243%                   35%                  --

@jhthorsen jhthorsen added the enhancement New feature or request label Feb 7, 2019
@jhthorsen jhthorsen self-assigned this Feb 7, 2019
@jhthorsen jhthorsen requested a review from Grinnz February 7, 2019 13:37
@kraih
Copy link

kraih commented Feb 7, 2019

Those results certainly look encouraging.

lib/Mojo/Redis/Protocol.pm Outdated Show resolved Hide resolved
@Grinnz
Copy link
Collaborator

Grinnz commented Feb 7, 2019

Would it be easier to just leave the encoding part out of the protocol parser since it only applies to bulk-string replies anyway?

my ($protocol, $msg) = @_;
my $p = shift @{$self->{waiting} || []};

if ($msg->{type} eq '-') {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't emit an error if there is an error contained within a multi-bulk reply somewhere; it seems like the current unwrapping method does this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure the best way to handle it, should a sub-error (like in ->exec response) cause the whole command to error? But we have no way to differentiate errors within the returned arrayref from strings.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this work for array types? Doesn't it need to change the data from hashrefs to their contents (and so on if it contains an array...)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to write a test for error inside arrays. Do you know how to provoke such an error? If so, then I think I want to return it as a single error, since that's what I support in Connection.pm:

return $p ? $p->reject($err) : $self->emit(error => $err) unless $res;

my $type = $ref eq 'ARRAY' ? '*' : !$ref ? '$' : $message->{type};

if ($type eq '+' or $type eq '-' or $type eq ':') {
$data = $encoder->encode($data, 0) if $encoder;
Copy link
Collaborator

@Grinnz Grinnz Feb 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of these types should be encoded, because they are not binary safe.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of specifying 0 for encode and decode here?

Copy link
Collaborator

@Grinnz Grinnz Feb 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that if you specify any second argument for encode or decode, you should copy the string
(like https://metacpan.org/source/SRI/Mojolicious-8.12/lib/Mojo/Util.pm#L134) otherwise Encode will modify it in place.

Copy link
Owner Author

@jhthorsen jhthorsen Feb 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of these types should be encoded, because they are not binary safe.

What does "binary safe" mean? What will be the consequence of doing what I do here? Why should we not encode "+", but encode "$"?

What is the purpose of specifying 0 for encode and decode here?

To make sure it returns the value instead of replacing in place.

Note that if you specify any second argument for encode or decode, you should copy the string

I don't agree. I think that is only correct for the functions, not the methods. https://metacpan.org/pod/Encode::Encoding#-%3Eencode($string-[,$check])

Copy link
Collaborator

@Grinnz Grinnz Feb 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Binary safe means that bulk strings can contain any byte. Namely, simple strings, errors, and integers cannot contain the bytes \r\n, and there is no guarantee that encoding values in some encoding will result in this. I added a test for this.

The encode specification is a mess. What you linked is the documentation for how Encode encoders maybe should work. But it's not how they work.

}
elsif ($type eq '$') {
if (defined $data) {
$data = $encoder->encode($data, 0) if $encoder and $type ne ':';
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already only for $ type so the type check here is extraneous

$curr->{data} += 0;
}
elsif ($encoder ||= $self->{encoder}) {
$curr->{data} = $encoder->decode($curr->{data}, 0);
Copy link
Collaborator

@Grinnz Grinnz Feb 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise these types should not be decoded, they will always be ascii bytes.

@@ -46,8 +46,12 @@ sub write_q {

sub _encode {
my $self = shift;
my $protocol = $self->protocol;

return $protocol->encoding($self->encoding)->encode(\@_) if $protocol->can('encoding');
Copy link
Collaborator

@Grinnz Grinnz Feb 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is too fragile, it should check ->isa('Mojo::Redis::Protocol') specifically because the encode method has different behavior, if someone makes a subclass of Protocol::Redis that accidentally has a sub named encoding it would be bad.


Scalar::Util::weaken($self);

if ($self->protocol->can('encoding')) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here for ->isa check

@Grinnz
Copy link
Collaborator

Grinnz commented Feb 8, 2019

The protocol spec (https://redis.io/topics/protocol) unfortunately does not make any recommendation as to how to handle multi-bulk replies that contain errors, but it does include an example of one.

This issue also discusses a similar problem. redis/node-redis#689

My opinion is that it should not cause an error condition, because there are still potentially other responses that the client needs to have returned (redis does not roll back successes). However, that would require some way to distinguish an error response within a returned arrayref from a regular string.

@Grinnz
Copy link
Collaborator

Grinnz commented Feb 8, 2019

Perhaps the embedded error response could be turned into an exception object like Mojo::Exception?

@Grinnz
Copy link
Collaborator

Grinnz commented Feb 8, 2019

Here is a simple way to get an error in multi-bulk response:

MULTI
SET foo bar
LPOP foo
EXEC

The response to the EXEC in the redis-cli is:

1) OK
2) (error) WRONGTYPE Operation against a key holding the wrong kind of value

And the foo key remains successfully created.

last CHUNK if length($$buf) - 2 < $self->{len}; # Wait for more data

$curr->{data} = $self->{len} < 0 ? undef : substr $$buf, 2, $self->{len}, '';
$curr->{data} = $encoder->decode($curr->{data}, 0) if $encoder ||= $self->{encoder};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a defined check so it does not try to decode undef

if ($curr->{type} eq ':') {
$curr->{data} += 0;
}
elsif ($encoder ||= $self->{encoder}) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the $encoder variable really does anything useful in this sub

last CHUNK if $self->{stop} < 0; # Wait for more data

$self->{len} = substr $$buf, 0, $self->{stop}, '';
last CHUNK if length($$buf) - 2 < $self->{len}; # Wait for more data
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be - 4, because the \r\n before and after the string data are both still there

last CHUNK if $self->{stop} < 0; # Wait for more data

$curr->{len} = substr $$buf, 0, $self->{stop}, '';
last CHUNK if length($$buf) - 2 < $curr->{len}; # Wait for more data
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this line should be here. The size of the array does not indicate anything about how much data is required.

print $protocol->encode({type => "-", data => "foo"});
print $protocol->encode(["foo", "bar", "baz"]);

while (my $msg = $protocol->parse($bytes)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This synopsis is wrong, since it will keep passing the same $bytes into the parser forever...


$protocol = Mojo::Redis::Protocol->new;

=head2 parse
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also needs to be updated to match the current API (returns a list of messages)

@Grinnz
Copy link
Collaborator

Grinnz commented Feb 8, 2019

I wrote a test file based on Protocol::Redis::Test plus some protocol error and encoding tests, it does not currently pass mostly because messages are all coming back with a 'level' key, and it blows up on the first split bulk string message. https://gist.github.com/Grinnz/7b5322e8dcf9b50e8ae2ac7cab0c6709

@jhthorsen
Copy link
Owner Author

Sorry for wasting your time @Grinnz, but there seems to be too many pitfalls for me to finish this, so I've converted most of the code over to Protocol::Redis instead:

und3f/protocol-redis#8

I might revisit later on if I find the time. (I don't mind anyone else to finish it)

@jhthorsen jhthorsen closed this Feb 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants