Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault when ruby-protobuf decodes lots of messages with maps concurrently #3250

Closed
apolcyn opened this issue Jun 20, 2017 · 6 comments
Closed

Comments

@apolcyn
Copy link

apolcyn commented Jun 20, 2017

When looking into a grpc-ruby crash described in grpc/grpc#11331, it looks like the repro could be trimmed down to include only the ruby google/protobuf library.

What seems necessary to trigger the crash is:

  • large or lots of messages (garbage collector pressure needed?).
  • map types with any key type and a value of a Message type.
  • trying to decode these messages
  • at least some concurrency (it seems to need at least two threads to trigger).

The minimal repro seems to be:

repro.proto:

syntax = "proto3";

package repro;

message Outer {
        map<int32,Inner> items = 1;
}

message Inner {}

In the same directory, the repro.rb file looks like:

require_relative 'repro_pb'

thds = []

2.times do
  thds << Thread.new do
    o = Repro::Outer.new
    o.items[0] = Repro::Inner.new
    raw = Repro::Outer.encode(o)

    100000.times do
      Repro::Outer.decode(raw)
    end
  end
end

thds.each(&:join)

repro_pb.rb was generated with:

protoc -I. --ruby_out=. repro.proto

see generated repro_pb.rb

After generating, running ruby repro.rb should segfault during a decode call (I'm seeing it repro about 9/10 tries - increasing the number of calls to Repro::Outer.decode seems to cause it more frequently):

example stack trace

Environment details:

protoc --version gives libprotoc 3.3.0
(protoc built from head of grpc repo with make plugins, and used from the <grpc_repo>/bins/opt/protobuf/protoc)

gem list google-protobuf gives google-protobuf (3.3.0 x86_64-linux)

ruby platform: x86_64-linux
ruby version: 2.4.0

@haberman
Copy link
Member

Thanks so much for the detailed repro! This should make it not too difficult to get to the bottom of this.

@Simonot
Copy link

Simonot commented Jul 12, 2017

Hello @haberman ,

Is there any update on this issue ?

We are still struggling with few random segfault on our platform.

Also we have some issues when unmarshalling big map objects with concurrency.
We get strange values for a key in the ruby object, that does not match to the .proto binray event. When we replay the same .proto binary event, the value is correct. When we will have time we will try to make a repo to reproduce the mentioned issue, but it may be linked to this one. So we prefer waiting the updates on this topic first.

Regards

@tenderlove
Copy link
Contributor

Hi.

I think that the SEGV is due to the fact that the decoder is using a global variable to keep state. The comment says that it doesn't allow concurrent decodes, but I don't see any mechanisms that would actually prevent that. That means if there are threads, the array could be in an unknown state.

I rewrote the example a little to isolate the decoder:

require_relative 'repro_pb'

thds = []

o = Repro::Outer.new
o.items[0] = Repro::Inner.new
raw = Repro::Outer.encode(o)

thds = 2.times.map do
  Thread.new do
    100000.times do
      Repro::Outer.decode(raw)
    end
  end
end

thds.each(&:join)

This narrows the problem to the decoder.

I think #3560 will fix this. It fixes the repro case. I am just not sure if Map is the right place to put the stack. Regardless, the stack shouldn't be a global. Hope that helps!

tenderlove added a commit to tenderlove/protobuf that referenced this issue Aug 30, 2017
This makes the frame stack per-parser, and per-thread.  Fixes protocolbuffers#3250
@geigerj
Copy link

geigerj commented Sep 11, 2017

@apolcyn Can you make a minor release of gRPC so that we can use and verify the fix?

@apolcyn
Copy link
Author

apolcyn commented Sep 11, 2017

IIC, grpc shouldn't need a new release, since the fix for this is in google-protobuf and grpc isn't pinning the google-protobuf package?

@geigerj
Copy link

geigerj commented Sep 11, 2017

@apolcyn Oops, mistake on my part. Sorry!

@haberman Could we get a new release of google-protobuf?

TeBoring pushed a commit to TeBoring/protobuf that referenced this issue Sep 13, 2017
This makes the frame stack per-parser, and per-thread.  Fixes protocolbuffers#3250
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants