libruby.so.2.4 dependency issue with google-protobuf ruby gem 3.5.1.1 #4210

Closed
qingling128 opened this issue Jan 22, 2018 · 27 comments · Fixed by grpc/grpc#14634

@qingling128

When building a package with google-protobuf-3.5.1.1 on a Debian 8 machine, we ran into this dependency error.

    --> /opt/google-fluentd/embedded/lib/ruby/gems/2.4.0/gems/google-protobuf-3.5.1.1-x86_64-linux/lib/google/protobuf_c.so
    DEPENDS ON: libruby.so.2.4
      COUNT: 1
      PROVIDED BY: not found
      FAILED BECAUSE: Unresolved dependency

            [HealthCheck] I | 2018-01-22T19:20:04+00:00 | Health check time: 5.4719s
The health check failed! Please see above for important information.


/usr/local/rvm/gems/ruby-2.4.1/gems/omnibus-5.5.0/lib/omnibus/health_check.rb:339:in `block in run!'
  /usr/local/rvm/gems/ruby-2.4.1/gems/omnibus-5.5.0/lib/omnibus/instrumentation.rb:23:in `measure'
  /usr/local/rvm/gems/ruby-2.4.1/gems/omnibus-5.5.0/lib/omnibus/health_check.rb:239:in `run!'
  /usr/local/rvm/gems/ruby-2.4.1/gems/omnibus-5.5.0/lib/omnibus/health_check.rb:207:in `run!'
  /usr/local/rvm/gems/ruby-2.4.1/gems/omnibus-5.5.0/lib/omnibus/project.rb:1083:in `build'
  /usr/local/rvm/gems/ruby-2.4.1/gems/omnibus-5.5.0/lib/omnibus/cli.rb:84:in `build'
  /usr/local/rvm/gems/ruby-2.4.1/gems/thor-0.19.1/lib/thor/command.rb:27:in `run'
  /usr/local/rvm/gems/ruby-2.4.1/gems/thor-0.19.1/lib/thor/invocation.rb:126:in `invoke_command'
  /usr/local/rvm/gems/ruby-2.4.1/gems/thor-0.19.1/lib/thor.rb:359:in `dispatch'
  /usr/local/rvm/gems/ruby-2.4.1/gems/omnibus-5.5.0/lib/omnibus/cli/base.rb:33:in `dispatch'
  /usr/local/rvm/gems/ruby-2.4.1/gems/thor-0.19.1/lib/thor/base.rb:440:in `start'
  /usr/local/rvm/gems/ruby-2.4.1/gems/omnibus-5.5.0/lib/omnibus/cli.rb:42:in `execute!'
  /usr/local/rvm/gems/ruby-2.4.1/gems/omnibus-5.5.0/bin/omnibus:16:in `<top (required)>'
  bin/omnibus:29:in `load'
  bin/omnibus:29:in `<main>'

The issue is resolved after we pinned google-protobuf to 3.5.1 (GoogleCloudPlatform/google-fluentd#63).

@acozzette
Member

Is this the same issue as #4235? If so, it should be fixed with the 3.5.1.2 gem.

@qingling128
Author

Sounds like it. Let me try it out.

@qingling128
Author

Hmm... Seems like the same issue persists for 3.5.1.2:

            [HealthCheck] E | 2018-01-31T23:01:52+00:00 | The following libraries have unsafe or unmet dependencies:
    --> /opt/google-fluentd/embedded/lib/ruby/gems/2.4.0/gems/google-protobuf-3.5.1.2-x86_64-linux/lib/google/protobuf_c.so

            [HealthCheck] E | 2018-01-31T23:01:52+00:00 | The following binaries have unsafe or unmet dependencies:

            [HealthCheck] E | 2018-01-31T23:01:52+00:00 | The following requirements could not be resolved:
    --> libruby.so.2.4

            [HealthCheck] E | 2018-01-31T23:01:52+00:00 | The precise failures were:
    --> /opt/google-fluentd/embedded/lib/ruby/gems/2.4.0/gems/google-protobuf-3.5.1.2-x86_64-linux/lib/google/protobuf_c.so
    DEPENDS ON: libruby.so.2.4
      COUNT: 1
      PROVIDED BY: not found
      FAILED BECAUSE: Unresolved dependency

            [HealthCheck] I | 2018-01-31T23:01:52+00:00 | Health check time: 3.8094s
The health check failed! Please see above for important information.

/usr/local/rvm/gems/ruby-2.4.1/gems/omnibus-5.5.0/lib/omnibus/health_check.rb:339:in `block in run!'
  /usr/local/rvm/gems/ruby-2.4.1/gems/omnibus-5.5.0/lib/omnibus/instrumentation.rb:23:in `measure'
  /usr/local/rvm/gems/ruby-2.4.1/gems/omnibus-5.5.0/lib/omnibus/health_check.rb:239:in `run!'
  /usr/local/rvm/gems/ruby-2.4.1/gems/omnibus-5.5.0/lib/omnibus/health_check.rb:207:in `run!'
  /usr/local/rvm/gems/ruby-2.4.1/gems/omnibus-5.5.0/lib/omnibus/project.rb:1083:in `build'
  /usr/local/rvm/gems/ruby-2.4.1/gems/omnibus-5.5.0/lib/omnibus/cli.rb:84:in `build'
  /usr/local/rvm/gems/ruby-2.4.1/gems/thor-0.19.1/lib/thor/command.rb:27:in `run'
  /usr/local/rvm/gems/ruby-2.4.1/gems/thor-0.19.1/lib/thor/invocation.rb:126:in `invoke_command'
  /usr/local/rvm/gems/ruby-2.4.1/gems/thor-0.19.1/lib/thor.rb:359:in `dispatch'
  /usr/local/rvm/gems/ruby-2.4.1/gems/omnibus-5.5.0/lib/omnibus/cli/base.rb:33:in `dispatch'
  /usr/local/rvm/gems/ruby-2.4.1/gems/thor-0.19.1/lib/thor/base.rb:440:in `start'
  /usr/local/rvm/gems/ruby-2.4.1/gems/omnibus-5.5.0/lib/omnibus/cli.rb:42:in `execute!'
  /usr/local/rvm/gems/ruby-2.4.1/gems/omnibus-5.5.0/bin/omnibus:16:in `<top (required)>'
  bin/omnibus:29:in `load'
  bin/omnibus:29:in `<main>'

@qingling128
Author

BTW, we are using Ruby 2.4.

@liujisi
Contributor

liujisi commented Jan 31, 2018

We used the Ruby build environment from grpc to release the gems. The difference between 3.5.1 and 3.5.1.1 is newly added support for the Ruby 2.5 release. I don't know whether anything changed for 2.4. Adding grpc folks to take a look: @apolcyn and @nicolasnoble

@apolcyn

apolcyn commented Feb 2, 2018

The thing from the log of the OP that sticks out the most is:

/opt/google-fluentd/embedded/lib/ruby/gems/2.4.0/gems/google-protobuf-3.5.1.1-x86_64-linux/lib/google/protobuf_c.so

... the directory is not the platform specific

.../google/2.4/protobuf_c.so

but is instead trying to load the fallback binary normally expected to be used on source builds.

The grpc docker file updates that added ruby 2.5 support do make a change to that "fallback binary" in that it now depends on the libruby.so.2.4, but it's strange that it's getting used at all (I'm not actually certain why it's present in the binary grpc package).

@liujisi
Contributor

liujisi commented Feb 2, 2018

I did another check. It seems only the x86_64-linux gem contains a protobuf_c.so outside of the platform-specific directories.

├── lib
│   └── google
│       ├── 2.0
│       │   └── protobuf_c.so
│       ├── 2.1
│       │   └── protobuf_c.so
│       ├── 2.2
│       │   └── protobuf_c.so
│       ├── 2.3
│       │   └── protobuf_c.so
│       ├── 2.4
│       │   └── protobuf_c.so
│       ├── 2.5
│       │   └── protobuf_c.so
│       ├── protobuf
│       │   ├── any_pb.rb
│       │   ├── api_pb.rb
│       │   ├── duration_pb.rb
│       │   ├── empty_pb.rb
│       │   ├── field_mask_pb.rb
│       │   ├── message_exts.rb
│       │   ├── repeated_field.rb
│       │   ├── source_context_pb.rb
│       │   ├── struct_pb.rb
│       │   ├── timestamp_pb.rb
│       │   ├── type_pb.rb
│       │   ├── well_known_types.rb
│       │   └── wrappers_pb.rb
│       ├── protobuf_c.so
│       └── protobuf.rb

The google/protobuf_c.so doesn't exist in all other gems. Any idea why that would happen?
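
To make the layout above concrete, here is a minimal Ruby sketch that separates the per-version copies of protobuf_c.so from the top-level fallback copy. The helper name classify_extensions and the sample path list are assumptions made for illustration; they are not part of the gem or of rake-compiler.

```ruby
# Hypothetical helper (not part of google-protobuf): split a gem's file list
# into per-Ruby-version extension copies and the top-level fallback copy.
def classify_extensions(paths)
  versioned_pattern = %r{\Alib/google/(\d+\.\d+)/protobuf_c\.so\z}
  fallback_path = "lib/google/protobuf_c.so"

  result = { versioned: {}, fallback: nil }
  paths.each do |path|
    if (match = versioned_pattern.match(path))
      # Key the entry by the Ruby minor version taken from the directory name.
      result[:versioned][match[1]] = path
    elsif path == fallback_path
      result[:fallback] = path
    end
  end
  result
end

# Sample paths taken from the tree listing above (abbreviated).
paths = %w[
  lib/google/2.4/protobuf_c.so
  lib/google/2.5/protobuf_c.so
  lib/google/protobuf_c.so
  lib/google/protobuf.rb
]
layout = classify_extensions(paths)
puts "versioned: #{layout[:versioned].keys.join(', ')}"
puts "fallback:  #{layout[:fallback].inspect}"
```

A gem whose file list yields a non-nil fallback entry is one that ships the extra top-level binary.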

@apolcyn

apolcyn commented Feb 2, 2018

The google/protobuf_c.so doesn't exist in all other gems. Any idea why that would happen?

Reading through rake-compiler's extensiontask.rb to find the root of this, it appears to be the intended behavior of rake-compiler when running rake cross native gem. When rake cross native gem is run with RUBY_CC_VERSIONS set, rake-compiler adds one extra task that compiles and links the gem's C extension against the current Ruby version and current Ruby platform, on top of all of the cross-compiled C extension libraries built against the rubies under ~/.rake-compiler/ and specified by the extension task config and RUBY_CC_VERSIONS. As I understand it, rake-compiler by default links against libruby.so.<version> (as it does for the native gem), but links against Ruby statically when cross-compiling. That explains why the dependency on libruby.so.2.4 exists for the fallback shared object but not for the primary one under the 2.4 directory, and also why the fallback shared object exists only for the x86_64-linux platform (since that is the platform of the Ruby invoking rake cross native gem).

What I'm still wondering though is why the 2.4/protobuf_c.so didn't get loaded correctly in the first place.

@liujisi
Contributor

liujisi commented Feb 2, 2018

Thanks for the investigation. I'm wondering why this wasn't the case before we added the 2.5 support. Is there any way to work around the issue? For example, force running the cross-compile docker even if the platform is the same, or statically link libruby without cross-compiling?

Also, I checked the fallback protobuf_c.so, on the machine I built the gem:

ldd google-protobuf-3.5.1.2-x86_64-linux/lib/google/protobuf_c.so 
	linux-vdso.so.1 (0x00007ffd511d0000)
	libruby.so.2.4 => not found
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f6c0d651000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f6c0d34d000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6c0cfae000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f6c0dace000)

The libruby.so.2.4 is also not found. There must be something wrong with the rake-compiler configuration.
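
As a rough illustration of reading that ldd output programmatically, the following Ruby sketch extracts the libraries reported as "not found". The helper name unresolved_libs is hypothetical, and the sample text is an abbreviated copy of the output above; nothing here calls ldd itself.

```ruby
# Hypothetical helper: return the names of shared libraries that a block of
# `ldd` output reports as "not found".
def unresolved_libs(ldd_output)
  # Each unresolved entry has the form "<name> => not found".
  ldd_output.scan(/^[ \t]*(\S+)[ \t]+=>[ \t]+not found/).flatten
end

sample = <<~LDD
  linux-vdso.so.1 (0x00007ffd511d0000)
  libruby.so.2.4 => not found
  libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6c0cfae000)
LDD

p unresolved_libs(sample)
```

This is roughly the condition the omnibus health check reports as an unresolved dependency, although the sketch makes no claim about how omnibus implements it.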

@liujisi
Contributor

liujisi commented Feb 2, 2018

Hmm, my installed Ruby version was 2.3. But this should not matter, as we use the docker environment. Also, the gem can be installed and tested successfully on my machine, as I'm probably hitting the platform-specific extension.

@apolcyn

apolcyn commented Feb 2, 2018

The version of Ruby installed on your machine shouldn't make a difference, since the Ruby installation installed with rvm in the docker file is what we're using to compile and link that "fallback" shared object. I suspect that there is some difference between the installed Ruby version in the new vs. old docker file.

Still though, the real bug here is the fact that the fallback binary is getting used at all. When a platform-specific gem package is installed that contains pre-built binaries, we should always be using the protobuf_c.so that's under the 2.4 directory (or a different 2.x directory if we're running a different version of ruby). The fact that lib/google/protobuf_c.so is attempted at all shows that the shared object under the 2.4 directory failed to load for some reason.
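
For reference, a minimal Ruby sketch of how a running interpreter's version maps to the 2.x directory mentioned above. The substitution is the same one the gem's lib/google/protobuf.rb applies to RUBY_VERSION; the helper name extension_require_path is an assumption made for illustration.

```ruby
# Hypothetical helper: build the require path of the per-version extension
# by dropping the patch level from a Ruby version string.
def extension_require_path(ruby_version = RUBY_VERSION)
  "google/#{ruby_version.sub(/\.\d+$/, '')}/protobuf_c"
end

puts extension_require_path("2.4.1")
puts extension_require_path("2.5.0")
```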

@liujisi
Contributor

liujisi commented Feb 2, 2018

@blowmage Did you see any issues using the ruby2.4 native gem?

@blowmage
Contributor

blowmage commented Feb 2, 2018

@pherl No, we don't see any issues running the native gem (x86_64-linux) using Ruby 2.4. Here is the latest CI build using that combination:

https://circleci.com/gh/GoogleCloudPlatform/google-cloud-ruby/3187#tests/containers/0

@apolcyn

apolcyn commented Feb 2, 2018

Does anyone have a tip for how to reproduce this?

I am able to load google-protobuf without issues when running in a debian:jessie based dockerfile and using ruby 2.4 installed with rvm.

I am also able to require google/protobuf when version 3.5.1.1 is installed within the google-fluentd package: /opt/google-fluentd/embedded/lib/ruby/gems/2.4.0/gems/google-protobuf-3.5.1.1-x86_64-linux/lib/google/2.4/protobuf_c.so

On Debian 9, I installed the fluentd logging agent via

curl -sSO https://dl.google.com/cloudagents/install-logging-agent.sh
sudo bash install-logging-agent.sh

and then did:

sudo /opt/google-fluentd/embedded/bin/gem uninstall -I google-protobuf
sudo /opt/google-fluentd/embedded/bin/gem uninstall/install google-protobuf --version 3.5.1.1

... but it still works. I haven't yet tried running the logging agent with protobuf 3.5.1.1 on a debian 8 machine though. @qingling128 do you know if debian 8 is the only distribution that this failure happens on?


Update: I'm also seeing /usr/bin/google-fluentd run without problems when using protobuf 3.5.1.1 on Debian 8.

@apolcyn

apolcyn commented Feb 3, 2018

Actually, @qingling128, could you try the following: in your folder, change the following:

In lib/google/protobuf.rb, comment out the following lines this way:

if RUBY_PLATFORM == "java"
  require 'json'
  require 'google/protobuf_java'
else
#  begin
    require "google/#{RUBY_VERSION.sub(/\.\d+$/, '')}/protobuf_c"
#  rescue LoadError
#    require 'google/protobuf_c'
#  end
end

And then try to reload the extension. This will give us the actual error message that's being problematic here.
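
An alternative to commenting out the rescue is to keep the fallback but record why the first require failed. The sketch below is only an illustration of that idea: require_with_fallback is a hypothetical helper, and the two library names in the demo are placeholders (a path that does not exist, and the standard json library) rather than the real extension paths.

```ruby
# Hypothetical helper: try `primary`, fall back to `fallback`, and report
# which one loaded along with the error message from the first attempt.
def require_with_fallback(primary, fallback)
  require primary
  { loaded: primary, primary_error: nil }
rescue LoadError => e
  # The message that a bare `rescue LoadError` would otherwise swallow.
  require fallback
  { loaded: fallback, primary_error: e.message }
end

# Placeholder names: the first path is assumed not to exist on the load path.
result = require_with_fallback("google/0.0/no_such_extension", "json")
puts "loaded: #{result[:loaded]}"
puts "first attempt failed with: #{result[:primary_error]}"
```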

@qingling128
Author

qingling128 commented Feb 5, 2018

@apolcyn - Thanks for taking a look!

I've tried commenting out the section above. The bundler tool we use automatically re-pulls the gem when building the package, though. I'll poke around to see if I can disable the auto pull.

As for the reproduction, the issue comes up when we are building the package. The steps are:

  1. On any devbox / desktop
# Clone the repo.
git clone https://github.com/GoogleCloudPlatform/google-fluentd.git

# Switch to the lingshi-removepin branch.
cd google-fluentd && git checkout lingshi-removepin

# Zip the repo.
cd .. && tar --exclude=google-fluentd/.git -czvf google-fluentd.tar.gz google-fluentd

# SCP it onto a newly created Debian8 GCE VM.
gcloud compute scp google-fluentd.tar.gz <some_vm_name>:/tmp
  2. SSH into the VM and do the following.
# Become root.
sudo -s

# Install Ruby and bundler.
gpg --keyserver hkp://keys.gnupg.net --recv-keys 409B6B1796C275462A1703113804BB82D39DC0E3
curl -sSL https://get.rvm.io | bash -s stable
source /etc/profile.d/rvm.sh
rvm install 2.4
rvm use 2.4
gem install bundler
gem update

# Install dependencies
apt-get update
apt-get -y install git make g++ autoconf bzip2 fakeroot

# Untar the google-fluentd source and change to that directory.
cd /tmp
tar zxf google-fluentd.tar.gz
cd google-fluentd

# Build the package
bundle install --binstubs

bin/gem_downloader core_gems.rb
bin/gem_downloader plugin_gems.rb

mkdir -p /opt/google-fluentd /var/cache/omnibus
chown root /opt/google-fluentd
chown root /var/cache/omnibus

bin/omnibus build google-fluentd # This step is where it fails.

@apolcyn

apolcyn commented Feb 6, 2018

@qingling128 thanks for the repro script. Unfortunately it is not producing the error for me though.

I have run exactly the commands above, using a fresh Debian 8 GCE machine, and bin/omnibus build google-fluentd exits cleanly with return code 0.

Any ideas?


update: FYI I tweaked this line in core_gems.rb to be download "google-protobuf", "3.5.1.1", in my repro attempt.

@qingling128
Author

@apolcyn - Oops, the instruction of switching the branch in step 1 should be git checkout lingshi-removepin instead of git checkout -b lingshi-removepin. (I just updated the previous comment).

Updating the line to download "google-protobuf", "3.5.1.1" should have the same effect though. I was still able to reproduce this morning. Let me double check.

@qingling128
Author

@apolcyn - Turned out that when I specified 3.5.1.1 explicitly, the build passed. But when I excluded the google-protobuf line (I ran this on a brand new VM instead of one that had a passed build already) and expected it to be pulled as a dependency, the build failed. The dependency chain is:

  • google-cloud-logging depends on google-gax.
  • google-gax depends on google-protobuf and grpc.
  • grpc also depends on google-protobuf.

It was not required to explicitly specify this dependency when we were using Ruby 2.2, though. The PR is here.
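
The dependency chain above can be sketched as a small graph walk, which shows that google-protobuf is reached only transitively from google-cloud-logging. The graph contains just the edges listed in this comment; the helper name transitive_deps is an assumption made for illustration, and this is not how Bundler or RubyGems resolves dependencies.

```ruby
# Hypothetical helper: return every gem reachable from `root` in a
# name => [dependencies] graph, excluding `root` itself.
def transitive_deps(graph, root)
  seen = []
  stack = graph.fetch(root, []).dup
  until stack.empty?
    gem_name = stack.pop
    next if seen.include?(gem_name)
    seen << gem_name
    stack.concat(graph.fetch(gem_name, []))
  end
  seen.sort
end

# Edges as described in the comment above (names only, no versions).
graph = {
  "google-cloud-logging" => ["google-gax"],
  "google-gax"           => ["google-protobuf", "grpc"],
  "grpc"                 => ["google-protobuf"],
}

p transitive_deps(graph, "google-cloud-logging")
```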

@apolcyn

apolcyn commented Feb 6, 2018

After looking into this, IMO the main problem is on this line: omnibus checks the health of every gem binary currently installed, even if that binary is never going to be used. When google-protobuf is pulled in as an indirect dependency, the platform-specific gem is installed, with pre-built binaries available under all of the 2.x directories as well as the pre-built "fallback binary" in .../google/protobuf_c.so. Note that the fallback binary will never be used if the platform-specific gem was downloaded; it is only there for use by source builds (see the binary selection logic here).

As for why the error happens only in certain situations here:
When google-protobuf exists in core_gems.rb, gem_downloader fetches and installs the source-only google-protobuf package rather than the platform-specific one with pre-built binaries. When the source-only version of the protobuf package is built and installed, protobuf_c.so is dynamically linked against the Ruby runtime (libruby.so.2.4 in this case) that it finds on the current system (in the rvm 2.4.1 directory in this case), and so everything is OK.

When we updated our docker file to be based on rake-compiler-dock's Dockerfile, the fallback binary started getting dynamically linked to libruby, whereas it didn't before. That is what's causing the error on new versions but not old ones. However, this is not a real problem IMO, since that fallback binary isn't actually intended for use (it's sort of a side effect of the package build cross-compilation).

Possible fixes:

  1. Keep core_gems.rb updated with an entry for the latest google-protobuf (grpc will also be needed here starting with its next release).

  2. Somehow get omnibus to not inspect every binary in the installation directory (perhaps we can rm .../lib/google/protobuf_c.so at some point after installing it).

  3. Modify the grpc/protobuf docker file to not dynamically link the fallback binary on libruby, but I'd prefer to avoid this if possible.

@qingling128 are 1) or 2) possible here?
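
A minimal sketch of what option 2 could look like as a post-install cleanup step, assuming the gems live under a directory such as .../gems/2.4.0/gems. The helper remove_fallback_extensions is hypothetical and is not an omnibus feature; the demo runs against a temporary directory that imitates the gem layout rather than a real installation.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical cleanup step: delete only the top-level fallback
# lib/google/protobuf_c.so from each google-protobuf gem under `gems_dir`,
# leaving the per-version copies (lib/google/2.x/protobuf_c.so) in place.
# Returns the list of paths that were removed.
def remove_fallback_extensions(gems_dir)
  pattern = File.join(gems_dir, "google-protobuf-*", "lib", "google", "protobuf_c.so")
  Dir.glob(pattern).each { |path| FileUtils.rm_f(path) }
end

# Demo on a throwaway directory that mimics the installed gem layout.
Dir.mktmpdir do |gems_dir|
  base = File.join(gems_dir, "google-protobuf-3.5.1.1-x86_64-linux", "lib", "google")
  FileUtils.mkdir_p(File.join(base, "2.4"))
  FileUtils.touch(File.join(base, "2.4", "protobuf_c.so"))
  FileUtils.touch(File.join(base, "protobuf_c.so"))

  removed = remove_fallback_extensions(gems_dir)
  puts "removed: #{removed.size}"
  puts "versioned copy kept: #{File.exist?(File.join(base, '2.4', 'protobuf_c.so'))}"
end
```

Because the glob matches exactly one directory level under lib/google, the versioned binaries are never candidates for removal.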

@qingling128
Author

  1. "Keep core_gems.rb updated with an entry for the latest google-protobuf (grpc will also be needed here starting with its next release)."
  • The gems we use don't always pull the latest grpc. They normally have a pin (example) to at least the current major version. If the latest grpc version exceeds those pins, we might still get similar issues, I guess. I'll do some experiments.
    Alternatively, we can specify the gem versions in core_gems.rb manually as a temporary workaround (each time we upgrade the gems, we re-pin the grpc and google-protobuf versions), provided the long-term fix comes in a reasonable time, since we don't upgrade these gems very often.
  2. "Somehow get omnibus to not inspect every binary in the installation directory (perhaps we can rm .../lib/google/protobuf_c.so at some point after installing it)."
  • Could be a temporary workaround. I will need to take a look at that part to figure out how much effort it is.
  3. "Modify the grpc/protobuf docker file to not dynamically link the fallback binary on libruby, but I'd prefer to avoid this if possible."
  • As discussed in the PR, it would be nice to fix this in rake-compiler. Is there any estimate of how long that might take? It seems to be a third-party tool.

@apolcyn

apolcyn commented Feb 8, 2018

@qingling128 thanks for comments, I agree that those choices are not ideal.

As discussed in the PR, it would be nice to fix this in rake-compiler. Is there any estimate of how long that might take? It seems to be a third-party tool.

I created a feature request for the rake-compiler gem. I don't have much of an idea for ETA.

@qingling128
Author

@apolcyn - Thank you so much! I'll keep an eye on that feature request as well.

@TeBoring TeBoring added the ruby label Apr 18, 2018
robbkidd added a commit to chef/chef-workstation that referenced this issue May 15, 2018
Train 1.4.7 added Google GCP libraries. The google-protobuf gem does not coexist
in an omnibus build well because its C-extensions build seemingly for
every version of Ruby which are then caught by the omnibus health check for
.so files built against non-existent libraries.

Example of failure can be found on protocolbuffers/protobuf#4210

Applying this pin WHICH SHOULD TOTALLY BE REMOVED when we have solved
for omnibus packaging of google-protobuf.

Signed-off-by: Robb Kidd <robb@thekidds.org>
@TeBoring
Contributor

TeBoring commented Jun 6, 2018

@apolcyn there is a similar issue in #4460.
The Linux x86_64 gem doesn't work on Alpine, but a gem built on Alpine does.
Is it possible to force cross-compiling even for x86_64?

@nicolasnoble
Contributor

Alpine Linux has its own unique set of problems, and this is most likely unrelated to this issue here.

@brodock

brodock commented Sep 14, 2018

I see 3.6.x is released. Does it suffer the same issue?

maxlazio pushed a commit to gitlabhq/gitlabhq that referenced this issue Oct 24, 2018
It looks like gRPC may have worked around
protocolbuffers/protobuf#4210 via
grpc/grpc#14634.

This is needed to support Ruby 2.5
(https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/22555).
@TeBoring TeBoring self-assigned this Dec 27, 2018
@haberman
Member

Closing as obsolete.
