This repository has been archived by the owner on Apr 14, 2021. It is now read-only.

Bundler 1.16 incorrectly captures/stores BUNDLE_GEMFILE information into binstubs, which breaks in scenarios where multiple Gemfiles have been bundle installed. #6162

Closed
jchesterpivotal opened this issue Nov 8, 2017 · 19 comments

Comments

@jchesterpivotal

Issue

Bundler 1.16 incorrectly captures/stores BUNDLE_GEMFILE information into binstubs, which breaks in scenarios where multiple Gemfiles have been bundle installed.

We think we did not observe this sooner because of the time it took for bundler 1.16 to propagate to our CI pipeline: first from Bundler to the standard ruby:2.4.2 Dockerfile, then from there to our CI-image-building job, and then a delay until a job using our CI image was triggered. But we believe the issue existed from the time that bundler 1.16 was released.

Investigation

Yesterday our CI builds began to fail inexplicably:

bundle exec rake
rake aborted!
LoadError: cannot load such file -- active_record/railtie

We investigated the commits leading up to the build, but none of them were related to Bundler, Docker, Rake, any Gemfiles etc.

We noticed that our logs had changed in an interesting way, however:

Seen in broken build:

Bundle complete! 12 Gemfile dependencies, 48 gems now installed.
Bundled gems are installed into `/usr/local/bundle`

Seen in working build immediately prior to broken build:

Bundle complete! 86 Gemfile dependencies, 220 gems now installed.
Bundled gems are installed into /usr/local/bundle.

At first we suspected we'd messed up our Gemfile, but it had not changed. We then realised that the "12 Gemfile dependencies" result was from a different Gemfile from the one in the directory.

Our CI / Docker process

How could this happen? Well, to begin with, we run all our builds inside Docker containers using Concourse. As an optimisation, we pre-build images that contain dependencies used by multiple codebases in our system. Then all CI builds use the common image. bundle exec is used to keep dependencies straight and previously worked fine.

Here is a Dockerfile based on our internal one:

FROM ruby:2.4.2

WORKDIR /usr/src/app

COPY package.json \
     first.gemfile \
     first.gemfile.lock \
     second.gemfile \
     second.gemfile.lock \
     third.gemfile \
     third.gemfile.lock \
     /usr/src/app/

RUN bundle config --global jobs 16 && \
    BUNDLE_GEMFILE=/usr/src/app/first.gemfile bundle install && \
    BUNDLE_GEMFILE=/usr/src/app/second.gemfile bundle install && \
    BUNDLE_GEMFILE=/usr/src/app/third.gemfile bundle install

When we fly intercept a broken build to poke inside a container using first.gemfile, we can recreate the failure behaviour. After some more poking around, we realise that the list of installed gems is from third.gemfile, not first.gemfile.

It seems as though, during the docker build used to create our CI image, Bundler has somehow locked onto the third BUNDLE_GEMFILE:

root@a8aa3692-997a-4779-6944-ab2e041a3bd2:/tmp/build/bb397313# bundle config
Settings are listed in order of priority. The top value will be used.
jobs
Set for the current user (/root/.bundle/config): "16"

app_config
Set via BUNDLE_APP_CONFIG: "/usr/local/bundle"

bin
Set via BUNDLE_BIN: "/usr/local/bundle/bin"

path
Set via BUNDLE_PATH: "/usr/local/bundle"

gemfile
Set via BUNDLE_GEMFILE: "/usr/src/app/third.gemfile"

A bit weird, but we figure easily fixed. But it is not fixed by bundle config --delete:

root@a8aa3692-997a-4779-6944-ab2e041a3bd2:/tmp/build/bb397313# bundle config --delete gemfile
root@a8aa3692-997a-4779-6944-ab2e041a3bd2:/tmp/build/bb397313# bundle config
Settings are listed in order of priority. The top value will be used.
jobs
Set for the current user (/root/.bundle/config): "16"

app_config
Set via BUNDLE_APP_CONFIG: "/usr/local/bundle"

bin
Set via BUNDLE_BIN: "/usr/local/bundle/bin"

path
Set via BUNDLE_PATH: "/usr/local/bundle"

gemfile
Set via BUNDLE_GEMFILE: "/usr/src/app/third.gemfile"

We check all the .bundle/config places the docs tell us to look, but none of them refer to BUNDLE_GEMFILE or third.gemfile.

So where is the configuration coming from?

Proximate cause of issue

After using grep over the whole FS, we find it being hard-coded into binstubs:

usr/local/bundle/bin/aws.rb
15:ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../../../../src/app/third.gemfile",

usr/local/bundle/bin/ldiff
15:ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../../../../src/app/third.gemfile",

usr/local/bundle/bin/rspec
15:ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../../../../src/app/third.gemfile",

usr/local/bundle/bin/oauth
15:ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../../../../src/app/third.gemfile",

usr/local/bundle/bin/rake
15:ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../../../../src/app/third.gemfile",

usr/local/bundle/bin/htmldiff
15:ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../../../../src/app/third.gemfile",

usr/local/bundle/bin/ruby-parse
15:ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../../../../src/app/third.gemfile",

usr/local/bundle/bin/bundle
44:    File.expand_path("../../../../src/app/third.gemfile", __FILE__)

usr/local/bundle/bin/nokogiri
15:ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../../../../src/app/third.gemfile",

usr/local/bundle/bin/pry
15:ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../../../../src/app/third.gemfile",

usr/local/bundle/bin/rackup
15:ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../../../../src/app/third.gemfile",

usr/local/bundle/bin/retry
15:ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../../../../src/app/third.gemfile",

usr/local/bundle/bin/rubocop
15:ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../../../../src/app/third.gemfile",

usr/local/bundle/bin/httparty
15:ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../../../../src/app/third.gemfile",

usr/local/bundle/bin/ruby-rewrite
15:ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../../../../src/app/third.gemfile",

usr/local/bundle/bin/coderay
15:ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../../../../src/app/third.gemfile",

So unless we set BUNDLE_GEMFILE every time we run bundler, it will not work for the first.gemfile and second.gemfile cases.
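A minimal sketch of why the captured path wins whenever BUNDLE_GEMFILE is unset. It reuses the expression from the grep output above, assuming the truncated second argument is `__FILE__` (as the `bin/bundle` match suggests). The `resolve_gemfile` helper and the plain Hash standing in for `ENV` are hypothetical, and the expected paths assume a POSIX filesystem:

```ruby
# Hypothetical helper: mimics the line written into each binstub.
# `env` stands in for ENV and `binstub_path` stands in for __FILE__.
def resolve_gemfile(env, binstub_path)
  # `||=` only assigns when the key is missing (or nil), so a value that is
  # already set by the caller takes precedence over the captured path.
  env["BUNDLE_GEMFILE"] ||= File.expand_path("../../../../src/app/third.gemfile", binstub_path)
  env["BUNDLE_GEMFILE"]
end

stub = "/usr/local/bundle/bin/rake"

# Nothing set: the stub falls back to the Gemfile captured at install time.
unset = resolve_gemfile({}, stub)

# Caller sets BUNDLE_GEMFILE explicitly: the captured path is ignored.
preset = resolve_gemfile({ "BUNDLE_GEMFILE" => "/usr/src/app/first.gemfile" }, stub)

puts unset
puts preset
```

This is consistent with what we saw: exporting BUNDLE_GEMFILE before each command works, while relying on the Gemfile in the working directory does not.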

This is a regression vs 1.15.4, where it worked as expected.

Workaround

Our workaround is to add this to our Dockerfile:

RUN gem uninstall --all --executables --force --install-dir /usr/local/lib/ruby/gems/2.4.0 bundler && \
    gem install bundler --version 1.15.4

ENV BUNDLER_VERSION=1.15.4
@jchesterpivotal
Author

Likely related: #6154

@ghostsquad

I'm seeing this too. @jchesterpivotal very thorough investigation and report. 😍

@ebeigarts

ebeigarts commented Nov 30, 2017

This workaround stopped working for us

Step 25/27 : RUN bundle install --jobs=4 --path /bundle
 ---> Running in 49c2ddd726be
/usr/local/bundle/bin/bundle:23:in `load': cannot load such file -- /usr/local/lib/ruby/gems/2.4.0/gems/bundler-1.16.0/exe/bundle (LoadError)
	from /usr/local/bundle/bin/bundle:23:in `<main>'

We had to also add rm /usr/local/lib/ruby/gems/2.4.0/specifications/default/bundler-1.16.0.gemspec to fix this.

ENV BUNDLER_VERSION=1.15.4
RUN \
  gem uninstall --all --executables --force --install-dir /usr/local/lib/ruby/gems/2.4.0 bundler && \
  rm /usr/local/lib/ruby/gems/2.4.0/specifications/default/bundler-1.16.0.gemspec && \
  gem install bundler --version "$BUNDLER_VERSION"

@deivid-rodriguez
Member

I'm not sure whether this is a bug in bundler or in the base ruby image. The base ruby image seems to make a lot of choices for its users by setting some bundler environment variables on it.

In particular, with the BUNDLE_BIN setting, it's telling bundler to install binstubs, and install them in a global path, so those binstubs inherit the last Gemfile context which installed each of them. These kinds of choices seem out of scope for a base docker image to me...

In my case, doing ENV BUNDLE_BIN= right after FROM ruby:2.4.2 seems to do the trick.

@indirect
Member

To be clear: Bundler already creates executables that don't capture a Gemfile, namely the regular RubyGems executables. Those executables are always created. Bundler binstubs are an optional addition, intended to be application-specific. Capturing the relevant Gemfile is a feature, not a bug.

If you don't want to use the captured Gemfile, either stop generating Bundler binstubs or set BUNDLE_GEMFILE.

@deivid-rodriguez
Member

Makes total sense @indirect. So this is a new feature in 1.16 I assume? Do you agree with me that official ruby docker image should stop setting BUNDLE_BIN to /usr/local/bundle/bin by default from now on?

@deivid-rodriguez
Member

I'm sure "bundler binstubs" are not a new feature per se (maybe the bundler binstub for bundler itself is, though), I'm just trying to figure out why stuff that worked in 1.15.4 no longer works.

@jchesterpivotal
Author

I'd be fine being wrong if it's downstream in the docker image. But it was a surprising change in behaviour.

@indirect
Member

indirect commented Dec 3, 2017

The intended change was to support situations like this one:

cd app1/subdir && /path/to/app2/bin/rake app2:task

If the binstub at /path/to/app2/bin/rake remembers the location of the app2 Gemfile, everything works as expected. Without the change we made, the app2 rake would find the app1 Gemfile in a parent directory, and the command would fail in a surprising and unexpected way.
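A rough sketch of the two lookup strategies being contrasted here. This is a simplified model, not Bundler's actual implementation: `find_gemfile_upward` and `captured_gemfile` are hypothetical helpers, and the `app1`/`app2` layout is built in a temporary directory purely for illustration.

```ruby
require "fileutils"
require "pathname"
require "tmpdir"

# Hypothetical helper: walk from start_dir towards the filesystem root and
# return the first file named "Gemfile" that is found, or nil if none exists.
def find_gemfile_upward(start_dir)
  Pathname.new(start_dir).expand_path.ascend do |dir|
    candidate = dir.join("Gemfile")
    return candidate.to_s if candidate.file?
  end
  nil
end

# Hypothetical helper: the path a binstub in <app>/bin would remember,
# computed relative to the stub itself rather than the working directory.
def captured_gemfile(binstub_path)
  File.expand_path("../../Gemfile", binstub_path)
end

# Throwaway layout: app1/Gemfile, app1/subdir, app2/Gemfile, app2/bin.
root = Dir.mktmpdir
FileUtils.mkdir_p(File.join(root, "app1", "subdir"))
FileUtils.mkdir_p(File.join(root, "app2", "bin"))
FileUtils.touch(File.join(root, "app1", "Gemfile"))
FileUtils.touch(File.join(root, "app2", "Gemfile"))

cwd  = File.join(root, "app1", "subdir")       # where the command is run from
stub = File.join(root, "app2", "bin", "rake")  # the binstub being invoked

by_search  = find_gemfile_upward(cwd)  # finds app1's Gemfile
by_capture = captured_gemfile(stub)    # points at app2's Gemfile

puts by_search
puts by_capture

FileUtils.remove_entry(root)
```

The upward search depends on where the command is run from, while the captured path depends only on where the stub lives. That is the property the 1.16 change was after.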

The change in observed behavior might come down to a failure in imagination on our part... we didn't think that anyone would be generating application-specific binstubs and then trying to use those stubs in other applications. In this case, I feel like it's reasonable to expect users to run the RubyGems binstubs if they want application-independent behavior, and run app/bin/ stubs only if they want to run the command from that specific app. That said, I'm open to further discussion if you feel strongly otherwise.

@deivid-rodriguez
Member

I see, the more I think about it the more this seems like an issue with ruby's official docker image configuration.

Ruby's official docker image is setting these in its base ruby image:

ENV GEM_HOME /usr/local/bundle
ENV BUNDLE_PATH="$GEM_HOME" \
	BUNDLE_BIN="$GEM_HOME/bin" \
	BUNDLE_SILENCE_ROOT_WARNING=1 \
	BUNDLE_APP_CONFIG="$GEM_HOME"
ENV PATH $BUNDLE_BIN:$PATH

So that means every docker image using ruby as its base image is supposed to host a single application (furthermore, a single Gemfile), otherwise each application's binstubs will be overwriting each other every time they run bundle install. Am I getting this right?
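To make the overwriting concrete, here is a toy model (not Bundler code). The shared BUNDLE_BIN directory is represented as a Hash from executable name to the Gemfile its stub captured. The `install_binstubs` helper and the per-Gemfile executable lists are made up for illustration:

```ruby
# Hypothetical helper: "installing" a Gemfile (re)writes a stub for each of
# its executables into the one shared bin directory.
def install_binstubs(shared_bin, gemfile, executables)
  executables.each { |exe| shared_bin[exe] = gemfile }
  shared_bin
end

shared_bin = {}

# Same order as the three bundle install runs in the Dockerfile from the report.
install_binstubs(shared_bin, "/usr/src/app/first.gemfile",  %w[rake rspec])
install_binstubs(shared_bin, "/usr/src/app/second.gemfile", %w[rake rubocop])
install_binstubs(shared_bin, "/usr/src/app/third.gemfile",  %w[rake pry])

# Each stub ends up pointing at whichever Gemfile installed it last.
shared_bin.each { |exe, gemfile| puts "#{exe} -> #{gemfile}" }
```

In this model `rake` ends up tied to third.gemfile even though all three Gemfiles provide it, which is the same shape as the failure in the original report.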

@indirect
Member

indirect commented Dec 3, 2017

@deivid-rodriguez yes, I believe that is correct. The config you pasted above looks extremely optimized for a single Ruby application, with a single Gemfile. I think you would likely want to remove some or all of those ENV settings if you have more than one application inside your container.

@deivid-rodriguez
Member

deivid-rodriguez commented Dec 3, 2017

Yeah, the thing is docker makes it pretty hard (impossible?) to remove environment variables from a base image. Setting BUNDLE_BIN= (empty) is working for me, but still these seem like decisions out of scope for the official base image of the ruby language :S

@jchesterpivotal
Author

> The change in observed behavior might come down to a failure in imagination on our part... we didn't think that anyone would be generating application-specific binstubs and then trying to use those stubs in other applications.

To be clear, this comes about as a CI consideration. Running bundle for each test adds a lot of cumulative overhead. As an optimisation we build a single CI docker image for a range of ruby-based systems we test. It might be that we push this wheelbarrow back up to the Docker base image. I'm not super hopeful, given that Docker's general doctrine is that one container == one process.

We've also encountered similar-but-not-quite-the-same problems with monorepo projects. Bundler assumes you have a monolithic ruby codebase and that it still has a single Gemfile. But we have cases where a subdirectory has a Gemfile (server/Gemfile) and the toplevel directory has a Gemfile intended to serve a toplevel spec/ directory.

BUNDLE_GEMFILE works for these, once we work out the mismatch, but it'd be useful if bundle exec took account of the PWD in determining which Gemfile to work with.

@deivid-rodriguez
Member

> I'm not super hopeful, given that Docker's general doctrine is that one container == one process.

Yeah, I was thinking the same thing...

> BUNDLE_GEMFILE works for these, once we work out the mismatch, but it'd be useful if bundle exec took account of the PWD in determining which Gemfile to work with.

I think #6201 might be an improvement in this regard?

@jchesterpivotal
Author

I honestly don't feel qualified to say, but ... maybe? It's a pity @RochesterinNYC isn't still here at Pivotal, normally I'd have ambushed him for a closer look.

Our better move might be disentangling our Docker images. We've tried once or twice and been thrown back from Fort Fiddly Interactions. I have a longstanding grudge against Dockerfiles as a way of building images, mostly because they tend to encourage these kinds of ball-of-mud situations.

@indirect
Member

indirect commented Dec 5, 2017

@jchesterpivotal I think I now understand what you're saying... bundle exec foo is not picking up $PWD/Gemfile, and is instead picking up whatever the gemfile was when bundle binstubs foo was run. That seems... bad. I think bundle exec foo should always find the Gemfile of the current directory (when BUNDLE_GEMFILE is not set) and then run the foo for that Gemfile.

@indirect
Member

indirect commented Dec 5, 2017

Ugh, accidentally commented too soon.

Can you try Bundler built from master, now that #6201 has landed? I am hoping that it will fix that specific issue of bundle exec picking up the wrong gemfile.

@jchesterpivotal
Author

As it happens I've rotated away from the affected team. One of @mbildner, @xtreme-debbie-chen or @pivotal-ivan-wang might need to carry this forward.

@jwarnier

Duplicate of #6154
