bundler-cache with sassc leads to [BUG] Illegal instruction at 0x00007f9xxx FFI error #141

Closed
hidde-jan opened this issue Jan 8, 2021 · 7 comments

Comments

@hidde-jan

I ran into a bug where loading sassc leads to an FFI error. At first, the following action setup ran fine. However, after I removed the installation of bundler and the extra bundle install step, all subsequent runs of the action failed because of an illegal instruction in libffi. The error occurred in the db setup step. The likely cause is a non-portable cached binary for sassc in the bundler cache.

Removing the bundler-cache fixed my pipeline.

Action setup:

# This is a basic workflow to help you get started with Actions

name: Rails CI

# Controls when the action will run. 
on:
  # Triggers the workflow on push or pull request events but only for the master branch
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

jobs:
  verify:
    name: Build
    runs-on: ubuntu-latest

    services:
      postgres:
        image: postgis/postgis:11-2.5
        env:
          POSTGRES_USER: rails_github_actions
          POSTGRES_DB: rails_github_actions_test
          POSTGRES_PASSWORD: postgres
        ports: ["5432:5432"]
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

    steps:
      - uses: actions/checkout@v1
      - name: Set up Ruby
        uses: ruby/setup-ruby@v1
        with:
          bundler-cache: true # runs 'bundle install' and caches installed gems automatically
      - name: Set up Node
        uses: actions/setup-node@v1
        with:
          node-version: 10.13.0
      - name: Install dependencies
        run: |
          sudo apt-get -yqq install libpq-dev build-essential libcurl4-openssl-dev
          gem install bundler
          bundle install --jobs 4 --retry 3
          npm ci
      - name: Setup test database
        env:
          RAILS_ENV: test
          DATABASE_URL: postgres://rails_github_actions:postgres@localhost:${{ job.services.postgres.ports[5432] }}/rails_github_actions_test
        run: |
          bundle exec rails db:test:prepare
      - name: Run tests
        env:
          RAILS_ENV: test
          DATABASE_URL: postgres://rails_github_actions:postgres@localhost:${{ job.services.postgres.ports[5432] }}/rails_github_actions_test
        run: |
          bundle exec rails test

Error:

bundle exec rails db:test:prepare
  shell: /bin/bash -e {0}
  env:
    RAILS_ENV: test
    DATABASE_URL: ***localhost:5432/rails_github_actions_test
/home/runner/work/portfolio-api/portfolio-api/vendor/bundle/ruby/2.6.0/gems/ffi-1.12.2/lib/ffi/library.rb:112: [BUG] Illegal instruction at 0x00007f98039729b0
ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0067 p:---- s:0332 e:000331 CFUNC  :open
c:0066 p:0022 s:0326 e:000325 BLOCK  /home/runner/work/portfolio-api/portfolio-api/vendor/bundle/ruby/2.6.0/gems/ffi-1.12.2/lib/ffi/library.rb:112 [FINISH]
c:0065 p:---- s:0317 e:000316 CFUNC  :each
c:0064 p:0113 s:0313 e:000312 BLOCK  /home/runner/work/portfolio-api/portfolio-api/vendor/bundle/ruby/2.6.0/gems/ffi-1.12.2/lib/ffi/library.rb:109 [FINISH]
c:0063 p:---- s:0306 e:000305 CFUNC  :map
c:0062 p:0069 s:0302 e:000301 METHOD /home/runner/work/portfolio-api/portfolio-api/vendor/bundle/ruby/2.6.0/gems/ffi-1.12.2/lib/ffi/library.rb:99
c:0061 p:0079 s:0295 e:000294 CLASS  /home/runner/work/portfolio-api/portfolio-api/vendor/bundle/ruby/2.6.0/gems/sassc-2.2.1/lib/sassc/native.rb:11
c:0060 p:0007 s:0291 e:000290 CLASS  /home/runner/work/portfolio-api/portfolio-api/vendor/bundle/ruby/2.6.0/gems/sassc-2.2.1/lib/sassc/native.rb:6
c:0059 p:0014 s:0288 e:000287 TOP    /home/runner/work/portfolio-api/portfolio-api/vendor/bundle/ruby/2.6.0/gems/sassc-2.2.1/lib/sassc/native.rb:5 [FINISH]
c:0058 p:---- s:0285 e:000284 CFUNC  :require_relative

@eregon
Member

eregon commented Jan 8, 2021

Do you have links to the logs (or the logs themselves), and could you share the image versions?
Did it run on Ubuntu 20.04 or 18.04?
I assume it ran on GitHub runners, correct?

The only thing I can imagine that could cause this, if it is not a bug of FFI or sassc, is that the OS image updated to a newer C++ runtime, that might not be binary compatible (C ABI is guaranteed compatible IIRC, but C++ might not).
For instance, actions/runner-images@afa8bfb#diff-6be9f9ce03ba57ca01e4ae937d5c2bb04893520fdf391ea87eb46acbc43ecdc5 mentioned clang might have been updated, although it seems that was postponed.

I don't think there is anything this action can do; this is either a change in the image by actions/virtual-environments or a bug in FFI or sassc.
This action already uses separate caches per OS version, true Ruby ABI version, etc.
I'll close this because I can't imagine how it could be an issue of this action, but let's try to figure out further what is happening.

@eregon eregon closed this as completed Jan 8, 2021
@eregon
Member

eregon commented Jan 8, 2021

We could potentially include the image version in the cache key, if that helps fix this.
The trade-off is that the cache would be invalidated much more often, and most of the time unnecessarily.

@myhro

myhro commented Jan 13, 2021

I've been facing this problem since yesterday and have yet to find a proper solution that still lets me use the bundler-cache option.

> The only thing I can imagine that could cause this, if it is not a bug of FFI or sassc, is that the OS image updated to a newer C++ runtime, that might not be binary compatible (C ABI is guaranteed compatible IIRC, but C++ might not).

Based on this upstream report, my understanding is that by default sassc is compiled in a way that's optimized for the instruction set available on the current CPU. Any future run that uses the cached build will only work if both CPUs, the one that built it and the one running the cached version, have an equivalent instruction set.

After clicking Re-run jobs multiple times, I realized that:

  • It worked on GitHub runners that had Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz and Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz processors, which have exactly the same set of instructions (copied from /proc/cpuinfo):
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc
rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2
movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch
invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx
avx512f avx512dq rdseed adx smap clflushopt avx512cd avx512bw avx512vl xsaveopt
xsavec xsaves md_clear
  • It failed with the [BUG] Illegal instruction catastrophic error when running on an older Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz which has a reduced instruction set:
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc
rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2
movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm invpcid_single
pti fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt md_clear
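
To make the difference between the two listings above concrete, here is a small sketch that diffs the flag sets exactly as copied from /proc/cpuinfo in this comment. The constant names are only illustrative. Note that it shows which CPU features the older runner lacks, not which of them the cached sassc binary actually uses; the thread does not establish that.

```ruby
# Flags of the Platinum 8171M / 8272CL runners, copied from the listing above.
NEWER_CPU_FLAGS = %w[
  fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
  pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc
  rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2
  movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch
  invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx
  avx512f avx512dq rdseed adx smap clflushopt avx512cd avx512bw avx512vl xsaveopt
  xsavec xsaves md_clear
].freeze

# Flags of the E5-2673 v3 runner, copied from the listing above.
OLDER_CPU_FLAGS = %w[
  fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
  pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc
  rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2
  movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm invpcid_single
  pti fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt md_clear
].freeze

# Features a binary built with -march=native on the newer CPUs may rely on
# but that the older CPU does not provide.
MISSING_ON_OLDER = (NEWER_CPU_FLAGS - OLDER_CPU_FLAGS).freeze

puts MISSING_ON_OLDER.join(" ")
# prints: 3dnowprefetch hle rtm mpx avx512f avx512dq rdseed adx smap clflushopt avx512cd avx512bw avx512vl xsavec xsaves
```

The older CPU's flags are a strict subset of the newer ones, which is consistent with the observation that a cache built on the older CPU would run everywhere while one built on the newer CPUs would not.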

In my specific use case, caching the portable version of the dependency, even if it is a bit slower than the CPU-optimized one, would be better, as the time taken by bundle install is much greater than the time taken to compile the .scss files. The main problem right now is that I cannot nuke the broken cache to refresh it with the portable version of sassc.

@eregon
Member

eregon commented Jan 13, 2021

@myhro Thank you for the details, that makes sense, I was also thinking that -march=native -mtune=native might be the cause of this issue.
IMHO, this is a bug of sassc, it's the only gem that needs recompilation per CPU, and I think sassc should change the default, as I expressed in sass/sassc-ruby#146 (comment).

> In my specific use case, caching the portable version of the dependency, even if it is a bit slower than the CPU-optimized one, would be better, as the time taken by bundle install is much greater than the time taken to compile the .scss files.

How do you do that?

> The main problem right now is that I cannot nuke the broken cache to refresh it with the portable version of sassc.

More of a workaround, but for the purpose of testing, you could add a dependency on some new gem, and change its version, and each will have a different cache.
It's unfortunate that GitHub does not provide a way to remove a cache.

@eregon
Member

eregon commented Jan 13, 2021

Potentially we could cache per CPU, but I wonder how many different CPUs there are for GitHub-hosted runners.
Also I think users expect that 2 runs on some given image (e.g., ubuntu-20.04) use the same cache, even if the CPU differs.
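
As a rough illustration of the trade-off, the sketch below composes a cache key with and without a CPU component. It is purely hypothetical: the method names, the key layout, and the use of SHA-256 are assumptions made for this example, not the action's actual key format. Including a digest of the lockfile is also an assumption, though it is consistent with the workaround above (adding a gem yields a different cache).

```ruby
require "digest"

# Extracts the first "model name" entry from /proc/cpuinfo-style text.
# Returns nil when no such line is present.
def cpu_model(cpuinfo_text)
  cpuinfo_text[/^model name\s*:\s*(.+)$/, 1]
end

# Hypothetical key builder: OS image, Ruby ABI and lockfile digest always
# contribute; the CPU model contributes only when cpuinfo text is given.
def bundler_cache_key(os:, ruby_abi:, lockfile:, cpuinfo: nil)
  parts = ["bundler-cache", os, ruby_abi, Digest::SHA256.hexdigest(lockfile)[0, 16]]
  parts << Digest::SHA256.hexdigest(cpu_model(cpuinfo).to_s)[0, 8] if cpuinfo
  parts.join("-")
end

platinum = "model name\t: Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz\n"
haswell  = "model name\t: Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz\n"
lockfile = "GEM\n  specs:\n    sassc (2.2.1)\n"

shared = bundler_cache_key(os: "ubuntu-20.04", ruby_abi: "2.6.0", lockfile: lockfile)
per_cpu_a = bundler_cache_key(os: "ubuntu-20.04", ruby_abi: "2.6.0", lockfile: lockfile, cpuinfo: platinum)
per_cpu_b = bundler_cache_key(os: "ubuntu-20.04", ruby_abi: "2.6.0", lockfile: lockfile, cpuinfo: haswell)

puts shared
puts per_cpu_a
puts per_cpu_b
```

With the CPU component, the two runner types no longer share a cache entry, so each CPU model pays for its own cold `bundle install`; without it, the key stays stable across runs on the same image, which is the behavior users likely expect.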

@eregon
Member

eregon commented Jan 13, 2021

It seems sassc 2.3.0+ already defaults to false for -march=native -mtune=native; please update to sassc 2.3.0+: sass/sassc-ruby#146 (comment)
The original report uses sassc-2.2.1, which doesn't have the fix.
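
For anyone wanting to check a project quickly, here is a minimal sketch that reads the locked sassc version out of Gemfile.lock text and compares it against 2.3.0, the threshold stated above. The method and constant names are made up for this example, and the lockfile snippet is an illustrative fragment rather than the reporter's real file. `Gem::Version` and `Gem::Requirement` come with RubyGems.

```ruby
require "rubygems"

# Per the comment above, sassc 2.3.0+ no longer builds with
# -march=native -mtune=native by default.
PORTABLE_SASSC = Gem::Requirement.new(">= 2.3.0")

# Returns the locked sassc version from Gemfile.lock text, or nil if absent.
# Top-level gems in the specs section are indented by exactly four spaces.
def locked_sassc_version(lockfile_text)
  lockfile_text[/^ {4}sassc \(([^)]+)\)$/, 1]
end

# True when the given sassc version has the portable default.
def portable_default?(version)
  PORTABLE_SASSC.satisfied_by?(Gem::Version.new(version))
end

lockfile = <<~LOCK
  GEM
    remote: https://rubygems.org/
    specs:
      ffi (1.12.2)
      sassc (2.2.1)
        ffi (~> 1.9)
LOCK

version = locked_sassc_version(lockfile)
puts "sassc #{version}: portable by default? #{portable_default?(version)}"
# prints: sassc 2.2.1: portable by default? false
```

In a Gemfile, the corresponding constraint would be `gem "sassc", ">= 2.3.0"`, followed by `bundle update sassc` to refresh the lockfile.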

@myhro

myhro commented Jan 13, 2021

I was trying to set up a .bundle/config file with the required flags to disable the native compilation options, but indeed, just updating to the current sassc 2.4.0 version fixed it for me. I was using exactly 2.2.1.

I can confirm that the cached version worked for the older Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz as well. Thank you very much for pointing that out.

joshpencheon added a commit to ukhsa-collaboration/data_management_system that referenced this issue Feb 24, 2021
joshpencheon added a commit to ukhsa-collaboration/data_management_system that referenced this issue Feb 25, 2021
* # CI: try and cache the bundle between builds

It currently takes ~5 minutes of compute time per job to do this.

* # update sassc, to try and avoid CPU model-specific issues

ref: ruby/setup-ruby#141