-
Notifications
You must be signed in to change notification settings - Fork 183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ibf_dump_object_unsupported […] T_NONE (NotImplementedError) #436
Comments
Another one, probably related:
|
Thanks for the report. Based on the report, it's quite certainly a Ruby bug, triggered by Bootsnap perhaps, but can't be fixed by us.
So it will be best to open this upstream, but let's see if we can craft a good error report.
That's the start of an error report, it could be very useful if you could include the extended crash report, most notably the Overall I suspect that this is a write barrier issue of some sort, in other words some Ruby object is being garbage collected when it shouldn't, but that's just a guess. We might be able to reproduce this by compiling that big AWS file with cc @peterzhu2118 (in case this issue rings a bell). |
Here is the full output of the crash (wasn't sure if that was helpful at all, so I did not included it right away, sorry). Full output / [BUG] try to mark T_NONE object
|
Thank you for the full crash report, one thing though, are you sure bootsnap has anything to do with it? From the report it crashes in GC (like I suspected) when GC triggers inside the I'll open the bug report upstream. |
Actually, |
Report submitted upstream: https://bugs.ruby-lang.org/issues/19419 |
@casperisfine we are facing the exact same issue since our upgrade to ruby 3.2. Mostly in Github Actions also, but I have some reports from devs that have it locally. It's flaky but happens a lot. Disabling Also, it's maybe something else, but sounds really linked, when we don't have this error, we have this one:
I can't reproduce either. But I can try to contact some devs that have the issue locally. What would interest you the most? |
Yes. What is happening is that one reference isn't properly marked, so when gc trigger the slot is recycled. If the slot is empty you'll get that At this stage what could help:
|
I have the same error:
I can see kind of a pattern. In our case the backtrace shows that it comes from In addition, it is not happening always. We are using kubernetes for starting a bunch of pods and it is failing some times, we haven't found a way to reproduce it always. |
In our project's Ruby 3.2.0 update branch, it fails currently about ~10-20% of the time - that's just way too unwieldy in this setup. I tried to load /cc @casperisfine |
If you're setup to build Ruby with patches, try applying the following against Ruby 3.2.0: diff --git a/compile.c b/compile.c
index f252339ee5..5a9ad9d6cc 100644
--- a/compile.c
+++ b/compile.c
@@ -13062,7 +13062,7 @@ ibf_dump_memsize(const void *ptr)
static const rb_data_type_t ibf_dump_type = {
"ibf_dump",
{ibf_dump_mark, ibf_dump_free, ibf_dump_memsize,},
- 0, 0, RUBY_TYPED_WB_PROTECTED | RUBY_TYPED_FREE_IMMEDIATELY
+ 0, 0, RUBY_TYPED_FREE_IMMEDIATELY
};
static void If the patched version does not crash, it confirms an explanation for the bug. |
There has been a change in Ruby 3.2.1 in At first it looked like the initial issue ( The crash (#436 (comment)) however I do see with quite a bit lower frequency. |
The change in @tisba did you have the chance to try Alan's patch? Or any other information you could share for us to be able to reproduce? Your |
Unfortunately I don't have a ready to use setup to compile Ruby with patches (and lacking the time to research and set it up). I just spend some time this morning to try it again. I can't reliably reproduce the crash ( I created a repo: https://github.com/tisba/bootsnap-ruby-crash using Docker and a minimal
One observation that might be relevant: If I remove |
Interesting, so I'm able to repro with your docker image on an M1. I tried the same on our CI infra, so x86_64-linux, but without luck. I'll keep digging. |
I get the same results on |
Hum, actually I'm able to repro on macos:
So that makes debugging this easier. |
Ok, I confirm that @XrXr 's patch does make the issue disappear. So it's indeed a Write Barrier bug. Now the question is how to track it down. I'll report that finding upstream. |
Could you also confirm that it is related to nokogiri? |
In case it might help: I can't get it to fail on MacOS with |
I don't think it's related to Nokogiri, if I replace it by
|
I tried to review the change to That being said, |
It doesn't have the right write barriers in place. For example, there is rb_mark_set(dump->global_buffer.obj_table); in the mark function, but there is no corresponding write barrier when adding to the table in the `ibf_dump_object() -> ibf_table_find_or_insert() -> st_insert()` code path. To insert write barrier correctly, we need to store the T_STRUCT VALUE inside `struct ibf_dump`. Instead of doing that, let's just demote it to WB unproected for correctness. These dumper object are ephemeral so there is not a huge benefit for having them WB protected. Users of the bootsnap gem ran into crashes due to this issue: Shopify/bootsnap#436 Fixes [Bug #19419]
Ok, so Alan removed the write barrier and marked the patch as needing a backport, so this should be fixed in Ruby 3.2.2. Unfortunately there's not much bootsnap can do to work around this bug. On your end you may want to do some dirty hack such as: GC.disable
require "aws-sdk-ec2"
GC.enable I'll close this as there isn't much to do now. |
It doesn't have the right write barriers in place. For example, there is rb_mark_set(dump->global_buffer.obj_table); in the mark function, but there is no corresponding write barrier when adding to the table in the `ibf_dump_object() -> ibf_table_find_or_insert() -> st_insert()` code path. To insert write barrier correctly, we need to store the T_STRUCT VALUE inside `struct ibf_dump`. Instead of doing that, let's just demote it to WB unproected for correctness. These dumper object are ephemeral so there is not a huge benefit for having them WB protected. Users of the bootsnap gem ran into crashes due to this issue: Shopify/bootsnap#436 Fixes [Bug #19419]
Thanks a lot! I think in our case, we'll either wait for 3.2.2 to land or disable bootsnap in production. Regarding the workaround: I'm a little worried that in other GC heavy codepaths we might run into an issue as well (unless I misunderstood the underlying issue entirely). |
Well, the good news is that this only happens when loading code from bootsnap, generally speaking that doesn't happen at runtime, so if it crashes, it should only crash on boot. So if that workaround works on your machine, I wouldn't be worried about production. But yeah, disabling bootsnap ISeq caches (don't have to disable all of bootsnap) is probably the simplest assuming your app is small enough that the boot time isn't too bad. |
ah, okay, thanks for clarification. So testing the workaround to get the cache set up and using Disabling the ISeq caches would be an option. The app is not small but the added boot time should be tolerable temporarily. |
Remove ibf_dumper's WB_PROTECTED status It doesn't have the right write barriers in place. For example, there is rb_mark_set(dump->global_buffer.obj_table); in the mark function, but there is no corresponding write barrier when adding to the table in the `ibf_dump_object() -> ibf_table_find_or_insert() -> st_insert()` code path. To insert write barrier correctly, we need to store the T_STRUCT VALUE inside `struct ibf_dump`. Instead of doing that, let's just demote it to WB unproected for correctness. These dumper object are ephemeral so there is not a huge benefit for having them WB protected. Users of the bootsnap gem ran into crashes due to this issue: Shopify/bootsnap#436 Fixes [Bug #19419] --- compile.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
Remove ibf_dumper's WB_PROTECTED status It doesn't have the right write barriers in place. For example, there is rb_mark_set(dump->global_buffer.obj_table); in the mark function, but there is no corresponding write barrier when adding to the table in the `ibf_dump_object() -> ibf_table_find_or_insert() -> st_insert()` code path. To insert write barrier correctly, we need to store the T_STRUCT VALUE inside `struct ibf_dump`. Instead of doing that, let's just demote it to WB unproected for correctness. These dumper object are ephemeral so there is not a huge benefit for having them WB protected. Users of the bootsnap gem ran into crashes due to this issue: Shopify/bootsnap#436 Fixes [Bug #19419] --- compile.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
See: Ruby upstream issue https://bugs.ruby-lang.org/issues/19419
See: Repo with reproduction case https://github.com/tisba/bootsnap-ruby-crash
Steps to reproduce
I'm having a hard time to produce a minimal version that reliably reproduces this error. Locally, with a
Gemfile
just containingaws-sdk-ec2
,nokogiri
andbootsnap
I have seen it too, but cannot reproduce that reliably on my local machine yet. It seems to happen on a slightly more regular basis in our GH Action CI run, but also not 100% of the time.According to the trace (see below) the
NotImplementedError
happens when loadingclients_api.rb
from the AWS SDK. Note thatclients_api.rb
is a MASSIVE file with >20,000 LOC. Maybe the size is (part of) the problem?Is there something I can do to provide more useful information to debug this issue?
Expected behavior
It should not crash.
Actual behavior
It crashes.
System configuration
Bootsnap version: 1.16.0
Ruby version: 3.2.0, 3.2.1 (on aarch64, x86)
Rails version: 7.0.4.2 (not relevant)
Backtrace
In our CI it fails when we do a sanity check of the Rails app via
RAILS_ENV=production bundle exec rails runner 'puts "Hello from #{Rails.env}!"'
:The text was updated successfully, but these errors were encountered: