Segfault related to scheme_intern_exact_symbol #2882
This is likely caused by Racket's use of a SIGSEGV handler for the GC write barrier. When starting GDB on Racket, you should do
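For readers unfamiliar with the technique, here is a minimal, Linux-oriented sketch of a page-protection write barrier. It is not Racket's code; `region`, `page_dirty`, `setup_write_barrier`, and `store_byte` are hypothetical names. The idea: the collector write-protects an old-generation page, the first store into it raises SIGSEGV, the handler records the page as dirty and lifts the protection, and the store is retried. Because each such fault is a real SIGSEGV, a debugger that stops on SIGSEGV by default will stop on these too. Note that `mprotect` is not on POSIX's async-signal-safe list, although this pattern is widely relied upon on Linux.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical one-page "old generation" region, for illustration only. */
static char *region;
static size_t page_size;
static volatile sig_atomic_t page_dirty = 0;

/* SIGSEGV handler: a fault inside the region is treated as a write-barrier
   hit. Record the page as dirty and make it writable again; on return the
   faulting store is re-executed and now succeeds. */
static void barrier_handler(int sig, siginfo_t *info, void *ctx) {
    (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t lo = (uintptr_t)region;
    if (addr >= lo && addr < lo + page_size) {
        page_dirty = 1;
        mprotect(region, page_size, PROT_READ | PROT_WRITE);
        return;
    }
    /* Not a barrier fault: restore the default action so the retried
       access crashes normally. */
    signal(sig, SIG_DFL);
}

/* Maps the region, installs the handler, then write-protects the page as a
   collector would after scanning it. Returns 0 on success. */
static int setup_write_barrier(void) {
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    region = p;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = barrier_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0) return -1;

    page_dirty = 0;
    return mprotect(region, page_size, PROT_READ);
}

/* Stores one byte through a volatile pointer so the compiler cannot move
   the store past later reads of page_dirty. */
static void store_byte(size_t offset, char value) {
    ((volatile char *)region)[offset] = value;
}
```

In a generational collector, the next minor collection would then need to scan only the dirty pages for references into the young generation, after which it clears the flag and re-protects them.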
Is there some good documentation on the GC write barrier I could look at? It would be nice to have a clearer idea of what's happening under the hood.
Thanks, hopefully this won't mask too many errors in my own code.
Actually, is it possible that Racket's SIGSEGV handler is masking a SIGSEGV elsewhere in a program that embeds Racket, and that this is causing my program to hang instead of crashing and generating a backtrace?
@elfprince13 that shouldn't happen; I've had plenty of segfaults that I've debugged successfully. I don't think there's any regular Racket documentation of the signal handling behavior, but the GC approach in traditional Racket is described in these two papers: https://www.cs.utah.edu/plt/publications/ismm04-wf.pdf https://www.cs.utah.edu/plt/publications/ismm09-rwrf.pdf
Thanks, I'll take a look =) I think something funky is going on: I added some code to swap out the Racket segfault handler while my interpreter coroutine is suspended, and now it crashes instead of hanging, but earlier than before. It just occurred to me that if the GC were running in a separate thread and kept chugging along while the interpreter was suspended, it would probably still need its own handler in place.
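The handler swap described above can be done with `sigaction`, which hands back the previously installed action so it can be restored exactly. This is a hypothetical sketch: `host_segv_handler`, `on_suspend`, and `on_resume` are illustrative names, and it assumes a single-threaded program, since signal dispositions are process-wide rather than per-coroutine.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical host-side handler used while the interpreter is suspended.
   A real one might log a backtrace first; abort() is async-signal-safe. */
static void host_segv_handler(int sig) {
    (void)sig;
    abort();
}

/* Whatever SIGSEGV action was active when the coroutine yielded. */
static struct sigaction saved_segv;

/* Call just after yielding out of the interpreter coroutine: installs the
   host handler and saves the previous action. Returns 0 on success. */
static int on_suspend(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = host_segv_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, &saved_segv);
}

/* Call just before resuming the coroutine: puts the saved action back,
   including its flags and mask. Returns 0 on success. */
static int on_resume(void) {
    return sigaction(SIGSEGV, &saved_segv, NULL);
}
```

Restoring the saved `struct sigaction`, rather than re-installing a handler by name, preserves flags such as `SA_SIGINFO` that the original handler may depend on.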
The GC doesn't run in a different thread. You could try disabling GC and see if that helps.
You can configure with
Okay, so, I'm happy to just do that for now, although it would be nice to figure out why and see if I can develop a workaround. I can't post any source code, but roughly: I have a C++ coroutine implementation based on Boost::Context that puts the Racket interpreter on its own stack and lets me yield in and out of a file-loading loop for my EDSL. Any thoughts on why that would interact negatively with the generational collector would be much appreciated. One specific question: is the default setting of
You'll have to wait for @mflatt to provide helpful advice, but
Mac OS is different because the fault is caught at the Mach layer instead of the BSD-impersonation layer. My only guess about why coroutines would interfere is that stack use by a signal handler might go wrong somehow.
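If signal-handler stack use is the suspect, one thing worth checking is where the handler actually runs. By default a POSIX signal handler executes on the stack of whatever code was interrupted, which here would be the coroutine's own stack; `sigaltstack` combined with `SA_ONSTACK` moves it to a dedicated stack. The sketch below is an illustration under assumptions, not something established in this thread: `install_on_alt_stack` and `probe_handler` are hypothetical names, it uses SIGUSR1 so it can be triggered safely with `raise`, the alternate stack is per-thread, and it does not apply to the Mach exception path on Mac OS.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

/* Size chosen for illustration; on newer glibc SIGSTKSZ may not be a
   compile-time constant, so a fixed generous size is used instead. */
#define ALT_STACK_SIZE (64 * 1024)

static char alt_stack[ALT_STACK_SIZE];
static volatile sig_atomic_t ran_on_alt_stack = 0;

/* Records whether a local variable of the handler lives inside alt_stack. */
static void probe_handler(int sig) {
    (void)sig;
    char marker;
    uintptr_t p = (uintptr_t)&marker;
    uintptr_t lo = (uintptr_t)alt_stack;
    ran_on_alt_stack = (p >= lo && p < lo + ALT_STACK_SIZE);
}

/* Registers alt_stack for the calling thread and installs probe_handler
   with SA_ONSTACK for `sig`. Returns 0 on success. */
static int install_on_alt_stack(int sig) {
    stack_t ss;
    memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_stack;
    ss.ss_size = ALT_STACK_SIZE;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0) return -1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = probe_handler;
    sa.sa_flags = SA_ONSTACK;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}
```

Whether a given runtime installs its own handlers with `SA_ONSTACK` is something to verify in its source rather than assume.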
Interesting, thanks!
Apparently this was premature. Whether or not my code works now depends on whether or not Racket is compiled with optimization, and I my previous test with When Racket is compiled with the default flags,
I'm going to try compiling with
@elfprince13 what's the status of this bug? Have you reached any kind of conclusion? Also, is there any way you can provide a reproducible example for us to try on our end?
@pmatos Thanks for checking in. The title should probably be something different now; however, my code now works correctly when Racket is compiled with The valgrind log and gcc warnings in my previous comment suggest a starting place for tracking it down, and I'll see if I can extract a reproducible example.
@elfprince13 If you can get the example, great. If not, can you please compile with
Annoyingly, I'm having difficulty reproducing the specific conditions that were causing our problem (
results in
results in
(note: if I rerun
@elfprince13 is it possible you have a hardware problem? Can you try running a disk checker and a memory test to rule those issues out? I have seen this kind of nondeterminism caused by hardware issues. Also, I compile this configuration on Linux regularly without problems. Alternatively, can you reproduce the problem on a different PC?
Was able to replicate the After booting:
The installed Final output from
@elfprince13 thank you very much. I can reproduce this locally in a Docker container. I will take a look and see how far I get.
Hey @pmatos, I just wanted to check if you'd had a chance to look into this yet.
Sorry @elfprince13, I dropped the ball on this one. Let me get back to you in a few hours.
@elfprince13 I haven't reached any conclusion, but here are some findings that might prove useful if you need a workaround.
I have a few theories, and I will keep looking at this. In the meantime, I am also bisecting gcc to find the point at which it started compiling Racket correctly.
Further comments on this:
Because this is a change in a default parameter, I confirmed it by using the new parameter values. COMPILES RACKET
Default value of BREAKS
Default value of gcc 7.4.0 for This is a bug in gcc's inliner, not in Racket. I will keep this open until I file a bug on the gcc side, but first I need to make sure it isn't already fixed on gcc's master branch.
I also meant to say that the gcc change was not a fix. Increasing the inlining requirements meant that, on gcc 8, the inliner no longer inlined the case that breaks, which hid the bug revealed by gcc 7. You can see this by setting Since this is an inliner bug, you can also compile Racket flawlessly by using
It sounds like there's still a possibility that the Racket source does something unspecified, and that inlining exposes it enough for the compiler to take advantage of the lack of specification.
Yeah, I'm curious whether we can make gcc log the inlining decisions it made (using, e.g.
That's right. If there's undefined behaviour in Racket, gcc could be taking advantage of it. I ruled that out because ubsan does not report any runtime errors. Thinking about it again, ubsan might not be complete; I'm not sure about that.
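To make the ubsan caveat concrete: one category of undefined behaviour that GCC's `-fsanitize=undefined` does not check is a strict-aliasing violation, and such violations often only change a program's behaviour once inlining lets the optimizer see both accesses together. The sketch below is a generic illustration, not a claim about Racket's source; `bad` and `float_bits` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "expects 32-bit float");

/* Undefined behaviour, shown only in this comment:
 *
 *     int bad(int *i, float *f) { *i = 1; *f = 0.0f; return *i; }
 *
 * With -fstrict-aliasing (enabled by default at -O2), GCC may assume `i`
 * and `f` never point to the same storage and return 1 without reloading
 * `*i`. -fsanitize=undefined does not report this kind of violation. */

/* Well-defined way to read the bits of a float: copy the object
   representation with memcpy instead of casting the pointer. */
static uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

If aliasing were the suspect, rebuilding with `-fno-strict-aliasing` and checking whether the failure disappears would be one quick, non-conclusive test.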
I have a few scripts to do that semi-automatically that I can adjust for this use case. I will get back to this later on.
GCC head is working, so either the defaults changed between 8.3.0 and now, or a fix went in. I will have to bisect.
... or, of course, there is undefined behaviour in Racket and the way gcc handles it changed so that it now works. :) Given that ubsan has reported no undefined behaviour, I prefer to think that's not the case.
Forgot to mention: GCC head (e2346a33b) is not working after all; I simply made a mistake on the command line when testing. I am now looking for the spot where the inliner fails.
I'm trying to debug what I think is an unrelated issue, and Racket 7.4 is doing something naughty with memory that makes gdb and valgrind give up before I get to the bug I want to be looking at. Strangely, it appears to run fine when not hooked up to a debugging tool, so I don't know if that means it's just catching its own segfault and moving on, or what. The only thing recognizable in the stack trace from valgrind is scheme_intern_exact_symbol. If it's helpful, system specs are included: