Resolve random crashes when loading digraphs or semigroups #273
Comments
Could we try temporarily pulling ccache for a week or so and see if the crashes clear up? That would at least tell us whether ccache is the problem.
We could, but I am pretty confident this is the cause: I've now seen the following architectures being set by `-march=native`:
And this is probably not all.
The "alternative and more systematic fix" cannot work for this repository, because the systems on which we compile are not the same as the ones on which we run the tests. However, it would be fine for GAP, where we don't do this.
I believe I've fixed this in #278.
Seems to be fine now.
The ongoing theory proposed by @ChrisJefferson is that when we build things in one VM and cache them using `ccache`, and then use the cached binaries in another VM running on different hardware, the cached binaries may not work and may crash with illegal instructions. (See also the discussion in #217.)
This problem was also previously reported to ccache but has not yet been addressed there; see ccache/ccache#824.
I've tried to work around this in 59d58d8 by removing `-march=native` from the Semigroups and Digraphs build systems. But we are still seeing the random crashes. I've yet to check whether this is because our theory is wrong or because my patch was simply insufficient (for now I am more inclined to believe the latter).
An alternative and more systematic fix would be to adjust the `ccache` configuration, which allows specifying how changes in the compiler are detected: the default `compiler_check` setting is to use the `mtime` of the compiler (BTW I am actually surprised that this works across multiple VMs...?). Anyway, we could change that to always take the architecture set by `-march=native` into account...
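As a rough illustration of that last idea, here is a minimal Python sketch (not what any of the commits referenced here actually do). It assumes a GCC-compatible compiler that accepts `-march=native -Q --help=target`, and it relies on the `string:<value>` form of ccache's `compiler_check` setting, which can be set through the `CCACHE_COMPILERCHECK` environment variable. The helper names `resolved_march`, `compiler_check_value` and `detect_native_arch` are made up for this example.

```python
import re
import subprocess
from typing import Optional


def resolved_march(help_target_output: str) -> Optional[str]:
    """Return the value on the `-march=` line of `gcc -Q --help=target` output."""
    # Only spaces/tabs may separate the option from its value, so an empty
    # value on the -march= line never swallows the following line.
    match = re.search(r"^[ \t]*-march=[ \t]+(\S+)", help_target_output, re.MULTILINE)
    return match.group(1) if match else None


def compiler_check_value(arch: str) -> str:
    """Build a `string:` value for ccache's compiler_check setting."""
    # Assumption: the architecture name alone is a good enough fingerprint.
    # `-march=native` can also toggle individual -m flags not captured here.
    return f"string:march={arch}"


def detect_native_arch(compiler: str = "gcc") -> Optional[str]:
    """Ask the compiler what `-march=native` resolves to on this machine."""
    try:
        result = subprocess.run(
            [compiler, "-march=native", "-Q", "--help=target"],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # Compiler not installed or not executable.
        return None
    return resolved_march(result.stdout)


if __name__ == "__main__":
    arch = detect_native_arch()
    if arch is not None:
        # Print a line that a CI step could eval before invoking ccache.
        print(f"export CCACHE_COMPILERCHECK='{compiler_check_value(arch)}'")
```

One caveat with this approach: a `string:` value replaces the default `mtime` check rather than adding to it, so a compiler upgrade on the same hardware would no longer invalidate the cache unless the compiler version is folded into the string as well.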