Alignment faults running tests on ARM #33

tomhughes · 2015-11-10T20:44:41Z

I'm seeing a SIGBUS fault running the tests on an ARM system. The backtrace is:

#0  Catch::ExpressionLhs<float const&>::captureExpression<(Catch::Internal::Operator)0, float> (this=0xbeffe2cc, this@entry=0xbeffe388, rhs=@0xbeffe2c4: 17.3400002) at /usr/include/catch/internal/catch_expression_lhs.hpp:90
#1  0x0006d0ec in Catch::ExpressionLhs<float const&>::operator==<float> (rhs=@0xbeffe2bc: 7.73589176e+34, this=0xbeffe2c4) at /usr/include/catch/internal/catch_expression_lhs.hpp:35
#2  ____C_A_T_C_H____T_E_S_T____4 () at test/t/repeated_packed_float/test_cases.cpp:23
#3  0x00042458 in Catch::FreeFunctionTestCase::invoke (this=<optimized out>) at /usr/include/catch/internal/catch_test_case_registry_impl.hpp:116
#4  Catch::TestCase::invoke (this=<optimized out>) at /usr/include/catch/internal/catch_test_case_info.hpp:169
#5  Catch::RunContext::invokeActiveTestCase (this=0xbeffed24) at /usr/include/catch/internal/catch_runner_impl.hpp:298
#6  Catch::RunContext::runCurrentTest (redirectedCerr="", redirectedCout="", this=0xbeffed24) at /usr/include/catch/internal/catch_runner_impl.hpp:270
#7  Catch::RunContext::runTest (testCase=..., this=0xbeffed24) at /usr/include/catch/internal/catch_runner_impl.hpp:108
#8  Catch::Runner::runTests (this=0x25be4 <Catch::FatalConditionHandler::handleSignal(int)>, this@entry=0xbefff178) at /usr/include/catch/catch_runner.hpp:59
#9  0x00042ef8 in Catch::Session::run (this=this@entry=0xbefff344) at /usr/include/catch/catch_runner.hpp:186
#10 0x00015324 in Catch::Session::run (argv=<optimized out>, argc=<optimized out>, this=0xbefff344) at /usr/include/catch/catch_runner.hpp:166
#11 main (argc=<optimized out>, argv=<optimized out>) at /usr/include/catch/internal/catch_default_main.hpp:15

The faulting instruction is a vldr instruction loading the float referenced by *it_pair.first when loading the float from repeated_packed_float/data-one because the address returned is only two byte aligned, but NEON instructions like that require four byte alignment for floats.

I don't think there's an easy fix here - obviously having the actual reader do anything to guarantee alignment is a potentially large performance hit. The alternative is to have the tests copy the value out to an aligned location before testing it.

I guess the question is, what is the pbf_reader supposed to be guaranteeing about the iterators it returns for packed values?

The text was updated successfully, but these errors were encountered:

joto · 2015-11-10T22:57:40Z

Strange that this error didn't show up on Debian's build farm which also includes ARM machines.

The only idea I have how to fix this is to use our own iterator implementation instead of a raw pointer. Not sure how much overhead this would be. Maybe something we can compile only on architectures that have this alignment problem? I need to think about this.

tomhughes · 2015-11-10T23:02:15Z

Yeah it shows up on the Fedora build farm, which is real hardware, but not in a qemu VM. I managed to debug it because there's a real hardware box I can get on to with my Fedora credentials.

To start with it will only happen if you target an instruction set with NEON support, and my assumption is that qemu is not properly generating alignment faults for those instructions, which are supposed to fault without giving the OS a chance to fix it up as linux does with the base instruction set on ARM.

tomhughes · 2015-11-10T23:08:12Z

Fedora is building with -march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=hard but I suspect it is the fpu option that's important here in that it enables VFP/NEON instructions.

I'm not entirely clear from https://wiki.debian.org/ArmHardFloatPort#GCC_floating-point_options what options Debian uses?

There are two implementations for the packed_fixed() function. One simply returns a pair of pointers, the other a pair of iterators we define specially for this case. The simple pointer-based implementation is presumably faster, but it only works on little endian machines and if the CPU can work with the possibly unaligned data. To make this actually work on architectures that care about alignment we still need to define PROTOZERO_USE_BARE_POINTER_FOR_PACKED_FIXED somewhere. See #33.

joto · 2015-11-11T10:57:29Z

Are we sure this only happens for floats? Or is this simply the first test that fails?

I have expanded the float test case so it doesn't just test the non-aligned case by accident, but on purpose. Just to make sure... :-) We might need to expand the test cases for all types in this way.

We already have our own iterator implementation for the packed repeated data. But up to know it is only used on big endian architectures. I have changed the code around a little to make it easier to use in other cases, too. Normally it is not used for performance reason, but we could use it in this case.

@tomhughes: could you try this out. If you comment out these lines
https://github.com/mapbox/protozero/blob/master/include/protozero/pbf_reader.hpp#L99-L101 the iterator will always be used. Does this fix the problem?

If this fixes the problem we'd still have to figure out how to switch to the right code automatically depending on build options or whatever. But lets figure out the fix first and solve that when we come to it.

tomhughes · 2015-11-11T11:06:17Z

I imagine it will happen for any floating type, at least in principle.

The history of alignment on ARM is complicated to say the least. The originals ARMs only had byte and word loads and didn't fault on unaligned word loads but instead gave you "interesting" results where the data was rotated in a particular way. Programmers of the crazier kind were known to rely on this ;-)

Later they added half word loads and also started offering the option (by a processor configuration register) of allowing unaligned loads to work and/or raising a fault for them - some links with more details:

Then when they added the VFP/NEON instructions they made them raise faults even if the processor was otherwise configured to allow unaligned access, much like many SIMD instructions on x86 do:

http://stackoverflow.com/questions/17184731/gcc-generated-assembly-for-unaligned-float-access-on-arm

So broadly speaking integers are likely to be fine unless you deliberately configure linux to fault them by writing to /proc/cpu/alignment while floating point is likely to fault.

I'll have a go at your suggestion and let you know...

tomhughes · 2015-11-11T11:52:57Z

Yes, the tests all pass fine when using the custom iterator.

This isn't optimal, but should fix the problem with unaligned access for now. Maybe we can find a better macro to check? See #33.

joto · 2015-11-11T14:06:12Z

Just to be on the safe side, I have added tests for fixed integers and doubles, too.

I have added an #ifdef check for __arm__ and disabled the bare pointer access in this case. This should fix the problem, but will always use the slow code on any ARM machine even if it weren't a problem in some cases. I couldn't find any other way to detect from a compiler macro what compiler options were set or something like that. Maybe we can still find a better way.

tomhughes · 2015-11-11T14:31:26Z

Confirmed that master now passes.

I think the rule is that if __VFP_FP__ is defined then unaligned floating point access may fault.

tomhughes · 2015-11-11T14:36:48Z

If I'm reading https://bugzilla.mozilla.org/show_bug.cgi?id=549296#c21 right then in principle vldr can either be executed by the VFP unit (the traditional FP stuff) which can't do unaligned access or by the NEON unit (the SIMD stuff) which can. Except that NEON can't do single precision float so that is always VFP.

joto · 2015-11-11T16:04:00Z

@tomhughes: can you test whether a check for __VFP_FP__ works in your case?

It seems this is all impossible to test without access to real hardware of lots of different kinds. I guess we just have to implement some ifdef check that works for all cases we know about and then wait for people to complain and fix that ifdef. :-(

tomhughes · 2015-11-11T16:07:03Z

Well I know that is defined in my case, so obviously it will work. The question is really whether there's any system where it isn't defined but alignment is enforced, or where it is defined but alignment isn't enforced :-(

See #33.

joto · 2015-11-20T11:06:05Z

Okay, so for the moment I am using the slower code on ARM in all cases to be on the safe side. maybe somebody who knows more about these things or can test on different platforms can help with this in the future. But without access to test systems this is all a guessing game which macros mean what.

I have added some info in the README with a link to this issue.

springmeyer assigned joto Nov 10, 2015

joto added a commit that referenced this issue Nov 11, 2015

Disable bare pointer optimization if __arm__ macro is set.

1c75abd

This isn't optimal, but should fix the problem with unaligned access for now. Maybe we can find a better macro to check? See #33.

joto added a commit that referenced this issue Nov 20, 2015

Add link to issue about alignment in README.

c666ec9

See #33.

joto closed this as completed Nov 20, 2015

martin-frbg mentioned this issue Jan 15, 2016

Clean the register on ARM VFP kernels OpenMathLib/OpenBLAS#740

Closed

estan mentioned this issue Mar 14, 2018

Segfaults in tests on armhf architecture Blosc/c-blosc#223

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Alignment faults running tests on ARM #33

Alignment faults running tests on ARM #33

tomhughes commented Nov 10, 2015

joto commented Nov 10, 2015

tomhughes commented Nov 10, 2015

tomhughes commented Nov 10, 2015

joto commented Nov 11, 2015

tomhughes commented Nov 11, 2015

tomhughes commented Nov 11, 2015

joto commented Nov 11, 2015

tomhughes commented Nov 11, 2015

tomhughes commented Nov 11, 2015

joto commented Nov 11, 2015

tomhughes commented Nov 11, 2015

joto commented Nov 20, 2015

Alignment faults running tests on ARM #33

Alignment faults running tests on ARM #33

Comments

tomhughes commented Nov 10, 2015

joto commented Nov 10, 2015

tomhughes commented Nov 10, 2015

tomhughes commented Nov 10, 2015

joto commented Nov 11, 2015

tomhughes commented Nov 11, 2015

tomhughes commented Nov 11, 2015

joto commented Nov 11, 2015

tomhughes commented Nov 11, 2015

tomhughes commented Nov 11, 2015

joto commented Nov 11, 2015

tomhughes commented Nov 11, 2015

joto commented Nov 20, 2015