Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alignment faults running tests on ARM #33

Closed
tomhughes opened this issue Nov 10, 2015 · 12 comments
Closed

Alignment faults running tests on ARM #33

tomhughes opened this issue Nov 10, 2015 · 12 comments
Assignees

Comments

@tomhughes
Copy link
Contributor

I'm seeing a SIGBUS fault running the tests on an ARM system. The backtrace is:

#0  Catch::ExpressionLhs<float const&>::captureExpression<(Catch::Internal::Operator)0, float> (this=0xbeffe2cc, this@entry=0xbeffe388, rhs=@0xbeffe2c4: 17.3400002) at /usr/include/catch/internal/catch_expression_lhs.hpp:90
#1  0x0006d0ec in Catch::ExpressionLhs<float const&>::operator==<float> (rhs=@0xbeffe2bc: 7.73589176e+34, this=0xbeffe2c4) at /usr/include/catch/internal/catch_expression_lhs.hpp:35
#2  ____C_A_T_C_H____T_E_S_T____4 () at test/t/repeated_packed_float/test_cases.cpp:23
#3  0x00042458 in Catch::FreeFunctionTestCase::invoke (this=<optimized out>) at /usr/include/catch/internal/catch_test_case_registry_impl.hpp:116
#4  Catch::TestCase::invoke (this=<optimized out>) at /usr/include/catch/internal/catch_test_case_info.hpp:169
#5  Catch::RunContext::invokeActiveTestCase (this=0xbeffed24) at /usr/include/catch/internal/catch_runner_impl.hpp:298
#6  Catch::RunContext::runCurrentTest (redirectedCerr="", redirectedCout="", this=0xbeffed24) at /usr/include/catch/internal/catch_runner_impl.hpp:270
#7  Catch::RunContext::runTest (testCase=..., this=0xbeffed24) at /usr/include/catch/internal/catch_runner_impl.hpp:108
#8  Catch::Runner::runTests (this=0x25be4 <Catch::FatalConditionHandler::handleSignal(int)>, this@entry=0xbefff178) at /usr/include/catch/catch_runner.hpp:59
#9  0x00042ef8 in Catch::Session::run (this=this@entry=0xbefff344) at /usr/include/catch/catch_runner.hpp:186
#10 0x00015324 in Catch::Session::run (argv=<optimized out>, argc=<optimized out>, this=0xbefff344) at /usr/include/catch/catch_runner.hpp:166
#11 main (argc=<optimized out>, argv=<optimized out>) at /usr/include/catch/internal/catch_default_main.hpp:15

The faulting instruction is a vldr instruction loading the float referenced by *it_pair.first when loading the float from repeated_packed_float/data-one because the address returned is only two byte aligned, but NEON instructions like that require four byte alignment for floats.

I don't think there's an easy fix here - obviously having the actual reader do anything to guarantee alignment is a potentially large performance hit. The alternative is to have the tests copy the value out to an aligned location before testing it.

I guess the question is, what is the pbf_reader supposed to be guaranteeing about the iterators it returns for packed values?

@joto
Copy link
Collaborator

joto commented Nov 10, 2015

Strange that this error didn't show up on Debian's build farm which also includes ARM machines.

The only idea I have how to fix this is to use our own iterator implementation instead of a raw pointer. Not sure how much overhead this would be. Maybe something we can compile only on architectures that have this alignment problem? I need to think about this.

@tomhughes
Copy link
Contributor Author

Yeah it shows up on the Fedora build farm, which is real hardware, but not in a qemu VM. I managed to debug it because there's a real hardware box I can get on to with my Fedora credentials.

To start with it will only happen if you target an instruction set with NEON support, and my assumption is that qemu is not properly generating alignment faults for those instructions, which are supposed to fault without giving the OS a chance to fix it up as linux does with the base instruction set on ARM.

@tomhughes
Copy link
Contributor Author

Fedora is building with -march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=hard but I suspect it is the fpu option that's important here in that it enables VFP/NEON instructions.

I'm not entirely clear from https://wiki.debian.org/ArmHardFloatPort#GCC_floating-point_options what options Debian uses?

joto added a commit that referenced this issue Nov 11, 2015
There are two implementations for the packed_fixed() function. One simply
returns a pair of pointers, the other a pair of iterators we define specially
for this case. The simple pointer-based implementation is presumably faster,
but it only works on little endian machines and if the CPU can work with the
possibly unaligned data.

To make this actually work on architectures that care about alignment we
still need to define PROTOZERO_USE_BARE_POINTER_FOR_PACKED_FIXED somewhere.

See #33.
@joto
Copy link
Collaborator

joto commented Nov 11, 2015

Are we sure this only happens for floats? Or is this simply the first test that fails?

I have expanded the float test case so it doesn't just test the non-aligned case by accident, but on purpose. Just to make sure... :-) We might need to expand the test cases for all types in this way.

We already have our own iterator implementation for the packed repeated data. But up to know it is only used on big endian architectures. I have changed the code around a little to make it easier to use in other cases, too. Normally it is not used for performance reason, but we could use it in this case.

@tomhughes: could you try this out. If you comment out these lines
https://github.com/mapbox/protozero/blob/master/include/protozero/pbf_reader.hpp#L99-L101 the iterator will always be used. Does this fix the problem?

If this fixes the problem we'd still have to figure out how to switch to the right code automatically depending on build options or whatever. But lets figure out the fix first and solve that when we come to it.

@tomhughes
Copy link
Contributor Author

I imagine it will happen for any floating type, at least in principle.

The history of alignment on ARM is complicated to say the least. The originals ARMs only had byte and word loads and didn't fault on unaligned word loads but instead gave you "interesting" results where the data was rotated in a particular way. Programmers of the crazier kind were known to rely on this ;-)

Later they added half word loads and also started offering the option (by a processor configuration register) of allowing unaligned loads to work and/or raising a fault for them - some links with more details:

Then when they added the VFP/NEON instructions they made them raise faults even if the processor was otherwise configured to allow unaligned access, much like many SIMD instructions on x86 do:

So broadly speaking integers are likely to be fine unless you deliberately configure linux to fault them by writing to /proc/cpu/alignment while floating point is likely to fault.

I'll have a go at your suggestion and let you know...

@tomhughes
Copy link
Contributor Author

Yes, the tests all pass fine when using the custom iterator.

joto added a commit that referenced this issue Nov 11, 2015
This isn't optimal, but should fix the problem with unaligned access for now.
Maybe we can find a better macro to check? See #33.
@joto
Copy link
Collaborator

joto commented Nov 11, 2015

Just to be on the safe side, I have added tests for fixed integers and doubles, too.

I have added an #ifdef check for __arm__ and disabled the bare pointer access in this case. This should fix the problem, but will always use the slow code on any ARM machine even if it weren't a problem in some cases. I couldn't find any other way to detect from a compiler macro what compiler options were set or something like that. Maybe we can still find a better way.

@tomhughes
Copy link
Contributor Author

Confirmed that master now passes.

I think the rule is that if __VFP_FP__ is defined then unaligned floating point access may fault.

@tomhughes
Copy link
Contributor Author

If I'm reading https://bugzilla.mozilla.org/show_bug.cgi?id=549296#c21 right then in principle vldr can either be executed by the VFP unit (the traditional FP stuff) which can't do unaligned access or by the NEON unit (the SIMD stuff) which can. Except that NEON can't do single precision float so that is always VFP.

@joto
Copy link
Collaborator

joto commented Nov 11, 2015

@tomhughes: can you test whether a check for __VFP_FP__ works in your case?

It seems this is all impossible to test without access to real hardware of lots of different kinds. I guess we just have to implement some ifdef check that works for all cases we know about and then wait for people to complain and fix that ifdef. :-(

@tomhughes
Copy link
Contributor Author

Well I know that is defined in my case, so obviously it will work. The question is really whether there's any system where it isn't defined but alignment is enforced, or where it is defined but alignment isn't enforced :-(

joto added a commit that referenced this issue Nov 20, 2015
@joto
Copy link
Collaborator

joto commented Nov 20, 2015

Okay, so for the moment I am using the slower code on ARM in all cases to be on the safe side. maybe somebody who knows more about these things or can test on different platforms can help with this in the future. But without access to test systems this is all a guessing game which macros mean what.

I have added some info in the README with a link to this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants