__cxa_guard_*() functions non-compliant on ARM and big endian platforms #9

Closed
mdempsky opened this Issue · 10 comments

4 participants

@mdempsky
Collaborator

Section 2.8 of the Itanium C++ ABI says "The size of the guard variable is 64 bits. The first byte (i.e. the byte at the address of the full variable) shall contain the value 0 prior to initialization of the associated variable, and 1 after initialization is complete."

On non-ARM platforms, libcxxrt uses the uppermost byte of the 64-bit guard value to indicate whether initialization is complete. But on big endian platforms, the uppermost byte is the last byte of the guard variable.

Also, for ARM platforms, section 3.2.3 of the ARM C++ ABI says that the least significant bit of the 32-bit guard variable should be used to indicate whether the guard has been successfully initialized. However, libcxxrt compares the full 32 bits for equality against 0x80000000 to indicate successful initialization.
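To make the difference between the two ARM checks concrete, here is a minimal sketch. The function names are hypothetical and only illustrate the two tests described above; this is not libcxxrt's code.

```cpp
#include <cassert>
#include <cstdint>

// Check in the spirit of the ARM C++ ABI wording quoted above:
// only the least significant bit of the 32-bit guard matters.
inline bool initialised_by_low_bit(std::uint32_t guard) {
    return (guard & 1u) != 0;
}

// Check as described for libcxxrt in this report:
// the whole word must equal 0x80000000.
inline bool initialised_by_full_compare(std::uint32_t guard) {
    return guard == 0x80000000u;
}
```

A guard value of 1, which compiler-emitted code following the ARM ABI would treat as initialized, passes the first test and fails the second.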

(Tangentially, using (1<<32) on a 32-bit platform triggers undefined behavior according to the C++ standard, because 1<<32 overflows a signed integer. See C++11 section 5.8 paragraph 2: "The value of E1 << E2 is E1 left-shifted E2 bit positions; [...] Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^{E2} is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.".)
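One way to avoid the signed-shift problem, reading the shift as 1<<31 per the correction later in this thread, is to shift an unsigned operand instead. A small sketch with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

// Shifting an unsigned 32-bit one into the top bit is well defined;
// the literal 1 is an int, so 1 << 31 would overflow a 32-bit signed int.
inline std::uint32_t top_bit32() {
    return std::uint32_t(1) << 31;
}

// Same idea for a 64-bit guard: cast before shifting.
inline std::uint64_t top_bit64() {
    return std::uint64_t(1) << 63;
}
```

This is the same pattern as the `((guard_t)1) << 63` casts that appear in the patch quoted further down.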

@davidchisnall
Collaborator
@mdempsky
Collaborator

> I believe this only matters when using two C++ runtime libraries in the same program,

No, the compiler is allowed to emit checks to see if the first byte is set, and if so, it can skip calling __cxa_guard_acquire(), and both GCC 4.2 and Clang 3.3 do this. (E.g., see lines 1142 through 1171 of http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/ItaniumCXXABI.cpp?revision=182870&view=markup)

If __cxa_guard_acquire() uses the first byte as part of its lock, then there's a risk that racing threads will misinterpret that lock byte as indicating initialization has completed when it hasn't.
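The fast path can be modelled with a toy example. Everything here is a hypothetical stand-in (the real check is emitted inline by the compiler and the real runtime entry point is `__cxa_guard_acquire()`); the sketch only shows that a nonzero first byte makes the caller skip the runtime call entirely.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Toy stand-in for the runtime's acquire function (hypothetical name).
// It only counts calls so the fast path can be observed.
static int acquire_calls = 0;
static int toy_guard_acquire(std::uint64_t *) {
    ++acquire_calls;
    return 1;
}

// Roughly the shape of the inline check a compiler may emit:
// read the first byte of the guard and call the runtime only if it is 0.
static bool needs_runtime_call(const std::uint64_t *guard) {
    unsigned char first;
    std::memcpy(&first, guard, 1);
    return first == 0;
}

static void enter_static_init(std::uint64_t *guard) {
    if (needs_runtime_call(guard)) {
        toy_guard_acquire(guard);
    }
}
```

If the runtime stored its lock bit in that first byte, a second thread running `enter_static_init` while the first thread is still initializing would see a nonzero byte and proceed as if the object were ready.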

> I think you mean 1<<31 here.

Doh, yes, I did. Sorry about that! :)

@mdempsky
Collaborator

Hmm. The code here was wrong in several ways. I believe the new version should be correct.

Just reviewed your ARM changes; looks good to me!

@davidchisnall
Collaborator
@mdempsky
Collaborator

[Wow, github loves mangling email comments, and I can't seem to repair it even with editing... still can't get this to look okay, but it's readable now at least.]

On Mon, Jun 3, 2013 at 10:04 AM, davidchisnall notifications@github.com wrote:

> Ah, good point. I think the ARM version is actually cleaner than the
> not-ARM version, and I've appended a patch that should unify the two
> implementations. I'd appreciate it if you and Chris could review it. The
> detection of endian is a bit ugly, but GCC only started defining endian
> macros consistently around 4.5-4.6, and we need to be able to build with
> GCC 4.2.1.

Argh, annoying. There really doesn't seem to be a good portable solution
to this, so I'm happy with whatever you use, and in OpenBSD we can patch it
to use our appropriate endian detection.

POSIX Issue 8 is going to add endian.h (
http://austingroupbugs.net/view.php?id=162#c665), so something like

static const guard_t INITIALISED = le64toh(1);

would work without endian-specific checks, but I don't think anybody's
providing endian.h yet. (It's currently provided under a different header
name on most systems, and OpenBSD uses letoh64() instead of le64toh() at the
moment... sigh.)

Something like:

static guard_t initval() {
    guard_t r = 0;
    *(char *)&r = 1;
    return r;
}

static const guard_t INITIALISED = initval();

is actually portable too, and clang++ can optimize this, but g++ 4.2 is
stuck doing a run-time global initializer.
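As a sanity check on what such a run-time initializer produces, here is a sketch of the same idea using `memcpy` instead of a pointer cast. The helper names are hypothetical, and `std::uint64_t` stands in for the 64-bit `guard_t`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Build a 64-bit value whose first byte in memory is 1 and whose
// remaining bytes are 0, without assuming the host byte order.
inline std::uint64_t guard_with_first_byte_set() {
    std::uint64_t g = 0;
    unsigned char one = 1;
    std::memcpy(&g, &one, 1);
    return g;
}

// Read back the byte at the guard's own address.
inline unsigned char first_byte(const std::uint64_t &guard) {
    unsigned char b;
    std::memcpy(&b, &guard, 1);
    return b;
}
```

On a little-endian host the resulting integer is 1; on a big-endian host it is 1 << 56. In both cases the first byte reads back as 1.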

Something not-100% kosher (but supported by G++ and Clang) would be:

static const union { char bytes[8]; guard_t g; } INITIALISED_BYTES = {{1}};
#define INITIALISED INITIALISED_BYTES.g

but G++ can't inline that either.

Blargh!

> (If anyone cares enough, then with recent compilers we can weaken the
> memory barriers slightly, but I think the overhead of the full barrier is
> likely to be lost in the noise most of the time).

Noted, though at the moment my focus is on correctness and simplicity
rather than performance, so as long as x86 performance is fast enough for
you, I'm not worried about onerous barriers.

+// x86 and ARM are the most common little-endian CPUs, so let's have a
+// special case for them (ARM is already special cased). Assume everything
+// else is big endian.
+# elif defined(__x86_64) || defined(__i386)
+# define __LITTLE_ENDIAN__
+# endif

FWIW, OpenBSD's little endian platforms are alpha, amd64, arm, i386, ia64,
mips64el, sh, and vax.

Also tangentially, I think a lot of our architectures can't do 64-bit CAS,
so __sync_*_compare_and_swap() is going to fail to compile on those
architectures. But no need to worry about that just yet. :)

+# if defined(__LITTLE_ENDIAN__)
+static const guard_t LOCKED = ((guard_t)1) << 63;
+static const guard_t INITIALISED = 1;
+# else
+static const guard_t LOCKED = 1;
+static const guard_t INITIALISED = ((guard_t)1) << 63;
+# endif

Itanium ABI says the first byte should be set to 1, so in the second case
you want to set INITIALISED to ((guard_t)1) << 56.
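To see why 56 rather than 63 is the right shift, it helps to write out the bytes as a big-endian machine would store them. This is a sketch with a hypothetical helper; it simulates the big-endian layout in software, so it gives the same answer on any host.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Lay out a 64-bit value the way a big-endian machine stores it:
// most significant byte at the lowest address (index 0).
inline std::array<unsigned char, 8> big_endian_bytes(std::uint64_t v) {
    std::array<unsigned char, 8> out{};
    for (std::size_t i = 0; i < 8; ++i) {
        out[i] = static_cast<unsigned char>(v >> (8 * (7 - i)));
    }
    return out;
}
```

With this layout, `1 << 56` puts the value 1 in byte 0, while `1 << 63` puts 0x80 there, which is nonzero but not the 1 that the Itanium ABI text asks for.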

-extern "C" void __cxa_guard_abort(uint32_t *guard_object)
+extern "C" void __cxa_guard_abort(guard_t *guard_object)

Perhaps worth making this "volatile guard_t *guard_object"? (Also, for
__cxa_guard_release.)

@davidchisnall
Collaborator
@jsonn
Collaborator
@cbergstrom
Owner

Joerg - When would atomic ops not be supported? At some point it's time to let some things go :P

@jsonn
Collaborator

The problem is not atomic ops in general, but 64-bit atomic ops. Those are not available on many (older) 32-bit architectures.
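For illustration only, here is how one might probe for this with C++11 `std::atomic`, which is a swapped-in technique: the code under discussion uses the `__sync_*` builtins and has to build with GCC 4.2.1, where `<atomic>` is not available. The function names are hypothetical. The second helper shows a compare-and-swap confined to a 32-bit word, which is the kind of operation that remains available where 64-bit CAS is not.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Ask the implementation whether 64-bit atomics are done without a lock.
// On targets lacking a 64-bit CAS instruction this is expected to be false.
inline bool has_lock_free_64bit_atomics() {
    std::atomic<std::uint64_t> probe(0);
    return probe.is_lock_free();
}

// Try to take a lock held in a 32-bit word: succeeds only if the word was 0.
inline bool try_lock32(std::atomic<std::uint32_t> &word, std::uint32_t locked) {
    std::uint32_t expected = 0;
    return word.compare_exchange_strong(expected, locked);
}
```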
