
[C++] Undefined behaviour in generated code #506

Closed
rigtorp opened this issue Aug 30, 2017 · 21 comments

@rigtorp
Contributor

rigtorp commented Aug 30, 2017

The C++ code generator typically generates code like this for accessing integer fields:

    std::int32_t securityID(void) const
    {
        return SBE_LITTLE_ENDIAN_ENCODE_32(*((std::int32_t *)(m_buffer + m_offset + 24)));
    }

    NewOrderSingle &securityID(const std::int32_t value)
    {
        *((std::int32_t *)(m_buffer + m_offset + 24)) = SBE_LITTLE_ENDIAN_ENCODE_32(value);
        return *this;
    }

The compiler assumes that a pointer to an int32_t has the correct alignment. With this generated code the alignment requirement might not be fulfilled. On amd64 this is fine as long as the compiler doesn't try to use SSE or AVX instructions, but it's not safe in general to assume it will work.

The solution is to use memcpy like this (https://chromium.googlesource.com/chromium/src.git/+/master/base/bit_cast.h):

template <class Dest, class Source>
inline Dest bit_cast(const Source& source) {
  static_assert(sizeof(Dest) == sizeof(Source),
                "bit_cast requires source and destination to be the same size");
  static_assert(base::is_trivially_copyable<Dest>::value,
                "bit_cast requires the destination type to be copyable");
  static_assert(base::is_trivially_copyable<Source>::value,
                "bit_cast requires the source type to be copyable");

  Dest dest;
  memcpy(&dest, &source, sizeof(dest));
  return dest;
}

This should optimize to a single load on amd64, guaranteed not to use instructions requiring alignment.
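As a hedged illustration (not from the SBE codebase): the bit_cast pattern above, with std::is_trivially_copyable standing in for Chromium's base::is_trivially_copyable, can be exercised to reinterpret a float's bit pattern without any pointer casts:

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Sketch of the Chromium-style bit_cast, using the standard library
// trait instead of base::is_trivially_copyable.
template <class Dest, class Source>
inline Dest bit_cast(const Source& source) {
  static_assert(sizeof(Dest) == sizeof(Source),
                "bit_cast requires source and destination to be the same size");
  static_assert(std::is_trivially_copyable<Dest>::value,
                "bit_cast requires the destination type to be copyable");
  static_assert(std::is_trivially_copyable<Source>::value,
                "bit_cast requires the source type to be copyable");
  Dest dest;
  std::memcpy(&dest, &source, sizeof(dest));
  return dest;
}
```

For example, `bit_cast<std::uint32_t>(1.0f)` yields the IEEE 754 bit pattern 0x3F800000 without invoking undefined behaviour.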

@mjpt777
Contributor

mjpt777 commented Aug 30, 2017

The schema, as per the SBE specification, can use the offset attribute for aligning fields when targeting platforms which do not support unaligned access.
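For illustration, a hypothetical schema fragment (field names and ids invented, not from any real schema) using the offset attribute to place securityID on a 4-byte boundary might look like:

```xml
<sbe:message name="NewOrderSingle" id="1">
    <!-- offsets chosen so each field starts on a multiple of its own size -->
    <field name="clOrdId"    id="11" type="uint64" offset="0"/>
    <field name="securityID" id="48" type="int32"  offset="24"/>
</sbe:message>
```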

@rigtorp
Contributor Author

rigtorp commented Aug 30, 2017

Currently our unit tests do not pass due to this issue and we had to disable the alignment check in UBSan. By using the memcpy trick we can inform the compiler that the load is unaligned and make sure it will not use instructions that require alignment. On amd64 the resulting binary should be the same, except the code is guaranteed not to break if some new optimization pass is introduced that would generate SSE or AVX instructions for this code.

@mjpt777
Contributor

mjpt777 commented Aug 30, 2017

Have you used the offset attribute to align your fields?

@rigtorp
Contributor Author

rigtorp commented Aug 30, 2017

I don't control the specification, I'm using what is provided by the exchange etc. But that is not the problem.

The problem is that the generated code is relying on undefined behaviour. The compiler will generate correct instructions on amd64 most of the time. By using memcpy we can make sure it will generate correct instructions all the time.

@tmontgomery
Contributor

Would love to see a PR that does this for gets and sets. But it would need to be optional so that the original intent (of using offset for guaranteeing aligned fields) would be preserved.

I'll put it on my list to get to. But may be a while. Lots of stuff ahead of it.

@myrgy

myrgy commented Aug 30, 2017

I think in this case it could be

    std::int32_t securityID(void) const
    {
        std::int32_t val;
        memcpy(&val, m_buffer + m_offset + 24, sizeof(val));
        return SBE_LITTLE_ENDIAN_ENCODE_32(val);
    }

    void setSecurityID(int32_t val)
    {
        val = SBE_LITTLE_ENDIAN_ENCODE_32(val);
        memcpy(m_buffer + m_offset + 24,  &val, sizeof(val)); 
    }

btw, what is the reason for using (void) parameter lists in C++ code? It's old C style; for a modern compiler it's completely redundant.

@tmontgomery
Contributor

Haven't exterminated all the (void) yet. And we are old C hackers anyway.

@rigtorp
Contributor Author

rigtorp commented Sep 1, 2017

I think @myrgy's code will generate optimal code on amd64 for any alignment. On platforms without unaligned loads it would probably generate correct, but less efficient, code. If loads are guaranteed to be aligned, the current code is better.

What about adding a template like this to the sbe.h support header:

#include <cassert>
#include <cstdint>
#include <cstring>

template <typename T>
// Add enable_if to only work with integer types
// Maybe add force inline attributes for supported compilers
inline T load(const char *buffer) {
#ifdef SBE_ASSUME_ALIGNED
  assert((reinterpret_cast<std::uintptr_t>(buffer) & (alignof(T) - 1)) == 0 &&
         "loads need to be aligned");
  return *reinterpret_cast<const T*>(buffer);
#else
  T val;
  std::memcpy(&val, buffer, sizeof(val));
  return val;
#endif
}
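A store counterpart along the same lines could be sketched as follows (SBE_ASSUME_ALIGNED is the same hypothetical macro as above; this is an illustration, not existing SBE code):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Sketch of a store<T> mirroring the proposed load<T> helper: plain
// aligned store when alignment is assumed, memcpy otherwise.
template <typename T>
inline void store(char *buffer, T value) {
#ifdef SBE_ASSUME_ALIGNED
  assert((reinterpret_cast<std::uintptr_t>(buffer) & (alignof(T) - 1)) == 0 &&
         "stores need to be aligned");
  *reinterpret_cast<T*>(buffer) = value;
#else
  std::memcpy(buffer, &value, sizeof(value));
#endif
}
```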

@rigtorp
Contributor Author

rigtorp commented Sep 1, 2017

Or maybe we can let people on these esoteric platforms deal with this problem? AMD64 and ARM have efficient loads for unaligned integers, and the memcpy version should generate the correct instructions.

@mjpt777
Contributor

mjpt777 commented Sep 1, 2017

memcpy will have to at least check for alignment so it will always be slower to some extent. It is a difficult choice of how far to go with support with the costs it will incur in development and at runtime.

@myrgy

myrgy commented Sep 1, 2017

@mjpt777 , please take a look at compilation results:
https://godbolt.org/g/XWCYBo

gcc and clang generate almost identical code for memcpy and reinterpret_cast.
Regarding memcpy, take into account that the compiler is aware of the platform's capabilities and can optimize accordingly. In addition, gcc has its own __builtin_memcpy that could be used.

@rigtorp
Contributor Author

rigtorp commented Sep 1, 2017

There is zero cost to using this correct implementation. It also has the benefit of working on all CPUs without crashing. Given that the overhead is zero, I don't see why we should use an implementation which allows the compiler to generate code that will crash my system.

Minimal example:

#include <stdlib.h>
#include <memory.h>
typedef unsigned int u32;

u32 reada32(const void* ptr) { return *(const u32*) ptr; }

u32 readu32(const void* ptr) 
{ 
    u32 result; 
    memcpy(&result, ptr, 4); 
    return result; 
}

On amd64 (-std=c++14 -O3):

reada32(void const*):
        mov     eax, DWORD PTR [rdi]
        ret
readu32(void const*):
        mov     eax, DWORD PTR [rdi]
        ret

On arm64 (-std=c++14 -O3):

reada32(void const*):
        ldr     w0, [x0]
        ret
readu32(void const*):
        ldr     w0, [x0]
        ret

On arm (-std=c++14 -O3 -mcpu=cortex-m7 -march=armv6):

reada32(void const*):
        ldr     r0, [r0]
        bx      lr
readu32(void const*):
        ldr     r0, [r0]  @ unaligned
        bx      lr

@mjpt777
Contributor

mjpt777 commented Sep 1, 2017

Thanks, this is interesting to know. BTW, how does memcpy know the address in the buffer is aligned?

@mjpt777
Contributor

mjpt777 commented Sep 1, 2017

I can see, looking at your example, that it is easy for the compiler to determine that what it is copying is aligned. Have you tried it with the address coming from an opaque buffer, as is likely to happen in a real application?

@tmontgomery
Contributor

I do agree that this will mostly be optimized and thus have no cost. However, implying this would always be optimized on all systems, all CPUs, and with all compilers is a strong statement, and one I am pretty sure is false. When adding in that the buffer being used may or may not be aligned, it is guaranteed not to always be performant.

SBE is a performance oriented tool. That means some requirements get pushed back onto the user to make the correct choice.

I am not convinced that memcpy should be the default, given the prevalence of CPUs that can work on unaligned data (x86, ARMv6 and above, etc.) and given that the schema can be aligned when needed. The memcpy method, on ARMv6 and ARMv7, does sometimes defeat some optimizations and add cautious instructions. See this for example.

Although, I think a memcpy version should be enable-able via a param to SbeTool for generation. Having a C++ compile time macro isn't needed for this.

@myrgy

myrgy commented Sep 1, 2017

Yes, it can work, but if the compiler decides to vectorize operations on such data, it will crash the application, because vector registers don't support unaligned access. memcpy informs the compiler that the access might be unaligned, and so prevents such unsafe operations.

Here is an example of a crashing app: https://godbolt.org/g/TbVKee
With -O3, gcc performs automatic vectorization, and it will cause a crash even on platforms which allow unaligned access.

#include <cstdint>
#include <cstring>
#include <iostream>
#include <algorithm>

#define SBE_LITTLE_ENDIAN_ENCODE_16(v) (v)
#define SBE_LITTLE_ENDIAN_ENCODE_32(v) (v)
#define SBE_LITTLE_ENDIAN_ENCODE_64(v) (v)

#define SIZE 32

struct Test{
    char m_buffer [SIZE];
    int m_offset = 5;
/*
    std::int32_t securityID(int i) const
    {
        std::int32_t val;
        memcpy(&val, m_buffer + m_offset + 24 + sizeof(std::int32_t) * i, sizeof(std::int32_t));
        return SBE_LITTLE_ENDIAN_ENCODE_32(val);
    }
//*/
//*
    std::int32_t securityID(int i) const
    {
        return SBE_LITTLE_ENDIAN_ENCODE_32(*((std::int32_t *)(m_buffer + m_offset + 24 + sizeof(std::int32_t) * i)));
    }
    //*/
};

int main() {
    char c;
    Test t1;
    Test t2;
   // std::generate(t1.m_buffer, t1.m_buffer + SIZE, std::rand);
    //std::generate(t2.m_buffer, t2.m_buffer + SIZE, std::rand); 

    std::int32_t r[SIZE];
    for(int i= 0; i < SIZE; i++) {
       r[i]   = t1.securityID(i)   + t2.securityID(i);
    }

    for (int i = 0; i < SIZE; ++i) {
        std::cout << r[i];
    }
    return 0;
}

@rigtorp
Contributor Author

rigtorp commented Sep 2, 2017

@mjpt777 The whole point of introducing the memcpy is to tell the compiler that this load is unaligned. Per the C standard, memcpy has no alignment requirement, and thus the compiler needs to ensure the load is done in a way that assumes no alignment. So memcpy specifically doesn't know about alignment at compile time; its implementation may check alignment at runtime to copy large blocks using vector instructions, but that is irrelevant here.
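To make that concrete, here is a minimal sketch (illustrative, not generated code): the compiler sees only a byte pointer, so it must emit a load that works at any offset:

```cpp
#include <cstdint>
#include <cstring>

// memcpy has no alignment requirement, so this read is well-defined
// even when p is not 4-byte aligned.
inline std::uint32_t read_u32(const char *p) {
  std::uint32_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}
```

Writing a value at an odd offset and reading it back works on any platform.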

@tmontgomery I'm saying that using memcpy will cause correct code (sans compiler bugs) to be generated on all platforms. Currently a new optimization pass can break existing code, since SBE is relying on undefined behavior. By default we should not be relying on undefined behavior (i.e. use memcpy).

If someone wants to use this on some old SPARC platform etc. and take advantage of a properly aligned schema, I would say it is up to them to implement it. Currently the code is causing issues for the 99.9% of users not on old esoteric platforms.

Also, for the version without memcpy there should be asserts verifying the alignment requirements when loading and storing values.

@myrgy It actually generated unaligned vector loads in this case.

@myrgy

myrgy commented Sep 2, 2017

@rigtorp, yep, my fault - I saw xmm register usage and didn't check whether the load instructions support unaligned access.

@mjpt777
Contributor

mjpt777 commented Sep 3, 2017

Processors with AVX support allow unaligned access to XMM registers, having relaxed the older rule.

http://www.agner.org/optimize/optimizing_assembly.pdf

However this discussion is interesting. We'll look at how we can generate the memcpy option.

@tmontgomery
Contributor

Discussing it, memcpy should be the default. We use bounds checking as the default and this goes along with that "safe by default" mentality. But we do want to allow the current method to be generated if desired. However, when it is generated, all alignment is up to the app.

tmontgomery added a commit that referenced this issue Nov 29, 2017
…ts so that alignment isn't an issue. For #530 used a union to do byte swap on float and double accesses for single values and arrays.
@tmontgomery
Contributor

Moved to memcpy on access of all fields for safety. Decided not to make this optional as it was just too much of a nightmare.
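For illustration, a memcpy-based little-endian float getter in the spirit of that change might look like this (a sketch only; the actual commit uses a union for the byte swap, and SBE_BIG_ENDIAN is an assumed macro name, not necessarily the real one):

```cpp
#include <cstdint>
#include <cstring>

// Read a little-endian float from an arbitrarily aligned buffer by going
// through the integer representation, so no misaligned pointer is formed
// and any byte swap happens on integer bits rather than float values.
inline float getFloatLe(const char *p) {
  std::uint32_t bits;
  std::memcpy(&bits, p, sizeof(bits));
#if defined(SBE_BIG_ENDIAN)  // assumed macro: swap only on big-endian hosts
  bits = __builtin_bswap32(bits);
#endif
  float result;
  std::memcpy(&result, &bits, sizeof(result));
  return result;
}
```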
