
Please add bounds checks in the C++ code #130

Closed
amluto opened this Issue May 6, 2014 · 10 comments


amluto commented May 6, 2014

The C++ code appears to be completely missing any form of bounds checking. Even the interface appears to be designed such that bounds checks can't be added without fixing users.

Please fix this. I, and any other financial users who care about security, will be unable to use this software if this issue is not fixed.


Contributor

tmontgomery commented May 6, 2014

Actually, the C++ codec does do some bounds checking. Also, the API usage pattern negates the need for some bounds checking.

The user is still responsible for handing a buffer to the API that is big enough, etc. So, some bounds checking needs to be done by the actual user.

What types of fields need additional bounds checking, in your opinion? Can you provide an example of where the usage pattern is dangerous?

As an example of bounds checking, here are the accessors for a fixed-length array field in C++ (from the Car example):

const char *someNumbers(void) const
{
    return (buffer_ + offset_ + 12);
}

sbe_int32_t someNumbers(const int index) const
{
    if (index < 0 || index >= 5)
    {
        throw "index out of range for someNumbers";
    }

    return SBE_LITTLE_ENDIAN_ENCODE_32(*((sbe_int32_t *)(buffer_ + offset_ + 12 + (index * 4))));
}

void someNumbers(const int index, const sbe_int32_t value)
{
    if (index < 0 || index >= 5)
    {
        throw "index out of range for someNumbers";
    }

    *((sbe_int32_t *)(buffer_ + offset_ + 12 + (index * 4))) = SBE_LITTLE_ENDIAN_ENCODE_32(value);
}

int getSomeNumbers(char *dst, const int length) const
{
    if (length > 5)
    {
         throw "length too large for getSomeNumbers";
    }

    ::memcpy(dst, buffer_ + offset_ + 12, length);
    return length;
}

Car &putSomeNumbers(const char *src)
{
    ::memcpy(buffer_ + offset_ + 12, src, 5);
    return *this;
}

Bounds checking, in general, should not be needed for the codecs based on the API usage. In addition, the performance penalty for bounds checking is somewhat prohibitive and goes against the design principles.


amluto commented May 6, 2014

You said "the user is still responsible for handing a buffer to the API that is big enough". How is the user supposed to know what "big enough" is? This is just like Heartbleed: if you get a message that's n bytes long, but it should have been m bytes long, and m > n, then SBE will over-read the message, leading to a crash or information disclosure.

If you're writing a message, it's harder to trigger but worse: if an attacker can get you to write too much data for your buffer, you've just been pwned.


Contributor

tmontgomery commented May 6, 2014

SBE isn't trying to protect against stuff like Heartbleed. Nor can it, actually. The codecs only know what you give them, and they assume you are giving them accurate information. They aren't checking it for you. For speed, the codec lets you do these checks separately, however you want.

Thus the normal C best practices still apply...

On the read side, you know how big the read data is. It was returned by the recv/read call. So, if you want to, you can add checks to make sure you are not reading any uninitialized memory based on sbeBlockLength() at every step. i.e. validate main body length, read main body fields, check num group elements and length, read group, etc.

This is not any different than you would do from a socket recv, etc. Check for valid min length (check against main codec sbeBlockLength()) and then carry the check forward.

Or

Do the check up front once based on how the message is laid out. Your choice.

On the write side, you could overrun the buffer, but you know the block length of the message body and the block length of each repeating group, etc., so you can compute whether you are running out of space, just as you would with anything else.

Or do the check once upfront.

SBE makes the tradeoff that the app is handling those responsibilities and that the normal C (and C++) best practices can still be applied.

BTW, you can still mis-use protobuf and other encoding libraries for information disclosure or corruption as well.

I'll mark this as an enhancement, but these types of checks are up to the app to do.


amluto commented May 6, 2014

Normal C/C++ best practice is for a library to make it easy for users to write correct code. SBE is making it difficult, if not impossible.

It would be easy to change the API to pass a buffer length and to have SBE validate the length when starting reading/writing a repeating group or a variable length field. This will add negligible overhead and will make it plausible that people will write correct code.


Contributor

mjpt777 commented May 6, 2014

The checking of bounds on root block, repeating groups, and var data is an easy and cheap check to perform. The Java and C# implementations do this. This is a nice safety feature for users who do not mind a little cost. Maybe we should add this to the C++ implementation for parity and consider it as an enhancement.

However, I think it is overly dramatic and incorrect to call this a security issue. SBE is not responsible for receiving data from a network directly. It simply windows over a buffer that is provided to it.


Contributor

tmontgomery commented May 6, 2014

Already marked as an enhancement. It should be conditionally generated because it is quite possible to write correct code without the checks.

The way the checks should work is:

  1. wrap needs to take an additional buffer length param
  2. one check on wrap that ensures block length is OK for buffer length (i.e. each root field will fit)
  3. one check on repeating group access for buffer length encode/decode (i.e. group fits)
  4. check on variable length field access for buffer length encode/decode (i.e. field fits)

These are fairly cheap, as @mjpt777 suggests, and are what I suggested above anyway. This gives parity with Java and C# and makes things easier for users. It's also in line with the optional checks that other tools provide.


kentonv commented May 8, 2014

Don't forget that you can __builtin_expect these branches, which makes them pretty damned cheap.


Contributor

tmontgomery commented May 8, 2014

Indeed. I played with that for the actingVersion checks as well.

tmontgomery self-assigned this May 9, 2014

tmontgomery added this to the 1.1 milestone May 9, 2014


Contributor

tmontgomery commented May 9, 2014

Placing this one on my plate for as soon as I can get to it.

tmontgomery pushed a commit that referenced this issue May 11, 2014

tmontgomery pushed a commit that referenced this issue May 11, 2014


Contributor

tmontgomery commented May 11, 2014

Bounds checking is implemented now for C++ generated codecs. This has several implications.

  1. The method signatures for wrap, wrapForEncode, and wrapForDecode have changed.
  2. The generated codec can have bounds checking disabled by defining SBE_NO_BOUNDS_CHECK when compiling, for those users (like me) who do bounds checking up front.
  3. Bounds checking incurs a modest performance penalty, which is outlined below.
  4. When bounds are about to be exceeded, a C++ exception is thrown, so it should be caught by the user.

NOTE: we will hold off on a new release for now to make sure this is stable.

Performance

Without bounds checking (perf test compiled with -DSBE_NO_BOUNDS_CHECK), throughput:

| Test              | Throughput (msgs/sec) |
|-------------------|-----------------------|
| MarketData encode | 50M                   |
| MarketData decode | 80M                   |
| Car encode        | 19M                   |
| Car decode        | 33M                   |

With bounds checking (default perf test), throughput:

| Test              | Throughput (msgs/sec) |
|-------------------|-----------------------|
| MarketData encode | 47M                   |
| MarketData decode | 71M                   |
| Car encode        | 18M                   |
| Car decode        | 32M                   |

As the complexity of the schema increases, the impact of bounds checking gets smaller and smaller, which is expected.

The latency impact of bounds checking in the Market Data test is roughly 2-3 nanoseconds.
