Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.Sign up
Please add bounds checks in the C++ code #130
The C++ code appears to be completely missing any form of bounds checking. Even the interface appears to be designed such that bounds checks can't be added without fixing users.
Please fix this. I, and any other financial users who are about security, will be unable to use this software if this issue is not fixed.
Actually, the C++ codec does do some bounds checking. Also, the API usage pattern negates the need for some bounds checking.
The user is still responsible for handing a buffer to the API that is big enough, etc. So, some bounds checking needs to be done by the actual user.
What types of fields need additional bounds checking, in your opinion? Can you provide an example of where the usage pattern is dangerous?
For example of bounds checking, setting an specific length array field in C++ (from the Car example):
Bounds checking, in general, should not be needed for the codecs based on the API usage. In addition, the performance penalty for bounds checking is somewhat prohibitive and goes against the design principles.
You said "the user is still responsible for handing a buffer to the API that is big enough". How is the user supposed to know what "big enough" is? This is just like Heartbleed: if you get a message that's n bytes long, but it should have been m bytes long, and m > n, then SBE will over-read the message, leading to a crash or information disclosure.
If you're writing a message, it's harder to trigger but worse: if an attacker can get you to write too much data for your buffer, you've just been pwned.
SBE isn't trying to protect against stuff like Heartbleed. Nor can it, actually. The codecs only know what you gave it. And they assume you are giving it accurate information. It isn't checking for you. For speed, the codec let's you do these checks separately however you want.
Thus the normal C best practice still apply...
On the read side, you know how big the read data is. It was returned by the recv/read call. So, if you want to, you can add checks to make sure you are not reading any uninitialized memory based on sbeBlockLength() at every step. i.e. validate main body length, read main body fields, check num group elements and length, read group, etc.
This is not any different than you would do from a socket recv, etc. Check for valid min length (check against main codec sbeBlockLength()) and then carry the check forward.
Do the check up front once based on how the message is laid out. Your choice.
On the write side, you could overrun the buffer, but you know the block length of the message body and the block length of each repeating group, etc. so you can compute whether you are running out of space or not just as you would with anything else.
Or do the check once upfront.
SBE makes the tradeoff that the app is handling those responsibilities and that the normal C (and C++) best practices can still be applied.
BTW, you can still mis-use protobuf and other encoding libraries for information disclosure or corruption as well.
I'll mark this as an enhancement, but these types of checks are up to the app to do.
Normal C/C++ best practice is for a library to make it easy for users to write correct code. SBE is making it difficult to impossible.
It would be easy to change the API to pass a buffer length and to have SBE validate the length when starting reading/writing a repeating group or a variable length field. This will add negligible overhead and will make it plausible that people will write correct code.
The checking of bounds on root block, repeating groups, and var data is an easy and cheap check to perform. The Java and C# implementations do this. This is a nice safety feature for users who do not mind a little cost. Maybe we should add this to the C++ implementation for parity and consider it as an enhancement.
However I think it is being overly dramatic and incorrect saying this is a security issue. SBE is not responsible for receiving data from a network directly. It simply windows over a buffer that is provided to it.
Already marked as an enhancement. It should be conditionally generated because it is quite possible to write correct code without the checks.
The way the checks should work is:
These are fairly cheap as @mjpt777 suggests and is what I suggested above anyway. So, it gives parity with Java and C#. And makes it easier for users. It's also in line with what other tools do as well with optional checks.
referenced this issue
May 8, 2014
pushed a commit
May 11, 2014
pushed a commit
May 11, 2014
Bounds checking is implemented now for C++ generated codecs. This has several implications.
NOTE: we will hold off on a new release for now to make sure this is stable.
Without bounds checking (perf test with
With bounds checking (default perf test) as throughput
As the complexity of the schema increases, the impact of the bounds checking gets smaller and smaller. Which is expected.
The latency impact of bounds checking in the Market Data test is roughly 2-3 nanoseconds.