New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
thrift: framed/unframed transport, binary/compact protocol #3509
thrift: framed/unframed transport, binary/compact protocol #3509
Conversation
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
@brian-pane PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for implementing this! I read through the header files and added some notes. I'll read the .cc files over the next couple of days.
#include "common/common/macros.h" | ||
#include "common/singleton/const_singleton.h" | ||
|
||
#include "absl/types/optional.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed in protocol.h
? It doesn't look like this file uses absl::optional
(but I might have missed something).
#include "envoy/buffer/buffer.h" | ||
#include "envoy/common/pure.h" | ||
|
||
#include "common/common/assert.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does anything in protocol.h
depend on this include?
* Thrift protocol message types. | ||
* See https://github.com/apache/thrift/blob/master/lib/cpp/src/thrift/protocol/TProtocol.h | ||
*/ | ||
enum MessageType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend using enum class
rather than plain enum
here, so that the value names (Call, Reply, etc) get namespaced within MessageType
.
* Thrift protocol struct field types. | ||
* See https://github.com/apache/thrift/blob/master/lib/cpp/src/thrift/protocol/TProtocol.h | ||
*/ | ||
enum FieldType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
enum class
/** | ||
* Names of available Protocol implementations. | ||
*/ | ||
class ProtocolNameValues { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the only use of this class is for each Protocol
subclass to return a string representation of its own name. I think it would be cleaner to just make each Protocol
subclass define its own name string internally. That way, we won't have to modify this central ProtocolNameValues
class every time a new Protocol
subclass is added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By and large these are used only during debug logging, but my original intention was to avoid allocating these strings. At some point I decided to make the auto protocol report its underlying protocol and ended up constructing or copying strings in all the name() methods.
Since I later added setProtocol
to make the AutoProtocolImpl easier to test, I went ahead and made AutoProtocolImpl::name() return a field value, updated by setProtocol. This allowed me to have Protocol::name() return a const std::string&
. So no more allocations, but I still have ProtocolNames.
I can move these strings into the Protocol implementations, but per the style guide, I'd need to use the CONSTRUCT_ON_FIRST_USE macro. AutoProtocolImpl would require it's own private method to wrap CONSTRUCT_ON_FIRST_USE for "auto". I think putting all these strings in this class and using ConstSingleton ends up being a little cleaner.
/** | ||
* Names of available Transport implementations. | ||
*/ | ||
class TransportNameValues { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the same suggestion for this as for ProtocolNameValues
: make each subclass of Transport
define its own name string, rather than collecting the names in a centralized class that has to be updated every time there's a new Transport
subclass.
* decodeFrameStart decodes the start of a transport message, potentially invoking callbacks. | ||
* | ||
* @param buffer the currently buffered thrift data. | ||
* @return bool true if a complete frame header was successfully consumed, false if more data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the frame header is consumed, that means the corresponding bytes are trimmed from the buffer
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I'll update the comment (and all the others that behave the same way).
Ignoring the code, would you be up for adding CODEOWNERS per https://docs.google.com/document/d/1eDQQSxqx2khTXfa2vVm4vqkyRwXYkPzZCcbjxJ2_AvA/edit ? You're clearly set on the maintainer front but good to have documented for future reviews :-) |
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
@brian-pane I think I addressed your comments. |
if (buffer.length() < 3) { | ||
return false; | ||
} | ||
field_id = BufferHelper::peekI16(buffer, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it valid for the field ID to be negative?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The protocol specifications all refer to it as a signed integer with no mention of the validity of negative values.
The Thrift IDL documents make no mention of the range of values allowed. In fact, the grammar defines it as an IntConstant, which implies negative values are legal. Empirically, the Thrift compiler will only accept values from 1 to 32767.
In practice, the generated bindings allow negative field IDs, but ignore them like any other unknown field ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to leave this as-is from that standpoint that it's better to be a bit lenient in what's accepted.
} | ||
|
||
bool BinaryProtocolImpl::readDouble(Buffer::Instance& buffer, double& value) { | ||
ASSERT(sizeof(double) == sizeof(uint64_t)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since Envoy is using C++14, I propose using the new static_assert
here, since that will work for both debug and non-debug builds. (https://en.cppreference.com/w/cpp/language/static_assert)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅
|
||
const uint16_t BinaryProtocolImpl::Magic = 0x8001; | ||
|
||
bool BinaryProtocolImpl::readMessageBegin(Buffer::Instance& buffer, std::string& name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
THanks for writing all this parsing code. We really need a lexer generator, but I haven’t yet found one that’s well suited to binary protocols.
namespace ThriftProxy { | ||
|
||
int8_t BufferHelper::peekI8(Buffer::Instance& buffer, size_t offset) { | ||
ASSERT(buffer.length() >= 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about throwing an exception here if the buffer length is too small, than asserting? Even though that would mean an extra conditional branch in the critical path, the branch would always go in the same direction in the absence of bugs, so the CPU should be able to predict the branch with near-perfect accuracy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable.
int8_t BufferHelper::peekI8(Buffer::Instance& buffer, size_t offset) { | ||
ASSERT(buffer.length() >= 1); | ||
int8_t i; | ||
buffer.copyOut(offset, 1, &i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worthwhile to add special methods to Buffer::Instance
for the 1, 2, and 4 byte special cases, since a general-purpose copy will be slow for those. But I’m happy to wait and see if this shows up as an issue in profiles first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'll hold off for now. If it's useful, we'll want to use across a bunch of protocols and it should go in its own PR.
} | ||
|
||
double BufferHelper::drainDouble(Buffer::Instance& buffer) { | ||
ASSERT(sizeof(double) == sizeof(uint64_t)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static_assert
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, for the union
trick below to work, double
and uint64_t
have to have the same endianness. It’s probably possible to test for that at compile time, but off the top of my head I can’t think of a simple way to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible to do this at compile time in C with a trick similar to the union used below, but not in C++.
In fact, according to what I just read (https://stackoverflow.com/a/24431286 and https://stackoverflow.com/questions/11373203/accessing-inactive-union-member-and-undefined-behavior/11996970) the union below shouldn't be allowed either (though both Apache Thrift and Protobuf use it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here’s my first batch of comments on the .cc files. I’ll read more tomorrow.
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
}; | ||
|
||
/** | ||
* BinaryProtocolImpl implements the Thrift Binary protocol with non-strict (e.g. lax) message |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: LaxBinaryProtocolImpl
...
}; | ||
|
||
/** | ||
* BinaryProtocolImpl implements the Thrift Binary protocol with non-strict (e.g. lax) message |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: LaxBinaryProtocolImpl
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
|
||
uint16_t version = BufferHelper::peekU16(buffer); | ||
if (BinaryProtocolImpl::isMagic(version)) { | ||
setProtocol(std::make_unique<BinaryProtocolImpl>()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to determine at this point in the processing whether we need a LaxBinaryProtocolImpl
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only guarantee that the non-strict binary protocol makes is that the leading bit is 0, so misidentifying another protocol seems pretty likely and the result would be interpreting the first four bytes as message name length. For instance, with auto-transport and auto-protocol, an errant HTTP request would be interpreted as an unframed, non-strict binary thrift message with a message name of ~1.1 GB.
I think longer term, we'll want to have a configurable limit on the size of encoded strings (perhaps separate limits for message names vs. fields). With that in place, I'd feel more comfortable allowing the non-strict binary protocol to be detected automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's my next batch of notes. The last thing remaining for me to read is compact_protocol.cc
, which I'll try to get finished in the next day.
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just finished reading through the rest of the PR, and looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I skimmed and it all looks sane to me. I'm sure we can flush out any issues once we start putting traffic through it. Very nice!
Provides Thrift protocol primitives, with tests, that will be used to create Thrift protocol filters in a subsequent PR. Implements the Thrift framed and unframed transports as well as an "auto" transport that detects the transport based on the first few bytes in a Buffer. Similarly, the Thrift binary and compact protocols are implemented along with an "auto" protocol.
Signed-off-by: Stephan Zuercher stephan@turbinelabs.io
Risk Level: Low - no filters currently make use of these primitives
Testing: unit tests, manual integration testing with a work-in-progress filter in a private branch
Docs Changes: n/a
Release Notes: n/a
Relates To: #2247