Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

thrift: framed/unframed transport, binary/compact protocol #3509

Merged
merged 12 commits into from Jun 7, 2018

Conversation

zuercher
Copy link
Member

Provides Thrift protocol primitives, with tests, that will be used to create Thrift protocol filters in a subsequent PR. Implements the Thrift framed and unframed transports as well as an "auto" transport that detects the transport based on the first few bytes in a Buffer. Similarly, the Thrift binary and compact protocols are implemented along with an "auto" protocol.

Signed-off-by: Stephan Zuercher stephan@turbinelabs.io

Risk Level: Low - no filters currently make use of these primitives
Testing: unit tests, manual integration testing with a work-in-progress filter in a private branch
Docs Changes: n/a
Release Notes: n/a
Relates To: #2247

Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
@zuercher
Copy link
Member Author

@brian-pane PTAL

Copy link
Contributor

@brian-pane brian-pane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for implementing this! I read through the header files and added some notes. I'll read the .cc files over the next couple of days.

#include "common/common/macros.h"
#include "common/singleton/const_singleton.h"

#include "absl/types/optional.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed in protocol.h? It doesn't look like this file uses absl::optional (but I might have missed something).

#include "envoy/buffer/buffer.h"
#include "envoy/common/pure.h"

#include "common/common/assert.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does anything in protocol.h depend on this include?

* Thrift protocol message types.
* See https://github.com/apache/thrift/blob/master/lib/cpp/src/thrift/protocol/TProtocol.h
*/
enum MessageType {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend using enum class rather than plain enum here, so that the value names (Call, Reply, etc) get namespaced within MessageType.

* Thrift protocol struct field types.
* See https://github.com/apache/thrift/blob/master/lib/cpp/src/thrift/protocol/TProtocol.h
*/
enum FieldType {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enum class

/**
* Names of available Protocol implementations.
*/
class ProtocolNameValues {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the only use of this class is for each Protocol subclass to return a string representation of its own name. I think it would be cleaner to just make each Protocol subclass define its own name string internally. That way, we won't have to modify this central ProtocolNameValues class every time a new Protocol subclass is added.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By and large these are used only during debug logging, but my original intention was to avoid allocating these strings. At some point I decided to make the auto protocol report its underlying protocol and ended up constructing or copying strings in all the name() methods.

Since I later added setProtocol to make the AutoProtocolImpl easier to test, I went ahead and made AutoProtocolImpl::name() return a field value, updated by setProtocol. This allowed me to have Protocol::name() return a const std::string&. So no more allocations, but I still have ProtocolNames.

I can move these strings into the Protocol implementations, but per the style guide, I'd need to use the CONSTRUCT_ON_FIRST_USE macro. AutoProtocolImpl would require it's own private method to wrap CONSTRUCT_ON_FIRST_USE for "auto". I think putting all these strings in this class and using ConstSingleton ends up being a little cleaner.

/**
* Names of available Transport implementations.
*/
class TransportNameValues {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the same suggestion for this as for ProtocolNameValues: make each subclass of Transport define its own name string, rather than collecting the names in a centralized class that has to be updated every time there's a new Transport subclass.

* decodeFrameStart decodes the start of a transport message, potentially invoking callbacks.
*
* @param buffer the currently buffered thrift data.
* @return bool true if a complete frame header was successfully consumed, false if more data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the frame header is consumed, that means the corresponding bytes are trimmed from the buffer, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I'll update the comment (and all the others that behave the same way).

@alyssawilk
Copy link
Contributor

Ignoring the code, would you be up for adding CODEOWNERS per https://docs.google.com/document/d/1eDQQSxqx2khTXfa2vVm4vqkyRwXYkPzZCcbjxJ2_AvA/edit ? You're clearly set on the maintainer front but good to have documented for future reviews :-)

Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
@zuercher
Copy link
Member Author

zuercher commented Jun 1, 2018

@brian-pane I think I addressed your comments.

if (buffer.length() < 3) {
return false;
}
field_id = BufferHelper::peekI16(buffer, 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it valid for the field ID to be negative?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The protocol specifications all refer to it as a signed integer with no mention of the validity of negative values.

The Thrift IDL documents make no mention of the range of values allowed. In fact, the grammar defines it as an IntConstant, which implies negative values are legal. Empirically, the Thrift compiler will only accept values from 1 to 32767.

In practice, the generated bindings allow negative field IDs, but ignore them like any other unknown field ID.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to leave this as-is from that standpoint that it's better to be a bit lenient in what's accepted.

}

bool BinaryProtocolImpl::readDouble(Buffer::Instance& buffer, double& value) {
ASSERT(sizeof(double) == sizeof(uint64_t));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since Envoy is using C++14, I propose using the new static_assert here, since that will work for both debug and non-debug builds. (https://en.cppreference.com/w/cpp/language/static_assert)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


const uint16_t BinaryProtocolImpl::Magic = 0x8001;

bool BinaryProtocolImpl::readMessageBegin(Buffer::Instance& buffer, std::string& name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THanks for writing all this parsing code. We really need a lexer generator, but I haven’t yet found one that’s well suited to binary protocols.

namespace ThriftProxy {

int8_t BufferHelper::peekI8(Buffer::Instance& buffer, size_t offset) {
ASSERT(buffer.length() >= 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about throwing an exception here if the buffer length is too small, than asserting? Even though that would mean an extra conditional branch in the critical path, the branch would always go in the same direction in the absence of bugs, so the CPU should be able to predict the branch with near-perfect accuracy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable.

int8_t BufferHelper::peekI8(Buffer::Instance& buffer, size_t offset) {
ASSERT(buffer.length() >= 1);
int8_t i;
buffer.copyOut(offset, 1, &i);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worthwhile to add special methods to Buffer::Instance for the 1, 2, and 4 byte special cases, since a general-purpose copy will be slow for those. But I’m happy to wait and see if this shows up as an issue in profiles first.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'll hold off for now. If it's useful, we'll want to use across a bunch of protocols and it should go in its own PR.

}

double BufferHelper::drainDouble(Buffer::Instance& buffer) {
ASSERT(sizeof(double) == sizeof(uint64_t));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static_assert

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, for the union trick below to work, double and uint64_t have to have the same endianness. It’s probably possible to test for that at compile time, but off the top of my head I can’t think of a simple way to do it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible to do this at compile time in C with a trick similar to the union used below, but not in C++.

In fact, according to what I just read (https://stackoverflow.com/a/24431286 and https://stackoverflow.com/questions/11373203/accessing-inactive-union-member-and-undefined-behavior/11996970) the union below shouldn't be allowed either (though both Apache Thrift and Protobuf use it).

Copy link
Contributor

@brian-pane brian-pane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here’s my first batch of comments on the .cc files. I’ll read more tomorrow.

Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
};

/**
* BinaryProtocolImpl implements the Thrift Binary protocol with non-strict (e.g. lax) message
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: LaxBinaryProtocolImpl...

};

/**
* BinaryProtocolImpl implements the Thrift Binary protocol with non-strict (e.g. lax) message
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: LaxBinaryProtocolImpl

Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>

uint16_t version = BufferHelper::peekU16(buffer);
if (BinaryProtocolImpl::isMagic(version)) {
setProtocol(std::make_unique<BinaryProtocolImpl>());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to determine at this point in the processing whether we need a LaxBinaryProtocolImpl?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only guarantee that the non-strict binary protocol makes is that the leading bit is 0, so misidentifying another protocol seems pretty likely and the result would be interpreting the first four bytes as message name length. For instance, with auto-transport and auto-protocol, an errant HTTP request would be interpreted as an unframed, non-strict binary thrift message with a message name of ~1.1 GB.

I think longer term, we'll want to have a configurable limit on the size of encoded strings (perhaps separate limits for message names vs. fields). With that in place, I'd feel more comfortable allowing the non-strict binary protocol to be detected automatically.

Copy link
Contributor

@brian-pane brian-pane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's my next batch of notes. The last thing remaining for me to read is compact_protocol.cc, which I'll try to get finished in the next day.

Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
Copy link
Contributor

@brian-pane brian-pane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just finished reading through the rest of the PR, and looks good to me.

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I skimmed and it all looks sane to me. I'm sure we can flush out any issues once we start putting traffic through it. Very nice!

@zuercher zuercher merged commit 2c37930 into envoyproxy:master Jun 7, 2018
@zuercher zuercher deleted the stephan/thrift-primitives branch July 19, 2018 20:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants