
thrift: framed/unframed transport, binary/compact protocol #3509

Merged
zuercher merged 12 commits into envoyproxy:master from turbinelabs:stephan/thrift-primitives on Jun 7, 2018

5 participants
@zuercher
Member

zuercher commented May 30, 2018

Provides Thrift protocol primitives, with tests, that will be used to create Thrift protocol filters in a subsequent PR. Implements the Thrift framed and unframed transports as well as an "auto" transport that detects the transport based on the first few bytes in a Buffer. Similarly, the Thrift binary and compact protocols are implemented along with an "auto" protocol.
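As a rough illustration of how "auto" detection can work, the sketch below peeks at the first eight bytes. It assumes that a strict binary message begins with the 16-bit word 0x8001, that a compact message begins with protocol id 0x82 followed by a byte whose low five bits carry version 1, and that a framed message places those bytes after a 4-byte frame size. The enum and function names are hypothetical; this is not the PR's AutoTransportImpl or AutoProtocolImpl code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result types for this sketch only.
enum class DetectedTransport { NeedMoreData, Framed, Unframed, Unknown };
enum class DetectedProtocol { Unknown, Binary, Compact };

// Reads a big-endian (network order) 16-bit word at `offset`.
uint16_t peekU16BE(const std::vector<uint8_t>& buf, size_t offset) {
  assert(offset + 2 <= buf.size());
  return static_cast<uint16_t>((buf[offset] << 8) | buf[offset + 1]);
}

// Strict binary: 0x80 0x01 (version 1). Compact: protocol id 0x82, then a byte
// holding the message type in the high 3 bits and version 1 in the low 5 bits.
DetectedProtocol detectProtocol(uint16_t word) {
  if (word == 0x8001) {
    return DetectedProtocol::Binary;
  }
  if ((word & 0xFF1F) == 0x8201) {
    return DetectedProtocol::Compact;
  }
  return DetectedProtocol::Unknown;
}

// Framed messages carry a 4-byte frame size before the protocol bytes; unframed
// messages start with them. Eight bytes are enough to check both positions.
DetectedTransport detectTransport(const std::vector<uint8_t>& buf) {
  if (buf.size() < 8) {
    return DetectedTransport::NeedMoreData;
  }
  if (detectProtocol(peekU16BE(buf, 4)) != DetectedProtocol::Unknown) {
    return DetectedTransport::Framed;
  }
  if (detectProtocol(peekU16BE(buf, 0)) != DetectedProtocol::Unknown) {
    return DetectedTransport::Unframed;
  }
  return DetectedTransport::Unknown;
}
```

Checking offset 4 first is a heuristic choice in this sketch: for an unframed strict binary message to be misread as framed, its name length would have to begin with 0x8001, which is a negative int32 and invalid anyway.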

Signed-off-by: Stephan Zuercher stephan@turbinelabs.io

Risk Level: Low - no filters currently make use of these primitives
Testing: unit tests, manual integration testing with a work-in-progress filter in a private branch
Docs Changes: n/a
Release Notes: n/a
Relates To: #2247

thrift: framed/unframed transport, binary/compact protocol
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
@zuercher

Member

zuercher commented May 30, 2018

@brian-pane
Contributor

brian-pane left a comment

Thanks for implementing this! I read through the header files and added some notes. I'll read the .cc files over the next couple of days.

#include "common/common/macros.h"
#include "common/singleton/const_singleton.h"

#include "absl/types/optional.h"


@brian-pane

brian-pane May 31, 2018

Contributor

Is this needed in protocol.h? It doesn't look like this file uses absl::optional (but I might have missed something).

#include "envoy/buffer/buffer.h"
#include "envoy/common/pure.h"

#include "common/common/assert.h"


@brian-pane

brian-pane May 31, 2018

Contributor

Does anything in protocol.h depend on this include?

* Thrift protocol message types.
* See https://github.com/apache/thrift/blob/master/lib/cpp/src/thrift/protocol/TProtocol.h
*/
enum MessageType {


@brian-pane

brian-pane May 31, 2018

Contributor

I recommend using enum class rather than plain enum here, so that the value names (Call, Reply, etc) get namespaced within MessageType.
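A minimal sketch of the enum class suggestion follows. The enumerator values are taken from T_CALL through T_ONEWAY in Apache Thrift's TProtocol.h; the int8_t underlying type and the messageTypeFromWire helper are assumptions for illustration, and std::optional (C++17) stands in for absl::optional.

```cpp
#include <cstdint>
#include <optional>

// Scoped enum: callers write MessageType::Call, and a bare `Call` neither leaks
// into the enclosing namespace nor converts implicitly to an integer.
enum class MessageType : int8_t { Call = 1, Reply = 2, Exception = 3, Oneway = 4 };

// Hypothetical helper: since there is no implicit conversion, decoding a wire
// byte requires an explicit, range-checked cast.
std::optional<MessageType> messageTypeFromWire(int8_t raw) {
  if (raw < static_cast<int8_t>(MessageType::Call) ||
      raw > static_cast<int8_t>(MessageType::Oneway)) {
    return std::nullopt;
  }
  return static_cast<MessageType>(raw);
}
```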

* Thrift protocol struct field types.
* See https://github.com/apache/thrift/blob/master/lib/cpp/src/thrift/protocol/TProtocol.h
*/
enum FieldType {


@brian-pane

brian-pane May 31, 2018

Contributor

enum class
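The same change applies to FieldType. Because the type codes are not contiguous, a simple range check cannot validate a decoded byte. The sketch below lists a subset of the TType codes from TProtocol.h and checks each one explicitly; isValidFieldType is a hypothetical helper.

```cpp
#include <cstdint>

// Subset of Apache Thrift's TType codes; this subset has gaps at 5, 7, and 9.
enum class FieldType : int8_t {
  Stop = 0,
  Void = 1,
  Bool = 2,
  Byte = 3,
  Double = 4,
  I16 = 6,
  I32 = 8,
  I64 = 10,
  String = 11,
  Struct = 12,
  Map = 13,
  Set = 14,
  List = 15,
};

// Returns true only for codes that name a FieldType enumerator.
bool isValidFieldType(int8_t raw) {
  switch (static_cast<FieldType>(raw)) {
  case FieldType::Stop:
  case FieldType::Void:
  case FieldType::Bool:
  case FieldType::Byte:
  case FieldType::Double:
  case FieldType::I16:
  case FieldType::I32:
  case FieldType::I64:
  case FieldType::String:
  case FieldType::Struct:
  case FieldType::Map:
  case FieldType::Set:
  case FieldType::List:
    return true;
  }
  return false;
}
```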

/**
* Names of available Protocol implementations.
*/
class ProtocolNameValues {


@brian-pane

brian-pane May 31, 2018

Contributor

It looks like the only use of this class is for each Protocol subclass to return a string representation of its own name. I think it would be cleaner to just make each Protocol subclass define its own name string internally. That way, we won't have to modify this central ProtocolNameValues class every time a new Protocol subclass is added.


@zuercher

zuercher May 31, 2018

Member

By and large these are used only during debug logging, but my original intention was to avoid allocating these strings. At some point I decided to make the auto protocol report its underlying protocol and ended up constructing or copying strings in all the name() methods.

Since I later added setProtocol to make the AutoProtocolImpl easier to test, I went ahead and made AutoProtocolImpl::name() return a field value, updated by setProtocol. This allowed me to have Protocol::name() return a const std::string&. So no more allocations, but I still have ProtocolNames.

I can move these strings into the Protocol implementations, but per the style guide, I'd need to use the CONSTRUCT_ON_FIRST_USE macro. AutoProtocolImpl would require its own private method to wrap CONSTRUCT_ON_FIRST_USE for "auto". I think putting all these strings in this class and using ConstSingleton ends up being a little cleaner.
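The two options discussed here can be sketched side by side. ConstSingletonSketch is a simplified stand-in written for this example, not Envoy's actual ConstSingleton template, and the "binary" and "compact" string values are assumptions. In both variants name() returns a const std::string& that stays valid for the life of the process, so no string is allocated per call.

```cpp
#include <string>

// Simplified stand-in for a ConstSingleton-style holder: the object is created
// once on first use and intentionally never destroyed.
template <class T> class ConstSingletonSketch {
public:
  static const T& get() {
    static const T* instance = new T();
    return *instance;
  }
};

// Option 1: all names in one central class, as in the PR.
class ProtocolNameValues {
public:
  const std::string BINARY = "binary";
  const std::string COMPACT = "compact";
  const std::string AUTO = "auto";
};
using ProtocolNames = ConstSingletonSketch<ProtocolNameValues>;

class BinaryProtocolSketch {
public:
  const std::string& name() const { return ProtocolNames::get().BINARY; }
};

// Option 2: each implementation owns its name via a function-local static,
// which is the pattern a construct-on-first-use macro wraps.
class CompactProtocolSketch {
public:
  const std::string& name() const {
    static const std::string* value = new std::string("compact");
    return *value;
  }
};
```

The trade-off is the one described above: the central class keeps every name in one place, while the per-class static avoids editing shared code whenever an implementation is added.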

/**
* Names of available Transport implementations.
*/
class TransportNameValues {


@brian-pane

brian-pane May 31, 2018

Contributor

I have the same suggestion for this as for ProtocolNameValues: make each subclass of Transport define its own name string, rather than collecting the names in a centralized class that has to be updated every time there's a new Transport subclass.

* decodeFrameStart decodes the start of a transport message, potentially invoking callbacks.
*
* @param buffer the currently buffered thrift data.
* @return bool true if a complete frame header was successfully consumed, false if more data


@brian-pane

brian-pane May 31, 2018

Contributor

If the frame header is consumed, that means the corresponding bytes are trimmed from the buffer, right?


@zuercher

zuercher May 31, 2018

Member

Yes. I'll update the comment (and all the others that behave the same way).
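To make the consume-on-success contract concrete, here is a hedged sketch of a framed-transport header decoder. A std::deque<uint8_t> stands in for Buffer::Instance, and rejecting non-positive frame sizes with an exception is this sketch's own choice.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <stdexcept>

// Hypothetical stand-in for Buffer::Instance: bytes are drained from the front.
using ByteQueue = std::deque<uint8_t>;

// Framed transport header: a 4-byte big-endian signed frame size.
// Returns false and leaves `buffer` untouched if the header is incomplete;
// otherwise drains the 4 header bytes and returns true.
bool decodeFrameStart(ByteQueue& buffer, int32_t& frame_size) {
  if (buffer.size() < 4) {
    return false;
  }
  uint32_t raw = 0;
  for (size_t i = 0; i < 4; i++) {
    raw = (raw << 8) | buffer[i];
  }
  frame_size = static_cast<int32_t>(raw);
  if (frame_size <= 0) {
    throw std::runtime_error("invalid thrift frame size");
  }
  buffer.erase(buffer.begin(), buffer.begin() + 4);
  return true;
}
```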

@alyssawilk

Contributor

alyssawilk commented May 31, 2018

Ignoring the code, would you be up for adding CODEOWNERS per https://docs.google.com/document/d/1eDQQSxqx2khTXfa2vVm4vqkyRwXYkPzZCcbjxJ2_AvA/edit ? You're clearly set on the maintainer front but good to have documented for future reviews :-)

zuercher added some commits May 31, 2018

CODEOWNERS
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
make transport/proto names const&; review comments
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
given Protocol mock a default name
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
@zuercher

Member

zuercher commented Jun 1, 2018

@brian-pane I think I addressed your comments.

if (buffer.length() < 3) {
return false;
}
field_id = BufferHelper::peekI16(buffer, 1);


@brian-pane

brian-pane Jun 4, 2018

Contributor

Is it valid for the field ID to be negative?


@zuercher

zuercher Jun 4, 2018

Member

The protocol specifications all refer to it as a signed integer with no mention of the validity of negative values.

The Thrift IDL documents make no mention of the range of values allowed. In fact, the grammar defines it as an IntConstant, which implies negative values are legal. Empirically, the Thrift compiler will only accept values from 1 to 32767.

In practice, the generated bindings allow negative field IDs, but ignore them like any other unknown field ID.


@zuercher

zuercher Jun 4, 2018

Member

I'm going to leave this as-is, from the standpoint that it's better to be a bit lenient in what's accepted.
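The lenient behavior can be seen in a small sketch of binary-protocol field-header decoding: a 1-byte type, then, unless the type is Stop (0), a big-endian signed 16-bit field id. The readFieldBegin signature and the bytes_consumed out-parameter are hypothetical; the point is that 0xFFFF decodes to -1 and is passed through rather than rejected.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns false if more data is needed. On success, sets the field type, the
// field id, and how many bytes the header occupies.
bool readFieldBegin(const std::vector<uint8_t>& buf, int8_t& field_type, int16_t& field_id,
                    size_t& bytes_consumed) {
  if (buf.empty()) {
    return false;
  }
  field_type = static_cast<int8_t>(buf[0]);
  if (field_type == 0) { // Stop: no field id follows
    field_id = 0;
    bytes_consumed = 1;
    return true;
  }
  if (buf.size() < 3) {
    return false;
  }
  // Signed on the wire; negative ids are accepted like any other unknown id.
  const uint16_t raw = static_cast<uint16_t>((buf[1] << 8) | buf[2]);
  field_id = static_cast<int16_t>(raw);
  bytes_consumed = 3;
  return true;
}
```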

}

bool BinaryProtocolImpl::readDouble(Buffer::Instance& buffer, double& value) {
ASSERT(sizeof(double) == sizeof(uint64_t));


@brian-pane

brian-pane Jun 4, 2018

Contributor

Since Envoy uses C++14, I propose using static_assert here (available since C++11), because it is checked at compile time in both debug and non-debug builds. (https://en.cppreference.com/w/cpp/language/static_assert)
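A minimal sketch of the compile-time version of that check is below. The is_iec559 assertion is an addition in this sketch, not part of the review: both the binary and compact protocols transmit doubles as 8-byte IEEE 754 values, so it guards the same decoding path.

```cpp
#include <cstdint>
#include <limits>

// Evaluated by the compiler in every build mode; a failure stops the build
// instead of being compiled out like a debug-only runtime ASSERT.
static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits wide");

// Additional check in this sketch: the host double must be IEEE 754.
static_assert(std::numeric_limits<double>::is_iec559, "double must be IEEE 754");

// The same conditions, exposed as a constant that other code can test.
constexpr bool kDoubleMatchesWireFormat =
    sizeof(double) == sizeof(uint64_t) && std::numeric_limits<double>::is_iec559;
```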


@zuercher

const uint16_t BinaryProtocolImpl::Magic = 0x8001;

bool BinaryProtocolImpl::readMessageBegin(Buffer::Instance& buffer, std::string& name,


@brian-pane

brian-pane Jun 4, 2018

Contributor

Thanks for writing all this parsing code. We really need a lexer generator, but I haven’t yet found one that’s well suited to binary protocols.
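For readers unfamiliar with the format being parsed, here is a hedged sketch of strict binary message-header decoding. It assumes the Thrift binary protocol's header layout: a 32-bit word with 0x8001 in the high 16 bits and the message type in the low byte, a 32-bit name length, the name bytes, and a 32-bit sequence id, all big-endian. MessageHeader and this readMessageBegin signature are hypothetical and differ from the PR's.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

struct MessageHeader {
  int8_t type;
  std::string name;
  int32_t seq_id;
  size_t length; // total header bytes, which a real decoder would then drain
};

// Reads a big-endian 32-bit word at `offset` (caller checks bounds).
uint32_t readU32BE(const std::vector<uint8_t>& buf, size_t offset) {
  return (uint32_t{buf[offset]} << 24) | (uint32_t{buf[offset + 1]} << 16) |
         (uint32_t{buf[offset + 2]} << 8) | uint32_t{buf[offset + 3]};
}

// Returns false without modifying `header` until the whole header is buffered.
bool readMessageBegin(const std::vector<uint8_t>& buf, MessageHeader& header) {
  if (buf.size() < 8) {
    return false;
  }
  const uint32_t version = readU32BE(buf, 0);
  if ((version & 0xFFFF0000u) != 0x80010000u) {
    throw std::runtime_error("not a strict binary message");
  }
  const int32_t name_len = static_cast<int32_t>(readU32BE(buf, 4));
  if (name_len < 0) {
    throw std::runtime_error("negative message name length");
  }
  const size_t total = 8 + static_cast<size_t>(name_len) + 4;
  if (buf.size() < total) {
    return false;
  }
  header.type = static_cast<int8_t>(version & 0xFF);
  header.name.assign(buf.begin() + 8, buf.begin() + 8 + name_len);
  header.seq_id = static_cast<int32_t>(readU32BE(buf, 8 + static_cast<size_t>(name_len)));
  header.length = total;
  return true;
}
```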

namespace ThriftProxy {

int8_t BufferHelper::peekI8(Buffer::Instance& buffer, size_t offset) {
ASSERT(buffer.length() >= 1);


@brian-pane

brian-pane Jun 4, 2018

Contributor

What do you think about throwing an exception here if the buffer length is too small, rather than asserting? Even though that would mean an extra conditional branch in the critical path, the branch would always go in the same direction in the absence of bugs, so the CPU should be able to predict the branch with near-perfect accuracy.


@zuercher

zuercher Jun 4, 2018

Member

Seems reasonable.
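A sketch of the throw-on-underflow approach follows, with std::vector<uint8_t> standing in for Buffer::Instance and std::out_of_range as an assumed exception type. Unlike the quoted ASSERT, which only checks that the buffer holds at least one byte regardless of offset, it checks the bytes at the requested offset; a later commit in this PR ("fix offset handling in BufferHelper") also addresses offset handling.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Throws unless `width` bytes are available starting at `offset`. Written to
// avoid overflow in offset + width for very large offsets.
void checkAvailable(const std::vector<uint8_t>& buffer, size_t offset, size_t width) {
  if (offset > buffer.size() || buffer.size() - offset < width) {
    throw std::out_of_range("thrift buffer underflow");
  }
}

int8_t peekI8(const std::vector<uint8_t>& buffer, size_t offset) {
  checkAvailable(buffer, offset, 1);
  return static_cast<int8_t>(buffer[offset]);
}

// Big-endian, as the binary protocol encodes multi-byte integers.
uint16_t peekU16(const std::vector<uint8_t>& buffer, size_t offset) {
  checkAvailable(buffer, offset, 2);
  return static_cast<uint16_t>((buffer[offset] << 8) | buffer[offset + 1]);
}
```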

int8_t BufferHelper::peekI8(Buffer::Instance& buffer, size_t offset) {
ASSERT(buffer.length() >= 1);
int8_t i;
buffer.copyOut(offset, 1, &i);


@brian-pane

brian-pane Jun 4, 2018

Contributor

It might be worthwhile to add special methods to Buffer::Instance for the 1, 2, and 4 byte special cases, since a general-purpose copy will be slow for those. But I’m happy to wait and see if this shows up as an issue in profiles first.


@zuercher

zuercher Jun 4, 2018

Member

I think I'll hold off for now. If it's useful, we'll want to use it across a bunch of protocols, and it should go in its own PR.

}

double BufferHelper::drainDouble(Buffer::Instance& buffer) {
ASSERT(sizeof(double) == sizeof(uint64_t));


@brian-pane

brian-pane Jun 4, 2018

Contributor

static_assert


@brian-pane

brian-pane Jun 4, 2018

Contributor

Also, for the union trick below to work, double and uint64_t have to have the same endianness. It’s probably possible to test for that at compile time, but off the top of my head I can’t think of a simple way to do it.


@zuercher

zuercher Jun 4, 2018

Member

It's possible to do this at compile time in C with a trick similar to the union used below, but not in C++.

In fact, according to what I just read (https://stackoverflow.com/a/24431286 and https://stackoverflow.com/questions/11373203/accessing-inactive-union-member-and-undefined-behavior/11996970) the union below shouldn't be allowed either (though both Apache Thrift and Protobuf use it).
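The approach adopted in a later commit ("use memcpy instead of the union trick") can be sketched as follows: assemble the big-endian 64-bit pattern as an integer, then copy its bytes into a double with std::memcpy, which is well defined where reading an inactive union member is not. The function name is hypothetical, and the sketch still relies on the assumption raised above, that double and uint64_t share the host's byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits wide");

// Decodes an 8-byte big-endian IEEE 754 double starting at `offset`.
double decodeDoubleBE(const std::vector<uint8_t>& buf, size_t offset) {
  assert(offset <= buf.size() && buf.size() - offset >= 8);
  uint64_t bits = 0;
  for (size_t i = 0; i < 8; i++) {
    bits = (bits << 8) | buf[offset + i];
  }
  // Copy the object representation instead of type-punning through a union.
  double value;
  std::memcpy(&value, &bits, sizeof(value));
  return value;
}
```

In C++20, std::bit_cast expresses the same conversion directly.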

@brian-pane
Contributor

brian-pane left a comment

Here’s my first batch of comments on the .cc files. I’ll read more tomorrow.

review comments (static_assert, throw on underflow)
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
};

/**
* BinaryProtocolImpl implements the Thrift Binary protocol with non-strict (e.g. lax) message


@rgs1

rgs1 Jun 4, 2018

Contributor

nit: LaxBinaryProtocolImpl...

};

/**
* BinaryProtocolImpl implements the Thrift Binary protocol with non-strict (e.g. lax) message


@rgs1

rgs1 Jun 4, 2018

Contributor

nit: LaxBinaryProtocolImpl

zuercher added some commits Jun 4, 2018

use memcpy instead of the union trick
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
fix typo
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
link slides instead of video
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>

uint16_t version = BufferHelper::peekU16(buffer);
if (BinaryProtocolImpl::isMagic(version)) {
setProtocol(std::make_unique<BinaryProtocolImpl>());


@brian-pane

brian-pane Jun 5, 2018

Contributor

Is it possible to determine at this point in the processing whether we need a LaxBinaryProtocolImpl?


@zuercher

zuercher Jun 5, 2018

Member

The only guarantee that the non-strict binary protocol makes is that the leading bit is 0, so misidentifying another protocol seems pretty likely, and the result would be interpreting the first four bytes as the message name length. For instance, with auto-transport and auto-protocol, an errant HTTP request would be interpreted as an unframed, non-strict binary thrift message with a message name of ~1.1 GB.

I think longer term, we'll want to have a configurable limit on the size of encoded strings (perhaps separate limits for message names vs. fields). With that in place, I'd feel more comfortable allowing the non-strict binary protocol to be detected automatically.
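The ~1.1 GB figure can be checked directly. Under the non-strict binary protocol, the first four bytes are read as a big-endian name length; for an HTTP request beginning "GET ", those bytes are 0x47455420, or 1,195,725,856 bytes (about 1.1 GiB). The sketch below computes that value and shows the kind of size limit proposed above; laxNameLength, plausibleLaxStart, and the limit values are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Interprets the first four bytes as a lax binary message-name length.
// The caller must supply at least four bytes.
uint32_t laxNameLength(const std::string& bytes) {
  uint32_t len = 0;
  for (size_t i = 0; i < 4; i++) {
    len = (len << 8) | static_cast<uint8_t>(bytes[i]);
  }
  return len;
}

// A lax message must start with a 0 bit; a configurable ceiling on the name
// length rejects inputs, such as HTTP request lines, that satisfy only that.
bool plausibleLaxStart(const std::string& bytes, uint32_t max_name_length) {
  if (bytes.size() < 4 || (static_cast<uint8_t>(bytes[0]) & 0x80) != 0) {
    return false;
  }
  return laxNameLength(bytes) <= max_name_length;
}
```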

@brian-pane
Contributor

brian-pane left a comment

Here's my next batch of notes. The last thing remaining for me to read is compact_protocol.cc, which I'll try to get finished in the next day.

zuercher added some commits Jun 5, 2018

remove size_t, fix offset handling in BufferHelper, test underflows
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
missed one underflow
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
remove stray TODO comment
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
@brian-pane
Contributor

brian-pane left a comment

I just finished reading through the rest of the PR, and it looks good to me.

@mattklein123
Member

mattklein123 left a comment

I skimmed and it all looks sane to me. I'm sure we can flush out any issues once we start putting traffic through it. Very nice!

@zuercher zuercher merged commit 2c37930 into envoyproxy:master Jun 7, 2018

12 checks passed

DCO All commits have a DCO sign-off from the author
ci/circleci: api Your tests passed on CircleCI!
ci/circleci: asan Your tests passed on CircleCI!
ci/circleci: build_image Your tests passed on CircleCI!
ci/circleci: coverage Your tests passed on CircleCI!
ci/circleci: docs Your tests passed on CircleCI!
ci/circleci: filter_example_mirror Your tests passed on CircleCI!
ci/circleci: format Your tests passed on CircleCI!
ci/circleci: ipv6_tests Your tests passed on CircleCI!
ci/circleci: mac Your tests passed on CircleCI!
ci/circleci: release Your tests passed on CircleCI!
ci/circleci: tsan Your tests passed on CircleCI!

@zuercher zuercher deleted the turbinelabs:stephan/thrift-primitives branch Jul 19, 2018
