Protocol Changes #8

Closed
wants to merge 7 commits into from

@llchan llchan commented Nov 1, 2019

This PR is a proposal for two protocol-level changes. Some stuff is not pushed yet, just getting this PR started.

  • Use Flatbuffers rather than Protobuf. Preliminary tests are looking promising (1.6x higher throughput in publishing, for example), and I think the added control over allocations and encoding/decoding logic will be beneficial in pushing performance. The tradeoff is that fewer people are familiar with Flatbuffers, but the Go and C++ code generators at least have a protobuf-like object API that makes it pretty intuitive for protobuf folks, and honestly the lower-level message builder is pretty straightforward as well.
  • Add a NATS message envelope, which every Liftbridge message should use when communicating over NATS. This affords us internal consistency as well as safety if other messages are sent on those subjects.

@llchan
Author

llchan commented Nov 1, 2019

A few comments and open questions:

  • I think it makes sense to make the Message as lean as possible, and additionally keep it immutable through its lifecycle. For publishing, we can include ack inbox/policy etc. in the publish request, and for subscriptions we include metadata in what I'm calling SubscriptionMessage.
  • I'm not sure reply and message headers need to be in Liftbridge. Those seem like things that should live inside the value (aka payload). The key makes sense, because it is used for log compaction, but the others seem application-level to me. I think we should keep Liftbridge as lean as possible unless it needs to know about these fields. Thoughts?

Member

@tylertreat tylertreat left a comment

lgtm, only real question is around MsgType.

README.md Outdated
| MsgType | Description |
| ------- | ----------- |
| 0 | Publish |
| 1 | Replication |
Member

I question the value of specifying the message type. Do you have a use for this information?

Author

I thought we were considering using the same envelope for the replication messages as well, in which case we need to know which message type to decode the payload with.

Member

How would we be handling decoding differently between publishes vs. replication? In either case we should know if it's a publish or a replicated message based on the context in which the message is received, right?

Author

Yes, I suppose maybe I'm just being overly cautious. If the user happens to send liftbridge-encoded publishes over the replication subject, it could be catastrophic. But maybe we can just choose a different magic number for the replication header and call it a day? I'm fine with that, and we can keep this byte reserved for better use cases down the road.

Member

Different magic number makes sense to me, but I can be convinced otherwise.
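
A minimal sketch of the "different magic number" idea discussed above, assuming a hypothetical 4-byte prefix on every envelope. The byte values and the names `magicPublish`, `magicReplication`, `wrap`, and `unwrap` are invented for illustration and are not taken from this PR.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// Hypothetical magic prefixes; the real values are not specified in this thread.
var (
	magicPublish     = []byte{0xB9, 0x0E, 0x43, 0xB4}
	magicReplication = []byte{0xB9, 0x0E, 0x43, 0xB5}
)

var errBadMagic = errors.New("envelope: unexpected magic number")

// wrap prepends the given magic prefix to the payload.
func wrap(magic, payload []byte) []byte {
	out := make([]byte, 0, len(magic)+len(payload))
	out = append(out, magic...)
	return append(out, payload...)
}

// unwrap returns the payload if data starts with the expected magic prefix,
// and errBadMagic otherwise.
func unwrap(expected, data []byte) ([]byte, error) {
	if !bytes.HasPrefix(data, expected) {
		return nil, errBadMagic
	}
	return data[len(expected):], nil
}

func main() {
	msg := wrap(magicPublish, []byte("hello"))

	// A publish envelope arriving on the replication path is rejected
	// before any payload decoding is attempted.
	_, err := unwrap(magicReplication, msg)
	fmt.Println("replication path:", err)

	payload, err := unwrap(magicPublish, msg)
	fmt.Println("publish path:", string(payload), err)
}
```

With this layout the receiver picks the expected prefix from the subject it is listening on, so no separate MsgType byte is needed to tell the two message classes apart.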

README.md Outdated
## NATS Message Envelope

Every Liftbridge message that is sent over NATS should be sent with the following
envelope header:
Member

It's important that we still support "plain" NATS messages for cases where people just want transparent recording of NATS subjects without publisher code changes. The trade-off obviously is that you give up the features of the envelope (key, acking, etc.), but that's the expectation.

Author

Yes, absolutely. I should be more clear in the wording here. The plain messages on subjects with streams attached will be treated as value-only messages with no acks etc, just like you have it now. This note is only referring to the Liftbridge-generated messages sent over NATS.
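
The fallback described here could look roughly like the following sketch. It assumes a hypothetical magic prefix marks enveloped messages; `envelopeMagic`, `received`, and `classify` are illustrative names, not part of the PR.

```go
package main

import (
	"bytes"
	"fmt"
)

// Hypothetical envelope magic prefix, for illustration only.
var envelopeMagic = []byte{0xB9, 0x0E, 0x43, 0xB4}

// received is a simplified view of a message taken off a NATS subject.
type received struct {
	Enveloped bool   // true if the data carried the envelope prefix
	Body      []byte // envelope body, or the raw NATS payload for plain messages
}

// classify treats data without the magic prefix as a plain, value-only
// message, so existing NATS publishers need no code changes.
func classify(data []byte) received {
	if bytes.HasPrefix(data, envelopeMagic) {
		return received{Enveloped: true, Body: data[len(envelopeMagic):]}
	}
	return received{Enveloped: false, Body: data}
}

func main() {
	plain := classify([]byte("plain NATS payload"))
	fmt.Println(plain.Enveloped, string(plain.Body))

	wrapped := classify(append(append([]byte{}, envelopeMagic...), []byte("body")...))
	fmt.Println(wrapped.Enveloped, string(wrapped.Body))
}
```

One caveat of any prefix-based scheme: a plain payload that happens to begin with the same bytes would be misread as an envelope, which is an argument for a longer or less likely prefix.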

@tylertreat
Member

To answer your questions...

> I think it makes sense to make the Message as lean as possible, and additionally keep it immutable through its lifecycle. For publishing, we can include ack inbox/policy etc. in the publish request, and for subscriptions we include metadata in what I'm calling SubscriptionMessage.

I'm not sure we can do this. What of the case where you publish a Message directly to NATS, not through Liftbridge's Publish API?

> I'm not sure reply and message headers need to be in Liftbridge. Those seem like things that should live inside the value (aka payload). The key makes sense, because it is used for log compaction, but the others seem application-level to me. I think we should keep Liftbridge as lean as possible unless it needs to know about these fields. Thoughts?

Reply is a NATS concept which Liftbridge includes on the message in order to expose the reply on the original NATS message. This is mainly for cases where publishers are publishing plain, Liftbridge-agnostic NATS messages. We want to expose the info set on the original message.

Headers I'm torn on since this was a frequently requested feature in NATS Streaming and Kafka supports it.

@llchan
Author

llchan commented Nov 1, 2019

  • The first note about refactoring the Message is mostly internal to the client/server encoding. The user would still interact with the Publish API or NewMessage function, it would just be packed differently in the flatbuffers, such that the Message is self-contained and contains no pointers to anything outside of it (i.e. it is memcpy-able opaquely). Perhaps this will be more clear once I push up the next batch.
  • Gotcha, did not realize NATS has a built-in reply concept. That makes sense to keep as a first-class field.
  • If headers are very commonly-requested we can keep them first-class as well. Now that I think about it, if we ever want to do application-agnostic header-based filtering/routing we would need that so we should leave that possibility open.

@tylertreat
Member

> The first note about refactoring the Message is mostly internal to the client/server encoding. The user would still interact with the Publish API or NewMessage function, it would just be packed differently in the flatbuffers, such that the Message is self-contained and contains no pointers to anything outside of it (i.e. it is memcpy-able opaquely). Perhaps this will be more clear once I push up the next batch.

I think I follow now.

> Now that I think about it, if we ever want to do application-agnostic header-based filtering/routing we would need that so we should leave that possibility open.

Yeah, that is what I was thinking as well. Headers give us a lot of flexibility to do interesting things in the future. Routing/filtering is a good example.
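
To make the filtering example concrete, here is a small sketch that selects messages by a header value without looking inside the payload. It assumes headers are exposed as a string-to-bytes map; the `message` type and both function names are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
)

// message is a hypothetical, simplified message with first-class headers.
type message struct {
	Headers map[string][]byte
	Value   []byte
}

// hasHeader reports whether m carries the header key with exactly the value want.
func hasHeader(m message, key string, want []byte) bool {
	got, ok := m.Headers[key]
	return ok && bytes.Equal(got, want)
}

// filterByHeader keeps only the messages whose header matches, without
// ever inspecting the application-level Value.
func filterByHeader(msgs []message, key string, want []byte) []message {
	var out []message
	for _, m := range msgs {
		if hasHeader(m, key, want) {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	msgs := []message{
		{Headers: map[string][]byte{"region": []byte("eu")}, Value: []byte("a")},
		{Headers: map[string][]byte{"region": []byte("us")}, Value: []byte("b")},
		{Value: []byte("c")}, // no headers at all
	}
	for _, m := range filterByHeader(msgs, "region", []byte("eu")) {
		fmt.Println(string(m.Value))
	}
}
```

The point of the sketch is that this kind of routing is only application-agnostic if headers live outside the opaque value.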

@llchan
Author

llchan commented Nov 1, 2019

I'm also starting to look at the on-disk encoding, which I think can be unified with the wire protocol.
One thing I didn't quite understand: what are Attributes? It doesn't currently exist in the wire protocol, just want to make sure I'm not missing something.

@tylertreat
Member

> One thing I didn't quite understand: what are Attributes? It doesn't currently exist in the wire protocol, just want to make sure I'm not missing something.

The thought was to use attributes for flags such as compression, e.g. a compression codec. The on-disk protocol could definitely use some cleaning up though.

@llchan
Author

llchan commented Nov 1, 2019

Ah got it, good idea. We can probably wrap the messages in an on-disk flatbuffers message, so that the only things we need to manually encode are the size and CRC-32. I think that will be more future-proof in the event that we want to add more metadata to the message (e.g. compression algorithm). Might also be good to add a header to the segment for segment-level configuration---arguably compression would be able to do more with large batches of messages, vs just within a single message. Anyways I'm kind of digressing, should keep this PR focused on the client-facing API.
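
A rough sketch of the manual part of that framing, where only the size and CRC-32 are hand-encoded around an opaque body. The layout (4-byte big-endian size, 4-byte IEEE CRC-32, then the body) and the function names are assumptions for illustration, not Liftbridge's actual on-disk format.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

// Hypothetical frame layout: size (4 bytes) | CRC-32 of body (4 bytes) | body.
const frameHeaderLen = 8

var errCorrupt = errors.New("frame: size or checksum mismatch")

// encodeFrame wraps an already-serialized message body without parsing it.
func encodeFrame(body []byte) []byte {
	buf := make([]byte, frameHeaderLen+len(body))
	binary.BigEndian.PutUint32(buf[0:4], uint32(len(body)))
	binary.BigEndian.PutUint32(buf[4:8], crc32.ChecksumIEEE(body))
	copy(buf[frameHeaderLen:], body)
	return buf
}

// decodeFrame validates the size and checksum and returns the body as a
// sub-slice of buf, so the body is never copied or unpacked here.
func decodeFrame(buf []byte) ([]byte, error) {
	if len(buf) < frameHeaderLen {
		return nil, errCorrupt
	}
	size := binary.BigEndian.Uint32(buf[0:4])
	sum := binary.BigEndian.Uint32(buf[4:8])
	if uint64(len(buf)-frameHeaderLen) < uint64(size) {
		return nil, errCorrupt
	}
	body := buf[frameHeaderLen : frameHeaderLen+int(size)]
	if crc32.ChecksumIEEE(body) != sum {
		return nil, errCorrupt
	}
	return body, nil
}

func main() {
	frame := encodeFrame([]byte("serialized message"))
	body, err := decodeFrame(frame)
	fmt.Println(string(body), err)

	frame[len(frame)-1] ^= 0xFF // simulate on-disk corruption
	_, err = decodeFrame(frame)
	fmt.Println(err)
}
```

Everything inside the body, including any future metadata such as a compression codec, would then be governed by the flatbuffers schema rather than by hand-written encoding.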

Member

@tylertreat tylertreat left a comment

lgtm, one question around the use of required.

api.fbs Outdated
// CreateStreamRequest is sent to create a new stream.
table CreateStreamRequest {
subject : string (id: 0); // Stream NATS subject
name : string (id: 1); // Stream name (unique per subject)
Member

I'm not very familiar with flatbuffers, but why not mark more fields as required?

Author

Required is kind of a very strong guarantee that the field will never change, and the idea of using it very sparingly is one of the lessons learned from protobuf (they removed required fields entirely in proto3, see [1] and the quote below). Once something is required, it can never be modified without changing all clients at the same time, which defeats the purpose of backwards-compatibility of schema changes. In fact, I'd say maybe we should go the other way and mark nothing as required here. I have it on a few key fields but maybe for upgradeability I should remove those.

> We dropped required fields in proto3 because required fields are generally considered harmful and violating protobuf's compatibility semantics. The whole idea of using protobuf is that it allows you to add/remove fields from your protocol definition while still being fully forward/backward compatible with newer/older binaries. Required fields break this though. You can never safely add a required field to a .proto definition, nor can you safely remove an existing required field because both of these actions break wire compatibility. For example, if you add a required field to a .proto definition, binaries built with the new definition won't be able to parse data serialized using the old definition because the required field is not present in old data. In a complex system where .proto definitions are shared widely across many different components of the system, adding/removing required fields could easily bring down multiple parts of the system. We have seen production issues caused by this multiple times and it's pretty much banned everywhere inside Google for anyone to add/remove required fields. For this reason we completely removed required fields in proto3.

[1] protocolbuffers/protobuf#2497

Member

Got it. I'm ok with removing required altogether.
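
With `required` dropped from the schema, presence checks would have to move into application code. A minimal sketch of that, assuming a hand-written Go struct standing in for the generated object API (the struct and `validate` method are hypothetical; only the two field names come from the api.fbs excerpt above):

```go
package main

import (
	"errors"
	"fmt"
)

// createStreamRequest is a hypothetical stand-in for the generated type,
// mirroring the subject and name fields from the schema excerpt.
type createStreamRequest struct {
	Subject string
	Name    string
}

// validate enforces presence at the application level, so the rule can be
// relaxed or changed later without breaking wire compatibility.
func (r createStreamRequest) validate() error {
	if r.Subject == "" {
		return errors.New("create stream: subject must be set")
	}
	if r.Name == "" {
		return errors.New("create stream: name must be set")
	}
	return nil
}

func main() {
	ok := createStreamRequest{Subject: "foo", Name: "foo-stream"}
	fmt.Println(ok.validate())

	missing := createStreamRequest{Name: "foo-stream"}
	fmt.Println(missing.validate())
}
```

The tradeoff is that a malformed request is rejected by the handler instead of by the decoder, which keeps old and new binaries able to parse each other's data.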

@llchan
Author

llchan commented Nov 6, 2019

Also I'm overseas for a work trip at the moment but I'll try to carve out some time this weekend to get some of the remaining things I have in my worktree pushed up. Hopefully we can get this closer to merge-ready sooner rather than later so everyone has some time to absorb the changes.

@llchan
Author

llchan commented Nov 20, 2019

Hey sorry didn't get a chance to take a look while I was away but I'm back now so will give this some attention over the next few days.

Most notably, we make the core message payload its own table, so that it
is self-contained and wrapper messages can simply memcpy it around.
@tylertreat
Member

@llchan Wondering what the state of this is?

@llchan
Author

llchan commented Dec 17, 2019

Was running around a bit for Thanksgiving + a conference, but can pick up on this tomorrow. Last I left off I was reading through the existing on-disk serialization code for the liftbridge server. The implementation there may influence the way we should structure our messages---ideally we can memcpy blobs without parsing/unpacking, and possibly keep the door open for compression/mmap in the future. I think the current state of the PR already includes some adjustments towards this end, but will need to review.

EARLIEST = 2, // Start at the oldest message
LATEST = 3, // Start at the newest message
TIMESTAMP = 4, // Start at a specified (ack) timestamp
}
Contributor

@caioaao caioaao Jan 3, 2020

IMO the start position could be a union instead. This should make consistency easier (e.g. not setting an offset when the start position type is timestamp).

@tylertreat
Member

Closed in favor of #14.

@tylertreat tylertreat closed this Feb 14, 2020