Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MessageId (progress towards cleaning up transform invariants) #1476

Merged
merged 2 commits into from
Feb 16, 2024

Conversation

rukai
Copy link
Member

@rukai rukai commented Feb 14, 2024

This PR introduces a MessageId type and assigns a unique MessageId to every message that goes through shotover.
Additionally responses now hold the MessageId of their corresponding request.

The API for this is exposed as methods on the Message type:

impl Message {
    /// Return the shotover assigned MessageId
    pub fn id(&self) -> MessageId {
        self.id
    }

    /// Return the MessageId of the request that resulted in this message
    /// Returns None when:
    /// * The message is a request
    /// * The message is a response but was not created in response to a request. e.g. Cassandra events and redis pubsub
    pub fn request_id(&self) -> Option<MessageId> {
        self.request_id
    }
}

This will eventually replace the "requests are in the same order as responses" invariant but for now it is just in addition to it.

I updated the Transform::transform docs where we document such invariants and it is rendered below.
The important sections are "Request/Response invariants" and "Deprecated invariants".

image

I have not observed any regressions in benchmark results with this PR and the call to rand doesnt even show up when profiling.

Benefits

  • Improves correctness - Gives us a path forward to resolve some old invariant violations in a performant way Possible design flaw - Filter, Protect etc. do not follow transform invariant #499
  • Allows future performance improvements:
    • We can return some responses before all of them have been returned.
    • We can start sending a new batch of requests before we have any responses back.
    • Better handle out of order protocols (currently only cassandra)
  • Will allow us to simplify handling of pubsub/event messages which are currently special cased via transform_pushed
  • will allow us to simplify cassandra connection logic, a lot of complexity in there to pretend cassandra is not an out of order protocol
  • will allow us to simplify redis pubsub implementation

Downsides

  • The filter transform is now very awkward to implement, we'll have to keep track of the message id of the previous message in order to know where to insert the error message.
    • Maybe in the future we should clean up the filter transform by having relaxed invariants in subchains. Filter is only useful in a subchain anyway and we could make it much more efficient this way by skipping the generation of dummy error messages.
    • Or another possible solution, replace "to be deleted" messages with a dummy frame. That way the codec can just respond with a dummy response and we have a dummy response that we can replace with the error. This would actually be quite efficient as we dont need to insert or remove from the vec of messages.
  • for some other transforms a hashmap of message ids is slightly more complicated than a zip with requests
  • hashmap look ups are slightly more expensive than a zip with requests - But I believe the other performance wins will largely outweigh this.

Review

I've included a port of the protect and CassandraSinkCluster transforms to the message id approach, but I've left other ports out of the PR to keep scope down.
I am fairly confident that this approach will work well for all of our transforms.
But if you have a specific transform you would like me to include in the PR to see how it fares, just let me know.

With "hide whitespace" enabled the protect transform has a very small diff, so for the typical user implemented rewrite transform not much has changed.

Copy link

0 benchmark regressed. 1 benchmark improved. Please check the benchmark workflow logs for full details: https://github.com/shotover/shotover-proxy/actions/runs/7894953323

Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
cassandra_codec/encode_system.local_result_v5_no_compression
                        time:   [19.358 µs 19.771 µs 20.181 µs]
                        change: [-26.889% -23.661% -20.354%] (p = 0.00 < 0.05)
                        Performance has improved.

@rukai rukai force-pushed the message_id branch 4 times, most recently from 9040996 to be8a55d Compare February 14, 2024 08:07
@rukai rukai marked this pull request as ready for review February 14, 2024 08:52
@rukai rukai requested a review from conorbros February 14, 2024 08:53
@shotover shotover deleted a comment from github-actions bot Feb 15, 2024
@shotover shotover deleted a comment from github-actions bot Feb 15, 2024
@shotover shotover deleted a comment from github-actions bot Feb 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants