
Pull lots of generated code out into the gorums package #99

Merged: 46 commits, Jan 7, 2021

Conversation

@johningve (Member) commented Jul 31, 2020

This PR moves as much of the generated code as possible out into the gorums package. What remains in the generated code is mostly thin wrappers that convert between the application types and the generic ProtoMessage types.

Fixes #61
Fixes #42 (probably)
Related: #17

This moves as much as possible of the static code that used to be shipped alongside the generated code into the gorums package. I have also refactored the quorum call so that its implementation lives in the gorums package as well. All that remains in the generated code are wrappers that convert between application and generic types.
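For illustration, a hypothetical sketch of the kind of thin wrapper that stays in the generated code; the Configuration/ReadQC names, the QuorumCallData fields, and the gorumsConfig field are assumptions, not the actual generated output:

// Hypothetical generated wrapper: it only converts between the application's
// concrete request/response types and the generic quorum call in the gorums package.
func (c *Configuration) ReadQC(ctx context.Context, in *ReadRequest) (*ReadResponse, error) {
	cd := gorums.QuorumCallData{
		Message: in,                   // application type passed on as a generic proto message
		Method:  "dev.Storage.ReadQC", // full method name used for routing
	}
	res, err := c.gorumsConfig.QuorumCall(ctx, cd) // generic implementation lives in gorums
	if err != nil {
		return nil, err
	}
	return res.(*ReadResponse), nil // convert the generic reply back to the application type
}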
johningve and others added 11 commits July 31, 2020 12:58
Gorums should generate regular RPCs when no method options are specified
The encoding and reconnect tests have been removed, as they relied on unexported fields and methods from the generated code. They can probably be reimplemented in the future.
Gorums and gRPC can no longer be used in the same service.
First number should be 1 since servers start at 0
This shows the cost of the map variant vs. the slice variant: about 2x slower.
Got it working this time: an init function in the generated code calls RegisterCodec. In addition, the Manager in the static code specifies the content subtype. As a bonus, we no longer need to get the methodInfo from the generated code.
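For context, a rough sketch of how codec registration and content-subtype selection fit together using gRPC's public API; the codec type and the "gorums" subtype name are assumptions for illustration, not the actual gorums code:

package gorumscodec // hypothetical package for illustration

import (
	"fmt"

	"google.golang.org/grpc"
	"google.golang.org/grpc/encoding"
	"google.golang.org/protobuf/proto"
)

// codec implements encoding.Codec for protobuf messages under a custom subtype.
type codec struct{}

func (codec) Name() string { return "gorums" } // assumed subtype name

func (codec) Marshal(v interface{}) ([]byte, error) {
	m, ok := v.(proto.Message)
	if !ok {
		return nil, fmt.Errorf("cannot marshal %T: not a proto.Message", v)
	}
	return proto.Marshal(m)
}

func (codec) Unmarshal(data []byte, v interface{}) error {
	m, ok := v.(proto.Message)
	if !ok {
		return fmt.Errorf("cannot unmarshal into %T: not a proto.Message", v)
	}
	return proto.Unmarshal(data, m)
}

// The generated code registers the codec at package init time ...
func init() {
	encoding.RegisterCodec(codec{})
}

// ... and the Manager (static code) selects it via the content subtype,
// so gRPC uses the registered codec for every call on the connection.
func managerDialOptions() []grpc.DialOption {
	return []grpc.DialOption{
		grpc.WithDefaultCallOptions(grpc.CallContentSubtype("gorums")),
	}
}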
@meling (Member) left a comment:


Looks great!

Review comments on: benchmark/benchmark.go, cmd/benchmark/main.go, cmd/protoc-gen-gorums/dev/config.go, tests/metadata/metadata_test.go, tests/ordering/order_test.go
On the deleted reconnect test (diff @@ -1,60 +0,0 @@, package reconnect), a member commented:

Don't we want this?

Review comment on: tests/tls/tls_test.go
If the server stops calling RecvMsg(), the SendMsg() call will
eventually start blocking. If this happens, the sendQ channels will
eventually fill up and block the QC.

To prevent this, we can use the QC context to abort the stream if
SendMsg() is blocking. The stream will then try to reconnect.
When a server is not calling RecvMsg(), the calls to SendMsg() will
eventually block. This test makes sure that we don't block forever.
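In other words, an enqueue onto a full sendQ must respect the QC context so the quorum call cannot block forever. A minimal sketch of that idea (the channel and message type names are assumptions):

// enqueue places msg on the node's send queue, but gives up if the QC context
// is done first, e.g. because the stream is stuck and the queue has filled up.
func enqueue(ctx context.Context, sendQ chan<- *gorumsMessage, msg *gorumsMessage) error {
	select {
	case sendQ <- msg:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}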
johningve and others added 12 commits October 8, 2020 23:40
The gRPC documentation states that it is unsafe to call CloseSend
concurrently with SendMsg. Instead, we can use context cancellation to
stop the stream. This requires making a new context each time we start
the stream.

In addition, I realized that orderedNodeStream.sendMsg() can run the
gRPC SendMsg() directly instead of in a goroutine, with the logic to
cancel the stream running in a goroutine instead. This seems to improve
performance.
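A rough sketch of that structure; the nodeStream type, its fields, and cancelStream are assumed names, not the actual orderedNodeStream code:

// sendMsg runs gRPC's SendMsg on the calling goroutine; a small helper
// goroutine cancels the stream if the request context is done before the
// send completes, which unblocks SendMsg and lets the stream reconnect.
func (s *nodeStream) sendMsg(req request) error {
	done := make(chan struct{})
	go func() {
		select {
		case <-req.ctx.Done():
			s.cancelStream() // aborts the stream so SendMsg returns with an error
		case <-done:
			// SendMsg finished normally; nothing to cancel
		}
	}()
	err := s.stream.SendMsg(req.msg)
	close(done)
	return err
}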
os.Rename fails if src and dest are on different devices.
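For reference, the usual workaround is to fall back to copy-and-remove when the rename fails across devices; a sketch, not necessarily the exact fix in this commit:

import (
	"io"
	"os"
)

// moveFile renames src to dst, falling back to copy+remove when the two
// paths live on different devices and os.Rename fails (typically EXDEV).
func moveFile(src, dst string) error {
	if err := os.Rename(src, dst); err == nil {
		return nil
	}
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()
	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	if _, err := io.Copy(out, in); err != nil {
		out.Close()
		return err
	}
	if err := out.Close(); err != nil {
		return err
	}
	return os.Remove(src)
}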
This makes the unicast and multicast calltypes use a timeout that is configured through the new WithSendTimeout() manager option.

This is done instead of accepting a context from the caller, because it
is not possible for the caller to know when to correctly cancel the
context.
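A sketch of how such an option could be wired up; WithSendTimeout is the name from this commit, while the surrounding types and the enqueue helper are assumptions:

import (
	"context"
	"time"

	"google.golang.org/protobuf/proto"
)

// managerOptions and ManagerOption stand in for the real gorums option types.
type managerOptions struct {
	sendTimeout time.Duration
}

type ManagerOption func(*managerOptions)

// WithSendTimeout sets how long a unicast/multicast send may take before the
// internally created context expires and the stream is torn down.
func WithSendTimeout(timeout time.Duration) ManagerOption {
	return func(o *managerOptions) { o.sendTimeout = timeout }
}

// Inside the call type, the timeout context replaces a caller-supplied one.
// Exactly when cancel should run is the subtle part, as the discussion below shows.
func (n *Node) unicast(msg proto.Message, opts managerOptions) {
	ctx, cancel := context.WithTimeout(context.Background(), opts.sendTimeout)
	n.enqueue(ctx, cancel, msg) // hypothetical: cancel runs once the message is sent or the stream is reset
}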
@meling mentioned this pull request Dec 21, 2020
@relab deleted a comment from johningve Dec 22, 2020
@meling (Member) commented Dec 22, 2020

This is a re-posting of @Raytar's comment, which was unexpectedly deleted by a bug in GitHub.

I've been thinking about the change done in 8b3912e, and I'm not sure if it's best to keep it or revert it.

Option 1: To sum it up, the commit removes the context parameter from the unicast and multicast calltypes, and instead creates a timeout context internally before sending the message. This timeout can be set by using a manager option. I am not a fan of needing timeouts everywhere, which is why I am now reconsidering this commit.

These contexts are needed to abort the stream if it gets stuck (if the receiving node stops receiving messages or is very slow). The reason I removed the context parameter is that the code that calls the calltype does not know whether or not a message has been sent, making it difficult to cancel the context correctly.

Before this commit, the calling code would have to deal with the cancel function somehow.
One solution would be to ignore the cancel:

ctx, _ := context.WithTimeout(context.Background(), 10 * time.Millisecond)
node.Unicast(ctx, msg)

But now we get a govet warning:

the cancel function returned by context.WithTimeout should be called, not discarded, to avoid a context leak

Looking at the source code for the timeout context, the context is obviously canceled when the timeout happens, and so I assume that the code above does not actually leak a context after the timeout. However, it is not ideal to ignore the documentation.

Option 2: Another solution would be to keep the cancel function and run it at a time when the message is assumed to be sent. For example, in HotStuff, votes are sent to the leader using a unicast. We could store the context cancel function until the next time the node votes, at which point it is safe to cancel the last context because it doesn't matter whether or not the previous vote was sent. This way we might be able to use a cancel context instead of a timeout context as well. It would look like this:

// ctx and cancel are stored outside the handler and kept between proposals.
func handlePropose(msg Proposal) {
  // prepare vote
  ...
  // cancels the context from the previous proposal.
  // ensures that the stream will be reconnected if it got stuck,
  // so that the next vote can be put in the queue without blocking.
  cancel()
  ctx, cancel = context.WithCancel(context.Background())
  leader.Vote(ctx, vote)
}

Option 3: Another possible solution is to make the Unicast/Multicast methods in Gorums block until the message is actually sent. I haven't looked into how this could be done, but then the context can simply be canceled as soon as the method returns. But this might create a problem in HotStuff: Currently, a proposal is handled synchronously for each leader, and the votes are sent at the end of the handlers. But if the voting becomes a blocking function, then the node may not be able to handle another message from the proposer until the vote times out and unblocks.
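A sketch of what the calling code might look like under option 3; the blocking behavior of Unicast is hypothetical here:

ctx, cancel := context.WithCancel(context.Background())
// with a blocking Unicast, this call would only return once the message has
// actually been sent (hypothetical behavior, not the current API) ...
node.Unicast(ctx, msg)
// ... so it would be safe to cancel right away: no leak and no govet warning.
cancel()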

I am leaning towards reverting this commit for now and implementing the second solution (canceling the previous context when the next message is being prepared) in HotStuff.

@meling (Member) commented Dec 22, 2020

Option 2: I like this option for the HotStuff scenario, and similar scenarios, where the next call to the same method actually cancels/overrides any previous calls. Hence, this should work for scenarios like the Propose scenario in HotStuff and idempotent messages. The question I have is whether or not there are scenarios where that's not the desired behavior.

Option 3: This might be possible now that we no longer depend on gRPC for this stuff; we might actually know when a message has been sent, which wasn't possible in gRPC. I haven't studied this in-depth.

Messages can optionally be sent asynchronously by using the
WithAsyncSend call option.
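A hedged usage sketch; the exact signatures of the generated Multicast method and the WithAsyncSend option are assumed:

// without the option, the send is handled synchronously
config.Multicast(ctx, msg)

// with the call option, the message is sent asynchronously
config.Multicast(ctx, msg, gorums.WithAsyncSend())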
@meling merged commit 75dce3a into relab:master Jan 7, 2021
@johningve deleted the static-code-refactor branch January 8, 2021 11:35
Successfully merging this pull request may close these issues:
Allow creating multiple overlapping configurations
api: make wrapper(s) to simplify setup