
Pull lots of generated code out into the gorums package #99

Merged: 46 commits, Jan 7, 2021

Conversation

@johningve (Member) commented Jul 31, 2020

This PR moves as much of the generated code as possible out into the gorums package. What remains in the generated code is mostly thin wrappers that convert between the application types and the generic ProtoMessage types.

Fixes #61
Fixes #42 (probably)
Related: #17

This moves as much as possible of the static code that used to be shipped alongside the generated code into the gorums package. I have also refactored the quorum call so that its implementation lives in the gorums package as well. All that remains in the generated code are wrappers that convert between application and generic types.
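For illustration, a hypothetical sketch of the kind of thin wrapper that stays in the generated code; the Configuration/ReadQC names, the QuorumCallData fields, and the gorumsConfig field are assumptions, not the actual generated output:

// Hypothetical generated wrapper: it only converts between the application's
// concrete request/response types and the generic quorum call in the gorums package.
func (c *Configuration) ReadQC(ctx context.Context, in *ReadRequest) (*ReadResponse, error) {
	cd := gorums.QuorumCallData{
		Message: in,                   // application type passed on as a generic proto message
		Method:  "dev.Storage.ReadQC", // full method name used for routing
	}
	res, err := c.gorumsConfig.QuorumCall(ctx, cd) // generic implementation lives in gorums
	if err != nil {
		return nil, err
	}
	return res.(*ReadResponse), nil // convert the generic reply back to the application type
}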
johningve and others added 11 commits July 31, 2020 12:58
Gorums should generate regular RPCs when no method options are specified
The encoding and reconnect tests have been removed, as they relied on unexported fields and methods from the generated code. They can probably be reimplemented in the future.
Gorums and gRPC can no longer be used in the same service.
First number should be 1 since servers start at 0
This shows the cost of the map variant vs. the slice variant: about 2x slower.
Got it working this time: an init function in the generated code calls RegisterCodec. In addition, the Manager in the static code specifies the content subtype. As a bonus, we no longer need to get the methodInfo from the generated code.
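For context, a rough sketch of how codec registration and content-subtype selection fit together using gRPC's public API; the codec type and the "gorums" subtype name are assumptions for illustration, not the actual gorums code:

package gorumscodec // hypothetical package for illustration

import (
	"fmt"

	"google.golang.org/grpc"
	"google.golang.org/grpc/encoding"
	"google.golang.org/protobuf/proto"
)

// codec implements encoding.Codec for protobuf messages under a custom subtype.
type codec struct{}

func (codec) Name() string { return "gorums" } // assumed subtype name

func (codec) Marshal(v interface{}) ([]byte, error) {
	m, ok := v.(proto.Message)
	if !ok {
		return nil, fmt.Errorf("cannot marshal %T: not a proto.Message", v)
	}
	return proto.Marshal(m)
}

func (codec) Unmarshal(data []byte, v interface{}) error {
	m, ok := v.(proto.Message)
	if !ok {
		return fmt.Errorf("cannot unmarshal into %T: not a proto.Message", v)
	}
	return proto.Unmarshal(data, m)
}

// The generated code registers the codec at package init time ...
func init() {
	encoding.RegisterCodec(codec{})
}

// ... and the Manager (static code) selects it via the content subtype,
// so gRPC uses the registered codec for every call on the connection.
func managerDialOptions() []grpc.DialOption {
	return []grpc.DialOption{
		grpc.WithDefaultCallOptions(grpc.CallContentSubtype("gorums")),
	}
}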
@meling (Member) left a comment:


Looks great!

Review comments on: benchmark/benchmark.go, cmd/benchmark/main.go, cmd/protoc-gen-gorums/dev/config.go, tests/metadata/metadata_test.go, tests/ordering/order_test.go
On the deleted reconnect test (diff @@ -1,60 +0,0 @@, package reconnect), a member commented:

Don't we want this?

Review comment on: tests/tls/tls_test.go
If the server stops calling RecvMsg(), the SendMsg() call will
eventually start blocking. If this happens, the sendQ channels will
eventually fill up and block the QC.

To prevent this, we can use the QC context to abort the stream if
SendMsg() is blocking. The stream will then try to reconnect.
When a server is not calling RecvMsg(), the calls to SendMsg() will
eventually block. This test makes sure that we don't block forever.
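In other words, an enqueue onto a full sendQ must respect the QC context so the quorum call cannot block forever. A minimal sketch of that idea (the channel and message type names are assumptions):

// enqueue places msg on the node's send queue, but gives up if the QC context
// is done first, e.g. because the stream is stuck and the queue has filled up.
func enqueue(ctx context.Context, sendQ chan<- *gorumsMessage, msg *gorumsMessage) error {
	select {
	case sendQ <- msg:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}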
johningve and others added 12 commits October 8, 2020 23:40
The gRPC documentation states that it is unsafe to call CloseSend
concurrently with SendMsg. Instead, we can use context cancellation to
stop the stream. This requires making a new context each time we start
the stream.

In addition, I realized that orderedNodeStream.sendMsg() can run the
gRPC SendMsg() directly instead of in a goroutine, with the logic to
cancel the stream running in a goroutine instead. This seems to improve
performance.
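A rough sketch of that structure; the nodeStream type, its fields, and cancelStream are assumed names, not the actual orderedNodeStream code:

// sendMsg runs gRPC's SendMsg on the calling goroutine; a small helper
// goroutine cancels the stream if the request context is done before the
// send completes, which unblocks SendMsg and lets the stream reconnect.
func (s *nodeStream) sendMsg(req request) error {
	done := make(chan struct{})
	go func() {
		select {
		case <-req.ctx.Done():
			s.cancelStream() // aborts the stream so SendMsg returns with an error
		case <-done:
			// SendMsg finished normally; nothing to cancel
		}
	}()
	err := s.stream.SendMsg(req.msg)
	close(done)
	return err
}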
os.Rename fails if src and dest are on different devices.
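For reference, the usual workaround is to fall back to copy-and-remove when the rename fails across devices; a sketch, not necessarily the exact fix in this commit:

import (
	"io"
	"os"
)

// moveFile renames src to dst, falling back to copy+remove when the two
// paths live on different devices and os.Rename fails (typically EXDEV).
func moveFile(src, dst string) error {
	if err := os.Rename(src, dst); err == nil {
		return nil
	}
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()
	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	if _, err := io.Copy(out, in); err != nil {
		out.Close()
		return err
	}
	if err := out.Close(); err != nil {
		return err
	}
	return os.Remove(src)
}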
This makes the unicast and multicast calltypes use a timeout that is configured through the new WithSendTimeout() manager option.

This is done instead of accepting a context from the caller, because it
is not possible for the caller to know when to correctly cancel the
context.
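A sketch of how such an option could be wired up; WithSendTimeout is the name from this commit, while the surrounding types and the enqueue helper are assumptions:

import (
	"context"
	"time"

	"google.golang.org/protobuf/proto"
)

// managerOptions and ManagerOption stand in for the real gorums option types.
type managerOptions struct {
	sendTimeout time.Duration
}

type ManagerOption func(*managerOptions)

// WithSendTimeout sets how long a unicast/multicast send may take before the
// internally created context expires and the stream is torn down.
func WithSendTimeout(timeout time.Duration) ManagerOption {
	return func(o *managerOptions) { o.sendTimeout = timeout }
}

// Inside the call type, the timeout context replaces a caller-supplied one.
// Exactly when cancel should run is the subtle part, as the discussion below shows.
func (n *Node) unicast(msg proto.Message, opts managerOptions) {
	ctx, cancel := context.WithTimeout(context.Background(), opts.sendTimeout)
	n.enqueue(ctx, cancel, msg) // hypothetical: cancel runs once the message is sent or the stream is reset
}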
@meling mentioned this pull request Dec 21, 2020
@relab deleted a comment from johningve Dec 22, 2020
@meling (Member) commented Dec 22, 2020

This is a re-posting of @Raytar's comment, which was unexpectedly deleted by a bug in GitHub.

I've been thinking about the change done in 8b3912e, and I'm not sure if it's best to keep it or revert it.

Option 1: To sum it up, the commit removes the context parameter from the unicast and multicast calltypes, and instead creates a timeout context internally before sending the message. This timeout can be set by using a manager option. I am not a fan of needing timeouts everywhere, which is why I am now reconsidering this commit.

These contexts are needed to abort the stream if it gets stuck (if the receiving node stops receiving messages or is very slow). The reason I removed the context parameter is that the code that calls the calltype does not know whether or not a message has been sent, making it difficult to cancel the context correctly.

Before this commit, the calling code would have to deal with the cancel function somehow.
One solution would be to ignore the cancel:

ctx, _ := context.WithTimeout(context.Background(), 10 * time.Millisecond)
node.Unicast(ctx, msg)

But now we get a govet warning:

the cancel function returned by context.WithTimeout should be called, not discarded, to avoid a context leak

Looking at the source code for the timeout context, the context is obviously canceled when the timeout happens, and so I assume that the code above does not actually leak a context after the timeout. However, it is not ideal to ignore the documentation.

Option 2: Another solution would be to keep the cancel function and run it at a time when the message is assumed to be sent. For example, in HotStuff, votes are sent to the leader using a unicast. We could store the context cancel function until the next time the node votes, at which point it is safe to cancel the last context because it doesn't matter whether or not the previous vote was sent. This way we might be able to use a cancel context instead of a timeout context as well. It would look like this:

// ctx and cancel are stored outside the handler and kept between proposals.
func handlePropose(msg Proposal) {
  // prepare vote
  ...
  // cancels the context from the previous proposal.
  // ensures that the stream will be reconnected if it got stuck,
  // so that the next vote can be put in the queue without blocking.
  cancel()
  ctx, cancel = context.WithCancel(context.Background())
  leader.Vote(ctx, vote)
}

Option 3: Another possible solution is to make the Unicast/Multicast methods in Gorums block until the message is actually sent. I haven't looked into how this could be done, but then the context can simply be canceled as soon as the method returns. But this might create a problem in HotStuff: Currently, a proposal is handled synchronously for each leader, and the votes are sent at the end of the handlers. But if the voting becomes a blocking function, then the node may not be able to handle another message from the proposer until the vote times out and unblocks.
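A sketch of what the calling code might look like under option 3; the blocking behavior of Unicast is hypothetical here:

ctx, cancel := context.WithCancel(context.Background())
// with a blocking Unicast, this call would only return once the message has
// actually been sent (hypothetical behavior, not the current API) ...
node.Unicast(ctx, msg)
// ... so it would be safe to cancel right away: no leak and no govet warning.
cancel()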

I am leaning towards reverting this commit for now and implementing the second solution (canceling the previous context when the next message is being prepared) in HotStuff.

@meling (Member) commented Dec 22, 2020

Option 2: I like this option for the HotStuff scenario, and similar scenarios, where the next call to the same method actually cancels/overrides any previous calls. Hence, this should work for scenarios like the Propose scenario in HotStuff and idempotent messages. The question I have is whether or not there are scenarios where that's not the desired behavior.

Option 3: This might be possible now that we no longer depend on gRPC for this stuff; we might actually know when a message has been sent, which wasn't possible in gRPC. I haven't studied this in-depth.

Messages can optionally be sent asynchronously by using the
WithAsyncSend call option.
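A hedged usage sketch; the exact signatures of the generated Multicast method and the WithAsyncSend option are assumed:

// without the option, the send is handled synchronously
config.Multicast(ctx, msg)

// with the call option, the message is sent asynchronously
config.Multicast(ctx, msg, gorums.WithAsyncSend())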
@meling merged commit 75dce3a into relab:master Jan 7, 2021
@johningve deleted the static-code-refactor branch January 8, 2021 11:35
Successfully merging this pull request may close these issues:
Allow creating multiple overlapping configurations
api: make wrapper(s) to simplify setup