Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gorums cannot handle nil for non-PerNodeArg Gorums RPC calls. #35

Open
wruiwr opened this issue Sep 24, 2017 · 10 comments
Open

Gorums cannot handle nil for non-PerNodeArg Gorums RPC calls. #35

wruiwr opened this issue Sep 24, 2017 · 10 comments

Comments

@wruiwr
Copy link

wruiwr commented Sep 24, 2017

For my read-write distributed storage implemented by Gorums,
if I send a nil message through either read or write quorum call,
then the quorum call returns a nil reply with error nil.

The reason for this issue is the callGRPC methods generated by Gorums.

For example in method: callGRPCWrite:

if arg == nil {
    // send a nil reply to the for-select-loop
    replyChan <- internalWriteResponse{node.id, nil, nil}
    return
}

This allows the write quorum call to send a nil write request and reply a nil as the write response with error message nil. (the quorum function can receive enough replies to obtain a quorum)

So, in my quorum functions, I currently wrote some code to handle the nil replies, otherwise, it can cause panic, since there's no value and timestamp to get if replies are nils.

@s111
Copy link
Contributor

s111 commented Sep 24, 2017

I'm not sure I understand the issue here.
gRPC doesn't allow passing nil, as the marshalling step will panic (as it should).
It seems that we now protect against passing nil values on all methods generated by Gorums ( #12).

I think that passing nil values should panic, as this is the developer's mistake.
There is however one special case where I think we should handle nil values: when a per-node-arg function returns nil, we should drop that request and reduce the expected number of replies for that call. This would allow us to do some cool things with the per-node-arg functions.
What do you think @meling?

@wruiwr
Copy link
Author

wruiwr commented Sep 24, 2017

The quorum call actually can invoke gRPC calls with nil message and the quorum functions can also receive enough replies (also nil) to obtain a quorum.

So, the returned reply for this quorum call in this case is nil, however if the server failure happens, then no quorum is obtain, the returned reply is also nil. Therefore, I think in my tests, this two cases return the same values (both nils) (one is not an error (the err is nil) according to Hein, another is caused by the server failure), so I can not check the difference.

@wruiwr
Copy link
Author

wruiwr commented Sep 24, 2017

I think Hein has a solution about this issue. ;)

s111 added a commit that referenced this issue Sep 24, 2017
TestPerNodeArg is left failing to demonstrate the issue.
See #35.
@s111
Copy link
Contributor

s111 commented Sep 24, 2017

A quorum call works with nil values as we circumvent the actual gRPC call:

if arg == nil {
	       // send a nil reply to the for-select-loop
	      replyChan <- internalWriteResponse{node.id, nil, nil}
	      return
}

An actual gRPC call would panic with:

panic: rpc error: code = Internal desc = grpc: error while marshaling: proto: Marshal called with nil

@meling: If you look at config_qc_test.go#L928, a request is ignored but the nil values is still added to the reply slice, i.e., we are allowing a write request which needs a quorum of at least 3 with only 2 actual responses. The call succeeds as len(replies) is > 2 (as the nil values is counted).

See PR #36 for a potential fix.

s111 added a commit that referenced this issue Sep 24, 2017
TestPerNodeArg is left failing to demonstrate the issue.
See #35.
s111 added a commit that referenced this issue Sep 24, 2017
TestPerNodeArg is left failing to demonstrate the issue.
See #35.
meling pushed a commit that referenced this issue Sep 25, 2017
TestPerNodeArg is left failing to demonstrate the issue.
See #35.
@meling
Copy link
Member

meling commented Sep 25, 2017

It seems my quick and dirty initial solution was a bit too simplistic; I was hoping to avoid the extra noise around the {{if .PerNodeArg}} conditional in the template. I've made some more changes to the PR, but there is one outstanding issue that needs to be discussed. It turns out that providing a nil argument does not panic in all cases. Take a look at the test in config_qc_test.go#L140. This does not panic when its argument is nil, whereas a similar test with the WritePerNode method does panic; see config_qc_test.go#L956. This is presumably because the Read method takes a ReadRequest, which is actually never used on the server-side. So it seems that it is allowed by gRPC to invoke with a nil argument, e.g., when the server side does not need anything from the client. Will think some more about this. Thoughts anyone?

@meling
Copy link
Member

meling commented Sep 25, 2017

Added another commit to the fix-nil-in-replies branch. Turns out that it is not easy to check for panic from the Write call since the panic actually happens in another goroutine separate from the calling goroutine running the tests, and so the standard recover() call does not work. One solution could be to check for nil argument to the quorum call and panic immediately, but then also the Read method will fail, even though it works now.

@s111
Copy link
Contributor

s111 commented Sep 27, 2017

The routine that caused the panic needs to invoke recover, or we might never learn about the panic (expect if we sync with the routine at some point).
Maybe we can pass an err chan error to the function invoked as a go routine, that must be read before checking the actual response value?

@meling
Copy link
Member

meling commented Sep 27, 2017

My initial thoughts on this is that it seems to be a bit much hassle for handling nil which nobody really should be passing anyway, even though it works for the Read call. I'm continuing to ponder this.

@s111
Copy link
Contributor

s111 commented Sep 27, 2017

Yes, that's what I'm thinking as well, we should not try to code ourselves around developer error, let it panic. Though, we should recover a crashing go routine and communicate that back to the calling quorum call, so that the panic is actually caught. I.e., the panic needs to be invoked on the main thread so that the program actually panics.

@meling
Copy link
Member

meling commented Sep 28, 2017

Don't you think it is a bit of an overkill to complicate things by recovering individual goroutines in order to pass that on to the main goroutine. Not sure if that helps debug the actual problem, especially if the panic is caused by another issue that we haven't thought about yet. I think I prefer to let it panic as it does now. Though we cannot write a test for nil and recover from it.

meling pushed a commit that referenced this issue Sep 28, 2017
TestPerNodeArg is left failing to demonstrate the issue.
See #35.
@meling meling moved this from To do to In progress in Gorums refactoring with new plugin architecture Jul 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants