Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suspect select in transport/control.go *quotaPool:get() #1857

Closed
mgsouth opened this issue Feb 10, 2018 · 3 comments
Closed

Suspect select in transport/control.go *quotaPool:get() #1857

mgsouth opened this issue Feb 10, 2018 · 3 comments

Comments

@mgsouth
Copy link

mgsouth commented Feb 10, 2018

Please answer these questions before submitting your issue.

What version of gRPC are you using?

master 2018-02-08 commit d50734d

What version of Go are you using (go version)?

n/a

What operating system (Linux, Windows, …) and version?

n/a

What did you do?

If possible, provide a recipe for reproducing the error.
Code inspection

What did you expect to see?

Deterministic select

What did you see instead?

Appears to be non-deterministic select operation

func (qb *quotaPool) get(v int, wc waiters) (int, uint32, error) {
...
	for {
		select {
		case <-wc.ctx.Done():
			return 0, 0, ContextErr(wc.ctx.Err())
		case <-wc.tctx.Done():
			return 0, 0, ErrConnClosing
		case <-wc.done:
			return 0, 0, io.EOF
		case <-wc.goAway:
			return 0, 0, errStreamDrain
		case <-qb.c:
			qb.mu.Lock()
...
			qb.mu.Unlock()
		}
	}
}

Unless it's guaranteed that only one of these conditions can be true, then the return value is non-deterministic. Particularly troubling is the case if there is something available in qb.c.

Suspect that this is what is desired:

	for {
		select {
		case <-wc.ctx.Done():
			return 0, 0, ContextErr(wc.ctx.Err())
                default:
                    select {
		    case <-wc.tctx.Done():
			    return 0, 0, ErrConnClosing
                    default:
                        select {
		        case <-wc.done:
			    return 0, 0, io.EOF
                        default:
                            select {
		            case <-wc.goAway:
			        return 0, 0, errStreamDrain
                            default:
                                select {
		                case <-qb.c:
			            qb.mu.Lock()
...
			            qb.mu.Unlock()
                                default: // This default turns the for {} into a spin loop
                                }
                            }
                        }
                    }
		}
	}

although then there's a spin issue. Perhaps re-factor qb.c as a combined "block getters" semaphore, put something on if any of wc.ct.Done()x, wc.tctx.Done(), wc.done, wc.goAwqay, or current qb.c semaphore cases are true.

@dfawley
Copy link
Member

dfawley commented Feb 12, 2018

Do you believe this will lead to problems? Can you explain?

Selects are inherently non-deterministic, as races between the incoming signals will determine the result.

@mgsouth
Copy link
Author

mgsouth commented Feb 14, 2018

Selects are inherently non-deterministic, as races between the incoming signals will determine the result.

True, but the (reasonable) expectation which the current code invalidates is:

For any particular goroutine, or set of goroutines properly synchronized:

  1. Once get() returns wc.ctx.Err() it always returns wc.ctx.Err()
  2. Once get() returns ErrConnClosing it always returns ErrConnClosing, or the above
  3. Once get() returns io.EOF it always returns io.EOF, or one of the above
  4. Once get() returns errStreamDrain it always returns errStreamDrain, or one of the above
  5. Otherwise it always returns a non-zero quota, or one of the above

There are many selects which behave in this fashion. Does that cause any particular problems? Don't know–that would require a careful inspection of all of the code. Even if it currently doesn't, it seems to be brittle, and every code change would require the same inspection.

@dfawley
Copy link
Member

dfawley commented Feb 14, 2018

I see what you're saying. We generally have protections in place before or after our selects in case one term should have priority. For instance, code at the top of Write() in http2Client and http2Server checks the stream/transport contexts already. (And interestingly enough, the exact error returned no longer matters with #1854.) We also have error latching in place for errors in the read path.

We do have some cleanups planned here to greatly simplify most of our selects (boiling them down to a single context check and the condition they're waiting on).

If you can identify real problems that are caused by the code, please don't hesitate to file issues, but I don't think there's anything actionable here at this time.

@dfawley dfawley closed this as completed Feb 14, 2018
@lock lock bot locked as resolved and limited conversation to collaborators Sep 26, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants