Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

orca: create ORCA producer for LB policies to use to receive OOB load reports #5669

Merged
merged 9 commits into from Nov 3, 2022

Conversation

dfawley
Copy link
Contributor

@dfawley dfawley commented Sep 23, 2022

RELEASE NOTES:

  • balancer: add experimental Producer types and methods and * orca: provide a way for LB policies to receive OOB load reports

@dfawley dfawley added this to the 1.50 Release milestone Sep 23, 2022
@dfawley dfawley force-pushed the orca_producer branch 3 times, most recently from 1b369f2 to 1cf3c60 Compare September 27, 2022 21:16
@dfawley dfawley modified the milestones: 1.50 Release, 1.51 Release Oct 4, 2022
@@ -20,8 +20,15 @@
// avoid polluting the godoc of the top-level orca package.
package internal

import ibackoff "google.golang.org/grpc/internal/backoff"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this import to be renamed?

Copy link
Contributor Author

@dfawley dfawley Oct 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't; I just like to keep things consistent, and we have a backoff package and an internal/backoff package, and it's confusing if we reference both using the same name in different files.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Package names should ideally be self contained and self explanatory. If we are always going to rename this import, we should ideally rename the package itself.

balancer/balancer.go Outdated Show resolved Hide resolved
// A Producer is a type shared among potentially many consumers. It is
// associated with a SubConn, and an implementation will typically contain
// other methods to provide additional functionality, e.g. configuration or
// subscription registration.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description feels very vague to me, in the sense that I can't get a good idea of when to use a Producer, how to use one etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you suggest something better after our discussion today?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. The first sentence is fine, but I don't get much technical understanding from: "and an implementation will typically contain other methods to provide additional functionality, e.g. configuration or subscription registration." Additional functionality with respect to what? There's no functionality defined before this. Do we expect this to ever be implemented by another concrete type? This comment is very specific to type producer in orca/producer. If so, why have this interface?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional functionality with respect to what? There's no functionality defined before this.

In addition to the builder that created it, I suppose, and the close function the builder returns.

Do we expect this to ever be implemented by another concrete type?

Yes, definitely. This is intended to be generic and able to be implemented by anything. The other example we already have is the health checks, which are implemented in the other languages as a generic wrapper, but hard-coded into gRPC in Go. I'd like to move to a similar model there by using this mechanism.

This comment is very specific to type producer in orca/producer.

Not at all. If it was, it would talk about ORCA configuration. ORCA should not be part of grpc-go directly, but instead should only be a plugin which uses this.

balancer/balancer.go Show resolved Hide resolved
balancer/balancer.go Outdated Show resolved Hide resolved
@easwars easwars assigned dfawley and unassigned easwars Oct 5, 2022
orca/producer.go Outdated Show resolved Hide resolved
balancer_conn_wrappers.go Outdated Show resolved Hide resolved
orca/producer.go Show resolved Hide resolved
orca/producer.go Outdated Show resolved Hide resolved
orca/producer.go Outdated Show resolved Hide resolved
orca/producer.go Show resolved Hide resolved
balancer_conn_wrappers.go Outdated Show resolved Hide resolved
@dfawley dfawley assigned easwars and unassigned dfawley Oct 6, 2022
Copy link
Contributor

@easwars easwars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you might have missed pushing a commit. I have more comments which I will add soon.

balancer/balancer.go Show resolved Hide resolved
balancer/balancer.go Outdated Show resolved Hide resolved
@easwars easwars assigned dfawley and unassigned easwars Oct 6, 2022
Copy link
Contributor

@zasweq zasweq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool PR bossman. Some comments.

// A Producer is a type shared among potentially many consumers. It is
// associated with a SubConn, and an implementation will typically contain
// other methods to provide additional functionality, e.g. configuration or
// subscription registration.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. The first sentence is fine, but I don't get much technical understanding from: "and an implementation will typically contain other methods to provide additional functionality, e.g. configuration or subscription registration." Additional functionality with respect to what? There's no functionality defined before this. Do we expect this to ever be implemented by another concrete type? This comment is very specific to type producer in orca/producer. If so, why have this interface?

balancer/balancer.go Outdated Show resolved Hide resolved
balancer_conn_wrappers.go Show resolved Hide resolved
orca/producer.go Show resolved Hide resolved
balancer_conn_wrappers.go Show resolved Hide resolved
orca/producer.go Show resolved Hide resolved
orca/producer.go Show resolved Hide resolved
orca/producer.go Outdated Show resolved Hide resolved
orca/producer.go Show resolved Hide resolved
orca/producer.go Show resolved Hide resolved
@zasweq zasweq removed their assignment Oct 13, 2022
Copy link
Contributor Author

@dfawley dfawley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated; sorry for the force push; I messed up the synchronization between my laptop & workstation.

// A Producer is a type shared among potentially many consumers. It is
// associated with a SubConn, and an implementation will typically contain
// other methods to provide additional functionality, e.g. configuration or
// subscription registration.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional functionality with respect to what? There's no functionality defined before this.

In addition to the builder that created it, I suppose, and the close function the builder returns.

Do we expect this to ever be implemented by another concrete type?

Yes, definitely. This is intended to be generic and able to be implemented by anything. The other example we already have is the health checks, which are implemented in the other languages as a generic wrapper, but hard-coded into gRPC in Go. I'd like to move to a similar model there by using this mechanism.

This comment is very specific to type producer in orca/producer.

Not at all. If it was, it would talk about ORCA configuration. ORCA should not be part of grpc-go directly, but instead should only be a plugin which uses this.

balancer_conn_wrappers.go Show resolved Hide resolved
balancer_conn_wrappers.go Outdated Show resolved Hide resolved
balancer_conn_wrappers.go Show resolved Hide resolved
internal/testutils/balancer.go Show resolved Hide resolved
orca/producer.go Show resolved Hide resolved
orca/producer.go Outdated Show resolved Hide resolved
orca/producer.go Show resolved Hide resolved
orca/producer.go Show resolved Hide resolved
orca/producer.go Show resolved Hide resolved
@@ -197,7 +200,7 @@ func (p *producer) runStream(ctx context.Context) (resetBackoff bool, err error)
ReportInterval: durationpb.New(interval),
})
if err != nil {
return resetBackoff, err
return false, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You liked this rather than using default value of bools more?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was changed due to an earlier comment from @easwars. I said in my reply there I didn't feel strongly either way. I still don't.

// to avoid overloading a server experiencing problems. The attempt count
// is incremented when stream errors occur and is reset when the stream
// reports a result.
Copy link
Contributor

@zasweq zasweq Oct 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the last part of this comment is correct/like this comment. First of all, reset has two semantical meanings for this field, one for the timer, one for the attempt count, which gets plumbed into Exponential Backoff to determine what the backoff timer gets reset to I'm assuming your reset refers to attempt count. Also, there is no linkage or even local variable named request count. I see backoffAttempt, which isn't a field here, but the int in this function variable. The attempt count is not reset when the stream reports a result. Right now, the attempt count is reset on either a. the minInterval has changed after processing the receive from the stream compared to what was the stream was configured with. b. If the second+ stream.Recv() returns an error, this also feels wrong, I will leave a comment there.

return resetBackoff, err
}
resetBackoff = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct? I don't get why it matters whether the error comes from the second+ stream.Recv() or the first one with respect to resetting backoff. Shouldn't it be io.EOF the conditional for a successful stream where we can infer the server is working properly?

orca/producer.go Outdated
// Log all other errors.
logger.Error("Received unexpected stream error:", err, ctx.Err())
Copy link
Contributor

@zasweq zasweq Oct 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is related to my other comment about resetBackoff return. io.EOF? Isn't the only way to return out of runStream() without a nil error if interval != p.minInterval()? Otherwise would always return and cause log spam on every completed stream...? Also this formatting is off. Add a space in between error: and err and add something explaining ctx.Err()? Right now there are two errors being logged. Is ctx even guaranteed to have an error in this case? If ctx.Err() had a value, wouldn't the case in line 170 be hit and it wouldn't hit this default?

@dfawley
Copy link
Contributor Author

dfawley commented Oct 25, 2022

Ping @easwars @zasweq this should be ready to review again.

Copy link
Contributor

@easwars easwars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Deep down though, I would have still preferred if we could have added the methods required to make RPCs on the SubChannel interface. I feel that would have been a more intuitive change.

Like, currently, the RegisterOOBListener function in the orca package calls GetOrBuildProducer on the SubConn and passes a builder defined locally. This seems quite roundabout. But, if we want the orca package to handle everything locally, then we would need the functionality to make RPCs on the subConns, and also the functionality to uniquely identify subConns based on their channelz ID. If we don't think adding this functionality to the subConn interface is a good long term option, I agree with the approach taken in this PR.

orca/producer.go Show resolved Hide resolved
@easwars easwars assigned dfawley and unassigned easwars Oct 27, 2022
Copy link
Contributor

@zasweq zasweq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All testing nits.

orca/producer_test.go Outdated Show resolved Hide resolved
orca/producer_test.go Outdated Show resolved Hide resolved
orca/producer_test.go Show resolved Hide resolved
orca/producer_test.go Outdated Show resolved Hide resolved
orca/producer_test.go Outdated Show resolved Hide resolved
orca/producer_test.go Show resolved Hide resolved
orca/producer_test.go Outdated Show resolved Hide resolved
// Stop listener 3. This makes the interval longer, with stream recreation
// delayed until the next report is received. Reports only go to listener
// 1 now.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reports should only go to listener 1 now. Also explain why the interval is now longer, and perhaps tie that idea of a longer interval to stream recreation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping about stream recreation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a very high level understanding of the feature (ORCA OOB reporting) plus this comment as-is is sufficient to understand what's happening here. I could make a longer comment to make things painfully obvious, but sometimes less is more.

orca/producer_test.go Show resolved Hide resolved
orca/producer_test.go Outdated Show resolved Hide resolved
@zasweq zasweq removed their assignment Nov 2, 2022
orca/producer_test.go Outdated Show resolved Hide resolved
orca/producer_test.go Show resolved Hide resolved
orca/producer_test.go Outdated Show resolved Hide resolved

lisOpts := orca.OOBListenerOptions{ReportInterval: 50 * time.Millisecond}
lao := &listenerAndOptions{listener: oobLis, opts: lisOpts}
r.InitialState(resolver.State{Addresses: []resolver.Address{setListenerAndOptions(resolver.Address{Addr: lis.Addr().String()}, lao)}})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this split up still.

@@ -255,10 +258,14 @@ func (s) TestProducerBackoff(t *testing.T) {

// Provide a convenient way to expect backoff calls and return a minimal
// value.
expectedBackoff := -1 // -1 indicates any value is allowed.
const backoffShouldNotBeCalled = 9999 // Use to assert backoff function is not called.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get that explanation, but it's probably just me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I explain in comments why we don't want the backoff function to be called when backoffShouldNotBeCalled is assigned to expectedBackoff below. (It's because the last stream was successful.)


// Register our fake ORCA service
s := grpc.NewServer()
fake := newFakeORCAService()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fos?

Copy link
Contributor

@zasweq zasweq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments.

// Stop listener 2. This does not affect the interval. The next update
// goes to listeners 1 and 3.
// Stop listener 2. This does not affect the interval as listener 3 is
// still the shortest. The next update goes to listeners 1 and 3.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should go

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and still has the shortest interval

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This level of review isn't important. The meaning of this comment is very clear, and it's an internal comment in a test, not in external documentation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't hurt though. But I see what you're saying

orca/producer_test.go Outdated Show resolved Hide resolved
@dfawley dfawley assigned zasweq and unassigned dfawley Nov 2, 2022
zasweq
zasweq approved these changes Nov 3, 2022
Copy link
Contributor

@zasweq zasweq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM outside my final responses, but again don't feel strongly about any of my nits.

@zasweq zasweq assigned dfawley and unassigned zasweq Nov 3, 2022
@zasweq
Copy link
Contributor

zasweq commented Nov 3, 2022

You can go ahead and merge btw.

@dfawley dfawley merged commit e41e894 into grpc:master Nov 3, 2022
11 checks passed
@dfawley dfawley deleted the orca_producer branch November 3, 2022 17:27
jronak pushed a commit to jronak/grpc-go that referenced this pull request Nov 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants