Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gateway subscribes to Validation notifications from Gloo #1820

Merged
merged 32 commits into from Dec 12, 2019

Conversation

ilackarms
Copy link
Member

@ilackarms ilackarms commented Dec 4, 2019

Context

The Gateway needs to know when Gloo resyncs its own storage in order to retry validation of gateway resources (GW, VS, RT). See issue #1815 for a description of the issue.

Solution

This PR adds a NotifyOnResync method to Gloo's Validation grpc service. The gateway subscribes to this endpoint at boot (if validation is enabled) and wires it into its ApiEmitter, forcing the gateway translation loop to resync when a notification is received.

This allows the Gateway to receive notifications when a Gloo resource changes (upstream, secret, configmap, proxy) without having to explicitly watch the resource itself.

Note that gloo notification server does not notify on EDS changes as they are not considered in validating proxy configuration.

BOT NOTES:
resolves #1812

@solo-changelog-bot
Copy link

Issues linked to changelog:
#1812

@ilackarms ilackarms added the work in progress signals bulldozer to keep pr open (don't auto-merge) label Dec 4, 2019
@ilackarms
Copy link
Member Author

/kick

Step #11 - "test":   Timed out after 5.000s.
Step #11 - "test":   Expected
Step #11 - "test":       <*errors.errorString | 0xc0028c9d80>: {
Step #11 - "test":           s: "404 is not TooManyRequests",
Step #11 - "test":       }
Step #11 - "test":   to be nil
Step #11 - "test": 

@ilackarms ilackarms removed the work in progress signals bulldozer to keep pr open (don't auto-merge) label Dec 5, 2019
Comment on lines 178 to 179
time.Sleep(time.Second / 2)
Expect(getNotifications()).To(HaveLen(3))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be an Eventually?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not exactly. it should be a Consistently (because we want to make sure they value does not go over 3 once the ctx is cancelled)

)

func MakeNotificationChannel(ctx context.Context, stream validation.ProxyValidationService_NotifyOnResyncClient) <-chan struct{} {
notifications := make(chan struct{}, 10)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use a 10 buffered channel here. I get the need to make up for lost updates, but 10 seems like far too many. 1 or 2 would probably accomplish the same thing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this was a typo or copypaste error. i intended to have 1 notification (there's never a need to have more than one resync queued)

Comment on lines 26 to 33
message NotificationRequest {

}

message NotificationResponse {

}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: NotifyOnResyncRequest/ NotifyOnResyncResponse.
In general with our grpc services we try to name the request and response after the rpc in this way. Since there is more than one now, I think if possible we should also rename the ValidateProxy request and response object as well. As more and more rpcs are added it makes managing them much easier.

Comment on lines 169 to 171
s.lock.Lock()
validator := s.validator
s.lock.Unlock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create a locking get validator function?
Also, this can be a RWMutex

Comment on lines 88 to 90
if err := stream.Send(&validation.NotificationResponse{}); err != nil {
return err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this cause an additional, potentially unecessary resync every time gloo starts up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no - this is part of the Notification "handshake". look where the gateway processes notifications, it waits to receive the first notification before starting the event loop. gateway will block on gloo starting up. currently, if gloo goes down/restarts without restarting the gateway, notifications will no longer be received by gateway.

i can add logic to make the gateway retry the connection if the notification stream gets cut, but i feel that's work for a follow-up if/when we need it (this PR is already pretty complicated).

relevant code lives in MakeNotificationChannel:

https://github.com/solo-io/gloo/pull/1820/files#diff-4aa11983835012a15b5c74cdff446732R11

Comment on lines +58 to +59
// only call within a lock
// notify all receivers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than annotate this way, can you just lock the function itself with a lock, defer unlock

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function needs to be called by another function that acquires a lock. locking here would cause a deadlock

Comment on lines +34 to +35
// only call within a lock
// should we notify on this snap update
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about the lock comment, should just lock the function itself

Comment on lines +61 to +70
for _, receiver := range s.notifyResync {
receiver := receiver
go func() {
select {
// only write to channel if it's empty
case receiver <- struct{}{}:
default:
}
}()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it ever be necessary to send more than one notification from this function? These go functions can potentially back up and exist for a while if the channel fills up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the receiver only has a size of 1, so we never buffer more than 1 notification

@ilackarms
Copy link
Member Author

/kick rate limit flake

Copy link
Member

@EItanya EItanya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@EItanya
Copy link
Member

EItanya commented Dec 12, 2019

/kick

@soloio-bulldozer soloio-bulldozer bot merged commit a3f9641 into feature-rc1 Dec 12, 2019
@soloio-bulldozer soloio-bulldozer bot deleted the validation-notifications branch December 12, 2019 14:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants