Gateway subscribes to Validation notifications from Gloo #1820
Issues linked to changelog:
/kick
time.Sleep(time.Second / 2)
Expect(getNotifications()).To(HaveLen(3))
should this be an Eventually?
not exactly. it should be a Consistently (because we want to make sure the value does not go over 3 once the ctx is cancelled)
)

func MakeNotificationChannel(ctx context.Context, stream validation.ProxyValidationService_NotifyOnResyncClient) <-chan struct{} {
	notifications := make(chan struct{}, 10)
Why use a channel with a buffer of 10 here? I get the need to make up for lost updates, but 10 seems like far too many. 1 or 2 would probably accomplish the same thing.
i think this was a typo or copypaste error. i intended to have 1 notification (there's never a need to have more than one resync queued)
message NotificationRequest {

}

message NotificationResponse {

}
nit: NotifyOnResyncRequest / NotifyOnResyncResponse.
In general, with our gRPC services we try to name the request and response after the rpc in this way. Since there is more than one rpc now, I think we should, if possible, also rename the ValidateProxy request and response objects. As more and more rpcs are added, this makes managing them much easier.
s.lock.Lock()
validator := s.validator
s.lock.Unlock()
create a locking get validator function?
Also, this can be a RWMutex
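A minimal sketch of what this suggestion could look like, assuming the server holds its validator behind a sync.RWMutex. The server struct, the Validator interface, and the getValidator/setValidator names are hypothetical stand-ins for illustration, not the types in this PR.

```go
package main

import (
	"fmt"
	"sync"
)

// Validator is a hypothetical stand-in for the real validator type.
type Validator interface {
	Name() string
}

// namedValidator is a trivial Validator used only for this example.
type namedValidator string

func (n namedValidator) Name() string { return string(n) }

// server is a hypothetical stand-in for the validation server.
type server struct {
	lock      sync.RWMutex
	validator Validator
}

// getValidator takes only the read lock, so concurrent readers
// do not block one another.
func (s *server) getValidator() Validator {
	s.lock.RLock()
	defer s.lock.RUnlock()
	return s.validator
}

// setValidator takes the write lock, excluding readers while the
// field is swapped.
func (s *server) setValidator(v Validator) {
	s.lock.Lock()
	defer s.lock.Unlock()
	s.validator = v
}

func main() {
	s := &server{}
	s.setValidator(namedValidator("v1"))
	fmt.Println(s.getValidator().Name())
}
```

With a getter like this, call sites would not need to repeat the Lock/Unlock pair around each read of the field.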
if err := stream.Send(&validation.NotificationResponse{}); err != nil {
	return err
}
Won't this cause an additional, potentially unnecessary resync every time gloo starts up?
no - this is part of the Notification "handshake". look where the gateway processes notifications, it waits to receive the first notification before starting the event loop. gateway will block on gloo starting up. currently, if gloo goes down/restarts without restarting the gateway, notifications will no longer be received by gateway.
i can add logic to make the gateway retry the connection if the notification stream gets cut, but i feel that's work for a follow-up if/when we need it (this PR is already pretty complicated).
relevant code lives in MakeNotificationChannel: https://github.com/solo-io/gloo/pull/1820/files#diff-4aa11983835012a15b5c74cdff446732R11
// only call within a lock
// notify all receivers
Rather than annotate this way, can you just lock inside the function itself (lock, then defer unlock)?
this function needs to be called by another function that acquires a lock. locking here would cause a deadlock
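To illustrate why locking inside the helper would deadlock: Go's sync.Mutex is not reentrant, so a goroutine that already holds the lock blocks forever if it calls Lock again. The sketch below uses hypothetical names (server, Sync, notifyLocked, relockWouldBlock) and uses Mutex.TryLock (Go 1.18+) only to show, without hanging, that a second acquisition cannot succeed.

```go
package main

import (
	"fmt"
	"sync"
)

// server is a hypothetical stand-in for the type discussed in this thread.
type server struct {
	lock    sync.Mutex
	resyncs int
}

// notifyLocked must only be called with s.lock held. It does not lock
// itself, because its caller already owns the (non-reentrant) mutex.
func (s *server) notifyLocked() {
	s.resyncs++
}

// Sync is the entry point that acquires the lock once and then calls
// the helper while still holding it.
func (s *server) Sync() int {
	s.lock.Lock()
	defer s.lock.Unlock()
	s.notifyLocked()
	return s.resyncs
}

// relockWouldBlock reports whether locking an already-held mutex would
// block. TryLock fails here, which is where a plain Lock would hang.
func relockWouldBlock() bool {
	var mu sync.Mutex
	mu.Lock()
	defer mu.Unlock()
	return !mu.TryLock()
}

func main() {
	s := &server{}
	fmt.Println(s.Sync())
	fmt.Println(relockWouldBlock())
}
```

A common convention for helpers like this is a name suffix such as Locked, which carries the same information as the "only call within a lock" comment.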
// only call within a lock
// should we notify on this snap update
Same comment about the lock comment, should just lock the function itself
for _, receiver := range s.notifyResync {
	receiver := receiver
	go func() {
		select {
		// only write to channel if it's empty
		case receiver <- struct{}{}:
		default:
		}
	}()
}
Would it ever be necessary to send more than one notification from this function? These go functions can potentially back up and exist for a while if the channel fills up.
the receiver only has a size of 1, so we never buffer more than 1 notification
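A small sketch of the coalescing behaviour described in this reply, assuming each receiver is a channel with a buffer of 1. The notify function is a hypothetical name; unlike the snippet under review, it omits the per-receiver goroutine, on the assumption that a select with a default case never blocks and so can run inline.

```go
package main

import "fmt"

// notify attempts a non-blocking send to every receiver and returns how
// many receivers accepted the notification.
func notify(receivers []chan struct{}) int {
	delivered := 0
	for _, r := range receivers {
		select {
		case r <- struct{}{}:
			delivered++
		default:
			// buffer is full: a resync is already pending, so drop this one
		}
	}
	return delivered
}

func main() {
	r := make(chan struct{}, 1)
	receivers := []chan struct{}{r}

	fmt.Println(notify(receivers)) // 1: queued in the empty buffer
	fmt.Println(notify(receivers)) // 0: dropped, one is already pending
	<-r                            // receiver consumes the pending resync
	fmt.Println(notify(receivers)) // 1: buffer is empty again
}
```

Because the send either succeeds immediately or is dropped, no sender waits on a full channel, and at most one notification is ever pending per receiver.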
/kick rate limit flake
lgtm
/kick
Context
The Gateway needs to know when Gloo resyncs its own storage in order to retry validation of gateway resources (GW, VS, RT). See issue #1815 for a description of the issue.
Solution
This PR adds a NotifyOnResync method to Gloo's Validation grpc service. The gateway subscribes to this endpoint at boot (if validation is enabled) and wires it into its ApiEmitter, forcing the gateway translation loop to resync when a notification is received.
This allows the Gateway to receive notifications when a Gloo resource changes (upstream, secret, configmap, proxy) without having to explicitly watch the resource itself.
Note that the gloo notification server does not notify on EDS changes, as they are not considered in validating proxy configuration.
BOT NOTES:
resolves #1812