fix(kuma-cp): make store changes processing more reliable #6728
Signed-off-by: Lukasz Dziedziak <lukidzi@gmail.com>
Isn't the biggest change here that we no longer block on event sending? I think we need to make sure that all components have a timeout for polling the state of whatever they're listening for since event delivery isn't guaranteed.
Yes, we are no longer blocking on sending. You are right, this might be the biggest change: not blocking would fix part of the problem, but we would still try to send to subscribers that no longer exist. I will change the name of the PR. As for the timeout, I feel we could add one, but I'm not sure it is critical now. I can create a task to fix it. WDYT?
Hmm, I think @michaelbeaumont is right. Now if the client is busy and its channel is full then eventBus is going to drop the event. Maybe we should go only with
…eration Signed-off-by: Lukasz Dziedziak <lukidzi@gmail.com>
I've added a timeout and the configuration for it. I am not sure how long it should take to process an event. I've set it to 2 seconds, but we can change it.
IMO the timeout on send blocking isn't a great solution. There should instead be, in each listening component, a timeout after which everything is reconciled. Is that possible? Otherwise, can we maybe just properly unsubscribe as a solution? The blocking is still a problem, but maybe it should be solved properly separately, if just unsubscribing is enough.
I think for now we should be fine with unsubscribing. Also, each channel has a queue of 10 elements, so maybe the timeout is not required and we can queue the requests.
Not sure what to call the PR but it's out of date now.
Maybe title it with whatever the user visible change is?
"fix(kuma-cp): make zone syncing more reliable" or something?
LGTM!
When running a deployment with postgres/etcd, we use EventBus, which is responsible for sending database events to the mesh-insight-resyncer component. If there is a problem with the connection to the database, the mesh-insight-resyncer component is closed and restarted by ResilientComponent. On each restart, we subscribed to the event bus but did not remove the old subscription. As a result, we were sending events to a subscriber that no longer existed. Because no one was listening on that subscription, the component was hanging and events were not propagated to other subscribers. Signed-off-by: Lukasz Dziedziak <lukidzi@gmail.com>
…#6728) (#6765) fix(kuma-cp): make store changes processing more reliable (#6728) When running deployment with postgres/etcd we are using EventBus which is responsible for sending database events to mesh-insight-resyncer component. In case of a problem with the connection to the database, mesh-insight-resyncer component is closed and restarted by ResilientComponent. On each restart, we are subscribing to the event bus, but we are not removing the old subscription. That caused the issue in which we were sending events to the subscriber that didn't exist at that time. Because we are sending it and no one is listing it, the component was hanging and events were not propagated to other subscribers. Signed-off-by: Lukasz Dziedziak <lukidzi@gmail.com> Co-authored-by: Łukasz Dziedziak <lukidzi@gmail.com>
…#6728) (#6767) * fix(kuma-cp): make store changes processing more reliable (#6728)
…#6728) (#6763) * fix(kuma-cp): make store changes processing more reliable (#6728)
…#6728) (#6764) * fix(kuma-cp): make store changes processing more reliable (#6728)
…#6728) (#6766) * fix(kuma-cp): make store changes processing more reliable (#6728)
Problem
When running a deployment with Postgres, we use EventBus, which is responsible for sending database events to the mesh-insight-resyncer component. If there is a problem with the connection to the database, the mesh-insight-resyncer component is closed and restarted by ResilientComponent. On each restart, we subscribed to the event bus but did not remove the old subscription. As a result, we were sending events to a subscriber that no longer existed. Because no one was listening on that subscription, the component was hanging and events were not propagated to other subscribers.
Solution
When subscribing to the EventBus, we generate a UUID that serves as the key for the subscription. Before a component shuts down, we use defer to unsubscribe from the EventBus.
Changes:
Error() method to a channel of errors and added reaction for Events
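The subscription handling described under Solution can be sketched as follows. This is a minimal illustration and not the actual kuma-cp code: the `EventBus`, `Event`, `Subscribe`, `SubscriberCount`, and `runComponent` names are hypothetical, and a random hex key from `crypto/rand` stands in for the UUID to keep the example dependency-free.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"sync"
)

// Event is a hypothetical stand-in for a store change event.
type Event struct{ Resource string }

// EventBus is a hypothetical, simplified bus keyed by subscription ID.
type EventBus struct {
	mu          sync.RWMutex
	subscribers map[string]chan Event
}

func NewEventBus() *EventBus {
	return &EventBus{subscribers: map[string]chan Event{}}
}

// newID returns a random 128-bit hex key; it stands in for the UUID.
func newID() string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

// Subscribe registers a buffered channel under a fresh key and returns
// the channel together with a function that removes the subscription.
func (b *EventBus) Subscribe() (<-chan Event, func()) {
	id := newID()
	ch := make(chan Event, 10)
	b.mu.Lock()
	b.subscribers[id] = ch
	b.mu.Unlock()
	return ch, func() {
		b.mu.Lock()
		delete(b.subscribers, id)
		b.mu.Unlock()
	}
}

// Send snapshots the current subscribers, then delivers to each one.
// A send still blocks if a subscriber's buffer is full and nobody reads.
func (b *EventBus) Send(e Event) {
	b.mu.RLock()
	chans := make([]chan Event, 0, len(b.subscribers))
	for _, ch := range b.subscribers {
		chans = append(chans, ch)
	}
	b.mu.RUnlock()
	for _, ch := range chans {
		ch <- e
	}
}

// SubscriberCount reports how many subscriptions are registered.
func (b *EventBus) SubscriberCount() int {
	b.mu.RLock()
	defer b.mu.RUnlock()
	return len(b.subscribers)
}

// runComponent mimics one run of a restartable component.
// The deferred unsubscribe removes the subscription on every exit path.
func runComponent(bus *EventBus, stop <-chan struct{}) {
	events, unsubscribe := bus.Subscribe()
	defer unsubscribe()
	for {
		select {
		case <-stop:
			return
		case <-events:
			// A real component would resync here.
		}
	}
}

func main() {
	bus := NewEventBus()
	// Simulate three restarts; no stale subscriptions should remain.
	for i := 0; i < 3; i++ {
		stop := make(chan struct{})
		done := make(chan struct{})
		go func() { runComponent(bus, stop); close(done) }()
		close(stop)
		<-done
	}
	fmt.Println(bus.SubscriberCount()) // prints 0
}
```

Without the deferred unsubscribe, each simulated restart would leave one more channel in the map, and a blocking send to such an orphaned channel would eventually stall delivery to everyone else.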