Registry k8s cannot correctly handle unregister when it's scaled #456
Hi, we need to run multiple registry instances, mostly to avoid API throttling, which seemed to pop up rather early when increasing the number of NSEs. It also provides redundancy (e.g. in case of node failure).
@zolug Hi, we can tune the kubernetes go-client to handle more requests. Would it solve your problem if we tuned the client's parameters, or added some ENVs so you could tune them yourself? About redundancy:
Hi @NikitaSkrynnik, However, it seems there's still a delay introduced by node-monitor-grace-period before a node is marked "unresponsive". Now, I don't know if we could have requirements related to this setting, but with a single registry instance, losing the node hosting the only registry would result in registry service unavailability for at least node-monitor-grace-period. On the other hand, with multiple replicas, IMHO the service would hopefully be able to serve part of the requests successfully on the first attempt, while for the others new NSE dials might pick working endpoints after a couple of retries. So a short node-monitor-grace-period could probably remove the need for replicas. But correct me if I'm wrong.
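For reference, node-monitor-grace-period is a kube-controller-manager flag (its default has historically been 40s; check your cluster version). A static-pod manifest excerpt shortening it might look like this; the 16s value is purely illustrative:

```yaml
# /etc/kubernetes/manifests/kube-controller-manager.yaml (excerpt, illustrative)
spec:
  containers:
  - command:
    - kube-controller-manager
    # mark unresponsive nodes sooner so registry failover happens faster
    - --node-monitor-grace-period=16s
```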
**Problem**

There is one main problem with scaling the registry: if we have several registries, an NSE can Register and Unregister in any of them.

**Corner Cases**

NSE has registered in
If we rework
This system of versions is necessary for cases when Registry gets an Unregister event from

**Options**

**Option 1 - Master-Registry**

Create a new component

Changes

**Option 2 - Store connections on registry-clients**

We can store the connection to a registry on registry-clients and use it to always connect to the same registry on all requests.

Changes
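To make the "system of versions" idea above concrete, here is a minimal, self-contained Go sketch (names invented, not the actual NSM code) of a registry that refuses an Unregister carrying an outdated resourceVersion. Note that real Kubernetes resourceVersion strings are opaque and must not be compared numerically; integers are used here purely for illustration:

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

type entry struct {
	name            string
	resourceVersion string
}

type registry struct {
	entries map[string]entry
}

var errStaleVersion = errors.New("stale resourceVersion: refusing Unregister")

// unregister applies an Unregister only if the caller's version is at least
// as new as the stored one; otherwise another registry instance has already
// re-registered the NSE and this Unregister is outdated.
func (r *registry) unregister(name, resourceVersion string) error {
	stored, ok := r.entries[name]
	if !ok {
		return nil // already gone
	}
	// illustration only: real resourceVersions are opaque strings
	got, _ := strconv.Atoi(resourceVersion)
	have, _ := strconv.Atoi(stored.resourceVersion)
	if got < have {
		return errStaleVersion
	}
	delete(r.entries, name)
	return nil
}

func main() {
	r := &registry{entries: map[string]entry{
		"nse-1": {name: "nse-1", resourceVersion: "7"},
	}}
	fmt.Println(r.unregister("nse-1", "5")) // stale version: refused
	fmt.Println(r.unregister("nse-1", "7")) // current version: applied
}
```

With several registry replicas, the failure mode discussed in this issue is exactly the stale-version branch: the replica holding the newest version disappears, so the surviving replicas keep rejecting the NSE's Unregister.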
Hi @NikitaSkrynnik, I think in general there's a problem if the registry with the most recent resource ID is lost (e.g. due to scaling) and the NSE unregisters without sending a "new" register. I had an idea to distinguish between the local timeout-based Unregister() and an Unregister() received from another NSM component in k8s-registry, and only block the first. (There should be a single NSMgr handling an NSE, or the msgs would be coming directly from an NSE, e.g. from a remote vlan nse.)
@zolug Hi, About the second option: we are not sure about distinguishing between an Unregister sent from the NSE and one from the expire chain elements. We've discussed this option internally and decided not to use it because there is no simple way to do that.
Do you also intend to get rid of the resource version check in the registry? Or how would a registry remove the custom resource if it had no, or outdated, version info? It might also require removal of the 'expire' element on the registry to avoid premature deletion in case some event leads to registry re-selection.
@zolug Hello,
No, we don't need to get rid of the resource version checking. We've found a simple way to distinguish between Unregisters, as you suggested above. We will let you know when we have an image for testing.
Seems to be done. @zolug, @NikitaSkrynnik Thanks!
This problem has been reported on WG call.
Scenario
Steps:
1. Deploy NSM
2. Scale registry-k8s to 4
3. Deploy NSE
4. Try to unregister NSE
Actual: NSE cannot be unregistered
Expected: NSE can be unregistered