Mark local offloaded pods as NotReady when virtual-node is not ready (i.e. remote cluster failure) #1853

Merged: 1 commit, Jun 9, 2023

fra98 (Member) commented Jun 5, 2023

Description

This PR changes the virtual-kubelet logic to mark local offloaded pods as NotReady when the remote cluster fails, i.e., when the Liqo virtual node has its Ready condition set to False or Unknown.
In summary, it sets the Ready pod condition to False on all pods scheduled on a NotReady virtual node.

Implementation

  • A new controller in the liqo-controller-manager enforces the presence or absence of a custom label on
    local offloaded pods. When present, the label indicates that the status of the offloaded pod may be managed and updated by the local cluster, because it cannot currently be reflected from the remote cluster (likely due to a remote cluster failure).
  • The virtual-kubelet ensures that the Ready condition of each local offloaded pod is set to False while the label is present.
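
The two steps above can be sketched roughly as follows. This is an illustration only, not the code from this PR: the `Pod` type is a simplified stand-in for the Kubernetes pod object, the label key `example.com/status-managed-locally` is hypothetical (the description does not name the real label), and the function names `reconcileLabel` and `enforceNotReady` are made up for this example.

```go
package main

import "fmt"

// ConditionStatus mirrors the three values Kubernetes uses for condition statuses.
type ConditionStatus string

const (
	ConditionTrue    ConditionStatus = "True"
	ConditionFalse   ConditionStatus = "False"
	ConditionUnknown ConditionStatus = "Unknown"
)

// localStatusLabel is a HYPOTHETICAL label key used only for this sketch.
const localStatusLabel = "example.com/status-managed-locally"

// Pod is a simplified stand-in for a local offloaded pod.
type Pod struct {
	Name   string
	Labels map[string]string
	Ready  ConditionStatus
}

// nodeIsNotReady reports whether the virtual node's Ready condition
// is False or Unknown.
func nodeIsNotReady(nodeReady ConditionStatus) bool {
	return nodeReady == ConditionFalse || nodeReady == ConditionUnknown
}

// reconcileLabel sketches the controller step: add the label while the
// virtual node is not ready, remove it otherwise.
func reconcileLabel(pod *Pod, nodeReady ConditionStatus) {
	if nodeIsNotReady(nodeReady) {
		if pod.Labels == nil {
			pod.Labels = map[string]string{}
		}
		pod.Labels[localStatusLabel] = "true"
		return
	}
	delete(pod.Labels, localStatusLabel) // no-op if the label or map is absent
}

// enforceNotReady sketches the virtual-kubelet step: while the label is
// present, force the pod Ready condition to False.
func enforceNotReady(pod *Pod) {
	if _, ok := pod.Labels[localStatusLabel]; ok {
		pod.Ready = ConditionFalse
	}
}

func main() {
	pod := &Pod{Name: "web-0", Ready: ConditionTrue}
	reconcileLabel(pod, ConditionUnknown) // remote cluster unreachable
	enforceNotReady(pod)
	fmt.Println(pod.Name, pod.Ready)
}
```

Splitting the decision (label) from the enforcement (condition) keeps the two components loosely coupled: in this sketch the second step only needs to look at the pod itself, not at the node.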

Motivation

We mark pods as NotReady because the Kubernetes node controller skips this update (likely due to a bug). This is necessary to prevent Services from routing traffic to pods that are not ready, since the EndpointSlice controller keeps the Ready condition of each endpoint in sync with the Ready condition of the associated pod.
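
To illustrate why the pod Ready condition matters for traffic, here is a deliberately simplified sketch. It is not the EndpointSlice controller's code: `Pod` and `Endpoint` are reduced stand-in types, `syncEndpoints` and `routableAddresses` are hypothetical names, and details such as terminating pods are ignored.

```go
package main

import "fmt"

// Pod is a simplified stand-in for a pod backing a Service.
type Pod struct {
	IP    string
	Ready bool // the pod Ready condition, reduced to a boolean
}

// Endpoint is a simplified stand-in for an EndpointSlice endpoint.
type Endpoint struct {
	Address string
	Ready   bool
}

// syncEndpoints copies each pod's Ready condition onto its endpoint,
// mimicking the synchronization described above.
func syncEndpoints(pods []Pod) []Endpoint {
	endpoints := make([]Endpoint, 0, len(pods))
	for _, p := range pods {
		endpoints = append(endpoints, Endpoint{Address: p.IP, Ready: p.Ready})
	}
	return endpoints
}

// routableAddresses returns only the addresses of ready endpoints,
// i.e. the ones that should receive traffic in this simplified model.
func routableAddresses(endpoints []Endpoint) []string {
	addrs := []string{}
	for _, e := range endpoints {
		if e.Ready {
			addrs = append(addrs, e.Address)
		}
	}
	return addrs
}

func main() {
	pods := []Pod{
		{IP: "10.0.0.1", Ready: true},
		{IP: "10.0.0.2", Ready: false}, // offloaded pod on a NotReady virtual node
	}
	fmt.Println(routableAddresses(syncEndpoints(pods)))
}
```

In this model, a pod that stays Ready after its remote cluster fails would remain in the routable set, which is the situation this PR aims to avoid.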

How Has This Been Tested?

  • Locally
  • Unit tests
  • e2e tests

adamjensenbot (Collaborator) commented

Hi @fra98. Thanks for your PR!

I am @adamjensenbot.
You can interact with me by issuing a slash command on the first line of a comment.
Currently, I understand the following commands:

  • /rebase: Rebase this PR onto the master branch (you can add the option test=true to launch the tests
    when the rebase operation is completed)
  • /merge: Merge this PR into the master branch
  • /build: Build Liqo components
  • /test: Launch the E2E and unit tests
  • /hold, /unhold: Add/remove the hold label to prevent merging with /merge

Make sure this PR appears in the Liqo changelog by adding one of the following labels:

  • kind/breaking: 💥 Breaking Change
  • kind/feature: 🚀 New Feature
  • kind/bug: 🐛 Bug Fix
  • kind/cleanup: 🧹 Code Refactoring
  • kind/docs: 📝 Documentation

fra98 marked this pull request as ready for review June 8, 2023 10:08
fra98 requested review from aleoli and cheina97 June 8, 2023 10:09
fra98 added the kind/bug (Something isn't working) and hold (Prevent bot merging) labels Jun 8, 2023
fra98 (Member, Author) commented Jun 8, 2023

/test

fra98 changed the title from "Update local pods status in case of remote cluster failure" to "Mark local offloaded pods as NotReady when virtual-node is not ready (e.g., remote cluster failure)" Jun 8, 2023
fra98 changed the title from "Mark local offloaded pods as NotReady when virtual-node is not ready (e.g., remote cluster failure)" to "Mark local offloaded pods as NotReady when virtual-node is not ready (i.e. remote cluster failure)" Jun 8, 2023
fra98 force-pushed the frt/local-pod-status branch 2 times, most recently from c9cf803 to 7b5f585, June 8, 2023 20:55
fra98 requested a review from aleoli June 8, 2023 21:00
fra98 removed the hold (Prevent bot merging) label Jun 9, 2023
fra98 (Member, Author) commented Jun 9, 2023

/rebase test=true

fra98 (Member, Author) commented Jun 9, 2023

/merge

adamjensenbot added the merge-requested (Request bot merging, automatically managed) label Jun 9, 2023
adamjensenbot merged commit 6030220 into liqotech:master Jun 9, 2023
13 checks passed
adamjensenbot removed the merge-requested (Request bot merging, automatically managed) label Jun 9, 2023
Labels: kind/bug (Something isn't working), size/L
3 participants