New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vm: support health checks for VMs that are not using auto-registration #44712
vm: support health checks for VMs that are not using auto-registration #44712
Conversation
Skipping CI for Draft Pull Request. |
Signed-off-by: Yaroslav Skopets <yaroslav@tetrate.io>
2b59607
to
a189378
Compare
Signed-off-by: Yaroslav Skopets <yaroslav@tetrate.io>
/test all |
/test unit-tests-arm64 |
if features.WorkloadEntryAutoRegistration && proxy.Metadata.AutoRegisterGroup != "" { | ||
entryName = autoregisteredWorkloadEntryName(proxy) | ||
autoCreate = true | ||
} else if features.WorkloadEntryHealthChecks && proxy.Metadata.WorkloadEntry != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where Metadata.WorkloadEntry is first set
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value of proxy.Metadata.WorkloadEntry
comes from the ISTIO_META_WORKLOAD_ENTRY
environment variable that a user can set for an Istio Proxy (SystemD Unit) on a VM.
proxy.Metadata.WorkloadEntry
is meant to be symmetrical to proxy.Metadata.AutoRegisterGroup
:
- if a user wants auto-registration behaviour, he can
- run
istioctl x workload entry --autoregister
command, which will generate amesh.yaml
file with theISTIO_META_AUTO_REGISTER_GROUP
value set
- run
- if a user does not want auto-registration behaviour, he can
- run
istioctl x workload entry
command (notice absence of--autoregister
flag) - create a
WorkloadEntry
resource explicitly - set environment variable
ISTIO_META_WORKLOAD_ENTRY
for the Istio Proxy (SystemD Unit) on a VM
- run
E.g., as a power user of Istio, we see a value in ability to create WorkloadEntry
through external automation.
This way we can get full control over the content of WorkloadEntry
, including:
WorkloadEntry
nameWorkloadEntry
labelsWorkloadEntry
localityWorkloadEntry
address
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value of proxy.Metadata.WorkloadEntry comes from the ISTIO_META_WORKLOAD_ENTRY environment variable that a user can set for an Istio Proxy (SystemD Unit) on a VM.
Is it a must for healthcheck, right
I can understand the need to do healthcheck for WLE, but could not for setting the above Spec for users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, ISTIO_META_WORKLOAD_ENTRY
is used for the purposes of health checks.
When Istio Proxy (running on a VM) connects to istiod
and starts sending health status updates, we need to infer a WorkloadEntry
to save the health updates to.
In the case of auto-registration, WorkloadEntry
name is generated based on metadata of the Istio Proxy.
In the case of externally created WorkloadEntry
, we need to infer which WorkloadEntry
corresponds to the connected Istio Proxy.
One solution would be to find WorkloadEntry
by IP address. However, it wouldn't cover all the use cases:
- there might be multiple
WorkloadEntry
(s) per IP address WorkloadEntry
might be defined for a different IP address than the one sent in the metadata of the Istio Proxy
That's why this PR proposes a solution where Istio Proxy must explicitly name the WorkloadEntry
it corresponds to.
…vm-health-check-for-workload-entry
@yskopets I prefer refactoring the interface, currently it is a bit confusing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like a very good idea.
One small suggestion - I don't know what we do for the Service/Deployment created by Istiod for gateway, but would be good to have a consistent way to track who created
a resource. I think there are well-established patterns ( managed-by labels ), we should
use them to make it clear who created the workload entry.
It may also be helpful to add an annotation with the actual istiod instance where the
connection is held, for debug.
The 'auto created' name is a bit confusing - you really mean created by istiod, other
automation tools will also auto-created WEs.
if wle == nil { | ||
return false, fmt.Errorf("failed updating WorkloadEntry %s/%s: WorkloadEntry not found", proxy.Metadata.Namespace, entryName) | ||
} | ||
lastConTime, _ := time.Parse(timeFormat, wle.Annotations[ConnectedAtAnnotation]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's handle the error here appropriately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the existing code that was moved into a separate function to make it reusable.
I think the current behaviour makes sense - if annotation value is corrupted then recover automatically.
What error handling logic do you have in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd at least like logging for when this happens since I don't think it should ever happen in a typical flow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will address it in a follow-up PR
This seems very useful. From a UX perspective, thinking loud here - would this be possible without user needing to manually annotate the workload entry resource to indicate health check enabled? e.g. can istiod calculate that info based on the fact that WorkloadEntry is auto vs manual generated? |
@hzxuzhonghu Do I understand correctly that you want me to split I've prepared a #44786 for that. Right now it's Draft. Let me know if that's what you mean. |
@linsun AFAIU, the primary consumer of the It is preferable to keep that EDS logic free from dependencies on If we let a user to create However, within the time span when |
Splitting looks good, the more important should be that the logic in this pr should not couple with register |
store := memory.NewController(memory.Make(collections.All)) | ||
createOrFail(t, store, weB) | ||
|
||
stop := make(chan struct{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: test.NewStop(). Or abstract L318-325 to a NewTestController function, since its repeated many times
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I will address it in a follow-up PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see #44825
Thank you all for the feedback. I will address it in the follow-up PR(s). |
In response to a cherrypick label: new pull request created: #44866 |
Please provide a description of this PR:
This PR is a rebased version of former #37086.
The PR implements support for health checking of those VMs that are not using auto-registration.
The notion of WorkloadEntry
"cleanup"
has been extended to accommodate 2 types VMs, namely:"cleanup"
for VMs that do use auto-registration means "delete WorkloadEntry""cleanup"
for VMs that do NOT use auto-registration means "delete Health condition" (as health status is no longer known)The overall purpose of this PR is to make
WorkloadEntry
the fundamental building block, so that it can be used even without higher level concepts such as auto-registration.Please check any characteristics that apply to this pull request.
This PR does not introduce any API changes, however it makes possible a new use case that might need to be documented.
E.g., if a user wants to enable health-checking for a
WorkloadEntry
that is not using auto-registration, he/she needs to make the following changes to the configuration:Additional notes:
This PR adds a new feature while keeping the changes to the existing code to the minimum.
As a result, file
pilot/pkg/autoregistration/controller.go
is now a home to all sorts of logic:WorkloadEntry
resourceWorkloadEntry
cleanup on Istio Proxy disconnectWorkloadEntry
cleanup on expiration of the grace periodTo improve this, I've also prepared a refactoring PR where
pilot/pkg/autoregistration/controller.go
is split into smaller sub-components.Please let me know if you would prefer the refactored PR instead of this one.