Feature Request: Publish health probe results as Azure Monitor metrics #541

Open
maskati opened this issue Dec 22, 2022 · 5 comments
Labels
enhancement New feature or request observability related to observability Provisioning Related to deployment issues, revision provisioning, etc.

Comments

@maskati

maskati commented Dec 22, 2022

Is your feature request related to a problem? Please describe.
Container Apps startup, liveness, and readiness health probe results are not published as Container App metrics, where they could be used for alerting or even external ingress routing decisions.

Describe the solution you'd like.
Publish Container App health probe results as metrics. Ideally, provide a separate metric for each probe type and include Replica and Revision dimensions. For example, App Service publishes its own health check results as a Health Check Status metric.

Describe alternatives you've considered.
Infer a failed liveness probe from a non-zero "Replica Restart Count" metric. If the replica restarts continuously due to a non-transient issue, this does not give an accurate indication of the current health state. It also provides no means to infer readiness, since a failed readiness probe does not cause a restart.

Readiness could be monitored by an external probe, but this adds complexity. In addition, unlike internal health probes, external probes move the app from idle to active usage metering, so high-frequency polling would also have a cost impact.

@maskati maskati added the enhancement New feature or request label Dec 22, 2022
@ghost ghost added the Needs: triage 🔍 Pending a first pass to read, tag, and assign label Dec 22, 2022
@SophCarp SophCarp added observability related to observability Provisioning Related to deployment issues, revision provisioning, etc. and removed Needs: triage 🔍 Pending a first pass to read, tag, and assign labels Jan 3, 2023
@mortenf1984

Is there any information on this feature request? I cannot find any place to actually see the status of the health probes I set up, and getting them as a metric would be great.

@ranat5

ranat5 commented Dec 15, 2023

Hi, has there been any update on this enhancement request?

@maskati
Author

maskati commented Dec 18, 2023

Not sure if this is showing availability based on probe state, but Diagnose and solve problems -> Availability and Performance -> Availability view shows a graph of "availability". Have not investigated further.

This seems to be sourced from a Microsoft.App/containerapps/detectors resource named cappContainerAppAvailabilityDetector. You can list the other detectors with List Detectors and get the data with Get Detector, but the result is not in an easy-to-consume format.
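
For reference, here is a minimal sketch of how the List Detectors and Get Detector request URLs could be assembled for the Azure Resource Manager REST API. The subscription, resource group, and app names are placeholders, and the `api-version` value is an assumption; check the current REST reference before relying on it. The sketch only builds URLs. Sending the request also needs a bearer token, which is not shown.

```python
from typing import Optional
from urllib.parse import quote

ARM_ENDPOINT = "https://management.azure.com"
# Assumption: an api-version that exposes Container Apps diagnostics.
API_VERSION = "2023-05-01"


def detectors_url(
    subscription_id: str,
    resource_group: str,
    app_name: str,
    detector_name: Optional[str] = None,
    api_version: str = API_VERSION,
) -> str:
    """Build the List Detectors URL, or the Get Detector URL if a name is given."""
    path = (
        f"/subscriptions/{quote(subscription_id, safe='')}"
        f"/resourceGroups/{quote(resource_group, safe='')}"
        f"/providers/Microsoft.App/containerApps/{quote(app_name, safe='')}"
        "/detectors"
    )
    if detector_name is not None:
        path += f"/{quote(detector_name, safe='')}"
    return f"{ARM_ENDPOINT}{path}?api-version={api_version}"


if __name__ == "__main__":
    # Placeholder identifiers, for illustration only.
    print(detectors_url("my-subscription-id", "my-rg", "my-app"))
    print(detectors_url("my-subscription-id", "my-rg", "my-app",
                        "cappContainerAppAvailabilityDetector"))
```

Issuing a GET against either URL with an `Authorization: Bearer <token>` header should return the detector list or a single detector's data, assuming the caller has read access to the container app.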

There seem to be a bunch of detectors emitting various data. It would be nice to be able to consume at least some of these as standard Azure Monitor metrics. I am getting the following detectors:

id                                        name                                             category                       description
--                                        ----                                             --------                       -----------
AutoScalingErrors                         Auto Scaling Errors                              Availability and Performance   This detector shows you auto scaling (KEDA) errors occurred to your container app.
BilledQuantityWithAppAliveCount           Usage Quantity With Replicas                     Configuration and Management   Detector to show the usage quantities per metric with replica count
BilledQuantityWithAppAliveCountWestEurope West Europe Billing Issue                        Configuration and Management   Detector to show the underbilling usage quantities per metric with replica count for West Europe
cappcertificates                          SSL and Domains                                  SSL and Domains                This detector shows SSL and custom domain related issues for your Container App and Container App Environment.
cappconfigandmanagement                   Configuration and Management                     Configuration and Management   This detector shows configuration and management issues of your container app.
cappContainerAppAvailabilityDetector      Container App Availability Metrics               Availability and Performance   Analyze App and Platform availability and monitor the requests and failures to your container app
cappContainerAppAvailabilityMetrics       Container App Down                               Availability and Performance   Analyze App and Platform availability and monitor the requests and failures to your container app
cappcontainerappclustercreation           Container App Env Creation Error Detector        Container Apps Environment     This detector shows you some known issues regarding cluster creation
cappcontainerappcpu                       Container App CPU Usage                          Availability and Performance   This detector shows the Container App CPU usage.
cappcontainerappmemory                    Container App Memory Usage                       Availability and Performance   This detector shows the memory usage of the specified container app.
cappcontainerappnetworkusage              Container App Network Inbound and Outbound Usage Availability and Performance   This detector shows the Container App Network Inbound and Outbound usages.
cappcontainerapprevisions                 Revisions                                                                       List of revisions of the container app
cappdeploymentFailures                    Deployment Failures                              Deployment                     This detector checks for deployment failures.
clustersubnet                             Cluster Subnet                                   Container Apps Environment     Looks for issues with cluster subnet configuration
ContainerAppEnvironmentEvents             Container Environment Events                     Container Apps Environment
ContainerAppsRevisionComparsion           Container Apps Revisions Comparison              Configuration and Management   Track the differences between two seperate revisisons
containerenvinsights                      Container Environment Insights                   Container Environment Insights This detector shows various insights of your container environment.
DaprInsights                              Dapr Components Insights                         Dapr Component Insights        This detector shows various insights of Dapr Components for your Container App Environment.
EasyAuthConfigurationErrors               EasyAuth Configuration Errors                    Configuration and Management   This detector shows you errors of EasyAuth occurred to your container app.
snatusage                                 SNAT Connection and Port Allocation              Availability and Performance   Checks SNAT connection counts and port allocation per host for any cluster outbound IPs.

@floriankoch

Any progress on this?
For example, readiness states for the containers would help detect issues like #1025.

@loadaverage

I find it perplexing that such a basic and vital metric is not provided. What is even more surprising is that the HealthProbeStatus for the Ingress/Load Balancer shows an average value of 66.7 across all my Container Apps, despite there being no errors related to health probes in the Container Apps/System Logs. The HealthProbeStatus metric displays a flat line with a minimum value of 0, a maximum value of 100, and an average value of 66.7 (surprise).

Can someone explain how this is possible? How can 100 and 0 give 66.7?
(Why not use a blackbox_exporter-style approach with simple 0 and 1 values?)
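
One possible explanation, sketched under the assumption that each probe sample is reported as either 0 (failed) or 100 (succeeded): the average is taken over all samples in the aggregation window, not over the minimum and the maximum. If one sample in three is 0, the minimum is 0, the maximum is 100, and the average is 66.7. The sample values below are made up for illustration; the actual semantics of HealthProbeStatus are not confirmed in this thread.

```python
from statistics import mean


def summarize(samples):
    """Return min, max, and average (rounded to one decimal) of the samples."""
    return {
        "min": min(samples),
        "max": max(samples),
        "avg": round(mean(samples), 1),
    }


# Hypothetical window: two successful probe samples and one failed sample.
samples = [100, 100, 0]
print(summarize(samples))  # {'min': 0, 'max': 100, 'avg': 66.7}
```

If this assumption holds, a steady 66.7 would mean that roughly one in three samples reports 0, for example one of three probed endpoints or instances consistently failing.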

[Three screenshots attached; images not preserved]

And why are there no metrics from the Load Balancer that is attached to every Container Apps environment?

6 participants