Feature Request: Publish health probe results as Azure Monitor metrics #541

maskati · 2022-12-22T10:40:06Z

Is your feature request related to a problem? Please describe.
Container Apps startup, liveness, and readiness health probe results are not published as Container App metrics, where they could be used for alerting or even external ingress routing decisions.

Describe the solution you'd like.
Publish Container App health probe information as metrics. Ideally, provide a separate metric for each probe type and include Replica and Revision dimensions. For example, App Service publishes its own health check results as a Health Check Status metric.

Describe alternatives you've considered.
Infer a failed liveness probe from a non-zero "Replica Restart Count" metric. If the replica restarts continuously due to a non-transient issue, this does not give an accurate indication of the current health state. It also provides no means to infer readiness, since a failed readiness probe does not cause a restart.

Readiness could be monitored by an external probe, but this adds complexity. In addition, unlike internal health probes, external probes will move the app from idle to active usage metering, so high frequency polling would also have a cost impact.

mortenf1984 · 2023-06-13T20:40:07Z

Is there any information on this feature request? I cannot find any place to actually see the status of the health probes I set up, and getting them as a metric would be great.

ranat5 · 2023-12-15T09:20:34Z

Hi, has there been any update on this enhancement request?

maskati · 2023-12-18T11:47:18Z

I'm not sure whether this shows availability based on probe state, but the Diagnose and solve problems -> Availability and Performance -> Availability view shows a graph of "availability". I have not investigated further.

This seems to be sourced from a Microsoft.App/containerapps/detectors resource named cappContainerAppAvailabilityDetector. You can list other detectors with List Detectors and get the data with Get Detector, but it's not exactly in an easy-to-consume format.

There seem to be a bunch of detectors emitting various data. It would be nice to be able to consume at least some of these as standard Azure Monitor metrics. I am getting the following detectors:

id                                        name                                             category                       description
--                                        ----                                             --------                       -----------
AutoScalingErrors                         Auto Scaling Errors                              Availability and Performance   This detector shows you auto scaling (KEDA) errors occurred to your container app.
BilledQuantityWithAppAliveCount           Usage Quantity With Replicas                     Configuration and Management   Detector to show the usage quantities per metric with replica count
BilledQuantityWithAppAliveCountWestEurope West Europe Billing Issue                        Configuration and Management   Detector to show the underbilling usage quantities per metric with replica count for West Europe
cappcertificates                          SSL and Domains                                  SSL and Domains                This detector shows SSL and custom domain related issues for your Container App and Container App Environment.
cappconfigandmanagement                   Configuration and Management                     Configuration and Management   This detector shows configuration and management issues of your container app.
cappContainerAppAvailabilityDetector      Container App Availability Metrics               Availability and Performance   Analyze App and Platform availability and monitor the requests and failures to your container app
cappContainerAppAvailabilityMetrics       Container App Down                               Availability and Performance   Analyze App and Platform availability and monitor the requests and failures to your container app
cappcontainerappclustercreation           Container App Env Creation Error Detector        Container Apps Environment     This detector shows you some known issues regarding cluster creation
cappcontainerappcpu                       Container App CPU Usage                          Availability and Performance   This detector shows the Container App CPU usage.
cappcontainerappmemory                    Container App Memory Usage                       Availability and Performance   This detector shows the memory usage of the specified container app.
cappcontainerappnetworkusage              Container App Network Inbound and Outbound Usage Availability and Performance   This detector shows the Container App Network Inbound and Outbound usages.
cappcontainerapprevisions                 Revisions                                                                       List of revisions of the container app
cappdeploymentFailures                    Deployment Failures                              Deployment                     This detector checks for deployment failures.
clustersubnet                             Cluster Subnet                                   Container Apps Environment     Looks for issues with cluster subnet configuration
ContainerAppEnvironmentEvents             Container Environment Events                     Container Apps Environment
ContainerAppsRevisionComparsion           Container Apps Revisions Comparison              Configuration and Management   Track the differences between two seperate revisisons
containerenvinsights                      Container Environment Insights                   Container Environment Insights This detector shows various insights of your container environment.
DaprInsights                              Dapr Components Insights                         Dapr Component Insights        This detector shows various insights of Dapr Components for your Container App Environment.
EasyAuthConfigurationErrors               EasyAuth Configuration Errors                    Configuration and Management   This detector shows you errors of EasyAuth occurred to your container app.
snatusage                                 SNAT Connection and Port Allocation              Availability and Performance   Checks SNAT connection counts and port allocation per host for any cluster outbound IPs.
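For anyone who wants to try the List Detectors and Get Detector calls mentioned above, here is a minimal sketch that only builds the Azure Resource Manager request URLs. The helper name `detectors_url` and the `api-version` value are assumptions for illustration, not something confirmed in this thread; check the Container Apps REST API reference for the version that applies to you.

```python
from urllib.parse import quote

ARM_ENDPOINT = "https://management.azure.com"
# Assumption: use whichever api-version your subscription supports.
API_VERSION = "2023-05-01"


def detectors_url(subscription_id, resource_group, app_name,
                  detector_name=None, api_version=API_VERSION):
    """Build the URL for List Detectors, or Get Detector when a name is given.

    Hypothetical helper: it only formats the resource path and sends nothing.
    """
    path = (
        f"/subscriptions/{quote(subscription_id, safe='')}"
        f"/resourceGroups/{quote(resource_group, safe='')}"
        f"/providers/Microsoft.App/containerApps/{quote(app_name, safe='')}"
        "/detectors"
    )
    if detector_name is not None:
        # Get Detector: append the detector id, e.g. the availability detector.
        path += f"/{quote(detector_name, safe='')}"
    return f"{ARM_ENDPOINT}{path}?api-version={api_version}"


if __name__ == "__main__":
    # Placeholder names; replace with your own subscription, group, and app.
    print(detectors_url("<subscription-id>", "my-rg", "my-app"))
    print(detectors_url("<subscription-id>", "my-rg", "my-app",
                        "cappContainerAppAvailabilityDetector"))
```

The resulting URL should work as an authenticated GET, for example via `az rest --method get --url "<url>"`. The response is detector-specific JSON, so turning it into something alertable still takes custom parsing.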

floriankoch · 2024-01-05T12:53:25Z

Any progress on this?
For example, the readiness states of the containers would help to detect issues like #1025.

loadaverage · 2024-05-28T00:30:53Z

I find it perplexing that such a basic and vital metric is not provided. What is even more surprising is that the HealthProbeStatus for the Ingress/Load Balancer shows an average value of 66.7 across all my Container Apps, despite there being no errors related to health probes in the Container Apps/System Logs. The HealthProbeStatus metric displays a flat line with a minimum value of 0, a maximum value of 100, and an average value of 66.7 (surprise).

Can someone explain how this is possible? How can 100 and 0 give 66.7?
(Why not use the blackbox_exporter approach, with simple logic of 0 and 1?)

And why are there no metrics from the Load Balancer that is attached to every CA Environment?
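On the 66.7 question, one purely arithmetic possibility (a guess, not confirmed anywhere in this thread) is that the average is taken over several samples that are each either 0 or 100, rather than over the minimum and maximum. The sketch below, with a hypothetical helper `possible_mixes`, searches small sample counts for mixes that land on the observed average.

```python
def possible_mixes(target_avg, max_samples=6, tolerance=0.05):
    """Return (healthy, unhealthy) sample counts whose mean is close to target_avg.

    Assumes each sample is exactly 100 (healthy) or 0 (unhealthy); that is a
    guess about how the metric might be produced, not documented behavior.
    """
    matches = []
    for total in range(1, max_samples + 1):
        for healthy in range(total + 1):
            average = 100 * healthy / total
            if abs(average - target_avg) <= tolerance:
                matches.append((healthy, total - healthy))
    return matches


if __name__ == "__main__":
    # Two samples at 100 and one at 0 average to roughly 66.7.
    print(round((100 + 100 + 0) / 3, 1))
    print(possible_mixes(66.7))
```

Under that assumption, a flat 66.7 with a minimum of 0 and a maximum of 100 would match two of every three samples reporting healthy, for example one of three probed backends or instances constantly reporting 0. Whether the metric is actually aggregated this way is not established here.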

maskati added the enhancement label on Dec 22, 2022

ghost added the Needs: triage 🔍 label on Dec 22, 2022

SophCarp added the observability and Provisioning labels and removed the Needs: triage 🔍 label on Jan 3, 2023
