In a microservice architecture, when a customer reports an issue involving errors, degradation, or latency, we start debugging by asking the following questions:
Which services' error rates spiked in the given time window?
Which endpoints degraded in the given time window?
We can identify a deviation in errors or latency by opening the overview dashboard for each service or endpoint and checking the error or latency graphs for unusual patterns. This workflow does not scale to a large number of services and dependencies.
Proposal
Show metric deviations (p99 latency, error rate) when listing services and endpoints.
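
The proposal does not define how the deviation would be computed. One possible reading, sketched below, is the relative change of each metric between the selected time window and a baseline window (for example, the preceding window of the same length). The `WindowStats` shape and the names `relative_deviation` and `rank_by_p99_deviation` are hypothetical and used only for illustration; they are not part of any existing API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class WindowStats:
    """Aggregated metrics for one service over one time window (hypothetical shape)."""
    p99_ms: float      # p99 latency in milliseconds
    error_rate: float  # fraction of failed requests, 0.0 to 1.0


def relative_deviation(current: float, baseline: float) -> Optional[float]:
    """Fractional change from baseline; None when the baseline is zero."""
    if baseline == 0:
        return None  # relative change is undefined without a baseline value
    return (current - baseline) / baseline


def rank_by_p99_deviation(
    current: Dict[str, WindowStats],
    baseline: Dict[str, WindowStats],
) -> List[Tuple[str, Optional[float], Optional[float]]]:
    """Return (service, p99 deviation, error-rate deviation), worst p99 first."""
    rows = []
    for name, cur in current.items():
        base = baseline.get(name)
        if base is None:
            continue  # assumption: services with no baseline data are skipped
        rows.append((
            name,
            relative_deviation(cur.p99_ms, base.p99_ms),
            relative_deviation(cur.error_rate, base.error_rate),
        ))
    # Undefined deviations sort last.
    rows.sort(key=lambda r: r[1] if r[1] is not None else float("-inf"), reverse=True)
    return rows


if __name__ == "__main__":
    baseline = {"checkout": WindowStats(200.0, 0.01), "search": WindowStats(100.0, 0.02)}
    current = {"checkout": WindowStats(300.0, 0.02), "search": WindowStats(110.0, 0.02)}
    for name, p99_dev, err_dev in rank_by_p99_deviation(current, baseline):
        print(name, p99_dev, err_dev)
```

Sorting the service list by such a value would surface the services whose latency moved the most without opening each dashboard. The same calculation could be applied per endpoint.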