---
breadcrumb: PCF Metrics Documentation
title: Monitoring and Troubleshooting Apps with PCF Metrics
owner: PCF Metrics
list_style_none: true
---
This topic describes how developers can monitor and troubleshoot their apps using Pivotal Cloud Foundry (PCF) Metrics.
## <a id="overview"></a> Overview
PCF Metrics helps you understand and troubleshoot the health and performance of your apps by displaying the following:
* [Container Metrics](#container): Three graphs measuring CPU, memory, and disk usage percentages
* [Network Metrics](#network): Three graphs measuring requests, HTTP errors, and response times
* [Custom Metrics](#custom): User-customizable graphs for measuring app performance, such as Spring Boot Actuator metrics
* [App Events](#events): A graph of update, start, stop, crash, SSH, and staging failure events
* [Logs](#logs): A list of app logs that you can search, filter, and download
* [Trace Explorer](#trace): A dependency graph that traces a request as it flows through your apps and their endpoints, along with the corresponding logs
The following sections describe a standard workflow for using PCF Metrics to monitor or troubleshoot your apps.
## <a id='get-started'></a> View an App
In a browser, navigate to `metrics.YOUR-SYSTEM-DOMAIN` and log in with your User Account and Authentication (UAA) credentials. Choose the app whose metrics or logs you want to view. PCF Metrics respects UAA permissions, so you can view any app that runs in a space you have access to.
![Search for an app](search.png)
PCF Metrics displays app data for a given time frame. See the sections below to [Change the Time Frame](#time) for the dashboard, [Interpret Metrics](#metrics) shown on each graph, and [Trace App Requests](#trace) with the Trace Explorer.
![Metrics UI](full.png)
## <a id="time"></a>Change the Time Frame
The graphs show time along the horizontal axis. You can change the time frame for all graphs and the logs by using the time selector at the top of the window. Adjust either end of the selector or click and drag.
**Zoom**: From within any graph, click and drag to zoom in on areas of interest. This adjusts all of the graphs, and the logs, to show data from that time frame.
<%= image_tag("metric-zoom.png") %>
## <a id='charts'></a> Add, Edit, and Delete Charts
The PCF Metrics dashboard allows users to add, edit, and delete charts.
**Add Chart**: To add a new chart, follow the steps below.
1. Click **+ ADD CHART** at the top right of the dashboard.
<%= image_tag("metrics-add1.png") %>
1. In the modal window, either select a metric from the dropdown menu or type the name of the metric into the search bar to filter results.
<%= image_tag("metrics-add2.png") %>
1. Select an aggregation type. The aggregation type determines how PCF Metrics combines the data from multiple app instances.
<%= image_tag("metrics-add3.png") %>
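Conceptually, an aggregation type reduces the per-instance samples recorded at each timestamp to a single value. The following sketch assumes four common aggregation functions (average, minimum, maximum, and sum) and a hypothetical `aggregate` helper; the names and the exact set of options offered in the **+ ADD CHART** window may differ.

```python
from statistics import mean

# Hypothetical aggregation functions; the set PCF Metrics offers may differ.
AGGREGATIONS = {
    "avg": mean,
    "min": min,
    "max": max,
    "sum": sum,
}


def aggregate(samples_by_instance, how="avg"):
    """Combine per-instance samples into one series.

    samples_by_instance maps an instance index to a list of
    (timestamp, value) pairs. Returns a list of
    (timestamp, aggregated value) pairs sorted by timestamp.
    """
    reduce_fn = AGGREGATIONS[how]
    by_time = {}
    for samples in samples_by_instance.values():
        for ts, value in samples:
            # Collect every instance's value for the same timestamp.
            by_time.setdefault(ts, []).append(value)
    return [(ts, reduce_fn(values)) for ts, values in sorted(by_time.items())]


# CPU samples for instances 0 and 1 at two timestamps.
cpu = {
    0: [(0, 10.0), (60, 30.0)],
    1: [(0, 20.0), (60, 50.0)],
}
print(aggregate(cpu, "avg"))  # [(0, 15.0), (60, 40.0)]
print(aggregate(cpu, "max"))  # [(0, 20.0), (60, 50.0)]
```

An average hides a single misbehaving instance that a maximum would expose, which is one reason to compare aggregation types when troubleshooting.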
**Edit Chart**: To change how instances are aggregated for an existing metric, click the pencil icon on the header of the metric chart. When the **Edit Chart** modal window appears, you can choose the aggregation type and click **Save** to apply changes.
<%= image_tag("metrics-edit.png") %>
**Delete Chart**: To delete an existing chart on the dashboard, click the trash can icon on the header of the metric chart and then click **Delete**.
<%= image_tag("metrics-delete.png") %>
## <a id="time"></a>View and Reorder Metric Charts
**Reorder**: Each metric has its own chart. You can click and drag the chart header to change the ordering of charts.
**Expand**: To see more details in complex graphs, you can expand a chart by clicking the icon in the chart header.
<%= image_tag("metrics-expand1.png") %>
You can collapse the chart by clicking the icon again.
<%= image_tag("metrics-expand2.png") %>
## <a id="app-instance"></a>View Metrics at App-Instance Level
PCF Metrics relays metric data at the app-instance level to allow for an in-depth troubleshooting experience. Users are able to view the app metrics related to a specific instance index, which correlates directly with the app instance indices shown in [Apps Manager](
To view metrics at the app-instance level, turn the **view instances** toggle on.
To select or deselect a specific app instance, select the desired instance from the **instance filter** dropdown menu.
<%= image_tag("metrics-instances2.png") %>
Alternatively, click the line for an instance on a metric chart to select that instance.
<%= image_tag("metrics-instances3.png") %>
## <a id='metrics'></a> Interpret Metrics
See the following sections to understand how to use each of the views on the dashboard to monitor and troubleshoot your app.
### <a id='container'></a> Container Metrics
Three **Container Metrics** charts are available on the PCF Metrics dashboard:
* CPU usage percentage: **cf.system.cpu**
<%= image_tag("cpu.png") %>
A spike in CPU might point to a process that is computationally heavy. Scaling app instances can relieve the immediate pressure, but you need to investigate the app to better understand and fix the root cause.
* Memory usage percentage: **cf.system.memory**
<%= image_tag("memory.png") %>
A spike in memory might mean a resource leak in the code. Scaling app memory can relieve the immediate pressure, but you need to find and resolve the underlying issue so that it does not occur again.
* Disk usage percentage: **cf.system.disk**
<%= image_tag("disk.png") %>
A spike in disk might mean the app is writing logs to files instead of STDOUT, caching data to local disk, or serializing large sessions to disk.
### <a id='network'></a> Network Metrics
Three **Network Metrics** charts are available on the PCF Metrics dashboard:
* Number of network requests per minute: **cf.system.request-count**
<%= image_tag("request-count.png") %>
A spike in HTTP requests means your app is receiving more traffic, for example from more users. Scaling app instances can reduce the response time.
* Number of network request errors per minute: **cf.system.request-error-count**
<%= image_tag("request-error-count.png") %>
A spike in HTTP errors means one or more 5xx errors have occurred. Check your app logs for more information.
* Average latency of a request in milliseconds: **cf.system.latency**
<%= image_tag("latency.png") %>
A spike in response time means your users are waiting longer. Scaling app instances can spread that workload over more resources and result in faster response times.
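To make the three network metrics concrete, the sketch below derives them from a list of request records, bucketed by minute. The `Request` record and the `network_metrics` helper are hypothetical; following the text above, only 5xx responses count as errors.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    """A single HTTP request to the app (illustrative record)."""
    timestamp: float   # seconds since epoch
    status: int        # HTTP status code
    latency_ms: float  # response time in milliseconds


def network_metrics(requests):
    """Return {minute: (request_count, error_count, avg_latency_ms)}."""
    buckets = defaultdict(list)
    for r in requests:
        # Group requests into one-minute buckets.
        buckets[int(r.timestamp // 60)].append(r)
    result = {}
    for minute, reqs in sorted(buckets.items()):
        count = len(reqs)
        errors = sum(1 for r in reqs if 500 <= r.status <= 599)
        avg_latency = sum(r.latency_ms for r in reqs) / count
        result[minute] = (count, errors, avg_latency)
    return result


reqs = [
    Request(0.0, 200, 40.0),
    Request(30.0, 503, 120.0),
    Request(75.0, 200, 20.0),
]
print(network_metrics(reqs))  # {0: (2, 1, 80.0), 1: (1, 0, 20.0)}
```

Because latency is an average, a few very slow requests can raise it noticeably even when most requests are fast.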
### <a id='events'></a> Events
The **Events** graph shows the following app events: **Crash**, **Fail** (staging failures), **Update**, **Stop**, **Start**, and **SSH**.
<p class="note"><strong>Note</strong>: The <b>SSH</b> event corresponds to someone successfully using SSH to access a container that runs an instance of the app.</p>
See the following topics for more information about app events:
* [About Starting Applications](
* [Troubleshooting Application Deployment and Health](
### <a id='custom'></a> Custom Metrics
Users can configure their apps to emit custom metrics out of the [Loggregator Firehose]( and then view these metrics on the PCF Metrics dashboard. For steps on how to set up your apps to emit custom metrics, refer to the [Metrics Forwarder Documentation]( If you have configured the apps correctly, you should be able to automatically see custom metrics on the PCF Metrics dashboard when you add a chart.
<%= image_tag("metrics-custom.png") %>
In addition, Spring Boot apps with actuators implemented emit [Spring Boot Actuator metrics]( out of the box, without any changes to source code. In PCF Metrics, these metrics look similar to the following:
<%= image_tag("metrics-spring.png") %>
### <a id="logs"></a>Logs
The **Logs** view displays app log data ingested from the Loggregator Firehose, including a histogram that shows log frequency for the current time frame.
The green time needle, visible on the metrics charts and the logs histogram, marks the point in time where the log list starts. Depending on the sort order of your logs, you see different results:
* Sort by **newest first (default)**:
<%= image_tag("newest-first.png") %>
The logs drawer retrieves all logs in the selected time frame that are _older than/to the left of_ the needle. The log outlined in green is the newest log to the left of the needle.
* Sort by **oldest first**:
<%= image_tag("oldest-first.png") %>
The logs drawer retrieves all logs in the selected time frame that are _newer than/to the right of_ the needle. The log outlined in green is the oldest log to the right of the needle.
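The needle behavior for the two sort orders amounts to a simple selection rule. In this sketch, logs are `(timestamp, message)` pairs and `needle` is a timestamp inside the selected time frame; the `logs_for_needle` helper is hypothetical, and including a log that falls exactly on the needle in both cases is an assumption.

```python
def logs_for_needle(logs, start, end, needle, newest_first=True):
    """Select and order logs relative to the time needle.

    logs: iterable of (timestamp, message) pairs.
    start, end: the selected time frame (inclusive).
    Returns the matching logs in display order; the first item
    corresponds to the log the UI outlines in green.
    """
    in_frame = [log for log in logs if start <= log[0] <= end]
    if newest_first:
        # Logs at or before the needle, newest first.
        picked = [log for log in in_frame if log[0] <= needle]
        return sorted(picked, key=lambda log: log[0], reverse=True)
    # Logs at or after the needle, oldest first.
    picked = [log for log in in_frame if log[0] >= needle]
    return sorted(picked, key=lambda log: log[0])


logs = [(10, "a"), (20, "b"), (30, "c"), (40, "d")]
print(logs_for_needle(logs, 0, 100, 25))         # [(20, 'b'), (10, 'a')]
print(logs_for_needle(logs, 0, 100, 25, False))  # [(30, 'c'), (40, 'd')]
```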
To adjust the placement of the time needle, click the handle at the bottom of the needle and drag to reposition it. Alternatively, you can click anywhere along the x-axis of a metric chart or the logs histogram to snap the needle to that position.
You can interact with the **Logs** view in the following ways:
* **Keyword**: Perform a keyword search. The histogram updates with blue bars based on what you enter. Hover over a histogram bar to view the number of logs for a specific time.
* **Highlight**: Enter a term to highlight within your search. The histogram updates with yellow bars based on the results. Hover over a histogram bar to view the number of logs for a specific time that contain the highlighted term.
* **Sources**: Choose which sources to display logs from. For more information, see [Log Types and Their Messages](
* **Order**: Modify the order in which logs appear.
* **Download**: Download a file containing logs for the current search.
* **Copy**: Click the copy icon to copy the text of the log.
* **View in Trace Explorer**: Open a window to see the trace of the request associated with the log. See [Trace App Requests](#trace).
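The keyword and highlight histograms can be thought of as per-interval counts over the logs in the time frame. This sketch assumes fixed 60-second bars and case-insensitive substring matching, with highlight matches counted only among keyword matches; the `log_histogram` helper is hypothetical, and the bar width and matching rules PCF Metrics uses may differ.

```python
from collections import Counter


def log_histogram(logs, keyword="", highlight="", bucket_seconds=60):
    """Count matching logs per time bucket.

    logs: iterable of (timestamp, message) pairs.
    Returns two Counters keyed by bucket start time: keyword matches
    (blue bars) and, among those, highlight matches (yellow bars).
    An empty keyword matches every log.
    """
    keyword, highlight = keyword.lower(), highlight.lower()
    blue, yellow = Counter(), Counter()
    for ts, message in logs:
        text = message.lower()
        if keyword not in text:
            continue
        bucket = int(ts // bucket_seconds) * bucket_seconds
        blue[bucket] += 1
        if highlight and highlight in text:
            yellow[bucket] += 1
    return blue, yellow


logs = [
    (5, "GET /orders 200"),
    (50, "GET /orders 500 timeout"),
    (70, "POST /login 200"),
]
blue, yellow = log_histogram(logs, keyword="GET", highlight="timeout")
print(dict(blue))    # {0: 2}
print(dict(yellow))  # {0: 1}
```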
## <a id="trace"></a>Trace App Requests
A request to one of your apps initiates a workflow within the app or system of apps. The record of this workflow is a _trace_, which you can use to troubleshoot app failures and latency issues. In the Trace Explorer view, PCF Metrics displays an interactive graph of a trace and its corresponding logs. See the sections below to understand how to use the Trace Explorer.
For more information about traces, see [What is a Trace?]( in the _Open Tracing_ documentation.
### <a id="before"></a> Prerequisites
PCF Metrics constructs the Trace Explorer view using trace IDs shared across app logs. Before you [use the Trace Explorer](#trace-explorer), check the following list to ensure that PCF Metrics can extract the necessary data from the logs of your app type.
* **Spring**: Follow the steps below.
1. Ensure you are using Spring Boot v1.4.3 or later.
1. Ensure you are using Spring Cloud Sleuth v1.0.12 or later.
1. Add the following to your app dependency file:
<pre>dependencies {
    compile ""
}</pre>
* **Node.js**, **Go**, and **Python**: Ensure that the servers associated with your app do not modify HTTP requests in a way that removes the `X-B3-TraceId`, `X-B3-SpanId`, and `X-B3-ParentSpanId` headers from a request. You also need to add the trace ID, span ID, and parent span ID to your app logs, as Spring apps do through the SLF4J MDC, so that the Trace Explorer can correlate logs.
* **Ruby**: Ruby servers that use a library depending on Rack modify HTTP request headers in a way that is incompatible with PCF Metrics. To trace requests for your Ruby apps, ensure that your framework does not rely on Rack. You may need to write a raw Ruby server that preserves the `X-B3-TraceId`, `X-B3-SpanId`, and `X-B3-ParentSpanId` headers in the request. You also need to add the trace ID, span ID, and parent span ID to your app logs so that the Trace Explorer can correlate logs.
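For apps that do not use Spring Cloud Sleuth, one part of the requirement is reading the B3 headers from each incoming request and writing the IDs into your log lines. This framework-agnostic Python sketch uses the header names from the B3 propagation convention, where the parent header is `X-B3-ParentSpanId`. It accepts either raw header names or WSGI-style environ keys, since WSGI, like Rack, exposes request headers as upper-cased `HTTP_` keys. The `b3_context` helper and the bracketed log prefix are hypothetical; confirm the log format PCF Metrics parses before relying on it.

```python
import logging
import sys


def b3_context(headers):
    """Extract B3 trace IDs from request headers for use in log lines.

    Accepts raw HTTP header names in any letter case (for example
    "X-B3-TraceId") or WSGI/CGI-style environ keys (for example
    "HTTP_X_B3_TRACEID"). Missing headers map to an empty string.
    """
    normalized = {}
    for key, value in headers.items():
        name = key.upper()
        if name.startswith("HTTP_"):
            # WSGI environ style: HTTP_X_B3_TRACEID -> X-B3-TRACEID
            name = name[len("HTTP_"):].replace("_", "-")
        normalized[name] = value
    return {
        "trace_id": normalized.get("X-B3-TRACEID", ""),
        "span_id": normalized.get("X-B3-SPANID", ""),
        "parent_span_id": normalized.get("X-B3-PARENTSPANID", ""),
    }


# Hypothetical log prefix carrying the IDs, written to STDOUT.
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s [%(trace_id)s,%(span_id)s,%(parent_span_id)s] %(message)s",
)
log = logging.getLogger("orders")

environ = {
    "HTTP_X_B3_TRACEID": "463ac35c9f6413ad",
    "HTTP_X_B3_SPANID": "a2fb4a1d1a96d312",
    "PATH_INFO": "/orders",
}
log.info("handling %s", environ["PATH_INFO"], extra=b3_context(environ))
```

Because the format string references the three ID fields, every log call that goes through this configuration must supply them, for example with `extra` as shown or through a `logging.LoggerAdapter`.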
### <a id="trace-explorer"></a> Use the Trace Explorer
This section explains how to view the trace for a request received by your app and interact with the Trace Explorer.
1. Select an app on the PCF Metrics dashboard.
1. Click the Trace Explorer icon in a log for which you want to trace the request.
![Hover over trace icon](click-trace.png)
* The Trace Explorer displays the apps and endpoints involved in completing a request, along with the corresponding logs:
![Trace Explorer](trace-explorer.png)
A request corresponds to a single trace ID displayed in the top left corner. Each row includes an app in the left column and a _span_ in the right column. A span is a particular endpoint within the app and the time it took to execute in milliseconds. By default, the graph lists each app and endpoint in the order they were called.
<p class="note"><strong>Note</strong>: If you do not have access to the space for an app involved in the request, you cannot see the spans or logs from that app.</p>
* Click a span to show only the logs from that span, or click several spans to toggle which logs appear. Clicking a span also adds a box with that span ID to the **Logs** view:
![Click Span](click-span.png)
* If you click `APP APP-NAME` within a log, PCF Metrics returns you to the dashboard view for that app, with the time frame focused on the time of the log you clicked.
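The Trace Explorer view can be modeled as a set of spans that share one trace ID, each tied to an app and an endpoint with a duration, plus logs tagged with span IDs. The sketch below orders spans by start time, matching the default call order, and filters logs to the selected spans. The `Span` fields, the `(span_id, message)` log shape, and the rule that an empty selection shows all logs are assumptions, not PCF Metrics' internal data model.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """One row of the Trace Explorer (illustrative fields)."""
    span_id: str
    app: str
    endpoint: str
    start_ms: float
    duration_ms: float


def rows_in_call_order(spans):
    """Return (app, endpoint, duration_ms) rows ordered by start time."""
    ordered = sorted(spans, key=lambda s: s.start_ms)
    return [(s.app, s.endpoint, s.duration_ms) for s in ordered]


def logs_for_spans(logs, selected_span_ids):
    """Keep only logs whose span ID is selected; no selection keeps all logs.

    logs: iterable of (span_id, message) pairs.
    """
    if not selected_span_ids:
        return list(logs)
    return [log for log in logs if log[0] in selected_span_ids]


spans = [
    Span("b", "inventory", "/stock", 12.0, 30.0),
    Span("a", "storefront", "/checkout", 0.0, 95.0),
]
logs = [("a", "checkout started"), ("b", "stock checked"), ("a", "checkout done")]
print(rows_in_call_order(spans))
# [('storefront', '/checkout', 95.0), ('inventory', '/stock', 30.0)]
print(logs_for_spans(logs, {"b"}))  # [('b', 'stock checked')]
```

A parent span's duration includes the time spent in the spans it calls, so in this example the 95 ms `/checkout` span covers the 30 ms `/stock` call.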