Node exporter guide by lucperkins · Pull Request #1079 · prometheus/docs

lucperkins · 2018-06-22T22:06:16Z

Guide to locally running a node exporter. Moves existing content from the First Steps doc and expands on it.

brian-brazil

brian-brazil · 2018-07-03T07:06:48Z

content/docs/guides/node-exporter.md

+title: Monitoring Linux or macOS host metrics using a node exporter
+---
+
+# Monitoring Linux or macOS host metrics using a node exporter


It's safest to stick with just Linux

"Node" should be capitalized.

brian-brazil · 2018-07-03T07:07:30Z

content/docs/guides/node-exporter.md

+
+# Monitoring Linux or macOS host metrics using a node exporter
+
+A Prometheus [**node exporter**](https://github.com/prometheus/node_exporter) exposes a wide variety of hardware- and OS-related metrics.


brian-brazil · 2018-07-03T07:08:45Z

content/docs/guides/node-exporter.md

+
+## Installing and running the node exporter
+
+The Prometheus node exporter is a single static binary that you can install [via tarball](#tarball-installation) or using [`go get`](#go-installation). You can also install and run the node exporter as a [Docker image](#docker).


Docker is not recommended, and should not be mentioned.

I'd just have the tarball, give the user one easy way to do things rather than potentially put them down the path of having a working Go environment.

brian-brazil · 2018-07-03T07:09:11Z

content/docs/guides/node-exporter.md

+There is an official [Docker](https://docker.com) image for the node exporter available via [Docker Hub](https://hub.docker.com/r/prom/node-exporter/) as `prom/node-exporter`. You can see a list of available tags [here](https://hub.docker.com/r/prom/node-exporter/tags/). To run the latest version of the image locally:
+
+```bash
+docker run \


This is not the correct command, remove all of this

brian-brazil · 2018-07-03T07:09:31Z

content/docs/guides/node-exporter.md

+# etc.
+```
+
+Success! The node exporter is now exposing a wide variety of system metrics that Prometheus can scrape.


The metric shown is a Go metric, not a system metric

The intention, though, is to verify that metrics are being exposed by the NE. The output from the NE's /metrics endpoint indeed begins with go_gc... metrics rather than system metrics.

Okay, I see what you mean now. I'll remove the "system" from the sentence.

brian-brazil · 2018-07-03T07:12:36Z

content/docs/guides/node-exporter.md

+To see all metrics available for the `node_exporter` job:
+
+```
+{job="node_exporter"}


This is potentially quite an expensive query, so it shouldn't be mentioned. Look at `/metrics` if you want this.

brian-brazil · 2018-07-03T07:12:59Z

content/docs/guides/node-exporter.md

+This will likely bring up metrics for a variety of different `device`s and `mountpoint`s. Here's an example output:
+
+```
+node_filesystem_avail_bytes{device="/dev/sda1",fstype="ext4",instance="node_exporter:9100",job="node_exporter",mountpoint="/etc/hostname"}	  15077224448


This is output from running inside docker, and is not typical.

brian-brazil · 2018-07-03T07:13:23Z

content/docs/guides/node-exporter.md

+node_filesystem_avail_bytes{device="/dev/sda1",fstype="ext4",instance="node_exporter:9100",job="node_exporter",mountpoint="/etc/hostname"}	  15077224448
+node_filesystem_avail_bytes{device="/dev/sda1",fstype="ext4",instance="node_exporter:9100",job="node_exporter",mountpoint="/etc/hosts"}	      15077224448
+node_filesystem_avail_bytes{device="/dev/sda1",fstype="ext4",instance="node_exporter:9100",job="node_exporter",mountpoint="/etc/resolv.conf"}	15077224448
+node_filesystem_avail_bytes{device="none",fstype="aufs",instance="node_exporter:9100",job="node_exporter",mountpoint="/"}	                    15077224448


Your instance label does not match the configuration file.

brian-brazil · 2018-07-03T07:14:23Z

content/docs/guides/node-exporter.md

+
+This is just one example, and there are many more node exporter metrics to explore.
+
+## Enabling and disabling node exporter metrics


I don't think this is necessary detail for a first guide, this is intermediate to advanced stuff.

What users will care about is the key cpu/ram/disk/disk io/network/memory metrics and how to use them.

What, in your estimation, are some of the most important metrics? We might as well provide a "maybe check these out first" list.

The important ones are all enabled by default already.

brian-brazil · 2018-07-03T07:15:04Z

content/docs/introduction/first_steps.md

@@ -5,7 +5,7 @@ sort_rank: 3

 # First steps with Prometheus


Now that we've a second guide, this should move to be with it.

I'm fine with that, but I'll save that for a future PR

brian-brazil · 2018-07-04T12:03:46Z

content/docs/guides/node-exporter.md

+title: Monitoring Linux host metrics using a node exporter
+---
+
+# Monitoring Linux host metrics using a node exporter


with the Node

brian-brazil · 2018-07-04T12:04:06Z

content/docs/guides/node-exporter.md

+
+# Monitoring Linux host metrics using a node exporter
+
+The Prometheus [**node exporter**](https://github.com/prometheus/node_exporter) exposes a wide variety of hardware- and OS-related metrics.


they're more kernel than OS

brian-brazil · 2018-07-04T12:05:26Z

content/docs/guides/node-exporter.md

+  - targets: ['localhost:9100']
+```
+
+Once Prometheus is [installed](../../introduction/first_steps) you can start it up, using the `--config.file` flag to point to the Prometheus configuration that you created:


It'd be good to explain how to obtain Prometheus. We can skip it for most of the others, but the Node exporter is likely the first thing a user will use.

brian-brazil · 2018-07-04T12:06:20Z

content/docs/guides/node-exporter.md

+Now that Prometheus is scraping metrics from a running node exporter instance, we can explore those metrics using the Prometheus UI (aka the [expression browser](/docs/visualization/expression-browser)).
+ Navigate to `localhost:9090/graph` in your browser. Metrics specific to the node exporter are prefixed with `node_` and include metrics like `node_cpu_seconds_total` and `node_exporter_build_info`.
+
+To see all metrics available for the `node` job:


Using a browser would be easier:

brian-brazil · 2018-07-04T12:06:43Z

content/docs/guides/node-exporter.md

+This will likely bring up metrics for a variety of different `device`s and `mountpoint`s. Here's an example output:
+
+```
+node_filesystem_avail_bytes{device="/dev/sda1",fstype="ext4",instance="node:9100",job="node",mountpoint="/etc/hostname"}	  15077224448


This is still atypical output, as it is from inside docker.

The instance label also doesn't match the configuration

brian-brazil · 2018-07-04T12:07:40Z

content/docs/guides/node-exporter.md

+curl http://localhost:9100/metrics
+```
+
+The `node_filesystem_avail_bytes` metric, for example, informs you how much disk space is available to non-root users on each filesystem.


You could include links directly to the expression browser with interesting graphs etc.
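As a rough illustration of how such a link can be built, here is a minimal sketch. The `encode_expr` helper is hypothetical and only percent-encodes the handful of characters that appear encoded in the guide's own links; it is not a general URL encoder. The host and port assume a local Prometheus on `localhost:9090`.

```bash
# Hypothetical helper: percent-encode only the characters that the guide's
# expression-browser links encode ({ } " = [ ]), plus % itself.
encode_expr() {
  printf '%s' "$1" | sed -e 's/%/%25/g' -e 's/{/%7B/g' -e 's/}/%7D/g' \
    -e 's/"/%22/g' -e 's/=/%3D/g' -e 's/\[/%5B/g' -e 's/]/%5D/g'
}

# PromQL expression to graph.
expr='rate(node_cpu_seconds_total{mode="system"}[1m])'

# Assemble the link using the same query parameters as the links in the guide.
url="http://localhost:9090/graph?g0.range_input=1h&g0.expr=$(encode_expr "$expr")&g0.tab=1"
echo "$url"
```

Opening the printed URL in a browser should load the expression with a one-hour range, provided Prometheus is running locally.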


brian-brazil · 2018-07-06T08:29:53Z

content/docs/guides/node-exporter.md

@@ -0,0 +1,100 @@
+---
+title: Monitoring Linux host metrics with the node exporter


It's the Node exporter with a capital N

Why not Node Exporter?

Yeah, I would also capitalize both words, not just one.

brian-brazil · 2018-07-06T08:31:12Z

content/docs/guides/node-exporter.md

+Success! The node exporter is now exposing metrics that Prometheus can scrape, including a wide variety of system metrics further down in the output (prefixed with `node_`). To view those metrics (along with help and type information):
+
+```bash
+curl http://localhost:9100/metrics | grep "node_*"


I think you're mixing up globs and regexes. Just `node_` will do.
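To illustrate the difference, here is a minimal sketch run against made-up sample lines rather than a live `/metrics` endpoint. The `nodejs_heap_bytes` metric is hypothetical and is included only to show the over-matching.

```bash
# Made-up /metrics-style lines for illustration.
sample='go_gc_duration_seconds_count 12
nodejs_heap_bytes 1024
node_cpu_seconds_total{cpu="0",mode="system"} 42.5
node_filesystem_avail_bytes{mountpoint="/"} 15077224448'

# As a regex, "node_*" means "node" followed by zero or more underscores,
# so it also matches the hypothetical nodejs_heap_bytes line.
loose=$(printf '%s\n' "$sample" | grep -c 'node_*')

# The plain pattern "node_" requires the underscore to be present.
strict=$(printf '%s\n' "$sample" | grep -c 'node_')

echo "node_* matched $loose lines; node_ matched $strict lines"
```

A glob-style `*` ("anything") has no effect here because `grep` interprets its pattern as a regular expression, where `*` repeats the preceding character.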

brian-brazil · 2018-07-06T08:32:28Z

content/docs/guides/node-exporter.md

+
+Click on the links below to see some example metrics:
+
+* [`node_cpu_seconds_total{mode="system"}`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=node_cpu_seconds_total%7Bmode%3D%22system%22%7D&g0.tab=1)


Anything with a `_total` suffix needs a `rate(x[1m])` around it to be useful.

A few words about what these metrics mean would be useful

brian-brazil · 2018-07-06T08:33:43Z

content/docs/guides/node-exporter.md

+
+* [`node_cpu_seconds_total{mode="system"}`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=node_cpu_seconds_total%7Bmode%3D%22system%22%7D&g0.tab=1)
+* [`node_filesystem_avail_bytes`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=node_filesystem_avail_bytes&g0.tab=1)
+* [`node_memory_bytes_total`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=node_memory_bytes_total&g0.tab=1)


There is no metric by this name. Use something like node_memory_Cached_bytes


brian-brazil · 2018-07-07T08:41:56Z

content/docs/guides/node-exporter.md

+
+Metric | Type | Meaning
+:------|:-----|:-------
+[`rate(node_cpu_seconds_total{mode="system"}[1m])`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=rate(node_cpu_seconds_total%7Bmode%3D%22system%22%7D%5B1m%5D)&g0.tab=1) | counter | The number of seconds CPUs have spent in `system` mode in the last minute


The first column is a PromQL expression, so this expression is a gauge. I'd remove the Type column; it'll confuse people.

This is the average number of CPU seconds spent in system mode, per second, over the last minute.
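The arithmetic behind that description can be sketched with two made-up samples of a counter taken 60 seconds apart. This is a simplification: the real `rate()` function also handles counter resets and extrapolates to the window boundaries, which this sketch ignores.

```bash
# Two hypothetical samples of a counter such as
# node_cpu_seconds_total{mode="system"} (timestamps in seconds, values made up).
t0=1000; v0=120.0
t1=1060; v1=123.0

# Average per-second increase over the window: (v1 - v0) / (t1 - t0).
per_second=$(LC_ALL=C awk -v v0="$v0" -v v1="$v1" -v t0="$t0" -v t1="$t1" \
  'BEGIN { printf "%.2f", (v1 - v0) / (t1 - t0) }')

echo "average increase: $per_second CPU seconds per second"
```

The raw counter values only ever grow, so the per-second figure is what is worth graphing.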

brian-brazil · 2018-07-07T08:42:14Z

content/docs/guides/node-exporter.md

+:------|:-----|:-------
+[`rate(node_cpu_seconds_total{mode="system"}[1m])`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=rate(node_cpu_seconds_total%7Bmode%3D%22system%22%7D%5B1m%5D)&g0.tab=1) | counter | The number of seconds CPUs have spent in `system` mode in the last minute
+[`node_filesystem_avail_bytes`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=node_filesystem_avail_bytes&g0.tab=1) | gauge | The filesystem space available to non-root users (in bytes)
+[`node_network_receive_bytes_total`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=node_network_receive_bytes_total&g0.tab=1) | counter |


You need to take a rate() for a counter.

jamtur01 · 2018-07-09T04:51:33Z

content/docs/guides/node-exporter.md

+
+## Installing and running the Node Exporter
+
+The Prometheus Node Exporter is a single static binary that you can install [via tarball](#tarball-installation). You can [download](/downloads#node_exporter) page, extract it, and run it:


You can download page?

Partial sentence?

Sorry, this doc is in the middle of a re-write, hence the odd loose ends. I'll get these straightened up.

jamtur01 · 2018-07-09T04:52:35Z

content/docs/guides/node-exporter.md

+
+## Configuring your Prometheus instances
+
+Your locally running Prometheus instance needs to be properly configured in order to access Node Exporter metrics. The following [`scrape_config`](../prometheus/latest/configuration/configuration/#<scrape_config>) block will tell Prometheus that scrape from the Node Exporter via `localhost:9100`:


will tell Prometheus that?

will tell Prometheus to?

jamtur01 · 2018-07-09T04:52:58Z

content/docs/guides/node-exporter.md

+  - targets: ['localhost:9100']
+```
+
+To install Prometheus, [download the latest release](/download) for your platform,


Should you have duplicate install instructions?

Should , be :?

Personally, I prefer no duplicate instructions and originally had it that way, but @brian-brazil disagrees.

@brian-brazil I think a link to the first steps guide would be better here.

I think that, as one of the first guides a user reads, it should be completely standalone. Someone just starting out with the node exporter likely does not yet have the experience to run Prometheus based on a second guide.

jamtur01 · 2018-07-09T04:54:02Z

content/docs/guides/node-exporter.md

+cd prometheus-*.*
+```
+
+Once Prometheus is installed you can start it up, using the `--config.file` flag to point to the Prometheus configuration that you created:


Which configuration? The block above? It's not clear that this should be in a file.
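One way to make this explicit is to show the block being written to a file first. The following is only a sketch: the file name `prometheus.yml` and the 15-second scrape interval are assumptions, while the `node` job name and the `localhost:9100` target come from the snippets under review.

```bash
# Write a minimal Prometheus configuration to a file in the current directory.
# File name and scrape interval are assumptions for this sketch.
cat > prometheus.yml <<'EOF'
global:
  scrape_interval: 15s

scrape_configs:
- job_name: node
  static_configs:
  - targets: ['localhost:9100']
EOF

# Then, from the extracted Prometheus directory, point the server at that file:
#   ./prometheus --config.file=./prometheus.yml
```

Spelling out the file creation step removes the ambiguity about what `--config.file` should point to.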

jamtur01 · 2018-07-09T04:55:15Z

content/docs/guides/node-exporter.md

+
+## Exploring Node Exporter metrics through the Prometheus expression browser
+
+Now that Prometheus is scraping metrics from a running Node Exporter instance, you can explore those metrics using the Prometheus UI (aka the [expression browser](/docs/visualization/expression-browser)). Navigate to `localhost:9090/graph` in your browser and use the main expression bar at the top of the page to enter expressions, which looks like this:


"at the top of the page to enter expressions, which looks like this:" reads oddly.

jamtur01 · 2018-07-09T04:57:53Z

content/docs/guides/node-exporter.md

+:------|:-----|:-------
+[`rate(node_cpu_seconds_total{mode="system"}[1m])`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=rate(node_cpu_seconds_total%7Bmode%3D%22system%22%7D%5B1m%5D)&g0.tab=1) | counter | The number of seconds CPUs have spent in `system` mode in the last minute
+[`node_filesystem_avail_bytes`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=node_filesystem_avail_bytes&g0.tab=1) | gauge | The filesystem space available to non-root users (in bytes)
+[`node_network_receive_bytes_total`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=node_network_receive_bytes_total&g0.tab=1) | counter |


For consistency should each guide have a Summary section like the first steps? (I wrote that so I say yes) but if not it should probably be pulled. :)

I'm down with a Summary section. I'll add one here.


brian-brazil · 2018-07-11T10:50:59Z

content/docs/guides/node-exporter.md

+:------|:-------
+[`rate(node_cpu_seconds_total{mode="system"}[1m])`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=rate(node_cpu_seconds_total%7Bmode%3D%22system%22%7D%5B1m%5D)&g0.tab=1) | The average number of CPU seconds spent in system per second over the last minute
+[`node_filesystem_avail_bytes`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=node_filesystem_avail_bytes&g0.tab=1) | The filesystem space available to non-root users (in bytes)
+[`rate(node_network_receive_bytes_total[30s])`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=node_network_receive_bytes_total&g0.tab=1) | 


This is the average network traffic received in bytes per second

The range on the rate here is inconsistent with the 1st example, use 1m here too.


brian-brazil

Please don't merge changes until the code review process is complete and consensus has been achieved.

brian-brazil · 2018-07-11T17:49:13Z

content/docs/guides/node-exporter.md

+:------|:-------
+[`rate(node_cpu_seconds_total{mode="system"}[1m])`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=rate(node_cpu_seconds_total%7Bmode%3D%22system%22%7D%5B1m%5D)&g0.tab=1) | The average number of CPU seconds spent in system, per second, over the last minute
+[`node_filesystem_avail_bytes`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=node_filesystem_avail_bytes&g0.tab=1) | The filesystem space available to non-root users (in bytes)
+[`rate(node_network_receive_bytes_total[1m])`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=rate(node_network_receive_bytes_total%5B1m%5D)&g0.tab=1) | The average network traffic received, per second, over the last minute (in bytes)


This is inconsistent with cpu seconds, where the unit is not in brackets.

lucperkins · 2018-07-11T19:49:54Z

@brian-brazil I'll undo the merge in this case, but could we possibly expedite this process a bit in the future? 50+ comments on an introductory tutorial seems unnecessarily exacting. Subjecting simple material to such a grueling review process strikes me as counterproductive and highly likely to deter volunteers from making substantial contributions to the documentation.

RichiH · 2018-07-11T20:24:48Z

After a quick glance the new version is better, so can we leave this merged and base future work on this?

brian-brazil · 2018-07-11T20:25:53Z

Our users deserve high quality documentation, without errors, inconsistencies, or anything else that could confuse users and increase rather than decrease our support load. I appreciate you working on this, but we can't let a PR through just because more than a week has passed.

lucperkins · 2018-07-11T22:49:45Z

@brian-brazil Could you possibly propose a solution to the issue you point out above? Perhaps an alternative to the metric listed? I'm new to Prometheus and I'm not sure how to interpret your critique.

brian-brazil · 2018-07-12T09:04:06Z

My point is consistency about formatting between the 1st entry and the other two. So one way to fix it would be:

The average amount of CPU time spent in system mode, per second, over the last minute (in seconds)


lucperkins changed the title ~~[WIP] Node exporter guide~~ Node exporter guide Jul 3, 2018

lucperkins requested review from brian-brazil and juliusv July 3, 2018 02:35

brian-brazil reviewed Jul 3, 2018


brian-brazil reviewed Jul 4, 2018


Add node exporter guide

0f8022b

Signed-off-by: lucperkins <lucperkins@gmail.com>

brian-brazil reviewed Jul 6, 2018


lucperkins added the additional documentation label Jul 6, 2018

lucperkins added 3 commits July 6, 2018 12:12

Add information table for metrics

65fe382

Signed-off-by: lucperkins <lucperkins@gmail.com>

Capitalize Node Exporter

0a01d8e

Signed-off-by: lucperkins <lucperkins@gmail.com>

Change grep logic

4a1e84b

Signed-off-by: lucperkins <lucperkins@gmail.com>

brian-brazil reviewed Jul 7, 2018


jamtur01 reviewed Jul 9, 2018


lucperkins added 2 commits July 8, 2018 22:18

Wording fixes

e638a26

Signed-off-by: lucperkins <lucperkins@gmail.com>

Fix merge conflict

3d37f96

Signed-off-by: lucperkins <lucperkins@gmail.com>

brian-brazil reviewed Jul 11, 2018


Clarify metrics

0225d3b

Signed-off-by: lucperkins <lucperkins@gmail.com>

lucperkins merged commit 49abf62 into prometheus:master Jul 11, 2018

brian-brazil reviewed Jul 11, 2018


lucperkins mentioned this pull request Jul 11, 2018

Revert "Node exporter guide" #1106

Closed

lucperkins mentioned this pull request Jul 12, 2018

Node Exporter guide follow-up #1114

Merged

aylei pushed a commit to aylei/docs that referenced this pull request Oct 28, 2019

*: update the Note format (prometheus#1079)

0cf5ddd

* *: update the Note format * Update note format * Update the Note format

