Node exporter guide #1079
lucperkins merged 7 commits into prometheus:master from lucperkins:lperkins/node-exporter-guide
content/docs/guides/node-exporter.md
Outdated
| title: Monitoring Linux or macOS host metrics using a node exporter | ||
| --- | ||
|
|
||
| # Monitoring Linux or macOS host metrics using a node exporter |
There was a problem hiding this comment.
It's safest to stick with just Linux
Node
content/docs/guides/node-exporter.md
Outdated
|
|
||
| # Monitoring Linux or macOS host metrics using a node exporter | ||
|
|
||
| A Prometheus [**node exporter**](https://github.com/prometheus/node_exporter) exposes a wide variety of hardware- and OS-related metrics. |
content/docs/guides/node-exporter.md
Outdated
|
|
||
| ## Installing and running the node exporter | ||
|
|
||
| The Prometheus node exporter is a single static binary that you can install [via tarball](#tarball-installation) or using [`go get`](#go-installation). You can also install and run the node exporter as a [Docker image](#docker). |
There was a problem hiding this comment.
Docker is not recommended, and should not be mentioned.
I'd just have the tarball: give the user one easy way to do things rather than potentially sending them down the path of needing a working Go environment.
content/docs/guides/node-exporter.md
Outdated
| There is an official [Docker](https://docker.com) image for the node exporter available via [Docker Hub](https://hub.docker.com/r/prom/node-exporter/) as `prom/node-exporter`. You can see a list of available tags [here](https://hub.docker.com/r/prom/node-exporter/tags/). To run the latest version of the image locally: | ||
|
|
||
| ```bash | ||
| docker run \ |
There was a problem hiding this comment.
This is not the correct command, remove all of this
content/docs/guides/node-exporter.md
Outdated
| # etc. | ||
| ``` | ||
|
|
||
| Success! The node exporter is now exposing a wide variety of system metrics that Prometheus can scrape. |
There was a problem hiding this comment.
The metric shown is a Go metric, not a system metric
There was a problem hiding this comment.
The intention, though, is to verify that metrics are being exposed by the NE. The output from the NE's /metrics endpoint indeed begins with go_gc... metrics rather than system metrics.
There was a problem hiding this comment.
Okay, I see what you mean now. I'll remove the "system" from the sentence.
content/docs/guides/node-exporter.md
Outdated
| To see all metrics available for the `node_exporter` job: | ||
|
|
||
| ``` | ||
| {job="node_exporter"} |
There was a problem hiding this comment.
This is a potentially quite expensive query, so it shouldn't be mentioned. Look at the /metrics endpoint if you want this.
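One lighter-weight way to see which metrics exist is to read the exporter's own /metrics page rather than asking Prometheus for every series in the job. A minimal sketch, assuming the exporter listens on `localhost:9100`; the helper name `metric_names` and the sample text are illustrative and not part of the guide under review:

```bash
# Print the unique metric names declared in Prometheus text exposition
# format read from stdin, using the "# TYPE <name> <type>" lines.
metric_names() {
  awk '$1 == "#" && $2 == "TYPE" { print $3 }' | sort -u
}

# Illustrative sample input (not real exporter output).
sample='# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.5
# TYPE go_goroutines gauge
go_goroutines 8'

names=$(printf '%s\n' "$sample" | metric_names)
printf '%s\n' "$names"

# Against a live exporter (assumed address):
#   curl -s http://localhost:9100/metrics | metric_names
```

This reads a single HTTP response from the exporter, so it puts no query load on the Prometheus server.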
content/docs/guides/node-exporter.md
Outdated
| This will likely bring up metrics for a variety of different `device`s and `mountpoint`s. Here's an example output: | ||
|
|
||
| ``` | ||
| node_filesystem_avail_bytes{device="/dev/sda1",fstype="ext4",instance="node_exporter:9100",job="node_exporter",mountpoint="/etc/hostname"} 15077224448 |
There was a problem hiding this comment.
This is output from running inside docker, and is not typical.
content/docs/guides/node-exporter.md
Outdated
| node_filesystem_avail_bytes{device="/dev/sda1",fstype="ext4",instance="node_exporter:9100",job="node_exporter",mountpoint="/etc/hostname"} 15077224448 | ||
| node_filesystem_avail_bytes{device="/dev/sda1",fstype="ext4",instance="node_exporter:9100",job="node_exporter",mountpoint="/etc/hosts"} 15077224448 | ||
| node_filesystem_avail_bytes{device="/dev/sda1",fstype="ext4",instance="node_exporter:9100",job="node_exporter",mountpoint="/etc/resolv.conf"} 15077224448 | ||
| node_filesystem_avail_bytes{device="none",fstype="aufs",instance="node_exporter:9100",job="node_exporter",mountpoint="/"} 15077224448 |
There was a problem hiding this comment.
Your instance label does not match the configuration file.
content/docs/guides/node-exporter.md
Outdated
|
|
||
| This is just one example, and there are many more node exporter metrics to explore. | ||
|
|
||
| ## Enabling and disabling node exporter metrics |
There was a problem hiding this comment.
I don't think this is necessary detail for a first guide, this is intermediate to advanced stuff.
What users will care about is the key cpu/ram/disk/disk io/network/memory metrics and how to use them.
There was a problem hiding this comment.
What, in your estimation, are some of the most important metrics? We might as well provide a "maybe check these out first" list.
There was a problem hiding this comment.
The important ones are all enabled by default already.
| @@ -5,7 +5,7 @@ sort_rank: 3 | |||
|
|
|||
| # First steps with Prometheus | |||
There was a problem hiding this comment.
Now that we've a second guide, this should move to be with it.
There was a problem hiding this comment.
I'm fine with that, but I'll save that for a future PR
content/docs/guides/node-exporter.md
Outdated
| title: Monitoring Linux host metrics using a node exporter | ||
| --- | ||
|
|
||
| # Monitoring Linux host metrics using a node exporter |
content/docs/guides/node-exporter.md
Outdated
|
|
||
| # Monitoring Linux host metrics using a node exporter | ||
|
|
||
| The Prometheus [**node exporter**](https://github.com/prometheus/node_exporter) exposes a wide variety of hardware- and OS-related metrics. |
There was a problem hiding this comment.
they're more kernel than OS
content/docs/guides/node-exporter.md
Outdated
| - targets: ['localhost:9100'] | ||
| ``` | ||
|
|
||
| Once Prometheus is [installed](../../introduction/first_steps) you can start it up, using the `--config.file` flag to point to the Prometheus configuration that you created: |
There was a problem hiding this comment.
It'd be good to explain how to obtain Prometheus. We can skip it for most of the others, but the Node exporter is likely the first thing a user will use.
content/docs/guides/node-exporter.md
Outdated
| Now that Prometheus is scraping metrics from a running node exporter instance, we can explore those metrics using the Prometheus UI (aka the [expression browser](/docs/visualization/expression-browser)). | ||
| Navigate to `localhost:9090/graph` in your browser. Metrics specific to the node exporter are prefixed with `node_` and include metrics like `node_cpu_seconds_total` and `node_exporter_build_info`. | ||
|
|
||
| To see all metrics available for the `node` job: |
There was a problem hiding this comment.
Using a browser would be easier:
content/docs/guides/node-exporter.md
Outdated
| This will likely bring up metrics for a variety of different `device`s and `mountpoint`s. Here's an example output: | ||
|
|
||
| ``` | ||
| node_filesystem_avail_bytes{device="/dev/sda1",fstype="ext4",instance="node:9100",job="node",mountpoint="/etc/hostname"} 15077224448 |
There was a problem hiding this comment.
This is still atypical output, as it is from inside docker.
The instance label also doesn't match the configuration
content/docs/guides/node-exporter.md
Outdated
| curl http://localhost:9100/metrics | ||
| ``` | ||
|
|
||
| The `node_filesystem_avail_bytes` metric, for example, informs you how much disk space is available to non-root users on each filesystem. |
There was a problem hiding this comment.
You could include links directly to the expression browser with interesting graphs etc.
Signed-off-by: lucperkins <lucperkins@gmail.com>
content/docs/guides/node-exporter.md
Outdated
| @@ -0,0 +1,100 @@ | |||
| --- | |||
| title: Monitoring Linux host metrics with the node exporter | |||
There was a problem hiding this comment.
It's the Node exporter with a capital N
There was a problem hiding this comment.
Why not Node Exporter?
There was a problem hiding this comment.
Yeah, I would also capitalize both words, not just one.
content/docs/guides/node-exporter.md
Outdated
| Success! The node exporter is now exposing metrics that Prometheus can scrape, including a wide variety of system metrics further down in the output (prefixed with `node_`). To view those metrics (along with help and type information): | ||
|
|
||
| ```bash | ||
| curl http://localhost:9100/metrics | grep "node_*" |
There was a problem hiding this comment.
I think you're mixing up globs and regexes. Just node_ will do.
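The difference between the two patterns can be checked without a running exporter. A small sketch using made-up input lines (the metric names below are illustrative only): as a regular expression, `node_*` means "node" followed by zero or more underscores, so it matches more than intended.

```bash
# Illustrative lines, not real exporter output.
lines='node_load1 0.5
go_goroutines 8
nodejs_heap_bytes 1024'

# As a regex, "node_*" is "node" followed by zero or more underscores,
# so it also matches the "nodejs_..." line.
loose=$(printf '%s\n' "$lines" | grep -c 'node_*')

# The plain pattern "node_" requires the underscore.
plain=$(printf '%s\n' "$lines" | grep -c 'node_')

# Anchoring with ^ keeps only lines that start with the prefix.
anchored=$(printf '%s\n' "$lines" | grep -c '^node_')

echo "loose=$loose plain=$plain anchored=$anchored"
```

Note that the unanchored `node_` pattern would also match HELP and TYPE comment lines that mention a `node_` metric, which is what the guide wants here; `^node_` would return sample lines only.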
content/docs/guides/node-exporter.md
Outdated
|
|
||
| Click on the links below to see some example metrics: | ||
|
|
||
| * [`node_cpu_seconds_total{mode="system"}`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=node_cpu_seconds_total%7Bmode%3D%22system%22%7D&g0.tab=1) |
There was a problem hiding this comment.
Anything with a _total needs a rate(x[1m]) around it to be useful
A few words about what these metrics mean would be useful
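To illustrate why a raw counter is not very informative on its own: a `_total` value only ever grows, and the useful number is how fast it grows. The sketch below is a deliberately simplified per-second rate between two samples, using hypothetical values; it assumes no counter reset and does no extrapolation, so it only approximates what PromQL's `rate()` computes.

```bash
# Simplified per-second rate between two counter samples.
# Assumption: no counter reset occurred, and no extrapolation is done,
# so this only approximates what PromQL's rate() computes.
simple_rate() {
  # $1 = earlier value, $2 = later value, $3 = seconds between samples
  awk -v a="$1" -v b="$2" -v dt="$3" 'BEGIN { printf "%.2f\n", (b - a) / dt }'
}

# Hypothetical values of node_cpu_seconds_total{mode="system"} taken 60s apart.
r=$(simple_rate 1200 1203 60)
echo "$r"   # CPU seconds spent in system mode, per second
```

With these made-up samples the counter grew by 3 seconds over a 60-second window, so the result is 0.05 CPU seconds per second, which is a gauge-like value that can be graphed meaningfully.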
content/docs/guides/node-exporter.md
Outdated
|
|
||
| * [`node_cpu_seconds_total{mode="system"}`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=node_cpu_seconds_total%7Bmode%3D%22system%22%7D&g0.tab=1) | ||
| * [`node_filesystem_avail_bytes`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=node_filesystem_avail_bytes&g0.tab=1) | ||
| * [`node_memory_bytes_total`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=node_memory_bytes_total&g0.tab=1) |
There was a problem hiding this comment.
There is no metric by this name. Use something like node_memory_Cached_bytes
Signed-off-by: lucperkins <lucperkins@gmail.com>
Signed-off-by: lucperkins <lucperkins@gmail.com>
Signed-off-by: lucperkins <lucperkins@gmail.com>
content/docs/guides/node-exporter.md
Outdated
|
|
||
| Metric | Type | Meaning | ||
| :------|:-----|:------- | ||
| [`rate(node_cpu_seconds_total{mode="system"}[1m])`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=rate(node_cpu_seconds_total%7Bmode%3D%22system%22%7D%5B1m%5D)&g0.tab=1) | counter | The number of seconds CPUs have spent in `system` mode in the last minute |
There was a problem hiding this comment.
The first column is a PromQL expression, so this expression is a gauge. I'd remove the Type column; it'll confuse people.
This is the average number of CPU seconds spent in system mode per second, over the last minute.
content/docs/guides/node-exporter.md
Outdated
| :------|:-----|:------- | ||
| [`rate(node_cpu_seconds_total{mode="system"}[1m])`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=rate(node_cpu_seconds_total%7Bmode%3D%22system%22%7D%5B1m%5D)&g0.tab=1) | counter | The number of seconds CPUs have spent in `system` mode in the last minute | ||
| [`node_filesystem_avail_bytes`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=node_filesystem_avail_bytes&g0.tab=1) | gauge | The filesystem space available to non-root users (in bytes) | ||
| [`node_network_receive_bytes_total`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=node_network_receive_bytes_total&g0.tab=1) | counter | No newline at end of file |
There was a problem hiding this comment.
You need to take a rate() for a counter.
content/docs/guides/node-exporter.md
Outdated
|
|
||
| ## Installing and running the Node Exporter | ||
|
|
||
| The Prometheus Node Exporter is a single static binary that you can install [via tarball](#tarball-installation). You can [download](/downloads#node_exporter) page, extract it, and run it: |
There was a problem hiding this comment.
You can download page?
Partial sentence?
There was a problem hiding this comment.
Sorry, this doc is in the middle of a re-write, hence the odd loose ends. I'll get these straightened up.
content/docs/guides/node-exporter.md
Outdated
|
|
||
| ## Configuring your Prometheus instances | ||
|
|
||
| Your locally running Prometheus instance needs to be properly configured in order to access Node Exporter metrics. The following [`scrape_config`](../prometheus/latest/configuration/configuration/#<scrape_config>) block will tell Prometheus that scrape from the Node Exporter via `localhost:9100`: |
There was a problem hiding this comment.
will tell Prometheus that?
will tell Prometheus to?
content/docs/guides/node-exporter.md
Outdated
| - targets: ['localhost:9100'] | ||
| ``` | ||
|
|
||
| To install Prometheus, [download the latest release](/download) for your platform, |
There was a problem hiding this comment.
Should you have duplicate install instructions?
There was a problem hiding this comment.
Personally, I prefer no duplicate instructions and originally had it that way, but @brian-brazil disagrees.
There was a problem hiding this comment.
@brian-brazil I think a link to the first steps guide would be better here.
There was a problem hiding this comment.
I think that, as one of the first guides a user reads, it should be completely standalone. Someone just starting out with the Node Exporter likely does not yet have the experience to run Prometheus based on a second guide.
content/docs/guides/node-exporter.md
Outdated
| cd prometheus-*.* | ||
| ``` | ||
|
|
||
| Once Prometheus is installed you can start it up, using the `--config.file` flag to point to the Prometheus configuration that you created: |
There was a problem hiding this comment.
Which configuration? The block above? It's not clear that this should be in a file.
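One way to make the file explicit would be to show the configuration being written to disk before Prometheus is started. A sketch only: the file name `prometheus.yml`, the `node` job name, and the 15s scrape interval are assumptions for illustration; the `localhost:9100` target and the `--config.file` flag come from the guide text.

```bash
# Write a minimal Prometheus configuration to prometheus.yml in the
# current directory. File name, job name, and interval are assumptions.
cat > prometheus.yml <<'EOF'
global:
  scrape_interval: 15s

scrape_configs:
- job_name: node
  static_configs:
  - targets: ['localhost:9100']
EOF

# Then, from the extracted Prometheus directory:
#   ./prometheus --config.file=./prometheus.yml
```

Naming the file in the prose and in the start command removes the ambiguity about which configuration the flag points at.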
content/docs/guides/node-exporter.md
Outdated
|
|
||
| ## Exploring Node Exporter metrics through the Prometheus expression browser | ||
|
|
||
| Now that Prometheus is scraping metrics from a running Node Exporter instance, you can explore those metrics using the Prometheus UI (aka the [expression browser](/docs/visualization/expression-browser)). Navigate to `localhost:9090/graph` in your browser and use the main expression bar at the top of the page to enter expressions, which looks like this: |
There was a problem hiding this comment.
"at the top of the page to enter expressions, which looks like this:" reads oddly.
content/docs/guides/node-exporter.md
Outdated
| :------|:-----|:------- | ||
| [`rate(node_cpu_seconds_total{mode="system"}[1m])`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=rate(node_cpu_seconds_total%7Bmode%3D%22system%22%7D%5B1m%5D)&g0.tab=1) | counter | The number of seconds CPUs have spent in `system` mode in the last minute | ||
| [`node_filesystem_avail_bytes`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=node_filesystem_avail_bytes&g0.tab=1) | gauge | The filesystem space available to non-root users (in bytes) | ||
| [`node_network_receive_bytes_total`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=node_network_receive_bytes_total&g0.tab=1) | counter | No newline at end of file |
There was a problem hiding this comment.
For consistency should each guide have a Summary section like the first steps? (I wrote that so I say yes) but if not it should probably be pulled. :)
There was a problem hiding this comment.
I'm down with a Summary section. I'll add one here.
Signed-off-by: lucperkins <lucperkins@gmail.com>
Signed-off-by: lucperkins <lucperkins@gmail.com>
content/docs/guides/node-exporter.md
Outdated
| :------|:------- | ||
| [`rate(node_cpu_seconds_total{mode="system"}[1m])`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=rate(node_cpu_seconds_total%7Bmode%3D%22system%22%7D%5B1m%5D)&g0.tab=1) | The average number of CPU seconds spent in system per second over the last minute | ||
| [`node_filesystem_avail_bytes`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=node_filesystem_avail_bytes&g0.tab=1) | The filesystem space available to non-root users (in bytes) | ||
| [`rate(node_network_receive_bytes_total[30s])`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=node_network_receive_bytes_total&g0.tab=1) | No newline at end of file |
There was a problem hiding this comment.
This is the average network traffic received in bytes per second
The range on the rate here is inconsistent with the first example; use 1m here too.
Signed-off-by: lucperkins <lucperkins@gmail.com>
brian-brazil
left a comment
There was a problem hiding this comment.
Please don't merge changes until the code review process is complete and consensus has been achieved.
| :------|:------- | ||
| [`rate(node_cpu_seconds_total{mode="system"}[1m])`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=rate(node_cpu_seconds_total%7Bmode%3D%22system%22%7D%5B1m%5D)&g0.tab=1) | The average number of CPU seconds spent in system, per second, over the last minute | ||
| [`node_filesystem_avail_bytes`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=node_filesystem_avail_bytes&g0.tab=1) | The filesystem space available to non-root users (in bytes) | ||
| [`rate(node_network_receive_bytes_total[1m])`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=rate(node_network_receive_bytes_total%5B1m%5D)&g0.tab=1) | The average network traffic received, per second, over the last minute (in bytes) |
There was a problem hiding this comment.
This is inconsistent with cpu seconds, where the unit is not in brackets.
@brian-brazil I'll undo the merge in this case, but could we possibly expedite this process a bit in the future? 50+ comments on an introductory tutorial seems unnecessarily exacting. Subjecting simple material to such a grueling review process strikes me as counterproductive and highly likely to deter volunteers from making substantial contributions to the documentation.

After a quick glance the new version is better, so can we leave this merged and base future work on this?

Our users deserve high quality documentation, without errors, inconsistencies, or anything else that could confuse users and increase rather than decrease our support load. I appreciate you working on this, but we can't let a PR through just because more than a week has passed.

@brian-brazil Could you possibly propose a solution to the issue you point out above? Perhaps an alternative to the metric listed? I'm new to Prometheus and I'm not sure how to interpret your critique.

My point is consistency about formatting between the 1st entry and the other two. So one way to fix it would be: The average amount of CPU time spent in system mode, per second, over the last minute (in seconds)
* *: update the Note format * Update note format * Update the Note format
Guide to locally running a node exporter. Moves existing content from the First Steps doc and expands on it.