Node exporter guide #1079

Merged
lucperkins merged 7 commits into prometheus:master from lucperkins:lperkins/node-exporter-guide
Jul 11, 2018

@lucperkins commented Jun 22, 2018

Guide to locally running a node exporter. Moves existing content from the First Steps doc and expands on it.

@lucperkins changed the title from "[WIP] Node exporter guide" to "Node exporter guide" on Jul 3, 2018

@brian-brazil left a comment

@SuperQ FYI

title: Monitoring Linux or macOS host metrics using a node exporter
---

# Monitoring Linux or macOS host metrics using a node exporter
Contributor

It's safest to stick with just Linux

Node


# Monitoring Linux or macOS host metrics using a node exporter

A Prometheus [**node exporter**](https://github.com/prometheus/node_exporter) exposes a wide variety of hardware- and OS-related metrics.
Contributor

The


## Installing and running the node exporter

The Prometheus node exporter is a single static binary that you can install [via tarball](#tarball-installation) or using [`go get`](#go-installation). You can also install and run the node exporter as a [Docker image](#docker).
Contributor

Docker is not recommended, and should not be mentioned.

I'd just have the tarball, give the user one easy way to do things rather than potentially put them down the path of having a working Go environment.

There is an official [Docker](https://docker.com) image for the node exporter available via [Docker Hub](https://hub.docker.com/r/prom/node-exporter/) as `prom/node-exporter`. You can see a list of available tags [here](https://hub.docker.com/r/prom/node-exporter/tags/). To run the latest version of the image locally:

```bash
docker run \
Contributor

This is not the correct command, remove all of this

# etc.
```

Success! The node exporter is now exposing a wide variety of system metrics that Prometheus can scrape.
Contributor

The metric shown is a Go metric, not a system metric

Contributor Author

The intention, though, is to verify that metrics are being exposed by the NE. The output from the NE's /metrics endpoint indeed begins with go_gc... metrics rather than system metrics.

Contributor Author

Okay, I see what you mean now. I'll remove the "system" from the sentence.

To see all metrics available for the `node_exporter` job:

```
{job="node_exporter"}
Contributor

This is a potentially quite expensive query, so it shouldn't be mentioned. Look at /metrics if you want this.

This will likely bring up metrics for a variety of different `device`s and `mountpoint`s. Here's an example output:

```
node_filesystem_avail_bytes{device="/dev/sda1",fstype="ext4",instance="node_exporter:9100",job="node_exporter",mountpoint="/etc/hostname"} 15077224448
Contributor

This is output from running inside docker, and is not typical.

node_filesystem_avail_bytes{device="/dev/sda1",fstype="ext4",instance="node_exporter:9100",job="node_exporter",mountpoint="/etc/hostname"} 15077224448
node_filesystem_avail_bytes{device="/dev/sda1",fstype="ext4",instance="node_exporter:9100",job="node_exporter",mountpoint="/etc/hosts"} 15077224448
node_filesystem_avail_bytes{device="/dev/sda1",fstype="ext4",instance="node_exporter:9100",job="node_exporter",mountpoint="/etc/resolv.conf"} 15077224448
node_filesystem_avail_bytes{device="none",fstype="aufs",instance="node_exporter:9100",job="node_exporter",mountpoint="/"} 15077224448
Contributor

Your instance label does not match the configuration file


This is just one example, and there are many more node exporter metrics to explore.

## Enabling and disabling node exporter metrics
Contributor

I don't think this is necessary detail for a first guide, this is intermediate to advanced stuff.

What users will care about is the key cpu/ram/disk/disk io/network/memory metrics and how to use them.

Contributor Author

What, in your estimation, are some of the most important metrics? We might as well provide a "maybe check these out first" list.

Contributor

The important ones are all enabled by default already.

@@ -5,7 +5,7 @@ sort_rank: 3

# First steps with Prometheus
Contributor

Now that we've a second guide, this should move to be with it.

Contributor Author

I'm fine with that, but I'll save that for a future PR

title: Monitoring Linux host metrics using a node exporter
---

# Monitoring Linux host metrics using a node exporter
Contributor

with the Node


# Monitoring Linux host metrics using a node exporter

The Prometheus [**node exporter**](https://github.com/prometheus/node_exporter) exposes a wide variety of hardware- and OS-related metrics.
Contributor

they're more kernel than OS

- targets: ['localhost:9100']
```

Once Prometheus is [installed](../../introduction/first_steps) you can start it up, using the `--config.file` flag to point to the Prometheus configuration that you created:
Contributor

It'd be good to explain how to obtain Prometheus. We can skip it for most of the others, but the Node exporter is likely the first thing a user will use.

Now that Prometheus is scraping metrics from a running node exporter instance, we can explore those metrics using the Prometheus UI (aka the [expression browser](/docs/visualization/expression-browser)).
Navigate to `localhost:9090/graph` in your browser. Metrics specific to the node exporter are prefixed with `node_` and include metrics like `node_cpu_seconds_total` and `node_exporter_build_info`.

To see all metrics available for the `node` job:
Contributor

Using a browser would be easier:

This will likely bring up metrics for a variety of different `device`s and `mountpoint`s. Here's an example output:

```
node_filesystem_avail_bytes{device="/dev/sda1",fstype="ext4",instance="node:9100",job="node",mountpoint="/etc/hostname"} 15077224448
Contributor

This is still atypical output, as it is from inside docker.

The instance label also doesn't match the configuration

curl http://localhost:9100/metrics
```

The `node_filesystem_avail_bytes` metric, for example, informs you how much disk space is available to non-root users on each filesystem.
Contributor

You could include links directly to the expression browser with interesting graphs etc.

Signed-off-by: lucperkins <lucperkins@gmail.com>
@@ -0,0 +1,100 @@
---
title: Monitoring Linux host metrics with the node exporter
Contributor

It's the Node exporter with a capital N

Contributor Author

Why not Node Exporter?

Member

Yeah, I would also capitalize both words, not just one.

Contributor Author

Done

Success! The node exporter is now exposing metrics that Prometheus can scrape, including a wide variety of system metrics further down in the output (prefixed with `node_`). To view those metrics (along with help and type information):

```bash
curl http://localhost:9100/metrics | grep "node_*"
Contributor

I think you're mixing up globs and regexes. Just `node_` will do.
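
The glob-versus-regex distinction can be sketched in Python, whose `re` module reads `_*` the same way grep does here: zero or more underscores. The sample lines are invented for illustration (`example_nodes_ready` is a hypothetical metric name) and are not real Node Exporter output.

```python
import re

# Illustrative lines only; not actual /metrics output.
lines = [
    'node_cpu_seconds_total{cpu="0",mode="system"} 12.5',
    "example_nodes_ready 3",  # hypothetical metric that merely contains "node"
    "go_goroutines 8",
]


def matching(pattern, lines):
    """Return the lines in which the regex pattern matches anywhere."""
    return [line for line in lines if re.search(pattern, line)]


# As a regex, "node_*" means "node" followed by zero or more underscores,
# so it also matches the "node" inside "example_nodes_ready".
loose = matching(r"node_*", lines)

# Plain "node_" requires the underscore to be present.
strict = matching(r"node_", lines)
```

With these inputs, `loose` picks up the hypothetical `example_nodes_ready` line as well, while `strict` keeps only the line that actually contains `node_`.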


Click on the links below to see some example metrics:

* [`node_cpu_seconds_total{mode="system"}`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=node_cpu_seconds_total%7Bmode%3D%22system%22%7D&g0.tab=1)
Contributor

Anything with a _total needs a rate(x[1m]) around it to be useful

A few words about what these metrics mean would be useful
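
As a rough illustration of why a raw `_total` counter is hard to read and a rate is not, here is a simplified Python sketch of a per-second rate over a window. It is not how Prometheus implements `rate()`: it handles counter resets in a basic way but does no extrapolation to the window boundaries, and the scrape values are hypothetical.

```python
def simple_rate(samples):
    """Average per-second increase of a counter over the sampled window.

    samples: list of (timestamp_seconds, value) pairs, oldest first.
    A drop in value is treated as a counter reset, so the post-reset value
    counts as the increase. Unlike PromQL's rate(), this sketch does not
    extrapolate to the window boundaries.
    """
    if len(samples) < 2:
        return None
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    return increase / elapsed


# Hypothetical scrapes of node_cpu_seconds_total{mode="system"}, 15s apart.
scrapes = [(0, 100.0), (15, 100.6), (30, 101.2), (45, 101.8), (60, 102.4)]
cpu_rate = simple_rate(scrapes)
```

The raw counter values (100.0 up to 102.4) only ever grow, whereas the derived rate says how many CPU seconds were spent per second of wall-clock time in the window.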


* [`node_cpu_seconds_total{mode="system"}`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=node_cpu_seconds_total%7Bmode%3D%22system%22%7D&g0.tab=1)
* [`node_filesystem_avail_bytes`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=node_filesystem_avail_bytes&g0.tab=1)
* [`node_memory_bytes_total`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=node_memory_bytes_total&g0.tab=1)
Contributor

There is no metric by this name. Use something like node_memory_Cached_bytes
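
One way to check whether a metric name exists before documenting it is to scan the exporter's text output for sample names. The Python sketch below assumes the Prometheus text exposition format; the `sample` string is illustrative, not captured from a real host, and `metric_names` is a hypothetical helper.

```python
def metric_names(exposition_text):
    """Collect sample names from Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The name ends at the first "{" (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        names.add(name)
    return names


# Illustrative text only; real output would come from
# http://localhost:9100/metrics.
sample = """\
# HELP node_memory_Cached_bytes Memory information field Cached_bytes.
# TYPE node_memory_Cached_bytes gauge
node_memory_Cached_bytes 1.073741824e+09
node_filesystem_avail_bytes{device="/dev/sda1",fstype="ext4",mountpoint="/"} 15077224448
"""
names = metric_names(sample)
```

Checking `"node_memory_bytes_total" in names` against real output would have flagged the nonexistent metric before it reached the guide.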

Signed-off-by: lucperkins <lucperkins@gmail.com>
Signed-off-by: lucperkins <lucperkins@gmail.com>
Signed-off-by: lucperkins <lucperkins@gmail.com>

Metric | Type | Meaning
:------|:-----|:-------
[`rate(node_cpu_seconds_total{mode="system"}[1m])`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=rate(node_cpu_seconds_total%7Bmode%3D%22system%22%7D%5B1m%5D)&g0.tab=1) | counter | The number of seconds CPUs have spent in `system` mode in the last minute
Contributor

The first column is a promql expression, so this expression is a gauge. I'd remove the Type column, it'll confuse people.

This is the average number of CPU seconds spent in system mode per second, over the last minute.

:------|:-----|:-------
[`rate(node_cpu_seconds_total{mode="system"}[1m])`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=rate(node_cpu_seconds_total%7Bmode%3D%22system%22%7D%5B1m%5D)&g0.tab=1) | counter | The number of seconds CPUs have spent in `system` mode in the last minute
[`node_filesystem_avail_bytes`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=node_filesystem_avail_bytes&g0.tab=1) | gauge | The filesystem space available to non-root users (in bytes)
[`node_network_receive_bytes_total`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=node_network_receive_bytes_total&g0.tab=1) | counter |
Contributor

You need to take a rate() for a counter.


## Installing and running the Node Exporter

The Prometheus Node Exporter is a single static binary that you can install [via tarball](#tarball-installation). You can [download](/downloads#node_exporter) page, extract it, and run it:
Contributor

You can download page?

Partial sentence?

Contributor Author

Sorry, this doc is in the middle of a re-write, hence the odd loose ends. I'll get these straightened up.


## Configuring your Prometheus instances

Your locally running Prometheus instance needs to be properly configured in order to access Node Exporter metrics. The following [`scrape_config`](../prometheus/latest/configuration/configuration/#<scrape_config>) block will tell Prometheus that scrape from the Node Exporter via `localhost:9100`:
Contributor

will tell Prometheus that?

will tell Prometheus to?

- targets: ['localhost:9100']
```

To install Prometheus, [download the latest release](/download) for your platform,
Contributor

Should you have duplicate install instructions?

Contributor

Should , be :?

Contributor Author

Personally, I prefer no duplicate instructions and originally had it that way, but @brian-brazil disagrees.

Contributor

@brian-brazil I think a link to the first steps guide would be better here.

Contributor

I think that, as one of the first guides a user reads, it should be completely standalone. Someone just starting out with the node exporter likely does not yet have the experience to run Prometheus based on a second guide.

cd prometheus-*.*
```

Once Prometheus is installed you can start it up, using the `--config.file` flag to point to the Prometheus configuration that you created:
Contributor

Which configuration? The block above? It's not clear that this should be in a file.
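
To make the "this belongs in a file" step explicit, a small Python sketch could write the scrape configuration to disk so that `--config.file` has something to point at. The file name `prometheus.yml`, the `write_config` helper, and the exact layout of the block are assumptions here; only the `localhost:9100` target and the `node` job name come from this thread.

```python
from pathlib import Path

# Assumed shape of the configuration; the thread only shows the targets line.
CONFIG = """\
scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['localhost:9100']
"""


def write_config(path="prometheus.yml"):
    """Write the scrape configuration to a file and return its path."""
    target = Path(path)
    target.write_text(CONFIG)
    return target


if __name__ == "__main__":
    config_path = write_config()
    # Prometheus would then be started with the flag named in the guide:
    print(f"./prometheus --config.file=./{config_path}")
```

A guide that names the file and then passes the same name to `--config.file` leaves no doubt about which configuration is meant.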


## Exploring Node Exporter metrics through the Prometheus expression browser

Now that Prometheus is scraping metrics from a running Node Exporter instance, you can explore those metrics using the Prometheus UI (aka the [expression browser](/docs/visualization/expression-browser)). Navigate to `localhost:9090/graph` in your browser and use the main expression bar at the top of the page to enter expressions, which looks like this:
Contributor

"at the top of the page to enter expressions, which looks like this:" reads oddly.

:------|:-----|:-------
[`rate(node_cpu_seconds_total{mode="system"}[1m])`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=rate(node_cpu_seconds_total%7Bmode%3D%22system%22%7D%5B1m%5D)&g0.tab=1) | counter | The number of seconds CPUs have spent in `system` mode in the last minute
[`node_filesystem_avail_bytes`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=node_filesystem_avail_bytes&g0.tab=1) | gauge | The filesystem space available to non-root users (in bytes)
[`node_network_receive_bytes_total`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=node_network_receive_bytes_total&g0.tab=1) | counter |
Contributor

For consistency should each guide have a Summary section like the first steps? (I wrote that so I say yes) but if not it should probably be pulled. :)

Contributor Author

I'm down with a Summary section. I'll add one here.

Signed-off-by: lucperkins <lucperkins@gmail.com>
Signed-off-by: lucperkins <lucperkins@gmail.com>
:------|:-------
[`rate(node_cpu_seconds_total{mode="system"}[1m])`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=rate(node_cpu_seconds_total%7Bmode%3D%22system%22%7D%5B1m%5D)&g0.tab=1) | The average number of CPU seconds spent in system per second over the last minute
[`node_filesystem_avail_bytes`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=node_filesystem_avail_bytes&g0.tab=1) | The filesystem space available to non-root users (in bytes)
[`rate(node_network_receive_bytes_total[30s])`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=node_network_receive_bytes_total&g0.tab=1) |
Contributor

This is the average network traffic received in bytes per second

The range on the rate here is inconsistent with the 1st example, use 1m here too.
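
Mismatches between a table's link text and the expression encoded in its URL are easier to avoid if both are generated from one string. The Python sketch below assumes the `g0.*` query parameters seen in the links in this thread; `graph_url` and `expr_from_url` are hypothetical helpers, not part of Prometheus.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def graph_url(expr, base="http://localhost:9090/graph", range_input="1h"):
    """Build an expression-browser link with the PromQL expression encoded."""
    params = [
        ("g0.range_input", range_input),
        ("g0.expr", expr),
        ("g0.tab", "1"),
    ]
    return f"{base}?{urlencode(params)}"


def expr_from_url(url):
    """Recover the PromQL expression from a link built by graph_url."""
    return parse_qs(urlsplit(url).query)["g0.expr"][0]


expr = "rate(node_network_receive_bytes_total[1m])"
link = graph_url(expr)
markdown_cell = f"[`{expr}`]({link})"
```

Because the cell text and the URL come from the same `expr`, changing the range from `30s` to `1m` updates both at once.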

Signed-off-by: lucperkins <lucperkins@gmail.com>
@lucperkins merged commit 49abf62 into prometheus:master on Jul 11, 2018
@brian-brazil left a comment

Please don't merge changes until the code review process is complete and consensus has been achieved.

:------|:-------
[`rate(node_cpu_seconds_total{mode="system"}[1m])`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=rate(node_cpu_seconds_total%7Bmode%3D%22system%22%7D%5B1m%5D)&g0.tab=1) | The average number of CPU seconds spent in system, per second, over the last minute
[`node_filesystem_avail_bytes`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=node_filesystem_avail_bytes&g0.tab=1) | The filesystem space available to non-root users (in bytes)
[`rate(node_network_receive_bytes_total[1m])`](http://localhost:9090/graph?g0.range_input=1h&g0.expr=rate(node_network_receive_bytes_total%5B1m%5D)&g0.tab=1) | The average network traffic received, per second, over the last minute (in bytes)
Contributor

This is inconsistent with cpu seconds, where the unit is not in brackets.

@lucperkins
Contributor Author

@brian-brazil I'll undo the merge in this case, but could we possibly expedite this process a bit in the future? 50+ comments on an introductory tutorial seems unnecessarily exacting. Subjecting simple material to such a grueling review process strikes me as counterproductive and highly likely to deter volunteers from making substantial contributions to the documentation.

@RichiH
Member

RichiH commented Jul 11, 2018

After a quick glance the new version is better, so can we leave this merged and base future work on this?

@brian-brazil
Contributor

Our users deserve high quality documentation, without errors, inconsistencies, or anything else that could confuse users and increase rather than decrease our support load. I appreciate you working on this, but we can't let a PR through just because more than a week has passed.

@lucperkins
Contributor Author

@brian-brazil Could you possibly propose a solution to the issue you point out above? Perhaps an alternative to the metric listed? I'm new to Prometheus and I'm not sure how to interpret your critique.

@brian-brazil
Contributor

My point is consistency about formatting between the 1st entry and the other two. So one way to fix it would be:

The average amount of CPU time spent in system mode, per second, over the last minute (in seconds)

aylei pushed a commit to aylei/docs that referenced this pull request Oct 28, 2019
* *: update the Note format

* Update note format

* Update the Note format

5 participants