
Error while sending encoded metrics: write tcp ... write: broken pipe #1066

Closed
Atomicbeast101 opened this issue Sep 11, 2018 · 14 comments

@Atomicbeast101
Host operating system: output of uname -a

Linux RPT-PROD-SMB-1 3.10.0-862.2.3.el7.x86_64 #1 SMP Wed May 9 18:05:47 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version 0.16.0 (branch: HEAD, revision: d42bd70f4363dced6b77d8fc311ea57b63387e4f)
  build user:       root@a67a9bc13a69
  build date:       20180515-15:52:42
  go version:       go1.9.6

node_exporter command line flags

./node_exporter --web.listen-address=0.0.0.0:9100 --collector.systemd

Are you running node_exporter in Docker?

No.

What did you do that produced an error?

The error occurs every time the metrics are viewed in a browser (Firefox) or Prometheus tries to scrape them.

What did you expect to see?

The metrics returned in a timely manner, without broken-pipe errors.

What did you see instead?

Every time the metrics are viewed, this error is produced:

Sep 11 14:42:09 localhost node_exporter[3030]: time="2018-09-11T14:42:09-04:00" level=error msg="error while sending encoded metrics: write tcp 127.0.0.1:9100->127.0.0.1:58728: write: broken pipe\n" source="<autogenerated>:1"

I noticed that viewing the metrics takes longer than before (it worked fine before Sept. 6th, and nothing was changed during that time).

For anyone curious, Prometheus reports this error: context deadline exceeded

I tried testing it with curl, and it does return the output, but it takes longer than usual (at least 10 seconds instead of 1-3 seconds). The error is still reported by the node_exporter service.
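
For reference, the check was roughly of this form (the exact invocation may have differed):

curl http://localhost:9100/metrics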

@SuperQ
Member

SuperQ commented Sep 11, 2018

Try with --log.level=debug; this should give you more information about which collector is slow.
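
For example, combined with the flags already in use above:

./node_exporter --web.listen-address=0.0.0.0:9100 --collector.systemd --log.level=debug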

@Atomicbeast101
Author

My apologies for not mentioning this earlier: --log.level=debug does not report anything before or after the error message.

@SuperQ
Member

SuperQ commented Sep 11, 2018

That's fine, but the context around the error is important for understanding the timing. The full log of a failed request is needed to determine the fault.

@Atomicbeast101
Author

Understood. Here's the full log with the error message included:

Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="collect query: []" source="node_exporter.go:36"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="OK: filefd collector succeeded after 0.000086s." source="collector.go:135"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="OK: nfsd collector succeeded after 0.000271s." source="collector.go:135"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="OK: sockstat collector succeeded after 0.000131s." source="collector.go:135"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Not collecting NFS metrics: open /proc/net/rpc/nfs: no such file or directory" source="nfs_linux.go:97"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="OK: nfs collector succeeded after 0.000067s." source="collector.go:135"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="OK: netdev collector succeeded after 0.000588s." source="collector.go:135"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Return time: 1536697688.759848" source="time.go:47"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="OK: time collector succeeded after 0.000052s." source="collector.go:135"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Cannot open \"/proc/spl/kstat/zfs/abdstats\" for reading" source="zfs_linux.go:48"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="ZFS / ZFS statistics are not available" source="zfs.go:66"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Cannot open \"/proc/spl/kstat/zfs/arcstats\" for reading" source="zfs_linux.go:48"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="ZFS / ZFS statistics are not available" source="zfs.go:66"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Cannot open \"/proc/spl/kstat/zfs/dnodestats\" for reading" source="zfs_linux.go:48"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="ZFS / ZFS statistics are not available" source="zfs.go:66"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Cannot open \"/proc/spl/kstat/zfs/fm\" for reading" source="zfs_linux.go:48"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="ZFS / ZFS statistics are not available" source="zfs.go:66"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Cannot open \"/proc/spl/kstat/zfs/vdev_mirror_stats\" for reading" source="zfs_linux.go:48"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="ZFS / ZFS statistics are not available" source="zfs.go:66"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Cannot open \"/proc/spl/kstat/zfs/xuio_stats\" for reading" source="zfs_linux.go:48"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="ZFS / ZFS statistics are not available" source="zfs.go:66"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Cannot open \"/proc/spl/kstat/zfs/zil\" for reading" source="zfs_linux.go:48"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="ZFS / ZFS statistics are not available" source="zfs.go:66"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Cannot open \"/proc/spl/kstat/zfs/dbuf_stats\" for reading" source="zfs_linux.go:48"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="ZFS / ZFS statistics are not available" source="zfs.go:66"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Cannot open \"/proc/spl/kstat/zfs/dmu_tx\" for reading" source="zfs_linux.go:48"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="ZFS / ZFS statistics are not available" source="zfs.go:66"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Cannot open \"/proc/spl/kstat/zfs/vdev_cache_stats\" for reading" source="zfs_linux.go:48"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="ZFS / ZFS statistics are not available" source="zfs.go:66"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Cannot open \"/proc/spl/kstat/zfs/zfetchstats\" for reading" source="zfs_linux.go:48"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="ZFS / ZFS statistics are not available" source="zfs.go:66"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="OK: zfs collector succeeded after 0.000538s." source="collector.go:135"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="OK: bcache collector succeeded after 0.000024s." source="collector.go:135"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="ipvs collector metrics are not available for this system" source="ipvs_linux.go:113"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="OK: ipvs collector succeeded after 0.000047s." source="collector.go:135"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Unable to detect InfiniBand devices" source="infiniband_linux.go:110"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="OK: infiniband collector succeeded after 0.000056s." source="collector.go:135"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="OK: uname collector succeeded after 0.000015s." source="collector.go:135"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Set node_mem: map[string]float64{\"Inactive_bytes\":2.55762432e+09, \"Mlocked_bytes\":0, \"Writeback_bytes\":0, \"AnonPages_bytes\"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="OK: meminfo collector succeeded after 0.001175s." source="collector.go:135"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="OK: stat collector succeeded after 0.000278s." source="collector.go:135"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="return load 0: 0.410000" source="loadavg.go:51"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="return load 1: 0.410000" source="loadavg.go:51"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="return load 2: 0.410000" source="loadavg.go:51"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="OK: loadavg collector succeeded after 0.000133s." source="collector.go:135"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="OK: entropy collector succeeded after 0.000063s." source="collector.go:135"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="OK: netstat collector succeeded after 0.001428s." source="collector.go:135"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="OK: hwmon collector succeeded after 0.001529s." source="collector.go:135"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="OK: arp collector succeeded after 0.000083s." source="collector.go:135"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Not collecting bonding, file does not exist: /sys/class/net" source="bonding_linux.go:60"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="OK: bonding collector succeeded after 0.000048s." source="collector.go:135"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="OK: vmstat collector succeeded after 0.000287s." source="collector.go:135"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring mount point: /sys" source="filesystem_linux.go:42"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring mount point: /proc" source="filesystem_linux.go:42"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring mount point: /dev" source="filesystem_linux.go:42"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring mount point: /sys/kernel/security" source="filesystem_linux.go:42"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring mount point: /dev/shm" source="filesystem_linux.go:42"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring mount point: /dev/pts" source="filesystem_linux.go:42"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring mount point: /sys/fs/cgroup" source="filesystem_linux.go:42"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring mount point: /sys/fs/cgroup/systemd" source="filesystem_linux.go:42"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring mount point: /sys/fs/pstore" source="filesystem_linux.go:42"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring mount point: /sys/fs/cgroup/devices" source="filesystem_linux.go:42"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring mount point: /sys/fs/cgroup/perf_event" source="filesystem_linux.go:42"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring mount point: /sys/fs/cgroup/pids" source="filesystem_linux.go:42"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring mount point: /sys/fs/cgroup/net_cls,net_prio" source="filesystem_linux.go:42"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring mount point: /sys/fs/cgroup/hugetlb" source="filesystem_linux.go:42"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring mount point: /sys/fs/cgroup/cpuset" source="filesystem_linux.go:42"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring mount point: /sys/fs/cgroup/memory" source="filesystem_linux.go:42"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring mount point: /sys/fs/cgroup/freezer" source="filesystem_linux.go:42"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring mount point: /sys/fs/cgroup/cpu,cpuacct" source="filesystem_linux.go:42"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring mount point: /sys/fs/cgroup/blkio" source="filesystem_linux.go:42"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring mount point: /sys/kernel/config" source="filesystem_linux.go:42"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring mount point: /proc/sys/fs/binfmt_misc" source="filesystem_linux.go:42"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring mount point: /sys/kernel/debug" source="filesystem_linux.go:42"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring mount point: /dev/hugepages" source="filesystem_linux.go:42"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring mount point: /dev/mqueue" source="filesystem_linux.go:42"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring mount point: /proc/fs/nfsd" source="filesystem_linux.go:42"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring fs type: rpc_pipefs" source="filesystem_linux.go:46"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring mount point: /proc/sys/fs/binfmt_misc" source="filesystem_linux.go:42"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring mount point: /sys/fs/fuse/connections" source="filesystem_linux.go:42"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="OK: filesystem collector succeeded after 10.004162s." source="collector.go:135"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="OK: edac collector succeeded after 0.000310s." source="collector.go:135"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="OK: timex collector succeeded after 0.000015s." source="collector.go:135"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring device: sdc1" source="diskstats_linux.go:178"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring device: sda1" source="diskstats_linux.go:178"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring device: sdb1" source="diskstats_linux.go:178"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring device: sdd1" source="diskstats_linux.go:178"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring device: sda2" source="diskstats_linux.go:178"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring device: sda3" source="diskstats_linux.go:178"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="OK: diskstats collector succeeded after 0.000454s." source="collector.go:135"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="OK: textfile collector succeeded after 0.000015s." source="collector.go:135"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="OK: xfs collector succeeded after 0.000698s." source="collector.go:135"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="OK: conntrack collector succeeded after 0.000070s." source="collector.go:135"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="collecting metrics for device md127" source="mdadm_linux.go:270"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="OK: mdadm collector succeeded after 0.000118s." source="collector.go:135"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="OK: wifi collector succeeded after 0.008060s." source="collector.go:135"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="OK: cpu collector succeeded after 0.007229s." source="collector.go:135"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring unit: session-4.scope" source="systemd_linux.go:211"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring unit: session-c1.scope" source="systemd_linux.go:211"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring unit: session-617.scope" source="systemd_linux.go:211"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring unit: session-104009.scope" source="systemd_linux.go:211"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring unit: session-c2.scope" source="systemd_linux.go:211"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring unit: session-104010.scope" source="systemd_linux.go:211"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring unit: session-193144.scope" source="systemd_linux.go:211"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring unit: session-104008.scope" source="systemd_linux.go:211"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring unit: session-104004.scope" source="systemd_linux.go:211"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="Ignoring unit: session-104006.scope" source="systemd_linux.go:211"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=error msg="error while sending encoded metrics: write tcp 127.0.0.1:9100->127.0.0.1:41892: write: broken pipe\n" source="<autogenerated>:1"
Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="OK: systemd collector succeeded after 0.054420s." source="collector.go:135"

@Atomicbeast101
Author

Atomicbeast101 commented Sep 14, 2018

I noticed this issue seems to happen only on CentOS 7; I have the same node_exporter version running on Ubuntu and it works fine. Could it be kernel-related?

EDIT: For anyone wondering what the kernels are for CentOS and Ubuntu, here they are:
Ubuntu - 4.15.13
CentOS - 3.10.0

@SuperQ
Member

SuperQ commented Sep 15, 2018

It looks like the filesystem collector is having problems on this machine. It's taking 10 seconds to collect data, which is the default timeout.

Sep 11 16:28:08 RPT-PROD-SMB-1 node_exporter[5206]: time="2018-09-11T16:28:08-04:00" level=debug msg="OK: filesystem collector succeeded after 10.004162s." source="collector.go:135"

Are there network filesystems mounted on this machine? Or anything strange going on with the filesystems?

@Atomicbeast101
Author

@SuperQ Disabling the filesystem collector seems to have fixed the issue. There is a network filesystem mounted on the CentOS machine, and removing it also resolves the problem.

It would be nice if the error message could point out which collector is taking the longest. That would avoid having to capture the debug log and scan through it to find the slow collector. Thanks for the help!

@SuperQ
Member

SuperQ commented Sep 15, 2018

Ignoring them is not the default because they are technically valid filesystems, but I would recommend excluding the network filesystem types with --collector.filesystem.ignored-fs-types=.... This lets you keep the usual local filesystem metrics without disabling the collector entirely.
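
For example, if the network mounts are NFS or CIFS, something along these lines should work (the exact pattern depends on which filesystem types you want to drop):

./node_exporter --web.listen-address=0.0.0.0:9100 --collector.systemd --collector.filesystem.ignored-fs-types="^(nfs|nfs4|cifs)$"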

I'm not sure it's easily possible to capture what caused the scrape failure, since the request is canceled. It could be a combination of collectors rather than a single obvious one, as in this case.

@discordianfish
Member

@SuperQ any idea why it timed out yet logged success?

@SuperQ
Member

SuperQ commented Sep 28, 2018

No, I don't have any idea; I haven't really looked at the code closely.

@discordianfish
Member

@SuperQ Looking at the code, it seems mountTimeout is set to 30 seconds. If that timed out, it should also log something, so I'm not sure what is going on here.
Which timeout do you think was reached?

@SuperQ
Member

SuperQ commented Oct 4, 2018

The timeout hit was the Prometheus scrape timeout of 10 seconds. It closed the connection to the node_exporter, hence the original error.
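
For reference, that 10 second limit is Prometheus's per-job scrape_timeout (10s by default); the job name and target below are purely illustrative:

scrape_configs:
  - job_name: node                        # illustrative job name
    scrape_timeout: 10s                   # the default deadline that was hit here
    static_configs:
      - targets: ['RPT-PROD-SMB-1:9100']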

The Prometheus client library has methods to pass through the server's timeout value. We could pass this to collectors.

But I don't think that would be useful for fixing this problem: the filesystem read is deadlocked on IO anyway, so the timeout would still result in an error.

@discordianfish
Member

Ah, got it: so the client closed the connection while the exporter was writing to it in the filesystem collector. The log is a bit confusing; maybe we could include a stack trace.

Anyway, the actual problem here, IMO, is that it logged success. But I think that is due to the client_golang design, where there is no way to detect an error when sending something to the collect channel.
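
For context, here is a minimal sketch of the client_golang Collector interface being referred to; the type and metric name are illustrative, not node_exporter's actual collectors:

package main

import "github.com/prometheus/client_golang/prometheus"

// exampleCollector is illustrative only; it is not one of node_exporter's collectors.
type exampleCollector struct {
	desc *prometheus.Desc
}

func newExampleCollector() *exampleCollector {
	return &exampleCollector{
		desc: prometheus.NewDesc("example_metric", "Illustrative metric.", nil, nil),
	}
}

func (c *exampleCollector) Describe(ch chan<- *prometheus.Desc) {
	ch <- c.desc
}

func (c *exampleCollector) Collect(ch chan<- prometheus.Metric) {
	// A plain channel send has no return value, so the collector cannot tell
	// whether the metric ever reached the HTTP response or the client had
	// already hung up; "collector succeeded" only means Collect returned.
	ch <- prometheus.MustNewConstMetric(c.desc, prometheus.GaugeValue, 1)
}

func main() {
	prometheus.MustRegister(newExampleCollector())
}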

@discordianfish
Member

I think there isn't much more we can do here, so closing.
