Description
Host operating system: output of uname -a
Linux 3.10.0-1160.6.1.el7.x86_64 #1 SMP Tue Nov 17 13:59:11 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
CentOS 7
node_exporter version: output of node_exporter --version
node_exporter --version
node_exporter, version 1.0.1 (branch: HEAD, revision: 3715be6)
build user: root@1f76dbbcfa55
build date: 20200616-12:44:12
go version: go1.14.4
node_exporter command line flags
node_exporter --collector.processes --collector.qdisc --collector.systemd
Are you running node_exporter in Docker?
No
What did you do that produced an error?
Simply ran node_exporter for an extended period of time
What did you expect to see?
No error logs
What did you see instead?
node_exporter[15746]: level=error ts=2020-11-25T13:12:02.291Z caller=collector.go:161 msg="collector failed" name=processes duration_seconds=0.027611184 err="unable to retrieve number of allocated threads: "read /proc/2054/stat: no such process""
Analysis
This is very closely related to #1043: that change fixed the case of processes disappearing between listing the /proc directory and reading the actual process stats. A second race remains: if the process exits between opening the /proc/<process id>/stat file and actually reading it, the read fails with a different error code (ESRCH, "no such process"). Below is a small code snippet that reproduces that race condition.
The recommended fix is to modify getAllocatedThreads() in collector/processes_linux.go so that, after stat, err := pid.Stat(), it skips the process with continue when the error meets this condition: strings.Contains(err.Error(), syscall.ESRCH.Error()).
package main

import (
	"fmt"
	"io"
	"io/ioutil"
	"log"
	"os"
	"os/exec"
	"strconv"
	"strings"
	"syscall"
)

func main() {
	const maxBufferSize = 1024 * 512

	// Start a short-lived child process whose stat file we will race against.
	fmt.Println("Starting process sleep")
	cmd := exec.Command("sleep", "1")
	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}

	// Open the stat file while the process is still alive.
	procPath := "/proc/" + strconv.Itoa(cmd.Process.Pid) + "/stat"
	fmt.Printf("Read stat for %s\n", procPath)
	f, err := os.Open(procPath)
	if err != nil {
		log.Fatal(err)
	}
	// Defer the close only after confirming that the open succeeded.
	defer f.Close()

	// Wait for the process to exit before reading from the open file.
	if err := cmd.Wait(); err != nil {
		log.Fatal(err)
	}
	fmt.Println("Sleep process exited, reading opened stat file")

	reader := io.LimitReader(f, maxBufferSize)
	_, err = ioutil.ReadAll(reader)
	if err != nil {
		if strings.Contains(err.Error(), syscall.ESRCH.Error()) {
			fmt.Println("Got error no such process:", err)
		} else {
			fmt.Println("Read stat failed:", err)
		}
	} else {
		fmt.Println("No error reading stat")
	}
}