Replace cgo dependency with getconf invocation #2

grobie · 2015-02-11T21:53:04Z

This change removes the cgo dependency from prometheus/procfs, which is
used in prometheus/client_golang. As this is the only cgo dependency in
all prometheus components** (prometheus server, *_exporter, etc.),
removing this dependency has been considered more important than
preventing an external system call during program initialization.

In the rare case that either the getconf call or the value parsing
fails, the library falls back to the de-facto standard CLK_TCK value of
100. For the even rarer case that a different value is used on a system,
a warning is printed to stderr.

@juliusv @discordianfish

** I'll replace the proc/stat parser in node_exporter with a call to this library.
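
For illustration, the fallback behaviour described above could be sketched roughly like this. It is a minimal sketch, not the code from the diff: the names `parseClkTck`, `clkTckOrDefault` and `defaultClkTck` are made up for this example, and the warning text is shortened.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
)

// defaultClkTck is the de-facto standard value mentioned in the description.
const defaultClkTck = 100

// parseClkTck (hypothetical helper) turns getconf output such as "100\n"
// into a positive tick rate.
func parseClkTck(output []byte) (float64, error) {
	v, err := strconv.ParseInt(strings.TrimSpace(string(output)), 10, 64)
	if err != nil {
		return 0, err
	}
	if v <= 0 {
		return 0, fmt.Errorf("invalid CLK_TCK value %d", v)
	}
	return float64(v), nil
}

// clkTckOrDefault (hypothetical helper) runs getconf and falls back to the
// default, with a warning on stderr, if the call or the parsing fails.
func clkTckOrDefault() float64 {
	output, err := exec.Command("getconf", "CLK_TCK").Output()
	if err == nil {
		var v float64
		if v, err = parseClkTck(output); err == nil {
			return v
		}
	}
	fmt.Fprintf(os.Stderr, "[warning] could not read CLK_TCK, falling back to default value %d: %s\n", defaultClkTck, err)
	return defaultClkTck
}

func main() {
	fmt.Println(clkTckOrDefault())
}
```

Keeping the parsing separate from the `exec.Command` call makes the fallback path testable without shelling out.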

grobie · 2015-02-11T21:55:01Z

proc_stat.go

-func ticks() float64 {
-	return float64(C.sysconf(C._SC_CLK_TCK)) // most likely 100
+func clkTckWarning(err error) {
+	f := "[warning] github.com/prometheus/procfs could not read CLK_TCK, falling back to default value %d: %s\n"


The line hasn't been wrapped after 80 characters on purpose to make it easier to grep for the error message.

juliusv · 2015-02-11T22:13:09Z

A bit nervous about someone missing the stderr output if they use only glog files (or similar) for their logs, but I guess it's better than panicking, and should only happen in the super-rare case where getconf is not available AND 100 is not the actual HZ value.

👍 unless someone has a better idea :)

juliusv · 2015-02-11T22:23:37Z

One additional concern: shelling out when loading this library prevents any application using the Go Prometheus client library from running with security frameworks/settings which prevent forking. Those users would either have to relax their security (increasing risk of being exploited by someone forking a shell through a vulnerability) or manually change the library code.

Then again, I think this would mainly cause the fork to fail in a way that is returned as a Go error, in which case the default will simply be used (like when using rlimits to prevent forking). But it's possible that some frameworks will hard-kill the process if it tries to make any syscalls it isn't allowed to make.

If that ever does become an issue, we could still make this all configurable via a flag or environment variable (using flags in libs doesn't seem to be a no-no, given that glog does it).
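
A rough sketch of what such an environment variable override might look like, assuming a made-up variable name `PROCFS_CLK_TCK` and a made-up helper `clkTckFromEnv` (neither exists in the library):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// clkTckEnv is a hypothetical variable name chosen for this sketch.
const clkTckEnv = "PROCFS_CLK_TCK"

// clkTckFromEnv (hypothetical helper) returns the value of the override
// variable if it is set to a positive integer, and fallback otherwise.
// The lookup function is passed in so the logic can be exercised without
// touching the real environment.
func clkTckFromEnv(lookup func(string) (string, bool), fallback float64) float64 {
	raw, ok := lookup(clkTckEnv)
	if !ok {
		return fallback
	}
	v, err := strconv.ParseInt(raw, 10, 64)
	if err != nil || v <= 0 {
		return fallback
	}
	return float64(v)
}

func main() {
	// With the variable unset this prints the fallback of 100.
	fmt.Println(clkTckFromEnv(os.LookupEnv, 100))
}
```

With an override like this in place, the library would not need to fork at all for users who set the variable.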

brian-brazil · 2015-02-12T09:46:04Z

proc_stat.go

+var clkTck = 100.
+
+func init() {
+	output, err := exec.Command("getconf", "CLK_TCK").Output()


There are security implications here: how do we know that the 'getconf' we're calling with all of our privileges is what we think it is?

This will also cause any Go code that merely links to client_golang to do a fork+exec at startup, which may not be appropriate in all scenarios (think thousands of very short-lived tasks talking to the pushgateway) and will lead to questions from anyone looking at an strace.

We may be better off hardcoding. Where have we seen that the value isn't 100?

Looking at the glibc and linux source, it's 100 - except on ia64 and alpha. ia64 is configurable at kernel compile time (CONFIG_HZ) and alpha is 1024.

I guess the nastiness is: looking at the Linux source at one point in time is not enough. We have to think this through in these dimensions:

how this changed over time
in different OSes (Linux, Mac, *BSD, ...)
on different architectures

Meh.

But I see you're already on it, which is great :)

I agree that it makes me feel queasy to call an external binary at startup for every application that ever includes this library, so if there's a better way, that'd be much appreciated.

discordianfish · 2015-02-12T12:28:03Z

So I did some research and first of all, the situation is super confusing.

I finally found out how the kernel communicates the rate to the userland: http://article.gmane.org/gmane.linux.kernel/38936 (assuming this didn't change again).

ps also isn't using sysconf at all. The code comments there make it sound like a compile-time macro, which would mean that if we build our prometheus services on a system with a different clock rate than the systems the services will run on, our calculation breaks: https://github.com/mmalecki/procps/blob/master/proc/sysinfo.c#L93

But I just compiled a little C program and verified that it in fact calls a function at runtime.

What's contributing to the confusion is that there are multiple CLK_TCK timers; some seem deprecated, others not so much.

Anyways, it seems that on kernels >= 2.4.0, the right way to get the clock rate is by looking at the ELF note section, whether via the sysconf stuff or directly. Using sysconf seems to be more compatible since it's a POSIX function and the ELF stuff is Linux-specific, so I would propose:

with CGO=1, we use sysconf as we did before
with CGO=0 on linux we use debug/elf(?) to read the clock from prometheus' own ELF
with CGO=0 on other systems, we just use a hardcoded value of 100 and print a warning
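
To make the Linux-without-cgo case more concrete, here is a rough sketch under one assumption: that the kernel-provided value referred to above is the `AT_CLKTCK` entry (key 17) of the ELF auxiliary vector, which Linux exposes at `/proc/self/auxv`. Reading that file is swapped in here in place of the `debug/elf` idea from the proposal; the helper name `findAuxv` is made up, and the parser assumes a little-endian machine.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
	"os"
)

// atClkTck is the AT_CLKTCK key from <elf.h>.
const atClkTck = 17

// findAuxv (hypothetical helper) scans auxv-formatted data, which is a
// sequence of (key, value) pairs of native words terminated by an AT_NULL
// (0) key. This sketch assumes little-endian byte order.
func findAuxv(data []byte, wordSize int, key uint64) (uint64, bool) {
	read := func(b []byte) uint64 {
		if wordSize == 4 {
			return uint64(binary.LittleEndian.Uint32(b))
		}
		return binary.LittleEndian.Uint64(b)
	}
	for off := 0; off+2*wordSize <= len(data); off += 2 * wordSize {
		k := read(data[off:])
		if k == 0 { // AT_NULL marks the end of the vector
			break
		}
		if k == key {
			return read(data[off+wordSize:]), true
		}
	}
	return 0, false
}

func main() {
	data, err := os.ReadFile("/proc/self/auxv")
	if err != nil {
		fmt.Println("auxv not readable, would fall back to 100:", err)
		return
	}
	// Assumes the native word size equals the size of Go's uint.
	if v, ok := findAuxv(data, bits.UintSize/8, atClkTck); ok {
		fmt.Println("CLK_TCK from auxv:", v)
	} else {
		fmt.Println("AT_CLKTCK not found, would fall back to 100")
	}
}
```

On systems without `/proc/self/auxv` the read simply fails, which maps onto the hardcoded-100-plus-warning case in the list above.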

discordianfish · 2015-02-12T12:38:12Z

Okay another idea to throw in here:
https://github.com/davecheney/junk/blob/master/clock/examples/processtime/main.go

That uses the clock_gettime syscall. But that again would only work on Linux.

grobie · 2015-02-12T20:05:54Z

We can probably just ignore non-linux systems for procfs for now. AFAIK the BSDs have all removed it by now and osx never had it. And who uses system9 actually? Solaris users might come though.

grobie · 2015-03-31T16:37:10Z

Superseded by #4.

grobie force-pushed the remove-cgo-dependency branch from 25963fc to 3969653 Compare February 11, 2015 21:54

grobie reviewed Feb 11, 2015
grobie force-pushed the remove-cgo-dependency branch from 3969653 to bb8cfb8 Compare February 11, 2015 21:59

brian-brazil reviewed Feb 12, 2015
grobie closed this Mar 31, 2015

grobie deleted the remove-cgo-dependency branch March 31, 2015 16:37
