client: add `OTEL_RESOURCE_ATTRIBUTES` env var.
#14556
base: main
Add a new task hook that injects an `OTEL_RESOURCE_ATTRIBUTES` environment variable containing Nomad attributes into tasks. The attributes describe the alloc and the specific task that is running, the node where the alloc is running, and the job and eval that generated the alloc. If the task already defines `OTEL_RESOURCE_ATTRIBUTES`, these attributes are merged into it; if the value defined by the task is an empty string, injection is disabled.
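The merge and opt-out rules described above can be sketched as follows. This is a stdlib-only illustration with a hypothetical `mergeResourceAttrs` helper, not the hook's actual code; in particular, letting task-defined keys override Nomad's on conflict is an assumption, since the description does not say which side wins.

```go
package main

import (
	"fmt"
	"strings"
)

// pair keeps the key for conflict detection and the raw "key=value" entry so
// the original encoding of the value is preserved.
type pair struct{ key, entry string }

// parsePairs splits a comma-separated key=value list, skipping malformed entries.
func parsePairs(s string) []pair {
	var out []pair
	for _, entry := range strings.Split(s, ",") {
		entry = strings.TrimSpace(entry)
		key, _, ok := strings.Cut(entry, "=")
		key = strings.TrimSpace(key)
		if !ok || key == "" {
			continue
		}
		out = append(out, pair{key: key, entry: entry})
	}
	return out
}

// mergeResourceAttrs (hypothetical) returns the value to inject:
//   - task did not set the variable: Nomad's attributes as-is;
//   - task set it to "": "" (explicit opt-out, nothing is added);
//   - otherwise: Nomad's pairs plus the task's, with task keys winning
//     on conflict (an assumption of this sketch).
func mergeResourceAttrs(nomadAttrs, taskValue string, taskSet bool) string {
	if !taskSet {
		return nomadAttrs
	}
	if taskValue == "" {
		return ""
	}
	taskPairs := parsePairs(taskValue)
	taskKeys := make(map[string]bool, len(taskPairs))
	for _, p := range taskPairs {
		taskKeys[p.key] = true
	}
	var merged []string
	for _, p := range parsePairs(nomadAttrs) {
		if !taskKeys[p.key] {
			merged = append(merged, p.entry)
		}
	}
	for _, p := range taskPairs {
		merged = append(merged, p.entry)
	}
	return strings.Join(merged, ",")
}

func main() {
	nomadAttrs := "nomad.alloc.id=a1,nomad.task.name=web"
	fmt.Println(mergeResourceAttrs(nomadAttrs, "", false))
	fmt.Println(mergeResourceAttrs(nomadAttrs, "service.name=api,nomad.task.name=custom", true))
	fmt.Printf("%q\n", mergeResourceAttrs(nomadAttrs, "", true))
}
```

With these rules, a task that sets its own non-empty value always has Nomad's pairs added to it; the only way to avoid that is the empty string, which disables the variable entirely.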
The code mostly looks fine here, but I'm not sure about some of the design motivations of this one @lgfa29. There's a hidden backwards-compatibility issue here as well, which we've run into when discussing adding Docker labels: if you are already running OTEL by adding the OTEL env var to your tasks, Nomad just ballooned your OTEL provider costs. You can only opt out of the env var entirely, not out of having Nomad add to it.
```go
multierror "github.com/hashicorp/go-multierror"
"github.com/hashicorp/nomad/client/allocrunner/interfaces"
"github.com/hashicorp/nomad/nomad/structs"
"go.opentelemetry.io/otel/baggage"
```
They went with the name "baggage" for this concept? 😦
Ah yeah, that's another spec 😅
https://www.w3.org/TR/baggage/
```go
members := []baggage.Member{
	newMember("nomad.alloc.createTime", fmt.Sprintf("%v", alloc.CreateTime), mErr),
	newMember("nomad.alloc.id", alloc.ID, mErr),
	newMember("nomad.alloc.name", alloc.Name, mErr),
	newMember("nomad.eval.id", alloc.EvalID, mErr),
	newMember("nomad.group.name", alloc.TaskGroup, mErr),
	newMember("nomad.job.id", job.ID, mErr),
	newMember("nomad.job.name", job.Name, mErr),
	newMember("nomad.job.region", job.Region, mErr),
	newMember("nomad.job.type", job.Type, mErr),
	newMember("nomad.namespace", alloc.Namespace, mErr),
	newMember("nomad.node.id", node.ID, mErr),
	newMember("nomad.node.name", node.Name, mErr),
	newMember("nomad.node.datacenter", node.Datacenter, mErr),
	newMember("nomad.task.name", task.Name, mErr),
	newMember("nomad.task.driver", task.Driver, mErr),
}
```
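The W3C Baggage format linked above only allows printable US-ASCII in values, excluding space, double quote, comma, semicolon and backslash; anything else has to be percent-encoded. The stdlib-only sketch below shows what that means for values such as job names, using a hypothetical `attr` type in place of `baggage.Member`. It is an illustration of the encoding rule, not the `go.opentelemetry.io/otel/baggage` implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// isBaggageOctet reports whether b may appear unencoded in a W3C Baggage
// value: printable US-ASCII except space, DQUOTE, comma, semicolon and
// backslash. '%' is also excluded so literal percent signs are unambiguous.
func isBaggageOctet(b byte) bool {
	switch {
	case b < 0x21 || b > 0x7e:
		return false
	case b == '"' || b == ',' || b == ';' || b == '\\' || b == '%':
		return false
	}
	return true
}

// encodeValue percent-encodes every byte outside the allowed set
// (multi-byte UTF-8 characters are encoded byte by byte).
func encodeValue(v string) string {
	var sb strings.Builder
	for i := 0; i < len(v); i++ {
		b := v[i]
		if isBaggageOctet(b) {
			sb.WriteByte(b)
		} else {
			fmt.Fprintf(&sb, "%%%02X", b)
		}
	}
	return sb.String()
}

// attr is a hypothetical key/value pair standing in for baggage.Member.
type attr struct{ key, value string }

// buildAttrs joins pairs into the comma-separated key=value format used by
// OTEL_RESOURCE_ATTRIBUTES, encoding each value.
func buildAttrs(attrs []attr) string {
	parts := make([]string, 0, len(attrs))
	for _, a := range attrs {
		parts = append(parts, a.key+"="+encodeValue(a.value))
	}
	return strings.Join(parts, ",")
}

func main() {
	fmt.Println(buildAttrs([]attr{
		{"nomad.job.name", "my job"},
		{"nomad.node.datacenter", "dc1"},
	}))
}
```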
I'm not sure I understand the value to operators of providing these specific attributes to task processes. For example, why do we want to expose the eval ID to every running process on the cluster?
From a design standpoint, I'm not sure it makes sense for Nomad itself to define a hard-coded set of attributes, rather than making this something operators define (in either client configuration or the jobspec). Could this whole thing be done via clever templating of an `env` block? And if it can, but only with a lot of work, could we implement something to make that easier instead of tying ourselves to the extremely fast-moving OTEL project?
Ah, I see. I did think of some kind of configuration (either jobspec or client side), but that would be an even bigger commitment to keep up to date. I also considered `meta` values, but they are not really well defined and should probably remain opaque to Nomad.
Maybe an external project would be better for now, then. I will think more about it, thanks!
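As a rough illustration of the external-project route, a task (or a small wrapper around it) could build the value itself from runtime environment variables Nomad already sets, such as `NOMAD_ALLOC_ID`, `NOMAD_JOB_NAME` and `NOMAD_TASK_NAME`. The mapping table and `attrsFromEnv` are hypothetical, only a subset of the PR's attributes is covered, and `url.PathEscape` is used as a conservative stand-in for Baggage value encoding (it escapes spaces, commas, semicolons and `%`).

```go
package main

import (
	"fmt"
	"net/url"
	"os"
	"strings"
)

// envToAttr maps Nomad runtime environment variables to attribute keys.
// The attribute names mirror the ones proposed in this PR; the mapping
// itself is a hypothetical example.
var envToAttr = []struct{ env, attr string }{
	{"NOMAD_ALLOC_ID", "nomad.alloc.id"},
	{"NOMAD_ALLOC_NAME", "nomad.alloc.name"},
	{"NOMAD_GROUP_NAME", "nomad.group.name"},
	{"NOMAD_JOB_ID", "nomad.job.id"},
	{"NOMAD_JOB_NAME", "nomad.job.name"},
	{"NOMAD_NAMESPACE", "nomad.namespace"},
	{"NOMAD_REGION", "nomad.job.region"},
	{"NOMAD_DC", "nomad.node.datacenter"},
	{"NOMAD_TASK_NAME", "nomad.task.name"},
}

// attrsFromEnv builds an OTEL_RESOURCE_ATTRIBUTES-style string from the
// Nomad variables that are present, skipping unset or empty ones. lookup has
// the same signature as os.LookupEnv so it can be swapped out in tests.
func attrsFromEnv(lookup func(string) (string, bool)) string {
	var parts []string
	for _, m := range envToAttr {
		v, ok := lookup(m.env)
		if !ok || v == "" {
			continue
		}
		parts = append(parts, m.attr+"="+url.PathEscape(v))
	}
	return strings.Join(parts, ",")
}

func main() {
	// Only fill the variable if the task has not set it; an existing value,
	// including an empty one, is left untouched.
	if _, set := os.LookupEnv("OTEL_RESOURCE_ATTRIBUTES"); !set {
		_ = os.Setenv("OTEL_RESOURCE_ATTRIBUTES", attrsFromEnv(os.LookupEnv))
	}
	fmt.Println(os.Getenv("OTEL_RESOURCE_ATTRIBUTES"))
}
```

Running this before the OTEL SDK is initialized keeps the attribute set entirely under the job author's control, which is the trade-off discussed above.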
```go
// TODO(luiz): remove decode step once the Otel SDK handles it internally.
// https://github.com/open-telemetry/opentelemetry-go/pull/2963
attrs, err := url.QueryUnescape(resourceAttrs.String())
if err != nil {
	attrs = resourceAttrs.String()
}
```
I may be misunderstanding this PR, but this looks like it impacts the read side and not the write side. If the read side of the SDK gets fixed, don't we still need to encode on the write side so that older versions of the SDK aren't broken? (And are there lots of read-side SDKs for different languages? If there's only Go, then it doesn't seem sensible for us to bake in support for a single language in Nomad.)
Reading a bit more, this looks like we could end up double-encoding in the case where the user has something we're merging onto.
The problem was a mismatch between how the baggage spec and the collector handled encoding values. Only baggage required encoding/decoding, so the more general "fix" was an update to the spec. Other languages will need to be updated to handle this as well, but good point on supporting older versions; I will make sure to test it.
Attempting to double-encode would result in an error that is handled by using the original string.
```go
// https://github.com/open-telemetry/opentelemetry-go/issues/3164
member, err := baggage.NewMember(k, v)
if err != nil {
	logger.Warn("failed to create new baggage member", "key", k, "value", v, "error", err)
```
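One case where `baggage.NewMember` returns an error is a key that is not a valid token; W3C Baggage defines keys as RFC 7230 tokens. The snippet above logs a warning per rejected member. A stdlib-only sketch of the same skip-and-report idea, with hypothetical `attr`, `validKey` and `keepValid` names, might collect the errors instead (`errors.Join` needs Go 1.20+). Its error text says "attribute" rather than "baggage", in line with the terminology concern raised in review.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// isTokenChar reports whether c is an RFC 7230 "tchar", the character set
// W3C Baggage uses for keys.
func isTokenChar(c byte) bool {
	switch {
	case 'a' <= c && c <= 'z', 'A' <= c && c <= 'Z', '0' <= c && c <= '9':
		return true
	}
	return strings.IndexByte("!#$%&'*+-.^_`|~", c) >= 0
}

// validKey reports whether k is a non-empty token.
func validKey(k string) bool {
	if k == "" {
		return false
	}
	for i := 0; i < len(k); i++ {
		if !isTokenChar(k[i]) {
			return false
		}
	}
	return true
}

// attr is a hypothetical key/value pair.
type attr struct{ key, value string }

// keepValid drops attributes with invalid keys and reports them as a joined
// error instead of failing the whole set.
func keepValid(in []attr) ([]attr, error) {
	var out []attr
	var errs []error
	for _, a := range in {
		if !validKey(a.key) {
			errs = append(errs, fmt.Errorf("invalid attribute key %q", a.key))
			continue
		}
		out = append(out, a)
	}
	return out, errors.Join(errs...) // nil when errs is empty
}

func main() {
	kept, err := keepValid([]attr{{"nomad.task.name", "web"}, {"bad key", "x"}})
	fmt.Println(len(kept), err)
}
```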
The ridiculous name "baggage" aside, which isn't your fault, should we really expose that inside-baseball terminology to end users?
Ah right, good point 👍
Yup, those are great observations. I'm going to mark this as draft and rethink the approach. Thanks for the review!
This is a great candidate for Task Hook Plugins... if such a thing existed. The hook API was designed so that they could be added seamlessly someday. 🤷