Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add profiles proto #488

Closed
wants to merge 8 commits into from
Closed

Conversation

petethepig
Copy link
Member

@petethepig petethepig commented Jun 27, 2023

This PR is moved to a dedicated opentelemetry-proto-profile repo. Please start new conversations in that new repo.

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Jun 27, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@tigrannajaryan
Copy link
Member

@petethepig we have just discussed this in the TC meeting.

We would like to give you a separate repo in Otel github org, a fork of this one, where you can experiment freely with the profile protos. We will give you approving/merging rights on this new repo, so that you can also iterate faster and don't need to wait for approvals/merging from OTLP maintainers.

You can also fork Collector and easily change it to consume proto file files from your fork instead of this repo and do your experimentation in the Collector too.

When the experiments are concluded you can do a cleanup on your fork and then we will merge back the finalized version of profile protos back into this repo.

What do you think?

cc @jsuereth

}

message Sample {
repeated uint64 location_ids = 1;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if those should be int32/uint32 I don't think we'll overflow this specially since they refer slice/array indexes.

This is not super important for the wire format since proto will varint encode it but for the collector this will allow it to buffer more as the memory footprint will be lower per profiles.

// Value is UNIX Epoch time in nanoseconds since 00:00:00 UTC on 1 January 1970.
//
// This field is semantically required and it is expected that end_time >= start_time.
fixed64 end_time_unix_nano = 3;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: wouldn't be better to have a duration instead. I also wonder what happens for cumulative profile.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cyriltovena

I have 2 questions:

  • Can you elaborate on why it would be better to have a duration?
  • Why is this going to be different for cumulative profiles? I'm not following

Btw I copied this from trace proto, so there's value in keeping like this for consistency.

// fields from one of these will be embedded in Profile message:
oneof alternative_profile {
opentelemetry.proto.profiles.v1.alternatives.pprof.Profile pprof = 8;
opentelemetry.proto.profiles.v1.alternatives.normalized.Profile normalized = 9;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm waiting to see some benchmarks, but I do think the normalized is the way to go.

With a lot of samples (we can expect ~10k per profiles) it avoid duplications. It will most likely have a smaller memory footprint and faster marshalling/unmarshalling.

It's also closer to pprof which is quite popular for profiling data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cyriltovena Explain the math behind that 10k, please?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should 'pprof' even be listed as an alternative here? It doesn't cleanly support features that we'd like and we can't modify it (it's no longer pprof if we do).

We can however have pprof be supported as a "pass-through" protocol that gets converted to the profiling format, together with JFR (e.g. original_payload field).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jhalliday picked couple of pprof in my dev cluster.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@christos68k Yeah, pprof was included as a reference so that we'd be able to easily compare it to other ones. Long term plan is to keep just one of these 3 formats, and based on the discussions so far it seems like it will be based on the normalized version.

Copy link
Contributor

@florianl florianl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work @petethepig && @felixge 🙏

// The instrumentation scope information for the profiles in this message.
// Semantically when InstrumentationScope isn't set, it is equivalent with
// an empty instrumentation scope name (unknown).
opentelemetry.proto.common.v1.InstrumentationScope scope = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InstrumentationScope is a logical unit for application code. Wouldn't this limit whole system profiles down to application profiles?

https://opentelemetry.io/docs/specs/otel/glossary/#instrumentation-scope

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@florianl

I copied this part from other signals, like traces or logs — so if this is truly a limitation then I think it should be discussed in the broader OTEL context.

I don't think it is a limitation though — instrumentation scope describes 2 things:

  • instrumentation scope name and version
  • attributes

I think it is intended that scope name and version would be the same for system-wide profile. RE attributes, the format in its current state is very flexible when it comes to assigning attributes, for example, clients can create multiple separate profiles within one ScopeProfiles, each with it's own set of attributes, allowing for "per application" attributes within one instrumentation scope.

message Sample {
// The ids recorded here correspond to a Profile.location.id.
// The leaf is at location_id[0].
repeated uint64 location_id = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To uniquely identify a stack trace across a complete deployment, using a 128 bit value would be great. In one of the first meetings of the group, @thomasdullien presented our custom format and explained why we went with 128 bit identifiers instead of 64 bit.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this ID would be to reference locations inside this profile. Are you thinking about global addressing ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes - this comment would also fit to Location.id just like #488 (comment).

One of the motivations is really to avoid collisions in global addressing. Even if the same code is compiled, differences in the build systems/compilers are important and can make a difference.

int64 num_unit = 4; // Index into string table
}

message Mapping {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For postprocessing it might be great for the backend to have additional (optional) information. Additional information can be something like, if this mapping belongs to some interpreter.

Maybe something like:

repeated Label label = 11;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

petethepig@980fc89

I added it to Location as well. My logic here is that the same argument can be made for adding it to Location, so I figured it makes more sense to include it there. I made it a reference to an AttributeSet, same way the linking is done from Sample.

@florianl what do you think?

Copy link

@thomasdullien thomasdullien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My instinct is also for the normalized.

I'd like to make sure that we always use the suffix _index for things that are array indices, and reserve _id for things that are likely hashes or other pseudorandom values. I'd also strongly push for using 128-bit values for _ids, always and everywhere, lest we create issues for everybody when collisions arise. 64-bit values collide randomly at data quantities that are absolutely the norm these days.

// that is most useful to humans. There should be enough
// information present to determine the original sampled values.
//
// - On-disk, the serialized proto must be gzip-compressed.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a good reason to use gzip, when pareto-improved compression methods are available? (e.g. algorithms that outperform gzip compression at less CPU usage at every level?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pprof proto and I only included pprof proto as an example.

Either way, from my perspective, compression is something that is broadly applicable to the whole OTEL / OTEL collector universe and therefore we should bring these kinds of things up to the broader OTEL collector maintainers group.

// The instruction address for this location, if available. It
// should be within [Mapping.memory_start...Mapping.memory_limit]
// for the corresponding mapping. A non-leaf address may be in the
// middle of a call instruction. It is up to display tools to find

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to specify precisely what "in the middle" means? I'd prefer not to give implementers the choice between 3 different values...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment comes from pprof proto. My understanding is that this comment refers to the limitation of some profilers where they are not always able to get exact instruction address for a given location. So it's less about what the implementer should put into this field and more of a word of caution for implementers of display tools. I might be totally wrong though — this is getting a little outside of my area of expertise, so I will need help from someone.

// should be within [Mapping.memory_start...Mapping.memory_limit]
// for the corresponding mapping. A non-leaf address may be in the
// middle of a call instruction. It is up to display tools to find
// the beginning of the instruction if necessary.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Display tools may not have the ability to find the beginning of the instruction unless the "in the middle" above is specified precisely.

// for the corresponding mapping. A non-leaf address may be in the
// middle of a call instruction. It is up to display tools to find
// the beginning of the instruction if necessary.
uint64 address = 2;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to specify exactly how deep into the call instruction the address is allowed to be, display tools do not have access to the assembly bytes and hence cannot do more than simple arithmetic to find the beginning of the instruction.

Comment on lines +124 to +125
// The id of the corresponding profile.Function for this line.
uint64 function_id = 1;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an index into the other array?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should rename _id to _index?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this makes a lot more sense — changed it to _index and _indices everywhere.

Comment on lines +84 to +85
// The number of events between sampled occurrences.
int64 period = 12;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the comment be clarified?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is sample rate basically. I'm not going to edit this one because this is copied from pprof spec. I will make sure we document it well in the final otel proto spec though.

Comment on lines +173 to +175
// Unique nonzero id for the location. A profile could use
// instruction addresses or any integer sequence as ids.
uint64 id = 1;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is an argument to be made for a 128-bit id here, as follows:

  1. Instruction address cannot be used as an ID, as in JIT scenarios, the same address can be occupied by different pieces of code; likewise in a scenario where modules are loaded / unloaded.
  2. The way to encode things without keeping too much state is therefore hashing other parts of the code into the ID.
  3. 64-bit IDs are too short for hashing in the profiling case, as the odds of collision are uncomfortably high even for 100m items (0.02% collision probability even at 100m items, 2 in 10000).

Comment on lines +203 to +204
// The id of the corresponding profile.Function for this line.
uint64 function_id = 1;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that an ID or an index?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

index in that profile.

Comment on lines +210 to +211
// Unique nonzero id for the function.
uint64 id = 1;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that an ID or an index?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In pprof it is an ID. And there is some confusion right now because usually IDs and indexes are identical, but not always. The plan for the new spec is that we're going to remove ids and rely on indexes. This way there's one way to address objects and so there's going to be:

  • less confusion
  • better performance (array lookups are faster than search for ID)

Comment on lines +1 to +2
// Copyright 2019, OpenTelemetry Authors
//

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cut & paste issue about Copyright year.


message Sample {
repeated Location locations = 1;
repeated Link links = 2;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be repeated?


message Sample {
repeated uint64 location_ids = 1;
repeated uint64 link_ids = 2;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

embed these in attributes?

}

message Sample {
repeated uint64 location_ids = 1;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Experiment with storing these as a tree (encoded as 2d array of integers)

// * alloc_bytes
// * inuse_objects
// * inuse_bytes
repeated fixed64 values = 4;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

byte-seconds might overflow when aggregated, consider using double floats

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should not be fixed64



// borrowed from pprof proto
message Mapping {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Florian: should stay for profiles that don't have symbols

// it could be the contents of the .note.gnu.build-id field.
int64 build_id = 5; // Index into string table

// The following fields indicate the resolution of symbolic info.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe change these to 1 enum or a bit-field.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aalexand Can you provide the list of combinations that are valid? I remember you mentioned that not all 16 combinations are valid, but also 4-way enum won't cut it either, so maybe we can make this an enum with more than 4 but less than 16 combinations.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the combinations encountered in practice are these - see below.

has_functions has_filenames has_line_numbers has_inline_frames Description
true false false false A C++ / native binary with just symbol table (.symtab), no DWARF.
true true true true A C++ / native binary symbolized using profiler-friendly (i.e. at least -gmlt) DWARF.
true true true false Full symbol information but no distinction of inline frames - e.g. from Java. Can also correspond to an exotic case of using separate file DWARF, but not providing the *.dwp file.
false false false false Unsymbolized mapping

I would appreciate if someone also familiar with the subject could verify this from their perspective.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aalexand thank you! I was trying to make an enum out of this, and here's what I have so far:

  enum SymbolicInfo {
    SYMBOLIC_INFO_UNSPECIFIED = 0;
    SYMBOLIC_INFO_FULL = 1;
    SYMBOLIC_INFO_FUNCTIONS_ONLY = 2;
    SYMBOLIC_INFO_NO_INLINE_FRAMES = 3;
  }

As I was doing this one thing I thought of is do we need these booleans / enums at all? My understanding is that these booleans are useful when combining a profile without symbols with symbols information (e.g in DWARF format). But I would assume that when you have symbols in some different format (let's say DWARF), you already know if it has functions / line numbers / etc. So it feels like maybe we don't need these booleans / enums. Unfortunately, my understanding of all of this is still pretty limited, so I would love your input on this.


Btw, I checked google/pprof repo, and it looks like out of these 4 booleans, the only one that's meaningfully used is HasFunctions one, and even then there's only one mention (symbolz.go). The other fields are just copied back and forth but not actually used it seems like.

cc @korniltsev

@petethepig
Copy link
Member Author

TODO: add a design decision log?

@jsuereth
Copy link
Contributor

@tigrannajaryan I created https://github.com/open-telemetry/opentelemetry-proto-profile for the profiling group to own and experiment with.

So far I only add @Rperry2174 as a maintainer. I would add @petethepig but he's not a member of the OpenTelemetry group yet (please follow this: https://github.com/open-telemetry/community/blob/main/community-membership.md#community-membership, I can vouch for you as one of your two)

@tigrannajaryan
Copy link
Member

@petethepig please re-create this PR in https://github.com/open-telemetry/opentelemetry-proto-profile

I am going to close this PR and you can continue discussing it in the new repo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants