Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Structured Stacktrace in Span Attributes/Logs #2839

Open
tsloughter opened this issue Sep 26, 2022 · 7 comments
Open

Structured Stacktrace in Span Attributes/Logs #2839

tsloughter opened this issue Sep 26, 2022 · 7 comments
Assignees
Labels
area:error-reporting Related to error reporting area:semantic-conventions Related to semantic conventions spec:logs Related to the specification/logs directory spec:trace Related to the specification/trace directory triaged-needmoreinfo The issue is triaged - the OTel community needs more information to decide

Comments

@tsloughter
Copy link
Member

What are you trying to achieve?

A backwards compatible way of having stacktraces in spans and logs be structured data instead of strings. This would allow backends to parse and display usable (like links to the line in a file) information without having to know how to parse a specific languages stacktrace -- and not leave those of us using less popular languages on the outside looking in on these features :)

Additional context.

I'd open this for discussion as a PR but the semantic convention generation tool does not appear to accept a type array-of-maps that is needed for the attribute value. I don't want to spend my time on adding this to the tool if this idea is shot down.

The idea is to add an additional field to the exception semantic convention exception.structured_stacktrace with type map<string, any>[] and specific keys in the map representing the parts of a stack frame. Example:

[{"function_name": "com.example.GenerateTrace.methodB", 
  "filename": "GenerateTrace.java", 
  "line_number": 13}, 
 {"function_name": "com.example.GenerateTrace.methodA", 
  "filename": "GenerateTrace.java", 
  "line_number": 9}, 
 {"function_name": "com.example.GenerateTrace.main", 
  "filename": "GenerateTrace.java", 
  "line_number": 5}]'

Other optional keys would also be defined like column_number, see the OpenCensus StackFrame proto.

This example is the same as the one used currently in the semantic convention for exceptions.

@tsloughter tsloughter added the spec:miscellaneous For issues that don't match any other spec label label Sep 26, 2022
@tsloughter
Copy link
Member Author

I forgot that trace and span attribute values can only be primitive values and arrays of primitive values, so not arrays of maps: https://github.com/open-telemetry/opentelemetry-specification/tree/main/specification/common#attribute

So closing this. I really would like to see this incorporated somehow, pls comment on this issue and I can reopen it if there are viable alternative suggestions.

@Oberon00
Copy link
Member

Oberon00 commented Sep 27, 2022

I'm reopening this because (1) it's completely feasible to record structured information with the current attributes. Just convert from an "array of structs" to a "struct of arrays", i.e. instead of having one array with objects that have three properties each, have three arrays stack.functions, stack,linenos, stack.filenames. And (2) I think this would without doubt be a useful feature (and having unresolved dependencies for the issue would not a reason to close it).

@Oberon00 Oberon00 reopened this Sep 27, 2022
@Oberon00 Oberon00 added area:semantic-conventions Related to semantic conventions spec:logs Related to the specification/logs directory area:error-reporting Related to error reporting spec:trace Related to the specification/trace directory and removed spec:miscellaneous For issues that don't match any other spec label labels Sep 27, 2022
@tsloughter
Copy link
Member Author

Hm, ok, so something like:

{
"exception.structured_stacktrace.function_names": ["com.example.GenerateTrace.methodB", "com.example.GenerateTrace.methodA", "com.example.GenerateTrace.main"],
"exception.structured_stacktrace.filenames":  [""GenerateTrace.java", "GenerateTrace.java", "GenerateTrace.java"],
"exception.structured_stacktrace.line_numbers": [13, 9, 5]
}

I agree structured stacktraces are too useful to not continue looking into how to support them but thought it wouldn't be feasible in attributes, which this issue was about, but would need a message type.

Your suggestion definitely changes that.

Now that it is also a form I can easily make a PR for because I think the tools support defining a type of array of strings I can go do that as well.

@rbailey7210 rbailey7210 added the triaged-needmoreinfo The issue is triaged - the OTel community needs more information to decide label Sep 30, 2022
@reyang
Copy link
Member

reyang commented Sep 30, 2022

I personally think we should just use a plain string and provide some example/suggestion/guidance on the format so people know how to parse the common patterns. I understand that having more structure would reduce parsing cost, I'll explain the reason below:

Here is my experience (I'm the owner of several Microsoft internal stacktrace schemas) - trying to define a cross language/runtime structured stacktrace is a very hard problem, we'll always find cases that don't fit (e.g. what about Java calling into C++ via JNI and going back via reverse-JNI, do we want to distinguish kernel frames and user frames, do we want to show user friendly frames or the real frames which might have thunks, do we use fully-qualified function/type name or short names, do we have path+filename or fullpath and whether it is a local path or the repo path, what do we do with AggregatedExceptions or Exception which has inner exceptions).

From what I've seen in the past 15+ years, all the efforts trying to define highly-structured cross-language callstack ended up with folks trying to fit new scenarios into the established structured in very weird ways, and they will always have to do extra parsing in the backend.

@jmacd
Copy link
Contributor

jmacd commented Sep 30, 2022

I second @reyang's opinion, speaking as the former maintainer of a multi-language stacktrace parsing library (part of Google's offering in this space)

The first time we discussed adding a stacktrace semantic convention to OTel was over two years ago:

open-telemetry/oteps#69

We already have a standard data type for stacktraces. While programmers tend to see the advantage of structuring these at runtime, they also have a bias toward action and a limited perspective. Clearly, their ability to debug a situation would improve if they would just take advantage of runtime libraries to self-introspect the stack. However, this comes at the cost of performance, safety, and likely correctness. The language runtime's ability to generate a stacktrace string is guaranteed to exceed the program's ability to do so using structured APIs.

The use of structured stacks is costly because you will generate many small objects.

The use of structured stacks is unsafe. For example, in the JDK PrintStacktrace() method there is a special anti-recursion feature built in, the comment reads:

    private void printStackTrace(PrintStreamOrWriter s) {
        // Guard against malicious overrides of Throwable.equals by
        // using a Set with identity equality semantics.

The runtime library likely has special advantages in computing these, for example the Golang runtime stacktrace includes a goroutine ID and there is no easy way to get the goroutine ID otherwise.

The use of structured stacks likely harms correctness, for reasons @reyang has covered.

@tsloughter Here is what I propose.

OpenTelemetry should continue to recognize string stacktraces, and tracing/logging SDKs should by default rely on their built-in support for generating stacktraces. Clearly, this doesn't meet every need, and we'd like an alternative. Form a project called OpenStacktrace -- define a format that encompasses a large portion of known stacktrace formats -- specify how to map all the known stacktraces into the new OpenStacktrace format (which shall be string) -- implement high-performance, safe, correct methods to compute OpenStacktrace strings at runtime -- then give OpenTelemetry SDKs the default option (builtin support) or the new option (OpenStacktrace). When the new option becomes appealing to enough users, perhaps we can switch, but the point of all this is that users will likely be better off using whatever is built-in. If OpenStacktrace succeeds, languages can adopt it, and ultimately the vendors and observability systems that ingest this data will always have the option to parse the stacktrace whether it's old or new.

@tsloughter
Copy link
Member Author

I'm not suggesting structure in order to reduce any parsing cost, or for any technical reason, but from a user experience standpoint. When there isn't a structure but a reliance on vendors supporting individual string formats the smaller languages get left out :). Even if we include examples and guidance for dealing with the stacktrace strings from Erlang and Elixir it doesn't mean any vendor will add support.

I'd still argue that having both, where the structured format is optional, is best, but maybe there is something else we can do to guarantee some level of support for linking out to source code? I mean, it would still be a stack of function calls, just spitballing.

Having both would simplify what is expected to be able to be carried in the structured data of the exception and stacktrace -- like Josh mentions Go's goroutine id. Another example, in Erlang there is additional information that can be included with a stack frame. This optional data is passed to a user defined format function to include when the exception and its stacktrace are converted to a string (something only added a version ago, so I am certainly biased to structured stacktraces :) and I wouldn't want to lose from being displayed to the user viewing the stack trace.

@jmacd
Copy link
Contributor

jmacd commented Sep 30, 2022

@tsloughter I wonder if there is a way you could adopt an existing stacktrace format like NodeJS or Java has?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:error-reporting Related to error reporting area:semantic-conventions Related to semantic conventions spec:logs Related to the specification/logs directory spec:trace Related to the specification/trace directory triaged-needmoreinfo The issue is triaged - the OTel community needs more information to decide
Projects
None yet
6 participants