
What to do about Span.info() / Span.error()? #30

Closed
yurishkuro opened this issue Jan 17, 2016 · 51 comments

@yurishkuro
Member

To fork from a different discussion in #26, the questions here are:

  1. should we expect the message argument to info() and error() methods to be parameterized by referring to portions of the payload, just like the logging APIs do?
  2. if so, do we converge to a single cross-platform syntax or use common patterns from the respective platforms, i.e. span.Info("oops we found %+v", payload) in Go vs. span.info("oops we found {}", payload) in Java?
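For illustration only, here's a minimal Python sketch of what a parameterized signature could look like; neither Span.info() nor this exact shape is an agreed API at this point, and all names here are hypothetical:

class Span:
    """Hypothetical sketch only; not an agreed OpenTracing API."""

    def __init__(self, sampled=True):
        self._sampled = sampled

    def info(self, message, *args):
        # Defer string formatting until we know the span is sampled,
        # so unsampled spans pay no string-building cost.
        if self._sampled:
            print("[info] " + (message % args if args else message))


span = Span()
# Printf-style, mirroring span.Info("oops we found %+v", payload) in Go:
span.info("oops we found %r", {"user_id": 42})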
@codefromthecrypt
Copy link
Contributor

I'll note that we only have this discussion because we decided to partially implement a logging API. In Java, there isn't a single answer either: some logging uses printf-style formats, others differ.

@bhs
Copy link
Contributor

bhs commented Jan 18, 2016

@adriancole noted re the discussion itself... no argument that we're incurring a cost by discussing logging, though IMO it is a worthwhile one given the (tangible, demonstrated) benefits of span-granularity logging data at Google, and in my conversations about tracing with numerous other companies over the past year or so.

@bhs
Copy link
Contributor

bhs commented Jan 18, 2016

@yurishkuro upon some reflection, here's a concrete proposal... the goals are to present people with platform-idiomatic and straightforward signatures for vanilla message logging, the opportunity to easily mark errors vs info (but nothing more from a severity standpoint), to allow for the logging of potentially large payloads (which may be truncated or ignored by the Tracer), and – if possible – to also support "event"-style logging, where it's still a string at the end of the day, but more of a stable string that marks a specific semantic moment... e.g., cr.

Possibility A (combine event and payload):

  • info(<platform-idiomatic params and message formatting>)
  • error(<platform-idiomatic params and message formatting>)
  • event(String, ...<zero or more payload objects>)

Possibility B (split out event and payload):

  • info(<platform-idiomatic params and message formatting>)
  • error(<platform-idiomatic params and message formatting>)
  • event(String)
  • payload(<one or more payload objects>)

Possibility C (don't support event):

  • info(<platform-idiomatic params and message formatting>)
  • error(<platform-idiomatic params and message formatting>)
  • payload(<one or more payload objects>)

Possibility D (don't support message logging):

  • event(String)
  • payload(<one or more payload objects>)

Possibility E (don't support message logging and combine event and payload):

  • event(String, ...<zero or more payload objects>)

Possibility F (don't support message logging or payloads):

  • event(String)

My favorite is A.
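A minimal Python sketch of Possibility A; the method names follow the list above, but the payload handling and everything else here is an assumption:

class Span:
    """Sketch of Possibility A; not a finalized API."""

    def info(self, message, *args):
        self._record("info", message % args if args else message)

    def error(self, message, *args):
        self._record("error", message % args if args else message)

    def event(self, name, *payloads):
        # 'name' should be a stable string like "cr"; payloads may be
        # large, and the Tracer is free to truncate or ignore them.
        self._record("event", name, payloads)

    def _record(self, kind, text, payloads=()):
        print(kind, text, payloads)


span = Span()
span.info("fetched %d rows", 42)
span.event("cr", {"bytes": 1024})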

@bhs bhs added this to the 1.0 / "soft launch" milestone Jan 18, 2016
@yurishkuro
Member Author

@bensigelman My preference would be A as well (no surprise there). It still leaves us with a task of clarifying the <platform-idiomatic params and message formatting>, since there may not be a single answer on some platforms (like Java, as @adriancole mentioned). We may need to just make a choice, e.g. for Java use slf4j formatting syntax.

One other question with option A is handling exceptions in the error() method. I've run into that already while doing instrumentation work. The simplest approach I've used so far was to assume that the tracer can recognize an exception automatically as one of the payload entries.
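A sketch of that "recognize the exception automatically" approach; the scan of payload entries is the assumption being described above, not specified behavior:

import traceback


def error(message, *payloads):
    # Assumption: the tracer scans the payload entries and treats any
    # exception instance it finds specially.
    exc = next((p for p in payloads if isinstance(p, BaseException)), None)
    record = {"level": "error", "message": message, "payload": payloads}
    if exc is not None:
        record["exception"] = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__))
    print(record)


try:
    1 / 0
except ZeroDivisionError as e:
    error("division failed", e)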

@codefromthecrypt
Contributor

Possibility F (don't support message logging or payloads):

  • event(String)

I have a bias towards supporting existing ops like Trace.record, Span.annotate, etc. in v1.0 by the most straightforward means possible.

add_event(String, timestamp?) or similar meets that requirement. If "log" didn't conflate with classifying as "info" (as it might not be!!) I wouldn't mind log(String, timestamp?).

@bhs
Contributor

bhs commented Jan 19, 2016

@adriancole re your suggestion of log(String, timestamp?): does it break Zipkin (or the Zipkin UI) if people log long, human readable, often unique strings via an API like this? I'm just trying to understand why we would have reason to support (and explain) two different logging-like APIs (add_event and info/error) that both just record a String and a timestamp. Or is it just conceptual? That things like cs and cr are so semantically important that they shouldn't get lost in the fray with info spam?

@codefromthecrypt
Contributor

does it break Zipkin (or the Zipkin UI) if people log long, human readable, often unique strings via an API like this?

Zipkin queries aren't designed for all types of strings, but you can store any string subject to length constraints. It will probably lead to larger indexes, but that's also tunable.

The most common case is searching for annotations like "finagle.retry", "namer.success", and "namer.closed". You can query for "mary had a little lamb", but probably not for a multi-line string representation of a stack trace, even if it can be stored and displayed on the timeline.

Hope this helps

@codefromthecrypt
Contributor

Or is it just conceptual? That things like cs and cr are so semantically
important that they shouldn't get lost in the fray with info spam?

Conceptually, replacing something that evaluates to Trace.record("thing.failed") with Span.info("thing.failed") makes no sense. There's no notion of whether something is "error" or "info" in Zipkin, and if we actually want people to use it, there should be a drop-in for things that don't classify as info or error. This isn't just Zipkin! Look at all the open source tracers, and you'll find this classification of timeline events is not a widely implemented feature, even if it is at Google.

@bhs
Contributor

bhs commented Jan 19, 2016

Got it... Yeah, I'm not suggesting it makes sense; I just wanted to make sure I understood the implicit objection behind the comment you made earlier (about wanting to use .log() for events, yet not being able to because of something like .info()).


@bcronin

bcronin commented Jan 19, 2016

(Long thread here, so my apologies if I missed anything...and apologies for making it longer!!!)

I want to share my thoughts, which I think are most similar to bhs' "Possibility A" -- but a little different in that it would introduce the notion of a rawLog method as well. Anyway, the below is basically centered around the idea that info/error/event are all, in the data model, pretty much the same thing -- but there's a lot of variation in how best to make a convenient API around it. With that in mind, the spec could state something like the below:


The OpenTracing platform implementations are required to support a rawLog() method that takes a set of optional arguments allowing a log record to be completely specified. (In Go, this is likely a LogOptions object. In JS, this is likely an associative array of fields. In Python, this is likely keyword args.) E.g. in JavaScript:

spanHandle.rawLog({
    timestamp : <time_in_micros>,
    type      : <constant indicating INFO, ERROR, or EVENT>,
    message   : <a post-formatted message string || an event name>,
    payload   : <payload object>,
});

(or perhaps...)

spanHandle.rawLog({
    timestamp  : <time_in_micros>,
    event_name : <stable string name>,
    message    : <a post-formatted message string>,        
    error_flag : <boolean>,
    payload    : <payload object>,
});

Any implementation of OpenTracing must be able to gracefully handle any combination of the options, though it is allowed to ignore data it does not consider valid (e.g. lack of a timestamp or, say, both an event_name and message being specified).

This rawLog API is intended to be fully flexible and account for such use cases as needing to manually specify the timestamp. The full flexibility should help stave off a proliferation of separate methods for uncommon but necessary use cases. As a downside, this API will often be inconvenient to use directly for the common case.

To address the need to make the common case convenient, the following APIs should be supported in a manner that's most idiomatic to the platform (to be explicit: this does mean per-platform variations in the signatures of the below are allowed if it promotes convenience for the platform in question).

  • info(message)
  • error(message)
  • event(name, [optional payload])

In terms of promoting ease-of-use, the following sorts of variations are allowed. This is not an exhaustive list:

  • info and error may be expressed as printf-like infof and errorf methods if that is idiomatic to the platform, or whatever other "format string"-style text capture is most suited to the platform (chaining, language-intrinsic string formatting, etc.)
  • info and error are not required to include a means to specify a payload, but optionally allowed to. (Stated differently, the OpenTracing implementation should not assume that a log containing a valid message will not have a payload as well.)
  • event is intended to have a fixed string "stable name" with any context-dependent data solely in the payload. This is a constraint that cannot be enforced by the implementation, and therefore it can at best be an encouraged convention.

As the exact signatures and behavior of info, error, and event are allowed to vary by platform, each platform implementation is expected to document how these calls translate to a direct call to rawLog.
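Since the proposal notes that Python would likely use keyword args for rawLog, here is a sketch of the convenience methods delegating to it; the field names follow the second rawLog variant above, and the rest is illustrative, not a real implementation:

import time


class Span:
    """Sketch of the proposal above; not a real OpenTracing implementation."""

    def raw_log(self, timestamp=None, event_name=None, message=None,
                error_flag=False, payload=None):
        # Gracefully handle any combination of options, filling in a
        # missing timestamp rather than rejecting the record.
        print({
            "timestamp": timestamp or int(time.time() * 1e6),
            "event_name": event_name,
            "message": message,
            "error_flag": error_flag,
            "payload": payload,
        })

    # Convenience wrappers for the common cases:
    def info(self, message):
        self.raw_log(message=message)

    def error(self, message):
        self.raw_log(message=message, error_flag=True)

    def event(self, name, payload=None):
        self.raw_log(event_name=name, payload=payload)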


That's my two-cents. It was a bit on the thinking aloud side of things, so again apologies if this is missing anything fundamental!

@yurishkuro
Member Author

@bcronin I would be ok with this (maybe s/rawLog/log)

@bensigelman to clarify the difference between event(str) and info(str): the intention is that the arguments to event() are not only low-cardinality, but are also standardized with clear semantic meaning, similar to standardizing on what the tag name should be for http.url or an HTTP return code. The cs/cr annotations are not actually the primary use case for event(), as the user never needs to set these when using OT; it's more for the other "standard" events known to Zipkin like wire send/recv, fragments, etc. And the expectation from the tracing system is that it might do something special based on events that it wouldn't do purely based on info logs, e.g. index/aggregate them.
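Reusing the Possibility-A shape sketched earlier, the distinction might read like this; the event names are illustrative, not finalized constants:

class Span:  # minimal stub; see the Possibility-A sketch above
    def event(self, name, *payloads):
        print("event:", name, payloads)

    def info(self, message, *args):
        print("info:", message % args if args else message)


span = Span()
span.event("wire.send")                        # standardized, low-cardinality
span.event("wire.recv")
span.info("retry %d for host %s", 2, "db01")   # free-form, high-cardinality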

@bhs
Contributor

bhs commented Jan 19, 2016

@yurishkuro makes sense re the mental model for event()... So these are basically important moments in the lifetime of a Span that don't really justify a sub-span (either because they're incidental or because they're, well, "momentary": they have a timestamp rather than a time range). @bcronin was using the example of a span that represents a page load in a browser, and the idea of events for the various [performance timing keys](https://developer.mozilla.org/en-US/docs/Web/API/PerformanceTiming).

I also like @bcronin's proposal (very much, in fact)... I am doing a bunch of non-coding work right now but later today I will try to send out a slightly more concrete (but still RFC-status) PR for the golang API and reference it on this thread.

bhs pushed a commit to opentracing/opentracing-go that referenced this issue Jan 19, 2016
@bhs
Contributor

bhs commented Jan 19, 2016

Per ^^^, here's a PR illustrating what this might look like in Go for starters:

opentracing/opentracing-go#38

@dkuebric
Contributor

Thinking about the Error use-case:

  • Event is provided in the RecordEvent constructor, but not for the other types. Yet these may have both a general class and an instance-specific message. This is very common with errors like exceptions, where there is a type of the exception and a message that categorizes it. I think the constructor for Error would benefit from this addition.
  • IsError gets packed onto all event types, which feels a little weird.
  • More of an observation: IsError is a bool, but users might want to represent e.g. severity, which will get packed into the Payload. We're now quickly mixing code API and semantic API. Not necessarily a problem; we will make arbitrary-ish decisions to elevate certain tags into the formal API, including possibly IsError.

@bhs
Contributor

bhs commented Jan 20, 2016

@dankosaur thank you for the feedback. A few followups:

Re bullet point 1: I'm not sure I fully understand what you're suggesting... can you express it in code/pseudocode (can be non-Go if you want, whatever works)?

Re bullet points 2+3: we could replace IsError with Severity, an enum that encompasses N verbosity levels as well as WARNING and ERROR...

And a note about Payload: I was thinking that it would basically always be application data, perhaps massaged or munged a little bit... but I wasn't thinking that it would be used to add context/flags (what you call the "semantic API") about the log message itself (severity bits, etc, etc). I'll mention this in the comment and update the PR.

@dkuebric
Contributor

@bensigelman re: question/suggestion in bullet 1, I was thinking about the following use-case, say in Python:

except SomeExceptionType as e:
    tracer.Error(event=e.__class__.__name__, message=str(e))

It's nice to be able to group on the class of an error, not just the message, which may contain unique strings that make it impossible to group. For that reason, I suggested extending the Error interface to allow an Event usage, since Event is in our struct anyway. But re-reading your gist of Event, this might be pushing it towards a use-case somewhat different from how it would be used in non-error events (higher cardinality, at least), so this is perhaps a question of balancing consistency against capability.

re: the Payload note -- that makes sense, it seems as though your intent was to have most/all "semantic API" come from the Event strings, and Severity for errors? This might be enough for point-in-time events; admittedly I don't have any top of mind use-cases aside from errors.

@bhs
Contributor

bhs commented Jan 21, 2016

@dankosaur

re: the Payload note -- that makes sense, it seems as though your intent was to have most/all "semantic API" come from the Event strings, and Severity for errors? This might be enough for point-in-time events; admittedly I don't have any top of mind use-cases aside from errors.

Yes, that's what I had in mind... In your python example, I guess I would have imagined something like tracer.Event("exception: " + str(e.__class__), payload=e) (or so... maybe message gets its own field in payload, etc).

@bhs
Contributor

bhs commented Jan 21, 2016

There haven't been any marked objections to the proposal outlined in opentracing/opentracing-go#38 .

If we have converged, I can update the semantics doc accordingly and close this issue.

@michaelsembwever
Contributor

Please don't introduce the info/error methods.
The introduction of a logging API seems contrived to one specific use-case that has come up.

That said, I think this use-case demonstrates the need for the 'payload'.
I am therefore in favour of option (E).

(I wrote this quickly in lieu of a longer comment, because of your latest comment, @bensigelman.)

@bhs
Contributor

bhs commented Jan 21, 2016

Sorry, which "one specific use case"? This sort of logging is totally garden-variety behavior within Dapper (and was widely used because it is essential when trying to make sense of a trace).

@michaelsembwever
Contributor

Fair enough. "some specific use cases" then.

For the initial OpenTracing API I'd argue for the simplest plumbing API possible.
This is the KISS principle, and I think it's especially relevant here when we're talking about API design. API design is about evolution of the API, not about trying to get the API correct up front.
It is easy to add the porcelain API (info/error/etc) later on if needed, as opposed to trying to fix it later. (We should also be looking at separation of API and SPI, and separation of plumbing and porcelain.)

Option (E) seems to be the simplest solution, and an appropriate plumbing API, that solves all the use-cases raised.

@codefromthecrypt
Contributor

codefromthecrypt commented Jan 22, 2016 via email

@bhs
Contributor

bhs commented Jan 22, 2016

Resistance at this point is a fool's errand for me.

Hardly.

We could start with a thinner interface that just does event+payload logging with the understanding that we will most likely (but not definitely) add logging in a not-so-distant OT API. I can live with that if there is strong support for the more conservative approach.

@michaelsembwever
Contributor

We could start with a thinner interface that just does event+payload logging with the understanding that we will most likely (but not definitely) add logging in a not-so-distant OT API. I can live with that if there is strong support for the more conservative approach.

Well said. It's a pragmatic and incremental step that doesn't step on anyone's toes, while still keeping it possible for all platforms to do what they need to. I'd be more than happy, and I'd expect everyone to be, to re-approach the issue with a clean slate once we've had a little more experience with the plumbing API.

@wladekb

wladekb commented Feb 24, 2016

In PHP there is a popular interface for loggers:
https://github.com/php-fig/log/blob/master/Psr/Log/LoggerInterface.php
and I'd say this is currently the idiomatic logging interface.

Should the PHP implementation of the OpenTracing API:

  • not implement that interface (and thus not be idiomatic)
  • forward all calls to error, critical, alert, emergency to error and calls to warning, notice, info, debug to info
  • something else?
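The second option is straightforward to sketch. Using Python's standard logging module as a stand-in for PSR-3 (the span's info()/error() methods are the assumption here, per the proposals above):

import logging


class SpanLogHandler(logging.Handler):
    """Hypothetical adapter: collapse the many severities onto two,
    as the second option above suggests."""

    def __init__(self, span):
        super().__init__()
        self.span = span

    def emit(self, record):
        message = self.format(record)
        if record.levelno >= logging.ERROR:   # error, critical, alert, emergency
            self.span.error(message)
        else:                                 # warning, notice, info, debug
            self.span.info(message)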

@bhs
Contributor

bhs commented Feb 24, 2016

@wladekb, if you interpose on LoggerInterface are you guaranteed to know which Span instance to apply the logging calls to? I.e., if a single request spawns multiple child spans within the PHP process (which is an important case to handle), how does the LoggerInterface adapter identify the single span to annotate with those log messages?

Another approach would be to have the Span.info() and Span.error() calls (optionally?) defer to the LoggerInterface equivalents, but not to attempt the converse due to the attribution problem I'm getting at above...

@wladekb

wladekb commented Feb 25, 2016

@bensigelman Unlike in Go or Python, LoggerInterface is just an interface specification for logger implementations; it's not a single module. And a Span is a logger implementation to some extent, so it would be convenient for it to comply with the standard interface so you can forward logging directly to OT's Span instance without any extra method/argument mangling. My point was only about standard logging-interface compliance, not about the responsibility to detect an active Span, which will need to be addressed separately (most likely by frameworks that support OT, or some extra code).
Is that clearer now?

@bhs
Contributor

bhs commented Feb 25, 2016

@wladekb sorry, I don't think I was clear... I understand that LoggerInterface is just an interface... the problem is just that I wasn't sure which Span the PHP code would "forward" to (as you put it). It sounds like you are mainly concerned with having Span present this PHP-standard logging interface just for the sake of comprehensibility, is that right? (I.e., there would be no expectation that random PHP logging calls would magically end up on the appropriate active/unfinished Span instance)

@wladekb

wladekb commented Feb 25, 2016

Yes, I thought about interface compatibility only. Making sure the logging goes to the appropriate Span is a different story that will need to be solved separately.

@bhs
Contributor

bhs commented Feb 25, 2016

@wladekb gotcha. Yeah, it's tricky... if we take the union of all logging libraries it will be nutso, but it's also nice to "blend into the surroundings" of whatever platform OT is integrating with (in this case PHP). I'd want to sleep on this but am curious to know where other people land on the topic.

@bhs
Contributor

bhs commented Feb 27, 2016

Proposal: we add info() and error() logging using platform-idiomatic conventions as a starting place. Remaining log levels are somewhat in limbo right now since we haven't resolved @wladekb's concerns about PHP, but IIUC there are no objections to info() and error() per se.

I'd suggest we move forward with the above if there aren't objections within ~72 hours.

@michaelsembwever
Contributor

On the 22nd of January (see comments above) it was agreed that, for the sake of incremental development and the safety and simplicity of the initial API delivered, info/error would be left for another day.

Now, without any reasons stated, there's a proposal to return to info/error.

The majority of community effort in OpenTracing is a dog chasing its own tail. Look through the majority of issues and there is this cycling back and forth between one decision and another. When a decision is made, people agree, and then later on an old argument is repeated verbatim, or, worse yet, as here, no reason is given, and the decision flips in the other direction. The amount of time spent discussing the most trivial of issues in OpenTracing is a direct result of this lack of clear and documented decision making.

Please stop this.
Close this issue per the agreement made on the 22nd January, and open a new issue post 1.0 where new facts and insights can be brought to the table.

@bhs
Contributor

bhs commented Feb 27, 2016

@michaelsembwever I am baffled by your tone... IIRC both you and @adriancole were uneasy about this feature and wanted to back it out in Jan. I wrote the following in response.

Okay, maybe that's where we should start. This topic seems to raise hackles and – more than other stuff like the basic propagation logic – I get the impression that logging can be safely added later.

The reason I reopened discussion on this is that the community had come to consensus about the basic approach to propagation, which, per the above, was why I thought it best to defer discussion about this (less complex) piece of the OT API. In other words, the "later" in "safely added later" is now.

In my understanding of well-known tracing systems, intra-span debug logging is the one feature that's still missing from OT. That's why I want to include it, especially since it's structurally simple.

The most upsetting aspect of your message above is the implication that I'm going against the wishes of the community... On that note, I'd be curious to get a quick straw poll from those who've been contributing lately or registered an opinion above... @tschottdorf @yurishkuro @bg451 @adriancole @bcronin @dkuebric @slimsag: any prefs about this? If I misread the situation and there's a consensus that debug logging is either unimportant or undesirable, by all means I will close this issue. My previous understanding was that it's both important and desirable but not worth wasting time on until the juicier topics were resolved.

@tbg

tbg commented Feb 27, 2016

I'm probably going to stay away from anything that's not vanilla Log-style logging and do all the special casing I need through a payload; maybe that's what people ought to be doing, at least for v1 - logging just has no common denominator. I could be interested in marking a trace as erroneous as a whole (as opposed to individual log messages), but that can be done through a custom sampling priority already. My gut feeling is to stay with the simple API we have now, but I don't have too strong an opinion about it.

@bhs
Contributor

bhs commented Feb 27, 2016

@tschottdorf FYI, the logging API at head is not vanilla Log-style logging... that's what I'm proposing we add, basically. Per http://opentracing.io/spec/:

Every Span has zero or more Logs, each of which being a timestamped event name, optionally accompanied by a structured data payload of arbitrary size. The event name should be the stable identifier for some notable moment in the lifetime of a Span. For instance, a Span representing a browser page load might add an event for each of the Performance.timing moments. While it is not a formal requirement, specific event names should apply to many Span instances: tracing systems can use these event names (and timestamps) to analyze Spans in the aggregate.
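Under that model, the browser page-load example might look like the following sketch; the log_event name mirrors logEvent in the Go PR, and the method name and timing values here are assumptions:

class Span:  # minimal stub for the sketch
    def log_event(self, event, payload=None):
        print("event:", event, payload)


# Stand-ins for a few of the browser's Performance.timing keys:
performance_timing = {
    "navigationStart": 1456500000000000,
    "domInteractive": 1456500000150000,
    "loadEventEnd": 1456500000400000,
}

span = Span()
for moment, micros in performance_timing.items():
    # Stable event names shared across many spans let tracing systems
    # analyze them in the aggregate, per the spec text above.
    span.log_event(moment, payload={"timestamp_micros": micros})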

@michaelsembwever
Contributor

The most upsetting aspect of your message above is the implication that I'm going against the wishes of the community

No. I'm implying that too many issues and conversations within the community don't land, and then re-launch, without clearly stated reasons.

The lack of compartmentalization, and of clear counter-arguments to previously agreed-upon decisions when looking to justify undoing them, makes it increasingly difficult to understand how the community actually got to where it is.

If you had closed this issue, then re-opened a new issue with a description of the past decision and why it should be, or is now time to be, undone, I wouldn't have reacted.

Even if the comment you made two days ago had better referenced a shift in consensus in the community and/or the rationale relative to past discussions, I wouldn't have reacted.

And if this weren't a symptom I see generally repeating in OpenTracing, I wouldn't have reacted.

@yurishkuro
Member Author

@michaelsembwever I don't think I feel the same way about the progress of discussions in OpenTracing as you do. The topic is complex, and often different use cases overlap, which causes discussions to go wider than we would like in a single issue. We've tried our best to capture those occasions and fork the related topics into their own issues, as evidenced by this issue here, which was a fork. I think there's been very good progress, evidenced by RC1 being available for people to implement against.

The issue of logging methods has been on hold since we decided to go with a simpler initial API for RC1. I don't see anything wrong with resuming that discussion at this point.

@bensigelman The way I would approach the topic of logging methods is the same as I was insisting on with all other API methods - we start with the use cases.

  1. Describe a clear use case.
  2. Agree that this use case is common enough that the API should have an opinion on solving it.
  3. Demonstrate how the existing API cannot solve this common use case, or can only solve it in a very ugly manner.
  4. Propose an API extension.
  5. Document the use case in the Specs.

I want to emphasize point (2). I strongly believe the API Spec should be prescribing how end users are expected to solve specific use cases. Responses like "do it with a tag" or "special case the payload" may be perfectly acceptable forms of prescribing, but they have to be concrete, so that all end users do it in a similar manner, and all tracing systems know exactly what the end user's intention is (whether the tracers choose to implement it or not is a separate matter).

I think following the above procedure would also address some of @michaelsembwever 's concerns of reopening discussions etc. Documenting use cases in the spec captures the decisions made and the reasons why certain API features exist.

Lastly, I do realize that sometimes things are more complicated than that. For example, one currently unsolved use case is indicating that an error occurred during the span execution. This could be solved in isolation by using a common error tag, but it may or may not be the best solution if a more holistic logging API is introduced. I think that is inevitable in the evolution of the API, as we gain more experience and start to see generalization of usage patterns. Documenting the use cases is especially important in those evolutionary refactorings.

NB: all this rant above probably should go into its own issue, for a new section of the docs on the "process". Guilty as charged.

@codefromthecrypt
Contributor

codefromthecrypt commented Feb 29, 2016 via email

@michaelsembwever
Contributor

For an example of building the right community see Apache's practices on lazy consensus and vetos in https://www.apache.org/foundation/voting.html

This approach is important because, at different times, different users (and possibly the companies behind them) can easily stack a "vote" in their favour. The idea with lazy consensus and vetoes is to move the community away from quick votes and toward talking things through thoroughly. And things are discussed better when long discussions can be broken up into concrete steps with intermediate summaries. This minimises what's involved for newcomers to join, and better prevents the discussion from chasing its tail.

@yurishkuro
Member Author

I was recently working on upgrading some instrumentation libraries and again ran into the need for some sort of an error() method or alternative. To follow my own suggestion above, here is the clear use case:

  • a span is started around a certain instrumented function,
  • when the function throws an exception, I want the span finish()-ed and also record the exception,
  • I want to do it in a portable way that will be understood by various OT implementations,
  • existing production code example.

A similar, but slightly different use case:

  • an outbound HTTP request returns a not-OK status code,
  • the instrumentation code for the HTTP framework can see the not-OK status, but may not be able to get any other details, like the error message, since that may require parsing the response,
  • I want to at least be able to mark the span with some standard "error occurred" flag, to cause some special handling in the recording pipeline.

If we can resolve these scenarios without an error() method, but with some pre-defined constants for tag and/or event names, I'd be ok to use that.

From earlier discussions, I think the following can be introduced:

  1. Standard boolean tag error to mark the spans as failed
  2. Standard event name exception to use with log(), e.g. log(event=tags.Exception, payload=ex)
  3. Perhaps guidance to tracer implementations that if instrumentation sets a not-OK http.status_code tag, or logs an exception event, the tracer should automatically add the tag error=true. Alternatively, it can be guidance for instrumentation.
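Put together, the two use cases might be instrumented like this sketch; the constant values follow the proposal above but are not yet standardized, and the span methods use the names already in circulation in this thread:

ERROR_TAG = "error"            # proposed standard boolean tag (point 1)
EXCEPTION_EVENT = "exception"  # proposed standard event name (point 2)


def call_traced(span, fn, *args):
    # Use case 1: finish the span and record the exception portably.
    try:
        return fn(*args)
    except Exception as e:
        span.set_tag(ERROR_TAG, True)
        span.log(event=EXCEPTION_EVENT, payload=e)
        raise
    finally:
        span.finish()


def record_http_status(span, status_code):
    # Use case 2: mark the span as failed on a not-OK status, even when
    # no further detail is available without parsing the response.
    span.set_tag("http.status_code", status_code)
    if status_code >= 400:
        span.set_tag(ERROR_TAG, True)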

@michaelsembwever @bensigelman @dkuebric

@bhs
Contributor

bhs commented May 24, 2016

@yurishkuro

1: The error span tag is already in and still SGTM, of course!

2: The tags.Exception idea seems valuable. I would further propose that the payload have separate fields for the exception message and the stack trace (as a sequence/list). So, log(event=tags.Exception, payload={message: "...", stack_trace: ["...", "...", ...]}).

3: I am ok with the guidance for instrumentation but I'd leave it at that. I don't think it's OT's business to automatically set Span errors.

Deviation from error-logging per se:

I was not and am not thrilled about logEvent as the only game in town... there are plenty of things worth logging to traces, and they're not all "events" in the cr / cs / etc sense. Many (most?) are just, well, logging statements that aid in contextualizing and/or debugging traces. That said, I think semantic event logging also has its place and deserves a formal place in the API.

Since the original spirit of this issue (#30) related to the above, I would like to make a concrete proposal: namely, that we create a sibling to Event (call it Message, perhaps?) underneath LogData and make it clear that Event is intended for strings like "cr", "cs", "exception", etc, and Message is intended for strings like "Foo(): just finished the 17th iteration of the select loop", etc. I would also support a logMessage(format, args...) signature that follows format string conventions on a per-platform basis. I would not create separate info and error flavors of logMessage, though... Span error tags (and things like the exception proposal above) are enough for me.
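A sketch of that Event/Message split; the field layout and helper are illustrative, not a finalized LogData definition:

import time


class LogData:
    """Illustrative only: a free-form 'message' sibling next to 'event'."""

    def __init__(self, timestamp_micros, event=None, message=None, payload=None):
        self.timestamp_micros = timestamp_micros
        self.event = event      # stable identifier: "cr", "cs", "exception", ...
        self.message = message  # free-form, e.g. "finished the 17th iteration"
        self.payload = payload


def log_message(records, fmt, *args):
    # logMessage(format, args...), following Python's %-style conventions.
    records.append(LogData(int(time.time() * 1e6), message=fmt % args))


records = []
log_message(records, "Foo(): just finished the %dth iteration of the select loop", 17)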

@bhs bhs modified the milestone: 1.0GA Jul 8, 2016
@bhs
Contributor

bhs commented Jul 8, 2016

Strawman: we officially don't-support info- and error-style logging and stick with key-value structured logging until a clear need arises (or doesn't) for something else. (Per #96)

If there are no objections, I'll close this in several days.

@bhs bhs added this to the 1.0GA milestone Jul 8, 2016
@yurishkuro
Member Author

From my POV, the KV logging will provide the underlying mechanism to support all of these types of logs.

As for closing this or not, it depends on whether we want to do the KV logging API change with a small set of standard log tags, or do the tags separately. In the latter case this issue is a good placeholder.

@atanasbalevski

Hi,
just my 50c:
having a special, defined tag like Tags.STATUS with enum values like OK/ERROR/WARN will also help the UI to color the spans differently.
Implementing the log-level names is a bit semantically incorrect, because I think all spans are info :) just some happen to contain the info for an error.

best
atanas

@bhs
Contributor

bhs commented Nov 16, 2016

... continued at opentracing/specification#7.

@bhs bhs closed this as completed Nov 16, 2016