Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify logs vs events vocabulary and usage #2863

Conversation

tigrannajaryan
Copy link
Member

The spec is currently not very clear about what is a log and what is an event.
Here is a for example a discussion that shows that there are different ways
to interpret what is written in the spec today: open-telemetry/opentelemetry-collector-contrib#14474

This PR attempts to clarify the spec by defining the vocabulary more strictly
and by adding a relevant FAQ.

specification/logs/README.md Outdated Show resolved Hide resolved
specification/logs/README.md Outdated Show resolved Hide resolved
specification/logs/README.md Show resolved Hide resolved
Within a particular `event.domain`, the `event.name` uniquely defines a particular class
or type of OpenTelemetry Event. OpenTelemetry Events with the same `event.domain` /
`event.name` follow the same schema which assists in analysis in observability platforms.
See also OpenTelemetry Event [semantic conventions](./semantic_conventions/events.md).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will discuss this more in the SIG call tomorrow, but I feel we should just say Events are LogRecords with these two attributes. We happened to keep the event.domain attribute in InstrumentationScope for the purpose of API ergonomics - to be able to create multiple Events with a given domain. I also assumed that the InstrumentationScope attributes automatically propagate to the LogRecords inside it, that's why it made sense to me to keep the attribute at the scope level. It turns out this assumption is wrong. However, the backends will move the event.domain attribute into the LogRecord during storage and hence it makes sense to say Events are LogRecords with these two attributes.

Can you also take a look at the changes to ./semantic_conventions/events.md in my PR #2848?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, the backends will move the event.domain attribute into the LogRecord during storage

I think you are making an assumption that is not necessarily true for all backends, maybe it is true for some.


Not necessarily. Just because the data in an external data source is called an "event" it
does not mean it automatically qualifies as an OpenTelemetry Event and must be shaped like
an OpenTelemetry Event.
Copy link
Contributor

@scheler scheler Oct 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be very helpful to include guidance on when to produce Events vs Logs, and that I think is based on the expectations of the backend/receiver. For example, at AppDynamics, Events and Logs have some differences in their processing pipelines in the backend. Events are relatively limited in number and are produced by sources creating events of a certain importance, whereas Logs are far large in volume with good amount of noise in them. As an example, Kubernetes Events are (or, can be) Events and Kubernetes Pod logs, by default, are Logs.

What should Go Structured Logs map to in OpenTelemetry? IMO, they should remain as Logs as they are used produced by applications for the general purpose of troubleshooting and are likely to contain a good amount of noise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Events are relatively limited in number and are produced by sources creating events of a certain importance, whereas Logs are far large in volume with good amount of noise in them.

I don't think that this distinction makes a difference from the data model or API perspective. The same could be said about different log severity levels. The important point, AFAICT, is that it needs to be possible for a processor of log records to identify records that satisfy some criteria to direct them for separate processing. That is separate and distinct from how we refer to the concepts of "opaque bucket of bits" log records and "structured and schema-validatable" log records.

to OpenTelemetry LogRecords (for example in OpenTelemetry Collector) it is reasonable
to shape them like OpenTelemetry Events. In the given example it is reasonable to map
the `name` field from the data source to `event.name` and the `category` field to
`event.domain`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is very unlikely that category is same as domain. The data source is a better fit for domain, because this data source is likely emitting events with different values for category and that's why they have category as a field/attribute in their Events.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are the two things we brought up on the SIG making their way into this document.

  • In .NET we have the EventSource class.

    EventSource emits strongly-typed events. Each event emitted is uniquely identified by the EventSource.Name and the individual event's Event.Id / Event.Name. Events also have a message template and may have attributes. All events emitted through an EventSource will the same exact message template + attribute structure.

    So given the definition/description above, I feel like these EventSource events would be appropriate to shim into OpenTelemetry as Events? With EventSource.Name = event.domain & Event.Name = event.name.

  • In .NET we have the ILogger interface.

    ILogger is a framework for emitting structured logs. An ILogger is bound to a categoryName. Typically some type name (ex: "MyCompanyName.MyLoggingClass"). ILogger has a whole bunch of different overloads which emit log messages.

    You can log without any type of strong-name: myLogger.LogInformation("Something happend for {userId}", userId). That will emit a structured log through the defined category.

    You can also log with a strong-name: myLogger.LogInformation(new EventId(id: 100, name: "AddUserFailure), "Something happend for {userId}", userId). That will emit a structured log through the defined category with the strongly-typed name information.

    What to do in this case is less clear. We could shim the second call with EventId supplied into OpenTelemtry Events with categoryName = event.domain and EventId.Name = event.name. It is assumed but not enforced by the API that all logs sharing a categoryName + EventId will share structure. I think if some backend relied on event.domain + event.name for structure detection, and I used ILogger with EventId to strongly type my logs, I would expect reasonably that the backend would be able to detect my structures?

/cc @alanwest

The spec is currently not very clear about what is a log and what is an event.
Here is a for example a discussion that shows that there are different ways
to interpret what is written in the spec today: open-telemetry/opentelemetry-collector-contrib#14474

This PR attempts to clarify the spec by defining the vocabulary more strictly
and by adding a relevant FAQ.
@@ -208,15 +210,106 @@ Wikipedia’s [definition of log file](https://en.wikipedia.org/wiki/Log_file):
>In computing, a log file is a file that records either events that occur in an
>operating system or other software runs.

From OpenTelemetry's perspective LogRecords and Events are both represented
using the same [data model](./data-model.md).
From OpenTelemetry's perspective logs and events conceptually are not different. Both
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR seems to be focusing on specification/logs folder only. While we don't want to boil the ocean, I've noticed a continuous source of confusion regarding "span events" vs. "log events". I feel that if we don't clarify span events vs log events, we'll continue to run in circles. I want to ask some stepping back questions:

  1. Do we even want to use the term event in both places, or we want to eventually get rid of one?
  2. Why do we want to have the term "event" here? Would something like "named log record" be good enough / better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@svrnm I wish to get your opinion here from the documentation perspective.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imho, s/span events/span logs/ (as it was in OpenTracing) would address some of the confusion. Framed as span logs, these entries can still be sub-categorized as logs vs. events just like regular log records. It does create a new confusion between span logs and regular logs, which is fundamentally unavoidable as we're forcing the user to make this distinction instead of handling it behind the scenes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method to add a span event/log is called AddEvents (based on this across languages, so if we start calling it "Span Log" we run into an issue here:

  • Removing addEvent is probably not possible, because it is a breaking change, so adding addLog as an alias would be an option, but that increases confusion
  • Calling it "Span Log" and then having a method called "addEvent" is really confusing.

My opinion: From a documentation perspective the key point is being conscious about that potential confusion and give end-user guidance here by being transparent and sharing the thought process that went into things, even if it boils down to "today we would call it Span Log, but we had called it Span Event in the past and we don't want breaking changes, so we have to live with that"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we made a mistake when we renamed InstrumentationLibrary to InstrumentationScope in stable API surface and data models and I'd want to avoid making that same mistake again, so I would be strongly opposed to renaming span.AddEvent() or the Span Event concept and OTLP message.

I have concern that using the term Event as a term of art for log records that have specific properties is going to lead to (even more) significant confusion in the long run. I'm not sure I have a good proposal for an alternative at this time, but something that denotes that there is additional information available to help interpret the attributes on the log record would be good. SchematicLogRecord is a mouthful, but it should effectively convey that there is a schema available that describes how to validate and interpret the semantics of the record. Open to alternatives.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think this vocabulary update should include something about span events and feel having Log Event, Log Record, Span Event will lead to confusion.

The best alternative I could come up with for Log Event is Report, but I also like SemanticLogRecord, it is clearer what it actually is with no definition required.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@tigrannajaryan tigrannajaryan force-pushed the feature/tigran/clarify-events-vs-logs branch from 1c8fba9 to 1ccee58 Compare October 12, 2022 15:39

**Who produces OpenTelemetry Events?**

OpenTelemetry Events are produced using OpenTelemetry Events API.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to limit the source of the events to OpenTelemetry Events API only? Can't we have OTel collector to fetch events from another event source?

@@ -124,10 +126,10 @@ languages have established standards for using particular logging libraries. For
example in Java world there are several highly popular and widely used logging
libraries, such as Log4j or Logback.

OpenTelemetry defines [events](#events-and-logs) as a type of LogRecord with
OpenTelemetry defines [Events](#events-and-logs) as a type of LogRecord with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to elaborate a bit on my thoughts in the SIG meeting today...

What we framed things this way:

"OpenTelemetry provides a generic way to define additional signal types. These signals are modeled as a type of LogRecord with specific characteristics...."

Then instead of event.domain and event.name we use signal.type and signal.name.

Examples include

signal.type signal.name
browser click well-known type (i.e. the semantics are defined by the OTel spec)
k8s pod_start well-known type (i.e. the semantics are defined by the OTel spec)
my_custom_type some_name not well-known (i.e, the end user defines the semantics)
metric http.server.duration this one is just to illustrate how in an alternate universe we might have modeled metrics using one data model to rule them all

Additionally, thinking of things in this way alleviates any need to discuss span events in the scope of this conversation. To me span events are still just simply something that happened during the course of a span at a particular point in time. That is, a span event is not some notion of a nested signal type.

tigrannajaryan added a commit to tigrannajaryan/opentelemetry-specification that referenced this pull request Oct 13, 2022
This is an alternate to open-telemetry#2863

## Problem

"Event" is a confusing term that is understood differently by different people in different contexts. We would like to avoid using it if possible.

## Proposal

The OpenTelemetry Event is defined as a LogRecord that has specific attributes, namely event.name and event.domain. The event.domain here is one of the important elements. It places the event definitions into isolated buckets (domains).
We could use the term "Domenized Logs", but it sounds a bit weird, so I want to suggest renaming the concept of "domain" to "category", without changing the semantics.
The specially shaped Log Records can be called "Categorized Logs" and have attributes "log.category" (previously known as "event.domain") and "log.name" (previously known as "event.name").
We will refrain from using the term "event" as much as possible to avoid confusion.
I am open to other name suggestions for "Categorized Logs". We just want to make sure to avoid the word "events" and avoid inventing completely new terms, so some sort of adjective + "logs" seems to be the best approach.

## What Changes?

- "Event" is renamed to "Categorized LogRecord"
- "event.name" is renamed to "log.name"
- "event.domain" is renamed to "log.category"
- "log.category" is an attribute of LogRecord (instead of previously "event.domain" being a Scope attribute)

## What Did We Lose?

"event.domain" previously could be recorded as a Scope attribute and be used for efficient batch processing/routing of logrecords. This is no longer possible, but we can add another such attribute in the future that serves the same purpose (e.g. see the [proposal to add "signal.type"](https://github.com/open-telemetry/opentelemetry-specification/pull/2863/files#r993794504)).
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-specification that referenced this pull request Oct 13, 2022
This is an alternate to open-telemetry#2863

## Problem

"Event" is a confusing term that is understood differently by different people in different contexts. We would like to avoid using it if possible.

## Proposal

The OpenTelemetry Event is defined as a LogRecord that has specific attributes, namely event.name and event.domain. The event.domain here is one of the important elements. It places the event definitions into isolated buckets (domains).

We could use the term "Domenized Logs", but it sounds a bit weird, so I want to suggest renaming the concept of "domain" to "category", without changing the semantics.

The specially shaped Log Records can be called "Categorized Logs" and have attributes "log.category" (previously known as "event.domain") and "log.name" (previously known as "event.name").

We will refrain from using the term "event" as much as possible to avoid confusion.

I am open to other name suggestions for "Categorized Logs". We just want to make sure to avoid the word "events" and avoid inventing completely new terms, so some sort of adjective + "logs" seems to be the best approach.

## What Changes?

- "Event" is renamed to "Categorized LogRecord"
- "event.name" is renamed to "log.name"
- "event.domain" is renamed to "log.category"
- "log.category" is an attribute of LogRecord (instead of previously "event.domain" being a Scope attribute)

## What Did We Lose?

"event.domain" previously could be recorded as a Scope attribute and be used for efficient batch processing/routing of logrecords. This is no longer possible, but we can add another such attribute in the future that serves the same purpose (e.g. see the [proposal to add "signal.type"](https://github.com/open-telemetry/opentelemetry-specification/pull/2863/files#r993794504)).
@tigrannajaryan
Copy link
Member Author

All, after the SIG discussion yesterday I came up with an alternate proposal, please see here #2876

I am going to convert this one to draft as well for now.

@tigrannajaryan tigrannajaryan marked this pull request as draft October 13, 2022 16:48
@tigrannajaryan
Copy link
Member Author

All, I made an alternate proposal that attempts to address the problems we experience with events: #2897
Please take a look.

@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Oct 28, 2022
@github-actions
Copy link

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Nov 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet