Rename Events to Categorized Logs #2876

tigrannajaryan
Member

@tigrannajaryan tigrannajaryan commented Oct 13, 2022

This is an alternate to #2863

Problem

"Event" is a confusing term that is understood differently by different people in different contexts. We would like to avoid using it if possible.

Proposal

The OpenTelemetry Event is defined as a LogRecord that has specific attributes, namely event.name and event.domain. The event.domain here is one of the important elements. It places the event definitions into isolated buckets (domains).

We could use the term "Domainized Logs", but it sounds a bit weird, so I want to suggest renaming the concept of "domain" to "category", without changing the semantics.

The specially shaped Log Records can be called "Categorized Logs" and have attributes "log.category" (previously known as "event.domain") and "log.name" (previously known as "event.name").

We will refrain from using the term "event" as much as possible to avoid confusion.

I am open to other name suggestions for "Categorized Logs". We just want to make sure to avoid the word "events" and avoid inventing completely new terms, so some sort of adjective + "logs" seems to be the best approach.

Some alternates that were suggested earlier: "Schematic LogRecord", "Semantic LogRecord" (also colloquially I think it is safe to shorten LogRecord to Log when used with an adjective). The advantage of "Categorized Log" is that we use the same term in the attribute name "log.category", whereas it is unclear what attribute name to use for "Schematic" or "Semantic" logs.

What Changes?

  • "Event" is renamed to "Categorized LogRecord"
  • "event.name" is renamed to "log.name"
  • "event.domain" is renamed to "log.category"
  • "log.category" is an attribute of LogRecord (instead of previously "event.domain" being a Scope attribute)
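
The renaming in the list above can be sketched as a plain key mapping. This is only an illustration, not part of any OpenTelemetry API: the class `CategorizedLogRename`, its `rename` helper, and the sample values ("mobile", "click", "screen") are hypothetical, and attributes are modeled as a simple `Map` rather than a real LogRecord.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of any OpenTelemetry API: it only illustrates
// the attribute renaming proposed in the list above.
final class CategorizedLogRename {
    // Old attribute key -> proposed new key.
    static final Map<String, String> RENAMES = Map.of(
            "event.name", "log.name",
            "event.domain", "log.category");

    // Returns a copy of the attributes with the old keys renamed.
    // Values and all other keys are left untouched (semantics do not change).
    static Map<String, Object> rename(Map<String, Object> attributes) {
        Map<String, Object> renamed = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : attributes.entrySet()) {
            String key = RENAMES.getOrDefault(entry.getKey(), entry.getKey());
            renamed.put(key, entry.getValue());
        }
        return renamed;
    }

    public static void main(String[] args) {
        // Sample values are made up for illustration.
        Map<String, Object> before = new LinkedHashMap<>();
        before.put("event.domain", "mobile");
        before.put("event.name", "click");
        before.put("screen", "home");
        System.out.println(rename(before));
    }
}
```

Note that the sketch covers only the key names. The fourth bullet (moving the attribute from the Scope to the LogRecord) is a change of location, which a key mapping alone does not capture.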

What Did We Lose?

"event.domain" previously could be recorded as a Scope attribute and be used for efficient batch processing/routing of LogRecords. This is no longer possible, but we can add another such attribute in the future that serves the same purpose (e.g. see the [proposal to add "signal.type"](https://github.com/open-telemetry/opentelemetry-specification/pull/2863/files#r993794504)).
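
To make the trade-off concrete, here is a rough sketch of the two routing styles. Everything in it is an assumption for illustration: the class `RoutingSketch`, the `"default"` destination, and the sample category values are hypothetical, and scopes and records are modeled as plain maps rather than real OpenTelemetry types.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not an OpenTelemetry API.
final class RoutingSketch {
    // Scope-level routing: a single attribute lookup decides the destination
    // for every record that shares the scope, so the batch stays intact.
    static String routeByScope(Map<String, String> scopeAttributes) {
        return scopeAttributes.getOrDefault("event.domain", "default");
    }

    // Record-level routing: each record has to be inspected, and one batch
    // may be split across several destinations.
    static Map<String, List<Map<String, String>>> routeByRecord(
            List<Map<String, String>> records) {
        Map<String, List<Map<String, String>>> byCategory = new LinkedHashMap<>();
        for (Map<String, String> record : records) {
            String category = record.getOrDefault("log.category", "default");
            byCategory.computeIfAbsent(category, k -> new ArrayList<>()).add(record);
        }
        return byCategory;
    }

    public static void main(String[] args) {
        // One lookup for the whole batch.
        System.out.println(routeByScope(Map.of("event.domain", "mobile")));

        // One lookup per record; sample values are made up.
        List<Map<String, String>> records = List.of(
                Map.of("log.category", "mobile", "log.name", "click"),
                Map.of("log.category", "browser", "log.name", "load"));
        System.out.println(routeByRecord(records).keySet());
    }
}
```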

@scheler
Contributor

scheler commented Oct 13, 2022

I strongly prefer keeping the term Event. In the Log SIG call yesterday, it was called out that the term Event was overloaded since there are Span Events and Metric Events. However, each one is valid in using the term and we only need to clarify the meaning of each in the Glossary. Event is a widely used and well understood term and renaming it to something else is only more confusing.

Span Events are Events attached to Spans. Metric Events are data points from which Measurements are extracted. The events in question can be called Standalone Events, since they are no different from these two other types of events except that the context is different. (I thought the Glossary already had the term Standalone Event, but it isn't there.)

@tigrannajaryan
Member Author

Event is a widely used and well understood term and renaming it to something else is only more confusing.

@scheler I am not sure it is well understood. Can you point to any definition of what an "event" is that everybody understands the same way?

@atoulme
Contributor

atoulme commented Oct 17, 2022

+1 to keep events. If you stick to logs, you will miss a universe of data - weather data, CSV entries, JSON docs come to mind.
See this definition of event: https://docs.splunk.com/Splexicon:Event

@tigrannajaryan
Member Author

+1 to keep events. If you stick to logs, you will miss a universe of data - weather data, CSV entries, JSON docs come to mind. See this definition of event: https://docs.splunk.com/Splexicon:Event

Each event is given a [timestamp](https://docs.splunk.com/Splexicon:Timestamp), [host](https://docs.splunk.com/Splexicon:Host), [source](https://docs.splunk.com/Splexicon:Source), and [source type](https://docs.splunk.com/Splexicon:Sourcetype).

That's Splunk events. That's not how Otel understands events (as defined currently). The only common thing here is the Timestamp. Otel events don't need to have a Host, Source or Sourcetype.

If you stick to logs, you will miss a universe of data

This PR suggests a new OpenTelemetry-authored name for the same concept, it doesn't suggest to stop supporting events as they are understood now. Nothing changes functionally. The name change is merely to avoid confusion that stems from a disagreement on what an "event" means.

@atoulme
Contributor

atoulme commented Oct 17, 2022

This PR suggests a new OpenTelemetry-authored name for the same concept, it doesn't suggest to stop supporting events as they are understood now. Nothing changes functionally. The name change is merely to avoid confusion that stems from a disagreement on what an "event" means.

In the past, this has created tensions downstream. For example, the Java logs effort had discussions about the scope of log support, and the maintainers argued for only supporting logging backends.

@tsloughter
Member

+1 from me but I'm not really a fan of the new name. I'd prefer Semantic Log. Semantic is just a term already used throughout Otel and I think that helps, but maybe it hurts in a way I'm not predicting?

@jack-berg
Member

The advantage of "Categorized Log" is that we use the same term in the attribute name "log.category", whereas it is unclear what attribute name to use for "Schematic" or "Semantic" logs.

If we were to go with schematic log, we could have schema.domain, and schema.name, where schema.domain + schema.name identify a particular schema.

We could also do a hybrid approach, where we say that OpenTelemetry Events are LogRecords with a schema defined by schema.domain and schema.name. The advantage of this would be that we still get to use the term "Event", but reduce confusion mapping from existing log / event data models since the usage of the term schema is less likely to appear in other data models than event or name. Also, I think schema.domain and schema.name will generally be more intuitive to users than event.domain and event.name.

The api mechanics are then something like:

// Obtain a logger which emits events in the "mobile" schema.domain
var eventLogger = loggerProvider.loggerBuilder("foo.bar.baz").setSchemaDomain("mobile").build();
// Emit an event with a schema.name of "click"
eventLogger.eventBuilder("click").setAttributes(...).emit();

@jkwatson
Contributor

I strongly disagree with this idea. People understand what "events" are. This new language will ensure no one will understand what they are, and therefore make it something people are afraid to use.

@tigrannajaryan
Member Author

I strongly disagree with this idea. People understand what "events" are. This new language will ensure no one will understand what they are, and therefore make it something people are afraid to use.

@jkwatson I maintain that there is no single understanding of what "events" are. I have heard countless different definitions from different people. I have yet to hear a definition that everyone (most people?) agree with. Can you link to one? I would be glad to be proven wrong and we can use that definition in Otel.

@reyang
Member

reyang commented Oct 18, 2022

I strongly disagree with this idea. People understand what "events" are. This new language will ensure no one will understand what they are, and therefore make it something people are afraid to use.

@jkwatson I maintain that there is no single understanding of what "events" are. I have heard countless different definitions from different people. I have yet to hear a definition that everyone (most people?) agree with. Can you link to one? I would be glad to be proven wrong and we can use that definition in Otel.

The one I could find from public domain (rather than a definition from a particular product/vendor/company) https://en.wikipedia.org/wiki/Event_(computing).

@scheler
Contributor

scheler commented Oct 18, 2022

"Event" is a confusing term that is understood differently by different people in different contexts. We would like to avoid using it if possible.

@tigrannajaryan can you please update the problem description and elaborate on what is confusing? The term is used in different contexts for sure but I want to understand how they are conflicting.

@scheler
Contributor

scheler commented Oct 18, 2022

This could also be a topic for otel-user-research folks to comment on. Today, many vendors have APIs to create custom events, so that will have to be renamed as well for consistency.

@Aneurysm9
Member

I strongly disagree with this idea. People understand what "events" are. This new language will ensure no one will understand what they are, and therefore make it something people are afraid to use.

@jkwatson I maintain that there is no single understanding of what "events" are. I have heard countless different definitions from different people. I have yet to hear a definition that everyone (most people?) agree with. Can you link to one? I would be glad to be proven wrong and we can use that definition in Otel.

The one I could find from public domain (rather than a definition from a particular product/vendor/company) https://en.wikipedia.org/wiki/Event_(computing).

I don't think that provides a useful definition here:

In programming and software design, an event is an action or occurrence recognized by software, often originating asynchronously from the external environment, that may be handled by the software.

By that definition starting a span is an event. Ending it is another event. So is attaching attributes. And observing an instrument value. And changing an instrument value. Everything is an Event, so defined. Which is all well and good and comports with my own personal definition of Event, but it doesn't give us language to use to differentiate LogRecords that have no discernible schema from those that do.

@jkwatson
Contributor

I firmly believe that if we define and explain "OpenTelemetry Event API" to people, it will not be confusing to 99% of our users. If we call something "OpenTelemetry Categorized Logs API", no one will understand what we are trying to do, even if we do explain it.

I know I lost the battle on having to support a logging API in OpenTelemetry. I will probably lose this one too, but I think that I need to raise my voice and ask us to listen to common sense and define things that are named simply. We should not create new confusing names for things that are very often called "events" and are not confusing to users of APIs that call them such.

I don't think the term "event" is confusing. Sure, many things are called events, but many things are also called "telemetry" and we haven't shied away from using that term!

@atoulme
Contributor

atoulme commented Oct 18, 2022

Let me try this again. Here is the definition of events in "Exploring Splunk", by David Carasso (pdf here, page 137):

An event is one line of data. Here is an event in a web activity log:
173.26.34.223 - - [01/Jul/2009:12:05:27 -0700] "GET /trade/app?action=logout HTTP/1.1" 200 2953
More specifically, an event is a set of values associated with a timestamp. While many events are short and only take up a line or two, others can be long, such as a whole text document, a config file, or whole Java stack trace. Splunk uses line-breaking rules to determine how it breaks these events up for display in the search results.

This book was written in 2012.

Could we settle on the definition that "an event is a set of values associated with a timestamp"?

@Aneurysm9
Member

Could we settle on the definition that "an event is a set of values associated with a timestamp"?

What we need is language to use to differentiate LogRecords that have no discernible schema from those that do. How does defining Event in the most general terms possible help with that?

I firmly believe that if we define and explain "OpenTelemetry Event API" to people, it will not be confusing to 99% of our users. If we call something "OpenTelemetry Categorized Logs API", no one will understand what we are trying to do, even if we do explain it.

I want to agree with this, but it seemed that even within the Log SIG we couldn't find agreement on what an Event was or wasn't.

I don't think the term "event" is confusing. Sure, many things are called events, but many things are also called "telemetry" and we haven't shied away from using that term!

As far as I can tell we do not define the term telemetry. Not in the spec overview where we define some signal types and not in the glossary. The problem here is that the Log SIG wants to use Event as a term of art. In order to do so we need to agree on its definition. That can be surprisingly hard for things that "everyone understands".

@atoulme
Contributor

atoulme commented Oct 19, 2022

What we need is language to use to differentiate LogRecords that have no discernible schema from those that do. How does defining Event in the most general terms possible help with that?

Is that what we're doing here? I assumed I was responding to the problem stated in the issue at the top:

"Event" is a confusing term that is understood differently by different people in different contexts. We would like to avoid using it if possible.

Sorry if I missed something.

@pyohannes
Contributor

I have yet to hear a definition that everyone (most people?) agree with. Can you link to one?

Not the one, but another one: https://github.com/cloudevents/spec/blob/main/cloudevents/spec.md#event

CloudEvents is a CNCF project, and the people involved have tried hard to define what an "event" is.

@Aneurysm9
Member

I have yet to hear a definition that everyone (most people?) agree with. Can you link to one?

Not the one, but another one: https://github.com/cloudevents/spec/blob/main/cloudevents/spec.md#event

CloudEvents is a CNCF project, and the people involved have tried hard to define what an "event" is.

I think that helps illustrate the point. They define Event as data about a thing that happened and its context. They further then define Event Data, or just Data in some places, as "Domain-specific information about the occurrence (i.e. the payload)" which seems in line with what we're trying to define here. That Data is then further defined by datacontenttype and dataschema fields that allow a receiver to determine how to process the Data. Even this spec that is focused entirely around how to represent "events" in the general sense found the need to define particular terms to express that there is data that can be interpreted according to a given schema.

@tigrannajaryan
Member Author

So far we have these definitions:

Wikipedia

an event is an action or occurrence recognized by software, often originating asynchronously from the external environment, that may be handled by the software. Computer events can be generated or triggered by the system, by the user, or in other ways. Typically, events are handled synchronously with the program flow; that is, the software may have one or more dedicated places where events are handled, frequently an event loop.
A source of events includes the user, who may interact with the software through the computer's peripherals - for example, by typing on the keyboard. Another source is a hardware device such as a timer. Software can also trigger its own set of events into the event loop, e.g. to communicate the completion of a task. Software that changes its behavior in response to events is said to be event-driven, often with the goal of being interactive.

Splunk book

an event is a set of values associated with a timestamp

CloudEvents

An "event" is a data record expressing an occurrence and its context. Events are routed from an event producer (the source) to interested event consumers. The routing can be performed based on information contained in the event, but an event will not identify a specific routing destination. Events will contain two types of information: the Event Data representing the Occurrence and Context metadata providing contextual information about the Occurrence. A single occurrence MAY result in more than one event.

I am going to be charitable and ignore the differences between the 3 and assume they are all the same.

The problem is that we are proposing a different definition at OpenTelemetry and still calling it an Event. The current best understanding of what an OpenTelemetry Event is reads as follows:

OpenTelemetry defines OpenTelemetry Events as LogRecords that are shaped in a special way:

  • They have a LogRecord attribute event.name (and possibly other LogRecord attributes).
  • They have an InstrumentationScope with a non-empty Name and with an InstrumentationScope attribute event.domain (and possibly other InstrumentationScope attributes).
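
Read literally, the two conditions above amount to a simple shape check. The sketch below is one possible reading, not an OpenTelemetry API: the class `EventShapeCheck` and its method are hypothetical, and the scope and record are reduced to a name plus plain attribute maps.

```java
import java.util.Map;

// Hypothetical sketch of the quoted definition, not an OpenTelemetry API.
final class EventShapeCheck {
    // Returns true when the record has the shape described above:
    //  - the LogRecord carries an event.name attribute, and
    //  - its InstrumentationScope has a non-empty Name and an
    //    event.domain attribute.
    // Any other attributes on either side are allowed and ignored.
    static boolean isOpenTelemetryEvent(
            String scopeName,
            Map<String, String> scopeAttributes,
            Map<String, String> recordAttributes) {
        boolean hasEventName = recordAttributes.containsKey("event.name");
        boolean hasScopeName = scopeName != null && !scopeName.isEmpty();
        boolean hasEventDomain = scopeAttributes.containsKey("event.domain");
        return hasEventName && hasScopeName && hasEventDomain;
    }

    public static void main(String[] args) {
        // Sample values are made up for illustration.
        System.out.println(isOpenTelemetryEvent(
                "foo.bar.baz",
                Map.of("event.domain", "mobile"),
                Map.of("event.name", "click")));
        // A LogRecord without event.name does not qualify.
        System.out.println(isOpenTelemetryEvent(
                "foo.bar.baz",
                Map.of("event.domain", "mobile"),
                Map.of("message", "plain log line")));
    }
}
```

Nothing in this check involves an occurrence, a source, or a payload, which is the gap between this definition and the three quoted earlier.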

Perhaps the majority's opinion is that we don't care and our definition is the right one. I don't know, to me this sounds like a source of future confusion and misunderstanding.

My previous PR tries to mitigate this by saying we have an OpenTelemetry Event (colloquially shortened to a capital Event) and it is different from other understandings of an "event". Is this sufficient to avoid the confusion? I am not sure, that's why I proposed this alternate PR.

@tedsuo
Contributor

tedsuo commented Oct 19, 2022

I think the term event is sufficiently clear. Users understand that the context for this term is "OpenTelemetry."

If we need to differentiate between the other types of events that OTel offers, such as span events and metric events, we could call these log events. But personally, I agree with @jkwatson that our users will not at all be confused by the term "events."

@tigrannajaryan
Member Author

Discussed this in the Log SIG yesterday and this seems like a dead end to me. Closing.

@tigrannajaryan tigrannajaryan deleted the feature/tigran/cetegorizedlogs branch October 20, 2022 14:05
@tigrannajaryan
Member Author

All, I made an alternate proposal that attempts to address the problems we experience with events: #2897
Please take a look.


10 participants