Proposal: OpenTelemetry Logs, Events and Domains #2897

Closed
tigrannajaryan opened this issue Oct 20, 2022 · 23 comments
Labels: spec:logs (Related to the specification/logs directory), triaged-rejected (The issue is triaged and rejected by the OTel community)

Comments

@tigrannajaryan
Member

tigrannajaryan commented Oct 20, 2022

We have had a lot of discussions around events and logs and can't come to an agreement on what exactly to do.

I think we need to step back and understand what problem we are solving.

The key moment for me was when @tedsuo mentioned that the desire for "events" originated from the need to have a "primary index", which is essentially a piece of data on which to base "groupby" or "filter" functions on the backend. For example, perhaps you want to see only Kubernetes events, so you want to be able to filter by this criterion. Or perhaps you want to see all browser events grouped by their type (e.g. "click" events grouped separately from "scroll" events).

This need drove the proposal to have the "event.domain" and "event.name". The tuple ("event.domain", "event.name") becomes that "primary index" that can be used for filtering, grouping and any other similar querying.

Now here is the thing: I don't think this is necessary. We have this exact problem in spans and we solve it differently.
For example you may have "http" spans or you may have "database" spans. How are these differentiated? We don't have an attribute with a value that tells you which domain the span belongs to. Instead we use the fact of presence of a particular attribute to know which domain the span belongs to. If the span has "http.method" attribute (or any of the required http attributes) then it is an "http" span.

So, if you want to filter all "http" spans, you choose the ones which have the "http.method" attribute present. If you want to see all "http" spans grouped by their method you use the value of "http.method" for grouping.

What prevents us from using the exact same approach for logs and events?

For example, all browser events can have an attribute "browser.event.type". For click events we will have "browser.event.type=click".

Another example is Kubernetes. For Kubernetes events there is no defined concept of "type" at all, but there is a concept of "reason" so you will have "k8s.event.reason" as the attribute, the presence of which indicates it is a Kubernetes event.
Note how the attribute names for different domains are different. This aligns well with how things work in the tracing world, for span attributes.
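
The presence-based approach described above can be sketched roughly as follows. This is only an illustration: it assumes log records are plain objects with an `attributes` map, and the helper names `hasAttr` and `groupByAttr` as well as the sample records are hypothetical, not part of any OpenTelemetry API.

```javascript
// Hypothetical sample records; attribute names follow the examples in the text.
const records = [
  { body: "user clicked", attributes: { "browser.event.type": "click" } },
  { body: "user scrolled", attributes: { "browser.event.type": "scroll" } },
  { body: "second click", attributes: { "browser.event.type": "click" } },
  { body: "pod restarted", attributes: { "k8s.event.reason": "BackOff" } },
  { body: "plain log line", attributes: {} },
];

// True if the record carries the attribute, whatever its value.
function hasAttr(record, key) {
  return Object.prototype.hasOwnProperty.call(record.attributes, key);
}

// Keep only records that carry `key`, then group them by the value of `key`.
function groupByAttr(recs, key) {
  const groups = new Map();
  for (const rec of recs) {
    if (!hasAttr(rec, key)) continue; // presence check acts as the domain filter
    const value = rec.attributes[key];
    if (!groups.has(value)) groups.set(value, []);
    groups.get(value).push(rec);
  }
  return groups;
}
```

With the sample above, `groupByAttr(records, "browser.event.type")` yields a "click" group with two records and a "scroll" group with one, while filtering on `hasAttr(r, "k8s.event.reason")` selects the single Kubernetes record.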

So, my question is: why do we even need the new concept of "event.domain" and "event.name"?

Proposal

I suggest deleting "event.domain" and "event.name" semantic conventions.

Experts of a particular area need to come up with their own set of semantic conventions in their own namespace. For example, browser events will be in the "browser.*" namespace. This is exactly what we do for spans and metrics.

I suggest deleting the "Events API". Also delete the event_domain parameter from "Get Logger" API. These are unnecessary.
Otel libraries that help instrumenting the browser can provide helper APIs for creating browser-specific events (DOM events). That specific API can require the "eventType" as parameter and record its value as "browser.event.type" attribute.

FAQ

Does this Solve the Original Problem?

Yes, it does. The "primary indexing" will be based on the presence of the attribute instead of the attribute value. This aligns well with what we do in the tracing world.

Are Backends Capable of Presence Check?

I believe at least some are. See, for example, the Elasticsearch Exists query or the Splunk if(isnull()) query.

Backends that aren't (are there any?) will need to implement it if they want to have the associated capabilities.

What Else Does This Help With?

The entire debate of what is an event and how we define it becomes almost unnecessary. If in a particular domain there is a concept of event then reflect that in the naming of the attributes (e.g. "browser.event.type"). That's all that is needed.
All we will assert is this: "Logs and events are represented as Otel's LogRecords concept".

What About Schema URL?

We discussed that perhaps Schema URL can be the indicator of the domain instead of an attribute value. This is no longer necessary since the domain is indicated by the presence of an attribute on the LogRecord.

What Do We Record on the Scope?

Nothing. No Scope attributes are necessary anymore.

How Do We Record Kubernetes Events?

Exactly as we do today: just use a set of "k8s.event.*" attributes.

How Do We Record Windows Events?

Use "windows.event.*" namespace and "windows.event.category" attribute for the presence check.

How Do I Build a UI That Can Groupby and Filter for Domains?

The UI needs to be aware of what domains it supports and what is the presence attribute for the domain. This is exactly the same situation as with spans.

Of course a generic UI that can groupby and filter by an arbitrary attribute will work just fine too.

What does Logs API look like?

TBD, if we agree on this proposal conceptually.

@tigrannajaryan tigrannajaryan added the spec:logs Related to the specification/logs directory label Oct 20, 2022
@tigrannajaryan tigrannajaryan added the needs discussion Need more information before all suitable labels can be applied label Oct 20, 2022
@tigrannajaryan
Member Author

cc @open-telemetry/specs-logs-approvers

@jack-berg
Member

Now here is the thing: I don't think this is necessary. We have this exact problem in spans and we solve it differently.
For example you may have "http" spans or you may have "database" spans. How are these differentiated? We don't have an attribute with a value that tells you which domain the span belongs to. Instead we use the fact of presence of a particular attribute to know which domain the span belongs to. If the span has "http.method" attribute (or any of the required http attributes) then it is an "http" span.

The span solution is poor for discoverability. The system processing the data and / or the user querying the data needs to be aware of the conventions ahead of time. They need to be aware that all HTTP spans require the presence of http.method, all database spans require the presence of db.system, etc. Suppose you want to have a feature which allows you to discover the classes of spans in your system. You'd only be able to discover the classes that have well defined semantic conventions - any spans which characterize a class of spans undefined in the semantic conventions would have to be grouped into an "unknown" category. Additionally, the fact that existing semantic conventions have a single attribute whose presence can identify the class is accidental. Future semantic conventions could define a class of spans with all attributes optional, which would greatly increase the burden of identifying the class of spans.

I assert that for events, the class or type of the event is the core characteristic of the signal. (In contrast, I'd argue that for spans, identifying the class is important, but the core characteristic is likely the hierarchical arrangement.) You want to be able to unambiguously filter for all events of a particular class or type. You want to be able to discover the complete set of distinct classes or types of events that have occurred. This is not possible with duck typing approach taken with spans.
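
The discoverability gap can be illustrated with a small sketch. It assumes records are plain objects with an `attributes` map, that one variant carries an explicit `event.type` attribute, and that the other variant relies on a consumer-maintained table, here called `KNOWN_PRESENCE_ATTRS`. All of these names, and the "payments" domain, are hypothetical and do not come from the specification.

```javascript
// Variant 1: explicit classification attribute. Every distinct class is
// discoverable without prior knowledge of any convention.
function discoverByValue(records, key) {
  return new Set(records.map((r) => r.attributes[key] ?? "unknown"));
}

// Variant 2: presence-based classification. The consumer must already know
// which attribute marks which class (hypothetical table).
const KNOWN_PRESENCE_ATTRS = {
  "browser.event.type": "browser",
  "k8s.event.reason": "k8s",
};

function discoverByPresence(records) {
  const classes = new Set();
  for (const r of records) {
    const marker = Object.keys(KNOWN_PRESENCE_ATTRS).find((k) => k in r.attributes);
    // Anything without a known marker collapses into "unknown".
    classes.add(marker ? KNOWN_PRESENCE_ATTRS[marker] : "unknown");
  }
  return classes;
}

const sample = [
  { attributes: { "event.type": "browser", "browser.event.type": "click" } },
  { attributes: { "event.type": "k8s", "k8s.event.reason": "BackOff" } },
  // A class with no published convention, e.g. a custom "payments" domain.
  { attributes: { "event.type": "payments", "payments.event.kind": "refund" } },
];
```

On this sample, `discoverByValue(sample, "event.type")` reports browser, k8s and payments, whereas `discoverByPresence(sample)` reports browser, k8s and unknown, because the custom domain has no entry in the table.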

Without a data model level definition of what an event is, you can't have an event API. And without an event API, we have to tell users to use existing log frameworks to create event-like things in OpenTelemetry, since we've been adamant (correctly so) that we should not create yet another logging API (currently the log API is only meant for the log appender use case). Telling users to use existing log frameworks to create event-like things is a bad UX, since it requires that users inspect the source code to understand how the log framework data models translate to the OpenTelemetry data model.

My opinion is that we need an event API. We need an unambiguous way of identifying OpenTelemetry events in the data model. We need the identification of events in the data model to have very clear semantics to avoid incorrect mapping from existing systems which have events which are semantically different from OpenTelemetry events. My suggestion for changing the event semantic conventions to be less ambiguous is here.

@alanwest
Member

I suggest deleting the "Events API". Also delete the event_domain parameter from "Get Logger" API. These are unnecessary.

Yes. I support. I think this does a few things:

  1. It helps to unblock the work of the community to provide support for logging libraries.
  2. Frees up other SIGs to further vet and prototype their ideas. That is, I do not think the Event API is a sufficient prerequisite to block progress on client side, k8s monitoring, etc. I don't think we should preclude ourselves from coming back to the idea of a generic Event API distinct from the Log API, but at this point I think it is far too nebulous to warrant blocking support for logging libraries.

Otel libraries that help instrumenting the browser can provide helper APIs for creating browser-specific events (DOM events). That specific API can require the "eventType" as parameter and record its value as "browser.event.type" attribute.

As I have not been involved, I can't speak for the client-side group, but trying to put an end user hat on I've always suspected that a browser/mobile specific API would provide better ergonomics. I think it would ultimately be confusing to end users to use a "logger" to record client side interactions.

For example you may have "http" spans or you may have "database" spans. How are these differentiated? We don't have an attribute with a value that tells you which domain the span belongs to. Instead we use the fact of presence of a particular attribute to know which domain the span belongs to.

What prevents us from using the exact same approach for logs and events?

You had me convinced a while back that this approach warrants a stronger effort before jumping to a generic solution like event.domain/event.name. There's a lot I think the semantic conventions do well. Though, I do believe there exist some limitations today. Namely, the semantic conventions do well at describing the context of a piece of telemetry, but they often do not describe well what a piece of telemetry is. I touched on this in #2348 (comment).

Using your example of browser.event.type, I'd be concerned that anything might have this attribute even if it was not in fact a browser event. Maybe the telemetry with this attribute represents something else that occurred during the context of a browser event.

That said, I'd prefer we push on these limitations.

@tigrannajaryan
Member Author

@jack-berg

The span solution is poor for discoverability.

This is true. However, why is it acceptable for spans and not for logs/events?

I assert that for events, the class or type of the event is the core characteristic of the signal. (In contrast, I'd argue that for spans, identifying the class is important, but the core characteristic is likely the hierarchical arrangement.)

What is the basis for this assertion? I don't understand why this is so.

You want to be able to unambiguously filter for all events of a particular class or type.

The way I suggested is equally unambiguous. if getAttr("event.type")=="browser" is as unambiguous as if hasAttr("browser.event.type"). Where is the ambiguity in the presence check?

Without a data model level definition of what an event is, you can't have an event API.

Yes. And my suggestion is that we don't need an event API.

And without an event API, we have to tell users to use existing log frameworks to create event-like things in OpenTelemetry, since we've been adamant (correctly so) that we should not create yet another logging API (currently the log API is only meant for the log appender use case).

This is the part that I mentioned is TBD. Maybe we shouldn't be so adamant about it.

My opinion is that we need an event API.

So it sounds like, we need an API, but because we are adamant that it can't be a Logging API let's call it Event API and we are off the hook? :-)

@tigrannajaryan
Member Author

Using your example of browser.event.type, I'd be concerned that anything might have this attribute even if it was not in fact a browser event. Maybe the telemetry with this attribute represents something else that occurred during the context of a browser event.

@alanwest What prevents anything from having an attribute "event.type=browser" even if it was not in fact a browser event?

@CodeBlanch
Member

Let's say I'm authoring some HTML5 map widget. When users click on my map widget I want to emit a click event with details about the lat/long for the click. What would I do in this case? Have my own custom event or use the standard one with some extra attributes? Adding extra attributes to a standard event might surprise/break backends? Creating a custom event then might ghost these from "browser" or "click" queries completely? I haven't been involved in the client discussions either, but unclear to me we have even solved the original problem 😄

@MSNev

MSNev commented Oct 20, 2022

So it sounds like, we need an API, but because we are adamant that it can't be a Logging API let's call it Event API and we are off the hook

This takes us back to the discussion on "combining" them in the first place. I still propose that they should have been separate.

I suggest deleting "event.domain" and "event.name" semantic conventions.

IMHO, We need some way to uniquely identify what the log is representing whether that be a generic log, kubernetes log or an event (or some other possible structured record).

For events, my original preference for identifying an event was a single event.name with the value including the "domain" (e.g. something like event.name=otel.browser.pageview). This would allow backends to identify and perform (if required) some sort of validation of the values. Splitting into event.domain and event.name is just a variation on this and (my understanding) was driven by discussions to avoid clashes with existing / other log records.

And unlike Spans where consumers (backends / UX / etc) "infer" what is being reported by the presence of specific attributes which includes the concept of composition (including anything and everything), at least for events this is explicitly NOT a requirement and should be avoided. E.g. what do you do with a log event containing browser.event.type, mobile.event.type, k8s.* etc.?

Whether this is a schema [ url | uri | urn ], event.type, or event.name/event.domain is just semantics; fundamentally we need a simple unambiguous method to allow backends to perform any routing / validation of events at massive scale. Unlike having a limited number of servers (e.g. 100,000s) sending logs, there WILL be hundreds of millions of inbound requests coming from clients (e.g. every browser, or every tab on every browser, will be sending requests (if instrumented) and there is no way to "consolidate" requests (for this case) on the client).

From the discussions yesterday, my preference would be to define that log records which contain a "schema" definition and an index (event.name / event.type into the schema) would be the preferred approach, as this solves another issue that we have not yet highlighted: defining some validation of the field "type" or field "meta-data", which today is only loosely defined. Having a defined schema would allow additional restrictions to be defined.

Failing that, the presence of a single event.type / event.name / log.type / log.xxx etc. is the next logical level, as it would act as both the index and the inferred schema. The downside of this step is where and how the schema is defined / published / maintained for each defined event. And this "somewhat" precludes sharing of custom definitions.

Summary (TL;DR)

suggest deleting "event.domain" and "event.name" semantic conventions.

As long as it's replaced with a schema with index or single event.type convention so the LogRecord can be unambiguously defined. (ie. defines the namespace).

I suggest deleting the "Events API"...

As long as it's split out into its own definition, it would also be "nice" to have a single simple unified Events API to support the creation of events for a given defined domain, even if this API just defines the convenience method(s) for creating and publishing events.

@jkwatson
Contributor

Please, I implore you and the log sig: think about the end user when making this sort of proposal.

If I, as an end user, want to create an OpenTelemetry event, what would you be asking me to have to do and know in order to do it, with this proposal?

I would need to know that a) I have to use this weird other logging API, that isn't my normal logging API. b) know how to precisely construct a log message so that it would be interpreted as an event by my backend of choice.

This is not a user-friendly or ergonomic choice. The end user should not have to understand the logs data model in order to create an OpenTelemetry event. This should be something that is done easily and simply using an Event API. Vendors and creators of backends that support OpenTelemetry events will need to understand these details, of course, but we definitely should not push the requirement of this knowledge onto the end user. Unless, of course, you don't want to have events in OpenTelemetry at all, which is what this proposal seems to be suggesting.

@Aneurysm9
Member

If I, as an end user, want to create an OpenTelemetry event, what would you be asking me to have to do and know in order to do it, with this proposal?

I would need to know that a) I have to use this weird other logging API, that isn't my normal logging API. b) know how to precisely construct a log message so that it would be interpreted as an event by my backend of choice.

This is not a user-friendly or ergonomic choice. The end user should not have to understand the logs data model in order to create an OpenTelemetry event. This should be something that is done easily and simply using an Event API.

I agree, but think even this doesn't go far enough. If the Event API is a wrapper around the LogRecord creation API and ensures that the identifying attributes are present, doesn't this just push the issue up one layer of abstraction?

// Pseudocode: logrecord is a hypothetical package used for illustration.
lr := logrecord.New([]logrecord.Attribute{
  {Key: "event.domain", Value: "browser"},
  {Key: "event.name", Value: "click"},
  {Key: "browser.click.target", Value: "#my-cool-widget"},
  // ... further attributes
})

How is that any different from:

// Pseudocode: event is a hypothetical package used for illustration.
ev := event.New("browser", "click", []logrecord.Attribute{
  {Key: "browser.click.target", Value: "#my-cool-widget"},
  // ... further attributes
})

If I have domain-specific attributes I still need to know that I need to use this weird other event API and provide it attributes that fit my schema without the API itself having any knowledge of that schema or ability to help me avoid shooting myself in the foot.

Instead, I'd propose that we not have any generic Event API and instead let groups that want to develop specific event creation APIs to do so. That way, a browser click event could be something more like this:

ev := browser.Click(element.ID)

@tigrannajaryan
Member Author

Let's say I'm authoring some HTML5 map widget. When users click on my map widget I want to emit a click event with details about the lat/long for the click. What would I do in this case? Have my own custom event or use the standard one with some extra attributes?

@CodeBlanch you provide the following attributes:

{
  "browser.event.type": "click", 
  "browser.event.lat": 123,
  "browser.event.lon": 234,
   ... whatever other attributes you need to record for the click
}

Adding extra attributes to a standard event might surprise/break backends?

Why would it break backends? Why is there an expectation that there can't be extra attributes? In fact we have exactly the opposite in our telemetry stability requirements and allow adding log attributes.

Creating a custom event then might ghost these from "browser" or "click" queries completely? I haven't been involved in the client discussions either, but unclear to me we have even solved the original problem 😄

Only if they don't follow Otel recommendations on what stability to expect from telemetry.

@tigrannajaryan
Member Author

IMHO, We need some way to uniquely identify what the log is representing whether that be a generic log, kubernetes log or an event (or some other possible structured record).

@MSNev Yes, we need it. I explained how to do that identification. We can check that k8s.event.reason attribute is present. Why is this not sufficient?

And unlike Spans where consumers (backends / UX / etc) "infer" what is being reported by the presence of specific attributes which includes the concept of composition (including anything and everything), at least for events this is explicitly NOT a requirement and should be avoided.

Presence of an attribute seems to be as reliable an indicator as the value of an attribute.

E.g. what do you do with a log event containing browser.event.type, mobile.event.type, k8s.* etc.?

Do you mean containing at the same time? Be restrictive and deal with it like you deal with any other malformed data or be permissive and assume one of the attributes matters and ignore the rest. This is no different than receiving an impossible combination of event.domain and event.name or any other malformed set of attributes.
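
The two strategies mentioned here, restrictive and permissive, might look roughly like the sketch below. The marker list `DOMAIN_MARKERS`, its ordering, and the `classify` function are assumptions made for illustration only.

```javascript
// Hypothetical presence attributes, listed in order of preference.
const DOMAIN_MARKERS = ["browser.event.type", "mobile.event.type", "k8s.event.reason"];

// Returns the marker attribute that classifies the record, or null if none is present.
// mode "strict": more than one marker is treated as malformed data and throws.
// mode "permissive": the first marker in DOMAIN_MARKERS wins, the rest are ignored.
function classify(attributes, mode = "strict") {
  const present = DOMAIN_MARKERS.filter((k) => k in attributes);
  if (present.length === 0) return null;
  if (present.length > 1 && mode === "strict") {
    throw new Error(`malformed record: multiple domain markers: ${present.join(", ")}`);
  }
  return present[0];
}
```

A record carrying both "browser.event.type" and "k8s.event.reason" is rejected in strict mode and classified as a browser event in permissive mode, which is the same choice a backend would face with an impossible combination of event.domain and event.name.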

Whether this is a schema [ url | uri | urn ], event.type, or event.name/event.domain is just semantics; fundamentally we need a simple unambiguous method to allow backends to perform any routing / validation of events at massive scale. Unlike having a limited number of servers (e.g. 100,000s) sending logs, there WILL be hundreds of millions of inbound requests coming from clients (e.g. every browser, or every tab on every browser, will be sending requests (if instrumented) and there is no way to "consolidate" requests (for this case) on the client).

Are you arguing that checking for presence is not simple, not unambiguous or is not performant? What exactly is the argument?

From the discussions yesterday, my preference would be to define that log records which contain a "schema" definition and an index (event.name / event.type into the schema) would be the preferred approach, as this solves another issue that we have not yet highlighted: defining some validation of the field "type" or field "meta-data", which today is only loosely defined. Having a defined schema would allow additional restrictions to be defined.

Please elaborate. I am not sure I understand what issue this is.

As long as it's split out into its own definition, it would also be "nice" to have a single simple unified Events API to support the creation of events for a given defined domain, even if this API just defines the convenience method(s) for creating and publishing events.

Why can't you define a simple API helper in your domain to create events of your domain's type? For example:

// Sketch only: logger.createLogRecord stands in for whatever the Logs API ends up exposing.
function createEvent(logger, type, attrs) {
  logger.createLogRecord({ ...attrs, "browser.event.type": type })
}

@tigrannajaryan
Member Author

Please, I implore you and the log sig: think about the end user when making this sort of proposal.

@jkwatson good to see you back at Otel!

Looks like you are interested in this topic. It would be great if you could help! Join the Log SIG calls if you can.

If I, as an end user, want to create an OpenTelemetry event, what would you be asking me to have to do and know in order to do it, with this proposal?

I suggest that there is no such thing as an OpenTelemetry event. There are, for example, browser events, for which the browser instrumentation library will define a nice API to call.

I would need to know that a) I have to use this weird other logging API, that isn't my normal logging API. b) know how to precisely construct a log message so that it would be interpreted as an event by my backend of choice.

No, you don't need to do either of those. Call a purpose-built events API exposed by the library you use.

... The end user should not have to understand the logs data model in order to create an OpenTelemetry event. This should be something that is done easily and simply using an Event API. Vendors and creators of backends that support OpenTelemetry events will need to understand these details, of course, but we definitely should not push the requirement of this knowledge onto the end user.

I agree. I think purpose-built APIs give you exactly this. You can have an API that is shaped to match the problem domain (e.g. browser events or mobile events or whatever else it is).

@scheler
Contributor

scheler commented Oct 20, 2022

Instead, I'd propose that we not have any generic Event API and instead let groups that want to develop specific event creation APIs to do so.

I think it will be good to list the use-cases for Events. RUM is one (both browser and mobile). The other requirement I have seen so far is for capturing Custom Events, which different vendors have an API for, for their users. And the custom events API is across all products including RUM, APM and Infra. There's also a third category for receiver of events from other sources, such as Kubernetes Events. Given these multiple use-cases, I thought it will be good to have a common representation of Events and an API. I don't know if it helps backends if different groups model Events differently and not have a central definition of Events in the specification.

@tigrannajaryan
Member Author

The other requirement I have seen so far is for capturing Custom Events, which different vendors have an API for, for their users. And the custom events API is across all products including RUM, APM and Infra.

@scheler can you please clarify what this is?

There's also a third category for receiver of events from other sources, such as Kubernetes Events. Given these multiple use-cases, I thought it will be good to have a common representation of Events and an API. I don't know if it helps backends if different groups model Events differently and not have a central definition of Events in the specification.

These are in the Collector. They don't need an API.

@scheler
Contributor

scheler commented Oct 20, 2022

The other requirement I have seen so far is for capturing Custom Events, which different vendors have an API for, for their users. And the custom events API is across all products including RUM, APM and Infra.

@scheler can you please clarify what this is?

There's also a third category for receiver of events from other sources, such as Kubernetes Events. Given these multiple use-cases, I thought it will be good to have a common representation of Events and an API. I don't know if it helps backends if different groups model Events differently and not have a central definition of Events in the specification.

These are in the Collector. They don't need an API.

The backends do not know the source. When they are looking for Events, they are looking for messages with specific common characteristics. Events don't need to be created only through an API. Events must have a data model that backends can rely on.

Of course, Kubernetes Events or other events received by the Collector need not be Events. However, this fact must be published so the backends know what they are receiving.

For the purpose of this discussion, we can drop this category as I don't know if anyone really needs them as Events. I don't know if @dmitryax needs K8s events as OpenTelemetry Events or just wanted to align them as they are named events.

@MSNev

MSNev commented Oct 20, 2022

We can check that k8s.event.reason attribute is present. Why is this not sufficient?

Because it may contain k8s.event.reason and browser.event.type, so "which" event is this for example.

Do you mean containing at the same time? Be restrictive and deal with it like you deal with any other malformed data

As called out, if we define the semantics of what is an event vs what is a general log then you also avoid the situation of (for example) someone wanting to send an event (like perhaps an exception) which happens to contain other attributes that would normally be an event. eg. An exception occurred while processing event x -- if both are present what is this...

Are you arguing that checking for presence is not simple, not unambiguous or is not performant? What exactly is the argument?

It's not unambiguous or performant as now the back end needs to know what "event" it might want to route (in order of preference if multiple names are present). As opposed to looking for a single "this is an event" and then routing/validating as such.

Please elaborate. I am not sure I understand what issue this is.

If there is a schema present then the backend (may) perform additional validation of the content of the fields and cause the event to be dropped (and not sent for storage / indexing) because the received "event" is not deemed to be valid, examples of possible simple validation

  • Required fields not present
  • Wrong type of field (number / string / etc)
  • Maximum Field length
  • And many other possibilities depending on the schema event requirements

This also helps with simple transformations where a value is passed as a string but should be stored as a number, or with enum value validation. A more extreme example would be if a value is passed as a simple JSON encoded blob; this "could" provide direction on how fields are converted without it needing to be a full OTLP attribute definition
eg.
{ "key1": "value1", "key2": 42 } and not
[ { "key": "key1", "value": { "stringValue": "value1" } }, { "key": "key2", "value": { "intValue": 42 } } ]

Where the schema would define that "key1" is a string value and "key2" is an integer.
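
As a rough sketch of what such schema-driven validation could look like, the following uses the "key1" / "key2" example from above. The schema format (`type`, `required`, `maxLength`), the `exampleSchema` object and the `validate` function are all hypothetical; nothing here is defined by OpenTelemetry.

```javascript
// Hypothetical schema: "key1" is a string, "key2" is an integer.
const exampleSchema = {
  key1: { type: "string", required: true, maxLength: 16 },
  key2: { type: "integer", required: true },
};

// Validates a flat JSON payload against the schema and applies simple coercion.
// Returns { ok: true, value } or { ok: false, errors }.
function validate(payload, schema) {
  const errors = [];
  const value = {};
  for (const [field, rule] of Object.entries(schema)) {
    let v = payload[field];
    if (v === undefined) {
      if (rule.required) errors.push(`${field}: required field not present`);
      continue;
    }
    // Simple transformation: an integer passed as a string is stored as a number.
    if (rule.type === "integer" && typeof v === "string" && /^-?\d+$/.test(v)) {
      v = Number(v);
    }
    const typeOk = rule.type === "integer" ? Number.isInteger(v) : typeof v === rule.type;
    if (!typeOk) {
      errors.push(`${field}: wrong type, expected ${rule.type}`);
      continue;
    }
    if (rule.maxLength !== undefined && v.length > rule.maxLength) {
      errors.push(`${field}: exceeds maximum length ${rule.maxLength}`);
      continue;
    }
    value[field] = v;
  }
  return errors.length === 0 ? { ok: true, value } : { ok: false, errors };
}
```

A backend could drop any record for which `validate` returns `ok: false` before it reaches storage or indexing, and store the coerced `value` otherwise.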

Why can't you define a simple API helper in your domain to create events of your domain's type?

I thought that is what I said :-) Create domain-specific API helpers, which could hide the fact that it's using the logger and hide the fact that it's adding additional (fixed) properties to identify the event type / schema definition etc.

function createBrowserEvent(type, attrs) {
  _logger.createLogRecord({ ...attrs, "event.type": "browser." + type })
}

function createBrowserPageViewEvent(attrs) {
  _logger.createLogRecord({ ...attrs, "event.type": "browser.pageview" })
}

@jkwatson
Contributor

If there is to be no event API, then I propose we get rid of an otel logging API for Java altogether. Java does not need another logging API. Period. Full stop.

@jack-berg
Member

@tigrannajaryan

The span solution is poor for discoverability.

This is true. However, why is it acceptable for spans and not for logs/events?

I don't know why it's acceptable for spans. I think attribute presence for classification based on prior knowledge is cumbersome and may be fragile in some instances. For example, you can't actually use the presence of http.method to identify http server spans - you must look for SpanKind = server AND http.method != null. If another semantic convention reuses (admittedly unlikely) http.method, the identity is broken. If another semantic convention is introduced with several optional attributes and no single required attribute, you'd have to check for the presence of one of several attributes (yuck). Ultimately these ergonomics bubble up, forcing the user to understand the subtleties of semantic conventions. With a succinct classification mechanism, users always know how to query if they want to select a particular type of thing.
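
To make the two checks concrete, here is a minimal sketch. It assumes a simplified span shape of `{ kind, attributes }` with the kind given as a lowercase string; the attribute names in `OPTIONAL_MARKERS` are made up to stand for a convention with only optional attributes.

```javascript
// HTTP server span: the kind must be "server" AND http.method must be present.
function isHttpServerSpan(span) {
  return span.kind === "server" && span.attributes["http.method"] != null;
}

// A convention with no single required attribute needs an any-of check.
// These attribute names are invented for illustration.
const OPTIONAL_MARKERS = ["example.attr.a", "example.attr.b", "example.attr.c"];

function isExampleSpan(span) {
  return OPTIONAL_MARKERS.some((k) => span.attributes[k] != null);
}
```

A client span that also carries "http.method" fails `isHttpServerSpan`, which is why the presence check alone is not enough, and every new all-optional convention would add another list like `OPTIONAL_MARKERS` that the querying side has to know about.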

I assert that for events, the class or type of the event is the core characteristic of the signal. (In contrast, I'd argue that for spans, identifying the class is important, but the core characteristic is likely the hierarchical arrangement.)

What is the basis for this assertion? I don't understand why this is so.

My chain of reasoning is: 1. It's valuable to have an API for emitting events. 2. You can't define an event API without defining what an event is. 3. The presence of an event class or type is a common thread in many existing definitions of events.

You want to be able to unambiguously filter for all events of a particular class or type.

The way I suggested is equally unambiguous. if getAttr("event.type")=="browser" is as unambiguous as if hasAttr("browser.event.type"). Where is the ambiguity in the presence check?

Those are unambiguous, yes. But the total set of events received is not discoverable; you can only discover the set of browser events.
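A rough sketch of the discoverability difference, under assumed data: the records and the helper names `countByClass` and `selectByPresence` are invented for illustration, and the attribute names are the ones used as examples in this thread.

```javascript
// Illustrative records only; attribute names follow the examples in this thread.
const records = [
  { attributes: { "event.type": "browser", "browser.event.type": "click" } },
  { attributes: { "event.type": "k8s", "k8s.event.reason": "pending" } },
  { attributes: { "event.type": "browser", "browser.event.type": "scroll" } },
];

// With one shared key, the set of event classes can be discovered from the
// data itself, with no prior knowledge of the domains.
function countByClass(recs, key = "event.type") {
  const counts = {};
  for (const { attributes } of recs) {
    const cls = attributes[key];
    if (cls !== undefined) counts[cls] = (counts[cls] || 0) + 1;
  }
  return counts;
}

// With presence-based classification, the caller must already know which
// attribute to look for, so only known domains can be selected.
function selectByPresence(recs, attr) {
  return recs.filter(({ attributes }) => attr in attributes);
}

const classCounts = countByClass(records);
const browserOnly = selectByPresence(records, "browser.event.type");
console.log(classCounts, browserOnly.length);
```

The first function enumerates every class it receives, including ones the backend has never seen; the second can only answer questions about a domain it was told about in advance.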

This is the part that I mentioned is TBD. Maybe we shouldn't be so adamant about it.

So it sounds like, we need an API, but because we are adamant that it can't be a Logging API let's call it Event API and we are off the hook? :-)

Consider the following options:

  1. Define an event API with a corresponding event data model definition which is intuitive for most users and unambiguous for backends. Define a separate log API which is only to be used by authors of log appenders.
  2. Define only a log API. Tell users they can use the log API to create event-like things by defining their own semantic conventions and adhering to those. But users shouldn't use the log API for traditional application logs because the OpenTelemetry log API / SDK can't / shouldn't compete with existing frameworks on features. Backends can identify events of interest by being aware of the semantic conventions and other potential sources of semantic conventions that live outside OpenTelemetry.

Those clearly aren't the only options, but for me option 1 is simpler to explain to users and simpler for backends to consume. It's not clear to me what we stand to gain by taking away an event API, and / or taking away a clear way to identify the class of the events. Put another way, even if we can move forward without an event API, and without a clear way to identify the class of the events, why impose a constraint which makes things harder to reason about?

@tsloughter
Member

There would still need to be some generic "event.name" and maybe "event.domain" attribute in the data model so that generic events can be supported by all backends simply by following the spec and not require knowing about every type of event -- since not every type of event will be supported by every backend if that direction were taken.

A backend could require the user to input the custom attribute to look for in order to create an "event", but that seems like unnecessary work for both the backend UI and the user when the log data model could support this.

An event API that acts on top of the logs SDK would be preferable in my opinion for users -- along with continuing to not require a logs API since there are languages that already have capable structured log APIs.

@MSNev

MSNev commented Oct 21, 2022

I should also add here that one of the "definitions" of an event being considered for the RUM sig is that an "event" is defined by

  • event.domain + event.name as the index (this could also be considered domain / schema, as above, plus name)
  • event.data: the nested attributes representing all of the specific fields (optional / required) that are associated with the event.

The rationale for having all of the defined "event" fields in event.data is so that there would be zero conflict with other semantic convention attributes that may also be present / included with this event. Which could also include application level attributes that the developer may want to include.

This is also where the "schema" definition would be extremely useful, as it would allow defining validation for these embedded data values.

So taking this further this could also be extended to "move" the name into the data

example using the browser domain version 1

{
    event.schema: "otel://browser/1",
    event.data: {
        name: "pageview",
        ....
    }
}
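As a rough illustration of how a schema could drive validation of the embedded event.data, here is a minimal sketch. The registry layout, the `validateEvent` helper, and the `url` / `referrer` fields for "pageview" are all assumptions made up for this example; none of them come from an existing specification.

```javascript
// Hypothetical schema registry keyed by the event.schema value from the example.
// The field list for "pageview" is invented purely for illustration.
const schemas = {
  "otel://browser/1": {
    pageview: { required: { url: "string" }, optional: { referrer: "string" } },
  },
};

// Checks event.data against the schema named by event.schema.
function validateEvent(event) {
  const domainSchema = schemas[event["event.schema"]];
  if (!domainSchema) return { ok: false, error: "unknown schema" };
  const data = event["event.data"] || {};
  const fields = domainSchema[data.name];
  if (!fields) return { ok: false, error: "unknown event name" };
  for (const [field, type] of Object.entries(fields.required)) {
    if (typeof data[field] !== type) {
      return { ok: false, error: `missing or invalid field: ${field}` };
    }
  }
  for (const [field, type] of Object.entries(fields.optional)) {
    if (field in data && typeof data[field] !== type) {
      return { ok: false, error: `invalid field: ${field}` };
    }
  }
  return { ok: true };
}

const good = validateEvent({
  "event.schema": "otel://browser/1",
  "event.data": { name: "pageview", url: "https://example.com/" },
});
const bad = validateEvent({
  "event.schema": "otel://browser/1",
  "event.data": { name: "pageview" },
});
console.log(good, bad);
```

Because every schema-defined field lives under event.data, the validator never has to inspect other attributes on the record, which matches the "zero conflict" rationale above.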

@ramthi

ramthi commented Oct 22, 2022

@jack-berg

The span solution is poor for discoverability.

This is true. However, why is it acceptable for spans and not for logs/events?

Spans already have a structure; compared to them, logs are going to be super noisy, unstructured, and possibly orders of magnitude greater in volume. Always querying a known set of fields (domain and name), as in domain+name=="browser.event.click" or domain+name=="k8s.event.pending", is straightforward; querying per-domain attributes such as browser.event.type=="click" or k8s.event.reason=="pending" is definitely complicated by comparison. Supporting new "domains" would be a huge change for backends. Streaming systems would suffer the penalty of checking for the presence of hundreds of different attributes, while always looking up the same fields and comparing their values would be efficient.

@carlosalberto carlosalberto added the triaged-needmoreinfo The issue is triaged - the OTel community needs more information to decide label Oct 24, 2022
@tigrannajaryan
Member Author

OK. To summarize the comments: there is not much love for the presence-based domain indication and there are known downsides. I am going to close this proposal as rejected.

I am at KubeCon, so I won't be able to work on this until next week. I still think the current spec is not the best we can do.

@tigrannajaryan tigrannajaryan added triaged-rejected The issue is triaged and rejected by the OTel community and removed triaged-needmoreinfo The issue is triaged - the OTel community needs more information to decide needs discussion Need more information before all suitable labels can be applied labels Oct 24, 2022