Define Log Data model #97

Merged

Member

@tigrannajaryan tigrannajaryan commented Apr 13, 2020

This is a proposal for a data model and semantic conventions that make it
possible to represent logs from various sources: application log files,
machine-generated events, system logs, etc.

Existing log formats can be unambiguously mapped to this data model.
Reverse mapping from this data model is also possible to the extent that
the target log format has equivalent capabilities.

The purpose of the data model is to establish a common understanding of what
a log record is and what data needs to be recorded, transferred, stored, and
interpreted by a logging system.
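
To illustrate what "unambiguously mapped" and "reverse mapping" could mean in practice, here is a minimal sketch. The `LogRecord` fields, the `from_app_log` and `to_app_log` helpers, and the JSON-style source format with `ts`, `level`, and `msg` keys are all hypothetical names chosen for illustration; they are not the fields defined by the proposal.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class LogRecord:
    # Hypothetical field names, for illustration only.
    timestamp_ns: Optional[int] = None
    severity_text: Optional[str] = None
    body: Any = None
    attributes: Dict[str, Any] = field(default_factory=dict)


# Keys of the hypothetical source format that have a dedicated field.
_KNOWN_KEYS = {"ts", "level", "msg"}


def from_app_log(entry: Dict[str, Any]) -> LogRecord:
    """Forward mapping: hypothetical application log entry -> LogRecord."""
    return LogRecord(
        timestamp_ns=entry.get("ts"),
        severity_text=entry.get("level"),
        body=entry.get("msg"),
        # Keys without a dedicated field are kept as attributes,
        # so the forward mapping does not drop information.
        attributes={k: v for k, v in entry.items() if k not in _KNOWN_KEYS},
    )


def to_app_log(record: LogRecord) -> Dict[str, Any]:
    """Reverse mapping: LogRecord -> the same hypothetical source format."""
    entry = dict(record.attributes)
    if record.timestamp_ns is not None:
        entry["ts"] = record.timestamp_ns
    if record.severity_text is not None:
        entry["level"] = record.severity_text
    if record.body is not None:
        entry["msg"] = record.body
    return entry
```

In this sketch the round trip is lossless only because the target format can hold every field of the record, which is the "equivalent capabilities" condition mentioned above.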

@tigrannajaryan
Member Author

@open-telemetry/specs-approvers please review. The initial wave of discussion happened in the Google Doc. I believe it is time to move to refinements and eventual approval.

@tigrannajaryan
Member Author

@open-telemetry/specs-approvers please review.

@roncohen

roncohen commented May 1, 2020

thank you for the hard work on this @tigrannajaryan and others. I understand the wish to keep up the momentum and move forward on this. One point that recently came up was whether we need to consider how the data model can be made to encompass not just logs, but also tracing events and metrics or at least how it will interoperate with these. cc'ing @tedsuo because I believe we discussed it in a recent call.

@roncohen

roncohen commented May 1, 2020

One of the requirements for this model is that it can be unambiguously mapped to from common models. It doesn't look like we've been through that exercise for the Elastic Common Schema (ECS), and it is not present here. I'm happy to take a swing at it, if that sounds appropriate.

@tigrannajaryan
Member Author

> One of the requirements for this model is that it can be unambiguously mapped to from common models. It doesn't look like we've been through that exercise for the Elastic Common Schema (ECS), and it is not present here. I'm happy to take a swing at it, if that sounds appropriate.

@roncohen Yes, definitely, please do. Happy to add it to this OTEP or reference it from this OTEP if you post it elsewhere.

@tigrannajaryan
Member Author

> Posted this in Gitter, but got no response to it, so I am bringing it here. I think this design looks good, but I have one concern. There are a bunch of vendor-specific things. Why does Splunk need SignalFX events and Splunk events? The company is the same and should standardize. The Google/Amazon log fields are also something we should make generic rather than specific, and have as part of the general spec.

These are just examples to help clarify the meaning of the fields for people who are familiar with one or another format. This document does not claim or attempt to standardize any vendor-specific formats. Ultimately vendors are free to decide how (and if) they want to map their formats to this model.

@jkowall

jkowall commented May 22, 2020

> These are just examples to help clarify the meaning of the fields for people who are familiar with one or another format. This document does not claim or attempt to standardize any vendor-specific formats. Ultimately vendors are free to decide how (and if) they want to map their formats to this model.

I'm not sure why they belong in the spec if that's the case. If we want to include example uses such as these vendor specifics, then they belong in a different file/location rather than in the appendix. It seems like free advertising for commercial solutions, which doesn't belong in this forum. If those are to be included, then we should open that up to all vendor solutions that want to be included in the specification.

@tigrannajaryan
Member Author

> I'm not sure why they belong in the spec if that's the case. If we want to include example uses such as these vendor specifics then they belong in a different file/location versus in the appendix. It seems like free advertising for commercial solutions which doesn't belong in this forum. If those are to be included then we should open that up to all vendor solutions who want to be included in the specification.

I'll leave it up to the OpenTelemetry admins to decide whether having this is appropriate in an OTEP. I believe the examples help clarify the OTEP and should stay.

@jkowall

jkowall commented May 22, 2020

> I'll leave it up to the OpenTelemetry admins to decide whether having this is appropriate in an OTEP. I believe the examples help clarify the OTEP and should stay.

No problem, I'll submit a PR to add a logz.io example section, in fairness, if that's how we're going to proceed. Personally, I do not believe this is the right forum for vendor names to be included.


@jkowall jkowall left a comment

Approved, aside from my objection to having vendor names included. I will add a section for my employer after this is approved, as right now only Splunk, SignalFx (also Splunk), Google, and Microsoft are referenced as examples.

Tigran Najaryan added 7 commits May 22, 2020 15:56
Member

@reyang reyang left a comment

LGTM.

@bogdandrutu bogdandrutu merged commit 22990a5 into open-telemetry:master May 22, 2020
@tigrannajaryan tigrannajaryan deleted the feature/tigran/log-data-model branch June 16, 2020 18:02
tigrannajaryan added a commit to open-telemetry/opentelemetry-specification that referenced this pull request Apr 6, 2023
Fixes #597

## Changes

- Add a section for "generic attributes" to the log semconv
- Add an attribute `log_record.id` making use of ULID as discussed in
#597

Some additional notes:
- I kept the PR small, so I left out some other potential attributes,
e.g. something for pre-existing ID (like windows event logs) or for
storing the used logging/eventing system or even something like a
"signature" that might be worth discussing, etc.
- I followed the structure of "generic attributes" from the spans
semconv
- I took some of the existing wording from #597 &
open-telemetry/oteps#97 (comment) to
describe the field

---------

Signed-off-by: svrnm <neumanns@cisco.com>
Co-authored-by: Joao Grassi <joao@joaograssi.com>
Co-authored-by: jack-berg <34418638+jack-berg@users.noreply.github.com>
Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>
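
For readers unfamiliar with ULIDs, the following is a minimal sketch of how a value for the `log_record.id` attribute mentioned above might be generated. It follows the general ULID layout (a 48-bit millisecond timestamp followed by 80 random bits, encoded as 26 Crockford Base32 characters). The `new_ulid` function name is hypothetical, and the sketch omits the optional monotonicity handling that full ULID implementations may provide.

```python
import os
import time
from typing import Optional

# Crockford Base32 alphabet: digits and uppercase letters without I, L, O, U.
CROCKFORD32 = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_ulid(timestamp_ms: Optional[int] = None) -> str:
    """Return a 26-character ULID string (hypothetical helper)."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    if not 0 <= timestamp_ms < 2**48:
        raise ValueError("timestamp must fit in 48 bits")

    # 80 bits of randomness from the operating system.
    randomness = int.from_bytes(os.urandom(10), "big")

    # 128-bit value: timestamp in the high 48 bits, randomness in the low 80.
    value = (timestamp_ms << 80) | randomness

    # Encode 5 bits at a time, least significant group first, then reverse.
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD32[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


# Example: attach the identifier to a record's attributes.
attributes = {"log_record.id": new_ulid()}
```

Because the timestamp occupies the most significant bits, identifiers created in different milliseconds sort lexicographically by creation time, which is one reason a ULID may be attractive as a record identifier.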
jsuereth pushed a commit to jsuereth/otel-semconv-test that referenced this pull request Apr 19, 2023
jsuereth pushed a commit to open-telemetry/semantic-conventions that referenced this pull request May 11, 2023