Allow 'metadata' to an event that is not sent on output. #1834

Closed
jordansissel opened this Issue Oct 2, 2014 · 7 comments

6 participants

Contributor

jordansissel commented Oct 2, 2014

Originally from: https://logstash.jira.com/browse/LOGSTASH-1798

It would be great to have arbitrary metadata for an event that isn't passed through to the output and, specifically, isn't made available to the encoding/serialization phase.

Example use case:

  • Fields like index_name, which are derived by a filter and used by the elasticsearch output but probably shouldn't be part of the event sent to Elasticsearch.

Proposed syntax is simply a fieldref namespace "[@metadata]". Anything under it is considered metadata and will not normally show up through JSON or other serialization.

Example use (once #1644 is merged):

input {
  elasticsearch {
    host => "localhost"
    # Store ES document metadata (index, type, id, etc) in metadata
    docinfo_target => "@metadata"
  }
}

filter {
  ...
}

output {
  elasticsearch {
    action => "update"
    document_id => "%{[@metadata][id]}"
    index => "%{[@metadata][index]}"
    type => "%{[@metadata][type]}"
  }
}
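Because anything under [@metadata] is hidden from normal serialization, it is also hidden when you print events to the console while debugging. Later Logstash releases give the rubydebug codec a `metadata` option for exactly this case; a minimal sketch, assuming the rubydebug codec in your version supports that option:

```
output {
  # Print each event to the console, including the [@metadata] fields
  # that are otherwise left out of the serialized output
  stdout {
    codec => rubydebug { metadata => true }
  }
}
```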

Member

untergeek commented Oct 2, 2014

+1 to @metadata as the field name. I think it fits our schema naming quite well, and will not likely collide with anything else.

Member

untergeek commented Oct 2, 2014

And the use case for me is [@metadata][document_id] and [@metadata][action] for determining output behavior for the elasticsearch output plugin, with special regard to replacing rivers.
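A sketch of that use case, assuming an elasticsearch output that resolves `%{...}` field references in both `action` and `document_id` (later versions of the output document sprintf-style `action` values; older ones may not). The `id` and `deleted` source fields are hypothetical stand-ins for whatever the upstream data provides.

```
filter {
  # Hypothetical input: every event carries an "id" field, and deleted
  # records arrive with the string "true" in a "deleted" field
  mutate {
    add_field => { "[@metadata][document_id]" => "%{id}" }
  }
  if [deleted] == "true" {
    mutate { add_field => { "[@metadata][action]" => "delete" } }
  } else {
    mutate { add_field => { "[@metadata][action]" => "index" } }
  }
}

output {
  elasticsearch {
    # Both values come from @metadata, so neither is stored in the document
    action      => "%{[@metadata][action]}"
    document_id => "%{[@metadata][document_id]}"
  }
}
```

In practice the target index would likely come from [@metadata] too, as in the example in the issue description.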

Contributor

torrancew commented Oct 2, 2014

👍 to @metadata here as well. In general, I've got my colleagues fairly educated on avoiding field names matching "@*" as a policy when generating custom logs, as the existing reserved fields follow that convention.

Contributor

jordansissel commented Oct 2, 2014

@torrancew I'm open to having Logstash do that policing for you. That is, Logstash could validate any attempt to use @-named fields and warn you (or take some other action) when it finds one.

Contributor

colinsurprenant commented Oct 2, 2014

+1 on the metadata idea and @metadata seems right.

Contributor

avleen commented Oct 2, 2014

Really like this.
The example is spot on with what I had in mind too.
I'd use this for index name, cluster name, all kinds of stuff.

Grok is another place where this could make a difference. Currently we extract a my_timestamp field like this:
%{TIMESTAMP:my_timestamp}. If grok can take %{TIMESTAMP:[@metadata][my_timestamp]}, it means we wouldn't have to do a remove_field in every date{} filter.
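A sketch of that grok/date pattern, assuming grok accepts a field reference such as `[@metadata][my_timestamp]` as the capture name. The log layout (an ISO8601 timestamp followed by the message text) is an assumption; `TIMESTAMP_ISO8601` and `GREEDYDATA` are stock grok patterns, used here in place of the comment's `TIMESTAMP`.

```
filter {
  grok {
    # Capture the timestamp text into metadata instead of a top-level field
    match => { "message" => "%{TIMESTAMP_ISO8601:[@metadata][my_timestamp]} %{GREEDYDATA:msg}" }
  }
  date {
    # Parse the captured text into @timestamp; no remove_field is needed,
    # because [@metadata] is not serialized by outputs
    match => [ "[@metadata][my_timestamp]", "ISO8601" ]
  }
}
```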

Contributor

jordansissel commented Oct 2, 2014

@avleen +1, I think there are some performance and configuration-complexity benefits here for users, because you can use @metadata as a sort of scratch space to store things that are important to the event but do not represent the event itself.
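As a concrete instance of that scratch space, here is the index_name use case from the issue description, sketched under the assumption that every event has a lowercase `type` field to derive the index from (the `logs-` prefix and the `[@metadata][index_name]` field name are illustrative).

```
filter {
  # Derive the target index once and keep it out of the document body
  mutate {
    add_field => { "[@metadata][index_name]" => "logs-%{type}" }
  }
}

output {
  elasticsearch {
    # Daily indices named from the scratch value, e.g. logs-apache-2014.10.02
    index => "%{[@metadata][index_name]}-%{+YYYY.MM.dd}"
  }
}
```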

jordansissel added this to the v1.5.0 milestone Oct 2, 2014

jordansissel added a commit to jordansissel/logstash that referenced this issue Oct 2, 2014

jordansissel referenced this issue Oct 2, 2014 in #1836 "Event 'metadata'" (Closed, 0 of 1 task complete)

jordansissel added a commit to jordansissel/logstash that referenced this issue Oct 7, 2014

jordansissel added a commit to jordansissel/logstash that referenced this issue Oct 10, 2014

Add metadata via @metadata field
This makes @metadata basically a way to store data along with an event
that is *NOT* included when serialized to an output.

Use cases:
- For elasticsearch output, set the index, type, document_id, routing
  key, etc. with metadata and you won't be burdened by storing a field
  named 'index' in your document!
- For elasticsearch input, we can set @metadata fields for the
  index/type/document_id instead of polluting the event data itself.
- No need for "short-lived fields" such as timestamps. For example, a
  common pattern is to use grok to capture timestamp text, give that
  to the date filter, and finally use mutate to remove the captured text
  field.
- Provide a kind of scratch space for events that are not part of the
  event data.

Fixes #1834

jordansissel added a commit that referenced this issue Oct 10, 2014

Add metadata via @metadata field

Fixes #1834

Fixes #1836

tbragin added v1.5.0 and removed v1.5.0 labels Jun 18, 2015
