Allow 'metadata' to an event that is not sent on output. #1834

Closed
jordansissel opened this issue Oct 2, 2014 · 7 comments


@jordansissel
Contributor

Originally from: https://logstash.jira.com/browse/LOGSTASH-1798

It would be great to have arbitrary metadata for an event, which isn't passed through the output and specifically not made available to the encoding/serialization phase.

Example use case:

  • Fields like index_name, which are derived by a filter and used by the elasticsearch output but probably shouldn't be part of the event sent to Elasticsearch.

The proposed syntax is simply a fieldref namespace, "[@metadata]". Anything under it is considered metadata and does not show up in JSON or other serialized output.
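A minimal sketch of the index_name use case, assuming the proposed [@metadata] namespace behaves as described: a filter derives the index name into metadata and the elasticsearch output reads it back, so the value is never stored in the document. The field name [@metadata][index_name] and the "logs-%{+YYYY.MM.dd}" naming scheme are hypothetical, and the output options follow the 1.x-era example in this issue, so they may differ in other plugin versions.

```
filter {
  mutate {
    # Hypothetical: derive a daily index name and keep it only in metadata
    add_field => { "[@metadata][index_name]" => "logs-%{+YYYY.MM.dd}" }
  }
}

output {
  elasticsearch {
    host => "localhost"
    # Read the derived name from metadata; [@metadata] itself is not
    # serialized into the document sent to Elasticsearch
    index => "%{[@metadata][index_name]}"
  }
}
```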


Example use (once #1644 is merged):

input {
  elasticsearch {
    host => "localhost"
    # Store ES document metadata (index, type, id, etc) in metadata
    docinfo_target => "@metadata"
  }
}

filter {
  ...
}

output {
  elasticsearch {
    action => "update"
    document_id => "%{[@metadata][id]}"
    index => "%{[@metadata][index]}"
    type => "%{[@metadata][type]}"
  }
}
@untergeek
Member

+1 to @metadata as the field name. I think it fits our schema naming quite well and is unlikely to collide with anything else.

@untergeek
Member

And my use case is [@metadata][document_id] and [@metadata][action] for determining the behavior of the elasticsearch output plugin, especially for replacing rivers.
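One way to drive output behavior from these fields, sketched with conditionals so it does not rely on any output option accepting a field reference. The [@metadata][document_id] and [@metadata][action] names come from the comment above; the "delete" value and the choice of "index" as the fallback action are assumptions for illustration.

```
output {
  # Branch on the metadata field; the branch condition is never stored
  if [@metadata][action] == "delete" {
    elasticsearch {
      action => "delete"
      document_id => "%{[@metadata][document_id]}"
    }
  } else {
    elasticsearch {
      action => "index"
      document_id => "%{[@metadata][document_id]}"
    }
  }
}
```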

@torrancew
Contributor

👍 to @metadata here as well. In general, I've got my colleagues fairly educated on avoiding field names matching "@*" as a policy when generating custom logs, as the existing reserved fields follow that convention.

@jordansissel
Contributor Author

@torrancew I'm open to possibly having Logstash do that policing for you. That is, Logstash could validate any attempt to use an @-named field and warn you or take some other action.

@colinsurprenant
Contributor

+1 on the metadata idea and @metadata seems right.

@avleen
Contributor

avleen commented Oct 2, 2014

Really like this.
The example is spot on with what I had in mind too.
I'd use this for index name, cluster name, all kinds of stuff.

Grok is another place where this could make a difference. Currently we extract a my_timestamp field like this:
%{TIMESTAMP:my_timestamp}. If grok could take %{TIMESTAMP:[@metadata][my_timestamp]}, we wouldn't have to do a remove_field in every date{} filter.
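A sketch of that grok/date pattern, assuming grok accepts a [@metadata] field reference as the capture name. It uses the stock TIMESTAMP_ISO8601 grok pattern and the date filter's "ISO8601" format in place of the TIMESTAMP placeholder in the comment; adjust both to match your actual log format.

```
filter {
  grok {
    # Capture the timestamp text into metadata instead of a top-level field
    match => { "message" => "%{TIMESTAMP_ISO8601:[@metadata][my_timestamp]}" }
  }
  date {
    # Parse the captured text into @timestamp; no remove_field is needed
    # because [@metadata][my_timestamp] is not serialized on output
    match => [ "[@metadata][my_timestamp]", "ISO8601" ]
  }
}
```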

@jordansissel
Copy link
Contributor Author

@avleen +1, there are some performance and configuration-complexity benefits here for users, I think, because you can use @metadata as a sort of scratch space to store things that are important to the event but do not represent the event itself.
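As a further illustration of the scratch-space idea, a value can be captured purely to make a processing decision and then simply left behind. This is a sketch only: the log format (lines starting with an uppercase level), the [@metadata][level] name, and the choice to drop DEBUG lines are all hypothetical.

```
filter {
  grok {
    # Hypothetical format: each line begins with a level such as DEBUG or INFO
    match => { "message" => "^%{LOGLEVEL:[@metadata][level]}" }
  }
  # The level is only needed for this decision, so it lives in metadata
  if [@metadata][level] == "DEBUG" {
    drop { }
  }
}
```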

jordansissel added this to the v1.5.0 milestone Oct 2, 2014
jordansissel added a commit to jordansissel/logstash that referenced this issue Oct 2, 2014
jordansissel mentioned this issue Oct 2, 2014
jordansissel added a commit to jordansissel/logstash that referenced this issue Oct 7, 2014
jordansissel added a commit to jordansissel/logstash that referenced this issue Oct 10, 2014
This makes @metadata basically a way to store data along with an event
that is *NOT* included when serialized to an output.

Use cases:
- For elasticsearch output, set the index, type, document_id, routing
  key, etc. with metadata and you won't be burdened by storing a field
  named 'index' in your document!
- For elasticsearch input, we can set @metadata fields for the
  index/type/document_id instead of polluting the event data itself.
- No need for "short-lived fields" such as timestamps. For example, a
  common pattern is to use grok to capture timestamp text, give that
  to the date filter, and finally use mutate to remove the captured
  text field.
- Provide a kind of scratch space for data that belongs with an event
  but is not part of the event data.

Fixes elastic#1834
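Because @metadata is left out of serialized output, it can be hard to see while debugging a pipeline. A sketch of one way to inspect it, assuming a version of the rubydebug codec that provides the metadata option:

```
output {
  stdout {
    # Print each event including its [@metadata] fields; without
    # metadata => true, rubydebug omits them like other outputs do
    codec => rubydebug { metadata => true }
  }
}
```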
jordansissel added a commit that referenced this issue Oct 10, 2014

Fixes #1836
tbragin added v1.5.0 and removed v1.5.0 labels Jun 18, 2015
6 participants