Add support for _ttl at event level ? #43
Comments
@wiibaa, thanks for migrating these. I know you're aware of these details, but I'm going to comment here for the benefit of other readers. TTL is a bad idea for time-series data, as indices can grow to billions of documents per day. If a document-level TTL is set on any document in the index, the entire index will be scanned every 60 seconds (a configurable default) to find documents that have a TTL set and to check whether they have expired. This is a tremendous amount of overhead just for reading, not even deleting. Even using a
Pruning documents by TTL or by query is necessary in certain environments, but time-series data (like logs) should almost never be handled in this way, in my opinion. Your Elasticsearch environment will be far better served by splitting into separate indices and dropping them with
With this said, if you insist on doing event-level TTL, there's nothing to prevent users from adding a TTL to individual events in Logstash by adding a field called
Because this is fully supported now, thanks to the Logstash 1.2 schema change, I'm going to close this issue with all caveats mentioned. Feel free to re-open if you believe it necessary.
@untergeek I had a rough idea, but it is always good to hear the details from the experts.
A valuable clue. Thank you!
Moved from https://logstash.jira.com/browse/LOGSTASH-470
ElasticSearch has the ability to automatically prune older messages if a TTL value is provided by the message or set as default on the index itself.
So, for example, you could create a single index and tell it to store events for 30 days; after 30 days ElasticSearch would start removing the old entries from the index. This could also be used with the daily-style indexes that LogStash automatically creates, although this would not delete the index itself, only empty it out.
It may also be a good idea to let filters adjust this value. That way I could keep my access log entries for a week but my error logs for a month.
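As a rough illustration of the per-type retention idea above, here is a minimal Python sketch. It is not Logstash or Elasticsearch code: the `RETENTION` table, the `retention_for` and `ttl_millis` helpers, and the use of the event's `type` field as the lookup key are all assumptions made for the example, as is expressing the TTL as a number of milliseconds.

```python
from datetime import timedelta

# Hypothetical retention policy keyed on the event's "type" field.
# The type names and durations mirror the example in the text:
# access logs for a week, error logs for a month.
RETENTION = {
    "access": timedelta(days=7),
    "error": timedelta(days=30),
}

# Assumed fallback for events whose type is not listed above.
DEFAULT_RETENTION = timedelta(days=30)


def retention_for(event):
    """Return how long an event (a plain dict here) should be kept."""
    return RETENTION.get(event.get("type"), DEFAULT_RETENTION)


def ttl_millis(event):
    """Return the retention period as a whole number of milliseconds."""
    return int(retention_for(event).total_seconds() * 1000)
```

A filter along these lines would only decide the value; whether a per-document TTL is then a good idea is a separate question that the comments in this thread address.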
Interesting comment from MixMuffins
The thing about using the TTL is that it creates a lot of excess overhead in elasticsearch if you're dealing with a lot of indexes. Have you considered an automated script that removes entries after a certain period of time, run as a cronjob? It would be more efficient and wouldn't sacrifice speed on your elasticsearch indexing.
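The scripted approach can be sketched in Python for the daily-index layout discussed in this thread. This is only a sketch under stated assumptions: the `expired_indices` helper is hypothetical, it assumes index names of the form `logstash-YYYY.MM.DD`, and it only selects which indices are past the retention window. Actually dropping them is left to whatever tool or API call you already use.

```python
from datetime import date, datetime, timedelta


def expired_indices(names, today, keep_days, prefix="logstash-"):
    """Return the daily index names whose date is older than keep_days.

    Assumes names look like "<prefix>YYYY.MM.DD"; anything else is ignored.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        if not name.startswith(prefix):
            continue  # not one of the daily indices we manage
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date; leave the index alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    # Sample names for illustration only; a real script would list the
    # indices from the cluster and use date.today().
    sample = ["logstash-2013.09.01", "logstash-2013.10.05", "kibana-int"]
    for name in expired_indices(sample, date(2013, 10, 10), keep_days=30):
        print(name)
```

Run from cron once a day, a script like this touches only index names, so it avoids the periodic per-document scan that a TTL requires.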