Add support for _ttl at event level? #43

Closed
wiibaa opened this issue Jan 26, 2015 · 3 comments

wiibaa (Contributor) commented Jan 26, 2015

Moved from https://logstash.jira.com/browse/LOGSTASH-470
Elasticsearch can automatically prune older documents if a TTL value is provided on the message or set as a default on the index itself.
So, for example, you could create a single index and tell it to store events for 30 days; after 30 days Elasticsearch would start removing the old entries from that index. This could also be used with the daily-style indexes that Logstash creates automatically: it would not delete the index itself, but it would empty it out.
It may also be worthwhile to allow filters to tweak this value, so that I could keep my access log entries for a week but my error logs for a month.
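A rough sketch of what an index-level default would look like against the Elasticsearch 1.x mapping API (the index and type names here are only placeholders, and the _ttl meta-field was deprecated and removed in later Elasticsearch versions):

  # Enable the legacy _ttl meta-field with a 30-day default on one daily index
  curl -XPUT 'http://localhost:9200/logstash-2015.01.26/_mapping/logs' -d '
  {
    "logs": {
      "_ttl": { "enabled": true, "default": "30d" }
    }
  }'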

An interesting comment from MixMuffins:
The thing about using TTL is that it creates a lot of excess overhead in Elasticsearch if you're dealing with a lot of indexes. Have you considered an automated script that removes entries after a certain period of time, run as a cron job? It would be more efficient and wouldn't sacrifice speed on your Elasticsearch indexing.

untergeek (Contributor) commented:

@wiibaa, thanks for migrating these. I know you're aware of these details, but I'm going to comment here for the benefit of other readers.

TTL is a bad idea for time-series data because indices can grow to billions of documents per day. If a document-level TTL is set on any document in an index, the entire index is scanned every 60 seconds (a configurable default) to find documents that have a TTL set and to check whether they have expired. That is a tremendous amount of overhead just for reading, before any deleting even happens.

Even using a delete_by_query via cron is problematic because of how it affects segment sizing and allocation. It makes for very uneven segment merges, which puts strain on your indexing and search operations. In addition, deleting documents in Elasticsearch does not result in immediate deletion. From the book Elasticsearch: The Definitive Guide:

Internally, Elasticsearch has marked the old document as deleted…The old version of the document doesn’t disappear immediately, although you won’t be able to access it. Elasticsearch cleans up deleted documents in the background as you continue to index more data.
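For illustration, such a cron-driven purge would look roughly like this against the Elasticsearch 1.x delete-by-query endpoint (the index pattern and age are placeholders, and the endpoint itself was removed in Elasticsearch 2.0):

  # Delete every document older than 30 days across all Logstash indices
  curl -XDELETE 'http://localhost:9200/logstash-*/_query' -d '
  {
    "query": {
      "range": { "@timestamp": { "lt": "now-30d" } }
    }
  }'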

Pruning documents by TTL or by query is necessary in certain environments, but time-series data (like logs) should almost never be handled this way, in my opinion. Your Elasticsearch environment will be far better served by splitting the data into separate indices and dropping whole indices with DELETE calls (or using curator).
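Dropping a whole daily index is a single, cheap metadata operation, along these lines (the index name is illustrative; curator can run the same kind of cleanup on a schedule):

  # Remove an entire day's index in one call instead of deleting documents one by one
  curl -XDELETE 'http://localhost:9200/logstash-2014.12.27'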

With this said, if you insist on doing event-level TTL, there's nothing preventing you from attaching a TTL to individual events in Logstash by adding a field called _ttl with a string value that Elasticsearch recognizes, such as "1d" for one day or "1w" for one week. See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html#index-ttl

  filter {
    mutate {
      # expire this event one day after it is indexed
      add_field => { "_ttl" => "1d" }
    }
  }
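Note that, as with the index-level default sketched earlier, Elasticsearch 1.x only honors a per-document _ttl when the _ttl meta-field is enabled in the target index's mapping.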

Since this is fully supported now (as of the Logstash 1.2 schema change), I'm going to close this issue with all of the caveats mentioned above. Feel free to re-open if you believe it necessary.

wiibaa (Contributor, Author) commented Jan 27, 2015

@untergeek I had a rough idea, but it is always good to hear the details from the experts.

wols commented May 9, 2015

A valuable clue. Thank you!
