There is no way to ask ES to insert a timestamp at index time #15644
Comments
+1 we use it to calculate delay in production - we have a field called
+1 If there is no _timestamp system meta field, a retry scenario becomes less elegant and less efficient, since the client will have to modify the index request source with a new "timestamp" value for its user-defined "timestamp" field.
I'm dealing with the same thing. Also, I'm not sure whether I have to re-index all my data because of the deprecation. The documentation says it should be compatible, but it somehow is not...
In master this can be done using ingest pipelines. Closing in favour of #14049 |
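To make the ingest-pipeline alternative concrete, here is a minimal sketch of a pipeline that stamps each document with the ingest node's clock via the `set` processor and its `_ingest.timestamp` metadata (the documented mechanism). The pipeline id `add-received-at` and the field name `received_at` are made-up names for illustration, not anything from this thread.

```python
import json

# Hypothetical pipeline definition; "received_at" is an illustrative field name.
# The "set" processor copies the ingest node's clock into the document via the
# _ingest.timestamp metadata field.
pipeline = {
    "description": "Stamp each document with the time it entered the cluster",
    "processors": [
        {
            "set": {
                "field": "received_at",
                "value": "{{_ingest.timestamp}}",
            }
        }
    ],
}

# This JSON body would be registered with something like:
#   PUT _ingest/pipeline/add-received-at
# and then used by indexing with ?pipeline=add-received-at.
body = json.dumps(pipeline)
```

Note that, as discussed below, this stamp is applied on the ingest node before the request reaches a data node, so it approximates arrival time rather than the moment of actual indexing.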
Unfortunately, ingest pipelines will not give what is desired though, correct? They will not account for the time an index request spends sitting in a threadpool queue before being worked on, since the timestamp is added by an ingest node before the request ever reaches a data node.
There is always a delta between the assignment of a timestamp and the moment when it's actually indexed. The timestamp doesn't guarantee linearizability in any way; that's what a sequence ID is for. This feature can only be as good as some client app setting the timestamp, no matter where you put it. It should be neither more nor less.
However, what is doable then is to guarantee that the ingest pipeline performs the field insertion at a stage no earlier than where _timestamp insertions are currently performed, correct? If that guarantee can be made, please ignore everything below. Even if there is always a delta, the issue is its magnitude; it is not a binary issue. I believe an effort should be made to minimize this delta. "A delta always exists" is very well understood; however, that should not be used as a reason to arbitrarily enlarge the delta while not providing a means to accomplish what the small delta used to. In other words, a small delta is better than a large delta: a timestamp inserted closer to actual indexing is better than a timestamp inserted by the client app. To say that it "can only be as good as some client app setting the timestamp" totally ignores the magnitude of the delta. Do you agree that using _timestamp is a good way to measure pipeline delays? If not, is an alternative offered that works at least as well as _timestamp? Or should we ignore Logstash delays altogether?
IMO the timestamp should be assigned as soon as the document enters the system; that is what ingest will do. I think that is a clear property of the timestamp. I don't understand what folks are concerned about here: how much time do you think it takes from sending the doc until it's really indexed? I don't understand your usecase either; do you rely on the actual time the doc is indexed? What does it buy you? Please be reasonable; we can't give you any total-ordering guarantees based on timestamps. If you are concerned about the corner case of the document being stuck in the thread-pool queue, I think you can just ignore that: unless you totally overload your server, it should be like a second at most.
You can probably find the answer if you ask yourselves, "Why was _timestamp created in the first place?" What I'm concerned with is knowing, as well as the system can muster, the time elapsed between the moment an event is generated and the moment that same event becomes available for searches. What does it buy us? ElasticSearch is only one component in the entire analytics pipeline. Say, for example, that a process periodically queries ElasticSearch and aggregates some results. When an event is delayed for long enough, it may miss such an aggregation. A _timestamp provides a good approximation of an event's delay, and provides a means to rerun such aggregations on events that were missed. That's just one example. Another is that it allows us to profile time spent in Logstash, so its configuration can be optimised, or, failing that, so Logstash can be replaced with something more efficient. Judging from the replies, I'm not the only person with this concern, so without a tool for measurement nobody can say it's a "corner case", or "like a second at most or something", that we're talking about.
Sorry, I am not sure @lam-juice @djschny. The new way is as good as the old way; it is assigned at a slightly earlier stage, but for all usecases of
It depends on the guarantee we want. To us, ElasticSearch and whatever feeds ElasticSearch its events are two separate entities, so some facility is better than none for optimising and/or evaluating what sits between ES and the event source (e.g. Logstash). Does it provide a 100% guarantee? No, and that's not what I need anyway. Has _timestamp statistically been providing what I need, which is a good approximation 99% of the time (even if it can have a long tail)? Definitely. So you can see I'm not relying on exact numbers @s1monw; if it is assigned at a slightly earlier stage, so be it. Deduplication provides the robustness I need anyway if I decide to observe the difference in the fatness of the tail and reindex more conservatively. I think our fundamental disagreement is over the notion that "an ES-assigned timestamp does not accomplish more than a client-assigned timestamp", because I believe a client-assigned timestamp does not allow even a rough estimate of the time spent before an event crosses the boundary into ES.
That has not changed if you use ingest.
That is basically what it was all the time; no other guarantees were given.
Due to the deprecation of _timestamp, there is no way to insert a timestamp at exactly the index time.
In my use case, the _timestamp field is useful for more than just ttl. I use ES as timeseries storage, and if anyone wants to know how long the event pipeline takes to deliver a document, prior to ES 2.0 this could be calculated as (_timestamp - @timestamp), provided that @timestamp is obtained from the parsed event time and _timestamp is enabled.
When such events can be identified, time-sensitive operations that missed them due to pipeline delays may attempt to recover the events by filtering on the pipeline delay time, i.e. "(_timestamp - @timestamp) > 2 hours".
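The delay calculation described above can be sketched as follows. This is an illustration, not anything from ES itself: the helper names (`pipeline_delay`, `missed_by_aggregation`) are made up, and the ISO-8601 string format is an assumption about how the two timestamp fields are stored in a retrieved document.

```python
from datetime import datetime, timedelta

# Hedged sketch: field names "_timestamp" / "@timestamp" follow the issue text;
# helper names and the ISO-8601 format are illustrative assumptions.
def pipeline_delay(doc):
    """Delay between event time (@timestamp) and index time (_timestamp)."""
    indexed = datetime.fromisoformat(doc["_timestamp"])
    event = datetime.fromisoformat(doc["@timestamp"])
    return indexed - event

def missed_by_aggregation(docs, threshold=timedelta(hours=2)):
    """Documents whose delivery delay exceeded the aggregation window."""
    return [d for d in docs if pipeline_delay(d) > threshold]

docs = [
    # Delivered 3 seconds after the event occurred: fine.
    {"@timestamp": "2016-01-04T10:00:00+00:00",
     "_timestamp": "2016-01-04T10:00:03+00:00"},
    # Delivered 3.5 hours late: missed any 2-hour aggregation window.
    {"@timestamp": "2016-01-04T08:00:00+00:00",
     "_timestamp": "2016-01-04T11:30:00+00:00"},
]
late = missed_by_aggregation(docs)  # only the second document
```

In practice the same filter could be pushed down to ES as a script query over the two fields; the point is simply that both timestamps must exist on the document for the delta to be computable at all.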
Asking Logstash to insert its own date field does not accurately capture the delay between the time this new field is created and the time the document is indexed.
I understand that a user may want to store multiple custom timestamps, e.g. moved, modified, etc., so let's not make one special _timestamp. However, I believe the time at which a document is indexed for the first time is special, and is at least important for identifying documents that need some recovery action due to indexing delays.