Allow configurable elasticsearch index date format #840

pavelnikolov · 2018-05-19T13:25:19Z

Which problem is this PR solving?

Resolves Make default time span of ES index configurable #628

Short description of the changes

This PR allows overriding the Elasticsearch index date format. There is a new flag when using SPAN_STORAGE_TYPE=elasticsearch:

--es.index.date-format string       The date format of the Elasticsearch index names suffix, using Golang date format. https://golang.org/pkg/time/#example_Time_Format (default "2006-01-02")

Signed-off-by: Pavel Nikolov <pavel.nikolov@fairfaxmedia.com.au>

coveralls · 2018-05-19T13:35:54Z

Coverage remained the same at 100.0% when pulling 398d971 on pavelnikolov:support-custom-index-date-temaplte into 4402978 on jaegertracing:master.

yurishkuro · 2018-05-19T21:59:28Z

overall looks good, but a couple of comments:

does this PR require a corresponding change to the cron job script that cleans stale indices?
this change does not actually address the issue in Make default time span of ES index configurable #628, which was more about the frequency of index rotation than the format of the index name. In other words, the change would be to add a rotation frequency parameter, expressed as Duration, and then truncate the timestamps to that duration (https://golang.org/pkg/time/#Time.Truncate) before passing it to the date format function.
since this PR is already changing the name of the index, we might want to allow changing the rest of the name (prefix) as well, which would help Support archiving traces with ES storage #818

pavelnikolov · 2018-05-19T22:48:57Z

I have no idea about the cron-job. I'll have to get familiar with its code.
Definitely using time.Truncate gives you ultimate control over the duration of the rolling indices. I was thinking of a very basic configuration:
- monthly index: 2006-01
- daily index: 2006-01-02
- hourly index: 2006-01-02-15
  Unfortunately, we can't, for example, have new index every 4 hours or a weekly index. But I was hoping that the above 3 would cover most cases. Do you insist on having a time.Duration parameter and using time.Truncate? If yes, I believe this needs to be pushed in a separate PR.
I'll add an option to update the prefix as well.

yurishkuro · 2018-05-20T00:46:49Z

The cleaner script is here https://github.com/jaegertracing/jaeger/blob/master/plugin/storage/es/esCleaner.py

pavelnikolov · 2018-05-20T12:08:11Z

Do you think that the dependencies job would be affected as well? https://github.com/jaegertracing/spark-dependencies/blob/master/jaeger-spark-dependencies-elasticsearch/src/main/java/io/jaegertracing/spark/dependencies/elastic/ElasticsearchDependenciesJob.java#L151

yurishkuro · 2018-05-29T23:46:55Z

plugin/storage/es/spanstore/writer.go

@@ -76,6 +77,7 @@ type Span struct {
 // NewSpanWriter creates a new SpanWriter for use
 func NewSpanWriter(
 	client es.Client,
+	dateFormat string,


since we're breaking the API, let's use a struct instead

type SpanWriterOptions struct { dateFormat string numShards int64 numReplicas int64 } func NewSpanWriter( client es.Client, logger *zap.Logger, metricsFactory metrics.Factory, options SpanWriterOptions) { }

It will be easier to extend if we want to make index name configurable

yurishkuro · 2018-05-29T23:49:42Z

plugin/storage/es/spanstore/writer_test.go

+
+		dateFormat string
+
+		expectedSpanIndexName    string


blank lines not needed

yurishkuro · 2018-05-29T23:50:12Z

Yes, Spark job will be affected as well. But it would have to be a different PR.

CpuID · 2018-05-30T04:20:51Z

@pavelnikolov there has been some discussion in #799 re configurable index name prefixes, which is related to the changes being made here (your changes so far focus on date format, but the changes need to be made in the same 3~ places and coordinated). @yurishkuro has suggested his preference would be to make both sets of changes together to save doing the coordinated changes twice.

@pavelnikolov can you take a look over #799 and see if you feel up to making the suggested changes in this PR? I am happy to help/provide input if you need also, was planning to PR separately otherwise :)

pavelnikolov · 2018-06-11T08:49:05Z

I’ll check soon. I’ve been too busy lately with other things.

…

Sent from my phone

On 30 May 2018, at 14:21, Nathan Sullivan ***@***.***> wrote: @pavelnikolov there has been some discussion in #799 re configurable index name prefixes, which is related to the changes being made here (your changes so far focus on date format, but the changes need to be made in the same 3~ places and coordinated). @yurishkuro has suggested his preference would be to make both sets of changes together to save doing the coordinated changes twice. @pavelnikolov can you take a look over #799 and see if you feel up to making the suggested changes in this PR? I am happy to help/provide input if you need also, was planning to PR separately otherwise :) — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

pavolloffay · 2018-11-20T14:15:43Z

I don't think we should go with this approach - configurable date format option. The biggest downside is the limitation on retention configuration:

monthly index: 2006-01
daily index: 2006-01-02
hourly index: 2006-01-02-15
yearly 2006

not possible to do weekly or N days/hours retention.

It's also harder to configure and ES cleaner would not look nice either - re date format in python is different.

I think we should go with truncation (span.StartTime.Truncate(d)) with minimal value of 1 day.

The reader would use the es.index.trunc-time to get the list of indexes based on es.max-span-age.

// Returns the array of indices that we need to query, based on query params
func (s *SpanReader) indicesForTimeRange(indexName string, startTime time.Time, endTime time.Time) []string {
	var indices []string
	firstIndex := indexWithDate(indexName, startTime, s.dateFormat)
	currentIndex := indexWithDate(indexName, endTime, s.dateFormat)
	for currentIndex != firstIndex {
		indices = append(indices, currentIndex)
                // use trunc time
		endTime = endTime.Add(-s.truncTime * time.Hour)
		currentIndex = indexWithDate(indexName, endTime, s.dateFormat)
	}
	return append(indices, firstIndex)
}

jaegerIndices := s.indicesForTimeRange(s.serviceIndexPrefix, currentTime.Add(-s.maxSpanAge), currentTime)

pavolloffay · 2018-11-20T14:24:08Z

cc @jaegertracing/elasticsearch

yurishkuro · 2018-11-20T14:43:51Z

Was there a specific reason someone asked specifically for date format rather than truncation?

pavolloffay · 2018-11-20T14:49:45Z

from #628 (comment)

At the moment all time based indices that we have in our ES clusters are named like _YYYY.MM.DD. And we have a single cronjob which deletes indices with data older than some configured number of days. Jaeger indices are the only exception to this naming convention and we have to run/maintain a separate cronjob just for Jaeger. Not a big deal, but it's a little annoying : )

I don't know how @pavelnikolov deletes indices. but IIRC curator should be format agnostic the creation date is retrieved from index properties.

masteinhauser · 2018-11-20T15:00:02Z

Was there a specific reason someone asked specifically for date format rather than truncation?

As long as the effect is the same: a span is written to the expected index, the means likely do not matter as long as the span's original timestamp is retained.

IIRC curator should be format agnostic the creation date is retrieved from index properties.

At least in later versions of elasticsearch-curator package, this is correct.

We are actually moving away from jaeger's time-based index naming entirely and choosing to use the Elasticsearch Rollover API as it scales more predictably for us. Example: 25GB per-shard, automatically, which is ideal for our write/read workloads (both human, and automated anomaly detection)
This is also what we use with our FluentD configuration in Kubernetes for per-team centralized logging.

not possible to do weekly or N days/hours retention.

At least in all of our usage, this largely doens't make sense in both our high-usage and low-usage indexes. The Elasticsearch Rollover API is a far better solution, from Elasticsearch's perspective, than sharding on time-based indices.

yurishkuro · 2018-11-20T15:10:25Z

Rollover API looks pretty interesting, thanks for pointing it out. Perhaps it can also be used to avoid iterating through indices at query time.

pavolloffay · 2018-11-20T15:45:13Z

Perhaps it can also be used to avoid iterating through indices at query time.

I think the reader would have to use a different index - alias https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html. Once the write index is rolled over the old index has to be moved to read alias. Perhaps we could use this to manage retention in archive index?

pavolloffay · 2018-11-26T14:20:43Z

@pavelnikolov I will close this PR for two reasons:

it does not seem as the right approach to configure retention - not flexible enough. Better is to use rollover API or truncate timestamp on configurable duration.
it will complicate the index configuration on index cleaner and spark job - they don't use golang time format.

Allow configurable elasticsearch index date format

081d6cc

Signed-off-by: Pavel Nikolov <pavel.nikolov@fairfaxmedia.com.au>

pavelnikolov requested review from black-adder, pavolloffay, vprithvi and yurishkuro as code owners May 19, 2018 13:25

yurishkuro mentioned this pull request May 29, 2018

Make Elasticsearch index prefix configurable #799

Closed

Merge branch 'master' into support-custom-index-date-temaplte

9cf0048

yurishkuro requested a review from jpkrohling as a code owner May 29, 2018 23:41

ghost assigned yurishkuro May 29, 2018

ghost added the review label May 29, 2018

yurishkuro reviewed May 29, 2018

View reviewed changes

plugin/storage/es/spanstore/writer_test.go

dateFormat string

expectedSpanIndexName string

Copy link

Member

yurishkuro May 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank lines not needed

Merge branch 'master' into support-custom-index-date-temaplte

398d971

pavolloffay added the storage/elasticsearch label Aug 3, 2018

pavolloffay mentioned this pull request Nov 20, 2018

Allow configurable elasticsearch index date format #1201

Closed

pavolloffay closed this Nov 26, 2018

ghost removed the review label Nov 26, 2018

This was referenced Jan 21, 2019

WIP: Allow configuring es index rotation frequency #1290

Closed

Make default time span of ES index configurable #628

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow configurable elasticsearch index date format #840

Allow configurable elasticsearch index date format #840

pavelnikolov commented May 19, 2018 •

edited by yurishkuro

Loading

coveralls commented May 19, 2018 •

edited

Loading

yurishkuro commented May 19, 2018

pavelnikolov commented May 19, 2018 •

edited

Loading

yurishkuro commented May 20, 2018

pavelnikolov commented May 20, 2018

yurishkuro May 29, 2018 •

edited

Loading

yurishkuro May 29, 2018

yurishkuro commented May 29, 2018

CpuID commented May 30, 2018

pavelnikolov commented Jun 11, 2018 via email

pavolloffay commented Nov 20, 2018

pavolloffay commented Nov 20, 2018

yurishkuro commented Nov 20, 2018

pavolloffay commented Nov 20, 2018

masteinhauser commented Nov 20, 2018

yurishkuro commented Nov 20, 2018

pavolloffay commented Nov 20, 2018

pavolloffay commented Nov 26, 2018

Allow configurable elasticsearch index date format #840

Allow configurable elasticsearch index date format #840

Conversation

pavelnikolov commented May 19, 2018 • edited by yurishkuro Loading

Which problem is this PR solving?

Short description of the changes

coveralls commented May 19, 2018 • edited Loading

yurishkuro commented May 19, 2018

pavelnikolov commented May 19, 2018 • edited Loading

yurishkuro commented May 20, 2018

pavelnikolov commented May 20, 2018

yurishkuro May 29, 2018 • edited Loading

Choose a reason for hiding this comment

yurishkuro May 29, 2018

Choose a reason for hiding this comment

yurishkuro commented May 29, 2018

CpuID commented May 30, 2018

pavelnikolov commented Jun 11, 2018 via email

pavolloffay commented Nov 20, 2018

pavolloffay commented Nov 20, 2018

yurishkuro commented Nov 20, 2018

pavolloffay commented Nov 20, 2018

masteinhauser commented Nov 20, 2018

yurishkuro commented Nov 20, 2018

pavolloffay commented Nov 20, 2018

pavolloffay commented Nov 26, 2018

pavelnikolov commented May 19, 2018 •

edited by yurishkuro

Loading

coveralls commented May 19, 2018 •

edited

Loading

pavelnikolov commented May 19, 2018 •

edited

Loading

yurishkuro May 29, 2018 •

edited

Loading