Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Determine how to address the next Elasticsearch index pattern break #2219

Closed
codefromthecrypt opened this issue Oct 25, 2018 · 23 comments
Closed
Labels
elasticsearch Elasticsearch storage component

Comments

@codefromthecrypt
Copy link
Member

Many of you will recall that we had to redo everything last change to the index constraints: dropping of multi-type.

One thing we decided to do to accommodate this was to use a delimited as clearly we need to have separate indexing and retention for spans vs dependency links.

Unfortunately, we seem bit again.. Looks like Elasticsearch will also drop support for the colon pattern soon.

[2018-10-24T11:32:00,033][WARN ][o.e.d.c.m.MetaDataCreateIndexService] index or alias name [zipkin:span-2018-10-24] containing ':' is deprecated and will not be supported in Elasticsearch 7.0+

First, we should strongly push back on Elasticsearch as this causes another big round of work for us amplified by the ecosystem. It isn't quite cool to have to keep hitting mines like this. The knock on effects include double-reads, data migrations, all sorts of things, it isn't fun especially for a volunteer group.

Anyway, assuming pushback fails we need to address this thing:

  • figure out which character is ok now.
  • make conditional code (again) to address this not only here, but also in the dependency linker which is completely different codebase
  • communicate to custom ingesters about this and why and what to do about it.
@codefromthecrypt codefromthecrypt added the elasticsearch Elasticsearch storage component label Oct 25, 2018
@xeraa
Copy link
Member

xeraa commented Oct 25, 2018

Background: elastic/elasticsearch#23892

@codefromthecrypt
Copy link
Member Author

@xeraa yeah so I think the extent of this, tell me if I'm wrong, is that all people's stuff, their cleanup jobs, kibana searches etc will now have to do split mode right? How are you communicating to customers the work they have to do? I would like to be able to link to that as this is likely a big chore not just on our codebase.

@codefromthecrypt
Copy link
Member Author

I made a note requesting the same to the upstream issue

@xeraa
Copy link
Member

xeraa commented Oct 25, 2018

I might be reading https://github.com/elastic/elasticsearch/pull/26247/files wrong, but this should only apply to the creation of new indices in 7.0+. So you can keep using the old pattern for reads / searches, but it will indeed need a migration path for new indices and searching the combination of old and new indices.

We might not be aware how common colons are in index names, so this is definitely good feedback. With the collision for cross cluster search I'd assume that we want to make this switch, but maybe we can ease the migration path.

Hackish workaround: Configurable separator and existing indices get an alias to conform to the new pattern (though that was a 30s thought process at 3am, so this might totally not work for various reasons).

@codefromthecrypt
Copy link
Member Author

If it were me, I would probably have tried to hide the details of the cross-cluster search in such as way that it would be less impactful, for example through escaping or otherwise vs expecting everyone to change their clusters. Not everyone grants us ability to change the index templates, so this will shift folks. I've again asked to not do this, so it can be formally accepted for reconsideration or more formally not reconsidered.. Honestly, that issue if the only issue about colons had very little publicly visible consideration. We already have tons more thought in how to work around it, this seems a bad call elastic/elasticsearch#23892 (comment)

@codefromthecrypt
Copy link
Member Author

mighty heads up @openzipkin/elasticsearch 7.x will drop the ability to create indexes with colons and 8.x will drop the ability to read them. This forces us into a migration concern including likely some logic change to do dual-reads (performance affecting), like we had to last time. Aliasing is another choice, but that would require a script of some sort which we don't have a concrete example of yet. I'm still hoping ES can be creative and stop this train, but just in case, your thoughts are welcome.

@codefromthecrypt
Copy link
Member Author

so @xeraa had some tips.. if things keep going this way off head I think the impact is..

  • we have to choose a different delimiter than ':', even if parameterizing this, we still need a default. We can't predict if ES will blacklist a new one so I really want someone at ES to actively confirm the replacement.
  • We have to change our delimiter when we detect ES 7+ I suggest we do it like this as that defers maintenance for others vs forcing a change to unaffected versions. For example, this makes the decision site specific and aligns it with the root cause (Deprecate : in cluster and index names elastic/elasticsearch#23892) as opposed to what might feel like arbitrary busy work.
    • this implies a change to the index template in our side, and also on their side. This will almost certainly break people's sites who use secondary templates.
    • they will likely still have data so we'll need to form dual index patterns which in some cases will overflow the max query length (again)
    • beyond this, all their maintenance scripts, and any other ES tooling need to be updated. I really want ES to own this problem, ideally they will put together a list of things to do vs expecting us and everyone else in the world to do this individually.

@codefromthecrypt
Copy link
Member Author

FYI here's what our dependency linker code does for the current list of index changes:

  public void run() {
    run( // multi-type index
        index + "-" + dateStamp + "/span",
        index + "-" + dateStamp + "/dependencylink",
        SpanBytesDecoder.JSON_V1);

    run( // single-type index
        index + ":span-" + dateStamp + "/span",
        index + ":dependency-" + dateStamp + "/dependency",
        SpanBytesDecoder.JSON_V2);
    log.info("Done");
  }

Depending on if this is strictly just a delimiter issue, or more, it might be little or more than a little impact

@codefromthecrypt
Copy link
Member Author

I have a work testing now, which will use single-character wildcard on search _ and explicit characters on insert. So, for <7 index-type delimiter : and >=7 -. It occurred to me that since we use the hyphen delimiter elsewhere, if this becomes banned by ES in the future it will break everything else. Instead of waiting for a special character to be cleared and not working with ES7 at all, better to assume they won't break hyphen until that's not the case.

@codefromthecrypt
Copy link
Member Author

hmm maybe "_" isn't allowed in the POST urls...

@codefromthecrypt
Copy link
Member Author

PS we can't really test all of this independently as there is more than just the one issue here before Elasticsearch 7 works

@xeraa
Copy link
Member

xeraa commented Feb 18, 2019

Leading _ are not allowed for index names. Unfortunately we don't have whitelisted characters in our docs ATM, but both - and _ are very widely used, so I have high hopes that those will stay around 😅

@codefromthecrypt
Copy link
Member Author

not sure if #2428 is related.. for example if windows never supported colons in index names, or if this was a bug later fixed.

@gquintana
Copy link
Contributor

gquintana commented Apr 17, 2019

Instead of debating which character to use when building index names. What about making it configurable and let the developer choose the index pattern (with sensible defaults). In Logstash you can configure logstash-%{type}-%{+YYYY.MM.dd} for instance, it's very convenient, and it can be backward compatible. Even with Elasticsearch 6.x I'd like to drop the colon from index name, in order to ease upgrade to Elasticsearch 7.x.

About index templates, and mappings in particular, instead of trying to build a generic and complicated VersionSpecificTemplates class which can handle all versions. What about just having a template file for each version like in Logstash (in https://github.com/logstash-plugins/logstash-output-elasticsearch/tree/master/lib/logstash/outputs/elasticsearch see elasticsearch-template-*.json).

@codefromthecrypt
Copy link
Member Author

codefromthecrypt commented Apr 17, 2019 via email

codefromthecrypt pushed a commit that referenced this issue Apr 17, 2019
@gquintana
Copy link
Contributor

Associating the delimiter change to ES version change is not a good thing to me https://github.com/apache/incubator-zipkin/blob/6a3b6446f011bacf497b0860f4a5c7cce3f7ae85/zipkin-storage/elasticsearch/src/main/java/zipkin2/elasticsearch/IndexTemplates.java#L38
I am beginning with Zipkin+ES6 now, if I can change the delimiter now, the ES7 upgrade will be smoother.

@timothyBrake
Copy link

Can I asume Zipkin can't use the ElasticSearch 7.0 as storage until this issue is fixt or is there a workaround?

@codefromthecrypt
Copy link
Member Author

codefromthecrypt commented Apr 19, 2019 via email

@xeraa
Copy link
Member

xeraa commented Apr 19, 2019

Unfortunately there is no workaround on the Elasticsearch side (like a configuration flag): in 7.x you cannot create an index with a colon in the name and in 8.x you won't be able to read such an index any more.

@codefromthecrypt
Copy link
Member Author

codefromthecrypt commented Apr 19, 2019 via email

@timothyBrake
Copy link

No pressure as there are other storage options ;-)

@codefromthecrypt
Copy link
Member Author

started mail thread on this because it is a constant concern and we have no choice but to formalize a support model around the problem https://lists.apache.org/thread.html/73c2efa69e3ff0a519c6b6c2f5e551159c34902c29df01b2703e9126@%3Cdev.zipkin.apache.org%3E

@codefromthecrypt
Copy link
Member Author

codefromthecrypt commented May 3, 2019

This will go out in next release after 2.13. For the impatient, you can grab off master like this for testing.

$ TAG=master-SNAPSHOT
$ curl -sSL https://jitpack.io/com/github/apache/incubator-zipkin/zipkin-server/${TAG}/zipkin-server-${TAG}-exec.jar > zipkin.jar
$ STORAGE_TYPE=elasticsearch java -jar zipkin.jar 

Note: the implementation reads both index/type delimiters which should allow for smooth upgrading, for some definition of smooth.

Massive notes here: https://github.com/apache/incubator-zipkin/blob/master/zipkin-storage/elasticsearch/src/main/java/zipkin2/elasticsearch/internal/IndexNameFormatter.java#L73

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
elasticsearch Elasticsearch storage component
Projects
None yet
Development

No branches or pull requests

4 participants