-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch to use Cassandra-3.11.3's DelimiterAnalyzer and PREFIX mode SASI for the annotations index in the Zipkin2 Cassandra storage #1948
Conversation
@@ -33,12 +33,14 @@ CREATE TABLE IF NOT EXISTS zipkin2.span_by_service ( | |||
AND speculative_retry = '95percentile' | |||
AND comment = 'Secondary table for looking up span names by a service name.'; | |||
|
|||
DROP INDEX IF EXISTS zipkin2.span_annotation_query_idx; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was this there before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only if the user has already used zipkin-storage/zipkin2_cassandra
and created the schema.
(This is the default name of an index created, ie from CREATE … INDEX … ON zipkin2.span (annotation_query) …
)
And if that is the case then the code won't actually be calling this cql file.
So this line only has any purpose if the cql file is going to be manually run.
This comes back to this question…
upgrade schema approach, or require users to drop keyspace if already using this storage backend
If we're happy to say anyone already using the new schema is expected to either:
- manually run
cqlsh -f zipkin2-schema-indexes.cql
, or - drop keyspace and let it be recreated from scratch.
Then this approach is ok.
If not then we need something more along the lines of zipkin-storage/cassandra/src/main/resources/cassandra-schema-cql3-upgrade-1.txt
looks sweet. should ping @openzipkin/cassandra about those running "cassandra3" if they can upgrade. I like this. Will be nice to see the storage impact before and after, and some throughput difference. |
Preconditions.checkState( | ||
0 < VersionNumber.parse("3.11.3").compareTo(host.getCassandraVersion()), | ||
"All Cassandra nodes must be running 3.11.3+"); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was a bit confused when writing this…
Isn't this type of check done somewhere else in the codebase?
cql version check is done elsewhere.. I don't think impl version is...
|
c1da341
to
33e32d8
Compare
Here you go @adriancole …
It's visible from these that the new delimiter SASI annotations index is almost as fast as with no annotations index, both of which are ~20x as fast as the previous CONTAINS SASI annotations index.
|
The original benchmarking graph is available here: http://michaelsembwever.github.io/zipkin_1948.html |
33e32d8
to
9cca0c1
Compare
@drolando fyi cassandra 3.11.3 is coming fast and once this merges should affect indexing a lot |
@adriancole @michaelsembwever I'll wait until C* 3.11.3 is out before upgrading to Cassandra 3 then. |
@michaelsembwever Any idea about when they'll release the new version? I saw a blog post from 2 years ago about Cassandra doing monthly releases but that doesn't seem to be the case at all. |
9cca0c1
to
bfaddff
Compare
rebased in preparation of the pending release of cassandra 3.11.3 |
@@ -104,8 +104,7 @@ | |||
Input input = | |||
new AutoValue_SelectTraceIdsFromSpan_Input( | |||
serviceName, | |||
// % for like, bracing with ░ to ensure no accidental substring match | |||
"%░" + annotationKey + "░%", | |||
annotationKey, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we wouldn't want the trailing %
? I'm guessing that without the trailing %
it will just do a strict match vs a partial prefix right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah this one is the exact match thing which our api uses
I would expect we arent looking to change the api to substring match as a
part of this change. I think we have negative tests for this actually.
the like part was an impl detail of the old concat indexing but the special
bounded it around the query
|
I didn't realize we had negative tests for this. I was looking forward to the ability to do prefixed based searches :/ It makes sense not to change API contracts for this right now though. P.S. sorry for closing this. Comment and Close is way to close to the Comment button :/ |
@@ -33,13 +33,14 @@ CREATE TABLE IF NOT EXISTS zipkin2.span_by_service ( | |||
AND speculative_retry = '95percentile' | |||
AND comment = 'Secondary table for looking up span names by a service name.'; | |||
|
|||
DROP INDEX IF EXISTS zipkin2.span_annotation_query_idx; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question.. I guess this means that zipkin2.span (annotation_query)
results in an index named zipkin2.span_annotation_query_idx
? so essentially we are re-creating that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
second question: does dropping this cleanup the data associated? and third: if we create an index below, is there a command we can use to rebuild the index based on old data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
first question: yes. re-creating it with different mode and analyzer_class
second question: yes it's drops the data. creating the new index will go over the base data and rebuild it into the new index.
bfaddff
to
22f1445
Compare
fyi rebased this |
@@ -97,6 +99,11 @@ static KeyspaceMetadata getKeyspaceMetadata(Session session) { | |||
} | |||
|
|||
static KeyspaceMetadata ensureExists(String keyspace, boolean searchEnabled, Session session) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this means when you ask to automatically install the schema, we crash if an older version. Sounds fine as long as ENSURE_SCHEMA=false doesn't kill others
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If any of the nodes in the Cassandra cluster are not 3.11.3 then the Zipkin server will crash.
For example someone accidently starts Zipkin up when the Cassandra cluster is mid-way through a rolling upgrade to 3.11.3
@@ -104,8 +104,7 @@ | |||
Input input = | |||
new AutoValue_SelectTraceIdsFromSpan_Input( | |||
serviceName, | |||
// % for like, bracing with ░ to ensure no accidental substring match |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could do something like detect if the indexing type is updated. Ex if the mode=CONTAINS then box annotationKey like before?
22f1445
to
ec26344
Compare
rebased. |
@adriancole Cassandra-3.11.3 is out. |
…tions index in the Zipkin2 Cassandra storage ref: #1861
ec26344
to
2f304ad
Compare
Thanks much! we can release shortly |
So instructions for folks is to basically recreate their keyspace right? (or use a different name), correct? |
hot dog! |
The recommended approach would be to drop and create the keyspace. Advanced users could just run For example:
|
great!
|
out in zipkin 2.11 |
This PR is waiting for Cassandra-3.11.3 to first be released.
It switches the SASI used for the annotation_query from the expensive (cpu, latency and disk space)
CONTAINS NonTokenizingAnalyzer
to the fast efficientDelimiterAnalyzer
that was provided by @zuochangan and made available via CASSANDRA-14247.ref: #1861
To test Zipkin with the new Delimiter Tokenizer, which will be available in Cassandra-3.11.3, do the following steps:
To stress-test the Zipkin schema that uses the new Delimiter Tokenizer