
[cassandra] Remove SASI index from dependencies table #793

Closed
wants to merge 2 commits into master from remove-SASI-index

Conversation

vprithvi
Contributor

@vprithvi commented Apr 27, 2018

  • Remove SASI indexes so that users can run older Cassandra versions or ScyllaDB (see the schema sketch below)
  • Provide a migration script for current users

ref #790
Signed-off-by: Prithvi Raj p.r@uber.com
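
For orientation, a rough sketch of the replacement schema this PR converges on, pieced together from the review excerpts below. The keyspace name, the dependencies payload column, and the exact compaction options are assumptions carried over from the existing Jaeger schema rather than taken verbatim from this page:

CREATE TABLE jaeger_example_keyspace.dependencies (   -- keyspace name is a placeholder
    date_bucket  bigint,      -- partition key: coarse time bucket that replaces the SASI index on ts
    ts           timestamp,   -- clustering column: timestamp of the dependency snapshot
    dependencies list<frozen<dependency>>,   -- assumed: same payload column and UDT as the old table
    PRIMARY KEY (date_bucket, ts)
) WITH CLUSTERING ORDER BY (ts DESC)
    AND compaction = {
        'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
        'min_threshold': '4',
        'max_threshold': '32'
    };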

WITH compaction = {
PRIMARY KEY (bucket, ts)
) WITH CLUSTERING ORDER BY (ts DESC)
AND compaction = {
'min_threshold': '4',
'max_threshold': '32',
'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
Member

actually delete SASI below

ts timestamp,
ts_index timestamp,
ts timestamp,
date_bucket text
Member

shouldn't this be int?

Contributor Author

I initially had buckets in a different format. This can be int, or bigint if we anticipate smaller (finer-grained) buckets in the future.

ts timestamp,
ts_index timestamp,
ts timestamp,
date_bucket text
Member

it's customary to list PK fields at the top

PRIMARY KEY (ts)
)
WITH compaction = {
PRIMARY KEY (bucket, ts)
Member

s/bucket/date_bucket

@vprithvi force-pushed the remove-SASI-index branch 2 times, most recently from f07db20 to 5e58764 on May 2, 2018 20:56
@coveralls

coveralls commented May 2, 2018

Coverage Status

Coverage remained the same at 100.0% when pulling 7d98f69 on remove-SASI-index into 4106c29 on master.

- Use traditional indexes and a date bucket field instead
- This allows people to use older versions of Cassandra

Signed-off-by: Prithvi Raj <p.r@uber.com>
@vprithvi changed the title from "WIP: Remove SASI indexes" to "Remove SASI indexes" on May 7, 2018
Contributor

@isaachier left a comment

Looks good. You might want to run shellcheck on the bash script.

#!/usr/bin/env bash

function usage {
>&2 echo "Error: $1"
Contributor

Any reason not to use a heredoc?

Contributor Author

Good question, let me try it out
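
If the heredoc pans out, the usage function from the excerpt above could look roughly like this. A sketch only; the exit code and the help text are placeholders, not lifted from the script in this PR:

function usage {
    # Same behaviour as the echo above, but the multi-line message lives in a heredoc sent to stderr.
    cat >&2 <<EOF
Error: $1
Usage: $0 <keyspace>
EOF
    exit 1
}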

@@ -93,6 +98,14 @@ func (s *DependencyStore) GetDependencies(endTs time.Time, lookback time.Duratio
return mDependency, nil
}

func getDepSelectString(startTs time.Time, endTs time.Time) string {
Contributor

Cassandra has no date range support?

Contributor Author

Cassandra does not support range queries on partition keys
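
To spell that out: with date_bucket as the partition key, a time range has to be expanded into the individual buckets it covers, with the range applied only to the ts clustering column. A hedged sketch of the kind of query getDepSelectString presumably assembles (bucket values and column names are illustrative, following the schema sketch above):

-- Allowed: equality/IN on the partition key, range on the clustering column.
SELECT ts, dependencies
FROM dependencies
WHERE date_bucket IN (17652, 17653, 17654)   -- every bucket touched by [startTs, endTs]
  AND ts >= '2018-05-01 00:00:00+0000'
  AND ts <= '2018-05-03 00:00:00+0000';

-- With the old schema (PRIMARY KEY (ts)), a plain WHERE ts >= ... AND ts <= ... is rejected,
-- which is what the SASI index was being used to work around.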

Contributor

OK that explains it.

confirm() {
read -r -p "${1:-Are you sure? [y/N]} " response
case "$response" in
[yY][eE][sS]|[yY])
Contributor

@isaachier May 7, 2018

You can probably just do if [[ "$(echo $response | tr 'a-z' 'A-Z')" =~ "Y(ES)?" ]].

Contributor Author

I don't think that would work

Contributor

I'm not saying it's necessarily easier, but why wouldn't it work?

Contributor Author

@vprithvi May 7, 2018

I don't know, I didn't spend time debugging it. Does it work for you?

bash-3.2$ response=Y && if [[ "$(echo $response | tr 'a-z' 'A-Z')" =~ "Y(ES)?" ]]; then echo "passed"; fi
bash-3.2$

Contributor

Right. I think bash regex might just be awful.
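
For the record, the failure above is more likely the quoting than bash regex itself: inside [[ ... =~ ... ]], a quoted right-hand side is matched as a literal string, so "Y(ES)?" never matches a plain Y. A sketch of a variant with the same yes/no behaviour as the original case statement:

confirm() {
    # Keep the pattern in a variable and leave it unquoted at the match site so bash
    # treats it as a regex; the anchors stop inputs like "nyet" from matching.
    local response re='^[yY]([eE][sS])?$'
    read -r -p "${1:-Are you sure? [y/N]} " response
    [[ $response =~ $re ]]
}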

@black-adder
Contributor

Changes look fine, but what's our strategy for rolling this out? Are we going to wait until a major release? If not, we probably want this to be backwards compatible.

@vprithvi
Contributor Author

vprithvi commented May 7, 2018

Changes look fine, but what's our strategy for rolling this out? Are we going to wait until a major release? If not, we probably want this to be backwards compatible.

The rollout strategy is for people to stop the Spark dependencies job (which stops writes to the dependencies table), run the migration script, and then start the new version of the Spark job. (I'm writing this up in a README.)

@vprithvi
Contributor Author

I had an offline discussion with @black-adder and we decided that it makes sense to add a new keyspace to provide some backward compatibility.

The new dependencies table will live in the new keyspace, so that jaeger-query instances that are not updated can still access old dependencies at the older location.

@yurishkuro
Member

I think I'd need more info. The main concern is data compatibility, not running an outdated jaeger-query; upgrading components is the easy part.

@vprithvi
Contributor Author

I think I'd need more info.

The idea is that we will create a new keyspace as follows:
jaeger_analytics_$dc_$version

This keyspace contains the new dependency table.

The migration path looks like the following:

  1. Run a script to create the new keyspace and schema
  2. Run the updated spark-dependencies job (which writes to the new schema)
  3. Run a data migration script to transfer old dependencies from the old table to the new table
  4. Update jaeger-query
  5. Kill the old spark-dependencies job
  6. Delete the old dependencies table in the old keyspace

The main benefit offered here is that dependencies will be available throughout the update/migration process because the old spark-dependencies job still writes in the old data format.
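
Concretely, step 1 might look something like the following. The datacenter suffix, version suffix, and replication settings are made up for illustration, following the jaeger_analytics_$dc_$version pattern above:

CREATE KEYSPACE IF NOT EXISTS jaeger_analytics_dc1_v1
    WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '3'};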


cqlsh -e "CREATE TABLE $keyspace.dependencies (
ts timestamp,
date_bucket bigint,
Member

partition key first

@yurishkuro changed the title from "Remove SASI indexes" to "[cassandra] Remove SASI index from dependencies table" on Nov 13, 2018
@yurishkuro mentioned this pull request Nov 13, 2018
@black-adder mentioned this pull request Feb 11, 2019
@vprithvi
Contributor Author

Superseded by #1328

@vprithvi closed this Feb 13, 2019
@ghost removed the review label Feb 13, 2019
@pavolloffay deleted the remove-SASI-index branch August 27, 2019 08:59