
Remove SASI index on dependency column family #790

Closed
vprithvi opened this issue Apr 26, 2018 · 5 comments

Comments

@vprithvi (Collaborator) commented Apr 26, 2018

While most SASI indexes were removed as part of #80, the one on the dependencies column family still exists. This causes problems when using older versions of Cassandra that don't support SASI indexes, or when using alternative storage backends such as ScyllaDB.

We can update the dependency schema so that it no longer uses a SASI index, and provide a migration script from the old schema to the new one.

We should also ensure that https://github.com/jaegertracing/spark-dependencies works with the new schema.

@yurishkuro (Member) commented:

Btw, I suggest adding a "source" column to the schema, potentially to represent data coming from different sources, i.e. not just from traces (where the source would be an aggregation job), but also, say, from a service mesh or network sniffing. The UI diagram can aggregate all sources together and use different visualizations to distinguish the links.

@vprithvi (Collaborator, Author) commented:

> I suggest adding a "source" column to the schema, potentially to represent data coming from different sources

I like this idea, but I don't think it should be part of this migration; I created #791 to capture it.

@yurishkuro (Member) commented:

I am not suggesting we implement all of the relevant business logic, but if we are already making a breaking schema change, why not include an extra field?

@vprithvi (Collaborator, Author) commented:

> but if we are already making a breaking schema change, why not include an extra field?

Because it's unrelated to this change and is unusable without the business logic. Why shouldn't that change be made together with the business logic?

@vprithvi (Collaborator, Author) commented:

I'm thinking that changing the data model to include a date bucket, with the timestamp as a clustering key, would let us keep the current query patterns while removing the SASI index.

The schema looks something like this:

CREATE TABLE IF NOT EXISTS ${keyspace}.dependencies (
    ts            timestamp,
    date_bucket   text,
    dependencies  list<frozen<dependency>>,
    PRIMARY KEY (date_bucket, ts)
) WITH CLUSTERING ORDER BY (ts DESC);

While the write path is largely unaffected, reads become a bit more involved, because we need to compute the set of buckets from which to retrieve dependencies.
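For illustration, here is a rough Go sketch (Jaeger's implementation language) of how the read path could derive the bucket keys for a query window. The daily bucket granularity and the "2006-01-02" key format are assumptions for the sketch, not decisions made in this issue.

package main

import (
	"fmt"
	"time"
)

// bucketsForRange returns the date buckets (one per UTC day, keyed as
// "2006-01-02") that overlap the [start, end] interval. Hypothetical sketch;
// the real granularity and key format would be settled in the PR.
func bucketsForRange(start, end time.Time) []string {
	var buckets []string
	for d := start.UTC().Truncate(24 * time.Hour); !d.After(end.UTC()); d = d.Add(24 * time.Hour) {
		buckets = append(buckets, d.Format("2006-01-02"))
	}
	return buckets
}

func main() {
	end := time.Now()
	start := end.Add(-48 * time.Hour)
	// The read path would then issue one query per bucket, e.g.
	// SELECT dependencies FROM dependencies WHERE date_bucket = ? AND ts >= ? AND ts <= ?
	fmt.Println(bucketsForRange(start, end))
}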

We also need to update the spark dependencies job.

The migration path seems to be the following:

  1. Stop dependencies job
  2. Use the Cassandra COPY command to export to a CSV file (which works for tables with fewer than 2 million rows)
  3. Delete existing dependencies column family
  4. Create dependencies column family with new schema
  5. Massage the CSV file into the new format and load it into the new schema (see the sketch after this list)
  6. Run new version of dependencies job which writes to new schema
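To make step 5 a bit more concrete, here is a hypothetical Go sketch that rewrites the exported CSV by prepending a date_bucket derived from each row's timestamp. The CSV column order (ts, dependencies), the timestamp layout, and the daily bucket format are assumptions and would need to match what cqlsh COPY actually produces and what the new table expects.

package main

import (
	"encoding/csv"
	"io"
	"log"
	"os"
	"time"
)

// Assumption: the exported timestamps look like "2018-04-26 00:00:00.000+0000";
// adjust the layout to whatever cqlsh COPY actually emits.
const tsLayout = "2006-01-02 15:04:05.000-0700"

// Reads the exported CSV (assumed columns: ts, dependencies) from stdin and
// writes rows with a derived date_bucket column to stdout, ready to be loaded
// into the new table with COPY ... FROM using the matching column order.
func main() {
	in := csv.NewReader(os.Stdin)
	out := csv.NewWriter(os.Stdout)
	defer out.Flush()

	for {
		rec, err := in.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatalf("reading row: %v", err)
		}
		ts, err := time.Parse(tsLayout, rec[0])
		if err != nil {
			log.Fatalf("bad timestamp %q: %v", rec[0], err)
		}
		bucket := ts.UTC().Format("2006-01-02") // one bucket per UTC day (assumption)
		if err := out.Write(append([]string{bucket}, rec...)); err != nil {
			log.Fatalf("writing row: %v", err)
		}
	}
}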

@yurishkuro @black-adder @jpkrohling @pavolloffay wdyt?
