Remove SASI indices #1328

Merged
black-adder merged 12 commits into master from remove_sasi_indices on Feb 14, 2019

3 participants
@black-adder
Collaborator

commented Feb 11, 2019

Signed-off-by: Won Jun Jang <wjang@uber.com>

A lot of Copy Pasta from #793. Resolves #790.

Migration Path:

  1. Run v001tov002part1.sh, which copies the dependencies into a CSV file, updates the dependency UDT, creates a new dependenciesv2 table, and writes the dependencies from the CSV file into the dependenciesv2 table.
  2. Run the collector and query services with the Cassandra flag dependency-sasi-disabled=true, which makes Jaeger read from the new dependenciesv2 table.
  3. Update the Spark job to write to the new dependenciesv2 table (this is a TODO).
  4. Run v001tov002part2.sh, which deletes the old dependency table and the SASI index.

Users who wish to keep using the SASI indices don't have to do anything, as the Cassandra flag dependency-sasi-disabled defaults to false. Ideally, users will migrate on their own timeline and we can remove support for the old table in a major version release.

  • Fix crossdock
  • Actually test
- query := s.session.Query(depsSelectStmt, endTs.Add(-1*lookback), endTs)
+ startTs := endTs.Add(-1 * lookback)
+ var query cassandra.Query
+ switch s.indexMode {

@vprithvi

vprithvi Feb 11, 2019

Member

Would GetDependencies be simpler without this switch? Instead, we could read from SASIDisabled first and, if that fails, read from SASIEnabled.
When migrating, this would mean that users don't need to redeploy the query service after moving over the dependency table.

@black-adder

black-adder Feb 11, 2019

Author Collaborator

I'm not a fan of adding the extra roundtrip time. Additionally, if the user has opted in to migrating to the new table, redeploying the query service will not be the most strenuous thing.
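
To make the trade-off in this exchange concrete, here is a minimal Go sketch that contrasts the two approaches. It is not the code from this PR: `tableFor`, `readWithFallback`, `onlyOldTable`, and `errNoTable` are hypothetical names, and the read function is a stand-in for a real Cassandra query. The only point it illustrates is that the fallback approach spends a second round trip whenever the first table is unavailable.

```go
package main

import (
	"errors"
	"fmt"
)

// IndexMode is a hypothetical stand-in for the mode discussed in this thread.
type IndexMode int

const (
	SASIEnabled IndexMode = iota
	SASIDisabled
)

// errNoTable simulates a read against a table that has not been created.
var errNoTable = errors.New("table does not exist")

// tableFor picks the table up front, so each read costs one round trip.
func tableFor(mode IndexMode) (string, error) {
	switch mode {
	case SASIEnabled:
		return "dependencies", nil
	case SASIDisabled:
		return "dependenciesv2", nil
	default:
		return "", fmt.Errorf("unknown index mode %d", mode)
	}
}

// readWithFallback tries the new table first and the old one second.
// It returns the table that answered and the number of round trips spent.
func readWithFallback(read func(table string) error) (string, int, error) {
	var err error
	trips := 0
	for _, table := range []string{"dependenciesv2", "dependencies"} {
		trips++
		if err = read(table); err == nil {
			return table, trips, nil
		}
	}
	return "", trips, err
}

// onlyOldTable simulates a cluster that has not been migrated yet.
func onlyOldTable(table string) error {
	if table == "dependencies" {
		return nil
	}
	return errNoTable
}

func main() {
	table, _ := tableFor(SASIDisabled)
	fmt.Println("switch:", table, "in 1 round trip")

	table, trips, _ := readWithFallback(onlyOldTable)
	fmt.Println("fallback:", table, "in", trips, "round trips")
}
```

With the fallback, an unmigrated cluster pays for two reads on every call, which is the extra round trip the author objects to; with the switch, the cost moves to a one-time redeploy of the query service.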

Port int `yaml:"port"`
Authenticator Authenticator `yaml:"authenticator"`
DisableAutoDiscovery bool `yaml:"disable_auto_discovery"`
DependencySASIDisabled bool `yaml:"dependency_sasi_disabled"`

@vprithvi

vprithvi Feb 12, 2019

Member

Could this be simplified to SASIDisabled?

@black-adder

black-adder Feb 12, 2019

Author Collaborator

I'd rather it be more explicit, but I can simplify.

Parent: d.Parent,
Child: d.Child,
CallCount: int64(d.CallCount),
}
if s.indexMode == SASIEnabled {
dep.Source = string(d.Source)

@vprithvi

vprithvi Feb 12, 2019

Member

Does this assignment need to be conditional? (Also - shouldn't this be set for SASIDisabled?)

@black-adder

black-adder Feb 12, 2019

Author Collaborator

Yes, if you attempt to add the source field to a schema that doesn't support it, C* will error. And good catch, it should be for disabled.

@vprithvi

vprithvi Feb 12, 2019

Member

Yes, but that doesn't answer my question; the query string for SASIDisabled doesn't include this field anyway.

@black-adder

black-adder Feb 12, 2019

Author Collaborator

I'll test it and let you know.

suffixCA = ".tls.ca"
suffixServerName = ".tls.server-name"
suffixVerifyHost = ".tls.verify-host"
suffixDependencySASIDisabled = ".dependency-sasi-disabled"

@vprithvi

vprithvi Feb 12, 2019

Member

Could we use sasi-disabled instead of dependency-sasi-disabled?

}

// NewDependencyStore returns a DependencyStore
func NewDependencyStore(
session cassandra.Session,
metricsFactory metrics.Factory,
logger *zap.Logger,
indexMode IndexMode,

@vprithvi

vprithvi Feb 12, 2019

Member

Is there anything preventing someone from passing in an undefined IndexMode, e.g. IndexMode(10)?

@black-adder

black-adder Feb 12, 2019

Author Collaborator

I'll validate here.
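
One way such validation could look is sketched below. This is an assumption-laden illustration, not the PR's implementation: the constructor is reduced to the single `indexMode` parameter (the real `NewDependencyStore` quoted above also takes a session, a metrics factory, and a logger), and returning an `error` from it is a hypothetical choice made for this example.

```go
package main

import (
	"fmt"
)

// IndexMode is a hypothetical stand-in for the type discussed above.
type IndexMode int

const (
	SASIEnabled IndexMode = iota
	SASIDisabled
)

// DependencyStore is reduced to the one field relevant to this sketch.
type DependencyStore struct {
	indexMode IndexMode
}

// NewDependencyStore rejects any mode that is not one of the declared
// constants, so IndexMode(10) never reaches the query-building code.
func NewDependencyStore(indexMode IndexMode) (*DependencyStore, error) {
	switch indexMode {
	case SASIEnabled, SASIDisabled:
		return &DependencyStore{indexMode: indexMode}, nil
	default:
		return nil, fmt.Errorf("invalid index mode: %d", indexMode)
	}
}

func main() {
	if _, err := NewDependencyStore(IndexMode(10)); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

Listing the accepted constants explicitly in the `switch` means a newly added mode is rejected until someone opts it in, which is stricter than a range check.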

@@ -106,7 +106,14 @@ func (f *Factory) CreateSpanWriter() (spanstore.Writer, error) {

  // CreateDependencyReader implements storage.Factory
  func (f *Factory) CreateDependencyReader() (dependencystore.Reader, error) {
- return cDepStore.NewDependencyStore(f.primarySession, f.primaryMetricsFactory, f.logger), nil
+ var (
+ sasiDisabled = f.Options.GetPrimary().DependencySASIDisabled

@vprithvi

vprithvi Feb 12, 2019

Member

I think the following reads simpler:

indexMode := cDepStore.SASIEnabled
if f.Options.GetPrimary().DependencySASIDisabled {
indexMode = cDepStore.SASIDisabled
}
Parent: "goo",
Child: "gle",
Parent: "bi",
Child: "ng",

@yurishkuro

yurishkuro Feb 12, 2019

Member

quit raising entropy

SASIDisabled

depsInsertStmtSASI = "INSERT INTO dependencies(ts, ts_index, dependencies) VALUES (?, ?, ?)"
depsInsertStmt = "INSERT INTO dependenciesv2(ts, date_bucket, dependencies) VALUES (?, ?, ?)"

@yurishkuro

yurishkuro Feb 12, 2019

Member

Can we do dependencies_v2?


-- compaction strategy is intentionally different as compared to other tables due to the size of dependencies data
CREATE TABLE IF NOT EXISTS jaeger.dependenciesv2 (
date_bucket bigint,

@yurishkuro

yurishkuro Feb 12, 2019

Member

I'd prefer that we define it as a ts_bucket timestamp field. The code can truncate the timestamp to day precision, or we can even make it configurable if someone wants finer control (e.g., maybe they are going to save deps every minute).
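
The truncation suggested in this comment can be sketched in Go with `time.Time.Truncate`. The names `tsBucket`, `bucketString`, `dayPrecision`, and `minutePrecision` are hypothetical, and normalizing to UTC before truncating is an assumption of this sketch rather than something the thread specifies.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical precisions; a configurable deployment could pick either.
const (
	dayPrecision    = 24 * time.Hour
	minutePrecision = time.Minute
)

// tsBucket truncates ts to the given precision. Converting to UTC first
// makes a 24-hour precision land on UTC midnight.
func tsBucket(ts time.Time, precision time.Duration) time.Time {
	return ts.UTC().Truncate(precision)
}

// bucketString is a demo helper: it parses an RFC 3339 timestamp and
// returns its bucket formatted as RFC 3339.
func bucketString(ts string, precision time.Duration) (string, error) {
	parsed, err := time.Parse(time.RFC3339, ts)
	if err != nil {
		return "", err
	}
	return tsBucket(parsed, precision).Format(time.RFC3339), nil
}

func main() {
	for _, p := range []time.Duration{dayPrecision, minutePrecision} {
		bucket, err := bucketString("2019-02-12T15:04:05Z", p)
		if err != nil {
			panic(err)
		}
		fmt.Println(p, "->", bucket)
	}
}
```

Keeping the bucket as a timestamp, as the comment proposes, lets the same column serve day-level and minute-level buckets without a schema change.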

plugin/storage/cassandra/schema/migration/v001tov002part1.sh
@codecov

commented Feb 12, 2019

Codecov Report

Merging #1328 into master will not change coverage.
The diff coverage is 100%.

@@          Coverage Diff           @@
##           master   #1328   +/-   ##
======================================
  Coverage     100%    100%           
======================================
  Files         162     163    +1     
  Lines        7326    7367   +41     
======================================
+ Hits         7326    7367   +41

| Impacted Files | Coverage Δ | |
|---|---|---|
| plugin/storage/cassandra/dependencystore/model.go | 100% <100%> (ø) | ⬆️ |
| plugin/storage/cassandra/options.go | 100% <100%> (ø) | ⬆️ |
| ...lugin/storage/cassandra/dependencystore/storage.go | 100% <100%> (ø) | ⬆️ |
| plugin/storage/cassandra/factory.go | 100% <100%> (ø) | ⬆️ |
| model/dependencies.go | 100% <100%> (ø) | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7e1ee51...18aa185.

@@ -8,6 +8,17 @@ Changes by Version

##### Breaking Changes

@yurishkuro

yurishkuro Feb 12, 2019

Member

I am guessing this is technically a non-breaking change. It will become breaking once we start defaulting to _v2

depsInsertStmt = "INSERT INTO dependencies(ts, ts_index, dependencies) VALUES (?, ?, ?)"
depsSelectStmt = "SELECT ts, dependencies FROM dependencies WHERE ts_index >= ? AND ts_index < ?"
// SASIEnabled is used when the dependency table is SASI indexed.
SASIEnabled IndexMode = iota

@yurishkuro

yurishkuro Feb 12, 2019

Member

Would it not make sense to simply refer to this as v2 throughout, including the constants, variables, and CLI flags? SASI then becomes a side effect that the user doesn't really need to know about.

Parent: d.Parent,
Child: d.Child,
CallCount: int64(d.CallCount),
}
if s.indexMode == SASIDisabled {

@yurishkuro

yurishkuro Feb 12, 2019

Member

Another argument for referring to this change as "dependencies v2" rather than tying it to SASI.

@black-adder black-adder force-pushed the remove_sasi_indices branch from 23bd58e to efeaa92 Feb 12, 2019

@yurishkuro
Member

left a comment

lgtm, just need to confirm the code works with the old schema

Migration Path:

1. Run `plugin/storage/cassandra/schema/migration/v001tov002part1.sh` which will copy dependencies into a csv, update the `dependency UDT`, create a new `dependencies_v2` table, and write dependencies from the csv into the `dependencies_v2` table.
2. Run the collector and query services with the cassandra flag `enable-dependencies-v2=true` which will update jaeger to write and read to and from the new `dependencies_v2` table.

@yurishkuro

yurishkuro Feb 12, 2019

Member

> cassandra flag enable-dependencies-v2=true

please specify the exact flag name

> which will update jaeger to write

which will instruct jaeger to write


1. Run `plugin/storage/cassandra/schema/migration/v001tov002part1.sh` which will copy dependencies into a csv, update the `dependency UDT`, create a new `dependencies_v2` table, and write dependencies from the csv into the `dependencies_v2` table.
2. Run the collector and query services with the cassandra flag `enable-dependencies-v2=true` which will update jaeger to write and read to and from the new `dependencies_v2` table.
3. Update [spark job](https://github.com/jaegertracing/spark-dependencies) to write to the new `dependencies_v2` table.

@yurishkuro

yurishkuro Feb 12, 2019

Member

this should rather say "update Spark job to version N"

@black-adder

black-adder Feb 13, 2019

Author Collaborator

I'll punt on this until I've actually made the change in spark.

@@ -36,6 +37,8 @@ func (d *Dependency) MarshalUDT(name string, info gocql.TypeInfo) ([]byte, error
return gocql.Marshal(info, d.Child)
case "call_count":
return gocql.Marshal(info, d.CallCount)
case "source":

@yurishkuro

yurishkuro Feb 12, 2019

Member

what happens if you run this code against the schema where UDT was not upgraded? Will gocql simply not invoke this function for "source"?

@black-adder

black-adder Feb 13, 2019

Author Collaborator

yes
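
The behavior confirmed in this reply can be illustrated with a small simulation that does not import gocql. Everything here is a stand-in: `marshalField` only mirrors the shape of the per-field switch quoted above and returns strings instead of driver-encoded bytes, and `encode` plays the role of the driver under the assumption, taken from the reply, that it requests only the fields declared in the schema's UDT.

```go
package main

import (
	"fmt"
	"strconv"
)

// Dependency is a simplified copy of the UDT-backed struct in this PR.
type Dependency struct {
	Parent    string
	Child     string
	CallCount int64
	Source    string
}

// marshalField mirrors the per-field switch, returning plain strings.
func (d *Dependency) marshalField(name string) (string, error) {
	switch name {
	case "parent":
		return d.Parent, nil
	case "child":
		return d.Child, nil
	case "call_count":
		return strconv.FormatInt(d.CallCount, 10), nil
	case "source":
		return d.Source, nil
	default:
		return "", fmt.Errorf("unknown field %q", name)
	}
}

// encode stands in for the driver: it asks only for the fields that the
// schema's UDT declares (an assumption of this sketch).
func encode(d *Dependency, udtFields []string) (map[string]string, error) {
	out := make(map[string]string, len(udtFields))
	for _, name := range udtFields {
		value, err := d.marshalField(name)
		if err != nil {
			return nil, err
		}
		out[name] = value
	}
	return out, nil
}

// Field lists for a UDT before and after the migration script has run.
var (
	oldUDT = []string{"parent", "child", "call_count"}
	newUDT = []string{"parent", "child", "call_count", "source"}
)

func main() {
	d := &Dependency{Parent: "frontend", Child: "backend", CallCount: 3, Source: "jaeger"}
	for _, fields := range [][]string{oldUDT, newUDT} {
		row, _ := encode(d, fields)
		fmt.Println(len(row), "fields written")
	}
}
```

Under that assumption, the extra `case "source"` branch is simply never reached against an un-upgraded schema, so the same binary can run before and after the UDT is altered.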

}

// Sanitize sanitizes the DependencyLink.
func (d DependencyLink) Sanitize() DependencyLink {

@yurishkuro

yurishkuro Feb 12, 2019

Member

ApplyDefaults

plugin/storage/cassandra/schema/migration/v001tov002part1.sh
fi


while IFS="," read ts dependency; do bucket=`date +"%Y%m%d" -d "$ts"`; echo "$bucket,$ts,$dependency"; done < dependencies.csv > dependencies_datebucket.csv

@yurishkuro

yurishkuro Feb 12, 2019

Member

can you write it in normal multi-line syntax?

echo "About to delete $row_count rows."
confirm

cqlsh -e "DROP INDEX IF EXISTS $keyspace.ts_index"

@yurishkuro

yurishkuro Feb 12, 2019

Member

> $keyspace.ts_index

naming 🤦‍♂️

black-adder added some commits Feb 11, 2019

Remove SASI indices
Signed-off-by: Won Jun Jang <wjang@uber.com>
address comments
Signed-off-by: Won Jun Jang <wjang@uber.com>
actually address comments
Signed-off-by: Won Jun Jang <wjang@uber.com>
update CHANGELOG
Signed-off-by: Won Jun Jang <wjang@uber.com>
address comments
Signed-off-by: Won Jun Jang <wjang@uber.com>
fix migration scripts
Signed-off-by: Won Jun Jang <wjang@uber.com>
first attempt
Signed-off-by: Won Jun Jang <wjang@uber.com>
yay
Signed-off-by: Won Jun Jang <wjang@uber.com>
remove space
Signed-off-by: Won Jun Jang <wjang@uber.com>

@black-adder black-adder force-pushed the remove_sasi_indices branch from 77cd266 to 0dbea68 Feb 13, 2019


// IsValid returns true if the Version is a valid one.
func (i Version) IsValid() bool {
return i < end

@yurishkuro

yurishkuro Feb 13, 2019

Member

I would prefer a less confusing name than 'end', e.g. 'versionEnumEnd'

@yurishkuro

yurishkuro Feb 13, 2019

Member

also, Version(-1) is valid according to this function
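
Both review points on `IsValid` can be shown in a short sketch. It assumes `Version` has a signed underlying type (`int` here), which is what makes `Version(-1)` pass an upper-bound-only check; the constants `V1` and `V2` and the method `upperBoundOnly` are hypothetical names, and `versionEnumEnd` is the sentinel name suggested above.

```go
package main

import "fmt"

// Version is assumed to be backed by a signed integer.
type Version int

const (
	V1 Version = iota
	V2
	// versionEnumEnd is a sentinel marking one past the last valid value.
	versionEnumEnd
)

// upperBoundOnly reproduces the check quoted above; it accepts negatives.
func (v Version) upperBoundOnly() bool {
	return v < versionEnumEnd
}

// IsValid checks both bounds, so Version(-1) is rejected.
func (v Version) IsValid() bool {
	return v >= 0 && v < versionEnumEnd
}

func main() {
	for _, v := range []Version{-1, V1, V2, versionEnumEnd} {
		fmt.Printf("Version(%d): upperBoundOnly=%t IsValid=%t\n",
			v, v.upperBoundOnly(), v.IsValid())
	}
}
```

An alternative with the same effect would be to give the type an unsigned underlying type, so that negative values cannot be represented at all.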

make migrations scripts more usable
Signed-off-by: Won Jun Jang <wjang@uber.com>
address comments
Signed-off-by: Won Jun Jang <wjang@uber.com>

@pavolloffay pavolloffay referenced this pull request Feb 13, 2019

Closed

Release 1.10 #1321

4 of 4 tasks complete

@yurishkuro yurishkuro referenced this pull request Feb 13, 2019

Closed

Make dependencies_v2 the default for Cassandra #1344

3 of 3 tasks complete

@black-adder black-adder merged commit 7e51957 into master Feb 14, 2019

6 of 7 checks passed

License Compliance: 8 issues found
DCO: DCO
WIP: Ready for review
codecov/patch: 100% of diff hit (target 100%)
codecov/project: 100% (target 100%)
continuous-integration/travis-ci/pr: The Travis CI build passed
continuous-integration/travis-ci/push: The Travis CI build passed

@ghost ghost removed the review label Feb 14, 2019

@black-adder black-adder deleted the remove_sasi_indices branch Feb 14, 2019

@huynq0911 huynq0911 referenced this pull request Feb 15, 2019

Merged

Fix unsorted imports #1347
