Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds Cassandra support for Autocomplete tags #2309

Merged
merged 3 commits into from Jan 1, 2019

Conversation

zeagord
Copy link
Member

@zeagord zeagord commented Dec 3, 2018

No description provided.

Copy link
Member

@codefromthecrypt codefromthecrypt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good start... next in-memory I think as then we can add some tests

return new SelectTagKeys(factory);
}

static class AccumulateTagsAllResults extends AccumulateAllResults<List<String>> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possible refactor possibility, as I recognise this class :P

zipkin/src/main/java/zipkin2/storage/TagStore.java Outdated Show resolved Hide resolved
Factory(Session session) {
this.session = session;
this.preparedStatement =
session.prepare(QueryBuilder.select("key").distinct().from(TABLE_TAGS));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what the cardinality of tags will be? But this won't scale.

But, for example, if there's 1 million tags in a Zipkin storage this full-table scan is going to be painful (if not timeout).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I would do a progressive query (by pagination but only simple one, no ordering and deifnirively not with offset but with after.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to have a whitelist of tags to index and not just index all tags you might get away without doing this at all. And just return the list of whitelisted tags.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the whitelist implementation. Let me know your thoughts.

@zeagord
Copy link
Member Author

zeagord commented Dec 4, 2018

I have added the in-memory api. I will revisit the cassandra tomorrow and address the above concerns.

@codefromthecrypt
Copy link
Member

made a comment about fixed cardinality.. we definitely need to document this as it is indeed inappropriate for unbounded. #2236 (comment)

One thing @zeagord and I discussed is initially inheriting the config for the other names (service/span). This is for simplicity. In the future we could add a timestamp/lookback parameter to only fetch the values for a range. However, same problem would apply to service/span so thinking of that later

@zeagord
Copy link
Member Author

zeagord commented Dec 7, 2018

Need to add tests and revisit the Elastic search design.

@tacigar
Copy link
Member

tacigar commented Dec 9, 2018

In elasticsearch, how about using _q instead of changing mapping?

@codefromthecrypt
Copy link
Member

codefromthecrypt commented Dec 10, 2018 via email

@codefromthecrypt
Copy link
Member

codefromthecrypt commented Dec 10, 2018 via email

@codefromthecrypt
Copy link
Member

so on cassandra (and elasticsearch) we'll need to ensure the "deduper" is in use to avoid thrashing writes.

In both cases, it might be helpful to reverse-engineer the service-span mapping to re-use the same table. ex PRIMARY KEY ((type, key), value) This could make data management in general easier long term.

Since there is a time bomb on elasticsearch #2219, we might want to solve that first before merging this (or at least before cutting a release with it).

Meanwhile, we can allow UI testing to work with static managed list of tags. (ex there may be only several values associated with phase, for example.. so one way is to allow the UI to configure predefined where it is small)

@zeagord
Copy link
Member Author

zeagord commented Dec 10, 2018

is this tagkey because "key" is reserved? I suspect if not easier to just do key/value

Bodyconverters for ES looks for "key" from the result of aggregations which clashes with the key in the tag {k,v}. It could be the name of the aggregation. I will change the name of the aggs alone and see if it works.

Factory(Session session) {
this.session = session;
this.preparedStatement =
session.prepare(QueryBuilder.select("key").distinct().from(TABLE_TAGS));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to have a whitelist of tags to index and not just index all tags you might get away without doing this at all. And just return the list of whitelisted tags.

@michaelsembwever
Copy link
Member

In both cases, it might be helpful to reverse-engineer the service-span mapping to re-use the same table. ex PRIMARY KEY ((type, key), value) This could make data management in general easier long term.

if service-span and tags are collapsed to one table it does make full table scans (eg QueryBuilder.select("key").distinct().from(TABLE__)) more painful…
i suspect keeping two separate tables is in fact wiser, thanks to @drolando for raising this.

Copy link
Member

@codefromthecrypt codefromthecrypt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made most of a pass!

@codefromthecrypt
Copy link
Member

this chops off the basic functionality and will allow the UI work to start immediately when merged. I can help rework the other commits similarly #2332

@codefromthecrypt codefromthecrypt changed the title [WIP] Api to query by tags Adds Cassandra support for Autocomplete tags Dec 17, 2018
@codefromthecrypt
Copy link
Member

other storage impls pulled out in #2333 and #2334

@codefromthecrypt
Copy link
Member

I think this is nearly ready. we need to test the auto-upgrade logic and also update the README files to talk about how autocomplete works

@michaelsembwever
Copy link
Member

michaelsembwever commented Dec 20, 2018

Code changes LGTM.

One comment though: most of the diff is whitespace/code-style changes. For the reviewer there's a huge waste of time reading diffs that have nothing to do with the actual PR. It would be great if those changes where separated out to a separate follow-up commit (still within the PR). That way by reviewing just the first commit of the PR a lot of time could be saved.

@codefromthecrypt
Copy link
Member

@michaelsembwever sorry about the formatting thing. I was in a rush to get something stable before I turned off internet for the vacation, but that amplified efforts to others.. not sure the better call but I apologize nevertheless. thanks for reviewing despite this.

@codefromthecrypt
Copy link
Member

FYI travis is failing still on the same tests as last push this needs to be looked into prior to merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants