
search: trace and observe each zoekt host #12516

Merged 2 commits into master from zoekt-tracing on Jul 28, 2020

Conversation

keegancsmith (Member) commented:
This commit moves tracing and observability from wrapping only the aggregated zoekt client to covering both the aggregated client and each individual zoekt replica, with a separate trace per replica.

To do this we add a hostname label to the Prometheus metric. We also change the category / family name to indicate whether a search is against a specific zoekt replica or is the aggregation.

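To make the shape of the change concrete, here is a minimal Go sketch of a request-duration histogram carrying a hostname label, roughly along the lines described above. The metric name, label names, and the observe helper are hypothetical illustrations, not the identifiers used in this PR.

```go
package search

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// requestDuration is a hypothetical per-host metric: the "hostname" label
// distinguishes each zoekt replica, and "category" distinguishes a search
// against a specific replica from the aggregation over all replicas.
var requestDuration = promauto.NewHistogramVec(prometheus.HistogramOpts{
	Name: "zoekt_request_duration_seconds",
	Help: "Duration of search requests to zoekt, labelled by host.",
}, []string{"hostname", "category"})

// observe records one request duration. For the aggregated client the
// hostname could be a sentinel value such as "aggregate".
func observe(hostname, category string, seconds float64) {
	requestDuration.WithLabelValues(hostname, category).Observe(seconds)
}
```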
keegancsmith requested review from uwedeportivo and a team on July 28, 2020 at 13:08.
pecigonzalo (Contributor) commented Jul 28, 2020:

In theory, each service should already have an identifier, since Prometheus adds one from service discovery. I'll verify.

Yeah, they already have instance. I think I'm getting this the other way around and this is the hostname of the target; is that the case?

uwedeportivo (Contributor) commented:

@pecigonzalo yes, correct

pecigonzalo (Contributor) commented:

I would be careful with this sort of metric, as it can create a cardinality explosion in the metrics DB. In most cases, I believe the downstream service should provide these metrics itself if possible, and this service should only report its own health (general latency to the upstream, etc.).

uwedeportivo (Contributor) commented:

@pecigonzalo Yes, you are right. You would be surprised by our labelling in general :-). This one is "only" a multiplier of 15 at the bigdata cluster; we have far worse offenders in our code base and need to clean up and reduce them at some point. This one is super useful for our current debugging situation with our bigdata customer.
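As a rough illustration of the multiplier being discussed: a Prometheus histogram with the default buckets exports 14 time series per label combination (11 buckets, +Inf, _sum, and _count), so a hostname label with 15 values multiplies the series for that metric by 15. The numbers below are assumptions for the example, not measurements from the cluster.

```go
package main

import "fmt"

func main() {
	// Illustrative cardinality arithmetic for one histogram metric.
	const (
		hosts          = 15 // zoekt replicas (the "multiplier of 15" above)
		seriesPerCombo = 14 // default buckets: 11 buckets + +Inf + _sum + _count
	)
	// Without a hostname label: 14 series. With it: 15 * 14 = 210.
	fmt.Println(hosts * seriesPerCombo) // prints 210
}
```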

uwedeportivo merged commit 5d25b0d into master on Jul 28, 2020.
uwedeportivo deleted the zoekt-tracing branch on July 28, 2020 at 16:20.