Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add job & instance labels to span metrics, a new target_info metrics, and the ability to customize dimension label mapping #2261

Merged
merged 37 commits into from
May 9, 2023

Conversation

ie-pham
Copy link
Collaborator

@ie-pham ie-pham commented Mar 27, 2023

What this PR does: Add the service label to job in span metrics and add instance as additional intrinsic dimensions. Add feature to allow custom dimension label mapping.

Dimension Mapping config:

span_metrics+: {
          dimensions: [],
          dimension_mappings: [
            { // label would be "vulture-zero" and the value would be "<vulture-0>"
              name: 'vulture-zero',
              source_labels: ['vulture-0'],
              join: '/'
            },
            { // label would be "vulture-index" and the value would be "<vulture-0>/<vulture-1>"
              name: 'vulture-index',
              source_labels: ['vulture-0', 'vulture-1'],
              join: '/'
            },
          ],
        },

Decisions for target_info/job/instance:
#2261 (comment)

  1. If service.name is present, then job is present. If not, the label is skipped. If ONLY service.namespace is present and not service.name, then also job is skipped.
  2. If service.instance.id is present then instance is present. If not, the label is skipped.
  3. If there are NO resource labels other than the job and instance labels, then the target_info is skipped entirely.
  • target_info should contain all resource attributes + job + instance

Make sure this is a non-breaking change

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@ie-pham ie-pham changed the title add job, instance, target_info Add job & instance labels to span metrics, a new target_info metrics, and the ability to customize dimension label mapping Mar 28, 2023
@ie-pham ie-pham marked this pull request as ready for review March 28, 2023 01:48
@cyrille-leclerc
Copy link
Contributor

cyrille-leclerc commented Mar 28, 2023

@ie-pham thanks for this great news.
I have a question please: do we rename service into job or do we compose job as:

  • ${service.namespace}/${service.name} if service.namespace is defined
  • ${service.name} otherwise?

cc @gouthamve @jpkrohling

Copy link
Member

@mapno mapno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments. Please correct me if I'm wrong, I'm no expert in the requirements for target_info.

modules/generator/processor/spanmetrics/spanmetrics.go Outdated Show resolved Hide resolved
modules/generator/processor/util/util.go Show resolved Hide resolved
modules/generator/processor/spanmetrics/spanmetrics.go Outdated Show resolved Hide resolved
@jpkrohling
Copy link

Is this change drifting this component further away from the upstream component? Is there an intention of converging them into one?

@knylander-grafana
Copy link
Contributor

@ie-pham Do we need documentation for this? It looks like new features that need content.

@ie-pham
Copy link
Collaborator Author

ie-pham commented Mar 28, 2023

@ie-pham Do we need documentation for this? It looks like new features that need content.

yes we do, can you point me to where i need to update this?

@gouthamve
Copy link
Member

Also this will cause an increase in cardinality because we are not including service.instance.id in the labels. It should be documented clearly.

Copy link
Member

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So w/o finding every line I've noticed that some of these changes will impact existing cloud users. Specifically the changes will increase cardinality and impact their bill. We either need the defaults to not change the metrics that are being produced or we need to communicate this change in advance in some clear and official way to our cloud users.

Will look to @gouthamve to provide guidance.

Copy link

@jpkrohling jpkrohling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with the changes from an OTel perspective, as I understand they will indeed bring this closer to the expected pipelines in upstream (otelcol with span to metrics connector and Prometheus exporter)

@ie-pham
Copy link
Collaborator Author

ie-pham commented Mar 30, 2023

@gouthamve @jpkrohling @kovrus can the user disable job and instance?

@kovrus
Copy link

kovrus commented Mar 31, 2023

@gouthamve @jpkrohling @kovrus can the user disable job and instance?

What is the motivation behind that?

Those are generated by Prometheus remote write, it is not possible to disable it manually, but if the service.name or service.instance.id is missing then they won't be created https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/translator/prometheusremotewrite/helper.go#L188-L205. When using span metrics connector with prom remote write there always be service.name generated (if not, those points will be dropped).

@kovrus
Copy link

kovrus commented Mar 31, 2023

@ie-pham is it possible to add some tests for the target info generation?

@ie-pham
Copy link
Collaborator Author

ie-pham commented Mar 31, 2023

@ie-pham is it possible to add some tests for the target info generation?

@kovrus tests have already been added to "spanmetrics_test"

Copy link
Contributor

@zalegrala zalegrala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. Nice work. A couple comments but otherwise I'm +1 here.

docs/sources/tempo/configuration/_index.md Show resolved Hide resolved
modules/overrides/limits.go Outdated Show resolved Hide resolved
Comment on lines 179 to 182
p.spanMetricsCallsTotal.UpdateLabels(p.labels)
p.spanMetricsDurationSeconds.UpdateLabels(p.labels)
p.spanMetricsSizeTotal.UpdateLabels(p.labels)
p.spanMetricsTargetInfo.UpdateLabels(p.targetInfoLabels)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is thread-safe. Batches can be concurrently pushed to a processor. If a set of traces has instance_id, and other doesn't, how does that affect all the labels that are being mutated in the metrics and the processor itself (eg. line 174)?

Copy link
Collaborator Author

@ie-pham ie-pham Apr 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! i made some changes and added tests for this scenario but yeah I'm not sure how to handle concurrency. Do you have any recommendations on how to handle it?

@ie-pham ie-pham merged commit 2981488 into grafana:main May 9, 2023
14 checks passed
mapno pushed a commit to mapno/tempo that referenced this pull request May 10, 2023
… and the ability to customize dimension label mapping (grafana#2261)

* add job, instance, target_info

* add test & gofmt

* fmt & test

* fix e2e test

* add custom dimension mapping

* updated test for custom mapping and changelog

* fmt

* changed the dimension mapping logic

* fix test

* update logic for job instance

* fix test & lint

* add service back and add gauge for target_info

* added overrides and docs

* undo config test change

* typo

* change traces_spanmetrics_target_info to just target_info

* lint

* fix overrides

* handle batches with different resource attributes

* redo counter

* refactor registry

* refactor label pair

* oops

* oops

* rebase mishap

* remove job instance by default

* refactor

* rebase hell

* count

* lint

* remove arbitary number

* traces_target_info

* comments

* sanitize labels for target info

* small refactor to improve perf

* rebased

* rebased
mapno added a commit that referenced this pull request May 10, 2023
…2447)

* Add config param for autocomplete filtering (#2433)

* Add config param for autocomplete filtering

* Docs

* Fix

* Fix test

* Address comments

* Add job & instance labels to span metrics, a new target_info metrics, and the ability to customize dimension label mapping (#2261)

* add job, instance, target_info

* add test & gofmt

* fmt & test

* fix e2e test

* add custom dimension mapping

* updated test for custom mapping and changelog

* fmt

* changed the dimension mapping logic

* fix test

* update logic for job instance

* fix test & lint

* add service back and add gauge for target_info

* added overrides and docs

* undo config test change

* typo

* change traces_spanmetrics_target_info to just target_info

* lint

* fix overrides

* handle batches with different resource attributes

* redo counter

* refactor registry

* refactor label pair

* oops

* oops

* rebase mishap

* remove job instance by default

* refactor

* rebase hell

* count

* lint

* remove arbitary number

* traces_target_info

* comments

* sanitize labels for target info

* small refactor to improve perf

* rebased

* rebased

---------

Co-authored-by: Jennie Pham <94262131+ie-pham@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants