Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[exporter/datadog]: sanitize datadog service names #1982

Merged
merged 5 commits into from Jan 13, 2021

Conversation

ericmustin
Copy link
Contributor

Description:

This PR adds some improved sanitisation and logic around the datadog service name.

certain invalid characters (such as: \t), were causing issues in the datadog UI, and so this PR applies normalization to service names to prevent this going forward.

Long term we'd like to be able to leverage some of the existing work in the datadog-agent, but this requires some work on this libraries end to expose for 3rd party use.

Testing:

Added Unit Tests

@ericmustin ericmustin requested a review from a team as a code owner January 11, 2021 16:01
@ericmustin
Copy link
Contributor Author

cc @mx-psi @KSerrania ptal if u have a moment

@codecov
Copy link

codecov bot commented Jan 11, 2021

Codecov Report

Merging #1982 (e82a168) into master (5010cd7) will decrease coverage by 0.02%.
The diff coverage is 50.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1982      +/-   ##
==========================================
- Coverage   90.39%   90.37%   -0.03%     
==========================================
  Files         394      394              
  Lines       19347    19358      +11     
==========================================
+ Hits        17489    17495       +6     
- Misses       1397     1403       +6     
+ Partials      461      460       -1     
Flag Coverage Δ
integration 69.77% <ø> (ø)
unit 89.15% <50.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
exporter/datadogexporter/utils/trace_helpers.go 0.00% <0.00%> (ø)
exporter/datadogexporter/translate_traces.go 87.67% <100.00%> (+2.51%) ⬆️
receiver/k8sclusterreceiver/watcher.go 95.29% <0.00%> (-2.36%) ⬇️
processor/groupbytraceprocessor/event.go 96.80% <0.00%> (+0.79%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5010cd7...e82a168. Read the comment docs.

@ericmustin
Copy link
Contributor Author

Not sure what the lint failure is all about, unrelated to this PR

Comment on lines 30 to 31
// DefaultServiceName is the default name we assign a service if it's missing and we have no reasonable fallback
DefaultServiceName string = "unnamed-service"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we do anything similar in the Trace Agent? We could link it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated, with comment that links to info from trace-agent

// '-' only creates issues for span operation names not service names
case c == '-' && isService:
buf.WriteRune(c)
lastWasUnderscore = false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a test for this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added some more tests for the method generally, and for this particular case

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird, CodeCov is still complaining about this 🤔

// ported from https://github.com/DataDog/datadog-agent/blob/eab0dde41fe3a069a65c33d82a81b1ef1cf6b3bc/pkg/trace/traceutil/normalize.go#L72
// fallbackServiceNames is a cache of default service names to use
// when the span's service is unset or invalid.
var fallbackServiceNames sync.Map
Copy link
Member

@mx-psi mx-psi Jan 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using sync.Map I would use ttlmap (see here). That way we avoid having the map growing indefinitely if we have a lot of different service names/a bug.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on second thought i'd scrapped all this fallback stuff for now since we aren't taking into account lang anyway, the fallback can be hardcoded


// fallbackService returns the fallback service name for a service
// belonging to language lang.
func fallbackService(lang string) string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add tests for tis and NormalizeSpanName too. We can have unit tests for this by overriding the fallbackServiceNames variable or by having a Normalizer struct with the map as a field and fallbackService as a method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added some more tests for NormalizeSpanName

Copy link
Member

@mx-psi mx-psi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to merge/rebase master to get the lint errors fixed (they were fixed in #1983)

assert.Equal(t, utils.NormalizeSpanName(tabName, false), "")
assert.Equal(t, utils.NormalizeSpanName(junkName, false), "getsridof_junk")
assert.Equal(t, utils.NormalizeSpanName(onlyJunkName, false), "only_junk")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a couple more tests to increase coverage, something like:

Suggested change
}
assert.Equal(t, utils.NormalizeServiceName("\x02\x1c\x18\x08"), DefaultServiceName)
assert.Equal(t, utils.NormalizeServiceName(""), DefaultServiceName)
}

Otherwise LGTM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 updated

Copy link
Member

@mx-psi mx-psi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! It's weird that Codecov didn't change the reported coverage after the last changes 🤔

@ericmustin
Copy link
Contributor Author

@bogdandrutu @jrcamp would it be possible to get this in for 0.18? it caused some issues on our end that we'd like to resolve going forward

@bogdandrutu bogdandrutu merged commit f74ee2b into open-telemetry:master Jan 13, 2021
pcwiese referenced this pull request in pcwiese/opentelemetry-collector-contrib Jan 14, 2021
* Bump cloud.google.com/go in /processor/resourcedetectionprocessor (#2003)

Bumps [cloud.google.com/go](https://github.com/googleapis/google-cloud-go) from 0.67.0 to 0.75.0.
- [Release notes](https://github.com/googleapis/google-cloud-go/releases)
- [Changelog](https://github.com/googleapis/google-cloud-go/blob/master/CHANGES.md)
- [Commits](googleapis/google-cloud-go@v0.67.0...v0.75.0)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Release v0.18.0 (#2008)

* [exporter/datadog]: sanitize datadog service names (#1982)

* [exporter/datadog]: sanitize service names

* [exporter/datadog]: add more test cases and simplify translation logic

* [exporter/datadog] add more tests

* [datadog/exporter]: linting in tests

* Test that default metrics have no tags set other than hostname (#2014)

* Update example configuration (#2013)

* Add more metadata to awsecscontainer receiver (#2011)

* add more metadata to awsecscontainer receiver

* use the image.tag instead of image.version for the key value

* Fix readme and add test to validate example config (#2000)

* Prometheus federation example (#2007)

* Add example showing how to use the Prometheus federation endpoint with prometheus_simple

* add copyright header to the file

* [exporter/datadog]: ensure that version tag is used for stats aggregations, add tests for computing apm stats (#2010)

* [AzureMonitorExporter] Favor RPC over HTTP spans (#378) (#2006)

Some instrumentation libraries are sending an RPC server span with both RPC and HTTP semantic attributes present (dotnet). In these cases we should favor the RPC attributes when figuring the  Span type.

* Adding ai.operation.parentid override

* Add composite sampling policy in tail sampler

* Fix lb exporter issues when DNS resolution fails during startup

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jeff Cheng <jcheng@signalfx.com>
Co-authored-by: Eric Mustin <mustin.eric@gmail.com>
Co-authored-by: Albert Vaca Cintora <albert.vaca@datadoghq.com>
Co-authored-by: Dominik Rosiek <58699848+sumo-drosiek@users.noreply.github.com>
Co-authored-by: John <59711343+JohnWu20@users.noreply.github.com>
Co-authored-by: Daniel Jaglowski <dan.jaglowski@bluemedora.com>
Co-authored-by: Antoine Toulme <atoulme@users.noreply.github.com>
Co-authored-by: Pranav Pandit <pranavp@microsoft.com>
dyladan referenced this pull request in dynatrace-oss-contrib/opentelemetry-collector-contrib Jan 29, 2021
Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>
ljmsc referenced this pull request in ljmsc/opentelemetry-collector-contrib Feb 21, 2022
Test the export timeout by waiting indefinitely for the export to
timeout instead of having a second timer, in its own goroutine, timeout.
The algorithm this replaces fails on machines that are slow and the one
meta-timer is given priority to progress over the export timer that is
being testing, resulting in a false-negative test result.

Move the testing of a BatchSpanProcessor export timeout to its own test
function. This removes the bloat this introduces to the other testing
options and allows customization that enable the testing in a
deterministic manner.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants