[exporter/datadog]: sanitize datadog service names #1982
cc @mx-psi @KSerrania PTAL if you have a moment
Codecov Report
@@            Coverage Diff             @@
##           master    #1982      +/-   ##
==========================================
- Coverage   90.39%   90.37%   -0.03%
==========================================
  Files         394      394
  Lines       19347    19358      +11
==========================================
+ Hits        17489    17495       +6
- Misses       1397     1403       +6
+ Partials      461      460       -1
Not sure what the lint failure is about; it's unrelated to this PR.
// DefaultServiceName is the default name we assign a service if it's missing and we have no reasonable fallback
DefaultServiceName string = "unnamed-service"
Do we do anything similar in the Trace Agent? We could link it here.
Updated, with a comment that links to the relevant info in the trace-agent.
// '-' only creates issues for span operation names not service names
case c == '-' && isService:
	buf.WriteRune(c)
	lastWasUnderscore = false
We should add a test for this
Added some more tests for the method in general, and for this particular case.
Weird, Codecov is still complaining about this 🤔
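To make the `case c == '-' && isService` branch discussed in this thread concrete, here is a minimal, self-contained sketch of the kind of loop it could sit in. It is not the exporter's code: the function name `normalizeName`, the helper `isASCIIAlnum`, and the allowed character set (ASCII letters, digits and '.') are assumptions for illustration; only the '-' rule and the `lastWasUnderscore` bookkeeping come from the snippet.

```go
package main

import (
	"fmt"
	"strings"
)

// isASCIIAlnum is a hypothetical helper: true for ASCII letters and digits.
func isASCIIAlnum(c rune) bool {
	return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= '0' && c <= '9'
}

// normalizeName is a hypothetical sketch, not the exporter's implementation.
// Letters, digits and '.' are kept; '-' is kept only for service names;
// any other rune becomes a single '_' (no leading, trailing or doubled '_').
func normalizeName(name string, isService bool) string {
	var buf strings.Builder
	lastWasUnderscore := false
	for _, c := range name {
		switch {
		case isASCIIAlnum(c) || c == '.':
			buf.WriteRune(c)
			lastWasUnderscore = false
		// '-' only creates issues for span operation names, not service names.
		case c == '-' && isService:
			buf.WriteRune(c)
			lastWasUnderscore = false
		default:
			// Skip the underscore at the start of the output or right after another one.
			if buf.Len() > 0 && !lastWasUnderscore {
				buf.WriteRune('_')
				lastWasUnderscore = true
			}
		}
	}
	return strings.TrimSuffix(buf.String(), "_")
}

func main() {
	fmt.Println(normalizeName("my-service", true))  // my-service
	fmt.Println(normalizeName("my-service", false)) // my_service
}
```

The same input yields different results depending on `isService`, which is exactly the case the reviewer asked to cover with a test.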
// ported from https://github.com/DataDog/datadog-agent/blob/eab0dde41fe3a069a65c33d82a81b1ef1cf6b3bc/pkg/trace/traceutil/normalize.go#L72
// fallbackServiceNames is a cache of default service names to use
// when the span's service is unset or invalid.
var fallbackServiceNames sync.Map
Instead of using sync.Map, I would use ttlmap. That way we avoid having the map grow indefinitely if we have a lot of different service names, or a bug.
On second thought, I've scrapped all this fallback logic for now: since we aren't taking the language into account anyway, the fallback can be hardcoded.
// fallbackService returns the fallback service name for a service
// belonging to language lang.
func fallbackService(lang string) string {
We should add tests for this and NormalizeSpanName too. We can unit test this by overriding the fallbackServiceNames variable, or by having a Normalizer struct with the map as a field and fallbackService as a method.
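One way to read the Normalizer suggestion, sketched under assumptions: the struct name and the `fallbackService` method follow the comment, but the `defaultFn` field, the `NewNormalizer` constructor, and the `unnamed-<lang>-service` naming scheme are hypothetical. Keeping the cache in a field lets each test build a fresh instance and inject its own fallback function instead of mutating a package-level variable.

```go
package main

import (
	"fmt"
	"sync"
)

// Normalizer is a hypothetical sketch: the fallback cache is a struct field
// rather than a package-level sync.Map, so tests can create isolated instances.
type Normalizer struct {
	mu        sync.Mutex
	fallbacks map[string]string        // lang -> fallback service name
	defaultFn func(lang string) string // computes a fallback on a cache miss
}

// NewNormalizer is a hypothetical constructor taking an injectable fallback function.
func NewNormalizer(defaultFn func(lang string) string) *Normalizer {
	return &Normalizer{fallbacks: make(map[string]string), defaultFn: defaultFn}
}

// fallbackService returns the cached fallback for lang, computing it on first use.
func (n *Normalizer) fallbackService(lang string) string {
	n.mu.Lock()
	defer n.mu.Unlock()
	if s, ok := n.fallbacks[lang]; ok {
		return s
	}
	s := n.defaultFn(lang)
	n.fallbacks[lang] = s
	return s
}

func main() {
	calls := 0
	n := NewNormalizer(func(lang string) string {
		calls++
		return "unnamed-" + lang + "-service" // hypothetical naming scheme
	})
	fmt.Println(n.fallbackService("go")) // unnamed-go-service
	fmt.Println(n.fallbackService("go")) // unnamed-go-service (served from cache)
	fmt.Println(calls)                   // 1
}
```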
added some more tests for NormalizeSpanName
You need to merge/rebase master to get the lint errors fixed (they were fixed in #1983)
	assert.Equal(t, utils.NormalizeSpanName(tabName, false), "")
	assert.Equal(t, utils.NormalizeSpanName(junkName, false), "getsridof_junk")
	assert.Equal(t, utils.NormalizeSpanName(onlyJunkName, false), "only_junk")
}
Let's add a couple more tests to increase coverage, something like:
	assert.Equal(t, utils.NormalizeServiceName("\x02\x1c\x18\x08"), DefaultServiceName)
	assert.Equal(t, utils.NormalizeServiceName(""), DefaultServiceName)
}
Otherwise LGTM
👍 updated
LGTM! It's weird that Codecov didn't change the reported coverage after the last changes 🤔
@bogdandrutu @jrcamp would it be possible to get this in for 0.18? It caused some issues on our end that we'd like to resolve going forward.
* Bump cloud.google.com/go in /processor/resourcedetectionprocessor (#2003)
  Bumps [cloud.google.com/go](https://github.com/googleapis/google-cloud-go) from 0.67.0 to 0.75.0.
  - [Release notes](https://github.com/googleapis/google-cloud-go/releases)
  - [Changelog](https://github.com/googleapis/google-cloud-go/blob/master/CHANGES.md)
  - [Commits](googleapis/google-cloud-go@v0.67.0...v0.75.0)
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Release v0.18.0 (#2008)
* [exporter/datadog]: sanitize datadog service names (#1982)
* [exporter/datadog]: sanitize service names
* [exporter/datadog]: add more test cases and simplify translation logic
* [exporter/datadog] add more tests
* [datadog/exporter]: linting in tests
* Test that default metrics have no tags set other than hostname (#2014)
* Update example configuration (#2013)
* Add more metadata to awsecscontainer receiver (#2011)
* add more metadata to awsecscontainer receiver
* use the image.tag instead of image.version for the key value
* Fix readme and add test to validate example config (#2000)
* Prometheus federation example (#2007)
* Add example showing how to use the Prometheus federation endpoint with prometheus_simple
* add copyright header to the file
* [exporter/datadog]: ensure that version tag is used for stats aggregations, add tests for computing apm stats (#2010)
* [AzureMonitorExporter] Favor RPC over HTTP spans (#378) (#2006)
  Some instrumentation libraries are sending an RPC server span with both RPC and HTTP semantic attributes present (dotnet). In these cases we should favor the RPC attributes when figuring the Span type.
* Adding ai.operation.parentid override
* Add composite sampling policy in tail sampler
* Fix lb exporter issues when DNS resolution fails during startup

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jeff Cheng <jcheng@signalfx.com>
Co-authored-by: Eric Mustin <mustin.eric@gmail.com>
Co-authored-by: Albert Vaca Cintora <albert.vaca@datadoghq.com>
Co-authored-by: Dominik Rosiek <58699848+sumo-drosiek@users.noreply.github.com>
Co-authored-by: John <59711343+JohnWu20@users.noreply.github.com>
Co-authored-by: Daniel Jaglowski <dan.jaglowski@bluemedora.com>
Co-authored-by: Antoine Toulme <atoulme@users.noreply.github.com>
Co-authored-by: Pranav Pandit <pranavp@microsoft.com>
Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>
Test the export timeout by waiting indefinitely for the export itself to time out, instead of having a second timer time out in its own goroutine. The algorithm this replaces fails on slow machines: the meta-timer is given priority to progress over the export timer being tested, resulting in a false-negative test result. Move the testing of the BatchSpanProcessor export timeout to its own test function. This removes the bloat it introduced into the other test options and allows customization that enables deterministic testing.
Description:
This PR adds improved sanitisation and logic around the Datadog service name.
Certain invalid characters (such as \t) were causing issues in the Datadog UI, so this PR applies normalization to service names to prevent this going forward. Long term, we'd like to leverage some of the existing work in the datadog-agent, but that requires some work on that library's end to expose it for third-party use.
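A minimal sketch of the normalize-then-fall-back behavior described above, under stated assumptions: `sanitizeService` and its allowed character set (lowercase ASCII letters, digits, '.', '/', '-') are hypothetical, while `DefaultServiceName` and its `unnamed-service` value come from the PR. Splitting with `strings.FieldsFunc` collapses runs of invalid characters such as `\t` into a single separator and drops them at the ends; an input with nothing usable falls back to the default, which is the behavior the suggested tests check for `"\x02\x1c\x18\x08"` and `""`.

```go
package main

import (
	"fmt"
	"strings"
)

// DefaultServiceName is the fallback value quoted in the PR.
const DefaultServiceName = "unnamed-service"

// sanitizeService is a hypothetical sketch, not the exporter's implementation.
// It lowercases the name, treats any rune outside [a-z0-9./-] as a separator,
// joins the remaining pieces with '_', and falls back to DefaultServiceName
// when nothing usable is left.
func sanitizeService(name string) string {
	invalid := func(r rune) bool {
		return !(r >= 'a' && r <= 'z' || r >= '0' && r <= '9' || r == '.' || r == '/' || r == '-')
	}
	// FieldsFunc drops empty fields, so a run of invalid runes becomes one
	// '_' and leading/trailing invalid runes disappear entirely.
	parts := strings.FieldsFunc(strings.ToLower(name), invalid)
	if len(parts) == 0 {
		return DefaultServiceName
	}
	return strings.Join(parts, "_")
}

func main() {
	fmt.Println(sanitizeService("Checkout\tService")) // checkout_service
	fmt.Println(sanitizeService("\x02\x1c\x18\x08"))   // unnamed-service
	fmt.Println(sanitizeService(""))                   // unnamed-service
}
```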
Testing:
Added Unit Tests