
Do onTraceReleased asynchronously in groupbytraceprocessor #1808

Merged: 11 commits merged into open-telemetry:master on Jan 22, 2021

Conversation


@pkositsyn (Contributor) commented Dec 11, 2020

Description:
Fixes a bug where this processor could get overloaded: onTraceReleased is now called asynchronously, so releasing a trace no longer blocks further processing.

Link to tracking Issue: #1807

Testing:
Added a test checking that onTraceReleased is called in a non-blocking way (a generic sketch of the pattern follows below).

Documentation:
Changed the metrics description
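
For illustration only, here is a minimal Go sketch of the pattern such a test can verify. It is not the PR's actual test; releaseAsync and every other name here are invented for this sketch.

```go
package groupbytraceprocessor

import (
	"testing"
	"time"
)

// releaseAsync mimics the idea of this PR: the (possibly slow) release
// callback runs on its own goroutine, so the caller never blocks on it.
func releaseAsync(onTraceReleased func()) {
	go onTraceReleased()
}

func TestReleaseDoesNotBlock(t *testing.T) {
	block := make(chan struct{})
	done := make(chan struct{})

	go func() {
		// The callback blocks until the test ends; releaseAsync must still return.
		releaseAsync(func() { <-block })
		close(done)
	}()

	select {
	case <-done:
		// releaseAsync returned even though the callback is still blocked.
	case <-time.After(time.Second):
		t.Fatal("release blocked on the onTraceReleased callback")
	}
	close(block)
}
```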

@pkositsyn pkositsyn requested review from jpkrohling and a team as code owners December 11, 2020 23:00
@project-bot project-bot bot added this to In progress in Collector Dec 11, 2020

codecov bot commented Dec 12, 2020

Codecov Report

Merging #1808 (68a2f3f) into master (f53bd40) will increase coverage by 20.05%.
The diff coverage is 73.33%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master    #1808       +/-   ##
===========================================
+ Coverage   69.77%   89.83%   +20.05%     
===========================================
  Files          34      378      +344     
  Lines        1532    18214    +16682     
===========================================
+ Hits         1069    16362    +15293     
- Misses        391     1388      +997     
- Partials       72      464      +392     
Flag          Coverage Δ
integration   69.77% <ø> (ø)
unit          88.53% <73.33%> (?)

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
processor/groupbytraceprocessor/event.go 95.93% <33.33%> (ø)
processor/groupbytraceprocessor/processor.go 96.63% <50.00%> (ø)
processor/groupbytraceprocessor/storage_memory.go 88.88% <100.00%> (ø)
receiver/prometheusexecreceiver/factory.go 100.00% <0.00%> (ø)
exporter/carbonexporter/exporter.go 79.62% <0.00%> (ø)
internal/awsxray/util.go 100.00% <0.00%> (ø)
...cedetectionprocessor/internal/resourcedetection.go 97.05% <0.00%> (ø)
receiver/memcachedreceiver/util.go 0.00% <0.00%> (ø)
...r/dynatraceexporter/serialization/serialization.go 100.00% <0.00%> (ø)
exporter/datadogexporter/metadata/metadata.go 48.07% <0.00%> (ø)
... and 358 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f53bd40...68a2f3f.

@jpkrohling (Member) left a comment:

Leaving a review here, but we do need to discuss the approach first.

processor/groupbytraceprocessor/event_test.go (review thread resolved, outdated)
// Run the release callback asynchronously, because the last phase does not
// affect this processor anymore and must not block further event processing.
go func() {
	if err := em.onTraceReleased(payload); err != nil {
		em.logger.Error("onTraceReleased failed", zap.Error(err))
	}
}()
Member:

You'll need to include payload information to make this useful.

Contributor Author (pkositsyn):

I don't like the idea of logging the payload for every error. Just imagine the next processor starting to fail on every trace. Besides, the payload is []ResourceSpans; what do we gain from logging pointers?

Ideally, this logging should be aggregated, as in the exporters, to avoid spamming on failures.

Member:

Adding the traceID would be more than enough. You can also move it to Debug instead.

> Ideally, this logging should be aggregated, as in the exporters, to avoid spamming on failures.

That's the main reason to use metrics for this, so that people know that something went wrong but still don't get flooded with error messages.

@pkositsyn (Contributor Author) commented Dec 14, 2020:

People know that something went wrong, but not what exactly. That's why every processor returns an error; in my opinion it must be shown in the logs. It really is an error.

If the traceID is needed, why not put it into the error where it occurs? In my view, an error should be self-contained.

Member:

> If the traceID is needed, why not put it into the error where it occurs? In my view, an error should be self-contained.

That's what I'm requesting :-)

Contributor Author (pkositsyn):

I mean putting it in at the place where the error occurred. Honestly, I don't see much benefit in knowing the traceID; at least in my sampling use cases I can't do anything with the ID.

Contributor Author (pkositsyn):

And currently no traceID is logged for the error. I don't think recovering the traceID from []ResourceSpans is the right move.
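
(For illustration, the kind of error wrapping being discussed could look like the sketch below. releaseTrace is a hypothetical helper, not code from this PR, and it assumes the raw 16-byte trace ID is available at the point where the failure happens.)

```go
package groupbytraceprocessor

import (
	"encoding/hex"
	"fmt"
)

// releaseTrace attaches the trace ID at the point where the error occurs,
// so the resulting log line is self-contained.
func releaseTrace(traceID [16]byte, release func() error) error {
	if err := release(); err != nil {
		return fmt.Errorf("failed to release trace %s: %w", hex.EncodeToString(traceID[:]), err)
	}
	return nil
}
```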

@pkositsyn (Contributor Author):

I also made some minor changes that improve performance. Since the processing is single-threaded, this is really crucial.

@github-actions bot:

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Dec 22, 2020
@pkositsyn (Contributor Author):

@jpkrohling Could you take a look please?

@github-actions github-actions bot removed the Stale label Dec 24, 2020
@jpkrohling (Member) left a comment:

I'd prefer this PR to contain only the changes that are within its scope. Apart from a missing test, this looks good.

processor/groupbytraceprocessor/event.go (review thread resolved)
@@ -25,7 +25,7 @@ import (

 type memoryStorage struct {
 	sync.RWMutex
-	content map[string][]pdata.ResourceSpans
+	content map[[16]byte][]pdata.ResourceSpans
Member:

This can be pdata.TraceId, no?

Contributor Author (pkositsyn):

You're right, but I don't like this idea: if the model changes, it will implicitly affect this processor. The current approach guarantees correctness.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty much every processor and exporter that needs to deal with trace IDs uses pdata.TraceId; @bogdandrutu usually writes scripts that update those places en masse.
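
For context, the storage shape the diff above settles on looks roughly like the sketch below. The struct fields mirror the diff; the constructor, method names, and import path are assumptions made for this sketch, not necessarily the processor's real ones.

```go
package groupbytraceprocessor

import (
	"sync"

	"go.opentelemetry.io/collector/consumer/pdata"
)

// memoryStorage keyed by the raw 16-byte trace ID: the fixed-size array key
// avoids per-lookup string allocations and does not depend on how the pdata
// trace ID type evolves.
type memoryStorage struct {
	sync.RWMutex
	content map[[16]byte][]pdata.ResourceSpans
}

func newMemoryStorage() *memoryStorage {
	return &memoryStorage{content: make(map[[16]byte][]pdata.ResourceSpans)}
}

// createOrAppend adds the resource spans to the batch stored for this trace ID.
func (st *memoryStorage) createOrAppend(traceID [16]byte, rs pdata.ResourceSpans) {
	st.Lock()
	defer st.Unlock()
	st.content[traceID] = append(st.content[traceID], rs)
}

// get returns everything accumulated so far for this trace ID.
func (st *memoryStorage) get(traceID [16]byte) []pdata.ResourceSpans {
	st.RLock()
	defer st.RUnlock()
	return st.content[traceID]
}
```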

processor/groupbytraceprocessor/storage_memory.go (review thread resolved, outdated)
processor/groupbytraceprocessor/processor.go (review thread resolved)

github-actions bot commented Jan 7, 2021

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jan 7, 2021
@bogdandrutu (Member):

@jpkrohling review status?

@github-actions github-actions bot removed the Stale label Jan 12, 2021
@jpkrohling (Member) left a comment:

Once it uses the TraceID in the storage, this is ready to be merged.

@github-actions bot:

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jan 20, 2021
@jpkrohling (Member):

@pkositsyn, are you interested in continuing with this one? We are very close here :-)

@pkositsyn (Contributor Author):

Done

@jpkrohling (Member) left a comment:

Ready to be merged on green!

Collector automation moved this from In progress to Reviewer approved Jan 20, 2021
@github-actions github-actions bot removed the Stale label Jan 21, 2021
@bogdandrutu bogdandrutu merged commit ce4189c into open-telemetry:master Jan 22, 2021
Collector automation moved this from Reviewer approved to Done Jan 22, 2021
dyladan referenced this pull request in dynatrace-oss-contrib/opentelemetry-collector-contrib Jan 29, 2021
* Authentication processor 1/4 - Add configauth

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>

* Authentication processor 2/4 - Add auth context and interface

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
ljmsc referenced this pull request in ljmsc/opentelemetry-collector-contrib Feb 21, 2022
The initialization options of the exporter are not used after the
Exporter is created. Stop saving them in a field.
Labels: none. Projects: Collector (Done). Linked issues: none. 4 participants.