Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

loadbalancingexporter: narrow down critical area of ConsumeTraces #1798

Merged

Conversation

ZhengHe-MD
Copy link
Contributor

Description:

quick fix on the problem that rolling updates of next tier collectors will stop dns resolver from periodical checking.

Link to tracking Issue:

#1690

Testing:

Add a test simulate rolling updates.

Documentation:

None

Copy link
Member

@jpkrohling jpkrohling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix looks great, the test needs some attention.

}()

// wait until rolling updates finish
time.Sleep(100 * time.Millisecond)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of waiting a specific time, which might lead to intermittent CI failures, it would be better to create a WaitGroup and Add/Wait based on conditions.

})

// keep consuming traces every 2ms
go func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if it's just a test, you'd need a way to break out of this loop. Perhaps add a channel that is closed after the test is finished (currently, the time.Sleep on line 615)?

for {
select {
case <- ticker.C:
go p.ConsumeTraces(context.Background(), randomTraces())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need this to be another go routine? With this change, this shouldn't block long anymore, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, thx, you're right. I should remove the go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah. after a deep thought, I found that it does need the go. because I use time.Sleep in ConsumeTraces to simulate unreachable backends, if I don't use go routine, this loop will stuck there and stop consuming traces.

@ZhengHe-MD
Copy link
Contributor Author

@jpkrohling thx for reviewing my codes. I will certainly pay attention to the test. and tell you when it's ready.

@codecov
Copy link

codecov bot commented Dec 11, 2020

Codecov Report

Merging #1798 (f9e8dd9) into master (a045ffc) will decrease coverage by 0.02%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1798      +/-   ##
==========================================
- Coverage   89.84%   89.82%   -0.03%     
==========================================
  Files         378      378              
  Lines       18213    18213              
==========================================
- Hits        16363    16359       -4     
- Misses       1387     1389       +2     
- Partials      463      465       +2     
Flag Coverage Δ
integration 69.77% <ø> (ø)
unit 88.52% <100.00%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
exporter/loadbalancingexporter/exporter.go 92.23% <100.00%> (ø)
receiver/collectdreceiver/receiver.go 68.49% <0.00%> (-2.74%) ⬇️
receiver/k8sclusterreceiver/watcher.go 95.29% <0.00%> (-2.36%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a045ffc...f9e8dd9. Read the comment docs.

@ZhengHe-MD
Copy link
Contributor Author

@jpkrohling the PR is ready for review, please take a look when you have time

Comment on lines 630 to 635
go func() {
time.Sleep(50 * time.Millisecond)
resolverCh <- 1
}()

<-resolverCh
Copy link
Contributor

@pkositsyn pkositsyn Dec 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You write to the channel twice (second time in onLookupIPAddr) and read once. If I'm not mistaken, it should be just close of the channel in this goroutine (actually without goroutine) and read in onLookupIPAddr. This way no future calls to onLookupIPAddr will block (yay, the goroutine isn't leaked). Correct me if I'm wrong, but now I see that the possible execution order is that read in 635 line corresponds to write in onLookupIPAddr, which makes this goroutine have no effect.

Btw for channels without value I would use chan struct{} to make it obvious that you don't need the value. While reading, I was keeping the thought in mind like "there should be the reason why int channel is used, let's look what effect have these values"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see now, that it is done to avoid blocking in tests. Then I would just suggest making a buffered channel (size 1 is enough) to avoid goroutine leak

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx, chan int does look confusing, will use chan struct{}, and also a size 1 buffered channel

Comment on lines +601 to +609
// ensure using default exporters
p.updateLock.Lock()
p.exporters = defaultExporters
p.updateLock.Unlock()
p.res.onChange(func(endpoints []string) {
p.updateLock.Lock()
p.exporters = defaultExporters
p.updateLock.Unlock()
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: onChange can be no-op function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes! you're right! and I just know this after you pointed out. but that's a hidden fact for this specific test case and I would rather keep this for the sake of completeness and readability. if I change the test case to simulate the following process:

["127.0.0.1", "127.0.0.2"] ->
["127.0.0.1", "127.0.0.2", "127.0.0.3"] -> 
["127.0.0.2", "127.0.0.3"] -> 
["127.0.0.2", "127.0.0.3", "127.0.0.4"] ->
["127.0.0.3", "127.0.0.4"]

the onChange can't be a no-op function then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test can later be updated then ;-)

@ZhengHe-MD
Copy link
Contributor Author

@pkositsyn thx for reviewing this, and I've modify the test according to your suggestions. please see if there is any other issue when you have time. 😬

Comment on lines +601 to +609
// ensure using default exporters
p.updateLock.Lock()
p.exporters = defaultExporters
p.updateLock.Unlock()
p.res.onChange(func(endpoints []string) {
p.updateLock.Lock()
p.exporters = defaultExporters
p.updateLock.Unlock()
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test can later be updated then ;-)

exporter/loadbalancingexporter/exporter_test.go Outdated Show resolved Hide resolved
@bogdandrutu
Copy link
Member

@jpkrohling friendly ping, is there any comment you have left?

@jpkrohling
Copy link
Member

From an earlier comment from this PR:

Yes, I wanted to come up with a proposal using WaitGroup constructs before going out for PTO but couldn't. Would it be OK to wait for me to return (Jan 11)? Otherwise, please open an issue and ping me there, so that we can address it later.

@github-actions
Copy link
Contributor

github-actions bot commented Jan 5, 2021

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jan 5, 2021
@ZhengHe-MD
Copy link
Contributor Author

ZhengHe-MD commented Jan 9, 2021

@jpkrohling This PR is declared stale now and I don't see any change you've made in the new branch. If you still have some doubts with the test, maybe I can remove the test first.

@github-actions github-actions bot removed the Stale label Jan 9, 2021
@jpkrohling
Copy link
Member

It was marked automatically as stale by a GitHub bot. I added a link to a gist with what I had in mind, let me know what you think.

@bogdandrutu bogdandrutu merged commit 8345baf into open-telemetry:master Jan 19, 2021
dyladan referenced this pull request in dynatrace-oss-contrib/opentelemetry-collector-contrib Jan 29, 2021
… the type (#1798)

Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>
ljmsc referenced this pull request in ljmsc/opentelemetry-collector-contrib Feb 21, 2022
* Change NewSplitDriver paramater and initialization

* Update CHANGELOG.md

Co-authored-by: Anthony Mirabella <a9@aneurysm9.com>

* Update CHANGELOG.md

Co-authored-by: Anthony Mirabella <a9@aneurysm9.com>

* Update exporters/otlp/protocoldriver.go

Co-authored-by: Anthony Mirabella <a9@aneurysm9.com>

* Update exporters/otlp/protocoldriver.go

Co-authored-by: Anthony Mirabella <a9@aneurysm9.com>

* Update exporters/otlp/protocoldriver.go

Co-authored-by: Anthony Mirabella <a9@aneurysm9.com>

* Update exporters/otlp/protocoldriver.go

Co-authored-by: Anthony Mirabella <a9@aneurysm9.com>

* Move splitdriver option into options.go and rename

* Update CHANGELOG.md

Co-authored-by: Anthony Mirabella <a9@aneurysm9.com>

* Change option name and nil test to snapshots

* Update exporters/otlp/protocoldriver.go

Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com>

* Update exporters/otlp/protocoldriver.go

Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com>

* Update exporters/otlp/protocoldriver.go

Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com>

* Update exporters/otlp/protocoldriver.go

Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com>

* Update exporters/otlp/options.go

Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com>

* Update exporters/otlp/options.go

Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com>

* Update exporters/otlp/options.go

Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com>

* Update exporters/otlp/options.go

Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com>

* Change SplitDriverOption to match spec

* Update changelog entry

Co-authored-by: Anthony Mirabella <a9@aneurysm9.com>
Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants