New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
loadbalancingexporter: narrow down critical area of ConsumeTraces #1798
loadbalancingexporter: narrow down critical area of ConsumeTraces #1798
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix looks great, the test needs some attention.
}() | ||
|
||
// wait until rolling updates finish | ||
time.Sleep(100 * time.Millisecond) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of waiting a specific time, which might lead to intermittent CI failures, it would be better to create a WaitGroup and Add/Wait based on conditions.
}) | ||
|
||
// keep consuming traces every 2ms | ||
go func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if it's just a test, you'd need a way to break out of this loop. Perhaps add a channel that is closed after the test is finished (currently, the time.Sleep
on line 615)?
for { | ||
select { | ||
case <- ticker.C: | ||
go p.ConsumeTraces(context.Background(), randomTraces()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need this to be another go routine? With this change, this shouldn't block long anymore, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, thx, you're right. I should remove the go
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah. after a deep thought, I found that it does need the go. because I use time.Sleep
in ConsumeTraces
to simulate unreachable backends, if I don't use go routine, this loop will stuck there and stop consuming traces.
@jpkrohling thx for reviewing my codes. I will certainly pay attention to the test. and tell you when it's ready. |
Codecov Report
@@ Coverage Diff @@
## master #1798 +/- ##
==========================================
- Coverage 89.84% 89.82% -0.03%
==========================================
Files 378 378
Lines 18213 18213
==========================================
- Hits 16363 16359 -4
- Misses 1387 1389 +2
- Partials 463 465 +2
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
@jpkrohling the PR is ready for review, please take a look when you have time |
go func() { | ||
time.Sleep(50 * time.Millisecond) | ||
resolverCh <- 1 | ||
}() | ||
|
||
<-resolverCh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You write to the channel twice (second time in onLookupIPAddr
) and read once. If I'm not mistaken, it should be just close of the channel in this goroutine (actually without goroutine) and read in onLookupIPAddr
. This way no future calls to onLookupIPAddr
will block (yay, the goroutine isn't leaked). Correct me if I'm wrong, but now I see that the possible execution order is that read in 635 line corresponds to write in onLookupIPAddr
, which makes this goroutine have no effect.
Btw for channels without value I would use chan struct{}
to make it obvious that you don't need the value. While reading, I was keeping the thought in mind like "there should be the reason why int channel is used, let's look what effect have these values"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see now, that it is done to avoid blocking in tests. Then I would just suggest making a buffered channel (size 1 is enough) to avoid goroutine leak
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thx, chan int
does look confusing, will use chan struct{}
, and also a size 1 buffered channel
// ensure using default exporters | ||
p.updateLock.Lock() | ||
p.exporters = defaultExporters | ||
p.updateLock.Unlock() | ||
p.res.onChange(func(endpoints []string) { | ||
p.updateLock.Lock() | ||
p.exporters = defaultExporters | ||
p.updateLock.Unlock() | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: onChange
can be no-op function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes! you're right! and I just know this after you pointed out. but that's a hidden fact for this specific test case and I would rather keep this for the sake of completeness and readability. if I change the test case to simulate the following process:
["127.0.0.1", "127.0.0.2"] ->
["127.0.0.1", "127.0.0.2", "127.0.0.3"] ->
["127.0.0.2", "127.0.0.3"] ->
["127.0.0.2", "127.0.0.3", "127.0.0.4"] ->
["127.0.0.3", "127.0.0.4"]
the onChange
can't be a no-op function then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test can later be updated then ;-)
@pkositsyn thx for reviewing this, and I've modify the test according to your suggestions. please see if there is any other issue when you have time. 😬 |
// ensure using default exporters | ||
p.updateLock.Lock() | ||
p.exporters = defaultExporters | ||
p.updateLock.Unlock() | ||
p.res.onChange(func(endpoints []string) { | ||
p.updateLock.Lock() | ||
p.exporters = defaultExporters | ||
p.updateLock.Unlock() | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test can later be updated then ;-)
@jpkrohling friendly ping, is there any comment you have left? |
From an earlier comment from this PR:
|
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
@jpkrohling This PR is declared stale now and I don't see any change you've made in the new branch. If you still have some doubts with the test, maybe I can remove the test first. |
It was marked automatically as stale by a GitHub bot. I added a link to a gist with what I had in mind, let me know what you think. |
… the type (#1798) Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>
* Change NewSplitDriver paramater and initialization * Update CHANGELOG.md Co-authored-by: Anthony Mirabella <a9@aneurysm9.com> * Update CHANGELOG.md Co-authored-by: Anthony Mirabella <a9@aneurysm9.com> * Update exporters/otlp/protocoldriver.go Co-authored-by: Anthony Mirabella <a9@aneurysm9.com> * Update exporters/otlp/protocoldriver.go Co-authored-by: Anthony Mirabella <a9@aneurysm9.com> * Update exporters/otlp/protocoldriver.go Co-authored-by: Anthony Mirabella <a9@aneurysm9.com> * Update exporters/otlp/protocoldriver.go Co-authored-by: Anthony Mirabella <a9@aneurysm9.com> * Move splitdriver option into options.go and rename * Update CHANGELOG.md Co-authored-by: Anthony Mirabella <a9@aneurysm9.com> * Change option name and nil test to snapshots * Update exporters/otlp/protocoldriver.go Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com> * Update exporters/otlp/protocoldriver.go Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com> * Update exporters/otlp/protocoldriver.go Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com> * Update exporters/otlp/protocoldriver.go Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com> * Update exporters/otlp/options.go Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com> * Update exporters/otlp/options.go Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com> * Update exporters/otlp/options.go Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com> * Update exporters/otlp/options.go Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com> * Change SplitDriverOption to match spec * Update changelog entry Co-authored-by: Anthony Mirabella <a9@aneurysm9.com> Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com>
Description:
quick fix on the problem that rolling updates of next tier collectors will stop dns resolver from periodical checking.
Link to tracking Issue:
#1690
Testing:
Add a test simulate rolling updates.
Documentation:
None