fix goroutine leak in watchMux #4221

Closed
wants to merge 1 commit into from

Conversation

ikaven1024
Member

What type of PR is this?
/kind bug

What this PR does / why we need it:

Which issue(s) this PR fixes:
Fixes #4212

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

`karmada-search`: Fix a lock race that prevented the watch ResultChan from being closed, causing client watch API calls to hang.

@karmada-bot karmada-bot added the kind/bug Categorizes issue or PR as related to a bug. label Nov 11, 2023
@karmada-bot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please ask for approval from ikaven1024 after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Nov 11, 2023
@ikaven1024 ikaven1024 force-pushed the fix/watchMux-leak branch 2 times, most recently from 496da7d to 44a24ee Compare November 11, 2023 15:02
@karmada-bot karmada-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 11, 2023
@ikaven1024
Member Author

cc @xigang

@codecov-commenter

codecov-commenter commented Nov 11, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (1b2c6ed) 52.78% compared to head (3c07a34) 52.77%.


Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4221      +/-   ##
==========================================
- Coverage   52.78%   52.77%   -0.01%     
==========================================
  Files         239      239              
  Lines       23584    23585       +1     
==========================================
  Hits        12448    12448              
  Misses      10460    10460              
- Partials      676      677       +1     
Flag Coverage Δ
unittests 52.77% <100.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
pkg/search/proxy/store/util.go 92.88% <100.00%> (+0.82%) ⬆️

... and 1 file with indirect coverage changes


if chosen == doneCaseIndex {
// Received from done chan.
// In fact, this will never happen
panic(fmt.Sprintf("unexpectedly receive from done chan: %v", val.Interface()))
Member

If this panic is triggered, does it need to be recovered?

}()
defer close(w.result)
for {
chosen, val, ok := reflect.Select(cases)
Member

@xigang xigang Nov 12, 2023

The performance of reflect.Select is worse than that of a static select. If the scale of the federated clusters is too large, there may be performance issues. In the short term, that's not a problem.

Member Author

A dynamic select (reflect.Select) takes about twice as long as a static select. I tested it with 1000 sources and 10000 events:

goos: darwin
goarch: arm64
BenchmarkSelectCaseStatic
BenchmarkSelectCaseStatic-8    	       2	 740559750 ns/op
BenchmarkSelectCaseDynamic
BenchmarkSelectCaseDynamic-8   	       1	1627772333 ns/op

gist
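
For readers unfamiliar with the dynamic form being benchmarked, here is a minimal sketch of a fan-in loop built on reflect.Select. It is not the code from this PR or from the linked gist; the function name fanInDynamic and the use of int channels are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"reflect"
)

// fanInDynamic (hypothetical helper) receives from every source channel using
// reflect.Select until all sources are closed, and returns the received values.
func fanInDynamic(sources []chan int) []int {
	cases := make([]reflect.SelectCase, len(sources))
	for i, ch := range sources {
		cases[i] = reflect.SelectCase{Dir: reflect.SelectRecv, Chan: reflect.ValueOf(ch)}
	}

	var got []int
	open := len(cases)
	for open > 0 {
		chosen, val, ok := reflect.Select(cases)
		if !ok {
			// The chosen source was closed. Setting Chan to the zero Value
			// makes reflect.Select ignore this case from now on.
			cases[chosen].Chan = reflect.Value{}
			open--
			continue
		}
		got = append(got, int(val.Int()))
	}
	return got
}

func main() {
	sources := make([]chan int, 3)
	for i := range sources {
		sources[i] = make(chan int, 2)
		sources[i] <- i
		sources[i] <- i + 10
		close(sources[i])
	}
	fmt.Println(len(fanInDynamic(sources)))
}
```

The slice of cases is built once and reused on every iteration. Each call to reflect.Select still inspects all cases, which is consistent with the concern above about a large number of sources.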

fix: fix exec failure with karamada-aggregated-apiserver
Signed-off-by: yingjinhui <yingjinhui@didiglobal.com>
@karmada-bot karmada-bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 12, 2023
@@ -246,19 +257,8 @@ func (w *watchMux) startWatchSource(source watch.Interface, decorator func(watch
select {
case <-w.done:
return
default:
case w.result <- copyEvent:
Member

@xigang xigang Nov 12, 2023

@ikaven1024 @XiShanYongYe-Chang Is there a data race when multiple goroutines write to the w.result channel?

Member Author

Writing to a channel from multiple goroutines is safe; there is no data race.
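
As a small illustration of that point, the sketch below has several goroutines send on one shared channel without any extra locking. The function name sendFromMany and its parameters are hypothetical and not taken from the PR.

```go
package main

import (
	"fmt"
	"sync"
)

// sendFromMany (hypothetical helper) starts `writers` goroutines that each send
// `perWriter` values on one shared channel and returns how many were received.
func sendFromMany(writers, perWriter int) int {
	result := make(chan int)
	var wg sync.WaitGroup

	for i := 0; i < writers; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for j := 0; j < perWriter; j++ {
				result <- id // concurrent sends need no additional locking
			}
		}(i)
	}

	// Close the channel only after every sender has returned.
	go func() {
		wg.Wait()
		close(result)
	}()

	count := 0
	for range result {
		count++
	}
	return count
}

func main() {
	fmt.Println(sendFromMany(8, 100)) // prints 800
}
```

Running this with `go run -race` should report no races for the sends themselves; the part that needs care is when the channel gets closed.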

}

go func() {
// close result chan after all goroutines exit, avoiding data race.
Member

Just a question: why would there be a data race in the previous approach?

Member Author

A data race happens when one goroutine is writing to the result chan while another goroutine is closing the result chan in the Stop function.

Member

@ikaven1024 explained it at #4212 (comment).
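
The approach discussed in this thread (Stop only signals through a done channel, and the result channel is closed after all sending goroutines have exited) can be sketched roughly as follows. This is a simplified stand-in, not the actual watchMux code: the type mux and the names newMux, addSource, and start are hypothetical, and events are plain ints.

```go
package main

import (
	"fmt"
	"sync"
)

// mux is a hypothetical, simplified stand-in for the watchMux discussed above.
type mux struct {
	result chan int
	done   chan struct{}
	wg     sync.WaitGroup
	once   sync.Once
}

func newMux() *mux {
	return &mux{result: make(chan int), done: make(chan struct{})}
}

// addSource starts a goroutine that forwards events from src to result.
// It must be called before start.
func (m *mux) addSource(src <-chan int) {
	m.wg.Add(1)
	go func() {
		defer m.wg.Done()
		for {
			select {
			case <-m.done:
				return
			case ev, ok := <-src:
				if !ok {
					return
				}
				select {
				case <-m.done:
					return
				case m.result <- ev:
				}
			}
		}
	}()
}

// start closes result only after every forwarding goroutine has exited,
// so no goroutine can be sending on result while it is being closed.
func (m *mux) start() {
	go func() {
		m.wg.Wait()
		close(m.result)
	}()
}

// Stop signals the forwarders to exit; it never closes result itself.
func (m *mux) Stop() {
	m.once.Do(func() { close(m.done) })
}

func main() {
	m := newMux()
	src := make(chan int, 4)
	for i := 0; i < 4; i++ {
		src <- i
	}
	m.addSource(src)
	m.start()

	<-m.result // consume one event
	m.Stop()
	for range m.result {
		// drain anything sent before the forwarder observed done
	}
	_, ok := <-m.result
	fmt.Println("result still open:", ok)
}
```

Because the close happens in exactly one place and only after wg.Wait returns, a reader ranging over result is guaranteed to see the channel closed once Stop has been called, rather than hanging.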

@ikaven1024
Member Author

Migrate to #4212

Labels
kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
6 participants