spool: fix data race when to exit #42129

Merged
merged 9 commits into from Mar 13, 2023

hawkingrei
Member

@hawkingrei hawkingrei commented Mar 13, 2023

What problem does this PR solve?

Issue Number: close #42130

Problem Summary:

When the pool is in exit mode, some tasks may still try to start a new goroutine and call waitgroup.Add while we are calling waitgroup.Wait at the same time.
This causes a data race.

What is changed and how it works?

Check the exit mode when checking whether a task can run, and wait until all in-flight tasks have called waitgroup.Add before waiting on the waitgroup.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
@ti-chi-bot
Member

ti-chi-bot commented Mar 13, 2023

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • GMHDBJD
  • you06

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@hawkingrei
Member Author

/check-issue-triage-complete

Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
@hawkingrei hawkingrei requested a review from you06 March 13, 2023 06:39
@@ -142,7 +145,12 @@ func (p *Pool) RunWithConcurrency(fns chan func(), concurrency uint32) error {

// checkAndAddRunning is to check if a task can run. If can, add the running number.
func (p *Pool) checkAndAddRunning(concurrency uint32) (conc int32, run bool) {
	p.waiting.Add(1)
	defer p.waiting.Add(-1)
Contributor

Should the waiting variable modification move into Run? p.wg.Add(1) is called only after this check returns, so the loop in ReleaseAndWait (for p.waiting.Load() > 0 {...}) may exit after checkAndAddRunning returns but before p.wg.Add(1) runs, and a race is still possible.

Member Author

You are right.
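
The window described in this thread closes once the counter is held across both the stop check and wg.Add. The following is only a minimal sketch of that idea, not the code merged in this PR: miniPool, errPoolClosed, and runDemo are hypothetical names, and it assumes the typed atomics from sync/atomic (Go 1.19 or later), since the diff does not show which atomic package the real pool uses.

```go
package main

import (
	"errors"
	"runtime"
	"sync"
	"sync/atomic"
)

// errPoolClosed is a hypothetical error for tasks submitted after shutdown.
var errPoolClosed = errors.New("pool closed")

// miniPool is a hypothetical, simplified stand-in for the pool in this PR.
type miniPool struct {
	wg      sync.WaitGroup
	waiting atomic.Int32 // Run calls that have entered but not yet finished wg.Add
	isStop  atomic.Bool
}

// Run starts fn in a new goroutine unless the pool is stopping.
// waiting stays above zero across both the stop check and wg.Add,
// so ReleaseAndWait cannot reach wg.Wait in between.
func (p *miniPool) Run(fn func()) error {
	p.waiting.Add(1)
	defer p.waiting.Add(-1)
	if p.isStop.Load() {
		return errPoolClosed
	}
	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		fn()
	}()
	return nil
}

// ReleaseAndWait marks the pool as stopping, waits for in-flight Run
// calls to finish registering, then waits for the tasks themselves.
func (p *miniPool) ReleaseAndWait() {
	p.isStop.Store(true)
	for p.waiting.Load() > 0 {
		runtime.Gosched() // busy-wait, kept only to mirror the loop in the diff
	}
	p.wg.Wait()
}

// runDemo submits n tasks, shuts the pool down, then tries one late task.
// It returns how many tasks ran and whether the late one was rejected.
func runDemo(n int) (ran int32, lateRejected bool) {
	var p miniPool
	var count atomic.Int32
	for i := 0; i < n; i++ {
		_ = p.Run(func() { count.Add(1) })
	}
	p.ReleaseAndWait()
	err := p.Run(func() { count.Add(1) })
	return count.Load(), errors.Is(err, errPoolClosed)
}
```

Because Run increments waiting before it loads isStop, and ReleaseAndWait stores isStop before it loads waiting, a Run call either observes the stop flag and returns, or is visible to ReleaseAndWait until after its wg.Add has completed.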

@@ -173,6 +181,10 @@ func (p *Pool) checkAndAddRunningInternal(concurrency int32) (conc int32, run bo
// ReleaseAndWait releases the pool and waits for all tasks to be completed.
func (p *Pool) ReleaseAndWait() {
	p.isStop.Store(true)
	// wait for all the task in the pending to exit
	for p.waiting.Load() > 0 {
		time.Sleep(waitInterval)
Contributor

I want to confirm the usage of ReleaseAndWait. If it may be called in Executor.Close, a 5ms sleep is unacceptable. If it's called only when TiDB is exiting or the kvstore is closing, this LGTM.

Member Author

I have removed the 5ms sleep for now.

Contributor

A spinlock is risky because it may keep looping here while the Run or RunWithConcurrency threads are scheduled out. Why not use a condition variable here?

Member Author

Done
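
One way to act on the condition-variable suggestion is sketched below with sync.Cond. This conversation does not show the final code, so treat it as an illustration under assumptions rather than the merged implementation: condPool, newCondPool, errCondPoolClosed, and condDemo are hypothetical names.

```go
package main

import (
	"errors"
	"sync"
	"sync/atomic"
)

// errCondPoolClosed is a hypothetical error for tasks submitted after shutdown.
var errCondPoolClosed = errors.New("pool closed")

// condPool is a hypothetical pool that blocks on sync.Cond instead of spinning.
type condPool struct {
	mu      sync.Mutex
	cond    *sync.Cond
	waiting int  // Run calls that have not finished wg.Add yet; guarded by mu
	isStop  bool // guarded by mu
	wg      sync.WaitGroup
}

func newCondPool() *condPool {
	p := &condPool{}
	p.cond = sync.NewCond(&p.mu)
	return p
}

// Run registers itself under mu, calls wg.Add, then deregisters and
// wakes ReleaseAndWait once no other Run call is still registering.
func (p *condPool) Run(fn func()) error {
	p.mu.Lock()
	if p.isStop {
		p.mu.Unlock()
		return errCondPoolClosed
	}
	p.waiting++
	p.mu.Unlock()

	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		fn()
	}()

	p.mu.Lock()
	p.waiting--
	if p.waiting == 0 {
		p.cond.Broadcast()
	}
	p.mu.Unlock()
	return nil
}

// ReleaseAndWait sets the stop flag, sleeps on the condition variable
// until every registering Run call has finished wg.Add, then waits
// for the tasks themselves.
func (p *condPool) ReleaseAndWait() {
	p.mu.Lock()
	p.isStop = true
	for p.waiting > 0 {
		p.cond.Wait() // releases mu while blocked, reacquires before returning
	}
	p.mu.Unlock()
	p.wg.Wait()
}

// condDemo submits n tasks, shuts the pool down, then tries one late task.
// It returns how many tasks ran and whether the late one was rejected.
func condDemo(n int) (ran int64, lateRejected bool) {
	p := newCondPool()
	var count atomic.Int64
	for i := 0; i < n; i++ {
		_ = p.Run(func() { count.Add(1) })
	}
	p.ReleaseAndWait()
	err := p.Run(func() {})
	return count.Load(), errors.Is(err, errCondPoolClosed)
}
```

Unlike the busy loop, cond.Wait parks the calling goroutine until Broadcast is called, so ReleaseAndWait consumes no CPU while Run callers are descheduled. In a pool this small, holding mu across wg.Add would also work; the counter is kept here only to mirror the structure discussed in the review.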

Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Mar 13, 2023
@hawkingrei hawkingrei requested a review from GMHDBJD March 13, 2023 08:20
Contributor

@GMHDBJD GMHDBJD left a comment

LGTM

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Mar 13, 2023
@hawkingrei
Member Author

/merge

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: 6277e79

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Mar 13, 2023
@ti-chi-bot ti-chi-bot merged commit 51c22cd into pingcap:master Mar 13, 2023
@hawkingrei hawkingrei deleted the fix_spool branch March 13, 2023 09:17
Labels
release-note-none size/S Denotes a PR that changes 10-29 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Development

Successfully merging this pull request may close these issues.

DATA RACE in the TestReleaseWhenRunningPool
4 participants