ddl: consider paused job when check runnable #54419

Merged
4 commits merged into pingcap:master from ddl-fix-paused-dep
Jul 4, 2024

@tangenta tangenta commented Jul 3, 2024

What problem does this PR solve?

Issue Number: close #54383, ref #53246

Problem Summary:

|    | session 1 | session 2 | session 3 |
|----|-----------|-----------|-----------|
| t1 | create table t (a int); OK | | |
| t2 | | alter table t modify column a tinyint; -- job 1 | |
| t3 | admin pause ddl jobs 1; OK | Paused | |
| t4 | | Paused | alter table t add column c int; // Should be blocked! |

As the table above shows, the add column job runs while the earlier modify column job is still unfinished (it is only paused). This is unexpected because both jobs reference the same table.

What changed and how does it work?

This PR moves unfinished jobs to the pending jobs set after delivery2Worker() completes, in order to prevent getJob() from obtaining new jobs that affect the same table.
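The idea can be illustrated with a minimal sketch. This is not the TiDB implementation: `jobTracker`, `tableKey`, `finishOrPend`, and `checkRunnable` are hypothetical names chosen for illustration, and the real code tracks `model.InvolvingSchemaInfo` values rather than plain schema/table pairs. The sketch only shows why keeping an unfinished job in a pending set blocks later jobs on the same table.

```go
package main

import "fmt"

// tableKey is a hypothetical stand-in for the schema objects a job involves.
type tableKey struct {
	Schema, Table string
}

// jobTracker is a simplified, hypothetical model of the scheduler's bookkeeping.
type jobTracker struct {
	running map[int64][]tableKey // jobs currently held by a worker
	pending map[int64][]tableKey // unfinished jobs whose worker has exited
}

func newJobTracker() *jobTracker {
	return &jobTracker{
		running: make(map[int64][]tableKey),
		pending: make(map[int64][]tableKey),
	}
}

// addRunning marks a job as running; a resumed job leaves the pending set.
func (jt *jobTracker) addRunning(jobID int64, involves []tableKey) {
	delete(jt.pending, jobID)
	jt.running[jobID] = involves
}

// finishOrPend removes the job from the running set. When moveToPending is
// true, the job's tables stay reserved in the pending set.
func (jt *jobTracker) finishOrPend(jobID int64, moveToPending bool) {
	involves, ok := jt.running[jobID]
	if !ok {
		return
	}
	delete(jt.running, jobID)
	if moveToPending {
		jt.pending[jobID] = involves
	}
}

// overlaps reports whether the two slices share at least one table.
func overlaps(a, b []tableKey) bool {
	for _, x := range a {
		for _, y := range b {
			if x == y {
				return true
			}
		}
	}
	return false
}

// checkRunnable reports whether a job may start: it must not touch a table
// held by any other running or pending job.
func (jt *jobTracker) checkRunnable(jobID int64, involves []tableKey) bool {
	for _, set := range []map[int64][]tableKey{jt.running, jt.pending} {
		for id, held := range set {
			if id != jobID && overlaps(held, involves) {
				return false
			}
		}
	}
	return true
}

func main() {
	jt := newJobTracker()
	tbl := []tableKey{{Schema: "test", Table: "t"}}

	jt.addRunning(1, tbl)    // job 1: modify column
	jt.finishOrPend(1, true) // job 1 is paused, so it is kept as pending
	fmt.Println(jt.checkRunnable(2, tbl)) // false: add column is blocked

	jt.addRunning(1, tbl)     // job 1 is resumed
	jt.finishOrPend(1, false) // job 1 finishes
	fmt.Println(jt.checkRunnable(2, tbl)) // true
}
```

In this model, dropping the paused job from the running set without recording it as pending would make the first check return true, which corresponds to the behavior described in the problem summary.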

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue do-not-merge/needs-tests-checked release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 3, 2024
tiprow bot commented Jul 3, 2024

Hi @tangenta. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

codecov bot commented Jul 4, 2024

Codecov Report

Attention: Patch coverage is 91.66667% with 5 lines in your changes missing coverage. Please review.

Project coverage is 56.3686%. Comparing base (230bbc2) to head (7820b1f).
Report is 18 commits behind head on master.

Additional details and impacted files
@@                Coverage Diff                @@
##             master     #54419         +/-   ##
=================================================
- Coverage   72.8839%   56.3686%   -16.5153%     
=================================================
  Files          1533       1657        +124     
  Lines        436132     614397     +178265     
=================================================
+ Hits         317870     346327      +28457     
- Misses        98667     244560     +145893     
- Partials      19595      23510       +3915     
Flag Coverage Δ
integration 37.2226% <80.0000%> (?)
unit 71.8742% <85.0000%> (-0.0209%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 52.9656% <ø> (ø)
parser ∅ <ø> (∅)
br 52.6284% <ø> (+6.4986%) ⬆️

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Jul 4, 2024
failpoint.InjectCall("afterDelivery2Worker", job)
s.runningJobs.removeRunning(jobID, involvedSchemaInfos)
moveRunningJobsToPending := r != nil || (job != nil && job.IsPaused())

@lance6716 lance6716 Jul 4, 2024

How about this: when delivery2Worker exits, if the job is not in a finished state (meaning the job is paused, the worker panicked, or some other unexpected condition occurred), we move it to pending.
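The suggestion above can be sketched as follows. This is only an illustration under assumptions: the `job` type, its states, and `inFinalState` are hypothetical stand-ins loosely modeled on the `job.InFinalState()` check mentioned later in this thread, and `shouldMoveToPending` is an invented helper, not a function in the PR.

```go
package main

import "fmt"

// jobState is a hypothetical, reduced set of DDL job states.
type jobState int

const (
	stateRunning jobState = iota
	statePaused
	stateSynced
	stateCancelled
)

// job is a hypothetical stand-in for a DDL job.
type job struct {
	ID    int64
	State jobState
}

// inFinalState reports whether the job can make no further progress.
func (j *job) inFinalState() bool {
	return j.State == stateSynced || j.State == stateCancelled
}

// shouldMoveToPending decides what to do when the worker goroutine exits.
// recovered is the value returned by recover(), or nil if there was no panic.
// Instead of checking only for the paused state, it treats every job that
// has not reached a final state as still reserving its tables.
func shouldMoveToPending(j *job, recovered any) bool {
	if recovered != nil {
		return true // the worker panicked; the job cannot be assumed finished
	}
	return j != nil && !j.inFinalState()
}

func main() {
	paused := &job{ID: 1, State: statePaused}
	done := &job{ID: 2, State: stateSynced}

	fmt.Println(shouldMoveToPending(paused, nil))  // true
	fmt.Println(shouldMoveToPending(done, nil))    // false
	fmt.Println(shouldMoveToPending(done, "boom")) // true
}
```

Compared with testing only for the paused state, this form also covers any non-final state that might be introduced later, which is the robustness argument made further down in the thread.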

Contributor

Barring a panic or an owner change, paused is actually the only state in which we can exit here; see the job.InFinalState() branch.

Contributor

Just in case we have more states in the future (for example, if we somehow revert to the old way of handling jobs, where this function only advances one schema state). I think checking "not finished" is more correct.

Contributor

lgtm

> more correct

you mean 'more robust'?

Contributor

yes, more robust.

@@ -233,11 +233,23 @@ func (j *runningJobs) addRunning(jobID int64, involves []model.InvolvingSchemaIn
}
}

func (j *runningJobs) removeRunningOrPending(jobID int64, involves []model.InvolvingSchemaInfo, moveToPending bool) {
Contributor

Maybe FinishOrPendJob

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jul 4, 2024
ti-chi-bot bot commented Jul 4, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-07-04 01:59:38.377561644 +0000 UTC m=+1462504.863050471: ☑️ agreed by lance6716.
  • 2024-07-04 03:06:55.898254979 +0000 UTC m=+1466542.383743808: ☑️ agreed by D3Hunter.

D3Hunter commented Jul 4, 2024

/retest

tiprow bot commented Jul 4, 2024

@D3Hunter: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

tangenta commented Jul 4, 2024

/approve

ti-chi-bot bot commented Jul 4, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: D3Hunter, lance6716, tangenta

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the approved label Jul 4, 2024

tangenta commented Jul 4, 2024

/retest

tiprow bot commented Jul 4, 2024

@tangenta: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

D3Hunter commented Jul 4, 2024

/retest

tiprow bot commented Jul 4, 2024

@D3Hunter: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

@ti-chi-bot ti-chi-bot bot merged commit be16d49 into pingcap:master Jul 4, 2024
23 checks passed
@ti-chi-bot

In response to a cherrypick label: new pull request created to branch release-8.2: #54431.

ti-chi-bot bot pushed a commit that referenced this pull request Jul 4, 2024
@tangenta tangenta deleted the ddl-fix-paused-dep branch July 4, 2024 23:32
@tangenta tangenta restored the ddl-fix-paused-dep branch July 4, 2024 23:33
@tangenta tangenta deleted the ddl-fix-paused-dep branch July 4, 2024 23:33
@D3Hunter D3Hunter mentioned this pull request Jul 5, 2024
75 tasks
Labels
approved lgtm needs-cherry-pick-release-8.2 release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

[conn.go:1024] ["connection running loop panic"]
4 participants