ddl: fix runnable ingest job checking #52503

Merged
merged 15 commits into from Apr 12, 2024

Conversation

tangenta
Contributor

What problem does this PR solve?

Issue Number: close #52475

Problem Summary: see #52475

What changed and how does it work?

  • Remove MarkJobProcessing and RetireOwnerHook.
  • Record the running ingest job in runningJobs.
  • Clean up unfinished jobs when the DDL owner is acquired.
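
As a rough illustration of the second and third points, here is a minimal sketch of an in-memory tracker. It is not the code in this PR: only the names runningJobs and clear() appear in the diff; the fields, the add, remove, and canRunIngest methods, and the rule that at most one ingest job runs at a time are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// runningJobs is a hypothetical, simplified tracker for illustration only.
// Assumption: at most one ingest job may run at a time.
type runningJobs struct {
	mu          sync.Mutex
	ids         map[int64]struct{} // IDs of all running jobs
	ingestJobID int64              // ID of the running ingest job, 0 if none
}

func newRunningJobs() *runningJobs {
	return &runningJobs{ids: make(map[int64]struct{})}
}

// add records a job as running; isIngest marks it as the ingest job.
func (r *runningJobs) add(id int64, isIngest bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.ids[id] = struct{}{}
	if isIngest {
		r.ingestJobID = id
	}
}

// remove forgets a finished job and frees the ingest slot if it held it.
func (r *runningJobs) remove(id int64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.ids, id)
	if r.ingestJobID == id {
		r.ingestJobID = 0
	}
}

// clear drops all in-memory state, e.g. when a node becomes the DDL owner.
func (r *runningJobs) clear() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.ids = make(map[int64]struct{})
	r.ingestJobID = 0
}

// canRunIngest reports whether a new ingest job may start.
func (r *runningJobs) canRunIngest() bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.ingestJobID == 0
}

func main() {
	rj := newRunningJobs()
	rj.add(100, true)
	fmt.Println(rj.canRunIngest()) // false: job 100 holds the ingest slot
	rj.remove(100)
	fmt.Println(rj.canRunIngest()) // true: the slot is free again
}
```

Keeping this state only in memory means a new owner starts from an empty tracker after clear() rather than inheriting whatever the previous owner recorded.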

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot ti-chi-bot bot added release-note-none size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 11, 2024

tiprow bot commented Apr 11, 2024

Hi @tangenta. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tangenta tangenta changed the title ddl: refine runnable ingest job checking ddl: fix runnable ingest job checking Apr 11, 2024
@ywqzzy ywqzzy self-requested a review April 11, 2024 07:31
pkg/ddl/job_table.go (outdated, resolved)
Contributor

@lance6716 lance6716 left a comment

I'll try to use some channels to make the test more stable

@@ -831,6 +831,7 @@ func (d *ddl) Start(ctxPool *pools.ResourcePool) error {
 	if err != nil {
 		logutil.BgLogger().Error("error when getting the ddl history count", zap.Error(err))
 	}
 	d.runningJobs.clear()
Contributor

@lance6716 lance6716 Apr 11, 2024

Now we reuse startDispatchLoop to initialize runningJobs, relying on the deterministic order from order by processing desc, job_id and adding the jobs one by one.

Considering the case that the former DDL owner marked wrong jobs as running, like 100 (running), 101 (pending), 102 (running wrongly). The new DDL owner will take the states to (running, pending, pending) and then to (finished, pending, running). However, the correct sequence is (running, pending, pending) and then (finished, running, pending).

I slightly prefer that the new DDL owner recompute the running jobs instead of reusing the persisted state from the table.
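
To make the ordering concern concrete, here is a small sketch of the scenario after job 100 has finished. It is a simplification, not TiDB code: the job struct and both pick functions are hypothetical, and it assumes only one of these jobs can run at a time. One function imitates order by processing desc, job_id; the other ignores the persisted flag and orders by job ID alone.

```go
package main

import (
	"fmt"
	"sort"
)

// job is a hypothetical, simplified stand-in for a row in the DDL job table.
type job struct {
	ID         int64
	Processing bool // persisted "processing" flag, possibly wrong
}

// pickTrustingFlag imitates "order by processing desc, job_id":
// jobs flagged as processing come first, then ascending job ID.
// It assumes jobs is non-empty.
func pickTrustingFlag(jobs []job) int64 {
	sorted := append([]job(nil), jobs...) // copy so the input is not reordered
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].Processing != sorted[j].Processing {
			return sorted[i].Processing
		}
		return sorted[i].ID < sorted[j].ID
	})
	return sorted[0].ID
}

// pickRecomputed ignores the persisted flag and orders by job ID only.
// It assumes jobs is non-empty.
func pickRecomputed(jobs []job) int64 {
	sorted := append([]job(nil), jobs...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].ID < sorted[j].ID })
	return sorted[0].ID
}

func main() {
	// Job 100 has finished; 101 is pending; 102 was wrongly marked as running.
	remaining := []job{{ID: 101}, {ID: 102, Processing: true}}
	fmt.Println("trusting flag:", pickTrustingFlag(remaining))
	fmt.Println("recomputed:   ", pickRecomputed(remaining))
}
```

With the stale flag trusted, job 102 is chosen ahead of 101; with the flag ignored, job 101 is chosen, which matches the sequence described above as correct.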

Contributor

“Considering the case that the former DDL owner marked wrong jobs as running, like 100 (running), 101 (pending), 102 (running wrongly).”
How come?

Contributor

@lance6716 lance6716 Apr 12, 2024

“Considering the case that the former DDL owner marked wrong jobs as running, like 100 (running), 101 (pending), 102 (running wrongly).” How come?

The old owner was running the code from before this PR and marked 102 as running, as in the linked issue.

codecov bot commented Apr 11, 2024

Codecov Report

Merging #52503 (68e8407) into master (12833e8) will increase coverage by 2.4508%.
Report is 24 commits behind head on master.
The diff coverage is 87.2862%.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #52503        +/-   ##
================================================
+ Coverage   72.1321%   74.5830%   +2.4508%     
================================================
  Files          1467       1493        +26     
  Lines        426954     439041     +12087     
================================================
+ Hits         307971     327450     +19479     
+ Misses        99738      91078      -8660     
- Partials      19245      20513      +1268     
Flag          Coverage Δ
integration   50.2205% <75.3217%> (?)
unit          71.4568% <82.6022%> (+0.4581%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components    Coverage Δ
dumpling      53.9957% <ø> (ø)
parser        ∅ <ø> (∅)
br            49.7796% <76.1904%> (+8.6845%) ⬆️

ti-chi-bot bot commented Apr 12, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lance6716, ywqzzy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot bot commented Apr 12, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-04-12 02:54:13.437781925 +0000 UTC m=+1190114.965322471: ☑️ agreed by lance6716.
  • 2024-04-12 03:35:28.833269586 +0000 UTC m=+1192590.360810176: ☑️ agreed by ywqzzy.

@ti-chi-bot ti-chi-bot bot merged commit 5814957 into pingcap:master Apr 12, 2024
23 of 24 checks passed
3AceShowHand pushed a commit to 3AceShowHand/tidb that referenced this pull request Apr 16, 2024
@lance6716
Contributor

/cherry-pick release-8.1

ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Apr 26, 2024
@ti-chi-bot
Member

@lance6716: new pull request created to branch release-8.1: #52911.

In response to this:

/cherry-pick release-8.1

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

Labels
approved lgtm release-note-none size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
4 participants