Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ddl: improve the processing speed of general DDL jobs #19997

Closed
wants to merge 9 commits into from

Conversation

hidehalo
Copy link
Contributor

@hidehalo hidehalo commented Sep 15, 2020

What problem does this PR solve?

Issue Number: close #14770

This is a tentative proposal may be wrong, I am not so sure this proposal work better than before program.
@zimulala PTAL, I really need some advices.

Problem Summary:

What is changed and how it works?

What's Changed:

  • ddl owner worker no longer get DDL job through delay-based polling when doDDLJob called on non-owner TiDB server
  • ddl non-owner worker no longer checks the DDL history ddl job queue through delay-based polling when doDDLJob called on non-owner TiDB

How it Works:

  • add DDL job subscriber implement which based on etcd
  • ddl owner worker get DDL job through watching unique etcd key /tidb//ddl/job/reverse in select loop when doDDLJob called on non-owner TiDB
  • ddl non-owner worker notify ddl owner worker via put value to etcd key /tidb//ddl/job/reverse when doDDLJob called
  • ddl non-owner worker wait for watch etcd key response in select loop when doDDLJob called on non-owner TiDB
  • ddl owner worker would put value to etcd key /tidb/ddl/job/done/{jobID} as DDL job done notification when call runDDLJob and job state is synced or cancelled

Check List

  • No code

Release note

Improve the processing speed of general DDL jobs

@ti-srebot ti-srebot added the contribution This PR is from a community contributor. label Sep 15, 2020
@sre-bot
Copy link
Contributor

sre-bot commented Sep 15, 2020

@github-actions github-actions bot added the sig/sql-infra SIG: SQL Infra label Sep 15, 2020
@zimulala
Copy link
Contributor

PTAL @tangenta @AilinKid

@@ -0,0 +1,102 @@
// Copyright 202- PingCAP, Inc.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Copyright 202- PingCAP, Inc.
// Copyright 2020- PingCAP, Inc.

type DDLJobSubscriber interface {
// WaitNewDDLJob return a channel for new ddl job notify
WaitNewDDLJob(ctx context.Context) clientv3.WatchChan
// NotifyNewDDLJob when dll job enqueue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// NotifyNewDDLJob when dll job enqueue
// NotifyNewDDLJob when ddl job enqueue

}

func (sub *ddlJobSubscirber) WaitNewDDLJob(ctx context.Context) clientv3.WatchChan {
return sub.etcdCli.Watch(ctx, "/tidb//ddl/job/reverse")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why use double / here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a mistake

case <-w.ctx.Done():
return
if d.ownerManager.IsOwner() {
select {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In our extreme use cases of the global schema watcher of etcd, the watcher channel is not that stable all the time.
So I think here we can keep the time ticker as the final guarantee. For most cases, the etcd watch channel can accelerate the async job notification.
@zimulala @tangenta PTAL for this

NotifyNewDDLJob(ctx context.Context) error
// WaitDDLJobDone return a channel for ddl job doned notify
WaitDDLJobDone(ctx context.Context, jobID int64) clientv3.WatchChan
// NotifyDDLJobDone when dll job dequeue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

return &mockDDLJobSubscriber{childCtx, cancelFunc}
}

func (sub *mockDDLJobSubscriber) WaitNewDDLJob(ctx context.Context) clientv3.WatchChan {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seem WaitNewDDLJob and WaitDDLJobDone will block on this? the channel returned here doesn't have the input.

@ti-challenge-bot
Copy link

There is no reward for this challenge pull request, so you can request a reward from @zimulala.

More

Tip : About reward you can refs to reward-command.

Warning: None

1 similar comment
@ti-challenge-bot
Copy link

There is no reward for this challenge pull request, so you can request a reward from @zimulala.

More

Tip : About reward you can refs to reward-command.

Warning: None

@hidehalo hidehalo closed this Nov 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
challenge-program contribution This PR is from a community contributor. sig/sql-infra SIG: SQL Infra
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve the processing speed of general DDL jobs
5 participants