ddl: initial support for parallel DDL #6955

Merged
merged 20 commits into from Jul 25, 2018

Conversation

@zimulala
Member

commented Jul 2, 2018

What have you changed? (mandatory)

ddl: remove cleanAddIndexQueueJobs and add initial support for parallel DDL.
The initial support for parallel DDL works as follows:

  • "Add index" DDL and other types of DDL can be executed in parallel when they operate on different tables. Two queues in storage hold the jobs: one for "add index" jobs and one for all other DDL jobs. Two workers handle them: the "add index" worker handles the "add index" queue, and the other worker handles the other queue.

  • If an "add index" DDL and another type of DDL operate on the same table, the two operations must be performed serially.
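
The two rules above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the `job` struct, `queueFor`, and `mustRunSerially` are hypothetical names, and the sketch assumes that "on the same table" can be decided by comparing a table ID.

```go
package main

import "fmt"

// actionType and job are simplified, hypothetical stand-ins for the real model types.
type actionType string

const (
	actionAddIndex  actionType = "add index"
	actionAddColumn actionType = "add column"
)

type job struct {
	ID      int64
	TableID int64
	Type    actionType
}

type queueName string

const (
	generalQueue  queueName = "general"
	addIndexQueue queueName = "add index"
)

// queueFor picks the queue a job is stored in: "add index" jobs get their own queue.
func queueFor(j job) queueName {
	if j.Type == actionAddIndex {
		return addIndexQueue
	}
	return generalQueue
}

// mustRunSerially reports whether two jobs from different queues have to be
// ordered because they touch the same table. Jobs in the same queue are
// already handled one after another by that queue's single worker.
func mustRunSerially(a, b job) bool {
	return queueFor(a) != queueFor(b) && a.TableID == b.TableID
}

func main() {
	addIdx := job{ID: 1, TableID: 10, Type: actionAddIndex}
	addCol := job{ID: 2, TableID: 10, Type: actionAddColumn}
	other := job{ID: 3, TableID: 11, Type: actionAddColumn}
	fmt.Println(queueFor(addIdx), queueFor(addCol))
	fmt.Println(mustRunSerially(addIdx, addCol), mustRunSerially(addIdx, other))
}
```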

What is the type of the changes (mandatory)?

The currently defined types are listed below, please pick one of the types for this PR by removing the others:

  • Improvement

How has this PR been tested (mandatory)?

unit test

Does this PR affect documentation (docs/docs-cn) update? (optional)

Yes.

@shenli shenli requested a review from winkyao Jul 2, 2018
@zimulala zimulala force-pushed the zimulala:parallel-ddl branch from 0b64181 to f57c901 Jul 2, 2018
@zimulala zimulala force-pushed the zimulala:parallel-ddl branch from f57c901 to 7d35817 Jul 2, 2018
if err != nil {
return nil, errors.Trace(err)
}
return append(generalJobs, addIdxJobs...), nil

@crazycs520

crazycs520 Jul 3, 2018

Contributor

The returned jobs may not be sorted by job ID. Should we return sorted jobs? The old function naturally returned sorted jobs.

@zimulala

zimulala Jul 5, 2018

Author Member

I think it can be done in the next PR. I added a "TODO" now.
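
For reference, one possible shape of that follow-up is to sort the concatenated slice by job ID before returning it. The sketch below uses a hypothetical, stripped-down `job` type and a hypothetical helper `mergeSorted`; it is not the change that was eventually made.

```go
package main

import (
	"fmt"
	"sort"
)

// job is a hypothetical stand-in that keeps only the ID.
type job struct{ ID int64 }

// mergeSorted concatenates the two job lists and orders the result by job ID,
// so callers see the same ordering a single queue would have produced.
func mergeSorted(generalJobs, addIdxJobs []job) []job {
	all := make([]job, 0, len(generalJobs)+len(addIdxJobs))
	all = append(all, generalJobs...)
	all = append(all, addIdxJobs...)
	sort.Slice(all, func(i, k int) bool { return all[i].ID < all[k].ID })
	return all
}

func main() {
	general := []job{{ID: 1}, {ID: 4}}
	addIdx := []job{{ID: 2}, {ID: 3}}
	fmt.Println(mergeSorted(general, addIdx))
}
```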

ddl/ddl.go Outdated
for _, worker := range d.workers {
worker.wg.Add(1)
go worker.start(d.ddlCtx)
// TODO: Add the type of DDL worker.
metrics.DDLCounter.WithLabelValues(metrics.CreateDDLWorker).Inc()

// For every start, we will send a fake job to let worker

@shenli

shenli Jul 5, 2018

Member

What does "For every start" mean?

@zimulala

zimulala Jul 5, 2018

Author Member

It's the original comment. I think it means "For each call to the start function".

ddl/ddl.go Outdated
@@ -427,6 +423,19 @@ func checkJobMaxInterval(job *model.Job) time.Duration {
return 1 * time.Second
}

func (d *ddl) asyncNotifyWorker(jobTp model.ActionType) {
// If the workes don't run, we needn't to notice workers.

@shenli

shenli Jul 5, 2018

Member

workers or works?

@shenli

shenli Jul 8, 2018

Member

notify is better than notice.

@@ -282,41 +293,56 @@ func (w *worker) finishDDLJob(t *meta.Meta, job *model.Job) (err error) {
return errors.Trace(err)
}

func isDependencyJobDone(t *meta.Meta, job *model.Job) (bool, error) {

@ciscoxll

ciscoxll Jul 6, 2018

Contributor

Can a DDL job depend on multiple jobs?

@zimulala

zimulala Jul 7, 2018

Author Member

When a job depends on multiple jobs, we only record the maximum job ID among them.
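
A rough sketch of that rule, under stated assumptions: the `job` struct and `dependencyID` helper are hypothetical, a dependency is taken to be an earlier job (smaller ID) in the other queue that touches the same table, and 0 is used to mean "no dependency".

```go
package main

import "fmt"

// job is a hypothetical stand-in with only the fields the sketch needs.
type job struct {
	ID      int64
	TableID int64
}

// dependencyID scans the other queue and returns the largest ID among the
// jobs that were enqueued before cur and touch the same table.
// It returns 0 when cur depends on nothing.
func dependencyID(cur job, otherQueue []job) int64 {
	var dep int64
	for _, j := range otherQueue {
		if j.TableID == cur.TableID && j.ID < cur.ID && j.ID > dep {
			dep = j.ID
		}
	}
	return dep
}

func main() {
	other := []job{{ID: 3, TableID: 7}, {ID: 5, TableID: 7}, {ID: 6, TableID: 8}, {ID: 9, TableID: 7}}
	fmt.Println(dependencyID(job{ID: 8, TableID: 7}, other))
}
```

Because a queue is processed in order, waiting for the job with the largest such ID implies that the earlier ones in that queue have already been handled.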

@zimulala

Member Author

commented Jul 7, 2018

/run-all-tests

1 similar comment
@zimulala

Member Author

commented Jul 9, 2018

/run-all-tests

@zimulala zimulala removed the status/DNM label Jul 9, 2018
@zimulala

Member Author

commented Jul 9, 2018

/run-common-test
/run-integration-common-test

@zimulala

Member Author

commented Jul 9, 2018

/run-common-test

@zimulala

Member Author

commented Jul 11, 2018

ddl/ddl.go Outdated
@@ -427,6 +423,19 @@ func checkJobMaxInterval(job *model.Job) time.Duration {
return 1 * time.Second
}

func (d *ddl) asyncNotifyWorker(jobTp model.ActionType) {
// If the workers don't run, we needn't to notice workers.

@shenli

shenli Jul 11, 2018

Member

s/notice/notify/

wg sync.WaitGroup
id int
tp workerType
ddlJobCh chan struct{}

@shenli

shenli Jul 11, 2018

Member

What is this used for?

@crazycs520

crazycs520 Jul 12, 2018

Contributor

ddlJobCh used to be a member of ddl. Now that we have two kinds of workers, each kind needs its own ddlJobCh, so ddlJobCh was moved from ddl to worker.

@@ -134,6 +134,22 @@ func GetDDLJobs(txn kv.Transaction) ([]*model.Job, error) {
return jobs, nil
}

// GetDDLJobs returns the DDL jobs and an error.

@shenli

shenli Jul 11, 2018

Member

Do not need to mention the error.

@zimulala

zimulala Jul 16, 2018

Author Member

Other functions also mention the error.

@shenli

shenli Jul 16, 2018

Member

This comment provides nothing more than the function name.

@shenli shenli self-assigned this Jul 12, 2018
Contributor

left a comment

reset

@crazycs520 crazycs520 dismissed their stale review Jul 13, 2018

misoperation

ddl/ddl.go Outdated
-d.workers[0] = newWorker(generalWorker, 0, d.store, ctxPool)
+d.workers = make(map[workerType]*worker, 2)
+d.workers[generalWorker] = newWorker(generalWorker, 0, d.store, ctxPool)
+d.workers[addIdxWorker] = newWorker(addIdxWorker, 0, d.store, ctxPool)

@tiancaiamao

tiancaiamao Jul 13, 2018

Contributor

Worker id is always 0 ?

@tiancaiamao

tiancaiamao Jul 16, 2018

Contributor

Worker id is always 0 ?

@@ -61,6 +61,7 @@ func newWorker(tp workerType, id int, store kv.Storage, ctxPool *pools.ResourceP
worker := &worker{
id: id,
tp: tp,
ddlJobCh: make(chan struct{}, 1),

@tiancaiamao

tiancaiamao Jul 13, 2018

Contributor

Why does it need to be make(chan struct{}, 1) rather than make(chan struct{})?

@zimulala

zimulala Jul 13, 2018

Author Member

We need to push info to the channel.
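
One plausible reading of this answer, assuming notifications are sent with a non-blocking `select` (the body of `asyncNotify` is not shown in this thread): with a buffer of one, a signal sent while the worker is busy is kept until the worker next reads the channel, whereas an unbuffered channel would drop it. `tryNotify` below is a hypothetical helper written only for this illustration.

```go
package main

import "fmt"

// tryNotify sends a wake-up signal without blocking and reports whether
// the signal was accepted by the channel.
func tryNotify(ch chan struct{}) bool {
	select {
	case ch <- struct{}{}:
		return true
	default:
		return false
	}
}

func main() {
	buffered := make(chan struct{}, 1)
	unbuffered := make(chan struct{})

	fmt.Println(tryNotify(buffered))   // no receiver yet, but the buffer holds the signal
	fmt.Println(tryNotify(buffered))   // buffer already full, the extra signal is coalesced
	fmt.Println(tryNotify(unbuffered)) // no receiver ready, the signal is dropped
}
```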

for {
select {
case <-ticker.C:
log.Debugf("[ddl] wait %s to check DDL status again", checkTime)

@tiancaiamao

tiancaiamao Jul 13, 2018

Contributor

%s to print a time? What does the result look like?

@winkyao

winkyao Jul 14, 2018

Member

Why not print the worker type?

@zimulala

zimulala Jul 13, 2018

Author Member

It's the old code, I will handle it.
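
On the `%s` question: if `checkTime` is a `time.Duration` (an assumption, its declaration is not shown in the hunk), `%s` uses the value's `String` method, so the log line reads naturally. A small check, with `waitMessage` as a hypothetical helper:

```go
package main

import (
	"fmt"
	"time"
)

// checkTime is assumed to be a time.Duration for this illustration.
var checkTime = 200 * time.Millisecond

// waitMessage formats the log line from the hunk above.
func waitMessage(d time.Duration) string {
	return fmt.Sprintf("[ddl] wait %s to check DDL status again", d)
}

func main() {
	fmt.Println(waitMessage(checkTime)) // [ddl] wait 200ms to check DDL status again
}
```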

@@ -149,6 +146,15 @@ func asyncNotify(ch chan struct{}) {
// buildJobDependence sets the curjob's dependency-ID.
// The dependency-job's ID must less than the current job's ID, and we need the largest one in the list.
func buildJobDependence(t *meta.Meta, curJob *model.Job) error {
switch curJob.Type {

@tiancaiamao

tiancaiamao Jul 13, 2018

Contributor

It's hard to understand here.

@winkyao

winkyao Jul 14, 2018

Member

It's also hard to understand for me.

@shenli

shenli Jul 16, 2018

Member

Please add comments.

@tiancaiamao

tiancaiamao Jul 16, 2018

Contributor

I don't think adding comments helps, @shenli.
meta.Meta should not store the information about the queue key.
If the caller has to modify state in meta.Meta before calling its method, why not provide that state as an argument?

t.SetJobListKey(meta.AddIndexJobListKey)
defer t.SetJobListKey(meta.DefaultJobListKey)
}

jobs, err := t.GetAllDDLJobs()

@tiancaiamao

tiancaiamao Jul 13, 2018

Contributor

I suggest renaming GetAllDDLJobs to GetDDLJobsInQueue and passing the queue as an argument, rather than changing meta's jobListKey state, which is very tricky.

@winkyao

winkyao Jul 14, 2018

Member

Does GetAllDDLJobs always get jobs from mDDLJobListKey? Is that correct?

@tiancaiamao

tiancaiamao Jul 16, 2018

Contributor

mDDLJobListKey switches between the two queues, and I can't tell which one is in effect. That's the problem.

So I suggest:

GetDDLJobsInQueue(general)
GetDDLJobsInQueue(addindex)

@tiancaiamao

tiancaiamao Jul 16, 2018

Contributor

Please address the comment.

@zimulala

zimulala Jul 17, 2018

Author Member

Done, I changed the name of the function.
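
The design point in this thread, passing the queue as an argument instead of switching a key stored on the meta object, can be illustrated with a toy in-memory store. Everything here (`metaStore`, `queueKey`, `jobsInQueue`) is hypothetical and only mirrors the shape of the suggested `GetDDLJobsInQueue`; it is not the real meta package.

```go
package main

import "fmt"

type queueKey string

const (
	generalKey  queueKey = "general"
	addIndexKey queueKey = "addindex"
)

// metaStore is a toy in-memory stand-in for the meta layer.
type metaStore struct {
	queues map[queueKey][]int64 // job IDs per queue
}

// jobsInQueue takes the queue as an argument, so the call site states which
// queue is read and no state has to be set and restored around the call.
func (m *metaStore) jobsInQueue(key queueKey) []int64 {
	return append([]int64(nil), m.queues[key]...)
}

func main() {
	m := &metaStore{queues: map[queueKey][]int64{
		generalKey:  {1, 3},
		addIndexKey: {2},
	}}
	fmt.Println(m.jobsInQueue(generalKey), m.jobsInQueue(addIndexKey))
}
```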

return true, nil
}

historyJob, err := t.GetHistoryDDLJob(job.DependencyID)

@tiancaiamao

tiancaiamao Jul 13, 2018

Contributor

Will t.GetHistoryDDLJob select the right job queue?

@shenli

shenli Jul 18, 2018

Member

It is better to look for the job in the waiting job list, because the history job list may be long.

@@ -149,6 +146,15 @@ func asyncNotify(ch chan struct{}) {
// buildJobDependence sets the curjob's dependency-ID.
// The dependency-job's ID must less than the current job's ID, and we need the largest one in the list.
func buildJobDependence(t *meta.Meta, curJob *model.Job) error {

@tiancaiamao

tiancaiamao Jul 13, 2018

Contributor

There will be read/write conflicts in GetAllDDLJobs, because each worker checks the other worker's queue for dependencies.
Is the error retryable and properly handled?

@zimulala

zimulala Jul 16, 2018

Author Member

It will be retried.

once := sync.Once{}
var checkErr error
tc.onJobRunBefore = func(job *model.Job) {
// TODO: extract a unified function for use by other tests.

@tiancaiamao

tiancaiamao Jul 13, 2018

Contributor

s/for use by other tests/for other tests

if lastJob != nil {
finishedJobs, err := m.GetAllHistoryDDLJobs()
c.Assert(err, IsNil)
// get the last 11 jobs completed。

@tiancaiamao

tiancaiamao Jul 13, 2018

Contributor

I find a strange char here

@@ -104,6 +104,9 @@ func CancelJobs(txn kv.Transaction, ids []int64) ([]error, error) {
errs[i] = errors.Trace(err)
continue
}
if job.Type == model.ActionAddIndex {
t.SetJobListKey(meta.AddIndexJobListKey)
}

@tiancaiamao

tiancaiamao Jul 13, 2018

Contributor

Add an else branch so that the code is more robust and does not rely on an assumption about job.Type's default value.
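
One way a missing else branch can bite, assuming the same meta object is reused across loop iterations (the hunk does not show whether it is): once the key has been switched for an "add index" job, a later job of another type would silently keep using it. The names below (`keyHolder`, `keysWithoutElse`, `keysWithElse`) are hypothetical and exist only for this illustration.

```go
package main

import "fmt"

// keyHolder mimics an object that remembers the current job list key.
type keyHolder struct{ key string }

// keysWithoutElse only switches the key for "add index" jobs.
func keysWithoutElse(types []string) []string {
	m := keyHolder{key: "default"}
	var used []string
	for _, tp := range types {
		if tp == "add index" {
			m.key = "addindex"
		}
		used = append(used, m.key)
	}
	return used
}

// keysWithElse sets the key explicitly in both branches.
func keysWithElse(types []string) []string {
	m := keyHolder{key: "default"}
	var used []string
	for _, tp := range types {
		if tp == "add index" {
			m.key = "addindex"
		} else {
			m.key = "default"
		}
		used = append(used, m.key)
	}
	return used
}

func main() {
	types := []string{"add index", "add column"}
	fmt.Println(keysWithoutElse(types)) // the second job still sees "addindex"
	fmt.Println(keysWithElse(types))    // the second job sees "default"
}
```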

ddl/ddl.go Outdated
@@ -440,7 +449,7 @@ func (d *ddl) doDDLJob(ctx sessionctx.Context, job *model.Job) error {
}

// Notice worker that we push a new job and wait the job done.

@shenli

shenli Jul 16, 2018

Member

s/Notice/Notify/g

err := d.doDDLJob(ctx, job)
c.Assert(err, IsNil)
v := getSchemaVer(c, ctx)
checkHistoryJobArgs(c, ctx, job.ID, &historyJobArgs{ver: v, tbl: tblInfo})
return job
}

func buildRebaseAutoID(dbInfo *model.DBInfo, tblInfo *model.TableInfo, newBaseID int64) *model.Job {

@shenli

shenli Jul 16, 2018

Member

buildRebaseAutoIDJob

@@ -149,6 +146,15 @@ func asyncNotify(ch chan struct{}) {
// buildJobDependence sets the curjob's dependency-ID.
// The dependency-job's ID must less than the current job's ID, and we need the largest one in the list.
func buildJobDependence(t *meta.Meta, curJob *model.Job) error {
switch curJob.Type {

@shenli

shenli Jul 16, 2018

Member

Please add comments.

@@ -162,6 +168,7 @@ func buildJobDependence(t *meta.Meta, curJob *model.Job) error {
return errors.Trace(err)
}
if isDependent {
log.Infof("[ddl] current DDL job %v is dependent job %v", curJob, job)

@shenli

shenli Jul 16, 2018

Member

How about "current DDL job %v depends on job %v"?

@@ -348,7 +374,8 @@ func (w *worker) handleDDLJobQueue(d *ddlCtx, shouldCleanJobs bool) error {
return errors.Trace(w.handleUpdateJobError(t, job, err))
})

-if runJobErr != nil {
+waitDependencyJob := job != nil && job.DependencyID != 0
+if runJobErr != nil || waitDependencyJob {

@shenli

shenli Jul 16, 2018

Member

If waitDependencyJob is true, it is not an error.

@zimulala

zimulala Jul 16, 2018

Author Member

Yes, but we'd better wait a moment. I will add a comment for it.

@shenli

Member

commented Jul 16, 2018

/run-all-tests

return m.txn.LLen(m.jobListKey)
// If the length of jobListKeys isn't zero, we need to replace m.jobListKey with jobListKeys[0].
// Otherwise, we use m.jobListKey directly.
func (m *Meta) DDLJobQueueLen(jobListKeys ...JobListKeyType) (int64, error) {

@winkyao

winkyao Jul 18, 2018

Member

Using variadic parameters here is weird; just pass the job list key.

jobs, err := t.GetAllDDLJobs()
// Jobs in the same queue are ordered. If we want to find a job's dependency-job, we need to look for
// it from the other queue. So if the job is "ActionAddIndex" job, we need find its dependency-job from DefaultJobList.
// TODO: rename SetJobListKey to ChangeJobQueue.

@winkyao

winkyao Jul 18, 2018

Member

Remove TODO?

@@ -306,7 +305,6 @@ func newDDL(ctx context.Context, etcdCli *clientv3.Client, store kv.Storage,
uuid: id,
store: store,
lease: lease,
ddlJobCh: make(chan struct{}, 1),
ddlJobDoneCh: make(chan struct{}, 1),

@winkyao

winkyao Jul 18, 2018

Member

Could we move ddlJobDoneCh into each worker?

@zimulala

zimulala Jul 19, 2018

Author Member

Why?

if historyJob == nil {
return false, nil
}
log.Infof("[ddl] DDL job %v isn't dependent on job ID %d", job, job.DependencyID)

@winkyao

winkyao Jul 18, 2018

Member

"DDL job %v isn't dependent on job ID %d"? What about "DDL job %v dependent job ID %d is finished"?

@@ -104,7 +104,11 @@ func CancelJobs(txn kv.Transaction, ids []int64) ([]error, error) {
errs[i] = errors.Trace(err)
continue
}
-err = t.UpdateDDLJob(int64(j), job, true)
+if job.Type == model.ActionAddIndex {
+err = t.UpdateDDLJob(int64(j), job, true, meta.AddIndexJobListKey)

@winkyao

winkyao Jul 18, 2018

Member

Why not just create a new meta with meta.AddIndexJobListKey?

@zimulala

zimulala Jul 19, 2018

Author Member

We have a meta here, I think it's OK.

@winkyao

winkyao Jul 24, 2018

Member

I mean we can use a new meta to avoid adding a parameter to the function.

ddl/ddl.go Outdated
@@ -413,11 +411,9 @@ func (d *ddl) genGlobalID() (int64, error) {

// generalWorker returns the first worker. The ddl structure has only one worker before we implement the parallel worker.

@shenli

shenli Jul 20, 2018

Member

Need to update the comment.

zimulala added 3 commits Jul 23, 2018
@zimulala

Member Author

commented Jul 23, 2018

@shenli

Member

commented Jul 23, 2018

@zimulala Please resolve the conflicts.

@@ -282,41 +299,61 @@ func (w *worker) finishDDLJob(t *meta.Meta, job *model.Job) (err error) {
return errors.Trace(err)
}

func isDependencyJobDone(t *meta.Meta, job *model.Job) (bool, error) {
if job.DependencyID == 0 {

@shenli

shenli Jul 23, 2018

Member

Please create a constant for 0 or add comment for the if statement.
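
A sketch of what the named constant could look like. `noDependencyJobID` and `dependencyDone` are hypothetical names, and the set of finished job IDs stands in for the history lookup seen in the hunks above (a job found in history is treated as done, a missing one as not done yet).

```go
package main

import "fmt"

// noDependencyJobID marks a job that does not depend on any other job.
const noDependencyJobID int64 = 0

// dependencyDone reports whether the job identified by depID has finished.
// finished is the set of job IDs already present in the history list.
func dependencyDone(finished map[int64]struct{}, depID int64) bool {
	if depID == noDependencyJobID {
		// Nothing to wait for.
		return true
	}
	_, ok := finished[depID]
	return ok
}

func main() {
	finished := map[int64]struct{}{5: {}}
	fmt.Println(dependencyDone(finished, noDependencyJobID))
	fmt.Println(dependencyDone(finished, 5))
	fmt.Println(dependencyDone(finished, 6))
}
```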

return true, nil
}

func newMetaWithQueueTp(txn kv.Transaction, tp string) *meta.Meta {

@shenli

shenli Jul 23, 2018

Member

Please address it.

@@ -86,11 +86,20 @@ type Meta struct {
}

// NewMeta creates a Meta in transaction txn.
func NewMeta(txn kv.Transaction) *Meta {
// If the current Meta needs to handle a job, jobListKey is the type of the job's list.
// We don't change the value of the jobListKey in a Meta.

@shenli

shenli Jul 23, 2018

Member

Can not understand the comment.

for {
kv.RunInNewTxn(s.store, false, func(txn kv.Transaction) error {
m := meta.NewMeta(txn)
// Get the number of jobs from the adding index queue.
addIdxLen, err1 := m.DDLJobQueueLen(meta.AddIndexJobListKey)

@shenli

shenli Jul 24, 2018

Member

Can we use GetDDLJobs and get the length of the return value?

@@ -175,14 +190,18 @@ func (d *ddl) addDDLJob(ctx sessionctx.Context, job *model.Job) error {
job.Version = currentVersion
job.Query, _ = ctx.Value(sessionctx.QueryString).(string)
err := kv.RunInNewTxn(d.store, true, func(txn kv.Transaction) error {
-t := meta.NewMeta(txn)
+t := newMetaWithQueueTp(txn, job.Type.String())

@shenli

shenli Jul 24, 2018

Member

Why not use meta.AddIndexJobListKey as the second parameter?

@crazycs520

crazycs520 Jul 24, 2018

Contributor

The job may not be an add-index job, so the meta has to be created according to the job type?

@zimulala

zimulala Jul 24, 2018

Author Member

Almost. I want to put the check of job type or worker type into newMetaWithQueueTp.

@@ -362,6 +401,7 @@ func (w *worker) handleDDLJobQueue(d *ddlCtx, shouldCleanJobs bool) error {
// No job now, return and retry getting later.
return nil
}
w.waitDependencyJobFinished(job, &waitDependencyJobCnt)

@shenli

shenli Jul 24, 2018

Member

If its dependencyJob is not done yet, it would return at line 357. So why do we need to wait here?

@crazycs520

crazycs520 Jul 24, 2018

Contributor

The return at line 357 is inside the txn function; it does not return from handleDDLJobQueue.

@zimulala

zimulala Jul 24, 2018

Author Member

As @crazycs520 said. Also, if we put it at line 357, we would need to wait 200ms inside the txn, and I am afraid that would make the txn prone to conflicts. So I put it here.
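
The two points in this thread can be sketched together: a `return` inside the transaction closure only leaves the closure, and pausing after the closure has returned keeps the transaction short. `handleOnce`, `runTxn`, `depDone`, and `pause` are hypothetical names for this illustration; the 200ms value is the one mentioned above.

```go
package main

import (
	"fmt"
	"time"
)

// handleOnce is a simplified stand-in for one round of a worker loop.
// runTxn runs fn as one transaction, depDone reports whether the dependency
// job has finished, and pause is called outside the transaction when the
// job still has to wait.
func handleOnce(runTxn func(fn func() error) error, depDone func() bool, pause func()) (bool, error) {
	waiting := false
	err := runTxn(func() error {
		if !depDone() {
			waiting = true
			// This return only leaves the closure, not handleOnce.
			return nil
		}
		// ... the job itself would run here, inside the transaction ...
		return nil
	})
	if err != nil {
		return false, err
	}
	if waiting {
		// Pausing here, after the closure returned, keeps the transaction short.
		pause()
	}
	return waiting, nil
}

func main() {
	runTxn := func(fn func() error) error { return fn() }
	notDone := func() bool { return false }
	pause := func() { time.Sleep(200 * time.Millisecond) }

	waited, err := handleOnce(runTxn, notDone, pause)
	fmt.Println(waited, err)
}
```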

@@ -86,11 +86,19 @@ type Meta struct {
}

// NewMeta creates a Meta in transaction txn.
-func NewMeta(txn kv.Transaction) *Meta {
+// If the current Meta needs to handle a job, jobListKey is the type of the job's list.
+func NewMeta(txn kv.Transaction, jobListKeys ...JobListKeyType) *Meta {

@shenli

shenli Jul 24, 2018

Member

Can we always specify the JobListKey?

@zimulala

zimulala Jul 24, 2018

Author Member

This function is used in a lot of places, so I handle it this way.
Also, in other packages, I think we needn't distinguish the type of jobListKeys.
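
The trade-off discussed here is the usual optional-argument idiom in Go: a variadic parameter lets existing call sites stay unchanged while new callers can name a queue. A minimal sketch with hypothetical types (`listKey`, `metaSketch`, `newMeta`), which does not reproduce the real `NewMeta`:

```go
package main

import "fmt"

type listKey string

const (
	defaultKey  listKey = "default"
	addIndexKey listKey = "addindex"
)

// metaSketch is a toy stand-in that only remembers its job list key.
type metaSketch struct {
	jobListKey listKey
}

// newMeta uses the first key when one is given and falls back to the
// default key otherwise, so callers that do not care can omit it.
func newMeta(keys ...listKey) *metaSketch {
	key := defaultKey
	if len(keys) > 0 {
		key = keys[0]
	}
	return &metaSketch{jobListKey: key}
}

func main() {
	fmt.Println(newMeta().jobListKey, newMeta(addIndexKey).jobListKey)
}
```

The cost, as the reviewers note, is that the signature no longer says how many keys are meaningful; an explicit parameter would make every call site state its queue.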

@shenli
shenli approved these changes Jul 24, 2018
Member

left a comment

LGTM

@shenli shenli added status/LGT2 and removed status/LGT1 labels Jul 24, 2018
Member

left a comment

LGTM

@zimulala

Member Author

commented Jul 24, 2018

/run-all-tests

@zimulala

Member Author

commented Jul 25, 2018

/run-common-test
/run-integration-ddl-test

@zimulala

Member Author

commented Jul 25, 2018

/run-common-test -tidb-test=pr/592
/run-integration-common-test -tidb-test=pr/592

@zimulala zimulala merged commit da6f0c1 into pingcap:master Jul 25, 2018
7 checks passed
ci/circleci: Your tests passed on CircleCI!
continuous-integration/travis-ci/pr: The Travis CI build passed
jenkins-ci-tidb/build: Jenkins job succeeded.
jenkins-ci-tidb/common-test: Jenkins job succeeded.
jenkins-ci-tidb/integration-common-test: Jenkins job succeeded.
jenkins-ci-tidb/integration-ddl-test: Jenkins job succeeded.
license/cla: Contributor License Agreement is signed.
@zimulala zimulala deleted the zimulala:parallel-ddl branch Jul 25, 2018