
Adding index in parallel #19386

Closed
djshow832 opened this issue Aug 24, 2020 · 15 comments

djshow832 (Contributor) commented Aug 24, 2020

Description

Currently, adding an index is processed only on the DDL owner. When the table is huge, this takes too much time. We can leverage the computing capability of the whole cluster to speed it up.

Here's a rough plan:

  1. When the DDL owner receives an add-index job, it looks up the statistics to figure out whether the table is huge. If not, it just executes the job serially as before.
  2. If the table is huge, the owner splits the job into many subjobs, each of which takes care of a relatively small range of data. The subjobs are put into a queue.
  3. The owner sends the subjobs to all the other TiDB instances. Each TiDB instance receives only one subjob at a time. Note that because the number of subjobs is greater than the number of TiDB instances, the subjobs are not all sent at once.
  4. Each subjob is further split into multiple ranges, each of which corresponds to a transaction. Once a transaction is done, TiDB persists the progress to TiKV.
  5. Once a TiDB instance has processed a subjob, it notifies the owner to fetch the next subjob. In this way, the more efficient a TiDB instance is, the more subjobs it processes.
  6. If a subjob fails on one TiDB instance, that instance notifies the owner to roll back. The owner notifies all the TiDB instances to stop, and the rollback job is executed in the background on the owner.
  7. If a TiDB instance hasn't responded for some time, it may be down. The owner sends its subjob to another TiDB instance, which reads the persisted progress and continues the subjob.
  8. If the owner goes down, the new owner reads the subjob queue and continues the whole job.
  9. Once all the subjobs are done, the owner returns success. (A rough sketch of this dispatch loop follows below.)
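To make steps 3 and 5 concrete, here is a minimal Go sketch of the owner-side dispatch loop, under the assumption of hypothetical SubJob and Instance types (they are not actual TiDB structures): each instance holds at most one subjob at a time and pulls the next one as soon as it finishes, so a faster instance naturally processes more subjobs.

```go
// A minimal, hypothetical sketch of the owner-side dispatch described above.
// SubJob, Instance, and the channels are illustrative only, not TiDB types.
package main

import "fmt"

// SubJob covers a small range of the table's data.
type SubJob struct {
	ID       int
	StartKey string
	EndKey   string
}

// Instance represents a TiDB server that runs one subjob at a time.
type Instance struct {
	Name string
}

// dispatch hands out subjobs one at a time: whenever an instance reports it
// is done, it pulls the next subjob from the queue.
func dispatch(queue []SubJob, instances []Instance) {
	pending := make(chan SubJob, len(queue))
	for _, sj := range queue {
		pending <- sj
	}
	close(pending)

	done := make(chan string)
	for _, inst := range instances {
		go func(inst Instance) {
			// Each instance takes at most one subjob at a time.
			for sj := range pending {
				// In the real design this would backfill the index range
				// transaction by transaction and persist progress to TiKV.
				fmt.Printf("%s finished subjob %d [%s, %s)\n",
					inst.Name, sj.ID, sj.StartKey, sj.EndKey)
			}
			done <- inst.Name
		}(inst)
	}
	for range instances {
		<-done
	}
}

func main() {
	queue := []SubJob{{1, "a", "f"}, {2, "f", "m"}, {3, "m", "t"}, {4, "t", "z"}}
	dispatch(queue, []Instance{{"tidb-0"}, {"tidb-1"}})
}
```

A channel is used here only to model the queue; in the real design the queue and the per-transaction progress would be persisted to TiKV so that a crashed instance's subjob can be reassigned (steps 4 and 7).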

Score

  • 6600

SIG Slack Channel

You can join #sig-ddl on Slack in your spare time to discuss the task and get help from mentors or others.

Mentor

Contact the mentors: #tidb-challenge-program channel in TiDB Community Slack Workspace

Recommended Skills

  • DDL
  • Transaction
  • Golang

Learning Materials

ghost commented Aug 25, 2020

+1 I like this design over any static weighting because it responds well to environments where virtual machines could be migrated around or performance could change due to noisy neighbors/steal.

TszKitLo40 (Contributor) commented:

/pick-up

ghost mentioned this issue Sep 12, 2020
TszKitLo40 (Contributor) commented:

@djshow832 I want to try this issue. Can it be picked up?

@ti-challenge-bot ti-challenge-bot bot added picked and removed picked labels Sep 14, 2020
djshow832 (Contributor, Author) commented Sep 14, 2020

> @djshow832 I want to try this issue. Can it be picked up?

Of course, please try picking up again. @TszKitLo40

ti-challenge-bot commented:

The description of the issue was updated, but it still has some problems.


Tip:
You need to ensure that the issue description follows this template:

```
## Score

- ${score}

## Mentor

- ${mentor}
```

Warning: The description format for this issue is wrong.

TszKitLo40 (Contributor) commented:

/pick-up

Rustin170506 (Member) commented Sep 18, 2020

> /pick-up

@TszKitLo40 Sorry, you cannot pick up this issue because you do not have a team.

This is a known issue. I will fix it later.

The HPTC challenge program is only for teams, so you can try to join one.

ben2077 commented Sep 18, 2020

/pick-up

ti-challenge-bot commented:

Pick up success.

aierui (Contributor) commented Sep 19, 2020

/pick-up

ti-challenge-bot commented:

This issue has already been picked up by CodingBen.

ben2077 commented Sep 25, 2020

For the owner:
It first checks whether the job is the parent job; if it is, it only does the following three things (only the owner can execute the parent job):

  1. It gets all the subTasks from the subTaskQueue and checks each one's 'runner' status, which represents the TiDB instance executing that subTask (if the instance is down, it resets the runner to nil and sets the status to unclaimed).
  2. It checks the subTasks' statuses one by one and resets the runner of every failed subTask to nil and its status to unclaimed.
  3. If the owner finds that all the subTasks in the queue with the same jobId are done, it finishes the job.

For a TiDB instance that is not the owner:
Every TiDB instance in the cluster periodically reads the subTaskQueue, claims at most one unclaimed subTask at a time, and executes it. (A rough sketch of both loops follows below.)
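To illustrate the loops described above, here is a rough Go sketch; SubTask, its status values, the subTaskQueue-as-a-slice, and isDown are hypothetical stand-ins rather than real TiDB code.

```go
// Hypothetical sketch of the owner check loop and the worker claim step.
package main

import "fmt"

type status int

const (
	unclaimed status = iota
	running
	failed
	finished
)

// SubTask is one slice of the parent add-index job.
type SubTask struct {
	JobID  int
	ID     int
	Runner string // TiDB instance currently executing it, "" if none
	Status status
}

// ownerCheck implements the three owner-only steps: reset subtasks whose
// runner is down, reset failed subtasks, and report whether every subtask
// with the same jobID is done so the caller can finish the job.
func ownerCheck(queue []SubTask, jobID int, isDown func(string) bool) bool {
	allDone := true
	for i := range queue {
		t := &queue[i]
		if t.JobID != jobID {
			continue
		}
		if t.Status == running && isDown(t.Runner) {
			t.Runner, t.Status = "", unclaimed // step 1: runner is down
		}
		if t.Status == failed {
			t.Runner, t.Status = "", unclaimed // step 2: retry failed subtask
		}
		if t.Status != finished {
			allDone = false
		}
	}
	return allDone // step 3: finish the job when true
}

// workerClaim is run by every non-owner instance: claim at most one
// unclaimed subtask and execute it.
func workerClaim(queue []SubTask, me string) *SubTask {
	for i := range queue {
		if queue[i].Status == unclaimed {
			queue[i].Runner, queue[i].Status = me, running
			return &queue[i]
		}
	}
	return nil
}

func main() {
	queue := []SubTask{
		{JobID: 7, ID: 1, Runner: "tidb-1", Status: running},
		{JobID: 7, ID: 2, Status: failed},
		{JobID: 7, ID: 3, Status: finished},
	}
	down := func(name string) bool { return name == "tidb-1" }
	fmt.Println("all done:", ownerCheck(queue, 7, down))
	if t := workerClaim(queue, "tidb-2"); t != nil {
		fmt.Printf("tidb-2 claimed subtask %d\n", t.ID)
	}
}
```

In practice the queue and the claim operation would have to go through transactional storage so that two instances cannot claim the same subtask.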

ben2077 commented Sep 25, 2020

/pick-up

ti-challenge-bot commented:

This challenge program issue is already in the assignment flow and development has started, so you cannot pick up this issue. You can try other issues.

ben2077 commented Sep 26, 2020

The status quo as I understand it:
The original add-index logic is based on a single physical table, so the progress of an add index job is stored as a triplet:
1. JobId-StartHandle (how far the current job has progressed)
2. JobId-EndHandle
3. JobId-PhysicalTableID
This design limits the system to processing only one (add index DDL job, physical table) pair at a time, which is why the add index task for a partitioned table also handles partitions one by one. But it is difficult and inefficient to run such logic across a cluster, so we may redesign the reorg information as follows:
1. JobId-SubTaskId-StartHandle
2. JobId-SubTaskId-EndHandle
3. JobId-SubTaskId-AddedCount (for progress statistics)
In this way, the backfill index job can be resumed at the SubTask-Range level instead of the previous Job-Partition level.
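A small Go sketch of what the proposed per-subtask reorg records could look like when keyed by (JobId, SubTaskId); the type and field names are illustrative only, and an in-memory map stands in for the TiKV-persisted store.

```go
// Hypothetical sketch of the proposed reorg-info layout keyed by
// (JobID, SubTaskID); these are not real TiDB types.
package main

import "fmt"

// subTaskReorgKey identifies the progress record of one subtask.
type subTaskReorgKey struct {
	JobID     int64
	SubTaskID int64
}

// subTaskReorgInfo replaces the old per-job triplet: progress is kept per
// (job, subtask) range instead of per (job, physical table).
type subTaskReorgInfo struct {
	StartHandle int64 // first row handle of the range still to backfill
	EndHandle   int64 // last row handle of the range
	AddedCount  int64 // rows backfilled so far, for progress statistics
}

func main() {
	// In TiDB this would be persisted to TiKV so another instance can
	// resume the subtask; a map stands in for that store here.
	progress := map[subTaskReorgKey]subTaskReorgInfo{
		{JobID: 7, SubTaskID: 1}: {StartHandle: 0, EndHandle: 10000, AddedCount: 2500},
		{JobID: 7, SubTaskID: 2}: {StartHandle: 10000, EndHandle: 20000, AddedCount: 0},
	}
	for k, v := range progress {
		fmt.Printf("job %d subtask %d: backfilled %d rows in [%d, %d)\n",
			k.JobID, k.SubTaskID, v.AddedCount, v.StartHandle, v.EndHandle)
	}
}
```

With this layout, resuming a failed or reassigned subtask only needs its own (StartHandle, EndHandle, AddedCount) record, so recovery happens at the SubTask-Range level rather than per job.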
