br: pipeline backup schemas #43003

Merged
merged 8 commits into from Apr 17, 2023

@Leavrth Leavrth commented Apr 12, 2023

What problem does this PR solve?

Issue Number: close #43002

Problem Summary:
When backing up a large cluster, BR loads a large amount of table info into memory, which leads to OOM.

What is changed and how it works?

Pipeline the backup of schemas instead of loading all table info into memory first.
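A minimal sketch of the pipelining idea, not the PR's actual code: instead of collecting every table info in a slice before backing up, a producer streams entries through a bounded channel and a consumer handles each one as it arrives, so only a bounded number of entries wait in the channel at once. All names here (`tableInfo`, `produce`, `consume`) are hypothetical.

```go
package main

import "fmt"

// tableInfo is a hypothetical stand-in for the table metadata that BR loads.
type tableInfo struct {
	ID   int64
	Name string
}

// produce streams n table infos through a bounded channel instead of
// returning them all in one slice. The buffer size caps how many entries
// can wait in the channel at once.
func produce(n, buffer int) <-chan tableInfo {
	ch := make(chan tableInfo, buffer)
	go func() {
		defer close(ch)
		for i := 0; i < n; i++ {
			ch <- tableInfo{ID: int64(i), Name: fmt.Sprintf("t%d", i)}
		}
	}()
	return ch
}

// consume handles each table info as it arrives and returns how many it saw.
func consume(ch <-chan tableInfo, handle func(tableInfo)) int {
	count := 0
	for info := range ch {
		handle(info)
		count++
	}
	return count
}

func main() {
	total := consume(produce(1000, 16), func(tableInfo) {
		// A real implementation would write the schema to the backup meta here.
	})
	fmt.Println("processed", total, "tables")
}
```

The producer blocks once the buffer is full, which is what keeps memory bounded regardless of how many tables the cluster has.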

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Signed-off-by: Leavrth <jianjun.liao@outlook.com>
Signed-off-by: Leavrth <jianjun.liao@outlook.com>
ti-chi-bot commented Apr 12, 2023

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • BornChanger
  • YuJuncen

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note-none do-not-merge/needs-triage-completed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Apr 12, 2023
Comment on lines 413 to 422
ranges, schemas, policies, err := BuildBackupRangeAndSchema(storage, tableFilter, backupTS, isFullBackup, true)
if err != nil {
return nil, nil, nil, errors.Trace(err)
}
// Add keyspace prefix to BackupRequest
for i := range ranges {
start, end := ranges[i].StartKey, ranges[i].EndKey
ranges[i].StartKey, ranges[i].EndKey = storage.GetCodec().EncodeRange(start, end)
}
return ranges, schemas, policies, err
Contributor Author

Note: add the keyspace prefix to the ranges only when they are obtained from KV storage, not when they come from checkpoint data.
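The note above can be sketched as follows. This is only an illustration under assumptions: `keyRange`, `encodeRange`, `prepareRanges`, and the `ks1/` prefix are hypothetical stand-ins, and the sketch assumes that ranges restored from a checkpoint must not be encoded a second time.

```go
package main

import (
	"bytes"
	"fmt"
)

// keyRange is a hypothetical stand-in for rtree.Range.
type keyRange struct {
	StartKey, EndKey []byte
}

// prefix is an illustrative keyspace prefix, not the real codec's encoding.
var prefix = []byte("ks1/")

// encodeRange plays the role of a codec's EncodeRange by prepending the prefix.
func encodeRange(start, end []byte) ([]byte, []byte) {
	return append(append([]byte{}, prefix...), start...),
		append(append([]byte{}, prefix...), end...)
}

// prepareRanges encodes ranges only when they were freshly built from storage.
// Ranges loaded from a checkpoint are returned unchanged.
func prepareRanges(ranges []keyRange, fromCheckpoint bool) []keyRange {
	if fromCheckpoint {
		return ranges
	}
	out := make([]keyRange, len(ranges))
	for i, r := range ranges {
		out[i].StartKey, out[i].EndKey = encodeRange(r.StartKey, r.EndKey)
	}
	return out
}

func main() {
	raw := []keyRange{{StartKey: []byte("a"), EndKey: []byte("b")}}
	fresh := prepareRanges(raw, false)
	resumed := prepareRanges(fresh, true) // treated as checkpoint data, left untouched
	fmt.Println(bytes.Equal(fresh[0].StartKey, resumed[0].StartKey))
}
```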

@@ -498,7 +507,7 @@ func BuildBackupRangeAndSchema(
  backupTS uint64,
  isFullBackup bool,
  buildRange bool,
- ) ([]rtree.Range, *Schemas, []*backuppb.PlacementPolicy, error) {
+ ) ([]rtree.Range, *SchemasV2, []*backuppb.PlacementPolicy, error) {
Contributor Author

Note: this function only records the size of the schemas, which is used for progress reporting.

if err != nil {
return nil, nil, nil, errors.Trace(err)
}

if len(tables) == 0 {
Contributor Author

Note: the original code may have a small bug: when len(tables) > 0 but every table is skipped by the table filter, the dbInfo is not added to schemas.
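The case described in this note can be illustrated with a small sketch. It is not the PR's code: `entry` and `collect` are hypothetical, and the `hasTable` flag only mirrors the name mentioned later in this thread. The point is that a database-only entry is decided by whether any table survived the filter, not by `len(tables)`.

```go
package main

import "fmt"

// entry is a hypothetical (db, table) pair. An empty Table means the entry
// records the database only.
type entry struct {
	DB    string
	Table string
}

// collect returns the entries to back up for one database. A database whose
// tables are all filtered out still gets a database-only entry.
func collect(db string, tables []string, keep func(string) bool) []entry {
	var out []entry
	hasTable := false
	for _, t := range tables {
		if !keep(t) {
			continue
		}
		hasTable = true
		out = append(out, entry{DB: db, Table: t})
	}
	// Checking len(tables) == 0 here instead would skip the database when
	// tables exist but none passed the filter.
	if !hasTable {
		out = append(out, entry{DB: db})
	}
	return out
}

func main() {
	rejectAll := func(string) bool { return false }
	fmt.Println(len(collect("test", []string{"t1", "t2"}, rejectAll)))
}
```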

Contributor

@YuJuncen YuJuncen left a comment

Generally LGTM. Is it possible to remove the old Schemas, BTW?

br/pkg/backup/schema.go (outdated, resolved)
@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Apr 14, 2023
Signed-off-by: Leavrth <jianjun.liao@outlook.com>
@ti-chi-bot ti-chi-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 14, 2023
return nil
}

schemasNum += 1
Contributor

Why does schemasNum increase at the same frequency as tableNum?

Contributor Author

@Leavrth Leavrth Apr 15, 2023

schemasNum records the total number of (dbInfo, tableInfo) or (dbInfo, nil) pairs (equivalent to schemas.Len()).
tableNum records whether the dbInfo has any tableInfos (equivalent to len(tableInfos)).

tableNum has now been changed to var hasTable bool.

  err = schemas.BackupSchemas(ctx, metaWriter, nil, s.mgr.GetStorage(), nil,
- s.cfg.StartTS, schemasConcurrency, 0, true, nil)
+ s.cfg.StartTS, backup.DefaultSchemaConcurrency, 0, true, nil)
Contributor

Concurrency is different now. Is it intentional?

Contributor Author

@Leavrth Leavrth Apr 15, 2023

That's because the new schemas would need to iterate over all tables to get the size, which would be repeated work in this situation. Therefore, we don't get the size of the schemas here.

Besides, the worker pool in schemas.BackupSchemas only pushes worker structs into a channel; a new goroutine is created only when ApplyOnErrorGroup is called. So using backup.DefaultSchemaConcurrency directly is roughly equivalent to mathutil.Min(backup.DefaultSchemaConcurrency, schemas.Len()).

If schemas.Len() >= backup.DefaultSchemaConcurrency, the result is the same as using backup.DefaultSchemaConcurrency.
If schemas.Len() < backup.DefaultSchemaConcurrency, the new version creates schemas.Len() goroutines in total, the same as the old version. The extra backup.DefaultSchemaConcurrency - schemas.Len() worker structs added to the channel have no effect.
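The argument above can be checked with a simplified, hypothetical token-based pool. This is a sketch under assumptions, not BR's actual worker pool: `pool`, `newPool`, `apply`, and `run` are invented for illustration. Creating the pool only fills a channel with tokens, and a goroutine is started only when a task is applied, so an oversized limit does not start extra goroutines.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// pool is a hypothetical token-based worker pool: the channel holds idle
// worker tokens, and a goroutine is started only when a task is applied.
type pool struct {
	tokens  chan struct{}
	wg      sync.WaitGroup
	started int64 // total goroutines started, updated atomically
}

// newPool fills the channel with limit tokens. No goroutine is created here.
func newPool(limit int) *pool {
	p := &pool{tokens: make(chan struct{}, limit)}
	for i := 0; i < limit; i++ {
		p.tokens <- struct{}{}
	}
	return p
}

// apply blocks until a token is free, then runs task in a new goroutine and
// returns the token when the task finishes.
func (p *pool) apply(task func()) {
	<-p.tokens
	p.wg.Add(1)
	atomic.AddInt64(&p.started, 1)
	go func() {
		defer p.wg.Done()
		defer func() { p.tokens <- struct{}{} }()
		task()
	}()
}

// run applies n no-op tasks and returns how many goroutines were started.
func run(limit, n int) int64 {
	p := newPool(limit)
	for i := 0; i < n; i++ {
		p.apply(func() {})
	}
	p.wg.Wait()
	return atomic.LoadInt64(&p.started)
}

func main() {
	fmt.Println(run(64, 3)) // oversized limit
	fmt.Println(run(3, 3))  // limit equal to the task count
}
```

In this sketch both calls start the same number of goroutines, one per applied task; the unused tokens in the oversized pool simply stay in the channel.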

@ti-chi-bot ti-chi-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 15, 2023
@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Apr 17, 2023
Leavrth commented Apr 17, 2023

/merge

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: e0e19e5

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Apr 17, 2023
Leavrth commented Apr 17, 2023

/retest

Leavrth commented Apr 17, 2023

/retest

Signed-off-by: Leavrth <jianjun.liao@outlook.com>
@ti-chi-bot ti-chi-bot removed the status/can-merge Indicates a PR has been approved by a committer. label Apr 17, 2023
Leavrth commented Apr 17, 2023

/retest

Leavrth commented Apr 17, 2023

/merge

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: 373705f

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Apr 17, 2023
Leavrth commented Apr 17, 2023

/retest

Leavrth commented Apr 17, 2023

/retest

@ti-chi-bot ti-chi-bot merged commit 9cf0ed8 into pingcap:master Apr 17, 2023
11 checks passed
Labels
release-note-none size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Development

Successfully merging this pull request may close these issues.

backup a large cluster lead to oom
4 participants