
[IMPROVEMENT] Faster resync time for fresh replica rebuilding #4092

Closed
innobead opened this issue Jun 9, 2022 · 10 comments
Assignees
Labels
area/performance System, volume performance area/v1-data-engine v1 data engine (iSCSI tgt) highlight Important feature/issue to highlight kind/improvement Request for improvement of existing function priority/0 Must be fixed in this release (managed by PO)
Milestone

Comments

@innobead
Member

innobead commented Jun 9, 2022

Is your improvement request related to a feature? Please describe

Feedback from users.

We recently faced a need to move replicas to new nodes (in order to do maintenance on existing nodes).

Replicas that had low (or no) incoming traffic (like Grafana) were synced quickly, but a replica of the Prometheus PVC (250 GiB) took more than 30 hours to resync fully, because the PVC itself was actively being used by Prometheus.

It would be great if the resync speed were faster.

Describe the solution you'd like

Further investigation into replica rebuilding performance is needed. For example, do we sync from the source replica in parallel, use async I/O to write to the destination replica, or retrieve data from multiple source replicas?
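
To make the multi-source idea concrete, here is a minimal, hypothetical Go sketch: chunks are distributed round-robin across the healthy source replicas and fetched concurrently. The fetchChunk helper, the replica addresses, and the chunk layout are illustrative assumptions only, not Longhorn's actual rebuilding API.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchChunk stands in for an RPC/HTTP read of one chunk from a source
// replica; it is NOT Longhorn's real API.
func fetchChunk(replicaAddr string, offset, size int64) ([]byte, error) {
	// ...issue the read against the source replica here...
	return make([]byte, size), nil
}

// rebuildParallel distributes chunks round-robin across the healthy source
// replicas and fetches them concurrently, writing each chunk to the
// destination via the caller-supplied write callback.
func rebuildParallel(sources []string, volumeSize, chunkSize int64,
	write func(offset int64, data []byte) error) error {

	var wg sync.WaitGroup
	errCh := make(chan error, 1)
	sem := make(chan struct{}, 8) // cap the number of in-flight requests

	for off, i := int64(0), 0; off < volumeSize; off, i = off+chunkSize, i+1 {
		size := chunkSize
		if off+size > volumeSize {
			size = volumeSize - off
		}
		src := sources[i%len(sources)] // round-robin over source replicas
		wg.Add(1)
		sem <- struct{}{}
		go func(src string, off, size int64) {
			defer wg.Done()
			defer func() { <-sem }()
			data, err := fetchChunk(src, off, size)
			if err == nil {
				err = write(off, data)
			}
			if err != nil {
				select {
				case errCh <- err: // keep the first error
				default:
				}
			}
		}(src, off, size)
	}
	wg.Wait()

	select {
	case err := <-errCh:
		return err
	default:
		return nil
	}
}

func main() {
	sources := []string{"replica-a:9502", "replica-b:9502"} // illustrative addresses
	err := rebuildParallel(sources, 64<<20, 2<<20,
		func(offset int64, data []byte) error { return nil /* write to the destination replica */ })
	fmt.Println("rebuild finished, err =", err)
}
```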

Describe alternatives you've considered

N/A

Additional context

N/A

@innobead innobead added area/v1-data-engine v1 data engine (iSCSI tgt) area/performance System, volume performance kind/improvement Request for improvement of existing function labels Jun 9, 2022
@innobead innobead added this to the Backlog milestone Jun 9, 2022
@innobead innobead added the investigation-needed Need to identify the case before estimating and starting the development label Jun 9, 2022
@shuo-wu
Contributor

shuo-wu commented Jun 9, 2022

Do you mean that, when the existing data size is identical, the speed would be much slower if there is new incoming data during the rebuilding?

@innobead
Member Author

innobead commented Jun 9, 2022

Do you mean that, when the existing data size is identical, the speed would be much slower if there is new incoming data during the rebuilding?

I think yes in general, but that's something we need to investigate further as part of the replica rebuilding improvement.

@innobead innobead modified the milestones: Backlog, v1.4.0 Jun 9, 2022
@voarsh2

voarsh2 commented Jun 12, 2022

I think the entire way replicas work is a bit flawed in that it doesn't reuse any data when rebuilding.

@innobead innobead added the priority/2 Nice to fix in this release (managed by PO) label Jul 6, 2022
@derekbit
Member

Does the resync here mean that the replica runs rebuilding and becomes running on another node?
cc @innobead

@innobead
Member Author

Does the resync here mean that the replica runs rebuilding and becomes running on another node? cc @innobead

This is the term from the original request, but it's basically just the time it takes to sync data from a healthy replica. So, yes.

@innobead innobead added priority/1 Highly recommended to fix in this release (managed by PO) and removed priority/2 Nice to fix in this release (managed by PO) labels Oct 21, 2022
@innobead innobead modified the milestones: v1.4.0, v1.5.0 Nov 18, 2022
@innobead innobead added priority/0 Must be fixed in this release (managed by PO) and removed priority/1 Highly recommended to fix in this release (managed by PO) labels Dec 9, 2022
@innobead
Member Author

innobead commented Dec 9, 2022

I think the entire way replicas work is a bit flawed in that it doesn't reuse any data when rebuilding.

This is not quite right. In fact, there is data reuse in current versions, but the mechanism is not efficient enough and will be improved in the upcoming 1.4.0.

Simply put, there are two types of replication.

@derekbit
Member

derekbit commented Dec 19, 2022

Speed up rebuilding a newly created replica on a node by:

  1. Avoiding unnecessary local data checksum calculation
    The remote disk file is newly created, so the checksum comparison is unnecessary.
  2. Increasing the chunk size
    The sync chunk size is 32 KB, which is inefficient because it incurs many syscalls and HTTP calls. In addition, larger I/Os benefit from sequential access (see the sketch after this list).
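
To make the two ideas concrete, here is a minimal Go sketch assuming a hypothetical syncChunk helper and a destIsNew flag; it is not the actual longhorn-engine/sparse-tools implementation. When the destination file was just created, hashing both sides to detect an already-identical chunk is pointless and is skipped, and the chunk size is raised from 32 KB to 2 MiB to cut down on syscalls and HTTP round trips.

```go
package rebuild

import (
	"bytes"
	"crypto/sha512"
	"io"
	"os"
)

// ChunkSize: 2 MiB instead of 32 KB means far fewer syscalls/HTTP calls
// per gigabyte, and the larger sequential I/Os are friendlier to the disk.
const ChunkSize = 2 << 20

// syncChunk copies one chunk from src to dst. When destIsNew is true the
// destination file was just created (it holds no data yet), so the checksum
// comparison that would normally let identical chunks be skipped is itself
// skipped.
func syncChunk(src, dst *os.File, offset int64, size int, destIsNew bool) error {
	buf := make([]byte, size)
	if _, err := src.ReadAt(buf, offset); err != nil && err != io.EOF {
		return err
	}
	if !destIsNew {
		// Only an existing destination can already contain identical data,
		// so only then is the hash comparison worth its cost.
		old := make([]byte, size)
		if _, err := dst.ReadAt(old, offset); err == nil {
			s, d := sha512.Sum512(buf), sha512.Sum512(old)
			if bytes.Equal(s[:], d[:]) {
				return nil // chunk already identical, skip the write
			}
		}
	}
	_, err := dst.WriteAt(buf, offset)
	return err
}
```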

Summary
After applying the proposed methods, the overall replica rebuilding performance improves by 60% in the experiment.

Experiment Setup

  • Cloud vendor: Equinix
  • Host
    • CPU: Intel(R) Xeon(R) E-2378G CPU @ 2.80GHz
    • RAM: 64 GiB
    • Disk: Micron_5300_MTFD
  • Kubernetes: v1.23.6+rke2r2 (3 nodes)
  • Longhorn
    • Volume: 50 GiB volume containing 20 GiB of random data

Experiment Steps

After deleting one of the three replicas, rebuilding is triggered. Measure the rebuilding time.

Experiment Result

| Longhorn Version | Rebuilding Time (seconds) | Improvement (%) |
| --- | --- | --- |
| v1.4.0-rc1 (without any optimization) | 124.0 | - |
| master-head (avoid unnecessary local data checksumming) | 102.0 | 18% |
| master-head (avoid unnecessary local data checksumming + 512 KiB sync chunk size) | 66.1 | 47% |
| master-head (avoid unnecessary local data checksumming + 2 MiB sync chunk size) | 51.0 | 60% |

Note
Reading the 20 GiB disk file with dd took 50 seconds (dd if=${disk} of=/dev/null bs=2M count=10000), so the 51-second rebuild is close to the disk file's raw sequential read speed.

@longhorn-io-github-bot

longhorn-io-github-bot commented Dec 19, 2022

Pre Ready-For-Testing Checklist

  • Where is the reproduce steps/test steps documented?
    The reproduce steps/test steps are at:

  • Does the PR include the explanation for the fix or the feature?

  • Has the backend code been merged (Manager, Engine, Instance Manager, BackupStore etc.) (including backport-needed/*)?
    The PR is at

longhorn/sparse-tools#83
longhorn/longhorn-engine#800
longhorn/backing-image-manager#80

  • Which areas/issues this PR might have potential impacts on?
    Area: replica rebuilding performance
    Issues

@innobead innobead changed the title [IMPROVEMENT] Faster resync time for replicas that have new data incoming [IMPROVEMENT] Faster resync time for fresh replica rebuilding Dec 21, 2022
@yangchiu
Member

Verified and passed on v1.4.x-head (longhorn-engine 3c3f314, backing-image-manager f6835db).

Comparing v1.4.x-head with v1.4.0-rc1, the expected improvement in replica rebuilding time can be observed.

Test Environment:

Equinix Metal c3.medium.x86
CPU 1 x AMD EPYC 7402P 24 cores @ 2.8 GHz
Storage 2 x 480 GB SSD
Memory 64 GB
Network 2 x 10 Gbps

Test Steps:

(1) Create a 50 GiB Longhorn volume and write 20 GiB of random data:
dd if=/dev/urandom of=/dev/longhorn/test-1 bs=2M count=10000
(2) Repeatedly delete a random replica and measure the rebuilding time

Test Result:

For v1.4.0-rc1, the rebuilding time is ~ 65s
For v1.4.x-head, the rebuilding time is ~ 25s
