
[IMPROVEMENT] Faster resync time for fresh replica rebuilding #4092

Closed
innobead opened this issue Jun 9, 2022 · 10 comments
Assignees
Labels
area/performance System, volume performance area/v1-data-engine v1 data engine (iSCSI tgt) highlight Important feature/issue to highlight kind/improvement Request for improvement of existing function priority/0 Must be fixed in this release (managed by PO)
Milestone

Comments

@innobead
Member

innobead commented Jun 9, 2022

Is your improvement request related to a feature? Please describe

Feedback from users.

We recently faced a need to move replicas to new nodes (in order to do maintenance on existing nodes).

Replicas that had low (or no) incoming traffic (like Grafana) were synced quickly, but a replica of the Prometheus PVC (250 GiB) took more than 30 hours to resync fully, because the PVC itself was actively being used by Prometheus.

It would be great if the resync speed were faster.

Describe the solution you'd like

Further investigation into replica rebuilding performance is needed. For example, do we sync from the source replica in parallel, use async I/O to write to the destination replica, or retrieve data from multiple source replicas?
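
To make the multi-source idea concrete, here is a minimal, hypothetical Go sketch: chunks are distributed round-robin across the healthy source replicas and fetched concurrently. The fetchChunk helper, the replica addresses, and the chunk layout are illustrative assumptions only, not Longhorn's actual rebuilding API.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchChunk stands in for an RPC/HTTP read of one chunk from a source
// replica; it is NOT Longhorn's real API.
func fetchChunk(replicaAddr string, offset, size int64) ([]byte, error) {
	// ...issue the read against the source replica here...
	return make([]byte, size), nil
}

// rebuildParallel distributes chunks round-robin across the healthy source
// replicas and fetches them concurrently, writing each chunk to the
// destination via the caller-supplied write callback.
func rebuildParallel(sources []string, volumeSize, chunkSize int64,
	write func(offset int64, data []byte) error) error {

	var wg sync.WaitGroup
	errCh := make(chan error, 1)
	sem := make(chan struct{}, 8) // cap the number of in-flight requests

	for off, i := int64(0), 0; off < volumeSize; off, i = off+chunkSize, i+1 {
		size := chunkSize
		if off+size > volumeSize {
			size = volumeSize - off
		}
		src := sources[i%len(sources)] // round-robin over source replicas
		wg.Add(1)
		sem <- struct{}{}
		go func(src string, off, size int64) {
			defer wg.Done()
			defer func() { <-sem }()
			data, err := fetchChunk(src, off, size)
			if err == nil {
				err = write(off, data)
			}
			if err != nil {
				select {
				case errCh <- err: // keep the first error
				default:
				}
			}
		}(src, off, size)
	}
	wg.Wait()

	select {
	case err := <-errCh:
		return err
	default:
		return nil
	}
}

func main() {
	sources := []string{"replica-a:9502", "replica-b:9502"} // illustrative addresses
	err := rebuildParallel(sources, 64<<20, 2<<20,
		func(offset int64, data []byte) error { return nil /* write to the destination replica */ })
	fmt.Println("rebuild finished, err =", err)
}
```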

Describe alternatives you've considered

N/A

Additional context

N/A

@innobead innobead added area/v1-data-engine v1 data engine (iSCSI tgt) area/performance System, volume performance kind/improvement Request for improvement of existing function labels Jun 9, 2022
@innobead innobead added this to the Backlog milestone Jun 9, 2022
@innobead innobead added the investigation-needed Need to identify the case before estimating and starting the development label Jun 9, 2022
@shuo-wu
Contributor

shuo-wu commented Jun 9, 2022

Do you mean that, when the existing data size is identical, the speed would be much slower if there is new incoming data during the rebuilding?

@innobead
Member Author

innobead commented Jun 9, 2022

Do you mean that, when the existing data size is identical, the speed would be much slower if there is new incoming data during the rebuilding?

I think yes in general, but that's something we need to investigate further as part of the replica rebuilding improvement.

@innobead innobead modified the milestones: Backlog, v1.4.0 Jun 9, 2022
@voarsh2

voarsh2 commented Jun 12, 2022

I think the entire way replicas work is a bit flawed in that it doesn't reuse any data when rebuilding.

@innobead innobead added the priority/2 Nice to fix in this release (managed by PO) label Jul 6, 2022
@derekbit
Member

Does the resync here mean that the replica runs rebuilding and becomes running on another node?
cc @innobead

@innobead
Member Author

Does the resync here mean that the replica runs rebuilding and becomes running on another node? cc @innobead

This is the term from the original request, but it's basically just the time it takes to sync data from a healthy replica. So, yes.

@innobead innobead added priority/1 Highly recommended to fix in this release (managed by PO) and removed priority/2 Nice to fix in this release (managed by PO) labels Oct 21, 2022
@innobead innobead modified the milestones: v1.4.0, v1.5.0 Nov 18, 2022
@innobead innobead added priority/0 Must be fixed in this release (managed by PO) and removed priority/1 Highly recommended to fix in this release (managed by PO) labels Dec 9, 2022
@innobead
Member Author

innobead commented Dec 9, 2022

I think the entire way replicas work is a bit flawed in that it doesn't reuse any data when rebuilding.

This is not quite right. In fact, there is data reuse in current versions, but the mechanism is not efficient enough and will be improved in the upcoming 1.4.0.

Simply put, there are two types of replication.

@derekbit
Member

derekbit commented Dec 19, 2022

Speed up rebuilding a newly created replica on a node by:

  1. Avoiding unnecessary local data checksum calculation
    The remote disk file is newly created, so the checksum comparison is unnecessary.
  2. Increasing the chunk size
    The sync chunk size is 32 KB, which is inefficient because it incurs many syscalls and HTTP calls. In addition, larger I/Os benefit from sequential access (see the sketch after this list).
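
To make the two ideas concrete, here is a minimal Go sketch assuming a hypothetical syncChunk helper and a destIsNew flag; it is not the actual longhorn-engine/sparse-tools implementation. When the destination file was just created, hashing both sides to detect an already-identical chunk is pointless and is skipped, and the chunk size is raised from 32 KB to 2 MiB to cut down on syscalls and HTTP round trips.

```go
package rebuild

import (
	"bytes"
	"crypto/sha512"
	"io"
	"os"
)

// ChunkSize: 2 MiB instead of 32 KB means far fewer syscalls/HTTP calls
// per gigabyte, and the larger sequential I/Os are friendlier to the disk.
const ChunkSize = 2 << 20

// syncChunk copies one chunk from src to dst. When destIsNew is true the
// destination file was just created (it holds no data yet), so the checksum
// comparison that would normally let identical chunks be skipped is itself
// skipped.
func syncChunk(src, dst *os.File, offset int64, size int, destIsNew bool) error {
	buf := make([]byte, size)
	if _, err := src.ReadAt(buf, offset); err != nil && err != io.EOF {
		return err
	}
	if !destIsNew {
		// Only an existing destination can already contain identical data,
		// so only then is the hash comparison worth its cost.
		old := make([]byte, size)
		if _, err := dst.ReadAt(old, offset); err == nil {
			s, d := sha512.Sum512(buf), sha512.Sum512(old)
			if bytes.Equal(s[:], d[:]) {
				return nil // chunk already identical, skip the write
			}
		}
	}
	_, err := dst.WriteAt(buf, offset)
	return err
}
```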

Summary
After applying the proposed methods, the overall replica rebuilding performance improves by 60% in the experiment.

Experiment Setup

  • Cloud vendor: Equinix
  • Host
    • CPU: Intel(R) Xeon(R) E-2378G CPU @ 2.80GHz
    • RAM: 64 GiB
    • Disk: Micron_5300_MTFD
  • Kubernetes: v1.23.6+rke2r2 (3 nodes)
  • Longhorn
    • Volume: 50 GiB volume containing 20 GiB of random data

Experiment Steps

After deleting one of the three replicas, rebuilding is triggered. Measure the rebuilding time.

Experiment Result

| Longhorn Version | Rebuilding Time (seconds) | Improvement (%) |
| --- | --- | --- |
| v1.4.0-rc1 (without any optimization) | 124.0 | - |
| master-head (avoid unnecessary local data checksumming) | 102.0 | 18% |
| master-head (avoid unnecessary local data checksumming + 512 KiB sync chunk size) | 66.1 | 47% |
| master-head (avoid unnecessary local data checksumming + 2 MiB sync chunk size) | 51.0 | 60% |

Note
Reading the 20 GiB disk file with dd took 50 seconds (dd if=${disk} of=/dev/null bs=2M count=10000), so the 51-second rebuild is close to the disk file's raw sequential read speed.

@longhorn-io-github-bot

longhorn-io-github-bot commented Dec 19, 2022

Pre Ready-For-Testing Checklist

  • Where is the reproduce steps/test steps documented?
    The reproduce steps/test steps are at:

  • Does the PR include the explanation for the fix or the feature?

  • Has the backend code been merged (Manager, Engine, Instance Manager, BackupStore etc.) (including backport-needed/*)?
    The PR is at

longhorn/sparse-tools#83
longhorn/longhorn-engine#800
longhorn/backing-image-manager#80

  • Which areas/issues this PR might have potential impacts on?
    Area: replica rebuilding performance
    Issues

@innobead innobead changed the title [IMPROVEMENT] Faster resync time for replicas that have new data incoming [IMPROVEMENT] Faster resync time for fresh replica rebuilding Dec 21, 2022
@yangchiu
Member

Verified and passed on v1.4.x-head (longhorn-engine 3c3f314, backing-image-manager f6835db).

Comparing v1.4.x-head with v1.4.0-rc1, the expected improvement in replica rebuilding time can be observed.

Test Environment:

Equinix Metal c3.medium.x86
CPU 1 x AMD EPYC 7402P 24 cores @ 2.8 GHz
Storage 2 x 480 GB SSD
Memory 64 GB
Network 2 x 10 Gbps

Test Steps:

(1) Create a 50 GiB Longhorn volume and write 20 GiB of random data:
dd if=/dev/urandom of=/dev/longhorn/test-1 bs=2M count=10000
(2) Repeatedly delete a random replica and measure the rebuilding time

Test Result:

For v1.4.0-rc1, the rebuilding time is ~ 65s
For v1.4.x-head, the rebuilding time is ~ 25s
