[IMPROVEMENT] Faster resync time for fresh replica rebuilding #4092
Comments
Do you mean that, when the existing data size is identical, the speed would be much slower if there is new incoming data during the rebuilding?
I think yes in general, but that's something we need to investigate further as part of the replica rebuilding improvement.
I think the entire way replicas work is a bit flawed in that it doesn't reuse any data when rebuilding.
This is the term from the original request, but it is basically just the time to sync data from a healthy replica. So, yes.
This is not quite right: in fact, there is data reuse in current versions, but the approach is not efficient enough and will be improved in the upcoming v1.4.0. Simply put, there are two types of replication.
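The data-reuse idea mentioned above can be sketched roughly as a checksum-based delta sync: compare per-block checksums between the healthy source and the rebuilding destination, and only transfer the blocks that differ. This is a minimal illustration of the concept, not Longhorn's actual engine code; `rebuild_with_reuse` and the tiny block size are assumptions for the example.

```python
import hashlib

BLOCK_SIZE = 4  # tiny blocks for illustration; a real engine would use large chunks


def rebuild_with_reuse(source: bytes, dest: bytearray) -> int:
    """Copy only blocks whose checksums differ; return bytes transferred."""
    transferred = 0
    for off in range(0, len(source), BLOCK_SIZE):
        src_block = source[off:off + BLOCK_SIZE]
        dst_block = bytes(dest[off:off + BLOCK_SIZE])
        # If the checksums match, the destination already holds this data
        # and nothing needs to go over the network.
        if hashlib.sha256(src_block).digest() != hashlib.sha256(dst_block).digest():
            dest[off:off + BLOCK_SIZE] = src_block  # transfer the stale block
            transferred += len(src_block)
    return transferred


source = b"AAAABBBBCCCCDDDD"
dest = bytearray(b"AAAAXXXXCCCCYYYY")  # two of the four blocks are stale
moved = rebuild_with_reuse(source, dest)
print(moved)                   # 8 — only the 2 mismatched blocks were re-sent
print(bytes(dest) == source)   # True
```

The efficiency question the thread raises is where checksum computation happens and how coarse the blocks are; a full copy is the degenerate case where every block is treated as mismatched.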
Speed up rebuilding a newly created replica on a node by
Summary
Experiment Setup
Experiment Steps: After deleting one of the three replicas, rebuilding is then triggered. Check the rebuilding time.
Experiment Result
Note
Pre Ready-For-Testing Checklist
longhorn/sparse-tools#83
Verified passed on v1.4.x-head (longhorn-engine 3c3f314, backing-image-manager f6835db). Compared v1.4.x-head with v1.4.0-rc1; the improvement in replica rebuilding time can be seen as expected.
Test Environment: Equinix Metal c3.medium.x86
Test Steps: (1) Create a 50G Longhorn volume and write 20G of random data.
Test Result: For v1.4.0-rc1, the rebuilding time is ~65s.
Is your improvement request related to a feature? Please describe
Feedback from users.
We recently faced a need to move replicas to new nodes (in order to do maintenance on existing nodes).
Replicas that had low (or no) incoming traffic (like Grafana) were synced quickly, but a replica of a Prometheus PVC (250Gi) took more than 30h to resync fully, because the PVC itself was being actively used by Prometheus.
It would be great if resync were faster.
Describe the solution you'd like
Further investigation into replica rebuilding performance is needed. For example: should we sync in parallel from the source replica, use async I/O to write to the destination replica, retrieve from multiple source replicas, etc.?
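One of the directions above, parallel sync from the source, can be sketched as splitting the volume into disjoint ranges and copying them concurrently. This is a hedged illustration of the idea only; `parallel_sync`, `copy_range`, and the chunk size are hypothetical names, not Longhorn APIs.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK = 4  # bytes per worker range; a real rebuild would use megabyte-scale ranges


def copy_range(source: bytes, dest: bytearray, off: int, length: int) -> None:
    # Each worker owns a disjoint byte range, so no locking is needed.
    dest[off:off + length] = source[off:off + length]


def parallel_sync(source: bytes, dest: bytearray, workers: int = 4) -> None:
    """Copy source into dest using several concurrent range copies."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for off in range(0, len(source), CHUNK):
            pool.submit(copy_range, source, dest, off, min(CHUNK, len(source) - off))
    # Exiting the `with` block waits for all submitted copies to finish.


source = b"0123456789abcdef"
dest = bytearray(len(source))
parallel_sync(source, dest)
print(bytes(dest) == source)  # True
```

Whether parallelism actually helps depends on where the bottleneck is (network, source disk reads, or destination writes), which is exactly what the proposed investigation would need to measure.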
Describe alternatives you've considered
N/A
Additional context
N/A