[BUG] Replica rebuilding gets triggered if network bandwidth is restricted below 80mbit #2882

Closed
khushboo-rancher opened this issue Aug 13, 2021 · 3 comments

@khushboo-rancher
Contributor

Describe the bug
Replica rebuilding is triggered when the network bandwidth between the engine and its replicas is restricted below 80mbit.

To Reproduce
Steps to reproduce the behavior:

  1. Create a volume with 3 replicas.
  2. On the engine, run the commands below to restrict the network bandwidth:
       tc qdisc del dev eth0 root
       tc qdisc add dev eth0 root tbf rate 50mbit latency 0.1ms burst 50mbit
  3. Attach the volume to a node and run the job below to write data:
       fio -filename=/dev/longhorn/vol -name=write-test -ioengine=libaio -direct=1 -iodepth=32 -rw=write -numjobs=2 -runtime=60 -group_reporting -bs=1m
  4. Observe that replicas are rebuilt (either all of them or only one).
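
To repeat the bandwidth restriction at different rates (for example 50mbit here, or 10mbit during verification), the tc commands from the steps above could be wrapped in a small helper. This is only a sketch: the names `IFACE`, `DRY_RUN`, `run`, `limit_bandwidth`, and `clear_limit` are hypothetical and not part of Longhorn or of this report. It assumes the interface is `eth0`, as in the report, and defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch only: IFACE, DRY_RUN, run, limit_bandwidth and clear_limit are
# illustrative names, not part of Longhorn or the original report.
IFACE="${IFACE:-eth0}"     # interface to shape (eth0 in the report)
DRY_RUN="${DRY_RUN:-1}"    # 1 = print commands, 0 = execute them (needs root)

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Replace the root qdisc with a token bucket filter at the given rate,
# reusing the latency and burst pattern from the reproduction steps.
limit_bandwidth() {
  rate="$1"
  # The delete can fail if no custom root qdisc is installed; ignore that.
  run tc qdisc del dev "$IFACE" root || true
  run tc qdisc add dev "$IFACE" root tbf rate "$rate" latency 0.1ms burst "$rate"
}

# Remove the restriction again once the test is finished.
clear_limit() {
  run tc qdisc del dev "$IFACE" root
}
```

After sourcing the file, `limit_bandwidth 50mbit` prints the two tc commands from step 2 in the default dry-run mode. Setting `DRY_RUN=0` would execute them instead, which requires root inside the network namespace being shaped.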

Expected behavior
Replica rebuilding should not happen.

Environment:

  • Longhorn version: Longhorn v1.2.0-preview1
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Kubectl
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: RKE 1.21.3
    • Number of management nodes in the cluster: 1
    • Number of worker nodes in the cluster: 3
  • Node config
    • OS type and version: Ubuntu 20.04
    • CPU per node: 4
    • Memory per node: 8 Gi
    • Disk type(e.g. SSD/NVMe): SSD
    • Network bandwidth between the nodes: Up to 10 Gi
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): DO
  • Number of Longhorn volumes in the cluster: 1
@khushboo-rancher khushboo-rancher added kind/bug severity/3 Function working but has a major issue w/ workaround labels Aug 13, 2021
@yasker yasker added this to the v1.2.0 milestone Aug 13, 2021
@yasker yasker added the backport/1.1.3 Require to backport to 1.1.3 release branch label Aug 13, 2021
@longhorn-io-github-bot

longhorn-io-github-bot commented Aug 17, 2021

Pre Ready-For-Testing Checklist

* [ ] Is there a workaround for the issue? If so, where is it documented?
The workaround is at:

* [ ] Does the PR include the explanation for the fix or the feature?

* [ ] Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart?
The PR for the YAML change is at:
The PR for the chart change is at:

* [ ] If labeled: require/LEP Has the Longhorn Enhancement Proposal PR submitted?
The LEP PR is at

* [ ] If labeled: area/ui Has the UI issue filed or ready to be merged (including backport-needed/*)?
The UI issue/PR is at

* [ ] If labeled: require/doc Has the necessary document PR submitted or merged (including backport-needed/*)?
The documentation issue/PR is at

* [ ] If labeled: require/automation-e2e Has the end-to-end test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue (including backport-needed/*)
The automation skeleton PR is at
The automation test case PR is at
The issue of automation test case implementation is at (please create by the template)

* [ ] If labeled: require/automation-engine Has the engine integration test been merged (including backport-needed/*)?
The engine automation PR is at

* [ ] If labeled: require/manual-test-plan Has the manual test plan been documented?
The updated manual test plan is at

* [ ] If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility?
The compatibility issue is filed at

@innobead
Member

@keithalucas Next time, please move the issue to the ready-for-testing state once it is ready to verify.

@khushboo-rancher
Contributor Author

khushboo-rancher commented Aug 24, 2021

Verified with Longhorn-v1.1.3-head and Longhorn-master 08/23/2021

Validation - Pass

Validated with 10mbit bandwidth: no replica rebuilding was observed and writes worked fine.
