
[FEATURE] Local volume for distributed data workloads #3957

Closed
innobead opened this issue May 10, 2022 · 21 comments
Labels
  • area/csi: CSI related like control/node driver, sidecars
  • area/v1-data-engine: v1 data engine (iSCSI tgt)
  • component/longhorn-instance-manager: Longhorn instance manager (interface between control and data plane)
  • component/longhorn-manager: Longhorn manager (control plane)
  • highlight: Important feature/issue to highlight
  • kind/feature: Feature request, new feature
  • priority/0: Must be fixed in this release (managed by PO)
  • require/doc: Require updating the longhorn.io documentation
  • require/lep: Require adding/updating enhancement proposal

Milestone
v1.4.0

innobead (Member) commented May 10, 2022

Is your feature request related to a problem? Please describe

Longhorn is a highly available, replica-based storage system. That is good for fault tolerance, read performance, data protection, etc., but on the other hand it also incurs extra costs, such as requiring more disk space for replication.

In some cases, especially distributed data workloads (StatefulSets) like databases (e.g. Cassandra, Kafka), the applications already handle their own data replication and sharding, so we should provide a better-suited volume type for these use cases while still supporting existing volume functionality like snapshotting and backup/restore.

Describe the solution you'd like

  • Extend Data Locality with a strict (enforced) mode that requires the one replica to be local, on the same node as the workload
  • Connect to the local replica via a local socket file instead of a TCP connection (a sketch of the idea follows below)
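
The following is a minimal Go sketch of the second bullet, assuming a hypothetical replicaAddress descriptor and socket path; it is not Longhorn's actual engine code, only an illustration of preferring a Unix domain socket over TCP when the replica sits on the same node.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// replicaAddress is a hypothetical descriptor of where a replica listens.
type replicaAddress struct {
	Local      bool   // true when the replica runs on the engine's node
	SocketPath string // hypothetical path, e.g. /var/run/longhorn/replica.sock
	TCPAddr    string // e.g. 10.42.0.5:10010
}

// dialReplica prefers the local Unix domain socket and falls back to TCP.
// Skipping the TCP/IP stack for the local replica is where the latency and
// IOPS gain of the strict locality mode is expected to come from.
func dialReplica(r replicaAddress) (net.Conn, error) {
	if r.Local && r.SocketPath != "" {
		return net.DialTimeout("unix", r.SocketPath, 5*time.Second)
	}
	return net.DialTimeout("tcp", r.TCPAddr, 5*time.Second)
}

func main() {
	conn, err := dialReplica(replicaAddress{Local: true, SocketPath: "/tmp/replica.sock"})
	if err != nil {
		fmt.Println("dial failed:", err)
		return
	}
	defer conn.Close()
	fmt.Println("connected over", conn.RemoteAddr().Network())
}
```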

Describe alternatives you've considered

N/A

Additional context

#1965

innobead added the kind/feature, priority/0, component/longhorn-manager, and highlight labels on May 10, 2022
innobead added this to the v1.4.0 milestone on May 10, 2022
innobead added the require/lep label on May 10, 2022
joshimoo added the component/longhorn-instance-manager, area/csi, and area/v1-data-engine labels on May 11, 2022
derekbit (Member) commented Jun 15, 2022

According to the benchmarking results, if the volume and its single replica are on the same node, latency and IOPS improve significantly.

[benchmark result image]

In addition, TCP can be replaced with a Unix domain socket to gain more performance in this case.

derekbit (Member) commented:

Replacing the TCP connection between the engine and the single replica with a Unix domain socket:

  • IOPS and latency are improved.
  • The bandwidth numbers are saturated, so the improvement is not visible there.

[benchmark result image]

joshimoo (Contributor) commented:

@derekbit good job on the evaluation :)

innobead (Member, Author) commented:

@derekbit as we discussed, let's add this to 1.4.0.

Bessonov commented:

This is great news! May I ask what brings IO down in comparison to the baseline?

innobead (Member, Author) commented:

This is great news! May I ask what brings IO down in comparison to the baseline?

It's more about the improvement in latency. Basically, there are three primary factors:

  • Data locality
  • Data transfer over Unix socket instead of TCP stack
  • No remote replication

longhorn-io-github-bot commented Nov 24, 2022

Pre Ready-For-Testing Checklist

  • Where are the reproduce steps/test steps documented?
    The reproduce steps/test steps are at:

  • Does the PR include the explanation for the fix or the feature?

  • Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart?
    The PR for the YAML change is at:
    The PR for the chart change is at:

#4918

  • Has the backend code been merged (Manager, Engine, Instance Manager, BackupStore etc) (including backport-needed/*)?
    The PR is at

longhorn/longhorn-manager#1562
longhorn/longhorn-engine#771

  • Which areas/issues might this PR have potential impacts on?
    Area: data path, performance
    Issues

  • If labeled: require/LEP Has the Longhorn Enhancement Proposal PR submitted?
    The LEP PR is at lep: add local-volume #4928

  • If labeled: require/doc Has the necessary document PR submitted or merged (including backport-needed/*)?
    The documentation issue/PR is at

  • If labeled: require/automation-e2e Has the end-to-end test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue (including backport-needed/*)
    The automation skeleton PR is at
    The automation test case PR is at
    The issue of automation test case implementation is at (please create by the template)

Bessonov commented:

It's more about the improvement in latency. Basically, there are three primary factors:

  • Data locality
  • Data transfer over Unix socket instead of TCP stack
  • No remote replication

Thanks for your answer. Probably my question was misleading. I was asking about the difference between local-path-provisioner and Longhorn with data locality, a Unix socket, and without replication. The difference between 90,531 and 47,667, and also between 77,799 and 21,158, is still huge.

innobead (Member, Author) commented Nov 24, 2022

Thanks for your answer. Probably my question was misleading. I was asking about the difference between local-path-provisioner and Longhorn with data locality, a Unix socket, and without replication. The difference between 90,531 and 47,667, and also between 77,799 and 21,158, is still huge.

I see.

The Longhorn local volume is not meant to achieve performance similar to the local-path-provisioner you mentioned. It is still based on the existing data path, with the changes above ensuring strict data locality between the engine and the replica, which gains some IO performance compared with volumes using best-effort or disabled locality.

derekbit (Member) commented:

Performance update

[performance benchmark image]

michaelandrepearce commented:

Latency is still roughly 500% of local path.

derekbit (Member) commented Nov 24, 2022

@michaelandrepearce @Bessonov

The local volume's data path has not changed much in this improvement, in order to preserve existing functionality such as snapshotting, backup, and restore.

We will continue improving the local volume, e.g. with pass-through, to squeeze out more performance. However, the performance difference will still be significant after these improvements because of the existing data path design.

michaelandrepearce commented Nov 24, 2022

Sure, 150% or so is fine to get the overlay, but 500% on latency is a bit too much to buy; it makes it unusable for local-path use cases, where you want fast, low-latency disk access for systems that take care of replication themselves (Cassandra, Redpanda, Postgres, Chronicle stores). It's also quite a reduction in available IOPS: looking at the stats above, write drops from 97k down to 28k at best. This isn't to dismiss the work that has been done here; I think it's a great step in the right direction. It's just that, realistically, to fit local-PV use cases the performance is somewhat off at the moment. Did anything get done with SPDK? I think in the last discussions the idea was that it could help reduce some of that.

michaelandrepearce commented Nov 24, 2022

What I raised in #1965, which has been closed in favor of this issue, was, and I quote, a use case for local PVs akin to OpenEBS's local PV offerings (e.g. their lvm-localpv or hostpath-localpv) or MinIO's DirectPV, without needing to switch vendor, and while still having some unified management, e.g. UI, monitoring, backup.

"Using K8s native LocalPV's are useful as no network-based storage can keep up with baremetal in write IOPS/latency/throughput, when using NVME/Optane disks. Giving Direct I/O: Near-zero disk performance overhead"

derekbit (Member) commented Nov 25, 2022

Need to update the upgrade test image, because new options are added.
cc @longhorn/qa

chriscchien (Contributor) commented Nov 28, 2022

Hi @derekbit

When I attached a strict-local volume from one node to another node, the volume looped between attaching and detaching, and the only active button was Delete. Is this behavior expected? Thank you.

derekbit (Member) commented:

the volume looped between attaching and detaching

This is expected, because a strict-local volume must satisfy:

  1. a single replica
  2. the engine and the replica are on the same node

So, if you attach the volume to another node, the attach-detach loop happens because there is no replica on that node.
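
To make the constraint above concrete, here is a hypothetical Go sketch of the check; the field and function names are illustrative, not Longhorn's actual API. A strict-local volume has exactly one replica, and attaching it anywhere but the replica's node can never succeed.

```go
package main

import (
	"errors"
	"fmt"
)

// volume is an illustrative, simplified view of a Longhorn volume.
type volume struct {
	DataLocality     string // "disabled", "best-effort", or "strict-local"
	NumberOfReplicas int
	ReplicaNodeID    string // node hosting the single replica
}

// canAttach reports whether attaching the volume to nodeID can ever succeed.
// For a strict-local volume the engine and the replica must share a node, so
// attaching anywhere else only yields the attach/detach loop described above.
func canAttach(v volume, nodeID string) error {
	if v.DataLocality != "strict-local" {
		return nil
	}
	if v.NumberOfReplicas != 1 {
		return errors.New("strict-local volume must have exactly one replica")
	}
	if v.ReplicaNodeID != nodeID {
		return fmt.Errorf("replica is on %q, cannot attach to %q", v.ReplicaNodeID, nodeID)
	}
	return nil
}

func main() {
	v := volume{DataLocality: "strict-local", NumberOfReplicas: 1, ReplicaNodeID: "node-1"}
	fmt.Println(canAttach(v, "node-2")) // rejected: the replica lives on node-1
	fmt.Println(canAttach(v, "node-1")) // <nil>: allowed
}
```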

chriscchien (Contributor) commented:

Verified in Longhorn master 400b8c with the test steps.
Result: Pass

  1. Can successfully create a local volume with numberOfReplicas=1 and dataLocality=strict-local
  2. The webhook rejected the following cases when the volume is created or attached (a sketch of these rules follows below):
    • A local volume with dataLocality=strict-local but numberOfReplicas>1
    • Updating an attached local volume's numberOfReplicas to a value greater than one
    • Updating an attached local volume's dataLocality to disabled or best-effort
  3. The volume and the restored volume can be used by a workload and the data stays consistent (tested with a Deployment with nodeName set)
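
For reference, a small Go sketch of the admission rules exercised above, under the assumption that the webhook sees the old and new volume specs; this is illustrative only and not the actual longhorn-manager webhook code.

```go
package main

import (
	"errors"
	"fmt"
)

// volumeSpec is an illustrative, simplified volume spec.
type volumeSpec struct {
	DataLocality     string // "disabled", "best-effort", or "strict-local"
	NumberOfReplicas int
	Attached         bool
}

// validateCreate rejects a strict-local volume with more than one replica.
func validateCreate(v volumeSpec) error {
	if v.DataLocality == "strict-local" && v.NumberOfReplicas > 1 {
		return errors.New("strict-local volume must use numberOfReplicas=1")
	}
	return nil
}

// validateUpdate rejects changes that would break an attached strict-local
// volume: raising the replica count or relaxing the locality mode.
func validateUpdate(oldSpec, newSpec volumeSpec) error {
	if !oldSpec.Attached || oldSpec.DataLocality != "strict-local" {
		return nil
	}
	if newSpec.NumberOfReplicas > 1 {
		return errors.New("cannot set numberOfReplicas>1 on an attached strict-local volume")
	}
	if newSpec.DataLocality != "strict-local" {
		return errors.New("cannot change dataLocality of an attached strict-local volume")
	}
	return nil
}

func main() {
	fmt.Println(validateCreate(volumeSpec{DataLocality: "strict-local", NumberOfReplicas: 3}))
	attached := volumeSpec{DataLocality: "strict-local", NumberOfReplicas: 1, Attached: true}
	fmt.Println(validateUpdate(attached, volumeSpec{DataLocality: "best-effort", NumberOfReplicas: 1, Attached: true}))
}
```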

innobead (Member, Author) commented:

the volume looped between attaching and detaching

This is expected, because a strict-local volume must satisfy:

  1. a single replica
  2. the engine and the replica are on the same node

So, if you attach the volume to another node, the attach-detach loop happens because there is no replica on that node.

After discussing with @derekbit, let's see if we need a validation hook to avoid this unnecessary (though intended) reconciling in this situation.

innobead (Member, Author) commented Nov 30, 2022

chriscchien (Contributor) commented:

Verified the test case "Node restart/down scenario" with Pod Deletion Policy When Node is Down set to delete-both-statefulset-and-deployment-pod on Longhorn master 066dde with a strict-local volume.
Result: Pass

With Pod Deletion Policy When Node is Down set to delete-both-statefulset-and-deployment-pod, after powering off the volume-attached node for 10 minutes and then powering it back up, the Deployment pod is eventually recreated and attached to the node the local volume is attached to, and the data stays consistent.
