There are two main components of Longhorn:
- Longhorn Engine implements the data plane.
- Longhorn Manager implements the control plane.
Longhorn Engine implements the data plane of a Longhorn volume. Longhorn Engine has two working modes: engine and replica.
A replica utilizes disk space to store the volume data. Each volume can have multiple replicas, and each replica contains a full copy of the volume data.
- That also means that if only one replica survives a disaster, you can still recover the whole volume from it.
Engine connects to replicas to implement the data plane of the volume.
- Any write to the volume will be dispatched to all of the volume replicas.
- Any read operation will be executed by one of the healthy replicas.
By default, the frontend is a block device on the node where the engine is running.
- Currently, we're using iSCSI to implement the block device. A customized `tgtd` iSCSI target framework is used in combination with `iscsiadm` on the host to expose the block device to the end user.
Longhorn Engine doesn't depend on Kubernetes. Its responsibilities are to:
- Provide a highly available data plane for access to the volume data, through a block device or iSCSI target.
- Make sure that a replica disconnecting from the controller will not interrupt the data flow of the volume.
- Make sure that as long as one replica is still connected to the controller, the data flow of the volume continues.
- Rebuild replicas
- Identify the most likely healthy replica if all the replicas are dead.
- Create/delete/revert snapshots
- Create/delete/restore backups
The engine also reports the status of the volume:
- Replica information
- Snapshot information
- Backup/restore/rebuild/purge progress
- Real-time bandwidth/IOPS
Engine-related components
Before v0.6.0, each Longhorn engine or replica started as a Kubernetes Pod. Instance Manager was introduced in the v0.6.0 release to solve the per-node Pod limitation issue.
Now Instance Managers run on every node to provide the means to launch engine and replica processes. Instead of starting each engine and replica as a Pod, the Instance Manager starts them as processes.
The Instance Manager for the engines is also responsible for starting the iSCSI target daemon (`tgtd`) for the node.
`go-iscsi-helper` is a library that implements the logic of creating the iSCSI target daemon and the block device for the engine. It is able both to create iSCSI targets (using the built-in `tgtd`) and to use the iSCSI initiator on the host to connect to the target, creating the block device for the end user.
Backupstore implements the backup and restoration mechanism for Longhorn. It currently supports two protocols: NFS and S3. It also supports getting, listing, and deleting backups in the backupstore.
Longhorn depends on the filesystem's sparse file support to store the metadata of the volume data, e.g. which blocks have been written. But normal Linux commands may not preserve a file's sparseness. So we created `sparse-tools` to ensure the sparseness metadata of the files is preserved during certain file operations. It has two main functions:
- `sfold` for coalescing snapshot files, as part of the snapshot deletion process.
- `ssync` to copy snapshot files, as part of the replica rebuilding process.
Longhorn Manager's responsibilities include:
- Create Longhorn volumes according to Kubernetes' requests
- Schedule Longhorn engines across the Kubernetes cluster
- Decide where to put the data when a volume is created for the first time
- Decide where to rebuild a replica when a replica goes down
- Provide user-facing functionalities
- Recurring backup
- Multiple-disk management
- Disaster Recovery volume
- and more
- Monitor the status of attached volumes
Longhorn Manager runs on every worker node in the Kubernetes cluster, normally as a DaemonSet. Each manager is responsible for monitoring the attached volumes on the same node.
Managers are implemented using the Kubernetes controller pattern.
All Longhorn Manager data is stored in the Kubernetes API server (backed by etcd) using CRDs. Each CRD has its own controller.
Each CRD resource has two major fields: `spec` and `status`. Controllers inside Longhorn Manager constantly try to reconcile the differences between the `spec` (the desired state) and the `status` (the observed state).
There are a few guidelines for writing a controller:
- One object cannot be created or deleted by its own controller.
- Only an object's own controller should update the object's `status` field. All the other controllers can only update the object's `spec` field.
A list of the major CRD objects is below:
Represents a Longhorn volume.
The volume controller is responsible for creating/deleting/attaching/detaching/upgrading the Longhorn volume, by creating/deleting/updating the Engine and Replica objects.
Represents a Longhorn engine.
The engine controller is responsible for starting/stopping the engine, as well as monitoring the volume status from the engine's report.
Represents a Longhorn replica. The replica controller is responsible for starting/stopping the replica.
Represents an instance manager. An instance manager can serve either engines or replicas.
The status of the instance manager object reflects the status report from the instance manager running on the node, which includes the status of engine or replica processes.
Represents a node in the Kubernetes cluster.
The node controller is responsible for collecting the node and disk information (e.g. remaining space on each disk), as well as setting schedulable flags, node/disk tags, etc.
Represents one version of the Longhorn engine.
Since Longhorn engines are microservices, it's possible for each volume to run a different version of the engine image. This happens when the Longhorn Managers have been upgraded but the engines haven't been upgraded yet. Longhorn Manager uses the engine binaries deployed on the node to talk to the volume. Those engine binaries are deployed by the engine image controller.
The engine image controller is also responsible for creating/deleting the instance manager objects, since each version of the engine needs to run with instance managers of the same version.
The Manager's API is the API facing the end user. It can be used by either the Longhorn UI or the Longhorn CSI/Flexvolume drivers.
Most of the functionality is provided by changing the spec of CRD objects in the Kubernetes API server, which triggers the reconciliation loops inside the Manager's controllers. Actions like volume attaching/detaching/healing are done in this way.
The API may also call into the running volume's engine to execute operations like snapshot and backup, since those are not stored in the CRDs yet.