---
layout: docs
page_title: Considerations for Stateful Workloads
description: Learn about persistent storage options for stateful workloads on Nomad.
---

# Considerations for Stateful Workloads

By default, Nomad's allocation storage is ephemeral. Nomad can discard it during new deployments, when rescheduling jobs, or if it loses a client. This is undesirable when running stateful workloads such as databases.

This document explores the options for persistent storage of workloads running in Nomad. It is written for practitioners who are familiar with Nomad and have a foundational understanding of storage.

## Considerations

Consider access patterns, performance, reliability and availability needs, and maintenance to choose the most appropriate storage strategy.

Local storage is fast and readily available, and if it has enough capacity it needs little maintenance. However, it is not redundant: if a single node, disk, or group of disks fails, data loss and service interruption will occur.

Geographically distributed networked storage with multiple redundancies, including disks, controllers, and network paths, provides higher availability and resilience and can tolerate multiple hardware failures before risking data loss. However, the performance and reliability of networked storage depend on the network: it can have higher latency and lower throughput than local storage, and it may require more maintenance.

Consider whether Nomad is running in the public cloud or on-premises, and what storage options are available in that environment. From there, the best choice depends on your organizational and application needs.

## Public cloud

Public cloud providers offer different storage services with various tradeoffs. These typically consist of local disks, network-attached block devices, and networked shared storage.

### AWS

| AWS service | Availability | Persistence | Performance | Suitability |
|-------------|--------------|-------------|-------------|-------------|
| Instance Store | Locally on some instance types | Limited, not persistent across instance stops/terminations or hardware failures | High throughput and low latency | Temporary storage of information that changes frequently, such as buffers, caches, scratch data, and other temporary content |
| Elastic Block Store | Zonal block devices attached to one or more instances | Persistent, with an independent lifecycle | Configurable, but higher latency than Instance Store | General purpose persistent storage |
| Elastic File System | Regional/multi-regional file storage that can be available to multiple instances | Persistent, with an independent lifecycle | Configurable, but with less throughput and higher latency than Instance Store or EBS | File storage that needs to be available to multiple instances in multiple zones (even if only as a failover) |

### Azure

| Azure service | Availability | Persistence | Performance | Suitability |
|---------------|--------------|-------------|-------------|-------------|
| Ephemeral OS disks | Locally on some instance types | Limited, not persistent across instance stops/terminations or hardware failures | High throughput and low latency | Temporary storage of information that changes frequently, such as buffers, caches, scratch data, and other temporary content |
| Managed Disks | Zonal or regional block devices attached to one or more VMs | Persistent, with an independent lifecycle | Configurable | General purpose persistent storage |
| Azure Files | Zonal/regional/multi-regional file storage that can be available to multiple VMs | Persistent, with an independent lifecycle | Configurable | File storage that needs to be available to multiple VMs in multiple zones (even if only as a failover) |

### GCP

| GCP service | Availability | Persistence | Performance | Suitability |
|-------------|--------------|-------------|-------------|-------------|
| Local SSD | Locally on some instance types | Limited, not persistent across instance stops/terminations or hardware failures | High throughput and low latency | Temporary storage of information that changes frequently, such as buffers, caches, scratch data, and other temporary content |
| Persistent Disk | Zonal or regional block devices attached to one or more instances | Persistent, with an independent lifecycle | Configurable | General purpose persistent storage |
| Filestore | Zonal/regional file storage that can be available to multiple instances | Persistent, with an independent lifecycle | Configurable | File storage that needs to be available to multiple instances in multiple zones (even if only as a failover) |

## Private cloud or on-premises

When running workloads on-premises or in a self-managed private cloud, SAN/NAS systems or software-defined storage such as Portworx or Ceph usually provide non-local storage. Compute instances access the storage over a block protocol such as iSCSI, FC, or NVMe-oF, a file protocol such as NFS or CIFS, or both. In most organizations, dedicated storage teams manage these systems.

## Consuming persistent storage from Nomad

Because environments and application requirements differ, consider performance, reliability, availability, and maintenance when choosing the most appropriate storage option.

### CSI

The Container Storage Interface (CSI) is a vendor-neutral specification that allows storage providers to develop plugins that orchestrators such as Nomad can use. Some CSI plugins can dynamically provision volumes and manage their lifecycle, including snapshots, deletion, and dynamic resizing. The exact feature set depends on the plugin and the underlying storage platform.

Find a list of plugins and their feature sets in the Kubernetes CSI Developer Documentation.

While Nomad follows the CSI specification, some plugins may implement orchestrator-specific logic that makes them incompatible with Nomad. You should validate that your chosen plugin works with Nomad before using it. Refer to the plugin documentation from the storage provider for more information.

There are three CSI plugin subtypes:

- Controller: Communicates with the storage provider to manage the volume lifecycle.
- Node: Runs on every Nomad client and handles local operations, such as mounting and unmounting volumes in allocations. The node plugin task must run as privileged to perform these operations.
- Monolithic: Combines both of the above roles.

Run all plugin types as Nomad jobs: system jobs for node and monolithic plugins, and service jobs for controller plugins. Refer to the CSI concepts documentation page for more information.
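As a rough illustration, the following is a minimal sketch of a node plugin running as a Nomad system job. It is modeled on the AWS EBS CSI driver; the image, arguments, plugin ID, and resources are examples only and vary by plugin and environment, and the Docker driver on the clients must allow privileged containers.

```hcl
job "plugin-ebs-nodes" {
  type = "system" # run the node plugin on every eligible client

  group "nodes" {
    task "plugin" {
      driver = "docker"

      config {
        # Example image and arguments; substitute your storage provider's plugin
        # and pin a specific version in practice.
        image = "amazon/aws-ebs-csi-driver"
        args = [
          "node",
          "--endpoint=unix://csi/csi.sock",
          "--logtostderr",
          "--v=5",
        ]
        privileged = true # node plugins need privileges to mount volumes
      }

      csi_plugin {
        id        = "aws-ebs0" # referenced by volumes via plugin_id
        type      = "node"     # "controller" or "monolith" for the other subtypes
        mount_dir = "/csi"
      }

      resources {
        cpu    = 200
        memory = 256
      }
    }
  }
}
```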

CSI plugins are useful when storage requirements evolve quickly and constantly. For example, an environment where workloads with persistent storage are frequently added or removed is well suited for CSI. However, CSI plugins present some maintenance challenges: most notably, they need to run continuously, be configured (including authentication and connectivity to the storage platform), and be updated to keep up with new features and bug fixes and to remain compatible with the underlying storage platform. They also introduce additional moving parts, can be difficult to troubleshoot, and have a complex security profile, since they must run as privileged containers in order to mount volumes.

The Stateful Workloads with CSI tutorial and the Nomad CSI demo repository offer guidance and examples on how to use CSI plugins with Nomad and include job files for running the plugins and configuration files for creating and consuming volumes.
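To give a sense of that workflow, here is a hedged sketch of a volume specification that could be passed to `nomad volume create` (or `nomad volume register` for pre-existing volumes), followed by a job snippet that consumes it. The volume name, sizes, mount point, workload, and plugin ID are illustrative assumptions, and the supported capabilities depend on the plugin.

```hcl
# volume.hcl -- created with `nomad volume create volume.hcl`
id        = "database" # example volume ID
name      = "database"
type      = "csi"
plugin_id = "aws-ebs0" # must match the csi_plugin id of a running plugin

capacity_min = "10GiB"
capacity_max = "20GiB"

capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}
```

```hcl
# Job snippet consuming the CSI volume
group "db" {
  volume "database" {
    type            = "csi"
    source          = "database"
    access_mode     = "single-node-writer"
    attachment_mode = "file-system"
  }

  task "mysql" {
    driver = "docker"

    volume_mount {
      volume      = "database"
      destination = "/srv/data" # example mount point inside the allocation
    }

    config {
      image = "mysql:8" # example workload
    }
  }
}
```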

### Host volumes

Host volumes mount paths from the host (the Nomad client) into allocations. Nomad is aware of host volume availability and uses it for job scheduling. However, Nomad does not know about the volume's underlying characteristics, such as whether it is a standard folder on a local ext4 filesystem, backed by distributed networked storage such as GlusterFS, or a mounted NFS/CIFS volume from a NAS or a public cloud service such as AWS EFS. Therefore you can use host volumes both for local, somewhat persistent storage and for highly persistent networked storage.

Because you need to declare host volumes in the Nomad agent's configuration file, you must restart the Nomad client to reconfigure them. This makes host volumes impractical if you frequently change your storage configuration. Furthermore, it might require coordination between different personas to configure and consume host volumes. For example, a Nomad Administrator must modify Nomad's configuration file to add/update/remove host volumes to make them available for consumption by Nomad Operators. Or, with networked host volumes, a Storage Administrator will need to provision the volumes and make them available to the Nomad clients. A System Administrator will then mount them on the Nomad clients.
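For reference, a host volume declaration is a small addition to the client block of the Nomad agent configuration. The volume name and path below are illustrative; the path must already exist on the client, and changing this block requires a client restart.

```hcl
# Nomad client agent configuration (for example, /etc/nomad.d/client.hcl)
client {
  enabled = true

  host_volume "mysql-data" {
    # Could be a local directory or a mounted NFS/CIFS/clustered filesystem path.
    path      = "/opt/mysql-data"
    read_only = false
  }
}
```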

Host volumes backed by local storage help persist data that is not critical, for example an on-disk cache that can be rebuilt if needed. When backed by networked storage such as NFS/CIFS-mounted volumes or distributed storage via GlusterFS/Ceph, host volumes provide a quick option to consume highly available and reliable storage.
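Consuming a host volume from a job then looks roughly like the following sketch, regardless of whether the underlying path is local or network-backed. The volume name and workload continue the hypothetical example above.

```hcl
group "db" {
  volume "mysql-data" {
    type      = "host"
    source    = "mysql-data" # must match the host_volume name on the client
    read_only = false
  }

  task "mysql" {
    driver = "docker"

    volume_mount {
      volume      = "mysql-data"
      destination = "/var/lib/mysql"
      read_only   = false
    }

    config {
      image = "mysql:8" # example workload
    }
  }
}
```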

Refer to the Stateful workloads with Nomad host volumes tutorial to learn more about using host volumes with Nomad.

#### NFS caveats

NFS-backed host volumes come with a few caveats around access control, reliability, and performance. Use the same NFS mount options on every Nomad client that mounts the volume.

Depending on your NFS version, UIDs/GIDs (user/group IDs) can differ between Nomad clients, leading to permission issues when an allocation on another host tries to access the volume. The only ways to ensure this is not a problem are to use NFSv4 with Kerberos-based ID mapping, or to have a reliable configuration-management or image-building process that keeps UIDs/GIDs synchronized between hosts. Use hard mounts to prevent data loss, optionally with intr so that NFS requests can be interrupted, which prevents the whole system from locking up if the NFS server becomes unavailable.

A significant factor in the performance of NFS-backed storage is the `wsize` and `rsize` mount options, which determine the maximum size of a single write or read operation. Smaller sizes mean that larger operations are split into more chunks, which significantly impacts performance. The storage vendor usually documents the optimal values; for example, AWS recommends 1048576 bytes for both `rsize` and `wsize` when mounting EFS.
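As an illustration only, an /etc/fstab entry for an EFS-style NFS mount with these options might look like the following; the file system hostname and mount point are placeholders, and the exact options should come from your storage vendor's documentation.

```
# Hypothetical fstab entry; replace the NFS server and mount point with your own.
fs-example.efs.us-east-1.amazonaws.com:/  /srv/nfs/data  nfs4  nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport  0 0
```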

To learn more about NFS mount options, visit Red Hat's NFS documentation.

### Ephemeral disks

Nomad's ephemeral disks describe the best-effort persistence of an allocation's data directory. They support data migration between hosts (which requires network connectivity between the Nomad client nodes) and are size-aware for scheduling. Since persistence is best effort, however, you will lose data if the client or underlying storage fails. Ephemeral disks are ideal for data that you can rebuild if needed, such as an in-progress cache or a local copy of data.
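The following minimal sketch shows an `ephemeral_disk` block at the group level; the size and workload are illustrative.

```hcl
group "cache" {
  ephemeral_disk {
    size    = 500  # size in MB, used by the scheduler for placement
    migrate = true # best-effort copy of the data to the new allocation's node
    sticky  = true # best-effort placement of updated allocations on the same node
  }

  task "redis" {
    driver = "docker"

    config {
      image = "redis:7" # example workload writing to its allocation directory
    }
  }
}
```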

## Storage comparison

With the information laid out in this document, use the following table to choose the storage option that best addresses your Nomad storage requirements.

| Storage option | Advantages | Disadvantages | Ideal for |
|----------------|------------|---------------|-----------|
| CSI volumes | - Wide ecosystem with many providers<br/>- Advanced features such as snapshots, cloning, and resizing<br/>- Dynamic, flexible, and self-service (anyone with the correct ACL policies can create volumes on demand) | - Some complexity and ongoing maintenance<br/>- Plugin upgrades have to follow the underlying storage provider's API changes and upgrades<br/>- Not all CSI plugins implement all features<br/>- Not all CSI plugins respect the CSI spec and are Nomad compatible<br/>- Node plugins need to run in privileged mode to be able to mount volumes in allocations | Environments where Nomad cluster operators and consumers need to easily add or change storage, and where the chosen storage provider has a CSI plugin that respects the CSI spec |
| Host volumes backed by local storage | - Readily available<br/>- Fast due to being local<br/>- No ongoing maintenance | - Requires coordination between multiple personas to configure and consume (operators running the Nomad clients need to configure them statically in the Nomad client's configuration file)<br/>- Not fault tolerant; a hardware failure on a single instance loses the data | Environments with low persistent storage requirements that can tolerate some failure (even if they would prefer not to), or that have high performance and low latency needs |
| Host volumes backed by networked or clustered storage | - Readily available<br/>- No ongoing maintenance on the Nomad side (though possibly on the storage provider) | - Requires coordination between multiple personas to configure and consume (storage administrators need to provision volumes, and operators running the Nomad clients need to configure them statically in the Nomad client's configuration file)<br/>- The underlying networked storage and its limitations are decoupled from the consumer but still need to be understood (for example, whether concurrent access is possible) | Environments with small amounts of storage, or a low frequency of change, that have an existing storage provider consumable via NFS/CIFS |
| Ephemeral disks | - Fast due to being local<br/>- Basic best-effort persistence, including optional migration across Nomad clients | - Not fault tolerant; a hardware failure on a single instance loses the data | Environments that need temporary caches, space for files undergoing processing, and other data that is ephemeral and easily rebuilt |

## Additional resources

To learn more about Nomad and the topics covered in this document, visit the following resources:

- Allocations
- CSI