SPDK v19.07: NVMe-oF FC Transport, VMD, NVMe-oF Persistent reservations, Bdev I/O with separate metadata
EXPERIMENTAL: Added ability to mirror writes to persistent write buffer cache
to allow for recovery from dirty shutdown event.
Added handling of Asynchronous Nand Management Events (ANM).
EXPERIMENTAL: Added Intel Volume Management Device (VMD) driver. VMD is an integrated
controller inside the CPU PCIe root complex. It enables virtual HBAs for the connected
spdk_vmd_init() enumerates NVMe SSDs behind VMD device and hook them into
SPDK PCI subsystem.
spdk_nvme_connect() can be used to connect
NVMe driver to the device located at the given transport ID.
To obtain transport ID of NVMe SSD behind VMD
spdk_lspci can be used.
Current implementation does not support hotplug.
Blobfs file asynchronous operations were added to public APIs.
A new file API
spdk_posix_file_load was added to load file content into a data buffer.
spdk_dif_get_length_with_md have been added,
and existing APIs
spdk_dif_generate_stream have been refined to insert or strip DIF by stream
fasion with any alignment.
spdk_dix_remap_ref_tag have been added to remap DIF reference tag.
spdk_dif_update_crc32c_stream have been
added to compute CRC-32C checksum for extended LBA payload.
Bdevperf and bdevio applications now support starting tests with application specific
RPCs. Please see helper Python scripts in their respective directories.
This is a move towards simpler RPC-only configuration for all main
and auxiliary applications.
Legacy INI style configuration for SPDK applications will become deprecated in SPDK 19.10,
and removed in SPDK 20.01. Please consider moving to JSON-RPC configuration files and/or
RPC driven run-time configuration.
EXPERIMENTAL: A Fibre Channel transport that supports Broadcom HBAs has been
added. This depends on the FC HBA driver at
https://github.com/ecdufcdrvr/bcmufctdrvr. See the documentation
for more information.
Persistent reservation emulation has been added to the NVMe-oF target. Persistent reservation
state is stored in a JSON file on the local filesystem between target restart. To support this,
an optional parameter to the RPC method
--ptpl-file was added.
This allows the user to specify which file to store the persistent reservation state in. Note
that this is done per namespace.
The c2h success optimization under which a command capsule response is not sent
for reads is turned on by default. A config knob was added to allow disabling
the optimization. This will mostly be used for integration testing with 5.0.x kernels
while some compatibility fixes make their way down the pipeline for 5.1.x kernels.
The sock priority setting of the TCP connection owned by the tcp transport is added. It is
used to optimize the TCP connection performance under designated traffic classes. And the
priority is used to differeniate the sock priority between SPDK NVMe-oF TCP target application
and other TCP based applications.
Shared receive queue can now be disabled even for NICs that support it using the
nvmf_create_transport RPC method parameter
no_srq. The actual use of a shared
receive queue is predicated on hardware support when this flag is not used.
spdk_nvmf_get_optimal_poll_group was added, which is used to return the optimal
poll group for the qpair. And
ConnectionScheduler configuration is added into the
[Nvmf] section in etc/spdk/nvmf.conf.in to demonstrate how to configure the connection
scheduling strategy among different spdk threads.
Added infrastructure to retrieve global and per poll group NVMf statistics.
DIF strip and insert is now supported for TCP transport. When it is enabled, DIF
setting is not exposed to the NVMe-oF initiator, and DIF is attached into data
for write I/O and stripped from data for read I/O.
Added a field
dif_insert_or_strip to struct spdk_nvmf_transport_opts, and
updated the related rpc function nvmf_create_transport to make this
configurable parameter available to users. The
dif_insert_or_strip is relevant
for TCP transport for now and used to configure the DIF strip and insert.
Added infrastructure to retrieve NVMf transport statistics.
respectively. And update type name of callback accordingly.
The format of the data returned by the get_bdevs_iostat RPC has changed to
make it easier to parse. It now returns an object with a "ticks" object
and "bdevs" array with the per-bdev statistics.
A new bdev module
delay has been added which simulates a drive latency when placed
on top of a Null bdev. This module is intended only for testing and can be created using
the new RPC
bdev_delay_create. That RPC takes the name of the underlying bdev as well
as average and p99 latency arguments for both read and write operations. Average latency is
defined as a value close to what you would expect a perf tool such as FIO to report back as
the mean latency of all I/O submitted to the drive. p99 latency is defined as the value one
would expect the drive to see the slowest 1% of I/O report. For underlying drives with already
significant latency, the latency values provided to the drive will be additive. This should be
taken into account if trying to achieve an artificial latency on top of an nvme drive or aio device.
DIF reference tag remapping is now supported for partition type virtual bdev
modules. When using partition type virtual bdevs, block address space is
remapped during I/O processing and DIF reference tag is remapped accordingly.
Added spdk_bdev_*_with_md() functions allowing for IO with metadata being transferred in
separate buffer. To check support for separatate metadata, use spdk_bdev_is_md_separate().
All bdevs now have a UUID. For devices whose backing hardware does not provide a UUID,
one is automatically generated. Across runs of SPDK, bdevs whose UUID is automatically
generated may change.
A new virtual bdev module
compress has been added to provide compression services on top of
a thinly provisioned logical volume. See documentation for complete details.
Added an optional parameter
--io-queue-requests to RPC
can be used to change the number of requests allocated for one NVMe I/O queue. For
very big I/O size, e.g. 128MiB, with this option user will not get an error due to
limited requests in NVMe driver layer.
Added spdk_nvme_ctrlr_get_transport_id() to get the transport ID from a
previously attached controller.
Nvme Opal library spdk_opal_cmd deprecated. Adding seperate command APIs.
NVMe Opal library add support for activating locking SP which will make the transaction
from "Manufactured-Inactive" state to "Manufactured" state. Upon successfully invoking
of this method, lock and unlock features will be enabled.
NVMe Opal library add support for locking/unlocking range and list locking range info.
NVMe opal library add support for multiuser. Admin can enable user and add user to specific
locking range and the user can lock/unlock his range.
Added spdk_nvme_ctrlr_io_cmd_raw_no_payload_build() allowing a caller to pass
a completely formed command to an NVMe submission queue (buffer addresses and all).
This is supported on the PCIe transport only.
Added spdk_nvme_get_ctrlr_registers() to return a pointer to the virtual address
of the NVMe controller registers. This is supported on the PCIe transport only.
Added additional options to the spdk_nvme_ctrlr_alloc_qpair() option parameter
structure to allow caller to override the virtual and optionally physical address
of the submission and completion queue pair to be created. This is supported on
the PCIe transport only.
disable_error_logging to struct spdk_nvme_ctrlr_opts, which disables
logging of failed requests. By default logging is enabled.
Added spdk_nvme_qpair_print_command(), spdk_nvme_qpair_print_completion() and
spdk_nvme_cpl_get_status_string(). Allowing for easier display of error messages.
Added support for NVMe Sanitize command.
free_space has been added to spdk_ring_enqueue() to wait when
the ring is almost full and resume when there is enough space available in
A new API
spdk_mempool_lookup has been added to lookup the memory pool created
by the primary process.
Added spdk_pci_get_first_device() and spdk_pci_get_next_device() to allow
iterating over PCI devices detected by SPDK. Because of this, all SPDK APIs
to attach/detach PCI devices are no longer thread safe. They are now meant to
be called from only a single thread only, the same only that called spdk_env_init().
This applies to the newly added APIs as well.
SPDK now supports VPP version 19.04.2, up from VPP 18.01.
VPP socket abstraction now uses VPP Session API, instead of VLC (VPP Communications Library).
This allows for better control over sessions and queues.
Please see VPP documentation for more details:
VPP Host Stack
Add spdk_sock_get_optimal_sock_group(), which returns the optimal sock group for
this socket. When a socket is created, it is often assigned to a sock group using
spdk_sock_group_add_sock so that a set of sockets can be polled more efficiently.
For some network devices, it is optimal to assign particular sockets to specific
sock groups. This API is intended to provide the user with that information.
spdk_sock_group_get_ctx() was added to return the context of the spdk_sock_group.
spdk_sock_group_create() is updated to allow input the user provided ctx.
spdk_sock_set_priority() is added to set the priority of the socket.
Added thread_get_stats RPC method to retrieve existing statistics.
Added nvmf_get_stats RPC method to retrieve NVMf susbsystem statistics.
Response buffers for RPC requests are now always pre-allocated, which implies
that all spdk_jsonrpc_begin_result() calls always succeed and return a valid
buffer for JSON response. RPC calls no longer need to check if the buffer is
Added SPDK_RPC_REGISTER_ALIAS_DEPRECATED to help with deprecation process when
renaming existing RPC. First time a deprecated alias is used, it will print
a warning message.
get_rpc_methods was renamed
rpc_get_methods. The old name is still usable,
but is now deprecated.
A snapshot can now be deleted if there is only a single clone on top of it.
Preliminary support for cross compilation is now available. Targeting an older
CPU on the same architecture using your native compiler can be accomplished by
--target-arch option to
configure as follows:
Additionally, some support for cross-compiling to other architectures has been
added via the
--cross-prefix argument to
configure. To cross-compile, set CC
and CXX to the cross compilers, then run configure as follows:
./configure --target-arch=aarm64 --cross-prefix=aarch64-linux-gnu
A security vulnerability has been identified and fixed in SPDK Vhost-SCSI target.
A malicious client (e.g. a virtual machine) could send a carefully prepared,
invalid I/O request to crash the entire SPDK process. All users of SPDK Vhost-SCSI
target are strongly recommended to update. All SPDK versions < 19.07 are affected.
By default, SPDK will now rely on upstream DPDK's rte_vhost instead of its fork
located inside SPDK repo. The internal fork is still kept around to support older
DPDK versions, but is considered legacy and will be eventually removed.
configure will now automatically use the upstream rte_vhost if the used DPDK
version is >= 19.05.
spdk_vhost_init() is now asynchronous and accepts a completion callback.
A security vulnerability has been identified and fixed in SPDK iSCSI target.
A malicious client (e.g. an iSCSI initiator) could send a carefully prepared,
invalid I/O request to crash the entire SPDK process. All users of SPDK iSCSI
target are strongly recommended to update. All SPDK versions < 19.07 are affected.
Exposed spdk_set_thread() in order for applications to associate
with SPDK thread when necessary.
Added spdk_thread_destroy() to allow framework polling the thread to
release resources associated with that thread.
Updated OCF submodule to OCF v19.3.2
Added support for many-to-one configuration for OCF bdev.
Multiple core devices can now be cached on single cache device.
Added persistent metadata support, allowing to restore cache state after shutdown.
During start of SPDK application, the devices are examined and if OCF metadata
is present - appropriate OCF bdevs will be recreated.
Added Write-Back mode support. In this mode, data is first written to
caching device and periodically synchronized to the core devices.
Dirty data is saved as persistent metadata on cache device,
allowing for safe restore during application restart.
For more details please see OCF documentation:
OpenCAS cache configuration