Commits
Commits on Nov 23, 2022
-
fs: add support for copy file range in zonefs
copy_file_range is implemented using copy offload; copy offloading to the device is enabled by default. To disable copy offloading, mount with the "no_copy_offload" mount option. At present copy offload is only used if the source and destination files are on the same block device; otherwise copy_file_range is completed by the generic copy_file_range path.

copy_file_range is implemented as follows:
- flush pending writes on the src and dest files
- drop the page cache for the dest file if it is a conventional zone
- copy the range using offload
- update the dest file info

For all failure cases we fall back to the generic copy_file_range path. At present this implementation does not support conventional zone aggregation.

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
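From userspace this is exercised through the regular copy_file_range(2) call; a minimal sketch, assuming two files on the same zonefs mount (the paths below are hypothetical):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* hypothetical paths; both files must be on the same zonefs mount
	 * (and same block device) for copy offload to be used */
	int src = open("/mnt/zonefs/cnv/0", O_RDONLY);
	int dst = open("/mnt/zonefs/cnv/1", O_RDWR);
	ssize_t ret;

	if (src < 0 || dst < 0)
		return 1;

	/* zonefs decides internally whether to use copy offload or to
	 * fall back to the generic copy_file_range path */
	ret = copy_file_range(src, NULL, dst, NULL, 4096, 0);
	if (ret < 0)
		perror("copy_file_range");
	else
		printf("copied %zd bytes\n", ret);

	close(src);
	close(dst);
	return ret < 0;
}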
-
dm kcopyd: use copy offload support
Introduce copy_jobs to use copy offload if it is supported by the underlying devices; otherwise fall back to the existing method.

run_copy_jobs() calls the block layer copy offload API if both the source and destination request queues are the same and support copy offload. On successful completion, the count of each copied destination region is set to zero; failed regions are processed via the existing method.

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
-
dm: Enable copy offload for dm-linear target
Set the copy_offload_supported flag to enable offload.

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
-
dm: Add support for copy offload.
Before enabling copy for a dm target, check whether the underlying devices and the dm target support copy. Avoid splits happening inside the dm target: fail early if the request needs a split, since splitting a copy request is currently not supported.

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
-
nvmet: add copy command support for bdev and file ns
Add support for handling the copy command on the target. For a bdev-ns we call into blkdev_issue_copy, which the block layer completes either by an offloaded copy request to the backend bdev or by emulating the request. For a file-ns we call vfs_copy_file_range to service the request. Currently the target always advertises copy capability by setting NVME_CTRL_ONCS_COPY in the controller ONCS.

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
-
nvme: add copy offload support
For devices supporting native copy, the nvme driver receives read and write requests with the BLK_COPY op flag. For the read request the nvme driver populates the payload with source information. For the write request the driver converts it to an nvme copy command using the source information in the payload and submits it to the device. The current design only supports a single source range. This design is courtesy of Mikulas Patocka's token-based copy.

Trace event support is added for nvme_copy_cmd. The device copy limits are set as queue limits.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Javier González <javier.gonz@samsung.com>
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
-
block: Introduce a new ioctl for copy
Add a new BLKCOPY ioctl that offloads copying of one or more source ranges to one or more destinations on a device. The COPY ioctl accepts a 'copy_range' structure that contains the number of ranges and a reserved field, followed by an array of ranges. Each source range is represented by a 'range_entry' that contains the source start offset, the destination start offset and the length of the source range (in bytes). MAX_COPY_NR_RANGE limits the number of entries for the IOCTL and MAX_COPY_TOTAL_LENGTH limits the total copy length the IOCTL can handle.

Example code to issue BLKCOPY:

/* Sample example to copy three entries with [dest,src,len]:
 * [32768, 0, 4096] [36864, 4096, 4096] [40960, 8192, 4096] on the same device
 */
int main(void)
{
	int i, ret, fd;
	unsigned long src = 0, dst = 32768, len = 4096;
	struct copy_range *cr;

	cr = (struct copy_range *)malloc(sizeof(*cr) +
					 (sizeof(struct range_entry) * 3));
	cr->nr_range = 3;
	cr->reserved = 0;
	for (i = 0; i < cr->nr_range; i++, src += len, dst += len) {
		cr->ranges[i].dst = dst;
		cr->ranges[i].src = src;
		cr->ranges[i].len = len;
		cr->ranges[i].comp_len = 0;
	}

	fd = open("/dev/nvme0n1", O_RDWR);
	if (fd < 0)
		return 1;

	ret = ioctl(fd, BLKCOPY, cr);
	if (ret != 0)
		printf("copy failed, ret = %d\n", ret);

	for (i = 0; i < cr->nr_range; i++)
		if (cr->ranges[i].len != cr->ranges[i].comp_len)
			printf("Partial copy for entry %d: requested %llu, completed %llu\n",
			       i, cr->ranges[i].len, cr->ranges[i].comp_len);

	close(fd);
	free(cr);
	return ret;
}

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Javier González <javier.gonz@samsung.com>
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
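The commit message does not show the UAPI structures themselves; pieced together from the fields used in the example above, they look roughly like this (a sketch, not the authoritative header — the exact field widths are an assumption):

/* Sketch of the proposed BLKCOPY UAPI, inferred from the example above. */
struct range_entry {
	__u64 src;      /* source offset in bytes */
	__u64 dst;      /* destination offset in bytes */
	__u64 len;      /* requested length in bytes */
	__u64 comp_len; /* completed length, filled in by the kernel */
};

struct copy_range {
	__u64 nr_range;              /* number of entries in ranges[] */
	__u64 reserved;              /* must be zero */
	struct range_entry ranges[]; /* flexible array of copy ranges */
};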
-
block: add emulation for copy
For devices which do not support copy, copy emulation is added. Copy emulation is implemented by reading from the source ranges into memory and writing to the corresponding destinations asynchronously. For zoned devices we maintain a linked list of read submissions and try to submit the corresponding writes in the same order. Emulation is also used if copy offload fails or partially completes.

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
-
block: Add copy offload support infrastructure
Introduce blkdev_issue_copy, which takes source and destination bdevs and an array of (source, destination, copy length) tuples. Introduce the REQ_COPY copy offload operation flag. A read-write bio pair is created with a token as payload and submitted to the device in order. The read request populates the token with source-specific information, which is then passed along with the write request. This design is courtesy of Mikulas Patocka's token-based copy.

Larger copies are divided based on the max_copy_sectors limit.

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
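A hedged sketch of how an in-kernel caller might use the helper, assuming a prototype along the lines described above (source bdev, destination bdev, array of range entries); the exact signature in the series may differ:

/* Illustrative only: submit a single offloaded copy of nr_sects sectors.
 * The real blkdev_issue_copy() prototype may take additional arguments
 * (completion callback, private data, etc.). */
static int copy_one_range(struct block_device *src_bdev,
			  struct block_device *dst_bdev,
			  sector_t src_sector, sector_t dst_sector,
			  sector_t nr_sects)
{
	struct range_entry range = {
		.src = src_sector << SECTOR_SHIFT,
		.dst = dst_sector << SECTOR_SHIFT,
		.len = nr_sects << SECTOR_SHIFT,
	};

	return blkdev_issue_copy(src_bdev, dst_bdev, &range, 1, GFP_KERNEL);
}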
-
block: Introduce queue limits for copy-offload support
Add device limits as sysfs entries:
- copy_offload (RW)
- copy_max_bytes (RW)
- copy_max_bytes_hw (RO)

The above limits help to split the copy payload in the block layer.
copy_offload: used for setting copy offload (1) or emulation (0).
copy_max_bytes: maximum total length of a copy in a single payload.
copy_max_bytes_hw: reflects the device-supported maximum limit.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
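For example, the new attributes can be inspected (and copy_offload toggled) from userspace via sysfs; a minimal sketch, with an illustrative device name:

#include <stdio.h>

int main(void)
{
	/* attribute names come from the commit above; nvme0n1 is illustrative */
	const char *attrs[] = { "copy_offload", "copy_max_bytes", "copy_max_bytes_hw" };
	char path[128], buf[64];

	for (int i = 0; i < 3; i++) {
		snprintf(path, sizeof(path), "/sys/block/nvme0n1/queue/%s", attrs[i]);
		FILE *f = fopen(path, "r");

		if (!f)
			continue;
		if (fgets(buf, sizeof(buf), f))
			printf("%s: %s", attrs[i], buf);
		fclose(f);
	}
	return 0;
}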
-
Merge branch 'for-6.2/block' into for-next
* for-6.2/block:
  drbd: use consistent license
  lru_cache: remove unused lc_private, lc_set, lc_index_of
  lru_cache: remove compiled out code
  lru_cache: use atomic operations when accessing lc->flags, always
-
drbd: use consistent license
DRBD currently has a mix of GPL-2.0 and GPL-2.0-or-later SPDX license identifiers. We have decided to stick with GPL 2.0 only, so consistently use that identifier.

Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20221122134301.69258-5-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
lru_cache: remove unused lc_private, lc_set, lc_index_of
Signed-off-by: Joel Colledge <joel.colledge@linbit.com>
Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20221122134301.69258-4-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
lru_cache: remove compiled out code
Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20221122134301.69258-3-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
lru_cache: use atomic operations when accessing lc->flags, always
Otherwise, depending on the way locking is implemented at the call sites, some updates could be lost (this has not been observed in practice).

Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20221122134301.69258-2-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commits on Nov 22, 2022
-
Merge branch 'for-6.2/block' into for-next
* for-6.2/block:
  block: fix missing nr_hw_queues update in blk_mq_realloc_tag_set_tags
-
block: fix missing nr_hw_queues update in blk_mq_realloc_tag_set_tags
Commit ee9d552 ("blk-mq: simplify blk_mq_realloc_tag_set_tags") cleaned up the function blk_mq_realloc_tag_set_tags. After this change, the function does not update nr_hw_queues of struct blk_mq_tag_set when the new nr_hw_queues value is smaller than the original. This results in failures when changing the queue number of block devices. To avoid the failure, add the missing nr_hw_queues update.

Fixes: ee9d552 ("blk-mq: simplify blk_mq_realloc_tag_set_tags")
Reported-by: Chaitanya Kulkarni <chaitanyak@nvidia.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/linux-block/20221118140640.featvt3fxktfquwh@shindev/
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221122084917.2034220-1-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
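A sketch of the shape of the fix (most of blk_mq_realloc_tag_set_tags() is elided; only the missing assignment described above is illustrated, and the exact diff may differ):

/* Sketch only: ensure set->nr_hw_queues is updated even when the new
 * value is smaller than the current one; the reallocation logic for
 * growing the tags array is elided. */
static int blk_mq_realloc_tag_set_tags(struct blk_mq_tag_set *set,
				       int new_nr_hw_queues)
{
	if (set->nr_hw_queues >= new_nr_hw_queues)
		goto done;

	/* ... grow set->tags[] for the additional hw queues ... */

done:
	set->nr_hw_queues = new_nr_hw_queues;
	return 0;
}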
-
Merge branch 'for-6.2/io_uring' into for-next
* for-6.2/io_uring: (22 commits)
  io_uring: kill io_cqring_ev_posted() and __io_cq_unlock_post()
  Revert "io_uring: disallow self-propelled ring polling"
  io_uring: pass in EPOLL_URING_WAKE for eventfd signaling and wakeups
  eventfd: provide a eventfd_signal_mask() helper
  eventpoll: add EPOLL_URING_WAKE poll wakeup flag
  io_uring: inline __io_req_complete_post()
  io_uring: split tw fallback into a function
  io_uring: inline io_req_task_work_add()
  io_uring: update outdated comment of callbacks
  io_uring/poll: remove outdated comments of caching
  io_uring: allow multishot recv CQEs to overflow
  io_uring: revert "io_uring fix multishot accept ordering"
  io_uring: do not always force run task_work in io_uring_register
  io_uring: fix two assignments in if conditions
  io_uring/net: move mm accounting to a slower path
  io_uring: move zc reporting from the hot path
  io_uring/net: inline io_notif_flush()
  io_uring/net: rename io_uring_tx_zerocopy_callback
  io_uring/net: preset notif tw handler
  io_uring/net: remove extra notif rsrc setup
  ...
-
Merge branch 'for-6.2/block' into for-next
* for-6.2/block: (116 commits)
  blk-crypto: move internal only declarations to blk-crypto-internal.h
  blk-crypto: add a blk_crypto_config_supported_natively helper
  blk-crypto: don't use struct request_queue for public interfaces
  blk-cgroup: Flush stats at blkgs destruction path
  blk-cgroup: Optimize blkcg_rstat_flush()
  blk-cgroup: Return -ENOMEM directly in blkcg_css_alloc() error path
  block: don't allow a disk link holder to itself
  block: store the holder kobject in bd_holder_disk
  block: fix use after free for bd_holder_dir
  block: remove delayed holder registration
  dm: track per-add_disk holder relations in DM
  dm: make sure create and remove dm device won't race with open and close table
  dm: cleanup close_table_device
  dm: cleanup open_table_device
  dm: remove free_table_devices
  block: clear ->slave_dir when dropping the main slave_dir reference
  sbitmap: Try each queue to wake up at least one waiter
  wait: Return number of exclusive waiters awaken
  sbitmap: Advance the queue index before waking up a queue
  block: remove blkdev_writepages
  ...
-
io_uring: kill io_cqring_ev_posted() and __io_cq_unlock_post()
__io_cq_unlock_post() is identical to io_cq_unlock_post(), and io_cqring_ev_posted() has a single caller, so might as well just inline it there.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Revert "io_uring: disallow self-propelled ring polling"
This reverts commit 7fdbc5f. This patch dealt with a subset of the real problem, which is a potential circular dependency on the wakeup path for io_uring itself. Outside of io_uring, eventfd can also trigger this (see details in 03e02ac) and so can epoll (see details in caf1aea). Now that we have a generic solution to this problem, get rid of the io_uring specific work-around.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
io_uring: pass in EPOLL_URING_WAKE for eventfd signaling and wakeups
Pass in EPOLL_URING_WAKE when signaling eventfd or doing poll related wakeups, so that we can check for a circular event dependency between eventfd and epoll. If this flag is set when our wakeup handlers are called, then we know we have a dependency that needs to terminate multishot requests. eventfd and epoll are the only such possible dependencies.

Cc: stable@vger.kernel.org # 6.0
Signed-off-by: Jens Axboe <axboe@kernel.dk>
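Conceptually, a wakeup handler only has to test the flag in the poll wakeup key; a hedged sketch of that check (the real io_uring handlers differ in detail):

/* Illustrative sketch of a poll wakeup callback: if the wakeup was
 * generated by io_uring itself (EPOLL_URING_WAKE set in the key),
 * terminate the multishot request instead of rearming it. */
static int poll_wake_sketch(struct wait_queue_entry *wait, unsigned mode,
			    int sync, void *key)
{
	__poll_t mask = key_to_poll(key);

	if (mask & EPOLL_URING_WAKE) {
		/* circular eventfd/epoll dependency: stop the multishot */
		return 1;
	}

	/* ... normal completion / rearm path ... */
	return 1;
}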
-
eventfd: provide a eventfd_signal_mask() helper
This is identical to eventfd_signal(), but it allows the caller to pass in a mask to be used for the poll wakeup key. The use case is avoiding repeated multishot triggers if we have a dependency between eventfd and io_uring. If we set up an eventfd context and register that as the io_uring eventfd, and at the same time queue a multishot poll request for the eventfd context, then any CQE posted will repeatedly trigger the multishot request until it terminates when the CQ ring overflows. In preparation for io_uring detecting this circular dependency, add the mentioned helper so that io_uring can pass in EPOLL_URING_WAKE as part of the poll wakeup key.

Cc: stable@vger.kernel.org # 6.0
[axboe: fold in !CONFIG_EVENTFD fix from Zhang Qilong]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
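A minimal sketch of how the helper relates to eventfd_signal(), assuming the interface described above (a count plus a poll wakeup mask); the in-tree definition may differ in detail:

/* Sketch: eventfd_signal() keeps its existing behavior by passing an
 * empty mask, while io_uring can call eventfd_signal_mask() directly
 * with EPOLL_URING_WAKE to tag its own wakeups. */
static inline __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n)
{
	return eventfd_signal_mask(ctx, n, 0);
}

/* illustrative io_uring call site:
 *	eventfd_signal_mask(ev_fd->cq_ev_fd, 1, EPOLL_URING_WAKE);
 */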
Commits on Nov 21, 2022
-
blk-crypto: move internal only declarations to blk-crypto-internal.h
blk_crypto_get_keyslot, blk_crypto_put_keyslot, __blk_crypto_evict_key and __blk_crypto_cfg_supported are only used internally by the blk-crypto code, so move them out of blk-crypto-profile.h, which is included by drivers that supply blk-crypto functionality.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20221114042944.1009870-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
blk-crypto: add a blk_crypto_config_supported_natively helper
Add a blk_crypto_config_supported_natively helper that wraps __blk_crypto_cfg_supported to retrieve the crypto_profile from the request queue. With this, fscrypt can stop including blk-crypto-profile.h and rely on the public consumer interface in blk-crypto.h.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20221114042944.1009870-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
blk-crypto: don't use struct request_queue for public interfaces
Switch all public blk-crypto interfaces to use struct block_device arguments to specify the device they operate on instead of the request_queue, which is a block layer implementation detail.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20221114042944.1009870-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
eventpoll: add EPOLL_URING_WAKE poll wakeup flag
We can have dependencies between epoll and io_uring. Consider an epoll context, identified by the epfd file descriptor, and an io_uring file descriptor identified by iofd. If we add iofd to the epfd context, and arm a multishot poll request for epfd with iofd, then the multishot poll request will repeatedly trigger and generate events until terminated by CQ ring overflow. This isn't a desired behavior. Add EPOLL_URING_WAKE so that io_uring can pass it in as part of the poll wakeup key, and io_uring can check for that to detect a potential recursive invocation.

Cc: stable@vger.kernel.org # 6.0
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
io_uring: inline __io_req_complete_post()
There is only one user of __io_req_complete_post(), inline it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ef4c9059950a3da5cf68df00f977f1fd13bd9306.1668597569.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
io_uring: split tw fallback into a function
When the target process is dying and task_work_add() is therefore not allowed, we push all task_work items to the fallback workqueue. Move the part responsible for moving tw items out of __io_req_task_work_add() into a separate function. This makes it a bit cleaner and gives the compiler a bit of extra info.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e503dab9d7af95470ca6b214c6de17715ae4e748.1668162751.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
io_uring: inline io_req_task_work_add()
__io_req_task_work_add() is huge but marked inline, which makes compilers generate lots of garbage. Inline the wrapper caller io_req_task_work_add() instead.

before and after:

   text    data     bss     dec     hex filename
  47347   16248       8   63603    f873 io_uring/io_uring.o

   text    data     bss     dec     hex filename
  45303   16248       8   61559    f077 io_uring/io_uring.o

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/26dc8c28ca0160e3269ef3e55c5a8b917c4d4450.1668162751.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
io_uring: update outdated comment of callbacks
Previous commit ebc11b6 ("io_uring: clean io-wq callbacks") renamed io_free_work() to io_wq_free_work() for consistency. This patch updates the relevant comment accordingly to avoid misunderstanding.

Fixes: ebc11b6 ("io_uring: clean io-wq callbacks")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Link: https://lore.kernel.org/r/20221110122103.20120-1-linma@zju.edu.cn
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
io_uring/poll: remove outdated comments of caching
Previous commit 13a9901 ("io_uring: remove events caching atavisms") entirely removes the events caching optimization introduced by commit 8145935 ("io_uring: cache req->apoll->events in req->cflags"). Hence the related comment should also be removed to avoid misunderstanding.

Fixes: 13a9901 ("io_uring: remove events caching atavisms")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Link: https://lore.kernel.org/r/20221110060313.16303-1-linma@zju.edu.cn
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
io_uring: allow multishot recv CQEs to overflow
With commit aa1df3a ("io_uring: fix CQE reordering"), there are stronger guarantees for overflow ordering. Specifically, it ensures that userspace will not receive out-of-order receive CQEs. Therefore this is not needed any more for recv/recvmsg.

Signed-off-by: Dylan Yudaken <dylany@meta.com>
Link: https://lore.kernel.org/r/20221107125236.260132-4-dylany@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
io_uring: revert "io_uring fix multishot accept ordering"
This is no longer needed after commit aa1df3a ("io_uring: fix CQE reordering"), since all reordering is now taken care of.

This reverts commit cbd2574 ("io_uring: fix multishot accept ordering").

Signed-off-by: Dylan Yudaken <dylany@meta.com>
Link: https://lore.kernel.org/r/20221107125236.260132-2-dylany@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
io_uring: do not always force run task_work in io_uring_register
Running task work when not needed can unnecessarily delay operations. Specifically, IORING_SETUP_DEFER_TASKRUN tries to avoid running task work until the user requests it. Therefore do not run it in io_uring_register any more. The one catch is that io_rsrc_ref_quiesce expects it to have run in order to process all outstanding references, so reorder its loop to do this.

Signed-off-by: Dylan Yudaken <dylany@meta.com>
Link: https://lore.kernel.org/r/20221107123349.4106213-1-dylany@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>