nvme: Use interim buffer for nvme_get_log_sanitize() #951
Conversation
Certain drives tend to return more data than expected, resulting in stack corruption. Use an interim buffer large enough to handle the excess bytes.
storaged-project/udisks#1152
linux-nvme/libnvme#684
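For illustration, the workaround amounts to something like the sketch below. The helper name get_sanitize_log_safe is hypothetical and the 64 KiB size is the over-allocation used later in this thread, not a spec value; the actual patch does the equivalent inside bd_nvme_get_sanitize_log.

#include <stdbool.h>
#include <string.h>
#include <libnvme.h>

static int get_sanitize_log_safe (int fd, struct nvme_sanitize_log_page *out)
{
    /* interim buffer large enough for drives that write more than the
     * 512-byte Sanitize Status log; any overrun lands inside this buffer
     * instead of smashing a tightly-sized struct on the stack */
    char interim[65536] = { 0 };
    int ret;

    ret = nvme_get_log_sanitize (fd, false /* rae */,
                                 (struct nvme_sanitize_log_page *) interim);
    if (ret == 0)
        /* only the leading 512 bytes are meaningful */
        memcpy (out, interim, sizeof (*out));
    return ret;
}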
Makes sense until we figure out how to deal with such devices. I suspect we need to introduce quirks for this. BTW, would it be possible to add some debug code which checks whether the device writes too much data?
We can afford extra checks here; any suggestion on how to achieve that, other than analyzing the data in the buffer itself?
The only idea I have is to write a pattern into the buffer and check how much of it was overwritten, similar to what KASAN is doing. Not really clever, I know. What we could also do, instead of bloating up production code, is write a small standalone program which does this.
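Such a standalone checker might look like the sketch below. The Get Log Page opcode (0x02), the Sanitize log identifier (0x81) and the 512-byte log size come from the NVMe spec; the pattern byte and the 64 KiB buffer size are arbitrary choices for the experiment.

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/nvme_ioctl.h>

#define LOG_SIZE 512      /* expected Sanitize Status log size */
#define BUF_SIZE 65536    /* oversized canary buffer */
#define PATTERN  0x56     /* 'V' */

int main (int argc, char **argv)
{
    static unsigned char buf[BUF_SIZE];
    struct nvme_admin_cmd cmd = { 0 };
    int fd, i;

    if (argc != 2) {
        fprintf (stderr, "usage: %s /dev/nvme0\n", argv[0]);
        return 1;
    }
    fd = open (argv[1], O_RDONLY);
    if (fd < 0) {
        perror ("open");
        return 1;
    }

    memset (buf, PATTERN, sizeof (buf));

    cmd.opcode = 0x02;                               /* Get Log Page */
    cmd.nsid = 0xffffffff;                           /* controller-scope log */
    cmd.addr = (__u64) (uintptr_t) buf;
    cmd.data_len = LOG_SIZE;                         /* request 512 bytes only */
    cmd.cdw10 = ((LOG_SIZE / 4 - 1) << 16) | 0x81;   /* NUMDL | LID (sanitize) */

    if (ioctl (fd, NVME_IOCTL_ADMIN_CMD, &cmd) != 0) {
        perror ("NVME_IOCTL_ADMIN_CMD");
        close (fd);
        return 1;
    }
    close (fd);

    /* find the last byte the device touched (approximate: trailing bytes
     * that legitimately equal the pattern are not counted) */
    for (i = BUF_SIZE - 1; i >= 0; --i)
        if (buf[i] != PATTERN)
            break;
    printf ("device wrote at least %d bytes (%d expected)\n", i + 1, LOG_SIZE);
    return 0;
}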
I tried to play with this patch today, and I ended up more confused. I wanted to know how large a buffer is needed, so I took igaw's idea:
I modified the patch to:

BDNVMESanitizeLog * bd_nvme_get_sanitize_log (const gchar *device, GError **error) {
    int ret;
    int fd;
    char buf[65536] = ZERO_INIT;
    int i, buf_len;
    FILE *save_buf_file = NULL;
    FILE *save_ij_file = NULL;
    struct nvme_sanitize_log_page *sanitize_log;
    BDNVMESanitizeLog *log;
    __u16 sstat;

    /* fill the buffer with a known pattern so overwritten bytes can be detected */
    buf_len = sizeof(buf) / sizeof(buf[0]);
    memset(buf, 'V', buf_len);

    /* open the block device */
    fd = _open_dev (device, error);
    if (fd < 0)
        return NULL;

    /* send the NVME_LOG_LID_SANITIZE ioctl */
    ret = nvme_get_log_sanitize (fd, FALSE /* rae */, (struct nvme_sanitize_log_page *) &buf);
    if (ret != 0) {
        _nvme_status_to_error (ret, FALSE, error);
        g_prefix_error (error, "NVMe Get Log Page - Sanitize Status Log command error: ");
        close (fd);
        return NULL;
    }
    close (fd);

    /* scan backwards for the last byte no longer matching the pattern */
    for (i = buf_len - 1; i >= 0; --i) {
        if (buf[i] != 'V') {
            i++;
            break;
        }
    }

    /* record how many bytes the device actually wrote */
    save_ij_file = fopen("/path/to/my-ij.bin", "ab");
    if (save_ij_file) {
        fwrite(&i, sizeof(i), 1, save_ij_file);
        fclose(save_ij_file);
    }

    /* dump the written part of the buffer for inspection */
    save_buf_file = fopen("/path/to/my-nvme.bin", "ab");
    if (save_buf_file) {
        fwrite(buf, sizeof(buf[0]), i, save_buf_file);
        fclose(save_buf_file);
    }

    /* need to use interim buffer that is large enough for broken drives
     * returning more data than expected
     */
    sanitize_log = (struct nvme_sanitize_log_page *) &buf;

    log = g_new0 (BDNVMESanitizeLog, 1);
    log->sanitize_progress = 0;
    sstat = GUINT16_FROM_LE (sanitize_log->sstat);
    if ((sstat & NVME_SANITIZE_SSTAT_STATUS_MASK) == NVME_SANITIZE_SSTAT_STATUS_IN_PROGESS)
        log->sanitize_progress = ((gdouble) GUINT16_FROM_LE (sanitize_log->sprog) * 100) / 0x10000;
    log->global_data_erased = sstat & NVME_SANITIZE_SSTAT_GLOBAL_DATA_ERASED;
    log->overwrite_passes = (sstat >> NVME_SANITIZE_SSTAT_COMPLETED_PASSES_SHIFT) & NVME_SANITIZE_SSTAT_COMPLETED_PASSES_MASK;

    switch (sstat & NVME_SANITIZE_SSTAT_STATUS_MASK) {
        case NVME_SANITIZE_SSTAT_STATUS_COMPLETE_SUCCESS:
            log->sanitize_status = BD_NVME_SANITIZE_STATUS_SUCCESS;
            break;
        case NVME_SANITIZE_SSTAT_STATUS_IN_PROGESS:
            log->sanitize_status = BD_NVME_SANITIZE_STATUS_IN_PROGESS;
            break;
        case NVME_SANITIZE_SSTAT_STATUS_COMPLETED_FAILED:
            log->sanitize_status = BD_NVME_SANITIZE_STATUS_FAILED;
            break;
        case NVME_SANITIZE_SSTAT_STATUS_ND_COMPLETE_SUCCESS:
            log->sanitize_status = BD_NVME_SANITIZE_STATUS_SUCCESS_NO_DEALLOC;
            break;
        case NVME_SANITIZE_SSTAT_STATUS_NEVER_SANITIZED:
        default:
            log->sanitize_status = BD_NVME_SANITIZE_STATUS_NEVER_SANITIZED;
            break;
    }

    log->time_for_overwrite = (GUINT32_FROM_LE (sanitize_log->eto) == 0xffffffff) ? -1 : (gint64) GUINT32_FROM_LE (sanitize_log->eto);
    log->time_for_block_erase = (GUINT32_FROM_LE (sanitize_log->etbe) == 0xffffffff) ? -1 : (gint64) GUINT32_FROM_LE (sanitize_log->etbe);
    log->time_for_crypto_erase = (GUINT32_FROM_LE (sanitize_log->etce) == 0xffffffff) ? -1 : (gint64) GUINT32_FROM_LE (sanitize_log->etce);
    log->time_for_overwrite_nd = (GUINT32_FROM_LE (sanitize_log->etond) == 0xffffffff) ? -1 : (gint64) GUINT32_FROM_LE (sanitize_log->etond);
    log->time_for_block_erase_nd = (GUINT32_FROM_LE (sanitize_log->etbend) == 0xffffffff) ? -1 : (gint64) GUINT32_FROM_LE (sanitize_log->etbend);
    log->time_for_crypto_erase_nd = (GUINT32_FROM_LE (sanitize_log->etcend) == 0xffffffff) ? -1 : (gint64) GUINT32_FROM_LE (sanitize_log->etcend);

    return log;
}

Then I got this result:
So this code is called TWICE during boot, as there are two records in the file. Both record lengths, 0x940 and 0x5c0, are divisible by 32. So I was left even more confused:
Anyway, the patch itself works. Thanks for all of it.
We are requesting 512 bytes, and thus the device should only transfer 512 bytes. Older kernels always used bounce buffers for this operation, but with the fix for the DMA alignment setting (torvalds/linux@52fde2c) newer kernels use zero-copy I/O. So we just need one more experiment to have all statements supported by experiments. Could you build a kernel with this hack:

--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -627,7 +627,7 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
 		struct rq_map_data *map_data,
 		const struct iov_iter *iter, gfp_t gfp_mask)
 {
-	bool copy = false, map_bvec = false;
+	bool copy = true, map_bvec = false;
 	unsigned long align = q->dma_pad_mask | queue_dma_alignment(q);
 	struct bio *bio = NULL;
 	struct iov_iter i;
I tried flipping that bool on linux-6.4.11, and it didn't help with the bug. udisks2 would still crash with a stack smashing error. Thanks for looking into this.
Okay, let me play with this. I thought I had found the right place to force the kernel to use bounce buffers. Obviously, it was the wrong place...
@allfoxwy I was stupid, the correct test patch is here: storaged-project/udisks#1152 (comment)
I'll try it and reply in that thread. Thanks.
Maybe nvme-cli could just allocate a page-aligned, page-sized buffer instead?
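For what it's worth, that could look something like this hypothetical helper (a sketch, not actual nvme-cli code): round the requested log length up to a whole page and allocate page-aligned memory, so a drive that DMAs a full page stays within the allocation.

#include <stdlib.h>
#include <unistd.h>

static void *alloc_log_buffer (size_t len)
{
    size_t page = (size_t) sysconf (_SC_PAGESIZE);
    size_t rounded = (len + page - 1) & ~(page - 1);   /* round up to a page multiple */
    void *buf = NULL;

    /* page alignment also guarantees the transfer starts at offset 0 of a PRP page */
    if (posix_memalign (&buf, page, rounded) != 0)
        return NULL;
    return buf;
}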
Let's merge this in its current state, as @vojtechtrefny is planning to make a new release tomorrow. Feel free to continue here even when the ticket is closed. I'm watching the linux-nvme mailing list thread, but there's no clear outcome as of yet.
Sorry, got a bit drowned in work. Anyway, Christoph had an idea for another test patch. The idea is to introduce a 512-byte alignment for the buffer and enable KASAN to see if this device works then. From our previous tests and bisection, this has a good chance of working. Anyway, I will look into it... hopefully soon.
Could we also try my suggestion from http://lists.infradead.org/pipermail/linux-nvme/2023-August/041901.html? As an experiment, memset to 0xff, then check if the padding bytes are unchanged after the expected 512-byte log read. If they are, then I think we can conclude that Christoph's hypothesis of bad PRP parsing is likely the bug in the device, and a quirk for DMA alignment sounds like the right kernel-side fix. If the data is still corrupted with an aligned buffer, though, then I don't think we can simply force 512-byte DMA alignment.
I'm not confident I understand all of your suggestion; I take it to mean that if we make the log buffer's start address aligned, there should be good news, right? So I tried an alignment hack patch like this:

--- libblockdev-nvme-27/src/plugins/nvme/nvme-info.c
+++ align-libblockdev-nvme-27/src/plugins/nvme/nvme-info.c
@@ -1026,7 +1026,10 @@
 BDNVMESanitizeLog * bd_nvme_get_sanitize_log (const gchar *device, GError **error) {
     int ret;
     int fd;
-    char buf[65536] = ZERO_INIT;
+    const int buf_size = 0x10000; // 65536
+    char *check_ptr;
+    FILE *len_file, *buf_file;
+    int i;
     struct nvme_sanitize_log_page *sanitize_log;
     BDNVMESanitizeLog *log;
     __u16 sstat;
@@ -1036,20 +1039,51 @@
     if (fd < 0)
         return NULL;
 
+    if(posix_memalign((void*)&sanitize_log, getpagesize(), buf_size)) {
+        g_prefix_error (error, "NVMe Get Log Page - Sanitize Status Log command error: Failed to get aligned mem");
+        close(fd);
+        return NULL;
+    }
+
+    memset(sanitize_log, 'A', buf_size);
+
     /* send the NVME_LOG_LID_SANITIZE ioctl */
-    ret = nvme_get_log_sanitize (fd, FALSE /* rae */, (struct nvme_sanitize_log_page *) &buf);
+    ret = nvme_get_log_sanitize (fd, FALSE /* rae */, sanitize_log);
     if (ret != 0) {
         _nvme_status_to_error (ret, FALSE, error);
         g_prefix_error (error, "NVMe Get Log Page - Sanitize Status Log command error: ");
+        free(sanitize_log);
         close (fd);
         return NULL;
     }
     close (fd);
 
+    check_ptr = (char*)sanitize_log;
+    for(i = buf_size - 1; i >= 0; --i) {
+        if(check_ptr[i] != 'A') {
+            i++;
+            break;
+        }
+    }
+
+    if(i > 0) {
+        len_file = fopen("/path/to/length.bin", "ab");
+        if(len_file) {
+            fwrite(&i, sizeof(i), 1, len_file);
+            fclose(len_file);
+        }
+
+        buf_file = fopen("/path/to/content.bin", "ab");
+        if(buf_file) {
+            fwrite(sanitize_log, sizeof(check_ptr[0]), i, buf_file);
+            fclose(buf_file);
+        }
+    }
+
+
     /* need to use interim buffer that is large enough for broken drives
      * returning more data than expected
      */
-    sanitize_log = (struct nvme_sanitize_log_page *) &buf;
 
     log = g_new0 (BDNVMESanitizeLog, 1);
     log->sanitize_progress = 0;
@@ -1085,5 +1119,7 @@
     log->time_for_block_erase_nd = (GUINT32_FROM_LE (sanitize_log->etbend) == 0xffffffff) ? -1 : (gint64) GUINT32_FROM_LE (sanitize_log->etbend);
     log->time_for_crypto_erase_nd = (GUINT32_FROM_LE (sanitize_log->etcend) == 0xffffffff) ? -1 : (gint64) GUINT32_FROM_LE (sanitize_log->etcend);
 
+    free(sanitize_log);
+
     return log;
 }

And I think there is indeed some news. During the first boot after applying the alignment hack patch:
During the third boot after applying the alignment hack patch:
The buffer is always written with 4096 bytes. This is different from my former experiment #951 (comment), which resulted in a different size every boot. Now, with an aligned start address, the written length is always 4096 bytes. I'm not sure I understand the POSIX aligned memory function properly, or whether I programmed this correctly, but I feel you are onto something, so I'm posting a quick reply.
Ok, if the device is overrunning an aligned buffer, I think we can conclude it's not a bad interpretation of a PRP with an offset. The device just always writes a full 4k page into a 512b log. That's not good: the kernel DMA alignment suggestion isn't viable, and it's preferable that user space work around this with an over-allocated buffer instead of relying on the kernel to do something about it. I think the reason you saw different overruns in previous experiments is that the original buffer plus the extra unwanted data straddled different page offsets, causing the device to transfer the bulk of the corruption to PRP2, which the driver hadn't set and which would transfer to nowhere. Different page offsets will cause different observable buffer overruns.
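For intuition, here is the arithmetic behind that explanation as a small sketch (assuming 4 KiB pages, a 512-byte log and a device that always writes a full 4 KiB, per the observations above): only the part of the transfer that fits in PRP1's page is visible in the buffer; the rest would have gone through the unset PRP2.

#include <stdio.h>

int main (void)
{
    const int page = 4096, log_len = 512, written = 4096;
    int off;

    /* offsets at which a 512-byte log still fits within one page */
    for (off = 0; off <= page - log_len; off += 64) {
        int prp1_room = page - off;                      /* bytes reachable via PRP1 */
        int visible = written < prp1_room ? written : prp1_room;
        printf ("page offset %4d: %4d bytes visible, %4d past the log\n",
                off, visible, visible - log_len);
    }
    return 0;
}

With an aligned buffer (offset 0) the full 4096 bytes are visible, matching the constant 4096-byte writes above; the earlier 0x940 (2368) and 0x5c0 (1472) records would correspond to stack buffers that happened to start 1728 and 2624 bytes into a page.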
Okay, I think we should handle this inside libnvme somehow. |
Certain drives tend to return more data than expected, resulting in stack corruption. Use an interim buffer large enough to handle the excess bytes.
storaged-project/udisks#1152
linux-nvme/libnvme#684
This is a dirty workaround until a proper fix is implemented either in libnvme or in the kernel.