Abort sanitize command #752

Closed
neitsab opened this issue Jun 22, 2020 · 10 comments · Fixed by #1650

Comments

@neitsab

neitsab commented Jun 22, 2020

Hi, it seems the sanitize Block Erase operation I started is stuck, and I can't find a way to unblock it in the docs. Somewhere in the spec I read that a failed sanitize operation can be resumed or aborted, but I cannot work out how to do so. Would you care to help or point me to a place (forum, IRC...) where I could get some help?

root@archiso ~ # nvme --version
nvme version 1.12
root@archiso ~ # nvme list
Node             SN                   Model                                    Namespace Usage                      Format           FW Rev  
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     BTNH936500NS512A     INTEL SSDPEKNW512G8                      1         512.11  GB / 512.11  GB    512   B +  0 B   002C    
root@archiso ~ # nvme sanitize-log /dev/nvme0        
Sanitize Progress                      (SPROG) :  655
Sanitize Status                        (SSTAT) :  0x2
Sanitize Command Dword 10 Information (SCDW10) :  0x2
Estimated Time For Overwrite                   :  4294967295 (No time period reported)
Estimated Time For Block Erase                 :  4294967295 (No time period reported)
Estimated Time For Crypto Erase                :  4294967295 (No time period reported)
Estimated Time For Overwrite (No-Deallocate)   :  0
Estimated Time For Block Erase (No-Deallocate) :  0
Estimated Time For Crypto Erase (No-Deallocate):  0
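
For readers interpreting the log above, here is a minimal sketch of how two of these fields could be read. It assumes the Sanitize Status log page definition from NVMe 1.3 and later, where SPROG is the numerator of a fraction over 65536 and an estimate of 0xFFFFFFFF means no time period is reported. The helper names are made up for illustration and are not part of nvme-cli.

```python
# Hypothetical helpers; names are illustrative, not nvme-cli APIs.
NO_TIME_REPORTED = 0xFFFFFFFF  # assumed sentinel for "no time period reported"


def sprog_percent(sprog: int) -> float:
    """Treat SPROG as the numerator of a fraction whose denominator is 65536."""
    return sprog / 65536 * 100


def estimate_seconds(value: int):
    """Return the estimate in seconds, or None when nothing is reported."""
    return None if value == NO_TIME_REPORTED else value


# Values copied from the sanitize-log output in this comment.
print(f"progress: {sprog_percent(655):.2f}%")
print(f"block erase estimate: {estimate_seconds(4294967295)}")
```

Under these assumptions, an SPROG of 655 corresponds to roughly 1% progress.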

The command I used to start the sanitization was nvme sanitize /dev/nvme0 -a 2. I suddenly realized it was a Block Erase and not a Crypto Erase operation as I had intended, so I power cycled the machine in the hope it would abort it, but sanitize doesn't abort on restart of course.

Progress has been stuck in the same state for over an hour.

edit: and if I try another sanitize command it fails with the following error:

NVMe status: SANITIZE_IN_PROGRESS: The requested function is prohibited while a sanitize operation is in progress(0x1d)

And as expected, any read/write operation raises an error:

root@archiso ~ # fdisk -l /dev/nvme0
fdisk: cannot open /dev/nvme0: Illegal seek
root@archiso ~ # fdisk -l /dev/nvme0n1
fdisk: cannot open /dev/nvme0n1: Input/output error

Thanks in advance for any help!

@neitsab
Author

neitsab commented Jun 22, 2020

Note: I tried nvme sanitize /dev/nvme0 -a 1 for "Exit Failure mode" value, to no avail.

@keithbusch
Contributor

Sanitize can take a long time. Is the sanitize progress marker progressing at all?

@neitsab
Author

neitsab commented Jun 22, 2020

Thank you for your answer. No, and that is what tipped me off to the issue: nvme sanitize-log /dev/nvme0 still returns the exact same result after two hours:

Sanitize Progress                      (SPROG) :  655
Sanitize Status                        (SSTAT) :  0x2
Sanitize Command Dword 10 Information (SCDW10) :  0x2
Estimated Time For Overwrite                   :  4294967295 (No time period reported)
Estimated Time For Block Erase                 :  4294967295 (No time period reported)
Estimated Time For Crypto Erase                :  4294967295 (No time period reported)
Estimated Time For Overwrite (No-Deallocate)   :  0
Estimated Time For Block Erase (No-Deallocate) :  0
Estimated Time For Crypto Erase (No-Deallocate):  0

Reading the spec more in depth I tried to send an Abort Admin Command with nvme admin-passthru /dev/nvme0 -o 08h but it failed:

NVMe status: QID_INVALID: The creation of the I/O Completion Queue failed due to an invalid queue identifier specified as part of the command. An invalid queue identifier is one that is currently in use or one that is outside the range supported by the controller(0x101)

I am surely missing an argument, but I'm not sure which one.

@neitsab
Author

neitsab commented Jun 22, 2020

I tried with nvme admin-passthru /dev/nvme0 -o 0x08h and got

NVMe command result:00000001

However, the sanitize log hasn't budged a bit.
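
If the passthrough command above really was submitted as an Abort (opcode 0x08), which is an assumption since the `0x08h` spelling may not have been parsed as intended, the result value can be read as follows. In the NVMe base spec, bit 0 of completion Dword 0 for Abort is cleared when the target command was aborted and set when it was not. The function name is illustrative only.

```python
def abort_succeeded(result_dword0: int) -> bool:
    """Bit 0 of Dword 0 is 0 if the command was aborted, 1 if it was not."""
    return (result_dword0 & 0x1) == 0


# Value printed after "NVMe command result:" in the comment above.
result = int("00000001", 16)
print("aborted" if abort_succeeded(result) else "not aborted")
```

Read this way, a result of 00000001 would indicate that nothing was aborted, which matches the unchanged sanitize log.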

@neitsab
Author

neitsab commented Jun 23, 2020

Well, I just rebooted into a different live OS (Solus 4.2) and during bootup I saw the same I/O error on the drive; then I installed the latest GitHub release of nvme-cli and lo and behold... The sanitize operation seems to have completed:

Sanitize Progress                      (SPROG) :  65535
Sanitize Status                        (SSTAT) :  0x101
Sanitize Command Dword 10 Information (SCDW10) :  0x2
Estimated Time For Overwrite                   :  4294967295 (No time period reported)
Estimated Time For Block Erase                 :  4294967295 (No time period reported)
Estimated Time For Crypto Erase                :  4294967295 (No time period reported)
Estimated Time For Overwrite (No-Deallocate)   :  0
Estimated Time For Block Erase (No-Deallocate) :  0
Estimated Time For Crypto Erase (No-Deallocate):  0

For the record, before rebooting into the new live environment I had also sent an nvme reset /dev/nvme0 command for good measure.

This experience increased my feeling that we should expand the sanitize man/help page a bit to include the following information:

  • correspondence between binary and decimal values (this really confused me for a while, since the command does not accept the binary values given in the spec and the man page)
  • warning about the length of the operation (can be checked beforehand with nvme sanitize-log /dev/nvme0 for some devices apparently?)
  • update the given examples to mention that the first one sends a Start Block Erase operation and the second an Exit Failure Mode operation.
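
The binary-to-decimal correspondence from the first bullet can be sketched as follows. This assumes the Sanitize Action (SANACT) encoding in the NVMe spec, where values are written in binary with a trailing `b` (for example `010b`), while `nvme sanitize -a` was given decimal values in this thread. The table and helper names are hypothetical.

```python
# Assumed SANACT encoding from the NVMe spec; names are illustrative only.
SANACT_NAMES = {
    0b001: "Exit Failure Mode",
    0b010: "Start Block Erase",
    0b011: "Start Overwrite",
    0b100: "Start Crypto Erase",
}


def spec_binary_to_cli(value: str) -> int:
    """Convert a spec-style binary literal such as '010b' to a decimal int."""
    digits = value.strip().lower().rstrip("b")
    return int(digits, 2)


for spec_value in ("001b", "010b", "011b", "100b"):
    action = spec_binary_to_cli(spec_value)
    print(f"{spec_value} -> -a {action}  ({SANACT_NAMES[action]})")
```

This is consistent with the two values used earlier in the thread: `-a 2` started a Block Erase and `-a 1` requested Exit Failure Mode.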

Thanks!

neitsab changed the title from "Abord sanitize command" to "Abort sanitize command" on Jun 23, 2020
@keithbusch
Contributor

Looks stuck.

The abort command won't help because that only works on an active command id, and the sanitize command completes immediately while the operation runs in the background. So there's no command to abort.
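
To illustrate the point about command IDs: the Abort command identifies its target by submission queue ID and command ID, packed into Command Dword 10 (SQID in bits 15:0 and CID in bits 31:16, per the NVMe base spec). A rough sketch, with a hypothetical helper name and made-up example IDs:

```python
def abort_cdw10(sqid: int, cid: int) -> int:
    """Pack the target queue ID and command ID into Abort's Command Dword 10."""
    if not (0 <= sqid <= 0xFFFF and 0 <= cid <= 0xFFFF):
        raise ValueError("SQID and CID must each fit in 16 bits")
    return (cid << 16) | sqid


# Invented IDs for illustration: admin queue (SQID 0), command ID 0x12.
print(hex(abort_cdw10(sqid=0, cid=0x12)))
```

Because the Sanitize command has already completed by the time the background operation runs, there is no outstanding command ID left to place in that field.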

I don't know off the top of my head how to cancel an in-progress sanitize, or even whether there is a spec-defined way. Maybe we can ask a friendly Intel engineer...

@RevanthRajashekar, do you happen to know if the reporter can abandon an in-progress sanitize on the reported model, SSDPEKNW512G8?

@neitsab
Author

neitsab commented Jun 23, 2020

Sorry if I didn't make myself clear in my previous comment: I meant that the operation has exited and I now have full read/write access to the drive again. I haven't looked up the meaning of

Sanitize Progress                      (SPROG) :  65535
Sanitize Status                        (SSTAT) :  0x101

in the spec, but the results are there for sure: I can access and use my drive without issue again. So if you don't want to follow up on my documentation-expansion suggestions (or would rather that I open a new issue or make a pull request), on my side this issue can be closed. Thank you again!
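
As a reference for the two SSTAT values seen in this thread, here is a minimal decoding sketch. It assumes the Sanitize Status log page layout from NVMe 1.3 and later (bits 2:0 status, bits 7:3 completed overwrite passes, bit 8 Global Data Erased). That layout should be checked against the spec revision the drive claims to implement, and the names below are not nvme-cli APIs.

```python
# Assumed meaning of SSTAT bits 2:0; other values are treated as unknown here.
SSTAT_STATUS = {
    0: "never sanitized",
    1: "most recent sanitize completed successfully",
    2: "sanitize in progress",
    3: "most recent sanitize failed",
}


def decode_sstat(sstat: int) -> dict:
    """Split SSTAT into status, completed passes, and Global Data Erased."""
    return {
        "status": SSTAT_STATUS.get(sstat & 0x7, "reserved or newer encoding"),
        "completed_passes": (sstat >> 3) & 0x1F,
        "global_data_erased": bool((sstat >> 8) & 0x1),
    }


for value in (0x2, 0x101):
    print(hex(value), decode_sstat(value))
```

Under these assumptions, 0x2 reads as a sanitize in progress and 0x101 as a successful completion with the Global Data Erased bit set.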

@keithbusch
Contributor

Ah, I see. I was replying from my phone and didn't see your latest message. I was specifically looking at the comments preceding.

An SSTAT of 101h is reserved in the most recent public spec. This is not a spec-compliant status, afaik.

@neitsab
Author

neitsab commented Jun 29, 2020

Sorry for the delay, and thanks. You're right, I thought the 0x101 status meant it completed successfully because it appears in the results linked in #679, which I thought were okay.

@igaw
Collaborator

igaw commented Jan 19, 2022

No progress in this bug report for a long time. If you think it's still relevant please reopen.
