The storage device SanDisk Extreme Pro is likely to fail soon! #163

Open
noloader opened this issue Dec 29, 2022 · 14 comments
Labels
drivedb Entries to the drivedb.h undecided

Comments

@noloader

noloader commented Dec 29, 2022

I am running Ubuntu 22.04, x86_64, fully patched. I purchased a new SanDisk 128GB Extreme PRO USB 3.2 Solid State Flash Drive (SDCZ880-128G-GAM46), https://www.amazon.com/gp/product/B08GYM5F8G . I removed it from its original display packaging. When I plugged the drive into my USB hub I got a message stating the drive was failing:

[Screenshot: thumbdrive-failing]

smartctl v7.2 and v7.3 report a problem with Perc_Avail_Resrvd_Space. However, Googling says the problem may be Total_Write/Erase_Count.[1] I suspect this has something to do with an SSD controller being packaged on a thumbdrive, and a problem with interpreting the statistics.

I reported the problem at Ubuntu because I could not find info about the issue in the tracker: https://bugs.launchpad.net/ubuntu/+source/smartmontools/+bug/2000656 .

Please advise.

[1] https://www.truenas.com/community/threads/critical-smart-alerts-on-both-usb-boot-drives-after-upgrade-to-11-1.60160/


Here is the result of smartctl 7.2 supplied by Ubuntu:

$ sudo /usr/sbin/smartctl -A /dev/sdc
[sudo] password for jwalton: 
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-56-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0002   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0002   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0002   100   100   000    Old_age   Always       -       0
165 Total_Write/Erase_Count 0x0002   100   100   000    Old_age   Always       -       0
171 Program_Fail_Count      0x0002   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0002   100   100   000    Old_age   Always       -       0
173 Avg_Write/Erase_Count   0x0002   100   100   000    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0002   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0002   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   092   008   000    Old_age   Always       -       8 (Min/Max 0/33)
230 Perc_Write/Erase_Count  0x0002   100   100   000    Old_age   Always       -       0
232 Perc_Avail_Resrvd_Space 0x0003   000   100   005    Pre-fail  Always   FAILING_NOW 0
234 Perc_Write/Erase_Ct_BC  0x0002   100   100   000    Old_age   Always       -       10000
241 Total_LBAs_Written      0x0002   100   100   000    Old_age   Always       -       0
242 Total_LBAs_Read         0x0002   100   100   000    Old_age   Always       -       0

And for smartctl 7.3 built from the tarball on GitHub:

$ sudo smartctl -A /dev/sdc
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-5.15.0-56-generic] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0002   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0002   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0002   100   100   000    Old_age   Always       -       0
165 Total_Write/Erase_Count 0x0002   100   100   000    Old_age   Always       -       4
171 Program_Fail_Count      0x0002   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0002   100   100   000    Old_age   Always       -       0
173 Avg_Write/Erase_Count   0x0002   100   100   000    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0002   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0002   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   092   008   000    Old_age   Always       -       8 (Min/Max 0/33)
230 Perc_Write/Erase_Count  0x0002   100   100   000    Old_age   Always       -       0
232 Perc_Avail_Resrvd_Space 0x0003   000   100   005    Pre-fail  Always   FAILING_NOW 0
234 Perc_Write/Erase_Ct_BC  0x0002   100   100   000    Old_age   Always       -       10000
241 Total_LBAs_Written      0x0002   100   100   000    Old_age   Always       -       0
242 Total_LBAs_Read         0x0002   100   100   000    Old_age   Always       -       0

Here is the tail of dmesg for the USB drive.

[1092783.014419] usb 4-1.1.3: new SuperSpeed USB device number 4 using xhci_hcd
[1092783.044795] usb 4-1.1.3: New USB device found, idVendor=0781, idProduct=5588, bcdDevice= 1.00
[1092783.044799] usb 4-1.1.3: New USB device strings: Mfr=2, Product=3, SerialNumber=1
[1092783.044800] usb 4-1.1.3: Product: USB Extreme Pro
[1092783.044801] usb 4-1.1.3: Manufacturer: SanDisk
[1092783.044802] usb 4-1.1.3: SerialNumber: A914BB78A140
[1092783.047556] usb-storage 4-1.1.3:1.0: USB Mass Storage device detected
[1092783.047745] scsi host6: usb-storage 4-1.1.3:1.0
[1092784.071456] scsi 6:0:0:0: Direct-Access SanDisk Extreme Pro 0 PQ: 0 ANSI: 6
[1092784.071666] sd 6:0:0:0: Attached scsi generic sg3 type 0
[1092784.072995] sd 6:0:0:0: [sdc] 250085376 512-byte logical blocks: (128 GB/119 GiB)
[1092784.073969] sd 6:0:0:0: [sdc] Write Protect is off
[1092784.073972] sd 6:0:0:0: [sdc] Mode Sense: 43 00 00 00
[1092784.074917] sd 6:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[1092784.077874] sdc: sdc1
[1092784.079944] sd 6:0:0:0: [sdc] Attached SCSI removable disk

chrfranke added the "drivedb" (Entries to the drivedb.h) and "undecided" labels on Dec 29, 2022

@chrfranke

The smartctl output above lacks ATA IDENTIFY information, so it is impossible to see which drive database entry is in effect. For further diagnostics, please provide the full smartctl -x output for the device. Do not use -a as it only prints legacy SMART information.

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
...
232 Perc_Avail_Resrvd_Space 0x0003   000   100   005    Pre-fail  Always   FAILING_NOW 0

Regardless of the attribute name, smartctl prints FAILING_NOW because VALUE <= THRESH. The attribute is declared as Pre-fail, so smartctl -H (included in -x) should also report SMART overall-health self-assessment test result: FAILED!.
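
The rule described above can be sketched in a few lines of Python. This is a simplified illustration, not smartctl's actual code: the names `when_failed` and `parse_row` are made up for this example, and two details go beyond what is stated here and are assumptions about smartctl's behavior, namely that a zero threshold never trips and that `In_the_past` is derived from the WORST column.

```python
def when_failed(value: int, worst: int, thresh: int) -> str:
    """Return the WHEN_FAILED text for one attribute row (simplified)."""
    if thresh == 0:
        # Assumption: a zero threshold means the attribute can never trip.
        return "-"
    if value <= thresh:
        # The rule quoted in this thread: VALUE <= THRESH.
        return "FAILING_NOW"
    if worst <= thresh:
        # Assumption: a past failure is derived from the WORST column.
        return "In_the_past"
    return "-"


def parse_row(line: str) -> dict:
    """Split one row of `smartctl -A` output into named fields."""
    keys = ["id", "name", "flag", "value", "worst", "thresh",
            "type", "updated", "when_failed", "raw"]
    # maxsplit=9 keeps a raw value such as "8 (Min/Max 0/33)" in one piece.
    row = dict(zip(keys, line.split(None, 9)))
    for key in ("id", "value", "worst", "thresh"):
        row[key] = int(row[key])
    return row


row = parse_row("232 Perc_Avail_Resrvd_Space 0x0003   000   100   005    "
                "Pre-fail  Always   FAILING_NOW 0")
status = when_failed(row["value"], row["worst"], row["thresh"])
# A failing Pre-fail attribute is what turns the overall health result to FAILED.
health_failed = status == "FAILING_NOW" and row["type"] == "Pre-fail"
print(status, health_failed)  # prints: FAILING_NOW True
```

For the row quoted in this comment, VALUE is 0 and THRESH is 5, so the comparison trips whatever the attribute is called.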

To suppress this behavior, create a local drive database entry that adds, for example, -v 232,hex64,Bogus_Attribute. See the -B option in the smartctl man page for details and the configured path. If this is a generic problem with all of these drives, we could possibly add this to the upstream drivedb.h.
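
Before writing a database entry, the same override can be tried directly on the command line, since -v is also a smartctl option. The sketch here only builds the argument list and does not run anything. `smartctl_command` is a hypothetical helper, and the optional `drivedb_file` argument assumes -B is given the path of a local database file as described in the man page.

```python
import shlex


def smartctl_command(device, attr_overrides=(), drivedb_file=None):
    """Build a `smartctl -x` argument list with optional -B and -v options."""
    cmd = ["smartctl", "-x"]
    if drivedb_file is not None:
        # Assumption: -B takes the path of a local drive database file.
        cmd += ["-B", drivedb_file]
    for override in attr_overrides:
        # Each override uses the form ID,FORMAT,NAME from the comment above.
        cmd += ["-v", override]
    cmd.append(device)
    return cmd


cmd = smartctl_command("/dev/sdc", ["232,hex64,Bogus_Attribute"])
print(shlex.join(cmd))
# prints: smartctl -x -v 232,hex64,Bogus_Attribute /dev/sdc
```

The list form can be passed to `subprocess.run` unchanged, which avoids shell quoting problems with the device path.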

This is not a smartctl bug. The attribute interpretation was specified by the original SMART draft standard SFF-8035i (1995). All released ATA standards declare the whole attribute data block as vendor specific.

@noloader
Author

noloader commented Jan 11, 2023

@chrfranke,

> For further diagnostics, please provide the full smartctl -x output for the device. Do not use -a as it only prints legacy SMART information.

I am so sorry for the late reply. I forgot to run that command for you.

# smartctl -x /dev/sdc
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-5.15.0-57-generic] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     SanDisk based SSDs
Device Model:     SanDisk pSSD
Serial Number:    004831c55
LU WWN Device Id: 5 001b44 08304551c
Firmware Version: 6EB 1030
User Capacity:    128,043,712,512 bytes [128 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      1.8 inches
TRIM Command:     Available, deterministic
Device is:        In smartctl database 7.3/5319
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Jan 11 15:53:06 2023 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Disabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (  120) seconds.
Offline data collection
capabilities:                    (0x51) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  41) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   -O----   100   100   000    -    4
  9 Power_On_Hours          -O----   100   100   000    -    317
 12 Power_Cycle_Count       -O----   100   100   000    -    0
165 Total_Write/Erase_Count -O----   100   100   000    -    2054
171 Program_Fail_Count      -O----   100   100   000    -    0
172 Erase_Fail_Count        -O----   100   100   000    -    4
173 Avg_Write/Erase_Count   -O----   100   100   000    -    0
174 Unexpect_Power_Loss_Ct  -O----   100   100   000    -    0
187 Reported_Uncorrect      -O----   100   100   000    -    0
194 Temperature_Celsius     -O---K   092   008   000    -    8 (Min/Max 0/33)
230 Perc_Write/Erase_Count  -O----   100   100   000    -    0
232 Perc_Avail_Resrvd_Space PO----   000   100   005    NOW  0
234 Perc_Write/Erase_Ct_BC  -O----   100   100   000    -    10000
241 Total_LBAs_Written      -O----   100   100   000    -    0
242 Total_LBAs_Read         -O----   100   100   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 0
SMART           Log Directory Version 0
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory

SMART Extended Comprehensive Error Log (GP Log 0x03) not supported

SMART Error Log not supported

SMART Extended Self-test Log (GP Log 0x07) not supported

SMART Self-test Log not supported

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Commands not supported

Device Statistics (GP/SMART Log 0x04) not supported

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0009  2            0  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            0  Device-to-host register FISes sent due to a COMRESET
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x0001  2            0  Command failed due to ICRC error
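
The six-character FLAGS column in this output appears to be a compact rendering of the hex FLAG column from the earlier -A output. The sketch here decodes one into the other using the legend printed under the attribute table. The bit positions (P as bit 0 through K as bit 5) are inferred from the values in this thread and are an assumption, and `flags_string` is a made-up name.

```python
# Letters in the order of the legend, leftmost column first.
# Assumption: P is bit 0 (0x01) and K is bit 5 (0x20).
FLAG_LETTERS = "POSRCK"


def flags_string(flag: int) -> str:
    """Render a numeric attribute flag as the FLAGS text shown by -x."""
    return "".join(letter if flag & (1 << bit) else "-"
                   for bit, letter in enumerate(FLAG_LETTERS))


for hex_flag in ("0x0002", "0x0003", "0x0022"):
    print(hex_flag, flags_string(int(hex_flag, 16)))
# prints:
# 0x0002 -O----
# 0x0003 PO----
# 0x0022 -O---K
```

All three results agree with the rows above: attribute 232 is the only one carrying the P (prefailure warning) bit, which is why it alone can fail the overall health check.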

@noloader
Author

noloader commented Jan 11, 2023

@chrfranke,

> This is not a smartctl bug. The attribute interpretation was specified by the original SMART draft standard SFF-8035i (1995). All released ATA standards declare the whole attribute data block as vendor specific.

Thanks. I was wondering where to find the standard for SMART.

> If this is a generic problem with all these drives, we could possibly add this to upstream drivedb.h.

I reached out to SanDisk/Western Digital support. I hope to have a data sheet and information on vendor specific data shortly.

@nihil-admirari

> If this is a generic problem with all these drives, we could possibly add this to upstream drivedb.h.

I'm also having the same problem with the same drive.

@nihil-admirari

Adding

{
    "SanDisk based SSDs",          // Model family.
    "SanDisk pSSD",                // Device model regex.
    "6EB 1030",                    // Firmware version regex.
    "",                            // Warning message.
    "-v 232,hex64,Bogus_Attribute" // Ignore Perc_Avail_Resrvd_Space.
}

to /etc/smart_drivedb.h removes the warning from smartctl output, but doesn't prevent the notification popup.
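
As a rough illustration of why this entry applies to the drive, the following sketch checks the model and firmware patterns against the identity strings from the smartctl -x output earlier in this thread. Python's `re` module stands in for whatever regex engine smartmontools uses. The assumptions that patterns must match the whole string and that an empty firmware pattern matches any version reflect my understanding of the database format and should be checked against the drivedb.h documentation. The dictionary keys and `entry_matches` are names invented for this example.

```python
import re

# The entry from this comment, with illustrative key names.
entry = {
    "family": "SanDisk based SSDs",
    "model_regex": "SanDisk pSSD",
    "firmware_regex": "6EB 1030",
    "presets": "-v 232,hex64,Bogus_Attribute",
}


def entry_matches(entry: dict, model: str, firmware: str) -> bool:
    """Return True if the entry's patterns select the given drive."""
    # Assumption: patterns must match the full identity string.
    if re.fullmatch(entry["model_regex"], model) is None:
        return False
    fw = entry["firmware_regex"]
    # Assumption: an empty firmware pattern matches any firmware version.
    return fw == "" or re.fullmatch(fw, firmware) is not None


# Device Model and Firmware Version as reported by smartctl -x above.
print(entry_matches(entry, "SanDisk pSSD", "6EB 1030"))  # prints: True
```

Pinning the firmware pattern keeps the override from silently applying to drives with a different firmware revision, where the attribute might be reported correctly.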

@chrfranke

> ..., but doesn't prevent the notification popup.

This popup is not part of smartmontools. The hex64 format, or others like raw8:vw3210, suppress the value, worst and threshold fields in both plaintext and JSON output. If a monitoring GUI frontend still sends notifications after that, the frontend itself should possibly be fixed. You may want to report this to the package maintainer or the upstream project of that tool.
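
A frontend that consumes smartctl's JSON output could handle the suppressed fields along these lines. This is a minimal sketch, not how any particular monitoring tool works: the key names (`ata_smart_attributes`, `table`, `value`, `thresh`) are my assumption about the JSON layout and should be verified against real `smartctl --json -A` output, and both sample reports are hand-written for illustration.

```python
def failing_attributes(report: dict) -> list:
    """Return names of attributes whose normalized value is at or below threshold."""
    table = report.get("ata_smart_attributes", {}).get("table", [])
    failing = []
    for attr in table:
        value, thresh = attr.get("value"), attr.get("thresh")
        if value is None or thresh is None:
            # Normalized fields were suppressed (e.g. by a hex64 override),
            # so there is nothing to compare: skip instead of alarming.
            continue
        if thresh > 0 and value <= thresh:
            failing.append(attr.get("name", str(attr.get("id"))))
    return failing


# Hand-written samples, not real smartctl output.
before = {"ata_smart_attributes": {"table": [
    {"id": 5, "name": "Reallocated_Sector_Ct", "value": 100, "worst": 100, "thresh": 0},
    {"id": 232, "name": "Perc_Avail_Resrvd_Space", "value": 0, "worst": 100, "thresh": 5},
]}}
after = {"ata_smart_attributes": {"table": [
    {"id": 5, "name": "Reallocated_Sector_Ct", "value": 100, "worst": 100, "thresh": 0},
    {"id": 232, "name": "Bogus_Attribute"},  # value/worst/thresh suppressed
]}}

print(failing_attributes(before))  # prints: ['Perc_Avail_Resrvd_Space']
print(failing_attributes(after))   # prints: []
```

A tool that still raises a popup in the second case is presumably reading the attribute data by another route, which would be a matter for that tool's maintainers.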

@GibsDev

GibsDev commented Jul 23, 2023

I am also having this issue with the same drive.

@NorthLight-EWR

Same drive, same issue. The /etc/smart_drivedb.h solution works, but it is only temporary. It would be great if this went into the mainstream source.

@samm-git
Contributor

Hi.

  1. I am not sure that we should blacklist anything here, at least not until we see some errata from SanDisk.
  2. smartmontools is interpreting the failure correctly.

My recommendation would be to contact the vendor and ask about the failing attribute or a firmware update.

@NorthLight-EWR

Well, this is a very, very big vendor, and I can't see any promising way to start this feedback.
I think it is far easier to just accept this as a modification in the source, just to close the (vendor) bug.
Yes, it's the wrong side for the fix, but on the other hand, I guess there is no chance that the vendor will fix it.
It is a mass product and Windows does not complain, so SanDisk will surely make no effort for Linux.

@samm-git
Contributor

Let me explain.

  1. I am not sure that all drives with this firmware have this bug.
  2. I am not sure that the disk is actually okay: coming from the factory with such an error does not indicate that it is perfect. Perc_Avail_Resrvd_Space is an important indicator, and I have no idea whether the firmware would behave correctly if it is set wrongly.
  3. Windows is not complaining simply because it does not check SMART data, so that argument does not hold.

I personally would not use such a device, as it clearly indicates failure; to me, it would be a reason for an RMA. If you really want to use it, please make a local exception. If you want this upstreamed, try to contact the vendor and get any kind of response about the issue. Also try requesting new firmware with a fix; that may help too.

@samm-git
Contributor

(On the other hand, there are a number of reports [1], [2], [3] showing a problem with the same drive. But again, I would not consider this a harmless bug before getting vendor confirmation.)

[1] https://www.amazon.com/gp/customer-reviews/RZZ3TPNRW16SU/ref=cm_cr_getr_d_rvw_ttl?ie=UTF8&ASIN=B01MU8TZRV
[2] https://www.reddit.com/r/techsupport/comments/77w4mr/bought_a_usb_stick_its_warm_to_the_touch_when/
[3] https://forums.sandisk.com/t/sandisk-extreme-pro-usb-flash-drive-6eb-1030-witch-smart-failure/33576/2

@samm-git
Contributor

P.S. It also shows Temperature_Celsius as 8, which is unlikely to be true.

@samm-git
Contributor

Short summary: it seems that the SMART data on that stick is really broken and cannot be trusted. I would recommend blacklisting the drive in the monitoring tool you are using (the one that generated the popup). The problem is not specific to smartmontools and can be reproduced with any SMART monitoring software; see the reports above. So I am closing the report without resolution.
