Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lsmcli -u hpsa:// ld PLUGIN_BUG(2): Expected key missing from SmartArray ssacli output:'Serial Number' #420

Closed
cstoyanov-scality opened this issue May 12, 2020 · 4 comments · Fixed by #435

Comments

@cstoyanov-scality
Copy link
Contributor

  • libstoragemgmt version: 1.7.3 and 1.8.0
  • ssacli version: 3.30, 3.40 and 4.17

Description

When the Serial Number of the physical disk cannot be retrieved, lsmcli fails with a PLUGIN_BUG error. For example:

$ lsmcli -u hpsa:// ld
PLUGIN_BUG(2): Expected key missing from SmartArray ssacli output:'Serial Number', please try to upgrade ssacli tool

Depending on the way the disk is broken, the ssacli report can contains or not some info. In the hpsa plugin of lsm, some information are expected to exist (if not, it raises with a KeyError exception).

the ssacli output which cannot be parsed by lsm is:

Smart Array P420i in Slot 1
   Bus Interface: PCI
   Slot: 1

[...]

   Physical Drives
      physicaldrive 2I:1:1 (port 2I:box 1:bay 1, SATA HDD, 4 TB, OK)
      physicaldrive 2I:1:2 (port 2I:box 1:bay 2, SATA HDD, 4 TB, OK)
      physicaldrive 2I:1:3 (port 2I:box 1:bay 3, SATA HDD, 4 TB, OK)
      physicaldrive 2I:1:4 (port 2I:box 1:bay 4, SATA HDD, 4 TB, OK)
      physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SATA HDD, 4 GB, Failed)
      physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SATA HDD, 4 TB, OK)
      physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SATA HDD, 4 TB, OK)
      physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SATA HDD, 0 GB, Failed)
      physicaldrive 2I:1:9 (port 2I:box 1:bay 9, SATA HDD, 4 TB, OK)
      physicaldrive 2I:1:10 (port 2I:box 1:bay 10, SATA HDD, 4 TB, OK)
      physicaldrive 2I:1:11 (port 2I:box 1:bay 11, SATA HDD, 4 TB, OK)
      physicaldrive 2I:1:12 (port 2I:box 1:bay 12, SATA HDD, 4 TB, OK)
      physicaldrive 2I:1:13 (port 2I:box 1:bay 13, SATA HDD, 4 TB, OK)
      physicaldrive 2I:1:14 (port 2I:box 1:bay 14, SATA HDD, 4 TB, OK)
      physicaldrive 2I:1:15 (port 2I:box 1:bay 15, SATA SSD, 480 GB, OK)

[...]

   Array: A
  
[...]

   Array: H
      Interface Type: SATA
      Unused Space: 0 MB (0%)
      Used Space: Unknown (0%)
      Status: Failed Physical Drive
      Array Type: Data 
      Smart Path: disable

      Warning: One of the drives on this array have failed or has been removed.



      Logical Drive: 8
         Size: 3.64 TB
         Fault Tolerance: 0
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Strip Size: 256 KB
         Full Stripe Size: 256 KB
         Status: Failed
         Caching:  Enabled
         Unique Identifier: 600508B1001C327D3424425077DE8C28
         Logical Drive Label: 1D3AB656PCFBB%%LM6S1BR2BC0
         Drive Type: Data
         LD Acceleration Method: Controller Cache


      physicaldrive 2I:1:8
         Port: 2I
         Box: 1
         Bay: 8
         Status: Failed
         Last Failure Reason: Init request sense failed
         Drive Type: Data Drive
         Interface Type: SATA
         Size: 0 GB
         Drive exposed to OS: False
         Logical/Physical Block Size: 0/0

         WWID: 5001438013764919
         SATA NCQ Capable: False
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         PHY Physical Link Rate: Unknown
         PHY Maximum Link Rate: Unknown
         Drive Authentication Status: Not Applicable
         Sanitize Erase Supported: False
         Shingled Magnetic Recording Support: None



   Array: I
      Interface Type: SATA
      Unused Space: 0 MB (0.00%)
      Used Space: 3.64 TB (100.00%)
      Status: OK
      Array Type: Data 
      Smart Path: disable


      Logical Drive: 9
         Size: 3.64 TB
         Fault Tolerance: 0
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Strip Size: 256 KB
         Full Stripe Size: 256 KB
         Status: OK
         Caching:  Enabled
         Unique Identifier: 600508B1001C97390716929B5C5A4285
         Disk Name: /dev/sdg 
         Mount Points: None
         Logical Drive Label: 2172C964PCFBB%%LM6S1BR9E8B
         Drive Type: Data
         LD Acceleration Method: Controller Cache


      physicaldrive 2I:1:9
         Port: 2I
         Box: 1
         Bay: 9
         Status: OK
         Drive Type: Data Drive
         Interface Type: SATA
         Size: 4 TB
         Drive exposed to OS: False
         Logical/Physical Block Size: 512/512
         Rotational Speed: 7200
         Firmware Revision: 01.01K04
         Serial Number: WD-WMC130F0K5D6
         WWID: 500143801376491A
         Model: ATA     WDC WD4000FYYZ-0
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         PHY Physical Link Rate: Unknown
         PHY Maximum Link Rate: Unknown
         Drive Authentication Status: OK
         Carrier Application Version: 11
         Carrier Bootloader Version: 6
         Sanitize Erase Supported: True
         Unrestricted Sanitize Supported: True
         Shingled Magnetic Recording Support: None
[...]

The following values does not exist for the failed disk 2I:1:8 compared to a healthy disk:

  • Rotational Speed
  • Firmware Revision
  • Serial Number
  • Model

In using default values for the ones which cannot be find, the problem does not occur anymore. For example, a patch could be:

diff --git a/plugin/hpsa/hpsa.py b/plugin/hpsa/hpsa.py
index c56cb21..3404811 100644
--- a/plugin/hpsa/hpsa.py
+++ b/plugin/hpsa/hpsa.py
@@ -699,16 +699,23 @@ def volumes(self, search_key=None, search_value=None,
     @staticmethod
     def _hp_disk_to_lsm_disk(hp_disk, sys_id, ctrl_num, key_name,
                              flag_free=False):
-        disk_id = hp_disk['Serial Number']
+        NOT_AVAILABLE_MESSAGE = 'N/A'
+        disk_id = hp_disk.get('Serial Number', NOT_AVAILABLE_MESSAGE)
         disk_num = key_name[len("physicaldrive "):]
-        disk_name = "%s %s" % (hp_disk['Model'], disk_num)
+        disk_name = "%s %s" % (hp_disk.get('Model', NOT_AVAILABLE_MESSAGE),
+                               disk_num)
         disk_type = _disk_type_of(hp_disk)
         try:
-            blk_size_str = hp_disk['Native Block Size']
+            blk_size_str = hp_disk.get('Native Block Size', '0')
         except KeyError:
-            blk_size_str = hp_disk['Logical/Physical Block Size'].split("/")[0]
+            blk_size_str = hp_disk.get('Logical/Physical Block Size', '0/0').split("/")[0]
         blk_size = int(blk_size_str)
-        blk_count = int(int_div(_hp_size_to_lsm(hp_disk['Size']), blk_size))
+        disk_size = hp_disk.get('Size', NOT_AVAILABLE_MESSAGE)
+
+        if not blk_size or disk_size == NOT_AVAILABLE_MESSAGE:
+            blk_count = NOT_AVAILABLE_MESSAGE
+        else:
+            blk_count = int(int_div(_hp_size_to_lsm(disk_size), blk_size))
         try:
             disk_port, disk_box, disk_bay = disk_num.split(":")
         except ValueError:
@@ -848,7 +855,7 @@ def pool_member_info(self, pool, flags=Client.FLAG_RSVD):
                     elif array_key_name.startswith("physicaldrive"):
                         hp_disk = ctrl_data[key_name][array_key_name]
                         if hp_disk['Drive Type'] == 'Data Drive':
-                            disk_ids.append(hp_disk['Serial Number'])
+                            disk_ids.append(hp_disk.get('Serial Number', 'N/A'))
                 break
 
         if len(disk_ids) == 0:

With this patch, the output looks like:

$ lsmcli -u hpsa:// ld
ID              | Name                             | Type | Size | Status | System ID      | SCSI VPD 0x83 | Disk Paths | Revolutions Per Minute | Link Type
------------------------------------------------------------------------------------------------------------------------------------------------------------
[...]
N/A             | N/A 2I:1:8                       | SATA |      | Other  | PCFBB%%LM6S1BR |               |            | Non-Rotating Medium    | PATA/SATA
WD-WMC130F0K5D6 | ATA     WDC WD4000FYYZ-0 2I:1:9  | SATA |      | OK     | PCFBB%%LM6S1BR |               |            | 7200                   | PATA/SATA
[...]
@tasleson
Copy link
Member

@cstoyanov-scality Thanks for creating the issue, please open a pull request for review and inclusion. Thanks!

@tasleson
Copy link
Member

When picking default values for things that are not available, we need to keep them the same type. For example when we cannot retrieve the blk_size or blk_count the values should be numeric, not "N/A". Also if a value is being used to uniquely identify a resource and that value is not available, we need to come up with another means of creating an ID which can be used to uniquely identify it so that if we have more than 1 device that is broken we can still distinguish each of them. This could be using what we do know and generating a md5 or some other hash of it. In this specific case we could perhaps use the hash of "2I:1:8" or the WWID.

@tasleson
Copy link
Member

@cstoyanov-scality Any update on this one? Does my comment above make sense?

@tasleson
Copy link
Member

tasleson commented Oct 23, 2020

@cstoyanov-scality I'm going to be rolling a new release next week. If you would like this to be included please supply a patch to mailing list or PR soon. Thanks

Also as a follow-up to my previous comment, the library does have some constants for things that aren't retrieveable, like when you can't get the RPM. We will want to utilize these where they make sense.

#define LSM_DISK_RPM_UNKNOWN -1

tasleson added a commit to tasleson/libstoragemgmt that referenced this issue Oct 27, 2020
Resolves: libstorage#420

Note: This change is an adaptation of the one provided by
Christophe Stoyanov.  Specifically, when we cannot retrieve all
the needed values we select default values of the proper type
and domain eg.

id, disk_name -> string
blk_size, blk_count -> unsigned integer
rpm -> Disk.RPM_UNKNOWN

Signed-off-by: Tony Asleson <tasleson@redhat.com>
tasleson added a commit to tasleson/libstoragemgmt that referenced this issue Oct 30, 2020
Resolves: libstorage#420

Note: This change is an adaptation of the one provided by
Christophe Stoyanov.  Specifically, when we cannot retrieve all
the needed values we select default values of the proper type
and domain eg.

id, disk_name -> string
blk_size, blk_count -> unsigned integer
rpm -> Disk.RPM_UNKNOWN

Signed-off-by: Tony Asleson <tasleson@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants