Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't fail when connection list is empty #77

Merged
merged 2 commits into from Feb 21, 2022
Merged

Conversation

vjuranek
Copy link
Member

@vjuranek vjuranek commented Feb 16, 2022

Don't fail when engine sends empty connection list on storage server disconnect. This happens when multiple iSCSI SDs use same iSCSI target and therefore it cannot be disconnected as it's still used by other SDs. Engine should be fixed not to send disconnect request in such case, but we need to handle older engines. Log warning in such case, do nothing and return empty list as calling code expects iterable with results..
On the other hand engine always sends non-empty connection list when connecting SD. If the list is empty for some reason when connecting storage server, also log warning, but let vdsm throw KeyError as this is unexpected.

Fixes https://bugzilla.redhat.com/2054745

@vjuranek vjuranek requested a review from nirs as a code owner February 16, 2022 21:44
@vjuranek vjuranek changed the title Bz2054745 Don't fail when connection list is empty Feb 16, 2022
@vjuranek
Copy link
Member Author

/ost

@vjuranek
Copy link
Member Author

/ost

@vjuranek
Copy link
Member Author

Verified manually, engine:

2022-02-16 17:20:37,888-05 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStorageServerVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-37) [4f1c0404-47c7-43c9-8e0f-b505b3b8a71c] START, DisconnectStorageServerVDSCommand(HostName = stream8-host-test3, StorageServerConnectionManagementVDSParameters:{hostId='8bd4044f-bd4b-4031-a102-04e63247396a', storagePoolId='19016d46-17b4-11ec-9517-5254003aa1f8', storageType='ISCSI', connectionList='[]', sendNetworkEventOnFailure='true'}), log id: 31a8cbbf
2022-02-16 17:20:42,115-05 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStorageServerVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-37) [4f1c0404-47c7-43c9-8e0f-b505b3b8a71c] FINISH, DisconnectStorageServerVDSCommand, return: {}, log id: 31a8cbbf

vdsm:

2022-02-16 17:20:37,891-0500 INFO  (jsonrpc/7) [vdsm.api] START disconnectStorageServer(domType=3, spUUID='19016d46-17b4-11ec-9517-5254003aa1f8', conList=[]) from=::ffff:192.168.122.192,38962, flow_id=4f1c0404-47c7-43c9-8e0f-b505b3b8a71c, task_id=1dbd2dbf-1184-4f2e-bb44-b062a732f31c (api:48)
2022-02-16 17:20:37,892-0500 WARN  (jsonrpc/7) [storage.storageServer] Disconnect: connection list is empty, dom_type=3 (storageServer:937)
2022-02-16 17:20:39,913-0500 INFO  (monitor/f345fb3) [storage.multipath] Waited 2.06 seconds for multipathd (tries=2, ready=2) (multipath:139)
2022-02-16 17:20:39,914-0500 INFO  (monitor/f345fb3) [storage.multipath] Resizing multipath devices (multipath:220)
2022-02-16 17:20:39,921-0500 INFO  (monitor/f345fb3) [storage.multipath] Resizing multipath devices: 0.01 seconds (utils:390)
2022-02-16 17:20:39,922-0500 INFO  (monitor/f345fb3) [storage.storagedomaincache] Refreshing storage domain cache: 2.22 seconds (utils:390)
2022-02-16 17:20:39,928-0500 INFO  (jsonrpc/7) [storage.storagedomaincache] Refreshing storage domain cache (resize=False) (sdc:80)
2022-02-16 17:20:39,928-0500 INFO  (jsonrpc/7) [storage.iscsi] Scanning iSCSI devices (iscsi:462)
2022-02-16 17:20:39,988-0500 INFO  (jsonrpc/7) [storage.iscsi] Scanning iSCSI devices: 0.06 seconds (utils:390)
2022-02-16 17:20:39,989-0500 INFO  (jsonrpc/7) [storage.hba] Scanning FC devices (hba:59)
2022-02-16 17:20:40,070-0500 INFO  (jsonrpc/7) [storage.hba] Scanning FC devices: 0.08 seconds (utils:390)
2022-02-16 17:20:40,070-0500 INFO  (jsonrpc/7) [storage.multipath] Waiting until multipathd is ready (multipath:112)
2022-02-16 17:20:40,271-0500 INFO  (jsonrpc/5) [api.host] START getAllVmStats() from=::1,50920 (api:48)
2022-02-16 17:20:40,271-0500 INFO  (jsonrpc/5) [api.host] FINISH getAllVmStats return={'status': {'code': 0, 'message': 'Done'}, 'statsList': (suppressed)} from=::1,50920 (api:54)
2022-02-16 17:20:42,114-0500 INFO  (jsonrpc/7) [storage.multipath] Waited 2.04 seconds for multipathd (tries=2, ready=2) (multipath:139)
2022-02-16 17:20:42,115-0500 INFO  (jsonrpc/7) [storage.storagedomaincache] Refreshing storage domain cache: 2.19 seconds (utils:390)
2022-02-16 17:20:42,115-0500 INFO  (jsonrpc/7) [vdsm.api] FINISH disconnectStorageServer return={'statuslist': []} from=::ffff:192.168.122.192,38962, flow_id=4f1c0404-47c7-43c9-8e0f-b505b3b8a71c, task_id=1dbd2dbf-1184-4f2e-bb44-b062a732f31c (api:54)

@vjuranek
Copy link
Member Author

/ost

@nirs
Copy link
Member

nirs commented Feb 16, 2022

Verified manually, engine:

2022-02-16 17:20:37,891-0500 INFO  (jsonrpc/7) [vdsm.api] START disconnectStorageServer(domType=3, spUUID='19016d46-17b4-11ec-9517-5254003aa1f8', conList=[]) from=::ffff:192.168.122.192,38962, flow_id=4f1c0404-47c7-43c9-8e0f-b505b3b8a71c, task_id=1dbd2dbf-1184-4f2e-bb44-b062a732f31c (api:48)
2022-02-16 17:20:37,892-0500 WARN  (jsonrpc/7) [storage.storageServer] Disconnect: connection list is empty, dom_type=3 (storageServer:937)

The warning is good - but disconnectStorageServer should return now.

We should not refresh storage or do any other change bellow:

2022-02-16 17:20:39,913-0500 INFO (monitor/f345fb3) [storage.multipath] Waited 2.06 seconds for multipathd (tries=2, ready=2) (multipath:139)
2022-02-16 17:20:39,914-0500 INFO (monitor/f345fb3) [storage.multipath] Resizing multipath devices (multipath:220)
2022-02-16 17:20:39,921-0500 INFO (monitor/f345fb3) [storage.multipath] Resizing multipath devices: 0.01 seconds (utils:390)
2022-02-16 17:20:39,922-0500 INFO (monitor/f345fb3) [storage.storagedomaincache] Refreshing storage domain cache: 2.22 seconds (utils:390)
2022-02-16 17:20:39,928-0500 INFO (jsonrpc/7) [storage.storagedomaincache] Refreshing storage domain cache (resize=False) (sdc:80)
2022-02-16 17:20:39,928-0500 INFO (jsonrpc/7) [storage.iscsi] Scanning iSCSI devices (iscsi:462)
2022-02-16 17:20:39,988-0500 INFO (jsonrpc/7) [storage.iscsi] Scanning iSCSI devices: 0.06 seconds (utils:390)
2022-02-16 17:20:39,989-0500 INFO (jsonrpc/7) [storage.hba] Scanning FC devices (hba:59)
2022-02-16 17:20:40,070-0500 INFO (jsonrpc/7) [storage.hba] Scanning FC devices: 0.08 seconds (utils:390)
2022-02-16 17:20:40,070-0500 INFO (jsonrpc/7) [storage.multipath] Waiting until multipathd is ready (multipath:112)
2022-02-16 17:20:40,271-0500 INFO (jsonrpc/5) [api.host] START getAllVmStats() from=::1,50920 (api:48)
2022-02-16 17:20:40,271-0500 INFO (jsonrpc/5) [api.host] FINISH getAllVmStats return={'status': {'code': 0, 'message': 'Done'}, 'statsList': (suppressed)} from=::1,50920 (api:54)
2022-02-16 17:20:42,114-0500 INFO (jsonrpc/7) [storage.multipath] Waited 2.04 seconds for multipathd (tries=2, ready=2) (multipath:139)
2022-02-16 17:20:42,115-0500 INFO (jsonrpc/7) [storage.storagedomaincache] Refreshing storage domain cache: 2.19 seconds (utils:390)
2022-02-16 17:20:42,115-0500 INFO (jsonrpc/7) [vdsm.api] FINISH disconnectStorageServer return={'statuslist': []} from=::ffff:192.168.122.192,38962, flow_id=4f1c0404-47c7-43c9-8e0f-b505b3b8a71c, task_id=1dbd2dbf-1184-4f2e-bb44-b062a732f31c (api:54)

I think the needed change is something like:

diff --git a/lib/vdsm/storage/hsm.py b/lib/vdsm/storage/hsm.py
index f67df2696..0f01998a2 100644
--- a/lib/vdsm/storage/hsm.py
+++ b/lib/vdsm/storage/hsm.py
@@ -2168,6 +2168,9 @@ class HSM(object):
                   successful
         :rtype: dict
         """
+        if not conList:
+            raise se.InvalidParameterException("conList", conList)
+
         vars.task.setDefaultException(
             se.StorageServerConnectionError(
                 "domType=%s, spUUID=%s, conList=%s" %
@@ -2226,6 +2229,12 @@ class HSM(object):
                   successful
         :rtype: dict
         """
+        if not conList:
+            # Engine can send invalid request with empty list. This should be
+            # fixed in engine, but we must support older engines.
+            self.log.warning("Ignoring request with emtpy connection list")
+            return dict(statuslist=[])
+
         vars.task.setDefaultException(
             se.StorageServerDisconnectionError(
                 "domType=%s, spUUID=%s, conList=%s" %

Input must be validated when it enters the system. We should not pass untrusted
input into the system. If we had proper rpc layer, the rpc code would fail and
we will not reach this point with an empty connection list.

lib/vdsm/storage/storageServer.py Outdated Show resolved Hide resolved
lib/vdsm/storage/storageServer.py Outdated Show resolved Hide resolved
lib/vdsm/storage/storageServer.py Show resolved Hide resolved
tests/storage/storageserver_test.py Outdated Show resolved Hide resolved
@vjuranek
Copy link
Member Author

/ost

@vjuranek
Copy link
Member Author

engine:

2022-02-17 08:15:49,334-05 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStorageServerVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-60) [bc9bba55-d345-4f4a-86ec-4dc150ca470c] START, DisconnectStorageServerVDSCommand(HostName = stream8-host-test3, StorageServerConnectionManagementVDSParameters:{hostId='8bd4044f-bd4b-4031-a102-04e63247396a', storagePoolId='19016d46-17b4-11ec-9517-5254003aa1f8', storageType='ISCSI', connectionList='[]', sendNetworkEventOnFailure='true'}), log id: 41858b28
2022-02-17 08:15:49,337-05 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStorageServerVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-60) [bc9bba55-d345-4f4a-86ec-4dc150ca470c] FINISH, DisconnectStorageServerVDSCommand, return: {}, log id: 41858b28

vdsm:

2022-02-17 08:15:49,341-0500 INFO  (jsonrpc/3) [vdsm.api] START disconnectStorageServer(domType=3, spUUID='19016d46-17b4-11ec-9517-5254003aa1f8', conList=[]) from=::ffff:192.168.122.192,38160, flow_id=bc9bba55-d345-4f4a-86ec-4dc150ca470c, task_id=359a0d7a-c909-4978-a47d-6cb117891877 (api:48)
2022-02-17 08:15:49,341-0500 WARN  (jsonrpc/3) [storage.hsm] Connection list is empty, ignoring request (hsm:2235)
2022-02-17 08:15:49,341-0500 INFO  (jsonrpc/3) [vdsm.api] FINISH disconnectStorageServer return={'statuslist': []} from=::ffff:192.168.122.192,38160, flow_id=bc9bba55-d345-4f4a-86ec-4dc150ca470c, task_id=359a0d7a-c909-4978-a47d-6cb117891877 (api:54)

OST: https://redir.apps.ovirt.org/dj/job/ds-ost-baremetal_manual/28586

@vjuranek vjuranek added the verified Change was tested; please describe how it was tested in the PR label Feb 17, 2022
@vjuranek vjuranek requested a review from nirs February 17, 2022 13:21
Copy link
Member

@nirs nirs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second patch looks good, but it seems that we are missing xfail mark
in the first.

tests/storage/hsm_connect_test.py Outdated Show resolved Hide resolved
tests/storage/hsm_connect_test.py Outdated Show resolved Hide resolved
When connecting or disconnecting storage server, current code assumes
that connection list sent by engine is never empty. However, this is not
the case and when there are two or more iSCSI SDs which use same iSCSI
target, engine sends empty connection list when requesting disconnect,
see BZ #2054745 [1] for more details. In such case current code fails
with

    UnboundLocalError: local variable 'con_info' referenced before assignment

which is bad in any case. Avoid using potentially undefied variable.

Better handling of empty connection list will be improved in follow-up
patch.

[1] https://bugzilla.redhat.com/2054745

Change-Id: I1a4a5d068ff75669d3ceafee1cc427213a552538
Signed-off-by: Vojtech Juranek <vjuranek@redhat.com>
When multiple iSCSI SDs use same iSCSI target and one of these SDs is
put into maintenance, engine sends vdsm disconnect storage server
request with empty connection list. While it's questionable why engine
sends such request, vdsm still has handle it and has to do nothing as
the target is still used by other SDs. Do nothing in this case, but
log a warning, because this should be fixed in the engine and engine
shouldn't request disconnecting storage server when it's used by other
SDs. As calling code expect iterable with results, return empty list.

Change-Id: I7de2b77dbda788883600c14893bc2d1485f59804
Bug-Url: https://bugzilla.redhat.com/2054745
Signed-off-by: Vojtech Juranek <vjuranek@redhat.com>
@sonarcloud
Copy link

sonarcloud bot commented Feb 18, 2022

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
2.4% 2.4% Duplication

@vjuranek vjuranek requested a review from nirs February 18, 2022 19:21
@nirs nirs merged commit ade8b5c into oVirt:master Feb 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
verified Change was tested; please describe how it was tested in the PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants