
Host cannot connect to storage domains #155

Closed
ferradeira opened this issue Apr 27, 2022 · 1 comment · Fixed by #172
Labels: bug, gluster


ferradeira commented Apr 27, 2022

After upgrading from 4.4 to 4.5, the host cannot be activated because it cannot connect to the data domain.
I have two data domains: an NFS domain (master) and a GlusterFS domain. The error is about the Gluster domain:
The error message for connection node1-teste.acloud.pt:/data1 returned by VDSM was: XML error

```
# rpm -qa|grep glusterfs*
glusterfs-10.1-1.el8s.x86_64
glusterfs-selinux-2.0.1-1.el8s.noarch
glusterfs-client-xlators-10.1-1.el8s.x86_64
glusterfs-events-10.1-1.el8s.x86_64
libglusterfs0-10.1-1.el8s.x86_64
glusterfs-fuse-10.1-1.el8s.x86_64
glusterfs-server-10.1-1.el8s.x86_64
glusterfs-cli-10.1-1.el8s.x86_64
glusterfs-geo-replication-10.1-1.el8s.x86_64
```

Engine log:

```
2022-04-27 13:35:16,118+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-66) [ebe79c6] EVENT_ID: VDS_STORAGES_CONNECTION_FAILED(188), Failed to connect Host NODE1 to the Storage Domains DATA1.
2022-04-27 13:35:16,169+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-66) [ebe79c6] EVENT_ID: STORAGE_DOMAIN_ERROR(996), The error message for connection node1-teste.acloud.pt:/data1 returned by VDSM was: XML error
2022-04-27 13:35:16,170+01 ERROR [org.ovirt.engine.core.bll.storage.connection.FileStorageHelper] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-66) [ebe79c6] The connection with details 'node1-teste.acloud.pt:/data1' failed because of error code '4106' and error message is: xml error
```

VDSM log:

```
2022-04-27 13:40:07,125+0100 ERROR (jsonrpc/4) [storage.storageServer] Could not connect to storage server (storageServer:92)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 90, in connect_all
    con.connect()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 233, in connect
    self.validate()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 365, in validate
    if not self.volinfo:
  File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 352, in volinfo
    self._volinfo = self._get_gluster_volinfo()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 405, in _get_gluster_volinfo
    self._volfileserver)
  File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 56, in __call__
    return callMethod()
  File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 54, in <lambda>
    **kwargs)
  File "<string>", line 2, in glusterVolumeInfo
  File "/usr/lib64/python3.6/multiprocessing/managers.py", line 772, in _callmethod
    raise convert_to_error(kind, result)
vdsm.gluster.exception.GlusterXmlErrorException: XML error: rc=0 out=() err=[b'<cliOutput>\n  <opRet>0</opRet>\n  <opErrno>0</opErrno>\n  <opErrstr />\n  <volInfo>\n    <volumes>\n      <volume>\n        <name>data1</name>\n        <id>d7eb2c38-2707-4774-9873-a7303d024669</id>\n        <status>1</status>\n        <statusStr>Started</statusStr>\n        <snapshotCount>0</snapshotCount>\n        <brickCount>2</brickCount>\n        <distCount>2</distCount>\n        <replicaCount>1</replicaCount>\n        <arbiterCount>0</arbiterCount>\n        <disperseCount>0</disperseCount>\n        <redundancyCount>0</redundancyCount>\n        <type>0</type>\n        <typeStr>Distribute</typeStr>\n        <transport>0</transport>\n        <bricks>\n          <brick uuid="08c7ba5f-9aca-49c5-abfd-8a3e42dd8c0b">node1-teste.acloud.pt:/home/brick1<name>node1-teste.acloud.pt:/home/brick1</name><hostUuid>08c7ba5f-9aca-49c5-abfd-8a3e42dd8c0b</hostUuid><isArbiter>0</isArbiter></brick>\n          <brick uuid="08c7ba5f-9aca-49c5-abfd-8a3e42dd8c0b">node1-teste.acloud.pt:/brick2<name>node1-teste.acloud.pt:/brick2</name><hostUuid>08c7ba5f-9aca-49c5-abfd-8a3e42dd8c0b</hostUuid><isArbiter>0</isArbiter></brick>\n        </bricks>\n        <optCount>23</optCount>\n        <options>\n          <option>\n            <name>nfs.disable</name>\n            <value>on</value>\n          </option>\n          <option>\n            <name>transport.address-family</name>\n            <value>inet</value>\n          </option>\n          <option>\n            <name>storage.fips-mode-rchecksum</name>\n            <value>on</value>\n          </option>\n          <option>\n            <name>storage.owner-uid</name>\n            <value>36</value>\n          </option>\n          <option>\n            <name>storage.owner-gid</name>\n            <value>36</value>\n          </option>\n          <option>\n            <name>cluster.min-free-disk</name>\n            <value>5%</value>\n          </option>\n          <option>\n            <name>performance.quick-read</name>\n            <value>off</value>\n          </option>\n          <option>\n            <name>performance.read-ahead</name>\n            <value>off</value>\n          </option>\n          <option>\n            <name>performance.io-cache</name>\n            <value>off</value>\n          </option>\n          <option>\n            <name>performance.low-prio-threads</name>\n            <value>32</value>\n          </option>\n          <option>\n            <name>network.remote-dio</name>\n            <value>enable</value>\n          </option>\n          <option>\n            <name>cluster.eager-lock</name>\n            <value>enable</value>\n          </option>\n          <option>\n            <name>cluster.quorum-type</name>\n            <value>auto</value>\n          </option>\n          <option>\n            <name>cluster.server-quorum-type</name>\n            <value>server</value>\n          </option>\n          <option>\n            <name>cluster.data-self-heal-algorithm</name>\n            <value>full</value>\n          </option>\n          <option>\n            <name>cluster.locking-scheme</name>\n            <value>granular</value>\n          </option>\n          <option>\n            <name>cluster.shd-wait-qlength</name>\n            <value>10000</value>\n          </option>\n          <option>\n            <name>features.shard</name>\n            <value>off</value>\n          </option>\n          <option>\n            <name>user.cifs</name>\n            <value>off</value>\n          </option>\n          <option>\n            <name>cluster.choose-local</name>\n            <value>off</value>\n          </option>\n          <option>\n            <name>client.event-threads</name>\n            <value>4</value>\n          </option>\n          <option>\n            <name>server.event-threads</name>\n            <value>4</value>\n          </option>\n          <option>\n            <name>performance.client-io-threads</name>\n            <value>on</value>\n          </option>\n        </options>\n      </volume>\n      <count>1</count>\n    </volumes>\n  </volInfo>\n</cliOutput>']
2022-04-27 13:40:07,125+0100 INFO  (jsonrpc/4) [storage.storagedomaincache] Invalidating storage domain cache (sdc:74)
2022-04-27 13:40:07,125+0100 INFO  (jsonrpc/4) [vdsm.api] FINISH connectStorageServer return={'statuslist': [{'id': 'dede3145-651a-4b01-b8d2-82bff8670696', 'status': 4106}]} from=::ffff:192.168.5.165,42132, flow_id=4c170005, task_id=cec6f36f-46a4-462c-9d0a-feb8d814b465 (api:54)
2022-04-27 13:40:07,410+0100 INFO  (jsonrpc/5) [api.host] START getAllVmStats() from=::ffff:192.168.5.165,42132 (api:48)
2022-04-27 13:40:07,411+0100 INFO  (jsonrpc/5) [api.host] FINISH getAllVmStats return={'status': {'code': 0, 'message': 'Done'}, 'statsList': (suppressed)} from=::ffff:192.168.5.165,42132 (api:54)
2022-04-27 13:40:07,785+0100 INFO  (jsonrpc/7) [api.host] START getStats() from=::ffff:192.168.5.165,42132 (api:48)
2022-04-27 13:40:07,797+0100 INFO  (jsonrpc/7) [vdsm.api] START repoStats(domains=()) from=::ffff:192.168.5.165,42132, task_id=4fa4e8c4-7c65-499a-827e-8ae153aa875e (api:48)
2022-04-27 13:40:07,797+0100 INFO  (jsonrpc/7) [vdsm.api] FINISH repoStats return={} from=::ffff:192.168.5.165,42132, task_id=4fa4e8c4-7c65-499a-827e-8ae153aa875e (api:54)
2022-04-27 13:40:07,797+0100 INFO  (jsonrpc/7) [vdsm.api] START multipath_health() from=::ffff:192.168.5.165,42132, task_id=c6390f2a-845b-420b-a833-475605a24078 (api:48)
2022-04-27 13:40:07,797+0100 INFO  (jsonrpc/7) [vdsm.api] FINISH multipath_health return={} from=::ffff:192.168.5.165,42132, task_id=c6390f2a-845b-420b-a833-475605a24078 (api:54)
2022-04-27 13:40:07,802+0100 INFO  (jsonrpc/7) [api.host] FINISH getStats return={'status': {'code': 0, 'message': 'Done'}, 'info': (suppressed)} from=::ffff:192.168.5.165,42132 (api:54)
2022-04-27 13:40:11,980+0100 INFO  (jsonrpc/6) [api.host] START getAllVmStats() from=::1,37040 (api:48)
2022-04-27 13:40:11,980+0100 INFO  (jsonrpc/6) [api.host] FINISH getAllVmStats return={'status': {'code': 0, 'message': 'Done'}, 'statsList': (suppressed)} from=::1,37040 (api:54)
2022-04-27 13:40:12,365+0100 INFO  (periodic/2) [vdsm.api] START repoStats(domains=()) from=internal, task_id=f5084096-e5c5-4ca8-9c47-a92fa5790484 (api:48)
2022-04-27 13:40:12,365+0100 INFO  (periodic/2) [vdsm.api] FINISH repoStats return={} from=internal, task_id=f5084096-e5c5-4ca8-9c47-a92fa5790484 (api:54)
2022-04-27 13:40:22,417+0100 INFO  (jsonrpc/0) [api.host] START getAllVmStats() from=::ffff:192.168.5.165,42132 (api:48)
2022-04-27 13:40:22,417+0100 INFO  (jsonrpc/0) [api.host] FINISH getAllVmStats return={'status': {'code': 0, 'message': 'Done'}, 'statsList': (suppressed)} from=::ffff:192.168.5.165,42132 (api:54)
2022-04-27 13:40:22,805+0100 INFO  (jsonrpc/1) [api.host] START getStats() from=::ffff:192.168.5.165,42132 (api:48)
2022-04-27 13:40:22,816+0100 INFO  (jsonrpc/1) [vdsm.api] START repoStats(domains=()) from=::ffff:192.168.5.165,42132, task_id=a9fb939c-ea1a-4116-a22f-d14a99e6eada (api:48)
2022-04-27 13:40:22,816+0100 INFO  (jsonrpc/1) [vdsm.api] FINISH repoStats return={} from=::ffff:192.168.5.165,42132, task_id=a9fb939c-ea1a-4116-a22f-d14a99e6eada (api:54)
2022-04-27 13:40:22,816+0100 INFO  (jsonrpc/1) [vdsm.api] START multipath_health() from=::ffff:192.168.5.165,42132, task_id=5eee2f63-2631-446a-98dd-4947f9499f8f (api:48)
2022-04-27 13:40:22,816+0100 INFO  (jsonrpc/1) [vdsm.api] FINISH multipath_health return={} from=::ffff:192.168.5.165,42132, task_id=5eee2f63-2631-446a-98dd-4947f9499f8f (api:54)
2022-04-27 13:40:22,822+0100 INFO  (jsonrpc/1) [api.host] FINISH getStats return={'status': {'code': 0, 'message': 'Done'}, 'info': (suppressed)} from=::ffff:192.168.5.165,42132 (api:54)
```
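
In the traceback above, `rc=0` and `<opRet>0</opRet>` indicate that the gluster CLI call itself succeeded. The "XML error" is raised while VDSM processes that output. The sketch below parses a trimmed copy of the rejected volume-info XML with Python's standard `xml.etree.ElementTree` and lists the `*Count` elements reported for `data1`. To keep it short, the `<bricks>` and `<options>` sections are dropped (our simplification), and the helper name `count_elements` is hypothetical. The XML is well-formed, but it has no `<stripeCount>` element. That is consistent with the GlusterFS 10 `stripe_count` change discussed in the follow-up comment.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the volume-info XML from the VDSM traceback
# (bricks and options removed for brevity).
CLI_OUTPUT = """<cliOutput>
  <opRet>0</opRet>
  <opErrno>0</opErrno>
  <opErrstr />
  <volInfo>
    <volumes>
      <volume>
        <name>data1</name>
        <status>1</status>
        <statusStr>Started</statusStr>
        <snapshotCount>0</snapshotCount>
        <brickCount>2</brickCount>
        <distCount>2</distCount>
        <replicaCount>1</replicaCount>
        <arbiterCount>0</arbiterCount>
        <disperseCount>0</disperseCount>
        <redundancyCount>0</redundancyCount>
        <type>0</type>
        <typeStr>Distribute</typeStr>
        <transport>0</transport>
      </volume>
      <count>1</count>
    </volumes>
  </volInfo>
</cliOutput>"""


def count_elements(xml_text):
    """Hypothetical helper: map each volume name to its *Count children."""
    root = ET.fromstring(xml_text)
    result = {}
    # iter("volume") matches only <volume> elements, not <volumes>.
    for vol in root.iter("volume"):
        name = vol.findtext("name")
        result[name] = {
            child.tag: child.text
            for child in vol
            if child.tag.endswith("Count")
        }
    return result


counts = count_elements(CLI_OUTPUT)
print(sorted(counts["data1"]))
print("stripeCount" in counts["data1"])
```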
nirs added the bug and gluster labels on Apr 27, 2022
nirs (Member) commented Apr 28, 2022

@gobindadas suggests that this is a result of:
gluster/glusterfs#3133

I had a discussion with the Gluster folks, and this is what they say:
> We add stripe_count only if a user upgrades from an older release to release-10.
> For new installations stripe_count is not available, so you have to change your script.
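
Below is a minimal sketch of the kind of change suggested above. It assumes that the consuming code looks up each count by element name and reads `<stripeCount>` unconditionally. That is an assumption: this issue does not show VDSM's parser. The sketch makes `stripeCount` optional and falls back to a default when GlusterFS omits it. `parse_volume_counts`, `strict_stripe_count` and the default of `1` are all hypothetical choices, and this is not necessarily what #172 does.

```python
import xml.etree.ElementTree as ET

# Counts treated as mandatory. stripeCount is deliberately excluded because,
# per the Gluster developers, it is absent on fresh GlusterFS 10 installs.
REQUIRED = ("brickCount", "distCount", "replicaCount",
            "arbiterCount", "disperseCount", "redundancyCount")


def strict_stripe_count(volume_el):
    # The fragile pattern: find() returns None when the element is missing,
    # so accessing .text raises AttributeError.
    return volume_el.find("stripeCount").text


def parse_volume_counts(volume_el, stripe_default="1"):
    """Hypothetical helper: read count fields from a <volume> element.

    Required counts raise ValueError if missing; stripeCount falls back to
    stripe_default (an assumed value) when the CLI does not report it.
    """
    counts = {}
    for tag in REQUIRED:
        el = volume_el.find(tag)
        if el is None:
            raise ValueError(f"missing <{tag}> in volume info")
        counts[tag] = int(el.text)
    # findtext() returns the given default when no element matches.
    counts["stripeCount"] = int(volume_el.findtext("stripeCount", stripe_default))
    return counts


# A <volume> element shaped like the GlusterFS 10 output in the VDSM log.
vol = ET.fromstring(
    "<volume><name>data1</name><brickCount>2</brickCount><distCount>2</distCount>"
    "<replicaCount>1</replicaCount><arbiterCount>0</arbiterCount>"
    "<disperseCount>0</disperseCount><redundancyCount>0</redundancyCount></volume>"
)

try:
    strict_stripe_count(vol)
except AttributeError as exc:
    print("strict parser fails:", exc)

print(parse_volume_counts(vol))
```

The other counts stay strict, so a genuinely malformed reply is still reported. Only the field that GlusterFS 10 is documented (by its developers, above) to omit is made optional.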
