
__srlock__ not populating with proper ID and exception "401: You must log in" #72

Open
mdmeier opened this issue Apr 20, 2018 · 1 comment

mdmeier commented Apr 20, 2018

Hi Roman,

Long time no chat. I recently upgraded CEPH from Kraken to Luminous and have run into a strange problem. When migrating a VDI from another SR to the CEPH SR, the following appears in /var/log/SMlog every second:

Apr 20 01:51:11 cloud103-15 SM: [10286] rbdsr_lock.Lock._trylock
Apr 20 01:51:11 cloud103-15 SM: [10286] rbdsr_lock: Trying to lock 'srlock'
Apr 20 01:51:11 cloud103-15 SM: [10286] rbdsr_lock.Lock.held
Apr 20 01:51:11 cloud103-15 SM: [10286] rbdsr_lock.Lock._get_srlocker
Apr 20 01:51:11 cloud103-15 SM: [10286] ['rbd', '--format', 'json', '--name', 'client.admin', '--pool', 'RBD_XenStorage-2dd455e9-0de4-4ed8-af62-64e1a4ace678', 'lock', 'list', 'srlock']
Apr 20 01:51:11 cloud103-15 SM: [10286] pread SUCCESS

This is odd, because as far as I can tell, 'srlock' should be replaced with an actual VDI ID. When I check the locks, I see:

rbd --name client.admin --pool RBD_XenStorage-2dd455e9-0de4-4ed8-af62-64e1a4ace678 lock list srlock

There is 1 exclusive lock on this image.
Locker ID Address
client.467981955 locked 192.168.1xx.1xx:0/2059369044
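The SM log shows the plugin polling the lock with "rbd --format json ... lock list srlock", and the text output above lists the locker, lock ID and address that "rbd lock remove <image> <lock-id> <locker>" expects. A minimal Python sketch of reading that JSON and building (but not running) the matching remove command follows. The helper names parse_lock_list and lock_remove_argv are hypothetical, and the JSON shape is an assumption: the sketch accepts either an object keyed by lock ID or a list of objects with "id", "locker" and "address" fields, since the layout may differ between Ceph releases.

```python
import json


def parse_lock_list(output):
    """Return (lock_id, locker, address) tuples from the JSON printed by
    `rbd --format json lock list <image>`.

    Assumption: the output is either an object keyed by lock ID, e.g.
    {"locked": {"locker": "client.1", "address": "..."}}, or a list of
    objects with "id", "locker" and "address" fields. Both are accepted.
    """
    if not output.strip():
        return []  # treat empty output as "no locks"
    data = json.loads(output)
    if isinstance(data, dict):
        return [(lock_id, info["locker"], info["address"])
                for lock_id, info in data.items()]
    return [(entry["id"], entry["locker"], entry["address"]) for entry in data]


def lock_remove_argv(pool, image, lock_id, locker, name="client.admin"):
    """Build argv for `rbd lock remove <image> <lock-id> <locker>`,
    reusing the --name/--pool options seen in the SM log."""
    return ["rbd", "--name", name, "--pool", pool,
            "lock", "remove", image, lock_id, locker]


if __name__ == "__main__":
    # Lock entry copied from the `rbd lock list srlock` output above.
    sample = ('{"locked": {"locker": "client.467981955", '
              '"address": "192.168.1xx.1xx:0/2059369044"}}')
    for lock_id, locker, _address in parse_lock_list(sample):
        print(lock_remove_argv(
            "RBD_XenStorage-2dd455e9-0de4-4ed8-af62-64e1a4ace678",
            "srlock", lock_id, locker))
```

Fed the lock shown above, parse_lock_list returns a single entry with lock ID "locked" and locker "client.467981955". The argv is only printed, so it can be reviewed before being passed to something like subprocess.run.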

The locker is always another server. If I'm persistent enough, I can remove those locks with "rbd lock remove" so that this server acquires the lock, but then I get:

Apr 20 01:53:54 cloud103-15 SM: [10286] rbdsr_lock: acquired 'client.467921638'
Apr 20 01:53:54 cloud103-15 SM: [10286] Exception in activate/attach
Apr 20 01:53:54 cloud103-15 SM: [10286] failed to remove tag: <Fault 401: 'You must log in'>
Apr 20 01:53:54 cloud103-15 SM: [10286] ***** BLKTAP2:<function _activate_locked at 0x14776e0>: > EXCEPTION <class 'xmlrpclib.Fault'>, <Fault 401: 'You must log in'>
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/blktap2.py", line 87, in wrapper
Apr 20 01:53:54 cloud103-15 SM: [10286] ret = op(self, *args)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/blktap2.py", line 1602, in _activate_locked
Apr 20 01:53:54 cloud103-15 SM: [10286] self._remove_tag(vdi_uuid)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/blktap2.py", line 1452, in _remove_tag
Apr 20 01:53:54 cloud103-15 SM: [10286] vdi_ref = self._session.xenapi.VDI.get_by_uuid(vdi_uuid)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 254, in call
Apr 20 01:53:54 cloud103-15 SM: [10286] return self.__send(self.__name, args)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 157, in xenapi_request
Apr 20 01:53:54 cloud103-15 SM: [10286] raise xmlrpclib.Fault(401, 'You must log in')
Apr 20 01:53:54 cloud103-15 SM: [10286]
Apr 20 01:53:54 cloud103-15 SM: [10286] lock: released /var/lock/sm/302db214-90cf-4600-84ac-6bc9b053c61c/vdi
Apr 20 01:53:54 cloud103-15 SM: [10286] ***** generic exception: vdi_activate: EXCEPTION <class 'xmlrpclib.Fault'>, <Fault 401: 'You must log in'>
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/SRCommand.py", line 110, in run
Apr 20 01:53:54 cloud103-15 SM: [10286] return self._run_locked(sr)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/SRCommand.py", line 159, in _run_locked
Apr 20 01:53:54 cloud103-15 SM: [10286] rv = self._run(sr, target)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/SRCommand.py", line 264, in _run
Apr 20 01:53:54 cloud103-15 SM: [10286] writable, caching_params)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/blktap2.py", line 1541, in activate
Apr 20 01:53:54 cloud103-15 SM: [10286] if self._activate_locked(sr_uuid, vdi_uuid, options):
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/blktap2.py", line 87, in wrapper
Apr 20 01:53:54 cloud103-15 SM: [10286] ret = op(self, *args)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/blktap2.py", line 1602, in _activate_locked
Apr 20 01:53:54 cloud103-15 SM: [10286] self._remove_tag(vdi_uuid)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/blktap2.py", line 1452, in _remove_tag
Apr 20 01:53:54 cloud103-15 SM: [10286] vdi_ref = self._session.xenapi.VDI.get_by_uuid(vdi_uuid)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 254, in call
Apr 20 01:53:54 cloud103-15 SM: [10286] return self.__send(self.__name, args)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 157, in xenapi_request
Apr 20 01:53:54 cloud103-15 SM: [10286] raise xmlrpclib.Fault(401, 'You must log in')
Apr 20 01:53:54 cloud103-15 SM: [10286]
Apr 20 01:53:54 cloud103-15 SM: [10286] ***** RBD: EXCEPTION <class 'xmlrpclib.Fault'>, <Fault 401: 'You must log in'>
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/SRCommand.py", line 353, in run
Apr 20 01:53:54 cloud103-15 SM: [10286] ret = cmd.run(sr)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/SRCommand.py", line 110, in run
Apr 20 01:53:54 cloud103-15 SM: [10286] return self._run_locked(sr)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/SRCommand.py", line 159, in _run_locked
Apr 20 01:53:54 cloud103-15 SM: [10286] rv = self._run(sr, target)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/SRCommand.py", line 264, in _run
Apr 20 01:53:54 cloud103-15 SM: [10286] writable, caching_params)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/blktap2.py", line 1541, in activate
Apr 20 01:53:54 cloud103-15 SM: [10286] if self._activate_locked(sr_uuid, vdi_uuid, options):
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/blktap2.py", line 87, in wrapper
Apr 20 01:53:54 cloud103-15 SM: [10286] ret = op(self, *args)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/blktap2.py", line 1602, in _activate_locked
Apr 20 01:53:54 cloud103-15 SM: [10286] self._remove_tag(vdi_uuid)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/blktap2.py", line 1452, in _remove_tag
Apr 20 01:53:54 cloud103-15 SM: [10286] vdi_ref = self._session.xenapi.VDI.get_by_uuid(vdi_uuid)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 254, in call
Apr 20 01:53:54 cloud103-15 SM: [10286] return self.__send(self.__name, args)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 157, in xenapi_request
Apr 20 01:53:54 cloud103-15 SM: [10286] raise xmlrpclib.Fault(401, 'You must log in')
Apr 20 01:53:54 cloud103-15 SM: [10286]
Apr 20 01:53:54 cloud103-15 SM: [10286] lock: closed /var/lock/sm/302db214-90cf-4600-84ac-6bc9b053c61c/vdi
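In the trace, the 401 fault is raised inside XenAPI.py (xenapi_request, line 157) while blktap2's _remove_tag looks up the VDI, which suggests the XenAPI session used for that call was no longer accepted by the time it ran, rather than an RBD error as such. A generic pattern for that situation is to obtain a fresh session and retry once. The sketch below assumes a caller-supplied make_session() factory and call(session) function (both hypothetical; in practice they would wrap a XenAPI login and the API call) and uses Python 3's xmlrpc.client, the successor of the xmlrpclib module in the trace.

```python
import xmlrpc.client


def call_with_relogin(make_session, call):
    """Run call(session); on XML-RPC fault 401 ("You must log in"),
    create a fresh session with make_session() and retry exactly once.
    Any other fault, or a second 401, propagates to the caller."""
    session = make_session()
    try:
        return call(session)
    except xmlrpc.client.Fault as fault:
        if fault.faultCode != 401:
            raise
    # The first session was rejected; log in again and retry once.
    return call(make_session())
```

Retrying only once keeps a login that is permanently rejected from turning into an endless loop.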

Any chance you can help with this? Right now it prevents me from creating new VDIs on my CEPH SR.

mdmeier commented Apr 20, 2018

The underlying problem may be with mapping the RBD image:

[root@cloud103-15 ~]# rbd nbd map --device /dev/ndb1 --nbds_max 64 RBD_XenStorage-2dd455e9-0de4-4ed8-af62-64e1a4ace678/VHD-50d28620-24b7-45f9-99f4-7f5ee0bc739e --name client.admin
rbd-nbd: ignoring kernel module parameter options: nbd module already loaded
rbd-nbd: failed to open device: /dev/ndb1
rbd: rbd-nbd failed with error: /usr/bin/rbd-nbd: exit status: 1
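rbd-nbd attaches images to the kernel's NBD devices, which are named /dev/nbd0, /dev/nbd1 and so on, while the command above passes --device /dev/ndb1. A small pre-flight check for the --device argument is sketched below; check_nbd_device is a hypothetical helper, not part of rbd-nbd, and it only inspects the path name and whether the node exists.

```python
import os
import re

# rbd-nbd maps images onto kernel NBD nodes named /dev/nbd0, /dev/nbd1, ...
NBD_DEVICE_RE = re.compile(r"/dev/nbd\d+")


def check_nbd_device(path):
    """Return a list of problems with a --device value for rbd-nbd
    (an empty list means no problem was found)."""
    if not NBD_DEVICE_RE.fullmatch(path):
        return [f"{path} does not look like an NBD device node (/dev/nbdN)"]
    if not os.path.exists(path):
        return [f"{path} does not exist; is the nbd module loaded?"]
    return []


if __name__ == "__main__":
    for candidate in ("/dev/ndb1", "/dev/nbd1"):
        print(candidate, check_nbd_device(candidate) or "ok")
```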
