
One VDI becomes unbootable #51

Open
maxcuttins opened this issue Jun 25, 2017 · 8 comments

@maxcuttins
Collaborator

I have an issue with one VDI.
The VM suddenly stops and shows:

```
"Failed","Starting VM '
Internal error: xenopsd internal error: Memory_interface.Internal_error("VM = 980788af-7864-4a96-b5c3-8fbde2961fa9; domid = 42; Bootloader.Bad_error Traceback (most recent call last):
  File "/usr/bin/pygrub", line 984, in <module>
    part_offs = get_partition_offsets(file)
  File "/usr/bin/pygrub", line 116, in get_partition_offsets
    image_type = identify_disk_image(file)
  File "/usr/bin/pygrub", line 60, in identify_disk_image
    buf = os.read(fd, read_size_roundup(fd, 0x8006))
OSError: [Errno 5] Input/output error
")
```

@maxcuttins
Collaborator Author

3 VDIs

@maxcuttins
Collaborator Author

I got this after an `xe sr-scan` on the Ceph storage:
```
There was an SR backend failure.
status: non-zero exit
stdout:
stderr: Traceback (most recent call last):
  File "/opt/xensource/sm/RBDSR", line 774, in <module>
    SRCommand.run(RBDSR, DRIVER_INFO)
  File "/opt/xensource/sm/SRCommand.py", line 352, in run
    ret = cmd.run(sr)
  File "/opt/xensource/sm/SRCommand.py", line 110, in run
    return self._run_locked(sr)
  File "/opt/xensource/sm/SRCommand.py", line 159, in _run_locked
    rv = self._run(sr, target)
  File "/opt/xensource/sm/SRCommand.py", line 338, in _run
    return sr.scan(self.params['sr_uuid'])
  File "/opt/xensource/sm/RBDSR", line 244, in scan
    scanrecord.synchronise_new()
  File "/opt/xensource/sm/SR.py", line 581, in synchronise_new
    vdi._db_introduce()
  File "/opt/xensource/sm/VDI.py", line 312, in _db_introduce
    vdi = self.sr.session.xenapi.VDI.db_introduce(uuid, self.label, self.description, self.sr.sr_ref, ty, self.shareable, self.read_only, {}, self.location, {}, sm_config, self.managed, str(self.size), str(self.utilisation), metadata_of_pool, is_a_snapshot, xmlrpclib.DateTime(snapshot_time), snapshot_of)
  File "/usr/lib/python2.7/site-packages/XenAPI.py", line 248, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/site-packages/XenAPI.py", line 150, in xenapi_request
    result = _parse_result(getattr(self, methodname)(*full_params))
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1581, in __request
    allow_none=self.__allow_none)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1086, in dumps
    data = m.dumps(params)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 633, in dumps
    dump(v, write)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 655, in __dump
    f(self, value, write)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 757, in dump_instance
    self.dump_struct(value.__dict__, write)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 736, in dump_struct
    dump(v, write)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 655, in __dump
    f(self, value, write)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 757, in dump_instance
    self.dump_struct(value.__dict__, write)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 736, in dump_struct
    dump(v, write)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 655, in __dump
    f(self, value, write)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 666, in dump_int
    raise OverflowError, "int exceeds XML-RPC limits"
OverflowError: int exceeds XML-RPC limits

[root@xenserver-11 archive]# xe sr-scan uuid=51a45fd8-a4d1-4202-899c-00a0f81054cc

Broadcast message from systemd-journald@xenserver-11 (Sun 2017-06-25 05:34:16 CEST):

tapdisk[4632]: tapdisk-syslog: 1 messages dropped

Broadcast message from systemd-journald@xenserver-11 (Sun 2017-06-25 05:34:16 CEST):

tapdisk[4632]: tapdisk-syslog: 3 messages dropped

Broadcast message from systemd-journald@xenserver-11 (Sun 2017-06-25 05:34:16 CEST):

tapdisk[4632]: tapdisk-syslog: 1 messages dropped
```
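The last frame is the real clue: Python 2's xmlrpclib refuses to marshal any integer outside the signed 32-bit range, so a VDI field larger than 2**31 - 1 (presumably a size or utilisation value in one of the structs being introduced) aborts the whole scan. A minimal reproduction of that limit:

```python
# Python 2.7 -- reproduces the OverflowError from the traceback above.
# xmlrpclib only marshals ints in the signed 32-bit range, which is why
# VDI.py already wraps size/utilisation in str() before the call.
import xmlrpclib

print(xmlrpclib.dumps((2 ** 31 - 1,)))  # largest int XML-RPC accepts

try:
    xmlrpclib.dumps((2 ** 31,))          # one past the limit
except OverflowError as e:
    print(e)                             # int exceeds XML-RPC limits
```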

@rposudnevskiy
Owner

Hi,
Could you please run `xe sr-scan` again and send me the files /var/log/SMlog and /var/log/xensource.log?
Thanks

rposudnevskiy added a commit that referenced this issue Jul 3, 2017
Registry with references to nbd devices has been added to SR sm_config
@rposudnevskiy
Owner

The reason for this error is that some operations (resize, update, snapshot, clone, etc.) require unmapping the rbd-nbd device, executing the operation, and then mapping it again. If only one unmap/map operation runs at any given moment, there is no problem. But if several VDIs are mapped simultaneously, the rbd-nbd device that was unmapped (for a resize, update, etc.) may be mapped again with a different device instance number than it had before the unmap. As a result, the VDI can't be unpaused after the operation, and we get the error mentioned above.

Last update should fix this problem.
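A minimal sketch of that race, not the plugin's actual code; it assumes only that `rbd-nbd map` prints the device it picked and grabs the first free /dev/nbdX slot, and the image names are hypothetical:

```python
import subprocess

def map_image(image):
    # rbd-nbd prints the device it chose, e.g. "/dev/nbd0"
    return subprocess.check_output(['rbd-nbd', 'map', image]).strip()

dev_a = map_image('rbd/vdi-a')        # e.g. /dev/nbd0
dev_b = map_image('rbd/vdi-b')        # e.g. /dev/nbd1

# A resize/update/snapshot on vdi-a starts by unmapping it,
# which frees /dev/nbd0 ...
subprocess.check_call(['rbd-nbd', 'unmap', dev_a])

# ... and if another VDI gets mapped in the meantime, it takes that slot:
dev_c = map_image('rbd/vdi-c')        # may grab /dev/nbd0

# Remapping vdi-a now lands on a different device instance number,
# but tapdisk still references the old one, so the VDI can't be unpaused.
dev_a_again = map_image('rbd/vdi-a')  # e.g. /dev/nbd2, not /dev/nbd0
```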

@maxcuttins
Collaborator Author

maxcuttins commented Jul 4, 2017

I don't know if that's a good way to fix the issue.
Thinking about it, the best way is probably to stop using a cache, since it can return results the system doesn't expect.
It would probably be better to have a function that automatically retrieves the nbd device every time we need to issue a new command. Having no cache probably means less speed, but up-to-date data and references.
A simple function that parses the UUID of the VDI on the fly and retrieves the right NBD device should be a really thin and light piece of code, as in the sketch below.
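A minimal sketch of that lookup. It assumes the `rbd-nbd list-mapped` output columns `pid pool image snap device` and RBDSR's `VHD-<uuid>` image naming; both are assumptions, not verified against the plugin:

```python
# Hypothetical helper, not part of the plugin: resolve the nbd device for a
# VDI by parsing `rbd-nbd list-mapped` on the fly instead of caching it.
import subprocess

def nbd_device_for_vdi(vdi_uuid):
    image = 'VHD-%s' % vdi_uuid  # assumed RBDSR image naming
    output = subprocess.check_output(['rbd-nbd', 'list-mapped'])
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[2] == image:
            return fields[4]     # the /dev/nbdX column
    return None                  # VDI is not currently mapped
```

The trade-off is one extra `rbd-nbd` invocation per operation, which is presumably what the sm_config registry avoids.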

@rposudnevskiy
Owner

Hi,
It's not a cache. The registry just tracks the attached VDIs and stores a reference to the nbd device corresponding to each attached VDI. On VDI attach the reference is created, and on VDI detach the reference is deleted.
The registry is also cleared if the SR is detached.
So the registry should always hold up-to-date info about the attached VDIs, I hope.
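In outline, the registry amounts to something like this sketch; the key format is hypothetical, and the real bookkeeping lives in the commit referenced above:

```python
# Sketch of the registry idea using standard XenAPI sm_config calls;
# the 'nbd_<uuid>' key format is an assumption.
def register_nbd(session, sr_ref, vdi_uuid, nbd_dev):
    # VDI attach: remember which /dev/nbdX backs this VDI.
    session.xenapi.SR.add_to_sm_config(sr_ref, 'nbd_%s' % vdi_uuid, nbd_dev)

def unregister_nbd(session, sr_ref, vdi_uuid):
    # VDI detach: drop the reference so no stale device lingers.
    session.xenapi.SR.remove_from_sm_config(sr_ref, 'nbd_%s' % vdi_uuid)
```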

@blodone
Collaborator

blodone commented Jul 16, 2018

A workaround is to detach the VDI, rename the UUID of the rbd image, set :uuid with image-meta set ..., rescan, and attach. Boot then works. It seems that when the unmap is interrupted, or something else fails to unlock, renaming fixes it without rebooting or re-attaching the whole SR.
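For reference, those steps translate roughly to this sketch; the `VHD-<uuid>` image naming and the `:uuid` metadata key are taken from the comment above, not verified against the plugin, so adapt the pool and names to your setup:

```python
# Hedged sketch of the rename workaround described above.
import subprocess
import uuid

def rename_stuck_vdi(pool, old_uuid, sr_uuid):
    new_uuid = str(uuid.uuid4())
    old_image = '%s/VHD-%s' % (pool, old_uuid)
    new_image = '%s/VHD-%s' % (pool, new_uuid)
    # 1. Rename the rbd image so the stale mapping/lock no longer matches.
    subprocess.check_call(['rbd', 'rename', old_image, new_image])
    # 2. Update the image metadata so the SR scan adopts the new UUID.
    subprocess.check_call(['rbd', 'image-meta', 'set', new_image,
                           ':uuid', new_uuid])
    # 3. Rescan the SR; the VDI reappears under the new UUID and can attach.
    subprocess.check_call(['xe', 'sr-scan', 'uuid=%s' % sr_uuid])
    return new_uuid
```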

@blodone
Collaborator

blodone commented Jul 20, 2018

For v2.0 rbd-nbd I created a patch to use names instead of /dev/nbdXX numbers:
#79
