64 data disk attached VM, after migration if reboot it won't boot #3

Open
rojarmj opened this issue Nov 21, 2017 · 5 comments

@rojarmj

rojarmj commented Nov 21, 2017

Steps:

  1. Attach 64 disks to the VM "KVM1__010_193".
  2. Reboot the VM.
    The reboot works fine.
  3. Migrate the VM to another host.
    The VM migrates and runs fine.
  4. Reboot the VM.
    The reboot fails.

The VM reboot failed; the console shows the following message:

( 300 ) Data Storage Exception [ 7e568e48 ]

R0 .. R7           R8 .. R15         R16 .. R23         R24 .. R31

000000007dbe0308 000000007dc43c98 0000000000000000 000000007dbe1940
000000007e45e6f0 000000007e4e9220 0000000000000000 0000000000000006
000000007dbffc00 000000007dc57130 0000000000000000 000000007dbf6800
000000007dc45000 0000000000000030 000000007dbe0d54 000000007e45f060
0000000000000000 000000007dbf8c18 000000007dbfe070 000000007dbf92b0
000000007e568e48 c000000007b80000 000000000000004e 0000000000000003
000000007e45b008 0000000000000000 000000007e55f4b5 3030303030302f73
000000007e45b010 0000000000000000 000000007e568e36 000000007e45b010

CR / XER           LR / CTR          SRR0 / SRR1        DAR / DSISR
    88000444   000000007dbe20f4   000000007dbe0c18   3030303030302f73  

0000000020000000 000000007dbe0bdc 8000000000001000 40000000

195 >

( 300 ) Data Storage Exception [ 7e568e48 ]

R0 .. R7           R8 .. R15         R16 .. R23         R24 .. R31

000000007dbe0308 000000007dc43ca0 0000000000000000 000000007dbe1940
000000007e45e6f0 000000007e4e9220 0000000000000000 0000000000000006
000000007dbffc00 000000007dc57130 0000000000000000 000000007dbf6800
000000007dc45000 0000000000000030 000000007dbe0d54 000000007e45f060
0000000000000000 000000007dbf8c18 000000007dbfe070 000000007dbf92b0
000000007e568e48 c000000007b80000 000000000000004e 0000000000000003
000000007e45b008 0000000000000000 000000007e55f4b5 3030303030302f73
000000007e45b010 0000000000000000 000000007e568e36 000000007e45b010

CR / XER           LR / CTR          SRR0 / SRR1        DAR / DSISR
    88000444   000000007dbe20f4   000000007dbe0c18   3030303030302f73  

0000000020000000 000000007dbe0bdc 8000000000001000 40000000

196 >

( 300 ) Data Storage Exception [ 7e568e48 ]

R0 .. R7           R8 .. R15         R16 .. R23         R24 .. R31

000000007dbe0308 000000007dc43ca8 0000000000000000 000000007dbe1940
000000007e45e6f0 000000007e4e9220 0000000000000000 0000000000000006
000000007dbffc00 000000007dc57130 0000000000000000 000000007dbf6800
000000007dc45000 0000000000000030 000000007dbe0d54 000000007e45f060
0000000000000000 000000007dbf8c18 000000007dbfe070 000000007dbf92b0
000000007e568e48 c000000007b80000 000000000000004e 0000000000000003
000000007e45b008 0000000000000000 000000007e55f4b5 3030303030302f73
000000007e45b010 0000000000000000 000000007e568e36 000000007e45b010

CR / XER           LR / CTR          SRR0 / SRR1        DAR / DSISR
    88000444   000000007dbe20f4   000000007dbe0c18   3030303030302f73  

0000000020000000 000000007dbe0bdc 8000000000001000 40000000

197 >

( 300 ) Data Storage Exception [ 7e568e48 ]

R0 .. R7           R8 .. R15         R16 .. R23         R24 .. R31

000000007dbe0308 000000007dc43cb0 0000000000000000 000000007dbe1940
000000007e45e6f0 000000007e4e9220 0000000000000000 0000000000000006
000000007dbffc00 000000007dc57130 0000000000000000 000000007dbf6800
000000007dc45000 0000000000000030 000000007dbe0d54 000000007e45f060
0000000000000000 000000007dbf8c18 000000007dbfe070 000000007dbf92b0
000000007e568e48 c000000007b80000 000000000000004e 0000000000000003
000000007e45b008 0000000000000000 000000007e55f4b5 3030303030302f73
000000007e45b010 0000000000000000 000000007e568e36 000000007e45b010

CR / XER           LR / CTR          SRR0 / SRR1        DAR / DSISR
    88000444   000000007dbe20f4   000000007dbe0c18   3030303030302f73  

0000000020000000 000000007dbe0bdc 8000000000001000 40000000

198 >

( 300 ) Data Storage Exception [ 7e568e48 ]

R0 .. R7           R8 .. R15         R16 .. R23         R24 .. R31

000000007dbe0308 000000007dc43cb8 0000000000000000 000000007dbe1940
000000007e45e6f0 000000007e4e9220 0000000000000000 0000000000000006
000000007dbffc00 000000007dc57130 0000000000000000 000000007dbf6800
000000007dc45000 0000000000000030 000000007dbe0d54 000000007e45f060
0000000000000000 000000007dbf8c18 000000007dbfe070 000000007dbf92b0
000000007e568e48 c000000007b80000 000000000000004e 0000000000000003
000000007e45b008 0000000000000000 000000007e55f4b5 3030303030302f73
000000007e45b010 0000000000000000 000000007e568e36 000000007e45b010

CR / XER           LR / CTR          SRR0 / SRR1        DAR / DSISR
    88000444   000000007dbe20f4   000000007dbe0c18   3030303030302f73  

0000000020000000 000000007dbe0bdc 8000000000001000 40000000

199 >

( 300 ) Data Storage Exception [ 7e568e48 ]

R0 .. R7           R8 .. R15         R16 .. R23         R24 .. R31

000000007dbe0308 000000007dc43cc0 0000000000000000 000000007dbe1940
000000007e45e6f0 000000007e4e9220 0000000000000000 0000000000000006
000000007dbffc00 000000007dc57130 0000000000000000 000000007dbf6800
000000007dc45000 0000000000000030 000000007dbe0d54 000000007e45f060
0000000000000000 000000007dbf8c18 000000007dbfe070 000000007dbf92b0
000000007e568e48 c000000007b80000 000000000000004e 0000000000000003
000000007e45b008 0000000000000000 000000007e55f4b5 3030303030302f73
000000007e45b010 0000000000000000 000000007e568e36 000000007e45b010

CR / XER           LR / CTR          SRR0 / SRR1        DAR / DSISR
    88000444   000000007dbe20f4   000000007dbe0c18   3030303030302f73  

0000000020000000 000000007dbe0bdc 8000000000001000 40000000

19a >

( 300 ) Data Storage Exception [ 7e568e48 ]

R0 .. R7           R8 .. R15         R16 .. R23         R24 .. R31

000000007dbe0308 000000007dc43cc8 0000000000000000 000000007dbe1940
000000007e45e6f0 000000007e4e9220 0000000000000000 0000000000000006
000000007dbffc00 000000007dc57130 0000000000000000 000000007dbf6800
000000007dc45000 0000000000000030 000000007dbe0d54 000000007e45f060
0000000000000000 000000007dbf8c18 000000007dbfe070 000000007dbf92b0
000000007e568e48 c000000007b80000 000000000000004e 0000000000000003
000000007e45b008 0000000000000000 000000007e55f4b5 3030303030302f73
000000007e45b010 0000000000000000 000000007e568e36 000000007e45b010

CR / XER           LR / CTR          SRR0 / SRR1        DAR / DSISR
    88000444   000000007dbe20f4   000000007dbe0c18   3030303030302f73  

0000000020000000 000000007dbe0bdc 8000000000001000 40000000

19b >

( 300 ) Data Storage Exception [ 7e568e48 ]

R0 .. R7           R8 .. R15         R16 .. R23         R24 .. R31

000000007dbe0308 000000007dc43cd0 0000000000000000 000000007dbe1940
000000007e45e6f0 000000007e4e9220 0000000000000000 0000000000000006
000000007dbffc00 000000007dc57130 0000000000000000 000000007dbf6800
000000007dc45000 0000000000000030 000000007dbe0d54 000000007e45f060
0000000000000000 000000007dbf8c18 000000007dbfe070 000000007dbf92b0
000000007e568e48 c000000007b80000 000000000000004e 0000000000000003
000000007e45b008 0000000000000000 000000007e55f4b5 3030303030302f73
000000007e45b010 0000000000000000 000000007e568e36 000000007e45b010

CR / XER           LR / CTR          SRR0 / SRR1        DAR / DSISR
    88000444   000000007dbe20f4   000000007dbe0c18   3030303030302f73  

0000000020000000 000000007dbe0bdc 8000000000001000 40000000

19c >

( 300 ) Data

@nvcastet

Full trace with latest SLOF:

Domain KVM1__010_193-f4c5d232-000000c1 started
Connected to domain KVM1__010_193-f4c5d232-000000c1
Escape character is ^]


SLOF **********************************************************************
QEMU Starting
 Build Date = Nov 20 2017 10:41:58
 FW Version = git-0cbcd1512b987603
 Press "s" to enter Open Firmware.

Press F12 for boot menu.

Populating /vdevice methods
Populating /vdevice/vty@30000000
Populating /vdevice/nvram@71000000
Populating /pci@800000020000000
                     00 2800 (D) : 1af4 1000    virtio [ net ]
                     00 2000 (D) : 1234 1111    qemu vga
                     00 1800 (D) : 1af4 1002    unknown-legacy-device*
                     00 1000 (D) : 1033 0194    serial bus [ usb-xhci ]
                     00 0800 (D) : 1af4 1004    virtio [ scsi ]
Populating /pci@800000020000000/scsi@1
       SCSI: Looking for devices
          100004100000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100004000000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100003f00000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100003e00000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100003d00000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100003c00000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100003b00000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100003a00000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100003900000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100003800000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100003700000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100003600000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100003500000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100003400000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100003300000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100003200000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100003100000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100003000000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100002f00000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100002e00000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100002d00000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100002c00000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100002b00000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100002a00000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100002900000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100002800000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100002700000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100002600000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100002500000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100002400000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100002300000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100002200000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100002100000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100002000000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100001f00000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100001e00000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100001d00000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100001c00000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100001b00000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100001a00000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100001900000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100001800000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100001700000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100001600000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100001500000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100001400000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100001300000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100001200000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100001100000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100001000000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100000f00000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100000e00000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100000d00000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100000c00000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100000b00000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100000a00000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100000900000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100000800000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100000700000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100000600000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100000500000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100000400000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100000300000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100000000000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100000300000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          100000000000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
No NVRAM common partition, re-initializing...
Installing QEMU fb



Scanning USB 
  XHCI: Initializing
    USB Keyboard 
    USB mouse 
No console specified using screen & keyboard
     
  Welcome to Open Firmware

  Copyright (c) 2004, 2017 IBM Corporation All rights reserved.
  This program and the accompanying materials are made available
  under the terms of the BSD License available at
  http://www.opensource.org/licenses/bsd-license.php

 

( 300 ) Data Storage Exception [ 7e462010 ]


    R0 .. R7           R8 .. R15         R16 .. R23         R24 .. R31
000000007dbe3128   000000007dc4a010   0000000000000000   000000007dc09910   
000000007e666000   000000007dbe1034   0000000000000000   0000000000000006   
000000007dc0f200   000000007e706e38   0000000000000000   000000007dc05000   
0000000000000054   000000007dc085a0   000000007dbe0dd8   000000007dc09740   
000000007dc4a018   000000007dc08538   0000000000000054   0000000000000003   
000000007e462010   0000000000000000   000000007e76b9eb   ffffffffffffffff   
000000007dc5e248   0000000000000000   000000007e773de5   000000007e462010   
000000007e462008   0000000000000000   0000000000000000   3030303031363030   

    CR / XER           LR / CTR          SRR0 / SRR1        DAR / DSISR
        80000404   000000007dbe4f34   000000007dbe1060   3030303031363030   
0000000020000000   000000007dbe1034   8000000000001000           40000000   


3 >  

( 300 ) Data Storage Exception [ 7dc4a018 ]


    R0 .. R7           R8 .. R15         R16 .. R23         R24 .. R31
000000007dbe213c   000000007e462010   0000000000000000   000000007dc09910   
000000007e666000   000000007e706ff0   0000000000000000   0000000000000006   
000000007dc0f200   000000007dc4a010   0000000000000000   000000007dc05000   
000000007dc4c000   0000000000000000   000000007dbe0dd8   000000007dc09740   
0000000000000000   000000007dc08538   000000000000004e   0000000000000003   
000000007dc4a018   0000000000000000   000000007e767695   ffffffffffffffff   
000000007e770d18   0000000000000000   000000007e770d06   000000007e462010   
000000007e462008   0000000000000000   0000000000000000   3130303030314630   

    CR / XER           LR / CTR          SRR0 / SRR1        DAR / DSISR
        80000404   000000007dbe2914   000000007dbe0c6c   3130303030314630   
0000000020000000   000000007dbe0c28   8000000000001000           40000000   


4 >  
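
A note on reading these dumps: the DAR values in both traces consist entirely of bytes in the printable ASCII range, which is unusual for a real data address. The short Python sketch below (the helper name `decode_register` is made up for illustration) decodes them as text. The results look like fragments of device paths such as `/pci@800000020000000/scsi@1` and the disk addresses listed in the SCSI scan, which may indicate that string data ended up where a pointer was expected. That is an interpretation, not something the traces state directly.

```python
def decode_register(hex_value: str) -> str:
    """Interpret a 64-bit register value as 8 big-endian ASCII bytes.

    Non-printable bytes are shown as '.' so the result is always readable.
    """
    raw = bytes.fromhex(hex_value)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# DAR values copied from the exception dumps in this thread.
dar_values = [
    "3030303030302f73",
    "3030303031363030",
    "3130303030314630",
]

for value in dar_values:
    print(value, "->", repr(decode_register(value)))
```

For comparison, a value such as LR (`000000007dbe4f34`) decodes mostly to non-printable bytes, as a genuine address would.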

@rojarmj
Author

rojarmj commented Nov 24, 2017

With up to 15 attached disks, booting works fine after migration; with 20 attached disks it fails.

@nikunjad
Contributor

I sent a patch that fixes this bug to the list. It's not related to migration; it is reproducible without migration.
http://patchwork.ozlabs.org/patch/842011/

Please test

@rojarmj
Author

rojarmj commented Nov 30, 2017

I tried the patch; it works fine.

@nikunjad
Contributor

nikunjad commented Dec 1, 2017

Thanks. After discussion on the list, I sent a v2; please test:

http://patchwork.ozlabs.org/patch/843450/

aik pushed a commit to aik/SLOF that referenced this issue Dec 13, 2017
We were concatenating the word " parse-load" with the $bootdev list and passing
the result to evaluate. Open-code the work of EVALUATE so that concatenation is
not required. "load" and "load-next" do not use $cat anymore.

Reported here: qemu/SLOF#3

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

aik pushed a commit to aik/SLOF that referenced this issue Dec 13, 2017
The catpad is 1K in size, which can easily overflow with around 20 devices
having bootindex. Replace the usage of $cat with a dynamically allocated
buffer (16K) here. Introduce new words to work on the buffer (allocate, free
and concatenate).

Reported here: qemu/SLOF#3

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
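
To make the size argument in the second commit message concrete, here is a rough Python sketch, not the SLOF code itself (which is Forth). It concatenates device paths into a buffer of fixed capacity and compares the 1K and 16K sizes named in the commit message. The path format, the helper names `example_paths` and `build_bootdev_list`, and the explicit overflow check are all assumptions for illustration: the paths are modelled on the PCI and SCSI addresses printed in the boot log, and the real catpad had no such check, which is presumably why it was overrun silently.

```python
CATPAD_SIZE = 1024            # 1K, as stated in the commit message
BOOTDEV_BUF_SIZE = 16 * 1024  # 16K, as stated in the commit message


def example_paths(count: int) -> list[str]:
    """Build hypothetical device paths shaped like those in the boot log."""
    return [
        f"/pci@800000020000000/scsi@1/disk@1{index:06x}00000000"
        for index in range(count)
    ]


def build_bootdev_list(paths: list[str], capacity: int) -> bytes:
    """Concatenate space-terminated paths, refusing to exceed `capacity`."""
    buf = bytearray()
    for path in paths:
        chunk = (path + " ").encode("ascii")
        if len(buf) + len(chunk) > capacity:
            # The sketch raises here; an unchecked pad would write past its end.
            raise OverflowError(
                f"need {len(buf) + len(chunk)} bytes, capacity is {capacity}"
            )
        buf += chunk
    return bytes(buf)


paths = example_paths(64)
print("bytes needed for 64 paths:", sum(len(p) + 1 for p in paths))
print("fits in 16K:", len(build_bootdev_list(paths, BOOTDEV_BUF_SIZE)) <= BOOTDEV_BUF_SIZE)
try:
    build_bootdev_list(paths, CATPAD_SIZE)
except OverflowError as exc:
    print("does not fit in 1K:", exc)
```

The exact number of devices at which 1K is exceeded depends on the real path lengths and on any extra text appended to the list, neither of which is shown in this thread, so the sketch should not be read as predicting the precise threshold.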