This repository has been archived by the owner on Jan 28, 2023. It is now read-only.

Isolinux 6.03 hangs #15

Closed
delfer opened this issue Dec 13, 2017 · 5 comments

@delfer

delfer commented Dec 13, 2017

I'm trying to install a Linux distro under QEMU with Intel HAXM acceleration.

Qemu version:
QEMU emulator version 2.10.95 (v2.11.0-rc5-11692-g50cdacc703-dirty)

Intel HAXM version:
v6.2.1

Environment:

  • Intel Core i5-4210M
  • Virtualization in BIOS enabled
  • Windows 10 64-bit
  • Hyper-V disabled

Steps to reproduce:

The same happens with:

  • archlinux-2017.12.01-x86_64.iso
  • CentOS-7-x86_64-NetInstall-1708.iso
  • debian-9.3.0-amd64-netinst.iso

Possible workaround:

  1. Do not use Intel HAXM
    Install the OS without -accel hax and enable it after installation
  2. Load the kernel without ISOLINUX
    • Extract linux and initrd.gz from the (Ubuntu) ISO
    • Start qemu-system-x86_64 -m 4095 -accel hax -kernel linux -initrd initrd.gz -append vga=788
@raphaelning
Contributor

This is actually a known issue, but thanks for the nice summary! Now we can track it on GitHub.

We haven't had the bandwidth to investigate it, but I'd appreciate any tips on debugging ISOLINUX, e.g. how to make it more verbose so we can get a better idea of where it hangs.

@delfer
Author

delfer commented Dec 14, 2017

git clone --recursive git://repo.or.cz/syslinux.git
vim syslinux/doc/isolinux.txt

ISOLINUX is by default built in two versions, one version with extra
debugging messages enabled. If you are having problems with ISOLINUX,
I would greatly appreciate if you could try out the debugging version
(isolinux-debug.bin) and let me know what it reports. The debugging
version does not include hybrid mode support

wget http://archive.ubuntu.com/ubuntu/dists/xenial/main/installer-amd64/current/images/netboot/mini.iso
wget https://www.kernel.org/pub/linux/utils/boot/syslinux/syslinux-6.03.tar.gz
tar -xvf syslinux-6.03.tar.gz
xorriso -indev mini.iso -map syslinux-6.03/bios/core/isolinux-debug.bin isolinux.bin -boot_image isolinux dir=/ -outdev mini-debug.iso


@delfer
Author

delfer commented Dec 14, 2017

http://repo.or.cz/syslinux.git/blob/HEAD:/core/isolinux.asm

Here are two debug messages: "Main image LBA =..." and "Image read, jumping to main code...". It hangs between them:

%ifdef DEBUG_MESSAGES
                mov si,offset_msg
                call writemsg
                call writehex8
                call crlf_early
%endif

                ; Load the rest of the file.  However, just in case there
                ; are still BIOSes with 64K wraparound problems, we have to
                ; take some extra precautions.  Since the normal load
                ; address (TEXT_START) is *not* 2K-sector-aligned, we round
                ; the target address upward to a sector boundary,
                ; and then move the entire thing down as a unit.
MaxLMA          equ 384*1024            ; Reasonable limit (384K)

                mov bx,((TEXT_START+2*SECTOR_SIZE-1) & ~(SECTOR_SIZE-1)) >> 4
                mov bp,[ImageSectors]
                push bx                 ; Load segment address

.more:
                push bx                 ; Segment address
                push bp                 ; Sector count
                mov es,bx
                mov cx,0xfff
                and bx,cx
                inc cx
                sub cx,bx
                shr cx,SECTOR_SHIFT - 4
                jnz .notaligned
                mov cx,0x10000 >> SECTOR_SHIFT  ; Full 64K segment possible
.notaligned:
                cmp bp,cx
                jbe .ok
                mov bp,cx
.ok:
                xor bx,bx
                push bp
                push eax
                call getlinsec
                pop eax
                pop cx
                movzx edx,cx
                pop bp
                pop bx

                shl cx,SECTOR_SHIFT - 4
                add bx,cx
                add eax,edx
                sub bp,dx
                jnz .more

                ; Move the image into place, and also verify the
                ; checksum
                pop ax                          ; Load segment address
                mov bx,(TEXT_START + SECTOR_SIZE) >> 4
                mov ecx,[ImageDwords]
                mov edi,[FirstSecSum]           ; First sector checksum
                xor si,si

move_verify_image:
.setseg:
                mov ds,ax
                mov es,bx
.loop:
                mov edx,[si]
                add edi,edx
                dec ecx
                mov [es:si],edx
                jz .done
                add si,4
                jnz .loop
                add ax,1000h
                add bx,1000h
                jmp .setseg
.done:
                mov ax,cs
                mov ds,ax
                mov es,ax

                ; Verify the checksum on the loaded image.
                cmp [bi_csum],edi
                je integrity_ok

                mov si,checkerr_msg
                call writemsg
                jmp kaboom

integrity_ok:
%ifdef DEBUG_MESSAGES
                mov si,allread_msg
                call writemsg
%endif
                jmp all_read                    ; Jump to main code
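The load loop above splits the image into several getlinsec calls so that no single read crosses a 64K physical boundary. A minimal C sketch of that arithmetic, assuming SECTOR_SHIFT = 11 (2048-byte CD-ROM sectors) and a sector-aligned start segment; plan_chunks() and struct chunk are hypothetical names used only for illustration, and the BIOS read itself is not modeled:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 2048-byte CD-ROM sectors, as used by ISOLINUX. */
#define SECTOR_SHIFT 11

/* One hypothetical getlinsec call: load `count` sectors starting at
   `lba` into real-mode address segment:0000. */
struct chunk {
    uint16_t segment;
    uint32_t lba;
    uint16_t count;
};

/* Mirrors the .more/.notaligned/.ok loop. `segment` is the load segment,
   `sectors` the image size in sectors. Returns the number of chunks
   written to `out` (at most `max`). */
static size_t plan_chunks(uint16_t segment, uint32_t lba, uint16_t sectors,
                          struct chunk *out, size_t max)
{
    size_t n = 0;

    assert(out != NULL || max == 0);
    while (sectors != 0 && n < max) {
        /* Paragraphs (16-byte units) left before the next 64K boundary. */
        uint16_t room = (uint16_t)(0x1000 - (segment & 0x0fff));
        uint16_t count = room >> (SECTOR_SHIFT - 4);

        if (count == 0)                      /* jnz .notaligned not taken */
            count = 0x10000 >> SECTOR_SHIFT; /* "Full 64K segment possible" */
        if (sectors < count)                 /* cmp bp,cx / jbe .ok */
            count = sectors;

        out[n].segment = segment;
        out[n].lba = lba;
        out[n].count = count;
        n++;

        /* Advance the segment, LBA and remaining count, as after .ok. */
        segment = (uint16_t)(segment + (count << (SECTOR_SHIFT - 4)));
        lba += count;
        sectors = (uint16_t)(sectors - count);
    }
    return n;
}
```

For example, plan_chunks(0x0800, 100, 40, c, 8) yields two reads: 16 sectors at segment 0x0800, then 24 sectors at segment 0x1000, because the first read must stop at linear address 0x10000.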

@raphaelning
Contributor

Thanks a lot! This is a good starting point for anyone who wants to look into the issue. With some patience, one should be able to insert more debug messages and further narrow down the hang.

@raphaelning raphaelning self-assigned this Mar 26, 2018
raphaelning added a commit that referenced this issue Mar 26, 2018
For some variants of the OUTS instruction, handle_string_io() fails
to determine the correct guest virtual address (GVA) from which to
copy data. For example, the long-standing issue where ISOLINUX
boots to a hang under HAXM is in fact due to misemulation of the
following real-mode instruction (part of rom16.o of SeaBIOS):

 26 67 f3 6f  rep outsl %es:(%si),(%dx)

(Cf. outsw_fl() in src/farptr.h of SeaBIOS source tree. For the
record, it is called by ata_atapi_process_op() and eventually by
ISOLINUX via the INT 13h AH=42h BIOS interface.)

The disassembler treats it as a 32-bit instruction, thus the wrong
operand size. But one thing is clear: the instruction overrides the
default segment (DS) with ES, so the GVA should be ES:SI. However,
the current handle_string_io() logic does not parse the instruction
and assumes that the GVA is always DS:SI for OUTS. As a result, it
reads the wrong data into the I/O buffer.

Fix this bug by utilizing the Guest-Linear Address field of VMCS,
which is convenient and is guaranteed to give the correct GVA.

+ Remove the old hack for INS emulation. It is unclear why it was
  needed, but it doesn't seem necessary now.

Fixes #15.
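The key point of the commit log is that a segment-override prefix changes which segment OUTS reads from. A minimal C sketch, not HAXM's actual code: decode_outs(), real_mode_base() and the types are hypothetical, and only the legacy prefixes needed for 26 67 f3 6f are handled. In real mode a segment base is the selector shifted left by 4, so the register values in the HAXM log further down this thread (DS base 0xdc800, ES base 0xec2a0, SI = 0) yield different source addresses depending on which segment is used:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Ordered like the x86 segment-register encoding. */
enum seg_reg { SEG_ES, SEG_CS, SEG_SS, SEG_DS, SEG_FS, SEG_GS };

struct outs_insn {
    enum seg_reg seg;   /* source segment: DS unless overridden */
    bool op_size_ovr;   /* 0x66: toggle 16/32-bit operand size */
    bool addr_size_ovr; /* 0x67: toggle 16/32-bit address size */
    bool rep;           /* 0xF3: REP prefix */
    unsigned width;     /* bytes per transfer: 1, 2 or 4 */
};

/* Decode legacy prefixes followed by OUTSB (0x6E) or OUTSW/OUTSD (0x6F).
   default32 is false for real-mode (16-bit) code.
   Returns the instruction length, or 0 if this is not OUTS. */
static size_t decode_outs(const uint8_t *insn, size_t len, bool default32,
                          struct outs_insn *out)
{
    assert(insn != NULL && out != NULL);
    *out = (struct outs_insn){ .seg = SEG_DS };
    for (size_t i = 0; i < len; i++) {
        switch (insn[i]) {
        case 0x26: out->seg = SEG_ES; break;
        case 0x2E: out->seg = SEG_CS; break;
        case 0x36: out->seg = SEG_SS; break;
        case 0x3E: out->seg = SEG_DS; break;
        case 0x64: out->seg = SEG_FS; break;
        case 0x65: out->seg = SEG_GS; break;
        case 0x66: out->op_size_ovr = true; break;
        case 0x67: out->addr_size_ovr = true; break;
        case 0xF3: out->rep = true; break;
        case 0x6E:
            out->width = 1;
            return i + 1;
        case 0x6F:
            /* 32-bit operand size iff exactly one of default32/0x66 applies. */
            out->width = (default32 != out->op_size_ovr) ? 4 : 2;
            return i + 1;
        default:
            return 0;
        }
    }
    return 0;
}

/* Real-mode segment base: selector * 16. */
static uint32_t real_mode_base(uint16_t selector)
{
    return (uint32_t)selector << 4;
}
```

With default32 = false, 26 67 f3 6f decodes as REP OUTSW sourced from ES, 2 bytes per transfer, which matches the "REP OUTSW" label in the HAXM log; only a disassembler assuming 32-bit code prints it as outsl. With SI = 0, ES gives source address 0xec2a0, whereas assuming DS gives 0xdc800, the start= value HAXM logged before the patch.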
@raphaelning
Contributor

Finally, I spent some time fixing the bug (with #36). It was a small patch, but the debugging did require patience :) Now both the Ubuntu mini and desktop ISO images boot to the installer GUI with -accel hax.

As I said in the commit log, the instruction that led to the hang was actually in real-mode SeaBIOS code. In the ISOLINUX code quoted by @delfer above, call getlinsec (under .ok) eventually calls into SeaBIOS to read data from the CD-ROM. Due to inaccurate emulation of string instructions, a particular rep outsw instruction kept copying the wrong data to I/O port 0x170. If we dump all outsw instructions with destination port 0x170, we can see the difference.

First, with TCG:

out(2) @ 0x0:0xedbfc: port=0x170, data=0000, cr0=0x11, ds_sel/base=0x10/0x0, es_sel/base=0x10/0x0, rsi=0x6671
out(2) @ 0x0:0xedc0d: port=0x170, data=0000, cr0=0x11, ds_sel/base=0x10/0x0, es_sel/base=0x10/0x0, rsi=0x6673
out(2) @ 0x0:0xedc0d: port=0x170, data=0000, cr0=0x11, ds_sel/base=0x10/0x0, es_sel/base=0x10/0x0, rsi=0x6675
out(2) @ 0x0:0xedc0d: port=0x170, data=0000, cr0=0x11, ds_sel/base=0x10/0x0, es_sel/base=0x10/0x0, rsi=0x6677
out(2) @ 0x0:0xedc0d: port=0x170, data=0000, cr0=0x11, ds_sel/base=0x10/0x0, es_sel/base=0x10/0x0, rsi=0x6679
out(2) @ 0x0:0xedc0d: port=0x170, data=0000, cr0=0x11, ds_sel/base=0x10/0x0, es_sel/base=0x10/0x0, rsi=0x667b
out(2) @ 0x0:0xedbf2: port=0x170, data=0028, cr0=0x11, ds_sel/base=0x10/0x0, es_sel/base=0x10/0x0, rsi=0x66b1
out(2) @ 0x0:0xedc0d: port=0x170, data=0000, cr0=0x11, ds_sel/base=0x10/0x0, es_sel/base=0x10/0x0, rsi=0x66b3
out(2) @ 0x0:0xedc0d: port=0x170, data=1100, cr0=0x11, ds_sel/base=0x10/0x0, es_sel/base=0x10/0x0, rsi=0x66b5
out(2) @ 0x0:0xedc0d: port=0x170, data=0000, cr0=0x11, ds_sel/base=0x10/0x0, es_sel/base=0x10/0x0, rsi=0x66b7
out(2) @ 0x0:0xedc0d: port=0x170, data=0001, cr0=0x11, ds_sel/base=0x10/0x0, es_sel/base=0x10/0x0, rsi=0x66b9
out(2) @ 0x0:0xedc0d: port=0x170, data=0000, cr0=0x11, ds_sel/base=0x10/0x0, es_sel/base=0x10/0x0, rsi=0x66bb
out(2) @ 0x0:0xedbf2: port=0x170, data=0028, cr0=0x11, ds_sel/base=0x10/0x0, es_sel/base=0x10/0x0, rsi=0x66b1
out(2) @ 0x0:0xedbf2: port=0x170, data=0000, cr0=0x11, ds_sel/base=0x10/0x0, es_sel/base=0x10/0x0, rsi=0x66b3
out(2) @ 0x0:0xedbf2: port=0x170, data=4c00, cr0=0x11, ds_sel/base=0x10/0x0, es_sel/base=0x10/0x0, rsi=0x66b5
out(2) @ 0x0:0xedbf2: port=0x170, data=0000, cr0=0x11, ds_sel/base=0x10/0x0, es_sel/base=0x10/0x0, rsi=0x66b7
out(2) @ 0x0:0xedbf2: port=0x170, data=0001, cr0=0x11, ds_sel/base=0x10/0x0, es_sel/base=0x10/0x0, rsi=0x66b9
out(2) @ 0x0:0xedbf2: port=0x170, data=0000, cr0=0x11, ds_sel/base=0x10/0x0, es_sel/base=0x10/0x0, rsi=0x66bb
out(2) @ 0x0:0xedbf2: port=0x170, data=0028, cr0=0x11, ds_sel/base=0x10/0x0, es_sel/base=0x10/0x0, rsi=0x66b1
out(2) @ 0x0:0xedbf2: port=0x170, data=0000, cr0=0x11, ds_sel/base=0x10/0x0, es_sel/base=0x10/0x0, rsi=0x66b3
out(2) @ 0x0:0xedbf2: port=0x170, data=3b6f, cr0=0x11, ds_sel/base=0x10/0x0, es_sel/base=0x10/0x0, rsi=0x66b5
out(2) @ 0x0:0xedbf2: port=0x170, data=0000, cr0=0x11, ds_sel/base=0x10/0x0, es_sel/base=0x10/0x0, rsi=0x66b7
out(2) @ 0x0:0xedbf2: port=0x170, data=0001, cr0=0x11, ds_sel/base=0x10/0x0, es_sel/base=0x10/0x0, rsi=0x66b9
out(2) @ 0x0:0xedbf2: port=0x170, data=0000, cr0=0x11, ds_sel/base=0x10/0x0, es_sel/base=0x10/0x0, rsi=0x66bb
out(2) @ 0xf0000:0x9b0d: port=0x170, data=0028, cr0=0x10, ds_sel/base=0xd900/0xd9000, es_sel/base=0xe8cd/0xe8cd0, rsi=0xc
out(2) @ 0xf0000:0x9b38: port=0x170, data=0000, cr0=0x10, ds_sel/base=0xd900/0xd9000, es_sel/base=0xe8cd/0xe8cd0, rsi=0xe
out(2) @ 0xf0000:0x9b38: port=0x170, data=3c6f, cr0=0x10, ds_sel/base=0xd900/0xd9000, es_sel/base=0xe8cd/0xe8cd0, rsi=0x10
out(2) @ 0xf0000:0x9b38: port=0x170, data=0000, cr0=0x10, ds_sel/base=0xd900/0xd9000, es_sel/base=0xe8cd/0xe8cd0, rsi=0x12
out(2) @ 0xf0000:0x9b38: port=0x170, data=000f, cr0=0x10, ds_sel/base=0xd900/0xd9000, es_sel/base=0xe8cd/0xe8cd0, rsi=0x14
out(2) @ 0xf0000:0x9b38: port=0x170, data=0000, cr0=0x10, ds_sel/base=0xd900/0xd9000, es_sel/base=0xe8cd/0xe8cd0, rsi=0x16
out(2) @ 0xf0000:0x9b03: port=0x170, data=0028, cr0=0x10, ds_sel/base=0xd900/0xd9000, es_sel/base=0xe8cd/0xe8cd0, rsi=0xc
out(2) @ 0xf0000:0x9b03: port=0x170, data=0000, cr0=0x10, ds_sel/base=0xd900/0xd9000, es_sel/base=0xe8cd/0xe8cd0, rsi=0xe
out(2) @ 0xf0000:0x9b03: port=0x170, data=4b6f, cr0=0x10, ds_sel/base=0xd900/0xd9000, es_sel/base=0xe8cd/0xe8cd0, rsi=0x10
out(2) @ 0xf0000:0x9b03: port=0x170, data=0000, cr0=0x10, ds_sel/base=0xd900/0xd9000, es_sel/base=0xe8cd/0xe8cd0, rsi=0x12
out(2) @ 0xf0000:0x9b03: port=0x170, data=0006, cr0=0x10, ds_sel/base=0xd900/0xd9000, es_sel/base=0xe8cd/0xe8cd0, rsi=0x14
out(2) @ 0xf0000:0x9b03: port=0x170, data=0000, cr0=0x10, ds_sel/base=0xd900/0xd9000, es_sel/base=0xe8cd/0xe8cd0, rsi=0x16
...

Then, with HAXM (before patch):

haxm_info:*** REP OUTSW (6) port=0x170 @ 0x0:0xf0c39, start=0x6681, cr0=0x11, ds_sel/base=0x10/0x0, es_sel/base=0x10/0x0, rsi=0x6681, data[0..5]=0x0 0x0 0x0 0x0 0x0 0x0
haxm_info:*** REP OUTSW (6) port=0x170 @ 0x0:0xf0c39, start=0x66c1, cr0=0x11, ds_sel/base=0x10/0x0, es_sel/base=0x10/0x0, rsi=0x66c1, data[0..5]=0x28 0x0 0x1100 0x0 0x1 0x0
haxm_info:*** REP OUTSW (6) port=0x170 @ 0x0:0xf0c39, start=0x66c1, cr0=0x11, ds_sel/base=0x10/0x0, es_sel/base=0x10/0x0, rsi=0x66c1, data[0..5]=0x28 0x0 0x4c00 0x0 0x1 0x0
haxm_info:*** REP OUTSW (6) port=0x170 @ 0x0:0xf0c39, start=0x66c1, cr0=0x11, ds_sel/base=0x10/0x0, es_sel/base=0x10/0x0, rsi=0x66c1, data[0..5]=0x28 0x0 0x3b6f 0x0 0x1 0x0
haxm_info:*** REP OUTSW (6) port=0x170 @ 0xf0000:0xaa51, start=0xdc800, cr0=0x10, ds_sel/base=0xdc80/0xdc800, es_sel/base=0xec2a/0xec2a0, rsi=0x0, data[0..5]=0x0 0x0 0x0 0x0 0x0 0x0
...

By comparison, the last outsw (the only one in real mode) in the HAXM log had the wrong data. Interestingly, for the same instructions, HAXM logged very different memory addresses (IP, DS, ES, SI, etc.) than TCG did (and only TCG's IPs matched the disassembly), yet HAXM was able to boot the guest. Maybe Intel VT-x needs to rearrange instructions for Unrestricted Guest mode?
