Unable to boot smartos-live at Intel NUC 7i5BNH #727

Closed
ricco386 opened this Issue Jun 7, 2017 · 58 comments

ricco386 commented Jun 7, 2017

I have downloaded the latest image from the official page and created a USB image according to the wiki. When I try to boot from the USB stick, the system freezes.

Build version:
SunOS Release 5.11 Version joyent_20170601T212107Z 64-bit

Hardware:
Intel NUC 7i5BNH
Intel Core i5 7260U

Last messages in verbose boot mode:

ramdisk0 at root
ramdisk0 is /ramdisk
root on /ramdisk:a fstype ufs

Possibly related to #722: without verbose boot mode, I am stuck just after GRUB with a blinking cursor.

rmustacc Jun 8, 2017

Member

Hmm, so I guess these are the Kaby Lake based NUCs, correct? So, in this case we are getting to the OS, but something is going wrong in early boot it sounds like. We may need to use kmdb to step through a bit and see what's going on.

ricco386 Jun 28, 2017

Yes, it is. I have done further testing and narrowed the issue down. It seems that Intel Skylake/Kaby Lake processors have broken hyper-threading; I came across the issue on the Debian mailing list. Unfortunately, in my case disabling hyper-threading in the BIOS didn't help.

When I first reported the issue, the behaviour was the same both when booting from USB and when booting SmartOS under nested virtualization in KVM on Linux. In the meantime I can no longer reproduce the issue in KVM (with or without hyper-threading), but it is still present when booting from USB directly on the hardware (both with and without hyper-threading).

BIOS info:

Intel Desktop Board NUC7i5BNB
BIOS version: BNKBL357.86A.0042.2017.0303.1854 (currently the latest BIOS according to Intel's website).

More info about my hardware (obtained after booting CentOS):

[root@nanuk ~]# dmidecode -t processor | grep HTT
		HTT (Multi-threading)
[root@nanuk ~]# cat /proc/cpuinfo |grep processor
processor	: 0
processor	: 1
processor	: 2
processor	: 3
[root@nanuk ~]# cat /proc/cpuinfo |grep name
model name	: Intel(R) Core(TM) i5-7260U CPU @ 2.20GHz
model name	: Intel(R) Core(TM) i5-7260U CPU @ 2.20GHz
model name	: Intel(R) Core(TM) i5-7260U CPU @ 2.20GHz
model name	: Intel(R) Core(TM) i5-7260U CPU @ 2.20GHz

After disabling hyper-threading:

[root@nanuk ~]# dmidecode -t processor | grep HTT
		HTT (Multi-threading)
[root@nanuk ~]# cat /proc/cpuinfo |grep processor
processor	: 0
processor	: 1
[root@nanuk ~]# cat /proc/cpuinfo |grep name
model name	: Intel(R) Core(TM) i5-7260U CPU @ 2.20GHz
model name	: Intel(R) Core(TM) i5-7260U CPU @ 2.20GHz

I am able to select SmartOS + kmdb, and I get the debug prompt:

Loading kmdb...

Welcome to kmdb
Loaded modules: [ unix krtld genuix ]
[0]>

But I am not sure what to do next to help resolve this issue.

vrou Jun 30, 2017

Hi,

I have the same issue. I am using the NUC7I5BNH kit, which has the same CPU.

I have upgraded to BIOS version 47, which should fix the HT bug, but I had no luck getting any further than you did.

vrou Jul 1, 2017

I researched a bit more yesterday, and found the following reference to a boot message I also got when I booted the SmartOS image:
WARNING: Couldn't read ACPI SRAT table from BIOS. lgrp support will be limited to one group.
http://blog.alainodea.com/en/article/445/smartos-on-multi-socket-and-the-acpi-srat

Could this NUMA/ACPI SRAT conflict be the reason behind this?

Today I also tried to boot on a OmniOS r151022 stable release, which ended up with the same result I got with the latest SmartOS image, minus the

rmustacc Jul 1, 2017

Member

The NUMA/ACPI SRAT warning is probably a red herring here. In general, we'd like to be able to inject an NMI when this happens, but I don't think there's a good way to do that on the NUC. Maybe a first-order approximation is to add the -v flag to the boot options in GRUB to get slightly more verbose output. This won't tell us exactly where it hangs (usually the hang is in the very next step), but it may give us something a bit more informative for where to start looking in kmdb.

The idea with kmdb is that we could single step to the point where we know it fails, but without having a starting point, that might be a little too painful. Alternatively, we may be able to enable the alternate break sequence (@jclulow do you remember how to do that exactly?) and use that to drop into kmdb when we're stuck.

Thank you for your patience with this problem.

ricco386 Jul 3, 2017

@vrou If ACPI SRAT is not available in your BIOS configuration, disable NUMA/Node interleaving. Otherwise the following message may appear during boot time:
WARNING: Couldn't read ACPI SRAT table from BIOS. lgrp support will be limited to one group.

@rmustacc
The problem seems to be in startup_modules; however, I have encountered two different scenarios. If I set a breakpoint at startup_end, the debugger freezes. With a breakpoint at segkmem_gc the debugger also freezes, but the system seems to continue booting until this message in verbose mode:

ramdisk0 at root
ramdisk0 is /ramdisk
root on /ramdisk:a fstype ufs

I have tried to find the point where the debugger freezes. It seems that setup_ddi loops and freezes at some point. I set a breakpoint at e_ddi_instance_init; execution gets there, but the debugger prompt is frozen.

With a breakpoint set at get_neighbors and the loop advanced with continue, I ended up with a frozen debugger. The last message is:

kmdb: stop at ndi_hold_devi
kmdb: target stopped at:
ndi_hold_devi: pushq %rbp

jclulow Jul 3, 2017

Member

It's possible that you can break into the debugger with the system console keyboard with one of the key combinations F1+A, or Shift+Pause. Does that work?

ricco386 Jul 4, 2017

@jclulow No. Once it freezes, it stays frozen and I can only turn the machine off with the power button; no other action is possible (I have tried your key combinations and others as well).

secult Jul 17, 2017

I'm reporting a freeze at the same place with:
Lenovo Thinkpad T470p
Intel Core i5-7300HQ
BIOS is up to date (v 1.13) from June 2017.
I managed to work around it by disabling both Intel SpeedStep technology and CPU Power Management in the BIOS.

YanChii Jul 17, 2017

Unfortunately, the Intel NUC doesn't have an option to disable SpeedStep, so we cannot test it. It seems like a good hint anyway.

nerakhon Jul 28, 2017

Do we know if the Broadwell NUCs still work? I am deciding which NUC to purchase for my home SmartOS box, so I'd like to know whether it is safe to use the Broadwell one, or whether there is a valid workaround for the Skylake/Kaby Lake ones.

ricco386 Jul 28, 2017

In February I tried SmartOS on an Intel NUC 5I7RYH, which has an Intel Core i7 5557U 3.4GHz (Broadwell) CPU, and it worked fine.

My colleague @secult installed Danube Cloud (a SmartOS-based system) on this NUC and wrote a blog post about it.

I have experienced issues only with this newer NUC (Skylake/Kaby Lake) version.

pims Sep 26, 2017

I’m also interested in purchasing a NUC to run SmartOS.
Did anyone figure this out?

gbmeuk Oct 3, 2017

@pims, I run on i3 and i5 fine but did have to update the BIOS on i5. Afterwards, everything works fine.

ricco386 Oct 5, 2017

@pims Both Skylake/Kaby Lake and Broadwell come in i3 and i5 variants. It worked fine on Broadwell; the reported issue is on Skylake/Kaby Lake.
@gbmeuk Which one do you have?

rmustacc Oct 5, 2017

Member

I'll have to double check things, but IIRC, the Skylake-based processors were fine and we've only had issues on Kaby Lake so far.

davefinster Oct 25, 2017

I seem to have struck this issue with an associate's custom-built machine. The CPU is an Intel® Core i3-8100 (Coffee Lake) on a Gigabyte Z370M D3H.

I went through and disabled all the shiny BIOS features, including SpeedStep. This particular CPU doesn't have Hyper-Threading and, being a consumer board, it of course doesn't have any out-of-band management. The key combinations suggested by jclulow are also ineffective.

YanChii Oct 25, 2017

Hi,
I have more info about the problem. The frozen console is ACPI related. I've traced down the instruction where the console becomes unresponsive. Steps to reproduce:

  1. at the kmdb prompt before kernel init, set
::bp -Dn 1 acpica`AcpiOsWritePort
:c
  2. at the breakpoint, set
.+0x95::bp
:c
  3. the problematic instruction is outb (%dx) (with %rdx=0xb2 and %rax=0xa0)
    Stepping over this instruction freezes the debugger.

Looks like an acpica bug?

Anyway I've managed to get over this by rewriting the outb destination (needs to be done several times because of loops):

0>dx
:c

With this, I'm able to continue debugging far beyond the point where @ricco386 ended up with an unresponsive console.

The actual problem is in the xhci`xhci_controller_takeover function, more specifically the instruction movl %edx,(%rsi) in ddi_io_put32+0x15.

Steps to get there:

::bp -Dn 1 -c '.+0x95::bp -c "0>dx;::cont" ; ::cont' acpica`AcpiOsWritePort
::bp -Dn 1 -c 'ddi_io_put32+0x15::bp ; ::cont' xhci`xhci_controller_takeover
:c

If you need any more info, please ask. Currently, I don't know how else to help.
Thank you in advance for looking at it.

Jan

rmustacc Oct 25, 2017

Member

@YanChii Thanks for the additional data here. Can you share the stack at the point where you're skipping over the AcpiOsWritePort write?

Regarding the xhci takeover, there are a couple of different things that this could mean. One is that depending on how the keyboard is wired up in this case (presumably over USB) that we lose the ability to actually use the keyboard until the stack brings up enough of the other plumbing.

The second possibility (maybe more likely) is that we're writing a value in such a way that firmware doesn't like it. This may or may not have something to do with the ACPI tables that are present and the ports that we're not writing to. The thing that might help is to know what the arguments to xhci_put32() were at the time of that last ddi_io_put32. I'm assuming that we're in the path that's trying to take over BIOS ownership, but it's possible we're not.

YanChii Oct 26, 2017

@rmustacc
Here is the stack:

acpica`AcpiOsWritePort+0x95(b2, a0, 8)
acpica`AcpiHwWritePort+0xd6(b2, a0, 8)
acpica`AcpiHwSetMode+0x200(1)
acpica`AcpiEnable+0x56()
acpica`AcpiEnableSubsystem+0x124(0)
acpica`acpica_init+0x114()
acpidev`acpidev_initialize+0x105()
acpidev`acpidev_boot_probe+0x6d(0)
impl_bus_initialprobe+0x65()
impl_setup_ddi+0xcb()
create_devinfo_tree+0xbd()
setup_ddi+0x13()
startup_modules+0xf6()
startup+0x55()
main+0x3b()
_locore_start+0x90()

The keyboard is connected using USB.
I've tried setting both registers to zero before stepping over movl %edx,(%rsi). This kept the keyboard working, but I had to do it multiple times (probably some loop) and ended up in a double fault state later.

Regarding the parameters of xhci_put32(), the arguments in C code are
xhci_put32(xhcip, XHCI_R_CAP, off, val)
But I don't see any arguments passed in ::stack, nor the values pushed onto the stack before the call to xhci_put32().

Any hint here?
Thx.
J.

YanChii Oct 26, 2017

I'm also including some screenshots:
img_20171026_093904
img_20171026_093926
img_20171025_165852

rmustacc Oct 26, 2017

Member

The reason you won't see the values pushed on the stack there is probably because ddi_io_put32 doesn't push a frame pointer. However, knowing where we're blowing up in the xhci driver gives me an idea here. I'm going to put together a change and build an image (and provide a patch) that may address the takeover freeze. I'll try to have something before too long.

Once that's done, I'll try and get back to the acpi issue. Thanks for your help digging into this.

rmustacc Nov 2, 2017

Member

So, there are a lot of different ways that the takeover can occur. Importantly, it's possible for ACPI to have done this such that the driver doesn't have to. The register for this capability supports 8-bit reads and writes; however, we're always doing a 32-bit write, which means that we're rewriting the BIOS owned value. As a result, I've experimented by doing a single 8-bit write on the upper byte.

The patch also has some additional logging just to help us understand what's happening. The patch is here:

diff --git a/usr/src/uts/common/io/usb/hcd/xhci/xhci.c b/usr/src/uts/common/io/usb/hcd/xhci/xhci.c
index 84bd1a1..c23f66d 100644
--- a/usr/src/uts/common/io/usb/hcd/xhci/xhci.c
+++ b/usr/src/uts/common/io/usb/hcd/xhci/xhci.c
@@ -1510,9 +1510,22 @@ xhci_controller_takeover(xhci_t *xhcip)
 		return (B_FALSE);
 	}
 
+	xhci_error(xhcip, "^Capability register: 0x%x", val);
 	if (val & XHCI_BIOS_OWNED) {
+		uint8_t own;
+
+		xhci_error(xhcip, "^XHCI is BIOS owned, attempting take over");
+
+		/*
+		 * The USB Legacy support capability (USBLEGSUP) register allows
+		 * for byte-wide accesses. We take advantage of this for trying
+		 * to write the OS owned register. We do this so we don't end up
+		 * having to risk writing the HC BIOS owned bit, which could
+		 * potentially cause problems.
+		 */
 		val |= XHCI_OS_OWNED;
-		xhci_put32(xhcip, XHCI_R_CAP, off, val);
+		own = (val >> 24) & 0xff;
+		xhci_put8(xhcip, XHCI_R_CAP, off + 3, own);
 		if (xhci_check_regs_acc(xhcip) != DDI_FM_OK) {
 			xhci_error(xhcip, "failed to write BIOS take over "
 			    "registers: encountered fatal FM register error");
@@ -1520,6 +1533,7 @@ xhci_controller_takeover(xhci_t *xhcip)
 			    DDI_SERVICE_LOST);
 			return (B_FALSE);
 		}
+		xhci_error(xhcip, "^Made it past the takeover write");
 
 		/*
 		 * Wait up to 5 seconds for things to change. While this number
@@ -1529,6 +1543,7 @@ xhci_controller_takeover(xhci_t *xhcip)
 		 */
 		ret = xhci_reg_poll(xhcip, XHCI_R_CAP, off,
 		    XHCI_BIOS_OWNED | XHCI_OS_OWNED, XHCI_OS_OWNED, 500, 10);
+		xhci_error(xhcip, "^take over register poll completed with %d", ret);
 		if (ret == EIO)
 			return (B_FALSE);
 		if (ret == ETIMEDOUT) {

I've also gone ahead and made a new set of platform images with the vga as the default console which are available here:

Raw Platform
ISO Image
USB Image

Thanks for your patience on this one.

YanChii Nov 2, 2017

Thanks, I'll try the new image and get back asap.
Jan

YanChii Nov 3, 2017

Now the keyboard freezes at

ddi_io_put8+0x15  (movb %dl,(%rsi))
xhci_put8
xhci_controller_takeover

The actual code makes it further, printing all your debug messages (and then freezes somewhere else). But there's not much I can do without a keyboard.
img_20171103_092143
img_20171103_101826
Mdb to get there:

::bp -Dn 1 -c '.+0x95::bp -c "0>dx;::cont" ; ::cont' acpica`AcpiOsWritePort
::bp -Dn 1 -c 'ddi_io_put8+0x15::bp; ::cont' xhci`xhci_controller_takeover
::cont

Thank you for your help.
Jan

rmustacc Nov 3, 2017

Member

Hmm. So likely the change I had isn't actually changing anything and we're getting stuck elsewhere. I'm not entirely sure what the right next step is. I have two different things to try to help narrow this down.

The first is to do the ACPI workaround, but skip the xhci bits. This will be done by adding a -B disable-xhci=true to the boot arguments. I know this'll mean that we don't have a keyboard, but it might help us narrow down what's going on.

A second course of action is to try and turn on moddebug and just see if that gives us more information by chance. What we'll want to do is set moddebug the following way in mdb:

> moddebug/W 0xF0040000

That'll give us a bunch of additional information about what is happening. I'm hoping that between the two we might be able to narrow it down. I realize that the fact that we can't use the USB keyboard in this environment in kmdb really hurts -- I'm sorry I don't have a better option at this time.
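
As a small aid for reading the value above, here is a minimal Python sketch that only breaks `0xF0040000` down into its individual set bits. The meaning of each bit is defined by the kernel's moddebug flags and is not spelled out in this thread, so the sketch deliberately does not name them; the helper name `set_bits` is made up for illustration.

```python
# Break the moddebug value from the comment above into its individual bits.
# This is plain arithmetic; it does not claim what each flag means.
MODDEBUG_VALUE = 0xF0040000


def set_bits(value):
    """Return the positions of all bits that are set in `value`, lowest first."""
    return [bit for bit in range(value.bit_length()) if value & (1 << bit)]


bits = set_bits(MODDEBUG_VALUE)
masks = [hex(1 << bit) for bit in bits]

print("set bit positions:", bits)
print("individual masks: ", masks)
```

OR-ing the printed masks back together gives the original value, which is a quick way to double-check a value typed by hand at the kmdb prompt.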

YanChii Nov 7, 2017

Well. Something strange happened. The OS came online and reached the prompt-config. But it might be an accident, because it has already happened to me once before (out of about a hundred attempts).
I've set -B disable-xhci=true, moddebug and worked around the first acpica problem.
Now I have the OS running and I'm a bit afraid to shut it down :).
And the best thing is that even with disable-xhci=true, the USB keyboard is working.
I'm writing you this in case you'd like to run some commands from the OS itself. If you don't need anything from the OS, I'll try to replicate the successful boot.
Thx.
Jan

rmustacc Nov 8, 2017

Member

That's good to know and really helps us focus our inquiry. So we know that there's something buggy going on in the xhci driver that's probably causing some kind of deadlock (though what, I can't say) or livelock. I'll need a little time to think about how we can better root-cause what's happening there. I'll see whether there are any other xhci-related circumstances where things go awry that we can leverage for this.

I guess in parallel we should really figure out the ACPI workaround and why that's required. Thanks for all your help through this.

YanChii Nov 8, 2017

NP, glad to help.
More info: the machine boots only when all 3 factors are in place:

  1. -B disable-xhci=true
  2. ::bp -Dn 1 -c '.+0x95::bp -c "0>dx;::cont" ; ::cont' acpica`AcpiOsWritePort
  3. moddebug/W 0xF0040000

If I omit any of them, it hangs. Yes, it hangs even without moddebug.

And without disable-xhci, the last thing the moddebug prints is this:
img_20171108_170741

Jan

jmarcedwards Apr 6, 2018

Was this problem ever solved on this NUC series?

YanChii Apr 6, 2018

Not that I'm aware of. The only thing that has moved since the last conversation is this:
https://www.mail-archive.com/smartos-discuss@lists.smartos.org/msg05726.html
Jan

rmustacc Apr 6, 2018

Member

No, it hasn't, sorry. It's in our queue to finish digging into.

m-gibson Apr 16, 2018

In case a +1 is useful, I'm getting this on Skylake with a post-Meltdown BIOS:

i5-6500
MSI Z170-A Pro (bios 7971v1J)

my last moddebug output without disable-xhci

image

jasonbking Jul 19, 2018

Contributor

This is probably a bit of a long shot, but I've been working on an early boot hang (my system seems to trigger it pretty easily). There are some similarities to the hangs I've experienced (though not exact -- so it's possible it might be the same issue, but is far from definitive). I have test images with the proposed fix for my hang available at:

https://us-east.manta.joyent.com/jbk/public/OS-7079/platform-20180719T001516Z.iso
https://us-east.manta.joyent.com/jbk/public/OS-7079/platform-20180719T001516Z.usb.bz2

NOTE: These images are set to enter kmdb upon an NMI (the SmartOS default is to panic, however having the system panic where my hang was occurring wasn't very useful -- though this may not matter for your system).

If you'd like, feel free to give them a try and see if they work (if you do try them out, please let me know one way or the other what the results are -- as I said, it's probably a long shot that it's the same issue, but might be worth at least giving it a try). It's sufficient if the system comes up to the installer (it's not necessary to install SmartOS if you don't wish to).

rmustacc Jul 26, 2018

Member

Thanks everyone for your patience and assistance to date. I've been digging into this and have arrived at a root cause and with a WIP fix in place am able to boot successfully on a Kaby Lake NUC. The problem basically is a logic bug in the core ACPI processing code that leads to an attempt to recursively grab a mutex which is already acquired and ultimately leads to a hang.

A full analysis is available here: https://smartos.org/bugview/OS-7093. I'm hoping I'll be in a better place for getting test bits out shortly.
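
To illustrate the general failure mode described above (and only that: this is not the ACPI code in question, and the helper name `try_reacquire` is invented for this sketch), the following minimal Python example shows what happens when a thread tries to take a non-recursive lock it already holds. A timeout stands in for the hang so that the script terminates; a recursive lock is shown alongside for contrast.

```python
import threading


def try_reacquire(lock, timeout=0.1):
    """Acquire `lock`, then try to acquire it again from the same thread.

    Returns True if the second acquisition succeeded within `timeout`
    seconds. Without the timeout, a plain lock would block forever here,
    which is the self-deadlock pattern.
    """
    lock.acquire()
    try:
        got_it = lock.acquire(timeout=timeout)
        if got_it:
            # Undo the second (nested) acquisition.
            lock.release()
        return got_it
    finally:
        # Undo the first acquisition.
        lock.release()


plain_result = try_reacquire(threading.Lock())       # non-recursive lock
reentrant_result = try_reacquire(threading.RLock())  # recursive lock

print("plain lock re-acquired by owner:    ", plain_result)
print("recursive lock re-acquired by owner:", reentrant_result)
```

In a kernel there is typically no timeout to fall back on, so the equivalent of the first case simply never returns, which matches a boot that stops making progress without a panic.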

jmarcedwards Jul 27, 2018

nerakhon Jul 27, 2018

rmustacc Jul 27, 2018

Member

huang96962 commented Jul 28, 2018

rmustacc Aug 1, 2018

Member

Just wanted to provide an update on this. We noticed that there were problems on a couple of different platforms that were seeing ACPI errors after this update. While it doesn't occur on the NUCs, we're working to root cause those and understand them before we get wider testing done.

jmarcedwards Aug 1, 2018

huang96962 Aug 2, 2018

I used ACPICA version 20171110 and it worked well last year. This year I updated the BIOS and now see ACPI errors, but it still works. So I think we should update ACPICA to the newest release, which is available from https://github.com/acpica/acpica/releases.

rmustacc Aug 2, 2018

Member

xmerlin Aug 10, 2018

I have the same problem trying SmartOS 20180621T003454Z with:

  • Supermicro X11SRA-RF + Xeon W-2145 Skylake W
  • ASRock Z370 Extreme4 + Intel Core i5 8600K

The second one boots with ACPI disabled (I haven't tried that with the first one).

rmustacc Aug 14, 2018

Member

I've put together the following images to help test this. Note that I've changed things such that it ends up putting a larger amount of ACPI data out to the console. There are both debug and non-debug versions present:

non-debug raw platform
non-debug ISO vga
non-debug ISO ttya
non-debug ISO ttyb
non-debug USB vga
non-debug USB ttya
non-debug USB ttyb

debug raw platform
debug ISO vga
debug ISO ttya
debug ISO ttyb
debug USB vga
debug USB ttya
debug USB ttyb

Please let me know whether it works or fails. When you do, it'd help to know what kind of system it is and its BIOS revision, if possible. Some of this is available through smbios.

fejfighter Aug 16, 2018

Tried the non-debug USB vga image on an X1 Carbon 6th Gen, and after a screenful of ACPI table output, it went to the installer as expected.

By comparison, the latest PI image (02/08?) got stuck with no output, as did the bloody OmniOSce and OI 2018.04, which is what got me looking at this issue.

As mentioned: latest Lenovo X1 Carbon 6th Gen, latest BIOS. lshw output from Fedora 28:


     *-firmware
          description: BIOS
          vendor: LENOVO
          physical id: b
          version: N23ET52W (1.27 )
          date: 07/18/2018
          size: 128KiB
          capacity: 16MiB
          capabilities: pci pnp upgrade shadowing cdboot bootselect edd int13floppy720 int5printscreen int9keyboard int14serial int17printer int10video acpi usb biosbootspecification uefi

If anyone has an idea for getting serial output from this laptop, I'd be happy to provide more info.

xmerlin Aug 16, 2018

Tried both images on the ASRock Z370 Extreme4 + Intel Core i5 8600K and both boot as expected.

BIOS version 3.10 (10 Jul 2018, the latest one)

rmustacc Aug 16, 2018

Member

OK, great. I'm glad to hear that. I'm going to kick off a bit more illumos-wide testing to make sure there aren't any additional regressions. I'm hoping that we'll get this back before too long then. Thanks again, everyone, for your patience. The reports that systems which previously didn't work are now working are quite encouraging.

@fejfighter the screenful of ACPI related text was purposeful in case there were issues to help folks get a sense of what was happening.

ski2310 Aug 16, 2018

Worked on the following systems (that failed previously) -

Asus H110S2 - i3-6100T - bios ver. 3805
MSI C236M - i3-6100T - bios ver. 7972vDA

Also worked on these two existing systems that did not have any previous issues (regression testing).

HP DL320 G6 - X5650 - bios ver. W07
HP ML110 G7 - E3-1220 - bios ver. J01

rmustacc Aug 16, 2018

Member

xmerlin Aug 16, 2018

It works also on this system that did not have any previous issues:
supermicro x9scm-f + e3-1220v2

fejfighter Aug 16, 2018

@rmustacc makes sense; that would give an insight into where it got locked up. I wasn't sure if it was just printing errors/warnings rather than general debug/info messages.

Do you have a public branch or patches sitting around? I wouldn't mind building an (installable) image from it, noting that it's definitely not production/ops critical

rmustacc Aug 16, 2018

Member

fejfighter Aug 16, 2018

I am not, might be time to subscribe,

but I have found the email on the archives, thanks again

snltd Aug 22, 2018

Gigabyte H110N board, with i5-6400 CPU. American Megatrends BIOS version F21 06/09/2017.

Boot locks up using a normal SmartOS image, but the system boots and runs just fine with the non-debug VGA build. Great job.

xmerlin Aug 28, 2018

supermicro X11SRA-RF + Xeon W-2145 Skylake W -> runs fine with the non debug VGA build

YanChii Aug 28, 2018

Hi @rmustacc,

We have successfully booted the Intel NUC from the beginning of this issue. This image makes it seamlessly to the prompt-config:
https://us-east.manta.joyent.com/rmustacc/public/preview/acpi-201808/non-debug/acpi-nd-vga.usb.bz2

Thank you!

Jan

rmustacc Sep 1, 2018

Member

Thanks for the testing everyone. We've put the changes back to illumos-joyent today. This will be in the next release (in about two weeks). You can find it at joyent/illumos-joyent@1d4a584.

I appreciate the patience with this one. I know it's been a major pain.

@rmustacc rmustacc closed this Sep 1, 2018
