[possibly issue] Watchdog detected hard LOCKUP #431
Comments
@henry0312 |
Thank you for your response. |
@henry0312 any updates? |
I'm sorry for the late response. |
@henry0312 |
Maybe yes. |
Anyway, I believe that it is good to remove |
@henry0312 It depends. LightGBM v2 benefits (conditionally) a bit from using The difference is about 1%, which is significant for training runs that last hours or days. @guolinke I think we can remove |
We shouldn't provide optimizations that depend on the CPU architecture at compile time. |
@henry0312 If your CPU doesn't support the Intel Core 2 instruction set, then it must be extremely old (older than Bobcat, an AMD architecture from 2011).
Optimizing for Intel Core 2 also benefits AMD CPUs more than no optimization at all (the brand name does not matter). If that were untrue, it would be the same as saying AVX2 instructions are useless on Haswell and Excavator (AVX instructions are the most important breakthrough CPU instructions for data science, and there's a reason Intel has a separate clock rate on their CPUs dedicated to AVX instructions). |
@Laurae2 Thank you for the explanation. It seems that I misunderstood some of gcc's optimizations. |
@henry0312 If you run LightGBM on <your CPU> microarchitecture (like AMD Ryzen), it will be as if you compiled the code for It is the best way to maintain compatibility (exploiting only MMX + SSE (1, 2, 3, E3) instructions) while getting a bit more performance for any CPU (exploiting ALL the features of your CPU), unless the additional instructions are useless for the code to compile for the compiler. |
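To see which of the instruction sets mentioned above a given Linux machine actually reports, the `flags` line of `/proc/cpuinfo` can be inspected. The following is only a minimal sketch: the helper names `cpu_flags` and `simd_support` are hypothetical, and it assumes an x86 Linux system, where SSE3 is listed under the flag name `pni`.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text.

    Hypothetical helper: only the first "flags" line is read, which is
    enough when all cores report the same features.
    """
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # The line looks like "flags : fpu vme ... sse sse2 ..."
            return set(line.split(":", 1)[1].split())
    return set()


def simd_support(flags, wanted=("mmx", "sse", "sse2", "pni", "ssse3", "avx", "avx2")):
    """Map each wanted flag name to True/False depending on whether it is present."""
    return {name: name in flags for name in wanted}


# On a Linux machine, something like this would print the result:
#     with open("/proc/cpuinfo") as f:
#         print(simd_support(cpu_flags(f.read())))
```

This only reports what the CPU advertises; it says nothing about which instructions a particular LightGBM build was compiled to use.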
I can confirm this is not related to |
Virtualized Ubuntu 16.04 (bare-metal host = Ubuntu 16.04), same error but different CPU. It happened only once and I had to fully wipe my server to get back control over it (it would not boot anymore). Note: it crashed both the virtual machine and the host at the same time, not only the virtual machine. I noticed that 40 threads with a large maximum number of leaves can, on rare occasions, make the computer awfully slow (so slow it becomes hardly responsive; this happens very randomly) on CPU only. |
@Laurae2 https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1596866 |
Unfortunately, the same error happened a day ago, and thus |
However, in my environment, I confirmed that this problem happened on Ubuntu 17.04 with Intel Turbo Boost Max Technology 3.0 😢 |
This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this. |
After 66b7f03, my system crashes while running a heavy task (training with 20 threads for several hours).
A kernel log says "NMI watchdog: Watchdog detected hard LOCKUP on cpu 10".
I'm not sure that the commit has a serious bug, but I would appreciate it if you could look into that for me.
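For anyone trying to check whether their own machine hit the same condition, a small script can scan kernel log text for the message quoted above. This is a rough sketch, not part of LightGBM: the function name `find_hard_lockups` is hypothetical, and the pattern assumes the message wording is exactly as reported in this issue.

```python
import re

# Pattern built from the kernel message quoted in this issue.
LOCKUP_RE = re.compile(r"NMI watchdog: Watchdog detected hard LOCKUP on cpu (\d+)")


def find_hard_lockups(log_text):
    """Return the CPU numbers named in hard-lockup watchdog messages."""
    return [int(match.group(1)) for match in LOCKUP_RE.finditer(log_text)]


# The sample line is the message from this issue.
sample = "NMI watchdog: Watchdog detected hard LOCKUP on cpu 10"
print(find_hard_lockups(sample))  # prints [10]
```

The log text could come from a saved kernel log file or from the output of `dmesg`, read into a string before calling the function.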
Environment info
Operating System: Ubuntu 16.04.2 (kernel 4.4.0-71-generic and 4.4.0-72-generic)
CPU: Intel Core i7-6950X
C++/Python/R version: Python 3.5.3