[GPU] Kernel crashed when using GPU #6399
Thanks for using LightGBM and for the detailed report. Sorry you're running into this. Could you please provide a few more details that would help us investigate this?
It'd also help if you could make this example more minimal. For example:
Those sorts of things would help narrow down the source of the problem.
@jameslamb Thanks for your quick reply, I will provide the relevant information:
R: NVIDIA GeForce RTX 3090. It works fine when training DNNs or other ML models on the GPU (e.g., XGBoost).
$ uname -a
Linux master 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/*release
CentOS Linux release 7.5.1804 (Core)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
CentOS Linux release 7.5.1804 (Core)
CentOS Linux release 7.5.1804 (Core)
I removed the
I've tried setting only the most basic parameters:
The same when using the sklearn API:
As before, replacing
Thanks for reporting this. If you are training on a single NVIDIA GPU, could you please try our new CUDA version instead of the legacy GPU version (build with -DUSE_CUDA=ON instead of -DUSE_GPU=ON)? It should be faster.
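For reference, a minimal sketch of how the backend choice might look in the Python parameters, assuming LightGBM 4.x, where `device_type` (alias `device`) accepts `cpu`, `gpu` (the OpenCL build, -DUSE_GPU=ON) and `cuda` (the CUDA build, -DUSE_CUDA=ON). The helper name `backend_params` is hypothetical, and the chosen value only works if the installed library was compiled with the matching option.

```python
from typing import Optional

# Values accepted by LightGBM's `device_type` parameter in 4.x (assumption stated above).
VALID_BACKENDS = ("cpu", "gpu", "cuda")


def backend_params(backend: str, base: Optional[dict] = None) -> dict:
    """Return a copy of `base` with the training backend selected.

    Hypothetical helper: "gpu" needs a library built with -DUSE_GPU=ON (OpenCL),
    "cuda" needs one built with -DUSE_CUDA=ON.
    """
    if backend not in VALID_BACKENDS:
        raise ValueError(f"unknown backend {backend!r}, expected one of {VALID_BACKENDS}")
    params = dict(base or {})  # copy so the caller's dict is left untouched
    # Drop the alias so the same parameter is not given twice under two names.
    params.pop("device", None)
    params["device_type"] = backend
    return params


base = {"objective": "binary", "device": "gpu", "verbose": 1}
cuda_params = backend_params("cuda", base)
# cuda_params would then be passed to lgb.train(cuda_params, train_set) as before.
```

Keeping only one of the two aliases avoids passing conflicting values for the same setting.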
@shiyu1994 CMake failed when using
I tried to downgrade
Is it because my gcc version is still wrong, or should I modify some files?
I removed the
Replacing
Thanks for the advice.
Description
A kernel crash occurs in Jupyter Notebook when running LightGBM with GPU support enabled on a small dataset (~5 MB). The issue occurs on a remote Linux server, not on a local machine.
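Because a native crash inside LightGBM takes the Jupyter kernel down with it, the notebook log says little about the cause. One way to surface more detail is to run the same training in a separate Python process with `faulthandler` enabled, so a segmentation fault prints a traceback and the exit status becomes visible. A minimal sketch, assuming a POSIX system (a negative return code means the child was killed by a signal); `REPRO_SCRIPT` is a hypothetical reproducer on synthetic data, not the reporter's code, and `run_isolated`/`describe` are hypothetical helper names.

```python
import signal
import subprocess
import sys
import textwrap

# Hypothetical minimal reproducer: synthetic data, basic params, device="gpu".
REPRO_SCRIPT = textwrap.dedent("""
    import numpy as np
    import lightgbm as lgb

    rng = np.random.default_rng(42)
    X = rng.normal(size=(10_000, 20))
    y = (X[:, 0] + rng.normal(scale=0.1, size=10_000) > 0).astype(int)

    params = {"objective": "binary", "device": "gpu", "verbose": 1}
    booster = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=10)
    print("training finished:", booster.current_iteration(), "iterations")
""")


def run_isolated(code: str, timeout: float = 600.0) -> subprocess.CompletedProcess:
    """Run `code` in a fresh interpreter with faulthandler enabled.

    `-X faulthandler` makes CPython dump the Python-level traceback to stderr
    on fatal signals such as SIGSEGV or SIGABRT.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def describe(result: subprocess.CompletedProcess) -> str:
    """Summarize how the child exited (POSIX: -N means killed by signal N)."""
    if result.returncode == 0:
        return "ok"
    if result.returncode < 0:
        try:
            name = signal.Signals(-result.returncode).name
        except ValueError:
            name = f"signal {-result.returncode}"
        return f"killed by {name}"
    return f"exited with code {result.returncode}"
```

In a notebook cell, `result = run_isolated(REPRO_SCRIPT)` followed by printing `describe(result)` and `result.stderr` keeps the kernel alive while showing how the child died; running the script from a shell with `python -X faulthandler` gives similar information.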
Reproducible example
The following is the relevant code:
Output:
The Jupyter Notebook log does not contain much useful information:
The kernel crash happens specifically when the 'device': 'gpu' parameter is set in the LightGBM configuration. Disabling GPU support allows the code to run correctly.
Environment info
LightGBM version:
$ pip list | grep lightgbm
lightgbm 4.3.0.99
I followed the documentation to install LightGBM with GPU support:
cd ../
sh ./build-python.sh install --precompile
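When the library is built and installed from a shell on a remote server, it may be worth confirming that the Jupyter kernel runs the same interpreter and picks up the same `lightgbm` that `pip list` reports, since a kernel registered from another environment could load a different build. A minimal sketch using only the standard library (the helper name `describe_install` is hypothetical): run it in a notebook cell and compare the output with `which python` and `pip list` in the shell. It locates the package without importing it, so the native library is not loaded.

```python
import importlib.metadata
import importlib.util
import sys
from typing import Dict, Optional


def describe_install(dist_name: str = "lightgbm", module_name: str = "lightgbm") -> Dict[str, Optional[str]]:
    """Report the running interpreter plus the installed version and on-disk
    location of a package, without importing the package itself."""
    try:
        version: Optional[str] = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        version = None  # not installed in this environment
    # For a top-level module, find_spec locates it without executing it.
    spec = importlib.util.find_spec(module_name)
    return {
        "python": sys.executable,
        "version": version,
        "location": spec.origin if spec is not None else None,
    }


for key, value in describe_install().items():
    print(f"{key}: {value}")
```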
The issue seems to be specific to GPU usage. Adjusting the gpu_device_id and gpu_platform_id settings did not resolve the problem. Is there a recommended approach to debug or fix this, or might there have been a misstep in the GPU installation or compilation process?
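`gpu_platform_id` and `gpu_device_id` index the OpenCL platforms and devices that the runtime exposes, so it can help to list them first, for example when a CPU OpenCL runtime is installed next to the NVIDIA one and the default selection may not be the RTX 3090. A minimal sketch, assuming the `clinfo` utility is installed and that `clinfo -l` prints `Platform #N: name` lines followed by indented `Device #M: name` lines; the helper names `parse_clinfo_list` and `list_opencl_devices` are hypothetical.

```python
import re
import subprocess
from typing import List, Optional, Tuple

# Assumed `clinfo -l` layout, e.g.
#   Platform #0: NVIDIA CUDA
#    `-- Device #0: NVIDIA GeForce RTX 3090
PLATFORM_RE = re.compile(r"^Platform #(\d+): (.+)$")
DEVICE_RE = re.compile(r"Device #(\d+): (.+)$")

Entry = Tuple[int, int, str, str]  # (platform_id, device_id, platform, device)


def parse_clinfo_list(text: str) -> List[Entry]:
    """Turn `clinfo -l` output into (platform_id, device_id, platform, device) tuples."""
    entries: List[Entry] = []
    platform_id: Optional[int] = None
    platform_name = ""
    for raw in text.splitlines():
        line = raw.strip()
        match = PLATFORM_RE.match(line)
        if match:
            platform_id, platform_name = int(match.group(1)), match.group(2).strip()
            continue
        match = DEVICE_RE.search(line)
        # Ignore device lines that appear before any platform header.
        if match and platform_id is not None:
            entries.append((platform_id, int(match.group(1)), platform_name, match.group(2).strip()))
    return entries


def list_opencl_devices() -> List[Entry]:
    """Run `clinfo -l` (assumed to be on PATH) and parse its output."""
    result = subprocess.run(["clinfo", "-l"], capture_output=True, text=True, check=True)
    return parse_clinfo_list(result.stdout)
```

The first two fields of each tuple are candidate values for `gpu_platform_id` and `gpu_device_id`. If no NVIDIA platform is listed at all, the OpenCL runtime may not be visible to the process, which would point at the installation rather than at these parameters.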