[Python] GPU not working python package #1028

charlesmilk · 2017-11-01T23:35:36Z

Environment info

Operating System: Ubuntu 16.04
CPU: i7, Nvidia 1060
C++/Python/R version: Python 2.7

Error Message:

LightGBMError: bin size 5858 cannot run on GPU

Reproducible examples

lgb.train({'device':'gpu'}, ds)

Steps to reproduce

Hi, I am sorry if this has already been answered, but I could not find an answer. I was able to install the GPU version of LightGBM and ran this with success: ./lightgbm config=lightgbm_gpu.conf data=higgs.train valid=higgs.test objective=binary metric=auc

However, with all the defaults, when I try to use the Python package with GPU support, this error occurs. Any idea what might be causing it? Thank you.

guolinke · 2017-11-02T01:57:56Z

@up201007037 how did you install the Python package?

chivee · 2017-11-02T04:57:53Z

@up201007037 please make sure that GPU support was enabled via pip install lightgbm --install-option=--gpu
More information: https://pypi.python.org/pypi/lightgbm

charlesmilk · 2017-11-02T08:44:32Z

@guolinke I installed from GitHub and then ran python setup.py install. The CPU version works fine and the GPU version also works (but not with Python).
I uninstalled the GitHub version and ran pip install lightgbm --install-option=--gpu, and when I try to import lightgbm the following error occurs:

OSError: /home/carlos/anaconda2/lib/python2.7/site-packages/lightgbm/lib_lightgbm.so: symbol clCreateCommandQueueWithProperties, version OPENCL_2.0 not defined in file libOpenCL.so.1 with link time reference

Thank you so much, and I am sorry for the dumb question.

chivee · 2017-11-02T09:12:51Z

Check your OpenCL version using
ls -l /usr/lib64 | grep -i opencl

Make sure you are using the same version of the header files and the GPU drivers.

Then try to compile from source:

https://github.com/Microsoft/LightGBM/blob/master/docs/GPU-Tutorial.rst#install-python-interface-optional
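
The import error above complains that clCreateCommandQueueWithProperties is missing from libOpenCL.so.1. As a rough way to check this from Python, the sketch below tries to load an OpenCL library with ctypes and looks the symbol up by name. The helper name library_exports is made up for illustration, the library name passed in is an assumption about how the loader resolves it on your system, and the check looks at the symbol name only, not at the OPENCL_2.0 symbol version mentioned in the error.

```python
import ctypes


def library_exports(library, symbol):
    """Return True/False if `library` loads and does/doesn't export `symbol`.

    Returns None when the library itself cannot be loaded.
    `library_exports` is a hypothetical helper, not part of LightGBM.
    """
    try:
        lib = ctypes.CDLL(library)
    except OSError:
        # Library not found (or not loadable) on this machine.
        return None
    # ctypes raises AttributeError for unknown symbols, so hasattr works here.
    return hasattr(lib, symbol)


# Symbol name taken from the OSError earlier in this thread.
result = library_exports("libOpenCL.so.1", "clCreateCommandQueueWithProperties")
if result is None:
    print("libOpenCL.so.1 could not be loaded")
elif result:
    print("symbol found: this library looks like OpenCL 2.0 or newer")
else:
    print("symbol missing: this library is probably older than OpenCL 2.0")
```

If the symbol is missing from the library that gets loaded at runtime while the build found OpenCL 2.0 headers, that would be consistent with the header/driver mismatch described above.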

charlesmilk · 2017-11-02T13:40:04Z

I ran ls -l /usr/local/cuda/lib64/libOpenCL.so | grep -i opencl and the result is lrwxrwxrwx 1 root root 14 Jan 26 2017 /usr/local/cuda/lib64/libOpenCL.so -> libOpenCL.so.1

I do not have the directory lib64 under /usr. I do have /usr/lib32/nvidia-384, but your command does not return any results for that folder.

I built from source. I did:

git clone --recursive https://github.com/Microsoft/LightGBM
cd LightGBM
mkdir build ; cd build
cmake -DUSE_GPU=1 ..
make -j$(nproc)
cd ..

Then I went to the python folder and ran:

python setup.py install --precompile

I can import and run lightgbm in Python, but with the GPU it gives the same error.

I also tried to modify the setup.py to

cmake_cmd = ["cmake", "../compile/"]
if use_gpu:
    cmake_cmd.append("-DUSE_GPU=ON")
    cmake_cmd.append("-DOpenCL_LIBRARY=/usr/local/cuda/lib64/libOpenCL.so")

And then python setup.py install --gpu

I remember that I already had CUDA installed, but I got some kind of error during compilation; after running sudo apt-get install libboost-all-dev I was able to compile and install.
I also ran ./lightgbm config=lightgbm_gpu.conf data=higgs.train valid=higgs.test objective=binary metric=auc successfully with GPU support.
But when I try to use the Python package it gives that error. If I install via pip I get the other error, the OpenCL one.

Thank you so much for your help.

charlesmilk · 2017-11-02T18:55:26Z

I uninstalled lightgbm via pip uninstall and then:

git clone --recursive https://github.com/Microsoft/LightGBM
cd ./LightGBM
mkdir build; cd build
sudo cmake -DUSE_GPU=1 -DOpenCL_LIBRARY=/usr/local/cuda-8.0/lib64/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/local/cuda-8.0/include/ ..
sudo make -j$(nproc)
cd ../python-package; python setup.py install --precompile

When I run sudo cmake -DUSE_GPU=1 -DOpenCL_LIBRARY=/usr/local/cuda-8.0/lib64/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/local/cuda-8.0/include/ .. I get the following messages:

-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Try OpenMP C flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Try OpenMP CXX flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Found OpenMP: -fopenmp
-- Looking for CL_VERSION_2_0
-- Looking for CL_VERSION_2_0 - not found
-- Looking for CL_VERSION_1_2
-- Looking for CL_VERSION_1_2 - found
-- Found OpenCL: /usr/local/cuda-8.0/lib64/libOpenCL.so (found version "1.2")
-- OpenCL include directory:/usr/local/cuda-8.0/include
-- Boost version: 1.58.0
-- Found the following Boost libraries:
-- filesystem
-- system
-- Configuring done
-- Generating done
-- Build files have been written to: /home/carlos/Desktop/LightGBM/build

I was able to install with success, but when I run:

lgb.train({'device':'gpu'}, ds)

The same error occurs:
LightGBMError: bin size 5858 cannot run on GPU

I am running that with all the defaults... This is the terminal message:

[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 14838
[LightGBM] [Info] Number of data: 1000000, number of used features: 22
[LightGBM] [Fatal] bin size 5858 cannot run on GPU

With that installation method I am able to run: ./lightgbm config=lightgbm_gpu.conf data=higgs.train valid=higgs.test objective=binary metric=auc

[LightGBM] [Info] Finished loading parameters
[LightGBM] [Info] Finished loading data in 18.109915 seconds
[LightGBM] [Info] Number of positive: 5564616, number of negative: 4935384
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 1535
[LightGBM] [Info] Number of data: 10500000, number of used features: 28
[LightGBM] [Info] Using requested OpenCL platform 0 device 0
[LightGBM] [Info] Using GPU Device: GeForce GTX 1060, Vendor: NVIDIA Corporation
[LightGBM] [Info] Compiling OpenCL Kernel with 64 bins...
[LightGBM] [Info] GPU programs have been built
[LightGBM] [Info] Size of histogram bin entry: 12
[LightGBM] [Info] 28 dense feature groups (280.38 MB) transfered to GPU in 0.212309 secs. 0 sparse feature groups.
[LightGBM] [Info] Finished initializing training
[LightGBM] [Info] Started training...
[LightGBM] [Info] Iteration:1, valid_1 auc : 0.771843
[LightGBM] [Info] 1.140919 seconds elapsed, finished iteration 1

Second method:
git clone --recursive https://github.com/Microsoft/LightGBM ; cd LightGBM
mkdir build ; cd build
cmake -DUSE_GPU=1 ..
make -j4

-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Try OpenMP C flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Try OpenMP CXX flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Found OpenMP: -fopenmp
-- Looking for CL_VERSION_2_0
-- Looking for CL_VERSION_2_0 - found
-- Found OpenCL: /usr/lib/x86_64-linux-gnu/libOpenCL.so (found version "2.0")
-- OpenCL include directory:/usr/include
-- Boost version: 1.58.0
-- Found the following Boost libraries:
-- filesystem
-- system
-- Configuring done
-- Generating done
-- Build files have been written to: /home/carlos/Desktop/LightGBM/build

Since the OpenCL library is not specified here, CMake picks up the system OpenCL (version 2.0) rather than the one shipped with CUDA.

Then: python setup.py install --gpu
When I try to import lightgbm the following error occurs:

OSError: /home/carlos/anaconda2/lib/python2.7/site-packages/lightgbm/lib_lightgbm.so: symbol clCreateCommandQueueWithProperties, version OPENCL_2.0 not defined in file libOpenCL.so.1 with link time reference
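
The two configure logs above differ mainly in the "Found OpenCL" line (CUDA's 1.2 library versus the system 2.0 library). A minimal sketch for pulling that line out of saved CMake output so the two builds can be compared; the function name found_opencl is hypothetical, and the regular expression assumes the exact line format shown in the logs in this comment, which may differ in other CMake versions.

```python
import re

# Pattern modelled on the "-- Found OpenCL: ..." lines in the logs above.
FOUND_OPENCL = re.compile(
    r'-- Found OpenCL: (?P<path>\S+) \(found version "(?P<version>[\d.]+)"\)'
)


def found_opencl(cmake_output):
    """Return (library_path, version) from CMake output, or None if absent."""
    match = FOUND_OPENCL.search(cmake_output)
    if match is None:
        return None
    return match.group("path"), match.group("version")


# The two lines below are copied from the configure logs in this comment.
first = '-- Found OpenCL: /usr/local/cuda-8.0/lib64/libOpenCL.so (found version "1.2")'
second = '-- Found OpenCL: /usr/lib/x86_64-linux-gnu/libOpenCL.so (found version "2.0")'

print(found_opencl(first))
print(found_opencl(second))
```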

StrikerRUS · 2017-11-02T19:56:28Z

@up201007037 It seems that this answer #715 (comment) helped 4 people; maybe you'll be the happy 5th one 😄.
#902 is another issue where you could find the solution.

charlesmilk · 2017-11-02T20:04:31Z

@StrikerRUS I already followed all the steps described in those threads. It still does not work. This is not an OpenCL problem, I guess...
Update: I am also able to run the GPU version with Python on some datasets.

I am using the features from a pandas DataFrame as categorical and I am not doing one-hot encoding, as suggested.

StrikerRUS · 2017-11-02T20:10:18Z

Then maybe @huanzhang12 has some thoughts about this situation.

charlesmilk · 2017-11-02T22:17:25Z

Thank you @StrikerRUS for your thoughts on this.
I am running with GPU support. I have removed some features... and it works...
I am trying to run this: https://www.kaggle.com/kamilkk/simple-fast-lgbm-0-6685/code and the error occurs. If I remove the features artist_name, composer and lyricist, it runs fine... Does anyone know why?

chivee · 2017-11-03T04:52:36Z

@up201007037 the previous error that you encountered was a linking error, which has already been solved by linking to the right OpenCL library.
The second error occurs because our GPU version doesn't support bin sizes larger than 255. In the Kaggle code that you are running, most of the features are sparse features, which leads to a large number of bins.
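
To find out which columns are likely to trip this limit before training, one option is to count the distinct values in each categorical column. The sketch below is a plain-Python approximation: it assumes each distinct category ends up in its own bin, which is a simplification rather than LightGBM's actual binning logic, and the 255 limit is taken from this comment rather than read from the library. The function names and the toy data are made up for illustration; only the column name artist_name comes from this thread.

```python
GPU_BIN_LIMIT = 255  # limit quoted in this thread, not queried from LightGBM


def distinct_counts(columns):
    """Map each column name to its number of distinct values."""
    return {name: len(set(values)) for name, values in columns.items()}


def columns_over_limit(columns, limit=GPU_BIN_LIMIT):
    """Return the sorted names of columns with more distinct values than `limit`."""
    counts = distinct_counts(columns)
    return sorted(name for name, n in counts.items() if n > limit)


# Toy data: one high-cardinality column and one low-cardinality column.
toy = {
    "artist_name": ["artist_%d" % i for i in range(1000)],
    "language": ["en", "pt", "en", "es"] * 250,
}

print(distinct_counts(toy))
print(columns_over_limit(toy))
```

Columns reported here are candidates for dropping or re-encoding before training on the GPU, which matches the observation above that removing artist_name, composer and lyricist made the run succeed.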

charlesmilk · 2017-11-03T17:16:39Z

Thank you @chivee. I would just like to suggest an improvement to the documentation: if someone has CUDA installed, they should run this:
sudo cmake -DUSE_GPU=1 -DOpenCL_LIBRARY=/usr/local/cuda-8.0/lib64/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/local/cuda-8.0/include/ ..

Thank you for your attention and help!

chivee closed this as completed Nov 2, 2017

chivee reopened this Nov 2, 2017

chivee mentioned this issue Nov 4, 2017

[docs] Update GPU-Tutorial.rst #1037

Merged

chivee closed this as completed in #1037 Nov 6, 2017

Mtale mentioned this issue Sep 30, 2018

Training on GPU fails (OSError: exception: access violation) #1717

Open

lock bot locked as resolved and limited conversation to collaborators Mar 12, 2020
