[Python] GPU not working python package #1028

Closed
charlesmilk opened this issue Nov 1, 2017 · 12 comments · Fixed by #1037

@charlesmilk

charlesmilk commented Nov 1, 2017

Please search your question on previous issues, stackoverflow or other search engines before you open a new one.

For bugs and unexpected issues, please provide following information, so that we could reproduce on our system.

Environment info

Operating System: Ubuntu 16.04
CPU: i7, Nvidia 1060
C++/Python/R version: Python 2.7

Error Message:

LightGBMError: bin size 5858 cannot run on GPU

Reproducible examples

lgb.train({'device':'gpu'}, ds)

Steps to reproduce

Hi, I am sorry if this has already been answered, but I could not find an answer. I was able to install the GPU version of LightGBM and ran this with success: ./lightgbm config=lightgbm_gpu.conf data=higgs.train valid=higgs.test objective=binary metric=auc

However, with all the defaults, when I try to use the python package, with gpu support, this error occurs. Any idea of what might be causing this? Thank you.

@guolinke
Collaborator

guolinke commented Nov 2, 2017

@up201007037 how did you install the python package ?

@chivee
Collaborator

chivee commented Nov 2, 2017

@up201007037 please make sure that the gpu support was enabled via pip install lightgbm --install-option=--gpu
more information : https://pypi.python.org/pypi/lightgbm

@chivee chivee closed this as completed Nov 2, 2017
@chivee chivee reopened this Nov 2, 2017
@charlesmilk
Author

@guolinke I installed from GitHub and then ran python setup.py install. The CPU version works fine and the GPU version also works (but not with Python).
I uninstalled the GitHub version and ran pip install lightgbm --install-option=--gpu, and when I try to import lightgbm the following error occurs:

OSError: /home/carlos/anaconda2/lib/python2.7/site-packages/lightgbm/lib_lightgbm.so: symbol clCreateCommandQueueWithProperties, version OPENCL_2.0 not defined in file libOpenCL.so.1 with link time reference

Thank you so much, and I am sorry for the dumb question.

@chivee
Collaborator

chivee commented Nov 2, 2017

Check your OpenCL version using
ls -l /usr/lib64 | grep -i opencl

Make sure you are using the same version of the header files and GPU drivers.

Then try to compile from source:

https://github.com/Microsoft/LightGBM/blob/master/docs/GPU-Tutorial.rst#install-python-interface-optional
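
The import error quoted earlier says that the compiled lib_lightgbm.so expects the OpenCL 2.0 entry point clCreateCommandQueueWithProperties, while the libOpenCL.so.1 found at load time does not provide it. A minimal sketch of how this could be probed from Python with ctypes follows. The helper name library_exports is hypothetical, and the check is by symbol name only: it does not inspect the OPENCL_2.0 version tag, and the library that ctypes resolves may differ from the one the dynamic linker picks for lib_lightgbm.so.

```python
import ctypes


def library_exports(library_path, symbol):
    """Return True or False depending on whether the shared library
    exports `symbol`, or None if the library cannot be loaded at all."""
    try:
        lib = ctypes.CDLL(library_path)
    except OSError:
        return None
    # ctypes looks the name up lazily; a missing symbol raises
    # AttributeError, so hasattr() can be used as a probe.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    # Library name taken from the error message in this thread.
    print(library_exports("libOpenCL.so.1", "clCreateCommandQueueWithProperties"))
```

If this prints False for the library that gets loaded, the build was probably configured against OpenCL 2.0 headers while an OpenCL 1.2 runtime is used at import time.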

@charlesmilk
Author

charlesmilk commented Nov 2, 2017

I ran ls -l /usr/local/cuda/lib64/libOpenCL.so | grep -i opencl and the result is lrwxrwxrwx 1 root root 14 Jan 26 2017 /usr/local/cuda/lib64/libOpenCL.so -> libOpenCL.so.1

I do not have the directory lib64 under /usr. I do have usr/lib32/nvidia-384 but your command does not return results under that folder.

I built from source. I did:

git clone --recursive https://github.com/Microsoft/LightGBM
cd LightGBM
mkdir build ; cd build
cmake -DUSE_GPU=1 ..
make -j$(nproc)
cd ..

Then I went to the python folder and ran:

python setup.py install --precompile

I can import and run lightgbm in Python, but with the GPU it gives that error.

I also tried to modify the setup.py to

    cmake_cmd = ["cmake", "../compile/"]
    if use_gpu:
        cmake_cmd.append("-DUSE_GPU=ON")
        cmake_cmd.append("-DOpenCL_LIBRARY=/usr/local/cuda/lib64/libOpenCL.so")

And then python setup.py install --gpu
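
The setup.py change above hard-codes one OpenCL path. As a rough sketch only, the same idea can be written as a small function that appends the OpenCL options when they are given. The function name build_cmake_cmd and its parameters are hypothetical and are not part of LightGBM's setup.py. The -DUSE_GPU and -DOpenCL_LIBRARY flags come from this thread, and -DOpenCL_INCLUDE_DIR is the matching variable of CMake's FindOpenCL module.

```python
def build_cmake_cmd(use_gpu, opencl_library=None, opencl_include_dir=None):
    """Build the cmake argument list, optionally pinning the OpenCL paths."""
    cmake_cmd = ["cmake", "../compile/"]
    if use_gpu:
        cmake_cmd.append("-DUSE_GPU=ON")
        # Only pin the OpenCL locations when the caller supplies them;
        # otherwise CMake searches its default locations.
        if opencl_library:
            cmake_cmd.append("-DOpenCL_LIBRARY={}".format(opencl_library))
        if opencl_include_dir:
            cmake_cmd.append("-DOpenCL_INCLUDE_DIR={}".format(opencl_include_dir))
    return cmake_cmd


if __name__ == "__main__":
    print(build_cmake_cmd(True, "/usr/local/cuda/lib64/libOpenCL.so"))
```

Passing the include directory together with the library keeps the headers and the runtime on the same OpenCL version.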

I remember that I already had CUDA installed, but I got some kind of error during compilation, so I ran sudo apt-get install libboost-all-dev and was then able to compile and install.
I also ran ./lightgbm config=lightgbm_gpu.conf data=higgs.train valid=higgs.test objective=binary metric=auc successfully with GPU support.
But when I try to use the Python package it gives that error. If I install via pip I get the other OpenCL error.

Thank you so much for your help.

@charlesmilk
Author

charlesmilk commented Nov 2, 2017

I uninstalled lightgbm via pip uninstall and then:

git clone --recursive https://github.com/Microsoft/LightGBM
cd ./LightGBM
mkdir build; cd build
sudo cmake -DUSE_GPU=1 -DOpenCL_LIBRARY=/usr/local/cuda-8.0/lib64/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/local/cuda-8.0/include/ ..
sudo make -j$(nproc)
cd ../python-package; python setup.py install --precompile

When I run sudo cmake -DUSE_GPU=1 -DOpenCL_LIBRARY=/usr/local/cuda-8.0/lib64/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/local/cuda-8.0/include/ .. I get the following messages:

-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Try OpenMP C flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Try OpenMP CXX flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Found OpenMP: -fopenmp
-- Looking for CL_VERSION_2_0
-- Looking for CL_VERSION_2_0 - not found
-- Looking for CL_VERSION_1_2
-- Looking for CL_VERSION_1_2 - found
-- Found OpenCL: /usr/local/cuda-8.0/lib64/libOpenCL.so (found version "1.2")
-- OpenCL include directory:/usr/local/cuda-8.0/include
-- Boost version: 1.58.0
-- Found the following Boost libraries:
-- filesystem
-- system
-- Configuring done
-- Generating done
-- Build files have been written to: /home/carlos/Desktop/LightGBM/build

I was able to install successfully, but when I run:

lgb.train({'device':'gpu'}, ds)

The same error occurs:
LightGBMError: bin size 5858 cannot run on GPU

I am running that with all the defaults... This is the terminal message:

[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 14838
[LightGBM] [Info] Number of data: 1000000, number of used features: 22
[LightGBM] [Fatal] bin size 5858 cannot run on GPU

With that installation method I am able to run: ./lightgbm config=lightgbm_gpu.conf data=higgs.train valid=higgs.test objective=binary metric=auc

[LightGBM] [Info] Finished loading parameters
[LightGBM] [Info] Finished loading data in 18.109915 seconds
[LightGBM] [Info] Number of positive: 5564616, number of negative: 4935384
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 1535
[LightGBM] [Info] Number of data: 10500000, number of used features: 28
[LightGBM] [Info] Using requested OpenCL platform 0 device 0
[LightGBM] [Info] Using GPU Device: GeForce GTX 1060, Vendor: NVIDIA Corporation
[LightGBM] [Info] Compiling OpenCL Kernel with 64 bins...
[LightGBM] [Info] GPU programs have been built
[LightGBM] [Info] Size of histogram bin entry: 12
[LightGBM] [Info] 28 dense feature groups (280.38 MB) transfered to GPU in 0.212309 secs. 0 sparse feature groups.
[LightGBM] [Info] Finished initializing training
[LightGBM] [Info] Started training...
[LightGBM] [Info] Iteration:1, valid_1 auc : 0.771843
[LightGBM] [Info] 1.140919 seconds elapsed, finished iteration 1

Second method:
git clone --recursive https://github.com/Microsoft/LightGBM ; cd LightGBM
mkdir build ; cd build
cmake -DUSE_GPU=1 ..
make -j4

-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Try OpenMP C flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Try OpenMP CXX flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Found OpenMP: -fopenmp
-- Looking for CL_VERSION_2_0
-- Looking for CL_VERSION_2_0 - found
-- Found OpenCL: /usr/lib/x86_64-linux-gnu/libOpenCL.so (found version "2.0")
-- OpenCL include directory:/usr/include
-- Boost version: 1.58.0
-- Found the following Boost libraries:
-- filesystem
-- system
-- Configuring done
-- Generating done
-- Build files have been written to: /home/carlos/Desktop/LightGBM/build

Since the OpenCL library and include paths are not specified here, CMake does not pick the CUDA version of OpenCL.
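
The two CMake runs differ in the "Found OpenCL" line: one reports the CUDA library at version 1.2, the other the system library at version 2.0. A small sketch for pulling that line out of captured CMake output is shown here. The function name found_opencl is hypothetical, and the pattern only assumes the message format that appears in the logs of this thread.

```python
import re

# Matches lines such as:
# -- Found OpenCL: /usr/lib/x86_64-linux-gnu/libOpenCL.so (found version "2.0")
_FOUND_OPENCL = re.compile(
    r'^-- Found OpenCL: (?P<path>\S+) \(found version "(?P<version>[^"]+)"\)'
)


def found_opencl(cmake_output):
    """Return (library_path, version) from CMake output, or None if absent."""
    for line in cmake_output.splitlines():
        match = _FOUND_OPENCL.match(line.strip())
        if match:
            return match.group("path"), match.group("version")
    return None


if __name__ == "__main__":
    sample = '-- Found OpenCL: /usr/local/cuda-8.0/lib64/libOpenCL.so (found version "1.2")'
    print(found_opencl(sample))
```

Comparing the reported path and version with the libOpenCL.so.1 available at run time shows whether the build and the runtime agree.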

Then: python setup.py install --gpu
When I try to import lightgbm the following error occurs:

OSError: /home/carlos/anaconda2/lib/python2.7/site-packages/lightgbm/lib_lightgbm.so: symbol clCreateCommandQueueWithProperties, version OPENCL_2.0 not defined in file libOpenCL.so.1 with link time reference

@StrikerRUS
Collaborator

@up201007037 It seems that this answer #715 (comment) helped 4 people, maybe you'll be the happy 5th one 😄 .
#902 is another issue where you could find the solution.

@charlesmilk
Author

@StrikerRUS I already followed all the steps described in those threads. It still does not work. I guess this is not an OpenCL problem...
Update: I am also able to run the GPU version with Python on some datasets.

I am using the features from a pandas dataframe as categorical and I am not doing one hot encoding as suggested.

@StrikerRUS
Collaborator

Then maybe @huanzhang12 has some thoughts about this situation.

@charlesmilk
Author

charlesmilk commented Nov 2, 2017

Thank you @StrikerRUS for your thoughts on this.
I am running with GPU support. I removed some features... and it works...
I am trying to run this: https://www.kaggle.com/kamilkk/simple-fast-lgbm-0-6685/code and the error occurs. If I remove the features artist_name, composer and lyricist, it runs fine... Does anyone know why?

@chivee
Collaborator

chivee commented Nov 3, 2017

@up201007037 The first error you encountered was a linking error, which has already been solved by linking against the right OpenCL library.
The second error occurs because our GPU version doesn't support a bin size greater than 255. In the Kaggle code you are running, most of the features are sparse, which leads to a large number of bins.
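
To spot such features before training on the GPU, one option is to count distinct values per column and flag anything above the limit. The sketch below is an illustration, not LightGBM's own check: the function name columns_over_bin_limit is hypothetical, the threshold of 255 is taken from the comment above, the number of distinct values is used as a rough proxy for the bin count of a categorical feature, and the sample data (including the "language" column) is made up.

```python
def columns_over_bin_limit(columns, max_bins=255):
    """Map column name -> number of distinct values, for columns whose
    distinct-value count exceeds `max_bins`.

    `columns` is a dict of column name -> iterable of hashable values.
    """
    over = {}
    for name, values in columns.items():
        n_unique = len(set(values))
        if n_unique > max_bins:
            over[name] = n_unique
    return over


if __name__ == "__main__":
    # Made-up data: one high-cardinality column and one small one.
    data = {
        "artist_name": ["artist_%d" % i for i in range(5858)],
        "language": [1, 2, 3] * 10,
    }
    print(columns_over_bin_limit(data))
```

Columns flagged this way are candidates for dropping, re-encoding, or training on the CPU instead.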

@charlesmilk
Author

Thank you @chivee. I would like to suggest an improvement to the documentation: anyone who has CUDA installed should run this:
sudo cmake -DUSE_GPU=1 -DOpenCL_LIBRARY=/usr/local/cuda-8.0/lib64/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/local/cuda-8.0/include/ ..

Thank you for your attention and help!

@lock lock bot locked as resolved and limited conversation to collaborators Mar 12, 2020