Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[python] [setup] improving installation #880

Merged
merged 33 commits into from Sep 8, 2017
Merged

[python] [setup] improving installation #880

merged 33 commits into from Sep 8, 2017

Conversation

StrikerRUS
Copy link
Collaborator

@StrikerRUS StrikerRUS commented Sep 1, 2017

This PR makes installation of python package more clear.
Should fix #874 .
Adds possibility to install without CMake on Windows from existing .sln-file and Visual Studio.

Am I right that existing solution file hasn't configuration with GPU support?

@msftclas
Copy link

msftclas commented Sep 1, 2017

@StrikerRUS,
Thanks for having already signed the Contribution License Agreement. Your agreement was validated by Microsoft. We will now review your pull request.
Thanks,
Microsoft Pull Request Bot

@StrikerRUS
Copy link
Collaborator Author

StrikerRUS commented Sep 1, 2017

For what this line is responsible?

In my case it just creates lightgbm folder in the root of Anaconda directory and copies there .dll file:

...
running install_data
copying lightgbm\lib_lightgbm.dll -> C:\Program Files\Anaconda3\lightgbm

@guolinke
Copy link
Collaborator

guolinke commented Sep 2, 2017

@StrikerRUS it points to a empty line

logger.warning("Compilation with MSBuild from existing solution file failed.")
else:
os.chdir("..")
return
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to avoid return

@guolinke
Copy link
Collaborator

guolinke commented Sep 2, 2017

this PR looks good to me.
@wxchan any comments ?

@StrikerRUS
Copy link
Collaborator Author

@guolinke Sorry about empty line - I forgot to change the line after pushing new code.

I'm talking about this one

def run(self):
    if not self.precompile:
        copy_files(use_gpu=self.gpu)
        compile_cpp(use_mingw=self.mingw, use_gpu=self.gpu)
--->self.distribution.data_files = [('lightgbm', find_lib())]<---
    install.run(self)

As I said before, while installation it copies dll files into strange folder and while creating wheel it copies dll into wheel\lightgbm-2.0.6.data\data\lightgbm folder.

@StrikerRUS
Copy link
Collaborator Author

@guolinke Please answer

Am I right that existing solution file hasn't configuration with GPU support?

@guolinke
Copy link
Collaborator

guolinke commented Sep 2, 2017

@StrikerRUS
yeah, the GPU version need to be built by cmake now.

@StrikerRUS
Copy link
Collaborator Author

I suppose that this if statement is unnecessary

--->if os.path.isfile(os.path.join(dir_path, 'lightgbm', 'VERSION.txt')):<---
           version = open(os.path.join(dir_path, 'lightgbm', 'VERSION.txt')).read().strip()

since it'll anyway fail on this line with NameError

setup(name='lightgbm',
   --->version=version,<---
      ...

shutil.rmtree(file_path)
if os.path.isdir(path):
contents = os.listdir(path)
for file in contents:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

look like file is a built-in, can you help rename it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wxchan Sure.

shutil.rmtree(file_path)


def silent_call(cmd):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not so sure about what's this function for, why makes it silent?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wxchan This PR was conceived as fix to #874

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@StrikerRUS ok, got it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this function be designed like def silent_call(cmd, raise_error=True, error_msg='')? In this case, if cmake_cmd fails, you don't need to call make_cmd then.

Copy link
Collaborator Author

@StrikerRUS StrikerRUS Sep 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wxchan
It already has described behavior. In case of any error it returns 1 and you should just check returned value.

Do you mean that I should change this code

status = silent_call(cmake_cmd + ["-G", "MinGW Makefiles"])
status += silent_call(["mingw32-make.exe", "_lightgbm"])
if status != 0:
    raise Exception('Please install CMake and MinGW first')

to

status = silent_call(cmake_cmd + ["-G", "MinGW Makefiles"])
if status != 0:
    raise Exception('Please install CMake first')
status = silent_call(["mingw32-make.exe", "_lightgbm"])
if status != 0:
    raise Exception('Please install MinGW first')

?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I think it's better, and you can move raise Exception into function to avoid duplicates, but it's trivial, up to you.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wxchan Moving raising Exception into the function will cause dozens of try/except into the main code and will make code messy.

Copy link
Contributor

@wxchan wxchan Sep 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't sth like below work? why need try/except in main code? not sure, LGTM now, it's up to you.

def silent_call(cmd, raise_error=True, error_msg='')
	try:
		// do sth
		return 0
	except:
		if raise_error:
			raise Exception(error_msg)
		return 1

silent_call(cmake_cmd, error_msg='Please install CMake first')

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wxchan It seems you are right! raise_error will help to deal with such situation
if status == 0 and os.path.exists(lib_path):

@StrikerRUS
Copy link
Collaborator Author

@wxchan Please comment on #880 (comment) and #880 (comment)

os.system(cmake_cmd + " ../lightgbm/")
build_cmd = "mingw32-make.exe _lightgbm"
logger.info("Starting to compile with CMake and MinGW.")
status = silent_call(cmake_cmd + ["-G", "MinGW Makefiles"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the name status is a little confusing, can it be succeed or fail or sth like this? (I don't read through all codes because I am not sure about those VS install logic).

Copy link
Collaborator Author

@StrikerRUS StrikerRUS Sep 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wxchan Will exitcode be OK?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ignore this.

@wxchan
Copy link
Contributor

wxchan commented Sep 2, 2017

@guolinke Sorry about empty line - I forgot to change the line after pushing new code.

I'm talking about this one

def run(self):
if not self.precompile:
copy_files(use_gpu=self.gpu)
compile_cpp(use_mingw=self.mingw, use_gpu=self.gpu)
--->self.distribution.data_files = [('lightgbm', find_lib())]<---
install.run(self)
As I said before, while installation it copies dll files into strange folder and while creating wheel it copies dll into wheel\lightgbm-2.0.6.data\data\lightgbm folder.

To be honest, I am not sure what some of the functions in distutils and setuptools do, like self.distribution.data_files. I just tried to make the installation work in most of common cases, under the support of test cases. If you are sure about what it does, you can alter it as you want.

I suppose that this if statement is unnecessary

--->if os.path.isfile(os.path.join(dir_path, 'lightgbm', 'VERSION.txt')):<---
version = open(os.path.join(dir_path, 'lightgbm', 'VERSION.txt')).read().strip()
since it'll anyway fail on this line with NameError

setup(name='lightgbm',
--->version=version,<---
...

@henry0312 can you help check this?

@StrikerRUS
Copy link
Collaborator Author

StrikerRUS commented Sep 2, 2017

To be honest, I am not sure what some of the functions in distutils and setuptools do, like self.distribution.data_files. I just tried to make the installation work in most of common cases, under the support of test cases. If you are sure about what it does, you can alter it as you want.

@wxchan To be more precise, this command does following:

running install_data
creating build/bdist.linux-x86_64/wheel/lightgbm-2.0.5.data
creating build/bdist.linux-x86_64/wheel/lightgbm-2.0.5.data/data
creating build/bdist.linux-x86_64/wheel/lightgbm-2.0.5.data/data/lightgbm
copying lightgbm/lib_lightgbm.so -> build/bdist.linux-x86_64/wheel/lightgbm-2.0.5.data/data/lightgbm
running install_data
creating /home/travis/miniconda/lightgbm
copying ../lib_lightgbm.so -> /home/travis/miniconda/lightgbm

So, I suppose I can safely delete the line, because, as I understand, the library file should be located into the same directory as python files on user's machine, not into the separate one.

@StrikerRUS
Copy link
Collaborator Author

StrikerRUS commented Sep 2, 2017

I'm not sure do we need to create the python wheel with .dll-file compiled by MinGW while Appveyor tests.

At present, Appveyor MINGW job just checks ability to execute compiled by MinGW dll, but all artifacts are still generated by MSVC.

@henry0312
Copy link
Contributor

I installed Python 2.7.13, 3.4.6 and 3.5.3 in my OSX, so I'll check the tests later if necessary (I'm busy today).

@StrikerRUS
Copy link
Collaborator Author

@guolinke I wanted to say, that just adding print made Linux + Python 3.5 successful:
Before:
image

After adding print:
image

I has only one though that it somehow affects random generator.

@StrikerRUS
Copy link
Collaborator Author

@henry0312 Thanks for the tip! Will use it next time.
But in case of error we already can see the output:
https://travis-ci.org/StrikerRUS/LightGBM/jobs/272676410#L1923

@guolinke
Copy link
Collaborator

guolinke commented Sep 7, 2017

@StrikerRUS maybe the bug is caused by python or sklearn ?

@StrikerRUS
Copy link
Collaborator Author

@guolinke I'm trying to check it by comparing datasets passed to the test in different environments.

@StrikerRUS
Copy link
Collaborator Author

StrikerRUS commented Sep 7, 2017

@guolinke I've created two python environments on my machine: 3.5 and 3.6. After that I've copied the same dll in both of them (I took dll from successful build artifacts) and then launched the same task (from logs):

import sys
sys.version

import lightgbm as lgb
clf = lgb.sklearn.LGBMClassifier(min_data=1, min_data_in_bin=1)
X = [[ 0.5488135,   0.71518937,  0.60276338],
 [ 0.54488318,  0.4236548,   0.64589411],
 [ 0.43758721,  0.891773,    0.96366276],
 [ 0.38344152,  0.79172504,  0.52889492],
 [ 0.56804456,  0.92559664,  0.07103606],
 [ 0.0871293,   0.0202184,   0.83261985],
 [ 0.77815675,  0.87001215,  0.97861834],
 [ 0.79915856,  0.46147936,  0.78052918],
 [ 0.11827443,  0.63992102,  0.14335329],
 [ 0.94466892,  0.52184832,  0.41466194]]
y = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
clf.fit(X, y)

In both environments everything seems to be OK. Then I open sklearn.py in standard Windows Notepad, add new line (no matter where), delete it (reset changes) and press Ctrl + S, after that the same code fails with known error.

@Laurae2
Copy link
Contributor

Laurae2 commented Sep 7, 2017

@StrikerRUS You will mix CRLF and LF for line breaks when using Windows Notepad.

@StrikerRUS
Copy link
Collaborator Author

@Laurae2 Is there any automatic method to fix it?

@guolinke
Copy link
Collaborator

guolinke commented Sep 8, 2017

@StrikerRUS
you can use other editor, e.g. sublime or vs code.

@StrikerRUS
Copy link
Collaborator Author

@guolinke Sure, next time I wouldn't open with Notepad.
But what to do with the current situation? Do you think that errors on Python 3.5 and MinGW are really caused by EOL?

@guolinke
Copy link
Collaborator

guolinke commented Sep 8, 2017

@StrikerRUS I am not sure, you can run the test locally ?

@StrikerRUS
Copy link
Collaborator Author

StrikerRUS commented Sep 8, 2017

@guolinke As I said after saving sklearn.py with Notepad test changes its behavior. I forget to add in that message this process is cyclic: no-error, Ctrl + S, error, Ctrl + S, no-error, ...

@StrikerRUS
Copy link
Collaborator Author

StrikerRUS commented Sep 8, 2017

This may also explain that adding print changed the result of test on Linux (#880 (comment)).

However, my .gitconfig has autocrlf = true as it's advised here.

@guolinke
Copy link
Collaborator

guolinke commented Sep 8, 2017

@StrikerRUS
The error is the "cannot construct dataset" ?

@StrikerRUS
Copy link
Collaborator Author

@guolinke yeah.

@guolinke
Copy link
Collaborator

guolinke commented Sep 8, 2017

@StrikerRUS
It is very strange, I don't know why.
But I think it will show other error if was EOL problem.

@henry0312
Copy link
Contributor

henry0312 commented Sep 8, 2017

I reproduced the error with Python 3.5.3, and I solved it.
Just try #880 (comment) @StrikerRUS

❯ pytest -s --cache-clear tests/python_package_test/test_sklearn.py
====================================================================================== test session starts ======================================================================================
platform darwin -- Python 3.5.3, pytest-3.2.2, py-1.4.34, pluggy-0.4.0
rootdir: /Users/henry/.ghq/github.com/StrikerRUS/LightGBM, inifile:
collected 13 items

tests/python_package_test/test_sklearn.py ............[LightGBM] [Fatal] Cannot construct Dataset since there are not useful features.
F

=========================================================================================== FAILURES ============================================================================================
_____________________________________________________________________________ TestSklearn.test_sklearn_integration ______________________________________________________________________________

self = <test_sklearn.TestSklearn testMethod=test_sklearn_integration>

    def test_sklearn_integration(self):
        # sklearn <0.19 cannot accept instance, but many tests could be passed only with min_data=1 and min_data_in_bin=1
        if sklearn_at_least_019:
            # we cannot use `check_estimator` directly since there is no skip test mechanism
            for name, estimator in ((lgb.sklearn.LGBMClassifier.__name__, lgb.sklearn.LGBMClassifier),
                                    (lgb.sklearn.LGBMRegressor.__name__, lgb.sklearn.LGBMRegressor)):
                check_parameters_default_constructible(name, estimator)
                check_no_fit_attributes_set_in_init(name, estimator)
                # we cannot leave default params (see https://github.com/Microsoft/LightGBM/issues/833)
                estimator = estimator(min_data=1, min_data_in_bin=1)
                for check in _yield_all_checks(name, estimator):
                    if check.__name__ == 'check_estimators_nan_inf':
                        continue  # skip test because LightGBM deals with nan
                    try:
>                       check(name, estimator)

tests/python_package_test/test_sklearn.py:194:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/local/var/pyenv/versions/3.5.3/Python.framework/Versions/3.5/lib/python3.5/site-packages/sklearn/utils/testing.py:291: in wrapper
    return fn(*args, **kwargs)
/usr/local/var/pyenv/versions/3.5.3/Python.framework/Versions/3.5/lib/python3.5/site-packages/sklearn/utils/estimator_checks.py:814: in check_fit_score_takes_y
    func(X, y)
/usr/local/var/pyenv/versions/3.5.3/Python.framework/Versions/3.5/lib/python3.5/site-packages/lightgbm/sklearn.py:657: in fit
    callbacks=callbacks)
/usr/local/var/pyenv/versions/3.5.3/Python.framework/Versions/3.5/lib/python3.5/site-packages/lightgbm/sklearn.py:451: in fit
    callbacks=callbacks)
/usr/local/var/pyenv/versions/3.5.3/Python.framework/Versions/3.5/lib/python3.5/site-packages/lightgbm/engine.py:178: in train
    booster = Booster(params=params, train_set=train_set)
/usr/local/var/pyenv/versions/3.5.3/Python.framework/Versions/3.5/lib/python3.5/site-packages/lightgbm/basic.py:1260: in __init__
    train_set.construct().handle,
/usr/local/var/pyenv/versions/3.5.3/Python.framework/Versions/3.5/lib/python3.5/site-packages/lightgbm/basic.py:814: in construct
    categorical_feature=self.categorical_feature, params=self.params)
/usr/local/var/pyenv/versions/3.5.3/Python.framework/Versions/3.5/lib/python3.5/site-packages/lightgbm/basic.py:673: in _lazy_init
    self.__init_from_np2d(data, params_str, ref_dataset)
/usr/local/var/pyenv/versions/3.5.3/Python.framework/Versions/3.5/lib/python3.5/site-packages/lightgbm/basic.py:731: in __init_from_np2d
    ctypes.byref(self.handle)))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

ret = -1

    def _safe_call(ret):
        """Check the return value of C API call
        Parameters
        ----------
        ret : int
            return value from API calls
        """
        if ret != 0:
>           raise LightGBMError(_LIB.LGBM_GetLastError())
E           lightgbm.basic.LightGBMError: b'Cannot construct Dataset since there are not useful features.'

/usr/local/var/pyenv/versions/3.5.3/Python.framework/Versions/3.5/lib/python3.5/site-packages/lightgbm/basic.py:48: LightGBMError
======================================================================================= warnings summary ========================================================================================
tests/python_package_test/test_sklearn.py::TestSklearn::test_sklearn_backward_compatibility
  /usr/local/var/pyenv/versions/3.5.3/Python.framework/Versions/3.5/lib/python3.5/site-packages/lightgbm/sklearn.py:277: LGBMDeprecationWarning: The `seed` parameter is deprecated and will be removed in next version. Please use `random_state` instead.
    'Please use `random_state` instead.', LGBMDeprecationWarning)

-- Docs: http://doc.pytest.org/en/latest/warnings.html
======================================================================== 1 failed, 12 passed, 1 warnings in 7.28 seconds ========================================================================

@henry0312
Copy link
Contributor

Patch

diff --git a/tests/python_package_test/test_sklearn.py b/tests/python_package_test/test_sklearn.py
index f942543..0ee655d 100644
--- a/tests/python_package_test/test_sklearn.py
+++ b/tests/python_package_test/test_sklearn.py
@@ -186,7 +186,7 @@ class TestSklearn(unittest.TestCase):
                 check_parameters_default_constructible(name, estimator)
                 check_no_fit_attributes_set_in_init(name, estimator)
                 # we cannot leave default params (see https://github.com/Microsoft/LightGBM/issues/833)
-                estimator = estimator(min_data=1, min_data_in_bin=1)
+                estimator = estimator(min_child_samples=1, min_data_in_bin=1)
                 for check in _yield_all_checks(name, estimator):
                     if check.__name__ == 'check_estimators_nan_inf':
                         continue  # skip test because LightGBM deals with nan

Result with Python 2.7.13

❯ pytest tests/python_package_test/test_sklearn.py
====================================================================================== test session starts ======================================================================================
platform darwin -- Python 2.7.13, pytest-3.2.2, py-1.4.34, pluggy-0.4.0
rootdir: /Users/henry/.ghq/github.com/StrikerRUS/LightGBM, inifile:
collected 13 items

tests/python_package_test/test_sklearn.py .............

======================================================================================= warnings summary ========================================================================================
tests/python_package_test/test_sklearn.py::TestSklearn::test_sklearn_backward_compatibility
  /usr/local/var/pyenv/versions/2.7.13/Python.framework/Versions/2.7/lib/python2.7/site-packages/lightgbm/sklearn.py:277: LGBMDeprecationWarning: The `seed` parameter is deprecated and will be removed in next version. Please use `random_state` instead.
    'Please use `random_state` instead.', LGBMDeprecationWarning)

tests/python_package_test/test_sklearn.py::TestSklearn::test_sklearn_integration
  /usr/local/var/pyenv/versions/2.7.13/Python.framework/Versions/2.7/lib/python2.7/site-packages/lightgbm/basic.py:423: UserWarning: Converting data to scipy sparse matrix.
    warnings.warn('Converting data to scipy sparse matrix.')
  /usr/local/var/pyenv/versions/2.7.13/Python.framework/Versions/2.7/lib/python2.7/site-packages/lightgbm/basic.py:423: UserWarning: Converting data to scipy sparse matrix.
    warnings.warn('Converting data to scipy sparse matrix.')
  /usr/local/var/pyenv/versions/2.7.13/Python.framework/Versions/2.7/lib/python2.7/site-packages/lightgbm/basic.py:423: UserWarning: Converting data to scipy sparse matrix.
    warnings.warn('Converting data to scipy sparse matrix.')
  /usr/local/var/pyenv/versions/2.7.13/Python.framework/Versions/2.7/lib/python2.7/site-packages/lightgbm/basic.py:423: UserWarning: Converting data to scipy sparse matrix.
    warnings.warn('Converting data to scipy sparse matrix.')
  /usr/local/var/pyenv/versions/2.7.13/Python.framework/Versions/2.7/lib/python2.7/site-packages/lightgbm/basic.py:423: UserWarning: Converting data to scipy sparse matrix.
    warnings.warn('Converting data to scipy sparse matrix.')
  /usr/local/var/pyenv/versions/2.7.13/Python.framework/Versions/2.7/lib/python2.7/site-packages/lightgbm/basic.py:423: UserWarning: Converting data to scipy sparse matrix.
    warnings.warn('Converting data to scipy sparse matrix.')
  /usr/local/var/pyenv/versions/2.7.13/Python.framework/Versions/2.7/lib/python2.7/site-packages/lightgbm/basic.py:423: UserWarning: Converting data to scipy sparse matrix.
    warnings.warn('Converting data to scipy sparse matrix.')
  /usr/local/var/pyenv/versions/2.7.13/Python.framework/Versions/2.7/lib/python2.7/site-packages/lightgbm/basic.py:423: UserWarning: Converting data to scipy sparse matrix.
    warnings.warn('Converting data to scipy sparse matrix.')
  /usr/local/var/pyenv/versions/2.7.13/Python.framework/Versions/2.7/lib/python2.7/site-packages/lightgbm/basic.py:423: UserWarning: Converting data to scipy sparse matrix.
    warnings.warn('Converting data to scipy sparse matrix.')
  /usr/local/var/pyenv/versions/2.7.13/Python.framework/Versions/2.7/lib/python2.7/site-packages/lightgbm/basic.py:423: UserWarning: Converting data to scipy sparse matrix.
    warnings.warn('Converting data to scipy sparse matrix.')

-- Docs: http://doc.pytest.org/en/latest/warnings.html
============================================================================ 13 passed, 11 warnings in 8.33 seconds =============================================================================

Result with Python 3.5.3

❯ pytest tests/python_package_test/test_sklearn.py
====================================================================================== test session starts ======================================================================================
platform darwin -- Python 3.5.3, pytest-3.2.2, py-1.4.34, pluggy-0.4.0
rootdir: /Users/henry/.ghq/github.com/StrikerRUS/LightGBM, inifile:
collected 13 items

tests/python_package_test/test_sklearn.py .............

======================================================================================= warnings summary ========================================================================================
tests/python_package_test/test_sklearn.py::TestSklearn::test_sklearn_backward_compatibility
  /usr/local/var/pyenv/versions/3.5.3/Python.framework/Versions/3.5/lib/python3.5/site-packages/lightgbm/sklearn.py:277: LGBMDeprecationWarning: The `seed` parameter is deprecated and will be removed in next version. Please use `random_state` instead.
    'Please use `random_state` instead.', LGBMDeprecationWarning)

tests/python_package_test/test_sklearn.py::TestSklearn::test_sklearn_integration
  /usr/local/var/pyenv/versions/3.5.3/Python.framework/Versions/3.5/lib/python3.5/site-packages/lightgbm/basic.py:423: UserWarning: Converting data to scipy sparse matrix.
    warnings.warn('Converting data to scipy sparse matrix.')
  /usr/local/var/pyenv/versions/3.5.3/Python.framework/Versions/3.5/lib/python3.5/site-packages/lightgbm/basic.py:423: UserWarning: Converting data to scipy sparse matrix.
    warnings.warn('Converting data to scipy sparse matrix.')
  /usr/local/var/pyenv/versions/3.5.3/Python.framework/Versions/3.5/lib/python3.5/site-packages/lightgbm/basic.py:423: UserWarning: Converting data to scipy sparse matrix.
    warnings.warn('Converting data to scipy sparse matrix.')
  /usr/local/var/pyenv/versions/3.5.3/Python.framework/Versions/3.5/lib/python3.5/site-packages/lightgbm/basic.py:423: UserWarning: Converting data to scipy sparse matrix.
    warnings.warn('Converting data to scipy sparse matrix.')
  /usr/local/var/pyenv/versions/3.5.3/Python.framework/Versions/3.5/lib/python3.5/site-packages/lightgbm/basic.py:423: UserWarning: Converting data to scipy sparse matrix.
    warnings.warn('Converting data to scipy sparse matrix.')
  /usr/local/var/pyenv/versions/3.5.3/Python.framework/Versions/3.5/lib/python3.5/site-packages/lightgbm/basic.py:423: UserWarning: Converting data to scipy sparse matrix.
    warnings.warn('Converting data to scipy sparse matrix.')
  /usr/local/var/pyenv/versions/3.5.3/Python.framework/Versions/3.5/lib/python3.5/site-packages/lightgbm/basic.py:423: UserWarning: Converting data to scipy sparse matrix.
    warnings.warn('Converting data to scipy sparse matrix.')
  /usr/local/var/pyenv/versions/3.5.3/Python.framework/Versions/3.5/lib/python3.5/site-packages/lightgbm/basic.py:423: UserWarning: Converting data to scipy sparse matrix.
    warnings.warn('Converting data to scipy sparse matrix.')
  /usr/local/var/pyenv/versions/3.5.3/Python.framework/Versions/3.5/lib/python3.5/site-packages/lightgbm/basic.py:423: UserWarning: Converting data to scipy sparse matrix.
    warnings.warn('Converting data to scipy sparse matrix.')
  /usr/local/var/pyenv/versions/3.5.3/Python.framework/Versions/3.5/lib/python3.5/site-packages/lightgbm/basic.py:423: UserWarning: Converting data to scipy sparse matrix.
    warnings.warn('Converting data to scipy sparse matrix.')

-- Docs: http://doc.pytest.org/en/latest/warnings.html
============================================================================ 13 passed, 11 warnings in 8.96 seconds =============================================================================

Result with Python 3.6.2

❯ pytest tests/python_package_test/test_sklearn.py
====================================================================================== test session starts ======================================================================================
platform darwin -- Python 3.6.2, pytest-3.2.1, py-1.4.34, pluggy-0.4.0
rootdir: /Users/henry/.ghq/github.com/StrikerRUS/LightGBM, inifile:
plugins: pep8-1.0.6, isort-0.1.0
collected 13 items

tests/python_package_test/test_sklearn.py .............

======================================================================================= warnings summary ========================================================================================
tests/python_package_test/test_sklearn.py::TestSklearn::test_sklearn_backward_compatibility
  /usr/local/var/pyenv/versions/3.6.2/Python.framework/Versions/3.6/lib/python3.6/site-packages/lightgbm/sklearn.py:277: LGBMDeprecationWarning: The `seed` parameter is deprecated and will be removed in next version. Please use `random_state` instead.
    'Please use `random_state` instead.', LGBMDeprecationWarning)

tests/python_package_test/test_sklearn.py::TestSklearn::test_sklearn_integration
  /usr/local/var/pyenv/versions/3.6.2/Python.framework/Versions/3.6/lib/python3.6/site-packages/lightgbm/basic.py:423: UserWarning: Converting data to scipy sparse matrix.
    warnings.warn('Converting data to scipy sparse matrix.')
  /usr/local/var/pyenv/versions/3.6.2/Python.framework/Versions/3.6/lib/python3.6/site-packages/lightgbm/basic.py:423: UserWarning: Converting data to scipy sparse matrix.
    warnings.warn('Converting data to scipy sparse matrix.')
  /usr/local/var/pyenv/versions/3.6.2/Python.framework/Versions/3.6/lib/python3.6/site-packages/lightgbm/basic.py:423: UserWarning: Converting data to scipy sparse matrix.
    warnings.warn('Converting data to scipy sparse matrix.')
  /usr/local/var/pyenv/versions/3.6.2/Python.framework/Versions/3.6/lib/python3.6/site-packages/lightgbm/basic.py:423: UserWarning: Converting data to scipy sparse matrix.
    warnings.warn('Converting data to scipy sparse matrix.')
  /usr/local/var/pyenv/versions/3.6.2/Python.framework/Versions/3.6/lib/python3.6/site-packages/lightgbm/basic.py:423: UserWarning: Converting data to scipy sparse matrix.
    warnings.warn('Converting data to scipy sparse matrix.')
  /usr/local/var/pyenv/versions/3.6.2/Python.framework/Versions/3.6/lib/python3.6/site-packages/lightgbm/basic.py:423: UserWarning: Converting data to scipy sparse matrix.
    warnings.warn('Converting data to scipy sparse matrix.')
  /usr/local/var/pyenv/versions/3.6.2/Python.framework/Versions/3.6/lib/python3.6/site-packages/lightgbm/basic.py:423: UserWarning: Converting data to scipy sparse matrix.
    warnings.warn('Converting data to scipy sparse matrix.')
  /usr/local/var/pyenv/versions/3.6.2/Python.framework/Versions/3.6/lib/python3.6/site-packages/lightgbm/basic.py:423: UserWarning: Converting data to scipy sparse matrix.
    warnings.warn('Converting data to scipy sparse matrix.')
  /usr/local/var/pyenv/versions/3.6.2/Python.framework/Versions/3.6/lib/python3.6/site-packages/lightgbm/basic.py:423: UserWarning: Converting data to scipy sparse matrix.
    warnings.warn('Converting data to scipy sparse matrix.')
  /usr/local/var/pyenv/versions/3.6.2/Python.framework/Versions/3.6/lib/python3.6/site-packages/lightgbm/basic.py:423: UserWarning: Converting data to scipy sparse matrix.
    warnings.warn('Converting data to scipy sparse matrix.')

-- Docs: http://doc.pytest.org/en/latest/warnings.html
============================================================================ 13 passed, 11 warnings in 7.62 seconds =============================================================================

@henry0312
Copy link
Contributor

henry0312 commented Sep 8, 2017

The error occurs in check_fit_score_takes_y and X and y is the below at that time.

X = [[ 0.5488135   0.71518937  0.60276338]
 [ 0.54488318  0.4236548   0.64589411]
 [ 0.43758721  0.891773    0.96366276]
 [ 0.38344152  0.79172504  0.52889492]
 [ 0.56804456  0.92559664  0.07103606]
 [ 0.0871293   0.0202184   0.83261985]
 [ 0.77815675  0.87001215  0.97861834]
 [ 0.79915856  0.46147936  0.78052918]
 [ 0.11827443  0.63992102  0.14335329]
 [ 0.94466892  0.52184832  0.41466194]]
y = [0 1 2 0 1 2 0 1 2 0]

I don't know why [LightGBM] [Fatal] Cannot construct Dataset since there are not useful features. happens with the X and y for LGBMClassifier.fit

All tests logs:

❯ pytest --cache-clear tests/python_package_test/test_sklearn.py
====================================================================================== test session starts ======================================================================================
platform darwin -- Python 3.5.3, pytest-3.2.2, py-1.4.34, pluggy-0.4.0
rootdir: /Users/henry/.ghq/github.com/StrikerRUS/LightGBM, inifile:
collected 13 items

tests/python_package_test/test_sklearn.py ............F

=========================================================================================== FAILURES ============================================================================================
_____________________________________________________________________________ TestSklearn.test_sklearn_integration ______________________________________________________________________________

self = <test_sklearn.TestSklearn testMethod=test_sklearn_integration>

    def test_sklearn_integration(self):
        # sklearn <0.19 cannot accept instance, but many tests could be passed only with min_data=1 and min_data_in_bin=1
        if sklearn_at_least_019:
            # we cannot use `check_estimator` directly since there is no skip test mechanism
            for name, estimator in ((lgb.sklearn.LGBMClassifier.__name__, lgb.sklearn.LGBMClassifier),
                                    (lgb.sklearn.LGBMRegressor.__name__, lgb.sklearn.LGBMRegressor)):
                print(name, estimator)
                check_parameters_default_constructible(name, estimator)
                check_no_fit_attributes_set_in_init(name, estimator)
                # we cannot leave default params (see https://github.com/Microsoft/LightGBM/issues/833)
                estimator = estimator(min_data=1, min_data_in_bin=1)
                for check in _yield_all_checks(name, estimator):
                    if check.__name__ == 'check_estimators_nan_inf':
                        continue  # skip test because LightGBM deals with nan
                    try:
                        print(check.__name__)
>                       check(name, estimator)

tests/python_package_test/test_sklearn.py:196:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/local/var/pyenv/versions/3.5.3/Python.framework/Versions/3.5/lib/python3.5/site-packages/sklearn/utils/testing.py:291: in wrapper
    return fn(*args, **kwargs)
/usr/local/var/pyenv/versions/3.5.3/Python.framework/Versions/3.5/lib/python3.5/site-packages/sklearn/utils/estimator_checks.py:814: in check_fit_score_takes_y
    func(X, y)
/usr/local/var/pyenv/versions/3.5.3/Python.framework/Versions/3.5/lib/python3.5/site-packages/lightgbm/sklearn.py:658: in fit
    callbacks=callbacks)
/usr/local/var/pyenv/versions/3.5.3/Python.framework/Versions/3.5/lib/python3.5/site-packages/lightgbm/sklearn.py:451: in fit
    callbacks=callbacks)
/usr/local/var/pyenv/versions/3.5.3/Python.framework/Versions/3.5/lib/python3.5/site-packages/lightgbm/engine.py:178: in train
    booster = Booster(params=params, train_set=train_set)
/usr/local/var/pyenv/versions/3.5.3/Python.framework/Versions/3.5/lib/python3.5/site-packages/lightgbm/basic.py:1260: in __init__
    train_set.construct().handle,
/usr/local/var/pyenv/versions/3.5.3/Python.framework/Versions/3.5/lib/python3.5/site-packages/lightgbm/basic.py:814: in construct
    categorical_feature=self.categorical_feature, params=self.params)
/usr/local/var/pyenv/versions/3.5.3/Python.framework/Versions/3.5/lib/python3.5/site-packages/lightgbm/basic.py:673: in _lazy_init
    self.__init_from_np2d(data, params_str, ref_dataset)
/usr/local/var/pyenv/versions/3.5.3/Python.framework/Versions/3.5/lib/python3.5/site-packages/lightgbm/basic.py:731: in __init_from_np2d
    ctypes.byref(self.handle)))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

ret = -1

    def _safe_call(ret):
        """Check the return value of C API call
        Parameters
        ----------
        ret : int
            return value from API calls
        """
        if ret != 0:
>           raise LightGBMError(_LIB.LGBM_GetLastError())
E           lightgbm.basic.LightGBMError: b'Cannot construct Dataset since there are not useful features.'

/usr/local/var/pyenv/versions/3.5.3/Python.framework/Versions/3.5/lib/python3.5/site-packages/lightgbm/basic.py:48: LightGBMError
------------------------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------------------------
LGBMClassifier <class 'lightgbm.sklearn.LGBMClassifier'>
check_estimators_dtypes
[[ 1.64644051  2.14556789  1.80829     1.63464952  1.27096438]
 [ 1.93768239  1.31276155  2.67531896  2.89098835  1.15032458]
 [ 2.375175    1.5866847   1.70413363  2.7767899   0.21310817]
 [ 0.26138791  0.06065519  2.49785948  2.33447027  2.61003637]
 [ 2.93585491  2.39747572  1.38443804  2.34158754  0.35482329]
 [ 1.91976309  0.43005985  2.83400679  1.56554496  1.24398589]
 [ 0.79366684  2.32270098  1.368451    1.70530176  0.0563694 ]
 [ 1.85290647  1.83628714  1.85080194  2.83124423  2.0454607 ]
 [ 1.07852364  1.31109583  2.0928936   0.18067642  2.00030017]
 [ 2.01191354  0.63114768  0.38677889  0.94628501  1.09113228]
 [ 1.71059024  1.3158046   2.96512151  0.30613443  0.62663031]
 [ 0.48392853  1.95932484  0.75987482  1.39893234  0.73327678]
 [ 0.47690874  0.33112544  1.96898866  0.41454887  0.58974707]
 [ 1.10617554  2.46297979  0.29130384  2.51383471  0.28829521]
 [ 2.92937827  1.40595365  2.93028331  1.81453657  2.21779084]
 [ 0.11756338  0.84842086  0.36058968  0.88842058  0.35618317]
 [ 0.95394957  1.24278891  0.19244248  2.07741642  1.69980431]
 [ 0.79616851  1.56974423  0.28182155  1.72783947  2.78788853]
 [ 0.95570683  2.00223112  0.39539361  2.14898157  0.8682183 ]
 [ 0.54957408  1.75953877  0.06032264  2.48682022  0.01408643]]
[1 1 2 0 2 1 0 1 1 2 1 0 0 1 2 0 0 0 0 0]
[[ 1.64644051  2.14556789  1.80829     1.63464952  1.27096438]
 [ 1.93768239  1.31276155  2.67531896  2.89098835  1.15032458]
 [ 2.375175    1.5866847   1.70413363  2.7767899   0.21310817]
 [ 0.26138791  0.06065519  2.49785948  2.33447027  2.61003637]
 [ 2.93585491  2.39747572  1.38443804  2.34158754  0.35482329]
 [ 1.91976309  0.43005985  2.83400679  1.56554496  1.24398589]
 [ 0.79366684  2.32270098  1.368451    1.70530176  0.0563694 ]
 [ 1.85290647  1.83628714  1.85080194  2.83124423  2.0454607 ]
 [ 1.07852364  1.31109583  2.0928936   0.18067642  2.00030017]
 [ 2.01191354  0.63114768  0.38677889  0.94628501  1.09113228]
 [ 1.71059024  1.3158046   2.96512151  0.30613443  0.62663031]
 [ 0.48392853  1.95932484  0.75987482  1.39893234  0.73327678]
 [ 0.47690874  0.33112544  1.96898866  0.41454887  0.58974707]
 [ 1.10617554  2.46297979  0.29130384  2.51383471  0.28829521]
 [ 2.92937827  1.40595365  2.93028331  1.81453657  2.21779084]
 [ 0.11756338  0.84842086  0.36058968  0.88842058  0.35618317]
 [ 0.95394957  1.24278891  0.19244248  2.07741642  1.69980431]
 [ 0.79616851  1.56974423  0.28182155  1.72783947  2.78788853]
 [ 0.95570683  2.00223112  0.39539361  2.14898157  0.8682183 ]
 [ 0.54957408  1.75953877  0.06032264  2.48682022  0.01408643]]
[1 1 2 0 2 1 0 1 1 2 1 0 0 1 2 0 0 0 0 0]
[[1 2 1 1 1]
 [1 1 2 2 1]
 [2 1 1 2 0]
 [0 0 2 2 2]
 [2 2 1 2 0]
 [1 0 2 1 1]
 [0 2 1 1 0]
 [1 1 1 2 2]
 [1 1 2 0 2]
 [2 0 0 0 1]
 [1 1 2 0 0]
 [0 1 0 1 0]
 [0 0 1 0 0]
 [1 2 0 2 0]
 [2 1 2 1 2]
 [0 0 0 0 0]
 [0 1 0 2 1]
 [0 1 0 1 2]
 [0 2 0 2 0]
 [0 1 0 2 0]]
[1 1 2 0 2 1 0 1 1 2 1 0 0 1 2 0 0 0 0 0]
[[1 2 1 1 1]
 [1 1 2 2 1]
 [2 1 1 2 0]
 [0 0 2 2 2]
 [2 2 1 2 0]
 [1 0 2 1 1]
 [0 2 1 1 0]
 [1 1 1 2 2]
 [1 1 2 0 2]
 [2 0 0 0 1]
 [1 1 2 0 0]
 [0 1 0 1 0]
 [0 0 1 0 0]
 [1 2 0 2 0]
 [2 1 2 1 2]
 [0 0 0 0 0]
 [0 1 0 2 1]
 [0 1 0 1 2]
 [0 2 0 2 0]
 [0 1 0 2 0]]
[1 1 2 0 2 1 0 1 1 2 1 0 0 1 2 0 0 0 0 0]
check_fit_score_takes_y
[[ 0.5488135   0.71518937  0.60276338]
 [ 0.54488318  0.4236548   0.64589411]
 [ 0.43758721  0.891773    0.96366276]
 [ 0.38344152  0.79172504  0.52889492]
 [ 0.56804456  0.92559664  0.07103606]
 [ 0.0871293   0.0202184   0.83261985]
 [ 0.77815675  0.87001215  0.97861834]
 [ 0.79915856  0.46147936  0.78052918]
 [ 0.11827443  0.63992102  0.14335329]
 [ 0.94466892  0.52184832  0.41466194]]
[0 1 2 0 1 2 0 1 2 0]
------------------------------------------------------------------------------------- Captured stderr call --------------------------------------------------------------------------------------
[LightGBM] [Fatal] Cannot construct Dataset since there are not useful features.
======================================================================================= warnings summary ========================================================================================
tests/python_package_test/test_sklearn.py::TestSklearn::test_sklearn_backward_compatibility
  /usr/local/var/pyenv/versions/3.5.3/Python.framework/Versions/3.5/lib/python3.5/site-packages/lightgbm/sklearn.py:277: LGBMDeprecationWarning: The `seed` parameter is deprecated and will be removed in next version. Please use `random_state` instead.
    'Please use `random_state` instead.', LGBMDeprecationWarning)

-- Docs: http://doc.pytest.org/en/latest/warnings.html
======================================================================== 1 failed, 12 passed, 1 warnings in 7.54 seconds ========================================================================

@guolinke guolinke merged commit 8984111 into microsoft:master Sep 8, 2017
@StrikerRUS
Copy link
Collaborator Author

@henry0312 Thank you very-very much!
But this fix doesn't explain strange behavior when adding print made test successful on Linux...

@guolinke
Copy link
Collaborator

guolinke commented Sep 8, 2017

@StrikerRUS
I think the reason is caused by the order of parameters.
e.g. min_data=1 min_child_sample=10 and min_child_sample=10 min_data=1 .
adding a print may change the order of this.

@wxchan should we solve the alias problem in python ?

@StrikerRUS
Copy link
Collaborator Author

StrikerRUS commented Sep 8, 2017

@guolinke Your idea seems to be the most realistic.

@henry0312 @guolinke I've made another one experiment: I reset params back to min_data=1, min_data_in_bin=1; by Notepad++ converted EOLs in all *.py files to CRLF (to exclude possible reason mentioned by @Laurae2 ) and added second new line a the end of each file to trigger GitHub update.

In a result, MSVC + Python 3.5 task failed with cannot construct dataset but in check_sample_weights_pandas_series check (instead of check_fit_score_takes_y as it always was before).

@henry0312
Copy link
Contributor

henry0312 commented Sep 8, 2017

Python dictionary doesn't have key orders, therefore when both min_data and min_child_samples are set, it depends on running environemt which one will be used.
And it's not strange that the result may change when adding print.

cf. https://github.com/Microsoft/LightGBM/blob/de51990ed0292b571baa96718cb1b6cdc6acc689/python-package/lightgbm/sklearn.py#L273-L274

@wxchan
Copy link
Contributor

wxchan commented Sep 9, 2017

@guolinke I am kinda lost by your discussions. Why order of params will change result?

@guolinke
Copy link
Collaborator

guolinke commented Sep 9, 2017

@wxchan
for example, when set min_data in sklearn interface, it may be override by the default min_child_sample.

guolinke pushed a commit that referenced this pull request Oct 9, 2017
* disabled logs from compilers; fixed #874

* fixed safe clear_fplder

* added windows folder to manifest.in

* added windows folder to build

* added library path

* added compilation with MSBuild from .sln-file

* fixed unknown PlatformToolset returns exitcode 0

* hotfix

* updated Readme

* removed return

* added installation with mingw test to appveyor

* let's test appveyor with both VS 2015 and VS 2017; but MinGW isn't installed on VS 2017 image

* fixed built-in name 'file'

* simplified appveyor

* removed excess data_files

* fixed unreadable paths

* separated exceptions for cmake and mingw

* refactored silent_call

* don't create artifacts with VS 2015 and mingw

* be more precise with python versioning in Travis

* removed unnecessary if statement

* added classifiers for PyPI and python versions badge

* changed python version in travis

* added support of scikit-learn 0.18.x

* added more python versions to Travis

* added more python versions to Appveyor

* reduced number of tests in Travis

* Travis trick is not needed anymore

* attempt to fix according to #880 (comment)
@lock lock bot locked as resolved and limited conversation to collaborators Mar 12, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Impossible install from sources with pip and localized Visual Studio
6 participants