Irreproducible Segfault with LightGBM #1743

Closed
vruvora opened this issue Oct 11, 2018 · 24 comments

@vruvora vruvora commented Oct 11, 2018

We are experiencing cases of irreproducible Segfault with LightGBM. Does anyone else have this issue?

Environment info

Operating System:

CPU/GPU model:

C++/Python/R version:

Error message

Reproducible examples

Steps to reproduce

Collaborator

@henry0312 henry0312 commented Oct 11, 2018

Did you install LightGBM from PyPI?
The distributions on PyPI may be broken.
(I guess compiler optimization may be the cause.)
So, can you try compiling from source?

Author

@vruvora vruvora commented Oct 11, 2018

We are installing with pip install. It just hangs in the middle of an optimization. We can try compiling from source, but is there a reason why the PyPI package may be broken?

Collaborator

@henry0312 henry0312 commented Oct 11, 2018

I guess there is something wrong with the combination of manylinux and -O3.
Maybe we should compile without -O3 on manylinux.

Author

@vruvora vruvora commented Oct 11, 2018

Interesting. Have you or anyone else experienced this? We have been hitting it a fair amount, but I have not seen it reported for any large use cases. It seems like this would prevent the package from being used in production.

Member

@guolinke guolinke commented Oct 11, 2018

@vruvora
Does the segfault occur often?

Author

@vruvora vruvora commented Oct 11, 2018

Yes. It has been happening quite a bit lately. We have automated statistical unit tests with sample datasets, which may be a degenerate optimization problem. However, we use them to make sure our pipeline is in order when we make big changes. Generally, about 1 run in 5 segfaults, and the crash goes away when we run the tests again.
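
A crash rate like the "about 1 in 5" above can be measured by rerunning the failing script in a fresh subprocess and counting how many runs are killed by SIGSEGV. The following is only a minimal sketch for POSIX systems: `count_segfaults` is a hypothetical helper, and the command shown is a placeholder, not the actual test suite from this report.

```python
import signal
import subprocess
import sys


def count_segfaults(cmd, runs=20):
    """Run cmd `runs` times and count the runs killed by SIGSEGV (POSIX only)."""
    crashes = 0
    for _ in range(runs):
        proc = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        # On POSIX, a return code of -N means the child was killed by signal N.
        if proc.returncode == -signal.SIGSEGV:
            crashes += 1
    return crashes


if __name__ == "__main__":
    # Placeholder command: replace the -c snippet with the script that trains
    # the model. -X faulthandler asks Python to dump a traceback on a crash.
    cmd = [sys.executable, "-X", "faulthandler", "-c", "print('replace me')"]
    print(count_segfaults(cmd, runs=5), "segfaults in 5 runs")
```

Running the same count before and after switching to a source build would give a rough before/after comparison, although a small number of runs cannot rule out a rare crash.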

Collaborator

@henry0312 henry0312 commented Oct 11, 2018

I have seen intermittent crashes many times at my main job.

Author

@vruvora vruvora commented Oct 11, 2018

@henry0312 Interesting. Have you been able to fix it?

Collaborator

@henry0312 henry0312 commented Oct 11, 2018

@vruvora Make sure to uninstall the package installed from PyPI first, and then install manually.

Collaborator

@henry0312 henry0312 commented Oct 11, 2018

@vruvora Yeah, installing from source has solved the problem.

Author

@vruvora vruvora commented Oct 11, 2018

@henry0312 I will try this out and update accordingly. Thanks. @guolinke Do you know why source isn't up to date with PyPI?

Collaborator

@henry0312 henry0312 commented Oct 11, 2018

Member

@guolinke guolinke commented Oct 11, 2018

@vruvora you can use

pip install --no-binary :all: lightgbm

as well.
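
The uninstall-then-rebuild steps suggested in this thread could be scripted roughly as follows. This is a sketch under assumptions: `pip_commands` and `reinstall_from_source` are made-up helper names, and the sketch passes the package name to `--no-binary` instead of `:all:`, so that only lightgbm is compiled locally rather than every dependency. A source build is expected to need a working C++ compiler and CMake on the machine.

```python
import subprocess
import sys


def pip_commands(package="lightgbm"):
    """Return two pip invocations: remove the installed package, then build from source."""
    pip = [sys.executable, "-m", "pip"]
    return [
        pip + ["uninstall", "-y", package],
        # --no-binary <name> forces a source build for that package only;
        # :all: (as in the command above) would also rebuild every dependency.
        pip + ["install", "--no-binary", package, package],
    ]


def reinstall_from_source(package="lightgbm", dry_run=True):
    """Print the pip commands and, unless dry_run is set, execute them in order."""
    for cmd in pip_commands(package):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # dry_run=True only prints the commands; set it to False to run them.
    reinstall_from_source(dry_run=True)
```

Invoking pip as `sys.executable -m pip` keeps the reinstall in the same interpreter environment that runs the script, which matters when several Python installations are present.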

Member

@guolinke guolinke commented Oct 11, 2018

@henry0312
Building without -O3 may slow down the running speed.

I am not sure why the segfault happens, or why building from source fixes it.

Collaborator

@henry0312 henry0312 commented Oct 11, 2018

@guolinke Yes, -O3 improves computing performance, but there may be something wrong with the tuning for manylinux.

Author

@vruvora vruvora commented Oct 11, 2018

@guolinke Will pip install --no-binary :all: lightgbm be slower than a regular pip install? Can you elaborate on the differences?

Collaborator

@henry0312 henry0312 commented Oct 15, 2018

@vruvora have you solved your issue?

Author

@vruvora vruvora commented Oct 18, 2018

@henry0312 Yes, as far as we know. No more segfaults.

Member

@guolinke guolinke commented Oct 18, 2018

Should we add this problem and its solution to the documentation?

Collaborator

@StrikerRUS StrikerRUS commented Nov 27, 2018

ping @henry0312

Author

@vruvora vruvora commented Nov 27, 2018

Yes. We should.

Collaborator

@henry0312 henry0312 commented Nov 28, 2018

Yeah, I think so too.

Member

@guolinke guolinke commented Nov 28, 2018

@vruvora

> @guolinke Will this be slower pip install --no-binary :all: lightgbm than regular pip? Can you elaborate on the differences?

The install speed will be slower. But the running speed isn't affected.

Collaborator

@StrikerRUS StrikerRUS commented Nov 28, 2018

@vruvora Put simply, the difference is that a regular pip install just downloads a .whl file containing a prebuilt library file and copies its contents into the appropriate directory.
In contrast, pip install --no-binary :all: compiles the library file on your machine and then copies it.
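
To see which of the two paths produced an existing installation, one option is to read the WHEEL metadata file that pip records for the installed package. The sketch below assumes that prebuilt Linux wheels from PyPI carry a manylinux platform tag, while a wheel built locally from source typically gets a plain linux_* tag; that heuristic and the helper names `wheel_tags` and `looks_prebuilt` are assumptions for illustration, not part of LightGBM.

```python
from importlib import metadata


def wheel_tags(wheel_text):
    """Extract the 'Tag:' values from the contents of a WHEEL metadata file."""
    tags = []
    for line in wheel_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Tag":
            tags.append(value.strip())
    return tags


def looks_prebuilt(tags):
    """Heuristic (assumption): prebuilt Linux wheels have a manylinux platform tag."""
    return any("manylinux" in tag for tag in tags)


if __name__ == "__main__":
    try:
        # read_text returns None when the file is missing, hence the `or ""`.
        text = metadata.distribution("lightgbm").read_text("WHEEL") or ""
    except metadata.PackageNotFoundError:
        text = ""
    tags = wheel_tags(text)
    print("tags:", tags, "| prebuilt manylinux wheel:", looks_prebuilt(tags))
```

An empty tag list only means the metadata could not be found, so it should not be read as proof of a source build.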

@henry0312 Would you mind creating a PR with new FAQ entry about this?
