Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Libsvm and liblinear rand() fix for convergence on windows targets (and improvement on all targets) #13511

Merged
merged 59 commits into from
Apr 6, 2020

Conversation

smarie
Copy link
Contributor

@smarie smarie commented Mar 25, 2019

This PR fixes part of #11536 on windows targets.

On windows platforms, liblinear and libsvm have strong convergence issues because of the way random numbers are generated: max random number in Windows is 15 bits (even on 64 bit windows), which is 32767, while max random number in linux+GCC is 31 bits (resp. 63 bits in 64 bits systems I guess) so that's 2147483647 (resp 9223372036854775807).

If I understand correctly, these random numbers are used in the coordinate gradient descent algorithms, to find the next coordinate to act upon. When the dimensionality (e.g. number of samples) is large, the random number generator on windows has a hard time to explore all dimensions.

This is a known bug documented in liblinear FAQ (strangely enough, not the libsvm FAQ) but the proposed workaround was wrong.

I made a patch for this years ago, that was approved by several users yet never merged: cjlin1/liblinear#28 .

Note that another user reported it on libsvm, I proposed a similar PR there: cjlin1/libsvm#140.

Since I realized that sklearn has it own internal copy of both libraries, this is the fix.

@smarie
Copy link
Contributor Author

smarie commented Mar 25, 2019

Continuous integration reports failures concerning SVM scores:

Windows py35_32 and Windows py37_64

______________ test_estimators[LinearSVR-check_regressors_train] ______________
estimator_checks.py:1870: in check_regressors_train
'0.48870169385239753 not greater than 0.5'

___ test_estimators[LinearSVR-check_regressors_train(readonly_memmap=True)] ___
estimator_checks.py:1870: in check_regressors_train
'0.48870169385239753 not greater than 0.5'

It is a bit strange, because on my Windows PC I am not able to reproduce these issues...
On my PC, both tests above lead to a score of 0.5826 and everything runs fine.

I tried to restart all CI jobs by pushing again - I obtain exactly the same result. Could anyone try on his windows machine to see if they can reproduce the CI bug ? I have to admit that I am clueless here.

@smarie smarie changed the title Libsvm and liblinear rand() fix for convergence on windows targets [Help wanted to debug on CI] Libsvm and liblinear rand() fix for convergence on windows targets Mar 25, 2019
sklearn/svm/src/liblinear/linear.cpp Outdated Show resolved Hide resolved
sklearn/svm/src/liblinear/linear.cpp Outdated Show resolved Hide resolved
sklearn/svm/src/liblinear/linear.cpp Outdated Show resolved Hide resolved
@smarie
Copy link
Contributor Author

smarie commented Mar 26, 2019

Following @jnothman review, this is the summary of where this PR is:

  • Track 1 I updated the proposal so that it the rand() fix is entirely done at compile time. The result is in this version. It results in two test errors (see above) that are mainly probably due to a very limited number of iterations

  • Track 2 I replaced rand() with the mersene twister random number generator, as recommended by https://codeforces.com/blog/entry/61587 . This is C++11 so I had to modify the setup.py for libsvm and liblinear. It results in ~20 test errors, I am not able to assess if this is minor and due to hardcoded assert values, or major with a remaining bug. Note: I used a hardcoded seed instead of a more variable seed so that behaviour is deterministic.

Help from a core team member with knowledge of the libsvm / liblinear estimators would therefore be needed at this stage.

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The failures certainly look substantive.

The only place I can see they might come from is that you no longer seed the object with each new fit. Replace srand( with mt_rand.seed( in


srand(param->random_seed);

srand(param->random_seed);

sklearn/svm/src/liblinear/linear.cpp Outdated Show resolved Hide resolved
sklearn/svm/setup.py Show resolved Hide resolved
@jnothman
Copy link
Member

jnothman commented Mar 27, 2019 via email

@smarie
Copy link
Contributor Author

smarie commented Mar 27, 2019

There is, but this is a method instance that works on a given mersenne twister engine object. Since a new object is created when the library is loaded, I'll see if I can access it from the files you mention. If not, I'll have to pass this seed alon in the parameters struct.

@smarie
Copy link
Contributor Author

smarie commented Mar 27, 2019

It seems all good now. I'll just try to restart CI to get rid of the random download failures of circleci

@smarie
Copy link
Contributor Author

smarie commented Jan 21, 2020

Quick link to my last comment, for those ending up here: #13511 (comment)

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @smarie

Small question but LGTM

assert_almost_equal(_average_precision(y_true, probas_pred),
precision_recall_auc, decimal=3)
precision_recall_auc, decimal=2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this related?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remember well, it is. Changing the random number generation seems to make this test fail. When I investigated a few months ago, I could not find a good reason except for precision of the _average_precision in case of 0.5 ties.

I would certainly appreciate a second pass of checking on that one, to confirm my impression.

@jnothman
Copy link
Member

jnothman commented Jan 22, 2020 via email

@smarie
Copy link
Contributor Author

smarie commented Jan 22, 2020

In addition to the benchmark suggestion, I suggest to create a python wrapper for the new bounded_rand_int function, so that we could create a dedicated unit test to ensure correct randomization quality on all platforms. This can be done in another issue+pr

@cmarmo
Copy link
Contributor

cmarmo commented Apr 3, 2020

@scikit-learn/core-devs this is a quite old PR thrice-approved, tagged in 0.23 milestone. Apart from syncing to upstream is there anything else to do but merge?

@adrinjalali
Copy link
Member

We talked about it in one of the meetings and I believe it can be merged as is. @smarie would you mind fixing the merge conflicts?

@smarie
Copy link
Contributor Author

smarie commented Apr 4, 2020

Done. I updated the headers, as we are now in 2020 :) I also created a header for the newrand.h file, it had none.

@smarie
Copy link
Contributor Author

smarie commented Apr 4, 2020

There is a doctest failure that seems unrelated:

=================================== FAILURES ===================================
_____________________ [doctest] unsupervised_learning.rst ______________________
176    :target: ../../auto_examples/cluster/plot_coin_ward_segmentation.html
177    :scale: 40
  File "/usr/share/miniconda/envs/testvenv/lib/python3.8/site-packages/skimage/io/_io.py", line 48, in imread
    img = call_plugin('imread', fname, plugin=plugin, **plugin_args)

  File "/usr/share/miniconda/envs/testvenv/lib/python3.8/site-packages/skimage/io/manage_plugins.py", line 210, in call_plugin
    return func(*args, **kwargs)

  File "/usr/share/miniconda/envs/testvenv/lib/python3.8/site-packages/skimage/io/_plugins/pil_plugin.py", line 36, in imread
    return pil_to_ndarray(im, dtype=dtype, img_num=img_num)

  File "/usr/share/miniconda/envs/testvenv/lib/python3.8/site-packages/skimage/io/_plugins/pil_plugin.py", line 66, in pil_to_ndarray
    image.seek(i)

  File "/usr/share/miniconda/envs/testvenv/lib/python3.8/site-packages/PIL/PngImagePlugin.py", line 748, in seek
    self._seek(f)

  File "/usr/share/miniconda/envs/testvenv/lib/python3.8/site-packages/PIL/PngImagePlugin.py", line 791, in _seek
    cid, pos, length = self.png.read()

AttributeError: 'NoneType' object has no attribute 'read'

/home/vsts/work/1/s/doc/tutorial/statistical_inference/unsupervised_learning.rst:185: UnexpectedException
=========================== short test summary info ============================
SKIPPED [1] /usr/share/miniconda/envs/testvenv/lib/python3.8/site-packages/_pytest/nose.py:32: Skipping dataset loading doctests
SKIPPED [1] /usr/share/miniconda/envs/testvenv/lib/python3.8/site-packages/_pytest/nose.py:32: Text tutorial requires large dataset download
========= 1 failed, 44 passed, 2 skipped, 37 warnings in 34.03 seconds =========
Makefile:34: recipe for target 'test-doc' failed
make: *** [test-doc] Error 1

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @smarie , happy to merge if there are no objections.

sklearn/svm/src/libsvm/svm.cpp Outdated Show resolved Hide resolved
sklearn/svm/src/libsvm/LIBSVM_CHANGES Outdated Show resolved Hide resolved
@smarie
Copy link
Contributor Author

smarie commented Apr 6, 2020

thanks for the review! we'll have to create a new dedicated issue to reference the idea of creating SVM benchmarks. I have made significant progress in the direction of creating reproducible benchmarks thanks to pytest plugins (pytest-cases and pytest-harvest in particular) in the past years ; maybe this could form the basis for a scikit-learn companion project. To be discussed.... later :)

@adrinjalali adrinjalali merged commit eaf0a04 into scikit-learn:master Apr 6, 2020
@adrinjalali
Copy link
Member

Thanks for your persistence @smarie . Could you please open a new issue for the remaining stuff?

@smarie
Copy link
Contributor Author

smarie commented Apr 7, 2020

Done: #13862 and #16864

@smarie smarie deleted the libsvm_liblinear_rand_fix branch April 7, 2020 12:01
@smarie smarie mentioned this pull request Apr 7, 2020
gio8tisu pushed a commit to gio8tisu/scikit-learn that referenced this pull request May 15, 2020
…s (and improvement on all targets) (scikit-learn#13511)

* Fixed random number generation on windows. See cjlin1/liblinear#28

* Added correct PR number

* Fixed random number generator for libsvm on windows targets

* Updated comments

* Suppressed C4293 warnings using explicit cast.

* Suppressed C4293 warnings using explicit cast.

* Following code review: `myrand` is now inline, and all integer size checks for random generation fix are now static.

* Dummy comment edit to re-trigger build

* Fixed include error

* Attempt to provide a better and faster result by using a mersene twister random number generator as suggested by https://codeforces.com/blog/entry/61587

* Attempt to integrate the extra compiler arg requesting for explicit C++11 support

* Attempt to integrate the extra compiler arg requesting for explicit C++11 support. liblinear extension.

* the `extra_compile_args` for C++11 was not working when used in `config.add_extension` so now pre-creating liblinear library first with the flag.

* Fixed liblinear extension build: there was a missing library dependency.

* Fixed liblinear library dependency for extension compilation.

* Fixed liblinear library compilation: added tron

* Now correctly setting the seed in libsvm

* Now correctly setting the seed in liblinear. This is done using a new interface function `set_seed`.

* Updated what's new accordingly

* Increased tolerance when comparing `_average_precision` with `_average_precision_slow`. Indeed `_average_precision` is based on predictions ordering and has errors in case of ties (e.g. several 0.5 prediction values)

* Minor improvement of this test method for readability (no change in output)

* Minor doc update as per review

* dropped commented line

* Reverted test readability improvement change

* Using double barticks

* Clarified all comments

* removed useless space

* Clarified all comments, and included a `set_seed` method as per code review.

* Updated what's new about changed models as suggested by review.

* Replaced random number generator: now using a `bounded_rand_int` using tweaked Lemire post-processor from http://www.pcg-random.org/posts/bounded-rands.html)

* Updated readme rst and moved it to 0.22

* Whats new moved from 0.222 to 0.23

* Updated docs

* Update doc/whats_new/v0.23.rst

Co-Authored-By: Chiara Marmo <cmarmo@users.noreply.github.com>

* Dummy change to re-trigger the CI build

* Update doc/whats_new/v0.21.rst

* Update doc/whats_new/v0.21.rst

* Update doc/whats_new/v0.21.rst

* Update doc/whats_new/v0.21.rst

* Update doc/whats_new/v0.21.rst

* Update doc/whats_new/v0.21.rst

* Update doc/whats_new/v0.21.rst

* Update doc/whats_new/v0.21.rst

* Update doc/whats_new/v0.21.rst

* Update doc/whats_new/v0.21.rst

* Update doc/whats_new/v0.22.rst

* As per code review: added LogisticRegression in the list of changes, and repeated the list of classes affected in changelog.

* New random number generator used in libsvm / liblinear is now in an independent header file `newrand/newrand.h`.

* Update sklearn/svm/src/liblinear/linear.cpp

* Fixed dates in headers and added a header to newrand.h

* Added a sentence in the changelog to explain the impact in user-friendly terms.

* Added link to PR in readme file and removed commented lines as per code review

Co-authored-by: Sylvain MARIE <sylvain.marie@se.com>
Co-authored-by: Chiara Marmo <cmarmo@users.noreply.github.com>
ariapoy referenced this pull request in ntucllab/libact Aug 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants