
[MRG+1] Fix semi_supervised #9239

Merged 34 commits into scikit-learn:master on Jul 3, 2017

Conversation

@musically-ut (Contributor) commented Jun 28, 2017

Closes #3550, #5774, #3758, #6727 and deprecates #9192.

Pinging @jnothman, @boechat107, @MechCoder. Am I forgetting someone?

@jnothman (Member) commented Jun 29, 2017

@musically-ut (Contributor, Author) commented Jun 29, 2017

Just making Travis happy, I think.

Overall, this is a piecemeal effort to fix the semi_supervised module; some other issues will remain.
Nevertheless, I believe it is best to fix these first and then take care of the rest in smaller, more contained PRs:

  • Updating the tutorial for semi_supervised learning.
  • Fixing implementation of kNN kernel #8008.
  • Adding a few more warnings re: convergence (e.g. if max_iter is reached) and numerical stability (e.g. if an under/overflow occurs during computations); a sketch of the convergence-warning pattern follows this list.
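
As a sketch of what that convergence warning could look like (the _propagate helper and its arguments are hypothetical; ConvergenceWarning is the real scikit-learn class):

import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning

def _propagate(graph_matrix, label_distributions, max_iter=30, tol=1e-3):
    # Hypothetical propagation loop, only to illustrate the warning.
    for _ in range(max_iter):
        previous = label_distributions
        label_distributions = graph_matrix @ label_distributions
        label_distributions /= label_distributions.sum(axis=1, keepdims=True)
        if np.abs(label_distributions - previous).sum() < tol:
            break
    else:  # the loop finished without breaking: no convergence
        warnings.warn("max_iter=%d reached without convergence (tol=%g)"
                      % (max_iter, tol), ConvergenceWarning)
    return label_distributions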

I'm fixing the flake8 failures as we speak.

@musically-ut changed the title from "[WIP] Fix semi supervised" to "Fix semi supervised" Jun 29, 2017
@ogrisel added this to the 0.20 milestone Jun 29, 2017
@ogrisel modified the milestones: 0.19, 0.20 Jun 29, 2017
@musically-ut (Contributor, Author) commented Jun 29, 2017

The failure on appveyor was unrelated to the current changes.

@musically-ut changed the title from "Fix semi supervised" to "[MRG] Fix semi_supervised" Jun 29, 2017
@amueller (Member) commented Jun 29, 2017

Can you please merge master? That should get rid of the AppVeyor error.

@musically-ut (Contributor, Author) commented Jul 2, 2017

Updated; awaiting review.

@@ -264,13 +280,21 @@ def fit(self, X, y):
l_previous = self.label_distributions_
self.label_distributions_ = safe_sparse_dot(
graph_matrix, self.label_distributions_)

if alpha == 'propagation':


@jnothman (Member) commented Jul 2, 2017

I think this should be self.variant


@musically-ut (Contributor, Author) commented Jul 2, 2017

Gah!

I wish we had a test to see if normalization was working. :(
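
A rough sketch of such a non-regression test (the data and the test name are made up; the intent is only that every row of label_distributions_ sums to one after fit):

import numpy as np
from sklearn.semi_supervised import LabelPropagation

def test_label_distributions_are_normalized():
    rng = np.random.RandomState(0)
    X = rng.rand(30, 2)
    y = np.full(30, -1)  # -1 marks unlabeled samples
    y[:3] = [0, 1, 2]    # a few labeled points
    model = LabelPropagation(kernel='rbf', gamma=20).fit(X, y)
    assert np.allclose(model.label_distributions_.sum(axis=1), 1.0)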

@@ -264,13 +280,21 @@ def fit(self, X, y):
l_previous = self.label_distributions_
self.label_distributions_ = safe_sparse_dot(
graph_matrix, self.label_distributions_)

if alpha == 'propagation':
# LabelPropagation


@jnothman (Member) commented Jul 2, 2017

And I think this comment is then redundant.

@jnothman (Member) left a comment

So is there some way to invent a graph where you get very different normalisation factors for different rows, so that it substantially affects the propagation?
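
One hypothetical construction: with an RBF affinity (gamma=20, the estimators' default), a tight cluster yields large row sums while a distant outlier's row sum stays near its self-similarity of 1, so the per-row normalisation factors differ by an order of magnitude:

import numpy as np

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0.0, 0.01, size=(10, 2)),  # tight cluster
               [[5.0, 5.0]]])                        # distant outlier
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
W = np.exp(-20.0 * sq_dists)
print(W.sum(axis=1))  # cluster rows sum to ~10, the outlier row to ~1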

- Fix :class:`semi_supervised.BaseLabelPropagation` to correctly implement
``LabelPropagation`` and ``LabelSpreading`` as done in the referenced
papers. :class:`semi_supervised.LabelPropagation` now always does hard
clamping. Its ``alpha`` parameter now defaults to ``None`` and is


@jnothman (Member) commented Jul 2, 2017

I think you should say that "alpha has no effect and is deprecated to be ...".


@musically-ut (Contributor, Author) commented Jul 2, 2017

👍
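
For reference, a sketch of the post-fix behaviour this changelog entry describes:

from sklearn.semi_supervised import LabelPropagation, LabelSpreading

# LabelPropagation now always hard-clamps the labeled points, so alpha
# has no effect (default None, deprecated); LabelSpreading keeps a
# soft-clamping alpha that must lie in (0, 1).
hard = LabelPropagation(kernel='rbf')           # alpha ignored
soft = LabelSpreading(kernel='rbf', alpha=0.2)  # 20% neighbour influence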

if self.variant == 'propagation':
# LabelPropagation
clamp_weights = np.ones((n_samples, 1))
clamp_weights[~unlabeled, 0] = 0.0


@jnothman (Member) commented Jul 2, 2017

This should be the same as clamp_weights = unlabelled.reshape(-1, 1)

clamp_weights[~unlabeled, 0] = 0.0
else:
# LabelSpreading
clamp_weights = alpha * np.ones((n_samples, 1))


@jnothman (Member) commented Jul 2, 2017

Just use clamp_weights = alpha.


@musically-ut (Contributor, Author) commented Jul 2, 2017

Have removed clamp_weights completely in favor of keeping only alpha.
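
An illustrative sketch of that simplification (the clamp helper and its signature are hypothetical, not the PR's code):

import numpy as np

def clamp(label_distributions, y_static, unlabeled, alpha=None):
    if alpha is None:
        # LabelPropagation: hard clamp, labeled rows keep y_static.
        return np.where(unlabeled[:, np.newaxis],
                        label_distributions, y_static)
    # LabelSpreading: the scalar alpha broadcasts over the matrix, so
    # no clamp_weights column is needed; y_static is assumed to carry
    # the (1 - alpha) factor already.
    return alpha * label_distributions + y_static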

normalizer = np.sum(
self.label_distributions_, axis=1)[:, np.newaxis]
self.label_distributions_ /= normalizer

# clamp
self.label_distributions_ = np.multiply(
clamp_weights, self.label_distributions_) + y_static


@jnothman (Member) commented Jul 2, 2017

Hmm. In the label prop case, we could just be using np.where(unlabelled, self.label_distributions, y_static), unless I'm mistaken.


@musically-ut (Contributor, Author) commented Jul 2, 2017

Yes, np.where(unlabeled[:, np.newaxis], self.label_distributions_, y_static) to be more precise. Have factored out unlabeled[:, np.newaxis] in the actual implementation.
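
A tiny self-contained check of that equivalence (the arrays are made up):

import numpy as np

rng = np.random.RandomState(0)
dist = rng.rand(5, 3)
unlabeled = np.array([False, False, True, True, True])
y_static = np.zeros((5, 3))
y_static[0, 0] = y_static[1, 2] = 1.0  # one-hot rows for labeled points

clamp_weights = unlabeled.astype(float)[:, np.newaxis]
old = clamp_weights * dist + y_static                     # multiply-and-add
new = np.where(unlabeled[:, np.newaxis], dist, y_static)  # masked form
assert np.allclose(old, new)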

@musically-ut (Contributor, Author) commented Jul 2, 2017

Err ... the Travis failures, this time, seem to be unrelated to this PR.

@@ -350,6 +373,14 @@ class LabelPropagation(BaseLabelPropagation):
LabelSpreading : Alternate label propagation strategy more robust to noise
"""

variant = 'propagation'


@ogrisel (Member) commented Jul 3, 2017

Please make this a private attribute _variant. Otherwise it could be considered public API.

@jnothman I think we can merge once this is made private. Ok on your side?
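
In sketch form, the requested change (class bodies elided to the attribute in question; the 'spreading' value is assumed for symmetry):

class BaseLabelPropagation:
    _variant = None  # leading underscore keeps it out of the public API

class LabelPropagation(BaseLabelPropagation):
    _variant = 'propagation'

class LabelSpreading(BaseLabelPropagation):
    _variant = 'spreading'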

@ogrisel (Member) commented Jul 3, 2017

You can ignore tests failing with "HTTP Error 500" on mldata.org.

@musically-ut (Contributor, Author) commented Jul 3, 2017

Updated.

PS: I don't remember whether merely pushing commits sends out a notification e-mail to subscribers by default on GitHub (I vaguely recall turning off those notifications for myself somewhere). If it does, please tell me, and I'm sorry about the noise from these "Updated" comments.

@ogrisel (Member) commented Jul 3, 2017

It's fine, let's wait for @jnothman's feedback and CI results.

@jnothman (Member) commented Jul 3, 2017

Yes, I'm +1 for merge, because it would be good to finally have this fixed in the release. But it would be nice if we had a non-regression test for that normalisation issue. Want to think about it in the background and try to propose something, @musically-ut?

@ogrisel (Member) commented Jul 3, 2017

I will be offline soon. @jnothman, I'll let you handle the rest and the merging of this PR with @musically-ut :)

@musically-ut (Contributor, Author) commented Jul 3, 2017

👍

I am going to continue proposing some changes after the underlying algorithm is fixed (e.g. #8008), and will keep trying to come up with better tests in the background.

@ogrisel Thanks for the review!

@jnothman (Member) commented Jul 3, 2017

Thanks and well done, @musically-ut, @boechat107, @MechCoder and all.

@jnothman merged commit 0dc2279 into scikit-learn:master Jul 3, 2017

2 of 3 checks passed:
  • continuous-integration/appveyor/pr: AppVeyor build failed
  • ci/circleci: Your tests passed on CircleCI!
  • continuous-integration/travis-ci/pr: The Travis CI build passed
@amueller (Member) commented Jul 5, 2017

AWESOME! Thank you so much to everybody working on this!

massich added a commit to massich/scikit-learn that referenced this pull request Jul 13, 2017
* Files for my dev environment with Docker

* Fixing label clamping (alpha=0 for hard clamping)

* Deprecating alpha, fixing its value to zero

* Correct way to deprecate alpha for LabelPropagation

The previous way was breaking the test
sklearn.tests.test_common.test_all_estimators

* Detailed info for LabelSpreading's alpha parameter

Based on the original paper.

* Minor changes in the deprecation message

* Improving "deprecated" doc string and raising DeprecationWarning

* Using a local "alpha" in "fit" to deprecate LabelPropagation's alpha

This solution isn't great, but it sets the correct value for alpha
without violating the restrictions imposed by the tests.

* Removal of my development files

* Using sphinx's "deprecated" tag (jnothman's suggestion)

* Deprecation warning: stating that the alpha's value will be ignored

* Use __init__ with alpha=None

* Update what's new

* Try fix RuntimeWarning in test_alpha_deprecation

* DOC Indent deprecation details

* DOC wording

* Update docs

* Change to the one true implementation.

* Add sanity-checked impl. of Label{Propagation,Spreading}

* Raise ValueError if alpha is invalid in LabelSpreading.

* Add a normalizing step before clamping to LabelPropagation.

* Fix flake8 errors.

* Remove duplicate imports.

* DOC Update What's New.

* Specify alpha's value in the error.

* Tidy up tests.

Add a test and add references, where needed.

* Add comment to non-regression test.

* Fix documentation.

* Move check for alpha into fit from __init__.

* Fix corner case of LabelSpreading with alpha=None.

* alpha -> self.variant

* Make Whats_new more explicit.

* Simplify impl. of Label{Propagation,Spreading}.

* variant -> _variant.
dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017 (same commit list as above)

dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017 (same commit list as above)

NelleV added a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017 (same commit list as above)

paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017 (same commit list as above)

AishwaryaRK added a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017 (same commit list as above)

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017 (same commit list as above)

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017 (same commit list as above)