
[MRG+2] Add float32 support for Linear Discriminant Analysis #13273

Merged
10 commits merged into scikit-learn:master from thibsej:float32_support_lda on Feb 27, 2019

Conversation

5 participants
@thibsej (Contributor) commented Feb 26, 2019

Reference Issues/PRs

This PR works on #11000 by preserving the dtype in LDA.

cross reference: #8769 (comment)

What does this implement/fix? Explain your changes.

Any other comments?

thibsej added some commits Feb 26, 2019

@agramfort agramfort added this to In progress in Sprint Paris 2019 Feb 26, 2019

@thibsej thibsej marked this pull request as ready for review Feb 26, 2019

@@ -485,9 +485,10 @@ def fit(self, X, y):
raise ValueError("unknown solver {} (valid solvers are 'svd', "
"'lsqr', and 'eigen').".format(self.solver))
if self.classes_.size == 2: # treat binary case as a special case
self.coef_ = np.array(self.coef_[1, :] - self.coef_[0, :], ndmin=2)
my_type = np.float32 if (X.dtype == np.float32) else np.float64

@thibsej (Author, Contributor) commented Feb 26, 2019

This line is necessary when X.dtype is int32 or int64.
Should we keep the number of bits small, i.e. perform the cast from int32 to float32, or cast int32 to float64?

ping @glemaitre

@thibsej thibsej changed the title Add float32 support for Linear Discriminant Analysis [MRG] Add float32 support for Linear Discriminant Analysis Feb 26, 2019

@@ -485,9 +485,10 @@ def fit(self, X, y):
raise ValueError("unknown solver {} (valid solvers are 'svd', "
"'lsqr', and 'eigen').".format(self.solver))
if self.classes_.size == 2: # treat binary case as a special case
self.coef_ = np.array(self.coef_[1, :] - self.coef_[0, :], ndmin=2)
my_type = np.float32 if (X.dtype in [np.float32, np.int32]) else np.float64

@agramfort (Member) commented Feb 26, 2019

You should already have called check_X_y and/or as_float_array above. Then you can just use X.dtype to specify the dtype of the arrays you allocate.

@GaelVaroquaux (Member) commented Feb 26, 2019

Indeed, and the check_X_y of line 430 probably needs to be changed to "dtype=[np.float32, np.float64]"
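
For illustration, here is a minimal sketch of the pattern being suggested (not the exact diff from this PR): let check_X_y accept both float dtypes and then reuse X.dtype for arrays allocated later. Note that when dtype is given as a list, check_X_y/check_array keep inputs whose dtype is already in the list and convert everything else to the first entry, so the ordering decides what integer input becomes.

    import numpy as np
    from sklearn.utils import check_X_y

    X = np.random.RandomState(0).rand(4, 2).astype(np.float32)
    y = np.array([0, 0, 1, 1])

    # float32 and float64 inputs are kept as-is; anything else (e.g. int32)
    # would be converted to the first dtype in the list.
    X_checked, y_checked = check_X_y(X, y, dtype=[np.float64, np.float32])

    # Downstream allocations simply reuse the validated dtype.
    coef = np.zeros((1, X_checked.shape[1]), dtype=X_checked.dtype)
    print(X_checked.dtype, coef.dtype)  # float32 float32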

@@ -295,6 +296,27 @@ def test_lda_dimension_warning(n_classes, n_features):
"min(n_features, n_classes - 1).")
assert_warns_message(FutureWarning, future_msg, lda.fit, X, y)

@pytest.mark.parametrize("data_type, expected_type",[
(np.float32, np.float32), (np.float64, np.float64), (np.int32, np.float32), (np.int64, np.float64)])

@agramfort (Member) commented Feb 26, 2019

PEP8 violations, please check.

@GaelVaroquaux (Member) commented Feb 26, 2019

No additional comments on top of those @agramfort and I made above.

@@ -427,7 +427,8 @@ def fit(self, X, y):
Target values.
"""
# FIXME: Future warning to be removed in 0.23
X, y = check_X_y(X, y, ensure_min_samples=2, estimator=self)
X = as_float_array(X)

@thibsej (Author, Contributor) commented Feb 26, 2019

This line is necessary to get coefficients of type float32 when X is of type int32. Removing this line gives coefficients of type float64.
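
For context, a small illustration of what as_float_array does with integer input, based on its documented behaviour at the time (small integer types promoted to float32, larger ones to float64), which is why adding it changes the output dtype for int32:

    import numpy as np
    from sklearn.utils import as_float_array

    X_int32 = np.arange(6, dtype=np.int32).reshape(3, 2)
    X_int64 = X_int32.astype(np.int64)

    # Small integer types are promoted to float32, larger ones to float64,
    # so int32 input ends up as float32 downstream when as_float_array is used.
    print(as_float_array(X_int32).dtype)   # float32
    print(as_float_array(X_int64).dtype)   # float64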

@glemaitre (Contributor) commented Feb 26, 2019

This is strange because the next check_X_y will convert to float anyway. It seems unnecessary, doesn't it?

@glemaitre (Contributor) commented Feb 26, 2019

Oh, I see why. So X will be converted to float64, and you expect it to be converted to float32 because X is of type int32. I would say that there is no real reason for that; int32 can default to float64. I am fine with that.

@massich what is the mechanism in the other estimators that you modified?

@massich (Contributor) commented Feb 26, 2019

There was no mechanism set. I have a vague memory that we talked about it long ago, but I can't recall that we agreed on anything. To me it makes sense that if someone willingly creates the data in 32 bits, they want to keep it in 32 bits. But if such a person exists, they will scream at us, or would have done so already. So I have no objection to defaulting to float64.

@GaelVaroquaux (Member) commented Feb 26, 2019

An int32 is a "very big" int. Hence, it makes sense to convert it to a float64. People who want to save memory with ints use int16. Indeed, an int has a larger span than a float for the same memory cost.
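
A quick numerical illustration of that point (not from the PR): float32 has a 24-bit significand, so large int32 values cannot be represented exactly, whereas float64 can hold every int32 value.

    import numpy as np

    x = np.int32(2**24 + 1)      # 16777217, well within the int32 range
    print(np.float32(x) == x)    # False: rounded to 16777216.0 in float32
    print(np.float64(x) == x)    # True: float64 represents every int32 exactly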

@glemaitre (Contributor) commented Feb 26, 2019

The CI builds are failing.

@glemaitre glemaitre self-requested a review Feb 26, 2019

@glemaitre (Contributor) commented Feb 26, 2019

It looks good. Could you add an entry in the what's new, since we are changing the behavior of LDA?
I am still unsure about the as_float_array call.

@@ -19,7 +19,7 @@
from .linear_model.base import LinearClassifierMixin
from .covariance import ledoit_wolf, empirical_covariance, shrunk_covariance
from .utils.multiclass import unique_labels
from .utils import check_array, check_X_y
from .utils import check_array, check_X_y, as_float_array

@glemaitre (Contributor) commented Feb 26, 2019
Suggested change
from .utils import check_array, check_X_y, as_float_array
from .utils import check_array, check_X_y
from .utils import as_float_array
thibsej added some commits Feb 26, 2019

@@ -296,6 +297,29 @@ def test_lda_dimension_warning(n_classes, n_features):
assert_warns_message(FutureWarning, future_msg, lda.fit, X, y)


@pytest.mark.parametrize("data_type, expected_type", [
(np.float32, np.float32), (np.float64, np.float64), (np.int32, np.float32),

@thibsej (Author, Contributor) commented Feb 26, 2019

The convention for int32 is to cast to float64.

@glemaitre (Contributor) commented Feb 26, 2019

@thibsej Could you add an entry to the what's new?

@@ -296,6 +297,29 @@ def test_lda_dimension_warning(n_classes, n_features):
assert_warns_message(FutureWarning, future_msg, lda.fit, X, y)


@pytest.mark.parametrize("data_type, expected_type", [
(np.float32, np.float32), (np.float64, np.float64), (np.int32, np.float64),
(np.int64, np.float64)])

@massich (Contributor) commented Feb 26, 2019

I find this one more readable:

@pytest.mark.parametrize("data_type, expected_type", [
  (np.float32, np.float32),
  (np.float64, np.float64),
  (np.int32, np.float64),
  (np.int64, np.float64)
])
# Check value consistency between types
rtol = 1e-6
assert_allclose(clf_32.coef_, clf_64.coef_.astype(np.float32),
rtol=rtol)

@massich (Contributor) commented Feb 26, 2019

Do not use astype; assert_allclose should be able to handle it.

        assert_allclose(clf_32.coef_, clf_64.coef_, rtol=rtol)
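
A tiny illustration (assumed, not part of the PR) that assert_allclose compares arrays of different float dtypes directly, so the explicit astype is indeed unnecessary:

    import numpy as np
    from numpy.testing import assert_allclose

    a32 = np.array([0.1, 0.2], dtype=np.float32)
    a64 = np.array([0.1, 0.2], dtype=np.float64)

    # Values are promoted before comparison, so mixing float32 and float64 works.
    assert_allclose(a32, a64, rtol=1e-6)
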
@massich (Contributor) commented Feb 26, 2019

@thibsej Could you add an entry to the what's new?

You should search for this section in doc/whats_new/v0.21.rst (if it's not there, create it). Then add the following entry:

:mod:`sklearn.discriminant_analysis`
....................................


- |Enhancement| :class:`discriminant_analysis.LinearDiscriminantAnalysis` now
  preserves ``float32`` and ``float64`` dtypes. :issue:`8769` and
  :issue:`11000` by :user:`Thibault Sejourne <thibsej>`.
@massich (Contributor) left a comment

LGTM, once the nitpicks are addressed.

@GaelVaroquaux GaelVaroquaux changed the title [MRG] Add float32 support for Linear Discriminant Analysis [MRG+1] Add float32 support for Linear Discriminant Analysis Feb 26, 2019

@GaelVaroquaux GaelVaroquaux changed the title [MRG+1] Add float32 support for Linear Discriminant Analysis [MRG+2] Add float32 support for Linear Discriminant Analysis Feb 26, 2019

@GaelVaroquaux (Member) commented Feb 26, 2019

LGTM once the tests pass.

@pytest.mark.parametrize("data_type, expected_type", [
(np.float32, np.float32),
(np.float64, np.float64),
(np.int32, np.float64),

@agramfort (Member) commented Feb 26, 2019

Going from int32 to float64 is not what was done in the isotonic regression (isofused) branch this morning. We should be consistent here.

@glemaitre (Contributor) commented Feb 26, 2019

I am +1 for int32 -> float64; that was the semantics of check_array.
@ogrisel was also for this conversion. I would think that we should correct the isotonic regression then.

@GaelVaroquaux (Member) commented Feb 27, 2019

+1. Merging. This is a net improvement. We'll nitpick later :D

@glemaitre (Contributor) commented Feb 26, 2019

We should almost have a common test.
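
As a hypothetical sketch of what such a common test could look like (the helper name check_dtype_preserved below is made up for illustration, not scikit-learn's actual common-test machinery): fit the estimator on float32 and float64 copies of the same data and check that the fitted attributes keep the input dtype.

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    def check_dtype_preserved(Estimator):
        # Fit on float32 and float64 versions of the same data and check
        # that the learned coefficients keep the input dtype.
        rng = np.random.RandomState(0)
        X = rng.rand(20, 3)
        y = rng.randint(0, 2, size=20)
        for dtype in (np.float32, np.float64):
            est = Estimator().fit(X.astype(dtype), y)
            assert est.coef_.dtype == dtype

    check_dtype_preserved(LinearDiscriminantAnalysis)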

@GaelVaroquaux GaelVaroquaux merged commit 415fd83 into scikit-learn:master Feb 27, 2019

18 checks passed

LGTM analysis: C/C++ - No code changes detected
LGTM analysis: JavaScript - No code changes detected
LGTM analysis: Python - No new or fixed alerts
ci/circleci: deploy - Your tests passed on CircleCI!
ci/circleci: doc - Your tests passed on CircleCI!
ci/circleci: doc-min-dependencies - Your tests passed on CircleCI!
ci/circleci: lint - Your tests passed on CircleCI!
codecov/patch - 100% of diff hit (target 92.48%)
codecov/project - 92.55% (+0.07%) compared to b61283d
continuous-integration/appveyor/pr - AppVeyor build succeeded
continuous-integration/travis-ci/pr - The Travis CI build passed
scikit-learn.scikit-learn - Build #20190226.127 succeeded
scikit-learn.scikit-learn (Linux py35_conda_openblas) - succeeded
scikit-learn.scikit-learn (Linux py35_np_atlas) - succeeded
scikit-learn.scikit-learn (Linux pylatest_conda) - succeeded
scikit-learn.scikit-learn (Windows py35) - succeeded
scikit-learn.scikit-learn (Windows py37) - succeeded
scikit-learn.scikit-learn (macOS pylatest_conda) - succeeded

Sprint Paris 2019 automation moved this from In progress to Done Feb 27, 2019


@thibsej thibsej deleted the thibsej:float32_support_lda branch Feb 27, 2019

@glemaitre glemaitre referenced this pull request Mar 1, 2019

Open

Preserving dtype for float32 / float64 in transformers #11000

6 of 28 tasks complete

xhlulu added a commit to xhlulu/scikit-learn that referenced this pull request Apr 28, 2019

[MRG+2] Add float32 support for Linear Discriminant Analysis (scikit-learn#13273)

* [skip ci] Empty commit to trigger PR

* Add dtype testing

* Fix: dtype testing

* Fix test_estimators[OneVsRestClassifier-check_estimators_dtypes]

* TST refactor using parametrize + Add failing test for int32

* Fix for int32

* Fix code according to review + Fix PEP8 violation

* Fix dtype for int32 and complex

* Fix pep8 violation

* Update whatsnew + test COSMIT

xhlulu added a commit to xhlulu/scikit-learn that referenced this pull request Apr 28, 2019

xhlulu added a commit to xhlulu/scikit-learn that referenced this pull request Apr 28, 2019
