
LogisticRegression convert to float64 (for SAG solver) #13243

Merged: 31 commits merged into scikit-learn:master on Feb 27, 2019

Conversation

6 participants
@massich (Contributor) commented Feb 25, 2019

Reference Issues/PRs

Works on #8769 and #11000 (for the SAG solver). Fixes #9040.

Closes #11155 (takes over that PR).

massich and others added some commits Jun 16, 2017

Squash all the PR 9040 commits
initial PR commit

seq_dataset.pyx generated from template

seq_dataset.pyx generated from template #2

rename variables

fused types consistency test for seq_dataset

a

sklearn/utils/tests/test_seq_dataset.py

new if statement

add doc

sklearn/utils/seq_dataset.pyx.tp

minor changes

minor changes

typo fix

check numeric accuracy only up 5th decimal

Address oliver's request for changing test name

add test for make_dataset and rename a variable in test_seq_dataset
fix

@massich changed the title from "Henley is 8769 minus merged" to "LogisticRegression convert to float64 (for SAG solver)" on Feb 25, 2019

@GaelVaroquaux added this to "To do" in Sprint Paris 2019 via automation on Feb 25, 2019

@GaelVaroquaux moved this from "To do" to "In progress" in Sprint Paris 2019 on Feb 25, 2019

@@ -245,8 +245,9 @@ def sag_solver(X, y, sample_weight=None, loss='log', alpha=1., beta=0.,
max_iter = 1000

if check_input:
X = check_array(X, dtype=np.float64, accept_sparse='csr', order='C')
y = check_array(y, dtype=np.float64, ensure_2d=False, order='C')
_dtype = [np.float64, np.float32]

@massich (Author, Contributor) commented Feb 25, 2019:

@lesteve said (https://github.com/scikit-learn/scikit-learn/pull/11155/files#r192674850):

This does not seem to be covered by any of the tests.

@glemaitre responded:

I see a pattern that we have applied in some places: the solvers always check the inputs while the class level does not (e.g. KMeans).

Is this something we want to extend here, or do we leave these lines uncovered, or remove them?

@massich (Author, Contributor) commented Feb 25, 2019:

I'm not really sure what you were pointing at. _dtype = [bla.] looks fine to me.
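For context, a minimal sketch (not from the PR itself) of what passing a list of dtypes to check_array does: the input dtype is kept when it appears in the list, otherwise the array is converted to the first entry (float64).

import numpy as np
from sklearn.utils import check_array

X32 = np.ones((4, 3), dtype=np.float32)

# With a dtype list, float32 input stays float32 instead of being upcast
# to float64; integer input would still be converted to float64.
X_checked = check_array(X32, dtype=[np.float64, np.float32],
                        accept_sparse='csr', order='C')
print(X_checked.dtype)  # float32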

@massich force-pushed the massich:henley_is_8769_minus_merged branch from 151fb1c to 5eb99b8 on Feb 25, 2019

X, y = datasets.make_classification(n_samples=1000, n_features=100,
                                    n_informative=5, n_classes=2,
                                    random_state=1234)
for seed in range(10):

@GaelVaroquaux (Member) commented Feb 26, 2019:

10 seeds, that's a bit brutal.

What's the time of this test? Is it very fast? We really need to be careful with the run-time of our test suite.



if __name__ == '__main__':
    solvers = ['saga', 'liblinear', 'lightning']
    solvers = ['saga', ]

@massich (Author, Contributor) commented Feb 26, 2019:

@NelleV @GaelVaroquaux liblinear and lightning were removed from the benchmark. I guess that was to speed up the process; shall we put them back?
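For reference, a rough sketch, with an illustrative solver list and problem size rather than the actual bench_saga.py code, of how one could time a solver on float32 versus float64 inputs:

import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=5000, n_features=100, random_state=0)

# Time each solver on both dtypes; fewer solvers means a faster benchmark run.
for solver in ['saga']:
    for dtype in (np.float32, np.float64):
        clf = LogisticRegression(solver=solver, max_iter=100)
        tic = time.time()
        clf.fit(X.astype(dtype), y)
        print(solver, np.dtype(dtype).name, '%.3f s' % (time.time() - tic))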


assert xicsr_data_32.dtype == np.float32
assert xicsr_data_64.dtype == np.float64
assert isinstance(yicsr_32, float)

@massich (Author, Contributor) commented Feb 26, 2019:

I forgot this one

dataset2 = CSRDataset(X_csr.data, X_csr.indptr, X_csr.indices,
                      y, sample_weight, seed=42)

X64 = iris.data.astype(np.float64)

@glemaitre (Contributor) commented Feb 26, 2019:

We could create a pytest fixture which returns either 64-bit or 32-bit datasets. It could return X, X_sparse, y, sample_weight.

We can then use this fixture in parametrize, as sketched below.
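A minimal sketch of what such a fixture could look like (names and data here are illustrative, not the ones that ended up in the PR):

import numpy as np
import pytest
import scipy.sparse as sp
from sklearn import datasets

iris = datasets.load_iris()

@pytest.fixture(params=[np.float32, np.float64])
def dataset(request):
    # Build X, X_sparse, y and sample_weight in the requested dtype.
    dtype = request.param
    X = iris.data.astype(dtype)
    y = iris.target.astype(dtype)
    X_sparse = sp.csr_matrix(X)
    sample_weight = np.ones(X.shape[0], dtype=dtype)
    return X, X_sparse, y, sample_weight

def test_dtype_is_preserved(dataset):
    # Runs once per dtype thanks to the fixture params.
    X, X_sparse, y, sample_weight = dataset
    assert X.dtype == X_sparse.dtype == y.dtype == sample_weight.dtype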


massich added some commits Feb 26, 2019

@glemaitre (Contributor) commented Feb 26, 2019:

The linter is not happy

@massich (Author, Contributor) commented Feb 26, 2019:

The last commit should fix the linter.

I tried to call the constructors with named parameters to avoid verbosity, but when I do so I get __cinit__ errors that I can't seem to get around.

If CI is green, we can merge it as it is, or I could give a pass to sklearn/linear_model/tests/test_base.py.

@massich (Author, Contributor) commented Feb 26, 2019:

On second thought, since the test compares 32-bit and 64-bit results for consistency, you need both dtypes within the same test, so you can't simply parametrize over dtype. We could always define an EXPECTED_VALUES constant and compare against that; see the sketch below.
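To illustrate the constraint, a hedged sketch of a consistency check that needs both dtypes inside one test (the estimator, data, and tolerances are illustrative, not the PR's actual test):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)

# Fit the same model on float32 and float64 inputs and require the
# coefficients to agree up to a loose tolerance.
clf32 = LogisticRegression(solver='saga', max_iter=1000).fit(X.astype(np.float32), y)
clf64 = LogisticRegression(solver='saga', max_iter=1000).fit(X.astype(np.float64), y)
np.testing.assert_allclose(clf32.coef_, clf64.coef_, rtol=1e-2, atol=1e-2)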

@@ -137,6 +137,11 @@ Support for Python 3.4 and below has been officially dropped.
:mod:`sklearn.linear_model`
...........................

- |Enhancement| :class:`linear_model.make_dataset` now preserves
``float32`` and ``float64`` dtypes. :issues:`8769` and `11000` by

@thibsej (Contributor) commented Feb 26, 2019:

11000 is missing :issues: in front.

@massich (Author, Contributor) commented Feb 26, 2019:

- |Enhancement| :class:`linear_model.make_dataset` now preserves ``float32``
  and ``float64`` dtypes. :issues:`8769` and :issues:`11000` by :user:`Nelle
  Varoquaux`_, :user:`Arthur Imbert`_, :user:`Guillaume Lemaitre <glemaitre>`,
  and :user:`Joan Massich <massich>`

@GaelVaroquaux requested a review from glemaitre on Feb 27, 2019

@GaelVaroquaux (Member) commented Feb 27, 2019:

LGTM. Merging. This is a net improvement. More improvements can follow later.

@GaelVaroquaux merged commit f02ef9f into scikit-learn:master on Feb 27, 2019

15 of 16 checks passed

continuous-integration/appveyor/pr: Waiting for AppVeyor build to complete
LGTM analysis: C/C++: No new or fixed alerts
LGTM analysis: JavaScript: No code changes detected
LGTM analysis: Python: No new or fixed alerts
ci/circleci: deploy: Your tests passed on CircleCI!
ci/circleci: doc: Your tests passed on CircleCI!
ci/circleci: doc-min-dependencies: Your tests passed on CircleCI!
ci/circleci: lint: Your tests passed on CircleCI!
continuous-integration/travis-ci/pr: The Travis CI build passed
scikit-learn.scikit-learn: Build #20190227.13 succeeded
scikit-learn.scikit-learn (Linux py35_conda_openblas): succeeded
scikit-learn.scikit-learn (Linux py35_np_atlas): succeeded
scikit-learn.scikit-learn (Linux pylatest_conda): succeeded
scikit-learn.scikit-learn (Windows py35): succeeded
scikit-learn.scikit-learn (Windows py37): succeeded
scikit-learn.scikit-learn (macOS pylatest_conda): succeeded

Sprint Paris 2019 automation moved this from In progress to Done Feb 27, 2019

@massich deleted the massich:henley_is_8769_minus_merged branch on Feb 27, 2019

@glemaitre referenced this pull request on Mar 1, 2019:

Preserving dtype for float32 / float64 in transformers #11000 (Open, 6 of 28 tasks complete)

xhlulu added a commit to xhlulu/scikit-learn that referenced this pull request Apr 28, 2019

LogisticRegression convert to float64 (for SAG solver) (scikit-learn#13243)

* Remove unused code

* Squash all the PR 9040 commits

initial PR commit

seq_dataset.pyx generated from template

seq_dataset.pyx generated from template #2

rename variables

fused types consistency test for seq_dataset

a

sklearn/utils/tests/test_seq_dataset.py

new if statement

add doc

sklearn/utils/seq_dataset.pyx.tp

minor changes

minor changes

typo fix

check numeric accuracy only up 5th decimal

Address oliver's request for changing test name

add test for make_dataset and rename a variable in test_seq_dataset

* FIX tests

* TST more numerically stable test_sgd.test_tol_parameter

* Added benchmarks to compare SAGA 32b and 64b

* Fixing gael's comments

* fix

* solve some issues

* PEP8

* Address lesteve comments

* fix merging

* avoid using assert_equal

* use all_close

* use explicit ArrayDataset64 and CSRDataset64

* fix: remove unused import

* Use parametrized to cover ArrayDaset-CSRDataset-32-64 matrix

* for consistency use 32 first then 64 + add 64 suffix to variables

* it would be cool if this worked !!!

* more verbose version

* revert SGD changes as much as possible.

* Add solvers back to bench_saga

* make 64 explicit in the naming

* remove checking native python type + add comparison between 32 64

* Add whatsnew with everyone with commits

* simplify a bit the testing

* simplify the parametrize

* update whatsnew

* fix pep8

xhlulu added a commit to xhlulu/scikit-learn that referenced this pull request Apr 28, 2019

xhlulu added a commit to xhlulu/scikit-learn that referenced this pull request Apr 28, 2019
