MRG respect dtypes in pandas dataframes if homogeneous #15094

amueller · 2019-09-25T22:03:47Z

Fixes #15093.
Does that deserve/need a whatsnew?

jnothman

Yes, might as well have a change log entry

jnothman · 2019-09-25T22:20:34Z

Misfire

amueller · 2019-09-26T15:26:31Z

EDIT: never mind.

amueller · 2019-09-26T15:29:53Z

Correction: passing float16 to check_array(dtype=FLOAT_DTYPES) works as expected (result is float16), but passing int16 results in float64, which is somewhat unexpected.
Looks like np.common_type(any int) is float64, so maybe this is actually "the correct" behavior?

amueller · 2019-09-26T15:39:02Z

Ok I think I'm confused about whether np.result_type or np.common_type is the right thing to use. I'm now tending towards np.result_type.
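
For reference, a minimal sketch of how the two NumPy helpers differ on the cases discussed above. It only uses `np.common_type` and `np.result_type` on small arrays and dtypes; the variable names are illustrative and nothing here is taken from the PR's actual code.

```python
import numpy as np

# np.common_type takes arrays and always returns an inexact (floating)
# scalar type, even when every input array holds integers.
int16_arr = np.zeros(3, dtype=np.int16)
float16_arr = np.zeros(3, dtype=np.float16)

common_int = np.common_type(int16_arr)      # integers are promoted to float64
common_half = np.common_type(float16_arr)   # float16 stays float16

# np.result_type applies NumPy's promotion rules to dtypes (or arrays)
# and keeps integers as integers when all inputs are integers.
result_same = np.result_type(np.int16, np.int16)
result_ints = np.result_type(np.int16, np.int32)
result_floats = np.result_type(np.float16, np.float32)

print(common_int, common_half)
print(result_same, result_ints, result_floats)
```

So `np.common_type` explains the int16 to float64 jump, while `np.result_type` gives the smallest dtype that holds all inputs without forcing a float.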

amueller · 2019-09-26T15:51:03Z

ok now this resolves anything pandas-related. It leaves the numpy-casting as it was, so we're still casting int32 to float64, not float32.

jnothman · 2019-09-26T22:47:42Z

CI failures, but I agree with your changes

amueller · 2019-09-27T16:24:22Z

If dtypes contains pandas dtypes then result_type doesn't work, so I think ideally we'd get the corresponding numpy dtype for the pandas dtype here.

amueller · 2019-09-27T16:53:06Z

It's actually basically impossible to correctly sniff out the types right now:
pandas-dev/pandas#22791 but I think the solution I just pushed should be ok for now (better than master lol).
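
A rough sketch of the "don't sniff if pandas dtypes are around" idea, assuming the column dtypes are available as a list (with pandas this could be `list(df.dtypes)`). The helper name `common_numpy_dtype` and the `FakeExtensionDtype` stand-in are hypothetical and only for illustration; this is not the code pushed in this PR.

```python
import numpy as np


class FakeExtensionDtype:
    """Hypothetical stand-in for a pandas extension dtype (e.g. categorical)."""


def common_numpy_dtype(dtypes):
    """Return the promoted NumPy dtype, or None if sniffing is not safe.

    Sniffing is skipped when the list is empty or when any entry is not a
    plain np.dtype, since np.result_type only understands NumPy dtypes.
    """
    dtypes = list(dtypes)
    if not dtypes or not all(isinstance(dt, np.dtype) for dt in dtypes):
        return None
    return np.result_type(*dtypes)


# Homogeneous or mixed NumPy dtypes: promote to the smallest common dtype.
ints = common_numpy_dtype([np.dtype("int16"), np.dtype("int32")])
floats = common_numpy_dtype([np.dtype("float32"), np.dtype("float32")])

# Any non-NumPy dtype in the mix: give up and let the later conversion decide.
mixed = common_numpy_dtype([np.dtype("int16"), FakeExtensionDtype()])

print(ints, floats, mixed)
```

Returning `None` here simply means "fall back to whatever the array conversion produces", which matches the spirit of leaving the existing NumPy casting untouched.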

amueller · 2019-09-27T19:29:46Z

green again yay

thomasjpfan · 2019-09-28T16:40:00Z

sklearn/utils/tests/test_validation.py

+    # check that we handle pandas dtypes in a semi-reasonable way
+    # this is actually tricky because we can't really know that this
+    # should be integer ahead of converting it.
+    assert (check_array(pd.DataFrame([pd.Categorical([1, 2, 3])])).dtype


For completeness, should we test for dtype=FLOAT_DTYPES as well?

And check what? That it's float64?

Right above this check, we check that an int16 dataframe is cast to float64. It seems reasonable to test that this categorical goes to float64 as well.

amueller · 2019-10-04T15:43:06Z

@jnothman still good?


amueller · 2019-10-08T11:08:40Z

Thanks!

respect dtypes in pandas dataframes if homogeneous

a87ee3e

amueller mentioned this pull request Sep 25, 2019

MaxAbsScaler Upcasts Pandas to float64 #15093

Closed

upcast to smallest common type if possible

dc2ab9d

jnothman approved these changes Sep 25, 2019


jnothman closed this Sep 25, 2019

jnothman reopened this Sep 25, 2019

add whatsnew

0ae3203

use result_type instead of find_common_type

cf7d0be

amueller added 2 commits September 27, 2019 12:48

don't try to sniff dtypes if there's pandas dtypes around

0dc1ce8

add test for integers within pd.Categorical

a36dc13

try to appease old pandas

0db2689

thomasjpfan reviewed Sep 28, 2019


amueller added 3 commits October 4, 2019 11:43

Merge branch 'master' into respect_pandas_homogeneous_dtype

d17bf69

check for casting categorical dtypes to float

8a3c932

Merge branch 'respect_pandas_homogeneous_dtype' of github.com:amuelle…

8295fb5

…r/scikit-learn into respect_pandas_homogeneous_dtype

thomasjpfan approved these changes Oct 4, 2019


jnothman merged commit b906078 into scikit-learn:master Oct 8, 2019
