MRG respect dtypes in pandas dataframes if homogeneous #15094

Merged

amueller
Member

@amueller amueller commented Sep 25, 2019

Fixes #15093.
Does that deserve/need a whatsnew?

Member

@jnothman jnothman left a comment

Yes, might as well have a change log entry.

@jnothman jnothman closed this Sep 25, 2019
@jnothman jnothman reopened this Sep 25, 2019
@jnothman
Member

@jnothman jnothman commented Sep 25, 2019

Misfire

@amueller
Member Author

@amueller amueller commented Sep 26, 2019

EDIT: never mind.

@amueller
Member Author

@amueller amueller commented Sep 26, 2019

Correction: passing float16 to check_array(dtype=FLOAT_DTYPES) works as expected (the result is float16), but passing int16 results in float64, which is somewhat unexpected.
It looks like np.common_type of any integer type is float64, so maybe this is actually "the correct" behavior?
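A quick NumPy-only check of the observation above. This calls np.common_type directly rather than going through check_array: integer arrays map to float64 regardless of their width, while a float16 array keeps half precision.

```python
import numpy as np

# np.common_type returns a floating-point scalar type for its inputs.
# Integer arrays always map to float64, whatever their width...
int16_type = np.common_type(np.array([1, 2, 3], dtype=np.int16))
# ...while float arrays keep their own precision.
float16_type = np.common_type(np.array([1, 2, 3], dtype=np.float16))

print(int16_type)    # <class 'numpy.float64'>
print(float16_type)  # <class 'numpy.float16'>
```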

@amueller
Member Author

@amueller amueller commented Sep 26, 2019

OK, I think I'm confused about whether np.result_type or np.common_type is the right thing to use. I'm now tending towards np.result_type.
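For comparison, np.result_type applies NumPy's type-promotion rules, so the width of the integer type matters. A small sketch with plain NumPy dtypes:

```python
import numpy as np

# np.result_type follows NumPy's promotion rules, so the width of the
# integer type matters (unlike np.common_type, which sends every integer
# array to float64).
a = np.result_type(np.int16, np.float32)  # int16 values fit exactly in float32
b = np.result_type(np.int32, np.float32)  # int32 values do not, so float64
c = np.result_type(np.int16)              # a single dtype is returned unchanged

print(a, b, c)  # float32 float64 int16
```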

@amueller
Member Author

@amueller amueller commented Sep 26, 2019

OK, this now resolves everything pandas-related. It leaves the NumPy casting as it was, so we still cast int32 to float64, not float32.
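The check_array documentation states that when dtype is a list or tuple, conversion to the first entry only happens if the input dtype is not already in the list. A minimal NumPy-only sketch of that rule follows; pick_dtype is a hypothetical helper, and FLOAT_DTYPES is redefined locally on the assumption that it mirrors scikit-learn's (float64, float32, float16) constant. It illustrates why int32 input ends up as float64 rather than float32.

```python
import numpy as np

# Assumption: mirrors sklearn.utils.validation.FLOAT_DTYPES.
FLOAT_DTYPES = (np.float64, np.float32, np.float16)


def pick_dtype(dtype_orig, dtype=FLOAT_DTYPES):
    """Hypothetical helper: keep dtype_orig if it is in the allowed
    tuple, otherwise fall back to the first entry of the tuple."""
    dtype_orig = np.dtype(dtype_orig)
    if any(dtype_orig == np.dtype(d) for d in dtype):
        return dtype_orig
    return np.dtype(dtype[0])


print(pick_dtype(np.float16))  # float16: already allowed, kept as-is
print(pick_dtype(np.int32))    # float64: not allowed, first entry used
print(pick_dtype(np.int16))    # float64: integer width is not considered
```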

@jnothman
Member

@jnothman jnothman commented Sep 26, 2019

CI failures, but I agree with your changes

@amueller
Member Author

@amueller amueller commented Sep 27, 2019

If dtypes contains pandas extension dtypes, np.result_type doesn't work, so I think ideally we'd map each pandas dtype to its corresponding NumPy dtype here.
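One way to sidestep the problem is to combine the per-column dtypes with np.result_type only when every entry is a plain np.dtype, and otherwise fall back to the old behavior. This is a minimal sketch, not necessarily what the PR merged; combined_numpy_dtype and FakeExtensionDtype are hypothetical names, the latter standing in for a pandas extension dtype such as CategoricalDtype so the example runs without pandas.

```python
import numpy as np


def combined_numpy_dtype(dtypes):
    """Hypothetical helper: combine per-column dtypes (as a DataFrame's
    .dtypes would give) with np.result_type, or return None when any
    entry is not a plain NumPy dtype (e.g. a pandas extension dtype)."""
    dtypes = list(dtypes)
    if not dtypes or not all(isinstance(d, np.dtype) for d in dtypes):
        return None
    return np.result_type(*dtypes)


class FakeExtensionDtype:
    """Hypothetical stand-in for a pandas extension dtype."""


print(combined_numpy_dtype([np.dtype("float32"), np.dtype("float32")]))  # float32
print(combined_numpy_dtype([np.dtype("int16"), np.dtype("float32")]))    # float32
print(combined_numpy_dtype([np.dtype("float32"), FakeExtensionDtype()]))  # None
```

Returning None lets the caller keep its previous conversion path for frames whose column types can't be sniffed reliably.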

@amueller
Member Author

@amueller amueller commented Sep 27, 2019

It's actually basically impossible to sniff out the types correctly right now (see pandas-dev/pandas#22791), but I think the solution I just pushed should be OK for now (better than master, lol).

@amueller
Member Author

@amueller amueller commented Sep 27, 2019

green again yay

# check that we handle pandas dtypes in a semi-reasonable way
# this is actually tricky because we can't really know that this
# should be integer ahead of converting it.
assert (check_array(pd.DataFrame([pd.Categorical([1, 2, 3])])).dtype
Member

@thomasjpfan thomasjpfan Sep 28, 2019

For completeness, should we test for dtype=FLOAT_DTYPES as well?

Member Author

@amueller amueller Oct 4, 2019

And check what? That it's float64?

Member

@thomasjpfan thomasjpfan Oct 4, 2019

Right above this check, we test that an int16 DataFrame is cast to float64. It seems reasonable to test that this categorical goes to float64 as well.

Member Author

@amueller amueller Oct 4, 2019

done.

@amueller
Member Author

@amueller amueller commented Oct 4, 2019

@jnothman still good?

@jnothman jnothman merged commit b906078 into scikit-learn:master Oct 8, 2019
19 checks passed
@amueller
Member Author

@amueller amueller commented Oct 8, 2019

Thanks!

3 participants