Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: deprecate default of skipna=False in infer_dtype #24050

Merged
merged 19 commits into from
Jan 4, 2019

Conversation

h-vetinari
Copy link
Contributor

@h-vetinari h-vetinari commented Dec 2, 2018

I know that v.0.22 is generally not counted towards the "3 major releases between warning and deprecation" policy, but since this is a private method, I'm thinking it should be OK?

Would make life easier in a couple of places (especially in cleaning up some of the casting code).

@pep8speaks
Copy link

pep8speaks commented Dec 2, 2018

Hello @h-vetinari! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on January 04, 2019 at 11:14 Hours UTC

Copy link
Contributor Author

@h-vetinari h-vetinari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments on specific changes

pandas/conftest.py Outdated Show resolved Hide resolved
pandas/core/algorithms.py Outdated Show resolved Hide resolved
pandas/core/dtypes/cast.py Outdated Show resolved Hide resolved
pandas/core/indexes/base.py Outdated Show resolved Hide resolved
pandas/io/sql.py Outdated Show resolved Hide resolved
pandas/tests/dtypes/test_inference.py Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Dec 2, 2018

Codecov Report

Merging #24050 into master will decrease coverage by <.01%.
The diff coverage is 60%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24050      +/-   ##
==========================================
- Coverage   42.45%   42.44%   -0.01%     
==========================================
  Files         161      161              
  Lines       51561    51561              
==========================================
- Hits        21888    21886       -2     
- Misses      29673    29675       +2
Flag Coverage Δ
#single 42.44% <60%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/arrays/integer.py 37.97% <0%> (ø) ⬆️
pandas/core/dtypes/cast.py 48.25% <0%> (-0.17%) ⬇️
pandas/core/indexes/base.py 55.93% <100%> (ø) ⬆️
pandas/core/algorithms.py 50.47% <100%> (ø) ⬆️
pandas/core/dtypes/dtypes.py 75.9% <0%> (-0.26%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 35b1702...2c9e2ea. Read the comment docs.

@codecov
Copy link

codecov bot commented Dec 2, 2018

Codecov Report

Merging #24050 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24050      +/-   ##
==========================================
+ Coverage   92.38%   92.38%   +<.01%     
==========================================
  Files         166      166              
  Lines       52490    52490              
==========================================
+ Hits        48493    48494       +1     
+ Misses       3997     3996       -1
Flag Coverage Δ
#multiple 90.81% <100%> (ø) ⬆️
#single 43.04% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/reshape/merge.py 94.26% <100%> (ø) ⬆️
pandas/core/arrays/array_.py 100% <100%> (ø) ⬆️
pandas/util/testing.py 88.09% <0%> (+0.09%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6249355...03c46cd. Read the comment docs.

@gfyoung gfyoung added the Dtype Conversions Unexpected or buggy dtype conversions label Dec 2, 2018
@gfyoung
Copy link
Member

gfyoung commented Dec 2, 2018

Hmm...you do have to try a little just to get to the function via Python, so I'm leaning towards ok, but not 100% certain though.

cc @jreback

@jreback
Copy link
Contributor

jreback commented Dec 3, 2018

I don't think this was ever deprecated. So not really possible to actually switch it now. If you want to add a deprecation warning if its not passed (e.g. make the default None) that would be ok.

@h-vetinari
Copy link
Contributor Author

@jreback
The current docstring (from #17066) says "The default of False will be deprecated in a later version of pandas."

Also, it's a method in a module that's explicitly marked as private (big red box at https://pandas.pydata.org/pandas-docs/stable/api.html), so I wonder if deprecation is necessary at all?

pandas/io/stata.py Outdated Show resolved Hide resolved
Copy link
Contributor Author

@h-vetinari h-vetinari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bashtage
Thanks for taking a look!

pandas/io/stata.py Outdated Show resolved Hide resolved
@bashtage
Copy link
Contributor

bashtage commented Dec 4, 2018

What happens when you have

df = pd.DataFrame([[None],[None]],dtype='object')

IIRC None is an N/A type.

@h-vetinari
Copy link
Contributor Author

@bashtage

What happens when you have
df = pd.DataFrame([[None],[None]],dtype='object')
IIRC None is an N/A type.

This also doesn't change between the two variants:

>>> import pandas as pd
>>> import pandas._libs.lib as lib
>>> df = pd.DataFrame([[None], [None]], dtype=object)
>>>
>>> lib.infer_dtype(df, skipna=True)
'integer'
>>> lib.infer_dtype(df.dropna(), skipna=False)
'integer'

That said, the result is a bit unexpected, and seems to be due the 2d case being handled differently. In any case, both variants (what you commented on, resp. what I'm changing it to) coincide also for 1d here:

>>> lib.infer_dtype(df.values.ravel(), skipna=True)
'empty'
>>> lib.infer_dtype(df.dropna().values.ravel(), skipna=False)
'empty'
>>>
>>> # and for comparison
>>> lib.infer_dtype(df.values.ravel(), skipna=False)
'mixed'

@jorisvandenbossche jorisvandenbossche changed the title DEPR: switch default of skipna-kwarg in infer_dtype to True DEPR: deprecate default of skipna=False in infer_dtype Dec 6, 2018
@h-vetinari
Copy link
Contributor Author

@TomAugspurger

I implemented the DeprecationWarning like you requested - this should be ready.

pandas/_libs/lib.pyx Show resolved Hide resolved
pandas/_libs/lib.pyx Outdated Show resolved Hide resolved
pandas/core/algorithms.py Outdated Show resolved Hide resolved
pandas/core/dtypes/cast.py Outdated Show resolved Hide resolved
pandas/core/dtypes/common.py Outdated Show resolved Hide resolved
pandas/core/indexes/base.py Outdated Show resolved Hide resolved
Copy link
Contributor Author

@h-vetinari h-vetinari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review

pandas/_libs/lib.pyx Outdated Show resolved Hide resolved
pandas/core/indexes/base.py Outdated Show resolved Hide resolved
@h-vetinari
Copy link
Contributor Author

@jreback Merged and green.

pandas/core/indexes/base.py Outdated Show resolved Hide resolved
Copy link
Contributor Author

@h-vetinari h-vetinari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback
I think there's a misunderstanding here. Please have a look at my response and TAL.

pandas/core/indexes/base.py Outdated Show resolved Hide resolved
pandas/_libs/lib.pyx Outdated Show resolved Hide resolved
pandas/_libs/lib.pyx Outdated Show resolved Hide resolved
pandas/core/indexes/base.py Outdated Show resolved Hide resolved
Copy link
Contributor Author

@h-vetinari h-vetinari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback
I have very limited capability over the holidays. Will most likely not be able to make a precursor before the cutoff. I could switch everything to skipna=False, and then things would definitely be the same as before.

pandas/_libs/lib.pyx Outdated Show resolved Hide resolved
pandas/core/indexes/base.py Outdated Show resolved Hide resolved
@h-vetinari
Copy link
Contributor Author

@h-vetinari
Copy link
Contributor Author

h-vetinari commented Jan 3, 2019

@jreback @TomAugspurger

Merged in #24560 and restored the state that was approved by @TomAugspurger (in 1ea21fe), which is all green now. The philosophy of that was to use skipna=True in as many places as possible, not least since the original approach of this PR was to set the default to True right away.

To avoid getting bogged down in discussing the implications before the cut-off, I've added the last commit, which maintains the current behaviour (skipna=False), and only adds the deprecation. The usages within tests are an exception to this, because they are ipso facto correct with a passing CI.

If changing the defaults in the codebase is desired, I can make a follow-up with 1ea21fe.

@jreback jreback added this to the 0.24.0 milestone Jan 4, 2019
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok lgtm. very minor change. ping on green.

assert result == 'integer'

def test_warn(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to test_deprecation & add the issue number as a comment

@h-vetinari
Copy link
Contributor Author

@jreback
Thanks. Any comment on whether you'd like me to transition the calls from skipna=False to skipna=True where possible (#24050 (comment))?

@jreback jreback merged commit 2f842f2 into pandas-dev:master Jan 4, 2019
@jreback
Copy link
Contributor

jreback commented Jan 4, 2019

thanks @h-vetinari

I think we should only change to True in non-trivial cases, IOW the trivial ones can stay False; I think should be very explicit and deliberate for True. (so a lot of the string like ones this makes sense)

@jorisvandenbossche
Copy link
Member

A bit late to the party here :-), but:

  1. What is actually the rationale for this change?

I went quickly through the PR here but didn't directly find any reasoning apart from "Would make life easier in a couple of places (especially in cleaning up some of the casting code)." in the top post. Was this discussed in another issue?

Also to quote Jeff from his last comment just above: "IOW the trivial ones can stay False; I think should be very explicit and deliberate for True." (my emphasis)
If you want to be explicit in doing True, why then making this the default?

For the string case I think it certainly makes sense to have this as default, but for the integer case you now get a inconsistency between what infer_dtype tells you and what Series does: infer_dtype([1, 2, np.nan] == 'integer vs pd.Series([1, 2, np.nan]) == float64 (of course, if we ever have NA support by default for ints, this argument is no longer valid).

  1. If we keep the warning (not really a strong opinion here, mainly wondering the rationale in the above comment): what do you all think of making this a DeprecationWarning instead of FutureWarning first, given that it is more a functions used by developers / other libraries and less directly by users?

@h-vetinari
Copy link
Contributor Author

@jorisvandenbossche

  1. What is actually the rationale for this change?

Actually, my main reason for deprecating was that it has been "announced" in the docstring since 0.21 (see diff of this PR):

The default of ``False`` will be deprecated in a later version of pandas.

That being said, I think the inference generally makes more sense when nans are ignored (just that historically, skipna was not available until #17066).

  1. If we keep the warning (not really a strong opinion here, mainly wondering the rationale in the above comment): what do you all think of making this a DeprecationWarning instead of FutureWarning first, given that it is more a functions used by developers / other libraries and less directly by users?

I agree that it's not very user-facing. I have no strong preference for the type of warning.

@h-vetinari h-vetinari deleted the infer_skipna branch January 4, 2019 16:58
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants