Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: deprecate default of skipna=False in infer_dtype #24050

Merged
merged 19 commits into from Jan 4, 2019

Conversation

Projects
None yet
7 participants
@h-vetinari
Copy link
Contributor

commented Dec 2, 2018

  • follows up on the changed in #17066
  • tests adapted / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

I know that v.0.22 is generally not counted towards the "3 major releases between warning and deprecation" policy, but since this is a private method, I'm thinking it should be OK?

Would make life easier in a couple of places (especially in cleaning up some of the casting code).

@pep8speaks

This comment has been minimized.

Copy link

commented Dec 2, 2018

Hello @h-vetinari! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on January 04, 2019 at 11:14 Hours UTC
@h-vetinari
Copy link
Contributor Author

left a comment

Some comments on specific changes

Show resolved Hide resolved pandas/conftest.py Outdated
Show resolved Hide resolved pandas/core/algorithms.py Outdated
Show resolved Hide resolved pandas/core/dtypes/cast.py Outdated
Show resolved Hide resolved pandas/core/indexes/base.py Outdated
Show resolved Hide resolved pandas/io/sql.py Outdated
Show resolved Hide resolved pandas/tests/dtypes/test_inference.py Outdated
@codecov

This comment has been minimized.

Copy link

commented Dec 2, 2018

Codecov Report

Merging #24050 into master will decrease coverage by <.01%.
The diff coverage is 60%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24050      +/-   ##
==========================================
- Coverage   42.45%   42.44%   -0.01%     
==========================================
  Files         161      161              
  Lines       51561    51561              
==========================================
- Hits        21888    21886       -2     
- Misses      29673    29675       +2
Flag Coverage Δ
#single 42.44% <60%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/arrays/integer.py 37.97% <0%> (ø) ⬆️
pandas/core/dtypes/cast.py 48.25% <0%> (-0.17%) ⬇️
pandas/core/indexes/base.py 55.93% <100%> (ø) ⬆️
pandas/core/algorithms.py 50.47% <100%> (ø) ⬆️
pandas/core/dtypes/dtypes.py 75.9% <0%> (-0.26%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 35b1702...2c9e2ea. Read the comment docs.

@codecov

This comment has been minimized.

Copy link

commented Dec 2, 2018

Codecov Report

Merging #24050 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24050      +/-   ##
==========================================
+ Coverage   92.38%   92.38%   +<.01%     
==========================================
  Files         166      166              
  Lines       52490    52490              
==========================================
+ Hits        48493    48494       +1     
+ Misses       3997     3996       -1
Flag Coverage Δ
#multiple 90.81% <100%> (ø) ⬆️
#single 43.04% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/reshape/merge.py 94.26% <100%> (ø) ⬆️
pandas/core/arrays/array_.py 100% <100%> (ø) ⬆️
pandas/util/testing.py 88.09% <0%> (+0.09%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6249355...03c46cd. Read the comment docs.

@gfyoung gfyoung added the Dtypes label Dec 2, 2018

@gfyoung

This comment has been minimized.

Copy link
Member

commented Dec 2, 2018

Hmm...you do have to try a little just to get to the function via Python, so I'm leaning towards ok, but not 100% certain though.

cc @jreback

@jreback

This comment has been minimized.

Copy link
Contributor

commented Dec 3, 2018

I don't think this was ever deprecated. So not really possible to actually switch it now. If you want to add a deprecation warning if its not passed (e.g. make the default None) that would be ok.

@h-vetinari

This comment has been minimized.

Copy link
Contributor Author

commented Dec 3, 2018

@jreback
The current docstring (from #17066) says "The default of False will be deprecated in a later version of pandas."

Also, it's a method in a module that's explicitly marked as private (big red box at https://pandas.pydata.org/pandas-docs/stable/api.html), so I wonder if deprecation is necessary at all?

Show resolved Hide resolved pandas/io/stata.py Outdated
@h-vetinari
Copy link
Contributor Author

left a comment

@bashtage
Thanks for taking a look!

Show resolved Hide resolved pandas/io/stata.py Outdated
@bashtage

This comment has been minimized.

Copy link
Contributor

commented Dec 4, 2018

What happens when you have

df = pd.DataFrame([[None],[None]],dtype='object')

IIRC None is an N/A type.

@h-vetinari

This comment has been minimized.

Copy link
Contributor Author

commented Dec 4, 2018

@bashtage

What happens when you have
df = pd.DataFrame([[None],[None]],dtype='object')
IIRC None is an N/A type.

This also doesn't change between the two variants:

>>> import pandas as pd
>>> import pandas._libs.lib as lib
>>> df = pd.DataFrame([[None], [None]], dtype=object)
>>>
>>> lib.infer_dtype(df, skipna=True)
'integer'
>>> lib.infer_dtype(df.dropna(), skipna=False)
'integer'

That said, the result is a bit unexpected, and seems to be due the 2d case being handled differently. In any case, both variants (what you commented on, resp. what I'm changing it to) coincide also for 1d here:

>>> lib.infer_dtype(df.values.ravel(), skipna=True)
'empty'
>>> lib.infer_dtype(df.dropna().values.ravel(), skipna=False)
'empty'
>>>
>>> # and for comparison
>>> lib.infer_dtype(df.values.ravel(), skipna=False)
'mixed'

@jorisvandenbossche jorisvandenbossche changed the title DEPR: switch default of skipna-kwarg in infer_dtype to True DEPR: deprecate default of skipna=False in infer_dtype Dec 6, 2018

@h-vetinari

This comment has been minimized.

Copy link
Contributor Author

commented Dec 12, 2018

@TomAugspurger

I implemented the DeprecationWarning like you requested - this should be ready.

Show resolved Hide resolved pandas/_libs/lib.pyx
Show resolved Hide resolved pandas/_libs/lib.pyx Outdated
Show resolved Hide resolved pandas/core/algorithms.py Outdated
Show resolved Hide resolved pandas/core/dtypes/cast.py Outdated
Show resolved Hide resolved pandas/core/dtypes/common.py Outdated
Show resolved Hide resolved pandas/core/indexes/base.py Outdated
@h-vetinari
Copy link
Contributor Author

left a comment

Thanks for the review

Show resolved Hide resolved pandas/_libs/lib.pyx Outdated
Show resolved Hide resolved pandas/core/indexes/base.py Outdated
@h-vetinari

This comment has been minimized.

Copy link
Contributor Author

commented Dec 19, 2018

@jreback Merged and green.

Show resolved Hide resolved pandas/core/indexes/base.py Outdated
@h-vetinari
Copy link
Contributor Author

left a comment

@jreback
I think there's a misunderstanding here. Please have a look at my response and TAL.

Show resolved Hide resolved pandas/core/indexes/base.py Outdated
Show resolved Hide resolved pandas/_libs/lib.pyx Outdated
Show resolved Hide resolved pandas/_libs/lib.pyx Outdated
Show resolved Hide resolved pandas/core/indexes/base.py Outdated
@h-vetinari
Copy link
Contributor Author

left a comment

@jreback
I have very limited capability over the holidays. Will most likely not be able to make a precursor before the cutoff. I could switch everything to skipna=False, and then things would definitely be the same as before.

Show resolved Hide resolved pandas/_libs/lib.pyx Outdated
Show resolved Hide resolved pandas/core/indexes/base.py Outdated
@h-vetinari

This comment has been minimized.

@h-vetinari

This comment has been minimized.

Copy link
Contributor Author

commented Jan 3, 2019

@jreback @TomAugspurger

Merged in #24560 and restored the state that was approved by @TomAugspurger (in 1ea21fe), which is all green now. The philosophy of that was to use skipna=True in as many places as possible, not least since the original approach of this PR was to set the default to True right away.

To avoid getting bogged down in discussing the implications before the cut-off, I've added the last commit, which maintains the current behaviour (skipna=False), and only adds the deprecation. The usages within tests are an exception to this, because they are ipso facto correct with a passing CI.

If changing the defaults in the codebase is desired, I can make a follow-up with 1ea21fe.

@jreback jreback added this to the 0.24.0 milestone Jan 4, 2019

@jreback
Copy link
Contributor

left a comment

ok lgtm. very minor change. ping on green.

assert result == 'integer'

def test_warn(self):

This comment has been minimized.

Copy link
@jreback

jreback Jan 4, 2019

Contributor

rename to test_deprecation & add the issue number as a comment

@jsexauer jsexauer referenced this pull request Jan 4, 2019

Open

DEPR: deprecations from prior versions #6581

0 of 94 tasks complete
@h-vetinari

This comment has been minimized.

Copy link
Contributor Author

commented Jan 4, 2019

@jreback
Thanks. Any comment on whether you'd like me to transition the calls from skipna=False to skipna=True where possible (#24050 (comment))?

@jreback

jreback approved these changes Jan 4, 2019

@jreback jreback merged commit 2f842f2 into pandas-dev:master Jan 4, 2019

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
pandas-dev.pandas Build #20190104.16 succeeded
Details
@jreback

This comment has been minimized.

Copy link
Contributor

commented Jan 4, 2019

thanks @h-vetinari

I think we should only change to True in non-trivial cases, IOW the trivial ones can stay False; I think should be very explicit and deliberate for True. (so a lot of the string like ones this makes sense)

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Jan 4, 2019

A bit late to the party here :-), but:

  1. What is actually the rationale for this change?

I went quickly through the PR here but didn't directly find any reasoning apart from "Would make life easier in a couple of places (especially in cleaning up some of the casting code)." in the top post. Was this discussed in another issue?

Also to quote Jeff from his last comment just above: "IOW the trivial ones can stay False; I think should be very explicit and deliberate for True." (my emphasis)
If you want to be explicit in doing True, why then making this the default?

For the string case I think it certainly makes sense to have this as default, but for the integer case you now get a inconsistency between what infer_dtype tells you and what Series does: infer_dtype([1, 2, np.nan] == 'integer vs pd.Series([1, 2, np.nan]) == float64 (of course, if we ever have NA support by default for ints, this argument is no longer valid).

  1. If we keep the warning (not really a strong opinion here, mainly wondering the rationale in the above comment): what do you all think of making this a DeprecationWarning instead of FutureWarning first, given that it is more a functions used by developers / other libraries and less directly by users?
@h-vetinari

This comment has been minimized.

Copy link
Contributor Author

commented Jan 4, 2019

@jorisvandenbossche

  1. What is actually the rationale for this change?

Actually, my main reason for deprecating was that it has been "announced" in the docstring since 0.21 (see diff of this PR):

The default of ``False`` will be deprecated in a later version of pandas.

That being said, I think the inference generally makes more sense when nans are ignored (just that historically, skipna was not available until #17066).

  1. If we keep the warning (not really a strong opinion here, mainly wondering the rationale in the above comment): what do you all think of making this a DeprecationWarning instead of FutureWarning first, given that it is more a functions used by developers / other libraries and less directly by users?

I agree that it's not very user-facing. I have no strong preference for the type of warning.

@h-vetinari h-vetinari deleted the h-vetinari:infer_skipna branch Jan 4, 2019

Pingviinituutti added a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

Pingviinituutti added a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.