Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: Deprecate null_counts parameter of DataFrame.info #37999

Merged
merged 6 commits into from Nov 25, 2020

Conversation

hongshaoyang
Copy link
Contributor

@hongshaoyang hongshaoyang commented Nov 22, 2020

Background.

Might be worth it actually - I think people are only likely to use .info in an interactive session, rather than as part of production code, so perhaps it is worth going through the deprecation process to make this argument clearer

  • xref DOC: unclear kwarg null_counts #36805 (comment)
  • Deprecate null_counts parameter of DataFrame.info in favor of show_counts and add warning.
  • Add whatsnew and add tests to test for FutureWarning. Move usages of null_counts in tests to show_counts.

Doc preview:

Screenshot 2020-11-22 at 6 19 05 PM

Copy link
Member

@arw2019 arw2019 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hongshaoyang for the PR!

We'll want to catch this warning in any tests that hit this.

We'll also need a whatsnew

@hongshaoyang
Copy link
Contributor Author

Thanks @arw2019 , I added tests and a whatsnew entry.

@jreback jreback added the Deprecate Functionality to remove in pandas label Nov 24, 2020
@jreback jreback added this to the 1.2 milestone Nov 24, 2020
@jreback
Copy link
Contributor

jreback commented Nov 24, 2020

looks fine, @jorisvandenbossche ok here?

@jreback
Copy link
Contributor

jreback commented Nov 24, 2020

just merged some changes that will affect this PR, can you merge master and update: #37868

@hongshaoyang hongshaoyang mentioned this pull request Nov 24, 2020
5 tasks
@@ -194,6 +194,11 @@ def check(null_counts, result):
check(True, False)
check(False, False)

with tm.assert_produces_warning(FutureWarning): # GH37999
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tm.assert_produces_warning now has an optional match kwarg to match the warning message.
Could use it here to ensure that the proper warning message is issues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

Comment on lines 2648 to 2653
if null_counts is not None and show_counts is None:
warnings.warn(
"null_counts is deprecated. Use show_counts instead",
FutureWarning,
stacklevel=2,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The user will not get this warning when setting both arguments at the same time.
For example, df.info(null_counts = True, show_counts=False).
This type of invocation is incorrect and should raise an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

Copy link
Member

@ivanovmg ivanovmg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor comment. Otherwise good.

null_counts: Optional[bool] = None,
) -> None:
if null_counts is not None:
if show_counts is not None:
raise ValueError("null_counts used with show_counts")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest adding Use show_counts.
Otherwise looks good to me.

@@ -480,6 +480,7 @@ Deprecations
- The ``how`` keyword in :meth:`PeriodIndex.astype` is deprecated and will be removed in a future version, use ``index.to_timestamp(how=how)`` instead (:issue:`37982`)
- Deprecated :meth:`Index.asi8` for :class:`Index` subclasses other than :class:`DatetimeIndex`, :class:`TimedeltaIndex`, and :class:`PeriodIndex` (:issue:`37877`)
- The ``inplace`` parameter of :meth:`Categorical.remove_unused_categories` is deprecated and will be removed in a future version (:issue:`37643`)
- The ``null_counts`` parameter of :meth:`DataFrame.info` is deprecated and will be removed in a future version (:issue:`37999`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you indicate here that the keyword is renamed to show_counts ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an unrelated test on Travis that fails intermittently

FAILED pandas/tests/plotting/frame/test_frame.py::TestDataFramePlots::test_area_lim

@jreback jreback merged commit 08913f8 into pandas-dev:master Nov 25, 2020
@jreback
Copy link
Contributor

jreback commented Nov 25, 2020

thanks @hongshaoyang very nice

@hongshaoyang hongshaoyang deleted the 36805-deprecate-nullcounts branch November 26, 2020 00:05
@jbrockmendel
Copy link
Member

@hongshaoyang want to make a PR enforcing this deprecation?

HyukjinKwon pushed a commit to apache/spark that referenced this pull request Apr 24, 2023
### What changes were proposed in this pull request?
Remove `null_counts` from info()

### Why are the changes needed?
Pandas 2.0
_Removed deprecated null_counts argument in [DataFrame.info()](https://pandas.pydata.org/pandas-docs/version/2.0/reference/api/pandas.DataFrame.info.html#pandas.DataFrame.info). Use show_counts instead ([GH37999](pandas-dev/pandas#37999

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Tested local

### Before this PR

`F05.info()`

```
TypeError                                 Traceback (most recent call last)
Cell In[12], line 1
----> 1 F05.info()

File /opt/spark/python/pyspark/pandas/frame.py:12167, in DataFrame.info(self, verbose, buf, max_cols, null_counts)
  12163     count_func = self.count
  12164     self.count = (  # type: ignore[assignment]
  12165         lambda: count_func()._to_pandas()  # type: ignore[assignment, misc, union-attr]
  12166     )
> 12167     return pd.DataFrame.info(
  12168         self,  # type: ignore[arg-type]
  12169         verbose=verbose,
  12170         buf=buf,
  12171         max_cols=max_cols,
  12172         memory_usage=False,
  12173         null_counts=null_counts,
  12174     )
  12175 finally:
  12176     del self._data

TypeError: DataFrame.info() got an unexpected keyword argument 'null_counts'

```

### With this PR

`F05.info()`

```
<class 'pyspark.pandas.frame.DataFrame'>
Int64Index: 5257 entries, 0 to 5256
Data columns (total 203 columns):
 #    Column                                                               Non-Null Count  Dtype
---   ------                                                               --------------  -----
 0    DOFFIN_APPENDIX:EXPRESSION_OF_INTEREST_URL                           471 non-null    object
(...)

```

Closes #40913 from bjornjorgensen/remove-null_counts.

Authored-by: bjornjorgensen <bjornjorgensen@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DOC: unclear kwarg null_counts
6 participants