Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR or DOCS: pd.value_counts() is public, but not documented. Deprecate or document? #47862

Closed
3 tasks done
Dr-Irv opened this issue Jul 26, 2022 · 12 comments · Fixed by #53493
Closed
3 tasks done

DEPR or DOCS: pd.value_counts() is public, but not documented. Deprecate or document? #47862

Dr-Irv opened this issue Jul 26, 2022 · 12 comments · Fixed by #53493
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Deprecate Functionality to remove in pandas Docs Needs Discussion Requires discussion from core team before further action

Comments

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Jul 26, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
pd.value_counts([1,2,3,2,1])

Issue Description

pd.value_counts() works, but it does not appear in the documentation.

Two possibilities:

  1. It should not be exported and is not intended to be available.
  2. It should be exported, and needs documentation by modifying pandas/doc/source/reference/general_functions.rst

Uncovered when cleaning up pandas-stubs: pandas-dev/pandas-stubs#170 (comment)

Expected Behavior

Unclear!

Installed Versions

pandas 1.4.3 and associated docs

@Dr-Irv Dr-Irv added Bug Needs Triage Issue that has not been reviewed by a pandas team member Docs Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff labels Jul 26, 2022
@mroeschke
Copy link
Member

It appears in the user guide and v0.20 whatsnew

https://pandas.pydata.org/pandas-docs/dev/user_guide/basics.html#value-counts-histogramming-mode
https://pandas.pydata.org/pandas-docs/dev/whatsnew/v0.20.0.html#uint64-support-improved

But IMO +1 to deprecating this as redundant with the Series/DataFrame method which is way more publicized instead.

@mroeschke mroeschke added Deprecate Functionality to remove in pandas Needs Discussion Requires discussion from core team before further action and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 26, 2022
@simonjayhawkins
Copy link
Member

It appears in the user guide and v0.20 whatsnew

https://pandas.pydata.org/pandas-docs/dev/user_guide/basics.html#value-counts-histogramming-mode https://pandas.pydata.org/pandas-docs/dev/whatsnew/v0.20.0.html#uint64-support-improved

looks like added in #1392, so been around since 0.8.0

But IMO +1 to deprecating this as redundant with the Series/DataFrame method which is way more publicized instead.

from #46749 (comment)

As the function only accepts 1d numpy arrays and not any pandas object then I would also be inclined to remove this for the public api in a future pandas version.

Looking some more the accepted types are documented (in the docstring) incorrectly, as the code also explicitly handles EAs and the values = Series(values) means that the top-level method also works with any array-like accepted by the Series constructor. (or types accepted the internal _ensure_arraylike function - not currently typed.)

Issue Description

pd.value_counts() works, but it does not appear in the documentation.

Two possibilities:

  1. It should not be exported and is not intended to be available.
  2. It should be exported, and needs documentation by modifying pandas/doc/source/reference/general_functions.rst

Uncovered when cleaning up pandas-stubs: pandas-dev/pandas-stubs#170 (comment)

Expected Behavior

Unclear!

For now, it was definitely intended to be public and is a long-standing feature (for passing to functions that accept a callable), so should really be documented.

So to answer the question in the title, BUG or DOCS, then not a bug. So maybe best to rename issue and remove the BUG label.

could maybe instead make this issue a discussion on deprecation or just label as a DOC issue.

@Dr-Irv Dr-Irv changed the title BUG or DOCS: Is pd.value_counts() public? If yes, it needs docs. If not, it should not get exported. DEPR or DOCS: pd.value_counts() is public, but not documented. Deprecate or document? Jul 27, 2022
@jreback
Copy link
Contributor

jreback commented Jul 27, 2022

i thought we depreciated and removed this quite a while ago

definitely should be depreciated if it still exists

@simonjayhawkins
Copy link
Member

i thought we depreciated and removed this quite a while ago

I didn't find any open issues about this on a quick search.

definitely should be depreciated if it still exists

sure, but since I didn't find anything to reference, can you perhaps give a reason here.

@jreback
Copy link
Contributor

jreback commented Jul 27, 2022

#13790

we don't expose any other functions like this at top level

+1 to deprecate

@simonjayhawkins
Copy link
Member

OK will milestone 1.5 since I will assume for now that we will deprecate this, and if we don't get any objections, we don't need to update the docs. Also for typing, can be removed from top-level api.

any analogy with pandas.unique? Is this useful as a top-level function or do the same arguments apply?

@simonjayhawkins simonjayhawkins added this to the 1.5 milestone Jul 27, 2022
@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Jul 27, 2022

any analogy with pandas.unique? Is this useful as a top-level function or do the same arguments apply?

Same question about pandas.factorize()

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Jul 27, 2022

OK will milestone 1.5 since I will assume for now that we will deprecate this, and if we don't get any objections, we don't need to update the docs. Also for typing, can be removed from top-level api.

Do have to update the docs here that uses pd.value_counts() from the examples.
https://pandas.pydata.org/docs/user_guide/basics.html#value-counts-histogramming-mode

@simonjayhawkins
Copy link
Member

Do have to update the docs here that uses pd.value_counts() from the examples.
https://pandas.pydata.org/docs/user_guide/basics.html#value-counts-histogramming-mode

nice catch. my comment about not updating the docs was misleading.

and need to be sure to remove the mention of the top-level function in the text (there and maybe elsewhere)

The value_counts() Series method and top-level function computes a histogram of a 1D array of values. It can also be used as a function on regular arrays:(https://pandas.pydata.org/docs/reference/api/pandas.Series.value_counts.html#pandas.Series.value_counts) Series method and top-level function computes a histogram of a 1D array of values. It can also be used as a function on regular arrays:

@mroeschke
Copy link
Member

Might be good to discuss pd.unique and pd.factorize in separate issues to keep this discussion scoped to pd.value_counts.

@jreback
Copy link
Contributor

jreback commented Jul 27, 2022

actually looking at this again. maybe we just deprecate and move to a new namespace e.g. pd.api.functions or similar (factorize, unique & value_counts).

@mroeschke mroeschke removed this from the 1.5 milestone Aug 15, 2022
@jbrockmendel
Copy link
Member

value_counts we should definitely deprecate, unique/factorize at least have a use case for ndarrays that we don't allow in Series e.g. numpy string dtypes or dt64/td64 with unsupported resos (the latter currently cast to second but im yak-shaving a change for that)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Deprecate Functionality to remove in pandas Docs Needs Discussion Requires discussion from core team before further action
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants