feat: make .topk() and .value_counts() output same column names #8540

Open
1 task done
NickCrews opened this issue Mar 4, 2024 · 0 comments
Labels
feature Features or general enhancements

Comments

@NickCrews
Contributor

NickCrews commented Mar 4, 2024

Is your feature request related to a problem?

Ideally, these two methods would return the same data, except that .topk() is .order_by()ed and .limit()ed.

But currently the output column names are different (a_count vs Count(a))

import ibis

ibis.options.interactive = True

t = ibis.memtable({"a": [1, 1, 2, None]})
t.a.value_counts()
# ┏━━━━━━━━━┳━━━━━━━━━┓
# ┃ a       ┃ a_count ┃
# ┡━━━━━━━━━╇━━━━━━━━━┩
# │ float64 │ int64   │
# ├─────────┼─────────┤
# │     2.0 │       1 │
# │    NULL │       1 │
# │     1.0 │       2 │
# └─────────┴─────────┘
t.a.topk(10)
# ┏━━━━━━━━━┳━━━━━━━━━━┓
# ┃ a       ┃ Count(a) ┃
# ┡━━━━━━━━━╇━━━━━━━━━━┩
# │ float64 │ int64    │
# ├─────────┼──────────┤
# │     1.0 │        2 │
# │     2.0 │        1 │
# │    NULL │        0 │
# └─────────┴──────────┘

Describe the solution you'd like

I think adding a name param, with the same semantics as lname and rname in join and a default of {name}_count, would be nice. Or, just making this always be the behavior, with no option, would also be OK. I vote for {name}_count over Count({name}), because the first one is easier to construct programmatically, e.g.

from ibis import _

cols = ["a", "b", "c"]
counts = [t[col].value_counts().filter(_[col + "_count"] > 0) for col in cols]

Also this is more consistent with our default behavior of adding suffixes in .join()
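
To make the template idea concrete, here is a rough pure-Python sketch of how a {name}_count style template could resolve to an output column name. count_column_name is a hypothetical helper made up for illustration, not an existing ibis function, and the real implementation may well look different.

```python
def count_column_name(column: str, name: str = "{name}_count") -> str:
    """Hypothetical helper (not part of ibis): build the count column's name.

    `name` is a format template in which "{name}" is replaced by the
    name of the column being counted.
    """
    return name.format(name=column)


cols = ["a", "b", "c"]

# Proposed default: suffix style, trivial to rebuild as col + "_count".
default_names = [count_column_name(c) for c in cols]

# The current .topk() naming could still be requested via the template.
legacy_names = [count_column_name(c, name="Count({name})") for c in cols]
```

With a single template parameter, both naming styles stay reachable, and the default stays easy to reconstruct from the input column name.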

Anything else we could do to make these two methods have the same API would be great, but I don't think it makes sense to unify them any further than that.

What version of ibis are you running?

main

What backend(s) are you using, if any?

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@NickCrews NickCrews added the feature Features or general enhancements label Mar 4, 2024
@NickCrews NickCrews changed the title from "feat: make .topk() and .value_counts() have same column names" to "feat: make .topk() and .value_counts() output same column names" Mar 4, 2024
Projects
Status: backlog
Development

No branches or pull requests

1 participant