question: percent_rank vs cume_dist #1975
Comments
suggestion 2:
This may be related to this file: https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/algos_rank_helper.pxi.in I really don't know whether this topic has been discussed before; maybe @jreback could give us some information or feedback.
@jreback do you know someone who could help in this discussion? |
I also opened an issue on pandas github: pandas-dev/pandas#28975 |
I don’t know much about the specific feature in question, but I wanted to chime in with a thought about functionality in the various backends. It seems like ibis is meant to provide a unifying abstraction over many backends. The different backends may have varying functionality, but is it really in ibis’s scope to fill in capabilities that are missing in a particular backend? Some functionality might just not be supported for a particular backend or, where possible, it could be added to the backend directly.
Hey @scottcode, regarding your question, IMHO:
In general, I think that creating an operation directly inside one particular backend would be dangerous, because someone else could later add the same operation under a different name or with small, unnecessary differences, so it would be inconsistent.
In my own experience, I have tried to understand how pandas implements an operation, and if pandas has it, I have tried to port it to Ibis as closely as possible. As for the current issue, it seems that percent_rank and cume_dist are both used by backends such as omniscidb, postgresql, mysql, mssql ...
maybe we can use a similar approach used for |
@xmnlab Can you clarify what might be actionable here? Should we rename |
Hi @cpcloud, basically the percent_rank tests use pandas df.rank(pct=True), but this works like SQL CumeDist. So the easiest way would be to rename the operation to CumeDist, and for PercentRank the test should be implemented manually, as described in that old PR. Some backends should also change their translation from percent_rank to cume_dist. Let me know if you want more information about that.
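To make the difference concrete, here is a minimal pandas sketch. The sample values are made up, and using method="max" for the cume_dist case and method="min" for the percent_rank case is an assumption chosen to follow the usual SQL definitions when there are ties; it is not taken from the Ibis test suite.

```python
import pandas as pd

# Made-up sample data with one tie (20 appears twice).
s = pd.Series([10, 20, 20, 30, 40])
n = len(s)

# SQL CUME_DIST: (number of rows with value <= current row) / (total rows).
# rank(method="max") gives each tied row the highest rank in its group,
# which equals the count of rows <= the current value.
cume_dist = s.rank(method="max", pct=True)

# SQL PERCENT_RANK: (rank - 1) / (total rows - 1), where rank is SQL RANK(),
# i.e. tied rows share the lowest rank in their group (method="min").
# This sketch assumes n > 1.
percent_rank = (s.rank(method="min") - 1) / (n - 1)

# What the tests rely on: pandas' default method is "average", so with ties
# this matches neither SQL function exactly; without ties it equals cume_dist.
default_pct = s.rank(pct=True)

print(pd.DataFrame({
    "value": s,
    "cume_dist": cume_dist,
    "percent_rank": percent_rank,
    "rank(pct=True)": default_pct,
}))
```

Note that percent_rank always starts at 0 for the smallest value, while cume_dist never returns 0, so the two cannot be swapped even on data without ties.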
We're now correctly implementing |
It seems the Ibis percent_rank operation is in fact the cume_dist SQL operation, as explained at [1].
So how could we implement the percent_rank SQL operation?
Suggestion 1: percent_rank could have one optional argument like cume_dist (default: True)
scipy as a dependency.
Any thoughts about this issue?
refs:
[1] ibis/ibis/tests/all/test_window.py, line 43 in ae71b3a
[2] https://stackoverflow.com/questions/39823470/getting-postgresql-percent-rank-and-scipy-stats-percentileofscore-results-to-mat
extra ref: https://riptutorial.com/sql/example/27456/percent-rank-and-cume-dist
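As a rough illustration of Suggestion 1, the optional argument could behave like the plain-Python sketch below. The function name, signature, and return type are hypothetical and only mirror the wording of the suggestion; this is not the Ibis API, and it ignores partitioning, ordering direction, and nulls.

```python
from typing import List, Sequence


def percent_rank(values: Sequence[float], cume_dist: bool = True) -> List[float]:
    """Hypothetical helper: one entry per input value, in input order.

    cume_dist=True  -> SQL CUME_DIST semantics (the behaviour the issue says
                       the current operation has).
    cume_dist=False -> SQL PERCENT_RANK semantics.
    """
    n = len(values)
    result = []
    for v in values:
        if cume_dist:
            # Fraction of rows whose value is <= the current value.
            result.append(sum(1 for x in values if x <= v) / n)
        else:
            # SQL RANK(): 1 + number of rows strictly smaller.
            rank = 1 + sum(1 for x in values if x < v)
            # A single-row window yields 0 rather than dividing by zero.
            result.append((rank - 1) / (n - 1) if n > 1 else 0.0)
    return result


data = [10, 20, 20, 30, 40]
print(percent_rank(data))                   # cume_dist semantics
print(percent_rank(data, cume_dist=False))  # percent_rank semantics
```

A flag like this keeps existing behaviour as the default, at the cost of one operation name covering two different SQL functions; renaming the operation, as discussed in the comments, avoids that ambiguity.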