fix: Descending search in search_sorted(), and nested dtypes from Python in search_sorted()#21266
itamarst wants to merge 51 commits into pola-rs:main
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #21266 +/- ##
==========================================
+ Coverage 80.73% 81.09% +0.35%
==========================================
Files 1640 1651 +11
Lines 235830 232630 -3200
Branches 2719 2749 +30
==========================================
- Hits 190403 188650 -1753
+ Misses 44784 43324 -1460
- Partials 643 656 +13
|
|
Going to start on fixing the underlying issue, will resubmit when that's done. |
|
Sorry @itamarst for not reviewing this earlier. It would take me quite a bit of time to properly review this now, and to also solve the problem in this PR with nested datatypes in the row encoding. There are some other items that need more urgent attention. |
|
This will be a lot nicer once #21946 is fixed (I am working on it) so closing for now. |
|
I have fixed the issues with nested dtypes, so it's a lot nicer now. |
Nice that you were able to resolve the row encoding issue.
I am not sure about the signature you are giving search_sorted now. The usage pl.Series([1, 2, 3]).search_sorted([2, 3]) implies that the signature is (T, List(T)) -> IdxSize, when it is actually (T, T) -> IdxSize. I just solved a bunch of other functions with the same problem. If you want to search for multiple items, you should probably do pl.Series([1, 2, 3]).search_sorted(pl.Series([2, 3])). This would also resolve your comment in the Python docstring, because the function would be consistent across nested and non-nested types.
Maybe have a quick look at #22149 to see the problem. This one is essentially the reverse of that.
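To make the signature concern concrete, here is a minimal pure-Python sketch of the two readings. It does not use Polars; search_one and search_many are hypothetical names chosen for illustration, and a plain Python list stands in for a sorted Series. Under the (T, T) reading the needle is always one value of the haystack's element type, so a needle like [2, 3] means "one list" for a nested column. Under the (T, List(T)) reading the same needle means "two scalars".

```python
from bisect import bisect_left


def search_one(haystack, element):
    """(T, T) -> index: `element` is a single value of the haystack's type."""
    return bisect_left(haystack, element)


def search_many(haystack, elements):
    """(T, List(T)) -> indices: each item of `elements` is searched on its own."""
    return [bisect_left(haystack, e) for e in elements]


flat = [1, 2, 3]                    # stands in for a Series of integers
nested = [[1, 2], [2, 3], [3, 4]]   # stands in for a Series of lists

# Flat column: [2, 3] can only sensibly mean "two needles".
flat_result = search_many(flat, [2, 3])

# Nested column: [2, 3] is a perfectly valid single needle.
nested_result = search_one(nested, [2, 3])

# Applying the "many needles" reading to the nested column would instead
# compare each inner list against the ints 2 and 3, which Python rejects
# with a TypeError. An API that guesses between the two readings has to
# behave differently for nested and non-nested types.
```

Passing multiple needles as an explicit Series, as suggested above, removes the guess: the needle container and the needle value are no longer the same kind of object.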
|
Thank you! Will look into this more tomorrow or later and respond/change. But as context (and it's been a while, so this is from memory) I was trying to be backwards compatible with this change. |
Those other functions required an implicit |
|
Could you extract the row encoding fix to a separate PR? |
|
I can, yeah. I tried to do that originally in a separate PR but failed to find a test that would reproduce the issue in isolation. |
|
Done, #22557. Once that is merged, I will close this and probably open a new PR from a new branch for the rest; the git history on this one is too messy, and the merge is not going to be fun. |
|
Opened new PR for this (#22633), now that part has been merged. |
Fixes #21100
In addition:
- search_sorted() now gives correct answers for descending sort; previously it silently gave the wrong answer. For nested dtypes this required a fix to row encoding.
- pl.Array Series, which will probably result in performance improvements in some cases.
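As a rough illustration of why descending input needs its own handling, the sketch below is a pure-Python binary search over a descending-sorted list. It is not the Polars implementation; search_sorted_descending and its side parameter are hypothetical names used only here. For comparison it also runs the standard library's ascending bisect_left on the same data, which returns an index without any error even though the input violates its sortedness assumption.

```python
from bisect import bisect_left


def search_sorted_descending(haystack, needle, side="left"):
    """Insertion index for `needle` in a list sorted in descending order.

    side="left" returns the first valid position, side="right" the last.
    """
    lo, hi = 0, len(haystack)
    while lo < hi:
        mid = (lo + hi) // 2
        # Comparisons are mirrored relative to the ascending case.
        if side == "left":
            go_right = haystack[mid] > needle
        else:
            go_right = haystack[mid] >= needle
        if go_right:
            lo = mid + 1
        else:
            hi = mid
    return lo


data = [5, 3, 3, 1]  # sorted descending

left = search_sorted_descending(data, 3, side="left")    # before the run of 3s
right = search_sorted_descending(data, 3, side="right")  # after the run of 3s
between = search_sorted_descending(data, 4)              # between 5 and 3

# An ascending search on the same descending data does not fail; it just
# returns a position that does not keep the list sorted.
ascending_guess = bisect_left(data, 4)
```

Here the descending search places 4 at index 1, while the ascending search reports index 4, the end of the list. That is the kind of silent wrong answer the first item above describes.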