Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make sort by multiple columns parallel #3549

Merged
merged 1 commit into from
Jun 1, 2022
Merged

Conversation

ritchie46
Copy link
Member

closes #3537

@github-actions github-actions bot added python Related to Python Polars rust Related to Rust Polars labels Jun 1, 2022
@ritchie46 ritchie46 merged commit 9654f7d into master Jun 1, 2022
@ritchie46 ritchie46 deleted the par_sort_multiple_cols branch June 1, 2022 07:45
}
} else {
// no floats, so we can compare unchecked
unsafe { a.partial_cmp(b).unwrap_unchecked() }
Copy link
Contributor

@gyscos gyscos Jun 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this means IsFloat should be an unsafe trait (unsafe to implement). Otherwise safe user code can implement it incorrectly on some local type (on which they also implement PartialOrd), leading to a soundness issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. We should make that trait unsafe indeed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation could also help understand what IsFloat means.

For example, I could define a local type for 2D points on the integer grid (struct Point { x: i32, y: i32 }). This struct could have a partial ordering implementation (a > b iff a.x > b.x && a.y > b.y), returning None in some cases.

Implementing IsFloat sounds easy in that case (it's not a float), but even then we shouldn't be implementing this trait at all here, since there is no easy way to properly behave in this sort function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not meant to be public. It's only for polars physical types.

It is public because its in one of our subcrates, but any user implementation can never be used here. Because we only get here via polars physical types.

But for correctness I will make it an unsafe trait.

Copy link
Contributor

@gyscos gyscos Jun 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it actually impossible to call this method on user-defined types?

PolarsNumericType can be implemented on local types. I haven't tried yet, but is it really impossible to build a ChunkedArray<LocalType> (or a ChunkedArray<Option<LocalType>>)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe you want to "seal" these traits to make sure they cannot be implemented for local types.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes that would indeed be an elegant solution for this!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PolarsNumericType can be implemented on local types. I haven't tried yet, but is it really impossible to build a ChunkedArray (or a ChunkedArray<Option>)?

Yes, that would blow up very quickly. You would not get your type into arrow arrays.

@@ -20,6 +20,8 @@ impl IsFloat for u8 {}
impl IsFloat for u16 {}
impl IsFloat for u32 {}
impl IsFloat for u64 {}
impl IsFloat for &str {}
impl<T: IsFloat> IsFloat for Option<T> {}
Copy link
Contributor

@gyscos gyscos Jun 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation is dangerous, as users could implement PartialOrd for Option<LocalType> in ways that disagree with this.

For example, a user could have a float wrapper struct LocalType(f32), have impl PartialOrd for LocalType by just forwarding the call to the wrapped float, and correctly implement IsFloat for LocalType. This code would look totally fine under review.

They could also (arguably correctly) implement PartialOrd for Option<LocalType> by returning None if either of the two compared values is None. Here again, nothing looks unusual during an audit.

Still, this would lead to a possible soundness issue if attempting to sort values of type Option<LocalType>.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will make it an unsafe trait. For all types this is intended to be used for this is correct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

.sort() (on multiple columns with many duplicates) is ~3x slower than pandas
2 participants