Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

% operator is inconsistent with Python / Numpy behaviour #3824

Closed
thomasaarholt opened this issue Jun 27, 2022 · 5 comments
Closed

% operator is inconsistent with Python / Numpy behaviour #3824

thomasaarholt opened this issue Jun 27, 2022 · 5 comments
Labels
bug Something isn't working

Comments

@thomasaarholt
Copy link
Contributor

thomasaarholt commented Jun 27, 2022

What language are you using?

Python

Have you tried latest version of polars?

yes

What version of polars are you using?

0.13.50

What operating system are you using polars on?

Mac OS 12.3.1

What language version are you using

Python 3.10

Describe your bug.

In Rust, (and C, C++, D, C#, F# and Java), the % operator is the remainder. In Python (and Perl and Ruby), % is the modulus.

# data
arr = np.arange(-5, 5)
>>> arr
array([-5, -4, -3, -2, -1,  0,  1,  2,  3,  4])

# polars
>>> pl.DataFrame({"a": arr}).with_column(pl.col("a") % 5).to_series().to_numpy()
array([ 0, -4, -3, -2, -1,  0,  1,  2,  3,  4])

# numpy
>>> arr % 5
array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4])

# python
>>> [i % 5 for i in arr]
[0, 1, 2, 3, 4, 0, 1, 2, 3, 4]

I would expect the python-style behaviour to be the one used by %, with perhaps the introduction of a pl.remainder.

@thomasaarholt thomasaarholt added the bug Something isn't working label Jun 27, 2022
@thomasaarholt
Copy link
Contributor Author

Having read up a bit about modulus on wikipedia, I am now a bit confused about what the strict definition of modulus is. Still, I'd expect the same behaviour of % in polars as in numpy.

@slonik-az
Copy link
Contributor

slonik-az commented Jun 27, 2022

TL;DR.

Whatever you do, make sure that

# Python
assert((n//m)*m + (n%m) == n)

Details:

Modulo '%' is closely related to integer division. Typically, integer division comes first, then the modulo (remainder) is defined from the invariant (n/m)*m + n%m == n. C,Rust,C++ may use different integer division rules (can even be architecture dependent) from Python. Hence the impact on '%'. In any case, polars should maintain the invariant (I have not checked it yet).

@thomasaarholt thomasaarholt changed the title % operator is remainder, should be modulus (in Python) % operator is inconsistent with Python / Numpy behaviour Jun 27, 2022
@rostamn739
Copy link

This seems portable in current Rust

https://doc.rust-lang.org/stable/std/primitive.i32.html#method.rem_euclid

@slonik-az
Copy link
Contributor

Again, it makes no sense to change '%' without simultaneously changing integer division (https://doc.rust-lang.org/stable/std/primitive.i32.html#method.div_euclid).

Euclid algo is much slower than the native hardware implementation which goes contrary to polars stated goals.

@ritchie46
Copy link
Member

I will close this as the implementation language is Rust and we follow that behavior in this respect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants