Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modulo operations #448

Closed
mpusz opened this issue Apr 4, 2023 · 42 comments
Closed

Modulo operations #448

mpusz opened this issue Apr 4, 2023 · 42 comments
Labels
design Design-related discussion

Comments

@mpusz
Copy link
Owner

mpusz commented Apr 4, 2023

What is your gut feeling about modulo operation for quantities? Which of the following should be valid, and what should be the result (value and unit):

  1. 7m % 2m
  2. 7km % 2000m
  3. 1300m % 1km
  4. 7m % 2
  5. 7km % 2000
  6. 1300m % 1*10^3
  7. 1km % 1%

Which of them would you provide in a physical units library?

FYI, std::chrono::duration supports all of them.

@JohelEGP
Copy link
Collaborator

JohelEGP commented Apr 4, 2023

The result type of 1-3 should be the same as operator/. Then, it can be read as

  • 3m % 2m == 1m, "3 m modulo 2 m is 1 m",
  • 2km % 1999m == 1m, "2 km modulo 1999 m is 1 m", and
  • 1001m % 1km == 1m, "1001 m modulo 1 km is 1 m".

As for the rest, considering #412, I'm not sure if it makes sense to support all of them. But if they are to be supported, they should behave like the built-in operator (https://eel.is/c++draft/expr.mul#4). Let x be a quantity and y a dimensionless quantity. If x / y works, (x / y) * y + x % y should equal x (in value, not in type).

@i-ky
Copy link
Contributor

i-ky commented Apr 4, 2023

If we are talking about x = y * n + z, where n is an integer (and has therefore dimension "one"), then [x] = [y] = [z]. So [x / y] must also have dimension "one" and [x % y] must have dimension [x]. Anything beyond that (4-7) will not make sense from dimensional analysis standpoint. 1-3 are OK, but probably won't worth the effort if you are not planning to provide % 2pi rad for angles. For me it would also make sense to consistently treat all these cases with fmod() and remainder().

@JohelEGP
Copy link
Collaborator

JohelEGP commented Apr 4, 2023

You're right. Now I remember that std::chrono's operator%s return a duration [1] [2].

@mpusz
Copy link
Owner Author

mpusz commented Apr 5, 2023

Thanks, taking into account https://eel.is/c++draft/expr.mul#4, I think that we need to support both versions (like std::chrono::duration does) as both of them are correct in value and type for this equation:

  1. 7m / 3m * 3m + 7m % 3m == 7m
  2. 7m / 3 * 3 + 7m % 3 == 7m
    The result of the modulo operation in both cases should be a quantity of the same dimension as the dividend (not necessarily of the same unit).

@mpusz
Copy link
Owner Author

mpusz commented Apr 5, 2023

What about the unit of the dimensionless divisor. Should it be constrained to have a unit of:

  1. one
  2. any ratio (i.e. percent, or 3/5)
  3. any integral ratio (i.e. 10^3)

@mpusz
Copy link
Owner Author

mpusz commented Apr 5, 2023

I just realized that the below provided by @JohelEGP are wrong:

2km % 1999m == 1m, "2 km modulo 1999 m is 1 m", and
1001m % 1km == 1m, "1001 m modulo 1 km is 1 m".

Again to satisfy https://eel.is/c++draft/expr.mul#4:

  • as 2 km / 1999 m == 0 km/m, then the result of the modulo operation should probably be 2 km
  • as 1001 m / 1 km == 1001 m/km, then the result of the modulo operation should probably be 0 m

Does it have any sense? 😕

@JohelEGP
Copy link
Collaborator

JohelEGP commented Apr 5, 2023

Aren't they converted to the common unit before the operation? Just like operator/, and all others.
2km % 1999m performs 2000m % 1999m that equals 1m.
1001m % 1km performs 1001m % 1000m that equals 1m.

@mpusz
Copy link
Owner Author

mpusz commented Apr 5, 2023

Multiplication and division never converted to a common unit. The ratio (sometimes really huge) is stored in a type. Also, what is a common unit for 40 * km / (1 * h)?
Common unit is used only in addition and subtraction operators.

@JohelEGP
Copy link
Collaborator

JohelEGP commented Apr 5, 2023

I shouldn't have referred to operator/, besides / for the equivalence explained at https://eel.is/c++draft/expr.mul#4. operator% behaves much like operator+, actually.

@mpusz
Copy link
Owner Author

mpusz commented Apr 5, 2023

Exactly, but the equation https://eel.is/c++draft/expr.mul#4 should be preserved, right?
Any ideas on how to make it correct and avoid surprises?

@JohelEGP
Copy link
Collaborator

JohelEGP commented Apr 5, 2023

Where do we stand with the original examples?

@mpusz
Copy link
Owner Author

mpusz commented Apr 5, 2023

We can implement anything 😉

Modulo arithmetic, as you described in #448 (comment), makes sense but is inconsistent with the division, which means https://eel.is/c++draft/expr.mul#4 will not hold.

#448 (comment) is consistent with division and will make https://eel.is/c++draft/expr.mul#4 happy, but such results may be really confusing to the users.

@mpusz
Copy link
Owner Author

mpusz commented Apr 5, 2023

This is what we have on the mainline compared to chrono: https://godbolt.org/z/7q9o8dWr4.

@mpusz
Copy link
Owner Author

mpusz commented Apr 5, 2023

I also tried on Python's Pint:

>>> from pint import UnitRegistry
>>> ureg = UnitRegistry()
>>> print(4 * ureg.h / (100 * ureg.min))
0.04 hour / minute

They are doing the same as we do (but decay to floating-point).

@mpusz
Copy link
Owner Author

mpusz commented Apr 5, 2023

We could consider using a common unit if we are dealing with the division of the same quantities and integer types, but it would :

  • be inconsistent with handling of floating-point types
  • be inconsistent with the division of different quantities
  • be inconsistent with the multiplication (or maybe we should also account for multiplication by inversion?)
  • prevent storing huge ratios of the same quantities

@mpusz
Copy link
Owner Author

mpusz commented Apr 5, 2023

Any ideas on how to proceed?

@JohelEGP
Copy link
Collaborator

JohelEGP commented Apr 5, 2023

Right. mp_units' operator/ is more general.

std::chrono only ever deals with dimensionless quantities of unit one.

https://godbolt.org/z/66Gvcs954

Any ideas on how to proceed?

For starters, there seems to be something wrong with operator%. Its unit is always that of the lhs, rather than a common unit.
https://github.com/mpusz/units/blob/b40c7b2fcd8ebb45204ce610ba7b96fb83b80e2d/src/core/include/units/quantity.h#L495-L505

@mpusz
Copy link
Owner Author

mpusz commented Apr 5, 2023

For starters, there seems to be something wrong with operator%. Its unit is always that of the lhs, rather than a common unit.

Yes, I noticed that and started to investigate. And the more I thought about that, the more I was confused about what is the expected behavior. Let's agree on some reasonable behavior and I will implement it.

@JohelEGP
Copy link
Collaborator

JohelEGP commented Apr 5, 2023

It seems you applied #231 (comment), which goes against:

If x / y works, (x / y) * y + x % y should equal x (in value, not in type).
-- #448 (comment)

Although operator% isn't the one in position to know that. When you do z + x % y, the lhs of +, z, may not be (x / y) * y, so % shouldn't dictate the return type expecting that to happen.

@mpusz
Copy link
Owner Author

mpusz commented Apr 5, 2023

Right, but it would be good to have a consistent behavior between division and its reminder for integral types.

@JohelEGP
Copy link
Collaborator

JohelEGP commented Apr 5, 2023

Right now, it's wrong: https://godbolt.org/z/66Gvcs954.

5h % 120min = 5 h
300min % 2h = 0 min
300min % 120min = 60 min

They should all output 60 min.

@mpusz
Copy link
Owner Author

mpusz commented Apr 5, 2023

So, should we implement modulo in terms of the common unit and ignore the fact that the division will work otherwise?

@chiphogg
Copy link
Collaborator

chiphogg commented Apr 5, 2023

Really interesting question!

First, I can say what Au does. We support % for integral Rep inputs only. We convert both inputs to their common unit, and this is the unit we use for the result. Note that this implies we only support % for inputs of the same dimension. I wrote up a couple quick tests to confirm this --- one with dimensionless denominator, and one with some other unit --- and verified that yep, we don't allow this.

I had never actually considered a case such as 7m % 3 before. I can see how in some ways it makes sense, mathematically. However, on reflection, I think it takes us into territory similar to #432. If it's nonsense to ask about "the ceiling of my height", then it's probably also nonsense to ask about "the remainder of this length divided by 3" (or any other raw integer). We want to encourage users to think of 7m as, in many ways, a notional abstract quantity, but the result of length % 3 depends on the units: you get 1m if it's expressed in meters, and 1cm if it's expressed in cm, which is a very different result.

All of which is to say: I now think that I accidentally made the right choice in forbidding this operation. 😁


I think the main worry with this approach is that it's inconsistent with 7.6.5. However, I think this is fine. IMO, that part of the standard only applies to raw numeric types. Quantities are different in a variety of ways, and break all kinds of otherwise reasonable assumptions (e.g., T * T -> T). I think we can just add this to the list.


Another worry is that the defining relationship of division won't hold exactly. That relationship is:

(n / d) * d + (n % d) == n

It will produce the correct quantity --- so, the == here will return true --- but it will often be expressed in different units.

I think it would be nice if it reproduced the units as well, but I don't see a pathway to get there with simple rules, and complicated rules would really worry me. So I think this is actually fine as well.


I guess I shouldn't neglect the original quiz. 🙂

  1. 7m % 2m -> 1m
  2. 7km % 2000m -> 1000m
  3. 1300m % 1km -> 300m
  4. 7m % 2 FORBID
  5. 7km % 2000 FORBID
  6. 1300m % 1*10^3 FORBID
  7. 1km % 1% FORBID

@chiphogg
Copy link
Collaborator

chiphogg commented Apr 5, 2023

Right now, it's wrong: https://godbolt.org/z/66Gvcs954.

5h % 120min = 5 h
300min % 2h = 0 min
300min % 120min = 60 min

They should all output 60 min.

I just cooked up this test (in Au). It passes! 😁

TEST(Modulo, ReturnsResultInCommonUnits) {
    EXPECT_THAT(hours(5) % minutes(120), SameTypeAndValue(minutes(60)));
    EXPECT_THAT(minutes(300) % hours(2), SameTypeAndValue(minutes(60)));
    EXPECT_THAT(minutes(300) % minutes(120), SameTypeAndValue(minutes(60)));
}

@JohelEGP
Copy link
Collaborator

JohelEGP commented Apr 5, 2023

I think the main worry with this approach is that it's inconsistent with 7.6.5. However, I think this is fine. IMO, that part of the standard only applies to raw numeric types. Quantities are different in a variety of ways, and break all kinds of otherwise reasonable assumptions (e.g., T * T -> T). I think we can just add this to the list.

Another worry is that the defining relationship of division won't hold exactly. That relationship is:

(n / d) * d + (n % d) == n

It will produce the correct quantity --- so, the == here will return true --- but it will often be expressed in different units.

I agree with this. It should be sufficient that it holds when n and d are quantities of the same dimension, regardless of the units of operands or results.

@mpusz
Copy link
Owner Author

mpusz commented Apr 5, 2023

Another worry is that the defining relationship of division won't hold exactly. That relationship is:

(n / d) * d + (n % d) == n

It will produce the correct quantity --- so, the == here will return true --- but it will often be expressed in different units.

I am not sure how it should work if the below are true?

5h / 120min = 0 h/min
5h % 120min = 60 min

@JohelEGP
Copy link
Collaborator

JohelEGP commented Apr 5, 2023

Seems it's because it returns a quantity of number 5 / 120, with the references of the input quantities divided, which divides their magnitudes. That is different from the old behavior of bringing them to their common unit before dividing.

@mpusz
Copy link
Owner Author

mpusz commented Apr 5, 2023

The library never brought two units into a common unit before dividing. In general, for quantities of different dimensions it is impossible to do so. The same is done in Pint and JSR 385 (Java library). See the list of drawbacks here: #448 (comment).

@mpusz
Copy link
Owner Author

mpusz commented Apr 5, 2023

Also, see the #231 discusion.

@JohelEGP
Copy link
Collaborator

JohelEGP commented Apr 5, 2023

I think that's what it effectively did in 0.6.0: https://godbolt.org/z/c5hjo1hf8.

5h % 120min = 60 min
300min % 2h = 60 min
300min % 120min = 60 min

https://github.com/mpusz/units/blob/3f1eb80aaf090a95e84b67648521b8402c0db53f/src/include/units/quantity.h#L281-L292

@mpusz
Copy link
Owner Author

mpusz commented Apr 5, 2023

Yes, in 0.6.0 the modulo worked as we agreed above, but the division was always truncating the value and never used a common unit before division.

@JohelEGP
Copy link
Collaborator

JohelEGP commented Apr 5, 2023

Ah, you're

Right. mp_units' operator/ is more general.

operator% could be limited to quantities of the same unit.

@chiphogg
Copy link
Collaborator

chiphogg commented Apr 5, 2023

For what it's worth, I don't think we need to restrict to the same unit. If we have auto angle = degrees(5678);, I think it's nice to be able to write angle % revolutions(1).

@chiphogg
Copy link
Collaborator

chiphogg commented Apr 5, 2023

Another worry is that the defining relationship of division won't hold exactly. That relationship is:

(n / d) * d + (n % d) == n

It will produce the correct quantity --- so, the == here will return true --- but it will often be expressed in different units.

I am not sure how it should work if the below are true?

5h / 120min = 0 h/min
5h % 120min = 60 min

I think you're right: I goofed here. If the numerator has a larger unit than the denominator --- and thus, a smaller value --- then (n / d) can truncate a nonzero-in-principle result to zero, and * d can't recover the lost information.

That's a really nicely chosen example.

@mpusz
Copy link
Owner Author

mpusz commented Apr 5, 2023

Yes, this is why the modulo operator was changed in 0.7.0 to complement such a "broken" division according to #231 (comment).

However, as we agreed, the current solution is surprising to the users, to say the least. I could provide the rule like "if both operands are integral and of the same dimension then a division of them should use a common unit" but I am not sure if that is a good idea?

@chiphogg
Copy link
Collaborator

chiphogg commented Apr 5, 2023

However, as we agreed, the current solution is surprising to the users, to say the least. I could provide the rule like "if both operands are integral and of the same dimension then a division of them should use a common unit" but I am not sure if that is a good idea?

I'd be very skeptical of a special-cased rule like that. For multiplication and division generally, users can reason independently about the unit, and the result. So, for 5h / 120min, the value should be 5 / 120, and the unit should be h / min.

I currently think the crux of the issue is this. It is crystal clear how we should handle division (in isolation). And it's crystal clear how we should handle modulus (in isolation). The problem is that when we combine those operations, they don't cooperate the way we expect.

It's true that #231 (comment) draws a compelling line in the sand: with integral rep, the quotient-remainder theorem must hold. The best idea I currently have for making that possible is to provide a separate function for quotient/remainder decomposition. This gives you an API boundary which can be a "trigger point" for that common unit conversion. You might call it like this:

const auto [q, r] = quotient_and_remainder(5h, 120min);

If we did this, I would expect q to be 2, and r to be 60min. Now, granted, q * d + r won't be the same type as n. But if we're going to let users mix units --- a situation that couldn't even occur in the use cases typically contemplated by the Q-R theorem --- then I think that's a reasonable property to sacrifice.

And by using a separate function, we wouldn't interfere with the standard use cases for division or modulus, when either of them is used independently of the other.

@i-ky
Copy link
Contributor

i-ky commented Apr 5, 2023

You might call it like this:

const auto [q, r] = quotient_and_remainder(5h, 120min);

I've already mentioned standard function names (fmod() and remainder()) in #448 (comment) that implement the same thing for float and double. Let's stick to them.

@mpusz
Copy link
Owner Author

mpusz commented Apr 5, 2023

I've already mentioned standard function names (fmod() and remainder()) in #448 (comment) that implement the same thing for float and double. Let's stick to them.

Should we use fmod() for integrals, or maybe an overload of mod() for both representation types should be provided?

@JohelEGP
Copy link
Collaborator

JohelEGP commented Apr 5, 2023

And it's crystal clear how we should handle modulus (in isolation).

How would you define it? I created a concept for it, and defined it like the C++ standard, in terms of /. It works for integral types and std::chrono::duration. But it seems like it never worked for mp_units.

1680722716
1680723270
1680723301

[P1673]'s 11.6 describes problems with number concepts. I suspect most problems go away if the concepts separate the set an operation is conceptually carried on vs. the mapping of the operation's value to the C++ type. Similar to how 1 / 4 is carried out in the set of real numbers, but 0.25 has its fraction part discarded (https://eel.is/c++draft/expr.mul#4.sentence-3) before being mapped to int as 0.

@chiphogg
Copy link
Collaborator

chiphogg commented Apr 5, 2023

And it's crystal clear how we should handle modulus (in isolation).

How would you define it? I created a concept for it, and defined it like the C++ standard, in terms of /. It works for integral types and std::chrono::duration. But it seems like it never worked for mp_units.

Yeah... sorry I glossed over this. But thanks for prompting me to expand!

What I meant here is that modulus is a sensible "pure-quantity" operation: it doesn't depend on the units.

Let's momentarily restrict to positive inputs, just for simplicity. Then we could define a % b to mean "keep subtracting b from a until the latter is smaller than b, and take this final value of a". The resulting quantity is perfectly well defined, no matter what units we choose to express it.

This "subtraction-based" viewpoint also explains why the result is expressed in the common unit of the inputs. (Of course, we wouldn't actually implement it by repeated subtraction, but I think it's a useful way to define it.)

@chiphogg
Copy link
Collaborator

chiphogg commented Apr 5, 2023

You might call it like this:

const auto [q, r] = quotient_and_remainder(5h, 120min);

I've already mentioned standard function names (fmod() and remainder()) in #448 (comment) that implement the same thing for float and double. Let's stick to them.

I like your dimensional analysis in that comment, and I think your conclusions about which cases are valid (and which ones aren't) are spot-on.

As to the naming, though, I don't think fmod and remainder are relevant here, because those are different functions than the one I'm proposing. However, I did a little digging, and I did find one that was much more like this one: std::div(). However, I think this has a design flaw. It was written before structured bindings, so even though the following line would work, you don't actually know which is the quotient and which is the remainder!

const auto [q, r] = div(5h, 120min);

I think that argues in favor of a name like quotient_and_remainder, with a correspondingly structured return type, because it can be robustly used with structured bindings.

@chiphogg
Copy link
Collaborator

chiphogg commented Apr 5, 2023

Should we use fmod() for integrals, or maybe an overload of mod() for both representation types should be provided?

My understanding of the situation on the ground is that fmod is always used for floating point types (never integrals), and % is always used for integral types (never floating point). If we use these names, we should keep it that way.

I could, however, see a case for adding a function with a new name that could work with both floating point types and integral types. It strikes me as unobjectionable, although I wouldn't bother adding it unless people ask for it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design Design-related discussion
Projects
None yet
Development

No branches or pull requests

4 participants