Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mod(a,d) function #43

Merged
merged 4 commits into from Jan 18, 2019
Merged

Add mod(a,d) function #43

merged 4 commits into from Jan 18, 2019

Conversation

ecerulm
Copy link
Contributor

@ecerulm ecerulm commented Dec 10, 2018

No description provided.

@larsga
Copy link
Collaborator

larsga commented Dec 10, 2018

Yeah, I've thought about adding this function, too. Out of interest: what's your use case?

One question is whether it should be an operator (%) instead, although we may want to preserve that for future use.

Another question is: what about negative numbers, both first and second argument?

I see most programming languages define modulus in terms of division, which is kind of strange to me. Modular arithmetic seems like a more intuitive way to define it, which might give some guidance to the issue of negative arguments. If we define it in terms of division it seems we have to handle real numbers as well.

We also need test cases for the first or second argument being null. I see your code handles it, but we need the test cases.

@ecerulm
Copy link
Contributor Author

ecerulm commented Dec 10, 2018

The use case is to use in checks to discard messages. The idea is to have a filter based on mod(.property, 100) == 0 to select only 1 percent of the inputs.

The difference with random() < 0.1 is that the check using modulo is deterministic, that is , given the same input we will get the same check result.

Regarding the definition in terms of division: I must be honest and admit that I just wanted to have the same semantics as the java modulo since it's the underlying language where the mod() is implemented and that simplifies the implementation of mod().

I definitely add the test cases for null and for negative numbers.

I can rethink the definition to make it match exactly with the java definition (which I think it's only defined for integral numbers, not real numbers)

@larsga
Copy link
Collaborator

larsga commented Dec 11, 2018

I like the idea. random() is easy and practical, but a nightmare to test. With this proposal one gets testability at some (probably very slight) risk of bias. I suppose it also requires some suitable key, although it should actually be possible to use the entire JSON input as the key.

Thinking about it I think we'd rather not support real numbers for mod. If there ever emerges a good reason to do we can lift the restriction and allow it.

We also need to have a test for mod(10, 0) => error.

This Wikipedia page has a very good overview. Looks like the behaviour in different programming languages is all over the place because the full definition of modulo in terms of division actually involves a choice between several more precise definitions that give different results in this case. I need to set aside some time to read that properly.

@ecerulm
Copy link
Contributor Author

ecerulm commented Dec 11, 2018

I already have written test for the real number, negative number, and description referencing the Java Remainder operator %.

But I can create another commit to remove the real numbers with a test for error.

I also notice the different behaviour or the modulo/remainder operator in different languages I was surprise to find that

  • java: -10 % 3 == -1
  • java: 10 % -3 == 1
  • java: -10 % -3 == -1
  • python: -10 % 3 == 2
  • python: 10 % -3 == 2
  • python: -10 % -3 == -1

@ecerulm
Copy link
Contributor Author

ecerulm commented Dec 11, 2018

@larsga Please take a look at the changes. I updated the description in functions.md to refer to the Java remainder operator ,mod() will error if arguments are real numbers, negative number still supported with the same semantics as java.

@larsga
Copy link
Collaborator

larsga commented Dec 11, 2018

I dug a little deeper into this and found that Java defines a % b according to the equation (a/b)*b+(a%b) == a.

The amusing thing is that Python appears to define it the same way: x == (x/y)*y + (x%y).

So why do they get different results? Because the division operators are not the same. -5 / 2 == -3 in Python, but it's -2 in Java. I think this is what the diagram on the upper right in the Wikipedia page is trying to say: the round-off in division could be toward the lower number, or toward zero. For positive numbers that's the same thing, but for negative numbers it's not.

I see this can be argued many ways, but one thing I don't like about the Java operator is that the expression x % y is not guaranteed to return a result 0..y, because if x drops below zero it could suddenly be -y..0.

Thinking about this some more, this matter for our use case. Let's say I write mod(hash-int( something ), 100) < 2 and hash-int can return negative numbers, then I'm getting 51% of the data and not 2%.

Thankfully, JSLT doesn't have integer division at the moment, so we haven't committed to one of these two choices, and thus the mod function can be defined independently. I'm leaning toward the Python definition, to avoid people being surprised when % returns a negative number. (If they use a negative number as the second operand then it's far less confusing to get a negative number, imho.)

@larsga
Copy link
Collaborator

larsga commented Dec 11, 2018

I see Java hashCode can return negative numbers, so if hash-int is based on that this will matter.

@larsga
Copy link
Collaborator

larsga commented Jan 18, 2019

@ecerulm Thank you very much for the work so far. I will take over this branch now, just so we can get this finished.

@larsga
Copy link
Collaborator

larsga commented Jan 18, 2019

Given that this function is defined in terms of the remainder, and not in terms of modulo arithmetic, I think it's better to call it remainder.

Java % is the "remainder operator": https://docs.oracle.com/javase/specs/jls/se7/html/jls-15.html#jls-15.17.3

Python calls it "modulo operator": https://docs.python.org/2/reference/expressions.html#binary-arithmetic-operations, but the text describes it this way: "The % (modulo) operator yields the remainder from the division of the first argument by the second."

@larsga
Copy link
Collaborator

larsga commented Jan 18, 2019

I want to work through the test cases with negative numbers, to verify that I understand what the correct answers are, and why. The guiding principle is this equation: x == (x/y)*y + (x%y).

If we rewrite that to JSLT we get $x == floor($x/$y)*$y + remainder($x, $y).

Note that ceiling(-2.5) == -2.

remainder(-10,3) should be larger than zero. floor(-10/3.0)*3.0 == -12.0. So the correct answer is ... not 100% intuitively ... 2. Evaluating -10 % 3 in Python gives the same answer.

remainder(10,-3) next, floor(10/-3.0)*-3.0 == 12 gives -2.

Somehow I convinced myself that Python always returns a value 0..y, but that's wrong. As long as this operation really means division remainder we are going to have to live with negative results.

So ... perhaps best to backtrack. What if this were a real modulo function where the definition is mod(x, y) = the integer x modulo y?

@larsga
Copy link
Collaborator

larsga commented Jan 18, 2019

Using this paper I found an approach that I think works. Use the same basic equation, but Euclidean division. That gives a simple definition, a simple implementation, and a beautifully simple rule for understanding the result.

Basically 0 mod 3 == 0, 1 mod 3 == 1, 2 mod 3 == 2, 3 mod 3 == 0. This cycle simply repeats endlessly. So if n mod 3 == 0, then n-1 mod 3 == 2. If n mod 3 == 2 then n+1 mod 3 == 0. Always. Therefore, -1 mod 3 == 2. And we get:

-10 mod 3 = 2
-10 mod -3 = 2
-9 mod 3 = 0
-9 mod -3 = 0
-8 mod 3 = 1
-8 mod -3 = 1
-7 mod 3 = 2
-7 mod -3 = 2
-6 mod 3 = 0
-6 mod -3 = 0
-5 mod 3 = 1
-5 mod -3 = 1
-4 mod 3 = 2
-4 mod -3 = 2
-3 mod 3 = 0
-3 mod -3 = 0
-2 mod 3 = 1
-2 mod -3 = 1
-1 mod 3 = 2
-1 mod -3 = 2
0 mod 3 = 0
0 mod -3 = 0
1 mod 3 = 1
1 mod -3 = 1
2 mod 3 = 2
2 mod -3 = 2
3 mod 3 = 0
3 mod -3 = 0
4 mod 3 = 1
4 mod -3 = 1
5 mod 3 = 2
5 mod -3 = 2
6 mod 3 = 0
6 mod -3 = 0
7 mod 3 = 1
7 mod -3 = 1
8 mod 3 = 2
8 mod -3 = 2
9 mod 3 = 0
9 mod -3 = 0
10 mod 3 = 1
10 mod -3 = 1

Note how the right-hand side is just (0, 1, 2) endlessly repeated in that order.

So ... this is what we'll go for.

@larsga
Copy link
Collaborator

larsga commented Jan 18, 2019

And since it now really is modulo, we can call the function mod after all.

@torarvid
Copy link

Interesting... This seems different from C and Node.js, where I get the feeling that they use truncation instead of the floor function to do the integer division part... (In both languages, -10 % 3 == -1)

@larsga larsga merged commit 3799079 into schibsted:master Jan 18, 2019
@larsga
Copy link
Collaborator

larsga commented Jan 18, 2019

I think that's correct. (I haven't checked the % definition in those languages.)

@ecerulm ecerulm deleted the modulo branch November 29, 2019 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants