Add `mod(a,d)` function #43

ecerulm · 2018-12-10T21:05:26Z

No description provided.

larsga · 2018-12-10T21:17:52Z

Yeah, I've thought about adding this function, too. Out of interest: what's your use case?

One question is whether it should be an operator (%) instead, although we may want to preserve that for future use.

Another question is: what about negative numbers, both first and second argument?

I see most programming languages define modulus in terms of division, which is kind of strange to me. Modular arithmetic seems like a more intuitive way to define it, which might give some guidance to the issue of negative arguments. If we define it in terms of division it seems we have to handle real numbers as well.

We also need test cases for the first or second argument being null. I see your code handles it, but we need the test cases.

ecerulm · 2018-12-10T21:32:18Z

The use case is to use in checks to discard messages. The idea is to have a filter based on mod(.property, 100) == 0 to select only 1 percent of the inputs.

The difference with random() < 0.1 is that the check using modulo is deterministic, that is , given the same input we will get the same check result.

Regarding the definition in terms of division: I must be honest and admit that I just wanted to have the same semantics as the java modulo since it's the underlying language where the mod() is implemented and that simplifies the implementation of mod().

I definitely add the test cases for null and for negative numbers.

I can rethink the definition to make it match exactly with the java definition (which I think it's only defined for integral numbers, not real numbers)

larsga · 2018-12-11T08:26:07Z

I like the idea. random() is easy and practical, but a nightmare to test. With this proposal one gets testability at some (probably very slight) risk of bias. I suppose it also requires some suitable key, although it should actually be possible to use the entire JSON input as the key.

Thinking about it I think we'd rather not support real numbers for mod. If there ever emerges a good reason to do we can lift the restriction and allow it.

We also need to have a test for mod(10, 0) => error.

This Wikipedia page has a very good overview. Looks like the behaviour in different programming languages is all over the place because the full definition of modulo in terms of division actually involves a choice between several more precise definitions that give different results in this case. I need to set aside some time to read that properly.

ecerulm · 2018-12-11T08:34:04Z

I already have written test for the real number, negative number, and description referencing the Java Remainder operator %.

But I can create another commit to remove the real numbers with a test for error.

I also notice the different behaviour or the modulo/remainder operator in different languages I was surprise to find that

java: -10 % 3 == -1
java: 10 % -3 == 1
java: -10 % -3 == -1
python: -10 % 3 == 2
python: 10 % -3 == 2
python: -10 % -3 == -1

ecerulm · 2018-12-11T08:49:58Z

@larsga Please take a look at the changes. I updated the description in functions.md to refer to the Java remainder operator ,mod() will error if arguments are real numbers, negative number still supported with the same semantics as java.

larsga · 2018-12-11T12:55:15Z

I dug a little deeper into this and found that Java defines a % b according to the equation (a/b)*b+(a%b) == a.

The amusing thing is that Python appears to define it the same way: x == (x/y)*y + (x%y).

So why do they get different results? Because the division operators are not the same. -5 / 2 == -3 in Python, but it's -2 in Java. I think this is what the diagram on the upper right in the Wikipedia page is trying to say: the round-off in division could be toward the lower number, or toward zero. For positive numbers that's the same thing, but for negative numbers it's not.

I see this can be argued many ways, but one thing I don't like about the Java operator is that the expression x % y is not guaranteed to return a result 0..y, because if x drops below zero it could suddenly be -y..0.

Thinking about this some more, this matter for our use case. Let's say I write mod(hash-int( something ), 100) < 2 and hash-int can return negative numbers, then I'm getting 51% of the data and not 2%.

Thankfully, JSLT doesn't have integer division at the moment, so we haven't committed to one of these two choices, and thus the mod function can be defined independently. I'm leaning toward the Python definition, to avoid people being surprised when % returns a negative number. (If they use a negative number as the second operand then it's far less confusing to get a negative number, imho.)

larsga · 2018-12-11T12:56:56Z

I see Java hashCode can return negative numbers, so if hash-int is based on that this will matter.

larsga · 2019-01-18T11:26:03Z

@ecerulm Thank you very much for the work so far. I will take over this branch now, just so we can get this finished.

larsga · 2019-01-18T11:30:07Z

Given that this function is defined in terms of the remainder, and not in terms of modulo arithmetic, I think it's better to call it remainder.

Java % is the "remainder operator": https://docs.oracle.com/javase/specs/jls/se7/html/jls-15.html#jls-15.17.3

Python calls it "modulo operator": https://docs.python.org/2/reference/expressions.html#binary-arithmetic-operations, but the text describes it this way: "The % (modulo) operator yields the remainder from the division of the first argument by the second."

larsga · 2019-01-18T12:23:22Z

I want to work through the test cases with negative numbers, to verify that I understand what the correct answers are, and why. The guiding principle is this equation: x == (x/y)*y + (x%y).

If we rewrite that to JSLT we get $x == floor($x/$y)*$y + remainder($x, $y).

Note that ceiling(-2.5) == -2.

remainder(-10,3) should be larger than zero. floor(-10/3.0)*3.0 == -12.0. So the correct answer is ... not 100% intuitively ... 2. Evaluating -10 % 3 in Python gives the same answer.

remainder(10,-3) next, floor(10/-3.0)*-3.0 == 12 gives -2.

Somehow I convinced myself that Python always returns a value 0..y, but that's wrong. As long as this operation really means division remainder we are going to have to live with negative results.

So ... perhaps best to backtrack. What if this were a real modulo function where the definition is mod(x, y) = the integer x modulo y?

larsga · 2019-01-18T12:57:44Z

Using this paper I found an approach that I think works. Use the same basic equation, but Euclidean division. That gives a simple definition, a simple implementation, and a beautifully simple rule for understanding the result.

Basically 0 mod 3 == 0, 1 mod 3 == 1, 2 mod 3 == 2, 3 mod 3 == 0. This cycle simply repeats endlessly. So if n mod 3 == 0, then n-1 mod 3 == 2. If n mod 3 == 2 then n+1 mod 3 == 0. Always. Therefore, -1 mod 3 == 2. And we get:

-10 mod 3 = 2
-10 mod -3 = 2
-9 mod 3 = 0
-9 mod -3 = 0
-8 mod 3 = 1
-8 mod -3 = 1
-7 mod 3 = 2
-7 mod -3 = 2
-6 mod 3 = 0
-6 mod -3 = 0
-5 mod 3 = 1
-5 mod -3 = 1
-4 mod 3 = 2
-4 mod -3 = 2
-3 mod 3 = 0
-3 mod -3 = 0
-2 mod 3 = 1
-2 mod -3 = 1
-1 mod 3 = 2
-1 mod -3 = 2
0 mod 3 = 0
0 mod -3 = 0
1 mod 3 = 1
1 mod -3 = 1
2 mod 3 = 2
2 mod -3 = 2
3 mod 3 = 0
3 mod -3 = 0
4 mod 3 = 1
4 mod -3 = 1
5 mod 3 = 2
5 mod -3 = 2
6 mod 3 = 0
6 mod -3 = 0
7 mod 3 = 1
7 mod -3 = 1
8 mod 3 = 2
8 mod -3 = 2
9 mod 3 = 0
9 mod -3 = 0
10 mod 3 = 1
10 mod -3 = 1

Note how the right-hand side is just (0, 1, 2) endlessly repeated in that order.

So ... this is what we'll go for.

larsga · 2019-01-18T12:59:36Z

And since it now really is modulo, we can call the function mod after all.

torarvid · 2019-01-18T13:28:07Z

Interesting... This seems different from C and Node.js, where I get the feeling that they use truncation instead of the floor function to do the integer division part... (In both languages, -10 % 3 == -1)

larsga · 2019-01-18T13:31:11Z

I think that's correct. (I haven't checked the % definition in those languages.)

ecerulm added 2 commits December 10, 2018 21:58

Add modulo funtion

49f6379

Documenation and test for modulo

17c5d5a

ecerulm added 2 commits December 11, 2018 09:36

Add real number support for modulo operation

235b607

Modulo function only supports integral values now

3799079

larsga merged commit 3799079 into schibsted:master Jan 18, 2019

ecerulm deleted the modulo branch November 29, 2019 14:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `mod(a,d)` function #43

Add `mod(a,d)` function #43

ecerulm commented Dec 10, 2018

larsga commented Dec 10, 2018

ecerulm commented Dec 10, 2018

larsga commented Dec 11, 2018

ecerulm commented Dec 11, 2018

ecerulm commented Dec 11, 2018

larsga commented Dec 11, 2018

larsga commented Dec 11, 2018

larsga commented Jan 18, 2019

larsga commented Jan 18, 2019

larsga commented Jan 18, 2019

larsga commented Jan 18, 2019

larsga commented Jan 18, 2019

torarvid commented Jan 18, 2019

larsga commented Jan 18, 2019

Add mod(a,d) function #43

Add mod(a,d) function #43

Conversation

ecerulm commented Dec 10, 2018

larsga commented Dec 10, 2018

ecerulm commented Dec 10, 2018

larsga commented Dec 11, 2018

ecerulm commented Dec 11, 2018

ecerulm commented Dec 11, 2018

larsga commented Dec 11, 2018

larsga commented Dec 11, 2018

larsga commented Jan 18, 2019

larsga commented Jan 18, 2019

larsga commented Jan 18, 2019

larsga commented Jan 18, 2019

larsga commented Jan 18, 2019

torarvid commented Jan 18, 2019

larsga commented Jan 18, 2019

Add `mod(a,d)` function #43

Add `mod(a,d)` function #43