# josdejong/mathjs


# Non-uniform null behaviour #830

Closed · opened this issue Apr 17, 2017 · 24 comments · 8 participants

### avianey commented Apr 17, 2017

Math.js has strange behaviour regarding `null`: `math.eval('null')` returns `null`, but `math.eval('null + null')` returns `0`, and so does every function call: `cos(null)`, for example, returns `1`. Shouldn't the parser return `null` as long as there is one undefined value in the tree?
Owner

### josdejong commented Apr 17, 2017

The reason is that math.js converts `null` into a number with value `0` when it is needed in a calculation. The same happens with strings: `math.add("2", 3)` returns `5`. I'm always open to reconsidering these decisions if there is a good reason. Related topic: #353.
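For illustration, here is the v3-era behavior described above collected in one place (results as reported in this thread):

```js
const math = require('mathjs');

// math.js v3 implicitly converts null to the number 0 inside calculations
math.eval('null');        // null (a lone null is returned as-is)
math.eval('null + null'); // 0 (each null becomes 0 before the addition)
math.eval('cos(null)');   // 1 (effectively cos(0))
math.add('2', 3);         // 5 (numeric strings are converted the same way)
```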
Contributor

### Nekomajin42 commented Apr 17, 2017

I think there is a good reason why `null` exists. It means something does not have a value; it's empty. Zero means it has a numeric value, which happens to be zero. In statistics functions it matters whether an array contains empty elements or zeroes.

### avianey commented Apr 18, 2017 • edited

In fact, I need to identify when an expression has an undefined result. For instance, if `x = null`, then any expression using `x` should be `null`, meaning undefined. What I get instead is `cos(x) = 1`. (I'm using `null` because `undefined` breaks mathjs ;-)) I'll try to `math.compile(expr)`, identify the variables from the scope used in `expr`, then check whether they are undefined in the scope. I'll post here if this approach succeeds.
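A minimal sketch of the approach avianey describes, assuming the v3-era API (the `safeEval` name and the built-in-name check are illustrative, not part of mathjs):

```js
const math = require('mathjs');

function safeEval(expr, scope) {
  const node = math.parse(expr);
  // Collect variable names, skipping built-ins such as `cos` (heuristic check)
  const vars = node
    .filter(n => n.type === 'SymbolNode' && !(n.name in math))
    .map(n => n.name);
  if (vars.some(name => scope[name] == null)) {
    return null; // at least one variable is null/undefined in the scope
  }
  return node.compile().eval(scope); // v3-era API; `evaluate` in later versions
}

safeEval('cos(x)', { x: null }); // null, instead of cos(0) === 1
safeEval('cos(x)', { x: 0 });    // 1
```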

### balagge commented Apr 19, 2017

I also believe the general conversion logic that turns `null` into `0` is too universal and causes loss of information when `null` is intended to mean "value not available or not applicable", which is how I use it. The meaning of `null` also differs from `undefined`: `null` means a deliberate omission of a value (the value is known to be missing or empty), whereas `undefined` may mean "not yet computed, but may be computed later". The current conversion makes `null` a synonym of `0`, so having `null` as a value makes no difference (and, after a computation or function call, this information is lost completely from the result).

In any case, I would also support removing the default universal conversion of `null` to `0`, creating the possibility to use `null` meaningfully, even if that use is quite limited. In my opinion, any computation performed on `null` should also return `null` as a general rule; of course, exceptions may exist and should be studied individually. One case where `null` could effectively mean `0` is sparse matrices, but I am not fully aware of the sparse matrix implementation, so I am not sure whether that makes sense or is relevant at all.

### josdejong referenced this issue May 7, 2017

Closed

#### Breaking changes for v4 #682

13 of 13 tasks complete
Owner

### josdejong commented May 7, 2017

Thanks guys, these are good arguments. OK, let's change the behavior of `null` in the next breaking release (`v4`), see #682. Thinking about the changes we need to make:

- Drop the implicit conversion from `null` to `0`, as in `math.add(null, 0)`.
- Change the behavior of stats functions like `min`, `max`, and `mean` to simply ignore `null` values, so `mean([2, null, 4])` returns `3` instead of `2`.

Are there any more behaviors that need to change regarding `null`?

### whindes commented Jun 1, 2017 • edited by josdejong

We need an implementation for `parse.set` and `parse.eval`:

```js
let expression = "(calcium_mgdL < 7.2) or (calcium_mmolL > 3.5)";

parse.set("calcium_mgdL", null); // we want this to be FALSE or ignored
parse.set("calcium_mmolL", 4.1);

// Expected behavior:
// - the left side of the expression before `or` should evaluate to FALSE
// - the right side of the expression after `or` evaluates to TRUE
parse.eval(expression) === true;
// currently TRUE only because null evaluates to zero (0 < 7.2), where we expect FALSE
```

The current behavior using `undefined` instead of `null` fails the whole expression, which is something we do not want. The workaround is to write the expression so it handles null/zero as boolean false:

`let workaroundExpression = "(calcium_mgdL and (calcium_mgdL < 7.2)) or (calcium_mmolL and (calcium_mmolL > 3.5))"`
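For clarity, here is the workaround in runnable form. This is a sketch assuming `parse` is a parser instance created with `math.parser()` (v3-era API, where `null` coerces to `0` and is therefore falsy inside the `and` guard):

```js
const math = require('mathjs');
const parse = math.parser(); // matches the `parse.set` / `parse.eval` naming above

parse.set('calcium_mgdL', null);
parse.set('calcium_mmolL', 4.1);

// The guard `calcium_mgdL and (...)` short-circuits the left side to false,
// so only the right-hand comparison decides the outcome.
parse.eval(
  '(calcium_mgdL and (calcium_mgdL < 7.2)) or (calcium_mmolL and (calcium_mmolL > 3.5))'
); // true
```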
Owner

### josdejong commented Jun 4, 2017

@whindes I'm not sure I understand your point. Do you mean that changing the behavior for `null` would make it more cumbersome to work with variables that can be either `null` or a number? (i.e. the proposed breaking change would make your life harder?)
Collaborator

### firepick1 commented Jun 14, 2017 • edited

Having recently been burned by `null` and the native `Math.min`, I have become convinced that `null` should not be coerced into anything but itself. The native `Math.min` and `Math.max` do the following:

```js
var x = null;                   // variable to be clipped
Math.min(Math.max(-1, x), 100); // returns 0 ! yukky :(
```

This "feature" of native Math wasted hours of my time, since I was relying on the assumption that operations with `null` return `null`. That is indeed the case with `+` and `-`. My brain simply applied induction to arrive at the assumption that `max` and `min` would behave in a similarly civilized manner. They do not.

My expectation is that any math package should be consistent in its handling of `null`. Since mathjs is built on JavaScript, the language itself creates the expectation that `null` is not zero. I use `min` and `max` as mathematical operators (like `+`), not as statistical operators. For the statistical use of `min` and `max` with collections of things, I would even prefer to spell it out at length, since the intent reads clearly:

```js
Math.min(...lotsOfNumbers.map(n => n == null ? 0 : n));
```

If we need a statistical `min` and `max` that assume `0`, I suggest we introduce new functions to handle the expected statistical behavior. For example, we might have `minStat` and `maxStat` for `null` coercion to `0`.

p.s. Amusingly, Wolfram Alpha parses `null` as `0`, so `1+null` is `1` for them. But we are JavaScript.
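A minimal sketch of what such a function could look like (`minStat` is hypothetical and never became part of mathjs):

```js
// Hypothetical statistical min that deliberately coerces null to 0
function minStat(values) {
  return Math.min(...values.map(n => n == null ? 0 : n));
}

minStat([3, null, 7]); // 0, because the null counts as 0 by explicit choice
```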
Contributor

### Nekomajin42 commented Jun 14, 2017

@firepick1 The problem with `null` is that it denotes a missing value. We don't know whether it would be zero or any other number. It's like calling every stranger Joe: sure, we can do that, but because we don't know their names, we should say "person unknown" or some such. In this case, I think the good solution is what MS Excel does: it simply skips the `null` values in an array and works only with the rest. If you have the rare use case where you have to treat `null` as zero, you can always use the built-in `map` function to pre-process your data and have actual zeros in the array.
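For example, that pre-processing step is a one-liner with mathjs's own `map` (a sketch; the data values are illustrative):

```js
const math = require('mathjs');

const data = [2, null, 4];

// Replace the missing values with explicit zeros before the stats call
const cleaned = math.map(data, x => x === null ? 0 : x); // [2, 0, 4]
math.mean(cleaned); // 2
```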
Collaborator

### firepick1 commented Jun 14, 2017

For mathematics, I favor consistency over "good", since "good" is subjective and endlessly debatable. I will admit that with JavaScript we may be doomed in any effort to achieve consistency. For example, `1+null` is `null` but `-1`…
Owner

### josdejong commented Jun 18, 2017

I think we all agree that the behavior of `null` should change from implicitly converting to `0` to keeping it explicitly `null` (unlike the behavior of JavaScript). See also the arguments of @honestserpent in #682 (comment), though let's continue the discussion here in #830. To summarize the thoughts so far:

- `mean([2, null, 4])` should return `null`; same with `max`, `min`, etc.
- We have a separate function or mode to deal with `null` by ignoring it. This could be `nanmean`, or an option like `mean([...], {'null': 'ignore'})`.
- If you want to treat `null` as `0`, you can always convert nulls explicitly via `number(matrix)`.

Any help with implementing the new behavior is welcome. Anyone interested in picking it up?
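Sketched out, the proposal would read like this (`nanmean` was only a suggested name at this point, not an actual mathjs function):

```js
const math = require('mathjs');

math.mean([2, null, 4]);              // null: propagate null by default
math.nanmean([2, null, 4]);           // 3: hypothetical null-ignoring variant
math.mean(math.number([2, null, 4])); // 2: convert null to 0 explicitly first
```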
Contributor

### Nekomajin42 commented Jun 18, 2017

@josdejong I don't think every function should automatically return `null` if some of the inputs are `null`. I think they should filter out the `null` values and not take them into account. A few examples:

- `mean([2, null, 4])` should return `3`
- `min([2, null, 4])` should return `2`
- `2 + null` should return `2` (because there is nothing to add)
- `2 * null` should also return `2` (because there is nothing to multiply by)

On the other hand, `sqrt(null)`, `pow(null, 2)`, `pow(2, null)`, and other functions like these should of course return `null`. (While `sqrt([4, null, 9])` should return `[2, null, 3]`.)

Can the behaviour be tied to the `predictable` config option? If it's `true`, all the functions should return `null`; if it's `false`, functions should filter out the `null` values and do their job with the remaining inputs.
Owner

### josdejong commented Jun 18, 2017

Hm, I'm not sure about the behavior you propose for the operators `+` and `*`. I can follow your reasoning, but I think the behavior will become quite unpredictable if some functions ignore `null` whilst others process it and return `null`. Ideally, the behavior should be easy to understand. I find it a strong argument that when some value is `null`, there is something special going on, and the default behavior should be to return `null` in that case and force the user to take action.

As for the behavior of statistics functions like `mean([2, null, 4])`, we need two different implementations, and we need to decide on the default behavior:

- one implementation that returns `null` when any of the values is `null`;
- one implementation that ignores `null` values.

I think the most common use case for a function like `mean` is wanting to ignore `null` values, and it may be inconvenient to have to deal with nulls every single time. On the other hand, it shouldn't be that hard to use an alternative like `nanmean` or `nullmean` or something along those lines.
Contributor

### Nekomajin42 commented Jun 18, 2017

If you think about it, the way I see it, each function treats `null` the same way: they all ignore it. If `null` is the only input, they return `null`, because there is nothing to work on. If there are multiple inputs, there are two cases:

- If a function works on an array element-wise, then `null` elements return `null` and the other elements work normally.
- If a function works with the whole array, then the `null` values are filtered out.

Regarding addition and multiplication, `prod([2, null, 3])` should return `6`. It is the equivalent of `2*null*3`. This is how MS Excel and similar spreadsheet programs work. I don't know about Matlab and other software, but it seems quite logical to me.

As for the implementations, I think it would be best to handle both cases with one function plus a global config option or an optional parameter; multiple implementations with different names would be messy (see the sketch below). Either way, in my opinion, the default behaviour should be the one that ignores `null`, because it is the simple case: don't mind `null`, carry on. In the other case, if we do mind `null`, there can be different ways to handle it: keep it as-is and get `null` in return, or treat it as `0` or some other default value and do the calculation with that.
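A sketch of that single-function-plus-option design (the `nulls` option name and the implementation are purely illustrative):

```js
// Hypothetical mean with an option controlling the null handling
function mean(values, options = {}) {
  const xs = options.nulls === 'keep' ? values : values.filter(v => v !== null);
  if (xs.length === 0 || xs.includes(null)) return null; // nothing to work on
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

mean([2, null, 4]);                    // 3: nulls are ignored by default
mean([2, null, 4], { nulls: 'keep' }); // null: opt in to propagating null
```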
Owner

### josdejong commented Jun 18, 2017

 That makes sense indeed, it's a consistent behavior. Let me give it some more thought.

### honestserpent commented Jun 19, 2017 • edited

@Nekomajin42 @josdejong I think these are just different ways of approaching the problem. Of course both of them are OK and will handle `null` values properly, but here are my two cents. (I am going to use the `mean` function as an example, but this applies to every other function: `median`, `sum`, etc.)

I think that making the default `mean` behavior ignore and remove `null` may leave developers unable to spot and identify potential bugs in their code. For example: in my case, I have to calculate the `mean` of an array of not-null values, and I need to know if some value was erroneously set to `null`. If the default behaviour of `mean` is to just ignore `null` values, I may be led to think everything is fine, while instead there is a bug that has to be found and fixed, and it may be found only after a long time, because it would not result in an error. Of course I can always check before running `mean`, but that means a lot more code every time I use the function. If instead `mean` does not ignore `null` values by default, a potential bug is found instantly.

I think that if the developer wants to ignore `null` values, he should agree to that explicitly, either by using a separate function (e.g. `nanmean`) or by providing options: `mean([...], {'null': 'ignore'})`. This behavior should apply to every function, `sum` and `prod` included. Summarizing, here is what I would expect to see (substitute the nan-versions of the functions with options if you want):

- `mean([1, 4, null])` should return `null`
- `nanmean([1, 4, null])` should return `2.5`
- `sum([1, 4, null])` should return `null`
- `nansum([1, 4, null])` should return `5`
- other functions likewise

I would also like to add that Matlab, Python (numpy), and R, three of the most used programming languages for data science, all deal with `nan` values by treating them as `nan` and not by ignoring them by default.
Owner

### josdejong commented Jun 23, 2017

Thanks for sharing your ideas @honestserpent. This echoes the opinion of others: `null` is a special value, and when encountering `null`, something is going on and the user should handle it explicitly. Indeed similar to `NaN`.

### sfescape commented Nov 9, 2017 • edited

 I would also suggest allowing a default value for null (or undefined) to be explicitly set, perhaps in the scope variable.
Owner

### josdejong commented Nov 13, 2017

Thanks for your input @sfescape. Some functions allow passing your own default value, like `subset`. Is that what you mean?
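For reference, `subset` accepts an optional default value as its fourth argument, used to fill entries created when a set operation grows the matrix:

```js
const math = require('mathjs');

const m = math.matrix([[1, 2], [3, 4]]);

// Writing outside the current bounds resizes the matrix; the newly created
// entries are filled with the supplied default value (here 0).
math.subset(m, math.index(2, 2), 9, 0);
// [[1, 2, 0], [3, 4, 0], [0, 0, 9]]
```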

### sfescape commented Nov 13, 2017

 Yes, exactly.

### honestserpent commented Nov 14, 2017

Do you guys think this may be a good first contribution, or would it be too hard? I would like to take it and try to develop it as my first GitHub contribution ever.
Owner

### josdejong commented Nov 15, 2017

@sfescape OK, that's indeed also my preferred solution: passing the default along with the function itself (not something globally configurable).

@honestserpent great that you want to contribute! I think this topic (#830) is not the best one to start with, because it requires changing code in quite a few different places and levels of the project. An easy one to start with is, for example, #964. Or you could go deep into maths and try to improve the performance of `determinant`, see #908. There are plenty of issues that you can pick up; is there any of particular interest to you?

### josdejong added a commit that referenced this issue Jan 23, 2018

Remove support for implicit conversion of `null` to `0`. See #830. (commit 4469f6b)
Owner

### josdejong commented Jan 24, 2018

In the `v4` branch I've removed the implicit conversion of `null` to `0`, and I've improved the error messages of the statistics functions a bit. This means the statistics functions will simply break when you pass a `null`, forcing you to explicitly solve the issue yourself, either by converting `null` to `0` or by filtering out the nulls. For example:

```js
math.mean([1, 2, null, 3]);
// TypeError: Cannot calculate mean, unexpected type of argument (type: null, value: null)

// Strategy 1: convert null to 0
math.mean(math.number([1, 2, null, 3])); // 1.5

// Strategy 2: filter out null values
math.mean(math.filter([1, 2, null, 3], x => x !== null)); // 2
```
Owner

### josdejong commented Feb 25, 2018

mathjs v4 was released today, containing the improved handling of `null`.

### josdejong closed this Feb 25, 2018
