Default matrix behaviour for null and zeroes #353

Closed
pwalimbe opened this Issue May 5, 2015 · 10 comments

@pwalimbe

pwalimbe commented May 5, 2015

Hello
I am using this library extensively and have had good results so far. There is one issue I am not sure how best to handle. We have data matrices and need to compute the mean, median, min, and max over many of their rows or columns. Is there any way to identify null cells and exclude them from these calculations? Currently they are treated as zero, which makes the results incorrect.

Thanks in advance
Prasanna

@josdejong

Owner

josdejong commented May 5, 2015

Thanks, good to hear.

Math.js indeed treats null as 0 in calculations. If you have data containing null, you may have to filter the data first. You could do something like:

var data = [...]; 

// Keep only the non-null values in data. When data is two- or multi-dimensional,
// flatten it first using `math.flatten(data)`.
var filtered = math.filter(data, function (x) {
  return x !== null;
});

// calculate the mean of all non-null values:
var mean = math.mean(filtered);
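
Flattening discards the row and column structure, so for per-row or per-column statistics the filter has to be applied to each row or column separately. A minimal sketch in plain JavaScript, assuming the matrix is a rectangular array of rows; `meanIgnoringNulls` and `column` are hypothetical helpers written for this illustration, not math.js functions:

```javascript
// Hypothetical helper: mean of the non-null entries, or null if none remain.
function meanIgnoringNulls(values) {
  const kept = values.filter((x) => x !== null);
  if (kept.length === 0) {
    return null;
  }
  return kept.reduce((sum, x) => sum + x, 0) / kept.length;
}

// Hypothetical helper: extract column j from an array of rows.
function column(matrix, j) {
  return matrix.map((row) => row[j]);
}

const matrix = [
  [1, null, 3],
  [4, 5, null],
];

// One mean per row, nulls excluded.
const rowMeans = matrix.map(meanIgnoringNulls);

// One mean per column, nulls excluded.
const colMeans = matrix[0].map((_, j) => meanIgnoringNulls(column(matrix, j)));

console.log(rowMeans); // [2, 4.5]
console.log(colMeans); // [2.5, 5, 3]
```

The same per-row filtering should work with any other statistic in place of the hand-written mean.
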
@pwalimbe

pwalimbe commented May 5, 2015

Got it. Thanks for the clarification, I will work accordingly. However, I do feel that many people will run into this, so it might be better to handle null in the calculations if possible, or perhaps to have a setting that can be enabled so the calculations behave accordingly.

@pwalimbe pwalimbe closed this May 5, 2015

@josdejong

Owner

josdejong commented May 5, 2015

You mean specifically for the statistical functions? Like passing an option to mean, min, max, etc., to ignore null values? That could make sense. We had a related discussion in the past but I can't find it right now.
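
To make the idea concrete, here is a rough sketch of what such an option could look like. Everything in it is hypothetical: `withNullHandling` and the `ignoreNull` option are names invented for this illustration and are not part of math.js, and the statistics are plain JavaScript stand-ins.

```javascript
// Hypothetical wrapper, not a math.js API: applies a statistic `fn` to
// `values`, either skipping nulls or treating them as 0 (the old default).
function withNullHandling(fn, values, options = {}) {
  const { ignoreNull = false } = options;
  const input = ignoreNull
    ? values.filter((x) => x !== null)
    : values.map((x) => (x === null ? 0 : x));

  // No values left to aggregate: return null rather than a number.
  if (input.length === 0) {
    return null;
  }
  return fn(input);
}

// Plain JavaScript stand-ins for the statistical functions.
const mean = (xs) => xs.reduce((sum, x) => sum + x, 0) / xs.length;
const min = (xs) => Math.min(...xs);

const data = [2, null, 4];

console.log(withNullHandling(mean, data));                       // 2
console.log(withNullHandling(mean, data, { ignoreNull: true })); // 3
console.log(withNullHandling(min, data));                        // 0
console.log(withNullHandling(min, data, { ignoreNull: true }));  // 2
```
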

@pwalimbe

pwalimbe commented May 5, 2015

That kind of solution would be great. It would make the library much easier to use, especially when users compare results with tools like Excel and other packages where nulls are excluded.

@josdejong

Owner

josdejong commented May 5, 2015

Ok let's keep this issue open then :)

@josdejong josdejong reopened this May 5, 2015

@jonrh

jonrh commented Feb 24, 2017

Just to weigh in a bit. Came across a similar scenario and found this old issue. My case was when using mean() and null being treated as 0 was throwing calculations off. It would have been very convenient to be able to pass in an optional parameter (or something) to specify if nulls should be skipped or treated as 0.

In my use case I'm preparing arrays to be consumed by Highcharts, a graphing library. I'm aggregating data from a domain where missing values are perfectly normal; they simply show up as gaps in the chart.

Here is an early draft of the function I used to filter out null before applying mean, inspired by Jos' earlier reply (thanks!).

/**
 * Receives an array of integers and returns the average of the numbers rounded
 * to the nearest whole number. Importantly, null values are not treated as 0,
 * but rather simply ignored.
 *
 * @param {Array<number|null>} array numbers and potentially null values, e.g. [1337, null, 42]
 * @return {number | null} the rounded mean, or null if there are no non-null values
 */
function arrayMeanExcludeNulls(array) {
  const nullFilteredOut = array.filter(number => {
    return number !== null;
  });

  // We have to add this check because math.mean() throws an error on an empty
  // array. This can happen if the input array only contains null values, e.g:
  // [null, null, null]. In that case the right thing for our case is to return
  // null, there is no numeric value for taking a mean of no values.
  if (nullFilteredOut.length === 0) {
    return null;
  }

  // Take a mean of the remaining numbers and round to a whole number
  return math.chain(nullFilteredOut).mean().round(0).done();
}

So please consider my input here as a "would be nice" feature. I can totally understand that not treating null as 0 might be an edge case in general, and one that is probably easy to program around in most circumstances. Thanks for your time, and sorry for bringing up an old issue!

@josdejong

Owner

josdejong commented Feb 26, 2017

Thanks for your feedback Jon. It's an old issue but still not resolved :) Implementing an option for this for functions like mean would be a good idea I think.

@honestserpent

honestserpent commented Jun 13, 2017

Came across this as well. I used MATLAB in the past, and it had both a mean and a nanmean function.

I actually think that it is a bad idea to consider nulls as 0s by default in mathematical calculations. They mean different things.

I would much rather have a null by default as a result of a calculation using an array that contains at least one null value.

I would suggest taking inspiration from MATLAB's mean and nanmean functions.
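
As a rough illustration of that split, here is a sketch of the two behaviors side by side in plain JavaScript. The names `strictMean` and `meanSkippingNulls` are hypothetical and only loosely mirror the mean/nanmean pair, with null playing the role of NaN; neither is a math.js function.

```javascript
// Hypothetical: the result is null as soon as any input is null
// (or when there is nothing to average).
function strictMean(values) {
  if (values.length === 0 || values.some((x) => x === null)) {
    return null;
  }
  return values.reduce((sum, x) => sum + x, 0) / values.length;
}

// Hypothetical: nulls are skipped; null is returned only when nothing remains.
function meanSkippingNulls(values) {
  return strictMean(values.filter((x) => x !== null));
}

console.log(strictMean([1, 2, 3]));           // 2
console.log(strictMean([1, null, 3]));        // null
console.log(meanSkippingNulls([1, null, 3])); // 2
```

With this split the default never silently turns a missing value into a number, and skipping nulls is an explicit choice by the caller.
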

@josdejong josdejong referenced this issue Jun 13, 2017

Closed

Breaking changes for v4 #682

13 of 13 tasks complete
@josdejong

Owner

josdejong commented Jun 13, 2017

Thanks for your input Marco. Yes, we have to change the behavior and not treat null as 0 anymore.

@josdejong

Owner

josdejong commented Feb 25, 2018

The default behavior of null has changed in v4: null is no longer implicitly converted to 0. See also #353

@josdejong josdejong closed this Feb 25, 2018
