Right now, if you have nils in a dataset and call a $rollup function on it, you get a NullPointerException, because the rollup functions operate on the nils as ordinary values rather than treating them as missing data (which is what I'd prefer).
Generally, when you do an aggregation (sum, min, max, avg) the missing values are excluded. So, for example, with this dataset and $rollup call:
you'd typically expect a mean weight for :color :green of 17 to be reported; the nil is just dropped.
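To make the expected behavior concrete, here is a minimal sketch in plain Clojure, with no Incanter dependency. The rows are made-up illustration data, not the dataset from the report, and `mean-ignoring-nil` and `rollup-mean` are hypothetical helper names, not part of Incanter's API.

```clojure
;; Hypothetical data: NOT the dataset from the original report.
(def rows
  [{:color :green :weight 12}
   {:color :green :weight 22}
   {:color :green :weight nil}
   {:color :red   :weight 30}])

(defn mean-ignoring-nil
  "Mean of the non-nil values in xs, or nil when no values remain."
  [xs]
  (let [present (remove nil? xs)]
    (when (seq present)
      (/ (reduce + present) (count present)))))

(defn rollup-mean
  "Groups rows by group-key and averages value-key within each group,
  skipping nils. Hypothetical stand-in for a nil-aware $rollup."
  [rows value-key group-key]
  (into {}
        (map (fn [[k group]]
               [k (mean-ignoring-nil (map value-key group))]))
        (group-by group-key rows)))

(rollup-mean rows :weight :color)
;; => {:green 17, :red 30}
```

The nil weight is dropped before averaging, so :green comes out as (12 + 22) / 2 = 17 instead of throwing.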
Is this issue (missing data handling) a priority for 1.5.x or a 2.0.0 issue? There are a lot of different architectural directions we could take on this, ranging from the less intrusive (e.g. just change $rollup) to the much more massive architectural rewrites that would center on making missing data a major part of the scene.
I do think that, as ugly as it can be, missing data/NA is something we're going to have to deal with a lot in Dataset.
Yes, it would be convenient to have the missing data dropped in the calculation.
I ran into this issue too. For all my use cases, dropping nil values was fine, so I just filtered out the rows that had nil in that column.
I guess patching $rollup is not that hard. What are the architectural rewrites that would require a big change? Would we have to change many related functions?
The architectural question is how we want to handle NA/missing data in general. Do we treat it as nil? (That would be most idiomatic.) Or something else like the keyword :NA, or a dedicated object (def NA (Object.))? I would vote for using nil: it's the most idiomatic choice, and it will have edn support (which a dedicated object won't).
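The edn point can be checked with a quick round trip. This is only a sketch of the three candidates named above; `round-trips?` is a hypothetical helper, and the failure for the dedicated object assumes the default edn readers with no custom tag handler installed.

```clojure
(require '[clojure.edn :as edn])

(defn round-trips?
  "True when x survives printing and reading back through edn."
  [x]
  (try
    (= x (edn/read-string (pr-str x)))
    (catch Exception _ false)))

;; nil is a first-class edn value.
(round-trips? {:weight nil})
;; => true

;; A keyword sentinel also round-trips, but it is an ordinary value that
;; every numeric function would have to check for explicitly.
(round-trips? {:weight :NA})
;; => true

;; A dedicated sentinel object has no readable printed form, so reading
;; it back throws and the helper reports false.
(def NA (Object.))
(round-trips? {:weight NA})
;; => false
```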
Moreover, when we see an empty string in a CSV file, do we automatically interpret that as an NA/nil instead of the empty string? (I would vote yes, with a :fill-empty option on loading that lets the user interpret empty fields differently, as empty strings or as zeros, as needed.)
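A rough sketch of how such an option might behave, applied to one already-split CSV row. `:fill-empty` here is the option proposed above, not an existing loader option, and `parse-field` and `parse-row` are hypothetical names used only for illustration.

```clojure
(defn parse-field
  "Returns fill-empty when field is the empty string, otherwise the
  field unchanged. Hypothetical helper, not part of Incanter."
  [field fill-empty]
  (if (= "" field) fill-empty field))

(defn parse-row
  "Applies parse-field to every field in a row. Empty fields become nil
  unless the caller passes a different :fill-empty value."
  [fields & {:keys [fill-empty]}]
  (mapv #(parse-field % fill-empty) fields))

(parse-row ["green" "" "12"])                 ;; => ["green" nil "12"]
(parse-row ["green" "" "12"] :fill-empty "")  ;; => ["green" "" "12"]
(parse-row ["green" "" "12"] :fill-empty 0)   ;; => ["green" 0 "12"]
```

Defaulting to nil keeps the common case (empty means missing) free of configuration, while the explicit option covers data where an empty field really is an empty string or a zero.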