Should -Inf be interpreted as a NULL value #6
I'm hijacking this issue to push the question a bit further: in vol2bird's output, we find several different "not a number" values for columns such as For example:
In the case of the Any opinion? @peterdesmet, @CeciliaNilsson709, @adokter?
For implementation, we have two options:
I'm in favour of 2. @adokter, what nuances should we allow?
Since we are discussing the standard here, I think the right question is: from a data consumer standpoint, how useful/important is it to distinguish between those different kinds of non-data:
Another way to think of it is: "When we later start building VPTS files from other sources (CAJUN?), will those nuances still make sense or not?"
For users it is important to distinguish between 0, -Inf, nodata and undetect (bioRad plotting functions already use this information, so this is not niche usage). Instead of giving users options, I feel it's important to force, or at least strongly encourage, users to be aware of the distinction between zero, nodata and undetect, as it's a potential source of confusion and incorrect interpretation of the data.

For NaN there is currently a double usage: it is used both for undetect and for cases where the algorithm failed to calculate a quantity, e.g. due to incomplete or insufficient data. So you could split those two (but that requires a change to vol2bird).

Re: how to code things in CSV: -Inf is the equivalent of zero for logarithmic quantities like DBZH and dbz, so you could get rid of it by storing these quantities on a linear scale, where it becomes zero (although that is quite uncommon). There is no NULL value currently in use, so you could code NaN as NULL if that makes things easier.
So -Inf should not be interpreted as NULL, but as the logarithmic-scale equivalent of 0 (i.e. 0 on a linear scale).
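To make the "-Inf is zero on a log scale" point concrete: assuming the standard definition of reflectivity in decibels, dBZ = 10·log10(Z) with Z in mm^6 m^-3, converting back to linear Z maps -Inf to exactly 0, while NaN stays NaN. This is a minimal Python sketch; the function names are hypothetical and not part of vol2bird or bioRad.

```python
import math


def dbz_to_linear(dbz: float) -> float:
    """Convert reflectivity in dBZ to linear Z (mm^6 m^-3).

    Z = 10 ** (dBZ / 10), so -inf dBZ becomes exactly 0.0 and NaN stays NaN.
    """
    return 10.0 ** (dbz / 10.0)


def linear_to_dbz(z: float) -> float:
    """Inverse conversion: a linear value of 0 maps back to -inf dBZ.

    Negative inputs are not physical and make math.log10 raise ValueError.
    """
    if z == 0.0:
        return -math.inf
    return 10.0 * math.log10(z)


print(dbz_to_linear(-math.inf))  # 0.0
print(dbz_to_linear(20.0))       # 100.0
print(linear_to_dbz(0.0))        # -inf
```

Storing linear Z would therefore remove -Inf from the CSV, at the cost of using a less common unit than dBZ.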
Thanks @adokter: I will check exactly how best to do it in CSV/Frictionless, but I definitely want to keep the distinction between 0, -Inf, nodata and undetect (I might come back to you if I need more precision about those). About,
You're right, I was referring to the CSV output. In the HDF5 profiles, nodata and undetect are coded with an explicit value that is stored as an attribute.
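The HDF5 approach can be sketched as follows: each stored value is compared against the explicit nodata and undetect marker values, so the three cases stay distinct without overloading NaN. This is a minimal sketch under stated assumptions: the `Kind` enum, the `classify` function and the marker values used in the example are hypothetical, and in practice the markers would be read from the file's attributes rather than passed in by hand.

```python
from enum import Enum
from typing import Optional, Tuple


class Kind(Enum):
    VALUE = "value"        # a real measurement (may be 0, or -inf on a log scale)
    NODATA = "nodata"      # no measurement was possible
    UNDETECT = "undetect"  # measured, but nothing was detected


def classify(raw: float, nodata: float, undetect: float) -> Tuple[Kind, Optional[float]]:
    """Classify a stored value using explicit nodata/undetect marker values.

    Markers are assumed to be ordinary numbers (not NaN), since NaN never
    compares equal to anything, including itself.
    """
    if raw == nodata:
        return Kind.NODATA, None
    if raw == undetect:
        return Kind.UNDETECT, None
    return Kind.VALUE, raw


# Hypothetical marker values, for illustration only.
print(classify(-9999.0, nodata=-9999.0, undetect=-9998.0))  # (<Kind.NODATA: 'nodata'>, None)
print(classify(0.0, nodata=-9999.0, undetect=-9998.0))      # (<Kind.VALUE: 'value'>, 0.0)
```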
If I'm not mistaken, the conclusion/decision here is that the standard should distinguish between For clarity, I suggest closing this issue and discussing how to achieve that in a different one.
See:
https://github.com/enram/vpts/blob/c726842e60515d33fdf0f97343858a40f603b20c/table-schemas/vol2bird.json#L192
Note: if we don't do this, it will lead to type validation errors.
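For context on the type validation errors: the Frictionless Table Schema `number` type accepts the special strings NaN, INF and -INF (case-insensitive), and cells matching one of the schema's `missingValues` are read as NULL; any other non-numeric token fails validation. This is a simplified Python sketch of that behaviour, not the frictionless library's API: `parse_number` is a hypothetical helper, it ignores options such as `decimalChar` and `groupChar`, and "nodata" is used only as an example token.

```python
import math
import re
from typing import Collection, Optional

# Plain decimal numbers, optionally with an exponent (e.g. "1.5e3").
_NUMBER = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")
# Special values allowed by the Table Schema number type, case-insensitive.
_SPECIAL = {"nan": math.nan, "inf": math.inf, "-inf": -math.inf}


def parse_number(cell: str, missing_values: Collection[str] = ("",)) -> Optional[float]:
    """Parse a CSV cell as a Table Schema number.

    Returns None (NULL) for cells listed in missing_values and raises
    ValueError, i.e. a type validation error, for anything else non-numeric.
    """
    if cell in missing_values:
        return None
    special = _SPECIAL.get(cell.lower())
    if special is not None:
        return special
    if _NUMBER.fullmatch(cell):
        return float(cell)
    raise ValueError(f"type validation error: {cell!r} is not a number")


print(parse_number("-Inf"))                                  # -inf
print(parse_number("nodata", missing_values=("", "nodata")))  # None
```

Without "nodata" in `missing_values`, the same cell would raise a `ValueError`, which mirrors the validation errors mentioned in the note.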