This repository has been archived by the owner on Nov 19, 2020. It is now read-only.
Handling "edge case" values needs a reasonable design, or it will be difficult to build functions on top of the data we are working with.
For example, a time series in the current model consists of a Vector of *time.Time and a Vector of *float64 for the values (within a data.Frame). For now it is assumed those vectors are of equal length.
Given this, we have the following logical situations to handle (or prevent):
The time is Null
The float value is Null
The float value is NaN
The float value is -Inf or +Inf
The series has no values (Vectors of Len 0). (In terms of implementation, the vectors could be Null, but I currently think of that as an implementation issue and not a logical one.)
(Note: We could maybe have fewer options by making the particular case of a float not nullable, but integers and other types would still need to be pointers to be nullable (in Go), so it is probably easier in the big picture to keep it a pointer so it is like everything else. Unless we want to have mask arrays like Arrow and approach nulls differently.)
Given these situations, they can generally be handled by (depending on the situation):
Keeping it as is (null stays null, NaN stays NaN, etc.)
Dropping the datapoint (from both vectors)
Replacing the value with a constant
Replacing the value with some relative value
Raising an Error
In the context of data processing, when these values occur also matters for how they are handled. For example, they could come from the data source query, or they could be the result of an operation done on that data.
For GEL I imagine each UI node will have a dropdown that will have some options on how to handle these. This will impact how that node handles these values.
We don't want more options than are necessary as it will just confuse us and the users. In general the options are somewhere on a scale of "strict" handling and "best effort" handling.
Some examples of when these values matter:
sorting a series by time or value when times and/or values are null
taking the reduction of values that contain nan/null values
dividing by zero
doing series arithmetic (joins of two series by time) resulting in null values
resampling resulting in null values (to maybe be filled)
empty series arising from dropping values