`dt.mean()` returns a date and `mean()` returns a float or null, depending on context, when the input has dtype `pl.Date`. Median similarly has inconsistent behavior.
Using the analogy for integers and real numbers, the mean and median of dates should be a datetime, similar to how the mean and median of integers is a float.
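To make the analogy concrete, here is a minimal sketch using only the Python standard library, not Polars. The helper `mean_of_dates` is a hypothetical name introduced for illustration, and it assumes each date stands for midnight at the start of that day.

```python
from datetime import date, datetime, time, timedelta
from statistics import mean

# Integers: the mean of two integers can fall between them, so it is a float.
int_mean = mean([1, 2])


def mean_of_dates(dates):
    """Hypothetical helper: average dates as points in time, returning a datetime."""
    epoch = datetime(1970, 1, 1)
    # Assumption: a date is treated as midnight at the start of that day.
    offsets = [datetime.combine(d, time.min) - epoch for d in dates]
    return epoch + sum(offsets, timedelta()) / len(offsets)


date_mean = mean_of_dates([date(2024, 1, 1), date(2024, 1, 2)])

print(int_mean)   # 1.5
print(date_mean)  # 2024-01-01 12:00:00
```

Just as 1.5 is not an integer, noon on 2024-01-01 is not a date, which is the motivation for returning a datetime.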
Expected behavior: returns a datetime that is at the average (mean or median) time point, and is consistent across uses of `mean` and `median`.
Implementing this for the remaining temporal types (in a future issue) will enable the deprecation of `.dt.mean` and `.dt.median`, as those will become redundant with `series.mean` and `series.median` when the datatype is temporal.
> Using the analogy for integers and real numbers, the mean and median of dates should be a datetime, similar to how the mean and median of integers is a float
I am not sure the analogy is correct. I think it may depend on the use-case. In some cases one would want the "typical date", and in others the "point in time that is the center of the dataset".
From a mathematical viewpoint, the Fréchet mean in the space of dates is a date. But this formalism may be overkill; I thus think the desired behavior depends on the use-case.
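For readers unfamiliar with the term, the Fréchet mean of a sample is the point of the space that minimizes the sum of squared distances to the sample. A rough sketch follows, assuming the distance between two dates is their difference in whole days and that candidates are restricted to calendar dates. The helper `frechet_mean_of_dates` is a hypothetical name, and ties resolve to the earlier date only because of how `min` works here.

```python
from datetime import date, timedelta


def frechet_mean_of_dates(dates):
    """Hypothetical helper: the date minimizing the sum of squared day-distances."""
    lo, hi = min(dates), max(dates)
    # The minimizer lies between the earliest and latest input, so scan that range.
    candidates = [lo + timedelta(days=i) for i in range((hi - lo).days + 1)]

    def cost(candidate):
        return sum((candidate - d).days ** 2 for d in dates)

    # min() keeps the first minimizer it sees, so ties go to the earlier date.
    return min(candidates, key=cost)


sample = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 4)]
print(frechet_mean_of_dates(sample))  # 2024-01-02
```

For the same sample, the arithmetic mean as a point in time is 08:00 on 2024-01-02, so the two notions differ exactly in whether the result is allowed to leave the space of dates.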
@johnros it would definitely be simpler implementation-wise to return a date. Nobody was previously opposed to the new implementation, which made more sense to me, and the analogy seems apt. I'm willing to backtrack and return to a simpler date implementation if others agree.
I am certainly no decision maker here. My intuition (lacking a concrete use-case) is that the mean and median of dates should be a date. Curious to hear others' opinions.