correct quantile to handle unsorted quantiles
Currently dask.dataframe.core.quantile(df, q) can silently return incorrect results when the list of quantiles, q, is not sorted. For instance, quantile(dask.array.arange(100), [0.75, 0.50, 0.25]) gives incorrect results. This patch sorts the quantiles with NumPy's mergesort before they are used. Note that, even with this patch, the behavior still differs from pandas.DataFrame.quantile(), which computes the quantiles correctly while preserving their order. This patch does not preserve the order of the quantiles, so it does not reproduce the pandas behavior, but it does at least avoid the silent errors.
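For comparison with the pandas behavior mentioned above, the usual way to compute quantiles on a sorted q and still return them in the caller's order is to sort with `argsort` and scatter the results back. The sketch below does this with plain NumPy on an in-memory array; `quantiles_preserving_order` is a hypothetical helper for illustration, not part of dask or of this patch, and `np.percentile` stands in for dask's own approximate quantile computation.

```python
import numpy as np


def quantiles_preserving_order(values, q):
    """Hypothetical helper: quantiles for a possibly unsorted q,
    returned in the same order as q (pandas-like)."""
    q = np.asarray(q, dtype=float)
    # Stable sort of the requested quantiles, as with kind='mergesort' in the patch
    order = np.argsort(q, kind='mergesort')
    sorted_q = q[order]
    # np.percentile expects percentages in [0, 100], like the scaled qs in dask
    sorted_results = np.percentile(values, sorted_q * 100)
    # Put each result back at the position of its quantile in the original q
    results = np.empty_like(sorted_results)
    results[order] = sorted_results
    return results


print(quantiles_preserving_order(np.arange(100), [0.75, 0.50, 0.25]))
```

With the default linear interpolation, the results come back as the 75th, 50th and 25th percentiles, in that order, matching the order of the requested quantiles.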
gregrf committed Mar 29, 2019
1 parent 2391040 commit 45a8481
Showing 1 changed file with 2 additions and 0 deletions.
dask/dataframe/core.py: 2 additions, 0 deletions
@@ -3967,7 +3967,9 @@ def quantile(df, q):
 
     # pandas uses quantile in [0, 1]
     # numpy / everyone else uses [0, 100]
+    # current implementation needs qs to be sorted, sort in-place to make sure
     qs = np.asarray(q) * 100
+    qs.sort(kind='mergesort')
     token = tokenize(df, qs)
 
     if len(qs) == 0:
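The two added lines can be isolated in a small function to check their effect outside dask. This is a sketch: `normalize_quantiles` is a hypothetical name, not a dask function, and it reproduces only the scaling and sorting steps from the diff. Because `np.asarray(q) * 100` always allocates a new array, the in-place `sort` reorders only that copy and never the caller's `q`, even when `q` is already a NumPy array.

```python
import numpy as np


def normalize_quantiles(q):
    """Hypothetical stand-in for the two steps in the diff:
    scale pandas-style quantiles in [0, 1] to [0, 100], then sort."""
    # Multiplication returns a new array, so the caller's q is not touched
    qs = np.asarray(q) * 100
    # In-place stable sort of the new array only
    qs.sort(kind='mergesort')
    return qs


q = np.array([0.75, 0.50, 0.25])  # deliberately unsorted
qs = normalize_quantiles(q)
print(qs)  # sorted percentages
print(q)   # original order is preserved
```

Sorting in place avoids a second allocation; since `qs` is already a fresh array at that point, doing so has no visible side effect for the caller.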
