Clip the range of histograms when there are outliers #1157
I'm sure the heuristic used to choose the range can be improved, so if anybody gets the chance to try it on a couple of datasets and see how it looks, that would be very helpful.
to avoid clipping very close to the actual min or max
def _get_range(values, frac=0.2, factor=3.0):
    min_value, low_p, high_p, max_value = np.percentile(
        values, [0, frac * 100.0, (1.0 - frac) * 100.0, 100.0]
np.quantile to simplify this line?
    delta = high_p - low_p
    if not delta:
        return min_value, max_value
    margin = factor * delta
Interesting heuristic, how did you get these default parameters?
from the random number generator in my head 😅 as mentioned, I think the heuristic can definitely be refined or replaced with something else
I thought of using the inter-quantile range because that's what box plots do, but I think we could read more about simple outlier detection methods
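For anyone who wants to try the heuristic on their own data, here is a minimal standalone sketch. It uses `np.quantile` as suggested in the review. The diff above is cut off after `margin = factor * delta`, so the last three lines (widening the central interval by the margin and capping it at the data's min and max) are an assumption, not the PR's actual code, and `get_range_sketch` is a hypothetical name. It also does not implement the "avoid clipping very close to the actual min or max" refinement.

```python
import numpy as np


def get_range_sketch(values, frac=0.2, factor=3.0):
    """Hypothetical completion of the range heuristic discussed above."""
    values = np.asarray(values, dtype=float)
    # np.quantile takes fractions in [0, 1], so no scaling by 100 is needed.
    min_value, low_p, high_p, max_value = np.quantile(
        values, [0.0, frac, 1.0 - frac, 1.0]
    )
    delta = high_p - low_p
    if not delta:
        # No spread between the two quantiles: keep the full data range.
        return min_value, max_value
    margin = factor * delta
    # Assumption: widen the central interval by the margin on each side,
    # but never go beyond the observed min and max.
    low = max(min_value, low_p - margin)
    high = min(max_value, high_p + margin)
    return low, high


# 0..99 plus one extreme outlier
low, high = get_range_sketch(list(range(100)) + [10_000])
print(low, high)
```

With `frac=0.25, factor=1.5` this sketch reduces to the usual box plot whisker rule (quartiles extended by 1.5 times the interquartile range), which may be a useful baseline when comparing defaults.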
thanks for the review @Vincent-Maladiere :)




fixes #1155
this limits the range of data shown in some histograms to avoid having all the data in one bin and seeing no details of the distribution
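To make the problem concrete, here is a small sketch with plain `np.histogram` (not the plotting code touched by this PR). The data and the clipped range `(0.0, 1.0)` are hand-picked for illustration and are assumptions, not values produced by the PR's heuristic.

```python
import numpy as np

# 1000 evenly spread points in [0, 1] plus a single extreme outlier.
values = np.concatenate([np.linspace(0.0, 1.0, 1000), [1e6]])

# Default range spans min..max, so each of the 10 bins is 1e5 wide:
# all 1000 regular points land in the first bin.
counts_full, _ = np.histogram(values, bins=10)

# Restricting the range (hand-picked here) spreads the regular points
# over all bins; the outlier falls outside the range and is not counted.
counts_clipped, _ = np.histogram(values, bins=10, range=(0.0, 1.0))

print(counts_full)
print(counts_clipped)
```

Since points outside the range are silently dropped by `np.histogram`, a plot built this way probably needs some indication that values beyond the shown range exist.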