
Boxplot Aggregation #33112

Closed
jtibshirani opened this issue Aug 24, 2018 · 4 comments · Fixed by #51948


@jtibshirani
Contributor

jtibshirani commented Aug 24, 2018

(Previous title: 'Interquartile Range Aggregation')

The interquartile range is a common robust measure of statistical dispersion. Compared to the standard deviation, the IQR is less sensitive to outliers in the data, with a breakdown point of 0.25. Along with the median, it is often used in creating a box plot, a simple yet common way to summarize data and identify potential outliers.
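A quick illustration of the robustness claim, as a minimal Python sketch: it replaces one value in a small sample with an extreme outlier and compares the sample standard deviation with the IQR. The quartiles here are exact, computed with `statistics.quantiles(..., method="inclusive")` (linear interpolation between order statistics). That quartile definition is an assumption of the sketch; percentiles from Elasticsearch are approximate, so an aggregation may report slightly different values.

```python
import statistics


def iqr(data):
    """Interquartile range Q3 - Q1 using exact, linearly interpolated quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1


clean = [10, 11, 12, 13, 14, 15, 16, 17, 18]
# Same sample with the largest value replaced by an extreme outlier.
contaminated = clean[:-1] + [1000]

print(f"stdev: {statistics.stdev(clean):.2f} -> {statistics.stdev(contaminated):.2f}")
print(f"IQR:   {iqr(clean)} -> {iqr(contaminated)}")
```

The single outlier inflates the standard deviation by more than a factor of 100, while the IQR stays at 4.0.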

The IQR is the third quartile minus the first quartile of a dataset (Q3 - Q1), so it could be calculated from the output of a percentiles aggregation as the 75th percentile minus the 25th percentile. Even though it can be easily calculated from quantile information, it may still be useful to provide it as an aggregation for convenience, and to increase its visibility. An alternative option would be to describe the IQR as part of the percentiles documentation.
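The percentiles route can be sketched in a few lines. This assumes the percentiles result has already been parsed into a dict keyed by percent strings ("25.0", "75.0", ...), as in a keyed percentiles response, and that 25 and 75 were among the requested percents; the numbers are made up for illustration and are not real aggregation output.

```python
def iqr_from_percentiles(values):
    """Return Q3 - Q1 from percentile values keyed like "25.0" and "75.0"."""
    try:
        return values["75.0"] - values["25.0"]
    except KeyError as missing:
        raise ValueError(f"percentile {missing} was not requested") from None


# Illustrative values only.
percentiles = {"5.0": 0.11, "25.0": 0.15, "50.0": 0.43, "75.0": 0.5, "95.0": 0.65}
print(round(iqr_from_percentiles(percentiles), 4))
```

Raising a clear error when a quartile is missing matters because the IQR silently depends on which percents were requested.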

Compared to the median absolute deviation (MAD, #26681), the IQR has a lower breakdown point (0.25, compared to 0.5). However, it is simple to calculate and is better suited to skewed (asymmetric) data.
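The difference in breakdown points can be seen empirically with a small sketch that replaces a growing fraction of a sample with an extreme value and tracks both statistics. The helper `contaminate` is hypothetical, the MAD here is the unscaled median of absolute deviations from the median, and quartiles use exact linear interpolation. On this 20-point sample the IQR survives 20% contamination but not 30%, while the MAD stays bounded at 30%.

```python
import statistics


def iqr(data):
    # Exact quartiles with linear interpolation between order statistics.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1


def mad(data):
    # Median absolute deviation from the median (no consistency scaling).
    med = statistics.median(data)
    return statistics.median([abs(x - med) for x in data])


def contaminate(data, fraction, value=1e6):
    # Hypothetical helper: replace the largest `fraction` of points with `value`.
    k = round(len(data) * fraction)
    kept = sorted(data)[: len(data) - k]
    return kept + [value] * k


clean = list(range(1, 21))
for fraction in (0.0, 0.2, 0.3):
    sample = contaminate(clean, fraction)
    print(f"{fraction:.0%} contaminated: IQR={iqr(sample)}, MAD={mad(sample)}")
```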

Relates to #26681.

@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

@jtibshirani jtibshirani changed the title Inner Quartile Range Aggregation Interquartile Range Aggregation Aug 24, 2018
@pmoust
Member

pmoust commented Aug 29, 2018

subscribe

@mattfield

sub

@jtibshirani
Contributor Author

After more thought, I think it’d be useful to provide a boxplot aggregation that returns essential information for making a boxplot. The output could look something like the following:

{
    "min": 0.1,
    "q1": 0.15,
    "q2": 0.43,
    "q3": 0.5,
    "max": 0.67
} 

The IQR could then be easily calculated as q3 - q1 (we could also include an iqr entry in the response if we think it’d be convenient).
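A minimal sketch of the proposed response, computed exactly in Python for a small in-memory sample. The field names mirror the example output above, with `iqr` as the optional extra entry. The quartile definition (linear interpolation via `statistics.quantiles` with `method="inclusive"`) is an assumption; an aggregation over a large index would likely approximate these values rather than compute them exactly.

```python
import statistics


def boxplot_summary(data):
    """Five-number summary plus IQR, keyed like the example output above."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "min": min(data),
        "q1": q1,
        "q2": q2,  # the median
        "q3": q3,
        "max": max(data),
        "iqr": q3 - q1,  # optional convenience entry
    }


print(boxplot_summary([2, 4, 4, 5, 7, 9, 10, 12, 15]))
# {'min': 2, 'q1': 4.0, 'q2': 7.0, 'q3': 10.0, 'max': 15, 'iqr': 6.0}
```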

In a basic boxplot, the whiskers extend to the min and max values. But in another popular style, the whiskers extend to the furthest points within [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]. Points that are outside this interval are considered outliers and usually displayed on the plot. Perhaps we could start with the basic five-number summary, then look into supporting the style with outliers if there’s interest.
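The outlier style can be sketched by computing the fences Q1 - 1.5 * IQR and Q3 + 1.5 * IQR, extending each whisker to the most extreme data point inside its fence, and returning everything outside the fences as outliers. The output keys (`whisker_low`, `whisker_high`, `outliers`) are hypothetical names chosen for this sketch, not a proposed API, and quartiles again use exact linear interpolation.

```python
import statistics


def tukey_boxplot(data, k=1.5):
    """Boxplot with whiskers clipped to [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Whiskers stop at the most extreme points that lie within the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "whisker_low": min(inside),
        "q1": q1,
        "q2": q2,
        "q3": q3,
        "whisker_high": max(inside),
        "outliers": sorted(x for x in data if x < lower_fence or x > upper_fence),
    }


print(tukey_boxplot([2, 4, 4, 5, 7, 9, 10, 12, 40]))
```

Here Q1 = 4.0 and Q3 = 10.0, so the fences are -5.0 and 19.0: the upper whisker stops at 12 and 40 is reported as an outlier.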

@imotov imotov changed the title Interquartile Range Aggregation Boxplot Aggregation Feb 3, 2020
imotov added a commit to imotov/elasticsearch that referenced this issue Feb 5, 2020
Adds a `boxplot` aggregation that calculates min, max, median and the first
and the third quartiles of the given data set.

Closes elastic#33112
imotov added a commit that referenced this issue Feb 7, 2020
Adds a `boxplot` aggregation that calculates min, max, median and the first
and the third quartiles of the given data set.

Closes #33112
imotov added a commit that referenced this issue Feb 11, 2020
Adds a `boxplot` aggregation that calculates min, max, median and the first
and the third quartiles of the given data set.

Closes #33112
imotov added a commit to imotov/elasticsearch that referenced this issue Feb 12, 2020
Add support for the histogram field type to boxplot aggs.

Closes elastic#52233
Relates to elastic#33112
imotov added a commit that referenced this issue Feb 13, 2020
Add support for the histogram field type to boxplot aggs.

Closes #52233
Relates to #33112
imotov added a commit to imotov/elasticsearch that referenced this issue Feb 13, 2020
Add support for the histogram field type to boxplot aggs.

Closes elastic#52233
Relates to elastic#33112
imotov added a commit that referenced this issue Feb 13, 2020
Add support for the histogram field type to boxplot aggs.

Closes #52233
Relates to #33112
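The histogram field type stores pre-aggregated data as parallel `values` and `counts` arrays. A minimal sketch of how a boxplot can be read from that representation without expanding it: each value is treated as repeated `counts[i]` times, cumulative counts locate the needed order statistics via `bisect`, and the same linear interpolation as `statistics.quantiles(..., method="inclusive")` is applied. This is an exact illustration assuming sorted values and non-negative integer counts; the actual aggregation may compute these values approximately.

```python
import bisect
import itertools


def _order_statistic(values, cum_counts, i):
    """i-th (0-based) element of the data as if each value were repeated counts times."""
    return values[bisect.bisect_right(cum_counts, i)]


def histogram_boxplot(values, counts):
    """Five-number summary of pre-aggregated (value, count) pairs; values sorted ascending."""
    cum = list(itertools.accumulate(counts))
    total = cum[-1] if cum else 0
    if total == 0:
        raise ValueError("histogram is empty")
    m = total - 1
    quartiles = []
    for i in (1, 2, 3):
        # Same interpolation as statistics.quantiles(..., n=4, method="inclusive").
        j, delta = divmod(i * m, 4)
        lo = _order_statistic(values, cum, j)
        hi = _order_statistic(values, cum, j + 1) if delta else lo
        quartiles.append((lo * (4 - delta) + hi * delta) / 4)
    nonempty = [v for v, c in zip(values, counts) if c > 0]
    q1, q2, q3 = quartiles
    return {"min": nonempty[0], "q1": q1, "q2": q2, "q3": q3, "max": nonempty[-1]}


# Equivalent to the raw data [1, 2, 2, 2, 4, 4, 5, 5, 5]; the value 3 has count 0.
print(histogram_boxplot(values=[1, 2, 3, 4, 5], counts=[1, 3, 0, 2, 3]))
```

Working from cumulative counts keeps memory proportional to the number of buckets rather than the number of underlying observations.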

4 participants