Strange behavior of apoc.agg.statistics #534
Thanks for taking the time to report! I agree, this looks wrong. I'll make sure the relevant team is notified. Sorry for the inconvenience.

Any news on this?

Hi! Sorry for the slow response. I took a look, and APOC simply delegates to another library, HdrHistogram. I assume this is a rounding error and that the value should be the min value. It uses the `getValueAtPercentile()` method of the `DoubleHistogram` class: http://hdrhistogram.org/.
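To illustrate how histogram bucketing alone could push a reported percentile below the true minimum, here is a toy model in Python. It is not HdrHistogram's algorithm. HdrHistogram tracks values to a configured number of significant digits with its own bucket layout, whereas this sketch simply rounds each value down to two significant decimal digits, an assumption chosen for readability. The helpers `bucket_lower_bound` and `toy_percentile` are hypothetical, and the sketch also assumes the minimum is tracked exactly while percentiles are read from the bucketed values.

```python
from decimal import ROUND_FLOOR, Decimal


def bucket_lower_bound(x: float, digits: int = 2) -> float:
    """Toy bucketing: round a positive x DOWN to `digits` significant decimal digits.

    Hypothetical helper; HdrHistogram uses its own bucket layout, not this one.
    """
    if x <= 0:
        raise ValueError("toy model only handles positive values")
    d = Decimal(repr(x))     # decimal form of the float as Python prints it
    exponent = d.adjusted()  # power of ten of the most significant digit
    quantum = Decimal(1).scaleb(exponent - digits + 1)
    return float(d.quantize(quantum, rounding=ROUND_FLOOR))


def toy_percentile(values, q: float, digits: int = 2) -> float:
    """Nearest-rank percentile computed over bucketed (lossy) values (hypothetical helper)."""
    buckets = sorted(bucket_lower_bound(v, digits) for v in values)
    idx = min(int(len(buckets) * q), len(buckets) - 1)
    return buckets[idx]


if __name__ == "__main__":
    # Three values tied at the true minimum 0.1537, then larger values.
    values = [0.1537] * 3 + [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
    true_min = min(values)  # tracked exactly in this toy model
    for q in (0.1, 0.25, 0.5):
        p = toy_percentile(values, q)
        print(q, p, "below min!" if p < true_min else "")
```

Under these assumptions the 0.1 and 0.25 percentiles both come out as 0.15, below the exact minimum of 0.1537, which has the same shape as the reported symptom.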
Thank you for the response. Here's an approximate Cypher version:

```cypher
MATCH (u:User)--(s:Specialization {name: "Web Engineer"}) // customize this part
// sort ascending so that the list index tracks the rank
WITH u
ORDER BY u.articlerank
WITH collect(u) AS us
// nearest-rank pick: index = size(us) * q, using integer division
WITH us[0] AS `u-0.0`,
     us[size(us) * 1 / 10] AS `u-0.1`,
     us[size(us) * 2 / 10] AS `u-0.2`,
     us[size(us) * 3 / 10] AS `u-0.3`,
     us[size(us) * 4 / 10] AS `u-0.4`,
     us[size(us) * 5 / 10] AS `u-0.5`,
     us[size(us) * 6 / 10] AS `u-0.6`,
     us[size(us) * 7 / 10] AS `u-0.7`,
     us[size(us) * 75 / 100] AS `u-0.75`,
     us[size(us) * 8 / 10] AS `u-0.8`,
     us[size(us) * 85 / 100] AS `u-0.85`,
     us[size(us) * 9 / 10] AS `u-0.9`,
     us[size(us) * 95 / 100] AS `u-0.95`,
     us[size(us) * 97 / 100] AS `u-0.97`,
     us[size(us) * 98 / 100] AS `u-0.98`,
     us[size(us) * 99 / 100] AS `u-0.99`,
     us[size(us) - 1] AS `u-1.0`
// map each quantile to the articlerank of the selected node
WITH {
  `0.0`: `u-0.0`.articlerank,
  `0.1`: `u-0.1`.articlerank,
  `0.2`: `u-0.2`.articlerank,
  `0.3`: `u-0.3`.articlerank,
  `0.4`: `u-0.4`.articlerank,
  `0.5`: `u-0.5`.articlerank,
  `0.6`: `u-0.6`.articlerank,
  `0.7`: `u-0.7`.articlerank,
  `0.75`: `u-0.75`.articlerank,
  `0.8`: `u-0.8`.articlerank,
  `0.85`: `u-0.85`.articlerank,
  `0.9`: `u-0.9`.articlerank,
  `0.95`: `u-0.95`.articlerank,
  `0.97`: `u-0.97`.articlerank,
  `0.98`: `u-0.98`.articlerank,
  `0.99`: `u-0.99`.articlerank,
  `1.0`: `u-1.0`.articlerank
} AS r
RETURN r
```

It returns roughly the same data (minus rounding). In general, the distribution is correct. The remaining mistake is:
and

which are algebraically impossible. The error comes from rounding, which is somehow different for

To work around this, users should manually:
`apoc.agg.statistics` looks broken, at least with custom percentiles:

Please correct me if I'm wrong. To my understanding, passing the `0.1` percentile is a request to evaluate the threshold between the lowest 10% of the population and the rest of the population. The `0.5` percentile is the median. There's no way the values for `0.1` and `0.25` could be below (or even equal to) the min.
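The argument above can be stated precisely. Assuming the common "lower" quantile definition over the empirical distribution of a sample $x_1, \dots, x_n$ (HdrHistogram and the Cypher query in this thread may use slightly different conventions), every percentile is bounded by the sample minimum and maximum and is non-decreasing in $q$:

```math
\begin{aligned}
F(x) &= \tfrac{1}{n}\,\bigl|\{\, i : x_i \le x \,\}\bigr| \\
Q(q) &= \inf\{\, x : F(x) \ge q \,\}, \qquad 0 < q \le 1 \\
x < x_{\min} &\;\Rightarrow\; F(x) = 0 < q \;\Rightarrow\; Q(q) \ge x_{\min} \\
F(x_{\max}) = 1 \ge q &\;\Rightarrow\; Q(q) \le x_{\max} \\
q_1 \le q_2 &\;\Rightarrow\; \{x : F(x) \ge q_2\} \subseteq \{x : F(x) \ge q_1\} \;\Rightarrow\; Q(q_1) \le Q(q_2)
\end{aligned}
```

Equality $Q(q) = x_{\min}$ occurs only when at least a fraction $q$ of the sample sits at the minimum, so a value strictly below the minimum cannot come from the data itself.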
Is this a rounding issue that went too far, or what? There's a large dispersion, yes, but the median and other percentiles should be unaffected by outliers 🤔
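A minimal Python sketch for sanity-checking percentile output against these expectations. It mirrors the index arithmetic of the Cypher query earlier in the thread (index = size × q with integer division, last element for `1.0`), flags values that break min ≤ p ≤ max or monotonicity, and shows that an outlier in the top slot leaves the median unchanged. The helper names are hypothetical, and reading `min`, `max` and numeric string keys such as `"0.1"` from the map is an assumption about the result shape.

```python
import math
from fractions import Fraction


def nearest_rank_percentiles(values, quantiles):
    """Hypothetical helper mirroring the Cypher query: index = floor(n * q), capped at n - 1."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one value")
    result = {}
    for q in quantiles:
        # Fraction(q) parses the string exactly, so n * q cannot land just below an integer
        idx = min(math.floor(n * Fraction(q)), n - 1)
        result[q] = ordered[idx]
    return result


def impossible_values(stats):
    """Return quantile keys that break min <= p <= max or drop below a lower quantile.

    Assumes a map with "min", "max" and numeric string keys such as "0.1";
    non-numeric keys (e.g. "mean") are ignored.
    """
    lo, hi = stats["min"], stats["max"]
    keys = sorted((k for k in stats if k[:1].isdigit()), key=Fraction)
    bad = [k for k in keys if not lo <= stats[k] <= hi]
    for prev, cur in zip(keys, keys[1:]):
        if stats[cur] < stats[prev]:
            bad.append(cur)
    return list(dict.fromkeys(bad))  # de-duplicate while keeping order


if __name__ == "__main__":
    ranks = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
    p = nearest_rank_percentiles(ranks, ["0.1", "0.25", "0.5", "1.0"])
    print(p, impossible_values({"min": min(ranks), "max": max(ranks), **p}))
    # Swapping the largest value for an extreme outlier leaves the median alone.
    with_outlier = [9000 if r == 9 else r for r in ranks]
    print(nearest_rank_percentiles(with_outlier, ["0.5"]))
```

Using exact fractions for the quantile keeps the Python index identical to Cypher's integer arithmetic, so both approaches pick the same element for a given list size.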