Skip to content

On-line accumulation of rank-based statistics such as quantiles and trimmed means

License

Notifications You must be signed in to change notification settings

silky/haskell-tdigest

 
 

Repository files navigation

tdigest

A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means.

See original paper: "Computing extremely accurate quantiles using t-digest" by Ted Dunning and Otmar Ertl

Synopsis

λ *Data.TDigest > median (tdigest [1..1000] :: TDigest 3)
Just 499.0090729817737

Benchmarks

Using 50M exponentially distributed numbers:

  • average: 16s; incorrect approximation of median, mostly to measure prng speed
  • sorting using vector-algorithms: 33s; using 1000MB of memory
  • sparking t-digest (using some par): 53s
  • buffered t-digest: 68s
  • sequential t-digest: 65s

About

On-line accumulation of rank-based statistics such as quantiles and trimmed means

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Haskell 98.9%
  • Shell 1.1%