# Some useful math/statistics functions are missing #4

Opened this issue Mar 20, 2019 · 11 comments · 6 participants

### AlexDaniel commented Mar 20, 2019 • edited

Some examples of things that are missing:

- `clamp` or `clip` (https://stackoverflow.com/questions/55250700/is-there-a-clamp-method-sub-for-ranges-num-etc-in-perl6)
- From https://www.evanmiller.org/statistical-shortcomings-in-standard-math-libraries.html: “One final observation about Perl 6 and math: although Perl 6 has all the usual functions from math.h, it could certainly use a few more.” The functions from that article:
  - `double incbet(double a, double b, double x); # Regularized incomplete beta function`
  - `double incbi(double a, double b, double y); # Inverse of incomplete beta integral`
  - `double igam(double a, double x); # Regularized incomplete gamma integral`
  - `double igamc(double a, double x); # Complemented incomplete gamma integral`
  - `double igami(double a, double p); # Inverse of complemented incomplete gamma integral`
  - `double ndtr(double x); # Normal distribution function`
  - `double ndtri(double y); # Inverse of Normal distribution function`
  - `double jv(double v, double x); # Bessel function of non-integer order`
- `prod`. It's easy to write yourself, but if we have `sum`, why not have `prod` too? (NumPy, for example, has both.)
- `mean`
- `median`
- `mode`?
- `peak-to-peak` (range) (NumPy example)
- `standard-deviation`
- `histogram`
- and so on…
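
To give a feel for one of the Cephes functions listed above, here is a rough Perl 6 sketch of `ndtr` (the standard normal CDF). Core Perl 6 has no `erf`, so this assumes the Abramowitz & Stegun 7.1.26 polynomial approximation, whose absolute error is around 1.5e-7. The names `erf-approx` and `ndtr` are hypothetical; a serious implementation would want far better accuracy (for example Cephes itself via NativeCall).

```raku
# Hypothetical helpers; neither erf-approx nor ndtr is a core Perl 6 routine.
# erf approximation from Abramowitz & Stegun, formula 7.1.26
# (absolute error below about 1.5e-7 for x >= 0; odd symmetry for x < 0).
sub erf-approx(Real $x --> Num) {
    my $z = $x.abs.Num;
    my $t = 1e0 / (1e0 + 0.3275911e0 * $z);
    my $poly = $t * (0.254829592e0
             + $t * (-0.284496736e0
             + $t * (1.421413741e0
             + $t * (-1.453152027e0
             + $t * 1.061405429e0))));
    my $y = 1e0 - $poly * exp(-$z * $z);
    $x < 0 ?? -$y !! $y
}

# Standard normal CDF: ndtr(x) = (1 + erf(x / sqrt 2)) / 2
sub ndtr(Real $x --> Num) {
    0.5e0 * (1e0 + erf-approx($x / sqrt(2e0)))
}

say ndtr(0);     # approximately 0.5
say ndtr(1.96);  # approximately 0.975
```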

### AlexDaniel commented Mar 20, 2019

 @moritz any thoughts on this?

### japhb commented Mar 20, 2019

 Just as a side note about stats on non-scalar data -- if you need more than one statistic, there's often a large performance advantage to calculating some or all of them at once in a single pass through the data. Certainly for known-immutable data it would be easy to cache the results for some statistics while calculating others, but in the general case it would be useful to have some way to request calculation of several stats (particularly commonly used ones) at once, without having to hand-roll one's own calculations -- the latter being frankly an easy way for non-experts to fall prey to all sorts of numerical stability issues.
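
As a rough illustration of the single-pass idea, here is a Perl 6 sketch of Welford's online algorithm: it updates the count, the mean and the running sum of squared deviations while walking the data once, and tracks min and max on the same pass. It is also numerically safer than the textbook "sum of squares minus square of sum" formula. The name `summary-stats` and its hash keys are hypothetical, not an existing module's API.

```raku
# Hypothetical single-pass routine (not a core Perl 6 API): Welford's
# online algorithm for count, mean and sample variance, plus min and max.
sub summary-stats(@xs) {
    my ($n, $mean, $m2) = 0, 0e0, 0e0;
    my ($min, $max) = Inf, -Inf;
    for @xs -> $x {
        $n++;
        my $delta = $x - $mean;
        $mean += $delta / $n;
        $m2   += $delta * ($x - $mean);   # uses the already-updated mean
        $min = $x if $x < $min;
        $max = $x if $x > $max;
    }
    %(
        count    => $n,
        mean     => $n ?? $mean !! NaN,
        variance => $n > 1 ?? $m2 / ($n - 1) !! NaN,   # sample variance
        min      => $min,
        max      => $max,
    )
}

my %s = summary-stats([2, 4, 4, 4, 5, 5, 7, 9]);
say %s<mean>;      # approximately 5
say %s<variance>;  # approximately 4.5714 (that is, 32/7)
```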

### moritz commented Mar 20, 2019

 IMHO these belong in a statistics module. The naming is not obvious (don't tell me you want a function called `ndtr` by default in the setting in Perl 6, please; and I don't know whether `average`, `avg`, or `mean` is best), and neither are the performance issues that @japhb mentioned. Has anybody written such a module? This is a perfect use case for something that can be prototyped and ironed out outside the core language. If there's a really well-working module, we might consider including it in core (though I still think it's out of scope for Perl 6).

### AlexDaniel commented Mar 20, 2019

> though I still think it's out of scope for Perl 6

Well, Evan Miller argues that these should be part of the standard library. Then there's also:

> ☞ Math just is. Don’t make people declare it.

It also makes me wonder why something like `acosech` is in core but a commonly needed `mean` is not. I agree, however, that the first implementation of all that can be done in a module.

> don't tell me you want a function called ndtr by default in the setting in Perl 6, please

Of course not. From the article:

> The Cephes folks seem to be stingy when it comes to doling out letters in function names, so the C committee may want to add a few characters to the above for clarity

### moritz commented Mar 20, 2019

> ☞ Math just is. Don’t make people declare it.

Yet none of us are trying to turn Perl 6 into a fully-featured Computer Algebra System. (Side note: people have, in fact, proposed that in the past, but @TimToady stopped them.) We have to draw a boundary somewhere. For me, the boundary excludes the beta- and gamma-related functions. We can argue about `mean` if you want, but then please be more precise about its semantics (what will it return for the empty list, for example?). Why "mean" as the name (when there is a geometric mean as well as the "normal" arithmetic mean), and why not "average"? Mean/average and standard deviation suffer from the performance penalty of multi-pass calculations, which is why I think a regular function interface might not be the best fit. That is why somebody should first come up with a working design in the form of a module.
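
To make the semantic questions concrete, here is a small Perl 6 sketch of two of the choices being discussed: an arithmetic mean and a geometric mean, each of which returns a `Failure` for the empty list rather than 0 or NaN. That empty-list behaviour is one possible answer, assumed here for illustration; the sub names are hypothetical and not proposed core names.

```raku
# Hypothetical subs illustrating possible semantics; not core routines.
# Arithmetic mean: the empty list yields a Failure instead of 0 or NaN.
sub arithmetic-mean(@xs) {
    fail "arithmetic-mean of an empty list is undefined" unless @xs;
    @xs.sum / @xs.elems
}

# Geometric mean of positive numbers, computed via logarithms so that
# multiplying many values together cannot overflow.
sub geometric-mean(@xs) {
    fail "geometric-mean of an empty list is undefined" unless @xs;
    fail "geometric-mean needs positive numbers" unless @xs.all > 0;
    exp(@xs.map(*.log).sum / @xs.elems)
}

say arithmetic-mean([1, 2, 4]);    # the Rat 7/3
say geometric-mean([1, 2, 4]);     # approximately 2 (up to rounding)
say arithmetic-mean([]).defined;   # False: a Failure, not a number
```

The two results differ for the same input, which is part of why the bare name `mean` is ambiguous.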

### lizmat commented Mar 22, 2019

 FWIW, I think a `clamp` method should take a `Range` (or 2 values) as its parameter. This would allow it to be used on e.g. a List, a Supply, etc.:

```
42.clamp(^10);              # 9
(10,20,30).clamp( 25..35 ); # (25,25,30)
```

etc. etc.
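
A method on built-in types can't be added without `augment`, so this sketch uses a hypothetical multi sub named `clamp` to show the proposed behaviour. It assumes integer ranges only and relies on `Range.int-bounds`, which honours exclusions, so `^10` clamps to 9. The list candidate simply maps the scalar one over each element.

```raku
# Hypothetical multi sub `clamp` (not a core Perl 6 routine).
# Integer ranges only: int-bounds returns the first and last integers
# the range covers, so ^10 gives (0, 9).
multi sub clamp(Real $x, Range $r) {
    my ($lo, $hi) = $r.int-bounds;
    $x < $lo ?? $lo !! $x > $hi ?? $hi !! $x
}

# List form: clamp each element independently.
multi sub clamp(@xs, Range $r) {
    @xs.map({ clamp($_, $r) }).List
}

say clamp(42, ^10);               # 9
say clamp((10, 20, 30), 25..35);  # (25 25 30)
```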

### jnthn commented Mar 23, 2019

 There's a wide variety of suggested additions here. In principle I'm not opposed to adding things to `CORE.setting`, but there should be a good argument for those we do add, as well as a lack of strong counter-arguments against adding them.

A general counter-argument is that everyone pays for the things we put into `CORE.setting`: its compiled form is over 14 MB by now, which everyone has to download, store, have mapped into memory, and so forth. While there are technical measures we can take to make it more compact and to further reduce the impact the setting size has on startup time, additions there will never be free. (Some argue "it makes the language bigger and so more to learn", but for things in `CORE.setting` I don't really buy that argument; you don't have to know all of a language's standard library in order to use the language. Or at least, I sure hope not, or I should stop programming. :-))

One consideration that has not yet been mentioned here is whether there is a significant performance benefit to be had from providing the operation as a built-in. If, for example, some platforms support the operation at the CPU level, or there is a way to implement it more efficiently than by composing other operations, then there's a case for having it in `CORE.setting` so we can JIT it into something good. I've no idea whether this is the case for any of those suggested here; research is needed.

It's also worth considering how widely used something would be. For example, there's probably a fairly strong case for `average`, which for most people means `sub average(@xs) { @xs.sum / @xs }`, even if there are many other kinds of average. I suspect quite a few folks have defined that by now (and it's so short and simple to write that it's not really worth a module dependency).

A few assorted notes on the various proposals:

- We've tried to avoid abbreviations, so `prod`, if we were to add it, would want to be `product`.
- `clip` seems a more evocative name to me than `clamp`. Also, I think `(10,20,30).clamp( 25..35 );` should just be done using a `map` over the list, applying it to each element. It's arguably useful enough to have in `CORE.setting`, but it's not an obvious list operation.

As for a way forward: I think it's worth making a more detailed proposal (perhaps with a prototype implementation) for these to go into `CORE.setting`:

- `clip` (or `clamp`, or `bound`, or whatever we end up calling it)
- `average` (with the semantics the typical punter expects). I know that if you're doing other statistical things it's more efficient to do them in a single pass, but my feeling is that, for better or worse, simple averaging is overwhelmingly the most common case. Even if we later decided that a means to calculate a bunch of statistical things at once belonged in `CORE.setting`, that would still not take much away from the value of a convenient `average` built-in.

I'm not sure `product` pulls its weight, especially since `@x.product` is more to type than `[*](@x)`, and unlike `sum`, I don't see any obvious optimization opportunities (we ended up with `.sum` because, when called on a `Range`, the answer can be calculated without iterating the `Range`). However, I'd entertain arguments for why it should be included.

For other statistical things, my feeling is "module first".
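
A short sketch of the two points about `product` and `.sum`: a hypothetical `product` sub is just the `[*]` reduction (which already returns 1 for an empty list), while a sum over an integer `Range` can use the closed form (a + b)(b - a + 1) / 2 instead of iterating. Both sub names are illustrative only, and this is not a claim about how Rakudo's `.sum` is actually written.

```raku
# `product` is hypothetical; the reduction metaoperator [*] already does it.
sub product(*@xs) { [*] @xs }

# Sketch of why a sum over an integer Range needs no iteration:
# the sum of lo..hi is (lo + hi) * (hi - lo + 1) / 2.
sub range-sum(Range $r) {
    my ($lo, $hi) = $r.int-bounds;
    return 0 if $hi < $lo;                 # empty range
    ($lo + $hi) * ($hi - $lo + 1) div 2    # always an exact integer
}

say product(1..5);      # 120
say product();          # 1
say range-sum(1..100);  # 5050
say range-sum(^10);     # 45
```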

### AlexDaniel commented Mar 23, 2019

 To clarify the situation in this ticket: there is no proposal yet; the original post simply states that some functions may be missing. If somebody wants to make a proposal, see @jnthn's comment.

### japhb commented Mar 23, 2019

 @jnthn: Aside from the pure performance implications of using builtins, there's also the matter of numerical accuracy and stability: some of these functions may need to be calculated using the processor's extended precision (e.g. 80, 96, or 128 bits) in order to be accurate to within one ULP (Unit in the Last Place) of their 64-bit output across their whole domain. In other words, some of them we'd want to implement as VM ops or NativeCall to a math library anyway, because we can't efficiently fake that extra precision in NQP space.
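
Hardware extended precision isn't something a Perl 6 sketch can show, but a related software technique illustrates how much precision a naive implementation can lose. The sketch below, with hypothetical sub names, compares a plain `+=` loop with Neumaier's variant of Kahan compensated summation, which carries the low-order bits that plain addition discards in a separate compensation term. It is a swapped-in illustration, not the approach described in the comment above.

```raku
# Hypothetical subs for illustration; neither is a Perl 6 API.
sub naive-sum(@xs) {
    my Num $s = 0e0;
    $s += $_ for @xs;
    $s
}

# Neumaier (improved Kahan-Babuska) compensated summation.
sub neumaier-sum(@xs) {
    my Num $sum = 0e0;
    my Num $c   = 0e0;   # running compensation for lost low-order bits
    for @xs -> $x {
        my $t = $sum + $x;
        if $sum.abs >= $x.abs { $c += ($sum - $t) + $x }    # bits of $x lost
        else                  { $c += ($x - $t) + $sum }    # bits of $sum lost
        $sum = $t;
    }
    $sum + $c
}

my @data = 1e0, 1e100, 1e0, -1e100;
say naive-sum(@data);     # 0
say neumaier-sum(@data);  # 2
```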

### MattOates commented Mar 26, 2019

 This annoyed me enough to create https://github.com/MattOates/Stats/blob/master/lib/Stats.pm6

The Evan Miller article is really great. The point of the functions he defines is that they form a core set of operations on which most higher-level scientific programming is actually built. If this were in `CORE.setting`, though, I think it makes more sense to strictly limit what gets exported or provided; mean/median/mode/stddev are the common ones. A core module loaded with `use Maths` for the more science/analysis end of the spectrum feels sensible. Outside of Rakudo Star, do we have proper core-level modules? Really, it would be great if these were optimised implementations from a maths library. Otherwise it's hard to see how this wouldn't stunt or slow down development in the ecosystem in this space.
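
For a sense of how small the commonly requested functions are, here is a Perl 6 sketch of `median`, a `modes` helper and `peak-to-peak`. These are hypothetical illustrations, not code taken from the Stats module linked above; `modes` returns every most-frequent value, since a list can have more than one mode.

```raku
# Hypothetical subs; not taken from any existing module.
sub median(@xs) {
    fail "median of an empty list is undefined" unless @xs;
    my @s   = @xs.sort;
    my $mid = @s.elems div 2;
    # Even count: average the two middle values; odd count: the middle one.
    @s.elems %% 2 ?? (@s[$mid - 1] + @s[$mid]) / 2 !! @s[$mid]
}

# All most frequent values, in sorted order.
sub modes(@xs) {
    my $bag = @xs.Bag;
    my $top = $bag.values.max;
    $bag.pairs.grep(*.value == $top).map(*.key).sort.List
}

# Peak-to-peak: the spread between the largest and smallest value.
sub peak-to-peak(@xs) {
    fail "peak-to-peak of an empty list is undefined" unless @xs;
    @xs.max - @xs.min
}

say median([3, 1, 4, 1, 5]);        # 3
say median([3, 1, 4, 1]);           # 2
say modes([3, 1, 4, 1, 5, 5]);      # (1 5)
say peak-to-peak([3, 1, 4, 1, 5]);  # 4
```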

### AlexDaniel commented Mar 26, 2019

> Outside of Rakudo Star do we have properly core level modules?

Yes, Telemetry comes to mind.

> This annoyed me enough …

@MattOates, by any chance can you come up with a detailed proposal (discussing what should be available to everyone, what needs to be in a potential maths module, and what is left for the ecosystem), and later an implementation? I don't think anybody objects that at least some things need to be added; we just need a knowledgeable person with enough tuits to think this through.
