use MulTrans instead of a TransCopy and Mul #22
Conversation
Calling MulTrans (which in turn calls BLAS) is as much as 45% faster.
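For context, here is a minimal sketch of the idea, written against today's gonum mat API rather than the mat64 MulTrans signature discussed in this thread: multiplying by the lightweight view returned by T() lets Mul pass the transpose flag straight down to BLAS, instead of first materializing a transposed copy.

```go
package main

import (
	"fmt"

	"gonum.org/v1/gonum/mat"
)

func main() {
	x := mat.NewDense(3, 2, []float64{
		1, 2,
		3, 4,
		5, 6,
	})

	// Slow path: build an explicit transposed copy, then multiply.
	var xt mat.Dense
	xt.CloneFrom(x.T()) // materializes the 2x3 copy
	var slow mat.Dense
	slow.Mul(&xt, x)

	// Fast path: multiply by the transpose view directly; no copy is made,
	// and the transpose is handled inside the BLAS call.
	var fast mat.Dense
	fast.Mul(x.T(), x)

	fmt.Println(mat.Equal(&slow, &fast)) // true
}
```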
Is that with cblas or goblas? Just curious.
That was with cblas on both sides.
covariancematrix.go (Outdated)
Could we do this too? The formula doesn't make sense with negative weights. I'm not sure where we want to document it, but no negative weights should be a general rule of this package.
Negative weights can also be interpreted as removing a sample; see for example https://github.com/glycerine/zettalm, which is a regression package. So there are cases where a negative weight is meaningful.
That change made @btracey's comment re negative weights show up as outdated, but the issue still stands. @kortschak, do you have an opinion on whether covariancematrix should accept negative weights? The R equivalent says the weights should be neither negative nor zero: https://stat.ethz.ch/R-manual/R-patched/library/stats/html/cov.wt.html. I think excluding zero is definitely wrong. The https://github.com/glycerine/zettalm package allows negative weights to indicate "removing" a sample, and linear regression is in some ways equivalent to taking the covariance. Whether or not we accept negative weights, it comes down to the same check: weighted covariance can be made faster by taking the square root of the weights, so we will check the sign anyway to pick the faster code path. The question is whether the slow path should panic on negative weights or fall back to the current code.
I have no strong opinion here.
Zero weights should be allowed. There's no problem with them in the formulas. It's also very possible to generate zero-weighted data, for example by using importance sampling where you have over-guessed the support of the distribution. I can see the argument for negative weights in the zettalm case, where you have reams of barely accessible data and need an easy way to manage it. Here, though, we have rows of data in memory, so the element you'd like to remove is already right there for you to remove. Allowing negative weights also complicates other improvements we may want to make, like transforming weights into log space for better numeric behavior on rare events. I could still be convinced, but it doesn't seem like we gain much from negative weights, while we do lose something.
There is also consistency with other packages to think about. I'm pretty sure you cannot have negative weights in a weighted least squares problem, because the optimal line would be one where that point is predicted as poorly as possible. I guess you could put another point at the exact same location with a larger positive weight, but that seems like a recipe for disaster in WLS.
That's fair. If someone wants to remove a sample, they could just set the weight to zero. And in any case, it will be easier to relax requirements and allow negative weights than it would be to make them more strict. I'll add a check, and then see if taking the sqrt of the weights and then using MulTrans results in a speedup.
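A hedged sketch of the sqrt-of-weights trick referred to above, assuming samples are rows and using the plain Σw normalization (the package's actual normalization and function names may differ; weightedCov is a hypothetical helper, not the package's API): with nonnegative weights, C = (1/Σw) Σᵢ wᵢ(xᵢ − μ)(xᵢ − μ)ᵀ factors as YᵀY/Σw, where row i of Y is the centered row i scaled by √wᵢ, so the whole sum collapses into a single BLAS multiply. A negative wᵢ has no real square root, which is why the sign must be checked either way.

```go
package main

import (
	"fmt"
	"math"

	"gonum.org/v1/gonum/mat"
)

// weightedCov is a hypothetical helper computing the weighted covariance
// of the rows of x, normalized by the total weight.
func weightedCov(x *mat.Dense, w []float64) *mat.Dense {
	r, c := x.Dims()

	// Weighted column means.
	var wsum float64
	mu := make([]float64, c)
	for i := 0; i < r; i++ {
		wsum += w[i]
		for j := 0; j < c; j++ {
			mu[j] += w[i] * x.At(i, j)
		}
	}
	for j := range mu {
		mu[j] /= wsum
	}

	// Center each row and scale it by sqrt(w_i). A negative weight would
	// make this factorization invalid, hence the check.
	y := mat.NewDense(r, c, nil)
	for i := 0; i < r; i++ {
		if w[i] < 0 {
			panic("negative weight")
		}
		sw := math.Sqrt(w[i])
		for j := 0; j < c; j++ {
			y.Set(i, j, sw*(x.At(i, j)-mu[j]))
		}
	}

	// One BLAS call: C = YᵀY / Σw.
	var cov mat.Dense
	cov.Mul(y.T(), y)
	cov.Scale(1/wsum, &cov)
	return &cov
}

func main() {
	x := mat.NewDense(3, 2, []float64{1, 2, 3, 4, 5, 6})
	fmt.Println(mat.Formatted(weightedCov(x, []float64{1, 2, 1})))
}
```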
Superseded by a better approach in #27 |