This repository was archived by the owner on Dec 22, 2018. It is now read-only.

Conversation

@jonlawlor
Contributor

Calling MulTrans (which in turn calls BLAS) is as much as 45% faster.

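For context on where the speedup comes from: the sample covariance matrix can be written as a single cross product, Xc' * Xc / (n-1), where Xc is the column-centered data, and a single matrix product is exactly what a BLAS-backed MulTrans-style call evaluates efficiently. A minimal plain-Go sketch of that identity (`cov` is a hypothetical helper, not the gonum API):

```go
package main

import "fmt"

// cov computes the sample covariance matrix of x (n samples by d
// variables) as Xc' * Xc / (n-1), where Xc is the column-centered
// data. Expressing the cross products as one matrix product is what
// lets a BLAS-backed call do the heavy lifting.
func cov(x [][]float64) [][]float64 {
	n, d := len(x), len(x[0])
	// Column means.
	mu := make([]float64, d)
	for _, row := range x {
		for j, v := range row {
			mu[j] += v
		}
	}
	for j := range mu {
		mu[j] /= float64(n)
	}
	// Accumulate Xc' * Xc, then normalize by n-1.
	c := make([][]float64, d)
	for j := range c {
		c[j] = make([]float64, d)
	}
	for _, row := range x {
		for j := 0; j < d; j++ {
			for k := 0; k < d; k++ {
				c[j][k] += (row[j] - mu[j]) * (row[k] - mu[k])
			}
		}
	}
	for j := range c {
		for k := range c[j] {
			c[j][k] /= float64(n - 1)
		}
	}
	return c
}

func main() {
	x := [][]float64{{1, 2}, {2, 4}, {3, 6}}
	fmt.Println(cov(x)) // [[1 2] [2 4]]
}
```

The triple loop above is the work that a single BLAS gemm/syrk call replaces.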
@coveralls

Coverage Status

Coverage increased (+0.21%) when pulling 3f9a4f9 on covariancematrix-multrans into df53b21 on master.

@jonlawlor
Contributor Author

benchmark                                         old ns/op      new ns/op      delta
BenchmarkCovarianceMatrixSmallxSmall              17928          13997          -21.93%
BenchmarkCovarianceMatrixSmallxMedium             20735644       21302799       +2.74%
BenchmarkCovarianceMatrixMediumxSmall             399850         252919         -36.75%
BenchmarkCovarianceMatrixMediumxMedium            320923951      173250045      -46.02%
BenchmarkCovarianceMatrixLargexSmall              56628605       39087916       -30.97%
BenchmarkCovarianceMatrixHugexSmall               6571752633     4337103457     -34.00%
BenchmarkCovarianceMatrixSmallxSmallInPlace       11989          8847           -26.21%
BenchmarkCovarianceMatrixSmallxMediumInPlace      15443508       15929012       +3.14%
BenchmarkCovarianceMatrixMediumxSmallInPlace      370849         238972         -35.56%
BenchmarkCovarianceMatrixMediumxMediumInPlace     306710031      166414344      -45.74%
BenchmarkCovarianceMatrixLargexSmallInPlace       55798314       38903944       -30.28%
BenchmarkCovarianceMatrixHugexSmallInPlace        5709754453     3982521912     -30.25%

@btracey
Member

btracey commented Jan 1, 2015

Is that with cblas or goblas? Just curious.

@jonlawlor
Contributor Author

That was with cblas on both sides.

@btracey
Member

Could we do this too? The formula doesn't make sense with negative weights. I'm not sure where we want to document it, but no negative weights should be a general rule of this package.

@jonlawlor
Contributor Author


Negative weights can also be interpreted as removing a sample. See for example https://github.com/glycerine/zettalm, which is a regression package, so there is precedent for allowing negative weights.

@coveralls

Coverage Status

Coverage increased (+0.21%) when pulling 4ab4469 on covariancematrix-multrans into df53b21 on master.

@jonlawlor
Contributor Author

That change made @btracey's comment re negative weights show up as outdated:

Could we do this too? The formula doesn't make sense with negative weights. I'm not sure where we want to document it, but no negative weights should be a general rule of this package.

But the issue still stands. @kortschak, do you have an opinion on whether CovarianceMatrix should accept negative weights? The R equivalent, cov.wt, says the weights shouldn't be negative or zero (https://stat.ethz.ch/R-manual/R-patched/library/stats/html/cov.wt.html). I think excluding zero is definitely wrong. The https://github.com/glycerine/zettalm package allows negative weights to indicate "removing" a sample, and linear regression is in some ways equivalent to taking the covariance.

It will come down to the same check whether or not we accept negative weights: weighted covariance can be computed faster by taking the square root of the weights, which is only valid when the weights are nonnegative, so we'll check them anyway to try to take the faster code path. The question is whether the slow path should panic on a negative weight or use the current code.

@kortschak
Member

I have no strong opinion here.

@btracey
Member

btracey commented Jan 1, 2015

Zero weights should be allowed. There's no problem with them in the formulas. It's also very possible to generate zero-weighted data, for example by using importance sampling where you have over-guessed the support of the distribution. I can see the argument for negative weights in the zettalm package, where you have reams of barely accessible data and need an easy way to manage it. Here, though, we have rows of data in memory, so the element you'd like to remove is already right there for you to remove. Allowing negative weights also complicates other improvements we may want to make, like transforming weights into log space to give better numeric performance on rare events. I could still be convinced, but it doesn't seem like we gain much from negative weights, while we do lose some.

@btracey
Member

btracey commented Jan 1, 2015

There is additionally consistency to think about with other packages. I'm pretty sure you cannot have negative weights in a weighted least squares problem, because the optimum line would be one where that point wants to be predicted as poorly as possible. I guess you could have another point at the exact same location with a higher positive weight, but that seems like a recipe for disaster in WLS.

@jonlawlor
Contributor Author

That's fair. If someone wants to remove a sample, they could just set the weight to zero. And in any case, it will be easier to relax requirements and allow negative weights than it would be to make them more strict. I'll add a check, and then see if taking the sqrt of the weights and then using MulTrans results in a speedup.
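The check being discussed might look something like the following sketch (`checkWeights` is a hypothetical helper, not gonum's actual code; it returns an error here to keep the sketch self-contained, whatever panic-vs-fallback behavior was ultimately chosen):

```go
package main

import "fmt"

// checkWeights validates a weight vector before computing a weighted
// covariance: zero weights are fine (they simply drop the sample), but
// negative weights are rejected so callers can safely take the sqrt(w)
// fast path. Hypothetical helper, not gonum's actual API.
func checkWeights(w []float64) error {
	for i, v := range w {
		if v < 0 {
			return fmt.Errorf("stat: negative weight %v at index %d", v, i)
		}
	}
	return nil
}

func main() {
	fmt.Println(checkWeights([]float64{1, 0, 2}))  // <nil>: zero weights allowed
	fmt.Println(checkWeights([]float64{1, -1, 2})) // stat: negative weight -1 at index 1
}
```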

@jonlawlor jonlawlor closed this Jan 19, 2015
@jonlawlor
Contributor Author

Superseded by a better approach in #27

@jonlawlor jonlawlor deleted the covariancematrix-multrans branch January 19, 2015 04:47