
Issue #15: One Pass Statistics #38

Closed
wants to merge 1 commit into from


manyue
Contributor

@manyue manyue commented Jun 20, 2012

Response to Issue #15:
Changed the computation of statistics in the DescriptiveStatistics and Correlation classes to a one-pass algorithm based on J. Bennett et al., "Numerically Stable, Single-Pass, Parallel Statistics Algorithms," Proc. IEEE International Conference on Cluster Computing (2009). DescriptiveStatistics still needs two passes: one for the mean, variance, etc., and one for the median. The current version needs three: one for the mean, one for the other statistics except the median, and one for the median. The methods in the static Statistics class are already one-pass.
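For readers unfamiliar with the one-pass approach, here is a minimal Python sketch of the general idea, not the code in this pull request. It keeps a running count, mean, and sum of squared deviations, updates them per value, and merges two partial results with the standard pairwise formula. The class name `RunningStats` and its members are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Hypothetical one-pass accumulator for mean and variance."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def push(self, x: float) -> None:
        # Incremental update: no second pass over the data is needed.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "RunningStats") -> "RunningStats":
        # Pairwise combination of two partial results, which is what
        # makes the scheme usable on chunks processed in parallel.
        if other.n == 0:
            return RunningStats(self.n, self.mean, self.m2)
        if self.n == 0:
            return RunningStats(other.n, other.mean, other.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningStats(n, mean, m2)

    @property
    def variance(self) -> float:
        # Sample variance (divides by n - 1); undefined for n < 2.
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


if __name__ == "__main__":
    stats = RunningStats()
    for value in [2, 4, 4, 4, 5, 5, 7, 9]:
        stats.push(value)
    print(stats.mean, stats.variance)
```

The median cannot be maintained this way, which is consistent with DescriptiveStatistics still needing a separate pass for it.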

Changed the choice of pivot in the private OrderSelect method in the Statistics class. The current pivot choice performs badly on a partially sorted list, which is a problem for large datasets with an even number of entries: the Median method then calls OrderSelect twice, and the first call leaves the buffer partially sorted, which leads to poor performance in the second call.
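The comment above does not say which pivot rule was adopted. As an illustration only, the following Python sketch uses median-of-three, one common choice that copes well with partially sorted input. The functions `order_select` and `median` are hypothetical stand-ins written for this example, not the library's methods.

```python
def order_select(buf, k):
    """Return the k-th smallest element (0-based), reordering buf in place."""
    if not 0 <= k < len(buf):
        raise IndexError("k out of range")
    lo, hi = 0, len(buf) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        # Median-of-three (assumed pivot rule): order the first, middle
        # and last elements so that the median of the three sits at mid.
        if buf[mid] < buf[lo]:
            buf[lo], buf[mid] = buf[mid], buf[lo]
        if buf[hi] < buf[lo]:
            buf[lo], buf[hi] = buf[hi], buf[lo]
        if buf[hi] < buf[mid]:
            buf[mid], buf[hi] = buf[hi], buf[mid]
        pivot = buf[mid]
        # Hoare-style partition around the pivot value.
        i, j = lo, hi
        while i <= j:
            while buf[i] < pivot:
                i += 1
            while buf[j] > pivot:
                j -= 1
            if i <= j:
                buf[i], buf[j] = buf[j], buf[i]
                i += 1
                j -= 1
        # Here buf[lo..j] <= pivot <= buf[i..hi]; anything between equals pivot.
        if k <= j:
            hi = j
        elif k >= i:
            lo = i
        else:
            return buf[k]
    return buf[k]


def median(data):
    """Median via selection; an even count triggers two order_select calls."""
    buf = list(data)
    n = len(buf)
    if n == 0:
        raise ValueError("median of empty data")
    upper = order_select(buf, n // 2)
    if n % 2 == 1:
        return upper
    # The second call runs on a buffer the first call partially sorted.
    lower = order_select(buf, n // 2 - 1)
    return (lower + upper) / 2
```

A pivot taken from a fixed end of the range tends to produce very uneven partitions on nearly sorted data, which is the situation the second call in `median` faces.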

All these changes have passed the existing unit tests.

@ghost ghost assigned cdrnet Jul 14, 2012
@cdrnet
Member

cdrnet commented Jul 14, 2012

Pulled to mainline, thanks!
