Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance of DescriptiveStatistics is questionable #10

Closed
cdrnet opened this issue Oct 1, 2011 · 4 comments
Closed

Performance of DescriptiveStatistics is questionable #10

cdrnet opened this issue Oct 1, 2011 · 4 comments

Comments

@cdrnet
Copy link
Member

cdrnet commented Oct 1, 2011

Original Report: https://mathnetnumerics.codeplex.com/workitem/5698

I have a VERY large dataset (TBs in size) that I will be wanting to run against. I was testing against only 20 MBs of data and using the DescriptiveStatistics class. I noticed that performance was harsh when i used this class even if just asking for 1 or 2 types of stats. This would run the DescriptiveStatistics class over my aggregate contents in approx 104 seconds. (even just for the MEAN)

I then decided I should give the static class/ienumerable extensions a try, this shocked me in my evaluation of the exact same dataset, 69 milliseconds. (just MEAN)

I would like to suggest an update to the DescriptiveStatistics class. lazy load the statistics. this would increase performance drastically.

Thanks,

@tibel
Copy link
Contributor

tibel commented Jan 30, 2013

Why not change DescriptiveStatistics.Median to:

public double Median
{
    get { return _medianLazy.Value; }
}

readonly Lazy<double> _medianLazy;

And in constructor call:

_medianLazy = new Lazy<double>(() => data.Median());

This way Median is only calculated on first access to Median property and not in constructor. This would make DescriptiveStatistics much faster.

Or you remove Median from DescriptiveStatistics now. It is deprecated already.

@cdrnet
Copy link
Member Author

cdrnet commented Jan 30, 2013

My plan so far is to remove Median on the next major release (together with all other deprecated code) when we can "officially" break compatibility according to semver.

Although the lazy approach would be a good idea in the meantime!

@tibel
Copy link
Contributor

tibel commented Jan 30, 2013

I have send a pull request with the lazy approach. Then I think this issue can be closed.

@cdrnet
Copy link
Member Author

cdrnet commented Jan 30, 2013

Thanks!

@cdrnet cdrnet closed this as completed Jan 30, 2013
@cdrnet cdrnet removed this from the v3.0 milestone Apr 12, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants