
[MRG] Add benchmarking script for multilabel metrics #2643

Merged
merged 2 commits into scikit-learn:master from jnothman:bench_mutilabel_metrics on Dec 9, 2013

Conversation

jnothman
Member

jnothman commented Dec 7, 2013

These are not very important metrics in the context of scikit-learn. Yet whenever metric implementations get changed, people seem to be interested in how the change affects execution time. This makes such reports easy to produce.

This benchmarks metrics for different multilabel target formats, also giving us an idea of their relative performance. Benchmarks are otherwise parametrised by (number of samples, classes, average density of positive labels), one of which may be plotted against time.
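
For context, a minimal sketch of what such a benchmark loop can look like (this is not the script added by this PR; the data generator and the particular metric and format choices below are illustrative assumptions):

import time
import numpy as np
import scipy.sparse as sp
from sklearn.metrics import accuracy_score, f1_score

def make_indicator(n_samples=1000, n_classes=4, density=0.2, seed=0):
    # Random dense multilabel indicator matrix with roughly the given density
    rng = np.random.RandomState(seed)
    return (rng.rand(n_samples, n_classes) < density).astype(int)

def bench(metric, y_true, y_pred, n_times=5):
    # Mean wall-clock time of `metric` over `n_times` calls
    start = time.time()
    for _ in range(n_times):
        metric(y_true, y_pred)
    return (time.time() - start) / n_times

y_true = make_indicator(seed=0)
y_pred = make_indicator(seed=1)

FORMATS = {'dense': np.asarray, 'csr': sp.csr_matrix}
METRICS = {'f1-macro': lambda a, b: f1_score(a, b, average='macro'),
           'subset accuracy': accuracy_score}

for fmt_name, to_fmt in sorted(FORMATS.items()):
    for metric_name, metric in sorted(METRICS.items()):
        t = bench(metric, to_fmt(y_true), to_fmt(y_pred))
        print('%-8s %-16s %.4fs' % (fmt_name, metric_name, t))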

@coveralls

Coverage Status

Coverage remained the same when pulling fa5acb2 on jnothman:bench_mutilabel_metrics into 66a5a4a on scikit-learn:master.

@larsmans
Member

larsmans commented Dec 7, 2013

LGTM. Ping @arjoly.

Care to do a quick review of #2642 for me? :)

def benchmark(metrics=[v for k, v in sorted(METRICS.items())],
              formats=[v for k, v in sorted(FORMATS.items())],
              samples=1000, classes=4, density=.2,
              n_times=5):
Member

Can you use a tuple instead of a list for the function arguments?

Member Author

You're concerned that they're mutable?

Member

Yes, I think it is better to use immutable defaults for function arguments.
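
To make the suggestion concrete, the signature with immutable defaults would look roughly like this (METRICS and FORMATS below are stand-ins for the script's module-level registries, so this is a sketch rather than the merged code):

# Stand-ins for the script's module-level registries of metrics and
# target-format constructors; the real script defines its own.
METRICS = {'f1': None, 'accuracy': None}
FORMATS = {'dense': None, 'csr': None}

# Tuples are immutable, so the defaults cannot be accidentally mutated
# between calls the way list defaults can.
def benchmark(metrics=tuple(v for k, v in sorted(METRICS.items())),
              formats=tuple(v for k, v in sorted(FORMATS.items())),
              samples=1000, classes=4, density=.2,
              n_times=5):
    pass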

@arjoly
Member

arjoly commented Dec 9, 2013

Except for the minor comments, +1 to merge.

@jnothman
Member Author

jnothman commented Dec 9, 2013

@arjoly Okay, yes, it's quick-and-dirty code. I don't think that's a big deal for benchmarks, but I'll get some of the lint out of it.

@arjoly
Member

arjoly commented Dec 9, 2013

Thanks!

@arjoly
Member

arjoly commented Dec 9, 2013

Your benchmark could be improved by adding dense C-layout and dense Fortran-layout.

@jnothman
Member Author

jnothman commented Dec 9, 2013

Your benchmark could be improved by adding dense C-layout and dense Fortran-layout.

Only if you want to see closely-overlapping curves...
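
To make the suggestion above concrete, adding those layouts to the benchmarked formats could look like the following sketch (the registry name and entries here are illustrative, not the merged code):

import numpy as np

# Hypothetical format-registry entries: coerce the dense indicator matrix
# to C-contiguous or Fortran-contiguous memory layout before timing.
EXTRA_FORMATS = {
    'dense C-order': np.ascontiguousarray,
    'dense F-order': np.asfortranarray,
}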

@coveralls

Coverage Status

Coverage remained the same when pulling 84ec4f9 on jnothman:bench_mutilabel_metrics into 66a5a4a on scikit-learn:master.

arjoly added a commit that referenced this pull request Dec 9, 2013
[MRG] Add benchmarking script for multilabel metrics
arjoly merged commit 5f57f85 into scikit-learn:master Dec 9, 2013
@arjoly
Member

arjoly commented Dec 9, 2013

Merged! Thanks for the bench!!
