[FEA] Built In Scorers #242

Open
quasiben opened this issue Feb 26, 2019 · 5 comments

Comments

@quasiben
Contributor

commented Feb 26, 2019

All sklearn estimators have a built-in score method. When performing hyperparameter optimization, this built-in method is extremely useful because one doesn't also have to build a custom metric for scoring.

It would be nice if cuml also exposed such a method on all estimators.
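
To make the request concrete, here is a minimal pure-Python sketch of what a built-in score method buys a tuning loop. It is not cuml's or sklearn's implementation: `RegressorScoreMixin`, `SlopeOnlyRegressor`, and `best_by_score` are hypothetical names made up for illustration. The only fact borrowed from sklearn is that regressors return R^2 from score(); the sketch assumes y is not constant.

```python
class RegressorScoreMixin:
    """Hypothetical mixin: adds an sklearn-style score() to any class with predict()."""

    def score(self, X, y):
        # R^2 = 1 - SS_res / SS_tot, the default score of sklearn regressors.
        # Assumes y is not constant (otherwise SS_tot would be zero).
        y_pred = self.predict(X)
        mean_y = sum(y) / len(y)
        ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y, y_pred))
        ss_tot = sum((yt - mean_y) ** 2 for yt in y)
        return 1.0 - ss_res / ss_tot


class SlopeOnlyRegressor(RegressorScoreMixin):
    """Toy stand-in for an estimator: predicts y = slope * x, no fitting step."""

    def __init__(self, slope=1.0):
        self.slope = slope

    def predict(self, X):
        return [self.slope * x for x in X]


def best_by_score(candidates, X, y):
    # The search loop only calls est.score(X, y); no custom metric is needed.
    return max(candidates, key=lambda est: est.score(X, y))


X = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
best = best_by_score([SlopeOnlyRegressor(s) for s in (1.0, 2.0, 3.0)], X, y)
```

Here `best.slope` ends up as 2.0, the candidate that reproduces y exactly. The point is that the search code never needs to know which metric the estimator uses.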

cc @dantegd

@fondaing fondaing added this to Needs prioritizing in Feature Planning Feb 28, 2019

@fondaing fondaing moved this from Needs prioritizing to Next release in Feature Planning Feb 28, 2019

@fondaing fondaing added this to Issue-Needs prioritizing in v0.6 Release via automation Feb 28, 2019

@dantegd dantegd moved this from Issue-Needs prioritizing to Issue-P1 in v0.6 Release Feb 28, 2019

@fondaing fondaing added this to Issue-Needs prioritizing in v0.7 Release via automation Mar 7, 2019

@fondaing fondaing removed this from Issue-P1 in v0.6 Release Mar 7, 2019

@quasiben

Contributor Author

commented Mar 14, 2019

This is also necessary for building sklearn pipelines.

@quasiben

Contributor Author

commented Mar 14, 2019

I also want to note that there are two options for tackling this issue:

  1. Build scoring functions with CUDA/Numba.
  2. Leverage the existing scoring functions within sklearn and build out the appropriate __array_ufunc__ interfaces.

I believe both options are reasonable, but each has drawbacks. With option 1, building out many scoring algorithms with proper testing will be time consuming (though perhaps we start with only a handful, 3-5?). With option 2, cuml would need a fair amount of support from cudf to implement much of the numpy interface (unary and binary ops are on the way). More importantly, cuml would need to build more sklearn-compatible methods like get_precision, which is rather time consuming and perhaps undesirable.

@cjnolet

Collaborator

commented Mar 14, 2019

My vote is definitely a +1 on this, as I would eventually like to see all the scoring & metrics exposed through cuml. Most of these scores involve a massively parallel operation with a simple reduction at the end, which makes them a perfect fit for the CUDA design.
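
As a rough illustration of that "parallel operation plus reduction" shape, here is a pure-Python sketch of accuracy. It runs on the CPU and only mirrors the structure a GPU kernel would have; `accuracy_score_sketch` is a hypothetical name, not an existing cuml function.

```python
from functools import reduce
from operator import add


def accuracy_score_sketch(y_true, y_pred):
    """Hypothetical helper: accuracy written as an explicit map step and reduce step."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Map step: every comparison is independent of the others, so on a GPU
    # each element could be handled by its own thread.
    hits = [1 if t == p else 0 for t, p in zip(y_true, y_pred)]
    # Reduce step: one sum over the per-element results.
    return reduce(add, hits, 0) / len(y_true)


acc = accuracy_score_sketch([0, 1, 1, 0], [0, 1, 0, 0])  # 3 of 4 labels match
```

Several of the other scores discussed here (R2, the K-means objective) follow the same pattern with a different per-element function.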

I would also prefer that these be implemented in the C++ layer and exposed through Cython, as we do with all of our algorithms, so that they can be ported easily to other distributed frameworks (e.g. Spark).

My vote would be to start with option 2 and evolve to option 1 over time. Starting with option 2 would let us take the path of least resistance for finishing the hyper-param tuning feature for now.

As the metrics & scores become available within cuml, we can swap them out in our hyper-param tuning framework. I have entries in our algorithms tracker to support these.

@oyilmaz-nvidia

Contributor

commented Mar 14, 2019

@quasiben quasiben changed the title [FEA] Bultin Scorers [FEA] Built In Scorers Mar 27, 2019

@cjnolet

Collaborator

commented Apr 16, 2019

Set of initial evaluation metrics / scores that are being planned:

  • R2 Score (#429)
  • K-means score (negative of the objective value)
  • Silhouette Score
  • Adjusted Rand Score
  • Homogeneity Score
  • Basic supervised perf metrics (accuracy, precision, recall, f1, auc, confusion matrix, etc...)
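
For the K-means entry, a minimal pure-Python sketch of the intended quantity, assuming the sklearn convention that the score is the negated objective (the sum of squared distances from each point to its nearest center), so that larger is better. `kmeans_score_sketch` is a hypothetical name, not a planned cuml API.

```python
def kmeans_score_sketch(X, centers):
    """Hypothetical helper: negated K-means objective for points X and given centers."""
    total = 0.0
    for point in X:
        # Squared Euclidean distance from this point to its nearest center.
        total += min(
            sum((a - b) ** 2 for a, b in zip(point, center))
            for center in centers
        )
    # Negate so that a tighter clustering yields a higher score.
    return -total


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
centers = [(0.0, 1.0), (10.0, 10.0)]
score = kmeans_score_sketch(points, centers)  # squared distances are 1, 1, and 0
```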

@robocopnixon robocopnixon removed this from Issue-Needs prioritizing in v0.7 Release Apr 23, 2019

@fondaing fondaing added this to Issue-Needs prioritizing in v0.8 Release via automation May 28, 2019

@cjnolet cjnolet added the Tracker label May 30, 2019

@fondaing fondaing moved this from Issue-Needs prioritizing to Issue-P2 in v0.8 Release May 30, 2019

@cjnolet cjnolet added this to Issue-Needs prioritizing in v0.9 Release via automation Jun 21, 2019

@cjnolet cjnolet removed this from Issue-P2 in v0.8 Release Jun 21, 2019

@JohnZed JohnZed moved this from Issue-Needs prioritizing to Defer to post-0.9 in v0.9 Release Aug 8, 2019

@cjnolet cjnolet added this to Issue- Needs Prioritizing in v0.10 Release via automation Aug 9, 2019

@cjnolet cjnolet removed this from Defer to post-0.9 in v0.9 Release Aug 9, 2019
