
Performance Metrics & Fitness Functions #33

Closed
bartleyn opened this issue Nov 30, 2015 · 4 comments

@bartleyn
Contributor

I imagine that it would be useful to use TPOT to find models that optimize alternate performance metrics like precision, recall, etc. As such, I've come up with the following brainstorming questions:

  1. Does having alternate performance metrics/fitness functions make sense for the users?
  2. Does it make sense to add alternate metrics when reporting the best model? If so, which metric do we use as the fitness function?
  3. Since there is native support for multi-class/multi-label classification, regular precision, recall, and F1 may not be that useful. Should we just take the averaged versions of scores like these when necessary?

There are plenty of other questions, but I figured this would be a decent place to start. Let me know if I'm totally misguided in proposing this -- I won't profess to be an expert in genetic programming.

@bartleyn bartleyn changed the title Performance Metrics & Fitness Functions Performance Metrics & Fitness Functions (brainstorm) Nov 30, 2015
@rhiever
Contributor

rhiever commented Dec 2, 2015

Does having alternate performance metrics/fitness functions make sense for the users?

Yes. I think we should eventually support allowing the user to pass arbitrary scoring functions, similar to how sklearn does it.
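
For reference, a rough sketch of the sklearn-style scorer interface (this is plain sklearn, not anything TPOT exposes yet; how it would hook into TPOT is still open):

```python
from sklearn.metrics import f1_score, make_scorer

# An sklearn "scorer" is a callable with the signature
# scorer(estimator, X, y) -> float, where higher is better.
# make_scorer builds one from a plain metric function.
f1_macro_scorer = make_scorer(f1_score, average='macro')

# e.g. score = f1_macro_scorer(fitted_pipeline, X_test, y_test)
```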

Does it make sense to add alternate metrics when reporting the best model? If so, which metric do we use as the fitness function?

As in, use one scoring function for optimization, then a different scoring metric for final model selection? Interesting idea. I'm currently working on a version of TPOT that allows it to optimize on multiple criteria simultaneously, so perhaps that will help in this regard.

Since there is native support for multi-class/multi-label classification, regular precision, recall, and F1 may not be that useful. Should we just take the averaged versions of scores like these when necessary?

That's what we currently do with accuracy. I think it makes sense to do the same with other measures.
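
For concreteness, sklearn's metric functions already handle that averaging, so the same pattern would carry over (sketch, not TPOT code):

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Toy multi-class labels; averaging collapses per-class scores into one number.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 0]

# 'macro' treats every class equally; 'weighted' weights by class support.
print(precision_score(y_true, y_pred, average='macro'))
print(recall_score(y_true, y_pred, average='weighted'))
print(f1_score(y_true, y_pred, average='macro'))
```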

@bartleyn
Contributor Author

bartleyn commented Dec 2, 2015

Yes. I think we should eventually support allowing the user to pass arbitrary scoring functions, similar to how sklearn does it.

Would passing in a keyword for the specific metric work for now?

As in, use one scoring function for optimization, then a different scoring metric for final model selection? Interesting idea. I'm currently working on a version of TPOT that allows it to optimize on multiple criteria simultaneously, so perhaps that will help in this regard.

Yeah, I was thinking that and/or adding additional metrics to the report of the final model.

That's what we currently do with accuracy. I think it makes sense to do the same with other measures.

I'm happy to contribute by adding simple support for F1/Precision/Recall in the same vein as the accuracy. I think it should be relatively straightforward.

@rhiever
Contributor

rhiever commented Dec 2, 2015

Yes. I think we should eventually support allowing the user to pass arbitrary scoring functions, similar to how sklearn does it.

Would passing in a keyword for the specific metric work for now?

At first thought, I think it'd be better/easier to simply allow the user to pass an arbitrary scoring function. Otherwise, we have to choose what scoring functions to support, write a special case for each one, etc. Not very scalable from a coding point of view. Of course, we'd have to also clarify that the user should provide a scoring function that's appropriate for their data.
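
Something along these lines is what I'm picturing -- note that the scoring_function parameter in the commented-out usage is hypothetical, purely to illustrate the callable-based interface:

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Per-class recall averaged over all classes (an arbitrary user-supplied metric)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    per_class_recall = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(per_class_recall))

# Hypothetical usage -- `scoring_function` is not an existing TPOT parameter here:
# tpot = TPOT(generations=5, scoring_function=balanced_accuracy)
# tpot.fit(X_train, y_train)
```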

As in, use one scoring function for optimization, then a different scoring metric for final model selection? Interesting idea. I'm currently working on a version of TPOT that allows it to optimize on multiple criteria simultaneously, so perhaps that will help in this regard.

Yeah, I was thinking that and/or adding additional metrics to the report of the final model.

Is there a specific use case for this that you can think of?

That's what we currently do with accuracy. I think it makes sense to do the same with other measures.

I'm happy to contribute by adding simple support for F1/Precision/Recall in the same vein as the accuracy. I think it should be relatively straightforward.

Would you be interested in making an attempt at the implementation that allows the passing of arbitrary scoring functions discussed above? Perhaps we could also provide some example snippets in the docs of how to expand F1/precision/recall/etc. to support multiple classes, which can then be passed as the arbitrary scoring function.

@bartleyn
Contributor Author

bartleyn commented Dec 2, 2015

I see your point about making the effort now to support the arbitrary functions. I'll make an attempt at it and provide examples for at least F/P/R.

@bartleyn bartleyn changed the title Performance Metrics & Fitness Functions (brainstorm) Performance Metrics & Fitness Functions Dec 3, 2015
@rhiever rhiever closed this as completed Feb 8, 2016