
Added roc_auc as a fit_choice #5

Merged 1 commit from massmutual:roc_auc into lacava:master on May 3, 2017
Conversation

erp12 (Contributor) commented May 3, 2017

Tested on a single sample dataset and it seems to work well.

Currently, it is not compatible with any lexicase selection variants because there is no function that returns a vector of roc_auc values. I am not sure what such a function would look like, because it is impossible to compute the ROC AUC of a single prediction.

coveralls (Coverage Status): Coverage increased (+0.3%) to 77.498% when pulling 96f0ca5 on massmutual:roc_auc into c4c50d0 on lacava:master.

lacava merged commit ac5cc79 into lacava:master on May 3, 2017

lacava (Owner) commented May 3, 2017

we should think more about how each sample contributes to roc_auc and see if we can write a "vectorized" function for lexicase.

erp12 (Contributor, Author) commented May 4, 2017

I don't have any ideas yet for the typical 1 error per sample, but one way to get a vector of 4 errors would be to report the entire confusion matrix.

Something like:
[1 - true_pos_rate, false_pos_rate, 1 - true_neg_rate, false_neg_rate]

I am not sure if that is a good idea... I have only seen lexicase used where there is 1 error per sample.
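A hedged sketch of that 4-element error vector, built from sklearn's confusion_matrix (the helper name confusion_errors is mine for illustration, not part of FEW):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def confusion_errors(y_true, y_pred):
    """Return [1 - TPR, FPR, 1 - TNR, FNR] as a 4-element error vector.

    Illustrative helper, not part of FEW: each entry is an error to be
    minimized, derived from one cell-pair of the binary confusion matrix.
    """
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    tpr = tp / (tp + fn)  # true positive rate (recall)
    fpr = fp / (fp + tn)  # false positive rate
    tnr = tn / (tn + fp)  # true negative rate
    fnr = fn / (fn + tp)  # false negative rate
    return np.array([1 - tpr, fpr, 1 - tnr, fnr])

errs = confusion_errors([0, 0, 1, 1], [0, 1, 1, 1])
```

As noted above, four cases is likely too few for lexicase to differentiate individuals well.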

lacava (Owner) commented May 4, 2017

i don't think 4 errors would work, since that is not enough cases for lexicase to perform well. however, with roc_auc you have an area calculation over a set of values from the ROC curve. so could you take the raw ROC values and interpret them as a set of fitness values to be maximized?

erp12 (Contributor, Author) commented May 4, 2017

I am not sure what you mean by raw roc values. Would that just be the true positive rate at a particular value for false positive rate? Would that necessarily be 1 per sample?

lacava (Owner) commented May 4, 2017

yes, i guess it would be the true positive rate as the threshold increases.

there is no hard requirement in lexicase that there be 1 case per sample. That's just normally how it's mapped. the important thing is, roughly, that there are many cases (more than, say, 15). I'm thinking something like

from sklearn.metrics import roc_curve

def roc_fit(y_true, y_pred):
    # one fitness value per ROC threshold: 1 - TPR, to be minimized
    fpr, tpr, _ = roc_curve(y_true, y_pred)
    return 1 - tpr

could work for your purposes as a 'vectorized' fitness function.

erp12 (Contributor, Author) commented May 4, 2017

In your roc_fit function, are y_true and y_pred entire arrays of labels and predictions, or just a single label and a single sample?

lacava (Owner) commented May 4, 2017

entire arrays. y_pred is the feature output. i should clarify that i think this would make lexicase selection work but i'm not sure it's the best way to formulate the problem. i'm also unclear on how ROC works when you have an arbitrarily scaled floating point vector for y_pred, which could be the case with a program's output in FEW.
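For what it's worth, arbitrary scaling of y_pred shouldn't matter here: sklearn's ROC functions depend only on the ranking of the scores, so any monotonic rescaling of a program's raw output gives the same AUC. A minimal check on toy data (not FEW output):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1])
scores = np.array([-3.2, 0.1, 5.7, 42.0])  # arbitrarily scaled program outputs

# AUC only uses the ordering of the scores, so a monotonic
# rescaling leaves it unchanged.
auc_raw = roc_auc_score(y_true, scores)
auc_scaled = roc_auc_score(y_true, 0.01 * scores + 7)
```

Here both calls return the same AUC because the score ordering is identical.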

erp12 (Contributor, Author) commented May 5, 2017

When using a logistic regression classifier y_pred should never be outside the 0 to 1 range, right?
Perhaps I am not understanding how the programs in the population are evaluated as transformations.

I assumed that the model set as the ml param was trained, and its (cross-validated?) performance using the fit_choice metric was considered the fitness of the transformations. Something like that?

lacava (Owner) commented May 5, 2017

oh! no. the feature transformations each get their own fitness to determine which survive. This is a separate step from evaluating the performance of the ML method with which FEW is paired. Currently, that scoring function is specific to the ML.

erp12 (Contributor, Author) commented May 5, 2017

So is the fit_choice basically used to get an error for each transformation by predicting the output based on only that single transformation?

lacava (Owner) commented May 5, 2017

yes. check out the GECCO paper, where we define & compare different test metrics.

i looked into the roc_curve metric in sklearn a bit more, and it seems like you need an estimator with a decision function to get a reasonable result. is that right?
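As far as I can tell, roc_curve accepts any continuous score, not just a decision function: either decision_function output or the positive-class column of predict_proba works, and for logistic regression the two give the same AUC because the sigmoid is monotonic. A small sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# toy problem, purely for illustration
X, y = make_classification(n_samples=200, random_state=0)
clf = LogisticRegression().fit(X, y)

# either a margin (decision_function) or class-1 probabilities are
# valid y_score inputs; they rank samples identically here
auc_decision = roc_auc_score(y, clf.decision_function(X))
auc_proba = roc_auc_score(y, clf.predict_proba(X)[:, 1])
```

Hard 0/1 predictions, by contrast, collapse the ROC to a single operating point, which is probably why a continuous score looks necessary for a reasonable result.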
