SAGE values on cross-validation #6

Closed
garkavem opened this issue Feb 11, 2021 · 2 comments

Comments

@garkavem

Hello! On my dataset, the SAGE values depend quite a lot on the train-test split. Would it be correct to average the means and standard deviations of the SAGE values across cross-validation folds?

@iancovert
Owner

Hi there, that's an interesting situation. When you try a different train-test split, do you train a new model? Or do you use a different train-test split (with the same model) just when estimating SAGE values? And also, is the estimator running to convergence so that you get pretty narrow confidence intervals?

Assuming that the SAGE values are known with high confidence (narrow confidence intervals), here's what I think you can do.

If it's the first situation, then it may mean that your model depends quite a bit on the train-test split. Ideally that wouldn't happen, especially if there's enough data, but averaging the SAGE values is a reasonable approach. (For the confidence intervals, I would calculate the standard deviations by taking the square root of the average variance.)
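A minimal sketch of that aggregation, assuming each fold has already produced one array of SAGE values and one array of standard deviations per feature. The function name `aggregate_sage_across_folds` and the example numbers are hypothetical and only illustrate the arithmetic; they are not part of the SAGE package.

```python
import numpy as np


def aggregate_sage_across_folds(fold_values, fold_stds):
    """Combine per-fold SAGE estimates into one estimate per feature.

    fold_values, fold_stds: array-likes of shape (n_folds, n_features).
    Returns (mean_values, pooled_std), each of shape (n_features,).
    """
    values = np.asarray(fold_values, dtype=float)
    stds = np.asarray(fold_stds, dtype=float)
    if values.ndim != 2 or values.shape != stds.shape:
        raise ValueError("expected two arrays of shape (n_folds, n_features)")

    # Average the SAGE values over folds, feature by feature.
    mean_values = values.mean(axis=0)
    # Square root of the average variance, as suggested above.
    pooled_std = np.sqrt((stds ** 2).mean(axis=0))
    return mean_values, pooled_std


# Hypothetical numbers for 3 folds and 2 features.
example_values = [[0.10, 0.30], [0.14, 0.26], [0.12, 0.31]]
example_stds = [[0.010, 0.020], [0.012, 0.018], [0.011, 0.021]]
mean_values, pooled_std = aggregate_sage_across_folds(example_values, example_stds)
print(mean_values, pooled_std)
```

Note that the pooled standard deviation summarizes the average estimation uncertainty within each fold. It does not capture how much the values move from fold to fold, so it may be worth reporting that spread separately (for example, `values.std(axis=0)`).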

If it's the second situation, then I would put more trust in the SAGE values that are calculated using data that was not touched during training (the test data), because the loss values (and therefore the SAGE values) may be artificially changed by overfitting to the train set.

Let me know how that sounds.

@garkavem
Author

Hello, thank you for the answer! It is the first situation. Maybe there is not enough data. I guess I will average the values and calculate the confidence intervals as you suggest. Thanks!
