Hello! On my dataset, SAGE values depend quite a lot on the train-test split. Would it be correct to average the SAGE value means and standard deviations across cross-validation folds?
Hi there, that's an interesting situation. When you try a different train-test split, do you train a new model? Or do you use a different train-test split (with the same model) just when estimating SAGE values? And also, is the estimator running to convergence so that you get pretty narrow confidence intervals?
Assuming that the SAGE values are known with high confidence (narrow confidence intervals), here's what I think you can do.
If it's the first situation, then it may mean that your model depends quite a bit on the train-test split. Ideally that wouldn't happen, especially if there's enough data, but averaging the SAGE values is a reasonable approach. (For the confidence intervals, I would calculate the standard deviations by taking the square root of the average variance.)
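The averaging described above can be sketched as follows. This is a minimal illustration with made-up per-fold numbers: it assumes you have collected, for each cross-validation fold, the SAGE value estimates and their standard deviations as arrays of shape `(n_folds, n_features)` (the array names and values here are hypothetical, not part of the sage API).

```python
import numpy as np

# Hypothetical per-fold results: rows are CV folds, columns are features.
# Each fold yields SAGE value estimates (means) and their standard deviations.
fold_means = np.array([
    [0.10, 0.05, 0.02],
    [0.12, 0.04, 0.03],
    [0.08, 0.06, 0.01],
])
fold_stds = np.array([
    [0.010, 0.008, 0.005],
    [0.012, 0.007, 0.006],
    [0.009, 0.009, 0.004],
])

# Average the SAGE values across folds.
avg_means = fold_means.mean(axis=0)

# Combine the uncertainties by taking the square root of the
# average variance, as suggested above.
avg_stds = np.sqrt((fold_stds ** 2).mean(axis=0))
```

Note that averaging the variances (rather than the standard deviations directly) is the statistically consistent way to pool per-fold uncertainty estimates.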
If it's the second situation, then I would put more trust in the SAGE values that are calculated using data that was not touched during training (the test data), because the loss values (and therefore the SAGE values) may be artificially changed by overfitting to the train set.
Hello, thank you for the answer! It is the first situation. Maybe there is not enough data. I will average the values and calculate confidence intervals as you suggest. Thanks!