The oob score #56

Open
wjj5881005 opened this issue Jun 8, 2021 · 0 comments

I think the OOB score computed in the fit function is wrong.

The authors get the OOB sample indices with "mask = ~samples" and then use X[mask, :] to select the OOB samples. But samples appears to be an array of integer indices, so ~samples is a bitwise NOT (~i == -(i + 1)), not a set complement.
I tested this case and found many identical elements between samples and X[mask, :], and the "OOB" set has the same length as the training set. For example, if we have 100 samples in total and 80 indices are drawn to train the model, then the OOB set should contain the 100 - 80 = 20 undrawn samples (without considering replacement), not 80.
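A minimal numpy sketch of the problem (the bootstrap draw and stand-in data here are hypothetical, just to show what ~ does to an integer index array):

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples = 100
samples = rng.randint(0, n_samples, 80)  # hypothetical bootstrap indices

X = np.arange(n_samples)  # stand-in data: row i holds the value i

mask = ~samples            # bitwise NOT on integers: ~i == -(i + 1)
wrong_oob = X[mask]        # selects rows n_samples - 1 - i, not the complement

print(len(wrong_oob))      # 80: same length as the bootstrap draw
print(set(wrong_oob) == {n_samples - 1 - i for i in samples})  # True
```

So X[~samples] is just a reflected copy of the drawn rows; it says nothing about which rows are actually out of bag.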

I also looked at how RandomForest implements OOB sampling, and found the following code:

import numpy as np
from sklearn.utils import check_random_state

random_instance = check_random_state(random_state)
sample_indices = random_instance.randint(0, n_samples, n_samples_bootstrap)  # indices of the training (bootstrap) samples
sample_counts = np.bincount(sample_indices, minlength=n_samples)
unsampled_mask = sample_counts == 0
indices_range = np.arange(n_samples)
unsampled_indices = indices_range[unsampled_mask]  # indices of the OOB samples

Then unsampled_indices holds the truly out-of-bag sample indices.
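For comparison, here is a self-contained sketch of that computation (plain numpy seeding in place of check_random_state; the function name and parameters are illustrative):

```python
import numpy as np

def generate_unsampled_indices(seed, n_samples, n_samples_bootstrap):
    """Return (oob_indices, sample_indices) for one bootstrap draw."""
    rng = np.random.RandomState(seed)
    # Indices of the training samples, drawn with replacement.
    sample_indices = rng.randint(0, n_samples, n_samples_bootstrap)
    # Count how often each index was drawn; rows never drawn are OOB.
    sample_counts = np.bincount(sample_indices, minlength=n_samples)
    unsampled_indices = np.arange(n_samples)[sample_counts == 0]
    return unsampled_indices, sample_indices

oob, drawn = generate_unsampled_indices(42, n_samples=100, n_samples_bootstrap=80)
print(set(oob) == set(range(100)) - set(drawn))  # True: exact complement of the draw
```

Unlike the ~samples version, the OOB set here is guaranteed to be disjoint from the training indices.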
