Correct statement about repeated samples in bootstrap #647

lesteve · 2022-07-04T09:16:15Z

https://inria.github.io/scikit-learn-mooc/python_scripts/ensemble_bagging.html

On average, ~63.2% of the original data points of the original dataset will be present in a given bootstrap sample. The other ~36.8% are repeated samples.

I think the last part is wrong: the ~36.8% are not repeated samples. Rather, ~36.8% of the original dataset is absent from the bootstrap sample. I guess this sentence can be removed, since we already say that ~63.2% are in the bootstrap sample.
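For reference, the two percentages come from the standard argument for sampling with replacement. This is only a sketch, assuming a dataset of $n$ points and a bootstrap sample of $n$ independent uniform draws (the symbol $n$ is introduced here and does not appear in the notebook text):

```latex
\[
P(\text{point } i \text{ is never drawn}) = \left(1 - \frac{1}{n}\right)^{n},
\qquad
\lim_{n \to \infty} \left(1 - \frac{1}{n}\right)^{n} = e^{-1} \approx 0.368,
\]
\[
P(\text{point } i \text{ is drawn at least once}) \approx 1 - e^{-1} \approx 0.632 .
\]
```

So 36.8% is the share of original points left out of the bootstrap sample, not the share of repeated ones.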

If we want to talk about repeated samples, we can say that the bootstrap sample is the same size as the original dataset but contains only ~63.2% of the original data points, so the rest of it is necessarily made up of repeated samples.
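A quick simulation can be used to check these numbers. The snippet below is a minimal sketch, not code from the MOOC notebook: the function name `bootstrap_fractions` and its parameters are hypothetical, and it assumes a bootstrap sample is drawn uniformly with replacement and has the same size as the original dataset.

```python
import numpy as np


def bootstrap_fractions(n_samples, n_repeats=200, seed=0):
    """Estimate which share of the original points lands in a bootstrap sample.

    Hypothetical helper for illustration only. Returns the average fraction
    of original points that are present in a bootstrap sample and the
    average fraction that are absent from it.
    """
    rng = np.random.default_rng(seed)
    present = []
    for _ in range(n_repeats):
        # Draw n_samples indices with replacement from range(n_samples).
        indices = rng.integers(0, n_samples, size=n_samples)
        # Distinct indices are the original points that made it into the sample.
        present.append(np.unique(indices).size / n_samples)
    frac_present = float(np.mean(present))
    return frac_present, 1.0 - frac_present


frac_present, frac_absent = bootstrap_fractions(n_samples=10_000)
print(f"present in bootstrap sample: {frac_present:.3f}")
print(f"absent from bootstrap sample: {frac_absent:.3f}")
```

Under these assumptions the two printed fractions should come out close to 0.632 and 0.368. Because the bootstrap sample has as many entries as the original dataset, the slots not occupied by a first occurrence are filled with repeats of points that are already present.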

I seem to remember mentioning this in the past, and indeed I managed to find it: https://github.com/INRIA/scikit-learn-mooc/pull/53/files

ArturoAmorQ mentioned this issue Jul 7, 2022

FIX Statement about repeated samples in bootstrap #649

Merged

lesteve closed this as completed in #649 Jul 12, 2022
