Side note on SMOTE

Marjolein Fokkema edited this page Apr 2, 2021 · 1 revision

The use of SMOTE is generally inadvisable. To quote Frank Harrell:

"Oversampling is a completely invalid statistical technique and represents a misunderstanding of proper accuracy scoring rules. See for example https://www.fharrell.com/post/class-damage/." (source: https://stats.stackexchange.com/q/367067/173546)

These are true words of wisdom. Function pre employs proper scoring rules for rule generation and for fitting the final model, unless you change the defaults, for example by specifying type.measure="class". Please do not do that; I have yet to encounter a situation where it makes any sense. As long as you also use a proper scoring rule (for example, the Brier score) to evaluate the performance of your model, there is no need for SMOTE. If you stubbornly prefer classification accuracy, you can use that too, without SMOTE.
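To make the difference between a proper scoring rule and classification accuracy concrete, here is a minimal sketch. It is written in plain Python purely for illustration and does not use pre. The outcome vector, the two sets of predicted probabilities, and the helper names brier_score and accuracy are all hypothetical. The lowered threshold at the end is only one possible way of classifying without resampling; it is an assumption of this sketch, not something prescribed on this page.

```python
def brier_score(y_true, p_hat):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)


def accuracy(y_true, p_hat, threshold=0.5):
    """Proportion of cases classified correctly when predicting 1 if p >= threshold."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, p_hat))
    return hits / len(y_true)


# Hypothetical imbalanced outcome: 2 events among 10 observations.
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# Two hypothetical sets of predicted probabilities (made up for illustration).
p_constant = [0.2] * 10              # always predicts the base rate
p_informed = [0.1] * 8 + [0.4, 0.4]  # higher risk for the events, still below 0.5

# The Brier score (lower is better) separates the two sets of predictions.
brier_constant = brier_score(y, p_constant)
brier_informed = brier_score(y, p_informed)

# Accuracy at the default 0.5 threshold cannot tell them apart.
acc_constant = accuracy(y, p_constant)
acc_informed = accuracy(y, p_informed)

# If a classification is wanted, a different threshold can be applied to the
# predicted probabilities directly; no oversampling is involved.
acc_informed_low = accuracy(y, p_informed, threshold=0.3)
```

In this toy example the informed predictions get a clearly lower Brier score than the constant ones (about 0.08 versus 0.16), while accuracy at the 0.5 threshold is identical for both. Applying a threshold of 0.3 to the informed probabilities classifies every case correctly, with the data left as they are.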